About data transformations

A data transformation is a means of mapping text values to new values, converting date values, or organizing numeric or date values into groups. The source data remains unchanged, and the transformed values are available for selection when you create data mining runs.

You specify data transformations for individual variables within a data configuration.

For example, suppose that the source data contains a date variable RCVD_DATE. You want to allow run creators to select half-years as subset categories. You can use a data transformation to convert the date values for the RCVD_DATE variable to new half-year values, such as 2019H1 and 2019H2. A data mining run creator can then use the half-year variable for subsetting, and select different half-year values as subset categories. The source safety data continues to store only date values, but the run results refer to the half-year values instead of the full-date values.
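The half-year conversion described above can be sketched in Python. This is only an illustration of the mapping, not how the application implements it. The function name `to_half_year` and the sample dates are hypothetical, and the sketch assumes that H1 covers January through June and H2 covers July through December.

```python
from datetime import date


def to_half_year(received: date) -> str:
    """Return a half-year label such as '2019H1' for a date.

    Assumption: H1 is January-June and H2 is July-December.
    """
    half = 1 if received.month <= 6 else 2
    return f"{received.year}H{half}"


# Hypothetical RCVD_DATE values; the source values are read, never modified.
rcvd_dates = [date(2019, 3, 14), date(2019, 9, 2)]

# The transformed values are derived separately from the source values.
rcvd_half_years = [to_half_year(d) for d in rcvd_dates]
```

The source list still holds full dates after the conversion; only the derived list holds half-year labels.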

Transformation options

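To transform data in a data configuration, you can do the following:

  • Map text values to new values.
  • Define variable cutpoints to organize numeric or date values into groups.
  • Convert date values.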

When you define data transformations, it is preferable to include a new data configuration variable for the transformed data. For example, if the RCVD_DATE variable exists, and you want to transform its data, keep the RCVD_DATE variable as is in the data configuration and add a new variable, such as RCVD_HALFYR, to store the transformed data.

An additional transformation option allows you to upload a list of synthetic values that are in the source data. For example, you could upload the names of Standardized MedDRA Queries (SMQs) to be ignored by MGPS when estimating the shrinkage parameters for EBGM scores. Note that custom terms are ignored automatically for the estimation of shrinkage parameters.

Note:

The raw RR scores for combinations involving custom terms and the excluded synthetic values are shrunk by the Bayesian formula, but they do not participate in the determination of the formula itself.

For any one data configuration variable, you can specify only one data transformation type. However, you can add several variables that reference the same column in the source database and specify a different transformation for each of them. For example, suppose that the source database has an RCVD_DATE column. You could create an RCVD_HALFYR variable that converts date values to half-years and an RCVD_RANGE variable that groups received dates according to date cutpoints.
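As a rough sketch of grouping by date cutpoints, the following Python example assigns each received date to a range. The cutpoint dates, the group labels, and the function name `to_range` are hypothetical. The sketch also assumes that a date equal to a cutpoint falls into the later group; the application may treat boundaries differently.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical cutpoints, in ascending order, and one label per resulting group.
CUTPOINTS = [date(2018, 1, 1), date(2020, 1, 1)]
LABELS = ["Before 2018", "2018-2019", "2020 onward"]


def to_range(received: date) -> str:
    """Return the label of the group that a date falls into.

    Assumption: a date equal to a cutpoint belongs to the later group.
    """
    return LABELS[bisect_right(CUTPOINTS, received)]


# Hypothetical RCVD_DATE values from the source column.
rcvd_dates = [date(2017, 12, 31), date(2018, 1, 1), date(2021, 5, 20)]

# Values for a second variable (RCVD_RANGE) derived from the same column.
rcvd_ranges = [to_range(d) for d in rcvd_dates]
```

Because each variable applies its own transformation to the same column, a half-year variable and a range variable can coexist without conflict.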

Transformation results

When you define data transformations that map text values, define variable cutpoints, or convert date values, note that:

  • Variables that transform data are not available for creating queries or defining reports.
  • Variables that map text values are available for creating saved lists. The mapped values, however, do not match the source data. As a result, the saved list does not pass validation. You can save and use the saved list, even though it does not pass validation.
  • Transformed variables are not available as the drug variable, the event variable, or additional covariates in a logistic regression run.
  • If included in the drilldown map table, a variable that transforms data shows the source data, rather than the transformed data, when users drill down to a list of cases or view case details.
  • If you define a data transformation for a variable that existing runs already use, users cannot drill down correctly in the results of those runs until the runs are re-executed.

Note:

These restrictions do not apply to a synthetic values transformation.
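To illustrate why a saved list built from mapped text values does not pass validation, consider the following Python sketch. The mapping, the sample values, and the `unmatched` helper are all hypothetical, and the check shown here is a simplified stand-in for the application's validation, not its actual logic.

```python
# Hypothetical text mapping for a gender variable.
GENDER_MAP = {"M": "Male", "F": "Female", "U": "Unknown"}

# Hypothetical values as stored in the source data.
source_values = ["M", "F", "F", "U"]

# Mapped values shown to users; the source values are unchanged.
mapped_values = [GENDER_MAP.get(value, value) for value in source_values]


def unmatched(saved_list: list[str], source: list[str]) -> list[str]:
    """Return the saved-list entries that do not occur in the source data."""
    known = set(source)
    return [value for value in saved_list if value not in known]


# A saved list created from the mapped values.
saved_list = ["Male", "Female"]

# Both entries are unmatched because the source stores "M" and "F".
missing = unmatched(saved_list, source_values)
```

In this sketch every entry in the saved list is reported as unmatched, which mirrors the behavior described above: the list can still be saved and used, but it does not validate against the source data.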