About data transformations

A data transformation is a means of mapping text values to new values, converting date values, or organizing numeric or date values into groups. The source data remains unchanged, and the transformed values are available for selection when you create data mining runs.

You specify data transformations for individual variables within a data configuration.

For example, suppose that the source data contains a date variable RCVD_DATE. You want to allow run creators to select half-years as subset categories. You can use a data transformation to convert the date values for the RCVD_DATE variable to new half-year values, such as 2003H1 and 2003H2. A data mining run creator can then use the half-year variable for subsetting, and select different half-year values as subset categories. The source safety data continues to store only date values, but the run results refer to the half-year values instead of the full-date values.
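The half-year conversion in this example can be sketched in Python. This is an illustration only, not the application's implementation: the function name to_half_year is hypothetical, and the sketch assumes that H1 covers January through June and H2 covers July through December.

```python
from datetime import date


def to_half_year(received: date) -> str:
    """Return a half-year label such as '2003H1' for a date value.

    Assumption: H1 is January-June, H2 is July-December.
    """
    half = 1 if received.month <= 6 else 2
    return f"{received.year}H{half}"


# The source dates are only read; the labels are new, derived values.
source_dates = [date(2003, 3, 15), date(2003, 9, 1)]
labels = [to_half_year(d) for d in source_dates]
print(labels)  # ['2003H1', '2003H2']
```

The source list is never modified, which mirrors how the transformed values exist alongside the unchanged source data.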

Transformation options

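To transform data in a data configuration, you can map text values to new values, convert date values, or define cutpoints that organize numeric or date values into groups.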

When you define data transformations, it is preferable to include a new data configuration variable for the transformed data. For example, if the RCVD_DATE variable exists, and you want to transform its data, keep the RCVD_DATE variable as is in the data configuration and add a new variable, such as RCVD_HALFYR, to store the transformed data.

An additional transformation option allows you to upload a list of synthetic values that are in the source data. For example, you could upload the names of Standardized MedDRA Queries (SMQs) to be ignored by MGPS when estimating the shrinkage parameters for EBGM scores. Note that custom terms are ignored automatically for the estimation of shrinkage parameters.

Note: The raw RR scores for combinations involving custom terms and the excluded synthetic values are shrunk by the Bayesian formula, but they do not participate in the determination of the formula itself.

For any one data configuration variable, you can specify only one data transformation type. However, you can add several variables that reference the same column in the source database and specify a different transformation for each of them. For example, suppose that the source database has an RCVD_DATE column. You could create an RCVD_HALFYR variable that converts date values to half-years and an RCVD_RANGE variable that groups received dates according to date cutpoints.
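As a rough sketch of this pattern, the following Python code derives two variables from one source column, each with a single transformation. The cutpoint dates, the group labels, and the function names are assumptions made up for illustration. The sketch also assumes that a date equal to a cutpoint falls into the later group, which may differ from how the application treats boundaries.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical cutpoints and labels; there is one more label than cutpoints.
CUTPOINTS = [date(2000, 1, 1), date(2005, 1, 1)]
LABELS = ["Before 2000", "2000-2004", "2005 and later"]


def to_half_year(received: date) -> str:
    """Date conversion: label a date as <year>H1 or <year>H2."""
    return f"{received.year}H{1 if received.month <= 6 else 2}"


def to_range(received: date) -> str:
    """Cutpoint grouping: pick the group that the date falls into."""
    return LABELS[bisect_right(CUTPOINTS, received)]


def derive_variables(row: dict) -> dict:
    """Return a copy of the row with two variables derived from RCVD_DATE."""
    rcvd = row["RCVD_DATE"]
    return {
        **row,  # RCVD_DATE itself is kept as is
        "RCVD_HALFYR": to_half_year(rcvd),
        "RCVD_RANGE": to_range(rcvd),
    }


row = {"RCVD_DATE": date(2003, 9, 1)}
print(derive_variables(row))
```

Each derived variable applies exactly one transformation type, while the original RCVD_DATE value stays available unchanged.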

Transformation results

When you define data transformations that map text values, define variable cutpoints, or convert date values, note that:

Note: These restrictions do not apply to a synthetic values transformation.