Subset variables

When you use a subset variable for an MGPS data mining run, Oracle Empirica Signal divides the source data into separate backgrounds, one for each value of the subset variable, and performs a separate run against each background. You can use subsets to study how the strength of a combination of variables has changed over time. You can also use subsets to study how the strength of variable combinations differs by age, gender, and so on. For example, to review separate signal scores for males and for females, you subset the data mining run by gender. Two separate sets of statistics result, one computed against the background of cases involving males and the other against the background of cases involving females.

You can select only one subset variable for a data mining run. That variable must have only one value for each case in the database. For example, indication, route, or dose cannot be used directly as subset variables because any case with more than one drug has multiple values for these variables, one for each reported drug. After selecting a subset variable, choose which values of that variable to use for subsetting. The application generates a set of results for each subset.

Note:

You cannot select the same variable as both an item variable and a subset variable.
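The one-value-per-case requirement described above can be checked on extracted data before a variable is proposed for subsetting. The following is a minimal sketch, not part of Oracle Empirica Signal: the function name, the row layout, and the column names (case_id, gender, route) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def is_valid_subset_variable(rows, case_key, variable):
    """Return True if every case has exactly one distinct value for `variable`."""
    values_by_case = defaultdict(set)
    for row in rows:
        values_by_case[row[case_key]].add(row[variable])
    return all(len(values) == 1 for values in values_by_case.values())


# Hypothetical drug-level rows: case 2 reports two drugs with different routes.
rows = [
    {"case_id": 1, "gender": "F", "route": "Oral"},
    {"case_id": 2, "gender": "M", "route": "Oral"},
    {"case_id": 2, "gender": "M", "route": "Intravenous"},
]

print(is_valid_subset_variable(rows, "case_id", "gender"))  # True
print(is_valid_subset_variable(rows, "case_id", "route"))   # False
```

Here gender passes because each case carries a single value, while route fails because case 2 has one value per reported drug.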

Example

The database contains information on the age of patients reporting adverse reactions. The age has been categorized as Pediatric, Adult, and Elderly, and stored in a variable named AgeGroup. To compare associations of variables for different age categories, you can select AgeGroup as the subset variable (assuming that AgeGroup has been made available as a subset variable by the configuration).

With subsetting, the application computes expected counts, Relative Ratio, and EBGM for each subset separately.
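To illustrate what "separately" means here, the sketch below computes an observed count N, an expected count E, and a Relative Ratio (N/E) for one drug-event pair within each AgeGroup subset. It rests on simplifying assumptions: E is taken as the plain independence estimate within the subset, (cases with the drug × cases with the event) / total cases, with no stratification, and EBGM is omitted because it requires fitting an empirical Bayes prior. The data layout and the function name are hypothetical, and this is not the application's implementation.

```python
from collections import defaultdict


def relative_ratio_by_subset(cases, subset_var, drug, event):
    """Compute N, E, and RR for one drug-event pair within each subset."""
    # Each subset value gets its own background of cases.
    groups = defaultdict(list)
    for case in cases:
        groups[case[subset_var]].append(case)

    results = {}
    for value, group in groups.items():
        total = len(group)
        n_drug = sum(drug in c["drugs"] for c in group)
        n_event = sum(event in c["events"] for c in group)
        observed = sum(drug in c["drugs"] and event in c["events"] for c in group)
        # Assumption: unstratified independence estimate within the subset.
        expected = n_drug * n_event / total
        rr = observed / expected if expected > 0 else None
        results[value] = {"N": observed, "E": expected, "RR": rr}
    return results


# Hypothetical cases, one drug and one event each.
cases = [
    {"AgeGroup": "Pediatric", "drugs": {"DrugA"}, "events": {"Rash"}},
    {"AgeGroup": "Pediatric", "drugs": {"DrugA"}, "events": {"Rash"}},
    {"AgeGroup": "Pediatric", "drugs": {"DrugB"}, "events": {"Nausea"}},
    {"AgeGroup": "Pediatric", "drugs": {"DrugB"}, "events": {"Nausea"}},
    {"AgeGroup": "Adult", "drugs": {"DrugA"}, "events": {"Rash"}},
    {"AgeGroup": "Adult", "drugs": {"DrugA"}, "events": {"Nausea"}},
    {"AgeGroup": "Adult", "drugs": {"DrugB"}, "events": {"Rash"}},
    {"AgeGroup": "Adult", "drugs": {"DrugB"}, "events": {"Nausea"}},
]

results = relative_ratio_by_subset(cases, "AgeGroup", "DrugA", "Rash")
for age_group, stats in results.items():
    print(age_group, stats)
```

In this toy data the expected count is 1.0 in both subsets, but the pair is observed twice among Pediatric cases (RR 2.0) and once among Adult cases (RR 1.0), a difference that a single run over the pooled data would average out.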

For a data mining run that uses cumulative subsets, there is one set of results for each subset and the computations to determine results for each subset include data from all previous subsets. A previous subset may or may not be previous chronologically. For more information, see Define subset values. Typically, a variable used for cumulative subsetting represents a time period.

For example, suppose that you use Report Year as the subset variable and the possible report years are 2002 through 2004. If the subset values are defined in chronological order, the run produces three sets of results: one computed from 2002 data, one from 2002 and 2003 data, and one from 2002 through 2004 data.
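The way cumulative subsets accumulate data can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes that "previous" means earlier in the order in which the subset values are defined, which is why a non-chronological order changes what each result includes.

```python
def cumulative_subsets(ordered_values):
    """Map each subset value to the subset values whose data its results include."""
    return {
        value: ordered_values[: i + 1]
        for i, value in enumerate(ordered_values)
    }


report_years = [2002, 2003, 2004]
for year, included in cumulative_subsets(report_years).items():
    print(year, included)
# 2002 [2002]
# 2003 [2002, 2003]
# 2004 [2002, 2003, 2004]
```

Each set of results is then computed against the combined data of the listed subset values rather than against a single year.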