Data mining results for MGPS runs

Data mining results for an MGPS run appear in the Results Table page. If you included RGPS interaction computations in the run, the interaction scores appear in the View Interactions pop-up window. For more information, see Viewing a data mining results table.

By default, results tables display a limited set of columns. However, you can add or remove columns as needed. To view, print, or download the tables, see About tables.

The following sections describe the results of MGPS runs.

Note: You can rest the cursor on a column heading to display a description of the column.

Results table

Column

Description

<variable-name>

Each variable on which data mining was performed. Typically, there is a drug variable and an event variable.

If the run used drug or event hierarchies, the Empirica Signal application adds a column to the results table for each hierarchy level above the term on which data mining was performed. These columns show the primary path in the hierarchy. For example, if the event variable is PT, the results include HLT, HLGT, and SOC columns showing the primary path of the PT.

When you display results for more than two dimensions, the application identifies columns for variables of the same type by adding a numeric suffix to the column name. For example, when you display results for the pattern D+E+E, you see:

PT1

HLT1

HLGT1

SOC1

PT2

HLT2

HLGT2

SOC2

Note: For runs with three dimensions, these columns cannot be used to sort the results table. The actual column names in the underlying table are not the same as the labels for fields on the Select Criteria page or as column headers in the results table. The actual column names are ITEM1, ITEM2, and so on.

DIM

Dimension. The number of items combined.

E

The expected number of cases with the combination. For a 2-dimensional (2D) run, computed as:

(Observed # cases with ITEM1/Total # cases) x (Observed # cases with ITEM2/Total # cases) x Total # cases

For a stratified MGPS run, E is calculated as the total of the expected values for all strata, where the expected number of cases for each stratum is computed as:

(Observed # cases with ITEM1 for stratum/Total # cases for stratum) x (Observed # cases with ITEM2 for stratum/Total # cases for stratum) x Total # cases for stratum

Overall Expected = Total of Expected values for all strata

When E < 0.001, the value displays in scientific notation.
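As a rough illustration of the formulas above, the following Python sketch computes E for a single 2D combination, first without stratification and then as the total of the per-stratum expected values. The function names and the sample counts are hypothetical and are not part of the Empirica Signal application.

```python
def expected_2d(n_item1, n_item2, n_total):
    """Expected count under independence: (n1/N) x (n2/N) x N."""
    if n_total <= 0:
        return 0.0  # assumption: treat an empty stratum as contributing nothing
    return (n_item1 / n_total) * (n_item2 / n_total) * n_total


def expected_stratified(strata):
    """Total of the per-stratum expected counts.

    strata: iterable of (n_item1, n_item2, n_total) tuples, one per stratum.
    """
    return sum(expected_2d(n1, n2, n) for n1, n2, n in strata)


# Hypothetical counts: 200 cases with ITEM1, 50 with ITEM2, 10,000 cases in total.
e_unstratified = expected_2d(200, 50, 10_000)

# The same 10,000 cases split into two hypothetical strata.
e_stratified = expected_stratified([(150, 10, 4_000), (50, 40, 6_000)])

print(f"E (unstratified) = {e_unstratified:.4f}")
print(f"E (stratified)   = {e_stratified:.4f}")
```

The two results differ because stratification computes the expected count within each stratum before totaling, so the way items are distributed across strata affects E.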

For a 3-dimensional (3D) MGPS run, the calculations include ITEM3. For 3D runs, the calculation of E depends on the mix of item types in the set of items. If the set of items includes at least two different types and also includes at least two items of the same type (such as D+D+E), then E is the result of MGPS interaction calculations. In this model, E incorporates observed within-item-type associations, and uses only the assumption of cross-item-type independence in the computation of E.

EB05

There is approximately a 5% probability that the true Relative Ratio lies below this value.

EB95

There is approximately a 5% probability that the true Relative Ratio lies above this value.

The interval from EB05 to EB95 is the 90% confidence interval.

EBGM

Empirical Bayesian Geometric Mean. A more stable estimate than RR. This so-called shrinkage estimate is computed as the geometric mean of the posterior distribution of the true Relative Ratio.

EBMAX

Applies to runs with more than 2 dimensions. For each 3D itemset, EBMAX is the largest 2D EBGM among all included cross-item-type 2D combinations. If the itemset is homogeneous, so that there are no included cross-item-type combinations, EBMAX is the largest EBGM among all the included 2D combinations. The 2D combination for which EBGM = EBMAX is specified by the columns MAXITEM1 and MAXITEM2.

ERAM

For runs that include RGPS computations, the Empirical-Bayes Regression-adjusted Arithmetic Mean. This estimate is computed as the shrunken observed-to-expected reporting ratio adjusted for covariates and concomitant drugs.

ER05

There is approximately a 5% probability that the ERAM lies below this value.

The interval from ER05 to ER95 is the 90% confidence interval.

ER95

There is approximately a 5% probability that the ERAM lies above this value.

The interval from ER05 to ER95 is the 90% confidence interval.

EXCESS

A conservative estimate of how many extra cases were observed above what was expected. Computed as:

(EB05 - 1) x E

EXCESS2

Applies only to runs with more than 2 dimensions. A conservative estimate of how many extra cases were observed over what was expected assuming that only a single cross-item interaction is present. Computed as:

E x EB95MAX x (INTSS - 1) = E x (EB05 - EB95MAX)

where EB95MAX is the largest 2D EB95 among all included cross-item-type combinations.
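The following Python sketch, offered only as an illustration, evaluates the EXCESS and EXCESS2 formulas and checks that the two forms of EXCESS2 agree when INTSS is taken as EB05 / EB95MAX (the definition given for the INTSS column). The function names and input values are hypothetical.

```python
def excess(eb05, e):
    """EXCESS = (EB05 - 1) x E."""
    return (eb05 - 1.0) * e


def intss(eb05, eb95max):
    """INTSS = EB05 / EB95MAX."""
    return eb05 / eb95max


def excess2(eb05, eb95max, e):
    """EXCESS2 = E x EB95MAX x (INTSS - 1)."""
    return e * eb95max * (intss(eb05, eb95max) - 1.0)


# Hypothetical values for one 3D itemset.
e, eb05, eb95max = 2.5, 6.0, 3.0

excess_value = excess(eb05, e)
excess2_value = excess2(eb05, eb95max, e)
excess2_alt = e * (eb05 - eb95max)  # equivalent form: E x (EB05 - EB95MAX)

print(excess_value, excess2_value, excess2_alt)
```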

F

Applies only to runs created by Oracle, with the advanced parameter, Do comparative analysis, selected.

IC

For runs that include Information Component computations, the IC value:

IC = log2 ((O + α1) / (E + α2))

where:

  •  α1= α2= 1/2

  • O is the observed count.

  • E is the expected count.
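A minimal Python sketch of the IC formula follows. The function name and the sample counts are hypothetical; only the formula and the values of α1 and α2 come from the definition above.

```python
import math


def information_component(observed, expected, alpha1=0.5, alpha2=0.5):
    """IC = log2((O + alpha1) / (E + alpha2)), with alpha1 = alpha2 = 1/2."""
    return math.log2((observed + alpha1) / (expected + alpha2))


# Hypothetical combination: 15 observed cases against 3.5 expected.
ic = information_component(observed=15, expected=3.5)

# When the observed and expected counts are equal, IC is 0.
ic_equal = information_component(observed=4, expected=4)

print(f"IC = {ic:.3f}, IC when O equals E = {ic_equal:.3f}")
```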

IC025

There is approximately a 2.5% probability that the IC lies below this value.

The interval from IC025 to IC975 is the 95% confidence interval.

IC975

There is approximately a 2.5% probability that the IC lies above this value.

The interval from IC025 to IC975 is the 95% confidence interval.

ID

Identifies the unique row number assigned as Oracle loads data from the run's output files.

INTSS

Interaction Signal Score. Applies only to 3D runs. Essentially, this is a way of measuring the strength of a higher-order association above and beyond what would be expected from any of the component pairs of items of different types. Computed as:

EB05 / EB95MAX

where EB95MAX is the largest 2D EB95 among component pairs of items of different types.

JOB_ID

The identifier assigned by the listener to the sub-job in the run.

MAXITEM1

First item determining the 2D combination for which EBGM = EBMAX.

MAXITEM1P

Prefix, if any, assigned to the variable MAXITEM1.

MAXITEM2

Second item determining the 2D combination for which EBGM = EBMAX.

MAXITEM2P

Prefix, if any, assigned to the variable MAXITEM2.

N

Observed number of cases with the combination of items. You can click the value of N to display a menu from which you can drill down to view or download case information or run reports. (The same options are available when you click Row menu for the row.)

P_<variable-name>

Prefix, if any, assigned to the variable in the data configuration on which the run is based. The alphabetical ordering of prefixes determines which of the underlying ITEM column names is used for each of the item variables in the results table. For example, if the prefix for the drug variable is D and the prefix for the event variable is E, values for the drug variable appear in the ITEM1 column and values for the event variable in the ITEM2 column.

P_VALUE

For runs that include the calculation of PRR, the probability that chi-square is as large as or larger than the value in the PRR_CHISQ column by chance alone if there is no causal relationship or consistent association between the drug and the event. Small values display in scientific notation.

PRR

For runs that include the calculation of PRR, the Proportional Reporting Ratio for the combination of a particular drug and particular event.

PRR_A

For runs that include the calculation of PRR, the observed count for the combination of a particular drug and particular event. This can be the number of combinations or number of cases, depending on the PRR options used in the run.

PRR_B

For runs that include the calculation of PRR, the observed count for the combination of a particular event and all other drugs. This can be the number of combinations or number of cases, depending on the PRR options used in the run.

PRR_C

For runs that include the calculation of PRR, the observed count for the combination of a particular drug and all other events. This can be the number of combinations or number of cases, depending on the PRR options used in the run.

PRR_D

For runs that include the calculation of PRR, the observed count for the combination of all other drugs and all other events. This can be the number of combinations or number of cases, depending on the PRR options used in the run.

PRR_CHISQ

For runs that include the calculation of PRR, the chi-square of PRR. For more information, see PRR computations.

Q

Intermediate computation in the calculation of EBGM. Defined as the posterior probability that the combination is in component 1 of the mixture prior distribution estimated by the empirical Bayes algorithm.

ROR05

For runs that include the calculation of ROR, the lower 5% confidence limit for ROR.

ROR95

For runs that include the calculation of ROR, the upper 5% confidence limit for ROR.

The interval from ROR05 to ROR95 is the 90% confidence interval.

ROR

For runs that include the calculation of ROR, the Reporting Odds Ratio. Computed as:

ROR = (PRR_A * PRR_D) / (PRR_B * PRR_C)

With stratification, ROR is computed as a weighted average of the ROR within each stratum. For more information, see ROR computations.
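As an illustration of the unstratified formula only, the Python sketch below computes ROR from the four PRR cell counts. The function name, the sample counts, and the choice to return None when the denominator is zero are assumptions; the sketch does not show how the application handles that case or how it weights strata.

```python
def reporting_odds_ratio(prr_a, prr_b, prr_c, prr_d):
    """ROR = (PRR_A * PRR_D) / (PRR_B * PRR_C), without stratification."""
    denominator = prr_b * prr_c
    if denominator == 0:
        return None  # assumption: treat the ratio as undefined
    return (prr_a * prr_d) / denominator


# Hypothetical counts:
#   PRR_A = 20    drug and event
#   PRR_B = 80    event with all other drugs
#   PRR_C = 180   drug with all other events
#   PRR_D = 9720  all other drugs with all other events
ror = reporting_odds_ratio(prr_a=20, prr_b=80, prr_c=180, prr_d=9720)

print(f"ROR = {ror:.2f}")
```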

ROW_NUM

Identifies the row number assigned in one of the output files. The value in this column may not be unique in the results table for a given run.

RR

Relative Ratio. (The same as N/E.) Observed number of cases with the combination divided by the expected number of cases with the combination. This is a sampling estimate of the true Relative Ratio (that would be observed if the database were much larger, but drawn from the same conceptual population of reports) for the particular combination of drug and event. RR is computed as:

Observed # cases with ITEM1+ITEM2 pair / Expected # cases with ITEM1+ITEM2 pair

For a stratified MGPS run, RR is computed as:

Observed # cases with ITEM1+ITEM2 pair for all strata / Final Expected # cases with ITEM1+ITEM2 pair

For a 3D MGPS run, the computations include ITEM3.

SUBSET

If an MGPS run includes a subset variable to categorize (subset) results, displays the label for the subset that applies to each combination.

Data in the following columns (ending in _IND) are computed using the MGPS independence model and are provided to support runs completed in versions of the Empirica Signal application prior to version 5.0:


Column

Description

E_IND

The expected number of cases with the combination, as calculated by the MGPS independence model. Computed as:

(Observed # cases with ITEM1/Total # cases) x (Observed # cases with ITEM2/Total # cases) x Total # cases

For a stratified MGPS run, E_IND is calculated as the total of the expected values for all strata, where the expected number of cases for each stratum is computed as:

(Observed # cases with ITEM1 for stratum/Total # cases for stratum) x (Observed # cases with ITEM2 for stratum/Total # cases for stratum) x Total # cases for stratum

Overall Expected = Total of Expected values for all strata

When E_IND < 0.001, the value displays in scientific notation.

For a 3D MGPS run, the calculations include ITEM3.

E2D_DIV_E_IND

Ratio computed as: E2D_IND/E_IND

E2D_DIV_F_IND

Ratio computed as: E2D_IND/F

E2D_IND

The expected count based on the all-2-factor log linear model.

EB05_IND

A value such that there is approximately a 5% probability that the true Relative Ratio lies below it, where Relative Ratio is defined by the MGPS independence model.

EB95_IND

A value such that there is approximately a 5% probability that the true Relative Ratio lies above it, where Relative Ratio is defined by the MGPS independence model.

The interval from EB05_IND to EB95_IND may be considered to be the 90% confidence interval.

EBGM_IND

Empirical Bayesian Geometric Mean, as computed by the MGPS independence model. A more stable estimate than RR; the so-called shrinkage estimate.

EBGMDIF_IND

Applies to 3D runs. Essentially, this is a way of measuring the strength of a higher-order association above and beyond what would be expected from previously computed component two-factor associations. Computed by the MGPS independence model as:

EBGM_IND - E2D_IND/E_IND

EXCESS_IND

A conservative estimate of how many extra cases were observed above what was expected. Computed by the MGPS independence model as:

(EB05_IND - 1) x E_IND

EXCESS2_IND

Applies to 3D runs. An estimate of how many extra cases were observed over what was expected using the all-2-factor model. Computed by the MGPS independence model as:

(EBGM_IND x E_IND) - E2D_IND

where E2D_IND is the expected count based on the all-2-factor log linear model.

RR_IND

Relative Ratio computed by the MGPS independence model. (The same as N/E_IND.) Observed number of cases with the combination divided by the expected number of cases with the combination. This may be viewed as a sampling estimate of the true value of observed/expected for the particular combination of drug and event. RR_IND is computed as:

Observed # cases with ITEM1+ITEM2 pair / Expected # cases with ITEM1+ITEM2 pair

For a stratified MGPS run, RR_IND is computed as:

Observed # cases with ITEM1+ITEM2 pair for all strata / Final Expected # cases with ITEM1+ITEM2 pair

For a 3D MGPS run, the computations include ITEM3.

Interactions Table

Note: Interaction estimates are calculated for a pair of drugs only if for both drugs the number of drug+event reports exceeds the minimum interaction count for the run.

Only drugs and events that you select as criteria appear in the table of interaction results.

To include only interaction scores that exceed a minimum threshold in the table:

1. In the View Interactions pop-up window, click Columns and Rows.

2. In the Limit to field, select a statistic and type a minimum value for the statistic.

By default, this limit is set to N12 > 0.0.

Note: To include reports in which Item1 and Item2 did not occur with the Event, you can set the limit to a negative number such as -1.0.

If you enable the Include SQL WHERE Clause for Advanced Results Selection user preference, you can specify a SQL WHERE clause to limit the results.

Column

Description

Item1

Value of the first predictor in the interaction, for example, Influenza Vaccine.

Item2

Value of the second predictor in the interaction, for example, Anthrax Vaccine.

Event

Response associated with the interaction, for example, Back Pain.

ERAM1

RGPS logistic regression odds ratio for Item1. This value is an exponential function of the logistic regression coefficient.  

ERAM2

RGPS logistic regression odds ratio for Item2. This value is an exponential function of the logistic regression coefficient.

NREPORTS12

Observed number of cases reporting Item1 and Item2.

N12

Observed number of cases reporting Item1, Item2, and Event.

E12

Expected value of N12 under the null hypothesis that both Item1 and Item2 have no relative ratio reporting effect.

This value is adjusted for covariates and concomitant drugs using the RGPS prediction formula.

When E12 < 0.001, the value displays in scientific notation.

INTRR

Ratio of observed (N12) to expected (E12) reports adjusted for the larger of the disproportionality scores for Item1 and Item2:

N12/(E12*max(ERAM1,ERAM2))
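The INTRR formula can be sketched in Python as follows. This is an illustration only: the function name and the input values are hypothetical, and the sketch assumes that E12 and both ERAM values are greater than zero.

```python
def intrr(n12, e12, eram1, eram2):
    """INTRR = N12 / (E12 * max(ERAM1, ERAM2)).

    Assumes e12 > 0 and both ERAM values > 0.
    """
    return n12 / (e12 * max(eram1, eram2))


# Hypothetical interaction: 12 observed reports, 2.0 expected,
# with disproportionality scores of 1.5 for Item1 and 3.0 for Item2.
intrr_value = intrr(n12=12, e12=2.0, eram1=1.5, eram2=3.0)

print(f"INTRR = {intrr_value:.2f}")
```

Dividing by the larger of the two ERAM values means that INTRR exceeds 1 only when the observed count is higher than what the stronger of the two single-item effects would account for.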

INTEB

Signal score that indicates the strength of the interaction between Item1 and Item2. This value is a shrunken INTRR value.

INT05

Lower 5% confidence limit for INTEB.

INT95

Upper 5% confidence limit for INTEB.