Step 4 for MGPS runs: Specify data mining parameters

After you have selected item variables for data mining, the Data Mining Parameters page appears.

  1. In the left navigation pane, hover over the Data Analysis icon, then click Data Mining Runs.
  2. At the top left of the Data Mining Runs home page, click Create Run.
  3. On the Create Data Mining Run page, select the type of run to create and click Next.
  4. (For timestamped data) On the Select "As Of" Date page, choose the latest date and time or select Other to choose a date and time. Then click Next.
  5. On the Select Variables page, select an item variable and, optionally, stratification variables. If you wish, define custom terms and subsets, then click Next.
  6. Specify the MGPS data mining parameters shown in the table below.
  7. Click Next to enter the data mining run options.

Parameter descriptions

Field Description

Specify highest dimension

Maximum number of items that are combined in a single combination during the run; for example, a dimension of 2 produces two-way combinations. For more information, see Dimensions and patterns.

  • 2—Provide a count for each item variable (one dimension), and generate scores for each item variable+item variable combination (two dimensions).

    Note:

    If you plan to include RGPS calculations in the run, you must set the dimension to 2.
  • 3—Provide a count for each item variable (one dimension), generate scores for each item variable+item variable combination (two dimensions), and generate scores for each item variable+item variable+item variable combination (three dimensions).

    Note:

    Three-dimensional runs require very large amounts of space for intermediate storage and for the final results (potentially hundreds of gigabytes). To reduce the computational effort these runs require, it is strongly recommended that you use other data mining parameters, including the minimum count, and restrict the results to those you plan to analyze.
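To make the growth in work concrete, the following Python sketch counts every combination of up to `highest_dimension` items within each case. It is an illustration only, not how Oracle Empirica Signal stores or processes data; the function name `count_combinations` and the sample cases are hypothetical. A case with k items contributes C(k, 2) two-way combinations but C(k, 3) three-way combinations, so a case with 20 items yields 190 pairs but 1,140 triples.

```python
from collections import Counter
from itertools import combinations


def count_combinations(cases, highest_dimension):
    """Count each combination of 1..highest_dimension items across cases.

    Each case is a set of item values (for example, drugs and events reported together).
    """
    counts = Counter()
    for case in cases:
        items = sorted(case)  # canonical order so ("DrugA", "Rash") is counted once
        for dimension in range(1, highest_dimension + 1):
            for combo in combinations(items, dimension):
                counts[combo] += 1
    return counts


cases = [
    {"DrugA", "DrugB", "Rash"},
    {"DrugA", "Rash"},
    {"DrugB", "Vertigo"},
]
two_way = count_combinations(cases, 2)
three_way = count_combinations(cases, 3)
print(len(two_way), len(three_way))  # 8 9
```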

Specify minimum count

How many times an item variable combination must occur in the data for the combination to be included in the MGPS computations.

For example, if you specify a minimum count of 5, only combinations that occur in at least 5 cases appear.

The purpose of the minimum count is to decrease computational time and space requirements. The higher the minimum count, the shorter the computational time.

When you specify the minimum count, consider whether you are using a subset variable. For example, suppose that the subset categories are report years 2005, 2006, and 2007, and that the DrugA+Rash combination occurs 4 times in 2005, 3 more times in 2006, and 5 more times in 2007. If the minimum count is 5, only the 2007 results show the DrugA+Rash combination. If you are using cumulative subsetting, the counts accumulate to 4, 7, and 12, so both the 2006 and 2007 results show the DrugA+Rash combination.
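The subsetting example can be checked with a short Python sketch. It assumes that subsets are listed in chronological order and that cumulative subsetting adds each subset's counts to those of all earlier subsets; `subsets_passing` is a hypothetical helper, not part of the product.

```python
from itertools import accumulate


def subsets_passing(new_counts_by_subset, minimum_count, cumulative):
    """Return the subset labels in which one combination meets the minimum count.

    new_counts_by_subset maps each subset label (in order) to the number of new
    occurrences of the combination in that subset.
    """
    labels = list(new_counts_by_subset)
    counts = list(new_counts_by_subset.values())
    if cumulative:
        # Cumulative subsetting: each subset also includes all earlier subsets' data.
        counts = list(accumulate(counts))
    return [label for label, n in zip(labels, counts) if n >= minimum_count]


drug_a_rash = {"2005": 4, "2006": 3, "2007": 5}
print(subsets_passing(drug_a_rash, 5, cumulative=False))  # ['2007']
print(subsets_passing(drug_a_rash, 5, cumulative=True))   # ['2006', '2007']
```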

If you specify a minimum count of less than 5, or if the run uses cumulative subsetting, a warning message appears. In these situations, the results table can take a very long time to generate and can be very large.

Note:

To supply a default setting for this field, you can set the Minimum Count user preference.

Include drug/event hierarchies' primary paths in run results

Whether to include hierarchy levels in the results.

For example, if the data mining run is for drug and Preferred Term (PT) combinations and the MedDRA Version 12.0 hierarchy is being used, the results include the HLT, HLGT, and SOC that make up the PT's primary path in MedDRA Version 12.0. This lets you easily view results for PTs at a higher level of the hierarchy. This option is available only if the data configuration for the run is set up to use a drug or event hierarchy.
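A minimal sketch of how primary-path columns let you roll PT-level results up to higher levels. The term names (`PT_1`, `HLT_X`, and so on) and the helpers `add_primary_path` and `rows_under` are hypothetical placeholders, not real MedDRA terms or product APIs.

```python
# Hypothetical primary-path table: PT -> (HLT, HLGT, SOC).
PRIMARY_PATH = {
    "PT_1": ("HLT_X", "HLGT_Y", "SOC_Z"),
    "PT_2": ("HLT_X", "HLGT_Y", "SOC_Z"),
    "PT_3": ("HLT_W", "HLGT_V", "SOC_Z"),
}


def add_primary_path(results):
    """Return copies of drug+PT result rows with HLT, HLGT, and SOC columns added."""
    enriched = []
    for row in results:
        # PTs missing from the table get empty hierarchy columns.
        hlt, hlgt, soc = PRIMARY_PATH.get(row["pt"], (None, None, None))
        enriched.append({**row, "hlt": hlt, "hlgt": hlgt, "soc": soc})
    return enriched


def rows_under(results, level, term):
    """Select rows whose PT falls under `term` at `level` ("hlt", "hlgt", or "soc")."""
    return [row for row in add_primary_path(results) if row[level] == term]


results = [
    {"drug": "DrugA", "pt": "PT_1", "n": 12},
    {"drug": "DrugA", "pt": "PT_2", "n": 4},
    {"drug": "DrugA", "pt": "PT_3", "n": 7},
]
print([row["pt"] for row in rows_under(results, "hlt", "HLT_X")])  # ['PT_1', 'PT_2']
```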

Fit separate distributions for the different item type combinations

Whether to take item types into account during MGPS processing.

Observed Count (N), Expected Count (E), and Relative Reporting Ratio (RR) values are not affected, but EBGM and other statistical values are affected. The statistical values for each combination are based on the counts and statistical computations of other occurrences of the same type of combination. For example, in computing EBGM values for Drug+Event combinations, only the distribution of RR values for Drug+Event combinations is considered. The distributions of RR values for Drug+Drug or Event+Event combinations are not considered.

Note:

This parameter does not affect the computation of PRR or ROR.
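The following Python sketch shows only the grouping step implied by this option: RR = N / E values are partitioned by the item types in each combination, so that a prior distribution could be fitted separately for each group. The MGPS prior fit itself is not shown, and the names `type_signature` and `group_rr_by_type` are hypothetical.

```python
from collections import defaultdict


def type_signature(combo, item_type):
    """Describe a combination by the sorted types of its items, e.g. ('drug', 'event')."""
    return tuple(sorted(item_type[item] for item in combo))


def group_rr_by_type(observed, expected, item_type):
    """Group RR = N / E by combination type.

    N, E, and RR are computed the same way with or without this option; only the
    pool of RR values that informs each group's prior changes.
    """
    groups = defaultdict(list)
    for combo, n in observed.items():
        groups[type_signature(combo, item_type)].append(n / expected[combo])
    return dict(groups)


item_type = {"DrugA": "drug", "DrugB": "drug", "Rash": "event"}
observed = {("DrugA", "Rash"): 6, ("DrugB", "Rash"): 2, ("DrugA", "DrugB"): 3}
expected = {("DrugA", "Rash"): 2.0, ("DrugB", "Rash"): 4.0, ("DrugA", "DrugB"): 1.5}
print(group_rr_by_type(observed, expected, item_type))
# {('drug', 'event'): [3.0, 0.5], ('drug', 'drug'): [2.0]}
```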

Include calculations for PRR and ROR

Whether to include PRR and ROR computations in the run. If you select this option, the following additional options appear:

  • Base counts on cases rather than drug-event combinations.
  • For PRR, include drug of interest in the comparator set.
  • Apply the Yates correction in computing the value of chi-squared.
  • Use stratified computation for PRR and ROR.

For more information, see PRR computations and ROR computations.
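For orientation, the sketch below uses the textbook 2x2-table definitions of PRR, ROR, and the Pearson chi-squared statistic with an optional Yates correction. How Oracle Empirica Signal builds the table (for example, counting cases versus drug-event combinations) and how it stratifies are described in PRR computations and ROR computations; the `include_drug_in_comparator` behavior here, which uses all reports as the PRR comparator, is an assumption, and the function name is hypothetical.

```python
def prr_ror(a, b, c, d, yates=False, include_drug_in_comparator=False):
    """Compute PRR, ROR, and chi-squared from a 2x2 table of counts.

    a: drug of interest with event of interest
    b: drug of interest with other events
    c: other drugs with event of interest
    d: other drugs with other events
    Zero cells are not handled and raise ZeroDivisionError.
    """
    n = a + b + c + d
    if include_drug_in_comparator:
        # Assumption: the PRR comparator is all reports, including the drug of interest.
        comparator_rate = (a + c) / n
    else:
        comparator_rate = c / (c + d)
    prr = (a / (a + b)) / comparator_rate
    ror = (a * d) / (b * c)
    # Pearson chi-squared for a 2x2 table, optionally with the Yates continuity correction.
    diff = abs(a * d - b * c)
    if yates:
        diff -= n / 2
    chi_sq = n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return prr, ror, chi_sq


prr, ror, chi_sq = prr_ror(20, 80, 100, 9800)
print(round(prr, 2), round(ror, 2))  # 19.8 24.5
```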

Include calculations for Information Component

Whether to include Information Component computations in the run. For more information, see Information Component computations.

Include calculations for RGPS

Whether to include RGPS computations in the run. This option is available only when the highest dimension is 2 and the run does not include subsets. If you select this option, the Specify minimum count field is set to 1, and the results restriction has a default value of N>=1. The following additional option appears:

Include Interactions—Whether to include calculations for Drug+Drug interactions using RGPS. If you select this option, set the Specify minimum interaction count field. The default value for the Specify minimum interaction count field is 25. Interaction estimates are calculated for a drug if the number of Drug+Event reports exceeds the specified minimum interaction count.

For more information, see RGPS computations.
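The RGPS constraints described above can be summarized as a small settings check. This is a hypothetical Python model of the page's behavior, not a product API; the class and function names are invented for illustration, and the minimum count of 5 used as the starting value is an arbitrary placeholder rather than a documented default.

```python
from dataclasses import dataclass


@dataclass
class MgpsRunSettings:
    """Hypothetical model of a few fields on the Data Mining Parameters page."""
    highest_dimension: int = 2
    uses_subsets: bool = False
    minimum_count: int = 5  # placeholder value, not a documented default
    include_rgps: bool = False
    include_interactions: bool = False
    minimum_interaction_count: int = 25  # documented default


def apply_rgps_rules(settings: MgpsRunSettings) -> MgpsRunSettings:
    """Enforce the RGPS constraints: dimension 2, no subsets, minimum count of 1."""
    if settings.include_rgps:
        if settings.highest_dimension != 2:
            raise ValueError("RGPS requires a highest dimension of 2")
        if settings.uses_subsets:
            raise ValueError("RGPS is not available for runs that use subsets")
        settings.minimum_count = 1
    return settings


def interaction_estimated(drug_event_reports: int, settings: MgpsRunSettings) -> bool:
    """True if Drug+Drug interaction estimates would be calculated for a drug."""
    return (
        settings.include_rgps
        and settings.include_interactions
        # The count must exceed, not merely equal, the minimum interaction count.
        and drug_event_reports > settings.minimum_interaction_count
    )


run = apply_rgps_rules(MgpsRunSettings(include_rgps=True, include_interactions=True))
print(run.minimum_count, interaction_estimated(25, run), interaction_estimated(26, run))
# 1 False True
```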

EBGM >=

N >=

EB05 >=

PRR >=

Whether to limit the results based on signal scores. For each statistic you select, specify a limit as an integer or floating point number.

Combinations for which the signal score is less than the specified limit are not kept in the results. The limit applies to all combination types that are included in the results table.

Note:

You can also restrict the results using these values on the Specify Criteria page when you are viewing results.
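The filtering rule can be sketched in Python as follows. It assumes each result row is a dictionary keyed by statistic name and that, when you select more than one statistic, a combination must meet every limit to be kept; `apply_score_limits` and the sample rows are hypothetical.

```python
def apply_score_limits(results, limits):
    """Keep only rows whose score meets every selected limit.

    limits maps a statistic name ("EBGM", "N", "EB05", or "PRR") to its minimum value.
    """
    return [
        row for row in results
        if all(row[stat] >= limit for stat, limit in limits.items())
    ]


results = [
    {"combo": "DrugA+Rash", "N": 12, "EBGM": 3.1, "EB05": 2.4},
    {"combo": "DrugB+Rash", "N": 3, "EBGM": 1.2, "EB05": 0.8},
    {"combo": "DrugA+Vertigo", "N": 8, "EBGM": 2.2, "EB05": 1.9},
]
kept = apply_score_limits(results, {"N": 5, "EB05": 2.0})
print([row["combo"] for row in kept])  # ['DrugA+Rash']
```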

Top <n> combinations based on <score>

Whether to retain results only for the highest scoring combinations. If you select this option, do the following:

  1. Specify the number of combinations that you want to include.
  2. In the drop-down list, select EBGM or EB05.

Only the combinations with the highest values of the statistic you selected are included in the results.
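A minimal sketch of the top-n selection, assuming result rows are dictionaries keyed by statistic name. `heapq.nlargest` returns the same rows as sorting in descending order and taking the first n; how the product breaks ties is not specified here. The function name `top_combinations` is hypothetical.

```python
import heapq


def top_combinations(results, n, score="EBGM"):
    """Return the n rows with the highest value of the chosen score (EBGM or EB05)."""
    if score not in ("EBGM", "EB05"):
        raise ValueError("score must be 'EBGM' or 'EB05'")
    return heapq.nlargest(n, results, key=lambda row: row[score])


results = [
    {"combo": "DrugA+Rash", "EBGM": 3.1, "EB05": 2.4},
    {"combo": "DrugB+Rash", "EBGM": 1.2, "EB05": 0.8},
    {"combo": "DrugA+Vertigo", "EBGM": 2.2, "EB05": 1.9},
]
print([row["combo"] for row in top_combinations(results, 2, score="EB05")])
# ['DrugA+Rash', 'DrugA+Vertigo']
```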

Exclude associations if items are of the same type (all drugs, all events)

Whether to exclude combinations of item variables of the same type in the results, for example, Drug+Drug. Oracle recommends that you select this option.

Note:

To supply a default setting for this field, you can set the Exclude associations if items are of the same type (all drugs, all events) user preference.
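The exclusion rule can be sketched as follows: a combination is kept only if its items span more than one type. The `item_type` mapping, the row layout, and the function name `exclude_same_type` are hypothetical.

```python
def exclude_same_type(results, item_type):
    """Drop combinations whose items are all of one type (all drugs or all events)."""
    return [
        row for row in results
        if len({item_type[item] for item in row["items"]}) > 1
    ]


item_type = {"DrugA": "drug", "DrugB": "drug", "Rash": "event", "Vertigo": "event"}
results = [
    {"items": ("DrugA", "Rash")},
    {"items": ("DrugA", "DrugB")},
    {"items": ("Rash", "Vertigo")},
    {"items": ("DrugA", "DrugB", "Rash")},
]
print([row["items"] for row in exclude_same_type(results, item_type)])
# [('DrugA', 'Rash'), ('DrugA', 'DrugB', 'Rash')]
```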

Keep only results that include the following values of <item variable>

Restricts the results such that Oracle Empirica Signal retains only results that include certain values for the item variables. To restrict the results, do one of the following:

  • Use the links to the right of each item variable to select the values that you want to include.
    • Select Available Values—Includes custom terms for the run.
    • Select Saved List—Includes custom terms for the run.
    • Select < hierarchy > Terms —Does not include custom terms for the run. For example, if an item variable uses the MedDRA hierarchy and the Enable adverse event hierarchy browser user preference is enabled, you can select only MedDRA terms.
    • Trade Generic Lookup—For data configurations that include trade name-generic name mapping. You can find the generic name for a drug if you know its trade name, or vice versa.
  • Type the values that you want to include separated by commas.

    Note:

    To prevent spelling and capitalization errors, Oracle recommends that you select values rather than type them.

Computations during the run include all item variable values. However, if you restrict the results, only results that include the values that you selected are retained. For example, if you select DrugA, Rash, and Vertigo as values to include, the displayed results include scores for combinations that meet the following criteria:

  • If there are drugs in the combination, at least one of them is DrugA.
  • If there are events in the combination, at least one of them is Rash or Vertigo.

If you do not restrict the results, results for all values of the item variables are retained.
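The two criteria in the DrugA, Rash, and Vertigo example can be expressed as a single predicate. In this hypothetical Python sketch, `selected` maps each item type to the values you chose; treating a type with no selected values as unrestricted is an assumption, as are the function and variable names.

```python
def matches_selected_values(combo, item_type, selected):
    """True if, for each item type present in the combination, at least one item
    of that type is among the selected values for that type.

    Types with no selected values are treated as unrestricted (an assumption).
    """
    items_by_type = {}
    for item in combo:
        items_by_type.setdefault(item_type[item], []).append(item)
    for typ, items in items_by_type.items():
        wanted = selected.get(typ)
        if wanted and not any(item in wanted for item in items):
            return False
    return True


item_type = {"DrugA": "drug", "DrugB": "drug", "Rash": "event",
             "Vertigo": "event", "Nausea": "event"}
selected = {"drug": {"DrugA"}, "event": {"Rash", "Vertigo"}}
for combo in [("DrugA", "Rash"), ("DrugB", "Rash"), ("DrugA", "DrugB", "Vertigo"), ("Rash", "Nausea")]:
    print(combo, matches_selected_values(combo, item_type, selected))
```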