Step 4 for MGPS runs: Specify data mining parameters
After you have selected item variables for data mining, the Data Mining Parameters page appears.
- In the left navigation pane, hover over the Data Analysis icon, then click Data Mining Runs.
- At the top left of the Data Mining Runs home page, click Create Run.
- On the Create Data Mining Run page, select the type of run to create and click Next.
- (For timestamped data) On the Select "As Of" Date page, choose the latest date and time or select Other to choose a date and time. Then click Next.
- On the Select Variables page, select an item variable and, optionally, stratification variables. Define custom terms and subsets if needed, then click Next.
- Specify the MGPS data mining parameters shown in the table below.
- Click Next to enter the data mining run options.
Parameter descriptions
Field | Description |
---|---|
Specify highest dimension | Maximum number of ways that the item variables you selected are combined during the run. For more information, see Dimensions and patterns. |
Specify minimum count | Number of times an item variable combination must occur in the data to be included in the MGPS computations. For example, if you specify a minimum count of 5, only combinations that occur in at least 5 cases appear. The minimum count reduces computation time and space requirements: the higher the minimum count, the shorter the computation time. When you specify the minimum count, consider whether you are using a subset variable. For example, suppose that the subset categories are report years 2005, 2006, and 2007, and that DrugA+Rash occurs in 4 cases in 2005, 3 new cases in 2006, and 5 new cases in 2007. If the minimum count is 5, only the 2007 results show any DrugA+Rash combinations. With cumulative subsetting, both 2006 (7 cumulative cases) and 2007 (12 cumulative cases) show DrugA+Rash combinations. If you specify a minimum count of less than 5, or the run uses cumulative subsetting, a warning message appears, because in these situations the results table can take a very long time to generate and can be very large. Note: To supply a default setting for this field, you can set the Minimum Count user preference. |
Include drug/event hierarchies' primary paths in run results | Whether to include hierarchy levels in the results. For example, if the data mining run is for drug and Preferred Term (PT) combinations and the MedDRA Version 12.0 hierarchy is being used, the results include the HLT, HLGT, and SOC that represent the PT's primary path in MedDRA Version 12.0. This lets you easily view results for PTs at a higher level of the hierarchy. This option is available only if the data configuration for the run is set up to use a drug or event hierarchy. |
Fit separate distributions for the different item type combinations | Whether to take item types into account during MGPS processing. Observed Count (N), Expected Count (E), and Relative Reporting Ratio (RR) values are not affected, but EBGM and other statistical values are affected. The statistical values for each combination are based on the counts and statistical computations of other occurrences of the same type of combination. For example, in computing EBGM values for Drug+Event combinations, only the distribution of RR values for Drug+Event combinations is considered. The distributions of RR values for Drug+Drug or Event+Event combinations are not considered. Note: This parameter does not affect the computation of PRR or ROR. |
Include calculations for PRR and ROR | Whether to include PRR and ROR computations in the run. If you select this option, additional options appear. For more information, see PRR computations and ROR computations. |
Include calculations for Information Component | Whether to include Information Component computations in the run. For more information, see Information Component computations. |
Include calculations for RGPS | Whether to include RGPS computations in the run (available only when the highest dimension is 2 and the run does not include subsets). If you select this option, the Specify minimum count field is set to 1 and the Restrictions of results default to N>=1. The following additional option appears: Include Interactions: whether to include calculations for Drug+Drug interactions using RGPS. If you select Include Interactions, set the Specify minimum interaction count field, which has a default value of 25. Interaction estimates are calculated for a drug if the number of Drug+Event reports exceeds the specified minimum interaction count. For more information, see RGPS computations. |
EBGM >=, N >=, EB05 >=, PRR >= | Whether to limit the results based on signal scores. For each statistic you select, specify a limit as an integer or floating point number. Combinations whose signal score is less than the specified limit are not kept in the results. The limit applies to all combination types that are included in the results table. Note: You can also restrict the results using these values on the Specify Criteria page when you are viewing results. |
Top <n> combinations based on <score> | Whether to retain results only for the highest scoring combinations. If you select this option, specify the number of combinations (<n>) and the statistic to rank by (<score>). Only the top scoring combinations for the statistic you selected are included in the results. |
Exclude associations if items are of the same type (all drugs, all events) | Whether to exclude combinations of item variables of the same type from the results, for example, Drug+Drug. Oracle recommends that you select this option. Note: To supply a default setting for this field, you can set the Exclude associations if items are of the same type (all drugs, all events) user preference. |
Keep only results that include the following values of <item variable> | Whether to restrict the results so that Oracle Empirica Signal retains only results that include certain values of the item variables. To restrict the results, specify the values to keep. If you do not restrict the results, results for all values of the item variables are retained. |
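The subset example in the Specify minimum count row can be checked with a short sketch. This is only an illustration of the arithmetic, not how Oracle Empirica Signal implements it; the function name `subsets_meeting_minimum` and the `new_cases` dictionary are hypothetical, and the counts are the ones given in the table.

```python
from itertools import accumulate

# New DrugA+Rash cases per report year, taken from the example in the table.
new_cases = {2005: 4, 2006: 3, 2007: 5}


def subsets_meeting_minimum(counts, minimum_count, cumulative=False):
    """Return the subset labels whose case count reaches minimum_count.

    Hypothetical helper for illustration only.
    """
    labels = sorted(counts)
    values = [counts[label] for label in labels]
    if cumulative:
        # With cumulative subsetting, each subset also contains the cases
        # of all earlier subsets: 4, 4+3, 4+3+5.
        values = list(accumulate(values))
    return [label for label, n in zip(labels, values) if n >= minimum_count]


print(subsets_meeting_minimum(new_cases, 5))                   # [2007]
print(subsets_meeting_minimum(new_cases, 5, cumulative=True))  # [2006, 2007]
```

Without cumulative subsetting only 2007 reaches the minimum count of 5. With cumulative subsetting the running totals are 4, 7, and 12, so 2006 and 2007 both qualify.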
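The two result restrictions described in the table (score limits such as EBGM >= and N >=, and Top <n> combinations based on <score>) behave like a filter and a ranked cut-off. The following is a minimal sketch under that reading; the rows, score values, and function names are invented for illustration and do not come from the product.

```python
# Hypothetical result rows; all scores are made-up illustration values.
results = [
    {"combo": "DrugA+Rash", "N": 12, "EBGM": 3.1, "EB05": 2.2},
    {"combo": "DrugA+Nausea", "N": 40, "EBGM": 1.2, "EB05": 1.0},
    {"combo": "DrugB+Rash", "N": 2, "EBGM": 4.5, "EB05": 0.9},
    {"combo": "DrugB+Headache", "N": 7, "EBGM": 2.4, "EB05": 1.6},
]


def apply_limits(rows, limits):
    """Keep rows whose score is at least the limit for every selected statistic."""
    return [
        row for row in rows
        if all(row[stat] >= limit for stat, limit in limits.items())
    ]


def top_combinations(rows, n, score):
    """Keep the n rows with the highest value of the chosen score."""
    return sorted(rows, key=lambda row: row[score], reverse=True)[:n]


kept = apply_limits(results, {"EBGM": 2.0, "N": 3})
print([row["combo"] for row in kept])   # ['DrugA+Rash', 'DrugB+Headache']

best = top_combinations(results, 2, "EBGM")
print([row["combo"] for row in best])   # ['DrugB+Rash', 'DrugA+Rash']
```

Note that DrugB+Rash has the highest EBGM but is dropped by the N >= 3 limit, which shows why the two kinds of restriction can give different result sets.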
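Two of the options above depend on the item types in a combination: fitting separate distributions groups combinations by type (Drug+Event, Drug+Drug, Event+Event), and the same-type exclusion drops combinations whose items all share one type. The sketch below illustrates only that grouping and filtering. The data layout, the RR values, and the helper names are assumptions, and the EBGM computation itself is not shown.

```python
from collections import defaultdict

# Each row is (items, rr); items is a tuple of (item_type, item_value) pairs.
# The RR values are made up for illustration.
combinations = [
    ((("Drug", "DrugA"), ("Event", "Rash")), 2.5),
    ((("Drug", "DrugA"), ("Drug", "DrugB")), 1.8),
    ((("Event", "Rash"), ("Event", "Nausea")), 1.1),
    ((("Drug", "DrugB"), ("Event", "Rash")), 0.7),
]


def type_signature(items):
    """Return a label such as 'Drug+Event' describing the item types."""
    return "+".join(sorted(item_type for item_type, _ in items))


def group_rr_by_type(rows):
    """Collect RR values separately for each type of combination."""
    groups = defaultdict(list)
    for items, rr in rows:
        groups[type_signature(items)].append(rr)
    return dict(groups)


def exclude_same_type(rows):
    """Drop combinations whose items are all of one type (all drugs, all events)."""
    return [
        (items, rr) for items, rr in rows
        if len({item_type for item_type, _ in items}) > 1
    ]


print(group_rr_by_type(combinations))
print([type_signature(items) for items, _ in exclude_same_type(combinations)])
```

With separate distributions, a statistic for a Drug+Event combination would draw only on the `Drug+Event` group, and with the exclusion selected only the two Drug+Event rows remain.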