What is a data mining run?
A data mining run is a three-step process: 1) extraction of data from the source data; 2) application of statistical algorithms to the extracted data; and 3) generation of statistical values.
When these statistical values, or scores, exceed a predetermined threshold, they alert reviewers to a potential safety signal. The two primary types of data mining runs are MGPS runs and logistic regression runs. Within a two-dimensional MGPS run, you can include:
- Information Component (IC) computations
- Regression-Adjusted GPS Algorithm (RGPS) computations
- Proportional Reporting Ratio (PRR) computations
- Reporting Odds Ratio (ROR) computations
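PRR, ROR, and IC all have widely used textbook definitions based on a 2x2 table of report counts for a single drug-event pair. The sketch below assumes those definitions, plus a simplified IC that adds 0.5 to the observed and expected counts as shrinkage. Oracle Empirica Signal's own computations, confidence bounds, and handling of zero counts may differ, and the `TwoByTwo` class and function names are hypothetical. The last function illustrates step 3 of a run: comparing a score with a threshold.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Report counts for one drug-event pair (hypothetical structure)."""
    a: int  # reports with the drug and the event
    b: int  # reports with the drug but without the event
    c: int  # reports with the event but without the drug
    d: int  # reports with neither the drug nor the event

    @property
    def n(self) -> int:
        return self.a + self.b + self.c + self.d


def prr(t: TwoByTwo) -> float:
    """Proportional Reporting Ratio: event proportion with the drug vs. without it."""
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def ror(t: TwoByTwo) -> float:
    """Reporting Odds Ratio: odds of the event with the drug vs. without it."""
    return (t.a * t.d) / (t.b * t.c)


def information_component(t: TwoByTwo) -> float:
    """Simplified IC: log2 of observed over expected count, each shrunk by +0.5."""
    expected = (t.a + t.b) * (t.a + t.c) / t.n
    return math.log2((t.a + 0.5) / (expected + 0.5))


def exceeds_threshold(score: float, threshold: float) -> bool:
    """Flag a potential signal when a score exceeds a predetermined threshold."""
    return score > threshold


if __name__ == "__main__":
    table = TwoByTwo(a=20, b=180, c=80, d=9720)  # illustrative counts only
    print(prr(table), ror(table), information_component(table))
    print(exceeds_threshold(prr(table), threshold=2.0))  # threshold is illustrative
```

A zero in cell `b`, `c`, or `c + d` makes PRR or ROR divide by zero; production tools typically apply a continuity correction or suppress such pairs.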
Data mining runs are batch jobs that Oracle Empirica Signal submits to a queue for processing. Depending on the number of processors on the Oracle Empirica Signal server, several data mining runs can execute at the same time. However, Oracle Empirica Signal queues RGPS jobs so that no two RGPS jobs run simultaneously. The run submitter can request that the run be performed either immediately (as soon as a processor is available) or at a scheduled date and time. When a user submits a data mining run, the run appears on the Data Mining Runs home page and can be published to other users, even before it has started running. A user can also cancel, delete, or re-run a data mining run. Which data mining runs appear on the Data Mining Runs home page, and which tasks you can perform on specific runs, depend on your user permissions and on the publication level of each run.
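The queuing rule described above (several runs in parallel, limited by the number of processors, but never two RGPS jobs at once) can be modeled with a small discrete-event simulation. This is a hypothetical sketch: the `Job` class, the `schedule` function, and the policy of letting later non-RGPS jobs start while an RGPS job waits are assumptions, not a description of how Oracle Empirica Signal implements its queue. Scheduled start times are not modeled.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """A queued data mining run (hypothetical model); names must be unique."""
    name: str
    duration: int
    is_rgps: bool = False


def schedule(jobs: list[Job], processors: int) -> dict[str, tuple[int, int]]:
    """Simulate the queue and return {job name: (start, end)}.

    At most `processors` jobs run at once, and at most one of them is RGPS.
    """
    if processors < 1:
        raise ValueError("processors must be at least 1")
    pending = list(jobs)  # jobs in submission order
    running = []          # min-heap of (end_time, name, is_rgps)
    free, rgps_busy, now = processors, False, 0
    times = {}
    while pending or running:
        # Start, in submission order, every waiting job that can run now.
        still_waiting = []
        for job in pending:
            if free > 0 and not (job.is_rgps and rgps_busy):
                free -= 1
                rgps_busy = rgps_busy or job.is_rgps
                heapq.heappush(running, (now + job.duration, job.name, job.is_rgps))
                times[job.name] = (now, now + job.duration)
            else:
                still_waiting.append(job)  # no processor, or the RGPS slot is taken
        pending = still_waiting
        # Advance the clock to the next completion and release every job ending then.
        now = running[0][0]
        while running and running[0][0] == now:
            _, _, was_rgps = heapq.heappop(running)
            free += 1
            if was_rgps:
                rgps_busy = False
    return times


if __name__ == "__main__":
    queue = [Job("A", 5, is_rgps=True), Job("B", 5, is_rgps=True), Job("C", 3)]
    print(schedule(queue, processors=3))
```

With three processors, the second RGPS job waits for the first to finish even though a processor is free, while the non-RGPS job starts immediately.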
To create a data mining run, you must select a data configuration that determines the source data on which the run is based. You can base multiple runs on the same configuration. You then specify the type of run and other run parameters. When the run is complete, Oracle Empirica Signal can display the results in tabular or graphical format. When reviewing the results, users with appropriate permissions can drill down on counts to view case details.
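The relationship described above, with several runs defined on one data configuration and each run carrying its own run type and parameters, can be sketched as a small data model. Everything here is hypothetical: `DataConfiguration`, `DataMiningRun`, and `pair_counts` are illustrative names, and treating the minimum count as a filter that drops drug-event combinations with fewer cases is a simplifying assumption about how such a parameter behaves.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DataConfiguration:
    """Hypothetical stand-in for a configuration: the source data a run reads."""
    name: str
    cases: tuple  # each entry: (case_id, drug, event)


@dataclass(frozen=True)
class DataMiningRun:
    """Hypothetical run definition: a configuration plus run parameters."""
    name: str
    configuration: DataConfiguration
    run_type: str  # for example "MGPS" or "Logistic Regression"
    minimum_count: int = 1

    def pair_counts(self) -> dict:
        """Count distinct cases per drug-event pair, keeping pairs >= minimum_count."""
        counts = Counter()
        for _case_id, drug, event in set(self.configuration.cases):
            counts[(drug, event)] += 1
        return {pair: n for pair, n in counts.items() if n >= self.minimum_count}


if __name__ == "__main__":
    config = DataConfiguration("Config A", (
        ("1", "DrugA", "EventX"),
        ("2", "DrugA", "EventX"),
        ("3", "DrugA", "EventY"),
    ))
    # Two runs share one configuration but use different parameters.
    run_1 = DataMiningRun("Run 1", config, "MGPS", minimum_count=1)
    run_2 = DataMiningRun("Run 2", config, "MGPS", minimum_count=2)
    print(run_1.pair_counts())
    print(run_2.pair_counts())
```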
A typical workflow is as follows:
- Configurations are created based on the source database.
- Query-based case series are created to select cases of interest.
- Summary and detail reports can be produced for the cases in a case series.
- Report output can also be reviewed in graphical form.
- Data mining runs are created based on the configurations; each run can use different parameters.
- The results reviewer selects a run.
- The results reviewer narrows down the results by specifying drugs and events of interest, such as one drug and all events related to a body system.
- The results reviewer can view the results in a table or in a graph.
- The results reviewer can drill down to see a list of the cases involved.
- The results reviewer can drill down further to the details of individual cases, or review the cases as a case series in reports and graphs.
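The review steps above (narrowing results to one drug and one body system, then drilling down from a count to the cases behind it) can be sketched as follows. The `ResultRow` structure, the `narrow` and `drill_down` functions, and the sample values are hypothetical; in Oracle Empirica Signal these steps are performed through the user interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultRow:
    """One row of hypothetical run results for a drug-event combination."""
    drug: str
    event: str
    body_system: str  # for example a MedDRA System Organ Class
    score: float
    case_ids: tuple = ()

    @property
    def count(self) -> int:
        """The count a reviewer drills down on: cases behind this row."""
        return len(self.case_ids)


def narrow(rows, drug, body_system):
    """Keep rows for one drug and all events in one body system, highest score first."""
    selected = [r for r in rows if r.drug == drug and r.body_system == body_system]
    return sorted(selected, key=lambda r: r.score, reverse=True)


def drill_down(row, case_details):
    """Follow a row's count to the details of each contributing case."""
    return [case_details[case_id] for case_id in row.case_ids]


if __name__ == "__main__":
    rows = [
        ResultRow("DrugA", "EventX", "Vascular disorders", 2.5, ("1", "2")),
        ResultRow("DrugA", "EventY", "Vascular disorders", 4.0, ("3",)),
        ResultRow("DrugB", "EventX", "Vascular disorders", 9.0, ("4",)),
    ]
    details = {"1": "Case 1 detail", "2": "Case 2 detail", "3": "Case 3 detail"}
    for row in narrow(rows, "DrugA", "Vascular disorders"):
        print(row.event, row.score, row.count, drill_down(row, details))
```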
Figure: Example workflow in which Configuration 1 (subsetting by year) and Configuration 2 (subsetting by gender) each support a query-based case series with reports; runs subset by year with minimum counts of 1 and 5 are reviewed on the Data mining results tab for Oxycodone and vascular disorders, and the results drill down to a list of cases and to the case detail for case 23634.