Submitting a data mining run

In this section

MGPS run parameters
Logistic regression run parameters

After naming a data mining run that you are creating, you confirm run parameters and submit the run.

1. On the Confirm Run Parameters page, review the parameters chosen for your run to make sure that they are correct. Different parameters display for MGPS runs and LR runs.

2. To change any parameters, click Back until you are on the appropriate page and make the necessary changes. Then click Next until you are back on the Confirm Run Parameters page.

3. When you are satisfied with the run parameters, click Submit.

The application notes that the run has been submitted.

4. Click Continue.

The run status appears on the Data Mining Runs page. To monitor the progress of a run, you can view jobs for the run.

All data mining runs are batch jobs, which continue to run on the Empirica Signal server even if you log out of the application. The next time you log in to the Empirica Signal application, you can check on the Data Mining Runs tab to determine if your run has completed. If you selected the Email me when complete run option, the application notifies you by email when your run completes.

MGPS run parameters

Field	Description
Type	MGPS.
Name	Name supplied for the run.
Description	Description supplied for the run.
Project	Name of the project to which the run is assigned.
Configuration	Name of the data configuration used for the run.
Configuration description	Description of the data configuration used for the run.
As of date	As Of date for the run, if the run is for timestamped data.
Database restriction	Database restriction, if any, associated with the run.
Item variables	Names of the item variables to be used in the run.
Drug Hierarchy	Name and version of the drug hierarchy used by this run if the data configuration specifies a drug hierarchy.
Event Hierarchy	Name and version of the event hierarchy used by this run if the data configuration specifies an event hierarchy.
Custom terms	Custom terms, if any, specified for the run.
Stratification variables	Stratification variables, if any, to be used for the run.
Subsets	Subset variable, if any, as well as whether the subsets are cumulative, the order of subsets, and the subset labels and values.
Highest dimension	The maximum number of ways in which items are combined. See Specifying data mining parameters.
Minimum count	Minimum number of cases in which a combination of items must occur in order for the combination to be included in the run's MGPS computations. See Specifying data mining parameters.
Calculate PRR	Whether the run includes PRR computations.
Calculate ROR	Whether the run includes ROR computations.
Base counts on cases	For a run that includes PRR and ROR computations, indicates whether counts are based on cases rather than drug-event combinations.
Use "all drugs" comparator	For a run that includes PRR computations, indicates whether the drug of interest is included in the comparator set.
Apply Yates correction	For a run that includes PRR computations, indicates whether the Yates correction is applied.
Stratify PRR and ROR	For a run that includes PRR and ROR computations, indicates whether the PRR or ROR computations are stratified.
Include IC	Whether the run includes Information Component computations.
Include RGPS	Whether the run includes RGPS computations.
Calculate RGPS interactions	Whether the run includes Drug+Drug RGPS interaction scores.
Minimum interaction count	Minimum number of times that a drug must appear in Drug+Event reports for the application to calculate RGPS interaction estimates for the drug.
Fill in hierarchy values	Whether the run option to use hierarchy information was checked.
Limit results to	Limitations, if any, on which results will be kept based on statistical thresholds or specified values of item variables. See Specifying data mining parameters.
Exclude single itemtypes	Whether the run excludes combinations of items of the same type. See Specifying data mining parameters.
Fit separate distributions	Indicates the run's setting for the advanced parameter to fit separate distributions for the different item type combinations.
Save intermediate files	Whether intermediate processing files for the run are saved. See Defining data mining run options.
Source database	Information about the source data (from the source description table).
Scheduled to run	Date and time at which the run is scheduled to be run. See Defining data mining run options.
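
To make the PRR and ROR fields above more concrete, the following is a minimal sketch of the textbook calculations from a 2x2 table of counts. It is not the application's implementation. The function names, the cell labels a, b, c, and d, and the way the "all drugs" comparator and Yates correction are modeled are assumptions for illustration only.

```python
# Illustrative only: textbook disproportionality measures from a 2x2 table.
# Cell labels (assumed for this sketch):
#   a = drug of interest, event of interest
#   b = drug of interest, other events
#   c = other drugs, event of interest
#   d = other drugs, other events

def prr(a, b, c, d, all_drugs_comparator=False):
    """Proportional reporting ratio.

    With all_drugs_comparator=True, the comparator set includes the drug
    of interest (assumed interpretation of the 'all drugs' option).
    """
    rate_drug = a / (a + b)
    if all_drugs_comparator:
        rate_comparator = (a + c) / (a + b + c + d)
    else:
        rate_comparator = c / (c + d)
    return rate_drug / rate_comparator


def ror(a, b, c, d):
    """Reporting odds ratio."""
    return (a * d) / (b * c)


def chi_square(a, b, c, d, yates=False):
    """Chi-square statistic for the 2x2 table, optionally Yates-corrected."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff -= n / 2  # continuity correction
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


if __name__ == "__main__":
    counts = dict(a=20, b=80, c=100, d=900)  # made-up example counts
    print(prr(**counts))
    print(prr(**counts, all_drugs_comparator=True))
    print(ror(**counts))
    print(chi_square(**counts, yates=True))
```

In this sketch, including the drug of interest in the comparator pulls the PRR toward 1, and the Yates correction lowers the chi-square statistic. Both effects are most visible when counts are small.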

Logistic regression run parameters

Field	Description
Type	LR. For runs completed prior to the installation of Empirica Signal version 7.1, LR (Legacy) appears.
Name	Name supplied for the run.
Description	Description supplied for the run.
Project	The name of the project to which the run is assigned.
LR type	Indicates the algorithm type selected for the run: standard or extended. See Logistic regression computations. For runs completed prior to the installation of Empirica Signal version 7.1, an Extended logistic regression field appears instead, with a Yes or No value.
Configuration	Name of the data configuration used for the run.
Configuration description	Description of the configuration used for the run.
As of date	As Of date for the run, if the run is for timestamped data.
Database restriction	Database restriction, if any, associated with the run.
Item variables	Names of the run's selected event and drug variables.
Custom terms	Custom terms, if any, specified for the run.
Covariates	Variables, if any, selected as covariates for the run.
Drug values	Explicitly specified values of the drug variable included in the run, even if they do not meet the minimum number of times a drug must occur in combination with specified events.
Event values	Values of the event variable used in the run.
Minimum count	Minimum number of cases in which a drug must occur in combination with specified events in order to be included in the run (except for drugs specifically selected as Drug values). See Selecting drugs for logistic regression.
Number of events	Number of event values specified.
Save intermediate files	Indicates whether intermediate processing files for the run are saved. See Defining data mining run options.
Run interactions	Indicates whether the run calculates statistics for two predictors (such as Drug+Drug or Drug+Covariate) and a response.
Save coefficients	Indicates whether the lr_coefficients.log file produced for the run includes the coefficient and standard deviation values calculated for the run.
Source database	Information about the source data (from the source description table).
Scheduled to run	Date and time at which the run is scheduled to be run. See Defining data mining run options.
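
The interaction between the Minimum count and Drug values fields can be sketched as follows. This is a hypothetical illustration of the rule described in the table, not the application's code. The function name, parameter names, and sample counts are all assumptions.

```python
# Illustrative only: which drugs enter a logistic regression run, given the
# rule that explicitly listed Drug values bypass the Minimum count threshold.

def select_drugs(drug_event_counts, minimum_count, drug_values=()):
    """Return the sorted names of drugs included in the run.

    drug_event_counts: mapping of drug name to the number of cases in which
        the drug occurs in combination with the specified events.
    minimum_count: threshold a drug must meet to be included automatically.
    drug_values: drugs explicitly selected; included regardless of count.
    """
    selected = {
        drug for drug, count in drug_event_counts.items()
        if count >= minimum_count
    }
    selected.update(drug_values)  # explicit selections always included
    return sorted(selected)


if __name__ == "__main__":
    counts = {"DrugA": 150, "DrugB": 40, "DrugC": 5}  # made-up counts
    print(select_drugs(counts, minimum_count=100, drug_values=["DrugC"]))
```

Here DrugA is included because it meets the threshold, DrugC is included only because it was explicitly selected, and DrugB is excluded.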