Define subset values

When you have selected a subset variable for an MGPS run, you must specify subsets, with each subset representing one or more values of the subset variable. For example, if you use Received Year as the subset variable, you can create one subset for each year except 1998, 1999, and 2000, which you group together into one subset. Oracle Empirica Signal generates a set of data mining results for each subset that you specify.
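As an illustration only, the grouping described above can be sketched in Python. This is not how Oracle Empirica Signal stores or processes subsets; the `subsets` mapping, the `reports` sample data, and the `assign_to_subsets` helper are all hypothetical names introduced for this sketch.

```python
from collections import defaultdict

# Hypothetical subset definitions: each label maps to the Received Year
# values that the subset represents. 1998-2000 are grouped into one subset.
subsets = {
    "1998-2000": {1998, 1999, 2000},
    "2001": {2001},
    "2002": {2002},
}

# Hypothetical sample data: (report_id, received_year) pairs.
reports = [(1, 1998), (2, 2000), (3, 2001), (4, 2002), (5, 2002)]


def assign_to_subsets(reports, subsets):
    """Group report IDs under the subset whose values include their year."""
    grouped = defaultdict(list)
    for report_id, year in reports:
        for label, years in subsets.items():
            if year in years:
                grouped[label].append(report_id)
    return dict(grouped)


grouped = assign_to_subsets(reports, subsets)
for label, ids in grouped.items():
    print(label, ids)
```

In this sketch, each key of `grouped` corresponds to one subset, and therefore to one set of data mining results.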

The source of subset values is either the source database or values that have been defined by data transformations in the data configuration. For example, if the source database contains report received dates, the creator of a configuration can transform the received dates into years, half-years, or quarter-years.
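The kind of transformation mentioned above can be sketched as follows. This is a minimal illustration, not the data transformation syntax of a data configuration; the function names and the label formats (such as `2006 H2` and `2006 Q3`) are assumptions made for this example.

```python
from datetime import date


def to_year(received):
    """Return the year of a received date as a label, e.g. '2006'."""
    return str(received.year)


def to_half_year(received):
    """Return a half-year label, e.g. '2006 H2' (format is an assumption)."""
    half = 1 if received.month <= 6 else 2
    return f"{received.year} H{half}"


def to_quarter_year(received):
    """Return a quarter-year label, e.g. '2006 Q3' (format is an assumption)."""
    quarter = (received.month - 1) // 3 + 1
    return f"{received.year} Q{quarter}"


received = date(2006, 8, 15)
print(to_year(received), to_half_year(received), to_quarter_year(received))
```

Whichever transformation the configuration defines determines the values that appear as available subset values.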

If a value for the subset variable is missing from the source data, the value of your Replace Missing Values with user preference appears in its place.

Note:

When specifying subsets, keep in mind that there is an iteration of MGPS for each subset. The more subsets you specify, the longer the MGPS run takes.
  1. In the left navigation pane, hover over the Data Analysis icon, then click Data Mining Runs.
  2. At the top left of the Data Mining Runs home page, click Create Run.
  3. On the Create Data Mining Run page, select the type of run to create and click Next.
  4. On the Select "As Of" Date page, choose the latest date and time or select Other to choose a date and time. Then click Next.
  5. On the Select Variables page, select at least one item variable by clicking its name, and, optionally, select Stratification variables.
  6. From the Available values list, select the values to use for subsetting, then use the arrow buttons to move them to the Subset values list:
    • Double Forward arrow button: Use to create a single subset containing all values that are selected in the Available values list, for example, one subset for 1998, 1999, and 2000.
    • Forward arrow button: Use to create a separate subset for each value that is selected in the Available values list, for example, one subset for 2007 and another for 2008.

  7. To include data from all previous subset categories in the results for each subset, select Subsets will be cumulative. Typically, a cumulative subset variable represents an ordinal value, such as report year. A non-cumulative subset variable may represent a categorical value, such as age group.
  8. If you selected Subsets will be cumulative, click one of the following under Subsets will be ordered:
    • As listed: Generates data mining results for subsets in the order in which the subsets are listed.
    • As reverse of listed: Generates data mining results for subsets in the reverse of the listed order.

    The sort order is important for cumulative subsetting. The computations to determine results for each subset include data from all previous subsets. (The sort order also affects the order of columns in map graphs.) For example, the listed order of subsets is:

    • 2004
    • 2005
    • 2006

    If you use cumulative subsets (as listed), there are subsets for:

    • 2004
    • 2004 and 2005 combined
    • 2004, 2005, and 2006 combined

    If you reverse the listed order, there are subsets for:

    • 2006
    • 2005 and 2006 combined
    • 2004, 2005, and 2006 combined

    See Define subset labels for more examples.

  9. Click Next and continue with Define subset labels.
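
The effect of the sort order on cumulative subsets, as described in step 8, can be sketched in a few lines of Python. This is an illustration of the 2004 to 2006 example only, not the application's implementation; the `cumulative_subsets` function and its `reverse` parameter are hypothetical names.

```python
def cumulative_subsets(listed, reverse=False):
    """Return the values that contribute to each cumulative subset, in run order.

    listed  -- subset values in the order shown in the Subset values list
    reverse -- True models "As reverse of listed"; False models "As listed"
    """
    order = list(reversed(listed)) if reverse else list(listed)
    # Each subset includes its own value plus every value that precedes it.
    return [sorted(order[: i + 1]) for i in range(len(order))]


as_listed = cumulative_subsets([2004, 2005, 2006])
as_reversed = cumulative_subsets([2004, 2005, 2006], reverse=True)

for combined in as_listed:
    print("As listed:", combined)
for combined in as_reversed:
    print("As reverse of listed:", combined)
```

Both orders end with a subset that combines all three years, but the intermediate subsets differ, which is why the choice matters for cumulative runs.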