Defining subset values

After you select a subset variable for an MGPS run, you must specify subsets. Each subset represents one or more values of the subset variable. For example, if you use Received Year as the subset variable, you can create one subset for each year, except for 1998, 1999, and 2000, which you group together into a single subset. The application generates a set of data mining results for each subset that you specify.

The source of subset values is either the source database or values that have been defined by data transformations in the data configuration. For example, if the source database contains report received dates, the creator of a configuration can transform the received dates into years, half-years, or quarter-years.
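As an illustration of this kind of transformation, the following Python sketch derives a year, half-year, and quarter-year value from a received date. The function names and the label formats ("2005 H2", "2005 Q3") are assumptions made for this example only; they are not the application's own transformation syntax or output.

```python
from datetime import date


def to_year(received: date) -> str:
    """Return the calendar year as a subset value, e.g. '2005'."""
    return str(received.year)


def to_half_year(received: date) -> str:
    """Return a half-year value; January to June is H1, July to December is H2."""
    half = 1 if received.month <= 6 else 2
    return f"{received.year} H{half}"


def to_quarter(received: date) -> str:
    """Return a quarter-year value; months 1-3 are Q1, 4-6 are Q2, and so on."""
    quarter = (received.month - 1) // 3 + 1
    return f"{received.year} Q{quarter}"


received = date(2005, 8, 17)
print(to_year(received), to_half_year(received), to_quarter(received))
```

Coarser values such as years produce fewer subsets than quarter-years over the same date range.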

If a value for the subset variable is missing from the source data, the value of your user preference, Replace Missing Values with, appears in place of a value.
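The substitution can be pictured with a short Python sketch. The function name, the default placeholder text "(missing)", and the rule that treats None and blank strings as missing are all assumptions for illustration; the application determines what counts as missing and uses whatever you set in the Replace Missing Values with preference.

```python
def displayed_subset_value(raw, replace_missing_with="(missing)"):
    """Return the value shown for a subset variable.

    Assumption: None and blank strings count as missing. The placeholder
    stands in for the user's Replace Missing Values with preference.
    """
    if raw is None or str(raw).strip() == "":
        return replace_missing_with
    return str(raw)


print(displayed_subset_value("2004"))
print(displayed_subset_value(None))
```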

Note: When specifying subsets, keep in mind that MGPS runs one iteration for each subset. The more subsets you specify, the longer the MGPS run takes.

1. Select a subset variable.

The Define Subset Values page appears. The Available Values list shows values of the variable in the source data.

2. Use the arrow buttons to move values from the Available Values list to the Subset Values list:


Button    Use to

=>        Create a single subset containing all values that are selected in the Available Values list. For example, one subset for 1998, 1999, and 2000.

->        Create a separate subset for each value that is selected in the Available Values list. For example, one subset for 2007 and another for 2008.

3. To include data from all previous subset categories in the results for each subset, check Subsets will be cumulative. Typically, a cumulative subset variable represents an ordinal value, such as report year. A non-cumulative subset variable may represent a categorical value, such as age group.

4. If you checked Subsets will be cumulative, click one of the following under Subsets will be ordered:

  1. As listed: Generates data mining results for subsets in the order in which the subsets are listed.
  2. As reverse of listed: Generates data mining results for subsets in the reverse of the listed order.

The sort order is important for cumulative subsetting because the computations that determine results for each subset include data from all previous subsets. (The sort order also affects the order of columns in map graphs.) For example, suppose the listed order of subsets is 2004, 2005, 2006.

If you use cumulative subsets (as listed), there are subsets for:

  1. 2004
  2. 2004 and 2005 combined
  3. 2004, 2005, and 2006 combined

If you reverse the listed order, there are subsets for:

  1. 2006
  2. 2006 and 2005 combined
  3. 2006, 2005, and 2004 combined

See Defining subset labels for more examples.

5. Click Next.

6. Continue with Defining subset labels.
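The subset choices in this procedure can be sketched in Python to show how grouped, separate, and cumulative subsets differ. This is a minimal illustration only: the function names are hypothetical and do not correspond to any application API, and the reversed case reflects one reading of how accumulation follows the reversed order.

```python
from typing import List

Subset = List[str]


def add_grouped(subsets: List[Subset], selected: List[str]) -> None:
    """Sketch of the => button: one subset containing all selected values."""
    subsets.append(list(selected))


def add_separate(subsets: List[Subset], selected: List[str]) -> None:
    """Sketch of the -> button: one subset for each selected value."""
    for value in selected:
        subsets.append([value])


def cumulative(subsets: List[Subset], reverse: bool = False) -> List[Subset]:
    """Expand subsets so that each one also includes all previous subsets.

    With reverse=True, accumulation follows the reverse of the listed order.
    """
    ordered = list(reversed(subsets)) if reverse else list(subsets)
    result: List[Subset] = []
    running: Subset = []
    for subset in ordered:
        running = running + subset  # carry forward every earlier subset
        result.append(list(running))
    return result


# Build subsets the way the two buttons would.
subsets: List[Subset] = []
add_grouped(subsets, ["1998", "1999", "2000"])
add_separate(subsets, ["2007", "2008"])
print(subsets)

# Cumulative subsetting for a listed order of 2004, 2005, 2006.
years = [["2004"], ["2005"], ["2006"]]
print(cumulative(years))
print(cumulative(years, reverse=True))
```

In the sketch, the number of entries in the resulting list corresponds to the number of MGPS iterations, which is why more subsets mean a longer run.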