9.6.1 Fitting and Sampling from a Generative Model

This section describes how a generative model is fit to the transaction aggregates and how samples are drawn from it.

To get realistic estimates of transaction aggregates from a sample of customers, a multivariate normal distribution is fit to the transaction aggregates and then sampled. This model was chosen for the following reasons.
  1. The model has to be flexible enough to capture the dependencies between the various transaction aggregates. For example, Amounts and Counts tend to be correlated; similarly, Credits and Debits could be correlated. For this reason, a multivariate model was chosen.
  2. Assuming the customers in a segment behave homogeneously, we can assume their transactions are drawn from the same distribution. Because the transaction aggregates are sums of these transactions, the central limit theorem suggests that the aggregates are approximately normal.

Even so, the marginal distributions of certain aggregates may not be normal. For this reason, the marginals are transformed using Box-Cox transforms to bring them closer to normality.

Once the model is fit and samples are drawn from it, an inverse Box-Cox transformation is applied to map the samples back to the original scale.
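
The fit, sample, and back-transform steps described in this section can be sketched as follows. This is a minimal illustration and not necessarily the implementation used here: the function names `fit_generative_model` and `sample_aggregates` are hypothetical, the input is assumed to be an array with one row per customer and one column per aggregate (at least two columns), and a separate Box-Cox parameter is assumed to be estimated for each column by maximum likelihood. Box-Cox requires strictly positive values, so any zero or negative aggregates would need to be shifted or handled separately.

```python
import numpy as np
from scipy import stats
from scipy.special import inv_boxcox


def fit_generative_model(aggregates):
    """Fit per-column Box-Cox transforms, then a multivariate normal.

    aggregates: array of shape (n_customers, n_aggregates), strictly positive.
    Returns the mean vector, covariance matrix, and Box-Cox lambdas.
    """
    aggregates = np.asarray(aggregates, dtype=float)
    if np.any(aggregates <= 0):
        raise ValueError("Box-Cox requires strictly positive values")

    transformed = np.empty_like(aggregates)
    lambdas = np.empty(aggregates.shape[1])
    for j in range(aggregates.shape[1]):
        # With lmbda unspecified, scipy estimates lambda by maximum likelihood.
        transformed[:, j], lambdas[j] = stats.boxcox(aggregates[:, j])

    # Multivariate normal parameters, estimated on the transformed scale.
    mean = transformed.mean(axis=0)
    cov = np.cov(transformed, rowvar=False)
    return mean, cov, lambdas


def sample_aggregates(mean, cov, lambdas, n_samples, rng=None):
    """Draw samples on the transformed scale and map them back."""
    rng = np.random.default_rng(rng)
    z = rng.multivariate_normal(mean, cov, size=n_samples)

    samples = np.empty_like(z)
    for j, lam in enumerate(lambdas):
        # Draws outside the range of the Box-Cox transform come back as NaN.
        samples[:, j] = inv_boxcox(z[:, j], lam)
    return samples
```

Because the normal distribution is unbounded while the Box-Cox transform has a bounded range whenever lambda is nonzero, a small fraction of draws may fall outside that range and come back as NaN. In this sketch such draws would need to be discarded or redrawn by the caller.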