9.6 Simulating Aggregates
One of the key considerations in designing this approach is to keep data requirements low.
A customer segment at a mid-sized financial institution can contain millions of customers. Although we could derive accurate threshold estimates if the entire dataset were available, using it may impose prohibitive costs in storage, compute, and processing time.
To get realistic estimates of transaction aggregates from a sample of customers, we implement the following algorithm.
- Obtain monthly transaction aggregates from a sample of focal entities (1% of the customer segment or 25,000 entities, whichever is larger). If the scenario being tuned is customer focused, account focused, or external entity focused, the aggregates should be at the customer level, account level, or external entity level, respectively. Use only the transaction aggregates relevant to the scenario being tuned.
- Use an outlier detection technique (IQR or percentile based) to trim outliers that may skew estimates.
- Linearly scale the monthly aggregates to obtain aggregates for the scenario's lookback period.
- Fit a generative model to this data.
- Sample new observations from this model that approximately capture the real behavior of the segment.
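
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used here: the text does not name the generative model, so a multivariate normal fitted to log-transformed aggregates stands in for it. The function names, the 1.5 x IQR fence, the cap of the sample at the segment size, the 30-day month used for scaling, and the synthetic input data are likewise hypothetical choices made only for the example.

```python
import numpy as np


def sample_size(segment_size, pct=1, floor=25_000):
    """Entities to sample: pct% of the segment or `floor`, whichever is larger.

    Capping at the segment size is an assumption for small segments.
    """
    return min(segment_size, max(segment_size * pct // 100, floor))


def trim_iqr(x, k=1.5):
    """Drop rows where any column lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(x, [25, 75], axis=0)
    iqr = q3 - q1
    keep = np.all((x >= q1 - k * iqr) & (x <= q3 + k * iqr), axis=1)
    return x[keep]


def scale_to_lookback(monthly, lookback_days, days_per_month=30):
    """Linearly rescale monthly aggregates to the lookback window.

    Expressing the lookback in days with a 30-day month is an assumption.
    """
    return monthly * (lookback_days / days_per_month)


def fit_log_normal(x):
    """Stand-in generative model: multivariate normal on log1p(aggregates)."""
    logx = np.log1p(x)
    return logx.mean(axis=0), np.atleast_2d(np.cov(logx, rowvar=False))


def simulate(mean, cov, n, rng):
    """Draw n synthetic entities and map them back to the original scale."""
    draws = rng.multivariate_normal(mean, cov, size=n)
    return np.clip(np.expm1(draws), 0, None)


rng = np.random.default_rng(0)

# Synthetic stand-in for sampled monthly aggregates: one row per focal
# entity, columns = (total amount, transaction count).
monthly = rng.lognormal(mean=[8.0, 2.0], sigma=[1.0, 0.5], size=(5_000, 2))

trimmed = trim_iqr(monthly)                         # outlier trimming
scaled = scale_to_lookback(trimmed, lookback_days=90)  # 90-day lookback
mean, cov = fit_log_normal(scaled)                  # fit generative model
simulated = simulate(mean, cov, n=10_000, rng=rng)  # sample new observations
```

In this sketch `simulated` has one row per synthetic entity and the same columns as the input. A percentile-based trim or a richer generative model could be substituted without changing the overall flow.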