5.1.1 Adding data to the sandbox
Introduction
After creating the sandbox workspace, users must populate it with relevant data from the production schema. This data population step ensures that the sandbox contains representative and consistent data needed for configuring and executing stress testing projects.
The data added typically includes source data from the atomic schema such as instrument-level exposures, reference dimensions, time series variables, lookup tables, and any other information required to compute metrics, run models, and analyze portfolios. Populating the sandbox with accurate and complete data is essential for meaningful simulations and valid output results.
Although sandbox creation includes a default data ingestion pipeline, users must still initiate the population process themselves. When doing so, they can choose whether to overwrite existing data (truncate and insert) or append to it. They can also apply global or table-level filters, specify SQL conditions (for example, based on MISDATE), tune load performance using JDBC properties, and set rejection thresholds.
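The difference between the two load modes, combined with a MISDATE condition, can be illustrated with a minimal sketch. This is not the product's own interface: the function populate_table, the table names prod_exposures and sbx_exposures, and the use of an in-memory SQLite database are all assumptions chosen so that the example runs on its own.

```python
import sqlite3


def populate_table(conn, source, target, mode="append", where=None, params=()):
    """Copy rows from source to target and return the number of rows copied.

    Hypothetical helper, for illustration only.
    mode="truncate_insert" clears the target first; mode="append" keeps existing rows.
    `where` is an optional SQL condition applied to the source table.
    """
    if mode not in ("append", "truncate_insert"):
        raise ValueError(f"unknown load mode: {mode}")
    sql = f"INSERT INTO {target} SELECT * FROM {source}"
    if where:
        sql += f" WHERE {where}"
    with conn:  # commit on success, roll back on error
        if mode == "truncate_insert":
            conn.execute(f"DELETE FROM {target}")  # SQLite has no TRUNCATE statement
        cur = conn.execute(sql, params)
    return cur.rowcount


# Stand-in "production" and "sandbox" tables (hypothetical names and columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prod_exposures (account_id TEXT, misdate TEXT, exposure REAL)")
conn.execute("CREATE TABLE sbx_exposures (account_id TEXT, misdate TEXT, exposure REAL)")
conn.executemany(
    "INSERT INTO prod_exposures VALUES (?, ?, ?)",
    [
        ("A1", "2024-03-31", 100.0),
        ("A2", "2024-03-31", 250.0),
        ("A3", "2024-06-30", 75.0),
    ],
)

# First load: replace whatever is in the sandbox with one reporting date.
first = populate_table(
    conn, "prod_exposures", "sbx_exposures",
    mode="truncate_insert", where="misdate = ?", params=("2024-03-31",),
)

# Second load: add a later reporting date without touching existing rows.
second = populate_table(
    conn, "prod_exposures", "sbx_exposures",
    mode="append", where="misdate = ?", params=("2024-06-30",),
)

total = conn.execute("SELECT COUNT(*) FROM sbx_exposures").fetchone()[0]
```

In this sketch, running the first load again would leave the sandbox table holding only the two rows for that date, whereas repeating the second load would duplicate the appended row. That is the practical reason to choose the load mode deliberately before each population run.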
Population is a manual step: it is not triggered automatically when a stress testing project is initiated and must be explicitly executed by the user. Once a sandbox is populated, it can be reused for multiple stress testing cycles as long as the data remains relevant. If fresh data is required for a new reporting period or simulation scenario, users must re-populate the sandbox with the updated dataset.
Performing this step ensures that the sandbox environment mirrors the necessary production data landscape, enabling accurate testing, validation, and analysis.