Historical Data Extracts
When using Machine Learning Anomaly Scoring with interval measurements, the measurement data being processed includes enough historical data for the machine learning models to perform the anomaly scoring calculations. With scalar measurements, however, the models may not have access to sufficient historical data to perform those calculations.
In this case, you can extract historical data to supplement the cache of historical data used in the anomaly scoring calculations. This process supports extracts of both interval and scalar data.
Required Configuration
Extracting historical data requires that the Enable ML-Based Validation flag be set to "YES" for all measuring component types for which data is to be extracted.
Batch Controls
Data extraction and creation of extract files are performed by the Initial Extract for ML Cache (D1-INIML) batch process.
This batch process uses the following parameters:
- Start Date/Time: The start date and time of the data extract
- Number of Days: The number of days to extract (the process creates a separate file for each day)
- Interval/Scalar: Flag indicating which type of data to extract, interval (D1IN) or scalar (D1SC)
- File Path: The file path to where the process will post the file so that it can be ingested into the historical data cache used by the machine learning models. Refer to Referencing URIs (LINK) in the user documentation for more details.
- File Limit: The number of initial measurements to be included in the file (default is 10,000)
This process extracts data for ALL measuring components whose Measuring Component Type has the Enable ML-Based Validation flag set to "YES".
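As an illustration of the parameters listed above (the parameter names come from the list; the values and the dictionary representation are hypothetical examples, not a real submission mechanism), a run extracting one week of scalar data might look like:

```python
# Hypothetical parameter values for a D1-INIML batch run (illustration only).
# Parameter names mirror the list above; values are example choices.
d1_iniml_params = {
    "Start Date/Time": "2024-01-01-00.00.00",  # start of the extract window (example format)
    "Number of Days": 7,                        # the process creates one file set per day
    "Interval/Scalar": "D1SC",                  # D1IN = interval, D1SC = scalar
    "File Path": "file:///spool/ml-cache/",     # example URI; see Referencing URIs
    "File Limit": 10000,                        # initial measurements per file (default)
}
```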
File Creation and Naming
Because this process can potentially create very large extract files, the files are created with a controllable size. Data is extracted in the Adapter Development Kit Native Format, and each file will only contain initial measurements for a single day, up to a default limit of 10,000 initial measurements. The logic for creating files differs slightly between interval and scalar data:
- Scalar: Each measurement is an initial measurement. In the common scenario there is one initial measurement per day. However, some customers receive scalar data more than once per day, and some measuring components may have ad hoc readings (field readings or on-demand read commands). In those less common scenarios there may be many initial measurements for a single measuring component for a single day.
- Interval: Files contain an initial measurement for each calendar day. While each initial measurement can contain a variable number of measurements, there will only ever be one initial measurement per measuring component per day.
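The per-day, size-limited file splitting described above can be sketched as follows. This is a minimal illustration only: the actual batch process and the Adapter Development Kit Native Format are product internals, and the `date` key on each initial measurement is a hypothetical shape chosen for the sketch.

```python
from collections import defaultdict

def split_into_files(initial_measurements, file_limit=10000):
    """Group initial measurements by day, then split each day's group into
    chunks of at most file_limit, one chunk per extract file.

    Each initial measurement is assumed to be a dict with a 'date' key
    (hypothetical shape for illustration).
    """
    by_day = defaultdict(list)
    for im in initial_measurements:
        by_day[im["date"]].append(im)

    files = []  # list of (date, file_number, chunk)
    for date in sorted(by_day):
        day_ims = by_day[date]
        for file_number, start in enumerate(range(0, len(day_ims), file_limit), 1):
            files.append((date, file_number, day_ims[start:start + file_limit]))
    return files
```

For example, with a file limit of 2, five same-day scalar initial measurements would produce three files for that day, numbered 1 through 3.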
Extract files are named based on the following syntax:
pre-vee-init-<channel type>-<date being extracted>-<thread number>-<file number>-<batch run>.xml.gz
where:
- <channel type> is the type of data to extract: interval (D1IN) or scalar (D1SC)
- <date being extracted> is the date for the data being extracted, in YYYYMMDDHH24MISS format (YYYY = year, MM = month, DD = day, HH = hour, MI = minute, SS = second)
- <thread number> is the thread number that created the file
- <file number> is the number of the file, in cases where multiple files are created from the same batch process run
- <batch run> is the batch number for the process
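The naming syntax above can be sketched as a small builder and parser. This is a sketch only: the field order, date format, and allowed channel type values follow the description above, and the function names are illustrative, not part of the product.

```python
import re

def build_extract_file_name(channel_type, extract_date, thread_number,
                            file_number, batch_run):
    """Compose an extract file name per the syntax described above.

    channel_type: 'D1IN' (interval) or 'D1SC' (scalar)
    extract_date: date/time string in YYYYMMDDHH24MISS format,
                  e.g. '20240101000000'
    """
    return (f"pre-vee-init-{channel_type}-{extract_date}"
            f"-{thread_number}-{file_number}-{batch_run}.xml.gz")

# Regex matching the same syntax, for splitting a name back into parts.
_NAME_RE = re.compile(
    r"pre-vee-init-(?P<channel_type>D1IN|D1SC)-(?P<date>\d{14})"
    r"-(?P<thread>\d+)-(?P<file>\d+)-(?P<run>\d+)\.xml\.gz"
)

def parse_extract_file_name(name):
    """Split an extract file name into its components; None if no match."""
    m = _NAME_RE.fullmatch(name)
    return m.groupdict() if m else None
```

For example, `build_extract_file_name("D1SC", "20240101000000", 1, 1, 42)` yields `pre-vee-init-D1SC-20240101000000-1-1-42.xml.gz`.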
