Manage Files in External Storage for Custom Warehouse Integration
During an extract, view object (VO) data is uploaded to external storage as compressed files, together with a manifest file that lists the files uploaded in the current batch. Use the information in the manifest file to process the data.
For a custom warehouse implementation, you must manage the manifest file and its contents. This section explains the types of files that BICC generates and their properties.
Files Uploaded to External Storage
BICC generates the following files:
File Type | Description | Extension |
---|---|---|
Application Data Files | Comma-separated value data files that contain extracted BIVO data, and are uploaded as compressed files. | .csv |
Metadata Files | Comma-separated value files that contain metadata with details about columns and data type definitions for Data Stores (BIVOs). | .mdcsv |
Primary Key Extract Files | Comma-separated value data files that contain data from primary key columns. You can use this data to delete records in your downstream application, such as a warehouse. | .pecsv |
Manifest Files | These files contain information about the uploaded files. | .mf |
- The .csv, .pecsv, and .mdcsv files are individually zipped. For example, ‘file_crmanalyticsam_budgetam_budget-batch1510381061-20190517_004657_<time in milli secs>.zip’. The time differs for each zip file.
  The extracted files use a naming convention that matches the name of the data store in lowercase, with underscores instead of periods.
  For example, if the data store name is CrmAnalyticsAM.OpportunityAM.Opportunity, the file name is 'file_crmanalyticsam_opportunityam_opportunity-batch2110193550-20160929_094418.zip'.
- To support parsing of the comma-separated value files, column values are wrapped in double quotes. A double quote within a column value is escaped as two consecutive double quotes. Because of this, a custom delimiter isn’t required.
- Decimal floating-point numbers can have rounding errors because of the representational limits of the binary floating-point formats used in BICC. For example, a decimal number such as 1.365 may be represented as 1.364999999999999 when converted to the DOUBLE type.
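The quoting rules and the rounding caveat above can be illustrated with a minimal Python sketch. The sample row, the function names `read_rows` and `to_scale`, and the assumption that you know the intended number of decimal places for a column are all hypothetical and not part of BICC.

```python
import csv
import io
from decimal import Decimal


def read_rows(text):
    """Parse CSV text whose values are double-quoted, with embedded quotes doubled."""
    # newline="" lets the csv module handle line endings inside quoted values.
    reader = csv.reader(io.StringIO(text, newline=""), quotechar='"', doublequote=True)
    return list(reader)


def to_scale(value_text, scale):
    """Round a numeric string to an assumed, known number of decimal places."""
    return Decimal(value_text).quantize(Decimal(10) ** -scale)


# Hypothetical sample content, not taken from a real extract.
sample = '"ID","NAME","AMOUNT"\r\n"1","Acme ""West"", Inc.","1.364999999999999"\r\n'
rows = read_rows(sample)
name = rows[1][1]                 # the embedded quotes and comma survive parsing
amount = to_scale(rows[1][2], 3)  # rounds back to the assumed three-decimal scale
```

Rounding to a fixed scale only makes sense when the target column's precision is known from the metadata file or from your warehouse schema; otherwise keep the value as extracted.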
Manifest File Formats and Content
The first line of a manifest file describes the source version. The file name format depends on the configured storage area, and the uploaded files are listed in the manifest file.
In UCM MANIFEST.MF files, the body of the file contains information about each of the uploaded files in the format vo_name;ucm_document_id;md5_check_sum_value. For example, in the following sample line from a UCM manifest file, 9526 is the UCM document ID of the uploaded file and b2af2bf486366e2c2cb7598849f0df2e is the checksum value.
crmanalyticsam_partiesanalyticsam_customer;9526;b2af2bf486366e2c2cb7598849f0df2e
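As a rough illustration of the UCM line format, the following Python sketch splits a manifest into its first line and its body entries. The names `UcmEntry` and `parse_ucm_manifest` are hypothetical, and the first line in the sample is a placeholder because the exact content of the source version line isn't shown here.

```python
from typing import NamedTuple


class UcmEntry(NamedTuple):
    vo_name: str
    ucm_document_id: str
    md5: str


def parse_ucm_manifest(text):
    """Return (first_line, entries) for a UCM MANIFEST.MF body."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    entries = []
    for line in lines[1:]:  # the first line describes the source version
        parts = line.split(";")
        if len(parts) != 3:
            raise ValueError(f"unexpected manifest line: {line!r}")
        entries.append(UcmEntry(*parts))
    return (lines[0] if lines else ""), entries


# The first line below is a placeholder, not the real header content.
manifest_text = (
    "PLACEHOLDER_SOURCE_VERSION_LINE\n"
    "crmanalyticsam_partiesanalyticsam_customer;9526;b2af2bf486366e2c2cb7598849f0df2e\n"
)
first_line, entries = parse_ucm_manifest(manifest_text)
```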
In Cloud Storage Service MANIFEST-[TIMESTAMP].MF files, the body of the file contains information about each of the uploaded files in the format extract_uploaded_filename;md5_check_sum_value. For example, in the following sample line from a Storage Service manifest file, file_fscmtopmodelam_analyticsserviceam_currenciestlpvo-batch1209716923-20150615_105514.zip is the uploaded file name and fa981be0caf70a9a52df3aceb9998cc9 is the checksum value.
file_fscmtopmodelam_analyticsserviceam_currenciestlpvo-batch1209716923-20150615_105514.zip;fa981be0caf70a9a52df3aceb9998cc9
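A Storage Service manifest line can be split in a similar way, and the uploaded file name can be broken into its parts. This is a sketch only: the regular expression is inferred from the example file names in this section (including the optional millisecond suffix), and `parse_storage_line` and `FILE_NAME` are hypothetical names.

```python
import re

# Inferred from the sample names: file_<data store>-batch<digits>-<date>_<time>[_<millis>].zip
FILE_NAME = re.compile(
    r"file_(?P<data_store>.+)-batch(?P<batch>\d+)"
    r"-(?P<date>\d{8})_(?P<time>\d{6})(?:_(?P<millis>\d+))?\.zip"
)


def parse_storage_line(line):
    """Split 'extract_uploaded_filename;md5_check_sum_value' and decode the file name."""
    file_name, md5 = line.strip().rsplit(";", 1)
    match = FILE_NAME.fullmatch(file_name)
    parts = match.groupdict() if match else {}
    return file_name, md5, parts


line = (
    "file_fscmtopmodelam_analyticsserviceam_currenciestlpvo"
    "-batch1209716923-20150615_105514.zip;fa981be0caf70a9a52df3aceb9998cc9"
)
file_name, md5, parts = parse_storage_line(line)
```

Because periods in the data store name are replaced with underscores, the original data store name can't be recovered unambiguously from the file name alone.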
Global Data Extract Manifest
- UCM uses the manifest file name MANIFEST.MF.
- Cloud Storage Service and OCI Object Storage manifest files use the file name format MANIFEST-<Timestamp>.MF.
- EXTRACT_STATUS_DATA_SCHEDULE_<SCHEDULE ID>_REQUEST_<REQUEST_ID>.JSON is common for all data extracts.
- EXTRACT_STATUS_PRIMARY_KEYS_SCHEDULE_<SCHEDULE ID>_REQUEST_<REQUEST_ID>.JSON is common for all key extracts.
- Data Extract
  MANIFEST_DATA_<JOB_ID>-SCHEDULE_<SCHEDULE ID>_REQUEST_<ESS_REQUEST_ID>.MF
  EXTRACT_STATUS_DATA_<JOB_ID>-SCHEDULE_<SCHEDULE ID>_REQUEST_<ESS_REQUEST_ID>.JSON
- Keys Extract
  MANIFEST_PRIMARY_KEYS_<JOB_ID>-SCHEDULE_<SCHEDULE ID>_REQUEST_<ESS_REQUEST_ID>.MF
  EXTRACT_STATUS_PRIMARY_KEYS_<JOB_ID>-SCHEDULE_<SCHEDULE ID>_REQUEST_<ESS_REQUEST_ID>.JSON
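To route these files in a custom loader, the name formats listed above could be recognized with a small classifier such as the following Python sketch. The category labels and the `classify` function are hypothetical, and the patterns assume that the job and schedule IDs contain no hyphen or underscore.

```python
import re

ID_TAIL = r"SCHEDULE_(?P<schedule>[^_]+)_REQUEST_(?P<request>.+)"
KIND = r"(?P<type>DATA|PRIMARY_KEYS)"

# (label, pattern) pairs; labels are illustrative only.
PATTERNS = [
    ("global_manifest_ucm", re.compile(r"MANIFEST\.MF")),
    ("global_manifest_storage", re.compile(r"MANIFEST-(?P<timestamp>.+)\.MF")),
    ("global_status", re.compile(rf"EXTRACT_STATUS_{KIND}_{ID_TAIL}\.JSON")),
    ("job_manifest", re.compile(rf"MANIFEST_{KIND}_(?P<job>[^-]+)-{ID_TAIL}\.MF")),
    ("job_status", re.compile(rf"EXTRACT_STATUS_{KIND}_(?P<job>[^-]+)-{ID_TAIL}\.JSON")),
]


def classify(name):
    """Return (label, fields) for a manifest or status file name."""
    for label, pattern in PATTERNS:
        match = pattern.fullmatch(name)
        if match:
            return label, match.groupdict()
    return "unknown", {}


# Hypothetical IDs, used only to exercise the patterns.
label, fields = classify("MANIFEST_DATA_10-SCHEDULE_20_REQUEST_30.MF")
```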
Download and Process Content from UCM
To download extracted content from UCM, search for DOCTITLE MANIFEST.MF and sort by DOCDATE in DESC order. This provides all of the manifest UCM files in order by docid. Download each MANIFEST file using docid. Parse the lines in the manifest file to download data files using their respective ucm_document_ids. You can use the md5_check_sum_value to verify downloaded file content. After downloading the files, unzip them and process them based on their file extension, for example by .csv, .mdcsv, or .pecsv.
Once the data files are processed, rename the corresponding MANIFEST.MF file in UCM by adding a timestamp prefix in the format [TIMESTAMP]_MANIFEST.MF so that it’s not reused in the next download from UCM. Expire the manifest file and all the processed files after 30 days so that UCM storage doesn’t run out of space.
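The verify, unzip, and dispatch steps can be sketched in Python as follows. The sketch works on bytes that you have already downloaded and doesn't call any UCM API. It assumes that md5_check_sum_value is the hexadecimal MD5 digest of the zip file as downloaded; the function names and the in-memory sample archive are hypothetical.

```python
import hashlib
import io
import zipfile
from pathlib import PurePosixPath

# Extension-to-role mapping taken from the file type table in this section.
KINDS = {".csv": "data", ".mdcsv": "metadata", ".pecsv": "primary_keys"}


def md5_matches(payload, expected_hex):
    """Compare the MD5 digest of the downloaded bytes with the manifest value."""
    return hashlib.md5(payload).hexdigest() == expected_hex.strip().lower()


def unzip_and_classify(payload):
    """Return (member_name, kind, content) for each file in a downloaded zip."""
    results = []
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        for member in archive.namelist():
            suffix = PurePosixPath(member).suffix.lower()
            results.append((member, KINDS.get(suffix, "unknown"), archive.read(member)))
    return results


# Build a small archive in memory to stand in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("file_example.pecsv", '"ID"\n"1"\n')
payload = buffer.getvalue()

checksum_ok = md5_matches(payload, hashlib.md5(payload).hexdigest())
members = unzip_and_classify(payload)
```

Checking the digest before unzipping means a truncated or corrupted download is rejected before any rows reach the warehouse.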
Download and Process Content from Cloud Storage Service
To download extracted content from Cloud Storage Service, search for MANIFEST- and sort by filename. This provides all of the manifest files in order by date. Download each manifest file and parse the lines in the manifest file to download data files using their respective file names. You can use the md5_check_sum_value to verify downloaded file content. After downloading the files, unzip them and process them based on their file extension, for example by .csv, .mdcsv, or .pecsv.
Once the data files are processed, rename the corresponding manifest file in Storage Service by adding a timestamp prefix in the format [TIMESTAMP]_MANIFEST so that it’s not reused in the next download. Expire the manifest file and all the processed files after 30 days so that storage doesn’t run out of space.
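The selection and renaming logic can be sketched in Python without any storage client. The sketch assumes that you have already listed the object names, that the timestamp in each manifest name sorts chronologically as text, and that the prefix uses a YYYYMMDD_HHMMSS timestamp. That timestamp format and the function names are assumptions.

```python
from datetime import datetime, timezone


def pending_manifests(names):
    """Manifests not yet processed: names still starting with 'MANIFEST-', oldest first."""
    return sorted(name for name in names if name.startswith("MANIFEST-"))


def processed_name(name, now=None):
    """New name for a processed manifest, with an assumed timestamp prefix."""
    now = now or datetime.now(timezone.utc)
    return f"{now:%Y%m%d_%H%M%S}_{name}"


# Hypothetical listing of object names in the storage container.
listing = [
    "20150616_000000_MANIFEST-20150614_090000.MF",  # already processed and renamed
    "MANIFEST-20150616_010101.MF",
    "MANIFEST-20150615_105514.MF",
    "file_fscmtopmodelam_analyticsserviceam_currenciestlpvo-batch1209716923-20150615_105514.zip",
]
todo = pending_manifests(listing)
renamed = processed_name(todo[0], datetime(2015, 6, 16, 0, 0, 0))
```

Because a renamed manifest no longer starts with MANIFEST-, the same prefix search skips it on the next run.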