Manage Files in External Storage for Custom Warehouse Integration

During an extract, view object (VO) data is uploaded to external storage as compressed files, along with a manifest file that lists the files uploaded in the current batch. Use the information in the manifest file to process the data.

For a custom warehouse implementation, you must manage the manifest file and its contents. This section explains the types of files that BICC generates and their properties.

Files Uploaded to External Storage

BICC generates the following files:

File Type | Description | Extension
Application Data Files | Comma-separated value data files that contain extracted BIVO data and are uploaded as compressed files. | .csv
Metadata Files | Comma-separated value files that contain metadata with details about columns and data type definitions for Data Stores (BIVOs). | .mdcsv
Primary Key Extract Files | Comma-separated value data files that contain data from primary key columns. You can use this data to delete records in your downstream application, such as a warehouse. | .pecsv
Manifest Files | These files contain information about the uploaded files. | .mf
Tip: Application Data Files and Primary Key Extract Files can have different row counts. A regular extract normally includes all data store attributes, so all the tables related to those attributes are included. In a primary key extract, only primary key attributes are selected in the extract query, so only a subset of rows from the tables is included.
Note:
  • The .csv, .pecsv, and .mdcsv files are individually zipped. For example, 'file_crmanalyticsam_budgetam_budget-batch1510381061-20190517_004657_<time in milliseconds>.zip'. The time differs for each zip file.

    The extracted files use a naming convention that matches the name of the data store, with underscores instead of periods.

    For example, if the data store name is CrmAnalyticsAM.OpportunityAM.Opportunity, the file name is 'file_crmanalyticsam_opportunityam_opportunity-batch2110193550-20160929_094418.zip'.

  • To support parsing of the comma-separated value files, column values are wrapped in double quotes. A double quote inside a column value is escaped by doubling it, that is, by writing two consecutive double quotes. Because of this, a custom delimiter isn’t required.

  • Decimal floating point numbers can have rounding errors because of the representational limitations of the binary floating point formats used in BICC. For example, a decimal number such as 1.365 may be represented as 1.364999999999999 when it's converted to the DOUBLE type.
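
The quoting rule described above matches the default behavior of Python's standard csv module, so a reader needs no custom delimiter or escape character. The following is a minimal sketch, assuming the data file has already been unzipped and decoded to text. The helper name parse_rows and the sample row are made up for illustration and aren't taken from a real extract.

```python
import csv
import io

def parse_rows(text):
    """Parse comma-separated text whose values are wrapped in double quotes.

    A doubled double quote ("") inside a value is read as one literal quote,
    which is the default behavior of csv.reader (doublequote=True).
    """
    return list(csv.reader(io.StringIO(text, newline="")))

# Hypothetical sample data: the NAME value contains quotes and a comma.
sample = '"ID","NAME"\r\n"1","Acme ""East"", Inc."\r\n'
rows = parse_rows(sample)
print(rows[1][1])
```

The comma and the embedded quotes inside the NAME value survive parsing because the whole value is quoted.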

Caution: To ensure that column names are sorted and appear in ascending (deterministic) order in the extracted files, you must select columns in the BI Cloud Connector Console. If you don't select columns, BI Cloud Connector Console determines the columns available at run time and includes them in the extract files. Adding or removing a column in the BI Cloud Connector Console changes the order of columns in the extract files.
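
Because the column order can change when columns are added or removed, a downstream loader may be more robust if it looks values up by column name instead of by position. The sketch below assumes that the first row of each data file holds the column names. That's an assumption to verify against your own extract files. The helper name rows_by_name and the sample layouts are hypothetical.

```python
import csv
import io

def rows_by_name(text):
    """Return each data row as a dict keyed by the names in the header row."""
    return list(csv.DictReader(io.StringIO(text, newline="")))

# Two hypothetical layouts of the same data store, before and after a column
# was added, which shifts the position of ID and NAME.
old_layout = '"ID","NAME"\n"1","Acme"\n'
new_layout = '"CREATED_BY","ID","NAME"\n"admin","1","Acme"\n'

old_row = rows_by_name(old_layout)[0]
new_row = rows_by_name(new_layout)[0]
print(old_row["NAME"], new_row["NAME"])
```

Both lookups return the same value even though NAME moved from the second to the third position.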

Manifest File Formats and Content

The first line of a manifest file describes the source version. The file name format depends on the configured storage area, and the file names are detailed in the manifest file.

In UCM MANIFEST.MF files, the body of the file contains information about each of the uploaded files in the format vo_name;ucm_document_id;md5_check_sum_value. For example, in the following sample line from a UCM manifest file, 9526 is the UCM document ID of the uploaded file, and b2af2bf486366e2c2cb7598849f0df2e is the checksum value.

crmanalyticsam_partiesanalyticsam_customer;9526;b2af2bf486366e2c2cb7598849f0df2e
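
As an illustration, the UCM manifest body can be split into its three fields with a few lines of Python. This is a sketch only: the names UcmEntry and parse_ucm_manifest are hypothetical, and the first line of the sample is a placeholder, because the exact content of the source version line isn't shown here.

```python
from typing import NamedTuple

class UcmEntry(NamedTuple):
    vo_name: str
    ucm_document_id: str
    md5_check_sum_value: str

def parse_ucm_manifest(text):
    """Parse the body of a UCM MANIFEST.MF file into UcmEntry records."""
    entries = []
    # The first line describes the source version, so it is skipped.
    for line in text.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Raises ValueError if a line doesn't have exactly three fields.
        vo_name, document_id, checksum = line.split(";")
        entries.append(UcmEntry(vo_name, document_id, checksum))
    return entries

# The first line is a placeholder, not the real source version format.
sample = (
    "SOURCE_VERSION_PLACEHOLDER\n"
    "crmanalyticsam_partiesanalyticsam_customer;9526;b2af2bf486366e2c2cb7598849f0df2e\n"
)
entries = parse_ucm_manifest(sample)
print(entries[0].ucm_document_id)
```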

In Cloud Storage Service MANIFEST-[TIMESTAMP].MF files, the body of the file contains information about each of the uploaded files in the format extract_uploaded_filename;md5_check_sum_value. For example, in the following sample line from a Storage Service manifest file, file_fscmtopmodelam_analyticsserviceam_currenciestlpvo-batch1209716923-20150615_105514.zip is the uploaded file name, and fa981be0caf70a9a52df3aceb9998cc9 is the checksum value.

file_fscmtopmodelam_analyticsserviceam_currenciestlpvo-batch1209716923-20150615_105514.zip;fa981be0caf70a9a52df3aceb9998cc9
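
A Storage Service manifest line can be split the same way, and the uploaded file name can then be broken into its parts. The regular expression below is inferred only from the sample file names in this section (a file_ prefix, the data store name, a batch number, a date and time, and an optional trailing number of milliseconds). Treat it as an assumption, not a documented grammar. The names FILE_NAME and parse_storage_line are hypothetical.

```python
import re

# Pattern inferred from the sample file names in this section (an assumption).
FILE_NAME = re.compile(
    r"^file_(?P<vo_name>.+)-batch(?P<batch>\d+)"
    r"-(?P<stamp>\d{8}_\d{6})(?:_(?P<millis>\d+))?\.zip$"
)

def parse_storage_line(line):
    """Split one manifest line into file name, checksum, and name parts."""
    file_name, checksum = line.strip().split(";")
    match = FILE_NAME.match(file_name)
    # An empty dict means the name didn't fit the inferred pattern.
    parts = match.groupdict() if match else {}
    return file_name, checksum, parts

line = (
    "file_fscmtopmodelam_analyticsserviceam_currenciestlpvo"
    "-batch1209716923-20150615_105514.zip;fa981be0caf70a9a52df3aceb9998cc9"
)
file_name, checksum, parts = parse_storage_line(line)
print(parts["vo_name"], parts["batch"], parts["stamp"])
```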

Global Data Extract Manifest

  • UCM manifest files are named MANIFEST.MF.

  • Cloud Storage Service and OCI Object Storage manifest files use the file name format MANIFEST-<Timestamp>.MF.

  • EXTRACT_STATUS_DATA_SCHEDULE_<SCHEDULE ID>_REQUEST_<REQUEST_ID>.JSON is common for all data extracts.

  • EXTRACT_STATUS_PRIMARY_KEYS_SCHEDULE_<SCHEDULE ID>_REQUEST_<REQUEST_ID>.JSON is common for all key extracts.

Jobs Manifest
Job-specific extracts use the following manifest file name formats:
  • Data Extract

    MANIFEST_DATA_<JOB_ID>-SCHEDULE_<SCHEDULE ID>_REQUEST_<ESS_REQUEST_ID>.MF

    EXTRACT_STATUS_DATA_<JOB_ID>-SCHEDULE_<SCHEDULE ID>_REQUEST_<ESS_REQUEST_ID>.JSON

  • Keys Extract

    MANIFEST_PRIMARY_KEYS_<JOB_ID>-SCHEDULE_<SCHEDULE ID>_REQUEST_<ESS_REQUEST_ID>.MF

    EXTRACT_STATUS_PRIMARY_KEYS_<JOB_ID>-SCHEDULE_<SCHEDULE ID>_REQUEST_<ESS_REQUEST_ID>.JSON
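
To route job-specific files, a consumer can pull the job, schedule, and request IDs out of the file name. The following sketch matches the four name formats listed above. The pattern and the function name parse_job_file_name are hypothetical, the IDs in the sample names are made up, and nothing is assumed about whether real IDs are numeric.

```python
import re

# Mirrors the job-specific name formats listed above.
JOB_FILE = re.compile(
    r"^(?P<prefix>MANIFEST|EXTRACT_STATUS)_(?P<kind>DATA|PRIMARY_KEYS)_"
    r"(?P<job_id>.+?)-SCHEDULE_(?P<schedule_id>.+?)"
    r"_REQUEST_(?P<request_id>.+)\.(?P<ext>MF|JSON)$"
)

def parse_job_file_name(name):
    """Return the name parts as a dict, or None for other file names."""
    match = JOB_FILE.match(name)
    return match.groupdict() if match else None

# Made-up IDs for illustration only.
data = parse_job_file_name("MANIFEST_DATA_101-SCHEDULE_202_REQUEST_303.MF")
keys = parse_job_file_name(
    "EXTRACT_STATUS_PRIMARY_KEYS_101-SCHEDULE_202_REQUEST_303.JSON"
)
print(data["job_id"], keys["kind"])
```

A global extract status file name has no job ID before SCHEDULE, so it doesn't match this pattern and the function returns None for it.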

Download and Process Content from UCM

To download extracted content from UCM, search for DOCTITLE MANIFEST.MF and sort by DOCDATE in DESC order. This provides all of the UCM manifest files in order by docid. Download each manifest file using its docid. Parse the lines in the manifest file to download the data files using their respective ucm_document_ids. You can use the md5_check_sum_value to verify the downloaded file content. After downloading the files, unzip them and process them based on their file extension, for example .csv, .mdcsv, or .pecsv.
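
The checksum comparison can be sketched with Python's hashlib module. The sketch assumes that the md5_check_sum_value is the hexadecimal MD5 digest of the downloaded zip file's bytes, which is consistent with the sample values above but should be confirmed against a real download. The function name md5_matches is hypothetical, and the download call itself isn't shown because it depends on your UCM client.

```python
import hashlib

def md5_matches(content: bytes, expected_hex: str) -> bool:
    """Compare the MD5 digest of the downloaded bytes with the manifest value."""
    return hashlib.md5(content).hexdigest() == expected_hex.strip().lower()

# Stand-in bytes; in practice this is the content of the downloaded zip file.
downloaded = b"hello"
print(md5_matches(downloaded, "5d41402abc4b2a76b9719d911017c592"))
```

A file that fails the comparison is a candidate for a repeat download before it's unzipped.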

Once the data files are processed, rename the corresponding MANIFEST.MF file in UCM by adding a timestamp prefix in the format [TIMESTAMP]_MANIFEST.MF so that it’s not reused in the next download from UCM. Expire the manifest file and all the processed files after 30 days so that UCM storage doesn’t run out of space.
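
The new name can be built as shown in the sketch below. The text above doesn't specify the layout of [TIMESTAMP], so the yyyymmdd_hhmmss layout used here is an assumption, and processed_manifest_name is a hypothetical helper. The rename operation itself depends on your UCM client and isn't shown.

```python
from datetime import datetime

def processed_manifest_name(name, processed_at):
    """Prefix a manifest name with a timestamp to mark it as processed."""
    # Assumed timestamp layout; adjust it to your own convention.
    stamp = processed_at.strftime("%Y%m%d_%H%M%S")
    return f"{stamp}_{name}"

new_name = processed_manifest_name("MANIFEST.MF", datetime(2019, 5, 17, 0, 46, 57))
print(new_name)
```

Because the renamed file no longer has the title MANIFEST.MF, a later search for that title doesn't return it.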

Download and Process Content from Cloud Storage Service

To download extracted content from Cloud Storage Service, search for MANIFEST- and sort by filename. This provides all of the manifest files in order by date. Download each manifest file and parse the lines in the manifest file to download data files using their respective file names. You can use the md5_check_sum_value to verify downloaded file content. After downloading the files, unzip them and process them based on their file extension, for example by .csv, .mdcsv, or .pecsv.
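
The ordering and unzip steps can be sketched with the standard library. The sketch assumes that the object names have already been listed and the zip content downloaded with your storage client, and that the timestamp in the manifest name has a fixed width so that sorting by name is also sorting by date. The manifest names and zip members below are made up for illustration.

```python
import io
import os
import zipfile
from collections import defaultdict

def manifests_in_order(object_names):
    """Return manifest names sorted by file name."""
    return sorted(n for n in object_names if n.startswith("MANIFEST-"))

def group_members_by_extension(zip_bytes):
    """Unzip in memory and group (name, content) pairs by file extension."""
    groups = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            extension = os.path.splitext(name)[1].lower()
            groups[extension].append((name, archive.read(name)))
    return dict(groups)

# Hypothetical object names; the timestamp layout is an assumption.
names = ["MANIFEST-20190518_010000.MF", "other.zip", "MANIFEST-20190517_004657.MF"]
ordered = manifests_in_order(names)

# Build a small zip in memory to stand in for a downloaded file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("file_example.csv", '"ID"\n"1"\n')
    archive.writestr("file_example.mdcsv", '"COLUMN","TYPE"\n')
groups = group_members_by_extension(buffer.getvalue())
print(ordered[0], sorted(groups))
```

Each group can then be handed to the loader for its type, for example data rows for .csv, column definitions for .mdcsv, and delete candidates for .pecsv.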

Once the data files are processed, rename the corresponding manifest file in Storage Service by adding a timestamp prefix in the format [TIMESTAMP]_MANIFEST so that it’s not reused in the next download. Expire the manifest file and all the processed files after 30 days so that storage doesn’t run out of space.