4.5 Prerequisites for Data Pipeline Script Execution

The following are the prerequisites for the Data Pipeline. These steps must be executed before triggering the MMG Installer.
  1. Create the following directories on the DB server:
    • mkdir -p /file_store/fs_list/logs    # Directory used by the external table to write its logs.
    • mkdir -p /file_store/fs_list/script  # Directory that holds the pre-processor script used to list the files in a directory. Requires Read-Execute permissions.
    • mkdir -p /file_store/fs_list/control # Directory that holds the files controlling which directories can be listed. Requires Read permissions.
    • mkdir -p /scratch/oraofss/fccm-data  # Directory that stores the CSV files.
  2. Create directory objects associated with the physical directories and provide grants (optional verification queries follow this list):
    • CREATE OR REPLACE DIRECTORY fs_list_logs_dir AS '/file_store/fs_list/logs/';
      GRANT READ, WRITE ON DIRECTORY fs_list_logs_dir TO amldd;
    • CREATE OR REPLACE DIRECTORY fs_list_script_dir AS '/file_store/fs_list/script/';
      GRANT READ, EXECUTE ON DIRECTORY fs_list_script_dir TO amldd;
    • CREATE OR REPLACE DIRECTORY fs_list_control_dir AS '/file_store/fs_list/control/';
      GRANT READ ON DIRECTORY fs_list_control_dir TO amldd;
    • CREATE OR REPLACE DIRECTORY external_tables_dir AS '/scratch/oraofss/fccm-data/';
      GRANT READ ON DIRECTORY external_tables_dir TO amldd;
    • GRANT CREATE TABLE TO <SCHEMA-NAME>;
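    Optionally, verify that the directory objects and grants were created as expected. The queries below are an illustrative sketch (run as the amldd user; the TYPE column of USER_TAB_PRIVS is available in Oracle 12c and later):

    -- List the directory objects created above.
    SELECT directory_name, directory_path
    FROM all_directories
    WHERE directory_name IN ('FS_LIST_LOGS_DIR', 'FS_LIST_SCRIPT_DIR',
                             'FS_LIST_CONTROL_DIR', 'EXTERNAL_TABLES_DIR');

    -- Confirm the privileges granted to the current user on directory objects.
    SELECT table_name, privilege
    FROM user_tab_privs
    WHERE type = 'DIRECTORY';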
  3. Create the pre-processor script. The script lists the files in the directory whose path is supplied through the external table LOCATION clause, and formats the date in a more useful form than the ls default.
    • Create the script that performs the directory listing, and make it executable (the script directory requires Read-Execute permissions, as noted in step 1). The -i option emits the inode number, which populates the FILE_ID column of the external table created in step 4:

      cat > /file_store/fs_list/script/list_directory.sh <<EOF
      #!/bin/bash
      /usr/bin/ls -li --time-style=+"%Y-%m-%d:%H:%M:%S" "\$(/usr/bin/cat \$1)"
      EOF
      chmod u+x /file_store/fs_list/script/list_directory.sh

    • Create the control file that holds the path to be monitored:

      cat > /file_store/fs_list/control/trace.txt <<EOF
      /scratch/oraofss/fccm-data
      EOF
  4. From a SQL window, log in as the amldd user and create an external table that lists all the files in the directory, using the query below.
    CREATE TABLE LIST_DIRECTORY_EXT_CSV
    ( "FILE_ID" NUMBER,
    "FILE_NAME" VARCHAR2(200 BYTE),
    "FILE_PERMISSIONS" VARCHAR2(11 BYTE),
    "FILE_HARDLINKS" NUMBER,
    "FILE_OWNER" VARCHAR2(32 BYTE),
    "FILE_GROUP" VARCHAR2(32 BYTE),
    "FILE_SIZE" NUMBER, "FILE_DATETIME" DATE
    )
    ORGANIZATION EXTERNAL
    ( TYPE ORACLE_LOADER
    DEFAULT DIRECTORY "FS_LIST_LOGS_DIR"
    ACCESS PARAMETERS
    ( RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR fs_list_script_dir:'list_directory.sh'
    FIELDS TERMINATED BY WHITESPACE
    (
    file_id,
    file_permissions,
    file_hardlinks,
    file_owner,
    file_group,
    file_size,
    file_datetime DATE 'YYYY-MM-DD:HH24:MI:SS',
    file_name
    )
    )
    LOCATION
    ( "FS_LIST_CONTROL_DIR":'trace.txt'
    )
    )
    REJECT LIMIT UNLIMITED;
  5. Verify the files in the external table using the following SQL query (a filtered example follows the figure):

    SELECT * FROM LIST_DIRECTORY_EXT_CSV;

    Figure 4-1 List Directory
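    To narrow the listing during verification, the query can be filtered and sorted; for example (illustrative, using only the columns defined in step 4):

    -- Show only CSV files, newest first.
    SELECT file_name, file_size, file_datetime
    FROM list_directory_ext_csv
    WHERE file_name LIKE '%.csv'
    ORDER BY file_datetime DESC;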

  6. API for the Listing File
    Entity

    Figure 4-2 Entity

    DAO

    Figure 4-3 DAO

    Controller

    Figure 4-4 Controller

  7. Get Header Details
    The headers of the CSV files are used to create the fields of the external table. The PL/SQL procedure P_CSV_FILE_HEADERS returns the headers of a given CSV file; a sketch of such a procedure follows the figure.

    Figure 4-5 Get Header Details
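    As a minimal sketch, assuming the procedure simply reads the first line of the file through UTL_FILE (the parameter names here are assumptions, not the shipped signature), it could look like this:

    CREATE OR REPLACE PROCEDURE p_csv_file_headers (
      p_dir_name  IN  VARCHAR2,  -- directory object, e.g. EXTERNAL_TABLES_DIR
      p_file_name IN  VARCHAR2,  -- name of the CSV file
      p_headers   OUT VARCHAR2   -- comma-separated header line
    ) AS
      l_file UTL_FILE.FILE_TYPE;
    BEGIN
      l_file := UTL_FILE.FOPEN(p_dir_name, p_file_name, 'r');
      UTL_FILE.GET_LINE(l_file, p_headers);  -- the first line holds the headers
      UTL_FILE.FCLOSE(l_file);
    EXCEPTION
      WHEN OTHERS THEN
        IF UTL_FILE.IS_OPEN(l_file) THEN
          UTL_FILE.FCLOSE(l_file);
        END IF;
        RAISE;
    END p_csv_file_headers;
    /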

    API for getting Headers

    Figure 4-6 DAO

    Figure 4-7 Controller

    Creating the External Table

    Using the header details of the CSV, an external table is created with the PL/SQL procedure shown in the figure below; an illustrative sketch follows it.

    Figure 4-8 External table
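    As a hedged illustration of the technique (the shipped procedure in Figure 4-8 may differ; the table name, header string, and file name below are hypothetical), an external table can be built from a header line with dynamic SQL:

    DECLARE
      l_headers VARCHAR2(4000) := 'CUST_ID,CUST_NAME,BALANCE';  -- e.g. output of P_CSV_FILE_HEADERS
      l_cols    VARCHAR2(4000);
    BEGIN
      -- Turn each comma-separated header into a VARCHAR2 column definition.
      FOR r IN (
        SELECT REGEXP_SUBSTR(l_headers, '[^,]+', 1, LEVEL) AS col
        FROM dual
        CONNECT BY LEVEL <= REGEXP_COUNT(l_headers, ',') + 1
      ) LOOP
        l_cols := l_cols || CASE WHEN l_cols IS NOT NULL THEN ', ' END
                         || r.col || ' VARCHAR2(4000)';
      END LOOP;

      -- SKIP 1 omits the header row from query results.
      EXECUTE IMMEDIATE
        'CREATE TABLE my_csv_ext (' || l_cols || ')'
        || ' ORGANIZATION EXTERNAL'
        || ' ( TYPE ORACLE_LOADER'
        || ' DEFAULT DIRECTORY external_tables_dir'
        || ' ACCESS PARAMETERS'
        || ' ( RECORDS DELIMITED BY NEWLINE'
        || ' SKIP 1'
        || ' FIELDS TERMINATED BY '','' )'
        || ' LOCATION (''my_data.csv'') )'
        || ' REJECT LIMIT UNLIMITED';
    END;
    /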

    Note:

    Only some of the interpreter icons are shown by default, with the following limitations:
    • %shell: This is not enabled because it is generally regarded as a security vulnerability.
    • %spark: This is the default behavior in Data Studio. The default configuration provided in spark.json enables both %spark and %pyspark, but the "add" button is enabled only for %pyspark. It is enabled for %spark only if there is an App/Customer requirement.
    • %jdbc: The default configuration provided by Data Studio enables the "add" button only for %mysql and not for %jdbc. It is enabled for %jdbc only if there is an App/Customer requirement.
    Parameters Required to Execute a Data Pipeline from the Scheduler
    For standalone Data Pipeline definitions, the user must pass the following optional parameters during execution, after selecting the graph data pipeline as the component (a filled-in example follows the list):
    • $<Runtime Parameter1>=<Value>
    • $<Runtime Parameter2>=<Value>
    • $RUNTYPE$=PROD
    • $batchRunType$=run
    • $BATCHTYPE$=DATA
    • $JOBNAME$=<DataPipeline_Name>
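    For example, for a hypothetical data pipeline named DP_Customer_Load that takes a single runtime parameter mis_date, the parameter set would look like this:
    • $mis_date=20231031
    • $RUNTYPE$=PROD
    • $batchRunType$=run
    • $BATCHTYPE$=DATA
    • $JOBNAME$=DP_Customer_Load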