Schema Definition

Schema Definition provides control over dataset structures while maintaining flexibility.

Schema definitions have the following characteristics:
  • Allows predefined or overridden data types and formats for datasets.
  • Provides uniformity across multiple datasets while permitting necessary adjustments.
  • Enables you to define a schema separately as a template or embedded within a dataset as an inline statement.
  • Allows you to reference by name within a dataset when you've a schema defined separately.
  • Becomes part of a code block when you've embedded it. See Code Block.

Syntax

For information on schema statements (template_schema_statement or builtin_schema_statement) in the dataset definition code_block, see Generic Dataset Definition.

Schema Definition Rules

  • Column data types are mandatory; PRIMARYKEY is optional.
  • By default, all columns are nullable unless you explicitly specify otherwise.
  • If you don't define PRIMARYKEY in the schema, it must be provided by the source dataset.
Conflict Resolution
  • Primary Key
    • When you define PRIMARYKEY in both the schema and the dataset, the dataset value takes precedence.
    • For datasets with row source joins, you can omit PRIMARYKEY from the dataset if you've already specified it in the schema.
  • Data Type
    • When you define a column's data type in both the schema and the dataset, the data type that you provide in the dataset overrides the dataset in the schema.

Column Mismatch Handling

When a column is present in either the schema or the dataset but not in both, its inclusion and properties are determined as follows:
  • Extra Columns in the Dataset: If a column appears in the dataset but not in the schema, its properties are derived from the source.
  • Extra Schema Columns: If you define a column in the schema but don't map it in the dataset, it's ignored unless it's part of the primary key. If so, a warning is issued.

Schema Template Definition

You can use DEFINE SCHEMA, the dataset definition code block, to define a schema separately from a dataset definition, refer to it, and apply it to any dataset.

Syntax:
schema_definition ::= DEFINE SCHEMA schema_name
                                                  '[ '
                                                     column_name data_type [ PRIMARYKEY ] [ nullable_flag ]
                                                   [,column_name data_type [ PRIMARYKEY ] [ nullable_flag ]] ...
                                                  ']' 
                                                 END
schema_name ::= identifier
template_schema_statement ::= SCHEMA schema_name ; //used in Dataset Definition
Example of a dataset defined with the template_schema_statement:
IMPORT SOURCE CUSTOMERS
DEFINE SCHEMA CUSTOMERS_D_SCHEMA
    [
    CUST_ID NUMBER(38,0) PRIMARYKEY,
    CUST_LAST_NAME VARCHAR2(32),
    CUST_CITY_ID NUMBER(38,0),
    CUST_VALID VARCHAR2(32),
    CUST_EFF_FROM DATE NOT NULL,
    CUST_EFF_TO DATE
    ]
END
//Usage in Dataset Definition
DEFINE DATASET CUSTOMERS_D
 SCHEMA CUSTOMERS_D_SCHEMA;
 ROWSOURCE CUSTOMERS;
 THIS = CUSTOMERS;
END

Inline Schema Definition

Within the dataset definition code block, you can define a schema directly within a dataset definition, applying it only to that specific dataset.

Syntax:
builtin_schema_statement ::= SCHEMA 
                                                        '[ '
                                                           column_name data_type [ PRIMARYKEY ] [ nullable_flag ]
                                                           [,column_name data_type [ PRIMARYKEY ] [ nullable_flag ]] ...
                                                         ']' ;
Example of a dataset defined with the builtin_schema_statement:
DEFINE DATASET INS_CUSTOMERS_D
    SCHEMA            
    [
        CUST_ID          NUMBER(38,0)    PRIMARYKEY,
        CUST_LAST_NAME.  VARCHAR2(32),
        CUST_CITY_ID     NUMBER(38,0),
        CUST_VALID       VARCHAR2(32), 
        CUST_EFF_FROM    DATE          NOT NULL,
        CUST_EFF_TO      DATE
    ];
 
    ROWSOURCE CUSTOMERS;
    THIS = CUSTOMERS;
END

Code Block Load and Schema Definition

When you separate the full load code from the incremental load within a dataset definition, the inline schema is defined at the beginning of the code_block_load.

Example of a code_block_load defined with template_schema_statement:
DEFINE DATASET FL_CUSTOMERS_D
    SCHEMA CUSTOMERS_D_SCHEMA;
    ON FULL LOAD
        ROWSOURCE CUSTOMERS WHERE CUSTOMERS.CUST_VALID = ‘A’;
        THIS = CUSTOMERS;
    INCREMENTAL LOAD
        ROWSOURCE CUSTOMERS;
        THIS = CUSTOMERS;
    ENDLOAD
END
Example of a code_block_load defined with builtin_schema_statement:
DEFINE VERSIONED DATASET INSCH_FL_CUSTOMERS_CF
    SCHEMA            
    [
        CUST_ID          NUMBER(38,0)    PRIMARYKEY,
        CUST_LAST_NAME   VARCHAR2(32),
        CUST_CITY_ID     NUMBER(38,0),
        CUST_VALID       VARCHAR2(32), 
        CUST_EFF_FROM    DATE          NOT NULL,
        CUST_EFF_TO      DATE
    ];
    ON FULL LOAD
        ROWSOURCE CUSTOMERS WHERE CUSTOMERS.CUST_VALID  = ‘A’;
        THIS = CUSTOMERS;  
    INCREMENTAL LOAD
        ROWSOURCE CUSTOMERS;
        THIS = CUSTOMERS;
    ENDLOAD
END