Schema Definition

Schema Definition provides control over dataset structures while maintaining flexibility.

Schema definitions have the following characteristics:

Allows predefined or overridden data types and formats for datasets.
Provides uniformity across multiple datasets while permitting necessary adjustments.
Enables you to define a schema separately as a template or embedded within a dataset as an inline statement.
Allows you to reference by name within a dataset when you've a schema defined separately.
Becomes part of a code block when you've embedded it. See Code Block.

Syntax

For information on schema statements (template_schema_statement or builtin_schema_statement) in the dataset definition code_block, see Generic Dataset Definition.

Schema Definition Rules

Column data types are mandatory; PRIMARYKEY is optional.
By default, all columns are nullable unless you explicitly specify otherwise.
If you don't define PRIMARYKEY in the schema, it must be provided by the source dataset.

Conflict Resolution

Primary Key
- When you define PRIMARYKEY in both the schema and the dataset, the dataset value takes precedence.
- For datasets with row source joins, you can omit PRIMARYKEY from the dataset if you've already specified it in the schema.
Data Type
- When you define a column's data type in both the schema and the dataset, the data type that you provide in the dataset overrides the dataset in the schema.

Column Mismatch Handling

When a column is present in either the schema or the dataset but not in both, its inclusion and properties are determined as follows:

Extra Columns in the Dataset: If a column appears in the dataset but not in the schema, its properties are derived from the source.
Extra Schema Columns: If you define a column in the schema but don't map it in the dataset, it's ignored unless it's part of the primary key. If so, a warning is issued.

Schema Template Definition

You can use DEFINE SCHEMA, the dataset definition code block, to define a schema separately from a dataset definition, refer to it, and apply it to any dataset.

Syntax:

schema_definition ::= DEFINE SCHEMA schema_name
                                                  '[ '
                                                     column_name data_type [ PRIMARYKEY ] [ nullable_flag ]
                                                   [,column_name data_type [ PRIMARYKEY ] [ nullable_flag ]] ...
                                                  ']' 
                                                 END
schema_name ::= identifier
template_schema_statement ::= SCHEMA schema_name ; //used in Dataset Definition

Example of a dataset defined with the template_schema_statement:

IMPORT SOURCE CUSTOMERS
DEFINE SCHEMA CUSTOMERS_D_SCHEMA
    [
    CUST_ID NUMBER(38,0) PRIMARYKEY,
    CUST_LAST_NAME VARCHAR2(32),
    CUST_CITY_ID NUMBER(38,0),
    CUST_VALID VARCHAR2(32),
    CUST_EFF_FROM DATE NOT NULL,
    CUST_EFF_TO DATE
    ]
END
//Usage in Dataset Definition
DEFINE DATASET CUSTOMERS_D
 SCHEMA CUSTOMERS_D_SCHEMA;
 ROWSOURCE CUSTOMERS;
 THIS = CUSTOMERS;
END

Inline Schema Definition

Within the dataset definition code block, you can define a schema directly within a dataset definition, applying it only to that specific dataset.

Syntax:

builtin_schema_statement ::= SCHEMA 
                                                        '[ '
                                                           column_name data_type [ PRIMARYKEY ] [ nullable_flag ]
                                                           [,column_name data_type [ PRIMARYKEY ] [ nullable_flag ]] ...
                                                         ']' ;

Example of a dataset defined with the builtin_schema_statement:

DEFINE DATASET INS_CUSTOMERS_D
    SCHEMA            
    [
        CUST_ID          NUMBER(38,0)    PRIMARYKEY,
        CUST_LAST_NAME.  VARCHAR2(32),
        CUST_CITY_ID     NUMBER(38,0),
        CUST_VALID       VARCHAR2(32), 
        CUST_EFF_FROM    DATE          NOT NULL,
        CUST_EFF_TO      DATE
    ];
 
    ROWSOURCE CUSTOMERS;
    THIS = CUSTOMERS;
END

Code Block Load and Schema Definition

When you separate the full load code from the incremental load within a dataset definition, the inline schema is defined at the beginning of the code_block_load.

Example of a code_block_load defined with template_schema_statement:

DEFINE DATASET FL_CUSTOMERS_D
    SCHEMA CUSTOMERS_D_SCHEMA;
    ON FULL LOAD
        ROWSOURCE CUSTOMERS WHERE CUSTOMERS.CUST_VALID = ‘A’;
        THIS = CUSTOMERS;
    INCREMENTAL LOAD
        ROWSOURCE CUSTOMERS;
        THIS = CUSTOMERS;
    ENDLOAD
END

Example of a code_block_load defined with builtin_schema_statement:

DEFINE VERSIONED DATASET INSCH_FL_CUSTOMERS_CF
    SCHEMA            
    [
        CUST_ID          NUMBER(38,0)    PRIMARYKEY,
        CUST_LAST_NAME   VARCHAR2(32),
        CUST_CITY_ID     NUMBER(38,0),
        CUST_VALID       VARCHAR2(32), 
        CUST_EFF_FROM    DATE          NOT NULL,
        CUST_EFF_TO      DATE
    ];
    ON FULL LOAD
        ROWSOURCE CUSTOMERS WHERE CUSTOMERS.CUST_VALID  = ‘A’;
        THIS = CUSTOMERS;  
    INCREMENTAL LOAD
        ROWSOURCE CUSTOMERS;
        THIS = CUSTOMERS;
    ENDLOAD
END