Specifying Properties for Load to Table Option

  • T2T
  • T2H
  • H2H
  • F2H
  • H2T
  • F2T

For T2T definition:

Figure 7-11 Properties window


This image displays the Properties window.

Table 7-1 Property Name and Value in the Properties window

Property Name Property Value
Constraints
Delete Duplicate Select Yes if you want to delete the duplicate records after insertion when Primary Keys are disabled.
Disable Primary Key

Select Yes to disable Primary Key while loading the data.

In case of Batch and Bulk modes, if any of the foreign keys are in Disabled state before loading the data using T2T or the property Disable Primary Key is set to Yes, then all the Primary Keys and corresponding Foreign Keys are disabled before loading and are enabled back after loading. Hence the initial status of foreign and primary keys can be changed from Disabled to Enabled.

In case of Direct mode, if the Disable Primary Key property is not set (selected as No), then the Delete Duplicate property is set to Yes automatically, which in turn reports all the duplicate records in the error log table.

File
Frequency

Select the frequency of loading the data file into Data Warehouse. This property can be used to schedule Batch operations.

The options are Daily, Weekly, Monthly, Quarterly, Yearly, and One Time Load.

Load Empty If this is set to Yes, the task will be successful even if there are no records to load or if all the records are discarded or rejected.
MIS Date Field Specify the MIS Date field in the source data file. If MIS Date is not part of the download, then you can use the MISDate () function in the Data Mapping window to add MIS Date to the table automatically.
Loading
Load Previous Set to Yes if you want to load the data of the previous period when the current period data is not available.
Loading Type

Select the loading type from the drop-down list. The options are:

  • Insert- The records will be overwritten.
  • Append- The records will be appended to the target table.
Read Priority Choose the priority of reading the data from either Memory Store or Persistent Store, from the drop-down list.
Write Priority Choose the priority of writing the data into either Memory Store or Persistent Store, from the drop-down list.
LoadingMode
Record Load Limit If the number of records in the source table exceeds the Record Load Limit value, the data loading will not happen. If the value is set as 0 or not specified, the record count check is skipped.
Direct or Batch or Bulk

Specify the Loading Mode as Direct, Batch, or Bulk.

In Bulk Mode of loading, note that:

Loading is possible only when the target database and the data source created for the definition are in the same database.

If the schema used for source and target is different but the database is the same, then the target schema should be granted “Select” access for the source table.

You cannot specify the Batch Size and commit happens at the end of batch load.

Batch loading is faster for fewer records as compared to a larger number of records that sometimes leads to loss of data while loading.

Batch Size

Specify the Batch Size if you want to load the records in batches. The ideal values for batch sizes are 1024, 2048, 10000, or 20000. Huge batch sizes may result in failure if the required system resources are not available.

If it is not specified, commit is done on the entire set.

Source Fetch Size

Specify the Source Fetch Size for fetching data from the source system.

For T2T definitions, Source Fetch size is applicable to both Batch and Direct loading methods.

For example, the default Source Fetch Size for Oracle JDBC is 10.

Rejection
Rejection Threshold

Enter the maximum errors in absolute value that a Data File can have and the Data Load will be marked successful.

After the erroneous record count exceeds the Rejection Threshold value, the data loading task will fail and the inserted values will be rolled back for that table. Inserts for the previous tables won't be reverted. Rejection Threshold will be applied to each target table individually in a batch.

By default, the value is set as UNLIMITED.

Note the behavior of Rejection Threshold and Rejection Threshold %:

  • Rejection Threshold is checked before Rejection Threshold %. If you set a value for Rejection Threshold, it will be considered as the rejection limit and any value given to Rejection Threshold % is not considered.
  • If you set the Rejection Threshold as UNLIMITED or blank, it checks for Rejection Threshold % and the value set for Rejection Threshold % will be taken as rejection limit.
  • If you set both Rejection Threshold and Rejection Threshold % as UNLIMITED or blank, the whole Data file will be loaded irrespective of the number of errors.
Rejection Threshold %

Set Rejection Threshold as a percentage of the number of rows in the Data file.

Enter the maximum errors that a Data File can have as a percentage of the number of rows in the data file and the Data Load will be marked as successful.

By default, the value is set as UNLIMITED.

Rejection Threshold % is considered only if Rejection Threshold is set to UNLIMITED or blank.

For T2H definition:

Figure 7-12 Properties window


This image displays the Properties window.

Table 7-2 Property Name and Value in the Properties window

Property Name Property Value
Loading
Loading Type

Select the loading type from the drop-down list. The options are:

Insert- The records will be overwritten.

Append- The records will be appended to the target table.

Read Priority Choose the priority of reading the data from either Memory Store or Persistent Store, from the drop-down list.
Write Priority Choose the priority of writing the data into either Memory Store or Persistent Store, from the drop-down list.
Loading Mode
Record Load Limit If the number of records in the source table exceeds the Record Load Limit value, the data loading will not happen. If the value is set as 0 or not specified, the record count check is skipped.
Source Fetch Size

Specify the Source Fetch Size for fetching data from the source system.

For example, the default Source Fetch Size for Oracle JDBC is 10.

Sqoop
Split By Column

This is applicable only if you are using Sqoop for loading data into Hive tables.

Specify the Split By Column in the format “TableName.ColumnName”. It should not be an expression. Additionally, the column should not be of data type “Date” and it should not have Null data.

This is a mandatory field for T2H executions using Sqoop.

If you have not provided any value for this field, the T2H Sqoop engine defaults the value to the last mapped source column.

Ideally, you should set the Split-by column to a PK numeric column. If the split by column is String-based, Generic Options property needs to be set to -Dorg.apache.sqoop.splitter.allow_text_splitter=true.

Generic Options

This field is applicable only in Sqoop SSH mode.

Specify the generic arguments that will be appended before all the tool-specific arguments. For example, -Doraoop.nologging=true

Specific Options

This field is applicable only in Sqoop SSH mode.

Specify any tool specific arguments that will be appended at the end of the Sqoop command. For example, --connection-param-file ora.properties --update-mode allowinsert --update-key <COLUMN_NAME>

For H2H Definition:

Figure 7-13 Properties window


This image displays the Properties window.

Table 7-3 Property Name and Value in the Properties window

Property Name Property Value
Loading
Loading Type

Select the loading type from the drop-down list. The options are:

Insert- The records will be overwritten.

Append- The records will be appended to the target table.

Read Priority Choose the priority of reading the data from either Memory Store or Persistent Store, from the drop-down list.
Write Priority Choose the priority of writing the data into either Memory Store or Persistent Store, from the drop-down list.
LoadingMode
Record Load Limit If the number of records in the source table exceeds the Record Load Limit value, the data loading will not happen. If the value is set as 0 or not specified, the record count check is skipped.

For F2H Definition

Figure 7-14 Properties window


This image displays the Properties window.

Table 7-4 Property Name and Value in the Properties window

Property Name Property Value
File
Data File

Enter the name of the Data File that needs to be extracted. You can specify multiple files separated by ‘/’.

This property is useful to create metadata definitions for multiple Flat-Files of the same structure by copying the Definition File.

Hive and Impala
Is File Local To Hive Server Select Yes if the file is on the server where HiveServer is running, else select No from the drop-down list. This is applicable only for remote file source.
Loading
Loading Type

Select the loading type from the drop-down list. The options are:

Insert- The records will be overwritten.

Append- The records will be appended to the target table.

Read Priority Choose the priority of reading the data from either Memory Store or Persistent Store, from the drop-down list.
Write Priority Choose the priority of writing the data into either Memory Store or Persistent Store, from the drop-down list.

For H2T Definition

Figure 7-15 Properties window


This image displays the Properties window.

Table 7-5 Property Name and Value in the Properties window

Property Name Property Value
Loading
Loading Type

Select the loading type from the drop-down list. The options are:

Insert- The records will be overwritten.

NOTE:

Limitation: In the Insert Mode for H2T SQOOP Execution, the Target Tables are truncated. If a Task fails, the changes cannot be rolled back.

Append- The records will be appended to the target table.

Read Priority Choose the priority of reading the data from either Memory Store or Persistent Store, from the drop-down list.
Write Priority Choose the priority of writing the data into either Memory Store or Persistent Store, from the drop-down list.
Loading Mode
Record Load Limit If the number of records in the source table exceeds the Record Load Limit value, the data loading will not happen. If the value is set as 0 or not specified, the record count check is skipped.
Batch Size

Specify the Batch Size if you want to load the records in batches. The ideal values for batch sizes are 1024, 2048, 10000, or 20000. Huge batch sizes may result in failure if the required system resources are not available.

If it is not specified, commit is done on the entire set.

Rejection
Rejection Threshold

Enter the maximum errors in absolute value that a Data File can have and the Data Load will be marked successful.

Once the erroneous record count exceeds the Rejection Threshold value, the data loading task will fail and the inserted values will be rolled back for that table. Inserts for the previous tables won't be reverted. Rejection Threshold will be applied to each of the target tables individually in a batch.

By default, the value is set as UNLIMITED.

Sqoop
Generic Options

This field is applicable only in Sqoop SSH mode.

Specify the generic arguments which will be appended before all the tool-specific arguments. For example, -Doraoop.nologging=true

Specific Options

This field is applicable only in Sqoop SSH mode.

Specify any tool-specific arguments, which will be appended at the end of the Sqoop command. For example, --connection-param-file ora.properties --update-mode allowinsert --update-key <COLUMN_NAME>

NOTE:

To parse the date column values, set this property as shown in the follows:

  • In Sqoop cluster:

--connection-param-file <path to the ora.properties file on the sqoop node>

  • In Sqoop client mode:

--connection-param-file $FIC_DB_HOME/bin/ora.properties

Update the ora.properties file with the following parameter:

oracle.jdbc.mapDateToTimestamp=false

Use Staging Select Yes to use a staging table during Sqoop export.

For F2T Definition

Figure 7-16 Model Dialog window


This image displays the Model Dialog window.

Table 7-6 Property Name and Value in the Properties window

Property Name Property Value
File
Frequency

Select the frequency of loading the data file into Data Warehouse. This property can be used to schedule Batch operations.

The options are Daily, Weekly, Monthly, Quarterly, Yearly, and One Time Load.

MIS Date Field Specify the MIS Date field in the source data file. If MIS Date is not part of the download, then use the MISDate() function in the Data Mapping window to add MIS Date to the table automatically.
Data File

Enter the data file name if it is different from the Definition name. This property is useful to create metadata definitions for multiple Flat-Files of the same structure by copying the Definition File.

Note: For F2T CPP execution, you should not enter “/ “ in the Data File name.

Load Empty If this is set to Yes, the task will be successful, even if there are no records to load or if all the records are discarded or rejected.
Prefix Enter the string that is prefixed with the data file name separated by an underscore (_).
Suffix
  • Select No if the data file name is not suffixed.
  • Select Information Date if the data file name is suffixed with Information Date or MIS Date in YYYYMMDD format separated by an underscore (_).
Constraints
Disable Primary Key

Select Yes to disable Primary Key while loading the data.

In case of Batch and Bulk modes if any of the foreign keys are in Disabled state before loading the data using T2T or the property Disable Primary Key is set to Yes, then all the Primary Keys and corresponding Foreign Keys are disabled before loading and are enabled back after loading. Hence the initial status of foreign and primary keys can be changed from Disabled to Enabled.

In case of Direct mode, if the Disable Primary Key property is not set (selected as No), then the Delete Duplicate property is set to Yes automatically, which in turn reports all the duplicate records in the error log table.

Disable Check Constraints Select Yes if you want to disable the Check Constraints on columns of the table or select No to load with the constraints enabled.
Loading Mode
Record Load Limit If the number of records in the source file exceeds the Record Load Limit value, the data loading will not happen. If the value is set as 0 or not specified, the record count check is skipped.
Loading
Load Previous Set to Yes if you want to load the data of the previous period when the current period data is not available.
Loading Type

Select the loading type from the drop-down list. The options are:

  • Insert- The records will be overwritten.
  • Append- The records will be appended to the target table.
Duplicate Row
Duplicate Row Checks Select Yes to check for Duplicate Rows and to remove them from the Data File.
Duplicate Row This field determines which of the Duplicate Record(s) to be removed if found. The options are Keep Last Occurrence and Keep First Occurrence.
Misc
Abort-Failure Condition Select Stop to stop the loading on reaching the Rejection Threshold. Select Continue to ensure the reading of the entire Data File.
Query Enter the Query that needs to be executed before file loading.
Discard Max Enter the maximum errors allowed for SQL*Loader Discards while loading.
Edit and Reload Select Yes to have the option of editing the error file and re-loading it.
Oracle
Continue If Enter a condition which when satisfied will continue the file load.
Direct Load
  • Select Yes to do Fast Load into the Oracle Database only if you have not defined any target expressions.
  • Select Force to do Fast Load into the Oracle Database if target expressions have only constant values.
  • Select No if you do not want to enable Fast Load.
Load When Enter a condition which when satisfied will start the file load.
Parallel Load Select Yes to load the data in parallel into the Database table for faster loading, else select No.
Preserve Blanks Select Yes to retain blank values in the Data without trimming.
BINDSIZE For conventional path loads, BINDSIZE specifies the maximum size (bytes) of the bind array. The size of the bind array given by BINDSIZE overrides the default size (which is system dependent) and any size determined.
Number of ROWS

For conventional path loads, ROWS specifies the number of rows in the bind array.

For direct path loads, ROWS identifies the number of rows you want to read from the data file before a data save. The default is to read all rows and save data once at the end of the load.

Trailing Null Columns Select Yes to retain Trailing Null Columns in the Data File.
Growth
Incremental Growth Enter the Incremental Growth of Data in absolute values over the previous period.
Incremental Growth % Enter the Incremental Growth of Data in percentage over the previous period.
Rejection
Rejection Threshold

Enter the maximum errors in absolute value that a Data File can have and the Data Load will be marked successful.

After the erroneous record count exceeds the Rejection Threshold value, the data loading task will fail and the inserted values will be rolled back for that table. Inserts for the previous tables won't be reverted. Rejection Threshold will be applied to each of the target tables individually in a batch.

By default, the value is set as UNLIMITED.

Rejection Threshold is considered only if Rejection Threshold % is set to UNLIMITED or blank.

If you set both Rejection Threshold % and Rejection Threshold as UNLIMITED or blank, the whole Data file will be loaded irrespective of the number of errors.

Rejection Threshold %

Set Rejection Threshold as a percentage of the number of rows in the Data file.

Enter the maximum errors as a percentage of the number of rows in the data file, which a Data File can have and the Data Load will be marked as successful.

By default, the value is set as UNLIMITED.

Note the behavior of Rejection Threshold % and Rejection Threshold:

  • Rejection Threshold % is checked before Rejection Threshold. If you set a value for Rejection Threshold %, it will be considered as the rejection limit and it will not check Rejection Threshold.
  • If you set Rejection Threshold % as UNLIMITED or blank, it checks for Rejection Threshold and the value set for Rejection Threshold will be taken as rejection limit.
  • If you set both Rejection Threshold and Rejection Threshold % as UNLIMITED or blank, the whole Data file will be loaded irrespective of the number of errors.