Oracle® Big Data Connectors User's Guide
Release 2 (2.0)

Part Number E36961-03

2 Oracle SQL Connector for Hadoop Distributed File System

This chapter describes how to use Oracle SQL Connector for Hadoop Distributed File System (HDFS) to facilitate data access between HDFS and Oracle Database.

This chapter contains the following sections:

  • About Oracle SQL Connector for HDFS

  • About External Tables

  • Using the ExternalTable Command-Line Tool

  • Creating External Tables

  • Publishing the HDFS Data Paths

  • Listing Location File Metadata and Contents

  • Describing External Tables

  • Querying Data in HDFS

  • Configuring Oracle SQL Connector for HDFS

2.1 About Oracle SQL Connector for HDFS

Using Oracle SQL Connector for HDFS, you can use Oracle Database to access and analyze data residing in HDFS files or a Hive table. You can also query and join data in HDFS or a Hive table with other database-resident data. If required, you can also load data into the database using SQL.

Oracle SQL Connector for HDFS is installed and configured on the system where Oracle Database runs. If Hive tables are used as data sources, then Oracle SQL Connector for HDFS must also be installed and running on the system where Hive is installed. See "Oracle SQL Connector for Hadoop Distributed File System Setup."

See Also:

$OSCH_HOME/doc/README.txt for information about known problems with Oracle SQL Connector for HDFS.

2.2 About External Tables

Oracle SQL Connector for HDFS uses external tables to provide Oracle Database with read access to Hive tables, and to delimited text files and Data Pump files in HDFS. An external table is an Oracle Database object that identifies the location of data outside of a database. Oracle Database accesses the data by using the metadata provided when the external table was created. By querying the external tables, you can access data stored in HDFS and Hive tables as if that data were stored in tables in a database.

Oracle SQL Connector for HDFS uses the ORACLE_LOADER access driver.

Because external tables are used to access data, all of the features and limitations of external tables apply. Queries are executed in parallel with automatic load balancing. However, update, insert, and delete operations are not allowed and indexes cannot be created on external tables. When an external table is accessed, a full table scan is always performed.

Oracle SQL Connector for HDFS can create external tables for the following data sources:

  • Data Pump files in HDFS

  • Delimited text files in HDFS

  • Hive tables

See "Creating External Tables".

Note:

Oracle SQL Connector for HDFS requires a patch to Oracle Database before the connector can access Data Pump files produced by Oracle Loader for Hadoop. To download this patch, go to http://support.oracle.com and search for bug 14557588.


2.2.1 What Are Location Files?

A location file is a file specified in the LOCATION clause of the external table. Oracle SQL Connector for HDFS creates location files that contain only the Universal Resource Identifiers (URIs) of the data files. A data file contains the data stored in HDFS.

2.2.2 Enabling Parallel Processing

To enable parallel processing with external tables, you must specify multiple files in the LOCATION clause of the external table. The number of files, also known as the degree of parallelism, determines the number of child processes started by the external table during a table read. Ideally, the degree of parallelism is no larger than the number of data files, to avoid idle child processes.
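For example, if an external table has two location files, then at most two child processes read the HDFS data, even if a higher degree of parallelism is requested. A minimal SQL sketch, assuming an external table named SALES_DP_XTAB that has two location files:

ALTER SESSION ENABLE PARALLEL QUERY;

-- With two location files, at most two parallel child processes read the HDFS data,
-- even though a degree of 4 is requested in the hint.
SELECT /*+ PARALLEL(s, 4) */ COUNT(*) FROM sales_dp_xtab s;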

2.2.3 Location File Management

The Oracle SQL Connector for HDFS command-line tool, ExternalTable, manages the location files of the external table. Location file management involves the following operations:

  • Generating new location files in the database directory after checking for name conflicts

  • Deleting existing location files in the database directory as necessary

  • Publishing data URIs to new location files

  • Altering the LOCATION clause of the external table to match the new location files

Location file management for the supported data sources is described in the following topics.

Data Pump file format

The ORACLE_LOADER access driver is required to access Data Pump files. The driver requires that each location file corresponds to a single Data Pump file in HDFS. Empty location files are not allowed, and so the number of location files in the external table must exactly match the number of data files in HDFS.

Oracle SQL Connector for HDFS automatically takes over location file management and ensures that the number of location files in the external table equals the number of Data Pump files in HDFS.

Delimited files in HDFS and Hive tables

The ORACLE_LOADER access driver has no limitation on the number of location files. Each location file can correspond to one or more data files in HDFS. The number of location files for the external table is suggested by the oracle.hadoop.exttab.locationFileCount configuration property.

See "Configuration Properties".

2.2.4 Location File Names

This is the format of a location file name:

osch-timestamp-number-n

In this syntax:

timestamp has the format yyyyMMddhhmmss, for example, 20121017103941 for October 17, 2012, at 10:39:41.

number is a random number used to prevent location file name conflicts among different tables.

n is an index used to prevent name conflicts between location files for the same table.

For example, osch-20121017103941-6807-1.

2.3 Using the ExternalTable Command-Line Tool

Oracle SQL Connector for HDFS provides a command-line tool named ExternalTable. This section describes the basic use of this tool. See "Creating External Tables" for the command syntax that is specific to your data source.

2.3.1 About ExternalTable

The ExternalTable tool uses the values of several properties to do the following tasks:

  • Create an external table

  • Populate the location files

  • Publish location files to an existing external table

  • List the location files

  • Describe an external table

You can specify these property values in an XML document or individually on the command line. See "Configuring Oracle SQL Connector for HDFS".

2.3.2 Altering HADOOP_CLASSPATH

Before using ExternalTable, add the following to the HADOOP_CLASSPATH environment variable:

  • $OSCH_HOME/jlib/*

If you are using Hive tables as data sources, then also add the following to HADOOP_CLASSPATH:

  • $HIVE_HOME/lib/*

  • $HIVE_CONF_DIR (if it is set), or $HIVE_HOME/conf
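For example, in the Bash shell (the OSCH_HOME installation path shown here is an assumption; substitute your own directories):

$ export OSCH_HOME="/opt/oracle/orahdfs-2.0.0"
$ export HADOOP_CLASSPATH="$OSCH_HOME/jlib/*:$HADOOP_CLASSPATH"
$ # For Hive table data sources only:
$ export HADOOP_CLASSPATH="$HADOOP_CLASSPATH:$HIVE_HOME/lib/*:${HIVE_CONF_DIR:-$HIVE_HOME/conf}"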

See "ExternalTable Command-Line Tool Example".

2.3.3 ExternalTable Command-Line Tool Syntax

This is the full syntax of the ExternalTable command-line tool:

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
[-conf config_file]... \ 
[-D property=value]... \
-createTable [--noexecute] 
  | -publish [--noexecute] 
  | -listLocations [--details]
  | -getDDL

Command Descriptions 

-conf config_file

Identifies the name of an XML configuration file containing properties needed by the command being executed. See "Configuring Oracle SQL Connector for HDFS".

-D property=value

Assigns a value to a specific property.

-createTable [--noexecute]

Creates an external table definition and publishes the data URIs to the location files of the external table. The output report shows the DDL used to create the external table and lists the contents of the location files.

Use the --noexecute option to see the execution plan of the command. The operation is not executed, but the report includes the details of the execution plan and any errors. Oracle recommends that you first execute a -createTable command with --noexecute.

-publish [--noexecute]

Publishes the data URIs to the location files of an existing external table.

Use the --noexecute option to see the execution plan of the command. The operation is not executed, but the report shows the planned SQL ALTER TABLE command and location files. The report also shows any errors. Oracle recommends that you first execute a -publish command with --noexecute.

-listLocations [--details]

Shows the location file content as text. With the --details option, this command provides a detailed listing. See "What Are Location Files?"

-getDDL

Prints the table definition of an existing external table. See "Describing External Tables."

2.4 Creating External Tables

You can create external tables automatically using the ExternalTable tool provided in Oracle SQL Connector for HDFS.

2.4.1 Creating External Tables with the ExternalTable Tool

To create an external table using the ExternalTable tool, follow the instructions for your data source:

  • Data Pump format files: "Creating External Tables from Data Pump Format Files"

  • Hive tables: "Creating External Tables from Hive Tables"

  • Delimited text files: "Creating External Tables from Delimited Text Files"

When the ExternalTable -createTable command finishes executing, the external table is ready for use.

To create external tables manually, follow the instructions in "Creating External Tables in SQL."

ExternalTable Syntax for -createTable

Use the following syntax to create an external table and populate its location files:

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable \
[-conf config_file]... \
[-D property=value]... \
-createTable [--noexecute]

2.4.2 Creating External Tables from Data Pump Format Files

Oracle SQL Connector for HDFS supports only Data Pump files produced by Oracle Loader for Hadoop, and does not support generic Data Pump files produced by Oracle Utilities.

Oracle SQL Connector for HDFS creates the external table definition for Data Pump files by using the metadata from the Data Pump file header. It uses the ORACLE_LOADER access driver with the disable_directory_link_check and preprocessor access parameters. It also uses a special access parameter named EXTERNAL VARIABLE DATA, which enables ORACLE_LOADER to read the Data Pump format files generated by Oracle Loader for Hadoop.

2.4.2.1 Required Properties

These properties are required:

  • oracle.hadoop.exttab.tableName

  • oracle.hadoop.exttab.sourceType=datapump

  • oracle.hadoop.exttab.dataPaths

  • oracle.hadoop.exttab.defaultDirectory

  • oracle.hadoop.connection.url

  • oracle.hadoop.connection.user

See "Configuring Oracle SQL Connector for HDFS" for descriptions of the properties used for this data source.

2.4.2.2 Optional Properties

This property is optional:

  • oracle.hadoop.exttab.logDirectory

2.4.2.3 Example

Example 2-1 creates an external table named SALES_DP_XTAB to read Data Pump files.

Example 2-1 Defining an External Table for Data Pump Format Files

Log in as the operating system user that Oracle Database runs under (typically the oracle user), and create a file-system directory:

$ mkdir /scratch/sales_dp_dir

Create a database directory and grant read and write access to it:

$ sqlplus / as sysdba
SQL> CREATE OR REPLACE DIRECTORY sales_dp_dir AS '/scratch/sales_dp_dir';
SQL> GRANT READ, WRITE ON DIRECTORY sales_dp_dir TO scott;

Create the external table:

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
-D oracle.hadoop.exttab.tableName=SALES_DP_XTAB \
-D oracle.hadoop.exttab.sourceType=datapump \
-D oracle.hadoop.exttab.dataPaths=hdfs:/user/scott/olh_sales_dpoutput/ \
-D oracle.hadoop.exttab.defaultDirectory=SALES_DP_DIR \
-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//myhost:1521/myservicename \
-D oracle.hadoop.connection.user=SCOTT \
-createTable

2.4.3 Creating External Tables from Hive Tables

Oracle SQL Connector for HDFS creates the external table definition from a Hive table by contacting the Hive metastore client to retrieve information about the table columns and the location of the table data. In addition, the Hive table data paths are published to the location files of the Oracle external table.

To read Hive table metadata, Oracle SQL Connector for HDFS requires that the Hive JAR files are included in the HADOOP_CLASSPATH variable. This means that Oracle SQL Connector for HDFS must be installed and running on a computer with a working Hive client.

Ensure that you add the Hive configuration directory to the HADOOP_CLASSPATH environment variable. You must have a correctly functioning Hive client.

For Hive managed tables, the data paths come from the warehouse directory.

For Hive external tables, the data paths from an external location in HDFS are published to the location files of the Oracle external table. Hive external tables can have no data, because Hive does not check whether the external location exists when the table is created. If the Hive table is empty, then one location file is published with just a header and no data URIs.

The Oracle external table is not a "live" Hive table. When changes are made to a Hive table, you must use the ExternalTable tool to either republish the data or create a new external table.

2.4.3.1 Hive Table Requirements

Oracle SQL Connector for HDFS supports non-partitioned Hive tables that are defined using ROW FORMAT DELIMITED and FILE FORMAT TEXTFILE clauses. Both Hive-managed tables and Hive external tables are supported.

Hive tables can be either bucketed or not bucketed. All primitive column types from Hive 0.7.1 (CDH3) are supported, as is the TIMESTAMP type.
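For example, a Hive table defined as in the following sketch satisfies these requirements (the table and column names are illustrative):

CREATE TABLE sales_country_us (
  prod_id     INT,
  cust_id     INT,
  time_id     TIMESTAMP,
  amount_sold DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;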

2.4.3.2 Required Properties

These properties are required for Hive table sources:

  • oracle.hadoop.exttab.tableName

  • oracle.hadoop.exttab.defaultDirectory

  • oracle.hadoop.exttab.sourceType=hive

  • oracle.hadoop.exttab.hive.tableName

  • oracle.hadoop.exttab.hive.databaseName

  • oracle.hadoop.connection.url

  • oracle.hadoop.connection.user

See "Configuring Oracle SQL Connector for HDFS" for descriptions of the properties used for this data source.

2.4.3.3 Optional Properties

This property is optional for Hive table sources:

  • oracle.hadoop.exttab.locationFileCount

2.4.3.4 Example

Example 2-2 creates an external table named SALES_HIVE_XTAB to read data from a Hive table.

Example 2-2 Defining an External Table for a Hive Table

Log in as the operating system user that Oracle Database runs under (typically the oracle user), and create a file-system directory:

$ mkdir /scratch/sales_hive_dir

Create a database directory and grant read and write access to it:

$ sqlplus / as sysdba
SQL> CREATE OR REPLACE DIRECTORY sales_hive_dir AS '/scratch/sales_hive_dir';
SQL> GRANT READ, WRITE ON DIRECTORY sales_hive_dir TO scott;

Create the external table:

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
-D oracle.hadoop.exttab.tableName=SALES_HIVE_XTAB \
-D oracle.hadoop.exttab.sourceType=hive \
-D oracle.hadoop.exttab.locationFileCount=2 \
-D oracle.hadoop.exttab.hive.tableName=sales_country_us \
-D oracle.hadoop.exttab.hive.databaseName=salesdb \
-D oracle.hadoop.exttab.defaultDirectory=SALES_HIVE_DIR \
-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//myhost:1521/myservicename \
-D oracle.hadoop.connection.user=SCOTT \
-createTable

2.4.4 Creating External Tables from Delimited Text Files

Oracle SQL Connector for HDFS creates the external table definition for delimited text files using configuration properties that specify the number of columns, the text delimiter, and optionally, the external table column names. All columns in the external table are VARCHAR2. If column names are not provided, they default to C1 to Cn, where n is the number of columns specified by the oracle.hadoop.exttab.columnCount property.
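A simplified sketch of the kind of table definition this produces, assuming three columns, the default comma terminator, and two location files (the connector generates the actual DDL, chooses the location file names, and may include additional access parameters):

CREATE TABLE "SCOTT"."SALES_DT_XTAB"
( "C1" VARCHAR2(4000),
  "C2" VARCHAR2(4000),
  "C3" VARCHAR2(4000)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_LOADER
  DEFAULT DIRECTORY "SALES_DT_DIR"
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream'
    FIELDS TERMINATED BY ','
  )
  LOCATION ('osch-20121017103941-6807-1', 'osch-20121017103941-6807-2')
)
PARALLEL REJECT LIMIT UNLIMITED;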

2.4.4.1 Required Properties

These properties are required for delimited text sources:

  • oracle.hadoop.exttab.tableName

  • oracle.hadoop.exttab.dataPaths

  • oracle.hadoop.exttab.columnCount or oracle.hadoop.exttab.columnNames

  • oracle.hadoop.exttab.defaultDirectory

  • oracle.hadoop.connection.url

  • oracle.hadoop.connection.user

See "Configuring Oracle SQL Connector for HDFS" for descriptions of the properties used for this data source.

2.4.4.2 Optional Properties

These properties are optional for delimited text sources:

  • oracle.hadoop.exttab.recordDelimiter

  • oracle.hadoop.exttab.fieldTerminator

  • oracle.hadoop.exttab.initialFieldEncloser

  • oracle.hadoop.exttab.trailingFieldEncloser

  • oracle.hadoop.exttab.locationFileCount

2.4.4.3 Example

Example 2-3 creates an external table named SALES_DT_XTAB from delimited text files.

Example 2-3 Defining an External Table for Delimited Text Files

Log in as the operating system user that Oracle Database runs under (typically the oracle user), and create a file-system directory:

$ mkdir /scratch/sales_dt_dir

Create a database directory and grant read and write access to it:

$ sqlplus / as sysdba
SQL> CREATE OR REPLACE DIRECTORY sales_dt_dir AS '/scratch/sales_dt_dir';
SQL> GRANT READ, WRITE ON DIRECTORY sales_dt_dir TO scott;

Create the external table:

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
-D oracle.hadoop.exttab.tableName=SALES_DT_XTAB \
-D oracle.hadoop.exttab.locationFileCount=2 \
-D oracle.hadoop.exttab.dataPaths="hdfs://user/scott/olh_sales/*.dat" \
-D oracle.hadoop.exttab.columnCount=10 \
-D oracle.hadoop.exttab.defaultDirectory=SALES_DT_DIR \
-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//myhost:1521/myservicename \
-D oracle.hadoop.connection.user=SCOTT \
-createTable

2.4.5 Creating External Tables in SQL

You can create an external table manually for Oracle SQL Connector for HDFS. For example, the following procedure enables you to use external table syntax that is not exposed by the ExternalTable -createTable command.

Additional syntax might not be supported for Data Pump format files.

To create an external table manually:  

  1. Use the -createTable --noexecute command to generate the external table DDL.

  2. Make whatever changes are needed to the DDL.

  3. Run the DDL from Step 2 to create the table definition in the Oracle database.

  4. Use the ExternalTable -publish command to publish the data URIs to the location files of the external table.
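A sketch of this workflow, assuming a configuration file named /home/oracle/sales_config.xml that sets the required properties (the file name is hypothetical):

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
-conf /home/oracle/sales_config.xml \
-createTable --noexecute

Copy the DDL from the report into a SQL script, make your changes, and run the script in SQL*Plus. Then publish the data URIs:

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
-conf /home/oracle/sales_config.xml \
-publish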

2.5 Publishing the HDFS Data Paths

The -createTable command creates the metadata in Oracle Database and populates the location files with the Universal Resource Identifiers (URIs) of the data files in HDFS. However, you might publish the URIs as a separate step from creating the external table in cases like these:

  • You created the external table manually, as described in "Creating External Tables in SQL," and must now populate its location files.

  • The data in HDFS changed after the external table was created, for example, because new files were added or a Hive table was reloaded.

In both cases, you can use ExternalTable with the -publish command to populate the external table location files with the URIs of the data files in HDFS. See "Location File Management".

ExternalTable Syntax for Publish

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
[-conf config_file]... \
[-D property=value]... \
-publish [--noexecute]

ExternalTable Command-Line Tool Example

Example 2-4 sets HADOOP_CLASSPATH and publishes the HDFS data paths to the external table created in Example 2-1. See "Altering HADOOP_CLASSPATH" for more information about setting this environment variable.

Example 2-4 Publishing HDFS Data Paths to an External Table for Data Pump Format Files

This example uses the Bash shell.

$ export HADOOP_CLASSPATH="$OSCH_HOME/jlib/*"
$ $HADOOP_HOME/bin/hadoop jar \
$OSCH_HOME/jlib/orahdfs.jar oracle.hadoop.exttab.ExternalTable \
-D oracle.hadoop.exttab.tableName=SALES_DP_XTAB \
-D oracle.hadoop.exttab.sourceType=datapump \
-D oracle.hadoop.exttab.dataPaths=hdfs:/user/scott/data/ \
-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//myhost:1521/myservicename \
-D oracle.hadoop.connection.user=SCOTT -publish

In this example:

  • HADOOP_HOME is an environment variable pointing to the Hadoop home directory.

  • OSCH_HOME is an environment variable pointing to the Oracle SQL Connector for HDFS installation directory.

  • SALES_DP_XTAB is the external table created in Example 2-1.

  • hdfs:/user/scott/data/ is the location of the HDFS data.

  • jdbc:oracle:thin:@//myhost:1521/myservicename is the database connection string.

2.6 Listing Location File Metadata and Contents

The -listLocations command is a debugging and diagnostic utility that enables you to see the location file metadata and contents. You can use this command to verify the integrity of the location files of an Oracle external table.

These properties are required to use this command:

  • oracle.hadoop.exttab.tableName

  • oracle.hadoop.connection.url

  • oracle.hadoop.connection.user

ExternalTable Syntax for -listLocations

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
[-conf config_file]... \ 
[-D property=value]... \
-listLocations [--details]
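For example, the following command lists the location files of the SALES_DP_XTAB table created in Example 2-1 (the connection values are illustrative):

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
-D oracle.hadoop.exttab.tableName=SALES_DP_XTAB \
-D oracle.hadoop.connection.url=jdbc:oracle:thin:@//myhost:1521/myservicename \
-D oracle.hadoop.connection.user=SCOTT \
-listLocations --details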

2.7 Describing External Tables

The -getDDL command is a debugging and diagnostic utility that prints the definition of an existing external table. This command follows the security model of the PL/SQL DBMS_METADATA package, which enables non-privileged users to see the metadata for their own objects.

These properties are required to use this command:

  • oracle.hadoop.exttab.tableName

  • oracle.hadoop.connection.url

  • oracle.hadoop.connection.user

ExternalTable Syntax for -getDDL

$HADOOP_HOME/bin/hadoop jar $OSCH_HOME/jlib/orahdfs.jar \
oracle.hadoop.exttab.ExternalTable \
[-conf config_file]... \
[-D property=value]... \
-getDDL

2.8 Querying Data in HDFS

Parallel processing is extremely important when you are working with large volumes of data. When you use external tables, always enable parallel query with this SQL command:

ALTER SESSION ENABLE PARALLEL QUERY;

Before loading the data into an Oracle database from the external tables created by Oracle SQL Connector for HDFS, enable parallel DDL:

ALTER SESSION ENABLE PARALLEL DDL;

Before inserting data into an existing database table, enable parallel DML with this SQL command:

ALTER SESSION ENABLE PARALLEL DML;

Hints such as APPEND and PQ_DISTRIBUTE also improve performance when you are inserting data.
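For example, a direct-path, parallel load of the data into an existing database table might look like the following sketch (the target table SALES is assumed to have the same column structure as the external table):

ALTER SESSION ENABLE PARALLEL DML;

-- Direct-path insert from the external table into a database table.
INSERT /*+ APPEND */ INTO sales
  SELECT * FROM sales_dp_xtab;

COMMIT;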

2.9 Configuring Oracle SQL Connector for HDFS

You can pass configuration properties to the ExternalTable tool on the command line with the -D option, or you can create a configuration file and pass it on the command line with the -conf option. These options must precede the command to be executed (-createTable, -publish, -listLocations, or -getDDL).

See "ExternalTable Command-Line Tool Syntax".

2.9.1 Creating a Configuration File

A configuration file is an XML document with a very simple structure as follows:

<?xml version="1.0"?>
<configuration>
  <property>
    <name>property</name>
    <value>value</value>
  </property>
     .
     .
     .
</configuration>

Example 2-5 shows a configuration file. See "Configuration Properties" for descriptions of these properties.

Example 2-5 Configuration File for Oracle SQL Connector for HDFS

<?xml version="1.0"?>
<configuration>
  <property>
    <name>oracle.hadoop.exttab.tableName</name>
    <value>SH.SALES_EXT_DIR</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.dataPaths</name>
    <value>/data/s1/*.csv,/data/s2/*.csv</value>
  </property>
  <property>
    <name>oracle.hadoop.exttab.dataCompressionCodec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
  <property>
    <name>oracle.hadoop.connection.url</name>
    <value>jdbc:oracle:thin:@//myhost:1521/myservicename</value>
  </property>
  <property>
    <name>oracle.hadoop.connection.user</name>
    <value>SH</value>
  </property>
</configuration>

2.9.2 Configuration Properties

The following is a complete list of the configuration properties used by the ExternalTable command-line tool. The properties are organized into these categories:

  • General Properties

  • Connection Properties

  • Hive Properties

General Properties 

oracle.hadoop.exttab.columnCount

The number of columns for the external table created from delimited text files. The column names are set to C1, C2,... Cn, where n is the value of this property.

This property is ignored if oracle.hadoop.exttab.columnNames is set.

The -createTable command uses this property when oracle.hadoop.exttab.sourceType=text.

You must set one of these properties when creating an external table from delimited text files:

  • oracle.hadoop.exttab.columnNames

  • oracle.hadoop.exttab.columnCount

oracle.hadoop.exttab.columnNames

A comma-separated list of column names for an external table created from delimited text files. If this property is not set, then the column names are set to C1, C2,... Cn, where n is the value of the oracle.hadoop.exttab.columnCount property.

The value of this property is case insensitive. All column names are uppercased. Embedded commas are not allowed in column names.

The -createTable command uses this property when oracle.hadoop.exttab.sourceType=text.

You must set one of these properties when creating an external table from delimited text files:

  • oracle.hadoop.exttab.columnNames

  • oracle.hadoop.exttab.columnCount
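For example, column names can be supplied on the command line as follows (the names are illustrative):

-D oracle.hadoop.exttab.columnNames=PROD_ID,CUST_ID,TIME_ID,AMOUNT_SOLD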

oracle.hadoop.exttab.dataCompressionCodec

The name of the compression codec class used for the data files. Optional.

Default value: None

oracle.hadoop.exttab.dataPathFilter

The path filter class. This property is ignored for Hive data sources.

Oracle SQL Connector for HDFS uses a default filter to exclude hidden files, which begin with a dot or an underscore. If you specify another path filter class using this property, then your filter acts in addition to the default filter. Thus, only visible files accepted by your filter are considered.

oracle.hadoop.exttab.defaultDirectory

Specifies the default directory for the Oracle external table. This directory is used for all input and output files that do not explicitly name a directory object.

Valid value: The name of an existing database directory

Unquoted names are changed to upper case. Double-quoted names are not changed; use them when case-sensitivity is desired. Single-quoted names are not allowed for default directory names.

The -createTable command requires this property.

oracle.hadoop.exttab.fieldTerminator

Specifies the field terminator for an external table when oracle.hadoop.exttab.sourceType=text. Optional.

Default value: , (comma)

Valid values: A string in one of the following formats:

  • One or more regular printable characters; it cannot start with \u.

  • One or more encoded characters in the format \uHHHH, where HHHH is a big-endian hexadecimal representation of the character in UTF-16. The hexadecimal digits are case insensitive.

Do not mix the two formats.
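For example, a tab-delimited source can be described with the encoded form of the terminator, because a tab character is U+0009 in UTF-16:

-D oracle.hadoop.exttab.fieldTerminator=\u0009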

oracle.hadoop.exttab.hive.databaseName

The name of a Hive database that contains the input data table.

The -createTable command requires this property when oracle.hadoop.exttab.sourceType=hive.

oracle.hadoop.exttab.hive.tableName

The name of an existing Hive table.

The -createTable command requires this property when oracle.hadoop.exttab.sourceType=hive.

oracle.hadoop.exttab.initialFieldEncloser

Specifies the initial field encloser for an external table created from delimited text files. Optional.

Default value: null; no enclosers are specified for the external table definition.

The -createTable command uses this property when oracle.hadoop.exttab.sourceType=text.

Valid values: A string in one of the following formats:

  • One or more regular printable characters; it cannot start with \u.

  • One or more encoded characters in the format \uHHHH, where HHHH is a big-endian hexadecimal representation of the character in UTF-16. The hexadecimal digits are case insensitive.

Do not mix the two formats.

oracle.hadoop.exttab.locationFileCount

Specifies the desired number of location files for the external table. Applicable only to non-Data-Pump files.

Default value: 4

This property is ignored if the data files are in Data Pump format. Otherwise, the number of location files is the lesser of:

  • The number of data files

  • The value of this property

At least one location file is created.

See "Enabling Parallel Processing" for more information about the number of location files.

oracle.hadoop.exttab.logDirectory

Specifies a database directory where log files, bad files, and discard files are stored. The file names are the default values used by external tables. For example, the name of a log file is the table name followed by _%p.log.

This is an optional property for the -createTable command.

These are the default file name extensions:

  • Log files: log

  • Bad files: bad

  • Discard files: dsc

Valid values: An existing Oracle directory object name.

Unquoted names are uppercased. Quoted names are not changed. Table 2-1 provides examples of how values are transformed.

Table 2-1 Examples of Quoted and Unquoted Values

Specified Value                      Interpreted Value
my_log_dir:'sales_tab_%p.log'        MY_LOG_DIR/sales_tab_%p.log
'my_log_dir':'sales_tab_%p.log'      my_log_dir/sales_tab_%p.log
"my_log_dir":"sales_tab_%p.log"      my_log_dir/sales_tab_%p.log


oracle.hadoop.exttab.preprocessorDirectory

Specifies the database directory for the preprocessor. The file-system directory must contain the hdfs_stream script.

Default value: OSCH_BIN_PATH

The preprocessor directory is used in the PREPROCESSOR clause of the external table.

oracle.hadoop.exttab.recordDelimiter

Specifies the record delimiter for an external table created from delimited text files. Optional.

Default value: \n

The -createTable command uses this property when oracle.hadoop.exttab.sourceType=text.

Valid values: A string in one of the following formats:

  • One or more regular printable characters; it cannot start with \u.

  • One or more encoded characters in the format \uHHHH, where HHHH is a big-endian hexadecimal representation of the character in UTF-16. The hexadecimal digits are case insensitive.

Do not mix the two formats.

oracle.hadoop.exttab.sourceType

Specifies the type of source files.

The valid values are datapump, hive, and text.

Default value: text

The -createTable and -publish operations use the value of this property.

oracle.hadoop.exttab.trailingFieldEncloser

Specifies the trailing field encloser for an external table created from delimited text files. Optional.

Default value: null; defaults to the value of oracle.hadoop.exttab.initialFieldEncloser

The -createTable command uses this property when oracle.hadoop.exttab.sourceType=text.

Valid values: A string in one of the following formats:

  • One or more regular printable characters; it cannot start with \u.

  • One or more encoded characters in the format \uHHHH, where HHHH is a big-endian hexadecimal representation of the character in UTF-16. The hexadecimal digits are case insensitive.

Do not mix the two formats.

Connection Properties 

oracle.hadoop.connection.url

Specifies the database connection URL. This property takes precedence over all other connection properties.

If an Oracle wallet is configured as an external password store, then the property value must have the form jdbc:oracle:thin:@db_connect_string, where db_connect_string exactly matches the credential defined in the wallet.

Default value: Not defined

Valid values: A string

oracle.hadoop.connection.user

An Oracle database log-in name.

Default value: Not defined

Valid values: A string

oracle.hadoop.connection.tnsEntryName

Specifies a TNS entry name defined in the tnsnames.ora file.

This property is used with the oracle.hadoop.connection.tns_admin property.

Default value: Not defined

Valid values: A string

oracle.hadoop.connection.tns_admin

A file path to a directory containing SQL*Net configuration files such as sqlnet.ora and tnsnames.ora. Define this property to use transparent network substrate (TNS) entry names in database connection strings.

This property must be set when using Oracle Wallet as an external password store. See oracle.hadoop.connection.wallet_location.

Default value: Not defined; the value of the TNS_ADMIN environment variable is used.

Valid values: A string

oracle.hadoop.connection.wallet_location

A file path to an Oracle wallet directory where the connection credential is stored.

Default value: Not defined

Valid values: A string

When using Oracle Wallet as an external password store, set these properties:

  • oracle.hadoop.connection.wallet_location

  • oracle.hadoop.connection.url or oracle.hadoop.connection.tnsEntryName

  • oracle.hadoop.connection.tns_admin
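A sketch of the corresponding entries in a configuration file (the wallet location and TNS settings shown are illustrative):

<property>
  <name>oracle.hadoop.connection.wallet_location</name>
  <value>/home/oracle/wallets/osch</value>
</property>
<property>
  <name>oracle.hadoop.connection.tnsEntryName</name>
  <value>myservicename</value>
</property>
<property>
  <name>oracle.hadoop.connection.tns_admin</name>
  <value>/home/oracle/network/admin</value>
</property>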

Hive Properties 

hive.metastore.local

Indicates whether the Hive metastore is local or remote.

Default value: true

Set to true if the metastore is local, or false if it is remote. For a remote metastore, you must specify hive.metastore.uris.

hive.metastore.uris

URI of a remote Hive metastore. Hive connects to this URI to make metadata requests.

hive.metastore.warehouse.dir

A default location in HDFS for the Hive tables.

Example: /user/beeswax/warehouse

javax.jdo.option.ConnectionDriverName

JDBC Driver class name for the data store that contains the Hive metadata.

Example: This example specifies the JDBC driver class for MySQL Database:

com.mysql.jdbc.Driver

javax.jdo.option.ConnectionPassword

The password for connecting to a data store such as MySQL Database.

javax.jdo.option.ConnectionURL

JDBC connection string for the data store that contains Hive metadata.

Example: This is the syntax for the connection URL when the Hive metadata is stored in MySQL Database:

jdbc:mysql://host-name/database-name?createDatabaseIfNotExist=true

javax.jdo.option.ConnectionUserName

The user name for connecting to a data store such as MySQL Database.