3 Oracle Loader for Hadoop

This chapter describes how to use Oracle Loader for Hadoop to copy data from Hadoop files into tables in an Oracle database. It contains the following sections:

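What Is Oracle Loader for Hadoop?
Using Oracle Loader for Hadoop
Output Modes When Invoking OraLoader
Balancing Loads When Loading Data into Partitioned Tables
Key Configuration Properties for Load Balancing
OraLoader Configuration Properties
Example of Using Oracle Loader for Hadoop
Target Table Characteristics
Loader Map XML Schema Definition
OraLoader Configuration Properties for Hadoop
Third-Party Licenses for Bundled Software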

What Is Oracle Loader for Hadoop?

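Oracle Loader for Hadoop is an efficient, high-performance loader for moving data quickly from an Apache Hadoop environment into a table in an Oracle database. It prepares data for loading into partitioned tables by pre-partitioning the data and converting it into an Oracle-ready format. Optionally, it sorts records by primary key before loading the data or creating output files. It runs as a MapReduce job, so the pre-partitioning and conversion steps execute on the Hadoop cluster.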

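Note:

Partitioning is a database feature for managing and efficiently querying very large tables. It decomposes a large table into smaller, more manageable pieces called partitions, in a way that is completely transparent to applications. For more information about partitioning, see Oracle Database VLDB and Partitioning Guide.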

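After the pre-partitioning and conversion steps, there are two modes for loading the data from the Hadoop cluster into an Oracle database:

Online database mode: The data is loaded into the database using either the JDBC output format or the OCI Direct Path output format. The OCI Direct Path output format performs a high-performance direct path load of the target table. The JDBC output format performs a conventional path load. For details about the online load methods, including the restrictions of the OCI Direct Path output format, see "Output Modes When Invoking OraLoader."
Offline database mode: The reducer nodes create output files in binary or text format. The Data Pump output format creates binary format files that can be loaded into an Oracle database using an external table and the ORACLE_DATAPUMP access driver. The delimited text output format creates text files in delimited record format. (When the delimiter is a comma, this is usually called comma-separated values (CSV) format.) These text files can be loaded into an Oracle database using an external table and the ORACLE_LOADER access driver. They can also be loaded with the SQL*Loader utility.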

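Using Oracle Loader for Hadoop

This section describes the following steps for using Oracle Loader for Hadoop:

Implementing InputFormat
Creating a loaderMap Document
Accessing Table Metadata
Invoking OraLoader
Loading Files into an Oracle Database (Offline Load Only)

For installation instructions, see Chapter 1.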

Implementing InputFormat

Oracle Loader for Hadoop is a MapReduce application that obtains its input from an org.apache.hadoop.mapreduce.RecordReader implementation, as provided by the org.apache.hadoop.mapreduce.InputFormat class specified in the mapreduce.inputformat.class configuration property. Oracle Loader for Hadoop requires that the RecordReader return an Avro IndexedRecord from the getCurrentKey() method. The method signature is:

public org.apache.avro.generic.IndexedRecord getCurrentKey()     
throws IOException, InterruptedException;

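Oracle Loader for Hadoop uses the schema of the IndexedRecord to discover the names of the input fields and map them to the columns of the table being loaded. This mapping is described in detail in the following sections.

Oracle Loader for Hadoop provides two built-in input formats. It also includes source code for two sample InputFormat classes, located in the jsrc/ directory. Table 3-1 lists all of these input formats, the type of input each one handles, and how each one generates the Avro schema field names. (To load data into the target table, you need to understand how the InputFormat generates field names.)

The built-in input format classes are described in the following sections. For details about the samples, see their source code and the Javadoc for these classes.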

Table 3-1 InputFormat Classes, Input Types, and Field Names

Class	Input Type	Avro Schema Field Names
`oracle.hadoop.loader.lib.input.HiveToAvroInputFormat`	Hive table source	Hive table column names (uppercase)
`oracle.hadoop.loader.lib.input.DelimitedTextInputFormat`	Delimited text files	Comma-separated list from the property `oracle.hadoop.loader.input.fieldNames` (or F0, F1, ... if the property is not defined)
`oracle.hadoop.loader.examples.CSVInputFormat`	Simple delimited text files	F0, F1, ...
`oracle.hadoop.loader.examples.AvroInputFormat`	Binary-format Avro record files	Field names in the Avro schema of the input file

HiveToAvroInputFormat

This class is an input format that reads data from a Hive table. The Hive database and table names must be specified with the following configuration properties:

oracle.hadoop.loader.input.hive.tableName
oracle.hadoop.loader.input.hive.databaseName

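HiveToAvroInputFormat accesses HiveMetaStoreClient to obtain information about the table, such as its columns, location, InputFormat, and SerDe. Depending on how Hive is configured, you might also need to set additional Hive-specific properties (such as hive.metastore.uris and hive.metastore.local).

HiveToAvroInputFormat imports the entire table (all files in the Hive table's directory). All of the other (file-based) input formats described in this document support globbing, that is, restricting the input by adding a wildcard pattern to the input directory.

Each row of the Hive table is converted into an Avro record whose field names are the Hive column names in uppercase. Because unquoted Oracle column names are also stored in uppercase, the field names usually match the target columns directly, which in most cases makes a loaderMap document unnecessary (see "Creating a loaderMap Document").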

DelimitedTextInputFormat

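This is the InputFormat for delimited files, such as comma-separated or tab-separated files. DelimitedTextInputFormat requires that records be separated by newline characters and that fields be separated by a single-character marker.

DelimitedTextInputFormat emulates the behavior of the SQL*Loader clause "terminated by t [optionally enclosed by ie [and te]]", where t is the field terminator, ie is the initial field encloser, and te is the trailing field encloser.

DelimitedTextInputFormat uses a parser based on the following grammar: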

Line = Token t Line | Token\n
Token = EnclosedToken | UnenclosedToken
EnclosedToken = (white-space)* ie [(non-te)* te te ]* (non-te)* te (white-space)*
UnenclosedToken = (white-space)* (non-t)*
white-space = {c | Character.isWhitespace(c) and c!=t}

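A trailing field encloser that appears inside an enclosed token must be encoded by doubling it (writing it twice).

White space before and after an enclosed token is discarded. For an unenclosed token, leading white space is discarded, but trailing white space (if any) is kept.

A token that is an empty string, whether enclosed or unenclosed, is replaced with null.

This implementation allows custom enclosers and terminators (see Table 3-2), but the record terminator and white space are hard-coded (to newline and to Java Character.isWhitespace(), respectively). The enclosers must differ from the terminator and from white space, although the two enclosers may be the same character. The terminator may be a white-space character; in that case it is removed from the class of white-space characters.

Table 3-2 lists the delimiters available for DelimitedTextInputFormat. In the table, HHHH is the big-endian hexadecimal representation of a UTF-16 character.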

Table 3-2 DelimitedTextInputFormat Delimiters

Delimiter Type	Property	Allowed Values	Default
Field terminator	oracle.hadoop.loader.input.fieldTerminator	One character or \uHHHH	, (comma)
Initial field encloser	oracle.hadoop.loader.input.initialFieldEncloser	One character, \uHHHH, or none	None
Trailing field encloser	oracle.hadoop.loader.input.trailingFieldEncloser	One character, \uHHHH, or none	None
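Because the values in Table 3-2 arrive as plain configuration strings, each one must be turned into a single character before any line is parsed. The following Java sketch shows one way to decode a value written either as a literal character or in the \uHHHH notation. The class DelimiterValue is hypothetical and is not part of the Oracle Loader for Hadoop API.

```java
import java.util.Optional;

/**
 * Hypothetical helper, not part of Oracle Loader for Hadoop: decodes a delimiter
 * property value that is either a single character or the six-character
 * backslash-u-HHHH notation (four big-endian UTF-16 hex digits).
 */
final class DelimiterValue {
    private DelimiterValue() {}

    /** Returns the decoded character, or Optional.empty() for an unset (zero-length) value. */
    static Optional<Character> decode(String value) {
        if (value == null || value.isEmpty()) {
            return Optional.empty();                       // encloser not configured
        }
        if (value.length() == 1) {
            return Optional.of(value.charAt(0));           // literal character
        }
        if (value.matches("\\\\u[0-9A-Fa-f]{4}")) {        // backslash, 'u', four hex digits
            return Optional.of((char) Integer.parseInt(value.substring(2), 16));
        }
        throw new IllegalArgumentException("expected one character or \\uHHHH, got: " + value);
    }
}
```

For example, the six-character value \u0009 decodes to a tab, so a tab-separated input can be configured without embedding a literal tab character in an XML configuration file.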

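Either set both field enclosers or set neither. If they are not set, the EnclosedToken non-terminal is effectively removed from the grammar above. If they are set, the parser first tries to read each field as an EnclosedToken before reading it as an UnenclosedToken. If the initial field encloser is set, the trailing field encloser must also be set, even when the two enclosers are the same character.

DelimitedTextInputFormat reads the field names, as a comma-separated list, from the configuration property oracle.hadoop.loader.input.fieldNames. If parsing a line yields more tokens (fields) than there are field names, the extra tokens are discarded. If it yields fewer tokens than field names, the missing tokens are set to null.

If the oracle.hadoop.loader.input.fieldNames property is not set, the DelimitedTextInputFormat RecordReader uses F0, F1, ... Fn as the field names, where n is the maximum number of tokens per line that the RecordReader has encountered so far.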
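The grammar and rules above can be exercised with a small, self-contained Java parser. This is a sketch under stated assumptions, not the DelimitedTextInputFormat source: the class DelimitedLineParser is hypothetical, its input is one record whose newline has already been removed, and when a field that starts with the initial encloser does not form a complete EnclosedToken, the sketch falls back to reading it as an UnenclosedToken, which is one reading of "tries EnclosedToken first".

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a parser for the DelimitedTextInputFormat grammar.
 * It parses one record whose trailing newline has already been removed.
 */
final class DelimitedLineParser {
    private final char t;         // field terminator
    private final Character ie;   // initial field encloser, or null when unset
    private final Character te;   // trailing field encloser, or null when unset

    DelimitedLineParser(char t, Character ie, Character te) {
        if ((ie == null) != (te == null)) {
            throw new IllegalArgumentException("set both field enclosers or neither");
        }
        this.t = t;
        this.ie = ie;
        this.te = te;
    }

    // white-space = {c | Character.isWhitespace(c) and c != t}
    private boolean isWhite(char c) {
        return Character.isWhitespace(c) && c != t;
    }

    private int skipWhite(String line, int pos) {
        while (pos < line.length() && isWhite(line.charAt(pos))) pos++;
        return pos;
    }

    /** Reads an EnclosedToken starting at pos; returns the position after it, or -1 if none matches. */
    private int readEnclosed(String line, int pos, StringBuilder out) {
        if (ie == null || pos >= line.length() || line.charAt(pos) != ie) return -1;
        pos++;                                        // skip the initial encloser
        while (pos < line.length()) {
            char c = line.charAt(pos++);
            if (c != te) {
                out.append(c);
            } else if (pos < line.length() && line.charAt(pos) == te) {
                out.append(c);                        // "te te" encodes one literal te
                pos++;
            } else {
                pos = skipWhite(line, pos);           // white space after the closing te is discarded
                return (pos == line.length() || line.charAt(pos) == t) ? pos : -1;
            }
        }
        return -1;                                    // no closing te on this line
    }

    /** Splits one line into tokens; empty tokens become null. */
    List<String> parse(String line) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (true) {
            pos = skipWhite(line, pos);               // leading white space is always discarded
            StringBuilder sb = new StringBuilder();
            int end = readEnclosed(line, pos, sb);
            if (end < 0) {                            // fall back to an UnenclosedToken
                sb.setLength(0);
                end = pos;
                while (end < line.length() && line.charAt(end) != t) {
                    sb.append(line.charAt(end++));    // trailing white space is kept
                }
            }
            tokens.add(sb.length() == 0 ? null : sb.toString());
            if (end >= line.length()) return tokens;
            pos = end + 1;                            // consume the field terminator
        }
    }

    /** Assigns tokens to field names: extra tokens are dropped, missing ones become null. */
    static Map<String, String> toRecord(List<String> fieldNames, List<String> tokens) {
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            record.put(fieldNames.get(i), i < tokens.size() ? tokens.get(i) : null);
        }
        return record;
    }
}
```

With t set to a comma and both enclosers set to a double quotation mark, the line a, b ,"c,d" , "e""f", yields a, "b " (the trailing blank of an unenclosed token is kept), c,d, e"f, and a final null for the empty last field.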

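Creating a loaderMap Document

Oracle Loader for Hadoop loads data into a single database table, called the target table. You can specify the target table, the columns to load, and how input fields map to database columns in either of the following ways:

To load all columns of a database table when the input field names exactly match the database column names, use the configuration property oracle.hadoop.loader.targetTable. You can give a schema-qualified name for the target load table. For each database column, the loader uses the column name to find an input field with the same name, and then loads that field's value into the column.
To load only some of the target table's columns, or when the input field names do not exactly match the database column names, create a loaderMap document that specifies the target table, the columns, and how input fields map to database columns. Specify the location of the loaderMap document with the oracle.hadoop.loader.loaderMapFile configuration property.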

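Example loaderMap Document

The following example loaderMap document specifies the list of columns of the HR.EMPLOYEES table to load. It also contains the mapping between the input data field names and the table column names, and it specifies the format of the input data to use for a column (here, the hireDate field).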

<?xml version="1.0" encoding="UTF-8"?>
<LOADER_MAP>
        <SCHEMA>HR</SCHEMA>
        <TABLE>EMPLOYEES</TABLE>
        <COLUMN field="empId">EMPLOYEE_ID</COLUMN>
        <COLUMN field="lastName">LAST_NAME</COLUMN>
        <COLUMN field="email">EMAIL</COLUMN>
        <COLUMN field="hireDate" format="MM-dd-yyyy">HIRE_DATE</COLUMN>
        <COLUMN field="jobId">JOB_ID</COLUMN>
</LOADER_MAP>

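Note:

A loaderMap file is not needed when all columns of the target table are loaded and the input data field names in the IndexedRecord input object exactly match the column names, unless one of the table columns is a DATE. Input fields mapped to DATE columns are parsed using the default Java date format. If the input is in a different format, you must create a loaderMap document and use the format attribute to specify the Java date format string to use when parsing the input value.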
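The mapping rules above can be illustrated with a small Java sketch that reads a loaderMap document with the JDK DOM parser and applies a column's format attribute. The class LoaderMap is hypothetical, and treating a "Java date format string" as a java.text.SimpleDateFormat pattern is an assumption; the loader's internal parsing may differ.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical, JDK-only reader for a loaderMap document (not the loader's own code).
 * It collects SCHEMA, TABLE and each COLUMN with its optional field and format attributes.
 */
final class LoaderMap {
    /** One COLUMN element: database column, input field name (null if absent), format (null if absent). */
    static final class Column {
        final String name;
        final String field;
        final String format;

        Column(String name, String field, String format) {
            this.name = name;
            this.field = field;
            this.format = format;
        }
    }

    final String schema;   // null when SCHEMA is omitted (it is optional in the XSD)
    final String table;
    final List<Column> columns = new ArrayList<>();

    private LoaderMap(String schema, String table) {
        this.schema = schema;
        this.table = table;
    }

    static LoaderMap parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            LoaderMap map = new LoaderMap(text(doc, "SCHEMA"), text(doc, "TABLE"));
            NodeList cols = doc.getElementsByTagName("COLUMN");
            for (int i = 0; i < cols.getLength(); i++) {
                Element e = (Element) cols.item(i);
                // getAttribute returns "" for a missing attribute; map that to null
                map.columns.add(new Column(e.getTextContent().trim(),
                        emptyToNull(e.getAttribute("field")), emptyToNull(e.getAttribute("format"))));
            }
            return map;
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable loaderMap document", e);
        }
    }

    /** Parses an input value with a Java date format string such as "MM-dd-yyyy". */
    static Date parseDate(String value, String javaDateFormat) {
        SimpleDateFormat fmt = new SimpleDateFormat(javaDateFormat);
        fmt.setLenient(false);   // reject impossible dates such as month 13
        try {
            return fmt.parse(value);
        } catch (ParseException e) {
            throw new IllegalArgumentException("cannot parse " + value + " as " + javaDateFormat, e);
        }
    }

    private static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    private static String emptyToNull(String s) {
        return s == null || s.isEmpty() ? null : s;
    }
}
```

Applied to the example document, the HIRE_DATE entry carries field hireDate and format MM-dd-yyyy, so an input value such as 06-17-2003 would be read as June 17, 2003.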

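Accessing Table Metadata

Oracle Loader for Hadoop uses table metadata from the Oracle database to control the execution of a loader job. If a JDBC connection can be established, the loader fetches the metadata automatically. Sometimes, however, a loader job cannot access the database, for example, when the Hadoop cluster is on a different network from the database. In that case, use the OraLoaderMetadata utility program to extract the table metadata from the database into an XML document, and then transfer the metadata document to the Hadoop cluster. Specify the location of the metadata document with the configuration property oracle.hadoop.loader.tableMetadataFile. When the loader job runs, it reads this document to obtain all of the metadata it needs about the target table.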

Running the OraLoaderMetadata Utility

To run the OraLoaderMetadata Java utility, add the following jar files to the CLASSPATH variable:

${OLH_HOME}/jlib/oraloader.jar
${OLH_HOME}/jlib/ojdbc6.jar
${OLH_HOME}/jlib/oraclepki.jar

Note:

The oraclepki.jar library is required only if you connect to the database using credentials stored in an Oracle wallet.

Run the following command:

java oracle.hadoop.loader.metadata.OraLoaderMetadata \
  -user <username> -connection_url <connection URL> [-schema <schemaName>] \
  -table <tableName> -output <output filename>

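OraLoaderMetadata Parameters

-user is the Oracle database user name. The user is prompted for the password.
-connection_url is the connection URL used to connect to the Oracle database.
-schema is the name of the schema that contains the target table. If it is not specified, the target table is assumed to be in the schema of the connecting user.
-table is the name of the target table.
-output is the name of the output file that stores the metadata document.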

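Invoking OraLoader

OraLoader is a Hadoop job that you run with the standard Hadoop tools. OraLoader implements the org.apache.hadoop.util.Tool interface and follows the standard Hadoop approach to building MapReduce applications. OraLoader performs the following steps:

Reads and checks the input configuration parameters.
Retrieves and checks the table and column metadata of the target table. If a JDBC connection can be established, the metadata is retrieved from the database. Otherwise, the loader looks for the metadata stored at the location specified by the oracle.hadoop.loader.tableMetadataFile property.
Prepares internal configuration information for the OraLoader MapReduce tasks, and stores the table metadata and the dependent Java libraries in the distributed cache so that they are available to the map and reduce tasks throughout the cluster.
Submits the MapReduce job to Hadoop.
After the map and reduce tasks complete, combines the report information from the individual tasks into a common log file for the job. The log file is written to the job output directory and is named oraloader-report.txt.

OraLoader is invoked from the command line and accepts the generic command-line options. The following is an example invocation: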

bin/hadoop jar ${OLH_HOME}/jlib/oraloader.jar oracle.hadoop.loader.OraLoader \
  -conf MyConf.xml

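For the location of the Hadoop executable and for how to set the HADOOP_CLASSPATH variable, see the Apache Hadoop documentation.

Loading Files into an Oracle Database (Offline Load Only)

For an offline load, Oracle Loader for Hadoop generates files that you copy to the database server and then load into the Oracle database. The following section describes the available offline load method.

Loading Delimited Text Files into an Oracle Database

After copying the delimited text files to the Oracle database server, invoke SQL*Loader with the generated control file to load the data from the delimited text files into the database. Alternatively, you can run the generated SQL script to load the data into the database through an external table. See "Delimited Text Output."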

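Output Modes When Invoking OraLoader

This section describes the following output options:

JDBC Output
Oracle OCI Direct Path Output
Delimited Text Output
Oracle Data Pump Output

JDBC Output

JDBC output is an output option for online database mode. The map or reduce tasks load the output records of the loader job directly into the target table as part of the OraLoader process, so no additional steps are needed to load the data. This output option requires a JDBC connection between the Hadoop system and the Oracle database.

The JDBC output option uses standard JDBC batching to improve performance and efficiency. If an error such as a constraint violation occurs while a batch is executing, the JDBC driver stops at the first error. Thus, if a batch contains 100 rows and the error occurs in the 10th row, 9 rows are inserted and 91 rows are not. The JDBC driver also does not identify which row caused the error. In this case, Oracle Loader for Hadoop cannot know the insert status of each row in the batch, so it treats every row in the batch as having an uncertain status and continues with the next batch. At the end of the job, it generates a load report that shows the number of batch errors and the number of rows whose insert status is uncertain. One way to address this problem is to use a unique key on the data: after the data is loaded, enable the key and find the missing key values. Once you determine why the load failed, identify the missing rows in the input files and reload them.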
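The arithmetic of the 100-row example and the unique-key recovery step can be sketched in Java. The class JdbcBatchFailure is hypothetical; in practice the loaded keys would come from querying the target table, which this sketch leaves out.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical illustration (not loader code) of the JDBC batch-failure behavior
 * described above and of the unique-key check used to find rows to reload.
 */
final class JdbcBatchFailure {
    private JdbcBatchFailure() {}

    /** Rows inserted before the driver stops at the first failing row (1-based). */
    static int rowsInserted(int failedRow) {
        return failedRow - 1;
    }

    /** The failing row and every row after it in the batch are not inserted. */
    static int rowsNotInserted(int batchSize, int failedRow) {
        return batchSize - failedRow + 1;
    }

    /** Keys present in the input but missing from the table after the load. */
    static Set<Long> missingKeys(Collection<Long> inputKeys, Collection<Long> loadedKeys) {
        Set<Long> missing = new LinkedHashSet<>(inputKeys);   // keeps input order
        missing.removeAll(loadedKeys);
        return missing;
    }
}
```

Because the loader cannot tell which 91 of the 100 rows failed, it reports the whole batch as uncertain; comparing keys is what narrows that down to the rows that are actually missing.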

To select the JDBC output format, set the following Hadoop property:

<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
</property>

The following property is relevant to configuring JDBC output:

oracle.hadoop.loader.jdbc.defaultExecuteBatch: Controls the size of the batch.

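Oracle OCI Direct Path Output

The Oracle OCI Direct Path output format is available in online database mode. This output format uses the OCI Direct Path interface to load rows into the target table. Each reducer loads into a distinct database partition, which makes parallel direct path loads possible.

To select the Oracle OCI Direct Path output format, set the following Hadoop property: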

<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.OCIOutputFormat</value>
</property>

The size of the direct path stream buffer is controlled by the following property:

<property>
  <name>oracle.hadoop.loader.output.dirpathBufsize</name>
  <value>131072</value>
  <description>
   This property is used to set the size, in bytes, of the direct path 
   stream buffer for OCIOutputFormat.  If needed, values are rounded 
   up to the next nearest multiple of 8k.
  </description>
</property>
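As an illustration of the rounding rule in the property description above, the following Java sketch computes the effective buffer size. The class DirpathBufsize is hypothetical, and reading "8k" as 8,192 bytes is an assumption. Under that assumption the default 131072 is already a multiple (16 × 8,192) and is used unchanged, while a request of 100000 would become 106496.

```java
/**
 * Hypothetical helper showing the documented rounding of dirpathBufsize.
 * Assumption: "8k" means 8 * 1024 = 8192 bytes.
 */
final class DirpathBufsize {
    static final int EIGHT_KB = 8 * 1024;

    private DirpathBufsize() {}

    /** Rounds a requested buffer size up to the next multiple of 8 KB. */
    static int roundUp(int requestedBytes) {
        if (requestedBytes <= 0) {
            throw new IllegalArgumentException("buffer size must be positive: " + requestedBytes);
        }
        long rounded = ((long) requestedBytes + EIGHT_KB - 1) / EIGHT_KB * EIGHT_KB;  // long avoids overflow
        return Math.toIntExact(rounded);
    }
}
```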

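The Oracle OCI Direct Path output format has the following restrictions:

It is available only on the Linux x86-64 platform.
The target table of the load must be partitioned.
The number of reducers must be greater than zero.
OCI Direct Path output does not load composite interval-partitioned tables whose subpartition key contains a CHAR, VARCHAR2, NCHAR, or NVARCHAR2 column. The loader checks for this condition and stops with an error if the target load table meets it. Composite interval partitioning is supported when the subpartition key contains no character-type columns.

The Oracle OCI Direct Path output format requires the following configuration steps, which enable the loader to find the C shared libraries that implement the output format. These libraries are distributed automatically to the nodes through the Hadoop distributed cache mechanism.

Create an environment variable JAVA_LIBRARY_PATH that points to the directory $OLH_HOME/lib. This environment variable is needed only on the node where the job is submitted. When the job is created, the $HADOOP_HOME/bin/hadoop command in CDH3 automatically inserts the value of this variable into the Java system property java.library.path. For the Apache Hadoop distribution, you must edit the $HADOOP_HOME/bin/hadoop command so that the new value is concatenated with the existing value, because the Apache hadoop command starts with an empty JAVA_LIBRARY_PATH value and does not import the value from the environment.
On the client where the loader job is submitted, add $OLH_HOME/lib to the LD_LIBRARY_PATH variable.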

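Delimited Text Output

Delimited text is an output option for offline database mode. The map or reduce tasks generate comma-separated values (CSV) format files or other delimited text files. These files are then loaded into the target table with SQL*Loader or an external table.

To select the delimited text output format, set the following Hadoop property: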

<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.DelimitedTextOutputFormat</value>
</property>

Each output task generates a delimited text format file and either a SQL*Loader control file or a SQL script for loading the delimited text file into the target table.

Delimited text file names follow this template:

oraloader-${taskId}-csv-${partitionId}.dat

SQL*Loader control file names follow this template:

oraloader-${taskId}-csv-${partitionId}.ctl

SQL script names for loading through an external table follow this template:

oraloader-${taskId}-csv-${partitionId}.sql

The template parameters are defined as follows:

${taskId}: The mapper (or reducer) ID

${partitionId}: The partition identifier

The following properties control the format of records and fields in the delimited text files:

oracle.hadoop.loader.output.fieldTerminator: A single character that separates fields.
oracle.hadoop.loader.output.initialFieldEncloser: When set, each field is always enclosed between this character and the trailingFieldEncloser.
oracle.hadoop.loader.output.trailingFieldEncloser: When set, each field is always enclosed between the initialFieldEncloser and this character.
oracle.hadoop.loader.output.escapeEnclosers: Escapes embedded trailing field enclosers.
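A minimal sketch of how these four properties could interact when one record is written is shown below. The class DelimitedRecordWriter is hypothetical, and two behaviors are assumptions rather than documented facts: escapeEnclosers is modeled as doubling each embedded trailing encloser (the convention that SQL*Loader reads back with OPTIONALLY ENCLOSED BY, as in Example 3-1), and a null value is written as a zero-length field.

```java
import java.util.List;

/**
 * Hypothetical sketch of writing one record with the delimited text output properties.
 * Assumptions: escapeEnclosers doubles each embedded trailing encloser, and null
 * values are written as zero-length fields.
 */
final class DelimitedRecordWriter {
    private final char terminator;
    private final Character initialEncloser;    // null when unset
    private final Character trailingEncloser;   // null when unset
    private final boolean escapeEnclosers;

    DelimitedRecordWriter(char terminator, Character initialEncloser,
                          Character trailingEncloser, boolean escapeEnclosers) {
        if ((initialEncloser == null) != (trailingEncloser == null)) {
            throw new IllegalArgumentException("set both field enclosers or neither");
        }
        this.terminator = terminator;
        this.initialEncloser = initialEncloser;
        this.trailingEncloser = trailingEncloser;
        this.escapeEnclosers = escapeEnclosers;
    }

    String format(List<String> fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) line.append(terminator);
            String value = fields.get(i);
            if (value == null) continue;                    // zero-length field for null
            if (initialEncloser == null) {                  // no enclosers configured
                line.append(value);
                continue;
            }
            line.append(initialEncloser.charValue());       // enclosers set: always enclose
            for (char c : value.toCharArray()) {
                line.append(c);
                if (escapeEnclosers && c == trailingEncloser) line.append(c);   // double it
            }
            line.append(trailingEncloser.charValue());
        }
        return line.toString();
    }
}
```

With a comma terminator, double quotation marks as both enclosers, and escaping enabled, the values 7, Smith, J, null, and say "hi" would be written as "7","Smith, J",,"say ""hi""".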

Example 3-1 Sample SQL*Loader Control File

LOAD DATA CHARACTERSET AL32UTF8
INFILE 'oraloader-csv-1-0.dat'
BADFILE 'oraloader-csv-1-0.bad'
DISCARDFILE 'oraloader-csv-1-0.dsc'
INTO TABLE "SCOTT"."CSV_PART" PARTITION(10) APPEND
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
"ID"      DECIMAL EXTERNAL,
"NAME"    CHAR,
"DOB"     DATE 'SYYYY-MM-DD HH24:MI:SS'
)

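Oracle Data Pump Output

The Oracle Data Pump output format is available in offline database mode. The loader generates binary format files that are loaded into the target table by using an external table and the ORACLE_DATAPUMP access driver. The output files must be copied from the HDFS file system to a local file system that the Oracle database can access.

To select the Oracle Data Pump output format, set the following Hadoop property: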

<property>
  <name>mapreduce.outputformat.class</name>
  <value>oracle.hadoop.loader.lib.output.DataPumpOutputFormat</value>
</property>

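Oracle Data Pump output file names follow this template:

oraloader-${taskId}-dp-${partitionId}.dat

Oracle Loader for Hadoop also generates a SQL file containing commands that perform the following tasks:

Create an external table definition that uses the ORACLE_DATAPUMP access driver. The binary format Data Pump output files are listed in the LOCATION clause of the external table.

Create a directory object that is used by the external table. This command must be uncommented before use. To specify the directory name used in the generated SQL file, set the following property: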

<property>
  <name>oracle.hadoop.loader.extTabDirectoryName</name>
  <value>OLH_EXTTAB_DIR</value>
  <description>
   The name of the Oracle directory object for the external table's
   LOCATION data files. This property applies only to the CSV and 
   DataPump output formats.
  </description>
</property>

Insert the rows from the external table into the target table. This command must be uncommented before use.

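Balancing Loads When Loading Data into Partitioned Tables

Use the sampling feature of Oracle Loader for Hadoop to balance loads across reducers when loading data into a partitioned database table.

The execution time of a reducer is usually proportional to the number of records that it processes: the more records, the longer the execution time. When the sampling feature is disabled, all records for a given database partition are sent to one reducer. Because different database partitions can contain different numbers of records, this results in unbalanced reducer loads. Because the execution time of a Hadoop job is the execution time of its slowest reducer, unbalanced reducer loads degrade the performance of the entire job.

Splitting the records evenly across reducers balances the reducer loads, but it does not necessarily group the records by database partition before they are inserted into the database.

The sampling feature of Oracle Loader for Hadoop generates an efficient MapReduce partitioning scheme that groups records by database partition while balancing the reducer loads.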

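Using the Sampling Feature

To enable the sampling feature, set the configuration property oracle.hadoop.loader.sampler.enableSampling to true.

Even when the enableSampling property is set to true, the loader automatically disables the sampling feature if sampling is not needed or if the loader determines that it cannot create a good sample. For example, sampling is disabled automatically if the table is not partitioned, if the number of reducer tasks is less than two, or if there is too little input data to compute a good load balance. In these cases, the loader returns an informational message.

Note:

The sampler is multithreaded, and each sampler thread instantiates its own copy of the supplied InputFormat class. Any new InputFormat implementation supplied to Oracle Loader for Hadoop must therefore ensure that static mutable data structures are synchronized for access by multiple threads.

The sampler might return an out-of-memory error on the client node where the loader job is submitted. This happens when the input splits returned by the InputFormat do not fit in memory.

Possible solutions to this problem are:

Increase the heap size of the JVM where the job is submitted.
Adjust the following properties: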
```
oracle.hadoop.loader.sampler.hintMaxSplitSize
oracle.hadoop.loader.sampler.hintNumMapTasks
```
For details about these properties, see "OraLoader Configuration Properties for Hadoop."

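Tuning Load Balancing and Sampling Behavior

Oracle Loader for Hadoop provides properties that you can use to tune load balancing and sampling behavior. These properties are summarized in Table 3-3.

Properties for Tuning Load Balancing

The goal of load balancing is to generate a MapReduce partitioning scheme that assigns approximately the same amount of work to every reducer. This scheme is used in the partitioning step when the Oracle Loader for Hadoop job runs.

Two properties, maxLoadFactor and loadCI, control the quality of load balancing. The sampler uses the expected reducer load factor to evaluate the quality of a partitioning scheme. The load factor is a metric that indicates how much a reducer's load deviates from a perfectly balanced reducer load. A load factor of 1 represents a perfectly balanced load (no overload).

A small load factor indicates good load balancing. The maxLoadFactor property specifies a maximum reducer load factor of (1 + maxLoadFactor). The default maxLoadFactor of 0.05 means that no reducer is overloaded by more than 5%. The sampler guarantees this maxLoadFactor with a statistical confidence of loadCI. The default value of loadCI is 0.95, which means that a reducer's load factor exceeds the maximum in only 5% of cases.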
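To make the metric concrete, the following Java sketch computes each reducer's load factor as its record count divided by the count it would receive under a perfectly even split, which is one way to read the definition above. ReducerLoad is hypothetical and is not the sampler's algorithm: the sampler works from estimates drawn from a sample and attaches the confidence loadCI, which this sketch does not model.

```java
/**
 * Hypothetical illustration of the reducer load factor metric (not the sampler's algorithm):
 * a reducer's load factor is its record count divided by the perfectly balanced count,
 * so 1.0 means perfectly balanced. The statistical confidence loadCI is not modeled.
 */
final class ReducerLoad {
    private ReducerLoad() {}

    static double maxLoadFactor(long[] recordsPerReducer) {
        if (recordsPerReducer.length == 0) throw new IllegalArgumentException("no reducers");
        long total = 0;
        for (long r : recordsPerReducer) total += r;
        if (total == 0) return 1.0;                        // nothing to load: trivially balanced
        double balanced = (double) total / recordsPerReducer.length;
        double max = 0;
        for (long r : recordsPerReducer) max = Math.max(max, r / balanced);
        return max;
    }

    /** True when no reducer exceeds 1 + maxLoadFactor (1.05 with the default 0.05). */
    static boolean withinLimit(long[] recordsPerReducer, double maxLoadFactor) {
        return maxLoadFactor(recordsPerReducer) <= 1 + maxLoadFactor;
    }
}
```

For four reducers receiving 104, 96, 100, and 100 records, the maximum load factor is 1.04, within the default limit of 1.05; a 110/90/100/100 split (1.10) would exceed it.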

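There is a trade-off between sampler execution time and the quality of load balancing. Lower values of maxLoadFactor and higher values of loadCI produce more evenly balanced reducer loads but take longer to sample. The default values, maxLoadFactor=0.05 and loadCI=0.95, strike a good balance between load-balancing quality and execution time.

Properties for Tuning Sampling Behavior

By default, the sampler runs until it has collected enough samples to generate a partitioning scheme that meets the maxLoadFactor and loadCI criteria.

However, you can limit the sampler's execution time with the maxSamplesPct property, which specifies the maximum number of records to sample (as a fraction of the input records) before the sampler stops.

Does Oracle Loader for Hadoop Always Use the Sampler's Partitioning Scheme?

Oracle Loader for Hadoop uses the generated partitioning scheme only if sampling is successful. Sampling is successful if it generates a partitioning scheme with a maximum reducer load factor of (1 + maxLoadFactor), guaranteed at a statistical confidence of loadCI. With the default values of maxLoadFactor, loadCI, and maxSamplesPct, the sampler can successfully generate high-quality partitioning schemes for a wide variety of input data distributions. However, the sampler might fail to generate a partitioning scheme that satisfies the constraints, for example, when the constraints are too strict or when the number of samples needed exceeds the user-specified maximum, maxSamplesPct. In such cases, Oracle Loader for Hadoop logs a message stating that there were not enough samples, falls back to the default of splitting records by database partition, and does not guarantee load balancing (see "Tuning Load Balancing and Sampling Behavior").

An alternative is to relax the configuration properties: increase maxSamplesPct, increase maxLoadFactor, decrease loadCI, or any combination of these.

What Happens When a Sampling Feature Property Has an Invalid Value?

If a configuration property of the sampling feature is set to a value outside its accepted range, no exception is returned. Instead, the sampler prints a warning message, resets the property to its default value, and continues executing.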

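Key Configuration Properties for Load Balancing

Table 3-3 lists the key properties that you can use to tune sampling behavior. For the complete list of properties, see "OraLoader Configuration Properties for Hadoop."

Table 3-3 Configuration Properties for the Oracle Loader for Hadoop Sampling Feature

Attribute	Value
Name	`oracle.hadoop.loader.sampler.maxSamplesPct`
Type	Float
Default	0.01
Valid range	[0, 1]. A value less than or equal to 0 disables this property.
Description	The maximum sample size, as a fraction of the number of records in the input data. A value of 0.05 means that the sampler samples no more than 5% of the total number of records. The sampler might collect fewer samples than this.
-	-
Name	`oracle.hadoop.loader.sampler.maxLoadFactor`
Type	Float
Default	0.05
Valid range	>= 0. A value less than or equal to 0 resets the property to the default.
Description	The maximum acceptable load factor for a reducer's workload.
-	-
Name	`oracle.hadoop.loader.sampler.loadCI`
Type	Float
Default	0.95
Valid range	>= 0.5 and < 1. Values >= 0.9 are recommended. A value less than 0.5 resets the property to the default.
Description	The statistical confidence for the maximum reducer load factor. Commonly used values are 0.95 and 0.99.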

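OraLoader Configuration Properties

OraLoader uses the standard Hadoop methods for specifying configuration properties. You can specify them in a configuration file, or with the -D property=value option, which is processed by GenericOptionsParser and ToolRunner.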
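The two ways of supplying properties can be pictured with a small, JDK-only Java sketch that reads name/value pairs from a Hadoop-style configuration file and then applies -D name=value arguments on top. OraLoaderProps is hypothetical; it is not Hadoop's Configuration or GenericOptionsParser, and letting -D values override file values is an assumption made for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/**
 * Hypothetical JDK-only sketch of the two ways to supply properties: a Hadoop-style
 * configuration file and "-D name=value" arguments. This is not Hadoop's Configuration
 * or GenericOptionsParser; letting -D values override file values is an assumption.
 */
final class OraLoaderProps {
    private OraLoaderProps() {}

    /** Reads name/value pairs from the property elements of a configuration document. */
    static Map<String, String> fromXml(String xml) {
        try {
            NodeList props = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("property");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                NodeList names = p.getElementsByTagName("name");
                NodeList values = p.getElementsByTagName("value");
                if (names.getLength() == 0) continue;           // skip malformed entries
                String value = values.getLength() == 0 ? "" : values.item(0).getTextContent();
                result.put(names.item(0).getTextContent().trim(), value);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("unreadable configuration XML", e);
        }
    }

    /** Returns a copy of base with every "-D name=value" pair in args applied on top. */
    static Map<String, String> withDefines(Map<String, String> base, String[] args) {
        Map<String, String> merged = new LinkedHashMap<>(base);
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals("-D")) {
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq > 0) merged.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return merged;
    }
}
```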

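Table 3-4 and Table 3-5 briefly describe the key configuration properties of Oracle Loader for Hadoop. For the complete list and detailed descriptions of all configuration properties, see the oraloader-conf.xml document in "OraLoader Configuration Properties for Hadoop."

Table 3-4 Key Job Configuration Properties of Oracle Loader for Hadoop

Attribute	Value
Name	`oracle.hadoop.loader.jobName`
Type	String
Default	OraLoader
Description	The Hadoop job name for this Oracle loader job. Used as the input to the `Job.setJobName()` method.
-	-
Name	`oracle.hadoop.loader.targetTable`
Type	String
Default	Not defined.
Description	The schema-qualified name of the table to be loaded. Use this option to indicate that all columns of the table are loaded and that the names of the input fields match the column names. This property takes precedence over the `oracle.hadoop.loader.loaderMapFile` property.
-	-
Name	`oracle.hadoop.loader.loaderMapFile`
Type	String
Default	Not defined.
Description	The path to the loader map file.
-	-
Name	`oracle.hadoop.loader.tableMetadataFile`
Type	String
Default	Not defined.
Description	The path to the target table metadata file. Use this option when running in disconnected mode. The table metadata file is created by running the `OraLoaderMetadata` utility.
-	-
Name	`oracle.hadoop.loader.olhcachePath`
Type	String
Default	${mapred.output.dir}/.../olhcache
Description	The path to a directory where Oracle Loader for Hadoop can create files that are loaded into the `DistributedCache`. In distributed mode, the value must be an HDFS path.
-	-
Name	`oracle.hadoop.loader.extTabDirectoryName`
Type	String
Default	`OLH_EXTTAB_DIR`
Description	The name of the Oracle directory object for the external table's `LOCATION` data files. This property applies only to the delimited text and Data Pump output formats.
-	-
Name	`oracle.hadoop.loader.sampler.enableSampling`
Type	Boolean
Default	true
Description	Whether the sampling feature is enabled.
-	-
Name	`oracle.hadoop.loader.sampler.enableSorting`
Type	Boolean
Default	true
Description	Whether the output records within each reducer group are sorted by the primary key of the table.
-	-
Name	`oracle.hadoop.loader.connection.url`
Type	String
Default	Not defined.
Description	The URL of the database connection string. This property takes precedence over all other connection properties. If an Oracle wallet is configured as an external password store, the property value must start with the driver prefix `jdbc:oracle:thin:@`, and the `db_connect_string` must exactly match the credential defined in the wallet.
-	-
Name	`oracle.hadoop.loader.connection.user`
Type	String
Default	Not defined.
Description	The database login name.
-	-
Name	`oracle.hadoop.loader.connection.password`
Type	String
Default	Not defined.
Description	The password of the connecting user.
-	-
Name	`oracle.hadoop.loader.connection.wallet_location`
Type	String
Default	Not defined.
Description	The file path to an Oracle wallet where the connection information is stored. This property is used only for JDBC connections. For the JDBC output format, when using an Oracle wallet as an external password store, set these two properties: `oracle.hadoop.loader.connection.wallet_location` and `oracle.hadoop.loader.connection.url`; or set these three properties: `oracle.hadoop.loader.connection.wallet_location`, `oracle.hadoop.loader.connection.tnsEntryName`, and `oracle.hadoop.loader.connection.tns_admin`. For the OCI output format, set the `oracle.hadoop.loader.connection.tns_admin` property to specify the wallet location. Note that for online loads a JDBC connection is always made, even when the OCI Direct Path output format is specified. You can use the same wallet for both connection types.
-	-
Name	`oracle.hadoop.loader.connection.tnsEntryName`
Type	String
Default	Not defined.
Description	The TNS entry name defined in the `tnsnames.ora` file. Use this property together with the `oracle.hadoop.loader.connection.tns_admin` property.
-	-
Name	`oracle.hadoop.loader.connection.tns_admin`
Type	String
Default	Not defined.
Description	The file path to a directory containing SQL*Net configuration files, such as `sqlnet.ora` and `tnsnames.ora`. If this value is not set, the value of the `TNS_ADMIN` environment variable (if any) is used. Define this property when you use TNS entry names in database connection strings. You must define this property when you use an Oracle wallet with OCI connections.
-	-
Name	`oracle.hadoop.loader.connection.defaultExecuteBatch`
Type	Integer
Default	100
Description	Applies only to the JDBC and OCI Direct Path output formats. The default number of records inserted in a batch for each round trip to the database. Specify a value greater than 1 to override the default value. If the specified value is less than 1, this property takes the default value. There is no maximum value, but very large batch sizes are not recommended because they increase the memory footprint without much performance gain.
-	-
Name	`oracle.hadoop.loader.connection.sessionTimeZone`
Type	String
Default	LOCAL
Description	Changes the session time zone of the database connection. Valid values are: [+|-]hh:mm, the hours and minutes of offset from UTC; LOCAL, the JVM default time zone; time_zone_region, a valid time zone region. This property also determines the default time zone used to parse input data that is loaded into `TIMESTAMP`, `TIMESTAMP WITH TIME ZONE`, and `TIMESTAMP WITH LOCAL TIME ZONE` database columns.
-	-
Name	`oracle.hadoop.loader.output.dirpathBufsize`
Type	Integer
Default	131072
Description	Sets the size, in bytes, of the direct path stream buffer for `OCIOutputFormat`. If needed, the value is rounded up to the next multiple of 8 KB.
-	-
Name	`oracle.hadoop.loader.output.fieldTerminator`
Type	String
Default	, (comma)
Description	A single character that separates fields for `DelimitedTextOutputFormat`. Alternative notation: \uHHHH, where HHHH is the character's UTF-16 encoding.
-	-
Name	`oracle.hadoop.loader.output.initialFieldEncloser`
Type	String
Default	None
Description	When this value is set, each field is always enclosed between the specified character and `${oracle.hadoop.loader.output.trailingFieldEncloser}`. If set, the value must be a single character or \uHHHH (where HHHH is the character's UTF-16 encoding). Either set both `${oracle.hadoop.loader.output.initialFieldEncloser}` and `${oracle.hadoop.loader.output.trailingFieldEncloser}`, or set neither. A zero-length value means no encloser (the default). Use this property when a field might contain the `fieldTerminator`. If a field might also contain the `trailingFieldEncloser`, set the `escapeEnclosers` property to `true`.
-	-
Name	`oracle.hadoop.loader.output.trailingFieldEncloser`
Type	String
Default	None
Description	When this value is set, each field is always enclosed between `${oracle.hadoop.loader.output.initialFieldEncloser}` and the character specified by this property. If set, the value must be a single character or \uHHHH (where HHHH is the character's UTF-16 encoding). Either set both `${oracle.hadoop.loader.output.initialFieldEncloser}` and `${oracle.hadoop.loader.output.trailingFieldEncloser}`, or set neither. A zero-length value means no encloser (the default). Use this property when a field might contain the `fieldTerminator`. If a field might also contain the `trailingFieldEncloser`, set the `escapeEnclosers` property to `true`.
-	-
Name	`oracle.hadoop.loader.output.escapeEnclosers`
Type	Boolean
Default	false
Description	When this is set to `true` and both the initial and trailing field enclosers are set, fields are scanned and embedded trailing enclosers are escaped. Use this option when field values might contain the trailing encloser.
-	-
Name	`oracle.hadoop.loader.input.fieldTerminator`
Type	String
Default	, (comma)
Description	A single character that separates fields for `DelimitedTextInputFormat`. Alternative notation: \uHHHH, where HHHH is the character's UTF-16 encoding.
-	-
Name	`oracle.hadoop.loader.input.initialFieldEncloser`
Type	String
Default	None
Description	When this value is set, fields can be enclosed between the specified character and `${oracle.hadoop.loader.input.trailingFieldEncloser}`. If set, the value must be a single character or \uHHHH (where HHHH is the character's UTF-16 encoding). Either set both `${oracle.hadoop.loader.input.initialFieldEncloser}` and `${oracle.hadoop.loader.input.trailingFieldEncloser}`, or set neither. A zero-length value means no encloser (the default).
-	-
Name	`oracle.hadoop.loader.input.trailingFieldEncloser`
Type	String
Default	None
Description	When this value is set, fields can be enclosed between `${oracle.hadoop.loader.input.initialFieldEncloser}` and the specified character. If set, the value must be a single character or \uHHHH (where HHHH is the character's UTF-16 encoding). Either set both `${oracle.hadoop.loader.input.initialFieldEncloser}` and `${oracle.hadoop.loader.input.trailingFieldEncloser}`, or set neither. A zero-length value means no encloser (the default).
-	-
Name	`oracle.hadoop.loader.input.fieldNames`
Type	Comma-separated list of strings
Default	F0,F1,F2,...
Description	The names assigned to the input fields. The names are used to create the Avro schema for the record. The strings must be valid JSON name strings.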
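The three value forms of `oracle.hadoop.loader.connection.sessionTimeZone` in Table 3-4 map naturally onto java.time types. The following sketch is an assumption-laden illustration, not the loader's implementation: the class SessionTimeZone is hypothetical, LOCAL is taken as the JVM default zone, an offset must carry an explicit + or - sign to be accepted by ZoneId.of, and a TIMESTAMP input value is assumed to be an ISO local date-time such as 2012-01-01T10:00.

```java
import java.time.DateTimeException;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

/**
 * Hypothetical sketch of how a sessionTimeZone value could be interpreted with java.time:
 * LOCAL is the JVM default zone, "+hh:mm" or "-hh:mm" is a UTC offset, and any other value
 * is treated as a region ID. Not the loader's implementation.
 */
final class SessionTimeZone {
    private SessionTimeZone() {}

    static ZoneId resolve(String value) {
        if (value == null || value.equals("LOCAL")) {
            return ZoneId.systemDefault();           // the default, LOCAL
        }
        try {
            return ZoneId.of(value.trim());          // offsets such as +05:30, or regions such as Europe/Paris
        } catch (DateTimeException e) {
            throw new IllegalArgumentException("invalid session time zone: " + value, e);
        }
    }

    /** Interprets a TIMESTAMP-style input value (ISO local date-time) in the session time zone. */
    static Instant interpret(String isoLocalDateTime, String sessionTimeZone) {
        return LocalDateTime.parse(isoLocalDateTime).atZone(resolve(sessionTimeZone)).toInstant();
    }
}
```

For example, with a session time zone of +05:30, the input 2012-01-01T10:00 corresponds to 04:30 UTC on the same day.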

Table 3-5 General Properties of Oracle Loader for Hadoop

Property Name	Description
`mapreduce.inputformat.class`	The name of the class that implements `InputFormat`.
`mapreduce.outputformat.class`	The output option supported by Oracle Loader for Hadoop. Valid values are: `oracle.hadoop.loader.lib.output.DelimitedTextOutputFormat`, which writes data records to delimited text format files such as comma-separated values (CSV) files; `oracle.hadoop.loader.lib.output.JDBCOutputFormat`, which inserts data records into the target table using JDBC; `oracle.hadoop.loader.lib.output.OCIOutputFormat`, which inserts rows into the target table using the Oracle OCI Direct Path interface; `oracle.hadoop.loader.lib.output.DataPumpOutputFormat`, which writes rows into binary format files that are loaded into the target table using an external table.


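Example of Using Oracle Loader for Hadoop

The example in this section uses Oracle Loader for Hadoop in online database mode with JDBC. It consists of the following steps:

Create a table in the database. This example uses the HR.EMPLOYEES table, which is available as part of the HR sample schema in Oracle Database.
Implement an InputFormat class, similar to the examples in the oracle.hadoop.loader.examples package.

Set the configuration properties. The MyLoaderMap.xml document contains the following mapping between the input data fields and the columns of the HR.EMPLOYEES table: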

<?xml version="1.0" encoding="UTF-8"?>
<LOADER_MAP>
        <SCHEMA>HR</SCHEMA>
        <TABLE>EMPLOYEES</TABLE>
        <COLUMN field="empId">EMPLOYEE_ID</COLUMN>
        <COLUMN field="lastName">LAST_NAME</COLUMN>
        <COLUMN field="email">EMAIL</COLUMN>
        <COLUMN field="hireDate" format="MM-dd-yyyy">HIRE_DATE</COLUMN>
        <COLUMN field="jobId">JOB_ID</COLUMN>
</LOADER_MAP>

The configuration properties in MyConf.xml are as follows:

<configuration>
  <property>
    <name>mapreduce.inputformat.class</name>
    <value>[package_name].MyInputFormat</value>
    <description>Name of the class implementing InputFormat</description>
  </property>

  <property>
    <name>mapreduce.outputformat.class</name>
    <value>oracle.hadoop.loader.lib.output.JDBCOutputFormat</value>
    <description>Output mode after the loader job executes on Hadoop</description>
  </property>

  <property>
    <name>oracle.hadoop.loader.loaderMapFile</name>
    <value>MyLoaderMap.xml</value>
    <description>The loaderMap file specifying the mapping of input data
     fields to the table columns</description>
  </property>

  <property>
    <name>oracle.hadoop.loader.connection.user</name>
    <value>HR</value>
    <description>Name of the user connecting to the database</description>
  </property>

  <property>
    <name>oracle.hadoop.loader.connection.password</name>
    <value>[HR password]</value>
    <description>Password of the user connecting to the database</description>
  </property>

  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value>jdbc:oracle:thin:@//example.com:1521/serviceName</value>
    <description>Database connection string</description>
  </property>
</configuration>

Invoke OraLoader:

bin/hadoop jar oraloader.jar oracle.hadoop.loader.OraLoader \
  -libjars avro-1.4.1.jar,MyInputFormat.jar \
  -conf MyConf.xml \
  -fs [<local|namenode:port>] \
  -jt [<local|jobtracker:port>]

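Target Table Characteristics

Oracle Loader for Hadoop supports loading into a single table, called the target table. The target table must exist in the Oracle database. It can contain data or be empty.

Supported Data Types

Oracle Loader for Hadoop supports the following Oracle built-in data types: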

VARCHAR2
CHAR
NVARCHAR2
NCHAR
NUMBER
FLOAT
RAW
BINARY_FLOAT
BINARY_DOUBLE
DATE
TIMESTAMP
TIMESTAMP WITH TIME ZONE
TIMESTAMP WITH LOCAL TIME ZONE
INTERVAL YEAR TO MONTH
INTERVAL DAY TO SECOND

The target table can contain columns with unsupported data types, but these columns must be nullable or otherwise set to a value.

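Supported Partitioning Strategies

Oracle Loader for Hadoop supports the following single-level and composite partitioning strategies:

Range
List
Hash
Interval
Range-Range
Range-Hash
Range-List
List-Range
List-Hash
List-List
Hash-Range
Hash-Hash
Hash-List
Interval-Range
Interval-Hash
Interval-List

Oracle Loader for Hadoop does not support tables that use reference partitioning or virtual column-based partitioning.

During a load, Oracle Loader for Hadoop enforces NOT NULL constraints. It does not enforce any other constraints.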

Loader Map XML Schema Definition

This is the XML Schema Definition (XSD) for the loader map, which specifies the columns that are loaded into the target table.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema elementFormDefault="qualified" attributeFormDefault="unqualified"
           xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attributeGroup name="columnAttrs">
    <xs:annotation>
      <xs:documentation>Column attributes define how to map input fields to the
                        database column. field - is the name of the field in the
                        IndexedRecord input object. The field name need not be
                        unique. This means that the same input field can map to
                        different columns in the database table. format - is a
                        format string for interpreting the input. For example,
                        if the field is a date then the format is a date format
                        string suitable for interpreting dates</xs:documentation>
    </xs:annotation>
    <xs:attribute name="field" type="xs:token" use="optional"/>
    <xs:attribute name="format" type="xs:token" use="optional"/>
  </xs:attributeGroup>
  <xs:simpleType name="TOKEN_T">
    <xs:restriction base="xs:token">
      <xs:minLength value="1"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:element name="LOADER_MAP">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="SCHEMA" type="TOKEN_T" minOccurs="0"/>
        <xs:element name="TABLE" type="TOKEN_T" nillable="false"/>
        <xs:element name="COLUMN" maxOccurs="unbounded" minOccurs="0">
          <xs:annotation>
            <xs:documentation>specifies the database column name that will be
                              loaded. Each column name must be unique.
            </xs:documentation>
          </xs:annotation>
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="TOKEN_T">
                <xs:attributeGroup ref="columnAttrs"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
 </xs:schema>

OraLoader Configuration Properties for Hadoop

This is the oraloader-conf.xml document, which lists the configuration properties of Oracle Loader for Hadoop.

<?xml version="1.0"?>
<!-- 
 Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved. 
 
   NAME
     oraloader-conf.xml 
 
   DESCRIPTION
     Config properties for OLH.  
     
     This file is loaded as the very first conf resource.
     Properties without default values are commented out.
-->
<configuration>
  <property>
    <name>oracle.hadoop.loader.libjars</name>
    <value>${oracle.hadoop.loader.olh_home}/jlib/ojdbc6.jar,
${oracle.hadoop.loader.olh_home}/jlib/orai18n.jar,
${oracle.hadoop.loader.olh_home}/jlib/orai18n-utility.jar,
${oracle.hadoop.loader.olh_home}/jlib/orai18n-mapping.jar,
${oracle.hadoop.loader.olh_home}/jlib/orai18n-collation.jar,
${oracle.hadoop.loader.olh_home}/jlib/oraclepki.jar,
${oracle.hadoop.loader.olh_home}/jlib/osdt_cert.jar,
${oracle.hadoop.loader.olh_home}/jlib/osdt_core.jar,
${oracle.hadoop.loader.olh_home}/jlib/commons-math-2.2.jar,
${oracle.hadoop.loader.olh_home}/jlib/jackson-core-asl-1.5.2.jar,
${oracle.hadoop.loader.olh_home}/jlib/jackson-mapper-asl-1.5.2.jar,
${oracle.hadoop.loader.olh_home}/jlib/avro-1.5.4.jar,
${oracle.hadoop.loader.olh_home}/jlib/avro-mapred-1.5.4.jar</value> 
    <description>Comma separated list of library jar files. These jars get 
                 injected into the command-line arguments under the 
                 GenericOptionsParser's "-libjars" option. When a "-libjars" 
                 option is used as a command-line argument, then this list of
                 jars is prepended to the list following "-libjars". Users can
                 distribute their application jars using this property in place
                 of, or in combination with, the "-libjars" option.</description>
  </property>

  <property>
    <name>oracle.hadoop.loader.sharedLibs</name>
    <value>${oracle.hadoop.loader.olh_home}/lib/libolh11.so,
${oracle.hadoop.loader.olh_home}/lib/libclntsh.so.11.1,
${oracle.hadoop.loader.olh_home}/lib/libnnz11.so,
${oracle.hadoop.loader.olh_home}/lib/libociei.so</value>
  </property>

  <property>
    <name>oracle.hadoop.loader.olh_home</name>
    <value/>
    <description>
      A path to the OLH_HOME on the node where the OraLoader job
      is initiated. OraLoader uses this path to locate required libraries.
      If this property is not set, OraLoader will use the value in the environment
      variable OLH_HOME.
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.jobName</name>
    <value>OraLoader</value>
    <description>
      Hadoop job name for this Oracle loader job. Used as input for
      the Job.setJobName() method.
    </description>
  </property>

  <property> 
    <name>oracle.hadoop.loader.targetTable</name>
    <value/>
    <description>
      A schema-qualified name for the table to be loaded. Use this 
      property to indicate that all columns of the table will be 
      loaded and that the names of the input fields match the 
      column names. This property takes precedence over the
      oracle.hadoop.loader.loaderMapFile property. The default 
      value is null.
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.loaderMapFile</name>
    <value/>
    <description>
      Path to the loader map file. Use a file:// scheme to indicate a local file.
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.tableMetadataFile</name>
    <value/>
    <description>
      Path to the target table metadata file. Use this property when
      running in disconnected mode. The table metadata file is
      created by running the OraLoaderMetadata utility. 
      Use a file:// scheme to indicate a local file.
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.olhcachePath</name>
    <value>${mapred.output.dir}/../olhcache</value>
    <description>
      Path to a directory where Oracle Loader for Hadoop can create
      files that will be loaded into the DistributedCache.
      Unique file names are generated every time, so if jobs reuse the same
      olhcache directory, it grows with each run; empty it periodically.
   
      The default value is a directory called 'olhcache' in the parent directory
      of the job's output directory (i.e. ${mapred.output.dir}).
      
      In distributed mode, the value must be an HDFS path
      (see the Javadoc for org.apache.hadoop.filecache.DistributedCache).
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.loadByPartition</name>
    <value>true</value>
    <description>
      Instructs the output format to perform a partition-aware load.
      For DelimitedText output format, this option controls whether the 
      keyword "PARTITION" appears in the generated .ctl file(s).
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.extTabDirectoryName</name>
    <value>OLH_EXTTAB_DIR</value>
    <description>
      The name of the Oracle directory object for the external table's
      LOCATION data files. This property applies only to the DelimitedText
      and DataPump output formats.
    </description>
  </property> 
 
  <property>
    <name>oracle.hadoop.loader.sampler.enableSampling</name>
    <value>true</value>
    <description>
      Indicates whether the sampling feature is enabled. 
      Set the value to false to disable this feature.
    </description>
  </property>
    
  <property>
    <name>oracle.hadoop.loader.enableSorting</name>
    <value>true</value>
    <description>
      Indicates whether output records within each reducer group 
      should be sorted by the primary key for the table.
    </description>
  </property>
  
  <property>
     <name>oracle.hadoop.loader.configuredCounters</name>
     <value>MAPPER,OUTPUT</value>
     <description>
       Turns ON Oracle Loader for Hadoop counters by category. The value is a
       comma separated list of zero or more of the following keywords: 
       MAPPER, REDUCER, OUTPUT, and SAMPLER.
       
       Note that the input error counters (displayed in the 
       "map phase counters" section of the final report) are always on, 
       regardless of the presence of the keyword MAPPER in this list.
 
       Newer releases of Hadoop (0.20.203, CDH3u2) impose a hard limit on the
       total number of counters a job can use (see property 
       mapreduce.job.counters.limit in mapred-site.xml). Note that this 
       limit cannot be changed on a per-job basis, and the cluster needs
       to be restarted after the property has been updated on all nodes.
 
       In order to turn off all the Oracle Loader for Hadoop specific counters, 
       set this property's value to an empty list using either:
 
       -D oracle.hadoop.loader.configuredCounters= 
 
       or
 
       <property>
         <name>oracle.hadoop.loader.configuredCounters</name>
         <value>,</value>
       </property>
 
     </description>
  </property>
  
  <!-- CONNECTION properties -->
  
  <property>
    <name>oracle.hadoop.loader.connection.url</name>
    <value/>
    <description>
      Specifies the URL of the database connection string. This property 
      takes precedence and overrides all other connection properties.
    
      If Oracle Wallet is configured as an external password store,
      the property value must start with the driver prefix: jdbc:oracle:thin:@ 
      and the db_connect_string must exactly match the credential defined in the 
      wallet.
 
        Example 1 (using Oracle Net syntax):
        jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=
            (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521)))
                     (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))
        
        Example 2 (using a TNS entry):
          jdbc:oracle:thin:@myTNS
        
        - Also see documentation for
          oracle.hadoop.loader.connection.wallet_location          
    
      If Oracle Wallet is NOT used, then set the following conf properties:      
      oracle.hadoop.loader.connection.url
 
        Examples of connection URL styles:
          thin-style: 
            jdbc:oracle:thin:@//myhost:1521/my_db_service_name  
            jdbc:oracle:thin:user/password@//myhost:1521/my_db_service_name
         
          Oracle Net:
            jdbc:oracle:thin:@(DESCRIPTION=(ADDRESS_LIST=
                (ADDRESS=(PROTOCOL=TCP)(HOST=myhost)(PORT=1521)))
                         (CONNECT_DATA=(SERVICE_NAME=my_db_service_name)))
        
          TNSEntry Name:
            jdbc:oracle:thin:@myTNSEntryName
        
     AND 
     oracle.hadoop.loader.connection.user  
     oracle.hadoop.loader.connection.password 
 
     If OCIOutputFormat is configured and Oracle Wallet is not used, then the
     user name and password must be specified in these separate properties.
       
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.connection.user</name>
    <value/>
    <description>Name for the database login.</description>
  </property>

  <property>
    <name>oracle.hadoop.loader.connection.password</name>
    <value/>
    <description>Password for the connecting user.</description>
  </property>

  <property>
    <name>oracle.hadoop.loader.connection.wallet_location</name>
    <value/>
    <description>File path to an Oracle wallet where the connection information
     is stored. This property is used only for JDBC connections. For JDBC output
     format, when using Oracle Wallet as an external password store, set the
     following two properties:
 
     - oracle.hadoop.loader.connection.wallet_location
     - oracle.hadoop.loader.connection.url

     Or, set the following three properties:

     - oracle.hadoop.loader.connection.wallet_location
     - oracle.hadoop.loader.connection.tnsEntryName
     - oracle.hadoop.loader.connection.tns_admin 

     For the OCI output format, set the 
     oracle.hadoop.loader.connection.tns_admin property to indicate 
     wallet location.

     Note that JDBC connections are always made for online loads, even when  
     the OCI Direct Path output format is specified. The same wallet can be 
     used for both connection types.
    </description>
  </property>
  
  <property>
    <name>oracle.hadoop.loader.connection.tnsEntryName</name>
    <value/>
    <description>Specifies a TNS entry name defined in the tnsnames.ora file.
    This property is used together with the 
    oracle.hadoop.loader.connection.tns_admin property.
    </description>
  </property>  
  
  <property>
    <name>oracle.hadoop.loader.connection.tns_admin</name>
    <value/>
    <description>File path to a directory containing
      SQL*Net configuration files such as sqlnet.ora and tnsnames.ora.
      If this property is not set, the value of the environment
      variable TNS_ADMIN will be used. Define this property in order
      to use TNS entry names in database connect strings. 
      This property must be defined when using an Oracle Wallet with OCI
      connections.
    </description>
  </property>  
 
  <property>
    <name>oracle.hadoop.loader.connection.defaultExecuteBatch</name>
    <value>100</value>
    <description>
       Applicable only for JDBC and OCI output formats. The default
       value for the number of records to be inserted in a batch for
       each trip to the database. Specify a value >= 1 to
       override the default value. If the specified value is less than 1,
       this property assumes the default value. Though the maximum
       value is unlimited, using very large batch sizes is not
       recommended, as it results in a large memory footprint without
       much increase in performance.
     </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.connection.sessionTimezone</name>
    <value>LOCAL</value>
    <description>
      This property is used to alter the session time zone for 
      database connections.  Valid values are:
      
        [+|-] hh:mm      - hours and minutes before or after UTC
        LOCAL            - the default timezone of the JVM 
        time_zone_region - a valid time zone region
      
      This property also determines the default timezone when parsing
      input data that will be loaded to database column types:
      TIMESTAMP, TIMESTAMP WITH TIME ZONE and TIMESTAMP WITH LOCAL TIME ZONE
    </description>
  </property>
  
  <!-- properties for OCIOutputFormat -->
  <property>
    <name>oracle.hadoop.loader.output.dirpathBufsize</name>
    <value>131072</value>
    <description>
      This property sets the size, in bytes, of the direct path stream
      buffer for OCIOutputFormat. If needed, the value is rounded up to the
      next multiple of 8 KB.
    </description>
  </property>

  <property>
    <name>oracle.hadoop.loader.compressionFactors</name>
    <value>BASIC=5.0,OLTP=5.0,QUERY_LOW=10.0,QUERY_HIGH=10.0,ARCHIVE_LOW=10.0,ARCHIVE_HIGH=10.0</value>
    <description>
      This property is used to define the compression factor for different types
      of compression. The format is a comma separated list of name=value pairs
      where name is one of BASIC, OLTP, QUERY_LOW, QUERY_HIGH, ARCHIVE_LOW, or
      ARCHIVE_HIGH.  Value is a decimal number.
    </description>
  </property>
  
  <!-- properties for DelimitedTextOutputFormat -->
    <property>
      <name>oracle.hadoop.loader.output.fieldTerminator</name>
      <value>,</value>
      <description>
        A single character to delimit fields for DelimitedTextOutputFormat.
        Alternate representation: \uHHHH (where HHHH is the character's UTF-16 
        encoding).
      </description>
    </property>

    <property>
      <name>oracle.hadoop.loader.output.initialFieldEncloser</name>
      <value></value>
      <description>
        When this value is set, fields are always enclosed between the
        specified character and 
        ${oracle.hadoop.loader.output.trailingFieldEncloser}.
        
        If this value is set, it must be either a single character, or \uHHHH 
        (where HHHH is the character's UTF-16 encoding).
        
        ${oracle.hadoop.loader.output.initialFieldEncloser} and 
        ${oracle.hadoop.loader.output.trailingFieldEncloser} must either
        both be set or both be unset.
        A zero-length value (the default) means no enclosers.

        Use enclosers when a field value may contain the fieldTerminator.
        If a field value may also contain the trailingFieldEncloser,
        set the escapeEnclosers property to true.
      </description>
    </property>

    <property>
      <name>oracle.hadoop.loader.output.trailingFieldEncloser</name>
      <value></value>
      <description>
        When this value is set, fields are always enclosed between 
        ${oracle.hadoop.loader.output.initialFieldEncloser} and the
        specified character for this property.
 
        If this value is set, it must be either a single character, or \uHHHH 
        (where HHHH is the character's UTF-16 encoding).
        
        ${oracle.hadoop.loader.output.initialFieldEncloser} and 
        ${oracle.hadoop.loader.output.trailingFieldEncloser} must either
        both be set or both be unset.
        A zero-length value (the default) means no enclosers.

        Use enclosers when a field value may contain the fieldTerminator.
        If a field value may also contain the trailingFieldEncloser,
        set the escapeEnclosers property to true.
      </description>
    </property>
 
    <property>
      <name>oracle.hadoop.loader.output.escapeEnclosers</name>
      <value>false</value>
      <description>
        When this is set to true and both initial and trailing field enclosers 
        are set, fields will be scanned, and embedded trailing encloser 
        characters will be escaped. Use this option when some of the field
        values may contain the trailing encloser character.
      </description>
    </property>
 
  <!-- properties for DelimitedTextInputFormat -->
    <property>
      <name>oracle.hadoop.loader.input.fieldTerminator</name>
      <value>,</value>
      <description>
        A single character to delimit fields for DelimitedTextInputFormat.
        Alternate representation: \uHHHH (where HHHH is the character's UTF-16 
        encoding).
      </description>
    </property>

    <property>
      <name>oracle.hadoop.loader.input.initialFieldEncloser</name>
      <value></value>
      <description>
        When this value is set, fields are allowed to be enclosed
        between the specified character and 
        ${oracle.hadoop.loader.input.trailingFieldEncloser}.
        
        If this value is set, it must be either a single character, or \uHHHH 
        (where HHHH is the character's UTF-16 encoding).
        
        ${oracle.hadoop.loader.input.initialFieldEncloser} and 
        ${oracle.hadoop.loader.input.trailingFieldEncloser} must either
        both be set or both be unset.
        A zero-length value (the default) means no enclosers.
      </description>
    </property>

    <property>
      <name>oracle.hadoop.loader.input.trailingFieldEncloser</name>
      <value></value>
      <description>
        When this value is set, fields are allowed to be enclosed
        between ${oracle.hadoop.loader.input.initialFieldEncloser} 
        and the specified character.
        
        If this value is set, it must be either a single character, or \uHHHH 
        (where HHHH is the character's UTF-16 encoding).
        
        ${oracle.hadoop.loader.input.initialFieldEncloser} and 
        ${oracle.hadoop.loader.input.trailingFieldEncloser} must either
        both be set or both be unset.
        A zero-length value (the default) means no enclosers.
      </description>
    </property>
 
    <!--Properties for tuning the sampler-->
    <!-- set numThreads > 1 for large datasets -->
    <property>
      <name>oracle.hadoop.loader.sampler.numThreads</name>
      <value>5</value>
      <description>Number of sampler threads.  
      This value should be set based on the processor and memory resources 
      available to the job tracker node. A higher number of sampler threads 
      implies higher concurrency in sampling.
      The default value is 5 threads.  
      </description>
    </property>
 
  <property>
     <name>oracle.hadoop.loader.sampler.maxLoadFactor</name>
     <value>0.05</value>
     <description> 
       The maximum acceptable reducer load factor.
       In a perfectly load-balanced job, every reducer is assigned 
       an equal amount of work (or load). 
       The load factor is the relative overload of a reducer, that is,
       (assigned load - ideal load) / ideal load.
       For example, a value of 0.05 indicates that it is acceptable for 
       reducers to be assigned up to 5% more data than their ideal load. 
       If load balancing is successful, it guarantees this 
       maximum load factor at the specified confidence
       (see oracle.hadoop.loader.sampler.loadCI).
       Default = 0.05; another common value is 0.1.
     </description>
  </property>
 
  <property>
    <name>oracle.hadoop.loader.sampler.loadCI</name>
    <value>0.95</value>
    <description> 
      The confidence level for the specified 
      maximum reducer load factor.
      (See oracle.hadoop.loader.sampler.maxLoadFactor)
      Default = 0.95, other common values = 0.90, 0.99
    </description>
  </property>
 
  <property>
    <name>oracle.hadoop.loader.sampler.minSplits</name>
    <value>5</value>
    <description>
      The minimum number of splits that will be 
      read by the sampler. If the total number of splits 
      is less than this value, then the sampler will read
      all splits. Splits may be read partially. 
      A non-positive value is equivalent to minSplits=1. 
      The default value is 5.
    </description>
  </property>
 
  <property>
    <name>oracle.hadoop.loader.sampler.hintMaxSplitSize</name>
    <value>1048576</value>
    <description> 
      The sampler sets Hadoop configuration parameter
      mapred.max.split.size to this value before it calls the InputFormat's 
      getSplits() method.
      The value of mapred.max.split.size is only set to this value for the 
      duration of sampling, it is not changed in the actual job 
      configuration. Some InputFormats (e.g. FileInputFormat) use the 
      maximum split size as a hint to determine the number of splits 
      returned by getSplits(). Smaller split sizes mean that more chunks of
      data are sampled at random, which improves sampling quality. While large
      splits are better for I/O performance, they are not necessarily better
      for sampling. Set this value small enough for good sampling, but no
      smaller: extremely small values can cause inefficient I/O and can make
      getSplits() run out of memory by returning too many splits.
      The recommended minimum value for this property is 1048576 bytes (1 MB).
      This value can be increased for larger datasets (e.g. tens of terabytes) 
      or if the InputFormat's getSplits() method throws an OutOfMemoryError.       
      If the specified value is less than 1, this property is ignored. 
      The default value is 1048576 bytes (1 MB).
    </description>
  </property>
 
  <property>
    <name>oracle.hadoop.loader.sampler.hintNumMapTasks</name>
    <value>100</value>
    <description> 
      The sampler sets Hadoop configuration parameter
      mapred.map.tasks to this value for the duration of sampling. 
      The value of mapred.map.tasks is not changed in the actual job 
      configuration. Some InputFormats (e.g. DBInputFormat) use the 
      number of map tasks parameter as a hint to determine the number of 
      splits returned by getSplits(). Higher values mean that more chunks
      of data are sampled at random, which improves sampling quality.
      This value should typically be increased for large datasets (e.g. more 
      than a million rows), while keeping in mind that extremely large values
      can cause the InputFormat's getSplits() method to run out of memory by
      returning too many splits.
      If the specified value is less than 1, this property is ignored.  
      The default value is 100.
    </description>
  </property>
 
  <property>
    <name>oracle.hadoop.loader.sampler.maxSamplesPct</name>
    <value>0.01</value>
    <description> 
      This property specifies the maximum amount of data to sample, as a
      fraction of the total amount of data. In general, the 
      sampler will stop sampling if any one of the following is true:
      (1) it has collected the minimum number of samples 
          required for optimal load-balancing, or 
      (2) the percent of data sampled exceeds 
          oracle.hadoop.loader.sampler.maxSamplesPct, or 
      (3) the number of bytes sampled exceeds 
          oracle.hadoop.loader.sampler.maxHeapBytes.
      If this parameter is set to a negative value, 
      condition (2) is not imposed.
      The default value is 0.01 (1%).
    </description>
  </property>
 
  <property>
    <name>oracle.hadoop.loader.sampler.maxHeapBytes</name>
    <value>-1</value>
    <description> 
      This value specifies the maximum memory available to 
      the sampler in bytes. In general, the sampler will 
      stop sampling when any one of these conditions is true:
      (1) it has collected the minimum number of samples 
          required for optimal load-balancing, or 
      (2) the percent of data sampled exceeds 
          oracle.hadoop.loader.sampler.maxSamplesPct, or 
      (3) the number of bytes sampled exceeds 
          oracle.hadoop.loader.sampler.maxHeapBytes.
      If this parameter is set to a negative value, 
      condition (3) is not imposed.
      Default = -1 (no memory restrictions on the sampler).
    </description>
  </property>
  
</configuration>
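
The DelimitedTextOutputFormat properties in the listing above are fieldTerminator, initialFieldEncloser, trailingFieldEncloser and escapeEnclosers. Their combined effect can be illustrated with a small Java sketch. It accepts either a single character or the backslash-u form with four hex digits, requires both enclosers or neither, and encloses every field when enclosers are set. The property descriptions do not say how escapeEnclosers escapes an embedded trailing encloser. The sketch assumes the encloser is doubled, which is how SQL*Loader reads two consecutive enclosure characters inside an enclosed field. The class and method names are hypothetical; this is not the DelimitedTextOutputFormat implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

class DelimitedSketch {
    /** Parses a one-character value, or a backslash-u escape with four hex digits (one UTF-16 unit). */
    static char parseChar(String spec) {
        if (spec.length() == 1) {
            return spec.charAt(0);
        }
        if (spec.length() == 6 && spec.startsWith("\\u")) {
            return (char) Integer.parseInt(spec.substring(2), 16);
        }
        throw new IllegalArgumentException("expected one character or \\uHHHH: " + spec);
    }

    /**
     * Joins fields with the terminator. When both enclosers are set, every field is enclosed.
     * ASSUMPTION: with escape == true an embedded trailing encloser is doubled.
     */
    static String format(List<String> fields, char terminator,
                         Character initial, Character trailing, boolean escape) {
        if ((initial == null) != (trailing == null)) {
            throw new IllegalArgumentException("set both enclosers or neither");
        }
        StringJoiner line = new StringJoiner(String.valueOf(terminator));
        for (String field : fields) {
            if (initial == null) {
                line.add(field); // no enclosers: the field is written as-is
                continue;
            }
            String t = String.valueOf(trailing);
            String body = escape ? field.replace(t, t + t) : field;
            line.add(initial + body + t);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        char comma = parseChar("\\u002C"); // U+002C is ','
        List<String> row = Arrays.asList("42", "Smith, J.", "6\" pipe"); // hypothetical record
        System.out.println(format(row, comma, '"', '"', true));
        // prints "42","Smith, J.","6"" pipe"
    }
}
```

Without enclosers, the second field in the example would be split at its embedded comma when the file is loaded. Without escaping, the embedded quote in the third field would end that field early.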

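The sampler properties define a load factor target (maxLoadFactor) and three stopping conditions (maxSamplesPct, maxHeapBytes). The following Java sketch restates those definitions for an assignment of records to reducers that is already known. The load factor is (assigned - ideal) / ideal, a negative maxSamplesPct or maxHeapBytes disables its condition, and maxSamplesPct is compared as a fraction (0.01 means 1%). The sketch does not model the sampling itself or the confidence level set by loadCI, and it is not Oracle Loader for Hadoop's sampler. The class and method names are hypothetical.

```java
class SamplerSketch {
    /** Relative overload of one reducer: (assigned - ideal) / ideal, with ideal = total / reducers. */
    static double loadFactor(long assigned, long total, int reducers) {
        double ideal = (double) total / reducers;
        return (assigned - ideal) / ideal;
    }

    /** True when no reducer exceeds its ideal load by more than maxLoadFactor. */
    static boolean balanced(long[] assigned, double maxLoadFactor) {
        long total = 0;
        for (long a : assigned) {
            total += a;
        }
        for (long a : assigned) {
            if (loadFactor(a, total, assigned.length) > maxLoadFactor) {
                return false;
            }
        }
        return true;
    }

    /**
     * Stop when (1) enough samples were collected, (2) the sampled fraction exceeds
     * maxSamplesPct, or (3) the sampled bytes exceed maxHeapBytes.
     * A negative limit disables the corresponding condition.
     */
    static boolean shouldStop(boolean enoughSamples, double sampledFraction, double maxSamplesPct,
                              long sampledBytes, long maxHeapBytes) {
        if (enoughSamples) return true;                                          // (1)
        if (maxSamplesPct >= 0 && sampledFraction > maxSamplesPct) return true;  // (2)
        return maxHeapBytes >= 0 && sampledBytes > maxHeapBytes;                 // (3)
    }

    public static void main(String[] args) {
        long[] recordsPerReducer = {104, 100, 100, 96}; // hypothetical counts
        System.out.println(balanced(recordsPerReducer, 0.05)); // prints true
        // Defaults: maxSamplesPct = 0.01, maxHeapBytes = -1 (no memory limit).
        System.out.println(shouldStop(false, 0.02, 0.01, 1_000_000L, -1L)); // prints true
    }
}
```

With four reducers and 400 records, the ideal load is 100. A reducer holding 104 records has a load factor of 0.04, within the default of 0.05; a reducer holding 120 would be 0.2 and would break the target.
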
Third-Party Licenses for Bundled Software

Oracle Loader for Hadoop installs the following third-party products:

Apache Avro
Apache Commons Mathematics Library
Jackson JSON Processor

Oracle Loader for Hadoop includes Oracle 11g Release 2 (11.2) client libraries. For information about third-party products included with Oracle Database 11g Release 2 (11.2), see Oracle Database Licensing Information.

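Unless otherwise specifically noted, or as required under the terms of the third-party license (for example, LGPL), the licenses and statements in this section, including all statements regarding Apache Licensed Code, are intended as notices only.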

Apache Licensed Code

The following is included as a notice in compliance with the terms of the Apache 2.0 License, and applies to all programs licensed under the Apache 2.0 license:

You may not use the identified files except in compliance with the Apache License, Version 2.0 (the "License.")

You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

A copy of the license is also reproduced below.

See the License for the specific language governing permissions and limitations under the License.

Apache License

Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:
(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work

To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product includes software developed by The Apache Software Foundation (http://www.apache.org/), listed below.

Apache Avro avro-1.5.4.jar

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Apache Commons Mathematics Library 2.2

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Jackson JSON Library 1.5.2

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0