Record adapters read and write record data. A record adapter describes where the data is located (or will be saved to), the format, and various aspects of processing.

Forge can read source data from a variety of file formats and source systems. Each data source needs a corresponding input record adapter describing the particulars of that source. Based on this information, Forge parses the data and turns it into Endeca records. Input record adapters automatically decompress source data that is compressed in the gzip format.
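The automatic decompression can be pictured as sniffing a file's two-byte gzip signature before opening it. The sketch below is illustrative Python, not Forge's actual implementation; the Latin-1 default mirrors the encoding assumption described for input adapters later in this topic:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream

def open_source(path):
    """Open a source data file, transparently decompressing gzip input.

    Conceptually mirrors how an input record adapter can accept either
    plain or gzip-compressed source data without extra configuration.
    """
    with open(path, "rb") as f:
        magic = f.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(path, "rt", encoding="latin-1")
    return open(path, "rt", encoding="latin-1")
```

Callers read from the returned handle the same way regardless of whether the underlying file was compressed.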

To add an input record adapter to your pipeline:

  1. In the Pipeline Diagram editor, choose New > Record Adapter.

    The Record Adapter editor appears.

  2. In the Name text box, type a unique name for this record adapter.

  3. In the General tab, do the following:

    1. In the Direction frame, choose Input.

    2. In the Format list, choose one of the following: XML, binary, fixed-width, delimited, vertical, document, JDBC adapter, Exchange, ODBC (Windows only), or custom adapter (available only by request from Oracle).

    3. In the URL text box, type the location of the source data.

    4. In the Delimiters frame, if the format is delimited, add row and column delimiters. If the format is vertical, add row, column, and record delimiters.

    5. (Optional) In the Encoding text box, define the encoding of the input data. If Encoding is not set, it is assumed to be Latin-1.

    6. If any of the text boxes in the Java properties frame are made available by your format selection, type in the required information.

    7. (Optional) Check Require Data if you want Forge to exit with an error if the URL does not exist or is empty.

    8. (Optional) Check Filter empty properties. Keep in mind that this attribute applies only to input record adapters and is valid only for the Vertical, Delimited, Fixed-width, and ODBC input formats.

    9. (Optional) Check Multi File if Forge can read data from more than one input file.

    10. Check Maintain State if you are using the Endeca Application Controller (EAC) environment.

    11. Leave Custom Compression Level unchecked. For input adapters, compressed source data is detected and decompressed automatically, so this setting is not used.

  4. Ignore the Sources tab. Its settings are not used by an input record adapter.

  5. (Optional) In the Record Index tab, specify the properties, if any, by which the adapter's records are indexed.

  6. If you are using XSLT to transform your XML into Endeca-compatible XML, in the Transformer tab, specify the type (XSLT) and the location of the stylesheet.

  7. If your format is ODBC, fixed-width, delimited, JDBC, custom, or Exchange, in the Pass Through tab, enter the necessary information.

  8. (Optional) In the Comment tab, add a comment for the component.

  9. Click OK.

The Name text box at the top of the Record Adapter editor holds a unique name for this record adapter.

The Record Adapter editor contains the following tabs:

The General tab contains the following options, with different behavior for input and output adapters:

Direction

  • Input adapter: Required. Set to Input.
  • Output adapter: Required. Set to Output.

Format

  • Input adapter: Required. The format type of the raw data to be loaded: delimited, XML, binary, fixed-width, document, ODBC (Windows only), vertical, JDBC adapter, Exchange, or custom adapter. Your record format affects which delimiter options, if any, are necessary. Note: the custom adapter option is available only by request from Oracle.
  • Output adapter: Required. Can be set to delimited, XML, binary, fixed-width, or vertical.

URL

  • Input adapter: Required for delimited, XML, binary, fixed-width, and vertical input adapters. Location of the file being loaded. The path can be either absolute or relative to the Pipeline.epx file. With an absolute path, the protocol can be specified in RFC 2396 syntax, which usually means a file:/// prefix precedes the path to the data file; relative paths should not specify a protocol. Any paths that are part of this URL are overridden if the Forge --inputDir option is specified. Note: Exchange input adapters also require a URL, but it is specified in a pass-through element using the Pass Throughs tab.
  • Output adapter: Required. Location to which the data will be saved, with the same path caveats as for input adapters.
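The path-resolution rules above can be sketched roughly as follows. The function name, its parameters, and the exact override behavior are illustrative assumptions, not Forge's implementation:

```python
import os
from urllib.parse import unquote, urlparse

def resolve_source_url(url, pipeline_dir, input_dir=None):
    """Resolve a record adapter URL as described in this topic.

    - Absolute URLs may use RFC 2396 syntax (e.g. a file:/// prefix).
    - Relative paths are resolved against the pipeline (.epx) directory.
    - An --inputDir value, if given, overrides the directory portion.
    """
    parsed = urlparse(url)
    if parsed.scheme == "file":
        path = unquote(parsed.path)       # strip the file:/// prefix
    elif os.path.isabs(url):
        path = url
    else:
        path = os.path.join(pipeline_dir, url)
    if input_dir is not None:
        path = os.path.join(input_dir, os.path.basename(path))
    return os.path.normpath(path)
```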

Row, column, and record delimiters

  • Input adapter: Optional. Used only if the data is in delimited or vertical format. Row and Column are used for both formats; Record is used for the vertical format only.
  • Output adapter: Not used.

Java properties

  • Input adapter: Required when made available by your format selection. Note: when running your pipeline through Forge, you can override the Java home and Class path settings using command-line options. See Overriding Java home and class path settings.
  • Output adapter: Not used.

Encoding

  • Input adapter: Optional. Defines the encoding of the input data; several hundred encodings are supported. If Encoding is not set, Latin-1 is assumed. If an incorrect encoding is specified, Forge generates warnings about any characters that are invalid in that encoding. For example, in the ASCII encoding, any character numbered above 127 is considered invalid. Note: this setting is ignored by the XML format, because the encoding is specified in the XML header; by output record adapters; and by the binary format, because encoding applies only to text files.
  • Output adapter: Required. Set to UTF-8.
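The warning behavior for a wrong encoding can be imitated with a small decoding check. This is a loose Python sketch; the function name and warning text are invented for illustration:

```python
def check_encoding(raw_bytes, encoding="latin-1"):
    """Decode raw input, counting bytes invalid in the declared encoding.

    Mimics the behavior described above: with ASCII, any byte above 127
    cannot be decoded and triggers a warning.
    """
    text = raw_bytes.decode(encoding, errors="replace")
    invalid = text.count("\ufffd")  # replacement char marks undecodable bytes
    if invalid:
        print(f"warning: {invalid} character(s) invalid in {encoding}")
    return text, invalid
```

For example, the byte 0xE9 is a valid "é" in Latin-1 but invalid in ASCII.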

Require data

  • Input adapter: Optional. If checked, Forge exits immediately with an error if the URL does not exist or is empty. The error is sent to wherever logging is configured to send errors, typically the console or stderr.
  • Output adapter: Not used.

Filter empty properties

  • Input adapter: Optional. Determines whether source properties with empty property values are assigned to Endeca records. This attribute is valid only for the vertical, delimited, fixed-width, and ODBC input formats. For a filtering example, see Filtering empty properties.
  • Output adapter: Not used.
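The effect of filtering empty properties can be sketched in one line of Python. Treating a record as a name-to-value mapping is an illustrative simplification:

```python
def filter_empty_properties(record):
    """Drop source properties whose values are empty strings.

    Sketch of the Filter empty properties behavior: properties with
    empty values are simply not assigned to the Endeca record.
    """
    return {name: value for name, value in record.items() if value != ""}
```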

Multi file

  • Input adapter: Optional. Specifies whether Forge can read data from more than one input file. If checked, the input URL is interpreted as a pattern, and Forge reads each file matching the pattern in alphabetical order. For example, if the record adapter specifies a URL pattern of "*.update.txt", Forge reads any file in the given directory that has the .update.txt suffix.
  • Output adapter: Not used.
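The Multi File matching above behaves like an ordinary glob with the results sorted. This Python sketch illustrates the idea; it is not Forge's actual matcher:

```python
import glob

def input_files(pattern):
    """Return all files matching an input URL pattern, in alphabetical order.

    Mirrors the Multi File behavior: "*.update.txt" matches every file in
    the directory with the .update.txt suffix, read alphabetically.
    """
    return sorted(glob.glob(pattern))
```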

Maintain state

  • Input adapter: Not used.
  • Output adapter: Optional. If checked, indicates that the value of URL is relative to the Forge --stateDir flag. (This lets you change your state directory using the --stateDir flag without modifying your record adapter configuration.)

Compression level

  • Input adapter: Not used; compression of input files is detected automatically.
  • Output adapter: Optional. Sets the level of compression applied to the record data when it is written to disk. To save disk space, check Custom compression level and slide the bar to the recommended value of 7. Note: compressed data consumes less disk space but takes longer to read and write.
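The space-versus-time trade-off maps directly onto gzip's compression levels. A minimal Python sketch, with level 7 matching the recommendation above (the function name is illustrative):

```python
import gzip

def write_records(path, data, level=7):
    """Write record data gzip-compressed at the given compression level.

    Level 7 is the value recommended above: higher levels save more disk
    space but take longer to write and read.
    """
    with gzip.open(path, "wb", compresslevel=level) as f:
        f.write(data)
```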

Forge can read source data from a variety of file formats.

The Delimited format reads source records that are organized into rows and columns.

Each row is separated from other rows by a row delimiter character, such as the new-line character, and each column is separated from other columns by a column delimiter character, such as a comma or the tab character. The row and column delimiters must not be present within the data itself. For example, if the column delimiter is a comma, no data in a column can contain a comma.

When the source records are read into the Data Foundry, two mappings occur: each row maps to a record, and each column maps to a property.

Properties are trimmed as they are read in. White space on the ends of properties (including the space, tab, new-line, and other characters) is removed. However, white space within a property is preserved.

The records in a delimited file must have identical properties, in terms of number and type, although it is possible for a record to have a null value for a property. You can use the "Filter empty properties" checkbox, on the Record Adapter editor's General tab, to tell the record adapter to ignore properties that have null values.
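The row/column splitting and trimming rules above can be condensed into a short Python sketch. The delimiters and function name are illustrative; in practice they come from the record adapter's General tab:

```python
def parse_delimited(text, row_delim="\n", col_delim=","):
    """Parse delimited source data into records (lists of trimmed values).

    Each row becomes one record. Values are trimmed of surrounding white
    space, but white space inside a value is preserved.
    """
    records = []
    for row in text.split(row_delim):
        if not row.strip():
            continue  # skip blank rows
        records.append([col.strip() for col in row.split(col_delim)])
    return records
```

Note the limitation stated above: this only works if the delimiters never appear inside the data itself.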

The fixed-width format reads source data records from columns of fixed widths.

Each column represents a property and has a width that is a specific number of characters. For example, the first three characters could represent an ID, characters 4 through 10 could represent a name, and so forth. Each row represents a record.
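Slicing one such row is straightforward. This Python sketch uses 0-based slice positions and a hypothetical width map; it illustrates the doc's example (the first three characters an ID, characters 4 through 10 a name):

```python
def parse_fixed_width(line, widths):
    """Split one fixed-width row into trimmed property values.

    widths maps property names to (start, end) character positions,
    expressed as 0-based Python slice bounds.
    """
    return {name: line[start:end].strip()
            for name, (start, end) in widths.items()}
```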

The fixed-width record adapter requires six attributes, which are specified on the Pass Throughs tab; the attribute names must be entered exactly as documented.

The vertical format reads source records stored as property name/value pairs.

Vertical format requires delimiters specifying how to identify each property name, property value, and record. These delimiters (row, column, and record) are defined on the General tab of the Record Adapter editor.

All name/value pairs leading up to a record delimiter are considered part of a single record. The properties for the records in a vertical file format can be of a variable number and type.

Properties are trimmed as they are read in. White space (such as the space, tab, and new-line characters) is removed from both ends of properties, but white space within a property is preserved.
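Putting the vertical-format rules together (name/value pairs accumulate until a record delimiter, with trimming on both ends) gives a sketch like the following. The delimiter values are arbitrary examples; in practice they come from the General tab:

```python
def parse_vertical(text, col_delim="=", row_delim="\n", rec_delim="END"):
    """Parse vertical-format data: one name/value pair per row.

    All pairs up to a record delimiter belong to one record, so records
    may carry a variable number and type of properties.
    """
    records, current = [], {}
    for row in text.split(row_delim):
        row = row.strip()
        if row == rec_delim:
            if current:
                records.append(current)
            current = {}
        elif row:
            name, _, value = row.partition(col_delim)
            current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records
```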


The ODBC format enables the Endeca Information Transformation Layer (ITL) to connect directly to and read records from any database that supports ODBC connections.

The JDBC format enables the Endeca Information Transformation Layer (ITL) to connect to and read records from any JDBC data source.

In addition to name, direction, and format, a JDBC record adapter requires settings on the General and Pass Throughs tabs.

In addition, if the connection requires properties (such as a user name or password), the DB_CONNECT_PROP attribute can also be specified, as many times as necessary, on the Pass Throughs tab.

Note that configuring user name and password parameters varies according to your JDBC driver. For example, with Oracle JDBC Thin drivers, the user name and password parameters are included as part of the DB_URL string rather than as separate DB_CONNECT_PROP values. You may have to refer to the documentation for your JDBC driver type to determine exact configuration requirements.
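For Oracle JDBC Thin drivers, the credentials ride along inside the DB_URL string itself. A sketch of assembling such a URL in Python; the host, port, and SID values in the test are hypothetical, and you should confirm the exact syntax against your driver's documentation:

```python
def thin_db_url(user, password, host, port, sid):
    """Build an Oracle JDBC Thin DB_URL with clear-text credentials embedded.

    Illustrates the point above: the user name and password are part of
    the DB_URL string rather than separate DB_CONNECT_PROP values.
    """
    return f"jdbc:oracle:thin:{user}/{password}@{host}:{port}:{sid}"
```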

Instead of specifying clear-text credentials, you can use the Oracle Credentials Store to supply the database credentials. In that case, instead of including the user name and password in DB_URL or as DB_CONNECT_PROP values, use the CREDENTIALS_KEY pass-through and provide the key name used to retrieve the credentials from the Oracle Credentials Store.

• Name = CREDENTIALS_KEY; Value = the key name required to access the credentials information from the Oracle Credentials Store

The following illustration shows the Pass Throughs tab for a record adapter that is configured to access a JDBC data source through an Oracle JDBC Thin driver using clear text credentials:

Example of a record adapter with JDBC passthrough information.

The following illustration shows the Pass Throughs tab for a record adapter that is configured to access a JDBC data source via an Oracle JDBC Thin driver using Oracle Credentials Store:

The Exchange format allows the Endeca Information Transformation Layer (ITL) to connect to one or more Microsoft Exchange Servers (versions 2000 and beyond) and extract information from specified public folders.

The Exchange format produces one record for each document and each sub-folder contained in the specified public folders. This includes mail messages, calendar items, and generic documents of any format.

In addition to name, direction, and format, the Exchange adapter requires the following attributes on the General and Pass Throughs tabs:

