You can load source data from a variety of formats.
Your Endeca applications will most often read data directly from one or more database systems, or from database extracts. Input components load records in a variety of formats including delimited, JDBC, and XML. Each input component has its own set of configuration properties. One of the most commonly used type of input component loads data stored in delimited format.
The format used as an example in this chapter is a two-dimensional format similar to the tables found in database management systems. Database tables are organized into rows of records, with columns that represent the source properties and property values for each record. (This type of format is often called a rectangular data format.) The source records are stored in a CSV file named FactSales.csv (the file is in the data-in folder of the Latitude Sample Application).
The following image, which shows the beginning lines of the FactSales.csv input file, illustrates how the source data is organized in a two-dimensional format:
You specify the location and format of the source data to be loaded in the LDI reader component in the graph. The reader component passes the data to the Endeca connector, which is configured to connect to either the Data Ingest Web Service (DIWS) or the Bulk Load Interface, both of which reside on the MDEX Engine. The records are then loaded into the MDEX Engine in batches of a pre-configured size. During the ingest operation, each source row is transformed into an Endeca record with a key-value pair for each non-null source column. The MDEX Engine then indexes the records for use during search queries.
You will be using one of the Endeca standard attributes as the primary-key attribute for the records. (The primary-key property is also known as the record spec property.) The primary-key property must be a unique, single-assign property. For more information on primary keys, see the Data Ingest API Guide.
In our sample graph, the FactSales.csv input file does not have a field that contains unique values. Therefore, the Create Spec component creates the FactSales_RecordSpec primary-key attribute by concatenating two attributes. The name of the primary-key attribute will be specified in the Metadata definition for the Edge component.
Although the MDEX Engine will accept attribute names with hyphens (because hyphens are valid NCName characters), the Designer will not accept source property names with hyphens as metadata. Therefore, if you have a source property name such as "Ship-Date", make sure you remove the hyphen from the name.
ComponentID|Color|Size 123|Blue|Medium 456|Blue;Red|Small 789|Red;Black;Silver|Large
In the example, the pipe character (|) is the delimiter between the properties, while the semi-colon (;) is the delimiter between multiple values in a given property. For example, the Color property for record 789 has values of "Red", "Black", and "Silver".
When configuring the Writer component, you can then specify (in the Writer Edit Component dialog) that the semi-colon is to be used as the delimiter for multi-assign properties.
Keep in mind that an Endeca property that is multi-assign must have the mdex-property_IsSingleAssign property set to false in its PDR. The default value of the property is false, which means the property is enabled for multi-assign by default.