Many applications draw on more than one data source. If those sources reside in separate locations, you need to implement a join to combine them into a single record.
For example, an application used to navigate books would have records that contain both title and author information. If the title and author source data reside in different locations, you would need to join them together to create a single record with both pieces of information.
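The book example can be sketched in a few lines of Python. This is a simplified illustration, not how Forge implements joins. It assumes each record is a flat dictionary of property names to values. The `BookId` key property and the `join_books` function are hypothetical, and in this sketch a title with no matching author record is simply dropped.

```python
def join_books(titles, authors, key="BookId"):
    """Combine each title record with the author record that shares its key."""
    # Index the author records by key so each lookup is a single dict access.
    authors_by_key = {rec[key]: rec for rec in authors}
    joined = []
    for title_rec in titles:
        author_rec = authors_by_key.get(title_rec[key])
        if author_rec is not None:
            # Merge both records into one record that holds title and author.
            joined.append({**title_rec, **author_rec})
    return joined


titles = [{"BookId": "1", "Title": "Dune"}, {"BookId": "2", "Title": "Emma"}]
authors = [{"BookId": "2", "Author": "Jane Austen"},
           {"BookId": "1", "Author": "Frank Herbert"}]
books = join_books(titles, authors)
```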
Implementing a join is typically a three-step process.
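To implement a join:
1. Add a record cache for each record source that will feed the join.
2. Add a record assembler to your pipeline.
3. Configure the join in the record assembler.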
With two exceptions, all data sources feeding a join must be record caches, so this section details the procedures from that perspective.
Switch joins do not compare records and therefore do not require record caches for their data sources. You can use any type of record server component (record adapter, record cache, record assembler, Perl manipulator, and so on) as a source for a switch join.
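The following sketch shows why a switch join can read from any record server component. It assumes a simplified model in which a switch join emits every record of each source, one source after another, without reading any keys. `switch_join` and `stream` are hypothetical names, not Developer Studio or Forge APIs.

```python
from itertools import chain
from typing import Iterable, Iterator

Record = dict[str, str]


def switch_join(*sources: Iterable[Record]) -> Iterator[Record]:
    """Yield all records from each source in turn, never comparing records."""
    # No keys are read and no lookups are done, so a source can be any
    # iterable, for example a generator that streams records from a file.
    return chain.from_iterable(sources)


def stream(records: list[Record]) -> Iterator[Record]:
    """Stand-in for an uncached source such as a record adapter."""
    yield from records


combined = list(switch_join(stream([{"Title": "Dune"}]), [{"Title": "Emma"}]))
```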
For a left join in which all of the right sources are record caches, the left source does not require a record cache. This special case is useful for optimizing a left join with a large, unsorted data source.
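A minimal sketch of why this works: if every right source is already indexed by its key, the left source can be read once, in any order, and each left record is matched by lookup alone. This assumes dictionaries as stand-ins for record caches; the function and data names are hypothetical, and the rule that the left value wins on a property conflict is an assumption of the sketch.

```python
from typing import Iterable, Iterator, Mapping

Record = dict[str, str]


def left_join(left: Iterable[Record],
              right_caches: list[Mapping[str, Record]],
              key: str) -> Iterator[Record]:
    """Stream the left source and look up matches in indexed right caches."""
    for left_rec in left:  # each left record is read once and never stored
        out = dict(left_rec)
        for cache in right_caches:
            match = cache.get(left_rec[key])
            if match is not None:
                # Assumption: the left record's value wins on a property conflict.
                for name, value in match.items():
                    out.setdefault(name, value)
        yield out  # left records without a match are still emitted


def unsorted_left():
    """Hypothetical large, unsorted left source."""
    yield {"BookId": "2", "Title": "Emma"}
    yield {"BookId": "9", "Title": "Unknown"}
    yield {"BookId": "1", "Title": "Dune"}


author_cache = {"1": {"BookId": "1", "Author": "Frank Herbert"},
                "2": {"BookId": "2", "Author": "Jane Austen"}}
result = list(left_join(unsorted_left(), [author_cache], "BookId"))
```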
To add a record cache for each record source that will feed the join:
In the Pipeline Diagram editor, click New, and then choose Record > Cache.
The Record Cache editor appears.
In the Name text box, type a unique name for this record cache.
(Optional) In the General tab, you may do the following:
If the cache should load fewer than the total number of records from the record source, type the number of records to load in the Maximum Records text box. This feature is provided for testing purposes.
If you want to merge records with equivalent record index key values into a single record, check Combine Records. For one-to-many or many-to-many joins, leave Combine Records unchecked. Combine Records can produce unexpected results if you do not understand how it works; see Tips and troubleshooting for joins for complete details.
In the Sources tab, select a record source and, optionally, a dimension source.
If a component's record index contains dimension values, you must provide a dimension source. Generally, this is only the case if you are caching data that has been previously processed by Forge.
In the Record Index tab, do the following:
Specify which properties or dimensions you want to use as the record index for this component. The record index you specify for a cache must match the join key that you will specify for that cache in the record assembler.
Indicate whether you want to discard records with duplicate keys. Developer Studio performs a case-insensitive search for duplicate keys.
(Optional) In the Comment tab, add a comment for the component.
Repeat these steps for all record sources that will participate in the join.
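To picture what a record cache's record index does, here is a simplified sketch. It assumes the cache is a dictionary keyed by the record index values in the order you specified, compared case-insensitively as described for the duplicate-key search, and that discarding duplicates keeps the first record seen for a key. `build_record_cache` and the `ISBN` property are hypothetical.

```python
Record = dict[str, str]


def build_record_cache(records: list[Record], index_props: list[str],
                       discard_duplicates: bool = False
                       ) -> dict[tuple[str, ...], list[Record]]:
    """Index records by the values of index_props, in the order given."""
    cache: dict[tuple[str, ...], list[Record]] = {}
    for rec in records:
        # Casefold key values so duplicate detection ignores case.
        key = tuple(rec.get(p, "").casefold() for p in index_props)
        if key in cache:
            if discard_duplicates:
                continue  # assumption: the first record with a key is kept
            cache[key].append(rec)
        else:
            cache[key] = [rec]
    return cache


books = [{"ISBN": "0-441-A"}, {"ISBN": "0-441-a"}, {"ISBN": "0-14-B"}]
```

The join key you later define for this cache in the record assembler would have to produce the same tuple of values for a lookup to succeed.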
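The Name field of the Record Cache editor holds the unique name for this record cache.
The Record Cache editor contains the following tabs: General, Sources, Record Index, and Comment.
The General tab contains the following options: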
Option | Description |
---|---|
Maximum records | Optional. Record caches can be set to load a limited number of records, and the Maximum records field specifies that limit. By default it is set to -1, which means the cache loads all records. Set this field only if the cache should load fewer than the total number of records. |
Combine records | Optional. When checked, Forge merges records with the same key values into a single record. Do not check Combine records if you are performing one-to-many or many-to-many joins. |
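The two General tab options can be modeled roughly as follows. This is an illustrative sketch, not Forge's implementation: it assumes records map property names to lists of values, that -1 means "load everything", and that combining appends values not already present. `load_cache` is a hypothetical name.

```python
from itertools import islice
from typing import Iterable

Record = dict[str, list[str]]


def load_cache(records: Iterable[Record], key: str, max_records: int = -1,
               combine_records: bool = False) -> list[Record]:
    """Load records into a cache, optionally limiting and combining them."""
    if max_records >= 0:  # -1 (the default) means load every record
        records = islice(records, max_records)
    if not combine_records:
        return [dict(rec) for rec in records]
    merged: dict[tuple[str, ...], Record] = {}
    for rec in records:
        # Records with the same key values collapse into one record.
        target = merged.setdefault(tuple(rec[key]), {})
        for prop, values in rec.items():
            bucket = target.setdefault(prop, [])
            for value in values:
                if value not in bucket:  # skip values already present
                    bucket.append(value)
    return list(merged.values())


authors = [
    {"BookId": ["1"], "Author": ["Larry Niven"]},
    {"BookId": ["1"], "Author": ["Jerry Pournelle"]},
    {"BookId": ["2"], "Author": ["Jane Austen"]},
]
```

With `combine_records=True`, book 1's two author records become one record with two Author values, which illustrates why Combine records changes the outcome of a one-to-many join.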
The Sources tab contains the following fields:
The Record Index tab allows you to add or remove dimensions or properties used in a component's record index, and to change their order. Record indexes support join functionality. See Join sources must have matching join keys and record indexes for more details.
The Record Index tab contains the following fields:
A record assembler is a pipeline component used to join source records originating from different files.
To add a record assembler to your pipeline:
In the Pipeline Diagram editor, click New, and then choose → .
The Record Assembler editor appears.
In the Name text box, type a unique name for the new record assembler.
In the Sources tab, do the following:
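Select a record source for the assembler. Repeat as necessary to add additional record sources. With two exceptions (switch joins, and the left source of a left join whose right sources are all record caches), record assemblers must use record caches as their source of record data.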
In the Dimension Source list, select a dimension source.
If the key on which a join is performed contains dimension values, you must provide a dimension source. Generally, this is only the case if you are joining data that has already been processed once by Forge.
(Optional) In the Record Index tab, do the following:
An assembler's record index does not affect the join; it only affects the order in which downstream components will retrieve records from the assembler.
(Optional) In the Comment tab, add a comment for the component.
In the Record Join tab, configure the join. See "Configuring the join in the record assembler" for details on this procedure.
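The note that an assembler's record index affects only retrieval order can be sketched like this. It assumes, for illustration only, that the index acts as a sort key over the already joined records; `assembler_output` is a hypothetical name.

```python
from typing import Optional

Record = dict[str, str]


def assembler_output(joined: list[Record],
                     index_props: Optional[list[str]] = None) -> list[Record]:
    """Return joined records, ordered by the assembler's record index if set."""
    if not index_props:
        return list(joined)  # no index: keep the order the join produced
    # The index changes only the retrieval order, never which records exist.
    return sorted(joined, key=lambda rec: tuple(rec.get(p, "") for p in index_props))


joined = [{"Title": "Emma", "Author": "Jane Austen"},
          {"Title": "Dune", "Author": "Frank Herbert"}]
by_title = assembler_output(joined, ["Title"])
```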
The Name field of the Record Assembler editor holds the unique name for this record assembler.
The Record Assembler editor contains the following tabs: Sources, Record Index, Record Join, and Comment.
The Sources tab contains the following fields:
Field | Description |
---|---|
Record sources | Required. A list of the record server components in the project, from which you select the assembler's sources. A record assembler normally contains multiple record sources. |
Dimension source | Required only if the key on which a join is performed contains dimension values. Generally, this is only the case if you are joining data that has already been processed once by Forge. |
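The Record Index tab is optional. It allows you to add or remove dimensions or properties used in the assembler's record index, and to change their order. An assembler's record index does not affect the join; it only determines the order in which downstream components retrieve records from the assembler. For how record indexes relate to join keys on the join sources, see Join sources must have matching join keys and record indexes.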
The Record Index tab contains the following fields:
The Record Join tab contains the following fields:
Field | Description |
---|---|
Join type | A drop-down list of the available join types. |
Multi sub-records | If you are performing a left join, check this option if the left record can be joined to more than one right record. |
Remove duplicate property values | If checked, duplicate property values (same name, value, and children) are not added when records are combined, so the property values of the new record are unique. |
Join entries | List of the record sources to be joined. |
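The Multi sub-records and Remove duplicate property values options can be illustrated with a simplified left join. This sketch assumes that, without Multi sub-records, only the first matching right record is joined, and it compares property values by name and value only, ignoring children. The function, types, and data are hypothetical.

```python
Record = dict[str, list[str]]
Cache = dict[tuple[str, ...], list[Record]]


def left_join(left: list[Record], right_caches: list[Cache], key: str,
              multi_sub_records: bool = False,
              remove_duplicate_values: bool = False) -> list[Record]:
    """Join each left record with matching records from the right caches."""
    results = []
    for left_rec in left:
        combined = {prop: list(values) for prop, values in left_rec.items()}
        for cache in right_caches:
            matches = cache.get(tuple(left_rec[key]), [])
            if not multi_sub_records:
                matches = matches[:1]  # assumption: only the first match joins
            for right_rec in matches:
                for prop, values in right_rec.items():
                    bucket = combined.setdefault(prop, [])
                    for value in values:
                        # Name and value only; child values are not modeled.
                        if remove_duplicate_values and value in bucket:
                            continue
                        bucket.append(value)
        results.append(combined)
    return results


left = [{"BookId": ["1"], "Title": ["The Mote in God's Eye"]}]
authors: Cache = {("1",): [{"BookId": ["1"], "Author": ["Larry Niven"]},
                           {"BookId": ["1"], "Author": ["Jerry Pournelle"]}]}
```

Without duplicate removal, the shared `BookId` value is repeated once per joined record, which is the kind of redundancy the option is meant to prevent.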
Applications consisting of more than one data source require a join to combine the separate data sources. These joins must be configured in a record assembler.
To configure the join in the record assembler:
In the Record Join tab of the Record Assembler editor, select the kind of join you want to perform from the Join Type list.
If you are performing a left join, check Multi Sub-records if the left record can be joined to more than one right record.
The Join Entries list represents the record sources that will participate in the join, as specified on the Sources tab. Define the order of your join entries by selecting an entry and clicking Up or Down.
For all joins, properties are processed from the join sources in the order in which the sources appear in the list. For a left join, the first entry is the left entry. See Join keys that use multiple properties or dimensions for more details.
To define the join key for a join entry, select the entry from the Join Entries list and click Edit.
The Join Entry editor appears.
In the Join Entry editor, add a key component. The Key Component editor appears.
Using the steps below, create a join key that is identical to the record index key for the join entry you selected.
In the Type frame, choose Custom Property or Dimension.
If you choose Custom Property, type a name for the property in the Custom Property text box. If you choose Dimension, choose a dimension name from the Dimension list.
(Optional) Repeat these steps for each component you want to add to the key.
(Optional) To reorder the components in the key, select a component in the Join Entry editor and click Up or Down.
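Repeat the join key steps above (selecting an entry in the Join Entries list, clicking Edit, adding its key components, and optionally reordering them) for each record source that is participating in the join.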
When you are done configuring your join, click OK to close the Record Assembler editor.
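The requirement that each join key be identical to its source's record index can be checked with a small model. This sketch treats the order of key components as significant, an assumption consistent with the Up and Down reordering controls, and represents dimension values by name for simplicity. `KeyComponent`, `key_for`, `key_matches_index`, and the `Format` dimension are hypothetical.

```python
from dataclasses import dataclass

Record = dict[str, list[str]]


@dataclass(frozen=True)
class KeyComponent:
    kind: str  # hypothetical labels: "property" or "dimension"
    name: str


def key_for(record: Record,
            components: list[KeyComponent]) -> tuple[tuple[str, ...], ...]:
    """Build a key from the record's values for each component, in order."""
    return tuple(tuple(record.get(c.name, [])) for c in components)


def key_matches_index(join_key: list[KeyComponent],
                      record_index: list[KeyComponent]) -> bool:
    """A join key must equal the source's record index, including order."""
    return join_key == record_index


title_key = [KeyComponent("property", "Title"), KeyComponent("dimension", "Format")]
cache_index = [KeyComponent("property", "Title"), KeyComponent("dimension", "Format")]
reordered = list(reversed(title_key))
```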