The IBML Tool provides high-performance, scalable matching and loading of bulk data into the Sun MDM Suite. It offers the following features:
Includes a match analysis tool that can be used to test and analyze the values of the match threshold and duplicate threshold. (Depending on certain matching parameters, records with a match weight above the match threshold are automatically matched, and records with a match weight between the duplicate threshold and the match threshold are considered potential duplicates.)
Quickly and accurately performs the matching required for a high volume of legacy data that will become the MDM reference data.
Provides a highly scalable and powerful loading mechanism that dramatically reduces the length of time required to load bulk data.
Uses a cluster-based architecture to distribute the processing over multiple servers, so that all servers perform their activities concurrently.
Reduces the time and resources required to perform a bulk match and load by first grouping records into blocks and then matching within each block rather than matching each record in sequence.
Synchronizes activities between all match and load processes, so that all processors in the cluster execute the same activity at any given time. A cluster synchronizer coordinates activities across all components and processors.
Uses sequential file I/O to read and write intermediate data.
Performs load balancing across all servers dynamically by having each server process one block of data at a time. Once a server completes a block, it picks up the next one to process.
Provides a default data reader that reads a flat file in the format output by the Data Cleanser, but also allows you to define a custom data reader for other formats.
Uses the existing configuration of the master index project for blocking and matching, and generates the master images based on the object structure of the master index.
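The role of the two thresholds in the match analysis can be illustrated with a minimal sketch. The function name `classify`, its parameters, and the sample values are hypothetical and are not part of the IBML Tool. The sketch assumes that the duplicate threshold is not higher than the match threshold and that a weight exactly equal to a threshold counts as reaching it; the actual behavior depends on the matching parameters of the master index.

```python
def classify(weight: float, match_threshold: float, duplicate_threshold: float) -> str:
    """Classify a record pair by its match weight (hypothetical helper)."""
    # Assumption: the duplicate threshold is never above the match threshold.
    if duplicate_threshold > match_threshold:
        raise ValueError("duplicate threshold must not exceed match threshold")
    # Assumption: a weight equal to a threshold is treated as reaching it.
    if weight >= match_threshold:
        return "match"
    if weight >= duplicate_threshold:
        return "potential duplicate"
    return "no match"


# Illustrative threshold values only, not defaults of any product.
labels = [classify(w, match_threshold=25.0, duplicate_threshold=10.0)
          for w in (30.0, 15.0, 5.0)]
```

Trying different threshold values against the same set of weights shows how many pairs move between the three categories, which is the kind of question the match analysis tool is meant to answer.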
Data Integrator provides a convenient wizard to help you generate the ETL collaboration that defines the load process. You can also use a command-line utility instead.
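The block-based matching and dynamic load balancing described in the feature list can be sketched in a few lines. This is only an illustration under simplifying assumptions: threads in one process stand in for the servers of the cluster, an in-memory queue stands in for the intermediate files, and the names `build_blocks`, `match_block`, `run_workers`, the sample records, and the toy matching rule are all hypothetical rather than taken from the IBML Tool.

```python
import queue
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def build_blocks(records, block_key):
    """Group records into blocks that share the same blocking key."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)
    return blocks


def match_block(block, is_match):
    """Compare records only against other records in the same block."""
    return [(a["id"], b["id"]) for a, b in combinations(block, 2) if is_match(a, b)]


def run_workers(blocks, is_match, worker_count=3):
    """Each worker takes one block at a time and picks up the next when done."""
    pending = queue.Queue()
    for block in blocks.values():
        pending.put(block)

    def worker():
        found = []
        while True:
            try:
                block = pending.get_nowait()
            except queue.Empty:
                return found  # no blocks left for this worker
            found.extend(match_block(block, is_match))

    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        futures = [pool.submit(worker) for _ in range(worker_count)]
        return sorted(pair for f in futures for pair in f.result())


# Hypothetical sample data and a toy matching rule (same first initial).
records = [
    {"id": 1, "last": "Smith", "first": "John"},
    {"id": 2, "last": "Smith", "first": "Jon"},
    {"id": 3, "last": "Jones", "first": "Mary"},
    {"id": 4, "last": "Jones", "first": "Mary"},
    {"id": 5, "last": "Smith", "first": "Anna"},
]
blocks = build_blocks(records, block_key=lambda r: r["last"])
pairs = run_workers(blocks, is_match=lambda a, b: a["first"][0] == b["first"][0])
```

With these five records, blocking on the last name requires 4 pairwise comparisons (3 in the Smith block and 1 in the Jones block) instead of the 10 needed to compare every record with every other record. Because a worker asks for a new block only after finishing its current one, faster workers naturally process more blocks.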