The following topics provide information about Sun Master Index, how it is used to create a master index application, and the master index applications you create with Sun Master Index. They also describe the files stored in the Java CAPS repository, the XML files that define the structure and configuration of the master index environment, and the runtime components.
Sun Master Index provides a flexible framework for creating matching and indexing applications called enterprise-wide master index applications. It is an application building tool that helps you design, configure, and create a master index application that uniquely identifies and cross-references the business objects stored in your system databases. Business objects can be any type of entity for which you store information, such as customers, patients, vendors, businesses, or hardware parts. In Sun Master Index, you define the data structure of the business objects to be stored and cross-referenced, as well as the logic that determines how data is updated, standardized, weighted, and matched in the master index application.
The structure and logic you define are stored in a group of XML configuration files that you create using the Wizard Editor for Master Index. These files are created within the context of a Java CAPS project and can be further customized using the XML editor provided in the NetBeans IDE.
Sun Master Index provides features and functions that let you create and configure an enterprise-wide master index application for any type of data. Its primary function is to automate the creation of a highly configurable master index application. It provides a wizard to guide you through the initial setup steps and various editors so you can further customize the configuration of the master index application. It also automatically generates the components you need to implement a master index application.
Here are some of the features of Sun Master Index.
Rapid Development - Sun Master Index allows for rapid and intuitive development of a master index application using a wizard to create the master index configuration and using XML documents to configure the attributes of the index. Templates are provided for quick development of person and company object structures.
Automated Component Generation - Sun Master Index automatically creates the Sun Master Index configuration files that define the primary attributes of the master index application, including the configuration of the Enterprise Data Manager (EDM). Sun Master Index also generates scripts that create the appropriate database schemas and an Object Type Definition (OTD) based on the object definition you create and configure.
Configurable Survivor Calculator - Sun Master Index provides predefined strategies for determining which field values to populate in the single best record (SBR). You can define different survivor rules for each field and you can create a custom survivor strategy to implement in the master index application.
Flexible Architecture - Sun Master Index provides a flexible platform that allows you to create a master index application for any business object. You can customize the object structure so the master index application can match and store any type of data, allowing you to design an application that specifically meets your data processing needs.
Configurable Matching Algorithm - Sun Master Index provides standard support for the Sun Match Engine and also provides the ability to plug in a custom matching algorithm of your choice.
Custom Java API - Sun Master Index generates a Java API that is customized to the object structure you define. You can call the methods in this API in the Collaborations and Business Processes that define the transformation rules for data processed by the master index application.
Standard Reports - Sun Master Index provides a set of standard reports with each master index application that can be run from a command line or from the EDM. The reports help you monitor the state of the data stored in the master index application and help you identify configuration changes that might be required. You can also create custom reports using any ODBC-compliant reporting tool, SQL, or Java.
The components of Sun Master Index are designed to work within Java CAPS to create and configure the master index application and to define connectivity between external systems and the master index application. The primary components of Sun Master Index include the following.
The Wizard Editor takes you through each step of the master index application setup process, and creates the XML files that define the configuration of the application. The wizard allows you to define the name of the master index application, the objects to store, the fields in each object and their attributes, the Enterprise Data Manager (EDM) configuration, and the database and match engine platforms to use. The wizard generates a set of configuration files and database scripts based on the information you specify. You can further customize these files as needed.
Sun Master Index provides the following editors to help you customize the files generated in the Sun Master Index project.
Configuration Editor (Repository) - Allows you to customize certain portions of the XML configuration files using a graphical interface. The Configuration Editor provides validation services for file structure and syntax.
XML Editor - Allows you to review and customize the XML configuration files created by the wizard. The editor provides schema validation services and verification for XML syntax. The XML editor is automatically launched when you open a Sun Master Index configuration file.
Text Editor - Allows you to review and customize the database scripts created by the wizard. This editor is very similar to the XML editor but without the verification services. The text editor is automatically launched when you open a Sun Master Index database script or configuration file.
Java Source Editor - Allows you to create and modify custom plug-in classes for the master index application. This editor is a simple text editor, similar to the Java Source Editor in the Java Collaboration Editor. The Java source editor is automatically launched when you open a custom plug-in file.
A master index application is implemented within a project in the Java CAPS repository. When you create a master index application, a set of configuration files and a set of database files are generated based on the information you specified in the wizard. When you generate the project, additional components are created, including a method OTD, an outbound OTD, Business Process methods, necessary .jar files, and a Custom Plug-in function that allows you to define additional custom processing for the index. To complete the project, you create a Connectivity Map and Deployment Profile.
Additional components can be added to the client projects that access the master index application, including Services, Collaborations, OTDs, Web Connectors, Adapters, JMS Queues, JMS Topics, Business Processes, and so on. You can use the standard Java CAPS editors, such as the OTD or Collaboration editors, to create these components.
Following is a list of Sun Master Index project components.
Configuration Files
Database Scripts
Custom Plug-ins
Match Engine Configuration Files
Object Type Definitions
Dynamic Java Methods
Connectivity Components
Deployment Profile
The following figure illustrates the project and Environment components of Sun Master Index.
Several XML files together determine certain characteristics of the master index application, such as how data is processed, queried, and matched. These files configure the runtime components of the master index application, which are listed in Master Index Runtime Components.
Object Definition - Defines the data structure of the object being indexed in a master index application.
Enterprise Data Manager - Configures the search functions and appearance of the EDM, along with debug information and security information for authorization.
Candidate Select - Configures the Query Builder component of the master index application and defines the queries available for the index.
Match Field - Configures the Matching Service and defines the fields to be standardized or used for matching. It also specifies the match and standardization engines to use.
Threshold - Configures the Manager Service and defines certain system parameters, such as match thresholds, EUID attributes, and update modes. It also specifies the query from the Query Builder to use for matching queries.
Best Record - Configures the Update Manager and defines the strategies used by the survivor calculator to determine the field values for the single best record (SBR). You can define custom update procedures in this file.
Field Validation - Defines rules for validating field values. Rules are predefined for validating the local ID field and you can create custom validation rules to plug in to this file.
Security - This file is a placeholder to be used in future versions.
Two database scripts are generated by the wizard to define external systems and code lists. Additional scripts to create or drop database tables are created when you generate the project (or by the wizard if you choose to generate all project files in the wizard).
Systems - Contains the SQL insert statements that add the external systems you specified in the wizard to the database. You can define additional systems in this file.
Code List - Contains the SQL statements to insert processing codes and drop-down list values into the database. Some of the entries in this file are generated by the wizard. Code lists must be defined in this file to make them available to the master index application.
Create database script - Defines the structure of the master index database based on the object structure specified in the wizard. You can customize this file and then run it against an Oracle or SQL Server database to create a customized master index database.
Drop database script - Used primarily in testing when you need to drop existing database tables and create new ones. The drop script removes all tables related to the master index application so you can recreate a fresh database for your project.
You can also create custom scripts to store in the master index application project and run against the master index database.
Sun Master Index provides a method by which you can create custom processing logic for the master index application. To do this, you need to define and name a custom plug-in, which is a Java class that performs the required functions. Once you create a custom plug-in, you incorporate it into the index by adding it to the appropriate configuration file. You can create custom update procedures and field validations, as well as define custom master index components. Update procedures must be referenced in the update policies of the Best Record file; field validations must be referenced in the Field Validation file; and custom components must be referenced in the configuration file for that component. For example, if you create a custom Query Builder, it must be listed in the Candidate Select file to be accessible to the master index application.
Several configuration files for the Sun Match Engine are created in the master index project. The configuration files under the Match Engine node define certain weighting characteristics and constants for the match engine. The configuration files under the Standardization Engine node define how to standardize names, business names, and address fields. You can customize these files as needed.
Sun Master Index generates an outbound OTD based on the object structure defined in the Object Definition file. This OTD is used for distributing information that has been added or updated in the master index application to external systems. It includes the objects and fields defined in the Object Definition file plus additional SBR information (such as the create date and create user) and additional system object information (such as the local ID and system code). If you plan to use this OTD to make the master index application data available to external systems, you must define a JMS Topic in the master index Connectivity Map to which the master index application can publish transactions.
Due to the flexibility of the object structure, Sun Master Index generates several dynamic Java methods for use in Collaborations and in Business Processes. One set is provided in a method OTD for use in Collaborations and one set is provided for Business Processes. The names, parameter types, and return types of these methods vary based on the objects you defined in the object structure.
Generating the master index application creates a method OTD that includes Java functions you can use to define data processing rules in Collaborations. These functions allow you to define how messages received from external systems are processed by the Service. You can define rules for inserting new records, retrieving record information, updating existing records, performing match processing, and so on.
In addition to the method OTD, which can be used in Collaborations, Sun Master Index creates a set of Java methods that can be incorporated into Business Processes and into Web Services. These methods are a subset of those defined for the method OTD, providing the ability to view, retrieve, and match information in the master index application database.
The master index project Connectivity Map consists of two required components: the web application service and the application service. Two optional components are a JMS Topic for broadcasting messages and an Oracle or SQL Server Adapter for database connectivity. In client project Connectivity Maps you can use any of the standard project components to define connectivity and data flow to and from the master index application. Client projects include those created for the external systems sharing data with the index through a Collaboration or Business Process.
For client projects, you can use connectivity components from the master index server project and any standard Java CAPS connectivity components, such as OTDs, Services, Collaborations, JMS Queues, JMS Topics, and Adapters. Client project components transform and route incoming data into the master index database according to the rules contained in the Collaborations or Business Processes. They can also route the processed data back to the appropriate local systems through Adapters.
The Deployment Profile defines information about the production environment of the master index application. It contains information about the assignment of Services and message destinations to the application server and JMS IQ Managers within the master index system. Each master index project must have at least one Deployment Profile and can have several, depending on the project requirements and the number of Environments used. You must deploy the project before you can use the custom master index application you created using Sun Master Index.
The Sun Master Index Environments define the deployment environment of the master index application, including the domain, application server, external systems, and so on. If master index client projects use the same Environment, it might also include a JMS IQ Manager, constants, Web Connectors, and External Systems. Each Environment represents a unit of software that implements one or more master index applications. You must define and configure at least one Environment for the master index application before you can deploy the application. The application server hosting the master index application is configured within the Environment in NetBeans.
In today’s business environment, important information about certain business objects in your organization might exist in many disparate information systems. It is vital that this information flow seamlessly and rapidly between departments and systems throughout the entire business network. As organizations grow, merge, and form affiliations, sharing data between different information systems becomes a complicated task. The master index applications you create from Sun Master Index can help you manage this data and ensure that the data you have is the most current and accurate information available.
Regardless of how you define the structure of the business object and configure the runtime environment for the master index application, the final product will include many of the same functions and features. The master index application provides a cross-reference of centralized information that is kept current by the logic you define for unique identification, matching, and update transactions.
In the runtime environment, the master index application provides the following functions to help you monitor and maintain the data shared throughout the index system.
Transaction History - The system provides a complete history of each object by recording all changes to each object’s data. This history is maintained for both the local system records and the SBR.
Data Maintenance - The web-based user interface supports all the necessary features for maintaining data records. It allows you to add new records; view, update, deactivate, or reactivate existing records; and compare records for similarities and differences. You can perform these functions against each local system record or SBR associated with an enterprise object.
Search - The information contained in each SBR or system record can be obtained from the database using a variety of search criteria. You can perform searches against the database for a specific object or a set of objects. For certain searches, the results are assigned a matching weight that indicates the probability of a match.
Potential Duplicate Detection and Handling - One of the most important features of the master index application is its ability to match records and identify possible duplicates. Using matching algorithm logic, the index identifies potential duplicate records and provides the functionality to correct the duplication. Potential duplicate records are easily corrected by either merging the records in question or marking the records as “resolved”.
Merge and Unmerge - You can compare potential duplicate records and then merge the records if you find them to be actual duplicates of one another. You can merge records at either the EUID or system record level. You can determine which record to retain as the active record and what information from each record to preserve in the resulting record.
Reports - You can generate reports that provide information about the current state of the data in the master index application, helping you monitor stored data and determine how that data needs to be updated. Report information also helps verify that the matching logic and weight thresholds are defined correctly.
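The merge and unmerge functions described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `EnterpriseRecord` that holds (system, local ID) pairs and uses a plain list as the transaction history. None of these names come from the Sun Master Index API, and recalculating the SBR after a merge is left out.

```python
from dataclasses import dataclass, field


@dataclass
class EnterpriseRecord:
    """Hypothetical enterprise record: an EUID plus its system record keys."""
    euid: str
    system_records: list = field(default_factory=list)  # (system, local_id) pairs
    status: str = "active"


def merge(survivor, source, history):
    """EUID-level merge: move every system record from source to survivor."""
    moved = list(source.system_records)
    survivor.system_records.extend(moved)
    source.system_records.clear()
    source.status = "merged"
    # Record what moved so that the merge can be reversed later.
    history.append({"action": "merge", "survivor": survivor.euid,
                    "source": source.euid, "moved": moved})


def unmerge(survivor, source, history):
    """Reverse the merge recorded as the latest history entry."""
    entry = history[-1] if history else None
    if (entry is None or entry["action"] != "merge"
            or (entry["survivor"], entry["source"]) != (survivor.euid, source.euid)):
        raise ValueError("latest history entry is not a merge of these records")
    for key in entry["moved"]:
        survivor.system_records.remove(key)
        source.system_records.append(key)
    source.status = "active"
    history.append({"action": "unmerge", "survivor": survivor.euid,
                    "source": source.euid, "moved": list(entry["moved"])})
```

Keeping the list of moved system records in the history entry is what makes the unmerge possible without guessing which records originally belonged to which EUID.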
The runtime components of the master index application are designed to uniquely identify, match, and maintain information throughout a business enterprise. These components are highly configurable, allowing you to create a custom master index application suited to your specific data processing needs.
Primary features of the master index application include the following:
Centralized Information - The master index application maintains a centralized database, enabling the integration of data records throughout the enterprise while allowing local systems to continue operating independently. The index stores copies of local system records and of SBRs, which represent the most accurate and complete data for each object. This database is the central location of information and identifiers, and is accessible throughout the enterprise.
Configurability - Before deploying the master index application, you define the components and processing capabilities of the system to suit your organization’s processing requirements. You can configure the object structure, matching and standardization rules, survivorship rules, queries, EDM appearance, and field validation rules.
Cross-referencing - The master index application is a global cross-referencing application that automates record matching across disparate source systems, simplifying the process of sharing data between systems. The master index application uses the local identifiers assigned by your existing systems as a reference, allowing you to maintain your current systems and practices while maintaining the most current and accurate information.
Data Cleansing - The master index application uses configurable matching algorithm logic to uniquely identify object records and to identify duplicate and potential duplicate records. The index provides the functionality to easily merge or resolve duplicates and can be configured to automatically match records that are found to be duplicates of one another.
Data Updates - The master index application provides the ability to add, update, deactivate, and delete data in the database tables through messages received from external systems. Records received from external systems are checked for potential duplicates during processing. Merges can also be performed through external system messages. Data updates from external systems can occur in real time or as batch processes.
Identification - The master index application employs configurable probabilistic matching technology, which uses a matching algorithm to formulate a statistical measure of how closely records match. By applying the matching algorithm in real time and establishing a common method of locating records, the index consistently identifies objects within an enterprise.
Integration - Relying on the application server, the master index application provides the power and flexibility to identify, route, and transform data to and from any system or application throughout your business enterprise. It can accept incoming transactions and distribute updates to external systems, providing seamless integration with the systems in your enterprise.
Matching Algorithm - The master index application is designed to use the Sun Match Engine or a custom matching algorithm to provide a matching probability weight between records. Sun Master Index provides the flexibility to create user-defined matching thresholds, which control how potential duplicates and automatic matches are determined.
Shared Information - Each time a record is updated, added, merged, or unmerged from the EDM, the master index application generates a message that can be transmitted to external systems. It also receives, processes, and broadcasts messages containing information about the objects in your index.
Unique Identifier - Records from various systems are cross-referenced using an enterprise-wide unique identifier, known as an EUID, that the index assigns to each object record. The index uses the EUID to cross-reference the local IDs assigned to each object by the various computer systems throughout the enterprise.
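The cross-referencing role of the EUID can be pictured as a two-way lookup between (system, local ID) pairs and EUIDs. The following is a minimal in-memory sketch under stated assumptions: the class name, method names, and the EUID starting value are hypothetical, and a real master index application keeps this mapping in its database.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class CrossReferenceIndex:
    """Hypothetical cross-reference of (system, local ID) pairs to EUIDs."""
    _by_local_id: dict = field(default_factory=dict)  # (system, local_id) -> euid
    _by_euid: dict = field(default_factory=dict)      # euid -> set of (system, local_id)
    _next_euid: count = field(default_factory=lambda: count(1000000001))

    def register(self, system, local_id, euid=None):
        """Link a local ID to an EUID, generating a new EUID when none is given."""
        key = (system, local_id)
        if key in self._by_local_id:
            return self._by_local_id[key]  # already cross-referenced
        if euid is None:
            euid = str(next(self._next_euid))
        self._by_local_id[key] = euid
        self._by_euid.setdefault(euid, set()).add(key)
        return euid

    def euid_for(self, system, local_id):
        """Return the EUID for a local ID, or None if it is unknown."""
        return self._by_local_id.get((system, local_id))

    def local_ids_for(self, euid):
        """Return every (system, local ID) pair linked to an EUID, sorted."""
        return sorted(self._by_euid.get(euid, set()))
```

For example, registering `("HOSPITAL_A", "12345")` creates a new EUID, and registering `("CLINIC_B", "A-778")` with that EUID links both local IDs to the same object.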
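The master index applications created by Sun Master Index are made up of several components that work together to form the complete indexing system. The primary components of the master index application include the following:
Matching Service
Manager Service
Query Builder
Query Manager
Update Manager
Object Persistence Service (OPS)
Database
Enterprise Data Manager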
In addition, the master index application uses the connectivity components defined in the Sun Master Index server and client projects to route data between external systems and the master index application.
The Java CAPS repository stores information about the configuration and structure of the master index environment. Because the master index application is deployed within the repository, it can be implemented in a distributed environment. The master index system requires the Sun Java System Application Server.
The components of a master index application are illustrated in the following figure.
The Matching Service stores the logic for standardization (which includes data parsing and normalization), phonetic encoding, and matching. It includes the specified standardization and match engines, along with the configuration you defined for each. The Matching Service also contains the data standardization tables and configuration files for the match engine. The configuration of the Matching Service is defined in the Match Field file.
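To make the standardize-then-match flow concrete, here is a minimal sketch of a probabilistic weight calculation in the style of classic record linkage. It is not the Sun Match Engine algorithm, which this topic does not describe. The field names and the m and u probabilities are illustrative assumptions, exact comparison stands in for the engine's comparison functions, and phonetic encoding is omitted.

```python
import math

# Hypothetical per-field probabilities:
#   m = probability that the field agrees when two records are the same entity
#   u = probability that the field agrees when they are different entities
FIELD_PROBABILITIES = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "dob": (0.98, 0.003),
}


def normalize(value):
    """Minimal normalization: trim, collapse whitespace, and uppercase."""
    return " ".join(value.split()).upper()


def match_weight(record_a, record_b):
    """Sum agreement and disagreement weights over the configured fields."""
    total = 0.0
    for name, (m, u) in FIELD_PROBABILITIES.items():
        a = normalize(record_a.get(name, ""))
        b = normalize(record_b.get(name, ""))
        if not a or not b:
            continue  # a missing value contributes no evidence either way
        if a == b:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total
```

Fields that rarely agree by chance (a low u value) contribute the most when they do agree, which is why a matching date of birth counts for more than a matching first name in this sketch.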
The Manager Service provides a session bean to all components of the master index application, such as the Enterprise Data Manager, Query Builder, and Update Manager. The service also manages connectivity to the master index database. The configuration of the Manager Service specifies the query to use for matching and defines system parameters that control EUID generation, matching thresholds, and update modes. The configuration of the Manager Service is defined in the Threshold file.
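The effect of the matching thresholds can be sketched as a simple three-way classification of a matching weight. The function and parameter names below are hypothetical, and treating each threshold as an inclusive lower bound is an assumption rather than documented behavior.

```python
from enum import Enum


class MatchDecision(Enum):
    AUTO_MATCH = "automatic match"
    POTENTIAL_DUPLICATE = "potential duplicate"
    NO_MATCH = "no match"


def classify(weight, duplicate_threshold, match_threshold):
    """Classify a matching weight against two configured thresholds."""
    if duplicate_threshold > match_threshold:
        raise ValueError("duplicate threshold must not exceed match threshold")
    if weight >= match_threshold:
        return MatchDecision.AUTO_MATCH           # treated as the same object
    if weight >= duplicate_threshold:
        return MatchDecision.POTENTIAL_DUPLICATE  # flagged for manual review
    return MatchDecision.NO_MATCH                 # treated as a new object
```

Raising the match threshold sends more record pairs to manual review as potential duplicates, while lowering it increases the number of automatic matches.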
The Query Builder defines all queries available to the master index application, including the queries performed automatically by the master index application when searching for possible matches to an incoming record. It also includes the queries performed manually through the Enterprise Data Manager (EDM). The EDM queries can be either alphanumeric or phonetic and have the option of using wildcard characters. The configuration of the Query Builder is defined in the Candidate Select file.
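As an illustration of an alphanumeric search with wildcard characters, the sketch below filters in-memory records using Python's `fnmatch` module. The function name, the field names, and the choice of `*` and `?` as wildcard characters are assumptions. An actual master index application runs its queries against the database, and a phonetic search would compare phonetically encoded values instead.

```python
from fnmatch import fnmatchcase


def alphanumeric_search(records, criteria):
    """Return the records whose fields match every search criterion.

    Matching is case-insensitive. '*' matches any run of characters and
    '?' matches exactly one character (an assumed wildcard syntax).
    """
    matches = []
    for record in records:
        if all(
            fnmatchcase(str(record.get(name, "")).upper(), pattern.upper())
            for name, pattern in criteria.items()
        ):
            matches.append(record)
    return matches
```

A criterion such as `{"last_name": "Sm*"}` would return both Smith and Smythe, while `{"last_name": "Sm?th"}` would return only Smith.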
The Query Manager is a service that performs queries against the master index database and returns a list of objects that match or closely match the query criteria. The Query Manager uses classes specified in the Match Field file to determine how to perform a query for match processing. All queries performed in the master index application are executed through the Query Manager.
The Update Manager controls how updates are made to an entity’s SBR by defining a survivor strategy for each field. The survivor calculator in the Update Manager uses these strategies to determine the relative reliability of the data from external systems and to determine which value for each field to populate into the SBR. The Update Manager also manages certain update policies, allowing you to define additional processing to be performed against incoming data. The configuration of the Update Manager is defined in the Best Record file.
The Object Persistence Service (OPS) is a database service that translates high-level, descriptive object requests into actual JDBC calls. The service provides mapping from the Java object to the database and from the database to the Java object.
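The idea of mapping between objects and database calls can be sketched as follows. This is only an analogy: the actual service issues JDBC calls against an Oracle or SQL Server database, whereas this sketch uses Python's built-in `sqlite3` module, and the table name, column names, and function names are hypothetical.

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical table (real tables come from generated scripts)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS person_sbr ("
        "euid TEXT PRIMARY KEY, first_name TEXT, last_name TEXT)"
    )


def save_person(conn, euid, person):
    """Object-to-database mapping: turn a dict into a parameterized INSERT."""
    conn.execute(
        "INSERT OR REPLACE INTO person_sbr (euid, first_name, last_name) "
        "VALUES (?, ?, ?)",
        (euid, person["first_name"], person["last_name"]),
    )
    conn.commit()


def load_person(conn, euid):
    """Database-to-object mapping: turn a row back into a dict, or None."""
    row = conn.execute(
        "SELECT first_name, last_name FROM person_sbr WHERE euid = ?", (euid,)
    ).fetchone()
    if row is None:
        return None
    return {"first_name": row[0], "last_name": row[1]}
```

Callers work only with objects and an identifier; the SQL statements stay inside the persistence functions, which is the separation the service provides.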
The master index application uses an Oracle or SQL Server database to store the information you specify for the business objects being cross-referenced. The database stores local system records, the single best record for each object record, and certain administrative information, such as drop-down menu lists, processing codes, and information about the systems from which data originates. The scripts that are generated to create the database tables are based on the information specified in the Object Definition file.
The Enterprise Data Manager (EDM) is a web-based interface that allows you to monitor and maintain the data in your master index database. Most of the configurable attributes of the EDM are defined by information you specify in the wizard, but you can further configure the EDM in the Enterprise Data Manager file after you generate the Sun Master Index application. The EDM provides the ability to manually search for records; update, add, deactivate, and reactivate records; merge and unmerge records; view potential duplicates; and view comparisons of object records.
An enterprise record includes all components of a record that represents one entity. The master index application stores two types of records in each enterprise record: system records and a single best record (SBR). A system record contains the entity's information as it appears in an incoming message from an external system. The SBR combines data from the external systems and represents the most reliable and current information contained in all system records for the enterprise record.
The structure of a system record is different from the SBR in that each system record contains a system and local ID pair. The remaining information contained in the system records of an enterprise record is used to determine the best data for the SBR in that enterprise record. If an enterprise record only contains one system record, the SBR is identical to that system record (less the system and local ID information). However, if the enterprise record contains multiple system records, the SBR might be identical to one system record but will more likely include a combination of information from all system records.
The SBR for an object is created from the most reliable information contained in each system record representing that object. The information used from each external system to populate the SBR is determined by the survivor calculator, which is configured in the Best Record file. This data is determined to be the most reliable information from all system records in the enterprise record. The survivor calculator can consider factors such as the relative reliability of an external system, how recent the data is, and whether the SBR contains any “locked” field values. You define the rules that select a field value to be persisted in the SBR.
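The following minimal sketch shows how per-field survivor strategies could select SBR values from a set of system records. The strategy names, the `SystemRecord` class, and the handling of locked fields are illustrative assumptions and do not reproduce the predefined strategies or the Best Record file syntax.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SystemRecord:
    """Hypothetical system record: a system and local ID pair plus field values."""
    system: str
    local_id: str
    updated: datetime
    fields: dict


def most_recent(field_name, system_records):
    """Strategy: take the value from the most recently updated system record."""
    candidates = [r for r in system_records if r.fields.get(field_name)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.updated).fields[field_name]


def preferred_system(ranking):
    """Strategy factory: take the value from the highest-ranked source system."""
    order = {system: position for position, system in enumerate(ranking)}

    def strategy(field_name, system_records):
        candidates = [r for r in system_records if r.fields.get(field_name)]
        if not candidates:
            return None
        # Systems missing from the ranking sort after all ranked systems.
        best = min(candidates, key=lambda r: order.get(r.system, len(order)))
        return best.fields[field_name]

    return strategy


def calculate_sbr(system_records, strategies, current_sbr=None, locked=()):
    """Build the SBR field by field; locked fields keep their current value."""
    sbr = {}
    for field_name, strategy in strategies.items():
        if current_sbr is not None and field_name in locked:
            sbr[field_name] = current_sbr.get(field_name)
        else:
            sbr[field_name] = strategy(field_name, system_records)
    return sbr
```

Because each field has its own strategy, the resulting SBR can combine a last name from the most recently updated record with a phone number from the most trusted system.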
In a master index application, each system record and SBR in an enterprise record typically contains a set of objects that store different types of information about the business object. A record usually contains a parent object and several child objects. A record can have only one parent object, but can have multiple child objects and multiple instances of each type of child object. For example, if the business object being indexed is a person, the record can only contain one primary name and social security number, which would be contained in the parent object (for example, a person object). However, the record could have multiple addresses, telephone numbers, and aliases, which would each be defined in different child objects (for example, in address, phone, and alias objects). Each address would be stored in a different instance of an address object.
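The person example above can be pictured as one parent object holding lists of child objects. The class and field names in this sketch are illustrative assumptions; in a master index application the actual structure is whatever you define in the Object Definition file.

```python
from dataclasses import dataclass, field


@dataclass
class Address:
    address_type: str
    line1: str
    city: str


@dataclass
class Phone:
    phone_type: str
    number: str


@dataclass
class Alias:
    first_name: str
    last_name: str


@dataclass
class Person:
    """Parent object: one primary name and SSN per record."""
    first_name: str
    last_name: str
    ssn: str
    # Child objects: zero or more instances of each type.
    addresses: list = field(default_factory=list)
    phones: list = field(default_factory=list)
    aliases: list = field(default_factory=list)
```

A record for one person can then carry, for example, a home address and a work address as two separate `Address` instances under the same parent.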
Sun Master Index supports standard Java Composite Application Platform Suite (Java CAPS) functions, including version control, XML validation, and cut, copy, and paste.
Sun Master Index supports the version control functionality provided by Java CAPS. You can check files in and out, retrieve older versions to a workspace, view a version history, and so on. In addition, Sun Master Index supports recursive check-ins and check-outs. When you select Recurse Project, you can check in or out all components below the selected node or a subset of those components.
You can use standard cut, copy, and paste commands to copy or move files between projects. Sun Master Index follows the standard functionality, with the exception that you can only copy or move a component from one project into the same node of another project. For example, you can only paste a copied configuration file into the Configuration node of another project. In addition, you cannot cut or delete components that are essential to a project, such as the configuration files, match and standardization files, and so on.