Sun Identity Manager Deployment Guide

Planning for Data Exporter

Before you begin deploying Data Exporter, you need to plan for the following:

How much data will you export? The number of exported data types determines how many database tables will be required. If you choose to queue all changes to a data type, or even to export object deletions, the database requirements grow. See Database Considerations for more information.
Do you need to have a dedicated export server? Performance can be diminished if you export data on the same server that performs complex workflows. See Export Server Considerations for more information.
Do you have custom extended attributes that need to be exported? If yes, you must update the export schema and recompile the Warehouse Interface Code (WIC). See Customizing Data Exporter for more information.

Database Considerations

Data Exporter can export to any database that is supported as an Identity Manager repository. Data Exporter should also work with any RDBMS supported by Hibernate 3.2.

Hibernate Support

Data Exporter uses Hibernate 3.2 for the bi-directional mapping between Identity Manager Java objects and RDBMS tables. Identity Manager provides a set of files (one for each data type) that control the mapping between warehouse beans and RDBMS tables. These files are located in the $WSHOME/exporter/hbm directory.

See Customizing Data Exporter for more details.

Hibernate uses C3P0 as its connection pool. C3P0 sends its log entries to the JRE logging system, which has INFO-level logging enabled by default. To restrict what is logged, add the following lines to the bottom of the $JRE/lib/logging.properties file:

com.mchange.v2.c3p0.impl.level=SEVERE
com.mchange.v2.c3p0.level=SEVERE
com.mchange.v2.log.level=SEVERE
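
Where editing the JRE-wide logging.properties file is not practical, the same levels can in principle be set from code through the standard java.util.logging API. The following is a minimal sketch, not part of Identity Manager: the class name C3p0LogQuieter is hypothetical, and it assumes that C3P0 routes its output through java.util.logging as described above and that the method runs in the same JVM before the connection pool starts logging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper; not an Identity Manager class.
class C3p0LogQuieter {
    // Logger names copied from the logging.properties entries above.
    static final String[] LOGGER_NAMES = {
        "com.mchange.v2.c3p0.impl",
        "com.mchange.v2.c3p0",
        "com.mchange.v2.log"
    };

    // java.util.logging may hold loggers weakly, so keep strong references
    // to prevent the configured levels from being lost to garbage collection.
    private static final List<Logger> PINNED = new ArrayList<Logger>();

    static void quiet() {
        for (String name : LOGGER_NAMES) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(Level.SEVERE);
            PINNED.add(logger);
        }
    }

    public static void main(String[] args) {
        quiet();
        for (String name : LOGGER_NAMES) {
            System.out.println(name + " -> " + Logger.getLogger(name).getLevel());
        }
    }
}
```

The properties-file approach remains the documented one; a programmatic setting only affects the JVM in which it runs.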

Object/Relational Mapping

Identity Manager uses Java objects to perform its work, but when these objects are to be exported to a set of relational database tables, the objects must undergo a transformation commonly called object/relational mapping. This transformation is necessary because there are differences between the types of data that can be expressed in an RDBMS relationship and the types of data that can be expressed in an arbitrary Java object. For example, consider the following Java class:

class Widget {
  private String _id;
  private Map<String,Widget> _subWidgets;
  ...
}

This class presents a problem when expressed in relational terms, because the _subWidgets field is a nested structure. If you decompose two hierarchies of Widget objects that share sub-widgets into a set of RDBMS tables and then delete one of the hierarchies, you quickly end up with a reference-counting problem: the rows for a shared sub-widget cannot be removed while the other hierarchy still refers to them.
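
The sharing problem can be illustrated with a small, self-contained sketch. It fills in the elided parts of the Widget class with hypothetical accessors, and the SharedWidgetDemo class and its countReferences method are illustrative only; none of this is Identity Manager code. The method counts how many parent links point at each widget, which is the bookkeeping a relational decomposition would need before it could safely delete rows.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Widget as in the text, with hypothetical accessors added for the demo.
class Widget {
    private final String _id;
    private final Map<String, Widget> _subWidgets = new HashMap<String, Widget>();

    Widget(String id) { _id = id; }
    String getId() { return _id; }
    Widget add(String key, Widget child) { _subWidgets.put(key, child); return this; }
    Map<String, Widget> getSubWidgets() { return _subWidgets; }
}

class SharedWidgetDemo {
    // Returns, for every widget reachable from the roots, the number of
    // parent links that point at it. Each widget is expanded only once.
    static Map<Widget, Integer> countReferences(Widget... roots) {
        Map<Widget, Integer> counts = new IdentityHashMap<Widget, Integer>();
        Map<Widget, Boolean> seen = new IdentityHashMap<Widget, Boolean>();
        for (Widget root : roots) {
            visit(root, counts, seen);
        }
        return counts;
    }

    private static void visit(Widget w, Map<Widget, Integer> counts,
                              Map<Widget, Boolean> seen) {
        if (seen.put(w, Boolean.TRUE) != null) {
            return; // already expanded through another parent
        }
        for (Widget child : w.getSubWidgets().values()) {
            Integer n = counts.get(child);
            counts.put(child, n == null ? 1 : n + 1);
            visit(child, counts, seen);
        }
    }

    public static void main(String[] args) {
        Widget shared = new Widget("shared");
        Widget a = new Widget("a").add("s", shared);
        Widget b = new Widget("b").add("s", shared);
        // The shared widget has two parents, so deleting hierarchy "a"
        // must not delete the row that hierarchy "b" still needs.
        System.out.println(countReferences(a, b).get(shared));
    }
}
```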

To address the representational differences, Identity Manager places some constraints on what type of data can be exported. Specifically, a top-level Java object can contain scalar attributes, lists of scalar attributes, and maps of scalar attributes. In a few instances, Identity Manager needs a slightly richer expression, and to resolve these cases Identity Manager has introduced the PseudoModel. A PseudoModel is conceptually a data structure containing only scalar attributes. A top-level Java object can contain attributes that are PseudoModels or Lists of PseudoModels. PseudoModels are Identity Manager structures that cannot be extended. The following is an example of a PseudoModel.

class TopLevelModel
{
    private String _name;
    private List<PseudoModelPoint> _points;
}
class PseudoModelPoint
{
    private String _name;
    private String _color;
    private int _x;
    private int _y;
    private int _z;
}

Identity Manager can properly perform the object/relational transformation of TopLevelModel because PseudoModelPoint only contains scalar attributes. In query-notation, the color attribute of the PseudoModel is addressable as:

TopLevelModel.points[].color

When inspecting the Identity Manager Data Export schema, you will find a few PseudoModel types. These types represent some of the more complex data in the top-level export models. You cannot query for a PseudoModel directly because a PseudoModel is not exported directly. A PseudoModel is simply structured data held by an attribute of a top-level model.
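
To make the idea concrete, the following sketch shows one way a list of PseudoModels could flatten into child rows, and how the points[].color notation resolves to one value per list element. It is an illustration under stated assumptions, not the mapping that Identity Manager or Hibernate actually generates: the PointFlattener class, the row layout (owner name, list index, then the scalar fields), and the constructors are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Mirrors the PseudoModelPoint example: scalar attributes only.
class PseudoModelPoint {
    final String _name;
    final String _color;
    final int _x, _y, _z;

    PseudoModelPoint(String name, String color, int x, int y, int z) {
        _name = name; _color = color;
        _x = x; _y = y; _z = z;
    }
}

class TopLevelModel {
    final String _name;
    final List<PseudoModelPoint> _points;

    TopLevelModel(String name, List<PseudoModelPoint> points) {
        _name = name;
        _points = points;
    }
}

// Hypothetical helper showing a possible flattening.
class PointFlattener {
    // One child row per point: owner|index|name|color|x|y|z
    static List<String> toChildRows(TopLevelModel model) {
        List<String> rows = new ArrayList<String>();
        for (int i = 0; i < model._points.size(); i++) {
            PseudoModelPoint p = model._points.get(i);
            rows.add(model._name + "|" + i + "|" + p._name + "|" + p._color
                    + "|" + p._x + "|" + p._y + "|" + p._z);
        }
        return rows;
    }

    // Resolves TopLevelModel.points[].color: one color per list element.
    static List<String> colors(TopLevelModel model) {
        List<String> result = new ArrayList<String>();
        for (PseudoModelPoint p : model._points) {
            result.add(p._color);
        }
        return result;
    }

    public static void main(String[] args) {
        TopLevelModel m = new TopLevelModel("m1", Arrays.asList(
                new PseudoModelPoint("p0", "red", 1, 2, 3),
                new PseudoModelPoint("p1", "blue", 4, 5, 6)));
        System.out.println(toChildRows(m));
        System.out.println(colors(m));
    }
}
```

Because every field of PseudoModelPoint is a scalar, each point maps to exactly one row with no further nesting, which is why no reference counting is needed.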

Database Tables

The number of RDBMS tables defined in the warehouse DDL depends on the number of model types being exported, and what types of attributes each model is exporting. In general, each model requires three to five tables, with list/map valued attributes stored in their own table. The default DDL contains about 50 tables. After studying the export schema, you may choose to modify the Hibernate mapping files to exclude some attribute tables.

Space Requirements

The amount of space required in the exporter warehouse depends on the following:

Which objects are to be exported
How long the records are to stay in the export warehouse
How busy the Identity Manager servers are

WorkflowActivity and ResourceAccount are usually the highest-volume exported models. For example, a single workflow could contain multiple activities, and as each workflow is executed, Identity Manager could create dozens of new records to be written to the warehouse. Editing a User object may result in one ResourceAccount record per account linked to the User. TaskInstance, WorkItem and LogRecord are also high-volume models. A single Identity Manager server can produce over 50,000 object changes to be exported in one hour of operation.
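
A rough sizing calculation can combine the three factors listed above. The sketch below is back-of-the-envelope arithmetic only: the 50,000 changes per hour figure comes from the text, while the busy hours per day, retention period, server count, and average row size are hypothetical inputs that you would replace with measurements from your own deployment. The WarehouseSizing class is not part of Identity Manager.

```java
// Hypothetical sizing helper; all inputs other than the 50,000/hour
// figure are assumptions to be replaced with measured values.
class WarehouseSizing {
    // Rows retained = changes/hour * busy hours/day * retention days * servers.
    static long estimateRows(long changesPerHour, int busyHoursPerDay,
                             int retentionDays, int servers) {
        return changesPerHour * busyHoursPerDay * retentionDays * servers;
    }

    // Raw data size only; indexes and database overhead are not included.
    static long estimateBytes(long rows, int avgRowBytes) {
        return rows * avgRowBytes;
    }

    public static void main(String[] args) {
        long rows = estimateRows(50000L, 8, 30, 1);   // 12,000,000 rows
        long bytes = estimateBytes(rows, 512);        // 6,144,000,000 bytes
        System.out.println(rows + " rows, about " + bytes + " bytes");
    }
}
```

With these assumed inputs, one server retaining 30 days of data would hold 12 million rows, or roughly 6 GB of raw row data before indexes.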

Export Server Considerations

You should consider running the export task on a dedicated server, especially if you expect to export a large amount of data. The export task is efficient at transferring data from Identity Manager to the warehouse and will consume as much CPU as possible during the export operation. If you do not use a dedicated server, you should restrict the server from handling interactive traffic, because the response time will degrade dramatically during a large export.

The export task primarily performs input/output operations between the Identity Manager repository and the staging tables. The memory requirements of the export task are modest, although the memory needs increase as the number of queued records increases. The export task is typically constrained by the speed of the input/output and uses multiple concurrent threads to increase throughput.

Choosing the appropriate server requires experimentation. If the transfer rates of the input (Identity Manager repository) or the output (staging tables) are slow, the export task will not saturate a modern CPU. The query speed of the input path will not be an issue, as the export operation only issues a query at the beginning of the export cycle. The majority of the time is spent reading and writing records.

Identity Manager provides JMX MBeans to determine the input and output data rates. See the Business Administrator's Guide for more information about these MBeans.