Oracle Waveset 8.1.1 Deployment Guide

Chapter 5 Data Exporter

This chapter describes the Data Exporter feature and provides information required to deploy it.

What is Data Exporter?

Waveset processes user account information on a wide range of systems and applications, providing a controlled, audited environment for making changes that remain in compliance with corporate policies. Waveset has a “data light” architecture: it stores locally only a minimal amount of the account information held on the systems and applications that it manages, and fetches the rest from the actual system or application when necessary.

This architecture helps reduce data duplication and minimizes the risks of transferring stale data during provisioning operations, but there are times when having the account data stored locally is desirable. For example, being able to query account information without accessing the underlying system or application can bring significant performance improvements for some operations, such as identifying all accounts that have a specific attribute value. Typically, the use of system or application account data is related to reporting operations rather than provisioning operations, but in some cases the data does have value to the organization.

In addition to being a “data light” architecture, Waveset uses a “current data only” data model, which means it does not keep historical records (other than the audit and system logs). The advantage of this model is that the size of the operational repository tends to be proportional to the number of accounts, systems, and applications being managed. As a result, the provisioning system itself needs less maintenance. However, the data processed by Waveset may be valuable for historical processing.

For example, many audit and trend-analysis questions can be answered only with historical data.

Data Exporter allows you to selectively capture a large amount of the information processed by Waveset, including the account and workflow data necessary to answer such questions. Waveset produces this data in a form that can flow into a data warehouse, where it can be further processed or used as a basis for queries and transformations using commercial database transformation, reporting, and analysis tools.

You are not required to export data from Waveset. If you do not need to track this type of historical data, you do not have to keep it. If you do require this data, you are free to establish your own data aging and retention policies without impact to Waveset.

Exportable Data Types

Data Exporter can export both persistent and transient data. Persistent data is data that Waveset stores in the repository. Transient data is data that either is not stored in the Waveset repository by default, or has a lifecycle that precludes periodic fetching of changed records. Some types of data, such as TaskInstances and WorkItems, are both transient and persistent. These data types are considered transient because Waveset deletes them at times that are not externally predictable.

Waveset exports the following data types.

Table 5–1 Supported Data Types

Account (Persistent): Record containing the linkage between a User and a ResourceAccount.

AdminGroup (Persistent): A group of Identity Manager permissions available on all ObjectGroups.

AdminRole (Persistent): The permissions assigned to one or more ObjectGroups.

AuditPolicy (Persistent): A collection of rules evaluated against an Identity Manager object to determine compliance with a business policy.

ComplianceViolation (Persistent): Tracks a User's non-compliance with an AuditPolicy.

Entitlement (Persistent): A record containing the list of attestations for a specific User.

LogRecord (Persistent): A record containing a single audit record.

ObjectGroup (Persistent): A security container that is modeled as an organization.

Resource (Persistent): A system or application on which accounts are provisioned.

ResourceAccount (Transient): A set of attributes that comprise an account on a specific Resource.

Role (Persistent): A logical container for access.

Rule (Persistent): A block of logic that can be executed by Waveset.

TaskInstance (Transient and persistent): A record indicating an executing or completed process.

User (Persistent): A logical user that includes zero or more accounts.

WorkflowActivity (Transient): A single activity of a Waveset workflow.

WorkItem (Transient and persistent): A manual action from a Waveset workflow.

Data Exporter allows you to define an export strategy for each type of data, depending on the exact needs of the warehouse. For example, some data types may need every change to an object exported, while for others it may be sufficient to export at a fixed interval, potentially skipping intermediate changes to the data.

You can select which types will be exported. Once a type is selected, all new and modified instances of that type will be exported. Persistent data types can also be configured to export deleted objects.

Data Exporter Architecture

When Data Exporter is enabled, Waveset stores each detected change to a specified object (data type) as a record in a table in the repository. At a configurable interval for each data type, the system executes two queries that select the records to export.

The exported records are not ordered. However, there are fields in the exported data that allow a subsequent query of the warehouse to put the data in chronological order.
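For illustration only, the following JDBC sketch reads staging-table records back in chronological order. The table name EXP_ACCOUNT and the timestamp column exportDate are hypothetical; consult the generated warehouse DDL for the actual names used by your deployment.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ChronologicalRead {
    public static void main(String[] args) throws Exception {
        // Staging-database connection details are placeholders; substitute
        // the URL and credentials for your warehouse.
        Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/idm_warehouse",
                "idm_warehouse", "idm_warehouse");
        Statement st = con.createStatement();
        // EXP_ACCOUNT and exportDate are hypothetical names; check the
        // generated DDL for the actual table and column names.
        ResultSet rs = st.executeQuery(
                "SELECT * FROM EXP_ACCOUNT ORDER BY exportDate");
        while (rs.next()) {
            // Process each record in chronological order.
        }
        rs.close();
        st.close();
        con.close();
    }
}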

In a typical deployment, Data Exporter writes data to a set of staging tables. Waveset provides SQL scripts that define these tables for each type of supported database. You do not need to modify these tables unless your Waveset deployment contains extended attributes that need to be exported. In that case, you must customize your export schema and compile your own factory class to handle these attributes. For more information, see Customizing Data Exporter.

Exporting data to staging tables allows you to write your own Extract, Transform, and Load (ETL) infrastructure so that the data can be processed for storage in a data warehouse and, ultimately, in a datamart. Timestamp manipulation is a commonly implemented transformation. The system uses the java.sql.Timestamp format of YYYY-MM-DD hh:mm:ss. Although the day of the week is not explicitly specified in the timestamp, it can be extracted using a transformation.
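For example, the following minimal Java sketch (not part of the product) shows one way an ETL step could derive the day of the week from an exported java.sql.Timestamp value:

import java.sql.Timestamp;
import java.util.Calendar;

public class TimestampTransform {

    // Returns the day of the week (Calendar.SUNDAY through
    // Calendar.SATURDAY) for an exported timestamp value.
    public static int dayOfWeek(Timestamp ts) {
        Calendar cal = Calendar.getInstance();
        cal.setTimeInMillis(ts.getTime());
        return cal.get(Calendar.DAY_OF_WEEK);
    }
}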

If you do not need to transfer information to a warehouse and datamart, then you can consider the staging tables to be the final destination. In this case, be sure to use the same connection information for read and write operations. See the Business Administrator's Guide for information about configuring Data Exporter.

Forensic queries allow Waveset to read data that has been stored in the data warehouse (or in the staging tables in a simple environment). They can identify users or roles based on current or historical values of the user, role, or related data types. A forensic query is similar to a Find User or Find Role report, but differs in that the matching criteria can be evaluated against historical data, and in that you can search attributes of data types other than the user or role being queried. See the Business Administrator's Guide for information about defining forensic queries.

The following diagram illustrates the data flow when Data Exporter is enabled.

Figure 5–1 Data Exporter Data Flow


Planning for Data Exporter

Before you begin deploying Data Exporter, plan for the database and export server considerations described in the following sections.

Database Considerations

Data Exporter can export to any database that is supported as a Waveset repository. Data Exporter should also work with any RDBMS supported by Hibernate 3.2.

Hibernate Support

Data Exporter uses Hibernate 3.2 for the bi-directional mapping between Waveset Java objects and RDBMS tables. Waveset provides a set of files (one for each data type) that control the mapping between warehouse beans and RDBMS tables. These files are located in the $WSHOME/exporter/hbm directory.

See Customizing Data Exporter for more details.

Hibernate uses C3P0 as its connection pool. C3P0 sends its log entries to the JRE logging system, which has INFO-level logging enabled by default. To restrict what is logged, add the following lines to the bottom of the $JRE/lib/logging.properties file:

com.mchange.v2.c3p0.impl.level=SEVERE
com.mchange.v2.c3p0.level=SEVERE
com.mchange.v2.log.level=SEVERE

Object/Relational Mapping

Waveset uses (Java) objects to perform its work, but when these objects are exported to a set of relational database tables, they must undergo a transformation commonly called object/relational mapping. This transformation is necessary because there are differences between the types of data that can be expressed in an RDBMS relationship and the types of data that can be expressed in an arbitrary Java object. For example, consider the following Java class:


class Widget {
  private String _id;
  // Nested structure: a Widget can contain other Widgets, keyed by name.
  private Map<String,Widget> _subWidgets;
  ...
}

This class presents a problem when expressed in relational terms, because the _subWidgets field is a nested structure. If you try decomposing two hierarchies of Widget objects that have shared subWidgets into a set of RDBMS tables, and delete one of the hierarchies, you quickly end up with a reference-counting problem.

To address the representational differences, Waveset places some constraints on the type of data that can be exported. Specifically, a top-level Java object may contain scalar attributes, lists of scalar attributes, and maps of scalar attributes. In a few instances, Waveset needs a slightly richer expression, and to resolve these cases Waveset has introduced the PseudoModel. A PseudoModel is conceptually a data structure containing only scalar attributes. A top-level Java object can contain attributes that are PseudoModels or lists of PseudoModels. PseudoModels are Waveset structures that cannot be extended. The following is an example of a PseudoModel.


class TopLevelModel
{
    private String _name;
    private List<PseudoModelPoint> _points;
}
class PseudoModelPoint
{
    private String _name;
    private String _color;
    private int _x;
    private int _y;
    private int _z;
}

Waveset can properly perform the object/relational transformation of TopLevelModel because PseudoModelPoint contains only scalar attributes. In query notation, the color attribute of the PseudoModel is addressable as:

TopLevelModel.points[].color

When inspecting the Waveset Data Export schema, you will find a few PseudoModel types. These types represent some of the more complex data in the top-level export models. You cannot query for a PseudoModel directly because a PseudoModel is not exported directly. A PseudoModel is simply structured data held by an attribute of a top-level model.

Database Tables

The number of RDBMS tables defined in the warehouse DDL depends on the number of model types being exported and the types of attributes each model exports. In general, each model requires three to five tables, with list- and map-valued attributes stored in their own tables. The default DDL contains about 50 tables. After studying the export schema, you may choose to modify the Hibernate mapping files to exclude some attribute tables.

Space Requirements

The amount of space required in the exporter warehouse depends primarily on the number of objects being exported and how frequently they change.

WorkflowActivity and ResourceAccount are usually the highest-volume exported models. For example, a single workflow can contain multiple activities, and each time a workflow is executed, Waveset can create dozens of new records to be written to the warehouse. Editing a User object may result in one ResourceAccount record per account linked to the User. TaskInstance, WorkItem, and LogRecord are also high-volume models. A single Waveset server can produce over 50,000 object changes to be exported in one hour of operation.

Export Server Considerations

You should consider running the export task on a dedicated server, especially if you expect to export a large amount of data. The export task is efficient at transferring data from Waveset to the warehouse and consumes as much CPU as possible during the export operation. If you cannot use a dedicated server, keep interactive traffic off the server that runs the export, because response time degrades dramatically during a large export.

The Export Task primarily performs input/output operations between the Waveset repository and the staging tables. The memory requirements of the export task are modest, although the memory needs increase as the number of queued records increases. The export task is typically constrained by the speed of the input/output and uses multiple concurrent threads to increase throughput.

Choosing the appropriate server requires experimentation. If the transfer rates of the input (Waveset repository) or the output (staging tables) are slow, the export task will not saturate a modern CPU. The query speed of the input path will not be an issue, as the export operation only issues a query at the beginning of the export cycle. The majority of the time is spent reading and writing records.

Waveset provides JMX MBeans for determining the input and output data rates. See the Business Administrator's Guide for more information about these MBeans.
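As a rough sketch, the standard javax.management API can read MBean attributes from code running in the same JVM as the server. The object name and attribute name below are placeholders, not the names Waveset actually registers; see the Business Administrator's Guide for those.

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ExporterStats {
    public static void main(String[] args) throws Exception {
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        // Placeholder ObjectName and attribute name; substitute the names
        // that Waveset registers for the Data Exporter MBeans.
        ObjectName name = new ObjectName("IDM:type=DataExporter");
        Object rate = mbs.getAttribute(name, "OutputRate");
        System.out.println("Export output rate: " + rate);
    }
}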

Loading the Default DDL

This section lists the commands needed to create a database and load the default Data Definition Language (DDL). The export DDL is generated by tools provided with Waveset to match the current export schema.

The create_warehouse scripts are located in the $WSHOME/exporter directory. Waveset also includes corresponding drop_warehouse scripts in the same directory.

DB2

Execute a script similar to the following as the system DBA. Be sure to create the idm_warehouse database and the idm_warehouse/idm_warehouse user before running the script.

CONNECT TO idm_warehouse USER idm_warehouse USING 'idm_warehouse'
CREATE SCHEMA idm_warehouse AUTHORIZATION idm_warehouse
GRANT CONNECT ON DATABASE TO USER idm_warehouse

To load the DDL, add the following line to the %WSHOME%\exporter\create_warehouse.db2 file:

CONNECT TO idm_warehouse USER idm_warehouse USING 'idm_warehouse'

Then run the following command (assuming a Windows DB2 server):

db2cmd db2setcp.bat db2 -f create_warehouse.db2

MySQL

Execute a script similar to the following as the system DBA.

# Create the database (Schema in MySQL terms)
CREATE DATABASE IF NOT EXISTS idm_warehouse CHARACTER SET utf8 COLLATE utf8_bin;
# Give permissions to the "idm_warehouse" userid logging in from any host.
GRANT ALL PRIVILEGES on idm_warehouse.* TO idm_warehouse IDENTIFIED BY 'idm_warehouse';
# Give permissions to the "idm_warehouse" userid logging in from any host.
GRANT ALL PRIVILEGES on idm_warehouse.* TO idm_warehouse@'%' IDENTIFIED BY 'idm_warehouse';
# Give permissions to the "idm_warehouse" user when it logs in from the localhost.
GRANT ALL PRIVILEGES on idm_warehouse.* TO idm_warehouse@localhost IDENTIFIED BY 'idm_warehouse';

To load the DDL, execute the following command:

# mysql -uidm_warehouse -pidm_warehouse -Didm_warehouse < create_warehouse.mysql

Oracle

Execute a script similar to the following as the system DBA.

-- Create tablespace and a user for warehouse
CREATE TABLESPACE idm_warehouse_ts
   DATAFILE 'D:/Oracle/warehouse/idm_warehouse.dbf' SIZE 10M
   AUTOEXTEND ON NEXT 10M
   DEFAULT STORAGE (INITIAL 10M NEXT 10M);
CREATE USER idm_warehouse IDENTIFIED BY idm_warehouse
   DEFAULT TABLESPACE idm_warehouse_ts
   QUOTA UNLIMITED ON idm_warehouse_ts;
GRANT CREATE SESSION to idm_warehouse;

To load the DDL, execute the following command:

sqlplus idm_warehouse/idm_warehouse@idm_warehouse < create_warehouse.oracle

SQL Server

Execute a script similar to the following as the system DBA. Uncomment lines as necessary.

CREATE DATABASE idm_warehouse
GO
--For SQL Server authentication:
-- sp_addlogin user, password, defaultdb
--For Windows authentication:
-- sp_grantlogin <domain\user>
--For SQL Server 2005:
--CREATE LOGIN idm_warehouse WITH PASSWORD = 'idm_warehouse', DEFAULT_DATABASE = idm_warehouse
sp_addlogin 'idm_warehouse', 'idm_warehouse', 'idm_warehouse'
GO
USE idm_warehouse
GO
--For SQL Server 2005 SP2, create a schema (not needed in other versions):
--CREATE SCHEMA idm_warehouse
--GO
--For SQL Server 2005 SP2, use CREATE USER instead of sp_grantdbaccess:
--CREATE USER idm_warehouse FOR LOGIN idm_warehouse WITH DEFAULT_SCHEMA = idm_warehouse
sp_grantdbaccess 'idm_warehouse'
GO

To load the DDL, execute the following command:

osql -d idm_warehouse -U idm_warehouse -P idm_warehouse < create_warehouse.sqlserver

Upgrading Data Exporter

Data Exporter provides the means to periodically export data that is managed or has been processed by Waveset to a set of DBMS tables for further processing. The export process is intentionally open to customization, some of which may require manual intervention to work properly after an upgrade. The Waveset configuration objects that are relevant to Data Exporter are preserved and updated appropriately. However, some exporter customization is done to files within the web application, and these files require special handling.

During the upgrade process, Waveset overwrites all unmodified Data Exporter files in the $WSHOME and $WSHOME/exporter directories. If you made changes to any Data Exporter files, then the upgrade process leaves your modified version in place and installs the newer version of the file in $WSHOME/patches/Identity_Manager_8_1_0_0_Date/filesNotInstalled. If you want to merge the new functionality with your customizations, you must do this manually.

Note that the following files in $WSHOME are often customized:

model-export.dtd
model-export.xml
model-export.xsl
exporter/exporter.jar
exporter/create_warehouse.*
exporter/drop_warehouse.*
exporter/hbm/*.hbm.xml

The upgrade steps you must perform vary depending on whether you customized Data Exporter in 8.0 and on your plans for Data Exporter in 8.1.

After 8.1 is installed, if the 8.1 version of model-export.xml is in place, you can see the new data types and attributes by looking at the schema file at http://server:port/idm/model-export.xml. New types and attributes are flagged with the 8.1 release number.
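If you prefer to scan for those flags programmatically, a short sketch like the following lists the attributes introduced in 8.1 from a local copy of model-export.xml. It assumes only that attributes are declared as field elements with an introduced parameter, as described under Export Schema below.

import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NewAttributeScan {
    public static void main(String[] args) throws Exception {
        // Parse a local copy of the export schema file.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new File("model-export.xml"));
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            // Print attributes flagged with the 8.1 release number.
            if ("8.1".equals(field.getAttribute("introduced"))) {
                System.out.println(field.getAttribute("name"));
            }
        }
    }
}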

Customizing Data Exporter

Data exporting has two levels of schema in effect: the internal (ObjectClass) schema and the external (Export) schema. These schemas provide a data “interface” that remains compliant over multiple releases of Waveset. Compliant means that attribute names, data types, and data meanings do not change. An attribute may be removed, but its name cannot be re-used to mean something different. Attributes may be added at any time. A compliant schema allows reports to be written against one version of the schema and run without modification against any later version.

The ObjectClass schema tells programs in the Waveset server what the data should look like, while the external schema tells the warehouse what the data should look like. The internal schema will vary from release to release, but the external schema will stay compliant across releases.

Waveset ObjectClass Schema

The ObjectClass schema can be extended for User and Role types, but otherwise cannot be changed. The ObjectClass schema is used by programs executing on the Waveset servers to provide access to the data objects themselves. This schema is compiled into Waveset and represents the data that is stored and operated on within Waveset.

This schema may change between versions of Waveset, but it is hidden from the data warehouse by the export schema. The ObjectClass schema provides a schema abstraction on top of the Waveset Persistent Object layer, that is, the data objects stored in the Waveset repository.

Custom User and Role attributes, also known as extended attributes, are defined in the IDMSchemaConfiguration object. See Chapter 12, Editing Configuration Objects for information about adding extended attributes to the ObjectClass schema.

Export Schema

The export schema defines what data can be written to the warehouse. By default, it is limited to a subset of the ObjectClass schema, although the difference between the two is very small. The ObjectClass schema is represented by Java objects, but the export schema must have a bi-directional mapping between Java objects and RDBMS tables.

After you have added an extended attribute to the IDMSchemaConfiguration object, you must define the same attribute in the export schema, which is defined in the $WSHOME/model-export.xml file. Locate the Role or User model in this file and add a field element that defines the attribute. The field element can contain the following parameters.

Table 5–2 Export attribute parameters

name: The name of the attribute. This value must match the name assigned in the IDMSchemaConfiguration object.

type: The data type of the attribute. You must specify the full Java class name, such as java.lang.String or java.util.List.

introduced: Optional. Specifies the release in which the attribute was added to the schema.

friendlyName: The label that is displayed on the Data Exporter configuration pages.

elementType: If the type parameter is java.util.List, this parameter specifies the data type of the items in the list. Common values include java.lang.String and com.sun.idm.object.ReferenceBean.

referenceType: If the elementType parameter is com.sun.idm.object.ReferenceBean, this parameter specifies the type of Waveset object or pseudo-object being referenced.

forensic: Indicates that the attribute is used to determine relationships. Possible values are User and Role.

exported: When set to false, the attribute is not exported. To hide a default attribute from the Data Exporter data type configuration page, add exported='false' to the attribute definition. To export an attribute in the default schema that has exporting disabled, you must create a custom WIC library.

queryable: When set to false, the field is not available for forensic queries.

max-length: The maximum length of a value.

The following example adds an extended attribute named telno to the export schema as part of the User model:

<field name='telno'
       type='java.lang.String'
       introduced='8.0'
       max-length='20'
       friendlyName='Telephone Number'>
    <description>The phone number assigned to the user.</description>
</field>

Modifying the Warehouse Interface Code

The Warehouse Interface Code (WIC) is provided with Waveset in both binary and source form. Many deployments can use the WIC code in binary form (no modifications), but some deployments may need to modify it. The WIC code must implement two interfaces to be used for exporting, and a third interface to be used by the Forensic Query feature.

The default WIC implementation writes to a set of RDBMS tables. For many applications this is sufficient, but you could create custom WIC code to write the data to a JMS queue or to some other consumer.

The com.sun.idm.exporter.Factory and com.sun.idm.exporter.Exporter classes are used to export data. The export code is responsible for converting models (Java data objects) to a form suitable for storage. Typically, this means writing to a relational database. As a result, the WIC code is responsible for Object to Relational transformation.

The default WIC implementation uses Hibernate to provide the object/relational mapping. This mapping is controlled by the Hibernate .hbm.xml mapping files, which are in turn generated from the export schema. Hibernate works best with Java bean-style data objects, which expose get and set methods for their attributes. The WIC build generates the bean classes and Hibernate mapping files that match the export schema. If Hibernate provides the necessary mapping features, there may be no need to modify any WIC code manually.
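To make the bean-style convention concrete, a warehouse bean has roughly the following shape. This hand-written sketch is illustrative only; the actual classes are generated from the export schema, and the class and property names here are invented.

// Illustrative only: real warehouse beans are generated from the
// export schema rather than written by hand.
public class AccountBean {
    private String name;
    private String resourceName;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public String getResourceName() { return resourceName; }
    public void setResourceName(String resourceName) {
        this.resourceName = resourceName;
    }
}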

The WIC files are located in the InstallationDirectory/REF/exporter directory.

Generating a New Factory Class

Waveset allows you to add custom User and Role attributes to the ObjectClass schema. These attributes, known as extended attributes, cannot be exported unless you also add them to the export schema, regenerate the Warehouse Interface Code (WIC), and deploy the code.

When extended attributes are added, you must edit the export schema control file and add the attributes. If attributes are to be excluded from the exporter, you can simply mark the schema fields with exported='false' and regenerate the WIC code.

To modify the WIC code, you need a Java development environment installed on your system, including a JDK and Apache Ant (the rebuild and deploy steps below use the ant command).

The steps required to export extended attributes are as follows:

To Export Extended Attributes

  1. Get the WIC source code from the REF kit.

  2. Set the WSHOME environment variable to the installation directory of Waveset.

  3. Back up the export schema control file ($WSHOME/model-export.xml), and then edit it to add the extended attributes.

  4. Change directories to the WIC source top-level directory. This directory should contain files named build.xml, BeanGenerator.java, and HbmGenerator.java.

  5. Stop the application server.

  6. Remove CLASSPATH from the environment.


    Note –

    You must remove CLASSPATH from the environment before executing ant rebuild in the next step.


  7. Rebuild the WIC code with the ant rebuild command.

  8. Deploy the modified WIC code to the application server with the ant deploy command.

  9. Restart the application server.


    Note –

    If you change model-export.xml and rebuild the WIC as shown in the preceding steps, a new warehouse DDL is generated. You must drop the old tables and load the new DDL, which deletes any data that is already in the tables.


Adding Localization Support for the WIC

The export schema contains numerous strings that are displayed on the Data Exporter Type Configuration pages. Use the following procedure to display the strings in a language that is not officially supported:

To Add Localization Support for the WIC

  1. Extract the contents of the $WSHOME/WEB-INF/lib/wicmessages.jar file.

  2. Navigate to the com/sun/idm/warehouse/msgcat directory.

  3. Translate the contents of the WICMessages.properties file. Make sure the final result is saved in a file whose name includes the locale (for example, WICMessages_fr.properties).

    You do not need to save the message catalog to the System Configuration object.

Troubleshooting Data Exporter

The volume and variety of data flowing through the exporter increase the possibility of problems occurring during data export.

Beans and Other Tools

Data Exporter performance and throughput can be monitored through the JMX management beans provided in Waveset. To minimize the performance impact of exporting data, Waveset uses some memory-based queues that are volatile. If the server terminates unexpectedly, the data in these queues will be lost. You can monitor the size of these queues over a period of time to judge your exposure to this risk.

Model Serialization Limits

Data Exporter must queue some objects to ensure they are available for export at the appropriate time. Queuing these objects is done by Java serialization. However, it is possible to include data in an exported object that is not serializable. In this case, the exporter code should detect the non-serializable data and replace it with tokens that indicate the problem, allowing the rest of the object to be exported.
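If you suspect that a custom attribute carries non-serializable data, a generic check such as the following (plain Java, not Waveset API) can confirm whether a value survives Java serialization:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerializationCheck {

    // Returns true if the value can be written by Java serialization,
    // false if serialization fails (for example, with
    // java.io.NotSerializableException).
    public static boolean isSerializable(Object value) {
        try {
            ObjectOutputStream out =
                    new ObjectOutputStream(new ByteArrayOutputStream());
            out.writeObject(value);
            out.close();
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}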

Repository Polling Configuration

Each type may specify an independent export cycle. The administrator interface provides an easy way to define the simpler cycles, which are sufficient for most purposes. However, export cycles can also be specified in native cron style, which supports even more flexibility. (In standard cron syntax, for example, an expression such as 0 2 * * * means daily at 2:00 a.m.)

Tracing and Logging

The default WIC code uses Hibernate to provide the object/RDBMS mapping for exported data objects, but because Hibernate is a separate library, its tracing and logging are not fully integrated with Waveset's. The WIC code itself can be traced through the com.sun.idm.warehouse.* package. Enabling Hibernate logging, however, requires a different technique.

To pass a Hibernate property to the code that initiates the Hibernate sessions, add an attribute to the DatabaseConnection configuration object. You must prefix the attribute name with an “X”. For example, if the native property name is hibernate.show_sql, you must define it in the configuration object as Xhibernate.show_sql. The following example causes Hibernate to print any generated SQL to the application server’s standard output.

<Attribute name='Xhibernate.show_sql' value='true'/>

By default, Hibernate uses C3P0 for connection pooling. C3P0 uses the java.logging facility for its logging, which is controlled by the $JRE/lib/logging.properties file.