Oracle Waveset 8.1.1 Business Administrator's Guide

Chapter 16 Data Exporter

The Data Exporter feature allows you to write information about users, roles, and other object types to an external data warehouse.

Read this chapter for information and procedures to help you set up and maintain Data Exporter. For full details about planning and implementing Data Exporter, see Chapter 5, Data Exporter, in Oracle Waveset 8.1.1 Deployment Guide.

This chapter is organized as follows:

• What is Data Exporter?

• Planning to Implement Data Exporter

• Configuring Data Exporter

• Testing Data Exporter

• Configuring Forensic Queries

• Maintaining Data Exporter

What is Data Exporter?

Waveset contains and processes data relevant to managing identities across distributed systems and applications. To improve overall performance, Waveset does not retain all of the data it generates during normal provisioning and other daily activities. For example, by default Waveset does not persist the intermediate states of workflow activities and task instances. If you need to capture all or some of the data that Waveset normally discards, you can enable the Data Exporter feature.

When Data Exporter is enabled, Waveset stores each detected change to a specified object (data type) as a record in a table in the repository. These events are queued until a task writes them to an external data warehouse. (You can configure how frequently each type of data is exported.) The exported data can be further processed or used as a basis for queries and transformations with commercial transformation, reporting, and analysis tools.

Exporting data to a data warehouse has a negative impact on the Waveset server’s performance, and this feature should not be enabled unless there is a business need for the exported data.

Waveset also allows you to create and execute forensic queries. A forensic query searches the data warehouse to identify User or Role objects that meet the criteria you specify. See Configuring Forensic Queries for more information.

Planning to Implement Data Exporter

Data Exporter is disabled by default and must be configured before it becomes operational. Several decisions must be made before configuration begins.

When Data Exporter is enabled, the default configuration exports all attributes of all data types. This can place an unnecessary processing burden on Waveset and the warehouse, and can consume warehouse storage for data that will never be used. Data warehousing practice tends to be conservative, capturing data whenever there is a chance it might be used later, but you do not have to export everything that can be exported. You can configure which data types to export and restrict some events from being exported.

Once these decisions have been made, use the following steps to implement Data Exporter:

To Implement Data Exporter

  1. (Optional) Customize the export schema for selected types and regenerate the warehouse DDL. Refer to Customizing Data Exporter in Oracle Waveset 8.1.1 Deployment Guide for more information.

  2. Create a user account on the warehouse RDBMS and load the warehouse DDL on that system. Refer to Customizing Data Exporter in Oracle Waveset 8.1.1 Deployment Guide for more information.

  3. Configure Data Exporter, as described in Configuring Data Exporter.

  4. Test Data Exporter to ensure it was configured correctly. See Testing Data Exporter for more information.

  5. (Optional) Create forensic queries that can search data written to the data warehouse. See Configuring Forensic Queries for more information.

  6. Maintain Data Exporter by using JMX and by monitoring the log files. See Maintaining Data Exporter for more information.

Configuring Data Exporter

The Data Exporter configuration page allows you to define what types of data to retain, specify which attributes to export, and schedule when to export the data. Each data type can be configured independently.

To Configure Data Exporter

  1. In the Administrator interface, click Configure in the main menu. Then click the Warehouse secondary tab. The Data Exporter Configuration page opens.

    Figure 16–1 Data Exporter Configuration

    Figure showing the Data Exporter Configuration page

  2. To define read and write connections, click the Add Connection button. The Edit Database Connection page opens.

    Complete the fields on this page and click Save to return to the Data Exporter Configuration page. See Defining Read and Write Connections for more information.

  3. To assign the WIC class and database connections, click the Edit link that is in the Warehouse Configuration Information section. The Data Exporter Warehouse Configuration page opens.

    Complete the fields on this page and click Save to return to the Data Exporter Configuration page. See Defining the Warehouse Configuration Information for more information.

  4. Click on a data type link in the Warehouse Model Configuration table. The Data Exporter Type Configuration page opens.

    Complete the Export, Attributes, and Schedule tabs on this page and click Save to return to the Data Exporter Configuration page. See Configuring Warehouse Models for more information.

    Repeat this step for every data type.

  5. To configure which workflow to run before and after each data type is exported, click the Edit link in the Exporter Automation section. The Data Exporter Automation Configuration page opens.

    Complete the fields on this page and click Save to return to the Data Exporter Configuration page. See Configuring Exporter Automation for more information.

  6. To configure the export task daemon, click the Edit link that is in the Warehouse Task Configuration section. The Data Exporter Warehouse Configuration page opens.

    Complete the fields on this page and click Save to return to the Data Exporter Configuration page. See Configuring the Warehouse Task for more information.


    Note –

    Exporting is fully operational once these steps have been completed. When exporting is enabled, data records will start queuing for export. If you do not enable the export task, the queue tables will fill up, and queuing will be suspended. It is generally more efficient to export smaller batches (more frequently) than larger ones, but exporting is subject to the write availability of the warehouse itself, which may be constrained for other reasons.


  7. Optionally set the maximum queue size. See Modifying the Configuration Object for more information.

Defining Read and Write Connections

Waveset uses a write connection during the export cycles. It uses the read connection to indicate how many records are currently in the warehouse (during warehouse configuration) and to service the forensic query interface.

Warehouse connections can be defined as an application server DataSource, as a JDBC connection, or as a reference to a database resource. If a JDBC connection or database resource is defined, data exporting uses a small number of connections extensively during write operations and then closes all of the connections. Data Exporter only uses the read connection during warehouse configuration and during forensic query execution, and it will close those connections as soon as the operation completes.

Exporter uses the same schema for write and read connections, and you can use the same connection information for both. However, if you have separate connections, the deployment can write to a set of warehouse staging tables, transform those tables into the real warehouse, and then transform the warehouse tables to a data mart that Waveset will read from.

You can edit the Data Export Configuration form to prevent Waveset from reading from the warehouse. This form contains the includeWarehouseCount property, which causes Waveset to query the warehouse and display the number of records of each data type. To disable this feature, copy the Data Export Configuration form, change the value of the includeWarehouseCount property to false, and import your customized form.
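
In the copied form, the change amounts to setting a single property value. The fragment below is a hypothetical illustration only; the exact location of the property depends on the structure of the Data Export Configuration form in your deployment, so locate the existing includeWarehouseCount property rather than adding a new one.

<!-- Hypothetical fragment: find the existing includeWarehouseCount property
     in your copy of the form and change its value -->
<Property name='includeWarehouseCount' value='false'/>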

To Define Read and Write Connections

  1. From the Data Exporter Configuration page, click the Add Connection button.

    Figure 16–2 Data Exporter Configuration

    Figure showing the Edit Database Connection page

  2. Specify how Waveset will establish read or write connections to the data warehouse by selecting an option from the Connection Type drop-down menu.

    • JDBC. Connects to a database using the Java Database Connectivity (JDBC) application programming interface. Connection pooling is provided by the Warehouse Interface Code.

    • Resource. Uses the connection information defined in a resource. Connection pooling is provided by the Warehouse Interface Code.

    • Data Source. Uses the underlying application server for connection management and pooling. This type of connection requests connections from the application server.

      The fields that are displayed on the page vary, depending on which option you selected from the Connection Type drop-down menu. Refer to the online help for detailed information about configuring the database connection.
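
      For example (hypothetical values), a JDBC connection to an Oracle-based warehouse would typically use a URL of the form jdbc:oracle:thin:@warehouse-host:1521:WHDB, the oracle.jdbc.OracleDriver driver class, and the credentials of the warehouse account created when the warehouse DDL was loaded.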

  3. Click Save to save your configuration changes and return to the Data Exporter Configuration page.

    Repeat this procedure if you will use separate read and write connections.

Defining the Warehouse Configuration Information

To configure the warehouse, you must select a read connection and a write connection, and specify a Warehouse Interface Code (WIC) factory class. The WIC factory class provides the interface between Waveset and the warehouse. Waveset provides a default implementation of the code, but you may build your own. See Chapter 5, Data Exporter, in Oracle Waveset 8.1.1 Deployment Guide for information about creating custom factory classes.

The jar file containing the factory class and any supporting jar files must be present in the $WSHOME/exporter directory on the Waveset server that executes the export task and on any server that configures the Data Exporter. Only one Waveset server can export data at any given time.

To Define Warehouse Configuration Information

  1. From the Data Exporter Configuration page, click the Edit link that is in the Warehouse Configuration Information section.

    Figure 16–3 Data Exporter Configuration

    Figure showing the Warehouse Configuration Information section of the Data Exporter Configuration page

  2. Specify a value in the Warehouse Interface Code Factory Class Name field. If your integrator has not created a custom class, enter the value com.sun.idm.warehouse.base.Factory.

  3. Specify the connections by selecting an option from both the Read Connection and Write Connection drop-down menus.

  4. Click Save to save your configuration changes and return to the Data Exporter Configuration page.

Configuring Warehouse Models

Each exportable data type has a set of options that control if, how, and when the type is exported. Exporting data increases the load on the Waveset servers, so exporting should be enabled only for data types that are of business interest.

The following table describes each of the data types that can be exported.

Table 16–1 Supported Data Types

Account: A record containing the linkage between a User and a ResourceAccount.
AdminGroup: A group of Waveset permissions available on all ObjectGroups.
AdminRole: The permissions assigned to one or more ObjectGroups.
AuditPolicy: A collection of rules evaluated against a Waveset object to determine compliance with a business policy.
ComplianceViolation: A record containing a user's non-compliance with an AuditPolicy.
Entitlement: A record containing the list of attestations for a specific User.
LogRecord: A record containing a single audit record.
ObjectGroup: A security container that is modeled as an organization.
Resource: A system or application on which accounts are provisioned.
ResourceAccount: A set of attributes that comprise an account on a specific Resource.
Role: A logical container for access.
Rule: A block of logic that can be executed by Waveset.
TaskInstance: A record indicating an executing or completed process.
User: A logical user that includes zero or more accounts.
WorkflowActivity: A single activity of a Waveset workflow.
WorkItem: A manual action from a Waveset workflow.

To Configure Warehouse Models

  1. From the Data Exporter Configuration page, click on a data type link.

  2. In the Export tab, specify whether to export the data type. If you do not want to export this data type, deselect the Export check box and click Save. Otherwise, select the remaining options on this Export tab as needed.

    • Allow Query. Controls whether the model can be queried.

    • Queue All. Captures all changes to objects of this type. Checking this option may add significant processing costs to the Exporter. Use this option sparingly.

    • Capture Deletes. Records all deleted objects of this type. Checking this option may add significant processing costs to the Exporter. Use this option sparingly.

  3. The Attributes tab allows you to select which attributes may be specified as part of a forensic query, and which attributes can be displayed in the query results. You cannot delete the default attributes from the Administrator interface. See Chapter 1, Working with Attributes, in Oracle Waveset 8.1.1 Deployment Guide for information about changing the default attributes.

    New attribute names have the following characteristics:

    • attrName — The attribute is top-level and scalar.

    • attrName[] — The attribute is a list-valued top-level attribute, and the elements in the list are scalar.

    • attrName['key'] — The attribute contains a map value, and the value of the map with the specified key is desired.

    • attrName[].name2 — The attribute is a list-valued top-level attribute, where the elements in the list are structures. name2 is the attribute in the structure to be accessed.
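
    For example (hypothetical attribute names), if a type had a list-valued attribute named accounts whose elements are structures, accounts[] would address the list itself and accounts[].name would address the name field of each element, while a map-valued attribute named properties could be addressed as properties['owner'].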


    Note –

    If you want to export attributes to the EXT_RESOURCEACCOUNT_ACCTATTR table, you must check the Audit box for each attribute to be exported.


  4. Specify how often to export the information associated with the data type on the Schedule tab. Cycles are relative to midnight on the server. A cycle of every 20 minutes would occur on the hour, then 20 minutes and 40 minutes past the hour. If an export attempt takes longer than a scheduled cycle, the next cycle will be skipped. For example, if a cycle is defined as 20 minutes and starts at midnight, and it takes 25 minutes to complete the export, the next export will start at 12:40. The export originally scheduled for 12:20 will not occur.

Configuring Exporter Automation

Waveset allows you to specify workflows that execute before and after exporting data.

The Cycle Start workflow could be used to prevent an export if an event occurs that warrants a cancellation. For example, if an application that reads or writes to the staging tables needs exclusive access to the tables at the same time an export is scheduled to occur, the export should be cancelled. The workflow should return a value of 1 to cancel the export. Waveset creates an audit record that indicates the export was skipped and provides the error results. If the workflow returns 0 and no errors occur, the data type will be exported.

The Cycle Complete workflow runs after all the records have been exported. This workflow usually triggers another application to process the exported data. After this workflow completes, the Exporter checks for another data type to export.

Sample workflows are provided in the $WSHOME/sample/web/exporter.xml file. The subtype for an Exporter workflow is DATA_EXPORT_AUTOMATION and the authType is WarehouseConfig.
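
The skeleton below is an illustrative sketch only of how a Cycle Start workflow might be put together. Treat the sample workflows in $WSHOME/sample/web/exporter.xml as the authoritative model for the object wrapper, for where the DATA_EXPORT_AUTOMATION subtype is set, and for how the cancellation result (1 to cancel, 0 to proceed) is returned.

<Configuration name='Warehouse Cycle Start' authType='WarehouseConfig'>
  <Extension>
    <WFProcess name='Warehouse Cycle Start'>
      <!-- Illustrative only: a value of 1 cancels the export cycle, 0 allows it
           to proceed. Follow the sample in exporter.xml for the exact variable
           name and return mechanism. -->
      <Variable name='result' value='0'/>
      <Activity id='0' name='start'>
        <Transition to='end'/>
      </Activity>
      <Activity id='1' name='end'/>
    </WFProcess>
  </Extension>
</Configuration>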

To Configure Exporter Automation

  1. From the Data Exporter Configuration page, click the Edit link that is in the Exporter Automation section.

  2. Optionally select a workflow to run before an export from the Cycle Start Workflow drop-down menu.

  3. Optionally select a workflow to run after an export from the Cycle Complete Workflow drop-down menu.

Configuring the Warehouse Task

It is not required to run the export task on a dedicated server, but you should consider it if you expect to export a large amount of data. The export task is efficient at transferring data from Waveset to the warehouse, and will consume as much CPU as possible during the export operation. If you do not use a dedicated server, you should restrict the server from handling interactive traffic, because the response time will degrade dramatically during a large export.

To Configure the Warehouse Task

  1. From the Data Exporter Configuration page, click the Edit link that is in the Warehouse Task Configuration section.

    Figure 16–4 Data Warehouse Schedule Configuration

    Figure showing the Data Exporter Warehouse Schedule Configuration section.

  2. Select an option from the Startup Mode drop-down menu to determine whether the warehouse task starts automatically when Waveset starts. Selecting Disabled means the task must be started manually.

  3. Check the Run As Me check box to cause the Exporter task to run under your administrative account.

  4. Select the servers that the task can run on. You may specify multiple servers, but only one warehouse task can run at any given time. If the server executing the task is stopped, the scheduler automatically restarts the task on another server from the list (if available).

  5. Specify the number of records read from the queue into a memory buffer before writing in the Queue read block size field. The default value for this field is good for most exports. Increase this value if the Waveset repository server is slow compared to the warehouse server.

  6. Specify the number of records written to the warehouse in a single transaction in the Queue write block size field.

  7. Specify the number of Waveset threads to use for reading queued records in the Queue drain Thread Count field. Increase this number if the queue table has a large number of records of different types. Decrease this number if the queue table has few data types.

  8. Click Save to save your configuration changes and return to the Data Exporter Configuration page.

Modifying the Configuration Object

When Data Exporter is configured and operational, any data types that are configured to be queued are captured in the internal queue table. By default this table does not have an upper bound, but you can configure one by editing the Configuration object named Data Warehouse Configuration. This object has a nested object named warehouseConfig. Add the following line to the warehouseConfig object:

<Attribute name='maxQueueSize' value='YourValue'/>

The value of maxQueueSize can be any positive integer that is less than 2^31. Data Exporter disables queuing when that limit is reached. Data that is generated cannot be exported until the queue is drained.
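
The sketch below suggests where the new attribute sits relative to the nested warehouseConfig object. It is illustrative only: the element nesting and the other attributes in your deployment's Data Warehouse Configuration object will differ, and the value 5000000 is an arbitrary example.

<Configuration name='Data Warehouse Configuration'>
  <Extension>
    <Object>
      <Attribute name='warehouseConfig'>
        <Object>
          <!-- new upper bound for the internal queue table (example value) -->
          <Attribute name='maxQueueSize' value='5000000'/>
          <!-- existing warehouseConfig attributes remain unchanged -->
        </Object>
      </Attribute>
    </Object>
  </Extension>
</Configuration>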

Normal Waveset operation can generate many thousands of changed records per hour, so the queue table can grow very quickly. Because the queue table is in the Waveset repository, this growth consumes tablespace in the RDBMS and can potentially exhaust it. Placing a cap on the queue may be necessary if you have a limited amount of tablespace.

Use the Data Queue JMX Mbean to monitor the size of the queue table. See Monitoring Data Exporter for more information.

Testing Data Exporter

After Data Exporter is correctly configured, it behaves as a background process, sending data to the warehouse at the configured intervals. To run the Exporter on demand, use the Data Warehouse Exporter Launcher task.

To Start the Data Warehouse Exporter Launcher

  1. Disable the Warehouse Task. See Configuring the Warehouse Task for more information.

  2. Click Server Tasks in the main menu. Then click the Run Tasks secondary tab. The Available Tasks page opens.

  3. Click the Data Warehouse Exporter Launcher link. The Launch Task page opens.

  4. Select the Debug options check box to display additional options.

  5. Select the Ignore Initial LastMods check box to cause the Exporter to ignore the “last polled” timestamp it uses to determine which records in the Waveset repository have already been exported. When this option is selected, all records in the Waveset repository of the selected types will be exported.

  6. Choose which types of data to export from the Export Once list. If you do not choose any types in the Export Once list, the export task runs as a daemon and exports based on the schedule previously defined. If you select one or more data types, Waveset exports these types immediately, and the export task exits.

  7. Set the values for the other fields on the page as needed.

  8. Click Launch to begin the task.

Configuring Forensic Queries

Forensic queries allow Waveset to read data that has been stored in the data warehouse. They can identify users or roles based on current or historical values of the user, role, or related data types. A forensic query is similar to a Find User or Find Role report, but it differs in that the matching criteria can be evaluated against historical data, and in that it allows you to search attributes of data types other than the user or role being queried.

The purpose of the forensic query is to take action on the results using Waveset. The forensic query is not a general-purpose reporting tool.

A forensic query can ask questions such as which users had an account on a particular resource during a given period, or which users were attested by a particular approver.

The results of a forensic query cannot be saved. General reporting on the warehouse data should be accomplished using commercial reporting tools.

Creating a Query

A forensic query can search for either User or Role objects. The query can be very complex, allowing the author to select one or more attribute conditions on related data types. User forensic queries can search attributes of the User, Account, ResourceAccount, Role, Entitlement, and WorkItem data types. Role forensic queries can search attributes of the Role, User, and WorkItem data types.

Within a single data type, all attribute conditions are logically ANDed, so that all conditions must be met for a match to occur. By default, matches are ANDed across data types, but if you select the Use OR check box, the matches across data types are logically ORed.

The warehouse may contain multiple records for a single User or Role object, and a single query could return multiple matches for the same user or role. To help differentiate these matches, each data type can be constrained with a date range, such that only records from within the specified date range are considered matches. Each related data type may be constrained with a date range, so it is possible to issue a query of the form:


find all Users with Resource Account on ERP1 between May and July 2005 
who were attested by Fred Jones between June and August 2005

The date range is from midnight to midnight. For example, the range May 3, 2007 to May 5, 2007 is 48 hours. It would not include any records from May 5, 2007.

The operands (values to be compared to) for each attribute condition must be specified as part of the query definition. The schema restricts some attributes to have a limited set of potential values, while other attributes have no restrictions. For example, most date fields must be entered in YYYY-MM-DD HH:mm:ss format.
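For example, midnight at the start of June 1, 2005 would be entered as 2005-06-01 00:00:00.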


Note –

Due to the potentially large volume of data in the warehouse and the complexity of the query, it may take a long time for the query to produce results. If you navigate away from the query page while a forensic query is running, you will not be able to see the results of the query.


To Create a Forensic Query

  1. In the Administrator interface, click Compliance in the main menu.

    The Audit Policies page (Manage Policies tab) opens.

  2. Click the Forensic Query secondary tab.

    The Search Data Warehouse page opens.

    Figure 16–5 Search Data Warehouse

    Figure showing the Search Data Warehouse page

  3. Select whether to search user or role records from the Type drop-down menu.

  4. Select the Use OR check box to cause Waveset to logically OR the results of each data type queried. By default, the system performs a logical AND on the results.

  5. Select a tab that represents a data type that will be in the forensic query.

    1. Click Add Condition. A set of drop-down menus displays.

    2. Select an operand (condition to check for) from the left drop-down menu and the type of comparison to make from the right drop-down menu. Then enter a string or integer to search for. The list of possible operands is defined in the external schema. Refer to the online help for a description of each operand.

    3. Optionally, select a range of dates to narrow the scope of the query.

      Add more conditions as necessary to the currently-selected data type. Repeat this step for all data types that will be part of the forensic query definition.

  6. From the list of available attributes, pick the attributes that you would like to display in the results of the forensic query.

  7. Specify a value in the Limit results to first field. When using conditions from multiple data types, the limit is applied to the subquery for each type, and the final result is the intersection of all subqueries. As a result, the final result may exclude some records because of the limit on a subquery.

  8. Click Search to run the forensic query immediately or Save Query to reuse the query. See Saving a Forensic Query for information about reusing your forensic queries.

Saving a Forensic Query

After you have configured a query (and optionally executed it to ensure that it produces the desired results), you can save the query for later execution.

To Save a Forensic Query

  1. From the Search Data Warehouse page, click Save Query. The Save Forensic Query page opens.

  2. Specify a name and description for the query.

  3. Select the Save condition values check box to save the values of the conditions (strings and integers) you entered on the Search Data Warehouse page. If you do not select this check box, then the saved forensic query serves as a template, and you must enter values each time you run the query.

  4. Anyone can execute any saved query, but by default only the query author can modify the query. To allow other users to modify your query, select the Allow others to alter this query check box.

  5. Because the query returns User or Role objects, you can choose which attributes of the objects to display in the results. If you want to display attributes that are not included in the Attributes to Display list, you can go to Data Exporter Configuration page and add new displayable attributes to the User or Role type.

Loading a Query

You can load any query that has been saved by any user, but you can only alter queries that you have created, or that other people have marked as modifiable by anyone.

To Load a Forensic Query

  1. From the Search Data Warehouse page, click Load Query. The Load Forensic Query page opens. The Query Summary column displays Incomplete Query if the query has been saved as a template.

  2. Select the check box to the left of the query and click Load Query.

Maintaining Data Exporter

This section describes how to track the status of Data Exporter. This information is organized into the following topics:

• Monitoring Data Exporter

• Monitoring Logging

Monitoring Data Exporter

After the Exporter has been configured and is operational, you may choose to monitor it to ensure its continuous operation. The Exporter has several JMX beans that are useful for determining how the Exporter is behaving. The JMX beans include statistics on the average read/write rates for the Exporter, the current/maximum size of the internal memory queue, and the size of the persistent queue. The Exporter also produces audit records during export, one record for each cycle of each data type. The audit record includes how many records of the type were exported, and how long the export took.

Data Exporter provides the following JMX management beans that monitor the Exporter.

Table 16–2 JMX Management Beans

DataExporter: Contains the number of currently queued exports and the upper limit for the queue.
DataQueue: Contains the number of currently cached queued exports and the rate of arrival to the cache.
ExporterTask: Contains the number of export reads (from Waveset) and writes (to the warehouse), the read and write rates (records per second), and the number of errors.

Data Exporter can be configured to queue export records to a queuing table during normal Waveset operation. Because the queue needs to potentially scale to a large number of records and survive a server restart, the queue is backed by a table in the Waveset repository. Since writes to the repository would typically slow down normal Waveset operations, the queue uses a small memory cache to buffer records in memory until they can be persisted in the repository.

The DataQueue MBean attributes can be plotted to show the largest number of records queued in memory (on a single Waveset server). On a balanced system, the number of records in the memory cache should be small and trend quickly to zero. If this number grows large (into the thousands) or does not return to zero within a few seconds, you should investigate the write performance of the repository.
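
These attributes can be watched with any standard JMX console, for example the JConsole tool included with the JDK, attached to the application server JVM that hosts Waveset.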

The ExporterTask MBean contains two error counts, one for reads and one for writes. These counts should be zero, but errors can occur for a number of reasons, especially during writes. The most common write error results from exported data not fitting within the warehouse table columns, typically a string overflow. Some exported String data is unbounded, whereas the export table columns must have some upper limit.

Monitoring Logging

Waveset has two sets of objects that grow without bounds: the audit log and the system log. Data Exporter addresses some of the maintenance problems associated with the log tables.

Audit Logs

Waveset writes immutable audit records to the audit log to serve as a historical audit trail of the operations it performs. Waveset uses these records in certain reports, and the data from the records may be displayed in the Administrator interface. However, because the audit log grows without bound (although at a modest rate), the deployer must determine when to truncate the audit log. Before Data Exporter, if you wanted to preserve the records prior to truncation, you were forced to dump the tables from the repository. If Data Exporter is enabled and configured to export log records, the old records are preserved in the warehouse, and Waveset can truncate the audit tables as needed.

System Logs

System logs have the same immutable property that the audit logs have, but system logs are not typically generated as frequently. Data Exporter does not export system logs. To truncate the system log and preserve old records, you must dump the tables in the repository.