Waveset relies on a repository database to store and manage its identity and configuration data. For this reason, database performance can greatly influence Waveset’s performance.
Detailed information about performance tuning and managing databases is beyond the scope of this document because this information is dataset-specific and vendor-specific. In addition, customer database administrators (DBAs) should already be experts on their own databases.
This section characterizes the Waveset application and provides general information about the nature of Waveset data and its typical usage patterns to help you plan, tune, and manage your databases.
This information is organized into the following sections:
You must understand the Waveset repository database scripts and how Waveset uses that content before you can effectively tune the Waveset repository.
This section provides an overview of the database and the information is organized into the following topics:
The Waveset repository contains three types of tables, and each table has slightly different usage characteristics.
Attribute Tables. These tables enable you to query for predefined single-valued or multi-valued object attributes.
For most object types, stored attributes are hard-coded.
The User and Role object types are exceptions to this rule. The inline attributes that are stored in the object table for User and Role are configurable, so you can configure additional custom attributes as queryable.
When you search for objects based on attribute conditions, Waveset accesses attribute tables in joins with the corresponding object tables. Some form of join (such as a JOIN, an EXISTS predicate, or a SUB-SELECT) occurs for each attribute condition.
The number of rows in the attribute table is proportional to the number of rows in the corresponding object table. The value distribution might exhibit skew: multi-valued attributes have a row per value, and some objects might have more attributes or more attribute values than others. Typically, there is a many-to-one relation between rows in the attribute table and rows in the object table.
Attribute tables have ATTR in the table name.
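The join pattern described above can be sketched against a simplified, hypothetical schema. The table and column names here are illustrative only, not the actual Waveset DDL:

```python
import sqlite3

# Hypothetical, simplified versions of an object table and its child
# attribute table -- the real Waveset schemas differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE userobj (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE userattr (userid TEXT, attrname TEXT, attrval TEXT);
""")
conn.execute("INSERT INTO userobj VALUES ('1', 'jdoe')")
conn.execute("INSERT INTO userattr VALUES ('1', 'department', 'Sales')")
conn.execute("INSERT INTO userattr VALUES ('1', 'location', 'Austin')")

# Each attribute condition is evaluated with some form of join; here,
# an EXISTS sub-select per condition.
rows = conn.execute("""
SELECT o.name FROM userobj o
WHERE EXISTS (SELECT 1 FROM userattr a
              WHERE a.userid = o.id
                AND a.attrname = 'department' AND a.attrval = 'Sales')
""").fetchall()
print(rows)  # -> [('jdoe',)]
```

A second attribute condition would add a second EXISTS sub-select, which is why queries with many attribute conditions put more load on the attribute tables.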
Change Tables. Waveset uses a change table to track changes made to a corresponding object table. These table sizes are proportional to the rate of object change, but the tables are not expected to grow without bound. Waveset automatically truncates change tables.
Change tables can be highly volatile because the lifetime of a row is relatively short and new rows can be created frequently.
Access to a change table is typically performed by a range scan on the time-stamp field.
Change tables have CHANGE in the table name.
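The range-scan access pattern on the time-stamp field can be sketched as follows, again with a hypothetical table layout rather than the real Waveset schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical change table; real column names differ.
conn.execute("CREATE TABLE userchange (objid TEXT, modified INTEGER)")
# An index on the time-stamp column supports the typical range scan.
conn.execute("CREATE INDEX idx_change_ts ON userchange(modified)")
conn.executemany("INSERT INTO userchange VALUES (?, ?)",
                 [("a", 100), ("b", 200), ("c", 300)])

# Typical access: "what changed in this time window?"
rows = conn.execute(
    "SELECT objid FROM userchange WHERE modified >= ? AND modified < ?",
    (150, 350)).fetchall()
print(sorted(r[0] for r in rows))  # -> ['b', 'c']
```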
Object Tables. The Waveset repository uses object tables to hold serialized data objects, such as Large Objects (LOBs). Object tables can also hold commonly queried, single-valued object attributes.
For most object types, stored attributes are hard-coded.
The User and Role object types are exceptions to this rule. The inline attributes that are stored in the object table are configurable, and you can configure additional custom attributes as queryable for User and Role.
The number of rows in an object table equals the number of objects being stored. The number of objects stored in each object table depends on which object types are being stored in the table. Some object types are numerous, while other types are few.
Generally, Waveset accesses an object table by object ID or name, though Waveset can also access the table by using one of the attributes stored in the table. Object IDs and names are unique across a single object type, but attribute values are not unique or evenly distributed. Some attributes have many values, while other attributes have relatively few values. In addition, several object types can expose the same attribute. An attribute may have many values for one object type and few values for another object type. The uneven distribution of values might cause an uneven distribution of index pages, which is a condition known as skew.
Object tables are tables that do not have ATTR or CHANGE suffixes in the table name.
Every object table contains an XML column that is used to store each serialized object; the LOG table-set is the exception. In the LOG table-set, certain optional attributes are stored in the XML column only if those attributes are present, for example, when digital signing is enabled.
You can roughly divide Waveset data into a number of classes that exhibit similar properties with respect to access patterns, cardinality, lifetime, volatility, and so forth. Each of the following classes corresponds to a set of tables in the repository:
User Data. User data consists of user objects.
You can expect this data to grow quite large because there is an object for each managed identity. After an initial population phase, you can expect a proportionally small number of creates because the majority of operations will be updates to existing objects.
User objects are generally long-lived and they are removed at a relatively low rate.
User data is stored in USEROBJ, USERATTR, and USERCHANGE tables.
Role Data. Role data consists of Role objects, including Role subtypes such as Business Roles, IT Roles, Applications, and Assets.
Role data is similar to organization data, and these objects are relatively static after a customer deploys Waveset.
An exception to the preceding statement is a deployment that is integrated with an external source containing an authoritative set of roles. One integration style might be to feed role changes into Waveset, which causes Waveset Role data to be more volatile.
Generally, the number of role objects is small when compared to the number of identity objects such as users (assuming that multiple users share each role), but this depends on how each enterprise defines its roles.
Role data is stored in ROLEOBJ, ROLEATTR, and ROLECHANGE tables.
Account Data. Account data solely consists of account objects in the Account Index.
As with user data, account data can become rather large, with an object for each known resource account. Account objects are generally long-lived, removed at a relatively low rate, and after initial population, are created infrequently. Unless you frequently add or remove native accounts, or enable native change detection, account object modifications occur infrequently.
Waveset stores account data in ACCOUNT, ACCTATTR, and ACCTCHANGE tables.
Compliance Violation Data. Compliance Violation data contains violation records that indicate when the evaluation of an Audit Policy failed. These violation records exist until the same Audit Policy is evaluated against the same User and the policy passes. Violation records are created, modified, or deleted as part of an Audit Policy Scan or as part of an Access Review.
The number of violation records is proportional to the number of Audit Policies that are used in scans and the number of Users. An installation with 5000 users and 10 Audit Policies might have 500 violation records (5000 x 10 x 0.01), where the 0.01 multiplier depends on how strict the policies are and how user accounts are changed.
Waveset stores Compliance Violation records in OBJECT, ATTRIBUTE, and OBJCHANGE tables.
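The sizing arithmetic above can be captured in a small estimator. The 0.01 violation rate is only a hypothetical placeholder; measure the rate your own policies produce:

```python
# Rough sizing sketch for violation records. The violation_rate argument is
# a hypothetical multiplier that depends on how strict the audit policies
# are and how user accounts change.
def estimated_violations(users, audit_policies, violation_rate):
    return round(users * audit_policies * violation_rate)

print(estimated_violations(5000, 10, 0.01))  # -> 500
```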
Entitlement Data. Entitlement data predominantly consists of user entitlement objects, which are created only if you perform compliance access reviews.
Entitlement records are created in large batches, modified gradually over a period of days after initial creation, and then left untouched. These records are deleted when the corresponding Access Review is deleted.
Waveset stores entitlement data in ENTITLE, ENTATTR, and ENTCHANGE tables.
Organization Data. This data consists of object group or organization objects.
Object group data is similar to configuration data, and this data is relatively static after being deployed. Generally, the number of objects is small (one for each defined organization) when compared to task objects or to identity objects such as users or accounts. However, the number can become large compared to other configuration objects.
Organization data is stored in ORG, ORGATTR, and ORGCHANGE tables.
Task Data. Task data consists of objects that are related to tasks and workflows, including state and result data.
The data contained in these tables is short-lived compared to other classes because objects are created, modified, and deleted at a high rate. The volume of data in this table is proportional to the amount of activity on the system.
Task data is stored in TASK, TASKATTR, and TASKCHANGE tables.
Configuration Data. Configuration data consists of objects related to Waveset system configuration, such as forms, roles, and rules.
Generally, configuration data is:
Relatively small compared to other classes
Only expected to change during deployment and upgrade, and changes occur in large batches
Not expected to change much after being deployed
Waveset stores configuration data in ATTRIBUTE, OBJCHANGE, and OBJECT tables.
Export Queue Data. If you enable Data Exporting, some records are queued inside Waveset until the export task writes those records to the Data Warehouse. The number of records that are queued is a function of Data Exporting configuration and the export interval for all queued types.
The following data types are queued by default, and all other data types are not:
ResourceAccount
WorkflowActivity
TaskInstance
WorkItem
The number of records in these tables grows until the export task drains the queue. The current table size is visible through a JMX Bean.
Records added to this table are never modified. These records are written during other Waveset activities, such as reconciliation, provisioning, and workflow execution. When the Data Exporter export task runs, the task drains the table.
Waveset stores Export Queue data records in QUEUE, QATTR, and QCHANGE tables.
Log Data. Log data consists of audit and error log objects. Log data is write-once only, so you can create new audit and error log objects, but you cannot modify these objects.
Log data is long-lived and can potentially become very large because you can only purge log data by explicit request. Access to log data frequently relies on attributes that are stored in the object table instead of in the attribute table. Both the distribution of attribute values and queries against the log specifically depend on how you are using Waveset.
For example, the distribution of attribute values in the log tables depends on the following:
What kind of changes are made
Which Waveset interface was used to make the changes
Which types of objects were changed
The pattern of queries against the log table also depends on which Waveset reports, which custom reports, or which external data mining queries a customer runs against the log table.
Waveset stores audit log records in LOG and LOGATTR tables, and error log records in SYSLOG and SLOGATTR tables. This data does not have corresponding change tables.
Waveset generates globally unique identifiers (GUIDs) for objects by using the VMID class provided in the JDK software.
These GUID values have the property that their string representations sort in the order in which the objects were created. For example, when you create new objects with Waveset, the newer objects have object IDs that are greater than those of older objects. Consequently, when Waveset inserts new objects into the database, the index based on object IDs can encounter contention for the same block or blocks.
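The ordering property can be illustrated with a stand-in ID generator. This is not the real VMID-based format, only a hypothetical sketch that shares the relevant property, namely that string order tracks creation order:

```python
import itertools

_counter = itertools.count(1)

def next_object_id():
    # Hypothetical stand-in for Waveset's VMID-based GUIDs: the real format
    # differs, but it shares the property that string order tracks creation
    # order. Zero-padding keeps string comparison consistent with numeric
    # order.
    return f"#ID#{next(_counter):012d}"

ids = [next_object_id() for _ in range(3)]
# Newer objects always sort after older ones, so inserts cluster on the
# "right edge" of the ID index -- the source of the block contention.
print(ids == sorted(ids))  # -> True
```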
Generally, Waveset uses prepared statements for activities such as inserting and updating database rows, but does not use prepared statements for queries.
If you are using Oracle, this behavior can create issues with the library cache. In particular, the large number of statement versions can cause contention on the library cache latch.
To address this contention, change the Oracle CURSOR_SHARING parameter value from EXACT to SIMILAR. Changing this value causes Oracle to replace literals in SQL statements with bind variables, thereby reducing the number of versions.
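The effect of literals versus bind variables on the number of statement versions can be sketched as follows. The SQL texts and table name are hypothetical; the point is only that every distinct literal yields a distinct statement text, which is what CURSOR_SHARING=SIMILAR eliminates by rewriting literals as binds:

```python
# With literals embedded, every distinct value produces a distinct SQL
# text, so the library cache sees many statement versions:
literal_stmts = {f"SELECT name FROM userobj WHERE id = '{i}'"
                 for i in range(100)}
print(len(literal_stmts))  # -> 100

# With a bind variable, all 100 executions share a single SQL text:
bound_stmts = {"SELECT name FROM userobj WHERE id = ?"
               for _ in range(100)}
print(len(bound_stmts))  # -> 1
```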
Because Waveset is a Java application that generally reads and writes character data rather than bytes, it does not restrict which encoding the database uses.
Waveset only requires that the data is sent and returned correctly. For example, the data does not become corrupted when written or reread. Use an encoding that supports multi-byte characters and is appropriate for the customer’s data. Generally, UTF-8 encoding is sufficient, but enterprises with a large number of true multi-byte characters, such as Asian or Arabic, might prefer UTF-16.
Most database administrators prefer to use an encoding that supports multi-byte characters because of the following:
Their deployments often grow to support international characters.
Their end users cut and paste text from Microsoft applications that contains characters that look like ASCII but are actually multi-byte, such as em dashes (—).
This section describes how to configure some commonly configured properties in the Waveset repository.
Do not modify properties in the Waveset repository unless you are very familiar with repository databases and understand the consequences of making these changes.
If you are using a DataSource, set the connectionPoolDisable attribute to true in the RepositoryConfiguration object to disable automatic internal connection pooling in the Waveset repository.
For example, setting <RepositoryConfiguration connectionPoolDisable='true'> allows you to avoid having two connection pools (one for Waveset and one for your application server).
You can see the current connectionPoolDisable setting in the ObjectRepository JMX MBean.
You can edit the RepositoryConfiguration object to enhance the performance of searches against specific, single-valued attributes. For example, you might edit this object to add an extended attribute, such as employeeID, that is used to search for Users or as a correlation key.
The default RepositoryConfiguration object looks like the following example:
<RepositoryConfiguration ... >
  <TypeDataStore Type="User" ...
      attr1="MemberObjectGroups"
      attr2="lastname"
      attr3="firstname"
      attr4=""
      attr5="">
  </TypeDataStore>
</RepositoryConfiguration>
The ellipses represent XML attributes that are not relevant here.
Each of the attr1, attr2, attr3, attr4, and attr5 XML attributes specifies a single-valued attribute to be copied into the waveset.userobj table. The waveset.userobj table can contain up to five inline attributes. The attribute value named by attr1 in RepositoryConfiguration will be copied into the “attr1” database column in this table.
Inline attributes are stored in the base object table for a Type (rather than as rows in the child attribute table).
Using inline attributes improves the performance of repository queries against those attributes. (Because inline attributes reside in the main “object” table, queries against inline attributes are faster than those against non-inline attributes, which are stored in the child “attribute” table. A query condition against a non-inline attribute requires a “join” to the attribute table.)
By default, Waveset uses the MemberObjectGroups, lastname, and firstname inline attributes.
You can add two more attributes to enable faster searching, as long as those attributes are queryable.
For example, if your deployment contains an employeeID extended attribute, adding that attribute inline will improve the performance of repository searches against that attribute.
If you do not need lastname or firstname, you can remove them or replace them with other attributes.
Do not remove MemberObjectGroups. Waveset uses this attribute internally to speed up authorization checks.
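The difference between querying an inline attribute and a non-inline attribute can be sketched with a simplified schema. The table layout and the employeeID attribute are illustrative, not the actual Waveset DDL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: attr2/attr3 are inline columns in the object table;
# other attributes live as rows in the child attribute table.
conn.executescript("""
CREATE TABLE userobj (id TEXT PRIMARY KEY, attr2 TEXT, attr3 TEXT);
CREATE TABLE userattr (userid TEXT, attrname TEXT, attrval TEXT);
""")
conn.execute("INSERT INTO userobj VALUES ('1', 'Doe', 'Jane')")
conn.execute("INSERT INTO userattr VALUES ('1', 'employeeID', 'E-42')")

# Inline attribute (lastname mapped to attr2): no join is needed.
inline = conn.execute(
    "SELECT id FROM userobj WHERE attr2 = ?", ("Doe",)).fetchall()

# Non-inline attribute: requires a join to the attribute table.
joined = conn.execute("""
    SELECT o.id FROM userobj o
    JOIN userattr a ON a.userid = o.id
    WHERE a.attrname = 'employeeID' AND a.attrval = ?""",
    ("E-42",)).fetchall()

print(inline == joined)  # -> True (same result, but the second needs a join)
```

Both queries find the same user, but the non-inline form pays for the extra join on every condition, which is why promoting a heavily searched attribute inline helps.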
Every public repository method acquires a connection when the method begins and then releases that connection when the method exits. This connection is almost always a JDBC connection because the repository almost always accesses a DBMS. The method releases the connection whether it completes successfully or unsuccessfully.
You can configure the number of concurrent connections by modifying the maxConcurrentConnections value in the RepositoryConfiguration object.
The number of concurrent JDBC connections is unlimited by default (maxConcurrentConnections=0). The Waveset repository can request as many connections as necessary to serve each concurrent repository request.
You specify a maximum number of concurrent connections that the repository can use at one time by changing the maxConcurrentConnections value to a positive integer. In effect, this setting limits the number of calls to the repository that can be actively executing at any one time.
If the repository reaches the configured limit, calls to the repository will wait until the number of connections being used by the repository falls back below the limit. The repository might appear to slow down, or even stop, if this is necessary to stay within the specified number of connections.
Any limit on the number of concurrent connections applies whether the repository pools connections or not. In other words, maxConcurrentConnections is unaffected by connectionPoolDisable.
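The blocking behavior of maxConcurrentConnections can be sketched with a counting semaphore. The class and method names below are illustrative only, not Waveset APIs:

```python
import threading
from contextlib import contextmanager

class LimitedConnectionSource:
    """Sketch of a maxConcurrentConnections-style cap: callers block until
    the number of outstanding connections drops below the limit. This is a
    hypothetical illustration, not the Waveset implementation (in Waveset,
    a value of 0 means unlimited; here a positive limit is required)."""

    def __init__(self, max_concurrent):
        self._sem = threading.Semaphore(max_concurrent)

    @contextmanager
    def connection(self):
        self._sem.acquire()        # waits if the limit is reached
        try:
            yield object()         # stand-in for a real JDBC connection
        finally:
            self._sem.release()    # released on success or failure

src = LimitedConnectionSource(2)
with src.connection():
    with src.connection():
        # A third concurrent request would block here until one of the
        # two outstanding connections is released.
        pass
print("ok")
```

This mirrors the repository behavior described above: calls beyond the limit do not fail, they wait, so the repository can appear to slow down or stall while staying within the configured connection count.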
To configure the maxConcurrentConnections value, edit the RepositoryConfiguration object from the System Settings debug page (http://host:port/idm/debug/session.jsp) as follows:
Select Configuration from the menu located next to the List Objects button.
Click the List Objects button to view all of the Configuration-type objects.
Locate RepositoryConfiguration and click Edit.
When the Checkout Object: Configuration, #ID#REPOSITORYCONFIGURATION page displays (similar to the following example), locate maxConcurrentConnections and modify the existing value as needed.
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE Configuration PUBLIC 'waveset.dtd' 'waveset.dtd'>
<!-- MemberObjectGroups="#ID#Top" extensionClass="RepositoryConfiguration"
     id="#ID#RepositoryConfiguration" name="RepositoryConfiguration"-->
<Configuration id='#ID#RepositoryConfiguration' name='RepositoryConfiguration'
    lock='Configurator#1251229174812' creator='getRepositoryConfiguration(boolean)'
    createDate='1249052436640' repoMod='1249048790203'
    type='RepositoryConfiguration'>
  <Extension>
    <RepositoryConfiguration
        instanceOf='com.waveset.repository.RepositoryConfiguration'
        lockTimeoutMillis='300000'
        maxConcurrentConnections='0'
        blockRowsGet='1000'
        blockRowsList='10000'
        maxAttrValLength='255'
        maxLogAcctAttrChangesLength='4000'
        maxLogMessageLength='255'
        maxLogXmlLength='2147483647'
        maxLogAcctAttrValueLength='128'
        maxLogParmValueLength='128'
        maxXmlLength='2147483647'
        maxSummaryLength='2048'
        optimizeReplaceAttributes='true'
        maxInList='1000'
        maxDelSet='1000'
        mcDBCall='10'
        mcDeleteAttrVal='3'
        mcInsertAttrVal='15'
        mcUpdateAttrVal='20'
        connectionPoolDisable='false'
        changeScanInterval='5000'
        changePropagationDelay='3000'
        changeAgeOut='60000'>
      <RelocatedTypes>
        <TypeDataStore typeName='User' deleteDestroyInterval='1'
            attr1='MemberObjectGroups' attr2='lastname' attr3='firstname'>
        </TypeDataStore>
      </RelocatedTypes>
    </RepositoryConfiguration>
  </Extension>
  <MemberObjectGroups>
    <ObjectRef type='ObjectGroup' id='#ID#Top' name='Top'/>
  </MemberObjectGroups>
</Configuration>
Save your changes.
For more information about which object types are stored in each set of tables, see Data Classes.
JDBC supports execution of two types of SQL statements:
Statement
PreparedStatement
Generally, a PreparedStatement is more efficient for the DBMS because the DBMS server can cache and reuse a PreparedStatement for multiple executions, avoiding the cost of SQL parsing and optimizer selections. To get the benefit of this caching, Waveset associates a PreparedStatement with a JDBC connection. When Waveset is going to issue the same SQL statement multiple times using the same connection, it automatically uses a PreparedStatement.
However, most of the JDBC operations performed by Waveset do not share a connection between JDBC calls. Most JDBC operations are the result of the application server processing an HTTP request through a JSP, so that the processing thread acquires a JDBC connection when needed and releases that connection when it is done. Consequently, you must do some additional work to get the benefits of using PreparedStatements.
Many application servers provide DataSource implementations that provide both connection pooling and implicit statement caching. Implicit statement caching is the ability of the DataSource or connection pool to implicitly cache JDBC statements and associate them with the connection.
If you configure Waveset to use an application server DataSource, you must have both of these features to benefit from using Waveset's preferPreparedStatement repository option. If you are not using an application server DataSource, if the DataSource is not configured to use a connection pool, or if the connection pool does not support implicit statement caching, using preferPreparedStatement causes decreased performance because the DBMS cannot reuse the statement.
Waveset code is not structured to associate a statement with a connection, except in a few circumstances, so Waveset depends on the DataSource or connection pool to perform this caching. If you configure the Waveset repository to use a DataSource or connection pool that performs implicit statement caching, and you enable preferPreparedStatement=true in the RepositoryConfiguration configuration object, Waveset can use PreparedStatement for object lock/unlock operations and most get operations, which potentially results in performance improvements. The Waveset ObjectRepository has a JMX MBean that indicates the setting of the preferPreparedStatement switch and the number of Statement/PreparedStatement requests made by the Waveset server.
You can see the current preferPreparedStatement setting in the ObjectRepository JMX MBean.
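Implicit statement caching can be sketched as a per-connection cache keyed by SQL text, so repeated executions of the same statement skip the parse/plan step. This is a hypothetical illustration of the concept, not a real driver (Python's sqlite3 module happens to do something similar internally via its cached_statements setting):

```python
import sqlite3

class CachingConnection:
    """Toy sketch of implicit statement caching: a per-connection cache
    keyed by SQL text. Names here are illustrative, not a real JDBC or
    DataSource API."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._cache = set()
        self.prepares = 0   # counts how often a "prepare" would occur

    def execute(self, sql, params=()):
        if sql not in self._cache:
            # A real driver would parse and plan the statement here.
            self.prepares += 1
            self._cache.add(sql)
        return self._conn.execute(sql, params)

c = CachingConnection()
c.execute("CREATE TABLE t (id INTEGER)")
for i in range(5):
    c.execute("INSERT INTO t VALUES (?)", (i,))
print(c.prepares)  # -> 2 (one per distinct SQL text, not one per execution)
```

Without the cache, the five inserts would each pay the prepare cost; with it, only the first does. That per-connection association is exactly what Waveset relies on the DataSource or connection pool to provide when preferPreparedStatement is enabled.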
This section describes some general guidelines for tuning a repository database:
Update statistics frequently to monitor what is happening with the repository database.
Defragment indexes and tables more frequently.
Report top queries.
Preallocate the user table to be sure it is large enough.
Estimate how many users you have and make the table big enough to accommodate the data. If you simply allow the table to extend itself, loading users can become a very slow process: periodically, the load stalls while the table doubles in size, reorganizes itself, and reinserts the data.
Similarly, preallocate the accounts table to be sure it is large enough.
Estimate how many accounts you have per user and make the table large enough to accommodate the data.
Oracle's Professional Services group and Partners have estimator tools that you can use to estimate user table and accounts table sizes.
Periodically run Waveset's built-in Audit Log Maintenance Task and System Log Maintenance Task to configure log record expirations. Log records can grow without bound, so use these tasks to prevent the repository database from running out of space. For information, see the Oracle Waveset 8.1.1 Business Administrator’s Guide.
Keep in mind that the task tables are volatile. DBAs should consider defragmenting the database more frequently, consider what type of storage to use, and leave plenty of free space in the database.
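The effect of the log maintenance tasks above amounts to an age-based purge, which can be sketched as follows. The table layout and the 90-day retention value are hypothetical; in Waveset, expirations are configured through the maintenance tasks themselves:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
# Hypothetical audit-log table; the real LOG schema differs and the real
# purge runs as Waveset's Audit Log Maintenance Task.
conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, created INTEGER)")
now = int(time.time())
day = 86400
conn.executemany("INSERT INTO log (created) VALUES (?)",
                 [(now - 400 * day,),   # well past retention
                  (now - 10 * day,),    # recent
                  (now,)])              # current

retention_days = 90                     # illustrative retention window
cutoff = now - retention_days * day
deleted = conn.execute(
    "DELETE FROM log WHERE created < ?", (cutoff,)).rowcount
print(deleted)  # -> 1 (only the 400-day-old record is purged)
```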
This section describes some vendor-specific guidelines for tuning Oracle and SQL Server repository databases.
Currently, MySQL databases are only supported in development and for demonstrations.
This section describes guidelines for tuning Oracle repository databases:
The Waveset application does not require any optional Oracle database features or options.
If you are using an Oracle repository database and Oracle Waveset Service Provider or Waveset, you might encounter problems with object table fragmentation because Waveset uses LONG, rather than LOB, data types by default. Using LONG data types can result in large amounts of “unallocated” extent space, which cannot be made into usable space.
To mitigate this problem, do the following:
Take EXPORT dumps of the Object table and re-import them to free up unallocated extent space. After importing, you must stop and restart the database.
Use LOB data types and DataDirect Technologies’ Merant drivers, which provide a standard LOB implementation for Oracle.
Use Locally Managed Tablespaces (LMTs), which offer automatic free space management. LMTs are available in Oracle 8.1.5 and later.
Waveset does not require specific Oracle init.ora parameter settings for SGA sizing, buffer sizing, open cursors, processes, and so forth.
While the Waveset repository is hosted in a general-purpose database, its workload is best described as that of an object database.
Of the Waveset tables, the TASK table-set comes closest to having transaction-processing characteristics. The LOG and SYSLOG table-sets are also exceptional because these tables do not store serialized objects.
See Repository Table Types and Data Classes for descriptions of the tables, the object types stored in each table, and the general access pattern for each table.
If you have performance issues with the Oracle database, check for issues related to poor query plans being chosen for what Waveset expects to be relatively efficient queries.
For example, the optimizer might choose a full table scan even when a suitable index is available for use. These issues are often visible in Automatic Workload Repository (AWR) reports, in the SQL ordered by Gets (buffer gets) section. You can also view these issues in the Enterprise Manager tool.
Performance problems typically appear to be the result of bad or missing database table statistics. Addressing this problem improves performance for both the database and Waveset.
The following article (available from Oracle) is a good source of information about the cost-based optimizer (CBO) in Oracle:
Oracle MetaLink: Note:114671.1: Gathering Statistics for the Cost Based Optimizer
You might also investigate using SQL Profiles, which are another method for choosing the best query plans. You can use the SQL Advisor within Enterprise Manager to create these profiles when you identify poorly performing SQL.
If you detect unexpected growth in the Oracle redo log, you might have workflows that are caught in an infinite loop with a manual action. The loop causes constant updates to the repository, which in turn causes the size of each TaskInstance to grow substantially. The workflow errors are caused by improper handling of WF_ACTION_TIMEOUT and by users closing their browser in the middle of a workflow.
To prevent problematic workflows, preview each manual action before a production launch and verify the following:
Have you set a timeout?
Have you created appropriate transition logic to handle a timeout for the activity with the manual action?
Is the manual action using the exposed variables tag when there is a large amount of data in the TaskInstance?
Frequently, you can significantly improve Waveset performance if you change the CURSOR_SHARING parameter value from EXACT to SIMILAR.
Waveset uses prepared statements for some activities (such as inserting and updating database rows), but does not use these statements for most queries.
When you use Oracle, this behavior can cause issues with the library cache. In particular, the large number of statement versions can create contention on the library cache latch. Changing CURSOR_SHARING to SIMILAR causes Oracle to replace literals in SQL statements with bind variables, which greatly reduces the number of versions.
See Prepared Statements for more information.
Some customers who used an SQL Server 2000 database as a repository reported that as concurrency increased, SQL Server 2000 reported deadlocking problems that were related to SQL Server’s internal use of pessimistic locking (primarily lock escalation).
These deadlock errors display in the following format:
com.waveset.util.IOException:
==> com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 51)
was deadlocked on lock | communication buffer resources with another process
and has been chosen as the deadlock victim. Rerun the transaction.
To prevent or address deadlocking problems, do the following:
Use the SQL Server 2005 database.
Configure the READ_COMMITTED_SNAPSHOT parameter by formatting the command as follows:
ALTER DATABASE waveset SET READ_COMMITTED_SNAPSHOT ON
Enabling the READ_COMMITTED_SNAPSHOT parameter does the following:
Removes contention during the execution of SELECT statements that can cause blocks, which greatly reduces the potential for deadlocks internal to SQL Server.
Prevents uncommitted data from being read and guarantees that SELECT statements receive a consistent view of committed data.
For more information about the READ_COMMITTED_SNAPSHOT parameter, see http://msdn.microsoft.com/en-us/library/ms188277.aspx.