Deployment Guide

Tuning Liquid Data Performance

This chapter describes how to tune performance in a BEA Liquid Data for WebLogic deployment. It includes the following sections:

In a production environment, Liquid Data performance is generally measured by the speed with which queries are processed and results are returned. This topic describes general guidelines for performance and, where possible, provides specific guidelines for tuning a Liquid Data deployment.

Where to Begin

This topic describes where to begin tuning a Liquid Data deployment. It contains the following sections:

Checking Your System Configuration

Before you begin to investigate how to tune Liquid Data performance, make sure that the system on which Liquid Data runs is reasonably configured.

System platform configuration. Do you have sufficient CPU, memory, and disk space resources? A properly tuned heap size? The right Java Virtual Machine (JVM)? Reasonable network speed? For more information, see Platform Performance Factors.
Data sources configuration. Are data sources properly tuned? For example, if the deployment uses relational database data sources, do the tables in those data sources have adequate indexes? Is the JDBC connection pool size reasonably configured? If the deployment uses Web Services data sources, is the Web Service available and does it provide a reasonable response time? For more information, see Data Source Performance Factors.

Tuning Queries

In addition to the system configuration, Liquid Data performance is greatly affected by the query design, as described in Query Performance Factors. Therefore, make sure that the queries running on the Liquid Data server are reasonably designed:

Does each query follow the appropriate query design pattern, as described in "Design Patterns" in Designing Queries in Building Queries and Data Views?
Does each query provide the appropriate hints, as described in Optimizing Queries in Building Queries and Data Views?
Does each query have the appropriate scope, as described in "Understanding Scope in Basic and Advanced Views" in Designing Queries in Building Queries and Data Views?
Does any query produce a large intermediate result set or final result set? For more information, see Query Performance Factors.
Have you debugged the query using a verbose logging mode (info) and reviewed the trace? For more information, see Monitoring Liquid Data Performance.

Liquid Data Performance Factors

Many factors influence overall Liquid Data performance. Certain factors, such as query design, are within Liquid Data's scope, while other factors, such as data source processing speed, are outside Liquid Data's area. This topic identifies the main factors that can affect Liquid Data performance. It includes the following sections:

Query Performance Factors

Liquid Data performance depends on the way queries are designed and configured for execution. The following query factors affect Liquid Data performance:

Table 5-1 Query Performance Factors

Factor	Description
Query complexity	Some query operations are more resource intensive than others. Simple queries perform basic operations, such as retrieving a customer list or an employee's profile. Analytical queries, such as complex joins or aggregates, perform interim processing steps that produce intermediate result sets. Recommendations: Because analytical queries generally consume more memory and CPU resources than simple queries, see the sections later in this table on caching and large result sets.
Query type	The type of query (stored or ad hoc) affects performance: For stored queries, the query is compiled only once and query execution plan is cached. In addition, the query result can be cached for faster retrieval if the query is executed subsequently with the same parameters. For more information, see Configuring the Query Results Cache in the Liquid Data Administration Guide. For ad hoc queries, the query execution plans are compiled once and cached, but the query results are not cached. For certain queries, the time to compile the query might take much longer then the time to execute. Recommendation: Use stored queries in a production system.
Design pattern used (query hints)	When building queries, the appropriate design patterns must be used to ensure the fastest possible execution speed. For example, for joins, you should use a query hint to supply as much information to the execution engine as possible about the amount of data to search when processing a query. Recommendation: Use hints, if applicable. Hints are particularly significant with large data sources. If you are using the Data View Builder, design patterns for target schema are also very important. For more information, see "Understanding Query Design Patterns" in Designing Queries and Optimizing Queries in Building Queries and Data Views.
Caching	Caching improves performance for stored queries. Liquid Data supports two types of caching: Query plan cache, which is always enabled, allows Liquid Data to retrieve the query plan for a repeated query from cache rather than regenerating the query plan. Query results cache allows Liquid Data to retrieve the results for a repeated query from in-memory cache (rather than running the query against the data source each time). Stored query results that are configured to expire sooner must be executed against the data source(s) more frequently. Recommendations: Always use stored queries in a production system. Use query results caching, if applicable. For a complex query, or for a query that retrieves data from slower data sources, caching the query result provides a substantial performance gain. Once the query is cached, the query execution time is the fixed cost of retrieving the results cache from the database plus returning it as an XML document. For more information, see Configuring the Query Results Cache in the Liquid Data Administration Guide.
Size of intermediate or final results returned and memory usage	Generally the larger the result size (final or intermediate results), the longer it takes to retrieve and process the results. Recommendations: Queries should be designed to retrieve only the required results. Restrict the number of elements in the target schema (project). Specify strict conditions that limit the number of rows returned in the result set. Increase the heap size to as large as the system can allow. In general, increasing the heap size increases performance. If increasing the heap size solves the out-of-memory problem, then no further action is required. For relational database data sources, use the `{--! merge !-- }` query hint, if applicable. The effect is similar to a database merge join. Using this technique has a space/performance trade-off: a merge hint provides better memory usage but it might also slow performance if extra sorting is required to perform the merge join. For queries that perform joins on multiple data sources, specify the sequence of data sources in ascending order by increasing size: the smallest resource should appear in the first FOR loop, and the largest resource should appear in the last one. For more information, see "Source Order Optimization" in Optimizing Queries in Building Queries and Data Views. Use the disk swapping option, which allows Liquid Data to store intermediate results on disk when the results exceed the `MEM_SORT_BUF` size specified in the server startup script (`startWebLogic.cmd` on Windows or `startWebLogic.sh` on Unix). The default setting for `MEM_SORT_BUF` size is 50MB. The recommended size is less than one third (1/3) of the maximum memory size defined in `MEM_ARGS` in the server startup script. To configure disk swapping, see Configuring Liquid Data Server Settings in the Administration Guide. Note: Liquid Data queries do not retrieve binary large object (BLOB) data from relational databases. See Supported Data Types in XQuery Reference Guide.
Number of concurrent queries	The higher the number of concurrent queries, the slower the performance, particularly during peak loads. Performance improves through the use of additional CPUs and WebLogic Server clusters, as described in Clustered Deployments, and with tuning the thread pool, as described in Using the WebLogic Administration Console to Monitor Performance.

Data Source Performance Factors

In general, Liquid Data performance depends on the speed at which the data source host system is able to process query requests and return results.

Performance Factors for All Data Sources

The following data source factors affect Liquid Data performance:

Table 5-2 Data Source Performance Factors

Factor	Description
Data source type	Some types of data sources offer higher performance (such as relational databases) than other types (such as application integrations or Web Services). For more information, see Table 5-3.
Data source size	The size of the data source affects performance. In general, the larger the data source, the longer it takes to retrieve the query results. For example, a large XML document takes longer to process than a small XML document. For relational databases, indexing can very substantially improve performance, particularly for large databases.
Number of data sources	For queries that access multiple data sources, data is retrieved from each data source in sequence, one data source at a time. This is true for all data source types except application views, Web Services, and custom functions (which are processed asynchronously). For application views, Web Services, and custom functions you can configure the maximum number of connections or the maximum number of concurrent threads to be used. If queries use Web Services, application views, or custom functions extensively, then consider tuning this setting.
Data source performance and availability	A query fails if a required data source is unavailable during query execution due to server failure, insufficient available connections, failed authentication, or other factors. Performance delays result if the server hosting the data source is congested. Faster hardware (storage, memory, and CPU throughput) for the data source host machine generally provides higher performance, particularly for larger data sources. Except for any XML files stored locally on the host system, all data sources are remote. Therefore, network connection availability and speed affects how quickly the query results are returned. In addition to network capacity and throughput speeds, the number of hops between nodes can greatly affect performance. Secure connections, such as SSL (Secure Sockets Layer) increase security but slow performance. Network speed is not a factor with local data sources, such as XML files stored locally on the host system. However, network speed is a factor with XML files stored remotely.
Transaction isolation level (relational databases only)	For relational databases, the transaction isolation level setting can affect query performance: For databases containing static or read-only data, in general, use the default setting (`TRANSACTION_READ_COMMITTED`). For databases containing dynamic or updateable data, use the transaction isolation level that supports the degree of concurrency required. However, increased concurrency can result in slower query performance. Using a transaction level of `TRANSACTION_SERIALIZABLE`, the highest level of concurrency, is more likely to reduce performance than using a lower level of concurrency, such as `TRANSACTION_REPEATABLE_READ` or `TRANSACTION_READ_COMMITTED`. For more information about configuring the transaction isolation level for a relational database, see "Creating a Relational Database Data Source Description" in Configuring Access to Relational Databases in the Liquid Data Administration Guide.

Performance Factors for Data Source Types

The following table describes the most important performance factors for each supported data source type:

Table 5-3 Performance Factors for Data Source Types

Type	Important Performance Factor(s)
Relational databases	A fast host machine is especially important for large databases. An optimized database design and a highly tuned configuration substantially improves performance. For example, indexing improves performance, particularly for large databases. For more information, see your database vendor's documentation. The JDBC driver configuration can affect query performance, and JDBC connection pool settings must be properly tuned. For more information, see JDBC in the BEA WebLogic Server Administration Console Online Help. The JDBC transaction isolation level can affect query performance. For more information, see the discussion of transaction isolation levels in Table 5-2.
XML files	Size of file in generally predictive of performance.
Web Services	Query performance depends primarily on the Web Service implementation and the speed at which the Web Service returns results, as well as the network connection speed. Liquid Data runs query requests on Web Services asynchronously. If a query uses multiple data sources, the query can continue processing while waiting for a response from a Web Service. For Web Services, therefore, performance also depends on the Maximum Threads setting specified on the General tab in the Liquid Data node in the Administration Console, as described in Configuring Liquid Data Server Settings in the Liquid Data Administration Guide.
Application views	Query performance depends on the performance of the EIS and its associated adapter. Liquid Data runs query requests on application views asynchronously. If a query uses multiple data sources, the query can continue processing while waiting for a response from an application view. For application views, therefore, performance also depends on the Maximum Threads setting specified on the General tab in the Liquid Data node in the WebLogic Administration Console, as described in Configuring Liquid Data Server Settings in the Liquid Data Administration Guide.
Data views	Complexity of the underlying query, as described in Query Performance Factors. The data source performance factors described in Performance Factors for All Data Sources.

Platform Performance Factors

This section describes performance factors associated with the Liquid Data server, including the host server hardware, clustering Liquid Data servers, tuning threads, tuning WebLogic Server, and tuning WebLogic Integration. The most important factor is running Liquid Data on a very fast server machine with the maximum amount of available memory. For general information about platform performance, see Tuning Hardware, Operating System, and Network Performance in BEA WebLogic Server Performance and Tuning.

This section describes the following platform performance factors:

General Platform Performance Factors

The following general performance factors are associated with a Liquid Data deployment:

Table 5-4 Server Hardware Performance Factors

Factor	Description
Network connection speed	For remote resources such as data sources, the speed and capacity of the network connection is an important factor. In addition to network capacity and throughput speeds, the number of hops between nodes can greatly affect performance. Secure connections, such as SSL (Secure Sockets Layer) increase security but slow performance.
Distribution of resources across servers	Performance is greatly affected by the way in which Liquid Data, WebLogic Server, and other WebLogic Platform resources are distributed across servers. For example: If a Liquid Data deployment uses application views as data sources, it is generally optimal to run Liquid Data and Application Integration on separate server machines, as described in Multi-Node Deployments. Liquid Data is built on the scalable WebLogic Server platform. In deployments that support a high volume of concurrent query requests, you can increase system performance by deploying on a WebLogic Server cluster—a group of WebLogic Servers that are managed as a single unit and distribute the load for processing query requests. For more information, see Clustered Deployments and Designing Deployments.

Factor

Description

Network connection speed

For remote resources such as data sources, the speed and capacity of the network connection is an important factor. In addition to network capacity and throughput speeds, the number of hops between nodes can greatly affect performance. Secure connections, such as SSL (Secure Sockets Layer) increase security but slow performance.

Distribution of resources across servers

Performance is greatly affected by the way in which Liquid Data, WebLogic Server, and other WebLogic Platform resources are distributed across servers. For example:

If a Liquid Data deployment uses application views as data sources, it is generally optimal to run Liquid Data and Application Integration on separate server machines, as described in Multi-Node Deployments.

Liquid Data is built on the scalable WebLogic Server platform. In deployments that support a high volume of concurrent query requests, you can increase system performance by deploying on a WebLogic Server cluster—a group of WebLogic Servers that are managed as a single unit and distribute the load for processing query requests. For more information, see Clustered Deployments and Designing Deployments.

WebLogic Server Performance Factors

Liquid Data performance is affected by WebLogic Server performance. The WebLogic Server documentation provides a detailed suggestions for monitoring and tuning run-time performance. For detailed information, see BEA WebLogic Server Performance and Tuning in the WebLogic Server documentation.

The following table provides a summary of WebLogic Server tuning factor:

Table 5-5 Summary of WebLogic Server Performance Factors

Component	Tunable Performance Factor(s)
Hardware Resources	WebLogic Server Platform Supported Configurations CPU—maximize speed and throughput Memory—maximize capacity and speed Network resources—maximum bandwidth and speed Disk I/O and controllers—maximize disk speed and capacity Number of machines—increase as needed
Operating System	Configurable file descriptor limits Memory allocation for user processes Configurable TCP tuning parameters Configurable settings for the threading model
Network Resources	Network hardware and software Network bandwidth LAN infrastructure
Java Virtual Machine (JVM)	JVM vendor and version JVM heap size and garbage collection Generational garbage collection Mixed client/server JVMs Unix threading models Just-in-time (JIT) JVMs
WebLogic Server	config.xml file parameters weblogic-ejb-jar.xml parameters WebLogic Server startup parameters Java compiler WebLogic Server clusters JDBC driver and JDBC connection pool
WebLogic Server Applications	Performance analysis tools JDBC application tuning Session persistence Minimizing sessions Execute queues and thread usage

Liquid Data Host Server Machine

Faster hardware (storage, memory, and CPU throughput), large capacity storage (for caching and disk swapping), and for the Liquid Data server host machine generally provides higher performance. The following performance factors are associated with the host server machine:

Table 5-6 Liquid Data Host Server Machine Performance Factors

Factor	Description
CPU utilization	Optimal utilization is up to 80%.
Storage utilization	Machine should have sufficient available workspace for disk swapping and other storage operations. For recommendations, see "Installation Prerequisites" in Preparing to Install WebLogic Server in the WebLogic Server Installation Guide.
Memory utilization	The default memory size is 256MB, but it is recommended that you add as much memory as the server machine can support. Memory size is one of the most significant factors affecting Liquid Data performance. If Liquid Data uses application views or Web Services as data sources, the Maximum Threads setting specified on the General tab in the Liquid Data node in the Administration Console determines the number of threads available for asynchronous requests to application views and Web Services. For more information, see Configuring Liquid Data Server Settings in the Liquid Data Administration Guide.
Thread pools	The pool size for execute threads should be optimized. Liquid Data uses WebLogic Server's sophisticated multi-threading capabilities to process concurrent query requests more efficiently. By default, WebLogic Server uses 15 threads for the thread pool size. For more information, see "Tuning WebLogic Server in WebLogic Server Performance and Tuning. You need to tune the thread pool size according to the characteristics of the query request load on the Liquid Data server. Too many or too few threads can impede performance. In general, increase the number of threads if CPU utilization for the workload is low, and decrease the number of threads if CPU utilization exceeds 80%. You may need to experiment by repeatedly changing the maximum number of threads until you determine the optimum setting for your deployment.

WebLogic Integration Performance Factors

If Liquid Data is deployed with WebLogic Integration, then WebLogic Integration performance might affect Liquid Data performance, depending on how the two components interact. The WebLogic Integration documentation provides a detailed suggestions for monitoring and tuning run-time performance. For detailed information, see Understanding WebLogic Integration Clusters in Deploying WebLogic Integration Solutions.

The following table provides a summary of tuning factors, which are described in detail in the WebLogic Integration documentation.

Table 5-7 Summary of WebLogic Integration Performance Factors

Component	Tunable Performance Factor(s)
Business Process Management	Event listener message-driven bean for event-driven workflows
Application Integration	application view bean for synchronous service invocations thread pool size of the asynchronous request processor for asynchronous service invocations J2EE-CA resource pool size for each adapter
B2B integration	There are no primary resources that can be tuned
WebLogic Server	EJB pool and cache sizes JDBC connection pool sizes Execution thread pool size Resource connection pools for J2EE Connector Architecture adapters Large message support for B2B
Java Virtual Machine (JVM)	JVM vendor and version JVM heap size Garbage collection control (Hotspot JVM)
Hardware Resources	CPU—maximize speed and throughput Memory—maximize capacity and speed Network resources—maximum bandwidth and speed Disk I/O and controllers—maximize disk speed and capacity Number of machines—increase as needed
Operating System	Configurable file descriptor limits Memory allocation for user processes Configurable TCP tuning parameters Configurable settings for the threading model
JDBC Databases	Number of concurrently opened cursors Disk I/O optimization Database sizing and organization of table spaces Checkpointing Database compatibility Database monitoring

Monitoring Liquid Data Performance

This section describes how to monitor Liquid Data performance. It includes the following sections:

For detailed information about monitoring and tuning resources for the Liquid Data server, see "Tuning WebLogic Server" in WebLogic Server Performance and Tuning.

Monitoring Guidelines

When monitoring Liquid Data performance, consider the following guidelines:

Try simulating workloads and monitor performance at different load levels—peaks, lulls, and mid-range volumes—to determine optimal overall settings.
At a minimum, monitor server memory, threads, and CPU usage at these various load levels.
Characterize the query request workload in the production environment, determining which queries are used and how often they are requested. For example, you might learn that five queries represent 80% of the workload and 10 queries represent the remaining 20%.
Determine whether this workload involves the execution of small queries many times or large queries a few times and configure the thread pool accordingly.
Determine whether clustering is appropriate for your deployment. For more information, see Clustered Deployments.
Review your domain's server log for certain performance information. Liquid Data records the time to generate the query plan, compile the query plan, and execute the query. For relational databases, Liquid Data also records the SQL statement it generated for the query.

Using the WebLogic Administration Console to Monitor Performance

You can use the WebLogic Administration Console to monitor performance on a Liquid Data server, including the following areas:

Active execute queues—As work enters a WebLogic Server, it is placed in an execute queue and then assigned to a thread in which the work is processed. The size of the execute pool determines the number of threads that can be running concurrently on an execute queue. Threads consume resources, so it is important that you tune the number of threads to optimize performance, taking care not to slow performance by increasing the value unnecessarily. By default, WebLogic Server uses 15 threads for the thread pool size.
JDBC connection pools—If queries in your deployment use relational databases for data sources, you should monitor any active JDBC connection pools for performance. A JDBC connection pool contains a group of JDBC connections. At run-time, when a query that uses an RDBMS data source is executed, the query borrows a connection from the connection pool, uses it, then returns it to the connection pool by closing it. The size of the JDBC connection pool determines the number of JDBC connections that can be used concurrently. Connections consume resources, so it is important that you tune the number of connections to optimize performance, taking care not to slow performance by increasing the value unnecessarily.

For detailed information about using the WebLogic Administration Console to monitor server performance, see "Top Tuning Recommendations for WebLogic Server" in BEA WebLogic Server Performance and Tuning.