Deployment Guide
Tuning Liquid Data Performance
This chapter describes how to tune performance in a BEA Liquid Data for WebLogic deployment. It includes the following sections:
In a production environment, Liquid Data performance is generally measured by the speed with which queries are processed and results are returned. This topic describes general guidelines for performance and, where possible, provides specific guidelines for tuning a Liquid Data deployment.
Where to Begin
This topic describes where to begin tuning a Liquid Data deployment. It contains the following sections:
Checking Your System Configuration
Before you begin to investigate how to tune Liquid Data performance, make sure that the system on which Liquid Data runs is reasonably configured.
- System platform configuration. Do you have sufficient CPU, memory, and disk space resources? A properly tuned heap size? The right Java Virtual Machine (JVM)? Reasonable network speed? For more information, see Platform Performance Factors.
- Data sources configuration. Are data sources properly tuned? For example, if the deployment uses relational database data sources, do the tables in those data sources have adequate indexes? Is the JDBC connection pool size reasonably configured? If the deployment uses Web Services data sources, is the Web Service available and does it provide a reasonable response time? For more information, see Data Source Performance Factors.
Tuning Queries
In addition to the system configuration, Liquid Data performance is greatly affected by the query design, as described in Query Performance Factors. Therefore, make sure that the queries running on the Liquid Data server are reasonably designed:
- Does each query follow the appropriate query design pattern, as described in "Design Patterns" in Designing Queries in Building Queries and Data Views?
- Does each query provide the appropriate hints, as described in Optimizing Queries in Building Queries and Data Views?
- Does each query have the appropriate scope, as described in "Understanding Scope in Basic and Advanced Views" in Designing Queries in Building Queries and Data Views?
- Does any query produce a large intermediate result set or final result set? For more information, see Query Performance Factors.
- Have you debugged the query using a verbose logging mode (info) and reviewed the trace? For more information, see Monitoring Liquid Data Performance.
Liquid Data Performance Factors
Many factors influence overall Liquid Data performance. Certain factors, such as query design, are within Liquid Data's scope, while other factors, such as data source processing speed, are outside Liquid Data's area. This topic identifies the main factors that can affect Liquid Data performance. It includes the following sections:
Query Performance Factors
Liquid Data performance depends on the way queries are designed and configured for execution. The following query factors affect Liquid Data performance:
Table 5-1 Query Performance Factors
Factor
|
Description
|
Query complexity
|
Some query operations are more resource intensive than others.
Recommendations: Because analytical queries generally consume more memory and CPU resources than simple queries, see the sections later in this table on caching and large result sets.
|
Query type
|
The type of query (stored or ad hoc) affects performance:
For stored queries, the query is compiled only once and query execution plan is cached. In addition, the query result can be cached for faster retrieval if the query is executed subsequently with the same parameters. For more information, see Configuring the Query Results Cache in the Liquid Data Administration Guide.
For certain queries, the time to compile the query might take much longer then the time to execute.
Recommendation: Use stored queries in a production system.
|
Design pattern used (query hints)
|
When building queries, the appropriate design patterns must be used to ensure the fastest possible execution speed. For example, for joins, you should use a query hint to supply as much information to the execution engine as possible about the amount of data to search when processing a query.
Recommendation:
Use hints, if applicable. Hints are particularly significant with large data sources. If you are using the Data View Builder, design patterns for target schema are also very important. For more information, see "Understanding Query Design Patterns" in Designing Queries and Optimizing Queries in Building Queries and Data Views.
|
Caching
|
Caching improves performance for stored queries. Liquid Data supports two types of caching:
Recommendations:
Use query results caching, if applicable. For a complex query, or for a query that retrieves data from slower data sources, caching the query result provides a substantial performance gain. Once the query is cached, the query execution time is the fixed cost of retrieving the results cache from the database plus returning it as an XML document. For more information, see Configuring the Query Results Cache in the Liquid Data Administration Guide.
|
Size of intermediate or final results returned and memory usage
|
Generally the larger the result size (final or intermediate results), the longer it takes to retrieve and process the results.
Recommendations:
Increase the heap size to as large as the system can allow. In general, increasing the heap size increases performance. If increasing the heap size solves the out-of-memory problem, then no further action is required.
For relational database data sources, use the {--! merge !-- } query hint, if applicable. The effect is similar to a database merge join. Using this technique has a space/performance trade-off: a merge hint provides better memory usage but it might also slow performance if extra sorting is required to perform the merge join.
For queries that perform joins on multiple data sources, specify the sequence of data sources in ascending order by increasing size: the smallest resource should appear in the first FOR loop, and the largest resource should appear in the last one. For more information, see "Source Order Optimization" in Optimizing Queries in Building Queries and Data Views.
Use the disk swapping option, which allows Liquid Data to store intermediate results on disk when the results exceed the MEM_SORT_BUF size specified in the server startup script (startWebLogic.cmd on Windows or startWebLogic.sh on Unix). The default setting for MEM_SORT_BUF size is 50MB. The recommended size is less than one third (1/3) of the maximum memory size defined in MEM_ARGS in the server startup script. To configure disk swapping, see Configuring Liquid Data Server Settings in the Administration Guide.
Note: Liquid Data queries do not retrieve binary large object (BLOB) data from relational databases. See Supported Data Types in XQuery Reference Guide.
|
Number of concurrent queries
|
The higher the number of concurrent queries, the slower the performance, particularly during peak loads. Performance improves through the use of additional CPUs and WebLogic Server clusters, as described in Clustered Deployments, and with tuning the thread pool, as described in Using the WebLogic Administration Console to Monitor Performance.
|
Data Source Performance Factors
In general, Liquid Data performance depends on the speed at which the data source host system is able to process query requests and return results.
Performance Factors for All Data Sources
The following data source factors affect Liquid Data performance:
Table 5-2 Data Source Performance Factors
Factor
|
Description
|
Data source type
|
Some types of data sources offer higher performance (such as relational databases) than other types (such as application integrations or Web Services). For more information, see Table 5-3.
|
Data source size
|
The size of the data source affects performance. In general, the larger the data source, the longer it takes to retrieve the query results. For example, a large XML document takes longer to process than a small XML document. For relational databases, indexing can very substantially improve performance, particularly for large databases.
|
Number of data sources
|
For queries that access multiple data sources, data is retrieved from each data source in sequence, one data source at a time. This is true for all data source types except application views, Web Services, and custom functions (which are processed asynchronously).
For application views, Web Services, and custom functions you can configure the maximum number of connections or the maximum number of concurrent threads to be used. If queries use Web Services, application views, or custom functions extensively, then consider tuning this setting.
|
Data source performance and availability
|
A query fails if a required data source is unavailable during query execution due to server failure, insufficient available connections, failed authentication, or other factors.
Faster hardware (storage, memory, and CPU throughput) for the data source host machine generally provides higher performance, particularly for larger data sources.
Except for any XML files stored locally on the host system, all data sources are remote. Therefore, network connection availability and speed affects how quickly the query results are returned. In addition to network capacity and throughput speeds, the number of hops between nodes can greatly affect performance. Secure connections, such as SSL (Secure Sockets Layer) increase security but slow performance. Network speed is not a factor with local data sources, such as XML files stored locally on the host system. However, network speed is a factor with XML files stored remotely.
|
Transaction isolation level (relational databases only)
|
For relational databases, the transaction isolation level setting can affect query performance:
For databases containing dynamic or updateable data, use the transaction isolation level that supports the degree of concurrency required. However, increased concurrency can result in slower query performance. Using a transaction level of TRANSACTION_SERIALIZABLE , the highest level of concurrency, is more likely to reduce performance than using a lower level of concurrency, such as TRANSACTION_REPEATABLE_READ or TRANSACTION_READ_COMMITTED .
For more information about configuring the transaction isolation level for a relational database, see "Creating a Relational Database Data Source Description" in Configuring Access to Relational Databases in the Liquid Data Administration Guide.
|
Performance Factors for Data Source Types
The following table describes the most important performance factors for each supported data source type:
Table 5-3 Performance Factors for Data Source Types
Type
|
Important Performance Factor(s)
|
Relational databases
|
An optimized database design and a highly tuned configuration substantially improves performance. For example, indexing improves performance, particularly for large databases. For more information, see your database vendor's documentation.
The JDBC driver configuration can affect query performance, and JDBC connection pool settings must be properly tuned. For more information, see JDBC in the BEA WebLogic Server Administration Console Online Help.
|
XML files
|
|
Web Services
|
Liquid Data runs query requests on Web Services asynchronously. If a query uses multiple data sources, the query can continue processing while waiting for a response from a Web Service. For Web Services, therefore, performance also depends on the Maximum Threads setting specified on the General tab in the Liquid Data node in the Administration Console, as described in Configuring Liquid Data Server Settings in the Liquid Data Administration Guide.
|
Application views
|
Liquid Data runs query requests on application views asynchronously. If a query uses multiple data sources, the query can continue processing while waiting for a response from an application view. For application views, therefore, performance also depends on the Maximum Threads setting specified on the General tab in the Liquid Data node in the WebLogic Administration Console, as described in Configuring Liquid Data Server Settings in the Liquid Data Administration Guide.
|
Data views
|
|
Platform Performance Factors
This section describes performance factors associated with the Liquid Data server, including the host server hardware, clustering Liquid Data servers, tuning threads, tuning WebLogic Server, and tuning WebLogic Integration. The most important factor is running Liquid Data on a very fast server machine with the maximum amount of available memory. For general information about platform performance, see Tuning Hardware, Operating System, and Network Performance in BEA WebLogic Server Performance and Tuning.
This section describes the following platform performance factors:
General Platform Performance Factors
The following general performance factors are associated with a Liquid Data deployment:
Table 5-4 Server Hardware Performance Factors
Factor
|
Description
|
Network connection speed
|
For remote resources such as data sources, the speed and capacity of the network connection is an important factor. In addition to network capacity and throughput speeds, the number of hops between nodes can greatly affect performance. Secure connections, such as SSL (Secure Sockets Layer) increase security but slow performance.
|
Distribution of resources across servers
|
Performance is greatly affected by the way in which Liquid Data, WebLogic Server, and other WebLogic Platform resources are distributed across servers. For example:
If a Liquid Data deployment uses application views as data sources, it is generally optimal to run Liquid Data and Application Integration on separate server machines, as described in Multi-Node Deployments.
Liquid Data is built on the scalable WebLogic Server platform. In deployments that support a high volume of concurrent query requests, you can increase system performance by deploying on a WebLogic Server cluster—a group of WebLogic Servers that are managed as a single unit and distribute the load for processing query requests. For more information, see Clustered Deployments and Designing Deployments.
|
WebLogic Server Performance Factors
Liquid Data performance is affected by WebLogic Server performance. The WebLogic Server documentation provides a detailed suggestions for monitoring and tuning run-time performance. For detailed information, see BEA WebLogic Server Performance and Tuning in the WebLogic Server documentation.
The following table provides a summary of WebLogic Server tuning factor:
Table 5-5 Summary of WebLogic Server Performance Factors
Component
|
Tunable Performance Factor(s)
|
Hardware Resources
|
|
Operating System
|
|
Network Resources
|
|
Java Virtual Machine (JVM)
|
|
WebLogic Server
|
|
WebLogic Server Applications
|
|
Liquid Data Host Server Machine
Faster hardware (storage, memory, and CPU throughput), large capacity storage (for caching and disk swapping), and for the Liquid Data server host machine generally provides higher performance. The following performance factors are associated with the host server machine:
Table 5-6 Liquid Data Host Server Machine Performance Factors
Factor
|
Description
|
CPU utilization
|
Optimal utilization is up to 80%.
|
Storage utilization
|
Machine should have sufficient available workspace for disk swapping and other storage operations. For recommendations, see "Installation Prerequisites" in Preparing to Install WebLogic Server in the WebLogic Server Installation Guide.
|
Memory utilization
|
If Liquid Data uses application views or Web Services as data sources, the Maximum Threads setting specified on the General tab in the Liquid Data node in the Administration Console determines the number of threads available for asynchronous requests to application views and Web Services. For more information, see Configuring Liquid Data Server Settings in the Liquid Data Administration Guide.
|
Thread pools
|
The pool size for execute threads should be optimized. Liquid Data uses WebLogic Server's sophisticated multi-threading capabilities to process concurrent query requests more efficiently. By default, WebLogic Server uses 15 threads for the thread pool size. For more information, see "Tuning WebLogic Server in WebLogic Server Performance and Tuning.
You need to tune the thread pool size according to the characteristics of the query request load on the Liquid Data server. Too many or too few threads can impede performance. In general, increase the number of threads if CPU utilization for the workload is low, and decrease the number of threads if CPU utilization exceeds 80%. You may need to experiment by repeatedly changing the maximum number of threads until you determine the optimum setting for your deployment.
|
WebLogic Integration Performance Factors
If Liquid Data is deployed with WebLogic Integration, then WebLogic Integration performance might affect Liquid Data performance, depending on how the two components interact. The WebLogic Integration documentation provides a detailed suggestions for monitoring and tuning run-time performance. For detailed information, see Understanding WebLogic Integration Clusters in Deploying WebLogic Integration Solutions.
The following table provides a summary of tuning factors, which are described in detail in the WebLogic Integration documentation.
Table 5-7 Summary of WebLogic Integration Performance Factors
Component
|
Tunable Performance Factor(s)
|
Business Process Management
|
|
Application Integration
|
|
B2B integration
|
There are no primary resources that can be tuned
|
WebLogic Server
|
|
Java Virtual Machine (JVM)
|
|
Hardware Resources
|
|
Operating System
|
|
JDBC Databases
|
|
Monitoring Liquid Data Performance
This section describes how to monitor Liquid Data performance. It includes the following sections:
For detailed information about monitoring and tuning resources for the Liquid Data server, see "Tuning WebLogic Server" in WebLogic Server Performance and Tuning.
Monitoring Guidelines
When monitoring Liquid Data performance, consider the following guidelines:
- Try simulating workloads and monitor performance at different load levels—peaks, lulls, and mid-range volumes—to determine optimal overall settings.
- At a minimum, monitor server memory, threads, and CPU usage at these various load levels.
- Characterize the query request workload in the production environment, determining which queries are used and how often they are requested. For example, you might learn that five queries represent 80% of the workload and 10 queries represent the remaining 20%.
- Determine whether this workload involves the execution of small queries many times or large queries a few times and configure the thread pool accordingly.
- Determine whether clustering is appropriate for your deployment. For more information, see Clustered Deployments.
- Review your domain's server log for certain performance information. Liquid Data records the time to generate the query plan, compile the query plan, and execute the query. For relational databases, Liquid Data also records the SQL statement it generated for the query.
Using the WebLogic Administration Console to Monitor Performance
You can use the WebLogic Administration Console to monitor performance on a Liquid Data server, including the following areas:
- Active execute queues—As work enters a WebLogic Server, it is placed in an execute queue and then assigned to a thread in which the work is processed. The size of the execute pool determines the number of threads that can be running concurrently on an execute queue. Threads consume resources, so it is important that you tune the number of threads to optimize performance, taking care not to slow performance by increasing the value unnecessarily. By default, WebLogic Server uses 15 threads for the thread pool size.
- JDBC connection pools—If queries in your deployment use relational databases for data sources, you should monitor any active JDBC connection pools for performance. A JDBC connection pool contains a group of JDBC connections. At run-time, when a query that uses an RDBMS data source is executed, the query borrows a connection from the connection pool, uses it, then returns it to the connection pool by closing it. The size of the JDBC connection pool determines the number of JDBC connections that can be used concurrently. Connections consume resources, so it is important that you tune the number of connections to optimize performance, taking care not to slow performance by increasing the value unnecessarily.
For detailed information about using the WebLogic Administration Console to monitor server performance, see "Top Tuning Recommendations for WebLogic Server" in BEA WebLogic Server Performance and Tuning.