Tuning EDQ Performance

This chapter describes the server properties that can be used to optimize the performance of the EDQ system and how these properties should be configured in various circumstances.

EDQ has a large number of properties that are used to configure various aspects of the system. A relatively small number of these are used to control the performance characteristics of the system.

Performance tuning in EDQ is often discussed in terms of CPU cores. In this chapter, the number of CPU cores means the number of processors reported by the Java Virtual Machine, as returned by the Runtime.availableProcessors() method.
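To see the core count this chapter refers to, you can ask the JVM directly. The following is a minimal sketch; the class name CoreCount is illustrative and not part of EDQ. The reported value can differ from the physical core count, for example when the JVM runs in a container with a CPU limit.

```java
// Illustrative helper (not part of EDQ): prints the processor count
// that the JVM reports for the machine it is running on.
final class CoreCount {

    /** Returns the number of processors the JVM reports. */
    static int reportedCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("Cores reported by the JVM: " + reportedCores());
    }
}
```

Run it with the same Java installation, and on the same host, as the EDQ application server so that the value matches what the server sees.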

Understanding the Properties File

The tuning controls are exposed as properties in the director.properties file. This file is found in the oedq_local_home configuration directory.

The available tuning properties are as follows:

Properties	Description
`runtime.threads`	This property determines the number of threads that will be used for each batch job which is invoked. The default value of this property is zero, meaning that the system should start one thread for each CPU core that is available. You can specify an explicit number of threads by supplying a positive, non-zero integer as the value of this property. For example, if you know that you want to start a total of four threads for each batch process, set `runtime.threads` to four.
`runtime.intervalthreads`	This property determines the number of threads that will be used by each process when running in interval mode. This will also define the number of requests that can be processed simultaneously. The default behavior is to run a single thread for each process running in interval mode.

Properties	Description
`workunitexecutor.outputThreads`	This property determines the number of threads that will be used to write data to the results database. These threads service the queue of results and output data for the whole system, and so are shared by all the processes which are running on the system. The default value of this property is zero, meaning that the system should use one output thread for each CPU core that is available. You can specify an explicit number of output threads by supplying a positive, non-zero integer as the value of this property. For example, to use a total of four output threads for the whole system, set `workunitexecutor.outputThreads` to 4.
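The "zero means one thread per core" rule described for runtime.threads and workunitexecutor.outputThreads can be sketched as follows. This only illustrates the documented rule and is not EDQ's implementation; the class and method names are hypothetical, and rejecting negative values is an assumption of the sketch.

```java
// Illustrative sketch (not EDQ code) of the documented default rule:
// a configured value of 0 means "one thread per available core".
final class ThreadDefaults {

    /**
     * Resolves a configured thread count against the core count.
     * Rejecting negative values is an assumption of this sketch.
     */
    static int effectiveThreads(int configured, int cores) {
        if (configured < 0 || cores < 1) {
            throw new IllegalArgumentException("configured must be >= 0 and cores >= 1");
        }
        return configured == 0 ? cores : configured;
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("runtime.threads=0 resolves to " + effectiveThreads(0, cores));
        System.out.println("runtime.threads=4 resolves to " + effectiveThreads(4, cores));
    }
}
```

Note that runtime.intervalthreads follows a different rule: its default is a single thread per process, not one thread per core.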

Tuning for Batch Processing

The default tuning settings provided with EDQ are appropriate for most systems that are primarily used for batch processing. Enough threads are started when running a job to use all available cores. If multiple jobs are started, the operating system can schedule the work for efficient sharing between the cores. It is best practice to allow the operating system to perform the scheduling of these kinds of workloads.

Tuning for Real-Time Processing

When a production system is used for a significant amount of real-time processing, do not use it for batch processing at the same time unless the real-time response is not critical. Run batch processing only to process data that is required by the real-time processes.

Tuning Batch Processing On Real-Time Systems

If batch processing must be run on a system that is being used for real-time processing, it is best practice to run the batch work when the real-time processes are stopped, such as during a scheduled maintenance window. In this case, the default setting of runtime.threads is appropriate.

If it is necessary to run batch processing while real-time services are running, set runtime.threads to a value that is less than the total number of cores. By reducing the number of threads started for the batch processes, you prevent those processes from placing a load on all of the available cores when they run. This leaves at least one core free, so that real-time service requests that arrive while the batch is running do not have to compete with it for CPU time.
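One way to pick such a value is to decide how many cores to keep free for real-time work and subtract that from the core count. The sketch below is an illustration only; the names BatchHeadroom and reservedForRealTime are hypothetical, and how many cores to reserve is a judgment for your own workload. The result is never lower than 1, because a value of 0 would select the default of one thread per core.

```java
// Illustrative sketch (not EDQ code): choose a runtime.threads value that
// leaves some cores free for real-time requests.
final class BatchHeadroom {

    /**
     * Returns a batch thread count of at least 1. A result of 0 is avoided
     * because runtime.threads=0 means "one thread per core".
     */
    static int batchThreads(int cores, int reservedForRealTime) {
        if (cores < 1 || reservedForRealTime < 0) {
            throw new IllegalArgumentException("cores must be >= 1 and reserved >= 0");
        }
        return Math.max(1, cores - reservedForRealTime);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Reserving one core is an arbitrary choice for this example.
        System.out.println("runtime.threads = " + batchThreads(cores, 1));
    }
}
```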

Tuning Real-Time Thread Numbers

For most production systems, the default value of one for runtime.intervalthreads is not appropriate, although it is adequate for development environments. The default setting implies that, for any given real-time service handled by a process running in interval mode, all requests will be processed sequentially. If four requests for the same service arrive simultaneously, and the average time to process a request is 100 ms, then the first request will be completed after 100 ms, the second after 200 ms, the third after 300 ms, and the fourth after 400 ms. In addition, all the work will be performed by a single core, meaning that on a four-core machine three of the cores are idle.

It is best practice to set runtime.intervalthreads to the number of available cores. This configuration allows incoming requests to be processed simultaneously, resulting in more efficient use of resources and much faster turnaround.
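The arithmetic behind this recommendation can be checked with a small model. The sketch below is an idealization, not a measurement of EDQ: it assumes every request takes exactly the same time, each thread has a core to itself, and there is no scheduling overhead. The class and method names are illustrative.

```java
// Idealized model (not EDQ code): completion time of the n-th of several
// simultaneous requests when a fixed number of threads serve them.
final class IntervalLatency {

    /**
     * Requests are served in "waves" of `threads` at a time; the request at
     * 1-based `position` completes at the end of its wave.
     */
    static long completionMs(int position, int threads, long serviceMs) {
        if (position < 1 || threads < 1 || serviceMs < 0) {
            throw new IllegalArgumentException("position and threads must be >= 1, serviceMs >= 0");
        }
        int wave = (position + threads - 1) / threads; // ceiling of position / threads
        return wave * serviceMs;
    }

    public static void main(String[] args) {
        for (int threads : new int[] {1, 4}) {
            for (int position = 1; position <= 4; position++) {
                System.out.println("threads=" + threads + ", request " + position
                        + " completes after " + completionMs(position, threads, 100) + " ms");
            }
        }
    }
}
```

With one thread, the fourth request waits for the three before it; with four threads on four cores, all four requests complete in the first wave.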

Tuning I/O Heavy Real-Time Processes

If a process performs significant I/O, you can try increasing the value of runtime.intervalthreads above the number of available cores. When a process performs intensive I/O, there will be times when threads are waiting for I/O to complete, leaving one or more cores idle. By using more active threads than there are cores, you ensure that when one thread stalls for I/O, another thread can use the core that the stalled thread was using.
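As a starting point for experimentation, a general thread-sizing rule of thumb (not taken from the EDQ documentation) scales the core count by the ratio of time a request spends waiting to time it spends computing. The sketch below assumes you have measured those two times yourself; the names are hypothetical, and the result should be validated by testing rather than treated as a recommended setting.

```java
// General rule-of-thumb sketch (not EDQ guidance): size a thread pool for
// work that alternates between computing and waiting on I/O.
final class IoThreadEstimate {

    /**
     * threads = cores * (1 + wait / compute), rounded up.
     * waitMs and computeMs are per-request averages that you measure.
     */
    static int suggestedThreads(int cores, double waitMs, double computeMs) {
        if (cores < 1 || waitMs < 0 || computeMs <= 0) {
            throw new IllegalArgumentException("cores >= 1, waitMs >= 0, computeMs > 0 required");
        }
        return (int) Math.ceil(cores * (1.0 + waitMs / computeMs));
    }

    public static void main(String[] args) {
        // Example figures only: equal time spent waiting and computing.
        System.out.println("Candidate runtime.intervalthreads: " + suggestedThreads(4, 10.0, 10.0));
    }
}
```

A purely CPU-bound process (no waiting) yields the core count itself, which matches the best practice for CPU-intensive services.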

Example of Tuning Real-Time Processes

In this example of how to tune real-time processes, a four-core Intel server is being used to support four different web services. The web services are CPU-intensive and perform minimal amounts of I/O. Some data used by the web services must be updated on a daily basis, which requires running a data preparation process in batch mode. The web services receive intermittent sets of simultaneous requests. Overnight, the web services are stopped for maintenance and data preparation.

In this scenario, it is appropriate to leave the runtime.threads property set to its default value of one thread per CPU core: in this case, four threads. Because data preparation runs overnight while the web services are stopped, the batch process can use all four cores and complete in the quickest possible time. For the web services, which are CPU-intensive and not likely to become I/O bound, set the runtime.intervalthreads property to four. Using the same number of threads as cores ensures that the maximum number of requests are processed at the same time.

Note:

Increasing the value of runtime.intervalthreads means that there will be a significant increase in the memory requirement, particularly at interval turnover.

Tuning JVM Parameters

JVM parameters should be configured during the installation of EDQ. For more information, see Installing and Configuring Enterprise Data Quality. If it becomes necessary to tune these parameters post-installation to improve performance, follow the instructions in this section.

Note:

All of the recommendations in this section are based on EDQ installations using the Java HotSpot Virtual Machine. Depending on the nature of the implementations, these recommendations may also apply to other JVMs.

Setting the PermGen Space

If the following error message is reported in the log file, it may be necessary to increase the maximum PermGen space available:

java.lang.OutOfMemoryError: PermGen space

To do this, increase the value of the -XX:MaxPermSize parameter of the JVM on the EDQ server. It is also necessary to increase the -XX:ReservedCodeCacheSize parameter proportionally. For example, if MaxPermSize is doubled from 1024m to 2048m, ReservedCodeCacheSize should also be doubled.

Setting the Maximum Heap Memory

If a java.lang.OutOfMemoryError message that refers to heap space is reported in the log file, it may be necessary to increase the maximum heap space parameter, -Xmx. For most use cases, a setting of 8 GB is sufficient. Large EDQ installations may require a higher maximum heap size; in that case, the normal recommendation is to set -Xmx to half of the server memory.
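Before changing -Xmx, it can help to confirm the maximum heap that the JVM is currently running with and to work out what half of the server memory would be. The following sketch is illustrative only: the class name is hypothetical, and the server memory figure is a value you supply, not something the code detects.

```java
// Illustrative helper (not part of EDQ): reports the current maximum heap
// and formats an -Xmx flag for half of a given server memory size.
final class HeapCheck {

    /** Maximum heap the running JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Formats -Xmx for half of the supplied server memory (in MiB). */
    static String halfOfServerMemoryFlag(long serverMemoryMiB) {
        if (serverMemoryMiB < 2) {
            throw new IllegalArgumentException("serverMemoryMiB must be at least 2");
        }
        return "-Xmx" + (serverMemoryMiB / 2) + "m";
    }

    public static void main(String[] args) {
        System.out.println("Current max heap: " + maxHeapMiB() + " MiB");
        // 32768 MiB (32 GiB) is an example figure, not a recommendation.
        System.out.println("Half of a 32 GiB server: " + halfOfServerMemoryFlag(32768));
    }
}
```

To reflect the EDQ server's actual setting, the check must run inside a JVM started with the same options as the application server.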

Tuning Database Parameters

The most significant database tuning parameter with respect to performance tuning within EDQ is workunitexecutor.outputThreads. This parameter determines the number of threads, and hence the number of database connections, that will be used to write results and staged data to the database. All processes that are running on the application server share this pool of threads, so there is a risk of processing becoming I/O bound in some circumstances. If there are processes that are particularly I/O intensive relative to their CPU usage, and the database machine is more powerful than the machine hosting the EDQ application server, it may be worth increasing the value of workunitexecutor.outputThreads. Be aware that the additional output threads use more database connections and put more load on the database.

Adjusting the Client Heap Size

Under certain conditions, client heap size issues can occur; for example, when:

attempting to export a large amount of data to a client-side Excel file, or
opening up Match Review when there are many groups.

EDQ allows the client heap size to be adjusted using a property in the blueprints.properties file.

To double the default maximum client heap space for all Java Web Start client applications, create (or edit if it exists) the file blueprints.properties in the local configuration directory of the EDQ server. For more information about the EDQ configuration directories, see "EDQ Directory Requirements" in Installing Oracle Enterprise Data Quality.

Add the line:

*.jvm.memory = 512m

Note:

Increasing this value will cause all connecting clients to change their heap sizes to 512MB. This could have a corresponding impact on client performance if other applications are in use.

To adjust the heap size for a specific application, replace the asterisk, *, with the blueprint name of the client application from the following list:

director - (Director)
matchreviewoverview - (Match Review)
casemanager - (Case Management)
casemanageradmin - (Case Management Administration)
opsui - (Server Console)
diff - (Configuration Analysis)
issues - (Issue Manager)

Note:

Dashboard is not a Java Web Start application, and therefore cannot be controlled using this property.

For example, to double the maximum client heap space for Director, add the following line:

director.jvm.memory = 512m

When doubling the client heap space for more than one application, simply repeat the property; for example, for Director and Match Review:

director.jvm.memory = 512m
matchreviewoverview.jvm.memory = 512m