Oracle® Enterprise Data Quality for Product Data Oracle DataLens Server Administration Guide
Release 5.6.2

Part Number E23614-02

F Tuning the Server(s)

This appendix describes steps that can be taken to improve the throughput of the servers. The emphasis is on running DSA jobs as fast as possible.

Checking the Results

The most accurate way to check the timing is to place a timer around the calls to run the DSA.
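As a minimal sketch of placing a timer around the call, the helper below measures elapsed wall-clock time with a monotonic clock. The function run_dsa_job is a hypothetical stand-in for whatever client call actually runs the DSA in your application; it is not part of the Oracle DataLens API.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func and return (result, elapsed_seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def run_dsa_job(records):
    # Hypothetical stand-in for the real client call that runs a DSA.
    return [record.upper() for record in records]


if __name__ == "__main__":
    output, seconds = time_call(run_dsa_job, ["cable, 10 ft", "resistor, 10 ohm"])
    print(f"DSA call took {seconds:.3f} s for {len(output)} records")
```

Timing on the calling side includes network and serialization time, so it reflects what the application actually experiences rather than server-side processing alone.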

Another way is to look at the results of the job in the Administration Web pages and check the duration of the job as follows:

[Illustration: image093.png]

Oracle DataLens Server Options

Load-Balancing the Servers

Load balancing can be used only when there are two or more Oracle DataLens Servers in a single Server Group. The server group provides automatic load balancing and fail-over for all servers within that group.

When running the application, be certain to call one of these production servers in the Server Group and not call the Admin server.

Manual load balancing can be performed for the servers in a single Server Group by selecting which data lenses are loaded by each server. Additionally, servers can be set to load DSAs on a server-by-server basis. It is recommended that each server be set up with all the data lenses and DSAs, allowing the Oracle DataLens Server to control the load balancing internally.

Round Robin Calls

When running DSA jobs from an application using the API, the Ping Servlet can be used to check for an active Oracle DataLens Server within a server group before making the call.
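The following is a rough sketch of a round-robin selection that checks each server before it is used. The is_alive callable is a hypothetical wrapper around a call to the Ping Servlet; the servlet's URL and response format are not described here, so the check is injected rather than implemented. The server names are placeholders.

```python
from itertools import cycle


class RoundRobinCaller:
    """Rotate over servers in one server group, skipping ones that fail the ping."""

    def __init__(self, servers, is_alive):
        self._servers = list(servers)
        self._ring = cycle(self._servers)
        self._is_alive = is_alive  # hypothetical wrapper around the Ping Servlet

    def next_server(self):
        # Try each server at most once per request, starting where we left off.
        for _ in range(len(self._servers)):
            server = next(self._ring)
            if self._is_alive(server):
                return server
        raise RuntimeError("no active server in the server group")


if __name__ == "__main__":
    down = {"prod2"}  # pretend this server does not answer the ping
    caller = RoundRobinCaller(["prod1", "prod2", "prod3"], lambda s: s not in down)
    print([caller.next_server() for _ in range(4)])
    # prints ['prod1', 'prod3', 'prod1', 'prod3']
```

The list passed in should contain only production servers in the Server Group, not the Admin server.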

Ensure Tracing is Turned Off

Tracing is turned off by default. It is only turned on by Oracle Consulting Services to trace information flow in the system. It can be turned off in the Options menu of the Administration Web Pages. Additionally, there is a set of scs.trace.network flags that should be omitted or set to false in the server.cfg configuration file.
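One way to verify the configuration file is a small script that lists any flag under the scs.trace.network prefix that is not set to false. This is a sketch under stated assumptions: it assumes server.cfg uses key=value lines and that lines starting with # are comments, and the individual flag names in the sample are invented for illustration.

```python
def enabled_trace_flags(cfg_text, prefix="scs.trace.network"):
    """Return the names of flags under `prefix` that are not set to false."""
    enabled = []
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blanks, assumed comment lines, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key.startswith(prefix) and value.lower() != "false":
            enabled.append(key)
    return enabled


if __name__ == "__main__":
    # Hypothetical file content; the flag suffixes are placeholders.
    sample = "\n".join([
        "wfg.maxlines=150000",
        "scs.trace.network.example1=true",
        "scs.trace.network.example2=false",
    ])
    print(enabled_trace_flags(sample))
    # prints ['scs.trace.network.example1']
```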

Data Service Application Optimization

Simplify the Data Service Application Process Steps

Each step in a DSA incurs additional overhead, because job information is stored in the RDBMS repository for each step of a DSA. There is also overhead to package and ship the SOAP data contents from the DSA to each step during processing. Simplifying the DSA structure and placing as much of the process flow as possible inside Decision Maps therefore improves the speed of execution. We have observed timing improvements of up to 0.2 seconds for each DSA step that is replaced with a Decision Map.

Running Ultra High-Priority Jobs

Ultra-high priority jobs are supported. These DSA jobs do not store the step information in the RDBMS repository, so that part of the job execution overhead is eliminated at the expense of job information and details of completed jobs. Ultra-high priority makes sense especially for single-line jobs: job execution is as fast as possible, and job details for thousands of single-line jobs would only clutter the DSA Job status Administration Web pages.

Run Jobs at the Correct Priority

As a rule, huge jobs should be run at a low priority, giving processing cycles to medium jobs and to tiny, high-priority jobs. DSA jobs with a small number of input records, and jobs where the user is waiting for a response, need to be run at a high priority to get the fastest response time.

File Writing Between Steps

By default, when a DSA is being processed by the Oracle DataLens Server, all data is held in memory unless more than 5000 records are being processed in a single DSA job. The speed of execution of these large jobs can be increased by raising the number of data records that are held in memory between processing steps. This is controlled in the Oracle DataLens Server server.cfg file with the following line:

wfg.maxlines=150000

Data Lens Optimization

Caching the Data Lenses

Individual data lenses can cache parsing rules in memory for reuse without reloading the rule each time. This is mostly useful for data lenses that process the same data repeatedly, for example manufacturer names, redundant data, or part numbers that are reused often. Data lenses that are not good candidates are those that process things like descriptions, which are different each time and would require a different parse tree for each line.

The cache should be large enough that the most often repeated lines stay in memory (the cache uses an LRU queue, in which the least recently used rules drop out of memory). For instance, if there are 300 manufacturer names that are often reused among several thousand names, the cache should be set to 1000 or perhaps 2000, depending on the frequency of use, to ensure that the 300 most often used names continue to reside in memory.
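To illustrate why the cache needs headroom beyond the number of frequently reused lines, here is a generic sketch of an LRU cache. It is not the data lens implementation; the class, the parse callable, and the sample lines are all illustrative assumptions.

```python
from collections import OrderedDict


class LruCache:
    """Generic least-recently-used cache keyed by input line."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, line, parse):
        if line in self._items:
            # A hit refreshes the entry so it is evicted last.
            self._items.move_to_end(line)
            self.hits += 1
            return self._items[line]
        self.misses += 1
        tree = parse(line)  # stand-in for building a parse tree
        self._items[line] = tree
        if len(self._items) > self.capacity:
            # Evict the least recently used entry.
            self._items.popitem(last=False)
        return tree


if __name__ == "__main__":
    cache = LruCache(capacity=2)
    for name in ["ACME", "ACME", "Bolt", "ACME", "Cord", "Bolt"]:
        cache.get(name, parse=str.lower)
    print(cache.hits, cache.misses)
    # prints 2 4
```

In the usage above, "Bolt" is evicted when "Cord" arrives even though it will be needed again, which is the effect a larger cache size is meant to avoid.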

This change is required for each data lens that needs caching.

  • Check out the data lens to the client

  • Go to the C:\Datalens\Applications\data\cbidwell\project\CablesF\config directory

  • Edit the Project.xml file and set the value in the following line to the desired cache size

    <parseTreeCacheSize>0</parseTreeCacheSize>
    
  • Save and check-in the project after making this change.
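The edit to Project.xml can also be scripted. The sketch below sets the parseTreeCacheSize element using the Python standard library. The surrounding structure of Project.xml is not shown in this guide, so the project root element in the sample is a placeholder, and the code assumes the element is not in an XML namespace. Note that ElementTree does not preserve comments or the XML declaration, so review the result before checking the project in.

```python
import xml.etree.ElementTree as ET


def set_parse_tree_cache_size(xml_text, size):
    """Return xml_text with every <parseTreeCacheSize> element set to size."""
    root = ET.fromstring(xml_text)
    # iter() searches the whole tree, so the element's depth does not matter.
    found = list(root.iter("parseTreeCacheSize"))
    if not found:
        raise ValueError("no <parseTreeCacheSize> element found")
    for element in found:
        element.text = str(size)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Placeholder document; only <parseTreeCacheSize> comes from the guide.
    sample = "<project><parseTreeCacheSize>0</parseTreeCacheSize></project>"
    print(set_parse_tree_cache_size(sample, 1000))
    # prints <project><parseTreeCacheSize>1000</parseTreeCacheSize></project>
```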

Do Not Load Data Lenses That Are Not Being Used

When running in a production environment, the number of data lenses is controlled by the lenses that are deployed to Production. Do not deploy data lenses to Production if they are not going to be used for actual production DSA jobs.

Which data lenses are used by a particular server can be fine-tuned by setting the data lenses that are loaded by each Production Oracle DataLens Server.

[Illustration: image094.jpg]

Tuning Multiple Parameterized Domains

Set the number of parameterized domain instances that will be loaded into memory. A single domain with two parameterized instances should have the number of instances set to three to maximize performance when using these domains:

  • One for the first parameterized domain

  • Another for the second

  • A third for both in memory

This is set in the server.cfg file as follows:

server.nle.instances=3

API Integration

WSDL Versus Java API Calls

The WSDL definition will create a dynamically generated Java API call that should have the same performance as the Enterprise DQ for Product Java API. Choose the method that fits your current architecture; performance should not be a consideration.

Optimize the Available Hardware and Operating Systems

Windows Memory and Application Servers

See the section "Tune memory usage on the servers" for information on memory limitations of Windows servers.

Linux and Unix Memory, Windows Memory, and Java Servers

Linux and Unix running on 64-bit hardware do not have the 1.6 GB memory limitation for the Java Web Server that we have observed on 32-bit Microsoft Windows servers. Windows 64-bit servers do not have this memory limitation either.

Important:

In an Enterprise DQ for Product production environment, run only on a 64-bit server with a 64-bit installation of Java. Never try to run a production environment on a 32-bit server.

Database Query Tuning

In database-intensive DSAs, major performance improvements can be made by tuning the database DDL statements. Simple things like indexing fields that are being searched on and reducing the number of tables in computationally intensive SQL joins can be very effective in improving the performance of the DSAs.

These tuning tasks are very dependent on the particular database schema and would need to be examined by a database professional or Oracle Consulting Services.