Oracle® Application Server Performance Guide 10g Release 3 (10.1.3.1.0) Part Number B28942-02
This document describes the steps that DBAs need to take to configure the Oracle Application Server Wireless Messaging Server for high performance. The performance enhancements include SQL optimizations that improve the messaging server's one-way and two-way performance in both RAC and non-RAC environments.
Various benchmarking tests in a RAC environment revealed that the application must be carefully designed in a highly cooperative manner and properly tuned to take full advantage of high-performing AQ running in a multi-node RAC environment.
Throughput in a multi-node RAC environment can be increased considerably by creating multiple queues, such that (a) each queue has affinity with a particular RAC node, and (b) the enqueue/dequeue requests on a RAC node pick messages only from the queue that the node has affinity with. This avoids the cache buffer waits that requests from two or more RAC nodes experience while accessing a shared queue.
Some of the performance recommendations are also applicable to non-RAC environments.
The following subsections describe the steps that should be performed for configuring the messaging server for high performance.
For a more detailed explanation of the factors affecting the messaging server performance, see the section "Factors Affecting Messaging Server Performance".
This section discusses the steps to apply the one-off patch to Oracle Application Server 10.1.2.0.2 Wireless Messaging Server.
Note:
If your environment has a single RAC node (or is a single database machine), some of the tuning suggestions provided may not impact performance.
The database should be tuned to provide optimal performance for a high-transaction environment. We recommend the following steps to tune the database.
Stop the application server instance(s).
Reduce checkpoint waits.
The redo logs should be set to at least 500MB, with at least 3 members per group (or 3 groups).
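As an illustration only (the group number and file paths are assumptions; adapt them to your storage layout), a 500 MB redo log group with three members could be added as follows, and the smaller groups dropped once they become inactive:
alter database add logfile group 4 ('/u01/oradata/orcl/redo04a.log', '/u01/oradata/orcl/redo04b.log', '/u01/oradata/orcl/redo04c.log') size 500M;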
Reduce the frequency of checkpointing.
The log_checkpoint_timeout can be set to 0 to disable automatic checkpoints.
Use the following command:
alter system set log_checkpoint_timeout=0 scope=both;
Increase the System Global Area (SGA) to at least 1GB.
Increase the Shared Pool Size to the recommended minimum of 300 MB.
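For example, a hedged sketch of how these sizes might be set in a 10g database (example values only; verify them against the memory available on your database nodes):
alter system set sga_max_size=1G scope=spfile sid='*';
alter system set sga_target=1G scope=spfile sid='*';
alter system set shared_pool_size=300M scope=spfile sid='*';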
Change the number of LMS (lock manager server) processes for RHEL 4. This is the key component of the global cache service. The LMS process is responsible for maintaining global cache coherency and moving blocks between instances for cache fusion requests.
For improved performance, set LMS=1 by using the following command:
alter system set gcs_server_processes = 1 scope=spfile sid='*';
If you are using RHEL 4, you may consider tuning the network interfaces according to MetaLink Note 363147.1.
Start the application server instance(s).
Table 9–1 summarizes the database system parameters and recommended values.
Table 9-1 Database System Parameters and Recommended Values
Database System Parameter | Recommended Value(s) | Remarks |
---|---|---|
Redo log | 500 MB (minimum), 3 members per group | Reduce checkpoint waits. |
Log_checkpoint_timeout | 0 | Disables automatic checkpointing |
System Global Area | 1 GB (minimum) | |
Shared Pool Size | 300 MB (minimum) | |
LMS processes | 1 | |
For a more detailed explanation of the database tuning and OS factors affecting the messaging server performance, see the section "Database Tuning".
Perform the following procedure to optimize performance in a multi-node RAC environment only.
Note:
Do not perform this procedure for a non-RAC setup.
Connect to the database as sysdba and run the following script:
SQL> @@trans_tbs_create.sql
Note:
"trans_tbs_create.sql" creates a new tablespace "TRANS" with ASSM (Automatic Segment Space Management). ASSM is required in a RAC environment for optimal performance of the messaging server.
When prompted, enter the path to the base directory for the datafiles of this database.
For example, "/product/10.1.0/oradata/orcl"
The create script first attempts to create the tablespace without specifying the path to the datafile. On a typical RAC environment with ASM, this runs successfully. However, on a typical non-RAC environment, the command fails with an error indicating that the datafile path was not specified. The script catches this error and then attempts to create the tablespace using the user-specified datafile base directory and the file name "trans.dbf".
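As a rough sketch only (the exact statements are in trans_tbs_create.sql; the datafile size and path below are assumptions), the tablespace the script creates is of this general form:
create tablespace trans datafile '/product/10.1.0/oradata/orcl/trans.dbf' size 500M autoextend on extent management local segment space management auto;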
SQL> @@trans_tbs_migrate.sql
Note:
trans_tbs_migrate.sql moves all transport (that is, messaging server) tables and indexes to the new tablespace "TRANS".
Restart all Wireless mid-tiers (including OC4J_Wireless and Wireless components).
Note:
On Windows, also start Enterprise Manager.
This section describes the following messaging server configuration tasks:
updating the dequeue navigation mode
adding a node-specific db connect string on the mid-tier instances.
Updating the dequeue navigation mode from "first" to "next" message provides an additional increase in throughput, but with a loss of message priority semantics. Perform the following steps to do so:
Log in to the database as the wireless user and run the following command:
SQL> @@trans_config_update.sql dequeue_navigation_mode next
Note:
The script trans_config_update.sql is provided to add or update the configuration attributes used by the messaging server.
Usage: SQL> @@trans_config_update.sql <attr_name> <attr_val>
To revert to the default dequeue navigation mode run the following command:
SQL> @@trans_config_update.sql dequeue_navigation_mode first
To achieve high messaging performance in a multi-node RAC environment, the messaging server and OC4J_Wireless processes on a mid-tier instance should always connect to only one RAC node. With this configuration, if there are more RAC nodes than mid-tier instances, the messaging server will not utilize the extra RAC nodes. Figure 9–1 and Figure 9–2 show some valid configurations.
Figure 9-1 Three Mid-Tiers and Three RAC Nodes Each Connected to One Specific RAC
Figure 9-2 Three Mid-Tiers and Two RAC Nodes
To configure the mid-tier instances as described above, we can specify a customized connect string as a JVM parameter for these processes. The steps are as follows:
For each mid-tier instance, log in to its Enterprise Manager and go to its home page.
Under the section "Related Links", click on the "Process Management" link.
Locate the XML section for the process-type "messaging_server".
For example, look for the XML: <process-type id="messaging_server" module-id="messaging">
In that section, add or modify the following XML:
<module-data>
  <category id="start-parameters">
    <data id="java-parameters" value="-Xms64M -Xmx256M -Dwireless.db.instance.string=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<%hostname%>)(PORT=<%port%>))(CONNECT_DATA=(SID=<%sid%>)))"/>
  </category>
</module-data>
Replace the tokens <%hostname%>, <%port%> and <%sid%> with the appropriate ones for each RAC node.
Locate the XML section for the process-type "OC4J_Wireless".
For example, look for the XML: <process-type id="OC4J_Wireless" module-id="OC4J">
In that section, find the "java-options" data node, i.e., look for the XML: <data id="java-options" value="…"/>
Append the following JVM parameter to the "value" attribute:
-Dwireless.db.instance.string=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=<%hostname%>)(PORT=<%port%>))(CONNECT_DATA=(SID=<%sid%>)))
Replace the tokens <%hostname%>, <%port%> and <%sid%> with the appropriate ones for each RAC node.
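For example, for a hypothetical RAC node rac1.example.com listening on port 1521 with SID orcl1 (all three values are placeholders, not defaults of this product):
-Dwireless.db.instance.string=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=rac1.example.com)(PORT=1521))(CONNECT_DATA=(SID=orcl1)))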
Note:
You can specify any valid db connect string (without the "username/password@" prefix). For example, you may want to add a fail-over instance to the connect string. However, it is important that you use "SID" instead of "SERVICE_NAME", since we want to force the mid-tier process to connect to the specified RAC instance, rather than have the listener attempt any load balancing amongst the RAC instances.
Restart the OC4J_Wireless and Wireless components on all mid-tier machines for the changes to take effect.
To tune the messaging server for high performance, perform the following procedures.
Two of the sequences, trans_mid and trans_did, are used extensively by the messaging server to generate various IDs, such as status IDs, message IDs, and store IDs.
Recommendation: A cache of 50,000 pre-allocated sequence numbers is provided in the instance's SGA for faster access. You may also increase it further to 100,000 to reduce disk I/O and CR waits, though care must be taken to ensure that the large cache does not reduce the memory available to other applications.
Warning: The DBA must be aware that large sequence caches can cause disordering of message IDs and can also create "gaps" in IDs when the database is restarted.
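If you do choose to raise the cache, a hedged example (the WIRELESS schema is assumed to own the sequences; confirm the owner in your repository):
SQL> alter sequence wireless.trans_mid cache 100000;
SQL> alter sequence wireless.trans_did cache 100000;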
The object_id column of the trans_ids table was accessed very frequently.
Recommendation: The software now forces the use of the index created on the object_id column of the trans_ids table.
The tests showed a significant improvement in messaging performance.
The default implementation of dequeue operation uses first message as navigation mode (dequeue_option.navigation = DBMS_AQ.FIRST_MESSAGE).
Recommendation: Change the navigation mode to next message (dequeue_option.navigation = DBMS_AQ.NEXT_MESSAGE). Instructions are in section "Updating the Dequeue Navigation Mode".
The tests showed a marginal increase in messaging performance. However, the increase comes with a limitation. With next message as the navigation mode, the cursor used to select messages from the queue is cached, and the dequeue operation gets a snapshot of the queue, so any new message enqueued to the queue will not be visible. This becomes a problem if a newly enqueued message has a high priority and needs to be processed before the messages already in the snapshot. Hence, this change should be applied only if all incoming messages are known to have the same priority.
A high-transaction system generates a lot of redo. This can cause overly frequent checkpoints at log switches. Checkpoints are logged in the database's alert.log file.
Recommendation: Reduce frequency of checkpointing and log switching.
To reduce checkpoint waits, the redo logs should be set to a minimum of 500MB with at least 3 members per group (or 3 groups). Additionally, the parameter log_checkpoint_timeout can be set to 0 to disable automatic checkpoints.
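Before resizing, one hedged way to gauge how often log switches occur is to count them per hour from v$log_history:
SQL> select to_char(first_time, 'YYYY-MM-DD HH24') hour, count(*) switches from v$log_history group by to_char(first_time, 'YYYY-MM-DD HH24') order by 1;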
To realize high throughput, some database parameters related to resource sharing should be fine-tuned.
Recommendation: Set the SGA and the Shared Pool to a sufficient size.
The SGA should be set to at least 1GB.
The shared pool should be set to at least 300MB.
The tests showed significant improvement in messaging performance.
The benchmark tests on two RAC nodes revealed that the application must be carefully designed in a highly cooperative manner and properly tuned to take full advantage of high performing AQ running in a multi-node RAC environment.
Recommendation: The software now creates multiple queues, such that (a) each queue has affinity with a particular RAC node, and (b) the enqueue or dequeue requests on a RAC node select messages only from the queue that the node has affinity with. This avoids the cache buffer waits that requests from two or more RAC nodes experience while accessing a shared queue.
The tests showed sizable improvement in messaging performance.
When data was frequently inserted into the messaging server tables from multiple nodes, performance issues were found due to concurrent access to data blocks, table segment headers, and other global resources.
Using ASSM separates the data structures associated with the free space management of a table into disjoint sets that are available for individual instances. With ASSM, the performance issues among processes working on different instances are reduced because data blocks with sufficient free space for inserts are managed separately for each instance.
Recommendation: Use an ASSM tablespace for the messaging server (transport) tables and indexes. Refer to "Optimizing Performance in a Multi-RAC Environment".
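To confirm which tablespaces use ASSM (a hedged check; the TRANS tablespace should show AUTO after the migration):
SQL> select tablespace_name, segment_space_management from dba_tablespaces;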
Since we now need multiple queues, the mid-tier instances also need to be balanced accordingly.
If the clients or messaging servers on the mid-tier instances connect unevenly across the available RAC nodes, some queues may be processed faster than others. Moreover, too many enqueuers or dequeuers on one queue can reduce the throughput on that RAC node.
Recommendation: To avoid this imbalance, we tie the messaging_server and OC4J_Wireless processes of a mid-tier instance to one particular RAC node. Instructions are in the section "Adding Node-Specific DB Connect String on Mid-Tier Instances".
The number of enqueue and dequeue threads has a significant impact on the overall throughput. Note that once a real driver and real client applications are used, the number of threads may need to be retuned to get the best results. If there are too many enqueuers or dequeuers on a given queue (on a RAC node), the throughput for that queue (and other queues on that RAC node) can be negatively affected. The optimal number of enqueuers or dequeuers (governed by Driver Sending Threads, Driver Receiving Threads or Client Sending and Receiving Threads) also depends on the physical hardware characteristics of the RAC nodes, middle tier machines, disk storage, and so on.
Recommendation: Assuming each enqueuing thread (Client Sending Threads in the case of sending, or Driver Receiving Threads in the case of receiving) generates an equal load, the number of threads to set depends on the number of messages in the queue after several minutes of sustained load. If the queues are mostly empty, the enqueue rate can be increased by adding enqueue threads (for sending, Client Sending Threads). If the queue size then grows without bound (constantly rising as the client enqueues messages), increase the dequeue threads (for sending, Driver Sending Threads). The best results are obtained when the queue sizes (on each RAC node) stay roughly constant (no more than 2,000 messages during a 100,000-message test). This is a fine balance and requires iterative testing. Too few or too many threads (for both driver and client) reduce the maximum achievable throughput.
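While iterating on the thread counts, one hedged way to watch per-instance queue depth is to join gv$aq with dba_queues (this assumes the transport queues are owned by the WIRELESS schema; adapt as needed):
SQL> select q.name, a.inst_id, a.ready from gv$aq a, dba_queues q where a.qid = q.qid and q.owner = 'WIRELESS' order by q.name;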
Oracle Streams AQ time manager processes are controlled by the init.ora parameter AQ_TM_PROCESSES, which can be set to a nonzero value to perform time monitoring on queue messages and to process messages with delay and expiration properties. In one of our test environments this parameter was set to 10; this caused an issue in the AQ queue monitoring processes, with one process consuming all CPU cycles in a tight loop and reducing the CPU time available to other processes. Killing the corresponding process did not solve the issue, because another process was spawned to replace it. Following the recommendation of Oracle's AQ team, setting this parameter to 9 fixed the behavior.
According to the AQ team, in Oracle Streams AQ release 10.1 this has been changed to a coordinator-slave architecture, where a coordinator is automatically spawned if Oracle Streams AQ or Streams is being used in the system. This process, named QMNC, dynamically spawns slaves depending on the system load. The slaves, named qXXX, perform various background tasks for Oracle Streams AQ or Streams. Because the number of processes is determined automatically and tuned constantly, you no longer need to set AQ_TM_PROCESSES. However, on a test environment it was observed that removing this parameter caused the messaging server to create more queues than intended. When the value was set to 9, the messaging server behaved as expected and created the desired number of queues.
If you observe any one of the following:
The number of driver queues created by the messaging server for a channel is more than the number of RAC nodes (and likewise for service queues)
It takes several minutes for the messaging server to complete creating queues (observed using "select * from wireless.trans_queue", "select * from wireless.trans_driver_queue" and "select * from wireless.trans_service_queue").
Even after you stop the messaging server, you observe a db process on the Infra nodes consuming 100% CPU that does not stop even after waiting for several minutes.
Then, perform the following tasks:
Stop all middle tiers.
Check the value of the aq_tm_processes parameter:
SQL> show parameter aq_tm_processes;
If the value is 0 or 10, then set the value to 9. For example, run the following command as sysdba:
SQL> alter system set aq_tm_processes=9 scope=memory sid='*';
Note:
To permanently change the value, you may have to create a pfile from the spfile, update the value, and recreate the spfile using the updated pfile.
Connect to the database as the wireless user and run:
SQL> execute transport.drop_all_queues;
Restart one middle tier and verify that none of the issues listed above are observed.
Restart the remaining middle tiers.
This information is extracted from http://metalink.oracle.com Note: 363147.1.
Upgrading from RHEL3 to RHEL4 could result in poor RAC interconnect performance under a certain set of circumstances. This note details the steps involved in identifying the issue and resolving it.
This section applies to the following:
Customers who have upgraded from RHEL3 to RHEL4
New customers using RHEL4
Global cache lost blocks and IP fragmentation failures cause poor RAC interconnect performance. After upgrading a RAC cluster from RHEL3 (2.4.21) to RHEL4 (2.6.9), a high rate of global cache blocks lost (up to 1.5 per second) was observed, with gc cr block lost among the top five wait events and degraded performance of up to a 30% drop in throughput. The number of gc blocks lost is associated with the number of packet reassembly failures. These observations do not depend on the Oracle release version, and the problem could happen on any system using RHEL4.
Further analysis showed this was due to a change in the Ethernet flow control setting for the Intel e1000 driver between RHEL3 and RHEL4.
The internal and customer system configurations on which this issue was observed:
Intel Xeon-based servers, both 32-bit and 64-bit EM64T
Intel GigE Ethernet card with e1000 driver
Red Hat Enterprise Linux 4 Update 2 (32-bit and 64-bit kernels)
Oracle RAC database (not version dependent) running in a cluster environment
How to confirm you are running into the same issue:
From the Oracle AWR or Statspack report:
'global cache cr block lost' or 'global cache current block lost' shows up in the top five wait events
The 'global cache block lost' statistic has a nonzero value and its rate is greater than 0.4 blocks per second
From the OS 'netstat -s' output, IP statistics:
'packet reassembles failed' has a nonzero value and its rate is associated with 'global cache block lost' (a SQL check for the lost-block statistics follows this list).
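A hedged way to check the lost-block statistics across instances from SQL (statistic names can vary slightly between releases):
SQL> select inst_id, name, value from gv$sysstat where name like '%blocks lost%';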
In RHEL4 (2.6.9), RX flow control of the e1000 network adapter is turned off by default, whereas it is on in RHEL3. Enabling RX flow control on the adapter eliminates lost blocks and packet reassembly failures.
The following syntax uses eth1 as an example to illustrate setting of flow control.
To check the flow control setting of eth1: ethtool -a eth1
To enable the rx flow control setting for eth1: ethtool -A eth1 rx on
To make the flow control setting persistent across reboots, edit /etc/modprobe.conf and add the following line:
options e1000 FlowControl=1,1
Then perform a reboot. The setting will be preserved.
The full syntax is: FlowControl value: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx). Specify the value for each adapter, separated by commas.
For Intel Xeon servers with the e1000 network adapter, RX flow control (that is, the e1000 responding to frames sent by the switches) is enabled by default in RHEL3 (2.4). After upgrading to RHEL4 (2.6.9), RX flow control is turned off by default, which causes lost blocks. The reason for this change in behavior is still under investigation. Note that the default setting for flow control depends on both the version of the e1000 driver in the kernel and the revision of the e1000 card itself; the availability of Intel GigE NICs based on different chipsets and different e1000 drivers complicates the issue. Apply the suggested solution only if you observe the exact set of problems described above; this behavior is not pervasive across RHEL4 or all Intel GigE NICs. TX flow control is disabled by default in both Linux 2.4 and 2.6.9, and enabling TX flow control is not advised.
This section discusses what the messaging server does when one of the RAC instances fails (or goes offline).
When the driver queues and service queues are created, the messaging server assigns a primary node and a secondary node to the queue. By default, the primary node is the "active" node (the node that the queue currently has affinity with). When the primary node fails, AQ makes the secondary node active, i.e., assigns affinity to the secondary node.
The messaging server must keep track of such changes, because it needs to select the appropriate queue for enqueuing and dequeuing when it connects to a node.
In our current implementation, we have a lightweight dbms job that monitors the queue affinities at a specified time interval (every two minutes) and automatically updates the queue to RAC node mappings used by the messaging server.
Thus, in case of a node failure, the messaging server will continue to process messages from the queues that have changed affinities. When the failed node comes back online, the mappings are automatically reassigned by the DBMS job. The DBA does not need to do anything to handle the failure of a single instance.
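To verify that the monitoring job is registered (a hedged check that assumes the job's definition references transport.monitor_queues), you can query dba_jobs:
SQL> select job, what, interval from dba_jobs where lower(what) like '%monitor_queues%';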
If the RAC instance downtime is planned and you do not wish to wait for the DBMS job to update the queue-to-RAC-node mappings, the DBA can force the update using the following steps (after bringing down the RAC node):
Connect with SQL*Plus as wireless/<%wireless_pwd%>@<%db-connect-string%>.
Run the following SQL command to update the queue to RAC node mappings maintained by the messaging server:
SQL> execute transport.monitor_queues;
This section describes the steps required to reconfigure the messaging server when RAC nodes and mid-tiers are added or removed (permanently).
When you add or remove a RAC node permanently, you need to reconfigure the messaging server so that it can recreate the queues appropriately. Perform the following steps:
Stop opmn on all mid-tier instances.
Change to the $ORACLE_HOME/wireless/repository/sql directory.
Run sqlplus wireless/<%wireless_pwd%>@<%db_conn_string%>.
Drop the existing queues.
execute transport.drop_all_queues;
Start opmn on all mid-tier instances.
The queues are auto-created at startup.
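After the restart, a hedged way to confirm that the queues were recreated with the expected node mappings is to query the views mentioned earlier:
SQL> select * from wireless.trans_queue;
SQL> select * from wireless.trans_driver_queue;
SQL> select * from wireless.trans_service_queue;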
Since load balancing is essential to achieve optimal throughput, it is important that there be at least as many mid-tier instances as RAC nodes (configured for the messaging server).
If there are fewer mid-tier instances than RAC nodes, the messaging server will only make use of the RAC nodes that the mid-tier instances connect to.
On the other hand, if there are more mid-tier instances than RAC nodes, you could configure the db connect strings of the mid-tier instances as discussed in the section "Adding Node-Specific DB Connect String on Mid-Tier Instances". However, note that if there are too many enqueuers or dequeuers on a given queue (on a RAC node), the throughput for that queue (and other queues on that RAC node) can be negatively affected. The optimal number of enqueuers or dequeuers (governed by Driver Sending Threads, Driver Receiving Threads or Client Sending and Receiving Threads) depends on the physical hardware characteristics of the RAC nodes, middle tier machines, disk storage, and so on.
This section describes the setup for the test scenarios and the result.
This section describes the machine, software, hardware, and RAC setup.
Six identical machines were used for performing these tests.
The middle tier was installed on three of these machines; the database was on RAC configured with three nodes. IM was installed on a separate machine.
All six machines (3 middle tiers, 3 RAC nodes) have 2 Intel Xeon CPUs at 3.0 GHz (visible to the OS as 4) and 4 GB of RAM.
OS: Red Hat Enterprise Linux AS R4 (Nahant Update 3)
Oracle Application Server 10.1.2.0.2
DB and CRS are 10.2.0.2; ASM 2.0
The database servers are configured in RAC. To allow the DB servers to access the same database, a network storage unit is required that can be shared between the three servers. This was resolved using a set of SAN disks (described in Table 9–2), connected to servers with fibre channel. The SAN provides fast and easy disk access from the database servers, and it is easily configurable using a web-based client tool.
Table 9-2 SAN Disks
Item | Description |
---|---|
Vendor | EMC |
Model and version | EMC Clarion CX 300 |
Disks | 3 x 33 GB, Fibre Channel, RAID 0 configuration (disk0, disk1, disk2); 2 x 33 GB, Fibre Channel, RAID 1 configuration (disk3) |
Additional information | disk0, disk1 and disk2 were added to the DATADG ASM disk group and are used to contain the database datafiles and spfile. They are visible from db01, db02 and db03 as /dev/sdc, /dev/sdd and /dev/sde. disk3 was divided into 3 partitions; the first two are used as raw devices for the voting disk (/dev/sdb1) and the cluster repository (/dev/sdb2). The third (/dev/sdb3) was added to the LOGDG ASM disk group and holds the database redo logs. |
The one-way test was performed before and after the performance improvements in the code were implemented. The details of the test scenario and results are shown below.
Total number of messages for which the test was run = 100,000 in each mid-tier
Number of client programs running = 1 in each middle tier machine
Number of messaging servers running = 1 in each middle tier machine
Number of client threads sending messages from each middle tier = 10
Number of messages each client thread sends from each middle tier = 10,000
Delay between two send operations = 0
Number of driver sending threads in each middle tier messaging server instance = 8
Note:
The driver used is a dummy driver; it does not connect to any real SMSC or SMSC simulator. The actual test results in a real live scenario may differ slightly.
After applying the performance enhancement patch, the one-way throughput results are illustrated in Figure 9–3.
Figure 9-3 One-Way Throughput Test Results
Table 9–3 shows one-way test data generated using a dummy driver.
Table 9-3 One-way Test Detail
Test Detail | Send Throughput (msg/sec) |
---|---|
Test 1: Result for one-way testing (Send) after performance enhancement (tuning messaging server and database) with one DB node configuration. | 185 |
Test 2: Result for one-way testing (Send) after performance enhancement (tuning messaging server and database) with two DB node configuration. | 290 |
Test 3: Result for one-way testing (Send) after performance enhancement (tuning messaging server and database) with three DB node configuration. | 420 |
Note:
The throughputs reported are steady-state throughputs (since the system and the test have a ramp-up period).
The two-way test was run before and after applying the performance enhancements. The details of the test scenario and the results are below.
Total number of messages for which the test was run = 20,000 on each mid tier.
Number of client programs running = 1 in each middle tier machine
Number of messaging servers running = 1 in each middle tier machine
Client configuration:
Number of receiving threads set for the client running from each middle tier = 4.
Driver configuration:
Number of driver sending threads in each middle tier messaging server instance = 3.
Number of driver receiving threads in each middle tier messaging server instance = 2.
Note:
The driver used is a dummy driver and does not connect to any real SMSC or SMSC simulator. The actual test results in a real live scenario may differ slightly.
The two-way test scenario results are shown below.
Figure 9-4 Two-Way Throughput Test Results
Table 9–4 shows two-way test data generated using a dummy driver.
Table 9-4 Two-way Test Data
Test Detail | Send Throughput (msg/sec) | Receive Throughput (msg/sec) |
---|---|---|
Test 1: Result for two-way testing (Send and Receive) after performance enhancements (tuning messaging server and database) with one DB node configuration. | 86 | 140 |
Test 2: Result for two-way testing (Send and Receive) after performance enhancements (tuning messaging server and database) with two DB node configuration. | 130 | 140 |
Test 3: Result for two-way testing (Send and Receive) after performance enhancements (tuning messaging server and database) with three DB node configuration. | 160 | 160 |
Note:
The throughputs reported are steady-state throughputs (since the system and the test have a ramp-up period).