3
Scaling Applications for Real Application Clusters

This chapter describes methods for scaling applications for deployment in Oracle Real Application Clusters environments. This chapter provides a methodical approach to application design as well as procedures for resolving application performance issues. Topics in this chapter include:

Overview of Development Techniques in Real Application Clusters

In general, application deployment for Oracle Real Application Clusters should not be significantly different than application deployment for single instance environments. There are, however, special topics and particular techniques that you should keep in mind. This chapter explains these issues and provides a high-level development methodology for deploying Real Application Clusters-based applications to optimize Oracle9i features.

Begin with an Analysis

To use Real Application Clusters to improve overall database throughput, conduct a detailed analysis of your database design and your application's workload. This ensures that you fully exploit the added processing power provided by the additional nodes. Even if you are using Real Application Clusters only for high availability, careful analysis enables you to more accurately predict your system resource requirements.

A primary characteristic of high performance Real Application Clusters systems is that they minimize the computing resources used for Cache Fusion processing. That is, these systems minimize the number of inter-instance resource operations. Before beginning your analysis, however, you must understand how Real Application Clusters accesses database blocks when processing SQL statements that are issued by your applications. This is described in the following section.

SQL Statement Execution in Real Application Clusters

Most transactions involve a mixture of INSERT, UPDATE, DELETE, and SELECT statements. Exactly what percentage of these statement types that each transaction uses depends on the transaction type. Likewise, each of these operations accesses certain types of database blocks. You can categorize these block types as:

Data blocks
Index blocks (root, branch, and leaf)
Segment header blocks
Rollback segment header blocks
Rollback segment blocks

Block access modes control concurrent access to database blocks in a cache. In the buffer cache, a block can be accessed in any of the following modes:

X, or exclusive current read (XCUR)
S, or shared current read (SCUR)
N, or null, or consistent read (CR)

The Global Cache Service (GCS) maintains block access modes. To see a buffer's state as well as information about each buffer header, query the STATUS column of the V$BH dynamic performance view.

See Also:

Oracle9i Real Application Clusters Concepts for a detailed description of how the Global Cache Service performs data access

Block Accesses During INSERT Statement Execution

When Oracle executes INSERT statements, Oracle reads the segment header of a database object. This might mean that an INSERT statement must read a segment free list in the segment header of a table or an index to locate a block with sufficient space into which to fit a new row.

Therefore, to process inserts, Oracle reads the current, or most up-to-date version of the header block. If there is enough free space in the block after completing the insert, then the block remains on the free list and the transaction reads the corresponding data block and writes to it. For this sequence of events:

The segment header is acquired in SCUR mode, which means that the instance must request a global S mode
The data block is acquired in XCUR mode, or globally in X mode

If the remaining free space in the block is insufficient after the insert operation, then Oracle removes the block from the free list. This means Oracle updates the segment header block containing the free list. For this sequence of events:

The segment header block is first acquired in SCUR mode (global S mode).
After checking the block, Oracle escalates the buffer access mode to XCUR, (global X).
Oracle removes the block from the free list.
If a new block beyond the current high water mark is used, then Oracle raises the high water mark.
The data block is read in XCUR mode and written to disk.

This scenario assumes that freelist groups were not defined for the segment. In that case, Oracle stores the high water mark as well as a map of allocated extents in the segment header.

If Oracle allocates an additional extent to insert into an object, then Oracle raises the high water mark and updates the extent map. In other words, Oracle changes the segment header block in a consistent fashion; this also requires that Oracle lock the header block in exclusive mode.

Note:

For the preceding explanations and the following descriptions, assume Oracle has cached all the blocks required for the operation in memory.

For an insert into a table with an index, even more data block accesses are required. First, Oracle reads the header block of the index segment in SCUR mode, then Oracle reads the root and branch blocks in SCUR mode. Finally, Oracle reads the leaf block in XCUR mode.

Depending on the height of the index tree, Oracle would also have to read more branch blocks. If a free list modification is required, then Oracle escalates the index segment header access mode to XCUR. If there is concurrency for the segment header due to free list modifications, then Oracle sends the header block back and forth between multiple instances. Using free list groups at table creation effectively achieves free list partitioning.

Block Accesses During UPDATE Statement Execution

An UPDATE statement always acquires a database block in its current version and sets the buffer to XCUR mode. Globally, this maps to a request for the block in S mode. Assuming all blocks are cached, an UPDATE transaction:

Reads the buffer in XCUR mode and gets the block in X mode.
Writes to the buffer and modifies a row.
If the updated row fits into the same block, then the instance does not need to acquire new blocks from the free list and the modification is complete; segment header access is unnecessary.
The instance retains exclusive access and the GCS sends the block to the requesting instance.
The local instance releases the resource or retains it in NULL mode. The local instance also requests the block in exclusive mode for subsequent updates; this can involve other Cache Fusion block transfers.

If the table has an index, then the UPDATE statement:

Reads the root block of the index in SCUR mode.
Reads one or more branch blocks in SCUR mode.
Reads the leaf block and pins it into the cache in SCUR mode.
Reads the data block in XCUR mode.
Modifies the data block.

If the index key value was changed, then Oracle:

Re-reads the root and branch blocks in SCUR mode.
Reads the leaf block in XCUR mode.
Modifies the index key value for the updated row. This can involve deleting the key value from the current block and acquiring another leaf block in XCUR mode.

During the update operation with an index, a block can be transferred out of the cache of the updating instance at any time and would have to be reacquired. The shared global resources on the root and branch blocks are not a performance issue, as long as another instance reads only these blocks. If a branch block has to be modified because a leaf block splits, then Oracle escalates the S mode to an X mode, thus increasing the probability of conflict. The dispersion of key values in an index can be very important. With monotonically increasing index keys, a hot spot can be created in the right edge of the index key.

Block Accesses During DELETE Statement Execution

Oracle accesses blocks in the cache for a delete in a similar way that it does for an update. Oracle scans the table for the block containing the row to be deleted. Therefore, if the table does not have an index, the transaction reads the segment header, reads the block, and then modifies the block. The transaction can create free space in the block so that if the data in the block drops below PCTUSED, the block is linked to the free list.

Consequently, the transaction acquires the segment header or free list group block in exclusive mode. Then the block in question is returned to the instance's free list group, if there is one. To avoid excess overhead during peak processing periods, you should schedule massive deletions to occur during off-peak hours.

Block Accesses During SELECT Statement Execution

A SELECT statement reads a buffer in either SCUR or CR mode. To access a buffer in SCUR mode, such as for segment headers, a global shared resource must be acquired. Most of the buffer accesses for SELECT statements are in CR mode and may not involve global resource operations. When a transaction needs to acquire a buffer in CR mode, three scenarios are possible:

If another instance holds the needed block in exclusive mode, then the holding instance creates a consistent read version of the block and ships the block to the requesting instance. No resource operations are involved.
If another instance holds the needed block in shared mode, then the holding instance transfers a copy of the block to the requesting instance; the requesting instance is granted a shared mode on the block.
If no other instance has the block in its cache, then the requesting instance is granted a shared mode on the block and the instance reads it from disk.

For tables that have read-only data, you can greatly minimize the overhead for SELECT statements by putting read-only tables into read-only tablespaces. Do this using the ALTER TABLESPACE READ ONLY statement. Making a tablespace read-only has two main advantages:

Workload Distribution Concepts in Real Application Clusters

Partitioning distributes workloads among existing Real Application Clusters instances to effectively use hardware resources. With Cache Fusion in Real Application Clusters, partitioning is less critical. This is because the default cache coherency mechanism consumes fewer machine resources than the I/O-based forced disk write architecture used in previous Oracle releases.

To reduce Real Application Clusters overhead, each instance in a cluster should ideally perform most DML operations against a set of database tables that is not frequently modified by other instances. However, variables such as CPU and memory use are also important factors. In many cases, the performance and scalability gains of distributing data access based on load surpasses the loss of performance due to excess inter-instance communication.

There are no strict rules about implementing application partitioning. In general, however, you can apply several strategies to partition application workloads. These strategies are not necessarily mutually exclusive and are discussed in the following sections:

Functional Partitioning

Functional partitioning is often the first logical approach to achieve an optimally performing environment in terms of Real Application Clusters overhead. Modules and functional areas usually share only a small subset of Oracle objects, so contention is limited.

On the other hand, as you integrate all modules of your application, there will always be common objects for a given set of modules on any workload. In other words, it is impossible to completely eliminate Real Application Clusters overhead. Therefore, the ideal partitioning strategy depends on how the modules interact, as well as on how each module uses system resources.

Separating E-Commerce and Data Warehousing Processing

Another application partitioning method is to separate online e-commerce processing from data warehousing workloads. For example, by executing long running reports on one node, you can reduce excessive CPU use on another node dedicated to OLTP. This improves overall response times for OLTP users while providing more CPU power for reporting.

Although some Real Application Clusters overhead is expected when the reports read data recently modified by OLTP transactions, it is very unlikely that the overhead substantially affects performance. Reports require consistent read versions of buffers modified by the OLTP instance. To accommodate the data warehousing instance's requests, the OLTP instance constructs the consistent read buffers and transfers them to the data warehousing instance. In this case, there are minimal resource operations.

Departmental and User Partitioning

An alternative partitioning method that increases scalability is departmental and user partitioning. There are several methods for implementing departmental and user partitioning.

For one type of departmental partitioning, separate the tables by access groups based on geographic location. For example, assume a hotel reservation system processes room requests for hotels worldwide. In this case, you might partition the application by geographic markets such as:

European Market
North American Market
Central and South American Market
Asia Pacific Market

Physical Table Partitioning

By accurately implementing table partitioning by range, you can reduce concurrent access to the same blocks of a particular table from different instances. An example of employing table partitioning to reduce overhead is the way Oracle Applications 11i implements batch job processing.

One of the main tables used for batch processing in Oracle Applications is FND_CONCURRENT_REQUESTS, which is a batch queue table. When a user requests a batch job, Oracle inserts a row into the queue. The Concurrent Managers are processes that periodically query the queue, pick up requests to be run, and update the requests' statuses.

In a Real Application Clusters configuration with Concurrent Managers processing batch requests from two instances, there can be a high volume of DML statements on FND_CONCURRENT_REQUESTS. The DML statements might consist of INSERT statements from the requesting users and UPDATE statements from the Concurrent Managers. This can increase the cache transfer frequencies for blocks belonging to that table and its indexes.

To reduce Real Application Clusters overhead, partition the FND_CONCURRENT_REQUESTS based on the INSTANCE_NUMBER column. When a user submits a request, Oracle reads INSTANCE_NUMBER from V$INSTANCE and stores its value with other request information. That way, Oracle places requests generated from users connected to different instances on different partitions.

By default, each Concurrent Manager only processes requests from the instance to which it is connected, which means a single table partition. This nearly eliminates contention on FND_CONCURRENT_REQUESTS and its indexes, most of which were created as local indexes.

Similar table partitioning techniques can be very effective at reducing Real Application Clusters overhead for tables subject to very high volumes of DML activity. However, carefully consider the development costs associated with application changes needed to implement that type of solution. With Cache Fusion, most applications can achieve acceptable scalability without code changes.

See Also:

Oracle9i Database Administrator's Guide for more information on creating and managing partitioned tables

Transaction Partitioning

Transaction partitioning is the lowest level partitioning method. This method requires a three-tiered architecture where clients and servers are separated by a transaction monitor processing layer. Based on the content of a transaction, the transaction monitor routes transactions that act on specific tables by way of specific nodes. The correct node for execution of the transaction is a function of the actual data values being used in the transaction. This process is more commonly known as data dependent routing.

Using this method, you can create and load your tables using any method because the transaction monitor determines which node processes a particular transaction. Transaction partitioning also enables you to achieve fine-grained transaction control. This makes transaction processing monitors very scalable. However, significant development effort is required to deploy this method.

You can accomplish data-dependent routing in one of two ways. If the partitioning of the tables fits well within actual partition usage patterns, in other words, you partitioned the table by locations and users are similarly partitionable, then you can accomplish manual routing by having users connect to the instance that is running the relevant application. Otherwise, the administration of data-dependent routing can be complex and can involve additional application code.

You can simplify the process if the application uses a transaction monitor or remote procedure call (RPC) mechanism. It is possible to place code into the transaction monitor's configuration that defines a data-dependent routing strategy. You must base this code on the input RPC arguments. Similarly, you could code this strategy within the procedural code using case statements to determine which instance should execute a particular transaction.

Workload Characterization in Real Application Clusters

One of the most important steps in developing scalable systems is to perform workload characterization studies. By understanding the application load characteristics, you can properly plan for growth and make use of the system resources available to provide optimal performance and scalability.

An accurate workload characterization can help you decide how to partition your application for Real Application Clusters. The steps discussed in this section describe a methodology for workload characterization to implement functional partitioning:

Step 1: Define Your System's Major Functional Areas

Identify the major functions of your application. For example, assume a major hotel chain develops a system to automate the following high-level functions:

Reservations
Property Management and Maintenance
Sales and Marketing
Front Desk, Concierge, and Dining Facilities Management

Also determine which users are going to access the data from each of the functional areas.

Step 2: Estimate Each Functional Area's System Resource Consumption

It is important to estimate how much system resources, such as CPU, memory, and so on, that each module or functional area is expected to consume during peak system use. If the system is not yet in production, then this involves predicting the behavior of hypothetical workloads. Often, this estimation is not very precise.

If your system is already in production, or if you have a test system with similar characteristics, then it is easier to compute key performance indicators by module or functional area. For that, you should have an easy way to determine what application module a given database session is running. The PROGRAM column in the V$SESSION view gives you the name of the executable running on the client side. If different modules run the same executable, then that information is not adequate. In these cases, use Oracle's DBMS_APPLICATION_INFO package to provide additional information.

For instance, Oracle Applications 11i calls the DBMS_APPLICATION_INFO.SET_MODULE procedure to register the Oracle Forms and Oracle Reports names in the MODULE column in V$SESSION. This is useful because you can gather all relevant session statistics from V$SESSTAT, join them to V$SESSION and group them by MODULE. Then you can break down instance statistics by module to produce a workload profile. The following example syntax does this:

SELECT s.module, SUM(st.value)
FROM V$SESSION S, V$SESSTAT ST
WHERE s.sid=st.sid
AND st.statistic#=12 /* CPU used by this session */
GROUP BY s.module

Another Oracle workload characterization feature is the BEFORE LOGOFF ON DATABASE event trigger. Use this trigger with the information from V$SESSION to gather relevant statistics from sessions just before they disconnect and store those statistics in a table. Because this operation requires only one INSERT ... AS SELECT statement for each session, the associated overhead can be minimal. However, take care not to gather too many unnecessary statistics that might increase the trigger overhead. With that in mind, you can use this trigger in a production environment with minimal effect on performance. The following syntax is an example of this:

CREATE OR REPLACE TRIGGER my_logoff
BEFORE LOGOFF ON DATABASE
BEGIN
 INSERT INTO w_session 
  (instance_number, logoff_time, sid, audsid, module, program, cpu, pgamem)
  SELECT p.value, 
SYSDATE,s1.sid,s1.audsid,s1.module,s1.program,s2.value,s3.value
  FROM V$SESSION s1, V$MYSTAT s2, V$MYSTAT s3, V$PARAMETER p
  WHERE s2.statistic#=12  /* CPU used by this session */ 
and s3.statistic#=21 /* session pga memory max */ 
AND s1.sid=s2.sid
AND s1.sid=s3.sid
AND p.name='instance_number';
END;
/

After computing CPU and memory use for each module during peak system use, the possibilities in terms of application partitioning become clearer and solutions also become more obvious. With this type of information, you can distribute application modules among existing instances so that you can fully exploit the capacity of each node.

However, you should not base your partitioning strategy only on system resource consumption. In some cases, in high volume OLTP systems with different application modules frequently modifying the same tables, Real Application Clusters overhead can become an important factor for determining the ideal scaling configuration. In this case, also consider how each functional area accesses data.

See Also:

Oracle9i Supplied PL/SQL Packages Reference for more information about Oracle packages

Step 3: Analyze Each Functional Area's Data Access Pattern

Functional areas that access disjoint sets of tables perform best with Real Application Clusters. For that reason, your focus in data access analysis should be to identify and study tables that are accessed by more than one functional area--in other words, overlaps among functional areas.

Step 3.1: Identify Table Access Requirements and Define Overlaps

Determine which tables each functional area accesses and identify the overlaps. Overlaps are tables that users from more than one functional area access. Table 3-1 shows the overlapping tables from this example in bold; the remaining tables are accessed exclusively by the functions denoted by the column headings.

Table 3-1 Example of Overlapping Tables

Hotel Reservation Operations	Front Desk Operations
Table 1	Table 12
Table 7	Table 14
Table 15	Table 15
Table 11	Table 16
Table 19	Table 19
Table 20	Table 20

Your objective is to identify overlaps that can cause global conflicts and thus might adversely affect application performance. In this example, both functions concurrently access three tables. The remaining tables that are accessed exclusively require fewer resources.

Step 3.2: Define the Access Type for Each Overlap

Determine the access type for each overlap as shown in Table 3-2.

Table 3-2 Example of Table Access Types

Overlap Access Type by Reservations	Overlapping Tables	Overlap Access Type by Front Desk
S (Select) I (Insert) U (Update)	Table 15 Table 19 Table 20	S I U

Overlap Access Type by Reservations

Overlapping Tables

Overlap Access Type by Front Desk

S (Select)

I (Insert)

U (Update)

Table 15

Table 19

Table 20

In this example, both functions access:

Table 15 for selects
Table 19 for inserts
Table 20 for updates

Step 3.3: Identify Transaction Volumes

Estimate the number of transactions that you expect the overlaps to generate as shown in Table 3-3.

Table 3-3 Example of Table Transaction Volumes

Transaction Overlap by Reservations	Overlaps	Transaction Overlap by Front Desk
S (10 per second) I (100 per second) U (50 per second)	Table 15 Table 19 Table 20	S (50 per second) I (10 per second) U (90 per second)

Transaction Overlap by Reservations

Overlaps

Transaction Overlap by Front Desk

S (10 per second)

I (100 per second)

U (50 per second)

Table 15

Table 19

Table 20

S (50 per second)

I (10 per second)

U (90 per second)

Given these transaction volumes, the overlap tables can be a performance problem. However, if the application infrequently accesses the tables, the volumes shown in Table 3-3 may not be a problem.

Step 3.4: Classify Overlaps

Use the following criteria to determine how to improve the scalability of tables accessed by more than one functional area:

Ignore non-overlapping tables, select-only overlaps, and low-frequency overlaps. From the previous example, you would ignore tables 1, 7, 11, 12, 14, 15, and 16.
Partition your application to minimize the number of high frequency overlaps occurring from different instances. In other words, try to place functional areas subject to high frequency overlaps in the same instance.
Remaining overlaps can be resolved using other techniques like departmental partitioning or physical table partitioning.
Keep in mind that some overlapping occurs in any system, and in most cases, Cache Fusion provides acceptable performance with no need for code changes.
You can minimize the effect of high frequency INSERT overlaps by using free list groups in the insert intensive tables.

See Also :
"Using Free List Groups For Concurrent Inserts from Multiple Nodes"

Scaling-Up and Partitioning in Real Application Clusters

If you have properly partitioned your application for Real Application Clusters, then as the size of your database increases you can maintain the same partitioning strategy and simultaneously achieve optimal performance. The partitioning method to use when adding new functionality depends on the types of data the new functions access. If the functions access disjoint data, then your existing partitioning scheme should be adequate. If the new functions access the same data as the existing functions, then you may need to change your partitioning strategy.

If your application attracts more users than you expected, then you may need to add more instances. Adding a new instance can also require that you repartition your application.

Before adding instances to your Real Application Clusters environment, analyze the new instance's data access requirements. If the new instance accesses its own subset of data, or data that is not accessed by existing instances, then your current partitioning strategy should adequately prevent data contention. However, if the new instance accesses existing data, consider the following issues:

If you are adding new functionality to the new instance and the new functionality requires access to existing tables, then consider revising your partitioning strategy.
You may also need to alter your partitioning strategy if you reassign some users of an existing application to the additional instance.

See Also:
Oracle9i Real Application Clusters Administration for information about adding instances

3 Scaling Applications for Real Application Clusters

Overview of Development Techniques in Real Application Clusters

Begin with an Analysis

SQL Statement Execution in Real Application Clusters

Block Accesses During INSERT Statement Execution

Block Accesses During UPDATE Statement Execution

Block Accesses During DELETE Statement Execution

Block Accesses During SELECT Statement Execution

Workload Distribution Concepts in Real Application Clusters

Functional Partitioning

Separating E-Commerce and Data Warehousing Processing

Departmental and User Partitioning

Physical Table Partitioning

Transaction Partitioning

Workload Characterization in Real Application Clusters

Step 1: Define Your System's Major Functional Areas

Step 2: Estimate Each Functional Area's System Resource Consumption

Step 3: Analyze Each Functional Area's Data Access Pattern

Step 3.1: Identify Table Access Requirements and Define Overlaps

Table 3-1 Example of Overlapping Tables

Step 3.2: Define the Access Type for Each Overlap

Step 3.3: Identify Transaction Volumes

Step 3.4: Classify Overlaps

Scaling-Up and Partitioning in Real Application Clusters

3
Scaling Applications for Real Application Clusters