6
Database Design Techniques

This chapter describes database design techniques for Oracle Parallel Server environments. The sections in this chapter include:

Principles of Database Design for Oracle Parallel Server

When designing database layouts for shared Oracle Parallel Server databases, remember that accessing globally shared data from multiple nodes increases transaction processing costs. In other words, multi-node transactions incur more wait time and higher CPU consumption than transactions processed on single-node systems. Because of this, carefully consider the database access characteristics of your applications you can create scalable database designs. In general, you can achieve scalable systems by:

Assigning transactions with similar data access characteristics to specific nodes
Creating data objects with parameters that enable more efficient access when globally shared

Many of the principles of the above to points have been covered in Chapter 5. The most scalable and efficient application designs for clustered systems enable a high degree of transaction affinity to the data the transactions access on the nodes. The more that your application' s data access is local and thus does not require cross-instance synchronization, the more efficient your application.

All systems have a certain proportion of data with lower node affinity. This data is shared across the cluster and thus requires synchronization. Cache Fusion reduces the costs associated with globally shared database partitions by more efficiently keeping data synchronized across multiple nodes. This increases the data's availability to all nodes in the cluster. Some features exist that help optimize concurrent access to globally shared data.

Some database resources can become critical when certain transactions execute in an Oracle Parallel Server environment. A high rate of inter-instance access to blocks in the same table can cause increased I/O, messaging, context switches and general processing overhead. If a table has one or more indexes to maintain, the cost may increase even more due to the relative complexity of an index access. Searching for free space and allocating it when inserting new data requires access to space management structures, such as segment free lists. Also, generating unique sequence numbers using an Oracle sequence number can become a severe bottleneck if every node in the cluster uses it.

All resources can be either locally cached on disk or they must be re-generated frequently. Each type of database operation, such as INSERTs, UPDATEs, DELETEs, and SELECTs, require different types of resources depending on whether your application runs in shared or exclusive mode.

Database Operations, Block Types, and Access Control

Most business transactions involve a mixture of INSERTs, UPDATEs, DELETEs, and SELECTs. Exactly what percentage of each of these a transaction uses depends on the business transaction type.

Likewise, each of these operations accesses certain types of data blocks. These block types can be categorized as:

Data blocks
Index blocks (root, branch, leaf)
Segment header blocks
Rollback segment header blocks
Rollback segment blocks

Concurrent access to data blocks in a cache is controlled by access or lock modes. In the buffer cache, a block can be accessed in exclusive current read (XCUR), shared current read (SCUR), or consistent read (CR) mode. To guarantee global cache coherency, these access modes map to global lock modes as shown in Table 6-1:

Table 6-1 Parallel Cache Management Lock Mode and Buffer State

Parallel Cache Management Lock Mode	Buffer State Name	Description
X	XCUR	Instance has an EXCLUSIVE lock for this buffer.
S	SCUR	Instance has a SHARED lock for this buffer.
N	CR	Instance has a NULL lock for this buffer.

Each operation accesses certain types of blocks in a particular mode, as shown in Figure 6-1:

Figure 6-1 Modes of Block Access

Block Accesses During INSERTS

When Oracle processes an INSERT, it reads the segment header of a database object. This might mean that the INSERT must read the segment header of a table or an index to locate a block with sufficient space in which to fit a new row into the segment free list. Therefore, to process INSERTs, Oracle reads the CURRENT, or most up-to-date version of the header block. If there is enough free space in the block after completing the INSERT, the block remains on the free list and the transaction reads the corresponding data block and writes to it. For this sequence of events:

The segment header is acquired in SCUR mode, which means that the instance must request a global S lock
The data block is acquired in XCUR mode, or globally in X mode

If the free space in the block is insufficient after inserting, Oracle unlinks the block from the free list. This means Oracle updates the segment header block containing the free list. For this sequence of events:

The segment header block is first acquired in SCUR mode (global S lock)
After checking the block, Oracle then escalates the buffer access mode to XCUR, (global X)
Oracle removes the block from the free list
If a new block beyond the current highwater mark is used, Oracle raises the highwater mark
The data block is read in XCUR mode and written to disk

This scenario assumes that the highwater mark of the segment, or the last or highest block in the segment containing data, resides in an allocated extent. If no free list groups were defined, the highwater mark as well as a map of allocated extents is stored in the segment header. If Oracle must allocate an additional extent to insert the object, Oracle raises the highwater mark and updates the extent map. In other words, Oracle changes the segment header block in a consistent fashion; this also requires Oracle to lock the header block in exclusive mode.

Note:
For the preceding explanations and the following descriptions, assume that all the blocks required for the operation are cached in memory.

For an insert into a table with an index, even more data block accesses are required. First, Oracle reads the header block of the index segment in SCUR mode, then Oracle reads the root and branch blocks in SCUR mode. Finally, Oracle reads the leaf block in XCUR mode. Depending on the height of the index tree, Oracle would also have to read more branch blocks. If a free list modification is required, Oracle must escalate the index segment header lock mode to XCUR mode. If there is concurrency for the segment header due to free list modifications, the header block can ping back and forth between multiple instances.

Using FREELIST GROUPS at table creation effectively achieves free list partitioning. The number of FREELIST GROUPS that you define for an object should a least match the number of nodes in the cluster that participate in INSERTs. With free list groups, the free lists are in a separate block that is physically located after the header block. Processes inserting rows into the same objects from different nodes must hash to different free list group blocks. Hence, the free list group blocks remain local to a particular instance. This means Oracle does not need to acquire globally conflicting locks.

Static and Dynamic Extent Allocation

There are two methods with which Oracle allocates extents to provide sufficient free space for newly inserted rows. One method requires manual intervention; the other is derived automatically from certain settings. Static allocation requires, for example, that you issue statements such as:

   CREATE TABLE
   STORAGE (FREELIST GROUPS 2)
   ALTER TABLE 
   ALLOCATE EXTENT
   FILE SIZE INSTANCE;

when creating or altering a table or index. This type of statement allows the allocation of extents of data blocks from particular physical files to free list groups and thus to instances. Note that you must set the parameter INSTANCE_NUMBER to ensure than an instance consistently uses a particular allocation of free list groups. If you do not set a value for INSTANCE_NUMBER, Oracle allocates the space to the object; Oracle does not take the space from a particular free list group. Instead, Oracle uses the master free list in the general segment header.

When Oracle pre-allocates extents in this manner, Oracle contiguously allocates blocks to a particular free list group. Files containing these objects are good candidates for 1:N locks with a blocking factor; in other words, use the ! syntax when you set the GC_FILES_TO_LOCKS parameter and when you set the blocking factor to the extent size allocated. For example, if your extent size is 10 blocks for file number 4 consisting of 1000 blocks, then use the following syntax:

   GC_FILES_TO_LOCKS3D "43D1000!10"

This ensures that the blocks in that extent are covered by one lock.

For dynamic allocations, simply set GC_FILES_TO_LOCKS with a blocking factor, as described in the previous syntax. The space management layer determines the new extent size based on the value for !n.

You can use several methods to define this, for example:

   GC_FILES_TO_LOCKS3D"43D1000!10"

   GC_FILES_TO_LOCKS3D"43D1000!10R"

   GC_FILES_TO_LOCKS3D"43D0!10"

where the first example assigns 10 contiguous blocks to a fixed lock out of 1000 locks pooled for this file. The second example assigns a releasable lock in the same manner, and the third uses releasable locks out of an unlimited pool.

Depending on which method you use, a certain number of contiguous blocks comprise an extent of free space and these blocks are covered by the same lock. The method you use depends on your application's requirements. If ease of use is a high priority and the data files are extensible, use dynamic allocation as shown by the third entry in the previous example set. However, static allocation has the advantage of reducing run-time space management and the required data dictionary table and row cache updates. This is because the extents are already allocated.

In summary, when designing for INSERT-intensive transactions in Oracle Parallel Server, if you have identified certain tables as "insert only" when determining the partitioning strategy, then:

Run inserting transactions only from one node
Use free list groups on these tables and on any indexes associated with them
Use the ! syntax for the GC_FILES_TO_LOCKS parameter when dynamically allocating space
Pre-allocate extents to the table or index using the CREATE TABLE... ALLOCATE EXTENTS or ALTER TABLE... ALLOCATE EXTENTS statements and set the blocking factor ! for the GC_FILES_TO_LOCKS parameter to the size of the extents

Block Accesses During UPDATES

An UPDATE statement always reads a database block in its current version and sets the buffer to XCUR mode. Globally, this maps to a request for an X lock on the block. Assuming all blocks are cached, the transaction:

Reads the buffer in XCUR mode and get a global X lock
Writes to the buffer and modify a row
If the updated row fits into the same block, the instance does not need to acquire new blocks from the free list and the modification is complete; segment header access is unnecessary
The instance retains the global X lock until another instance requests a lock on the block in a conflicting mode, so Oracle writes the dirty buffer to disk for the other instance to "see" the most current changes
Oracle closes the lock or retains it in NULL mode, and Oracle reads the buffer from disk if the local instance requests the block for subsequent updates; this is known as a "forced read"

If Oracle has built an index on the table, the UPDATE:

Reads the root block of the index in SCUR mode
Reads one or more branch blocks in SCUR mode
Reads the leaf block and pins it into the cache in SCUR mode
Reads the data block in XCUR mode
Modifies the data block

If the index key value was changed, Oracle:

Rereads the root and branch blocks in SCUR mode
Reads the leaf block in XCUR mode
Modifies the index key value for the updated row

During the update operation with an index, a block can be "pinged" out of the cache of the updating instance at any time and would have to be reacquired. The shared global locks on the root and branch blocks are not an issue, as long as another instance reads only these blocks. If a branch block has to be modified because a leaf block splits, Oracle escalates the S lock to an X lock, thus increasing the probability of conflict.

It is therefore essential to clearly identify frequently updated tables when you establish your partitioning strategy. The update frequency, randomness of access, and data referencing patterns eventually determine the layout and locking strategy you use. For random access, a locking policy using 1:1 releasable locks is appropriate. If certain transactions access the same range of blocks repetitively, you can use a table partitioning and transaction routing strategy. Once data partitions within a table can be established, very few fixed locks are needed.

In the case of frequent random access, the probability of contention for the same data or index blocks can be decreased by setting PCTFREE to a higher value when creating and loading the table, so that a block is populated by fewer rows. However, the trade-off might be more block reads and more I/O to access the same number of rows.

See Also :
Oracle8i Parallel Server Concepts for more information about 1:1 and 1:n locks, and releasable and fixed lock durations.

Block Accesses During DELETES

Oracle accesses blocks in the cache for a DELETE in a similar way that it does for an update. Oracle scans the table for the block containing the row to be deleted. Therefore, if the table does not have an index, the transaction reads the segment header, reads the block, and then modifies the block. The transaction creates free space in the block so that if the data in the block drops below PCTUSED, the block is linked to the free list.

Consequently, the transaction acquires the segment header or free list group block in exclusive mode. The block in question is returned to the instance's free list group, if there is one. In general, you should schedule massive deletions to occur during off-peak hours and you should run them from one instance.

Block Accesses During SELECTS

A SELECT reads a buffer in either SCUR or CR mode. For an SCR, such as for segment headers or when the only readers in the system are for a particular block, a global S lock is acquired. When a block is modified or contains uncommitted changes, and a consistent read version is requested, the buffer containing that version will be in CR mode. If the most recent modifications are made on another node, a CR copy from the modifying node must be requested. Once the request has completed, no lock is retained on the instance that received the consistent read version of the block.

For a full table scan, a SELECT may have to read the segment header in order to determine the extent boundaries for the extents allocated to a table. As this requires Shared Current access to the buffer containing the header block, in Parallel Server the global S lock might conflict with an INSERT that attempts to modify the header block.

Global Cache Coherence Work and Block Classes

In general, data blocks, index blocks, rollback segment headers, free list group blocks, rollback segment blocks, and segment headers are considered to be different classes of blocks. All of these classes are subject to "pinging" because they represent structures of a shared database.

For rollback segment headers and rollback segment blocks, the effect of global cache synchronization is less significant, because instances should privately own rollback segments in Oracle Parallel Server. This limits writes to a rollback segment to the local instance. With Cache Fusion for consistent reads, the effect of rollback segment cache coherence operations is limited. Most consistent read information is generated by the local instance which creates a version of a data block and sends it directly to the requesting instance.

Without free list groups and with suboptimal physical storage parameters for an index or a table, segment header blocks can be "pinged" frequently. Configuring free list groups can alleviate this. As a general rule, segment header pings should amount to more than 5% of the total pings.

The V$CLASS_PING view lists the number of lock converts of a particular type for the different block classes. Use this view to monitor the proportion of pings for each block class of the total pings.

General Recommendations for Database Object Parameters

The following represent some general recommendations for database objects:

For tables or indexes that are expected to grow or shrink during regular OLTP processing, or objects that can have frequent INSERTs or DELETEs from multiple nodes, use FREELISTS and FREELIST GROUPS to build the objects.
Preallocate extents to the object's FREELIST GROUP and/or use a blocking factor when defining GC_FILES_TO_LOCKS. This ensures that blocks belonging to a particular extent are used by only one instance and that the blocks are covered by the same lock.
For objects and files with random local and remote access, restrict the number of rows contained in one block by selecting the correct database block size or by allowing more free space when loading data by setting PCTFREE to a higher value (the default is 10%). However, this results in increased space requirements.

Index Issues

In most high volume OLTP systems, inter-instance contention for index blocks increases the cost of Oracle Parallel Server processing and the commonly used B*Tree index structures are vulnerable to "pinging" at various levels. A "right-growing" tree can incur frequent pinging of one particular leaf block. While traversing the tree structure, branch blocks might have to be "forced read", because they were last modified by another instance. Leaf block splits are vulnerable because three blocks need to be modified in one transaction. Generally, some situations you should avoid are:

Leaf and branch block contention
Root block contention
Index segment header contention

The following section addresses how to avoid leaf and branch block contention.

Minimizing Leaf/Branch Block Contention

You can use various strategies to isolate or distribute access to different parts of the index and improve performance.

Use reverse-key indexes to avoid the right-growing index trees. By reversing the keys, you can achieve a broader spread of index keys over the leaf blocks of an index and thus lower the probability of accessing the same leaf and branch blocks from multiple instances. However, reverse key indexes do not allow index range scans, so carefully consider their use.

For indexes based on sequence numbers, assign different subsequences to each instance. In the case of database objects that can be partitioned based on certain characteristics, this might adequately distribute the access patterns.

For other sequentially assigned values, adjust the index value and use INSTANCE_NUMBER to generate the index key, as shown in the following example:

Note:
Use local partitioned indexes wherever possible.

Figure 6-2 shows how transactions operating on records stored in tables partitioned by range can minimize leaf and branch block contention.

Figure 6-2 Node Affinity for Transactions Against Tables Partitioned by Range

Locking Policy For Indexes

Determining the optimal locking policy for files that have indexes can be problematic. Usually, the best practice is to use releasable locks. Using fixed locks on indexes increases the potential for false pings. The following guidelines should help you when deciding how to assign locks:

Avoid a right-growing index tree by using reverse key indexes, if possible, or use techniques to spread the access over multiple leaf blocks
Partition index access by using either an appropriate index key value that is based on a logical partitioning key or by using local partitioned indexes wherever possible; route transactions to dedicated nodes based on key data values
If you cannot partition at all, use releasable locks
If you can use local index partitioning, assign fewer 1:N fixed locks to the files containing the partitions.

See Also:
Oracle8i Parallel Server Concepts for more information about 1:1 and 1:N locks, and releasable and fixed lock durations.

Using Sequence Numbers

When designing applications for Oracle Parallel Server, use sequence numbers whenever possible. To maximize the use of sequences, each instance's cache must be large enough to accommodate the sequences. The default cache size holds 20 sequence numbers. To increase this, for example to hold 200, use this syntax:

ALTER SEQUENCE SEQUENCE_NAME CACHE 200;

Calculating Sequence Number Cache Size

Base your estimates for sequence number cache size on three factors:

Transaction rate
Number of nodes in the cluster
Stability of the cluster

Using ordering suppresses caching in Oracle Parallel Server. It is normal to lose some numbers after executing the SHUTDOWN command. It is also not unusual to experience a complete loss of sequence numbers after instance failures.

External Sequence Generators

You should implement external sequence generators when:

Ordering is essential
Accountability for all numbers is required
There are high transaction rates
There are many instances
There are many sequence numbers

Note:
Avoid using database tables to hold sequence numbers.

Detecting Global Conflicts On Sequences

If sequences are insufficiently cached or not cached at all, severe performance problems may result with an increase in service times. This may also result in reduced scalability.

If you experience performance problems, examine statistics in the V$SYSTEM_EVENT view as described below to determine whether the problem is due to the use of Oracle sequences:

A problem with sequences appears in V$SYSTEM_EVENT as extended average wait times for "row cache locks" in the range of a few hundred milliseconds. The proportion of time waited for row cache locks of the total time waited for non-idle events will be very high.
For the DC_SEQUENCES parameter, the ratio of DLM_CONFLICTS to DLM_REQUESTS will be very high. If this ratio exceeds 10 to 15% and the row cache lock wait time is a significant portion of the total wait time, then it is likely that the deterioration of service times is due to insufficiently cached sequences.

Logical And Physical Database Layout

Base a physical partitioning strategy on the following considerations:

Transaction profiles for objects
- How often are they accessed?
- How are they accessed?
Function dependency for objects, if any
- Business function
- Data

You should group these objects into files and tablespaces so that you can apply consistent locking policies.

General Suggestions for Physical Layouts

Before grouping objects into tablespaces, create a pool of raw volumes that you can later assign to tablespaces. Because Oracle Parallel Server must use shared raw volumes or shared raw partitions to physically store data, the available disks should be partitioned into volumes of various sizes. Begin by considering the following recommendations:

Create a large pool of shared volumes to be assigned to logical partitions later.
Define standard sizes for raw volumes, such as 10M, 100M, 500M, and 1G.
Slice or stripe the volumes over as many disks as your initial I/O volume calculation indicates. Take into consideration that 10 to 15% read and write I/O should be added to the calculations for global cache synchronization I/O.
Review the plan. You may need to restructure the layout because some files will experience higher I/O volumes than others due to pings.

Your goal should be to create a flexible physical layout. Also make sure you account for the I/O required to preserve cache coherence and ensure that this will not adversely affect I/O latencies.

Tablespace Design

Your goal in tablespace design is to group database objects depending on their data access distributions. If you consider the dependency analyses and transaction profiles for your database objects, you can divide tablespaces into containers for the following objects:

Frequently and randomly accessed tables and indexes with a lower probability of having affinity to certain transactions or portions of the database
Tables and indexes that are mostly read or read-only and infrequently modified
Tables and indexes whose data can be viewed as subdivided into largely disjoint and autonomous data sets

Consider the following additional criteria for separating database objects into tablespaces:

Tables should be separated from indexes
Assign true, read-only tables to READ-ONLY tablespaces.
Each node should have TEMPORARY tablespaces
Group smaller reference tables in the same tablespace
Separate insert-only tables from other tables

Once you have logically grouped your database objects into partitions and physically assigned them to tablespaces, you can develop an efficient and adaptable locking policy as described in the following chapters.

Global Cache Lock Allocation

After completing your tablespace design, there will be a few distinct groupings based on their expected access distribution, estimated access frequency, and affinity to transactions and nodes. These groupings might include:

Frequently and randomly accessed files with little or no affinity to transactions or nodes and a high probability of false pinging
Read-only or "read-mostly" files
Frequently accessed files that are shared by multiple nodes but whose data are subdivided into largely autonomous data sets
Files containing rollback segments and temporary segments

Based on these subdivisions, assign locks as follows:

Assign releasable locks to files that are frequently and randomly modified by multiple nodes; this locking policy is often the best choice for index files
Do not assign locks to read-only tablespaces
Only assign fixed locks to read-mostly data that has a low probability of experiencing access conflicts and that has data portions that are accessed mostly in a disjoint fashion
Do not assign locks to rollback segments and files that belong to temporary tablespaces. Instead, allocate rollback segment locks by setting GC_ROLLBACK_LOCKS so that only a few fixed locks are necessary

Wherever you assign locks, assigned them as releasable if GC_RELEASABLE_LOCKS provides sufficient resources so that the cost of opening and closing locks can be minimized.

Conclusions And Guidelines

Your response time and throughput requirements ultimately determine how stringent and well-isolated your partitioning strategy needs to be and how much effort you should invest in achieving an optimal design. Cache Fusion removes a significant amount of overhead associated with consistent read transactions that request data other nodes. This allows you to implement a simpler database design and still achieve optimal performance.

A careful analysis of data access distributions in terms of transaction profiles and functional dependencies is the basis for allocating work to particular instances. Moreover, this makes a system more robust and more scalable.

Broadly speaking, 80% or more of overhead results from 20% or less of a given workload. Dealing with the 20% by observing some simple guidelines and caveats can produce real benefits with minimal effort:

Index block contention should be avoided by all means. However, if index key values are not modified by multiple instances, the overhead may be acceptable.
Globally shared data access may be acceptable if the probability of remote access and thus false pings is low.
Long running transaction that modify globally shared data must be avoided and should be run at times of lower system use.
Read-only tablespaces should be used wherever data remain constant.
Free list groups should be defined for partitioned as well as nonpartitioned data that are modified frequently.

In general, you should design your database for application partitioning and create the tablespaces to permit data striping. This simplifies the processing for parallel data loads and inter-instance parallel queries.

6Database Design Techniques