|Oracle8 Parallel Server Concepts & Administration
This chapter provides a conceptual framework for optimizing Oracle Parallel Server application design. It includes the following sections:
See Also: Oracle8 Tuning for a discussion of performance tuning principles and method.
The level of detail to which you must analyze an application depends upon your goals for the use of Oracle Parallel Server. If you need OPS to boost overall database throughput, then a detailed analysis of the database design and application workload profile will be necessary. This is to ensure that the additional CPU power provided by each node of the parallel server is fully used for application processing. Even if you are using Oracle Parallel Server primarily to provide high availability, careful analysis will enable you to predict the resources that will be needed by your workload.
Experience gained over many benchmark and real applications shows that, for optimal performance, Oracle Parallel Server systems must minimize the computing resources used for parallel cache management. This means minimizing the number of instance lock operations. A successful OPS implementation ensures that each node performs very few instance lock operations and subsequently the machine-to-machine high speed interconnect traffic is within the design limitations of the cluster.
You cannot successfully minimize the number of PCM lock operations during the final fine tuning phase of the database lifetime. Rather, you must plan this early in the physical database design process.
See Also: Chapter 13, "Designing a Database for Parallel Server", for a case study which shows how to design applications to take advantage of the Oracle Parallel Server.
To understand your application profile you must classify tables according to application functions and access patterns. This section describes:
The following comments apply equally to clustered tables or non-clustered tables.
Beyond performing the usual application and data analysis phases, the database designer for parallel server must anticipate the types of transactions or business functions that may cause excessive lock conversion rates. You must cross reference the core application tables and their access patterns with the application functions.
See Also: Chapter 13, "Designing a Database for Parallel Server", for worksheets you can use to analyze table access patterns.
With tables that are predominantly read-only, all Oracle Parallel Server nodes quickly initialize the PCM locks to shared mode and very little lock activity takes place. Read-only tables and their associated index structures require the allocation of very few PCM locks. With this table type you can expect good performance and scalability with Oracle Parallel Server.
Also consider putting tables in read-only tablespaces, using the SQL statement ALTER TABLESPACE READ ONLY. This has two advantages: it speeds up recovery, and no PCM instance locks are required.
Scalability of the parallel query on an Oracle Parallel Server environment is subject to the interconnect speed between the nodes. You may need to run high levels of parallelism just to keep the processors busy. It is not unusual to run a degree of parallelism three times the number of nodes (or processors).
These files should have their own PCM lock as specified in the GC_FILES_TO_LOCKS parameter, even if the application is read-only Large sorts, such as queries utilizing SORT MERGE JOINs, or with GROUP-BYs and ORDER-BYs, can update the data dictionary in the SYSTEM tablespace.
"Setting the Degree of Parallelism" in Oracle8 Tuning.
Random SELECT and UPDATE tables (that is, tables that are not partitioned) have transactions that may read and then subsequently update any of the rows in a table. This kind of access requires many lock conversions. First, the instance executing the transaction must obtain a shared PCM lock on the data block. This lock request may cause a lock downgrade operation on another node. The instance executing the transaction must finally obtain an exclusive mode PCM lock when the UPDATE is actually performed.
If user transactions on different Oracle Parallel Server nodes modify data blocks locked by the same PCM lock concurrently, there will be a noticeable performance penalty. In some cases you can reduce this contention by creating additional hashed PCM locks. In large tables, however, hardware and practical limitations may mean that the number of hashed PCM locks you can effectively use may be limited. For example, to reduce false contention you would need millions of hashed PCM locks--but memory limitations and startup time would make this impossible. On supported platforms, fine grain locks offer a viable and economical solution.
For this type of table, if none of the table's index keys are actually updated, then the index's PCM locks are only converted to shared mode and thus require few PCM locks.
Transactions on random INSERT, UPDATE and DELETE tables require reading a number of data blocks and then modifying some or all of the data blocks read. This process for each of the data blocks specified again requires converting the PCM lock to shared mode and then converting it to exclusive mode upon block modification. This process has the same performance issues as random SELECT and UPDATE tables.
For this table type more performance issues exist for two main reasons: index data blocks are changed, and contention occurs for data blocks on the table's free list.
In INSERT, DELETE and UPDATE transactions that modify indexed keys, you need to maintain the table's indexes. This process requires the modification of additional index blocks--and so the number of potential lock converts increases. In addition, index blocks will probably require additional lock converts since users on other nodes will be using the index to access other data. This applies particularly to the initial root components of the index where block splitting may be taking place. This causes more lock converts from null to exclusive and vice versa on all nodes within the cluster.
If the INSERT and DELETE operations are subject to long running transactions, then there is a high chance that another Oracle Parallel Server instance will require read consistency information to complete its transactions. This process will force yet more lock conversions as rollback segment data blocks are flushed to disk and are made available to other instances.
Index block contention involving high lock convert rates must be avoided at all costs, if performance is a critical issue in the Oracle Parallel Server implementation.
Index block contention can be made more extreme when using a sequence number generator to generate unique keys for a table from multiple OPS nodes. When generating unique keys, make the instance number part of the primary key so that each instance performs INSERTs into a different part of the index. Spreading the INSERT load over the full width of the index can improve both single and multiple instance performance.
In INSERT operations the allocation of free space within an extent may also cause high lock convert rates. This is because multiple instances may wish to insert new rows into the same data blocks, or into data blocks which are close together. If these data blocks are managed by the same PCM lock, there will be contention. To avoid this, create tables so as to allow the use of multiple free lists and multiple free list groups.
Having analyzed the application workload, you can now plan the application's OPS implementation. Using the access profile you can see which transactions will run well over multiple Oracle Parallel Server nodes, and which transactions should be executed within a single Oracle Parallel Server node. In many cases compromises and trade-offs are required to ensure that the application performs as needed.
Note: Load balancing between nodes should not be the main objective. Whereas load balancing is useful in a benchmarking situation, it may not be useful in a real-world application. Partitioning is the key to performance in an Oracle Parallel Server system.
This section covers the following topics:
The database designer must clearly understand the system performance implications and design trade-offs made by application partitioning. Always bear in mind that your goal is to minimize synchronization: this will result in optimized performance.
As noted earlier, if the number of lock conversions is minimized the performance of the Oracle Parallel Server system will be predictable and scalable. By partitioning the application and/or data you can create and maintain cache affinities of database data with respect to specific nodes of a cluster. A partitioned application ensures that a minimum number of lock conversions are performed, thus data block pinging and Integrated DLM activity should be very modest. If excessive IDLM lock activity occurs in a partitioned application, your partitioning strategy may be inappropriate, or the database creation and tuning process was incorrect.
Many partitioning techniques exist to achieve high system performance. One of the simplest ways to break up or partition the load upon the database is to run different applications that access the same database on different nodes of the cluster. For example, one application may only reference a fixed set of tables that reside in one set of datafiles, and another application may reference a different set of tables that reside in a different set of datafiles. These applications can be run on different nodes of a cluster and should yield good performance if the datafiles are assigned different PCM locks. There will be no conflict for the same database objects (since they are in different files) and hence no conflict for the same database blocks.
This scenario is particularly applicable to applications that during the day need to support many users and a high OLTP workload, and during the night need to run a high batch and decision support workload. In this case applications can be partitioned amongst the cluster nodes to sustain good OLTP performance during the day.
This model is very similar to a distributed database model, where tables that are accessed together are stored together. At night, when it is necessary to access tables that may be partitioned for OLTP purposes, you still can exploit the advantages of a single database: all the data is stored effectively within a single database. Advantages include improved batch and decision support, query performance, reduced network traffic, and data replication issues.
With this approach you must ensure that each application's tables and indexes are stored such that one PCM lock does not cover any data blocks that are used by both applications. Should this happen the purpose of partitioning would be lost. To rectify the situation you would store each application's table and index data in separate datafiles.
Applications which share a set of SQL statements perform best when they run on the same instance. Because shared SQL areas are not shared across instances, similar sets of SQL statements should run on one instance to improve memory usage and reduce parsing.
Sometimes the partitioning of applications between nodes may not be possible. As an alternative approach, you can partition the database objects themselves. To do this effectively you must analyze the application profile in depth. You may or may not need to split a table into multiple tables. In Oracle Parallel Server situations the partitioning process can involve horizontal partitioning of the table between predefined key ranges.
In addition to partitioning and splitting database objects, you must ensure that each transaction from a user is executed upon the correct OPS instance. The correct node for execution of the transaction is a function of the actual data values being used in the transaction. This process is more commonly known as data-dependent routing.
The process of partitioning a table for purposes of increasing parallel server performance brings with it various development and administration implications.
From a development perspective, as soon as the table is partitioned the quantity and complexity of application code increases. In addition, partitioning a table may compromise the performance of other application functions such as batch and decision support queries.
The administration of data-dependent routing may be complex and involve additional application code. The process may be simplified if a transaction processing monitor (TPM) or RPC mechanism is used by the application. It is possible to code into the configuration of the TPM a data-dependent routing strategy based upon the input RPC arguments. Similarly, this process could be coded into piece of procedural code using a case statement to determine which instance should execute the transaction.
See Also: "Client-Server Systems" on page 1-22