5 Cache Fusion and the Global Cache Service

This chapter provides an overview of Cache Fusion and describes how the Global Cache Service (GCS) operates.

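Topics in this chapter include:

Overview of Cache Fusion
Global Cache Service Operations
Cache Coherency
GCS Resource Modes and Roles
Past Images
Real Application Clusters Resource Control Mechanisms
Cache Fusion Resource Assignment and Block Coverage
Cache Fusion Scenarios
How the GCS Grants and Coordinates Resource Requests
Recovery in Real Application Clusters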

Overview of Cache Fusion

Cache Fusion is a new technology that uses a high speed interprocess communication (IPC) interconnect to provide cache-to-cache transfers of data blocks between instances in a cluster. This eliminates the disk I/O that would otherwise be needed to move a block between instances (disk I/O is inherently slow because it is a mechanical process) and optimizes read/write concurrency. Block reads take advantage of the speed of IPC and an interconnecting network. Cache Fusion also reduces the need to partition data across instances.

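Cache Fusion addresses these types of concurrency between instances, each of which is discussed in the following sections:

Concurrent Reads on Multiple Nodes
Concurrent Reads and Writes on Different Nodes
Concurrent Writes on Different Nodes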

Concurrent Reads on Multiple Nodes

Concurrent reads on multiple nodes occur when two instances need to read the same data block. Real Application Clusters easily resolves this situation because multiple instances can share the same blocks for read access without cache coherency conflicts.

Concurrent Reads and Writes on Different Nodes

Concurrent reads and writes on different nodes are the dominant form of concurrency in Online Transaction Processing (OLTP) and hybrid applications. A read of a data block that was recently modified can be either for the current version of the block or for a read-consistent previous version. In both cases, the block will be transferred from one cache to the other.

Concurrent Writes on Different Nodes

Concurrent writes on different nodes occur when the same data block is modified frequently by processes on different instances.

The main features of the cache coherency model used in Cache Fusion are:

Cache-to-cache data transfer is done through the high speed IPC interconnect, which virtually eliminates the disk I/O otherwise needed to achieve cache coherency.
The Global Cache Service (GCS) tracks one or more past images (PIs) for a block in addition to the traditional GCS resource roles and modes. The GCS tracks blocks that were shipped to other instances by retaining block copies in memory; each such copy is called a past image (PI). In the event of a failure, Oracle can reconstruct the current version of a block by using a PI.
The work required for recovery after node failures is proportional to the number of failed nodes. Oracle must perform a log merge if multiple nodes fail.
The number of context switches is reduced because fewer round-trip messages are needed. In addition, the database writer (DBWR) is not involved in Cache Fusion block transfers. Fewer context switches make the cache coherency protocol more efficient.

Global Cache Service Operations

The GCS tracks the location and status (mode and role) of data blocks, as well as the access privileges of various instances. Oracle uses the GCS for cache coherency when the current version of a data block is in one instance's buffer cache and another instance requests that block for modification. The GCS is also used when instances read blocks.

After an instance has initially acquired exclusive resources, multiple subsequent transactions running on that Real Application Clusters instance can share access to a set of data blocks without involving the GCS, as long as the blocks are not transferred out of the local cache. If a block has to be transferred out of the local cache, then the GCS updates the Global Resource Directory.
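The following Python sketch models that rule with a hypothetical `LocalCache` class, using a single counter increment per GCS interaction as a stand-in for real messages. It illustrates the behavior described above and is not Oracle code: only the first acquisition and the later transfer out of the cache involve the GCS.

```python
from dataclasses import dataclass, field


@dataclass
class LocalCache:
    """Hypothetical model of one instance's buffer cache (not an Oracle API)."""
    gcs_messages: int = 0                              # GCS interactions so far
    held_exclusive: set = field(default_factory=set)   # blocks held in X mode

    def access(self, block_id: str) -> None:
        if block_id not in self.held_exclusive:
            # First access: acquire the resource cluster-wide through the GCS.
            self.gcs_messages += 1
            self.held_exclusive.add(block_id)
        # Later transactions reuse the locally held resource; no GCS message.

    def ship_out(self, block_id: str) -> None:
        # Sending the block to another cache involves the GCS, which updates
        # the Global Resource Directory.
        self.held_exclusive.discard(block_id)
        self.gcs_messages += 1


cache = LocalCache()
for _ in range(100):              # 100 transactions touching the same block
    cache.access("file7:block42")
print(cache.gcs_messages)         # 1
cache.ship_out("file7:block42")
print(cache.gcs_messages)         # 2
```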

Cache Coherency

Data blocks are the most often required database resources. The GCS manages all types of data blocks.

The GCS ensures cache coherency by requiring that instances acquire a resource cluster-wide before modifying or reading a database block. Thus, the GCS synchronizes global cache access, allowing only one instance at a time to modify a block. The GCS coordination of the buffer caches located on separate nodes provides cache coherency to Real Application Clusters. The GCS ensures the status of data blocks cached in any mode in the cluster is globally visible and maintained.

Oracle's multi-versioning architecture distinguishes between current data blocks and one or more consistent read (CR) versions of a block. A data block can reside in many buffer caches under shared resources. The current block contains changes for all committed and yet-to-be-committed transactions. A consistent read (CR) version of a block represents a consistent snapshot of the data at a previous point in time. Consistent read versions are produced by applying rollback segment information. Both current and consistent read blocks are managed by the GCS.

To transfer data blocks among database caches, buffers are shipped over a high speed IPC interconnect. Just as in a single instance system, disk writes are only required for cache replacement. If a block is dirty, the sending instance keeps a past image (PI) of it in memory before sending it. In the event of failure, Oracle can reconstruct the current version of the block by reading PIs.

GCS Resource Modes and Roles

GCS resources track the transmission of blocks through the system. The same block can exist in multiple caches as a result of block transfers. The block can be held in different modes depending on whether a resource holder intends to modify data or merely read them.

It is important to understand that a resource is identified by these factors:

Resource mode: The modes are null, shared, and exclusive.
Resource role: The roles are local and global.

Resource Modes

Resource modes are generally determined by the holder, as part of a request for a data block. The resource modes determine whether the holder can modify the block. Table 5-1 compares the null (N) mode, shared (S) mode, and exclusive (X) mode.

Table 5-1 Global Cache Service Resource Modes

Resource Mode	Identifier	Description
Null	N	Holding a resource at this level conveys no access rights.
Shared	S	A protected read. When a resource is held at this level, a process cannot modify it. Multiple processes can read the resource.
Exclusive	X	When a resource is held at this level, it grants the holding process exclusive access. Other processes cannot write to the resource. Consistent reads of older blocks are still available.

Resource Roles

Oracle assigns GCS resource roles to the holder. Roles supplement the modes requested by the user, based on the resource management system's knowledge of the global state of the blocks. The roles are either local or global:

When a block is first read into an instance's cache and other instances have not read the block, the block is said to be locally managed. It is therefore assigned a local role.
After the block has been modified by the local instance and transmitted to another instance, it is considered globally managed. It is therefore assigned a global role.

All GCS resources effectively have the local role if they only exist in one cache. Roles are mutually exclusive. If a data block was changed in one instance and subsequently transferred to another instance, the buffer containing the data is considered globally dirty. That is, the resource has the global role.

When running a single instance in exclusive mode, all concurrency control is done within the instance. With Real Application Clusters in shared mode, synchronization is accomplished by the GCS or Global Enqueue Service (GES).

Past Images

A block is initially acquired in the local role, with no past images (PIs) present. Only after a block has been changed (that is, has become dirty) and another instance requests it does the node that dirtied the block begin to keep PIs. The resource then becomes global.

The exclusive current copy of a data block can only exist in the cache of the instance that last modified it. There might also be PIs of the block in other caches. These PIs represent earlier versions of the block with modifications that have not been written to disk, and can be used for consistent reads in the cluster.
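A minimal Python sketch of the role and PI rules above, assuming a hypothetical `BlockState` record that tracks the current holder, whether the block is dirty, and which instances keep PIs. It follows the rule that the role turns global only once a dirty block has been shipped and a PI retained; it is not how Oracle stores this state internally.

```python
from dataclasses import dataclass, field


@dataclass
class BlockState:
    """Hypothetical cluster-wide view of one block (illustration only)."""
    current_holder: str                              # instance with the current copy
    dirty: bool = False                              # changes not yet on disk
    past_images: set = field(default_factory=set)    # instances keeping a PI

    @property
    def role(self) -> str:
        # Local until a dirty block has been shipped and a PI retained.
        return "global" if self.past_images else "local"

    def modify(self) -> None:
        self.dirty = True

    def transfer_for_update(self, requester: str) -> None:
        # The sender keeps a PI only if the block it ships is dirty.
        if self.dirty:
            self.past_images.add(self.current_holder)
        self.current_holder = requester


blk = BlockState(current_holder="inst1")
print(blk.role)                    # local
blk.modify()
blk.transfer_for_update("inst2")
print(blk.role, blk.past_images)   # global {'inst1'}
```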

Write Protocol and Past Image Tracking

When another instance requests a block for modification or for a current read, the instance that last modified the block sends it over the high speed IPC interconnect and retains a PI. Writes to disk are triggered only by cache replacements and checkpoints. The write protocol is largely asynchronous. This reduces the I/O requirements of a Real Application Clusters node to levels comparable to those of a single instance.

Consider an instance that intends to initiate a write of a data block when the resource has a global role and the instance does not have the current buffer, only a PI. In this case, the instance informs the GCS, which forwards the write request to the instance that holds the current (most recent) version of the block.

The holder of the current version writes the block to disk. Then, upon completion, the holder sends a completion message to the GCS. Finally, all instances with PI buffers for the written block free their PI buffers.

The GCS always mediates global operations at the cache layer and tracks the latest global state of resources.

Real Application Clusters Resource Control Mechanisms

To guarantee coherent and accurate access to cached data, the cluster database controls access to shared resources. This includes resources such as data blocks or data structures used for other purposes such as instance management, data dictionary access, and recovery synchronization.

When Oracle reads a data block into memory, Oracle opens a GCS resource to coordinate concurrent access to the resource from multiple instances. Oracle opens or converts the resource in different modes and roles depending on whether:

The data accessed is to be modified or read
A data block exists in the cache of only one instance or in multiple caches

Oracle closes GCS resources when the block access mode is down-converted to NULL and no PI exists, or when Oracle flushes the buffer from the cache due to cache replacement.
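The closing rule can be restated as a small predicate. The function name and its parameters are hypothetical and exist only for illustration.

```python
def should_close_resource(mode: str, has_past_image: bool,
                          flushed_by_replacement: bool) -> bool:
    """Hypothetical predicate: does Oracle close this block's GCS resource?

    mode is "N" (null), "S" (shared) or "X" (exclusive).
    """
    if flushed_by_replacement:
        # The buffer left the cache because of cache replacement.
        return True
    # Otherwise the resource closes only once it is null and no PI remains.
    return mode == "N" and not has_past_image


print(should_close_resource("N", has_past_image=False, flushed_by_replacement=False))  # True
print(should_close_resource("N", has_past_image=True, flushed_by_replacement=False))   # False
```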

By default, a resource is allocated for each data block in a cache. Because Cache Fusion virtually eliminates the immediate disk writes that used to occur when other instances requested a block for modification, the performance overhead of concurrent access to shared data between instances is diminished. This reduces the tuning and administrative effort for Real Application Clusters environments.

Note:

Cache Fusion only works with the default resource control scheme. If you override Cache Fusion and set GC_FILES_TO_LOCKS in your initialization parameter file and assign resources to multiple blocks, then Oracle uses pre-9.0.1 behavior. In other words, Oracle will use forced disk writes for cross-instance modification requests. This is not recommended in most circumstances.

Eliminating the Need for Configuring Resources

The new architecture for global resource control and Oracle's breakthrough Cache Fusion technology simplify the performance tuning and administration of Real Application Clusters environments. The importance of configuring accurate resource allocations to provide optimal performance, as well as the planning of sufficient capacity for Global Cache Service and Global Enqueue Service (GES) resources has been largely reduced. If you use the default resource control scheme, you do not need special initialization parameter settings to configure resources in Real Application Clusters.

Resource Control, Cache-to-Cache Transfer, and Cache Coherency

The GCS assigns and opens resources for each database block read into the buffer cache. Oracle closes resources when the resources do not manage any more buffers or when buffered blocks are written to disks due to cache replacement and free buffer requests.

When Oracle closes a resource, it returns it to a free list from which Oracle can assign new resources. The size of the free list is by default equal to the size of the buffer cache. Oracle allocates the free list from the shared pool.
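A minimal sketch of the free-list behavior described above, using a hypothetical `ResourcePool` class backed by a `collections.deque`. Sizing the list to the number of buffers mirrors the default stated in the text; the slot numbers and block identifiers are illustrative, and real resources live in the shared pool rather than in Python objects.

```python
from collections import deque


class ResourcePool:
    """Hypothetical model of the free list of GCS resources."""

    def __init__(self, buffer_cache_size: int) -> None:
        # By default the free list has one entry per buffer in the cache.
        self.free = deque(range(buffer_cache_size))
        self.in_use = {}                      # block id -> resource slot

    def open_resource(self, block_id: str) -> int:
        slot = self.free.popleft()            # IndexError if the list is empty
        self.in_use[block_id] = slot
        return slot

    def close_resource(self, block_id: str) -> None:
        # A closed resource goes back on the free list for reuse.
        self.free.append(self.in_use.pop(block_id))


pool = ResourcePool(buffer_cache_size=3)
pool.open_resource("file4:block10")
pool.close_resource("file4:block10")
print(len(pool.free))   # 3
```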

There are no special considerations for global enqueues. Their number is calculated automatically at startup and Oracle records the calculated values in the alert.log file. You do not need to set initialization parameters.

Generally, global enqueues have different uses and semantics than GCS resources. Global enqueues are used by the different kernel layers such as the row cache, the library cache and so on, to coordinate access to a variety of objects.

Cache Fusion Resource Assignment and Block Coverage

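This section describes how Cache Fusion controls resource assignments. The topics in this section are:

Block Access Modes and Buffer States
Finding the State of a Buffer
How Buffer States and Block Access Modes Change
Block Access Modes Can Be Compatible or Incompatible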

Block Access Modes and Buffer States

There are three concurrency control concepts that need to be distinguished: buffer state, resource mode, and resource role:

The buffer state is the state of a buffer in the local cache of an instance.
The resource mode controls global access rights for instances in a cluster.
The resource role defines whether a block is cached in only one instance (local) or in multiple instances (global).

The buffer state of a block relates directly to the access mode of the block and the role assigned to the instance in relation to the block. For example, if a buffer is in exclusive current (XCUR) state, you know that an instance owns the resource in exclusive mode. In addition, if the data block is read from disk and cached in only one instance, the role is local.

Only one buffer in the cluster can hold a given block in XCUR state at any time. To modify a block, a process must have the buffer containing the data block in XCUR state.

For example, if another instance requests the most current version of the same block for reading, then Oracle changes the holder's access mode from exclusive to shared, sends a current read version of the block to the requesting instance, and keeps a PI buffer if the buffer contained a dirty block. At this point, the first instance still has the current block, including the changes made to it, and the requesting instance also has the current block in shared mode. The role of the resource becomes global. Multiple shared current (SCUR) copies of this block can be cached at the same time.

Finding the State of a Buffer

To see a buffer's state, query the STATUS column of the V$BH dynamic performance view. The V$BH view provides information about each buffer header. Table 5-2 shows the buffer state that corresponds to each block access mode.

Table 5-2 Block Access Modes and Buffer States

Block Access Mode	Buffer State Name	Description
X	XCUR	Instance has exclusive access to the block and can modify it.
S	SCUR	Instance has shared access to the block and can only perform reads.
NULL	CR	Instance can perform only a consistent read of the block (that is, the buffer contains an older version of the data).

How Buffer States and Block Access Modes Change

Figure 5-1 shows how buffer states and block access modes change as instances perform various operations on a given buffer. The block access mode appears in parentheses.

Figure 5-1 How Buffer States and Block Access Modes Change

In Figure 5-1, the two instances begin with the block in shared current (SCUR) state and with shared resources. When Instance 1 performs an update on the block, its access mode on the block changes to exclusive mode (X). The shared resource owned by Instance 2 converts to null mode (N). Meanwhile, the block state in Instance 1 becomes XCUR, and in Instance 2 it becomes CR. These block access modes (X and N) are compatible.
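The two transitions described in this section (an exclusive holder downgrading to shared when another instance reads the current block, and an update converting the other holders to null) can be sketched as functions over a dictionary of per-instance modes. The function names are hypothetical, the mapping from mode to buffer state follows Table 5-2, and PIs and roles are deliberately left out.

```python
# Buffer state for each block access mode (Table 5-2).
STATE_FOR_MODE = {"X": "XCUR", "S": "SCUR", "N": "CR"}


def update(modes: dict, writer: str) -> dict:
    """Writer converts to X; every other instance holding the block drops to N."""
    return {inst: ("X" if inst == writer else "N") for inst in modes}


def read_current(modes: dict, reader: str) -> dict:
    """An X holder downgrades to S, and the reader receives an S copy."""
    new_modes = {inst: ("S" if m == "X" else m) for inst, m in modes.items()}
    new_modes[reader] = "S"
    return new_modes


def states(modes: dict) -> dict:
    return {inst: STATE_FOR_MODE[m] for inst, m in modes.items()}


modes = {"Instance 1": "S", "Instance 2": "S"}    # start of Figure 5-1
modes = update(modes, "Instance 1")
print(states(modes))   # {'Instance 1': 'XCUR', 'Instance 2': 'CR'}
modes = read_current(modes, "Instance 2")
print(states(modes))   # {'Instance 1': 'SCUR', 'Instance 2': 'SCUR'}
```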

Block Access Modes Can Be Compatible or Incompatible

When one process owns a resource in a given mode, another process requesting a resource in any particular mode succeeds or fails as shown in Table 5-3.

Table 5-3 Block Access Mode Compatibility

Mode Owned (rows) / Mode Requested (columns)	Null	S	X
Null	Succeed	Succeed	Succeed
S	Succeed	Succeed	Fail
X	Succeed	Fail	Fail
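Table 5-3 translates directly into a lookup table. The `can_grant` helper below is a hypothetical extension that checks a request against every mode currently held by other processes, on the assumption that a request succeeds only if it is compatible with each of them.

```python
# Table 5-3 as a lookup: COMPATIBLE[mode_owned][mode_requested]
COMPATIBLE = {
    "N": {"N": True, "S": True, "X": True},
    "S": {"N": True, "S": True, "X": False},
    "X": {"N": True, "S": False, "X": False},
}


def can_grant(requested: str, modes_held_by_others: list) -> bool:
    """Hypothetical helper: grant only if compatible with every mode held elsewhere."""
    return all(COMPATIBLE[owned][requested] for owned in modes_held_by_others)


print(can_grant("S", ["S", "N"]))   # True
print(can_grant("X", ["S"]))        # False
print(can_grant("X", ["N", "N"]))   # True
```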

Cache Fusion Scenarios

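The following scenarios illustrate the key points of Cache Fusion processing. These scenarios, which illustrate key concepts and do not address all possible configurations, are described in the following sections:

Requesting a Block for a Read from Another Instance: Scenario
Requesting a Changed Block for Modification: Scenario
Writing Blocks to Disk: Scenario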

Requesting a Block for a Read from Another Instance: Scenario

The scenario shown in Figure 5-2 assumes that one instance has read a data block into its cache. The data block is protected by a resource in shared mode (S) and its role is local (L). This indicates that the block only exists in the local cache of this instance.

Figure 5-2 Requesting a Block for a Read from Another Instance

Instance 1 submits a request to the GCS to read a block. The GCS always knows the global distribution of resources, so it knows that a copy of the block is already in the cache of Instance 2.
The GCS then forwards the request to Instance 2.
The holding instance (Instance 2) transmits a copy of the block to the requesting instance (Instance 1), but keeps the resource in shared mode and also retains the local role. Along with the block, Instance 2 transmits its own resource disposition (shared and local), and the mode and role the requestor is to use in taking the resource. The mode is shared and role is local.
Once Instance 1 has received the block, it informs the GCS that it has taken the block and resource in shared mode and local role, and that the sender has retained the block and resource with the same disposition.

Note that the block and its mode and role information are transferred cache-to-cache through the high speed IPC interconnect without any disk I/O.
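The message flow of this scenario can be written out as a trace of (sender, receiver, content) hops. The helper below is a hypothetical illustration; the message wording is descriptive and is not an Oracle message format.

```python
def read_request_trace(requester: str, holder: str) -> list:
    """Hypothetical message trace for the read scenario."""
    return [
        (requester, "GCS", "request block for read"),
        ("GCS", holder, "forward read request"),
        # The block goes cache to cache; it never passes through the GCS or disk.
        (holder, requester, "block; holder keeps (S, local); requester takes (S, local)"),
        (requester, "GCS", "took (S, local); sender retained (S, local)"),
    ]


for sender, receiver, content in read_request_trace("Instance 1", "Instance 2"):
    print(f"{sender:>10} -> {receiver:<10} {content}")
```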

Requesting a Changed Block for Modification: Scenario

The scenario shown in Figure 5-3 assumes that the data block has been changed (or dirtied) by one instance and held in exclusive mode (X). Furthermore, this scenario assumes that the block has only been accessed by the instance that changed it. That is, only one copy of it exists cluster-wide. In other words, the block is in a local role (L).

Figure 5-3 Requesting a Changed Block for Modification

As in the first scenario, the instance attempting to modify the block (Instance 1) submits a request to the GCS.
The GCS transmits the request on to the holder (Instance 2).
Instance 2 receives the message and sends the block to Instance 1. Before sending the block, Instance 2 downgrades its resource to null mode and keeps the changed (dirty) buffer as a PI. The role therefore changes to global (G), because the block is dirty. Along with the block, Instance 2 informs the requestor that it has retained a PI copy and a null resource. In the same message, it also specifies that the requestor take the block in exclusive mode and with a global role.
On receipt of the block and the resource dispositions, Instance 1 informs the GCS that it is now holding the block in exclusive mode and with a global role. Meanwhile, Instance 2 (the former holder) retains a PI of the same block in null mode and global role. Note that the data block is not written to disk before the resource is granted to the other instance. That is, DBWR is not involved in the cache coherency scheme.
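The resulting dispositions can be captured in a few lines of Python. `Disposition` and `transfer_for_modification` are hypothetical names, and the sketch covers only the case described here, in which the holder has a dirty block in exclusive mode.

```python
from dataclasses import dataclass


@dataclass
class Disposition:
    mode: str             # "N", "S" or "X"
    role: str             # "local" or "global"
    has_pi: bool = False  # whether a past image is kept


def transfer_for_modification(holder: Disposition) -> tuple:
    """Hypothetical sketch of the modification scenario for a dirty X block.

    Returns (holder's new disposition, requester's disposition, disk writes).
    """
    if holder.mode != "X":
        raise ValueError("this sketch covers only an exclusive, dirty holder")
    # The holder downgrades to null and keeps its dirty buffer as a PI;
    # because a dirty block has left its cache, the role becomes global.
    new_holder = Disposition(mode="N", role="global", has_pi=True)
    requester = Disposition(mode="X", role="global")
    disk_writes = 0   # the block is not written before the transfer
    return new_holder, requester, disk_writes


old, new, writes = transfer_for_modification(Disposition(mode="X", role="local"))
print(old)      # Disposition(mode='N', role='global', has_pi=True)
print(new)      # Disposition(mode='X', role='global', has_pi=False)
print(writes)   # 0
```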

Writing Blocks to Disk: Scenario

The scenario shown in Figure 5-4 illustrates how an instance can checkpoint at any time or replace buffers in the cache due to free buffer requests. Because multiple versions of the data block with changes could exist in the caches of instances in the cluster, a write protocol mediated by the GCS must ensure that the current version of the data is written to disk. It must also ensure that all existing previous versions are purged from the other caches. A write request for a data block can originate in any instance that has the current or previous version of the block.

In this scenario, assume that the instance holding a PI buffer in null mode requests that the buffer be written.

Figure 5-4 Writing Blocks to Disk

Instance 2 first sends a write request to the GCS.
The GCS forwards the request to Instance 1 (the current block holder). The GCS remembers that a write at a particular System Change Number (SCN) is pending. The GCS also remembers that it has to notify the nodes that have PIs of the same block.
Instance 1 receives the write request and writes the block to disk.
Instance 1 logs the completion of the write. It then notifies the GCS of the write completion. Instance 1 also informs the GCS that the resource role can become local because Instance 1 performed the write of the current block. After completion of the protocol, all PIs of the block should be discarded.
After receipt of the notification, the GCS orders all PI holders to discard (or flush) their PIs. Discarding in this case means that on receipt of the message, PI holders log that the current block has been written and the buffer is released. The PI is no longer needed for recovery. The buffer is essentially free and the resource previously held in null mode is closed.
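The write protocol in this scenario can be sketched as a small state machine. `WriteProtocol`, its methods, and the SCN value in the example are hypothetical; the sketch also runs every step synchronously, whereas the real protocol is largely asynchronous.

```python
class WriteProtocol:
    """Hypothetical model of the GCS-mediated write of one block."""

    def __init__(self, current_holder: str, pi_holders: set) -> None:
        self.current_holder = current_holder
        self.pi_holders = set(pi_holders)
        self.role = "global"
        self.pending_scn = None
        self.log = []

    def request_write(self, requester: str, scn: int) -> None:
        # Any instance holding a PI (or the current copy) may ask for the write.
        self.log.append(f"{requester} -> GCS: write request")
        # The GCS records the pending write and forwards it to the current holder.
        self.pending_scn = scn
        self.log.append(f"GCS -> {self.current_holder}: write block (SCN {scn})")
        self._current_holder_writes()

    def _current_holder_writes(self) -> None:
        self.log.append(f"{self.current_holder}: writes block to disk")
        self.log.append(f"{self.current_holder} -> GCS: write complete, role may become local")
        self.role = "local"
        self.pending_scn = None
        # The GCS orders every PI holder to discard its now-unneeded PI.
        for inst in sorted(self.pi_holders):
            self.log.append(f"GCS -> {inst}: discard PI")
        self.pi_holders.clear()


proto = WriteProtocol(current_holder="Instance 1", pi_holders={"Instance 2"})
proto.request_write("Instance 2", scn=1042)
print("\n".join(proto.log))
```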

See Also:
Oracle9i Real Application Clusters Deployment and Performance for additional information on System Change Numbers

How the GCS Grants and Coordinates Resource Requests

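This section describes the basic concepts of how the GCS grants and coordinates resource requests. The topics in this section are:

Interrupt and Completion Processing
Block Access Requests are Queued
Acquisition Interrupts Communicate Block Access Request Status
Block Access Requests are Granted and Converted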

The GCS tracks block access requests within your Real Application Clusters environment, granting requests for resources whenever possible. The GCS also tracks requests for resources that are not currently available. Access rights are granted when these resources later become available. The GCS maintains an inventory of block access requests and status of resources.

Interrupt and Completion Processing

There are three situations where processes are interrupted or notified to handle a request for a data block:

When a block is received from another cache
When read and write permissions are granted on a resource and no block was transferred because no other instance had cached it
When a request is blocked on another instance.

The usual flow of a Cache Fusion request is that a block request is made to the GCS and forwarded to the instance in which the data is cached. From there, the buffer is sent directly to the requestor, which is interrupted and completes the request. One key part of request completion is that the requestor informs the GCS that it has received the block. This is called a block arrival interrupt. The requestor also informs the GCS that it is taking the block in a particular mode and role. This is called the assume notification. Informing the GCS is an asynchronous task (that is, it is not blocking).

On the holding side, an interrupt occurs when the resource requested by another instance is held in a conflicting mode. Processes on the holding instance are then interrupted so that they release or downgrade their access privileges and send the block. This is called a blocking notification or blocking interrupt.

In some cases the GCS determines that the block is not cached in any other instance in the cluster and grants permission to access the block directly. Upon request completion, the requesting process then reads the block from disk. If the GCS can make the decision to grant the request locally (that is, without sending messages), the request is completed immediately. Otherwise, an acquisition interrupt is sent to the requesting process.
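The completion paths in the last two paragraphs can be summarized from the requesting process's point of view. The function and its event strings are hypothetical labels for the cases described, not Oracle wait events or APIs.

```python
def requester_events(cached_elsewhere: bool, granted_without_messages: bool) -> list:
    """Hypothetical list of what the requesting process observes for one request."""
    if cached_elsewhere:
        # The holder ships the block directly to the requester.
        return ["block arrival interrupt", "assume notification to GCS (asynchronous)"]
    if granted_without_messages:
        # The GCS decided locally: the request completes at once.
        return ["request completed immediately", "read block from disk"]
    # The grant needed messages: the GCS wakes the requester.
    return ["acquisition interrupt", "read block from disk"]


print(requester_events(cached_elsewhere=False, granted_without_messages=False))
```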

The Global Enqueue Service (GES) uses a similar notification mechanism. There, only completion interrupts and blocking interrupts are used.

All requests for cluster-wide access to a resource are maintained in grant queues and convert queues. While requests are in progress and until they are completed, the requests remain in a convert queue. These queues are managed by the GCS and GES.

Block Access Requests are Queued

The GCS maintains two queues for resource requests:

Granted queue

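The GCS tracks resource requests that have been granted in the granted queue.

Convert queue

The GCS tracks resource requests that are waiting to be granted in the convert queue.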

Acquisition Interrupts Communicate Block Access Request Status

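To communicate the status of resource requests, the GCS uses two types of interrupts (also known as wake up calls):

Acquisition interrupt

The GCS sends an acquisition interrupt to a requesting process to notify it that its request has been granted.

Blocking interrupt

The GCS sends a blocking interrupt to the holder of a resource to notify it that another process is waiting for the resource in a conflicting mode.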

Block Access Requests are Granted and Converted

The following figures show how the GCS handles resource requests. In Figure 5-5, shared request 1 has been granted on the resource to process 1, and shared request 2 has been granted to process 2. As mentioned, the GCS tracks the resources in the granted queue. When a request for an exclusive block access mode is made by process 2, it must wait in the convert queue.

Figure 5-5 The Global Cache Service Grants and Converts Queues

Figure 5-6 shows the GCS sending a blocking interrupt to Process 1, the owner of the shared resource, notifying it that a request for an exclusive resource is waiting. When Process 1 relinquishes the shared resource, Oracle converts its access mode to NULL or releases it.

Figure 5-6 Blocking Interrupt

An acquisition interrupt is then sent to alert Process 2, the requestor of the exclusive resource. The GCS grants the exclusive resource and moves the request to the granted queue. Figure 5-7 illustrates this.

Figure 5-7 Function of an Acquisition Interrupt

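The sequence in Figures 5-5 through 5-7 can be simulated with a map of granted modes and a FIFO convert queue. `ResourceQueues` is a hypothetical class: it reuses the compatibility rules of Table 5-3, grants waiting requests in arrival order (an assumption), and handles only down-conversion to NULL, not full release.

```python
from collections import deque

# Table 5-3 compatibility: COMPATIBLE[mode_owned][mode_requested]
COMPATIBLE = {
    "N": {"N": True, "S": True, "X": True},
    "S": {"N": True, "S": True, "X": False},
    "X": {"N": True, "S": False, "X": False},
}


class ResourceQueues:
    """Hypothetical model of one GCS resource's granted and convert queues."""

    def __init__(self) -> None:
        self.granted = {}        # process -> mode currently granted
        self.convert = deque()   # waiting (process, requested mode) pairs
        self.events = []         # interrupts sent, in order

    def _grantable(self, process: str, mode: str) -> bool:
        # Compare against every other process's granted mode.
        return all(COMPATIBLE[held][mode]
                   for p, held in self.granted.items() if p != process)

    def request(self, process: str, mode: str) -> None:
        if not self.convert and self._grantable(process, mode):
            self.granted[process] = mode
            return
        self.convert.append((process, mode))
        for p, held in self.granted.items():
            if p != process and not COMPATIBLE[held][mode]:
                self.events.append(f"blocking interrupt -> {p}")

    def downconvert_to_null(self, process: str) -> None:
        self.granted[process] = "N"
        # Grant waiting requests, in order, while they are compatible.
        while self.convert and self._grantable(*self.convert[0]):
            p, mode = self.convert.popleft()
            self.granted[p] = mode
            self.events.append(f"acquisition interrupt -> {p}")


res = ResourceQueues()
res.request("Process 1", "S")      # granted (Figure 5-5)
res.request("Process 2", "S")      # granted
res.request("Process 2", "X")      # waits in the convert queue
res.downconvert_to_null("Process 1")
print(res.events)
print(res.granted)
```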

Recovery in Real Application Clusters

Real Application Clusters recovery is optimized to execute certain steps in parallel. Data blocks become available immediately after they are recovered. Database recovery and resource space reconfiguration are divided into two phases and can be executed in parallel. Generally, the recovery process allows a high degree of parallelism and hence better availability and scalability.

When an instance expires and the failure is detected by another Oracle instance in the cluster, Oracle performs the following recovery steps:

1. GCS resources and write requests are frozen while GES enqueues are reconfigured.
2. After the reconfiguration of the enqueues that are controlled by the GES, the following take place in parallel: a log read, recovery, and remastering of GCS resources. At the end of this step, the resources of the blocks that need to be recovered have been identified and the Global Resource Directory has been reconstructed. Pending requests or writes have been cancelled or replayed.
3. Buffer space for recovery is allocated, and the resources identified in the previous pass over the log are claimed as recovery resources. Then, if PIs of the blocks to be recovered exist in other caches in the cluster, source buffers are requested from those instances. These source buffers are the starting point of recovery for a particular block.
4. All resources and enqueues required for subsequent processing have been acquired, and the Global Resource Directory is now unfrozen. Any data blocks that are not in recovery can now be accessed. Note that the system is already partially available.
5. The cache layer recovers and writes each block identified in step 2, releasing the recovery resources immediately after each block is recovered, so that more and more blocks become available as cache recovery proceeds.
6. After all blocks have been recovered and recovery resources have been released, the system is again fully available.

In summary, the recovered database or recovered portions of the database become available earlier, and before the completion of the entire recovery sequence. This makes the system more available and recovery more scalable.
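A minimal sketch of how availability grows during recovery, assuming a hypothetical generator that reports which blocks are accessible after each phase. It collapses the parallel work of the early recovery steps into a single frozen phase and is meant only to show why the database is partially available before recovery completes.

```python
def recovery_availability(all_blocks: list, needs_recovery: set):
    """Hypothetical model: yield (phase, accessible blocks) after each phase."""
    accessible = set()
    # Resources frozen, enqueues reconfigured, log read, recovery resources claimed.
    yield "frozen", set(accessible)
    # Directory unfrozen: blocks outside recovery are usable right away.
    accessible = {b for b in all_blocks if b not in needs_recovery}
    yield "unfrozen", set(accessible)
    # Each block is released as soon as it has been recovered.
    for b in all_blocks:
        if b in needs_recovery:
            accessible.add(b)
            yield f"recovered {b}", set(accessible)
    yield "fully available", set(accessible)


for phase, blocks in recovery_availability(["a", "b", "c", "d"], {"b", "d"}):
    print(f"{phase:16} {sorted(blocks)}")
```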

In the rare occurrence of multiple simultaneous instance failures, neither the PI buffers nor the current buffer for a data block can be found in any of the surviving instances' caches. Then a log merge of the failed instances must be performed. The performance penalty of a log merge is proportional to the number of failed instances and the size of the redo logs for each instance. The size of the log to be read can be controlled by checkpoint features.
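The core of a log merge is ordering the redo of several failed instances by SCN. The sketch below assumes a hypothetical `RedoRecord` type and that each instance's redo is already in SCN order; it uses Python's `heapq.merge` to interleave the threads. It shows the ordering idea only, not Oracle's recovery implementation. Every record of every thread is read once, so the work grows with the number of failed instances and the amount of redo, in line with the cost described above.

```python
import heapq
from typing import NamedTuple


class RedoRecord(NamedTuple):
    scn: int       # system change number of the change
    thread: int    # redo thread (one per failed instance)
    block: str
    change: str


def merge_threads(*threads):
    """Merge redo threads, each already in SCN order, into one SCN-ordered list."""
    return list(heapq.merge(*threads, key=lambda rec: rec.scn))


thread1 = [RedoRecord(100, 1, "blk42", "update A"), RedoRecord(130, 1, "blk42", "update C")]
thread2 = [RedoRecord(115, 2, "blk42", "update B")]
for rec in merge_threads(thread1, thread2):
    print(rec.scn, rec.thread, rec.change)
```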

With its advanced design, Real Application Clusters recovery is able to handle multiple simultaneous failures and sequential failures. The shared cache server is also resilient to instance failures or crashes during recovery.
