This chapter introduces Oracle Exadata System Software.
1.1 Overview of Oracle Exadata System Software
Oracle Exadata Storage Server is a highly optimized storage server that runs Oracle Exadata System Software to store and access Oracle Database data.
With traditional storage, data is transferred to the database server for processing. In contrast, Oracle Exadata System Software provides database-aware storage services, such as the ability to offload SQL and other database processing from the database server, while remaining transparent to the SQL processing and database applications. Oracle Exadata Database Machine storage servers process data at the storage level, and pass only what is needed to the database servers.
Oracle Exadata System Software is installed on both the storage servers and the database servers. Oracle Exadata System Software offloads some SQL processing from the database server to the storage servers. Oracle Exadata System Software enables function shipping between the database instance and the underlying storage, in addition to traditional data shipping. Function shipping greatly reduces the amount of data processing that must be done by the database server. Eliminating data transfers and database server workload can greatly benefit query processing operations that often become bandwidth constrained. Eliminating data transfers can also provide a significant benefit to online transaction processing (OLTP) systems that include large batch and report processing operations.
The hardware components of Oracle Exadata Storage Server are carefully chosen to match the needs of high performance processing. The Oracle Exadata System Software is optimized to maximize the advantage of the hardware components. Each storage server delivers outstanding processing bandwidth for data stored on disk, often several times better than traditional solutions.
Oracle Exadata Database Machine storage servers use state-of-the-art RDMA Network Fabric interconnections between servers and storage. Each RDMA Network Fabric link provides bandwidth of 40 Gb/s for InfiniBand Network Fabric or 100 Gb/s for RoCE Network Fabric. Additionally, the interconnection protocol uses direct data placement, also referred to as direct memory access (DMA), to ensure low CPU overhead by moving data directly from the wire to database buffers with no extra copies. The RDMA Network Fabric has the flexibility of a LAN with the efficiency of a storage area network (SAN). With an RDMA Network Fabric, Oracle Exadata Database Machine eliminates network bottlenecks that could reduce performance. The RDMA Network Fabric also provides a high performance cluster interconnection for Oracle Real Application Clusters (Oracle RAC) servers.
The Oracle Exadata Database Machine architecture scales out to meet growing performance and capacity requirements. To achieve higher performance or greater storage capacity, you add more storage servers (cells) to the configuration. As more storage servers are added, capacity and performance increase linearly. Data is mirrored across storage servers to ensure that the failure of a storage server does not cause loss of data or availability. The scale-out architecture also lowers costs by allowing storage to be purchased incrementally on demand.
Oracle Exadata System Software must be used with Oracle Exadata Database Machine storage server hardware, and only supports Oracle databases on the database servers of Oracle Exadata Database Machines. Information is available on My Oracle Support and on the Products page of Oracle Technology Network.
1.2 Key Features of Oracle Exadata System Software
This section describes the key features of Oracle Exadata System Software.
1.2.1 Reliability, Modularity, and Cost-Effectiveness
Oracle Exadata System Software enables cost-effective modular storage hardware to be used in a scale-out architecture while providing a high level of availability and reliability.
All single points of failure are eliminated in the Oracle Exadata Storage Server architecture by data mirroring, fault isolation technology, and protection against disk and other storage hardware failure.
In the Oracle Exadata Storage Server architecture, one or more storage cells can support one or more databases. The placement of data is transparent to database users and applications. Storage cells use Oracle Automatic Storage Management (Oracle ASM) to distribute data evenly across the cells. Because Oracle Exadata Storage Servers support dynamic disk insertion and removal, the online dynamic data redistribution feature of Oracle ASM ensures that data is appropriately balanced across the newly added, or remaining, disks without interrupting database processing. Oracle Exadata Storage Servers also provide data protection from disk and cell failures.
1.2.2 Compatibility with Oracle Database
When the minimum required versions are met, all Oracle Database features are fully supported with Oracle Exadata System Software.
Oracle Exadata System Software works equally well with single-instance or Oracle Real Application Clusters (Oracle RAC) deployments of Oracle Database. Oracle Data Guard, Oracle Recovery Manager (RMAN), Oracle GoldenGate, and other database features are managed the same with Exadata storage cells as with traditional storage. This enables database administrators to use the same tools with which they are familiar.
Refer to My Oracle Support Doc ID 888828.1 for a complete list of the minimum required software versions.
1.2.3 Smart Flash Technology
The Exadata Smart Flash Cache feature of the Oracle Exadata System Software intelligently caches database objects in flash memory, replacing slow, mechanical I/O operations to disk with very rapid flash memory operations.
1.2.3.1 Flash Cache
Oracle has implemented smart flash cache directly in Oracle Exadata Storage Server.
Oracle Exadata Smart Flash Cache holds frequently-accessed data in very fast flash storage while most data is kept in very cost-effective disk storage. This happens automatically without the user having to take any action.
Oracle Exadata Smart Flash Cache is smart because it knows when to avoid trying to cache data that will never be reused or will not fit in the cache. Oracle Database and Oracle Exadata System Software allow the user to provide directives at the database table, index and segment level to ensure that specific data is retained in flash. Tables can be moved in and out of flash with a simple command, without the need to move the table to different tablespaces, files or LUNs as is done with traditional storage using flash disks.
1.2.3.2 Flash Logging
Oracle Exadata Smart Flash technology is also used to reduce the latency of log write I/O operations by eliminating performance bottlenecks that might occur due to database logging.
The time to commit user transactions is very sensitive to the latency of log write operations. In addition, many performance-critical database algorithms, such as space management and index splits, are very sensitive to log write latency.
Although the disk controller has a large battery-backed DRAM cache that can accept writes very quickly, some write operations to disk can still be slow during periods of high I/O. Even with relatively few redo log write operations that are slow, these write operations can cause performance issues. It is these situations that Oracle Exadata Smart Flash Log is designed to alleviate.
The goal of the Oracle Exadata Smart Flash Log is to perform redo write operations simultaneously to both flash memory and disk, and complete the write operation when the first of the two completes. This gives Oracle Exadata Database Machine the best of both worlds by avoiding problems due to latency spikes on either type of media. Smart Flash Logging is most beneficial during busy periods when the disk controller cache occasionally becomes filled with blocks that have not been written to disk, so that write performance degrades from disk cache speed to real disk speed. It is important to note that Smart Flash Logging improves the latency of log write operations, but it does not improve total disk throughput. If an application is bottlenecked on disk throughput, then Smart Flash Logging provides little benefit because log response time is not the limiting factor for performance.
Oracle Exadata Smart Flash Log improves user transaction response time, and increases overall database throughput for I/O intensive workloads by accelerating performance critical database algorithms.
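The "first of the two completes" behavior described above can be illustrated with a small sketch. This is a toy model only, not Oracle's implementation: the function name and the latency figures are hypothetical and chosen so that each medium shows one latency spike.

```python
def flash_log_write_latency(flash_ms: float, disk_ms: float) -> float:
    """Model a redo write issued to flash and disk at the same time.

    The write is acknowledged as soon as the first of the two completes,
    so the effective latency is the smaller of the two.
    """
    return min(flash_ms, disk_ms)


# Hypothetical per-write latencies in milliseconds: (flash, disk).
# The third write hits a disk spike, the fourth a flash spike.
samples = [(0.4, 0.3), (0.5, 0.3), (0.4, 12.0), (9.0, 0.3)]

disk_only = [disk for _, disk in samples]
with_flash_log = [flash_log_write_latency(f, d) for f, d in samples]

print("worst case, disk only:     ", max(disk_only))
print("worst case, with flash log:", max(with_flash_log))
```

In this model the spike on either medium is hidden by the other, which is the latency benefit the text describes. Total throughput is unchanged, because every write still goes to both media.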
1.2.3.3 WriteBack Flash Cache
WriteBack flash cache provides the ability to cache write I/Os directly to PCI flash in addition to read I/Os.
The Flash Cache component on the Oracle Exadata storage cells can be configured in one of two modes: WriteThrough or WriteBack. In WriteThrough mode, only read I/Os are cached in the flash cache. In WriteBack mode, introduced with Oracle Exadata System Software release 11.2.3.2.0, all I/Os (reads and writes) are cached in the flash cache, boosting the performance of the databases.
WriteBack flash cache significantly improves write-intensive operations because writing to flash cache is faster than writing to hard disks. If your application is write-intensive and you find significant waits for "free buffer waits" or high I/O times, then you should consider using WriteBack flash cache.
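The difference between the two modes can be sketched with a toy cache model. The class and method names below are hypothetical, and the model is deliberately simplified (for example, it ignores cache capacity and eviction); it only shows where a write lands in each mode.

```python
class FlashCacheModel:
    """Toy model of a flash cache in WriteThrough or WriteBack mode."""

    def __init__(self, mode: str):
        if mode not in ("WriteThrough", "WriteBack"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.flash = {}     # block id -> data held in flash
        self.disk = {}      # block id -> data held on disk
        self.dirty = set()  # blocks in flash not yet written to disk

    def write(self, block: int, data: str) -> str:
        """Write a block and return the medium that absorbed the write."""
        if self.mode == "WriteBack":
            self.flash[block] = data
            self.dirty.add(block)
            return "flash"
        self.disk[block] = data
        self.flash.pop(block, None)  # drop any stale cached copy
        return "disk"

    def read(self, block: int):
        """Read a block, caching it in flash on a miss."""
        if block in self.flash:
            return self.flash[block], "flash"
        data = self.disk[block]
        self.flash[block] = data
        return data, "disk"

    def flush(self) -> None:
        """Write dirty blocks from flash back to disk."""
        for block in sorted(self.dirty):
            self.disk[block] = self.flash[block]
        self.dirty.clear()


wt = FlashCacheModel("WriteThrough")
wb = FlashCacheModel("WriteBack")
print(wt.write(1, "a"), wb.write(1, "a"))
```

In this model a WriteBack write completes against flash and reaches disk only later, which is why write-intensive workloads benefit.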
1.2.4 Persistent Memory Accelerator and RDMA
Persistent Memory (PMEM) Accelerator provides direct access to persistent memory using remote direct memory access (RDMA), enabling faster response times and lower read latencies.
Starting with Oracle Exadata System Software release 19.3.0, workloads that require ultra-low response times, such as stock trades and IoT devices, can take advantage of PMEM and RDMA in the form of a PMEM Cache and PMEM Logging. PMEM is a new, persistent memory tier available on Oracle Exadata Storage Servers X8M-2 EF and HC and newer generations of Exadata Storage Servers with Persistent Memory (X*M). When clients read from the PMEM cache, Oracle Exadata System Software can do an RDMA read of the cached data, with much faster results compared to Flash Cache.
PMEM Cache can be used in the following configurations:
| PMEM Cache Mode | Flash Cache Mode | Supported Configuration? |
|---|---|---|
| Write Through | Write Through | Yes. This is the default configuration for High Capacity servers with Normal Redundancy. |
| Write Through | Write Back | Yes. This is the default configuration for High Capacity servers with High Redundancy. This is also the default configuration for Extreme Flash servers. |
| Write Back | Write Back | Yes. |
| Write Back | Write Through | No. Write-intensive workloads can overload the write-back PMEM Cache. |
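The table above reduces to one rule: a write-back PMEM Cache requires a write-back Flash Cache. The following sketch encodes that rule; the function name and mode strings are illustrative and do not correspond to any CellCLI attribute.

```python
WRITE_THROUGH = "Write Through"
WRITE_BACK = "Write Back"


def is_supported(pmem_mode: str, flash_mode: str) -> bool:
    """Return True if the PMEM Cache / Flash Cache mode pair is supported.

    Encodes the table: the only unsupported pair is a write-back PMEM
    Cache on top of a write-through Flash Cache.
    """
    for mode in (pmem_mode, flash_mode):
        if mode not in (WRITE_THROUGH, WRITE_BACK):
            raise ValueError(f"unknown mode: {mode}")
    return not (pmem_mode == WRITE_BACK and flash_mode == WRITE_THROUGH)


for pmem in (WRITE_THROUGH, WRITE_BACK):
    for flash in (WRITE_THROUGH, WRITE_BACK):
        print(f"{pmem:13} | {flash:13} | {is_supported(pmem, flash)}")
```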
PMEM Logging uses PMEM and RDMA to provide substantially lower redo log write latency. If the redo log resides on PMEM, clients can do an RDMA-write directly into the redo log. While this solution can provide low latencies, placing all redo log files on PMEM is cost prohibitive. If the redo log does not reside on PMEM, the Oracle Exadata System Software uses shared receive queues (SRQs) to send I/O buffers from the client to cellsrv via RDMA, and this reduces transport latencies. Cellsrv still performs the writes of the redo log data to disk (and flash, if Flash Logging is enabled). In cases when the PMEMLOG is bypassed due to lack of buffers or because it is disabled for a given database, Flash Logging is used instead.
1.2.5 Centralized Storage
You can use Oracle Exadata Storage Server to consolidate your storage requirements into a central pool that can be used by multiple databases.
Oracle Exadata System Software with Oracle Automatic Storage Management (Oracle ASM) evenly distributes the data and I/O load for every database across available disks in the storage pool. Every database can use all of the available disks to achieve superior I/O rates. Oracle Exadata Storage Servers can provide higher efficiency and performance at a lower cost while also lowering your storage administration overhead.
1.2.6 I/O Resource Management (IORM)
I/O Resource Management (IORM) and the Oracle Database Resource Manager enable multiple databases and pluggable databases to share the same storage while ensuring that I/O resources are allocated across the various databases.
Oracle Exadata System Software works with IORM and Oracle Database Resource Manager to ensure that customer-defined policies are met, even when multiple databases share the grid. As a result, one database cannot monopolize the I/O bandwidth and degrade the performance of the other databases.
IORM enables storage cells to service I/O resources among multiple applications and users across all databases in accordance with sharing and prioritization levels established by the administrator. This improves the coexistence of online transaction processing (OLTP) and reporting workloads, because latency-sensitive OLTP applications can be given a larger share of disk and flash I/O bandwidth than throughput-sensitive batch applications. Oracle Database Resource Manager enables the administrator to control processor utilization on the database host on a per-application basis. Combining IORM and Oracle Database Resource Manager enables the administrator to establish more accurate policies.
IORM also manages the space utilization for Exadata Smart Flash Cache. Critical OLTP workloads can be guaranteed space in Exadata Smart Flash Cache to provide consistent performance.
IORM for a database or pluggable database (PDB) is implemented and managed from the Oracle Database Resource Manager. Oracle Database Resource Manager in the database instance communicates with the IORM software in the storage cell to manage user-defined service-level targets. Database resource plans are administered from the database, while interdatabase plans are administered on the storage cell.
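As a rough illustration of how sharing levels prevent one database from monopolizing I/O, the sketch below divides a cell's I/O capacity in proportion to administrator-assigned shares. This is a simplified assumption about share-based allocation under full contention, not the actual IORM scheduling algorithm; the database names, share values, and IOPS figure are hypothetical.

```python
def allocate_io(shares: dict, total_iops: int) -> dict:
    """Split total_iops across databases in proportion to their shares.

    Models only the fully contended case, where every database has
    enough outstanding I/O to use its whole allocation.
    """
    total_shares = sum(shares.values())
    if total_shares <= 0:
        raise ValueError("at least one database needs a positive share")
    return {db: total_iops * s / total_shares for db, s in shares.items()}


# Hypothetical plan: latency-sensitive OLTP gets the largest share.
plan = {"oltp": 6, "reporting": 3, "batch": 1}
allocation = allocate_io(plan, total_iops=10_000)

for db, iops in allocation.items():
    print(f"{db:10} {iops:8.0f} IOPS")
```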
1.2.7 In-Memory Columnar Format Support
You can store data in the In-Memory columnar format in the flash cache in an Oracle Exadata Database Machine environment.
Oracle Exadata Database Machine supports all of the In-Memory optimizations, such as accessing only the compressed columns required, SIMD vector processing, storage indexes, and so on.
If you set the INMEMORY_SIZE database initialization parameter to a non-zero value (requires the Oracle Database In-Memory option), then objects accessed using a Smart Scan are brought into the flash cache and are automatically converted into the In-Memory columnar format. The data is converted initially into a columnar cache format, which is different from Oracle Database In-Memory’s columnar format. The data is rewritten in the background into Oracle Database In-Memory columnar format. As a result, all subsequent accesses to the data benefit from all of the In-Memory optimizations when that data is retrieved from the flash cache.
Any write to an in-memory table does not invalidate the entire columnar cache of that table. It only invalidates the columnar cache unit of the disk region in which the block resides. For subsequent scans after a table update, a large part of the table is still in the columnar cache. The scans can still make use of the columnar cache, except for the units in which the writes were made. For those units, the query uses the original block version to get the data, and then that data is converted back to the columnar format in the columnar cache. After a sufficient number of scans, the invalidated columnar cache units are automatically repopulated.
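The unit-level invalidation described above can be sketched as follows. The unit size and class name are hypothetical, and the model repopulates an invalidated unit on the very next scan, whereas the text says repopulation happens after a sufficient number of scans.

```python
UNIT_BLOCKS = 4  # hypothetical number of blocks per columnar cache unit


class ColumnarCacheModel:
    """Toy model: a table's columnar cache split into independent units."""

    def __init__(self, n_blocks: int):
        n_units = (n_blocks + UNIT_BLOCKS - 1) // UNIT_BLOCKS
        self.valid = [True] * n_units

    def write_block(self, block: int) -> None:
        """A write invalidates only the unit that contains the block."""
        self.valid[block // UNIT_BLOCKS] = False

    def scan(self):
        """Return (units served from cache, units read from row blocks).

        Invalidated units are rebuilt as a side effect of the scan
        (a simplification of the background repopulation).
        """
        cached = [u for u, ok in enumerate(self.valid) if ok]
        rebuilt = [u for u, ok in enumerate(self.valid) if not ok]
        for u in rebuilt:
            self.valid[u] = True
        return cached, rebuilt


cache = ColumnarCacheModel(n_blocks=16)
cache.write_block(5)   # block 5 lives in unit 1
print(cache.scan())    # only unit 1 falls back to the row-format blocks
print(cache.scan())    # all units are served from the columnar cache
```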
A new segment-level attribute, CELLMEMORY, has also been introduced to help control which objects should not be populated into flash using the In-Memory columnar format and which type of compression should be used. Just like the INMEMORY attribute, you can specify different compression levels as sub-clauses to the CELLMEMORY attribute. However, not all of the INMEMORY compression levels are available; only MEMCOMPRESS FOR QUERY LOW and MEMCOMPRESS FOR CAPACITY LOW (default). You specify the CELLMEMORY attribute using a SQL command, such as the following:
ALTER TABLE trades CELLMEMORY MEMCOMPRESS FOR QUERY LOW
The PRIORITY sub-clause available with Oracle Database In-Memory is not available on Oracle Exadata Database Machine because the process of populating the flash cache on Exadata storage servers is different from populating DRAM in the In-Memory column store on Oracle Database servers.
1.2.8 Offloading of Data Search and Retrieval Processing
One of the most powerful features of Oracle Exadata System Software is that it offloads the data search and retrieval processing to the storage servers.
This capability is known as Exadata Smart Scan Offload, or simply Smart Scan. Oracle Exadata System Software does this by performing predicate filtering, which entails evaluating database predicates to optimize the performance of certain classes of bulk data processing.
Oracle Database can optimize the performance of queries that perform table and index scans to evaluate selective predicates in Oracle Exadata Storage Server. The database can complete these queries faster by pushing the database expression evaluations to the storage cell. These expressions include simple SQL command predicates, such as amount > 200, and column projections, such as SELECT customer_name. For example:
SQL> SELECT customer_name FROM calls WHERE amount > 200;
In the preceding example, only rows satisfying the predicate, specified columns, and predicated columns are returned to the database server, eliminating unproductive data transfer to the database server.
Oracle Exadata System Software uses storage-side predicate evaluation that transfers simplified, predicate evaluation operations for table and index scans to the storage cell. This brings the table scan closer to the disk to enable a higher bandwidth, and prevents sending unmatched rows to hosts.
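To make the data-transfer saving concrete, the sketch below applies the predicate and column projection from the example query on the "storage side" and returns only the result. The sample rows and the smart_scan function are hypothetical; real Smart Scan operates on database blocks inside the storage server, not on Python dictionaries.

```python
# Hypothetical contents of the calls table as stored on a storage cell.
calls = [
    {"customer_name": "Ann",  "amount": 150, "duration": 12},
    {"customer_name": "Bob",  "amount": 420, "duration": 95},
    {"customer_name": "Cara", "amount": 200, "duration": 40},
    {"customer_name": "Dev",  "amount": 310, "duration": 77},
]


def smart_scan(rows, predicate, columns):
    """Filter rows and project columns before anything leaves storage."""
    return [{c: row[c] for c in columns} for row in rows if predicate(row)]


# SELECT customer_name FROM calls WHERE amount > 200;
result = smart_scan(calls, lambda row: row["amount"] > 200, ["customer_name"])
print(result)

# Compare values shipped: every column of every row vs. the filtered result.
shipped_without_offload = sum(len(row) for row in calls)
shipped_with_offload = sum(len(row) for row in result)
print(shipped_without_offload, shipped_with_offload)
```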
Figure 1-1 Offloading Data Search and Retrieval
1.2.9 Offloading of Incremental Backup Processing
To optimize the performance of incremental backups, the database can offload block filtering to Oracle Exadata Storage Server.
This optimization is only possible when taking backups using Oracle Recovery Manager (RMAN). The offload processing is done transparently without user intervention. During offload processing, Oracle Exadata System Software filters out the blocks that are not required for the incremental backup in progress. Therefore, only the blocks that are required for the backup are sent to the database, making backups significantly faster.
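The block filtering step can be pictured with the following sketch. It assumes, for illustration only, that each block records the system change number (SCN) of its last modification and that an incremental backup needs the blocks changed after the previous backup; the block map and SCN values are hypothetical.

```python
def blocks_for_incremental(block_scns: dict, since_scn: int) -> list:
    """Return the ids of blocks changed after since_scn, in block order.

    Performed on the storage side, this means unchanged blocks are never
    sent to the database server.
    """
    return sorted(b for b, scn in block_scns.items() if scn > since_scn)


# Hypothetical data file: block id -> SCN of the last change to the block.
datafile = {0: 100, 1: 250, 2: 90, 3: 310, 4: 200}

changed = blocks_for_incremental(datafile, since_scn=200)
print(changed)
```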
1.2.10 Protection Against Data Corruption
Data corruptions, while rare, can have a catastrophic effect on a database, and therefore on a business.
Oracle Exadata System Software takes data protection to the next level by protecting business data, not just the physical bits.
The key approach to detecting and preventing corrupted data is block checking in which the storage subsystem validates the Oracle block contents. Oracle Database validates and adds protection information to the database blocks, while Oracle Exadata System Software detects corruptions introduced into the I/O path between the database and storage. It stops corrupted data from being written to disk, and validates data when reading the disk. This eliminates a large class of failures that the database industry had previously been unable to prevent.
Unlike other implementations of corruption checking, checks with Oracle Exadata System Software operate completely transparently. No parameters need to be set at the database or storage tier. These checks transparently handle all cases, including Oracle Automatic Storage Management (Oracle ASM) disk rebalance operations and disk failures.
1.2.11 Fast File Creation
File creation operations are offloaded to Oracle Exadata Storage Servers.
Operations such as CREATE TABLESPACE, which can create one or more files, have a significant increase in speed due to file creation offload.
1.2.12 Storage Index
Oracle Exadata Storage Servers maintain a storage index which contains a summary of the data distribution on the disk.
The storage index is maintained automatically, and is transparent to Oracle Database. It is a collection of in-memory region indexes, and each region index stores summaries for up to eight columns. There is one region index for each 1 MB of disk space. Storage indexes work with any non-linguistic data type, and work with linguistic data types similar to non-linguistic indexes.
Each region index maintains the minimum and maximum values of the columns of the table. The minimum and maximum values are used to eliminate unnecessary I/O, also known as I/O filtering. The cell physical IO bytes saved by storage index statistic, available in the V$SYSSTAT view, shows the number of bytes of I/O saved using storage index. The content stored in one region index is independent of the other region indexes. This makes them highly scalable, and avoids latch contention.
Queries using the following comparisons are improved by the storage index:
- Equality (=)
- Inequality (<, !=, or >)
- Less than or equal (<=)
- Greater than or equal (>=)
- IS NULL
- IS NOT NULL
Storage indexes are built automatically after Oracle Exadata System Software receives a query with a comparison predicate that is greater than the maximum or less than the minimum value for the column in a region, and would have benefited if a storage index had been present. Oracle Exadata System Software automatically learns which storage indexes would have benefited a query, and then creates the storage index automatically so that subsequent similar queries benefit.
The effectiveness of storage indexes can be improved by ordering the rows based on columns that frequently appear in WHERE query clauses.
The storage index is maintained during write operations to uncompressed blocks and OLTP compressed blocks. Write operations to Exadata Hybrid Columnar Compression compressed blocks or encrypted tablespaces invalidate a region index, but not the storage index. The storage index for Exadata Hybrid Columnar Compression is rebuilt on subsequent scans.
Example 1-1 Elimination of Disk I/O with Storage Index
The following figure shows a table and region indexes. The values in the table range from one to eight. One region index stores the minimum of 1, and the maximum of 5. The other region index stores the minimum of 3, and the maximum of 8.
For a query such as SELECT * FROM TABLE WHERE B<2, only the first set of rows match. Disk I/O is eliminated because the minimum and maximum of the second set of rows do not match the WHERE clause of the query.
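The elimination in Example 1-1 can be reproduced with a minimal sketch that uses the same minimum and maximum values. The RegionIndex class and function names are hypothetical; the real storage index tracks several columns per region and is maintained inside the storage server.

```python
from dataclasses import dataclass


@dataclass
class RegionIndex:
    """Minimum and maximum of one column within one storage region."""
    lo: int
    hi: int


def regions_for_less_than(regions, value):
    """Regions that can contain rows satisfying column < value."""
    return [i for i, r in enumerate(regions) if r.lo < value]


def regions_for_equal(regions, value):
    """Regions that can contain rows satisfying column = value."""
    return [i for i, r in enumerate(regions) if r.lo <= value <= r.hi]


# Region indexes from Example 1-1: (min 1, max 5) and (min 3, max 8).
regions = [RegionIndex(1, 5), RegionIndex(3, 8)]

# WHERE B < 2: only the first region can hold matching rows.
print(regions_for_less_than(regions, 2))
```

A region that is not returned is never read from disk, because its minimum and maximum prove that no row in it can match.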
Example 1-2 Partition Pruning-like Benefits with Storage Index
In the following figure, there is a table named Orders with the columns Order_Number, Order_Date, Ship_Date, and Order_Item. The table is range partitioned by the Order_Date column.
The following query looks for orders placed since January 1, 2015:
SELECT count(*) FROM Orders WHERE Order_Date >= to_date('2015-01-01', 'YYYY-MM-DD')
Because the table is partitioned on the Order_Date column, the preceding query avoids scanning unnecessary partitions of the table. Queries on Ship_Date do not benefit from Order_Date partitioning, but Ship_Date and Order_Number are highly correlated with Order_Date. Storage indexes take advantage of ordering created by partitioning or sorted loading, and can use it with the other columns in the table. This provides partition pruning-like performance for queries on these correlated columns.
Example 1-3 Improved Join Performance Using Storage Index
Using storage index allows table joins to skip unnecessary I/O operations. For example, the following query would perform an I/O operation and apply a Bloom filter to only the first block of the fact table.
SELECT count(*) FROM fact, dim WHERE fact.m=dim.m AND dim.product='Hard drive'
The I/O for the second block of the fact table is completely eliminated by storage index as its minimum/maximum range (5,8) is not present in the Bloom filter.
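The join case can be sketched in the same spirit. Here an exact Python set stands in for the Bloom filter (a real Bloom filter is probabilistic and can report false positives), and the join key values and the range of the first block are hypothetical; only the (5,8) range of the second block comes from the example.

```python
def region_may_match(lo: int, hi: int, join_keys: set) -> bool:
    """True if any join key from the filter falls within [lo, hi]."""
    return any(lo <= key <= hi for key in join_keys)


# Min/max of fact.m for each block of the fact table.
# (1, 4) is an assumed range for the first block; (5, 8) is from the example.
fact_regions = [(1, 4), (5, 8)]

# Hypothetical dim.m values for the rows where product = 'Hard drive'.
join_keys = {2, 3}

blocks_to_read = [
    i for i, (lo, hi) in enumerate(fact_regions)
    if region_may_match(lo, hi, join_keys)
]
print(blocks_to_read)
```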
1.3 Oracle Exadata System Software Components
This section provides a summary of the following Oracle Exadata System Software components.
1.3.1 About Oracle Exadata System Software
Unique software algorithms in Oracle Exadata System Software implement database intelligence in storage, PCI-based flash, and RDMA Network Fabric networking to deliver higher performance and capacity at lower costs than other platforms.
Oracle Exadata Storage Server is a network-accessible storage device with Oracle Exadata System Software installed on it. The software communicates with the database using a specialized iDB protocol, and provides both simple I/O functionality, such as block-oriented reads and writes, and advanced I/O functionality, including predicate offload and I/O Resource Management (IORM). Each storage server has physical disks. The physical disk is an actual device within the storage server that constitutes a single disk drive spindle.
Within the storage servers, a logical unit number (LUN) defines a logical storage resource from which a single cell disk can be created. The LUN refers to the access point for storage resources presented by the underlying hardware to the upper software layers. The precise attributes of a LUN are configuration-specific. For example, a LUN could be striped, mirrored, or both striped and mirrored.
A cell disk is an Oracle Exadata System Software abstraction built on the top of a LUN. After a cell disk is created from the LUN, it is managed by Oracle Exadata System Software and can be further subdivided into grid disks, which are directly exposed to the database and Oracle Automatic Storage Management (Oracle ASM) instances. Each grid disk is a potentially non-contiguous partition of the cell disk that is directly exposed to Oracle ASM to be used for the Oracle ASM disk group creations and expansions.
This level of virtualization enables multiple Oracle ASM clusters and multiple databases to share the same physical disk. This sharing provides optimal use of disk capacity and bandwidth. Various metrics and statistics collected on the cell disk level enable you to evaluate the performance and capacity of storage servers. IORM schedules the cell disk access in accordance with user-defined policies.
The following image illustrates how the components of a storage server (also called a cell) are related to grid disks.
- A LUN is created from a physical disk.
- A cell disk is created on a LUN. A segment of cell disk storage is used by the Oracle Exadata System Software system, referred to as the cell system area.
- Multiple grid disks can be created on a cell disk.
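The relationship between a LUN, a cell disk, and its grid disks can be modeled as a small data structure. All names and sizes below are hypothetical, and the model tracks only capacity; it is a sketch of the hierarchy, not of how CellCLI creates these objects.

```python
from dataclasses import dataclass, field


@dataclass
class CellDisk:
    """A cell disk built on one LUN and subdivided into grid disks."""
    name: str
    lun: str
    size_gb: int
    system_area_gb: int = 0  # space reserved for the cell system area
    grid_disks: dict = field(default_factory=dict)  # name -> size in GB

    def free_gb(self) -> int:
        """Capacity not yet used by the system area or grid disks."""
        return self.size_gb - self.system_area_gb - sum(self.grid_disks.values())

    def create_grid_disk(self, name: str, size_gb: int) -> None:
        """Carve a grid disk out of the remaining cell disk space."""
        if size_gb > self.free_gb():
            raise ValueError("not enough free space on cell disk")
        self.grid_disks[name] = size_gb


# One physical disk -> one LUN -> one cell disk -> multiple grid disks.
cell_disk = CellDisk(name="CD_00_cell01", lun="0_0", size_gb=1000, system_area_gb=30)
cell_disk.create_grid_disk("DATA_CD_00_cell01", 600)
cell_disk.create_grid_disk("RECO_CD_00_cell01", 300)
print(cell_disk.free_gb())
```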
Figure 1-2 Oracle Exadata Storage Server Components
The following image illustrates software components in the Oracle Exadata Storage Server environment.
Figure 1-3 Software Components in the Oracle Exadata Database Machine Environment
The figure illustrates the following environment:
- Single-instance or Oracle RAC databases access storage servers using the iDB protocol over an RDMA Network Fabric network. Each database server runs the Oracle Database and Oracle Grid Infrastructure software. Resources are managed for each database instance by Oracle Database Resource Manager (shown as DBRM).
- The database servers include Oracle Exadata System Software functionality, such as a Management Server (MS) and command-line interface (DBMCLI).
- Storage servers contain cell-based utilities and processes from Oracle Exadata System Software, including:
  - Cell Server (CELLSRV): the primary component of the Oracle Exadata System Software running in the storage server, which provides the majority of the storage server services. CELLSRV services database requests for disk I/O and provides the advanced SQL offload capabilities. CELLSRV implements the I/O Resource Management (IORM) functionality to meter out I/O bandwidth to the various databases and consumer groups issuing I/O calls on the storage server.
  - Management Server (MS): the primary interface to administer, manage and query the status of the storage server. It works in cooperation with the Cell Control Command-Line Interface (CellCLI) and processes most of the commands from CellCLI.
  - Restart Server (RS): monitors the heartbeat with the MS and the CELLSRV processes, and restarts the servers if they fail to respond within the allowable heartbeat period.
- Storage cells are configured on the network, and are managed by the Oracle Exadata System Software CellCLI utility.
- Each storage server contains multiple disks which store the data for the database instances on the database servers. The data is stored in disks managed by Oracle ASM.
1.3.2 About Oracle Automatic Storage Management
Oracle Automatic Storage Management (Oracle ASM) is the cluster volume manager and file system used to manage Oracle Exadata Storage Server resources.
Oracle ASM provides enhanced storage management by:
- Striping database files evenly across all available storage cells and disks for optimal performance.
- Using mirroring and failure groups to avoid any single point of failure.
- Enabling dynamic add and drop capability for non-intrusive cell and disk allocation, deallocation, and reallocation.
- Enabling multiple databases to share storage cells and disks.
The following topics provide a brief overview of Oracle ASM:
1.3.2.1 Oracle ASM Disk Groups
An Oracle Automatic Storage Management (Oracle ASM) disk group is the primary storage abstraction within Oracle ASM, and is composed of one or more grid disks.
Oracle Exadata Storage Server grid disks appear to Oracle ASM as individual disks available for membership in Oracle ASM disk groups. Whenever possible, grid disk names should correspond closely with Oracle ASM disk group names to assist in problem diagnosis between Oracle ASM and Oracle Exadata System Software.
The Oracle ASM disk groups are as follows:
- DATA is the data disk group.
- RECO is the recovery disk group.
- DBFS (Oracle Database File System) is the file system disk group.
- SPARSE is a sparse disk group to keep snapshot files.
To take advantage of Oracle Exadata System Software features, such as predicate processing offload, the disk groups must contain only Oracle Exadata Storage Server grid disks, and the tables must be fully inside these disk groups.
The Oracle Database and Oracle Grid Infrastructure software must be release 12.1.0.2.0 BP3 or later when using sparse grid disks.
1.3.2.2 Oracle ASM Failure Group
An Oracle ASM failure group is a subset of disks in an Oracle ASM disk group that can fail together because they share the same hardware.
Oracle ASM considers failure groups when making redundancy decisions.
For Oracle Exadata Storage Servers, all grid disks, which consist of the Oracle ASM disk group members and candidates, can effectively fail together if the storage cell fails. Because of this scenario, all Oracle ASM grid disks sourced from a given storage cell should be assigned to a single failure group representing the cell.
For example, if all grid disks from two storage cells, A and B, are added to a single Oracle ASM disk group with normal redundancy, then all grid disks on storage cell A are designated as one failure group, and all grid disks on storage cell B are designated as another failure group. This enables Oracle Exadata System Software and Oracle ASM to tolerate the failure of either storage cell.
Failure groups for Oracle Exadata Storage Server grid disks are set by default so that the disks on a single cell are in the same failure group, making correct failure group configuration simple for Oracle Exadata Storage Servers.
You can define the redundancy level for an Oracle ASM disk group when creating a disk group. An Oracle ASM disk group can be specified with normal or high redundancy. Normal redundancy double mirrors the extents, and high redundancy triple mirrors the extents. Oracle ASM normal redundancy tolerates the failure of a single cell or any set of disks in a single cell. Oracle ASM high redundancy tolerates the failure of two cells or any set of disks in two cells. Base your redundancy setting on your desired protection level. When choosing the redundancy level, ensure the post-failure I/O capacity is sufficient to meet the redundancy requirements and performance service levels. Oracle recommends using three cells for normal redundancy. This ensures the ability to restore full redundancy after cell failure. Consider the following:
- If a cell or disk fails, then Oracle ASM automatically redistributes the cell or disk contents across the remaining disks in the disk group as long as there is enough space to hold the data. For an existing disk group using Oracle ASM redundancy, the USABLE_FILE_MB and REQUIRED_MIRROR_FREE_MB columns in the V$ASM_DISKGROUP view give the amount of usable space and space for redundancy, respectively.
- If a cell or disk fails, then the remaining disks should be able to generate the IOPS necessary to sustain the performance service level agreement.
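The two considerations above can be sketched as a simple capacity check. This is a simplified model that assumes identical cells and a single cell failure; the function name, parameters, and figures are hypothetical, and `used_raw_mb` is assumed to include all mirror copies.

```python
def check_after_cell_failure(num_cells, cell_capacity_mb, cell_iops,
                             used_raw_mb, required_iops):
    """Return (enough_space, enough_iops) after one of num_cells identical cells fails."""
    surviving = num_cells - 1
    # All mirrored data must fit on the surviving cells to restore redundancy.
    enough_space = used_raw_mb <= surviving * cell_capacity_mb
    # The surviving cells must still deliver the IOPS the service level requires.
    enough_iops = required_iops <= surviving * cell_iops
    return enough_space, enough_iops

print(check_after_cell_failure(3, 1000, 500, 1800, 900))  # (True, True)
print(check_after_cell_failure(3, 1000, 500, 2400, 900))  # (False, True)
```

In the second call, 2400 MB of mirrored data cannot fit on the 2000 MB that remain after a cell is lost, so redundancy could not be restored.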
After a disk group is created, the redundancy level of the disk group cannot be changed. To change the redundancy of a disk group, you must create another disk group with the appropriate redundancy, and then move the files.
Each Exadata Cell is a failure group. A normal redundancy disk group must contain at least two failure groups. Oracle ASM automatically stores two copies of the file extents, with the mirrored extents placed in different failure groups. A high redundancy disk group must contain at least three failure groups. Oracle ASM automatically stores three copies of the file extents, with each file extent in separate failure groups.
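The rules above (two copies across at least two failure groups for normal redundancy, three copies across at least three for high redundancy) can be sketched as follows. The round-robin placement is a hypothetical policy chosen for brevity; it is not the algorithm that Oracle ASM uses to choose mirror locations.

```python
MIRROR_COPIES = {"normal": 2, "high": 3}

def place_extent(extent_id, failure_groups, redundancy):
    """Pick one distinct failure group for each copy of an extent."""
    copies = MIRROR_COPIES[redundancy]
    if len(failure_groups) < copies:
        raise ValueError(
            f"{redundancy} redundancy needs at least {copies} failure groups"
        )
    # Hypothetical round-robin: start at a group derived from the extent number.
    start = extent_id % len(failure_groups)
    return [failure_groups[(start + i) % len(failure_groups)]
            for i in range(copies)]

cells = ["cell01", "cell02", "cell03"]
print(place_extent(0, cells, "normal"))  # ['cell01', 'cell02']
print(place_extent(2, cells, "normal"))  # ['cell03', 'cell01']
print(place_extent(1, cells, "high"))    # ['cell02', 'cell03', 'cell01']
```

Calling `place_extent(0, ["cell01", "cell02"], "high")` raises `ValueError`, which mirrors the requirement that a high redundancy disk group contain at least three failure groups.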
System reliability can diminish if your environment has an insufficient number of failure groups. A small number of failure groups, or failure groups of uneven capacity, can lead to allocation problems that prevent full use of all available storage.
18.104.22.168 Maximum Availability with Oracle ASM
Oracle recommends high redundancy Oracle ASM disk groups and a file placement configuration that can be deployed automatically using Oracle Exadata Deployment Assistant (OEDA).
High redundancy can be configured for DATA, RECO, or any other Oracle ASM disk group with a minimum of 3 storage cells. Starting with Exadata Software release 22.214.171.124.0, the voting disks can reside in a high redundancy disk group, and additional quorum disks (essentially equivalent to voting disks) can reside on database servers if there are fewer than 5 Exadata storage cells.
Maximum availability architecture (MAA) best practice uses three Oracle ASM disk groups, DATA, RECO, and DBFS. The disk groups are located as follows:
- The disk groups are striped across all disks and Oracle Exadata Storage Servers to maximize I/O bandwidth and performance, and simplify management.
- The DATA disk group is located on the outer section of all disks. This is true only for physical deployments. This is not applicable to Oracle VM deployments.
- The RECO disk group is located on the outer/inner section of all disks. This is true only for physical deployments. This is not applicable to Oracle VM deployments.
- The DBFS disk group is located on the inner section of all disks.
- The DATA and RECO disk groups are configured for high redundancy.
The preceding attributes ensure optimal file placement in the different Oracle ASM disk groups. In addition, all operations have access to full I/O bandwidth, when needed. To avoid excessive resource consumption, use I/O Resource Management, Oracle Database Resource Manager, and instance caging.
The benefits of high redundancy disk groups are illustrated by the following outage scenarios:
- Double partner disk failure: Protection against loss of the database and Oracle ASM disk group due to a disk failure followed by a second partner disk failure.
- Disk failure when Oracle Exadata Storage Server is offline: Protection against loss of the database and Oracle ASM disk group when a storage server is offline and one of the storage server's partner disks fails. The storage server may be offline because of Exadata storage planned maintenance, such as Exadata rolling storage server patching.
- Disk failure followed by disk sector corruption: Protection against data loss and I/O errors when latent disk sector corruptions exist and a partner storage disk is unavailable either due to planned maintenance or disk failure.
If the voting disk resides in a high redundancy disk group that is part of the default Exadata high redundancy deployment, the cluster and database will remain available for the above failure scenarios. If the voting disk resides on a normal redundancy disk group, then the database cluster will fail and the database has to be restarted. You can eliminate that risk by moving the voting disks to a high redundancy disk group and creating additional quorum disks on database servers.
Oracle recommends High Redundancy for ALL (DATA and RECO) disk groups because it provides maximum application availability against storage failures and operational simplicity during a storage outage. In contrast, if all disk groups were configured with normal redundancy and two partner disks fail, all clusters and databases on Exadata will fail and you will lose all your data (normal redundancy does not survive double partner disk failures). Other than better storage protection, the major difference between high redundancy and normal redundancy is the amount of usable storage and write I/Os. High redundancy requires more space, and has three write I/Os instead of two. The additional write I/O normally has negligible impact with Exadata smart write-back flash cache.
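The space and write I/O trade-off described above comes down to simple arithmetic, sketched below. The estimate is deliberately rough: it ignores space reserved for rebalancing and metadata, and the function name and the 90 TB figure are hypothetical.

```python
def redundancy_cost(raw_tb, redundancy):
    """Estimate usable space and write I/Os per logical write for a redundancy level."""
    copies = {"normal": 2, "high": 3}[redundancy]
    return {
        "copies": copies,
        # Every extent is stored `copies` times, so usable space shrinks accordingly.
        "approx_usable_tb": raw_tb / copies,
        # Each logical write is issued once per mirror copy.
        "write_ios_per_write": copies,
    }

for level in ("normal", "high"):
    print(level, redundancy_cost(90, level))
```

For 90 TB of raw capacity, this estimate gives about 45 TB usable with normal redundancy and about 30 TB with high redundancy.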
The following table describes that redundancy option, as well as others, and the relative availability trade-offs. The table assumes that voting disks reside in a high redundancy disk group. Refer to Oracle Exadata Database Machine Maintenance Guide to migrate voting disks to a high redundancy disk group for existing high redundancy disk group configurations.
| Redundancy Option | Availability Implications | Recommendation |
|---|---|---|
| High Redundancy for ALL (DATA and RECO) | Zero application downtime and zero data loss for the preceding storage outage scenarios if voting disks reside in high redundancy disk group. If voting disks currently reside in normal redundancy disk group, refer to Oracle Exadata Database Machine Maintenance Guide to migrate them to high redundancy disk group. | Use this option for best storage protection and operational simplicity for mission-critical applications. Requires more space for higher redundancy. |
| High Redundancy for DATA only | Zero application downtime and zero data loss for preceding storage outage scenarios. This option requires an alternative archive destination. | New default deployment configuration with 8 TB disks. Use this option for best storage protection for DATA with slightly higher operational complexity. More available space than High Redundancy for ALL. Refer to My Oracle Support note 2059780.1 for details. |
| High Redundancy for RECO only | Zero data loss for the preceding storage outage scenarios. | Use this option when longer recovery times are acceptable for the preceding storage outage scenarios. Recovery options include the following: |
| Normal Redundancy for ALL (DATA and RECO). Note: Cross-disk mirror isolation by using ASM disk group content type limits an outage to a single disk group when two disk partners are lost in a normal redundancy group that share physical disks and storage servers. | The preceding storage outage scenarios resulted in failure of all Oracle ASM disk groups. However, using cross-disk group mirror isolation the outage is limited to one disk group. Note: This option is not available for eighth or quarter racks. | Oracle recommends a minimum of High Redundancy for DATA only. Use the Normal Redundancy for ALL option when the primary database is protected by an Oracle Data Guard standby database deployed on a separate Oracle Exadata Database Machine or when the Exadata Database Machine is servicing only development or test databases. Oracle Data Guard provides real-time data protection and fast failover for storage failures. If Oracle Data Guard is not available and the DATA or RECO disk groups are lost, then leverage recovery options described in My Oracle Support note 1339373.1. |
The optimal file placement for an MAA setup is:
- Oracle Database files — DATA disk group
- Flashback log files, archived redo files, and backup files — RECO disk group
- Redo log files — First high redundancy disk group. If no high redundancy disk group exists, then redo log files are multiplexed across the DATA and RECO disk groups.
- Control files — First high redundancy disk group. If no high redundancy disk group exists, then use one control file in the DATA disk group. The backup control files should reside in the RECO disk group, and RMAN CONFIGURE CONTROLFILE AUTOBACKUP ON should be set.
- Server parameter files (SPFILE) — First high redundancy disk group. If no high redundancy disk group exists, then SPFILE should reside in the DATA disk group. SPFILE backups should reside in the RECO disk group.
- Oracle Cluster Registry (OCR) and voting disks for Oracle Exadata Database Machine Full Rack and Oracle Exadata Database Machine Half Rack — First high redundancy disk group. If no high redundancy disk group exists, then the files should reside in the DATA disk group.
- Voting disks for Oracle Exadata Database Machine Quarter Rack or Eighth Rack — First high redundancy disk group, otherwise in normal redundancy disk group. If there are fewer than 5 Exadata storage cells with high redundancy disk group, additional quorum disks will be stored on Exadata database servers during OEDA deployment. Refer to Oracle Exadata Database Machine Maintenance Guide to migrate voting disks to high redundancy disk group for existing high redundancy disk group configurations.
- Temporary files — First normal redundancy disk group. If the High Redundancy for ALL option is used, then use the first high redundancy disk group.
- Staging and non-database files — DBFS disk group
Related Topics
- Database High Availability Checklist
- Configuration Prerequisites and Operational Steps for Higher Availability for a RECO disk group or Fast Recovery Area Failure (My Oracle Support Doc ID 2059780.1)
- Operational Steps for Recovery after Losing a Disk Group in an Exadata Environment (My Oracle Support Doc ID 1339373.1)
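The file placement rules listed in this section amount to a lookup with a fallback, which the following sketch illustrates for a subset of file types. The function names, the file type labels, and the sample layout (including the redundancy shown for DBFS) are hypothetical, and "first" is assumed to mean the order in which the disk groups appear in the dictionary.

```python
def first_with(disk_groups, redundancy):
    """Return the first disk group name with the given redundancy, or None."""
    for name, level in disk_groups.items():
        if level == redundancy:
            return name
    return None

def placement(file_type, disk_groups):
    """Return the disk groups that should hold a file of the given type."""
    high = first_with(disk_groups, "high")
    normal = first_with(disk_groups, "normal")
    if file_type == "datafile":
        return ["DATA"]
    if file_type in ("flashback_log", "archived_redo", "backup"):
        return ["RECO"]
    if file_type == "redo_log":
        # Without a high redundancy group, multiplex across DATA and RECO.
        return [high] if high else ["DATA", "RECO"]
    if file_type in ("control_file", "spfile"):
        return [high] if high else ["DATA"]
    if file_type == "temp_file":
        high_for_all = (disk_groups.get("DATA") == "high"
                        and disk_groups.get("RECO") == "high")
        return [high if high_for_all else normal]
    if file_type == "staging":
        return ["DBFS"]
    raise ValueError(f"unknown file type: {file_type}")

data_only = {"DATA": "high", "RECO": "normal", "DBFS": "normal"}
print(placement("redo_log", data_only))   # ['DATA']
print(placement("temp_file", data_only))  # ['RECO']
```

With a layout in which every disk group uses normal redundancy, `placement("redo_log", ...)` returns `['DATA', 'RECO']`, reflecting the multiplexing fallback.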
1.3.3 About Grid RAID
A grid Redundant Array of Independent Disks (RAID) configuration uses Oracle ASM mirroring capabilities.
To use grid RAID, you place grid disks in an Oracle ASM disk group with a normal or high redundancy level, and set all grid disks in the same cell to be in the same Oracle ASM failure group. This ensures that Oracle ASM does not mirror data extents using disks within the cell. Using disks from different cells ensures that an individual cell failure does not cause the data to be unavailable.
Grid RAID also simplifies the creation of cell disks. With grid RAID, you do not have to create LUNs yourself, because Oracle software automatically creates the required LUNs from the available physical disks.
1.3.4 About Storage Server Security
Security for Exadata Storage Servers is enforced by identifying which clients can access storage servers and grid disks.
Clients include Oracle ASM instances, database instances, and clusters. When creating or modifying grid disks, you can configure the Oracle ASM owner and the database clients that are allowed to use those grid disks.
1.3.5 About iDB Protocol
The iDB protocol is a unique Oracle data transfer protocol that serves as the communications protocol among Oracle ASM, database instances, and storage cells.
General-purpose data transfer protocols operate only on the low-level blocks of a disk. In contrast, the iDB protocol is aware of the Oracle internal data representation and is the necessary complement to Exadata storage server specific features, such as predicate processing offload.
In addition, the iDB protocol provides interconnection bandwidth aggregation and failover.
1.3.6 About Oracle Exadata System Software Processes
Oracle Exadata System Software uses its own set of background processes.
Oracle Exadata System Software includes the following software processes:
- Cell Server (CELLSRV) services iDB requests for disk I/O and advanced Oracle Exadata Storage Server services, such as predicate processing offload. CELLSRV is implemented as a multithreaded process and should be expected to use the largest portion of processor cycles on a storage cell.
- Management Server (MS) provides standalone storage cell management and configuration.
- Restart Server (RS) monitors the CELLSRV and MS processes and restarts them, if necessary.
1.3.7 About Cell Management
Each cell in the Oracle Exadata Storage Server grid is individually managed with Cell Control Command-Line Interface (CellCLI).
The CellCLI utility provides a command-line interface to the cell management functions, such as cell initial configuration, cell disk and grid disk creation, and performance monitoring. The CellCLI utility runs on the cell, and is accessible from a client computer that has network access to the storage cell or is directly connected to the cell. The CellCLI utility communicates with Management Server to administer the storage cell.
To access the cell, use either Secure Shell (SSH) access or local access, for example, through a KVM (keyboard, video or visual display unit, mouse) switch. SSH allows remote access, but local access might be necessary during the initial configuration when the cell is not yet configured for the network. With local access, you have access to the cell operating system shell prompt and can use various tools, such as the CellCLI utility, to administer the cell.
You can run the same CellCLI commands remotely on multiple cells with the dcli utility.
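As an illustration, the following sketch builds a dcli command line that would run one CellCLI command on every cell named in a group file. It only constructs and prints the command; it does not run it. The group file name `cell_group`, the user `celladmin`, and the helper `dcli_command` are assumptions for this example, and the `-g` and `-l` options reflect common dcli usage that you should confirm against the utility's help output for your release.

```python
import shlex

def dcli_command(group_file, user, cellcli_command):
    """Build an argv list that runs one CellCLI command on each cell in group_file."""
    # -g names a file listing the target cells; -l names the remote user (assumed usage).
    return ["dcli", "-g", group_file, "-l", user, f"cellcli -e {cellcli_command}"]

argv = dcli_command("cell_group", "celladmin", "list cell detail")
print(shlex.join(argv))
```

Building the command as a list keeps the CellCLI text as a single argument, so it can be passed to a process launcher without additional shell quoting.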
To manage a cell remotely from a compute node, you can use the ExaCLI utility. ExaCLI enables you to run most CellCLI commands on a cell. This is necessary if you do not have direct access to a cell to run CellCLI, or if SSH service on the cell has been disabled. To run commands on multiple cells remotely, you can use the exadcli utility.
1.3.8 About Database Server Software
Oracle software is installed on the Exadata database servers.
Oracle Exadata System Software works seamlessly with Oracle Database. The software on the database servers includes:
- Oracle Database instance, which contains the set of Oracle Database background processes that operate on the stored data and the shared allocated memory that those processes use to do their work.
- Oracle Automatic Storage Management (Oracle ASM), which provides storage management optimized for the database and Oracle Exadata Storage Servers. Oracle ASM is part of Oracle Grid Infrastructure. The Oracle ASM instance handles placement of data files on disks, operating as a metadata manager. The Oracle ASM instance is primarily active during file creation and extension, or during disk rebalancing following a configuration change. Run-time I/O operations are sent directly from the database to storage cells without passing through an Oracle ASM instance.
- The Oracle Database Resource Manager, which ensures that I/O resources are properly allocated within a database.
The iDB protocol is used by the database instance to communicate with cells, and is implemented in an Oracle-supplied library statically linked with the database server.
1.3.9 About Oracle Enterprise Manager for Oracle Exadata Database Machine
Oracle Enterprise Manager provides a complete target that enables you to monitor Oracle Exadata Database Machine, including configuration and performance, in a graphical user interface (GUI).
The following figure shows the Exadata Storage Server Grid home page. Viewing this page, you can quickly see the health of the storage servers, key storage performance characteristics, and resource utilization of storage by individual databases.
Figure 1-4 Exadata Storage Server Grid home page in Oracle Enterprise Manager
In addition to reports, Oracle Enterprise Manager enables you to set metric thresholds for alerts and monitor metric values to determine the health of your Exadata systems.