Exadata Database Machine Technical Architecture
December 2024
Copyright © 2024 Oracle and/or its affiliates.
This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.
The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.
If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:
U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.
This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.
This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.
Oracle Exadata Database Machine features scale-out industry-standard database servers, scale-out intelligent storage servers, and high-speed internal RDMA Network Fabric that connects the database and storage servers.
In a single rack, you can start with a base configuration containing two database servers and three storage servers and expand to an elastic configuration with up to 19 database and storage servers in total.
Exadata Database Machine also includes network switches to connect the database servers to the storage servers, and you can add an optional spine switch to connect multiple racks.
Note: All specifications are for Exadata X11M racks. See the related resources for more details about Exadata X11M and information about prior models.
Oracle Exadata Database Machine includes equipment to connect the system to your network. The network connections allow clients to connect to the database servers and also enable remote system administration.
In a single-rack configuration, Oracle Exadata Database Machine includes several networked components.
Exadata Database Machine provides the following networks and interfaces:
The bonded client network interface is named BONDETH0. If additional networks are configured, the first additional network bonded interface name is BONDETH1, the second additional network bonded interface name is BONDETH2, and so on.
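For example, on a Linux-based database server you can inspect the state of the client network bond through the kernel bonding driver (a minimal sketch, assuming the default OS interface name bondeth0):

    # Show bonding mode, link state, and member interfaces
    cat /proc/net/bonding/bondeth0

    # List all bond master interfaces on the server
    cat /sys/class/net/bonding_masters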
You can connect up to 14 Exadata X11M racks together before external RDMA Network Fabric switches are required.
The diagram shows the RDMA Network Fabric architecture for two interconnected X11M racks. Each rack illustration shows two database servers (1 and n) and two storage servers (1 and m) representing all the database and storage servers in the rack.
Each rack has one spine switch and two leaf switches. The connections between the leaf switches are removed and replaced by connections between every spine switch and every leaf switch. As shown in the diagram, when connecting two racks, each spine switch has seven connections to each leaf switch. All database and storage servers connect to both leaf switches, the same as in a single rack.
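As a worked check of the cabling implied by the diagram, using the numbers above (two racks, two spine switches, four leaf switches):

    connections per spine switch = 4 leaf switches x 7 = 28 links
    uplinks per leaf switch      = 2 spine switches x 7 = 14 links
    total inter-switch links     = 2 x 4 x 7 = 56 links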
When deploying Oracle Exadata Database Machine, you can choose between the following database server deployment options:
Regardless of the chosen deployment option, the default operating system user accounts on each database server are root and oracle. The grid user account is also created if you choose to configure role separation.
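To confirm these accounts on a database server, you can query them directly (a minimal sketch; the grid account appears only when role separation is configured):

    # Standard Exadata OS accounts
    id root
    id oracle

    # Present only with role separation
    id grid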
When configuring Oracle Exadata Database Machine, you can choose High Capacity (HC) or Extreme Flash (EF) storage servers. HC storage servers contain high-performance flash memory and hard disk drives (HDDs). EF storage servers have an all-flash configuration.
Exadata X11M HC and EF storage server models are also equipped with additional memory (DDR5 DRAM) for Exadata RDMA Memory Cache (XRMEM cache), which supports high-performance data access using Remote Direct Memory Access (RDMA).
Note: All specifications are for Exadata X11M storage servers. See the related resources for more details about Exadata X11M and information about prior models.
The Exadata X11M system family contains the following High Capacity (HC) storage server offerings:
Each storage server runs Oracle Exadata System Software to process data at the storage level and pass on only what is needed to the database servers.
On HC storage servers, the flash devices primarily support Exadata Smart Flash Cache, which automatically caches frequently used data in high-performance flash memory. Also, Exadata Smart Flash Log uses a small portion of flash memory as temporary storage to reduce latency and increase throughput for redo log writes.
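For example, you can inspect both features on a storage server with CellCLI (a minimal sketch; sizes and status values vary by deployment):

    CellCLI> LIST FLASHCACHE DETAIL
    CellCLI> LIST FLASHLOG DETAIL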
Starting with Exadata System Software release 24.1, Oracle Exadata Exascale transforms Exadata storage management by decoupling Oracle Database and GI clusters from the underlying Exadata storage servers. Exascale software services manage pools of storage that span the fleet of Exadata storage servers and service multiple users and database server clusters.
Exadata Storage Server X11M Extreme Flash (EF) is the premium all-flash extreme-performance Exadata storage server offering. Each Exadata X11M EF storage server includes the following hardware components:
As on HC storage servers, flash on EF storage servers primarily supports Exadata Smart Flash Cache, which automatically caches frequently used data in high-performance flash memory; on EF, this role falls to the 6.8 TB performance-optimized flash devices. Likewise, Exadata Smart Flash Log uses a small portion of flash memory as temporary storage to reduce latency and increase throughput for redo log writes. However, unlike HC storage servers, which store data on hard disk drives (HDDs), EF storage servers place data storage on 30.72 TB capacity-optimized flash devices, providing much lower latency than HDDs.
Oracle Exadata System Software provides database-aware storage services, such as the ability to offload SQL and other database processing from the database server. The database and storage servers both contain components of the Exadata System Software.
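One way to observe this offload behavior from the database is to compare the I/O eligible for predicate offload with the bytes actually returned by Smart Scan (a minimal SQL sketch using standard v$sysstat statistics):

    SELECT name, value
    FROM   v$sysstat
    WHERE  name IN ('cell physical IO bytes eligible for predicate offload',
                    'cell physical IO interconnect bytes returned by smart scan');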
With the introduction of Oracle Exadata Exascale in Exadata System Software release 24.1 (described above), you can choose from the following storage configuration options:
Each database server includes the following software components:
Each storage server contains data storage hardware and Exadata System Software to manage the data. The software includes the following components:
Administrators manage the database and storage servers using secure network connections over the administration network. In addition to CellCLI and DBMCLI, administrators can use the following command-line interfaces:
Note: This slide lists the major Exadata System Software components. See the related resources for more information.
With the introduction of Exascale, some new software is located on the Exadata database servers.
From an end-user perspective, Oracle Database functionality remains essentially the same. However, the database kernel is modified internally to provide seamless support for Exascale. Instead of using a separate ASM instance, databases on Exascale maintain a mapping table in the SGA. This table is a relatively small directory that enables the database to locate the appropriate storage server for any given data. The database instance also contains two new background processes (EGSB and EDSB), which maintain instance-level metadata about the Exascale cluster (also known as Exascale global services, or EGS) and Exascale vaults (also known as Exascale data stores, or EDS). Note that with Exascale, database clients direct I/O requests straight to the appropriate Exadata storage server; I/O does not flow through EGSB or EDSB.
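For example, on an instance using Exascale you could check for these background processes (a minimal SQL sketch; the PADDR filter restricts the output to running processes):

    SELECT name, description
    FROM   v$bgprocess
    WHERE  name IN ('EGSB', 'EDSB')
    AND    paddr <> HEXTORAW('00');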
On each Exadata database server, the Exadata System Software also contains new software components, including:
The Exascale node proxy (ESNP) service maintains information about the current state of the Exascale cluster, which it provides to local Oracle Grid Infrastructure and Oracle Database processes.
The Exascale Direct Volume (EDV) service exposes Exascale block storage as raw block devices on Exadata compute nodes. EDV-managed storage can be used as raw storage devices or to support various file systems, including Oracle Advanced Cluster File System (ACFS). This service is required on all Exadata compute nodes where you want to use EDV devices.
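As an illustration of the raw-device usage described above, the following sketch formats and mounts an EDV-backed device with standard Linux tools. The device path /dev/exc/myvol and the mount point are hypothetical, for illustration only:

    # Hypothetical EDV device path; actual paths depend on your attachment
    mkfs.xfs /dev/exc/myvol
    mkdir -p /u05/appdata
    mount /dev/exc/myvol /u05/appdata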
Exascale cluster services, also known as Exascale global services (EGS), provide the core foundation for the Exascale system. For high availability, EGS runs in a cluster of five service instances. Each EGS instance typically runs on an Exadata storage server. However, for Exadata configurations with fewer than five storage servers, EGS instances run on the Exadata database servers to make up the required number.
The block store worker service (BSW) primarily services requests from block store clients and performs the resulting storage server I/O. BSW instances usually run on the Exadata storage servers. However, BSW can also run on the Exadata database servers. This location enables the BSW instance to access the Exadata client network, which is required to service iSCSI initiators external to Exadata or run volume backups to Oracle Cloud Infrastructure (OCI) object storage.
Oracle Grid Infrastructure continues to provide cluster services for Exadata databases. However, databases that use Exascale storage do not require an ASM instance.
On each Exadata storage server, the Exadata System Software contains new Exascale software components, including:
Exascale cluster services, also known as Exascale global services (EGS), provide the core foundation for the Exascale system. EGS primarily manages the storage that is allocated to Exascale storage pools. It also manages storage cluster membership and provides security and identity services for storage servers and Exascale clients. Exascale cluster services use the Raft consensus algorithm. For high availability, EGS runs in a cluster of five service instances. Each EGS instance typically runs on an Exadata storage server. However, for Exadata configurations with fewer than five storage servers, EGS instances run on the Exadata database servers to make up the required number.
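As a sanity check on the five-instance design (a worked note based on general Raft majority rules, not an Oracle specification):

    quorum size                     = floor(5/2) + 1 = 3 instances
    tolerated simultaneous failures = 5 - 3 = 2 instances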
Exascale control services, also known as Exascale RESTful Services (ERS), provide the endpoint for Exascale management operations. All Exascale management operations, including all ESCLI commands, pass through ERS. However, no file I/O operations pass through ERS.
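For example, a command-line listing of vaults is a management operation that passes through ERS (a minimal sketch, assuming the ESCLI lsvault command; reading or writing the files inside those vaults does not involve ERS):

    # List the vaults visible to the current Exascale user
    escli lsvault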
Exascale vault manager, also known as Exascale data services (EDS), is the collective name for the Exascale software services that manage file and vault metadata:
The system vault manager service (SYSEDS) serves and manages the metadata for Exascale vaults. This metadata includes vault-level access control lists (ACLs) and attributes.
The user vault manager service (USREDS) serves and manages the metadata for files inside the Exascale vaults. This metadata includes file-level access control lists (ACLs) and attributes, along with metadata that defines clones and snapshots. All file control operations, such as open and close, are serviced by the user vault manager service.
The block store manager service (BSM) serves and manages the metadata for Exascale block storage. All block store management operations are serviced by BSM. These operations include creating a volume, attaching a volume to an iSCSI initiator, creating a volume snapshot, and so on. BSM also coordinates the block store worker processes and maintains the availability of the block store virtual IP (VIP) addresses used by iSCSI targets.
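From the initiator side, attaching to an Exascale block volume uses standard open-iscsi tooling pointed at a block store VIP (a sketch; the VIP address and target IQN are hypothetical):

    # Discover iSCSI targets behind a block store VIP
    iscsiadm -m discovery -t sendtargets -p 192.0.2.10

    # Log in to a discovered target
    iscsiadm -m node -T iqn.2024-01.example:target0 -p 192.0.2.10 --login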
The block store worker service (BSW) primarily services requests from block store clients and performs the resulting storage server I/O. BSW also plays a role in clone and snapshot creation operations and is responsible for performing volume backup and restore operations.
The instance failure detection (IFD) service is a dedicated lightweight service that quickly detects and responds to any storage server failure. IFD automatically runs on every storage server that is associated with the Exascale cluster.
Apart from EGS, which always runs in a cluster of five service instances, and IFD, which runs on every Exascale storage server, the other Exascale services (ERS, SYSEDS, USREDS, BSM, and BSW) all run multiple instances spread across the available storage servers to provide high availability and share the workload. The exact placement of these Exascale service instances depends on the number of storage servers in the Exascale cluster. Consequently, on a system with only a few storage servers, each will host many different service instances. However, on a system with many storage servers, each might host only a few service instances (possibly none).
Exascale works in conjunction with, and relies on, the core Exadata cell services. Specifically, Exascale requires running instances of Cell Server (CELLSRV), Management Server (MS), and Restart Server (RS) on every storage server.
Every Exadata storage server includes multiple physical disks, which can be hard disk drives (HDDs) or flash devices.
Each physical disk has a logical representation in the operating system (OS) known as a Logical Unit Number (LUN). Typically, there is a one-to-one relationship between physical disks and LUNs on all Exadata storage server models. However, on Exadata Extreme Flash (EF) storage servers with four 30.72 TB capacity-optimized flash devices, each capacity-optimized flash device is configured with two LUNs, resulting in eight LUNs across those four devices on each storage server.
A cell disk reserves the space on a LUN for use by Exadata System Software. A LUN may contain only one cell disk.
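You can trace this disk-to-LUN-to-cell-disk mapping directly on a storage server (a minimal CellCLI sketch):

    CellCLI> LIST PHYSICALDISK
    CellCLI> LIST LUN
    CellCLI> LIST CELLDISK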
With the introduction of Exascale, storage is physically organized in Exascale storage pools, each containing numerous pool disks. While one Exadata cell disk can accommodate multiple Exascale pool disks (as indicated in the diagram), this is usually unnecessary because Exascale provides additional facilities that securely share storage pool resources amongst numerous tenants.
On systems that also use Oracle ASM, you can continue to define multiple grid disks on the available space in each cell disk.
An Exascale vault is a logical storage container that uses the physical resources provided by Exascale storage pools. By default, a vault can use all the associated storage pool resources. However, an Exascale administrator can limit the amount of space, I/O resources (I/Os per second, or IOPS), and cache resources associated with each vault.
To an end-user and Oracle Database, a vault appears like a top-level directory that contains files. Referencing files on Exascale is essentially the same as using an ASM disk group, except that Exascale uses the convention of beginning vault names with the at sign (@) character (for example, @VAULT1) instead of the plus sign (+) character (for example, +DATA).
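For example, creating a tablespace on Exascale differs from the ASM equivalent only in the file name prefix (a minimal SQL sketch; the vault, disk group, and tablespace names are illustrative):

    -- On Exascale: place the data file in the vault @VAULT1
    CREATE TABLESPACE app_data DATAFILE '@VAULT1' SIZE 10G;

    -- The ASM equivalent: place the data file in the disk group +DATA
    CREATE TABLESPACE app_data DATAFILE '+DATA' SIZE 10G;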
However, Exascale vaults are much more sophisticated than ASM disk groups.
Exascale vaults facilitate strict data separation, ensuring that data is isolated to specific users and separated from other data and users. A vault, and its contents, are inaccessible to users without the appropriate privileges. For example, without the correct entitlements, users of one vault cannot see another vault, even though data from both vaults is striped across the same underlying storage pool (as illustrated in the diagram).
Furthermore, Exascale inherently distinguishes between various file types and automatically places data files and associated recovery files on separate pool disks. This enables users and databases to maintain all files within one vault instead of different disk groups for data and recovery files.
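Consequently, a database on Exascale can point both file destinations at the same vault, where ASM deployments typically use separate disk groups such as +DATA and +RECO (a minimal sketch; the vault name is illustrative):

    ALTER SYSTEM SET db_create_file_dest = '@VAULT1';
    ALTER SYSTEM SET db_recovery_file_dest_size = 1024G;
    ALTER SYSTEM SET db_recovery_file_dest = '@VAULT1';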
On systems that also use Oracle ASM, you can continue to define and use ASM disk groups, which reside alongside the Exascale storage pools.