3 Maintaining Oracle Exadata Storage Servers

Oracle Exadata Storage Servers contain disks and memory devices that might require maintenance.

Note:

  • All procedures in this chapter are applicable to Oracle Exadata and Oracle Exadata Storage Expansion Rack.
  • For ease of reading, the name "Oracle Exadata Rack" is used when information refers to both Oracle Exadata and Oracle Exadata Storage Expansion Rack.

3.1 Maintaining Oracle Exadata Storage Servers

This section describes how to perform maintenance on Oracle Exadata Storage Servers.

3.1.1 Shutting Down Exadata Storage Server

When performing maintenance on Exadata Storage Servers, it may be necessary to power down or restart the cell.

If Exadata Storage Server is to be shut down when one or more databases are running, then you must verify that taking Exadata Storage Server offline will not impact Oracle ASM disk group and database availability. The ability to take Exadata Storage Server offline without affecting database availability depends on the level of Oracle ASM redundancy used on the affected disk groups. Availability also depends on the current status of disks in other Exadata Storage Servers that have mirror copies of data for the Exadata Storage Server that you are taking offline.

  1. Optional: Configure the grid disks to remain offline after restarting the cell.
    If you are planning to have multiple restarts, or you want to control when the Exadata Storage Server becomes active again, then you can perform this step. Making the grid disks inactive allows you to verify the planned maintenance activity was successful before making the grid disks available again.
    1. Set the grid disks to inactive.
      CellCLI> ALTER GRIDDISK ALL INACTIVE
      
    2. Wait at least 30 seconds, or until Oracle ASM has completed taking the corresponding Oracle ASM disks offline.
      This step is very important if you are using versions of Oracle Exadata System Software before release 18.1. If you put the commands into a script, then make sure to add a sleep command with a value over 30 seconds.

    Note:

    If you set the grid disks to inactive, then you must complete step 6 later to activate the grid disks.
  2. Stop the cell services.
    CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
    

    The preceding command checks if any disks are offline, in predictive failure status, or need to be copied to their mirrors. If Oracle ASM redundancy is intact, then the command takes the grid disks offline in Oracle ASM, and then stops the cell services. If the following error is displayed, then it may not be safe to stop the cell services because a disk group may be forced to dismount due to reduced redundancy.

    Stopping the RS, CELLSRV, and MS services...
    The SHUTDOWN of ALL services was not successful.
    CELL-01548: Unable to shut down CELLSRV because disk group DATA, RECO may be
    forced to dismount due to reduced redundancy.
    Getting the state of CELLSRV services... running
    Getting the state of MS services... running
    Getting the state of RS services... running
    

    If the CELL-01548 error occurs, then restore Oracle ASM disk group redundancy and retry the command after the status of all disks has returned to normal.

  3. Shut down the Exadata Storage Server.
  4. After performing the maintenance, restart the Exadata Storage Server. The cell services are started automatically. As part of the Exadata Storage Server startup, all grid disks are automatically changed to ONLINE in Oracle ASM.
  5. Verify that all grid disks have been successfully brought online.
    CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
    

    Wait until asmmodestatus shows ONLINE or UNUSED for all grid disks (a sample polling loop appears after this procedure).

  6. Optional: Change the grid disk status to ACTIVE.

    This step is only necessary when step 1 has been performed. If step 1 was not performed, then the grid disks were set to online automatically when the Exadata Storage Server was restarted.

    CellCLI> ALTER GRIDDISK ALL ACTIVE
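
As a convenience for step 5, a shell loop similar to the following can poll until every grid disk reports ONLINE or UNUSED. This is a minimal sketch, not part of the standard procedure; it assumes the cellcli utility is in the PATH on the storage server.

# Poll until no grid disk reports an asmmodestatus other than ONLINE or UNUSED
while cellcli -e "LIST GRIDDISK ATTRIBUTES name, asmmodestatus" | grep -vE 'ONLINE|UNUSED'
do
    sleep 30
done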

3.1.2 Checking Status of a Rebalance Operation

When dropping or adding a disk, you can check the status of the Oracle ASM rebalance operation.

  • The rebalance operation may have completed successfully. Check the Oracle ASM alert logs to confirm.

  • The rebalance operation may be currently running. Check the GV$ASM_OPERATION view to determine if the rebalance operation is still running.

  • The rebalance operation may have failed. Check the ERROR column of the GV$ASM_OPERATION view to determine whether the rebalance operation failed.

  • Rebalance operations from multiple disk groups can be done on different Oracle ASM instances in the same cluster if the physical disk being replaced contains Oracle ASM disks from multiple disk groups. One Oracle ASM instance can run one rebalance operation at a time. If all Oracle ASM instances are busy, then rebalance operations are queued.
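
For example, a query similar to the following shows any rebalance operations that are still running across the cluster; no rows returned means no rebalance is in progress:

SQL> SELECT inst_id, operation, state, power, sofar, est_work, est_minutes
     FROM gv$asm_operation;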

Note:

For Oracle Exadata Storage Servers running Oracle Exadata System Software release 12.1.2.0 with Oracle Database release 12.1.0.2 with BP4, Oracle ASM sends an e-mail about the status of a rebalance operation. In earlier releases, the administrator had to check the status of the operation.

3.1.3 Enabling Network Connectivity with the Diagnostic ISO

If a storage server does not restart, the diagnostic ISO may be needed to access the cell so it can be manually repaired.

The diagnostic ISO should be used only after other boot methods, such as booting from a USB flash drive, have failed.

The following procedure enables networking with the diagnostic ISO so files can be transferred to repair the cell:

  1. Restart the system using the diagnostics.iso file.
    See Booting a Server using the Diagnostic ISO File in Oracle Exadata System Software User's Guide.
  2. Log in to the diagnostics shell as the root user.
    When prompted, enter the diagnostics shell.

    For example:

    Choose from following by typing letter in '()':
    (e)nter interactive diagnostics shell. Must use credentials 
    from Oracle support to login (reboot or power cycle to exit
    the shell),
    (r)estore system from NFS backup archive, 
    Type e to enter the diagnostics shell and log in as the root user.
    If prompted, log in to the system as the root user. If you are prompted for the root user password and do not have it, then contact Oracle Support Services.
  3. Use the following command to avoid endless pings (a consolidated example of steps 3 through 7 appears after this procedure):
    alias ping="ping -c"
    
  4. Make a directory named /etc/network.
  5. Make a directory named /etc/network/if-pre-up.d.
  6. Add the following lines to the /etc/network/interfaces file:
    iface eth0 inet static
    address IP_address_of_cell
    netmask netmask_of_cell
    gateway gateway_IP_address_of_cell
    
  7. Bring up the eth0 interface using the following command:
    ifup eth0
     

    There may be some warning messages, but the interface is operational.

  8. Use either FTP or the wget command to retrieve the files to repair the cell.
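
Taken together, steps 3 through 7 amount to a short shell session similar to the following sketch; the IP address, netmask, and gateway values are placeholders for your cell's settings:

alias ping="ping -c"
mkdir -p /etc/network/if-pre-up.d    # also creates /etc/network
cat >> /etc/network/interfaces <<EOF
iface eth0 inet static
address 203.0.113.10
netmask 255.255.255.0
gateway 203.0.113.1
EOF
ifup eth0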

3.2 Using Exadata Extended (XT) Storage Servers

Oracle Exadata Extended (XT) Storage Server offers a lower-cost storage option that can be used for infrequently accessed, older, or regulatory data.

3.2.1 About Oracle Exadata Extended (XT) Storage Servers

Oracle Exadata Extended (XT) Storage Servers help you extend the operational and management benefits of Exadata Database Machine to rarely accessed data that must be kept online.

Each XT storage server includes twelve high-capacity disk drives. However, to achieve a lower cost, flash devices are not included. Also, Oracle Exadata System Software licensing is optional, with some software features disabled unless licensed. Hybrid Columnar Compression is included without requiring Oracle Exadata System Software licenses.

XT storage servers use the same RDMA Network Fabric as the other servers in your Oracle Exadata Rack. XT storage servers add storage capacity while remaining transparent to applications, transparent to SQL, and retaining the same operational model. You can use the same security model and encryption used for your other Exadata storage servers.

You can add XT storage servers to Oracle Exadata Rack X4 or newer, including Eighth Rack configurations. You must add at least 2 servers initially. After the initial 2 servers, you can add XT storage servers as needed. To implement high redundancy, you must have a minimum of 3 XT storage servers. XT storage servers follow the same placement patterns as High Capacity (HC) and Extreme Flash (EF) storage servers.

An Oracle ASM disk group should use storage provided by only one type of storage server (HC, EF, or XT). After adding the XT storage servers to your rack, create new disk groups to use the storage. The default disk group name for XT storage servers is XTND. However, you can use a different name as required.
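
For illustration only, a new disk group on XT grid disks might be created with a statement similar to the following; the grid disk prefix, redundancy level, and attribute values are assumptions that you should adapt to your environment:

SQL> CREATE DISKGROUP XTND HIGH REDUNDANCY
     DISK 'o/*/XTND_CD_*'
     ATTRIBUTE 'compatible.asm'          = '19.0.0.0',
               'compatible.rdbms'        = '19.0.0.0',
               'cell.smart_scan_capable' = 'TRUE',
               'au_size'                 = '4M';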

XT storage servers provide fully integrated storage for Oracle Database. You can use the new disk group with database features such as Oracle Partitioning, Oracle Automatic Data Optimization, and Oracle Advanced Compression.

3.2.2 What Data Can Be Stored on Oracle Exadata Extended (XT) Storage Servers?

Oracle Exadata Extended (XT) Storage Servers are intended to provide lower-cost storage for infrequently accessed, older, or regulatory data.

XT storage servers help you to keep all your required data online and available for queries. This includes data such as:

  • Historical data
  • Images, BLOBs, contracts, and other large table-based objects
  • Compliance and regulatory data
  • Local backups

The XT storage servers can also provide storage for development databases, which have less stringent performance requirements than production databases.

3.2.3 Enabling Smart Scan on Exadata Extended (XT) Storage Servers

If you purchase Oracle Exadata System Software licenses for your Oracle Exadata Extended (XT) Storage Servers, then you can enable features such as Smart Scan and Storage Indexes to improve performance.

  1. Procure or transfer Oracle Exadata System Software licenses.

    All drives must be licensed to enable Smart Scan.

  2. Modify the enableSmartStorage attribute for the XT storage servers.

    You do not need to stop the storage servers first. Simply run the following command on each XT storage server that is licensed:

    cellcli -e ALTER CELL enableSmartStorage=true
  3. Verify the cell has been modified.
    cellcli -e "LIST CELL ATTRIBUTES name, status, enableSmartStorage" 

3.3 Maintaining the Hard Disks of Oracle Exadata Storage Servers

Every Oracle Exadata Storage Server in Oracle Exadata Rack has a system area, which is where the Oracle Exadata System Software resides. In Oracle Exadata X7 and later systems, two internal M.2 devices contain the system area. In all other systems, the first two disks of Oracle Exadata Storage Server are system disks and the portions on these system disks are referred to as the system area.

In Oracle Exadata X7 and later systems, all the hard disks in the cell are data disks. In systems prior to Oracle Exadata X7, the non-system area of the system disks, referred to as data partitions, is used for normal data storage. All other disks in the cell are data disks.

Starting in Oracle Exadata System Software release 11.2.3.2.0, if there is a disk failure, then Oracle Exadata System Software sends an alert stating that the disk can be replaced, and, after all data has been rebalanced out from that disk, turns on the blue OK to Remove LED for the hard disk with predictive failure. In Oracle Exadata System Software releases earlier than 11.2.3.2.0, the amber Fault-Service Required LED was turned on for a hard disk with predictive failure, but not the blue LED. In these cases, it is necessary to manually check if all data has been rebalanced out from the disk before proceeding with disk replacement.

Starting with Oracle Exadata System Software release 18.1.0.0.0 and Oracle Exadata X7 systems, there is an additional white Do Not Service LED, which lights when redundancy is reduced to inform system administrators or field engineers that the storage server should not be powered off for servicing. When redundancy is restored, Oracle Exadata System Software automatically turns off the Do Not Service LED to indicate that the cell can be powered off safely.

For a hard disk that has failed, both the blue OK to Remove LED and the amber Fault-Service Required LED are turned on for the drive indicating that disk replacement can proceed. The behavior is the same in all releases. The drive LED light is a solid light in Oracle Exadata System Software releases 11.2.3.2.0 and later; the drive LED blinks in earlier releases.

Note:

Oracle Exadata Rack is online and available while replacing the Oracle Exadata Storage Server physical disks.

This section contains the following topics:

3.3.1 Monitoring the Status of Hard Disks

You can monitor the status of a hard disk by checking its attributes with the CellCLI LIST PHYSICALDISK command.

For example, a hard disk with a status of failed (in earlier releases, the status for failed hard disks was critical) or warning - predictive failure is probably having problems and needs to be replaced. The disk firmware maintains the error counters, and marks a drive with Predictive Failure when internal thresholds are exceeded. The drive, not the cell software, determines whether it needs replacement.

  • Use the CellCLI command LIST PHYSICALDISK to determine the status of a hard disk:
    CellCLI> LIST PHYSICALDISK WHERE disktype=harddisk AND status!=normal DETAIL
             name:                   8:4
             deviceId:               12
             deviceName:             /dev/sde
             diskType:               HardDisk
             enclosureDeviceId:      8
             errOtherCount:          0
             luns:                   0_4
             makeModel:              "HGST    H7280A520SUN8.0T"
             physicalFirmware:       PD51
             physicalInsertTime:     2016-11-30T21:24:45-08:00
             physicalInterface:      sas
             physicalSerial:         PA9TVR
             physicalSize:           7.153663907200098T
             slotNumber:             4
             status:                 failed

When disk I/O errors occur, Oracle ASM performs bad extent repair for read errors due to media errors. The disks will stay online, and no alerts are sent. When Oracle ASM gets a read error on a physically-addressed metadata block, it does not have mirroring for the blocks, and takes the disk offline. Oracle ASM then drops the disk using the FORCE option.

The Oracle Exadata Storage Server hard disk statuses are as follows:

  • Oracle Exadata System Software release 11.2.3.3 and later:

    • normal
    • normal - dropped for replacement
    • normal - confinedOnline
    • normal - confinedOnline - dropped for replacement
    • not present
    • failed
    • failed - dropped for replacement
    • failed - rejected due to incorrect disk model
    • failed - rejected due to incorrect disk model - dropped for replacement
    • failed - rejected due to wrong slot
    • failed - rejected due to wrong slot - dropped for replacement
    • warning - confinedOnline
    • warning - confinedOnline - dropped for replacement
    • warning - peer failure
    • warning - poor performance
    • warning - poor performance - dropped for replacement
    • warning - poor performance, write-through caching
    • warning - predictive failure, poor performance
    • warning - predictive failure, poor performance - dropped for replacement
    • warning - predictive failure, write-through caching
    • warning - predictive failure
    • warning - predictive failure - dropped for replacement
    • warning - predictive failure, poor performance, write-through caching
    • warning - write-through caching
  • Oracle Exadata System Software release 11.2.3.2:

    • normal
    • normal - confinedOnline
    • not present
    • failed
    • failed - rejected due to incorrect disk model
    • failed - rejected due to wrong slot
    • warning - confinedOnline
    • warning - peer failure
    • warning - poor performance
    • warning - poor performance, write-through caching
    • warning - predictive failure, poor performance
    • warning - predictive failure, write-through caching
    • warning - predictive failure
    • warning - predictive failure, poor performance, write-through caching
    • warning - write-through caching
  • Oracle Exadata System Software release 11.2.3.1.1 and earlier:

    • normal
    • critical
    • poor performance
    • predictive failure
    • not present

3.3.2 Monitoring Hard Disk Controller Write-through Caching Mode

The hard disk controller on each Oracle Exadata Storage Server periodically performs a discharge and charge of the controller battery. During the operation, the write cache policy changes from write-back caching to write-through caching.

Write-through cache mode is slower than write-back cache mode. However, write-back cache mode has a risk of data loss if the Oracle Exadata Storage Server loses power or fails. For Oracle Exadata System Software releases earlier than release 11.2.1.3, the operation occurs every month. For Oracle Exadata System Software release 11.2.1.3.0 and later, the operation occurs every three months, for example, at 01:00 on the 17th day of January, April, July and October.

  • To change the start time for the learn cycle, use a command similar to the following:
    CellCLI> ALTER CELL bbuLearnCycleTime="2013-01-22T02:00:00-08:00"
    

    The time reverts to the default learn cycle time after the cycle completes.

  • To see the time for the next learn cycle, use the following command:
    CellCLI> LIST CELL ATTRIBUTES bbuLearnCycleTime
    

    Oracle Exadata Storage Server generates an informational alert about the status of the caching mode for logical drives on the cell, similar to the following:

    HDD disk controller battery on disk controller at adapter 0 is going into a learn
    cycle. This is a normal maintenance activity that occurs quarterly and runs for
    approximately 1 to 12 hours. The disk controller cache might go into WriteThrough
    caching mode during the learn cycle. Disk write throughput might be temporarily
    lower during this time. The message is informational only, no action is required.
    
  • To view the status of the battery, use a command similar to the following example:

    Note:

    If you are running Oracle Exadata System Software 19.1.0 or later, substitute /opt/MegaRAID/storcli/storcli64 for /opt/MegaRAID/MegaCli/MegaCli64 in the following command:
    # /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -GetBbuStatus -a0
    BBU status for Adapter: 0
     
    BatteryType: iBBU08
    Voltage: 3721 mV
    Current: 541 mA
    Temperature: 43 C
     
    BBU Firmware Status:
    Charging Status : Charging
    Voltage : OK
    Temperature : OK
    Learn Cycle Requested : No
    Learn Cycle Active : No
    Learn Cycle Status : OK
    Learn Cycle Timeout : No
    I2c Errors Detected : No
    Battery Pack Missing : No
    Battery Replacement required : No
    Remaining Capacity Low : Yes
    Periodic Learn Required : No
    Transparent Learn : No
     
    Battery state:
     
    GasGuageStatus:
    Fully Discharged : No
    Fully Charged : No
    Discharging : No
    Initialized : No
    Remaining Time Alarm : Yes
    Remaining Capacity Alarm: No
    Discharge Terminated : No
    Over Temperature : No
    Charging Terminated : No
    Over Charged : No
     
    Relative State of Charge: 7 %
    Charger System State: 1
    Charger System Ctrl: 0
    Charging current: 541 mA
    Absolute state of charge: 0 %
    Max Error: 0 %
     
    Exit Code: 0x00

3.3.3 Replacing a Hard Disk Due to Disk Failure

A hard disk outage can cause a reduction in performance and data redundancy. Therefore, the disk should be replaced with a new disk as soon as possible. When the disk fails, the Oracle ASM disks associated with the grid disks on the hard disk are automatically dropped with the FORCE option, and an Oracle ASM rebalance follows to restore the data redundancy.

An Exadata alert is generated when a disk fails. The alert includes specific instructions for replacing the disk. If you have configured the system for alert notifications, then the alert is sent by e-mail to the designated address.

After the hard disk is replaced, the grid disks and cell disks that existed on the previous disk in that slot are re-created on the new hard disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced on them, based on the disk group redundancy and ASM_POWER_LIMIT parameter.

Note:

For storage servers running Oracle Exadata System Software release 12.1.2.0 with Oracle Database release 12.1.0.2 with BP4, Oracle ASM sends an e-mail about the status of a rebalance operation. In earlier releases, the administrator had to check the status of the operation.

For earlier releases, check the rebalance operation status as described in Checking Status of a Rebalance Operation.

The following procedure describes how to replace a hard disk due to disk failure:

  1. Determine the failed disk using the following command:
    CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status=failed DETAIL
    

    The following is an example of the output from the command. The slot number shows the location of the disk, and the status shows that the disk has failed.

    CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status=failed DETAIL
    
             name:                   28:5
             deviceId:               21
             diskType:               HardDisk
             enclosureDeviceId:      28
             errMediaCount:          0
             errOtherCount:          0
             foreignState:           false
             luns:                   0_5
             makeModel:              "SEAGATE ST360057SSUN600G"
             physicalFirmware:       0705
             physicalInterface:      sas
             physicalSerial:         A01BC2
             physicalSize:           558.9109999993816G
             slotNumber:             5
             status:                 failed
    
  2. Ensure the blue OK to Remove LED on the disk is lit before removing the disk.
  3. Replace the hard disk on Oracle Exadata Storage Server and wait for three minutes. The hard disk is hot-pluggable, and can be replaced when the power is on.
  4. Confirm the disk is online.

    When you replace a hard disk, the disk must be acknowledged by the RAID controller before you can use it. This does not take long.

    Use the LIST PHYSICALDISK command similar to the following to ensure the status is NORMAL.

    CellCLI> LIST PHYSICALDISK WHERE name=28:5 ATTRIBUTES status
    
  5. Verify the firmware is correct using the ALTER CELL VALIDATE CONFIGURATION command.

In rare cases, the automatic firmware update may not work, and the LUN is not rebuilt. This can be confirmed by checking the ms-odl.trc file.
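
For reference, the validation and the trace file check might look like the following from the cell operating system; the trace file location shown is typical but can vary by release:

cellcli -e ALTER CELL VALIDATE CONFIGURATION
# Look for recent LUN rebuild messages in the MS trace file (path may vary)
grep -i lun /opt/oracle/cell/log/diag/asm/cell/*/trace/ms-odl.trc | tail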

3.3.4 Replacing a Hard Disk Due to Disk Problems

You may need to replace a hard disk because the disk is in warning - predictive failure status.

The predictive failure status indicates that the hard disk will soon fail, and should be replaced at the earliest opportunity. The Oracle ASM disks associated with the grid disks on the hard drive are automatically dropped, and an Oracle ASM rebalance relocates the data from the predictively failed disk to other disks.

If the drop operation did not complete before the hard drive failed, then refer to Replacing a Hard Disk Due to Disk Failure.

An alert is sent when the disk is removed. After replacing the hard disk, the grid disks and cell disks that existed on the previous disk in the slot are re-created on the new hard disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced based on disk group redundancy and the ASM_POWER_LIMIT parameter.

Note:

On Oracle Exadata Storage Servers running Oracle Exadata System Software release 12.1.2.0 with Oracle Database release 12.1.0.2 with BP4, Oracle ASM sends an e-mail about the status of a rebalance operation. In earlier releases, the administrator had to check the status of the operation.

For earlier releases, check the rebalance operation status as described in Checking Status of a Rebalance Operation.

  1. Determine which disk is the failing disk.
    CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status= \
            "warning - predictive failure" DETAIL
    

    The following is an example of the output. The slot number shows the location of the disk, and the status shows the disk is expected to fail.

    CellCLI> LIST PHYSICALDISK WHERE diskType=HardDisk AND status= \
             "warning - predictive failure" DETAIL
             name:                   28:3
             deviceId:               19
             diskType:               HardDisk
             enclosureDeviceId:      28
             errMediaCount:          0
             errOtherCount:          0
             foreignState:           false
             luns:                   0_3
             makeModel:              "SEAGATE ST360057SSUN600G"
             physicalFirmware:       0705
             physicalInterface:      sas
             physicalSerial:         E07L8E
             physicalSize:           558.9109999993816G
             slotNumber:             3
             status:                 warning - predictive failure
    
  2. Ensure the blue OK to Remove LED on the disk is lit before removing the disk.
  3. Wait until the Oracle ASM disks associated with the grid disks on the hard disk have been successfully dropped. To determine whether the grid disks have been dropped, query the V$ASM_DISK_STAT view on the Oracle ASM instance (a sample query appears after this procedure).

    Caution:

    On all systems prior to Oracle Exadata Database Machine X7, the disks in the first two slots are system disks, which store the operating system and Oracle Exadata System Software. At least one system disk must be in working condition to keep the server operational.

    Wait until ALTER CELL VALIDATE CONFIGURATION shows no mdadm errors, which indicates the system disk resynchronization has completed, before replacing the other system disk.

  4. Replace the hard disk on Oracle Exadata Storage Server and wait for three minutes. The hard disk is hot-pluggable, and can be replaced when the power is on.
  5. Confirm the disk is online.

    When you replace a hard disk, the disk must be acknowledged by the RAID controller before you can use it. This does not take long. Use the LIST PHYSICALDISK command to ensure the status is NORMAL.

    CellCLI> LIST PHYSICALDISK WHERE name=28:3 ATTRIBUTES status
    
  6. Verify the firmware is correct using the ALTER CELL VALIDATE CONFIGURATION command.
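
As referenced in step 3, a query similar to the following can confirm that the Oracle ASM disks for the affected cell disk have been dropped; the disk name pattern is illustrative. No rows returned indicates the drop has completed.

SQL> SELECT name, mode_status, header_status
     FROM v$asm_disk_stat
     WHERE name LIKE '%CD_03_%';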

3.3.5 Replacing a Hard Disk Due to Bad Performance

A single bad hard disk can degrade the performance of other good disks. It is better to remove the bad disk from the system than let it remain.

Starting with Oracle Exadata System Software release 11.2.3.2, an underperforming disk is automatically identified and removed from active configuration. Oracle Exadata Database Machine then runs a set of performance tests. When poor disk performance is detected by CELLSRV, the cell disk status changes to normal - confinedOnline, and the hard disk status changes to warning - confinedOnline.

The following conditions trigger disk confinement:

  • Disk stopped responding. The cause code in the storage alert log is CD_PERF_HANG.

  • Slow cell disk such as the following:

    • High service time threshold (cause code CD_PERF_SLOW_ABS)

    • High relative service time threshold (cause code CD_PERF_SLOW_RLTV)

  • High read or write latency such as the following:

    • High latency on writes (cause code CD_PERF_SLOW_LAT_WT)

    • High latency on reads (cause code CD_PERF_SLOW_LAT_RD)

    • High latency on reads and writes (cause code CD_PERF_SLOW_LAT_RW)

    • Very high absolute latency on individual I/Os happening frequently (cause code CD_PERF_SLOW_LAT_ERR)

  • Errors such as I/O errors (cause code CD_PERF_IOERR).

If the disk problem is temporary and passes the tests, then it is brought back into the configuration. If the disk does not pass the tests, then it is marked as poor performance, and Oracle Auto Service Request (ASR) submits a service request to replace the disk. If possible, Oracle ASM takes the grid disks offline for testing. If Oracle ASM cannot take the disks offline, then the cell disk status stays at normal - confinedOnline until the disks can be taken offline safely.

The disk status change is associated with the following entry in the cell alert history:

MESSAGE ID date_time info "Hard disk entered confinement status. The LUN
 n_m changed status to warning - confinedOnline. CellDisk changed status to normal
 - confinedOnline. Status: WARNING - CONFINEDONLINE  Manufacturer: name  Model
 Number: model  Size: size  Serial Number: serial_number  Firmware: fw_release 
 Slot Number: m  Cell Disk: cell_disk_name  Grid Disk: grid disk 1, grid disk 2
 ... Reason for confinement: threshold for service time exceeded"

The following would be logged in the storage cell alert log:

CDHS: Mark cd health state change cell_disk_name  with newState HEALTH_BAD_
ONLINE pending HEALTH_BAD_ONLINE ongoing INVALID cur HEALTH_GOOD
Celldisk entering CONFINE ACTIVE state with cause CD_PERF_SLOW_ABS activeForced: 0
inactiveForced: 0 trigger HistoryFail: 0, forceTestOutcome: 0 testFail: 0
global conf related state: numHDsConf: 1 numFDsConf: 0 numHDsHung: 0 numFDsHung: 0
...

Note:

In releases earlier than Oracle Exadata System Software release 11.2.3.2, use the CALIBRATE command to identify a bad hard disk, and look for very low throughput and IOPS for each hard disk.

The following procedure describes how to remove a hard disk once the bad disk has been identified:

  1. Illuminate the hard drive service LED to identify the drive to be replaced using a command similar to the following, where disk_name is the name of the hard disk to be replaced, such as 20:2:
    cellcli -e 'alter physicaldisk disk_name serviceled on'
    
  2. Find all the grid disks on the bad disk.

    For example:

    [root@exa05celadm03 ~]# cellcli -e "list physicaldisk 20:11 attributes name, id"
            20:11 RD58EA 
    [root@exa05celadm03 ~]# cellcli -e "list celldisk where physicalDisk='RD58EA'"
            CD_11_exa05celadm03 normal 
    [root@exa05celadm03 ~]# cellcli -e "list griddisk where cellDisk='CD_11_exa05celadm03'"
            DATA_CD_11_exa05celadm03 active
            DBFS_CD_11_exa05celadm03 active
            RECO_CD_11_exa05celadm03 active
            TPCH_CD_11_exa05celadm03 active
  3. Direct Oracle ASM to stop using the bad disk immediately.
    SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name;
    
  4. Ensure the blue OK to Remove LED on the disk is lit before removing the disk.
  5. Ensure that the Oracle ASM disks associated with the grid disks on the bad disk have been successfully dropped by querying the V$ASM_DISK_STAT view.
  6. Remove the badly performing disk. An alert is sent when the disk is removed.
  7. When a new disk is available, install the new disk in the system. The cell disks and grid disks are automatically created on the new hard disk.

    Note:

    When a hard disk is replaced, the disk must be acknowledged by the RAID controller before it can be used. The acknowledgement does not take long, but use the LIST PHYSICALDISK command to ensure the status is NORMAL.
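
After the replacement disk is active, the service LED that was lit in step 1 can be turned off again with a command similar to the following; the disk name is the example value from step 1:

cellcli -e 'alter physicaldisk 20:2 serviceled off'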

3.3.6 Replacing a Hard Disk Proactively

Exadata Storage software has a complete set of automated operations for hard disk maintenance, when a hard disk has failed or has been flagged as a problematic disk. But there are situations where a hard disk has to be removed proactively from the configuration.

In the CellCLI ALTER PHYSICALDISK command, the DROP FOR REPLACEMENT option checks if a normal functioning hard disk can be removed safely without the risk of data loss. However, after the execution of the command, the grid disks on the hard disk are inactivated on the storage cell and set to offline in the Oracle ASM disk groups.

To reduce the risk of having a disk group without full redundancy and proactively replace a hard disk, follow this procedure:

  1. Identify the LUN, cell disk, and grid disk associated with the hard disk.

    Use a command similar to the following, where X:Y identifies the hard disk name of the drive you are replacing.

    # cellcli -e "list diskmap" | grep 'X:Y'

    The output should be similar to the following:

       20:5            KEBTDJ          5                       normal  559G           
        CD_05_exaceladm01    /dev/sdf                
        "DATAC1_CD_05_exaceladm01, DBFS_DG_CD_05_exaceladm01, 
         RECOC1_CD_05_exaceladm01"
    

    To get the LUN, issue a command similar to the following:

    CellCLI> list lun where deviceName='/dev/sdf'
             0_5     0_5     normal
    
  2. Drop the disk.
    • If you are using at least Oracle Exadata System Software release 21.2.0, use the following command to drop the physical disk while maintaining redundancy:

      CellCLI> alter physicaldisk X:Y drop for replacement maintain redundancy

      Wait for the operation to complete before continuing.

    • If you are using an Oracle Exadata System Software release before 21.2.0, do the following:

      1. Drop the affected grid disks from the Oracle ASM disk groups in normal mode.

        SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name;
      2. Wait for the ASM rebalance operation to complete before continuing.

      3. Drop the physical disk.

        Use a command similar to the following, where X:Y identifies the hard disk name of the drive you are replacing.

        CellCLI> alter physicaldisk X:Y drop for replacement
  3. Ensure that the blue OK to Remove LED on the disk is lit before removing the disk.
  4. Replace the hard disk with the new one.
  5. Verify the LUN, cell disk and grid disk associated with the hard disk were created.
    CellCLI> list lun lun_name
    CellCLI> list celldisk where lun=lun_name
    CellCLI> list griddisk where celldisk=celldisk_name
  6. Verify the grid disk was added to the Oracle ASM disk groups.

    The following query should return no rows.

    SQL> SELECT path,header_status FROM v$asm_disk WHERE group_number=0;

    The following query shows whether all the failure groups have the same number of disks:

    SQL> SELECT group_number, failgroup, mode_status, count(*) FROM v$asm_disk
         GROUP BY group_number, failgroup, mode_status;

3.3.7 Moving All Drives to Another Exadata Storage Server

It may be necessary to move all drives from one Exadata Storage Server to another Exadata Storage Server.

This need may occur when there is a chassis-level component failure, such as a motherboard or ILOM failure, or when troubleshooting a hardware problem.

  1. Back up the files in the following directories (a sample backup command appears after this procedure):
    • /etc/hosts

    • /etc/modprobe.conf

    • /etc/sysconfig/network

    • /etc/sysconfig/network-scripts

  2. Safely inactivate all grid disks and shut down Exadata Storage Server.

    Refer to "Shutting Down Exadata Storage Server". Make sure the Oracle ASM disk_repair_time attribute is set to a sufficiently large enough value so Oracle ASM does not drop the disks before the grid disks can be activated in another Exadata Storage Server.

  3. Move the hard disks, flash disks, disk controller and USB flash drive from the original Exadata Storage Server to the new Exadata Storage Server.

    Caution:

    • Ensure the first two disks, which are the system disks, are in the same first two slots. Failure to do so causes the Exadata Storage Server to function improperly.

    • Ensure the flash cards are installed in the same PCIe slots as the original Exadata Storage Server.

  4. Power on the new Exadata Storage Server using either the service processor interface or by pressing the power button.
  5. Log in to the console using the service processor or the KVM switch.
  6. Check the files in the following directories. If they are corrupted, then restore them from the backups.
    • /etc/hosts

    • /etc/modprobe.conf

    • /etc/sysconfig/network

    • /etc/sysconfig/network-scripts

  7. Use the ifconfig command to retrieve the new MAC address for eth0, eth1, eth2, and eth3. For example:
    # ifconfig eth0
    eth0      Link encap:Ethernet  HWaddr 00:14:4F:CA:D9:AE
              inet addr:10.204.74.184  Bcast:10.204.75.255  Mask:255.255.252.0
              inet6 addr: fe80::214:4fff:feca:d9ae/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
              RX packets:141455 errors:0 dropped:0 overruns:0 frame:0
              TX packets:6340 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:9578692 (9.1 MiB)  TX bytes:1042156 (1017.7 KiB)
              Memory:f8c60000-f8c80000
    
  8. Edit the ifcfg-eth0 file, ifcfg-eth1 file, ifcfg-eth2 file, and ifcfg-eth3 file in the /etc/sysconfig/network-scripts directory to change the HWADDR value based on the output from step 7 (a sample edit command appears after this procedure). The following is an example of the ifcfg-eth0 file:
    #### DO NOT REMOVE THESE LINES ####
    #### %GENERATED BY CELL% ####
    DEVICE=eth0
    BOOTPROTO=static
    ONBOOT=yes
    IPADDR=10.204.74.184
    NETMASK=255.255.252.0
    NETWORK=10.204.72.0
    BROADCAST=10.204.75.255
    GATEWAY=10.204.72.1
    HOTPLUG=no
    IPV6INIT=no
    HWADDR=00:14:4F:CA:D9:AE
    
  9. Restart Exadata Storage Server.
  10. Activate the grid disks using the following command:
    CellCLI> ALTER GRIDDISK ALL ACTIVE
    

    If the Oracle ASM disks on the cell have not been dropped, then they change to ONLINE automatically and are used again.

  11. Validate the configuration using the following command:
    CellCLI> ALTER CELL VALIDATE CONFIGURATION
    
  12. Activate the ILOM for ASR.
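
As noted in steps 1 and 8, the backup and the HWADDR edits can be scripted. The following is a minimal sketch in which the archive path and MAC address are example values:

# Step 1: back up the network configuration files
tar czf /root/cell-net-backup.tar.gz /etc/hosts /etc/modprobe.conf \
    /etc/sysconfig/network /etc/sysconfig/network-scripts

# Step 8: update the HWADDR line with the new MAC address for each interface
sed -i 's/^HWADDR=.*/HWADDR=00:14:4F:CA:D9:AE/' \
    /etc/sysconfig/network-scripts/ifcfg-eth0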

3.3.8 Repurposing a Hard Disk

You may want to delete all data on a disk, and then use the disk for another purpose.

Before repurposing a hard disk, ensure that you have copies of the data that is on the disk.

If you use this procedure for the system disks (disk 0 and disk 1), then only the data partitions are erased, not the system partitions.

  1. Use the CellCLI LIST command to display the Exadata Storage Server objects. You must identify the grid disks and cell disks on the hard drive. For example:
    CellCLI> LIST PHYSICALDISK
             20:0   D174LX    normal
             20:1   D149R0    normal
             ...
    
  2. Determine the cell disks and grid disks on the LUN, using a command similar to the following:
    CellCLI> LIST LUN WHERE physicalDrives='20:0' DETAIL
      name:              0_0
      deviceName:        /dev/sda
      diskType:          HardDisk
      id:                0_0
      isSystemLun:       TRUE
      lunSize:           557.861328125G
      lunUID:            0_0
      physicalDrives:    20:0
      raidLevel:         0
      lunWriteCacheMode: "WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU"
      status:            normal
    

    To get the cell disks and grid disks, use a command similar to the following:

    # cellcli -e "list diskmap" | grep 20:0
    
       20:0            K68DWJ          0                       normal  559G
       CD_00_burd01celadm01    /dev/sda3   
       "DATAC1_CD_00_burd01celadm01, RECOC1_CD_00_burd01celadm01"
  3. From Oracle ASM, drop the Oracle ASM disks on the hard disk using the following command:
    SQL> ALTER DISKGROUP diskgroup_name DROP DISK asm_disk_name;
    
  4. From Exadata Storage Server, drop the cell disks and grid disks on the hard disk using the following command:
    CellCLI> DROP CELLDISK celldisk_on_this_lun FORCE 
    

    Note:

    To overwrite all data on the cell disk, use the ERASE option with the DROP CELLDISK command. The following is an example of the command:

    CellCLI> DROP CELLDISK CD_03_cell01 ERASE=1pass NOWAIT
    
    CellDisk CD_03_cell01 erase is in progress
    
  5. Drop the drive for hot removal. For example:
    CellCli> ALTER PHYSICALDISK 20:0 DROP FOR REPLACEMENT
    
  6. Ensure the blue OK to Remove LED on the disk is lit before removing the disk.

    Caution:

    Ensure the disk blue LED is turned on before removing the drive. Do not remove the drive if the disk blue LED is unlit, or it may cause your system to crash.

  7. Remove the disk to be repurposed, and insert a new disk.
  8. Wait for the new hard disk to be added as a LUN.
    CellCLI> LIST LUN
    

    The cell disks and grid disks are automatically created on the new hard disk, and the grid disks are added to the Oracle ASM disk group.
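
If you want to script the wait in step 8, a loop similar to the following polls until the LUN for the replaced slot reports a normal status; the slot name 20:0 is the example value from step 2:

# Poll until the LUN on the replaced disk is visible and normal
while ! cellcli -e "LIST LUN WHERE physicalDrives='20:0'" | grep -q normal
do
    sleep 10
done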

3.3.9 Removing and Replacing the Same Hard Disk

What happens if you accidentally remove the wrong hard disk?

If you inadvertently remove the wrong hard disk, then put the disk back. It is automatically added back to the Oracle ASM disk group, and its data is resynchronized.

Note:

When replacing a disk due to disk failure or disk problems, the LED on the disk is lit for identification.

3.3.10 Re-Enabling a Hard Disk That Was Rejected

If a physical disk was rejected because it was inserted into the wrong slot, you can re-enable the disk.

Run the following command:

Caution:

The following command removes all data on the physical disk.

CellCLI> ALTER PHYSICALDISK hard_disk_name reenable force

The following is an example of the output from the command:

Physical disk 20:0 was reenabled.

3.4 Maintaining Flash Disks on Oracle Exadata Storage Servers

Data is mirrored across Exadata Cells, and write operations are sent to at least two storage cells. If a flash card in one Oracle Exadata Storage Server has problems, then the read and write operations are serviced by the mirrored data in another Oracle Exadata Storage Server. No interruption of service occurs for the application.

If a flash card fails while in write-back mode, then Oracle Exadata System Software determines the data in the flash cache by reading the data from the mirrored copy. The data is then written to the cell that had the failed flash card. The location of the data lost in the failed flash cache is saved by Oracle Exadata System Software at the time of the flash failure. Resilvering then starts by replacing the lost data with the mirrored copy. During resilvering, the grid disk status is ACTIVE -- RESILVERING WORKING. If the cache is in write-through mode, then the data in the failed device is already present on the data grid disk, so there is no need for resilvering.

3.4.1 Replacing a Flash Disk Due to Flash Disk Failure

Each Oracle Exadata Storage Server is equipped with flash devices.

Starting with Oracle Exadata Database Machine X7, the flash devices are hot-pluggable on the Oracle Exadata Storage Servers. When performing a hot-pluggable replacement of a flash device on Oracle Exadata Storage Servers for X7 or later, the disk status should be Dropped for replacement, and the power LED on the flash card should be off, which indicates the flash disk is ready for online replacement.

Caution:

Removing a card with the power LED on could result in a system crash. If a failed disk has a status of failed - dropped for replacement but the power LED is still on, contact Oracle Support Services.

For Oracle Exadata Database Machine X6 and earlier, the flash devices are hot-pluggable on Extreme Flash (EF) storage servers, but not on High Capacity (HC) storage servers. On HC storage servers, you must power down the storage server before replacing the flash devices.

To identify a failed flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE disktype=flashdisk AND status=failed DETAIL

The following is an example of the output from an Extreme Flash storage server:


    name:                          NVME_10
    deviceName:                    /dev/nvme7n1
    diskType:                      FlashDisk
    luns:                          0_10
    makeModel:                     "Oracle NVMe SSD"
    physicalFirmware:              8DV1RA13
    physicalInsertTime:            2016-09-28T11:29:13-07:00
    physicalSerial:                CVMD426500E21P6LGN
    physicalSize:                  1.4554837569594383T
    slotNumber:                    10
    status:                        failed

The following is an example of the output from an Oracle Flash Accelerator F160 PCIe Card:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS=failed DETAIL

         name:                   FLASH_5_1
         deviceName:             /dev/nvme1n1
         diskType:               FlashDisk
         luns:                   5_1
         makeModel:              "Oracle Flash Accelerator F160 PCIe Card"
         physicalFirmware:       8DV1RA13
         physicalInsertTime:     2016-11-30T21:24:45-08:00
         physicalSerial:         1030M03UYM
         physicalSize:           1.4554837569594383T
         slotNumber:             "PCI Slot: 5; FDOM: 1"
         status:                 failed

The following is an example of the output from a Sun Flash Accelerator F40 PCIe card:

         name:                   FLASH_5_3
         diskType:               FlashDisk
         luns:                   5_3
         makeModel:              "Sun Flash Accelerator F40 PCIe Card"
         physicalFirmware:       TI35
         physicalInsertTime:     2012-07-13T15:40:59-07:00
         physicalSerial:         5L002X4P
         physicalSize:           93.13225793838501G
         slotNumber:             "PCI Slot: 5; FDOM: 3"
         status:                 failed

For the PCIe cards, the name and slotNumber attributes show the PCI slot and the FDOM number. For Extreme Flash storage servers, the slotNumber attribute shows the NVMe slot on the front panel.

On Oracle Exadata Database Machine X7 and later systems, all flash disks are in the form of an Add-in-Card (AIC), which is inserted into a PCIe slot on the motherboard. The slotNumber attribute shows the PCI number and FDOM number, regardless of whether it is an EF or HC storage server.

If a flash disk is detected to have failed, then an alert is generated indicating that the flash disk, as well as the LUN on it, has failed. The alert message includes either the PCI slot number and FDOM number or the NVMe slot number. These numbers uniquely identify the field replaceable unit (FRU). If you have configured the system for alert notification, then an alert is sent by e-mail message to the designated address.

A flash disk outage can cause a reduction in performance and data redundancy. The failed disk should be replaced with a new flash disk at the earliest opportunity. If the flash disk is used for flash cache, then the effective cache size for the storage server is reduced. If the flash disk is used for flash log, then the flash log is disabled on the disk, thus reducing the effective flash log size. If the flash disk is used for grid disks, then the Oracle Automatic Storage Management (Oracle ASM) disks associated with these grid disks are automatically dropped with the FORCE option from the Oracle ASM disk group, and a rebalance operation starts to restore the data redundancy.

The following procedure describes how to replace an FDOM due to disk failure on High Capacity storage servers that do not support online flash replacement. Replacing an NVMe drive on Extreme Flash storage servers is the same as replacing a physical disk: you can just remove the NVMe drive from the front panel and insert a new one. You do not need to shut down the storage server.

  1. Shut down the storage server. See "Shutting Down Exadata Storage Server".
  2. Replace the failed flash disk based on the PCI number and FDOM number. A white Locator LED is lit to help locate the affected storage server.
  3. Power up the storage server. The cell services are started automatically. As part of the storage server startup, all grid disks are automatically ONLINE in Oracle ASM.
  4. Verify that all grid disks have been successfully put online using the following command:
    CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
             data_CD_00_testceladm10     ONLINE
             data_CD_01_testceladm10     ONLINE
             data_CD_02_testceladm10     ONLINE
             ...
    

    Wait until asmmodestatus shows ONLINE or UNUSED for all grid disks.

The new flash disk is automatically used by the system. If the flash disk is used for flash cache, then the effective cache size increases. If the flash disk is used for grid disks, then the grid disks are re-created on the new flash disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced on them based on the disk group redundancy and ASM_POWER_LIMIT parameter.

3.4.2 About Flash Disk Degraded Performance Statuses

If a flash disk has degraded performance, you might need to replace the disk.

You may need to replace a flash disk because the disk has one of the following statuses:

  • warning - predictive failure
  • warning - poor performance
  • warning - write-through caching
  • warning - peer failure

Note:

For Oracle Exadata System Software releases earlier than release 11.2.3.2.2, the status is not present.

An alert is generated when a flash disk is in predictive failure, poor performance, write-through caching or peer failure status. The alert includes specific instructions for replacing the flash disk. If you have configured the system for alert notifications, then the alerts are sent by e-mail message to the designated address.

predictive failure

Flash disk predictive failure status indicates that the flash disk will fail soon, and should be replaced at the earliest opportunity. If the flash disk is used for flash cache, then it continues to be used as flash cache. If the flash disk is used for grid disks, then the Oracle ASM disks associated with these grid disks are automatically dropped, and Oracle ASM rebalance relocates the data from the predictively failed disk to other disks.

When a flash disk goes into predictive failure status, the data on it is copied elsewhere. If the flash disk is used for grid disks, then Oracle ASM re-partners the associated partner disks and performs a rebalance. If the flash disk is used for write-back flash cache, then the data is flushed from the flash disk to the grid disks.

To identify a predictive failure flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS=  \
'warning - predictive failure' DETAIL

         name:               FLASH_1_1
         deviceName:         /dev/nvme3n1
         diskType:           FlashDisk
         luns:               1_1
         makeModel:          "Oracle Flash Accelerator F160 PCIe Card"
         physicalFirmware:   8DV1RA13
         physicalInsertTime: 2016-11-30T21:24:45-08:00
         physicalSerial:     CVMD519000251P6KGN
         physicalSize:       1.4554837569594383T
         slotNumber:         "PCI Slot: 1; FDOM: 1"
         status:             warning - predictive failure

poor performance

Flash disk poor performance status indicates that the flash disk demonstrates extremely poor performance, and should be replaced at the earliest opportunity. Starting with Oracle Exadata System Software release 11.2.3.2, an under-performing disk is automatically identified and removed from active configuration. If the flash disk is used for flash cache, then flash cache is dropped from this disk thus reducing the effective flash cache size for the storage server. If the flash disk is used for grid disks, then the Oracle ASM disks associated with the grid disks on this flash disk are automatically dropped with FORCE option, if possible. If DROP...FORCE cannot succeed due to offline partners, then the grid disks are automatically dropped normally, and Oracle ASM rebalance relocates the data from the poor performance disk to other disks.

Oracle Exadata Database Machine then runs a set of performance tests. When poor disk performance is detected by CELLSRV, the cell disk status changes to normal - confinedOnline, and the physical disk status changes to warning - confinedOnline. The following conditions trigger disk confinement:

  • Disk stopped responding. The cause code in the storage alert log is CD_PERF_HANG.
  • Slow cell disk such as the following:
    • High service time threshold (cause code CD_PERF_SLOW_ABS)
    • High relative service time threshold (cause code CD_PERF_SLOW_RLTV)
  • High read or write latency such as the following:
    • High latency on writes (cause code CD_PERF_SLOW_LAT_WT)
    • High latency on reads (cause code CD_PERF_SLOW_LAT_RD)
    • High latency on reads and writes (cause code CD_PERF_SLOW_LAT_RW)
    • Very high absolute latency on individual I/Os happening frequently (cause code CD_PERF_SLOW_LAT_ERR)
  • Errors such as I/O errors (cause code CD_PERF_IOERR).

If the disk problem is temporary and passes the tests, then it is brought back into the configuration. If the disk does not pass the tests, then it is marked as poor performance, and Oracle Auto Service Request (ASR) submits a service request to replace the disk. If possible, Oracle ASM takes the grid disks offline for testing. If Oracle ASM cannot take the disks offline, then the cell disk status stays at normal - confinedOnline until the disks can be taken offline safely.

To identify a poor performance flash disk, use the following command:

CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS= \
'warning - poor performance' DETAIL

         name:                FLASH_1_4
         diskType:            FlashDisk
         luns:                1_4
         makeModel:           "Sun Flash Accelerator F20 PCIe Card"
         physicalFirmware:    D20Y
         physicalInsertTime:  2012-09-27T13:11:16-07:00
         physicalSerial:      508002000092e70FMOD2
         physicalSize:        22.8880615234375G
         slotNumber:          "PCI Slot: 1; FDOM: 3"
         status:              warning - poor performance

The disk status change is associated with the following entry in the cell alert history:

MESSAGE ID date_time info "Hard disk entered confinement status. The LUN
 n_m changed status to warning - confinedOnline. CellDisk changed status to normal
 - confinedOnline. Status: WARNING - CONFINEDONLINE  Manufacturer: name  Model
 Number: model  Size: size  Serial Number: serial_number  Firmware: fw_release 
 Slot Number: m  Cell Disk: cell_disk_name  Grid Disk: grid disk 1, grid disk 2
 ... Reason for confinement: threshold for service time exceeded"

The following would be logged in the storage server alert log:

CDHS: Mark cd health state change cell_disk_name  with newState HEALTH_BAD_
ONLINE pending HEALTH_BAD_ONLINE ongoing INVALID cur HEALTH_GOOD
Celldisk entering CONFINE ACTIVE state with cause CD_PERF_SLOW_ABS activeForced: 0
inactiveForced: 0 trigger HistoryFail: 0, forceTestOutcome: 0 testFail: 0
global conf related state: numHDsConf: 1 numFDsConf: 0 numHDsHung: 0 numFDsHung: 0
...

Note:

In Oracle Exadata System Software releases earlier than release 11.2.3.2, use the CALIBRATE command to identify a bad flash disk, and look for very low throughput and IOPS for each flash disk.

If a flash disk exhibits extremely poor performance, then it is marked as poor performance. The flash cache on that flash disk is automatically disabled, and the grid disks on that flash disk are automatically dropped from the Oracle ASM disk group.

write-through caching

Flash disk write-through caching status indicates the capacitors used to support data cache on the PCIe card have failed, and the card should be replaced as soon as possible.

peer failure

Flash disk peer failure status indicates one of the flash disks on the same Sun Flash Accelerator PCIe card has failed or has a problem. For example, if FLASH_5_3 fails, then FLASH_5_0, FLASH_5_1, and FLASH_5_2 have peer failure status. The following is an example:

CellCLI> LIST PHYSICALDISK
         36:0            L45F3A          normal
         36:1            L45WAE          normal
         36:2            L45WQW          normal
...
         FLASH_5_0       5L0034XM        warning - peer failure
         FLASH_5_1       5L0034JE        warning - peer failure
         FLASH_5_2       5L002WJH        warning - peer failure
         FLASH_5_3       5L002X4P        failed

When CELLSRV detects a predictive or peer failure in any flash disk used for write-back flash cache and only one FDOM is bad, then the data on the bad FDOM is resilvered, and the data on the other three FDOMs is flushed. CELLSRV then initiates an Oracle ASM rebalance for the disks if there are valid grid disks. The bad disk cannot be replaced until these tasks are complete. MS sends an alert when the disk can be replaced.
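
While a write-back flash cache flush is in progress, it can typically be monitored with a command similar to the following, which reports the flush status and any flush errors for each cell disk:

CellCLI> LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror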

3.4.3 Replacing a Flash Disk Due to Flash Disk Problems

Oracle Exadata Storage Server is equipped with four PCIe cards. Each card has four flash disks (FDOMs) for a total of 16 flash disks. The four PCIe cards are present on PCI slot numbers 1, 2, 4, and 5. Starting with Oracle Exadata Database Machine X7, you can replace the PCIe cards without powering down the storage server. See Performing a Hot Pluggable Replacement of a Flash Disk.

In Oracle Exadata Database Machine X6 and earlier systems, the PCIe cards are not hot-pluggable. The Oracle Exadata Storage Server must be powered down before replacing the flash disks or cards.

Starting with Oracle Exadata Database Machine X7, each flash card on both High Capacity and Extreme Flash storage servers is a field-replaceable unit (FRU). The flash cards are also hot-pluggable, so you do not have to shut down the storage server before removing the flash card.

On Oracle Exadata Database Machine X5 and X6 systems, each flash card on High Capacity and each flash drive on Extreme Flash are FRUs. This means that there is no peer failure for these systems.

On Oracle Exadata Database Machine X3 and X4 systems, the flash card itself is a FRU. Therefore, if any FDOM fails, Oracle Exadata System Software automatically puts the remaining FDOMs on that card into peer failure status so that the data can be moved off in preparation for the flash card replacement.

On Oracle Exadata Database Machine V2 and X2 systems, each FDOM is a FRU. There is no peer failure for flash for these systems.

Determining when to proceed with disk replacement depends on the release, as described in the following:

  • For Oracle Exadata System Software releases earlier than 11.2.3.2:

    Wait until the Oracle ASM disks have been successfully dropped by querying the V$ASM_DISK_STAT view before proceeding with the flash disk replacement. If the normal drop operation did not complete before the flash disk failed, then the Oracle ASM disks are automatically dropped with the FORCE option from the Oracle ASM disk group. If the DROP command did not complete before the flash disk failed, then refer to Replacing a Flash Disk Due to Flash Disk Failure.

  • For Oracle Exadata System Software releases 11.2.3.2 and later:

    An alert is sent when the Oracle ASM disks have been dropped, and the flash disk can be safely replaced. If the flash disk is used for write-back flash cache, then wait until none of the grid disks are cached by the flash disk. Use the following command to check the cachedBy attribute of all the grid disks. The cell disk on the flash disk should not appear in any grid disk's cachedBy attribute.

    CellCLI> LIST GRIDDISK ATTRIBUTES name, cachedBy
    

    If the flash disk is used for both grid disks and flash cache, then wait until you have received the alert and the cell disk no longer appears in any grid disk's cachedBy attribute.

The following procedure describes how to replace a flash disk on High Capacity storage servers for Oracle Exadata Database Machine X6 and earlier due to disk problems.

Note:

On Extreme Flash storage servers for Oracle Exadata Database Machine X6 and all storage servers for Oracle Exadata Database Machine X7 and later, you can just remove the flash disk from the front panel and insert a new one. You do not need to shut down the storage server.
  1. Stop the cell services using the following command:
    CellCLI> ALTER CELL SHUTDOWN SERVICES ALL
    

    The preceding command checks whether any disks are offline, in predictive failure status, or need to be copied to their mirrors. If Oracle ASM redundancy is intact, then the command takes the grid disks offline in Oracle ASM, and then stops the cell services. If the following error is displayed, then it may not be safe to stop the cell services because a disk group may be forced to dismount due to reduced redundancy.

    Stopping the RS, CELLSRV, and MS services...
    The SHUTDOWN of ALL services was not successful.
    CELL-01548: Unable to shut down CELLSRV because disk group DATA, RECO may be
    forced to dismount due to reduced redundancy.
    Getting the state of CELLSRV services... running
    Getting the state of MS services... running
    Getting the state of RS services... running
    

    If the error occurs, then restore Oracle ASM disk group redundancy and retry the command when disk status is back to normal for all the disks.

  2. Shut down the storage server.
  3. Replace the failed flash disk based on the PCI number and FDOM number. A white Locator LED is lit to help locate the affected storage server.
  4. Power up the storage server. The cell services are started automatically. As part of the storage server startup, all grid disks are automatically brought ONLINE in Oracle ASM.
  5. Verify that all grid disks have been successfully put online using the following command:
    CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus
    

    Wait until asmmodestatus shows ONLINE or UNUSED for all grid disks.

The new flash disk is automatically used by the system. If the flash disk is used for flash cache, then the effective cache size increases. If the flash disk is used for grid disks, then the grid disks are re-created on the new flash disk. If those grid disks were part of an Oracle ASM disk group, then they are added back to the disk group, and the data is rebalanced on them based on the disk group redundancy and the ASM_POWER_LIMIT parameter.
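
After the replacement, you can confirm from the cell that the re-created grid disks are usable by Oracle ASM, using the same attribute checks shown elsewhere in this chapter (the rebalance itself is monitored from Oracle ASM):

CellCLI> LIST GRIDDISK ATTRIBUTES name, asmmodestatus, asmdeactivationoutcome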


3.4.4 Performing a Hot Pluggable Replacement of a Flash Disk

Starting with Oracle Exadata Database Machine X7, flash disks are hot-pluggable on both Extreme Flash (EF) and High Capacity (HC) storage servers.

Also, for Oracle Exadata Database Machine X6 and earlier, the flash devices on EF storage servers are hot-pluggable. However, for HC storage servers on Oracle Exadata Database Machine X6 and earlier systems, you must power down the storage servers before replacing the flash disks.

To replace a hot-pluggable flash disk device:

  1. If necessary, prepare the disk for hot-pluggable replacement.

    Typically, you will replace a flash drive only after Oracle Exadata System Software identifies a problem and sets the device status to failed - dropped for replacement, which indicates the flash disk is ready for online replacement.

    If you need to replace a flash disk that is in another state, then you must first use the CellCLI ALTER PHYSICALDISK command with the DROP FOR REPLACEMENT clause to prepare the disk for hot-pluggable replacement.
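
    For example, assuming a flash disk named FLASH_5_3 (a hypothetical name used here only for illustration):

    CellCLI> ALTER PHYSICALDISK FLASH_5_3 DROP FOR REPLACEMENT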

  2. Verify that the flash disk is ready for hot-pluggable replacement.

    Verify that the device status is failed - dropped for replacement.

    You can check the device status by using the CellCLI LIST PHYSICALDISK command. For example:

    CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=flashdisk AND STATUS LIKE '.*dropped 
    for replacement.*' DETAIL
    
             name:               FLASH_6_1
             deviceName:         /dev/nvme0n1
             diskType:           FlashDisk
             luns:               6_0
             makeModel:          "Oracle Flash Accelerator F640 PCIe Card"
             physicalFirmware:   QDV1RD09
             physicalInsertTime: 2017-08-11T12:25:00-07:00
             physicalSerial:     PHLE6514003R6P4BGN-1
             physicalSize:       2.910957656800747T
             slotNumber:         "PCI Slot: 6; FDOM: 1"
             status:             failed - dropped for replacement
    

    Note that when one FDOM fails, the affected flash disk PCI card is considered failed and the whole card must be replaced.

  3. Physically locate the flash disk device.

    Use the slotNumber information from the CellCLI LIST PHYSICALDISK command output to help identify the PCI slot containing the affected flash device.

    Also, a white Locator LED is lit to help locate the affected storage server. An amber Fault-Service Required LED is lit to identify the affected flash device.

  4. Make sure that the flash disk power LED is off.

    Caution:

    Removing a flash device when the power LED is on could result in a system crash. If the status is failed - dropped for replacement but the power LED is still on, contact Oracle Support Services.
  5. Remove and replace the flash disk device.


3.4.5 Enabling Write Back Flash Cache

Write operations serviced by flash instead of by disk are referred to as write-back flash cache.

Starting with Oracle Exadata System Software release 11.2.3.2.1, Exadata Smart Flash Cache can transparently cache frequently-accessed data to fast solid-state storage, improving query response times and throughput.

3.4.5.1 Enable Write Back Flash Cache for 11.2.3.3.1 or Higher

Enable write back Flash Cache on the storage servers to improve query response times and throughput.

For Oracle Exadata System Software release 11.2.3.3.1 or higher, you do not have to stop cell services or inactivate grid disks when changing the Flash Cache from Write Through mode to Write Back mode.

Note:

Any time the Flash Cache is dropped and re-created, there is a performance impact for database operations. While the Flash Cache is being repopulated, there are more cache misses, which impacts database performance.
  1. Validate that all the physical disks are in NORMAL state before modifying Exadata Smart Flash Cache.

    The following command should return no rows:

    # dcli -l root -g cell_group cellcli -e "list physicaldisk attributes name,status" | grep -v NORMAL
  2. Drop the Flash Cache.
    # dcli -l root -g cell_group cellcli -e drop flashcache
  3. Set the flashCacheMode attribute to writeback.
    # dcli -l root -g cell_group cellcli -e "alter cell flashCacheMode=writeback"
  4. Re-create the Flash Cache.
    # dcli -l root -g cell_group cellcli -e create flashcache all
  5. Verify the flashCacheMode has been set to writeback.
    # dcli -l root -g cell_group cellcli -e list cell detail | grep flashCacheMode
  6. Validate the grid disk attributes cachingPolicy and cachedBy.
    # cellcli -e list griddisk attributes name,cachingpolicy,cachedby
3.4.5.2 Enabling Write Back Flash Cache on a Rolling Basis for Software Versions Lower Than 11.2.3.3.1

You can enable Write Back Flash Cache on a rolling basis.

To modify the Flash Cache attribute from writethrough to writeback, the Flash Cache must be dropped first. For Oracle Exadata System Software releases prior to 11.2.3.3.1, you must stop cell services or inactivate grid disks when enabling Write Back Flash Cache.

There is a shell script to automate enabling and disabling Write Back Flash Cache. Refer to My Oracle Support note 1500257.1 for the script and additional information.

Note:

Any time the Flash Cache is dropped and re-created, there is a performance impact for database operations. While the Flash Cache is being repopulated, there are more cache misses, which impacts database performance.

Oracle Grid Infrastructure homes and Oracle Database homes must be at release 11.2.0.3 BP9 or higher to use write-back Flash Cache. Refer to My Oracle Support note 888828.1 for the minimum release requirements for Oracle Exadata System Software, Oracle Grid Infrastructure home, and Oracle Database home.
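
To check the current Oracle Exadata System Software release on the storage servers, you can use the imageinfo utility, for example:

# dcli -l root -g cell_group imageinfo -ver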

  1. Log in as the root user to the first cell to be enabled for Write Back Flash Cache.
  2. Check that the Flash Cache is in normal state and no flash disks are degraded or in a critical state.
    # cellcli -e LIST FLASHCACHE detail
    
  3. Drop the Flash Cache on the cell.
    # cellcli -e DROP FLASHCACHE
    
  4. Inactivate the grid disks on the cell.
    # cellcli -e ALTER GRIDDISK ALL INACTIVE
    
  5. Shut down CELLSRV services.
    # cellcli -e ALTER CELL SHUTDOWN SERVICES CELLSRV
    
  6. Set the flashCacheMode attribute to writeback.
    # cellcli -e "ALTER CELL FLASHCACHEMODE=writeback"
    
  7. Restart cell services.
    # cellcli -e ALTER CELL STARTUP SERVICES CELLSRV
    
  8. Reactivate the grid disks on the cell.
    # cellcli -e ALTER GRIDDISK ALL ACTIVE
    
  9. Re-create the Flash Cache.
    # cellcli -e CREATE FLASHCACHE ALL
    
  10. Check the status of the cell.
    # cellcli -e LIST CELL DETAIL | grep flashCacheMode
    

    The flashCacheMode attribute should be set to writeback.

  11. Check the grid disk attributes asmDeactivationOutcome and asmModeStatus before moving to the next cell using the following command:
    CellCLI> LIST GRIDDISK ATTRIBUTES name,asmdeactivationoutcome,asmmodestatus
    

    The asmDeactivationOutcome attribute should be yes, and the asmModeStatus attribute should be online.

  12. Repeat the preceding steps on the next cell.
3.4.5.3 Enabling Write Back Flash Cache on a Non-Rolling Basis for Software Versions Lower Than 11.2.3.3.1

You can enable Write Back Flash Cache on a non-rolling basis.

The cell services must be shut down before changing the flashCacheMode attribute. For Oracle Exadata System Software releases prior to 11.2.3.3.1, you must stop cell services when enabling Write Back Flash Cache in a non-rolling manner.

There is a shell script to automate enabling and disabling write back Flash Cache. Refer to My Oracle Support note 1500257.1 for the script and additional information.

Oracle Grid Infrastructure homes and Oracle Database homes must be at release 11.2.0.3 BP9 or higher to use write-back Flash Cache. Refer to My Oracle Support note 888828.1 for the minimum release requirements for Oracle Exadata System Software, Oracle Grid Infrastructure home, and Oracle Database home.

  1. Log in as the root user to a database node.
  2. Shut down the entire cluster.
    # cd $Grid_home/bin
    # ./crsctl stop cluster -all
    
  3. Drop the Flash Cache for all cells.
    # dcli -g cell_group -l root cellcli -e DROP FLASHCACHE
    
  4. Shut down CELLSRV services.
    # dcli -g cell_group -l root cellcli -e ALTER CELL SHUTDOWN SERVICES CELLSRV
    
  5. Confirm that the Flash Cache is in writethrough mode.
    # dcli -g cell_group -l root "cellcli -e list cell detail | grep -i flashcachemode"
    
  6. Set the flashCacheMode attribute to writeback.
    # dcli -g cell_group -l root cellcli -e "ALTER CELL FLASHCACHEMODE=writeback"
    
  7. Restart cell services.
    # dcli -g cell_group -l root cellcli -e ALTER CELL STARTUP SERVICES CELLSRV
    
  8. Re-create the Flash Cache.
    # dcli -g cell_group -l root cellcli -e CREATE FLASHCACHE ALL
    
  9. Restart the cluster:
    # cd $Grid_home/bin
    # ./crsctl start cluster -all

3.4.6 Disabling Write Back Flash Cache

You can disable the Write-Back Flash Cache by enabling Write-Through Flash Cache.

Starting with Oracle Exadata System Software release 11.2.3.2.1, Exadata Smart Flash Cache can transparently cache frequently-accessed data to fast solid-state storage, improving query response times and throughput.

Write operations serviced by flash instead of by disk are referred to as write back flash cache.

3.4.6.1 Disable Write-Back Flash Cache Along With Write-Back PMEM Cache

Before Oracle Exadata System Software release 23.1.0, write-back PMEM cache is supported only in conjunction with write-back flash cache. Consequently, if write-back PMEM cache is enabled, you must disable it before you can disable write-back flash cache.

This requirement applies only before Oracle Exadata System Software release 23.1.0 because, starting with that release, PMEM cache operates only in write-through mode.

Note:

To reduce the performance impact on your applications, change the cache mode during a period of reduced workload.

The following command examples use a text file named cell_group that contains the host names of the storage servers that are the subject of the procedure.
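
For example, the cell_group file contains one storage server host name per line; the names below are hypothetical:

dm01celadm01
dm01celadm02
dm01celadm03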

  1. Check the current PMEM cache mode setting (pmemCacheMode):
    # dcli -l root -g cell_group cellcli -e "list cell detail" | grep pmemCacheMode
  2. If the PMEM cache is in write-back mode:
    1. Flush the PMEM cache.

      If the PMEM cache utilizes all available PMEM cell disks, you can use the ALL keyword as shown here.

      # dcli -l root -g cell_group cellcli -e "ALTER PMEMCACHE ALL FLUSH"

      Otherwise, list the specific disks using the CELLDISK="cdisk1 [,cdisk2] ..." clause.

    2. Drop the PMEM cache.
      # dcli -l root -g cell_group cellcli -e "DROP PMEMCACHE"
    3. Configure the cell to use PMEM cache in write-through mode.
      # dcli -l root -g cell_group cellcli -e "ALTER CELL pmemCacheMode=writethrough"
    4. Re-create the PMEM cache.

      If the PMEM cache utilizes all available PMEM cell disks, you can use the ALL keyword as shown here. Otherwise, list the specific disks using the CELLDISK="cdisk1 [,cdisk2] ..." clause. If the size attribute is not specified, then the PMEM cache consumes all available space on each cell disk.

      # dcli -l root -g cell_group cellcli -e CREATE PMEMCACHE ALL
    5. Verify that pmemCacheMode is set to writethrough.
      # dcli -l root -g cell_group cellcli -e "list cell detail" | grep pmemCacheMode
  3. Validate that all the physical disks are in NORMAL state before modifying the flash cache.
    # dcli -l root -g cell_group cellcli -e "LIST PHYSICALDISK ATTRIBUTES name,status" | grep -v NORMAL
    The command should return no rows.
  4. Determine the amount of dirty data in the flash cache.
    # dcli -g cell_group -l root cellcli -e "LIST METRICCURRENT ATTRIBUTES name,metricvalue WHERE name LIKE \'FC_BY_DIRTY.*\' "
  5. Flush the flash cache.

    If the flash cache utilizes all available flash cell disks, you can use the ALL keyword instead of listing the flash disks.

    # dcli -g cell_group -l root cellcli -e "ALTER FLASHCACHE CELLDISK=\'FD_02_dm01celadm12,
    FD_03_dm01celadm12,FD_00_dm01celadm12,FD_01_dm01celadm12\' FLUSH"
  6. Check the progress of the flash cache flush operation.

    The flushing process is complete when the metric FC_BY_DIRTY is zero.

    # dcli -g cell_group -l root cellcli -e "LIST METRICCURRENT ATTRIBUTES name,metricvalue WHERE name LIKE \'FC_BY_DIRTY.*\' " 

    Or, you can check to see if the attribute flushstatus is set to Completed.

    # dcli -g cell_group -l root cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus,flusherror" | grep FD 
  7. After the flash cache is flushed, drop the flash cache.
    # dcli -g cell_group -l root cellcli -e "drop flashcache" 
  8. Configure the cell to use flash cache in write-through mode.
    # dcli -g cell_group -l root cellcli -e "ALTER CELL flashCacheMode=writethrough"
  9. Re-create the flash cache.

    If the flash cache utilizes all available flash cell disks, you can use the ALL keyword instead of listing the cell disks.

    If the size attribute is not specified, the flash cache consumes all available space on each cell disk.

    # dcli -l root -g cell_group cellcli -e "create flashcache celldisk=\'FD_02_dm01celadm12,
    FD_03_dm01celadm12,FD_00_dm01celadm12,FD_01_dm01celadm12\'"
  10. Verify that flashCacheMode is set to writethrough.
    # dcli -l root -g cell_group cellcli -e "list cell detail" | grep flashCacheMode
3.4.6.2 Disable Write Back Flash Cache for 11.2.3.3.1 or Higher

You can disable write back Flash Cache on the storage servers by changing the mode to Write Through.

With release 11.2.3.3.1 or higher, you do not have to stop the CELLSRV process or inactivate grid disks.

Note:

To reduce the performance impact on the application, disable the write back flash cache during a period of reduced workload.
  1. Validate that all the physical disks are in NORMAL state before modifying the flash cache.

    The following command should return no rows:

    # dcli -l root -g cell_group cellcli -e "LIST PHYSICALDISK ATTRIBUTES name,status" | grep -v NORMAL
  2. Determine the amount of dirty data in the flash cache.
    # cellcli -e "LIST METRICCURRENT ATTRIBUTES name,metricvalue WHERE name LIKE \'FC_BY_DIRTY.*\' "
  3. Flush the flash cache.
    # dcli -g cell_group -l root cellcli -e "ALTER FLASHCACHE ALL FLUSH"
  4. Check the progress of the flash cache flush operation.

    The flushing process is complete when FC_BY_DIRTY is 0 MB.

    # dcli -g cell_group -l root cellcli -e "LIST METRICCURRENT ATTRIBUTES name,metricvalue WHERE name LIKE \'FC_BY_DIRTY.*\' " 

    Or, you can check to see if the attribute flushstatus has been set to Completed.

    # dcli -g cell_group -l root cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD 
  5. After flushing of the flash cache completes, drop the flash cache.
    # dcli -g cell_group -l root cellcli -e drop flashcache 
  6. Set the flashCacheMode attribute to writethrough.
    # dcli -g cell_group -l root cellcli -e "ALTER CELL flashCacheMode=writethrough"
  7. Re-create the flash cache.
    # dcli -l root -g cell_group cellcli -e create flashcache all
  8. Verify the flashCacheMode has been set to writethrough.
    # dcli -l root -g cell_group cellcli -e list cell detail | grep flashCacheMode
3.4.6.3 Disabling Write Back Flash Cache on a Rolling Basis for Software Versions Lower Than 11.2.3.3.1

You can disable Write Back Flash Cache on each storage server using a rolling method.

The cell services must be shut down before changing the flashCacheMode attribute. The cell services can be shut down on a rolling basis. The flash cache must be flushed and dropped before changing the attribute to writethrough. After the flush operation begins, all caching to the flash cache stops. Ensure that cell services resynchronization is complete on the current storage server before changing the next storage server.

There is a shell script to automate enabling and disabling write back Flash Cache. Refer to My Oracle Support note 1500257.1 for the script and additional information.

  1. Log in as the root user to the first cell to be disabled for write back flash cache.
  2. Verify the asmDeactivationOutcome attribute is yes for all grid disks on the cell.
    # dcli -g cell_group -l root cellcli -e "LIST GRIDDISK WHERE   \
     asmdeactivationoutcome != 'Yes' attributes name, asmdeactivationoutcome, \
    asmmodestatus"
    The grid disk attribute asmDeactivationOutcome must be yes, and the asmModeStatus attribute must be online for all grid disks on the current cell before moving to the next cell. If a grid disk does not have an asmDeactivationOutcome attribute value of yes, then you must resolve this issue before proceeding.
  3. Check the amount of dirty data in the flash cache.
    # dcli -g cell_group -l root cellcli -e "LIST METRICCURRENT ATTRIBUTES  \
    name,metricvalue WHERE name LIKE \'FC_BY_DIRTY.*\'"
  4. Flush the flash cache.
    # dcli -g cell_group -l root cellcli -e ALTER FLASHCACHE ALL FLUSH
  5. Check the status of the flash cache.
    # dcli -g cell_group -l root cellcli -e LIST CELLDISK ATTRIBUTES name, \
    flushstatus, flusherror | grep FD

    The status shows completed when the flush is done.

  6. Perform the following set of steps for all cells, one cell at a time.
    In other words, perform steps (a) through (i) on one cell, then perform the same steps on the next cell until all the cells are done.
    1. Drop the flash cache.
      # cellcli -e DROP FLASHCACHE
    2. Inactivate all grid disks on the cell.
      # cellcli -e ALTER GRIDDISK ALL INACTIVE
    3. Shut down CELLSRV services.
      # cellcli -e ALTER CELL SHUTDOWN SERVICES CELLSRV
    4. Set the flashCacheMode attribute to writethrough.
      # cellcli -e "ALTER CELL FLASHCACHEMODE=writethrough"
    5. Restart cell services.
      # cellcli -e ALTER CELL STARTUP SERVICES CELLSRV
    6. Reactivate the grid disks on the cell.
      # cellcli -e ALTER GRIDDISK ALL ACTIVE
    7. Re-create the flash cache.
      # cellcli -e CREATE FLASHCACHE ALL
    8. Check the status of the cell.
      # cellcli -e LIST CELL DETAIL | grep flashCacheMode
    9. Check the grid disk attributes asmDeactivationOutcome and asmModeStatus.
      # cellcli -e LIST GRIDDISK ATTRIBUTES name,status,asmdeactivationoutcome,asmmodestatus
      

      The asmDeactivationOutcome attribute should be yes, and the asmModeStatus attribute should be online.

      If the disk status is SYNCING, wait until it is ACTIVE before proceeding.

3.4.6.4 Disabling Write Back Flash Cache on a Non-Rolling Basis for Software Versions Lower Than 11.2.3.3.1

You can disable Write Back Flash Cache on a non-rolling basis.

When changing the Flash Cache mode on a non-rolling basis, ensure the entire cluster is shut down, including the Oracle Clusterware stack and all the databases. The cell services must be shut down before changing the flashCacheMode attribute. The Flash Cache must be flushed and dropped before changing the attribute to writethrough. The Flash Cache flush operation can be performed prior to shutting down the entire cluster. After the flush operation begins, all caching to the Flash Cache stops.

  1. Log in as the root user to the first database node to be disabled for write back Flash Cache.
  2. Check the amount of dirty data in the Flash Cache using the following command:
    # dcli -g cell_group -l root cellcli -e "LIST METRICCURRENT ATTRIBUTES  \
            name,metricvalue WHERE name LIKE \'FC_BY_DIRTY.*\'"
    
  3. Flush the Flash Cache using the following command:
    # dcli -g cell_group -l root cellcli -e ALTER FLASHCACHE ALL FLUSH
    
  4. Check the status as the blocks are moved to disk using the following command. The count reduces to zero.
    # dcli -g cell_group -l root cellcli -e "LIST METRICCURRENT ATTRIBUTES name, \
           metricvalue WHERE NAME LIKE \'FC_BY_DIRTY.*\'"
    
  5. Check the status of the flash disks using the following command:
    # dcli -g cell_group -l root cellcli -e LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror | grep FD 
    

    The status shows completed when the flush is done.

  6. Shut down the database and the entire cluster using the following commands:
    # cd $Grid_home/bin
    # ./crsctl stop cluster -all
    
  7. Drop the Flash Cache across all cells using the following command:
    # dcli -g cell_group -l root cellcli -e DROP FLASHCACHE
    
  8. Shut down CELLSRV services using the following command:
    # dcli -g cell_group -l root cellcli -e ALTER CELL SHUTDOWN SERVICES CELLSRV
    
  9. Set the flashCacheMode attribute to writethrough using the following command:
    # dcli -g cell_group -l root cellcli -e "ALTER CELL FLASHCACHEMODE=writethrough"
    
  10. Restart cell services using the following command:
    # dcli -g cell_group -l root cellcli -e ALTER CELL STARTUP SERVICES CELLSRV
    
  11. Re-create the Flash Cache using the following command:
    # dcli -g cell_group -l root cellcli -e CREATE FLASHCACHE ALL
    
  12. Check the Flash Cache mode of the cells using the following command:
    # dcli -g cell_group -l root cellcli -e LIST CELL DETAIL | grep flashCacheMode
    
  13. Restart the cluster and database using the following commands:
    # cd $Grid_home/bin
    # ./crsctl start cluster -all
    

3.4.7 Enabling Flash Cache Compression

Flash cache compression can be enabled on Oracle Exadata Database Machine X4-2, Oracle Exadata Database Machine X3-2, and Oracle Exadata Database Machine X3-8 Full Rack systems. Oracle Exadata Database Machine X5-2, X5-8, and later systems do not have flash cache compression. Flash cache compression dynamically increases the logical capacity of the flash cache by transparently compressing user data as it is loaded into the flash cache.

Note:

  • Oracle Advanced Compression Option is required to enable flash cache compression.

  • User data is not retained when enabling flash cache compression.

The following procedure describes how to enable flash cache compression:

  1. Perform this step only if writeback flash cache is enabled; performing it when writeback flash cache is not enabled can result in error messages. You can check the flash cache mode by running the following command:
    # cellcli -e LIST CELL DETAIL | grep flashCacheMode

    If writeback flash cache is enabled, then save the user data on the flash cell disks.

    # cellcli -e ALTER FLASHCACHE ALL FLUSH
    

    During the flush operation, the flushstatus attribute has a value of working. When the flush operation completes successfully, the value changes to complete. For grid disks, the cachedby attribute should be null. Also, the number of dirty (unflushed) buffers will be 0 after the flush is complete.

    # cellcli -e LIST METRICCURRENT FC_BY_DIRTY
              FC_BY_DIRTY     FLASHCACHE      0.000 MB
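
    You can also watch the flushstatus attribute directly, using the same form of command used elsewhere in this chapter:

    # cellcli -e LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror | grep FD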
  2. Remove the flash cache from the cell.
    # cellcli -e DROP FLASHCACHE ALL
    
  3. Remove the flash log from the cell.
    # cellcli -e DROP FLASHLOG ALL
    
  4. Drop the cell disks on the flash disks.
    # cellcli -e DROP CELLDISK ALL FLASHDISK
    
  5. Enable flash cache compression using the following commands, based on the system:
    • For Oracle Exadata Database Machine X3-2, Oracle Exadata Database Machine X3-8, and Oracle Exadata Database Machine X4-2 storage servers with Oracle Exadata System Software release 11.2.3.3.1 or higher:

      # cellcli -e ALTER CELL flashcachecompress=true
      
    • For Oracle Exadata Database Machine X3-2 storage servers with Oracle Exadata System Software release 11.2.3.3.0:

      # cellcli -e ALTER CELL flashCacheCompX3Support=true
      # cellcli -e ALTER CELL flashCacheCompress=true
      
  6. Verify the size of the physical disks has increased.
    # cellcli -e LIST PHYSICALDISK attributes name,physicalSize,status WHERE disktype=flashdisk

    The status should be normal. Use the following information to validate the expected size when Compression is ON:

    • Aura 2.0/F40/X3:

      • Physical Disk Size: 93.13 G (OFF) or 186.26 G (ON)

      • Flash Cache Size: 1489 G (OFF) or 2979 G (ON)

    • Aura 2.1/F80/X4:

      • Physical Disk Size: 186.26 G (OFF) or 372.53 G (ON)

      • Flash Cache Size: 2979 G (OFF) or 5959 G (ON)

  7. Create the cell disks on the flash disks.
    # cellcli -e CREATE CELLDISK ALL FLASHDISK
    CellDisk FD_00_exampleceladm18 successfully created
    ...
    CellDisk FD_15_exampleceladm18 successfully created 
    
  8. Create the flash log.
    # cellcli -e CREATE FLASHLOG ALL
    Flash log RaNdOmceladm18_FLASHLOG successfully created 
    
  9. Create the flash cache on the cell.
    # cellcli -e CREATE FLASHCACHE ALL
    Flash cache exampleceladm18_FLASHCACHE successfully created
    

3.4.8 Monitoring Exadata Smart Flash Cache Usage Statistics

Use the following methods to monitor Exadata Smart Flash Cache usage:

  • AWR report, in the Exadata Statistics section.
    • Under Performance Summary you can find various statistics related to Flash Cache and its benefits.
    • Under Exadata Smart Statistics there is a section for Flash Cache with several different reports on Exadata Smart Flash Cache statistics.
  • ExaWatcher reports

    Flash Cache size and statistics for read, write, and population operations are shown in the Cell Server Charts and in the FlashCache related stats section.

  • Use the CellCLI LIST command to display and monitor metrics for the flash cache, as in the example below.
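
For example, the following command lists the current metric observations for the flash cache:

CellCLI> LIST METRICCURRENT WHERE objectType = 'FLASHCACHE'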

3.4.9 Disabling Flash Cache Compression

Flash cache compression can be disabled on Oracle Exadata Database Machine X4-2, Oracle Exadata Database Machine X3-2, and Oracle Exadata Database Machine X3-8 Full Rack systems. Oracle Exadata Database Machine X5-2, X5-8, and later systems do not have flash cache compression.

Note:

  • User data is not retained when disabling flash cache compression.

The following procedure describes how to disable flash cache compression:

  1. Save the user data on the flash cell disks.
    # cellcli -e ALTER FLASHCACHE ALL FLUSH
    

    For grid disks, the cachedby attribute should be null. Also, the number of dirty (unflushed) buffers will be 0 after the flush is complete.

    # cellcli -e LIST METRICCURRENT FC_BY_DIRTY
              FC_BY_DIRTY     FLASHCACHE      0.000 MB
  2. Remove the flash cache from the cell.
    # cellcli -e DROP FLASHCACHE ALL
    
  3. Remove the flash log from the cell.
    # cellcli -e DROP FLASHLOG ALL
    
  4. Drop the cell disks on the flash disks.
    # cellcli -e DROP CELLDISK ALL FLASHDISK
    
  5. Disable Flash Cache Compression using the following commands, based on the system:
    • If Exadata Storage Cell Server image is 11.2.3.3.1 or higher and the Exadata Storage Cell is X3-2 or X4-2:

      # cellcli -e ALTER CELL flashcachecompress=false
      
    • If Exadata Storage Cell Server image is 11.2.3.3.0 and the Exadata Storage Cell is X3-2:

      # cellcli -e ALTER CELL flashCacheCompX3Support=true
      # cellcli -e ALTER CELL flashCacheCompress=false
      

      Note:

      The flashCacheCompress attribute is set to false, but flashCacheCompX3Support must be set to true.

    You can verify that Flash Cache Compress has been disabled by viewing the cell attributes:

    # cellcli -e LIST CELL attributes name,flashCacheCompress

    Correct values are FALSE or a null string.

  6. Verify the size of the physical disks has decreased.
    # cellcli -e LIST PHYSICALDISK attributes name,physicalSize,status WHERE disktype=flashdisk

    The status should be normal. Use the following information to validate the expected size when Compression is OFF:

    • Aura 2.0/F40/X3:

      • Physical Disk Size: 93.13 G (OFF) or 186.26 G (ON)

      • Flash Cache Size: 1489 G (OFF) or 2979 G (ON)

    • Aura 2.1/F80/X4:

      • Physical Disk Size: 186.26 G (OFF) or 372.53 G (ON)

      • Flash Cache Size: 2979 G (OFF) or 5959 G (ON)

  7. Create the cell disks on the flash disks.
    # cellcli -e CREATE CELLDISK ALL FLASHDISK
    CellDisk FD_00_exampleceladm18 successfully created
    ...
    CellDisk FD_15_exampleceladm18 successfully created 
    
  8. Create the flash log.
    # cellcli -e CREATE FLASHLOG ALL
    Flash log RaNdOmceladm18_FLASHLOG successfully created 
    

    Verify the flash log is in normal mode.

    # cellcli -e LIST FLASHLOG DETAIL
  9. Create the flash cache on the cell.
    # cellcli -e CREATE FLASHCACHE ALL
    Flash cache exampleceladm18_FLASHCACHE successfully created
    

    Verify the flash cache is in normal mode.

    # cellcli -e LIST FLASHCACHE DETAIL
  10. Verify that flash cache compression is disabled.
    # cellcli -e LIST CELL
    

    The value of the flashCacheCompress attribute should be false.

3.5 Maintaining PMEM Devices on Oracle Exadata Storage Servers

Persistent memory (PMEM) devices reside in Exadata X8M-2 and X9M-2 storage server models with High Capacity (HC) or Extreme Flash (EF) storage.

If a PMEM device fails, Oracle Exadata System Software isolates the failed device and automatically recovers the cache using the device.

If the cache is in write-back mode, the recovery operation, also known as resilvering, restores the lost data by reading a mirrored copy. During resilvering, the grid disk status is ACTIVE -- RESILVERING WORKING. If the cache is in write-through mode, then the data in the failed PMEM device is already stored in the data grid disk, and no recovery is required.
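
During resilvering, you can observe the grid disk status from the affected cell, for example:

CellCLI> LIST GRIDDISK ATTRIBUTES name, status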

3.5.1 Replacing a PMEM Device Due to Device Failure

If the PMEM device has a status of Failed, you should replace the PMEM device on the Oracle Exadata Storage Server.

A PMEM fault could cause the server to reboot. The failed device should be replaced with a new PMEM device at the earliest opportunity. Until the PMEM device is replaced, the corresponding cache size is reduced. If the PMEM device is used for commit acceleration (XRMEMLOG or PMEMLOG), then the size of the corresponding commit accelerator is also reduced.

An alert is generated when a PMEM device failure is detected. The alert message includes the slot number and cell disk name. If you have configured the system for alert notification, then an alert is sent by e-mail message to the designated address.

To identify a failed PMEM device, you can also use the following command:

CellCLI> LIST PHYSICALDISK WHERE disktype=PMEM AND status=failed DETAIL

    name:                          PMEM_0_1
    diskType:                      PMEM
    luns:                          P0_D1
    makeModel:                     "Intel NMA1XBD128GQS"
    physicalFirmware:              1.02.00.5365
    physicalInsertTime:            2019-09-28T11:29:13-07:00
    physicalSerial:                8089-A2-1838-00001234
    physicalSize:                  126.375G
    slotNumber:                    "CPU: 0; DIMM: 1"
    status:                        failed

In the above output, the slotNumber shows the socket number and DIMM slot number.

  1. Locate the storage server that contains the failed PMEM device.
    A white Locator LED is lit to help locate the affected storage server. When you have located the server, you can use the Fault Remind button to locate the failed DIMM.

    Caution:

    Do not attempt to remove a faulty DCPMM DIMM when the Do Not Service LED indicator is illuminated.
  2. Power down the storage server with the failed PMEM device and unplug the power cable for the server.
  3. Replace the failed PMEM device.
  4. Restart the storage server.

    Note:

    During the restart, the storage server will shut down a second time to complete the initialization of the new PMEM device.

The new PMEM device is automatically used by the system. If the PMEM device is used for caching, then the effective cache size increases. If the PMEM device is used for commit acceleration, then commit acceleration is enabled on the device.

3.5.2 Replacing a PMEM Device Due to Degraded Performance

If a PMEM device has degraded performance, you might need to replace the module.

If degraded performance is detected on a PMEM device, the module status is set to warning - predictive failure and an alert is generated. The alert includes specific instructions for replacing the PMEM device. If you have configured the system for alert notifications, then the alerts are sent by e-mail message to the designated address.

The predictive failure status indicates that the PMEM device will fail soon, and should be replaced at the earliest opportunity. No new data is cached in the PMEM device until it is replaced.

To identify a PMEM device with the status predictive failure, you can also use the following command:

CellCLI> LIST PHYSICALDISK WHERE disktype=PMEM AND status='warning - predictive failure' DETAIL

         name:               PMEM_0_6
         diskType:           PMEM
         luns:               P0_D6
         makeModel:          "Intel NMA1XBD128GQS"
         physicalFirmware:   1.02.00.5365
         physicalInsertTime: 2019-11-30T21:24:45-08:00
         physicalSerial:     8089-A2-1838-00001234
         physicalSize:       126.375G
         slotNumber:         "CPU: 0; DIMM: 6"
         status:             warning - predictive failure

You can also locate the PMEM device using the information in the LIST DISKMAP command:

CellCLI> LIST DISKMAP
Name      PhysicalSerial         SlotNumber        Status       PhysicalSize
   CellDisk       DevicePartition    GridDisks
PMEM_0_1  8089-a2-0000-00000460  "CPU: 0; DIMM: 1"  normal      126G
   PM_00_cel01    /dev/dax5.0        PMEMCACHE_PM_00_cel01
PMEM_0_3  8089-a2-0000-000004c2  "CPU: 0; DIMM: 3"  normal      126G
   PM_02_cel01    /dev/dax4.0        PMEMCACHE_PM_02_cel01
PMEM_0_5  8089-a2-0000-00000a77  "CPU: 0; DIMM: 5"  normal      126G
   PM_03_cel01    /dev/dax3.0        PMEMCACHE_PM_03_cel01
PMEM_0_6  8089-a2-0000-000006ff  "CPU: 0; DIMM: 6"  warning -   126G
   PM_04_cel01    /dev/dax0.0        PMEMCACHE_PM_04_cel01
PMEM_0_8  8089-a2-0000-00000750  "CPU: 0; DIMM: 8"  normal      126G
   PM_05_cel01    /dev/dax1.0        PMEMCACHE_PM_05_cel01
PMEM_0_10 8089-a2-0000-00000103  "CPU: 0; DIMM: 10" normal      126G
   PM_01_cel01    /dev/dax2.0        PMEMCACHE_PM_01_cel01
PMEM_1_1  8089-a2-0000-000008f6  "CPU: 1; DIMM: 1"  normal      126G
   PM_06_cel01    /dev/dax11.0       PMEMCACHE_PM_06_cel01
PMEM_1_3  8089-a2-0000-000003bb  "CPU: 1; DIMM: 3"  normal      126G
   PM_08_cel01    /dev/dax10.0       PMEMCACHE_PM_08_cel01
PMEM_1_5  8089-a2-0000-00000708  "CPU: 1; DIMM: 5"  normal      126G
   PM_09_cel01    /dev/dax9.0        PMEMCACHE_PM_09_cel01
PMEM_1_6  8089-a2-0000-00000811  "CPU: 1; DIMM: 6"  normal      126G
   PM_10_cel01    /dev/dax6.0        PMEMCACHE_PM_10_cel01
PMEM_1_8  8089-a2-0000-00000829  "CPU: 1; DIMM: 8"   normal     126G
   PM_11_cel01    /dev/dax7.0        PMEMCACHE_PM_11_cel01
PMEM_1_10 8089-a2-0000-00000435  "CPU: 1; DIMM: 10"   normal    126G
   PM_07_cel01    /dev/dax8.0        PMEMCACHE_PM_07_cel01

If the PMEM device is used for write-back caching, then the data is flushed from the PMEM device to the flash cache. To ensure that data is flushed from the PMEM device, check the cachedBy attribute of all the grid disks and ensure that the affected PMEM device is not listed.
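
For example, using the same command form shown earlier in this chapter:

CellCLI> LIST GRIDDISK ATTRIBUTES name, cachedBy

The flush is complete when no PMEM cell disk appears in any cachedBy value.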

  1. Locate the storage server that contains the failing PMEM device.
    A white Locator LED is lit to help locate the affected storage server. When you have located the server, you can use the Fault Remind button to locate the failed DIMM.

    Caution:

    Do not attempt to remove a faulty DCPMM DIMM when the Do Not Service LED indicator is illuminated.
  2. Power down the storage server with the failing PMEM device and unplug the power cable for the server.
  3. Replace the failing PMEM device.
  4. Restart the storage server.

    Note:

    During the restart, the storage server will shut down a second time to complete the initialization of the new PMEM device.

The new PMEM device is automatically used by the system. If the PMEM device is used for caching, then the effective cache size increases. If the PMEM device is used for commit acceleration, then commit acceleration is enabled on the device.

3.5.3 Enabling and Disabling Write-Back PMEM Cache

Prior to Oracle Exadata System Software release 23.1.0, you can configure PMEM cache to operate in write-back mode. Also known as write-back PMEM cache, this mode enables the cache to service write operations.

Note:

The best practice recommendation is to configure PMEM cache in write-through mode. This configuration provides the best performance and availability.

Commencing with Oracle Exadata System Software release 23.1.0, PMEM cache only operates in write-through mode.
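
You can confirm the current mode with the same attribute check used later in this section:

CellCLI> LIST CELL ATTRIBUTES pmemcachemode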

3.5.3.1 Enable Write-Back PMEM Cache

Write-back PMEM cache is only supported in conjunction with write-back flash cache. Consequently, to enable write-back PMEM cache you must also enable write-back flash cache.

Note:

Commencing with Oracle Exadata System Software release 23.1.0, you cannot enable write-back PMEM cache because PMEM cache only operates in write-through mode.

Note:

To reduce the performance impact on your applications, change the cache mode during a period of reduced workload.

The following command examples use a text file named cell_group that contains the host names of the storage servers that are the subject of the procedure.

  1. Check the current flash cache mode setting (flashCacheMode):
    # dcli -l root -g cell_group cellcli -e "list cell detail" | grep flashCacheMode
  2. If the flash cache is in write-back mode:
    1. Validate that all the physical disks are in NORMAL state before modifying the flash cache.
      # dcli -l root -g cell_group cellcli -e "LIST PHYSICALDISK ATTRIBUTES name,status" | grep -v NORMAL
      The command should return no rows.
    2. Determine the amount of dirty data in the flash cache.
      # dcli -g cell_group -l root cellcli -e "LIST METRICCURRENT ATTRIBUTES name,metricvalue WHERE name LIKE \'FC_BY_DIRTY.*\' "
    3. Flush the flash cache.

      If the flash cache utilizes all available flash cell disks, you can use the ALL keyword instead of listing the flash disks.

      # dcli -g cell_group -l root cellcli -e "ALTER FLASHCACHE CELLDISK=\'FD_02_dm01celadm12,
      FD_03_dm01celadm12,FD_00_dm01celadm12,FD_01_dm01celadm12\' FLUSH"
    4. Check the progress of the flash cache flush operation.

      The flushing process is complete when the metric FC_BY_DIRTY is zero.

      # dcli -g cell_group -l root cellcli -e "LIST METRICCURRENT ATTRIBUTES name, metricvalue WHERE name LIKE \'FC_BY_DIRTY.*\' " 

      Or, you can check to see if the attribute flushstatus is set to Completed.

      # dcli -g cell_group -l root cellcli -e "LIST CELLDISK ATTRIBUTES name, flushstatus, flusherror" | grep FD 
    5. After the flash cache is flushed, drop the flash cache.
      # dcli -g cell_group -l root cellcli -e "drop flashcache" 
    6. Modify the cell to use flash cache in write-back mode.
      # dcli -g cell_group -l root cellcli -e "ALTER CELL flashCacheMode=writeback"
    7. Re-create the flash cache.

      If the flash cache utilizes all available flash cell disks, you can use the ALL keyword instead of listing the cell disks.

      If the size attribute is not specified, then the flash cache consumes all available space on each cell disk.

      # dcli -l root -g cell_group cellcli -e "create flashcache celldisk=\'FD_02_dm01celadm12,
      FD_03_dm01celadm12,FD_00_dm01celadm12,FD_01_dm01celadm12\'"
    8. Verify that flashCacheMode is set to writeback.
      # dcli -l root -g cell_group cellcli -e "list cell detail" | grep flashCacheMode
  3. Flush the PMEM cache.

    If the PMEM cache utilizes all available PMEM cell disks, you can use the ALL keyword as shown here.

    # dcli -l root -g cell_group cellcli -e "ALTER PMEMCACHE ALL FLUSH"

    Otherwise, list the specific disks using the CELLDISK="cdisk1 [,cdisk2] ..." clause.

  4. Drop the PMEM cache.
    # dcli -l root -g cell_group cellcli -e "DROP PMEMCACHE"
  5. Modify the cell to use PMEM cache in write-back mode.
    # dcli -l root -g cell_group cellcli -e "ALTER CELL pmemCacheMode=WriteBack"

    Starting with Oracle Exadata System Software release 20.1.0, this command warns about the best practice recommendation to use PMEM cache in write-through mode and prompts for confirmation of the change.

  6. Re-create the PMEM cache.

    If the PMEM cache utilizes all available PMEM cell disks, you can use the ALL keyword as shown here. Otherwise, list the specific disks using the CELLDISK="cdisk1 [,cdisk2] ..." clause. If the size attribute is not specified, then the PMEM cache consumes all available space on each cell disk.

    # dcli -l root -g cell_group cellcli -e "CREATE PMEMCACHE ALL"
  7. Verify that pmemCacheMode is set to writeback.
    # dcli -l root -g cell_group cellcli -e "list cell detail" | grep pmemCacheMode
3.5.3.2 Disable Write-Back PMEM Cache

Use these steps if you need to disable Write-Back PMEM cache on the storage servers.

You do not have to stop the cellsrv process or inactivate grid disks when disabling Write-Back PMEM cache. However, to reduce the performance impact on the application, disable the Write-Back PMEM cache during a period of reduced workload.

  1. Validate that all the physical disks are in NORMAL state before modifying the PMEM cache.

    The following command should return no rows:

    # dcli -l root -g cell_group cellcli -e "LIST PHYSICALDISK ATTRIBUTES name,status" | grep -v NORMAL
  2. Flush the PMEM cache.
    # dcli -g cell_group -l root cellcli -e "ALTER PMEMCACHE ALL FLUSH"

    The PMEM cache flushes the dirty data to the lower layer Write-Back Flash Cache.

  3. Check that the flushing operation for the PMEM cache has completed.

    The flushing process is complete when the PMEM devices do not show up in the cachedBy attribute for the grid disks.

    CellCLI> LIST GRIDDISK ATTRIBUTES name, cachedBy
    DATA_CD_00_cel01     FD_00_cel01
    DATA_CD_01_cel01     FD_01_cel01
    DATA_CD_02_cel01     FD_03_cel01
    DATA_CD_03_cel01     FD_02_cel01
    DATA_CD_04_cel01     FD_00_cel01
    DATA_CD_05_cel01     FD_02_cel01
    ...
    
  4. Drop the PMEM cache.
    # dcli -g cell_group -l root cellcli -e drop pmemcache all 
  5. Set the pmemCacheMode attribute to writethrough.
    # dcli -g cell_group -l root cellcli -e "ALTER CELL pmemCacheMode=writethrough"
  6. Re-create the PMEM cache.
    # dcli -l root -g cell_group cellcli -e create pmemcache all
  7. Verify the pmemCacheMode has been set to writethrough.
    CellCLI> LIST CELL ATTRIBUTES pmemcachemode
       WriteThrough

3.6 Maintaining the M.2 Disks of Oracle Exadata Storage Server

Oracle Exadata X7 and later systems come with two internal M.2 devices that contain the system area.

In all previous systems, the first two disks of the Oracle Exadata Storage Server are system disks and the portions on these system disks are referred to as the system area.

Note:

Oracle Exadata Rack and Oracle Exadata Storage Servers can remain online and available while replacing an M.2 disk.

3.6.1 Monitoring the Status of M.2 Disks

You can monitor the status of an M.2 disk by checking its attributes with the CellCLI LIST PHYSICALDISK command.

The disk firmware maintains the error counters, and marks a drive with Predictive Failure when the disk is about to fail. The drive, not the cell software, determines if it needs replacement.

  • Use the CellCLI command LIST PHYSICALDISK to determine the status of an M.2 disk:
    CellCLI> LIST PHYSICALDISK WHERE disktype='M2Disk' DETAIL
             name:                  M2_SYS_0
             deviceName:            /dev/sdm
             diskType:              M2Disk
             makeModel:             "INTEL SSDSCKJB150G7"
             physicalFirmware:      N2010112
             physicalInsertTime:    2017-07-14T08:42:24-07:00
             physicalSerial:        PHDW7082000M150A
             physicalSize:          139.73558807373047G
             slotNumber:            "M.2 Slot: 0"
             status:                failed
    
             name:                  M2_SYS_1        
             deviceName:            /dev/sdn
             diskType:              M2Disk
             makeModel:             "INTEL SSDSCKJB150G7"
             physicalFirmware:      N2010112
             physicalInsertTime:    2017-07-14T12:25:05-07:00
             physicalSerial:        PHDW708200SZ150A
             physicalSize:          139.73558807373047G
             slotNumber:            "M.2 Slot: 1"
             status:                normal

    The Exadata Storage Server M.2 disk statuses are:

    • normal

    • not present

    • failed

    • warning - predictive failure

3.6.2 Replacing an M.2 Disk Due to Failure or Other Problems

Failure of an M.2 disk reduces the redundancy of the system area and can impact patching, imaging, and system rescue. Therefore, the disk should be replaced with a new disk as soon as possible. When an M.2 disk fails, the storage server automatically and transparently switches to using the software stored on the inactive system disk, making it the active system disk.

An Exadata alert is generated when an M.2 disk fails. The alert includes specific instructions for replacing the disk. If you have configured the system for alert notifications, then the alert is sent by e-mail to the designated address.

The M.2 disks are hot-pluggable and can be replaced while the power is on.

After the M.2 disk is replaced, Oracle Exadata System Software automatically adds the new device to the system partition and starts the rebuilding process.

  1. Identify the failed M.2 disk.
    CellCLI> LIST PHYSICALDISK WHERE diskType=M2Disk AND status!=normal DETAIL
             name:                      M2_SYS_0
             deviceName:                /dev/sda
             diskType:                  M2Disk
             makeModel:                 "INTEL SSDSCKJB150G7"
             physicalFirmware:          N2010112
             physicalInsertTime:        2017-07-14T08:42:24-07:00
             physicalSerial:            PHDW7082000M150A
             physicalSize:              139.73558807373047G
             slotNumber:                "M.2 Slot: 0"
             status:                    failed
    
  2. Locate the cell that has the white LED lit.
  3. Open the chassis and identify the M.2 disk by the slot number in Step 1. The amber LED for this disk should be lit to indicate service is needed.

    M.2 disks are hot pluggable, so you do not need to power down the cell before replacing the disk.

  4. Remove the M.2 disk:
    1. Rotate both riser board socket ejectors up and outward as far as they will go.
      The green power LED on the riser board turns off when you open the socket ejectors.
    2. Carefully lift the riser board straight up to remove it from the sockets.
  5. Insert the replacement M.2 disk:
    1. Unpack the replacement flash riser board and place it on an antistatic mat.
    2. Align the notch in the replacement riser board with the connector key in the connector socket.
    3. Push the riser board into the connector socket until the riser board is securely seated in the socket.

      Caution:

      If the riser board does not easily seat into the connector socket, verify that the notch in the riser board is aligned with the connector key in the connector socket. If the notch is not aligned, damage to the riser board might occur.

    4. Rotate both riser board socket ejectors inward until the ejector tabs lock the riser board in place.
      The green power LED on the riser board turns on when you close the socket ejectors.
  6. Confirm the M.2 disk has been replaced.
    CellCLI> LIST PHYSICALDISK WHERE DISKTYPE=M2Disk DETAIL
       name:                  M2_SYS_0 
       deviceName:            /dev/sdm   
       diskType:              M2Disk   
       makeModel:             "INTEL SSDSCKJB150G7"   
       physicalFirmware:      N2010112    
       physicalInsertTime:    2017-08-24T18:55:13-07:00   
       physicalSerial:        PHDW708201G0150A   
       physicalSize:          139.73558807373047G   
       slotNumber:            "M.2 Slot: 0"   
       status:                normal   
    
       name:                  M2_SYS_1   
       deviceName:            /dev/sdn   
       diskType:              M2Disk   
       makeModel:             "INTEL SSDSCKJB150G7"    
       physicalFirmware:      N2010112   
       physicalInsertTime:    2017-08-24T18:55:13-07:00   
       physicalSerial:        PHDW708200SZ150A   
       physicalSize:          139.73558807373047G   
       slotNumber:            "M.2 Slot: 1"   
       status:                normal
  7. Confirm that the system disk arrays have an active sync status or are being rebuilt.
    # mdadm --detail /dev/md[2-3][4-5]
    /dev/md24:
          Container : /dev/md/imsm0, member 0
         Raid Level : raid1
         Array Size : 104857600 (100.00 GiB 107.37 GB)
      Used Dev Size : 104857600 (100.00 GiB 107.37 GB)
       Raid Devices : 2
      Total Devices : 2
    
                   State  : active
     Active Devices  : 2
    Working Devices  : 2
     Failed Devices  : 0
       Spare Devices : 0  
    
             UUID : 152f728a:6d294098:5177b2e5:8e0d766c
       Number    Major    Minor    RaidDevice    State
            1        8       16             0    active sync  /dev/sdb
            0        8        0             1    active sync  /dev/sda
    /dev/md25:
          Container : /dev/md/imsm0, member 1
         Raid Level : raid1
         Array Size : 41660416 (39.73 GiB 42.66 GB)
      Used Dev Size : 41660544 (39.73 GiB 42.66 GB)
       Raid Devices : 2
      Total Devices : 2
    
                   State  : clean
     Active Devices  : 2
    Working Devices  : 2
     Failed Devices  : 0
       Spare Devices : 0  
    
                 UUID : 466173ba:507008c7:6d65ed89:3c40cf23
       Number    Major    Minor    RaidDevice    State
            1        8       16             0    active sync  /dev/sdb
            0        8        0             1    active sync  /dev/sda

3.7 Resizing Grid Disks

You can resize grid disks and Oracle ASM disk groups to shrink one with excess free space and increase the size of another that is near capacity.

Initial configuration of Oracle Exadata disk group sizes is based on Oracle best practices and the location of the backup files.
  • For internal backups: allocation of available space is 40% for the DATA disk groups, and 60% for the RECO disk groups.

  • For external backups: allocation of available space is 80% for the DATA disk group, and 20% for the RECO disk group.

The disk group allocations can be changed after deployment. For example, the DATA disk group allocation may be too small at 60%, and need to be resized to 80%.

If your system has no free space available on the cell disks and one disk group, for example RECO, has plenty of free space, then you can resize the RECO disk group to a smaller size and reallocate the free space to the DATA disk group. The free space available after shrinking the RECO disk group is at a non-contiguous offset from the existing space allocations for the DATA disk group. Grid disks can use space anywhere on the cell disks and do not have to be contiguous.

If you are expanding the grid disks and the cell disks already have sufficient space to expand the existing grid disks, then you do not need to first resize an existing disk group. You would skip steps 2 and 3 below where the example shows the RECO disk group and grid disks are shrunk (you should still verify the cell disks have enough free space before growing the DATA grid disks). The amount of free space the administrator should reserve depends on the level of failure coverage.

If you are shrinking the size of the grid disks, you should understand how space is reserved for mirroring. Data is protected by Oracle ASM using normal or high redundancy to create one or two mirror copies of data, which are stored as file extents. These copies are stored in separate failure groups. A failure in one failure group does not affect the mirror copies, so data is still accessible.

When a failure occurs, Oracle ASM re-mirrors, or rebalances, any extents that are not accessible so that redundancy is reestablished. For the re-mirroring process to succeed, sufficient free space must exist in the disk group to allow creation of the new file extent mirror copies. If there is not enough free space, then some extents will not be re-mirrored and the subsequent failure of the other data copies will require the disk group to be restored from backup. Oracle ASM sends an error when a re-mirror process fails due to lack of space.

You must be using Oracle Exadata System Software release 12.1.2.1.0 or higher, or have the patch for bug 19695225 applied to your software.

This procedure for resizing grid disks applies to bare metal and virtual machine (VM) deployments.

3.7.1 Determine the Amount of Available Space

To increase the size of the disks in a disk group, you must either have unallocated disk space available or reallocate space currently used by a different disk group.

You can also use a script available in "Script to Calculate New Grid Disk and Disk Group Sizes in Exadata (My Oracle Support Doc ID 1464809.1)" to assist in determining how much free space is available to shrink a disk group.

  1. View the space currently used by the disk groups.
    SELECT name, total_mb, free_mb, total_mb - free_mb used_mb, round(100*free_mb/total_mb,2) pct_free
    FROM v$asm_diskgroup
    ORDER BY 1;
    
    NAME                             TOTAL_MB    FREE_MB    USED_MB   PCT_FREE
    ------------------------------ ---------- ---------- ---------- ----------
    DATAC1                           68812800    9985076   58827724      14.51
    RECOC1                           94980480   82594920   12385560      86.96

    The example above shows that the DATAC1 disk group has only about 15% of free space available while the RECOC1 disk group has about 87% free disk space. The PCT_FREE displayed here is raw free space, not usable free space. Additional space is needed for rebalancing operations.
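
    The PCT_FREE above is computed from raw FREE_MB. To see usable, redundancy-adjusted space at the same time, V$ASM_DISKGROUP also exposes the USABLE_FILE_MB and REQUIRED_MIRROR_FREE_MB columns; the following variant of the query is a sketch using those columns:

    SELECT name, total_mb, free_mb, usable_file_mb, required_mirror_free_mb
    FROM v$asm_diskgroup
    ORDER BY 1;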

  2. For the disk groups you plan to resize, view the count and status of the failure groups used by the disk groups.
     SELECT dg.name, d.failgroup, d.state, d.header_status, d.mount_status,
     d.mode_status, count(1) num_disks
    FROM V$ASM_DISK d, V$ASM_DISKGROUP dg
    WHERE d.group_number = dg.group_number
    AND dg.name IN ('RECOC1', 'DATAC1')
    GROUP BY dg.name, d.failgroup, d.state, d.header_status, d.mount_status,
      d.mode_status
    ORDER BY 1, 2, 3;
    
    NAME       FAILGROUP      STATE      HEADER_STATU MOUNT_S  MODE_ST  NUM_DISKS
    ---------- -------------  ---------- ------------ -------- -------  ---------
    DATAC1     EXA01CELADM01  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM02  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM03  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM04  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM05  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM06  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM07  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM08  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM09  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM10  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM11  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM12  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM13  NORMAL     MEMBER        CACHED  ONLINE   12
    DATAC1     EXA01CELADM14  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM01  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM02  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM03  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM04  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM05  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM06  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM07  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM08  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM09  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM10  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM11  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM12  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM13  NORMAL     MEMBER        CACHED  ONLINE   12
    RECOC1     EXA01CELADM14  NORMAL     MEMBER        CACHED  ONLINE   12
    

    The above example is for a full rack, which has 14 cells and 14 failure groups for DATAC1 and RECOC1. Verify that each failure group has at least 12 disks in the NORMAL state (num_disks). If you see disks listed as MISSING, or you see an unexpected number of disks for your configuration, then do not proceed until you resolve the problem.

    Extreme Flash systems should see a disk count of 8 instead of 12 for num_disks.

  3. List the corresponding grid disks associated with each cell and each failure group, so you know which grid disks to resize.
    SELECT dg.name, d.failgroup, d.path
    FROM V$ASM_DISK d, V$ASM_DISKGROUP dg
    WHERE d.group_number = dg.group_number
    AND dg.name IN ('RECOC1', 'DATAC1')
    ORDER BY 1, 2, 3;
    
    NAME        FAILGROUP      PATH
    ----------- -------------  ----------------------------------------------
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_00_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_01_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_02_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_03_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_04_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_05_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_06_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_07_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_08_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_09_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_10_exa01celadm01
    DATAC1      EXA01CELADM01  o/192.168.74.43/DATAC1_CD_11_exa01celadm01
    DATAC1      EXA01CELADM02  o/192.168.74.44/DATAC1_CD_00_exa01celadm01
    DATAC1      EXA01CELADM02  o/192.168.74.44/DATAC1_CD_01_exa01celadm01
    DATAC1      EXA01CELADM02  o/192.168.74.44/DATAC1_CD_02_exa01celadm01
    ...
    RECOC1      EXA01CELADM13  o/192.168.74.55/RECOC1_CD_00_exa01celadm13
    RECOC1      EXA01CELADM13  o/192.168.74.55/RECOC1_CD_01_exa01celadm13
    RECOC1      EXA01CELADM13  o/192.168.74.55/RECOC1_CD_02_exa01celadm13
    ...
    RECOC1      EXA01CELADM14  o/192.168.74.56/RECOC1_CD_09_exa01celadm14
    RECOC1      EXA01CELADM14  o/192.168.74.56/RECOC1_CD_10_exa01celadm14
    RECOC1      EXA01CELADM14  o/192.168.74.56/RECOC1_CD_11_exa01celadm14  
    
    168 rows returned.
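
    As a cross-check from the storage side, you can list the same grid disks with their sizes and offsets directly on the cells. This sketch assumes a cell_group file listing every storage server, as used elsewhere in this chapter:

    # dcli -g cell_group -l root "cellcli -e list griddisk attributes name,size,offset"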
  4. Check the cell disks for available free space.
    Free space on the cell disks can be used to increase the size of the DATAC1 grid disks. If there is not enough available free space to expand the DATAC1 grid disks, then you must shrink the RECOC1 grid disks to provide the additional space for the desired new size of DATAC1 grid disks.
    [root@exa01adm01 tmp]# dcli -g ~/cell_group -l root "cellcli -e list celldisk \
      attributes name,freespace" 
    exa01celadm01: CD_00_exa01celadm01 0 
    exa01celadm01: CD_01_exa01celadm01 0 
    exa01celadm01: CD_02_exa01celadm01 0 
    exa01celadm01: CD_03_exa01celadm01 0 
    exa01celadm01: CD_04_exa01celadm01 0 
    exa01celadm01: CD_05_exa01celadm01 0 
    exa01celadm01: CD_06_exa01celadm01 0 
    exa01celadm01: CD_07_exa01celadm01 0 
    exa01celadm01: CD_08_exa01celadm01 0 
    exa01celadm01: CD_09_exa01celadm01 0 
    exa01celadm01: CD_10_exa01celadm01 0 
    exa01celadm01: CD_11_exa01celadm01 0 
    ...

    In this example, there is no free space available, so you must shrink the RECOC1 grid disks first to provide space for the DATAC1 grid disks. In your configuration there might be plenty of free space available and you can use that free space instead of shrinking the RECOC1 grid disks.

  5. Calculate the amount of space to shrink from the RECOC1 disk group and from each grid disk.

    The minimum size to safely shrink a disk group and its grid disks must take into account the following:

    • Space currently in use (USED_MB)

    • Space expected for growth (GROWTH_MB)

    • Space needed to rebalance in case of disk failure (DFC_MB), typically 15% of total disk group size

    The minimum size calculation taking the above factors into account is:

    Minimum DG size (MB) = ( USED_MB + GROWTH_MB ) * 1.15 
    • USED_MB can be derived from V$ASM_DISKGROUP by calculating TOTAL_MB - FREE_MB

    • GROWTH_MB is an estimate specific to how the disk group will be used in the future and should be based on historical patterns of growth

    For the RECOC1 disk group space usage shown in step 1, we see the minimum size it can shrink to assuming no growth estimates is:

    Minimum RECOC1 size = (TOTAL_MB - FREE_MB + GROWTH_MB) * 1.15

    = ( 94980480 - 82594920 + 0) * 1.15 = 14243394 MB = 13,910 GB
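
    The same arithmetic can be run directly against V$ASM_DISKGROUP. The query below is only a sketch of the formula above: it hard-codes GROWTH_MB to zero, so substitute your own growth estimate before relying on the result.

    SQL> SELECT name, total_mb - free_mb used_mb,
      2   CEIL((total_mb - free_mb + 0) * 1.15) min_dg_size_mb
      3  FROM v$asm_diskgroup
      4  WHERE name = 'RECOC1';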

    In the example output shown in Step 1, RECOC1 has plenty of free space and DATAC1 has less than 15% free. So, you could shrink RECOC1 and give the freed disk space to DATAC1. If you decide to reduce RECOC1 to half of its current size, the new size is 94980480 / 2 = 47490240 MB. This size is significantly above the minimum size we calculated for the RECOC1 disk group above, so it is safe to shrink it down to this value.

    The query in Step 2 shows that there are 168 grid disks for RECOC1, because there are 14 cells and 12 disks per cell (14 * 12 = 168). The estimated new size of each grid disk for the RECOC1 disk group is 47490240 / 168, or 282,680 MB.

    Find the closest 16 MB boundary for the new grid disk size. If you do not perform this check, then the cell will round down the grid disk size to the nearest 16 MB boundary automatically, and you could end up with a mismatch in size between the Oracle ASM disks and the grid disks.

    SQL> SELECT 16*TRUNC(&new_disk_size/16) new_disk_size FROM dual;
    Enter value for new_disk_size: 282680
    
    NEW_DISK_SIZE
    -------------
           282672

    Based on the above result, you should choose 282672 MB as the new size for the grid disks in the RECOC1 disk group. After resizing the grid disks, the size of the RECOC1 disk group will be 47488896 MB.

  6. Calculate how much to increase the size of each grid disk in the DATAC1 disk group.

    Ensure the Oracle ASM disk size and the grid disk sizes match across the entire disk group. The following query shows the combinations of disk sizes in each disk group. Ideally, there is only one size found for all disks and the sizes of both the Oracle ASM (total_mb) disks and the grid disks (os_mb) match.

    SELECT dg.name, d.total_mb, d.os_mb, count(1) num_disks
    FROM v$asm_diskgroup dg, v$asm_disk d
    WHERE dg.group_number = d.group_number
    GROUP BY dg.name, d.total_mb, d.os_mb;
    
    NAME                             TOTAL_MB      OS_MB  NUM_DISKS
    ------------------------------ ---------- ---------- ----------
    DATAC1                             409600     409600        168
    RECOC1                             565360     565360        168
    

    After shrinking RECOC1's grid disks, the following space is left per disk for DATAC1:

    Additional space for DATAC1 disks = RECOC1_current_size - RECOC1_new_size
                                      = 565360 - 282672 = 282688 MB

    To calculate the new size of the grid disks for the DATAC1 disk group, use the following:

    New DATAC1 disk size = DATAC1_disks_current_size + new_free_space_from_RECOC1
                         = 409600 + 282688 = 692288 MB

    Find the closest 16 MB boundary for the new grid disk size. If you do not perform this check, then the cell will round down the grid disk size to the nearest 16 MB boundary automatically, and you could end up with a mismatch in size between the Oracle ASM disks and the grid disks.

    SQL> SELECT 16*TRUNC(&new_disk_size/16) new_disk_size FROM dual;
    Enter value for new_disk_size: 692288
    
    NEW_DISK_SIZE
    -------------
           692288

    Based on the query result, you can use the calculated size of 692288 MB for the disks in the DATAC1 disk group because the size is on a 16 MB boundary. If the result of the query is different from the value you supplied, then you must use the value returned by the query because that is the value to which the cell will round the grid disk size.

    The calculated value of the new grid disk size will result in the DATAC1 disk group having a total size of 116304384 MB (168 disks * 692288 MB).

3.7.2 Shrink the Oracle ASM Disks in the Donor Disk Group

If there is no free space available on the cell disks, you can reduce the space used by one disk group to provide additional disk space for a different disk group.

This task is a continuation of an example where space in the RECOC1 disk group is being reallocated to the DATAC1 disk group.
Before resizing the disk group, make sure the disk group you are taking space from has sufficient free space.
  1. Shrink the Oracle ASM disks for the RECO disk group down to the new desired size for all disks.

    Use the new size for the disks in the RECO disk group that was calculated in Step 5 of Determine the Amount of Available Space.

    SQL> ALTER DISKGROUP recoc1 RESIZE ALL SIZE 282672M REBALANCE POWER 64;

    Note:

    The ALTER DISKGROUP command may take several minutes to complete. The SQL prompt will not return until this operation has completed.

    If the specified disk group has quorum disks configured within the disk group, then the ALTER DISKGROUP ... RESIZE ALL command could fail with error ORA-15277. Quorum disks are configured if the requirements specified in Managing Quorum Disks for High Redundancy Disk Groups are met.

    As a workaround, for regular storage server failure groups (FAILGROUP_TYPE=REGULAR, not QUORUM), you can specify the failure group names explicitly in the SQL command, for example:

    SQL> ALTER DISKGROUP recoc1 RESIZE DISKS IN FAILGROUP exacell01 SIZE 282672M,
    exacell02 SIZE 282672M, exacell03 SIZE 282672M REBALANCE POWER 64;

    Wait for rebalance to finish by checking the view GV$ASM_OPERATION.

    SQL> set lines 250 pages 1000
    SQL> col error_code form a10
    SQL> SELECT dg.name, o.*
      2  FROM gv$asm_operation o, v$asm_diskgroup dg
      3  WHERE o.group_number = dg.group_number;

    Proceed to the next step ONLY when the query against GV$ASM_OPERATION shows no rows for the disk group being altered.
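
    If you are scripting this wait, a simple polling loop can replace re-running the query by hand. The sketch below is illustrative only: it assumes the environment is set so that sqlplus can connect locally as SYSASM, and that RECOC1 is the disk group being resized.

    while true; do
      # Count outstanding Oracle ASM operations for the disk group being resized.
      n=$(echo "SELECT 'ROWS_LEFT' FROM gv\$asm_operation o, v\$asm_diskgroup dg
                WHERE o.group_number = dg.group_number AND dg.name = 'RECOC1';" \
          | sqlplus -s "/ as sysasm" | grep -c ROWS_LEFT)
      [ "$n" -eq 0 ] && break   # no rows returned: the rebalance has finished
      sleep 60                  # otherwise wait a minute and check again
    done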

  2. Verify the new size of the Oracle ASM disks using the following queries:
    SQL> SELECT name, total_mb, free_mb, total_mb - free_mb used_mb,
      2   ROUND(100*free_mb/total_mb,2) pct_free
      3  FROM v$asm_diskgroup
      4  ORDER BY 1;
    
    NAME                             TOTAL_MB    FREE_MB    USED_MB   PCT_FREE
    ------------------------------ ---------- ---------- ---------- ----------
    DATAC1                           68812800    9985076   58827724      14.51
    RECOC1                           47488896   35103336   12385560      73.92
    
    SQL> SELECT dg.name, d.total_mb, d.os_mb, COUNT(1) num_disks
      2  FROM v$asm_diskgroup dg, v$asm_disk d
      3  WHERE dg.group_number = d.group_number
      4  GROUP BY dg.name, d.total_mb, d.os_mb;
    
    NAME                             TOTAL_MB      OS_MB  NUM_DISKS
    ------------------------------ ---------- ---------- ----------
    DATAC1                             409600     409600        168
    RECOC1                             282672     565360        168

    The above query example shows that the disks in the RECOC1 disk group have been resized to 282672 MB each, and the total disk group size is 47488896 MB.

3.7.3 Shrink the Grid Disks in the Donor Disk Group

After shrinking the disks in the Oracle ASM disk group, you then shrink the size of the grid disks on each cell.

This task is a continuation of an example where space in the RECOC1 disk group is being reallocated to the DATAC1 disk group.
You must have first completed the task Shrink the Oracle ASM Disks in the Donor Disk Group.
  1. Shrink the grid disks associated with the RECO disk group on all cells down to the new, smaller size.

    For each storage cell identified in Determine the Amount of Available Space in Step 3, shrink the grid disks to match the size of the Oracle ASM disks that were shrunk in the previous task. Use commands similar to the following:

    dcli -c exa01celadm01 -l root "cellcli -e alter griddisk RECOC1_CD_00_exa01celadm01 \
    ,RECOC1_CD_01_exa01celadm01 \
    ,RECOC1_CD_02_exa01celadm01 \
    ,RECOC1_CD_03_exa01celadm01 \
    ,RECOC1_CD_04_exa01celadm01 \
    ,RECOC1_CD_05_exa01celadm01 \
    ,RECOC1_CD_06_exa01celadm01 \
    ,RECOC1_CD_07_exa01celadm01 \
    ,RECOC1_CD_08_exa01celadm01 \
    ,RECOC1_CD_09_exa01celadm01 \
    ,RECOC1_CD_10_exa01celadm01 \
    ,RECOC1_CD_11_exa01celadm01 \
    size=282672M "
    
    dcli -c exa01celadm02 -l root "cellcli -e alter griddisk RECOC1_CD_00_exa01celadm02 \
    ,RECOC1_CD_01_exa01celadm02 \
    ,RECOC1_CD_02_exa01celadm02 \
    ,RECOC1_CD_03_exa01celadm02 \
    ,RECOC1_CD_04_exa01celadm02 \
    ,RECOC1_CD_05_exa01celadm02 \
    ,RECOC1_CD_06_exa01celadm02 \
    ,RECOC1_CD_07_exa01celadm02 \
    ,RECOC1_CD_08_exa01celadm02 \
    ,RECOC1_CD_09_exa01celadm02 \
    ,RECOC1_CD_10_exa01celadm02 \
    ,RECOC1_CD_11_exa01celadm02 \
    size=282672M "
    
    ...
    
    dcli -c exa01celadm14 -l root "cellcli -e alter griddisk RECOC1_CD_00_exa01celadm14 \
    ,RECOC1_CD_01_exa01celadm14 \
    ,RECOC1_CD_02_exa01celadm14 \
    ,RECOC1_CD_03_exa01celadm14 \
    ,RECOC1_CD_04_exa01celadm14 \
    ,RECOC1_CD_05_exa01celadm14 \
    ,RECOC1_CD_06_exa01celadm14 \
    ,RECOC1_CD_07_exa01celadm14 \
    ,RECOC1_CD_08_exa01celadm14 \
    ,RECOC1_CD_09_exa01celadm14 \
    ,RECOC1_CD_10_exa01celadm14 \
    ,RECOC1_CD_11_exa01celadm14 \
    size=282672M "
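
    Rather than issuing one dcli command per cell by hand, you can generate the commands in a loop. This sketch assumes the cell host names follow the exa01celadm01 through exa01celadm14 pattern used in this example, and that each cell has grid disks CD_00 through CD_11:

    for n in $(seq -w 1 14); do
      cell=exa01celadm${n}
      # Build the comma-separated list of the 12 RECOC1 grid disks on this cell.
      disks=$(seq -f "RECOC1_CD_%02g_${cell}" 0 11 | paste -sd, -)
      dcli -c ${cell} -l root "cellcli -e alter griddisk ${disks} size=282672M"
    done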
  2. Verify the new size of the grid disks using the following command:
    [root@exa01adm01 tmp]# dcli -g cell_group -l root "cellcli -e list griddisk attributes name,size where name like \'RECOC1.*\' "
    
    exa01celadm01: RECOC1_CD_00_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_01_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_02_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_03_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_04_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_05_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_06_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_07_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_08_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_09_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_10_exa01celadm01 276.046875G
    exa01celadm01: RECOC1_CD_11_exa01celadm01 276.046875G  
    ...

    The above example shows that the disks in the RECOC1 disk group have been resized to a size of 282672 MB each (276.046875 * 1024).

3.7.4 Increase the Size of the Grid Disks Using Available Space

You can increase the size used by the grid disks if there is unallocated disk space either already available, or made available by shrinking the space used by a different Oracle ASM disk group.

This task is a continuation of an example where space in the RECOC1 disk group is being reallocated to the DATAC1 disk group. If you already have sufficient space to expand an existing disk group, then you do not need to reallocate space from a different disk group.

  1. Check that the cell disks have the expected amount of free space.
    After completing the tasks to shrink the Oracle ASM disks and the grid disks, you would expect to see the following free space on the cell disks:
    [root@exa01adm01 tmp]# dcli -g ~/cell_group -l root "cellcli -e list celldisk \
    attributes name,freespace"
    
    exa01celadm01: CD_00_exa01celadm01 276.0625G
    exa01celadm01: CD_01_exa01celadm01 276.0625G
    exa01celadm01: CD_02_exa01celadm01 276.0625G
    exa01celadm01: CD_03_exa01celadm01 276.0625G
    exa01celadm01: CD_04_exa01celadm01 276.0625G
    exa01celadm01: CD_05_exa01celadm01 276.0625G
    exa01celadm01: CD_06_exa01celadm01 276.0625G
    exa01celadm01: CD_07_exa01celadm01 276.0625G
    exa01celadm01: CD_08_exa01celadm01 276.0625G
    exa01celadm01: CD_09_exa01celadm01 276.0625G
    exa01celadm01: CD_10_exa01celadm01 276.0625G
    exa01celadm01: CD_11_exa01celadm01 276.0625G 
    ...
  2. For each storage cell, increase the size of the DATA grid disks to the desired new size.

    Use the size calculated in Determine the Amount of Available Space.

    dcli -c exa01celadm01 -l root "cellcli -e alter griddisk DATAC1_CD_00_exa01celadm01 \
    ,DATAC1_CD_01_exa01celadm01 \
    ,DATAC1_CD_02_exa01celadm01 \
    ,DATAC1_CD_03_exa01celadm01 \
    ,DATAC1_CD_04_exa01celadm01 \
    ,DATAC1_CD_05_exa01celadm01 \
    ,DATAC1_CD_06_exa01celadm01 \
    ,DATAC1_CD_07_exa01celadm01 \
    ,DATAC1_CD_08_exa01celadm01 \
    ,DATAC1_CD_09_exa01celadm01 \
    ,DATAC1_CD_10_exa01celadm01 \
    ,DATAC1_CD_11_exa01celadm01 \
    size=692288M "
    ...
    dcli -c exa01celadm14 -l root "cellcli -e alter griddisk DATAC1_CD_00_exa01celadm14 \
    ,DATAC1_CD_01_exa01celadm14 \
    ,DATAC1_CD_02_exa01celadm14 \
    ,DATAC1_CD_03_exa01celadm14 \
    ,DATAC1_CD_04_exa01celadm14 \
    ,DATAC1_CD_05_exa01celadm14 \
    ,DATAC1_CD_06_exa01celadm14 \
    ,DATAC1_CD_07_exa01celadm14 \
    ,DATAC1_CD_08_exa01celadm14 \
    ,DATAC1_CD_09_exa01celadm14 \
    ,DATAC1_CD_10_exa01celadm14 \
    ,DATAC1_CD_11_exa01celadm14 \
    size=692288M "
  3. Verify the new size of the grid disks associated with the DATAC1 disk group using the following command:
    dcli -g cell_group -l root "cellcli -e list griddisk attributes name,size where name like \'DATAC1.*\' "
    
    exa01celadm01: DATAC1_CD_00_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_01_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_02_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_03_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_04_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_05_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_06_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_07_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_08_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_09_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_10_exa01celadm01 676.0625G
    exa01celadm01: DATAC1_CD_11_exa01celadm01 676.0625G

Instead of increasing the size of the DATA disk group, you could instead create new disk groups with the new free space or keep it free for future use. In general, Oracle recommends using the smallest number of disk groups needed (typically DATA, RECO, and DBFS_DG) to give the greatest flexibility and ease of administration. However, there may be cases, perhaps when using virtual machines or consolidating many databases, where additional disk groups or available free space for future use may be desired.

If you decide to leave free space on the grid disks in reserve for future use, see My Oracle Support note 1684112.1 for the steps to allocate that free space to an existing disk group at a later time.

3.7.5 Increase the Size of the Oracle ASM Disks

You can increase the size used by the Oracle ASM disks after increasing the space allocated to the associated grid disks.

This task is a continuation of an example where space in the RECOC1 disk group is being reallocated to the DATAC1 disk group.
You must have completed the task of resizing the grid disks before you can resize the corresponding Oracle ASM disk group.
  1. Increase the Oracle ASM disks for DATAC1 disk group to the new size of the grid disks on the storage cells.
    SQL> ALTER DISKGROUP datac1 RESIZE ALL;

    This command resizes the Oracle ASM disks to match the size of the grid disks.

    Note:

    If the specified disk group has quorum disks configured within the disk group, then the ALTER DISKGROUP ... RESIZE ALL command could fail with error ORA-15277. Quorum disks are configured if the requirements specified in Oracle Exadata Database Machine Maintenance Guide are met.

    As a workaround, for regular storage server failure groups (FAILGROUP_TYPE=REGULAR, not QUORUM), you can specify the failure group names explicitly in the SQL command, for example:

    SQL> ALTER DISKGROUP datac1 RESIZE DISKS IN FAILGROUP exacell01, exacell02, exacell03;
  2. Wait for the rebalance operation to finish.
    SQL> set lines 250 pages 1000 
    SQL> col error_code form a10 
    SQL> SELECT dg.name, o.* FROM gv$asm_operation o, v$asm_diskgroup dg 
         WHERE o.group_number = dg.group_number;

    Do not continue to the next step until the query returns zero rows for the disk group that was altered.

  3. Verify that the Oracle ASM disks and disk groups are at the desired sizes.
    SQL> SELECT name, total_mb, free_mb, total_mb - free_mb used_mb, 
         ROUND(100*free_mb/total_mb,2) pct_free
         FROM v$asm_diskgroup
         ORDER BY 1;
    
    NAME                             TOTAL_MB    FREE_MB    USED_MB   PCT_FREE
    ------------------------------ ---------- ---------- ---------- ----------
    DATAC1                          116304384   57439796   58864588      49.39
    RECOC1                           47488896   34542516   12946380      72.74
    
    SQL>  SELECT dg.name, d.total_mb, d.os_mb, COUNT(1) num_disks
          FROM  v$asm_diskgroup dg, v$asm_disk d
          WHERE dg.group_number = d.group_number
          GROUP BY dg.name, d.total_mb, d.os_mb;
     
    NAME                             TOTAL_MB      OS_MB  NUM_DISKS
    ------------------------------ ---------- ---------- ----------
    DATAC1                             692288     692288        168
    RECOC1                             282672     282672        168
    
    

    The results of the queries show that the RECOC1 and DATAC1 disk groups and disks have been resized.

3.8 Using the Oracle Exadata System Software Rescue Procedure

In the rare event that the system disks fail simultaneously, you must use the Oracle Exadata Storage Server rescue procedure to recover the system.

3.8.1 About the Oracle Exadata System Software Rescue Procedure

The rescue procedure is necessary when system disks fail, the operating system has a corrupt file system, or there was damage to the boot area.

If only one system disk fails, then use CellCLI commands to recover.

If you are using normal redundancy, then there is only one mirror copy for the cell being rescued. The data may be irrecoverably lost if that single mirror also fails during the rescue procedure. Oracle recommends that you take a complete backup of the data on the mirror copy, and immediately take the mirror copy cell offline to prevent any new data changes to it before attempting a rescue. This ensures that all data residing on the grid disks on the failed cell and its mirror copy is inaccessible during the rescue procedure.

The Oracle Automatic Storage Management (Oracle ASM) disk repair timer has a default repair time of 3.6 hours. If you know that you cannot perform the rescue procedure within that time frame, then you should use the Oracle ASM rebalance procedure to rebalance the disks until you can perform the rescue procedure.
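
You can check the current repair timer for a disk group, and extend it before starting, with standard Oracle ASM attribute commands. The statements below are a sketch; the 8.5h value is only illustrative:

    SQL> SELECT dg.name, a.value disk_repair_time
      2  FROM v$asm_attribute a, v$asm_diskgroup dg
      3  WHERE a.group_number = dg.group_number
      4  AND a.name = 'disk_repair_time';

    SQL> ALTER DISKGROUP datac1 SET ATTRIBUTE 'disk_repair_time' = '8.5h';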

When using high redundancy disk groups, where there is more than one mirror copy in Oracle ASM for all the grid disks of the failed cell, take the failed cell offline. Oracle ASM automatically drops the grid disks on the failed cell after the configured Oracle ASM timeout, and starts rebalancing data using mirror copies. The default timeout is two hours. If the cell rescue takes more than two hours, then you must re-create the grid disks on the rescued cell in Oracle ASM.

Caution:

Use the rescue procedure with extreme caution. Incorrectly using the procedure can cause data loss.

It is important to note the following when using the rescue procedure:

  • The rescue procedure can potentially rewrite some or all of the disks in the cell. If this happens, then you can lose all the content on those disks without possibility of recovery.

    Use extreme caution when using this procedure, and pay attention to the prompts. Ideally, you should use the rescue procedure only with assistance from Oracle Support Services, and when you have decided that you can afford the loss of data on some or all of the disks.

  • The rescue procedure does not destroy the contents of the data disks or the contents of the data partitions on the system disks unless you explicitly choose to do so during the rescue procedure.

  • Starting in Oracle Exadata System Software release 11.2, the rescue procedure restores the Oracle Exadata System Software to the same release. This includes any patches that existed on the cell as of the last successful boot. Note the following about using the rescue procedure:

    • Cell configuration information, such as alert configurations, SMTP information, administrator e-mail address, and so on is not restored.

    • The network configuration that existed at the end of last successful run of /usr/local/bin/ipconf utility is restored.

    • The SSH identities for the cell, and the root, celladmin and cellmonitor users are restored.

    • Integrated Lights Out Manager (ILOM) configurations for Oracle Exadata Storage Servers are not restored. Typically, ILOM configurations remain undamaged even in case of Oracle Exadata System Software failures.

  • The rescue procedure does not examine or reconstruct data disks or data partitions on the system disks. If there is data corruption on the grid disks, then do not use the rescue procedure. Instead use the rescue procedure for Oracle Database and Oracle ASM.

After a successful rescue, you must reconfigure the cell, and if you had chosen to preserve the data, then import the cell disks. If you chose not to preserve the data, then you should create new cell disks, and grid disks.

3.8.2 Performing the Rescue Procedure

You can use the rescue procedure to recover the Oracle Exadata Storage Server system software.

  1. Connect to the Oracle Exadata Storage Server using the console.
  2. Start the Oracle Exadata Storage Server and select the option to boot into rescue mode.
    • On X7 and newer servers, during the initial boot sequence you will see something like the following menu of boot options:

            Exadata_DBM_0: CELL_BOOT_trying_HD0
            Exadata_DBM_0: CELL_BOOT_trying_CELLBOOT
            Exadata_DBM_1: CELL_BOOT_in_rescue_mode
      
            Use the ^ and v keys to change the selection.
            Press 'e' to edit the selected item, or 'c' for a command prompt. 

      Note:

      The above menu appears for only a short time if there are no inputs. Therefore, to retain the menu, press the up-arrow or down-arrow key immediately when it appears.

      In the menu, select Exadata_DBM_1: CELL_BOOT_in_rescue_mode, and then press Enter.

    • On X6 and older servers, during the initial boot sequence, you will see something like the following:

      Press any key to enter the menu
      Booting Exadata_DBM_0: CELL_USB_BOOT_trying_C0D0_as_HD1 in 4 seconds...
      Booting Exadata_DBM_0: CELL_USB_BOOT_trying_C0D0_as_HD1 in 3 seconds...
      Press any key to see the menu.

      When you see the above, press any key to enter the boot options menu.

      Note:

      For older versions of Oracle Exadata System Software, you may see the "Oracle Exadata" splash screen. If the splash screen appears, press any key on the keyboard. The splash screen remains visible for only 5 seconds.

      In the boot options menu, select CELL_USB_BOOT_CELLBOOT_usb_in_rescue_mode, and then press Enter.

  3. When prompted, select the option to reinstall the Oracle Exadata System Software. Then, confirm your selection.

    For example:

             Choose from the following by typing letter in '()': 
               (e)nter interactive diagnostics shell. 
                 Use diagnostics shell password to login as root user 
                 (reboot or power cycle to exit the shell), 
               (r)einstall or try to recover damaged system, 
    Select: r 
    [INFO     ] Reinstall or try to recover damaged system
    Continue (y/n) [n]: y 
  4. If prompted, specify the rescue root password.
    If you do not have the required password, then contact Oracle Support Services.
  5. When prompted, specify whether you want to erase the data partitions and data disks.

    Specify n to preserve existing data on the storage server.

    If you specify y, you will permanently erase all of the data on the storage server. Do not specify this option unless you are sure that it is safe.

    For example:

    Do you want to erase data partitions and data disks (y/n)  [n]: n
  6. When prompted, specify the root password to enter the rescue shell.

    If you do not have the required password, then contact Oracle Support Services.

    For example:

    [INFO     ] You are in the rescue mode.
    [INFO     ] Imaging pre-boot phase finished with success.
    [INFO     ] Installation will continue after reboot.
    [INFO     ] Log in to the rescue shell as root with the rescue (Diagnostic shell) root password.
    
    ...
    
    Welcome to Exadata Shell!
    Give root password for maintenance
    (or press Control-D to continue):
  7. Using the rescue prompt, reboot the storage server to complete the rescue process.

    For example:

    sh-4.2# shutdown -r now

    The rescue process typically takes between 45 and 90 minutes to complete. The storage server may reboot a few times during the rescue process. An on-screen message indicates when the rescue process is completed. For example:

    Run validation checkconfigs - PASSED
    2020-08-17 18:14:01 -0600 The first boot completed with SUCCESS

    Finally, the login prompt is also displayed.

3.8.3 Configuring Oracle Exadata Storage Servers After Rescue

After a successful rescue, you must configure the cell. If the data partitions were preserved, then the cell disks were imported automatically during the rescue procedure.

  1. Re-create the cell disks and grid disks for any disks that were replaced during the rescue procedure.
    1. Create the cell disks on only the replaced disks using the following command:
      # cellcli -e create celldisk all harddisk
    2. Get the names of the new cell disks that were created.
    3. Get the mapping of the grid disks.
      cellcli -e list griddisk attributes name,offset,size

      Get the grid disk attributes from an existing disk. If you replaced a system disk (slot 0 or slot 1 on X6 and earlier servers), then you need to retrieve the values from the other system disk. If any of the grid disks are SPARSE grid disks, then also get the virtualsize attribute from another sparse disk.

      For example, if the new grid disks are CD_01* and CD_08*, then you would use a command such as this:

      # cellcli -e list griddisk attributes name,cachingpolicy,offset,size,virtualsize |
      egrep '_CD_00|_CD_07'
      DATAC1_CD_00_dbm01celadm04    default    32M                  779G
      DATAC1_CD_07_dbm01celadm04    default    32M                  779G
      DBFSC1_CD_07_dbm01celadm04    default    1.0575714111328125T  33.6875G
      RECOC1_CD_00_dbm01celadm04    none       887.046875G          195.90625G
      RECOC1_CD_07_dbm01celadm04    none       887.046875G          195.90625G
      SPARSEC1_CD_00_dbm01celadm04  default    779.046875G          108G         1.0546875T
      SPARSEC1_CD_07_dbm01celadm04  default    779.046875G          108G         1.0546875T
    4. Create the grid disks on the new cell disks using the retrieved attributes.

      For example, using the attributes retrieved in the previous step for CD_00, you would create the grid disks on CD_01 as follows:

      # cellcli -e create griddisk DATAC1_CD_01_dbm01celadm04 celldisk=CD_01_dbm01celadm04, 
      size=779G, cachingpolicy=default
      
      # cellcli -e create griddisk SPARSEC1_CD_01_dbm01celadm04 celldisk=CD_01_dbm01celadm04, 
      size=108G, virtualsize=1.0546875T,cachingpolicy=default
      
      # cellcli -e create griddisk RECOC1_CD_01_dbm01celadm04 celldisk=CD_01_dbm01celadm04 , 
      size=195.90625G, cachingpolicy=none

      Using the attributes retrieved in the previous step for CD_07, you would create the grid disks on CD_08 as follows:

      # cellcli -e create griddisk DATAC1_CD_08_dbm01celadm04 celldisk=CD_08_dbm01celadm04, 
      size=779G, cachingpolicy=default
      
      # cellcli -e create griddisk SPARSEC1_CD_08_dbm01celadm04 celldisk=CD_08_dbm01celadm04, 
      size=108G, virtualsize=1.0546875T,cachingpolicy=default
      
      # cellcli -e create griddisk RECOC1_CD_08_dbm01celadm04 celldisk=CD_08_dbm01celadm04, 
      size=195.90625G, cachingpolicy=none
      
      # cellcli -e create griddisk DBFSC1_CD_08_dbm01celadm04 celldisk=CD_08_dbm01celadm04, 
      size=33.6875G, cachingpolicy=default
  2. Check the status of the grid disks.
    If any grid disk is inactive, alter its status to active.
    CellCLI> ALTER GRIDDISK ALL ACTIVE
  3. Log in to the Oracle Automatic Storage Management (Oracle ASM) instance, and set the disks to ONLINE for each disk group:
    SQL> ALTER DISKGROUP disk_group_name ONLINE DISKS IN FAILGROUP cell_name WAIT;

    Note:

    • If the command fails because the disks were already force-dropped, then you need to force-add the disks back to the Oracle ASM disk groups.

    • The grid disk attributes asmmodestatus and asmdeactivationoutcome will not report correctly until the ALTER DISKGROUP statement is complete.

  4. Reconfigure the cell using the ALTER CELL command.

    In the following example, e-mail notification is configured to send notification messages to the storage server administrator according to the specified notification policy:

    CellCLI> ALTER CELL                                     -
               mailServer='mail_relay.example.com',            -
               smtpFromAddr='john.doe@example.com',         -
               smtpToAddr='jane.smith@example.com',         -
               notificationPolicy='critical,warning,clear', -
               notificationMethod='mail,snmp'
  5. Re-create the I/O Resource Management (IORM) plan.
  6. Re-create the metric thresholds.
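
    For example, a metric threshold that existed before the rescue must be defined again with CellCLI. The metric name and values below are placeholders for your own pre-rescue settings, not recommendations:

    CellCLI> CREATE THRESHOLD db_io_rq_sm_sec.db1 comparison='>', warning=100, -
               critical=200, occurrences=2, observation=5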

3.8.4 Configuring Oracle Exadata Database Machine Eighth Rack Storage Servers After Rescue

For storage servers that are part of an Eighth Rack system, after a successful rescue, you must configure the cell using these steps.

In Oracle Exadata System Software release 11.2.3.3 and later, no extra steps are needed after cell rescue. For earlier releases, perform the following steps:

  1. Copy the /opt/oracle.SupportTools/resourcecontrol utility from another storage server to the /opt/oracle.SupportTools directory on the recovered server.
  2. Ensure proper permissions are set on the utility.
    # chmod 740 /opt/oracle.SupportTools/resourcecontrol
    
  3. Verify the current configuration.
    # /opt/oracle.SupportTools/resourcecontrol -show
    
    Validated hardware and OS. Proceed.
    Number of cores active: 6
    Number of harddisks active: 6
    Number of flashdisks active: 8
    

    For an Eighth Rack configuration, the output depends on the hardware model:

    • X3 storage server: 6 active CPU cores, 6 hard disks, and 8 flash disks should be enabled
    • X4 storage server: 6 active CPU cores, 6 hard disks, and 8 flash disks should be enabled
    • X5 HC storage server: 8 active CPU cores, 6 hard disks, and 2 flash disks should be enabled
    • X5 EF storage server: 8 active CPU cores and 4 flash disks should be enabled
    • X6 HC storage server: 10 active CPU cores, 6 hard disks, and 2 flash disks should be enabled
    • X6 EF storage server: 10 active CPU cores and 4 flash disks should be enabled
    • X7 HC storage server: 10 active CPU cores, 6 hard disks, and 2 flash disks should be enabled
    • X7 EF storage server: 10 active CPU cores and 4 flash disks should be enabled
    • X8 HC storage server: 16 active CPU cores, 6 hard disks, and 2 flash disks should be enabled
    • X8 EF storage server: 16 active CPU cores and 4 flash disks should be enabled
  4. If the configuration shows all the cores and disks enabled, then enable the Eighth Rack configuration.
    CellCLI> ALTER CELL eighthRack=true

3.8.5 Re-creating a Damaged CELLBOOT USB Flash Drive

If the CELLBOOT USB flash drive is lost or damaged, then you can create a new one using the following procedure:

Note:

Creating a USB flash drive for a machine running Oracle Exadata Storage Server Software release 12.1.2.1.0 or later requires a machine running Oracle Linux 6.

  1. Log in to the cell as the root user.
  2. On X6 and older servers only, which do not contain M.2 system devices:
    1. Attach a new USB flash drive.
      This flash drive should have a capacity of at least 1 GB, and up to 8 GB.
    2. Remove any other USB flash drives from the system.
  3. Run the following commands:
    # cd /opt/oracle.SupportTools
    # ./make_cellboot_usb -rebuild -verbose

3.9 Changing Existing Elastic Configurations for Storage Cells

You can modify the capacity of your Oracle Exadata using elastic configuration.

3.9.1 Adding a Cell Node

In this scenario, you want to add a new storage server (or cell) to an existing Oracle Exadata that includes disk groups.

  1. If this is a brand new storage server, perform these steps:
    1. Complete all necessary cabling requirements to make the new storage server available to the desired storage grid.

      Refer to the Oracle Exadata Database Machine Installation and Configuration Guide.

    2. Image the storage server with the appropriate Oracle Exadata System Software image and provide appropriate input when prompted for the IP addresses.
  2. If this is an existing storage server in the rack and you are allocating it to another cluster within the RDMA Network Fabric network, note the IP addresses assigned to the RDMA Network Fabric interfaces (such as ib0 and ib1 or re0 and re1) of the storage server being added.

    Add the IP addresses to the /etc/oracle/cell/network-config/cellip.ora file on every Oracle RAC node.

    1. cd /etc/oracle/cell/network-config
    2. cp cellip.ora cellip.ora.orig
    3. cp cellip.ora cellip.ora-bak
    4. Add the new entries to /etc/oracle/cell/network-config/cellip.ora-bak.
    5. Copy the edited file to the cellip.ora file on all database nodes using the following command, where database_nodes refers to a file containing the names of each database server in the cluster, with each name on a separate line:
      /usr/local/bin/dcli -g database_nodes -l root -f cellip.ora-bak -d
       /etc/oracle/cell/network-config/cellip.ora
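
    For reference, each line in cellip.ora describes one cell. The entries you add in sub-step 4 have the following format, shown here with placeholder IP addresses (a cell with two RDMA Network Fabric ports lists both addresses separated by a semicolon):

      cell="192.168.10.3"
      cell="192.168.10.5;192.168.10.6"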
  3. If Oracle Auto Service Request (ASR) alerting was set up on the existing storage servers, configure cell Oracle ASR alerting for the storage server being added.
    1. From any existing storage server, list the cell snmpsubscriber attribute.
      cellcli -e LIST CELL ATTRIBUTES snmpsubscriber
      
    2. Apply the same snmpsubscriber attribute value to the new storage server by running the command below as the celladmin user, replacing snmpsubscriber with the value from the previous command.
      cellcli -e "ALTER CELL snmpsubscriber=snmpsubscriber"
      
    3. From any existing storage server, list the cell attributes required for configuring cell alerting.
      cellcli -e LIST CELL ATTRIBUTES
       notificationMethod,notificationPolicy,mailServer,smtpToAddr,smtpFrom,
       smtpFromAddr,smtpUseSSL,smtpPort
      
    4. Apply the same values to the new storage server by running the command below as the celladmin user, substituting the placeholders with the values found from the existing storage server.
      cellcli -e "ALTER CELL
       notificationMethod='notificationMethod',
       notificationPolicy='notificationPolicy',
       mailServer='mailServer',
       smtpToAddr='smtpToAddr',
       smtpFrom='smtpFrom',
       smtpFromAddr='smtpFromAddr',
       smtpUseSSL=smtpUseSSL,
       smtpPort=smtpPort"
      
  4. If needed, create cell disks on the cell being added.
    1. Log in to the cell as celladmin and list the cell disks.
      cellcli -e LIST CELLDISK
      
    2. If the cell disks are not present on the media types, then create the cell disks.
      cellcli -e CREATE CELLDISK ALL
    3. If your system has PMEM devices, then check that the PMEM log was created by default.
      cellcli -e LIST PMEMLOG

      You should see the name of the PMEM log. It should look like cellnodename_PMEMLOG, and its status should be normal.

      If the PMEM log does not exist, create it.

      cellcli -e CREATE PMEMLOG ALL
    4. Check that the flash log was created by default.
      cellcli -e LIST FLASHLOG

      You should see the name of the flash log. It should look like cellnodename_FLASHLOG, and its status should be normal.

      If the flash log does not exist, create it.

      cellcli -e CREATE FLASHLOG ALL
      
    5. If the system contains PMEM devices, then check the current PMEM cache mode and compare it to the PMEM cache mode on existing cells.
      cellcli -e LIST CELL ATTRIBUTES pmemcachemode

      Note:

      Commencing with Oracle Exadata System Software release 23.1.0, PMEM cache only operates in write-through mode.

      If the PMEM cache mode on the new cell does not match the existing cells, change the PMEM cache mode as follows:

      1. If the PMEM cache exists and the cell is in WriteBack PMEM cache mode, you must first flush the PMEM cache.

        cellcli -e ALTER PMEMCACHE ALL FLUSH

        Wait for the command to return.

        If the PMEM cache mode is WriteThrough, then you do not need to flush the cache first.

      2. Drop the PMEM cache.

        cellcli -e DROP PMEMCACHE ALL
      3. Change the PMEM cache mode.

        The value of the pmemCacheMode attribute is either writeback or writethrough. The value has to match the PMEM cache mode of the other storage cells in the cluster.

        cellcli -e "ALTER CELL PMEMCacheMode=writeback_or_writethrough"
      4. Re-create the PMEM cache.

        cellcli -e CREATE PMEMCACHE ALL
    6. Check the current flash cache mode and compare it to the flash cache mode on existing cells.
      cellcli -e LIST CELL ATTRIBUTES flashcachemode

      If the flash cache mode on the new cell does not match the existing cells, change the flash cache mode as follows:

      1. If the flash cache exists and the cell is in WriteBack flash cache mode, you must first flush the flash cache.

        cellcli -e ALTER FLASHCACHE ALL FLUSH

        Wait for the command to return.

      2. Drop the flash cache.

        cellcli -e DROP FLASHCACHE ALL
      3. Change the flash cache mode.

        The value of the flashCacheMode attribute is either writeback or writethrough. The value has to match the flash cache mode of the other storage cells in the cluster.

        cellcli -e "ALTER CELL flashCacheMode=writeback_or_writethrough"
      4. Create the flash cache.

        cellcli -e CREATE FLASHCACHE ALL
  5. Create the grid disks on the cell being added.
    1. Query the size and cachingpolicy attributes of the existing grid disks from an existing cell.
      CellCLI> LIST GRIDDISK ATTRIBUTES name,asmDiskGroupName,cachingpolicy,size,offset
    2. For each disk group found by the above command, create grid disks on the new cell that is being added to the cluster.

      Match the size and the cachingpolicy attributes of the existing grid disks for the particular disk group reported by the command above. Grid disks should be created in the order of increasing offset to ensure similar layout and performance characteristics as the existing cells. For example, the LIST GRIDDISK command could return something like this:

      DATAC1          default         2.15625T        32M
      DBFS_DG         default         33.796875G      2.695465087890625T
      RECOC1          none            552.109375G     2.1562957763671875T
      

      When creating grid disks, begin with DATAC1, then RECOC1, and finally DBFS_DG using the following command:

      cellcli -e CREATE GRIDDISK ALL HARDDISK
       PREFIX=matching_prefix_of_the_corresponding_existing_diskgroup,
       size=size_followed_by_G_or_T,
       cachingPolicy=\'value_from_command_above_for_this_disk_group\',
       comment =\"Cluster cluster_name diskgroup diskgroup_name\"
      

      Caution:

      Be sure to specify the EXACT size shown, along with the unit (either T or G).
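
      For instance, using the sample LIST GRIDDISK output above, you would create the DATAC1 grid disks on the new cell first (lowest offset) with a command like the following; the cluster name cluster1 is only a placeholder:

      cellcli -e CREATE GRIDDISK ALL HARDDISK PREFIX=DATAC1,
       size=2.15625T, cachingPolicy=\'default\',
       comment=\"Cluster cluster1 diskgroup DATAC1\"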
  6. Log in to each Oracle RAC node and verify that the newly created grid disks are visible from the Oracle RAC nodes.

    In the following example, Grid_home refers to the directory in which the Oracle Grid Infrastructure software is installed.

    $Grid_home/bin/kfod op=disks disks=all | grep cellName_being_added

    The kfod command should list all the grid disks created in step 5 above.

  7. Add the newly created grid disks to the respective existing Oracle ASM disk groups.

    In this example, comma_separated_disk_names refers to the disk names from step 5 corresponding to disk_group_name.

    SQL> ALTER DISKGROUP disk_group_name ADD DISK 'comma_separated_disk_names';

    This command kicks off an Oracle ASM rebalance at the default power level.

  8. Monitor the progress of the rebalance by querying GV$ASM_OPERATION.
    SQL> SELECT * FROM GV$ASM_OPERATION;

    When the rebalance completes, the addition of the cell to the Oracle RAC cluster is complete.

  9. Download and run the latest version of Oracle EXAchk to ensure that the resulting configuration implements the latest best practices for Oracle Exadata.

3.9.2 Adding a New Storage Server to an Eighth Rack Cluster

Perform the following steps to add a new Oracle Exadata X7 or later storage server to an existing Oracle Exadata X7 or later Eighth Rack.

  1. If configured, drop the PMEM Cache and PMEM log.
    cellcli -e drop pmemcache all
    cellcli -e drop pmemlog all
  2. On the new storage server, drop the flash cache, flash log and cell disks.
    cellcli -e drop flashcache all
    cellcli -e drop flashlog all
    cellcli -e drop celldisk all
  3. On the new storage server, enable the eighthrack attribute.
    cellcli -e alter cell eighthRack=true
    
  4. On the new storage server, create the cell disks.
    cellcli -e create celldisk all
    
  5. On the new storage server, create the flash log.
    cellcli -e create flashlog all
    
  6. If applicable, on the new storage server, create the PMEM log.
    cellcli -e create pmemlog all
    
  7. On any of the existing storage servers, retrieve the value of the cell attribute flashcachemode.
    cellcli -e list cell attributes flashcachemode

    The flashcachemode attribute on the new storage server is set to WriteThrough by default. All storage servers should have the same flashcachemode attribute setting.

    If the existing storage servers are using WriteBack mode, then you should change the attribute flashcachemode on the new storage server, as shown here:

    cellcli -e alter cell flashcachemode=writeback
    
  8. On the new storage server, create the flash cache.
    cellcli -e create flashcache all
    
  9. If the storage servers use PMEM cache, then retrieve the value of the cell attribute pmemcachemode.
    cellcli -e list cell attributes pmemcachemode

    The pmemcachemode attribute on the new storage server is set to WriteThrough by default. All storage servers should have the same pmemcachemode attribute setting.

    If the existing storage servers are using WriteBack mode, then you should change the attribute pmemcachemode on the new storage server, as shown here:

    cellcli -e alter cell pmemcachemode=writeback
    
  10. If the storage servers use PMEM cache, then, on the new storage server, create the PMEM cache.
    cellcli -e create pmemcache all
    
  11. On any of the existing storage servers, obtain information on the grid disk configuration.
    cellcli -e list griddisk attributes name,offset,size,cachingpolicy
    
  12. On the new storage server, create the grid disks (repeat for each set of grid disks to match the configuration of the existing storage servers).

    In the following command, replace the italicized text with the corresponding values obtained in step 11.

    cellcli -e CREATE GRIDDISK ALL HARDDISK PREFIX=matching_prefix_of_the_
    corresponding_existing_diskgroup, size=size_followed_by_G_or_T, 
    cachingPolicy=\'value_from_command_above_for_this_disk_group\', 
    comment =\"Cluster cluster_name diskgroup diskgroup_name\"
    
  13. On the new storage server, validate that the grid disks have the same configuration as the grid disks on the existing storage servers (by comparing with the information obtained in step 11).
    cellcli -e list griddisk attributes name,offset,size,cachingpolicy
    
  14. (X2 to X8 servers only) If the environment has partition keys (pkeys) implemented, configure pkeys for the RDMA Network Fabric interfaces. Refer to step 6 from Implementing InfiniBand Partitioning across OVM RAC clusters on Exadata (My Oracle Support Doc ID 2075398.1) for this task.
  15. On the new storage server, identify the IP address for both ports for either InfiniBand Network Fabric or RoCE Network Fabric.
    cellcli -e list cell attributes name,ipaddress1,ipaddress2
    
  16. Add the IP addresses from step 15 to the /etc/oracle/cell/network-config/cellip.ora file on every database server.

    Perform these steps on any database server in the cluster:

    1. cd /etc/oracle/cell/network-config
    2. cp cellip.ora cellip.ora.orig
    3. cp cellip.ora cellip.ora-bak
    4. Add the new entries to /etc/oracle/cell/network-config/cellip.ora-bak.
    5. Copy the edited file to the cellip.ora file on all database servers using the following command, where database_nodes refers to a file containing the names of each database server in the cluster, with each name on a separate line:
      /usr/local/bin/dcli -g database_nodes -l root -f cellip.ora-bak -d /etc/oracle/cell/network-config/cellip.ora
  17. Connect to any of the Oracle ASM instances and ensure the grid disks from the new storage server are discoverable.
    SQL> set pagesize 30
    SQL> set linesize 132
    SQL> col path format a70
    SQL> SELECT inst_id,path FROM gv$asm_disk WHERE header_status='CANDIDATE' 
      2> ORDER BY inst_id,path;
    
    INST_ID    PATH
    ---------- ----------------------------------------------------------------------
             1 o/192.168.17.235;192.168.17.236/DATAC1_CD_00_celadm11
             1 o/192.168.17.235;192.168.17.236/DATAC1_CD_01_celadm11
             1 o/192.168.17.235;192.168.17.236/DATAC1_CD_02_celadm11
             1 o/192.168.17.235;192.168.17.236/DATAC1_CD_03_celadm11
             1 o/192.168.17.235;192.168.17.236/DATAC1_CD_04_celadm11
             1 o/192.168.17.235;192.168.17.236/DATAC1_CD_05_celadm11
             1 o/192.168.17.235;192.168.17.236/RECOC1_CD_00_celadm11
             1 o/192.168.17.235;192.168.17.236/RECOC1_CD_01_celadm11
             1 o/192.168.17.235;192.168.17.236/RECOC1_CD_02_celadm11
             1 o/192.168.17.235;192.168.17.236/RECOC1_CD_03_celadm11
             1 o/192.168.17.235;192.168.17.236/RECOC1_CD_04_celadm11
             1 o/192.168.17.235;192.168.17.236/RECOC1_CD_05_celadm11
             2 o/192.168.17.235;192.168.17.236/DATAC1_CD_00_celadm11
             2 o/192.168.17.235;192.168.17.236/DATAC1_CD_01_celadm11
             2 o/192.168.17.235;192.168.17.236/DATAC1_CD_02_celadm11
             2 o/192.168.17.235;192.168.17.236/DATAC1_CD_03_celadm11
             2 o/192.168.17.235;192.168.17.236/DATAC1_CD_04_celadm11
             2 o/192.168.17.235;192.168.17.236/DATAC1_CD_05_celadm11
             2 o/192.168.17.235;192.168.17.236/RECOC1_CD_00_celadm11
             2 o/192.168.17.235;192.168.17.236/RECOC1_CD_01_celadm11
             2 o/192.168.17.235;192.168.17.236/RECOC1_CD_02_celadm11
             2 o/192.168.17.235;192.168.17.236/RECOC1_CD_03_celadm11
             2 o/192.168.17.235;192.168.17.236/RECOC1_CD_04_celadm11
             2 o/192.168.17.235;192.168.17.236/RECOC1_CD_05_celadm11
  18. Connect to one of the Oracle ASM instances and add the new disks to the existing disk groups.
    SQL> ALTER DISKGROUP datac1 ADD DISK 'o/192.168.17.235;192.168.17.236/DATAC1*';
    SQL> ALTER DISKGROUP recoc1 ADD DISK 'o/192.168.17.235;192.168.17.236/RECOC1*';
    

    Note:

    The rebalance operation triggered by adding the disks runs at the default Oracle Maximum Availability Architecture (MAA) best-practice power level, which should be 4. If application service-level performance is not a concern, then consider increasing the power level for a faster rebalance.
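
    For example, to speed up the rebalance of the DATAC1 disk group, you could raise its power level and then monitor progress; this is a sketch, and the value 32 is only an illustrative choice:

    SQL> ALTER DISKGROUP datac1 REBALANCE POWER 32;
    SQL> SELECT inst_id, operation, state, power, est_minutes FROM gv$asm_operation;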
  19. Obtain a report of the number of disks per failure group. Expect 6 disks per failure group for High Capacity (HC) Storage Servers and 4 disks per failure group for Extreme Flash (EF) Storage Servers.
    SQL> SELECT d.group_number,dg.name,failgroup,mode_status,COUNT(*)
      2> FROM v$asm_disk d,v$asm_diskgroup dg
      3> WHERE d.group_number=dg.group_number
      4> AND failgroup_type='REGULAR'
      5> GROUP BY d.group_number,dg.name,failgroup,mode_status;
    
    GROUP_NUMBER NAME                FAILGROUP            MODE_ST COUNT(*)
    ------------ ------------------- -------------------- ------- --------
               1 DATAC1              CELADM08             ONLINE         6
               1 DATAC1              CELADM09             ONLINE         6
               1 DATAC1              CELADM10             ONLINE         6
               1 DATAC1              CELADM11             ONLINE         6
               2 RECOC1              CELADM08             ONLINE         6
               2 RECOC1              CELADM09             ONLINE         6
               2 RECOC1              CELADM10             ONLINE         6
               2 RECOC1              CELADM11             ONLINE         6
  20. If Oracle Auto Service Request (ASR) alerting was set up on the existing storage servers, configure cell Oracle ASR alerting for the storage server being added.
    1. From any existing storage server, list the cell snmpsubscriber attribute.
      cellcli -e LIST CELL ATTRIBUTES snmpsubscriber
      
    2. Apply the same snmpsubscriber attribute value to the new storage server by running the command below as the celladmin user, replacing snmpsubscriber with the value from the previous command.
      cellcli -e "ALTER CELL snmpsubscriber=snmpsubscriber"
      
    3. From any existing storage server, list the cell attributes required for configuring cell alerting.
      cellcli -e LIST CELL ATTRIBUTES
       notificationMethod,notificationPolicy,mailServer,smtpToAddr,smtpFrom,
       smtpFromAddr,smtpUseSSL,smtpPort
      
    4. Apply the same values to the new storage server by running the command below as the celladmin user, substituting the placeholders with the values found from the existing storage server.
      cellcli -e "ALTER CELL
       notificationMethod='notificationMethod',
       notificationPolicy='notificationPolicy',
       mailServer='mailServer',
       smtpToAddr='smtpToAddr',
       smtpFrom='smtpFrom',
       smtpFromAddr='smtpFromAddr',
       smtpUseSSL=smtpUseSSL,
       smtpPort=smtpPort"
      

3.9.3 Adding Storage Cells using OEDACLI

OEDACLI provides the interface to perform elastic storage expansion for different configurations, such as Bare Metal, single Oracle VM, or multiple Oracle VMs.

OEDACLI creates all the objects required on the storage servers and adds the new grid disks to the disk groups. One of the existing storage servers acts as a guide for the configuration of the new storage servers. There is only one rebalance operation triggered even if multiple storage servers are added. The grid disks of the new servers are added to all the disk groups for all the clusters configured.

Prerequisites

  • All the new storage servers must be installed in and connected into the physical rack.
  • All the new storage servers must have the management and RDMA Network Fabric networks configured.
  • OEDACLI must run on a machine that has network access to the database servers (bare metal or guest domains) and the storage servers.
  1. Generate a new OEDA XML configuration file, reflecting an exact list of the current database servers and storage servers used in the environment.
    Use the OEDA DISCOVER command, where dirname is usually the directory where OEDACLI is installed, and host_names_list is the list of nodes to be discovered, separated by commas or spaces, for example, 'dbnode1,dbnode2,celadm01,celadm02'.
    DISCOVER ES hostnames='host_names_list' location='dirname'
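
    For example, with the hypothetical host names above and an installation directory of /u01/oeda, the command would be:

    DISCOVER ES hostnames='dbnode1,dbnode2,celadm01,celadm02' location='/u01/oeda'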

    For an environment with multiple Oracle VMs, the command generates a global XML configuration file containing information of all clusters, and also one XML configuration file for each cluster. In the following commands, you should use the global XML configuration file instead of a cluster-specific configuration file.

  2. Create an OEDACLI script to update the XML configuration file using OEDACLI.

    For each cell to be added, the script requires:

    • The name (hostname) of one cell that is part of the current configuration, to be used as a reference for the creation of the objects (cell disks, grid disks, flash cache, and so on)
    • The name (hostname) and IP addresses for the Management and RDMA Network Fabric interfaces.
    • The rack number. For non-interconnected environments this is 1.
    • ULOC, the unit location in the physical rack. Although it is not used for the storage expansion, choose the value according to the information in Oracle Exadata Database Machine System Overview, Part 2 - Cabling Diagrams.

    Save the following commands in a file named add_cell_script.cmd. In this example, two new storage servers are being added: celadm04 and celadm05.

    CLONE NEWCELL SRCNAME=celadm01 tgtname=celadm04
    SET ADMINNET NAME=celadm04, IP=203.0.113.35
    SET PRIVNET NAME1=celadm04-priv1, IP1=192.168.216.235, NAME2=celadm04-priv2, IP2=192.168.216.236
    SET ILOMNET NAME=celadm04-ilom, IP=203.0.113.135
    SET RACK NUM=1, ULOC=39
    SAVE ACTION FORCE
    CLONE NEWCELL SRCNAME=celadm01 tgtname=celadm05
    SET ADMINNET NAME=celadm05, IP=203.0.113.36
    SET PRIVNET NAME1=celadm05-priv1, IP1=192.168.216.221, NAME2=celadm05-priv2, IP2=192.168.216.222
    SET ILOMNET NAME=celadm05-ilom, IP=203.0.113.136
    SET RACK NUM=1, ULOC=14
    SAVE ACTION FORCE
    SAVE FILE
  3. Run the script add_cell_script.cmd.

    In this example, the oeda_xml_file is the file generated in the first step, and add_cell_script.cmd is the script created in the previous step.

    $ oedacli -c oeda_xml_file -f add_cell_script.cmd

    When you run the script, it updates the OEDA XML configuration file, adding directives and attributes related to the new storage servers. It does not trigger any action on any component (storage servers or database servers).

  4. Create an OEDACLI script to configure the new storage server(s), or cell(s).

    The script uses the following settings:

    • The Oracle ASM power limit for rebalance is set to 4, which is an Oracle MAA best practice.
    • The WAIT option is set to FALSE, which means the disk rebalance operations are run in parallel for every disk group in the cluster. The number of outstanding rebalances that can be run concurrently is limited to the number of database servers. If you set WAIT to TRUE, then each rebalance operation is run sequentially.

    Save the following commands in a file named config_cell_script.cmd. In this example, the cluster name is q1-vm01. Replace this with the name of your cluster. Also, replace the example cell names (celadm04,celadm05) with your own.

    ALTER CLUSTER ADDCELLS='celadm04,celadm05' power=4, wait=false WHERE clustername='q1-vm01'
    SAVE ACTION
    MERGE ACTIONS
    DEPLOY ACTIONS
    SAVE FILE
  5. Run the script config_cell_script.cmd.

    In this example, the oeda_xml_file is the file generated in the first step, and config_cell_script.cmd is the script created in the previous step.

    $ oedacli -c oeda_xml_file -f config_cell_script.cmd

    When you run the script, it creates the flash cache, flash log, cell disks and grid disks. It also adds the disks to the Oracle ASM disk groups and adds the storage servers to the cluster.

3.9.4 Expanding an Existing Exadata Storage Grid

In this scenario, an Exadata storage cell already exists in an Exadata rack, and you want to add it to an Exadata storage grid in order to expand that grid.

  1. Decommission the storage cell from its current cluster. To do this, follow the procedure in "Dropping a Storage Server from an Existing Disk Group or Storage Grid".
  2. Add the storage cell to the desired Exadata storage grid. To do this, follow the procedure in "Adding a Cell Node".
  3. Download and run the latest version of Oracle EXAchk to ensure that the resulting configuration implements the latest best practices for Oracle Exadata.

3.9.5 Dropping a Storage Server from an Existing Disk Group or Storage Grid

You can remove a storage server from an existing Oracle Exadata Rack.

  1. Drop the disks belonging to the storage server to be removed from Oracle Automatic Storage Management (Oracle ASM).

    Note:

    For Oracle Exadata VM deployments, the substeps below need to be performed on all of the Oracle VM clusters.
    1. Log in to any node in the cluster.
    2. Query the list of grid disks being used by the cluster for the targeted storage server.
      Grid_home/bin/asmcmd  lsdsk --suppressheader | grep cellName_being_removed | awk -F'/' '{print $NF}'

      Note:

      Make sure the available free space in every disk group that contains disks from the storage server being removed is at least 15% of the allocated storage for that disk group.
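
      One way to check this is to query V$ASM_DISKGROUP. The following sketch computes the percentage of free space in each disk group, using standard view columns; interpret the results against the 15% guideline above:

      SQL> SELECT name, total_mb, free_mb,
        2>  ROUND(free_mb/total_mb*100,1) AS pct_free
        3>  FROM v$asm_diskgroup;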
    3. Drop the Oracle ASM disks returned by the command above from their respective disk groups.
      SQL> ALTER DISKGROUP diskgroup_name DROP DISKS IN FAILGROUP cellName_being_removed;
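
      For example, if the storage server being removed is named celadm04 (a hypothetical name) and its disks belong to the DATAC1 and RECOC1 disk groups, the commands would be:

      SQL> ALTER DISKGROUP datac1 DROP DISKS IN FAILGROUP celadm04;
      SQL> ALTER DISKGROUP recoc1 DROP DISKS IN FAILGROUP celadm04;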
    4. The disk drop operation initiates a rebalance operation at the default power level. Monitor the rebalance using the following command:
      SQL> SELECT * FROM gv$asm_operation;

      Wait until the rebalance completes, that is, wait until gv$asm_operation returns no rows.

    5. Verify that none of the disk groups retain any references to the disks from the storage server being removed.
      SQL> SELECT path, name, header_status, mode_status, mount_status, state,
       failgroup FROM v$asm_disk ORDER BY path;
      

      The header_status column for all the disks belonging to the storage server being removed should show FORMER.

      Reminder:

      For Exadata Oracle VM deployments, the substeps above need to be performed on all of the Oracle VM clusters.
  2. Clean up the storage server being removed.

    Log in to the storage server as the celladmin user and run the following commands, repeating the grid disk drop in substep 1 for each set of grid disks:

    1. Drop the grid disks.
      cellcli -e drop griddisk all prefix=prefix_of_the_grid_disk
    2. If flash cache exists and the storage server is in WriteBack flash cache mode, you must first flush the flash cache before dropping it.
      cellcli -e alter flashcache all flush

      Wait for the command to return.
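
      To confirm that the flush has completed before dropping the flash cache, you can list the cell disk flush status using standard CellCLI attributes, as in the following sketch. Every cell disk should report a completed flush with no errors:

      cellcli -e list celldisk attributes name, flushstatus, flusherror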

    3. Drop the flash cache.
      cellcli -e drop flashcache all
      
    4. Drop the cell disks.
      cellcli -e drop celldisk all
      

      If you need to erase data securely, you can run the DROP CELLDISK command with the erase option, or the DROP CELL command with the erase option.

      The time required to complete the erase operation is listed in the table under the DROP CELL command.
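
      For example, to drop all cell disks with a single-pass secure erase (the ERASE option also accepts values such as 3pass and 7pass, which take correspondingly longer):

      cellcli -e drop celldisk all erase=1pass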

  3. Remove the entry of the storage server being removed from /etc/oracle/cell/network-config/cellip.ora on all the database server nodes in the cluster.
    Run the following steps on any database server node in the cluster:
    1. Make backup copies of the cellip.ora file.
      cd /etc/oracle/cell/network-config
      cp cellip.ora cellip.ora.orig
      cp cellip.ora cellip.ora-bak
    2. Remove the entries for the storage server being removed from /etc/oracle/cell/network-config/cellip.ora-bak.
    3. Use dcli to copy the updated cellip.ora-bak file to the other database servers.

      In the following command database_nodes refers to a file containing the names of each database server in the cluster. Each database server name is on a separate line in the file.

      /usr/local/bin/dcli -g database_nodes -l root -f cellip.ora-bak -d 
      /etc/oracle/cell/network-config/cellip.ora
  4. Download and run the latest version of Oracle EXAchk to ensure that the resulting configuration implements the latest best practices for Oracle Exadata.

3.9.6 Dropping Storage Servers using OEDACLI

OEDACLI provides the interface to drop storage servers for different configurations, such as Bare Metal, single Oracle VM, or multiple Oracle VMs.

The procedure to remove the grid disks from the disk groups and drop the objects on the storage server is implemented through only a few commands. There is only one rebalance operation triggered, regardless of the number of storage cells dropped.

Prerequisites

  • A valid OEDA XML configuration file, reflecting an exact list of the compute nodes and storage cells used in the environment to be modified. The first step in this task generates a new OEDA XML configuration file to ensure that the current configuration is used.
  • OEDACLI must run on a machine that has network access to the database servers (bare metal or guest domains) and the storage servers.
  1. Generate a new OEDA XML configuration file, reflecting an exact list of the current database servers and storage servers used in the environment.
    Use the OEDA DISCOVER command, where dirname is usually the directory where OEDACLI is installed, and host_names_list is the list of nodes to be discovered, separated by commas or spaces, for example, 'dbnode1,dbnode2,celadm01,celadm02'.
    DISCOVER ES hostnames='host_names_list' location='dirname'

    For an environment with multiple Oracle VMs, the command generates a global XML configuration file containing information of all clusters, and also one XML configuration file for each cluster. In the following commands, you should use the global XML configuration file instead of a cluster-specific configuration file.

  2. Create an OEDACLI script to update the XML configuration file using OEDACLI.

    In this example, the WAIT option is set to TRUE, which means each rebalance operation is run sequentially. After the rebalance for the last disk group has completed, the storage servers are removed from the configuration.

    Save the following commands in a file named drop_cell_cluster.cmd. In this example, two storage servers are being removed: celadm04 and celadm05. In this example, the clustername q1-vm01 is used. You should replace this with the name of your cluster.

    ALTER CLUSTER DROPCELLS='celadm04,celadm05' power=4, wait=true WHERE clustername='q1-vm01'
    SAVE ACTION
    MERGE ACTIONS
    DEPLOY ACTIONS
    SAVE FILE
  3. Run the script drop_cell_cluster.cmd.

    In this example, the oeda_xml_file is the file generated in the first step, and drop_cell_cluster.cmd is the script created in the previous step.

    $ oedacli -c oeda_xml_file -f drop_cell_cluster.cmd

    When you run the script, it drops the grid disks from all the Oracle ASM disk groups configured in the cluster and deconfigures the cells, dropping objects such as the flash cache, flash log, cell disks, and grid disks.

  4. Create an OEDACLI script to remove the storage servers, or cells, from the OEDA XML configuration file.

    Save the following commands in a file named drop_cell_xml.cmd.

    DELETE NEWCELL WHERE SRCNAME='celadm04 celadm05'
    SAVE ACTION FORCE
    SAVE FILE
  5. Run the script drop_cell_xml.cmd.

    In this example, the oeda_xml_file is the file generated in the first step, and drop_cell_xml.cmd is the script created in the previous step.

    $ oedacli -c oeda_xml_file -f drop_cell_xml.cmd

    When you run the script, it removes the information for the storage servers from the OEDA XML configuration file.

3.10 Managing Disk Controller Batteries

This section applies only to Exadata systems prior to X6 that use batteries. Newer systems have CVPM02 (Cache Vault), which is a supercapacitor rather than a battery.

3.10.1 About Disk Controller Batteries

The disk controllers in Oracle Exadata storage servers and database servers have battery-backed write cache to accelerate write performance.

Note:

This applies only to Exadata systems prior to X6 that use batteries. Newer systems have CVPM02 (Cache Vault), which is a supercapacitor rather than a battery.

If the battery charge capacity degrades such that the battery can no longer protect the cached data for a power loss of 48 hours or more, then the write cache is disabled and the disk controller switches to write through mode. This results in reduced write performance, but there is no data loss. Oracle Exadata Storage Servers generate an alert when battery charge capacity is insufficient or the temperature is high, and when the battery should be replaced.

Battery charge capacity degrades over time, and its life expectancy is inversely proportional to the operating temperature. The worst case life expectancy of the battery in Oracle Exadata Rack is as follows:

Inlet Ambient Temperature                       Battery Lifetime
----------------------------------------------  ----------------
< 25 degrees Celsius (77 degrees Fahrenheit)    3 years
< 32 degrees Celsius (89.6 degrees Fahrenheit)  2 years

3.10.2 Monitoring Batteries in the Database Servers

Note:

Exadata Storage Servers generate an alert when battery charge capacity is insufficient or the temperature is high, and when the battery should be replaced.

The battery charge capacity and battery temperature in the database servers can be monitored using the following commands:

Note:

If you are running Oracle Exadata System Software 19.1.0 or later, substitute /opt/MegaRAID/storcli/storcli64 for /opt/MegaRAID/MegaCli/MegaCli64 in the following commands:
# /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep "Full Charge" -A5 | sort \
| grep Full -A1
The following is an example of the output from the command:
Full Charge Capacity: 1357 mAh
Max Error: 2 %
Proactive battery replacement should be done on batteries that show capacity less than 800 mAh and have maximum error less than 10%. Immediately replace any battery that has less than 674 mAh or has maximum error more than 10%.
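A minimal shell sketch such as the following could automate this check; it assumes the MegaCli output format shown above (on Oracle Exadata System Software 19.1.0 or later, substitute the storcli64 path as noted):

# Parse "Full Charge Capacity: NNNN mAh" and compare against the replacement thresholds.
CAP=$(/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | awk '/Full Charge Capacity/ {print $4}')
if [ "$CAP" -lt 674 ]; then
  echo "Battery capacity ${CAP} mAh: replace immediately"
elif [ "$CAP" -lt 800 ]; then
  echo "Battery capacity ${CAP} mAh: plan proactive replacement"
fi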
The battery temperature can be monitored using the following command:
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep BatteryType; \
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | grep -i temper
The following is an example of the output from the command:
BatteryType: iBBU08
Temperature: 38 C
  Temperature                  : OK
  Over Temperature        : No
If the battery temperature is greater than or equal to 55 degrees Celsius, then determine the cause, and correct the problem.

3.10.3 Replacing Batteries in Disk Controllers

If the battery charge capacity in the disk controllers falls below the minimum threshold, then Oracle will replace the failed batteries at no extra charge if the system is covered by Oracle Premier Support for Systems or if the failure occurs during the warranty period.

For customers with Premier Support for Systems, Oracle attempts to proactively replace the batteries in Oracle Exadata Rack before the end of the estimated lifetime, on a best efforts basis.

3.11 Managing F20 PCIe Energy Storage Modules

Sun Flash Accelerator F20 PCIe cards are used in Oracle Exadata X3 models.

3.11.1 About F20 PCIe Energy Storage Modules

The Sun Flash Accelerator F20 PCIe card includes an energy storage module (ESM) to ensure data integrity during a power interruption, functioning similarly to a battery backup.

Sun Flash Accelerator F20 PCIe cards accelerate performance in Oracle Exadata Rack by caching frequently accessed Oracle Database data, avoiding the need to do physical I/O to the disks in Exadata Storage Server. Write operations to the flash cards are temporarily staged in volatile local DRAM memory on the card to speed up write operations. The data in the DRAM is protected by an energy storage module (ESM), which provides enough electrical power, in the event of a power failure, to move the data in the DRAM to the local flash.

The flash modules used in Oracle Exadata X3 systems have an expected endurance of 10 years or more, even in write intensive applications. Flash endurance is determined primarily by the total data written to flash across many years, as well as the type of data written. No application runs at maximum flash write IOPS for every second of every day for years. Applications also do many reads and have periods of high and low activity, such as day versus night, quarter close, end of a trading day, and so on. A very highly write intensive application might average 25 percent of the maximum flash write IOPS when measured over many months. Each Exadata X3 storage server has a total flash write endurance of over 50 PB for typical database data. In a full rack, if the application writes an average of 250,000 8K flash IOPS (25 percent of maximum writes) for 10 years, then it will write a total of 41 PB of data to each cell. This is less than the 50 PB per cell endurance.
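As a rough check on that arithmetic, assuming a full rack with 14 storage servers and binary-prefixed petabytes (1 PB = 2^50 bytes):

250,000 IOPS / 14 cells ≈ 17,857 write IOPS per cell
17,857 IOPS × 8 KB ≈ 146 MB/s of writes per cell
146 MB/s × 10 years (≈ 3.15 × 10^8 seconds) ≈ 4.6 × 10^16 bytes ≈ 41 PB per cell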

If the ESM does not have sufficient charge, then the F20 PCIe card operates in fail-safe write-through mode, bypassing the DRAM memory and writing all data directly to flash. This results in reduced write performance, but there is no data loss. Exadata Storage Server generates an alert when the ESM capacity is insufficient, and the ESM should be replaced.

The charge capacity of the ESM degrades over time, and its life expectancy is inversely proportional to the operating temperature. The worst case life expectancy of the ESM in Oracle Exadata Rack is as follows:

Type of Exadata Storage Server                         Lifetime
-----------------------------------------------------  --------
Exadata Storage Server with Sun Fire X4275 Servers     3 years
Exadata Storage Server with Sun Fire X4270 M2 Servers  4 years

3.11.2 Replacing Flash ESM

If the charge capacity in the F20 PCIe ESM falls below the minimum threshold, then Oracle will replace the failed ESM modules at no extra charge if the system is covered by Oracle Premier Support for Systems or if the failure occurs during the warranty period.

For customers with Premier Support for Systems, Oracle attempts to proactively replace the F20 PCIe ESM in the Oracle Exadata Rack before the end of the estimated lifetime, on a best efforts basis.

3.12 Exadata Storage Server LED Indicator Descriptions

The indicator LEDs on Oracle Exadata storage servers help you to verify the system status and identify components that require servicing.

For information about the various indicator LEDs on Oracle Exadata storage servers, see the section on Troubleshooting Using the Server Front and Back Panel Status Indicators in the server service manual for your system.

See Related Documentation for a list of the server service manuals.

Additionally, the Do Not Service LED is included only on Oracle Exadata X7-2 and later storage servers.

On Oracle Exadata storage servers, the Do Not Service LED has the following states:

  • Do Not Service LED is white/on: Indicates that the storage server is required to remain online to preserve data availability. Do not restart or power off the storage server. Otherwise, data availability may be compromised.

    Typically, the Do Not Service LED lights in response to an issue with a partner storage server. However, the Do Not Service LED also lights in the following situations:

    • During an Oracle ASM cluster rolling upgrade, the Do Not Service LED lights simultaneously on all participating storage servers. Furthermore, on affected grid disks, the asmDeactivationOutcome attribute contains the value: Cannot deactivate because ASM is in rolling upgrade mode. (See the example after this list for checking this attribute.)

    • When a database server containing a voting disk goes down, the Do Not Service LED lights simultaneously on all storage servers, warning that no storage server should be shut down, so that quorum is preserved in the cluster.

  • Do Not Service LED is off: The storage server can be safely powered off for servicing.
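
In either case, you can confirm from software whether a storage server can safely be taken offline by checking its grid disk attributes, as in the following sketch using standard CellCLI attributes. It is safe to proceed only when every grid disk reports asmDeactivationOutcome as Yes:

cellcli -e list griddisk attributes name, asmModeStatus, asmDeactivationOutcome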

3.13 Exadata Storage Server Images

The Exadata Storage Server models have different external layouts and physical appearance.

3.13.1 Oracle Exadata Storage Server X9M-2 Extreme Flash Server Images

The following figure shows the front view of the Oracle Exadata Storage Server X9M-2 Extreme Flash (EF) servers.

Figure 3-1 Front View of Oracle Exadata Storage Server X9M-2 Extreme Flash Server


The following figure shows the rear view of the Oracle Exadata Storage Server X9M-2 Extreme Flash servers.

Figure 3-2 Rear View of Oracle Exadata Storage Server X9M-2 Extreme Flash Server


3.13.2 Oracle Exadata Storage Server X9M-2 High Capacity Server Images

The following figure shows the front view of the Oracle Exadata Storage Server X9M-2 High Capacity (HC) servers.

Figure 3-3 Front View of Oracle Exadata Storage Server X9M-2 High Capacity servers


The following figure shows the rear view of the Oracle Exadata Storage Server X9M-2 High Capacity servers.

Figure 3-4 Rear View of Oracle Exadata Storage Server X9M-2 High Capacity servers


3.13.3 Oracle Exadata Storage Server X9M-2 Extended Server Images

The following figure shows the front view of the Oracle Exadata Storage Server X9M-2 Extended (XT) servers.

Figure 3-5 Front View of Oracle Exadata Storage Server X9M-2 Extended servers


The following figure shows the rear view of the Oracle Exadata Storage Server X9M-2 Extended servers.

Figure 3-6 Rear View of Oracle Exadata Storage Server X9M-2 Extended servers


3.13.4 Oracle Exadata Storage Server X8M-2 and X8-2 High Capacity and Extended (XT) Server Images

The following figure shows the front view of the Oracle Exadata Storage Server X8M-2 and X8-2 High Capacity and XT servers.

Figure 3-7 Front View of Oracle Exadata Storage Server X8M-2 and X8-2 High Capacity and XT servers


The following figure shows the rear view of the Oracle Exadata Storage Server X8M-2 and X8-2 High Capacity and XT servers.

Figure 3-8 Rear View of Oracle Exadata Storage Server X8M-2 and X8-2 High Capacity and XT servers


3.13.5 Oracle Exadata Storage Server X8M-2 and X8-2 Extreme Flash Server Images

The front view of the Oracle Exadata Storage Server X8M-2 and X8-2 Extreme Flash server is almost identical to the X7-2 server. The main difference is the product logo.

Figure 3-9 Front View of Oracle Exadata Storage Server X8M-2 and X8-2 Extreme Flash Server


The rear view of the Oracle Exadata Storage Server X8M-2 and X8-2 Extreme Flash server is almost identical to the X7-2 server. The main difference is the product logo.

Figure 3-10 Rear View of Oracle Exadata Storage Server X8M-2 and X8-2 Extreme Flash Server


3.13.6 Oracle Exadata Storage Server X7-2 High Capacity Server Images

The following figure shows the front view of the Oracle Exadata Storage Server X7-2 High Capacity Server.

Figure 3-11 Front View of Oracle Exadata Storage Server X7-2 High Capacity Server


The following figure shows the rear view of the Oracle Exadata Storage Server X7-2 High Capacity Server.

Figure 3-12 Rear View of Oracle Exadata Storage Server X7-2 High Capacity Server


3.13.7 Oracle Exadata Storage Server X7-2 Extreme Flash Server Images

The following figure shows the front view of the Oracle Exadata Storage Server X7-2 Extreme Flash server.

Figure 3-13 Front View of Oracle Exadata Storage Server X7-2 Extreme Flash Server


The following figure shows the rear view of the Oracle Exadata Storage Server X7-2 Extreme Flash server.

Figure 3-14 Rear View of Oracle Exadata Storage Server X7-2 Extreme Flash Server


3.13.8 High Capacity Exadata Storage Server X6-2 Images

The following figure shows the front view of the Oracle Exadata Storage Server X6-2 High Capacity server.

Figure 3-15 Front View of Oracle Exadata Storage Server X6-2 High Capacity Server


The following figure shows the rear view of the Oracle Exadata Storage Server X6-2 High Capacity server.

Figure 3-16 Rear View of Oracle Exadata Storage Server X6-2 High Capacity Server


3.13.9 Extreme Flash Exadata Storage Server X6-2 Images

The following figure shows the front view of the Oracle Exadata Storage Server X6-2 Extreme Flash server.

Figure 3-17 Front View of Oracle Exadata Storage Server X6-2 Extreme Flash Server


The following figure shows the rear view of the Oracle Exadata Storage Server X6-2 Extreme Flash server.

Figure 3-18 Rear View of Oracle Exadata Storage Server X6-2 Extreme Flash Server


3.13.10 High Capacity Exadata Storage Server X5-2 Images

The following image shows the front view of High Capacity Exadata Storage Server X5-2 Servers.

Figure 3-19 Front View of High Capacity Exadata Storage Server X5-2 Servers


The following image shows the rear view of High Capacity Exadata Storage Server X5-2 Servers.

Figure 3-20 Rear View of High Capacity Exadata Storage Server X5-2 Servers


3.13.11 Extreme Flash Exadata Storage Server X5-2 Images

The following image shows the front view of Extreme Flash Exadata Storage Server X5-2 Servers.

Figure 3-21 Front View of Extreme Flash Exadata Storage Server X5-2 Servers


The following image shows the rear view of Extreme Flash Exadata Storage Server X5-2 Servers.

Figure 3-22 Rear View of Extreme Flash Exadata Storage Server X5-2 Servers


3.13.12 Exadata Storage Server X4-2L Images

The following image shows the front view of Exadata Storage Server X4-2L Servers. The hard drives are numbered from left to right, starting in the lower left. The drives in the bottom row are numbers 0, 1, 2, and 3. The drives in the middle row are numbers 4, 5, 6, and 7. The drives in the top row are numbers 8, 9, 10, and 11.

Figure 3-23 Front View of Exadata Storage Server X4-2L Servers


The following image shows the rear view of Exadata Storage Server X4-2L Servers.

Figure 3-24 Rear View of Exadata Storage Server X4-2L Servers


3.13.13 Exadata Storage Server X3-2L Images

The following image shows the front view of Exadata Storage Server X3-2L Servers. The hard drives are numbered from left to right, starting in the lower left. The drives in the bottom row are numbers 0, 1, 2, and 3. The drives in the middle row are numbers 4, 5, 6, and 7. The drives in the top row are numbers 8, 9, 10, and 11.

Figure 3-25 Front View of Exadata Storage Server X3-2L Servers


The following image shows the rear view of Exadata Storage Server X3-2L Servers.

Figure 3-26 Rear View of Exadata Storage Server X3-2L Servers


3.13.14 Exadata Storage Server with Sun Fire X4270 M2 Images

The following image shows the front view of Exadata Storage Server with Sun Fire X4270 M2 Servers.

Figure 3-27 Front View of Exadata Storage Server with Sun Fire X4270 M2 Servers

  1. Hard disk drives. The top drives are, from left to right, HDD2, HDD5, HDD8, and HDD11. The middle drives are, from left to right, HDD1, HDD4, HDD7, and HDD10. The bottom drives are, from left to right, HDD0, HDD3, HDD6, and HDD9.

The following image shows the rear view of Exadata Storage Server with Sun Fire X4270 M2 Servers.

Figure 3-28 Rear View of Exadata Storage Server with Sun Fire X4270 M2 Servers

  1. Power supplies.

  2. InfiniBand host channel adapter PCI Express module.

  3. Sun Flash Accelerator F20 PCIe Cards.

3.13.15 Exadata Storage Server with Sun Fire X4275 Images

The following figure shows the front view of Sun Fire X4275 Servers.

Figure 3-29 Front View of Sun Fire X4275 Server

  1. Hard disk drives. The top drives are, from left to right, HDD2, HDD5, HDD8, and HDD11. The middle drives are, from left to right, HDD1, HDD4, HDD7, and HDD10. The bottom drives are, from left to right, HDD0, HDD3, HDD6, and HDD9.

The following figure shows the rear view of Sun Fire X4275 servers.

Figure 3-30 Rear View of Sun Fire X4275 Server
