2 Extending the Hardware

Oracle Exadata Database Machine can be extended from a Quarter Rack to Half Rack, from a Half Rack to a Full Rack, and by cabling racks together.

All new equipment receives its own Customer Support Identifier (CSI), separate from the CSI of your existing Oracle Engineered System Rack. Contact Oracle Support Services to reconcile the new CSI with the existing Oracle Engineered System Rack CSI, and have both the original and the new instance numbers or serial numbers available when you call.

2.1 Extending an Eighth Rack to a Quarter Rack in Oracle Engineered System X4-2 and Later

Extending Oracle Engineered System X4-2 or X5-2 from an eighth rack to a quarter rack is done using software. No hardware modifications are needed to extend the rack.

However, hardware modifications may be needed for other Oracle Engineered System versions. See Oracle Exadata Database Machine X6-2: Adding High Capacity Disks and Flash Cards and Oracle Engineered System X7-2, X8-2, X8M: Upgrading Eighth Rack Systems to a Quarter Rack for details.

This procedure can be done with no downtime or outages, other than a rolling database outage.

Note:

In the following procedures, the disk group names and sizes are examples. The values should be changed in the commands to match the actual system.

The procedures assume that user equivalence exists from the root user on the first database server to the root user on all other database servers and to the celladmin user on all storage cells.

The text files cell_group and db_group should be created to contain lists of cell host names and database server host names, respectively.
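
For example, the group files can be created on the first database server and user equivalence verified with a quick dcli run. The host names below (dm01db01, dm01db02, and dm01celadm01 through dm01celadm03) are placeholders; substitute the names for your environment.

    # cat > db_group <<EOF
    dm01db01
    dm01db02
    EOF

    # cat > cell_group <<EOF
    dm01celadm01
    dm01celadm02
    dm01celadm03
    EOF

    # dcli -g db_group -l root hostname
    # dcli -g cell_group -l celladmin hostname

If user equivalence is configured correctly, each dcli command returns one line per host without prompting for a password.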

2.1.1 Reviewing and Validating Current Configuration of Eighth Rack Oracle Exadata Database Machine X4-2 or Later

The following procedure describes how to review and validate the current configuration.

  1. Log in as the root user on the first database server.

  2. Review the current configuration of the database servers using the following command:

    # dcli -g db_group -l root 'dbmcli -e list dbserver attributes coreCount'
    

    The following is an example of the output from the command for Oracle Exadata Database Machine X5-2 Eighth Rack:

    dm01db01: 18
    dm01db02: 18
    

    Note:

    The number of active cores in Oracle Exadata Database Machine X5-2 Eighth Rack database server is 18. The number of active cores in Oracle Exadata Database Machine X4-2 Eighth Rack database server is 12.

    If the number of cores on a database server configured as an eighth rack differs, then contact Oracle Support Services.

  3. Review the current configuration of the storage servers using the following command. The expected output is TRUE.

    # dcli -g cell_group -l celladmin 'cellcli -e LIST CELL attributes eighthrack'
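
    The dcli output uses the same host-prefixed format as the database server example above. A hypothetical result for a three-cell Eighth Rack (cell names are placeholders) would look like:

    dm01celadm01: TRUE
    dm01celadm02: TRUE
    dm01celadm03: TRUE

    Any value other than TRUE indicates that the cell is not currently configured as an Eighth Rack.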
    

2.1.2 Activating Database Server Cores in Eighth Rack Oracle Exadata Database Machine X4-2 or Later

The following procedure describes how to activate the database server cores.

  1. Log in as the root user on the first database server.

  2. Activate all the database server cores using the following dcli utility command on the database server group:

    # dcli -g db_group -l root  'dbmcli  -e    \
    ALTER DBSERVER pendingCoreCount = number_of_cores'
    

    In the preceding command, number_of_cores is the total number of cores to activate. The value includes the existing core count and the additional cores to be activated. The following command shows how to activate all the cores in Oracle Exadata Database Machine X5-2 Eighth Rack:

    # dcli -g db_group -l root 'dbmcli -e ALTER DBSERVER pendingCoreCount = 36'
    

    For a description of the supported core counts for each server model, see Restrictions for Capacity-On-Demand on Oracle Exadata Database Machine.

  3. Restart each database server.

    Note:

    If this procedure is done in a rolling fashion with the Oracle Database and Oracle Grid Infrastructure active, then ensure the following before restarting the database server:

    • All Oracle ASM grid disks are online.

    • There are no active Oracle ASM rebalance operations. You can query the V$ASM_OPERATION view for the status of the rebalance operation.

    • Shut down Oracle Database and Oracle Grid Infrastructure in a controlled manner, failing over services as needed. A minimal pre-restart check is sketched after this list.
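
    The following sketch, run from the first database server, is one way to verify the grid disk state on every cell before each restart; asmmodestatus should report ONLINE and asmdeactivationoutcome should report Yes for every grid disk. Rebalance activity can be checked with the V$ASM_OPERATION query shown later in this chapter.

    # dcli -g cell_group -l celladmin cellcli -e \
      "list griddisk attributes name, asmmodestatus, asmdeactivationoutcome"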

  4. Verify the following items on the database server after the restart completes and before proceeding to the next server:

    • The Oracle Database and Oracle Grid Infrastructure services are active.

      See Using SRVCTL to Verify That Instances are Running in Oracle Real Application Clusters Administration and Deployment Guide and the crsctl status resource -w "TARGET = ONLINE" -t command. A combined verification sketch follows this list.

    • The number of active cores is correct. Use the dbmcli -e list dbserver attributes coreCount command to verify the number of cores.
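
    For example, run the cluster checks as the Grid Infrastructure software owner (with its environment set), then check the core count as root on the first database server. The database name dbm01 is a placeholder:

    $ srvctl status database -d dbm01
    $ crsctl status resource -w "TARGET = ONLINE" -t

    # dcli -g db_group -l root 'dbmcli -e list dbserver attributes coreCount'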

2.1.3 Oracle Exadata Database Machine X6-2: Adding High Capacity Disks and Flash Cards

Upgrading Oracle Exadata Database Machine X6-2 Eighth Rack High Capacity systems requires hardware modification, but upgrading X6-2 Eighth Rack Extreme Flash systems does not.

Eighth Rack High Capacity storage servers have half the cores enabled, and half the disks and flash cards removed. Eighth Rack Extreme Flash storage servers have half the cores and flash drives enabled.

Eighth Rack database servers have half the cores enabled.

On Oracle Exadata Database Machine X6-2 Eighth Rack systems with High Capacity disks, you can add high capacity disks and flash cards to extend the system to a Quarter Rack:

  1. Install the six 8 TB disks in HDD slots 6 - 11.

  2. Install the two F320 flash cards in PCIe slots 1 and 4.

2.1.4 Oracle Engineered System X7-2, X8-2, X8M: Upgrading Eighth Rack Systems to a Quarter Rack

Upgrade of Oracle Engineered System X7-2, X8-2, or X8M-2 Eighth Rack systems requires hardware modification. Eighth Rack database servers have one of the CPUs removed, with all of the memory for CPU1 moved to CPU0. Storage servers have half the cores enabled, and half the disks and flash cards removed.

On Oracle Engineered System X7-2, X8-2, or X8M Eighth Rack systems with Extreme Flash storage servers, you can add CPUs and flash cards to extend the system to a Quarter Rack.

For Oracle Engineered System X7-2, X8-2, or X8M-2 Eighth Rack systems with High Capacity storage servers, you can add the CPU and memory to the database servers and additional Eighth Rack High Capacity storage servers to expand the system.

  1. On the Exadata X7-2, X8-2, or X8M-2 database server, install CPU1, move half of CPU0's memory to CPU1, and move the 10/25GbE PCI card to PCIe slot 1.

  2. On Exadata X7-2, X8-2, or X8M-2 Extreme Flash Storage Servers, install four F640/F640v2 flash cards in PCIe slots 2, 3, 8, and 9.

2.1.5 Activating Storage Server Cores and Disks in Eighth Rack Oracle Exadata Database Machine X4-2 or Later

The following procedure describes how to activate the storage server cores and disks.

  1. Log in as the root user on the first database server.

  2. Activate the cores on the storage server group using the following command. The command uses the dcli utility, and runs the command as the celladmin user.

    # dcli -g cell_group -l celladmin cellcli -e "alter cell eighthRack=false"
    
  3. Create the cell disks using the following command:

    # dcli -g cell_group -l celladmin cellcli -e  "create celldisk all"
    
  4. Recreate the flash log using the following commands:

    # dcli -g cell_group -l celladmin cellcli -e  "drop flashlog all force"
    # dcli -g cell_group -l celladmin cellcli -e  "create flashlog all"
    
  5. Expand the flash cache using the following command:

    # dcli -g cell_group -l celladmin cellcli -e  "alter flashcache all"
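
    After the flash cache is expanded, a quick sanity check (optional) is to list the flash cache on every cell; it should report a normal status and a larger size than before the change:

    # dcli -g cell_group -l celladmin cellcli -e "list flashcache attributes name, size, status"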
    

2.1.6 Creating Grid Disks in Eighth Rack Oracle Exadata Database Machine X4-2 or Later

Grid disk creation must follow a specific order to ensure the proper offset.

The order of grid disk creation must follow the same sequence that was used during initial grid disks creation. For a standard deployment using Oracle Exadata Deployment Assistant, the order is DATA, RECO, and DBFS_DG. Create all DATA grid disks first, followed by the RECO grid disks, and then the DBFS_DG grid disks.

The following procedure describes how to create the grid disks:

Note:

The commands shown in this procedure use the standard deployment grid disk prefix names of DATA, RECO and DBFS_DG. The sizes being checked are on cell disk 02. Cell disk 02 is used because the disk layout for cell disks 00 and 01 is different from the other cell disks in the server.
  1. Check the size of the grid disks using the following commands. Each cell should return the same size for the grid disks starting with the same grid disk prefix.

    # dcli -g cell_group -l celladmin cellcli -e    \
    "list griddisk attributes name, size where name like \'DATA.*_02_.*\'"
    
    # dcli -g cell_group -l celladmin cellcli -e    \
    "list griddisk attributes name, size where name like \'RECO.*_02_.*\'"
    
    # dcli -g cell_group -l celladmin cellcli -e    \
    "list griddisk attributes name, size where name like \'DBFS_DG.*_02_.*\'" 
    

    The sizes shown are used during grid disk creation.

  2. Create the grid disks for the disk groups using the sizes shown in step 1. The following table shows the commands to create the grid disks based on rack type and disk group. A worked example with hypothetical sizes follows the table.

Table 2-1 Commands to Create Disk Groups When Extending Oracle Exadata Database Machine X4-2 Eighth Rack or Later

Rack Commands

Extreme Flash Oracle Exadata Database Machine X5-2 and later

dcli -g cell_group -l celladmin "cellcli -e create griddisk         \
DATA_FD_04_\'hostname -s\' celldisk=FD_04_\'hostname -s\',size=datasize"

dcli -g cell_group -l celladmin "cellcli -e create griddisk         \
DATA_FD_05_\'hostname -s\' celldisk=FD_05_\'hostname -s\',size=datasize"

dcli -g cell_group -l celladmin "cellcli -e create griddisk         \
DATA_FD_06_\'hostname -s\' celldisk=FD_06_\'hostname -s\',size=datasize"

dcli -g cell_group -l celladmin "cellcli -e create griddisk         \
DATA_FD_07_\'hostname -s\' celldisk=FD_07_\'hostname -s\',size=datasize"
dcli -g cell_group -l celladmin "cellcli -e create griddisk          \
RECO_FD_04_\'hostname -s\' celldisk=FD_04_\'hostname -s\',size=recosize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk          \
RECO_FD_05_\'hostname -s\' celldisk=FD_05_\'hostname -s\',size=recosize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk          \
RECO_FD_06_\'hostname -s\' celldisk=FD_06_\'hostname -s\',size=recosize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk          \
RECO_FD_07_\'hostname -s\' celldisk=FD_07_\'hostname -s\',size=recosize, \
cachingPolicy=none"
dcli -g cell_group -l celladmin "cellcli -e create griddisk           \
DBFS_DG_FD_04_\'hostname -s\' celldisk=FD_04_\'hostname -s\',size=dbfssize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk           \
DBFS_DG_FD_05_\'hostname -s\' celldisk=FD_05_\'hostname -s\',size=dbfssize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk           \
DBFS_DG_FD_06_\'hostname -s\' celldisk=FD_06_\'hostname -s\',size=dbfssize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk           \
DBFS_DG_FD_07_\'hostname -s\' celldisk=FD_07_\'hostname -s\',size=dbfssize, \
cachingPolicy=none"

High Capacity Oracle Exadata Database Machine X5-2 or Oracle Exadata Database Machine X4-2 and later

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DATA_CD_06_\'hostname -s\' celldisk=CD_06_\'hostname -s\',size=datasize"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DATA_CD_07_\'hostname -s\' celldisk=CD_07_\'hostname -s\',size=datasize"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DATA_CD_08_\'hostname -s\' celldisk=CD_08_\'hostname -s\',size=datasize"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DATA_CD_09_\'hostname -s\' celldisk=CD_09_\'hostname -s\',size=datasize"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DATA_CD_10_\'hostname -s\' celldisk=CD_10_\'hostname -s\',size=datasize"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DATA_CD_11_\'hostname -s\' celldisk=CD_11_\'hostname -s\',size=datasize"
dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
RECO_CD_06_\'hostname -s\' celldisk=CD_06_\'hostname -s\',size=recosize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk             \
RECO_CD_07_\'hostname -s\' celldisk=CD_07_\'hostname -s\',size=recosize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
RECO_CD_08_\'hostname -s\' celldisk=CD_08_\'hostname -s\',size=recosize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
RECO_CD_09_\'hostname -s\' celldisk=CD_09_\'hostname -s\',size=recosize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
RECO_CD_10_\'hostname -s\' celldisk=CD_10_\'hostname -s\',size=recosize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
RECO_CD_11_\'hostname -s\' celldisk=CD_11_\'hostname -s\',size=recosize, \
cachingPolicy=none"
dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DBFS_DG_CD_06_\'hostname -s\' celldisk=CD_06_\'hostname -s\',size=dbfssize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DBFS_DG_CD_07_\'hostname -s\' celldisk=CD_07_\'hostname -s\',size=dbfssize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DBFS_DG_CD_08_\'hostname -s\' celldisk=CD_08_\'hostname -s\',size=dbfssize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DBFS_DG_CD_09_\'hostname -s\' celldisk=CD_09_\'hostname -s\',size=dbfssize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DBFS_DG_CD_10_\'hostname -s\' celldisk=CD_10_\'hostname -s\',size=dbfssize, \
cachingPolicy=none"

dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
DBFS_DG_CD_11_\'hostname -s\' celldisk=CD_11_\'hostname -s\',size=dbfssize, \
cachingPolicy=none"

2.1.7 Adding Grid Disks to Oracle ASM Disk Groups in Eighth Rack Oracle Exadata Database Machine X4-2 or Later

The following procedure describes how to add the grid disks to Oracle ASM disk groups.

The grid disks created in Creating Grid Disks in Eighth Rack Oracle Exadata Database Machine X4-2 or Later must be added as Oracle ASM disks to their corresponding, existing Oracle ASM disk groups.

  1. Validate the following:

    • No rebalance operation is currently running.
    • All Oracle ASM disks are active.
  2. Log in to the first database server as the owner who runs the Oracle Grid Infrastructure software.

  3. Set the environment to access the +ASM instance on the server.

  4. Log in to the ASM instance as the sysasm user using the following command:

    $ sqlplus / as sysasm
    
  5. Validate the current settings, as follows:

    SQL> set lines 100
    SQL> column attribute format a20
    SQL> column value format a20
    SQL> column diskgroup format a20
    SQL> SELECT att.name attribute, upper(att.value) value, dg.name diskgroup
    FROM V$ASM_ATTRIBUTE att, V$ASM_DISKGROUP DG
    WHERE DG.group_number=att.group_number AND att.name LIKE '%appliance.mode%'
    ORDER BY att.group_number;

    The output should be similar to the following:

    ATTRIBUTE            VALUE                DISKGROUP
    -------------------- -------------------- --------------------
    appliance.mode       TRUE                 DATAC1
    appliance.mode       TRUE                 DBFS_DG
    appliance.mode       TRUE                 RECOC1
    
  6. Disable the appliance.mode attribute for any disk group that shows TRUE using the following commands:

    SQL> ALTER DISKGROUP data_diskgroup set attribute 'appliance.mode'='FALSE';
    SQL> ALTER DISKGROUP reco_diskgroup set attribute 'appliance.mode'='FALSE';
    SQL> ALTER DISKGROUP dbfs_dg_diskgroup set attribute 'appliance.mode'='FALSE';
    

    In the preceding commands, data_diskgroup, reco_diskgroup, and dbfs_dg_diskgroup are the names of the DATA, RECO and DBFS_DG disk groups, respectively.

  7. Add the grid disks to the Oracle ASM disk groups. The following table shows the commands to add the grid disks based on rack type and disk group. Adding the new disks requires a rebalance of the system.

    Table 2-2 Commands to Add Disk Groups When Extending Eighth Rack Oracle Exadata Database Machine X4-2 and Later

    Rack Commands

    Extreme Flash Oracle Exadata Database Machine X5-2 and later

    SQL> ALTER DISKGROUP data_diskgroup ADD DISK 'o/*/DATA_FD_0[4-7]*'
         REBALANCE POWER 32;

    SQL> ALTER DISKGROUP reco_diskgroup ADD DISK 'o/*/RECO_FD_0[4-7]*'
         REBALANCE POWER 32;

    SQL> ALTER DISKGROUP dbfs_dg_diskgroup ADD DISK 'o/*/DBFS_DG_FD_0[4-7]*'
         REBALANCE POWER 32;

    High Capacity Oracle Exadata Database Machine X5-2 or Oracle Exadata Database Machine X4-2 and later

    SQL> ALTER DISKGROUP data_diskgroup ADD DISK 'o/*/DATA_CD_0[6-9]*',
         'o/*/DATA_CD_1[0-1]*' REBALANCE POWER 32;

    SQL> ALTER DISKGROUP reco_diskgroup ADD DISK 'o/*/RECO_CD_0[6-9]*',
         'o/*/RECO_CD_1[0-1]*' REBALANCE POWER 32;

    SQL> ALTER DISKGROUP dbfs_dg_diskgroup ADD DISK 'o/*/DBFS_DG_CD_0[6-9]*',
         'o/*/DBFS_DG_CD_1[0-1]*' REBALANCE POWER 32;

    The preceding commands return Diskgroup altered, if successful.

  8. (Optional) Monitor the current rebalance operation using the following command:

    SQL> SELECT * FROM  gv$asm_operation;
    
  9. Re-enable the appliance.mode attribute, if it was disabled in step 6, using the following commands:

    SQL> ALTER DISKGROUP data_diskgroup set attribute 'appliance.mode'='TRUE';
    SQL> ALTER DISKGROUP reco_diskgroup set attribute 'appliance.mode'='TRUE';
    SQL> ALTER DISKGROUP dbfs_dg_diskgroup set attribute 'appliance.mode'='TRUE';
    

2.1.8 Validating New Quarter Rack Configuration for Oracle Exadata Database Machine X4-2 or Later

After adding the grid disks to the Oracle ASM disk groups, validate the configuration.

  1. Log in as the root user on the first database server.

  2. Check the core count using the following command:

    # dcli -g db_group -l root 'dbmcli -e list dbserver attributes coreCount'
    
  3. Review the storage server configuration using the following command.

    # dcli -g cell_group -l celladmin 'cellcli -e list cell attributes eighthrack'
    

    The output should show FALSE.

  4. Review the appliance mode for each disk group using the following commands:

    SQL> set lines 100
    SQL> column attribute format a20
    SQL> column value format a20
    SQL> column diskgroup format a20
    SQL> SELECT att.name attribute, upper(att.value) value, dg.name diskgroup
    FROM V$ASM_ATTRIBUTE att, V$ASM_DISKGROUP DG
    WHERE DG.group_number = att.group_number AND
    att.name LIKE '%appliance.mode%' ORDER BY DG.group_number;
    
  5. Validate the number of Oracle ASM disks using the following command:

    SQL> SELECT g.name, d.failgroup, d.mode_status, count(*)
    FROM v$asm_diskgroup g, v$asm_disk d
    WHERE d.group_number = g.group_number
    GROUP BY g.name, d.failgroup, d.mode_status;
    
    NAME                      FAILGROUP                     MODE_ST   COUNT(*)
    ------------------------- ----------------------------- ------- ----------
    DATAC1                    EXA01CELADM01                 ONLINE          12
    DATAC1                    EXA01CELADM02                 ONLINE          12
    DATAC1                    EXA01CELADM03                 ONLINE          12
    RECOC1                    EXA01CELADM01                 ONLINE          12
    RECOC1                    EXA01CELADM02                 ONLINE          12
    RECOC1                    EXA01CELADM03                 ONLINE          12
    RECOC2                    EXA01CELADM01                 ONLINE          12
    RECOC2                    EXA01CELADM02                 ONLINE          12
    RECOC2                    EXA01CELADM03                 ONLINE          12
    DBFS_DG                   EXA01CELADM01                 ONLINE          10
    DBFS_DG                   EXA01CELADM02                 ONLINE          10
    DBFS_DG                   EXA01CELADM03                 ONLINE          10

    All two-socket systems, except Eighth Rack configurations, have 12 disks per cell regardless of the system model. Eighth Rack configurations have 6 disks per cell.

2.2 Extending an Eighth Rack to a Quarter Rack in Oracle Exadata Database Machine X3-2

Extending an Oracle Exadata Database Machine X3-2 or earlier rack from an eighth rack to a quarter rack is done using software. No hardware modifications are needed to extend the rack. This procedure can be done with no downtime or outages, other than a rolling database outage. The procedures in this section describe how to extend an Oracle Exadata Database Machine X3-2 eighth rack to a quarter rack.

2.2.1 Reviewing and Validating Current Configuration of Oracle Exadata Database Machine X3-2 Eighth Rack

The following procedure describes how to review and validate the current configuration:

  1. Log in as the root user on the first database server.

  2. Review the current configuration of the database servers using the following command:

    # dcli -g db_group -l root /opt/oracle.SupportTools/resourcecontrol -show
    

    The following is an example of the output from the command:

    dm01db01: [INFO] Validated hardware and OS. Proceed.
    dm01db01:
    dm01db01: system_bios_version:  25010600
    dm01db01: restore_status:  Ok
    dm01db01: config_sync_status:  Ok
    dm01db01: reset_to_defaults: Off
    dm01db01: [SHOW] Number of cores active per socket: 4
    dm01db02: [INFO] Validated hardware and OS. Proceed.
    dm01db02:
    dm01db02: system_bios_version:  25010600
    dm01db02: restore_status:  Ok
    dm01db02: config_sync_status:  Ok
    dm01db02: reset_to_defaults: Off
    dm01db02: [SHOW] Number of cores active per socket: 4
    

    Note:

    The number of active cores in Oracle Exadata Database Machine X3-2 Eighth Rack database server is 4.

    If the number of cores on a database server configured as an eighth rack differs, then contact Oracle Support Services.

    Ensure the output for restore_status and config_sync_status are shown as Ok before continuing this procedure.

  3. Review the current configuration of the storage servers using the following command. The expected output is TRUE.

    # dcli -g cell_group -l celladmin 'cellcli -e LIST CELL attributes eighthrack'
    
  4. Ensure that flash disks are not used in Oracle ASM disk groups using the following command. Flash cache is dropped and recreated during this procedure:

    # dcli -g cell_group -l celladmin cellcli -e  "list griddisk attributes   \
    asmDiskgroupName,asmDiskName,diskType where diskType ='FlashDisk'         \
    and asmDiskgroupName !=null"
    

    No rows should be returned by the command.

2.2.2 Activating Database Server Cores in Oracle Exadata Database Machine X3-2 Eighth Rack

This task describes how to activate the database server cores for capacity-on-demand.


  1. Log in as the root user on the first database server.

  2. Activate all the database server cores using the following dcli utility command on the database server group:

    # dcli -g db_group -l root /opt/oracle.SupportTools/resourcecontrol      \
    -core number_of_cores 
    

    In the preceding command, number_of_cores is the total number of cores to activate. To activate all the cores, enter All for the number of cores.

  3. Restart the database servers in a rolling manner using the following command:

    # shutdown -r now
    

    Note:

    Ensure the output for restore_status and config_sync_status are shown as Ok before activating the storage server cores and disks. Getting the status from the BIOS after restarting may take several minutes.
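
    One way to confirm this across both database servers after the reboots is to rerun the resourcecontrol check from the earlier review step and filter the output for the relevant fields; this is a convenience sketch only:

    # dcli -g db_group -l root "/opt/oracle.SupportTools/resourcecontrol -show" \
      | grep -E 'restore_status|config_sync_status|cores active'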

2.2.3 Activating Storage Server Cores and Disks in Oracle Exadata Database Machine X3-2 Eighth Rack

The following procedure describes how to activate the storage server cores and disks:

  1. Log in as the root user on the first database server.

  2. Activate the cores on the storage server group using the following command. The command uses the dcli utility, and runs the command as the celladmin user.

    # dcli -g cell_group -l celladmin cellcli -e "alter cell eighthRack=false"
    
  3. Create the cell disks using the following command:

    # dcli -g cell_group -l celladmin cellcli -e  "create celldisk all"
    
  4. Recreate the flash log using the following commands:

    # dcli -g cell_group -l celladmin cellcli -e  "drop flashlog all force"
    # dcli -g cell_group -l celladmin cellcli -e  "create flashlog all"
    
  5. Expand the flash cache using the following command:

    # dcli -g cell_group -l celladmin cellcli -e  "alter flashcache all"
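
    A quick optional verification after this step is to confirm that all cell disks report a normal status and that the flash log was recreated on every cell:

    # dcli -g cell_group -l celladmin cellcli -e "list celldisk attributes name, status"
    # dcli -g cell_group -l celladmin cellcli -e "list flashlog attributes name, size, status"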
    

2.2.4 Creating Grid Disks in Oracle Exadata Database Machine X3-2 Eighth Rack

Grid disk creation must follow a specific order to ensure the proper offset.

The order of grid disk creation must follow the same sequence that was used during initial grid disks creation. For a standard deployment using Oracle Exadata Deployment Assistant, the order is DATA, RECO, and DBFS_DG. Create all DATA grid disks first, followed by the RECO grid disks, and then the DBFS_DG grid disks.

The following procedure describes how to create the grid disks:

Note:

The commands shown in this procedure use the standard deployment grid disk prefix names of DATA, RECO and DBFS_DG. The sizes being checked are on cell disk 02. Cell disk 02 is used because the disk layout for cell disks 00 and 01 is different from the other cell disks in the server.
  1. Check the size of the grid disks using the following commands. Each cell should return the same size for the grid disks starting with the same grid disk prefix.

    # dcli -g cell_group -l celladmin cellcli -e    \
    "list griddisk attributes name, size where name like \'DATA.*02.*\'"
    
    # dcli -g cell_group -l celladmin cellcli -e    \
    "list griddisk attributes name, size where name like \'RECO.*02.*\'"
    
    # dcli -g cell_group -l celladmin cellcli -e    \
    "list griddisk attributes name, size where name like \'DBFS_DG.*02.*\'" 
    

    The sizes shown are used during grid disk creation.

  2. Create the grid disks for the disk groups using the sizes shown in step 1. The following table shows the commands to create the grid disks based on rack type and disk group. A verification sketch follows the table.

    Table 2-3 Commands to Create Disk Groups When Extending Oracle Exadata Database Machine X3-2 Eighth Rack

    Rack Commands

    High Performance or High Capacity Oracle Exadata Database Machine X3-2

    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DATA_CD_06_\`hostname -s\` celldisk=CD_06_\`hostname -s\`,size=datasize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DATA_CD_07_\`hostname -s\` celldisk=CD_07_\`hostname -s\`,size=datasize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DATA_CD_08_\`hostname -s\` celldisk=CD_08_\`hostname -s\`,size=datasize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DATA_CD_09_\`hostname -s\` celldisk=CD_09_\`hostname -s\`,size=datasize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DATA_CD_10_\`hostname -s\` celldisk=CD_10_\`hostname -s\`,size=datasize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DATA_CD_11_\`hostname -s\` celldisk=CD_11_\`hostname -s\`,size=datasize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    RECO_CD_06_\`hostname -s\` celldisk=CD_06_\`hostname -s\`,size=recosize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    RECO_CD_07_\`hostname -s\` celldisk=CD_07_\`hostname -s\`,size=recosize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    RECO_CD_08_\`hostname -s\` celldisk=CD_08_\`hostname -s\`,size=recosize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    RECO_CD_09_\`hostname -s\` celldisk=CD_09_\`hostname -s\`,size=recosize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    RECO_CD_10_\`hostname -s\` celldisk=CD_10_\`hostname -s\`,size=recosize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    RECO_CD_11_\`hostname -s\` celldisk=CD_11_\`hostname -s\`,size=recosize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DBFS_DG_CD_06_\`hostname -s\` celldisk=CD_06_\`hostname -s\`,size=dbfssize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DBFS_DG_CD_07_\`hostname -s\` celldisk=CD_07_\`hostname -s\`,size=dbfssize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DBFS_DG_CD_08_\`hostname -s\` celldisk=CD_08_\`hostname -s\`,size=dbfssize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DBFS_DG_CD_09_\`hostname -s\` celldisk=CD_09_\`hostname -s\`,size=dbfssize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DBFS_DG_CD_10_\`hostname -s\` celldisk=CD_10_\`hostname -s\`,size=dbfssize"
    
    dcli -g cell_group -l celladmin "cellcli -e create griddisk            \
    DBFS_DG_CD_11_\`hostname -s\` celldisk=CD_11_\`hostname -s\`,size=dbfssize"
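
    After running the commands, you can optionally confirm that the new grid disks exist on every cell with the expected sizes; this is a sanity check only:

    # dcli -g cell_group -l celladmin cellcli -e "list griddisk attributes name, size, status"

    Each cell should now show DATA, RECO, and DBFS_DG grid disks on cell disks CD_06 through CD_11 in addition to the original grid disks, all with status active.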
    

2.2.5 Adding Grid Disks to Oracle ASM Disk Groups in Oracle Exadata Database Machine X3-2 Eighth Rack

This procedure describes how to add the grid disks to Oracle ASM disk groups.

The grid disks created in Creating Grid Disks in Oracle Exadata Database Machine X3-2 Eighth Rack must be added as Oracle ASM disks to their corresponding, existing Oracle ASM disk groups.

  1. Validate the following:

    • No rebalance operation is currently running.

    • All Oracle ASM disks are active.

  2. Log in to the first database server as the owner who runs the Oracle Grid Infrastructure software.

  3. Set the environment to access the +ASM instance on the server.

  4. Log in to the ASM instance as the sysasm user using the following command:

    $ sqlplus / as sysasm
    
  5. Validate the current settings, as follows:

    SQL> set lines 100
    SQL> column attribute format a20
    SQL> column value format a20
    SQL> column diskgroup format a20
    SQL> SELECT att.name attribute, upper(att.value) value, dg.name diskgroup
    FROM V$ASM_ATTRIBUTE att, V$ASM_DISKGROUP DG
    WHERE DG.group_number = att.group_number AND
    att.name LIKE '%appliance.mode%' ORDER BY att.group_number;
    

    The output should be similar to the following:

    ATTRIBUTE            VALUE                DISKGROUP
    -------------------- -------------------- --------------------
    appliance.mode       TRUE                 DATAC1
    appliance.mode       TRUE                 DBFS_DG
    appliance.mode       TRUE                 RECOC1
    
  6. Disable the appliance.mode attribute for any disk group that shows TRUE using the following commands:

    SQL> ALTER DISKGROUP data_diskgroup set attribute 'appliance.mode'='FALSE';
    SQL> ALTER DISKGROUP reco_diskgroup set attribute 'appliance.mode'='FALSE';
    SQL> ALTER DISKGROUP dbfs_dg_diskgroup set attribute 'appliance.mode'='FALSE';
    

    In the preceding commands, data_diskgroup, reco_diskgroup, and dbfs_dg_diskgroup are the names of the DATA, RECO and DBFS_DG disk groups, respectively.

  7. Add the grid disks to the Oracle ASM disk groups. The following table shows the commands to add the grid disks based on rack type and disk group. Adding the new disks requires a rebalance of the system.

    Table 2-4 Commands to Add Disk Groups When Extending an Oracle Exadata Database Machine X3-2 Eighth Rack

    Rack Commands

    High Capacity or High Performance Oracle Exadata Database Machine X3-2

    SQL> ALTER DISKGROUP data_diskgroup ADD DISK 'o/*/DATA_CD_0[6-9]*',
         'o/*/DATA_CD_1[0-1]*' REBALANCE POWER 32;

    SQL> ALTER DISKGROUP reco_diskgroup ADD DISK 'o/*/RECO_CD_0[6-9]*',
         'o/*/RECO_CD_1[0-1]*' REBALANCE POWER 32;

    SQL> ALTER DISKGROUP dbfs_dg_diskgroup ADD DISK 'o/*/DBFS_DG_CD_0[6-9]*',
         'o/*/DBFS_DG_CD_1[0-1]*' REBALANCE POWER 32;

    The preceding commands return Diskgroup altered, if successful.

  8. (Optional) Monitor the current rebalance operation using the following command:

    SQL> SELECT * FROM  gv$asm_operation;
    
  9. Re-enable the appliance.mode attribute, if it was disabled in step 6, using the following commands:

    SQL> ALTER DISKGROUP data_diskgroup set attribute 'appliance.mode'='TRUE';
    SQL> ALTER DISKGROUP reco_diskgroup set attribute 'appliance.mode'='TRUE';
    SQL> ALTER DISKGROUP dbfs_dg_diskgroup set attribute 'appliance.mode'='TRUE';
    

2.2.6 Validating New Oracle Exadata Database Machine X3-2 Quarter Rack Configuration

After adding the grid disks to the Oracle ASM disk groups, validate the configuration. The following procedure describes how to validate the configuration:

  1. Log in as the root user on the first database server.

  2. Check the core count using the following command:

    # dcli -g db_group -l root 'dbmcli -e list dbserver attributes coreCount'
    
  3. Review the storage server configuration using the following command.

    # dcli -g cell_group -l celladmin 'cellcli -e list cell attributes eighthrack'
    

    The output should show FALSE.

  4. Review the appliance mode for each disk group using the following commands:

    SQL> set lines 100
    SQL> column attribute format a20
    SQL> column value format a20
    SQL> column diskgroup format a20
    SQL> SELECT att.name attribute, upper(att.value) value, dg.name diskgroup
    FROM V$ASM_ATTRIBUTE att, V$ASM_DISKGROUP DG
    WHERE DG.group_number = att.group_number AND
    att.name LIKE '%appliance.mode%' ORDER BY DG.group_number;
    
  5. Validate the number of Oracle ASM disks using the following command:

    SQL> SELECT g.name, d.failgroup, d.mode_status, count(*)
    FROM v$asm_diskgroup g, v$asm_disk d
    WHERE d.group_number = g.group_number
    GROUP BY g.name, d.failgroup, d.mode_status;
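
    The output has the same shape as in the X4-2 validation procedure. A hypothetical example for a three-cell X3-2 Quarter Rack with a standard deployment (cell names are placeholders) would show 12 disks per cell for the DATA and RECO disk groups and 10 per cell for DBFS_DG:

    NAME                      FAILGROUP                     MODE_ST   COUNT(*)
    ------------------------- ----------------------------- ------- ----------
    DATAC1                    DM01CEL01                     ONLINE          12
    DATAC1                    DM01CEL02                     ONLINE          12
    DATAC1                    DM01CEL03                     ONLINE          12
    RECOC1                    DM01CEL01                     ONLINE          12
    RECOC1                    DM01CEL02                     ONLINE          12
    RECOC1                    DM01CEL03                     ONLINE          12
    DBFS_DG                   DM01CEL01                     ONLINE          10
    DBFS_DG                   DM01CEL02                     ONLINE          10
    DBFS_DG                   DM01CEL03                     ONLINE          10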
    

2.3 Extending Elastic Configurations

Oracle Engineered System is available in Elastic Configurations that consist of a number of database and storage servers up to the capacity of the rack, as defined within Oracle Exadata Configuration Assistant (OECA).

Additional database and storage servers can be added if space is available; see OECA for details. The upgrade process includes adding new servers and cables.

Note:

It is possible to extend the hardware while the machine is online, and with no downtime. However, extreme care should be taken. In addition, patch application to existing switches and servers should be done before extending the hardware.

2.3.1 Removing the Doors

This procedure describes how to remove the doors on Oracle Engineered System.

Note:

For Oracle Engineered System X7 systems, refer to

2.3.2 Adding New Switches

You can add individual new RDMA Network Fabric switches as needed to meet growing resource requirements.

The instructions for X8M systems, which use an InfiniBand Transport Layer based on a RoCE Network Layer, differ from the instructions for X8 and earlier systems, which use an InfiniBand Transport Layer based on an InfiniBand Network Layer.

2.3.2.1 Adding a Cisco Nexus 9336C Switch (Optional)

This procedure is for X8M systems, which use an InfiniBand Transport Layer based on a RoCE Network Layer.

  • Extending an Oracle Engineered System X8M-2 to another X8M-2.

  • Extending an Oracle Engineered System X8M-8 to another X8M-8.

  • This is not applicable to X8 or earlier.

Note:

The steps in this procedure are specific to Oracle Engineered System. They are not the same as the steps in the Cisco Nexus manual.

  1. Unpack the Cisco Nexus switch components from the packing cartons. The following items should be in the packing cartons:

    • Cisco Nexus 9336C-FX2 Switch

    • Cable bracket and rackmount kit

    • Cable management bracket and cover

    • Two rack rail assemblies

    • Assortment of screws and captive nuts

    • Cisco Nexus 9336C-FX2 Switch documentation

    The service label procedure on top of the switch includes descriptions of the preceding items.

  2. Remove the trough from the rack in RU1. Put the cables aside while installing the RDMA Network Fabric switch. The trough can be discarded.

  3. Install cage nuts in each rack rail in the appropriate holes.

  4. Attach the brackets with cutouts to the power supply side of the switch.

  5. Attach the C-brackets to the switch on the side of the ports.

  6. Slide the switch halfway into the rack from the front. Keep the switch to the left side of the rack as far as possible while pulling the two power cords through the C-bracket on the right side.

  7. Slide the server in rack location U2 out to the locked service position. This improves access to the rear of the switch during further assembly.

  8. Install the slide rails from the rear of the rack into the C-brackets on the switch, pushing them up to the rack rail.

  9. Attach an assembled cable arm bracket to the slide rail and, using a No. 3 Phillips screwdriver, screw these together into the rack rail:

    1. Install the lower screw loosely with the cable arm bracket rotated 90 degrees downward. This allows better finger access to the screw.

    2. Rotate the cable arm bracket to the correct position.

    3. Install the upper screw.

    4. Tighten both screws.

    If available, a screwdriver with a long-shaft (16-inch / 400mm) will allow easier installation such that the handle is outside the rack and beyond the cabling.

  10. Push the switch completely into the rack from the front, routing the power cords through the cutout on the rail bracket.

  11. Secure the switch to the front rack rail with M6 16mm screws. Tighten the screws using the No. 3 Phillips screwdriver.

  12. Install the lower part of the cable management arm across the back of the switch.

  13. Connect the cables to the appropriate ports.

  14. Install the upper part of the cable management arm.

  15. Slide the server in rack location U2 back into the rack.

  16. Install power cords to the switch power supply slots on the front.

  17. Loosen the front screws to install the vented filler panel brackets. Tighten the screws, and snap on the vented filler panel in front of the switch.


2.3.2.2 Adding a Sun Datacenter InfiniBand Switch 36 (Optional)

This procedure applies to X8 and earlier systems, which use an InfiniBand Transport Layer based on an InfiniBand Network Layer.

  • This is not applicable to Oracle Engineered System X8M.

  • Upgrading a rack with Sun Fire X4170 Oracle Database Servers to Oracle Exadata Database Machine Half Rack or Oracle Exadata Database Machine Full Rack.

  • Extending a Quarter Rack or Eighth Rack.

  • Extending an Oracle Exadata Database Machine X4-2 rack.

Note:

The steps in this procedure are specific to Oracle Exadata Database Machine. They are not the same as the steps in the Sun Datacenter InfiniBand Switch 36 manual.
  1. Unpack the Sun Datacenter InfiniBand Switch 36 switch components from the packing cartons. The following items should be in the packing cartons:

    • Sun Datacenter InfiniBand Switch 36 switch

    • Cable bracket and rackmount kit

    • Cable management bracket and cover

    • Two rack rail assemblies

    • Assortment of screws and captive nuts

    • Sun Datacenter InfiniBand Switch 36 documentation

    The service label procedure on top of the switch includes descriptions of the preceding items.

  2. X5 racks only: Remove the trough from the rack in RU1 and put the cables aside while installing the Sun Datacenter InfiniBand Switch 36 switch. The trough can be discarded.

  3. Install cage nuts in each rack rail in the appropriate holes.

  4. Attach the brackets with cutouts to the power supply side of the switch.

  5. Attach the C-brackets to the switch on the side of the Sun Datacenter InfiniBand Switch 36 ports.

  6. Slide the switch halfway into the rack from the front. You need to keep it to the left side of the rack as far as possible while pulling the two power cords through the C-bracket on the right side.

  7. Slide the server in rack location U2 out to the locked service position. This improves access to the rear of the switch during further assembly.

  8. Install the slide rails from the rear of the rack into the C-brackets on the switch, pushing them up to the rack rail.

  9. Attach an assembled cable arm bracket to the slide rail and, using a No. 3 Phillips screwdriver, screw these together into the rack rail:

    1. Install the lower screw loosely with the cable arm bracket rotated 90 degrees downward. This allows better finger access to the screw.

    2. Rotate the cable arm bracket to the correct position.

    3. Install the upper screw.

    4. Tighten both screws.

    If available, a screwdriver with a long-shaft (16-inch / 400mm) will allow easier installation such that the handle is outside the rack and beyond the cabling.

  10. Push the switch completely into the rack from the front, routing the power cords through the cutout on the rail bracket.

  11. Secure the switch to the front rack rail with M6 16mm screws. Tighten the screws using the No. 3 Phillips screwdriver.

  12. Install the lower part of the cable management arm across the back of the switch.

  13. Connect the cables to the appropriate ports.

  14. Install the upper part of the cable management arm.

  15. Slide the server in rack location U2 back into the rack.

  16. Install power cords to the Sun Datacenter InfiniBand Switch 36 switch power supply slots on the front.

  17. Loosen the front screws to install the vented filler panel brackets. Tighten the screws, and snap on the vented filler panel in front of the switch.


2.3.3 Adding New Servers

For systems that are not an Eighth Rack, you can add new servers to an Oracle Engineered System Rack that is not at full capacity.

You can add individual database or storage servers as needed to meet growing resource requirements using the Elastic Configuration method. Additional database servers and storage servers can be added if space is available; see Oracle Exadata Configuration Assistant (OECA) for details. The upgrade process includes adding new servers and cables. Additional hardware may be required.

Note:

  • Always load equipment into the rack from the bottom up, so that the rack does not become top-heavy and tip over. Extend the rack anti-tip bar to prevent the rack from tipping during equipment installation.

  • The new servers need to be configured manually.


2.3.3.1 Preparing to Install New Servers

Before you install a new server, prepare the rack unit for the server installation.

  1. Identify the rack unit where the server will be installed. Fill the first available unit, starting from the bottom of the rack.

  2. Remove and discard the trough, which attaches the cable harness when no server is installed in the unit.

  3. Remove and discard the solid filler.

2.3.3.2 Installing the Rack Assembly

After preparing for installation, you next install the rack assembly to hold the new servers.

  1. Position a mounting bracket against the chassis so that the slide-rail lock is at the server front, and the five keyhole openings on the mounting bracket are aligned with the five locating pins on the side of the chassis.

  2. Orient the slide-rail assembly so that the ball-bearing track is forward and locked in place.

  3. Starting on either side of the rack, align the rear of the slide-rail assembly against the inside of the rear rack rail, and push until the assembly locks into place with an audible click.

    Figure 2-1 Locking the Slide-Rail Assembly Against the Inside of the Rear Rack Rail

  4. Align the front of the slide-rail assembly against the outside of the front rack rail, and push until the assembly locks into place and you hear the click.

  5. Repeat steps 2 to 4 on the other side of the rack.

2.3.3.3 Installing the Server

After preparing for the installation and installing the rack assembly, you then install the new server.

WARNING:

  • Installing a server requires a minimum of two people or a lift because of the weight of each server. Attempting this procedure alone can result in equipment damage, personal injury, or both.

  • Always load equipment into the rack from the bottom up, so that the rack does not become top-heavy and tip over. Extend the rack anti-tip bar to prevent the rack from tipping during equipment installation.

  1. Read the service label on the top cover of the server before installing a server into the rack.

  2. Push the server into the slide rail assembly:

    1. Push the slide rails into the slide rail assemblies as far as possible.

    2. Position the server so the rear ends of the mounting brackets are aligned with the slide rail assemblies mounted in the equipment rack.

      Figure 2-2 Aligning the Rear Ends of the Mounting Brackets with the Slide Rail Assemblies in the Rack



      The callouts in the preceding image highlight the following:

      1: Mounting bracket inserted into slide rail

      2: Slide-rail release lever

    3. Insert the mounting brackets into the slide rails, and push the server into the rack until the mounting brackets encounter the slide rail stops, approximately 30 cm (12 inches).

    4. Simultaneously push down and hold the slide rail release levers on each mounting bracket while pushing the server into the rack.

      Note:

      Oracle recommends that two people push the servers into the rack: one person to move the server in and out of the rack, and another person to watch the cables and cable management arm (CMA).
    5. Continue pushing until the slide rail locks on the front of the mounting brackets engage the slide rail assemblies, and you hear the click.

  3. Cable the new server as described in Cabling Exadata Storage Servers.

2.3.4 Cabling Database Servers

After the new database servers are installed, they need to be cabled to the existing equipment. The following procedure describes how to cable the new equipment in the rack. The images shown in the procedure are of a Sun Fire X4170 M2 Oracle Database Server.

Note:

  • The existing cable connections in the rack do not change.

  • The blue cables connect to Oracle Database servers, and the black cables connect to Exadata Storage Servers. These network cables are for the NET0 Ethernet interface port.

  • Attach and route the management cables on the CMA and rear panel one server at a time. Do not slide out more than one server at a time.

  • Start from the bottom of the rack, and work upward. Route the cables through the CMA with the dongle on the top and power cables on the bottom.

  • Longer hook and loop straps are needed when cabling three CAT5e cables or two TwinAx cables.

  1. Connect the CAT5e cables, AC power cables, and USB to their respective ports on the rear of the server. Ensure the flat side of the dongle is flush against the CMA inner rail.

    Figure 2-3 Cables at the Rear of the Server

  2. Adjust the green cable management arm (CMA) brackets.

    Figure 2-4 Cable Management Arm (CMA) Brackets

    The callouts in the preceding image identify the following CMA components:

    1. Connector A

    2. Front slide bar

    3. Velcro straps (6)

    4. Connector B

    5. Connector C

    6. Connector D

    7. Slide-rail latching bracket (used with connector D)

    8. Rear slide bar

    9. Cable covers

    10. Cable covers

  3. Attach the CMA to the server.

  4. Route the CAT5e and power cables through the wire clip.

    Figure 2-5 Cables Routed Through the Cable Management Arm

  5. Bend the CAT5e and power cables to enter the CMA, while adhering to the bend radius minimums.

  6. Secure the CAT5e and power cables under the cable clasps.

    Figure 2-6 Cables Secured under the Cable Clasps

  7. Route the cables through the CMA, and secure them with hook and loop straps at equal intervals.

    Figure 2-7 Cables Secured with Hook and Loop Straps at Regular Intervals

  8. Connect the RDMA Network Fabric or TwinAx cables with the initial bend resting on the CMA. The TwinAx cables are for client access to the database servers.

    Figure 2-8 RDMA Network Fabric or TwinAx Cables Positioned on the CMA

  9. Secure the RDMA Network Fabric or TwinAx cables with hook and loop straps at equal intervals.

    Figure 2-9 RDMA Network Fabric or TwinAx Cables Secured with Hook and Loop Straps at Regular Intervals

  10. Route the fiber core cables.

  11. Rest the cables over the green clasp on the CMA.

  12. Attach the red ILOM cables to the database server.

  13. Attach the network cables to the Oracle Database server.

  14. Attach the cables from the Oracle Database server to the RDMA Network Fabric switches.

  15. Connect the orange Ethernet cable to the KVM switch.

  16. Connect the red and blue Ethernet cables to the Cisco switch.

  17. Verify operation of the slide rails and CMA for each server, as follows:

    Note:

    Oracle recommends that two people do this step: one person to move the server in and out of the rack, and another person to observe the cables and CMA.

    1. Slowly pull the server out of the rack until the slide rails reach their stops.

    2. Inspect the attached cables for any binding or kinks.

    3. Verify the CMA extends fully from the slide rails.

  18. Push the server back into the rack, as follows:

    1. Release the two sets of slide rail stops.

    2. Push in both levers simultaneously, and slide the server into the rack. The first set of stops are the levers located on the inside of each slide rail, just behind the back panel of the server. The levers are labeled PUSH. The server slides approximately 46 cm (18 inches) and stops.

    3. Verify the cables and CMA retract without binding.

    4. Simultaneously push or pull both slide rail release buttons, and push the server completely into the rack until both slide rails engage. The second set of stops are the slide rail release buttons, located near the front of each mounting bracket.

  19. Dress the cables, and then tie off the cables with the straps. Oracle recommends that the cables be dressed in bundles of eight or fewer.

  20. Check cable travel by sliding each server out and back fully to ensure that the cables are not binding or catching.

  21. Repeat the procedure for the rest of the servers.

  22. Connect the power cables to the power distribution units (PDUs). Ensure the breaker switches are in the OFF position before connecting the power cables. Do not plug the power cables into the facility receptacles at this time.

2.3.5 Cabling Storage Servers

After the new Storage Servers are installed, you need to connect them to the existing equipment.

The following procedure describes how to cable the new equipment in the rack.

Note:

  • The existing cable connections in the rack do not change.

  • The blue cables connect to Oracle Database servers, and the black cables connect to Exadata Storage Servers. These network cables are for the NET0 Ethernet interface port.

  • Attach and route the management cables on the CMA and rear panel one server at a time. Do not slide out more than one server at a time.

  • Start from the bottom of the rack, and work upward.

  • Longer hook and loop straps are needed when cabling three CAT5e cables or two TwinAx cables.

  1. Attach a CMA to the server.

  2. Insert the cables into their ports through the hook and loop straps, then route the cables into the CMA in this order:

    1. Power

    2. Ethernet

    3. RDMA Network Fabric

    Figure 2-10 Rear of the Server Showing Power and Network Cables

  3. Route the cables through the CMA and secure them with hook and loop straps on both sides of each bend in the CMA.

    Figure 2-11 Cables Routed Through the CMA and Secured with Hook and Loop Straps

  4. Close the crossbar covers to secure the cables in the straightaway.

  5. Verify operation of the slide rails and the CMA for each server:

    Note:

    Oracle recommends that two people do this step: one person to move the server in and out of the rack, and another person to watch the cables and the CMA.

    1. Slowly pull the server out of the rack until the slide rails reach their stops.

    2. Inspect the attached cables for any binding or kinks.

    3. Verify that the CMA extends fully from the slide rails.

  6. Push the server back into the rack:

    1. Release the two sets of slide rail stops.

    2. Locate the levers on the inside of each slide rail, just behind the back panel of the server. They are labeled PUSH.

    3. Simultaneously push in both levers and slide the server into the rack, until it stops after approximately 46 cm (18 inches).

    4. Verify that the cables and CMA retract without binding.

    5. Locate the slide rail release buttons near the front of each mounting bracket.

    6. Simultaneously push in both slide rail release buttons and slide the server completely into the rack, until both slide rails engage.

  7. Dress the cables, and then tie off the cables with the straps. Oracle recommends that you dress the RDMA Network Fabric cables in bundles of eight or fewer.

  8. Slide each server out and back fully to ensure that the cables are not binding or catching.

  9. Repeat the procedure for all servers.

  10. Connect the power cables to the power distribution units (PDUs). Ensure the breaker switches are in the OFF position before connecting the power cables. Do not plug the power cables into the facility receptacles now.

See Also:

Multi-Rack Cabling Tables

Oracle Exadata Database Machine System Overview for the cabling tables for your system

2.3.6 Closing the Rack

After installing new equipment, you must replace the panels and close the rack.

There are two rack models in use with Oracle Engineered System Racks; refer to the documentation for your rack model for the most up-to-date steps.

The following steps provide an overview of the process.

  1. Replace the rack front and rear doors as follows:

    1. Retrieve the doors, and place them carefully on the door hinges.

    2. Connect the front and rear door grounding strap to the frame.

    3. Close the doors.

    4. (Optional) Lock the doors. The keys are in the shipping kit.

  2. (Optional) Replace the side panels, if they were removed for the upgrade, as follows:

    1. Lift each side panel up and onto the side of the rack. The top of the rack should support the weight of the side panel. Ensure the panel fasteners line up with the grooves in the rack frame.

    2. Turn each side panel fastener one-quarter turn clockwise using the side panel removal tool. Turn the fasteners next to the panel lock clockwise. There are 10 fasteners per side panel.

    3. (Optional) Lock each side panel. The key is in the shipping kit. The locks are located on the bottom, center of the side panels.

    4. Connect the grounding straps to the side panels.

After closing the rack, proceed to Configuring the New Hardware to configure the new hardware.

2.4 Extending a Rack by Adding Another Rack

Extending your Oracle Engineered System Rack by adding another rack consists of cabling and configuring the racks together.

Racks up to model X8 can be cabled together with no downtime. X8M and newer racks might require downtime when cabling racks together.

WARNING:

Cabling within a live network must be done carefully in order to avoid potentially serious disruptions.

2.4.1 Overview of Adding Another Rack to an Existing System

Review the following notes before cabling racks together.

  • For models up to X8, there is some performance degradation while the racks are being cabled together. This degradation results from reduced network bandwidth and from data retransmission due to packet loss when a cable is unplugged.

  • When connecting racks up to model X8-2, the environment is not a high-availability environment because one leaf switch must be powered off. All traffic goes through the remaining leaf switch.

  • Only the existing racks are operational when adding racks. The servers on any new racks must be powered down.

  • The software running on the systems must not have any problems related to RDMA Network Fabric restarts. To verify the configuration before connecting multiple racks together, run infinicheck without the performance tests (use the -e option).

  • It is assumed that each Oracle Engineered System Rack has three RDMA Network Fabric switches already installed.

  • The new racks have been configured with the appropriate IP addresses to be migrated into the expanded system prior to any cabling, and there are no duplicate IP addresses.

  • The procedures for extending X8M racks with InfiniBand Transport Layer systems based on a RoCE Network Layer are different from the procedures for racks with InfiniBand Transport Layer systems based on an InfiniBand Network Layer (X8 and earlier).

  • For X8M racks, the Oracle Engineered System needs one loopback IP interface on each spine switch and two loopback IP interfaces on each leaf switch. The IP addressing scheme uses the IANA 'Shared Address Space' 100.64.0.0/10. This ensures that there is no overlap with IPv4 addresses in the network using other schemes. See the sketch after this list for an illustration of the addressing pattern.
    • Leaf loopback0 IPs are assigned as 100.64.0.101, 100.64.0.102, 100.64.0.103, and so on.
    • Spine loopback0 IPs are assigned as 100.64.0.201, 100.64.0.202, up to 100.64.0.208.
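
The following is a minimal sketch, not an Oracle-supplied tool, that prints the loopback IP addresses implied by this scheme for a given rack number. The script name rack_loopbacks.sh and its logic are assumptions used only to illustrate the addressing pattern for racks 1 through 8:

    #!/bin/bash
    # rack_loopbacks.sh (hypothetical): print the loopback IPs used by the
    # spine and leaf switches of rack N under the 100.64.0.0/10 scheme above.
    rack=${1:?usage: rack_loopbacks.sh <rack-number>}
    echo "Rack ${rack} spine switch      loopback0: 100.64.0.$((200 + rack))"
    echo "Rack ${rack} lower leaf switch loopback0: 100.64.0.$((99 + 2 * rack))  loopback1: 100.64.1.$((99 + 2 * rack))"
    echo "Rack ${rack} upper leaf switch loopback0: 100.64.0.$((100 + 2 * rack))  loopback1: 100.64.1.$((100 + 2 * rack))"

For example, running the script with argument 2 prints 100.64.0.202 for the spine switch, and 100.64.0.103/100.64.1.103 and 100.64.0.104/100.64.1.104 for the leaf switches, matching the loopback IP tables used later in this chapter.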

2.4.2 Cabling Two Racks Together

The simplest case of extending an Engineered System rack is to cable two racks together.

2.4.2.1 Cabling Two Racks Together–X8M

Use this procedure to cable together two racks that both use InfiniBand Transport Layer systems based on a RoCE Network Layer.

Note:

Cabling two X8M racks together requires downtime for both racks.

In this procedure, the existing rack is R1, and the new rack is R2.
  1. Ensure the new rack is near the existing rack.
    The RDMA Network Fabric cables must be able to reach the servers in each rack.
  2. Ensure you have a backup of the current switch configuration for each switch in the existing and new rack.

    For each switch, complete the steps in the Oracle Exadata Database Machine Maintenance Guide, section Backing Up Settings on the InfiniBand Transport Layer systems based on a RoCE Network Layer Switch.

  3. Shut down all servers on both the new rack (R2) and the existing rack (R1).
    The switches should remain available.
  4. Apply the multi-rack spine switch configuration to the two spine switches.
    1. Log in to the server that has downloaded the RDMA network switch patch ZIP file for the latest release.
      If you do not have the patch ZIP file available, you can download the patch for your Oracle Exadata System Software release. For example, for the 19.3 release, the patch is 29963277 - RDMA network switch (7.0(3)I7(6)) and InfiniBand network switch (2.2.13-2).
    2. Make a copy of the golden configuration file for each switch.

      After extracting the patch ZIP file, run these commands from the patch directory:

      # cp roce_switch_templates/roce_spine_switch_multi.cfg roce_spine_switch_multi_R1SS.cfg
      # cp roce_switch_templates/roce_spine_switch_multi.cfg roce_spine_switch_multi_R2SS.cfg
    3. Edit each copy of the spine switch configuration file.

      Using a text editor, replace each occurrence of %SPINE_LOOPBACK_IP0% with the correct IP address for the switch, as indicated in the table below.

      Switch SPINE_LOOPBACK_IP0
      Rack 1 spine switch (R1SS) 100.64.0.201
      Rack 2 spine switch (R2SS) 100.64.0.202
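
      If you prefer a scripted edit, a sed command can perform the replacement; the following is only a sketch using the file names and IP addresses from this example, so verify the resulting files before applying them:

        # sed -i 's/%SPINE_LOOPBACK_IP0%/100.64.0.201/g' roce_spine_switch_multi_R1SS.cfg
        # sed -i 's/%SPINE_LOOPBACK_IP0%/100.64.0.202/g' roce_spine_switch_multi_R2SS.cfg
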
    4. Apply the updated multi-rack configuration file to its corresponding spine switch.
      1. Log in to each spine switch, and remove the existing configuration file using the following command:

        delete bootflash:roce_spine_switch_multi.cfg
        

        For example:

        rack1sw-roces0(config)# delete bootflash:roce_spine_switch_multi.cfg
        Do you want to delete "/roce_spine_switch_multi.cfg" ? (yes/no/abort) [y] y
        rack1sw-roces0(config)#
      2. Log in to the server that contains the modified configuration files, and copy each file to its corresponding spine switch.

        # scp roce_spine_switch_multi_R1SS.cfg admin@R1SS_IP_Address:/
        # scp roce_spine_switch_multi_R2SS.cfg admin@R2SS_IP_Address:/
      3. Log in to each switch again, and copy the modified configuration into flash.

        On the spine switch for rack 1, you would use the following commands:

        run-script bootflash:roce_spine_switch_multi_R1SS.cfg | grep 'none'
        copy running-config startup-config

        On the spine switch for rack 2, you would use the following commands:

        run-script bootflash:roce_spine_switch_multi_R2SS.cfg | grep 'none'
        copy running-config startup-config
  5. Apply the multi-rack leaf switch configuration to the four leaf switches.

    For each switch, complete the following steps, where SW# represents the values R1LL, R1UL, R2LL, or R2UL, depending on which switch you are configuring.

    1. Log in to the server that has downloaded the RDMA network switch patch ZIP file for the latest release.
    2. Make a copy of the golden configuration file for each leaf switch.

      After extracting the patch ZIP file, run the following command four times from the patch directory, substituting for SW# the values R1LL, R1UL, R2LL, and R2UL.

      # cp roce_switch_templates/roce_leaf_switch_multi.cfg roce_leaf_switch_multi_SW#.cfg
    3. Edit each copy of the leaf switch configuration file to replace the loopback IP addresses.

      Using a text editor, replace each occurrence of %LEAF_LOOPBACK_IP0% and %LEAF_LOOPBACK_IP1% with the correct IP addresses for the leaf switch, as indicated in the table below.

      The scheme used for loopback IP addresses for the leaf switches in a 2-rack system is:

      Switch LEAF_LOOPBACK_IP0 LEAF_LOOPBACK_IP1
      Rack 1 Lower Leaf switch (R1LL) 100.64.0.101 100.64.1.101
      Rack 1 Upper Leaf switch (R1UL) 100.64.0.102 100.64.1.102
      Rack 2 Lower Leaf switch (R2LL) 100.64.0.103 100.64.1.103
      Rack 2 Upper Leaf switch (R2UL) 100.64.0.104 100.64.1.104
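
      If you prefer a scripted edit, a sed command can perform both replacements in one pass; the following is only a sketch using the R1LL file name and addresses from the table above, so repeat it for each leaf switch file and verify the results before applying them:

        # sed -i 's/%LEAF_LOOPBACK_IP0%/100.64.0.101/g; s/%LEAF_LOOPBACK_IP1%/100.64.1.101/g' roce_leaf_switch_multi_R1LL.cfg
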
    4. Apply the updated multi-rack configuration files to each corresponding leaf switch.
      1. Log in to each leaf switch, and remove the existing configuration file using the following command:

        delete bootflash:roce_leaf_switch.cfg
        

        For example:

        rack1sw-rocea0(config)# delete bootflash:roce_leaf_switch.cfg
        Do you want to delete "/roce_leaf_switch.cfg" ? (yes/no/abort) [y] y
        rack1sw-rocea0(config)#
      2. Log in to the server that contains the modified configuration files, and copy each file to its corresponding leaf switch.

        # scp roce_leaf_switch_multi_SW#.cfg admin@SW#_IP_Address:/
      3. Log in to each switch again, and copy the modified configuration into flash.

        On the upper leaf switch for rack 1, you would use the following commands:

        run-script bootflash:roce_leaf_switch_multi_R1UL.cfg | grep 'none'
        copy running-config startup-config

        On the lower leaf switch for rack 2, you would use the following commands:

        run-script bootflash:roce_leaf_switch_multi_R2LL.cfg | grep 'none'
        copy running-config startup-config
  6. Use patchmgr to verify the configuration of the RDMA Network Fabric switches against the golden configuration files.
    1. Create a file that contains the name or IP address of the leaf and spine switches on both racks.
      For example, create a file named roce_switches.lst. The file must contain the host name or IP address for the two spine switches and the four leaf switches, with each switch on a new line.
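
      For example, using the switch host names from this procedure, a roce_switches.lst file for the two racks might look like the following; substitute the host names or IP addresses used in your environment:

        rack1sw-roces0
        rack1sw-rocea0
        rack1sw-roceb0
        rack2sw-roces0
        rack2sw-rocea0
        rack2sw-roceb0
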
    2. Run patchmgr with the --verify-config option.

      In the following command, roce_switches.lst is a file that contains the switches to be queried, one per line.

      ./patchmgr --roceswitches roce_switches.lst --verify-config
  7. Perform the physical cabling of the switches.
    1. In Rack 2, remove the eight existing inter-switch connections between the two leaf switches, R2UL and R2LL.
    2. In Rack 2, cable each leaf switch using the tables in Two-Rack Cabling for X8M Racks.
    3. In Rack 1, remove the eight existing inter-switch connections between the two leaf switches, R1UL and R1LL.
    4. In Rack 1, cable each leaf switch using the tables in Two-Rack Cabling for X8M Racks.
  8. Confirm each switch is available and connected.

    For each of the six switches, confirm that the output of the show interface status command shows connected and 100G for the inter-switch ports. In the following example, the inter-switch ports on the leaf switches are Eth1/4 to Eth1/7 and Eth1/30 to Eth1/33, and on the spine switches they are Eth1/5 to Eth1/20.

    When run from a spine switch, the output should be similar to the following:

    rack1sw-roces0# show interface status
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    mgmt0         --                 connected routed    full    1000    -- 
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    ...
    Eth1/5        RouterPort5        connected routed    full    100G    QSFP-100G-CR4
    Eth1/6        RouterPort6        connected routed    full    100G    QSFP-100G-SR4
    Eth1/7        RouterPort7        connected routed    full    100G    QSFP-100G-CR4
    Eth1/8        RouterPort8        connected routed    full    100G    QSFP-100G-SR4
    Eth1/9        RouterPort9        connected routed    full    100G    QSFP-100G-CR4
    Eth1/10       RouterPort10       connected routed    full    100G    QSFP-100G-SR4
    Eth1/11       RouterPort11       connected routed    full    100G    QSFP-100G-CR4
    Eth1/12       RouterPort12       connected routed    full    100G    QSFP-100G-SR4
    Eth1/13       RouterPort13       connected routed    full    100G    QSFP-100G-CR4
    Eth1/14       RouterPort14       connected routed    full    100G    QSFP-100G-SR4
    Eth1/15       RouterPort15       connected routed    full    100G    QSFP-100G-CR4
    Eth1/16       RouterPort16       connected routed    full    100G    QSFP-100G-SR4
    Eth1/17       RouterPort17       connected routed    full    100G    QSFP-100G-CR4
    Eth1/18       RouterPort18       connected routed    full    100G    QSFP-100G-SR4
    Eth1/19       RouterPort19       connected routed    full    100G    QSFP-100G-CR4
    Eth1/20       RouterPort20       connected routed    full    100G    QSFP-100G-SR4
    Eth1/21       RouterPort21       xcvrAbsen      routed    full    100G    --
    ...

    When run from a leaf switch, the output should be similar to the following:

    rack1sw-rocea0# show interface status
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    mgmt0         --                 connected routed    full    1000    -- 
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    ...
    Eth1/4        RouterPort1        connected routed    full    100G    QSFP-100G-CR4
    Eth1/5        RouterPort2        connected routed    full    100G    QSFP-100G-CR4
    Eth1/6        RouterPort3        connected routed    full    100G    QSFP-100G-CR4
    Eth1/7        RouterPort4        connected routed    full    100G    QSFP-100G-CR4
    Eth1/8        celadm14           connected 3888      full    100G    QSFP-100G-CR4
    ...
    Eth1/29       celadm01           connected 3888      full    100G    QSFP-100G-CR4
    Eth1/30       RouterPort5        connected routed    full    100G    QSFP-100G-SR4
    Eth1/31       RouterPort6        connected routed    full    100G    QSFP-100G-SR4
    Eth1/32       RouterPort7        connected routed    full    100G    QSFP-100G-SR4
    Eth1/33       RouterPort8        connected routed    full    100G    QSFP-100G-SR4
    ...
  9. Verify each switch is able to see the switches it is connected to.

    Check the neighbor discovery for every switch in racks R1 and R2. Make sure that all switches are visible, and check the switch port assignments (leaf switches: ports Eth1/4 - Eth1/7, Eth1/30 - Eth1/33; spine switches: ports Eth1/5 - Eth1/20) against the tables in Two-Rack Cabling for X8M Racks.

    Log in to each switch and use the show lldp neighbors command. A spine switch should see the two leaf switches in each rack, but not the other spine switch. The output for a spine switch should be similar to the following:

    rack1sw-roces0# show lldp neighbors
    ...
    Device ID            Local Intf      Hold-time  Capability  Port ID
    rack1-adm0           mgmt0           120        BR          Ethernet1/47
    rack1sw-roceb0       Eth1/5     120        BR          Ethernet1/5
    rack2sw-roceb0       Eth1/6     120        BR          Ethernet1/5
    rack1sw-roceb0       Eth1/7     120        BR          Ethernet1/7
    rack2sw-roceb0       Eth1/8     120        BR          Ethernet1/7
    rack1sw-roceb0       Eth1/9     120        BR          Ethernet1/4
    rack2sw-roceb0       Eth1/10    120        BR          Ethernet1/4
    rack1sw-roceb0       Eth1/11    120        BR          Ethernet1/6
    rack2sw-roceb0       Eth1/12    120        BR          Ethernet1/6
    rack1sw-rocea0       Eth1/13    120        BR          Ethernet1/5
    rack2sw-rocea0       Eth1/14    120        BR          Ethernet1/5
    rack1sw-rocea0       Eth1/15    120        BR          Ethernet1/7
    rack2sw-rocea0       Eth1/16    120        BR          Ethernet1/7
    rack1sw-rocea0       Eth1/17    120        BR          Ethernet1/4
    rack2sw-rocea0       Eth1/18    120        BR          Ethernet1/4
    rack1sw-rocea0       Eth1/19    120        BR          Ethernet1/6 
    rack2sw-rocea0       Eth1/20    120        BR          Ethernet1/6
    Total entries displayed: 17

    Each leaf switch should see the two spine switches, but not the other leaf switches. The output for a leaf switch should be similar to the following:

    rack1sw-rocea0# show lldp neighbors
    ...
    Device ID            Local Intf      Hold-time  Capability  Port ID
    switch               mgmt0      120        BR          Ethernet1/46
    rack1sw-roces0       Eth1/4     120        BR          Ethernet1/17
    rack1sw-roces0       Eth1/5     120        BR          Ethernet1/13
    rack1sw-roces0       Eth1/6     120        BR          Ethernet1/19
    rack1sw-roces0       Eth1/7     120        BR          Ethernet1/15
    rack2sw-roces0       Eth1/30    120        BR          Ethernet1/17
    rack2sw-roces0       Eth1/31    120        BR          Ethernet1/13
    rack2sw-roces0       Eth1/32    120        BR          Ethernet1/19
    rack2sw-roces0       Eth1/33    120        BR          Ethernet1/15
    rocetoi-ext-sw       Eth1/36    120        BR          Ethernet1/49
    Total entries displayed: 10
  10. Power on all servers in racks R1 and R2.
  11. For each rack, confirm the multi-rack cabling by running the verify_roce_cables.py script.

    Refer to My Oracle Support Doc ID 2587717.1 for download and usage instructions.

    Note:

    For the ports on the leaf and spine switches that were cabled for the RDMA Network Fabric, ignore the FAIL status in the column "CABLE OK?". For the other ports on the switches (for database servers, storage servers, etc.) the status should be OK.

    The following output is a partial example of the command results:

    # ./verify_roce_cables.py -n nodes.rack1 -s switches.rack1
    SWITCH PORT (EXPECTED PEER)  LEAF-1 (rack1sw-rocea0)     : CABLE OK?  LEAF-2 (rack1sw-roceb0)    : CABLE OK?
    ----------- --------------   --------------------------- : --------   -----------------------    : ---------
    Eth1/4 (ISL peer switch)   : rack1sw-roces0 Ethernet1/17 : FAIL       rack1sw-roces0 Ethernet1/9 : FAIL
    Eth1/5 (ISL peer switch)   : rack1sw-roces0 Ethernet1/13 : FAIL       rack1sw-roces0 Ethernet1/5 : FAIL
    Eth1/6 (ISL peer switch)   : rack1sw-roces0 Ethernet1/19 : FAIL       rack1sw-roces0 Ethernet1/11: FAIL
    Eth1/7 (ISL peer switch)   : rack1sw-roces0 Ethernet1/15 : FAIL       rack1sw-roces0 Ethernet1/7 : FAIL
    Eth1/12 (celadm10)         : rack1celadm10 port-1        : OK         rack1celadm10 port-2       : OK
    Eth1/13 (celadm09)         : rack1celadm09 port-1        : OK         rack1celadm09 port-2       : OK
    Eth1/14 (celadm08)         : rack1celadm08 port-1        : OK         rack1celadm08 port-2       : OK
    ...
    Eth1/15 (adm08)            : rack1dbadm08 port-1         : OK         rack1dbadm08 port-2        : OK
    Eth1/16 (adm07)            : rack1dbadm07 port-1         : OK         rack1dbadm07 port-2        : OK
    Eth1/17 (adm06)            : rack1dbadm06 port-1         : OK         rack1dbadm06 port-2        : OK
    ...
    Eth1/30 (ISL peer switch)  : rack2sw-roces0 Ethernet1/17 : FAIL       rack2sw-roces0 Ethernet1/9 : FAIL
    Eth1/31 (ISL peer switch)  : rack2sw-roces0 Ethernet1/13 : FAIL       rack2sw-roces0 Ethernet1/5 : FAIL
    Eth1/32 (ISL peer switch)  : rack2sw-roces0 Ethernet1/19 : FAIL       rack2sw-roces0 Ethernet1/11: FAIL
    Eth1/33 (ISL peer switch)  : rack2sw-roces0 Ethernet1/15 : FAIL       rack2sw-roces0 Ethernet1/7 : FAIL
    
    # ./verify_roce_cables.py -n nodes.rack2 -s switches.rack2
    SWITCH PORT (EXPECTED PEER)  LEAF-1 (rack1sw-rocea0)     : CABLE OK?  LEAF-2 (rack1sw-roceb0)    : CABLE OK?
    ----------- --------------   --------------------------- : --------   -----------------------    : ---------
    Eth1/4 (ISL peer switch)  :  rack1sw-roces0 Ethernet1/18 : FAIL       rack1sw-roces0 Ethernet1/10: FAIL
    ...
  12. Confirm the network status in the current cluster between database servers and storage servers using the infinicheck command.
    # /opt/oracle.SupportTools/ibdiagtools/infinicheck -z
    
    # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hosts.lst -c cells.lst -b
    
    INFINICHECK                    
            [Network Connectivity, Configuration and Performance]        
                   
              ####  FABRIC TYPE TESTS  #### 
    System type identified: RoCE
    Verifying User Equivalance of user=root from all DBs to all CELLs.
         ####  RoCE CONFIGURATION TESTS  ####       
         Checking for presence of RoCE devices on all DBs and CELLs 
    [SUCCESS].... RoCE devices on all DBs and CELLs look good
         Checking for RoCE Policy Routing settings on all DBs and CELLs 
    [SUCCESS].... RoCE Policy Routing settings look good
         Checking for RoCE DSCP ToS mapping on all DBs and CELLs 
    [SUCCESS].... RoCE DSCP ToS settings look good
         Checking for RoCE PFC settings and DSCP mapping on all DBs and CELLs
    [SUCCESS].... RoCE PFC and DSCP settings look good
         Checking for RoCE interface MTU settings. Expected value : 2300
    [SUCCESS].... RoCE interface MTU settings look good
         Verifying switch advertised DSCP on all DBs and CELLs ports ( )
    [SUCCESS].... Advertised DSCP settings from RoCE switch looks good  
        ####  CONNECTIVITY TESTS  ####
        [COMPUTE NODES -> STORAGE CELLS] 
          (60 seconds approx.)       
        (Will walk through QoS values: 0-6) [SUCCESS]..........Results OK
    [SUCCESS]....... All  can talk to all storage cells          
        [COMPUTE NODES -> COMPUTE NODES]               
    ...
  13. After cabling the racks together, proceed to Configuring the New Hardware to finish the configuration of the new rack.
2.4.2.2 Cabling Two Racks Together–X8 and Earlier

Use this procedure to cable together two racks that both use InfiniBand Transport Layer systems based on an InfiniBand Network Layer.

This procedure assumes that the racks are adjacent to each other.
In the procedure, the existing rack is R1, and the new rack is R2.
  1. Set the priority of the current, active Subnet Manager Master to 10 on the spine switch, as follows:

    1. Log in to any RDMA Network Fabric switch on the active system.

    2. Use the getmaster command to determine that the Subnet Manager Master is running on the spine switch. If it is not, then follow the procedure Setting the Subnet Manager Master on Oracle Exadata Database Machine Full Rack and Oracle Exadata Database Machine Half Rack in Oracle Exadata Database Machine Installation and Configuration Guide.

    3. Log in to the spine switch.

    4. Use the disablesm command to stop Subnet Manager.

    5. Use the setsmpriority 10 command to set the priority to 10.

    6. Use the enablesm command to restart Subnet Manager.

    7. Repeat step 1.b to ensure the Subnet Manager Master is running on the spine switch.
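
    For reference, the command sequence from steps 1.d through 1.g, run on the spine switch, is sketched below (output omitted):

      # disablesm
      # setsmpriority 10
      # enablesm
      # getmaster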

  2. Ensure the new rack is near the existing rack. The RDMA Network Fabric cables must be able to reach the servers in each rack.

  3. Completely shut down the new rack (R2).

  4. Cable the two leaf switches R2 IB2 and R2 IB3 in the new rack according to Two-Rack Cabling. Note that you first need to remove the seven existing inter-switch connections between the two leaf switches, as well as the two connections between the leaf switches and the spine switch, in the new rack R2 only, not in the existing rack R1.

  5. Verify both RDMA Network Fabric interfaces are up on all database nodes and storage cells. You can do this by running the ibstat command on each node and verifying both interfaces are up.
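
    For example, assuming text files db_group and cell_group contain the database server and storage server host names, a check across all nodes might look like the following sketch; adjust the user names and the grep pattern to your environment:

      # dcli -g db_group -l root 'ibstat | grep -i "state:"'
      # dcli -g cell_group -l celladmin 'ibstat | grep -i "state:"'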

  6. Power off leaf switch R1 IB2. This causes all the database servers and Exadata Storage Servers to fail over their RDMA Network Fabric traffic to R1 IB3.

  7. Disconnect all seven inter-switch links between R1 IB2 and R1 IB3, as well as the one connection between R1 IB2 and the spine switch R1 IB1.

  8. Cable leaf switch R1 IB2 according to Two-Rack Cabling.

  9. Power on leaf switch R1 IB2.

  10. Wait for three minutes for R1 IB2 to become completely operational.

    To check the switch, log in to the switch and run the ibswitches command. The output should show three switches, R1 IB1, R1 IB2, and R1 IB3.

  11. Verify both RDMA Network Fabric interfaces are up on all database nodes and storage cells. You can do this by running the ibstat command on each node and verifying both interfaces are up.

  12. Power off leaf switch R1 IB3. This causes all the database servers and storage servers to fail over their RDMA Network Fabric traffic to R1 IB2.

  13. Disconnect the one connection between R1 IB3 and the spine switch R1 IB1.

  14. Cable leaf switch R1 IB3 according to Two-Rack Cabling.

  15. Power on leaf switch R1 IB3.

  16. Wait for three minutes for R1 IB3 to become completely operational.

    To check the switch, log in to the switch and run the ibswitches command. The output should show three switches, R1 IB1, R1 IB2, and R1 IB3.

  17. Power on all the InfiniBand switches in R2.

  18. Wait for three minutes for the switches to become completely operational.

    To check the switch, log in to the switch and run the ibswitches command. The output should show six switches, R1 IB1, R1 IB2, R1 IB3, R2 IB1, R2 IB2, and R2 IB3.

  19. Ensure the Subnet Manager Master is running on R1 IB1 by running the getmaster command from any switch.

  20. Power on all servers in R2.

  21. Log in to spine switch R1 IB1, and lower its priority to 8 as follows:

    1. Use the disablesm command to stop Subnet Manager.

    2. Use the setsmpriority 8 command to set the priority to 8.

    3. Use the enablesm command to restart Subnet Manager.

  22. Ensure Subnet Manager Master is running on one of the spine switches.

After cabling the racks together, proceed to Configuring the New Hardware to configure the racks.

2.4.3 Cabling Several Racks Together

You can cable several racks together by following a series of steps in a specific order.

This procedure assumes that the racks are adjacent to each other. The existing racks are R1, R2, ... Rn, the new rack is Rn+1. For example, if you have four racks and you are adding a fifth, the existing racks would be R1, R2, R3, and R4 and the new rack would be R5.

You can cable up to 18 racks together without additional switches.

The procedures for InfiniBand Transport Layer systems based on a RoCE Network Layer (X8M) are different from the procedures for InfiniBand Transport Layer systems based on an InfiniBand Network Layer (X8 and earlier).

2.4.3.1 Cabling Several Racks Together–X8M

To create a larger engineered system, you can cable several X8M racks together.

This procedure is for X8M InfiniBand Transport Layer systems based on a RoCE Network Layer.

In this procedure, the existing racks are R1, R2, … ,Rn, and the new rack is Rn+1. In the following steps, these example switch names are used:

  • rack5sw-roces0: Rack 5 Spine switch (SS)
  • rack5sw-rocea0: Rack 5 Lower Leaf switch (R5LL)
  • rack5sw-roceb0: Rack 5 Upper Leaf switch (R5UL)

Note:

Cabling three or more racks together requires no downtime for the existing racks R1, R2, …, Rn. Only the new rack, Rn+1, is powered down.
  1. Ensure the new rack is near the existing racks R1, R2, …, Rn.
    The RDMA Network Fabric cables must be able to reach the servers in each rack.
  2. Ensure you have a backup of the current switch configuration for each switch in the existing racks and the new rack.
    For each switch, complete the steps in the Oracle Exadata Database Machine Maintenance Guide, section Backing Up Settings on the InfiniBand Transport Layer systems based on a RoCE Network Layer Switch.
  3. Shut down all servers in the new rack Rn+1.
    Refer to Powering Off Oracle Exadata Rack. The switches must remain online and available.
  4. Apply the multi-rack spine switch configuration to the spine switch in the new rack Rn+1:
    1. Log in to the server that has downloaded the RDMA network switch patch ZIP file for the Oracle Exadata System Software release used by the existing racks.

      If you do not have the patch ZIP file available, you can download the latest patch for your Oracle Exadata System Software release. Refer to My Oracle Support Doc ID 888828.1 for information about the latest available patches.

      For example, for the 19.3 release, as of October 2019, the patch is 30345643 - RDMA network switch (7.0(3)I7(6)) and InfiniBand network switch (2.2.13-2).

    2. Make a copy of the golden configuration file for the new spine switch.

      After extracting the patch ZIP file, run this command from the patch directory, where n+1 is the number of the new rack:

      # cp roce_switch_templates/roce_spine_switch_multi.cfg roce_spine_switch_multi_Rn+1SS.cfg
    3. Edit the copy of the spine switch configuration file.

      Using a text editor, replace the three occurrences of %SPINE_LOOPBACK_IP0% with the correct IP address for the switch, as indicated in the table below, using the value that matches Rn+1 for your environment.

      Switch SPINE_LOOPBACK_IP0
      Rack 3 spine switch (R3SS) 100.64.0.203
      Rack 4 spine switch (R4SS) 100.64.0.204
      Rack 5 spine switch (R5SS) 100.64.0.205
      Rack 6 spine switch (R6SS) 100.64.0.206
      Rack 7 spine switch (R7SS) 100.64.0.207
      Rack 8 spine switch (R8SS) 100.64.0.208

      For example, if you are adding a rack to an existing 4-rack system (where n+1=5), then use IP address 100.64.0.205 as the SPINE_LOOPBACK_IP0 for the spine switch in the new rack (R5SS).

      ! Define loopback interface for underlay OSPF routing
      interface loopback0
       description Routing loopback interface
       !ip address 100.64.0.201/32
       ip address 100.64.0.205/32
       ip router ospf UNDERLAY area 0.0.0.0
      ! Configure OSPF as the underlay network
      router ospf UNDERLAY
       router-id 100.64.0.205
      ! change ECMP hash rotate value from default 32 to 40 for better
      ! router port utilization for upto parallel flows via the 8
      ! available router ports
      ip load-sharing address source-destination port source-destination rotate 40
      ! Create BGP route reflector to exchange routes across VTEPs
      ! Use CIDR block of IPs for neighbor range
      ! - log-neighbor-changes: Enables the generation of logging messages
      ! generated when the status of a BGP neighbor changes.
      ! - address-family ipv4 unicast: Enters address family configuration
      ! mode and Specifies IP Version 4 unicast address prefixes.
      ! address
      router bgp 65502
       router-id 100.64.0.205
       log-neighbor-changes
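
      As an alternative to hand-editing, a single sed command can make all three substitutions; the following is only a sketch using the R5SS file name and IP address from this example, so verify the result (as described in the next step) before applying the file:

        $ sed -i 's/%SPINE_LOOPBACK_IP0%/100.64.0.205/g' roce_spine_switch_multi_R5SS.cfg
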
    4. Verify the three replacements in the spine switch configuration file.

      For example, if you are adding a 5th rack, then check for IP address 100.64.0.205 in the spine switch configuration file:

      $ grep 100.64 roce_spine_switch_multi_R5SS.cfg | grep -v 'neighbor' | grep -v '!'
       ip address 100.64.0.205/32
       router-id 100.64.0.205
       router-id 100.64.0.205
    5. Apply the updated multi-rack configuration file to the spine switch in the new rack Rn+1:
      1. Log in to the switch in the new rack Rn+1, and remove the existing configuration file, if it exists. For example, if you are adding a 5th rack, you would use the following command:

        rack5sw-roces0(config)# delete bootflash:roce_spine_switch_multi.cfg
        Do you want to delete "/roce_spine_switch_multi.cfg" ? (yes/no/abort) [y] y
        rack5sw-roces0(config)#
      2. Log in to the server that contains the modified configuration file for the spine switch, and copy the file to the spine switch in the new rack. For example, if you are adding a 5th rack:

        # scp roce_spine_switch_multi_R5SS.cfg admin@R5SS_IP_Address:/
      3. Verify the modified file was copied successfully to the spine switch. For example, if you are adding a 5th rack, log in to the spine switch on the new rack Rn+1 again and use the following command:

        rack5sw-roces0(config)# dir bootflash:roce_spine_switch_multi_R5SS.cfg
             27360 Nov 20 12:12:50 2019 roce_spine_switch_multi_R5SS.cfg
        Usage for bootflash://sup-local
        1829572608 bytes used
        114893496320 bytes free
        116723068928 bytes total
      4. Copy the modified configuration into flash.

        For example, if you are adding a 5th rack, you would use the following commands:

        rack5sw-roces0(config)# run-script bootflash:roce_spine_switch_multi_R5SS.cfg | grep 'none'
        
        rack5sw-roces0(config)# copy running-config startup-config

        Note:

        The run-script command for a spine switch can take up to 2 minutes to complete.
  5. Apply the multi-rack leaf switch configuration to the leaf switches in the new rack Rn+1:

    For each leaf switch, complete the following steps, where SW# represents the values Rn+1LL or Rn+1UL, depending on which switch you are configuring.

    1. Log in to the server that has downloaded the RDMA network switch patch ZIP file (from Step 4.a) for the Oracle Exadata System Software release used by the existing racks.
    2. Make a copy of the golden configuration file for each leaf switch.

      After extracting the patch ZIP file, run the following command twice from the patch directory, substituting for SW# the values Rn+1LL and Rn+1UL.

      # cp roce_switch_templates/roce_leaf_switch_multi.cfg roce_leaf_switch_multi_SW#.cfg
    3. Edit each copy of the leaf switch configuration file to replace the loopback IP addresses:

      Using a text editor, replace the three occurrences of %LEAF_LOOPBACK_IP0% and one occurrence of %LEAF_LOOPBACK_IP1% with the correct IP addresses for the leaf switch, as indicated in the table below.

      Switch LEAF_LOOPBACK_IP0 LEAF_LOOPBACK_IP1
      Rack 3 Lower Leaf switch (R3LL) 100.64.0.105 100.64.1.105
      Rack 3 Upper Leaf switch (R3UL) 100.64.0.106 100.64.1.106
      Rack 4 Lower Leaf switch (R4LL) 100.64.0.107 100.64.1.107
      Rack 4 Upper Leaf switch (R4UL) 100.64.0.108 100.64.1.108
      Rack 5 Lower Leaf switch (R5LL) 100.64.0.109 100.64.1.109
      Rack 5 Upper Leaf switch (R5UL) 100.64.0.110 100.64.1.110
      Rack 6 Lower Leaf switch (R6LL) 100.64.0.111 100.64.1.111
      Rack 6 Upper Leaf switch (R6UL) 100.64.0.112 100.64.1.112
      Rack 7 Lower Leaf switch (R7LL) 100.64.0.113 100.64.1.113
      Rack 7 Upper Leaf switch (R7UL) 100.64.0.114 100.64.1.114
      Rack 8 Lower Leaf switch (R8LL) 100.64.0.115 100.64.1.115
      Rack 8 Upper Leaf switch (R8UL) 100.64.0.116 100.64.1.116

      For example, if you are adding a rack to an existing 4-rack system (where n+1=5), then use the loopback IP addresses listed in the table above for R5LL and R5UL.

      ! Define loopback interface for IGP protocol for VTEP reachability
      interface loopback0
       description Routing loopback interface
       !ip address 100.64.0.101/32
       ip address 100.64.0.109/32
       ip router ospf UNDERLAY area 0.0.0.0
      ! Define loopback interface for associating with local VTEP
      interface loopback1
       description VTEP loopback interface
       !ip address 100.64.1.101/32
       ip address 100.64.1.109/32
       ip router ospf UNDERLAY area 0.0.0.0
      ! Configure OSPF as the underlay network
      router ospf UNDERLAY
       router-id 100.64.0.109
      ! change ECMP hash rotate value from default 32 to 40 for better
      ! router port utilization for upto parallel flows via the 8
      ! available router ports
      ip load-sharing address source-destination port source-destination rotate 40
      ! - Create BGP route reflector to exchange routes across VTEPs
      ! Define max config 8 neighbor spines using their loopback IPs
      ! - BGP peers are located in an autonomous system (AS) that uses
      ! 4-byte AS numbers. Cisco recommends to pick a high value such
      ! as 65502 to avoid conflict with future bgp peers.
      ! - Create a template ‘BasePolicy’ that defines a peer policy
      ! template to define attributes for a particular address family.
      router bgp 65502
       router-id 100.64.0.109
       log-neighbor-changes
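
      As an alternative to hand-editing, sed commands can make the substitutions; the following is only a sketch using the R5LL and R5UL file names and IP addresses from this example, so verify the results (as described in the next step) before applying the files:

        $ sed -i 's/%LEAF_LOOPBACK_IP0%/100.64.0.109/g; s/%LEAF_LOOPBACK_IP1%/100.64.1.109/g' roce_leaf_switch_multi_R5LL.cfg
        $ sed -i 's/%LEAF_LOOPBACK_IP0%/100.64.0.110/g; s/%LEAF_LOOPBACK_IP1%/100.64.1.110/g' roce_leaf_switch_multi_R5UL.cfg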
      
    4. Verify the IP address replacements in each leaf switch configuration file.

      For example, if you are adding a 5th rack, then check for IP addresses 100.64.0.109 and 100.64.1.109 in the leaf switch configuration file for R5LL, and for IP addresses 100.64.0.110 and 100.64.1.110 in the leaf switch configuration file for R5UL:

      $ grep 100.64. roce_leaf_switch_multi_R5LL.cfg | grep -v neighbor | grep -v '!'
       ip address 100.64.0.109/32
       ip address 100.64.1.109/32
       router-id 100.64.0.109
       router-id 100.64.0.109
      
      $ grep 100.64. roce_leaf_switch_multi_R5UL.cfg | grep -v neighbor | grep -v '!'
       ip address 100.64.0.110/32
       ip address 100.64.1.110/32
       router-id 100.64.0.110
       router-id 100.64.0.110
    5. Apply the updated multi-rack configuration files to each corresponding leaf switch in the new rack:
      1. Log in to each leaf switch, and remove the existing configuration file. For example, if you are adding a 5th rack, you would use the following commands on each leaf switch:

        rack5sw-rocea0# delete bootflash:roce_leaf_switch.cfg
        Do you want to delete "/roce_leaf_switch.cfg" ? (yes/no/abort) [y] y
        
        rack5sw-rocea0# delete bootflash:roce_leaf_switch_multi.cfg
        No such file or directory
        rack5sw-roceb0# delete bootflash:roce_leaf_switch.cfg
        Do you want to delete "/roce_leaf_switch.cfg" ? (yes/no/abort) [y] y
        
        rack5sw-roceb0# delete bootflash:roce_leaf_switch_multi.cfg
        No such file or directory
      2. Log in to the server that contains the modified configuration files, and copy each file to its corresponding leaf switch.

        # scp roce_leaf_switch_multi_R5LL.cfg admin@rack5sw-rocea0:/
        User Access Verification
        Password:
        roce_leaf_switch_multi_R5LL.cfg 100% 167KB 487.6KB/s 00:00
        
        # scp roce_leaf_switch_multi_R5UL.cfg admin@rack5sw-roceb0:/
        User Access Verification
        Password:
        roce_leaf_switch_multi_R5UL.cfg
      3. Verify the modified files were copied successfully to the leaf switches. For example, if you are adding a 5th rack, log in to each leaf switch again and use the following commands:

        rack5sw-rocea0# dir bootflash:roce_leaf_switch_multi_R5LL.cfg
            171387 Nov 20 14:41:52 2019 roce_leaf_switch_multi_R5LL.cfg
        Usage for bootflash://sup-local
        2583580672 bytes used
        114139488256 bytes free
        116723068928 bytes total
        
        rack5sw-roceb0# dir bootflash:roce_leaf_switch_multi_R5UL.cfg
            171387 Nov 20 21:41:50 2019 roce_leaf_switch_multi_R5UL.cfg
        Usage for bootflash://sup-local
        2579836928 bytes used
        114143232000 bytes free
        116723068928 bytes total
      4. Copy the modified configuration file into flash.

        For example, if you are adding a 5th rack, you would use the following commands:

        rack5sw-rocea0(config)# run-script bootflash:roce_leaf_switch_multi_R5LL.cfg | grep 'none'
        
        rack5sw-rocea0(config)# copy running-config startup-config
        rack5sw-roceb0(config)# run-script bootflash:roce_leaf_switch_multi_R5UL.cfg | grep 'none'
        
        rack5sw-roceb0(config)# copy running-config startup-config

        Note:

        The run-script command for a leaf switch can take up to 6 minutes to complete.
  6. Use patchmgr to verify the configuration of the RDMA Network Fabric switches against the golden configuration files.
    1. Log in to the server that has downloaded the RDMA network switch patch ZIP file (from Step 4.a).
    2. Create a file that contains the name or IP address of the leaf and spine switches on all racks.
      For example, you create a file named roce_switches.lst. The file contains the host name or IP address for the spine switches and both leaf switches on each rack, with each switch on a new line.
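
      For example, for a five-rack system using the switch naming convention from this procedure, roce_switches.lst would contain 15 entries similar to the following; substitute the host names or IP addresses used in your environment:

        rack1sw-roces0
        rack1sw-rocea0
        rack1sw-roceb0
        rack2sw-roces0
        rack2sw-rocea0
        rack2sw-roceb0
        rack3sw-roces0
        rack3sw-rocea0
        rack3sw-roceb0
        rack4sw-roces0
        rack4sw-rocea0
        rack4sw-roceb0
        rack5sw-roces0
        rack5sw-rocea0
        rack5sw-roceb0
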
    3. Run patchmgr with the --verify-config option.

      In the following command, roce_switches.lst is a file that contains the switches to be queried.

      $ ./patchmgr --roceswitches roce_switches.lst --verify-config --log_dir /tmp
      
      2019-11-20 14:12:27 -0800 :Working: Initiate config verify on RoCE switches from . Expect up to 6 minutes for each switch
                                                         
      
      2019-11-20 14:12:30 -0800 1 of 15 :Verifying config on switch rack1sw-rocea0
      
      2019-11-20 14:12:30 -0800: [INFO ] Dumping current running config locally as file: /tmp/run.rack1sw-rocea0.cfg
      2019-11-20 14:12:33 -0800: [SUCCESS ] Backed up switch config successfully
      2019-11-20 14:12:33 -0800: [INFO ] Validating running config against template [1/3]: /tmp/patch_switch_19.3.1.0.0.191018/roce_switch_templates/roce_leaf_switch.cfg
      2019-11-20 14:12:33 -0800: [INFO ] Validating running config against template [2/3]: /tmp/patch_switch_19.3.1.0.0.191018/roce_switch_templates/roce_leaf_switch_multi.cfg
      2019-11-20 14:12:33 -0800: [INFO ] Config matches template: /tmp/patch_switch_19.3.1.0.0.191018/roce_switch_templates/roce_leaf_switch_multi.cfg
      2019-11-20 14:12:33 -0800: [SUCCESS ] Config validation successful!
      
      
      2019-11-20 14:12:33 -0800 2 of 15 :Verifying config on switch rack1sw-roceb0
      ...
  7. Perform the physical cabling of the switches in the new rack Rn+1.

    Caution:

    Cabling within a live network must be done carefully in order to avoid potentially serious disruptions.
    1. Remove the eight existing inter-switch connections between each leaf switch in the new rack Rn+1 (ports 4, 5, 6, 7 and 30, 31, 32, 33).
    2. Cable the leaf switches in the new rack according to the appropriate table in Multi-Rack Cabling Tables.

      For example, if you are adding a 5th rack and rack Rn+1 is R5, then use "Table 4-14 Leaf Switch Connections for the Fifth Rack in a Five-Rack System".

  8. Add the new rack to the switches in the existing racks (R1 to Rn).
    1. For an existing rack (Rx), cable the lower leaf switch RxLL according to the appropriate table in Multi-Rack Cabling Tables.
    2. For the same rack, cable the upper leaf switch RxUL according to the appropriate table in Multi-Rack Cabling Tables.
    3. Repeat these steps for each existing rack, R1 to Rn.
  9. Confirm each switch is available and connected.

    For each switch in racks R1, R2, …, Rn, Rn+1, confirm that the output of the show interface status command shows connected and 100G for the inter-switch ports. In the following example, the inter-switch ports on the leaf switches are Eth1/4 to Eth1/7 and Eth1/30 to Eth1/33, and on the spine switches they are Eth1/5 to Eth1/20.

    When run from a spine switch, the output should be similar to the following:

    rack1sw-roces0# show interface status
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    mgmt0         --                 connected routed    full    1000    -- 
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    ...
    Eth1/5        RouterPort5        connected routed    full    100G    QSFP-100G-CR4
    Eth1/6        RouterPort6        connected routed    full    100G    QSFP-100G-SR4
    Eth1/7        RouterPort7        connected routed    full    100G    QSFP-100G-CR4
    Eth1/8        RouterPort8        connected routed    full    100G    QSFP-100G-SR4
    Eth1/9        RouterPort9        connected routed    full    100G    QSFP-100G-CR4
    Eth1/10       RouterPort10       connected routed    full    100G    QSFP-100G-SR4
    Eth1/11       RouterPort11       connected routed    full    100G    QSFP-100G-CR4
    Eth1/12       RouterPort12       connected routed    full    100G    QSFP-100G-SR4
    Eth1/13       RouterPort13       connected routed    full    100G    QSFP-100G-CR4
    Eth1/14       RouterPort14       connected routed    full    100G    QSFP-100G-SR4
    Eth1/15       RouterPort15       connected routed    full    100G    QSFP-100G-CR4
    Eth1/16       RouterPort16       connected routed    full    100G    QSFP-100G-SR4
    Eth1/17       RouterPort17       connected routed    full    100G    QSFP-100G-CR4
    Eth1/18       RouterPort18       connected routed    full    100G    QSFP-100G-SR4
    Eth1/19       RouterPort19       connected routed    full    100G    QSFP-100G-CR4
    Eth1/20       RouterPort20       connected routed    full    100G    QSFP-100G-SR4
    Eth1/21       RouterPort21       xcvrAbsen      routed    full    100G    --
    ...

    When run from a leaf switch, the output should be similar to the following:

    rack1sw-rocea0# show interface status
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    mgmt0         --                 connected routed    full    1000    -- 
    --------------------------------------------------------------------------------
    Port          Name               Status    Vlan      Duplex  Speed   Type
    --------------------------------------------------------------------------------
    ...
    Eth1/4        RouterPort1        connected routed    full    100G    QSFP-100G-CR4
    Eth1/5        RouterPort2        connected routed    full    100G    QSFP-100G-CR4
    Eth1/6        RouterPort3        connected routed    full    100G    QSFP-100G-CR4
    Eth1/7        RouterPort4        connected routed    full    100G    QSFP-100G-CR4
    Eth1/8        celadm14           connected 3888      full    100G    QSFP-100G-CR4
    ...
    Eth1/29       celadm01           connected 3888      full    100G    QSFP-100G-CR4
    Eth1/30       RouterPort5        connected routed    full    100G    QSFP-100G-SR4
    Eth1/31       RouterPort6        connected routed    full    100G    QSFP-100G-SR4
    Eth1/32       RouterPort7        connected routed    full    100G    QSFP-100G-SR4
    Eth1/33       RouterPort8        connected routed    full    100G    QSFP-100G-SR4
    ...
  10. Check the neighbor discovery for every switch in racks R1, R2, …, Rn, Rn+1.
    Make sure that all switches are visible, and check the switch port assignments (leaf switches: ports Eth1/4 - Eth1/7, Eth1/30 - Eth1/33; spine switches: ports Eth1/5 - Eth1/20) against the appropriate table in Multi-Rack Cabling Tables.

    Log in to each switch and use the show lldp neighbors command.

    Each spine switch should see all the leaf switches in each rack, but not the other spine switches. The output for a spine switch should be similar to the following:

    rack1sw-roces0# show lldp neighbors | grep roce
    rack1sw-roceb0 Eth1/5 120 BR Ethernet1/5
    rack2sw-roceb0 Eth1/6 120 BR Ethernet1/5
    rack1sw-roceb0 Eth1/7 120 BR Ethernet1/7
    rack2sw-roceb0 Eth1/8 120 BR Ethernet1/7
    rack1sw-roceb0 Eth1/9 120 BR Ethernet1/4
    rack2sw-roceb0 Eth1/10 120 BR Ethernet1/4
    rack3sw-roceb0 Eth1/11 120 BR Ethernet1/5
    rack3sw-roceb0 Eth1/12 120 BR Ethernet1/7
    rack1sw-rocea0 Eth1/13 120 BR Ethernet1/5
    rack2sw-rocea0 Eth1/14 120 BR Ethernet1/5
    rack1sw-rocea0 Eth1/15 120 BR Ethernet1/7
    rack2sw-rocea0 Eth1/16 120 BR Ethernet1/7
    rack3sw-rocea0 Eth1/17 120 BR Ethernet1/5
    rack2sw-rocea0 Eth1/18 120 BR Ethernet1/4
    rack3sw-rocea0 Eth1/19 120 BR Ethernet1/7
    rack3sw-rocea0 Eth1/20 120 BR Ethernet1/4 

    Each leaf switch should see the spine switch in every rack, but not the other leaf switches. The output for a leaf switch should be similar to the following:

    rack1sw-rocea0# show lldp neighbors | grep roce
    rack3sw-roces0 Eth1/4 120 BR Ethernet1/13
    rack1sw-roces0 Eth1/5 120 BR Ethernet1/13
    rack3sw-roces0 Eth1/6 120 BR Ethernet1/15
    rack1sw-roces0 Eth1/7 120 BR Ethernet1/15
    rack2sw-roces0 Eth1/30 120 BR Ethernet1/17
    rack2sw-roces0 Eth1/31 120 BR Ethernet1/13
    rack3sw-roces0 Eth1/32 120 BR Ethernet1/17
    rack2sw-roces0 Eth1/33 120 BR Ethernet1/15
  11. Power on all the servers in the new rack, Rn+1.
  12. For each rack, confirm the multi-rack cabling by running the verify_roce_cables.py script.

    Refer to My Oracle Support Doc ID 2587717.1 for download and usage instructions.

    Note:

    For the ports on the leaf and spine switches that were cabled for the RDMA Network Fabric, ignore the FAIL status in the column "CABLE OK?". For the other ports on the switches (for database servers, storage servers, etc.) the status should be OK.

    The following output is a partial example of the command results:

    # ./verify_roce_cables.py -n nodes.rack1 -s switches.rack1
    SWITCH PORT (EXPECTED PEER)  LEAF-1 (rack1sw-rocea0)     : CABLE OK?  LEAF-2 (rack1sw-roceb0)    : CABLE OK?
    ----------- --------------   --------------------------- : --------   -----------------------    : ---------
    Eth1/4 (ISL peer switch)   : rack1sw-roces0 Ethernet1/17 : FAIL       rack1sw-roces0 Ethernet1/9 : FAIL
    Eth1/5 (ISL peer switch)   : rack1sw-roces0 Ethernet1/13 : FAIL       rack1sw-roces0 Ethernet1/5 : FAIL
    Eth1/6 (ISL peer switch)   : rack1sw-roces0 Ethernet1/19 : FAIL       rack1sw-roces0 Ethernet1/11: FAIL
    Eth1/7 (ISL peer switch)   : rack1sw-roces0 Ethernet1/15 : FAIL       rack1sw-roces0 Ethernet1/7 : FAIL
    Eth1/12 (celadm10)         : rack1celadm10 port-1        : OK         rack1celadm10 port-2       : OK
    Eth1/13 (celadm09)         : rack1celadm09 port-1        : OK         rack1celadm09 port-2       : OK
    Eth1/14 (celadm08)         : rack1celadm08 port-1        : OK         rack1celadm08 port-2       : OK
    ...
    Eth1/15 (adm08)            : rack1dbadm08 port-1         : OK         rack1dbadm08 port-2        : OK
    Eth1/16 (adm07)            : rack1dbadm07 port-1         : OK         rack1dbadm07 port-2        : OK
    Eth1/17 (adm06)            : rack1dbadm06 port-1         : OK         rack1dbadm06 port-2        : OK
    ...
    Eth1/30 (ISL peer switch)  : rack2sw-roces0 Ethernet1/17 : FAIL       rack2sw-roces0 Ethernet1/9 : FAIL
    Eth1/31 (ISL peer switch)  : rack2sw-roces0 Ethernet1/13 : FAIL       rack2sw-roces0 Ethernet1/5 : FAIL
    Eth1/32 (ISL peer switch)  : rack2sw-roces0 Ethernet1/19 : FAIL       rack2sw-roces0 Ethernet1/11: FAIL
    Eth1/33 (ISL peer switch)  : rack2sw-roces0 Ethernet1/15 : FAIL       rack2sw-roces0 Ethernet1/7 : FAIL
    
  13. Confirm the network status in the current cluster between database servers and storage servers using the infinicheck command.

    Complete the steps documented in Verifying RDMA Network Fabric based on RoCE Operation.

    Note:

    If SSH equivalency has not been set up, first run infinicheck -s.
    # /opt/oracle.SupportTools/ibdiagtools/infinicheck -z
    
    # /opt/oracle.SupportTools/ibdiagtools/infinicheck -g hosts.lst -c cells.lst -b
    
    INFINICHECK                    
            [Network Connectivity, Configuration and Performance]        
                   
              ####  FABRIC TYPE TESTS  #### 
    System type identified: RoCE
    Verifying User Equivalance of user=root from all DBs to all CELLs.
         ####  RoCE CONFIGURATION TESTS  ####       
         Checking for presence of RoCE devices on all DBs and CELLs 
    [SUCCESS].... RoCE devices on all DBs and CELLs look good
         Checking for RoCE Policy Routing settings on all DBs and CELLs 
    [SUCCESS].... RoCE Policy Routing settings look good
         Checking for RoCE DSCP ToS mapping on all DBs and CELLs 
    [SUCCESS].... RoCE DSCP ToS settings look good
         Checking for RoCE PFC settings and DSCP mapping on all DBs and CELLs
    [SUCCESS].... RoCE PFC and DSCP settings look good
         Checking for RoCE interface MTU settings. Expected value : 2300
    [SUCCESS].... RoCE interface MTU settings look good
         Verifying switch advertised DSCP on all DBs and CELLs ports ( )
    [SUCCESS].... Advertised DSCP settings from RoCE switch looks good  
        ####  CONNECTIVITY TESTS  ####
        [COMPUTE NODES -> STORAGE CELLS] 
          (60 seconds approx.)       
        (Will walk through QoS values: 0-6) [SUCCESS]..........Results OK
    [SUCCESS]....... All  can talk to all storage cells          
        [COMPUTE NODES -> COMPUTE NODES]               
    ...
  14. After cabling the racks together, proceed to Configuring the New Hardware to finish the configuration of the new rack.
2.4.3.2 Cabling Several Racks Together–X8 and Earlier

You can cable several racks together by following a series of steps in a specific order.

The Subnet Manager Master is assumed to be running on the first InfiniBand Network Layer switch in the first rack (R1 IB1).

This procedure is for X8 and earlier: InfiniBand Transport Layer systems based on an InfiniBand Network Layer.

  1. Set the priority of the current, active Subnet Manager Master to 10 on the spine switch.
    1. Log in to any InfiniBand switch on the active system.
    2. Use the getmaster command to determine that the Subnet Manager Master is running on the spine switch.

      The following example shows that the Subnet Manager Master is running on the spine switch dm01sw-ib1.

      # getmaster
      20100701 11:46:38 OpenSM Master on Switch : 0x0021283a8516a0a0 ports 36 Sun DCS 36
      QDR switch dm01sw-ib1.example.com enhanced port 0 lid 1 lmc 0

      If the Subnet Manager Master is not running on the spine switch, then perform the following steps:
      1. Use the getmaster command to identify the current location of the Subnet Manager Master.

      2. Log in as the root user on the leaf switch that is the Subnet Manager Master.

      3. Disable Subnet Manager on the switch. The Subnet Manager Master relocates to another switch.

      4. Use the getmaster command to identify the current location of the Subnet Manager Master. If the spine switch is not the Subnet Manager Master, then repeat steps 1.b.ii and 1.b.iii until the spine switch is the Subnet Manager Master.

      5. Enable Subnet Manager on the leaf switches that were disabled during this procedure.

    3. Log in to the spine switch.
    4. Use the disablesm command to stop the Subnet Manager.
    5. Use the setsmpriority 10 command to set the priority to 10.
    6. Use the enablesm command to restart the Subnet Manager.
    7. Repeat step 1.b to ensure that the Subnet Manager Master is running on the spine switch.
  2. Ensure the new rack is near the existing rack.
    The InfiniBand cables must be able to reach the servers in each rack.
  3. Completely shut down the new rack (Rn+1).
  4. Cable the leaf switch in the new rack according to the appropriate table in Multi-Rack Cabling Tables.

    For example, if rack Rn+1 is R4, then use Table 5-9.

    Caution:

    Cabling within a live network must be done carefully in order to avoid potentially serious disruptions.

    The cabling table that you use for your new InfiniBand topology tells you how to connect ports on the leaf switches to ports on spine switches in order to connect the racks. Some of these ports on the spine switches might be already in use to support the existing InfiniBand topology. In these cases, connect only the cable on the leaf switch in the new rack and stop there for now. Make note of which cables you were not able to terminate.

    Do not unplug any cables on the spine switch in the existing rack at this point. Step 5 describes how to re-cable the leaf switches on the existing racks (one leaf switch after the other - while the leaf switch being re-cabled will be powered off), which will free up these currently in-use ports. At that point, you can connect the other end of the cable from the leaf switch in the new rack to the spine switch in the existing rack as indicated in the table.

  5. Complete the following procedure for each of the original racks:
    In these steps, Rx represents a rack number from R1 to Rn.
    1. Power off leaf switch Rx IB2.
      This causes all servers in the rack to fail over their InfiniBand traffic to Rx IB3.
    2. Cable leaf switch Rx IB2 according to Multi-Rack Cabling Tables.
    3. Power on leaf switch Rx IB2.
    4. Wait at least three minutes for Rx IB2 to become completely operational.

      To check the switch, log in to the switch and run the ibswitches command. The output should show n*3 switches for IB1, IB2, and IB3 in racks R1, R2, ... Rn.

    5. Power off leaf switch Rx IB3.
      This causes all servers in the rack to fail over their InfiniBand traffic to Rx IB2.
    6. Cable leaf switch Rx IB3 according to Multi-Rack Cabling Tables.
    7. Power on leaf switch Rx IB3.
    8. Wait at least three minutes for Rx IB3 to become completely operational.

      To check the switch, log in to the switch and run the ibswitches command. The output should show n*3 switches for IB1, IB2, and IB3 in racks R1, R2, ... Rn.

    All racks should now be rewired according to Multi-Rack Cabling Tables.

  6. Power on all the InfiniBand switches in the new rack.
  7. Wait three minutes for the switches to become completely operational.

    To check the switch, log in to the switch and run the ibswitches command. The output should show (n+1)*3 switches for IB1, IB2, and IB3 in racks R1, R2, ... Rn+1.

  8. Ensure that the Subnet Manager Master is running on R1 IB1 by running the getmaster command from any switch.
  9. Power on all servers in the new rack (Rn+1).
  10. Log in to spine switch R1 IB1, and lower its priority to 8.
    1. Use the disablesm command to stop Subnet Manager.
    2. Use the setsmpriority 8 command to set the priority to 8.
    3. Use the enablesm command to restart Subnet Manager.
  11. Ensure that the Subnet Manager Master is running on one of the spine switches using the getmaster command from any switch.
  12. Ensure that the Subnet Manager is running on every spine switch by entering the following command from any switch:
    ibdiagnet -r

    Each spine switch should show as running in the Summary Fabric SM-state-priority section of the output. If a spine switch is not running, then log in to the switch and enable the Subnet Manager using the enablesm command.

  13. If there are now four or more racks, then log in to the leaf switches in each rack and disable Subnet Manager using the disablesm command.