4.9.3 Changing InfiniBand Network Information

This procedure describes how to change the InfiniBand network information.

The procedure described in this section is based on the following assumptions:

  • All changes should be done as the ilom-admin user using the Integrated Lights Out Manager (ILOM) interface.

  • Channel bonding is used for the client access network, such that the NET1 and NET2 interfaces are bonded to create BONDETH0. If channel bonding is not used, then replace BONDETH0 with NET1 in the procedure.

  • On Oracle Exadata X4-2 and later hardware, as of Oracle Exadata System Software release 11.2.3.3.0, the name used for InfiniBand bonding changed from BONDIB0 to IB0 and IB1. On those systems, the ib0 and ib1 interfaces (ifcfg-ib0 and ifcfg-ib1 files) are changed in the same way as the bondib0 interface (ifcfg-bondib0 file) shown in this procedure.

  • As of Oracle Exadata System Software release 11.2.2.1.0, the names used for bonding changed. The names are BONDIB0 for the InfiniBand bonding and BONDETH0 for Ethernet bonding. In earlier releases, the names were BOND0 and BOND1, respectively.

  • The procedure uses the dcli utility and the root user. This significantly reduces the overall time to complete the procedure by running the commands in parallel on the database servers.

  • The dcli utility requires SSH user-equivalence. If SSH user-equivalence is not configured, then some commands must be run explicitly on each database server.

  • The database group file, dbs_group, must exist and be located in the /root directory.

  • Ensure recent backups of the Oracle Cluster Registry (OCR) exist before changing the InfiniBand network information. OCR backups are located in the $Grid_home/cdata/cluster-name directory, where Grid_home represents the location of your Oracle Grid Infrastructure software installation. An example command for listing existing backups follows this list of assumptions.

  • Starting with Oracle Grid Infrastructure 11g release 2 (11.2), the private network configuration is stored in the Grid Plug and Play (GPNP) profile as well as the OCR. If the GPNP definition is not correct, then Oracle Clusterware CRS does not start. Take a backup of the GPNP profile on all nodes before changing the InfiniBand network information using the following commands:

    $ cd $Grid_home/gpnp/hostname/profiles/peer/
    $ cp -p profile.xml profile.xml.bk
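
    To confirm that recent OCR backups exist, as noted in the assumptions above, list the automatic backups as the root user; Grid_home is the same placeholder used throughout this procedure:

    # Grid_home/bin/ocrconfig -showbackup

    If no recent backup exists, a manual backup can be taken with the ocrconfig -manualbackup command before proceeding.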
    
  1. Determine if the CLUSTER_INTERCONNECTS parameter is used in the Oracle Database and Oracle ASM instances.
    SQL> SELECT inst_id, name, value FROM gv$parameter
         WHERE name = 'cluster_interconnects';

    If the CLUSTER_INTERCONNECTS parameter is set in OCR, then no value is returned. If the CLUSTER_INTERCONNECTS parameter is defined in the server parameter file (SPFILE), then the query returns an IP address for each instance, and those addresses need to be changed to the new IP addresses.

    The following is an example of the commands to change the IP addresses for the Oracle ASM instances. In the example, the IP address 192.168.10.1 is the new IP address assigned to BONDIB0 on the server where the +ASM1 instance runs, 192.168.10.2 is the IP address for BONDIB0 on the server where the +ASM2 instance runs, and so on.

    ALTER SYSTEM SET CLUSTER_INTERCONNECTS='192.168.10.1' SCOPE=SPFILE SID='+ASM1';
    ALTER SYSTEM SET CLUSTER_INTERCONNECTS='192.168.10.2' SCOPE=SPFILE SID='+ASM2';
    ALTER SYSTEM SET CLUSTER_INTERCONNECTS='192.168.10.3' SCOPE=SPFILE SID='+ASM3';
    ...

    Use a similar command to change the IP addresses for each Oracle Database instance that was returned.
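
    For example, assuming the database in this procedure is named dbm with instances dbm1, dbm2, and so on (substitute the instance names returned by the query), the commands would be similar to the following:

    ALTER SYSTEM SET CLUSTER_INTERCONNECTS='192.168.10.1' SCOPE=SPFILE SID='dbm1';
    ALTER SYSTEM SET CLUSTER_INTERCONNECTS='192.168.10.2' SCOPE=SPFILE SID='dbm2';
    ...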

  2. Verify the assignment of the new InfiniBand network information for all servers.
    Verification should include the InfiniBand IP addresses, netmask, broadcast, and network IP information.
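
    Before making any changes, you may also want to record the current InfiniBand settings on all database servers for comparison with the planned values; if SSH user-equivalence is configured, a check similar to the following can be used:

    # dcli -l root -g /root/dbs_group "ifconfig bondib0 | grep 'inet addr'"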
  3. Shut down all cluster-managed services on each database server as the oracle user.
    $ srvctl stop home -o db_home -s state_filename -n node_name
    

    In the preceding command, db_home is the full directory name for the Oracle Database home directory, state_filename is the path name where you want the state file to be written, and node_name is the name of the database server. The following is an example of the command:

    $ srvctl stop home -o /u01/app/oracle/product/11.2.0.3/dbhome_1 -s \
    /tmp/dm02db01_dbhome -n dm02db01
    

    In the preceding example, /u01/app/oracle/product/11.2.0.3/dbhome_1 is the Oracle Database home directory, /tmp/dm02db01_dbhome is the state file name, and dm02db01 is the name of the database server.

  4. Modify the cluster interconnect definition to use the new InfiniBand network. These commands are run from the first database server.

    Note:

    At this point, only Oracle Clusterware (CRS) and the Oracle ASM instances are started.
    1. Log in as the oracle user.
    2. Set $ORACLE_HOME to the Oracle Grid Infrastructure home.
    3. Set the ORACLE_SID environment variable to the Oracle ASM instance running on this database server, for example:
      $ ORACLE_SID=+ASM1
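
      A complete environment setup might look like the following; the Grid home path shown is a hypothetical example and must match your installation:

      $ export ORACLE_HOME=/u01/app/12.1.0.2/grid   # hypothetical Grid home path
      $ export ORACLE_SID=+ASM1
      $ export PATH=$ORACLE_HOME/bin:$PATH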
      
    4. List the available cluster interfaces.
      $ oifcfg iflist
      

      The following is an example of the output:

      bondeth0 10.128.174.160
      bondeth1 10.128.176.0
      eth0 10.128.174.128
      ib0 192.168.160.0
      ib0 169.254.0.0
      ib1 192.168.160.0
      ib1 169.254.128.0
      
    5. List the currently assigned cluster interfaces.
      $ oifcfg getif
      

      The following is an example of the output:

      bondeth0 10.204.76.0 global public
      ib0 192.168.16.0 global cluster_interconnect,asm
      ib1 192.168.16.0 global cluster_interconnect,asm
      
    6. Assign the new InfiniBand subnet to the ib0 and ib1 interfaces as global cluster interconnect interfaces.
      $ oifcfg setif -global ib0/192.168.8.0:cluster_interconnect
      $ oifcfg setif -global ib1/192.168.8.0:cluster_interconnect
    7. List the current interfaces.
      $ oifcfg getif
      

      The following is an example of the output:

      bondeth0 10.128.174.160 global public
      ib0 192.168.8.0 global cluster_interconnect
      ib1 192.168.8.0 global cluster_interconnect
      

      The old private network is removed in a later step.

  5. Shut down Oracle Clusterware (CRS) on each database server.
    1. Log in as the root user.
    2. Shut down Oracle Clusterware CRS on each database server using the following command:
      # Grid_home/grid/bin/crsctl stop crs -f
      
    3. Disable automatic Oracle Clusterware CRS restart on each database server.
      # Grid_home/grid/bin/crsctl disable crs
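
      If SSH user-equivalence is configured, the dcli utility can run the stop and disable commands on all database servers in parallel, for example:

      # dcli -l root -g /root/dbs_group "Grid_home/grid/bin/crsctl stop crs -f"
      # dcli -l root -g /root/dbs_group "Grid_home/grid/bin/crsctl disable crs"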
      
  6. Change the InfiniBand IP addresses on each Oracle Exadata Storage Server.
    1. Log in as the root user.
    2. Shut down the cell services.
      # cellcli -e alter cell shutdown services all
        Stopping the RS, CELLSRV, and MS services...  The SHUTDOWN of services was successful.
    3. Run the ipconf command.

      The following is an example of the prompts and responses for the ipconf command. Changes are applied after the prompt for basic Integrated Lights Out Manager (ILOM) settings.

      # ipconf
      
      Logging started to /var/log/cellos/ipconf.log
      Interface ib0 is Linked.  hca: mlx4_0
      Interface ib1 is Linked.  hca: mlx4_0
      Interface eth0 is Linked.  driver/mac: ixgbe/00:00:00:00:cd:01
      Interface eth1 is ... Unlinked.  driver/mac: ixgbe/00:00:00:00:cd:02
      Interface eth2 is ... Unlinked.  driver/mac: ixgbe/00:00:00:00:cd:03
      Interface eth3 is ... Unlinked.  driver/mac: ixgbe/00:00:00:00:cd:04
       
      Network interfaces
      Name     State      IP address      Netmask         Gateway         Net type     Hostname
      ib0      Linked
      ib1      Linked
      eth0     Linked
      eth1     Unlinked
      eth2     Unlinked
      eth3     Unlinked
      Warning. Some network interface(s) are disconnected. Check cables and switches and retry
      Do you want to retry (y/n) [y]: n
       
      The current nameserver(s): 192.0.2.10 192.0.2.12 192.0.2.13
      Do you want to change it (y/n) [n]:
      The current timezone: America/Los_Angeles
      Do you want to change it (y/n) [n]:
      The current NTP server(s): 192.0.2.06 192.0.2.12 192.0.2.13
      Do you want to change it (y/n) [n]:
       
      Network interfaces
      Name     State           IP address    Netmask        Gateway       Net type            Hostname
      eth0     Linked       192.0.2.151  255.255.252.0 192.0.2.15    Management   myg.example.com
      eth1     Unlinked
      eth2     Unlinked
      eth3     Unlinked
      bondib0  ib0,ib1      192.168.13.101 255.255.252.0  Private             myg-priv.example.com
      Select interface name to configure or press Enter to continue: bondib0
      Selected interface. bondib0
      IP address or none [192.168.13.101]: 192.168.10.3
      Netmask [255.255.252.0]:255.255.248.0
      Fully qualified hostname or none [myg-priv.example.com]:
      Continue configuring or re-configuring interfaces? (y/n) [y]: n
       
      Select canonical hostname from the list below
      1: myg.example.com
      2: myg-priv.example.com 
      Canonical fully qualified domain name [1]:
       
      Select default gateway interface from the list below
      1: eth0
      Default gateway interface [1]:
       
      Canonical hostname: myg.example.com
      Nameservers: 192.0.2.10 192.0.2.12 192.0.2.13
      Timezone: America/Los_Angeles
      NTP servers: 192.0.2.06 192.0.2.12 192.0.2.13
      Default gateway device: eth0
      Network interfaces
      Name     State      IP address      Netmask         Gateway         Net type     Hostname
      eth0     Linked     192.0.2.151   255.255.252.0 192.0.2.15     Management   myg.example.com
      eth1     Unlinked
      eth2     Unlinked
      eth3     Unlinked
      bondib0  ib0,ib1    192.168.10.3    255.255.248.0                   Private      myg-priv.example.com
      Is this correct (y/n) [y]:
       
      Do you want to configure basic ILOM settings (y/n) [y]: n
      
      Starting the RS services...
      Getting the state of RS services...  running
       
      Starting MS services...
      The STARTUP of MS services was successful.
      A restart of all services is required to put new network configuration into
      effect. MS-CELLSRV communication may be hampered until restart.
      Cell myg successfully altered
       
      Stopping the RS, CELLSRV, and MS services...
      The SHUTDOWN of services was successful.
      ipaddress1=192.168.10.3/21
      
    4. Restart the Oracle Exadata Storage Server.
      # shutdown -r now
  7. Restart the cell services.
    # cellcli -e alter cell restart services all
    
  8. Verify the newly assigned InfiniBand IP address on each Oracle Exadata Storage Server.
    # cellcli -e list cell detail | grep ipaddress1
    

    The following is an example of the output:

    ipaddress1: 192.168.10.3/21
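
    If SSH user-equivalence to the storage servers is configured and a group file listing them exists (for example, a hypothetical /root/cell_group file), the check can be run on all cells at once:

    # dcli -l root -g /root/cell_group "cellcli -e list cell detail | grep ipaddress1"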
    
  9. Change the InfiniBand IP addresses on each database server.
    1. Log in as the root user.
    2. Change to the /etc/sysconfig/network-scripts directory.
    3. Copy the ifcfg-bondib0 file.

      The copied file name must not start with ifcfg.

      # cp ifcfg-bondib0 orig_ifcfg-bondib0
      
    4. Edit the ifcfg-bondib0 file to update the IPADDR, NETMASK, NETWORK, and BROADCAST fields. (An example command for deriving the NETWORK and BROADCAST values is shown after the MTU note below.)

      Example of original ifcfg-bondib0 file:

      #### DO NOT REMOVE THESE LINES ####
      #### %GENERATED BY CELL% ####
      DEVICE=bondib0
      USERCTL=no
      BOOTPROTO=none
      ONBOOT=yes
      IPADDR=192.168.20.8
      NETMASK=255.255.248.0
      NETWORK=192.168.16.0
      BROADCAST=192.168.23.255
      BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000"
      IPV6INIT=no
      MTU=65520
      

      Example of updated ifcfg-bondib0 file:

      #### DO NOT REMOVE THESE LINES ####
      #### %GENERATED BY CELL% ####
      DEVICE=bondib0
      USERCTL=no
      BOOTPROTO=none
      ONBOOT=yes
      IPADDR=192.168.10.8
      NETMASK=255.255.248.0
      NETWORK=192.168.8.0
      BROADCAST=192.168.15.255
      BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000"
      IPV6INIT=no
      MTU=65520
      

      Note:

      The MTU size for the InfiniBand interfaces on the database servers should be set as follows:

      • For Oracle Exadata System Software release 11.2.3.3 and later, set the MTU size to 7000.

      • For Oracle Exadata System Software releases earlier than release 11.2.3.3, set the MTU size to 65520 to ensure a high transfer rate to external devices using TCP/IP over InfiniBand such as media servers or NFS servers.
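
      If you need to derive the NETWORK and BROADCAST values for a new address, and the ipcalc utility from the initscripts package is available on the database server, a check similar to the following can be used (shown for the example address above):

      # ipcalc --network --broadcast 192.168.10.8 255.255.248.0

      The output includes NETWORK=192.168.8.0 and BROADCAST=192.168.15.255, which match the updated file above.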

    5. Restart the database server.
      # shutdown -r now
      
    6. Verify the InfiniBand IP address information.
      # ifconfig -a
      

      The following is an example of the BONDIB0 information. It shows the updated InfiniBand network information:

      inet addr:192.168.10.8 Bcast:192.168.15.255 Mask:255.255.248.0
      
  10. Update the cellinit.ora and cellip.ora files on each database server.

    Note:

    Do not edit the cellinit.ora or cellip.ora files while the database or Oracle ASM instances are running. To make changes to the files, perform a procedure similar to the following:

      1. Create a copy of the file.

        cp cellinit.ora cellinit.new
      2. Edit the cellinit.new file with a text editor.

      3. Replace the old cellinit.ora file with the updated cellinit.new file.

        mv cellinit.new cellinit.ora
    1. Log in as the root user.
    2. Change to the /etc/oracle/cell/network-config directory.
    3. Make a backup copy of the cellip.ora file.
      # cp cellip.ora orig_cellip.ora
      

      Note:

      If you are using SSH user-equivalence, then the dcli utility can be used. The following is an example of the dcli command:

      # dcli -l root -g /root/dbs_group "cp cellip.ora orig_cellip.ora"
    4. Make a backup copy of the cellinit.ora file.

      The following is an example of the command:

      # cp cellinit.ora orig_cellinit.ora
      

      Note:

      If you are using SSH user-equivalence, then the dcli utility can be used. The following is an example of the dcli command:

      # dcli -l root -g /root/dbs_group "cp cellinit.ora \
      orig_cellinit.ora"
    5. Change the InfiniBand IP addresses in the cellip.ora file.

      Example of original file:

      cell="192.168.20.1"
      cell="192.168.20.2"
      cell="192.168.20.3"
      cell="192.168.20.4"
      cell="192.168.20.5"
      cell="192.168.20.6"
      cell="192.168.20.7"
      

      Example of updated file:

      cell="192.168.10.1"
      cell="192.168.10.2"
      cell="192.168.10.3"
      cell="192.168.10.4"
      cell="192.168.10.5"
      cell="192.168.10.6"
      cell="192.168.10.7"
      

      Note:

      If you are using SSH user-equivalence, then the dcli utility can be used to copy the updated file from the first database server to the other database servers. The following is an example of using the dcli command:

      # dcli -l root -g /root/dbs_group -f \
      /etc/oracle/cell/network-config/cellip.ora 
      
      # dcli -l root -g /root/dbs_group "mv /root/cellip.ora \
      /etc/oracle/cell/network-config/"
    6. Change the InfiniBand IP addresses in the cellinit.ora file.

      The file contains the InfiniBand IP address of the database server with its subnet mask length.

      Example of original file:

      ipaddress="192.168.20.8/21"
      

      Example of updated file:

      ipaddress="192.168.10.8/21"
      

      Update the cellinit.ora file on each database server. The contents of the file are specific to each database server, so the dcli utility cannot be used for this step.

    7. Run the ALTER DBSERVER command on each database server to update the /etc/oracle/cell/network-config/cellinit.ora file.
      # dbmcli -e alter dbserver interconnect1 = "ib0"
      # dbmcli -e alter dbserver interconnect2 = "ib1"
      # dbmcli -e alter dbserver interconnect3 = "ib2"
      # dbmcli -e alter dbserver interconnect4 = "ib3"
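
      To confirm the updated files across all database servers, the contents can be displayed with the dcli utility if SSH user-equivalence is configured, for example:

      # dcli -l root -g /root/dbs_group "cat /etc/oracle/cell/network-config/cellinit.ora"
      # dcli -l root -g /root/dbs_group "cat /etc/oracle/cell/network-config/cellip.ora"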
      
  11. Update the /etc/hosts file on each database server and Oracle Exadata Storage Server to use the new InfiniBand IP addresses.
    1. Log in as the root user.
    2. Make a backup copy of the /etc/hosts file.
      # cp /etc/hosts /etc/orig_hosts
      
    3. Change the InfiniBand IP addresses for the database server and Oracle Exadata Storage Server entries in the file.
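
      A hypothetical example of the change for one storage server entry (the private host names shown are placeholders; use the names already present in your /etc/hosts file):

      Before:  192.168.20.1   dm02cel01-priv.example.com   dm02cel01-priv
      After:   192.168.10.1   dm02cel01-priv.example.com   dm02cel01-priv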
  12. Start Oracle Clusterware as the root user on each server.
    # Grid_home/grid/bin/crsctl start crs
    
  13. Verify that the cluster interconnect is using the RDS protocol on each database server by examining the Oracle ASM alert log.
    The log is in the directory /u01/app/oracle/diag/asm/+asm/+ASM1/trace. An entry similar to the following should be listed for the most recent Oracle ASM restart:
    CELL interconnect IPC version: Oracle RDS/IP (generic)
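
    One way to locate the most recent entry, assuming the default diagnostic destination shown above, is:

    $ grep "CELL interconnect IPC" \
      /u01/app/oracle/diag/asm/+asm/+ASM1/trace/alert_+ASM1.log | tail -1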
    

    For Oracle Database release 11.2.0.2 and later, the following command can be used to verify the cluster interconnect protocol. The command is run as the oracle user on each database server.

    $ORACLE_HOME/bin/skgxpinfo
    

    The output from the command should be rds.

    If the instance is not using the RDS protocol over InfiniBand, then relink the Oracle software using the following steps:

    Note:

    Do not use the relink all command to relink the Oracle software.
    1. As the oracle user, shut down any processes using Oracle software.
    2. If you are relinking the Oracle Grid Infrastructure home, then as the root user, run one of the following commands. Do not perform this step if you are not relinking the Oracle Grid Infrastructure home.
      • For Oracle Grid Infrastructure release 12.2.0.1 or higher:

        # Grid_home/crs/install/rootcrs.sh -unlock
      • For Oracle Grid Infrastructure release 12.1.0.1 or 12.1.0.2:

        # Grid_home/crs/install/rootcrs.pl -unlock
        
    3. As the oracle user, change to the $ORACLE_HOME/rdbms/lib directory.
    4. As the oracle user, run the following command:
      $ make -f ins_rdbms.mk ipc_rds ioracle
      
    5. If you are relinking the Oracle Grid Infrastructure home, then as the root user, run the commands for your release. Do not perform this step if you are not relinking the Oracle Grid Infrastructure home.
      • For Oracle Grid Infrastructure release 12.2.0.1 or higher:

        # Grid_home/crs/install/rootcrs.sh -lock
        # Grid_home/bin/crsctl start crs
      • For Oracle Grid Infrastructure release 12.1.0.1 or 12.1.0.2:

        # Grid_home/crs/install/rootcrs.pl -patch
        
  14. Start all cluster-managed services using the SRVCTL utility.
    1. Log in as the oracle user.
    2. Start the database using the following command, where Oracle_home is your Oracle home directory:
      $ srvctl start home -o Oracle_home \
      -s /tmp/dm02db01_dbhome -n dm02db01
      
    3. Verify the database instances are running.
      $ srvctl status database -d dbm
      
  15. Verify the Oracle ASM and database instances are using the new network settings.
    1. Log in to an Oracle ASM and database instance using SQL*Plus.
    2. Query the cluster interconnect information.
      SQL> SELECT inst_id, name, value FROM gv$parameter
           WHERE name = 'cluster_interconnects';
  16. Delete the old private network.
    $ oifcfg delif -global bondib0/192.168.16.0
    
  17. Verify that the old private network is no longer listed.
    $ oifcfg getif
    bondeth0  10.204.76.0  global public
    bondib0   192.168.8.0  global cluster_interconnect
    
  18. Enable Oracle Clusterware CRS automatic restart on each database server.
    1. Log in as the root user.
    2. Enable Oracle Clusterware CRS.
      # Grid_home/grid/bin/crsctl enable crs
      

      Note:

      To use the dcli utility to enable Oracle Clusterware CRS, run a command similar to the following:

      # dcli -l root -g dbs_group "Grid_home/grid/bin/crsctl \
      enable crs"
  19. Perform a full restart of Oracle Clusterware on all nodes.
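
    One way to do this, as the root user from any database server, is to use the cluster-wide crsctl commands (Grid_home is the same placeholder used earlier in this procedure):

    # Grid_home/grid/bin/crsctl stop cluster -all
    # Grid_home/grid/bin/crsctl start cluster -all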
  20. Perform a health check of Oracle Exadata Rack using the steps described in My Oracle Support Doc ID 1070954.1.

    Note:

    The Exachk utility collects data about key software, hardware, and firmware releases, and about configuration best practices for Oracle Exadata Rack.

    Oracle recommends that you periodically review the current data for the key components of Oracle Exadata Rack and compare it to the supported release levels and recommended best practices.

    Exachk is not a database, network, or SQL performance analysis tool. It is not a continuous monitoring utility, and does not duplicate other monitoring or alerting tools, such as ILOM, or Oracle Enterprise Manager Cloud Control.

  21. Verify the private network configuration using the Cluster Verification Utility (cluvfy), for example:
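
    For example, as the oracle user with the Oracle Grid Infrastructure environment set, the node connectivity component check can be run across all nodes:

    $ cluvfy comp nodecon -n all -verbose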