Chapter 7 Troubleshooting

This chapter describes how to resolve a number of common problem scenarios.

7.1 Setting the Oracle Private Cloud Appliance Logging Parameters

When you troubleshoot an issue or have an open support request, you may need to change the logging parameters for your Oracle Private Cloud Appliance. The settings are stored in /etc/ovca.conf and can be changed using the CLI.

Perform the following procedure on each of the two management nodes in your environment.

Changing the Oracle Private Cloud Appliance Logging Parameters for a Management Node
  1. Gain command line access to the management node. Usually this is achieved using SSH and logging in as the root user with the global Oracle Private Cloud Appliance password.

  2. Use the CLI, as described in Chapter 4, The Oracle Private Cloud Appliance Command Line Interface (CLI), to view or modify your appliance log settings. The CLI safely reads and edits the /etc/ovca.conf file, to prevent the possibility of configuration file corruption.

    • To view the current values of the configurable settings in the configuration file, run the CLI as follows:

      # pca-admin show system-properties
    • To change the log level:

      # pca-admin set system-property log_level service LEVEL

      The service argument is the log file category to which the new log level applies. The following services can be specified: backup, cli, diagnosis, monitor, ovca, snmp, and syncservice.

      The LEVEL value is one of the following: DEBUG, INFO, WARNING, ERROR, CRITICAL.

    • To change the log file size:

      # pca-admin set system-property log_size SIZE

      Where SIZE, expressed in MB, is a number from 1 to 512.

    • To change the number of backup log files stored:

      # pca-admin set system-property log_count COUNT

      Where COUNT is a number of files ranging from 0 to 100.

    • To change the location where log files are stored:

      # pca-admin set system-property log_file service PATH

      Where PATH is the new location where the log file for the selected service is to be stored. The following services can be specified: backup, cli, diagnosis, monitor, ovca, snmp, and syncservice.

      Caution

      Make sure that the new path to the log file exists. Otherwise, the log server stops working.

      The system always prepends /var/log to your entry. Absolute paths are converted to /var/log/PATH.

      During management node upgrades, the log file paths are reset to the default values.

  3. The new log level takes effect only after the management node has been rebooted, or after the service has been restarted by running the service ovca restart command in the shell of the active management node.
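Because a mistyped value has to be corrected on both management nodes, it can help to check the arguments before passing them to pca-admin. The following is a minimal sketch and not part of the appliance software. All function names are hypothetical. The accepted services, levels and ranges are copied from step 2, values are treated as case-sensitive, and effective_log_path reflects one reading of the /var/log rule in the Caution above.

```shell
# Hypothetical helpers; they are not shipped with the appliance.
# Accepted values are taken from the lists and ranges in step 2.

valid_log_service() {
    case "$1" in
        backup|cli|diagnosis|monitor|ovca|snmp|syncservice) return 0 ;;
        *) return 1 ;;
    esac
}

valid_log_level() {
    case "$1" in
        DEBUG|INFO|WARNING|ERROR|CRITICAL) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds when $1 is a whole number between $2 and $3, inclusive.
in_range() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

valid_log_size()  { in_range "$1" 1 512; }   # SIZE in MB
valid_log_count() { in_range "$1" 0 100; }   # number of backup log files

# Prints the location the system is expected to use for a log_file entry,
# assuming leading slashes are dropped and /var/log/ is prepended.
effective_log_path() {
    _p=$1
    while [ "${_p#/}" != "$_p" ]; do _p=${_p#/}; done
    printf '/var/log/%s\n' "$_p"
}

# Example use on a management node (assumes pca-admin is in PATH):
#   valid_log_service ovca && valid_log_level DEBUG &&
#       pca-admin set system-property log_level ovca DEBUG
```

The checks only reject values that the documentation rules out; they do not confirm that the CLI accepted the change, so it is still worth running pca-admin show system-properties afterwards.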

7.2 Adding Proxy Settings for Oracle Private Cloud Appliance Updates

If your data center does not provide unrestricted internet access and uses a proxy server to control HTTP, HTTPS or FTP traffic, you may need to configure your management nodes so that they can access external resources, for example to perform software updates.

Perform the following procedure on each of the two management nodes in your environment.

Adding Proxy Settings for a Management Node
  1. Gain command line access to the management node. Usually this is achieved using SSH and logging in as the root user with the global Oracle Private Cloud Appliance password.

  2. Use the CLI, as described in Chapter 4, The Oracle Private Cloud Appliance Command Line Interface (CLI), to view or modify your proxy settings. The CLI safely reads and edits the /etc/ovca.conf file, to prevent the possibility of configuration file corruption.

    • To view the current values of the configurable settings in the configuration file, run the CLI as follows:

      # pca-admin show system-properties
    • To set an HTTP proxy:

      # pca-admin set system-property http_proxy http://IP:PORT

      Where IP is the IP address of your proxy server, and PORT is the TCP port on which it is listening.

      Caution

      If your proxy server expects a user name and password, these should be provided when the proxy service is accessed. Do not specify credentials as part of the proxy URL, because this implies that you send sensitive information over a connection that is not secure.

    • To set an HTTPS proxy:

      # pca-admin set system-property https_proxy https://IP:PORT
    • To set an FTP proxy:

      # pca-admin set system-property ftp_proxy ftp://IP:PORT
  3. Setting any single parameter automatically rewrites the configuration file, and the proxy settings become active immediately.
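Since the Caution in step 2 rules out credentials in the proxy URL, a small guard can catch that mistake before the value is written to the configuration. This is a sketch under stated assumptions: proxy_url is a hypothetical helper, it accepts only the three schemes shown above, and it expects a bare IPv4 address or host name.

```shell
# Hypothetical helper: prints SCHEME://HOST:PORT, or fails with a message
# on stderr if the input is malformed or carries embedded credentials.
proxy_url() {
    _scheme=$1 _host=$2 _port=$3
    case "$_scheme" in
        http|https|ftp) ;;
        *) echo "unsupported scheme: $_scheme" >&2; return 1 ;;
    esac
    # Reject user:password@host forms, paths, and anything with a colon.
    case "$_host" in
        ''|*@*|*/*|*:*) echo "host must be a bare address: $_host" >&2; return 1 ;;
    esac
    case "$_port" in
        ''|*[!0-9]*) echo "port must be numeric: $_port" >&2; return 1 ;;
    esac
    if [ "$_port" -lt 1 ] || [ "$_port" -gt 65535 ]; then
        echo "port out of range: $_port" >&2
        return 1
    fi
    printf '%s://%s:%s\n' "$_scheme" "$_host" "$_port"
}

# Example use on a management node (the address and port are placeholders):
#   url=$(proxy_url http 10.1.1.5 3128) &&
#       pca-admin set system-property http_proxy "$url"
```

Assigning the result to a variable first means pca-admin is not called at all when the check fails.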

7.3 Configuring Data Center Switches for VLAN Traffic

Warning

This section applies only to systems with an InfiniBand-based network architecture. The configuration described in this section is valid for the outbound connections through the Oracle Fabric Interconnect F1-15s.

The Oracle Private Cloud Appliance network infrastructure supports the use of VLANs by default. For this purpose, the Oracle Fabric Interconnect F1-15s are set to trunking mode to allow tagged data traffic.

Caution

Do not configure any type of link aggregation group (LAG) across the 10GbE ports: LACP, network/interface bonding or similar methods to combine multiple network connections are not supported.

To provide additional bandwidth to the environment hosted by the Oracle Private Cloud Appliance, create custom networks. For detailed information, see Section 2.6, “Network Customization”.

You may implement VLANs for logical separation of different network segments, or to define security boundaries between networks with different applications – just as you would with physical servers instead of virtual machines.

However, to allow virtual machines hosted by the Oracle Private Cloud Appliance to communicate with systems external to the appliance, you must update the configuration of your next-level data center switches accordingly.

  • The switch ports on the receiving end of the outbound appliance connections must be part of each VLAN used within the Oracle Private Cloud Appliance environment.

  • The same ports must also be part of the network(s) connecting the external systems that your virtual machines need to access. For example, WAN connectivity implies that virtual machines are able to reach the public gateway in your data center. As an alternative to VLAN tagging, Layer 3 routing can be used to connect to the Oracle Private Cloud Appliance.

7.4 Changing the Oracle VM Agent Password

The password of the Oracle VM Agent cannot be modified in the Authentication tab of the Oracle Private Cloud Appliance Dashboard, nor with the update password command of the Oracle Private Cloud Appliance CLI. If you need to change the agent password, use Oracle VM Manager.

Instructions to change the Oracle VM Agent password are provided in the section Change Oracle VM Agent Passwords on Oracle VM Servers in the Oracle VM Manager User's Guide for Release 3.4.

7.5 Running Manual Pre- and Post-Upgrade Checks in Combination with Oracle Private Cloud Appliance Upgrader

Controller software updates must be installed using the Oracle Private Cloud Appliance Upgrader. While the Upgrader tool automates a large number of prerequisite checks, there are still some tasks that must be performed manually before and after the upgrade process. The manual tasks are listed in this section. For more detailed information, please refer to the support note with Doc ID 2442664.1 for Controller Software release 2.3.4, or support note Doc ID 2605884.1 for Controller Software release 2.4.2.

Start by running the Oracle Private Cloud Appliance Upgrader in verify-only mode. The steps are described in Section 3.2.3, “Verifying Upgrade Readiness”. Fix any issues reported by the Upgrader and repeat the verification procedure until all checks complete without errors. Then, proceed to the manual pre-upgrade checks.

Performing Manual Pre-Upgrade Checks
  1. Verify the WebLogic password.

    On the active Management Node, run the following commands:

    # cd /u01/app/oracle/ovm-manager-3/bin
    # ./ovm_admin --listusers

    Enter the WebLogic password when prompted. If the password is incorrect, the ovm_admin command fails and exits with return code 1. If the password is correct, the command lists the users and exits with return code 0. In the event of an incorrect password, log in to the Oracle Private Cloud Appliance web interface and change the wls-weblogic password to the expected password.

  2. Check that no external storage LUNs are connected to the management nodes.

    Verify that none of your external storage LUNs are visible from either management node. For more details, refer to the support note with Doc ID 2148589.1.

    If your system is InfiniBand-based and there are no Fibre Channel cards installed in the Fabric Interconnects, you can skip this check.

  3. Check for customized inet settings on the management nodes.

    Depending on the exact upgrade path you are following, xinetd may be upgraded. In this case, modified settings are automatically reset to default. Make a note of your custom inet settings and verify them after the upgrade process has completed. These setting changes are stored in the file /etc/postfix/main.cf.

  4. Register the number of objects in the MySQL database.

    As the root user on the active management node, download and run the script number_of_jobs_and_objects.sh. It is attached to the support note with Doc ID 2442664.1 for Controller Software release 2.3.4, or support note Doc ID 2605884.1 for Controller Software release 2.4.2. It returns the number of objects and the number of jobs in the database. Make a note of these numbers.

  5. Verify management node failover.

    Reboot the active management node to ensure that the standby management node is capable of taking over the active role.

  6. Check the NFS protocol used for the internal ZFS Storage Appliance.

    On both management nodes, run the command nfsstat -m. Each mounted share should use the NFSv4 protocol.

  7. Check the file /etc/yum.conf on both management nodes.

    If a proxy is configured for YUM, comment out or remove that line from the file.
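Steps 6 and 7 of the pre-upgrade checks lend themselves to a scripted check, because both have to be repeated on each management node. The sketch below is illustrative only. The function names are hypothetical, and all_mounts_nfsv4 assumes the Linux nfsstat -m layout in which the mount options appear on a line containing "Flags:".

```shell
# Reads `nfsstat -m` output on stdin. Succeeds only if at least one
# "Flags:" line is present and every such line reports NFS version 4
# (vers=4, vers=4.0, vers=4.1 and so on).
all_mounts_nfsv4() {
    awk '
        /Flags:/ {
            seen = 1
            if ($0 !~ /vers=4([.,]|$)/) bad = 1
        }
        END { exit (seen && !bad) ? 0 : 1 }
    '
}

# Succeeds if the given YUM configuration file contains an active
# (not commented out) "proxy=" line. Related keys such as
# proxy_username are deliberately not matched.
yum_conf_has_proxy() {
    grep -Eq '^[[:space:]]*proxy[[:space:]]*=' "$1"
}

# Example use on each management node:
#   nfsstat -m | all_mounts_nfsv4 ||
#       echo "at least one share is not mounted with NFSv4"
#   yum_conf_has_proxy /etc/yum.conf &&
#       echo "comment out or remove the YUM proxy line before upgrading"
```

An empty nfsstat -m listing is treated as a failure, on the assumption that a management node with no NFS mounts at all needs a closer look.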

When you have submitted your system to all pre-upgrade checks and you have verified that it is ready for upgrade, execute the controller software update. The steps are described in Section 3.2.4, “Executing a Controller Software Update”. After successfully upgrading the controller software, proceed to the manual post-upgrade checks for management nodes and compute nodes.

Performing Manual Post-Upgrade Checks on the Management Nodes
  1. Check the names of the Unmanaged Storage Arrays.

    If the names of the Unmanaged Storage Arrays are no longer displayed correctly after the upgrade, follow the workaround documented in the support note with Doc ID 2244130.1.

  2. Check for errors and warnings in Oracle VM.

    In the Oracle VM Manager web UI, verify that none of these occur:

    • Padlock icons against compute nodes or storage servers

    • Red error icons against compute nodes, repositories or storage servers

    • Yellow warning icons against compute nodes, repositories or storage servers

  3. Check the status of all components in the Oracle Private Cloud Appliance Dashboard.

    Verify that a green check mark appears to the right of each hardware component in the Hardware View, and that no red error icons are present.

  4. Check networks.

    Verify that all networks – factory default and custom – are present and correctly configured.

Performing Manual Post-Upgrade Checks on the Compute Nodes
  1. Change the min_free_kbytes setting on all compute nodes.

    Refer to the support note with Doc ID 2314504.1. Apply the corresponding steps and reboot the compute node after the change has been made permanent.

  2. Check that the fm package is installed on all compute nodes.

    Run the command rpm -q fm. If the package is not installed, run the following command:

    # chkconfig ipmi on; service ipmi start; LFMA_UPDATE=1 /usr/bin/yum install fm -q -y --nogpgcheck
  3. Perform a virtual machine test.

    Start a test virtual machine and verify that networks are functioning. Migrate the virtual machine to a compatible compute node to make sure that live migration works correctly.

7.6 Enabling Fibre Channel Connectivity on a Provisioned Appliance

Warning

This section applies only to systems with an InfiniBand-based network architecture. The configuration described in this section is valid for the I/O modules in the Oracle Fabric Interconnect F1-15s.

However, for Oracle Server X8-2 and newer compute nodes, Fibre Channel connectivity through the Fabric Interconnects is not supported. Instead, you must use the (optional) physical FC HBA expansion cards. Refer to the section Extending Storage Capacity of Ethernet-based Systems in the Oracle Private Cloud Appliance Installation Guide.

If you ordered an Oracle Private Cloud Appliance without factory-installed Fibre Channel I/O modules and you decide to add external Fibre Channel storage at a later time, when the rack has already been provisioned, your installation must meet these requirements:

  • The Oracle Private Cloud Appliance controller software must be at Release 2.1.1 or later.

  • A total of four Fibre Channel I/O modules must be installed in slots 3 and 12 of each Oracle Fabric Interconnect F1-15.

  • Storage clouds and vHBAs must be configured manually.

Installation information for the optional Fibre Channel I/O modules can be found in the section entitled Extending Oracle Private Cloud Appliance - Additional Storage in the Oracle Private Cloud Appliance Installation Guide. This section provides detailed CLI instructions to configure the storage clouds and vHBAs associated with Fibre Channel connectivity.

Configuring Storage Clouds and vHBAs for Fibre Channel Connectivity
  1. Using SSH and an account with superuser privileges, log into the active management node.

    Note

    The data center IP address used in this procedure is an example.

    # ssh root@10.100.1.101
    root@10.100.1.101's password:
    [root@ovcamn05r1 ~]#
  2. Launch the Oracle Private Cloud Appliance CLI in interactive mode.

    # pca-admin 
    Welcome to PCA! Release: 2.3.2
    PCA>
  3. Verify that no storage clouds or vHBAs exist yet.

    PCA> list storage-network
    
    Network_Name              Description
    ------------              -----------
    ----------------
    0 rows displayed
    Status: Success
    
    PCA> list wwpn-info
    
    WWPN                     vHBA     Cloud_Name     Server       Type  Alias
    -------------            ----     -----------    ---------    ----- --------------
    -----------------
    0 rows displayed
    Status: Success
  4. Configure the vHBAs on both management nodes.

    PCA> configure vhbas ovcamn05r1 ovcamn06r1
    
    Compute_Node         Status         
    ------------         ------         
    ovcamn05r1           Succeeded
    ovcamn06r1           Succeeded    
    ----------------
    2 rows displayed
    
    Status: Success
  5. Verify that the clouds have been configured.

    PCA> list storage-network
    
    Network_Name              Description
    ------------              -----------
    Cloud_A                   Default Storage Cloud ru22 port1 - Do not delete or modify
    Cloud_B                   Default Storage Cloud ru22 port2 - Do not delete or modify
    Cloud_C                   Default Storage Cloud ru15 port1 - Do not delete or modify
    Cloud_D                   Default Storage Cloud ru15 port2 - Do not delete or modify
    ----------------
    4 rows displayed
    
    Status: Success
  6. If the 4 storage clouds have been configured correctly, configure the vHBAs on all compute nodes.

    PCA> configure vhbas ALL
    
    Compute_Node         Status         
    ------------         ------         
    ovcacn07r1           Succeeded
    ovcacn08r1           Succeeded
    [...]
    ovcacn36r1           Succeeded
    ovcacn37r1           Succeeded
    ----------------
    20 rows displayed
    
    Status: Success
  7. Verify that all clouds and vHBAs have been configured correctly.

    PCA> list wwpn-info
    
    WWPN                     vHBA     Cloud_Name     Server       Type  Alias
    -------------            ----     -----------    ---------    ----- --------------
    50:01:39:70:00:4F:91:00  vhba01   Cloud_A        ovcamn05r1   MN    ovcamn05r1-Cloud_A
    50:01:39:70:00:4F:91:02  vhba01   Cloud_A        ovcamn06r1   MN    ovcamn06r1-Cloud_A
    50:01:39:70:00:4F:91:04  vhba01   Cloud_A        ovcacn07r1   CN    ovcacn07r1-Cloud_A
    50:01:39:70:00:4F:91:06  vhba01   Cloud_A        ovcacn08r1   CN    ovcacn08r1-Cloud_A
    [...]
    50:01:39:70:00:4F:F1:05  vhba04   Cloud_D        ovcacn35r1   CN    ovcacn35r1-Cloud_D
    50:01:39:70:00:4F:F1:03  vhba04   Cloud_D        ovcacn36r1   CN    ovcacn36r1-Cloud_D
    50:01:39:70:00:4F:F1:01  vhba04   Cloud_D        ovcacn37r1   CN    ovcacn37r1-Cloud_D
    -----------------
    88 rows displayed
    
    Status: Success
    PCA> show storage-network Cloud_A
    
    ----------------------------------------
    Network_Name         Cloud_A
    Description          Default Storage Cloud ru22 port1 - Do not delete or modify
    Ports                ovcasw22r1:12:1, ovcasw22r1:3:1
    vHBAs                ovcacn07r1-vhba01, ovcacn08r1-vhba01, ovcacn10r1-vhba01, [...]
    ----------------------------------------
    Status: Success
    
    PCA> show storage-network Cloud_B
    
    ----------------------------------------
    Network_Name         Cloud_B
    Description          Default Storage Cloud ru22 port2 - Do not delete or modify
    Ports                ovcasw22r1:12:2, ovcasw22r1:3:2
    vHBAs                ovcacn07r1-vhba02, ovcacn08r1-vhba02, ovcacn10r1-vhba02, [...]
    ----------------------------------------
    Status: Success
    
    PCA> show storage-network Cloud_C
    
    ----------------------------------------
    Network_Name         Cloud_C
    Description          Default Storage Cloud ru15 port1 - Do not delete or modify
    Ports                ovcasw15r1:12:1, ovcasw15r1:3:1
    vHBAs                ovcacn07r1-vhba03, ovcacn08r1-vhba03, ovcacn10r1-vhba03, [...]
    ----------------------------------------
    Status: Success
    
    PCA> show storage-network Cloud_D
    
    ----------------------------------------
    Network_Name         Cloud_D
    Description          Default Storage Cloud ru15 port2 - Do not delete or modify
    Ports                ovcasw15r1:12:2, ovcasw15r1:3:2
    vHBAs                ovcacn07r1-vhba04, ovcacn08r1-vhba04, ovcacn10r1-vhba04, [...]
    ----------------------------------------
    Status: Success
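In the example for step 7, the 88 rows correspond to 22 servers (2 management nodes and 20 compute nodes) with one vHBA in each of the four clouds, so every cloud should show the same number of rows. The following sketch counts the rows per cloud from saved list wwpn-info output. The function name is hypothetical, and the parsing assumes the column layout shown above, with the WWPN in the first column and Cloud_Name in the third.

```shell
# Reads `list wwpn-info` output on stdin and prints one line per storage
# cloud in the form "<Cloud_Name> <number of vHBA rows>", sorted by name.
# Header, separator and summary lines are skipped because their first
# field is not a colon-separated WWPN.
count_vhbas_per_cloud() {
    awk '
        $1 ~ /^[0-9A-Fa-f][0-9A-Fa-f](:[0-9A-Fa-f][0-9A-Fa-f])+$/ { n[$3]++ }
        END { for (c in n) print c, n[c] }
    ' | sort
}

# Example use on the active management node, assuming list commands can
# be run in the same single-command form as "pca-admin show ...":
#   pca-admin list wwpn-info | count_vhbas_per_cloud
```

A cloud that reports fewer rows than the others points to a server whose vHBA configuration did not complete.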
    

The system is now ready to integrate with external Fibre Channel storage. For detailed information and instructions, refer to the section entitled Adding External Fibre Channel Storage within Extending Oracle Private Cloud Appliance - Additional Storage in the Oracle Private Cloud Appliance Installation Guide.

7.7 Restoring a Backup After a Password Change

If you have changed the password for Oracle VM Manager or its related components Oracle WebLogic Server and Oracle MySQL database, and you need to restore the Oracle VM Manager from a backup that was made prior to the password change, the passwords will be out of sync. As a result of this password mismatch, Oracle VM Manager cannot connect to its database and cannot be started, so you must first make sure that the passwords are identical.

Note

The steps below are not specific to the case where a password change occurred after the backup. They apply to any restore operation.

As of Release 2.3.1, which includes Oracle VM Manager 3.4.2, the database data directory cleanup is built into the restore process, so that step can be skipped.

Resolving Password Mismatches when Restoring Oracle VM Manager from a Backup
  1. Create a manual backup of the Oracle VM Manager MySQL database to prevent inadvertent data loss. On the command line of the active management node, run the following command:

    • Release 2.2.x and older:

      # /u01/app/oracle/ovm-manager-3/bin/createBackup.sh -n ManualBackup1
    • Release 2.3.1 and newer:

      # /u01/app/oracle/ovm-manager-3/ovm_tools/bin/BackupDatabase -w
      INFO:  Backup started to:
               /u01/app/oracle/mysql/dbbackup/ManualBackup-20190524_102412
  2. In the Oracle Private Cloud Appliance Dashboard, change the Oracle MySQL database password back to what it was at the time of the backup.

  3. On the command line of the active management node, as root user, stop the Oracle VM Manager and MySQL services, and then delete the MySQL data.

    # service ovmm stop
    # service ovmm_mysql stop
    # cd /u01/app/oracle/mysql/data
    # rm -rf appfw ibdata ib_logfile* mysql mysqld.err ovs performance_schema
  4. As oracle user, restore the database from the selected backup.

    • Release 2.2.x and older:

      # su oracle
      $ bash /u01/app/oracle/ovm-manager-3/ovm_shell/tools/RestoreDatabase.sh BackupToBeRestored
      INFO: Expanding the backup image...
      INFO: Applying logs to the backup snapshot...
      INFO: Restoring the backup...
      INFO: Success - Done!
      INFO: Log of operations performed is available at: 
            /u01/app/oracle/mysql/dbbackup/BackupToBeRestored/Restore.log
    • Release 2.3.1 and newer:

      # su oracle
      $ bash /u01/app/oracle/ovm-manager-3/ovm_tools/bin/RestoreDatabase.sh BackupToBeRestored
      INFO: Expanding the backup image...
      INFO: Applying logs to the backup snapshot...
      INFO: Restoring the backup...
      INFO: Success - Done!
      INFO: Log of operations performed is available at: 
            /u01/app/oracle/mysql/dbbackup/BackupToBeRestored/Restore.log
  5. As root user, start the MySQL and Oracle VM Manager services.

    $ su root
    # service ovmm_mysql start
    # service ovmm start

    After both services have restarted successfully, the restore operation is complete.
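The restore script lives in a different directory depending on the controller software release, which is easy to get wrong under time pressure. The sketch below picks the documented path from a release number such as the one shown in the CLI welcome banner. Both function names are hypothetical, and treating every release below 2.3.1 as "2.2.x and older" is an assumption.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Compares up to three dot-separated numeric fields.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if ((x[i] + 0) > (y[i] + 0)) exit 0
            if ((x[i] + 0) < (y[i] + 0)) exit 1
        }
        exit 0
    }'
}

# Prints the RestoreDatabase.sh path documented in step 4 for the given
# controller software release.
restore_script_for_release() {
    if version_ge "$1" 2.3.1; then
        echo /u01/app/oracle/ovm-manager-3/ovm_tools/bin/RestoreDatabase.sh
    else
        echo /u01/app/oracle/ovm-manager-3/ovm_shell/tools/RestoreDatabase.sh
    fi
}

# Example use as the oracle user (BackupToBeRestored is a placeholder):
#   bash "$(restore_script_for_release 2.3.2)" BackupToBeRestored
```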

7.8 Enabling SNMP Server Monitoring

For troubleshooting or hardware monitoring, it may be useful to enable SNMP on the servers in your Oracle Private Cloud Appliance. While the tools for SNMP are available, the protocol is not enabled by default. This section explains how to enable SNMP with the standard Oracle Linux and additional Oracle Private Cloud Appliance Management Information Bases (MIBs).

Enabling SNMP on the Management Nodes
  1. Using SSH and an account with superuser privileges, log into the management node.

    Note

    The data center IP address used in this procedure is an example.

    # ssh root@10.100.1.101
    root@10.100.1.101's password:
    [root@ovcamn05r1 ~]#
  2. Locate the necessary rpm packages in the mounted directory /nfs/shared_storage/mgmt_image/Packages, which resides in the MGMT_ROOT file system on the ZFS storage appliance. The following packages are part of the Oracle Private Cloud Appliance ISO image:

    • net-snmp-5.5-60.0.1.el6.x86_64.rpm

    • net-snmp-libs-5.5-60.0.1.el6.x86_64.rpm

    • net-snmp-utils-5.5-60.0.1.el6.x86_64.rpm

    • ovca-snmp-0.9-3.el6.x86_64.rpm

    • lm_sensors-libs-3.1.1-17.el6.x86_64.rpm

  3. Install these packages by running the following command:

    # rpm -ivh ovca-snmp-0.9-3.el6.x86_64.rpm net-snmp-libs-5.5-60.0.1.el6.x86_64.rpm \
    net-snmp-5.5-60.0.1.el6.x86_64.rpm lm_sensors-libs-3.1.1-17.el6.x86_64.rpm \
    net-snmp-utils-5.5-60.0.1.el6.x86_64.rpm
  4. Create an SNMP configuration file: /etc/snmp/snmpd.conf.

    This is a standard sample configuration:

    rocommunity public
    syslocation MyDataCenter
    dlmod ovca /usr/lib64/ovca-snmp/ovca.so
  5. Start the snmpd service.

    # service snmpd start
  6. If desired, enable the snmpd service on boot.

    # chkconfig snmpd on
  7. Open the SNMP ports on the firewall.

    # iptables -I INPUT -p udp -m udp --dport 161 -j ACCEPT
    # iptables -I INPUT -p udp -m udp --dport 162 -j ACCEPT
    # iptables-save > /etc/sysconfig/iptables

    SNMP is now ready for use on this management node. Besides the standard Oracle Linux MIBs, these are also available:

    • ORACLE-OVCA-MIB::ovcaVersion

    • ORACLE-OVCA-MIB::ovcaSerial

    • ORACLE-OVCA-MIB::ovcaType

    • ORACLE-OVCA-MIB::ovcaStatus

    • ORACLE-OVCA-MIB::nodeTable

    Usage examples:

    # snmpwalk -v 1 -c public -O e 130.35.70.186 ORACLE-OVCA-MIB::ovcaVersion
    # snmpwalk -v 1 -c public -O e 130.35.70.111 ORACLE-OVCA-MIB::ovcaStatus
    # snmpwalk -v 1 -c public -O e 130.35.70.111 ORACLE-OVCA-MIB::nodeTable
  8. Repeat this procedure on the second management node.
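Because the same configuration file has to be created on both management nodes, step 4 can be wrapped in a small function so that the two files do not drift apart. This is a sketch only: write_snmpd_conf is a hypothetical helper, and its three lines are the sample configuration from step 4 with the community string and location passed in as parameters.

```shell
# Hypothetical helper: writes the sample configuration from step 4 to the
# file named in $1, using community string $2 and location $3.
# An existing file at that path is overwritten.
write_snmpd_conf() {
    cat > "$1" <<EOF
rocommunity $2
syslocation $3
dlmod ovca /usr/lib64/ovca-snmp/ovca.so
EOF
}

# Example use on a management node:
#   write_snmpd_conf /etc/snmp/snmpd.conf public MyDataCenter
```

If your monitoring policy calls for it, pass a site-specific community string instead of public; the snmpwalk usage examples then need the same value after -c.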

Enabling SNMP on the Compute Nodes
Note

On Oracle Private Cloud Appliance compute nodes, net-snmp, net-snmp-utils and net-snmp-libs are already installed at the factory, but the SNMP service is not enabled or configured.

  1. Using SSH and an account with superuser privileges, log into the compute node. It can be accessed through the appliance internal management network.

    # ssh root@192.168.4.5
    root@192.168.4.5's password:
    [root@ovcacn27r1 ~]#
  2. Create an SNMP configuration file: /etc/snmp/snmpd.conf and make sure this line is included:

    rocommunity public
    
  3. Start the snmpd service.

    # service snmpd start

    SNMP is now ready for use on this compute node.

  4. If desired, enable the snmpd service on boot.

    # chkconfig snmpd on
  5. Repeat this procedure on all other compute nodes installed in your Oracle Private Cloud Appliance environment.
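When step 2 is repeated across many compute nodes, it is convenient if running it twice on the same node does no harm. The following sketch adds the rocommunity line only when it is missing. The function name ensure_rocommunity is hypothetical, and the sketch assumes that any existing file ends with a newline.

```shell
# Hypothetical helper: makes sure the file named in $1 contains the line
# "rocommunity <community>", where <community> is $2. The file is created
# if it does not exist, and the line is not added a second time.
ensure_rocommunity() {
    _conf=$1 _community=$2
    touch "$_conf" || return 1
    if ! grep -Fxq "rocommunity $_community" "$_conf"; then
        printf 'rocommunity %s\n' "$_community" >> "$_conf"
    fi
}

# Example use on a compute node:
#   ensure_rocommunity /etc/snmp/snmpd.conf public
#   service snmpd start
```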

7.9 Using a Custom CA Certificate for SSL Encryption

By default, Oracle Private Cloud Appliance and Oracle VM Manager use a self-signed SSL certificate for authentication. While it serves to provide SSL encryption for all HTTP traffic, it is recommended that you obtain and install your own custom trusted certificate from a well-known and recognized Certificate Authority (CA).

Both the Oracle Private Cloud Appliance Dashboard and the Oracle VM Manager web interface run on Oracle WebLogic Server. The functionality to update the digital certificate and keystore is provided by the Oracle VM Key Tool in conjunction with the Java Keytool in the JDK. The tools are installed on the Oracle Private Cloud Appliance management nodes.

7.9.1 Creating a Keystore

If you do not already have a third-party CA certificate, you can create a new keystore. The keystore you create contains one entry for a private key. After you create the keystore, you generate a certificate signing request (CSR) for that private key and submit the CSR to a third-party CA. The CA then signs the CSR and returns a signed SSL certificate and a copy of the CA certificate, which you then import into your keystore.

Creating a Keystore with a Custom CA Certificate
  1. Using SSH and an account with superuser privileges, log into the management node.

    Note

    The data center IP address used in this procedure is an example.

    # ssh root@10.100.1.101
    root@10.100.1.101's password:
    [root@ovcamn05r1 ~]#
  2. Go to the security directory of the Oracle VM Manager WebLogic domain.

    # cd /u01/app/oracle/ovm-manager-3/domains/ovm_domain/security
  3. Create a new keystore. Transfer ownership to user oracle in the user group dba.

    # /u01/app/oracle/java/bin/keytool -genkeypair -alias ca -keyalg RSA -keysize 2048 \
    -keypass Welcome1 -storetype jks -keystore mykeystore.jks -storepass Welcome1
    # chown oracle.dba mykeystore.jks
  4. Generate a certificate signing request (CSR). Transfer ownership to user oracle in the user group dba.

    # /u01/app/oracle/java/bin/keytool -certreq -alias ca -file pcakey.csr \
    -keypass Welcome1 -storetype jks -keystore mykeystore.jks -storepass Welcome1
    # chown oracle.dba pcakey.csr
  5. Submit the CSR file to the relevant third-party CA for signing.

  6. For the signed files returned by the CA, transfer ownership to user oracle in the user group dba.

    # chown oracle.dba ca_cert_file
    # chown oracle.dba ssl_cert_file
  7. Import the signed CA certificate into the keystore.

    # /u01/app/oracle/java/bin/keytool -importcert -trustcacerts -noprompt -alias ca \
    -file ca_cert_file -storetype jks -keystore mykeystore.jks -storepass Welcome1
  8. Import the signed SSL certificate into the keystore.

    # /u01/app/oracle/java/bin/keytool -importcert -trustcacerts -noprompt -alias ca \
    -file ssl_cert_file -keypass Welcome1 -storetype jks -keystore mykeystore.jks \
    -storepass Welcome1
  9. Use the setsslkey command to configure the system to use the new keystore.

    # /u01/app/oracle/ovm-manager-3/ovm_upgrade/bin/ovmkeytool.sh setsslkey
    Path for SSL keystore: /u01/app/oracle/ovm-manager-3/domains/ovm_domain/security/mykeystore.jks
    Keystore password: 
    Alias of key to use as SSL key: ca
    Key password: 
    Updating keystore information in WebLogic
    Oracle MiddleWare Home (MW_HOME): [/u01/app/oracle/Middleware] 
    WebLogic domain directory: [/u01/app/oracle/ovm-manager-3/domains/ovm_domain] 
    Oracle WebLogic Server name: [AdminServer] 
    WebLogic username: [weblogic] 
    WebLogic password: [********] 
    WLST session logged at: /tmp/wlst-session5820685079094897641.log
  10. Configure the client certificate login.

    # /u01/app/oracle/ovm-manager-3/bin/configure_client_cert_login.sh \
    /u01/app/oracle/ovm-manager-3/domains/ovm_domain/security/pcakey.crt
  11. Test the new SSL configuration by logging into the Oracle Private Cloud Appliance Dashboard. From there, proceed to Oracle VM Manager with the button "Login to OVM Manager". The browser now indicates that your connection is secure.

7.9.2 Importing a Keystore

If you already have a CA certificate and SSL certificate, use the SSL certificate to create a keystore. You can then import that keystore into Oracle Private Cloud Appliance and configure it as the SSL keystore.

Caution

If you have generated custom keys using ovmkeytool.sh in a previous version of the Oracle Private Cloud Appliance software, you must regenerate the keys prior to updating the Controller Software. For instructions, refer to the support note with Doc ID 2597439.1.

Importing a Keystore with an Existing CA and SSL Certificate
  1. Using SSH and an account with superuser privileges, log into the management node.

    Note

    The data center IP address used in this procedure is an example.

    # ssh root@10.100.1.101
    root@10.100.1.101's password:
    [root@ovcamn05r1 ~]#
  2. Import the keystore.

    # /u01/app/oracle/java/bin/keytool -importkeystore -noprompt \
    -srckeystore existing_keystore.jks -srcstoretype source_format -srcstorepass Welcome1 \
    -destkeystore mykeystore.jks -deststoretype jks -deststorepass Welcome1
  3. Use the setsslkey command to configure the system to use the new keystore.

    # /u01/app/oracle/ovm-manager-3/ovm_upgrade/bin/ovmkeytool.sh setsslkey
    Path for SSL keystore: /u01/app/oracle/ovm-manager-3/domains/ovm_domain/security/mykeystore.jks
    Keystore password: 
    Alias of key to use as SSL key: ca
    Key password: 
    Updating keystore information in WebLogic
    Oracle MiddleWare Home (MW_HOME): [/u01/app/oracle/Middleware] 
    WebLogic domain directory: [/u01/app/oracle/ovm-manager-3/domains/ovm_domain] 
    Oracle WebLogic Server name: [AdminServer] 
    WebLogic username: [weblogic] 
    WebLogic password: [********] 
    WLST session logged at: /tmp/wlst-session5820685079094897641.log
  4. Configure the client certificate login.

    # /u01/app/oracle/ovm-manager-3/bin/configure_client_cert_login.sh /path/to/cacert

    Where /path/to/cacert is the absolute path to the CA certificate.

  5. Test the new SSL configuration by logging into the Oracle Private Cloud Appliance Dashboard. From there, proceed to Oracle VM Manager with the button "Login to OVM Manager". The browser now indicates that your connection is secure.

7.10 Reprovisioning a Compute Node when Provisioning Fails

Compute node provisioning is a complex orchestrated process involving various configuration and installation steps and several reboots. Due to connectivity fluctuations, timing issues or other unexpected events, a compute node may become stuck in an intermediate state or go into error status. The solution is to reprovision the compute node.

Warning

Reprovisioning is to be applied only to compute nodes that fail to complete provisioning.

For correctly provisioned and running compute nodes, reprovisioning functionality is blocked in order to prevent incorrect use that could lock compute nodes out of the environment permanently or otherwise cause loss of functionality or data corruption.

Reprovisioning a Compute Node when Provisioning Fails
  1. Log in to the Oracle Private Cloud Appliance Dashboard.

  2. Go to the Hardware View tab.

  3. Roll over the compute nodes that are in Error status or have become stuck in the provisioning process.

    A pop-up window displays a summary of configuration and status information.

    Figure 7.1 Compute Node Information and Reprovision Button in Hardware View
    Screenshot showing the Hardware View tab of the Oracle Private Cloud Appliance Dashboard. The pop-up window displays details of a compute node and has a Reprovision button.

  4. If the compute node provisioning is incomplete and the server has been in error status or stuck in an intermediate state for several hours, click the Reprovision button in the pop-up window.

  5. When the confirmation dialog box appears, click OK to start reprovisioning the compute node.

If compute node provisioning should fail after the server was added to the Oracle VM server pool, additional recovery steps could be required. The cleanup mechanism associated with reprovisioning may be unable to remove the compute node from the Oracle VM configuration. For example, when a server is in locked state or owns the server pool active role, it must be unconfigured manually. In this case you need to perform operations in Oracle VM Manager that are otherwise not permitted. You may also need to power on the compute node manually.

Removing a Compute Node from the Oracle VM Configuration
  1. Log into the Oracle VM Manager user interface.

    For detailed instructions, see Section 5.2, “Logging in to the Oracle VM Manager Web UI”.

  2. Go to the Servers and VMs tab and verify that the server pool named Rack1_ServerPool does indeed contain the compute node that fails to provision correctly.

  3. If the compute node is locked due to a running job, abort it in the Jobs tab of Oracle VM Manager.

    Detailed information about the use of jobs in Oracle VM can be found in the Oracle VM Manager User's Guide. Refer to the section entitled Jobs Tab.

  4. Remove the compute node from the Oracle VM server pool.

    Refer to the section entitled Edit Server Pool in the Oracle VM Manager User's Guide. When editing the server pool, move the compute node out of the list of selected servers. The compute node is moved to the Unassigned Servers folder.

  5. Delete the compute node from Oracle VM Manager.

    Refer to the Oracle VM Manager User's Guide and follow the instructions in the section entitled Delete Server.

When the failing compute node has been removed from the Oracle VM configuration, return to the Oracle Private Cloud Appliance Dashboard to reprovision it. If the compute node is powered off and reprovisioning cannot be started, power on the server manually.

7.11 Deprovisioning and Replacing a Compute Node

When a defective compute node needs to be replaced or repaired, or when a compute node is retired in favor of a newer model with higher capacity and better performance, it is highly recommended that you deprovision the compute node before removing it from the appliance rack. Deprovisioning ensures that all configuration entries for a compute node are removed cleanly, so that no conflicts are introduced when a replacement compute node is installed.

Deprovisioning a Compute Node for Repair or Replacement
  1. Log into the Oracle VM Manager user interface.

    For detailed instructions, see Section 5.2, “Logging in to the Oracle VM Manager Web UI”.

  2. Migrate all virtual machines away from the compute node you wish to deprovision. If any VMs are running on the compute node, the deprovision command fails.

  3. Using SSH and an account with superuser privileges, log into the active management node, then launch the Oracle Private Cloud Appliance command line interface.

    # ssh root@10.100.1.101
    root@10.100.1.101's password:
    [root@ovcamn05r1 ~]# pca-admin
    Welcome to PCA! Release: 2.4.2
    PCA>
  4. Lock provisioning to make sure that the compute node cannot be reprovisioned immediately after deprovisioning.

    PCA> create lock provisioning
    Status: Success
  5. Deprovision the compute node you wish to remove. Repeat for additional compute nodes, if necessary.

    PCA> deprovision compute-node ovcacn29r1
    ************************************************************
     WARNING !!! THIS IS A DESTRUCTIVE OPERATION.
    ************************************************************
    Are you sure [y/N]:y
    Shutting down dhcpd:                                [ OK ]
    Starting dhcpd:                                     [ OK ]
    Shutting down dnsmasq:                              [ OK ]
    Starting dnsmasq:                                   [ OK ]
    
    Status: Success
  6. When the necessary compute nodes have been deprovisioned successfully, release the provisioning lock. The appliance resumes its normal operation.

    PCA> delete lock provisioning
    ************************************************************
     WARNING !!! THIS IS A DESTRUCTIVE OPERATION.
    ************************************************************
    Are you sure [y/N]:y
    Status: Success

When the necessary repairs have been completed, or when the replacement compute nodes are ready, install the compute nodes into the rack and connect the necessary cables. The controller software detects the new compute nodes and automatically launches the provisioning process.

7.12 Eliminating Time-Out Issues when Provisioning Compute Nodes

The provisioning process is an appliance-level orchestration of many configuration operations that run at the level of Oracle VM Manager and the individual Oracle VM Servers or compute nodes. As the virtualized environment grows, meaning there are more virtual machines, storage paths and networks, the time required to complete various discovery tasks increases exponentially.

The maximum task durations have been configured to reliably accommodate a standard base rack setup. At a given point, however, the complexity of the existing configuration, when replicated to a large number of compute nodes, increases the duration of tasks beyond their standard time-out. As a result, provisioning failures occur.

Because many provisioning tasks have been designed to use a common time-out mechanism, this problem cannot be resolved by simply increasing the global time-out. Doing so would decrease the overall performance of the system. To overcome this issue, additional code has been implemented to allow a finer-grained definition of time-outs through a number of settings in a system configuration file: /var/lib/ovca/ovca-system.conf.

If you run into time-out issues when provisioning additional compute nodes, it may be possible to resolve them by tweaking specific time-out settings in the configuration. Depending on which job failures occur, changing the storage_refresh_timeout, discover_server_timeout or other parameters could allow the provisioning operations to complete successfully. These changes would need to be applied on both management nodes.
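
To see which time-out settings are currently defined before discussing changes with Oracle, you could list them in read-only fashion on each management node. This is only a sketch: the helper name show_timeouts is hypothetical, and the layout of ovca-system.conf is an assumption here (plain-text lines that contain the setting name, with comment lines starting with #). Check the actual file before relying on it.

```shell
# Hypothetical read-only helper: prints every non-comment line that mentions
# "timeout" in the given file (default: the file named in this section).
# The file layout is assumed, not documented here.
show_timeouts() {
    conf_file="${1:-/var/lib/ovca/ovca-system.conf}"
    grep -i 'timeout' "$conf_file" | grep -v '^[[:space:]]*#'
}

# Example: run on both management nodes and compare the results by hand.
#   show_timeouts
```

Because the helper only reads the file, it does not change any setting; any change must still be applied on both management nodes.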

Please contact your Oracle representative if your compute nodes fail to provision due to time-out issues. Oracle product specialists can analyse these failures for you and recommend new time-out parameters accordingly.

7.13 Returning Oracle VM Server Pool to Operation After Network Services Restart

Warning

This section applies only to systems with an InfiniBand-based network architecture. The use of the bond0 interface described in this section is inherent to the network design based on the use of Oracle Fabric Interconnect F1-15s.

When network services are restarted on the active management node, the connection to the Oracle VM management network (bond0) is lost. By design, the bond0 interface is not brought up automatically on boot, so that the virtual IP of the management cluster can be configured on the correct node, depending on which management node assumes the active role. While the active management node is disconnected from the Oracle VM management network, the Oracle VM Manager user interface reports that the compute nodes in the server pool are offline.

The management node that becomes active runs the Oracle VM services necessary to bring up the bond0 interface and configure the virtual IP within a few minutes. The compute nodes in the Oracle VM server pool are then expected to return to their normal online status in the Oracle VM Manager user interface. If the active management node does not reconnect automatically to the Oracle VM management network, bring the bond0 interface up manually from the Oracle Linux shell.

Warning

Execute this procedure ONLY when so instructed by Oracle Support. This should only be necessary in rare situations where the active management node fails to connect automatically. You should never manually disconnect or restart networking on any node.

Manually Reconnecting the Active Management Node to the Oracle VM Management Network
  1. Using SSH and an account with superuser privileges, log into the disconnected active management node on the appliance management network.

    # ssh root@192.168.4.3
    root@192.168.4.3's password:
    [root@ovcamn05r1 ~]#

  2. Check the configuration of the bond0 interface.

    If the interface is down, the console output looks similar to this:

    # ifconfig bond0
    bond0     Link encap:Ethernet  HWaddr 00:13:97:4E:B0:02  
              BROADCAST MULTICAST  MTU:1500  Metric:1
              RX packets:0 errors:0 dropped:0 overruns:0 frame:0
              TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)
    
  3. Bring the bond0 interface up.

    # ifconfig bond0 up
  4. Check the configuration of the bond0 interface again.

    When the interface reconnects successfully to the Oracle VM management network, the console output looks similar to this:

    # ifconfig bond0
    bond0     Link encap:Ethernet  HWaddr 00:13:97:4E:B0:02  
              inet addr:192.168.140.4  Bcast:192.168.140.255  Mask:255.255.255.0
              inet6 addr: fe80::213:97ff:fe4e:b002/64 Scope:Link
              UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
              RX packets:62191 errors:0 dropped:0 overruns:0 frame:0
              TX packets:9183 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:0 
              RX bytes:4539474 (4.33 MB)  TX bytes:1853641 (1.77 MB)
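
The difference between the two outputs in steps 2 and 4 is the flags line: a connected interface shows UP and RUNNING. The check can be sketched as a small read-only helper. The name iface_is_up is hypothetical, and the helper assumes the ifconfig output format shown in this procedure.

```shell
# Hypothetical helper: reads `ifconfig <interface>` output on stdin and
# succeeds only if a single line carries both the UP and RUNNING flags.
iface_is_up() {
    awk '{
            u = 0; r = 0
            for (i = 1; i <= NF; i++) {
                if ($i == "UP") u = 1
                if ($i == "RUNNING") r = 1
            }
            if (u && r) ok = 1
         }
         END { exit !ok }'
}

# Example (reports status only, changes nothing):
#   if ifconfig bond0 | iface_is_up; then echo "bond0 is up"; else echo "bond0 is down"; fi
```

The helper only reports the state; bringing the interface up remains a manual step to be carried out when instructed by Oracle Support.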
    

7.14 Recovering from Tenant Group Configuration Mismatches

Tenant groups are essentially Oracle VM server pools, created and managed at the appliance level, with support for automatic custom network configuration across all pool members. The tenant groups appear in Oracle VM Manager, where the administrator could modify the server pool, but such operations are not supported in Oracle Private Cloud Appliance and cause configuration mismatches.

If you have inadvertently modified the configuration of a tenant group in Oracle VM Manager, follow the instructions in this section to correct the inconsistent state of your environment.

Caution

If the operations described below do not resolve the issue, it could be necessary to reprovision the affected compute nodes. This can result in downtime and data loss.

Adding a Server to a Tenant Group

If you try to add a server to a pool or tenant group using Oracle VM Manager, the operation succeeds. However, the newly added server is not connected to the custom networks associated with the tenant group because the Oracle Private Cloud Appliance controller software is not aware that a server has been added.

To correct this situation, first remove the server from the tenant group again in Oracle VM Manager. Then add the server to the tenant group again using the correct method, which is through the Oracle Private Cloud Appliance CLI. See Section 2.8.2, “Configuring Tenant Groups”.

As a result, Oracle VM Manager and Oracle Private Cloud Appliance are in sync again.

Removing a Server from a Tenant Group

If you try to remove a server from a pool or tenant group using Oracle VM Manager, the operation succeeds. However, the Oracle Private Cloud Appliance controller software is not aware that a server has been removed, and the custom network configuration associated with the tenant group is not removed from the server.

At this point, Oracle Private Cloud Appliance assumes that the server is still a member of the tenant group, and any attempt to remove the server from the tenant group through the Oracle Private Cloud Appliance CLI results in an error:

PCA> remove server ovcacn09r1 myTenantGroup
************************************************************
 WARNING !!! THIS IS A DESTRUCTIVE OPERATION.
************************************************************
Are you sure [y/N]:y

Status: Failure
Error Message: Error (SERVER_001): Exception while trying to 
remove the server ovcacn09r1 from tenant group myTenantGroup.
ovcacn09r1 is not a member of the Tenant Group myTenantGroup.

To correct this situation, use Oracle VM Manager to add the previously removed server to the tenant group again. Then use the Oracle Private Cloud Appliance CLI to remove the server from the tenant group. See Section 2.8.2, “Configuring Tenant Groups”. After the remove server command is applied successfully, the server is taken out of the tenant group, custom network configurations are removed, and the server is placed in the Unassigned Servers group in Oracle VM Manager. As a result, Oracle VM Manager and Oracle Private Cloud Appliance are in sync again.

7.15 Configure Xen CPU Frequency Scaling for Best Performance

The Xen hypervisor offers a mechanism to balance performance and power consumption through CPU frequency scaling. Known as the Current Governor, this mechanism can lower power consumption by throttling the clock speed when a CPU is idle.

Certain versions of Oracle VM Server have the Current Governor set to ondemand by default, which dynamically scales the CPU clock based on the load. Oracle recommends that on Oracle Private Cloud Appliance compute nodes you run the Current Governor with the performance setting. Particularly if you find that systems are not performing as expected after an upgrade of Oracle VM Server, make sure that the Current Governor is configured correctly.

To verify the Current Governor setting of a compute node, log in using SSH and enter the following command at the Oracle Linux prompt:

# xenpm get-cpufreq-para
cpu id               : 0
affected_cpus        : 0
cpuinfo frequency    : max [2301000] min [1200000] cur [2301000]
scaling_driver       : acpi-cpufreq
scaling_avail_gov    : userspace performance powersave ondemand
current_governor     : performance
scaling_avail_freq   : *2301000 2300000 2200000 2100000 2000000 1900000 1800000 1700000 1600000 1500000 1400000 1300000 1200000
scaling frequency    : max [2301000] min [1200000] cur [2301000]
turbo mode           : enabled
[...]

The command lists all CPUs in the compute node. If the current_governor parameter is set to anything other than performance, you should change the Current Governor configuration.

To set performance mode manually, enter this command: xenpm set-scaling-governor performance.
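
Because the xenpm output repeats the same block for every CPU, scanning it by eye is error-prone on large compute nodes. The filter below is a minimal sketch that assumes the output format shown above; the helper name cpus_not_performance is hypothetical.

```shell
# Hypothetical helper: reads `xenpm get-cpufreq-para` output on stdin and
# prints the id of every CPU whose current_governor is not "performance".
cpus_not_performance() {
    awk -F':' '
        /^[ \t]*cpu id/           { gsub(/[ \t]/, "", $2); cpu = $2 }
        /^[ \t]*current_governor/ { gsub(/[ \t]/, "", $2)
                                    if ($2 != "performance") print cpu }
    '
}

# Example on a compute node:
#   xenpm get-cpufreq-para | cpus_not_performance
```

Empty output means that every listed CPU already runs with the performance governor.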

To make this setting persistent, add it to the grub.cfg file.

  1. Add the Xen CPU frequency setting to the /etc/default/grub template file, as shown in this example:

    GRUB_CMDLINE_XEN="dom0_mem=max:6144M allowsuperpage dom0_vcpus_pin dom0_max_vcpus=20 cpufreq=xen:performance max_cstate=1"
  2. Rebuild grub.cfg by means of the following command:

    # grub2-mkconfig -o /boot/grub2/grub.cfg
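
After rebuilding, you may want to confirm that the parameter made it from the template into the generated file. A minimal sketch follows, with a hypothetical helper name; it only searches for the literal string used in the example above.

```shell
# Hypothetical helper: succeeds if the given file contains the literal
# boot parameter cpufreq=xen:performance.
has_xen_performance() {
    grep -qF 'cpufreq=xen:performance' "$1"
}

# Example on a compute node:
#   has_xen_performance /etc/default/grub && \
#   has_xen_performance /boot/grub2/grub.cfg && echo "setting is persistent"
```

Hypervisor command-line options are read at boot, so the persistent setting applies from the next restart of the compute node; until then, the manual xenpm command above covers the running system.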