2.10 Health Monitoring

The Oracle Private Cloud Appliance Controller Software contains a monitoring service, which is started and stopped with the ovca service on the active management node. When the system runs for the first time, it creates an inventory database and a monitor database. Once these are set up and the monitoring service is active, health information about the hardware components is updated continuously.

The inventory database is populated with information about the various components installed in the rack, including the IP addresses to be used for monitoring. With this information, the ping manager pings all known components every 3 minutes and updates the inventory database to indicate whether a component is pingable and when it was last seen online. When errors occur, they are logged in the monitor database. Error information is retrieved from the component ILOMs.
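
The ping cycle described above can be illustrated with a minimal sketch. It is not the appliance's actual implementation: the ComponentRecord fields, the sweep and run_ping_manager functions, and the injected ping callable are all hypothetical names chosen for illustration, and the real inventory database schema is not documented in this section. Only the 3-minute interval and the two recorded facts (pingable, last seen online) come from the text.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# The text states that all known components are pinged every 3 minutes.
PING_INTERVAL_SECONDS = 3 * 60


@dataclass
class ComponentRecord:
    """Hypothetical inventory row; the real schema is an assumption."""
    ip_address: str
    pingable: bool = False
    last_seen: Optional[float] = None  # epoch seconds of last successful ping


def sweep(inventory: Dict[str, ComponentRecord],
          ping: Callable[[str], bool],
          now: Callable[[], float] = time.time) -> None:
    """Ping every known component once and record the result."""
    for record in inventory.values():
        reachable = ping(record.ip_address)
        record.pingable = reachable
        if reachable:
            # last_seen only advances on success, so it always holds the
            # time the component was last observed online.
            record.last_seen = now()


def run_ping_manager(inventory: Dict[str, ComponentRecord],
                     ping: Callable[[str], bool],
                     cycles: int,
                     sleep: Callable[[float], None] = time.sleep) -> None:
    """Run a fixed number of sweeps, waiting one interval between them."""
    for cycle in range(cycles):
        sweep(inventory, ping)
        if cycle < cycles - 1:
            sleep(PING_INTERVAL_SECONDS)


if __name__ == "__main__":
    # Stand-in ping function: pretend only addresses ending in .101 answer.
    inventory = {ip: ComponentRecord(ip)
                 for ip in ("192.168.4.101", "192.168.4.112")}
    sweep(inventory, ping=lambda ip: ip.endswith(".101"))
    for record in inventory.values():
        print(record)
```

Passing the ping and sleep functions as parameters keeps the sketch independent of any particular ICMP tool and lets it be exercised without waiting for real intervals.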

For troubleshooting purposes, historic health status details can be retrieved through the CLI support mode by an authorized Oracle Field Engineer. When the CLI is used in support mode, a number of additional commands are available, two of which display the contents of the health monitoring databases.

  • Use show db inventory to display component health status information from the inventory database.

  • Use show db monitor to display errors logged in the monitor database.

The appliance administrator can retrieve current component health status information through the Oracle Private Cloud Appliance CLI at any time by means of the diagnose command.

Checking the Current Health Status of an Oracle Private Cloud Appliance Installation

  1. Using SSH and an account with superuser privileges, log in to the active management node.

    Note

    The default root password is Welcome1. For security reasons, you must set a new password at your earliest convenience.

    # ssh root@10.100.1.101
    root@10.100.1.101's password:
    [root@ovcamn05r1 ~]#
  2. Launch the Oracle Private Cloud Appliance command line interface.

    # pca-admin
    Welcome to PCA! Release: 2.4.1
    PCA>
  3. Check the current status of the rack components by querying their ILOMs.

    PCA> diagnose ilom
    Checking ILOM health............please wait..
    
    IP_Address      Status          Health_Details
    ----------      ------          --------------
    192.168.4.129   Not Connected   None
    192.168.4.128   Not Connected   None
    192.168.4.127   Not Connected   None
    192.168.4.126   Not Connected   None
    192.168.4.125   Not Connected   None
    192.168.4.124   Not Connected   None
    192.168.4.123   Not Connected   None
    192.168.4.122   Not Connected   None
    192.168.4.121   Not Connected   None
    192.168.4.120   Not Connected   None
    192.168.4.101   OK              None
    192.168.4.103   OK              None
    192.168.4.102   OK              None
    192.168.4.105   Faulty          Mon Apr 15 14:17:37 2019  Power    PS1 (Power Supply 1)
                                    A loss of AC input to a power supply has occurred. 
                                    (Probability: 100, UUID: 2c1ec5fc-ffa3-c768-e602-ca12b86e3ea1, 
                                    Part Number: 07047410, Serial Number: 476856F+1252CE027X, 
                                    Reference Document: http://www.sun.com/msg/SPX86-8003-73)
    192.168.4.104   OK              None
    192.168.4.107   OK              None
    192.168.4.106   OK              None
    192.168.4.109   OK              None
    192.168.4.108   OK              None
    192.168.4.112   Not Connected   None
    192.168.4.113   OK              None
    192.168.4.110   OK              None
    192.168.4.111   OK              None
    192.168.4.116   OK              None
    192.168.4.117   OK              None
    192.168.4.114   OK              None
    192.168.4.115   OK              None
    192.168.4.118   OK              None
    192.168.4.119   OK              None
    -----------------
    29 rows displayed
    
    Status: Success
  4. Verify that the Oracle Private Cloud Appliance controller software is fully operational.

    PCA> diagnose software
    PCA Software Acceptance Test runner utility
    Test -  01 - OpenSSL CVE-2014-0160 Heartbleed bug Acceptance [PASSED]
    Test -  02 - PCA package Acceptance [PASSED]
    Test -  03 - Shared Storage Acceptance [PASSED]
    Test -  04 - PCA services Acceptance [PASSED]
    Test -  05 - PCA config file Acceptance [PASSED]
    Test -  06 - Check PCA DBs exist Acceptance [PASSED]
    Test -  07 - Compute node network interface Acceptance [PASSED]
    Test -  08 - OVM manager settings Acceptance [PASSED]
    Test -  09 - Check management nodes running Acceptance [PASSED]
    Test -  10 - Check OVM manager version Acceptance [PASSED]
    Test -  11 - OVM server model Acceptance [PASSED]
    Test -  12 - Repositories defined in OVM manager Acceptance [PASSED]
    Test -  13 - Management Nodes have IPv6 disabled [PASSED]
    Test -  14 - Bash Code Injection Vulnerability bug Acceptance [PASSED]
    Test -  15 - Check Oracle VM 3.4 xen security update Acceptance [PASSED]
    Test -  16 - Test for ovs-agent service on CNs Acceptance [PASSED]
    Test -  17 - Test for shares mounted on CNs Acceptance [PASSED]
    Test -  18 - All compute nodes running Acceptance [PASSED]
    Test -  19 - PCA version Acceptance [PASSED]
    Test -  20 - Check support packages in PCA image Acceptance [PASSED]
    
    Status: Success

    Note

    For additional information about these diagnostic results, see /var/log/ovca-diagnosis.log. Note, however, that health monitoring status information changes frequently while the appliance environment is running. If the system does not perform as expected, treat this information only as an indication of where a problem might have occurred.

  5. Close the CLI.

    PCA> exit
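
If the output of the diagnose commands in the procedure above is saved to a text file, it can be summarized with a short script. The following is a minimal sketch, assuming the output has the same layout as the examples in steps 3 and 4. The function names parse_ilom_report, summarize and failed_software_tests are hypothetical and not part of the appliance software. The three status values are the ones that appear in the example output, and any test result other than [PASSED] is an assumption, since only [PASSED] is shown in this section.

```python
import re
from collections import Counter
from typing import Dict, List

# One table row: IP address, a status seen in the example output, details.
ROW = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+(OK|Faulty|Not Connected)\s+(.*\S)\s*$"
)
# One result line, for example: "Test -  04 - PCA services Acceptance [PASSED]"
TEST = re.compile(r"^\s*Test\s+-\s+(\d+)\s+-\s+(.*?)\s+\[(\w+)\]\s*$")


def parse_ilom_report(text: str) -> List[Dict[str, str]]:
    """Turn captured `diagnose ilom` output into one dict per component."""
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        stripped = line.strip()
        match = ROW.match(line)
        if match:
            ip, status, details = match.groups()
            rows.append({"ip": ip, "status": status, "details": details})
        elif rows and stripped and set(stripped) == {"-"}:
            # The rule of dashes after the last row ends the table.
            break
        elif rows and stripped:
            # Wrapped Health_Details text belongs to the previous row.
            rows[-1]["details"] += " " + stripped
    return rows


def summarize(rows: List[Dict[str, str]]) -> Counter:
    """Count components per status value."""
    return Counter(row["status"] for row in rows)


def failed_software_tests(text: str) -> List[str]:
    """List test names from `diagnose software` output not marked PASSED."""
    failed = []
    for line in text.splitlines():
        match = TEST.match(line)
        if match and match.group(3) != "PASSED":
            failed.append(match.group(2))
    return failed


if __name__ == "__main__":
    import sys

    # Usage: python summarize_diagnose.py saved_output.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        captured = handle.read()
    rows = parse_ilom_report(captured)
    print(dict(summarize(rows)))
    for row in rows:
        if row["status"] == "Faulty":
            print(row["ip"], row["details"])
    print("Tests not passed:", failed_software_tests(captured))
```

Because the health status changes frequently, a summary produced this way reflects only the moment the commands were run.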