A.6 Recover from a Failed Upgrade

  1. Access the primary SDS NOAM GUI. Use the VIP address to access the primary SDS NOAM GUI as described in Access the OAM GUI Using the VIP (NOAM/SOAM).
  2. In the primary SDS NOAM VIP, verify upgrade state.
    1. Expand Administration, navigate to Software Management, and click Upgrade.
    2. Verify the host name of the primary active SDS NOAM server from the GUI banner.
    3. Select the Server Group tab for the server(s) being upgraded.
    4. Verify the Upgrade State for each server undergoing the software upgrade and identify any servers with a Failed state.

    Figure A-68 Server State

    Note:

    If the Failed Server was upgraded using the Auto Upgrade option, that is, Auto Server Group Upgrade, then continue to the next step of this procedure. If the Failed Server was upgraded using the Upgrade Server option, then skip to step 11 of this procedure.
  3. In the primary SDS NOAM VIP, filter the servers that need upgrading. Expand Status & Manage, navigate to Tasks, and click Active Tasks.

    Figure A-69 Active Tasks

  4. From the Filter option, enter the following filter values:
    1. Network Element: All
    2. Display Filter: Name Like *upgrade*
  5. Click Go.

    Figure A-70 Active Status

  6. In the primary SDS NOAM VIP, locate the Server Group Upgrade task. If not already selected, select the tab displaying the host name of the active SDS NOAM server, then locate the Server Group Upgrade task. It shows a status of paused.

    Figure A-71 Server Group Upgrade

    Note:

    During an upgrade cycle, one or more servers in a server group may show an upgrade status of exception (that is, failed) while the other servers in that server group have upgraded successfully, yet the server group upgrade task still shows as running. In this case, cancel the running upgrade task for that server group before reattempting ASU on it.

    Note:

    Before clicking Cancel for the server group upgrade task, ensure that every server in that server group has an upgrade status of completed or exception (that is, failed for some reason). Do not cancel a task while any of its servers is still in the running state.
  7. In the primary SDS NOAM VIP, cancel the Server Group Upgrade task.
    1. Click the Server Group Upgrade task to select it.
    2. Click Cancel to cancel the task.

    Figure A-72 Cancel Task

  8. On the confirmation screen, confirm the cancellation.

    Figure A-73 Confirm Cancellation

  9. In the primary SDS NOAM VIP, verify that the Server Group Upgrade task is canceled. On the Active Tasks screen, verify that the Status changed from paused to completed.

    Figure A-74 Status

  10. Verify that the Result Details column now states “SG upgrade task canceled by user.”

    Figure A-75 SG upgrade task cancelled

  11. Access the CLI of the failed server. Use the XMI address to log in to the failed server with the admusr account.
    sds-mrsvnc-a login: admusr
    Password: <admusr_password>
    *** TRUNCATED OUTPUT ***
    RELEASE=6.4
    RUNID=00
    VPATH=/var/TKLC/rundb:/usr/TKLC/appworks:/usr/TKLC/awpcommon:/usr/TKLC/comagent-gui:/usr/TKLC/comagent-gui:/usr/TKLC/comagent:/usr/TKLC/sds
    PRODPATH=/opt/comcol/prod
    RUNID=00
    
  12. Inspect the upgrade.log file and identify the reason for the failure.
    [admusr@sds-mrsvnc-a ~]$ tail /var/TKLC/log/upgrade/upgrade.log
    1439256874:: INFO: Removing '/etc/my.cnf' from RCS repository
    1439256874:: INFO: Removing '/etc/pam.d/password-auth' from RCS repository
    1439256874:: INFO: Removing '/etc/pam.d/system-auth' from RCS repository
    1439256874:: INFO: Removing '/etc/sysconfig/network-scripts/ifcfg-eth0' from RCS repository
    1439256874:: INFO: Removing '/var/lib/prelink/force' from RCS repository
    1439256874::Marking task 1439256861.0 as finished.
    1439256874::
    1440613685::Early Checks failed for the next upgrade
    1440613691::Look at earlyChecks.log for more info
    1440613691::
    
  13. Inspect the earlyChecks.log file and identify the reason for the failure.
    [admusr@sds-mrsvnc-a upgrade]$ grep ERROR /var/TKLC/log/upgrade/earlyChecks.log
    ERROR: There are alarms on the system!
    ERROR: <<<   OUTPUT   >>>
    ERROR:  SEQ: 15 UPTIME: 2070747 BIRTH: 1438969736 TYPE: SET ALARM: TKSPLATMI10|tpdNTPDaemonNotSynchronizedWarning|1.3.6.1.4.1.323.5.3.18.3.1.3.10|32509|Communications|Communications Subsystem Failure
    ERROR: <<< END OUTPUT >>>
    ERROR: earlyUpgradeChecks() code failed for Upgrade::EarlyPolicy::TPDEarlyChecks
    ERROR: Failed running earlyUpgradeChecks() code
    ERROR: Early Upgrade Checks Failed!
    

    Note:

    Clearing the alarm condition is outside the scope of this document. Use standard troubleshooting techniques to clear the alarm condition from the failed server.

    If you need troubleshooting assistance, contact My Oracle Support.

    Do not proceed to the next step until the alarm condition has been cleared.

  14. In the Failed Server (CLI), verify platform alarms are cleared from the failed server. Use the alarmMgr utility to verify all platform alarms have been cleared from the system.
    [admusr@sds-mrsvnc-a ~]$ alarmMgr --alarmStatus
  15. Exit the CLI for the failed server.
    [admusr@sds-mrsvnc-a ~]$ exit
    logout
  16. In the primary SDS NOAM VIP (GUI), run the server upgrade again. Return to the upgrade procedure that was in progress when the failure occurred. Re-run the upgrade for the failed server using the Upgrade Server option.

    Note:

    Once a server has failed while using the Automated Server Group Upgrade option, the Auto Upgrade option cannot be used again on that server group. The remaining servers in that server group must be upgraded using the Upgrade Server option.
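
The CLI checks in steps 12 through 14 can be wrapped in a few small shell helpers. The following is a minimal sketch, not part of the SDS or TPD tooling: the function names failure_lines, alarm_names, and alarms_cleared are hypothetical. It assumes that the number before "::" in upgrade.log is a Unix epoch timestamp, that alarm lines in earlyChecks.log use the pipe-delimited layout shown in step 13, that alarmMgr --alarmStatus prints nothing when no platform alarms are raised, and that GNU date is available.

```shell
#!/bin/sh
# Hypothetical triage helpers for steps 12 to 14. Illustrative only.

# Print upgrade.log lines that mention a failure. When a line starts with
# "<digits>::", the digits are assumed to be epoch seconds and are shown
# as a UTC date (requires GNU date for the "-d @<seconds>" form).
failure_lines() {
    grep -i 'fail' "$1" | while IFS= read -r line; do
        ts="${line%%::*}"      # text before the first "::"
        rest="${line#*::}"     # text after the first "::"
        case "$ts" in
            ''|*[!0-9]*)
                # No numeric prefix: print the line unchanged.
                printf '%s\n' "$line"
                ;;
            *)
                printf '%s  %s\n' \
                    "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "$rest"
                ;;
        esac
    done
}

# List the distinct alarm names on "ALARM:" lines of earlyChecks.log.
# Assumes the alarm name is the second pipe-delimited field.
alarm_names() {
    grep 'ALARM:' "$1" | awk -F'|' 'NF >= 2 { print $2 }' | sort -u
}

# Succeed only when stdin contains no non-blank lines.
# Assumes alarmMgr --alarmStatus prints nothing when no alarms are raised.
alarms_cleared() {
    ! grep -q '[^[:space:]]'
}
```

On the failed server, these might be used as failure_lines /var/TKLC/log/upgrade/upgrade.log, then alarm_names /var/TKLC/log/upgrade/earlyChecks.log, and finally alarmMgr --alarmStatus | alarms_cleared && echo "alarms cleared". The helpers only read logs and command output; they do not replace the manual review of each file or the requirement to clear the alarm condition before continuing.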