Recover from a Failed Upgrade

Access the primary SDS NOAM GUI. Use the VIP address to access the primary SDS NOAM GUI as described in Access the OAM GUI Using the VIP (NOAM/SOAM).

In the primary SDS NOAM VIP, verify the upgrade state.

Expand Administration, navigate to Software Management, and click Upgrade.
Verify the host name of the primary active SDS NOAM server from the GUI banner.
Select the Server Group tab for the server(s) being upgraded.
Verify the Upgrade State for each server undergoing the software upgrade and identify any servers with a Failed state.

Figure A-68 Server State

Note:

If the Failed Server was upgraded using the Auto Upgrade option, that is, Auto Server Group Upgrade, then continue to the next step of this procedure. If the Failed Server was upgraded using the Upgrade Server option, then skip to step 11 of this procedure.

In the primary SDS NOAM VIP, filter the servers that need upgrading. Expand Status & Manage, navigate to Tasks, and click Active Tasks.

Figure A-69 Active Tasks

From the Filter option, enter the following filter values:

Network Element: All
Display Filter: Name Like *upgrade*

Click Go.

Figure A-70 Active Status

In the primary SDS NOAM VIP, locate the Server Group Upgrade task. If not already selected, select the tab displaying the host name of the active SDS NOAM server. Locate the task for the Server Group Upgrade. It shows a status of paused.

Figure A-71 Server Group Upgrade

Note:

Consider an upgrade cycle in which one or more servers in the server group show a status of exception (that is, failed) while the other servers in that server group have upgraded successfully, yet the server group upgrade task still shows as running. In this case, cancel the running upgrade task for that server group before reattempting ASU for that server group.

Note:

Before clicking Cancel for the server group upgrade task, ensure that every individual server in that server group has an upgrade status of completed or exception (that is, failed for some reason). Do not cancel a task while any server is still in the running state.

In the primary SDS NOAM VIP, cancel the Server Group Upgrade task.

Click the Server Group Upgrade task to select it.
Click Cancel to cancel the task.

Figure A-72 Cancel Task

Click on the confirmation screen to confirm the cancellation.

Figure A-73 Confirm Cancellation

In the primary SDS NOAM VIP, verify that the Server Group Upgrade task is canceled. On the Active Tasks screen, verify that the Status changed from paused to completed.

Figure A-74 Status

Verify that the Result Details column now states “SG upgrade task canceled by user.”

Figure A-75 SG upgrade task cancelled

Access the failed server (CLI). Use the XMI address to log in to the failed server with the admusr account.

sds-mrsvnc-a login: admusr
Password: <admusr_password>
*** TRUNCATED OUTPUT ***
RELEASE=6.4
RUNID=00
VPATH=/var/TKLC/rundb:/usr/TKLC/appworks:/usr/TKLC/awpcommon:/usr/TKLC/comagent-gui:/usr/TKLC/comagent-gui:/usr/TKLC/comagent:/usr/TKLC/sds
PRODPATH=/opt/comcol/prod
RUNID=00

Inspect the upgrade.log file and identify the reason for the failure.

[admusr@sds-mrsvnc-a ~]$ tail /var/TKLC/log/upgrade/upgrade.log
1439256874:: INFO: Removing '/etc/my.cnf' from RCS repository
1439256874:: INFO: Removing '/etc/pam.d/password-auth' from RCS repository
1439256874:: INFO: Removing '/etc/pam.d/system-auth' from RCS repository
1439256874:: INFO: Removing '/etc/sysconfig/network-scripts/ifcfg-eth0' from RCS repository
1439256874:: INFO: Removing '/var/lib/prelink/force' from RCS repository
1439256874::Marking task 1439256861.0 as finished.
1439256874::
1440613685::Early Checks failed for the next upgrade
1440613691::Look at earlyChecks.log for more info
1440613691::
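When the failure is not within the last few lines that tail shows, filtering the log can help. The following is a minimal sketch and not part of the documented procedure: show_upgrade_failures is a hypothetical helper name, and the patterns fail and error are assumptions based on the sample output above.

```shell
# Hypothetical helper: print every line of an upgrade log that mentions
# a failure or an error (case-insensitive match; patterns are assumptions).
# Defaults to the upgrade.log path used in this procedure.
show_upgrade_failures() {
    log_file="${1:-/var/TKLC/log/upgrade/upgrade.log}"
    grep -i -E 'fail|error' "$log_file"
}

# Example call on the failed server:
#   show_upgrade_failures
```

The helper only reads the log; it changes nothing on the server.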

Inspect the earlyChecks.log file and identify the reason for the failure.

[admusr@sds-mrsvnc-a upgrade]$ grep ERROR /var/TKLC/log/upgrade/earlyChecks.log
ERROR: There are alarms on the system!
ERROR: <<<   OUTPUT   >>>
ERROR:  SEQ: 15 UPTIME: 2070747 BIRTH: 1438969736 TYPE: SET ALARM: TKSPLATMI10|tpdNTPDaemonNotSynchronizedWarning|1.3.6.1.4.1.323.5.3.18.3.1.3.10|32509|Communications|Communications Subsystem Failure
ERROR: <<< END OUTPUT >>>
ERROR: earlyUpgradeChecks() code failed for Upgrade::EarlyPolicy::TPDEarlyChecks
ERROR: Failed running earlyUpgradeChecks() code
ERROR: Early Upgrade Checks Failed!
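The alarm that blocked the early checks is embedded in a long pipe-delimited line. As a rough sketch, assuming the log keeps the ALARM: ID|name|... layout shown in the sample output above, the alarm ID and name can be extracted as follows. The helper name list_blocking_alarms is hypothetical.

```shell
# Hypothetical helper: print "<alarm ID> <alarm name>" for every alarm
# recorded in earlyChecks.log. Assumes each alarm line contains
# "ALARM: <ID>|<name>|..." as in the sample output of this procedure.
list_blocking_alarms() {
    log_file="${1:-/var/TKLC/log/upgrade/earlyChecks.log}"
    grep 'ALARM:' "$log_file" |
        sed -E 's/.*ALARM: *([^|]+)\|([^|]+)\|.*/\1 \2/'
}

# Example call on the failed server:
#   list_blocking_alarms
```

For the sample log above, this reduces the alarm line to the ID TKSPLATMI10 and the name tpdNTPDaemonNotSynchronizedWarning, which is the condition to troubleshoot.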

Note:

Although outside of the scope of this document, the user is expected to use standard troubleshooting techniques to clear the alarm condition from the failed server.

If troubleshooting assistance is needed, it is recommended to contact My Oracle Support.

Do not proceed to the next step until the alarm condition has been cleared.

In the Failed Server (CLI), verify platform alarms are cleared from the failed server. Use the alarmMgr utility to verify all platform alarms have been cleared from the system.

[admusr@sds-mrsvnc-a ~]$ alarmMgr --alarmStatus
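The check can be turned into a pass/fail test. This sketch assumes that alarmMgr --alarmStatus prints one line per active alarm and prints nothing when no platform alarms are present; confirm that behavior on your release before relying on it. The helper name alarms_cleared is hypothetical.

```shell
# Hypothetical helper: read alarm status text on standard input and
# succeed (exit status 0) only if it contains no non-blank characters.
# Assumption: empty output from alarmMgr means no active platform alarms.
alarms_cleared() {
    ! grep -q '[^[:space:]]'
}

# Example call on the failed server:
#   alarmMgr --alarmStatus | alarms_cleared && echo "No platform alarms"
```

If the helper fails, alarms are still active and the upgrade must not be re-run yet.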

Exit the CLI for the failed server.

[admusr@sds-mrsvnc-a ~]$ exit

logout

In the Primary SDS NOAM VIP (GUI), run the server upgrade again. Return to the upgrade procedure being run when the failure occurred. Re-run the upgrade for the failed server using the Upgrade Server option.

Note:

Once a server has failed while using the Automated Server Group Upgrade option, the Auto Upgrade option cannot be used again on that server group. The remaining servers in that server group must be upgraded using the Upgrade Server option.