31101 - Database replication to slave failure

Alarm Group:
REPL
Description:
Database replication to a slave database has failed. This alarm is generated when:
  • The replication master finds the replication link is disconnected from the slave.
  • The replication master's link to the replication slave is out of service (OOS), or the replication master cannot get the slave's correct HA state because of a communication failure.
  • The replication mode is relayed in a cluster and either:
    • No nodes are active in the cluster, or
    • None of the nodes in the cluster are getting replication data.
Severity:
Critical
Instance:
May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and bindVarNamesValueStr
HA Score:
Normal
Auto Clear Seconds:
300
OID:
comcolDbRepToSlaveFailureNotify
Cause:

Alarm 31101 is raised when:

  • The replication master finds the replication link is disconnected from the slave.
  • The replication master's link to the replication slave is OOS, or the replication master cannot get the slave's correct HA state because of a communication failure.
  • The replication mode is relayed in a cluster and either:
    • No nodes are active in the cluster, or
    • None of the nodes in the cluster are getting replication data.
Diagnostic Information:
  1. Verify the path for all services on a node:
    1. In a command interface, type path.test -a <toNode> to test the paths for all services.
  2. In a command interface, use the path test commands to test the communication between nodes:
    1. Run the command iqt -pE NodeInfo to get the node ID.
    2. Then run the command path.test -a <nodeid> to test the paths for all services.
  3. Examine the Platform savelogs on all MPs, SO, and NO:
    1. Run the command sudo /usr/TKLC/plat/sbin/savelogs_plat
    2. The plat savelogs are in the /tmp directory.
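
The path checks above can be wrapped in a small helper. This is a minimal sketch, not a supported tool: the function name run_diag is hypothetical, it uses only the two commands named in the steps (iqt -pE NodeInfo and path.test -a), and it assumes the operator reads the node ID from the iqt output rather than parsing it, since the output format is not described here.

```shell
#!/bin/sh
# Hypothetical helper: run the path diagnostics for one node.
# Usage: run_diag <nodeid>
run_diag() {
    to_node="$1"
    if [ -z "$to_node" ]; then
        echo "usage: run_diag <nodeid>" >&2
        return 2
    fi
    # Stop early if the platform tools are not on this host's PATH.
    for tool in iqt path.test; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool (run this on the affected server)" >&2
            return 127
        fi
    done
    iqt -pE NodeInfo           # list nodes so the node ID can be confirmed
    path.test -a "$to_node"    # test the paths for all services to that node
}
```

Call it as run_diag <nodeid> on the server that reports the alarm. A return status of 2 means no node ID was given, and 127 means the platform tools were not found.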

Recovery:

  1. Verify the paths for all services on a node by typing path.test -a <toNode> in a command interface.
  2. Test the communication between nodes: type iqt -pE NodeInfo to get the node ID, then type path.test -a <nodeid> to test the paths for all services.
  3. Examine the Platform savelogs on all MPs, SO, and NO by typing sudo /usr/TKLC/plat/sbin/savelogs_plat in the command interface. The plat savelogs are in the /tmp directory.
  4. Check network connectivity between the affected servers.
  5. If there are no issues with network connectivity, contact My Oracle Support.
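
The connectivity check in step 4 can be sketched as a loop over the affected servers. This is an illustration under stated assumptions: the function name check_peers and the PING_CMD override are hypothetical, the host names are supplied by the operator, and the -c (count) and -W (timeout in seconds) options assume the Linux iputils ping. A successful ping shows only basic IP reachability. It does not confirm that replication has recovered.

```shell
#!/bin/sh
# Hypothetical helper: ping each affected server and report reachability.
# Usage: check_peers <host> [<host> ...]
# PING_CMD can be set to substitute another command for ping (assumption
# made for this sketch; it defaults to ping).
check_peers() {
    failed=0
    for host in "$@"; do
        if ${PING_CMD:-ping} -c 3 -W 2 "$host" >/dev/null 2>&1; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every host answered.
    [ "$failed" -eq 0 ]
}
```

If every host reports OK and the alarm persists, continue with step 5.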