31101 - Database replication to slave failure
- Alarm Group:
- REPL
- Description:
- Database replication to a slave database has failed. This alarm is generated when:
- The replication master finds the replication link is disconnected from the slave.
- The replication master's link to the replication slave is out of service (OOS), or the replication master cannot get the slave's correct HA state because of a communication failure.
- The replication mode is relayed in a cluster and either:
- No nodes are active in the cluster, or
- None of the nodes in the cluster are receiving replication data.
- Severity:
- Critical
- Instance:
- May include AlarmLocation, AlarmId, AlarmState, AlarmSeverity, and bindVarNamesValueStr
- HA Score:
- Normal
- Auto Clear Seconds:
- 300
- OID:
- comcolDbRepToSlaveFailureNotify
Recovery:
- Verify the paths for all services on a node by typing path.test -a <toNode> in a command interface.
- To test the communication between nodes, type iqt -pE NodeInfo to get the node ID, then type path.test -a <nodeid> to test the paths for all services.
- Examine the platform savelogs on all MPs, SO, and NO by typing sudo /usr/TKLC/plat/sbin/savelogs_plat in the command interface. The platform savelogs are saved in the /tmp directory.
- Check network connectivity between the affected servers.
- If there are no issues with network connectivity, contact My Oracle Support (MOS).
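The recovery steps above can be collected into a small shell helper. This is only a sketch under stated assumptions: the function names step and recovery_checks and the RUN variable are hypothetical, and the use of ping for the connectivity check is an assumption, since the procedure does not name a tool for that step. The iqt, path.test, and savelogs_plat command lines are taken from the steps above. By default the helper only prints the commands so they can be reviewed before anything is run.

```shell
#!/bin/sh
# Hypothetical helper: prints (or, with RUN=1, executes) the recovery commands.
# Only the iqt, path.test, and savelogs_plat command lines come from the
# procedure; the function names, the RUN variable, and the use of ping are
# illustrative assumptions.

RUN="${RUN:-0}"

# Print a command line, and execute it only when RUN=1.
step() {
    echo "+ $*"
    if [ "$RUN" = "1" ]; then
        "$@"
    fi
}

# Walk through the recovery sequence for one peer node.
#   $1 = node ID, as reported by "iqt -pE NodeInfo"
#   $2 = hostname or IP address of the peer server (for the ping check)
recovery_checks() {
    node_id="$1"
    peer_host="$2"
    step iqt -pE NodeInfo                        # look up node IDs
    step path.test -a "$node_id"                 # test paths for all services
    step sudo /usr/TKLC/plat/sbin/savelogs_plat  # collect platform savelogs
    step ping -c 3 "$peer_host"                  # assumed connectivity check
}
```

Calling recovery_checks <nodeid> <peer-host> with RUN unset prints the four command lines without executing them. Setting RUN=1 runs each one in order on a server where those commands are available.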