31101 - Database replication to slave failure
- Alarm Group:
- REPL
- Description:
- Database replication to a slave database has failed. This alarm is generated when:
- The replication master finds that the replication link to the slave is disconnected.
- The replication master's link to the replication slave is out of service (OOS), or the replication master cannot get the slave's correct HA state because of a communication failure.
- The replication mode is relayed in a cluster and either:
- No nodes in the cluster are active, or
- None of the nodes in the cluster are getting replication data.
- Severity:
- Critical
- Instance:
- May include AlarmLocation, AlarmId, AlarmState,
AlarmSeverity, and bindVarNamesValueStr
- HA Score:
- Normal
- Auto Clear Seconds:
- 300
- OID:
- comcolDbRepToSlaveFailureNotify
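The trigger conditions in the Description can be read as a single boolean check. The sketch below is one possible reading of those conditions, not the platform's actual implementation. `ReplicationState`, `ClusterNode`, `should_raise_31101`, and all of their fields are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterNode:
    # Hypothetical per-node view, used only for this sketch.
    active: bool
    receiving_replication_data: bool


@dataclass
class ReplicationState:
    # All field names are illustrative assumptions, not a platform API.
    link_connected: bool = True          # master still sees the link to the slave
    link_in_service: bool = True         # False models an OOS link
    slave_ha_state_known: bool = True    # False if the slave's HA state could not be read
    relayed_cluster_mode: bool = False   # replication mode is "relayed" in a cluster
    cluster_nodes: List[ClusterNode] = field(default_factory=list)


def should_raise_31101(s: ReplicationState) -> bool:
    """Return True if any documented trigger condition for alarm 31101 holds."""
    # Condition 1: the master finds the replication link to the slave disconnected.
    if not s.link_connected:
        return True
    # Condition 2: the link is OOS, or the slave's HA state cannot be obtained.
    if not s.link_in_service or not s.slave_ha_state_known:
        return True
    # Condition 3: relayed mode with no active node, or no node getting data.
    if s.relayed_cluster_mode:
        no_active = not any(n.active for n in s.cluster_nodes)
        none_receiving = not any(n.receiving_replication_data for n in s.cluster_nodes)
        if no_active or none_receiving:
            return True
    return False
```

For example, a relayed cluster in which one node is active but no node is receiving replication data would satisfy the third condition, so `should_raise_31101` returns `True` for it.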
Recovery:
- Verify the paths for all services on a node by typing
path.test -a <toNode>
in a command interface.
- To test the communication between nodes, first get the node ID by typing
iqt -pE NodeInfo
in the command interface. Then type
path.test -a <nodeid>
to test the paths for all services.
- Collect the platform savelogs on all MPs, the SO, and the NO by typing
sudo /usr/TKLC/plat/sbin/savelogs_plat
in the command interface, then examine them. The platform savelogs are saved in the /tmp directory.
- Check network connectivity between the affected servers.
- If there are no issues with network connectivity, contact
unresolvable-reference.htm#GUID-DD0927BD-FD0B-4CEB-86E9-98A33C12D4E0.
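For the network connectivity check in the recovery steps, a basic TCP reachability probe between the affected servers can be scripted. This is a generic sketch using Python's standard `socket` module, not a platform tool. The function name `check_tcp_reachability` is hypothetical. This alarm description does not state which port replication uses, so the hosts and ports you pass in are your own assumption. A port you know the remote server listens on, such as SSH, only confirms basic reachability.

```python
import socket
from typing import Dict, Iterable, Tuple


def check_tcp_reachability(
    targets: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> Dict[Tuple[str, int], bool]:
    """Try a TCP connection to each (host, port); map each target to True if it connected."""
    results: Dict[Tuple[str, int], bool] = {}
    for host, port in targets:
        try:
            # create_connection resolves the host and attempts a TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Covers refused connections, timeouts, and name-resolution failures.
            results[(host, port)] = False
    return results
```

A call such as `check_tcp_reachability([("slave-mp1.example.net", 22)])` (hypothetical host name) returns `False` for any target that refuses the connection, times out, or cannot be resolved, which narrows down where to look next.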