3.1.3 5003 - IPFE State Sync Run Error
- Alarm Group:
- IPFE
- Description:
- The IPFE was unable to synchronize state data with its mate. This alarm is generated when the IPFE server misses heartbeat messages from its mate, or when the mate is unavailable for any reason.
This alarm is normal when one IPFE of a pair is taken down for maintenance. In this case, the alarm is expected: it indicates that the IPFE has detected that its mate is out of service.
DSR currently supports at most four IPFE servers, which are named IPFE-A1, IPFE-A2, IPFE-B1, and IPFE-B2 in the tab. In a small DSR system, you can configure only the IPFE-A1 and IPFE-A2 servers; for a larger DSR system, you can add IPFE-B1 and IPFE-B2 as needed. IPFE-A1 and IPFE-A2 are configured as mates (as are IPFE-B1 and IPFE-B2, if configured). The mated IPFE servers exchange heartbeat messages once every 500 ms. If, for any reason, an IPFE server misses its mate's heartbeat message, alarm 5003 is raised. A few typical reasons are:
- Mate server is down
- Network connectivity issue
- Latency between the IPFEs
- High CPU load on the IPFE causing internal software latency in the transmission or receipt of a heartbeat message
- Severity:
- Critical
- Instance:
- One of the following strings:
- connect error - cannot connect to peer IPFE
- data read error - error reading data from peer IPFE
- data write error - error writing data to peer IPFE
Note:
If the IPFE is able to synchronize state data with its mate, this alarm will clear.
- HA Score:
- Normal
- Auto Clear Seconds:
- N/A
- OID:
- ipfeIpfeStateSyncRunErrorNotify
- Diagnostic Information:
- State synchronization data is exchanged over the connection between IPFE server mates (IPFE A1/A2 IP or B1/B2 IP, port 19041, TCP). Wireshark can be used to check whether state sync heartbeat messages are being sent and received.
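As a rough illustration of the packet-capture diagnosis described above, the following sketch builds a capture filter for the state sync port and looks for long silences in the captured traffic. Only the port number (19041) and the use of TCP come from this document. The helper names `sync_filter` and `find_gaps`, the interface name, the example address, the file path, and the 1.0 second threshold are all hypothetical, and the gap check looks at every packet on the connection rather than decoding heartbeat messages.

```shell
#!/usr/bin/env bash
# Sketch only: port 19041/TCP is from this document; everything else is an assumption.
SYNC_PORT=19041

# Build a capture filter limited to state sync traffic with the mate.
# $1 = mate IP address (placeholder value supplied by the caller)
sync_filter() {
    printf 'host %s and tcp port %s\n' "$1" "$SYNC_PORT"
}

# Read packet timestamps (seconds, one per line) on stdin and report
# any gap between consecutive packets longer than $1 seconds.
find_gaps() {
    awk -v max="$1" '
        NR > 1 && ($1 - prev) > max {
            printf "gap of %.3f s before %s\n", $1 - prev, $1
        }
        { prev = $1 }
    '
}

# Example usage (as root on an IPFE server; eth0 and 192.0.2.12 are placeholders):
#   tcpdump -i eth0 -nn -c 200 -w /tmp/ipfe_sync.pcap "$(sync_filter 192.0.2.12)"
#   tcpdump -nn -tt -r /tmp/ipfe_sync.pcap | awk '{print $1}' | find_gaps 1.0
```

The resulting capture file can also be opened in Wireshark for closer inspection. A reported gap only shows that no packets were seen on the connection for that period; it does not by itself identify which side stopped sending.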
Recovery:
- Check IPFE server configurations by navigating to and checking the IPFE server IP address. Select the IMI IP address. The Default State Sync TCP port number is 19041. If this port number is configurable in your version of the IPFE, then do not change it from the default.
- Check the Mated IPFE connectivity.
- ssh to IPFE-A1. ssh admusr@<IPFE-A1 XMI IP address>
- ping <IPFE-A2 IMI Address>
- telnet <IPFE-A2 IMI Address> 19041
- ssh to IPFE-A2 to ping/telnet IPFE-A1
- ssh to IPFE-B1 to ping/telnet IPFE-B2
- ssh to IPFE-B2 to ping/telnet IPFE-B1
- If the mated IPFE servers can reach each other, go to step 3.
- Reboot the IPFE servers, one by one, if possible.
- Run CPU and user-space performance diagnostics using the commands top and mpstat -P ALL.
- For further assistance, contact My Oracle Support. Collect this data first:
- Screenshots of All IPFE Server tab and .
ifconfig>ifconfig_$(hostname)
(iqt -E IpfeOption ; iqt -E IpListTsa ; ) > ipfeconfig_$(hostname)
netstat -anop | grep 19041>netstat_$(hostname)
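
The mated IPFE connectivity checks in the recovery procedure above (ping, then a connection attempt to port 19041) could be scripted roughly as follows. This is a minimal sketch, not an Oracle-provided tool: the function names `tcp_port_open`, `diagnose`, and `check_mate` are hypothetical, the timeouts are arbitrary, and the script assumes `bash`, `timeout`, and a Linux `ping` are available on the IPFE server.

```shell
#!/usr/bin/env bash
# Sketch only: port 19041 is the default state sync port named in this document.
SYNC_PORT=19041

# Return 0 if a TCP connection to host:port opens within 3 seconds.
tcp_port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Turn the ping and TCP exit codes (0 = success) into a one-line summary.
diagnose() {
    if [ "$1" -ne 0 ]; then
        echo "mate unreachable by ping"
    elif [ "$2" -ne 0 ]; then
        echo "mate answers ping but TCP port $SYNC_PORT is not accepting connections"
    else
        echo "mate reachable and TCP port $SYNC_PORT open"
    fi
}

# Run both checks against the mate's IMI address and print the summary.
# $1 = mate IMI address
check_mate() {
    ping_rc=0
    tcp_rc=0
    ping -c 3 -W 2 "$1" >/dev/null 2>&1 || ping_rc=$?
    tcp_port_open "$1" "$SYNC_PORT" || tcp_rc=$?
    diagnose "$ping_rc" "$tcp_rc"
}

# Example usage, run on IPFE-A1:
#   check_mate <IPFE-A2 IMI Address>
```

Running the same check from each server toward its mate mirrors the per-server ping/telnet steps listed above.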