3.1.3 5003 - IPFE State Sync Run Error

Alarm Group:
IPFE
Description:

The IPFE was unable to synchronize state data with its mate. This alarm is generated when the IPFE server misses heartbeat messages from its mate, or when the mate is unavailable for any reason.

This alarm is normal when one IPFE of a pair is taken down for maintenance. In this case, the alarm is expected.

If the alarm is generated outside of planned maintenance, this indicates the IPFE has detected that its mate is out of service.

DSR currently supports at most four IPFE servers, which are named IPFE-A1, IPFE-A2, IPFE-B1, and IPFE-B2 on the IPFE, and then Configuration, and then Options tab. A small DSR system can be configured with only the IPFE-A1 and IPFE-A2 servers; IPFE-B1 and IPFE-B2 can be added for a larger DSR, depending on need. IPFE-A1 and IPFE-A2 are configured as mates (as are IPFE-B1 and IPFE-B2, if configured). Mated IPFE servers exchange heartbeat messages once every 500 ms. If, for any reason, an IPFE server misses its mate's heartbeat message, alarm 5003 is raised. A few typical reasons are:

  • Mate server is down
  • Network connectivity issue
  • Latency between the IPFEs
  • High CPU load on the IPFE causing internal software latency in the transmission or receipt of a heartbeat message
img/alarm5003.jpg
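
To get a rough sense of whether latency between the mates is a plausible cause, the round-trip time can be compared with the 500 ms heartbeat interval. The sketch below assumes the Linux iputils ping summary format (rtt min/avg/max/mdev = ...). The function names and the use of the heartbeat interval as a threshold are assumptions made for illustration, not documented IPFE limits.

```shell
HEARTBEAT_INTERVAL_MS=500   # heartbeat period stated in the description

# Read ping output on stdin and print the maximum round-trip time in ms.
max_rtt_ms() {
    awk -F'/' '/^rtt/ { print $6 }'
}

# Return 0 (true) if the given RTT in ms is at least one heartbeat interval.
rtt_exceeds_interval() {
    awk -v rtt="$1" -v limit="$HEARTBEAT_INTERVAL_MS" \
        'BEGIN { exit !(rtt >= limit) }'
}

# Usage, run on one IPFE with the mate's IMI address in $mate:
#   max=$(ping -c 10 -i 0.5 "$mate" | max_rtt_ms)
#   rtt_exceeds_interval "$max" && echo "max RTT ${max} ms exceeds the heartbeat interval"
```

A round-trip time far below 500 ms does not rule out the other causes listed above, such as high CPU load on the IPFE.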
Severity:
Critical
Instance:
One of the following strings:
  • connect error - cannot connect to peer IPFE
  • data read error - error reading data from peer IPFE
  • data write error - error writing data to peer IPFE

Note:

If the IPFE is able to synchronize state data with its mate, this alarm will clear.
HA Score:
Normal
Auto Clear Seconds:
N/A
OID:
ipfeIpfeStateSyncRunErrorNotify
Diagnostic Information:
State synchronization data is exchanged over the connection between IPFE server mates (IPFE A1/A2 IP or B1/B2 IP, port 19041, TCP). Wireshark can be used to check whether state sync heartbeat messages are being sent and received.
img/alarm5003-1.png
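
Where a packet capture has to be taken on the server itself, the same traffic can be captured from the command line and opened in Wireshark afterwards. This is a minimal sketch, assuming tcpdump is installed on the IPFE server and that the login user may run it through sudo; neither is confirmed by this document. The function names, the 200-packet limit, and the output file name are illustrative choices.

```shell
# Build a capture filter for state sync traffic to or from the mate.
state_sync_filter() {
    local mate_ip=$1 port=${2:-19041}
    echo "host ${mate_ip} and tcp port ${port}"
}

# Capture 200 matching packets on the given interface into a pcap file.
capture_state_sync() {
    local iface=$1 mate_ip=$2
    sudo tcpdump -i "$iface" -nn -c 200 \
        -w "statesync_$(hostname).pcap" "$(state_sync_filter "$mate_ip")"
}

# Usage: capture_state_sync <IMI interface name> <mate IMI address>
```

If the capture shows traffic in only one direction on port 19041, that points to the connectivity checks in the recovery steps.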

Recovery:

  1. Check the IPFE server configuration by navigating to IPFE, and then Configuration, and then Options, and verifying each IPFE server IP address. Select the IMI IP address. The default State Sync TCP port number is 19041. If this port number is configurable in your version of the IPFE, do not change it from the default.
  2. Check the mated IPFE connectivity.
    • ssh to IPFE-A1: ssh admusr@<IPFE-A1 XMI IP address>
    • ping <IPFE-A2 IMI Address>
    • telnet <IPFE-A2 IMI Address> 19041
    • ssh to IPFE-A2 to ping/telnet IPFE-A1
    • ssh to IPFE-B1 to ping/telnet IPFE-B2
    • ssh to IPFE-B2 to ping/telnet IPFE-B1
    • If the mated IPFE servers can reach each other, go to step 3.
  3. Reboot the IPFE servers, one by one, if possible.
    1. Navigate to Status & Manage, and then Server.
    2. Select the IPFE server and click Restart.

      The Are you sure you want to restart application software on the following server(s)? <server name> warning message displays.

    3. Click OK to continue.
    4. If rebooting does not solve the issue or you are not allowed to reboot the IPFE server, go to the next step.
  4. Run CPU and userspace performance diagnostics using the commands top and mpstat -P ALL.
  5. For further assistance, contact My Oracle Support. Collect this data first:
    • Screenshots of Configuration, and then Network, and then Devices (all IPFE server tabs) and of IPFE, and then Configuration, and then Options.
    • ifconfig > ifconfig_$(hostname)
    • (iqt -E IpfeOption ; iqt -E IpListTsa ; ) > ipfeconfig_$(hostname)
    • netstat -anop | grep 19041 > netstat_$(hostname)
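
The command-line parts of steps 2, 4, and 5 can be gathered into a small helper script. This is a sketch under stated assumptions, not an Oracle-supplied tool: it assumes bash with /dev/tcp support and the coreutils timeout command are available on the IPFE server, it uses a /dev/tcp probe as a stand-in for the manual telnet check, and the function names and the mpstat sampling arguments (five 1-second samples) are hypothetical choices.

```shell
#!/usr/bin/env bash
# Sketch only: helpers for recovery steps 2, 4, and 5. Run on an IPFE server.

STATE_SYNC_PORT=19041   # default state sync TCP port (step 1)

# Step 2: return 0 if a TCP connection to host:port opens within 3 seconds.
tcp_port_open() {
    local host=$1 port=${2:-$STATE_SYNC_PORT}
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Step 2: ping the mate's IMI address, then probe the state sync port.
check_mate() {
    local mate=$1
    ping -c 3 -W 2 "$mate" >/dev/null 2>&1 \
        && echo "ping ${mate}: ok" || echo "ping ${mate}: FAILED"
    tcp_port_open "$mate" \
        && echo "tcp ${mate}:${STATE_SYNC_PORT}: open" \
        || echo "tcp ${mate}:${STATE_SYNC_PORT}: FAILED"
}

# Step 4: one batch-mode snapshot of top plus five 1-second mpstat samples.
cpu_snapshot() {
    top -b -n 1 | head -n 20
    mpstat -P ALL 1 5
}

# Step 5: write the three files requested for My Oracle Support.
collect_support_data() {
    local host
    host=$(hostname)
    ifconfig > "ifconfig_${host}"
    (iqt -E IpfeOption ; iqt -E IpListTsa) > "ipfeconfig_${host}"
    netstat -anop | grep "$STATE_SYNC_PORT" > "netstat_${host}"
}
```

For example, after sourcing the script on IPFE-A1, run check_mate <IPFE-A2 IMI Address>, then repeat from IPFE-A2 toward IPFE-A1 (and for the B pair, if configured). The screenshots listed in step 5 still have to be taken from the GUI.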