Oracle® Communications EAGLE Application Processor Alarms and Maintenance Guide
Release 16.3
E96331

32508 5000000000000100 – Server Core File Detected

Alarm Type: TPD

Description: This alarm indicates that an application process has failed and debug information is available.

Severity: Minor

OID: tpdCoreFileDetectedNotify 1.3.6.1.4.1.323.5.3.18.3.1.3.9

Alarm ID: TKSPLATMI95000000000000100

Recovery

  1. Run syscheck in verbose mode.
  2. Run savelogs to gather system information (see Saving Logs Using the EPAP GUI).
  3. Contact Customer Care Center.

    Note:

    In one special case, the heartbeat process aborts and produces a core file not because of a bug, but as an expected and intentional response to unexpected activity on the network connecting the cluster nodes. An example of such activity is a switch configuration change performed while the cluster nodes are coupling together or are already coupled. To recognize this case, the investigator first needs to determine whether the core file was produced by the heartbeat process:

    1. Inspect the syscheck verbose output and look for the "core" module. The output will be similar to the following:
                 core: Checking for core files.
                 core: There are core files on the system:
                 core:     CORE DIR: /var/TKLC/core
                 core:         CORE: core.heartbeat.<pid>
                 core:         CORE: core.heartbeat.<pid>.bt
            *     core: FAILURE:: MINOR::5000000000000100 -- Server Core File Detected
      Here the investigator can see a core file named core.heartbeat.<pid>, where <pid> is the process ID of the failed heartbeat process.
    2. If a heartbeat core file was found, the investigator must obtain the backtrace of the process from the core file by running the command:
      gdb /usr/lib/heartbeat/heartbeat /var/TKLC/core/core.heartbeat.<pid>
      Once in the gdb shell, enter bt. The output will be similar to the following:
      (gdb) bt
      #0 0x00002b872c2c0215 in raise () from /lib64/libc.so.6
      #1 0x00002b872c2c1cc0 in abort () from /lib64/libc.so.6
      #2 0x000000000040b20c in update_ackseq ()
      #3 0x000000000040d225 in send_cluster_msg ()
      #4 0x000000000040d8d7 in send_local_status ()
      #5 0x000000000040da63 in hb_send_local_status ()
      #6 0x00002b872b2733d7 in Gmain_timeout_dispatch (src=0x13b66bc8, func=0x40da40 , user_data=0x0) at GSource.c:1570
      #7 0x00002b872b8bbdb4 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
      #8 0x00002b872b8bec0d in ?? () from /lib64/libglib-2.0.so.0
      #9 0x00002b872b8bef1a in g_main_loop_run () from /lib64/libglib-2.0.so.0
      #10 0x000000000040e8de in initialize_heartbeat ()
      #11 0x000000000040f235 in main ()
      The investigator is interested in the lines beginning with #0 through #5, where the function names called within the heartbeat process are listed in the fourth column, after the word "in". If the order of the called functions is the same as in the example above (that is, raise on line #0, then abort, update_ackseq, send_cluster_msg, send_local_status, and hb_send_local_status on line #5), it is likely that the special case occurred. If so, the investigator can safely delete the files /var/TKLC/core/core.heartbeat.<pid> and /var/TKLC/core/core.heartbeat.<pid>.bt and then clear the alarm by running alarmMgr --clear TKSPLATMI9.

    The Customer Care Center will examine the files in /var/TKLC/core and remove them after all information has been extracted.
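
    The two checks described in the note (finding a heartbeat core file in the syscheck output, then comparing frames #0 through #5 of the gdb backtrace with the expected order) could be scripted roughly as follows. This is only a sketch and not part of the product: the function names heartbeat_cores, frame_names, and is_expected_abort are hypothetical, the regular expressions assume output shaped like the examples above, and the script expects the syscheck and gdb output to have been captured as text beforehand.

```python
import re

# Expected call order for frames #0 through #5, taken from the example
# backtrace in the note above.
EXPECTED_FRAMES = [
    "raise",
    "abort",
    "update_ackseq",
    "send_cluster_msg",
    "send_local_status",
    "hb_send_local_status",
]

# One gdb frame, e.g. "#2 0x000000000040b20c in update_ackseq ()".
# Assumption: every frame of interest carries an address followed by "in".
FRAME_RE = re.compile(r"#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s*\(")

# A heartbeat core file listed by the syscheck "core" module.
# The trailing lookahead skips the companion "core.heartbeat.<pid>.bt" file.
CORE_RE = re.compile(r"CORE:\s+(core\.heartbeat\.\d+)(?!\S)")


def heartbeat_cores(syscheck_output):
    """Return the heartbeat core file names found in syscheck verbose output."""
    return CORE_RE.findall(syscheck_output)


def frame_names(bt_output):
    """Return the function names of frames #0..#5 (None where a frame is missing)."""
    frames = {}
    for match in FRAME_RE.finditer(bt_output):
        frames[int(match.group(1))] = match.group(2)
    return [frames.get(i) for i in range(len(EXPECTED_FRAMES))]


def is_expected_abort(bt_output):
    """True when frames #0..#5 match the order shown in the example backtrace."""
    return frame_names(bt_output) == EXPECTED_FRAMES
```

    A True result from is_expected_abort only indicates that the backtrace matches the example; deleting the core files and clearing the alarm remain manual decisions for the investigator.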