0541 - MSU cksum error threshold exceeded

One or more MSU checksum validation errors have been reported by a LIM or SCCP card during internal card integrity checks.

A LIM or SCCP card has reported a checksum validation failure for an MSU received from another card. The failure may be caused by a hardware problem or another issue affecting the data transfer path on a particular card, and it may indicate that an MSU sent to or received from another card was corrupted.

The alarm is raised when a checksum validation failure occurs during internal card integrity checks. It remains active until the Run-Time Diagnostic subsystem (RTD) statistics are reset and no further MSU checksum validation failures are reported.

Example
     RLGHNCXA21W 06-12-07 12:01:43 EST  EAGLE 35.6.0 
 *C  2315.0541 *C RTD SYSTEM              MSU cksum error threshold exceeded

Alarm Level: Critical

Recovery

  1. Issue the following command with no parameters to obtain the Run-Time Diagnostic subsystem (RTD) report.

    Note: Save all command outputs and reports obtained during this procedure to provide to My Oracle Support (MOS).
    rept-stat-rtd
    The following is an example of the rept-stat-rtd command output.
        RLGHNCXA21W 06-12-07 12:01:43 EST EAGLE 35.6.0
        Retrieving data from the cards…
    
        RTD  SUBSYSTEM REPORT IS-ANR         Active     -----
        RTD  ALARM STATUS =  541 MSU cksum error threshold exceeded
            
                  MSU Validation Statistics                        
                ==============================     
                Total Rx   Total Rx      Total      
        CARD    Error      Validated        Tx          
        1101         275         275       710
        1102           0         200       200
        1103           0         200      1000     
        1105           0        1360       275     
        1107           0         200       100     
        1108           0         100       100      
    

  2. Record the timestamp reported for the alarm.
  3. Record the locations of cards reporting 1 or more errors in the Total Rx Error column.
  4. Determine whether a single error or multiple errors were reported when the alarm occurred:

    • Multiple errors: multiple cards report errors, or a single card reports more than 1 error in the Total Rx Error column.

    • Single error: only 1 card reports errors, and the value in its Total Rx Error column is 1.
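    Steps 3 and 4 can be checked mechanically once the report has been captured as text. The following Python sketch is illustrative only: it assumes every card row in the rept-stat-rtd summary is a 4-digit card location followed by three integer columns (Total Rx Error, Total Rx Validated, Total Tx), as in the example in step 1. The helper names are hypothetical.

```python
import re

# Assumed row layout (from the step 1 example): a 4-digit card location
# followed by Total Rx Error, Total Rx Validated, and Total Tx.
CARD_ROW = re.compile(r"^\s*(\d{4})\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def rx_errors_by_card(report: str) -> dict[str, int]:
    """Map each card location in a captured report to its Total Rx Error count."""
    errors: dict[str, int] = {}
    for line in report.splitlines():
        match = CARD_ROW.match(line)
        if match:
            errors[match.group(1)] = int(match.group(2))
    return errors


def classify(errors: dict[str, int]) -> str:
    """Apply the single-versus-multiple rule from step 4."""
    failing = {card: count for card, count in errors.items() if count >= 1}
    if not failing:
        return "no errors"
    if len(failing) == 1 and sum(failing.values()) == 1:
        return "single error"
    return "multiple errors"


if __name__ == "__main__":
    captured = """
        CARD    Error      Validated        Tx
        1101         275         275       710
        1102           0         200       200
    """
    errors = rx_errors_by_card(captured)
    # Cards to query individually in step 5.
    for card in sorted(c for c, n in errors.items() if n >= 1):
        print(f"rept-stat-rtd:loc={card}")
    print(classify(errors))
```

    Run against these two rows from the step 1 example, the script prints the rept-stat-rtd:loc command for card 1101 and classifies the event as multiple errors, because card 1101 reports 275 Rx errors.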

  5. Issue the following command for each card reporting 1 or more errors in step 1:

    rept-stat-rtd:loc=xxxx

    Where xxxx is the card location determined from the output in step 1.

    The following is an example output of a card summary for card 1101.
    rept-stat-rtd:loc=1101
    
        RLGHNCXA21W 06-12-07 12:01:43 EST EAGLE 35.6.0
        Retrieving data from card …
    
        CARD SUMMARY: 1101     Last Alarm Timestamp: 06-12-07 12:01:43
            
                              MSU Validation Statistics                       
                         =================================   
      
                         Total Rx     Total Rx    Total Tx     
        SRC/DEST            Error    Validated 
        CARD 
        1102                  100          100         100
        1103                    0            0           0       
        1105                   75           75         360        
        1107                  100          100         200      
        1108                    0           50          50

  6. Issue the following command to clear the RTD statistics:

    rept-stat-rtd:reset=yes:force=yes
    The following is an example of the command output.
    rept-stat-rtd:reset=yes:force=yes
    
        RLGHNCXA21W 06-12-07 12:09:43 EST EAGLE 35.6.0
        Reset all RTD statistics sent to each card
    
    COMMAND COMPLETE
    

  7. Issue the following command with no parameters to obtain the Run-Time Diagnostic subsystem (RTD) report.

    rept-stat-rtd
    The following example output shows no alarms.
        RLGHNCXA21W 06-12-07 12:10:43 EST EAGLE 35.6.0
        Retrieving data from the cards…
    
        RTD  SUBSYSTEM REPORT IS-NR         Active     -----
        RTD  ALARM STATUS =  No Alarms
            
                  MSU Validation Statistics                        
                ==============================     
                Total Rx   Total Rx      Total      
        CARD    Error      Validated        Tx          
        1101           0         275       710
        1102           0         200       200
        1103           0         200      1000     
        1105           0        1360       275     
        1107           0         200       100     
        1108           0         100       100      
    

    In this example, the alarm cleared.
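    Step 9 depends on whether the alarm status in this report reads No Alarms. A minimal Python sketch of that check for a captured report, assuming the status line has the form shown in the examples in steps 1 and 7; the function names are hypothetical.

```python
import re
from typing import Optional

# Assumed status-line format, taken from the examples in steps 1 and 7,
# e.g. "RTD  ALARM STATUS =  No Alarms".
STATUS_LINE = re.compile(r"RTD\s+ALARM STATUS\s*=\s*(.+?)\s*$", re.MULTILINE)


def alarm_status(report: str) -> Optional[str]:
    """Return the text after 'RTD ALARM STATUS =', or None if the line is absent."""
    match = STATUS_LINE.search(report)
    return match.group(1) if match else None


def alarm_cleared(report: str) -> bool:
    """True when the captured report shows 'No Alarms', the condition checked in step 9."""
    return alarm_status(report) == "No Alarms"
```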

  8. Keep all command outputs and reports obtained during this procedure available.

    My Oracle Support (MOS) uses this information to determine the cause of the alarm and to monitor the system for errors.

  9. If the RTD alarm status reported in step 7 indicates that the alarm did not clear, continue with the next step. Otherwise, go to step 26.
  10. Enter the rtrv-log:dir=bkwd:snum=1355:num=10 command to retrieve the 10 latest UIM 1355 records.
  11. For each card location, count the total number of times it appeared in the 10 UIM 1355 samples collected in the previous step, either as the source or as the destination location.

    For example, if card 1102 appeared as the source location in 4 UIM 1355 samples, and card 1102 appeared as the destination location in 6 UIM 1355 samples, then card 1102 appeared a total of 10 times in 10 UIM 1355 samples.
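    A minimal Python sketch of the count in step 11, under the assumption that the source and destination card locations have already been read out of each UIM 1355 record into a (source, destination) pair. The record layout is not reproduced in this procedure, so that extraction is not shown, and the function name is hypothetical.

```python
from collections import Counter


def total_appearances(samples: list[tuple[str, str]]) -> Counter:
    """Count how often each card location appears as source or destination.

    `samples` holds one hypothetical (source, destination) pair per
    UIM 1355 record retrieved in step 10.
    """
    counts: Counter = Counter()
    for source, destination in samples:
        counts[source] += 1
        counts[destination] += 1
    return counts


if __name__ == "__main__":
    # Mirrors the example: card 1102 is the source in 4 samples and the
    # destination in 6 samples.
    samples = [("1102", "1105")] * 4 + [("1101", "1102")] * 6
    counts = total_appearances(samples)
    print(counts["1102"])  # 10
```

    In this sample set, card 1105 is counted 4 times and card 1101 is counted 6 times, so only card 1102 reaches 10.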

  12. If none of the card locations appeared exactly 10 times, notify My Oracle Support (MOS) of the alarm immediately. Otherwise, continue with the next step.
  13. If more than one card location appeared 10 times, go to step 18. If only one card location appeared 10 times, continue with the next step.
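    The branching in steps 12 and 13 can be summarized in a short Python sketch. It assumes each UIM 1355 record from step 10 has been reduced to a (source, destination) pair of card locations; the function name and the returned action strings are illustrative, not EAGLE output.

```python
from collections import Counter


def next_action(
    samples: list[tuple[str, str]], num_samples: int = 10
) -> tuple[str, list[str]]:
    """Return the next action from steps 12-13 and the card locations involved."""
    counts: Counter = Counter()
    for source, destination in samples:
        counts[source] += 1
        counts[destination] += 1
    # Card locations that appeared exactly num_samples times.
    suspects = sorted(card for card, n in counts.items() if n == num_samples)
    if not suspects:
        return "notify MOS immediately (step 12)", []
    if len(suspects) == 1:
        return "inhibit the card (step 14)", suspects
    return "go to step 18", suspects
```

    With the sample set from the step 11 example (card 1102 as source 4 times and as destination 6 times), only card 1102 reaches 10, so the sketch points to step 14 for that card.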
  14. Inhibit the card location that appeared 10 times.
  15. Enter the rept-stat-rtd:reset=yes:force=yes command to reset the RTD alarm.
  16. Enter the rept-stat-rtd command to verify the RTD alarm status.
  17. If the previous steps did not clear the RTD alarm, notify My Oracle Support (MOS) of the alarm immediately. Otherwise, go to step 26.
  18. Inhibit the card location that appeared 10 times as the source card location in the UIM 1355 samples collected in step 10.
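    When more than one location appeared 10 times, step 18 needs the card that appeared 10 times specifically as the source. A hedged Python sketch that keeps the source and destination counts separate, again assuming each UIM 1355 record has been reduced to a hypothetical (source, destination) pair. It also returns any card that appeared in every sample as the destination, the other candidate in this branch.

```python
from collections import Counter


def role_suspects(
    samples: list[tuple[str, str]], num_samples: int = 10
) -> tuple[list[str], list[str]]:
    """Return cards seen num_samples times as source, and as destination."""
    sources = Counter(source for source, _ in samples)
    destinations = Counter(destination for _, destination in samples)
    as_source = sorted(c for c, n in sources.items() if n == num_samples)
    as_destination = sorted(c for c, n in destinations.items() if n == num_samples)
    return as_source, as_destination
```

    For example, if all 10 samples show card 1105 sending to card 1107, the sketch returns (["1105"], ["1107"]), and card 1105 is the one to inhibit in this step.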
  19. Enter the rept-stat-rtd:reset=yes:force=yes command to reset the RTD alarm.
  20. Enter the rept-stat-rtd command to verify the RTD alarm status.
  21. If the previous steps did not clear the RTD alarm, allow the previously inhibited card location to return it to service. Otherwise, go to step 26.
  22. Inhibit the card location that appeared 10 times as the destination card location in the UIM 1355 samples collected in step 10.
  23. Enter the rept-stat-rtd:reset=yes:force=yes command to reset the RTD alarm.
  24. Enter the rept-stat-rtd command to verify the RTD alarm status.
  25. If the previous steps did not clear the RTD alarm, notify My Oracle Support (MOS) of the alarm immediately. Otherwise, continue with the next step.
  26. Notify My Oracle Support (MOS) of the alarm within 1 business day, and provide captures of the recovery steps performed and all necessary system logs (UIM, UAM, seculog, trouble, obit, etc.) covering the incident.