Oracle Lightweight Availability Collection Tool User Guide

Troubleshooting

This section explains the troubleshooting steps for the Oracle Lightweight Availability Collection Tool.

Oracle Lightweight Availability Collection Tool Error Messages

This section lists the errors logged by the Oracle Lightweight Availability Collection Tool, their functional meaning, and the actions to take when these errors are displayed either in /var/adm/messages or on the screen.
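As a rough aid for finding these messages in a saved copy of the system log, the following Python sketch filters lines by the tags that appear in the messages listed in this section. The marker strings are taken from the message texts below; the function name lwact_lines is hypothetical, and the sketch makes no assumption about the rest of the log line format.

```python
# Substrings that appear in the messages documented in this section.
MARKERS = ("[tictimed]", "[logtime]", "LWACT")


def lwact_lines(lines):
    """Return the log lines that mention any of the LWACT-related markers."""
    return [line.rstrip("\n") for line in lines
            if any(marker in line for marker in MARKERS)]


if __name__ == "__main__":
    # Illustrative sample only; real log lines carry additional fields.
    sample = [
        "[tictimed] Daemon instance already running\n",
        "unrelated kernel message\n",
        "LWACT is already running\n",
    ]
    for line in lwact_lines(sample):
        print(line)
```

To use it on a real log, pass an open file object instead of the sample list.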

[logtime] Default causecode [XX] at LWACT configuration file is invalid

Indicates that the /etc/default/lwact file contains an invalid cause code entry. The user can set up to three levels of default cause codes for outage events in this file. The cause code level that contains the incorrect entry is logged in the error message within square brackets ([]); that is, XX can be [L1CC], [L2CC], or [L3CC], depending on which level of cause code is invalid.

Action: Enter a valid set of cause codes in the L1CC, L2CC, and L3CC fields in the /etc/default/lwact file. Use the logtime -M command to get the list of valid cause codes for all three levels.
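The check described above can be sketched in Python as follows. This is only an illustration under stated assumptions: it assumes the file uses KEY=VALUE lines with # comments (the layout is not specified in this guide), the function names are hypothetical, and the sets of valid codes must be supplied by the caller, for example after reading the logtime output by hand. The sample values are made up.

```python
LEVELS = ("L1CC", "L2CC", "L3CC")


def read_default_cause_codes(text):
    """Collect the L1CC, L2CC and L3CC values from KEY=VALUE text (assumed layout)."""
    found = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() in LEVELS:
            found[key.strip()] = value.strip()
    return found


def invalid_levels(found, valid_codes):
    """Return the levels whose configured value is not in the caller-supplied valid set."""
    return [level for level, value in found.items()
            if value not in valid_codes.get(level, set())]


if __name__ == "__main__":
    # Hypothetical file content and hypothetical valid codes.
    sample = "# defaults\nL1CC=1\nL2CC=2\nL3CC=99\n"
    valid = {"L1CC": {"1"}, "L2CC": {"2"}, "L3CC": {"3"}}
    print(invalid_levels(read_default_cause_codes(sample), valid))
```

The returned level names correspond to the XX value that appears in the error message.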

Invalid event number

Indicates that a user has tried to modify the cause code for an invalid event number; that is, a non-outage event. Users can modify or assign cause codes only for halt and panic outage events.

Action: Use the ltreport -v command to display the list of outage events along with their corresponding event numbers.

Invalid Level-X cause code: Invalid cause code entered

Indicates that a user has entered an invalid cause code for the level displayed in the message. X can be 1, 2, or 3.

Action: For each level 1 cause code, there is a corresponding umbrella of level 2 and level 3 cause codes under it. The only valid cause codes for a level are those listed under that umbrella. To obtain the list of valid cause codes, use the logtime -m command.

[logtime] event entry X was modified

Indicates that a user has successfully modified event number X, where X is the event number shown in the message.

Action: No action is required. Informational only.

[tictimed]: stopping on SIGTERM or SIGPWR

This message is logged when the Oracle Lightweight Availability Collection Tool terminates (for example, in the case of pkgrm).

Action: No action is required. Informational only.

[tictimed] Daemon instance already running

Indicates that a user has tried to start the tictimed daemon when it is already running.

Action: No action is required. Informational only.

[tictimed] Catastrophic file error - zero length

LWACT is removing the zero byte file and starting afresh. Occurs when the availability datagram file becomes 0 bytes in size for an unknown reason.

Action: For installations earlier than LWACT 3.2, remove the zero byte file; tictimed will recreate it. For LWACT 3.2 or later, no action is required, because LWACT automatically removes the zero byte file.
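For installations earlier than LWACT 3.2, the manual cleanup could be sketched in Python as shown here. The function name remove_if_empty is hypothetical, and the path of the datagram file is deliberately left as a parameter because it is not stated in this section. The sketch only removes a file that exists and is exactly zero bytes long.

```python
import os


def remove_if_empty(path):
    """Delete path if it is an existing zero-byte regular file.

    Returns True when the file was removed, False otherwise.
    """
    if os.path.isfile(path) and os.path.getsize(path) == 0:
        os.remove(path)
        return True
    return False
```

A non-empty file is left untouched, so the function is safe to run against a healthy datagram.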

[tictimed] datagram file corruption detected

The entire message is as follows:


[tictimed] datagram file corruption detected. LWACT is quarantining the 
corrupted file and starting afresh. If required user can pick up the uncorrupted
datagram file from the last run explorer output in-order to avoid considerable
data loss.

Whenever the Availability datagram is found to be corrupted, the Oracle Lightweight Availability Collection Tool automatically quarantines it in the same folder where the Availability datagram is located, using a filename of the format lwact_corrupted_<UTC at which the corruption was detected> (for example, lwact_corrupted_1208531225). Quarantining the Availability datagram causes data loss in the Oracle Lightweight Availability Collection Tool: old data, collected before the file corruption occurred, is not taken into account by the tool during the availability calculation.

Action: In order to minimize this data loss, you can manually obtain the uncorrupted copy of the datagram from the previous Explorer image.
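When choosing which Explorer image predates the corruption, it can help to turn the suffix of the quarantined filename into a readable date. The Python sketch below assumes that the UTC value in the filename is a count of seconds since 1970-01-01 00:00:00 UTC, which the example value suggests but this guide does not state explicitly. The function name quarantine_time is hypothetical.

```python
from datetime import datetime, timezone

PREFIX = "lwact_corrupted_"


def quarantine_time(filename):
    """Return the detection time encoded in a quarantined datagram filename.

    Assumes the suffix is seconds since the Unix epoch, in UTC.
    """
    if not filename.startswith(PREFIX):
        raise ValueError("not a quarantined datagram name: " + filename)
    seconds = int(filename[len(PREFIX):])
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


if __name__ == "__main__":
    print(quarantine_time("lwact_corrupted_1208531225").isoformat())
    # 2008-04-18T15:07:05+00:00
```

An Explorer image taken before the printed time would be the candidate source for an uncorrupted copy.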

[tictimed] Unable to update timestamp on log file

If the Availability datagram is lost or deleted, tictimed, which periodically updates the timestamp on the log file, cannot carry out this update and therefore logs this error message. A few possible cases where this error can occur are the following:

Action: No action is required. tictimed will automatically recreate the file afresh if it does not find it.

Attempting to start LWACT. Respawning inittab

Indicates that a user has attempted to start LWACT manually using the init script.

Action: No action is required. Informational only.

LWACT is already running

Indicates that a user has attempted to start LWACT when it was already running.

Action: No action is required. Informational only.

LWACT is going down

Indicates that a user has attempted to stop LWACT manually using the init script.

Action: No action is required. Informational only.

**ATTENTION** Event generation not in chronological order. It can affect availability metrics

The entire message is as follows:


**ATTENTION** Event generation not in chronological order. It can affect 
availability metrics. Sudden fall back in system date may have caused this. Check and 
correct system date. Otherwise, quarantine current datagram to start monitoring 
availability afresh. 

Occurs when availability events are recorded out of sequence in the availability datagram. Out-of-sequence events can occur after a sudden fallback in the system date (for example, the system shuts down today and boots back up with a date from last week). In such cases, LWACT detects the sudden shift in time and records this message, indicating the exact time when the system fell back in time. The affected system can report incorrect availability metrics.

Action: Check and correct the system date, or quarantine the current datagram to start monitoring the availability of the system afresh. Note that old availability metrics are lost when the datagram is quarantined.
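The out-of-sequence condition can be illustrated with a short Python sketch. It is not LWACT's own detection logic and does not read the datagram, whose format is not described here; it simply takes a list of event timestamps (hypothetical values, in recording order) and reports the first one that is earlier than its predecessor.

```python
def first_out_of_order(timestamps):
    """Return the index of the first timestamp that is earlier than the one
    recorded before it, or None if the sequence never goes backwards."""
    for i in range(1, len(timestamps)):
        if timestamps[i] < timestamps[i - 1]:
            return i
    return None


if __name__ == "__main__":
    # Hypothetical event times in seconds; the third event falls back in time.
    events = [1000, 2000, 1500, 3000]
    print(first_out_of_order(events))  # 2
```

In this example the event at index 2 corresponds to the point where the system date fell back.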

Failed to list SAVECORE dir contents

Indicates that the SAVECORE directory contains no core dumps, and therefore LWACT was unable to list the contents of this directory.

Action: No action is required. Informational only.