3 Diagnostics through SNMP Traps

Simple Network Management Protocol (SNMP) is a protocol for network management services. Network management software typically uses SNMP to query or control the state of network devices such as routers and switches. These devices sometimes also generate asynchronous alerts, called traps, to inform the management systems of problems.

The following sections describe SNMP and traps in TimesTen:

TimesTen and SNMP

TimesTen cannot be queried or controlled through SNMP. TimesTen only sends SNMP traps for certain critical events, so that users can build recovery mechanisms around them. TimesTen can send traps for the following events:

  • Assertion failure

  • Death of daemons

  • Database invalid

  • Replicated transaction failure

  • Database out of space

  • Autorefresh transaction failure

  • Replication conflict resolution

  • File write errors

These events also cause log entries to be written by the TimesTen daemon, but exposing them through SNMP traps allows for the possibility of having some network management software take immediate action.

SNMP data types

Each TimesTen SNMP variable is either an integer (ASN_INTEGER) or text (ASN_OCTET_STRING).

The following variables are of type ASN_INTEGER:

  • ttPid

  • ttDSNConn

  • ttDSCurSize

  • ttDaeInst

  • ttRepReceiverPort

  • ttDSReqSize

  • ttDaePid

  • ttDSMaxSize

  • ttCacheAgentPid

The rest of the variables are ASN_OCTET_STRING type.

Trapping out of space messages

By default, TimesTen records that database space is low based on the partition space thresholds set by the PermWarnThreshold and TempWarnThreshold attributes. For example, if PermWarnThreshold, which defines the threshold for the permanent memory partition, is set to 90, TimesTen records a message that the permanent partition is low on space when it becomes 90% full. Once permanent memory usage falls 10% below the set threshold, which in this case would be 80% full, TimesTen records a second message indicating that the database is no longer low on space.

When connecting to a database, you can change the out of space threshold by setting the PermWarnThreshold and TempWarnThreshold attributes. See "PermWarnThreshold" and "TempWarnThreshold" in Oracle TimesTen In-Memory Database Reference.
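The two-message behavior described above amounts to a threshold with hysteresis. The following Python fragment is a minimal sketch of that logic, not TimesTen code: the function name and the clear_margin parameter are hypothetical, the value 90 is taken from the example above, and whether the "no longer low" message fires at exactly 80% or just below it is an assumption.

```python
def next_space_state(is_low, percent_used, warn_threshold=90, clear_margin=10):
    """Return the new low-space flag after observing percent_used.

    warn_threshold plays the role of PermWarnThreshold or TempWarnThreshold;
    clear_margin is the 10-point drop described in the text (assumed inclusive).
    """
    if not is_low and percent_used >= warn_threshold:
        return True    # threshold reached: "low on space" message
    if is_low and percent_used <= warn_threshold - clear_margin:
        return False   # dropped to 80% or less: "no longer low" message
    return is_low      # no transition, so no new message


state = False
history = []
for used in (70, 91, 85, 80, 89):
    state = next_space_state(state, used)
    history.append(state)
print(history)
```

Note that usage of 85% and 89% produces no transition in either direction: the flag changes only when usage reaches the threshold or drops a full 10 points below it.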

How TimesTen sends SNMP traps

SNMP traps are UDP/IP packets. Therefore, there is no guarantee of delivery, and it is not an error if there are no subscribers for the trap. TimesTen sends only SNMPv1 traps, which all network management systems should understand.
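Because traps are plain UDP datagrams, a receiver only has to bind a UDP socket and read from it. The Python sketch below shows that step alone; it does not decode the SNMPv1 message, which requires an ASN.1/BER parser or network management software. The function names are hypothetical, and the sketch binds to an operating-system-chosen port rather than 162 so that it can run without special privileges.

```python
import socket


def open_trap_listener(host="127.0.0.1", port=0, timeout=2.0):
    """Bind a UDP socket for incoming traps.

    Port 0 asks the operating system for any free port; a real receiver
    would bind the port configured as the trap destination.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    sock.bind((host, port))
    return sock


def receive_datagram(sock, max_size=2048):
    """Return (payload, sender_address) for one datagram, or None on timeout.

    UDP gives no delivery guarantee, so a timeout is a normal outcome.
    """
    try:
        return sock.recvfrom(max_size)
    except socket.timeout:
        return None
```

A caller would typically loop on receive_datagram and hand each payload to an SNMP decoder. The 2048-byte buffer is an arbitrary choice that comfortably exceeds the trap size limit described later in this chapter.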

Generating and receiving SNMP traps

To enable SNMP trap generation, change the line -enabled 0 in the snmp.ini file to -enabled 1. TimesTen does not generate SNMP traps by default because repeated failures, such as an application that keeps trying to insert new rows into a full database, can slow the application down while the traps are generated.
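For scripted deployments, the same change can be applied to the file contents programmatically. The following Python sketch is one possible approach, not a TimesTen utility; the function name is hypothetical, and appending an -enabled line when none exists is a design choice of this sketch.

```python
import re


def set_trap_generation(ini_text, enabled):
    """Return ini_text with its -enabled line set to 1 or 0."""
    value = "1" if enabled else "0"
    # Match a whole "-enabled 0" or "-enabled 1" line; (?m) makes ^ and $ per-line.
    pattern = r"(?m)^-enabled[ \t]+[01][ \t]*$"
    new_text, count = re.subn(pattern, "-enabled " + value, ini_text)
    if count == 0:
        # No -enabled line found: append one instead of relying on the default.
        new_text = ini_text.rstrip("\n") + "\n-enabled " + value + "\n"
    return new_text


sample = '#Enable SNMP trap generation\n-enabled 0\n-community "public"\n'
print(set_trap_generation(sample, True))
```

The function works on text rather than on a file path so that the caller decides how to read the file and write it back.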

You must have network management software to receive SNMP traps.

Configuring the snmp.ini file

For root installations, the configuration file is /var/TimesTen/snmp.ini on UNIX systems and install_dir\srv\info\snmp.ini on Windows systems. This file enables or disables trap generation and sets the community string for SNMP traps, the target host, and the target port on which the receiver listens for traps.

Note:

For non-root installations, the file is install_dir/snmp.ini, where install_dir represents the path of the TimesTen installation.

The file contents are:

Component Description
-enabled {0|1} Disable or enable SNMP trap generation.
-community {string} The SNMP community string. Default is "public."
-trapdest {host:portnumber} The SNMP agent hostname and port number where SNMP trap messages are received. The default host is "localhost." The default port number where the SNMP agent listens is 162.

Up to 8 destinations may be specified in the snmp.ini file.

-trapport {portnumber} To receive SNMP traps on the local machine when you do not want to use the default port, specify the portnumber with the -trapport option. The default port number is 162. If neither -trapdest nor -trapport is specified, traps are sent to the default destination, which is localhost on the IPv4 loopback address and port 162.

You must be root to access the default SNMP port number. If you are not root, modify the port number to one that you can access.


An optional environment variable, TT_SNMP_INI, can override the location of the snmp.ini file. If this variable is set, it should contain the full path of the SNMP sender configuration file, which can have a name other than snmp.ini.
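A monitoring script that needs to read the same file can apply the same lookup order. The Python sketch below illustrates this for a root installation on UNIX; the function name is hypothetical, and the code shows how a script might mirror the rule, not how TimesTen itself resolves the path.

```python
import os

# Default location for root installations on UNIX, as stated above.
DEFAULT_SNMP_INI = "/var/TimesTen/snmp.ini"


def snmp_ini_path(environ=None, default=DEFAULT_SNMP_INI):
    """Return the configuration file path, honoring TT_SNMP_INI if it is set."""
    env = os.environ if environ is None else environ
    override = env.get("TT_SNMP_INI")
    # An unset or empty variable falls back to the default location.
    return override if override else default
```

Passing the environment as a parameter keeps the function easy to test without modifying the real process environment.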

Example 3-1

To send messages and set one target destination, your snmp.ini file looks like this:

#Enable SNMP trap generation
-enabled 1
#Default community is "public"
-community "public"
#Default trap destination is "localhost" and default destination SNMP trap port is 162
-trapdest "localhost:162"

Example 3-2

To send messages and set multiple target destinations, your snmp.ini file looks like this:

#Enable SNMP trap generation
-enabled 1
#Default community is "public"
-community "public"
#Default trap destination is "localhost" and default destination SNMP trap port is 162
-trapdest "localhost:162"
-trapdest "pluto:10999"
-trapdest "mymachine:189"

Example 3-3

To disable trap generation, your snmp.ini file looks like this:

#Disable SNMP trap generation
-enabled 0
#Default community is "public"
-community "public"
#Default trap destination is "localhost" and default destination SNMP trap port is 162
-trapdest "localhost:162"

If one or more of the options is not specified, or if the snmp.ini file is missing, then the default value for each option is used.
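The file format and its defaults can be summarized in a short parser. The following Python code is a sketch under stated assumptions: it treats lines beginning with # as comments, handles only the -enabled, -community, and -trapdest options (it ignores -trapport), and assumes host names or IPv4 addresses in -trapdest. The function name and the returned dictionary layout are hypothetical.

```python
import shlex

MAX_DESTINATIONS = 8   # limit stated in the option table above
DEFAULT_PORT = 162


def parse_snmp_ini(text):
    """Parse snmp.ini-style text into a dict, applying the documented defaults."""
    config = {"enabled": False, "community": "public", "trapdest": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # also strips the double quotes around values
        option = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if option == "-enabled":
            config["enabled"] = (value == "1")
        elif option == "-community":
            config["community"] = value
        elif option == "-trapdest":
            host, _, port = value.partition(":")
            config["trapdest"].append((host, int(port) if port else DEFAULT_PORT))
    if len(config["trapdest"]) > MAX_DESTINATIONS:
        raise ValueError("at most 8 trap destinations may be specified")
    if not config["trapdest"]:
        config["trapdest"] = [("localhost", DEFAULT_PORT)]  # documented default
    return config


example = '''#Enable SNMP trap generation
-enabled 1
-community "public"
-trapdest "localhost:162"
-trapdest "pluto:10999"
-trapdest "mymachine:189"
'''
config = parse_snmp_ini(example)
print(config["trapdest"])
```

Parsing an empty string returns the defaults: trap generation disabled, community "public", and a single destination of localhost on port 162.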

Trap truncation on overflow

The maximum packet size of a single trap is 1024 bytes. If there is more data than can fit within the 1024-byte limit, the trap is truncated to fit. In this case, the trap contains a ttTrapTruncated OID set to 1.
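A trap handler can test for this marker before relying on the message text. The Python sketch below assumes that the trap has already been decoded into a dictionary keyed by variable name; that dictionary shape and the function name are hypothetical, and decoding itself is outside the scope of the sketch.

```python
def is_truncated(varbinds):
    """Return True if the decoded trap carries ttTrapTruncated set to 1.

    varbinds is assumed to map variable names to decoded values.
    """
    return varbinds.get("ttTrapTruncated") == 1


complete = {"ttPid": 127, "ttDSNConn": 2}
cut_off = {"ttPid": 127, "ttTrapTruncated": 1}
print(is_truncated(complete), is_truncated(cut_off))
```

When a trap is truncated, the corresponding log entry written by the TimesTen daemon is the place to look for the full message.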

Note:

You can use the UCD-SNMP Perl module from CPAN (http://www.cpan.org/) to receive and act upon SNMP traps.

The TimesTen MIB

A Management Information Base (MIB) is like a database schema. It describes the structure of the SNMP data. For more information about MIBs in general, please refer to the previously mentioned SNMP overview documents.

The MIB extension file, install_dir/mibs/TimesTen-MIB.txt, describes the structure of the TimesTen SNMP information.

The TimesTen OID is rooted at Private Enterprise 5549. The complete path to root is iso.org.dod.internet.private.enterprise.TimesTen.* or numerically, 1.3.6.1.4.1.5549.*.
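A receiver that handles traps from several products can use this root to recognize TimesTen traps. The Python sketch below compares OIDs component by component, which avoids false matches such as enterprise 55490; the function names are hypothetical, and the sub-identifiers after 5549 in the usage lines are made up for illustration.

```python
# 1.3.6.1.4.1.5549 = iso.org.dod.internet.private.enterprise.TimesTen
TIMESTEN_ROOT = (1, 3, 6, 1, 4, 1, 5549)


def parse_oid(text):
    """Convert dotted text such as '1.3.6.1' or '.1.3.6.1' to a tuple of ints."""
    return tuple(int(part) for part in text.strip(".").split("."))


def is_timesten_oid(oid_text):
    """Return True if the OID is the TimesTen root or lies beneath it."""
    oid = parse_oid(oid_text)
    return oid[:len(TIMESTEN_ROOT)] == TIMESTEN_ROOT


print(is_timesten_oid("1.3.6.1.4.1.5549.1.2"))   # hypothetical sub-OID
print(is_timesten_oid("1.3.6.1.4.1.55490.1"))    # different enterprise number
```

A plain string prefix test would wrongly accept the second OID, which is why the comparison is done on integer tuples.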

The traps

Every trap has a GMT timestamp of when the trap occurred, as well as the process ID, the user name (or user ID on UNIX systems) of the process, the TimesTen instance name, the TimesTen release number, and a trap-specific message. In addition, most traps provide information specific to that trap. For example, ttRepAgentDiedTrap also provides the replication store ID. For a list of the variables for each trap, see the TimesTen-MIB.txt file.

Trap names and severity levels

TimesTen SNMP traps can be categorized by severity level. Each trap has one of the following severity levels:

  • Informational

  • Warning

  • Error

Table 3-1 describes each trap and its severity level.

Table 3-1 Trap Description and Severity Levels

Trap name Severity level Description

ttAssertFailTrap

Error

TimesTen Assertion Failure

ttAsyncMVFailed

Warning

A refresh of an asynchronous materialized view failed. The SNMP trap includes dsname, daemon PID and viewid. If the failure is due to a transient error, such as locking, the next refresh may succeed.

ttCacheAgentDiedTrap

Error

TimesTen IMDB Cache daemon died.

ttCacheAgentFailoverTrap

Warning

The Cache Agent detected that a connection to Oracle had been lost and has begun to recover the connection.

ttCacheAutoRefFailedTrap

Error

TimesTen IMDB Cache incremental autorefresh failed.

ttCacheAutorefreshDsMarkedDeadTrap

Warning

TimesTen IMDB Cache incremental autorefresh failed. The cache agent for a remote datastore has stopped or is no longer responding. Autorefresh for the remote datastore has been disabled.

ttCacheAwtRtReadFailedTrap

Error

For Asynchronous Writethrough cache groups, runtime information is stored on the Oracle instance. While reading this information from Oracle, replication either could not find the runtime data table (tt_version_reppeers) or could not find the information within the table.

ttCacheAwtRtUpdateFailedTrap

Error

For Asynchronous Writethrough cache groups, runtime information is stored on the Oracle instance. While updating this information, replication either could not find the runtime data table (tt_version_reppeers) or could not find the information within the table.

ttCacheCgNotAutorefreshedTrap

Warning

The cache group will not be autorefreshed. Instead, it must be manually recovered by performing manual load or refresh cache group.

ttCacheLowOracleTblSpace

Warning

The tablespace the cache admin user is using is below the minimum threshold.

ttCacheRecoveryAutorefreshTrap

Warning

The Cache Agent is performing a full autorefresh. This may be needed when a change log table on Oracle was truncated because of lack of tablespace for the cache administration user.

ttCacheValidationAbortedTrap

Error

The Cache Agent aborted cache group validation because of a fatal error. Please refer to the user error log for details.

ttCacheValidationErrorTrap

Error

The Cache Agent has detected fatal anomalies with cache group cache-group-name that will prevent it from properly refreshing the cache group, or it has detected fatal anomalies within the refresh interval time-in-ms. Please refer to the user error log for details.

ttCacheValidationWarningTrap

Warning

The Cache Agent has detected anomalies with cache group cache-group-name that may prevent it from properly refreshing the cache group. Please refer to the user error log for details.

ttDSCkptFailedTrap

Error

A checkpoint has failed. Check the user error log and view the checkpoint history using the built-in procedure ttCkptHistory.

ttDaemonOutOfMemoryTrap

Error

Call to malloc failed in TimesTen daemon.

ttDSDataCorruptionTrap

Error

Database corruption error has occurred.

ttDSGoingInvalidTrap

Error

Setting database to invalid state. Database invalidation usually happens when an application that is connected to the database is killed or exits abruptly without first disconnecting from the database. If TimesTen encounters an unrecoverable internal error during a database operation, it may also invalidate the database. You must commit or roll back and recover the database.

ttDSThreadCreateFailedTrap

Error

A process (typically multi-threaded) having multiple connections to a database exits abnormally. The subdaemon assigned to clean up the connections creates a separate thread for each connection. If creation of one of these threads fails, this trap is thrown. Thread creation may fail due to memory limitations or having too many threads in the system. After the trap is thrown, the thread creation is attempted four more times, with an increasingly longer pause between each attempt. The total time between the first and last attempt is approximately 30 seconds. If the fifth attempt fails, the database is invalidated.

ttFileWriteErrorTrap

Error

Error encountered during file I/O write.

ttMainDaemonDiedTrap

Error

Main or sub daemons died abnormally. This message is sent by a subdaemon when it notices that the main daemon has died. It suggests that the main daemon has been killed or has crashed. You must restart the main daemon.

ttMainDaemonExitingTrap

Informational

Main or sub daemons exiting normally.

ttMainDaemonReadyTrap

Informational

Main daemon has started.

ttMsgLogOpenFailedTrap

Error

The message log could not be opened, possibly because of a lack of privileges on the file. Check the file location and privileges.

ttPartitionSpaceExhaustedTrap

Error

Database partition (permanent or temporary) space is exhausted. This message is sent when either the permanent or temporary free space in the database is exhausted. Generally this message is preceded by the ttPartitionSpaceStateTrap warning message. See "PermWarnThreshold" and "TempWarnThreshold" in Oracle TimesTen In-Memory Database Reference for information on how to set the threshold.

ttPartitionSpaceStateTrap

Warning

Database partition (permanent or temporary) space is transitioning from OK to low or vice versa. This message is sent when either the permanent partition or the temporary partition free space in the database reaches a threshold or transitions back below the threshold. This message is sent only when the free space has reached the threshold specified by the PermWarnThreshold or TempWarnThreshold attribute at the time of the first connection to the database. See "PermWarnThreshold" and "TempWarnThreshold" in Oracle TimesTen In-Memory Database Reference for information on how to set the threshold.

ttQueryThresholdWarnTrap

Warning

A SQL query exceeded the user-specified threshold. The text of the query can be found in the user log message. The Transaction ID and the Statement ID of the query can be found both in the trap and the user log message. After issuing the trap, the query continues executing.

ttRepAgentClockSkewTrap

Error

Replication with a peer failed due to excessive clock skew. The skew between nodes in an active standby scheme has exceeded the allowed limit of 250ms.

ttRepAgentDiedTrap

Error

A replication agent has died abnormally. This message is sent when the main TimesTen daemon notices that a replication agent has died abnormally. This generally means that the replication agent has been killed or has crashed.

ttRepAgentExitingTrap

Informational

Replication agent exiting normally.

ttRepAgentStartingTrap

Informational

Replication agent starting.

ttRepCatchupStartTrap

Warning

Indicates that TimesTen has begun to restore a master from a subscriber where bi-directional replication has been configured, after a failure.

ttRepCatchupStopTrap

Warning

Indicates that TimesTen has restored a master database from a subscriber, where bi-directional replication was configured.

ttRepConflictReportStartingTrap

Informational

Indicates that conflict reporting has been restarted because the rate of conflicts has fallen below the low water mark set in the replication scheme. This trap also indicates how many conflicts went unreported during the period in which reporting was suspended.

ttRepConflictReportStoppingTrap

Informational

Indicates that suspension of conflict reporting has occurred because the rate of conflicts has exceeded the high water mark set in the replication scheme.

ttRepReturnTransitionTrap

Warning

Replication return receipt has been enabled or disabled on the subscriber.

ttRepSubscriberFailedTrap

Error

Subscriber marked as failed because too much log accumulated on its behalf by the master.

ttRepSubscriberTCPConnectFailedTrap

Error

A replication TCP connection failed.

ttRepUpdateFailedTrap

Warning

A replication insert, update or delete operation failed.

ttUnexpectedEndOfLogTrap

Error/Warning

Premature end of log file reached during a database recovery. If your application connected with LogAutoTruncate=1 (the default), this trap represents a warning, and recovery continues with error messages. If your application connected with LogAutoTruncate=0, this trap represents an error, and recovery fails with error messages.
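Network management software can use the severity levels in Table 3-1 to decide how to react to each trap. The Python sketch below illustrates one way to do this with a lookup table. It includes only a few of the traps from the table, and the action names, the function name, and the decision to treat unlisted traps as errors are assumptions of the sketch.

```python
# Severity levels copied from Table 3-1 for a subset of traps.
SEVERITY = {
    "ttAssertFailTrap": "Error",
    "ttDSGoingInvalidTrap": "Error",
    "ttPartitionSpaceExhaustedTrap": "Error",
    "ttPartitionSpaceStateTrap": "Warning",
    "ttQueryThresholdWarnTrap": "Warning",
    "ttRepUpdateFailedTrap": "Warning",
    "ttMainDaemonReadyTrap": "Informational",
    "ttRepAgentStartingTrap": "Informational",
}

# Hypothetical reactions; substitute whatever your operations process uses.
ACTIONS = {"Error": "page", "Warning": "ticket", "Informational": "log"}


def route_trap(trap_name):
    """Return (severity, action) for a trap name.

    Traps missing from SEVERITY are escalated as errors so that an
    incomplete table never hides a serious event.
    """
    severity = SEVERITY.get(trap_name, "Error")
    return severity, ACTIONS[severity]


print(route_trap("ttPartitionSpaceStateTrap"))
```

A trap such as ttUnexpectedEndOfLogTrap, whose severity depends on the LogAutoTruncate setting, would need its own rule rather than a single table entry.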


Example

A typical TimesTen trap may supply the following information:

Enterprise Specific Trap (ttDSGoingInvalidTrap) Uptime: 4:34:16
enterprises.timesten.ttSystem.ttTimeStamp = "2002-07-20 22:24:49 (GMT)"
enterprises.timesten.ttSystem.ttPid = 127
enterprises.timesten.ttSystem.ttUid = "SYSTEM"
enterprises.timesten.ttSystem.ttVersion = "@(#) TimesTen Revision: 11.2.1.0.0 Date: 2008/07/07 18:24:10, instance giraffe"
enterprises.timesten.ttMsg.ttMesg = "Data store going Invalid (from master daemon)"
enterprises.timesten.ttDataStore.ttDSName = "tptbmdata1121"
enterprises.timesten.ttDataStore.ttDSShmKey = "DBI39775920.0.SHM.12"
enterprises.timesten.ttDataStore.ttDSNConn = 2

This trap was generated from a TimesTen daemon running on a Windows system. The Uptime field, which is required by SNMP, lists the elapsed time since the start of the process which generated this trap. In this case, the process ttsrv1121.exe has been running for 4 hours, 34 minutes, and 16 seconds.

This specific trap is for the database going invalid event. In addition to the common fields, it reports the database name, the shared memory key of the database, and the number of current connections to the database.
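If your trap receiver prints traps in a textual form like the one above, a short script can turn the output into name and value pairs. The Python code below is a sketch that assumes one variable per line in the form name = value; the exact layout depends on the receiving software, and the function name is hypothetical. Quoted values are kept as text and unquoted values are converted to integers, in line with the ASN_OCTET_STRING and ASN_INTEGER types described earlier.

```python
def parse_trap_text(text):
    """Parse 'dotted.name = value' lines into a dict keyed by the last name part."""
    fields = {}
    for line in text.splitlines():
        name, separator, value = line.partition(" = ")
        if not separator:
            continue  # header line or any line without "name = value"
        key = name.strip().rsplit(".", 1)[-1]
        value = value.strip()
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            fields[key] = value[1:-1]      # text value: drop the quotes
        else:
            try:
                fields[key] = int(value)   # integer value
            except ValueError:
                fields[key] = value        # leave anything else unchanged
    return fields


sample = '''Enterprise Specific Trap (ttDSGoingInvalidTrap) Uptime: 4:34:16
enterprises.timesten.ttSystem.ttPid = 127
enterprises.timesten.ttSystem.ttUid = "SYSTEM"
enterprises.timesten.ttDataStore.ttDSName = "tptbmdata1121"
enterprises.timesten.ttDataStore.ttDSNConn = 2
'''
fields = parse_trap_text(sample)
print(fields["ttDSName"], fields["ttDSNConn"])
```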