As an administrator, you must ensure that once an application is up and running, it continues to meet the performance, availability, and security requirements set by your company. To perform this task, you need to monitor the resources (such as shared memory), activities (such as transactions), and potential problems (such as security breaches) in your configuration, and take any necessary corrective actions.
To help you meet this responsibility, the Oracle Tuxedo system provides several methods for monitoring system and application events, and dynamically reconfiguring your system to improve performance. The following facilities offer an excellent view of how your system is working:
These tools help make your application capable of responding quickly and efficiently to changing business needs or failure conditions. They also assist you in managing your application’s performance and security.
Figure 2-1 shows the monitoring tools.
The Oracle Tuxedo system offers the following tools to monitor your application:
The Oracle Tuxedo system enables you to monitor system and application data.
To help you monitor a running system, your Oracle Tuxedo system maintains parameter settings and generates statistics for the following system components:
You can access these components using the MIB or tmadmin. You can set up your system so that it can use the statistics in the bulletin board to make decisions and to modify system components dynamically, without your intervention. With proper configuration, your system can perform the following tasks (when bulletin board statistics indicate that they are required):
By monitoring the administrative data for your system, you can prevent and resolve problems that threaten the performance, availability, and security of your application.
To ensure that you have the information necessary to monitor your system, the Oracle Tuxedo system provides the following three data repositories:
UBBCONFIG—a text file in which you define the parameters of your system and application
You can monitor two types of administrative data that are available on every running Oracle Tuxedo system: static and dynamic.
Static data about your configuration consists of configuration settings that you assign when you first configure your system and application. These settings never change without intervention (either in real time or through a program you have provided). Examples include system-wide parameters (such as the number of machines used) and the amount of interprocess communication (IPC) resources (such as shared memory) allocated to your system on your local machine. Static data is kept in the UBBCONFIG file and in the bulletin board.
At times you may need to check static data about your configuration. For example, you may want to add a large number of machines without exceeding the maximum number of machines allowed in your configuration (or allowed in the machine tables of the bulletin board). You can look up the maximum number of machines allowed by checking the current values of the system-wide parameters for your configuration (one of which is
You may be able to improve the performance of your application by tuning your system. To determine whether tuning is required, you need to check the amount of local IPC resources currently available.
Dynamic data about your configuration consists of information that changes in real time, that is, while an application is running. For example, the load (the number of requests sent to a server) and the state of various configuration components (such as servers) change frequently. Dynamic data is kept in the bulletin board.
Dynamic configuration data is useful in resolving many administrative problems, as demonstrated by two examples.
In the first example, suppose your throughput is suffering and you want to know whether you have enough servers running to accommodate the number of clients currently connected. Check the number of running servers and connected clients, and the load on one or more servers. These numbers help you determine whether adding more servers will improve performance.
In the second example, suppose you receive multiple complaints about slow response from users when making particular requests of your application. By checking load statistics, you can determine whether increasing the value of the BLOCKTIME parameter would improve response time.
When evaluating whether your Oracle Tuxedo system is operating normally, you might want to consider the following list of common startup and shutdown problems, and monitor your system periodically.
IPCKEY is already in use
TLOG file is not created
To monitor a running application, you need to keep track of the
UBBCONFIG file when necessary. The method you choose depends on the following factors:
RESOURCES section of the UBBCONFIG file through the tmadmin command, you will have access to only the current values.
Table 2-1 describes how to use each monitoring method.
(for example, ULOG, TLOG)
Specifying a tracing expression that contains a category, a filtering expression, and an action, and enabling the
The Oracle Administration Console is a graphical user interface to the MIB that enables you to tune and modify your application. It is accessed through the World Wide Web and used through a Web browser. Any administrator with a supported browser can monitor an Oracle Tuxedo application.
The toolbar is a row of 12 buttons that allow you to run tools for frequently performed administrative and monitoring functions. All buttons are labeled with both icons and names. The following buttons are available for monitoring:
To monitor your application through the command-line interface, use the tmadmin(1) or txrpt(1) command.
The tmadmin command is an interpreter for 53 commands that enable you to view and modify a bulletin board and its associated entities. Using the tmadmin commands, you can monitor statistical information in the system such as the state of services, the number of requests executed, the number of queued requests, and so on.
Using the tmadmin commands, you can also dynamically modify your Oracle Tuxedo system. You can, for example, perform the following types of changes while your system is running:
Whenever you start a tmadmin session, you can choose one of the following operating modes for that session: the default operating mode, read-only mode, or configuration mode:
tmadmin session, if you have administrator privileges (that is, if your effective UID and GID are those of the administrator).
tmadmin process attaches to the bulletin board as a client, leaving your administrator slot available for other work.
tmadmin session in configuration mode on any machine, including an inactive machine. On most inactive machines, configuration mode is required in order to run tmadmin. (The only inactive machine on which you can start a tmadmin session without requesting configuration mode is the
|Note:||You can also generate a report of the Oracle Tuxedo version and license numbers.|
The txrpt command analyzes the standard error output of an Oracle Tuxedo server and provides a summary of service processing time within the server. The report shows the number of times each service was dispatched and the average amount of time it took for each service to process a request during the specified period.
txrpt takes its input from the standard input or from a standard error file redirected as input. To create standard error files, have your servers invoked with the -r option from the servopts(5) selection; you can name the file by specifying it with the servopts option. Multiple files can be concatenated into a single input stream for
Over time, information about service X and server Y (on which service X resides) is accumulated in a file.
txrpt processes the file and provides you with a report about the service access and timing characteristics of the server.
The tmadmin command is an interpreter for 53 commands that enable you to view and modify a bulletin board and its associated entities. Figure 2-2 shows you how a typical tmadmin session works.
Following is a list of run-time system functions that you can monitor with
The Oracle Tuxedo EventBroker monitors a running application for events (for example, a state change in a MIB object, such as the transition of a client from active to inactive). When the EventBroker detects an event, it reports or posts the event, and then notifies relevant subscribers that the event has occurred. You can be informed automatically when events occur in the MIB by receiving FML data buffers representing MIB objects. To post the event and report it to subscribers, the EventBroker uses the tppost(3c) function. Both administrators and application processes can subscribe to events.
The EventBroker recognizes over 100 meaningful state transitions to a MIB object as system events. A posting for a system event includes the current MIB representation of the object on which the event occurred, and some event-specific fields that identify the event that occurred. For example, if a machine is partitioned, an event is posted with the following:
To use the EventBroker, you simply subscribe to system events.
To help you identify error conditions quickly and accurately, the Oracle Tuxedo system provides the following log files:
TLOG is created only on machines involved in Oracle Tuxedo global transactions.
The ULOGMILLISEC environment variable causes ULOG messages to be time-stamped in milliseconds instead of seconds. The ULOGRTNSIZE environment variable specifies the size of rotation files. For more information on ULOGRTNSIZE, see userlog(3c) in the Oracle Tuxedo Command Reference.
These logs are maintained and updated constantly while your application is running.
The transaction log (TLOG) keeps track of global transactions during the commit phase. At the end of the first phase of a 2-phase commit protocol, the participants in a global transaction issue a reply to the question of whether to commit or roll back the transaction. This reply is recorded in the
The TLOG file is used only by the Transaction Manager Server (TMS) that coordinates global transactions. It is not read by the administrator. The location and size of the TLOG are specified by four parameters that you set in the MACHINES section of the
You must create a TLOG on each machine that participates in global transactions.
The user log (ULOG) is a file to which all messages generated by the Oracle Tuxedo system (error messages, warning messages, information messages, and debugging messages) are written. Application clients and servers can also write to the user log. A new log is created every day and there can be a different log on each machine. However, a ULOG can be shared by multiple machines when a remote file system is being used.
The ULOG provides an administrator with a record of system events from which the causes of most Oracle Tuxedo system and application failures can be determined. You can view the ULOG, a text file, with any text editor. The ULOG also contains messages generated by the tlisten process. The tlisten process provides remote service connections for other machines in an application. Each machine, including the master machine, should have a tlisten process running on it.
The Oracle Tuxedo log files can help you detect failures in both your application and your system by:
The TLOG is a binary file that contains only messages about global transactions that are in the process of being committed. To view the TLOG, you must first convert it to text format so that it is readable. The Oracle Tuxedo system provides two tmadmin operations to do this:
loadtlog commands are also useful when you need to move the TLOG between machines as part of a server group migration or machine migration.
Use the MIB T_TRANSACTION class to obtain the runtime transaction attributes within the system. The
pt) can also be used to display this information. Information about each group in the transaction is printed only if tmadmin is running in verbose mode as set by a previous
Any serious errors during the transaction commit process, such as a failure while writing the TLOG, are written to the
On each active machine in an application, the Oracle Tuxedo system maintains a log file that contains Oracle Tuxedo system error messages, warning messages, debugging messages, or other helpful information. This file is called the user log or
The ULOG simplifies the job of finding errors returned by the Oracle Tuxedo ATMI, and provides a central repository in which the Oracle Tuxedo system and applications can store error information.
You can use the information in the ULOG to identify the cause of system or application failures. Multiple messages about a given problem can be placed in the user log. Generally, earlier messages provide more useful diagnostic information than later messages.
In Listing 2-1, message 358 from the LIBTUX_CAT catalog identifies the cause of the trouble reported in subsequent messages: there are not enough UNIX system semaphores to boot the application.
.1.0:LIBTUX_CAT:262: std main starting
.1.0:LIBTUX_CAT:358: reached UNIX limit on semaphore ids
.1.0:LIBTUX_CAT:248: fatal: system init function ...
.1.0:CMDTUX_CAT:825: Process BBL at SITE1 failed ...
.1.0:WARNING: No BBL available on site SITE1.
Will not attempt to boot server processes on that site.
|Note:||System Messages contains complete descriptions of user log messages and recommendations for any actions that should be taken to resolve the problems indicated.|
Part of the ULOG records error messages generated by the tlisten process. You can view tlisten messages using any text editor. Each machine, including the MASTER machine, runs a separate tlisten process. Though separate tlisten logs are maintained in the ULOG on each machine, they can be shared across remote file systems.
tlisten process failures.
tlisten is used, during the boot process, by tmboot and, while an application is running, by
tlisten messages are created as soon as the tlisten process is booted. Whenever a tlisten process failure occurs, a message is recorded in the
|Note:||Application administrators are responsible for analyzing the
The Oracle Tuxedo System Messages CMDTUX Catalog contains the following information about
Consider the following example of a tlisten message in the
121449.gumby!simpserv.27190.1.0: LIBTUX_CAT:262: std main starting
A ULOG message consists of a tag and text. The tag consists of the following:
|Note:||Placeholders are printed in the
The text consists of the following:
|Note:||You can find this message in the Oracle Tuxedo System Messages LIBTUX Catalog.|
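The tag-and-text layout of the sample message above can be split mechanically. The following Python sketch is one way to do it, not part of the Oracle Tuxedo system. The field names (time, machine, process, pid, thread_id, context_id) are assumptions read off the sample line, and the pattern does not handle the longer time stamps produced when ULOGMILLISEC is set.

```python
import re
from typing import NamedTuple, Optional


class UlogRecord(NamedTuple):
    # Field names are assumptions inferred from the sample message.
    time: str          # six digits, as written in the tag
    machine: str       # text between the first "." and "!"
    process: str       # name of the process that logged the message
    pid: int
    thread_id: int     # placeholder value in single-threaded processes
    context_id: int    # placeholder value in single-context processes
    catalog: str       # for example LIBTUX_CAT
    msg_number: int
    text: str


_ULOG_RE = re.compile(
    r"^(?P<time>\d{6})\."
    r"(?P<machine>[^!]+)!"
    r"(?P<process>.+?)\.(?P<pid>\d+)\.(?P<thread>\d+)\.(?P<context>\d+):\s*"
    r"(?P<catalog>[A-Za-z0-9_]+):(?P<num>\d+):\s*"
    r"(?P<text>.*)$"
)


def parse_ulog_line(line: str) -> Optional[UlogRecord]:
    """Return the parsed record, or None if the line is not a catalog message."""
    m = _ULOG_RE.match(line.strip())
    if m is None:
        return None
    return UlogRecord(
        time=m.group("time"),
        machine=m.group("machine"),
        process=m.group("process"),
        pid=int(m.group("pid")),
        thread_id=int(m.group("thread")),
        context_id=int(m.group("context")),
        catalog=m.group("catalog"),
        msg_number=int(m.group("num")),
        text=m.group("text"),
    )
```

Lines that do not carry a catalog name and message number, such as free-form continuation lines, yield None so that a caller can skip them or attach them to the preceding record.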
An Oracle Tuxedo application server can generate a log of the service requests it handles. The log is written to the server’s standard error (stderr). Each record contains a service name, start time, and end time.
You can request such a log when a server is activated. The txrpt facility produces a summary of the time spent by the server, thus giving you a way to analyze the log output. Using this data, you can estimate the relative workload generated by each service, which will help you set workload parameters appropriately for the corresponding services in the MIB.
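The kind of summary described here (how often each service was dispatched and how long it took on average) can be illustrated with a short Python sketch. This is not txrpt itself and does not read the real log format. It assumes the records have already been reduced to hypothetical (service name, start time, end time) tuples, with times in seconds.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize_services(
    records: Iterable[Tuple[str, float, float]]
) -> Dict[str, Tuple[int, float]]:
    """Map each service name to (dispatch count, average elapsed time)."""
    totals = defaultdict(lambda: [0, 0.0])  # service -> [count, total elapsed]
    for service, start, end in records:
        entry = totals[service]
        entry[0] += 1
        entry[1] += end - start
    return {svc: (count, total / count) for svc, (count, total) in totals.items()}
```

A service with a high dispatch count and a long average time accounts for a larger share of the server's work, which is the kind of comparison the workload parameters are meant to reflect.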
There are essentially two operations you can perform using the MIB: you can get information from the MIB (a get operation) or you can update information in the MIB (a set operation) at any time using a set of ATMI functions (for example, tpalloc(3c), tprealloc(3c), tpcall(3c), tpacall(3c), tpgetrply(3c), tpenqueue(3c), and tpdequeue(3c)).
When you query the MIB with a get operation, the MIB responds to your request with a number of matches, and indicates how many more objects match your request. The MIB returns a handle (that is, the cursor) that you can use to get the remaining objects. The operation you use to get the next set of objects is called getnext. This third operation is needed when the results of a query span multiple buffers.
When you query the MIB, which is a virtual database, you are selecting a set of records from the database table. You can control the size of the database table in two ways: by controlling the number of objects about which you want information, or by controlling the amount of information about each object. Using key fields and filters, you can limit the scope of your request to data that is meaningful for your needs. The more limits you specify, the less information is requested from the application, and the faster the data is provided to you.
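The get and getnext exchange, together with the effect of key fields and filters, can be pictured with a small in-memory model. The Python sketch below is only an analogy and does not call the MIB. The function mib_get, its parameters, and the page size are hypothetical, and the attribute names in the usage example are illustrative.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, object]


def mib_get(
    table: Sequence[Record],
    keys: Dict[str, object],
    fields: Optional[Sequence[str]] = None,
    page_size: int = 2,
    cursor: int = 0,
) -> Tuple[List[Record], int, Optional[int]]:
    """Return (page of objects, number still to fetch, cursor for the next call).

    keys   limits WHICH objects match (fewer rows).
    fields limits WHAT is returned about each object (fewer columns).
    """
    matches = [r for r in table if all(r.get(k) == v for k, v in keys.items())]
    page = list(matches[cursor:cursor + page_size])
    if fields is not None:
        page = [{f: r[f] for f in fields if f in r} for r in page]
    next_cursor = cursor + len(page)
    remaining = len(matches) - next_cursor
    # A cursor is handed back only while there is something left to fetch.
    return page, remaining, (next_cursor if remaining > 0 else None)
```

Calling the function again with the returned cursor plays the role of getnext. The narrower the keys and fields, the smaller each reply, which mirrors the point that tighter limits return data faster.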
Data in the MIB is stored in a number of different places. Some data is replicated on more than one machine in a distributed application. Other data is not replicated, but is local to particular machines based on the nature of the data or the object represented.
Global data is information about application components such as servers that is replicated on every machine in an application. Most of the data about a server, for example, such as information about its configuration and state, is replicated globally throughout an application, specifically in every bulletin board. An Oracle Tuxedo application can access this information from anywhere.
For example, from any machine in an application called Customer Orders, the administrator can find out that server B6 belongs to Group 1, runs on machine CustOrdA, and is active.
Other information is not replicated globally, but is local to an entity, such as statistics for a server. An example of a local attribute is TA_TOTREQC, which defines the number of times services have been processed in a specified server. This statistic is stored with the server on its host machine. When the server accepts and processes a service request, the counter is incremented. Because this kind of information is managed locally, replicating it would inhibit your system’s performance.
There are also classes in the MIB that are exclusively local, such as clients. When a client logs in, the Oracle Tuxedo system creates an entry for it in the bulletin board, and records all tracking information about the client in that entry. The MIB can determine the state of the client at any time by checking this entry.
The Oracle Tuxedo system provides a programming interface that offers direct access to the MIB while your application is not running. This interface, the tpadmcall function, gives the application direct access to the data upon which the MIB is based.
tpadmcall allows you access to a subset of information that is local to your process.
Use tpadmcall when you need to query the system or make administrative changes while your system is not running.
tpadmcall queries the TUXCONFIG file on behalf of your request. Data buffers that you put in, and data buffers that you receive (containing your queries and the replies to them) are exactly the same.
ud32 is a client program delivered with the Oracle Tuxedo system that reads input consisting of text representation of
You can use ud32 for ad hoc queries and updates to the MIB. It creates an FML32 buffer, makes a service call with the buffer, receives a reply (also in an FML32 buffer) from the service call, and displays the results on screen or in a file in text format.
ud32 builds an FML32-type buffer with the FML fields and values that you represent in text format, makes a service call to the identified service in the buffer, and waits for the reply. The reply then comes back in FML32 format as a report. Now, because the MIB is
ud32 becomes the scripting tool for the MIB.
For example, suppose you write a small file that contains the following text:
service name=.tmib and ta_operation=get,
When you type this file into ud32, you receive an FML output buffer listing all the data in the system about the servers.
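A request file of this kind can also be generated by a script. The Python sketch below assumes the usual ud32 text layout: one field per line, the field name and value separated by a tab, and a blank line ending the buffer. The helper build_ud32_request is hypothetical. The field names SRVCNM and TA_CLASS and the value T_SERVER in the usage example are assumptions that go beyond the text above, so check them against the ud32 and MIB reference pages before use.

```python
from typing import Iterable, Tuple


def build_ud32_request(fields: Iterable[Tuple[str, str]]) -> str:
    """Render (field name, value) pairs as one ud32-style text buffer."""
    lines = []
    for name, value in fields:
        # Tabs and newlines are structural in this layout, so reject them in data.
        if not name or "\t" in name or "\n" in name or "\n" in value:
            raise ValueError(f"field cannot be represented on one line: {name!r}")
        lines.append(f"{name}\t{value}")
    # The trailing blank line marks the end of the buffer.
    return "\n".join(lines) + "\n\n"


if __name__ == "__main__":
    # Assumed request: ask the .TMIB service for all objects of the server class.
    print(build_ud32_request([
        ("SRVCNM", ".TMIB"),
        ("TA_OPERATION", "GET"),
        ("TA_CLASS", "T_SERVER"),
    ]), end="")
```

The generated text can be saved to a file and supplied to ud32 as its input, which keeps ad hoc MIB queries repeatable.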
The Oracle Tuxedo system provides a run-time and user-level tracing facility that enables you to track the execution of distributed business applications. The system has a set of built-in trace points that mark calls to functions in different categories, such as ATMI functions issued by the application or XA functions issued by the Oracle Tuxedo system to an X/Open compliant resource manager.
To enable tracing, you must specify a tracing expression that contains a category, a filtering expression, and an action. The category indicates the type of function (such as ATMI) to be traced. The filtering expression specifies which particular functions trigger an action. The action indicates the response to the specified functions by the Oracle Tuxedo system.
The system may, for example, write a record in the ULOG, execute a system command, or terminate a trace process. A client process can also propagate the tracing facility with its requests. This capability is called dyeing; the trace dye colors all services that are called by the client.
You can specify a tracing expression in the following ways.
TMTRACE run-time environment variable
For a simple tracing expression, define TMTRACE=on in the environment of the client. This expression enables tracing of ATMI functions on the client and on any server that performs a service on behalf of that client. The trace records are written to the
You can also specify a tracing expression in the environment of a server using the utrace tmtrace(5) receivers. For example, you might enter the following:
TMTRACE=atmi:/tpservice/ulog. If you export this setting within a server environment, a record with general run-time trace information is created in the ULOG file for all service requests performed on that server.
TMTRACE=atmi:utrace. Specifying the utrace receiver automatically calls the user-defined tputrace(3c). If you export this setting within a server environment, a record with trace information and output location defined by the user is created for the ATMI functions running on that server.
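The two server-side expressions above follow the same shape: a category, an optional filtering expression between slashes, and an action. The Python sketch below splits only that shape, as an aid to reading such settings. It is not the parser used by the Oracle Tuxedo system, the function name is hypothetical, and it deliberately rejects shorthand values such as on and any richer syntax that tmtrace(5) may allow.

```python
import re
from typing import Optional, Tuple

# category ":" [ "/" filter "/" ] action   (covers only the two forms shown above)
_SPEC_RE = re.compile(
    r"^(?P<category>[^:]+):(?:/(?P<filter>.*)/)?(?P<action>[A-Za-z]+)$"
)


def split_trace_expression(spec: str) -> Tuple[str, Optional[str], str]:
    """Split a simple tracing expression into (category, filter, action)."""
    m = _SPEC_RE.match(spec)
    if m is None:
        raise ValueError(f"not a simple category:[/filter/]action expression: {spec!r}")
    return m.group("category"), m.group("filter"), m.group("action")
```

With this reading, atmi:/tpservice/ulog traces the ATMI category, limits the trace to calls matching tpservice, and writes to the ULOG, while atmi:utrace has no filter and hands every ATMI trace record to the user-defined routine.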
You can activate or deactivate the tracing option using the changetrace command of tmadmin. This command enables you to overwrite the tracing expression on active client or server processes. Administrators can enable global tracing for all clients and servers, or for a particular machine, group, or server.
The Oracle Tuxedo system uses the following two administrative servers to distribute the information on the bulletin board to all active machines in the application:
DBBL—the Distinguished Bulletin Board Liaison server propagates global changes to the MIB and maintains the static part of the MIB. Specifically, the DBBL:
BBL—the Bulletin Board Liaison server maintains the bulletin board on its host machine, coordinating changes to the local MIB, and verifying the integrity of application programs active on its machine. Specifically, the bulletin board:
Figure 2-3 shows the diagnosis and repair using the DBBL and BBLs.
Both servers have a role in managing faults. The DBBL coordinates the state of other active machines in the application. Each BBL communicates state changes in the MIB, and sometimes sends a message to the DBBL indicating all is OK on its host machine.
The Oracle Tuxedo run-time system records events, along with system errors, warnings, and tracing events, in the user log (ULOG). Programmers can use the ULOG to debug their applications or notify administrators of special conditions or states found (for example, an authorization failure).
Using ATMI, a programmer controls some of the more global aspects of communications. ATMI provides functions for handling both application and system-related errors. When a service routine encounters an application error, such as an invalid account number, the client knows the service performed its task but could not fulfill its request because of an application error.
With a system failure, such as a server crashing while performing a request, the client knows the service routine did not perform its task because of an underlying system error. The Oracle Tuxedo system notifies programs of system errors that occur as it monitors the application’s behavior and its own behavior.
At times, a service may get stuck in an infinite loop while processing a request. The client waits, but no reply is forthcoming. To protect a client from endless waiting, the Oracle Tuxedo system has two types of configurable timeout mechanisms: blocking timeouts and transaction timeouts. For more information about these timeout mechanisms, refer to Using the Oracle Tuxedo Domains Component.
A blocking timeout is a mechanism that ensures a blocked program waits no longer than the specified timeout value for something to occur. Once a timeout is detected, the waiting program is alerted with a system error informing it that a blocking timeout has occurred. The blocking timeout defines the duration of service requests, or how long the application is willing to wait for a reply to a service request. The timeout value is a global value defined in the BLOCKTIME field of the RESOURCES section of the
A transaction timeout is another type of timeout that can occur because active transactions tend to be resource-intensive. A transaction timeout defines the duration of a transaction, which may involve several service requests. The timeout value is defined when the transaction is started (with tpbegin(3c)). Transaction timeouts are useful when maximizing resources. For example, if database locks are held while a transaction progresses, an application programmer may want to limit the amount of time that the application’s transaction resources are held up. A transaction timeout always overrides a blocking timeout.
There are two UBBCONFIG file transaction timeout parameters:
TRANTIME, which is specified in the SERVICES section of the UBBCONFIG and controls the timeout value for a specific
MAXTRANTIME, which is specified in the RESOURCES section of the UBBCONFIG and is used by the administrator to place an upper bound on the timeout value of a transaction started via tpbegin(3c) or via an
For more information about these transaction timeout parameters, refer to File Formats, Data Descriptions, MIBs, and System Processes Reference.
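The role of MAXTRANTIME as an upper bound can be expressed as a small calculation. The Python sketch below is illustrative only. The function name is hypothetical, and it assumes that a value of 0 means "no limit" both for the requested timeout and for MAXTRANTIME, which should be confirmed against the reference pages.

```python
def effective_transaction_timeout(requested: int, maxtrantime: int = 0) -> int:
    """Return the timeout (in seconds) that would apply to a transaction.

    requested   -- timeout asked for when the transaction starts
    maxtrantime -- administrator's upper bound; 0 is assumed to mean "no bound"
    A result of 0 is likewise assumed to mean "no timeout limit".
    """
    if requested < 0 or maxtrantime < 0:
        raise ValueError("timeouts must be non-negative")
    if maxtrantime == 0:
        return requested          # no administrative cap in force
    if requested == 0:
        return maxtrantime        # an unlimited request is cut down to the cap
    return min(requested, maxtrantime)
```

For example, a transaction that asks for 120 seconds under a MAXTRANTIME of 60 would run with a 60-second timeout, while one that asks for 30 seconds is unaffected by the cap.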
You can handle some failure situations by configuring an application with redundant servers and the automatic restart capability. Redundant servers provide high availability, and can be used to handle large amounts of work, server failures, or machine failures. The Oracle Tuxedo system continually checks the status of active servers, and when it detects the failure of a restartable server, the system automatically creates a new instance of that server.
By configuring servers with the automatic restart property, you can handle individual server failures. You can also specify the number of restarts that the system will provide. This capability can prevent a recurring application error by limiting the number of times a server is restarted.
The Oracle Tuxedo system frequently checks the availability of each active machine. A machine is marked as partitioned when it cannot be reached by the system. If this occurs, a system event is generated. A partition can occur due to a network failure, machine failure, or severe performance degradation.
Therefore, application programmers should keep in mind the possibility that individual threads within a process may die. If one thread dies and a signal is issued, the whole process to which the thread belongs usually dies, and that death is detected by the BBL.
If a thread dies as the result of an erroneous call to a thread exit function, however, no signal is generated. If this type of death occurs before the thread calls tpterm(), then the BBL cannot detect the death and does not deallocate the registry table slot for the context associated with the dead thread. (It would not be proper for the BBL to deallocate this registry table slot even if it could detect the death of the thread because, in some application models, another thread might subsequently choose to associate itself with that context.)
|Note:||The information presented here applies to all multithreaded and/or multicontexted applications, regardless of which administrative tools are being used. The functionality is discussed from the point of view of an administrator using MIB calls, but is the same for an administrator using an interface to the MIB, whether that interface is tmadmin(1) or the Oracle Administration Console.|
You can obtain information about a multithreaded or multicontexted application by:
Information is available in the following locations:
T_SERVERCTXT class of the TM_MIB provides multiple instances of 14 fields if multiple server dispatch threads are active simultaneously. Specifically, the T_SERVERCTXT section includes an instance of each of the following fields for each active server dispatch thread:
For example, if 12 server dispatch threads are active simultaneously, then the T_SERVERCTXT class of the MIB for this application will include 12 occurrences of the TA_CONTEXTID field, 12 occurrences of the TA_SRVGRP field, and so on.
When multiple instances of T_SERVER class fields contain multiple values for different contexts of a multicontexted server, a “dummy” value is specified in the T_SERVER class field and the T_SERVERCTXT field contains an actual value for each context.