Administering an Oracle Tuxedo Application at Run Time


Monitoring Your Oracle Tuxedo Application


 


Ways to Monitor Your Application

As an administrator, you must ensure that once an application is up and running, it continues to meet the performance, availability, and security requirements set by your company. To perform this task, you need to monitor the resources (such as shared memory), activities (such as transactions), and potential problems (such as security breaches) in your configuration, and take any necessary corrective actions.

To help you meet this responsibility, the Oracle Tuxedo system provides several methods for monitoring system and application events, and dynamically reconfiguring your system to improve performance. The following facilities offer an excellent view of how your system is working:

These tools help make your application capable of responding quickly and efficiently to changing business needs or failure conditions. They also assist you in managing your application’s performance and security.

Figure 2-1 shows the monitoring tools.

Figure 2-1 Monitoring Tools


The Oracle Tuxedo system offers the following tools to monitor your application:

See Also

 


System and Application Data That You Can Monitor

The Oracle Tuxedo system enables you to monitor system and application data.

Monitoring System Data

To help you monitor a running system, your Oracle Tuxedo system maintains parameter settings and generates statistics for the following system components:

You can access these components using the MIB or tmadmin. You can set up your system so that it can use the statistics in the bulletin board to make decisions and to modify system components dynamically, without your intervention. With proper configuration, your system can perform the following tasks (when bulletin board statistics indicate that they are required):

By monitoring the administrative data for your system, you can prevent and resolve problems that threaten the performance, availability, and security of your application.

Where the System Data Resides

To ensure that you have the information necessary to monitor your system, the Oracle Tuxedo system provides the following three data repositories:

Monitoring Dynamic and Static Administrative Data

You can monitor two types of administrative data that are available on every running Oracle Tuxedo system: static and dynamic.

What Is Static Data?

Static data about your configuration consists of configuration settings that you assign when you first configure your system and application. These settings are never changed without intervention (either in realtime or through a program you have provided). Examples include system-wide parameters (such as the number of machines used) and the amount of interprocess communication (IPC) resources (such as shared memory) allocated to your system on your local machine. Static data is kept in the UBBCONFIG file and in the bulletin board.

Checking Static Data

At times you may need to check static data about your configuration. For example, you may want to add a large number of machines without exceeding the maximum number of machines allowed in your configuration (or allowed in the machine tables of the bulletin board). You can look up the maximum number of machines allowed by checking the current values of the system-wide parameters for your configuration (one of which is MAXMACHINES).
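The headroom check described above can be sketched in a few lines of Python. This is a minimal illustration, assuming you have the text UBBCONFIG file at hand and that it uses the usual layout of section headers starting with an asterisk followed by keyword and value pairs. The sample values, the function names read_resources and machines_headroom, and the machine name are all hypothetical. On a running system, tmadmin or the MIB reports the values actually in effect.

```python
def read_resources(ubbconfig_text):
    """Return the parameters of the *RESOURCES section as a dict of strings."""
    params = {}
    in_resources = False
    for raw in ubbconfig_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith("*"):
            # A line starting with '*' opens a new section.
            in_resources = line.upper() == "*RESOURCES"
            continue
        if in_resources:
            parts = line.split(None, 1)  # keyword, then the rest of the line
            if len(parts) == 2:
                params[parts[0].upper()] = parts[1].strip().strip('"')
    return params


def machines_headroom(ubbconfig_text, configured_machines):
    """How many more machines fit under MAXMACHINES (None if it is not set)."""
    limit = read_resources(ubbconfig_text).get("MAXMACHINES")
    if limit is None:
        return None
    return int(limit) - configured_machines


# Hypothetical configuration fragment, for illustration only.
SAMPLE = """\
*RESOURCES
IPCKEY      39211
MASTER      SITE1
MAXMACHINES 10
MODEL       MP

*MACHINES
gumby LMID=SITE1
"""

print(machines_headroom(SAMPLE, configured_machines=1))
```

A result of None means that the file does not set MAXMACHINES explicitly, in which case the system default applies.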

You may be able to improve the performance of your application by tuning your system. To determine whether tuning is required, you need to check the amount of local IPC resources currently available.

What Is Dynamic Data?

Dynamic data about your configuration consists of information that changes in realtime, that is, while an application is running. For example, the load (the number of requests sent to a server) and the state of various configuration components (such as servers) change frequently. Dynamic data is kept in the bulletin board.

Checking Dynamic Data

Dynamic configuration data is useful in resolving many administrative problems, as demonstrated by two examples.

In the first example, suppose your throughput is suffering and you want to know whether you have enough servers running to accommodate the number of clients currently connected. Check the number of running servers and connected clients, and the load on one or more servers. These numbers help you determine whether adding more servers will improve performance.

In the second example, suppose you receive multiple complaints about slow response from users when making particular requests of your application. By checking load statistics, you can determine whether increasing the value of the BLOCKTIME parameter would improve response time.

 


Common Startup and Shutdown Problems

When evaluating whether your Oracle Tuxedo system is operating normally, you might want to consider the following list of common startup and shutdown problems, and monitor your system periodically.

Common Startup Problems

Common Shutdown Problems

 


Selecting Appropriate Monitoring Tools

To monitor a running application, you need to keep track of the dynamic aspects of your configuration and sometimes check the static data. In other words, you need to be able to watch the bulletin board on an ongoing basis and consult the UBBCONFIG file when necessary. The method you choose depends on the following factors:

Table 2-1 describes how to use each monitoring method.

Table 2-1 How to Use Each Monitoring Method
Use This Method... | By...
Oracle Administration Console | Using a graphical interface.
Command-line utilities, such as txrpt and tmadmin | Entering commands after a prompt.
EventBroker | Subscribing to Oracle Tuxedo system events, such as servers dying and network failures.
Log files (for example, ULOG, TLOG) | Viewing the ULOG with any text editor; checking the ULOG for tlisten messages; and converting the TLOG (a binary file) to a text file by running the tmadmin dumptlog command, which dumps the TLOG to a text file.
MIB | Writing programs that monitor your run-time application.
Run-time and user-level tracing utility | Specifying a tracing expression that contains a category, a filtering expression, and an action, and enabling the TMTRACE run-time and TMUTRACE user-level environment variables. For more information, see Using the Run-time and User-level Tracing Utility.

 


Using the Oracle Administration Console to Monitor Your Application

The Oracle Administration Console is a graphical user interface to the MIB that enables you to tune and modify your application. You access it over the World Wide Web with a Web browser. Any administrator with a supported browser can monitor an Oracle Tuxedo application.

Using the Toolbar to Monitor Activities

The toolbar is a row of 12 buttons that allow you to run tools for frequently performed administrative and monitoring functions. All buttons are labeled with both icons and names. The following buttons are available for monitoring:

See Also

 


Using Command-line Utilities to Monitor Your Application

To monitor your application through the command-line interface, use the tmadmin(1) or txrpt(1) command.

Inspecting Your Configuration Using tmadmin

The tmadmin command is an interpreter for 53 commands that enable you to view and modify a bulletin board and its associated entities. Using the tmadmin commands, you can monitor statistical information in the system such as the state of services, the number of requests executed, the number of queued requests, and so on.

Using the tmadmin commands, you can also dynamically modify your Oracle Tuxedo system. You can, for example, perform the following types of changes while your system is running:

Whenever you start a tmadmin session, you can choose the following operating modes for that session: the default operating mode, read-only mode, or configuration mode:

Note: You can also generate a report of the Oracle Tuxedo version and license numbers.

Generating Reports on Servers and Services Using txrpt

The txrpt command analyzes the standard error output of an Oracle Tuxedo server and provides a summary of service processing time within the server. The report shows the number of times each service was dispatched and the average amount of time it took for each service to process a request during the specified period. txrpt takes its input from the standard input or from a standard error file redirected as input. To create standard error files, invoke your servers with the -r option described in servopts(5); you can name the file by specifying it with the -e servopts option. Multiple files can be concatenated into a single input stream for txrpt.

Over time, information about service X and server Y (on which service X resides) is accumulated in a file. txrpt processes the file and provides you with a report about the service access and timing characteristics of the server.

See Also

 


How a tmadmin Session Works

The tmadmin command is an interpreter for 53 commands that enable you to view and modify a bulletin board and its associated entities. Figure 2-2 shows you how a typical tmadmin session works.

Figure 2-2 Typical tmadmin Session

Monitoring Your System Using tmadmin Commands

Following is a list of run-time system functions that you can monitor with tmadmin commands:

See Also

 


Using EventBroker to Monitor Your Application

The Oracle Tuxedo EventBroker monitors a running application for events (for example, a state change in a MIB object, such as the transition of a client from active to inactive). When the EventBroker detects an event, it reports or posts the event, and then notifies relevant subscribers that the event has occurred. You can be informed automatically when events occur in the MIB by receiving FML data buffers representing MIB objects. To post the event and report it to subscribers, the EventBroker uses the tppost(3c) function. Both administrators and application processes can subscribe to events.

The EventBroker recognizes over 100 meaningful state transitions to a MIB object as system events. A posting for a system event includes the current MIB representation of the object on which the event occurred, and some event-specific fields that identify the event that occurred. For example, if a machine is partitioned, an event is posted with the following:

To use the EventBroker, you simply subscribe to system events.

See Also

 


Using Log Files to Monitor Activity

To help you identify error conditions quickly and accurately, the Oracle Tuxedo system provides the following log files:

These logs are maintained and updated constantly while your application is running.

See Also

 


What Is the Transaction Log (TLOG)?

The transaction log (TLOG) keeps track of global transactions during the commit phase. At the end of the first phase of a 2-phase commit protocol, the participants in a global transaction issue a reply to the question of whether to commit or roll back the transaction. This reply is recorded in the TLOG.

The TLOG file is used only by the Transaction Manager Server (TMS) that coordinates global transactions. It is not read by the administrator. The location and size of the TLOG are specified by four parameters that you set in the MACHINES section of the UBBCONFIG file.

You must create a TLOG on each machine that participates in global transactions.

See Also

 


What Is the User Log (ULOG)?

The user log (ULOG) is a file to which all messages generated by the Oracle Tuxedo system—error messages, warning messages, information messages, and debugging messages—are written. Application clients and servers can also write to the user log. A new log is created every day and there can be a different log on each machine. However, a ULOG can be shared by multiple machines when a remote file system is being used.

The ULOG provides an administrator with a record of system events from which the causes of most Oracle Tuxedo system and application failures can be determined. You can view the ULOG, a text file, with any text editor. The ULOG also contains messages generated by the tlisten process. The tlisten process provides remote service connections for other machines in an application. Each machine, including the master machine, should have a tlisten process running on it.

 


Detecting Errors Using Logs

The Oracle Tuxedo log files can help you detect failures in both your application and your system by:

Analyzing the Transaction Log (TLOG)

The TLOG is a binary file that contains only messages about global transactions that are in the process of being committed. To view the TLOG, you must first convert it to text format so that it is readable. The Oracle Tuxedo system provides two tmadmin operations to do this:

The dumptlog and loadtlog commands are also useful when you need to move the TLOG between machines as part of a server group migration or machine migration.

Detecting Transaction Errors

Use the MIB T_TRANSACTION class to obtain the runtime transaction attributes within the system. The tmadmin command printtrans (pt) can also be used to display this information. Information about each group in the transaction is printed only if tmadmin is running in verbose mode as set by a previous verbose (v) command.

Any serious errors during the transaction commit process, such as a failure while writing the TLOG, are written to the user log (ULOG).

Analyzing the User Log (ULOG)

On each active machine in an application, the Oracle Tuxedo system maintains a log file that contains Oracle Tuxedo system error messages, warning messages, debugging messages, or other helpful information. This file is called the user log or ULOG. The ULOG simplifies the job of finding errors returned by the Oracle Tuxedo ATMI, and provides a central repository in which the Oracle Tuxedo system and applications can store error information.

You can use the information in the ULOG to identify the cause of system or application failures. Multiple messages about a given problem can be placed in the user log. Generally, earlier messages provide more useful diagnostic information than later messages.

ULOG Message Example

In Listing 2-1, message 358 from the LIBTUX_CAT catalog identifies the cause of the trouble reported in subsequent messages: there are not enough UNIX system semaphores to boot the application.

Listing 2-1 Sample ULOG Messages
151550.gumby!BBL.28041.1.0: LIBTUX_CAT:262: std main starting 
151550.gumby!BBL.28041.1.0: LIBTUX_CAT:358: reached UNIX limit on semaphore ids
151550.gumby!BBL.28041.1.0: LIBTUX_CAT:248: fatal: system init function ...
151550.gumby!BBL.28040.1.0: CMDTUX_CAT:825: Process BBL at SITE1 failed ...
151550.gumby!BBL.28040.1.0: WARNING: No BBL available on site SITE1.
Will not attempt to boot server processes on that site.
Note: System Messages contains complete descriptions of user log messages and recommendations for any actions that should be taken to resolve the problems indicated.

Analyzing tlisten Messages in the ULOG

Part of the ULOG records error messages from the tlisten process. You can view tlisten messages using any text editor. Each machine, including the MASTER machine, runs a separate tlisten process. Though separate tlisten logs are maintained in the ULOG on each machine, they can be shared across remote file systems.

The ULOG records tlisten process failures. tlisten is used, during the boot process, by tmboot and, while an application is running, by tmadmin. tlisten messages are created as soon as the tlisten process is booted. Whenever a tlisten process failure occurs, a message is recorded in the ULOG.

Note: Application administrators are responsible for analyzing the tlisten messages in the ULOG, but programmers may also find it useful to check these messages.

The Oracle Tuxedo System Messages CMDTUX Catalog contains the following information about tlisten messages:

tlisten Message Example

Consider the following example of a tlisten message in the ULOG:

121449.gumby!simpserv.27190.1.0: LIBTUX_CAT:262: std main starting

A ULOG message consists of a tag and text. The tag consists of the following:

The text consists of the following:

Note: You can find this message in the Oracle Tuxedo System Messages LIBTUX Catalog.
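The tag and text layout of a ULOG message can be made concrete with a small Python parser. This is a sketch under stated assumptions: the field names (in particular thread and context for the last two numbers of the tag) are a reading of the sample messages shown in this topic and should be checked against the ULOG reference before you rely on them. The function name parse_ulog_line is hypothetical.

```python
import re

# Tag: hhmmss.machine!process.pid.thread.context:  (field names are assumptions)
TAG = re.compile(
    r"^(?P<time>\d{6})\.(?P<machine>[^!]+)!"
    r"(?P<process>[^:]+?)\.(?P<pid>\d+)\.(?P<thread>\d+)\.(?P<context>-?\d+): "
    r"(?P<text>.*)$"
)
# Text: CATALOG:number: message (the catalog prefix is not always present)
CATALOG = re.compile(r"^(?P<catalog>\w+):(?P<number>\d+): (?P<message>.*)$")


def parse_ulog_line(line):
    """Split one ULOG line into its fields; return None for continuation lines."""
    tag = TAG.match(line.rstrip("\n"))
    if tag is None:
        return None
    record = tag.groupdict()
    for key in ("pid", "thread", "context"):
        record[key] = int(record[key])
    text = record.pop("text")
    cat = CATALOG.match(text)
    if cat:
        record["catalog"] = cat.group("catalog")
        record["number"] = int(cat.group("number"))
        record["message"] = cat.group("message")
    else:
        record["catalog"] = None
        record["number"] = None
        record["message"] = text
    return record


rec = parse_ulog_line(
    "121449.gumby!simpserv.27190.1.0: LIBTUX_CAT:262: std main starting"
)
print(rec["machine"], rec["process"], rec["pid"], rec["catalog"], rec["number"])
```

Lines that do not start with a tag, such as the wrapped second line of a long message, are reported as None so that a caller can append them to the previous record.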

See Also

 


Estimating Service Workload Using the Application Service Log

An Oracle Tuxedo application server can generate a log of the service requests it handles. The log is written to the server’s standard error (stderr), which is the output that txrpt analyzes. Each record contains a service name, start time, and end time.

You can request such a log when a server is activated. The txrpt facility produces a summary of the time spent by the server, thus giving you a way to analyze the log output. Using this data, you can estimate the relative workload generated by each service, which will help you set workload parameters appropriately for the corresponding services in the MIB.
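To show the kind of arithmetic behind such an estimate, the following Python sketch summarizes a list of service records. It does not read the real server log and is not a replacement for txrpt: the records are hypothetical (service name, start, end) tuples with times in milliseconds, and the service names and the summarize function are invented for illustration.

```python
from collections import defaultdict


def summarize(records):
    """Return per-service dispatch count, average time and share of total time."""
    totals = defaultdict(lambda: [0, 0])  # service -> [count, total time]
    for service, start, end in records:
        totals[service][0] += 1
        totals[service][1] += end - start
    grand_total = sum(total for _, total in totals.values())
    summary = {}
    for service, (count, total) in totals.items():
        summary[service] = {
            "count": count,
            "average": total / count,
            # Fraction of all service time spent in this service.
            "share": total / grand_total if grand_total else 0.0,
        }
    return summary


# Hypothetical records: (service, start_ms, end_ms)
RECORDS = [
    ("DEPOSIT", 0, 200),
    ("DEPOSIT", 1000, 1400),
    ("INQUIRY", 2000, 2100),
]

for name, stats in sorted(summarize(RECORDS).items()):
    print(name, stats["count"], stats["average"], round(stats["share"], 2))
```

The share column gives a rough view of relative workload: a service that accounts for most of the processing time is a candidate for a heavier workload setting than one that is called often but finishes quickly.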

 


Using the MIB to Monitor Your Application

There are essentially two operations you can perform using the MIB: you can get information from the MIB (a get operation) or you can update information in the MIB (a set operation) at any time using a set of ATMI functions (for example, tpalloc(3c), tprealloc(3c), tpcall(3c), tpacall(3c), tpgetrply(3c), tpenqueue(3c), and tpdequeue(3c)).

When you query the MIB with a get operation, the MIB responds to your query with a number of matches, and indicates how many more objects match your request. The MIB returns a handle (that is, the cursor) that you can use to get the remaining objects. The operation you use to get the next set of objects is called getnext. This third operation is needed when the results of a query span multiple buffers.
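The get and getnext pattern can be illustrated with a toy model in Python. This is not the Tuxedo API: ToyMib is a hypothetical in-memory stand-in that returns one page of matches, the number of matches still outstanding, and a cursor, which is the shape of the exchange described above. A real client would make ATMI calls with FML32 buffers instead.

```python
class ToyMib:
    """In-memory stand-in for a MIB; NOT the Tuxedo API."""

    def __init__(self, objects, page_size):
        self._objects = list(objects)
        self._page_size = page_size
        self._cursors = {}  # cursor -> (matches, next offset)
        self._next_id = 0

    def get(self, predicate=lambda obj: True):
        """Start a query; return (page, remaining, cursor)."""
        matches = [o for o in self._objects if predicate(o)]
        return self._page(matches, 0)

    def getnext(self, cursor):
        """Continue a query from the cursor returned by get or getnext."""
        matches, offset = self._cursors.pop(cursor)
        return self._page(matches, offset)

    def _page(self, matches, offset):
        page = matches[offset:offset + self._page_size]
        remaining = len(matches) - offset - len(page)
        cursor = None
        if remaining > 0:
            # More objects match: hand back a cursor for the next page.
            cursor = "cursor-%d" % self._next_id
            self._next_id += 1
            self._cursors[cursor] = (matches, offset + len(page))
        return page, remaining, cursor


def fetch_all(mib, predicate=lambda obj: True):
    """Issue one get, then getnext until no cursor is returned."""
    page, remaining, cursor = mib.get(predicate)
    results = list(page)
    while cursor is not None:
        page, remaining, cursor = mib.getnext(cursor)
        results.extend(page)
    return results


mib = ToyMib(range(7), page_size=3)
print(fetch_all(mib))
```

The loop ends when the reply carries no cursor, which is the toy equivalent of the MIB reporting that no further objects match.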

Limiting Your MIB Queries

When you query the MIB, which is a virtual database, you are selecting a set of records from the database table. You can control the size of the database table in two ways: by controlling the number of objects about which you want information, or by controlling the amount of information about each object. Using key fields and filters, you can limit the scope of your request to data that is meaningful for your needs. The more limits you specify, the less information is requested from the application, and the faster the data is provided to you.

Querying Global and Local Data

Data in the MIB is stored in a number of different places. Some data is replicated on more than one machine in a distributed application. Other data is not replicated, but is local to particular machines based on the nature of the data or the object represented.

What Is Global Data?

Global data is information about application components such as servers that is replicated on every machine in an application. Most of the data about a server, for example, such as information about its configuration and state, is replicated globally throughout an application, specifically in every bulletin board. An Oracle Tuxedo application can access this information from anywhere.

For example, from any machine in an application called Customer Orders, the administrator can find out that server B6 belongs to Group 1, runs on machine CustOrdA, and is active.

What Is Local Data?

Other information is not replicated globally, but is local to an entity, such as statistics for a server. An example of a local attribute is TA_TOTREQC, which defines the number of times services have been processed in a specified server. This statistic is stored with the server on its host machine. When the server accepts and processes a service request, the counter is incremented. Because this kind of information is managed locally, replicating it would inhibit your system’s performance.

There are also classes in the MIB that are exclusively local, such as clients. When a client logs in, the Oracle Tuxedo system creates an entry for it in the bulletin board, and records all tracking information about the client in that entry. The MIB can determine the state of the client at any time by checking this entry.

Using tpadmcall to Access Information

The Oracle Tuxedo system provides a programming interface that offers direct access to the MIB while your application is not running. This interface, the tpadmcall function, gives the application direct access to the data upon which the MIB is based. tpadmcall allows you access to a subset of information that is local to your process.

Use tpadmcall when you need to query the system or make administrative changes while your system is not running. tpadmcall queries the TUXCONFIG file on behalf of your request. Data buffers that you put in, and data buffers that you receive (containing your queries and the replies to them), have exactly the same format as those used with the MIB of a running application.

See Also

 


Querying and Updating the MIB with ud32

ud32 is a client program delivered with the Oracle Tuxedo system that reads input consisting of text representation of FML buffers. You can use ud32 for ad hoc queries and updates to the MIB. It creates an FML32 buffer, makes a service call with the buffer, receives a reply (also in an FML32 buffer) from the service call, and displays the results on screen or in a file in text format.

ud32 builds an FML32-type buffer with the FML fields and values that you represent in text format, makes a service call to the service identified in the buffer, and waits for the reply. The reply then comes back in FML32 format as a report. Because the MIB is FML32-based, ud32 can serve as a scripting tool for the MIB.

For example, suppose you write a small file that contains the following text:

service name=.tmib and ta_operation=get, TACLASSES=T_SERVER

When you supply this file as input to ud32, you receive an FML output buffer listing all the data in the system about the servers.
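The request described above is written in shorthand. As a hedged illustration, the following Python helper renders field and value pairs into the text form that ud32 is generally understood to read: one field per line, the field name and value separated by a tab, and a blank line to end the buffer. The field names SRVCNM, TA_OPERATION, and TA_CLASS, and the exact layout, are assumptions that are not stated in this topic; verify them against the ud32 and MIB reference pages. The function name render_ud32_request is hypothetical.

```python
def render_ud32_request(service, fields):
    """Render one request buffer as tab-separated text ending in a blank line.

    Assumption: the service name travels in a field called SRVCNM.
    """
    lines = ["SRVCNM\t%s" % service]
    for name, value in fields:
        lines.append("%s\t%s" % (name, value))
    # The empty line after the last field marks the end of the buffer.
    return "\n".join(lines) + "\n\n"


request = render_ud32_request(
    ".TMIB",
    [("TA_OPERATION", "GET"), ("TA_CLASS", "T_SERVER")],
)
print(request, end="")
```

The resulting text can be saved to a file and redirected into ud32 on its standard input.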

 


Using the Run-time and User-level Tracing Utility

The Oracle Tuxedo system provides a run-time and user-level tracing facility that enables you to track the execution of distributed business applications. The system has a set of built-in trace points that mark calls to functions in different categories, such as ATMI functions issued by the application or XA functions issued by the Oracle Tuxedo system to an X/Open compliant resource manager.

To enable tracing, you must specify a tracing expression that contains a category, a filtering expression, and an action. The category indicates the type of function (such as ATMI) to be traced. The filtering expression specifies which particular functions trigger an action. The action indicates the response to the specified functions by the Oracle Tuxedo system.

The system may, for example, write a record in the ULOG, execute a system command, or terminate a trace process. A client process can also propagate the tracing facility with its requests. This capability is called dyeing; the trace dye colors all services that are called by the client.

You can specify a tracing expression in the following ways.

You can activate or deactivate the tracing option using the changetrace command of tmadmin. This command enables you to overwrite the tracing expression on active client or server processes. Administrators can enable global tracing for all clients and servers, or for a particular machine, group, or server.

See Also

 


Managing Errors Using the DBBL and BBLs

The Oracle Tuxedo system uses the following two administrative servers to distribute the information on the bulletin board to all active machines in the application:

Figure 2-3 shows the diagnosis and repair using the DBBL and BBLs.

Figure 2-3 Diagnosis and Repair Using the DBBL and BBLs


Both servers have a role in managing faults. The DBBL coordinates the state of other active machines in the application. Each BBL communicates state changes in the MIB, and sometimes sends a message to the DBBL indicating all is OK on its host machine.

The Oracle Tuxedo run-time system records events, along with system errors, warnings, and tracing events, in the user log (ULOG). Programmers can use the ULOG to debug their applications or notify administrators of special conditions or states found (for example, an authorization failure).

 


Using ATMI to Handle System and Application Errors

Using ATMI, a programmer controls some of the more global aspects of communications. ATMI provides functions for handling both application and system-related errors. When a service routine encounters an application error, such as an invalid account number, the client knows the service performed its task but could not fulfill its request because of an application error.

With a system failure, such as a server crashing while performing a request, the client knows the service routine did not perform its task because of an underlying system error. The Oracle Tuxedo system notifies programs of system errors that occur as it monitors the application’s behavior and its own behavior.

Using Configurable Timeout Mechanisms

At times, a service may get stuck in an infinite loop while processing a request. The client waits, but no reply is forthcoming. To protect a client from endless waiting, the Oracle Tuxedo system has two types of configurable timeout mechanisms: blocking timeouts and transaction timeouts. For more information about these timeout mechanisms, refer to Specifying Domains Transaction and Blocking Timeouts in Using the Oracle Tuxedo Domains Component.

A blocking timeout is a mechanism that ensures a blocked program waits no longer than the specified timeout value for something to occur. Once a timeout is detected, the waiting program is alerted with a system error informing it that a blocking timeout has occurred. The blocking timeout defines the duration of service requests, or how long the application is willing to wait for a reply to a service request. The timeout value is a global value defined in the BLOCKTIME field of the RESOURCES section of the TUXCONFIG file.

A transaction timeout is another type of timeout that can occur because active transactions tend to be resource-intensive. A transaction timeout defines the duration of a transaction, which may involve several service requests. The timeout value is defined when the transaction is started (with tpbegin(3c)). Transaction timeouts are useful when maximizing resources. For example, if database locks are held while a transaction progresses, an application programmer may want to limit the amount of time that the application’s transaction resources are held up. A transaction timeout always overrides a blocking timeout.

There are two UBBCONFIG file transaction timeout parameters:

For more information about these transaction timeout parameters, refer to UBBCONFIG(5) in File Formats, Data Descriptions, MIBs, and System Processes Reference.

Configuring Redundant Servers to Handle Failures

You can handle some failure situations by configuring an application with redundant servers and the automatic restart capability. Redundant servers provide high availability, and can be used to handle large amounts of work, server failures, or machine failures. The Oracle Tuxedo system continually checks the status of active servers, and when it detects the failure of a restartable server, the system automatically creates a new instance of that server.

By configuring servers with the automatic restart property, you can handle individual server failures. You can also specify the number of restarts that the system will provide. This capability can prevent a recurring application error by limiting the number of times a server is restarted.

The Oracle Tuxedo system frequently checks the availability of each active machine. A machine is marked as partitioned when it cannot be reached by the system. If this occurs, a system event is generated. A partition can occur due to a network failure, machine failure, or severe performance degradation.

See Also

 


Monitoring Multithreaded and Multicontexted Applications

How to Retrieve Data About a Multithreaded/Multicontexted Application Using the MIB

Note: The information presented here applies to all multithreaded and/or multicontexted applications, regardless of which administrative tools are being used. The functionality is discussed from the point of view of an administrator using MIB calls, but is the same for an administrator using an interface to the MIB, whether that interface is tmadmin(1) or the Oracle Administration Console.

You can obtain information about a multithreaded or multicontexted application by:

Information is available in the following locations:

See Also

