Oracle Tuxedo System and Application Monitor (TSAM) is an Oracle Tuxedo add-on product. Tuxedo is widely used by enterprises that develop and run mission-critical applications. It acts as the infrastructure layer in distributed computing environments. The complexity of Tuxedo and the applications running on top of it makes performance measurement difficult.
Oracle TSAM monitors the major performance sensitive areas of a Tuxedo-supported enterprise computing environment. It can be used to monitor real-time performance bottlenecks and business data fluctuations, determine service models, and provide notification when pre-defined thresholds are violated.
Oracle TSAM Features
The following is a list of Oracle TSAM features:
Tracking Tuxedo system call transmissions. Each monitored call is assigned a unique ID that is propagated along a call path tree. TSAM can track calls across multiple machines and domains.
Real-time call path tree tracking of a monitored request is displayed and the performance metrics for each step are available.
Call pattern summarization based on historical call tracking data.
Monitoring a particular Tuxedo service, checking its response time, IPC queue length and execution status. The data can be queried using recent or historical data.
Monitoring CORBA interface metrics
Gathering Oracle Tuxedo GWTDOMAIN, BRIDGE and GWWS overall throughput and graphically displaying the business data flow curve.
Tracking transactions with XA API specifications. TSAM displays the execution status and the time used for each XA call, which helps diagnose global distributed transactions. Transactions are correlated across multiple domains in a tree view. TSAM supports transaction monitoring propagation: if monitoring is enabled for the transaction initiator, the whole transaction path is monitored.
Monitoring Oracle Tuxedo Application Runtime for CICS and Batch specific metrics (including transactions and terminal data).
Flexible monitoring controls. A monitoring policy can be tuned to collect exactly the metrics needed while reducing the performance impact. Sampling can be based on interval, ratio, and runtime data. Monitoring can be turned on or off dynamically without restarting the application.
Comprehensive SLA alert configurations based on monitoring metrics. Alert evaluation is based on Tuxedo FML Boolean expressions. The event generated by an alert can be posted to the Tuxedo Event Broker. Some alert types can also drop stale service requests.
Programming APIs that retrieve metadata packaged in a monitored call. Helps developers make application decisions dynamically.
Plug-in mechanism for performance metrics collection at the Tuxedo infrastructure level. It provides great integration capability between TSAM and other third-party products.
Scalable Tuxedo-side server monitoring design to meet small, medium, and large Tuxedo runtime environments.
J2EE-based solution. Easy to deploy, configure and use. It is a pure Web-based solution with Web 2.0 technologies. The TSAM Console can be accessed anywhere using a compatible Web browser.
Easy-to-understand metrics database schema. The metrics can be used for data mining or further business analysis.
Performs data storage, aggregation, computing and representation.
Oracle TSAM Agent
The Oracle TSAM Agent handles all Tuxedo-side back-end logic. It works in conjunction with the Oracle TSAM Manager, and includes the following sub-components:
Oracle TSAM Framework: The framework is the data collection engine. It is an independent layer working between Tuxedo infrastructure and other TSAM components. This module is responsible for run time metrics collection, alert evaluation and monitoring policy enforcement.
Oracle TSAM Plug-in: An extensible mechanism invoked by the Oracle TSAM Framework. The Oracle TSAM Agent provides default plug-ins that send data to the LMS (Local Monitor Server), which then forwards it to the Oracle TSAM Manager. The mechanism allows custom plug-ins to be hooked in to intercept the metrics. The default plug-in communicates with the LMS through shared memory, so the application is not blocked at the metrics collection point.
You can develop your own plug-ins for additional data processing. A customized plug-in can be linked to an existing plug-in chain, or replace the default plug-in.
Local Monitor Server (LMS): The LMS is an Oracle Tuxedo system server. The Oracle TSAM default plug-in sends data to the LMS. The LMS then passes the data to the Oracle TSAM Manager over HTTP. An LMS is required on each Tuxedo machine that needs to be monitored.
The Oracle TSAM Manager is built on J2EE technology. It includes the following components:
Oracle TSAM Data Server: The data server is responsible for:
accepting data from the LMS and storing it in the database
accepting requests from the presentation layer and processing the data
communicating with the LMS for configuration instructions.
Oracle TSAM Console: The Oracle TSAM presentation layer. It is a J2EE Web application and can be accessed via a compatible Web browser. After logging on to the Oracle TSAM Console, you have access to full Oracle TSAM functionality.
Tuxedo is typically used by a client program (not necessarily a Tuxedo client process) that calls a service to perform a business computing logic scenario. The service implementation is completely transparent to the caller. This type of middleware transparency provides many benefits for development, deployment, and system administration. However, from a monitoring perspective, it is difficult for the end user or administrator to figure out what happens "behind the scenes". Oracle TSAM call path monitoring helps to alleviate this problem.
Call Path Tree Definition
A simple Tuxedo application call triggers a set of service invocations. The involved services constitute a tree (“call path tree”). A call path tree strictly defines the following factors:
Which services are involved in performing the initial service request.
The service invocation depth (that is, the depth of the call path tree).
The service invocation sequence. For example, client A calls SVC1. SVC1 calls SVC2 and SVC3.
Call transportation. The edge (how information is sent and received) of a call path tree represents the transportation information from caller to service provider. It could be an IPC queue, BRIDGE connection or DOMAIN connection. The elapsed time used for each transportation is also recorded.
Call path metrics. Many metrics are available during message propagation within the Tuxedo system, such as message size, execution status, transaction information, and CPU consumption.
A "monitoring initiator" is a process that "initiates" tracking a call path tree. The process can be a Tuxedo client, application server, client proxy server (WSH/JSH), the Tuxedo domain gateway server, or the web services proxy server GWWS. In a typical scenario, call path monitoring begins when the monitoring initiator invokes tpcall/tpacall. All the back-end services involved in this call are displayed on the call path tree representation in the Oracle TSAM Console.
Currently only tpcall/tpacall can trigger call path monitoring. Other communication models are not supported.
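The factors above can be pictured with a small data structure. The following is only an illustrative Python sketch, not part of TSAM: the CallNode class, its field names, and the timing values are assumptions made up for this example. It models the sample scenario in which SVC1 calls SVC2 and SVC3.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CallNode:
    """One service invocation in a call path tree (hypothetical model)."""
    service: str
    transport: Optional[str] = None   # edge from the caller: "IPC", "BRIDGE" or "DOMAIN"
    wait_ms: float = 0.0              # elapsed time spent on that edge (made-up values)
    exec_ms: float = 0.0              # execution time of the service (made-up values)
    children: List["CallNode"] = field(default_factory=list)

    def call(self, child: "CallNode") -> "CallNode":
        """Record a sub-call; children keep their invocation order."""
        self.children.append(child)
        return child


def depth(node: CallNode) -> int:
    """Number of levels in the tree rooted at node."""
    return 1 + max((depth(c) for c in node.children), default=0)


def services(node: CallNode) -> List[str]:
    """Services in invocation order (pre-order walk)."""
    names = [node.service]
    for c in node.children:
        names.extend(services(c))
    return names


# Client A calls SVC1; SVC1 calls SVC2 and SVC3.
root = CallNode("SVC1", transport="IPC", wait_ms=1.2, exec_ms=40.0)
root.call(CallNode("SVC2", transport="BRIDGE", wait_ms=3.5, exec_ms=12.0))
root.call(CallNode("SVC3", transport="DOMAIN", wait_ms=8.0, exec_ms=20.0))

print(depth(root))      # 2
print(services(root))   # ['SVC1', 'SVC2', 'SVC3']
```

Keeping the transport and wait time on the node that was reached, rather than on a separate edge object, is a simplification that works because each node in a tree has exactly one incoming edge.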
A Tuxedo application server performs two functions:
It takes part in an existing call path tree: if the incoming request is already monitored, all sub-calls made in the service implementation are part of the call path tree started by the original monitoring initiator.
It acts as a monitoring initiator for calls made in the service routine, according to the monitoring policy definition.
Service monitoring focuses on Tuxedo service execution status. Unlike call path monitoring, it is not concerned with call correlation. Service monitoring can be used together with call path monitoring or performed independently. Tuxedo CORBA is also based on the service infrastructure, so in this release the CORBA interfaces can also be monitored.
System Server Monitoring
Tuxedo has several important system servers: BRIDGE, GWTDOMAIN and GWWS. BRIDGE connects multiple Tuxedo machines within a Tuxedo domain. GWTDOMAIN connects one Tuxedo domain with others. GWWS is the web services gateway. System server monitoring tracks message throughput, messages pending to be sent, and messages awaiting a reply on each network link for BRIDGE and GWTDOMAIN. For GWWS, web service request statistics are collected.
A critical use of Tuxedo is transaction monitoring. Tuxedo coordinates activities in a distributed transaction with an XA-compliant resource manager, such as a database. Oracle TSAM transaction monitoring tracks each XA call triggered in a transaction, allowing you to clearly identify where a global distributed transaction is bottlenecked. TSAM supports propagation of transaction monitoring: if the transaction initiator is monitored, all XA calls on the transaction path are monitored. Propagation also works for transactions that span domains.
Oracle Tuxedo Application Runtime for CICS and Batch Monitoring
Oracle TSAM allows you to monitor the following Oracle Tuxedo Application Runtime for CICS and Batch components.
CICS Region representation (status and components).
Oracle TSAM provides a comprehensive policy monitoring mechanism. When the proper policy monitoring settings are created, you can collect the exact metrics needed with minimum application performance impact. You define a policy using the TSAM Web console and apply it to an Oracle Tuxedo application automatically. Oracle TSAM Policy Monitoring has the following characteristics:
Monitoring Category. One monitoring policy can focus on one kind of monitoring, such as call path. It can also cover multiple areas of interest.
Enable or Disable. Oracle TSAM monitoring can be dynamically turned on or off. A monitoring policy can be predefined and enabled when monitoring is needed. All enabled monitoring policies are applied to Tuxedo applications automatically while the applications are running. An application that has not been started receives the policy when it starts.
Interval-Based Monitoring. Monitoring is initiated at specific time intervals. For example, an interval-based call path monitoring policy can specify that the call path is tracked at 60-second intervals.
Ratio-Based Monitoring. Monitoring is initiated by the number of executions. For service monitoring, a ratio set to 5 indicates that one out of every 5 service executions is monitored. For call path monitoring, a ratio set to 5 indicates that one out of every 5 tpcall/tpacall calls is monitored.
Runtime Condition Filtering. TSAM monitoring policy supports several runtime filters. You can monitor a particular service, requests from a specific client, or a particular process type. The filters support regular expressions.
Flexibility to Reduce Monitoring Performance Impact. Oracle TSAM monitoring control enables you to configure the monitoring policy based on your application size, load and network activity. A monitoring policy can also be limited to alert triggering, without storing raw metrics.
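The interval-based, ratio-based, and filter-based controls described above can be sketched in a few lines of Python. This is only an illustration of the sampling ideas, not TSAM code: the class and function names are hypothetical, and the choice to sample the first request of each group of five (rather than the fifth) is an assumption.

```python
import re


class RatioSampler:
    """Monitor the first request and then every `ratio`-th request after it (assumed behavior)."""

    def __init__(self, ratio: int):
        self.ratio = ratio
        self.seen = 0

    def should_monitor(self) -> bool:
        self.seen += 1
        return (self.seen - 1) % self.ratio == 0


class IntervalSampler:
    """Monitor at most one request per `interval_s` seconds."""

    def __init__(self, interval_s: float):
        self.interval_s = interval_s
        self.last = None

    def should_monitor(self, now: float) -> bool:
        if self.last is None or now - self.last >= self.interval_s:
            self.last = now
            return True
        return False


def matches_filter(service: str, pattern: str) -> bool:
    """Runtime condition filter: regular expression on the service name."""
    return re.fullmatch(pattern, service) is not None


ratio = RatioSampler(5)
picked = [i for i in range(1, 11) if ratio.should_monitor()]
print(picked)  # [1, 6]

interval = IntervalSampler(60.0)
sampled_at = [t for t in (0, 10, 59, 60, 100, 120) if interval.should_monitor(t)]
print(sampled_at)  # [0, 60, 120]

print(matches_filter("TRANSFER", r"TRANS.*"))  # True
```

With a ratio of 5, two of ten requests are sampled, which shows how a policy trades metric coverage against monitoring overhead.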
Oracle TSAM performance metrics are listed as follows:
Correlation ID: A unique identifier that represents a call path tree. It is generated by the monitoring initiator plug-in. It uses the following format:
DOMAINID:MASTERHOSTNAME:IPCKEY LMID PROCESSNAME PID TID COUNTER TIMESTAMP
Listing 1 shows an example of a Correlation ID. The monitored call is started by the program “bankclient” with process ID 8089 and thread ID 1 on machine “SITE1” on Tuxedo domain “TUXDOM1”. The master is “bjsol18” and IPCKEY in TUXCONFIG is “72854”.
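As an illustration of the format above, the following Python sketch splits a Correlation ID into its fields. The sample string is constructed for this example from the values mentioned in the text; its COUNTER and TIMESTAMP values are made up, and the parser assumes that individual fields contain no spaces or colons. The CorrelationId type and parse_correlation_id function are hypothetical helpers, not a TSAM API.

```python
from typing import NamedTuple


class CorrelationId(NamedTuple):
    domain_id: str
    master_hostname: str
    ipckey: int
    lmid: str
    process_name: str
    pid: int
    tid: int
    counter: int
    timestamp: str


def parse_correlation_id(text: str) -> CorrelationId:
    """Split DOMAINID:MASTERHOSTNAME:IPCKEY LMID PROCESSNAME PID TID COUNTER TIMESTAMP."""
    head, lmid, process, pid, tid, counter, timestamp = text.split()
    domain_id, master, ipckey = head.split(":")
    return CorrelationId(domain_id, master, int(ipckey), lmid, process,
                         int(pid), int(tid), int(counter), timestamp)


# Hypothetical sample: the last two fields (counter, timestamp) are invented.
sample = "TUXDOM1:bjsol18:72854 SITE1 bankclient 8089 1 1 1259309025"
cid = parse_correlation_id(sample)
print(cid.process_name, cid.pid, cid.lmid)  # bankclient 8089 SITE1
```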
Service Name: The name of an Oracle Tuxedo Service.
Location: The set of metrics that identifies the process that sends out the performance metrics. It includes information such as the domain, machine, group, and process name.
IPC Queue Length: The number of messages in an IPC queue.
IPC Queue ID: Oracle Tuxedo identifier of an IPC queue.
Execution Time: The time used in an Oracle Tuxedo service or XA call execution in milliseconds.
Wait Time: The time a message spends in the transportation stage.
CPU Time: The CPU time consumed by the service request processing. It only applies to single-threaded servers.
Message Size: The Oracle Tuxedo message size.
Execution Status: The tpreturn service return code. It is defined by the Oracle Tuxedo ATMI interface.
Call Flags: The flags passed to tpcall/tpacall in the Oracle Tuxedo ATMI interface.
Call Type: tpcall, tpacall, or tpforward.
Elapse Time: The time elapsed while a call is monitored.
GTRID: Oracle Tuxedo global transaction ID.
Pending Message Number: The number of messages that have been delivered to the Oracle Tuxedo network layer and are waiting to be sent.
Message Throughput: The total number and volume of messages accumulated in system server monitoring intervals.
Waiting Reply Message Number: The number of requests in GWTDOMAIN awaiting a reply from the remote domain.
XA Code: The XA call return code in transaction monitoring.
XA Name: The XA call name.
GWWS Metrics: A set of metrics used to measure GWWS throughput, including:
Inbound Message Throughput
Inbound Message Processing Time
Outbound Message Throughput
Outbound Message Processing Time
Oracle Tuxedo Application Runtime for CICS and Batch Metrics
TCP terminal throughput
Oracle TSAM Use Cases
Oracle TSAM is built on top of Oracle Tuxedo and has unique service, call, and transaction tracking capabilities. Enterprise organizations usually have many widely distributed services deployed, and a single client request can require complex back-end service coordination to complete.
It can be difficult for an administrator to figure out what exactly is happening during these interactions. Oracle TSAM call path monitoring helps to alleviate this problem.
The following FAQs will help you better understand how Oracle TSAM works with your applications:
Message routing is an important concept in Tuxedo. It affects system performance, the correctness of business logic, and application reliability and availability. Many factors can affect Tuxedo message routing, such as:
Data dependent routing (DDR)
Transaction Affinity (Oracle RAC support)
In the development stage, a large number of tests are needed to verify that your settings take effect. Without TSAM, it is hard to tell whether a given request reaches the intended service. The TSAM call path view depicts each call path clearly and speeds up your development process.
Understanding Your Applications
What happens behind a simple call?
Enabling call path monitoring for a Tuxedo client or application server allows you to find out all the information behind a simple tpcall/tpacall. The tracking points span multiple machines and multiple domains. You can clearly see the following information in the call path tree:
The service invocation hierarchy that supports your call
The transmission cost for each message flow step, from IPC Queue to Network
The execution status of each service involved
The call type and call flags of all the intermediate calls
The waiting time in queue and response time for each service
The end-to-end response time
What about my services?
Service monitoring enables you to measure your service response time, IPC queue length, and execution status. Service monitoring provides the following information:
Service Execution Status Summary.
Oracle TSAM tells you how many service executions succeeded or failed recently or during a period of time. Oracle TSAM also computes the average response time. These are important factors in measuring the quality of your services.
Service Activity Trends.
Oracle TSAM also displays your service activity trends. It tells you when the peak time is and when service requests are low.
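The kind of summary described above can be approximated from raw service records. The Python sketch below is only an illustration with made-up data; the record layout (hour of day, status, response time in milliseconds) is an assumption, not the TSAM metrics schema.

```python
from collections import Counter
from statistics import mean

# (hour of day, execution status, response time in ms); all values are invented.
records = [
    (9, "OK", 120.0),
    (9, "OK", 80.0),
    (10, "FAIL", 400.0),
    (14, "OK", 100.0),
    (9, "OK", 100.0),
]

succeeded = sum(1 for _, status, _ in records if status == "OK")
failed = len(records) - succeeded
average_ms = mean(ms for _, _, ms in records)

# Activity trend: count requests per hour and pick the busiest one.
per_hour = Counter(hour for hour, _, _ in records)
peak_hour, peak_count = per_hour.most_common(1)[0]

print(succeeded, failed, average_ms)  # 4 1 160.0
print(peak_hour, peak_count)          # 9 3
```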
Is my network busy?
Oracle TSAM allows you to monitor the network connections attached to your local domain gateways. You can easily find which link is busy and see its data fluctuation trend. This gives you a more in-depth understanding of the business data flow model between departments and organizations.
Who participates in my transaction?
Oracle TSAM monitors the transaction XA calls. Transaction participants are listed on the transaction monitoring page. For a large distributed transaction, a slow branch can slow down the entire transaction. Oracle TSAM lets you know who the transaction participants are, and how much time is used during XA calls. Transaction monitoring can also help find the bottleneck in the two-phase commit stage if multiple resource managers are involved.
Solving Application Performance Problems
Why is the service response time slow recently?
Turn on the call path monitoring for a particular call to investigate the following:
How much network-side time is used
Which services are the most time-consuming points in the call path tree
Whether the service is routed to a remote machine or a remote domain
Whether the client's wait for a reply is the problem
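The questions above amount to a few simple computations over the steps of one call path. The following Python sketch uses made-up timing data and a hypothetical tuple layout; it also treats each service's execution time as independent of its sub-calls, which is a simplification.

```python
# (service, transport used to reach it, wait_ms on that edge, exec_ms in the service)
# All values are invented for illustration.
steps = [
    ("SVC1", "IPC", 1.0, 40.0),
    ("SVC2", "BRIDGE", 12.0, 15.0),
    ("SVC3", "DOMAIN", 30.0, 90.0),
]

REMOTE = {"BRIDGE", "DOMAIN"}  # transports that leave the local machine

# How much network-side time is used?
network_ms = sum(wait for _, transport, wait, _ in steps if transport in REMOTE)

# Which service is the most time-consuming point?
slowest = max(steps, key=lambda s: s[3])[0]

# Which services were routed to a remote machine or domain?
remote_services = [name for name, transport, _, _ in steps if transport in REMOTE]

print(network_ms)        # 42.0
print(slowest)           # SVC3
print(remote_services)   # ['SVC2', 'SVC3']
```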
My back-end services failed, but I don’t know which one.
Turn on call path monitoring. You can find the service execution status for this call.
How many kinds of call paths are in my application?
Turn on call path monitoring using an adequate sampling policy. Oracle TSAM will tell you how many call paths (a “call pattern”) exist in your application.
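One way to think about call patterns is to reduce each observed call path tree to a canonical signature and count the distinct signatures. The Python sketch below illustrates that idea only; the nested-tuple tree representation and the pattern function are assumptions, not how TSAM computes call patterns.

```python
from collections import Counter


def pattern(tree) -> str:
    """Canonical signature: service name followed by its ordered child signatures."""
    name, children = tree
    return name + "(" + ",".join(pattern(c) for c in children) + ")"


# Three sampled call paths, each a (service, [children]) tuple; data is invented.
observed = [
    ("SVC1", [("SVC2", []), ("SVC3", [])]),
    ("SVC1", [("SVC2", [])]),
    ("SVC1", [("SVC2", []), ("SVC3", [])]),
]

patterns = Counter(pattern(t) for t in observed)
print(len(patterns))   # 2
print(patterns["SVC1(SVC2(),SVC3())"])   # 2
```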
Why is my global distributed transaction completed slowly?
Turn on Oracle TSAM transaction monitoring. You can see the execution time used by the transaction participants.
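To illustrate how per-participant XA timings point to a bottleneck, the following Python sketch totals the time spent in XA calls for each participant. The participant names and timings are made up, and the tuple layout is an assumption; xa_prepare and xa_commit are standard XA call names.

```python
from collections import defaultdict

# (participant, XA call name, execution time in ms); all values are invented.
xa_calls = [
    ("GROUP1", "xa_prepare", 15.0),
    ("GROUP1", "xa_commit", 10.0),
    ("GROUP2", "xa_prepare", 220.0),
    ("GROUP2", "xa_commit", 30.0),
]

# Total XA time per participant.
totals = defaultdict(float)
for participant, _, ms in xa_calls:
    totals[participant] += ms

slowest_participant = max(totals, key=totals.get)
slowest_call = max(xa_calls, key=lambda c: c[2])

print(slowest_participant, totals[slowest_participant])  # GROUP2 250.0
print(slowest_call)  # ('GROUP2', 'xa_prepare', 220.0)
```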
I want to correlate local transactions with remote transactions.
Turn on Oracle TSAM transaction monitoring for all involved processes and GWTDOMAIN. The Oracle TSAM Console shows you the transaction mapping between local and remote transactions.
I want to know when my local domain's use of remote domain resources peaks, and how busy it is.
Use Oracle TSAM system server monitoring on the GWTDOMAIN. Oracle TSAM records the information for you, and shows you the throughput trends.
Can I check program request information?
Turn on call path monitoring with the proper monitoring policy and then use “tpgetcallinfo”. The following information is provided.
The timestamp when the request leaves the caller
The timestamp when the request comes into the server IPC queue
The client IP address (workstation client, GWWS client)
In the monitoring initiator process, tpgetcallinfo() can also tell you the total time used.
Improving Application Performance
Are my services too fine grained?
In some cases, too many services supporting a request may add performance overhead. Use the call path tree to investigate. The number of services and the tree depth are key analysis factors.
Are my services deployed properly?
Some services are called more frequently than others. Use call path monitoring to gather the information, and reconsider the service deployment. It is best to have the most-used services located on the local machine or LAN. Cross-domain services should be used carefully.
Do I have too many servers configured?
Oracle TSAM provides a central view of your Tuxedo applications with multiple domain support. Using Oracle TSAM Console allows you to easily see how many domains, machines, servers and services are configured.
I want to be notified when something is wrong with my system
Oracle TSAM provides comprehensive alert configuration based on the metrics collected. TSAM alert evaluation is based on Tuxedo FML Boolean expressions, so you can combine complex conditions to compose an alert, such as:
Report an alert when the execution time is greater than 30 seconds and the execution failed.
Report an alert if heuristic commit happens
Report an alert if the CPU time consumption is too high in my service
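The three example alerts above are Boolean conditions over collected metrics. TSAM evaluates them as Tuxedo FML Boolean expressions; the Python sketch below only mimics the idea with plain predicates. The metric names, status strings, and the 5-second CPU threshold are assumptions chosen for illustration (XA_HEURCOM is the XA return code for a heuristic commit).

```python
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, object]
Rule = Tuple[str, Callable[[Metrics], bool]]

# Hypothetical rule set mirroring the three example alerts.
RULES: List[Rule] = [
    ("slow and failed",
     lambda m: m["exec_time_ms"] > 30_000 and m["status"] == "FAILED"),
    ("heuristic commit",
     lambda m: m.get("xa_code") == "XA_HEURCOM"),
    ("high CPU",
     lambda m: m["cpu_time_ms"] > 5_000),  # threshold is an arbitrary example
]


def triggered_alerts(metrics: Metrics) -> List[str]:
    """Return the names of all rules whose condition holds for these metrics."""
    return [name for name, condition in RULES if condition(metrics)]


sample = {"exec_time_ms": 45_000, "status": "FAILED",
          "cpu_time_ms": 1_200, "xa_code": "XA_OK"}
alerts = triggered_alerts(sample)
print(alerts)  # ['slow and failed']
```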
How to drop stale requests?
Sometimes service execution is slow and many request messages wait a long time in the IPC queue. The client that issued a request might already have received a timeout notification, but the service still continues to process the request. An Oracle TSAM alert allows you to configure a "drop request" action based on certain metrics. For example, you can drop a request if the waiting time of a message is greater than a large value. You can also configure an alert with the drop request action that fires when the number of messages in the request queue exceeds a threshold.
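The effect of a wait-time-based "drop request" action can be sketched as follows. This Python model is purely illustrative: the queue representation, the next_request function, and the 20-second threshold are assumptions, and the real action is configured through TSAM alerts rather than written in application code.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple

Request = Tuple[str, float]  # (request id, time it entered the queue in seconds)


def next_request(queue: Deque[Request], now: float, max_wait_s: float,
                 dropped: List[str]) -> Optional[str]:
    """Pop requests until one is fresh enough to process; record the stale ones."""
    while queue:
        req_id, enqueued = queue.popleft()
        if now - enqueued > max_wait_s:
            dropped.append(req_id)   # the caller has probably timed out already
            continue
        return req_id
    return None


queue: Deque[Request] = deque([("r1", 0.0), ("r2", 5.0), ("r3", 28.0)])
dropped: List[str] = []
served = next_request(queue, now=30.0, max_wait_s=20.0, dropped=dropped)

print(served)   # r3
print(dropped)  # ['r1', 'r2']
```

Dropping r1 and r2 saves the server from doing work whose result nobody is waiting for, which is the point of the drop action.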
To add Oracle TSAM functionality to an existing Oracle Tuxedo application, complete the following steps: