The Use of tmadmin(1)

This chapter describes how to use tmadmin(1), the interactive monitor program.

tmadmin has almost 50 commands that fall into the following seven categories; commands that:

In this chapter we will cover the seven categories of tmadmin commands in the way we hope will be most useful to the System/T administrator. The major sections of this chapter are:

General Syntax of tmadmin Commands

tmadmin is a command interpreter that provides for the inspection and modification of a bulletin board and its associated entities. The command requires the TUXCONFIG environment variable to be set.

Only one tmadmin process at a time can be the administrator. In normal operation tmadmin is invoked without options by the System/T administrator on an active node; however, there are exceptions. If the application is active but partitioned when tmadmin is invoked, not all tmadmin commands are available. If the application is inactive when the command is invoked, again not all tmadmin commands are available.

tmadmin Command Line Options

There are three command line options available:

-r
instructs the command to enter the bulletin board in read-only mode. This leaves the administrator slot open; the process attaches to the bulletin board as a client.

-c
starts tmadmin in configuration mode. This form of the command can be invoked on any node, including inactive ones. Without the -c option, tmadmin can be invoked on an inactive application only from the MASTER node. There is more about configuration mode in the next subsection.

-v
causes tmadmin to display the TUXEDO version number and license number. After printing out the information, tmadmin exits. If the -v option is entered with either of the other two options, the others are ignored; only the information requested by the -v option is displayed.

Available Commands Matrix

The tmadmin commands available depend on the state of the configuration, the type of node and the command line option (if any). Figure 1 summarizes this.

Fig. 1: Matrix showing tmadmin commands available

Command     Config State  Node Type          Commands Available
----------  ------------  -----------------  ------------------------------------------
tmadmin -c  active or     native             default, dumptlog, echo, help, quit,
            inactive                         livtoc, crdl, lidl, dsdl, indl, paginate,
                                             verbose
tmadmin -r  active        native             all -c commands, plus bbls, bbparms,
                                             bbstats, dump, dumptlog, printclient,
                                             printnet, printqueue, printserver,
                                             printservice, printtrans, printgroup,
                                             serverparms, serviceparms
tmadmin     active        native             all
tmadmin     inactive      master             all -c commands, plus crlog, dslog,
                                             inlog, boot
tmadmin     inactive      non-master         <error>, function not allowed
tmadmin     partitioned   backup master      all, including master
tmadmin     partitioned   not backup master  read-only commands on local bulletin
                                             board only

tmadmin Commands

Once tmadmin is invoked, a greater-than sign (>) appears as a prompt and the commands are available as shown in Figure 1. The prompt may be preceded by a machine-id (see the discussion of default that follows). If the configuration is not active when the command is invoked, the following message is displayed:


No bulletin board exists.  Entering boot mode
>

Commands that Help You Use Other Commands

Figure 2 lists commands specifically designed to help you work with the other tmadmin commands.

Most tmadmin commands have an abbreviation. In tables such as Figure 2 below, the abbreviation follows the full command name in parentheses; in examples, we generally use the abbreviation rather than the full command.

Fig. 2: tmadmin miscellaneous commands

Command(abbr)   Description
default(d)      set default values for arguments of other commands
dump(du)        dump current bulletin board into a file
echo(e)         echo input command lines
help(h)         print command list or command syntax
paginate(page)  pipe output of commands to a pager
quit(q)         terminate the session
verbose(v)      show output in verbose mode
!shlcmd         escape to shell; do shlcmd
!!              repeat previous shell command
<CR>            repeat last tmadmin command

Default (d)

The default command of tmadmin sets default values for several frequently used parameters. Most tmadmin commands use these defaults; boot and shutdown are the exceptions and ignore them. Once set, a default remains in effect until the session ends or until it is reset to a different value. Parameters other than Machine ID (-m) can be unset by entering * as the value. The Machine ID cannot be unset this way; instead, the command:


default -m DBBL
resets -m to its original state.

Entering the command default without options produces a list of the current settings. If no options are set, the list looks like Figure 3 (comments have been added):

Fig. 3: default output

> d
Default Settings:
      Group Name: (not set)
       Server ID: (not set)
      Machine ID: all
      Queue Name: (not set)
     Client Name: (not set)
    Service Name: (not set)
       User Name: (not set)
          Blocks: 1000
          Offset: 0
            Path: /home/apps/bank/bankdl1
	# Blocks, Offset and Path were picked up from
	# the ENVFILE for the sample application.
	# Path defaults to the value of FSCONFIG
>

In a multiprocessor environment, Machine ID (-m) can be set to all, to DBBL, or to a specific processor. If it is not set to a specific processor or to all, information is retrieved from the DBBL. If it is set to a specific processor, information is retrieved only from that processor, and mid must explicitly be set back to DBBL to return to using the distinguished bulletin board. The setting is displayed as part of the tmadmin prompt, as Figure 4 shows.

Fig. 4: Prompt when mid is set

		# 1. default mid not previously set
> d -m SITE1	# 2. set SITE1 as default mid
SITE1>		# 3. prompt now shows default mid

Optional vs. Required Arguments

Most tmadmin commands require explicit information about the resource on which the command is to act, although required arguments can often be set via the default command as well as on the command line. tmadmin reports an error if required information is not available from either source.

Some tmadmin statistical commands treat an unspecified default parameter as meaning all.

Verbose

tmadmin commands that display information from the bulletin board sometimes show different output when run with verbose mode on. In the examples that follow we indicate whether verbose mode is on or off; where the displays differ significantly we show both versions.

Monitoring the Configuration

tmadmin gives the System/T administrator a window into bulletin board operations. It provides fast, easy ways to:

  • get information about the current configuration

  • look at statistics that show the amount of processing activity

  • dynamically reconfigure the application to serve the current needs of users

The first two items on the list are covered in this section; dynamic reconfiguration is covered in the section called Managing the Configuration later in this chapter.

The tmadmin output shown in this chapter was produced using the configuration file in Figure 5, which is the configuration file of an MP version of bankapp.

Fig. 5: Configuration file for tmadmin examples

*RESOURCES
IPCKEY		80952
UID		4196
GID		601
PERM		0660
MAXACCESSERS	40
MAXSERVERS	35
MAXSERVICES	75
MAXCONV		10
MAXGTT		20
MASTER		SITE1,SITE2
SCANUNIT		10
SANITYSCAN	12
BBLQUERY		180
BLOCKTIME		30
DBBLWAIT		6
OPTIONS		LAN,MIGRATE
MODEL		MP
LDBAL		Y
#
*MACHINES
mchn1		LMID=SITE1
		TUXDIR="/home/tuxroot"
		APPDIR="/home/apps/bank"
		ENVFILE="/home/apps/bank/ENVFILE"
		TLOGDEVICE="/home/apps/bank/TLOG"
		TLOGNAME=TLOG
		TUXCONFIG="/home/apps/bank/tuxconfig"
		TYPE="3B2"
		ULOGPFX="/home/apps/bank/ULOG"
wgs386		LMID=SITE2
		TUXDIR="/home2/tuxroot"
		APPDIR="/home2/apps/bank"
		ENVFILE="/home2/apps/bank/ENVFILE"
		TLOGDEVICE="/home2/apps/bank/TLOG"
		TLOGNAME=TLOG
		TUXCONFIG="/home2/apps/bank/tuxconfig"
		TYPE="386"
		ULOGPFX="/home2/apps/bank/ULOG"
#
*GROUPS
DEFAULT:	TMSNAME=TMS_SQL	TMSCOUNT=2
# For NT/Netware, :bankdb: becomes ;bankdb;
BANKB1		LMID=SITE1	GRPNO=1
	OPENINFO="TUXEDO/SQL:/home/apps/bank/bankdl1:bankdb:readwrite"
BANKB2		LMID=SITE2	GRPNO=2
	OPENINFO="TUXEDO/SQL:/home2/apps/bank/bankdl2:bankdb:readwrite"
*NETWORK
SITE1	NADDR="0x00021112c00b6903"
	BRIDGE="/dev/tcp"
	NLSADDR="0x00021111c00b6903"
SITE2	NADDR="0x00021112c00b690c"
	BRIDGE="/dev/tcp"
	NLSADDR="0x00021111c00b690c"
*SERVERS
#
DEFAULT: RESTART=Y MAXGEN=5 REPLYQ=Y CLOPT="-A"
TLR	SRVGRP=BANKB1	SRVID=1		RQADDR=tlr1	CLOPT="-A -- -T 100"
TLR	SRVGRP=BANKB1	SRVID=2		RQADDR=tlr1	CLOPT="-A -- -T 200"
TLR	SRVGRP=BANKB2	SRVID=3		RQADDR=tlr2	CLOPT="-A -- -T 600"
TLR	SRVGRP=BANKB2	SRVID=4		RQADDR=tlr2	CLOPT="-A -- -T 700"
XFER	SRVGRP=BANKB1	SRVID=5
XFER	SRVGRP=BANKB2	SRVID=6
ACCT	SRVGRP=BANKB1	SRVID=7
ACCT	SRVGRP=BANKB2	SRVID=8
BAL	SRVGRP=BANKB1	SRVID=9	
BAL	SRVGRP=BANKB2	SRVID=10
BTADD	SRVGRP=BANKB1	SRVID=11
BTADD	SRVGRP=BANKB2	SRVID=12
AUDITC	SRVGRP=BANKB1	SRVID=13 CONV=Y MIN=1 MAX=10
BALC	SRVGRP=BANKB1	SRVID=24
BALC	SRVGRP=BANKB2	SRVID=25
#
*SERVICES
DEFAULT:	LOAD=50		AUTOTRAN=N
WITHDRAWAL	PRIO=50		ROUTING=ACCOUNT_ID
DEPOSIT		PRIO=50		ROUTING=ACCOUNT_ID
TRANSFER		PRIO=50		ROUTING=ACCOUNT_ID
INQUIRY		PRIO=50		ROUTING=ACCOUNT_ID
CLOSE_ACCT	PRIO=40		ROUTING=ACCOUNT_ID
OPEN_ACCT		PRIO=40		ROUTING=BRANCH_ID
BR_ADD		PRIO=20		ROUTING=BRANCH_ID
TLR_ADD		PRIO=20		ROUTING=BRANCH_ID
ABAL		PRIO=30		ROUTING=b_id
TBAL		PRIO=30		ROUTING=b_id
ABAL_BID		PRIO=30		ROUTING=b_id
TBAL_BID		PRIO=30		ROUTING=b_id
ABALC_BID		PRIO=30		ROUTING=b_id
TBALC_BID		PRIO=30		ROUTING=b_id
*ROUTING
ACCOUNT_ID	FIELD=ACCOUNT_ID
		BUFTYPE="FML"
		RANGES="10000-59999:BANKB1,
			60000-109999:BANKB2,
			*:*"
BRANCH_ID	FIELD=BRANCH_ID
		BUFTYPE="FML"
		RANGES="1-5:BANKB1,
			6-10:BANKB2,
			*:*"
b_id		FIELD=b_id
		BUFTYPE="VIEW:aud"
		RANGES="1-5:BANKB1,
			6-10:BANKB2,
			*:*"
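The RANGES strings above drive data-dependent routing: the value of the routing field in a request buffer selects the server group that handles it. The Python sketch below is only an illustration of how the ACCOUNT_ID ranges could be read, not TUXEDO's implementation. It assumes inclusive numeric ranges, first-match evaluation, a range of * meaning "any value not matched above", and a group of * meaning "any group offering the service"; parse_ranges and route are hypothetical names.

```python
# Illustrative model of the RANGES strings in Figure 5; this is not the
# TUXEDO routing implementation. parse_ranges and route are hypothetical names.

ACCOUNT_ID_RANGES = "10000-59999:BANKB1, 60000-109999:BANKB2, *:*"


def parse_ranges(spec):
    """Turn 'low-high:GROUP, ..., *:*' into a list of (low, high, group).

    A range of '*' is stored as (None, None, group) and matches any value.
    """
    rules = []
    for entry in spec.split(","):
        rng, group = entry.strip().rsplit(":", 1)
        if rng == "*":
            rules.append((None, None, group))
        else:
            low, high = rng.split("-")
            rules.append((int(low), int(high), group))
    return rules


def route(value, rules):
    """Return the server group for a routing-field value (first match wins)."""
    for low, high, group in rules:
        if low is None or low <= value <= high:
            return group  # a group of '*' means any group offering the service
    raise LookupError(f"no range matches {value}")


rules = parse_ranges(ACCOUNT_ID_RANGES)
print(route(12345, rules))  # BANKB1
print(route(75000, rules))  # BANKB2
print(route(5, rules))      # * (only the wildcard entry matches)
```

With these ranges, an INQUIRY for account 12345 would be sent to a server in BANKB1 on SITE1, and one for account 75000 to BANKB2 on SITE2.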

Using tmadmin to Display Parameters

The tmadmin commands that are used primarily to produce information about configuration parameters are shown in Figure 6.

Fig. 6: tmadmin parameter display commands

Command(abbr)      Description
bbparms(bbp)       print a summary of bulletin board parameters
bbsread(bbls)      list IPC resources on machine mid
serverparms(srp)   print parameters of the specified server
serviceparms(scp)  print parameters of the specified service

bbparms (bbp)

This command prints parameters from the *RESOURCES section. The display is shown in Figure 7.

Fig. 7: bbparms output

> bbparms
Bulletin Board Parameters:
      MAXSERVERS: 35
     MAXSERVICES: 75
    MAXACCESSERS: 40
          MAXGTT: 20
         MAXCONV: 10
      MAXBUFTYPE: 16
     MAXBUFSTYPE: 32
          IPCKEY: 35384
          MASTER: SITE1,SITE2
           MODEL: MP
           LDBAL: Y
         OPTIONS: LAN,MIGRATE
        SCANUNIT: 10
      SANITYSCAN: 12
        DBBLWAIT: 6
        BBLQUERY: 180
       BLOCKTIME: 30

The display is the same with verbose mode on or off.

bbsread (bbls)

The bbsread command produces information about the IPC resources on a local site. The output is shown in Figure 8.

Fig. 8: bbsread output

SITE1> bbsread
IPC resources for the bulletin board on machine SITE1:
SHARED MEMORY:          Key: 0x1013c38
SEGMENT 0:
                         ID: 15730
                       Size: 36924
         Attached processes: 12
      Last attach/detach by: 4181
This semaphore is the system semaphore
SEMAPHORE:              Key: 0x1013c38
                         Id: 15666
       | semaphore  | current |   last    | # waiting |
       |   number   | status  |  accesser | processes |
       |----------------------------------------------|
       |      0     |   free  |    4181   |     0     |
       |------------|---------|-----------|-----------|
This semaphore set is part of the user-level semaphore
SEMAPHORE:              Key: IPC_PRIVATE
                         Id: 11572
       | semaphore  | current |   last    | # waiting |
       |   number   | status  |  accesser | processes |
       |----------------------------------------------|
       |      0     | locked  |    4181   |     0     |
       |      1     | locked  |    4181   |     0     |
       |      2     | locked  |    4181   |     0     |
       |      3     | locked  |    4181   |     0     |
       |      4     | locked  |    4181   |     0     |
       |      5     | locked  |    4181   |     0     |
       |      6     | locked  |    4181   |     0     |
       |      7     | locked  |    4181   |     0     |
       |      8     | locked  |    4181   |     0     |
       |      9     | locked  |    4181   |     0     |
       |     10     | locked  |    4181   |     0     |
       |     11     | locked  |    4181   |     0     |
       |     12     | locked  |    4181   |     0     |
       |     13     | locked  |    4181   |     0     |
       |------------|---------|-----------|-----------|

The display is the same with verbose mode on or off.

serverparms (srp)

The serverparms command produces the display shown in Figure 9. Again, the request is to display only information for SITE1. Figure 9 shows just a sample of the output; a similar report is produced for each server at SITE1.

Fig. 9: serverparms output

SITE1> srp -g BANKB1 -i 1
        a.out Name: /home/apps/bank/TLR
        Queue Name: tlr1
    Server Options: RESTARTABLE
    Max # Restarts: 5
   Restart Command: (restartsrv)
      Grace Period: 1 day
          Group ID: 1
         Server ID: 1
        Machine ID: SITE1

The display is the same with verbose mode on or off.

serviceparms (scp)

The serviceparms command produces the display shown in Figure 10.

Fig. 10: serviceparms output

SITE1> scp -g BANKB1 -i 111 -s WITHDRAWAL
    Service Name: WITHDRAWAL
   Function Name: WITHDRAWAL
            Load: 50
        Priority: 50
         Address: 0x2

The display is the same with verbose mode on or off.

tmadmin Statistics

The tmadmin commands that display statistics are shown in Figure 11.

Fig. 11: tmadmin statistics commands

Command(abbr)      Description
bbstats(bbs)       print a summary of the bulletin board's statistics
printclient(pclt)  print names and other information about active client processes
printgroup(pg)     print server group table information
printnet(pnw)      print count of messages in and out for specified machines; indicates if machine is partitioned
printqueue(pq)     print information for a specified queue or all queues
printserver(psr)   print information for a specified server or all servers
printservice(psc)  print information for a specified service or all services
shmstats(sstats)   available in SHM mode only; selects exact or approximate statistics recording

The format of the output of some statistics commands is quite different depending on whether verbose mode is off or on.

With verbose mode off, the display shows the statistics the TUXEDO System administrator can use in deciding whether the system should be reconfigured. With verbose mode on, additional detail is displayed.

Statistics are collected per bulletin board. Setting the default mid to all retrieves a current reading from each bulletin board. Setting the default mid to a single processor retrieves statistics from the bulletin board on that machine. (The display may list resources on all machines, but the statistics are provided for the specified machine only.) If the default mid is set to DBBL (or has not been set at all during the current tmadmin session), statistics are retrieved from the distinguished bulletin board. In the displays, a zero in a column means there is nothing to report; a dash means the information is not being collected in the present mode of execution.
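The rules for where statistics come from can be restated as a small lookup. The Python sketch below is a hypothetical model of the default mid semantics described above, not tmadmin code; stats_sources is an invented name, and the machine list is taken from the LMIDs in Figure 5.

```python
# Hypothetical model of how the default mid selects the bulletin boards
# that tmadmin statistics are read from; not part of tmadmin itself.
MACHINES = ["SITE1", "SITE2"]  # LMIDs from Figure 5


def stats_sources(default_mid=None, machines=MACHINES):
    """Return the bulletin boards consulted for a statistics command."""
    if default_mid is None or default_mid == "DBBL":
        return ["DBBL"]            # unset or DBBL: the distinguished BB
    if default_mid == "all":
        return list(machines)      # a current reading from every BB
    if default_mid in machines:
        return [default_mid]       # only that machine's BB
    raise ValueError(f"unknown machine ID: {default_mid}")


print(stats_sources())         # ['DBBL']
print(stats_sources("all"))    # ['SITE1', 'SITE2']
print(stats_sources("SITE2"))  # ['SITE2']
```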

bbstats (bbs)

The bbstats command prints a brief summary of the number of servers, services, request queues, and server groups. The output is shown in Figure 12.

Fig. 12: bbstats output

> bbs
Current Bulletin Board Status:
          Current number of servers: 24
         Current number of services: 47
   Current number of request queues: 20
    Current number of server groups: 2

The output of bbstats is the same with verbose mode on or off, and is the same whether the default mid is all or any single processor.

shmstats (sstats)

When running in SHM mode (that is, when MODEL SHM is specified), the shmstats command can be used to ensure more accurate statistics. When an application is active for many hours (or days), the statistics tend to get out of sync. shmstats can be used to specify exact recording (with the ex argument) or approximate recording (with the app argument). If the command:


sstats ex
is entered, the TUXEDO System briefly locks the bulletin board and resets several counters. If shmstats is entered without arguments, it reports which method is currently in force.

printclient (pclt)

The printclient command displays information for a selected group of active clients, as shown in Figure 13. The User Name and Client Name values come from the usrname and cltname fields of the TPINIT buffer the client supplies when it joins the application. Names longer than 8 characters are truncated on the right, and a plus sign appears to the right of a truncated name. The Bgn/Cmmt/Abrt column shows the number of transactions begun, committed, and aborted directly by the client. Status can be one of the following values:

IDLE

The client has joined the application (tpinit(3c)) but has neither outstanding service request handles nor active conversations.

IDLET

The client is idle, as described above, and has initiated a transaction (tpbegin(3c)).

BUSY

The client has joined the system and has at least one outstanding service request handle or one active conversation.

BUSYT

The client is busy, as described above, and has initiated a transaction.
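The four status values combine two independent conditions: whether the client has outstanding requests or active conversations, and whether it has initiated a transaction. The Python sketch below restates those rules as a function; client_status and its parameters are hypothetical and simply model the definitions above.

```python
# Hypothetical restatement of the printclient Status rules; not TUXEDO code.
def client_status(outstanding_requests, active_conversations, in_transaction):
    """Return IDLE, IDLET, BUSY or BUSYT for a client that has joined (tpinit)."""
    busy = outstanding_requests > 0 or active_conversations > 0
    status = "BUSY" if busy else "IDLE"
    # A trailing T marks a client that has initiated a transaction (tpbegin).
    return status + "T" if in_transaction else status


print(client_status(0, 0, False))  # IDLE
print(client_status(1, 0, False))  # BUSY
print(client_status(0, 2, True))   # BUSYT
```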

Fig. 13: printclient output, verbose mode off

all> pclt
     LMID         User Name       Client Name    Time    Status  Bgn/Cmmt/Abrt
--------------- --------------- --------------- -------- ------- -------------
SITE1           tuxedo          tmadmin          0:03:44 IDLE    0/0/0
SITE1                                            0:00:05 BUSY    0/0/0
SITE1                                            0:00:05 BUSY    0/0/0

In verbose mode, additional information is included as shown in Figure 14.

Fig. 14: printclient output, verbose mode on

> v
Verbose now on
> pclt
                               LMID: SITE1
                Reply queue address: 114421
                          User Name: tuxedo
                   Application Name: tmadmin
                     Time Connected: 0:04:41
          Requests Outstanding/Made: 0/0
     Conversations Active/Initiated: 0/0
                 Transactions Begun: 0
             Transactions Committed: 0
               Transactions Aborted: 0
        Transactions Begun Per Hour: 0
             Requests Made Per Hour: 0
   Conversations Initiated Per Hour: 0
                             Status: IDLE
                               LMID: SITE1
                          User Name:
                   Application Name:
                     Time Connected:  0:01:19
          Requests Outstanding/Made: 1/0
     Conversations Active/Initiated: 0/0
                 Transactions Begun: 0
             Transactions Committed: 0
               Transactions Aborted: 0
        Transactions Begun Per Hour: 0
             Requests Made Per Hour: 0
   Conversations Initiated Per Hour: 0
                             Status: BUSY

printgroup (pg)

The printgroup command prints information about server groups. The groups can be selected either by machine (-m machine) or by group name (-g groupname). An error message is returned if the machine is all. Output is shown in Figure 15.

Fig. 15: printgroup output

SITE1> pg -g BANKB1
Server group parameters:
         Group Name: BANKB1
       Group Number: 1
      Group Options: RM
    Primary Machine: SITE1
    Current Machine: SITE1

The output is the same with verbose mode on or off.

printnet (pnw)

The printnet command can take a comma-separated list of LMIDs. If no list is provided, all BRIDGE processes are queried. For each LMID, an indication is given if the machine is partitioned. If not partitioned, information is printed that shows the other machines this one is connected to and the count of messages in and out.

Since most System/T network traffic is between the DBBL on the master and BBLs on non-master machines, at boot time only connections between the master (and backup master) and non-master machines are brought up. Connections from one non-master machine to another are brought up when needed. This is referred to as the "lazy connection" feature. The printnet command shows only connections that have been brought up.
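The lazy connection rule can be modeled as a set of links that starts with the boot-time connections and grows on first use. The Python sketch below is a hypothetical model, not the BRIDGE implementation. It assumes the master connects to every other machine (including the backup master) and the backup master connects to every non-master machine; boot_links, LazyNetwork, and the machines SITE3 and SITE4 are invented for illustration.

```python
# Hypothetical model of the "lazy connection" rule; SITE3 and SITE4 are
# invented machines, and the boot-time link set is an assumption.


def boot_links(machines, master, backup=None):
    """Links assumed to be brought up at boot time."""
    links = {frozenset((master, m)) for m in machines if m != master}
    if backup is not None:
        links |= {frozenset((backup, m)) for m in machines
                  if m not in (master, backup)}
    return links


class LazyNetwork:
    def __init__(self, machines, master, backup=None):
        self.links = boot_links(machines, master, backup)

    def send(self, src, dst):
        """Bring up a link the first time two machines exchange a message."""
        link = frozenset((src, dst))
        newly_opened = link not in self.links
        self.links.add(link)
        return newly_opened


net = LazyNetwork(["SITE1", "SITE2", "SITE3", "SITE4"], "SITE1", "SITE2")
print(len(net.links))              # 5 links up after boot
print(net.send("SITE3", "SITE4"))  # True: opened on first use
print(net.send("SITE3", "SITE4"))  # False: already up
print(len(net.links))              # 6
```

In this model, printnet would report only the members of net.links, which matches the statement that it shows only connections that have been brought up.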

The output of the printnet command is shown in Figure 16.

Fig. 16: printnet output

SITE1> printnet
SITE1	Connected To:		    msgs snd	    msgs rcv
	wgs386				214		201
SITE2	Connected To:		    msgs snd	    msgs rcv
	mchn1				201		214

printqueue (pq)

The printqueue command produces a display of information about the activity on the queues. Figure 17 shows the format with verbose mode off.

Fig. 17: printqueue output, mid set to all

all> pq
a.out Name     Queue Name  # Serv Wk Queued  # Queued  Ave. Len    Machine
----------     ----------  ------ ---------  --------  --------    -------
TMS_SQL        BANKB2_TMS       2         -         0         -      SITE2
BTADD          00002.00012      1         -         0         -      SITE2
DBBL           80952            1         -         0         -      SITE1
BRIDGE         33635384         1         -         0         -      SITE2
ACCT           00002.00008      1         -         0         -      SITE2
TLR            tlr1             2         -         0         -      SITE1
BAL            00001.00009      1         -         0         -      SITE1
BAL            00002.00010      1         -         0         -      SITE2
BBL            30002.00000      1         -         0         -      SITE1
TMS_SQL        BANKB1_TMS       2         -         0         -      SITE1
BRIDGE         16858168         1         -         0         -      SITE1
ACCT           00001.00007      1         -         0         -      SITE1
BALC           00002.00025      1         -         0         -      SITE2
BBL            30003.00000      1         -         0         -      SITE2
TLR            tlr2             2         -         0         -      SITE2
BTADD          00001.00011      1         -         0         -      SITE1
AUDITC         00001.00013      1         -         0         -      SITE1
XFER           00001.00005      1         -         0         -      SITE1
BALC           00001.00024      1         -         0         -      SITE1
XFER           00002.00006      1         -         0         -      SITE2
1 Queue Table Entry allocated for client processes.
dashes ( - ) indicate the information is not collected in the present mode.
zeroes ( 0 ) indicated information is collected but there is nothing to report.

Notice that most of the queue names in Figure 17 were generated by the TUXEDO System software. For an application server, the default queue name has the form "GRPNO.SRVID", where GRPNO is the number of the server group associated with the server and SRVID is the server identifier as specified in the configuration file. Because the TLR servers participate in MSSQ sets, they are the only servers that were assigned symbolic queue names via the RQADDR parameter in the configuration file (see Figure 5).
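The naming rule above can be sketched in a few lines of Python. This is an illustration only: request_queue_name is a hypothetical helper, the five-digit zero padding is inferred from the names in Figure 17 (for example 00002.00012) rather than from any documented specification, and system servers such as BBL, DBBL, BRIDGE, and TMS_SQL use other names and are not covered.

```python
# Sketch of the default queue-naming rule for application servers. The
# five-digit zero padding is inferred from Figure 17, not documented.
def request_queue_name(grpno, srvid, rqaddr=None):
    """Return RQADDR if one was configured, else the default GRPNO.SRVID name."""
    if rqaddr:
        return rqaddr
    return f"{grpno:05d}.{srvid:05d}"


print(request_queue_name(2, 12))         # 00002.00012 (BTADD in BANKB2)
print(request_queue_name(1, 1, "tlr1"))  # tlr1 (TLR with RQADDR=tlr1)
```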

The application was idle when Figure 17 was produced. If requests were backed up for any server, the following information would be significant:

# Queued

the number of service requests enqueued

If the example were of a SHM system, these columns would be of interest:

Wk Queued

the load currently queued for a server

Ave. Len

the average length of the queue

In a single-processor (SHM) system with no work backed up in the queues, the display shows zeros in these columns. (Also see the description of the shmstats command above.) In a multiprocessor (MP) system, statistics for the enqueued load and average queue length are not available. Where figures are available and indicate a problem in a queue, the administrator might boot more servers (assuming more are available).

The output of the printqueue command is quite different when verbose mode is on. In that case, the pertinent information is whether the server on the queue is restartable and, if so, how many times it can be restarted. Figure 18 shows a sample for one queue name.

Fig. 18: printqueue, verbose mode on

all> v
Verbose now on.
all> pq 00002.00012
           a.out Name: /home2/units/apps/bankapp/BTADD
           Queue Name: 00002.00012
   # Servers on Queue: 1
       Server Options: RM, RESTARTABLE
       Max # Restarts: 5
      Restart Command: (restartsrv)
         Grace Period: 1 day
           Queue Type: USER

printserver (psr)

The printserver command provides information about the work being done by the application's servers.

Fig. 19: printserver output, mid set to all

all> psr
Totals for all machines:
a.out Name    Queue Name  Grp Name      ID RqDone Load Done Machine
----------    ----------  --------      -- ------ --------- -------
BBL           30003.00000 SITE2          0     49      2450 SITE2
BBL           30002.00000 SITE1          0     53      2650 SITE1
DBBL          80952       SITE1          0    460     23000 SITE1
TLR           tlr1        BANKB1         1     55      2750 SITE1
BRIDGE        33635384    SITE2          1      0         0 SITE2
BRIDGE        16858168    SITE1          1      0         0 SITE1
TLR           tlr1        BANKB1         2     45      2250 SITE1
TLR           tlr2        BANKB2         3     49      2450 SITE2
TLR           tlr2        BANKB2         4     51      2550 SITE2
XFER          00001.00005 BANKB1         5      0         0 SITE1
XFER          00002.00006 BANKB2         6      0         0 SITE2
ACCT          00001.00007 BANKB1         7    100      5000 SITE1
ACCT          00002.00008 BANKB2         8    100      5000 SITE2
BAL           00001.00009 BANKB1         9      0         0 SITE1
BAL           00002.00010 BANKB2        10      0         0 SITE2
BTADD         00001.00011 BANKB1        11     20      1000 SITE1
BTADD         00002.00012 BANKB2        12     20      1000 SITE2
AUDITC        00001.00013 BANKB1        13      0         0 SITE1
BALC          00001.00024 BANKB1        14      0         0 SITE1
BALC          00002.00025 BANKB2        15      0         0 SITE2
TMS_SQL       BANKB2_TMS  BANKB2     30001      0         0 SITE2
TMS_SQL       BANKB1_TMS  BANKB1     30001      0         0 SITE1
TMS_SQL       BANKB2_TMS  BANKB2     30002    120      6000 SITE2
TMS_SQL       BANKB1_TMS  BANKB1     30002    120      6000 SITE1

printserver output can be used to check the load and number of requests handled by each server. This is different from the information available through the printqueue command. Here the RqDone and Load Done figures are cumulative from the time the system was booted; in this example every entry in Load Done is RqDone multiplied by 50, the service load. Figure 19 shows what was done by each of the four TLR servers. For MSSQ sets such as these, the imbalance in the set at SITE1 (55 requests for server 1 versus 45 for server 2, compared with 49 and 51 at SITE2) might indicate a problem.
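One way to make the MSSQ comparison repeatable is to compute, for each queue, the spread of RqDone across the servers that share it. The Python sketch below uses the TLR rows of Figure 19 copied by hand; mssq_spread and the spread measure (max minus min, divided by the mean) are hypothetical choices for illustration, not a tmadmin feature, and what spread counts as a problem is left to the administrator.

```python
# Hypothetical analysis of the TLR rows in Figure 19 (copied by hand).
from collections import defaultdict

SERVICE_LOAD = 50  # LOAD=50 from the *SERVICES DEFAULT entry in Figure 5

# (queue name, server ID, RqDone, Load Done, machine)
tlr_rows = [
    ("tlr1", 1, 55, 2750, "SITE1"),
    ("tlr1", 2, 45, 2250, "SITE1"),
    ("tlr2", 3, 49, 2450, "SITE2"),
    ("tlr2", 4, 51, 2550, "SITE2"),
]


def mssq_spread(rows):
    """Return {queue: (max - min) / mean of RqDone} for each MSSQ set."""
    by_queue = defaultdict(list)
    for queue, srvid, rq_done, load_done, _machine in rows:
        # All TLR services carry the same load, so Load Done = RqDone * LOAD.
        if load_done != rq_done * SERVICE_LOAD:
            raise ValueError(f"server {srvid}: unexpected Load Done")
        by_queue[queue].append(rq_done)
    spread = {}
    for queue, counts in by_queue.items():
        mean = sum(counts) / len(counts)
        spread[queue] = (max(counts) - min(counts)) / mean
    return spread


print(mssq_spread(tlr_rows))  # {'tlr1': 0.2, 'tlr2': 0.04}
```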

With verbose mode on and the machine ID set to an individual processor, the information is presented in the form shown in Figure 20.

Fig. 20: printserver output, mid set to a processor, verbose mode on

SITE1> psr -g BANKB1 -i 1
        Group ID: BANKB1, Server ID: 1, Machine ID: SITE1
      Process ID: 2133, Request Qaddr: 1508, Reply Qaddr: 1210
     Server Type: USER
      a.out Name: /home/apps/bank/TLR
      Queue Name: tlr1
         Options: RESTARTABLE
  Max # Restarts: 5
 Restart Command: (restartsrv)
    Grace Period: 1 day
      Generation: 1, Max message type: 1073741824
   Creation time: Tue Oct 01 10:19:11 1991
         Up time: 0:23:21
   Requests done: 55
       Load done: 2750
  Current Status: ( IDLE )

If the server is restartable, the Generation line in the above display shows how many times the server has been restarted. The value 1 in the display means the server has been started once but has not been restarted.

Current Status may sometimes be reported as UNKNOWN. This almost always means that status could not be determined because the message queues were full.

printservice (psc)

The printservice command adds finer detail: the number of requests handled by each service within each server. With verbose mode off, the output appears as shown in Figure 21.

Fig. 21: printservice output, verbose off

all> psc
Totals for all machines:
Service Name Routine Name a.out Name Grp Name  ID    Machine  # Done Status
------------ ------------ ---------- --------  --    -------  ------ ------
AUDITC       AUDITC       AUDITC     BANKB1     1      SITE1       0 AVAIL
TRANSFER     TRANSFER     XFER       BANKB2   201      SITE2       0 AVAIL
OPEN_ACCT    OPEN_ACCT    ACCT       BANKB2   202      SITE2       6 AVAIL
CLOSE_ACCT   CLOSE_ACCT   ACCT       BANKB2   202      SITE2       0 AVAIL
TBAL_BID     TBAL_BID     BAL        BANKB2   203      SITE2       0 AVAIL
TBAL         TBAL         BAL        BANKB2   203      SITE2       0 AVAIL
ABAL_BID     ABAL_BID     BAL        BANKB2   203      SITE2       0 AVAIL
ABAL         ABAL         BAL        BANKB2   203      SITE2       0 AVAIL
TLR_ADD      TLR_ADD      BTADD      BANKB2   204      SITE2       3 AVAIL
BR_ADD       BR_ADD       BTADD      BANKB2   204      SITE2       3 AVAIL
TBALC_BID    TBALC_BID    BALC       BANKB2   205      SITE2       0 AVAIL
ABALC_BID    ABALC_BID    BALC       BANKB2   205      SITE2       0 AVAIL
WITHDRAWAL   WITHDRAWAL   TLR        BANKB2   211      SITE2       0 AVAIL
INQUIRY      INQUIRY      TLR        BANKB2   211      SITE2       0 AVAIL
DEPOSIT      DEPOSIT      TLR        BANKB2   211      SITE2       2 AVAIL
WITHDRAWAL   WITHDRAWAL   TLR        BANKB2   212      SITE2       0 AVAIL
INQUIRY      INQUIRY      TLR        BANKB2   212      SITE2       0 AVAIL
DEPOSIT      DEPOSIT      TLR        BANKB2   212      SITE2       2 AVAIL
WITHDRAWAL   WITHDRAWAL   TLR        BANKB2   213      SITE2       0 AVAIL
INQUIRY      INQUIRY      TLR        BANKB2   213      SITE2       0 AVAIL
DEPOSIT      DEPOSIT      TLR        BANKB2   213      SITE2       2 AVAIL
TMS          TMS          TMS_SQL    BANKB2 30001      SITE2       1 AVAIL
TRANSFER     TRANSFER     XFER       BANKB1   101      SITE1       1 AVAIL
TMS          TMS          TMS_SQL    BANKB1 30001      SITE1       4 AVAIL
TMS          TMS          TMS_SQL    BANKB2 30002      SITE2      10 AVAIL
OPEN_ACCT    OPEN_ACCT    ACCT       BANKB1   102      SITE1      60 AVAIL
CLOSE_ACCT   CLOSE_ACCT   ACCT       BANKB1   102      SITE1       0 AVAIL
TMS          TMS          TMS_SQL    BANKB1 30002      SITE1      65 AVAIL
TBAL_BID     TBAL_BID     BAL        BANKB1   103      SITE1       0 AVAIL
TBAL         TBAL         BAL        BANKB1   103      SITE1       0 AVAIL
ABAL_BID     ABAL_BID     BAL        BANKB1   103      SITE1       0 AVAIL
ABAL         ABAL         BAL        BANKB1   103      SITE1       0 AVAIL
TLR_ADD      TLR_ADD      BTADD      BANKB1   104      SITE1       3 AVAIL
BR_ADD       BR_ADD       BTADD      BANKB1   104      SITE1       3 AVAIL
TBALC_BID    TBALC_BID    BALC       BANKB1   105      SITE1       0 AVAIL
ABALC_BID    ABALC_BID    BALC       BANKB1   105      SITE1       0 AVAIL
WITHDRAWAL   WITHDRAWAL   TLR        BANKB1   111      SITE1       1 AVAIL
INQUIRY      INQUIRY      TLR        BANKB1   111      SITE1       0 AVAIL
DEPOSIT      DEPOSIT      TLR        BANKB1   111      SITE1      19 AVAIL
WITHDRAWAL   WITHDRAWAL   TLR        BANKB1   112      SITE1       0 AVAIL
INQUIRY      INQUIRY      TLR        BANKB1   112      SITE1       0 AVAIL
DEPOSIT      DEPOSIT      TLR        BANKB1   112      SITE1      20 AVAIL
WITHDRAWAL   WITHDRAWAL   TLR        BANKB1   113      SITE1       0 AVAIL
INQUIRY      INQUIRY      TLR        BANKB1   113      SITE1       0 AVAIL
DEPOSIT      DEPOSIT      TLR        BANKB1   113      SITE1      20 AVAIL

Summary of Statistics Commands

The examples of the displays from the tmadmin(1) statistics commands are just a fraction of those that can be produced. Combinations of options can be used to narrow the information displayed. As the TUXEDO System administrator gains experience working with the servers and services of an application, ways in which the information can be used to tune the system will become apparent.

Managing the Configuration

The tmadmin commands used to reconfigure a running system and, when necessary, to manage transactions are shown in Figure 22. Commands that change configuration parameters remain in effect only until the system (or component) is shut down. Permanent changes to the TUXCONFIG file can be made by entering config to invoke tmconfig(1), or by quitting tmadmin and invoking tmconfig directly.

Fig. 22: tmadmin commands for managing services and transactions

Command(abbr) Description
aborttrans(abort) notify the coordinator of a transaction, or a participant, to abort it
committrans(commit) notify a participant of a decided transaction to commit heuristically
printtrans(pt) print information from the global transaction table
advertise(adv) add a service to the service table
unadvertise(unadv) remove a service from the service table
suspend(susp) remove a service from the list of those available. Server or queue identifiers can be used to broaden the scope.
resume(res) return a suspended service to the list of those available. Server or queue identifiers can be used to broaden the scope.
changeload(chl) change the load specified for a service
changepriority(chp) change the priority specified for a service
changetrantime(chtt) change the time limit specified for a service
config(conf) make changes in TUXCONFIG

If errors are encountered by a reconfiguration command, a message indicating the error is displayed on the terminal. More information is available in the central event log.

The boot and shutdown commands of tmadmin(1) could be added to the collection shown in Figure 22. They are the same as tmboot(1) and tmshutdown(1), respectively. They are relevant here because one way to change the services available is to start more servers or shut some down, but since they are covered in detail in the previous chapter, they are omitted here.

Most of the commands listed above affect only the information in the bulletin board structure. They do not change the content of TUXCONFIG. TUXCONFIG can be changed only by tmadmin config or its shell counterpart, tmconfig(1). The details of using tmconfig can be found in the next chapter, The Use of tmconfig(1), and in the tmconfig(1) manual page in the BEA TUXEDO Reference Manual: Section 1.
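To make the distinction concrete, here is a toy model in Python; it is not TUXEDO code. The class and method names (Application, boot, changeload, config_set_load) are hypothetical, and loads are plain integers. The model only shows that a bulletin board change disappears at the next boot while a TUXCONFIG change survives it. Whether a particular TUXCONFIG change also takes effect on a running application depends on the attribute; the model ignores that.

```python
from dataclasses import dataclass, field


@dataclass
class Application:
    """Toy model (hypothetical): TUXCONFIG persists, the bulletin board is rebuilt at boot."""

    tuxconfig: dict  # service name -> load; survives shutdown
    bulletin_board: dict = field(default_factory=dict)  # in-memory copy

    def boot(self) -> None:
        # At boot the bulletin board is populated from TUXCONFIG.
        self.bulletin_board = dict(self.tuxconfig)

    def changeload(self, service: str, load: int) -> None:
        # Like tmadmin changeload: only the running bulletin board changes.
        self.bulletin_board[service] = load

    def config_set_load(self, service: str, load: int) -> None:
        # Like a change made through config/tmconfig: recorded in TUXCONFIG.
        # (This toy model does not touch the running bulletin board.)
        self.tuxconfig[service] = load


app = Application(tuxconfig={"DEPOSIT": 50})
app.boot()
app.changeload("DEPOSIT", 80)
print(app.bulletin_board["DEPOSIT"], app.tuxconfig["DEPOSIT"])  # 80 50
app.boot()  # reboot: the changeload is gone
print(app.bulletin_board["DEPOSIT"])  # 50
```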

Managing Transactions

In normal System/T operations the software manages global transactions automatically, and entries remain in the Global Transaction Table (GTT) so briefly that it is very difficult to capture a live one with a printtrans command. However, if there has been a problem (the network or part of it has failed, the system has crashed, or a server that participates in a global transaction has gone down), entries may be left in the GTT that require administrator intervention to push them through to a reasonable conclusion, which generally means aborting the transaction.

How to Detect a Problem

You may have a problem that requires intervention if:

  • Transactions persist in the GTT. As noted above, entries normally stay in the GTT for such a short time that you rarely get any output when you enter the printtrans command.

  • There is a logical disconnect between the transaction status and the group status. For example, the transaction state is TMGDECIDED, but a participating group shows a Group State of TMGACTIVE.

  • Database locks are being held until one participating group finishes committing.
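The first two symptoms can be checked mechanically once printtrans output has been captured. The sketch below is hypothetical: it assumes each snapshot has already been parsed into a dictionary keyed by gtrid, and it encodes only the one disconnect the text gives as an example (transaction TMGDECIDED while a group is still TMGACTIVE). Other combinations may also deserve attention.

```python
from __future__ import annotations


def persistent_gtrids(snapshots: list[dict]) -> set[str]:
    """gtrids present in every one of two or more printtrans snapshots.

    Each snapshot maps gtrid -> {"status": str, "groups": {group: status}}.
    Normally GTT entries vanish between snapshots, so survivors are suspect.
    """
    if len(snapshots) < 2:
        return set()
    common = set(snapshots[0])
    for snap in snapshots[1:]:
        common &= set(snap)
    return common


def state_disconnect(entry: dict) -> bool:
    """True for the example in the text: decided transaction, group still active."""
    return entry["status"] == "TMGDECIDED" and any(
        status == "TMGACTIVE" for status in entry["groups"].values()
    )


first = {"x0 x259a633a xf2": {"status": "TMGDECIDED",
                              "groups": {"BANKB1": "TMGDONE", "BANKB2": "TMGACTIVE"}}}
second = dict(first)  # the same entry seen again in a later snapshot
print(persistent_gtrids([first, second]))  # {'x0 x259a633a xf2'}
print(state_disconnect(first["x0 x259a633a xf2"]))  # True
```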

Valid Transaction States

As the verbose output of the printtrans command shows, the GTT records a status for the transaction and a status for each participating group. The valid status codes are listed in Figure 23 and Figure 24.

Fig. 23: Valid transaction states

Transaction
-----------
TMGACTIVE	
  transaction is active, no errors have occurred
TMGABORTONLY
  transaction can only be aborted, but abort has not yet been called
TMGABORTED
  transaction can only be aborted; abort has been called
TMGCOMCALLED
  commit has been called; 1st phase in progress	
TMGREADY
  transaction has completed 1st phase of commit
TMGDECIDED
  transaction has been written to TLOG

Fig. 24: Valid group transaction states

Group
-----
TMGACTIVE|TMGNOTPART
  Group is active but has not yet been called
TMGACTIVE
  Group working on a transaction; no errors have occurred
TMGABORTED
  work for this Group has been aborted
TMGREADONLY
  work for this Group done in read-only mode
TMGREADY
  work for this Group has been successfully pre-committed
TMGHCOMMIT
  work for this Group has been heuristically committed
TMGHABORT
  work for this Group has been heuristically aborted
TMGDONE
  work for this Group has been committed

Clearing the GTT

When you have reason to believe that some intervention is needed, here is a recommended procedure:

  • Enter a printtrans command in verbose mode to see what is listed in the GTT.

  • Enter a series of aborttrans commands or committrans commands, one for each participating group.

    printtrans(pt)

    The printtrans command takes either a -m machine or -g groupname argument. The output differs slightly depending on whether verbose mode is on or off. The two forms of output are shown in Figure 25 and Figure 26.

    Fig. 25: printtrans output, verbose mode on

    all> pt
    >> index=0	gtrid=x0 x259a633a xf2
    :  Machine id: SITE1, Transaction status: TMGACTIVE
       Group count: 1, timeout: 30, time left: 39
       Known participants:
        group: BANKB2, status: TMGACTIVE, remote, coord
    

    In Figure 25, time left is greater than timeout because the software makes sure that you have at least timeout seconds, regardless of where in the SCANUNIT cycle the transaction began.

    Fig. 26: printtrans output, verbose mode off

    all> v
    Verbose now off.
    all> pt
    >> index=0	gtrid=x0 x259a633a x19e
    :  Machine id: SITE1, Transaction status: TMGACTIVE
       Group count: 1
    

    If the output of the printtrans command leads you to conclude that you need to proceed with a series of committrans or aborttrans commands, the most important parts of the entry listing are the index number and the names of the participating groups; they are needed as arguments to the committrans or aborttrans commands.
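If you capture printtrans output in a file, a small parser can pull out exactly those fields. The following Python sketch assumes the verbose layout shown in Figure 25 (a >> index= line, a Machine id line, and one group: line per known participant); parse_printtrans is a hypothetical helper, and other releases may format the output differently.

```python
from __future__ import annotations

import re

ENTRY_RE = re.compile(r"^>> index=(\d+)\s+gtrid=(.+)$")
STATUS_RE = re.compile(r"Machine id: (\S+), Transaction status: (\S+)")
GROUP_RE = re.compile(r"^\s*group: (\S+), status: ([^,\s]+)")


def parse_printtrans(text: str) -> list[dict]:
    """Return one dict per GTT entry: index, gtrid, machine, status, groups."""
    entries: list[dict] = []
    current = None
    for line in text.splitlines():
        m = ENTRY_RE.match(line.strip())
        if m:
            current = {"index": int(m.group(1)), "gtrid": m.group(2).strip(),
                       "machine": None, "status": None, "groups": {}}
            entries.append(current)
            continue
        if current is None:
            continue  # skip prompts and anything before the first entry
        m = STATUS_RE.search(line)
        if m:
            current["machine"], current["status"] = m.group(1), m.group(2)
            continue
        m = GROUP_RE.match(line)
        if m:
            current["groups"][m.group(1)] = m.group(2)  # group name -> group state
    return entries


sample = """>> index=0\tgtrid=x0 x259a633a xf2
:  Machine id: SITE1, Transaction status: TMGACTIVE
   Group count: 1, timeout: 30, time left: 39
   Known participants:
    group: BANKB2, status: TMGACTIVE, remote, coord
"""
print(parse_printtrans(sample))
```

The index and the keys of the groups dictionary are the values you would pass to aborttrans or committrans.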

    aborttrans(abort)

    The aborttrans command has only one required argument: the tranindex, which is the index= number from the printtrans command. You can optionally specify the groupname with -g. If you do specify the groupname, the operation applies only to tranindex at the specified group. If you do not specify groupname, the coordinator of the global transaction is requested to abort it, and all groups are aborted.

    committrans(commit)

    For committrans both grpname, with a -g flag, and tranindex are required and the command must be entered for all participating groups before the operation is complete. If the transaction is not in TMGREADY state for any participating group, the command fails.
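Because the arguments come straight from the printtrans listing, the command lines can be generated rather than typed. This Python sketch is illustrative only: abort_commands and commit_commands are hypothetical helpers that build the tmadmin command strings described above; they do not run tmadmin. As a conservative choice made for the sketch, commit_commands refuses to emit anything unless every participating group is in TMGREADY.

```python
from __future__ import annotations


def abort_commands(index: int, groups: list[str] | None = None) -> list[str]:
    """aborttrans lines for GTT entry `index`.

    With no groups, one command asks the coordinator to abort the whole
    transaction; with groups, one -g command per named group.
    """
    if not groups:
        return [f"aborttrans {index}"]
    return [f"aborttrans -g {group} {index}" for group in groups]


def commit_commands(index: int, group_states: dict[str, str]) -> list[str]:
    """committrans lines, one per participating group (-g and index both required).

    Refuses unless every group is in TMGREADY.
    """
    not_ready = [g for g, state in group_states.items() if state != "TMGREADY"]
    if not_ready:
        raise ValueError("not in TMGREADY: " + ", ".join(not_ready))
    return [f"committrans -g {g} {index}" for g in group_states]


print(abort_commands(0))  # ['aborttrans 0']
print(commit_commands(3, {"BANKB1": "TMGREADY", "BANKB2": "TMGREADY"}))
# ['committrans -g BANKB1 3', 'committrans -g BANKB2 3']
```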

    A Final Note on Managing Transactions

    We cannot emphasize too strongly that administrator intervention for transactions is an extremely rare occurrence. Chances are that you will run your TUXEDO System application for several years without having occasion to use the aborttrans and committrans commands.

    Migrating Servers

    There are two tmadmin commands that can be used to migrate servers.

    Fig. 27: tmadmin migrate commands

    Command(abbr) Description
    migrategroup(migg) migrate servers in a group to their alternate location
    migratemach(migm) migrate servers by using LMIDs

    A special case involves switching from the ACTING MASTER to the ACTING BACKUP node. We include a discussion of this topic in this section because it has to do with moving the DBBL server. The following commands are involved:

    Fig. 28: tmadmin commands for switching the MASTER node

    Command(abbr) Description
    shutdown(stop) shut down servers for migration
    master(m) switch MASTER to BACKUP or vice versa
    pclean(pcl) force a bbclean(bbc), then remove partitioned processes from a non-partitioned bulletin board
    reconnect(rco) make a new connection from a non-partitioned machine to a partitioned machine

    migrategroup(migg)

    The migrategroup command takes the name of a single server group as an argument. The groupname must be stated explicitly; it cannot be supplied via the default command. Servers to be migrated must first be shut down with this command:

    
    stop -R -g groupname
    
    If you prefer, tmshutdown can be used instead of tmadmin shutdown(stop). The server group being migrated must have an alternate location specified in its LMID parameter. Servers in the group must specify RESTART=Y and the MIGRATE option must be specified in the RESOURCES section. The migrategroup command
    
    migg groupname
    
    boots the server group on the new machine.
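The preconditions and the two commands can be collected into a checklist. In this hypothetical Python sketch the configuration is represented as plain dictionaries rather than read from TUXCONFIG; the LMID value is assumed to list the primary and alternate locations separated by a comma, and the MIGRATE option is assumed to appear among the RESOURCES section's OPTIONS keywords. migrategroup_plan is an illustrative name only.

```python
from __future__ import annotations


def migrategroup_plan(group: dict, servers: list[dict], resource_options: set[str]) -> list[str]:
    """Check migration preconditions for one group; return the tmadmin commands.

    group: {"name": str, "lmid": "primary,alternate"}  (hypothetical layout)
    servers: [{"name": str, "restart": bool}, ...] for servers in the group
    resource_options: OPTIONS keywords from the RESOURCES section
    """
    problems = []
    lmids = [part.strip() for part in group["lmid"].split(",") if part.strip()]
    if len(lmids) < 2:
        problems.append(f"group {group['name']} has no alternate location in LMID")
    for server in servers:
        if not server["restart"]:
            problems.append(f"server {server['name']} does not specify RESTART=Y")
    if "MIGRATE" not in resource_options:
        problems.append("MIGRATE option not specified in RESOURCES")
    if problems:
        raise ValueError("; ".join(problems))
    name = group["name"]
    # Shut down retaining the bulletin board entries, then migrate.
    return [f"stop -R -g {name}", f"migg {name}"]


print(migrategroup_plan({"name": "BANKB1", "lmid": "SITE1,SITE2"},
                        [{"name": "TLR", "restart": True}], {"MIGRATE"}))
# ['stop -R -g BANKB1', 'migg BANKB1']
```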

    If transactions are being logged for the servers involved in a group migration, you may need to dump the TLOG, load it and perform a warm start. See "TLOG Commands" earlier in this chapter.

    migratemach(migm)

    The migratemach command takes one LMID as an argument. The LMID names the processor where the server group(s) have been running. The alternate location must be the same for all server groups on the LMID. Servers on the LMID must specify RESTART=Y, and the MIGRATE option must be specified in the RESOURCES section. Servers to be migrated must first be shut down with this command:

    
    stop -R -l lmid
    
    If you prefer, tmshutdown can be used instead of tmadmin shutdown(stop). The migratemach command
    
    migm lmid
    
    boots all server groups on the new machine.
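The rule that sets migratemach apart, one common alternate location for every server group on the machine, is easy to check ahead of time. This hypothetical Python sketch checks it from a list of group definitions, assuming each LMID value is written as primary,alternate and that the groups are currently running on their primary location. migratemach_plan is an illustrative name, not part of TUXEDO.

```python
def migratemach_plan(lmid: str, groups: list) -> list:
    """Return the stop/migm commands if all groups on `lmid` share one alternate."""
    alternates = {}
    for group in groups:
        parts = [p.strip() for p in group["lmid"].split(",")]
        if parts[0] != lmid:
            continue  # group does not run on this machine
        if len(parts) < 2 or not parts[1]:
            raise ValueError(f"group {group['name']} has no alternate location")
        alternates[group["name"]] = parts[1]
    if not alternates:
        raise ValueError(f"no server groups run on {lmid}")
    if len(set(alternates.values())) != 1:
        raise ValueError(f"groups on {lmid} have different alternates: {alternates}")
    return [f"stop -R -l {lmid}", f"migm {lmid}"]


groups = [
    {"name": "BANKB1", "lmid": "SITE1,SITE3"},
    {"name": "BANKB3", "lmid": "SITE1,SITE3"},
    {"name": "BANKB2", "lmid": "SITE2,SITE3"},
]
print(migratemach_plan("SITE1", groups))  # ['stop -R -l SITE1', 'migm SITE1']
```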

    Canceling a Migration

    You can cancel a migration after the stop -R command has completed, but before going ahead with the migrate command, by using the -cancel option available with each migrate command. The command lines look like this:

    
    > migg -cancel groupname
    > migm -cancel lmid
    
    The effect of the -cancel option is the same as shutting down without the -R option: the server entries are deleted from the bulletin board. (The -R option retains the server names in anticipation of a migrate command.)

    Switching MASTER and BACKUP Nodes

    This is a special case of server migration. While it generally comes into play when the network is partitioned, there are situations where the administrator needs to shut down the master, and a migration should be done from the master to the backup node. It is covered here, adjacent to the discussion of "Handling a Partitioned Network," because it also relates to how the System/T administrator might deal with such a condition.

    Recall that in Chapter 4, in the discussion of the MASTER parameter in the *RESOURCES section of the configuration file, we described the MASTER/BACKUP terminology. Here is where that terminology is most helpful.

    Use this procedure if it becomes necessary to take over for a crashed MASTER:

    Step 1:

    Run tmadmin on the ACTING BACKUP node.

    Step 2:

    Invoke the master command (it takes no arguments) to become the ACTING MASTER.

    Step 3:

    Invoke the pclean command specifying the LMID of the old ACTING MASTER. This will remove bulletin board entries for partitioned processes.

    Step 4:

    Once the crashed node is restored, from the ACTING MASTER run boot -B lmid -l lmid, where lmid is the machine id of the old ACTING MASTER. This effectively makes that node the ACTING BACKUP.

    Step 5:

    If you wish, you can now run tmadmin master on the ACTING BACKUP to return the nodes to their original roles.

    It may be necessary for the administrator to shut down the ACTING MASTER node. Before doing so, switch the master and backup, but do not run the pclean command (Step 3 above). Once the switch is complete, the ACTING BACKUP (the former master) can be shut down using tmshutdown.
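The two variants of the procedure differ only from Step 3 on. The Python sketch below uses a hypothetical helper, master_switch_plan, to list where each step is performed. The tmadmin commands named in the steps are shown as typed at the tmadmin prompt; the final shutdown in the planned case is left as a description because its tmshutdown options depend on your configuration.

```python
def master_switch_plan(backup: str, old_master: str, crashed: bool) -> list:
    """Return (node, action) pairs for moving the ACTING MASTER role to `backup`."""
    steps = [
        (backup, "run tmadmin"),  # Step 1: on the ACTING BACKUP
        (backup, "master"),       # Step 2: backup becomes ACTING MASTER
    ]
    if crashed:
        # Step 3: remove bulletin board entries for the partitioned processes.
        steps.append((backup, f"pclean {old_master}"))
        # Step 4, once the crashed node is restored: it rejoins as ACTING BACKUP.
        steps.append((backup, f"boot -B {old_master} -l {old_master}"))
    else:
        # Planned switch: no pclean; the old master is now the ACTING BACKUP.
        steps.append((old_master, "shut down with tmshutdown"))
    return steps


for node, action in master_switch_plan("SITE2", "SITE1", crashed=True):
    print(node, action)
```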

    If MASTER is Disconnected from the Network

    If the MASTER becomes disconnected from the network and will not be reconnected for some time, you may want to migrate control to the BACKUP. Do this before using the above procedure:

    Step 0:

    Shut down all System/T processes on MASTER and release all IPC resources.

    Then go ahead with Step 1 above.

    Handling a Partitioned Network

    In System/T networks a partition is said to exist when one or more remote nodes are not accessible to processes on the MASTER node. Partition of a System/T application network may result from any of three failures:

    • Node failure

    • LAN failure

    • BRIDGE process failure

    Detection and diagnosis of a partition is the responsibility of the System/T administrator, who must take appropriate action to recover. In all three types of failure the symptom is the same: the MASTER node has lost access to one or more remote nodes, and those remote nodes cannot access the MASTER node. In the sections that follow, we give you a general idea of how you might become aware that a partition has occurred, some suggestions for determining what type of failure caused it and, finally, specific steps to take to restore the network. The first two topics cover a range of possibilities that depend largely on your configuration and your own methods of administering the application. The last topic, restoring the network, is one where we can offer detailed help.

    How You Learn that Partition Has Occurred

    Here are some of the ways in which the System/T administrator might learn that the network is partitioned:

    userlog

    When things go wrong with the network, System/T processes begin sending messages to the userlog. If you have set up your log under RFS so that all messages to userlog go to one file, and if you monitor that file regularly (perhaps with a window on your terminal where you run tail -f on the userlog), you will begin to see failure messages. If RFS is using the same network, the remote file systems may no longer be available.

    tmadmin

    The printnet command of tmadmin tells you of nodes that are partitioned. The printservice command shows suspended services. If services on a remote node all show status of PARTITIONED, you may have a partition problem.
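The printservice check lends itself to a script. This hypothetical Python sketch assumes the column layout of the printservice listing shown earlier in this chapter (service, routine, program, group, ID, machine, requests done, status) and assumes the status column reads PARTITIONED, as described above. It reports the machines on which every listed service has that status. The sample rows are invented for illustration.

```python
from collections import defaultdict


def partitioned_machines(printservice_text: str) -> list:
    """Machines whose services all show status PARTITIONED."""
    statuses = defaultdict(list)
    for line in printservice_text.splitlines():
        fields = line.split()
        # Data rows have 8 columns with a numeric "# Done" field; skip the rest.
        if len(fields) != 8 or not fields[6].isdigit():
            continue
        machine, status = fields[5], fields[7]
        statuses[machine].append(status)
    return sorted(m for m, s in statuses.items() if all(x == "PARTITIONED" for x in s))


listing = """\
INQUIRY      INQUIRY      TLR        BANKB1   111      SITE1       0 AVAIL
DEPOSIT      DEPOSIT      TLR        BANKB2   212      SITE2       2 PARTITIONED
INQUIRY      INQUIRY      TLR        BANKB2   212      SITE2       0 PARTITIONED
"""
print(partitioned_machines(listing))  # ['SITE2']
```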

    phone calls

    Users of your application are quite likely to be the first to realize that something is not working properly. If your phone begins to ring and users begin to ask, "How come I can't get the BUY service?" you probably have a partition problem.

    How to Find Out What Has Failed

    The thing you are most likely to detect at once is a failure of the MASTER node. But assuming the MASTER node is still in business, the first step in closing in on the problem is to learn the extent of it.

    Invoke tmadmin and see if the printnet and printservice commands indicate that more than one node is partitioned. If more than one node is partitioned, it strongly suggests the problem is a LAN failure; simultaneous failure of two or more nodes (or two or more BRIDGE processes) is statistically improbable.
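The reasoning in this and the preceding paragraph reduces to a short decision rule. The Python function below is only a sketch of that rule (likely_cause is a hypothetical name); it suggests where to look next and cannot replace checking the LAN or the node itself.

```python
def likely_cause(partitioned: list, master_up: bool = True) -> str:
    """Suggest a failure type from the list of partitioned nodes."""
    if not master_up:
        return "MASTER node failure"
    if not partitioned:
        return "no partition detected"
    if len(partitioned) > 1:
        # Two or more nodes or BRIDGE processes failing at once is improbable.
        return "LAN failure: check the LAN with its diagnostic tools"
    return f"single node {partitioned[0]}: check whether the node is running"


print(likely_cause(["SITE2", "SITE3"]))  # LAN failure: check the LAN with its diagnostic tools
print(likely_cause(["SITE2"]))           # single node SITE2: check whether the node is running
```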

    Your next step might be to check out the LAN. Your LAN probably has diagnostic tools that you can use to check the viability of the network.

    If the problem seems to be in a single node, how hard it is to verify a node failure depends largely on whether you have physical access to the node. If it is in the same building as the MASTER, it is easy to go and check whether it is still running. If it is still running, that again points to the LAN as the likely source of the problem. If the node is physically remote, you will have to rely on other means to find out whether it has failed.

    Steps to Take to Recover

    The way you recover from a partition differs depending on the type of failure. The preceding paragraphs suggested how you determine the type of failure.

    Node Failure

    If the MASTER node has failed, log in to the BACKUP node and execute the tmadmin master command to make it the ACTING MASTER. Follow this by running pclean with the LMID of the old MASTER to clean up bulletin board entries for processes that were running on the failed node.

    If the failure is in a remote node, run pclean from the MASTER, specifying the LMID of the failed node.

    Either of the above two steps has the effect of removing the failed node from the application and continuing to run with the remaining nodes of the configuration. This may resolve your System/T problem; however, you are left with a hardware problem.

    LAN Failure

    Your course of action with a LAN failure depends on the severity of the failure.

    Transient LAN Failure

    A transient LAN failure is one that corrects itself within minutes. The LAN is back to a viable state, but BRIDGE processes may be left unconnected. After LAN failures of very short duration, BRIDGE processes try to reconnect themselves. If the reconnection is successful, the transient LAN failure may slip by unnoticed. However, if the BRIDGE processes are not able to reconnect automatically, a message is sent to the userlog.

    When such a message is seen in the userlog, use the tmadmin reconnect command. The command takes two LMIDs for arguments, as follows:

    reconnect non-partitioned_lmid partitioned_lmid
    
    The reconnect command initiates a new connection between two BRIDGE processes.

    Severe LAN Failure

    If the LAN failure is one that is not going to correct itself in a short time, you will probably want to take the partitioned node out of the network. This can be done gracefully by using the -P option of tmshutdown (or shutdown in tmadmin).

    The objective is to shut down the bulletin board and application servers on the partitioned node, and clean up after them. Let's say that the partitioned remote node has the LMID MACH3. The shutdown command would look like this:

    
    tmshutdown -P MACH3
    

    BRIDGE Failure

    BRIDGE process failure is the easiest of all to deal with, because System/T takes care of it for you. If a BRIDGE process fails, it is automatically restarted, it reconnects automatically to the other nodes in the network, and new bulletin board information is downloaded to the partitioned node (that is, the node where the BRIDGE process failed).
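The recovery actions for the three failure types can be summarized as a dispatch table. This Python sketch is illustrative: recovery_actions is a hypothetical helper, tmadmin commands are shown as typed at the tmadmin prompt, tmshutdown as typed at the shell, and where the text does not say which node a command is run from, the node is given as None. For the transient LAN case it assumes the MASTER is the non-partitioned machine passed to reconnect.

```python
from __future__ import annotations


def recovery_actions(failure: str, failed: str, master: str, backup: str,
                     transient: bool = False) -> list[tuple[str | None, str]]:
    """Return (node, command) pairs; `failed` is the LMID of the partitioned node."""
    if failure == "node":
        if failed == master:
            # Take over on the BACKUP, then clean up the old MASTER's entries.
            return [(backup, "master"), (backup, f"pclean {failed}")]
        return [(master, f"pclean {failed}")]
    if failure == "lan":
        if transient:
            return [(None, f"reconnect {master} {failed}")]
        return [(None, f"tmshutdown -P {failed}")]
    if failure == "bridge":
        return []  # System/T restarts and reconnects the BRIDGE automatically
    raise ValueError(f"unknown failure type: {failure}")


print(recovery_actions("node", "SITE1", master="SITE1", backup="SITE2"))
# [('SITE2', 'master'), ('SITE2', 'pclean SITE1')]
```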

    Recovery Considerations

    The BEA TUXEDO System requires a certain level of environmental stability to provide optimum functionality. Although the BEA TUXEDO administrative subsystem offers unparalleled capabilities of recovering from network, machine, and application process failures, it is not invulnerable. You should be aware of the following ways in which a BEA TUXEDO system works:

    • Application clients and servers that use the FASTPATH model of SYSTEM_ACCESS (the default) have direct memory access to the BEA TUXEDO shared data structures. Using the FASTPATH model helps ensure that BEA TUXEDO achieves its outstanding performance.
    • BEA TUXEDO uses the IPC (InterProcess Communication) and file system facilities provided by the operating system.

    If an application accidentally uses these facilities to write into the BEA TUXEDO shared memory or to a BEA TUXEDO file descriptor, or if it mistakenly uses any other BEA TUXEDO system resource, data may become corrupted, BEA TUXEDO functionality may be compromised, or an application may be brought down.

    It is inappropriate for a user or administrator to directly terminate application clients, application servers, or BEA TUXEDO administrative processes because these processes may be executing within a critical section (that is, updating shared information in shared memory). Interrupting a critical section during a memory update could potentially cause inconsistent internal data structures. (This is characteristic not only of BEA TUXEDO, but of any system that uses shared data.) Error messages in the BEA TUXEDO userlog that refer to locks or semaphores may indicate that such corruption has occurred.

    For maximum application availability, you can take advantage of BEA TUXEDO's facilities for managing redundancy, such as its multiple server, machine, and domain facilities. Distributing an application's functionality allows continued operation if a failure occurs in one area.