As the chapter title says, this chapter deals with how to use the
interactive monitor program,
tmadmin(1).
tmadmin
has almost 50 commands that fall into seven
categories. These are commands that:
- affect the Universal Device List and the transaction log
- start, shut down or modify part or all of the configuration
- let you monitor the configuration and its performance
- affect global transactions
- affect services offered
- affect servers and server groups
- help you use the other commands
In this chapter we will cover the seven categories of
tmadmin
commands in the way we hope will be most useful to the
System/T administrator.
The major sections of this chapter are:
- General syntax of tmadmin commands
- Device list and TLOG commands
- Booting and shutting down servers
- Monitoring the configuration
- Managing the configuration
- Migrating servers
- Handling a partitioned network
General Syntax of tmadmin Commands
tmadmin
is a command interpreter that provides for the inspection
and modification of a bulletin board and its associated
entities.
The command requires the
TUXCONFIG
environment variable to be set.
Only one
tmadmin
process at a time can be the administrator.
In normal operation
tmadmin
is invoked without options
by the System/T administrator on an active node;
however, there are exceptions.
If the application is active but partitioned when
tmadmin
is invoked, not all
tmadmin commands are available.
If the application is inactive when the command is
invoked, again not all
tmadmin commands are available.
tmadmin Command Line Options
There are three command line options available:
- -r
instructs the command to enter the bulletin board in
read-only mode.
This leaves the administrator slot open; the process
attaches to the bulletin board as a client.
- -c
indicates a desire to enter
tmadmin
in configuration mode.
This form of the command can be invoked on any node,
including inactive ones.
Without the -c option, when the application is
inactive tmadmin can be successfully invoked
only from the MASTER node.
There is more about configuration mode in the next
subsection.
- -v
causes tmadmin
to display the TUXEDO version number and license number.
After printing out the information, tmadmin exits.
If the -v option is entered with either of the
other two options, the others are ignored; only the information
requested by the -v option is displayed.
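The option rules above can be sketched as a small helper that builds the command line before a script launches tmadmin. This is only an illustration: the function name `tmadmin_argv` and its parameters are hypothetical, and the sketch assumes at most one of -r or -c is requested, since this section does not say whether the two may be combined.

```python
import os


def tmadmin_argv(option=None, version=False, env=None):
    """Build an argv list for tmadmin (hypothetical helper, not part of TUXEDO).

    option  -- None, "-r" (read-only) or "-c" (configuration mode)
    version -- True to request -v; per the text, -v overrides other options
    env     -- environment mapping to check; defaults to os.environ
    """
    env = os.environ if env is None else env
    # The text states that tmadmin requires TUXCONFIG to be set.
    if not env.get("TUXCONFIG"):
        raise RuntimeError("TUXCONFIG must be set before invoking tmadmin")
    if option not in (None, "-r", "-c"):
        raise ValueError("option must be None, '-r' or '-c'")
    # When -v is given with another option, only -v takes effect.
    if version:
        return ["tmadmin", "-v"]
    return ["tmadmin"] if option is None else ["tmadmin", option]
```

The resulting list could be handed to `subprocess.run` on a node where tmadmin is installed.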
Available Commands Matrix
The tmadmin commands available depend on the state of
the configuration, the type of node and the command line
option (if any).
Figure 1 summarizes this.
Fig. 1: Matrix showing tmadmin commands available
Command | Config State | Node Type | Commands Available
tmadmin -c | active/inactive | native | default, dumptlog, echo, help, quit, livtoc, crdl, lidl, dsdl, indl, paginate, verbose
tmadmin -r | active | native | all -c plus bbls, bbparms, bbstats, dump, dumptlog, printclient, printnet, printqueue, printserver, printservice, printtrans, printgroup, serverparms, serviceparms
tmadmin | active | native | all
tmadmin | inactive | master | all -c plus crlog, dslog, inlog, boot
tmadmin | inactive | non-master | <error>, function not allowed
tmadmin | partitioned | backup master | all, including master
tmadmin | partitioned | not backup master | read-only commands on local bulletin board only
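As a reading aid, the matrix in Figure 1 can be expressed as a lookup. The sketch below is not part of tmadmin. The function name and the string labels for states and node types are hypothetical, and it assumes that the "read-only commands" available on a partitioned machine that is not the backup master are the same set listed for tmadmin -r.

```python
# Command sets transcribed from Figure 1 (bbstats is the full name of bbs).
CONFIG_MODE = {
    "default", "dumptlog", "echo", "help", "quit", "livtoc",
    "crdl", "lidl", "dsdl", "indl", "paginate", "verbose",
}
READ_ONLY = CONFIG_MODE | {
    "bbls", "bbparms", "bbstats", "dump", "printclient", "printnet",
    "printqueue", "printserver", "printservice", "printtrans",
    "printgroup", "serverparms", "serviceparms",
}
BOOT_MODE = CONFIG_MODE | {"crlog", "dslog", "inlog", "boot"}
ALL = "all"  # marker meaning every tmadmin command is available


def available_commands(option, state, node):
    """Return a set of commands, ALL, or None for an error or unlisted row."""
    if option == "-c":
        return CONFIG_MODE
    if option == "-r":
        return READ_ONLY if state == "active" else None
    if state == "active":
        return ALL
    if state == "inactive":
        return BOOT_MODE if node == "master" else None
    if state == "partitioned":
        # Assumption: local read-only access matches the -r command set.
        return ALL if node == "backup master" else READ_ONLY
    return None
```

For example, `available_commands(None, "inactive", "non-master")` returns `None`, matching the "function not allowed" row.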
tmadmin Commands
Once tmadmin is invoked,
a greater-than sign (>) appears as a prompt
and the commands are available as shown in Figure 1.
The prompt may be preceded by a machine-id (see the
discussion of default that follows).
If the configuration is not active when the command is
invoked, the following message is displayed:
No bulletin board exists. Entering boot mode
>
Commands that Help You Use Other Commands
Figure 2 lists commands specifically designed to help you work
with the other tmadmin commands.
Most tmadmin commands
have an abbreviation.
In our tables and examples we always show the abbreviations.
In tables, such as Figure 2 below, the abbreviations are
enclosed in parentheses; in examples, we use the abbreviations
rather than the full command.
Fig. 2: tmadmin miscellaneous commands
Command(abbr) | Description
default(d) | set default values for arguments of other commands
dump(du) | dump current bulletin board into a file
echo(e) | echo input command lines
help(h) | print command list or command syntax
paginate(page) | pipe output of commands to a pager
quit(q) | terminate the session
verbose(v) | show output in verbose mode
!shlcmd | escape to shell; do shlcmd
!! | repeat previous shell command
<CR> | repeat last tmadmin command
Default (d)
The default command of tmadmin allows default
values to be set
for several frequently used parameters.
The default parameters can be used by most of the tmadmin
commands, but not for
boot or shutdown.
Those commands ignore the default settings.
For all other commands, however,
once defaults are set, they remain in effect until the session ends
or until reset to a different value.
Parameters other than Machine ID (-m) can be unset
by entering * as the value.
The command:
default -m DBBL
resets
-m
to its original state.
Entering the command default without options
produces a list of the current settings.
If no options are set the list looks like Figure 3
(comments have been added):
Fig. 3: default output
> d
Default Settings:
Group Name: (not set)
Server ID: (not set)
Machine ID: all
Queue Name: (not set)
Client Name: (not set)
Service Name: (not set)
User Name: (not set)
Blocks: 1000
Offset: 0
Path: /home/apps/bank/bankdl1
# Blocks, Offset and Path were picked up from
# the ENVFILE for the sample application.
# Path defaults to the value of FSCONFIG
>
In a multiprocessor environment,
Machine ID (-m) can be set to all,
to the DBBL, or to a specific processor.
If not set to a specific processor or to all, information
displayed is retrieved from the DBBL.
Once set to a specific processor, mid must explicitly
be set to DBBL to return to using only that bulletin
board.
If the Machine ID is set to a specific processor, information is
retrieved only from that processor.
The setting is displayed as part of the tmadmin prompt.
Figure 4 shows this.
Fig. 4: Prompt when mid is set
# 1. default mid not previously set
> d -m SITE1 # 2. set SITE1 as default mid
SITE1> # 3. prompt now shows default mid
Optional vs. Required Arguments
Most tmadmin commands require explicit information about
the resource on which the command is to act, although
required arguments
can often be set via the default command as well
as on the command line.
tmadmin
reports an error if required information
is not available from either source.
Some
tmadmin
statistical commands treat
unspecified default parameters as meaning all.
Verbose
tmadmin
commands that display information from the bulletin board
sometimes show different output when run with
verbose
mode on.
In the examples that follow we will
indicate whether
verbose
mode is on or off; where the display differs
a lot we will show both versions.
Monitoring the Configuration
tmadmin
provides the System/T administrator
with a window
into bulletin board operations.
It
gives the administrator a fast and easy way to do
the following:
- get information about the current configuration
- look at statistics that show the amount of processing activity
- dynamically reconfigure the application to serve the current needs of users
The first two items on the list are covered in this
section; dynamic reconfiguration is covered in the
section called
Managing the Configuration
later in this chapter.
To show the output produced by various tmadmin
commands the configuration file shown in Figure 5 was used.
This is the configuration file of an MP version of
bankapp.
Fig. 5: Configuration file for tmadmin examples
*RESOURCES
IPCKEY 80952
UID 4196
GID 601
PERM 0660
MAXACCESSERS 40
MAXSERVERS 35
MAXSERVICES 75
MAXCONV 10
MAXGTT 20
MASTER SITE1,SITE2
SCANUNIT 10
SANITYSCAN 12
BBLQUERY 180
BLOCKTIME 30
DBBLWAIT 6
OPTIONS LAN,MIGRATE
MODEL MP
LDBAL Y
#
*MACHINES
mchn1 LMID=SITE1
TUXDIR="/home/tuxroot"
APPDIR="/home/apps/bank"
ENVFILE="/home/apps/bank/ENVFILE"
TLOGDEVICE="/home/apps/bank/TLOG"
TLOGNAME=TLOG
TUXCONFIG="/home/apps/bank/tuxconfig"
TYPE="3B2"
ULOGPFX="/home/apps/bank/ULOG"
wgs386 LMID=SITE2
TUXDIR="/home2/tuxroot"
APPDIR="/home2/apps/bank"
ENVFILE="/home2/apps/bank/ENVFILE"
TLOGDEVICE="/home2/apps/bank/TLOG"
TLOGNAME=TLOG
TUXCONFIG="/home2/apps/bank/tuxconfig"
TYPE="386"
ULOGPFX="/home2/apps/bank/ULOG"
#
*GROUPS
DEFAULT: TMSNAME=TMS_SQL TMSCOUNT=2
# For NT/Netware, :bankdb: becomes ;bankdb;
BANKB1 LMID=SITE1 GRPNO=1
OPENINFO="TUXEDO/SQL:/home/apps/bank/bankdl1:bankdb:readwrite"
BANKB2 LMID=SITE2 GRPNO=2
OPENINFO="TUXEDO/SQL:/home2/apps/bank/bankdl2:bankdb:readwrite"
*NETWORK
SITE1 NADDR="0x00021112c00b6903"
BRIDGE="/dev/tcp"
NLSADDR="0x00021111c00b6903"
SITE2 NADDR="0x00021112c00b690c"
BRIDGE="/dev/tcp"
NLSADDR="0x00021111c00b690c"
*SERVERS
#
DEFAULT: RESTART=Y MAXGEN=5 REPLYQ=Y CLOPT="-A"
TLR SRVGRP=BANKB1 SRVID=1 RQADDR=tlr1 CLOPT="-A -- -T 100"
TLR SRVGRP=BANKB1 SRVID=2 RQADDR=tlr1 CLOPT="-A -- -T 200"
TLR SRVGRP=BANKB2 SRVID=3 RQADDR=tlr2 CLOPT="-A -- -T 600"
TLR SRVGRP=BANKB2 SRVID=4 RQADDR=tlr2 CLOPT="-A -- -T 700"
XFER SRVGRP=BANKB1 SRVID=5
XFER SRVGRP=BANKB2 SRVID=6
ACCT SRVGRP=BANKB1 SRVID=7
ACCT SRVGRP=BANKB2 SRVID=8
BAL SRVGRP=BANKB1 SRVID=9
BAL SRVGRP=BANKB2 SRVID=10
BTADD SRVGRP=BANKB1 SRVID=11
BTADD SRVGRP=BANKB2 SRVID=12
AUDITC SRVGRP=BANKB1 SRVID=13 CONV=Y MIN=1 MAX=10
BALC SRVGRP=BANKB1 SRVID=24
BALC SRVGRP=BANKB2 SRVID=25
#
*SERVICES
DEFAULT: LOAD=50 AUTOTRAN=N
WITHDRAWAL PRIO=50 ROUTING=ACCOUNT_ID
DEPOSIT PRIO=50 ROUTING=ACCOUNT_ID
TRANSFER PRIO=50 ROUTING=ACCOUNT_ID
INQUIRY PRIO=50 ROUTING=ACCOUNT_ID
CLOSE_ACCT PRIO=40 ROUTING=ACCOUNT_ID
OPEN_ACCT PRIO=40 ROUTING=BRANCH_ID
BR_ADD PRIO=20 ROUTING=BRANCH_ID
TLR_ADD PRIO=20 ROUTING=BRANCH_ID
ABAL PRIO=30 ROUTING=b_id
TBAL PRIO=30 ROUTING=b_id
ABAL_BID PRIO=30 ROUTING=b_id
TBAL_BID PRIO=30 ROUTING=b_id
ABALC_BID PRIO=30 ROUTING=b_id
TBALC_BID PRIO=30 ROUTING=b_id
*ROUTING
ACCOUNT_ID FIELD=ACCOUNT_ID
BUFTYPE="FML"
RANGES="10000-59999:BANKB1,
60000-109999:BANKB2,
*:*"
BRANCH_ID FIELD=BRANCH_ID
BUFTYPE="FML"
RANGES="1-5:BANKB1,
6-10:BANKB2,
*:*"
b_id FIELD=b_id
BUFTYPE="VIEW:aud"
RANGES="1-5:BANKB1,
6-10:BANKB2,
*:*"
Using Tmadmin to Display Parameters
The tmadmin commands that are used primarily
to produce information about configuration parameters are
shown in Figure 6.
Fig. 6: tmadmin parameter display commands
Command(abbr) | Description
bbparms(bbp) | print a summary of bulletin board parameters
bbsread(bbls) | list IPC resources on machine mid
serverparms(srp) | print parameters of the specified server
serviceparms(scp) | print parameters of the specified service
bbparms (bbp)
This command prints parameters from the
*RESOURCES section.
The display is shown in Figure 7.
Fig. 7: bbparms output
> bbparms
Bulletin Board Parameters:
MAXSERVERS: 35
MAXSERVICES: 75
MAXACCESSERS: 40
MAXGTT: 20
MAXCONV: 10
MAXBUFTYPE: 16
MAXBUFSTYPE: 32
IPCKEY: 35384
MASTER: SITE1,SITE2
MODEL: MP
LDBAL: Y
OPTIONS: LAN,MIGRATE
SCANUNIT: 10
SANITYSCAN: 12
DBBLWAIT: 6
BBLQUERY: 180
BLOCKTIME: 30
The display is the same with verbose mode on or off.
bbsread (bbls)
The bbsread command produces information about the
IPC resources on a local site.
The output is shown in Figure 8.
Fig. 8: bbsread output
SITE1> bbsread
IPC resources for the bulletin board on machine SITE1:
SHARED MEMORY: Key: 0x1013c38
SEGMENT 0:
ID: 15730
Size: 36924
Attached processes: 12
Last attach/detach by: 4181
This semaphore is the system semaphore
SEMAPHORE: Key: 0x1013c38
Id: 15666
| semaphore | current | last | # waiting |
| number | status | accesser | processes |
|----------------------------------------------|
| 0 | free | 4181 | 0 |
|------------|---------|-----------|-----------|
This semaphore set is part of the user-level semaphore
SEMAPHORE: Key: IPC_PRIVATE
Id: 11572
| semaphore | current | last | # waiting |
| number | status | accesser | processes |
|----------------------------------------------|
| 0 | locked | 4181 | 0 |
| 1 | locked | 4181 | 0 |
| 2 | locked | 4181 | 0 |
| 3 | locked | 4181 | 0 |
| 4 | locked | 4181 | 0 |
| 5 | locked | 4181 | 0 |
| 6 | locked | 4181 | 0 |
| 7 | locked | 4181 | 0 |
| 8 | locked | 4181 | 0 |
| 9 | locked | 4181 | 0 |
| 10 | locked | 4181 | 0 |
| 11 | locked | 4181 | 0 |
| 12 | locked | 4181 | 0 |
| 13 | locked | 4181 | 0 |
|------------|---------|-----------|-----------|
The display is the same with verbose mode on or off.
serverparms (srp)
The serverparms command produces the display shown in
Figure 9.
Again, the request is to display only information for SITE1.
Figure 9 shows just a sample of the output; a similar
report is produced for each server at SITE1.
Fig. 9: serverparms output
SITE1> srp -g BANKB1 -i 111
a.out Name: /home/apps/bank/TLR
Queue Name: tlr1
Server Options: RESTARTABLE
Max # Restarts: 5
Restart Command: (restartsrv)
Grace Period: 1 day
Group ID: 1
Server ID: 1
Machine ID: SITE1
The display is the same with verbose mode on or off.
serviceparms (scp)
The serviceparms command produces the display shown in Figure 10.
Fig. 10: serviceparms output
SITE1> scp -g BANKB1 -i 111 -s WITHDRAWAL
Service Name: WITHDRAWAL
Function Name: WITHDRAWAL
Load: 50
Priority: 50
Address: 0x2
The display is the same with verbose mode on or off.
Tmadmin Statistics
The tmadmin commands that display statistics are shown
in Figure 11.
Fig. 11: tmadmin statistics commands
Command(abbr) | Description
bbstats(bbs) | print a summary of the bulletin board's statistics
printclient(pclt) | print names and other information about active client processes
printgroup(pg) | print server group table information
printnet(pnw) | print count of messages in and out for specified machines; indicates if machine is partitioned
printqueue(pq) | print information for a specified queue or all queues
printserver(psr) | print information for a specified server or all servers
printservice(psc) | print information for a specified service or all services
shmstats(sstats) | Available in SHM mode only. Enable option for more exact statistics.
The format of the output of some statistics
commands is quite different depending on
whether verbose mode is off or on.
With verbose mode off, the output shows statistics
the TUXEDO System administrator can use in deciding whether some action
should be taken to reconfigure the system.
When the verbose mode is on, additional detail
is displayed.
Statistics are collected by bulletin board.
Setting the default mid to all
retrieves a current reading from each bulletin board.
Setting the default mid to a single processor
retrieves statistics from the bulletin board on that
machine.
(The display may list resources on
all machines, but the statistics are provided for the specified
machine only.)
If the default mid is set to DBBL (or if it has not
been set at all during the current tmadmin session),
statistics are retrieved from the distinguished bulletin board.
In the displays, a zero in a column means there is nothing to report;
a dash means the information is not being collected
in the present mode of execution.
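A script that post-processes these displays needs to keep the two cases apart. The following is a minimal sketch of one way to do that. The helper name `parse_stat` is hypothetical, and it assumes each cell holds either a dash or a whole number.

```python
def parse_stat(cell):
    """Convert one statistics cell from a tmadmin display.

    Returns None for a dash (not collected in the present mode) and an
    int otherwise, so that 0 still means "collected, nothing to report".
    """
    cell = cell.strip()
    if cell == "-":
        return None
    return int(cell)
```

Keeping `None` distinct from `0` prevents an uncollected statistic from being mistaken for an idle queue.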
bbstats (bbs)
bbstats prints a brief summary of
the number of servers, services, request queues and groups.
The output is shown in Figure 12.
Fig. 12: bbstats output
> bbs
Current Bulletin Board Status:
Current number of servers: 24
Current number of services: 47
Current number of request queues: 20
Current number of server groups: 2
The output of
bbstats is the same with verbose mode on or off,
and is the same whether the default mid is all
or any single processor.
shmstats (sstats)
When running in
SHM
mode (that is, when
MODEL SHM
is specified) the
shmstats
command can be used to assure more accurate statistics.
When an application is active for many hours (or days),
the statistics tend to get out of sync.
shmstats
can be used to specify exact recording (with the
ex
argument), or approximate recording (with the
app
argument).
If the command:
sstats ex
is entered, TUXEDO System locks the bulletin board briefly and
resets several counters.
If the command
shmstats
is entered without arguments, it reports on which method
is currently in force.
printclient(pclt)
The
printclient
command displays information for a selected group of active
clients.
The information is shown in Figure 13.
usrname
and
cltname
are from values provided in a
TPINIT
buffer when the client joins the application.
If the names are longer than 8 characters, they are
truncated from the right; a plus sign appears to the right
of a name that has been truncated.
The
tran info
columns show the number of transactions begun and ended
directly by the client.
Status can be one of the following values:
- IDLE
The client has joined the application
(tpinit(3c)) but has neither
outstanding service request handles nor active conversations.
- IDLET
The client is idle, as described above, and has initiated a
transaction (tpbegin(3c)).
- BUSY
The client has joined the system and has at least one outstanding service
request handle or one active conversation.
- BUSYT
The client is busy, as described above, and has initiated a transaction.
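The four status values above follow a simple rule, which can be sketched as below. This only illustrates the definitions as stated; the function name and parameters are hypothetical, and how tmadmin derives the status internally is not described here.

```python
def client_status(outstanding_requests, active_conversations, in_transaction):
    """Classify a client using the IDLE/IDLET/BUSY/BUSYT definitions."""
    # BUSY: at least one outstanding request handle or active conversation.
    busy = outstanding_requests > 0 or active_conversations > 0
    base = "BUSY" if busy else "IDLE"
    # A trailing T marks a client that has initiated a transaction.
    return base + "T" if in_transaction else base
```

For instance, a client with one outstanding request and no transaction is classified as BUSY.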
Fig. 13: printclient output, verbose mode off
all> pclt
LMID User Name Client Name Time Status Bgn/Cmmt/Abrt
--------------- --------------- --------------- -------- ------- -------------
SITE1 tuxedo tmadmin 0:03:44 IDLE 0/0/0
SITE1 0:00:05 BUSY 0/0/0
SITE1 0:00:05 BUSY 0/0/0
In
verbose
mode, additional information is included as shown in Figure 14.
Fig. 14: printclient output, verbose mode on
> v
Verbose now on
> pclt
LMID: SITE1
Reply queue address: 114421
User Name: tuxedo
Application Name: tmadmin
Time Connected: 0:04:41
Requests Outstanding/Made: 0/0
Conversations Active/Initiated: 0/0
Transactions Begun: 0
Transactions Committed: 0
Transactions Aborted: 0
Transactions Begun Per Hour: 0
Requests Made Per Hour: 0
Conversations Initiated Per Hour: 0
Status: IDLE
LMID: SITE1
User Name:
Application Name:
Time Connected: 0:01:19
Requests Outstanding/Made: 1/0
Conversations Active/Initiated: 0/0
Transactions Begun: 0
Transactions Committed: 0
Transactions Aborted: 0
Transactions Begun Per Hour: 0
Requests Made Per Hour: 0
Conversations Initiated Per Hour: 0
Status: BUSY
printgroup(pg)
The printgroup command prints information about
server groups.
The groups to report on can be specified either by -m machine
or -g groupname.
An error message is returned if machine is
all.
Output is shown in Figure 15.
Fig. 15: printgroup output
SITE1> pg -g BANKB1
Server group parameters:
Group Name: BANKB1
Group Number: 1
Group Options: RM
Primary Machine: SITE1
Current Machine: SITE1
The output is the same with verbose mode on or off.
printnet(pnw)
The printnet command can take a comma-separated list
of LMIDs.
If no list is provided, all BRIDGE processes are queried.
For each LMID, an indication is given if the machine is
partitioned.
If not partitioned, information is printed that shows the
other machines this one is connected to and the count of
messages in and out.
Since most System/T network traffic is between the DBBL
on the master and BBLs on
non-master machines,
at boot time only connections between the master (and
backup master) and non-master machines are brought up.
Connections from one non-master machine to another
are brought up when needed.
This is referred to as the "lazy connection" feature.
The printnet command shows only connections that
have been brought up.
The output of the printnet command is shown in Figure 16.
Fig. 16: printnet output
SITE1> printnet
SITE1 Connected To: msgs snd msgs rcv
wgs386 214 201
SITE2 Connected To: msgs snd msgs rcv
mchn1 201 214
printqueue (pq)
The printqueue command produces a display of information
about the activity on the queues.
Figure 17 shows the format with verbose mode off.
Fig. 17: printqueue output, mid set to all
all> pq
a.out Name Queue Name # Serv Wk Queued # Queued Ave. Len Machine
---------- ------------------ --------- -------- -------- -------
TMS_SQL BANKB2_TMS 2 - 0 - SITE2
BTADD 00002.00012 1 - 0 - SITE2
DBBL 80952 1 - 0 - SITE1
BRIDGE 33635384 1 - 0 - SITE2
ACCT 00002.00008 1 - 0 - SITE2
TLR tlr1 2 - 0 - SITE1
BAL 00001.00009 1 - 0 - SITE1
BAL 00002.00010 1 - 0 - SITE2
BBL 30002.00000 1 - 0 - SITE1
TMS_SQL BANKB1_TMS 2 - 0 - SITE1
BRIDGE 16858168 1 - 0 - SITE1
ACCT 00001.00007 1 - 0 - SITE1
BALC 00002.00025 1 - 0 - SITE2
BBL 30003.00000 1 - 0 - SITE2
TLR tlr2 2 - 0 - SITE2
BTADD 00001.00011 1 - 0 - SITE1
AUDITC 00001.00013 1 - 0 - SITE1
XFER 00001.00005 1 - 0 - SITE1
BALC 00001.00024 1 - 0 - SITE1
XFER 00002.00006 1 - 0 - SITE2
1 Queue Table Entry allocated for client processes.
dashes ( - ) indicate the information is not collected in the present mode.
zeroes ( 0 ) indicated information is collected but there is nothing to report.
Notice that most of the queue names in Figure 17
have been generated by the TUXEDO System software.
It defaults to "GRPNO.SRVID" where GRPNO is the number
of the server group associated with the server and SRVID
is the server identifier as specified in the configuration file.
Since the TLR servers participate in MSSQ sets,
only the TLR servers were assigned symbolic names via the
RQADDR parameter in the configuration file (see Figure 5).
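The default naming rule can be sketched in a few lines. The five-digit zero padding is inferred from the names shown in Figure 17 (for example 00002.00012), not from a stated specification, and both function names are hypothetical.

```python
import re


def default_queue_name(grpno, srvid):
    """Format a default queue name as GRPNO.SRVID, zero-padded as in Figure 17."""
    return f"{grpno:05d}.{srvid:05d}"


def parse_queue_name(name):
    """Return (grpno, srvid) for a generated name, or None for a symbolic one."""
    match = re.fullmatch(r"(\d{5})\.(\d{5})", name)
    if match is None:
        return None  # e.g. "tlr1", assigned via RQADDR
    return int(match.group(1)), int(match.group(2))
```

With these, the BTADD server in group 2 with SRVID 12 maps to the queue name 00002.00012, while tlr1 is recognized as a symbolic name.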
The application was idle when Figure 17 was produced.
If requests were backed up for any server,
the following information would be significant:
- # Queued
the number of service requests enqueued
If the example were of an SHM system, these columns would be
of interest:
- Wk Queued
the load currently queued for a server
- Ave. Len
the average length of the queue
In a single processor system (SHM),
when there is no work backed up in the queues the display shows
zeros in these columns.
(Also see the description of the
shmstats
command above.)
In the multiprocessor system (MP), statistics for the
enqueued load and average queue length are not available.
Where figures are available,
if they indicate a problem in the queue,
the administrator might boot more servers (assuming
more are available).
The output of the printqueue command is quite different
when verbose mode is on.
In that case, the pertinent information is whether
and how many times the server on the queue is restartable.
Figure 18 shows a sample using one queue name.
Fig. 18: printqueue, verbose mode on
all> v
Verbose now on.
all> pq 00002.00012
a.out Name: /home2/units/apps/bankapp/BTADD
Queue Name: 00002.00012
# Servers on Queue: 1
Server Options: RM, RESTARTABLE
Max # Restarts: 5
Restart Command: (restartsrv)
Grace Period: 1 day
Queue Type: USER
printserver (psr)
The printserver command provides information about the
work being done by the application's servers.
Fig. 19: printserver output, mid set to all
all> psr
Totals for all machines:
a.out Name Queue Name Grp Name ID RqDone Load Done Machine
---------- ---------- -------- -- ------ --------- -------
BBL 30003.00000 SITE2 0 49 2450 SITE2
BBL 30002.00000 SITE1 0 53 2650 SITE1
DBBL 80952 SITE1 0 460 23000 SITE1
TLR tlr1 BANKB1 1 55 2750 SITE1
BRIDGE 33635384 SITE2 1 0 0 SITE2
BRIDGE 16858168 SITE1 1 0 0 SITE1
TLR tlr1 BANKB1 2 45 2250 SITE1
TLR tlr2 BANKB2 3 49 2450 SITE2
TLR tlr2 BANKB2 4 51 2550 SITE2
XFER 00001.00005 BANKB1 5 0 0 SITE1
XFER 00002.00006 BANKB2 6 0 0 SITE2
ACCT 00001.00007 BANKB1 7 100 5000 SITE1
ACCT 00002.00008 BANKB2 8 100 5000 SITE2
BAL 00001.00009 BANKB1 9 0 0 SITE1
BAL 00002.00010 BANKB2 10 0 0 SITE2
BTADD 00001.00011 BANKB1 11 20 1000 SITE1
BTADD 00002.00012 BANKB2 12 20 1000 SITE2
AUDITC 00001.00013 BANKB1 13 0 0 SITE1
BALC 00001.00024 BANKB1 14 0 0 SITE1
BALC 00002.00025 BANKB2 15 0 0 SITE2
TMS_SQL BANKB2_TMS BANKB2 30001 0 0 SITE2
TMS_SQL BANKB1_TMS BANKB1 30001 0 0 SITE1
TMS_SQL BANKB2_TMS BANKB2 30002 120 6000 SITE2
TMS_SQL BANKB1_TMS BANKB1 30002 120 6000 SITE1
printserver output can be used to check on the load and
number of requests handled by each server.
This is different information from that available through the
printqueue command.
Here the Rq Done and Load Done figures are
cumulative from the time the system was booted.
Figure 19 shows what was done by each of the
four TLR servers.
For MSSQ sets such as these, the imbalance in the figures
of the MSSQ set at SITE1 might indicate a problem.
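One way to look for such an imbalance is to group the printserver rows by queue name and compare the request counts of servers sharing a queue. The sketch below assumes the seven whitespace-separated columns of Figure 19. The function names are hypothetical, and what spread counts as a problem is left to the administrator.

```python
from collections import defaultdict


def requests_by_queue(rows):
    """Map each queue name to the RqDone counts of the servers on it."""
    by_queue = defaultdict(list)
    for row in rows:
        # Columns as in Figure 19: a.out, queue, group, ID, RqDone, Load Done, machine
        _aout, queue, _group, _srvid, rq_done, _load_done, _machine = row.split()
        by_queue[queue].append(int(rq_done))
    return dict(by_queue)


def spread(counts):
    """Difference between the busiest and the least busy server on a queue."""
    return max(counts) - min(counts)


# The four TLR rows from Figure 19.
rows = [
    "TLR tlr1 BANKB1 1 55 2750 SITE1",
    "TLR tlr1 BANKB1 2 45 2250 SITE1",
    "TLR tlr2 BANKB2 3 49 2450 SITE2",
    "TLR tlr2 BANKB2 4 51 2550 SITE2",
]
for queue, counts in requests_by_queue(rows).items():
    if len(counts) > 1:  # only MSSQ sets have more than one server per queue
        print(queue, counts, "spread:", spread(counts))
# prints:
# tlr1 [55, 45] spread: 10
# tlr2 [49, 51] spread: 2
```

In this sample each Load Done figure equals RqDone multiplied by 50, which matches the LOAD=50 default in the SERVICES section of Figure 5.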
With verbose mode on and
the machine ID set to an individual processor, the
information is presented in the form shown in Figure 20.
Fig. 20: printserver output, mid set to a processor, verbose mode on
SITE1> psr -g BANKB1 -i 1
Group ID: BANKB1, Server ID: 1, Machine ID: SITE1
Process ID: 2133, Request Qaddr: 1508, Reply Qaddr: 1210
Server Type: USER
a.out Name: /home/apps/bank/TLR
Queue Name: tlr1
Options: RESTARTABLE
Max # Restarts: 5
Restart Command: (restartsrv)
Grace Period: 1 day
Generation: 1, Max message type: 1073741824
Creation time: Tue Oct 01 10:19:11 1991
Up time: 0:23:21
Requests done: 55
Load done: 2750
Current Status: ( IDLE )
If the server is restartable,
the Generation line in the above display can be checked to see
the number of times the server has been restarted.
The figure 1 in the display means the server has been
started once but has not been restarted.
Current Status may sometimes be reported as UNKNOWN.
This almost always means that status could not be determined because
the message queues were full.
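The reading of the Generation line described above can be sketched as a small parser. It assumes the restart count is the generation number minus one, which is consistent with the statement that a generation of 1 means the server has not been restarted. The function name is hypothetical.

```python
import re


def restart_count(line):
    """Return the restarts implied by a verbose printserver Generation line.

    Returns None when the line carries no Generation field.
    """
    match = re.search(r"Generation:\s*(\d+)", line)
    if match is None:
        return None
    # Generation 1 is the initial start, so restarts = generation - 1.
    return int(match.group(1)) - 1
```

Comparing the result with the Max # Restarts value shows how close a server is to its restart limit.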
printservice (psc)
The printservice command provides the additional detail
that shows the number of requests handled by
each service within a server.
With verbose mode off it appears as shown in Figure 21.
Fig. 21: printservice output, verbose off
all> psc
Totals for all machines:
Service Name Routine Name a.out Name Grp Name ID Machine # Done Status
------------ ------------ ---------- -------- -- ------- ------ ------
AUDITC AUDITC AUDITC BANKB1 1 SITE1 0 AVAIL
TRANSFER TRANSFER XFER BANKB2 201 SITE2 0 AVAIL
OPEN_ACCT OPEN_ACCT ACCT BANKB2 202 SITE2 6 AVAIL
CLOSE_ACCT CLOSE_ACCT ACCT BANKB2 202 SITE2 0 AVAIL
TBAL_BID TBAL_BID BAL BANKB2 203 SITE2 0 AVAIL
TBAL TBAL BAL BANKB2 203 SITE2 0 AVAIL
ABAL_BID ABAL_BID BAL BANKB2 203 SITE2 0 AVAIL
ABAL ABAL BAL BANKB2 203 SITE2 0 AVAIL
TLR_ADD TLR_ADD BTADD BANKB2 204 SITE2 3 AVAIL
BR_ADD BR_ADD BTADD BANKB2 204 SITE2 3 AVAIL
TBALC_BID TBALC_BID BALC BANKB2 205 SITE2 0 AVAIL
ABALC_BID ABALC_BID BALC BANKB2 205 SITE2 0 AVAIL
WITHDRAWAL WITHDRAWAL TLR BANKB2 211 SITE2 0 AVAIL
INQUIRY INQUIRY TLR BANKB2 211 SITE2 0 AVAIL
DEPOSIT DEPOSIT TLR BANKB2 211 SITE2 2 AVAIL
WITHDRAWAL WITHDRAWAL TLR BANKB2 212 SITE2 0 AVAIL
INQUIRY INQUIRY TLR BANKB2 212 SITE2 0 AVAIL
DEPOSIT DEPOSIT TLR BANKB2 212 SITE2 2 AVAIL
WITHDRAWAL WITHDRAWAL TLR BANKB2 213 SITE2 0 AVAIL
INQUIRY INQUIRY TLR BANKB2 213 SITE2 0 AVAIL
DEPOSIT DEPOSIT TLR BANKB2 213 SITE2 2 AVAIL
TMS TMS TMS_SQL BANKB2 30001 SITE2 1 AVAIL
TRANSFER TRANSFER XFER BANKB1 101 SITE1 1 AVAIL
TMS TMS TMS_SQL BANKB1 30001 SITE1 4 AVAIL
TMS TMS TMS_SQL BANKB2 30002 SITE2 10 AVAIL
OPEN_ACCT OPEN_ACCT ACCT BANKB1 102 SITE1 60 AVAIL
CLOSE_ACCT CLOSE_ACCT ACCT BANKB1 102 SITE1 0 AVAIL
TMS TMS TMS_SQL BANKB1 30002 SITE1 65 AVAIL
TBAL_BID TBAL_BID BAL BANKB1 103 SITE1 0 AVAIL
TBAL TBAL BAL BANKB1 103 SITE1 0 AVAIL
ABAL_BID ABAL_BID BAL BANKB1 103 SITE1 0 AVAIL
ABAL ABAL BAL BANKB1 103 SITE1 0 AVAIL
TLR_ADD TLR_ADD BTADD BANKB1 104 SITE1 3 AVAIL
BR_ADD BR_ADD BTADD BANKB1 104 SITE1 3 AVAIL
TBALC_BID TBALC_BID BALC BANKB1 105 SITE1 0 AVAIL
ABALC_BID ABALC_BID BALC BANKB1 105 SITE1 0 AVAIL
WITHDRAWAL WITHDRAWAL TLR BANKB1 111 SITE1 1 AVAIL
INQUIRY INQUIRY TLR BANKB1 111 SITE1 0 AVAIL
DEPOSIT DEPOSIT TLR BANKB1 111 SITE1 19 AVAIL
WITHDRAWAL WITHDRAWAL TLR BANKB1 112 SITE1 0 AVAIL
INQUIRY INQUIRY TLR BANKB1 112 SITE1 0 AVAIL
DEPOSIT DEPOSIT TLR BANKB1 112 SITE1 20 AVAIL
WITHDRAWAL WITHDRAWAL TLR BANKB1 113 SITE1 0 AVAIL
INQUIRY INQUIRY TLR BANKB1 113 SITE1 0 AVAIL
DEPOSIT DEPOSIT TLR BANKB1 113 SITE1 20 AVAIL
Summary of Statistics Commands
The examples of the displays from the
tmadmin(1) statistics commands are just a fraction
of those that can be produced.
Combinations of options can be used to narrow the information.
As the TUXEDO System administrator gains experience working with the
servers and services of an application, ways in which the
information can be used to tune the system will become
apparent.
Managing the Configuration
The tmadmin commands used to reconfigure
a running system and, when necessary, to manage transactions
are shown in Figure 22.
The commands that make changes in the parameters of the
configuration stay in effect only until the system
(or component) is shut down.
Permanent changes to the
TUXCONFIG
file can be made by entering
config
to invoke
tmconfig(1)
or by quitting
tmadmin
and invoking
tmconfig
directly.
Fig. 22: tmadmin commands for managing services and transactions
Command(abbr) | Description
aborttrans(abort) | notify the coordinator of a transaction, or a participant, to abort it
committrans(commit) | notify a participant of a decided transaction to commit heuristically
printtrans(pt) | print information from the global transaction table
advertise(adv) | add a service to the service table
unadvertise(unadv) | remove a service from the service table
suspend(susp) | remove a service from the list of those available. Server or queue identifiers can be used to broaden the scope.
resume(res) | return a suspended service to the list of those available. Server or queue identifiers can be used to broaden the scope.
changeload(chl) | change the load specified for a service
changepriority(chp) | change the priority specified for a service
changetrantime(chtt) | change the time limit specified for a service
config(conf) | make changes in TUXCONFIG
If errors are encountered by a reconfiguration command,
a message indicating the
error is displayed on the terminal.
More information is available in the central event log.
The boot and shutdown commands of
tmadmin(1)
could be added to the collection shown in Figure 22.
They are the same
as
tmboot(1) and
tmshutdown(1),
respectively.
They might be included in this discussion
because one way to change the services available is to start
more servers or shut some down,
but since they are covered in detail in the previous
chapter they are omitted here.
Most of the commands listed above
affect only the information in the bulletin
board structure.
They do not change the content of TUXCONFIG.
TUXCONFIG can be changed only by
tmadmin
config
or its shell counterpart,
tmconfig(1).
The details of using
tmconfig
can be found in the next chapter,
The Use of tmconfig(1)
and in the
tmconfig(1)
manual page in the
BEA TUXEDO Reference Manual: Section 1.
Managing Transactions
In normal System/T operations the software automatically
manages global transactions, and entries remain in the
Global Transaction Table
(GTT)
so briefly that
it is very difficult to capture a live one when you
enter a
printtrans
command.
However, if there has been some problem (the network,
or part of it, has failed;
the system has crashed;
or a server that participates in a global transaction
has gone down),
there may be entries left in the
GTT
that require administrator intervention to push through
to a reasonable conclusion (which generally means
aborting the transaction).
How to Detect a Problem
You may have a problem that requires intervention if:
- Transactions persist in the GTT.
As noted above, entries normally stay in the
GTT
for such a short time that you rarely get any output
when you enter the
printtrans
command.
- There is a logical disconnect between the
transaction status and the group status. For example,
the transaction state is
TMGDECIDED,
but a participating group shows a Group State of
TMGACTIVE.
- Database locks are being held until one participating
group finishes committing.
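The logical-disconnect symptom described above can be checked mechanically once the transaction and group states have been read from printtrans output. The sketch below covers only the single example given in the text (a TMGDECIDED transaction with a TMGACTIVE group). The function name is hypothetical, and other inconsistent combinations are not modelled.

```python
def has_state_disconnect(transaction_state, group_states):
    """Flag a decided transaction that still has an active participant.

    transaction_state -- state string for the transaction, e.g. "TMGDECIDED"
    group_states      -- iterable of state strings, one per participating group
    """
    return transaction_state == "TMGDECIDED" and "TMGACTIVE" in set(group_states)
```

A result of True suggests that the entry is worth inspecting, not that it must be aborted or committed.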
Valid Transaction States
As is shown in the
verbose
mode of the
printtrans
command,
the
GTT
shows a status for the transaction and a status for each
participating group.
The lists of valid status codes are shown in Figure 23 and
Figure 24.
Fig. 23: Valid transaction states
Transaction
-----------
TMGACTIVE       transaction is active, no errors have occurred
TMGABORTONLY    transaction can only be aborted, but abort has not yet been called
TMGABORTED      transaction can only be aborted; abort has been called
TMGCOMCALLED    commit has been called; 1st phase in progress
TMGREADY        transaction has completed 1st phase of commit
TMGDECIDED      transaction has been written to TLOG
Fig. 24: Valid group transaction states
Group
-----
TMGACTIVE|TMGNOTPART    Group is active but has not yet been called
TMGACTIVE               Group working on a transaction; no errors have occurred
TMGABORTED              work for this Group has been aborted
TMGREADONLY             work for this Group done in read-only mode
TMGREADY                work for this Group has been successfully pre-committed
TMGHCOMMIT              work for this Group has been heuristically committed
TMGHABORT               work for this Group has been heuristically aborted
TMGDONE                 work for this Group has been committed
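The status codes in Figures 23 and 24, together with the "logical disconnect" symptom described under How to Detect a Problem, can be sketched as a small consistency check. This is only an illustration: the function name is hypothetical, and the only mismatch it tests is the one example named in the text (transaction TMGDECIDED while a participating group still shows TMGACTIVE). Other combinations may also be suspicious, but the text does not list them.

```python
# Status codes copied from Figures 23 and 24.
TRANSACTION_STATES = {
    "TMGACTIVE", "TMGABORTONLY", "TMGABORTED",
    "TMGCOMCALLED", "TMGREADY", "TMGDECIDED",
}
GROUP_STATES = {
    "TMGACTIVE|TMGNOTPART", "TMGACTIVE", "TMGABORTED", "TMGREADONLY",
    "TMGREADY", "TMGHCOMMIT", "TMGHABORT", "TMGDONE",
}


def has_logical_disconnect(transaction_state, group_states):
    """Hypothetical helper: flag the mismatch named in the text.

    Returns True when the transaction is TMGDECIDED but at least one
    participating group still reports TMGACTIVE. Unknown status codes
    are rejected so that a typing mistake is not silently accepted.
    """
    if transaction_state not in TRANSACTION_STATES:
        raise ValueError(f"unknown transaction state: {transaction_state}")
    for state in group_states:
        if state not in GROUP_STATES:
            raise ValueError(f"unknown group state: {state}")
    return transaction_state == "TMGDECIDED" and "TMGACTIVE" in group_states
```

A True result does not prove that intervention is needed; it only marks an entry worth a closer look with printtrans.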
Clearing the GTT
When you have reason to believe that some intervention is
needed, here is a recommended procedure:
Enter a
printtrans
command in
verbose
mode
to see what is listed in the
GTT.
Enter a series of
aborttrans
commands or
committrans
commands,
one for each participating group.
printtrans(pt)
The printtrans command takes either a -m
machine or -g groupname argument.
The output differs slightly depending on
whether verbose mode is on or off.
The two forms of output are shown in Figure 25
and Figure 26.
Fig. 25: printtrans output, verbose mode on
all> pt
>> index=0 gtrid=x0 x259a633a xf2
: Machine id: SITE1, Transaction status: TMGACTIVE
Group count: 1, timeout: 30, time left: 39
Known participants:
group: BANKB2, status: TMGACTIVE, remote, coord
In Figure 25,
time left
is greater than
timeout
because the software makes sure that you have
at least
timeout
seconds, regardless of where in the
SCANUNIT
cycle the transaction began.
Fig. 26: printtrans output, verbose mode off
all> v
Verbose now off.
all> pt
>> index=0 gtrid=x0 x259a633a x19e
: Machine id: SITE1, Transaction status: TMGACTIVE
Group count: 1
If the output of the
printtrans
command leads to the conclusion that you need to proceed
with a series of
committrans
or
aborttrans
commands,
the most important parts of the ENTRY listing are the index
number and the names of the participating groups;
they will be needed as arguments for the
committrans
or
aborttrans
commands.
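If you capture the verbose printtrans listing in a file, the index number and group names can be picked out mechanically. The following is a minimal sketch that assumes the output looks exactly like Figure 25; the function name and the dictionary layout are hypothetical, and real output may contain fields this sketch ignores.

```python
import re

# Sample text copied from Figure 25 (verbose mode on).
SAMPLE = """\
>> index=0 gtrid=x0 x259a633a xf2
: Machine id: SITE1, Transaction status: TMGACTIVE
Group count: 1, timeout: 30, time left: 39
Known participants:
group: BANKB2, status: TMGACTIVE, remote, coord
"""


def parse_printtrans(text):
    """Return one dict per GTT entry: index, status, and (group, status) pairs."""
    entries = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        # A line starting with ">> index=N" opens a new entry.
        m = re.match(r">>\s*index=(\d+)", line)
        if m:
            current = {"index": int(m.group(1)), "status": None, "groups": []}
            entries.append(current)
            continue
        if current is None:
            continue  # skip prompts or banners before the first entry
        m = re.search(r"Transaction status:\s*(\S+)", line)
        if m:
            current["status"] = m.group(1)
            continue
        # Participant lines are lower-case "group:", unlike "Group count:".
        m = re.match(r"group:\s*([^,\s]+),\s*status:\s*([^,\s]+)", line)
        if m:
            current["groups"].append((m.group(1), m.group(2)))
    return entries
```

With verbose mode off (Figure 26) the participant lines are absent, so the groups list stays empty; that is one reason the recommended procedure uses verbose mode.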
aborttrans(abort)
The
aborttrans
command has only one required argument: the
tranindex,
which is the
index=
number from the
printtrans
command.
You can optionally specify the
groupname
with a
-g.
If you do specify the
groupname,
the operation applies only to
tranindex
at the specified group.
If you choose not to specify
groupname,
the coordinator of the global transaction is
requested to abort it, and all groups are aborted.
committrans(commit)
For
committrans,
both
grpname,
with a
-g
flag, and
tranindex
are required, and the command must be entered for all
participating groups before the operation is complete.
If the transaction is not in
TMGREADY
state for any participating group, the command fails.
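The difference between the two commands can be sketched as a pair of helpers that build the command lines to type at the tmadmin prompt. The function names are hypothetical, and the argument order (the -g option before the index) is an assumption; check the tmadmin(1) manual page before relying on it.

```python
def abort_commands(tranindex, groups=None):
    """Build aborttrans command lines.

    With no groups, a single command asks the coordinator to abort the
    whole transaction. With groups, one command is built per group.
    """
    if not groups:
        return [f"aborttrans {tranindex}"]
    return [f"aborttrans -g {group} {tranindex}" for group in groups]


def commit_commands(tranindex, groups):
    """Build committrans command lines, one per participating group.

    committrans requires a group name, so an empty list is an error.
    """
    if not groups:
        raise ValueError("committrans needs every participating group name")
    return [f"committrans -g {group} {tranindex}" for group in groups]
```

For the entry in Figure 25, commit_commands(0, ["BANKB2"]) yields one line; a transaction with three participating groups would yield three, and all three must be entered.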
A Final Note on Managing Transactions
We cannot emphasize too strongly that administrator
intervention for transactions is an extremely rare
occurrence.
The chances are that you will run your TUXEDO System application
for several years without having occasion to use the
aborttrans
and
committrans
commands.
Migrating Servers
There are two
tmadmin
commands that can be used to migrate servers.
Fig. 27: tmadmin migrate commands
Command(abbr) | Description
migrategroup(migg) | migrate servers in a group to their alternate location
migratemach(migm) | migrate servers by using LMIDs
A special case involves switching from the ACTING MASTER
to the ACTING BACKUP node.
We include a discussion of this topic in this section
because it has to do with moving the DBBL server.
The following commands are involved:
Fig. 28: tmadmin commands for switching the MASTER node
Command(abbr) | Description
shutdown(stop) | shutdown servers for migration
master(m) | switch MASTER to BACKUP or vice versa
pclean(pcl) | force a bbclean(bbc) then remove partitioned processes from a non-partitioned bulletin board
reconnect(rco) | make a new connection from a non-partitioned machine to a partitioned machine
migrategroup(migg)
The migrategroup command takes the name of a single
server group as an argument.
The groupname must be explicitly stated; it cannot
be provided via the default command.
Servers to be migrated must first be shut down with this
command:
stop -R -g groupname
If you prefer,
tmshutdown
can be used instead of
tmadmin shutdown(stop).
The server group being migrated must have an alternate
location specified in its LMID parameter.
Servers in the group must specify RESTART=Y
and the MIGRATE option must be specified in the
RESOURCES section.
The migrategroup command
migg groupname
boots the server group on the new machine.
If transactions are being logged for the servers involved
in a group migration, you may need to dump the TLOG, load
it and perform a warm start.
See ``TLOG Commands'' earlier in this chapter.
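The preconditions and the two-command sequence for a group migration can be summarized in a short sketch. The function and its parameters are hypothetical: it assumes you already know the group's LMID list (primary first, alternate second) and whether the RESTART and MIGRATE requirements are met, and it only builds the command lines; it does not run them.

```python
def migrategroup_plan(group, lmids, servers_restartable, migrate_option_set):
    """Return the tmadmin commands for migrating one server group.

    lmids               -- LMID values for the group (hypothetical layout:
                           primary first, alternate second)
    servers_restartable -- True if every server in the group has RESTART=Y
    migrate_option_set  -- True if MIGRATE is specified in RESOURCES
    """
    problems = []
    if len(lmids) < 2:
        problems.append("no alternate location in LMID")
    if not servers_restartable:
        problems.append("servers do not all specify RESTART=Y")
    if not migrate_option_set:
        problems.append("MIGRATE option missing from RESOURCES")
    if problems:
        raise ValueError(f"{group}: " + "; ".join(problems))
    # Shut down with -R so the server entries are retained, then migrate.
    return [f"stop -R -g {group}", f"migg {group}"]
```

Collecting all unmet preconditions before raising gives the administrator the full list in one pass rather than one failure at a time.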
migratemach(migm)
The migratemach command can take one LMID
as an argument.
The LMID names the
processor where the server group(s) have been running.
The alternate location must be the same for all server
groups on the LMID.
Servers on the LMID must specify RESTART=Y
and the MIGRATE option must be specified in the
RESOURCES section.
Servers to be migrated must first be shut down with this
command:
stop -R -l lmid
If you prefer,
tmshutdown
can be used instead of
tmadmin shutdown(stop).
The migratemach command
migm machine
boots all affected server groups on the new machine.
The ground rules for
migratemach
call for all server groups on
machine
to have the same alternate location.
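The ground rule that every server group on the machine must share one alternate location can be checked before you shut anything down. This is a sketch under assumptions: the mapping of group names to (primary, alternate) LMID pairs is a hypothetical structure you would fill in from your own configuration, and the function only builds command lines.

```python
def migratemach_plan(lmid, group_locations):
    """Return the tmadmin commands for migrating every group on one LMID.

    group_locations maps group name -> (primary LMID, alternate LMID or None).
    """
    # Collect the alternate location of each group whose primary is this LMID.
    alternates = {
        group: alternate
        for group, (primary, alternate) in group_locations.items()
        if primary == lmid
    }
    if not alternates:
        raise ValueError(f"no server groups run on {lmid}")
    distinct = set(alternates.values())
    if None in distinct:
        raise ValueError(f"a group on {lmid} has no alternate location")
    if len(distinct) != 1:
        raise ValueError(f"groups on {lmid} have different alternate locations")
    return [f"stop -R -l {lmid}", f"migm {lmid}"]
```

If the check fails because the alternates differ, migrating the groups one at a time with migrategroup remains available.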
Canceling a Migration
You can cancel a migration after the
stop -R
command has completed, but before going ahead with
the migrate command,
by using the
-cancel
option available with each command.
The command lines look like this:
> migg -cancel groupname
> migm -cancel lmid
The effect of the
-cancel
option is the same as shutting down without the -R
option; the server entries are deleted from the bulletin
board.
(The -R option retains the server names in
anticipation of a migrate command.)
Switching MASTER and BACKUP Nodes
This is a special case of server migration.
While it generally
comes into play when the network is partitioned,
there are also situations where the administrator needs to shut down the master
and should first migrate from the master to the backup node.
It is covered here, adjacent to the discussion of
``Handling a Partitioned Network,''
because it is also related to how the System/T administrator
might deal with such a condition.
If you recall, in the discussion in Chapter 4 of the
MASTER parameter in the *RESOURCES section of the
configuration file we described the
MASTER/BACKUP terminology.
Here is where that terminology is most helpful.
Use this procedure if it becomes
necessary to take over for a crashed MASTER:
- Step 1:>
Run tmadmin on the ACTING BACKUP node
- Step 2:>
Invoke the master command (it takes no arguments)
to become the ACTING MASTER.
- Step 3:>
Invoke the pclean command specifying the LMID of the
old ACTING MASTER.
This will remove bulletin board entries
for partitioned processes.
- Step 4:>
Once the crashed node is restored,
from the ACTING MASTER run boot -B lmid -l
lmid, where lmid is the machine id of
the old ACTING MASTER.
This effectively makes that node the ACTING BACKUP.
- Step 5:>
If you wish, you can now run tmadmin master
on the ACTING BACKUP to return the nodes to their
original roles.
It may be necessary for the administrator to shut down the ACTING MASTER
node.
Before doing so, the master and backup should be switched. In this case,
the
pclean
command should not be run (Step 3 above). Once the switch is complete,
the ACTING BACKUP can be shut down using tmshutdown.
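The two variants of the switch (taking over for a crashed MASTER, and a planned switch before shutting the MASTER down) differ only in whether pclean and the later boot are needed. The sketch below lists the tmadmin commands entered on the ACTING BACKUP node for each case; the function name and the "when" labels are hypothetical, and the final tmshutdown of the planned case is left out because the text gives no arguments for it.

```python
def master_switch_steps(old_master_lmid, master_crashed):
    """Return (when, tmadmin command) pairs for switching MASTER and BACKUP.

    Commands are entered in a tmadmin session on the ACTING BACKUP node.
    """
    steps = [("now", "master")]  # Step 2: become the ACTING MASTER
    if master_crashed:
        # Step 3: remove bulletin board entries for partitioned processes.
        steps.append(("now", f"pclean {old_master_lmid}"))
        # Step 4: bring the old master back as the ACTING BACKUP.
        steps.append(
            ("after the crashed node is restored",
             f"boot -B {old_master_lmid} -l {old_master_lmid}")
        )
    # Planned switch: pclean is deliberately skipped.
    return steps
```

Step 5, running master again to restore the original roles, is optional in both cases and so is not included.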
If MASTER is Disconnected from the Network
If the
MASTER
becomes disconnected from the network and will not be
reconnected for some time, you may want to migrate control
to the
BACKUP.
Do this before using the above procedure:
- Step 0:>
Shut down all System/T processes on
MASTER
and release all
IPC
resources.
Then go ahead with Step 1> above.
Handling a Partitioned Network
In System/T networks a partition is said to exist
when one or more remote nodes are not
accessible to processes on the MASTER node.
Partition of a System/T application network may
result from any of three failures:
Node failure
LAN failure
BRIDGE process failure
Detection and diagnosis of a partition is the
responsibility of the System/T administrator, who must take
appropriate action to recover.
In all three types of failure the symptom is the same:
the MASTER node has lost access to one or more remote nodes,
and those remote nodes cannot access the MASTER node.
In the sections that follow, we give you
a general idea of how you
might become aware that a partition has occurred,
some suggestions for determining what type of failure
caused the partition and finally,
specific steps to take to restore the network.
The first two topics have a range of possibilities
that depend to a large extent on your configuration
and your own methods of
administering the application.
The last topic, restoring the network,
is one where we can offer some detailed help.
How You Learn that Partition Has Occurred
Here are some of the ways in which the System/T administrator
might learn that the network is partitioned:
- userlog
When things go wrong with the network, System/T processes
begin sending messages to the userlog.
If you have set up your log under RFS
so that all messages to userlog go to one file, and
if you monitor that file regularly (perhaps with a window
on your terminal where you run tail -f on the
userlog), you will begin to see failure messages.
If RFS is using the same network, the remote file systems may no longer
be available.
- tmadmin
The printnet command of tmadmin tells you of
nodes that are partitioned.
The printservice command shows suspended services.
If services on a remote node all show status of
PARTITIONED,
you may have a partition problem.
- phone calls
Users of your application are quite likely to be the first
to realize that something is not working properly.
If your phone begins to ring and users begin to ask,
``How come I can't get the BUY service?''
you probably have a partition problem.
How to Find Out What Has Failed
The thing you are most likely to detect
at once is a failure of the MASTER node.
But assuming the MASTER node is still in business,
the first step in closing in on the problem is to learn the
extent of it.
Invoke tmadmin and see if the printnet and
printservice commands indicate that more than one
node is partitioned.
If more than one node is partitioned, it strongly suggests the
problem is a LAN failure; simultaneous failure of two or
more nodes (or two or more BRIDGE processes) is statistically
improbable.
Your next step might be to check out the LAN.
Your LAN probably has diagnostic tools that you can
use to check the viability of the network.
If the problem seems to be in a single node, the
difficulty of verifying a node failure
depends to a large extent on whether you have physical
access to the node.
If it is in the same building as the MASTER, it is not too
hard to go check it out to see if it is still running.
If it is still running, it again points to the likelihood
of the problem being in the LAN.
If the location is physically remote, you will have to rely
on other means to find out if the node has failed.
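The reasoning in this section can be condensed into a small decision sketch. It assumes you have already collected the list of partitioned nodes from printnet and printservice by hand; the function name and the wording of the results are hypothetical, and the outcome is a hint for where to look next, not a diagnosis.

```python
def likely_failure(partitioned_nodes, node_known_running=None):
    """Suggest the probable cause of a partition.

    partitioned_nodes  -- LMIDs reported as partitioned
    node_known_running -- for a single partitioned node: True if you have
                          confirmed it is up, False if confirmed down,
                          None if you cannot tell
    """
    count = len(set(partitioned_nodes))
    if count == 0:
        return "no partition reported"
    if count > 1:
        # Simultaneous failure of several nodes or BRIDGEs is improbable.
        return "LAN failure likely"
    if node_known_running is True:
        return "LAN failure likely"
    if node_known_running is False:
        return "node failure likely"
    return "undetermined: check the node and the LAN"
```

The third outcome matters for physically remote nodes, where the text notes you must rely on other means to learn whether the node has failed.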
Steps to Take to Recover
The way you recover from a partition differs depending on
the type of failure.
The preceding paragraphs suggested how you determine the
type of failure.
Node Failure
If the MASTER node has failed, log in to the BACKUP MASTER
and execute the tmadmin master command to make it the
ACTING MASTER.
Follow this by running pclean with the LMID of the
old MASTER to clean up bulletin board entries for processes
that were running on the failed node.
If the failure is in a remote node, run pclean from
the MASTER, specifying the LMID of the failed node.
Either of the above two steps has the effect of removing the
failed node from the application and continuing to run with
the remaining nodes of the configuration.
This may resolve your System/T problem;
however, you are left with a
hardware problem.
LAN Failure
Your course of action with a LAN failure depends on the
severity of the failure.
Transient LAN Failure
A transient LAN failure is one that corrects itself within
minutes.
The LAN is back to a viable state, but BRIDGE processes may
be left unconnected.
After LAN failures of
very short duration, BRIDGE processes try to reconnect
themselves.
If the reconnection is successful, the transient LAN
failure may slip by unnoticed.
However, if the BRIDGE processes are not able to reconnect
automatically, a message is sent to the userlog.
When such a message is seen in the userlog,
use the tmadmin reconnect command.
The command takes two LMIDs for arguments, as follows:
reconnect non-partitioned lmid partitioned lmid
The reconnect command initiates a new connection
between two BRIDGE processes.
Severe LAN Failure
If the LAN failure is one that is not going to correct
itself in a short time, you will probably want to take the
partitioned node out of the network.
This can be done gracefully by using the -P option
of tmshutdown (or shutdown in tmadmin).
The objective is to shut down the bulletin board
and application servers on the partitioned node,
and clean up after them.
Let's say that the partitioned remote node has the LMID
MACH3.
The shutdown command would look like this:
tmshutdown -P MACH3
BRIDGE Failure
BRIDGE process failure is the easiest one of all to deal
with, because System/T takes care of it for you.
If a BRIDGE process fails, it is automatically restarted;
it reconnects automatically to other nodes in the network,
and new bulletin board information is downloaded to the
partitioned node (that is, the node where the BRIDGE
process failed).
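The recovery actions described above can be collected into one lookup, which may be handy as a checklist. This is a sketch only: the failure labels and the function name are hypothetical, and the function returns the command lines named in the text without running them. Note that tmshutdown is a shell command, while master, pclean and reconnect are entered inside tmadmin.

```python
def recovery_commands(failure, partitioned_lmid=None, connected_lmid=None):
    """Return the commands the text prescribes for each kind of failure.

    failure is one of the hypothetical labels "master_node", "remote_node",
    "transient_lan", "severe_lan" or "bridge".
    """
    if failure == "bridge":
        return []  # System/T restarts and reconnects the BRIDGE itself
    if partitioned_lmid is None:
        raise ValueError("the LMID of the partitioned node is required")
    if failure == "master_node":
        # Run in tmadmin on the BACKUP; partitioned_lmid is the old MASTER.
        return ["master", f"pclean {partitioned_lmid}"]
    if failure == "remote_node":
        # Run in tmadmin on the MASTER.
        return [f"pclean {partitioned_lmid}"]
    if failure == "transient_lan":
        if connected_lmid is None:
            raise ValueError("reconnect needs a non-partitioned LMID")
        return [f"reconnect {connected_lmid} {partitioned_lmid}"]
    if failure == "severe_lan":
        return [f"tmshutdown -P {partitioned_lmid}"]
    raise ValueError(f"unknown failure type: {failure}")
```

For the example in the text, recovery_commands("severe_lan", "MACH3") returns the single line tmshutdown -P MACH3.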
Recovery Considerations
The BEA TUXEDO System requires a certain level of
environmental stability to provide optimum functionality.
Although the BEA TUXEDO administrative subsystem offers
unparalleled capabilities of recovering from network, machine,
and application process failures, it is not invulnerable.
You should be aware of the following ways in which a
BEA TUXEDO system works:
-
Application clients and servers that use
the FASTPATH model of SYSTEM_ACCESS (the default)
have direct memory access to the BEA TUXEDO shared data structures.
Using the FASTPATH model helps ensure
that BEA TUXEDO achieves its outstanding performance.
-
BEA TUXEDO uses the IPC (InterProcess Communication)
and File System facilities provided by the operating system.
If an application accidentally uses these facilities to write into
the BEA TUXEDO shared memory or to a BEA TUXEDO file descriptor,
or if it mistakenly uses any other BEA TUXEDO system resource,
data may become corrupted,
BEA TUXEDO functionality may be compromised,
or an application may be brought down.
It is inappropriate for a user or administrator to directly
terminate application clients,
application servers, or BEA TUXEDO administrative processes
because these processes may be executing
within a critical section
(that is, updating shared information in shared memory).
Interrupting a critical section during a memory update
could potentially cause inconsistent internal data structures.
(This is characteristic not only of BEA TUXEDO,
but of any system that uses shared data.)
Error messages in the BEA TUXEDO userlog that refer to
locks or semaphores may indicate that such corruption has occurred.
For maximum application availability,
you can take advantage of BEA TUXEDO's facilities
for managing redundancy,
such as its multiple server, machine, and domain facilities.
Distributing an application's functionality
allows continued operation if a failure occurs in one area.