Solaris Modular Debugger Guide

Chapter 3 Language Syntax

This chapter describes the MDB language syntax, operators, and rules for command and symbol name resolution.

Syntax

The debugger processes commands from standard input. If standard input is a terminal, MDB provides terminal editing capabilities. MDB can also process commands from macro files and from dcmd pipelines, described below. The language syntax is designed around the concept of computing the value of an expression (typically a memory address in the target), and applying a dcmd to that address. The current address location is referred to as dot, and " . " is used to reference its value.

A metacharacter is one of the following characters:

[ ] | ! / \ ? = > $ : ; NEWLINE SPACE TAB

A blank is a TAB or a SPACE. A word is a sequence of characters separated by one or more non-quoted metacharacters. Some of the metacharacters function only as delimiters in certain contexts, as described below. An identifier is a sequence of letters, digits, underscores, periods, or back quotes beginning with a letter, underscore, or period. Identifiers are used as the names of symbols, variables, dcmds, and walkers. Commands are delimited by a NEWLINE or semicolon ( ; ).

A dcmd is denoted by one of the following words or metacharacters:

/ \ ? = > $character :character ::identifier

dcmds named by metacharacters or prefixed by a single $ or : are provided as built-in operators, and implement complete compatibility with the command set of the legacy adb(1) utility. After a dcmd has been parsed, the /, \, ?, =, >, $, and : characters are no longer recognized as metacharacters until the termination of the argument list.

A simple-command is a dcmd followed by a sequence of zero or more blank-separated words. The words are passed as arguments to the invoked dcmd, except as specified under "Arithmetic Expansion" and "Quoting". Each dcmd returns an exit status that indicates whether it succeeded, failed, or was invoked with invalid arguments.

A pipeline is a sequence of one or more simple commands separated by |. Unlike the shell, dcmds in MDB pipelines are not executed as separate processes. After the pipeline has been parsed, each dcmd is invoked in order from left to right. Each dcmd's output is processed and stored as described in "dcmd Pipelines". After the left-hand dcmd is complete, its processed output is used as input for the next dcmd in the pipeline. If any dcmd does not return a successful exit status, the pipeline is aborted.

An expression is a sequence of words that is evaluated to compute a 64-bit unsigned integer value. The words are evaluated using the rules described in "Arithmetic Expansion".

Commands

A command is one of the following:

pipeline [ ! word ... ] [ ; ]

A simple-command or pipeline can be optionally suffixed with the ! character, indicating that the debugger should open a pipe(2) and send the standard output of the last dcmd in the MDB pipeline to an external process created by executing $SHELL -c followed by the string formed by concatenating the words after the ! character. For more details, refer to "Shell Escapes".

expression pipeline [ ! word ... ] [ ; ]

A simple-command or pipeline can be prefixed with an expression. Before execution of the pipeline, the value of dot (the variable denoted by " . ") is set to the value of the expression.

expression , expression pipeline [ ! word ... ] [ ; ]

A simple-command or pipeline can be prefixed with two expressions. The first is evaluated to determine the new value of dot, and the second is evaluated to determine a repeat count for the first dcmd in the pipeline. This dcmd will be executed count times before the next dcmd in the pipeline is executed. The repeat count applies only to the first dcmd in the pipeline.

, expression pipeline [ ! word ... ] [ ; ]

If the initial expression is omitted, dot is not modified; however, the first dcmd in the pipeline will be repeated according to the value of the expression.

expression [ ! word ... ] [ ; ]

A command can consist only of an arithmetic expression. The expression is evaluated and the dot variable is set to its value, then the previous dcmd and arguments are executed using the new value of dot.

expression , expression [ ! word ... ] [ ; ]

A command can consist only of a dot expression and repeat count expression. After dot is set to the value of the first expression, the previous dcmd and arguments are repeatedly executed the number of times specified by the value of the second expression.

, expression [ ! word ... ] [ ; ]

If the initial expression is omitted, dot is not modified but the previous dcmd and arguments are repeatedly executed the number of times specified by the value of the count expression.

! word ... [ ; ]

If the command begins with the ! character, no dcmds are executed and the debugger executes $SHELL -c followed by the string formed by concatenating the words after the ! character.

Comments

A word beginning with // causes that word and all the subsequent characters up to a NEWLINE to be ignored.

Arithmetic Expansion

Arithmetic expansion is performed when an MDB command is preceded by an optional expression representing a start address, or a start address and a repeat count. Arithmetic expansion can also be performed to compute a numerical argument for a dcmd. An arithmetic expression can appear in an argument list enclosed in square brackets preceded by a dollar sign ($[ expression ]), and will be replaced by the value of the expression.

Expressions can contain any of the following special words:

integer

The specified integer value. Integer values can be prefixed with 0i or 0I to indicate binary values, 0o or 0O to indicate octal values, 0t or 0T to indicate decimal values, and 0x or 0X to indicate hexadecimal values (the default).

0[tT][0-9]+.[0-9]+

The specified decimal floating point value, converted to its IEEE double-precision floating point representation

'cccccccc'

The integer value computed by converting each character to a byte equal to its ASCII value. Up to eight characters can be specified in a character constant. Characters are packed into the integer in reverse order (right-to-left), beginning at the least significant byte.

<identifier

The value of the variable named by identifier

identifier

The value of the symbol named by identifier

(expression)

The value of expression

.

The value of dot

&

The most recent value of dot used to execute a dcmd

+

The value of dot incremented by the current increment

^

The value of dot decremented by the current increment

The increment is a global variable that stores the total number of bytes read by the last formatting dcmd. For more information on the increment, refer to the discussion of "Formatting dcmds".
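The integer and character-constant rules above can be modeled in a few lines of C. This is an illustrative sketch, not MDB's lexer. The names parse_mdb_integer and pack_char_constant are hypothetical, and overflow simply wraps modulo 2^64. The packing order assumes that "right-to-left, beginning at the least significant byte" means the rightmost character ends up in the least significant byte, so 'ab' evaluates to 0x6162.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: convert an MDB-style integer literal to a 64-bit
 * value. 0i/0I selects binary, 0o/0O octal, 0t/0T decimal, 0x/0X hex;
 * unprefixed literals use hexadecimal, the default base.
 * Returns 0 on success, -1 on an empty literal or an invalid digit.
 */
static int
parse_mdb_integer(const char *s, uint64_t *out)
{
	unsigned base = 16;
	uint64_t val = 0;

	if (s[0] == '0' && s[1] != '\0') {
		switch (s[1]) {
		case 'i': case 'I': base = 2; s += 2; break;
		case 'o': case 'O': base = 8; s += 2; break;
		case 't': case 'T': base = 10; s += 2; break;
		case 'x': case 'X': base = 16; s += 2; break;
		default: break;	/* a plain hex literal such as 0f */
		}
	}
	if (*s == '\0')
		return (-1);	/* empty literal, or a prefix with no digits */

	for (; *s != '\0'; s++) {
		unsigned digit;

		if (*s >= '0' && *s <= '9')
			digit = (unsigned)(*s - '0');
		else if (*s >= 'a' && *s <= 'f')
			digit = (unsigned)(*s - 'a' + 10);
		else if (*s >= 'A' && *s <= 'F')
			digit = (unsigned)(*s - 'A' + 10);
		else
			return (-1);
		if (digit >= base)
			return (-1);	/* e.g. a 2 in a binary literal */
		val = val * base + digit;	/* wraps modulo 2^64 on overflow */
	}
	*out = val;
	return (0);
}

/*
 * Hypothetical helper: pack up to eight characters into a 64-bit value,
 * assuming the rightmost character lands in the least significant byte.
 * Returns -1 if more than eight characters are given.
 */
static int
pack_char_constant(const char *chars, uint64_t *out)
{
	size_t len = strlen(chars);
	uint64_t val = 0;

	if (len > 8)
		return (-1);
	for (size_t i = 0; i < len; i++)
		val = (val << 8) | (uint8_t)chars[i];
	*out = val;
	return (0);
}
```

Under these rules, 10 and 0t16 name the same value. The binary prefix is 0i rather than 0b because b is itself a hexadecimal digit.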

Unary Operators

Unary operators are right associative and have higher precedence than binary operators. The unary operators are:

#expression

Logical negation

~expression

Bitwise complement

-expression

Integer negation

%expression

Value of a pointer-sized quantity at the object file location corresponding to virtual address expression in the target's virtual address space

%/[csil]/expression

Value of a char-, short-, int-, or long-sized quantity at the object file location corresponding to virtual address expression in the target's virtual address space

%/[1248]/expression

Value of a one-, two-, four-, or eight-byte quantity at the object file location corresponding to virtual address expression in the target's virtual address space

*expression

Value of a pointer-sized quantity at virtual address expression in the target's virtual address space

*/[csil]/expression

Value of a char-, short-, int-, or long-sized quantity at virtual address expression in the target's virtual address space

*/[1248]/expression

Value of a one-, two-, four-, or eight-byte quantity at virtual address expression in the target's virtual address space
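The sized dereference operators can be understood as reads from the target's memory. In the following sketch, the target is a hypothetical byte buffer mapped at a base address (struct fake_target). The target is assumed to be little-endian, and an out-of-range read stands in for an unmapped address. The three arithmetic unary operators are included for completeness.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target: len bytes of memory mapped at virtual address base. */
struct fake_target {
	uint64_t base;
	const uint8_t *bytes;
	size_t len;
};

/*
 * Model of the sized dereference operator: read a 1-, 2-, 4-, or 8-byte
 * quantity at virtual address va, assuming a little-endian target.
 * Returns -1 for an unsupported size or an address outside the buffer.
 */
static int
read_sized(const struct fake_target *t, uint64_t va, unsigned size,
    uint64_t *out)
{
	uint64_t off, val = 0;

	if (size != 1 && size != 2 && size != 4 && size != 8)
		return (-1);
	if (va < t->base)
		return (-1);
	off = va - t->base;
	if (off > t->len || t->len - off < size)
		return (-1);
	for (unsigned i = 0; i < size; i++)
		val |= (uint64_t)t->bytes[off + i] << (8 * i);
	*out = val;
	return (0);
}

/* The arithmetic unary operators on a 64-bit unsigned value. */
static uint64_t op_lognot(uint64_t v) { return (v == 0); }	/* #expr */
static uint64_t op_compl(uint64_t v) { return (~v); }		/* ~expr */
static uint64_t op_neg(uint64_t v) { return (0 - v); }		/* -expr */
```

Because every value is a 64-bit unsigned integer, integer negation wraps. For example, -1 is 0xffffffffffffffff.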

Binary Operators

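Binary operators are left associative and have lower precedence than unary operators. The binary operators are listed below in order of precedence from highest to lowest. Operators within each of the following groups share the same precedence level: *, %, and #; + and -; << and >>; == and !=.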

*

Integer multiplication

%

Integer division

#

Left-hand side rounded up to next multiple of right-hand side

+

Integer addition

-

Integer subtraction

<<

Bitwise shift left

>>

Bitwise shift right

==

Logical equality

!=

Logical inequality

&

Bitwise AND

^

Bitwise exclusive OR

|

Bitwise inclusive OR
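The following C sketch evaluates the binary operators using precedence climbing. It assumes that *, %, and # share the highest level, followed by + and -, then << and >>, then == and !=, then &, ^, and |, and that every level is left associative. It accepts only unprefixed hexadecimal literals and parentheses, with no blanks, symbols, or unary operators. The function names are hypothetical. Shift counts of 64 or more yield 0 here. This is an assumption, because C leaves such shifts undefined and this section does not specify MDB's behavior.

```c
#include <ctype.h>
#include <stdint.h>

struct parser {		/* hypothetical evaluator state */
	const char *p;	/* next unread character */
	int err;	/* syntax error or division by zero */
};

static uint64_t parse_expr(struct parser *ps, int min_prec);

/* Assumed precedence of the operator at ps->p (7 binds tightest), or 0. */
static int
peek_op(const struct parser *ps, char *op, int *len)
{
	const char *s = ps->p;
	*op = s[0];	/* '<' means <<, '>' means >>, '=' means ==, '!' means != */
	*len = 1;
	switch (s[0]) {
	case '*': case '%': case '#': return (7);
	case '+': case '-': return (6);
	case '<': case '>': *len = 2; return (s[1] == s[0] ? 5 : 0);
	case '=': case '!': *len = 2; return (s[1] == '=' ? 4 : 0);
	case '&': return (3);
	case '^': return (2);
	case '|': return (1);
	default: return (0);
	}
}

static uint64_t
apply(struct parser *ps, char op, uint64_t a, uint64_t b)
{
	if ((op == '%' || op == '#') && b == 0) {
		ps->err = 1;	/* division by zero */
		return (0);
	}
	switch (op) {
	case '*': return (a * b);
	case '%': return (a / b);			/* integer division */
	case '#': return ((a + b - 1) / b * b);	/* round a up to a multiple of b */
	case '+': return (a + b);
	case '-': return (a - b);
	case '<': return (b >= 64 ? 0 : a << b);	/* over-wide shifts give 0 here */
	case '>': return (b >= 64 ? 0 : a >> b);
	case '=': return (a == b);
	case '!': return (a != b);
	case '&': return (a & b);
	case '^': return (a ^ b);
	default: return (a | b);
	}
}

static uint64_t
parse_primary(struct parser *ps)
{
	uint64_t v = 0;
	if (*ps->p == '(') {
		ps->p++;
		v = parse_expr(ps, 1);
		if (*ps->p == ')')
			ps->p++;
		else
			ps->err = 1;
		return (v);
	}
	if (!isxdigit((unsigned char)*ps->p))
		ps->err = 1;
	while (isxdigit((unsigned char)*ps->p)) {
		int c = tolower((unsigned char)*ps->p++);
		v = v * 16 + (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
	}
	return (v);
}

static uint64_t
parse_expr(struct parser *ps, int min_prec)
{
	uint64_t lhs = parse_primary(ps);
	char op = 0;
	int len = 0, prec;
	while (!ps->err && (prec = peek_op(ps, &op, &len)) != 0 && prec >= min_prec) {
		ps->p += len;
		/* Parsing the right side at prec + 1 keeps each level left associative. */
		lhs = apply(ps, op, lhs, parse_expr(ps, prec + 1));
	}
	return (lhs);
}

/* Evaluate a whole string; returns 0 on success, -1 on any error. */
static int
eval_binary(const char *s, uint64_t *out)
{
	struct parser ps = { s, 0 };
	uint64_t v = parse_expr(&ps, 1);
	if (ps.err || *ps.p != '\0')
		return (-1);
	*out = v;
	return (0);
}
```

With these assumptions, 1<<4+1 evaluates to 0x20 because + binds more tightly than <<. Similarly, a#4 rounds 0xa up to 0xc, and 8-2-1 is 5 because subtraction groups to the left.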

Quoting

Each metacharacter described above (see "Syntax") terminates a word unless quoted. Characters can be quoted (forcing MDB to interpret each character as itself without any special significance) by enclosing them in a pair of single (') or double (") quotation marks. A single quote cannot appear within single quotes. Inside double quotes, MDB recognizes the C programming language character escape sequences.

Shell Escapes

The ! character can be used to create a pipeline between an MDB command and the user's shell. If the $SHELL environment variable is set, MDB will fork and exec this program for shell escapes; otherwise /bin/sh is used. The shell is invoked with the -c option followed by a string formed by concatenating the words after the ! character.

The ! character takes precedence over all other metacharacters, except semicolon (;) and NEWLINE. After a shell escape is detected, the remaining characters up to the next semicolon or NEWLINE are passed "as is" to the shell. The output of shell commands cannot be piped to MDB dcmds. Commands executed by a shell escape have their output sent directly to the terminal, not to MDB.

Variables

A variable is a variable name, a corresponding integer value, and a set of attributes. A variable name is a sequence of letters, digits, underscores, or periods. A variable can be assigned a value using the > dcmd or ::typeset dcmd, and its attributes can be manipulated using the ::typeset dcmd. Each variable's value is represented as a 64-bit unsigned integer. A variable can have one or more of the following attributes: read-only (cannot be modified by the user), persistent (cannot be unset by the user), and tagged (user-defined indicator).

The following variables are defined as persistent:

0

Most recent value printed using the /, \, ?, or = dcmd.

9

Most recent count used with the $< dcmd

b

Virtual address of the base of the data section

d

Size of the data section in bytes

e

Virtual address of the entry point

m

Initial bytes (magic number) of the target's primary object file, or zero if no object file has been read yet

t

Size of the text section in bytes

In addition, the MDB kernel and process targets export the current values of the representative thread's register set as named variables. The names of these variables depend on the target's platform and instruction set architecture.

Symbol Name Resolution

As explained in "Syntax", a symbol identifier present in an expression context evaluates to the value of this symbol. The value typically denotes the virtual address of the storage associated with the symbol in the target's virtual address space. A target can support multiple symbol tables including, but not limited to,

The target typically searches the primary executable's symbol tables first, then one or more of the other symbol tables. Note that ELF symbol tables contain entries only for external, global, and static symbols; automatic symbols do not appear in the symbol tables processed by MDB.

Additionally, MDB provides a private user-defined symbol table that is searched prior to any of the target symbol tables. The private symbol table is initially empty, and can be manipulated using the ::nmadd and ::nmdel dcmds.

The ::nm -P option can be used to display the contents of the private symbol table. The private symbol table allows the user to create symbol definitions for program functions or data that were either missing from the original program or stripped out. These definitions are then used whenever MDB converts a symbolic name to an address, or an address to the nearest symbol.

Because targets contain multiple symbol tables, and each symbol table can include symbols from multiple object files, different symbols with the same name can exist. MDB uses the backquote " ` " character as a symbol-name scoping operator to allow the programmer to obtain the value of the desired symbol in this situation.

You can specify the scope used to resolve a symbol name in one of three forms: object`name, file`name, or object`file`name. The object identifier refers to the name of a load object. The file identifier refers to the basename of a source file that has a symbol of type STT_FILE in the specified object's symbol table. The interpretation of the object identifier depends on the target type.

The MDB kernel target expects object to specify the base name of a loaded kernel module. For example, the symbol name:

specfs`_init

evaluates to the value of the _init symbol in the specfs kernel module.
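Splitting a scoped name into its components is straightforward. Interpreting a two-part name is harder, because its qualifier can name either a load object or a source file. The sketch below performs only the split and leaves that interpretation to the target. split_scope is a hypothetical helper, and its fixed 64-byte fields are an assumption of the sketch.

```c
#include <string.h>

/*
 * Hypothetical helper: split a scoped name at its backquotes into at most
 * three components of fewer than 64 bytes each. Returns the number of
 * components, or -1 for an empty component, a component that is too long,
 * or more than three components.
 *   1: name                          (unscoped)
 *   2: object or file, then name     (the target decides which)
 *   3: object, file, name
 */
static int
split_scope(const char *s, char parts[3][64])
{
	int n = 0;

	for (;;) {
		const char *tick = strchr(s, '`');
		size_t len = (tick != NULL) ? (size_t)(tick - s) : strlen(s);

		if (n == 3 || len == 0 || len >= 64)
			return (-1);
		memcpy(parts[n], s, len);
		parts[n][len] = '\0';
		n++;
		if (tick == NULL)
			return (n);
		s = tick + 1;
	}
}
```

For the kernel example above, splitting specfs`_init yields two components, specfs and _init. The kernel target then treats the qualifier as a module name.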

The MDB process target expects object to specify the name of the executable or of a loaded shared library. It can take any of the following forms:

In the case of a naming conflict between symbols and hexadecimal integer values, MDB attempts to evaluate an ambiguous token as a symbol first, before evaluating it as an integer value. For example, the token f can refer either to the integer value 15, written in hexadecimal (the default base), or to a global variable named f in the target's symbol table. If a symbol with an ambiguous name is present, the integer value can be specified by using an explicit 0x or 0X prefix.
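The resolution order for an ambiguous token can be sketched as follows. The symbol table here is a hypothetical array of name/value pairs. A real target searches the private symbol table and its own symbol tables in the order described earlier in this section.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol table entry. */
struct sym {
	const char *name;
	uint64_t value;
};

/* Parse a non-empty string of hexadecimal digits; -1 on any other input. */
static int
parse_hex(const char *s, uint64_t *out)
{
	uint64_t v = 0;

	if (*s == '\0')
		return (-1);
	for (; *s != '\0'; s++) {
		int c = tolower((unsigned char)*s);

		if (!isxdigit(c))
			return (-1);
		v = v * 16 + (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
	}
	*out = v;
	return (0);
}

/*
 * Resolve a bare token: an explicit 0x/0X prefix always means an integer;
 * otherwise a symbol with the same name wins, and only then is the token
 * read as a hexadecimal integer. Returns -1 if it is neither.
 */
static int
resolve_token(const char *tok, const struct sym *tab, size_t n, uint64_t *out)
{
	if (tok[0] == '0' && (tok[1] == 'x' || tok[1] == 'X'))
		return (parse_hex(tok + 2, out));

	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].name, tok) == 0) {
			*out = tab[i].value;
			return (0);
		}
	}
	return (parse_hex(tok, out));
}
```

With a symbol f present, the token f resolves to the symbol's address. The token 0xf still yields 15.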

dcmd and Walker Name Resolution

As described earlier, each MDB dmod provides a set of dcmds and walkers. dcmds and walkers are tracked in two distinct, global namespaces. MDB also keeps track of a dcmd and walker namespace associated with each dmod. Identically named dcmds or walkers within a given dmod are not allowed: a dmod with this type of naming conflict will fail to load.

Name conflicts between dcmds or walkers from different dmods are allowed in the global namespace. In the case of a conflict, the first dcmd or walker with that particular name to be loaded is given precedence in the global namespace. Alternate definitions are kept in a list in load order.

The backquote character " ` " can be used in a dcmd or walker name as a scoping operator to select an alternate definition. For example, if dmods m1 and m2 each provide a dcmd d, and m1 is loaded prior to m2, then:

::d

Executes m1's definition of d

::m1`d

Executes m1's definition of d

::m2`d

Executes m2's definition of d

If module m1 were now unloaded, the next dcmd on the global definition list (m2`d) would be promoted to global visibility. The current definition of a dcmd or walker can be determined using the ::which dcmd, described below. The global definition list can be displayed using the ::which -v option.
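The load-order rule can be modeled with a single list of definitions. In this sketch, struct dcmd_def and lookup_dcmd are hypothetical. Unloading a dmod is simulated by clearing its loaded flag, which is enough to show how the next definition on the list is promoted.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one dcmd definition; the array is in load order. */
struct dcmd_def {
	const char *module;	/* dmod that provides the definition */
	const char *name;
	int loaded;		/* cleared when the dmod is unloaded */
};

/*
 * Resolve "name" or "module`name". An unscoped name selects the earliest
 * loaded definition, so unloading that dmod promotes the next one.
 * Returns NULL if no loaded definition matches.
 */
static const struct dcmd_def *
lookup_dcmd(const struct dcmd_def *defs, size_t n, const char *ident)
{
	const char *tick = strchr(ident, '`');
	const char *name = (tick != NULL) ? tick + 1 : ident;
	size_t modlen = (tick != NULL) ? (size_t)(tick - ident) : 0;

	for (size_t i = 0; i < n; i++) {
		if (!defs[i].loaded || strcmp(defs[i].name, name) != 0)
			continue;
		if (tick != NULL && (strlen(defs[i].module) != modlen ||
		    strncmp(defs[i].module, ident, modlen) != 0))
			continue;
		return (&defs[i]);
	}
	return (NULL);
}
```

Consider the m1/m2 example from the text. Here d resolves to m1's entry until m1 is marked unloaded, and after that it resolves to m2's entry.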

dcmd Pipelines

dcmds can be composed into a pipeline using the | operator. The purpose of a pipeline is to pass a list of values, typically virtual addresses, from one dcmd or walker to another. Pipeline stages might be used to map a pointer from one type of data structure to a pointer to a corresponding data structure, to sort a list of addresses, or to select the addresses of structures with certain properties.

MDB executes each dcmd in the pipeline in order from left to right. The left-most dcmd is executed using the current value of dot, or using the value specified by an explicit expression at the start of the command. When a | operator is encountered, MDB creates a pipe (a shared buffer) between the output of the dcmd to its left and the MDB parser, and associates an empty list of values with that pipe.

As the dcmd executes, its standard output is placed in the pipe and then consumed and evaluated by the parser, as if MDB were reading this data from standard input. Each line must consist of an arithmetic expression terminated by a NEWLINE or semicolon (;). The value of the expression is appended to the list of values associated with the pipe. If a syntax error is detected, the pipeline is aborted.

When the dcmd to the left of a | operator completes, the list of values associated with the pipe is then used to invoke the dcmd to the right of the | operator. For each value in the list, dot is set to this value and the right-hand dcmd is executed. Only the rightmost dcmd in the pipeline has its output printed to standard output. If any dcmd in the pipeline produces output to standard error, these messages are printed directly to standard error and are not processed as part of the pipeline.
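A single | stage can be sketched as follows. To keep the example short, run_pipe_stage (a hypothetical name) makes several assumptions that are not documented MDB behavior. It treats each line or semicolon-separated item of the left-hand output as a plain hexadecimal integer rather than an arbitrary expression. It also caps the list at 256 values and skips blank lines.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical right-hand dcmd: invoked once per value, with dot set. */
typedef void (*dcmd_fn)(uint64_t dot, void *arg);

/*
 * Evaluate the captured output of the left-hand dcmd into a list of
 * values, then run the right-hand dcmd for each one. Returns the number
 * of invocations, or -1 if a syntax error aborts the pipeline (in which
 * case the right-hand dcmd never runs).
 */
static int
run_pipe_stage(const char *left_output, dcmd_fn right, void *arg)
{
	uint64_t vals[256];	/* assumed capacity for the sketch */
	size_t n = 0;
	const char *p = left_output;

	while (*p != '\0') {
		uint64_t v = 0;
		int digits = 0;

		while (*p == ' ' || *p == '\t')
			p++;
		while (isxdigit((unsigned char)*p)) {
			int c = tolower((unsigned char)*p++);

			v = v * 16 + (uint64_t)(isdigit(c) ? c - '0' : c - 'a' + 10);
			digits++;
		}
		while (*p == ' ' || *p == '\t')
			p++;
		if (*p != '\n' && *p != ';' && *p != '\0')
			return (-1);	/* syntax error: abort the pipeline */
		if (*p != '\0')
			p++;		/* consume the NEWLINE or ';' */
		if (digits == 0)
			continue;	/* skip blank lines */
		if (n == sizeof (vals) / sizeof (vals[0]))
			return (-1);
		vals[n++] = v;
	}

	for (size_t i = 0; i < n; i++)
		right(vals[i], arg);	/* dot = vals[i] */
	return ((int)n);
}
```

The complete list is collected before the right-hand dcmd runs. As a result, a malformed line anywhere in the left-hand output prevents every right-hand invocation, not just the ones after it.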

Formatting dcmds

The /, \, ?, and = metacharacters are used to denote the special output formatting dcmds. Each of these dcmds accepts an argument list consisting of one or more format characters, repeat counts, or quoted strings. A format character is one of the ASCII characters shown in the table below.

Format characters are used to read and format data from the target. A repeat count is a positive integer preceding the format character that is always interpreted in base 10 (decimal). A repeat count can also be specified as an expression enclosed in square brackets preceded by a dollar sign ($[ ]). A string argument must be enclosed in double-quotes (" "). No blanks are necessary between format arguments.

The formatting dcmds are:

/

Display data from the target's virtual address space starting at the virtual address specified by dot.

\

Display data from the target's physical address space starting at the physical address specified by dot.

?

Display data from the target's primary object file starting at the object file location corresponding to the virtual address specified by dot.

=

Display the value of dot itself in each of the specified data formats. The = dcmd is therefore useful for converting between bases and performing arithmetic.

In addition to dot, MDB keeps track of another global value called the increment. The increment represents the distance between dot and the address following all the data read by the last formatting dcmd.

For example, if a formatting dcmd is executed with dot equal to address A, and displays a 4-byte integer, then after this dcmd completes, dot is still A, but the increment is set to 4. The + character (described in "Arithmetic Expansion") would now evaluate to the value A + 4, and could be used to reset dot to the address of the next data object for a subsequent dcmd.
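The bookkeeping for dot and the increment can be modeled with a small state structure. The names below are hypothetical. format_read takes the sizes of the objects that one formatting dcmd displayed and records their total without moving dot. The + and ^ tokens are then computed from that state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical debugger state. */
struct dbg_state {
	uint64_t dot;
	uint64_t incr;	/* bytes covered by the last formatting dcmd */
};

/*
 * Record a formatting dcmd that displayed n objects with the given sizes:
 * dot stays where it is and the increment becomes the total size.
 */
static void
format_read(struct dbg_state *s, const uint64_t *sizes, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += sizes[i];
	s->incr = total;
}

/* The + and ^ tokens from "Arithmetic Expansion". */
static uint64_t
plus_token(const struct dbg_state *s)
{
	return (s->dot + s->incr);
}

static uint64_t
caret_token(const struct dbg_state *s)
{
	return (s->dot - s->incr);
}
```

This mirrors the example in the text. With dot at 0x1000 and one 4-byte integer displayed, + evaluates to 0x1004 and ^ to 0xffc.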

Most format characters increase the value of the increment by the number of bytes corresponding to the size of the data format, shown in the table. The table of format characters can be displayed from within MDB using the ::formats dcmd. The format characters are:

+

Increment dot by the count (variable size)

-

Decrement dot by the count (variable size)

B

Hexadecimal int (1 byte)

C

Character using C character notation (1 byte)

D

Decimal signed int (4 bytes)

E

Decimal unsigned long long (8 bytes)

F

Double (8 bytes)

G

Octal unsigned long long (8 bytes)

H

Swap bytes and shorts (4 bytes)

I

Address and disassembled instruction (variable size)

J

Hexadecimal long long (8 bytes)

K

Hexadecimal uintptr_t (4 or 8 bytes)

O

Octal unsigned int (4 bytes)

P

Symbol (4 or 8 bytes)

Q

Octal signed int (4 bytes)

S

String using C string notation (variable size)

U

Decimal unsigned int (4 bytes)

V

Decimal unsigned int (1 byte)

W

Default radix unsigned int (4 bytes)

X

Hexadecimal int (4 bytes)

Y

Decoded time32_t (4 bytes)

Z

Hexadecimal long long (8 bytes)

^

Decrement dot by increment * count (variable size)

a

Dot as symbol+offset

b

Octal unsigned int (1 byte)

c

Character (1 byte)

d

Decimal signed short (2 bytes)

e

Decimal signed long long (8 bytes)

f

Float (4 bytes)

g

Octal signed long long (8 bytes)

h

Swap bytes (2 bytes)

i

Disassembled instruction (variable size)

n

Newline

o

Octal unsigned short (2 bytes)

p

Symbol (4 or 8 bytes)

q

Octal signed short (2 bytes)

r

Whitespace

s

Raw string (variable size)

t

Horizontal tab

u

Decimal unsigned short (2 bytes)

v

Decimal signed int (1 byte)

w

Default radix unsigned short (2 bytes)

x

Hexadecimal short (2 bytes)

y

Decoded time64_t (8 bytes)

The /, \, and ? formatting dcmds can also be used to write to the target's virtual address space, physical address space, or object file by specifying one of the following modifiers as the first format character, and then specifying a list of words that are either immediate values or expressions enclosed in square brackets preceded by a dollar sign ($[ ]).

The write modifiers are:

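v

Write the lowest byte of the value of each expression to the target beginning at the location specified by dot

w

Write the lowest 2 bytes of the value of each expression to the target beginning at the location specified by dot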

W

Write the lowest 4 bytes of the value of each expression to the target beginning at the location specified by dot

Z

Write the complete 8 bytes of the value of each expression to the target beginning at the location specified by dot
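A write with w, W, or Z can be sketched as copying the low-order bytes of each value into target memory. This sketch assumes a little-endian target modeled as a byte buffer. It also assumes that successive values are written back to back, starting at dot. write_values is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write the low size bytes (2 for w, 4 for W, 8 for Z) of each value into
 * a hypothetical little-endian memory buffer, starting at offset dot and
 * placing successive values back to back. Returns -1 if a value would run
 * past the end of the buffer; values written before that point remain.
 */
static int
write_values(uint8_t *mem, size_t memlen, size_t dot, unsigned size,
    const uint64_t *vals, size_t nvals)
{
	for (size_t i = 0; i < nvals; i++) {
		if (dot > memlen || memlen - dot < size)
			return (-1);
		for (unsigned b = 0; b < size; b++)
			mem[dot + b] = (uint8_t)(vals[i] >> (8 * b));
		dot += size;
	}
	return (0);
}
```

Truncation is silent. For example, a 2-byte write of 0x12345678 stores only 0x5678.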

The /, \, and ? formatting dcmds can also be used to search for a particular integer value in the target's virtual address space, physical address space, and object file, respectively, by specifying one of the following modifiers as the first format character, then specifying a value and optional mask. The value and mask are each specified as either immediate values or expressions enclosed in square brackets preceded by a dollar sign.

If only a value is specified, MDB reads integers of the appropriate size and stops at the address containing the matching value. If a value V and mask M are specified, MDB reads integers of the appropriate size and stops at the address containing a value X where (X & M) == V. At the completion of the dcmd, dot is updated to the address containing the match. If no match is found, dot is left at the last address that was read.

The search modifiers are:

l

Search for the specified 2-byte value

L

Search for the specified 4-byte value

M

Search for the specified 8-byte value

For both user and kernel targets, an address space is typically composed of a set of discontiguous segments. It is not legal to read from an address that does not have a corresponding segment. If a search reaches a segment boundary without finding a match, it aborts when the attempt to read past the end of the segment fails.
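The search loop, including the mask test and the stop at a segment boundary, can be sketched as follows. The sketch assumes a little-endian target and a segment modeled as a byte buffer indexed by offset. It also assumes that candidate values are read at successive size-byte steps from dot. search_segment is a hypothetical name. Passing a mask of all ones for the given size reproduces the value-only form.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read size-byte integers (2, 4, or 8) from offset dot onward and stop at
 * the first X with (X & mask) == value. Returns 1 and the match offset in
 * *where, or 0 and the last offset read when the segment ends first.
 */
static int
search_segment(const uint8_t *seg, size_t seglen, size_t dot, unsigned size,
    uint64_t value, uint64_t mask, size_t *where)
{
	size_t last = dot;

	for (size_t off = dot; off <= seglen && seglen - off >= size;
	    off += size) {
		uint64_t x = 0;

		for (unsigned b = 0; b < size; b++)
			x |= (uint64_t)seg[off + b] << (8 * b);
		last = off;
		if ((x & mask) == value) {
			*where = off;	/* dot moves to the match */
			return (1);
		}
	}
	*where = last;		/* the next read would cross the boundary */
	return (0);
}
```

The mask makes partial matches possible. For example, with mask 0xff00 and value 0xbe00, a 2-byte search stops at the first value whose high byte is 0xbe, regardless of its low byte.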