Lexical Conventions in Assembly Language - x86 Assembly Language Reference Manual

2.1 Lexical Conventions in Assembly Language

This section discusses the lexical conventions of the Oracle Solaris x86 assembly language.

2.1.1 Statements in Assembly Language

An x86 assembly language program consists of one or more files containing statements. A statement consists of tokens separated by whitespace and terminated by either a newline character (ASCII 0x0A) or a semicolon (;) (ASCII 0x3B). Whitespace consists of spaces (ASCII 0x20), tabs (ASCII 0x09), and form feeds (ASCII 0x0C) that are not contained in a string or comment. More than one statement can be placed on a single input line provided that each statement is terminated by a semicolon. A statement can consist of a comment. Empty statements, consisting only of whitespace, are allowed.

2.1.1.1 Comments in Assembly Language

You can add a comment to a statement. A comment can occupy a single line or span several lines.

To add a single-line comment, use the slash character (/) (ASCII 0x2F) followed by the text of the comment. The newline that terminates the statement also terminates the comment. To add a multi-line comment, place the comment text between braces ({ }) or between /* and */. For example:

Single-line comment:

1: / define numeric label "1"

Multi-line comment:


{
 jump to last numeric label "1" defined
 before this instruction
 (this reference is equivalent to label "one")
}
jmp   1b


/*
  jump to first numeric label "1" defined
  after this instruction
  (this reference is equivalent to label "two")
*/
jmp 1f

Note - In AVX512 instructions, the text within the braces {...} is regular syntax and not a comment.

2.1.1.2 Labels in Assembly Language

A label can be placed at the beginning of a statement. During assembly, the label is assigned the current value of the active location counter and serves as an instruction operand. There are two types of labels: symbolic and numeric.

Symbolic Labels in Assembly Language

A symbolic label consists of an identifier (or symbol) followed by a colon (:) (ASCII 0x3A). Symbolic labels must be defined only once. Symbolic labels have global scope and appear in the object file's symbol table.

Symbolic labels with identifiers beginning with a period (.) (ASCII 0x2E) are considered to have local scope and are not included in the object file's symbol table.

Numeric Labels in Assembly Language

A numeric label consists of an unsigned decimal int32 value followed by a colon (:). Numeric labels are used only for local reference and are not included in the object file's symbol table. Numeric labels have limited scope and can be redefined repeatedly.

When a numeric label is used as a reference (as an instruction operand, for example), the suffixes b ("backward") or f ("forward") should be added to the numeric label. For numeric label N, the reference Nb refers to the nearest label N defined before the reference, and the reference Nf refers to the nearest label N defined after the reference. The following example illustrates the use of numeric labels:

1:          / define numeric label "1"
one:        / define symbolic label "one"

/ ... assembler code ...

jmp   1f    / jump to first numeric label "1" defined
            / after this instruction
            / (this reference is equivalent to label "two")

jmp   1b    / jump to last numeric label "1" defined
            / before this instruction
            / (this reference is equivalent to label "one")

1:          / redefine label "1"
two:        / define symbolic label "two"

jmp   1b    / jump to last numeric label "1" defined
            / before this instruction
            / (this reference is equivalent to label "two")
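
The backward and forward lookup rule can be modeled outside the assembler. The following Python sketch is illustrative only and is not how the assembler is implemented. The function name resolve_numeric_ref and the list-of-lines program representation are hypothetical. The sketch also assumes one statement per line and ignores a label defined on the same line as the reference.

```python
import re
from typing import List, Optional

# A numeric label definition: decimal digits followed by a colon.
NUMERIC_LABEL = re.compile(r"^\s*(\d+):")


def resolve_numeric_ref(lines: List[str], ref_line: int, ref: str) -> Optional[int]:
    """Return the index of the line that defines the label named by `ref`.

    `ref` is a reference such as "1b" or "1f"; `ref_line` is the index of
    the line containing the reference. Returns None if no label matches.
    """
    number, direction = int(ref[:-1]), ref[-1]
    if direction == "b":
        # Nearest definition before the reference: scan upward.
        candidates = range(ref_line - 1, -1, -1)
    elif direction == "f":
        # Nearest definition after the reference: scan downward.
        candidates = range(ref_line + 1, len(lines))
    else:
        raise ValueError("reference must end in 'b' or 'f': " + ref)
    for i in candidates:
        m = NUMERIC_LABEL.match(lines[i])
        if m and int(m.group(1)) == number:
            return i
    return None


# The example above, reduced to one statement per line.
program = [
    "1:",      # index 0
    "one:",    # index 1
    "jmp 1f",  # index 2
    "jmp 1b",  # index 3
    "1:",      # index 4
    "two:",    # index 5
    "jmp 1b",  # index 6
]
```

With this model, resolve_numeric_ref(program, 2, "1f") selects the second definition of "1" (the one paired with "two"), and resolve_numeric_ref(program, 3, "1b") selects the first (the one paired with "one").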

2.1.2 Tokens in Assembly Language

There are five classes of tokens:

Identifiers (symbols)
Keywords
Numerical constants
String constants
Operators

2.1.2.1 Identifiers in Assembly Language

An identifier is an arbitrarily long sequence of letters and digits. The first character must be a letter; the underscore (_) (ASCII 0x5F) and the period (.) (ASCII 0x2E) are considered to be letters. Case is significant: uppercase and lowercase letters are different.
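
As an illustration, the identifier rule can be written as a regular expression. The Python sketch below is not part of the assembler. The helper name is_identifier is hypothetical, and the sketch assumes that "letters" means the ASCII letters A-Z and a-z. It does not check whether a name is a reserved keyword.

```python
import re

# First character: a letter, underscore, or period.
# Remaining characters: letters, digits, underscores, or periods.
IDENTIFIER = re.compile(r"[A-Za-z_.][A-Za-z0-9_.]*")


def is_identifier(text: str) -> bool:
    """Return True if `text` is lexically a valid identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```

Under these assumptions, "_start" and ".L1" are accepted, while "1abc" is rejected because it begins with a digit.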

2.1.2.2 Keywords in Assembly Language

Keywords such as x86 instruction mnemonics ("opcodes") and assembler directives are reserved for the assembler and should not be used as identifiers. See Instruction Set Mapping for a list of the Oracle Solaris x86 mnemonics. See Assembler Directives for the list of as assembler directives.

2.1.2.3 Numerical Constants in Assembly Language

Numbers in the x86 architecture can be integers or floating point. Integers can be signed or unsigned, with signed integers represented in two's complement representation. Floating-point numbers can be single-precision, double-precision, or double-extended-precision.

Integer Constants in Assembly Language

Integers can be expressed in several bases:

Decimal. Decimal integers begin with a non-zero digit followed by zero or more decimal digits (0-9).
Binary. Binary integers begin with "0b" or "0B" followed by zero or more binary digits (0, 1).
Octal. Octal integers begin with zero (0) followed by zero or more octal digits (0-7).
Hexadecimal. Hexadecimal integers begin with "0x" or "0X" followed by one or more hexadecimal digits (0-9, A-F). Hexadecimal digits can be either uppercase or lowercase.
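
The four prefixes above can be told apart mechanically. The following Python sketch classifies an integer constant and computes its value. It is a model of the rules as stated here, not the assembler's own code. The function name parse_integer is hypothetical, and treating a prefix with no digits (such as "0b") as the value 0 is an assumption.

```python
import re
from typing import Tuple

# (base name, pattern, radix), checked in this order so that "0x" and "0b"
# prefixes are tried before the plain leading-zero octal form.
_PATTERNS = [
    ("hexadecimal", re.compile(r"0[xX]([0-9A-Fa-f]+)"), 16),
    ("binary", re.compile(r"0[bB]([01]*)"), 2),
    ("octal", re.compile(r"0([0-7]*)"), 8),
    ("decimal", re.compile(r"([1-9][0-9]*)"), 10),
]


def parse_integer(text: str) -> Tuple[str, int]:
    """Return (base name, value) for an integer constant, or raise ValueError."""
    for name, pattern, radix in _PATTERNS:
        m = pattern.fullmatch(text)
        if m:
            digits = m.group(1)
            # Assumption: a prefix followed by no digits denotes zero.
            value = int(digits, radix) if digits else 0
            return name, value
    raise ValueError("not an integer constant: " + text)
```

Note that a lone "0" is classified as octal under these rules, and "09" is rejected because 9 is not an octal digit.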

Floating Point Constants in Assembly Language

Floating point constants have the following format:

Sign (optional) – Either plus (+) or minus (-)
Integer (optional) – Zero or more decimal digits (0-9)
Fraction (optional) – Decimal point (.) followed by zero or more decimal digits (0-9)
Exponent (optional) – The letter "e" or "E", followed by an optional sign (plus or minus), followed by one or more decimal digits (0-9)

A valid floating point constant must have either an integer part or a fractional part.
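
The format above maps directly onto a regular expression with four optional groups. The Python sketch below is illustrative only. The name is_float_constant is hypothetical. The sketch also assumes that the integer and fraction parts together must contain at least one digit, so a lone "." is rejected, which is slightly stricter than the wording above.

```python
import re

FLOAT = re.compile(
    r"(?P<sign>[+-])?"                  # optional sign
    r"(?P<integer>[0-9]*)"              # zero or more decimal digits
    r"(?P<fraction>\.[0-9]*)?"          # optional '.' and zero or more digits
    r"(?P<exponent>[eE][+-]?[0-9]+)?"   # optional exponent with one or more digits
)


def is_float_constant(text: str) -> bool:
    """Return True if `text` matches the floating point constant format."""
    m = FLOAT.fullmatch(text)
    if m is None:
        return False
    # Digits of the fraction, without the leading decimal point.
    fraction_digits = (m.group("fraction") or "")[1:]
    # Assumption: at least one digit must appear before the exponent.
    return bool(m.group("integer") or fraction_digits)
```

For example, "3.14", "-1e10", and "+.5E-3" are accepted, while "e5" is rejected because it has neither an integer part nor a fractional part.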

2.1.2.4 String Constants in Assembly Language

A string constant consists of a sequence of characters enclosed in double quotes (") (ASCII 0x22). To include a double-quote character ("), single-quote character ('), or backslash character (\) within a string, precede the character with a backslash (\) (ASCII 0x5C). A character can be expressed in a string as its ASCII value in octal preceded by a backslash (for example, the letter "J" could be expressed as "\112"). The assembler accepts the following escape sequences in strings:

Escape Sequence	Character Name	ASCII Value (hex)
`\n`	newline	0A
`\r`	carriage return	0D
`\b`	backspace	08
`\t`	horizontal tab	09
`\f`	form feed	0C
`\v`	vertical tab	0B
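
To make the escape rules concrete, the following Python sketch decodes the text between the double quotes of a string constant. It is a model of the rules in this section, not the assembler's implementation. The name decode_string_body is hypothetical. Accepting one to three octal digits and raising an error for any other escape are assumptions. A lone trailing backslash is left unchanged.

```python
import re

# Escape letter -> ASCII value, taken from the table above, plus the three
# characters that can be quoted with a backslash.
SIMPLE_ESCAPES = {
    "n": 0x0A, "r": 0x0D, "b": 0x08, "t": 0x09, "f": 0x0C, "v": 0x0B,
    '"': 0x22, "'": 0x27, "\\": 0x5C,
}

# A backslash followed by octal digits (assumed: one to three) or by any
# single character.
ESCAPE = re.compile(r"\\([0-7]{1,3}|.)", re.DOTALL)


def decode_string_body(body: str) -> str:
    """Expand the escape sequences in the body of a string constant."""
    def replace(match: "re.Match[str]") -> str:
        token = match.group(1)
        if token[0] in "01234567":
            return chr(int(token, 8))  # octal ASCII value
        if token in SIMPLE_ESCAPES:
            return chr(SIMPLE_ESCAPES[token])
        raise ValueError("unsupported escape sequence: \\" + token)

    return ESCAPE.sub(replace, body)
```

For example, decode_string_body(r"\112") yields the letter "J", matching the octal example above.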

2.1.2.5 Operators in Assembly Language

The assembler supports the following operators for use in expressions. Operators have no assigned precedence. Expressions can be grouped in square brackets ([]) to establish precedence.

+: Addition
-: Subtraction
\*: Multiplication
\/: Division
&: Bitwise logical AND
|: Bitwise logical OR
>>: Shift right
<<: Shift left
\%: Remainder
!: Bitwise logical AND NOT
^: Bitwise logical XOR

Note - The asterisk (*), slash (/), and percent sign (%) characters are overloaded. When used as operators in an expression, these characters must be preceded by the backslash character (\).
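
The effect of having no operator precedence can be seen in a small model. The Python sketch below is illustrative only and rests on several assumptions that this section does not state: operators are applied strictly left to right, "!" is a binary operator computing a AND (NOT b), division is integer division, and operands are plain decimal integers. The names tokenize and evaluate are hypothetical.

```python
import re
from typing import List

# Escaped operators (\*, \/, \%), shifts, single-character operators,
# square brackets, and decimal integer operands.
TOKEN = re.compile(r"\s*(\\[*/%]|<<|>>|[-+&|!^\[\]]|0|[1-9][0-9]*)")

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "\\*": lambda a, b: a * b,
    "\\/": lambda a, b: a // b,   # assumption: integer division
    "\\%": lambda a, b: a % b,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    ">>": lambda a, b: a >> b,
    "<<": lambda a, b: a << b,
    "!": lambda a, b: a & ~b,     # assumption: binary AND NOT
    "^": lambda a, b: a ^ b,
}


def tokenize(expr: str) -> List[str]:
    """Split an expression into tokens; unescaped *, /, % are rejected."""
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def evaluate(expr: str) -> int:
    """Evaluate left to right; square brackets group subexpressions."""
    tokens = tokenize(expr)
    pos = 0

    def operand() -> int:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("missing operand")
        tok = tokens[pos]
        pos += 1
        if tok == "[":
            value = sequence()
            if pos >= len(tokens) or tokens[pos] != "]":
                raise ValueError("missing closing bracket")
            pos += 1
            return value
        return int(tok)  # raises ValueError if tok is not a number

    def sequence() -> int:
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] in OPS:
            op = OPS[tokens[pos]]
            pos += 1
            value = op(value, operand())
        return value

    result = sequence()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result
```

Under the left-to-right assumption, evaluate(r"2 + 3 \* 4") gives 20, whereas evaluate(r"2 + [3 \* 4]") gives 14, which shows why square brackets are needed to force a particular grouping.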