Oracle Solaris Studio 12.3: C User's Guide     Oracle Solaris Studio 12.3 Information Library

6.5 Tokenization and Preprocessing

Probably the least specified part of previous versions of C concerned the operations that transformed each source file from a bunch of characters into a sequence of tokens, ready to parse. These operations included recognition of white space (including comments), bundling consecutive characters into tokens, handling preprocessing directive lines, and macro replacement. However, their respective ordering was never guaranteed.

6.5.1 ISO C Translation Phases

The order of these translation phases is specified by ISO C.

Every trigraph sequence in the source file is replaced. ISO C has exactly nine trigraph sequences that were invented solely as a concession to deficient character sets. They are three-character sequences that name a character not in the ISO 646-1983 character set:

Table 6-1 Trigraph Sequences

Trigraph Sequence    Converts to
??=                  #
??-                  ~
??(                  [
??)                  ]
??!                  |
??<                  {
??>                  }
??/                  \
??'                  ^

These sequences must be understood by ISO C compilers, but are not recommended. When you use the -xtransition option, the ISO C compiler warns you whenever it replaces a trigraph while in transition (-Xt) mode, even in comments. For example, consider the following:

/* comment *??/
/* still comment? */

The ??/ becomes a backslash. This character and the following newline are removed. The resulting characters are:

/* comment */* still comment? */

The first / from the second line is the end of the comment. The next token is the *.

Trigraph replacement is the first phase. The remaining phases follow in this order:

  2. Every backslash/new-line character pair is deleted.

  3. The source file is converted into preprocessing tokens and sequences of white space. Each comment is effectively replaced by a space character.

  4. Every preprocessing directive is handled and all macro invocations are replaced. Each #included source file is run through the earlier phases before its contents replace the directive line.

  5. Every escape sequence (in character constants and string literals) is interpreted.

  6. Adjacent string literals are concatenated.

  7. Every preprocessing token is converted into a regular token. The compiler properly parses these and generates code.

  8. All external object and function references are resolved, resulting in the final program.

6.5.2 Old C Translation Phases

Previous C compilers did not follow such a simple sequence of phases, and the order in which these steps were applied was not predictable. A separate preprocessor recognized tokens and white space at essentially the same time as it replaced macros and handled directive lines. The output was then completely retokenized by the compiler proper, which then parsed the language and generated code.

The tokenization process within the preprocessor was a moment-by-moment operation, and macro replacement was done as a character-based, not token-based, operation. Therefore, the tokens and white space could vary greatly during preprocessing.

A number of differences arise from these two approaches. The rest of this section discusses how code behavior can change due to line splicing, macro replacement, stringizing, and token pasting, which occur during macro replacement.

6.5.3 Logical Source Lines

In K&R C, backslash/new-line pairs were allowed only as a means to continue a directive, a string literal, or a character constant to the next line. ISO C extended the notion so that a backslash/new-line pair can continue anything to the next line. The result is a logical source line. Therefore, any code that relies on the separate recognition of tokens on either side of a backslash/new-line pair does not behave as expected.

6.5.4 Macro Replacement

The macro replacement process was not described in detail prior to ISO C. This vagueness spawned a great many divergent implementations. Any code that relied on anything more complex than manifest constant replacement and simple function-like macros was probably not truly portable. This manual cannot uncover all the differences between the old C macro replacement implementation and the ISO C version. Nearly all uses of macro replacement, with the exception of token pasting and stringizing, produce exactly the same series of tokens as before. Furthermore, the ISO C macro replacement algorithm can do things not possible in the old C version. The following example causes any use of name to be replaced with an indirect reference through name.

#define name (*name)

The old C preprocessor would produce a huge number of parentheses and stars and eventually produce an error about macro recursion.

The major change in the macro replacement approach taken by ISO C is to require macro arguments, other than those that are operands of the macro substitution operators # and ##, to be expanded recursively prior to their substitution in the replacement token list. However, this change seldom produces an actual difference in the resulting tokens.

6.5.5 Using Strings


Note - In ISO C, the examples below marked with a ? produce a warning about use of old features when you use the -xtransition option. Only in the transition modes (-Xt and -Xs) is the result the same as in previous versions of C.


In K&R C, the following code produced the string literal "x y!":

#define str(a) "a!"   ?
str(x y)

Thus, the preprocessor searched inside string literals and character constants for characters that looked like macro parameters. ISO C recognized the importance of this feature, but could not condone operations on parts of tokens. In ISO C, all invocations of the above macro produce the string literal "a!". To achieve the old effect in ISO C, use the # macro substitution operator and the concatenation of string literals.

#define str(a) #a "!"
str(x y)

This code produces the two string literals "x y" and "!" which, after concatenation, produce the identical "x y!".

There is no direct replacement for the analogous operation for character constants. The major use of this feature was similar to the following example:

#define CNTL(ch) (037 & 'ch')    ?
CNTL(L)

This example produces the following result, which evaluates to the ASCII control-L character.

(037 & 'L')

The best solution is to change all uses of this macro as follows:

#define CNTL(ch) (037 & (ch))
CNTL('L')

This code is more readable and more useful, as it can also be applied to expressions.

6.5.6 Token Pasting

K&R C had at least two ways to combine two tokens. Both invocations in the following code produced a single identifier x1 out of the two tokens x and 1.

#define self(a) a
#define glue(a,b) a/**/b ?
self(x)1
glue(x,1)

Again, ISO C could not sanction either approach. In ISO C, both invocations would produce the two separate tokens x and 1. The second of the two methods can be rewritten for ISO C by using the ## macro substitution operator:

#define glue(a,b) a ## b
glue(x, 1)

# and ## should be used as macro substitution operators only when __STDC__ is defined. Because ## is an actual operator, the invocation can be much freer with respect to white space in both the definition and invocation.

The compiler issues a warning diagnostic for an undefined ## operation (C standard, section 3.4.3), where undefined is a ## result that, when preprocessed, consists of multiple tokens rather than one single token (C standard, section 6.10.3.3(3)). The result of an undefined ## operation is now defined as the first individual token generated by preprocessing the string created by concatenating the ## operands.

No direct approach reproduces the first of the two old-style pasting schemes, but because it put the burden of the pasting on the invocation, it was used less frequently than the other form.