CHAPTER 3
Using Standard Mode
This chapter explains use of the standard mode, which is the default compilation mode for the C++ compiler.
Since standard mode is the primary default, no option is required. You can also choose the compiler option:
C++ has added several new keywords. If you use any of these as identifiers, you get numerous and sometimes bizarre error messages. (Determining when a programmer has used a keyword as an identifier is quite difficult, and the compiler error messages might not be helpful in such cases.)
Most of the new keywords can be disabled with a compiler option, as shown in the following table. Some are logically related, and are enabled or disabled as a group.
and, and_eq, bitand, compl, not, not_eq, or, bitor, xor, xor_eq
The addendum to the ISO C standard introduced the C standard header <iso646.h>, which defined new macros to generate the special tokens. The C++ standard has introduced these spellings directly as reserved words. (When the alternative spellings are enabled, including <iso646.h> in your program has no net effect.) The meaning of these tokens is shown in the following table.
The C++ standard has some new rules for templates that make old code nonconforming, particularly code involving the use of the new keyword typename. The C++ compiler does not enforce these rules, but it does recognize this keyword. In most cases, template code that worked under the 4.2 compiler continues to work, although the 4.2 version accepted some invalid template code. You should migrate your code to the new C++ rules as development schedules permit, since future compilers will enforce the new rules.
The C++ standard has new rules for determining whether an identifier is the name of a type. The following example illustrates the new rules.
    typedef int S;
    template< class T > class B { typedef int U; };
    template< class T > class C : public B<T> {
        S s;      // OK
        T t;      // OK
        U x;      // 1 No longer valid
        T::V z;   // 2 No longer valid
    };
The new language rules state that no base class that is dependent on a template parameter is searched automatically to resolve type names in a template, and that no name coming from a base class or template parameter class is a type name unless it is declared to be so with the keyword typename.
The first invalid line (1) in the code example tries to inherit U from B as a type without the qualifying class name and without the keyword typename. The second invalid line (2) uses type V coming from the template parameter, but omits the keyword typename. The definition of s is valid because the type doesn't depend on a base class or member of a template parameter. Similarly, the definition of t is valid because it uses type T directly, a template parameter that must be a type.
The following modified example is correct.
    typedef int S;
    template< class T > class B { typedef int U; };
    template< class T > class C : public B<T> {
        S s;                  // OK
        T t;                  // OK
        typename B<T>::U x;   // OK
        typename T::V z;      // OK
    };
A problem for migrating code is that typename was not previously a keyword. If existing code uses typename as an identifier, you must first change the name to something else.
For code that must work with old and new compilers, you can add statements similar to the following example to a project-wide header file.
The effect is to conditionally replace typename with nothing. When using older compilers (such as C++ 4.1) that do not recognize typename, add -DTYPENAME_NOT_RECOGNIZED to the set of compiler options in your makefile.
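The conditional replacement described above might look like the following sketch. The macro name TYPENAME_NOT_RECOGNIZED comes from the text; the Holder and Reader templates and the read_int function are hypothetical names used only to show template code that relies on the keyword.

```cpp
// Project-wide header fragment: erase the keyword for compilers
// that do not recognize typename.
#ifdef TYPENAME_NOT_RECOGNIZED
#define typename
#endif

// Hypothetical template code that relies on the fragment above.
template <class T> struct Holder {
    typedef T value_type;
    T value;
};

template <class H> struct Reader {
    // value_type depends on H, so the standard requires typename here.
    typedef typename H::value_type result_type;
    static result_type read(const H& h) { return h.value; }
};

int read_int(int v) {
    Holder<int> h;
    h.value = v;
    return Reader< Holder<int> >::read(h);
}
```

With a compiler that recognizes typename, the macro is left undefined and the code compiles unchanged.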
In the ARM, and in the 4.2 compiler, there was no standard way to request an explicit instantiation of a template using the template definition. The C++ standard, and the C++ compiler in standard mode, provide a syntax for explicit instantiation using the template definition: the keyword template followed by a declaration of the type. For example, the last line in the following code forces the instantiation of class MyClass on type int, using the default template definition.
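A minimal sketch of explicit instantiation follows. The members of MyClass and the helper get_through_instance are assumptions added so that the instantiated class can be exercised.

```cpp
template <class T> class MyClass {
public:
    explicit MyClass(T v) : value(v) { }
    T get() const { return value; }
private:
    T value;
};

// Explicit instantiation: forces the compiler to generate MyClass<int>
// from the default template definition.
template class MyClass<int>;

// Hypothetical helper that uses the instantiated class.
int get_through_instance(int v) { return MyClass<int>(v).get(); }
```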
The syntax for explicit specializations has changed. To declare an explicit specialization, or to provide the full definition, you now prefix the declaration with template<>. (Notice the empty angle brackets.) For example:
The declaration forms mean that the programmer has somewhere provided a different definition (specialization) for the template for the provided arguments, and the compiler is not to use the default template definition for those arguments.
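The template<> prefix might be used as in the following sketch. The id member and the two helper functions are hypothetical; they only make it visible which definition the compiler selects.

```cpp
template <class T> class MyClass {
public:
    static int id() { return 0; }    // default template definition
};

template <> class MyClass<char>;     // declaration of a specialization

template <> class MyClass<char> {    // full definition of the specialization
public:
    static int id() { return 1; }
};

int default_id()     { return MyClass<int>::id(); }
int specialized_id() { return MyClass<char>::id(); }
```

MyClass<int> uses the default definition, while MyClass<char> uses the specialization.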
In standard mode, the compiler accepts the old syntax as an anachronism. The 4.2 compiler accepted the new specialization syntax, but it did not treat code using the new syntax correctly in every case. (The draft standard changed after the feature was put into the 4.2 compiler.) For maximum portability of template specialization code, you can add statements similar to the following to a project-wide header:
Then you would write, for example:
Specialize class MyClass<char>; // declaration
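One way to define such a macro is sketched below. The name OLD_SPECIALIZATION_SYNTAX is an assumption, not a macro predefined by any compiler; a project would define it (for example with -D) only when building with a compiler that needs the old syntax.

```cpp
// Project-wide header fragment. OLD_SPECIALIZATION_SYNTAX is an assumed
// project macro, defined only for compilers that need the old syntax.
#ifdef OLD_SPECIALIZATION_SYNTAX
#define Specialize
#else
#define Specialize template<>
#endif

template <class T> class MyClass {
public:
    static int id() { return 0; }
};

Specialize class MyClass<char>;      // declaration
Specialize class MyClass<char> {     // definition
public:
    static int id() { return 1; }
};

int char_id() { return MyClass<char>::id(); }
```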
In class template definitions and declarations, appending the type argument bracketed by < > to the class's name has never been valid, but versions 4 and 5.0 of the C++ compiler did not report the error. For example, in the following code the <T> appended to MyClass is invalid for both the definition and the declaration.
    template<class T> class MyClass<T> { ... }; // definition
    template<class T> class MyClass<T>;         // declaration
To resolve the problem, remove the bracketed type argument from the class name, as shown in the following code.
    template<class T> class MyClass { ... }; // definition
    template<class T> class MyClass;         // declaration
The Sun implementation of C++ templates uses a repository for template instances. The C++ 4.2 compiler stored the repository in a directory called Templates.DB. The C++ 5 compilers, by default, use directories called SunWS_cache and SunWS_config. SunWS_cache contains the working files and SunWS_config contains the configuration files, specifically, the template options file (SunWS_config/CC_tmpl_opt). (See the C++ User's Guide.)
If you have makefiles that for some reason mention repository directories by name, you need to modify the makefiles. Furthermore, the internal structure of the repository has changed, so any makefiles that access the contents of Templates.DB no longer work.
In addition, standard C++ programs probably make heavier use of templates. Paying attention to the considerations of multiple programs or projects that share directories is very important. If possible, use the simplest organization: compile only files belonging to the same program or library in any one directory. The template repository then applies to exactly one program. If you compile a different program in the same directory, clear the repository by using CCadmin -clean. See the C++ User's Guide for more information.
The danger in more than one program sharing the same repository is that different definitions for the same name might be required. This situation cannot be handled correctly when the repository is shared.
The C++ standard library contains many templates, and many new standard header names to access those templates. The Sun C++ standard library puts declarations in the template headers, and implementation of the templates in separate files. If one of your project file names matches the name of a new template header, the compiler might pick up the wrong implementation file and cause numerous, bizarre errors. Suppose you have your own template called vector, putting the implementation in a file called vector.cc. Depending on file locations and command-line options, the compiler might pick up your vector.cc when it needs the one from the standard library, or vice-versa. When the export keyword and exported templates are implemented in a future compiler version, the situation will be worse.
Here are two recommendations for preventing current and future problems:
The C++ standard says that the name of a class is "injected" into the class itself. This is a change from earlier C++ rules. Formerly, the name of the class was not found as a name within the class.
In most cases, this subtle change has no effect on an existing program. In some cases, this change can make a formerly valid program invalid, and sometimes can result in a change of meaning. For example:
    const int X = 5;
    class X {
        int i;
    public:
        X(int j = X) :   // what is the default value X?
            i(j) { }
    };
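To determine the meaning of X as a default parameter value, the compiler looks up the name X in the current scope, then in successive outer scopes, until it finds an X. Under the old rules, the class scope contains no X, so the search reaches file scope and finds the integer constant X; the default value is 5. Under the new rules, the search finds the injected class name X in the class scope. A type name is not a valid default parameter value, so the code is in error.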
Because having a type and an object with the same name in the same scope is considered poor programming practice, this error should rarely occur. If you get such an error, you can fix the code by qualifying the variable with the proper scope, such as:
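A sketch of the qualified form follows. The value member and the default_value function are hypothetical additions that make the chosen default observable.

```cpp
const int X = 5;

class X {
    int i;
public:
    // ::X names the global constant, not the injected class name.
    X(int j = ::X) : i(j) { }
    int value() const { return i; }
};

// At file scope the constant hides the class name, so the elaborated
// form "class X" is needed to name the type here.
int default_value() {
    class X obj;
    return obj.value();
}
```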
The next example (adapted from the standard library) illustrates another scoping problem.
What is the parameter type to the constructor for const_iterator? Under the old C++ rules, the compiler does not find the name iterator in the scope of class const_iterator, so it searches the next outer scope, class list<T>. That scope has a member type iterator. The parameter type is therefore list<T>::iterator.
Under the new C++ rules, the name of a class is inserted into its own scope. In particular, the name of a base class is inserted into the base class. When the compiler starts searching for a name in a derived class scope, it can now find the name of a base class. Since the type of the parameter to the const_iterator constructor does not have a scope qualifier, the name that is found is the name of the const_iterator base class. The parameter type is therefore the global ::iterator<T>, instead of list<T>::iterator.
To get the intended result, you can change some of the names, or use a scope qualifier, such as:
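The scope-qualifier approach might be sketched as follows. This is a simplified stand-in for the library code, not the actual standard library source; the pos member and the converted_position function are assumptions added so the result can be checked.

```cpp
// Global iterator template, used as a base class below.
template <class T> class iterator {
public:
    iterator() : pos(0) { }
    int pos;
};

template <class T> class list {
public:
    class iterator : public ::iterator<T> {
    public:
        explicit iterator(int p) { this->pos = p; }
    };
    class const_iterator : public ::iterator<T> {
    public:
        // The qualifier selects the member type list<T>::iterator
        // rather than the injected name of the base class.
        const_iterator(const typename list<T>::iterator& it) {
            this->pos = it.pos;
        }
    };
};

// Hypothetical helper: converts an iterator to a const_iterator.
int converted_position(int p) {
    list<int>::iterator it(p);
    list<int>::const_iterator cit(it);
    return cit.pos;
}
```

With the qualifier in place, the parameter type does not depend on which lookup rule the compiler applies.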
The ARM rules stated that a variable declared in the header of a for-statement was inserted into the scope containing the for-statement. The C++ committee felt that this rule was incorrect, and that the variable's scope should end at the end of the for-statement. (In addition, the rule didn't cover some common cases and, as a result, some code worked differently with different compilers.) The C++ committee changed the rule accordingly. Many compilers, C++ 4.2 included, continued to use the old rule.
In the following example, the if-statement is valid under the old rules, but invalid under the new rules, because k has gone out of scope.
In compatibility mode, the C++ compiler uses the old rule by default. You can instruct the compiler to use the new rule with the -features=localfor compiler option.
In standard mode, the C++ compiler uses the new rule by default. You can instruct the compiler to use the old rule with the -features=no%localfor compiler option.
You can write code that works properly with all compilers in any mode by pulling the declaration out of the for-statement header, as shown in the following example.
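A minimal sketch of the portable form, assuming a hypothetical search function named first_negative:

```cpp
// Returns the index of the first negative element, or n if there is none.
int first_negative(const int* a, int n) {
    int k;                      // declared outside the for-statement header
    for (k = 0; k < n; ++k) {
        if (a[k] < 0)
            break;
    }
    return k;                   // k is in scope under both old and new rules
}
```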
The C++ compiler, in both compatibility and standard mode, now issues a warning for implicit and explicit conversions between pointer-to-function and void*. In standard mode, the compiler no longer recognizes such implicit conversions when resolving overloaded function calls. For more information, see Section 1.6, Function Pointers and void*.
Some history might help clarify this subtle issue. Standard C introduced the const keyword and the concept of constant objects, neither of which was present in the original C language ("K&R" C). A string literal such as "Hello world" logically should be const in order to prevent nonsensical results, as in the following example.
    #define GREETING "Hello world"
    char* greet = GREETING;   // No compiler complaint
    greet[0] = 'G';
    printf("%s", GREETING);   // Prints "Gello world" on some systems
In both C and C++, the results of attempting to modify a string literal are undefined. The previous example produces the odd result shown if the implementation chooses to use the same writable storage for identical string literals.
Because so much then-existing code looked like the second line in the preceding example, the C Standards Committee in 1989 did not want to make string literals const. The C++ language originally followed the C language rule. The C++ Standards Committee later decided that the C++ goal of type safety was more important, and changed the rule.
In standard C++, string literals are constant and have type const char[]. The second line of code in the previous example is not valid in standard C++. Similarly, a function parameter declared as char* should no longer be passed a string literal. However, the C++ standard also provides for a deprecated conversion of a string literal from const char[] to char*. Some examples are:
If a function does not modify, directly or indirectly, a character array that is passed as an argument, the parameter should be declared const char* (or const char[]). You might find that the need to add const modifiers propagates through the program; as you add modifiers, still more become necessary. (This phenomenon is sometimes called "const poisoning.")
In standard mode, the compiler issues a warning about the deprecated conversion of a string literal to char*. If you were careful to use const wherever it was appropriate in your existing programs, they probably compile without these warnings under the new rules.
For function overloading purposes, a string literal is always regarded as const in standard mode. For example:
If the above example is compiled in compatibility mode (or with the 4.2 compiler), function f(char*) is called. If compiled in standard mode, function f(const char*) is called.
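The overloading case can be sketched as follows. The return values are an assumption added so that the selected overload is observable; the behavior noted in the comment is the standard-mode behavior described above.

```cpp
// Each overload returns a tag so the selected function is observable.
int f(char*)       { return 1; }
int f(const char*) { return 2; }

int selected_overload() {
    // In standard C++ the literal has type const char[6],
    // so f(const char*) is the better match.
    return f("Hello");
}
```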
In standard mode, the compiler will put literal strings in read-only memory by default. If you then attempt to modify the string (which might happen due to automatic conversion to char*) the program aborts with a memory violation.
With the following example, the C++ compiler in compatibility mode puts the string literal in writable memory, just like the 4.2 compiler did. The program will run, although it technically has undefined behavior. In standard mode, the compiler puts the string literal in read-only memory by default, and the program aborts with a memory fault. You should therefore heed all warnings about conversion of string literals, and try to fix your program so the conversions do not occur. Such changes will ensure your program is correct for every C++ implementation.
You can change the compiler behavior with the use of a compiler option:
You might find it convenient to use the standard C++ string class instead of C-style strings. The C++ string class does not have the problems associated with string literals, because standard string objects can be declared separately as const or not, and can be passed by reference, by pointer, or by value to functions.
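As an illustration, the following sketch passes a standard string by value and by const reference. The functions capitalize and length_of are hypothetical examples, not library functions.

```cpp
#include <string>

// Takes the string by value: the caller's object is never modified.
std::string capitalize(std::string s) {
    if (!s.empty() && s[0] >= 'a' && s[0] <= 'z')
        s[0] = static_cast<char>(s[0] - 'a' + 'A');
    return s;
}

// A const reference both documents and enforces read-only access.
std::string::size_type length_of(const std::string& s) {
    return s.size();
}
```

Because the by-value parameter is a copy, modifying it cannot affect a const string or a literal in the caller.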
The C++ standard introduced a change in the rules for conditional expressions. The C++ compiler uses the new rule in both standard mode and compatibility mode. For more information, see Section 1.5, Conditional Expressions.
There are four issues regarding the new forms of new and delete:
The old rules are used by default in compatibility mode, and the new rules are used by default in standard mode. Changing from the default is not recommended, because the compatibility-mode run-time library (libC) depends on the old definitions and behavior, and the standard-mode run-time library (libCstd) depends on the new definitions and behavior.
The compiler predefines the macro _ARRAYNEW to the value 1 when the new rules are in force. The macro is not defined when the old rules are in use. The following example is explained in more detail in the next section:
    // Replacement functions
    #ifdef _ARRAYNEW
    void* operator new(size_t) throw(std::bad_alloc);
    void* operator new[](size_t) throw(std::bad_alloc);
    #else
    void* operator new(size_t);
    #endif
The C++ standard adds new forms of operator new and operator delete that are called when allocating or deallocating an array. Previously, there was only one form of these operator functions. In addition, when you allocate an array, only the global form of operator new and operator delete would be used, never a class-specific form. The C++ 4.2 compiler did not support the new forms, since their use requires an ABI change.
In addition to these functions:
In all cases (previous and current), you can write replacements for the versions found in the run-time library. The two forms are provided so that you can use a different memory pool for arrays than for single objects, and so that a class can provide its own version of operator new for arrays.
Under both sets of rules, when you write new T, where T is some type, function operator new(size_t) gets called. However, when you write new T[n] under the new rules, function operator new[](size_t) is called.
Similarly, under both sets of rules, when you write delete p, operator delete(void*) is called. Under the new rules, when you write delete [] p, operator delete[](void*) is called.
You can write class-specific versions of the array forms of these functions as well.
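A class-specific set of allocation functions might look like the following sketch under the new rules. The class Tracked and its counters are hypothetical; they only record which form was called.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical class that counts calls to its own allocation functions.
class Tracked {
public:
    static int single_calls;
    static int array_calls;
    int payload;

    static void* operator new(std::size_t n) {       // used by: new Tracked
        ++single_calls;
        return allocate(n);
    }
    static void* operator new[](std::size_t n) {     // used by: new Tracked[n]
        ++array_calls;
        return allocate(n);
    }
    static void operator delete(void* p)   { std::free(p); }
    static void operator delete[](void* p) { std::free(p); }

private:
    // The ordinary forms must throw bad_alloc on failure.
    static void* allocate(std::size_t n) {
        void* p = std::malloc(n ? n : 1);
        if (p == 0) throw std::bad_alloc();
        return p;
    }
};

int Tracked::single_calls = 0;
int Tracked::array_calls  = 0;
```

Writing new Tracked calls the single-object form, and new Tracked[3] calls the array form exactly once for the whole array.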
Under the old rules, all forms of operator new returned a null pointer if the allocation failed. Under the new rules, the ordinary forms of operator new throw an exception if allocation fails, and do not return any value. Special forms of operator new that return zero instead of throwing an exception are available. All versions of operator new and operator delete have an exception-specification. The declarations found in standard header <new> are:
Defensive code such as the following example no longer works as previously intended. If the allocation fails, the operator new that is called automatically from the new expression throws an exception, and the test for zero never occurs.
If you prefer not to use any exceptions in your code, you can use the second form. If you are using exceptions in your code, consider using the first form.
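The two forms could be sketched as follows, assuming the first form catches the exception and the second uses the nothrow version of operator new declared in <new>. The function names are hypothetical.

```cpp
#include <cstddef>
#include <new>

// First form: let operator new throw, and handle std::bad_alloc.
int* allocate_with_exceptions(std::size_t n) {
    try {
        return new int[n];
    } catch (const std::bad_alloc&) {
        return 0;               // handle the allocation failure here
    }
}

// Second form: the nothrow version returns a null pointer on failure,
// so the traditional test for zero still works.
int* allocate_without_exceptions(std::size_t n) {
    return new (std::nothrow) int[n];
}
```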
If you did not previously verify whether operator new succeeded, you can leave your existing code unchanged. It then aborts immediately on allocation failure instead of progressing to some point where an invalid memory reference occurs.
If you have replacement versions of operator new and delete, they must match the signatures shown in CODE EXAMPLE 3-3, including the exception specifications on the functions. In addition, they must implement the same semantics. The normal forms of operator new must throw a bad_alloc exception on failure; the nothrow version must not throw any exception, but must return zero on failure. The forms of operator delete must not throw any exception. Code in the standard library uses the global operator new and delete and depends on this behavior for correct operation. Third-party libraries can have similar dependencies.
The global version of operator new[]() in the C++ runtime library just calls the single-object version, operator new(), as required by the C++ standard. If you replace the global version of operator new() from the C++ standard library, you don't need to replace the global version of operator new[] ().
The C++ standard prohibits replacing the predefined "placement" forms of operator new:
They cannot be replaced in standard mode, although the 4.2 compiler allowed it. You can, of course, write your own placement versions with different parameter lists.
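A placement version with its own parameter list might be written as in the following sketch. Arena, Point, and sum_in_arena are hypothetical names; the arena is a simple fixed-size buffer that never frees individual blocks.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-size arena, for illustration only.
class Arena {
public:
    Arena() : used(0) { }
    void* allocate(std::size_t n) {
        // Round up so every block starts on a suitably aligned boundary.
        const std::size_t a = sizeof(Align);
        std::size_t need = (n + a - 1) / a * a;
        if (need > sizeof(buffer.bytes) - used) throw std::bad_alloc();
        void* p = buffer.bytes + used;
        used += need;
        return p;
    }
    std::size_t bytes_used() const { return used; }
private:
    union Align { long double ld; long l; void* p; };
    union { Align align; unsigned char bytes[1024]; } buffer;
    std::size_t used;
};

// A placement form with its own parameter list. It does not replace
// the predefined operator new(size_t, void*).
void* operator new(std::size_t n, Arena& arena) { return arena.allocate(n); }

// Matching placement delete, called only if a constructor throws.
void operator delete(void*, Arena&) { }

struct Point {
    int x, y;
    Point(int a, int b) : x(a), y(b) { }
};

int sum_in_arena(Arena& arena, int a, int b) {
    Point* p = new (arena) Point(a, b);   // calls operator new(size_t, Arena&)
    int s = p->x + p->y;
    p->~Point();                          // the arena memory is not freed here
    return s;
}
```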
In compatibility mode, include <new.h> as always. In standard mode, include <new> (no .h) instead. To ease the transition, a header <new.h> is available in standard mode that makes the names from namespace std available in the global namespace. This header also provides typedefs that make the old names for exceptions correspond to the new exception names. See Section 3.13, Standard Exceptions.
The Boolean keywords--bool, true, and false--are controlled by the presence or absence of Boolean keyword recognition in the compiler:
Turning on the keywords in compatibility mode is a good idea because it exposes any current use of the keywords in your code.
Turning off the Boolean keywords in standard mode is not a good idea, because the C++ standard library depends on the built-in bool type, which would not be available. When you later turn on bool, more problems ensue, particularly with name mangling.
The compiler predefines the macro _BOOL to be 1 when the Boolean keywords are enabled. It is not defined when they are disabled. For example:
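One way to use the macro is sketched below. The names Boolean, BOOLEAN_TRUE, BOOLEAN_FALSE, and is_even are hypothetical project names, chosen so that the code does not collide with the keywords when they are enabled.

```cpp
// _BOOL is predefined by the compiler described here when the Boolean
// keywords are enabled. The names below are hypothetical project names.
#ifdef _BOOL
typedef bool Boolean;            // the built-in type is available
#else
typedef unsigned char Boolean;   // a reasonable substitute
#endif

const Boolean BOOLEAN_FALSE = 0;
const Boolean BOOLEAN_TRUE  = 1;

Boolean is_even(int n) {
    return n % 2 == 0 ? BOOLEAN_TRUE : BOOLEAN_FALSE;
}
```

A compiler that does not predefine _BOOL takes the substitute branch, which still compiles but does not behave exactly like the built-in type.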
You cannot define a Boolean type in compatibility mode that will work exactly like the new built-in bool type. This is one reason why a built-in Boolean type was added to C++.
A function can be declared with a language linkage, such as
If you do not specify a linkage, C++ linkage is assumed. You can specify C++ linkage explicitly:
You can also group declarations:
This technique is used extensively in the standard headers.
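The declaration forms described above might look like the following sketch. All function names are hypothetical.

```cpp
extern "C" int c_square(int);        // C linkage
extern "C++" int cpp_square(int);    // explicit C++ linkage
int plain_square(int);               // C++ linkage by default

extern "C" {                         // grouped declarations, all C linkage
    int c_double(int);
    int c_negate(int);
}

// Definitions; each one keeps the linkage of its earlier declaration.
int c_square(int x)     { return x * x; }
int cpp_square(int x)   { return x * x; }
int plain_square(int x) { return x * x; }
int c_double(int x)     { return 2 * x; }
int c_negate(int x)     { return -x; }
```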
Language linkage means the way in which a function is called: where the arguments are placed, where the return value is to be found, and so on. Declaring a language linkage does not mean the function is written in that language. It means that the function is called as if it were written in that language. Thus, declaring a C++ function to have C linkage means the C++ function can be called from a function written in C.
A language linkage applied to a function declaration applies to the return type and all its parameters that have function or pointer-to-function type.
In compatibility mode, the compiler implements the ARM rule that the language linkage is not part of the function type. In particular, you can declare a pointer to a function without regard to the linkage of the pointer, or of a function assigned to it. This is the same behavior as the C++ 4.2 compiler.
In standard mode, the compiler implements the new rule that the language linkage is part of its type, and is part of the type of a pointer to function. The linkages must therefore match.
The following example shows functions and function pointers with C and C++ linkage, in all four possible combinations. In compatibility mode the compiler accepts all combinations, just like the 4.2 compiler. In standard mode the compiler accepts the mismatched combinations only as an anachronism.
If you encounter a problem, be sure that the pointers to be used with C linkage functions are declared with C linkage, and the pointers to be used with C++ linkage functions are declared without a linkage specifier, or with C++ linkage. For example:
    extern "C" {
        int fc(int);
        int (*fp1)(int) = fc;   // Both have C linkage
    }
    int fcpp(int);
    int (*fp2)(int) = fcpp;     // Both have C++ linkage
In the worst case, where you really do have mismatched pointer and function, you can write a "wrapper" around the function to avoid any compiler complaints. In the following example, composer is a C function taking a pointer to a function with C linkage.
To pass function foo (which has C++ linkage) to the function composer, create a C-linkage function foo_wrapper that presents a C interface to foo:
    extern "C" void composer( int(*)(int) );
    extern "C++" int foo(int);
    extern "C" int foo_wrapper(int i) { return foo(i); }
    composer( foo_wrapper ); // OK
In addition to eliminating the compiler complaint, this solution works even if C and C++ functions really have different linkage.
The Sun implementation of C and C++ function linkage is binary-compatible. That is not the case with every C++ implementation, although it is reasonably common. If you are not concerned with possible incompatibility, you can employ a cast to use a C++-linkage function as if it were a C-linkage function.
A good example concerns static member functions. Prior to the new C++ language rule regarding linkage being part of a function's type, the usual advice was to treat a static member function of a class as a function with C linkage. Such a practice circumvented the limitation that you cannot declare any linkage for a class member function. You might have code like the following:
    // Existing code
    typedef int (*cfuncptr)(int);
    extern "C" void set_callback(cfuncptr);
    class T {
        ...
        static int memfunc(int);
    };
    ...
    set_callback(T::memfunc); // no longer valid
As recommended in the previous section, you can create a function wrapper that calls T::memfunc and then change all the set_callback calls to use a wrapper instead of T::memfunc. Such code will be correct and completely portable.
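The wrapper approach might be sketched as follows. The bodies of set_callback and T::memfunc, the variable registered, and the functions T_memfunc_wrapper and fire_callback are assumptions that stand in for the real C library and application code.

```cpp
extern "C" { typedef int (*cfuncptr)(int); }   // pointer to C-linkage function

static cfuncptr registered = 0;

// Stand-in for the C function that stores the callback.
extern "C" void set_callback(cfuncptr f) { registered = f; }

class T {
public:
    static int memfunc(int x) { return x + 1; }
};

// C-linkage wrapper that forwards to the static member function.
extern "C" int T_memfunc_wrapper(int i) { return T::memfunc(i); }

// Hypothetical helper that invokes whatever callback is registered.
int fire_callback(int arg) { return registered ? registered(arg) : -1; }
```

Callers then write set_callback(T_memfunc_wrapper) instead of set_callback(T::memfunc).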
An alternative is to create an overloaded version of set_callback that takes a function with C++ linkage and calls the original, as in the following example:
This example requires only a small modification to existing code. An extra version of the function that sets the callback was added. Existing code that called the original set_callback now calls the overloaded version that in turn calls the original version. Since the overloaded version is an inline function, there is no runtime overhead at all.
Although this technique works with Sun C++, it is not guaranteed to work with every C++ implementation because the calling sequence for C and C++ functions may be different on other systems.
A subtle consequence of the new rule for language linkage involves functions that take pointers to functions as parameters, such as:
An unchanged rule about language linkage is that if you declare a function with language linkage and follow it with a definition of the same function with no language linkage specified, the previous language linkage applies. For example:
In this example, function f has C linkage. The definition that follows the declaration (the declaration might be in a header file that gets included) inherits the linkage specification of the declaration. But suppose the function takes a parameter of type pointer-to-function, as in the following example:
Under the old rule, and with the 4.2 compiler, there is only one function g. Under the new rule, the first line declares a function g with C linkage that takes a pointer-to-function-with-C-linkage. The second line defines a function that takes a pointer-to-function-with-C++-linkage. The two functions are not the same; the second function has C++ linkage. Because linkage is part of the type of a pointer-to-function, the two lines refer to a pair of overloaded functions each called g. Code that depended on these being the same function breaks. Very likely, the code fails during compilation or linking.
Good programming practice puts the linkage specification on the function definition as well as on the declaration:
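A sketch of this practice follows. The function twice is a hypothetical C-linkage function used as the argument, and the body of g is an assumption.

```cpp
// Hypothetical C-linkage function to pass as the argument.
extern "C" int twice(int x) { return 2 * x; }

// Declaration and definition both carry the linkage specification,
// so they refer to the same function under the old and new rules.
extern "C" int g( int (*)(int) );
extern "C" int g( int (*pf)(int) ) { return pf(21); }
```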
You can further reduce confusion about types by using a typedef for the function parameter:
    extern "C" {typedef int (*pfc)(int);} // ptr to C-linkage function
    extern "C" int g(pfc);
    extern "C" int g(pfc pf) { ... }
In compatibility mode, RTTI is off by default, as with the 4.2 compiler. In standard mode, RTTI is on and cannot be turned off. Under the old ABI, RTTI has a noticeable cost in data size and in efficiency. (RTTI could not be implemented directly under the old ABI, and an inefficient indirect method was required.) In standard mode using the new ABI, RTTI has negligible cost. (This is one of several improvements in the ABI.)
The C++ 4.2 compiler used the names related to standard exceptions that appeared in the C++ draft standard at the time the compiler was prepared. The names in the C++ standard have changed since then. In standard mode, the C++ 5 compilers use the standard names, as shown in the following table.
The public members of the classes (xmsg vs. exception, and xalloc vs. bad_alloc) are different, as is the way you use the classes.
A static object is an object with static storage duration. The static object can be global or in a namespace. It can be a static variable local to a function or it can be a static data member of a class.
The C++ standard requires that static objects be destroyed in the reverse order of their construction. In addition, the destruction of these objects might need to be intermixed with functions that are registered with the atexit() function.
Earlier versions of the C++ compiler destroyed the global static objects that are created in any one module in the reverse order of their construction. However, the correct destruction order over the entire program was not assured.
Beginning with version 5.1 of the C++ compiler, static objects are destroyed in strict reverse order of their construction. For example, suppose there are three static objects of type T:
We can't predict which of the two global objects will be created first, the one in file1 or the one in file2. However, the global object that is created first will be destroyed after the other global object is destroyed.
The local static object is created when its function is called. If the function is called after the creation of both the global static objects, the local object is destroyed before the global objects are destroyed.
The C++ standard places additional requirements on destruction of static objects in relation to functions registered with the atexit() function. If a function F is registered with atexit() after the construction of a static object X, F must be called at program exit before X is destroyed. Conversely, if function F is registered with atexit() before X is constructed, F must be called at program exit after X is destroyed.
Here is an example of this rule.
    #include <stdlib.h>   // declares atexit()
    // T is a type having a destructor
    void bar();
    void foo()
    {
        static T t2;
        atexit(bar);
        static T t3;
    }
    T t1;
    int main()
    {
        foo();
    }
At program start, t1 is created, then main runs. Main calls foo(). The foo() function performs the following in this order.
1. Construct t2
2. Register bar() with atexit()
3. Construct t3
Upon reaching the end of main, exit is called automatically. The sequence of the exit processing must be the following.
1. Destroy t3; t3 was constructed after bar() was registered with atexit()
2. Call bar()
3. Destroy t2; t2 was constructed before bar() was registered with atexit()
4. Destroy t1; t1 was the first thing constructed, and therefore the last thing destroyed
Support for this interleaving of static destructors and the atexit() processing requires help from the Solaris run-time library libc.so. This support is available beginning with Solaris 8 software. A C++ program that is compiled with version 5.1, version 5.2, version 5.3, or version 5.4 of the C++ compiler looks, at runtime, for a special symbol in the library to determine whether it is currently running on a version of Solaris software that has this support. If the support is available, the static destructors are properly interleaved with atexit-registered functions. If the program is running on a version of Solaris software that does not have this support, the destructors are still executed in the proper order, but they are not interleaved with atexit-registered functions.
Notice that the determination is made by the program each time it runs. It does not matter what version of Solaris software you use to build the program. As long as the Solaris run-time library libc.so is linked dynamically (which happens by default), the interleaving at program exit will happen if the version of Solaris software that is running the program supports it.
Different compilers provide different levels of support for the correct order of the destruction of static objects. To improve the portability of your code, the correctness of your program should not depend on the exact order in which static objects are destroyed.
If your program depends on a particular order of destruction and worked with an older compiler, the order required by the standard might break the program in standard mode. The -features=no%strictdestrorder command option disables the strict ordering of destruction.
Copyright © 2005, Sun Microsystems, Inc. All Rights Reserved.