Chapter 1

Introduction

In this book, the C++ 4.0, 4.0.1, 4.1, and 4.2 compilers are referred to collectively as "C++ 4," and the C++ 5.0, 5.1, 5.2, 5.3, 5.4, and 5.5 compilers are referred to collectively as "C++ 5." To a large degree, C++ source code that compiled and ran under C++ 4 continues to work under the C++ 5 compilers, with a few exceptions that are due to changes in the C++ language definition. The compiler provides a compatibility mode (-compat[=4]) that allows nearly all of your C++ 4 code to continue to work unchanged.



Note - Object code that is compiled in standard mode (the default mode) by any C++ 5 compiler (versions 5.0 through 5.5) is not compatible with C++ code from any earlier compiler. You can sometimes use self-contained libraries of older object code with the C++ 5 compilers. The details are covered in Section 1.3, Binary Compatibility Issues.




1.1 The C++ Language

C++ was first described in The C++ Programming Language (1986) by Bjarne Stroustrup, and later more formally in The Annotated C++ Reference Manual (the ARM) (1990), by Margaret Ellis and Bjarne Stroustrup. The Sun C++ 4 compiler versions were based primarily on the definition in the ARM, with additions from the then-emerging C++ standard. The additions selected for inclusion in C++ 4, and particularly in the C++ 4.2 compiler, were mainly those that did not cause source or binary incompatibility.

C++ is now the subject of an international standard, ISO/IEC 14882:1998 Programming Languages -- C++. The C++ 5.4 compiler in standard mode implements nearly all of the language as specified in the standard. The readme file that accompanies the current release describes departures from requirements in the standard.

Some changes in the C++ language definition prevent compilation of old source code without minor changes. The most obvious example is that the entire C++ standard library is defined in namespace std. The traditional first C++ program

#include <iostream.h>
int main() { cout << "Hello, world!" << endl; }

no longer compiles under a strictly conforming compiler because the standard name of the header is now <iostream> (without the .h), and the names cout and endl are in namespace std, not in the global namespace. The C++ compiler, as an extension, provides a header <iostream.h> which allows that program to compile even in standard mode. Besides the source code changes required, such language changes create binary incompatibilities, and so were not introduced into the C++ compiler prior to version 5.0.

Some newer C++ language features also required changes in the binary representation of programs. This subject is discussed in some detail in Section 1.3, Binary Compatibility Issues.


1.2 Compiler Modes of Operation

The C++ compiler has two modes of operation, standard mode and compatibility mode.

1.2.1 Standard Mode

Standard mode implements most of the C++ International Standard, and has some source incompatibilities with the language accepted by C++ 4, as noted earlier.

More importantly, in standard mode, the C++ 5 compilers use an application binary interface (ABI) different from that of C++ 4. Code generated by the compiler in standard mode is generally incompatible with, and cannot be linked with, code from the various C++ 4 compilers. This subject is discussed in more detail in Section 1.3, Binary Compatibility Issues.

You should update your code to compile in standard mode, for several reasons:

1.2.2 Compatibility Mode

To provide a migration path from C++ 4 to standard mode, the compiler provides a compatibility mode (-compat[=4]). The compatibility mode is fully binary-compatible and mostly source-compatible with the C++ 4 compilers. (Compatible means upward compatible. Older source and binary code works with the new compiler, but you cannot depend on code intended for the new compiler working with an old compiler.) Compatibility mode is not binary-compatible with standard mode. Compatibility mode is available for the Solaris 8 operating environment on IA and SPARC platforms, but not for SPARC V9 (64-bit) processors.



Note - In this document, the term "IA" refers to the Intel 32-bit processor architecture, which includes the Pentium, Pentium Pro, Pentium II, Pentium II Xeon, Celeron, Pentium III, and Pentium III Xeon processors and compatible microprocessor chips made by AMD and Cyrix.



Reasons to use compatibility mode:



Note - Under most conditions, you cannot link object files and libraries compiled in compatibility mode (-compat[=4]) with object files and libraries compiled in standard mode (the default mode). For more information, see Section 1.4, Mixing Old and New Binaries.




1.3 Binary Compatibility Issues

An application binary interface, or ABI, defines the machine-level characteristics of the object program produced by a compiler. It includes the sizes and alignment requirements of basic types, the layout of structured or aggregate types, the way in which functions are called, the actual names of entities defined in a program, and many other features. Much of the C++ ABI for the Solaris operating environment is the same as the basic Solaris ABI, which is the ABI for the C language.
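To make the idea of layout concrete, the following sketch queries two properties that an ABI fixes: the size of an aggregate and the offset of one of its members. The type Record and the two helper functions are hypothetical names chosen for this illustration, and the actual values depend on the platform ABI.

```cpp
#include <cstddef>   // std::size_t, offsetof

// Hypothetical aggregate; its layout is determined by the ABI.
struct Record {
    char tag;
    int  value;
};

// Total size of the aggregate, including any padding the ABI requires.
std::size_t record_size()
{
    return sizeof(Record);
}

// Byte offset of the member "value" within the aggregate.
std::size_t value_offset()
{
    return offsetof(Record, value);
}
```

Two object files can exchange Record objects only if both were compiled with the same answers to these questions, which is one reason an ABI change makes old and new binaries incompatible.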

1.3.1 Language Changes

C++ introduced many features (such as class member functions, overloaded functions and operators, type-safe linkage, exceptions, and templates) that did not correspond to anything in the ABI for C. Each major new version of C++ added language features that could not be implemented using the previous ABI. Necessary ABI changes have involved the way class objects are laid out, the way in which some functions are called, and the way type-safe linkage ("name mangling") is implemented.

The C++ 4.0 compiler implemented the language defined by the ARM. By the time the C++ 4.2 compiler was released, the C++ committee had introduced many new language features, some requiring a change in the ABI. Because it was certain that additional ABI changes would be required for as-yet unknown language additions or changes, Sun elected to implement only those new features that did not require a change to the ABI. The intent was to minimize the inconvenience of having to maintain correspondence of binary files compiled with different compiler versions. When the C++ standard was published, Sun designed a new ABI that allows the full C++ language to be implemented. The C++ 5 compilers use the new ABI by default.

One example of language changes affecting the ABI is the new names, signatures, and semantics of the new and delete free-store functions. Another is the new rule that a template function and non-template function with the same signature are nevertheless different functions. This rule required a change in "name mangling," which created a binary incompatibility with older compiled code. The introduction of type bool also created an ABI change, particularly regarding the interface to the standard library. Because the ABI needed to change, aspects of the old ABI that resulted in needlessly inefficient runtime code were improved.
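The template rule can be illustrated with a short sketch. The name which is an assumption made for this example. Both functions below have the signature int(int) once the template is instantiated for int, yet they are distinct functions and must therefore receive distinct mangled names.

```cpp
// Non-template function.
int which(int)
{
    return 1;
}

// Function template; which<int> has the same parameter and return types
// as the non-template function above, but is a different function.
template <class T>
int which(T)
{
    return 2;
}
```

An ordinary call which(0) selects the non-template function, because overload resolution prefers a non-template over a template specialization when both match equally well. The explicit form which<int>(0) selects the template specialization.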


1.4 Mixing Old and New Binaries

It is an overstatement to say that you cannot link old binaries (object files and libraries compiled by the C++ 4 compiler or compiled by the C++ 5 compilers in compatibility mode) with new binaries (object files and libraries compiled by the C++ 5 compilers in standard mode). It is possible to do this on SPARC platforms by using the libExbridge library.



Note - Mixing modes using libExbridge is supported only on SPARC platforms, not on x86 platforms. Allowing libExbridge to work on x86 platforms would require an ABI change and recompiling code. If you can recompile the code, you do not need libExbridge. We very strongly recommend that you do not mix compat modes in one program. Compile all code in standard mode instead. It is much better in both the short term and in the long term. The problem you are trying to solve by mixing modes will not go away, and will not be reliably solved in the long term by using libExbridge.



1.4.1 Getting Started

Install the latest SUNWlibC patch on all systems where the application is built or run. The libExbridge.so.1 library in the new patch depends on the versions of libC.so.5 and libCrun.so.1 in the same patch. Therefore, it will not work if you just copy libExbridge.so.1 onto a system instead of installing the patch.

1.4.2 Requirements

You can link old binaries with new binaries (as defined above) under the following conditions:

1.4.2.1 Using Exceptions

When the code uses exceptions, meaning that the code contains the throw or catch keywords (including an exception specification on a function), the requirements are as follows.

If the version of the C++ compiler that you are using supports linker scoping with -xldscope, use that feature to control the visibility of symbols in the library instead. See -xldscope in the C++ User's Guide or CC(1) for details.

If the version of the C++ compiler that you are using does not support linker scoping, use linker mapfiles to control the visibility of symbols in the library. See the Linker and Libraries Guide for details.

1.4.2.2 Using the libExbridge Library

The preferred method for using libExbridge is to link it in with the code. To do this, add the -lExbridge option to your CC command. The -l option prepends lib to the library name.

If you cannot link with libExbridge, follow these instructions:

Once LD_PRELOAD is set, it affects all programs started afterward. Preloading libExbridge can cause a subsequent invocation of the shell, or a program that spawns a shell, to fail. /usr/bin/sh defines its own malloc which is not properly initialized at the time it is called by the .init section of the preloaded library.

Thus, when using the C shell, it is best to set the environment variable in a shell script that runs the application. The Bourne and Korn shell syntax shown creates the environment variable only for the command on the same line.

If you preload libExbridge and your application spawns a shell, it is possible for the application to fail. In that event, you must relink the application using libExbridge instead of preloading the library.

1.4.3 Structuring the Interface

The files and libraries present a C interface.

Sometimes a library is coded in C++ for convenience, yet presents only a C interface to the outside world. Put simply, having a C interface means that a client cannot tell the program was written in C++. More specifically, having a C interface means that all of the following are true:

If a library meets the C-interface criteria, it can be used wherever a C library can be used. In particular, such libraries can be compiled with one version of the C++ compiler and linked with object files compiled with a different version, provided they do not mix exception handling.

However, if any of these conditions are violated, the files and libraries cannot be linked together. If an attempted link succeeds, which is doubtful, the program does not run correctly.
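The following is a rough sketch of the general shape such a library might take, not a statement of the exact criteria. All names (Counter, counter_create, counter_bump, counter_destroy) are hypothetical. The C++ class is hidden behind an opaque handle, the exported functions have C linkage, and no exception leaves the library.

```cpp
#include <new>   // std::nothrow

namespace {
    // Implementation detail, invisible to clients of the library.
    class Counter {
    public:
        Counter() : n_(0) {}
        int bump() { return ++n_; }
    private:
        int n_;
    };
}

extern "C" {

// Returns an opaque handle, or a null pointer if allocation fails.
// The nothrow form of new is used so that no exception can escape.
void* counter_create()
{
    return new (std::nothrow) Counter;
}

// Increments the counter behind the handle and returns the new value.
int counter_bump(void* handle)
{
    return static_cast<Counter*>(handle)->bump();
}

// Releases the object behind the handle.
void counter_destroy(void* handle)
{
    delete static_cast<Counter*>(handle);
}

} // extern "C"
```

A client sees only three C functions and a void* handle, so nothing in the interface depends on class layout or on C++ name mangling.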

Note that if you use the C compiler (cc) to link an application with a C-interface library, and if that library needs C++ run-time support, then you must create a dependency on either libC (compatibility mode) or libCrun (standard mode) using one of the following methods. If the C-interface library does not need C++ run-time support, then you do not need to link with libC or libCrun.


1.5 Conditional Expressions

The C++ standard introduced a change in the rules for conditional expressions. The difference shows up only in an expression like

e ? a : b = c

The critical issue is having an assignment following the colon when no grouping parentheses are present.

The 4.2 compiler used the original C++ rule and treated that expression as if you had written

(e ? a : b) = c

That is, the value of c will be assigned to either a or b depending on the value of e.

The compiler now uses the new C++ rule in both compatibility and standard mode. It treats that expression as if you had written

e ? a : (b = c)

That is, c will be assigned to b if and only if e is false.

Solution: Always use parentheses to indicate which meaning you intend. You can then be sure the code will have the same meaning when compiled by any compiler.
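The two meanings can be written out explicitly, as in this sketch. The function names are assumptions chosen for illustration. Each function returns a reference to the object the expression designates, so the effect of the parentheses is easy to observe.

```cpp
// Old rule, made explicit: c is assigned to a or to b, depending on e.
int& assign_selected(bool e, int& a, int& b, int c)
{
    return (e ? a : b) = c;
}

// New rule, made explicit: c is assigned to b only when e is false;
// when e is true, a is selected and nothing is assigned.
int& select_or_assign_b(bool e, int& a, int& b, int c)
{
    return e ? a : (b = c);
}
```

With a equal to 1 and b equal to 2, assign_selected(true, a, b, 9) changes a to 9, whereas select_or_assign_b(true, a, b, 9) changes neither variable.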


1.6 Function Pointers and void*

In C there is no implicit conversion between pointer-to-function and void*. The ARM added an implicit conversion between function pointers and void* "if the value would fit." C++ 4.2 implemented that rule. The implicit conversion was later removed from C++, since it causes unexpected function overloading behavior, and because it reduces portability of code. In addition, there is no longer any conversion, even with a cast, between pointer-to-function and void*.

The compiler now issues a warning for implicit and explicit conversions between pointer-to-function and void*. In standard mode, the compiler no longer recognizes such implicit conversions when resolving overloaded function calls. Such code that compiled with the 4.2 compiler now generates an error (no matching function) in standard mode. (The compiler emits an anachronism warning in compatibility mode.) If you have code that depends on the implicit conversion for proper overload resolution, you need to add a cast. For example:

int g(int);
typedef void (*fptr)();
int f(void*);
int f(fptr);
void foo()
{
    f(g);          // This line has different behavior
}

With the 4.2 compiler, the marked line in the code example calls f(void*). Now, in standard mode, there is no match, and you get an error message. You can add an explicit cast, such as f((void*)g), but you will get a warning because the code violates the C++ standard. Conversions between function pointers and void* are valid on all versions of the Solaris operating environment, but are not portable to all platforms.

C++ does not have a "universal function pointer" corresponding to void*. With C++ on all supported platforms, all function pointers have the same size and representation. You can therefore use any convenient function pointer type to hold the value of any function pointer. This solution is portable to most platforms. As always, you must convert the pointer value back to its original type before attempting to call the function that is pointed to. See also Section 3.11, Pointers to extern "C" Functions.
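A minimal sketch of that technique follows. The names generic_fptr, twice, store, and call_stored are hypothetical. A function pointer is converted to a convenient function pointer type for storage, and converted back to its original type before the call.

```cpp
// Any function pointer type will do as the storage type.
typedef void (*generic_fptr)();

// Example function whose address is stored.
int twice(int x)
{
    return 2 * x;
}

// Converts a pointer to int(int) into the storage type.
generic_fptr store(int (*f)(int))
{
    return reinterpret_cast<generic_fptr>(f);
}

// Converts the stored pointer back to its original type, then calls it.
// Calling through generic_fptr directly would be an error in the program.
int call_stored(generic_fptr p, int arg)
{
    int (*f)(int) = reinterpret_cast<int (*)(int)>(p);
    return f(arg);
}
```

The conversion from one function pointer type to another and back again yields the original pointer value, so call_stored(store(twice), 21) calls twice(21).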


1.7 Anticipating Future Mangling Changes

There are some instances where the compiler does not meet the C++ standard regarding declarations that refer to the same entity. In these instances, your program will not get the correct linking behavior. To avoid this problem, follow these rules. When the mangling problem is fixed in a later release, the names will still be mangled in the same way.

Declaring a value parameter const is not supposed to have any effect on the function signature or on how the function can be called, so don't declare it const.

int f(const int); // the const has no meaning, don't use it
int f(int);       // do this instead
int f(const int i) { ... } // don't do this
int f(int i) { ... }       // do this instead

Unfortunately, there is no direct workaround for this declaration.
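If you want the parameter value to be immutable inside the function body, one possible approach, sketched here under the assumption that a local copy is acceptable, is to leave the parameter non-const and copy it into a const local variable. The function name clamp_to_limit and the limit of 100 are hypothetical.

```cpp
// Declaration: the value parameter is not declared const.
int clamp_to_limit(int value);

// Definition: the parameter list matches the declaration exactly,
// so both are mangled the same way.
int clamp_to_limit(int value)
{
    const int limit = 100;   // const locals do not affect the signature
    const int v = value;     // immutable copy of the argument
    return v > limit ? limit : v;
}
```

Because const appears only on local variables, the declaration and the definition name the same function under any mangling scheme.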

If you can't avoid code that is affected by this mangling problem, for example because it occurs in headers or libraries that you don't own, you can use weak symbols to equate a declaration with its definition, as shown in the following example.

int cpp_function( int arg ) { return arg; }
#pragma weak "__1c_missing_mangled_name" = cpp_function

You must use the mangled name versions in these types of declarations.

1.7.1 Symptoms of Improper Mangling

The compiler does not always mangle names consistently when your code has any of the features identified as problem areas in Section 1.7, Anticipating Future Mangling Changes. The symptom is that the program fails to link and the linker complains that a symbol cannot be found. The unmangled name in the linker error message refers to a function or object that is, in fact, defined. However, because the compiler mangled a reference to the symbol differently from the symbol definition, the linker cannot match up the names. Consider the following example:

main.cc
--------
int foo(int); // no "const" in declaration
int main()
{
    return foo(1);
}
 
file1.cc
---------
int foo(const int k) // "const" added to parameter declaration
{
    return k;
}
 
example% CC main.cc file1.cc
main.cc:
file1.cc:
Undefined                        first referenced
  symbol                            in file
int foo(int)                        main.o
ld: fatal: Symbol referencing errors. No output written to a.out

You can see the reason for the failure by inspecting the names emitted by the compiler into the object files:

% nm main.o | grep foo
[2]    |        0|      0|NOTY |GLOB |0   |UNDEF |__1cDfoo6Fi_i_
% nm file1.o | grep foo
[2]     |     16|     40|FUNC |GLOB |0   |2    |__1cDfoo6Fki_i_

In main.o, the compiler emitted a reference to function foo that was mangled differently from the way the name was mangled in the definition of function foo in file1.o. As described in Section 1.7, Anticipating Future Mangling Changes, the workaround is not to use const in the declaration of foo's parameter.

Programs that declare foo consistently with a const parameter, and programs that declare foo consistently with a non-const parameter, will compile and link successfully.

If we fixed the compiler bug, some programs that link now would stop linking. For example, suppose a third-party binary library contained file1.o. If we fixed the compiler bug, no declaration of foo would allow a program to link to the foo in that library. If we do not fix the bug, you can declare foo with a const parameter and successfully link to the library.

Fortunately, all known compiler bugs related to name mangling result in "impossible" mangled names. That is, the invalid mangled name will never accidentally refer to the mangled name of some other function or object. You can always add extra symbols to fix problems caused by incorrectly mangled names, as described elsewhere in the Migration Guide.