Linker and Libraries Guide

Chapter 2 Link-Editor

The link-editing process creates an output file from one or more input files. The creation of the output file is directed by the options supplied to the link-editor together with the input sections provided by the input files.

All files are represented in the executable and linking format (ELF). For a complete description of the ELF format see Chapter 7, Object File Format. For this introduction, however, two ELF structures must first be described: sections and segments.

Sections are the smallest indivisible units that can be processed within an ELF file. A segment is a collection of sections and is the smallest individual unit that can be mapped to a memory image by exec(2) or by the runtime linker ld.so.1(1).

Although there are many types of ELF sections, they all fall into two categories with respect to the link-editing phase: sections that contain program data, and sections that contain link-editing information.

Basically, the link-editor concatenates the program data sections into the output file. The link-editing information sections are interpreted by the link-editor to modify other sections or to generate new output information sections used in later processing of the output file.

The following simple breakdown of link-editor functionality introduces the topics covered in this chapter:

The process of concatenating like sections and associating sections to segments is carried out using default information within the link-editor. The default section and segment handling provided by the link-editor is usually sufficient for most link-edits. However, these defaults can be manipulated using the -M option with an associated mapfile. See Chapter 8, Mapfile Option.

Invoking the Link-Editor

You can either run the link-editor directly from the command line or have a compiler driver invoke it for you. The following two sections describe both methods in more detail. However, using the compiler driver is the preferred choice, because the compilation environment is often the consequence of a complex and occasionally changing series of operations known only to compiler drivers.

Direct Invocation

When you invoke the link-editor directly, you have to supply every object file and library required to create the intended output. The link-editor makes no assumptions about the object modules or libraries that you meant to use in creating the output. For example, when you issue the command:


$ ld test.o

the link-editor creates a dynamic executable named a.out using only the input file test.o. For the a.out to be a useful executable, it should include startup and exit processing code. This code can be language or operating system specific, and is usually provided through files supplied by the compiler drivers.

You can also supply your own initialization and termination code. This code must be encapsulated and labeled correctly to be recognized and made available to the runtime linker. This encapsulation and labeling can also be provided through files supplied by the compiler drivers.

When creating runtime objects such as executables and shared objects, you should use a compiler driver to invoke the link-editor. Invoking the link-editor directly is recommended only when creating intermediate relocatable objects with the -r option.

Using a Compiler Driver

The conventional way to use the link-editor is through a language-specific compiler driver. You supply the compiler driver, cc(1), CC(1), and so forth, with the input files that make up your application. The compiler driver adds additional files and default libraries to complete the link-edit. These additional files can be seen by expanding the compilation invocation, for example:


$ cc -# -o prog main.o
/usr/ccs/bin/ld -dy /opt/COMPILER/crti.o /opt/COMPILER/crt1.o \
/usr/ccs/lib/values-Xt.o -o prog main.o \
-YP,/opt/COMPILER/lib:/usr/ccs/lib:/usr/lib -Qy -lc \
/opt/COMPILER/crtn.o

Note –

The actual files included by your compiler driver and the mechanism used to display the link-editor invocation might differ.


Specifying the Link-Editor Options

Most options to the link-editor can be passed through the compiler driver command line. For the most part the compiler and the link-editor options do not conflict. Where a conflict arises, the compiler drivers usually provide a command-line syntax you can use to pass specific options to the link-editor. You can also provide options to the link-editor by setting the LD_OPTIONS environment variable. For example:


$ LD_OPTIONS="-R /home/me/libs -L /home/me/libs" cc -o prog main.c -lfoo

The -R and -L options are interpreted by the link-editor and prepended to any command-line options received from the compiler driver.

The link-editor parses the entire option list for any invalid options or any options with invalid associated arguments. When either of these cases is found, a suitable error message is generated. If the error is deemed fatal, the link-edit terminates. In the following example, the illegal option -X is identified, and the illegal argument to the -z option is caught by the link-editor's checking.


$ ld -X -z sillydefs main.o
ld: illegal option -- X
ld: fatal: option -z has illegal argument `sillydefs'

If an option requiring an associated argument is mistakenly specified twice, the link-editor will provide a suitable warning but will continue with the link-edit. For example:


$ ld -e foo ...... -e bar main.o
ld: warning: option -e appears more than once, first setting taken

The link-editor also checks the option list for any fatal inconsistencies. For example:


$ ld -dy -a main.o
ld: fatal: option -dy and -a are incompatible

After processing all options, if no fatal error conditions have been detected, the link-editor proceeds to process the input files.

See Appendix A, Link-Editor Quick Reference for the most commonly used link-editor options, and the ld(1) man page for a complete description of all link-editor options.

Input File Processing

The link-editor reads input files in the order in which they appear on the command line. Each file is opened and inspected to determine its ELF file type and therefore determine how it must be processed. The file types that apply as input for the link-edit are determined by the binding mode of the link-edit, either static or dynamic.

Under static mode, the link-editor accepts only relocatable objects or archive libraries as input files. Under dynamic mode, the link-editor also accepts shared objects.

Relocatable objects represent the most basic input file type to the link-editing process. The program data sections within these files are concatenated into the output file image being generated. The link-edit information sections are organized for later use, but do not become part of the output file image, as new sections are generated to take their places. Symbols are gathered into an internal symbol table for verification and resolution. This table is then used to create one or more symbol tables in the output image.

Although any input file can be specified directly on the link-edit command-line, archive libraries and shared objects are commonly specified using the -l option. See Linking With Additional Libraries for coverage of this mechanism and how it relates to the two different linking modes. However, even though shared objects are often referred to as shared libraries, and both of these objects can be specified using the same option, the interpretation of shared objects and archive libraries is quite different. The next two sections expand upon these differences.

Archive Processing

Archives are built using ar(1), and usually consist of a collection of relocatable objects with an archive symbol table. This symbol table provides an association of symbol definitions with the objects that supply these definitions. By default, the link-editor provides selective extraction of archive members. When the link-editor reads an archive, it uses information within the internal symbol table it is creating to select only the objects from the archive it requires to complete the binding process. You can also explicitly extract all members of an archive.

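The link-editor extracts a relocatable object from an archive if the object contains a symbol definition that satisfies an undefined symbol reference, or a data symbol definition that satisfies a tentative symbol definition, presently held in the link-editor's internal symbol table, or if the -z allextract option is in effect.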

Under selective archive extraction, a weak symbol reference does not extract an object from an archive unless the -z weakextract option is in effect. See Simple Resolutions for more information.


Note –

The options -z weakextract, -z allextract, and -z defaultextract enable you to toggle the archive extraction mechanism among multiple archives.


With selective archive extraction, the link-editor makes multiple passes through an archive to extract relocatable objects as needed to satisfy the symbol information being accumulated in the link-editor internal symbol table. After the link-editor has made a complete pass through the archive without extracting any relocatable objects, it moves on to process the next input file.
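The multiple-pass behavior described above can be sketched in C. This is a toy model under stated assumptions, not the link-editor's implementation: struct member, struct symtab, add_ref(), add_def(), and process_archive() are hypothetical names, symbols are plain strings, and weak references and tentative definitions are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS       32   /* capacity of this toy symbol table */
#define MAX_PER_MEMBER 4    /* symbols listed per archive member */

/* Hypothetical model of an archive member: names it defines and references. */
struct member {
    const char *name;
    const char *defs[MAX_PER_MEMBER];   /* NULL-terminated */
    const char *refs[MAX_PER_MEMBER];   /* NULL-terminated */
    bool extracted;
};

/* Toy stand-in for the link-editor's internal symbol table. */
struct symtab {
    const char *undef[MAX_SYMS];
    size_t nundef;
    const char *defined[MAX_SYMS];
    size_t ndefined;
};

static bool contains(const char *const *list, size_t n, const char *sym)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], sym) == 0)
            return true;
    return false;
}

/* Record a reference; it stays undefined until a definition arrives. */
void add_ref(struct symtab *t, const char *sym)
{
    if (contains(t->defined, t->ndefined, sym) ||
        contains(t->undef, t->nundef, sym))
        return;
    assert(t->nundef < MAX_SYMS);
    t->undef[t->nundef++] = sym;
}

/* Record a definition and drop any matching undefined entry. */
void add_def(struct symtab *t, const char *sym)
{
    for (size_t i = 0; i < t->nundef; i++) {
        if (strcmp(t->undef[i], sym) == 0) {
            t->undef[i] = t->undef[--t->nundef];
            break;
        }
    }
    if (!contains(t->defined, t->ndefined, sym)) {
        assert(t->ndefined < MAX_SYMS);
        t->defined[t->ndefined++] = sym;
    }
}

/* Repeat passes until one pass extracts nothing; return the pass count. */
int process_archive(struct symtab *t, struct member *members, size_t n)
{
    int passes = 0;
    bool extracted_any;

    do {
        extracted_any = false;
        passes++;
        for (size_t i = 0; i < n; i++) {
            struct member *m = &members[i];
            bool wanted = false;

            if (m->extracted)
                continue;
            /* A member is wanted if it defines a currently undefined symbol. */
            for (size_t d = 0; d < MAX_PER_MEMBER && m->defs[d]; d++)
                if (contains(t->undef, t->nundef, m->defs[d]))
                    wanted = true;
            if (!wanted)
                continue;
            m->extracted = true;
            extracted_any = true;
            for (size_t d = 0; d < MAX_PER_MEMBER && m->defs[d]; d++)
                add_def(t, m->defs[d]);
            for (size_t r = 0; r < MAX_PER_MEMBER && m->refs[r]; r++)
                add_ref(t, m->refs[r]);
        }
    } while (extracted_any);
    return passes;
}
```

In this model, an archive whose members appear in the order a.o, b.o, c.o (where c.o references b, and b.o references a) needs four passes to satisfy an initial reference to c, whereas the order c.o, b.o, a.o needs only two. Members that define nothing currently undefined are never extracted.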

Because the link-editor extracts from the archive only the relocatable objects needed at the time the archive is encountered, the position of the archive within the input file list can be significant. See Position of an Archive on the Command Line.


Note –

Although the link-editor makes multiple passes through an archive to resolve symbols, this mechanism can be quite costly for large archives containing random organizations of relocatable objects. In these cases, you should use tools like lorder(1) and tsort(1) to order the relocatable objects within the archive and so reduce the number of passes the link-editor must carry out.


Shared Object Processing

Shared objects are indivisible whole units that have been generated by a previous link-edit of one or more input files. When the link-editor processes a shared object, the entire contents of the shared object become a logical part of the resulting output file image. This logical inclusion means that all symbol entries defined in the shared object are made available to the link-editing process. The shared object is not physically copied during the link-edit; its actual inclusion is deferred until process execution.

The shared object's program data sections and most of the link-editing information sections are unused by the link-editor. These sections are interpreted by the runtime linker when the shared object is bound to generate a runnable process. However, the occurrence of a shared object is remembered, and information is stored in the output file image to indicate that this object is a dependency and must be made available at runtime.

By default, all shared objects specified as part of a link-edit are recorded as dependencies in the object being built. This recording is made regardless of whether the object being built actually references symbols offered by the shared object. To minimize runtime linking overhead, specify only those dependencies required to resolve symbol references from the object being built as part of the link-edit. The link-editor's debugging capabilities, and ldd(1) with the -u option, can be used to determine unused dependencies. Alternatively, the link-editor's -z ignore option can suppress the dependency recording of unused shared objects.

If a shared object has dependencies on other shared objects, these dependencies are also processed. This processing occurs after all command-line input files have been processed. These shared objects will be used to complete the symbol resolution process; however, their names will not be recorded as dependencies in the output file image being generated.

Although the position of a shared object on the link-edit command-line has less significance than it does for archive processing, the position can have a global effect. Multiple symbols of the same name are allowed to occur between relocatable objects and shared objects, and between multiple shared objects. See Symbol Resolution.

The order of shared objects processed by the link-editor is maintained in the dependency information stored in the output file image. As the runtime linker reads this information, it loads the specified shared objects in the same order. Therefore, when a symbol is multiply defined, both the link-editor and the runtime linker select its first occurrence.
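The first-occurrence rule can be illustrated with a minimal C sketch. The names struct object and first_definition() are hypothetical, and the model assumes each object is reduced to a list of the global symbols it defines, searched in processing order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEFS 8

/* Hypothetical model of an object and the global symbols it defines. */
struct object {
    const char *name;
    const char *defs[MAX_DEFS];   /* NULL-terminated */
};

/*
 * Return the first object, in processing order, that defines sym,
 * or NULL if no object defines it.
 */
const struct object *first_definition(const struct object *objs, size_t n,
                                      const char *sym)
{
    assert(objs != NULL && sym != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t d = 0; d < MAX_DEFS && objs[i].defs[d] != NULL; d++)
            if (strcmp(objs[i].defs[d], sym) == 0)
                return &objs[i];
    return NULL;
}
```

With the order prog, libA.so, libB.so, where both shared objects define foo, a lookup of foo in this model returns libA.so. Swapping the two shared objects in the list changes the result to libB.so.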


Note –

Multiple symbol definitions, and thus the information to describe the interposing of one definition of a symbol for another, are reported in the load map output generated using the -m option.


Linking With Additional Libraries

Although the compiler drivers often ensure that appropriate libraries are specified to the link-editor, frequently you must supply your own. Shared objects and archives can be specified by explicitly naming the input files required to the link-editor, but a more common and more flexible method involves using the link-editor's -l option.

Library Naming Conventions

By convention, shared objects are usually designated by the prefix lib and the suffix .so, and archives are designated by the prefix lib and the suffix .a. For example, libc.so is the shared object version of the standard C library made available to the compilation environment, and libc.a is the library's archive version.

These conventions are recognized by the -l option of the link-editor. This option is commonly used to supply additional libraries to a link-edit. The following example directs the link-editor to search for libfoo.so. If the link-editor does not find libfoo.so, it searches for libfoo.a before moving on to the next directory to be searched.


$ cc -o prog file1.c file2.c -lfoo

Note –

There is a naming convention regarding the compilation environment and the runtime environment use of shared objects. The compilation environment uses the simple .so suffix, whereas the runtime environment commonly uses the suffix with an additional version number. See Naming Conventions and Coordination of Versioned Filenames.


When link-editing in dynamic mode, you can choose to link with a mix of shared objects and archives. When link-editing in static mode, only archive libraries are acceptable for input.

When in dynamic mode and using the -l option to enable a library search, the link-editor will first search in a given directory for a shared object that matches the specified name. If no match is found, the link-editor looks for an archive library in the same directory. When in static mode and using the -l option, only archive libraries are sought.
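As a rough illustration of this search order, the following C sketch lists the file names that would be tried for a given -l name. The function lib_candidates() and the NAME_CAP limit are hypothetical; the sketch only builds names and does not touch the file system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define NAME_CAP 128   /* maximum length of one candidate path, including NUL */

/*
 * Fill out[] with the candidate paths for -l<name>, in search order.
 * In dynamic mode each directory is tried for lib<name>.so and then
 * lib<name>.a; in static mode only lib<name>.a is sought.
 * Returns the number of candidates written (at most max_out).
 */
size_t lib_candidates(const char *const *dirs, size_t ndirs, const char *name,
                      bool dynamic_mode, char out[][NAME_CAP], size_t max_out)
{
    size_t n = 0;

    assert(dirs != NULL && name != NULL && out != NULL);
    assert(strlen(name) > 0);
    for (size_t i = 0; i < ndirs; i++) {
        if (dynamic_mode && n < max_out)
            (void) snprintf(out[n++], NAME_CAP, "%s/lib%s.so", dirs[i], name);
        if (n < max_out)
            (void) snprintf(out[n++], NAME_CAP, "%s/lib%s.a", dirs[i], name);
    }
    return n;
}
```

For the directories /usr/ccs/lib and /usr/lib and the name foo, dynamic mode yields four candidates, alternating .so and .a within each directory, while static mode yields only the two .a names.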

Linking With a Mix of Shared Objects and Archives

The library search mechanism in dynamic mode searches a given directory for a shared object, and then for an archive library. Finer control of the type of search required is possible through the -B option.

By specifying the -B dynamic and -B static options on the command line as many times as required, you can toggle the library search between shared objects or archives respectively. For example, to link an application with the archive libfoo.a and the shared object libbar.so, issue the following command:


$ cc -o prog main.o file1.c -Bstatic -lfoo -Bdynamic -lbar

The -B static and -B dynamic keywords are not exactly symmetrical. When you specify -B static, the link-editor does not accept shared objects as input until the next occurrence of -B dynamic. However, when you specify -B dynamic, the link-editor first looks for shared objects and then for archive libraries in any given directory.

The precise description of the previous example is that the link-editor first searches for libfoo.a, and then for libbar.so, and if that search fails, for libbar.a. Finally, it searches for libc.so, and if that search fails, libc.a.

Position of an Archive on the Command Line

The position of an archive on the command line can affect the output file being produced. The link-editor searches an archive only to resolve undefined or tentative external references it has previously seen. After this search is completed and any required members have been extracted, the link-editor moves on to the next input file on the command line.

Therefore, by default, the archive is not available to resolve any new references from the input files that follow the archive on the command line. For example, the following command directs the link-editor to search libfoo.a only to resolve symbol references that have been obtained from file1.c. The libfoo.a archive is not available to resolve symbol references from file2.c or file3.c.


$ cc -o prog file1.c -Bstatic -lfoo file2.c file3.c -Bdynamic

Note –

You should specify any archives at the end of the command line unless multiple-definition conflicts require you to do otherwise.


In some instances, archives are interdependent: references from members extracted from one archive are resolved by extracting members from another archive. If these dependencies are cyclic, the archives must be specified repeatedly on the command line to satisfy previous references. For example:


$ cc -o prog .... -lA -lB -lC -lA -lB -lC -lA

The determination, and maintenance, of repeated archive specifications can be tedious. The -z rescan option makes this process simpler. Following all input file processing, this option causes the entire archive list to be reprocessed in an attempt to locate additional archive members that resolve symbol references. This archive rescanning continues until a pass over the archive list occurs in which no new members are extracted. The previous example could therefore be simplified to:


$ cc -o prog -z rescan .... -lA -lB -lC
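The effect of rescanning can be sketched with a small C model. This is an illustration under stated assumptions rather than the link-editor's algorithm: symbols are represented as bits in an unsigned mask, and struct member, struct archive, and link_archives() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical member: bit masks of the symbols it defines and references. */
struct member {
    unsigned defs;
    unsigned refs;
    bool extracted;
};

struct archive {
    struct member *members;
    size_t n;
};

struct state {
    unsigned undefined;   /* symbols referenced but not yet defined */
    unsigned defined;     /* symbols defined so far */
};

/* Selectively extract from one archive; return true if anything was taken. */
static bool scan_archive(struct state *st, struct archive *ar)
{
    bool any = false;
    bool progress;

    do {
        progress = false;
        for (size_t i = 0; i < ar->n; i++) {
            struct member *m = &ar->members[i];

            if (m->extracted || (m->defs & st->undefined) == 0)
                continue;
            m->extracted = true;
            st->defined |= m->defs;
            st->undefined = (st->undefined | m->refs) & ~st->defined;
            progress = any = true;
        }
    } while (progress);
    return any;
}

/*
 * Process archives once in command-line order. With rescan set, keep
 * reprocessing the whole list until a pass extracts no new members.
 * Returns the mask of symbols that remain undefined.
 */
unsigned link_archives(unsigned initial_undefined, struct archive *ars,
                       size_t n, bool rescan)
{
    struct state st = { initial_undefined, 0 };
    bool extracted;

    assert(ars != NULL);
    for (size_t i = 0; i < n; i++)
        (void) scan_archive(&st, &ars[i]);
    if (rescan) {
        do {
            extracted = false;
            for (size_t i = 0; i < n; i++)
                if (scan_archive(&st, &ars[i]))
                    extracted = true;
        } while (extracted);
    }
    return st.undefined;
}
```

In a cyclic case where a member of archive A references a symbol in archive B, and that member of B in turn references another member of A, a single pass over A then B leaves one symbol undefined in this model. Enabling the rescan loop resolves it without listing A a second time.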

Directories Searched by the Link-Editor

All previous examples assume the link-editor knows where to search for the libraries listed on the command line. By default, when linking 32-bit objects, the link-editor knows of only two standard directories in which to look for libraries, /usr/ccs/lib and /usr/lib. When linking 64-bit objects, only one standard directory is used, /usr/lib/64. All other directories to be searched must be added to the link-editor's search path explicitly.

You can change the link-editor search path in two ways: using a command-line option, or using an environment variable.

Using a Command-Line Option

You can use the -L option to add a new path name to the library search path. This option affects the search path at the point it is encountered on the command line. For example, the following command searches path1, then /usr/ccs/lib and /usr/lib, to find libfoo. It searches path1 and then path2, and then /usr/ccs/lib and /usr/lib, to find libbar.


$ cc -o prog main.o -Lpath1 file1.c -lfoo file2.c -Lpath2 -lbar

Path names defined using the -L option are used only by the link-editor. These path names are not recorded in the output file image created for use by the runtime linker.
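The positional effect of -L can be sketched in C as follows. The function search_path_at() is a hypothetical helper that handles only the attached forms -Ldir and -lname used in the example above; it is not a model of full ld(1) option parsing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append dir to the colon-separated list in buf; return -1 if it does not fit. */
static int append_dir(char *buf, size_t cap, const char *dir)
{
    size_t used = strlen(buf);
    size_t len = strlen(dir);

    if (used + len + 2 > cap)
        return -1;
    if (used > 0)
        buf[used++] = ':';
    memcpy(buf + used, dir, len + 1);   /* copies the terminating NUL too */
    return 0;
}

/*
 * Walk args left to right and build the search path in effect when
 * -l<lib> is reached: every -L directory seen so far, then the defaults.
 * Returns 0 on success, -1 if -l<lib> is absent or buf is too small.
 */
int search_path_at(const char *const *args, size_t nargs, const char *lib,
                   const char *defaults, char *buf, size_t cap)
{
    assert(args != NULL && lib != NULL && defaults != NULL && buf != NULL);
    assert(cap > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < nargs; i++) {
        const char *arg = args[i];

        if (strncmp(arg, "-L", 2) == 0 && arg[2] != '\0') {
            if (append_dir(buf, cap, arg + 2) != 0)
                return -1;
        } else if (strncmp(arg, "-l", 2) == 0 && strcmp(arg + 2, lib) == 0) {
            return append_dir(buf, cap, defaults);
        }
    }
    return -1;
}
```

Applied to the arguments of the preceding cc command with the defaults /usr/ccs/lib:/usr/lib, this sketch reports path1 followed by the defaults for foo, and path1:path2 followed by the defaults for bar.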


Note –

You must specify -L if you want the link-editor to search for libraries in your current directory. You can use a period (.) to represent the current directory.


You can use the -Y option to change the default directories searched by the link-editor. The argument supplied with this option takes the form of a colon separated list of directories. For example, the following command searches for libfoo only in the directories /opt/COMPILER/lib and /home/me/lib.


$ cc -o prog main.c -YP,/opt/COMPILER/lib:/home/me/lib -lfoo

The directories specified using the -Y option can be supplemented by using the -L option.

Using an Environment Variable

You can also use the environment variable LD_LIBRARY_PATH, which takes a colon-separated list of directories, to add to the link-editor's library search path. In its most general form, LD_LIBRARY_PATH takes two directory lists separated by a semicolon. The first list is searched before the lists supplied on the command line, and the second list is searched after.

The following example shows the combined effect of setting LD_LIBRARY_PATH and calling the link-editor with several -L occurrences:


$ LD_LIBRARY_PATH="dir1:dir2;dir3"
$ export LD_LIBRARY_PATH
$ cc -o prog main.c -Lpath1 ... -Lpath2 ... -Lpathn -lfoo

The effective search path is dir1:dir2:path1:path2... pathn:dir3:/usr/ccs/lib:/usr/lib.

If no semicolon is specified as part of the LD_LIBRARY_PATH definition, the specified directory list is interpreted after any -L options. In the following example, the effective search path is path1:path2... pathn:dir1:dir2:/usr/ccs/lib:/usr/lib.


$ LD_LIBRARY_PATH=dir1:dir2
$ export LD_LIBRARY_PATH
$ cc -o prog main.c -Lpath1 ... -Lpath2 ... -Lpathn -lfoo
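The two cases above can be reproduced with a short C sketch that assembles the effective search path. The function effective_search_path() is a hypothetical helper written for illustration; it splits LD_LIBRARY_PATH at the first semicolon only and performs no validation of the directories.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append len bytes of s to buf as one colon-separated component. */
static void append(char *buf, size_t cap, const char *s, size_t len)
{
    size_t used = strlen(buf);

    if (len == 0)
        return;
    assert(used + len + 2 <= cap);   /* room for ':' and the final NUL */
    if (used > 0)
        buf[used++] = ':';
    memcpy(buf + used, s, len);
    buf[used + len] = '\0';
}

/*
 * Build the effective search path: the part of LD_LIBRARY_PATH before
 * a semicolon, then the -L directories, then the rest of LD_LIBRARY_PATH,
 * then the defaults. Without a semicolon the whole variable follows -L.
 */
void effective_search_path(const char *ld_library_path,
                           const char *const *l_dirs, size_t n_l_dirs,
                           const char *defaults, char *buf, size_t cap)
{
    const char *before = "";
    const char *after = "";
    size_t before_len = 0;
    size_t after_len = 0;

    assert(buf != NULL && cap > 0 && defaults != NULL);
    buf[0] = '\0';
    if (ld_library_path != NULL) {
        const char *semi = strchr(ld_library_path, ';');

        if (semi != NULL) {
            before = ld_library_path;
            before_len = (size_t)(semi - ld_library_path);
            after = semi + 1;
        } else {
            after = ld_library_path;
        }
        after_len = strlen(after);
    }
    append(buf, cap, before, before_len);
    for (size_t i = 0; i < n_l_dirs; i++)
        append(buf, cap, l_dirs[i], strlen(l_dirs[i]));
    append(buf, cap, after, after_len);
    append(buf, cap, defaults, strlen(defaults));
}
```

Called with "dir1:dir2;dir3", the -L directories path1 and path2, and the defaults /usr/ccs/lib:/usr/lib, the sketch produces dir1:dir2:path1:path2:dir3:/usr/ccs/lib:/usr/lib. With "dir1:dir2" it produces path1:path2:dir1:dir2:/usr/ccs/lib:/usr/lib.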

Note –

This environment variable can also be used to augment the search path of the runtime linker. See Directories Searched by the Runtime Linker. To prevent this environment variable from influencing the link-editor, use the -i option.


Directories Searched by the Runtime Linker

The runtime linker only looks in one default location for dependencies. This location is /usr/lib when processing 32-bit objects, and /usr/lib/64 when processing 64-bit objects. All other directories to be searched must be added to the runtime linker's search path explicitly.

When a dynamic executable or shared object is linked with additional shared objects, these shared objects are recorded as dependencies. These dependencies must be located during process execution by the runtime linker. During the link-edit, one or more search paths can be recorded in the output file. These search paths are used by the runtime linker to locate any dependencies. These recorded search paths are referred to as a runpath.

Specialized objects may be built with the -z nodefaultlib option to suppress any search of the default location at runtime. Use of this option implies that all the dependencies of an object can be located using its runpaths. Without this option, no matter how you augment the runtime linker's search path, its last element is always the default location: /usr/lib for 32-bit objects and /usr/lib/64 for 64-bit objects.


Note –

The default search path can be administered using a runtime configuration file. See Configuring the Default Search Paths. However, the creator of an object should not rely on the existence of this file. You should always ensure that an object can locate its dependencies with only its runpaths or the default location.


You can use the -R option, which takes a colon-separated list of directories, to record a runpath in a dynamic executable or shared object. The following example records the runpath /home/me/lib:/home/you/lib in the dynamic executable prog.


$ cc -o prog main.c -R/home/me/lib:/home/you/lib -Lpath1 \
-Lpath2 file1.c file2.c -lfoo -lbar

The runtime linker uses these paths, followed by the default location, to obtain any shared object dependencies. In this case, this runpath is used to locate libfoo.so.1 and libbar.so.1.

The link-editor accepts multiple -R options. These multiple specifications are concatenated, separated by a colon. Thus, the previous example can also be expressed as follows.


$ cc -o prog main.c -R/home/me/lib -Lpath1 -R/home/you/lib \
-Lpath2 file1.c file2.c -lfoo -lbar

For objects that may be installed in various locations, the $ORIGIN dynamic string token provides a flexible means of recording a runpath. See Locating Associated Dependencies.


Note –

A historic alternative to specifying the -R option is to set the environment variable LD_RUN_PATH, and make this available to the link-editor. The scope and function of LD_RUN_PATH and -R are identical, but when both are specified, -R supersedes LD_RUN_PATH.


Initialization and Termination Sections

Dynamic objects may supply code that provides for runtime initialization and termination processing. This code can be encapsulated in one of two section types, either an array of function pointers or a single code block. Each of these section types is built from a concatenation of like sections from the input relocatable objects.

The sections .preinit_array, .init_array and .fini_array provide arrays of runtime pre-initialization, initialization, and termination functions, respectively. When creating a dynamic object, the link-editor identifies these arrays with the .dynamic tag pairs DT_PREINIT_[ARRAY/ARRAYSZ], DT_INIT_[ARRAY/ARRAYSZ], and DT_FINI_[ARRAY/ARRAYSZ] accordingly. These tags identify the associated sections so they may be called by the runtime linker. A pre-initialization array is applicable to dynamic executables only.

The sections .init and .fini provide a runtime initialization and termination code block, respectively. However, the compiler drivers typically supply .init and .fini sections with files they add to the beginning and end of your input file list. These files have the effect of encapsulating the .init and .fini code into individual functions. These functions are identified by the reserved symbol names _init and _fini respectively. When creating a dynamic object, the link-editor identifies these symbols with the .dynamic tags DT_INIT and DT_FINI accordingly. These tags identify the associated sections so they may be called by the runtime linker.

For more information regarding the execution of initialization and termination code at runtime see Initialization and Termination Routines.

The registration of initialization and termination functions can be carried out directly by the link-editor using the -z initarray and -z finiarray options. For example, the following command places the address of foo() in an .init_array element, and the address of bar() in a .fini_array element.


$ cat main.c
#include    <stdio.h>

void foo()
{
        (void) printf("initializing: foo()\n");
}

void bar()
{
        (void) printf("finalizing: bar()\n");
}

int main()
{
        (void) printf("main()\n");
        return (0);
}

$ cc -o main -zinitarray=foo -zfiniarray=bar main.c
$ main
initializing: foo()
main()
finalizing: bar()

The creation of initialization and termination sections can be carried out directly using an assembler. However, most compilers offer special primitives to simplify their declaration. For example, the previous code example can be rewritten using the following #pragma definitions. These definitions result in a call to foo() being placed in an .init section, and a call to bar() being placed in a .fini section.


$ cat main.c
#include    <stdio.h>

#pragma init (foo)
#pragma fini (bar)

.......
$ cc -o main main.c
$ main
initializing: foo()
main()
finalizing: bar()
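Other compilers offer comparable primitives. As a hedged illustration only, the following C sketch assumes GCC or Clang on an ELF target rather than the compiler driver shown above, and uses their constructor and destructor function attributes. On typical modern ELF toolchains these attributes register the functions through initialization and termination arrays rather than .init and .fini code blocks, so the mechanism is related to, but not the same as, the #pragma form.

```c
#include <assert.h>
#include <stdio.h>

static int init_ran;   /* set by foo() so that later code can observe it ran */

/* Runs before main() when built with GCC or Clang (assumed toolchain). */
__attribute__((constructor))
static void foo(void)
{
    init_ran = 1;
    (void) printf("initializing: foo()\n");
}

/* Runs after main() returns or exit() is called. */
__attribute__((destructor))
static void bar(void)
{
    assert(init_ran == 1);   /* the constructor ran before this destructor */
    (void) printf("finalizing: bar()\n");
}
```

When this code is linked into a program whose main() prints main(), the three messages are expected to appear in the same order as in the preceding example.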

Initialization and termination code, spread throughout several relocatable objects, can result in different behavior when included in an archive library or shared object. The link-edit of an application using this archive might extract only a fraction of the objects contained in the archive. These objects might provide only a portion of the initialization and termination code spread throughout the members of the archive. At runtime, only this portion of code is executed. The same application built against the shared object will have all the accumulated initialization and termination code executed when the dependency is loaded at runtime.

Determining the order in which initialization and termination code is executed within a process at runtime is a complex issue involving dependency analysis. Limiting the content of initialization and termination code can simplify this analysis, while providing both flexible and predictable runtime behavior. See Initialization and Termination Order for more details.

If the initialization code is associated with a dynamic object whose memory can be dumped using dldump(3DL), data initialization should be independent of that code.

Symbol Processing

During input file processing, all local symbols from the input relocatable objects are passed through to the output file image. All global symbols are accumulated internally within the link-editor. Each global symbol supplied by a relocatable object is searched for within this internal symbol table. If a symbol with the same name has already been encountered from a previous input file, a symbol resolution process is called. This symbol resolution process determines which of the two entries is kept.

On completing input file processing, and providing no fatal error conditions have been encountered during symbol resolution, the link-editor determines if any unresolved symbol references remain. Unresolved symbol references can cause the link-edit to terminate.

Finally, the link-editor's internal symbol table is added to the symbol tables of the image being created.

The following sections expand upon symbol resolution and undefined symbol processing.

Symbol Resolution

Symbol resolution runs the entire spectrum, from simple and intuitive to complex and perplexing. Resolutions can be carried out silently by the link-editor, can be accompanied by warning diagnostics, or can result in a fatal error condition.

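The resolution of two symbols depends on their attributes, the type of file providing the symbol, and the type of file being generated. For a complete description of symbol attributes, see Symbol Table Section. For the following discussions, however, it is worth identifying three basic symbol types: undefined symbols, which have been referenced in a file but have not been assigned a storage address; tentative symbols, which have been created within a file but have not yet been sized or allocated in storage; and defined symbols, which have been created and assigned storage addresses and space within the file.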

In its simplest form, symbol resolution involves the use of a precedence relationship that has defined symbols dominating tentative symbols, which in turn dominate undefined symbols.
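This precedence relationship can be written down as a minimal C sketch. The enum and function names are hypothetical, and the sketch deliberately ignores binding, size, and file type, so any tie between two symbols of the same type is reported as not modeled rather than resolved.

```c
#include <assert.h>

/* Ordered so that a higher value dominates a lower one. */
enum sym_type { SYM_UNDEFINED, SYM_TENTATIVE, SYM_DEFINED };

enum outcome { KEEP_EXISTING, KEEP_INCOMING, NOT_MODELED };

/*
 * Decide which of two same-named entries survives, using only the
 * precedence defined > tentative > undefined. Ties depend on other
 * attributes (binding, size, file type) that this sketch leaves out.
 */
enum outcome resolve_simple(enum sym_type existing, enum sym_type incoming)
{
    assert(existing <= SYM_DEFINED && incoming <= SYM_DEFINED);
    if (incoming > existing)
        return KEEP_INCOMING;
    if (incoming < existing)
        return KEEP_EXISTING;
    return NOT_MODELED;
}
```

For example, an undefined reference already in the table gives way to an incoming tentative or defined symbol, whereas an existing defined symbol is kept when a tentative symbol of the same name arrives.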

The following C code example shows how these symbol types can be generated. Undefined symbols are prefixed with u_, tentative symbols are prefixed with t_, and defined symbols are prefixed with d_.


$ cat main.c
extern int      u_bar;
extern int      u_foo();

int             t_bar;
int             d_bar = 1;

int d_foo()
{
        return (u_foo(u_bar, t_bar, d_bar));
}
$ cc -o main.o -c main.c
$ nm -x main.o

[Index]   Value      Size      Type  Bind  Other Shndx   Name
...............
[8]     |0x00000000|0x00000000|NOTY |GLOB |0x0  |UNDEF  |u_foo
[9]     |0x00000000|0x00000040|FUNC |GLOB |0x0  |2      |d_foo
[10]    |0x00000004|0x00000004|OBJT |GLOB |0x0  |COMMON |t_bar
[11]    |0x00000000|0x00000000|NOTY |GLOB |0x0  |UNDEF  |u_bar
[12]    |0x00000000|0x00000004|OBJT |GLOB |0x0  |3      |d_bar

Simple Resolutions

Simple symbol resolutions are by far the most common, and result when two symbols with similar characteristics are detected and one symbol takes precedence over the other. This symbol resolution is carried out silently by the link-editor. For example, for symbols with the same binding, a reference to an undefined symbol from one file is bound to, or satisfied by, a defined or tentative symbol definition from another file. Or, a tentative symbol definition from one file is bound to a defined symbol definition from another file.

Symbols that undergo resolution can have either a global or weak binding. Weak bindings have lower precedence than global bindings, so symbols with different bindings are resolved according to a slight alteration of the basic rules.

Weak symbols can usually be defined via the compiler, either individually or as aliases to global symbols. One mechanism uses a #pragma definition:


$ cat main.c
#pragma weak    bar
#pragma weak    foo = _foo

int             bar = 1;

int _foo()
{
        return (bar);
}
$ cc -o main.o -c main.c
$ nm -x main.o
[Index]   Value      Size      Type  Bind  Other Shndx   Name
...............
[7]     |0x00000000|0x00000004|OBJT |WEAK |0x0  |3      |bar
[8]     |0x00000000|0x00000028|FUNC |WEAK |0x0  |2      |foo
[9]     |0x00000000|0x00000028|FUNC |GLOB |0x0  |2      |_foo

Notice that the weak alias foo is assigned the same attributes as the global symbol _foo. This relationship is maintained by the link-editor and results in the symbols being assigned the same value in the output image. In symbol resolution, weak defined symbols are silently overridden by any global definition of the same name.

Another form of simple symbol resolution, interposition, occurs between relocatable objects and shared objects, or between multiple shared objects. In these cases, when a symbol is multiply-defined, the relocatable object, or the first definition between multiple shared objects, is silently taken by the link-editor. The relocatable object's definition, or the first shared object's definition, is said to interpose on all other definitions. Interposition allows a dynamic executable, or another shared object, to override the functionality provided by a shared object.

The combination of weak symbols and interposition provides a useful programming technique. For example, the standard C library provides several services that you are allowed to redefine. However, ANSI C defines a set of standard services that must be present on the system and cannot be replaced in a strictly conforming program.

The function fread(3C), for example, is an ANSI C library function, whereas the system function read(2) is not. A conforming ANSI C program must be able to redefine read(2) and still use fread(3C) in a predictable way.

The problem here is that read(2) underlies the fread(3C) implementation in the standard C library. Therefore, a program that redefines read(2) might confuse the fread(3C) implementation. To guard against this occurrence, ANSI C states that an implementation cannot use a name that is not reserved for it. Using the following #pragma directive you can define just such a reserved name, and from it generate an alias for the function read(2).


#pragma weak read = _read

Thus, you can quite freely define your own read() function without compromising the fread(3C) implementation, which in turn is implemented to use the _read() function.

The link-editor will not have difficulty with your redefinition of read(), either when linking against the shared object or archive version of the standard C library. In the former case, interposition takes its course. In the latter case, the fact that the C library's definition of read(2) is weak allows that definition to be quietly overridden.
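
The library side of this arrangement might look like the following minimal sketch. The names _svc, svc, and buffered_svc are hypothetical stand-ins for _read(), read(), and fread(3C). The sketch assumes a compiler that accepts the alias form of #pragma weak shown earlier in this section.

```c
/*
 * Hypothetical library source. _svc plays the role of _read(),
 * svc the role of read(), and buffered_svc the role of fread().
 */

/* Publish svc as a weak alias of the reserved-style name _svc. */
#pragma weak svc = _svc

int svc(int);           /* public name that an application may redefine */

/* The real implementation lives under the reserved-style name. */
int
_svc(int n)
{
        return (n + 1);
}

/*
 * Library code calls _svc directly, so an application that supplies
 * its own svc() does not change the behavior of this function.
 */
int
buffered_svc(int n)
{
        return (_svc(n) * 2);
}
```

An application that defines its own global svc() overrides the weak alias, while buffered_svc() continues to reach the library's implementation through _svc().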

You can use the link-editor's -m option to write a list of all interposed symbol references, along with section load address information, to the standard output.

Complex Resolutions

Complex resolutions occur when two symbols of the same name are found with differing attributes. In these cases, the link-editor selects the most appropriate symbol and generates a warning message indicating the symbol, the attributes that conflict, and the identity of the file from which the symbol definition is taken. In the following example two files with a definition of the data item array have different size requirements.


$ cat foo.c
int array[1];

$ cat bar.c
int array[2] = { 1, 2 };

$ cc -dn -r -o temp.o foo.c bar.c
ld: warning: symbol `array' has differing sizes:
        (file foo.o value=0x4; file bar.o value=0x8);
        bar.o definition taken

A similar diagnostic is produced if the symbol's alignment requirements differ. In both of these cases, the diagnostic can be suppressed by using the link-editor's -t option.

Another form of attribute difference is the symbol's type. In the following example the symbol bar() has been defined as both a data item and a function.


$ cat foo.c
bar()
{
        return (0);
}
$ cc -o libfoo.so -G -K pic foo.c
$ cat main.c
int     bar = 1;

main()
{
        return (bar);
}
$ cc -o main main.c -L. -lfoo
ld: warning: symbol `bar' has differing types:
        (file main.o type=OBJT; file ./libfoo.so type=FUNC);
        main.o definition taken

Note –

Symbol types in this context are classifications that can be expressed in ELF. They are not related to the data types as employed by the programming language, except in the crudest fashion.


In cases like the previous example, the relocatable object definition is taken when the resolution occurs between a relocatable object and a shared object, or the first definition is taken when the resolution occurs between two shared objects. When such resolutions occur between symbols of different bindings (weak or global), a warning is also produced.

Inconsistencies between symbol types are not suppressed by the link-editor's -t option.

Fatal Resolutions

Symbol conflicts that cannot be resolved result in a fatal error condition. In this case, an appropriate error message is provided indicating the symbol name together with the names of the files that provided the symbols, and no output file is generated. Although the fatal condition is sufficient to terminate the link-edit, all input file processing is first completed. In this manner, all fatal resolution errors can be identified.

The most common fatal error condition exists when two relocatable objects both define symbols of the same name, and neither symbol is a weak definition:


$ cat foo.c
int bar = 1;

$ cat bar.c
bar()
{ 
        return (0);
}

$ cc -dn -r -o temp.o foo.c bar.c
ld: fatal: symbol `bar' is multiply-defined:
        (file foo.o and file bar.o);
ld: fatal: File processing errors. No output written to temp.o

foo.c and bar.c have conflicting definitions for the symbol bar. Because the link-editor cannot determine which should dominate, the link-edit usually terminates with an error message. You can use the link-editor's -z muldefs option to suppress this error condition, and allow the first symbol definition to be taken.
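
The simple and fatal resolution rules described so far can be summarized in a toy model. The following sketch is an illustration only, not the link-editor's implementation, and all type and function names are hypothetical. Cases that the text does not spell out (symbols whose states and bindings both differ, two weak definitions, two tentative definitions) are handled by assumption and are marked as such in the comments.

```c
/* Toy model of resolving two symbols of the same name. */

typedef enum { SYM_UNDEF, SYM_TENTATIVE, SYM_DEFINED } sym_state_t;
typedef enum { BIND_WEAK, BIND_GLOBAL } sym_bind_t;
typedef enum { TAKE_FIRST, TAKE_SECOND, RES_FATAL } sym_result_t;

typedef struct {
        sym_state_t     state;
        sym_bind_t      bind;
} sym_t;

sym_result_t
resolve(sym_t a, sym_t b)
{
        /*
         * A more complete state takes precedence:
         * defined over tentative over undefined.
         * Assumption: binding is ignored when the states differ.
         */
        if (a.state != b.state)
                return (a.state > b.state ? TAKE_FIRST : TAKE_SECOND);

        if (a.state == SYM_DEFINED) {
                /* A global definition silently overrides a weak one. */
                if (a.bind != b.bind)
                        return (a.bind == BIND_GLOBAL ?
                            TAKE_FIRST : TAKE_SECOND);
                /* Two global definitions are multiply-defined. */
                if (a.bind == BIND_GLOBAL)
                        return (RES_FATAL);
        }

        /* Assumption: in all remaining cases the first symbol is kept. */
        return (TAKE_FIRST);
}
```

In this model, the foo.c and bar.c example above corresponds to two symbols that are both SYM_DEFINED and BIND_GLOBAL, which yields RES_FATAL.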

Undefined Symbols

After all of the input files have been read and all symbol resolution is complete, the link-editor searches the internal symbol table for any symbol references that have not been bound to symbol definitions. These symbol references are referred to as undefined symbols. The effect of these undefined symbols on the link-edit process can vary according to the type of output file being generated, and possibly the type of symbol.

Generating an Executable Output File

When the link-editor is generating an executable output file, the link-editor's default behavior is to terminate with an appropriate error message should any symbols remain undefined. A symbol remains undefined when a symbol reference in a relocatable object is never matched to a symbol definition:


$ cat main.c
extern int foo();

main()
{
        return (foo());
}
$ cc -o prog main.c
Undefined           first referenced
 symbol                 in file
foo                     main.o
ld: fatal: Symbol referencing errors. No output written to prog

In a similar manner, when a shared object is used to create a dynamic executable, a symbol reference within that shared object that is never matched to a symbol definition also results in an undefined symbol:


$ cat foo.c
extern int bar;
foo()
{
        return (bar);
}

$ cc -o libfoo.so -G -K pic foo.c
$ cc -o prog main.c -L. -lfoo
Undefined           first referenced
 symbol                 in file
bar                     ./libfoo.so
ld: fatal: Symbol referencing errors. No output written to prog

If you want to allow undefined symbols, as in cases like the previous example, then the default fatal error condition can be suppressed by using the link-editor's -z nodefs option.


Note –

Take care when using the -z nodefs option. If an unavailable symbol reference is required during the execution of a process, a fatal runtime relocation error occurs. It may be possible to detect this error during the initial execution and testing of an application. However, more complex execution paths can result in this error condition taking much longer to detect, which can be time consuming and costly.


Symbols can also remain undefined when a symbol reference in a relocatable object is bound to a symbol definition in an implicitly defined shared object. For example, continuing with the files main.c and foo.c used in the previous example:


$ cat bar.c
int bar = 1;

$ cc -o libbar.so -R. -G -K pic bar.c -L. -lfoo
$ ldd libbar.so
        libfoo.so =>     ./libfoo.so

$ cc -o prog main.c -L. -lbar
Undefined           first referenced
 symbol                 in file
foo                     main.o  (symbol belongs to implicit \
                        dependency ./libfoo.so)
ld: fatal: Symbol referencing errors. No output written to prog

prog is built with an explicit reference to libbar.so. libbar.so has a dependency on libfoo.so, and therefore an implicit reference to libfoo.so from prog is established.

Because main.c made a specific reference to the interface provided by libfoo.so, prog really has a dependency on libfoo.so. However, only explicit shared object dependencies are recorded in the output file being generated. Thus, prog fails to run if a new version of libbar.so is developed that no longer has a dependency on libfoo.so.

For this reason, bindings of this type are deemed fatal, and the implicit reference must be made explicit by referencing the library directly during the link-edit of prog. The required reference is hinted at in the fatal error message shown in the preceding example.

Generating a Shared Object Output File

When the link-editor is generating a shared object output file, it allows undefined symbols to remain at the end of the link-edit. This default behavior allows the shared object to import symbols from either relocatable objects or from other shared objects when the object is used to create a dynamic executable.

The link-editor's -z defs option can be used to force a fatal error if any undefined symbols remain. This option is recommended when creating any shared objects. Shared objects that reference symbols from an application can use the -z defs option and define the application's symbols using the extern mapfile directive, as described in Defining Additional Symbols.

A self-contained shared object, in which all references to external symbols are satisfied by named dependencies, provides maximum flexibility. The shared object can be employed by many users without those users having to determine and establish dependencies to satisfy the shared object's requirements.

Weak Symbols

Weak symbol references that are not bound during a link-edit do not result in a fatal error condition, no matter what output file type is being generated.

If a static executable is being generated, the symbol is converted to an absolute symbol and assigned a value of zero.

If a dynamic executable or shared object is being produced, the symbol is left as an undefined weak reference and assigned the value zero. During process execution, the runtime linker searches for this symbol. If the runtime linker does not find a match, it binds the reference to an address of zero instead of generating a fatal runtime relocation error.

Historically, these undefined weak referenced symbols have been employed as a mechanism to test for the existence of functionality. For example, the following C code fragment might have been used in the shared object libfoo.so.1:


#pragma weak    foo

extern  void    foo(char *);

void bar(char * path)
{
        void (* fptr)(char *);

        if ((fptr = foo) != 0)
                (* fptr)(path);
}

When an application is built that references libfoo.so.1, the link-edit will complete successfully regardless of whether a definition for the symbol foo is found. If during execution of the application the function address tests nonzero, the function is called. However, if the symbol definition is not found, the function address tests zero and so it is not called.

Compilation systems view this address comparison technique as having undefined semantics, which can result in the test statement being removed under optimization. In addition, the runtime symbol binding mechanism places other restrictions on the use of this technique, which prevents a consistent model from being available for all dynamic objects.


Note –

The use of undefined weak references in this manner is discouraged. Instead, use dlsym(3DL) with the RTLD_DEFAULT handle as a means of testing for a symbol's existence. See Testing for Functionality.
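
As a minimal sketch of the recommended alternative, the following fragment tests for the function foo with dlsym(3DL) instead of comparing the address of a weak reference. The helper names symbol_exists and call_foo_if_present are hypothetical. The feature-test macro and the fallback definition are assumptions that apply to glibc-based systems and are not needed on Solaris.

```c
/*
 * Assumption: on glibc, RTLD_DEFAULT is declared only when _GNU_SOURCE
 * is defined before the first system header is included.
 */
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <stddef.h>

#ifndef RTLD_DEFAULT
#define RTLD_DEFAULT    ((void *)0)     /* glibc's value; fallback only */
#endif

/* Return nonzero if the default symbol search finds the named symbol. */
int
symbol_exists(const char *name)
{
        return (dlsym(RTLD_DEFAULT, name) != NULL);
}

/* Call foo(path) only if foo is available; return 1 if it was called. */
int
call_foo_if_present(char *path)
{
        void (*fptr)(char *);

        /* Converting the void * result is the usual dlsym() idiom. */
        fptr = (void (*)(char *))dlsym(RTLD_DEFAULT, "foo");
        if (fptr == NULL)
                return (0);
        (*fptr)(path);
        return (1);
}
```

Because the lookup is an ordinary function call made at runtime, the test cannot be removed by the compiler in the way that the address comparison can.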


Tentative Symbol Order Within the Output File

Contributions from input files usually appear in the output file in the order of their contribution. An exception occurs when processing tentative symbols and their associated storage. These symbols are not fully defined until their resolution is complete. If the resolution occurs as a result of encountering a defined symbol from a relocatable object, then the order of appearance is that which would have occurred for the definition.

If you need to control the ordering of a group of symbols, then any tentative definition should be redefined to a zero-initialized data item. For example, the following tentative definitions result in a reordering of the data items within the output file, compared to the original order described in the source file foo.c:


$ cat foo.c
char A_array[0x10];
char B_array[0x20];
char C_array[0x30];

$ cc -o prog main.c foo.c
$ nm -vx prog | grep array
[32]    |0x00020754|0x00000010|OBJT |GLOB |0x0  |15  |A_array
[34]    |0x00020764|0x00000030|OBJT |GLOB |0x0  |15  |C_array
[42]    |0x00020794|0x00000020|OBJT |GLOB |0x0  |15  |B_array

By defining these symbols as initialized data items, the relative ordering of these symbols within the input file is carried over to the output file:


$ cat foo.c
char A_array[0x10] = { 0 };
char B_array[0x20] = { 0 };
char C_array[0x30] = { 0 };

$ cc -o prog main.c foo.c
$ nm -vx prog | grep array
[32]    |0x000206bc|0x00000010|OBJT |GLOB |0x0  |12  |A_array
[42]    |0x000206cc|0x00000020|OBJT |GLOB |0x0  |12  |B_array
[34]    |0x000206ec|0x00000030|OBJT |GLOB |0x0  |12  |C_array

Defining Additional Symbols

Besides the symbols provided from input files, you can supply additional symbol references or definitions to a link-edit. In the simplest form, symbol references can be generated using the link-editor's -u option. Greater flexibility is provided with the link-editor's -M option and an associated mapfile that enables you to define symbol references and a variety of symbol definitions.

The -u option provides a mechanism for generating a symbol reference from the link-edit command line. This option can be used to perform a link-edit entirely from archives, or to provide additional flexibility in selecting the objects to extract from multiple archives. See section Archive Processing for an overview of archive extraction.

For example, perhaps you want to generate a dynamic executable from the relocatable object main.o, which refers to the symbols foo and bar. You want to obtain the symbol definition foo from the relocatable object foo.o contained in lib1.a, and the symbol definition bar from the relocatable object bar.o, contained in lib2.a.

However, the archive lib1.a also contains a relocatable object defining the symbol bar. This relocatable object presumably provides different functionality from the relocatable object provided in lib2.a. To specify the required archive extraction, you can use the following link-edit:


$ cc -o prog -L. -u foo -l1 main.o -l2

The -u option generates a reference to the symbol foo. This reference causes extraction of the relocatable object foo.o from the archive lib1.a. The first reference to the symbol bar occurs in main.o, which is encountered after lib1.a has been processed. Therefore, the relocatable object bar.o is obtained from the archive lib2.a.


Note –

This simple example assumes that the relocatable object foo.o from lib1.a does not directly or indirectly reference the symbol bar. If it does then the relocatable object bar.o is also extracted from lib1.a during its processing. See Archive Processing for a discussion of the link-editor's multi-pass processing of an archive.
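
The effect of the -u option on archive extraction can be sketched with a toy model in which each symbol is a bit flag. This is an illustration under simplifying assumptions, not the link-editor's implementation: all names are hypothetical, multiply-defined symbols are not diagnosed, and an archive member is extracted only when it defines a symbol that is currently undefined.

```c
/* Toy model of "cc -o prog -L. [-u foo] -l1 main.o -l2". */
enum { SYM_FOO = 1, SYM_BAR = 2, SYM_MAIN = 4 };

typedef struct {
        unsigned defs;          /* symbols the object defines */
        unsigned refs;          /* symbols the object references */
} obj_t;

typedef struct {
        unsigned defined;       /* definitions seen so far */
        unsigned undefined;     /* references not yet satisfied */
} link_t;

typedef struct {
        unsigned from_lib1;     /* bit mask of members taken from lib1.a */
        unsigned from_lib2;     /* bit mask of members taken from lib2.a */
        unsigned undefined;     /* symbols left undefined at the end */
} outcome_t;

/* Add one relocatable object to the link-edit. */
void
add_object(link_t *lk, obj_t o)
{
        lk->defined |= o.defs;
        lk->undefined = (lk->undefined | o.refs) & ~lk->defined;
}

/* Process an archive, making passes until no more members are extracted. */
unsigned
add_archive(link_t *lk, const obj_t *members, int n)
{
        unsigned extracted = 0;
        int i, progress;

        do {
                progress = 0;
                for (i = 0; i < n; i++) {
                        if ((extracted & (1u << i)) == 0 &&
                            (members[i].defs & lk->undefined) != 0) {
                                add_object(lk, members[i]);
                                extracted |= 1u << i;
                                progress = 1;
                        }
                }
        } while (progress);
        return (extracted);
}

outcome_t
link_prog(int with_u)
{
        /* lib1.a holds foo.o (bit 0) and its own bar.o (bit 1). */
        const obj_t lib1[] = { { SYM_FOO, 0 }, { SYM_BAR, 0 } };
        /* lib2.a holds the required bar.o (bit 0). */
        const obj_t lib2[] = { { SYM_BAR, 0 } };
        const obj_t main_o = { SYM_MAIN, SYM_FOO | SYM_BAR };
        link_t lk = { 0, 0 };
        outcome_t out;

        if (with_u)
                lk.undefined = SYM_FOO;                 /* -u foo */
        out.from_lib1 = add_archive(&lk, lib1, 2);      /* -l1 */
        add_object(&lk, main_o);                        /* main.o */
        out.from_lib2 = add_archive(&lk, lib2, 1);      /* -l2 */
        out.undefined = lk.undefined;
        return (out);
}
```

With the -u reference in place, link_prog(1) extracts only foo.o from lib1.a and bar.o from lib2.a. Without it, link_prog(0) extracts nothing from lib1.a, because no reference to foo exists when that archive is processed, and foo is left undefined.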


A more extensive set of symbol definitions can be provided using the link-editor's -M option and an associated mapfile. The syntax for these mapfile entries is:


[ name ] {
      scope:
            symbol [ = [ type ] [ value ] [ size ] [ extern ] ];
} [ dependency ];

name

A label for this set of symbol definitions, if present, identifies a version definition within the image. See Chapter 5, Application Binary Interfaces and Versioning.

scope

Indicates the visibility of the symbols' binding within the output file being generated. All symbols defined with a mapfile are treated as global in scope during the link-edit process. That is, they are resolved against any other symbols of the same name obtained from any of the input files. The following definitions, and aliases, define a symbol's visibility in the object being created:

default / global

Symbols of this scope remain visible to other external objects. References to such symbols from within the object are bound at runtime, thus allowing interposition to take place.

protected / symbolic

Symbols of this scope remain visible to other external objects. References to these symbols from within the object are bound at link-edit, thus preventing runtime interposition. This scope definition has the same effect as a symbol with STV_PROTECTED visibility. See Table 7–24.

hidden / local

Symbols of this scope are reduced to symbols with a local binding. Symbols of this scope are not visible to other external objects. This scope definition has the same effect as a symbol with STV_HIDDEN visibility. See Table 7–24.

eliminate

Symbols of this scope are hidden. Their symbol table entries are eliminated.

symbol

The name of the symbol required. If the name is not followed by one of the symbol attributes, type, value, size or extern, a symbol reference is created. This reference is exactly the same as would be generated using the -u option discussed earlier in this section. If the symbol name is followed by any symbol attributes, then a symbol definition is generated using the associated attributes.

When in local scope, this symbol name can be defined as the special auto-reduction directive “*”. This directive results in all global symbols, not explicitly defined to be global in the mapfile, receiving a local binding within any dynamic object file being generated.

type

Indicates the symbol type attribute. This attribute can be either data, function, or COMMON. The former two type attributes result in an absolute symbol definition. See Symbol Table Section. The latter type attribute results in a tentative symbol definition.

value

Indicates the value attribute and takes the form of Vnumber.

size

Indicates the size attribute and takes the form of Snumber.

extern

This keyword indicates the symbol is defined externally to the object being created. Undefined symbols flagged with the -z defs option can be suppressed with this keyword.

dependency

Represents a version definition that is inherited by this definition. See Chapter 5, Application Binary Interfaces and Versioning.

If either a version definition or the auto-reduction directive is specified, then versioning information is recorded in the image created. If this image is an executable or shared object, then any symbol reduction is also applied.

If the image being created is a relocatable object, then by default, no symbol reduction is applied. In this case, any symbol reductions are recorded as part of the versioning information. These reductions are applied when the relocatable object is finally used to generate an executable or shared object. The link-editor's -B reduce option can be used to force symbol reduction when generating a relocatable object.

A more detailed description of the versioning information is provided in Chapter 5, Application Binary Interfaces and Versioning.


Note –

To ensure interface definition stability, no wildcard expansion is provided for defining symbol names.


This section presents several examples of using the mapfile syntax.

The following example shows how three symbol references can be defined. These references are then used to extract members of an archive. Although this archive extraction can be achieved by specifying multiple -u options to the link-edit, this example also shows how the eventual scope of a symbol can be reduced to local.


$ cat foo.c
foo()
{
        (void) printf("foo: called from lib.a\n");
}
$ cat bar.c
bar()
{
        (void) printf("bar: called from lib.a\n");
}
$ cat main.c
extern  void    foo(), bar();

main()
{
        foo();
        bar();
}
$ ar -rc lib.a foo.o bar.o main.o
$ cat mapfile
{
        local:
                foo;
                bar;
        global:
                main;
};
$ cc -o prog -M mapfile lib.a
$ prog
foo: called from lib.a
bar: called from lib.a
$ nm -x prog | egrep "main$|foo$|bar$"
[28]    |0x00010604|0x00000024|FUNC |LOCL |0x0  |7      |foo
[30]    |0x00010628|0x00000024|FUNC |LOCL |0x0  |7      |bar
[49]    |0x0001064c|0x00000024|FUNC |GLOB |0x0  |7      |main

The significance of reducing symbol scope from global to local is covered in more detail in the section Reducing Symbol Scope.

The following example shows how two absolute symbol definitions can be defined. These definitions are then used to resolve the references from the input file main.c.


$ cat main.c
extern  int     foo();
extern  int     bar;

main()
{
        (void) printf("&foo = %x\n", &foo);
        (void) printf("&bar = %x\n", &bar);
}
$ cat mapfile
{
        global:
                foo = FUNCTION V0x400;
                bar = DATA V0x800;
};
$ cc -o prog -M mapfile main.c
$ prog
&foo = 400
&bar = 800
$ nm -x prog | egrep "foo$|bar$"
[37]    |0x00000800|0x00000000|OBJT |GLOB |0x0  |ABS    |bar
[42]    |0x00000400|0x00000000|FUNC |GLOB |0x0  |ABS    |foo

When obtained from an input file, symbol definitions for functions or data items are usually associated with elements of data storage. A mapfile definition is insufficient to be able to construct this data storage, so these symbols must remain as absolute values.

However, a mapfile can also be used to define a COMMON, or tentative, symbol. Unlike other types of symbol definition, tentative symbols do not occupy storage within a file, but define storage that must be allocated at runtime. Therefore, symbol definitions of this kind can contribute to the storage allocation of the output file being generated.

A feature of tentative symbols that differs from other symbol types is that their value attribute indicates their alignment requirement. A mapfile definition can therefore be used to realign tentative definitions obtained from the input files of a link-edit.

The following example shows the definition of two tentative symbols. The symbol foo defines a new storage region whereas the symbol bar is actually used to change the alignment of the same tentative definition within the file main.c.


$ cat main.c
extern  int     foo;
int             bar[0x10];

main()
{
        (void) printf("&foo = %x\n", &foo);
        (void) printf("&bar = %x\n", &bar);
}
$ cat mapfile
{
        global:
                foo = COMMON V0x4 S0x200;
                bar = COMMON V0x100 S0x40;
};
$ cc -o prog -M mapfile main.c
ld: warning: symbol `bar' has differing alignments:
        (file mapfile value=0x100; file main.o value=0x4);
        largest value applied
$ prog
&foo = 20940
&bar = 20900
$ nm -x prog | egrep "foo$|bar$"
[37]    |0x00020900|0x00000040|OBJT |GLOB |0x0  |16     |bar
[42]    |0x00020940|0x00000200|OBJT |GLOB |0x0  |16     |foo

Note –

This symbol resolution diagnostic can be suppressed by using the link-editor's -t option.
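
The addresses reported in this example are consistent with simple alignment arithmetic. As a sketch, the hypothetical helper align_up() below rounds an address up to a power-of-two alignment. It is an illustration only and is not part of the link-editor.

```c
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t
align_up(uintptr_t addr, uintptr_t align)
{
        return ((addr + align - 1) & ~(align - 1));
}
```

The symbol bar is placed at 0x20900, which is a multiple of its 0x100 alignment. The symbol foo follows at 0x20900 + 0x40 = 0x20940. That address already satisfies the alignment of 0x4, so align_up(0x20940, 0x4) leaves it unchanged.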


Reducing Symbol Scope

Symbol definitions defined to have local scope within a mapfile can be used to reduce the symbol's eventual binding. This mechanism can play an important role in reducing the symbol's visibility to future link-edits that use the generated file as part of their input. In fact, this mechanism can provide for the precise definition of a file's interface, and so restrict the functionality made available to others.

For example, say you want to generate a simple shared object from the files foo.c and bar.c. The file foo.c contains the global symbol foo, which provides the service that you want to make available to others. The file bar.c contains the symbols bar and str, which provide the underlying implementation of the shared object. The creation of a simple shared object usually results in all three of these symbols having global scope.


$ cat foo.c
extern  const char *    bar();

const char * foo()
{
        return (bar());
}
$ cat bar.c
const char * str = "returned from bar.c";

const char * bar()
{
        return (str);
}
$ cc -o lib.so.1 -G foo.c bar.c
$ nm -x lib.so.1 | egrep "foo$|bar$|str$"
[29]    |0x000104d0|0x00000004|OBJT |GLOB |0x0  |12     |str
[32]    |0x00000418|0x00000028|FUNC |GLOB |0x0  |6      |bar
[33]    |0x000003f0|0x00000028|FUNC |GLOB |0x0  |6      |foo

You can now use the functionality offered by this shared object as part of the link-edit of another application. References to the symbol foo are bound to the implementation provided by the shared object.

Because of their global binding, direct reference to the symbols bar and str is also possible. This can have dangerous consequences, as you might later change the implementation underlying the function foo. In so doing, you could unintentionally cause an existing application that had bound to bar or str to fail or misbehave.

Another consequence of the global binding of the symbols bar and str is that they can be interposed upon by symbols of the same name. The interposition of symbols within shared objects is covered in section Simple Resolutions. This interposition can be intentional and be used as a means of circumventing the intended functionality offered by the shared object. On the other hand, this interposition can be unintentional, the result of the same common symbol name used for both the application and the shared object.

When developing the shared object, you can protect against this scenario by reducing the scope of the symbols bar and str to a local binding. In the following example, the symbols bar and str are no longer available as part of the shared object's interface. Thus, these symbols cannot be referenced, or interposed upon, by an external object. You have effectively defined an interface for the shared object. This interface can be managed while hiding the details of the underlying implementation.


$ cat mapfile
{
        local:
                bar;
                str;
};
$ cc -o lib.so.1 -M mapfile -G foo.c bar.c
$ nm -x lib.so.1 | egrep "foo$|bar$|str$"
[27]    |0x000003dc|0x00000028|FUNC |LOCL |0x0  |6      |bar
[28]    |0x00010494|0x00000004|OBJT |LOCL |0x0  |12     |str
[33]    |0x000003b4|0x00000028|FUNC |GLOB |0x0  |6      |foo

This symbol scope reduction has an additional performance advantage. The symbolic relocations against the symbols bar and str that would have been necessary at runtime are now reduced to relative relocations. This reduces the runtime overhead of initializing and processing the shared object. See When Relocations are Performed for details of symbolic relocation overhead.

As the number of symbols processed during a link-edit increases, the ability to define each local scope reduction within a mapfile becomes harder to maintain. An alternative and more flexible mechanism enables you to define the shared object's interface in terms of the global symbols that should be maintained, and instructs the link-editor to reduce all other symbols to local binding. This mechanism is achieved using the special auto-reduction directive “*”. For example, the previous mapfile definition can be rewritten to define foo as the only global symbol required in the output file generated:


$ cat mapfile
lib.so.1.1
{
        global:
                foo;
        local:
                *;
};
$ cc -o lib.so.1 -M mapfile -G foo.c bar.c
$ nm -x lib.so.1 | egrep "foo$|bar$|str$"
[30]    |0x00000370|0x00000028|FUNC |LOCL |0x0  |6      |bar
[31]    |0x00010428|0x00000004|OBJT |LOCL |0x0  |12     |str
[35]    |0x00000348|0x00000028|FUNC |GLOB |0x0  |6      |foo

This example also defines a version name, lib.so.1.1, as part of the mapfile directive. This version name establishes an internal version definition that defines the file's symbolic interface. The creation of a version definition is recommended. The definition forms the foundation of an internal versioning mechanism that can be used throughout the evolution of the file. See Chapter 5, Application Binary Interfaces and Versioning.


Note –

If a version name is not supplied, the output file name is used to label the version definition. The versioning information created within the output file can be suppressed using the link-editor's -z noversion option.


Whenever a version name is specified, all global symbols must be assigned to a version definition. If any global symbols remain unassigned to a version definition, the link-editor generates a fatal error condition:


$ cat mapfile
lib.so.1.1 {
        global:
                foo;
};
$ cc -o lib.so.1 -M mapfile -G foo.c bar.c
Undefined           first referenced
 symbol                 in file
str                     bar.o  (symbol has no version assigned)
bar                     bar.o  (symbol has no version assigned)
ld: fatal: Symbol referencing errors. No output written to lib.so.1

The -B local option can be used to assert the auto-reduction directive “*” from the command line. The previous example could be compiled successfully with:


$ cc -o lib.so.1 -M mapfile -B local -G foo.c bar.c
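
The effect of the auto-reduction directive on a symbol's binding can be modeled as a simple rule: a global symbol keeps its global binding only if the mapfile names it explicitly. The following sketch is a toy model with hypothetical names, not the link-editor's implementation.

```c
#include <string.h>

typedef enum { BIND_LOCAL, BIND_GLOBAL } binding_t;

/* Symbols listed under "global:" in the example mapfile. */
static const char *const lib_globals[] = { "foo" };

/*
 * Return the binding that a global symbol receives when the mapfile
 * lists the given names as global and also contains "local: *;".
 */
binding_t
reduced_binding(const char *name, const char *const *globals,
    size_t nglobals)
{
        size_t i;

        for (i = 0; i < nglobals; i++) {
                if (strcmp(name, globals[i]) == 0)
                        return (BIND_GLOBAL);
        }
        return (BIND_LOCAL);    /* caught by the "*" directive */
}
```

Applied to the symbols of lib.so.1, this rule keeps foo global and reduces bar and str to local, which matches the nm(1) output shown earlier for the auto-reduction mapfile.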

When generating an executable or shared object, any symbol reduction results in the recording of version definitions within the output image, together with the reduction of the appropriate symbols. When generating a relocatable object, the version definitions are created but the symbol reductions are not processed. The result is that the symbol entries for any symbol reductions still remain global. For example, using the previous mapfile with the auto-reduction directive and associated relocatable objects, an intermediate relocatable object is created that shows no symbol reduction.


$ cat mapfile
lib.so.1.1 {
        global:
                foo;
        local:
                *;
};
$ ld -o lib.o -M mapfile -r foo.o bar.o
$ nm -x lib.o | egrep "foo$|bar$|str$"
[17]    |0x00000000|0x00000004|OBJT |GLOB |0x0  |3      |str
[19]    |0x00000028|0x00000028|FUNC |GLOB |0x0  |1      |bar
[20]    |0x00000000|0x00000028|FUNC |GLOB |0x0  |1      |foo

The version definitions created within this image show that symbol reductions are required. When the relocatable object is used eventually to generate an executable or shared object, the symbol reductions occur. In other words, the link-editor reads and interprets symbol reduction information contained in relocatable objects in the same manner as it processes the data from a mapfile.

Thus, the intermediate relocatable object produced in the previous example can now be used to generate a shared object:


$ ld -o lib.so.1 -G lib.o
$ nm -x lib.so.1 | egrep "foo$|bar$|str$"
[22]    |0x000104a4|0x00000004|OBJT |LOCL |0x0  |14     |str
[24]    |0x000003dc|0x00000028|FUNC |LOCL |0x0  |8      |bar
[36]    |0x000003b4|0x00000028|FUNC |GLOB |0x0  |8      |foo

Symbol reduction at the point at which an executable or shared object is created is the most common requirement. However, symbol reductions can be forced to occur when creating a relocatable object by using the link-editor's -B reduce option.


$ ld -o lib.o -M mapfile -B reduce -r foo.o bar.o
$ nm -x lib.o | egrep "foo$|bar$|str$"
[15]    |0x00000000|0x00000004|OBJT |LOCL |0x0  |3      |str
[16]    |0x00000028|0x00000028|FUNC |LOCL |0x0  |1      |bar
[20]    |0x00000000|0x00000028|FUNC |GLOB |0x0  |1      |foo

Symbol Elimination

An extension to symbol reduction is the elimination of a symbol entry from an object's symbol table. Local symbols are only maintained in an object's .symtab symbol table. This entire table can be removed from the object using the link-editor's -s option, or strip(1). On occasion, you might want to maintain the .symtab symbol table but remove selected local symbol definitions from it.

Symbol elimination can be carried out using the mapfile directive eliminate. As with the local directive, symbols can be individually defined, or the symbol name can be defined as the special auto-elimination directive “*”. The following example shows the elimination of the symbol bar from the previous symbol reduction example.


$ cat mapfile
lib.so.1.1
{
        global:
                foo;
        local:
                str;
        eliminate:
                *;
};
$ cc -o lib.so.1 -M mapfile -G foo.c bar.c
$ nm -x lib.so.1 | egrep "foo$|bar$|str$"
[31]    |0x00010428|0x00000004|OBJT |LOCL |0x0  |12     |str
[35]    |0x00000348|0x00000028|FUNC |GLOB |0x0  |6      |foo

The -B eliminate option can be used to assert the auto-elimination directive “*” from the command line.
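
The combined effect of the global, local, and eliminate directives on a symbol table can be modeled in a few lines. The following Python sketch is an illustration only and is not how the link-editor is implemented. The function name apply_mapfile and its parameters are hypothetical, and the model assumes that every input symbol starts out global.

```python
def apply_mapfile(symbols, global_names, local_names,
                  eliminate_names=(), auto=None):
    """Model the scope that each global input symbol ends up with.

    auto is the directive that the special name "*" was listed under:
    None, "local", or "eliminate". Returns a dict mapping each symbol
    that remains in .symtab to its binding ("GLOB" or "LOCL").
    """
    result = {}
    for name in symbols:
        if name in global_names:
            result[name] = "GLOB"          # explicitly kept global
        elif name in local_names:
            result[name] = "LOCL"          # explicitly reduced to local
        elif name in eliminate_names or auto == "eliminate":
            continue                       # removed from the symbol table
        elif auto == "local":
            result[name] = "LOCL"          # caught by "local: *"
        else:
            result[name] = "GLOB"          # no directive applies
    return result


if __name__ == "__main__":
    # Mirrors the mapfile above: foo global, str local, everything else eliminated.
    print(apply_mapfile(["foo", "bar", "str"], {"foo"}, {"str"},
                        auto="eliminate"))
```

With the inputs shown, only foo (global) and str (local) remain, which agrees with the nm(1) output in the example above.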

External Bindings

When a symbol reference from the object being created is satisfied by a definition within a shared object, the symbol remains undefined. The relocation information associated with the symbol provides for its lookup at runtime. The shared object that provided the definition typically becomes a dependency.

The runtime linker employs a default search model to locate this definition at runtime. It typically searches each object, starting with the dynamic executable, and progressing through each dependency in the same order in which the objects were loaded.

Objects can also be created using the link-editor's -B direct option. With this option the relationship between the referenced symbol and the object that provides the symbol's definition is maintained within the object being created. The runtime linker uses this information to directly bind the reference to the object that defines the symbol, thus bypassing the default symbol search model. Direct binding information can only be established to dependencies specified with the link-edit. Therefore, use of the -z defs option is recommended. Direct binding can significantly reduce the symbol lookup processing required at runtime. See Direct Binding for more details on this runtime binding model.
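
The difference between the default search model and direct binding can be sketched as follows. This Python model is a simplification for illustration. The object names, the function names, and the fallback to the default search when no direct binding is recorded are assumptions of the sketch, and the runtime linker's actual processing is more involved.

```python
def default_search(symbol, load_order):
    """Return the first object, in load order, that defines symbol."""
    for name, defined in load_order:
        if symbol in defined:
            return name
    return None


def direct_bind(symbol, recorded, load_order):
    """Use the defining object recorded at link-edit time, if any.

    recorded maps a symbol name to the dependency that supplied its
    definition during the link-edit. Symbols with no recorded binding
    fall back to the default search (an assumption of this sketch).
    """
    target = recorded.get(symbol)
    if target is not None:
        return target
    return default_search(symbol, load_order)


if __name__ == "__main__":
    # Hypothetical process: two dependencies both define foo.
    load_order = [
        ("prog", {"main"}),
        ("libA.so.1", {"foo"}),
        ("libB.so.1", {"foo", "bar"}),
    ]
    recorded = {"foo": "libB.so.1"}
    print(default_search("foo", load_order))
    print(direct_bind("foo", recorded, load_order))
```

In this hypothetical process the default search binds foo to libA.so.1 because that object was loaded first, while the direct binding goes straight to libB.so.1 without searching the earlier objects.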

String Table Compression

String tables are compressed by the link-editor by removing duplicate entries and tail substrings. This compression can significantly reduce the size of any string tables. A compressed .dynstr table can produce a smaller text segment and hence reduce runtime paging activity. Because of these benefits, string table compression is enabled by default.

Linking objects that contribute a very large number of symbols may increase the link-edit time due to the string table compression. To avoid this cost during development, use the link-editor's -z nocompstrtab option. Any string table compression performed during a link-edit can be displayed using the link-editor's debugging tokens -D strtab,detail.
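
The idea of removing duplicate entries and tail substrings can be sketched as follows. This Python model is an illustration under stated assumptions and is not the link-editor's algorithm. The function name compress_strtab and the sample names are hypothetical.

```python
def compress_strtab(names):
    """Build a string table that shares duplicates and tail substrings.

    Returns (table, offsets): the table contents as bytes, and a dict
    mapping each name to its offset within the table.
    """
    table = bytearray(b"\0")      # offset 0 holds the empty string
    offsets = {}
    # Process longer names first so that a name which is the tail of
    # another name is seen after the longer name has been stored.
    for name in sorted(set(names), key=lambda n: (-len(n), n)):
        encoded = name.encode() + b"\0"
        # Because the search includes the terminating null byte, a match
        # can only occur at the tail of an entry that is already stored.
        pos = table.find(encoded)
        if pos < 0:
            pos = len(table)
            table += encoded
        offsets[name] = pos
    return bytes(table), offsets


if __name__ == "__main__":
    table, offsets = compress_strtab(["printf", "fprintf", "sprintf", "printf"])
    print(len(table), offsets)
```

In this sample, printf occupies no space of its own because it is the tail of fprintf. The four input names, which would need 31 bytes if stored separately after the leading null byte, fit in a 17-byte table.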

Generating the Output File

After all input file processing and symbol resolution have completed with no fatal errors, the link-editor can start generating the output file. The link-editor establishes the additional sections that must be generated to complete the output file. These sections include the symbol tables that contain local symbol definitions from the input files, together with the global and weak symbol information that has been collected in the link-editor's internal symbol table.

Also included are any output relocation and dynamic information sections required by the runtime linker. After all the output section information has been established, the total output file size is calculated and the output file image is created accordingly.

When creating a dynamic executable or shared object, two symbol tables are usually generated. The .dynsym table and its associated string table .dynstr contain register (even if these are local), global, weak, and section symbols. These sections become part of the text segment that is mapped as part of the process image at runtime (see the mmap(2) man page). This enables the runtime linker to read these sections and perform any necessary relocations.

The .symtab table, and its associated string table .strtab, contain all the symbols collected from the input file processing. These sections are not mapped as part of the process image. They can even be stripped from the image using the link-editor's -s option, or after the link-edit using strip(1).

During the generation of the symbol tables, reserved symbols are created. These symbols have special meaning to the linking process and should not be defined in your code.

_etext

The first location after the text segment.

_edata

The first location after initialized data.

_end

The first location after all data.

_DYNAMIC

The address of the dynamic information section (the .dynamic section).

_END_

The same as _end. The symbol has local scope and, together with _START_, provides a means of establishing an object's address range.

_GLOBAL_OFFSET_TABLE_

The position-independent reference to a link-editor supplied table of addresses, the .got section. This table is constructed from position-independent data references occurring in objects that have been compiled with the -K pic option. See Position-Independent Code.

_PROCEDURE_LINKAGE_TABLE_

The position-independent reference to a link-editor supplied table of addresses, the .plt section. This table is constructed from position-independent function references occurring in objects that have been compiled with the -K pic option. See Position-Independent Code.

_START_

The first location within the text segment. The symbol has local scope and, together with _END_, provides a means of establishing an object's address range.

When generating an executable, the link-editor looks for additional symbols to define the executable's entry point. If a symbol was specified using the link-editor's -e option, that symbol is used. Otherwise the link-editor looks for the reserved symbol names _start, and then main. If none of these symbols exists, the first address of the text segment is used.
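
The order in which the entry point is chosen can be summarized in a short sketch. The function name select_entry, its parameters, and the sample addresses are hypothetical. The sketch also assumes that a symbol named with -e is defined, because the behavior for an undefined entry symbol is not described here.

```python
def select_entry(defined, text_start, e_option=None):
    """Return the entry-point address for an executable.

    defined maps symbol names to addresses, text_start is the first
    address of the text segment, and e_option is the symbol named
    with -e, if any.
    """
    if e_option is not None:
        # Assumption: the symbol named with -e exists in defined.
        return defined[e_option]
    for name in ("_start", "main"):
        if name in defined:
            return defined[name]
    return text_start


if __name__ == "__main__":
    symbols = {"_start": 0x10400, "main": 0x10500, "init": 0x10600}
    print(hex(select_entry(symbols, 0x10000)))
    print(hex(select_entry(symbols, 0x10000, e_option="init")))
```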

Relocation Processing

After the output file has been created, all data sections from the input files are copied to the new image. Any relocations specified by the input files are applied to the output image. Any additional relocation information that must be generated is also written to the new image.

Relocation processing is normally uneventful, although error conditions might arise that are accompanied by specific error messages. Two conditions are worth more discussion. The first condition involves text relocations that result from position-dependent code. This condition is covered in more detail in Position-Independent Code. The second condition can arise from displacement relocations, which are described more fully in the next section.

Displacement Relocations

Error conditions might occur if displacement relocations are applied to a data item, which itself can be used in a copy relocation. The details of copy relocations are covered in Copy Relocations.

A displacement relocation remains valid when both the relocated offset and the target to which it is relocated remain separated by the same displacement. A copy relocation is one where a global data item within a shared object is copied to the .bss of an executable, to preserve the executable's read-only text segment. If the copied data has a displacement relocation applied to it, or an external relocation is a displacement into the copied data, the displacement relocation becomes invalidated.
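
A small worked example shows why a copy relocation invalidates a displacement into the copied data. The addresses and names in this Python sketch are hypothetical and were chosen only to make the arithmetic visible. The sketch models the stored value of a displacement relocation as the target address minus the address of the relocated offset.

```python
def disp_value(place, target):
    """Value stored by a displacement relocation: target minus place."""
    return target - place


def resolve(place, stored):
    """Address reached at runtime by applying the stored displacement."""
    return place + stored


# Hypothetical addresses within a shared object mapped at lib_base.
lib_base = 0xFF200000
place = lib_base + 0x194           # location that holds the displacement
bar_in_lib = lib_base + 0x10500    # original location of bar[]

# The displacement is fixed when the shared object is built.
stored = disp_value(place, bar_in_lib)

# A copy relocation places the live copy of bar[] in the executable's .bss.
bar_copy = 0x00020800

if __name__ == "__main__":
    print(hex(stored))
    print(resolve(place, stored) == bar_in_lib)   # still reaches the original
    print(resolve(place, stored) == bar_copy)     # does not reach the copy
```

The stored displacement still leads to the original location of bar[] within the shared object, while the executable and every other object now use the copy in the .bss. The two locations are no longer separated by the displacement that was recorded.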

The areas to address in trying to catch these sorts of errors are:

To help diagnose these problem areas, the link-editor indicates the displacement relocation use of a dynamic object with one or more dynamic DT_FLAGS_1 flags, as shown in Table 7–45. In addition, the link-editor's -z verbose option can be used to display suspicious relocations.

For example, say you create a shared object with a global data item, bar[], which has a displacement relocation applied to it. This item could be copy-relocated if referenced from a dynamic executable. The link-editor warns of this condition with:


$ cc -G -o libfoo.so.1 -z verbose -K pic foo.o
ld: warning: relocation warning: R_SPARC_DISP32: file foo.o: symbol foo: \
    displacement relocation to be applied to the symbol bar: at 0x194: \
    displacement relocation will be visible in output image

If you now create an application that references the data item bar[], a copy relocation will be created which results in the displacement relocation being invalidated. Because the link-editor can explicitly discover this situation, an error message is generated regardless of the use of the -z verbose option.


$ cc -o prog prog.o -L. -lfoo
ld: warning: relocation error: R_SPARC_DISP32: file foo.so: symbol foo: \
    displacement relocation applied to the symbol bar at: 0x194: \
    the symbol bar is a copy relocated symbol

Note –

ldd(1), when used with either the -d or -r options, uses the displacement dynamic flags to generate similar relocation warnings.


These error conditions can be avoided by ensuring that the symbol definition being relocated (offset) and the symbol target of the relocation are both local. Use static definitions or the link-editor's scoping technology. See Reducing Symbol Scope. Relocation problems such as these can be avoided by accessing data within shared objects using functional interfaces.

Debugging Aids

A debugging library is provided with the Solaris linkers. This library enables you to trace the link-editing process in more detail. This library helps you understand, or debug, the link-edit of your own applications or libraries. Although the type of information displayed using this library is expected to remain constant, the exact format of the information might change slightly from release to release.

Some of the debugging output might be unfamiliar if you do not have an intimate knowledge of the ELF format. However, many aspects might be of general interest to you.

Debugging is enabled by using the -D option, and all output produced is directed to the standard error. This option must be augmented with one or more tokens to indicate the type of debugging required. The tokens available can be displayed by typing -D help at the command line.


$ ld -Dhelp
debug:
debug:            For debugging the link-editing of an application:
debug:                  LD_OPTIONS=-Dtoken1,token2 cc -o prog ...
debug:            or,
debug:                  ld -Dtoken1,token2 -o prog ...
debug:            where placement of -D on the command line is significant
debug:            and options can be switched off by prepending with `!'.
debug:
debug:
debug: args       display input argument processing
debug: basic      provide basic trace information/warnings
debug: detail     provide more information in conjunction with other options
debug: entry      display entrance criteria descriptors
debug: files      display input file processing (files and libraries)
debug: got        display GOT symbol information
debug: help       display this help message
debug: libs       display library search paths; detail flag shows actual
debug:              library lookup (-l) processing
debug: map        display map file processing
debug: move       display move section processing
debug: reloc      display relocation processing
debug: sections   display input section processing
debug: segments   display available output segments and address/offset
debug:              processing; detail flag shows associated sections
debug: statistics display processing statistics
debug: strtab     display information about string table compression; detail
debug:              shows layout of string tables
debug: support    display support library processing
debug: symbols    display symbol table processing; detail flag shows
debug:              internal symbol table addition and resolution (ld only)
debug: tls        display TLS processing info
debug: unused     display unused/unreferenced files; detail flag shows
debug:              unused sections (ld only)
debug: versions   display version processing

Note –

This listing is an example, and shows the options meaningful to the link-editor. The exact options might differ from release to release.


Most compiler drivers interpret the -D option during their preprocessing phase. Therefore, the LD_OPTIONS environment variable is a suitable mechanism for passing this option to the link-editor.

The following example shows how input files can be traced. This syntax can be especially useful in determining what libraries have been located, or what relocatable objects have been extracted from an archive during a link-edit.


$ LD_OPTIONS=-Dfiles cc -o prog main.o -L. -lfoo
............
debug: file=main.o  [ ET_REL ]
debug: file=./libfoo.a  [ archive ]
debug: file=./libfoo.a(foo.o)  [ ET_REL ]
debug: file=./libfoo.a  [ archive ] (again)
............

Here the member foo.o is extracted from the archive library libfoo.a to satisfy the link-edit of prog. Notice that the archive is searched twice to verify that the extraction of foo.o did not warrant the extraction of additional relocatable objects. More than one “(again)” display indicates that the archive is a candidate for ordering using lorder(1) and tsort(1).
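
The repeated archive search can be modeled as a loop that rescans the archive until a complete pass extracts nothing. The following Python sketch is an illustration only. The function name extract_members, the member names, and the representation of an archive as a list of tuples are hypothetical.

```python
def extract_members(undefined, archive):
    """Model archive extraction as repeated passes over the members.

    undefined is the set of symbols still needed, and archive is a list
    of (member, defines, references) tuples in archive order. Returns
    (extracted, passes). Every pass after the first corresponds to one
    "(again)" line in the debugging output.
    """
    undefined = set(undefined)
    defined = set()
    extracted = []
    passes = 0
    progress = True
    while progress:
        passes += 1
        progress = False
        for member, defines, references in archive:
            if member in extracted:
                continue
            if defines & undefined:
                extracted.append(member)
                defined |= defines
                # New references from the member can create new needs.
                undefined = (undefined | references) - defined
                progress = True
    return extracted, passes


if __name__ == "__main__":
    unordered = [("a.o", {"a"}, set()), ("b.o", {"b"}, {"a"})]
    ordered = [("b.o", {"b"}, {"a"}), ("a.o", {"a"}, set())]
    print(extract_members({"b"}, unordered))   # needs an extra pass
    print(extract_members({"b"}, ordered))     # one pass plus verification
```

In this model the unordered archive needs three passes, because a.o is only found to be required after b.o has been extracted. Placing b.o ahead of a.o reduces the work to one extracting pass followed by one verifying pass.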

By using the symbols token, you can determine which symbol caused an archive member to be extracted, and which object made the initial symbol reference.


$ LD_OPTIONS=-Dsymbols cc -o prog main.o -L. -lfoo
............
debug: symbol table processing; input file=main.o  [ ET_REL ]
............
debug: symbol[7]=foo  (global); adding
debug:
debug: symbol table processing; input file=./libfoo.a  [ archive ]
debug: archive[0]=bar
debug: archive[1]=foo  (foo.o) resolves undefined or tentative symbol
debug:
debug: symbol table processing; input file=./libfoo.a(foo.o)  [ ET_REL ]
.............

The symbol foo is referenced by main.o and is added to the link-editor's internal symbol table. This symbol reference causes the extraction of the relocatable object foo.o from the archive libfoo.a.


Note –

This output has been simplified for this document.


By using the detail token together with the symbols token, the details of symbol resolution during input file processing can be observed.


$ LD_OPTIONS=-Dsymbols,detail cc -o prog main.o -L. -lfoo
............
debug: symbol table processing; input file=main.o  [ ET_REL ]
............
debug: symbol[7]=foo  (global); adding
debug:   entered  0x000000 0x000000 NOTY GLOB  UNDEF REF_REL_NEED
debug:
debug: symbol table processing; input file=./libfoo.a  [ archive ]
debug: archive[0]=bar
debug: archive[1]=foo  (foo.o) resolves undefined or tentative symbol
debug:
debug: symbol table processing; input file=./libfoo.a(foo.o)  [ ET_REL ]
debug: symbol[1]=foo.c
.............
debug: symbol[7]=bar  (global); adding
debug:   entered  0x000000 0x000004 OBJT GLOB  3     REF_REL_NEED
debug: symbol[8]=foo  (global); resolving [7][0]
debug:       old  0x000000 0x000000 NOTY GLOB  UNDEF main.o
debug:       new  0x000000 0x000024 FUNC GLOB  2     ./libfoo.a(foo.o)
debug:  resolved  0x000000 0x000024 FUNC GLOB  2     REF_REL_NEED
............

The original undefined symbol foo from main.o has been overridden with the symbol definition from the extracted archive member foo.o. The detailed symbol information reflects the attributes of each symbol.

In the previous example, you can see that using some of the debugging tokens can produce a wealth of output. In cases where you are interested only in the activity around a subset of the input files, the -D option can be placed directly in the link-edit command line, and toggled on and off. In the following example, the display of symbol processing is switched on only during the processing of the library libbar.


$ ld .... -o prog main.o -L. -Dsymbols -lbar -D!symbols .... 

Note –

To obtain the link-edit command line you might have to expand the compilation line from any driver being used. See Using a Compiler Driver.