Linker and Libraries Guide

Input File Processing

The link-editor reads input files in the order in which they appear on the command-line. Each file is opened and inspected to determine its ELF file type and therefore determine how it must be processed. The file types that apply as input for the link-edit are determined by the binding mode of the link-edit, either static or dynamic.

Under static mode, the link-editor accepts only relocatable objects or archive libraries as input files. Under dynamic mode, the link-editor also accepts shared objects.

Relocatable objects represent the most basic input file type to the link-editing process. The program data sections within these files are concatenated into the output file image being generated. The link-edit information sections are organized for later use, but do not become part of the output file image, as new sections are generated to take their places. Symbols are gathered into a special internal symbol table that allows for their verification and resolution, and eventually for the creation of one or more symbol tables in the output image.

Although any input file can be specified directly on the link-edit command-line, archive libraries and shared objects are commonly specified using the -l option (see "Linking with Additional Libraries" for coverage of this mechanism and how it relates to the two different linking modes). However, even though shared objects are often referred to as shared libraries, and both of these objects can be specified using the same option, the interpretation of shared objects and archive libraries is quite different. The next two sections expand upon these differences.

Archive Processing

Archives are built using ar(1), and usually consist of a collection of relocatable objects with an archive symbol table. This symbol table provides an association of symbol definitions with the objects that supply these definitions. By default, the link-editor provides selective extraction of archive members. When the link-editor reads an archive, it uses information within the internal symbol table it is creating to select only the objects from the archive it requires to complete the binding process. It is also possible to explicitly extract all members of an archive.

The link-editor will extract a relocatable object from an archive if:

The archive contains a symbol definition that satisfies a symbol reference (sometimes referred to as an undefined symbol) presently held in the link-editor's internal symbol table.

The archive contains a data symbol definition that satisfies a tentative symbol definition presently held in the link-editor's internal symbol table. An example of this is a FORTRAN COMMON block definition, which will cause the extraction of a relocatable object that defines the same DATA symbol.

The link-editor's -zallextract option is in effect. This option suspends selective archive extraction and causes all archive members to be extracted from the archive being processed.

Under selective archive extraction, a weak symbol reference will not cause the extraction of an object from an archive unless the -zweakextract option is in effect. Weak symbols are expanded upon in section "Simple Resolutions".

Note -

The options -zweakextract, -zallextract, and -zdefaultextract provide a means to toggle the archive extraction mechanism among multiple archives.

Under selective archive extraction the link-editor makes multiple passes through an archive extracting relocatable objects as needed to satisfy the symbol information being accumulated in the link-editor internal symbol table. After the link-editor has made a complete pass through the archive without extracting any relocatable objects, it moves on to process the next input file.
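The multiple-pass behavior can be sketched as a small simulation. This is only an illustrative model, not the link-editor's implementation: the member and symtab types and the function names are hypothetical, the symbol table is a small fixed-size array with no overflow checking, and tentative definitions and weak references are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 32   /* sketch only: no overflow checking */

/* Hypothetical model of one archive member. Lists are NULL-terminated. */
typedef struct {
    const char *name;
    const char *defines[4];
    const char *references[4];
    bool extracted;
} member;

/* Hypothetical stand-in for the link-editor's internal symbol table. */
typedef struct {
    const char *name[MAX_SYMS];
    bool defined[MAX_SYMS];
    size_t count;
} symtab;

/* Find a symbol, entering it as an undefined reference if it is new. */
static size_t lookup(symtab *t, const char *sym)
{
    for (size_t i = 0; i < t->count; i++)
        if (strcmp(t->name[i], sym) == 0)
            return i;
    t->name[t->count] = sym;
    t->defined[t->count] = false;
    return t->count++;
}

/* Does this member define a symbol that is currently undefined? */
static bool satisfies_undefined(const symtab *t, const member *m)
{
    for (size_t d = 0; d < 4 && m->defines[d]; d++)
        for (size_t i = 0; i < t->count; i++)
            if (!t->defined[i] && strcmp(t->name[i], m->defines[d]) == 0)
                return true;
    return false;
}

/* Extracting a member adds its definitions and its own references. */
static void extract(symtab *t, member *m)
{
    m->extracted = true;
    for (size_t d = 0; d < 4 && m->defines[d]; d++)
        t->defined[lookup(t, m->defines[d])] = true;
    for (size_t r = 0; r < 4 && m->references[r]; r++)
        (void) lookup(t, m->references[r]);
}

/* Repeat passes until one complete pass extracts nothing.
 * Returns the number of passes made over the archive. */
size_t process_archive(symtab *t, member *members, size_t n)
{
    size_t passes = 0;
    bool extracted_any;

    do {
        extracted_any = false;
        passes++;
        for (size_t i = 0; i < n; i++) {
            if (!members[i].extracted &&
                satisfies_undefined(t, &members[i])) {
                extract(t, &members[i]);
                extracted_any = true;
            }
        }
    } while (extracted_any);
    return passes;
}
```

In this model, an archive holding b.o, a.o, and c.o in that order, where a.o references a symbol defined in b.o and only the symbol from a.o is initially undefined, needs three passes. Ordering the members a.o, b.o, c.o needs two. In both cases c.o is never extracted.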

Because the link-editor extracts from the archive only the relocatable objects needed at the time the archive is encountered, the position of the archive within the input file list can be significant (see "Position of an Archive on the Command-Line" for more details).

Note -

Although the link-editor makes multiple passes through an archive to resolve symbols, this mechanism can be quite costly for large archives containing random organizations of relocatable objects. In these cases you can use tools like lorder(1) and tsort(1) to order the relocatable objects within the archive and so reduce the number of passes the link-editor must carry out.

Shared Object Processing

Shared objects are indivisible, whole units that have been generated by a previous link-edit of one or more input files. When the link-editor processes a shared object the entire contents of the shared object become a logical part of the resulting output file image. The shared object is not copied physically during the link-edit as its actual inclusion is deferred until process execution. This logical inclusion means that all symbol entries defined in the shared object are made available to the link-editing process.

The shared object's program data sections and most of the link-editing information sections are unused by the link-editor, as these will be interpreted by the runtime linker when the shared object is bound to generate a runnable process. However, the occurrence of a shared object is remembered, and information is stored in the output file image to indicate that this object is a dependency and must be made available at runtime.

By default, all shared objects specified as part of a link-edit are recorded as dependencies in the object being built. This recording is made regardless of whether the object being built actually references symbols offered by the shared object. To minimize runtime linking overhead, specify only those dependencies required to resolve symbol references from the object being built as part of the link-edit. The link-editor's debugging capabilities, and ldd(1) with the -u option, can be used to determine unused dependencies. Alternatively, the link-editor's -zignore option can suppress the dependency recording of unused shared objects.

If a shared object has dependencies on other shared objects, these too will be processed. This processing occurs after all command-line input files have been processed. These shared objects will be used to complete the symbol resolution process; however, their names will not be recorded as dependencies in the output file image being generated.

Although the position of a shared object on the link-edit command-line has less significance than it does for archive processing, it can have a global effect. Multiple symbols of the same name are allowed to occur between relocatable objects and shared objects, and between multiple shared objects (see "Symbol Resolution" for more details).

The order of shared objects processed by the link-editor is maintained in the dependency information stored in the output file image. As the runtime linker reads this information, it loads the specified shared objects in the same order. Therefore, the link-editor and the runtime linker select the first occurrence of a symbol of a multiply-defined series of symbols.
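The first-occurrence rule can be illustrated with a simplified lookup over an ordered list of objects. This is a sketch only: the object type and first_definer() are hypothetical names, and real symbol resolution also considers symbol binding and type, which are ignored here.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an object and the global symbols it defines. */
typedef struct {
    const char *name;        /* for example "libA.so" */
    const char *symbols[4];  /* NULL-terminated list */
} object;

/* Return the name of the first object, in processing order, that
 * defines sym, or NULL if no object defines it. */
const char *first_definer(const object *objs, size_t n, const char *sym)
{
    for (size_t i = 0; i < n; i++)
        for (size_t s = 0; s < 4 && objs[i].symbols[s]; s++)
            if (strcmp(objs[i].symbols[s], sym) == 0)
                return objs[i].name;
    return NULL;
}
```

If libA.so and libB.so both define foo, the definition selected in this model depends only on which of the two objects appears first in the list.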

Note -

Multiple symbol definitions, and thus the information to describe the interposing of one definition of a symbol for another, are reported in the load map output generated using the -m option.

Linking with Additional Libraries

Although the compiler drivers often ensure that appropriate libraries are specified to the link-editor, frequently you must supply your own. Shared objects and archives can be specified by explicitly naming the input files required to the link-editor, but a more common and more flexible method involves using the link-editor's -l option.

Library Naming Conventions

By convention, shared objects are usually designated by the prefix lib and the suffix .so, and archives are designated by the prefix lib and the suffix .a. For example, libc.so is the shared object version of the standard C library made available to the compilation environment, and libc.a is its archive version.

These conventions are recognized by the -l option of the link-editor. This option is commonly used to supply additional libraries to a link-edit. The following example:

$ cc -o prog file1.c file2.c -lfoo

directs the link-editor to search for libfoo.so, and if it does not find it, to search for libfoo.a, before moving on to the next directory to be searched.

Note -

There is a naming convention regarding the compilation environment and the runtime environment use of shared objects. The compilation environment uses the simple .so suffix, whereas the runtime environment commonly uses the suffix with an additional version number. See "Naming Conventions" and "Coordination of Versioned Filenames" for more details.

When link-editing in dynamic mode, you can choose to link with a mix of shared objects and archives. When link-editing in static mode, only archive libraries are acceptable for input.

When in dynamic mode and using the -l option to enable a library search, the link-editor will first search in a given directory for a shared object that matches the specified name. If no match is found, the link-editor will then look for an archive library in the same directory. When in static mode and using the -l option, only archive libraries will be sought.
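The search order described above can be sketched as a function that lists the candidate file names for a -l argument in the order they would be tried. The function name, the link_mode type, and the fixed 128-byte buffers are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

typedef enum { MODE_DYNAMIC, MODE_STATIC } link_mode;

/* Fill out[] with candidate pathnames for -l<name>, in search order.
 * Dynamic mode tries lib<name>.so and then lib<name>.a in each
 * directory; static mode tries only lib<name>.a.
 * Returns the number of candidates written. */
size_t library_candidates(const char *const *dirs, size_t ndirs,
                          const char *name, link_mode mode,
                          char out[][128], size_t max_out)
{
    size_t n = 0;

    for (size_t i = 0; i < ndirs; i++) {
        if (mode == MODE_DYNAMIC && n < max_out)
            snprintf(out[n++], 128, "%s/lib%s.so", dirs[i], name);
        if (n < max_out)
            snprintf(out[n++], 128, "%s/lib%s.a", dirs[i], name);
    }
    return n;
}
```

For the directories /usr/ccs/lib and /usr/lib and the name foo, dynamic mode yields four candidates, with both forms tried in /usr/ccs/lib before /usr/lib is considered. Static mode yields only the two .a candidates.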

Linking With a Mix of Shared Objects and Archives

Although the library search mechanism, in dynamic mode, searches a given directory for a shared object, then an archive library, finer control of the type of search required can be achieved using the -B option.

You can toggle the library search between shared objects and archives by specifying the -Bdynamic and -Bstatic options, respectively, as many times as required on the command-line. For example, to link an application with the archive libfoo.a and the shared object libbar.so, issue the following command:

$ cc -o prog main.o file1.c -Bstatic -lfoo -Bdynamic -lbar

The -Bstatic and -Bdynamic keywords are not exactly symmetrical. When you specify -Bstatic, the link-editor does not accept shared objects as input until the next occurrence of -Bdynamic. However, when you specify -Bdynamic, the link-editor first looks for shared objects and then archives in any given directory.

In the previous example it is more precise to say that the link-editor first searches for libfoo.a, and then for libbar.so, and if that fails, for libbar.a. Finally, it will search for libc.so, and if that fails, libc.a.

Position of an Archive on the Command-Line

The position of an archive on the command-line can affect the output file being produced. The link-editor searches an archive only to resolve undefined or tentative external references it has previously seen. After this search is completed and the required relocatable objects have been extracted, the archive is not available to resolve any new symbols obtained from the input files that follow the archive on the command-line. For example, the command

$ cc -o prog file1.c -Bstatic -lfoo file2.c file3.c -Bdynamic

directs the link-editor to search libfoo.a only to resolve symbol references that have been obtained from file1.c. libfoo.a is not available to resolve symbol references from file2.c or file3.c.
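The effect of archive position can be modeled with bit masks. In this deliberately simplified sketch, each bit stands for one symbol, each archive member is assumed to define a single symbol and to reference nothing, and the type and function names are hypothetical.

```c
#include <stddef.h>

typedef enum { INPUT_OBJECT, INPUT_ARCHIVE } input_kind;

typedef struct {
    input_kind kind;
    unsigned   syms;  /* object: symbols referenced;
                         archive: symbols its members can define */
} input;

/* Process inputs in command-line order and return the set of symbols
 * that are still undefined at the end. */
unsigned unresolved_after(const input *in, size_t n)
{
    unsigned defined = 0, undefined = 0;

    for (size_t i = 0; i < n; i++) {
        if (in[i].kind == INPUT_OBJECT) {
            /* New references not already satisfied become undefined. */
            undefined |= in[i].syms & ~defined;
        } else {
            /* An archive satisfies only references seen so far. */
            unsigned extracted = in[i].syms & undefined;
            defined |= extracted;
            undefined &= ~extracted;
        }
    }
    return undefined;
}
```

With one object referencing symbol 0x1, an archive able to define 0x1 and 0x2, and a later object referencing 0x2, the model leaves 0x2 unresolved. Moving the archive to the end of the list resolves both.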

Note -

As a rule, it is best to specify any archives at the end of the command-line unless multiple-definition conflicts require you to do otherwise.

Directories Searched by the Link-Editor

All previous examples assume the link-editor knows where to search for the libraries listed on the command-line. By default, when linking 32-bit objects, the link-editor knows of only two standard directories to look for libraries, /usr/ccs/lib and /usr/lib. When linking 64-bit objects, only one standard directory is used: /usr/lib/64. All other directories to be searched must be added to the link-editor's search path explicitly.

You can change the link-editor search path in two ways: using a command-line option, or using an environment variable.

Using a Command-Line Option

The -L option can be used to add a new pathname to the library search path. This option affects the search path at the point it is encountered on the command-line. For example, the command

$ cc -o prog main.o -Lpath1 file1.c -lfoo file2.c -Lpath2 -lbar

searches path1 (then /usr/ccs/lib and /usr/lib) to find libfoo, but searches path1 and then path2 (and then /usr/ccs/lib and /usr/lib) to find libbar.
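The positional nature of -L can be sketched by walking the argument list and collecting the -L directories seen before a given -l option. The function name is hypothetical, only the attached forms -Lpath and -lname used in the example are recognized, and the default directories are not appended.

```c
#include <stdio.h>
#include <string.h>

/* Write into buf the colon-separated -L directories in effect when
 * -l<lib> is reached. Returns 0 on success, or -1 if -l<lib> is not
 * present or buf is too small. */
int dirs_in_effect(const char *const *argv, size_t argc, const char *lib,
                   char *buf, size_t buflen)
{
    size_t used = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < argc; i++) {
        const char *arg = argv[i];

        if (strncmp(arg, "-L", 2) == 0) {
            /* Append this directory to the path accumulated so far. */
            int w = snprintf(buf + used, buflen - used, "%s%s",
                             used ? ":" : "", arg + 2);
            if (w < 0 || (size_t)w >= buflen - used)
                return -1;
            used += (size_t)w;
        } else if (strncmp(arg, "-l", 2) == 0 &&
                   strcmp(arg + 2, lib) == 0) {
            return 0;  /* later -L options do not apply to this -l */
        }
    }
    return -1;
}
```

Applied to the arguments of the command above, the sketch reports path1 for foo and path1:path2 for bar.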

Pathnames defined using the -L option are used only by the link-editor. They are not recorded in the output file image created for use by the runtime linker.

Note -

You must specify -L if you want the link-editor to search for libraries in your current directory. You can use a period (.) to represent the current directory.

The -Y option can be used to change the default directories searched by the link-editor. The argument supplied with this option takes the form of a colon separated list of directories. For example, the command

$ cc -o prog main.c -YP,/opt/COMPILER/lib:/home/me/lib -lfoo

searches for libfoo only in the directories /opt/COMPILER/lib and /home/me/lib. The directories specified using the -Y option can be supplemented by using the -L option.

Using an Environment Variable

You can also use the environment variable LD_LIBRARY_PATH, which takes a colon-separated list of directories, to add to the link-editor's library search path. In its most general form, LD_LIBRARY_PATH takes two directory lists separated by a semicolon. The first list is searched before the list(s) supplied on the command-line, and the second list is searched after.

Here is the combined effect of setting LD_LIBRARY_PATH and calling the link-editor with several -L occurrences:

$ LD_LIBRARY_PATH="dir1:dir2;dir3"
$ export LD_LIBRARY_PATH
$ cc -o prog main.c -Lpath1 ... -Lpath2 ... -Lpathn -lfoo

The effective search path will be dir1:dir2:path1:path2... pathn:dir3:/usr/ccs/lib:/usr/lib.

If no semicolon is specified as part of the LD_LIBRARY_PATH definition, the specified directory list is interpreted after any -L options. For example:

$ LD_LIBRARY_PATH=dir1:dir2
$ export LD_LIBRARY_PATH
$ cc -o prog main.c -Lpath1 ... -Lpath2 ... -Lpathn -lfoo

Here the effective search path will be path1:path2... pathn:dir1:dir2:/usr/ccs/lib:/usr/lib.
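The two cases can be combined into one sketch that assembles the effective search path from an LD_LIBRARY_PATH value, the accumulated -L directories, and the default directories. The function names are hypothetical, and the -L directories are passed in already joined with colons.

```c
#include <stdio.h>
#include <string.h>

/* Append len bytes of list to buf, preceded by a colon if buf is not
 * empty. Returns 0 on success, -1 if buf is too small. */
static int append(char *buf, size_t buflen, const char *list, size_t len)
{
    size_t used = strlen(buf);
    int w;

    if (len == 0)
        return 0;
    w = snprintf(buf + used, buflen - used, "%s%.*s",
                 used ? ":" : "", (int)len, list);
    return (w < 0 || (size_t)w >= buflen - used) ? -1 : 0;
}

/* Order: list before ';', -L directories, list after ';', defaults.
 * With no ';', the whole LD_LIBRARY_PATH value follows the -L list. */
int effective_search_path(const char *ld_library_path, const char *l_dirs,
                          const char *defaults, char *buf, size_t buflen)
{
    const char *before = "", *after = "";
    size_t before_len = 0, after_len = 0;

    if (buflen == 0)
        return -1;
    buf[0] = '\0';

    if (ld_library_path != NULL) {
        const char *semi = strchr(ld_library_path, ';');

        if (semi != NULL) {
            before = ld_library_path;
            before_len = (size_t)(semi - ld_library_path);
            after = semi + 1;
        } else {
            after = ld_library_path;
        }
        after_len = strlen(after);
    }

    if (append(buf, buflen, before, before_len) != 0 ||
        append(buf, buflen, l_dirs, strlen(l_dirs)) != 0 ||
        append(buf, buflen, after, after_len) != 0 ||
        append(buf, buflen, defaults, strlen(defaults)) != 0)
        return -1;
    return 0;
}
```

Called with "dir1:dir2;dir3", "path1:path2", and "/usr/ccs/lib:/usr/lib", the sketch produces dir1:dir2:path1:path2:dir3:/usr/ccs/lib:/usr/lib. With "dir1:dir2" and no semicolon it produces path1:path2:dir1:dir2:/usr/ccs/lib:/usr/lib.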

Note -

This environment variable can also be used to augment the search path of the runtime linker (see "Directories Searched by the Runtime Linker" for more details). To prevent this environment variable from influencing the link-editor, use the -i option.

Directories Searched by the Runtime Linker

By default, the runtime linker knows of only one standard place to look for libraries: /usr/lib when processing 32-bit objects, and /usr/lib/64 when processing 64-bit objects. All other directories to be searched must be added to the runtime linker's search path explicitly.

When a dynamic executable or shared object is linked with additional shared objects, these shared objects are recorded as dependencies that must be located again during process execution by the runtime linker. During the link-edit, one or more pathnames can be recorded in the output file. These pathnames are used by the runtime linker to search for any shared object dependencies. These recorded pathnames are referred to as a runpath.

Objects may be built with the -znodefaultlib option to suppress any search of the standard locations (/usr/lib or /usr/lib/64) at runtime. Use of this option implies that all the dependencies of an object can be located using its runpaths. Without this option, which is the most common case, no matter how you modify the runtime linker's library search path, its last element is always /usr/lib for 32-bit objects and /usr/lib/64 for 64-bit objects.

Note -

Default search paths can be administered using a runtime configuration file (see "Configuring the Default Search Paths"). However, the creator of an object should not rely on the existence of this file, and should always ensure that the object can locate its dependencies with only its runpaths or standard system defaults.

The -R option, which takes a colon-separated list of directories, can be used to record a runpath in a dynamic executable or shared object. For example:

$ cc -o prog main.c -R/home/me/lib:/home/you/lib -Lpath1 \
-Lpath2 file1.c file2.c -lfoo -lbar

will record the runpath /home/me/lib:/home/you/lib in the dynamic executable prog. The runtime linker uses these paths, then the default location /usr/lib, to locate any shared object dependencies. In this case, this runpath is used to locate libfoo.so.1 and libbar.so.1.

The link-editor accepts multiple -R options and concatenates each of these specifications, separated by a colon. Thus, the previous example can also be expressed as:

$ cc -o prog main.c -R/home/me/lib -Lpath1 -R/home/you/lib \
-Lpath2 file1.c file2.c -lfoo -lbar

For objects that may be installed in various locations, the $ORIGIN dynamic string token provides a flexible means of recording a runpath. See "Locating Associated Dependencies".

Note -

A historic alternative to specifying the -R option is to set the environment variable LD_RUN_PATH, and make this available to the link-editor. The scope and function of LD_RUN_PATH and -R are identical, but when both are specified, -R supersedes LD_RUN_PATH.

Initialization and Termination Sections

Dynamic objects may supply code that provides for runtime initialization and termination processing. This code can be encapsulated in one of two ways, either an array of function pointers or a single code block. Each of these section types is built from a concatenation of like sections from the input relocatable objects.

The sections .preinit_array, .init_array and .fini_array provide arrays of, respectively, runtime pre-initialization, initialization, and termination functions. When creating a dynamic object, the link-editor identifies these arrays with the .dynamic tags DT_PREINIT_ARRAY and DT_PREINIT_ARRAYSZ, DT_INIT_ARRAY and DT_INIT_ARRAYSZ, and DT_FINI_ARRAY and DT_FINI_ARRAYSZ accordingly, so that they may be called by the runtime linker. The pre-initialization array is applicable to dynamic executables only. See "Initialization and Termination Routines" for more details.
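How a function-pointer array and its size tag fit together can be sketched as follows. This is an illustrative stand-in for what a runtime linker does, not its implementation: all names are hypothetical, and it assumes, as in the ELF specification, that the size tag (for example DT_INIT_ARRAYSZ) gives the size of the array in bytes rather than a count of entries.

```c
#include <stddef.h>

typedef void (*init_fn)(void);

/* Record the order in which the functions run, for demonstration. */
static int trace[4];
static size_t ntrace;

static void init_first(void)  { trace[ntrace++] = 1; }
static void init_second(void) { trace[ntrace++] = 2; }

/* Stand-in for the contents of an .init_array section. */
static const init_fn init_array[] = { init_first, init_second };

/* Call each function in the array in order. The byte size is converted
 * to an entry count first. */
void run_init_array(const init_fn *array, size_t size_in_bytes)
{
    size_t count = size_in_bytes / sizeof (init_fn);

    for (size_t i = 0; i < count; i++)
        array[i]();
}
```

Calling run_init_array(init_array, sizeof init_array) invokes init_first() and then init_second().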

The sections .init and .fini provide, respectively, a runtime initialization and termination code block. However, the compiler drivers typically supply .init and .fini sections as part of the files they add to the beginning and end of your input-file list. These files have the effect of encapsulating the .init and .fini code into individual functions that are identified by the reserved symbol names _init and _fini respectively. When creating a dynamic object, the link-editor identifies these symbols with the .dynamic tags DT_INIT and DT_FINI accordingly, so that they may be called by the runtime linker. See "Initialization and Termination Routines" for more details.

The registration of initialization and termination functions can be carried out directly by the link-editor using the -zinitarray and -zfiniarray options. For example, the following command results in the address of the function foo being placed in an .init_array element, and the address of the function bar being placed in a .fini_array element:

$ cat main.c
#include    <stdio.h>

void foo()
{
     (void) printf("initializing: foo()\n");
}

void bar()
{
     (void) printf("finalizing: bar()\n");
}

int main()
{
     (void) printf("main()\n");
     return (0);
}

$ cc -o main -zinitarray=foo -zfiniarray=bar main.c
$ main
initializing: foo()
main()
finalizing: bar()

Initialization and termination sections can be created directly using an assembler. Alternatively, some compilers offer special primitives to simplify their declaration. For example, the previous example, extended with the following #pragmas, can result in a call to the function foo being placed in an .init section, and a call to the function bar being placed in a .fini section:

$ cat main.c
#include    <stdio.h>

#pragma init (foo)
#pragma fini (bar)

.......
$ cc -o main main.c
$ main
initializing: foo()
main()
finalizing: bar()
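Other compilers provide different primitives for the same purpose. As a sketch, and as an assumption beyond the compiler used in the examples above, GCC and Clang accept constructor and destructor function attributes in place of the #pragmas. Which section the resulting entries land in depends on the toolchain. The init_ran flag is added here only to make the effect observable.

```c
#include <stdio.h>

/* Set by foo() so that code running later can confirm it was called. */
static int init_ran;

/* GCC/Clang extension: run foo() before main() is entered. */
__attribute__((constructor))
static void foo(void)
{
    init_ran = 1;
    (void) printf("initializing: foo()\n");
}

/* GCC/Clang extension: run bar() after main() returns or exit() is called. */
__attribute__((destructor))
static void bar(void)
{
    (void) printf("finalizing: bar()\n");
}
```

When this code is linked into a program, init_ran is already 1 by the time main() begins.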

Be careful when designing initialization and termination code that can be included in both a shared object and archive library. If this code is spread throughout several relocatable objects within an archive library, then the link-edit of an application using this archive might extract only a portion of the objects, and therefore only a portion of the initialization and termination code. At runtime, only this portion of code is executed. The same application built against the shared object will have all the accumulated initialization and termination code executed at runtime when the shared object is loaded in as one of the application's dependencies.

Determining the sequence of executing initialization and termination code within a process at runtime is a complex issue involving dependency analysis. Initialization and termination code that references external global symbols makes this process more difficult and can result in cyclic dependencies. The most flexible initialization and termination code references elements only within the resident object.

It is also recommended that data initialization be independent if the initialization code is involved with a dynamic object whose memory can be dumped using dldump(3DL).