Linker and Libraries Guide

Input File Processing

The link-editor reads input files in the order in which they appear on the command line. Each file is opened and inspected to determine its ELF file type and therefore determine how it must be processed. The file types that apply as input for the link-edit are determined by the binding mode of the link-edit, either static or dynamic.

Under static mode, the link-editor accepts only relocatable objects or archive libraries as input files. Under dynamic mode, the link-editor also accepts shared objects.

Relocatable objects represent the most basic input file type to the link-editing process. The program data sections within these files are concatenated into the output file image being generated. The link-edit information sections are organized for later use, but do not become part of the output file image, as new sections are generated to take their places. Symbols are gathered into an internal symbol table for verification and resolution. This table is then used to create one or more symbol tables in the output image.

Although any input file can be specified directly on the link-edit command-line, archive libraries and shared objects are commonly specified using the -l option. See Linking With Additional Libraries for coverage of this mechanism and how it relates to the two different linking modes. However, even though shared objects are often referred to as shared libraries, and both of these objects can be specified using the same option, the interpretation of shared objects and archive libraries is quite different. The next two sections expand upon these differences.

Archive Processing

Archives are built using ar(1), and usually consist of a collection of relocatable objects with an archive symbol table. This symbol table provides an association of symbol definitions with the objects that supply these definitions. By default, the link-editor provides selective extraction of archive members. When the link-editor reads an archive, it uses information within the internal symbol table it is creating to select only the objects from the archive it requires to complete the binding process. You can also explicitly extract all members of an archive.

The link-editor extracts a relocatable object from an archive if:

The archive member contains a symbol definition that satisfies a symbol reference, sometimes referred to as an undefined symbol, presently held in the link-editor's internal symbol table.
The archive member contains a data symbol definition that satisfies a tentative symbol definition presently held in the link-editor's internal symbol table. An example of this is a FORTRAN COMMON block definition, which causes the extraction of a relocatable object that defines the same DATA symbol.
The link-editors -z allextract is in effect. This option suspends selective archive extraction and causes all archive members to be extracted from the archive being processed.

Under selective archive extraction, a weak symbol reference does not extract an object from an archive unless the -z weakextract option is in effect. See Simple Resolutions for more information.

Note –

The options -z weakextract, -z allextract, and -z defaultextract enable you to toggle the archive extraction mechanism among multiple archives.

With selective archive extraction, the link-editor makes multiple passes through an archive to extract relocatable objects as needed to satisfy the symbol information being accumulated in the link-editor internal symbol table. After the link-editor has made a complete pass through the archive without extracting any relocatable objects, it moves on to process the next input file.

By extracting from the archive only the relocatable objects needed at the time the archive was encountered, the position of the archive within the input file list can be significant. See Position of an Archive on the Command Line.

Note –

Although the link-editor makes multiple passes through an archive to resolve symbols, this mechanism can be quite costly for large archives containing random organizations of relocatable objects. In these cases, you should use tools like lorder(1) and tsort(1) to order the relocatable objects within the archive and so reduce the number of passes the link-editor must carry out.

Shared Object Processing

Shared objects are indivisible whole units that have been generated by a previous link-edit of one or more input files. When the link-editor processes a shared object, the entire contents of the shared object become a logical part of the resulting output file image. This logical inclusion means that all symbol entries defined in the shared object are made available to the link-editing process. The shared object is actually copied during process execution.

The shared object's program data sections and most of the link-editing information sections are unused by the link-editor. These sections are interpreted by the runtime linker when the shared object is bound to generate a runnable process. However, the occurrence of a shared object is remembered, and information is stored in the output file image to indicate that this object is a dependency and must be made available at runtime.

By default, all shared objects specified as part of a link-edit are recorded as dependencies in the object being built. This recording is made regardless of whether the object being built actually references symbols offered by the shared object. To minimize runtime linking overhead, specify only those dependencies required to resolve symbol references from the object being built as part of the link-edit. The link-editor's debugging capabilities, and ldd(1) with the -u option, can be used to determine unused dependencies. Alternatively, the link-editor's -z ignore option can suppress the dependency recording of unused shared objects.

If a shared object has dependencies on other shared objects, these dependencies are also processed. This processing occurs after all command-line input files have been processed. These shared objects will be used to complete the symbol resolution process; however, their names will not be recorded as dependencies in the output file image being generated.

Although the position of a shared object on the link-edit command-line has less significance than it does for archive processing, the position can have a global effect. Multiple symbols of the same name are allowed to occur between relocatable objects and shared objects, and between multiple shared objects. See Symbol Resolution.

The order of shared objects processed by the link-editor is maintained in the dependency information stored in the output file image. As the runtime linker reads this information, it loads the specified shared objects in the same order. Therefore, the link-editor and the runtime linker select the first occurrence of a symbol of a multiply-defined series of symbols.

Note –

Multiple symbol definitions, and thus the information to describe the interposing of one definition of a symbol for another, are reported in the load map output generated using the -m option.

Linking With Additional Libraries

Although the compiler drivers often ensure that appropriate libraries are specified to the link-editor, frequently you must supply your own. Shared objects and archives can be specified by explicitly naming the input files required to the link-editor, but a more common and more flexible method involves using the link-editor's -l option.

Library Naming Conventions

By convention, shared objects are usually designated by the prefix lib and the suffix .so, and archives are designated by the prefix lib and the suffix .a. For example, libc.so is the shared object version of the standard C library made available to the compilation environment, and libc.a is the library's archive version.

These conventions are recognized by the -l option of the link-editor. This option is commonly used to supply additional libraries to a link-edit. The following example directs the link-editor to search for libfoo.so. If the link-editor does not find libfoo.so, it searches for libfoo.a before moving on to the next directory to be searched.

$ cc -o prog file1.c file2.c -lfoo

Note –

There is a naming convention regarding the compilation environment and the runtime environment use of shared objects. The compilation environment uses the simple .so suffix, whereas the runtime environment commonly uses the suffix with an additional version number. See Naming Conventions and Coordination of Versioned Filenames.

When link-editing in dynamic mode, you can choose to link with a mix of shared objects and archives. When link-editing in static mode, only archive libraries are acceptable for input.

When in dynamic mode and using the -l option to enable a library search, the link-editor will first search in a given directory for a shared object that matches the specified name. If no match is found, the link-editor looks for an archive library in the same directory. When in static mode and using the -l option, only archive libraries are sought.

Linking With a Mix of Shared Objects and Archives

The library search mechanism in dynamic mode searches a given directory for a shared object, and then searches an archive library. Finer control of the type of search required is possible through the -B option.

By specifying the -B dynamic and -B static options on the command line as many times as required, you can toggle the library search between shared objects or archives respectively. For example, to link an application with the archive libfoo.a and the shared object libbar.so, issue the following command:

$ cc -o prog main.o file1.c -Bstatic -lfoo -Bdynamic -lbar

The -B static and -B dynamic keywords are not exactly symmetrical. When you specify -B static, the link-editor does not accept shared objects as input until the next occurrence of -B dynamic. However, when you specify -B dynamic, the link-editor first looks for shared objects and then archive library's in any given directory.

The precise description of the previous example is that the link-editor first searches for libfoo.a, and then for libbar.so, and if that search fails, for libbar.a. Finally, it searches for libc.so, and if that search fails, libc.a.

Position of an Archive on the Command Line

The position of an archive on the command line can affect the output file being produced. The link-editor searches an archive only to resolve undefined or tentative external references it has previously seen. After this search is completed and any required members have been extracted, the link-editor moves onto the next input file on the command line.

Therefore by default, the archive is not available to resolve any new references from the input files that follow the archive on the command line. For example, the following command directs the link-editor to search libfoo.a only to resolve symbol references that have been obtained from file1.c. The libfoo.a archive is not available to resolve symbol references from file2.c or file3.c.

$ cc -o prog file1.c -Bstatic -lfoo file2.c file3.c -Bdynamic

Note –

You should specify any archives at the end of the command line unless multiple-definition conflicts require you to do otherwise.

In some instances users have interdependencies between archives such that the extraction of members from one archive is resolved by extracting members from another archive. If these dependencies are cyclic, the archives must be specified repeatedly on the command line to satisfy previous references. For example:

$ cc -o prog .... -lA -lB -lC -lA -lB -lC -lA

The determination, and maintenance, of repeated archive specifications can be tedious. The -z rescan option makes this process simpler. Following all input file processing, this option causes the entire archive list to be reprocessed in an attempt to locate additional archive members that resolve symbol references. This archive rescanning continues until a pass over the archive list occurs in which no new members are extracted. The previous example could therefore be simplified to:

$ cc -o prog -z rescan .... -lA -lB -lC

Directories Searched by the Link-Editor

All previous examples assume the link-editor knows where to search for the libraries listed on the command line. By default, when linking 32–bit objects, the link-editor knows of only two standard directories in which to look for libraries, /usr/ccs/lib and /usr/lib. When linking 64–bit objects, only one standard directory is used, /usr/lib/64. All other directories to be searched must be added to the link-editor's search path explicitly.

You can change the link-editor search path in two ways: using a command-line option, or using an environment variable.

Using a Command-Line Option

You can use the -L option to add a new path name to the library search path. This option affects the search path at the point it is encountered on the command line. For example, the following command searches path1, then /usr/ccs/lib and /usr/lib, to find libfoo. It searches path1 and then path2, and then /usr/ccs/lib and /usr/lib, to find libbar.

$ cc -o prog main.o -Lpath1 file1.c -lfoo file2.c -Lpath2 -lbar

Path names defined using the -L option are used only by the link-editor. These path names are not recorded in the output file image created for use by the runtime linker.

Note –

You must specify -L if you want the link-editor to search for libraries in your current directory. You can use a period (.) to represent the current directory.

You can use the -Y option to change the default directories searched by the link-editor. The argument supplied with this option takes the form of a colon separated list of directories. For example, the following command searches for libfoo only in the directories /opt/COMPILER/lib and /home/me/lib.

$ cc -o prog main.c -YP,/opt/COMPILER/lib:/home/me/lib -lfoo

The directories specified using the -Y option can be supplemented by using the -L option.

Using an Environment Variable

You can also use the environment variable LD_LIBRARY_PATH, which takes a colon-separated list of directories, to add to the link-editor's library search path. In its most general form, LD_LIBRARY_PATH takes two directory lists separated by a semicolon. The first list is searched before the lists supplied on the command line, and the second list is searched after.

The following example shows the combined effect of setting LD_LIBRARY_PATH and calling the link-editor with several -L occurrences:

$ LD_LIBRARY_PATH=dir1:dir2;dir3
$ export LD_LIBRARY_PATH
$ cc -o prog main.c -Lpath1 ... -Lpath2 ... -Lpathn -lfoo

The effective search path is dir1:dir2:path1:path2... pathn:dir3:/usr/ccs/lib:/usr/lib.

If no semicolon is specified as part of the LD_LIBRARY_PATH definition, the specified directory list is interpreted after any -L options. In the following example, the effective search path is path1:path2... pathn:dir1:dir2:/usr/ccs/lib:/usr/lib.

$ LD_LIBRARY_PATH=dir1:dir2
$ export LD_LIBRARY_PATH
$ cc -o prog main.c -Lpath1 ... -Lpath2 ... -Lpathn -lfoo

Note –

This environment variable can also be used to augment the search path of the runtime linker. See Directories Searched by the Runtime Linker. To prevent this environment variable from influencing the link-editor, use the -i option.

Directories Searched by the Runtime Linker

The runtime linker only looks in one default location for dependencies. This location is /usr/lib when processing 32–bit objects, and /usr/lib/64 when processing 64–bit objects. All other directories to be searched must be added to the runtime linker's search path explicitly.

When a dynamic executable or shared object is linked with additional shared objects, these shared objects are recorded as dependencies. These dependencies must be located during process execution by the runtime linker. During the link-edit, one or more search paths can be recorded in the output file. These search paths are used by the runtime linker to locate any dependencies. These recorded search paths are referred to as a runpath.

Specialized objects may be built with the -z nodefaultlib option to suppress any search of the default location at runtime. Use of this option implies that all the dependencies of an object can be located using its runpaths. Without this option, no matter how you augment the runtime linker's search path, its last element is always the default location. /usr/lib for 32–bit objects and /usr/lib/64 for 64–bit objects.

Note –

The default search path can be administrated using a runtime configuration file. See Configuring the Default Search Paths. However, the creator of an object should not rely on the existence of this file. You should always ensure that an object can locate its dependencies with only its runpaths or the default location.

You can use the -R option, which takes a colon-separated list of directories, to record a runpath in a dynamic executable or shared object. The following example records the runpath /home/me/lib:/home/you/lib in the dynamic executable prog.

$ cc -o prog main.c -R/home/me/lib:/home/you/lib -Lpath1 \
-Lpath2 file1.c file2.c -lfoo -lbar

The runtime linker uses these paths, followed by the default location, to obtain any shared object dependencies. In this case, this runpath is used to locate libfoo.so.1 and libbar.so.1.

The link-editor accepts multiple -R options. These multiple specifications are concatenate together, separated by a colon. Thus, the previous example can also be expressed as follows.

$ cc -o prog main.c -R/home/me/lib -Lpath1 -R/home/you/lib \
-Lpath2 file1.c file2.c -lfoo -lbar

For objects that may be installed in various locations, the $ORIGIN dynamic string token provides a flexible means of recording a runpath. See Locating Associated Dependencies.

Note –

A historic alternative to specifying the -R option is to set the environment variable LD_RUN_PATH, and make this available to the link-editor. The scope and function of LD_RUN_PATH and -R are identical, but when both are specified, -R supersedes LD_RUN_PATH.

Initialization and Termination Sections

Dynamic objects may supply code that provides for runtime initialization and termination processing. This code can be encapsulated in one of two section types, either an array of function pointers or a single code block. Each of these section types is built from a concatenation of like sections from the input relocatable objects.

The sections .preinit_array, .init_array and .fini_array provide arrays of runtime pre-initialization, initialization, and termination functions, respectively. When creating a dynamic object, the link-editor identifies these arrays with the .dynamic tag pairs DT_PREINIT_[ARRAY/ARRAYSZ], DT_INIT_[ARRAY/ARRAYSZ], and DT_FINI_[ARRAY/ARRAYSZ] accordingly. These tags identify the associated sections so they may be called by the runtime linker. A pre-initialization array is applicable to dynamic executables only.

The sections .init and .fini provide a runtime initialization and termination code block, respectively. However, the compiler drivers typically supply .init and .fini sections with files they add to the beginning and end of your input file list. These files have the effect of encapsulating the .init and .fini code into individual functions. These functions are identified by the reserved symbol names _init and _fini respectively. When creating a dynamic object, the link-editor identifies these symbols with the .dynamic tags DT_INIT and DT_FINI accordingly. These tags identify the associated sections so they may be called by the runtime linker.

For more information regarding the execution of initialization and termination code at runtime see Initialization and Termination Routines.

The registration of initialization and termination functions can be carried out directly by the link-editor using the -z initarray and -z finiarray options. For example, the following command places the address of foo() in an .initarray element, and the address of bar() in a .finiarray element.

$ cat main.c
#include    <stdio.h>

void foo()
{
        (void) printf("initializing: foo()\n");
}

void bar()
{
        (void) printf("finalizing: bar()\n");
}

main()
{
        (void) printf("main()\n");
        return (0);
}

$ cc -o main -zinitarray=foo -zfiniarray=bar main.c
$ main
initializing: foo()
main()
finalizing: bar()

The creation of initialization and termination sections can be carried out directly using an assembler. However, most compilers offer special primitives to simplify their declaration. For example, the previous code example can be rewritten using the following #pragma definitions. These definitions result in a call to foo() being placed in an .init section, and a call to bar() being placed in a .fini section.

$ cat main.c
#include    <stdio.h>

#pragma init (foo)
#pragma fini (bar)

.......
$ cc -o main main.c
$ main
initializing: foo()
main()
finalizing: bar()

Initialization and termination code, spread throughout several relocatable objects, can result in different behavior when included in an archive library or shared object. The link-edit of an application using this archive might extract only a fraction of the objects contained in the archive. These objects might provide only a portion of the initialization and termination code spread throughout the members of the archive. At runtime, only this portion of code is executed. The same application built against the shared object will have all the accumulated initialization and termination code executed when the dependency is loaded at runtime.

To determine the order of executing initialization and termination code within a process at runtime is a complex issue involving dependency analysis. Limiting the content of initialization and termination code can simplifying this analysis, while providing both flexible, and predictable runtime behavior. See Initialization and Termination Order for more details.

Data initialization should be independent if the initialization code is involved with a dynamic object whose memory can be dumped using dldump(3DL).