Linker and Libraries Guide

Input File Processing

The link-editor reads input files in the order in which they appear on the command-line. Each file is opened and inspected to determine its ELF file type and therefore determine how it must be processed. The file types that apply as input for the link-edit are determined by the binding mode of the link-edit, either static or dynamic.

Under static mode, the link-editor will accepts only relocatable objects or archive libraries as input files. Under dynamic mode, the link-editor also accepts shared objects.

Relocatable objects represent the most basic input file type to the link-editing process. The program data sections within these files are concatenated into the output file image being generated. The link-edit information sections are organized for later use, but do not become part of the output file image, as new sections are generated to take their places. Symbols are gathered into a special internal symbol table that allows for their verification and resolution, and eventually for the creation of one or more symbol tables in the output image.

Although any input file can be specified directly on the link-edit command-line, archive libraries and shared objects are commonly specified using the -l option (see "Linking with Additional Libraries" for coverage of this mechanism and how it relates to the two different linking modes). However, even though shared objects are often referred to as shared libraries, and both of these objects can be specified using the same option, the interpretation of shared objects and archive libraries is quite different. The next two sections expand upon these differences.

Archive Processing

Archives are built using ar(1), and usually consist of a collection of relocatable objects with an archive symbol table. This symbol table provides an association of symbol definitions with the objects that supply these definitions. By default, the link-editor provides selective extraction of archive members. When the link-editor reads an archive, it uses information within the internal symbol table it is creating to select only the objects from the archive it requires to complete the binding process. It is also possible to explicitly extract all members of an archive.

The link-editor will extract a relocatable object from an archive if:

The archive contains a symbol definition that satisfies a symbol reference (sometimes referred to as an undefined symbol) presently held in the link-editor's internal symbol table.

The archive contains a data symbol definition that satisfies a tentative symbol definition presently held in the link-editor's internal symbol table. An example of this is a FORTRAN COMMON block definition, which will cause the extraction of a relocatable object that defines the same DATA symbol.

The link-editors -z allextract is in effect. This option suspends selective archive extraction and causes all archive members to be extracted from the archive being processed.

Under selective archive extraction, a weak symbol reference will not cause the extraction of an object from an archive unless the -z weakextract option is in effect. Weak symbols are expanded upon in section "Simple Resolutions".

Note -

The options -z weakextract, -z allextract, and -z defaultextract provide a means to toggle the archive extraction mechanism among multiple archives.

Under selective archive extraction the link-editor makes multiple passes through an archive extracting relocatable objects as needed to satisfy the symbol information being accumulated in the link-editor internal symbol table. After the link-editor has made a complete pass through the archive without extracting any relocatable objects, it moves on to process the next input file.

By extracting from the archive only the relocatable objects needed at the time the archive was encountered means that the position of the archive within the input file list can be significant (see "Position of an Archive on the Command-Line" for more details).

Note -

Although the link-editor makes multiple passes through an archive to resolve symbols, this mechanism can be quite costly for large archives containing random organizations of relocatable objects. In these cases you can use tools like lorder(1) and tsort(1) to order the relocatable objects within the archive and so reduce the number of passes the link-editor must carry out.

Shared Object Processing

Shared objects are indivisible, whole units that have been generated by a previous link-edit of one or more input files. When the link-editor processes a shared object the entire contents of the shared object become a logical part of the resulting output file image. The shared object is not copied physically during the link-edit as its actual inclusion is deferred until process execution. This logical inclusion means that all symbol entries defined in the shared object are made available to the link-editing process.

The shared object's program data sections and most of the link-editing information sections are unused by the link-editor, as these will be interpreted by the runtime linker when the shared object is bound to generate a runable process. However, the occurrence of a shared object is remembered, and information is stored in the output file image to indicate that this object is a dependency and must be made available at runtime.

By default, all shared objects specified as part of a link-edit are recorded as dependencies in the object being built. This recording is made regardless of whether the object being built actually references symbols offered by the shared object. To minimize runtime linking overhead, specify only those dependencies required to resolve symbol references from the object being built as part of the link-edit. Alternatively, the link-editor's -z ignore option can suppress the dependency recording of unused shared objects.

If a shared object has dependencies on other shared objects, these too will be processed. This processing occurs after all command-line input files have been processed. These shared objects will be used to complete the symbol resolution process, however their names will not be recorded as dependencies in the output file image being generated.

Although the position of a shared object on the link-edit command-line has less significance than it does for archive processing, it can have a global effect. Multiple symbols of the same name are allowed to occur between relocatable objects and shared objects, and between multiple shared objects (see "Symbol Resolution" for more details).

The order of shared objects processed by the link-editor is maintained in the dependency information stored in the output file image. As the runtime linker reads this information, it loads the specified shared objects in the same order. Therefore, the link-editor and the runtime linker select the first occurrence of a symbol of a multiply defined series of symbols.

Note -

Multiple symbol definitions, and thus the information to describe the interposing of one definition of a symbol for another, are reported in the load map output generated using the -m option.

Linking with Additional Libraries

Although the compiler drivers often ensure that appropriate libraries are specified to the link-editor, frequently you must supply your own. Shared objects and archives can be specified by explicitly naming the input files required to the link-editor, but a more common and more flexible method involves using the link-editor's -l option.

Library Naming Conventions

By convention, shared objects are usually designated by the prefix lib and the suffix .so, and archives are designated by the prefix lib and the suffix .a. For example, libc.so is the shared object version of the standard C library made available to the compilation environment, and libc.a is its archive version.

These conventions are recognized by the -l option of the link-editor. This option is commonly used to supply additional libraries to a link-edit. The following example:

$ cc -o prog file1.c file2.c -lfoo

directs the link-editor to search for libfoo.so, and if it does not find it, to search for libfoo.a, before moving on to the next directory to be searched.

Note -

There is a naming convention regarding the compilation environment and the runtime environment use of shared objects. The compilation environment uses the simple .so suffix, whereas the runtime environment commonly uses the suffix with an additional version number. See "Naming Conventions" and "Coordination of Versioned Filenames" for more details.

When link-editing in dynamic mode, you can choose to link with a mix of shared objects and archives. When link-editing in static mode, only archive libraries are acceptable for input.

When in dynamic mode and using the -l option to enable a library search, the link-editor will first search in a given directory for a shared object that matches the specified name. If no match is found, the link-editor will then look for an archive library in the same directory. When in static mode and using the -l option, only archive libraries will be sought.

Linking With a Mix of Shared Objects and Archives

Although the library search mechanism, in dynamic mode, searches a given directory for a shared object, then an archive library, finer control of the type of search required can be achieved using the -B option.

By specifying the -Bdynamic and -Bstatic options on the command-line, as many times as required, the library search can be toggled between shared objects or archives respectively. For example, to link an application with the archive libfoo.a and the shared object libbar.so, issue the following command:

$ cc -o prog main.o file1.c -Bstatic -lfoo -Bdynamic -lbar

The -Bstatic and -Bdynamic keywords are not exactly symmetrical. When you specify -Bstatic, the link-editor does not accept shared objects as input until the next occurrence of -Bdynamic. However, when you specify -Bdynamic, the link-editor first looks for shared objects and then archives in any given directory.

In the previous example it is more precise to say that the link-editor first searches for libfoo.a, and then for libbar.so, and if that fails, for libbar.a. Finally, it will search for libc.so, and if that fails, libc.a.

Position of an Archive on the Command-Line

The position of an archive on the command-line can affect the output file being produced. The link-editor searches an archive only to resolve undefined or tentative external references it has previously seen. After this search is completed and the required relocatable objects have been extracted, the archive is not available to resolve any new symbols obtained from the input files that follow the archive on the command-line. For example, the command

$ cc -o prog file1.c -Bstatic -lfoo file2.c file3.c -Bdynamic

directs the link-editor to search libfoo.a only to resolved symbol references that have been obtained from file1.c. libfoo.a is not available to resolve symbol references from file2.c or file3.c.

Note -

As a rule, it is best to specify any archives at the end of the command-line unless multiple-definition conflicts require you to do otherwise.

Directories Searched by the Link-Editor

All previous examples assumed that the link-editor knows where to search for the libraries listed on the command-line. By default the link-editor knows of only two standard places to look for libraries, /usr/ccs/lib and /usr/lib for 32-bit link-edits, or /usr/lib/sparcv9 for 64-bit SPARCV9 link-edits. All other directories to be searched must be added to the link-editor's search path explicitly.

You can change the link-editor search path in two ways: using a command-line option, or using an environment variable.

Using a Command-Line Option

The -L option can be used to add a new pathname to the library search path. This option affects the search path at the point it is encountered on the command-line. For example, the command

$ cc -o prog main.o -Lpath1 file1.c -lfoo file2.c -Lpath2 -lbar

searches path1 (then /usr/ccs/lib and /usr/lib) to find libfoo, but searches path1 and then path2 (and then /usr/ccs/lib and /usr/lib) to find libbar.

Pathnames defined using the -L option are used only by the link-editor. They are not recorded in the output file image created for use by the runtime linker.

Note -

You must specify -L if you want the link-editor to search for libraries in your current directory. You can use a period (.) to represent the current directory.

The -Y option can be used to change the default directories searched by the link-editor. The argument supplied with this option takes the form of a colon separated list of directories. For example, the command

$ cc -o prog main.c -YP,/opt/COMPILER/lib:/home/me/lib -lfoo

searches for libfoo only in the directories /opt/COMPILER/lib and /home/me/lib. The directories specified using the -Y option can be supplemented by using the -L option.

Using an Environment Variable

You can also use the environment variable LD_LIBRARY_PATH, which takes a colon-separated list of directories, to add to the link-editor's library search path. In its most general form, LD_LIBRARY_PATH takes two directory lists separated by a semicolon. The first list is searched before the list(s) supplied on the command-line, and the second list is searched after. There is also an LD_LIBRARY_PATH_64 environment variable that overrides any active LD_LIBRARY_PATH setting when processing 64-bit objects.

Here is the combined effect of setting LD_LIBRARY_PATH and calling the link-editor with several -L occurrences:

$ LD_LIBRARY_PATH=dir1:dir2;dir3
$ export LD_LIBRARY_PATH
$ cc -o prog main.c -Lpath1 ... -Lpath2 ... -Lpathn -lfoo

The effective search path will be dir1:dir2:path1:path2... pathn:dir3:/usr/ccs/lib:/usr/lib.

If no semicolon is specified as part of the LD_LIBRARY_PATH definition, the specified directory list is interpreted after any -L options. For example:

$ LD_LIBRARY_PATH=dir1:dir2
$ export LD_LIBRARY_PATH
$ cc -o prog main.c -Lpath1 ... -Lpath2 ... -Lpathn -lfoo

Here the effective search path will be path1:path2... pathn:dir1:dir2:/usr/ccs/lib:/usr/lib.

Note -

This environment variable can also be used to augment the search path of the runtime linker (see "Directories Searched by the Runtime Linker" for more details). To prevent this environment variable from influencing the link-editor, use the -i option.

Directories Searched by the Runtime Linker

The runtime linker knows of only one standard place to look for libraries, /usr/lib, or /usr/lib/sparcv9 for 64-bit SPARCV9 libraries. All other directories to be searched must be added to the runtime linker's search path explicitly.

When a dynamic executable or shared object is linked with additional shared objects, these shared objects are recorded as dependencies that must be located again during process execution by the runtime linker. During the link-edit, one or more pathnames can be recorded in the output file. These pathnames are used by the runtime linker to search for any shared object dependencies. These recorded pathnames are referred to as a runpath.

Note -

No matter how you modify the runtime linker's library search path, its last element is always /usr/lib.

The -R option, which takes a colon-separated list of directories, can be used to record a runpath in a dynamic executable or shared library. For example:

$ cc -o prog main.c -R/home/me/lib:/home/you/lib -Lpath1 \
-Lpath2 file1.c file2.c -lfoo -lbar

will record the runpath /home/me/lib:/home/you/lib in the dynamic executable prog. The runtime linker uses these paths, then the default location /usr/lib, to locate any shared object dependencies. In this case, this runpath is used to locate libfoo.so.1 and libbar.so.1.

The link-editor accepts multiple -R options and concatenates each of these specifications, separated by a colon. Thus, the previous example can also be expressed as:

$ cc -o prog main.c -R/home/me/lib -Lpath1 \
-R/home/you/lib -Lpath2 file1.c file2.c -lfoo -lbar

Note -

A historic alternative to specifying the -R option is to set the environment variable LD_RUN_PATH, and make this available to the link-editor. The scope and function of LD_RUN_PATH and -R are identical, but when both are specified, -R supersedes LD_RUN_PATH.

Initialization and Termination Sections

The .init and .fini section types provide for runtime initialization and termination processing. These section types are concatenated from the input relocatable objects like any other sections. However, the compiler drivers can also supply .init and .fini sections as part of the additional files they add to the beginning and end of your input-file list.

These files have the effect of encapsulating the .init and .fini code into individual functions that are identified by the reserved symbol names _init and _fini respectively.

When building a dynamic executable or shared object, the link-editor records these symbol addresses in the output file's image so they can be called by the runtime linker during initialization and termination processing. See "Initialization and Termination Routines" for more details on the runtime processing of these sections.

The creation of .init and .fini sections can be carried out directly using an assembler, or some compilers can offer special primitives to simplify their declaration. For example, the following code segments result in a call to the function foo being placed in an .init section, and a call to the function bar being placed in a .fini section:

#pragma init (foo)
#pragma fini (bar)
 
foo()
{
    /* Perform some initialization processing. */
    ......
}
 
bar()
{
    /* Perform some termination processing. */
    .......
}

Be careful when designing initialization and termination code that can be included in both a shared object and archive library. If this code is spread throughout several relocatable objects within an archive library, then the link-edit of an application using this archive can extract only a portion of the modules, and therefore only a portion of the initialization and termination code. At runtime, only this portion of code is executed.

The same application built against the shared object will have all the accumulated initialization and termination code executed at runtime when the shared object is mapped in as one of the application's dependencies.

It is also recommended that data initialization be independent if the .init code is involved with a dynamic object whose memory can be dumped using dldump(3X).