Lazy Loading of Dynamic Dependencies

When a dynamic object is loaded into memory, the object is examined for any additional dependencies. By default, any dependencies that exist are immediately loaded. This cycle continues until the full dependency tree is exhausted. Finally, all inter-object data references that are specified by relocations are resolved. These operations are performed regardless of whether the code in these dependencies is referenced by the application during its execution.

Under a lazy loading model, any dependencies that are labeled for lazy loading are loaded only when explicitly referenced. By taking advantage of the lazy binding of a function call, the loading of a dependency is delayed until the function is first referenced. As a result, objects that are never referenced are never loaded.

A relocation reference can be immediate or lazy. Because immediate references must be resolved when an object is initialized, any dependency that satisfies this reference must be immediately loaded. Therefore, identifying such a dependency as lazy loadable has little effect. See When Relocations Are Performed. Immediate references between dynamic objects are generally discouraged.

Lazy loading is used by the link-editor's reference to a debugging library, liblddbg. Because debugging is requested only infrequently, loading this library every time that the link-editor is invoked is unnecessary and expensive. By indicating that this library can be lazily loaded, the expense of processing the library is moved to those invocations that ask for debugging output.

An alternative method of achieving a lazy loading model is to use dlopen() and dlsym() to load and bind to a dependency when needed. This model is ideal if the number of dlsym() references is small. This model also works well if the dependency name or location is not known at link-edit time. For more complex interactions with known dependencies, coding to normal symbol references and designating the dependency to be lazily loaded is simpler.

An object is designated as lazily or normally loaded through the link-editor options -z lazyload and -z nolazyload respectively. These options are position-dependent on the link-edit command line. Any dependency that follows the option takes on the loading attribute specified by the option. By default, the -z nolazyload option is in effect.

The following simple program has a dependency on libdebug.so.1. The dynamic section, .dynamic, shows libdebug.so.1 is marked for lazy loading. The symbol information section, .SUNW_syminfo, shows the symbol reference that triggers libdebug.so.1 loading.

$ cc -o prog prog.c -L. -z lazyload -ldebug -z nolazyload -lelf -R'$ORIGIN'
$ elfdump -d prog

Dynamic Section:  .dynamic
   index  tag           value
     [0]  POSFLAG_1       0x1         [ LAZY ]
     [1]  NEEDED        0x123         libdebug.so.1
     [2]  NEEDED        0x131         libelf.so.1
     [3]  NEEDED        0x13d         libc.so.1
     [4]  RUNPATH       0x147         $ORIGIN
     ...
$ elfdump -y prog

Syminfo section: .SUNW_syminfo
   index  flgs        bound to        symbol
    ....
    [52]  DL      [1] libdebug.so.1   debug

The POSFLAG_1 with the value of LAZY designates that the following NEEDED entry, libdebug.so.1, should be lazily loaded. As libelf.so.1 has no preceding LAZY flag, this library is loaded at the initial startup of the program.
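The pairing rule described above can be sketched in C. This is a simplified model rather than a real ELF reader: the struct dyn_entry type, the TAG_ and P1_ names, and the needed_is_lazy() helper are hypothetical names chosen for this illustration. The numeric values are assumed to match DT_NEEDED, DT_RUNPATH, DT_POSFLAG_1, and DF_P1_LAZYLOAD, and the table mirrors only the entries that the elfdump -d output shows.

```c
#include <stddef.h>

/* Local names for this sketch; values assumed to match the DT_ and DF_P1_ definitions. */
enum {
    TAG_NEEDED    = 1,
    TAG_RUNPATH   = 29,
    TAG_POSFLAG_1 = 0x6ffffdfd
};
#define P1_LAZYLOAD 0x1UL

/* Simplified stand-in for one .dynamic entry (hypothetical type). */
struct dyn_entry {
    unsigned long tag;
    unsigned long value;
};

/* The entries of prog that elfdump -d displays; later entries are elided there too. */
static const struct dyn_entry prog_dynamic[] = {
    { TAG_POSFLAG_1, P1_LAZYLOAD }, /* [0] LAZY */
    { TAG_NEEDED,    0x123 },       /* [1] libdebug.so.1 */
    { TAG_NEEDED,    0x131 },       /* [2] libelf.so.1 */
    { TAG_NEEDED,    0x13d },       /* [3] libc.so.1 */
    { TAG_RUNPATH,   0x147 }        /* [4] $ORIGIN */
};
#define PROG_DYNAMIC_COUNT (sizeof (prog_dynamic) / sizeof (prog_dynamic[0]))

/*
 * Return 1 if the NEEDED entry at idx is immediately preceded by a
 * POSFLAG_1 entry carrying the lazy flag, 0 if it is a NEEDED entry
 * without that flag, and -1 if idx does not name a NEEDED entry.
 */
static int needed_is_lazy(const struct dyn_entry *dyn, size_t count, size_t idx)
{
    if (idx >= count || dyn[idx].tag != TAG_NEEDED)
        return -1;
    if (idx == 0 || dyn[idx - 1].tag != TAG_POSFLAG_1)
        return 0;
    return (dyn[idx - 1].value & P1_LAZYLOAD) != 0;
}
```

In this model, only entry [1] is reported as lazy. Entries [2] and [3] follow a NEEDED entry rather than a POSFLAG_1 entry, so the flag does not carry over to them.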

Note - libc.so.1 has special system requirements that prevent the file from being lazy loaded. If -z lazyload is in effect when libc.so.1 is processed, the flag is effectively ignored.

The use of lazy loading can require a precise declaration of dependencies and runpaths throughout the objects used by an application. For example, suppose two objects, libA.so and libB.so, both make reference to symbols in libX.so. libA.so declares libX.so as a dependency, but libB.so does not. Typically, when libA.so and libB.so are used together, libB.so can reference libX.so because libA.so made this dependency available. However, if libA.so declares libX.so to be lazy loaded, libX.so might not be loaded when libB.so makes reference to this dependency. A similar failure can occur if libB.so declares libX.so as a dependency but fails to provide a runpath necessary to locate the dependency.

Regardless of lazy loading, dynamic objects should declare all their dependencies and how to locate the dependencies. With lazy loading, this dependency information becomes even more important.

Note - Lazy loading can be disabled at runtime by setting the environment variable LD_NOLAZYLOAD to a non-null value.

Providing an Alternative to dlopen()

dlopen(3C) and dlsym(3C) are often used to load and exercise additional objects. See Runtime Linking Programming Interface. For example, the following code from libdep.so.1 loads libbar.so.1, and on success calls interfaces provided by libbar.so.1.

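#include <dlfcn.h>

void dep()
{
        void *handle;

        /* Load libbar.so.1 only when dep() is called. */
        if ((handle = dlopen("libbar.so.1", RTLD_LAZY)) != NULL) {
                int (*fptr)();

                /* Look up each interface by name, and call it if found. */
                if ((fptr = (int (*)())dlsym(handle, "bar1")) != NULL)
                        (*fptr)(arg1);
                if ((fptr = (int (*)())dlsym(handle, "bar2")) != NULL)
                        (*fptr)(arg2);
                ....
        }
}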
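The same pattern can be written as a small self-contained sketch. It is an illustration only, built on assumptions that are not part of the example above: it looks up strlen() from the objects already in the process through dlopen() with a NULL path, so that no particular library name is needed, and the strlen_fn typedef and the dynamic_strlen() helper are hypothetical names. Declaring the pointer with a full prototype lets the compiler check the arguments of the call, although nothing verifies that the symbol found really has that type.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* A full prototype for the pointer, so calls through it are type checked. */
typedef size_t (*strlen_fn)(const char *);

/*
 * Look up strlen() at runtime and call it through a typed pointer.
 * Returns the length of s, or (size_t)-1 if the lookup fails.
 */
static size_t dynamic_strlen(const char *s)
{
    void *handle;
    strlen_fn fn;
    size_t len;

    /* A NULL path gives a handle for the running program and its dependencies. */
    if ((handle = dlopen(NULL, RTLD_LAZY)) == NULL) {
        (void) fprintf(stderr, "dlopen: %s\n", dlerror());
        return (size_t)-1;
    }

    /* The cast from void * only asserts the type; it cannot verify it. */
    if ((fn = (strlen_fn)dlsym(handle, "strlen")) == NULL) {
        (void) fprintf(stderr, "dlsym: %s\n", dlerror());
        (void) dlclose(handle);
        return (size_t)-1;
    }

    len = fn(s);
    (void) dlclose(handle);
    return len;
}
```

On some platforms, the program must also be linked with -ldl for dlopen() and dlsym() to resolve.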

Although very flexible, this model of using dlopen() and dlsym() is an unnatural coding style, and has some drawbacks.

The object in which the symbols are expected to exist must be known.
The calls through function pointers provide no means of verification by either the compiler or lint(1).

This code can be simplified if the object that supplies the required interfaces satisfies the following conditions.

The object can be established as a dependency at link-edit time.
The object is always available.

Because a function reference can trigger lazy loading, the same delayed loading of libbar.so.1 can be achieved without dlopen(). In this case, the reference to the function bar1() results in lazy loading the associated dependency. This coding is far more natural, and the use of standard function calls allows validation by the compiler or lint(1).

void dep()
{
        bar1(arg1);
        bar2(arg2);
        ....
}

$ cc -G -o libdep.so.1 dep.c -L. -z lazyload -lbar -lc

However, this model fails if the object that provides the required interfaces is not always available. Should the application be exercised when LD_BIND_NOW is set, or the shared object be loaded through dlopen(3C) with the RTLD_NOW flag, then all references from the associated objects are processed. Any failure to bind a symbol reference to a definition results in a fatal error.
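From the point of view of a caller that loads such an object with dlopen(3C), the failure shows up as a NULL handle. The following is a minimal sketch of that check. The load_bound_now() helper is a hypothetical name, and the sketch assumes only the standard dlopen(), dlerror(), and dlclose() interfaces.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Try to load path with all of its references bound immediately.
 * Returns 0 on success, or -1 if the object cannot be located,
 * loaded, or fully relocated.
 */
static int load_bound_now(const char *path)
{
    void *handle = dlopen(path, RTLD_NOW);

    if (handle == NULL) {
        const char *msg = dlerror();

        (void) fprintf(stderr, "dlopen(%s): %s\n", path,
            msg != NULL ? msg : "unknown error");
        return -1;
    }

    /* This sketch only tests loadability, so release the object again. */
    (void) dlclose(handle);
    return 0;
}
```

A call such as load_bound_now("libdep.so.1") would be expected to fail in this way if libbar.so.1 could not be found, because RTLD_NOW forces the bar1() and bar2() references to be processed at load time.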

In this case, it is desirable to test for the existence of the dependency, without having to know the dependency name. A means of testing for the availability of a dependency that satisfies a function reference is required.

A robust model for testing for the existence of a function can be achieved with deferred dependencies, and use of dlsym(3C) with the RTLD_PROBE handle.

Deferred symbol references differ from standard symbol references in the following details.

Deferred references can only be established for function calls.
Deferred references are directly bound at runtime to the associated dependency.
Deferred references are not resolved as part of standard relocation processing, or LD_BIND_NOW processing, or through dlopen(3C) with the RTLD_NOW flag.

Deferred references are resolved during process execution, when the associated function is first referenced. The assurance of this delayed resolution provides a window where the caller can test for the existence of the deferred dependency before making calls to the deferred function.

Deferred Dependencies

A deferred dependency identifies a dependency for which all references to that dependency are deferred. Deferred dependencies are established at link-edit time using the link-editor's -z deferred option.

$ cc -G -o libdef.so.1 def.c -lfoo -z deferred -lbar -lc

The deferred nature of these references can be observed from the symbol information and dynamic information defined within the referring object.

$ elfdump -d libdef.so.1 | egrep "NEEDED|POSFLAG"
    [0]  NEEDED          0x85        libfoo.so
    [1]  POSFLAG_1       0x4         [ DEFERRED ]
    [2]  NEEDED          0x8f        libbar.so
    [3]  NEEDED          0x99        libc.so

$ elfdump -y libdef.so.1 | egrep "foo|bar"
    [4]  [ DEPEND DEFERRED ]   [2]   libbar.so  bar1
    [7]  [ DEPEND ]            [0]   libfoo.so  foo1
    ...

Having established libbar.so.1 as a deferred dependency, at runtime a dlsym(RTLD_PROBE) against one of the bar() symbols can be used to determine whether the family of symbols is available. On success, the members of the family can be called as direct function calls. These calls are much more legible and easier to write, and allow the compiler to catch errors in their calling sequences.

void dep()
{
        if (dlsym(RTLD_PROBE, "bar1")) {
                bar1(arg1);
                bar2(arg2);
                ....
        }
}
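The probe can be wrapped in a small helper so that the same source also compiles where RTLD_PROBE is not defined. This is a sketch under stated assumptions: symbol_available() is a hypothetical name, and the fallback branch, which searches only the objects already present in the process, is an assumption added for illustration. The fallback does not reproduce the deferred-dependency behavior described here.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Return 1 if name can be resolved in the current process, 0 otherwise.
 */
static int symbol_available(const char *name)
{
#ifdef RTLD_PROBE
    /* Probe for the symbol as in the dep() example. */
    return dlsym(RTLD_PROBE, name) != NULL;
#else
    /*
     * Assumed fallback for platforms without RTLD_PROBE: search the
     * program and the dependencies that are already loaded.
     */
    void *self = dlopen(NULL, RTLD_LAZY);
    int found;

    if (self == NULL)
        return 0;
    found = dlsym(self, name) != NULL;
    (void) dlclose(self);
    return found;
#endif
}
```

With such a helper, the test in dep() would read if (symbol_available("bar1")). The guarded direct calls remain safe only when libbar.so.1 was linked as a deferred dependency.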

Deferred dependencies offer an additional level of flexibility. Provided the dependency has not already been loaded, the dependency can be changed at runtime. This mechanism offers a level of flexibility similar to dlopen(3C), where different objects can be loaded and bound to by the caller.

If the original dependency name is known, then the original dependency can be exchanged for a new dependency using dlinfo(3C) with the RTLD_DI_DEFERRED argument. Alternatively, a deferred symbol that is associated with the dependency can be used to identify the deferred dependency using dlinfo(3C) with the RTLD_DI_DEFERRED_SYM argument.

Oracle® Solaris 11.3 Linkers and Libraries Guide