Copy Relocations (Linker and Libraries Guide)

Copy Relocations

Shared objects are usually built with position-independent code. References to external data items from code of this type employ indirect addressing through a set of tables. See Position-Independent Code for more details. These tables are updated at runtime with the real address of the data items. These updated tables enable access to the data without the code itself being modified.
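The indirection described above can be modeled in a few lines of C. This is only an illustrative sketch, not how a compiler or the runtime linker is implemented: the names shared_counter, got, bind_got, and read_counter are hypothetical, and a one-slot array stands in for the tables that the runtime linker updates.

```c
/* Hypothetical stand-in for a data item defined in another object. */
int shared_counter = 42;

/* Hypothetical model of the address table: one slot per external data item. */
int *got[1];

/*
 * Models the runtime update: the real address of the data item is
 * stored in the table once, when the object is loaded.
 */
void
bind_got(void)
{
        got[0] = &shared_counter;
}

/*
 * Models a position-independent reference: the code loads the address
 * from the table and then dereferences it. The code itself never holds
 * the address, so it does not need to be modified at runtime.
 */
int
read_counter(void)
{
        return (*got[0]);
}
```

After bind_got() has run, read_counter() always observes the current value of shared_counter, because only the table slot, not the code, depends on where the data item resides.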

Dynamic executables, however, are generally not created from position-independent code. Any references to external data they make can seemingly only be achieved at runtime by modifying the code that makes the reference. Modifying a read-only text segment is to be avoided. The copy relocation technique resolves such references without modifying the text segment.

Suppose the link-editor is used to create a dynamic executable, and a reference to a data item is found to reside in one of the dependent shared objects. Space is allocated in the dynamic executable's .bss, equivalent in size to the data item found in the shared object. This space is also assigned the same symbolic name as defined in the shared object. Along with this data allocation, the link-editor generates a special copy relocation record that instructs the runtime linker to copy the data from the shared object to the allocated space within the dynamic executable.

Because the symbol assigned to this space is global, it is used to satisfy any references from any shared objects. The dynamic executable inherits the data item. Any other objects within the process that make reference to this item are bound to this copy. The original data from which the copy is made effectively becomes unused.
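The effect of a copy relocation can be sketched as a small C model. This is a simplified illustration under stated assumptions, not the runtime linker's implementation: sym_def_t, lib_table, exe_table, apply_copy_reloc, and lookup_table are hypothetical names, and the model assumes that both definitions have the same size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a symbol definition: its name, address, and size. */
typedef struct {
        const char *name;
        void       *addr;
        size_t      size;
} sym_def_t;

/* The data item as initialized in the shared object (illustrative values). */
int lib_table[4] = { 10, 20, 30, 40 };

/* The zero-filled space reserved in the dynamic executable's .bss. */
int exe_table[4];

/*
 * Models the processing of one copy relocation: the initial data is
 * copied from the shared object into the executable's reserved space.
 * Assumes exe->size == lib->size.
 */
void
apply_copy_reloc(const sym_def_t *exe, const sym_def_t *lib)
{
        memcpy(exe->addr, lib->addr, exe->size);
}

/*
 * Models symbol binding after the copy: the executable's global
 * definition satisfies the reference, so every object in the process
 * is bound to the copy rather than to the original.
 */
int *
lookup_table(const sym_def_t *exe)
{
        return (exe->addr);
}
```

In this model, a later store through the pointer returned by lookup_table() changes exe_table only. lib_table keeps its initial contents but is no longer referenced, which corresponds to the original data becoming unused.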

The following example of this mechanism uses an array of system error messages that is maintained within the standard C library. In previous SunOS operating system releases, the interface to this information was provided by two global variables, sys_errlist[] and sys_nerr. The first variable provided the array of error message strings, while the second conveyed the size of the array itself. These variables were commonly used within an application in the following manner.

$ cat foo.c
extern int  sys_nerr;
extern char *sys_errlist[];

char *
error(int errnumb)
{
        if ((errnumb < 0) || (errnumb >= sys_nerr))
                return (0);
        return (sys_errlist[errnumb]);
}

The application uses the function error to provide a focal point to obtain the system error message associated with the number errnumb.

Examining a dynamic executable built using this code shows the implementation of the copy relocation in more detail.

$ cc -o prog main.c foo.c
$ elfdump -sN.dynsym prog | grep ' sys_'
      [24]  0x00021240 0x00000260  OBJT GLOB  D    1 .bss           sys_errlist
      [39]  0x00021230 0x00000004  OBJT GLOB  D    1 .bss           sys_nerr
$ elfdump -c prog
....
Section Header[19]:  sh_name: .bss
    sh_addr:      0x21230         sh_flags:   [ SHF_WRITE SHF_ALLOC ]
    sh_size:      0x270           sh_type:    [ SHT_NOBITS ]
    sh_offset:    0x1230          sh_entsize: 0
    sh_link:      0               sh_info:    0
    sh_addralign: 0x8
....
$ elfdump -r prog

Relocation Section:  .SUNW_reloc
    type                       offset     addend  section        symbol
....
  R_SPARC_COPY                0x21240          0  .SUNW_reloc    sys_errlist
  R_SPARC_COPY                0x21230          0  .SUNW_reloc    sys_nerr
....

The link-editor has allocated space in the dynamic executable's .bss to receive the data represented by sys_errlist and sys_nerr. These data are copied from the C library by the runtime linker at process initialization. Thus, each application that uses these data gets a private copy of the data in its own data segment.

There are two drawbacks to this technique. First, each application pays a performance penalty for the overhead of copying the data at runtime. Second, the size of the data array sys_errlist has now become part of the C library's interface. Suppose the size of this array were to change, perhaps as new error messages are added. Any dynamic executables that reference this array have to undergo a new link-edit to be able to access any of the new error messages. Without this new link-edit, the allocated space within the dynamic executable is insufficient to hold the new data.

These drawbacks can be eliminated if the data required by a dynamic executable are provided by a functional interface. The ANSI C function strerror(3C) returns a pointer to the appropriate error string, based on the error number supplied to it. One implementation of this function might be:

$ cat strerror.c
static const char *sys_errlist[] = {
        "Error 0",
        "Not owner",
        "No such file or directory",
        ......
};
static const int sys_nerr =
        sizeof (sys_errlist) / sizeof (char *);

char *
strerror(int errnum)
{
        if ((errnum < 0) || (errnum >= sys_nerr))
                return (0);
        return ((char *)sys_errlist[errnum]);
}

The error routine in foo.c can now be simplified to use this functional interface. This simplification in turn removes any need to perform the original copy relocations at process initialization.

Additionally, because the data are now local to the shared object, the data are no longer part of its interface. The shared object therefore has the flexibility of changing the data without adversely affecting any dynamic executables that use it. Eliminating data items from a shared object's interface generally improves performance while making the shared object's interface and code easier to maintain.

ldd(1), when used with either the -d or the -r option, can verify any copy relocations that exist within a dynamic executable.

For example, suppose the dynamic executable prog had originally been built against the shared object libfoo.so.1 and the following two copy relocations had been recorded.

$ cat foo.c
int _size_gets_smaller[16];
int _size_gets_larger[16];
$ cc -o libfoo.so -G foo.c
$ cc -o prog main.c -L. -R. -lfoo
$ elfdump -sN.symtab prog | grep _size
      [49]  0x000211d0 0x00000040  OBJT GLOB  D    0 .bss           _size_gets_larger
      [59]  0x00021190 0x00000040  OBJT GLOB  D    0 .bss           _size_gets_smaller
$ elfdump -r prog | grep _size
  R_SPARC_COPY                0x211d0          0  .SUNW_reloc    _size_gets_larger
  R_SPARC_COPY                0x21190          0  .SUNW_reloc    _size_gets_smaller

A new version of this shared object is supplied that contains different data sizes for these symbols.

$ cat foo2.c
int _size_gets_smaller[4];
int _size_gets_larger[32];
$ cc -o libfoo.so -G foo2.c
$ elfdump -sN.symtab libfoo.so | grep _size
      [37]  0x000105cc 0x00000010  OBJT GLOB  D    0 .bss           _size_gets_smaller
      [41]  0x000105dc 0x00000080  OBJT GLOB  D    0 .bss           _size_gets_larger

Running ldd(1) against the dynamic executable reveals the following.

$ ldd -d prog
    libfoo.so.1 =>   ./libfoo.so.1
    ....
    relocation R_SPARC_COPY sizes differ: _size_gets_larger
        (file prog size=0x40; file ./libfoo.so size=0x80)
        prog size used; possible data truncation
    relocation R_SPARC_COPY sizes differ: _size_gets_smaller
        (file prog size=0x40; file ./libfoo.so size=0x10)
        ./libfoo.so size used; possible insufficient data copied
....

ldd(1) shows that the dynamic executable will copy as much data as the shared object has to offer, but only accepts as much as its allocated space allows.
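The size rule in the preceding sentence can be modeled as copying the smaller of the two sizes. The following sketch is one reading of that rule, not the runtime linker's actual code, and copy_reloc_bytes is a hypothetical helper name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of a copy relocation whose sizes can differ.
 * The shared object offers lib_size bytes and the executable has
 * reserved exe_size bytes. The number of bytes copied is the smaller
 * of the two, and that count is returned.
 */
size_t
copy_reloc_bytes(void *exe_addr, size_t exe_size,
    const void *lib_addr, size_t lib_size)
{
        size_t n = (lib_size < exe_size) ? lib_size : exe_size;

        memcpy(exe_addr, lib_addr, n);
        return (n);
}
```

When the shared object's data item has grown, the count equals the executable's size and the tail of the new data is truncated. When the data item has shrunk, the count equals the shared object's size and the remainder of the executable's space is left as it was, which corresponds to the "insufficient data copied" warning.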

Copy relocations can be eliminated by building the application from position-independent code. See Position-Independent Code.