Linker and Libraries Guide

Runtime Allocation of Thread-Local Storage

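TLS is created on three occasions during the lifetime of a program: at program startup, when a new thread is created, and when a thread first references a TLS block that belongs to an object loaded after program startup.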

Thread-local data storage is laid out at runtime as illustrated in Figure 8–1.

Figure 8–1 Runtime Storage Layout of Thread-Local Storage


Program Startup

At program startup, the runtime system creates TLS for the main thread.

First, the runtime linker logically combines the TLS templates for all loaded dynamic objects, including the dynamic executable, into a single static template. Each dynamic object's TLS template is assigned an offset within the combined template, tlsoffset_m, as follows:

tlsoffset_1     = round(tlssize_1, align_1)
tlsoffset_(m+1) = round(tlsoffset_m + tlssize_(m+1), align_(m+1))

tlssize_m and align_m are the size and alignment, respectively, of the allocation template for dynamic object m (1 <= m <= M, where M is the total number of loaded dynamic objects). The round(offset, align) function returns offset rounded up to the next multiple of align. The TLS template is placed immediately preceding the thread pointer tp_t. Addresses of TLS data are computed by subtracting offsets from tp_t.

Next, the runtime linker computes the total startup TLS allocation size, tlssize_S, which is equal to tlsoffset_M.
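The offset assignment can be sketched in C. This is a minimal illustration, not the runtime linker's code. It assumes the recurrence tlsoffset_1 = round(tlssize_1, align_1) and tlsoffset_(m+1) = round(tlsoffset_m + tlssize_(m+1), align_(m+1)); the names tls_round and tls_assign_offsets are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be non-zero). */
static size_t tls_round(size_t offset, size_t align)
{
    assert(align != 0);
    return (offset + align - 1) / align * align;
}

/*
 * Hypothetical helper: fill tlsoffset[0..count-1] from the per-object
 * template sizes and alignments, and return the total startup allocation
 * size. Array index i corresponds to dynamic object m = i + 1.
 */
static size_t tls_assign_offsets(const size_t *tlssize, const size_t *align,
                                 size_t *tlsoffset, size_t count)
{
    size_t prev = 0;    /* assumed starting point: nothing allocated yet */

    for (size_t i = 0; i < count; i++) {
        prev = tls_round(prev + tlssize[i], align[i]);
        tlsoffset[i] = prev;
    }
    return prev;        /* equals the offset of the last object */
}
```

For example, three objects with sizes 12, 40, and 5 bytes and alignments 8, 16, and 4 receive offsets 16, 64, and 72, so the total startup allocation in this sketch is 72 bytes.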

The runtime linker then constructs a linked list of initialization records. Each record in this list describes the TLS initialization image for one loaded dynamic object, and contains the following fields:

The thread library uses this information to allocate storage for the initial thread. This storage is initialized, and a dynamic TLS vector for the initial thread is created.

Thread Creation

For the initial thread, and for each new thread created, the thread library allocates a new TLS block for each loaded dynamic object. Blocks may be allocated separately, or as a single contiguous block.

Each thread t has an associated thread pointer tp_t, which points to the thread control block, TCB. The thread pointer, tp, always contains the value of tp_t for the currently running thread.

The thread library then creates a vector of pointers, dtv_t, for the current thread t. The first element of each vector contains a generation number gen_t, which is used to determine when the vector needs to be extended. See Deferred Allocation of Thread-Local Storage Blocks.

Each remaining element in the vector, dtv_t,m, is a pointer to the block reserved for the TLS belonging to dynamic object m.

For dynamically loaded, post-startup objects, the thread library defers the allocation of TLS blocks. This allocation occurs when the first reference is made to a TLS variable within the loaded object. All references to TLS defined in a post-startup, dynamically loaded object must use a dynamic TLS model. For blocks whose allocation has been deferred, the pointer dtv_t,m is set to an implementation-defined special value.


Note –

The runtime linker may group TLS templates for all startup objects such that they share a single element in the vector, dtv_t,1. This does not affect the offset calculations described above or the creation of the list of initialization records. For the following sections, however, the value of M, the total number of objects, starts with the value of 1.


The thread library then copies the initialization images to the corresponding locations within the new block of storage.
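The thread-creation steps above can be sketched as follows. This is an illustration under stated assumptions, not the thread library's implementation: the record layout (tls_init_rec), the per-thread structure (thread_tls), and the function name are hypothetical, the TCB itself is omitted, all blocks are allocated as a single contiguous area, and alignment of the area is left to malloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical initialization record; field names are illustrative only. */
struct tls_init_rec {
    const void *image;          /* TLS initialization image */
    size_t image_size;          /* bytes to copy from the image */
    size_t block_size;          /* full block size; the tail is zero-filled */
    size_t tlsoffset;           /* offset of the block below the thread pointer */
    struct tls_init_rec *next;
};

/* Hypothetical per-thread state. */
struct thread_tls {
    unsigned char *storage;     /* contiguous TLS area for all startup objects */
    unsigned char *tp;          /* thread pointer: end of the TLS area */
    uintptr_t *dtv;             /* dtv[0] = generation, dtv[m] = block for object m */
};

/* Allocate and initialize TLS for one thread; returns 0 on success. */
static int thread_tls_create(struct thread_tls *t,
                             const struct tls_init_rec *recs,
                             size_t nobjects, size_t tlssize_s, uintptr_t gen)
{
    t->storage = malloc(tlssize_s ? tlssize_s : 1);
    t->dtv = calloc(nobjects + 1, sizeof *t->dtv);
    if (t->storage == NULL || t->dtv == NULL) {
        free(t->storage);
        free(t->dtv);
        return -1;
    }
    t->tp = t->storage + tlssize_s;   /* templates sit just below tp */
    t->dtv[0] = gen;                  /* first element: generation number */

    size_t m = 1;
    for (const struct tls_init_rec *r = recs;
         r != NULL && m <= nobjects; r = r->next, m++) {
        unsigned char *block = t->tp - r->tlsoffset;

        assert(r->image_size <= r->block_size);
        if (r->image_size > 0)
            memcpy(block, r->image, r->image_size);   /* copy the image */
        memset(block + r->image_size, 0, r->block_size - r->image_size);
        t->dtv[m] = (uintptr_t)block;                 /* element for object m */
    }
    return 0;
}
```

The sketch stores each block address in the vector even though the blocks are contiguous, so later lookups do not depend on how the storage was allocated.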

Post-Startup Dynamic Loading

When a shared library containing TLS is loaded following process startup, the runtime linker extends the list of initialization records to include the initialization template of the new library. The new object is given an index of m = M + 1, and the counter M is incremented by one. However, the allocation of new TLS blocks is deferred until they are actually referenced.

When a library containing TLS is unloaded, the TLS blocks used by that library are freed.
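The bookkeeping for a post-startup load can be sketched as follows. The structure and function names are hypothetical. The sketch also assumes that the runtime linker keeps a global generation count and increments it on each load so that per-thread vectors can detect that they are out of date; the text above does not state how the generation number is advanced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical initialization record, reduced to what this sketch needs. */
struct tls_rec {
    size_t moduleid;            /* index m assigned to the object */
    struct tls_rec *next;
};

/* Hypothetical registry kept by the runtime linker. */
struct tls_registry {
    struct tls_rec *head;
    struct tls_rec *tail;
    size_t M;                   /* number of objects with TLS seen so far */
    unsigned long gen;          /* assumed global generation count */
};

/* Append a record for a newly loaded object and return its index. */
static size_t tls_register(struct tls_registry *reg, struct tls_rec *rec)
{
    assert(reg != NULL && rec != NULL);

    rec->moduleid = reg->M + 1; /* m = M + 1 */
    rec->next = NULL;
    if (reg->tail != NULL)
        reg->tail->next = rec;
    else
        reg->head = rec;
    reg->tail = rec;

    reg->M++;                   /* counter M is incremented by one */
    reg->gen++;                 /* assumption: signals stale per-thread vectors */
    return rec->moduleid;       /* no TLS block is allocated here */
}
```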

Deferred Allocation of Thread-Local Storage Blocks

In a dynamic TLS model, when a thread t needs to access a TLS block for object m, the code updates dtv_t and, if the block has not yet been allocated, performs the initial allocation of the TLS block. The thread library provides the following interface for dynamic TLS allocation:

typedef struct {
    unsigned long ti_moduleid;
    unsigned long ti_tlsoffset;
} TLS_index;

extern void * __tls_get_addr(TLS_index * ti);     /* SPARC */
extern void * ___tls_get_addr(TLS_index * ti);    /* x86 */

Note –

The SPARC and x86 definitions of this function have the same function signature. However, the x86 version does not use the default x86 calling convention of passing arguments on the stack. Instead, the x86 version passes its argument in the %eax register, which is more efficient. To denote that this alternate calling method is used, the x86 function name has three leading underscores.


Both __tls_get_addr() and ___tls_get_addr() check the per-thread generation counter, gen_t, to determine whether the vector needs to be updated. If the vector dtv_t is out of date, the routine updates the vector, possibly reallocating it to make room for more entries. The routine then checks whether the TLS block corresponding to dtv_t,m has been allocated. If the block has not been allocated, the routine allocates and initializes it, using the information in the list of initialization records provided by the runtime linker. The pointer dtv_t,m is then set to point to the allocated block. The routine returns a pointer to the given offset within the block.
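The lookup described above can be sketched in portable C. This is not the thread library's implementation. The function is named sketch_tls_get_addr to avoid clashing with the real symbol, and it takes the global and per-thread state as explicit parameters. The structures tls_global, init_rec, and dtv are hypothetical, the records are held in an array rather than a linked list, and NULL stands in for the implementation-defined value that marks a deferred block.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned long ti_moduleid;
    unsigned long ti_tlsoffset;
} TLS_index;

/* Hypothetical initialization record for one object. */
struct init_rec {
    const void *image;          /* initialization image */
    size_t image_size;          /* bytes to copy from the image */
    size_t block_size;          /* full block size; the tail is zeroed */
};

/* Hypothetical state maintained by the runtime linker. */
struct tls_global {
    unsigned long gen;          /* current generation */
    size_t M;                   /* number of objects with TLS */
    const struct init_rec *recs;/* recs[m - 1] describes object m */
};

/* Hypothetical per-thread vector. */
struct dtv {
    unsigned long gen;          /* generation this vector was built for */
    size_t nslots;
    void **slot;                /* slot[m - 1]; NULL means deferred */
};

static void *sketch_tls_get_addr(const struct tls_global *g, struct dtv *d,
                                 const TLS_index *ti)
{
    assert(g != NULL && d != NULL && ti != NULL);
    size_t m = ti->ti_moduleid;
    if (m == 0 || m > g->M)
        return NULL;

    /* Vector out of date: extend it. The size check is a defensive addition. */
    if (d->gen != g->gen || d->nslots < g->M) {
        if (d->nslots < g->M) {
            void **grown = realloc(d->slot, g->M * sizeof *grown);
            if (grown == NULL)
                return NULL;
            for (size_t i = d->nslots; i < g->M; i++)
                grown[i] = NULL;            /* new entries start deferred */
            d->slot = grown;
            d->nslots = g->M;
        }
        d->gen = g->gen;
    }

    /* First reference from this thread: allocate and initialize the block. */
    if (d->slot[m - 1] == NULL) {
        const struct init_rec *r = &g->recs[m - 1];
        unsigned char *block = calloc(1, r->block_size ? r->block_size : 1);
        if (block == NULL)
            return NULL;
        if (r->image_size > 0)
            memcpy(block, r->image, r->image_size);
        d->slot[m - 1] = block;
    }
    return (unsigned char *)d->slot[m - 1] + ti->ti_tlsoffset;
}
```

A second call for the same object in the same thread finds the slot already populated and only performs the final address computation.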