Linker and Libraries Guide

Runtime Allocation of Thread-Local Storage

TLS is created on three occasions during the lifetime of a program.

At program startup.
When a new thread is created.
When a thread references a TLS block for the first time after a shared object is loaded following program startup.

Thread-local data storage is laid out at runtime as illustrated in Figure 8–1.

Figure 8–1 Runtime Storage Layout of Thread-Local Storage

Program Startup

At program startup, the runtime system creates TLS for the main thread.

First, the runtime linker logically combines the TLS templates for all loaded dynamic objects, including the dynamic executable, into a single static template. Each dynamic object's TLS template is assigned an offset within the combined template, tlsoffset_m, as follows.

tlsoffset_1 = round(tlssize_1, align_1)
tlsoffset_{m+1} = round(tlsoffset_m + tlssize_{m+1}, align_{m+1})

Here, tlssize_m and align_m are the size and alignment, respectively, of the allocation template for dynamic object m, where 1 <= m <= M and M is the total number of loaded dynamic objects. The round(offset, align) function returns offset rounded up to the next multiple of align.
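The recurrence above can be sketched in C as follows. This is only an illustration: the function names tls_round and tls_assign_offsets are hypothetical and not part of any runtime linker interface, and array index i stands for dynamic object m = i + 1.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align (align must be nonzero). */
static size_t tls_round(size_t offset, size_t align)
{
    assert(align != 0);
    return (offset + align - 1) / align * align;
}

/*
 * Fill tlsoffset[0..count-1] from the per-object sizes and alignments.
 * Array index i corresponds to dynamic object m = i + 1 in the text.
 * Returns tlsoffset_M, the offset assigned to the last object.
 */
static size_t tls_assign_offsets(const size_t *tlssize, const size_t *align,
                                 size_t *tlsoffset, size_t count)
{
    size_t prev = 0;   /* starting from 0 makes the first step round(tlssize_1, align_1) */

    for (size_t i = 0; i < count; i++) {
        prev = tls_round(prev + tlssize[i], align[i]);
        tlsoffset[i] = prev;
    }
    return prev;
}
```

For example, three objects with sizes 20, 8, and 100 bytes and alignments 8, 4, and 16 receive the offsets 24, 32, and 144.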

Next, the runtime linker computes the allocation size that is required for the startup TLS, tlssize_S. This size is equal to tlsoffset_M, plus an additional 512 bytes. This addition provides a backup reservation for static TLS references. Shared objects that make static TLS references, and are loaded after process initialization, are assigned to this backup reservation. However, this reservation is a fixed, limited size. In addition, this reservation is only capable of providing storage for uninitialized TLS data items. For maximum flexibility, shared objects should reference thread-local variables using a dynamic TLS model.

The static TLS arena, whose size is the calculated tlssize_S, is placed immediately preceding the thread pointer tp_t. Accesses to this TLS data are made by subtracting offsets from tp_t.
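The size calculation and the subtraction-based addressing can be sketched as follows. The helper names are hypothetical. The sketch also assumes that the block for object m begins tlsoffset_m bytes below the thread pointer, which matches the offset formula but is not stated explicitly in this section.

```c
#include <assert.h>
#include <stddef.h>

#define TLS_BACKUP_RESERVATION 512   /* bytes of backup reservation */

/* tlssize_S: the startup allocation size, given tlsoffset_M. */
static size_t tls_startup_size(size_t tlsoffset_M)
{
    return tlsoffset_M + TLS_BACKUP_RESERVATION;
}

/*
 * Address of the byte at var_offset inside object m's static TLS block.
 * Assumption: that block begins tlsoffset_m bytes below the thread pointer.
 */
static void *tls_static_addr(void *tp, size_t tlsoffset_m, size_t var_offset)
{
    assert(var_offset < tlsoffset_m);
    return (char *)tp - tlsoffset_m + var_offset;
}
```

With tlsoffset_M equal to 144, the startup allocation is 656 bytes, and the block of the last object starts 144 bytes below tp.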

The static TLS arena is associated with a linked list of initialization records. Each record in this list describes the TLS initialization image for one loaded dynamic object. Each record contains the following fields.

A pointer to the TLS initialization image.
The size of the TLS initialization image.
The tlsoffset_m of the object.
A flag indicating whether the object uses a static TLS model.

The thread library uses this information to allocate storage for the initial thread. This storage is initialized, and a dynamic TLS vector for the initial thread is created.
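One possible shape for an initialization record, and for the copy step that initializes the static arena, is sketched below. The structure, field, and function names are hypothetical. The sketch assumes that the arena has already been zero-filled and that the block for object m begins at tp - tlsoffset_m.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout of one initialization record. */
struct tls_init_record {
    const void *image;             /* pointer to the TLS initialization image */
    size_t image_size;             /* size of the TLS initialization image */
    size_t tlsoffset;              /* tlsoffset_m of the object */
    int is_static;                 /* nonzero if the object uses a static TLS model */
    struct tls_init_record *next;  /* next record in the linked list */
};

/*
 * Copy the image of every static-model object into its slot below tp.
 * Assumptions: the arena is already zero-filled, and object m's block
 * starts at tp - tlsoffset_m. Returns the number of records processed.
 */
static size_t tls_init_static_arena(void *tp, const struct tls_init_record *list)
{
    size_t count = 0;

    for (const struct tls_init_record *r = list; r != NULL; r = r->next) {
        if (!r->is_static)
            continue;              /* dynamic-model blocks are handled elsewhere */
        assert(r->image_size <= r->tlsoffset);
        if (r->image_size > 0)
            memcpy((char *)tp - r->tlsoffset, r->image, r->image_size);
        count++;
    }
    return count;
}
```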

Thread Creation

For the initial thread, and for each new thread created, the thread library allocates a new TLS block for each loaded dynamic object. Blocks can be allocated separately, or as a single contiguous block.

Each thread t has an associated thread pointer tp_t, which points to the thread control block, TCB. The thread pointer, tp, always contains the value of tp_t for the currently running thread.

The thread library then creates a vector of pointers, dtv_t, for the current thread t. The first element of each vector contains a generation number gen_t, which is used to determine when the vector needs to be extended. See Deferred Allocation of Thread-Local Storage Blocks.

Each remaining element of the vector, dtv_t,m, is a pointer to the block that is reserved for the TLS belonging to dynamic object m.

For dynamically loaded, post-startup objects, the thread library defers the allocation of TLS blocks. Allocation occurs when the first reference is made to a TLS variable within the loaded object. For blocks whose allocation has been deferred, the pointer dtv_t,m is set to an implementation-defined special value.
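A per-thread vector of this kind might be modeled as shown below. All names are hypothetical. The sketch keeps the generation number in a separate field rather than in the first vector element, and it uses the address of a private marker object as a stand-in for the implementation-defined special value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the implementation-defined "deferred" value (an assumption). */
static char dtv_deferred_marker;
#define DTV_DEFERRED ((void *)&dtv_deferred_marker)

/* Hypothetical model of dtv_t. */
struct dtv {
    unsigned long gen;   /* gen_t, the generation number */
    size_t nslots;       /* number of per-object entries */
    void **slot;         /* slot[m - 1] plays the role of dtv_t,m */
};

/*
 * Initialize the vector for a new thread. The first nstartup entries point
 * at blocks that are already allocated. The remaining entries, which belong
 * to post-startup objects, are marked as deferred.
 * Returns 0 on success, or -1 if memory cannot be allocated.
 */
static int dtv_init(struct dtv *v, unsigned long gen,
                    void *const *startup_blocks, size_t nstartup, size_t nslots)
{
    assert(nslots > 0 && nstartup <= nslots);

    v->slot = malloc(nslots * sizeof *v->slot);
    if (v->slot == NULL)
        return -1;
    for (size_t i = 0; i < nslots; i++)
        v->slot[i] = (i < nstartup) ? startup_blocks[i] : DTV_DEFERRED;
    v->gen = gen;
    v->nslots = nslots;
    return 0;
}
```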

Note –

The runtime linker can group the TLS templates for all startup objects so that they share a single element in the vector, dtv_t,1. This grouping does not affect the offset calculations described previously or the creation of the list of initialization records. For the following sections, however, the value of M, the total number of objects, starts at 1.

The thread library then copies the initialization images to the corresponding locations within the new block of storage.

Post-Startup Dynamic Loading

A shared object containing only dynamic TLS can be loaded following process startup without limitations. The runtime linker extends the list of initialization records to include the initialization template of the new object. The new object is given an index of m = M + 1. The counter M is incremented by 1. However, the allocation of new TLS blocks is deferred until the blocks are actually referenced.

When a shared object that contains only dynamic TLS is unloaded, the TLS blocks used by that shared object are freed.

A shared object containing static TLS can be loaded following process startup with limitations. Static TLS references can only be satisfied from any remaining backup TLS reservation. See Program Startup. This reservation is limited in size. In addition, this reservation can only provide storage for uninitialized TLS data items.

A shared object that contains static TLS is never unloaded. The shared object is tagged as non-deletable as a consequence of processing the static TLS.

Deferred Allocation of Thread-Local Storage Blocks

In a dynamic TLS model, when a thread t needs to access a TLS block for object m, the code updates dtv_t and performs the initial allocation of the TLS block. The thread library provides the following interface for dynamic TLS allocation.

typedef struct {
    unsigned long ti_moduleid;
    unsigned long ti_tlsoffset;
} TLS_index;

extern void *__tls_get_addr(TLS_index *ti);     /* SPARC and x64 */
extern void *___tls_get_addr(TLS_index *ti);    /* 32-bit x86 */

Note –

The SPARC and 64-bit x86 definitions of this function have the same function signature. However, the 32-bit x86 version does not use the default calling convention of passing arguments on the stack. Instead, the 32-bit x86 version receives its argument in the %eax register, which is more efficient. To denote that this alternate calling method is used, the 32-bit x86 function name has three leading underscores.

Both versions of tls_get_addr() check the per-thread generation counter, gen_t, to determine whether the vector needs to be updated. If the vector dtv_t is out of date, the routine updates the vector, possibly reallocating the vector to make room for more entries. The routine then checks whether the TLS block corresponding to dtv_t,m has been allocated. If the block has not been allocated, the routine allocates and initializes the block, using the information in the list of initialization records provided by the runtime linker. The pointer dtv_t,m is then set to point to the allocated block. Finally, the routine returns a pointer to the given offset within the block.
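The sequence of steps can be sketched in portable C as follows. This is an illustrative model, not the thread library's implementation. The names sketch_tls_get_addr, struct tls_module, struct tls_dtv, and struct tls_state are hypothetical, an array stands in for the list of initialization records, and NULL stands in for the implementation-defined deferred value. The sketch also assumes that a vector whose generation matches the global generation already has one slot per loaded object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned long ti_moduleid;
    unsigned long ti_tlsoffset;
} TLS_index;

/* Hypothetical per-object record, standing in for an initialization record. */
struct tls_module {
    const void *image;   /* initialization image */
    size_t image_size;   /* bytes to copy from the image */
    size_t block_size;   /* total block size; the tail stays zero-filled */
};

/* Hypothetical per-thread vector; gen plays the role of gen_t. */
struct tls_dtv {
    unsigned long gen;
    size_t nslots;
    void **slot;         /* slot[m - 1] is dtv_t,m; NULL means deferred here */
};

/* Hypothetical state maintained by the runtime linker. */
struct tls_state {
    unsigned long gen;               /* changes whenever an object is loaded */
    size_t nmodules;                 /* M */
    const struct tls_module *module; /* module[m - 1] describes object m */
};

static void *sketch_tls_get_addr(struct tls_dtv *dtv, const struct tls_state *st,
                                 const TLS_index *ti)
{
    size_t m = ti->ti_moduleid;

    assert(m >= 1 && m <= st->nmodules);

    /* Step 1: bring an out-of-date vector up to date, growing it if needed. */
    if (dtv->gen != st->gen) {
        if (dtv->nslots < st->nmodules) {
            void **grown = realloc(dtv->slot, st->nmodules * sizeof *grown);
            if (grown == NULL)
                return NULL;
            for (size_t i = dtv->nslots; i < st->nmodules; i++)
                grown[i] = NULL;     /* new entries start out deferred */
            dtv->slot = grown;
            dtv->nslots = st->nmodules;
        }
        dtv->gen = st->gen;
    }

    /* Step 2: allocate and initialize the block on first reference. */
    if (dtv->slot[m - 1] == NULL) {
        const struct tls_module *mod = &st->module[m - 1];
        char *block = calloc(1, mod->block_size);

        if (block == NULL)
            return NULL;
        if (mod->image_size > 0)
            memcpy(block, mod->image, mod->image_size);
        dtv->slot[m - 1] = block;
    }

    /* Step 3: return the address at the requested offset within the block. */
    return (char *)dtv->slot[m - 1] + ti->ti_tlsoffset;
}
```

In this model, a second call with the same TLS_index returns the same address, because the block is allocated only on the first reference.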