Linker and Libraries Guide

Chapter 8 Thread-Local Storage

The compilation environment supports the declaration of thread-local data. This data is sometimes referred to as thread-specific or thread-private data, but more typically by the acronym TLS. When variables are declared thread-local, the compiler automatically arranges for these variables to be allocated on a per-thread basis.

The built-in support for this feature serves three purposes:

C/C++ Programming Interface

Variables are declared thread-local using the __thread keyword, as in the following examples:

__thread int i;
__thread char *p;
__thread struct state s;
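
As a minimal sketch of what per-thread allocation means in practice, the following C code runs two threads that each increment the same __thread variable. It assumes a compiler that accepts the __thread keyword (the Solaris compilers, GCC, and Clang all do) and POSIX threads; the names counter, count_up, run_two_counters, and main_thread_counter are hypothetical. Because each thread updates only its own copy, no locking is needed, both threads report exactly n, and the calling thread's copy is untouched.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static __thread int counter;   /* one instance per thread, starts at 0 */

struct job {
    int n;        /* increments to perform */
    int result;   /* the worker's final counter value */
};

static void *count_up(void *arg)
{
    struct job *job = arg;
    for (int i = 0; i < job->n; i++)
        counter++;              /* touches only this thread's copy */
    job->result = counter;
    return NULL;
}

/* Run two workers concurrently; returns 0 on success, -1 on failure. */
int run_two_counters(int n, int results[2])
{
    pthread_t tid[2];
    struct job jobs[2] = { { n, 0 }, { n, 0 } };
    int created = 0;

    while (created < 2 &&
           pthread_create(&tid[created], NULL, count_up, &jobs[created]) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    if (created < 2)
        return -1;
    results[0] = jobs[0].result;
    results[1] = jobs[1].result;
    return 0;
}

/* The calling thread's own copy, unaffected by the workers. */
int main_thread_counter(void)
{
    return counter;
}
```

With an ordinary global int in place of the __thread variable, the two workers would race and the results would vary from run to run.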

During loop optimizations, the compiler may choose to create thread-local temporaries as needed.

Applicability

The __thread keyword may be applied to any global, file-scoped static, or function-scoped static variable. It has no effect on automatic variables, which are always thread-local.

Initialization

In C++, a thread-local variable may not be initialized if the initialization requires a static constructor. Otherwise, a thread-local variable may be initialized to any value that would be legal for an ordinary static variable.

No variable, thread-local or otherwise, may be statically initialized to the address of a thread-local variable.
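
The following sketch, under the same assumptions (a compiler accepting __thread, POSIX threads), shows an initializer that is allowed, lists two that are not as comments, and checks that a newly created thread starts from the initial value rather than from another thread's current value. The names tls_limit, read_fresh_copy, and value_seen_by_new_thread are illustrative.

```c
#include <assert.h>
#include <pthread.h>

/* Allowed: a constant initializer, as for an ordinary static variable.
 * The value 42 becomes part of the TLS initialization image. */
__thread int tls_limit = 42;

/* Not allowed (rejected by the compiler, so shown only as comments):
 *   __thread int *tls_ptr = &tls_limit;   address of a thread-local variable
 *   int *plain_ptr = &tls_limit;          same rule for non-TLS variables
 */

static void *read_fresh_copy(void *out)
{
    *(int *)out = tls_limit;   /* a new thread starts from the initial value */
    return NULL;
}

/* Change the caller's copy, then report what a brand-new thread sees.
 * Returns -1 if the thread cannot be created. */
int value_seen_by_new_thread(int new_value_here)
{
    int seen = -1;
    pthread_t tid;

    tls_limit = new_value_here;   /* affects only the calling thread */
    if (pthread_create(&tid, NULL, read_fresh_copy, &seen) != 0)
        return -1;
    pthread_join(tid, NULL);
    return seen;
}
```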

Binding

Thread-local variables may be declared and referenced externally, and they are subject to the same interposition rules as normal symbols.

Dynamic loading restrictions

A shared library can be dynamically loaded during process startup, or after process startup via lazy loading, filters, or dlopen(3DL). A shared library containing a reference to a thread-local variable may be loaded post-startup if every translation unit containing the reference is compiled with a dynamic TLS model.

Static TLS models generate faster code. However, code compiled to use these models cannot reference thread-local variables in post-startup dynamically loaded libraries. A dynamic TLS model is able to reference all TLS. These models are described in Thread-Local Storage Access Models.

Address-of operator

The address-of operator, &, can be applied to a thread-local variable. This operator is evaluated at runtime, and returns the address of the variable within the current thread. The address obtained by this operator may be used freely by any thread in the process as long as the thread that evaluated the address remains in existence. When a thread terminates, any pointers to thread-local variables in that thread become invalid.
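
A minimal sketch of these rules, assuming POSIX threads and a compiler that accepts __thread; slot, worker, and address_of_demo are hypothetical names. The main thread hands its own &slot to a worker, which compares it with its own &slot and then writes through the pointer while the main thread is still running. The comparison is made inside the worker, while both instances still exist.

```c
#include <assert.h>
#include <pthread.h>

__thread int slot;   /* one instance per thread */

struct handoff {
    int *target;     /* the main thread's &slot */
    int distinct;    /* 1 if the worker's &slot differs from target */
};

static void *worker(void *arg)
{
    struct handoff *h = arg;
    h->distinct = (&slot != h->target);  /* &slot names this thread's copy */
    *h->target = 123;   /* valid: the thread that owns it is still alive */
    return NULL;
}

/* Returns 1 if the main thread's copy was updated through the pointer and
 * the two threads' instances had different addresses. */
int address_of_demo(void)
{
    struct handoff h = { &slot, 0 };
    pthread_t tid;

    slot = 0;
    if (pthread_create(&tid, NULL, worker, &h) != 0)
        return 0;
    pthread_join(tid, NULL);
    return slot == 123 && h.distinct;
}
```

The reverse direction would not be safe: once the worker has terminated, any pointer to its slot is invalid.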

When dlsym(3DL) is used to obtain the address of a thread-local variable, the address returned is the address of the instance of that variable in the thread that called dlsym().

Thread-Local Storage Section

Separate copies of thread-local data, allocated at compile-time, must be associated with individual threads of execution. To provide this data, TLS sections are used to specify the size and initial contents.

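The compilation environment allocates TLS in sections identified with the SHF_TLS flag. These sections provide initialized and uninitialized TLS based on how the storage is declared:

- .tdata (type SHT_PROGBITS, flags SHF_ALLOC + SHF_WRITE + SHF_TLS): initialized TLS
- .tbss (type SHT_NOBITS, flags SHF_ALLOC + SHF_WRITE + SHF_TLS): uninitialized TLS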

The uninitialized section is allocated immediately following any initialized sections, subject to padding for proper alignment. Together, the combined sections form a TLS template that is used to allocate TLS whenever a new thread is created.
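
The layout rule can be written as a small calculation. This sketch assumes the conventional section names .tdata (initialized) and .tbss (uninitialized) and power-of-two alignments; struct tls_template and layout_template are hypothetical names, not a system interface. The resulting fields correspond to the p_filesz, p_memsz, and p_align members of the PT_TLS entry described below.

```c
#include <assert.h>
#include <stddef.h>

/* round(offset, align): next multiple of align (a power of two). */
static size_t round_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical summary of one object's TLS template. */
struct tls_template {
    size_t filesz;   /* initialized bytes: becomes p_filesz */
    size_t memsz;    /* initialized + padding + uninitialized: p_memsz */
    size_t align;    /* strictest alignment: p_align */
};

struct tls_template layout_template(size_t tdata_size, size_t tdata_align,
                                    size_t tbss_size, size_t tbss_align)
{
    struct tls_template t;
    size_t tbss_start = round_up(tdata_size, tbss_align); /* pad for .tbss */

    t.filesz = tdata_size;
    t.memsz = tbss_start + tbss_size;
    t.align = tdata_align > tbss_align ? tdata_align : tbss_align;
    return t;
}
```

For example, 12 bytes of .tdata followed by 16 bytes of 8-byte-aligned .tbss gives 4 bytes of padding and a 32-byte template.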

The initialized portion of this template is called the TLS initialization image. All relocations generated as a result of initialized thread-local variables are applied to this template. These relocated values are then used when a new thread requires the initial values.

TLS symbols have the symbol type STT_TLS. These symbols are assigned offsets relative to the beginning of the TLS template. The actual virtual address associated with these symbols is irrelevant. The address refers only to the template, and not to the per-thread copy of each data item.

In dynamic executables and shared objects, the st_value field of a STT_TLS symbol contains the assigned offset for defined symbols, or zero for undefined symbols.

Several relocations are defined to support access to TLS. See SPARC: Thread-Local Storage Relocation Types and x86: Thread-Local Storage Relocation Types. Symbols of type STT_TLS are only referenced by TLS relocations. TLS relocations only reference symbols of type STT_TLS.

In dynamic executables and shared objects, a PT_TLS program header entry describes a TLS template, and has the following members:

Table 8–1 ELF PT_TLS Program Header Entry

Member      Value
p_offset    File offset of the TLS initialization image
p_vaddr     Virtual memory address of the TLS initialization image
p_paddr     Reserved
p_filesz    Size of the TLS initialization image
p_memsz     Total size of the TLS template
p_flags     PF_R
p_align     Alignment of the TLS template
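
On systems that provide dl_iterate_phdr(3), a program can inspect its own PT_TLS entry. The sketch below assumes a glibc-based system (it relies on glibc's <link.h> and its ElfW() macro, which may not exist in the same form elsewhere); tls_probe, grab_pt_tls, and main_exe_pt_tls are hypothetical names. Defining one initialized __thread variable guarantees the executable has a TLS template to find.

```c
#define _GNU_SOURCE   /* glibc: request GNU extensions before any include */
#include <assert.h>
#include <link.h>     /* dl_iterate_phdr, struct dl_phdr_info, ElfW, PT_TLS */
#include <stddef.h>

__thread int tls_probe = 1;   /* makes sure the executable has a PT_TLS */

/* dl_iterate_phdr visits the main program first; copy its PT_TLS entry
 * (if any) and return nonzero to stop the iteration there. */
static int grab_pt_tls(struct dl_phdr_info *info, size_t size, void *data)
{
    ElfW(Phdr) *out = data;
    (void)size;
    for (size_t i = 0; i < info->dlpi_phnum; i++) {
        if (info->dlpi_phdr[i].p_type == PT_TLS) {
            *out = info->dlpi_phdr[i];
            break;
        }
    }
    return 1;
}

/* Returns 1 and fills *out if the running executable has a PT_TLS entry. */
int main_exe_pt_tls(ElfW(Phdr) *out)
{
    out->p_type = PT_NULL;
    dl_iterate_phdr(grab_pt_tls, out);
    return out->p_type == PT_TLS;
}
```

Because tls_probe is initialized, p_filesz is nonzero; any uninitialized TLS in the executable only increases p_memsz.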

Runtime Allocation of Thread-Local Storage

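TLS is created on three occasions during the lifetime of a program:

- At program startup.
- When a new thread is created.
- When a thread references a TLS block for the first time after a shared object is loaded following program startup.

Thread-local data storage is laid out at runtime as illustrated in Figure 8–1.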

Figure 8–1 Runtime Storage Layout of Thread-Local Storage


Program Startup

At program startup, the runtime system creates TLS for the main thread.

First, the runtime linker logically combines the TLS templates for all loaded dynamic objects, including the dynamic executable, into a single static template. Each dynamic object's TLS template is assigned an offset within the combined template, $tlsoffset_m$, as follows:

$$tlsoffset_1 = round(tlssize_1, align_1)$$
$$tlsoffset_{m+1} = round(tlsoffset_m + tlssize_{m+1}, align_{m+1})$$

$tlssize_m$ and $align_m$ are the size and alignment, respectively, of the allocation template for dynamic object $m$ ($1 \le m \le M$, where $M$ is the total number of loaded dynamic objects). The $round(offset, align)$ function returns offset rounded up to the next multiple of align. The TLS template is placed immediately preceding the thread pointer $tp_t$, and addresses of TLS data are computed by subtracting offsets from $tp_t$.

Next, the runtime linker computes the total startup TLS allocation size, $tlssize_S$, which is equal to $tlsoffset_M$.
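
The offset assignment can be checked with a direct transcription of the rule tlsoffset_1 = round(tlssize_1, align_1) and tlsoffset_{m+1} = round(tlsoffset_m + tlssize_{m+1}, align_{m+1}). In this sketch, arrays are indexed from 0 while the text numbers objects from 1, alignments are assumed to be powers of two, and assign_tls_offsets is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* round(offset, align): next multiple of align (a power of two). */
static size_t round_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Fill tlsoffset[] for count objects and return tlssize_S (= tlsoffset_M).
 * Object m's block starts at tp - tlsoffset[m]. */
size_t assign_tls_offsets(const size_t *tlssize, const size_t *align,
                          size_t *tlsoffset, size_t count)
{
    size_t off = 0;
    for (size_t m = 0; m < count; m++) {
        /* First object: round(tlssize, align), since off starts at 0;
         * later objects add to the previous offset before rounding. */
        off = round_up(off + tlssize[m], align[m]);
        tlsoffset[m] = off;
    }
    return off;
}
```

For three objects of sizes 8, 4, and 16 with matching alignments, the offsets are 8, 12, and 32 (28 rounded up to 16), so the static TLS area is 32 bytes.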

The runtime linker then constructs a linked list of initialization records. Each record in this list describes the TLS initialization image for one loaded dynamic object, and contains the following fields:

The thread library uses this information to allocate storage for the initial thread. This storage is initialized, and a dynamic TLS vector for the initial thread is created.

Thread Creation

For the initial thread, and for each new thread created, the thread library allocates a new TLS block for each loaded dynamic object. Blocks may be allocated separately, or as a single contiguous block.

Each thread t has an associated thread pointer $tp_t$, which points to the thread control block (TCB). The thread pointer, tp, always contains the value of $tp_t$ for the currently running thread.

The thread library then creates a vector of pointers, $dtv_t$, for the current thread t. The first element of each vector contains a generation number, $gen_t$, which is used to determine when the vector needs to be extended. See Deferred Allocation of Thread-Local Storage Blocks.

Each remaining element in the vector, $dtv_{t,m}$, is a pointer to the block reserved for the TLS belonging to dynamic object m.

For dynamically loaded, post-startup objects, the thread library defers the allocation of TLS blocks. This allocation occurs when the first reference is made to a TLS variable within the loaded object. All references to TLS defined in a post-startup, dynamically loaded object must use a dynamic TLS model. For blocks whose allocation has been deferred, the pointer $dtv_{t,m}$ is set to an implementation-defined special value.


Note –

The runtime linker may group TLS templates for all startup objects so that they share a single element in the vector, $dtv_{t,1}$. This grouping does not affect the offset calculations described above or the creation of the list of initialization records. For the following sections, however, the value of M, the total number of objects, starts with the value 1.


The thread library then copies the initialization images to the corresponding locations within the new block of storage.
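
The copy step can be sketched as follows. The record layout (struct tls_init_record) and the function init_static_tls are assumptions for illustration, not the thread library's actual data structures. Each record places its object's block at tp - tlsoffset, copies the initialization image, and zero-fills the rest of the template.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical initialization record for one startup object. */
struct tls_init_record {
    const void *image;                   /* TLS initialization image */
    size_t filesz;                       /* size of the image */
    size_t memsz;                        /* size of the whole template */
    size_t tlsoffset;                    /* tlsoffset_m: distance below tp */
    const struct tls_init_record *next;  /* next record in the list */
};

/* Initialize the static TLS area that ends at thread pointer tp. */
void init_static_tls(unsigned char *tp, const struct tls_init_record *rec)
{
    for (; rec != NULL; rec = rec->next) {
        unsigned char *block = tp - rec->tlsoffset;   /* object's block */
        assert(rec->filesz <= rec->memsz && rec->memsz <= rec->tlsoffset);
        memcpy(block, rec->image, rec->filesz);         /* initialized part */
        memset(block + rec->filesz, 0, rec->memsz - rec->filesz); /* rest */
    }
}
```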

Post-Startup Dynamic Loading

When a shared library containing TLS is loaded following process startup, the runtime linker extends the list of initialization records to include the initialization template of the new library. The new object is given an index of m = M + 1, and the counter M is incremented by one. However, the allocation of new TLS blocks is deferred until they are actually referenced.

When a library containing TLS is unloaded, the TLS blocks used by that library are freed.

Deferred Allocation of Thread-Local Storage Blocks

In a dynamic TLS model, when a thread t needs to access a TLS block for object m, the code updates $dtv_t$ if necessary and performs the initial allocation of the TLS block. The thread library provides the following interface for dynamic TLS allocation:

typedef struct {
    unsigned long ti_moduleid;
    unsigned long ti_tlsoffset;
} TLS_index;

extern void * __tls_get_addr(TLS_index * ti);     /* SPARC */
extern void * ___tls_get_addr(TLS_index * ti);    /* x86 */

Note –

The SPARC and x86 definitions of this function have the same function signature. However, the x86 version does not use the default x86 calling convention of passing arguments on the stack. Instead, the x86 version passes its argument in the %eax register, which is more efficient. To denote that this alternate calling convention is used, the x86 function name has three leading underscores.


Both versions of tls_get_addr() check the per-thread generation counter, $gen_t$, to determine whether the vector needs to be updated. If the vector $dtv_t$ is out of date, the routine updates the vector, possibly reallocating it to make room for more entries. The routine then checks whether the TLS block corresponding to $dtv_{t,m}$ has been allocated. If it has not been allocated, the routine allocates and initializes the block, using the information in the list of initialization records provided by the runtime linker. The pointer $dtv_{t,m}$ is set to point to the allocated block. The routine returns a pointer to the given offset within the block.
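
The logic of tls_get_addr() described above can be modeled in a few dozen lines of C. This is a hypothetical, single-threaded model, not the thread library's implementation: register_module, sketch_tls_get_addr, release_thread, struct module, and struct thread are invented names, the thread is passed explicitly instead of being found through the thread pointer, the deferred-allocation marker is simply 0, and unloading is not modeled. Only the TLS_index structure comes from the interface shown above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    unsigned long ti_moduleid;
    unsigned long ti_tlsoffset;
} TLS_index;

/* Hypothetical runtime-linker state: one initialization record per module. */
#define MAX_MODULES 16
struct module {
    const void *image;   /* TLS initialization image */
    size_t filesz;       /* bytes copied from the image */
    size_t memsz;        /* total block size; the rest is zero-filled */
};
static struct module modules[MAX_MODULES + 1];  /* modules[1..M] */
static unsigned long max_module;                /* M */
static uintptr_t generation;                    /* bumped whenever M grows */

/* Loading an object with TLS: it becomes module M + 1 and M is incremented. */
unsigned long register_module(const void *image, size_t filesz, size_t memsz)
{
    assert(max_module < MAX_MODULES && filesz <= memsz);
    modules[++max_module] = (struct module){ image, filesz, memsz };
    generation++;
    return max_module;
}

/* Per-thread state: dtv[0] holds gen_t, dtv[m] the block for module m
 * (0 means "not yet allocated"). */
struct thread {
    uintptr_t *dtv;
    size_t dtv_len;
};

void *sketch_tls_get_addr(struct thread *t, const TLS_index *ti)
{
    unsigned long m = ti->ti_moduleid;
    assert(m >= 1 && m <= max_module);

    /* 1. If the generation changed, grow the vector to cover all modules. */
    if (t->dtv == NULL || t->dtv[0] != generation) {
        size_t want = max_module + 1;
        if (want > t->dtv_len) {
            uintptr_t *nv = realloc(t->dtv, want * sizeof *nv);
            if (nv == NULL)
                return NULL;
            memset(nv + t->dtv_len, 0, (want - t->dtv_len) * sizeof *nv);
            t->dtv = nv;
            t->dtv_len = want;
        }
        t->dtv[0] = generation;
    }

    /* 2. Deferred allocation: create and initialize the block on first use. */
    if (t->dtv[m] == 0) {
        const struct module *mod = &modules[m];
        unsigned char *block = malloc(mod->memsz ? mod->memsz : 1);
        if (block == NULL)
            return NULL;
        memcpy(block, mod->image, mod->filesz);
        memset(block + mod->filesz, 0, mod->memsz - mod->filesz);
        t->dtv[m] = (uintptr_t)block;
    }

    /* 3. Return the variable's address within this thread's block. */
    return (unsigned char *)t->dtv[m] + ti->ti_tlsoffset;
}

/* Free everything a (terminating) thread allocated. */
void release_thread(struct thread *t)
{
    for (size_t i = 1; i < t->dtv_len; i++)
        free((void *)t->dtv[i]);
    free(t->dtv);
    t->dtv = NULL;
    t->dtv_len = 0;
}
```

Registering a module after a thread has already built its vector bumps the generation, so that thread's next call grows the vector while keeping the blocks it already owns.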

Thread-Local Storage Access Models

Each TLS reference follows one of the following access models. These models are listed from the most general, but least optimized, to the fastest, but most restrictive.

General Dynamic (GD) - dynamic TLS

This model allows references to all TLS variables, from either a shared object or a dynamic executable. This model also supports the deferred allocation of a TLS block when the block is first referenced from a specific thread.

Local Dynamic (LD) - dynamic TLS of local symbols

This model is an optimization of the GD model. If the compiler determines that a variable is bound locally, or protected, within the dynamic object being built, it instructs the link-editor to statically bind the dynamic tlsoffset and use this model. This model provides a performance benefit over the GD model: only one call to tls_get_addr() is required per function, to determine the address of $dtv_{0,m}$. The dynamic TLS offset, bound at link-edit time, is added to the $dtv_{0,m}$ address for each reference.

Initial Executable (IE) - static TLS with assigned offsets

This model can only reference TLS variables that are available as part of the initial static TLS template. This template is composed of all TLS blocks available at process startup. In this model, the thread pointer-relative offset for a given variable x is stored in the global offset table entry for x. This model cannot reference TLS variables from shared libraries loaded after initial process startup, such as via lazy loading, filters, or dlopen(3DL). This model cannot access TLS blocks that use deferred allocation.

Local Executable (LE) - static TLS

This model can only reference TLS variables that are part of the TLS block of the dynamic executable itself. The link-editor calculates the thread pointer-relative offsets statically, without the need for dynamic relocations or the extra reference to the global offset table. This model cannot be used to reference variables outside of the dynamic executable.
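
GCC and Clang let a declaration request one of these four models with the tls_model variable attribute. This is an assumption about those compilers, not part of the Solaris compilation environment described here, and the GCC documentation notes that the compiler may still choose a more efficient model than the one requested. The variable names and sum_all_models are illustrative.

```c
#include <assert.h>

/* GCC/Clang syntax; each attribute requests one access model. */
__thread int gd_var __attribute__((tls_model("global-dynamic"))) = 1;
static __thread int ld_var __attribute__((tls_model("local-dynamic"))) = 2;
__thread int ie_var __attribute__((tls_model("initial-exec"))) = 3;
static __thread int le_var __attribute__((tls_model("local-exec"))) = 4;

/* Read all four; the generated access code for each depends on its model. */
int sum_all_models(void)
{
    return gd_var + ld_var + ie_var + le_var;
}
```

Linked into a dynamic executable, all four references resolve, because every variable is defined in the executable itself; the local-exec request would not be valid in a shared object.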

The link-editor can transition code from the more general access models to the more optimized models, if it determines that doing so is appropriate. This transitioning is made possible by unique TLS relocations. These relocations not only request that updates be performed, but also identify which TLS access model is being used.

Knowing the TLS access model, and the type of object being created, allows the link-editor to perform translations. For example, if a relocatable object using the GD access model is being linked into a dynamic executable, the link-editor can transition the references to use the IE or LE access models, as appropriate. The relocations required for the new model are then performed.
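
The decision described above can be summarized as a small function. This is a simplified sketch under stated assumptions: transition and enum tls_model are hypothetical, and the function only considers whether the output is a dynamic executable and whether the referenced symbol is defined in that output. The transitions a link-editor actually performs are those shown in Figure 8–2.

```c
#include <assert.h>
#include <stdbool.h>

enum tls_model { TLS_GD, TLS_LD, TLS_IE, TLS_LE };

/* Hypothetical: the most optimized model usable for one reference.
 * building_executable: the output is a dynamic executable.
 * defined_in_output:   the referenced symbol is defined in that output. */
enum tls_model transition(enum tls_model in, bool building_executable,
                          bool defined_in_output)
{
    if (!building_executable)
        return in;        /* shared object: keep the compiler's model */
    if (defined_in_output)
        return TLS_LE;    /* offset from the thread pointer is fixed now */
    if (in == TLS_GD)
        return TLS_IE;    /* external, but part of the startup template */
    return in;            /* otherwise unchanged (IE stays IE) */
}
```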

The following diagram illustrates the different access models, and when one model can be transitioned to another.

Figure 8–2 Thread-Local Storage Access Models and Transitions


SPARC: Thread-Local Variable Access

On SPARC, the following code sequence models are available for accessing thread-local variables.

SPARC: 32-bit and 64-bit General Dynamic (GD)

This code sequence is the most general, and can be included in both shared objects and dynamic executables. This code sequence can also reference an external TLS variable in either a shared object or dynamic executable.

Table 8–2 SPARC: 32-bit and 64-bit General Dynamic Thread-Local Variable Access Codes

Code Sequence                           Initial Relocations     Symbol

# %l7 - initialized to GOT pointer

0x00 sethi %hi(@dtlndx(x)), %o0         R_SPARC_TLS_GD_HI22     x
0x04 add   %o0,%lo(@dtlndx(x)),%o0      R_SPARC_TLS_GD_LO10     x
0x08 add   %l7, %o0, %o0                R_SPARC_TLS_GD_ADD      x
0x0c call  x@TLSPLT                     R_SPARC_TLS_GD_CALL     x

# %o0 - contains address of TLS variable

Outstanding Relocations: 32-bit                                 Symbol

GOT[n]                                  R_SPARC_TLS_DTPMOD32    x
GOT[n + 1]                              R_SPARC_TLS_DTPOFF32    x

Outstanding Relocations: 64-bit                                 Symbol

GOT[n]                                  R_SPARC_TLS_DTPMOD64    x
GOT[n + 1]                              R_SPARC_TLS_DTPOFF64    x

The sethi and add instructions generate R_SPARC_TLS_GD_HI22 and R_SPARC_TLS_GD_LO10 relocations, respectively. These relocations instruct the link-editor to allocate space in the global offset table to hold a TLS_index structure for variable x. The link-editor processes these relocations by substituting the GOT-relative offset for the new GOT entry.

Since the load object index and the TLS block offset for x are not known until runtime, the link-editor places the R_SPARC_TLS_DTPMOD32 and R_SPARC_TLS_DTPOFF32 relocations (R_SPARC_TLS_DTPMOD64 and R_SPARC_TLS_DTPOFF64 for 64-bit objects) against the GOT for processing by the runtime linker.

The second add instruction causes the generation of the R_SPARC_TLS_GD_ADD relocation. This relocation is used only if the link-editor changes the GD code sequence into another sequence.

The call instruction generates the R_SPARC_TLS_GD_CALL relocation. This relocation instructs the link-editor to bind the call to the __tls_get_addr() function, and associates the call instruction with the GD code sequence.
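
The R_SPARC_TLS_GD_HI22 and R_SPARC_TLS_GD_LO10 calculations in Table 8–7 split a value into a 22-bit part for sethi and a 10-bit part for the following add. The sketch below, with the hypothetical helper names hi22, lo10, and sethi_add, checks that the pair reassembles any non-negative 32-bit GOT offset.

```c
#include <assert.h>
#include <stdint.h>

/* Field values from Table 8-7 for a non-negative 32-bit value v. */
uint32_t hi22(uint32_t v) { return v >> 10; }     /* goes into sethi */
uint32_t lo10(uint32_t v) { return v & 0x3ff; }   /* goes into add   */

/* Simulate "sethi %hi(v), r; add r, %lo(v), r" (32-bit view). */
uint32_t sethi_add(uint32_t imm22, uint32_t imm10)
{
    uint32_t r = imm22 << 10;   /* sethi: bits 31..10 set, bits 9..0 cleared */
    return r + imm10;           /* imm10 < 0x400 is a positive simm13 */
}
```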


Note –

The add instruction must appear before the call instruction. It cannot be placed into the delay slot for the call. This is required because the code transformations that can occur later require a known order.

The register used as the GOT-pointer for the add instruction tagged by the R_SPARC_TLS_GD_ADD relocation must be the first register in the add instruction. This permits the link-editor to identify the GOT-pointer register during a code transformation.


SPARC: 32-bit and 64-bit Local Dynamic (LD)

This code sequence can be used in either a shared object or dynamic executable. This sequence is used when referencing a TLS variable bound within the same object as the reference. Because the dynamic tlsoffset can be bound at link-edit time, only one call to __tls_get_addr() is required per function call for all symbols referenced via the LD code sequence.

Table 8–3 SPARC: 32-bit and 64-bit Local Dynamic Thread-Local Variable Access Codes

Code Sequence                           Initial Relocations     Symbol

# %l7 - initialized to GOT pointer

0x00 sethi %hi(@tmndx(x1)), %o0         R_SPARC_TLS_LDM_HI22    x1
0x04 add   %o0,%lo(@tmndx(x1)),%o0      R_SPARC_TLS_LDM_LO10    x1
0x08 add   %l7, %o0, %o0                R_SPARC_TLS_LDM_ADD     x1
0x0c call  x@TLSPLT                     R_SPARC_TLS_LDM_CALL    x1

# %o0 - contains address of TLS block of current object

0x10 sethi %hi(@dtpoff(x1)), %l1        R_SPARC_TLS_LDO_HIX22   x1
0x14 xor   %l1, %lo(@dtpoff(x1)), %l1   R_SPARC_TLS_LDO_LOX10   x1
0x18 add   %o0, %l1, %l1                R_SPARC_TLS_LDO_ADD     x1

# %l1 - contains address of local TLS variable x1

0x20 sethi %hi(@dtpoff(x2)), %l2        R_SPARC_TLS_LDO_HIX22   x2
0x24 xor   %l2, %lo(@dtpoff(x2)), %l2   R_SPARC_TLS_LDO_LOX10   x2
0x28 add   %o0, %l2, %l2                R_SPARC_TLS_LDO_ADD     x2

# %l2 - contains address of local TLS variable x2

Outstanding Relocations: 32-bit                                 Symbol

GOT[n]                                  R_SPARC_TLS_DTPMOD32    x1
GOT[n + 1]                              <none>

Outstanding Relocations: 64-bit                                 Symbol

GOT[n]                                  R_SPARC_TLS_DTPMOD64    x1
GOT[n + 1]                              <none>

The first sethi and add instructions generate R_SPARC_TLS_LDM_HI22 and R_SPARC_TLS_LDM_LO10 relocations, respectively. These relocations instruct the link-editor to allocate space in the global offset table to hold a TLS_index structure for the current object. The link-editor processes these relocations by substituting the GOT-relative offset for the new GOT entry.

Since the load object index is not known until runtime, an R_SPARC_TLS_DTPMOD32 relocation (R_SPARC_TLS_DTPMOD64 for 64-bit objects) is created, and the ti_tlsoffset field of the TLS_index structure is zero filled.

The second add and the call instruction are tagged with the R_SPARC_TLS_LDM_ADD and R_SPARC_TLS_LDM_CALL relocations respectively.

The following sethi and xor instructions generate the R_SPARC_TLS_LDO_HIX22 and R_SPARC_TLS_LDO_LOX10 relocations, respectively. The TLS offset for each local symbol is known at link-edit time, so these values are filled in directly. The add instruction is tagged with the R_SPARC_TLS_LDO_ADD relocation.

When a procedure references more than one local symbol, the compiler generates code to obtain the base address of the TLS block once. This base address is then used to calculate the address of each symbol without a separate library call.


Note –

The register containing the TLS object address in the add instruction tagged by the R_SPARC_TLS_LDO_ADD relocation must be the first register in the instruction. This permits the link-editor to identify the register during a code transformation.


SPARC: 32-bit Initial Executable (IE)

This code sequence can only be used in a dynamic executable. It can reference a TLS variable defined in either the executable or any shared libraries loaded at process startup. This model cannot reference TLS variables from shared libraries loaded after process startup.

Table 8–4 SPARC: 32-bit Initial Executable Thread-Local Variable Access Codes

Code Sequence                           Initial Relocations     Symbol

# %l7 - initialized to GOT pointer

0x00 sethi %hi(@tpoff(x)), %o0          R_SPARC_TLS_IE_HI22     x
0x04 or    %o0,%lo(@tpoff(x)),%o0       R_SPARC_TLS_IE_LO10     x
0x08 ld    [%l7 + %o0], %o0             R_SPARC_TLS_IE_LD       x
0x0c add   %g7, %o0, %o0                R_SPARC_TLS_IE_ADD      x

# %o0 - contains address of TLS variable

Outstanding Relocations                                         Symbol

GOT[n]                                  R_SPARC_TLS_TPOFF32     x

The sethi and or instructions generate R_SPARC_TLS_IE_HI22 and R_SPARC_TLS_IE_LO10 relocations, respectively. These relocations instruct the link-editor to create space in the global offset table to store the static TLS offset for symbol x. A R_SPARC_TLS_TPOFF32 relocation is left outstanding against the GOT for the runtime linker to fill in with the negative static TLS offset for symbol x. The ld and the add instructions are tagged with the R_SPARC_TLS_IE_LD and R_SPARC_TLS_IE_ADD relocations respectively.


Note –

The register used as the GOT-pointer for the add instruction tagged by the R_SPARC_TLS_IE_ADD relocation must be the first register in the instruction. This permits the link-editor to identify the GOT-pointer register during a code transformation.


SPARC: 64-bit Initial Executable (IE)

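This sequence is identical to the SPARC 32-bit sequence, except that an ldx instruction, tagged with the R_SPARC_TLS_IE_LDX relocation, is used to load the 64-bit offset instead of an ld instruction, and the outstanding GOT relocation is R_SPARC_TLS_TPOFF64.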

Table 8–5 SPARC: 64-bit Initial Executable Thread-Local Variable Access Codes

Code Sequence                           Initial Relocations     Symbol

# %l7 - initialized to GOT pointer

0x00 sethi %hi(@tpoff(x)), %o0          R_SPARC_TLS_IE_HI22     x
0x04 or    %o0,%lo(@tpoff(x)),%o0       R_SPARC_TLS_IE_LO10     x
0x08 ldx   [%l7 + %o0], %o0             R_SPARC_TLS_IE_LDX      x
0x0c add   %g7, %o0, %o0                R_SPARC_TLS_IE_ADD      x

# %o0 - contains address of TLS variable

Outstanding Relocations                                         Symbol

GOT[n]                                  R_SPARC_TLS_TPOFF64     x

SPARC: 32-bit and 64-bit Local Executable (LE)

This code sequence can only be used from within a dynamic executable to reference a TLS variable defined within the executable. If this is the case, the static tlsoffset is known at link-edit time and no runtime relocations are required.

Table 8–6 SPARC: 32-bit and 64-bit Local Executable Thread-Local Variable Access Codes

Code Sequence                           Initial Relocations     Symbol

0x00 sethi %hix(@tpoff(x)), %o0         R_SPARC_TLS_LE_HIX22    x
0x04 xor   %o0,%lo(@tpoff(x)),%o0       R_SPARC_TLS_LE_LOX10    x
0x08 add   %g7, %o0, %o0                <none>

# %o0 - contains address of TLS variable

The sethi and xor instructions generate R_SPARC_TLS_LE_HIX22 and R_SPARC_TLS_LE_LOX10 relocations, respectively. The link-editor binds these relocations directly to the static TLS offset for the symbol defined in the executable. No relocation processing is required at runtime.
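
The HIX22/LOX10 pair exists because @tpoff(x) is negative. Following the calculations for R_SPARC_TLS_LE_HIX22 and R_SPARC_TLS_LE_LOX10 in Table 8–7, sethi receives the complemented upper bits, and the 13-bit xor operand has bits 10 through 12 set so that it sign-extends to all ones above bit 9; the xor then flips the complemented bits back. This C sketch, with the hypothetical function names hix22, lox10, and sethi_xor, simulates the two instructions on a 64-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* R_SPARC_TLS_LE_HIX22: (tpoff ^ 0xffffffffffffffff) >> 10, 22-bit field */
uint32_t hix22(int64_t tpoff)
{
    return (uint32_t)((~(uint64_t)tpoff) >> 10) & 0x3fffff;
}

/* R_SPARC_TLS_LE_LOX10: (tpoff & 0x3ff) | 0x1c00, 13-bit simm13 field */
uint32_t lox10(int64_t tpoff)
{
    return ((uint32_t)(uint64_t)tpoff & 0x3ff) | 0x1c00;
}

/* Simulate "sethi %hix(v), r; xor r, %lo(v), r" on a 64-bit register. */
int64_t sethi_xor(uint32_t imm22, uint32_t simm13_field)
{
    uint64_t r = (uint64_t)imm22 << 10;  /* sethi clears bits 63..32, 9..0 */
    int64_t imm = (int64_t)(simm13_field ^ 0x1000) - 0x1000; /* sign-extend */
    return (int64_t)(r ^ (uint64_t)imm);
}
```

By this arithmetic the pair reproduces any negative offset from -1 down to -2^32: sethi leaves bits 63 through 32 clear, and the sign-extended operand sets all of them.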

SPARC: Thread-Local Storage Relocation Types

The TLS relocations listed in the following table are defined for SPARC. Descriptions in the table use the following notation:

@dtlndx(x)

Allocates two contiguous entries in the global offset table to hold a TLS_index structure. This information is passed to __tls_get_addr(). The instruction referencing this entry is bound to the address of the first of the two GOT entries.

@tmndx(x)

Allocates two contiguous entries in the global offset table to hold a TLS_index structure. This information is passed to __tls_get_addr(). The ti_tlsoffset field of this structure is set to 0, and the ti_moduleid is filled in at runtime. The call to __tls_get_addr() returns the starting address of the dynamic TLS block.

@dtpoff(x)

Calculates the tlsoffset relative to the TLS block.

@tpoff(x)

Calculates the negative tlsoffset relative to the static TLS block. This value is added to the thread-pointer to calculate the TLS address.

@dtpmod(x)

Calculates the object identifier of the object containing symbol x.

Table 8–7 SPARC: Thread-Local Storage Relocation Types

Name                    Value   Field      Calculation
R_SPARC_TLS_GD_HI22     56      T-simm22   @dtlndx(S + A) >> 10
R_SPARC_TLS_GD_LO10     57      T-simm13   @dtlndx(S + A) & 0x3ff
R_SPARC_TLS_GD_ADD      58      None       See R_SPARC_TLS_GD_ADD
R_SPARC_TLS_GD_CALL     59      V-disp30   See R_SPARC_TLS_GD_CALL
R_SPARC_TLS_LDM_HI22    60      T-simm22   @tmndx(S + A) >> 10
R_SPARC_TLS_LDM_LO10    61      T-simm13   @tmndx(S + A) & 0x3ff
R_SPARC_TLS_LDM_ADD     62      None       See R_SPARC_TLS_LDM_ADD
R_SPARC_TLS_LDM_CALL    63      V-disp30   See R_SPARC_TLS_LDM_CALL
R_SPARC_TLS_LDO_HIX22   64      T-simm22   @dtpoff(S + A) >> 10
R_SPARC_TLS_LDO_LOX10   65      T-simm13   @dtpoff(S + A) & 0x3ff
R_SPARC_TLS_LDO_ADD     66      None       See R_SPARC_TLS_LDO_ADD
R_SPARC_TLS_IE_HI22     67      T-simm22   @got(@tpoff(S + A)) >> 10
R_SPARC_TLS_IE_LO10     68      T-simm13   @got(@tpoff(S + A)) & 0x3ff
R_SPARC_TLS_IE_LD       69      None       See R_SPARC_TLS_IE_LD
R_SPARC_TLS_IE_LDX      70      None       See R_SPARC_TLS_IE_LDX
R_SPARC_TLS_IE_ADD      71      None       See R_SPARC_TLS_IE_ADD
R_SPARC_TLS_LE_HIX22    72      T-imm22    (@tpoff(S + A) ^ 0xffffffffffffffff) >> 10
R_SPARC_TLS_LE_LOX10    73      T-simm13   (@tpoff(S + A) & 0x3ff) | 0x1c00
R_SPARC_TLS_DTPMOD32    74      V-word32   @dtpmod(S + A)
R_SPARC_TLS_DTPMOD64    75      V-word64   @dtpmod(S + A)
R_SPARC_TLS_DTPOFF32    76      V-word32   @dtpoff(S + A)
R_SPARC_TLS_DTPOFF64    77      V-word64   @dtpoff(S + A)
R_SPARC_TLS_TPOFF32     78      V-word32   @tpoff(S + A)
R_SPARC_TLS_TPOFF64     79      V-word64   @tpoff(S + A)

Some relocation types have semantics beyond simple calculations:

R_SPARC_TLS_GD_ADD

This relocation tags the add instruction of a GD code sequence. The register used for the GOT-pointer is the first register in the instruction. The instruction tagged by this relocation comes before the call instruction tagged by the R_SPARC_TLS_GD_CALL relocation. This is used to transition between TLS models at link-edit time.

R_SPARC_TLS_GD_CALL

This relocation is handled as if it were a R_SPARC_WPLT30 relocation referencing the __tls_get_addr() function. This relocation is part of a GD code sequence.

R_SPARC_TLS_LDM_ADD

This relocation tags the add instruction of an LD code sequence that adds the GOT-pointer. The register used for the GOT-pointer is the first register in the instruction. The instruction tagged by this relocation comes before the call instruction tagged by the R_SPARC_TLS_LDM_CALL relocation. This is used to transition between TLS models at link-edit time.

R_SPARC_TLS_LDM_CALL

This relocation is handled as if it were an R_SPARC_WPLT30 relocation referencing the __tls_get_addr() function. This relocation is part of an LD code sequence.

R_SPARC_TLS_LDO_ADD

This relocation tags the final add instruction in an LD code sequence. The register that contains the object address computed in the initial part of the code sequence is the first register in this instruction. This permits the link-editor to identify this register for code transformations.

R_SPARC_TLS_IE_LD

This relocation tags the ld instruction in the 32–bit IE code sequence. This is used to transition between TLS models at link-edit time.

R_SPARC_TLS_IE_LDX

This relocation tags the ldx instruction in the 64–bit IE code sequence. This is used to transition between TLS models at link-edit time.

R_SPARC_TLS_IE_ADD

This relocation tags the add instruction in the IE code sequence. The register that is used for the GOT-pointer is the first register in the sequence.

x86: Thread-Local Variable Access

On x86, the following code sequence models are available for accessing thread-local variables.

x86: General Dynamic (GD)

This code sequence is the most general, and can be included in both shared objects and dynamic executables. This code sequence can reference an external TLS variable in either a shared object or dynamic executable.

Table 8–8 x86: General Dynamic Thread-Local Variable Access Codes

Code Sequence                           Initial Relocations     Symbol

0x00 leal  x@tlsgd(,%ebx,1),%eax        R_386_TLS_GD            x
0x07 call  x@tlsgdplt                   R_386_TLS_GD_PLT        x

# %eax - contains address of TLS variable

Outstanding Relocations                                         Symbol

GOT[n]                                  R_386_TLS_DTPMOD32      x
GOT[n + 1]                              R_386_TLS_DTPOFF32      x

The leal instruction generates a R_386_TLS_GD relocation which instructs the link-editor to allocate space in the global offset table to hold a TLS_index structure for variable x. The link-editor processes this relocation by substituting the GOT-relative offset for the new GOT entry.

Since the load object index and the TLS block offset for x are not known until runtime, the link-editor places the R_386_TLS_DTPMOD32 and R_386_TLS_DTPOFF32 relocations against the GOT for processing by the runtime linker. The address of the generated GOT entry is loaded into register %eax for the call to ___tls_get_addr().

The call instruction causes the generation of the R_386_TLS_GD_PLT relocation. This instructs the link-editor to bind the call to the ___tls_get_addr() function and associates the call instruction with the GD code sequence.

The call instruction must immediately follow the leal instruction. This is required to permit the code transformations.

x86: Local Dynamic (LD)

This code sequence can be used in either a shared object or dynamic executable. This sequence is used when referencing a TLS variable bound within the same object as the reference. Because the dynamic tlsoffset can be bound at link-edit time, only one call to ___tls_get_addr() is required per function call for all symbols that are referenced via the LD code sequence.

Table 8–9 x86: Local Dynamic Thread-Local Variable Access Codes

Code Sequence                           Initial Relocations     Symbol

0x00 leal  x1@tlsldm(%ebx), %eax        R_386_TLS_LDM           x1
0x06 call  x@tlsldmplt                  R_386_TLS_LDM_PLT       x1

# %eax - contains address of TLS block of current object

0x10 leal x1@dtpoff(%eax), %edx         R_386_TLS_LDO_32        x1

# %edx - contains address of local TLS variable x1

0x20 leal x2@dtpoff(%eax), %edx         R_386_TLS_LDO_32        x2

# %edx - contains address of local TLS variable x2

Outstanding Relocations                                         Symbol

GOT[n]                                  R_386_TLS_DTPMOD32      x1
GOT[n + 1]                              <none>

The first leal instruction generates an R_386_TLS_LDM relocation that instructs the link-editor to allocate space in the global offset table to hold a TLS_index structure for the current object. The link-editor processes this relocation by substituting the GOT-relative offset for the new GOT entry.

Since the load object index is not known until runtime, an R_386_TLS_DTPMOD32 relocation is created, and the ti_tlsoffset field of the structure is zero filled. The call instruction is tagged with the R_386_TLS_LDM_PLT relocation.

The TLS offset for each local symbol is known at link-edit time, so the link-editor fills these values in directly.

When a procedure references more than one local symbol, the compiler generates code to obtain the base address of the TLS block once. This base address is then used to calculate the address of each symbol without a separate library call.

x86: Initial Executable (IE)

There are two code sequences for the IE model. One sequence is for position-independent code, which uses a GOT-pointer. The other sequence is for position-dependent code, which does not use a GOT-pointer. Both of these code sequences can only be used in a dynamic executable. These code sequences can reference a TLS variable defined in either the executable or any of the shared libraries loaded at process startup. This model cannot reference TLS variables from shared libraries loaded after process startup.

Table 8–10 x86: Initial Executable, Position Independent, Thread-Local Variable Access Codes

Code Sequence                           Initial Relocations     Symbol

0x00 movl  %gs:0, %eax                  <none>
0x06 addl  x@gotntpoff(%ebx), %eax      R_386_TLS_GOTIE         x

# %eax - contains address of TLS variable

Outstanding Relocations                                         Symbol

GOT[n]                                  R_386_TLS_TPOFF         x

The addl instruction generates an R_386_TLS_GOTIE relocation that instructs the link-editor to create space in the global offset table to store the static TLS offset for symbol x. An R_386_TLS_TPOFF relocation is left outstanding against the GOT for the runtime linker to fill in with the static TLS offset for symbol x.

Table 8–11 x86: Initial Executable, Position Dependent, Thread-Local Variable Access Codes

Code Sequence                           Initial Relocations     Symbol

0x00 movl  %gs:0, %eax                  <none>
0x06 addl  x@indntpoff, %eax            R_386_TLS_IE            x

# %eax - contains address of TLS variable

Outstanding Relocations                                         Symbol

GOT[n]                                  R_386_TLS_TPOFF         x

The addl instruction generates an R_386_TLS_IE relocation that instructs the link-editor to create space in the global offset table to store the static TLS offset for symbol x. The main difference between this sequence and the position-independent form is that the instruction is bound directly to the GOT entry created, instead of via an offset from the GOT-pointer register. An R_386_TLS_TPOFF relocation is left outstanding against the GOT for the runtime linker to fill in with the static TLS offset for symbol x.

The contents of variable x, rather than the address, can be loaded by embedding the offset directly into the memory reference, as shown in the next two sequences.

Table 8–12 x86: Initial Executable, Position Independent, Dynamic Thread-Local Variable Access Codes

Code Sequence                             Initial Relocations   Symbol
0x00 movl  x@gotntpoff(%ebx), %eax        R_386_TLS_GOTIE       x
0x06 movl  %gs:(%eax), %eax               <none>
# %eax - contains value of TLS variable

Outstanding Relocations                                         Symbol
GOT[n]                                    R_386_TLS_TPOFF       x

Table 8–13 x86: Initial Executable, Position Dependent, Thread-Local Variable Access Codes

Code Sequence                             Initial Relocations   Symbol
0x00 movl  x@indntpoff, %ecx              R_386_TLS_IE          x
0x06 movl  %gs:(%ecx), %eax               <none>
# %eax - contains value of TLS variable

Outstanding Relocations                                         Symbol
GOT[n]                                    R_386_TLS_TPOFF       x

In the last sequence, if %eax is used instead of %ecx, the first instruction may be either 5 or 6 bytes long, because a load of %eax from an absolute address also has a shorter encoding.
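The 5-or-6-byte remark follows from the IA-32 instruction encodings. The byte arrays below are a small illustration (the array names are made up for this sketch); the four zero bytes stand in for the 32-bit displacement that the link-editor fills in from x@indntpoff. The compile-time checks also confirm why the second instruction in each of these tables starts at offset 0x06.

```c
#include <assert.h>

/* IA-32 encodings of the loads discussed above. The four zero bytes stand
 * for the 32-bit displacement the link-editor fills in from x@indntpoff. */

/* movl x@indntpoff, %ecx: opcode 8B, ModRM 0D (reg=%ecx, absolute disp32). */
static const unsigned char movl_ind_ecx[] = { 0x8b, 0x0d, 0x00, 0x00, 0x00, 0x00 };

/* movl x@indntpoff, %eax: the same general form, ModRM 05 (reg=%eax). */
static const unsigned char movl_ind_eax_modrm[] = { 0x8b, 0x05, 0x00, 0x00, 0x00, 0x00 };

/* movl x@indntpoff, %eax: short form, opcode A1 (load %eax from moffs32). */
static const unsigned char movl_ind_eax_moffs[] = { 0xa1, 0x00, 0x00, 0x00, 0x00 };

/* movl %gs:0, %eax: %gs override prefix 65, then A1 with a zero offset. */
static const unsigned char movl_gs0_eax[] = { 0x65, 0xa1, 0x00, 0x00, 0x00, 0x00 };

static_assert(sizeof movl_ind_ecx == 6, "%ecx form is 6 bytes");
static_assert(sizeof movl_ind_eax_modrm == 6, "general %eax form is 6 bytes");
static_assert(sizeof movl_ind_eax_moffs == 5, "short %eax form is 5 bytes");
static_assert(sizeof movl_gs0_eax == 6, "next instruction starts at 0x06");
```

With %ecx only the 6-byte ModRM form exists, so the length of the sequence is fixed; with %eax an assembler may pick either form.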

x86: Local Executable (LE)

This code sequence can only be used from within a dynamic executable to reference a TLS variable defined within the executable. In this case, the static tlsoffset is known at link-edit time and no runtime relocations are required.

Table 8–14 x86: Local Executable Thread-Local Variable Access Codes

Code Sequence                             Initial Relocations   Symbol
0x00 movl %gs:0, %eax                     <none>
0x06 leal x@ntpoff(%eax), %eax            R_386_TLS_LE          x
# %eax - contains address of TLS variable

The leal instruction generates a R_386_TLS_LE relocation. The link-editor binds this relocation directly to the static TLS offset for the symbol defined in the executable. No processing is required at runtime.
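Because the executable's TLS block has a fixed place in the static TLS area, the link-editor can compute x@ntpoff itself. The sketch below assumes that the executable's block is the first static TLS block and starts tlsoffset = round(tlssize, align) bytes below the thread pointer, and that sym_offset is x's offset within the TLS template. These layout details and the helper names are assumptions for illustration, not a description of the link-editor's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round size up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/* Hypothetical link-time computation of x@ntpoff for a variable defined in
 * the executable. Assumes the executable's TLS block is the first static
 * TLS block, starting tlsoffset = round(tlssize, align) bytes below the
 * thread pointer. sym_offset is x's offset within the TLS template. */
static intptr_t link_time_ntpoff(size_t tlssize, size_t align, size_t sym_offset)
{
    size_t tlsoffset = round_up(tlssize, align);

    assert(sym_offset < tlssize);
    return (intptr_t)sym_offset - (intptr_t)tlsoffset;
}

/* leal x@ntpoff(%eax), %eax with %eax = %gs:0: the constant is already part
 * of the instruction, so nothing is left for the runtime linker. */
static unsigned char *le_address(unsigned char *tp, intptr_t ntpoff)
{
    return tp + ntpoff;
}
```

For instance, with a 12-byte TLS template aligned to 8 bytes, tlsoffset is 16, and a variable at template offset 4 gets x@ntpoff = 4 - 16 = -12.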

The contents of variable x, rather than the address, can be loaded with the same relocation by using the following instruction sequence.

Table 8–15 x86: Local Executable Thread-Local Variable Access Codes

Code Sequence                             Initial Relocations   Symbol
0x00 movl %gs:0, %eax                     <none>
0x06 movl x@ntpoff(%eax), %eax            R_386_TLS_LE          x
# %eax - contains value of TLS variable

If, instead of computing the address of the variable, we want to load from it or store to it, the following sequence can be used. Note that in this case the x@ntpoff expression is used not as a displacement from a register, but as an absolute address within the %gs segment.
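The three Local Executable forms in Tables 8–14, 8–15, and 8–16 reach the same memory. The following hypothetical C model makes that concrete: a struct stands in for one thread's static TLS area followed by a thread control block whose first word points to itself, which is why "movl %gs:0, %eax" yields the thread pointer. The struct layout, the 32-bit variable, and all function names are assumptions for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of one thread: static TLS followed by the thread
 * control block, whose first word holds the thread pointer itself. */
struct thread_area {
    unsigned char static_tls[32];
    unsigned char *tcb_self;            /* %gs:0 */
};

/* The %gs segment base: the address of the thread control block. */
static unsigned char *gs_base(struct thread_area *t)
{
    return (unsigned char *)t + offsetof(struct thread_area, tcb_self);
}

static int32_t load32(const unsigned char *addr)
{
    int32_t v;
    memcpy(&v, addr, sizeof v);
    return v;
}

/* Table 8-14 style: compute the address, then dereference it. */
static int32_t le_via_address(struct thread_area *t, intptr_t ntpoff)
{
    unsigned char *eax = t->tcb_self;   /* movl %gs:0, %eax */
    eax += ntpoff;                      /* leal x@ntpoff(%eax), %eax */
    return load32(eax);
}

/* Table 8-15 style: fold the offset into the load. */
static int32_t le_via_register(struct thread_area *t, intptr_t ntpoff)
{
    unsigned char *eax = t->tcb_self;   /* movl %gs:0, %eax */
    return load32(eax + ntpoff);        /* movl x@ntpoff(%eax), %eax */
}

/* Table 8-16 style: x@ntpoff used directly as an address in the %gs segment. */
static int32_t le_via_segment(struct thread_area *t, intptr_t ntpoff)
{
    assert(ntpoff < 0);                 /* static TLS lies below the thread pointer */
    return load32(gs_base(t) + ntpoff); /* movl %gs:x@ntpoff, %eax */
}
```

Before use, tcb_self must be set to gs_base() of the same thread_area, mirroring the self-pointer assumption above. The last form needs no general-purpose register to hold the thread pointer, which is why it is a single instruction.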

Table 8–16 x86: Local Executable Thread-Local Variable Access Codes

Code Sequence                             Initial Relocations   Symbol
0x00 movl %gs:x@ntpoff, %eax              R_386_TLS_LE          x
# %eax - contains value of TLS variable

x86: Thread-Local Storage Relocation Types

The TLS relocations listed in the following table are defined for x86. Descriptions in the table use the following notation:

@tlsgd(x)

Allocates two contiguous entries in the GOT to hold a TLS_index structure. This structure is passed to ___tls_get_addr(). The instruction referencing this entry will be bound to the first of the two GOT entries.

@tlsgdplt(x)

This relocation is handled as if it were a R_386_PLT32 relocation referencing the ___tls_get_addr() function.

@tlsldm(x)

Allocates two contiguous entries in the GOT to hold a TLS_index structure. This structure is passed to ___tls_get_addr(). The ti_tlsoffset field of the TLS_index is set to 0, and the ti_moduleid field is filled in at runtime. The call to ___tls_get_addr() returns the starting address of the dynamic TLS block.

@gotntpoff(x)

Allocates an entry in the GOT and initializes it with the negative tlsoffset relative to the static TLS block. This is performed at runtime via the R_386_TLS_TPOFF relocation.

@indntpoff(x)

This expression is similar to @gotntpoff, but is used in position dependent code. In the movl or addl instructions, @gotntpoff resolves to a GOT slot address relative to the start of the GOT, whereas @indntpoff resolves to the absolute GOT slot address.

@ntpoff(x)

Calculates the negative offset of the variable it is added to relative to the static TLS block.

@dtpoff(x)

Calculates the tlsoffset relative to the TLS block. The value is used as an immediate value of an addend and is not associated with a specific register.

@dtpmod(x)

Calculates the object identifier of the object containing symbol S.
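The role of the TLS_index structure in @tlsgd and @tlsldm can be sketched in C. In this minimal model, a per-thread table maps a module identifier to the start of that module's TLS block, and tls_get_addr_sketch() is a simplified, hypothetical stand-in for ___tls_get_addr(). The field types, the table, and all function names are assumptions; a real implementation also handles TLS blocks that do not yet exist, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>

/* The structure held in the two GOT entries allocated for @tlsgd and
 * @tlsldm. The field names come from the text; using unsigned long
 * (one 32-bit word on x86) for both fields is an assumption. */
typedef struct {
    unsigned long ti_moduleid;
    unsigned long ti_tlsoffset;
} TLS_index;

/* Hypothetical per-thread table mapping a module identifier to the start
 * of that module's TLS block for the calling thread. */
#define MAX_MODULES 8
static unsigned char *tls_blocks[MAX_MODULES];

/* Simplified stand-in for ___tls_get_addr(): assumes the block already
 * exists and omits any lazy allocation of TLS blocks. */
static void *tls_get_addr_sketch(const TLS_index *ti)
{
    assert(ti->ti_moduleid < MAX_MODULES);
    assert(tls_blocks[ti->ti_moduleid] != NULL);
    return tls_blocks[ti->ti_moduleid] + ti->ti_tlsoffset;
}

/* @tlsldm usage: one call with ti_tlsoffset = 0 returns the start of the
 * module's block; each variable is then reached by adding its @dtpoff. */
static void *ld_address(unsigned long moduleid, size_t dtpoff)
{
    TLS_index ti = { moduleid, 0 };
    unsigned char *block = tls_get_addr_sketch(&ti);

    return block + dtpoff;
}
```

The @tlsgd form pays for one call per variable but returns the variable's address directly; the @tlsldm form makes one call per module and then reaches several variables with constant @dtpoff additions.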

Table 8–17 x86: Thread-Local Storage Relocation Types

Name                  Value   Field    Calculation
R_386_TLS_GD_PLT      12      Word32   @tlsgdplt
R_386_TLS_LDM_PLT     13      Word32   @tlsldmplt
R_386_TLS_TPOFF       14      Word32   @ntpoff(S)
R_386_TLS_IE          15      Word32   @indntpoff(S)
R_386_TLS_GOTIE       16      Word32   @gotntpoff(S)
R_386_TLS_LE          17      Word32   @ntpoff(S)
R_386_TLS_GD          18      Word32   @tlsgd(S)
R_386_TLS_LDM         19      Word32   @tlsldm(S)
R_386_TLS_LDO_32      32      Word32   @dtpoff(S)
R_386_TLS_DTPMOD32    35      Word32   @dtpmod(S)
R_386_TLS_DTPOFF32    36      Word32   @dtpoff(S)
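The Calculation column can be read as a function from relocation type to the 32-bit word stored at the relocation target. The sketch below covers only three data relocations from Table 8–17. The tls_sym structure and its fields are hypothetical, and @ntpoff(S) is modeled, as an assumption consistent with the notation above, as the symbol's offset within its object's TLS block minus the distance of that block below the thread pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Relocation type values from Table 8-17 (subset). */
enum {
    R_386_TLS_TPOFF    = 14,
    R_386_TLS_DTPMOD32 = 35,
    R_386_TLS_DTPOFF32 = 36
};

/* Hypothetical description of a resolved TLS symbol S. */
struct tls_sym {
    uint32_t moduleid;  /* object identifier of the object defining S */
    uint32_t value;     /* offset of S within that object's TLS block */
    uint32_t tlsoffset; /* distance of that block below the thread pointer */
};

/* Word32 stored at the relocation target, following the Calculation column.
 * Unsigned arithmetic wraps, so a negative @ntpoff comes out in two's
 * complement form, as a 32-bit field would hold it. */
static uint32_t tls_reloc_value(int type, const struct tls_sym *s)
{
    switch (type) {
    case R_386_TLS_DTPMOD32:            /* @dtpmod(S) */
        return s->moduleid;
    case R_386_TLS_DTPOFF32:            /* @dtpoff(S) */
        return s->value;
    case R_386_TLS_TPOFF:               /* @ntpoff(S) */
        return s->value - s->tlsoffset;
    default:
        assert(!"relocation type not covered by this sketch");
        return 0;
    }
}
```

For a symbol at offset 4 in module 2, whose static block starts 16 bytes below the thread pointer, the three relocations yield 2, 4, and the 32-bit encoding of -12.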