inline - Oracle Solaris Studio 12.4 Man Pages

inline(4)

Name

inline, filename.il - Assembly language inline template files

Description

Assembly language call instructions are replaced by a copy of their corresponding function body obtained from the inline template (*.il) file.

Inline template files have a suffix of .il, for example:

% CC foo.il hello.c

Inlining is done by the compiler's code generator.

Usage

Each inline template file contains one or more labeled assembly language templates of the form:

 
inline-directive
instructions
...
.end

where the instructions constitute an in-line expansion of the named routine. An inline-directive is a command of the form:

.inline   identifier, argsize

This declares a block of code for the routine named by identifier, with argsize as the total size of the routine's arguments, in bytes. The value of argsize is ignored.

Calls to the named routine are replaced by the code in the in-line template.

NOTE: The value of argsize is ignored, but the argument is included for compatibility with legacy compiler versions.

Multiple templates are permitted in a file. If more than one template matches the same identifier, only the first is used; later matching templates are ignored.

The compiler can change the body of an inline template to optimize it. The directive .volatile within a template file prohibits such optimizations.
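
As an illustration of how a template and its caller fit together, the following sketch shows the C side of a hypothetical routine add_two. The template text in the comment follows the form described above and uses the x64 argument registers. The names add_two, sum_three, add.il, and the USE_INLINE_TEMPLATE macro are assumptions made for this example and are not defined by the compiler. The portable fallback body only states the behavior the template is expected to reproduce.

```c
/*
 * Hypothetical template file add.il (x64 form, illustration only):
 *
 *     .inline add_two,0
 *     movq %rdi,%rax
 *     addq %rsi,%rax
 *     .end
 *
 * When the template file is named on the compile line together with
 * this source file, calls to add_two() are replaced by the template
 * body instead of a call instruction.
 */
long add_two(long a, long b);

#ifndef USE_INLINE_TEMPLATE   /* hypothetical macro: define it when add.il is used */
/* Portable fallback with the same observable behavior as the template. */
long add_two(long a, long b)
{
    return a + b;
}
#endif

/* Example caller: each call below is one the template would replace. */
long sum_three(long x, long y, long z)
{
    return add_two(add_two(x, y), z);
}
```

The template contains no return instruction and touches only %rdi, %rsi, and %rax, in keeping with the conventions described in the sections that follow.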

Coding Conventions

Inline templates should be coded as expansions of C-compatible procedure calls, with the difference that the return address cannot be depended upon to be in the expected place, since no call instruction will have been executed.

Inline templates must conform to standard Oracle Solaris Studio parameter passing and register usage conventions, as detailed below. They must not call routines that violate these conventions; for example, assembly language routines such as setjmp(3C) may cause problems.

Registers other than the ones mentioned below must not be used or set.

Branch instructions in an in-line template may only transfer to numeric labels (1f, 2b, and so on) defined within the in-line template. No other control transfers are allowed.

Templates do not need return instructions, and should not include them.

Only opcodes and addressing modes generated by Oracle Solaris Studio compilers are guaranteed to work. Binary encodings of instructions are allowed, but the correctness of the resulting code depends on the correctness of the rest of the inline template. Binary encodings also prohibit any optimization of the template by the compiler.

Coding Conventions for SPARC Systems

On SPARC, arguments to C functions are passed as if they were in a parameter array. The array elements are called "slots" and are numbered from zero. For 32-bit code, the array has 32-bit elements (slots), and for 64-bit code the array has 64-bit elements (slots). Successive parameters to a routine are passed in successive slots of the parameter array.

For 32-bit code, the parameter array starts at %fp+68, and the stack and frame pointers are aligned on a 64-bit (8-byte) boundary. For 64-bit code, the parameter array starts at %fp+BIAS+128, and the stack and frame pointers are aligned on a 128-bit (16-byte) boundary. Parameters that are passed in registers also have an (unused) memory location corresponding to their slot(s).

Data types that are larger than the slot size are passed in multiple slots. For 32-bit code, double and long long values are passed in 2 slots; they are not aligned, but packed next to the previous parameter slot. For 64-bit code, double and long long values occupy just one slot, but long double and double complex values occupy two slots, and these slots are aligned (slot number % 2 == 0), skipping a slot if necessary for alignment.

The first six slots of the parameter array are passed in registers. For 32-bit code, these slots always go into the lower 32 bits of registers %o0 to %o5.

For 64-bit code, these 6 slots go into the full 64 bits of registers %o0 to %o5 if they are integer types. Float, double, and long double types are passed in the double registers %d0 to %d10, corresponding to slots 0 to 5. Float complex, double complex, and long double complex types are passed as though there were just two parameters of their base type. The imaginary types are passed the same as the plain float types. For 64-bit code, float, double, and long double values in slots 6-31 are passed in registers %d12 to %d62. Structures and unions passed by value are more complicated and are not recommended for inline templates.
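
The slot rules above can be checked with a small calculation. The following C sketch models them for a simplified set of parameter kinds (integer, double, and long double) and is not taken from the compiler. All type and function names (param_kind, assign_slots64, slot_offset64, and so on) are hypothetical, and the offsets for 64-bit code are relative to %fp+BIAS.

```c
#include <assert.h>
#include <stddef.h>

/* Parameter kinds considered in this sketch. */
enum param_kind { P_INT, P_DOUBLE, P_LONG_DOUBLE };

/*
 * 64-bit code: store in first_slot[i] the first slot used by parameter i
 * and return the total number of slots consumed. Integers and doubles
 * take the next free slot; a long double takes two slots starting at an
 * even slot number, skipping a slot if necessary.
 */
int assign_slots64(const enum param_kind *kinds, size_t n, int *first_slot)
{
    int next = 0;
    assert(kinds != NULL && first_slot != NULL);
    for (size_t i = 0; i < n; i++) {
        if (kinds[i] == P_LONG_DOUBLE) {
            if (next % 2 != 0)
                next++;                 /* align to an even slot */
            first_slot[i] = next;
            next += 2;
        } else {
            first_slot[i] = next;
            next += 1;
        }
    }
    return next;
}

/* 64-bit code: offset of a slot's memory location from %fp+BIAS. */
int slot_offset64(int slot) { return 128 + 8 * slot; }

/* 32-bit code: offset of a slot's memory location from %fp. */
int slot_offset32(int slot) { return 68 + 4 * slot; }

/* Number n of the %o register holding an integer slot, or -1 if the
 * slot is not one of the first six. */
int int_reg_for_slot(int slot)
{
    return (slot >= 0 && slot < 6) ? slot : -1;
}

/* 64-bit code: number n of the %d register holding a double in the
 * given slot (slots 0-31 map to %d0-%d62), or -1 if not in a register. */
int double_reg_for_slot64(int slot)
{
    return (slot >= 0 && slot < 32) ? 2 * slot : -1;
}
```

For example, for the parameter list (int, long double, double) the sketch assigns slot 0 to the int, slots 2 and 3 to the long double (slot 1 is skipped for alignment), and slot 4 to the double, which corresponds to register %d8.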

Functions that return an integer value return it in %o0, or in %o0 and %o1. For 32-bit code, long long values are returned with the upper 32 bits in %o0 and the lower 32 bits in %o1.
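
The split of a 64-bit result across two 32-bit registers can be sketched in C as follows. The helper names are hypothetical; the code only illustrates which half of the value belongs in which register for 32-bit code.

```c
#include <stdint.h>

/* Upper 32 bits: the half returned in %o0 for 32-bit SPARC code. */
uint32_t upper_half(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Lower 32 bits: the half returned in %o1 for 32-bit SPARC code. */
uint32_t lower_half(uint64_t v)
{
    return (uint32_t)(v & 0xFFFFFFFFu);
}

/* Reassemble the 64-bit value the caller sees from the two halves. */
uint64_t join_halves(uint32_t upper, uint32_t lower)
{
    return ((uint64_t)upper << 32) | lower;
}
```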

Functions that return a floating-point or complex value return it in some subset of %f0, %f1, %d0, %d2, %d4, and %d6.

Registers %o0-%o5 and %f0-%f31 may be used as temporaries.

Integral and single-precision floating-point arguments are 32-bit aligned.

Double-precision floating-point arguments are guaranteed to be 64-bit aligned if their offsets are multiples of 8.

Each control-transfer instruction (branches and calls) must be immediately followed by a nop.

Call instructions must include an extra (final) argument which indicates the number of registers used to pass parameters to the called routine.

Note that for SPARC systems, the instruction following an expanded 'call' is deleted.

Notes for x86/x64 Platforms

Programs compiled with -xarch set to sse, sse2, sse2a, or sse3 and beyond must be run only on platforms that provide these extensions and features.

This warning also extends to programs that employ .il inline assembly language functions or __asm() assembler code that utilize extended features.

If you compile and link in separate steps, always link using the compiler and with the same -xarch setting to ensure that the correct startup routine is linked.

Coding Conventions for 32-bit x86 Systems

Arguments are passed on the stack. Since no call instruction was issued, the first argument is at (%esp), the second argument is at 4(%esp), and so on. Integer results of 32 bits or less are returned in %eax; 64-bit integer results are returned in %edx:%eax. Floating point results are returned in %st(0).

The code may use registers %eax, %ecx and %edx. The values in any other registers must be preserved. The floating point stack will be empty at the start of the inline expansion template, and must be empty (except for a returned floating point value) at the end.
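
The stack offsets seen by a 32-bit template can be worked out from the argument sizes. The following C sketch is an illustration under the assumption that every argument occupies a whole number of 4-byte words, so that sizes are rounded up to a multiple of 4. The function name x86_arg_offsets is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given the size in bytes of each argument, in declaration order,
 * store each argument's byte offset from %esp at template entry.
 * Assumption: each argument is rounded up to a multiple of 4 bytes.
 * Returns the total number of argument bytes.
 */
size_t x86_arg_offsets(const size_t *sizes, size_t n, size_t *offsets)
{
    size_t off = 0;
    assert(sizes != NULL && offsets != NULL);
    for (size_t i = 0; i < n; i++) {
        offsets[i] = off;                       /* first argument at 0(%esp) */
        off += (sizes[i] + 3) & ~(size_t)3;     /* round up to 4 bytes */
    }
    return off;
}
```

For a routine declared as f(int, double, char), the sketch yields offsets 0, 4, and 12, so a template would read the double from 4(%esp) and the char from 12(%esp).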

Coding Conventions for x64 Platforms

Arguments are passed according to their classification. The classification includes integer-, sse- and memory-arguments.

Arguments of types (signed and unsigned) _Bool, char, short, int, long, long long and pointers are integer arguments. Arguments of aggregate types (struct, union, array) of size less than or equal to 16 bytes that contain aligned members of types _Bool, char, short, int, long, long long and pointers are also integer arguments.

Arguments of types float and double are sse arguments. Arguments of aggregate types of size less than or equal to 16 bytes and that contain aligned members of types float and double are also sse.

Arguments of type long double, and of aggregate types that are larger than 16 bytes or that have unaligned members, are memory arguments.

Integer arguments are passed in integer registers in the following order: %rdi, %rsi, %rdx, %rcx, %r8 and %r9. One integer argument of aggregate type can occupy up to 2 integer registers. If the number of integer arguments is greater than 6, the 7th and subsequent integer arguments are treated as memory arguments.

Sse arguments are passed in sse registers in order from %xmm0 to %xmm7. One sse argument of aggregate type can occupy up to 2 sse registers; each sse register holds up to 8 bytes of the argument. For example, an argument of type double complex is passed in 2 consecutive sse registers, and an argument of type float complex is passed in 1 sse register. If the number of sse arguments is greater than 8, the 9th and subsequent sse arguments are treated as memory arguments.

Integer and sse arguments are numbered independently.
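
The independent numbering of integer and sse arguments can be illustrated with a short C sketch. It handles only scalar arguments, each of which needs a single register, and takes the classification of each argument as input. All names (arg_class, arg_loc, x64_assign, int_reg_name) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum arg_class { ARG_INTEGER, ARG_SSE, ARG_MEMORY };
enum loc_kind  { LOC_INT_REG, LOC_SSE_REG, LOC_STACK };

struct arg_loc {
    enum loc_kind kind;
    int index;   /* position in the register sequence; -1 for LOC_STACK */
};

/* Integer argument registers in passing order. */
static const char *const int_regs[6] = {
    "%rdi", "%rsi", "%rdx", "%rcx", "%r8", "%r9"
};

/* Name of the integer register at the given position in the sequence. */
const char *int_reg_name(int index)
{
    assert(index >= 0 && index < 6);
    return int_regs[index];
}

/*
 * Assign each scalar argument to an integer register, an sse register
 * (%xmm0-%xmm7, index 0-7), or the stack. The two register sequences
 * are counted independently; once a sequence is exhausted, further
 * arguments of that class are treated as memory arguments.
 */
void x64_assign(const enum arg_class *cls, size_t n, struct arg_loc *loc)
{
    int next_int = 0, next_sse = 0;
    assert(cls != NULL && loc != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cls[i] == ARG_INTEGER && next_int < 6) {
            loc[i].kind = LOC_INT_REG;
            loc[i].index = next_int++;
        } else if (cls[i] == ARG_SSE && next_sse < 8) {
            loc[i].kind = LOC_SSE_REG;
            loc[i].index = next_sse++;
        } else {
            loc[i].kind = LOC_STACK;
            loc[i].index = -1;
        }
    }
}
```

For a routine declared as f(int, double, long, long double), the sketch places the int in %rdi, the double in %xmm0, the long in %rsi, and the long double on the stack.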

Memory arguments are passed on the stack in order from right to left as they appear in the function's argument list. Each argument on the stack is aligned according to its size: on 8 bytes if the size is less than or equal to 8, on 16 bytes otherwise. At the start of the inline expansion template, the stack is aligned on 16 bytes.

Since no call instruction was issued, the first memory argument is at (%rsp), the second argument is at 8(%rsp) or at 16(%rsp) depending on the first memory argument size and the second memory argument alignment, etc.
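
The placement of memory arguments relative to %rsp can be sketched in C as follows. The function name x64_mem_arg_offsets is hypothetical, and the sketch assumes that each argument occupies a whole number of 8-byte units on the stack.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given the size in bytes of each memory argument, in declaration
 * order, store each argument's byte offset from %rsp at template entry.
 * Arguments of 8 bytes or less are aligned on 8; larger ones on 16.
 * Assumption: each argument occupies a multiple of 8 bytes.
 * Returns the offset just past the last argument.
 */
size_t x64_mem_arg_offsets(const size_t *sizes, size_t n, size_t *offsets)
{
    size_t off = 0;
    assert(sizes != NULL && offsets != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t align = (sizes[i] <= 8) ? 8 : 16;
        off = (off + align - 1) & ~(align - 1);   /* align this argument */
        offsets[i] = off;
        off += (sizes[i] + 7) & ~(size_t)7;       /* advance past it */
    }
    return off;
}
```

With two 8-byte memory arguments the second is at 8(%rsp). If the second is a 16-byte long double, it is at 16(%rsp) because of its 16-byte alignment.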

Return values are classified in the same way as arguments.

Integer results of 8 bytes or less are returned in %rax, integer results of 9 to 16 bytes are returned in %rdx:%rax.

Sse results are likewise returned according to their size, in %xmm0 or in %xmm1:%xmm0.

Results of type long double are returned in %st(0).

If the return value is of type long double complex, the real part of the value is returned in %st(0) and the imaginary part in %st(1).

For memory results the caller provides space for the return value and passes the address of this storage in %rdi as if it were the first argument to the function. In effect, this address becomes a hidden first argument. On return %rax will contain the address that has been passed in by the caller in %rdi.

The code may not change register %rbp. The floating point stack will be empty at the start of the inline expansion template, and must be empty (except for a returned floating point value) at the end.

In addition to %rbp, the values in registers %rbx and %r12-%r15 must be preserved across the inlined code.

Examples

Please review libm.il or vis.il for examples. You can find a version of these libraries that is specific to each supported architecture under the compiler's lib/ directory.