Guidelines for Coding Inline Templates - SPARC Assembly Language Reference Manual

7.1.3 Guidelines for Coding Inline Templates

SPARC inline assembly code can use only integer registers %o0 to %o5 and floating point registers %f0 to %f31 for temporary values. These registers are referred to as the caller-saved registers. Other registers should not be used. Calls can be made to other routines from the inline template, but these calls are subject to the same constraint.

The compiler accepts most of the SPARC instruction set. If the template uses only instructions that the compiler normally generates, it will be early inlined (see Late and Early Inlining), and the code will be scheduled optimally. However, if the template uses instructions that the compiler accepts but does not typically generate (such as VIS instructions or atomics), the code might be late inlined. In that case the compiler might not schedule the code optimally, which can cost performance.

7.1.3.1 Parameter Passing

Passing parameters between the C/C++ caller program and the assembly language template code must obey the parameter passing rules defined by the target architecture, which are different for 32-bit and 64-bit code. Parameter passing is described by the SPARC ABI. See the SPARC International Technical Documents page. SCD 2.3 describes Version 8 (32-bit code) and SCD 2.4.1 describes Version 9 (64-bit code).

On entry to the template code, arguments are passed in %o0 to %o5 and continue on the stack. For 32-bit code, the first stack argument is at [%sp+0x5c], and %sp is guaranteed to be 8-byte (64-bit) aligned; for 64-bit code, the first stack argument is at [%sp+0x8af]. (For 64-bit code, the stack bias is 2047, so the actual stack address is %sp+2047, which is aligned on a 16-byte boundary.)
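The two offsets quoted above can be reproduced from the stack frame layout. The following C fragment is a minimal sketch of that arithmetic, assuming the usual SPARC frame layout (a 16-register window save area, a hidden structure-return pointer in 32-bit code only, and six home slots for %o0 to %o5). The enumerator and function names are illustrative and do not come from the ABI documents.

```c
#include <assert.h>

/* Sizes in bytes; names are hypothetical, chosen for this sketch only. */
enum {
    V8_WINDOW_SAVE = 16 * 4,  /* 16 window registers, 4 bytes each        */
    V8_STRUCT_RET  = 4,       /* hidden pointer for aggregate returns     */
    V8_REG_ARGS    = 6 * 4,   /* home slots for %o0 to %o5                */

    V9_STACK_BIAS  = 2047,    /* %sp holds the real address minus 2047    */
    V9_WINDOW_SAVE = 16 * 8,  /* 16 window registers, 8 bytes each        */
    V9_REG_ARGS    = 6 * 8    /* home slots for %o0 to %o5                */
};

/* Offset from %sp of the first stack-passed argument in 32-bit code. */
static int first_stack_arg_v8(void)
{
    return V8_WINDOW_SAVE + V8_STRUCT_RET + V8_REG_ARGS;
}

/* Same for 64-bit code, where the stack bias is part of the offset. */
static int first_stack_arg_v9(void)
{
    return V9_STACK_BIAS + V9_WINDOW_SAVE + V9_REG_ARGS;
}

/* Self-check against the offsets given in the text. */
static void check_stack_arg_offsets(void)
{
    assert(first_stack_arg_v8() == 0x5c);   /* 64 + 4 + 24    = 92   */
    assert(first_stack_arg_v9() == 0x8af);  /* 2047 + 128 + 48 = 2223 */
}
```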

For example (function prototype in C followed by assembler template equivalent):

int add_up(int v1,int v2, int v3, int v4, int v5, int v6, int v7);

/*Add up 7 integer parameters; last one will be passed on stack*/
.inline add_up,28
  add %o0,%o1,%o0
  ld [%sp+0x5c],%o1
  add %o2,%o3,%o2
  add %o4,%o5,%o4
  add %o0,%o1,%o0
  add %o2,%o4,%o2
  add %o0,%o2,%o0
.end

The same example for 64-bit code follows. Note that when a 32-bit int value is passed on the stack, the full 64 bits of the register are saved, so the template loads it with ldx:

int add_up(int v1,int v2, int v3, int v4, int v5, int v6, int v7);

/*Add up 7 integer parameters; last one will be passed on stack*/
.inline add_up,28
  add %o0,%o1,%o0
  ldx [%sp+0x8af],%o1
  add %o2,%o3,%o2
  add %o4,%o5,%o4
  add %o0,%o1,%o0
  add %o2,%o4,%o2
  add %o0,%o2,%o0
.end
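From the C side, a template such as add_up is called like any other function; only the prototype is visible to the caller. The sketch below shows a caller together with a portable stand-in body so that the fragment also compiles where no template is available. The macro HAVE_ADD_UP_TEMPLATE and the function sum_one_to_seven are hypothetical names introduced for illustration, and the build step that pairs the template file with the C source is not shown.

```c
#include <assert.h>

/* Prototype from the text; on SPARC the inline template supplies the body. */
int add_up(int v1, int v2, int v3, int v4, int v5, int v6, int v7);

#ifndef HAVE_ADD_UP_TEMPLATE  /* hypothetical macro, not set by any compiler */
/* Portable stand-in that computes the same result as the template. */
int add_up(int v1, int v2, int v3, int v4, int v5, int v6, int v7)
{
    return v1 + v2 + v3 + v4 + v5 + v6 + v7;
}
#endif

/* Example caller: six arguments travel in %o0 to %o5, the seventh on the stack. */
int sum_one_to_seven(void)
{
    int total = add_up(1, 2, 3, 4, 5, 6, 7);
    assert(total == 28);  /* 1 + 2 + ... + 7 */
    return total;
}
```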

For 32-bit code, floating point values are passed in the integer registers. For 64-bit code, they are passed in the floating point registers.

32-bit floating-point passing by value example:

double sum_val(double a, double b);

/*sum of two doubles by value*/
.inline sum_val,16
  st   %o0,[%sp+0x48]
  st   %o1,[%sp+0x4c]
  ldd  [%sp+0x48],%f0
  st   %o2,[%sp+0x48]
  st   %o3,[%sp+0x4c]
  ldd  [%sp+0x48],%f2
  faddd %f0,%f2,%f0
.end
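The store, store, load sequence in the 32-bit template reassembles each double from the two 32-bit halves that arrive in an integer register pair. The C fragment below is a rough model of that data movement, assuming IEEE 754 doubles and that the even register (%o0, %o2) carries the most significant word, as it does on big-endian SPARC. All function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Rebuild a double from its high and low 32-bit words.
 * The memcpy plays the role of "st, st, ldd" through a stack slot. */
static double double_from_words(uint32_t hi, uint32_t lo)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Split a double into the two words a 32-bit caller would place in registers. */
static void words_from_double(double d, uint32_t *hi, uint32_t *lo)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    *hi = (uint32_t)(bits >> 32);
    *lo = (uint32_t)bits;
}

/* Model of sum_val: o0/o1 hold the first double, o2/o3 the second. */
static double sum_val_model(uint32_t o0, uint32_t o1, uint32_t o2, uint32_t o3)
{
    return double_from_words(o0, o1) + double_from_words(o2, o3);
}
```

In 64-bit code none of this shuffling is needed, because the values already arrive in %f0 and %f2.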

64-bit floating-point passing by value example:

double sum(double a, double b);

/*sum of two doubles 64-bit calling convention*/
.inline sum,16
  faddd %f0,%f2,%f0
.end

For values passed in memory, single-precision floating point values and integers are guaranteed to be only 4-byte aligned. Double-precision floating point values will be 8-byte aligned if their offset in the parameters is a multiple of 8 bytes.

Integer return values are passed in %o0. Floating point return values are passed in %f0/%f1 (single-precision values in %f0, double-precision values in the register pair %f0,%f1).

For 32-bit code, there are two ways of passing floating point values: by value and by reference. Either way, the compiler will do its best to optimize out the load and store instructions. It is often more successful at doing this if the floating point parameters are passed by reference.

Here is an example of 32-bit by reference parameter passing:

double sum_ref(double *a, double *b);

/*sum of two doubles by reference*/
.inline sum_ref,8
  ldd [%o0],%f0
  ldd [%o1],%f2
  faddd %f0,%f2,%f0
.end

7.1.3.2 Stack Space

Sometimes, it is necessary to store variables to the stack in order to load them back later; this is the case for moving between the int and fp registers. The best way of doing this is to use the space already set aside for parameters that are passed into the function.

For example, in the 32-bit floating-point passing by value code shown above, the location %sp+0x48 is 8-byte aligned (%sp is 8-byte aligned), and it corresponds to the place where the second and third 4-byte integer parameters would be stored if they were passed on the stack. (Note that the first parameter would be stored at a non-8-byte boundary.)
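The alignment claim can be checked with a little arithmetic. The sketch below assumes, consistent with the offsets used in this section, that the home slot for the first 4-byte parameter in 32-bit code is at %sp+0x44 and that each later slot follows 4 bytes after the previous one. The function names are hypothetical.

```c
#include <assert.h>

/* %sp offset of the home slot for 4-byte parameter n (0-based), 32-bit code. */
static int param_slot_offset_v8(int n)
{
    assert(n >= 0);
    return 0x44 + 4 * n;
}

/* Nonzero when the slot is 8-byte aligned, given that %sp itself is
 * 8-byte aligned; only such a slot is a safe target for ldd or std. */
static int slot_is_8_byte_aligned(int n)
{
    return param_slot_offset_v8(n) % 8 == 0;
}
```

With these definitions, slot 0 is at 0x44 (not 8-byte aligned), slots 1 and 2 are at 0x48 and 0x4c (the 8-byte scratch area used by sum_val), and slot 6 is at 0x5c, the location of the seventh argument in add_up.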

7.1.3.3 Branches and Calls

Branches and calls are allowed within template code. Every branch or call must be followed by an instruction that fills the branch delay slot, normally a nop. It is possible to put other instructions in the delay slot of a branch, which can be useful if you want to use the processor's support for annulled instructions, but doing so causes the code to be late inlined (described in Late and Early Inlining) and may result in suboptimal performance.

Call instructions must have an extra last argument that indicates the number of registers used to pass arguments in the call parameters. In general, you should avoid inlining call instructions.

The destinations of branches must be indicated with a number, and the branch instructions should use this number to indicate the appropriate destination together with an f for a forward branch or a b for a backward branch.

Here is an example of using branches in an inline template:

int is_true(int i);
/*return 1 if i is nonzero, 0 otherwise*/
.inline is_true,4
   cmp  %o0,%g0
   be   1f
   nop
   mov  1,%o0
   ba   2f
   nop
1:
   mov  0,%o0
2:
.end
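For comparison, here is a C sketch of the intended behavior, assuming from the name is_true that the result should be 1 for a nonzero argument and 0 otherwise. The goto labels label_1 and label_2 stand in for the numeric labels 1 and 2 of the template, and both jumps are forward, matching the f suffix. The function name is_true_model is illustrative.

```c
#include <assert.h>

/* C model of the template's control flow and intended result. */
static int is_true_model(int i)
{
    int result;

    if (i == 0)
        goto label_1;   /* conditional forward branch to label 1 */
    result = 1;
    goto label_2;       /* unconditional forward branch (ba 2f)  */
label_1:
    result = 0;
label_2:
    assert(result == (i != 0));
    return result;
}
```

A loop would use the other suffix: a branch written as 1b jumps to the nearest preceding label 1.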