In the following example, when we examine the code the compiler generated we see a number of unnecessary loads and stores when all the data could be held in registers.
Calling C program:
int lzd(int); int a; int c=0; int main() { for(a=0; a<1000; a++) { c=lzd(c); } return 0; }
The program is intended to use the Leading Zero Detect (LZD) instruction on the SPARC T4 to do a count of the number of leading zero bits in an integer register. The inline template lzd.il might look like this:
.inline lzd lzd %o0,%o0 .end
Compiling the code with optimization gives the resulting code:
% cc -O -xtarget=T4 -S lzd.c lzd.il % more lzd.s ... .L77000018: /* 0x001c 11 */ lzd %o0,%o0 /* 0x0020 9 */ ld [%i1],%i3 /* 0x0024 11 */ st %o0,[%i2] /* 0x0028 9 */ add %i3,1,%i0 /* 0x002c */ cmp %i0,999 /* 0x0030 */ ble,pt %icc,.L77000018 /* 0x0034 */ st %i0,[%i1] ...
Clearly everything could be held in registers, but the compiler is adding unnecessary loads and stores because it sees the inline template as a call to a function and must load and save registers around a function call it knows nothing about.
But we can insert a #pragma directive to tell the compiler that the routine lzd() has no side effects - meaning that it does not read or write to memory:
#pragma no_side_effect(routine_name)
and it needs to be placed after the declaration of the function. The new C code might look like:
int lzd(int); #pragma no_side_effect(lzd) int a; int c=0; int main() { for(a=0; a<1000; a++) { c=lzd(c); } return 0; }
Now the generated assembler code for the loop looks much neater:
/* 0x0014 10 */ add %i1,1,%i1 ! 11 ! { ! 12 ! c=lzd(c); /* 0x0018 12 */ lzd %o0,%o0 /* 0x001c 10 */ cmp %i1,999 /* 0x0020 */ ble,pt %icc,.L77000018 /* 0x0024 */ nop