In the following example, examining the compiler-generated code reveals a number of unnecessary loads and stores, even though all the data could be held in registers.
Calling C program:
int lzd(int);
int a;
int c=0;
int main()
{
    for(a=0; a<1000; a++)
    {
        c=lzd(c);
    }
    return 0;
}
The program is intended to use the Leading Zero Detect (LZD) instruction on the SPARC T4 to count the leading zero bits in an integer register. The inline template lzd.il might look like this:
.inline lzd
lzd %o0,%o0
.end
Compiling lzd.c together with the inline template, with optimization enabled, produces the following assembly:
% cc -O -xtarget=T4 -S lzd.c lzd.il
% more lzd.s
...
.L77000018:
/* 0x001c 11 */ lzd %o0,%o0
/* 0x0020 9 */ ld [%i1],%i3
/* 0x0024 11 */ st %o0,[%i2]
/* 0x0028 9 */ add %i3,1,%i0
/* 0x002c */ cmp %i0,999
/* 0x0030 */ ble,pt %icc,.L77000018
/* 0x0034 */ st %i0,[%i1]
...
Clearly everything could be held in registers, but the compiler adds unnecessary loads and stores because it treats the inline template as a call to a function it knows nothing about, and so must store and reload values in memory around that call.
But we can insert a #pragma directive to tell the compiler that the routine lzd() has no side effects - meaning that it does not read or write to memory:
#pragma no_side_effect(routine_name)
The pragma must be placed after the declaration of the function. The new C code might look like:
int lzd(int);
#pragma no_side_effect(lzd)
int a;
int c=0;
int main()
{
    for(a=0; a<1000; a++)
    {
        c=lzd(c);
    }
    return 0;
}
Now the generated assembler code for the loop looks much neater:
/* 0x0014 10 */ add %i1,1,%i1
! 11 ! {
! 12 ! c=lzd(c);
/* 0x0018 12 */ lzd %o0,%o0
/* 0x001c 10 */ cmp %i1,999
/* 0x0020 */ ble,pt %icc,.L77000018
/* 0x0024 */ nop
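To check the result of the loop on a machine without a SPARC T4, a portable stand-in for the inline template can be useful. The following is only a sketch: lzd_ref() and run_loop() are hypothetical names that do not appear in the original program, and the code assumes that the instruction counts leading zeros across the full 64-bit register, with the int argument sign-extended to 64 bits. The pragma is guarded by __SUNPRO_C so that other compilers never see it.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for the lzd inline template.
   Assumption: the count covers a 64-bit register, so 0 yields 64. */
int lzd_ref(int x);

#if defined(__SUNPRO_C)
/* Oracle Solaris Studio only: state that lzd_ref() has no side effects.
   The pragma follows the declaration, as described above. */
#pragma no_side_effect(lzd_ref)
#endif

int lzd_ref(int x)
{
    /* Assumption: the int is sign-extended to fill the 64-bit register. */
    uint64_t v = (uint64_t)(int64_t)x;
    uint64_t mask = (uint64_t)1 << 63;
    int count = 0;

    /* Walk down from the top bit until a set bit is found. */
    while (mask != 0 && (v & mask) == 0) {
        count++;
        mask >>= 1;
    }
    return count;
}

/* Same loop shape as main() above, with the trip count as a parameter. */
int run_loop(int iterations)
{
    int c = 0;
    int a;

    for (a = 0; a < iterations; a++) {
        c = lzd_ref(c);
    }
    return c;
}
```

Under these assumptions the value of c settles quickly: starting from 0 it becomes 64, then 57, then 58, and stays at 58, so run_loop(1000) returns 58. Comparing that against the value of c produced by the inline-template build is one way to confirm that the optimized loop still computes the same thing.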