Improving Efficiency of Inlined Functions - SPARC Assembly Language Reference Manual

7.1.6 Improving Efficiency of Inlined Functions

In the following example, examining the code the compiler generates reveals a number of unnecessary loads and stores, even though all the data could be held in registers.

Calling C program:

int lzd(int);

int a;
int c=0;

int main()
{
  for(a=0; a<1000; a++)
  {
    c=lzd(c);
  }
  return 0;
}

The program is intended to use the Leading Zero Detect (LZD) instruction on the SPARC T4 to count the number of leading zero bits in an integer register. The inline template lzd.il might look like this:

.inline lzd
  lzd %o0,%o0
.end
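For readers without a SPARC T4 at hand, the operation performed by the template can be sketched in portable C. This is only an illustrative reference, not part of the manual's example: the name lzd_ref is hypothetical, and the sketch assumes that the instruction examines the full 64-bit register and yields 64 for a zero input.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for the LZD instruction.
 * Assumption: the count covers all 64 bits of the register,
 * and an all-zero input yields 64. */
int lzd_ref(uint64_t x)
{
    int count = 0;

    if (x == 0) {
        return 64;              /* no set bit anywhere in the register */
    }
    /* Shift left until the most significant bit is set,
     * counting how many shifts that takes. */
    while ((x & (UINT64_C(1) << 63)) == 0) {
        x <<= 1;
        count++;
    }
    return count;
}
```

A loop like this costs many instructions per call, which is why a single-instruction inline template is attractive on hardware that provides it.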

Compiling the code with optimization produces the following assembly for the loop:

% cc -O -xtarget=T4 -S lzd.c lzd.il
% more lzd.s
...
                        .L77000018:
/* 0x001c         11 */         lzd     %o0,%o0
/* 0x0020          9 */         ld      [%i1],%i3
/* 0x0024         11 */         st      %o0,[%i2]
/* 0x0028          9 */         add     %i3,1,%i0
/* 0x002c            */         cmp     %i0,999
/* 0x0030            */         ble,pt  %icc,.L77000018
/* 0x0034            */         st      %i0,[%i1]
...

Clearly everything could be held in registers, but the compiler adds unnecessary loads and stores because it treats the inline template as a call to a function it knows nothing about, and so it must load and save values around that call.
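The compiler's caution is easier to appreciate with a callee that really does touch a global. The following sketch is an assumption-laden illustration, not code from the manual: sneaky_callee and run_loop are hypothetical names, and the callee deliberately modifies the loop counter a. Because an unknown function could behave this way, the compiler has to store a before the call and reload it afterward.

```c
int a;
int c = 0;

/* Hypothetical callee with a side effect: it changes the global
 * loop counter behind the caller's back. */
static int sneaky_callee(int x)
{
    a += 1;
    return x;
}

/* Same loop shape as the example program; returns the number of
 * iterations actually executed. */
int run_loop(void)
{
    int iterations = 0;

    for (a = 0; a < 1000; a++) {
        c = sneaky_callee(c);   /* a may differ after this call */
        iterations++;
    }
    return iterations;
}
```

Here the loop body runs only 500 times, because a advances by two per iteration. A compiler that kept a in a register across the call would compute the wrong trip count, so without further information it must assume the worst of any call.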

We can insert a #pragma directive to tell the compiler that the routine lzd() has no side effects, meaning that it does not read or write memory:

#pragma no_side_effect(routine_name)

The pragma must be placed after the declaration of the function. The new C code might look like:

int lzd(int);
#pragma no_side_effect(lzd)

int a;
int c=0;

int main()
{
  for(a=0; a<1000; a++)
  {
    c=lzd(c);
  }
  return 0;
}

Now the generated assembler code for the loop is much tighter, with no loads or stores inside the loop:

/* 0x0014         10 */         add     %i1,1,%i1

!   11                !  {
!   12                !    c=lzd(c);

/* 0x0018         12 */         lzd     %o0,%o0
/* 0x001c         10 */         cmp     %i1,999
/* 0x0020            */         ble,pt  %icc,.L77000018
/* 0x0024            */         nop
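Conceptually, the difference between the two listings resembles the difference between the two C functions below. This is a rough sketch rather than what the compiler literally does: lzd_standin is a hypothetical pure-C substitute for the inline template (assuming a 64-bit count on the sign-extended argument), and the function names are invented for illustration. The first function updates the globals on every iteration; the second works on local copies and writes the globals once after the loop.

```c
#include <stdint.h>

int a;
int c = 0;

/* Hypothetical pure stand-in for the lzd template: no global reads
 * or writes. Assumes a 64-bit leading-zero count. */
static int lzd_standin(int x)
{
    uint64_t v = (uint64_t)(int64_t)x;
    int n = 0;

    if (v == 0) {
        return 64;
    }
    while ((v >> 63) == 0) {
        v <<= 1;
        n++;
    }
    return n;
}

/* Globals are read and written on every trip through the loop. */
void loop_with_memory_traffic(void)
{
    for (a = 0; a < 1000; a++) {
        c = lzd_standin(c);
    }
}

/* Values live in locals (registers); globals are written once. */
void loop_in_registers(void)
{
    int ra;
    int rc = c;

    for (ra = 0; ra < 1000; ra++) {
        rc = lzd_standin(rc);
    }
    a = ra;
    c = rc;
}
```

Both functions leave a and c with the same final values. The transformation is safe only because the callee is known not to touch memory, which is the guarantee the pragma gives the compiler.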