The code shown in "Using the spot Command" is a program with three routines, each of which targets a different kind of event:
The routine fp_routine performs floating-point additions over three 80MB arrays (10 × 1024 × 1024 doubles each). Besides the floating-point operations, the size of the arrays means that the routine generates a significant amount of memory traffic, which shows up as consumption of both read and write memory bandwidth.
The routine cache_miss is a test of memory latency. Each pointer chase in the key loop brings in a new cache line, so the routine generates many cache misses as well as a significant amount of read memory bandwidth.
The routine tlb_miss is identical to the routine cache_miss; the only difference is how it is called. It is passed a stride of 8192 bytes rather than 64 bytes, so every pointer chase in the key loop lands on a different page and needs a new TLB entry. As a result, the routine encounters both cache misses and TLB misses. The code is duplicated so that the tools can clearly show where in the program each kind of event occurs.
#include <stdio.h>
#include <stdlib.h>

/* Streams through three large arrays: floating-point work plus memory bandwidth */
void fp_routine(double *out, double *in1, double *in2, int n)
{
  for (int i=0; i<n; i++) {out[i]=in1[i]+in2[i];}
}

/* Builds a ring of pointers 'step' elements apart, then chases it */
int** cache_miss(int **array, int size, int step)
{
  for (int i=0; i<size-step; i++) {array[i]=(int*)&array[i+step];}
  for (int i=size-step; i<size; i++) {array[i]=(int*)&array[i-size+step];}
  int **cp=(int**)array[0];
  for (int i=0; i<size*16; i++) {cp=(int**)*cp;}
  return cp;
}

/* Identical to cache_miss; called with a page-sized stride */
int** tlb_miss(int **array, int size, int step)
{
  for (int i=0; i<size-step; i++) {array[i]=(int*)&array[i+step];}
  for (int i=size-step; i<size; i++) {array[i]=(int*)&array[i-size+step];}
  int **cp=(int**)array[0];
  for (int i=0; i<size*16; i++) {cp=(int**)*cp;}
  return cp;
}

int main(void)
{
  double *out, *in1, *in2;
  int **array;

  out=(double*)calloc(10*1024*1024, sizeof(double));
  in1=(double*)calloc(10*1024*1024, sizeof(double));
  in2=(double*)calloc(10*1024*1024, sizeof(double));
  for (int rpt=0; rpt<100; rpt++)
    fp_routine(out,in1,in2,10*1024*1024);
  free(out);
  free(in1);
  free(in2);

  array=(int**)calloc(10*1024*1024, sizeof(int*));
  cache_miss(array,10*1024*1024,64/sizeof(int*));    /* 64-byte stride: new cache line per hop */
  tlb_miss(array,10*1024*1024,8192/sizeof(int*));    /* 8192-byte stride: new page per hop */
  free(array);
  return 0;
}
The program is compiled, using Sun Studio 12, in the following way:
$ cc -g -O -xbinopt=prepare -o test test.c
The key compiler flags are:
The flag -g generates debug information. It is recommended so that the tools can attribute time and processor events to the lines of source code that cause them. For C++ programs, however, -g disables the inlining of some routines, which can have a significant performance impact; it is better to use -g0, which generates the debug information without disabling inlining.
The flag -xbinopt=prepare builds the application with compiler annotations that allow it to be instrumented later, to count the number of calls to each routine and the number of times each individual instruction is executed. This flag requires some level of optimization to be enabled, which is why the flag -O has also been added in this example.