This chapter covers how to compile a program to get the most information from the spot command, and how to run the resulting application under the SPOT software.
You can run the spot command either from the directory where it is installed or, after adding the installation directory (by default, /opt/SUNWspro/extra/bin) to your PATH environment variable, from any directory.
There are two ways you can run the spot command:
spot can be given a command and its arguments, and will then gather data by executing that command multiple times.
spot can attach to an existing process and generate a report on that process.
The two command lines are:
To run the application multiple times and produce the report:
$ spot application parameters
Where application is the name of the application being investigated and parameters are the application's arguments.
To attach to a running process and produce the report for that process:
$ spot -P pid
Where pid is the process ID number of the running application.
There are a number of command-line options:
The flag -X requests extended statistics. The SPOT report will then include system-wide bandwidth consumption data and system-wide trap statistics (if the user has the root permission necessary to gather this information). A dedicated system is recommended when gathering this data. The report will also profile the application on the top four processor events, indicating where these events happen in the application.
The flag -d specifies a directory where the SPOT report should be placed. By default the spot report is placed in the current directory.
The flag -o specifies the name that should be used for the sub-directory containing the SPOT report. By default the directory is called spot_run followed by a unique number. The -o and -d flags work together to specify the location and name of the subdirectory that contains the SPOT report.
The flag -T is appropriate only when spot is attaching to a process. In this case it specifies how long each tool should attach to the process. The default duration is 60 seconds of sampling for each set of results.
The flag -h will print help information listing all the flags.
Each of the tools called by spot can be invoked stand-alone. If invoked stand-alone, the data collected by these tools will not be in HTML format.
The code shown in Using the spot Command is a program that has three routines, each of which targets a different kind of event:
The routine fp_routine does floating point computation on three 80MB arrays. The routine will have floating point operations and also (because of the size of the arrays) significant amounts of memory traffic, which appears as read and write memory bandwidth consumption.
The routine cache_miss is a test of memory latency. Each pointer chase in the key loop brings in another cache line. This results in many cache misses, and also a significant amount of memory read bandwidth.
The routine tlb_miss is identical to the routine cache_miss. The only difference is how the routine is called: main passes a step of 8192 bytes rather than 64 bytes. The reason for duplicating the code is to clearly show the location in the code where the events are happening. With the larger step, every pointer chase in the key loop touches a new page and so needs a new TLB entry. The routine therefore encounters both cache and TLB misses.
#include <stdio.h>
#include <stdlib.h>

/* Floating point work that streams through three large arrays. */
void fp_routine(double *out, double *in1, double *in2, int n)
{
  for (int i=0; i<n; i++) {out[i]=in1[i]+in2[i];}
}

/* Pointer chase: each element points step elements ahead, wrapping at the end. */
int** cache_miss(int **array, int size, int step)
{
  for (int i=0; i<size-step; i++) {array[i]=(int*)&array[i+step];}
  for (int i=size-step; i<size; i++) {array[i]=(int*)&array[i-size+step];}
  int **cp=(int**)array[0];
  for (int i=0; i<size*16; i++) {cp=(int**)*cp;}
  return cp;
}

/* Same code as cache_miss, duplicated so that its events are reported separately. */
int** tlb_miss(int **array, int size, int step)
{
  for (int i=0; i<size-step; i++) {array[i]=(int*)&array[i+step];}
  for (int i=size-step; i<size; i++) {array[i]=(int*)&array[i-size+step];}
  int **cp=(int**)array[0];
  for (int i=0; i<size*16; i++) {cp=(int**)*cp;}
  return cp;
}

int main(void)
{
  double *out, *in1, *in2;
  int **array;

  out=(double*) calloc(sizeof(double),10*1024*1024);
  in1=(double*) calloc(sizeof(double),10*1024*1024);
  in2=(double*) calloc(sizeof(double),10*1024*1024);
  for (int rpt=0; rpt<100; rpt++) fp_routine(out,in1,in2,10*1024*1024);
  free(out);
  free(in1);
  free(in2);

  array=(int**)calloc(sizeof(int*),10*1024*1024);
  cache_miss(array,10*1024*1024,64/sizeof(int*));    /* 64-byte stride */
  tlb_miss(array,10*1024*1024,8192/sizeof(int*));    /* 8192-byte stride */
  free(array);
  return 0;
}
The program is compiled, using Sun Studio 12, in the following way:
$ cc -g -O -xbinopt=prepare -o test test.c
The key compiler flags are:
The flag -g generates debug information. This flag is recommended so that the tools can attribute time and processor events back to the lines of source that cause them. For C++ programs, the flag -g disables inlining of some routines. This can have a significant performance impact, so it is better to use the flag -g0, which generates the debug information without disabling this optimization.
The flag -xbinopt=prepare builds the application with compiler annotations so that it can later be instrumented to count the number of calls to each routine and the number of times each individual instruction is executed. This flag requires some level of optimization to be enabled, hence the flag -O in this example.
To get the most information from spot, run it with the -X option. The downside of this option is that it takes longer to gather the data. If spot is run with root privileges as well as the -X option, it also gathers bandwidth utilization and trap data. The command line to run the example application under spot is:
$ spot -X test
SPOT will produce a subdirectory spot_run1 and several files in the current directory. One of the files is spot_summary.html. To start examining SPOT’s output, view the content of spot_summary.html in a browser. Subsequent spot runs in the current directory will produce spot_run2, spot_run3, etc. and will add content to spot_summary.html.