CHAPTER 5

Teja Profiler

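This chapter discusses the Teja Profiler used in the Netra Data Plane software. Topics include:

  • Teja Profiler Introduction
  • How the Profiler Works
  • Groups and Events
  • Dump Output
  • Profiler Examples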


Teja Profiler Introduction

Teja Profiler is a set of API calls that help you collect various critical data during the execution of an application. You can profile one or more areas of your application such as CPU utilization, I/O wait times, and so on. Information gathered using the profiler helps you decide where to direct performance-tuning efforts. The profiler uses special counters and resources available in the system hardware to collect critical information about the application.

As with instrumentation-based profiling, there is a slight overhead for collecting data during the application run. Teja Profiler uses as little overhead as possible so that the presented data is very close to the actual application run without the profiler API in place.


How the Profiler Works

You enable the Teja Profiler with the -pg command-line option. You can insert the API calls at desired places to start collecting profiling data. Teja Profiler configures and sets the hardware resources to capture the requested data. At the same time, Teja Profiler reserves and sets up the memory buffer where the data will be stored. You can insert calls to update the profiler data at any further location in the application. With this setup, the profiler reads the current values of the data and stores the values in memory.

There is an option to store additional user data in the memory along with each update capture. Storing this data helps you analyze the application in the context of different application-specific data.

You can also obtain the current profiler data in the application and use the data as desired. With the assistance of other communication mechanisms you can send the data to the host or other parts of the application.

By demarcating the portions that are being profiled, you can dump the collected data to the console. The data is presented as a comma-delimited table that can be further processed for report generation.

To minimize the amount of memory space needed for the profile capture, the profiler uses a circular buffer mechanism to store the data. In a circular buffer, the start and the end data are preserved, but the intermediate data is overwritten when the buffer becomes full.
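The buffer behavior described above can be illustrated with a minimal sketch in C. This is not the Teja implementation: the type ring_t, the functions ring_init, ring_push, and ring_last, and the capacity are all hypothetical. The sketch also assumes that "start data" means the first record written, which is pinned in slot 0 while the remaining slots are reused in ring order.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 8   /* hypothetical size; slot 0 is pinned */

typedef struct {
    uint64_t records[RING_CAPACITY];
    size_t   count;     /* total records pushed so far */
    size_t   next;      /* next slot to write in the wrapping region */
    int      wrapped;   /* nonzero once the write position has wrapped */
} ring_t;

static void ring_init(ring_t *r)
{
    r->count = 0;
    r->next = 0;
    r->wrapped = 0;
}

/* Store one record. The first record stays in slot 0; later records
 * cycle through slots 1..RING_CAPACITY-1, so the newest data always
 * survives and the oldest intermediate data is overwritten. */
static void ring_push(ring_t *r, uint64_t value)
{
    if (r->count == 0) {
        r->records[0] = value;
        r->next = 1;
    } else {
        r->records[r->next] = value;
        r->next++;
        if (r->next == RING_CAPACITY) {
            r->next = 1;      /* wrap, skipping the pinned slot */
            r->wrapped = 1;
        }
    }
    r->count++;
}

/* Most recently stored record (valid only when count > 0). */
static uint64_t ring_last(const ring_t *r)
{
    size_t last;

    if (r->count == 1)
        return r->records[0];
    last = (r->next == 1) ? RING_CAPACITY - 1 : r->next - 1;
    return r->records[last];
}
```

After more pushes than the buffer can hold, records[0] still holds the first value and ring_last returns the newest one. Only the records in between have been lost.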


Groups and Events

The profiling data is captured into different groups based on the significance of the data. For example, with the CPU performance group, events such as completed instruction cycles, data cache misses, and secondary cache misses are captured. In the memory performance group, events such as memory queue and memory cycles are captured. Refer to the Profiler section of the Netra Data Plane Software Suite Reference Manual for the different groups and different events that are captured and measured on the target.


Dump Output

The profiler dump output consists of one line per profiler record. Each line most commonly has a format of nine comma-delimited fields. The fields contain values in hexadecimal. If a record is prefixed with a -1, that indicates that the buffer allocated for the profiler records has overrun. When a buffer overrun occurs, you should increase the value of the profiler_buffer_size property as described in the configuration section of the Netra Data Plane Software Suite 1.1 Reference Manual, and run the application again.

TABLE 5-1 describes the fields of the profiler records:


TABLE 5-1 Profiler Record Fields

Field             Description

CPU ID            The number representing the CPU ID where the current profiler call was made.

Caller ID         The number representing the source location of the teja_profiler call. The records/profiler_call_locations.txt file lists all of the IDs and their corresponding source locations.

Call Type         The type of teja_profiler call. The values listed are defined in the teja_profiler.h file.

Completed Cycles  The running total of completed clock cycles so far. You can use this value to calculate the time between two entries.

Program Counter   The value of the program counter when the current profiler call was invoked.

Group Type        The group number of the events started or being measured.

Event Values      The values of the events. This field can span one or more columns depending on the target CSP. The target-dependent values are described in the profiler section in the Netra Data Plane Software Suite 1.1 Reference Manual. The order of the events is the same as the location of the bits set in the event bit mask passed to teja_profiler_start, starting from left to right. For the entry that represents teja_profiler_start, the values represent the event types.

                  There are two events per record (group) in the dump output:

                    • event_hi - represents the higher bit set in the event mask
                    • event_lo - represents the lower bit set in the event mask

Overflow          The overflow information of one or more events being measured. The value is target-dependent and is explained in the Netra Data Plane Software Suite 1.1 Reference Manual.

                  Overflow values consist of:

                    • 0x0 - no overflow
                    • 0x1 - overflow of the event_lo
                    • 0x2 - overflow of the event_hi
                    • 0x3 - overflow of both event_hi and event_lo

User Data         The values of the user-defined data. Zero or more columns, depending on the number of counters allocated and recorded by the user.


Refer to Dump Output Example for an example of dump output.
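Because each record is a line of comma-delimited hexadecimal fields, a post-processing tool can split it with standard C library calls. The following is a minimal sketch, not part of the Teja API: profiler_record_t, parse_record, overflow_lo, overflow_hi, and MAX_FIELDS are hypothetical names. The sketch assumes that a buffer overrun is flagged by a leading "-1," field, and it keeps the fields in the order they appear on the line because the number of columns varies by call type and target.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 16   /* hypothetical upper bound on columns per record */

typedef struct {
    int      overrun;              /* 1 if the record carried the -1 prefix */
    size_t   nfields;              /* number of hex fields parsed */
    uint64_t field[MAX_FIELDS];    /* field[0]=CPU ID, [1]=caller ID, ... */
} profiler_record_t;

/* Parse one comma-delimited dump line. Returns 0 on success, -1 on a
 * malformed line. The input string is not modified. */
static int parse_record(const char *line, profiler_record_t *rec)
{
    const char *p = line;
    char *end = (char *)line;

    rec->overrun = 0;
    rec->nfields = 0;

    /* Assumption: an overrun is flagged by a leading "-1," field. */
    if (strncmp(p, "-1,", 3) == 0) {
        rec->overrun = 1;
        p += 3;
    }

    while (rec->nfields < MAX_FIELDS) {
        uint64_t v = strtoull(p, &end, 16);
        if (end == p)
            return -1;                 /* no hex digits where expected */
        rec->field[rec->nfields++] = v;
        if (*end != ',')
            break;
        p = end + 1;
    }

    /* Only trailing whitespace may follow the last field. */
    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;
    return (*end == '\0') ? 0 : -1;
}

/* Overflow column: bit 0 = event_lo overflowed, bit 1 = event_hi. */
static int overflow_lo(uint64_t ovf) { return (ovf & 0x1) != 0; }
static int overflow_hi(uint64_t ovf) { return (ovf & 0x2) != 0; }
```

A caller would typically check rec.overrun first and, if it is set, rerun the application with a larger buffer before trusting the remaining records.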


Profiler Examples

Profiler API

CODE EXAMPLE 5-1 provides an example of profiler API usage.


CODE EXAMPLE 5-1 Sample Profiler API Usage
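main()
{
  /* ...user code... */
  /* Start measuring the IC_MISS event in the CPU group. */
  teja_profiler_start(TEJA_PROFILER_CMT_CPU, TEJA_PROFILER_CMT_CPU_IC_MISS);
  /* ...user code... */
  while (packet) {
    /* ...user code... */
    /* Capture the current event values; num_pkt is stored with the record. */
    teja_profiler_update(TEJA_PROFILER_CMT_CPU, num_pkt);
    if (num_pkt == 100) {
      /* Dump the collected records, then stop profiling the group. */
      teja_profiler_dump(generator_thread);
      teja_profiler_stop(TEJA_PROFILER_CMT_CPU);
    }
  }
}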

Profiler Configuration

You can change the profiler configuration in the software architecture. Three profiler properties can be changed per process. The following example sets one of them, profiler_log_table_size:


teja_process_set_property(main_process, "profiler_log_table_size","4096");

main_process is the process object that was created using the teja_process_create call. The property values are applied to all threads mapped to the process specified using main_process.

Dump Output Example

The following is an example of dump output.


TEJA_PROFILE_DUMP_START,ver1.1  
CPUID,ID,Type,Cycles,PC,Grp,Evt_Hi,Evt_Lo,Overflow,User Data  
0,2be4,1,29371aa3d0,51171c,1,100,4  
0,2bf6,1,294bbbd464,51189c,2,2,1  
0,2c0c,1,29629416a0,511a08,4,2,1  
0,2c22,1,29761be17c,511b7c,8,2,1  
0,2c38,1,2988fbbf60,511ce8,10,2,1  
0,2c4e,1,299c3ca170,511e5c,20,2,1  
0,30e6,2,2d20448f60,512904,1,36c2ba96,ce,0,0,114ee88  
0,30fe,2,2d37b98aec,512acc,2,9,9,0,0  
TEJA_PROFILE_DUMP_END 

The string ver1.1 is the dump format version. It identifies the output format so that scripts written to process the output can validate the format before processing further.
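A processing tool might perform that validation as in the following minimal sketch. The function dump_header_supported and the macro SUPPORTED_DUMP_HEADER are hypothetical names. The expected string is copied from the sample dump above, and the sketch assumes the header line carries nothing after the version except whitespace.

```c
#include <string.h>

/* Header this sketch knows how to parse (taken from the sample dump). */
#define SUPPORTED_DUMP_HEADER "TEJA_PROFILE_DUMP_START,ver1.1"

/* Returns 1 if the line is a dump header with the supported version. */
static int dump_header_supported(const char *line)
{
    size_t n = strlen(SUPPORTED_DUMP_HEADER);
    const char *p;

    if (strncmp(line, SUPPORTED_DUMP_HEADER, n) != 0)
        return 0;

    /* Accept only trailing whitespace after the version string, so that
     * a hypothetical "ver1.10" is not mistaken for "ver1.1". */
    for (p = line + n; *p != '\0'; p++) {
        if (*p != ' ' && *p != '\t' && *p != '\r' && *p != '\n')
            return 0;
    }
    return 1;
}
```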

In the first record, call type 1 represents teja_profiler_start. The values 100 and 4 in the event_hi and event_lo columns are the types of events in group 1 being measured. In the record with ID 30e6, call type 2 represents teja_profiler_update, so the values 36c2ba96 and ce are the measured values of event types 100 and 4 respectively.
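The relationship between an event bit mask and the event_hi and event_lo columns can be sketched in C as follows. The helper names event_hi and event_lo are hypothetical and are not part of the Teja API. The mask value 0x104 used in the note after the code is inferred from the first sample record (event types 100 and 4), not taken from the reference manual.

```c
#include <stdint.h>

/* Lowest set bit of the mask (event_lo); 0 if the mask is empty. */
static uint32_t event_lo(uint32_t mask)
{
    return mask & (~mask + 1u);
}

/* Highest set bit of the mask (event_hi); 0 if the mask is empty. */
static uint32_t event_hi(uint32_t mask)
{
    uint32_t hi = 0;

    while (mask != 0) {
        hi = mask & (~mask + 1u);  /* isolate the current lowest bit */
        mask &= mask - 1u;         /* clear it and continue upward */
    }
    return hi;
}
```

For a mask of 0x104, event_hi returns 0x100 and event_lo returns 0x4, matching the Evt_Hi and Evt_Lo columns of the first record.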

Cycle counts are in increasing order, so the difference between two of them gives the exact number of cycles between two profiler API calls. Dividing that difference by the processor frequency gives the elapsed time between the two calls.
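The calculation can be sketched in C as follows. The function names cycles_between and cycles_to_seconds are hypothetical, and the processor frequency must be supplied by the caller because it depends on the target.

```c
#include <stdint.h>

/* Cycles elapsed between two records (Completed Cycles column). */
static uint64_t cycles_between(uint64_t earlier, uint64_t later)
{
    return later - earlier;
}

/* Convert a cycle count to seconds for a given clock frequency in Hz. */
static double cycles_to_seconds(uint64_t cycles, double hz)
{
    return (double)cycles / hz;
}
```

For the sample records with IDs 2be4 and 30e6, the difference 0x2d20448f60 - 0x29371aa3d0 is 16,796,740,496 cycles. At an assumed clock frequency of 1.2 GHz (an illustrative value, not stated in this chapter), that corresponds to roughly 14 seconds.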

IDs such as 2be4 and 2bf6 represent the source locations of the profiler API calls. The records/profiler_call_locations.txt file contains a table that maps the IDs to the actual source locations.