CHAPTER 5
Teja Profiler
This chapter discusses the Teja Profiler used in the Netra Data Plane software.
Teja Profiler is a set of API calls that help you collect various critical data during the execution of an application. You can profile one or more areas of your application such as CPU utilization, I/O wait times, and so on. Information gathered using the profiler helps you decide where to direct performance-tuning efforts. The profiler uses special counters and resources available in the system hardware to collect critical information about the application.
As with any instrumentation-based profiling, collecting data during the application run adds a slight overhead. Teja Profiler keeps this overhead as low as possible so that the collected data closely reflects how the application runs without the profiler API in place.
You enable the Teja Profiler with the -pg command-line option. Insert the profiler API calls at the places where you want to start collecting profiling data. At that point, Teja Profiler configures the hardware resources to capture the requested data, and reserves and sets up the memory buffer where the data will be stored. You can insert calls to update the profiler data at any later location in the application. At each update call, the profiler reads the current values of the data and stores them in memory.
There is an option to store additional user data in the memory along with each update capture. Storing this data helps you analyze the application in the context of different application-specific data.
You can also obtain the current profiler data in the application and use the data as desired. With the assistance of other communication mechanisms you can send the data to the host or other parts of the application.
By demarcating the portions that are being profiled, you can dump the collected data to the console. The data is presented as a comma-delimited table that can be further processed for report generation.
To minimize the amount of memory space needed for the profile capture, the profiler uses a circular buffer mechanism to store the data. In a circular buffer, the start and the end data are preserved, while the intermediate data is overwritten when the buffer becomes full.
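The circular buffer behavior described above can be pictured with a minimal sketch. It assumes one possible layout, which is not necessarily the one Teja Profiler uses: the first record is pinned in slot 0, and the remaining slots form a ring in which each new record overwrites the oldest one. The names prof_ring, ring_init, ring_push, ring_size, and RING_CAPACITY are hypothetical and are not part of the Teja API.

```c
#include <stddef.h>

#define RING_CAPACITY 4   /* hypothetical size; slot 0 is pinned */

/* Hypothetical record buffer. This is an illustration only and is
 * not the actual layout used by Teja Profiler. */
typedef struct {
    unsigned long slots[RING_CAPACITY];
    size_t count;                      /* total records pushed so far */
} prof_ring;

static void ring_init(prof_ring *r)
{
    r->count = 0;
}

/* Store a record. The first record stays in slot 0. Later records
 * cycle through slots 1..RING_CAPACITY-1, overwriting the oldest. */
static void ring_push(prof_ring *r, unsigned long value)
{
    size_t slot;

    if (r->count == 0)
        slot = 0;
    else
        slot = 1 + (r->count - 1) % (RING_CAPACITY - 1);

    r->slots[slot] = value;
    r->count++;
}

/* Number of records currently held (at most RING_CAPACITY). */
static size_t ring_size(const prof_ring *r)
{
    return r->count < RING_CAPACITY ? r->count : RING_CAPACITY;
}
```

Pushing the values 1 through 6 into this four-slot buffer leaves the first record (1) and the three most recent records (4, 5, 6) in place, while the intermediate records 2 and 3 are overwritten.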
The profiling data is captured into different groups based on the significance of the data. For example, with the CPU performance group, events such as completed instruction cycles, data cache misses, and secondary cache misses are captured. In the memory performance group, events such as memory queue and memory cycles are captured. Refer to the Profiler section of the Netra Data Plane Software Suite Reference Manual for the different groups and different events that are captured and measured on the target.
The profiler dump output consists of one line per profiler record. Each line most commonly has a format of nine comma-delimited fields. The fields contain values in hexadecimal. If a record is prefixed with a -1, that indicates that the buffer allocated for the profiler records has overrun. When a buffer overrun occurs, you should increase the value of the profiler_buffer_size property as described in the configuration section of the Netra Data Plane Software Suite 1.1 Reference Manual, and run the application again.
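A post-processing tool might read each record along the following lines. This is a sketch under stated assumptions: it treats every field as an unnamed hexadecimal number, because the field meanings are defined in TABLE 5-1, and it assumes the overrun marker appears as a leading -1 field. The names prof_record, parse_record, and MAX_FIELDS are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 16   /* generous upper bound; nine fields is typical */

/* Hypothetical parsed form of one dump line. */
typedef struct {
    int overrun;                           /* 1 if the line began with -1 */
    int nfields;                           /* number of hex fields parsed */
    unsigned long long fields[MAX_FIELDS];
} prof_record;

/* Parse one comma-delimited line of hexadecimal fields.
 * Returns 0 on success, -1 on a malformed or overlong line.
 * Note: strtok skips empty fields, so ",," is not reported as an error. */
static int parse_record(const char *line, prof_record *rec)
{
    char buf[256];
    char *tok, *end;

    if (strlen(line) >= sizeof buf)
        return -1;
    strcpy(buf, line);

    rec->overrun = 0;
    rec->nfields = 0;

    for (tok = strtok(buf, ",\r\n"); tok != NULL; tok = strtok(NULL, ",\r\n")) {
        /* Assumption: the overrun marker is a leading "-1" field. */
        if (rec->nfields == 0 && !rec->overrun && strcmp(tok, "-1") == 0) {
            rec->overrun = 1;
            continue;
        }
        if (rec->nfields == MAX_FIELDS)
            return -1;
        rec->fields[rec->nfields] = strtoull(tok, &end, 16);
        if (end == tok || *end != '\0')
            return -1;                     /* not a clean hex number */
        rec->nfields++;
    }
    return 0;
}
```

A caller that sees rec.overrun set can report that the profiler_buffer_size property needs to be increased before the application is run again.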
TABLE 5-1 describes the fields of the profiler records:
Refer to Dump Output Example for an example of dump output.
CODE EXAMPLE 5-1 provides an example of profiler API output.
You can change the profiler configuration in the software architecture. The following example shows the three profiler properties that can be changed per process.
main_process is the process object that was created using the teja_process_create call. The property values are applied to all threads mapped to the process specified using main_process.
The following is an example of dump output.
The string, ver1.1, is the dump format version and identifies the output format. Scripts that process the output can check this string to validate the format before processing further.
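The format check mentioned above could look like the following sketch. It assumes that the version string appears as one complete comma-delimited field on a header line, which may not match the exact position in real dump output. The function name line_has_version is hypothetical.

```c
#include <string.h>

/* Return 1 if one comma-delimited field of the line is exactly equal
 * to the expected version string, otherwise 0. */
static int line_has_version(const char *line, const char *version)
{
    size_t vlen = strlen(version);
    const char *p = line;

    while (*p != '\0') {
        size_t flen = strcspn(p, ",\r\n");   /* length of current field */

        if (flen == vlen && strncmp(p, version, vlen) == 0)
            return 1;
        p += flen;
        if (*p != '\0')
            p++;                             /* skip the delimiter */
    }
    return 0;
}
```

Comparing whole fields instead of searching for a substring avoids accepting a later version such as ver1.10 by accident.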
In the first record, call type 1 represents teja_profiler_start. The values 100 and 4 seen in the event_hi and event_lo columns are the types of events in group 1 being measured. In the record with ID 30e6, call type 2 represents teja_profiler_update, so the values 36c2ba96 and ce are the values of the event types 100 and 1 respectively.
Cycle counts are in increasing order, so the difference between two of them gives the exact number of cycles that elapsed between the two profiler API calls. Dividing that difference by the processor frequency gives the actual time between the two calls.
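The calculation described above can be written as a small helper. This is an illustrative sketch: the function name elapsed_seconds is hypothetical, and the processor frequency is an input that you supply for your target.

```c
#include <stdint.h>

/* Elapsed time in seconds between two cycle-count samples.
 * cpu_hz is the processor frequency in hertz, supplied by the caller. */
static double elapsed_seconds(uint64_t start_cycles, uint64_t end_cycles,
                              double cpu_hz)
{
    uint64_t delta = end_cycles - start_cycles;   /* counts are increasing */

    return (double)delta / cpu_hz;
}
```

For example, with a hypothetical 1.2 GHz processor, two records whose cycle counts differ by 1,200,000 are about one millisecond apart.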
IDs 2be4 and 2bf6 represent the source locations of the profiler API calls. The records/profiler_call_locations.txt file contains a table that maps IDs to actual source locations.
Copyright © 2007, Sun Microsystems, Inc. All Rights Reserved.