Analyzing Program Performance with Sun WorkShop


Chapter 3

Sampling Collector Reference

This chapter introduces the Sampling Collector and explains how to use it. It covers the following topics:

The Sampling Collector collects performance data from your target application and the kernel under which your application is running, and writes that data to an experiment record file. An experiment is the data collected during one execution of your application.

Unless you specify otherwise, experiment-record files generated by the Sampling Collector have names of the form filename.n.er, where n is an integer of 1 or higher. The default experiment-file name is test.n.er. If you use the filename.1.er format for your experiment-record file name, the Collector automatically increments the number for each subsequent experiment: for example, my_test.1.er is followed by my_test.2.er, my_test.3.er, and so on.
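The auto-increment rule can be sketched in shell. The script below is an illustration of the naming scheme only; the stem my_test and the loop are not part of the Collector:

```shell
#!/bin/sh
# Illustration only (not part of the Collector): find the next free
# experiment name for a given stem, mimicking the auto-increment rule.
stem=my_test
n=1
while [ -e "$stem.$n.er" ]; do
    n=$((n + 1))
done
echo "next experiment: $stem.$n.er"
```

In a directory with no my_test experiments this prints my_test.1.er; if my_test.1.er already exists, it prints my_test.2.er.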


Caution – Do not use the rm utility to delete an experiment-record file. The actual experiment information is stored in a hidden directory, .filename.n.er, which rm does not remove. To delete an experiment-record file and its hidden directory, use the performance-tool utility er_rm, which is included with the Collector and Analyzer. See the er_rm man page for more information.
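The layout behind this caution can be simulated with ordinary files to see why rm alone leaves data behind; the file and directory below are stand-ins, and er_rm itself (part of the WorkShop installation) is shown commented out:

```shell
#!/bin/sh
# Simulation only: stand-ins for an experiment-record file and its
# hidden data directory, showing what rm misses.
cd "$(mktemp -d)"
touch test.1.er          # stand-in for the visible record file
mkdir .test.1.er         # stand-in for the hidden data directory
rm test.1.er             # removes only the visible file...
ls -d .test.1.er         # ...the hidden directory is still present
# er_rm test.1.er        # the WorkShop utility removes both
```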

What the Sampling Collector Collects

The Sampling Collector records performance data and organizes it into samples, each of which represents an interval within an experiment. The Sampling Collector terminates one sample and begins a new one in the following circumstances:

The data recorded for each sample consists of microstate accounting information from the kernel and various other statistics maintained within the kernel.

All data recorded at sample points is global to the program and does not include function-level metrics. However, if function-level metrics have been recorded during the sampling interval, the Collector associates these function metrics with the sampling interval during which they were collected.

The Sampling Collector can gather the following types of function-level information:

Exclusive, Inclusive, and Attributed Metrics

The Collector collects exclusive, inclusive, and attributed function and load-object metrics.

Clock-Based Profiling Data

Clock-based profiling records information to support the following metrics:

This information appears in the Function List display and the Callers-Callees window of the Analyzer. (See Examining Metrics for Functions and Load-Objects.) It also appears in the Summary Metrics window and annotated source and disassembly.


Note – For multiprocessor experiments, times other than wall-clock time are summed across all LWPs in the process. Total time equals the wall-clock time multiplied by the average number of LWPs in the process. Each record contains a timestamp and the IDs of the thread and LWP running at the time of the clock tick.
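As a worked example of the total-time rule in the note above (the numbers are illustrative, not measured):

```shell
#!/bin/sh
# Illustrative numbers only: total time summed across LWPs equals
# wall-clock time multiplied by the average number of LWPs.
wall_secs=20
avg_lwps=4
total_secs=$((wall_secs * avg_lwps))
echo "total time: $total_secs seconds"
```

With these numbers, total time is 80 seconds even though the run took only 20 seconds of wall-clock time.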

Clock-based profiling helps answer the following kinds of questions:

Thread Synchronization Wait Tracing

In multithreaded programs, thread synchronization wait tracing tracks the time spent waiting in calls to thread-synchronization routines in the threads library. If the real-time delay for a call exceeds a user-defined threshold, the Collector records an event for the call, along with the wait time in seconds.

Each record contains a timestamp and the IDs of the thread and LWP running at the time of the event. Synchronization-delay information supports the following metrics:

This information appears in the Function List display and the Callers-Callees window of the Sampling Analyzer (see Examining Metrics for Functions and Load-Objects). It also appears in the Summary Metrics window and annotated source and disassembly.

Hardware-Counter Overflow Profiling

Hardware-counter overflow profiling records the callstack of each LWP at the time a designated hardware counter of the CPU on which the LWP is running overflows. The data recorded includes a timestamp and the IDs of the thread and the LWP.

The Collector allows you to select the type of counter whose overflow is to be monitored, and to set an overflow value for it. Typically, counters keep track of such things as instruction-cache misses, data-cache misses, cycles, or instructions issued or executed.


Note – Hardware-counter overflow profiling can be done only on Solaris 8 for SPARC (UltraSPARC III) machines and on x86 (Pentium II and compatible products). On other machines, this feature is disabled.

Hardware-counter overflow profiling produces data to support count metrics.

Global Information

Global information about your program includes the following kinds of data:

Collecting Performance Data in Sun WorkShop

Before you can collect data, you must do the following:

Collecting data requires two steps:

  1. Specifying the kinds of data you want to collect and where you want to store the data.

  2. Running the Collector.

To specify the kinds of data you want to collect:

1. From the WorkShop window menu bar, choose Window > Sampling Collector.

The WorkShop Sampling Collector window appears.

FIGURE 3-1   The Sampling Collector Window

2. Use the Collect Data radio buttons to specify whether you want to collect data for this one run only or for multiple runs.

3. In the Experiment File text box, specify the path and file name of the experiment-record file in which you want the data to be stored.

The default experiment-record file name provided by the Collector is test.1.er. To use a different name, enter the file name, preceded by a path if you do not want the file stored in the default directory.
If you use the .1.er suffix for your experiment-record file name, the Sampling Collector automatically increments the names of subsequent experiments by one. For example, test.1.er is followed by test.2.er.

4. To collect clock-based profiling information, ensure that the Clock-Based Profiling Data check box is selected. (This check box is selected by default.)

You can accept the Normal profiling interval (10 milliseconds), or from the Profiling Interval list box select Hi-res (1 millisecond) or Custom, where you set your own interval in milliseconds.
High-resolution profiles record ten times as much data as normal profiles for any given run. To support high-resolution profiling, the operating system must be running with a high-resolution clock routine. You can specify a high-resolution routine by adding the following line to the file /etc/system, and then rebooting:

set hires_tick=1 


Note – If you try to set high-resolution profiling on a machine whose operating system does not support it, the Collector posts a warning message and reverts to the highest resolution supported. A custom setting that is not a multiple of the resolution supported by the system is rounded to the nearest multiple of that resolution, and the Collector issues a warning message.
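The rounding rule in the note can be sketched as round-to-nearest-multiple; the 10 ms resolution below is an assumed example value, not a quote of any particular system:

```shell
#!/bin/sh
# Sketch of round-to-nearest-multiple of the clock resolution.
# res_ms=10 is an assumed example value.
res_ms=10
request_ms=16
rounded_ms=$(( (request_ms + res_ms / 2) / res_ms * res_ms ))
echo "profiling interval: $rounded_ms ms"
```

With a 10 ms resolution, a 16 ms custom request rounds to 20 ms, and a 14 ms request rounds to 10 ms.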

5. To collect information about thread-synchronization wait counts and times, select the Synchronization Wait Tracing check box.

To specify the threshold beyond which tracing begins:

6. To collect information about hardware counter overflows, select the HW Counter Overflow Profiling check box.

7. Choose a category of counters from the Counter Name menu, then click Show for a list of all counters available in that category. The user-recognizable name of the counter you select appears in the Counter Name text box.

8. To specify the number of increments that take place between one overflow event and the next, choose the default Normal from the Collect Interval menu (the value of Normal depends on the counter you have selected), or choose Custom and type a value in the Collect Interval text box.


Note – All hardware counters are platform dependent, so the list of available counters differs from system to system. Some systems do not support hardware-counter overflow profiling. On such systems, this option is disabled.

9. To collect information about memory allocation in the address space, select the Address Space Data check box.

10. You can accept the default Hi-res sampling interval (a 1-second interval), or from the Sampling Interval list box select Normal (a 10-second interval), Custom (set your own interval in seconds), or Manual. In Manual mode, you signal the end of the current sample and the beginning of a new one either by choosing Collect New Sample in the WorkShop Sampling Collector window or by clicking the New Sample button:


Now you are ready to collect data. To run the Sampling Collector, in the WorkShop Sampling Collector window:


Starting a Process Under the Collector in dbx

You can run the Collector from dbx, as well as from the Sun WorkShop Debugging window. To do this:

1. Start your program in dbx by typing:

% dbx program_name 

2. Press the space bar until the (dbx) prompt appears.

3. Use the collector command with its various arguments to collect data and generate an experiment record:

(dbx) collector argument 

The collector command arguments are listed in TABLE 3-1.

TABLE 3-1   collector Command Arguments

{ enable | enable_once | disable }
    Enables or disables data collection.

  • If the mode is enable, data collection is enabled for the current run and all subsequent runs.

  • If the mode is enable_once, data is collected for the current run, and the mode is reset to disable when the run ends.

  • If the mode is disable, no performance data is collected.

profile { on | off }
    Enables or disables collection of profiling data. The default is on.

profile timer value
    Sets the profiling timer interval to value, given in milliseconds. The default is 10 ms.

address_space { on | off }
    Enables or disables collection of address-space data (pages that have been referenced and modified). The default is off.

synctrace { on | off }
    Enables or disables collection of thread-synchronization wait tracing data. The default is off.

synctrace threshold value
    Sets the threshold for synchronization delay tracing to the given value, in microseconds. value is one of the following:

  • calibrate: Use a calibrated threshold, determined at runtime.

  • number: Use a threshold of number microseconds. Setting number to 0 (zero) causes the Collector to trace all events, regardless of wait time.

    The default setting is calibrate.

hwprofile { on | off }
    Enables or disables hardware-counter overflow profiling. The default is off. If you attempt to enable hardware-counter overflow profiling on systems that do not support it, dbx returns an error message.

hwprofile list
    Returns a list of available counters by name, with two numeric settings: one for a normal interval and one for a higher-resolution interval. If your system does not support hardware-counter overflow profiling, dbx returns an error message.

hwprofile counter name interval
    Sets hardware-counter profiling to the event name, and its overflow interval to interval. The default for name is cycles, at the normal profiling interval.

status
    Reports the status of any loaded experiment.

show
    Shows the current setting of every collector control.

close
    Closes the current experiment.

quit
    Terminates data collection for the current run.

sample { periodic | manual }
    Sets the sampling mode to either periodic (for which you must use the sample period argument to set a value) or manual.

sample period value
    Sets the sampling frequency to value, given in seconds.

store directory directory_name
    Sets the directory where the experiment-record file is stored to directory_name.

store filename file_name
    Sets the output experiment file name to file_name.

To get a listing of available collector command arguments, type the following at the (dbx) prompt and press Enter:

(dbx) help collector

Attaching to a Running Process

The Collector allows you to attach to a running process and collect performance data from it.

If you want to collect thread synchronization wait tracing data, load the library libcollector.so before you start the executable, so that the Collector's wrappers around the real synchronization routines are referenced rather than the routines themselves. If you are collecting only profiling data or hardware-counter overflow profiling data, you do not need to preload the collector library, although you can do so if you wish.

To preload libcollector.so:
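A minimal Bourne-shell sketch of the preload step follows; the library path is an assumption, so check where your WorkShop installation actually places libcollector.so:

```shell
#!/bin/sh
# Sketch only: the library path below is an assumption; check where
# your WorkShop installation actually places libcollector.so.
LD_PRELOAD=/opt/SUNWspro/lib/libcollector.so
export LD_PRELOAD
echo "LD_PRELOAD=$LD_PRELOAD"
# ./a.out    # start the target; the synchronization wrappers are found first
```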

To attach to the executable and collect data:

1. Start the executable.

2. Determine the executable's PID, and attach dbx to it.

3. Enable data collection:

  1. Start collecting data, either directly in dbx using the collector command, or from the Sampling Collector window.
  2. Use the cont command to resume the target process from dbx.


Note – If you have started the executable from dbx without enabling data collection, you can pause the target from dbx and then execute the preceding instructions to start data collection during the run.
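The attach sequence can be sketched in shell. Below, sleep is a stand-in for your own executable, and the dbx line is shown commented out because it requires a WorkShop installation (a.out is a placeholder name):

```shell
#!/bin/sh
# Sketch: a stand-in target process; in practice this is your own a.out.
sleep 30 &
target_pid=$!
echo "attach to pid $target_pid"
# dbx a.out "$target_pid"   # attach dbx; then enable collection and cont
kill "$target_pid"
```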

Using the Collector for Programs Written with MPI

The Collector can collect performance data from multi-process programs that use the Sun Message Passing Interface (MPI), if you use the Sun Cluster Runtime Environment (CRE) command mprun to launch the parallel jobs. Use ClusterTools 3.1 or a compatible version. See the Sun HPC ClusterTools 3.1 documentation for more information.

To collect data from MPI jobs, you must either start the MPI processes under dbx, or attach dbx to each process separately. For example, suppose you run MPI jobs using a command line like the following:

% mprun -np 2 a.out [program-arguments]

You can replace that command line with the following:

% mprun -np 2 dbx a.out < collection.script

where collection.script is a dbx script, described in the following paragraphs.

When this example is executed, two MPI processes from the executable a.out run, and two experiments are created, named test.M.er and test.N.er, where M and N are the PIDs of the two MPI processes.

Your collection.script file must ensure that the experiments created are uniquely named. Otherwise, because the dbx instances that generate the experiments run simultaneously, two or more of them might attempt to create experiments with the same name. One way to ensure uniquely named files is to have the script give each dbx instance a file name that includes the process ID of its MPI process:

stop in main
run [program_arguments]
collector enable
collector store filename test.$[getpid()].er
cont
quit

You can also create experiments named with the MPI rank, by using a slightly different dbx script in which you stop the target program immediately following a call to MPI_Comm_rank() and use the rank to specify the experiment name. For example, suppose your MPI program contains one of the following statements on line 17:

Change collection.script to read as follows:

stop at 18
run [program_arguments]
rank=$[me]
collector enable
collector store filename test.$rank.er
cont
quit

With this modification, mprun creates experiments named with the rank of the MPI process to which they correspond.

To examine the data collected from the MPI processes, open one experiment in the Analyzer, then add the others, so you can see the data for all the MPI processes in aggregate. See Starting the Analyzer and Loading an Experiment and Adding Experiments to the Analyzer for more information.

You can also use er_print to print out the data; er_print accepts multiple experiments on the command line. For information about using er_print, see Chapter 5.


Sun Microsystems, Inc.
Copyright information. All rights reserved.