Analyzing Program Performance With Sun WorkShop

Chapter 2 The Sampling Analysis Tools

Sun WorkShop provides a pair of tools that you use together to collect performance data on your application and analyze that data: the Sampling Collector and the Sampling Analyzer. The Sampling Collector collects performance data and the Sampling Analyzer displays the data graphically. These tools are designed for use by any software developer, even if performance tuning is not the developer's main responsibility.

This chapter is organized as follows:

The Sampling Collector collects timing summary information for the operating system, address space data, and statistical samples of either program counters or call stacks, and shows inclusive and exclusive times in various states on a per-function basis.

The Sampling Analyzer lets you browse an experiment recorded with the Sampling Collector. In certain cases, the Sampling Analyzer helps rebuild a tuned application by creating a mapfile. You can use the information in the mapfile to improve the order of loading functions into the application address space, thus reducing the memory footprint of your application. When you link the application with the new loading order, you get an executable with a smaller text working set size.

The Analyzer identifies a way to improve the order of loading functions into the application address space. You can link the application again to create an executable that runs with a smaller text working set size.

Figure 2-1 illustrates the basic performance-tuning architecture.

Figure 2-1 Performance Tuning Using the Sampling Collector and Sampling Analyzer


Understanding the Sampling Collector

The Sampling Collector, a GUI tool within the Sun WorkShop debugger, collects performance behavior data from the kernel under which the application is running and writes that data to an experiment record file. The execution of the application during which data is collected is called the experiment.

Before collecting behavior data, you can define the following:

Using the Sampling Collector

The Sampling Collector can gather:

The Sampling Collector records data samples, each sample representing a portion of a run. To display the data for any set of such samples, you use the Sampling Analyzer.

Summary Data

Summary statistics are always collected, and include two kinds of data. The first kind is shown on the initial screen of the Sampling Analyzer, and is a summary of the amount of time the program spent in various states. States include user time, system time, system wait time, text page fault time, and data page fault time. The second kind of statistics is shown in the "Execution Statistics" screen of the Analyzer, and includes page fault and I/O statistics, context switches, and a variety of page residency (working-set and paging) statistics.

Execution Profile Data

Execution profile data shows how much time is consumed by functions, modules, and segments while the application is running. It can be displayed in the Sampling Analyzer as a histogram or as a cumulative histogram (if called-function times are included).

Execution profile data helps answer the following kinds of questions:

Program Memory Usage: Address Space Data

Address space data represents the process address space as a series of segments, each of which contains a number of pages. It allows the Sampling Analyzer to describe the status of each page and whether it was referenced or modified. Program memory-usage data can be displayed in the Sampling Analyzer in the Address Space display.

Address space data helps answer the following kinds of questions:

Collecting Sampling Data

Before you can collect data:

To collect data:

  1. In the Debugging window, choose Windows > Sampling Collector to open the Sampling Collector window (see Figure 2-2).

  2. Select whether you want to collect data for only one run or for all runs.

    If you run the Sampling Collector for only one run, the Sampling Collector shuts off after the first experiment is created. If you leave the Sampling Collector on for all runs, the Sampling Collector remains on, and subsequent executions record additional experiments.

    Turning the Sampling Collector on does not start data collection. Performance data is collected only when you execute the program in the Debugging window.

  3. Enter the complete path name for your experiment file in the Experiment File text box.

    The Sampling Collector provides the default experiment-record name test.1.er. If you use the .1.er suffix for your experiment-record filename, the Sampling Collector automatically increments the names of subsequent experiments by one--for example, test.1.er is followed by test.2.er.

  4. Select the data to be collected.

    Summary data is always collected, but most users want to collect information about functions that are executed as well; to do so, select Execution Profile data. Execution profile data can include or exclude time spent in called functions. Most users want to include such time; to do so, select Include called function time.

    If you are concerned about your application's memory usage, you should select Address Space data.

    Data is collected in terms of samples, each representing a portion of the program's execution. By default, samples are collected periodically. If you select Manually, on `New Sample' command, samples are marked in response to the New Sample command or button. Whether you collect samples manually or periodically, additional samples are always marked when the program encounters a breakpoint.

  5. Start the program running in the Debugging window by clicking either Start or Go.

    Start begins sampling the program from the beginning of the code. Go begins sampling it from the current location in the code.
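The experiment-naming rule described in step 3 can be illustrated with a short Python sketch. This is not the Sampling Collector's own code: the function name next_experiment_name is hypothetical, and leaving names that do not follow the base.N.er pattern unchanged is an assumption, because the text does not say how such names are handled.

```python
import re


def next_experiment_name(name: str) -> str:
    """Return the follow-on experiment name, e.g. test.1.er -> test.2.er.

    Hypothetical helper that only illustrates the naming pattern.
    """
    match = re.fullmatch(r"(.*)\.(\d+)\.er", name)
    if match is None:
        # Assumption: names without a numeric .N.er suffix are left as-is.
        return name
    base, number = match.group(1), int(match.group(2))
    return f"{base}.{number + 1}.er"
```

For example, next_experiment_name("test.1.er") yields "test.2.er".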

    Figure 2-2 Sampling Collector Window


 Collect menu Provides commands to start program sampling, break program sampling, start the Sampling Analyzer, and exit the Sampling Collector.
 Collect Data radio buttons Turn the Sampling Collector off, on for one run, or on for all runs. If you select for one run only, the Sampling Collector turns off after an experiment is created. If you select for all runs, the Sampling Collector remains turned on even after the experiment is created. Turning the Sampling Collector on does not start data collection. Performance data is collected only when you execute the program in the Debugging window.
 Experiment File text box Accepts the complete path name of your experiment. You can either type in the path yourself or select it through the file chooser, which can be accessed by clicking the ellipsis button (...) to the right of the text box.
 Address Space data checkbox Collects process state address space data represented as a series of segments, each of which contains a number of pages. Such data allows the Sampling Analyzer to describe the status of each page and whether it was referenced or modified.
 Execution Profile data checkbox Collects information about the time consumed by functions, modules, and segments during the execution of the application.
 Collect Profile data slider Controls how many samples per second the Sampling Collector gathers.
 Profile Times radio buttons Determine whether called function times are included in or excluded from the sample data.
 Manually, on "New Sample" command radio button Summarizes data and starts a new sample whenever you choose Collect New Sample.
 Periodically radio button Collects data periodically at the interval set by the Period slider (default is one second).
 Period slider Determines the interval at which periodic samples are collected.

Controlling Profile Frequency

You can specify how often the Sampling Collector records profile PCs and call stacks, expressed in samples per second. The valid range is 1 to 100 samples per second.

To specify an interval for gathering profiles, move the Collect Profile data slider to the number of samples per second you want.

Marking Samples

The Sampling Collector interrupts data gathering to end one sample and begin another at these three points:

Breakpoints

Because data is always summarized at breakpoints in the code, you can set breakpoints at any location at which you want to summarize collected data.

New Sample Command

If you select the radio button labeled "Manually, on `New Sample' command", you can use the New Sample command on the Collect menu to end the current sample and begin a new one at whatever points in the application you wish, without setting a breakpoint in the code. This is useful if you are interested in measuring human interaction with the application--for example, the time it takes to choose a command from a menu or to type in a keyboard command.

Periodic Sampling

If you select the radio button labeled "Periodically", the Sampling Collector takes behavior data samples as you observe the running application, to give you a uniform view of the application's behavior. Use the Period slider at the bottom of the Sampling Collector window to define the intervals at which the Sampling Collector summarizes samples. The interval can be from 1 to 60 seconds.

Understanding the Sampling Analyzer

The Sampling Analyzer analyzes the performance data that the Sampling Collector measures and records for an application. It can also compute an improved load order for functions in your application's address space and help you rebuild a tuned application.

Figure 2-3 The Sampling Analyzer's main window


 Experiment menu Provides commands for loading, exporting, printing, and deleting experiments and for creating mapfiles.
 View menu Provides commands for selecting, sorting, finding, and showing data.
 Options menu Provides commands for altering column widths and histogram names.
 Help menu Provides online help.
 Data list box Determines the kind of performance data to be analyzed.
 Display list box Sets the display method for the data being analyzed.
 Unit radio buttons Select the type of unit to view in the display pane.
 Average legend Displays the average percentage of time spent in performance problem areas contained in experiment samples.
 Sample display pane Contains graphical analyses of collected data.
 Include Samples text field Displays samples, sample ranges, and/or numbers of displayed samples. The text box is editable.
 Arrow buttons Let you step through an experiment, incrementing or decrementing the sample number by one per click, and view program behavior at each sample.
 Message area Displays information about current actions.

The Sampling Analyzer examines an experiment record written by the Sampling Collector and displays it graphically on screen. The er_export utility converts the data of the experiment record to ASCII format, and the er_print utility prints the data of the current display to a file or printer. These two utilities are invoked from the Export and Print entries in the Experiment menu. They are not normally run from the command line.

Loading an Experiment

You can load an experiment both as the Sampling Analyzer is opening and after the Sampling Analyzer is already open. By default, experiment data is shown using Overview display, but you can change the view to a Histogram, Cumulative, Address Space, or Statistics display, depending on the nature of the data.

To load an experiment as the Sampling Analyzer opens, double-click the experiment name in the Load Experiment dialog box that is displayed when the Sampling Analyzer window opens.

Or, navigate to the experiment name you want, and double-click it.

To load an experiment after the Sampling Analyzer is already open:

  1. Choose Experiment > Load.

  2. Type the name of the experiment in the Name text box or double-click its entry in the file filter.

    Or, navigate to the experiment name you want, and double-click it.

Selecting Data Types for Viewing

The Sampling Analyzer allows you to view different types of collected data. You can specify the kind of data that would help you improve your application's performance.

Select one of the following data types from the Data list box:

 Process Times           Summary of process state transitions
 User Time               Time spent in the user process state from the execution of instructions
 System Wait Time        Time the process is sleeping in the kernel but is not in the suspend, idle, lock wait, text fault, or data fault state
 System Time             Time the operating system spends executing system calls
 Text Page Fault Time    Time spent faulting in text pages
 Data Page Fault Time    Time spent faulting in data pages
 Program Sizes           Sizes in bytes of the functions, modules, and segments of your application. Used in conjunction with Address Space data, this lets you examine the size of your application and helps you establish specific memory requirements
 Address Space           Reference behavior of both text pages and data pages. Used in conjunction with Program Sizes data, it lets you examine the size of your application and helps you establish specific memory requirements
 Execution Statistics    Overall statistics on the execution of the application

Data Types and Display Options

Each data type can be viewed only in displays appropriate to its nature. Table 2-1 lists the display options associated with each data type:

Table 2-1 Data Types and Corresponding Display Options

 Data Type               Display Option(s)
 Process Times           Overview
 User Time               Histogram; Cumulative
 System Wait Time        Histogram; Cumulative
 System Time             Histogram; Cumulative
 Text Page Fault Time    Histogram; Cumulative
 Data Page Fault Time    Histogram; Cumulative
 Program Sizes           Histogram
 Address Space           Address Space
 Execution Statistics    Statistics
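The pairing of data types and display options in Table 2-1 can be expressed as a simple lookup. The Python sketch below transcribes the table; the names DISPLAY_OPTIONS and can_display are hypothetical and are not part of Sun WorkShop.

```python
# Display options available for each data type, transcribed from Table 2-1.
DISPLAY_OPTIONS = {
    "Process Times": ["Overview"],
    "User Time": ["Histogram", "Cumulative"],
    "System Wait Time": ["Histogram", "Cumulative"],
    "System Time": ["Histogram", "Cumulative"],
    "Text Page Fault Time": ["Histogram", "Cumulative"],
    "Data Page Fault Time": ["Histogram", "Cumulative"],
    "Program Sizes": ["Histogram"],
    "Address Space": ["Address Space"],
    "Execution Statistics": ["Statistics"],
}


def can_display(data_type: str, display: str) -> bool:
    """Return True if Table 2-1 lists this display option for the data type."""
    return display in DISPLAY_OPTIONS.get(data_type, [])
```

For instance, can_display("Program Sizes", "Cumulative") is False, because Program Sizes data can be viewed only as a Histogram.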

Selecting Display Options

The Sampling Analyzer associates each data type with one or two display options, depending on the nature of the actual data.

Select one of the display options shown in Table 2-2 from the Display list box.

Table 2-2 Display Options for Specific Data Types
 Display Option    Information Presented
 Overview          The default display gives a high-level overview of performance behavior
 Histogram         Summary of the amount of time spent executing functions, files, and load objects
 Cumulative        Cumulative amount of time spent by a function, file, or load object, including the time spent in called functions, files, or segments
 Address Space     Information about memory usage
 Statistics        Aggregate data about performance and system resource usage

The Overview Display

For each sample, the Overview display (see Figure 2-4) shows the amount of time the application spends in different process states. The Sampling Collector always gathers this data during the data collection process, so the Overview display appears by default whenever an experiment is loaded into the Sampling Analyzer.

The Overview display option:

Figure 2-4 Overview Display


The Overview display contains numbered sample columns made up of segmented bars. Each column represents an individual sample collected during an experiment.

The segments inside each column represent different performance areas. The height of each segment is proportional to the time spent in each performance area.

The shade that represents a specific performance area is consistent across all the sample columns in the experiment and across other experiments as well.

A transparent segment is a segment the same color as the foreground of the display pane. It represents performance areas too small to display individually. To see exactly which performance areas are contained in a transparent segment, click the segment's column and choose View > Show Details to open the Sample Details dialog box (see Figure 2-5).
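The way a sample column is divided can be sketched in Python as follows. This is only an illustration of proportional segments, not the Sampling Analyzer's drawing code. The function name segment_shares and the min_share cutoff are hypothetical; the actual threshold below which a performance area is folded into the transparent segment is not stated here.

```python
def segment_shares(state_times, min_share=0.02):
    """Split one sample's state times into drawable segments.

    Returns (shares, lumped): shares maps each state that is large enough
    to draw to its fraction of the sample, and lumped is the combined
    fraction of the states that are too small to draw individually.
    """
    total = sum(state_times.values())
    if total == 0:
        return {}, 0.0
    shares = {}
    lumped = 0.0
    for state, seconds in state_times.items():
        share = seconds / total
        if share >= min_share:
            shares[state] = share   # segment height is proportional to this
        else:
            lumped += share         # would end up in the transparent segment
    return shares, lumped
```

Called with the times from one sample, the function returns the relative height of each visible segment and the relative height of the transparent remainder.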

Figure 2-5 Sample Details Dialog Box


The fields in the dialog box contain the following information about the selected samples.

 Samples Samples currently selected and the percentage of the experiment they represent
 Start Time Start time of the sample
 End Time End time of the sample
 Duration Duration of the sample
 User Time spent executing application instructions
 System Time the operating system spent executing system calls
 Trap Time spent executing traps (automatic exceptions or memory faults)
 Text Fault The time spent faulting in text pages
 Data Fault The time spent faulting in data pages
 I/O Time spent in program I/O
 Lock Wait Time spent waiting for lightweight process locks to be released
 Sleep Time the program spent sleeping (due to any cause other than Text Fault, Data Fault, System Wait, or Lock Wait)
 Suspend Time spent suspended (including time spent in the debugger when it encounters breakpoints)
 Idle Time spent idle
 Parameters List of the data parameters collected for each sample (set in the Sampling Collector before beginning the experiment)

The Histogram Display

The Histogram display (see Figure 2-6) shows how much time an application spends executing functions, files, or load objects.

The Histogram display option is available for the following data types:

Figure 2-6 Histogram Display Showing Time Spent Executing Functions


To view your application's performance data at various levels of compilation granularity, choose one of the following unit types:

 Function Time your application spent executing functions
 File Time spent executing file-level units. This view is useful if your application has a large number of functions. All data for a single source file is displayed together. Note: If any part of the executable (including shared libraries) is not compiled with the -g option, the Sampling Collector may not have enough information to associate functions with their containing files.
 Load Object Time spent executing text segments.

You can select which samples to include in the Histogram display in three ways:

To select which segments to include in the Histogram display, choose View > Segments Included from Files to open the Segments Included from Files dialog. Click any segments and click Apply, or click the Select All button to select all segments.

To sort the Histogram display, choose View > Sort by and select either Values (descending by time value) or Names (alphabetically).

To search for specific names, choose View > Find to open the Find dialog box. Enter the search string in the text field and click Apply.

The Cumulative Display

The Cumulative display (see Figure 2-7) shows the total execution time spent by a function, file, or load object, including time spent in called functions, files, or segments. All execution time accumulated in a descendant function is attributed to the parent function.
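The difference between time spent in a function itself and time that includes its called functions can be sketched from call-stack samples. The Python below is an illustration under assumptions, not the Sampling Analyzer's algorithm: each sample is assumed to be a call stack listed from outermost caller to innermost function, every sample is assumed to stand for the same interval, and the name profile_times is hypothetical.

```python
from collections import Counter


def profile_times(stacks, interval):
    """Compute per-function exclusive and inclusive time.

    stacks   -- list of call stacks, outermost caller first
    interval -- time represented by one sample (any unit)
    """
    exclusive = Counter()   # time in the function itself
    inclusive = Counter()   # time in the function plus everything it called
    for stack in stacks:
        if not stack:
            continue
        exclusive[stack[-1]] += interval
        # set() counts a recursive function only once per sample.
        for function in set(stack):
            inclusive[function] += interval
    return exclusive, inclusive


exclusive, inclusive = profile_times(
    [["main", "parse"], ["main", "parse", "read_token"], ["main"]],
    interval=10,  # milliseconds per sample, chosen for the example
)
```

Here main is the innermost function in only one sample, but it is on the stack in all three, so its inclusive time covers the time accumulated in parse and read_token as well.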

The Cumulative display is available for the following data types:

Figure 2-7 Cumulative Display


To view data at various levels of compilation granularity, choose one of the following unit types:

 Function Time your application spent executing functions
 File Time spent executing file-level units. This view is useful if your application has a large number of functions. All data for a single source file is displayed together. Note: If any part of the executable (including shared libraries) is not compiled with the -g option, the Sampling Collector may not have enough information to associate functions with their containing files.
 Load Object Time spent executing text segments.

You can select which samples to include in the Cumulative display in three ways:

To select which segments to include in the Cumulative display, choose View > Segments Included from Files to open the Segments Included from Files dialog. Click any segments and click Apply, or click the Select All button to select all segments.

To sort the Cumulative display, choose View > Sort by and select either Values (descending by time value) or Names (alphabetically).

To search for specific names, choose View > Find to open the Find dialog box. Enter the search string in the text field and click Apply.

The Address Space Display

The Address Space display (see Figure 2-8) helps you identify memory that is most heavily used by your application (modified and referenced pages). This display option also identifies memory that is unused because the experiment did not exercise all of your application's functionality, or because your application has dead code or memory allocation problems.

The Address Space display option shows data only if you collect address-space data. If no address-space data was collected, a message to that effect will appear at the bottom of the Sampling Analyzer screen.

Memory Categories

The Address Space display divides memory used by your application into the following categories:

 Modified A page written on during the execution of the application; may or may not be referenced
 Referenced A page read by your application or containing instructions that have been executed by your application
 Unreferenced A page neither modified nor referenced by the application
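The three categories can be read as a simple classification over two per-page flags. The Python sketch below is illustrative only: the function names and the (referenced, modified) flag pairs are hypothetical, and the rule that a written page counts as Modified whether or not it was also referenced follows the definitions above.

```python
from collections import Counter


def page_category(referenced: bool, modified: bool) -> str:
    """Classify one page using the Address Space display categories."""
    if modified:
        return "Modified"        # written to; may or may not be referenced
    if referenced:
        return "Referenced"      # read, or contains executed instructions
    return "Unreferenced"        # neither modified nor referenced


def summarize_pages(pages):
    """Count pages per category; pages is a list of (referenced, modified)."""
    return Counter(page_category(ref, mod) for ref, mod in pages)
```

A large Unreferenced count for a segment is the kind of result that points to unexercised functionality or dead code.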

Address Space Display Layout

The Address Space display (see Figure 2-8) is laid out in rows and columns that are made up of individual squares (pages) or rectangles (segments). The rows and columns are numbered to describe their address in memory. Gaps (shown as white space) represent a region of the address space that was not used by the application.

Figure 2-8 Address Space Display


Sun systems use either 4-Kbyte or 8-Kbyte pages. The address of a page is a multiple of 0x1000 (4 Kbytes in hexadecimal) or 0x2000 (8 Kbytes in hexadecimal).

To verify the page size of your system, go to a prompt and type:

% pagesize

The pagesize command returns the page size in bytes, for example 4096 on a system with 4-Kbyte pages.

If the page size is 4 Kbytes, the number of pages per row is 16. If the page size is 8 Kbytes, the number of pages per row is 8.

You can determine the address of a page by combining the hexadecimal values of the row and column that contains the page. For example, if the page you are examining is in the fourth row (0004_ _00) and the third column (20), then the address of that page is 00042000.
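The row and column arithmetic can be checked with a short Python sketch. It assumes, as the page counts above imply, that one row always spans 0x10000 bytes (16 pages of 4 Kbytes or 8 pages of 8 Kbytes). The names ROW_SPAN, pages_per_row, and page_address are hypothetical, and columns are counted from zero, so the third column is index 2.

```python
ROW_SPAN = 0x10000  # bytes per row: 16 x 4-Kbyte pages or 8 x 8-Kbyte pages


def pages_per_row(page_size: int) -> int:
    """Number of page squares in one row of the display."""
    return ROW_SPAN // page_size


def page_address(row_base: int, column: int, page_size: int) -> int:
    """Address of the page at a zero-based column within a row.

    row_base is the address at which the row starts, e.g. 0x00040000.
    """
    if not 0 <= column < pages_per_row(page_size):
        raise ValueError("column is outside the row")
    return row_base + column * page_size
```

With 4-Kbyte pages, page_address(0x00040000, 2, 0x1000) gives 0x00042000, matching the example in the text.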

To view memory units at various levels of granularity in the Address Space display, select Page or Segment in the Unit type area.

Selected pages and segments are shadowed and raised to the left. If you keep the right mouse button pressed down over a selected page, the segment containing that page is also displayed and shadowed; likewise, if you keep the right mouse button pressed down over a selected segment, the pages contained within that segment are also displayed and selected.

To view information about the properties of a selected page or segment, choose View > Show Details to open either the Page Properties or Segment Properties dialog, which displays the following information:

To select which samples to include in the Address Space display, you can:

The Statistics Display

The Statistics display (see Figure 2-9) provides data about your application's overall performance and system resource usage (as opposed to the Histogram, Cumulative, and Address Space display options, which show data broken down by program components such as functions and pages). The information provided by the Statistics display is useful when you want to compare actual numerical values against any previous estimates you may have made.

Figure 2-9 Display of Program Execution Statistics


The information needed to produce the Statistics display is always collected by the Sampling Collector during the data collection process, so you do not need to specify any particular data type to view information in this display. The Statistics display shows:

 Minor Page Faults The number of page faults serviced that do not require any physical I/O activity
 Major Page Faults The number of page faults serviced that require physical I/O activity (if non-zero, the Overview display shows text page or data page fault wait time)
 Process swaps The number of times a process is swapped out of main memory
 Input blocks The number of times a read() system call is performed on a non-character or special file
 Output blocks The number of times a write() system call is performed on a non-character or special file
 Messages sent The number of messages sent over sockets
 Messages received The number of messages received from sockets
 Signals handled The number of signals delivered or received
 Voluntary context switches The number of times a context switch occurred because a process voluntarily gave up the processor before its allotted time was completed, to wait for availability of a resource
 Involuntary context switches The number of times a context switch occurred because a higher-priority process became runnable, or because the current process exceeded its allotted time
 System calls The total number of system calls
 Characters of I/O The number of characters transferred in or out to a character device or file by read and write calls
 Total address space size Total size of the address space (in pages)
 Maximum address space size Maximum size of the address space (pages per sample)
 Minimum address space size Minimum size of the address space (pages per sample)
 Average address space size Average size of the address space (pages per sample)
 Total text address space size Total size of the text address space (pages)
 Maximum text address space size Maximum size of the text address space (pages per sample)
 Minimum text address space size Minimum size of the text address space (pages per sample)
 Average text address space size Average size of the text address space (pages per sample)
 Total non-text address space size Total size of the non-text address space (pages)
 Maximum non-text address space size Maximum size of the non-text address space (pages per sample)
 Minimum non-text address space size Minimum size of the non-text address space (pages per sample)
 Average non-text address space size Average size of the non-text address space (pages per sample)


Note -

Workset sizes will be non-zero only if address-space data was collected.
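Several of the counters listed above resemble the resource-usage fields that Unix systems report through getrusage. The Python sketch below reads those fields for the current process using the standard resource module. The pairing of display rows with rusage fields is an assumption based on the similar descriptions; it is not a statement of how the Sampling Collector gathers its data, and the function name execution_counters is hypothetical. Some systems leave certain fields at zero.

```python
import resource


def execution_counters():
    """Return getrusage counters for this process, keyed by display row name."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "Minor Page Faults": usage.ru_minflt,
        "Major Page Faults": usage.ru_majflt,
        "Process swaps": usage.ru_nswap,
        "Input blocks": usage.ru_inblock,
        "Output blocks": usage.ru_oublock,
        "Messages sent": usage.ru_msgsnd,
        "Messages received": usage.ru_msgrcv,
        "Signals handled": usage.ru_nsignals,
        "Voluntary context switches": usage.ru_nvcsw,
        "Involuntary context switches": usage.ru_nivcsw,
    }


if __name__ == "__main__":
    for name, value in execution_counters().items():
        print(f"{name}: {value}")
```

Comparing such counters before and after a tuning change gives a rough cross-check on the values shown in the Statistics display.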


You can select which samples to include in the Statistics display in three ways:

Reordering an Application

You might wish to reorder your application if (and only if) text page faults are consuming a large percentage of its time.

After the behavior data is collected, you can use the Sampling Analyzer to generate a mapfile containing an improved ordering of functions. The -M option passes the mapfile to the linker, which then relinks your application and produces a new executable application with a smaller text address space size.

After you have reordered your application, you can run a new experiment and compare the original version with the reordered one.

To reorder an application:

  1. Compile the application using the -xF option.

    The -xF option is required for reordering. This option causes the compiler to generate functions that can be relocated independently.

    For C applications, type:

    cc -xF -c a.c b.c

    cc -o application_name a.o b.o

    For C++ applications, type:

    CC -xF -c a.cc b.cc

    CC -o application_name a.o b.o

    For Fortran applications, type:

    f77 -xF -c a.f b.f

    f77 -o application_name a.o b.o

    If you see the following warning message, check any files that are statically linked, such as unshared object and library files, because these files may not have been compiled with the -xF option:

    ld: warning: mapfile: text: .text% :function_name

    object_file_name:

    Entrance criteria not met, the named file, function_name, has not been compiled with the -xF option

  2. Load the application in Sun WorkShop for debugging.

  3. Activate the Sampling Collector to collect performance data by choosing Windows > Sampling Collector from the Debugging window. Be sure to enable Address Space data collection.

  4. Run the application in Sun WorkShop.

  5. Load the specified experiment into the Sampling Analyzer.

  6. Create a reordered map in the Sampling Analyzer by choosing Experiment > Create Mapfile. In the file chooser, enter the samples to be used, the mapfile directory, and the name of the mapfile to be created; and click OK.

    The mapfile contains names of functions that have user CPU time associated with them. It specifies a function ordering that reduces the size of the text address space by sorting profiling data and function sizes in descending order. All functions not listed in the mapfile are placed after the listed functions.

  7. Link the application using the new mapfile.

    For C applications, type:

    cc -Wl,-M,mapfile_name a.o b.o

    For C++ applications, type:

    CC -M mapfile_name a.o b.o

    For Fortran applications, type:

    f77 -M mapfile_name a.o b.o

    In each case, -M mapfile_name is passed through to the linker.
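The function ordering described in step 6 can be sketched in Python. The exact heuristic used by the Sampling Analyzer is not given here, so this is one possible reading: functions with user CPU time are sorted by time and then by size, both descending, and functions without user CPU time follow in their original order. The name reorder_functions and the sample data are hypothetical, and the sketch returns only an ordered list of names rather than mapfile syntax.

```python
def reorder_functions(user_time, sizes):
    """Return function names in a suggested load order.

    user_time -- dict of function name -> user CPU time (missing means 0)
    sizes     -- dict of function name -> size in bytes, for all functions
    """
    timed = [name for name in sizes if user_time.get(name, 0) > 0]
    untimed = [name for name in sizes if user_time.get(name, 0) == 0]
    # Assumption: most time first, larger function first when times are equal.
    timed.sort(key=lambda name: (user_time[name], sizes[name]), reverse=True)
    # Functions with no user CPU time are placed after the listed ones.
    return timed + untimed


order = reorder_functions(
    {"hot": 5.0, "warm": 1.0},
    {"cold": 400, "warm": 120, "hot": 64},
)
```

In this example the result places hot first, then warm, with cold at the end because it has no user CPU time associated with it.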

Comparing Runtime Experiment Samples

The Sampling Analyzer lets you simultaneously view data in multiple displays, so you can compare samples in an experiment. With multiple displays, you can:

To view multiple displays:

  1. Choose View > New Window to open a second Sampling Analyzer window.

  2. In the new Sampling Analyzer window, choose data types, displays, and samples to examine, or load a second experiment if you wish.

    The new window does not inherit the settings of the first Sampling Analyzer window; it is set to the defaults with which the original Sampling Analyzer window started. Also, if you close or quit the original Sampling Analyzer window, all windows opened from that window close as well.

Printing Experiments

If you want to save a record of an experiment, you can print experiment data to either a printer or a file. The Sampling Analyzer allows you to print:

To print a plain-text version of the current display:

  1. Choose Experiment > Print.

  2. Select whether the data should be printed to a printer or a file, and indicate the printer name and number of copies, if applicable.

  3. Click OK.

To print a plain text summary of the experiment:

  1. Choose Experiment > Print Summary.

  2. Select whether the summary data should be printed to a printer or a file, and indicate the printer name and number of copies, if applicable.

  3. Click OK.

Exporting Experiment Data

The Sampling Analyzer allows you to export experiment data to an ASCII file to be used later by other programs.

To export experiment data to an ASCII file:

  1. Choose Experiment > Export to open the Export dialog box.

  2. Enter the directory and the name of the experiment data file to be exported.

  3. Click OK to save the experiment data under the given file name.