Analyzing Program Performance With Sun WorkShop

Understanding the Sampling Analyzer

The Sampling Analyzer measures, records, and analyzes the performance of an application. It can also compute an improved load order for functions in your application's address space and help you rebuild a tuned application.

Figure 2-3 The Sampling Analyzer's main window

Experiment menu	Provides commands for loading, exporting, printing, and deleting experiments and for creating mapfiles.
View menu	Provides commands for selecting, sorting, finding, and showing data.
Options menu	Provides commands for altering column widths and histogram names
Help menu	Provides online help
Data list box	Determines the kind of performance data to be analyzed
Display list box	Sets the display method for the data being analyzed
Unit radio buttons	Select the type of unit to view in the display pane.
Average legend	Displays the average percentage of time spent in performance problem areas contained in experiment samples.
Sample display pane	Contains graphical analyses of collected data.
Include Samples text field	Displays samples, sample ranges, and/or numbers of displayed samples. The text box is editable.
Arrow buttons	Let you step through an experiment, incrementing or decrementing the sample number by one per click, and view program behavior at each sample.
Message area	Displays information about current actions.

The Sampling Analyzer examines an experiment record written by the Sampling Collector and displays it graphically on screen. The er_export utility converts the data of the experiment record to ASCII format, and the er_print utility prints the data of the current display to a file or printer. These two utilities are invoked from the Export and Print entries in the Experiment menu. They are not normally run from the command line.

Loading an Experiment

You can load an experiment either when the Sampling Analyzer opens or after it is already open. By default, experiment data is shown in the Overview display, but you can change the view to a Histogram, Cumulative, Address Space, or Statistics display, depending on the nature of the data.

To load an experiment as the Sampling Analyzer opens, double-click the experiment name in the Load Experiment dialog box that is displayed when the Sampling Analyzer window opens.

Or, navigate to the experiment name you want, and double-click it.

To load an experiment after the Sampling Analyzer is already open:

Choose Experiment > Load.

Type the name of the experiment in the Name text box or double-click its entry in the file filter.

Or, navigate to the experiment name you want, and double-click it.

Selecting Data Types for Viewing

The Sampling Analyzer allows you to view different types of collected data. You can specify the kind of data that would help you improve your application's performance.

Select one of the following data types from the Data list box:

Process Times	Summary of process state transitions
User Time	Time spent in the user process state from the execution of instructions
System Wait Time	Time the process is sleeping in the kernel but is not in the suspend, idle, lock wait, text fault, or data fault state
System Time	Time the operating system spends executing system calls
Text Page Fault Time	Time spent faulting in text pages
Data Page Fault Time	Time spent faulting in data pages
Program Sizes	Sizes in bytes of the functions, modules, and segments of your application. Used in conjunction with Address Space data, this lets you examine the size of your application and helps you establish specific memory requirements
Address Space	Reference behavior of both text pages and data pages. Used in conjunction with Program Sizes data, it lets you examine the size of your application and helps you establish specific memory requirements
Execution Statistics	Overall statistics on the execution of the application

Data Types and Display Options

Each data type can be viewed only in displays appropriate to its nature. Table 2-1 lists the display options associated with each data type:

Table 2-1 Data Types and Corresponding Display Options


Data Type	Display Option(s)
Process Times	Overview
User Time	Histogram; Cumulative
System Wait Time	Histogram; Cumulative
System Time	Histogram; Cumulative
Text Page Fault Time	Histogram; Cumulative
Data Page Fault Time	Histogram; Cumulative
Program Sizes	Histogram
Address Space	Address Space
Execution Statistics	Statistics

Selecting Display Options

The Sampling Analyzer associates each data type with one or two display options, depending on the nature of the actual data.

Select one of the display options shown in Table 2-2 from the Display list box.

Table 2-2 Display Options for Specific Data Types


Display Option	Information Presented
Overview	The default display gives a high-level overview of performance behavior
Histogram	Summary of the amount of time spent executing functions, files, and load objects
Cumulative	Cumulative amount of time spent by a function, file, or load object, including the time spent in called functions, files, or segments
Address Space	Information about memory usage
Statistics	Aggregate data about performance and system resource usage

The Overview Display

For each sample, the Overview display (see Figure 2-4) shows the amount of time the application spends in different process states. The Sampling Collector always gathers this data during the data collection process, so the Overview display appears by default whenever an experiment is loaded into the Sampling Analyzer.

The Overview display option:

Provides a high-level overview of the performance behavior of an application
Provides data about how your application's execution time breaks down into different performance areas, helping you identify CPU bottlenecks, I/O bottlenecks, and paging bottlenecks
Shows how application performance changes during execution (for example, early parts of the execution might be I/O-bound, while later parts might be CPU-bound)

Figure 2-4 Overview Display

The Overview display contains numbered sample columns made up of segmented bars. Each column represents individual samples collected during an experiment.

The segments inside each column represent different performance areas. The height of each segment is proportional to the time spent in each performance area.
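As a rough illustration of how segment heights relate to time, the sketch below converts the times recorded for one sample into percentages of that sample's total. The area names are taken from the Sample Details fields; the times and the function name segment_shares are made up for this example and are not part of the Sampling Analyzer.

```python
# Hypothetical times (in seconds) for a single sample; the values are invented.
sample = {"User": 6.0, "System": 2.0, "I/O": 1.5, "Sleep": 0.5}

def segment_shares(times):
    """Return each performance area's share of the sample as a percentage."""
    total = sum(times.values())
    return {area: 100 * t / total for area, t in times.items()}

shares = segment_shares(sample)
for area, pct in shares.items():
    print(f"{area:7s}{pct:5.1f}%")
```

In this made-up sample, User accounts for 60 percent of the column height and Sleep for 5 percent.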

The shade that represents a specific performance area is consistent across all the sample columns in the experiment and across other experiments as well.

A transparent segment is a segment the same color as the foreground of the display pane. It represents performance areas too small to display individually. To see exactly which performance areas are contained in a transparent segment, click the segment's column and choose View > Show Details to open the Sample Details dialog box (see Figure 2-5).

Figure 2-5 Sample Details Dialog Box

The fields in the dialog box contain the following information about the selected samples.

Samples	Samples currently selected and the percentage of the experiment they represent
Start Time	Start time of the sample
End Time	End time of the sample
Duration	Duration of the sample
User	Time spent executing application instructions
System	Time the operating system spent executing system calls
Trap	Time spent executing traps (automatic exceptions or memory faults)
Text Fault	The time spent faulting in text pages
Data Fault	The time spent faulting in data pages
I/O	Time spent in program I/O
Lock Wait	Time spent waiting for lightweight process locks to be released
Sleep	Time the program spent sleeping (due to any cause other than Text Fault, Data Fault, System Wait, or Lock Wait)
Suspend	Time spent suspended (including time spent in the debugger when it encounters breakpoints)
Idle	Time spent idle
Parameters	List of the data parameters collected for each sample (set in the Sampling Collector before beginning the experiment)

The Histogram Display

The Histogram display (see Figure 2-6) shows how much time an application spends executing functions, files, or load objects.

The Histogram display option is available for the following data types:

User Time
System Wait Time
System Time
Text Page Fault Time
Data Page Fault Time

Figure 2-6 Histogram Display Showing Time Spent Executing Functions

To view your application's performance data at various levels of compilation granularity, choose one of the following unit types:

Function	Time your application spent executing functions
File	Time spent executing file-level units. This view is useful if your application has a large number of functions. All data for a single source file is displayed together. Note: If any part of the executable (including shared libraries) is not compiled with the -g option, the Sampling Collector may not have enough information to associate functions with their containing files.
Load Object	Time spent executing text segments

You can select which samples to include in the Histogram display in three ways:

Type sample numbers directly into the Includes Samples text field: separate numbers with commas (1,3,6), and define ranges using a hyphen (1-6).
Select the columns containing those samples while still in the Overview display.
Choose either View > Select or View > Select None while still in the Overview display to include or exclude all samples in the experiment.
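The sample-number syntax described above (commas between numbers, a hyphen for a range) can be sketched in a few lines of Python. This is only an illustration of the notation, not the Sampling Analyzer's own parser; the function name parse_samples is hypothetical, and combining single numbers and ranges in one entry is an assumption.

```python
def parse_samples(spec):
    """Expand a spec such as "1,3,6" or "1-6" into a sorted list of sample numbers."""
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            first, last = (int(n) for n in part.split("-", 1))
            if first > last:
                raise ValueError(f"range {part!r} runs backwards")
            selected.update(range(first, last + 1))
        else:
            selected.add(int(part))
    return sorted(selected)

print(parse_samples("1,3,6"))     # [1, 3, 6]
print(parse_samples("1-6"))       # [1, 2, 3, 4, 5, 6]
print(parse_samples("1-3,8,11"))  # [1, 2, 3, 8, 11] (mixed form is an assumption)
```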

To select which segments to include in the Histogram display, choose View > Segments Included from Files to open the Segments Included from Files dialog. Click any segments and click Apply, or click the Select All button to select all segments.

To sort the Histogram display, choose View > Sort by and select either Values (descending by time value) or Names (alphabetically).

To search for specific names, choose View > Find to open the Find dialog box. Enter the search string in the text field and click Apply.

The Cumulative Display

The Cumulative display (see Figure 2-7) shows the total execution time spent by a function, file, or load object, including time spent in called functions, files, or segments. All execution time accumulated in a descendant function is attributed to the parent function.
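To make the attribution rule concrete, the following sketch computes cumulative time for a small, made-up call tree. It assumes that the time shown for a function in the Histogram display excludes its callees, and that each function has a single caller and no recursion; the function names and times are hypothetical.

```python
# Hypothetical call tree and per-function self times (in seconds).
calls = {
    "main": ["parse", "compute"],
    "parse": ["read_line"],
    "compute": [],
    "read_line": [],
}
self_time = {"main": 0.5, "parse": 1.0, "compute": 4.0, "read_line": 2.5}

def cumulative_time(func):
    """Self time of func plus the cumulative time of every function it calls."""
    return self_time[func] + sum(cumulative_time(callee) for callee in calls[func])

for name in calls:
    print(f"{name:10s} self={self_time[name]:4.1f}  cumulative={cumulative_time(name):4.1f}")
```

Here main has only 0.5 seconds of its own but a cumulative time of 8.0 seconds, because the time of parse, read_line, and compute is attributed up to it.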

The Cumulative display is available for the following data types:

User Time
System Wait Time
System Time
Text Page Fault Time
Data Page Fault Time

Figure 2-7 Cumulative Display

To view data at various levels of compilation granularity, choose one of the following unit types:

Function	Time your application spent executing functions
File	Time spent executing file-level units. This view is useful if your application has a large number of functions. All data for a single source file is displayed together. Note: If any part of the executable (including shared libraries) is not compiled with the -g option, the Sampling Collector may not have enough information to associate functions with their containing files.
Load Object	Time spent executing text segments.

You can select which samples to include in the Cumulative display in three ways:

Type sample numbers directly into the Includes Samples text field: separate numbers with commas (1,3,6), and define ranges using a hyphen (1-6).
Select the columns containing those samples while still in the Overview display.
Choose either View > Select or View > Select None while still in the Overview display to include or exclude all samples in the experiment.

To select which segments to include in the Cumulative display, choose View > Segments Included from Files to open the Segments Included from Files dialog. Click any segments and click Apply, or click the Select All button to select all segments.

To sort the Cumulative display, choose View > Sort by and select either Values (descending by time value) or Names (alphabetically).

To search for specific names, choose View > Find to open the Find dialog box. Enter the search string in the text field and click Apply.

The Address Space Display

The Address Space display (see Figure 2-8) helps you identify memory that is most heavily used by your application (modified and referenced pages). This display option also identifies memory that is unused because the experiment did not exercise all of your application's functionality, or because your application has dead code or memory allocation problems.

The Address Space display option shows data only if you collect address-space data. If no address-space data was collected, a message to that effect will appear at the bottom of the Sampling Analyzer screen.

Memory Categories

The Address Space display divides memory used by your application into the following categories:

Modified	A page written on during the execution of the application; may or may not be referenced
Referenced	A page read by your application or containing instructions that have been executed by your application
Unreferenced	A page neither modified nor referenced by the application
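The three categories can be read as a simple precedence rule: a page that was written is Modified whether or not it was also referenced. The sketch below applies that rule to a handful of made-up pages; the function name page_category and the flag values are hypothetical.

```python
from collections import Counter

def page_category(modified, referenced):
    """Classify a page using the category definitions above."""
    if modified:
        return "Modified"      # written to; may or may not also be referenced
    if referenced:
        return "Referenced"    # read or executed, but never written
    return "Unreferenced"      # neither modified nor referenced

# Hypothetical (modified, referenced) flags for five pages.
pages = [(True, True), (True, False), (False, True), (False, False), (False, False)]
counts = Counter(page_category(m, r) for m, r in pages)
print(dict(counts))
```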

Address Space Display Layout

The Address Space display (see Figure 2-8) is laid out in rows and columns that are made up of individual squares (pages) or rectangles (segments). The rows and columns are numbered to describe their address in memory. Gaps (shown as white space) represent a region of the address space that was not used by the application.

Figure 2-8 Address Space Display

Sun systems use either 4-Kbyte or 8-Kbyte pages. The address of a page is a multiple of 0x1000 (4 Kbytes in hexadecimal) or 0x2000 (8 Kbytes in hexadecimal).

To verify the page size of your system, go to a prompt and type:

% pagesize

The pagesize command returns the page size in bytes:

4096 (4-Kbyte pages)
8192 (8-Kbyte pages)

If the page size is 4 Kbytes, the number of pages per row is 16. If the page size is 8 Kbytes, the number of pages per row is 8.

You can determine the address of a page by combining the hexadecimal values of the row and column that contains the page. For example, if the page you are examining is in the fourth row (0004_ _00) and the third column (20), then the address of that page is 00042000.
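The arithmetic behind the row and column labels can be sketched as follows. The sketch assumes that each row spans 64 Kbytes (16 pages of 4 Kbytes, or 8 pages of 8 Kbytes), that the row label supplies the upper four hexadecimal digits of the address, and that the column label supplies the next two. The function names are hypothetical.

```python
ROW_SPAN = 0x10000  # 64 Kbytes per row: 16 x 4 Kbytes or 8 x 8 Kbytes

def pages_per_row(page_size):
    """Number of pages in one row for the given page size in bytes."""
    return ROW_SPAN // page_size

def page_address(row_label, column_label):
    """Combine the hexadecimal row and column labels into a page address."""
    return (int(row_label, 16) << 16) | (int(column_label, 16) << 8)

print(pages_per_row(4096))                   # 16
print(pages_per_row(8192))                   # 8
print(f"{page_address('0004', '20'):08x}")   # 00042000
```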

To view memory units at various levels of granularity in the Address Space display, select Page or Segment in the Unit type area.

Selected pages and segments are shadowed and raised to the left. If you keep the right mouse button pressed down over a selected page, the segment containing that page is also displayed and shadowed; likewise, if you keep the right mouse button pressed down over a selected segment, the pages contained within that segment are also displayed and selected.

To view information about the properties of a selected page or segment, choose View > Show Details to open either the Page Properties or Segment Properties dialog, which displays the following information:

Address
Size of the page or size range of the segment in bytes
Functions contained in the page or segment
Name of the segment

To select which samples to include in the Address Space display, you can:

Type sample numbers directly into the Includes Samples text field: separate numbers with commas (1,3,6), and define ranges using a hyphen (1-6).
Select the columns containing those samples while still in the Overview display.
Choose either View > Select or View > Select None while still in the Overview display to include or exclude all samples in the experiment.

The Statistics Display

The Statistics display (see Figure 2-9) provides data about your application's overall performance and system resource usage (as opposed to the Histogram, Cumulative, and Address Space display options, which show data broken down by program components such as functions and pages). The information provided by the Statistics display is useful when you want to compare actual numerical values against any previous estimates you may have made.

Figure 2-9 Display of Program Execution Statistics

The information needed to produce the Statistics display is always collected by the Sampling Collector during the data collection process, so you do not need to specify any particular data type to view information in this display. The Statistics display shows:

Minor Page Faults	The number of page faults serviced that do not require any physical I/O activity
Major Page Faults	The number of page faults serviced that require physical I/O activity (if non-zero, the Overview display shows text page or data page fault wait time)
Process swaps	The number of times a process is swapped out of main memory
Input blocks	The number of times a read() system call is performed on a non-character or special file
Output blocks	The number of times a write() system call is performed on a non-character or special file
Messages sent	The number of messages sent over sockets
Messages received	The number of messages received from sockets
Signals handled	The number of signals delivered or received
Voluntary context switches	The number of times a context switch occurred because a process voluntarily gave up the processor before its allotted time was completed, to wait for availability of a resource
Involuntary context switches	The number of times a context switch occurred because a higher-priority process became runnable, or because the current process exceeded its allotted time
System calls	The total number of system calls
Characters of I/O	The number of characters transferred in or out to a character device or file by read and write calls
Total address space size	Total size of the address space (in pages)
Maximum address space size	Maximum size of the address space (pages per sample)
Minimum address space size	Minimum size of the address space (pages per sample)
Average address space size	Average size of the address space (pages per sample)
Total text address space size	Total size of the text address space (pages)
Maximum text address space size	Maximum size of the text address space (pages per sample)
Minimum text address space size	Minimum size of the text address space (pages per sample)
Average text address space size	Average size of the text address space (pages per sample)
Total non-text address space size	Total size of the non-text address space (pages)
Maximum non-text address space size	Maximum size of the non-text address space (pages per sample)
Minimum non-text address space size	Minimum size of the non-text address space (pages per sample)
Average non-text address space size	Average size of the non-text address space (pages per sample)

Note: Workset sizes will be non-zero only if address-space data was collected.

You can select which samples to include in the Statistics display in three ways:

Type sample numbers directly into the Includes Samples text field: separate numbers with commas (1,3,6), and define ranges using a hyphen (1-6).
Select the columns containing those samples while still in the Overview display.
Choose either View > Select or View > Select None while still in the Overview display to include or exclude all samples in the experiment.

Reordering an Application

You might wish to reorder your application if (and only if) text page faults are consuming a large percentage of its time.

After the behavior data is collected, you can use the Sampling Analyzer to generate a mapfile containing an improved ordering of functions. The -M option passes the mapfile to the linker, which then relinks your application and produces a new executable application with a smaller text address space size.

After you have reordered your application, you can run a new experiment and compare the original version with the reordered one.

To reorder an application:

Compile the application using the -xF option.

The -xF option is required for reordering. This option causes the compiler to generate functions that can be relocated independently.

For C applications, type:

cc -xF -c a.c b.c

cc -o application_name a.o b.o

For C++ applications, type:

CC -xF -c a.cc b.cc

CC -o application_name a.o b.o

For Fortran applications, type:

f77 -xF -c a.f b.f

f77 -o application_name a.o b.o

If you see the following warning message, check any files that are statically linked, such as unshared object and library files, because these files may not have been compiled with the -xF option:

ld: warning: mapfile: text: .text% :function_name

object_file_name:

Entrance criteria not met, the named file, function_name, has not been compiled with the -xF option

Load the application in Sun WorkShop for debugging.

Activate the Sampling Collector to collect performance data by choosing Windows > Sampling Collector from the Debugging window. Be sure to enable Address Space data collection.

Run the application in Sun WorkShop.

Load the specified experiment into the Sampling Analyzer.

Create a reordered map in the Sampling Analyzer by choosing Experiment > Create Mapfile. In the file chooser, enter the samples to be used, the mapfile directory, and the name of the mapfile to be created; and click OK.

The mapfile contains names of functions that have user CPU time associated with them. It specifies a function ordering that reduces the size of the text address space by sorting profiling data and function sizes in descending order. All functions not listed in the mapfile are placed after the listed functions.

Link the application using the new mapfile.

For C applications, type:

cc -Wl -M mapfile_name a.o b.o

For C++ applications, type:

CC -M mapfile_name a.o b.o

For C++ applications, the -M option causes the compiler to pass -M mapfile_name to the linker.

For Fortran applications, type:

f77 -M mapfile_name a.o b.o
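The function ordering described in the Create Mapfile step can be pictured with a short sketch. The exact algorithm used by the Sampling Analyzer is not specified here, so this example makes an assumption: functions with user CPU time are sorted by time in descending order, with size in descending order as a tie-breaker, and functions without time are left out of the list. The profile data and the function name mapfile_order are hypothetical, and the sketch does not produce mapfile syntax.

```python
# Hypothetical profile: function name -> (user CPU seconds, size in bytes).
profile = {
    "init": (0.0, 900),
    "compute": (4.0, 2048),
    "parse": (1.0, 1024),
    "usage": (0.0, 300),
    "read_line": (1.0, 512),
}

def mapfile_order(profile):
    """Return the functions that have user CPU time, in the assumed order."""
    timed = [name for name, (secs, _size) in profile.items() if secs > 0]
    # Assumed sort key: time descending, then size descending.
    timed.sort(key=lambda name: profile[name], reverse=True)
    return timed

listed = mapfile_order(profile)
unlisted = [name for name in profile if name not in listed]
print(listed)    # ['compute', 'parse', 'read_line']
print(unlisted)  # ['init', 'usage'], placed after the listed functions
```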

Comparing Runtime Experiment Samples

The Sampling Analyzer lets you simultaneously view data in multiple displays, so you can compare samples in an experiment. With multiple displays, you can:

View different sets of samples in the same display option. For example, you can compare Histogram displays of sample 8 and sample 11.
View one set of samples in different display options. For example, you can view samples 1-6 in the Histogram display and, in a second window, view the same samples in the Cumulative display.
Compare samples from different experiments.

To view multiple displays:

Choose View > New Window to open a second Sampling Analyzer window.

In the new Sampling Analyzer window, choose data types, displays, and samples to examine, or load a second experiment if you wish.

The new window does not inherit the settings of the first Sampling Analyzer window; it is set to the defaults with which the original Sampling Analyzer window started. Also, if you close or quit the original Sampling Analyzer window, all windows opened from that window close as well.

Printing Experiments

If you want to save a record of an experiment, you can print experiment data to either a printer or a file. The Sampling Analyzer allows you to print:

A plain-text version of the current display
A text summary of the experiment that gives average sample times for each data type and shows how frequently functions, modules, and segments are used

To print a plain-text version of the current display:

Choose Experiment > Print.

Select whether the data should be printed to a printer or a file, and indicate the printer name and number of copies, if applicable.

Click OK.

To print a plain text summary of the experiment:

Choose Experiment > Print Summary.

Select whether the summary data should be printed to a printer or a file, and indicate the printer name and number of copies, if applicable.

Click OK.

Exporting Experiment Data

The Sampling Analyzer allows you to export experiment data to an ASCII file to be used later by other programs.

To export experiment data to an ASCII file:

Choose Experiment > Export to open the Export dialog box.

Enter the directory and the name of the experiment data file to be exported.

Click OK to save the experiment data under the given file name.