Sun Studio 12: Performance Analyzer

Preparing Your Program for Data Collection and Analysis

You do not need to do anything special to prepare most programs for data collection and analysis. You should read one or more of the subsections below if your program does any of the following:

Installs a signal handler
Explicitly dynamically loads a system library
Dynamically compiles functions
Creates descendant processes that you want to profile
Uses the asynchronous I/O library
Uses the profiling timer or hardware counter API directly
Calls setuid(2) or executes a setuid file

Also, if you want to control data collection from your program, you should read the relevant subsection.

Using Dynamically Allocated Memory

Many programs rely on dynamically-allocated memory, using features such as:

malloc, valloc, alloca (C/C++)
new (C++)
stack local variables (Fortran)
MALLOC, MALLOC64 (Fortran)

You must take care to ensure that a program does not rely on the initial contents of dynamically allocated memory, unless the memory allocation method is explicitly documented as setting an initial value: for example, compare the descriptions of calloc and malloc in the man page for malloc(3C).

Occasionally, a program that uses dynamically-allocated memory might appear to work correctly when run alone, but might fail when run with performance data collection enabled. Symptoms might include unexpected floating point behavior, segmentation faults, or application-specific error messages.

Such behavior might occur if the uninitialized memory is, by chance, set to a benign value when the application is run alone, but is set to a different value when the application is run in conjunction with the performance data collection tools. In such cases, the performance tools are not at fault. Any application that relies on the contents of dynamically allocated memory has a latent bug: an operating system is at liberty to provide any content whatsoever in dynamically allocated memory, unless explicitly documented otherwise. Even if an operating system happens to always set dynamically allocated memory to a certain value today, such latent bugs might cause unexpected behavior with a later revision of the operating system, or if the program is ported to a different operating system in the future.

The following tools may help in finding such latent bugs:

f95 -xcheck=init_local

For more information, see the Fortran User’s Guide or the f95(1) man page
lint utility

For more information, see the C User’s Guide or the lint(1) man page
Runtime checking under dbx

For more information, see the Debugging a Program With dbx manual or the dbx(1) man page.
Rational Purify

Using System Libraries

The Collector interposes on functions from various system libraries, to collect tracing data and to ensure the integrity of data collection. The following list describes situations in which the Collector interposes on calls to library functions.

Collecting synchronization wait tracing data. The Collector interposes on functions from the Solaris threads library, libthread.so on the Solaris 9 OS, and from the Solaris C library, libc.so, on the Solaris 10 OS.
Collecting heap tracing data. The Collector interposes on the functions malloc, realloc, memalign and free. Versions of these functions are found in the C standard library, libc.so and also in other libraries such as libmalloc.so and libmtmalloc.so.
Collecting MPI tracing data. The Collector interposes on functions from the Solaris MPI library, libmpi.so.
Ensuring the integrity of clock data. The Collector interposes on setitimer and prevents the program from using the profiling timer.
Ensuring the integrity of hardware counter data. The Collector interposes on functions from the hardware counter library, libcpc.so and prevents the program from using the counters. Calls from the program to functions from this library return a value of -1.
Enabling data collection on descendant processes. The Collector interposes on the functions fork(2), fork1(2), vfork(2), fork(3F), system(3C), system(3F), sh(3F), popen(3C), and exec(2) and its variants. Calls to vfork are replaced internally by calls to fork1. These interpositions are only done for the collect command.
Guaranteeing the handling of the SIGPROF and SIGEMT signals by the Collector. The Collector interposes on sigaction to ensure that its signal handler is the primary signal handler for these signals.

Under some circumstances the interposition does not succeed:

Statically linking a program with any of the libraries that contain functions that are interposed.
Attaching dbx to a running application that does not have the collector library preloaded.
Dynamically loading one of these libraries and resolving the symbols by searching only within the library.

The failure of interposition by the Collector can cause loss or invalidation of performance data.

Using Signal Handlers

The Collector uses two signals to collect profiling data: SIGPROF for all experiments and SIGEMT for hardware counter experiments only. The Collector installs a signal handler for each of these signals. The signal handler intercepts and processes its own signal, but passes other signals on to any other signal handlers that are installed. If a program installs its own signal handler for these signals, the Collector reinstalls its signal handler as the primary handler to guarantee the integrity of the performance data.

The collect command can also use user-specified signals for pausing and resuming data collection and for recording samples. These signals are not protected by the Collector although a warning is written to the experiment if a user handler is installed. It is your responsibility to ensure that there is no conflict between use of the specified signals by the Collector and any use made by the application of the same signals.

The signal handlers installed by the Collector set a flag that ensures that system calls are not interrupted for signal delivery. This flag setting could change the behavior of the program if the program’s signal handler sets the flag to permit interruption of system calls. One important example of a change in behavior occurs for the asynchronous I/O library, libaio.so, which uses SIGPROF for asynchronous cancel operations, and which does interrupt system calls. If the collector library, libcollector.so, is installed, the cancel signal invariably arrives too late to cancel the asynchronous I/O operation.

If you attach dbx to a process without preloading the collector library and enable performance data collection, and the program subsequently installs its own signal handler, the Collector does not reinstall its own signal handler. In this case, the program’s signal handler must ensure that the SIGPROF and SIGEMT signals are passed on so that performance data is not lost. If the program’s signal handler interrupts system calls, both the program behavior and the profiling behavior are different from when the collector library is preloaded.

Using `setuid`

Restrictions enforced by the dynamic loader make it difficult to use setuid(2) and collect performance data. If your program calls setuid or executes a setuid file, it is likely that the Collector cannot write an experiment file because it lacks the necessary permissions for the new user ID.

You can work around this issue by ensuring that your umask is set to give write permission to any UIDs or GIDs that the process may run under. The ids must have write permission for the experiments.

Program Control of Data Collection

If you want to control data collection from your program, the Collector shared library, libcollector.so contains some API functions that you can use. The functions are written in C. A Fortran interface is also provided. Both C and Fortran interfaces are defined in header files that are provided with the library.

The API functions are defined as follows.

void collector_sample(char *name);
void collector_pause(void);
void collector_resume(void);
void collector_thread_pause(unsigned int t);
void collector_thread_resume(unsigned int t);
void collector_terminate_expt(void);

Similar functionality is provided for Java^TM programs by the CollectorAPI class, which is described in The Java Interface.

The C and C++ Interface

There are two ways to access the C and C++ interface:

One way is to include collectorAPI.h and link with -lcollectorAPI, which contains real functions to check for the existence of the underlying libcollector.so API functions.

This way requires that you link with an API library, and works under all circumstances. If no experiment is active, the API calls are ignored.
The second way is to include libcollector.h, which contains macros that check for the existence of the underlying libcollector.so API functions.

This way works when used in the main executable, and when data collection is started at the same time the program starts. This way does not always work when dbx is used to attach to the process, nor when used from within a shared library that is dlopen’d by the process. This second way is provided for backward compatibility only, and its use is discouraged for any other purpose.

Note –
Do not link a program in any language with -lcollector. If you do, the Collector can exhibit unpredictable behavior.

The Fortran Interface

The Fortran API libfcollector.h file defines the Fortran interface to the library. The application must be linked with -lcollectorAPI to use this library. (An alternate name for the library, -lfcollector, is provided for backward compatibility.) The Fortran API provides the same features as the C and C++ API, excluding the dynamic function and thread pause and resume calls.

Insert the following statement to use the API functions for Fortran:

include "libfcollector.h"

Note –

Do not link a program in any language with -lcollector. If you do, the Collector can exhibit unpredictable behavior.

The Java Interface

Use the following statement to import the CollectorAPI class and access the Java API. Note however that your application must be invoked with a classpath pointing to / installation_directory/lib/collector.jar where installation-directory is the directory in which the Sun Studio software is installed.

import com.sun.forte.st.collector.CollectorAPI;

The Java CollectorAPI methods are defined as follows:

CollectorAPI.sample(String name)
CollectorAPI.pause()
CollectorAPI.resume()
CollectorAPI.threadPause(Thread thread)
CollectorAPI.threadResume(Thread thread)
CollectorAPI.terminate()

The Java API includes the same functions as the C and C++ API, excluding the dynamic function API.

The C include file libcollector.h contains macros that bypass the calls to the real API functions if data is not being collected. In this case the functions are not dynamically loaded. However, using these macros is risky because the macros do not work well under some circumstances. It is safer to use collectorAPI.h because it does not use macros. Rather, it refers directly to the functions.

The Fortran API subroutines call the C API functions if performance data is being collected, otherwise they return. The overhead for the checking is very small and should not significantly affect program performance.

To collect performance data you must run your program using the Collector, as described later in this chapter. Inserting calls to the API functions does not enable data collection.

If you intend to use the API functions in a multithreaded program, you should ensure that they are only called by one thread. With the exception of collector_thread_pause() and collector_thread_resume(), the API functions perform actions that apply to the process and not to individual threads. If each thread calls the API functions, the data that is recorded might not be what you expect. For example, if collector_pause() or collector_terminate_expt() is called by one thread before the other threads have reached the same point in the program, collection is paused or terminated for all threads, and data can be lost from the threads that were executing code before the API call. To control data collection at the level of the individual threads, use the collector_thread_pause() and collector_thread_resume() functions. There are two viable ways of using these functions: by having one master thread make all the calls for all threads, including itself; or by having each thread make calls only for itself. Any other usage can lead to unpredictable results.

The C, C++, Fortran, and Java API Functions

The descriptions of the API functions follow.

C and C++: collector_sample(char *name)

Fortran: collector_sample(string name)

Java: CollectorAPI.sample(String name)

Record a sample packet and label the sample with the given name or string. The label is displayed by the Performance Analyzer in the Event tab. The Fortran argument string is of type character.

Sample points contain data for the process and not for individual threads. In a multithreaded application, the collector_sample() API function ensures that only one sample is written if another call is made while it is recording a sample. The number of samples recorded can be less than the number of threads making the call.

The Performance Analyzer does not distinguish between samples recorded by different mechanisms. If you want to see only the samples recorded by API calls, you should turn off all other sampling modes when you record performance data.
C, C++, Fortran: collector_pause()

Java: CollectorAPI.pause()

Stop writing event-specific data to the experiment. The experiment remains open, and global data continues to be written. The call is ignored if no experiment is active or if data recording is already stopped. This function stops the writing of all event-specific data even if it is enabled for specific threads by the collector_thread_resume() function.
C, C++, Fortran: collector_resume()

Java: CollectorAPI.resume()

Resume writing event-specific data to the experiment after a call to collector_pause() . The call is ignored if no experiment is active or if data recording is active.
C and C++ only: collector_thread_pause(unsigned int t)

Java: CollectorAPI.threadPause(Thread t)

Stop writing event-specific data from the thread specified in the argument list to the experiment. The argument t is the POSIX thread identifier for C/C++ programs and a Java thread for Java programs. If the experiment is already terminated, or no experiment is active, or writing of data for that thread is already turned off, the call is ignored. This function stops the writing of data from the specified thread even if the writing of data is globally enabled. By default, recording of data for individual threads is turned on.
C and C++ only: collector_thread_resume(unsigned int t)

Java: CollectorAPI.threadResume(Thread t)

Resume writing event-specific data from the thread specified in the argument list to the experiment. The argument t is the POSIX thread identifier for C/C++ programs and a Java thread for Java programs. If the experiment is already terminated, or no experiment is active, or writing of data for that thread is already turned on, the call is ignored. Data is written to the experiment only if the writing of data is globally enabled as well as enabled for the thread.
C, C++, Fortran: collector_terminate_expt()

Java: CollectorAPI.terminate

Terminate the experiment whose data is being collected. No further data is collected, but the program continues to run normally. The call is ignored if no experiment is active.

Dynamic Functions and Modules

If your C or C++ program dynamically compiles functions into the data space of the program, you must supply information to the Collector if you want to see data for the dynamic function or module in the Performance Analyzer. The information is passed by calls to collector API functions. The definitions of the API functions are as follows.

void collector_func_load(char *name, char *alias,
    char *sourcename, void *vaddr, int size, int lntsize,
    Lineno *lntable);
void collector_func_unload(void *vaddr);

You do not need to use these API functions for Java methods that are compiled by the Java HotSpot^TM virtual machine, for which a different interface is used. The Java interface provides the name of the method that was compiled to the Collector. You can see function data and annotated disassembly listings for Java compiled methods, but not annotated source listings.

The descriptions of the API functions follow.

`collector_func_load()`

Pass information about dynamically compiled functions to the Collector for recording in the experiment. The parameter list is described in the following table.

Table 3–1 Parameter List for collector_func_load()


Parameter	Definition
`name`	The name of the dynamically compiled function that is used by the performance tools. The name does not have to be the actual name of the function. The name need not follow any of the normal naming conventions of functions, although it should not contain embedded blanks or embedded quote characters.
`alias`	An arbitrary string used to describe the function. It can be `NULL`. It is not interpreted in any way, and can contain embedded blanks. It is displayed in the Summary tab of the Analyzer. It can be used to indicate what the function is, or why the function was dynamically constructed.
`sourcename`	The path to the source file from which the function was constructed. It can be `NULL`. The source file is used for annotated source listings.
`vaddr`	The address at which the function was loaded.
`size`	The size of the function in bytes.
`lntsize`	A count of the number of entries in the line number table. It should be zero if line number information is not provided.
`lntable`	A table containing `lntsize` entries, each of which is a pair of integers. The first integer is an offset, and the second entry is a line number. All instructions between an offset in one entry and the offset given in the next entry are attributed to the line number given in the first entry. Offsets must be in increasing numeric order, but the order of line numbers is arbitrary. If `lntable` is `NULL`, no source listings of the function are possible, although disassembly listings are available.

`collector_func_unload()`

Inform the collector that the dynamic function at the address vaddr has been unloaded.