Sun Studio 12: Performance Analyzer

Collecting Data Using the `collect` Command

To run the Collector from the command line using the collect command, type the following.

% collect collect-options program program-arguments

Here, collect-options are the collect command options, program is the name of the program you want to collect data on, and program-arguments are the program's arguments.

If no collect-options are given, the default is to turn on clock-based profiling with a profiling interval of approximately 10 milliseconds.

To obtain a list of options and a list of the names of any hardware counters that are available for profiling, type the collect command with no arguments.

% collect

For a description of the list of hardware counters, see Hardware Counter Overflow Profiling Data. See also Limitations on Hardware Counter Overflow Profiling.

Data Collection Options

These options control the types of data that are collected. See What Data the Collector Collects for a description of the data types.

If you do not specify data collection options, the default is -p on, which enables clock-based profiling with the default profiling interval of approximately 10 milliseconds. The default is turned off by the -h option but not by any of the other data collection options.

If you explicitly disable clock-based profiling, and do not enable tracing or hardware counter overflow profiling, the collect command prints a warning message, and collects global data only.

`-p` `option`

Collect clock-based profiling data. The allowed values of option are:

off– Turn off clock-based profiling.
on– Turn on clock-based profiling with the default profiling interval of approximately 10 milliseconds.
lo[w]– Turn on clock-based profiling with the low-resolution profiling interval of approximately 100 milliseconds.
hi[gh]– Turn on clock-based profiling with the high-resolution profiling interval of approximately 1 millisecond. See Limitations on Clock-Based Profiling for information on enabling high-resolution profiling.
[+]value– Turn on clock-based profiling and set the profiling interval to value. The default units for value are milliseconds. You can specify value as an integer or a floating-point number. The numeric value can optionally be followed by the suffix m to select millisecond units or u to select microsecond units. The value should be a multiple of the clock resolution. If it is larger but not a multiple it is rounded down. If it is smaller, a warning message is printed and it is set to the clock resolution.
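
The adjustment rule described for [+]value can be sketched in shell. This is only an illustrative model, not the Collector's implementation: the function name adjust_interval is hypothetical, both arguments are assumed to be whole numbers of microseconds, and the warning text is invented for the sketch.

```shell
# Hypothetical model of how a requested -p interval is adjusted.
# Usage: adjust_interval REQUESTED_US CLOCK_RESOLUTION_US
adjust_interval() {
    requested=$1
    resolution=$2
    if [ "$requested" -lt "$resolution" ]; then
        # Smaller than the clock resolution: warn and use the resolution.
        echo "warning: interval raised to the clock resolution" >&2
        echo "$resolution"
    else
        # Larger but not a multiple: round down to the nearest multiple.
        echo $(( requested - requested % resolution ))
    fi
}

adjust_interval 2500 1000   # prints 2000
adjust_interval 400 1000    # prints 1000, with a warning on stderr
```

Under these assumptions, a request of 2.5 milliseconds on a clock with 1 millisecond resolution would be recorded as 2 milliseconds.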

On SPARC platforms, any value may be prepended with a + sign to enable clock-based dataspace profiling, as is done for hardware counter profiling.

Collecting clock-based profiling data is the default action of the collect command.

`-h` `counter_definition_1` `...[,counter_definition_n]`

Collect hardware counter overflow profiling data. The number of counter definitions is processor-dependent. This option is now available on systems running the Linux operating system if you have installed the perfctr patch, which you can download from http://user.it.uu.se/~mikpe/linux/perfctr/2.6/perfctr-2.6.15.tar.gz .

A counter definition can take one of the following forms, depending on whether the processor supports attributes for hardware counters.

[+]counter_name[/register_number][,interval]

[+]counter_name[~attribute_1=value_1]...[~attribute_n=value_n][/register_number][,interval]

The processor-specific counter_name can be one of the following:

A well-known (aliased) counter name
A raw (internal) name, as used by cputrack(1). If the counter can use either event register, the event register to be used can be specified by appending /0 or /1 to the internal name.

If you specify more than one counter, they must use different registers. If they do not use different registers, the collect command prints an error message and exits. Some counters can count on either register.

To obtain a list of available counters, type collect with no arguments in a terminal window. A description of the counter list is given in the section Hardware Counter Lists.

If the hardware counter counts events that relate to memory access, you can prefix the counter name with a + sign to turn on searching for the true program counter address (PC) of the instruction that caused the counter overflow. This backtracking works on SPARC processors, and only with counters of type load, store, or load-store. If the search is successful, the virtual PC, the physical PC, and the effective address that was referenced are stored in the event data packet.

On some processors, attribute options can be associated with a hardware counter. If a processor supports attribute options, then running the collect command with no arguments lists the counter definitions including the attribute names. You can specify attribute values in decimal or hexadecimal format.

The interval (overflow value) is the number of events counted at which the hardware counter overflows and the overflow event is recorded. The interval can be set to one of the following:

on, or a null string– The default overflow value, which you can determine by typing collect with no arguments.
hi[gh]– The high-resolution value for the chosen counter, which is approximately ten times shorter than the default overflow value. The abbreviation h is also supported for compatibility with previous software releases.
lo[w]– The low-resolution value for the chosen counter, which is approximately ten times longer than the default overflow value.
interval– A specific overflow value, which must be a positive integer and can be in decimal or hexadecimal format.

The default is the normal threshold, which is predefined for each counter and which appears in the counter list. See also Limitations on Hardware Counter Overflow Profiling.

If you use the -h option without explicitly specifying a -p option, clock-based profiling is turned off. To collect both hardware counter data and clock-based data, you must specify both a -h option and a -p option.
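
As a rough illustration of the first counter definition form (without attributes), the following shell sketch assembles a single definition string. The helper counter_def is hypothetical. The name cycles is assumed to be an available alias on the target processor, and counter_name is a placeholder, so check the list printed by collect with no arguments before using either.

```shell
# Hypothetical helper: build one counter definition of the form
#   [+]counter_name[/register_number][,interval]
# Usage: counter_def NAME [REGISTER] [INTERVAL] [yes|no]
counter_def() {
    name=$1
    register=${2:-}
    interval=${3:-}
    backtrack=${4:-no}
    def=$name
    # A leading + requests backtracking for memory-related counters.
    if [ "$backtrack" = "yes" ]; then def="+$def"; fi
    if [ -n "$register" ]; then def="$def/$register"; fi
    if [ -n "$interval" ]; then def="$def,$interval"; fi
    echo "$def"
}

counter_def cycles                       # prints cycles
counter_def cycles 0 hi                  # prints cycles/0,hi
counter_def counter_name "" 100003 yes   # prints +counter_name,100003
```

The resulting string would be passed after -h. Because -h turns off the clock-based default, a run that needs both kinds of data would also pass an explicit -p option, for example -p on.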

`-s` `option`

Collect synchronization wait tracing data. The allowed values of option are:

all– Enable synchronization wait tracing with a zero threshold. This option forces all synchronization events to be recorded.
calibrate– Enable synchronization wait tracing and set the threshold value by calibration at runtime. (Equivalent to on.)
off– Disable synchronization wait tracing.
on– Enable synchronization wait tracing with the default threshold, which is to set the value by calibration at runtime. (Equivalent to calibrate.)
value– Set the threshold to value, given as a positive integer in microseconds.

Synchronization wait tracing data is not recorded for Java monitors.

`-H` `option`

Collect heap tracing data. The allowed values of option are:

on– Turn on tracing of heap allocation and deallocation requests.
off– Turn off heap tracing.

Heap tracing is turned off by default. Heap tracing is not supported for Java programs; specifying it is treated as an error.

`-m` `option`

Collect MPI tracing data. The allowed values of option are:

on– Turn on tracing of MPI calls.
off– Turn off tracing of MPI calls.

MPI tracing is turned off by default.

See MPI Tracing Data for more information about the MPI functions whose calls are traced and the metrics that are computed from the tracing data.

`-S` `option`

Record sample packets periodically. The allowed values of option are:

off– Turn off periodic sampling.
on– Turn on periodic sampling with the default sampling interval of 1 second.
value– Turn on periodic sampling and set the sampling interval to value. The interval value must be positive, and is given in seconds.

By default, periodic sampling at 1 second intervals is enabled.

`-c` `option`

Record count data, for SPARC processors only.

Note –

This feature requires you to install the Binary Interface Tool (BIT), which is part of the Add-on Cool Tools for Sun Studio 12, available at http://cooltools.sunsource.net/. BIT is a tool for measuring performance or test suite coverage of SPARC binaries.

The allowed values of option are:

on– Turn on collection of function and instruction count data. Data is recorded for the executable and for any shared objects that the executable statically links with, provided that those executables and shared objects were compiled with the -xbinopt=prepare flag. Any other shared objects that are statically linked but not compiled with the -xbinopt=prepare flag are not included in the data. Likewise, any shared objects that are dynamically opened are not included in the data. The data is viewed in the Instruction-Frequency tab in Performance Analyzer, or with the er_print ifreq command.
static– Generates an experiment with the assumption that every instruction in the target executable and any statically linked shared objects was executed exactly once. As with the -c on option, the -c static option requires that the executables and shared objects are compiled with the -xbinopt=prepare flag.

`-r` `option`

Collect data for data race detection or deadlock detection for the Thread Analyzer. The allowed values are:

on– Turn on thread analyzer data-race-detection data.
off– Turn off thread analyzer data.
all– Turn on all thread analyzer data.
race– Turn on thread analyzer data-race-detection data.
deadlock– Collect deadlock and potential-deadlock data.
dtN– Turn on specific thread analyzer data types, as named by the dt* parameters.

For more information about the collect -r command and Thread Analyzer, see the Sun Studio 12: Thread Analyzer User’s Guide and the tha.1 man page.

Experiment Control Options

`-F` `option`

Control whether or not descendant processes should have their data recorded. The allowed values of option are:

on– Record experiments only on descendant processes that are created by functions fork, exec, and their variants.
all– Record experiments on all descendant processes.
off– Do not record experiments on descendant processes.
=regexp– Record experiments on all descendant processes whose name or lineage matches the specified regular expression.

If you specify the -F on option, the Collector follows processes created by calls to the functions fork(2), fork1(2), fork(3F), vfork(2), and exec(2) and its variants. The call to vfork is replaced internally by a call to fork1.

If you specify the -F all option, the Collector follows all descendant processes including those created by calls to system(3C), system(3F), sh(3F), and popen(3C), and similar functions, and their associated descendant processes.

If you specify the -F '=regexp' option, the Collector follows all descendant processes whose name or lineage matches the specified regular expression. See the regexp(5) man page for information about regular expressions.

When you collect data on descendant processes, the Collector opens a new experiment for each descendant process inside the founder experiment. These new experiments are named by adding an underscore, a letter, and a number to the experiment suffix, as follows:

The letter is either an “f” to indicate a fork, an “x” to indicate an exec, or “c” to indicate any other descendant process.
The number is the index of the fork or exec (whether successful or not) or other call.

For example, if the experiment name for the initial process is test.1.er, the experiment for the child process created by its third fork is test.1.er/_f3.er. If that child process execs a new image, the corresponding experiment name is test.1.er/_f3_x1.er. If that child creates another process using a popen call, the experiment name is test.1.er/_f3_x1_c1.er.
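
The naming scheme can be reproduced with a small shell sketch, which may help when scripting over descendant experiments. The function descendant_experiment is hypothetical. It only builds the path string from a lineage that you supply and does not inspect any experiment on disk.

```shell
# Hypothetical helper: build the path of a descendant experiment.
# Usage: descendant_experiment FOUNDER STEP...
# Each STEP is a letter (f = fork, x = exec, c = other) plus the call index.
descendant_experiment() {
    founder=$1
    shift
    suffix=""
    for step in "$@"; do
        suffix="${suffix}_${step}"
    done
    echo "${founder}/${suffix}.er"
}

descendant_experiment test.1.er f3         # prints test.1.er/_f3.er
descendant_experiment test.1.er f3 x1      # prints test.1.er/_f3_x1.er
descendant_experiment test.1.er f3 x1 c1   # prints test.1.er/_f3_x1_c1.er
```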

The Analyzer and the er_print utility automatically read experiments for descendant processes when the founder experiment is read, but the experiments for the descendant processes are not selected for data display.

To select the data for display from the command line, specify the path name explicitly to either er_print or analyzer. The specified path must include the founder experiment name and the descendant experiment name inside the founder directory.

For example, here’s what you specify to see the data for the third fork of the test.1.er experiment:

er_print test.1.er/_f3.er

analyzer test.1.er/_f3.er

Alternatively, you can prepare an experiment group file with the explicit names of the descendant experiments in which you are interested.

To examine descendant processes in the Analyzer, load the founder experiment and select Filter Data from the View menu. A list of experiments is displayed with only the founder experiment checked. Uncheck it and check the descendant experiment of interest.

Note –

If the founder process exits while descendant processes are being followed, collection of data from descendants might continue. The founder experiment directory continues to grow accordingly.

`-j` `option`

Enable Java profiling when the target program is a JVM. The allowed values of option are:

on – Recognize methods compiled by the Java HotSpot virtual machine, and attempt to record Java call stacks.
off – Do not attempt to recognize methods compiled by the Java HotSpot virtual machine.
path – Record profiling data for the JVM installed in the specified path.

The -j option is not needed if you want to collect data on a .class file or a .jar file, provided that the path to the java executable is in either the JDK_HOME environment variable or the JAVA_PATH environment variable. You can then specify the target program on the collect command line as the .class file or the .jar file, with or without the extension.

If you cannot define the path to java in the JDK_HOME or JAVA_PATH environment variables, or if you want to disable the recognition of methods compiled by the Java HotSpot virtual machine, you can use the -j option. If you use this option, the program specified on the collect command line must be a Java virtual machine whose version is not earlier than 1.5_03. The collect command verifies that program is a JVM, and is an ELF executable; if it is not, the collect command prints an error message.

If you want to collect data using the 64-bit JVM, you must not use the -d64 option to java for a 32-bit JVM. If you do so, no data is collected. Instead you must specify the path to the 64-bit JVM either in the program argument to the collect command or in one of the environment variables given in this section.

`-J` `java_argument`

Specify a single argument to be passed to the JVM used for profiling. If you specify the -J option, but do not specify Java profiling, an error is generated, and no experiment is run. The argument is passed as a single argument to the JVM. If multiple arguments are needed, do not use the -J option. Instead, specify the path to the JVM explicitly, use -j on, and add the arguments for the JVM after the path to the JVM on the collect command line.

`-l` `signal`

Record a sample packet when the signal named signal is delivered to the process.

You can specify the signal by the full signal name, by the signal name without the initial letters SIG, or by the signal number. Do not use a signal that is used by the program or that would terminate execution. Suggested signals are SIGUSR1 and SIGUSR2. Signals can be delivered to a process by the kill command.

If you use both the -l and the -y options, you must use different signals for each option.

If you use this option and your program has its own signal handler, you should make sure that the signal that you specify with -l is passed on to the Collector’s signal handler, and is not intercepted or ignored.

See the signal(3HEAD) man page for more information about signals.

`-t` `duration`

Specify a time range for data collection.

The duration can be specified as a single number, with an optional m or s suffix, to indicate the time in minutes or seconds at which the experiment should be terminated. By default, the duration is in seconds. The duration can also be specified as two such numbers separated by a hyphen, which causes data collection to pause until the first time elapses, and at that time data collection begins. When the second time is reached, data collection terminates. If the second number is a zero, data will be collected after the initial pause until the end of the program's run. Even if the experiment is terminated, the target process is allowed to run to completion.
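
The duration forms described above can be modeled in shell as follows. This is a sketch for reasoning about a -t argument, not the parser that collect uses. The function names to_seconds and describe_duration are hypothetical, and only whole numbers are handled.

```shell
# Hypothetical helper: convert one duration field to seconds.
# A trailing m means minutes; a trailing s or no suffix means seconds.
to_seconds() {
    case $1 in
        *m) echo $(( ${1%m} * 60 )) ;;
        *s) echo "${1%s}" ;;
        *)  echo "$1" ;;
    esac
}

# Hypothetical helper: report when collection starts and stops, in seconds.
describe_duration() {
    case $1 in
        *-*)
            # Two numbers: pause until the first, stop at the second.
            start=$(to_seconds "${1%%-*}")
            stop=$(to_seconds "${1#*-}")
            # A second number of zero means collect until the run ends.
            if [ "$stop" -eq 0 ]; then stop="end-of-run"; fi
            ;;
        *)
            # One number: collect from the start until that time.
            start=0
            stop=$(to_seconds "$1")
            ;;
    esac
    echo "start=$start stop=$stop"
}

describe_duration 30      # prints start=0 stop=30
describe_duration 2m      # prints start=0 stop=120
describe_duration 10-60   # prints start=10 stop=60
describe_duration 5m-0    # prints start=300 stop=end-of-run
```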

`-x`

Leave the target process stopped on exit from the exec system call in order to allow a debugger to attach to it. If you attach dbx to the process, use the dbx commands ignore PROF and ignore EMT to ensure that collection signals are passed on to the collect command.

`-y` `signal`[ `,r`]

Control recording of data with the signal named signal. Whenever the signal is delivered to the process, it switches between the paused state, in which no data is recorded, and the recording state, in which data is recorded. Sample points are always recorded, regardless of the state of the switch.

The signal can be specified by the full signal name, by the signal name without the initial letters SIG, or by the signal number. Do not use a signal that is used by the program or that would terminate execution. Suggested signals are SIGUSR1 and SIGUSR2. Signals can be delivered to a process by the kill(1) command.

If you use both the -l and the -y options, you must use different signals for each option.

When the -y option is used, the Collector is started in the recording state if the optional r argument is given, otherwise it is started in the paused state. If the -y option is not used, the Collector is started in the recording state.
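
The start state and the toggling behavior can be summarized with a short shell model. The function collector_state is hypothetical and only simulates the switch. It does not send or receive any signals.

```shell
# Hypothetical model of the -y pause/record switch.
# Usage: collector_state START DELIVERIES
# START is "r" if the optional r argument was given to -y.
# DELIVERIES is the number of times the signal has been delivered.
collector_state() {
    if [ "$1" = "r" ]; then state=recording; else state=paused; fi
    n=$2
    while [ "$n" -gt 0 ]; do
        # Each delivery flips between the paused and recording states.
        if [ "$state" = "recording" ]; then
            state=paused
        else
            state=recording
        fi
        n=$(( n - 1 ))
    done
    echo "$state"
}

collector_state "" 0   # prints paused
collector_state "" 1   # prints recording
collector_state r 0    # prints recording
collector_state r 1    # prints paused
```

In this model, an experiment started without the r argument records nothing except sample points until the first signal arrives.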

If you use this option and your program has its own signal handler, make sure that the signal that you specify with -y is passed on to the Collector’s signal handler, and is not intercepted or ignored.

See the signal(3HEAD) man page for more information about signals.

Output Options

`-o` `experiment_name`

Use experiment_name as the name of the experiment to be recorded. The experiment_name string must end in the string “.er”; if not, the collect utility prints an error message and exits.

`-d` `directory-name`

Place the experiment in directory directory-name. This option only applies to individual experiments and not to experiment groups. If the directory does not exist, the collect utility prints an error message and exits. If a group is specified with the -g option, the group file is also written to directory-name.

`-g` `group-name`

Make the experiment part of experiment group group-name. If group-name does not end in .erg, the collect utility prints an error message and exits. If the group exists, the experiment is added to it. If group-name is not an absolute path, the experiment group is placed in the directory directory-name if a directory has been specified with -d, otherwise it is placed in the current directory.
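
The placement rules for the group file can be expressed as a shell sketch. The function group_file_path is hypothetical, its error text is invented, and the directory names in the examples are placeholders.

```shell
# Hypothetical model of where the experiment group file is placed.
# Usage: group_file_path GROUP_NAME [DIRECTORY_FROM_-d]
group_file_path() {
    group=$1
    dir=${2:-}
    case $group in
        *.erg) ;;
        *)  # The group name must end in .erg.
            echo "error: group name must end in .erg" >&2
            return 1 ;;
    esac
    case $group in
        /*) echo "$group" ;;   # An absolute path is used as given.
        *)  if [ -n "$dir" ]; then
                echo "$dir/$group"   # Relative name with -d: use that directory.
            else
                echo "./$group"      # Otherwise: the current directory.
            fi ;;
    esac
}

group_file_path runs.erg /tmp/exp         # prints /tmp/exp/runs.erg
group_file_path /data/runs.erg /tmp/exp   # prints /data/runs.erg
group_file_path runs.erg                  # prints ./runs.erg
```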

`-A` `option`

Control whether or not load objects used by the target process should be archived or copied into the recorded experiment. The allowed values of option are:

off– do not archive load objects into the experiment.
on– archive load objects into the experiment.
copy– copy and archive load objects into the experiment.

If you expect to copy experiments to a machine other than the one on which they were recorded, or to read the experiments from a different machine, specify -A copy. Using this option does not copy any source files or object files into the experiment. You should ensure that those files are accessible on the machine to which you are copying the experiment.

`-L` `size`

Limit the amount of profiling data recorded to size megabytes. The limit applies to the sum of the amounts of clock-based profiling data, hardware counter overflow profiling data, and synchronization wait tracing data, but not to sample points. The limit is only approximate, and can be exceeded.

When the limit is reached, no more profiling data is recorded but the experiment remains open until the target process terminates. If periodic sampling is enabled, sample points continue to be written.

The default limit on the amount of data recorded is 2000 Mbytes. This limit was chosen because the Performance Analyzer cannot process experiments that contain more than 2 Gbytes of data. To remove the limit, set size to unlimited or none.

`-O` `file`

Append all output from collect itself to the named file, but do not redirect the output from the spawned target. If file is set to /dev/null, suppress all output from collect, including any error messages.

Other Options

`-C` `comment`

Put the comment into the notes file for the experiment. You can supply up to ten -C options. The contents of the notes file are prepended to the experiment header.

`-n`

Do not run the target but print the details of the experiment that would be generated if the target were run. This option is a dry run option.

`-R`

Display the text version of the Performance Analyzer Readme in the terminal window. If the readme is not found, a warning is printed. No further arguments are examined, and no further processing is done.

`-V`

Print the current version of the collect command. No further arguments are examined, and no further processing is done.

`-v`

Print the current version of the collect command and detailed information about the experiment being run.

Collecting Data From a Running Process Using the `collect` Utility

In the Solaris OS only, the -P pid option can be used with the collect utility to attach to the process with the specified PID, and collect data from the process. The other options to the collect command are translated into a script for dbx, which is then invoked to collect the data. Only clock-based profile data (-p option) and hardware counter overflow profile data (-h option) can be collected. Tracing data is not supported.

If you use the -h option without explicitly specifying a -p option, clock-based profiling is turned off. To collect both hardware counter data and clock-based data, you must specify both a -h option and a -p option.

To Collect Data From a Running Process Using the `collect` Utility

Determine the program’s process ID (PID).

If you started the program from the command line and put it in the background, its PID will be printed to standard output by the shell. Otherwise you can determine the program’s PID by typing the following.
% ps -ef | grep program-name

Use the collect command to enable data collection on the process, and set any optional parameters.
% collect -P pid collect-options
The collector options are described in Data Collection Options. For information about clock-based profiling, see -p option. For information about hardware counter overflow profiling, see -h option.
