Collecting Data Using the collect Command
To run the Collector from the command line using the collect command, type the following.
% collect collect-options program program-arguments
Here, collect-options are the collect command options, program is the name of the program you want to collect data on, and program-arguments are the program's arguments. The target program is typically a binary executable. However, if you set the environment variable SP_COLLECTOR_SKIP_CHECKEXEC, you can specify a script as the target.
If no collect-options are given, the default is to turn on clock-based profiling with a profiling interval of approximately 10 milliseconds.
To obtain a list of options and a list of the names of any hardware counters that are available for profiling, type the collect command with no arguments.
% collect
For a description of the list of hardware counters, see Hardware Counter Overflow Profiling Data. See also Limitations on Hardware Counter Overflow Profiling.
Data Collection Options
These options control the types of data that are collected. See What Data the Collector Collects for a description of the data types.
If you do not specify data collection options, the default is -p on, which enables clock-based profiling with the default profiling interval of approximately 10 milliseconds. The default is turned off by the -h option but not by any of the other data collection options.
If you explicitly disable clock-based profiling, and do not enable tracing or hardware counter overflow profiling, the collect command prints a warning message, and collects global data only.
-p option
Collect clock-based profiling data. The allowed values of option are:
off – Turn off clock-based profiling.
on – Turn on clock-based profiling with the default profiling interval of approximately 10 milliseconds.
lo[w] – Turn on clock-based profiling with the low-resolution profiling interval of approximately 100 milliseconds.
hi[gh] – Turn on clock-based profiling with the high-resolution profiling interval of approximately 1 millisecond. See Limitations on Clock-Based Profiling for information on enabling high-resolution profiling.
[+]value – Turn on clock-based profiling and set the profiling interval to value. The default units for value are milliseconds. You can specify value as an integer or a floating-point number. The numeric value can optionally be followed by the suffix m to select millisecond units or u to select microsecond units. The value should be a multiple of the clock resolution. If it is larger but not a multiple, it is rounded down. If it is smaller, a warning message is printed and it is set to the clock resolution.
On SPARC platforms, any value can be prepended with a + sign to enable clock-based dataspace profiling, as is done for hardware counter profiling.
Collecting clock-based profiling data is the default action of the collect command.
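The rounding rule for a numeric -p value can be sketched in shell as follows. This is only an illustration of the rule described above, not the Collector's own code: the function name effective_interval is hypothetical, and both arguments are assumed to be whole numbers of microseconds.

```shell
# Hypothetical helper illustrating the -p rounding rule; not part of collect.
# Usage: effective_interval REQUESTED_US RESOLUTION_US
effective_interval() {
    req=$1
    res=$2
    if [ "$req" -lt "$res" ]; then
        # Smaller than the clock resolution: the resolution is used instead.
        echo "$res"
    else
        # Integer division drops the remainder, rounding down to a multiple.
        echo $(( req / res * res ))
    fi
}

effective_interval 2500 1000   # prints 2000
effective_interval 400 1000    # prints 1000
```

A request that is already a multiple of the resolution is returned unchanged.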
-h counter_definition_1...[,counter_definition_n]
Collect hardware counter overflow profiling data. The number of counter definitions is processor-dependent.
This option is available on systems running the Linux operating system if you have installed the perfctr patch, which you can download from http://user.it.uu.se/~mikpe/linux/perfctr/2.6/ . Instructions for installation are contained within the tar file. The user-level libperfctr.so libraries are searched for using the value of the LD_LIBRARY_PATH environment variable, then in /usr/local/lib, /usr/lib, and /lib for the 32-bit versions, or /usr/local/lib64, /usr/lib64, and /lib64 for the 64-bit versions.
To obtain a list of available counters, type collect with no arguments in a terminal window. A description of the counter list is given in the section Hardware Counter Lists. On most systems, even if a counter is not listed, you can still specify it by a numeric value, either in hexadecimal or decimal.
A counter definition can take one of the following forms, depending on whether the processor supports attributes for hardware counters.
[+]counter_name[/register_number][,interval]
[+]counter_name[~attribute_1=value_1]...[~attribute_n=value_n][/register_number][,interval]
The processor-specific counter_name can be one of the following:
An aliased counter name
A raw name
A numeric value in either decimal or hexadecimal
If you specify more than one counter, they must use different registers. If they do not use different registers, the collect command prints an error message and exits.
If the hardware counter counts events that relate to memory access, you can prefix the counter name with a + sign to turn on searching for the true program counter address (PC) of the instruction that caused the counter overflow. This backtracking works on SPARC processors, and only with counters of type load, store, or load-store. If the search is successful, the virtual PC, the physical PC, and the effective address that was referenced are stored in the event data packet.
On some processors, attribute options can be associated with a hardware counter. If a processor supports attribute options, then running the collect command with no arguments lists the counter definitions including the attribute names. You can specify attribute values in decimal or hexadecimal format.
The interval (overflow value) is the number of events or cycles counted at which the hardware counter overflows and the overflow event is recorded. The interval can be set to one of the following:
on, or a null string – The default overflow value, which you can determine by typing collect with no arguments.
hi[gh] – The high-resolution value for the chosen counter, which is approximately ten times shorter than the default overflow value. The abbreviation h is also supported for compatibility with previous software releases.
lo[w] – The low-resolution value for the chosen counter, which is approximately ten times longer than the default overflow value.
interval – A specific overflow value, which must be a positive integer and can be in decimal or hexadecimal format.
The default is the normal threshold, which is predefined for each counter and which appears in the counter list. See also Limitations on Hardware Counter Overflow Profiling.
If you use the -h option without explicitly specifying a -p option, clock-based profiling is turned off. To collect both hardware counter data and clock-based data, you must specify both a -h option and a -p option.
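As a rough sketch, a counter definition in the simpler form (without attributes) can be assembled and combined with an explicit -p option as shown here. The helper counter_def is hypothetical, the counter name cycles and the target ./a.out are placeholders, and the command line is only printed rather than run. Use the names that collect lists on your system when run with no arguments.

```shell
# Hypothetical helper: assemble one counter definition of the form
#   [+]counter_name[/register_number][,interval]
# Usage: counter_def NAME [REGISTER] [INTERVAL]
counter_def() {
    def=$1
    if [ -n "$2" ]; then
        def="$def/$2"
    fi
    if [ -n "$3" ]; then
        def="$def,$3"
    fi
    echo "$def"
}

# "cycles" is a placeholder counter name, not taken from this document.
hw=$(counter_def cycles 0 hi)

# -h on its own turns clock-based profiling off, so -p is given explicitly.
cmd="collect -p on -h $hw ./a.out"
echo "$cmd"
```

Printing the command first makes it easy to check the definition before starting a long run.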
-s option
Collect synchronization wait tracing data. The allowed values of option are:
all – Enable synchronization wait tracing with a zero threshold. This option forces all synchronization events to be recorded.
calibrate – Enable synchronization wait tracing and set the threshold value by calibration at runtime. (Equivalent to on.)
off – Disable synchronization wait tracing.
on – Enable synchronization wait tracing with the default threshold, which is to set the value by calibration at runtime. (Equivalent to calibrate.)
value – Set the threshold to value, given as a positive integer in microseconds.
Synchronization wait tracing data cannot be recorded for Java programs; specifying it is treated as an error.
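On Solaris, the following functions are traced:
[Table of traced functions not preserved in this copy.]
On Linux, the following functions are traced:
[Table of traced functions not preserved in this copy.]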
-H option
Collect heap tracing data. The allowed values of option are:
on – Turn on tracing of heap allocation and deallocation requests.
off – Turn off heap tracing.
Heap tracing is turned off by default. Heap tracing is not supported for Java programs; specifying it is treated as an error.
-M option
Specify collection of an MPI experiment. The target of the collect command must be the mpirun command, and its options must be separated from the target programs to be run by the mpirun command by a -- option. (Always use the -- option with the mpirun command so that you can collect an experiment by prepending the collect command and its option to the mpirun command line.) The experiment is named as usual and is referred to as the founder experiment; its directory contains subexperiments for each of the MPI processes, named by rank.
The allowed values of option are:
MPI-version - Turn on collection of an MPI experiment, assuming the specified MPI version, which must be one of OMPT, CT, OPENMPI, MPICH2, or MVAPICH2. Oracle Message Passing Toolkit can be specified using OMPT or CT.
off - Turn off collection of an MPI experiment.
By default, collection of an MPI experiment is turned off. When collection of an MPI experiment is turned on, the default setting for the -m option is changed to on.
The supported versions of MPI are printed when you type the collect command with no options, or if you specify an unrecognized version with the -M option.
-m option
Collect MPI tracing data. The allowed values of option are:
on – Turn on MPI tracing information.
off – Turn off MPI tracing information.
MPI tracing is turned off by default unless the -M option is enabled, in which case MPI tracing is turned on by default. Normally MPI experiments are collected with the -M option, and no user control of MPI tracing is needed. If you want to collect an MPI experiment, but not collect MPI tracing data, use the explicit options -M MPI-version -m off.
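The two MPI command lines described in the -M and -m entries can be written out as follows. This is a sketch under stated assumptions: the rank count, the mpirun -np option, and the target ./a.out are placeholders, and the commands are printed rather than run.

```shell
# Placeholders: 4 ranks and ./a.out are illustrative only.
# The -- separates mpirun's own options from the target program.
target="mpirun -np 4 -- ./a.out"

# MPI experiment; enabling -M also turns MPI tracing (-m) on by default.
with_tracing="collect -M OPENMPI $target"

# The same experiment without MPI tracing data.
without_tracing="collect -M OPENMPI -m off $target"

echo "$with_tracing"
echo "$without_tracing"
```

Because the mpirun command line already contains the -- separator, collecting an experiment only requires prepending collect and its options.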
See MPI Tracing Data for more information about the MPI functions whose calls are traced and the metrics that are computed from the tracing data.
-S option
Record sample packets periodically. The allowed values of option are:
off – Turn off periodic sampling.
on – Turn on periodic sampling with the default sampling interval of 1 second.
value – Turn on periodic sampling and set the sampling interval to value. The interval value must be positive, and is given in seconds.
By default, periodic sampling at 1 second intervals is enabled.
-c option
Record count data, for Solaris systems only.
The allowed values of option are:
on– Turn on collection of function and instruction count data. Count data and simulated count data are recorded for the executable and for any shared objects that are instrumented and that the executable statically links with, provided that those executables and shared objects were compiled with the -xbinopt=prepare option. Any other shared objects that are statically linked but not compiled with the -xbinopt=prepare option are not included in the data. Any shared objects that are dynamically opened are not included in the simulated count data.
In addition to count metrics for functions, lines, and so on, you can view a summary of the usage of various instructions in the Instruction-Frequency tab in Performance Analyzer, or with the er_print ifreq command.
off - Turn off collection of count data.
static - Generates an experiment with the assumption that every instruction in the target executable and any statically linked shared objects was executed exactly once. As with the -c on option, the -c static option requires that the executables and shared objects are compiled with the -xbinopt=prepare flag.
By default, collection of count data is turned off. Count data cannot be collected with any other type of data.
-I directory
Specify a directory for bit instrumentation. This option is available only on Solaris systems, and is meaningful only when the -c option is also specified.
-N library_name
Specify a library to be excluded from bit instrumentation, whether the library is linked into the executable or loaded with dlopen(). This option is available only on Solaris systems, and is meaningful only when the -c option is also specified. You can specify multiple -N options.
-r option
Collect data for data race detection or deadlock detection for the Thread Analyzer. The allowed values are:
race - Collect data for data race detection
deadlock - Collect deadlock and potential-deadlock data
all - Collect data for data race detection and deadlock detection
off - Turn off thread analyzer data
For more information about the collect -r command and Thread Analyzer, see the Oracle Solaris Studio 12.2: Thread Analyzer User’s Guide and the tha(1) man page.
Experiment Control Options
These options control aspects of how the experiment data is collected.
-F option
Control whether or not descendant processes should have their data recorded. The allowed values of option are:
on – Record experiments only on descendant processes that are created by functions fork, exec, and their variants.
all – Record experiments on all descendant processes.
off – Do not record experiments on descendant processes.
= regexp – Record experiments on all descendant processes whose name or lineage matches the specified regular expression.
The -F on option is set by default so that the Collector follows processes created by calls to the functions fork(2), fork1(2), fork(3F), vfork(2), and exec(2) and its variants. The call to vfork is replaced internally by a call to fork1.
For MPI experiments, descendants are also followed by default.
If you specify the -F all option, the Collector follows all descendant processes including those created by calls to system(3C), system(3F), sh(3F), posix_spawn(3p), posix_spawnp(3p), and popen(3C), and similar functions, and their associated descendant processes.
If you specify the -F '= regexp' option, the Collector follows all descendant processes. The Collector creates a subexperiment when the descendant name or subexperiment name matches the specified regular expression. See the regexp(5) man page for information about regular expressions.
When you collect data on descendant processes, the Collector opens a new experiment for each descendant process inside the founder experiment. These new experiments are named by adding an underscore, a letter, and a number to the experiment suffix, as follows:
The letter is either an “f” to indicate a fork, an “x” to indicate an exec, or “c” to indicate any other descendant process.
The number is the index of the fork or exec (whether successful or not) or other call.
For example, if the experiment name for the initial process is test.1.er, the experiment for the child process created by its third fork is test.1.er/_f3.er. If that child process execs a new image, the corresponding experiment name is test.1.er/_f3_x1.er. If that child creates another process using a popen call, the experiment name is test.1.er/_f3_x1_c1.er.
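The naming scheme can be reproduced with a few lines of shell, which is handy when scripting er_print over particular descendants. The helper descendant_name is hypothetical and only builds the path string. It does not check that the experiment exists.

```shell
# Hypothetical helper: build a descendant experiment path from the founder
# name and a lineage of steps such as f3 (third fork), x1 (first exec),
# or c1 (first other descendant).
# Usage: descendant_name FOUNDER STEP...
descendant_name() {
    founder=$1
    shift
    lineage=""
    for step in "$@"; do
        lineage="${lineage}_${step}"
    done
    echo "${founder}/${lineage}.er"
}

descendant_name test.1.er f3          # prints test.1.er/_f3.er
descendant_name test.1.er f3 x1 c1    # prints test.1.er/_f3_x1_c1.er
```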
The Analyzer and the er_print utility automatically read experiments for descendant processes when the founder experiment is read, and show descendants in the data display.
To select the data for display from the command line, specify the path name explicitly to either er_print or analyzer. The specified path must include the founder experiment name, and descendant experiment name inside the founder directory.
For example, here’s what you specify to see the data for the third fork of the test.1.er experiment:
er_print test.1.er/_f3.er
analyzer test.1.er/_f3.er
Alternatively, you can prepare an experiment group file with the explicit names of the descendant experiments in which you are interested.
To examine descendant processes in the Analyzer, load the founder experiment and select Filter Data from the View menu. A list of experiments is displayed with only the founder experiment checked. Uncheck it and check the descendant experiment of interest.
Note - If the founder process exits while descendant processes are being followed, collection of data from descendants that are still running will continue. The founder experiment directory continues to grow accordingly.
You can also collect data on scripts and follow descendant processes of scripts. See Collecting Data From Scripts for more information.
-j option
Enable Java profiling when the target program is a JVM. The allowed values of option are:
on – Recognize methods compiled by the Java HotSpot virtual machine, and attempt to record Java call stacks.
off – Do not attempt to recognize methods compiled by the Java HotSpot virtual machine.
path – Record profiling data for the JVM installed in the specified path.
The -j option is not needed if you want to collect data on a .class file or a .jar file, provided that the path to the java executable is in either the JDK_HOME environment variable or the JAVA_PATH environment variable. You can then specify the target program on the collect command line as the .class file or the .jar file, with or without the extension.
If you cannot define the path to the java executable in the JDK_HOME or JAVA_PATH environment variables, or if you want to disable the recognition of methods compiled by the Java HotSpot virtual machine, you can use the -j option. If you use this option, the program specified on the collect command line must be a Java virtual machine whose version is not earlier than JDK 6, Update 18. The collect command verifies that program is a JVM, and is an ELF executable; if it is not, the collect command prints an error message.
If you want to collect data using the 64-bit JVM, you must not use the -d64 option to the java command for a 32-bit JVM. If you do so, no data is collected. Instead you must specify the path to the 64-bit JVM either in the program argument to the collect command or in the JDK_HOME or JAVA_PATH environment variable.
-J java_argument
Specify additional arguments to be passed to the JVM used for profiling. If you specify the -J option, but do not specify Java profiling, an error is generated, and no experiment is run. The java_argument must be enclosed in quotation marks if it contains more than one argument. It must consist of a set of tokens separated by blanks or tabs. Each token is passed as a separate argument to the JVM. Most arguments to the JVM must begin with a “-” character.
-l signal
Record a sample packet when the signal named signal is delivered to the process.
You can specify the signal by the full signal name, by the signal name without the initial letters SIG, or by the signal number. Do not use a signal that is used by the program or that would terminate execution. Suggested signals are SIGUSR1 and SIGUSR2. SIGPROF can be used, even when clock-profiling is specified. Signals can be delivered to a process by the kill command.
If you use both the -l and the -y options, you must use different signals for each option.
If you use this option and your program has its own signal handler, you should make sure that the signal that you specify with -l is passed on to the Collector’s signal handler, and is not intercepted or ignored.
See the signal(3HEAD) man page for more information about signals.
-t duration
Specify a time range for data collection.
The duration can be specified as a single number, with an optional m or s suffix, to indicate the time in minutes or seconds at which the experiment should be terminated. By default, the duration is in seconds. The duration can also be specified as two such numbers separated by a hyphen, which causes data collection to pause until the first time elapses, and at that time data collection begins. When the second time is reached, data collection terminates. If the second number is a zero, data will be collected after the initial pause until the end of the program's run. Even if the experiment is terminated, the target process is allowed to run to completion.
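The duration forms described above can be illustrated with a small shell sketch that interprets a -t argument. Both helpers are hypothetical and handle whole numbers only. They show how the single-number and hyphenated forms are read and are not taken from the collect implementation.

```shell
# Hypothetical helper: convert one time value (N, Ns, or Nm) to seconds.
to_seconds() {
    case $1 in
        *m) echo $(( ${1%m} * 60 )) ;;
        *s) echo "${1%s}" ;;
        *)  echo "$1" ;;
    esac
}

# Hypothetical helper: describe a -t argument of the form END or START-END.
describe_duration() {
    case $1 in
        *-*)
            start=$(to_seconds "${1%%-*}")
            end=$(to_seconds "${1#*-}")
            if [ "$end" -eq 0 ]; then
                echo "pause ${start}s, then collect until the program ends"
            else
                echo "pause ${start}s, then collect until ${end}s"
            fi
            ;;
        *)
            echo "collect until $(to_seconds "$1")s"
            ;;
    esac
}

describe_duration 2m       # prints: collect until 120s
describe_duration 30-5m    # prints: pause 30s, then collect until 300s
describe_duration 10-0     # prints: pause 10s, then collect until the program ends
```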
-x
Leave the target process stopped on exit from the exec system call in order to allow a debugger to attach to it. If you attach dbx to the process, use the dbx commands ignore PROF and ignore EMT to ensure that collection signals are passed on to the collect command.
-y signal[,r]
Control recording of data with the signal named signal. Whenever the signal is delivered to the process, it switches between the paused state, in which no data is recorded, and the recording state, in which data is recorded. Sample points are always recorded, regardless of the state of the switch.
The signal can be specified by the full signal name, by the signal name without the initial letters SIG, or by the signal number. Do not use a signal that is used by the program or that would terminate execution. Suggested signals are SIGUSR1 and SIGUSR2. SIGPROF can be used, even when clock-profiling is specified. Signals can be delivered to a process by the kill command.
If you use both the -l and the -y options, you must use different signals for each option.
When the -y option is used, the Collector is started in the recording state if the optional r argument is given, otherwise it is started in the paused state. If the -y option is not used, the Collector is started in the recording state.
If you use this option and your program has its own signal handler, make sure that the signal that you specify with -y is passed on to the Collector’s signal handler, and is not intercepted or ignored.
See the signal(3HEAD) man page for more information about signals.
Output Options
These options control aspects of the experiment produced by the Collector.
-o experiment_name
Use experiment_name as the name of the experiment to be recorded. The experiment_name string must end in the string “.er”; if not, the collect utility prints an error message and exits.
If you do not specify the -o option, the experiment is given a name of the form stem.n.er, where stem is a string and n is a number. If you have specified a group name with the -g option, stem is set to the group name without the .erg suffix. If you have not specified a group name, stem is set to the string test.
If you invoke the collect command from one of the commands used to run MPI jobs, for example mpirun, but without the -M MPI-version option and the -o option, the value of n used in the name is taken from the environment variable that defines the MPI rank of that process. Otherwise, n is set to one greater than the highest integer currently in use.
If the name is not specified in the form stem.n.er, and the given name is in use, an error message is displayed and the experiment is not run. If the name is of the form stem.n.er and the name supplied is in use, the experiment is recorded under a name corresponding to one greater than the highest value of n that is currently in use. A warning is displayed if the name is changed.
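The default numbering rule (one greater than the highest n in use) can be sketched as follows. The helper next_experiment_name is hypothetical and only predicts a name from the directory contents. It assumes that, when no matching experiment exists, numbering starts at 1.

```shell
# Hypothetical helper: print the default name stem.n.er for a new experiment
# in directory DIR, where n is one greater than the highest n already in use.
# Usage: next_experiment_name DIR [STEM]
next_experiment_name() {
    dir=$1
    stem=${2:-test}
    highest=0
    for path in "$dir"/"$stem".*.er; do
        [ -e "$path" ] || continue      # the glob matched nothing
        n=${path##*/}                   # strip the directory
        n=${n#"$stem".}                 # strip "stem."
        n=${n%.er}                      # strip ".er"
        case $n in
            ''|*[!0-9]*) continue ;;    # skip names where n is not an integer
        esac
        if [ "$n" -gt "$highest" ]; then
            highest=$n
        fi
    done
    echo "$stem.$((highest + 1)).er"
}

# Demonstration in a scratch directory.
demo=$(mktemp -d)
mkdir "$demo/test.1.er" "$demo/test.4.er"
next_experiment_name "$demo"            # prints test.5.er
rm -rf "$demo"
```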
-d directory-name
Place the experiment in directory directory-name. This option only applies to individual experiments and not to experiment groups. If the directory does not exist, the collect utility prints an error message and exits. If a group is specified with the -g option, the group file is also written to directory-name.
For the lightest-weight data collection, it is best to record data to a local file, using the -d option to specify a directory in which to put the data. However, for MPI experiments on a cluster, the founder experiment must be available at the same path for all processes to have all data recorded into the founder experiment.
Experiments written to long-latency file systems are especially problematic, and might progress very slowly, especially if Sample data is collected (-S on option, the default). If you must record over a long-latency connection, disable Sample data.
-g group-name
Make the experiment part of experiment group group-name. If group-name does not end in .erg, the collect utility prints an error message and exits. If the group exists, the experiment is added to it. If group-name is not an absolute path, the experiment group is placed in the directory directory-name if a directory has been specified with -d, otherwise it is placed in the current directory.
-A option
Control whether or not load objects used by the target process should be archived or copied into the recorded experiment. The allowed values of option are:
off – do not archive load objects into the experiment.
on – archive load objects into the experiment.
copy – copy and archive load objects (the target and any shared objects it uses) into the experiment.
If you expect to copy experiments to a machine other than the one on which they were recorded, or to read the experiments from a different machine, specify -A copy. Using this option does not copy any source files or object (.o) files into the experiment. Ensure that those files are accessible and unchanged on the machine on which you are examining the experiment.
-L size
Limit the amount of profiling data recorded to size megabytes. The limit applies to the sum of the amounts of clock-based profiling data, hardware counter overflow profiling data, and synchronization wait tracing data, but not to sample points. The limit is only approximate, and can be exceeded.
When the limit is reached, no more profiling data is recorded but the experiment remains open until the target process terminates. If periodic sampling is enabled, sample points continue to be written.
To impose a limit of approximately 2 Gbytes, for example, specify -L 2000. The size specified must be greater than zero.
By default, there is no limit on the amount of data recorded.
-O file
Append all output from collect itself to the named file, but do not redirect the output from the spawned target. If file is set to /dev/null, suppress all output from collect, including any error messages.
Other Options
These collect command options are used for miscellaneous purposes.
-P process_id
Write a script for dbx to attach to the process with the given process_id, collect data from it, and then invoke dbx on the script. You can specify only profiling data, not tracing data, and timed runs (-t option) are not supported.
-C comment
Put the comment into the notes file for the experiment. You can supply up to ten -C options. The contents of the notes file are prepended to the experiment header.
-n
Do not run the target but print the details of the experiment that would be generated if the target were run. This option is a dry run option.
-R
Display the text version of the Performance Analyzer Readme in the terminal window. If the readme is not found, a warning is printed. No further arguments are examined, and no further processing is done.
-V
Print the current version of the collect command. No further arguments are examined, and no further processing is done.
-v
Print the current version of the collect command and detailed information about the experiment being run.
Collecting Data From a Running Process Using the collect Utility
In the Solaris OS only, the -P pid option can be used with the collect utility to attach to the process with the specified PID, and collect data from the process. The other options to the collect command are translated into a script for dbx, which is then invoked to collect the data. Only clock-based profile data (-p option) and hardware counter overflow profile data (-h option) can be collected. Tracing data is not supported.
If you use the -h option without explicitly specifying a -p option, clock-based profiling is turned off. To collect both hardware counter data and clock-based data, you must specify both a -h option and a -p option.
To Collect Data From a Running Process Using the collect Utility
If you started the program from the command line and put it in the background, its PID will be printed to standard output by the shell. Otherwise you can determine the program’s PID by typing the following.
% ps -ef | grep program-name
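Then use the collect command to enable data collection on the process, and set any optional parameters.
% collect -P pid collect-options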
The collector options are described in Data Collection Options. For information about clock-based profiling, see -p option. For information about hardware counter overflow profiling, see -h option.