NAME
collect - command used to collect program performance data
SYNOPSIS
collect collect-arguments target target-arguments
collect
collect -V
collect -R
DESCRIPTION
The collect command runs the target process and records per-
formance data and global data for the process. Performance
data is collected using profiling or tracing techniques.
The data can be examined with a GUI program (analyzer) or a
command-line program (er_print). The data collection
software run by the collect command is referred to here as
the Collector.
The data from a single run of the collect command is called
an experiment. The experiment is represented in the file
system as a directory, with various files inside that direc-
tory.
The target is the path name of the executable, Java(TM) .jar
file, or Java .class file for which you want to collect per-
formance data. (For more information about Java profiling,
see JAVA PROFILING, below.) Executables that are targets
for the collect command can be compiled with any level of
optimization, but must use dynamic linking. If a program is
statically linked, the collect command prints an error mes-
sage. In order to see annotated source using analyzer or
er_print, targets should be compiled with the -g flag, and
should not be stripped.
In order to enable dataspace profiling, executables must be
compiled with the -xhwcprof -xdebugformat=dwarf -g flags.
These flags are valid for the C, C++ and Fortran compilers,
but only on SPARC[R] platforms. See the section "DATASPACE
PROFILING", below.
The collect command uses the following strategy to find its
target:
- If there is a file with the name of the target that is
marked executable, the file is verified as an ELF execut-
able that can run on the target machine. If the file is
not such a valid ELF executable, the collect command
fails.
- If there is a file with the name of the target, and the
file is not executable, collect checks whether the file is
a Java[TM] jar file or class file. If the file is a Java
jar file or class file, the Java[TM] virtual machine (JVM)
software is inserted as the target, with any necessary
flags, and data is collected on that JVM machine. (The
terms "Java virtual machine" and "JVM" mean a virtual
machine for the Java[TM] platform.) See the section on
"JAVA PROFILING", below.
- If there is no file with the name of the target, your path
is searched to find an executable; if an executable is
found, it is verified as described above.
- If no file of the current name is found, the command looks
for a file with that name and the string .class appended;
if a file is found, the target of a JVM machine is
inserted, with the appropriate flags, as above.
- If none of these procedures can find the target, the com-
mand fails.
OPTIONS
If invoked with no arguments, print a usage summary, includ-
ing the default configuration of the experiment. If the pro-
cessor supports hardware counter overflow profiling, print
two lists containing information about hardware counters.
The first list contains "aliased" hardware counters; the
second list contains raw hardware counters. For more
details, see the "Hardware Counter Overflow Profiling" sec-
tion below.
Data Specifications
-p option
Collect clock-based profiling data. The allowed values
of option are:
Value Meaning
off Turn off clock-based profiling
on Turn on clock-based profiling with the
default profiling interval of approximately
10 milliseconds.
lo[w] Turn on clock-based profiling with the low-
resolution profiling interval of approxi-
mately 100 milliseconds.
hi[gh] Turn on clock-based profiling with the high-
resolution profiling interval of approxi-
mately 1 millisecond.
n Turn on clock-based profiling with a
profiling interval of n. The value n can be
an integer or a floating-point number, with a
suffix of u for values in microseconds, or m
for values in milliseconds. If no suffix is
used, assume the value to be in milliseconds.
If the value is smaller than the clock pro-
filing minimum, set it to the minimum; if it
is not a multiple of the clock profiling
resolution, round down to the nearest multi-
ple of the clock resolution. If it exceeds
the clock profiling maximum, report an error.
If it is negative or zero, report an error.
If invoked with no arguments, report the
clock-profiling intervals.
An optional + can be prepended to the clock-profiling
interval, specifying that collect capture dataspace
data. It will do so by backtracking one instruction,
and if that instruction is a memory instruction, it
will assume that the delay was attributed to that
instruction and record the event, including the virtual
and physical addresses of the memory reference.
Caution must be used in interpreting clock-based
dataspace data; the delay might be completely unrelated
to the memory instruction that happened to precede the
instruction with the clock-profile hit; for example, if
a memory instruction hits in the cache, but is in a
loop executed many times, high counts on that instruc-
tion might appear to indicate memory stall delays, but
they do not. This situation can be disambiguated by
examining the disassembly around the instruction indi-
cating the stall. If the surrounding instructions also
have high clock-profiling metrics, the memory delay is
likely to be spurious.
Clock-based dataspace profiling should be used only on
machines that do not support hardware counter profiling
on memory-based counters.
See the section "DATASPACE PROFILING", below.
If no explicit -p off argument is given, and neither
hardware counter overflow profiling, nor count data,
nor race-detection or deadlock data is specified, turn
on clock-based profiling.
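For example (the target name a.out here and in the examples
below is hypothetical), the following command turns on
clock-based profiling with a 5-millisecond interval; the
second form additionally requests clock-based dataspace data
by prepending + to the interval:
collect -p 5m a.out
collect -p +5m a.out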
-h ctr_def...[,ctr_n_def]
Collect hardware counter overflow profiles. The number
of counter definitions, (ctr_def through ctr_n_def) is
processor-dependent. For example, on an UltraSPARC III
system, up to two counters can be programmed; on an
Intel Pentium IV with Hyperthreading, up to 18 counters
are available. You can ascertain the maximum number of
hardware counters definitions for profiling on a target
system, and the full list of available hardware
counters, by running the collect command without any
arguments.
This option is now available on systems running the
Linux OS. You are responsible for installing the
required perfctr patch on the system; that patch can be
downloaded from:
http://user.it.uu.se/~mikpe/linux/perfctr/2.6/perfctr-2.6.15.tar.gz
Instructions for installation are contained within that
tar file. The user-level libperfctr.so libraries are
searched for using LD_LIBRARY_PATH, and then in
/usr/local/lib, /usr/lib/, and /lib/ for the 32-bit
versions, or /usr/local/lib64, /usr/lib64/, and /lib64/
for the 64-bit versions.
Each counter definition takes one of the following
forms, depending on whether attributes for hardware
counters are supported on the processor:
1. [+]ctr[/reg#][,interval]
2. [+]ctr[~attr=val]...[~attrN=valN][/reg#][,interval]
The meanings of the counter definition options are as
follows:
Value Meaning
+ Optional parameter that can be applied to
memory-related counters. Causes collect to
collect dataspace data by backtracking to
find the instruction that triggered the over-
flow, and to find the virtual and physical
addresses of the memory reference. Backtrack-
ing works on SPARC processors, and only with
counters of type load, store, or load-store,
as displayed in the counter list obtained by
running the collect command without any
command-line arguments. See the section
"DATASPACE PROFILING", below.
ctr Processor-specific counter name. You can
ascertain the list of counter names by run-
ning the collect command without any
command-line arguments. On most systems,
even if a counter is not listed, it can still
be specified by a numeric value, either in
hexadecimal (0x1234) or decimal. Drivers for
older chips do not support numeric values,
but drivers for more recent chips do.
attr=val On some processors, attribute options can be
associated with a hardware counter. If the
processor supports attribute options, then
running collect without any command-line
arguments shows the counter definition,
ctr_def, in the second form listed above, and
provides a list of attribute names to use for
attr. Value val can be in decimal or hexade-
cimal format. Hexadecimal format numbers are
in C program format where the number is
prepended by a zero and lower-case x
(0xhex_number).
reg# Hardware register to use for the counter. If
not specified, collect attempts to place the
counter into the first available register and
as a result, might be unable to place subse-
quent counters due to register conflicts. If
you specify more than one counter, the
counters must use different registers. The
list of allowable register numbers can be
ascertained by running the collect command
without any command-line arguments.
interval Sampling frequency, set by defining the
counter overflow value. Valid values are as
follows:
Value Meaning
on Select the default rate, which can
be determined by running the
collect command without any
command-line arguments. Note that
the default value for all raw
counters is the same, and might not
be the most suitable value for a
specific counter.
hi Set interval to approximately 10
times shorter than on.
lo Set interval to approximately 10
times longer than on.
value Set interval to a specific value,
specified in decimal or hexadecimal
format.
An experiment can specify both hardware counter over-
flow profiling and clock-based profiling. If hardware
counter overflow profiling is specified, but clock-
based profiling is not explicitly specified, turn off
clock-based profiling.
For more information on hardware counters, see the
"Hardware Counter Overflow Profiling" section below.
-s option
Collect synchronization tracing data.
The minimum delay threshold for tracing events is set
using option. The allowed values of option are:
Value Meaning
on Turn on synchronization delay tracing and set
the threshold value by calibration at runtime
calibrate Same as on
off Turn off synchronization delay tracing
n Turn on synchronization delay tracing with a
threshold value of n microseconds; if n is
zero, trace all events
all Turn on synchronization delay tracing and
trace all synchronization events
By default, turn off synchronization delay tracing.
Record synchronization events for Java monitors, but
not for native synchronization within the JVM machine.
On Solaris, the following functions are traced:
mutex_lock, rw_rdlock, rw_wrlock, cond_wait,
cond_timedwait, cond_reltimedwait, thr_join, sema_wait,
pthread_mutex_lock, pthread_rwlock_rdlock,
pthread_rwlock_wrlock, pthread_cond_wait,
pthread_cond_timedwait, pthread_cond_reltimedwait_np,
pthread_join, and sem_wait.
On Linux, the following functions are traced:
pthread_mutex_lock, pthread_cond_wait,
pthread_cond_timedwait, pthread_join, and sem_wait.
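For example, the following command traces synchronization
events whose delay exceeds 50 microseconds:
collect -s 50 a.out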
-H option
Collect heap trace data. The allowed values of option
are:
Value Meaning
on Turn on tracing of memory allocation requests
off Turn off tracing of memory allocation
requests
By default, turn off heap tracing.
Record heap-tracing events for any native calls. Treat
calls to mmap as memory allocations.
Heap profiling is not supported for Java programs.
Specifying it is treated as an error.
Note that heap tracing might produce very large experi-
ments. Such experiments are very slow to load and
browse.
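For example, the following command traces memory allocation
requests in a native target:
collect -H on a.out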
-M option
Specify collection of an MPI experiment. (See MPI PRO-
FILING, below.) The target of collect should be
mpirun, and its arguments should be separated from the
user target (that is, the programs that are to be run by
mpirun) by an inserted -- argument. The experiment is
named as usual, and is referred to as the "founder
experiment"; its directory contains subexperiments for
each of the MPI processes, named by rank. It is recom-
mended that the -- argument always be used with mpirun,
so that an experiment can be collected by prepending
collect and its options to the mpirun command line.
The allowed values of option are:
Value Meaning
MPI-version
Turn on collection of an MPI experiment,
assuming the MPI version named
off Turn off collection of an MPI experiment
By default, turn off collection of an MPI experiment.
When an MPI experiment is turned on, the default set-
ting for -m (see below) is changed to on.
The recognized versions of MPI are printed when you
type collect with no arguments, or in response to an
unrecognized version specified with -M.
-m option
Collect MPI tracing data. (See MPI PROFILING, below.)
The allowed values of option are:
Value Meaning
on Turn on MPI tracing information
off Turn off MPI tracing information
By default, turn off MPI tracing, except if the -M flag
is enabled, in which case MPI tracing is turned on by
default. Normally, MPI experiments are collected with
-M, and no user control of MPI tracing is needed. If
you want to collect an MPI experiment, but not collect
MPI trace data, you can use the explicit flags:
-M MPI-version -m off.
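For example, assuming the Oracle Message Passing Toolkit
version of MPI (named OMPT, as in the example under MPI
PROFILING, below), such an experiment could be collected
with:
collect -M OMPT -m off mpirun -np 8 -- a.out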
-c option
Collect count data, using bit(1) instrumentation. This
option is available only on Solaris systems. The
allowed values of option are:
Value Meaning
on Turn on count data
static Turn on simulated count data, based on the
assumption that every instruction was exe-
cuted exactly once.
off Turn off count data
By default, turn off count data. Count data cannot be
collected with any other type of data. For count data
or simulated count data, the executable and any
shared-objects that are instrumented and statically
linked are counted; for count data, but not simulated
count data, dynamically loaded shared objects are also
instrumented and counted.
In order to collect count data, the executable must be
compiled with the -xbinopt=prepare flag.
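For example, a C target could be prepared and counted as
follows:
cc -xbinopt=prepare -o a.out a.c
collect -c on a.out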
-I directory
Specify a directory for bit(1) instrumentation. This
option is available only on Solaris systems, and is
meaningful only when -c is specified.
-N libname
Specify a library to be excluded from bit(1)
instrumentation, whether the library is linked into the
executable, or loaded with dlopen. This option is
available only on Solaris systems, and is meaningful
only when -c is also specified. Multiple -N options
can be specified.
-r option
Collect thread-analyzer data.
The allowed values of option are:
Value Meaning
on Turn on thread analyzer data-race-detection
data
all Turn on all thread analyzer data
off Turn off thread analyzer data
dt1,...,dtN
Turn on specific thread analyzer data types,
as named by the dt* parameters.
The specific types of thread analyzer data
that can be requested are:
Value Meaning
race Collect datarace data
deadlock Collect deadlock and potential-
deadlock data
By default, turn off all thread-analyzer data.
Thread Analyzer data cannot be collected with any trac-
ing data, but can be collected in conjunction with
clock- or hardware counter profiling data. Thread
Analyzer data significantly slows down the execution of
the target, and profiles might not be meaningful as
applied to the user code.
Thread Analyzer experiments can be examined with either
analyzer or with tha. The latter displays a simplified
list of default tabs, but is otherwise identical.
In order to enable data-race detection, executables
must be instrumented, either at compile time, or by
invoking a postprocessor. If the target is not instru-
mented, and none of the shared objects on its library
list is instrumented, a warning is displayed, but the
experiment is run. Other Thread Analyzer data do not
require instrumentation.
See the tha(1) man page for more detail.
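For example, the following commands collect only data-race-
detection data and then examine the result (assuming the
default experiment name test.1.er):
collect -r race a.out
tha test.1.er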
-S interval
Collect periodic samples at the interval specified (in
seconds). Record data samples from the process, and
include a timestamp and execution statistics from the
kernel, among other things. The allowed values of
interval are:
Value Meaning
off Turn off periodic sampling
on Turn on periodic sampling with the default
sampling interval (1 second)
n Turn on periodic sampling with a sampling
interval of n in seconds; n must be positive.
By default, turn on periodic sampling.
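For example, the following command records a sample every 10
seconds:
collect -S 10 a.out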
If no data specification arguments are supplied,
collect clock-based profiling data, using the default
resolution.
If clock-based profiling is explicitly disabled, and
neither hardware counter overflow profiling nor any
kind of tracing is enabled, display a warning that no
function-level data is being collected, then execute
the target and record global data.
Experiment Controls
-L size
Limit the amount of profiling and tracing data recorded
to size megabytes. The limit applies to the sum of all
profiling data and tracing data, but not to sample
points. The limit is only approximate, and can be
exceeded. When the limit is reached, stop profiling
and tracing data, but keep the experiment open and
record samples until the target process terminates.
The allowed values of size are:
Value Meaning
unlimited or none
Do not impose a size limit on the experiment
n Impose a limit of n MB; n must be greater
than zero.
By default, there is no limit on the amount of data
recorded.
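For example, the following command limits the profiling and
tracing data to approximately 500 MB:
collect -L 500 a.out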
-F option
Control whether or not descendant processes should have
their data recorded. The allowed values of option are:
Value Meaning
on Record experiments on descendant processes
from fork and exec
all Record experiments on all descendant
processes
off Do not record experiments on descendant
processes
=<regex> Record experiments on all descendant
processes whose executable name (a.out name)
or lineage match the regular expression.
By default, record descendant processes from fork and
exec. For more details, read the sections "FOLLOWING
DESCENDANT PROCESSES", and "PROFILING SCRIPTS" below.
-A option
Control whether or not load objects used by the target
process should be archived or copied into the recorded
experiment. The allowed values of option are:
Value Meaning
on Archive load objects into the experiment.
off Do not archive load objects into the experi-
ment.
copy Copy and archive load objects (the target and
any shared objects it uses) into the experi-
ment.
If you copy experiments onto a different machine, or
read the experiments from a different machine, specify
-A copy. Doing so will consume more disk space but
allow the experiment to be read on other machines. For
Java experiments, all .jar files are also copied into
the experiment.
Note that -A copy does not copy any sources or object
files (.o's); it is your responsibility to ensure that
those files are accessible from the machine where the
experiment is being examined.
The default setting for -A is on, except for datarace
detection and deadlock experiments, where the default
setting is copy.
-j option
Control Java profiling when the target is a JVM
machine. The allowed values of option are:
Value Meaning
on Record profiling data for the JVM machine,
and recognize methods compiled by the Java
HotSpot[TM] virtual machine, and also record
Java callstacks.
off Do not record Java profiling data.
<path> Record profiling data for the JVM, and use
the JVM as installed in <path>.
See the section "JAVA PROFILING", below.
You must use -j on to obtain profiling data if the tar-
get is a JVM machine. The -j on option is not needed
if the target is a class or jar file. If you are using a
64-bit JVM machine, you must specify its path expli-
citly as the target; do not use the -d64 option of a
32-bit JVM machine. If the -j on option is specified,
but the target is not a JVM machine, an invalid argu-
ment might be passed to the target, and no data would
be recorded. The collect command validates the version
of the JVM machine specified for Java profiling.
-J java_arg
Specify additional arguments to be passed to the JVM
used for profiling. If -J is specified, but Java pro-
filing is not specified, an error is generated, and no
experiment is run. The java_arg must be surrounded by
quotes if it contains more than one argument. It con-
sists of a set of tokens, separated by either a blank
or a tab; each token is passed as a separate argument
to the JVM. Note that most arguments to the JVM must
begin with a "-" character.
-l signal
Record a sample point whenever the given signal is
delivered to the process.
-y signal[,r]
Control recording of data with signal. Whenever the
given signal is delivered to the process, switch
between paused (no data is recorded) and resumed (data
is recorded) states. Start in the resumed state if the
optional ,r flag is given, otherwise start in the
paused state. This option does not affect the record-
ing of sample points.
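For example, the following command (the choice and spelling
of the signal names are illustrative assumptions; pick sig-
nals the target does not use) records a sample point on each
SIGUSR1, and toggles recording with SIGUSR2, starting in the
resumed state:
collect -l SIGUSR1 -y SIGUSR2,r a.out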
Output Controls
-o experiment_name
Use experiment_name as the name of the experiment to be
recorded. The experiment_name must end in the string
.er; if not, print an error message and do not run the
experiment.
If -o is not specified, give the experiment a name of
the form stem.n.er, where stem is a string, and n is a
number. If a group name has been specified with -g, set
stem to the group name without the .erg suffix. If no
group name has been specified, set stem to the string
"test".
If invoked from one of the commands used to run MPI
jobs, for example, mpirun, but without -M MPI-version,
and -o is not specified, take the value of n used in
the name from the environment variable used to define
the MPI rank of that process. Otherwise, set n to one
greater than the highest integer currently in use.
(See MPI PROFILING, below.)
If the name is not specified in the form stem.n.er, and
the given name is in use, print an error message and do
not run the experiment. If the name is of the form
stem.n.er and the name supplied is in use, record the
experiment under a name corresponding to one greater
than the highest value of n that is currently in use.
Print a warning if the name is changed.
-d directory_name
Place the experiment in directory directory_name. If
no directory is given, place the experiment in the
current working directory. If a group is specified
(see -g, below), the group file is also written to the
directory named by -d.
For the lightest-weight data collection, it is best to
record data to a local file, with -d used to specify a
directory in which to put the data. However, for MPI
experiments on a cluster, the founder experiment must
be available at the same path to all processes to have
all data recorded into the founder experiment.
Experiments written to long-latency file systems are
especially problematic, and might progress very slowly,
especially if Sample data is collected (-S on, the
default). If you must record over a long-latency con-
nection, disable Sample data.
-g group_name
Add the experiment to the experiment group group_name.
The group_name string must end in the string .erg; if
not, report an error and do not run the experiment.
The first line of a group file must contain the string
#analyzer experiment group
and each subsequent line is the name of an experiment.
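For example, the following command (names hypothetical)
writes the experiment run.1.er into /var/tmp and adds it to
the group suite.erg:
collect -o run.1.er -d /var/tmp -g suite.erg a.out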
-O file
Append all output from collect itself to the named
file, but do not redirect the output from the spawned
target. If file is set to /dev/null, suppress all out-
put from collect, including any error messages.
-t duration
Collect data for the specified duration. duration can
be a single number, followed by either m, specifying
minutes, or s, specifying seconds (default), or two
such numbers separated by a - sign. If one number is
given, data is collected from the start of the run
until the given time; if two numbers are given, data is
collected from the first time to the second. If the
second time is zero, data is collected until the end of
the run. If two non-zero numbers are given, the first
must be less than the second.
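For example, the following command collects data only
between 30 and 90 seconds into the run:
collect -t 30-90 a.out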
Other Arguments
-P <pid>
Write a script for dbx to attach to the process with
the given PID, and collect data from it, and then
invoke dbx with that script. Only profiling data, not
tracing data can be specified, and timed runs (-t) are
not supported.
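For example, the following command (the PID is hypothetical)
attaches to process 12345 and collects clock-based profiling
data through dbx:
collect -P 12345 -p on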
-C comment
Put the comment into the notes file for the experiment.
Up to ten -C arguments can be supplied.
-n Dry run: do not run the target, but print all the
details of the experiment that would be run. Turn on
-v.
-R Display the text version of the performance tools
README in the terminal window. If the README is not
found, print a warning. Do not examine further argu-
ments and do no further processing.
-V Print the current version. Do not examine further
arguments and do no further processing.
-v Print the current version and further detailed informa-
tion about the experiment being run.
-x Leave the target process stopped on the exit from the
exec system call, in order to allow a debugger to
attach to it. The collect command prints a message
with the process PID.
To attach a debugger to the target once it is stopped
by collect, you must follow the procedure below.
- Obtain the PID of the process from the message
printed by the collect -x command
- Start the debugger
- Configure the debugger to ignore SIGPROF and, if you
chose to collect hardware counter data, SIGEMT on
Solaris or SIGIO on Linux
- Attach to the process using the PID.
As the process runs under the control of the debugger,
the Collector records an experiment.
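For example, the following command leaves the target stopped
and prints the PID to which a debugger, configured as
described above, can then attach:
collect -x -p on a.out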
FOLLOWING DESCENDANT PROCESSES
Data from the initial process spawned by collect, called the
founder process, is always collected. Processes can create
descendant processes by calling system library functions,
including the variants of fork, exec, system, and so on. If a -F
argument is used, the collector can collect data for descen-
dant processes, and it opens a new experiment for each des-
cendant process inside the parent experiment. These new
experiments are named with their lineage as follows:
- An underscore is appended to the creator's experiment
name.
- A code letter is added: either "f" for a fork, or "x" for
an exec, or "c" for other descendants.
- A number is added after the code letter, which is the
index of the fork or exec. The assignment of this number
is applied whether the process was started successfully or
not.
- The experiment suffix, ".er" is appended to the lineage.
For example, if the experiment name for the initial process
is "test.1.er", the experiment for the descendant process
created by its third fork is "test.1.er/_f3.er". If that
descendant process execs a new image, the corresponding
experiment name is "test.1.er/_f3_x1.er".
If the default, -F on, is used, descendant processes ini-
tiated by calls to fork(2), fork1(2), fork(3F), vfork(2),
and exec(2) and its variants are followed. The call to
vfork is replaced internally by a call to fork1. Descen-
dants created by calls to system(3C), system(3F), sh(3F),
popen(3C), and similar functions, and their associated des-
cendant processes, are not followed.
If the -F all argument is used, all descendants are fol-
lowed, including those from system(3C), system(3F), sh(3F),
popen(3C), and similar functions.
If the -F =<regex> argument is used, all descendants whose
name or lineage match the regular expression are followed.
When matching lineage, the ".er" should be omitted. When
matching names, both the command, and its arguments are part
of the expression.
For example, to capture data on the descendant process of
the first exec from the first fork from the first call to
system in the founder, use:
collect -F '=_c1_f1_x1'
To capture data on all the variants of exec, but not fork,
use:
collect -F '=.*_x[0-9]*'
To capture data from a call to system("echo hello")
but not system("goodbye"), use:
collect -F '=echo hello'
The Analyzer and er_print automatically read experiments for
descendant processes when the founder experiment is read,
and the experiments for the descendant processes are
selected for data display.
To specifically select the data for display from the command
line, specify the path name explicitly to either er_print or
Analyzer. The specified path must include the founder exper-
iment name, and the descendant experiment's name inside the
founder directory.
For example, to see the data for the third fork of the
test.1.er experiment:
er_print test.1.er/_f3.er
analyzer test.1.er/_f3.er
You can prepare an experiment group file with the explicit
names of descendant experiments of interest.
To examine descendant processes in the Analyzer, load the
founder experiment and choose View > Filter data. The
Analyzer displays a list of experiments with only the
founder experiment checked. Uncheck the founder experiment
and check the descendant experiment of interest.
PROFILING SCRIPTS
An experimental feature for profiling scripts has been
implemented. The implementation may change in a subsequent
release.
Normally, collect requires that its target be an ELF execut-
able. To profile a script, set the environment variable
SP_COLLECTOR_SKIP_CHECKEXEC, and the checking for an ELF
executable will be disabled. By default, data will be
collected on the program launched to execute the script, and
on all descendant processes. To collect data only on a
specific process, use the -F flag to specify the name of the
executable to follow.
For example, to profile the script foo.sh, but collect data
primarily from the executable bar, use the commands:
setenv SP_COLLECTOR_SKIP_CHECKEXEC #(csh)
collect -F =bar foo.sh
Data will be collected on the founder process launched to
execute the script, and all bar processes spawned from the
script, but not for other processes.
JAVA PROFILING
Java profiling consists of collecting a performance experi-
ment on the JVM machine as it runs your .class or .jar
files. If possible, callstacks are collected in both the
Java model and in the machine model.
Data can be shown with view mode set to User, Expert, or
Machine. User mode shows each method by name, with data for
interpreted and HotSpot-compiled methods aggregated
together; it also suppresses data for non-user-Java threads.
Expert mode separates HotSpot-compiled methods from inter-
preted methods, and does not suppress non-user Java threads.
Machine mode shows data for interpreted Java methods against
the JVM machine as it does the interpreting, while data for
methods compiled with the Java HotSpot virtual machine is
reported for named methods. All threads are shown. In all
three modes, data is reported in the usual way for any non-
OpenMP C, C++, or Fortran code called by a Java target.
Such code corresponds to Java native methods. The Analyzer
and the er_print utility can switch between the view mode
User, view mode Expert, and view mode Machine, with User
being the default.
Clock-based profiling and hardware counter overflow profil-
ing are supported. Synchronization tracing collects data
only on the Java monitor calls, and synchronization calls
from native code; it does not collect data about internal
synchronization calls within the JVM.
Heap tracing is not supported for Java, and generates an
error if specified.
When collect inserts a target name of java into the argument
list, it examines environment variables for a path to the
java target, in the order JDK_HOME, and then JAVA_PATH. For
the first of these environment variables that is set, the
resultant target is verified as an ELF executable. If it is
not, collect fails with an error indicating which environ-
ment variable was used, and the full path name that was
tried.
If neither of those environment variables is set, the
collect command uses the version set by your PATH. If
there is no java in your PATH, a system default of
/usr/java/bin/java is tried.
Java profiling requires Java[TM] 2 SDK (JDK) 5, Update 19 or
later, or Java[TM] 2 SDK (JDK) 6, Update 18 or later.
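For example, either of the following commands (app.jar is
hypothetical) profiles a Java target; the first names the
jar file directly, while the second names the JVM and there-
fore requires -j on:
collect app.jar
collect -j on java -jar app.jar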
JAVA PROFILING WITH A DLOPEN'd LIBJVM.SO
Some applications are not pure Java, but are C or C++ appli-
cations that invoke dlopen to load libjvm.so, and then start
the JVM by calling into it. To profile such applications,
set the environment variable SP_COLLECTOR_USE_JAVA_OPTIONS,
and add -j on to the collect command line. Do not set
LD_LIBRARY_PATH for this scenario.
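For example, for a hypothetical native launcher that dlopens
libjvm.so, the csh commands would be:
setenv SP_COLLECTOR_USE_JAVA_OPTIONS #(csh)
collect -j on launcher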
SHARED OBJECT HANDLING
Normally, the collect command causes data to be collected
for all shared objects in the address space of the target,
whether on the initial library list, or explicitly dlopen'd.
However, there are some circumstances under which some
shared objects are not profiled.
One such scenario is when the target program is built with
lazy loading. In such cases, the library is not loaded at
startup time, and is not loaded by explicitly calling dlo-
pen, so the shared object name is not included in the exper-
iment, and all PCs from it are mapped to the <Unknown> func-
tion. The workaround is to set LD_BIND_NOW, to force the
library to be loaded at startup time.
Another such scenario is when the executable is built with
the -B direct option. In that case the object is dynamically loaded
by a call specifically to the dynamic linker entry point of
dlopen, and the libcollector interposition is bypassed. The
shared object name is not included in the experiment, and
all PCs from it are mapped to the <Unknown> function. The
workaround is to not use -B direct.
OPENMP PROFILING
Data collection for OpenMP programs collects data that can
be displayed in any of the three view modes, just as for
Java programs. The presentation is identical for user mode
and expert mode. Slave threads are shown as if they were
really forked from the master thread, and have call stacks
matching the master thread. Frames in the call stack coming
from the OpenMP runtime code (libmtsk.so) are suppressed.
For machine mode, the actual native stacks are shown.
In user mode, various artificial functions are introduced as
the leaf function of a call stack whenever the runtime
library is in one of several states. These functions are
<OMP-overhead>, <OMP-idle>, <OMP-reduction>, <OMP-
implicit_barrier>, <OMP-explicit_barrier>, <OMP-lock_wait>,
<OMP-critical_section_wait>, and <OMP-ordered_section_wait>.
Two additional clock-profiling metrics are added to the data
for clock-profiling experiments:
OpenMP Work
OpenMP Wait
OpenMP Work is counted when the OpenMP runtime thinks the
code is doing work. It includes time when the process is
consuming User-CPU time, but it also can include time when
the process is consuming System-CPU time, waiting for page
faults, waiting for the CPU, and so on. Hence, OpenMP Work can
exceed User-CPU time. OpenMP Wait is accumulated when the
OpenMP runtime thinks the process is waiting. It can include
User-CPU time for busy-waits (spin-waits), but it also
includes Other-Wait time for sleep-waits.
The inclusive metrics are visible by default; the exclusive
metrics are not. The sum of those two metrics equals the
Total LWP Time metric. These metrics are added for all
clock- and hardware counter profiling experiments.
Collecting information for every fork in the execution of
the program can be very expensive. You can suppress that
cost by setting the environment variable
SP_COLLECTOR_NO_OMP. If you do so, the program will have
substantially less dilation, but you will not see the data
from slave threads propagate up to the caller, and eventu-
ally to main(), as it normally would without that variable
being set.
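For example, following the csh style of the script example
above, the cost can be suppressed with:
setenv SP_COLLECTOR_NO_OMP #(csh)
collect -p on a.out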
A new collector for OpenMP 3.0 is enabled by default in this
release. It can profile programs that use explicit tasking.
Programs built with earlier compilers can be profiled with
the new collector only if a patched version of libmtsk.so is
available. If it is not installed, you can switch data col-
lection to use the old collector by setting the environment
variable SP_COLLECTOR_OLDOMP.
Note that the OpenMP profiling functionality is only avail-
able for applications compiled with the Oracle Solaris Stu-
dio compilers, since it depends on the Oracle Solaris Studio
compiler runtime. GNU-compiled code will only see machine-
level callstacks.
DATASPACE PROFILING
A dataspace profile is a data collection in which memory-
related events, such as cache misses, are reported against
the data object references that cause the events rather than
just the instructions where the memory-related events occur.
Dataspace profiling is not available on systems running the
Linux OS, nor on x86 based systems running the Solaris OS.
To allow dataspace profiling, the target can be written in
C, C++ or Fortran, and must be compiled for SPARC architec-
ture, with the -xhwcprof -xdebugformat=dwarf -g flags, as
described above. Furthermore, the data collected must be
hardware counter profiles and the optional + must be
prepended to the counter name. If the optional + is
prepended to one memory-related counter, but not all, the
counters without the + report dataspace data against the
<Unknown> data object, with subtype (Dataspace data not
requested during data collection).
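For example, assuming a SPARC system whose counter list
includes the memory-related alias dcrm (as in the sample
output under "Hardware Counter Overflow Profiling", below),
a dataspace profile could be collected as follows:
cc -xhwcprof -xdebugformat=dwarf -g -o a.out a.c
collect -h +dcrm,on a.out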
With the data collected, the er_print utility allows three
additional commands: data_objects, data_single, and
data_layout, as well as various commands relating to Memory
Objects. See the er_print(1) man page for more information.
In addition, the Analyzer now includes two tabs related to
dataspace profiling, labeled DataObjects and DataLayout, as
well as a set of tabs relating to Memory Objects. See the
analyzer(1) man page for more information.
Clock-based dataspace profiling should only be used on
machines that do not support hardware counter profiling with
memory-based counters. It requires the same compilation
flags as for hardware counter profiling. Data should be
interpreted with care, as explained above.
MPI PROFILING
The collect command can be used for MPI profiling to manage
collection of the data from the constituent MPI processes,
collect MPI trace data, and organize the data into a single
"founder" experiment, with "subexperiments" for each MPI
process.
The collect command can be used with MPI by simply prefacing
the command that starts the MPI job and its arguments with
the desired collect command and its arguments (assuming you
have inserted the -- argument to indicate the end of the
mpirun arguments). For example, on an SMP machine,
% mpirun -np 16 -- a.out 3 5
can be replaced by
% collect -M OMPT mpirun -np 16 -- a.out 3 5
This command runs an MPI tracing experiment on each of the
16 MPI processes, collecting them all in an MPI experiment,
named by the usual conventions for naming experiments. It
assumes use of the Oracle Message Passing Toolkit (previ-
ously known as Sun HPC ClusterTools) version of MPI.
The initial collect process reformats the mpirun command to
specify running collect with appropriate arguments on each
of the individual MPI processes.
Note that the -- argument immediately before the target name
is required for MPI profiling (although it is optional for
mpirun itself), so that collect can separate the mpirun
arguments from the target and its arguments. If it is not
supplied, collect prints an error message, and no experiment
is run.
Furthermore, a -x PATH argument is added to the mpirun argu-
ments by collect, so that the remote collect commands can find
their targets. If any environment variables in your
environment begin with "VT_" or with "SP_COLLECTOR_", they
are passed to the remote collect with -x flags for each.
MIMD MPI runs are supported, with a similar proviso that
there must be a "--" argument after each ":" (indicating a
new target and local mpirun arguments for it). If it is not
supplied, collect prints an error message, and no experiment
is run.
Some versions of Oracle Message Passing Toolkit or Sun HPC
ClusterTools have functionality for MPI State profiling.
When clock-profiling data is collected on an MPI experiment
run with such a version of MPI, two additional metrics can
be shown:
MPI Work
MPI Wait
MPI Work accumulates when the process is inside the MPI run-
time doing work, such as processing requests or messages;
MPI Wait accumulates when the process is inside the MPI run-
time, but waiting for an event, buffer, or message.
In the Analyzer, when MPI trace data is collected, two addi-
tional tabs are shown, MPI Timeline and MPI Chart.
The technique of using mpirun to spawn explicit collect com-
mands on the MPI processes is no longer supported to collect
MPI trace data, and should not be used. It can still be
used for all other types of data.
MPI profiling is based on the open source VampirTrace 5.5.3
release. It recognizes several VampirTrace environment
variables, and a new one, VT_STACKS, which controls whether
or not callstacks are recorded in the data. For further
information on the meaning of these variables, see the Vam-
pirTrace 5.5.3 documentation.
The default values of the environment variables
VT_BUFFER_SIZE and VT_MAX_FLUSHES limit the internal buffer
of the MPI API trace collector to 64 MB and the number of
times that the buffer is flushed to 1, respectively. Events
that are to be recorded after the limit has been reached are
no longer written into the trace file. The environment vari-
ables apply to every process of a parallel application,
meaning that applications with n processes will typically
create trace files n times the size of a serial application.
To remove the limit and get a complete trace of an applica-
tion, set VT_MAX_FLUSHES to 0. This setting causes the MPI
API trace collector to flush the buffer to disk whenever the
buffer is full. To change the size of the buffer, use the
environment variable VT_BUFFER_SIZE. The optimal value for
this variable depends on the application which is to be
traced. Setting a small value will increase the memory
available to the application but will trigger frequent
buffer flushes by the MPI API trace collector. These buffer
flushes can significantly change the behavior of the appli-
cation. On the other hand, setting a large value, like 2G,
will minimize buffer flushes by the MPI API trace collector,
but decrease the memory available to the application. If not
enough memory is available to hold the buffer and the appli-
cation data this might cause parts of the application to be
swapped to disk leading also to a significant change in the
behavior of the application.
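For example, the following csh settings (the buffer size is
only an illustration) remove the flush limit and enlarge the
trace buffer before collecting an MPI experiment:
setenv VT_MAX_FLUSHES 0
setenv VT_BUFFER_SIZE 256M
collect -M OMPT mpirun -np 16 -- a.out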
Another important variable is VT_VERBOSE, which turns on
various error and status messages, and setting it to 2 or
higher is recommended if problems arise.
Normally, MPI trace output data is post-processed when the
mpirun target exits; a processed data file is written to the
experiment, and information about the post-processing time
is written into the experiment header. MPI postprocessing
is not done if MPI tracing is explicitly disabled.
In the event of a failure in post-processing, an error is
reported, and no MPI Tabs or MPI tracing metrics will be
available.
If the mpirun target does not actually invoke MPI, an exper-
iment will still be recorded, but no MPI trace data will be
produced. The experiment will report an MPI post-processing
error, and no MPI Tabs or MPI tracing metrics will be avail-
able.
If the environment variable VT_UNIFY is set to "0", the
post-processing routines, er_vtunify and er_mpipp will not
be run by collect. They will be run the first time either
er_print or analyzer is invoked on the experiment.
USING COLLECT WITH PPGSZ
The collect command can be used with ppgsz by running the
collect command on the ppgsz command, and specifying the -F
on flag. The founder experiment is on the ppgsz executable
and is uninteresting. If your path finds the 32-bit version
of ppgsz, and the experiment is being run on a system that
supports 64-bit processes, the first thing the collect com-
mand does is execute an exec function on its 64-bit version,
creating _x1.er. That executable forks, creating _x1_f1.er.
The descendant process attempts to execute an exec function
on the named target, in the first directory on your path,
then in the second, and so forth, until one of the exec
functions succeeds. If, for example, the third attempt
succeeds, the first two descendant experiments are named
_x1_f1_x1.er and _x1_f1_x2.er, and both are completely
empty. The experiment on the target is the one from the
successful exec, the third one in the example, and is named
_x1_f1_x3.er, stored under the founder experiment. It can
be processed directly by invoking the Analyzer or the
er_print utility on test.1.er/_x1_f1_x3.er.
If the 64-bit ppgsz is the initial process run, or if the
32-bit ppgsz is invoked on a 32-bit kernel, the fork descen-
dant that executes exec on the real target has its data in
_f1.er, and the real target's experiment is in _f1_x3.er,
assuming the same path properties as in the example above.
See the section "FOLLOWING DESCENDANT PROCESSES", above.
USING COLLECT ON SETUID/SETGID TARGETS
The collect command operates by inserting a shared library,
libcollector.so, into the target's address space
(LD_PRELOAD), and by using a second shared library,
collaudit.so, to record shared-object use with the runtime
linker's audit interface (LD_AUDIT). Those two shared
libraries write the files that constitute the experiment.
Several problems might arise if collect is invoked on exe-
cutables that call setuid or setgid, or that create descen-
dant processes that call setuid or setgid. If the user run-
ning the experiment is not root, collection fails because
the shared libraries are not installed in a trusted direc-
tory. The workaround is to run the experiments as root, or
use crle(1) to grant permission. Users should, of course,
take great care when circumventing security barriers, and do
so at their own risk.
In addition, the umask for the user running the collect com-
mand must be set to allow write permission for that user,
and for any users or groups that are set by the
setuid/setgid attributes of a program being exec'd and for
any user or group to which that program sets itself. If the
mask is not set properly, some files might not be written to
the experiment, and processing of the experiment might not
be possible. If the log file can be written, an error is
shown when the user attempts to process the experiment.
Other problems can arise if the target itself makes any of
the system calls to set UID or GID, or if it changes its
umask and then forks or runs exec on some other process, or
if crle was used to configure how the runtime linker searches
for shared objects.
If an experiment is started as root on a target that changes
its effective GID, the er_archive process that is automati-
cally run when the experiment terminates fails, because it
needs a shared library that is not marked as trusted. In
that case, you can run er_archive (or er_print or Analyzer)
explicitly by hand, on the machine on which the experiment
was recorded, immediately following the termination of the
experiment.
DATA COLLECTED
Three types of data are collected: profiling data, tracing
data and sampling data. The data packets recorded in profil-
ing and tracing include the callstack of each LWP, the LWP,
thread, and CPU IDs, and some event-specific data. The data
packets recorded in sampling contain global data such as
execution statistics, but no program-specific or event-
specific data. All data packets include a timestamp.
Clock-based Profiling
The event-specific data recorded in clock-based profil-
ing is an array of counts for each accounting micro-
state. The microstate array is incremented by the sys-
tem at a prescribed frequency, and is recorded by the
Collector when a profiling signal is processed.
Clock-based profiling can run at a range of frequencies
which must be multiples of the clock resolution used
for the profiling timer. If you try to do high-
resolution profiling on a machine with an operating
system that does not support it, the command prints a
warning message and uses the highest resolution sup-
ported. Similarly, a custom setting that is not a mul-
tiple of the resolution supported by the system is
rounded down to the nearest non-zero multiple of that
resolution, and a warning message is printed.
Clock-based profiling data is converted into the fol-
lowing metrics:
User CPU Time
Wall Time
Total LWP Time
System CPU Time
Wait CPU Time
User Lock Time
Text Page Fault Time
Data Page Fault Time
Other Wait Time
For experiments on multithreaded applications, all of
the times, other than Wall Time, are summed across all
LWPs in the process; Wall Time is the time spent in
all states for LWP 1 only. Total LWP Time adds up to
the real elapsed time, multiplied by the average number
of LWPs in the process.
If clock-based profiling is performed on an OpenMP pro-
gram, two additional metrics:
OpenMP Work
OpenMP Wait
are provided. On the Solaris OS, OpenMP Work accumu-
lates when work is being done in parallel. OpenMP Wait
accumulates when the OpenMP runtime is waiting for syn-
chronization, and accumulates whether the wait is using
CPU time or sleeping, or when work is being done in
parallel, but the thread is not scheduled on a CPU.
On Linux, OpenMP Work and OpenMP Wait are accumulated
only when the process is active in either user or sys-
tem mode. Unless you have specified that OpenMP should
do a busy wait, OpenMP Wait on Linux will not be use-
ful.
If clock-based profiling is performed on an MPI pro-
gram, run under Oracle Message Passing Toolkit or Sun
HPC ClusterTools release 8.1 or later, two additional
metrics:
MPI Work
MPI Wait
are provided. On Solaris, MPI Work accumulates when the
MPI runtime is active. MPI Wait accumulates when the
MPI runtime is waiting for the send or receive of a
message, or when the MPI runtime is active, but the
thread is not running on a CPU.
On Linux, MPI Work and MPI Wait are accumulated only
when the process is active in either user or system
mode. Unless you have specified that MPI should do a
busy wait, MPI Wait on Linux will not be useful. If
clock-based dataspace profiling is specified, an addi-
tional metric:
Max. Mem Stalls
is provided.
Hardware Counter Overflow Profiling
Hardware counter overflow profiling records the number
of events counted by the hardware counter at the time
the overflow signal was processed. This type of profil-
ing is now available on systems running the Linux OS,
provided that they have the Perfctr patch installed.
Hardware counter overflow profiling can be done on sys-
tems that support overflow profiling and that include
the hardware counter shared library, libcpc.so(3). You
must use a version of the Solaris OS no earlier than
the Solaris 10 OS. On UltraSPARC[R] computers, you must
use a version of the hardware no earlier than the
UltraSPARC III hardware. On computers that do not sup-
port overflow profiling, an attempt to select hardware
counter overflow profiling generates an error.
The counters available depend on the specific CPU pro-
cessor and operating system. Running the collect com-
mand with no arguments prints out a usage message that
contains the names of the counters. The counters that
are aliased to common names are displayed first in the
list, followed by a list of the raw hardware counters.
If neither the performance counter subsystem nor
collect knows the names of the counters on a specific
chip, the tables are empty. In most cases, however,
the counters can be specified numerically. The lines
of output are formatted similar to the following:
Aliased HW counters available for profiling:
cycles[/{0|1}],9999991 ('CPU Cycles', alias for Cycle_cnt; CPU-cycles)
insts[/{0|1}],9999991 ('Instructions Executed', alias for Instr_cnt; events)
dcrm[/1],100003 ('D$ Read Misses', alias for DC_rd_miss; load events)
...
Raw HW counters available for profiling:
Cycle_cnt[/{0|1}],1000003 (CPU-cycles)
Instr_cnt[/{0|1}],1000003 (events)
DC_rd[/0],1000003 (load events)
SI_snoop[/0],1000003 (not-program-related events)
...
In the first line of aliased counter output, the first
field, "cycles", gives the counter name that can be
used in the -h counter... argument. It is followed by a
specification of which registers can be used for that
counter. The next field, "9999991", is the default
overflow value for that counter. The following field
in parentheses, "CPU Cycles", is the metric name, fol-
lowed by the raw hardware counter name. The last field,
"CPU-cycle", specifies the type of units being counted.
There can be up to two words for the type of informa-
tion. The second or only word of the type information
can be either "CPU-cycles" or "events". If the counter
can be used to provide a time-based metric, the value
is CPU-cycles; otherwise it is events.
The second output line of the aliased counter output
above has "events" instead of "CPU-cycles" at the end
of the line, indicating that it counts events, and can-
not be converted to a time.
The third output line above has two words of type
information, "load events", at the end of the line. The
first word of type information can have the value of
"load", "store", "load-store", or "not-program-
related". The first three of these type values indicate
that the counter is memory-related and the counter name
can be preceded by the "+" sign when used in the
collect -h command. The "+" sign indicates the
request for data collection to attempt to find the pre-
cise instruction and virtual address that caused the
event on the counter that overflowed.
The "not-program-related" value indicates that the
counter captures events initiated by some other pro-
gram, such as CPU-to-CPU cache snoops. Using the
counter for profiling generates a warning and profiling
does not record a call stack. It does, however, show
the time being spent in an artificial function called
"collector_not_program_related". Thread IDs and LWP IDs
are recorded, but are meaningless.
Each line in the raw hardware counter list includes the
internal counter name as used by cputrack(1), the
register number(s) on which that counter can be used,
the default overflow value, and the counter units,
which is either CPU-cycles or Events.
EXAMPLES:
Example 1: Using the aliased counter information listed
in the above sample output, the following command:
collect -h cycles/0,hi,+dcrm,9999
enables the CPU Cycle profiling on register 0. The "hi"
value selects an overflow interval approximately 10
times shorter than the default of 9999991, so samples
are taken about 10 times more frequently. The
"dcrm" value enables the D$ Read Miss profiling on
register 1 and the preceding "+" enables Dataspace pro-
filing for the dcrm. The "9999" value sets the sampling
to be done every 9999 read misses, instead of the
default value of every 100003 read misses.
Example 2:
Running the collect command with no arguments on an AMD
Opteron machine would produce a raw hardware counter
output similar to the following :
FP_dispatched_fpu_ops[/{0|1|2|3}],1000003 (events)
FP_cycles_no_fpu_ops_retired[/{0|1|2|3}],1000003 (CPU-cycles)
...
Using the above raw hardware counter output, the fol-
lowing command:
collect -h FP_dispatched_fpu_ops~umask=0x3/2,10007
enables the Floating Point Add and Multiply operations
to be tracked at the rate of 1 capture every 10007
events. (For more details on valid attribute values,
refer to the processor documentation). The "/2" value
specifies that the data is to be captured using hardware
register 2.
Synchronization Delay Tracing
Synchronization delay tracing records all calls to the
various thread synchronization routines where the
real-time delay in the call exceeds a specified thres-
hold. The data packet contains timestamps for entry and
exit to the synchronization routines, the thread ID,
and the LWP ID at the time the request is initiated.
(Synchronization requests from a thread can be ini-
tiated on one LWP, but complete on another.)
Synchronization delay tracing data is converted into
the following metrics:
Synchronization Delay Events
Synchronization Wait Time
Heap Tracing
Heap tracing records all calls to malloc, free, real-
loc, memalign, and valloc with the size of the block
requested, its address, and for realloc, the previous
address.
Heap tracing data is converted into the following
metrics:
Leaks
Bytes Leaked
Allocations
Bytes Allocated
Leaks are defined as allocations that are not freed.
If a zero-length block is allocated, it counts as an
allocation with zero bytes allocated. If a zero-length
block is not freed, it counts as a leak with zero bytes
leaked.
For applications written in the Java[TM] programming
language, leaks are defined as allocations that have
not been garbage-collected. Heap profiling for such
applications is obsolescent and will not be supported
in future releases.
Heap tracing experiments can be very large, and might
be slow to process.
MPI Tracing
MPI tracing records calls to the MPI library for func-
tions that can take a significant amount of time to
complete. MPI tracing is implemented using the Open
Source Vampir Trace code.
MPI tracing data is converted into the following
metrics:
MPI Time
MPI Sends
MPI Bytes Sent
MPI Receives
MPI Bytes Received
Other MPI Events
MPI Time is the total LWP time spent in the MPI func-
tion. If MPI state times are also collected, MPI Work
Time plus MPI Wait Time for all MPI functions other
than MPI_Init and MPI_Finalize should approximately
equal MPI Time. On Linux, MPI Wait and Work are
based on user+system CPU time, while MPI Time is based
on real time, so the numbers will not match.
The MPI Bytes Received metric counts the actual number
of bytes received in all messages. MPI Bytes Sent
counts the actual number of bytes sent in all messages.
MPI Sends counts the number of messages sent, and MPI
Receives counts the number of messages received.
MPI_Sendrecv counts as both a send and a receive. MPI
Other Events counts the events in the trace that are
neither sends nor receives.
Count Data
Count data is recorded by instrumenting the executable,
and counting the number of times each instruction was
executed. It also counts the number of times the first
instruction in a function is executed, and reports
that count as the function execution count.
Count data is converted into the following metrics:
Bit Func Count
Bit Inst Exec
Bit Inst Annul
Data-race Detection Data
Data-race detection data consists of pairs of race-
access events that constitute a race. The events are
combined into a race, and races for which the call
stacks for the two accesses are identical are merged into
a race group.
Data-race detection data is converted into the follow-
ing metric:
Race Accesses
Deadlock Detection Data
Deadlock detection data consists of pairs of threads
with conflicting locks.
Deadlock detection data is converted into the following
metric:
Deadlocks
Sampling and Global Data
Sampling refers to the process of generating markers
along the time line of execution. At each sample point,
execution statistics are recorded. All of the data
recorded at sample points is global to the program, and
does not map to function-level metrics.
Samples are always taken at the start of the process,
and at its termination. By default or if a non-zero -S
argument is specified, samples are taken periodically
at the specified interval. In addition, samples can be
taken by using the libcollector(3) API.
The data recorded at each sample point consists of
microstate accounting information from the kernel,
along with various other statistics maintained within
the kernel.
RESTRICTIONS
The Collector can support up to 16K user threads. Data from
additional threads is discarded, and a collector error gen-
erated. To support more threads, set the environment vari-
able SP_COLLECTOR_NUMTHREADS to a larger number.
By default, the Collector collects stacks that are 256
frames deep. To support deeper stacks, set the environment
variable SP_COLLECTOR_STACKBUFSZ to a larger number.
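For example, both limits can be raised with csh settings
such as the following (values illustrative):
setenv SP_COLLECTOR_NUMTHREADS 32768
setenv SP_COLLECTOR_STACKBUFSZ 1024
collect -p on a.out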
The Collector interposes on some signal-handling routines to
ensure that its use of SIGPROF signals for clock-based pro-
filing and SIGEMT (Solaris) or SIGIO (Linux) for hardware
counter overflow profiling is not disrupted by the target
program. The Collector library re-installs its own signal
handler if the target program installs a signal handler. The
Collector's signal handler sets a flag that ensures that
system calls are not interrupted to deliver signals. This
setting could change the behavior of the target program.
The Collector interposes on setitimer(2) to ensure that the
profiling timer is not available to the target program if
clock-based profiling is enabled.
The Collector interposes on functions in the hardware
counter library, libcpc.so, so that an application cannot
use hardware counters while the Collector is collecting per-
formance data. The interposed functions return a value of
-1.
Dataspace profiling is not available on systems running the
Linux OS, nor on x86 based systems running the Solaris OS.
For this release, the data from collecting periodic samples
is not reliable on systems running the Linux OS.
For this release, wide data discrepancies are observed when
profiling multithreaded applications on systems running the
RedHat Enterprise Linux OS.
Hardware counter overflow profiling cannot be run on a sys-
tem where cpustat is running, because cpustat takes control
of the counters, and does not let a user process use them.
Java profiling requires Java[TM] 2 SDK (JDK) 5, Update 19 or
later, or Java[TM] 2 SDK (JDK) 6, Update 18 or later.
Data is not collected on descendant processes that are
created to use the setuid attribute, nor on any descendant
processes created with an exec function run on an executable
that is not dynamically linked. Furthermore, subsequent
descendant processes might produce corrupted or unreadable
experiments. The workaround is to ensure that all processes
spawned are dynamically-linked and do not have the setuid
attribute.
Applications that call vfork(2) have these calls replaced by
a call to fork1(2).
SEE ALSO
analyzer(1), collector(1), dbx(1), er_archive(1), er_cp(1),
er_export(1), er_mv(1), er_print(1), er_rm(1), tha(1), lib-
collector(3), and the Performance Analyzer manual.