How Data Collection Works

The output from a data collection run is an experiment, which is stored as a directory with various internal files and subdirectories in the file system.

Experiment Format

All experiments must have three files:

A log file (log.xml), an XML file that contains information about what data was collected, the versions of various components, a record of various events during the life of the target, and the word size of the target
A map file (map.xml), an XML file that records the time-dependent information about what load objects are loaded into the address space of the target, and the times at which they are loaded or unloaded
An overview file, which is a binary file containing usage information recorded at every sample point in the experiment
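As a rough illustration of the list above, the following Python sketch checks whether a directory contains the three required files. The function name is hypothetical, and the file name `overview` is an assumption: the text names only log.xml and map.xml explicitly.

```python
from pathlib import Path

# Required files described in the text. "overview" as a literal file name is
# an assumption; only log.xml and map.xml are named explicitly.
REQUIRED_FILES = ("log.xml", "map.xml", "overview")


def missing_experiment_files(experiment_dir):
    """Return the required file names that are absent from experiment_dir."""
    root = Path(experiment_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

An empty list means all three files are present. The check says nothing about whether their contents are valid.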

In addition, experiments have binary data files representing the profile events in the life of the process. Each data file has a series of events, as described in Interpreting Performance Metrics. Separate files are used for each type of data, but each file is shared by all threads in the target.

For clock profiling or hardware counter overflow profiling, the data is written in a signal handler invoked by the clock tick or counter overflow. For synchronization tracing, heap tracing, I/O tracing, MPI tracing, or OpenMP tracing, data is written from libcollector routines that are interposed on the normal user-invoked routines by means of the LD_PRELOAD environment variable. Each interposition routine partially fills in a data record, invokes the normal user-invoked routine, fills in the rest of the data record when that routine returns, and then writes the record to the data file.
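The three-step pattern followed by an interposition routine can be sketched in Python as a wrapper function. This is only an analogy: the real routines are native code preloaded through LD_PRELOAD, and the names `interpose` and `trace_log` and the record fields shown here are hypothetical.

```python
import functools
import time


def interpose(real_routine, trace_log):
    """Wrap real_routine so each call appends one completed record to trace_log."""

    @functools.wraps(real_routine)
    def wrapper(*args, **kwargs):
        # Step 1: partially fill in the record before the real call.
        record = {"routine": real_routine.__name__, "start": time.monotonic()}
        try:
            # Step 2: invoke the normal routine and pass its result through.
            return real_routine(*args, **kwargs)
        finally:
            # Step 3: complete the record on return, then write it out.
            record["end"] = time.monotonic()
            trace_log.append(record)

    return wrapper
```

The caller sees the same return value it would get from the original routine. The only side effect is one record appended per call.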

All data files are memory-mapped and written in blocks. The records are filled in such a way as to always have a valid record structure, so that experiments can be read as they are being written. The buffer management strategy is designed to minimize contention and serialization between threads.

An experiment can optionally contain an ASCII file with the name notes. This file is automatically created when using the -C comment argument to the collect command. You can create or edit the file manually after the experiment has been created. The contents of the file are prepended to the experiment header.

`archives` Directory

Each experiment has an archives directory that contains binary files describing each load object referenced in the map.xml file. These files are produced by the er_archive utility, which runs at the end of data collection. If the process terminates abnormally, the er_archive utility might not be invoked, in which case, the archive files are written by the er_print utility or Performance Analyzer when first invoked on the experiment.

Archive directories can also include copies of shared objects or of source files, depending on the options used to archive the experiment.

Subexperiments

Subexperiments are created when multiple processes are profiled, such as when you follow descendant processes, collect an MPI experiment, or profile the kernel with user processes.

Descendant processes write their experiments into subdirectories within the founder-experiment directory. These new subexperiments are named to indicate their lineage as follows:

An underscore is appended to the creator's experiment name.
One of the following code letters is added: f for fork, x for exec, and c for other descendants. On Linux, C is used for a descendant generated by clone(2).
A number to indicate the index of the fork or exec is added after the code letter.
The experiment suffix, .er is appended to complete the experiment name.

For user processes, if the experiment name for the founder process is test.1.er, the experiment for the descendant process created by its third fork is test.1.er/_f3.er. If that descendant process executes a new image, the corresponding experiment name is test.1.er/_f3_x1.er. Descendant experiments consist of the same files as the parent experiment, but they do not have descendant experiments (all descendants are represented by subdirectories in the founder experiment), and they do not have archive subdirectories (all archiving is done into the founder experiment).
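The naming rules above can be expressed as a small Python helper. This is a minimal sketch: the function name and the representation of the lineage as (code letter, index) pairs are assumptions made for illustration.

```python
def descendant_experiment_path(founder, lineage):
    """Build the path of a descendant subexperiment.

    founder: founder experiment name, for example "test.1.er"
    lineage: (code_letter, index) pairs, oldest ancestor first,
             for example [("f", 3), ("x", 1)]
    """
    valid_codes = {"f", "x", "c", "C"}
    if not lineage:
        raise ValueError("lineage must contain at least one step")
    parts = []
    for code, index in lineage:
        if code not in valid_codes:
            raise ValueError(f"unknown lineage code: {code!r}")
        # Underscore, then the code letter, then the index.
        parts.append(f"_{code}{index}")
    # All descendants live directly under the founder experiment directory.
    return founder + "/" + "".join(parts) + ".er"


print(descendant_experiment_path("test.1.er", [("f", 3)]))            # test.1.er/_f3.er
print(descendant_experiment_path("test.1.er", [("f", 3), ("x", 1)]))  # test.1.er/_f3_x1.er
```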

By default, experiments on the kernel are named ktest.1.er rather than test.1.er. When data is also collected on user processes, the kernel experiment contains a subexperiment for each user process being followed. The kernel subexperiments are named using the format _process-name_PID_process-id.1.er. For example, an experiment run on an sshd process running under process ID 1264 would be named ktest.1.er/_sshd_PID_1264.1.er.

Data for MPI programs are collected by default into test.1.er, and all the data from the MPI processes are collected into subexperiments, one per rank. The Collector uses the MPI rank to construct a subexperiment name with the form M_rm.er, where m is the MPI rank. For example, MPI rank 1 would have its experiment data recorded in the test.1.er/M_r1.er directory.
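Taken together, the three subexperiment naming schemes (descendant, kernel user process, and MPI rank) can be told apart by pattern. The following Python sketch shows one way to do so. The function name, the category labels, and the regular expressions are assumptions derived only from the formats described in this section.

```python
import re

# Patterns derived from the naming formats described in the text.
_PATTERNS = [
    ("mpi", re.compile(r"M_r(?P<rank>\d+)\.er")),
    ("kernel_user_process", re.compile(r"_(?P<name>.+)_PID_(?P<pid>\d+)\.1\.er")),
    ("descendant", re.compile(r"(?:_[fxcC]\d+)+\.er")),
]


def classify_subexperiment(dirname):
    """Return (kind, fields) for a subexperiment directory name, or None."""
    for kind, pattern in _PATTERNS:
        match = pattern.fullmatch(dirname)
        if match:
            return kind, match.groupdict()
    return None


print(classify_subexperiment("M_r1.er"))              # ('mpi', {'rank': '1'})
print(classify_subexperiment("_sshd_PID_1264.1.er"))  # ('kernel_user_process', {'name': 'sshd', 'pid': '1264'})
print(classify_subexperiment("_f3_x1.er"))            # ('descendant', {})
```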

Dynamic Functions

An experiment where the target creates dynamic functions has additional records in the map.xml file describing those functions. An additional file, dyntext, contains a copy of the actual instructions of the dynamic functions. The copy is needed to produce annotated disassembly of dynamic functions.

Java Experiments

A Java experiment has additional records in the map.xml file, both for dynamic functions created by the JVM software for its internal purposes and for dynamically compiled (HotSpot) versions of the target Java methods.

In addition, a Java experiment includes a JAVA_CLASSES file, containing information about all of the user’s Java classes invoked.

Java tracing data is recorded using a JVMTI agent, which is part of libcollector.so. The agent receives events that are mapped into the recorded trace events. The agent also receives events for class loading and HotSpot compilation, which are used to write the JAVA_CLASSES file and the Java-compiled method records in the map.xml file.

Recording Experiments

You can record an experiment on a user-mode target in three different ways:

With the collect command
With dbx creating a process
With dbx creating an experiment from a running process

The Profile Application dialog in Performance Analyzer runs a collect experiment.

`collect` Experiments

When you use the collect command to record an experiment, the collect utility creates the experiment directory and sets the LD_PRELOAD environment variable to ensure that libcollector.so and other libcollector modules are preloaded into the target’s address space. The collect utility then sets environment variables to inform libcollector.so about the experiment name and data collection options, and executes the target on top of itself.
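The general launch pattern described here (adjust the environment, then replace the current process with the target) can be sketched in Python as follows. This is not the collect implementation. The function names are hypothetical, and `EXAMPLE_EXPERIMENT_NAME` is a made-up variable name, because the text does not say which variables collect actually sets.

```python
import os


def build_target_environment(base_env, preload_libs, experiment_name):
    """Return a copy of base_env prepared for a preloaded target."""
    env = dict(base_env)
    existing = env.get("LD_PRELOAD", "")
    # Put the collector libraries first and keep anything already preloaded.
    # A space-separated list is assumed here.
    env["LD_PRELOAD"] = " ".join(list(preload_libs) + existing.split())
    # Hypothetical variable name, for illustration only.
    env["EXAMPLE_EXPERIMENT_NAME"] = experiment_name
    return env


def launch_target(argv, env):
    """Replace the current process image with the target; does not return."""
    os.execvpe(argv[0], argv, env)
```

Because the launcher replaces its own process image rather than spawning a child, the target keeps the launcher's process ID, which is one reading of "executes the target on top of itself".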

libcollector.so and associated modules are responsible for writing all experiment files.

`dbx` Experiments That Create a Process

When dbx is used to launch a process with data collection enabled, dbx also creates the experiment directory and ensures preloading of libcollector.so. Then dbx stops the process at a breakpoint before its first instruction, and calls an initialization routine in libcollector.so to start the data collection.

Java experiments cannot be collected by dbx, because dbx uses a Java Virtual Machine Debug Interface (JVMDI) agent for debugging. That agent cannot coexist with the Java Virtual Machine Tools Interface (JVMTI) agent needed for data collection.

`dbx` Experiments on a Running Process

When dbx is used to start an experiment on a running process, it creates the experiment directory but cannot use the LD_PRELOAD environment variable. dbx makes an interactive function call into the target to open libcollector.so, and then calls the libcollector.so initialization routine, just as it does when creating the process. Data is written by libcollector.so and its modules just as in a collect experiment.

Because libcollector.so was not in the target address space when the process started, any data collection that depends on interposition on user-callable functions (synchronization tracing, heap tracing, MPI tracing) might not work. In general, the symbols have already been resolved to the underlying functions, so the interposition cannot happen. Furthermore, the following of descendant processes also depends on interposition, and does not work properly for experiments created by dbx on a running process.

If you have explicitly preloaded libcollector.so before starting the process with dbx or before using dbx to attach to the running process, you can collect tracing data.