Preparing Your Program for Data Collection and Analysis

You do not need to do anything special to prepare most programs for data collection and analysis. You should read one or more of the subsections below if your program does any of the following:

Installs a signal handler. See Data Collection and Signals.
Dynamically compiles functions. See Dynamic Functions and Modules.
Creates descendant processes that you want to profile. See Follow Processes with the -F option.
Uses the profiling timer or hardware counter API directly. See Using System Libraries.
Calls setuid(2) or executes a setuid file. See Data Collection and Signals and Using setuid and setgid.

Also, if you want to control data collection from your program during runtime, see Program Control of Data Collection Using libcollector Library.

Using Dynamically Allocated Memory

Many programs rely on dynamically-allocated memory, using features such as:

malloc, valloc, alloca (C/C++)
new (C++)
Stack local variables (Fortran)
MALLOC, MALLOC64 (Fortran)

You must take care to ensure that a program does not rely on the initial contents of dynamically allocated memory unless the memory allocation method is explicitly documented as setting an initial value. For example, compare the descriptions of calloc and malloc in the man page for malloc(3C).

Occasionally, a program that uses dynamically allocated memory might appear to work correctly when run alone, but might fail when run with performance data collection enabled. Symptoms might include unexpected floating‐point behavior, segmentation faults, or application-specific error messages.

Such behavior might occur if the uninitialized memory is, by chance, set to a benign value when the application is run alone but is set to a different value when the application is run in conjunction with the performance data collection tools. In such cases, the performance tools are not at fault. Any application that relies on the contents of dynamically allocated memory has a latent bug: an operating system is at liberty to provide any content whatsoever in dynamically allocated memory unless explicitly documented otherwise. Even if an operating system happens to always set dynamically allocated memory to a certain value today, such latent bugs might cause unexpected behavior with a later revision of the operating system, or if the program is ported to a different operating system in the future.
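
As a minimal sketch of the safe pattern, the following C code never depends on what malloc happens to return. It either uses calloc, which is documented to zero-fill the allocation, or clears the malloc buffer explicitly. The helper names make_counters and make_counters_explicit are hypothetical and exist only for this illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n counters that are guaranteed to start
 * at zero. calloc is documented to zero-fill the memory it returns. */
long *make_counters(size_t n)
{
    return calloc(n, sizeof(long));
}

/* Hypothetical helper: same result with malloc. malloc makes no promise
 * about initial contents, so the buffer is cleared explicitly. */
long *make_counters_explicit(size_t n)
{
    if (n > SIZE_MAX / sizeof(long))
        return NULL;                    /* n * sizeof(long) would overflow */
    long *p = malloc(n * sizeof(long));
    if (p != NULL)
        memset(p, 0, n * sizeof(long));
    return p;
}
```

Either helper returns memory whose contents do not depend on whether the program runs alone or under the data collection tools.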

The following tools may help in finding such latent bugs:

Code Analyzer, an Oracle Developer Studio tool that, when used with the compilers and other tools, can show the following:

Static code checking

Code Analyzer can show results of static code checking, which is performed when you compile your application with the Oracle Developer Studio C or C++ compiler and specify the -xanalyze=code option.

Dynamic memory access checking

Code Analyzer can show results of dynamic memory access checking, which is performed when you instrument your binary with discover using the -a option, and then run the instrumented binary to generate data.

For more information, see Oracle Developer Studio 12.5: Code Analyzer User’s Guide.

f95 -xcheck=init_local

For more information, see the Oracle Developer Studio 12.5: Fortran User’s Guide or the f95(1) man page.

lint utility

For more information, see the Oracle Developer Studio 12.5: C User’s Guide or the lint(1) man page.

Runtime checking under dbx

For more information, see the Oracle Developer Studio 12.5: Debugging a Program with dbx manual or the dbx(1) man page.

Using System Libraries

The Collector interposes on functions from various system libraries to collect tracing data and to ensure the integrity of data collection. The following list describes situations in which the Collector interposes on calls to library functions.

Collecting synchronization wait tracing data. The Collector interposes on functions from the Oracle Solaris C library, libc.so, on Oracle Solaris.

Collecting I/O Trace data. The Collector interposes on the following functions:

/* interposition function handles */
static int (*__real_open)(const char *path, int oflag, ...) = NULL;
#if WSIZE(32)
static int (*__real_open64)(const char *path, int oflag, ...) = NULL;
#endif
static int (*__real_fcntl)(int fildes, int cmd, ...) = NULL;
static int (*__real_openat)(int fildes, const char *path, int oflag, ...) = NULL;
static int (*__real_close)(int fildes) = NULL;
static FILE *(*__real_fopen)(const char *filename, const char *mode) = NULL;
static int (*__real_fclose)(FILE *stream) = NULL;
static int (*__real_dup)(int fildes) = NULL;
static int (*__real_dup2)(int fildes, int fildes2) = NULL;
static int (*__real_pipe)(int fildes[2]) = NULL;
static int (*__real_socket)(int domain, int type, int protocol) = NULL;
static int (*__real_mkstemp)(char *template) = NULL;
static int (*__real_mkstemps)(char *template, int slen) = NULL;
static int (*__real_creat)(const char *path, mode_t mode) = NULL;

#if WSIZE(32)
static int (*__real_creat64)(const char *path, mode_t mode) = NULL;
#endif
static FILE *(*__real_fdopen)(int fildes, const char *mode) = NULL;

static ssize_t (*__real_read)(int fildes, void *buf, size_t nbyte) = NULL;
static ssize_t (*__real_write)(int fildes, const void *buf, size_t nbyte) = NULL;
static ssize_t (*__real_readv)(int fildes, const struct iovec *iov, int iovcnt) = NULL;
static ssize_t (*__real_writev)(int fildes, const struct iovec *iov, int iovcnt) = NULL;
static size_t (*__real_fread)(void *ptr, size_t size, size_t nitems, FILE *stream) = NULL;
static size_t (*__real_fwrite)(void *ptr, size_t size, size_t nitems, FILE *stream) = NULL;
static ssize_t (*__real_pread)(int fildes, void *buf, size_t nbyte, off_t offset) = NULL;
static ssize_t (*__real_pwrite)(int fildes, const void *buf, size_t nbyte, off_t offset) = NULL;
#if OS(Linux)
static ssize_t (*__real_pwrite64)(int fildes, const void *buf, size_t nbyte, off64_t offset) = NULL;
#endif
static char *(*__real_fgets)(char *s, int n, FILE *stream) = NULL;
static int (*__real_fputs)(const char *s, FILE *stream) = NULL;
static int (*__real_fputc)(int c, FILE *stream) = NULL;
static int (*__real_fprintf)(FILE *stream, const char *format, ...) = NULL;
static int (*__real_vfprintf)(FILE *stream, const char *format, va_list ap) = NULL;

static off_t (*__real_lseek)(int fildes, off_t offset, int whence) = NULL;
static offset_t (*__real_llseek)(int fildes, offset_t offset, int whence) = NULL;
static int (*__real_chmod)(const char *path, mode_t mode) = NULL;
static int (*__real_access)(const char *path, int amode) = NULL;
static int (*__real_rename)(const char *old, const char *new) = NULL;
static int (*__real_mkdir)(const char *path, mode_t mode) = NULL;
static int (*__real_getdents)(int fildes, struct dirent *buf, size_t nbyte) = NULL;
static int (*__real_unlink)(const char *path) = NULL;
static int (*__real_fseek)(FILE *stream, long offset, int whence) = NULL;
static void (*__real_rewind)(FILE *stream) = NULL;
static long (*__real_ftell)(FILE *stream) = NULL;
static int (*__real_fgetpos)(FILE *stream, fpos_t *pos) = NULL;
static int (*__real_fsetpos)(FILE *stream, const fpos_t *pos) = NULL;
#if WSIZE(32)
static int (*__real_fgetpos64)(FILE *stream, fpos64_t *pos) = NULL;
static int (*__real_fsetpos64)(FILE *stream, const fpos64_t *pos) = NULL;
#endif // WSIZE(32)
static int (*__real_fsync)(int fildes) = NULL;
static struct dirent *(*__real_readdir)(DIR *dirp) = NULL;
#if OS(Linux)
static int (*__real_flock)(int fd, int operation) = NULL;
#endif
static int (*__real_lockf)(int fildes, int function, off_t size) = NULL;
static int (*__real_fflush)(FILE *stream) = NULL;
#if OS(Linux) && ARCH(Intel) && WSIZE(32)
static FILE *(*__real_fopen_2_1)(const char *filename, const char *mode) = NULL;
static int (*__real_fclose_2_1)(FILE *stream) = NULL;
static FILE *(*__real_fdopen_2_1)(int fildes, const char *mode) = NULL;
static int (*__real_fgetpos_2_2)(FILE *stream, fpos_t *pos) = NULL;
static int (*__real_fsetpos_2_2)(FILE *stream, const fpos_t *pos) = NULL;
static FILE *(*__real_fopen_2_0)(const char *filename, const char *mode) = NULL;
static int (*__real_fclose_2_0)(FILE *stream) = NULL;
static FILE *(*__real_fdopen_2_0)(int fildes, const char *mode) = NULL;
static int (*__real_fgetpos_2_0)(FILE *stream, fpos_t *pos) = NULL;
static int (*__real_fsetpos_2_0)(FILE *stream, const fpos_t *pos) = NULL;
#endif

Collecting heap tracing data. The Collector interposes on the functions malloc, realloc, memalign and free. Versions of these functions are found in the C standard library, libc.so, and also in other libraries such as libumem.so, libmalloc.so, and libmtmalloc.so.
Collecting MPI tracing data. The Collector interposes on functions from the specified MPI library.
Ensuring the integrity of clock data. The Collector interposes on setitimer and prevents the program from using the profiling timer.
Ensuring the integrity of hardware counter data. The Collector interposes on functions from the hardware counter library, libcpc.so, and prevents the program from using the counters. Calls from the program to functions from this library return a value of -1.
Enabling data collection on descendant processes. The Collector interposes on the functions fork(2), fork1(2), vfork(2), fork(3F), posix_spawn(3p), posix_spawnp(3p), system(3C), system(3F), sh(3F), popen(3C), and exec(2) and its variants. Calls to vfork are replaced internally by calls to fork1. These interpositions are done for the collect command.
Guaranteeing the handling of the SIGPROF and SIGEMT signals by the Collector. The Collector interposes on sigaction to ensure that its signal handler is the primary signal handler for these signals.

The interposition does not succeed under the following circumstances:

Statically linking a program with any of the libraries that contain functions that are interposed.
Attaching dbx to a running application that does not have the collector library preloaded.
Dynamically loading one of these libraries and resolving the symbols by searching only within the library.

The failure of interposition by the Collector can cause loss or invalidation of performance data.

The er_sync.so, er_iotrace.so, er_heap.so, and er_mpviewn.so (where n indicates the MPI version) libraries are loaded only if synchronization wait tracing data, I/O tracing data, heap tracing data, or MPI tracing data, respectively, are requested.

Data Collection and Signals

Signals are used for both clock profiling and hardware counter profiling. SIGPROF is used in data collection for all experiments. The period for generating the signal depends on the data being collected. SIGEMT on Solaris or SIGIO on Linux is used for hardware counter profiling. The overflow interval depends on the user parameter for profiling. Any user code that uses or manipulates the profiling signals may potentially interfere with data collection. When the Collector installs its signal handler for a profile signal, it sets a flag that ensures that system calls are not interrupted to deliver signals. This setting could change the behavior of a target program that uses the profiling signals for other purposes.

When the Collector installs its signal handler for a profile signal, it remembers whether or not the target had installed its own signal handler. The Collector also interposes on some signal-handling routines and does not allow the user to install a signal handler for these signals; it saves the user's handler, just as it does when the Collector replaces a user handler on starting the experiment.

Profiling signals are delivered from the profiling timer or hardware-counter-overflow handling code in the kernel, or in response to the kill(2), sigsend(2), tkill(2), tgkill(2), or _lwp_kill(2) system calls, the raise(3C) and sigqueue(3C) library calls, or the kill command. A signal code is delivered with the signal so that the Collector can distinguish the origin. If it is delivered for profiling, it is processed by the Collector; if it is not delivered for profiling, it is delivered to the target signal handler.
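
The idea of telling signal origins apart by a signal code can be sketched with the POSIX siginfo_t structure. This is an illustration under the assumption that the signal code corresponds to the si_code field; the text does not say how the Collector itself inspects it. The names record_origin, watch_signal, and last_signal_was_user_sent are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* State recorded by the handler (hypothetical names). */
static volatile sig_atomic_t last_si_code = 0;
static volatile sig_atomic_t got_signal = 0;

/* SA_SIGINFO handler: the kernel passes a siginfo_t whose si_code
 * field describes how the signal was generated. */
static void record_origin(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)context;
    last_si_code = info->si_code;
    got_signal = 1;
}

/* Install the handler for sig. Returns 0 on success, -1 on error. */
int watch_signal(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_origin;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Returns 1 if a signal was seen and it was sent by kill(2),
 * which POSIX reports as SI_USER; returns 0 otherwise. */
int last_signal_was_user_sent(void)
{
    return got_signal && last_si_code == SI_USER;
}
```

A handler written this way can treat a signal sent with kill differently from one generated by a timer, which is the kind of distinction the paragraph above describes.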

When the Collector is running under dbx, the profiling signal delivered occasionally has its signal code corrupted, and a profile signal may be treated as if it were generated from a system or library call or a command. In that case, it will be incorrectly delivered to the user's handler. If the user handler was set to SIG_DFL, it will cause the process to fail with a core dump.

When the Collector is invoked after attaching to a target process, it will install its signal handler, but it cannot interpose on the signal-handling routines. If the user code installs a signal handler after the attach, it will override the Collector's signal handler, and data will be lost.

Note that any signal, including either of the profiling signals, might cause premature termination of a system call. The program must be prepared to handle that behavior. When libcollector installs the signal handlers for data collection, it specifies to restart system calls that can be restarted. However, some system calls such as sleep(3C) return early without reporting an error.
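
One way a program can tolerate an interrupted sleep is to resume with the time that remains. The sketch below uses nanosleep rather than sleep, because nanosleep reports the interruption as EINTR and returns the exact remaining time. The helper name sleep_fully is hypothetical, and this is only one possible approach.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for the full interval even if signals
 * (including profiling signals) cut individual calls short.
 * When nanosleep is interrupted it fails with EINTR and stores the
 * unslept time in its second argument, so the loop continues with
 * exactly what is left.
 * Returns the number of times the sleep had to be resumed. */
unsigned sleep_fully(time_t seconds)
{
    struct timespec ts;
    unsigned resumes = 0;

    ts.tv_sec = seconds;
    ts.tv_nsec = 0;
    while (nanosleep(&ts, &ts) == -1 && errno == EINTR)
        resumes++;
    return resumes;
}
```

Because the remaining time is tracked to the nanosecond, the loop still makes progress when signals arrive frequently.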

Sample and Pause-Resume Signals

Signals may be specified by the user as a sample signal (-l) or a pause-resume signal (-y). SIGUSR1 or SIGUSR2 are recommended for this use, but any signal that is not used by the target may be used.
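
To check from inside a program whether a candidate signal is currently unused, sigaction can be called as a pure query. The following sketch assumes that "unused" means the signal still has its default disposition; the helper name signal_is_unused is hypothetical. The answer reflects only the moment of the call, since the program could install a handler later.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: report whether the calling process still has the
 * default disposition for sig (no handler installed, not ignored).
 * Passing NULL as the new action makes sigaction() a query only.
 * Returns 1 if the disposition is the default, 0 if not, -1 on error. */
int signal_is_unused(int sig)
{
    struct sigaction current;

    if (sigaction(sig, NULL, &current) != 0)
        return -1;
    if (current.sa_flags & SA_SIGINFO)
        return 0;                       /* a three-argument handler is set */
    return current.sa_handler == SIG_DFL;
}
```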

The profiling signals may be used if the process does not otherwise use them, but they should be used only if no other signal is available. The Collector interposes on some signal-handling routines and does not allow the user to install a signal handler for these signals; it saves the user's handler, just as it does when the Collector replaces a user handler on starting the experiment.

If the Collector is invoked after attaching to a target process, and the user code installs a signal handler for the sample or pause-resume signal, those signals will no longer operate as specified.

Using `setuid` and `setgid`

Restrictions enforced by the dynamic loader make it difficult to use setuid(2) and collect performance data. If your program calls setuid or executes a setuid file, the Collector probably cannot write an experiment file because it lacks the necessary permissions for the new user ID.

The collect command operates by inserting a shared library, libcollector.so, into the target's address space (LD_PRELOAD). Several problems might arise if you invoke the collect command on executables that call setuid or setgid, or that create descendant processes that call setuid or setgid. If you are not root when you run an experiment, collection fails because the shared libraries are not installed in a trusted directory. The workaround is to run the experiments as root, or use crle(1) to grant permission. Take great care when circumventing security barriers; you do so at your own risk.

When running the collect command, your umask must be set to allow write permission for you, for any users or groups that are set by the setuid attributes and setgid attributes of a program being executed with exec(), and for any user or group to which that program sets itself. If the mask is not set properly, some files might not be written to the experiment, and processing of the experiment might not be possible. If the log file can be written, an error is shown when you attempt to process the experiment.
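
As a rough aid for checking the umask requirement, the following sketch reports which write-permission bits a mask would strip from newly created files. The helper names write_bits_blocked and current_umask are hypothetical. Note that current_umask briefly changes the process umask, so it is not safe to call while other threads are creating files.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: return the write-permission bits that the given
 * umask removes from new files. A result of 0 means that user, group,
 * and other write permission are all allowed by the mask. */
mode_t write_bits_blocked(mode_t mask)
{
    return mask & (S_IWUSR | S_IWGRP | S_IWOTH);
}

/* Read the process umask. umask() can only set a new value and return
 * the old one, so a temporary value is set and the old one restored. */
mode_t current_umask(void)
{
    mode_t old = umask(0);
    umask(old);
    return old;
}
```

For example, write_bits_blocked(022) returns S_IWGRP | S_IWOTH, which shows that a umask of 022 withholds write permission from the group and from others.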

Other problems can arise if the target itself makes any of the system calls to set UID or GID, or if it changes its umask and then forks or runs exec() on some other executable, or if crle was used to configure how the runtime linker searches for shared objects.

If an experiment is started as root on a target that changes its effective GID, the er_archive process that is automatically run when the experiment terminates fails because it needs a shared library that is not marked as trusted. In that case, you can run the er_archive utility (or the er_print utility or the analyzer command) immediately following the termination of the experiment, on the machine on which the experiment was recorded.

Program Control of Data Collection Using libcollector Library

If you want to control data collection from your program, the Collector shared library, libcollector.so, contains some API functions that you can use. The functions are written in C. A Fortran interface is also provided. Both C and Fortran interfaces are defined in header files that are provided with the library.

The API functions are defined as follows.

void collector_sample(char *name);
void collector_pause(void);
void collector_resume(void);
void collector_terminate_expt(void);

Similar functionality is provided for Java programs by the CollectorAPI class, which is described in Java Interface.

C and C++ Interface

You can access the C and C++ interface of the Collector API by including collectorAPI.h and linking with -lcollectorAPI, which contains real functions to check for the existence of the underlying libcollector.so API functions.

If no experiment is active, the API calls are ignored.
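
The following sketch shows one way the calls might be placed so that only a compute phase is recorded. The function run_phases and the build switch USE_REAL_COLLECTOR_API are hypothetical. So that the example compiles without Oracle Developer Studio installed, it defaults to stand-in stubs that only count calls; the stubs are not the library's implementation.

```c
#ifdef USE_REAL_COLLECTOR_API       /* hypothetical build switch */
#include "collectorAPI.h"           /* real API; link with -lcollectorAPI */
#else
/* Stand-in stubs with the prototypes listed earlier in this section.
 * They only count calls so the sketch can be compiled and tried alone. */
static int stub_pauses, stub_resumes, stub_samples;
static void collector_pause(void)  { stub_pauses++; }
static void collector_resume(void) { stub_resumes++; }
static void collector_sample(char *name) { (void)name; stub_samples++; }
#endif

/* Hypothetical workload: record event data only for the second loop. */
long run_phases(int n)
{
    long sum = 0;
    char label[] = "compute-start"; /* collector_sample takes a char * */

    collector_pause();              /* stop event data during setup */
    for (int i = 0; i < n; i++)
        sum += i;
    collector_resume();             /* record event data from here on */

    collector_sample(label);        /* labeled sample marks the phase start */
    for (int i = 0; i < n; i++)
        sum += (long)i * i;
    return sum;
}
```

The program must still be run under the Collector for any data to be recorded; without an active experiment the real calls are ignored.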

Fortran Interface

The Fortran API libfcollector.h file defines the Fortran interface to the library. The application must be linked with -lcollectorAPI to use this library. (An alternate name for the library, -lfcollector, is provided for backward compatibility.) The Fortran API provides the same features as the C and C++ API, excluding the dynamic function and thread pause and resume calls.

Insert the following statement to use the API functions for Fortran:

include "libfcollector.h"

Note - Do not link a program in any language with -lcollector. If you do, the Collector might exhibit unpredictable behavior.

Java Interface

Use the following statement to import the CollectorAPI class and access the Java API. Note, however, that your application must be invoked with a classpath pointing to /installation_directory/lib/collector.jar, where installation_directory is the directory in which the Oracle Developer Studio software is installed.

import com.sun.forte.st.collector.CollectorAPI;

The Java CollectorAPI methods are defined as follows:

CollectorAPI.sample(String name)
CollectorAPI.pause()
CollectorAPI.resume()
CollectorAPI.terminate()

The Java API includes the same functions as the C and C++ API, excluding the dynamic function API.

The C include file libcollector.h contains macros that bypass the calls to the real API functions if data is not being collected. In this case, the functions are not dynamically loaded. However, using these macros is risky because the macros do not work well under some circumstances. Using collectorAPI.h is safer because it does not use macros. Rather, it refers directly to the functions.

The Fortran API subroutines call the C API functions if performance data is being collected; otherwise, they return. The overhead for the checking is very small and should not significantly affect program performance.

To collect performance data you must run your program using the Collector, as described later in this chapter. Inserting calls to the API functions does not enable data collection.

If you intend to use the API functions in a multithreaded program, ensure that they are called only by one thread. The API functions perform actions that apply to the process and not to individual threads. If each thread calls the API functions, the data that is recorded might not be what you expect. For example, if collector_pause() or collector_terminate_expt() is called by one thread before the other threads have reached the same point in the program, collection is paused or terminated for all threads, and data can be lost from the threads that were executing code before the API call.
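
One possible way to follow this advice is to let only the last thread that reaches the pause point make the call, so that no thread is still executing earlier code when collection pauses. The sketch below uses a C11 atomic counter for this. The type pause_gate, its functions, and the build switch USE_REAL_COLLECTOR_API are hypothetical, and collector_pause is replaced by a counting stub by default so the example compiles on its own.

```c
#include <stdatomic.h>

#ifdef USE_REAL_COLLECTOR_API       /* hypothetical build switch */
#include "collectorAPI.h"           /* real API; link with -lcollectorAPI */
#else
/* Stand-in stub that only counts calls; not the library's implementation. */
static int stub_pause_calls;
static void collector_pause(void) { stub_pause_calls++; }
#endif

/* Hypothetical gate shared by all worker threads. */
typedef struct {
    atomic_int arrived;             /* threads that reached the pause point */
    int expected;                   /* total number of worker threads */
} pause_gate;

void pause_gate_init(pause_gate *g, int nthreads)
{
    atomic_init(&g->arrived, 0);
    g->expected = nthreads;
}

/* Each thread calls this once when it reaches the pause point.
 * Only the last thread to arrive calls collector_pause(), so the call
 * is made exactly once and only after every thread has got this far.
 * Returns 1 in the thread that made the call, 0 in all others. */
int pause_gate_arrive(pause_gate *g)
{
    int count = atomic_fetch_add(&g->arrived, 1) + 1;

    if (count == g->expected) {
        collector_pause();
        return 1;
    }
    return 0;
}
```

Threads that arrive early continue without waiting; if they must also stop at that point, a barrier would be needed instead.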

C, C++, Fortran, and Java API Functions

This section provides descriptions of the API functions.

C and C++: collector_sample(char *name)

Fortran: collector_sample(string name)

Java: CollectorAPI.sample(String name)

Record a sample packet and label the sample with the given name or string. The label is displayed by Performance Analyzer in the Selection Details window when you select a sample in the Timeline view. The Fortran argument string is of type character.

Sample points contain data for the process and not for individual threads. In a multithreaded application, the collector_sample() API function ensures that only one sample is written if another call is made while it is recording a sample. The number of samples recorded can be less than the number of threads making the call.

Performance Analyzer does not distinguish between samples recorded by different mechanisms. If you want to see only the samples recorded by API calls, you should turn off all other sampling modes when you record performance data.

C, C++, Fortran: collector_pause()

Java: CollectorAPI.pause()

Stop writing event-specific data to the experiment. The experiment remains open, and global data continues to be written. The call is ignored if no experiment is active or if data recording is already stopped. This function stops the writing of all event-specific data even if it is enabled for specific threads by the collector_thread_resume() function.

C, C++, Fortran: collector_resume()

Java: CollectorAPI.resume()

Resume writing event-specific data to the experiment after a call to collector_pause(). The call is ignored if no experiment is active or if data recording is active.

C, C++, Fortran: collector_terminate_expt()

Java: CollectorAPI.terminate()

Terminate the experiment whose data is being collected. No further data is collected, but the program continues to run normally. The call is ignored if no experiment is active.

Dynamic Functions and Modules

If your C or C++ program dynamically compiles functions into the data space of the program, you must supply information to the Collector if you want to see data for the dynamic function or module in Performance Analyzer. The information is passed by calls to collector API functions. The definitions of the API functions are as follows.

void collector_func_load(char *name, char *alias,
    char *sourcename, void *vaddr, int size, int lntsize,
    Lineno *lntable);
void collector_func_unload(void *vaddr);

You do not need to use these API functions for Java methods that are compiled by the Java HotSpot virtual machine, for which a different interface is used. The Java interface provides the name of the method that was compiled to the Collector. You can see function data and annotated disassembly listings for Java compiled methods, but not annotated source listings.

This section provides descriptions of the API functions.

`collector_func_load()` Function

Pass information about dynamically compiled functions to the Collector for recording in the experiment. The parameter list is described in the following table.

Table 6 Parameter List for collector_func_load()

Parameter	Definition
`name`	The name of the dynamically compiled function that is used by the performance tools. The name does not have to be the actual name of the function. The name need not follow any of the normal naming conventions of functions, although it should not contain embedded blanks or embedded quote characters.
`alias`	An arbitrary string used to describe the function. It can be `NULL`. It is not interpreted in any way, and can contain embedded blanks. It is displayed in the Summary tab of the Analyzer. It can be used to indicate what the function is, or why the function was dynamically constructed.
`sourcename`	The path to the source file from which the function was constructed. It can be `NULL`. The source file is used for annotated source listings.
`vaddr`	The address at which the function was loaded.
`size`	The size of the function in bytes.
`lntsize`	A count of the number of entries in the line number table. It should be zero if line number information is not provided.
`lntable`	A table containing `lntsize` entries, each of which is a pair of integers. The first integer is an offset, and the second integer is a line number. All instructions between an offset in one entry and the offset given in the next entry are attributed to the line number given in the first entry. Offsets must be in increasing numeric order, but the order of line numbers is arbitrary. If `lntable` is `NULL`, no source listings of the function are possible, although disassembly listings are available.
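
The attribution rule for `lntable` can be illustrated with a small lookup. This is a sketch only: the struct lnt_entry is a stand-in that follows the description in the table (an offset and a line number), not the Lineno type from the Collector header, and the function names are hypothetical. The check treats "increasing" as strictly increasing, which is an assumption.

```c
/* Stand-in for one line number table entry, following the table above.
 * The real Lineno type comes from the Collector header file; this
 * layout is an assumption made for illustration. */
typedef struct {
    unsigned int offset;            /* instruction offset in the function */
    unsigned int lineno;            /* source line for that offset onward */
} lnt_entry;

/* Returns 1 if the offsets are in strictly increasing order, else 0. */
int lnt_offsets_increasing(const lnt_entry *tbl, int n)
{
    for (int i = 1; i < n; i++)
        if (tbl[i].offset <= tbl[i - 1].offset)
            return 0;
    return 1;
}

/* Returns the line number to which an instruction offset is attributed:
 * the line of the last entry whose offset is not greater than off.
 * Returns 0 if the table is empty or off precedes the first entry. */
unsigned int lnt_line_for_offset(const lnt_entry *tbl, int n,
                                 unsigned int off)
{
    unsigned int line = 0;

    for (int i = 0; i < n && tbl[i].offset <= off; i++)
        line = tbl[i].lineno;
    return line;
}
```

With the entries (0, 10), (8, 12), and (20, 11), an instruction at offset 4 is attributed to line 10, one at offset 8 to line 12, and one at offset 25 to line 11. The line numbers need not be in order, but the offsets must be.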

`collector_func_unload()` Function

Inform the Collector that the dynamic function at the address vaddr has been unloaded.