Oracle® Developer Studio 12.5: Performance Analyzer


Updated: June 2016
 
 

Preparing Your Program for Data Collection and Analysis

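You do not need to do anything special to prepare most programs for data collection and analysis. Read the relevant subsections below if your program uses dynamically allocated memory, uses system libraries on which the Collector interposes, installs handlers for the profiling signals, or uses setuid or setgid.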

Also, if you want to control data collection from your program during runtime, see Program Control of Data Collection Using libcollector Library.

Using Dynamically Allocated Memory

Many programs rely on dynamically allocated memory, using features such as:

  • malloc, valloc, alloca (C/C++)

  • new (C++)

  • Stack local variables (Fortran)

  • MALLOC, MALLOC64 (Fortran)

You must take care to ensure that a program does not rely on the initial contents of dynamically allocated memory unless the memory allocation method is explicitly documented as setting an initial value. For example, compare the descriptions of calloc and malloc in the man page for malloc(3C).

Occasionally, a program that uses dynamically allocated memory might appear to work correctly when run alone, but might fail when run with performance data collection enabled. Symptoms might include unexpected floating‐point behavior, segmentation faults, or application-specific error messages.

Such behavior might occur if the uninitialized memory is, by chance, set to a benign value when the application is run alone but is set to a different value when the application is run in conjunction with the performance data collection tools. In such cases, the performance tools are not at fault. Any application that relies on the contents of dynamically allocated memory has a latent bug: an operating system is at liberty to provide any content whatsoever in dynamically allocated memory unless explicitly documented otherwise. Even if an operating system happens to always set dynamically allocated memory to a certain value today, such latent bugs might cause unexpected behavior with a later revision of the operating system, or if the program is ported to a different operating system in the future.
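As an illustration of avoiding this latent bug, the following minimal sketch shows two ways to obtain memory whose contents are known before first use. The helper names alloc_zeroed and alloc_filled are hypothetical and are not part of any Oracle library; only calloc and malloc are standard.

```c
#include <stdlib.h>

/* Allocate n ints that are guaranteed to start at zero.
 * calloc is documented to zero-fill the memory it returns. */
int *alloc_zeroed(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Allocate n ints with malloc and set every element explicitly,
 * so the result never depends on what malloc happened to return. */
int *alloc_filled(size_t n, int value)
{
    int *p = malloc(n * sizeof *p);
    if (p == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        p[i] = value;
    return p;
}
```

Either form behaves the same whether the program runs alone or under the performance tools, because no element is read before it has been assigned.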

Using System Libraries

The Collector interposes on functions from various system libraries to collect tracing data and to ensure the integrity of data collection. The following list describes situations in which the Collector interposes on calls to library functions.

  • Collecting synchronization wait tracing data. On Oracle Solaris, the Collector interposes on functions from the Oracle Solaris C library, libc.so.

  • Collecting I/O Trace data. The Collector interposes on the following functions:

    /* interposition function handles */
    static int (*__real_open)(const char *path, int oflag, ...) = NULL;
    #if WSIZE(32)
    static int (*__real_open64)(const char *path, int oflag, ...) = NULL;
    #endif
    static int (*__real_fcntl)(int fildes, int cmd, ...) = NULL;
    static int (*__real_openat)(int fildes, const char *path, int oflag, ...) = NULL;
    static int (*__real_close)(int fildes) = NULL;
    static FILE *(*__real_fopen)(const char *filename, const char *mode) = NULL;
    static int (*__real_fclose)(FILE *stream) = NULL;
    static int (*__real_dup)(int fildes) = NULL;
    static int (*__real_dup2)(int fildes, int fildes2) = NULL;
    static int (*__real_pipe)(int fildes[2]) = NULL;
    static int (*__real_socket)(int domain, int type, int protocol) = NULL;
    static int (*__real_mkstemp)(char *template) = NULL;
    static int (*__real_mkstemps)(char *template, int slen) = NULL;
    static int (*__real_creat)(const char *path, mode_t mode) = NULL;
    
    #if WSIZE(32)
    static int (*__real_creat64)(const char *path, mode_t mode) = NULL;
    #endif
    static FILE *(*__real_fdopen)(int fildes, const char *mode) = NULL;
    
    static ssize_t (*__real_read)(int fildes, void *buf, size_t nbyte) = NULL;
    static ssize_t (*__real_write)(int fildes, const void *buf, size_t nbyte) = NULL;
    static ssize_t (*__real_readv)(int fildes, const struct iovec *iov, int iovcnt) = NULL;
    static ssize_t (*__real_writev)(int fildes, const struct iovec *iov, int iovcnt) = NULL;
    static size_t (*__real_fread)(void *ptr, size_t size, size_t nitems, FILE *stream) = NULL;
    static size_t (*__real_fwrite)(void *ptr, size_t size, size_t nitems, FILE *stream) = NULL;
    static ssize_t (*__real_pread)(int fildes, void *buf, size_t nbyte, off_t offset) = NULL;
    static ssize_t (*__real_pwrite)(int fildes, const void *buf, size_t nbyte, off_t offset) = NULL;
    #if OS(Linux)
    static ssize_t (*__real_pwrite64)(int fildes, const void *buf, size_t nbyte, off64_t offset) = NULL;
    #endif
    static char *(*__real_fgets)(char *s, int n, FILE *stream) = NULL;
    static int (*__real_fputs)(const char *s, FILE *stream) = NULL;
    static int (*__real_fputc)(int c, FILE *stream) = NULL;
    static int (*__real_fprintf)(FILE *stream, const char *format, ...) = NULL;
    static int (*__real_vfprintf)(FILE *stream, const char *format, va_list ap) = NULL;
    
    static off_t (*__real_lseek)(int fildes, off_t offset, int whence) = NULL;
    static offset_t (*__real_llseek)(int fildes, offset_t offset, int whence) = NULL;
    static int (*__real_chmod)(const char *path, mode_t mode) = NULL;
    static int (*__real_access)(const char *path, int amode) = NULL;
    static int (*__real_rename)(const char *old, const char *new) = NULL;
    static int (*__real_mkdir)(const char *path, mode_t mode) = NULL;
    static int (*__real_getdents)(int fildes, struct dirent *buf, size_t nbyte) = NULL;
    static int (*__real_unlink)(const char *path) = NULL;
    static int (*__real_fseek)(FILE *stream, long offset, int whence) = NULL;
    static void (*__real_rewind)(FILE *stream) = NULL;
    static long (*__real_ftell)(FILE *stream) = NULL;
    static int (*__real_fgetpos)(FILE *stream, fpos_t *pos) = NULL;
    static int (*__real_fsetpos)(FILE *stream, const fpos_t *pos) = NULL;
    #if WSIZE(32)
    static int (*__real_fgetpos64)(FILE *stream, fpos64_t *pos) = NULL;
    static int (*__real_fsetpos64)(FILE *stream, const fpos64_t *pos) = NULL;
    #endif // WSIZE(32)
    static int (*__real_fsync)(int fildes) = NULL;
    static struct dirent *(*__real_readdir)(DIR *dirp) = NULL;
    #if OS(Linux)
    static int (*__real_flock)(int fd, int operation) = NULL;
    #endif
    static int (*__real_lockf)(int fildes, int function, off_t size) = NULL;
    static int (*__real_fflush)(FILE *stream) = NULL;
    #if OS(Linux) && ARCH(Intel) && WSIZE(32)
    static FILE *(*__real_fopen_2_1)(const char *filename, const char *mode) = NULL;
    static int (*__real_fclose_2_1)(FILE *stream) = NULL;
    static FILE *(*__real_fdopen_2_1)(int fildes, const char *mode) = NULL;
    static int (*__real_fgetpos_2_2)(FILE *stream, fpos_t *pos) = NULL;
    static int (*__real_fsetpos_2_2)(FILE *stream, const fpos_t *pos) = NULL;
    static FILE *(*__real_fopen_2_0)(const char *filename, const char *mode) = NULL;
    static int (*__real_fclose_2_0)(FILE *stream) = NULL;
    static FILE *(*__real_fdopen_2_0)(int fildes, const char *mode) = NULL;
    static int (*__real_fgetpos_2_0)(FILE *stream, fpos_t *pos) = NULL;
    static int (*__real_fsetpos_2_0)(FILE *stream, const fpos_t *pos) = NULL;
    #endif
  • Collecting heap tracing data. The Collector interposes on the functions malloc, realloc, memalign and free. Versions of these functions are found in the C standard library, libc.so, and also in other libraries such as libumem.so, libmalloc.so, and libmtmalloc.so.

  • Collecting MPI tracing data. The Collector interposes on functions from the specified MPI library.

  • Ensuring the integrity of clock data. The Collector interposes on setitimer and prevents the program from using the profiling timer.

  • Ensuring the integrity of hardware counter data. The Collector interposes on functions from the hardware counter library, libcpc.so, and prevents the program from using the counters. Calls from the program to functions from this library return a value of -1.

  • Enabling data collection on descendant processes. The Collector interposes on the functions fork(2), fork1(2), vfork(2), fork(3F), posix_spawn(3p), posix_spawnp(3p), system(3C), system(3F), sh(3F), popen(3C), and exec(2) and its variants. Calls to vfork are replaced internally by calls to fork1. These interpositions are done for the collect command.

  • Guaranteeing the handling of the SIGPROF and SIGEMT signals by the Collector. The Collector interposes on sigaction to ensure that its signal handler is the primary signal handler for these signals.

    The interposition does not succeed under the following circumstances:

  • Statically linking a program with any of the libraries that contain functions that are interposed.

  • Attaching dbx to a running application that does not have the collector library preloaded.

  • Dynamically loading one of these libraries and resolving the symbols by searching only within the library.

The failure of interposition by the Collector can cause loss or invalidation of performance data.
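The general shape of a wrapper that forwards through a handle to the real function, in the style of the __real_* handles listed above, can be sketched as follows. This is a simplified illustration and not the Collector's code: the wrapper has its own name (traced_write) rather than replacing write, the counters are hypothetical stand-ins for trace records, and the handle is assigned directly, whereas a typical preloaded interposer would look up the next definition of the symbol at run time.

```c
#include <stddef.h>
#include <unistd.h>

/* Handle to the real function, in the style of the __real_* handles. */
static ssize_t (*real_write)(int, const void *, size_t) = NULL;

/* Hypothetical counters standing in for trace records. */
static unsigned long traced_calls = 0;
static unsigned long traced_bytes = 0;

/* Wrapper: forward to the real function, then record what happened. */
ssize_t traced_write(int fildes, const void *buf, size_t nbyte)
{
    if (real_write == NULL)
        real_write = write;  /* resolved lazily on first use */

    ssize_t result = real_write(fildes, buf, nbyte);

    traced_calls++;
    if (result > 0)
        traced_bytes += (unsigned long)result;
    return result;
}
```

The sketch also suggests why static linking defeats interposition: when the call to the library function is bound at link time, no wrapper sits between the caller and the library, so nothing is recorded.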

The er_sync.so, er_iotrace.so, er_heap.so, and er_mpviewn.so (where n indicates the MPI version) libraries are loaded only if synchronization wait tracing data, I/O tracing data, heap tracing data, or MPI tracing data, respectively, are requested.

Data Collection and Signals

Signals are used for both clock profiling and hardware counter profiling. SIGPROF is used in data collection for all experiments. The period for generating the signal depends on the data being collected. SIGEMT on Solaris or SIGIO on Linux is used for hardware counter profiling. The overflow interval depends on the user parameter for profiling. Any user code that uses or manipulates the profiling signals may potentially interfere with data collection. When the Collector installs its signal handler for a profile signal, it sets a flag that ensures that system calls are not interrupted to deliver signals. This setting could change the behavior of a target program that uses the profiling signals for other purposes.

When the Collector installs its signal handler for a profile signal, it remembers whether or not the target had installed its own signal handler. The Collector also interposes on some signal-handling routines and does not allow the user to install a signal handler for these signals; it saves the user's handler, just as it does when the Collector replaces a user handler on starting the experiment.

Profiling signals are delivered from the profiling timer or hardware-counter-overflow handling code in the kernel, or in response to the kill(2), sigsend(2), tkill(2), tgkill(2), or _lwp_kill(2) system calls, the raise(3C) and sigqueue(3C) library calls, or the kill command. A signal code is delivered with the signal so that the Collector can distinguish the origin. If it is delivered for profiling, it is processed by the Collector; if it is not delivered for profiling, it is delivered to the target signal handler.
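The idea of telling origins apart by signal code can be sketched with a handler that records si_code. This is an illustration only, not the Collector's logic: the function names are hypothetical, and the test in sent_by_process relies on the XSI convention that a signal generated by a process carries an si_code less than or equal to zero. Platforms differ in the details (Linux timers, for example, also use negative codes), so treat the comparison as an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Last signal code seen; starts positive, meaning "nothing recorded yet". */
static volatile sig_atomic_t last_si_code = 1;

/* Handler that only records the signal code delivered with the signal. */
static void record_origin(int sig, siginfo_t *info, void *context)
{
    (void)sig;
    (void)context;
    last_si_code = info->si_code;
}

/* Install the recording handler for sig; returns 0 on success. */
int install_origin_recorder(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = record_origin;
    sa.sa_flags = SA_SIGINFO;   /* request the three-argument handler */
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, NULL);
}

/* Assumed convention: codes <= 0 mean the signal came from a process
 * (kill, raise, sigqueue, and similar calls), not from the kernel. */
int sent_by_process(int si_code)
{
    return si_code <= 0;
}
```

After install_origin_recorder(SIGUSR1) and raise(SIGUSR1), sent_by_process(last_si_code) is expected to be true, because the signal came from a library call and not from a timer or counter overflow.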

When the Collector is running under dbx, the profiling signal delivered occasionally has its signal code corrupted, and a profile signal may be treated as if it were generated from a system or library call or a command. In that case, it is incorrectly delivered to the user's handler. If the user handler was set to SIG_DFL, the process fails and dumps core.

When the Collector is invoked after attaching to a target process, it installs its signal handler, but it cannot interpose on the signal-handling routines. If the user code installs a signal handler after the attach, it overrides the Collector's signal handler, and data is lost.

Note that any signal, including either of the profiling signals, might cause premature termination of a system call. The program must be prepared to handle that behavior. When libcollector installs the signal handlers for data collection, it specifies that system calls that can be restarted are restarted. However, some system calls, such as sleep(3C), return early without reporting an error.
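One way a program can prepare for early returns from a sleep is to loop until the full interval has elapsed. The following sketch uses nanosleep, which reports the unslept time when a signal interrupts it. The function name sleep_fully is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for the whole interval even if signals interrupt the call.
 * Returns 0 on success, or -1 on an error other than an interruption. */
int sleep_fully(time_t seconds, long nanoseconds)
{
    struct timespec request = { .tv_sec = seconds, .tv_nsec = nanoseconds };
    struct timespec remaining;

    while (nanosleep(&request, &remaining) == -1) {
        if (errno != EINTR)
            return -1;          /* for example, an invalid interval */
        request = remaining;    /* a signal arrived: sleep the rest */
    }
    return 0;
}
```

With profiling enabled, a signal can arrive many times per second, so a loop of this kind keeps the program's timing independent of whether data collection is active.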

Sample and Pause-Resume Signals

Signals may be specified by the user as a sample signal (-l) or a pause-resume signal (-y). SIGUSR1 or SIGUSR2 are recommended for this use, but any signal that is not used by the target may be used.
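To check from inside a target whether a candidate signal already has a non-default disposition, a query with sigaction is enough. The following sketch is an assumption-laden aid, not part of the Collector: the function name signal_in_use is hypothetical, and the result reflects only the moment of the call, so a handler installed later is not detected.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Returns 1 if sig currently has a handler or is ignored,
 * 0 if it still has the default disposition, -1 on error. */
int signal_in_use(int sig)
{
    struct sigaction current;

    /* A NULL new action queries the disposition without changing it. */
    if (sigaction(sig, NULL, &current) == -1)
        return -1;
    if (current.sa_flags & SA_SIGINFO)
        return 1;               /* three-argument handler installed */
    return current.sa_handler != SIG_DFL;
}
```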

The profiling signals may be used if the process does not otherwise use them, but they should be used only if no other signal is available. The Collector interposes on some signal-handling routines and does not allow the user to install a signal handler for these signals; it saves the user's handler, just as it does when the Collector replaces a user handler on starting the experiment.

If the Collector is invoked after attaching to a target process, and the user code installs a signal handler for the sample or pause-resume signal, those signals will no longer operate as specified.

Using setuid and setgid

Restrictions enforced by the dynamic loader make it difficult to use setuid(2) and collect performance data. If your program calls setuid or executes a setuid file, the Collector probably cannot write an experiment file because it lacks the necessary permissions for the new user ID.

The collect command operates by inserting a shared library, libcollector.so, into the target's address space (LD_PRELOAD). Several problems might arise if you invoke the collect command on executables that call setuid or setgid, or that create descendant processes that call setuid or setgid. If you are not root when you run an experiment, collection fails because the shared libraries are not installed in a trusted directory. The workaround is to run the experiments as root, or use crle(1) to grant permission. Take great care when circumventing security barriers; you do so at your own risk.

When running the collect command, your umask must be set to allow write permission for you, for any users or groups that are set by the setuid attributes and setgid attributes of a program being executed with exec(), and for any user or group to which that program sets itself. If the mask is not set properly, some files might not be written to the experiment, and processing of the experiment might not be possible. If the log file can be written, an error is shown when you attempt to process the experiment.
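The umask requirement can be checked with a few lines of code. The sketch below is illustrative only; the function names are hypothetical, and it checks owner and group write permission, which is a simplification of the rule stated above.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Read the current umask and restore it immediately. */
mode_t current_umask(void)
{
    mode_t mask = umask(0);
    umask(mask);
    return mask;
}

/* Returns 1 if files created under mask keep write permission for the
 * owner, and also for the group when need_group is nonzero. */
int umask_allows_write(mode_t mask, int need_group)
{
    mode_t required = S_IWUSR;
    if (need_group)
        required |= S_IWGRP;
    return (mask & required) == 0;
}
```

For example, a umask of 022 keeps write permission for the owner but removes it for the group, so it is suitable only when the program runs under a single user ID for the whole experiment.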

Other problems can arise if the target itself makes any of the system calls to set UID or GID, or if it changes its umask and then forks or runs exec() on some other executable, or if crle was used to configure how the runtime linker searches for shared objects.

If an experiment is started as root on a target that changes its effective GID, the er_archive process that is automatically run when the experiment terminates fails because it needs a shared library that is not marked as trusted. In that case, you can run the er_archive utility (or the er_print utility or the analyzer command) immediately following the termination of the experiment, on the machine on which the experiment was recorded.

Program Control of Data Collection Using libcollector Library

If you want to control data collection from your program, the Collector shared library, libcollector.so, contains some API functions that you can use. The functions are written in C. A Fortran interface is also provided. Both C and Fortran interfaces are defined in header files that are provided with the library.

The API functions are defined as follows.

void collector_sample(char *name);
void collector_pause(void);
void collector_resume(void);
void collector_terminate_expt(void);

Similar functionality is provided for Java programs by the CollectorAPI class, which is described in Java Interface.

C and C++ Interface

You can access the C and C++ interface of the Collector API by including collectorAPI.h and linking with -lcollectorAPI, which contains real functions to check for the existence of the underlying libcollector.so API functions.

If no experiment is active, the API calls are ignored.
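A typical use of the C interface is to record event data only around a region of interest. The following sketch assumes a hypothetical build flag, HAVE_COLLECTOR_API, that selects the real header; without it, local stand-ins with the signatures listed earlier are used so that the example builds on a system where the library is not installed. The workload hot_loop and the wrapper profile_only_hot_loop are also hypothetical.

```c
#ifdef HAVE_COLLECTOR_API        /* hypothetical flag: build against the library */
#include "collectorAPI.h"
#else
/* Stand-ins that only count calls; they are not the real API functions. */
static int stub_calls = 0;
static void collector_sample(char *name) { (void)name; stub_calls++; }
static void collector_pause(void)  { stub_calls++; }
static void collector_resume(void) { stub_calls++; }
#endif

/* Hypothetical workload: sum of 1..n. */
static long hot_loop(long n)
{
    long sum = 0;
    for (long i = 1; i <= n; i++)
        sum += i;
    return sum;
}

/* Record event data for hot_loop only, then mark the end with a sample. */
long profile_only_hot_loop(long n)
{
    char label[] = "hot_loop done";

    collector_resume();          /* start writing event-specific data */
    long result = hot_loop(n);
    collector_sample(label);     /* labeled sample at the end of the region */
    collector_pause();           /* stop writing event-specific data */
    return result;
}
```

In a real build the program would be linked with -lcollectorAPI and would normally call collector_pause() once near the start of main so that nothing outside the region is recorded. The calls have no effect unless the program is run under the Collector.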

Fortran Interface

The Fortran API libfcollector.h file defines the Fortran interface to the library. The application must be linked with -lcollectorAPI to use this library. (An alternate name for the library, -lfcollector, is provided for backward compatibility.) The Fortran API provides the same features as the C and C++ API, excluding the dynamic function and thread pause and resume calls.

Insert the following statement to use the API functions for Fortran:

include "libfcollector.h"

Note -  Do not link a program in any language with -lcollector. If you do, the Collector might exhibit unpredictable behavior.

Java Interface

Use the following statement to import the CollectorAPI class and access the Java API. Note, however, that your application must be invoked with a classpath pointing to /installation-directory/lib/collector.jar, where installation-directory is the directory in which the Oracle Developer Studio software is installed.

import com.sun.forte.st.collector.CollectorAPI;

The Java CollectorAPI methods are defined as follows:

CollectorAPI.sample(String name)
CollectorAPI.pause()
CollectorAPI.resume()
CollectorAPI.terminate()

The Java API includes the same functions as the C and C++ API, excluding the dynamic function API.

The C include file libcollector.h contains macros that bypass the calls to the real API functions if data is not being collected. In this case, the functions are not dynamically loaded. However, using these macros is risky because the macros do not work well under some circumstances. Using collectorAPI.h is safer because it does not use macros. Rather, it refers directly to the functions.

The Fortran API subroutines call the C API functions if performance data is being collected; otherwise, they return. The overhead for the checking is very small and should not significantly affect program performance.

To collect performance data you must run your program using the Collector, as described later in this chapter. Inserting calls to the API functions does not enable data collection.

If you intend to use the API functions in a multithreaded program, ensure that they are called only by one thread. The API functions perform actions that apply to the process and not to individual threads. If each thread calls the API functions, the data that is recorded might not be what you expect. For example, if collector_pause() or collector_terminate_expt() is called by one thread before the other threads have reached the same point in the program, collection is paused or terminated for all threads, and data can be lost from the threads that were executing code before the API call.
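One way to make sure that a process-wide call is issued by a single thread is to guard it with an atomic flag. The following sketch uses C11 atomics; the names terminate_claimed and claim_terminate are hypothetical. The guard only prevents duplicate calls. It does not make the other threads reach the same point first, so the timing concern described above still has to be handled by the program, for example with a barrier.

```c
#include <stdatomic.h>

/* Set by the first thread that asks to end the experiment. */
static atomic_flag terminate_claimed = ATOMIC_FLAG_INIT;

/* Returns 1 for exactly one caller (the first) and 0 for all others. */
int claim_terminate(void)
{
    /* test_and_set returns the previous value: false only the first time. */
    return !atomic_flag_test_and_set(&terminate_claimed);
}
```

A thread would then write `if (claim_terminate()) collector_terminate_expt();` so that the API function is called once per process no matter how many threads reach that statement.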

C, C++, Fortran, and Java API Functions

This section provides descriptions of the API functions.

  • C and C++: collector_sample(char *name)

    Fortran: collector_sample(string name)

    Java: CollectorAPI.sample(String name)

    Record a sample packet and label the sample with the given name or string. The label is displayed by Performance Analyzer in the Selection Details window when you select a sample in the Timeline view. The Fortran argument string is of type character.

    Sample points contain data for the process and not for individual threads. In a multithreaded application, the collector_sample() API function ensures that only one sample is written if another call is made while it is recording a sample. The number of samples recorded can be less than the number of threads making the call.

    Performance Analyzer does not distinguish between samples recorded by different mechanisms. If you want to see only the samples recorded by API calls, you should turn off all other sampling modes when you record performance data.

  • C, C++, Fortran: collector_pause()

    Java: CollectorAPI.pause()

    Stop writing event-specific data to the experiment. The experiment remains open, and global data continues to be written. The call is ignored if no experiment is active or if data recording is already stopped. This function stops the writing of all event-specific data even if it is enabled for specific threads by the collector_thread_resume() function.

  • C, C++, Fortran: collector_resume()

    Java: CollectorAPI.resume()

    Resume writing event-specific data to the experiment after a call to collector_pause(). The call is ignored if no experiment is active or if data recording is active.

  • C, C++, Fortran: collector_terminate_expt()

    Java: CollectorAPI.terminate()

    Terminate the experiment whose data is being collected. No further data is collected, but the program continues to run normally. The call is ignored if no experiment is active.

Dynamic Functions and Modules

If your C or C++ program dynamically compiles functions into the data space of the program, you must supply information to the Collector if you want to see data for the dynamic function or module in Performance Analyzer. The information is passed by calls to collector API functions. The definitions of the API functions are as follows.

void collector_func_load(char *name, char *alias,
    char *sourcename, void *vaddr, int size, int lntsize,
    Lineno *lntable);
void collector_func_unload(void *vaddr);

You do not need to use these API functions for Java methods that are compiled by the Java HotSpot virtual machine, for which a different interface is used. The Java interface provides the name of the method that was compiled to the Collector. You can see function data and annotated disassembly listings for Java compiled methods, but not annotated source listings.

This section provides descriptions of the API functions.

collector_func_load() Function

Pass information about dynamically compiled functions to the Collector for recording in the experiment. The parameter list is described in the following table.

Table 6  Parameter List for collector_func_load()

  • name

    The name of the dynamically compiled function that is used by the performance tools. The name does not have to be the actual name of the function. The name need not follow any of the normal naming conventions of functions, although it should not contain embedded blanks or embedded quote characters.

  • alias

    An arbitrary string used to describe the function. It can be NULL. It is not interpreted in any way, and can contain embedded blanks. It is displayed in the Summary tab of the Analyzer. It can be used to indicate what the function is, or why the function was dynamically constructed.

  • sourcename

    The path to the source file from which the function was constructed. It can be NULL. The source file is used for annotated source listings.

  • vaddr

    The address at which the function was loaded.

  • size

    The size of the function in bytes.

  • lntsize

    A count of the number of entries in the line number table. It should be zero if line number information is not provided.

  • lntable

    A table containing lntsize entries, each of which is a pair of integers. The first integer is an offset, and the second entry is a line number. All instructions between an offset in one entry and the offset given in the next entry are attributed to the line number given in the first entry. Offsets must be in increasing numeric order, but the order of line numbers is arbitrary. If lntable is NULL, no source listings of the function are possible, although disassembly listings are available.
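The attribution rule for the line number table can be illustrated with a small lookup. The type LineEntry below is a hypothetical stand-in for one table entry (an offset and a line number) and is not the library's Lineno type. Two points are assumptions of this sketch: offsets are treated as strictly increasing, and instructions at or beyond the last offset are attributed to the last entry's line.

```c
#include <stddef.h>

/* Hypothetical stand-in for one line number table entry. */
typedef struct {
    unsigned int offset;   /* byte offset from the start of the function */
    unsigned int lineno;   /* source line attributed from this offset on */
} LineEntry;

/* Returns 1 if the offsets are strictly increasing, 0 otherwise. */
int table_is_ordered(const LineEntry *table, int count)
{
    for (int i = 1; i < count; i++)
        if (table[i].offset <= table[i - 1].offset)
            return 0;
    return 1;
}

/* Returns the line attributed to the instruction at offset, or 0 if the
 * table is empty or the offset precedes the first entry. */
unsigned int line_for_offset(const LineEntry *table, int count,
                             unsigned int offset)
{
    unsigned int line = 0;
    for (int i = 0; i < count && table[i].offset <= offset; i++)
        line = table[i].lineno;
    return line;
}
```

With the entries (0, 10), (8, 12), and (20, 11), an instruction at offset 7 is attributed to line 10 and an instruction at offset 8 to line 12. The line numbers are not in order, which the rule allows.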

collector_func_unload() Function

Inform the Collector that the dynamic function at the address vaddr has been unloaded.