The UltraSPARC and Pentium microprocessor families contain hardware performance counters that allow the measurement of many different hardware events related to CPU behavior, including instruction and data cache misses as well as various internal states of the processor. More recent processors allow a variety of events to be captured. The counters can be configured to count user events or system events, or both. The two processor families currently share the restriction that only two event types can be measured simultaneously.
UltraSPARC III and Pentium II processors are able to generate an interrupt on counter overflow, allowing the counters to be used for various forms of profiling.
This manual page describes a set of APIs that allow Solaris applications to use these counters. Applications can measure their own behavior, the behavior of other applications, or the behavior of the whole system.
There are two principal models for using these performance counters. Some users of these statistics wish to observe system-wide behavior; others wish to view the performance counters as part of the register set exported by each LWP. On a machine performing more than one activity, these two models are in conflict because the counters represent a critical hardware resource that cannot simultaneously be both shared and private.
To fully support the two-level threads model in Solaris, it would be necessary to virtualize the performance counters to each thread. This version of the library does not allow per-thread data to be captured unless bound threads are used. Even without bound threads, however, the counters can still be used to assess aggregate program behavior.
Although some events are common to all processors, it is apparent that the counters expose a great deal of the specific implementation details of the processor architecture. For this reason, events are specified by name using a string-based hardware event specification language. The values of the tokens in the language vary from processor model to processor model, and can only be interpreted with reference to the relevant hardware documentation. The functions provided to specify the strings use environment variables or arguments so that the names do not have to be compiled in applications, thus extending their longevity and portability across platforms and processor generations.
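As a minimal sketch of keeping event names out of compiled code, an application might take the event specification from an argument, fall back to an environment variable, and only then use a built-in default. The variable name PERFEVENTS, the default string, and the helper choose_event_spec() are assumptions made for illustration; none of them is defined by libcpc, and valid event names depend on the processor.

```c
#include <stdlib.h>

/* Hypothetical names chosen for this sketch; libcpc defines neither. */
#define EVENT_ENV_VAR "PERFEVENTS"
#define EVENT_DEFAULT "pic0=Cycle_cnt,pic1=Instr_cnt"

/*
 * Pick the event specification string without compiling event names
 * into the program logic: an explicit argument wins, then the
 * environment variable, then the built-in default.
 */
const char *
choose_event_spec(const char *arg)
{
    const char *env;

    if (arg != NULL && arg[0] != '\0')
        return (arg);

    env = getenv(EVENT_ENV_VAR);
    if (env != NULL && env[0] != '\0')
        return (env);

    return (EVENT_DEFAULT);
}
```

The returned string would then be handed to the library's string-to-event translation, so a new processor generation only requires a different string, not a rebuild.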
check the version the application was compiled with against the version of the library
determine the performance counter version of the current CPU
return the corresponding printable string to describe that interface
return the number of valid counter registers in the cpc_event(3CPC) data structure
return a reference to the corresponding processor documentation
Performance counters can be present in hardware but not accessible, either because some of the necessary system software components are unavailable or not installed, or because the counters are in use by other processes. The cpc_access(3CPC) function determines the accessibility of the counters and should be invoked before any attempt to program the counters.
Events are specified using a getsubopt(3C)-style language for both the events and the additional control bits that determine what causes the counters to increment. The cpc_strtoevent() function translates a string to an event specification, which can then be used to program the counters. The cpc_eventtostr() function returns the canonical form of the string that corresponds to a particular event. The cpc_getusage(3CPC) function returns a string that describes the syntax of the event specification string, while cpc_walk_names(3CPC) allows the caller to apply a function to each possible event supported on the relevant processor.
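To illustrate what a getsubopt(3C)-style language looks like, the following sketch parses a comma-separated specification with the standard getsubopt() function. It is not the library's parser: the token names (pic0, pic1, nouser, sys), the struct event_spec type, and parse_event_spec() are assumptions modeled on a two-counter processor. The authoritative syntax for a given machine is whatever cpc_getusage(3CPC) reports.

```c
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <string.h>

/* Token indices; the order must match the tokens[] array below. */
enum { TOK_PIC0, TOK_PIC1, TOK_NOUSER, TOK_SYS };

/* Hypothetical parsed form of a specification (not a libcpc type). */
struct event_spec {
    char pic0[64];      /* event name for the first counter */
    char pic1[64];      /* event name for the second counter */
    int  count_user;    /* count user-mode events (default on) */
    int  count_sys;     /* count system-mode events (default off) */
};

/* Copy an event name, rejecting a missing or over-long value. */
static int
set_name(char *dst, size_t dstlen, const char *value)
{
    if (value == NULL || strlen(value) >= dstlen)
        return (-1);
    strcpy(dst, value);
    return (0);
}

/* Return 0 on success, -1 on a malformed or unknown token. */
int
parse_event_spec(const char *spec, struct event_spec *out)
{
    char *const tokens[] = { "pic0", "pic1", "nouser", "sys", NULL };
    char *copy, *opts, *value;
    int rc = 0;

    memset(out, 0, sizeof (*out));
    out->count_user = 1;

    /* getsubopt() overwrites separators, so work on a private copy. */
    if ((copy = strdup(spec)) == NULL)
        return (-1);
    opts = copy;

    while (*opts != '\0' && rc == 0) {
        switch (getsubopt(&opts, tokens, &value)) {
        case TOK_PIC0:
            rc = set_name(out->pic0, sizeof (out->pic0), value);
            break;
        case TOK_PIC1:
            rc = set_name(out->pic1, sizeof (out->pic1), value);
            break;
        case TOK_NOUSER:
            if (value != NULL) rc = -1; else out->count_user = 0;
            break;
        case TOK_SYS:
            if (value != NULL) rc = -1; else out->count_sys = 1;
            break;
        default:        /* token not in tokens[] */
            rc = -1;
            break;
        }
    }
    free(copy);
    return (rc);
}
```

Under these assumptions, a string such as "pic0=Cycle_cnt,pic1=Instr_cnt,sys" selects one event per counter and enables counting in both user and system mode.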
Each processor on the system possesses its own set of performance counter registers. For a single process, it is often desirable to maintain the illusion that the counters are an intrinsic part of that process (whichever processors it runs on), since this allows the events to be directly attributed to the process without having to quiesce all other activity on the system.
To achieve this behavior, the library associates performance counter context with each LWP in the process; the context consists of a small amount of kernel memory to hold the counter values when the LWP is not running, and some simple kernel functions to save and restore those counter values from and to the hardware registers when the LWP performs a normal context switch. A process can only observe and manipulate its own copy of the performance counter control and data registers.
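The save-and-restore behavior described above can be modeled with a small user-space simulation. This is a conceptual sketch only, not the kernel's implementation: struct cpu, struct lwp_ctx, and every function name here are hypothetical, and the counters are plain integers standing in for hardware registers.

```c
#include <stddef.h>
#include <stdint.h>

#define NPIC 2  /* two counters, matching the two-event limit */

struct cpu     { uint64_t pic[NPIC]; };    /* simulated hardware registers */
struct lwp_ctx { uint64_t saved[NPIC]; };  /* simulated per-LWP save area */

/* Copy the live register values into the LWP's save area. */
static void
ctx_save(struct lwp_ctx *ctx, const struct cpu *cpu)
{
    for (size_t i = 0; i < NPIC; i++)
        ctx->saved[i] = cpu->pic[i];
}

/* Load the LWP's saved values back into the registers. */
static void
ctx_restore(const struct lwp_ctx *ctx, struct cpu *cpu)
{
    for (size_t i = 0; i < NPIC; i++)
        cpu->pic[i] = ctx->saved[i];
}

/*
 * Switch the simulated CPU from one LWP to another. 'from' may be
 * NULL when no LWP with counter context was previously running.
 */
void
context_switch(struct cpu *cpu, struct lwp_ctx *from, struct lwp_ctx *to)
{
    if (from != NULL)
        ctx_save(from, cpu);
    ctx_restore(to, cpu);
}

/* Stand-in for hardware events occurring while an LWP runs. */
void
run_events(struct cpu *cpu, uint64_t ev0, uint64_t ev1)
{
    cpu->pic[0] += ev0;
    cpu->pic[1] += ev1;
}
```

Because the registers are swapped at every switch, each LWP's save area accumulates only the events that occurred while that LWP was running, regardless of how the two are interleaved on the simulated CPU.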
Though applications can be modified to instrument themselves as demonstrated above, it is frequently useful to be able to examine the behavior of an existing application without changing the source code. A separate library, libpctx, provides a simple set of interfaces that use the facilities of proc(4) to control a target process, and together with functions in libcpc, allow truss-like tools to be constructed to measure the performance counters in other applications. An example of one such application is cputrack(1).
The functions in libpctx are independent of those in libcpc. These functions manage a process using an event-loop paradigm: the execution of certain system calls by the controlled process causes the library to stop the controlled process and execute callback functions in the context of the controlling process. These handlers can perform various operations on the target process using APIs in libpctx and libcpc that consume pctx_t handles.
cputrack(1), cpustat(1M), cpc_access(3CPC), cpc_bind_event(3CPC), cpc_count_usr_events(3CPC), cpc_pctx_bind_event(3CPC), cpc_event(3CPC), cpc_event_diff(3CPC), cpc_getcpuver(3CPC), cpc_seterrfn(3CPC), cpc_shared_bind_event(3CPC), cpc_strtoevent(3CPC), cpc_version(3CPC), pctx_capture(3CPC), pctx_set_events(3CPC), proc(4).