Programming Interfaces Guide

Chapter 1 Memory and CPU Management

This chapter describes an application developer's view of virtual memory and CPU management in the Solaris operating system.

Memory Management Interfaces

Applications use the virtual memory facilities through several sets of interfaces. This section summarizes these interfaces and provides examples of their use.

Creating and Using Mappings

mmap(2) establishes a mapping of a named file system object into a process address space. A named file system object can also be partially mapped into a process address space. This basic memory management interface is very simple. Use open(2) to open the file, then use mmap(2) to create the mapping with appropriate access and sharing options. Then, proceed with your application.

The mapping established by mmap(2) replaces any previous mappings for the specified address range.

The flags MAP_SHARED and MAP_PRIVATE specify the type of mapping. You must specify a mapping type. If the MAP_SHARED flag is set, write operations modify the mapped object. No further operations on the object are needed to make the change. If the MAP_PRIVATE flag is set, the first write operation to the mapped area creates a copy of the page. All further write operations reference the copy. Only modified pages are copied.
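The difference between the two mapping types can be observed by storing through a mapping and then reading the file back with pread(2). The following is a minimal sketch, not a definitive implementation. The helper name first_byte_after_mapped_write() and the scratch file template under /tmp are assumptions made for illustration.

```c
#define _DEFAULT_SOURCE /* glibc only: expose mkstemp/pread under strict -std modes */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Create a 16-byte scratch file filled with 'a', store `value` into its
 * first byte through a mapping of type map_type (MAP_SHARED or
 * MAP_PRIVATE), and return the first byte of the file as read back with
 * pread(2). Returns -1 on any error.
 */
int
first_byte_after_mapped_write(int map_type, char value)
{
    char path[] = "/tmp/mapdemoXXXXXX";   /* hypothetical scratch file */
    char init[16];
    char onebyte;
    char *p;
    int fd;
    int result = -1;

    if ((fd = mkstemp(path)) == -1)
        return -1;
    (void) unlink(path);                  /* file lives until fd is closed */

    memset(init, 'a', sizeof (init));
    if (write(fd, init, sizeof (init)) != (ssize_t)sizeof (init))
        goto out;

    p = mmap(NULL, sizeof (init), PROT_READ | PROT_WRITE, map_type, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    p[0] = value;                         /* store through the mapping */

    if (pread(fd, &onebyte, 1, 0) == 1)
        result = (unsigned char)onebyte;
    (void) munmap(p, sizeof (init));
out:
    (void) close(fd);
    return result;
}
```

Called with MAP_SHARED, the function is expected to return the stored value, because the store modifies the mapped object. Called with MAP_PRIVATE, it is expected to return 'a', because the store goes to a private copy of the page.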

A mapping type is retained across a fork(2).

After you have established the mapping through mmap(2), the file descriptor used in the call is no longer needed. If you close the file, the mapping remains until munmap(2) undoes the mapping. Creating a new mapping replaces an existing mapping.

A mapped file can be shortened by a call to truncate. An attempt to access the area of the file that no longer exists causes a SIGBUS signal.

Mapping /dev/zero gives the calling program a block of zero-filled virtual memory. The size of the block is specified in the call to mmap(2). The following code fragment demonstrates a use of this technique to create a block of zeroed storage in a program. The block's address is chosen by the system.

removed to fr.ch4/pl1.create.mapping.c
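The original fragment is not reproduced in this text. A minimal sketch of the technique, assuming a hypothetical helper named zeroed_block(), might look like the following. Passing NULL as the address lets the system choose where the block is placed.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Return len bytes of zero-filled, private, read-write memory obtained
 * by mapping /dev/zero, or NULL on failure. The system chooses the
 * address of the block.
 */
void *
zeroed_block(size_t len)
{
    int fd;
    void *p;

    if ((fd = open("/dev/zero", O_RDWR)) == -1)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    (void) close(fd);       /* the mapping remains after the close */
    return (p == MAP_FAILED) ? NULL : p;
}
```

The caller releases the block with munmap(2), passing the same length that was given to zeroed_block().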

Some devices or files are useful only when accessed by mapping. Frame buffer devices used to support bit-mapped displays are an example of this phenomenon. Display management algorithms are much simpler to implement when the algorithms operate directly on the addresses of the display.

Removing Mappings

munmap(2) removes all mappings of pages in the specified address range of the calling process. munmap(2) has no effect on the objects that were mapped.

Cache Control

The virtual memory system in SunOS is a cache system, in which processor memory buffers data from file system objects. Interfaces are provided to control or interrogate the status of the cache.

Using mincore

The mincore(2) interface determines the residency of the memory pages in the address space covered by mappings in the specified range. Because the status of a page can change after mincore checks the page but before mincore returns the data, returned information can be outdated. Only locked pages are guaranteed to remain in memory.
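As an illustration, the following sketch counts the pages that mincore(2) reports as resident in a range. The helper name resident_pages() is an assumption. The type of the status vector differs between systems (char on some, unsigned char on others), so the sketch passes it through a void * cast. The result is only a snapshot, for the reason given above.

```c
#define _DEFAULT_SOURCE /* glibc only: expose mincore under strict -std modes */
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Return the number of pages in [addr, addr + len) that mincore(2)
 * reports as resident, or -1 on error. addr must be page-aligned and
 * len must be greater than zero.
 */
long
resident_pages(void *addr, size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    unsigned char *vec;
    size_t npages;
    size_t i;
    long count = 0;

    if (pagesize <= 0 || len == 0)
        return -1;
    npages = (len + (size_t)pagesize - 1) / (size_t)pagesize;
    if ((vec = malloc(npages)) == NULL)
        return -1;

    /* One status byte per page; the cast hides the char/unsigned char difference. */
    if (mincore(addr, len, (void *)vec) == -1) {
        free(vec);
        return -1;
    }
    for (i = 0; i < npages; i++) {
        if (vec[i] & 1)     /* low-order bit set: page is resident */
            count++;
    }
    free(vec);
    return count;
}
```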

Using mlock and munlock

mlock(3C) causes the pages in the specified address range to be locked in physical memory. References to locked pages in this process or in other processes do not result in page faults that require an I/O operation. Because locking pages interferes with the normal operation of virtual memory and can slow other processes, the use of mlock is limited to the superuser. The number of pages that can be locked in memory is limited by system configuration. The call to mlock fails if this limit is exceeded.

munlock releases the locks on physical pages. If multiple mlock calls are made on an address range of a single mapping, a single munlock call releases the locks. However, if different mappings to the same pages are locked by mlock, the pages are not unlocked until the locks on all the mappings are released.
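A typical pattern is to lock a buffer, work on the buffer, and then unlock it. The following sketch assumes a hypothetical helper named fill_locked(). Some systems require the address passed to mlock to be a multiple of the page size, so callers should pass a page-aligned buffer. The call can fail for an unprivileged caller or when the configured limit is exceeded, and the sketch reports that failure rather than continuing unlocked.

```c
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Lock buf in physical memory, fill it with `byte`, then release the
 * lock. Returns 0 on success, or -1 with errno set if the range could
 * not be locked or unlocked.
 */
int
fill_locked(void *buf, size_t len, int byte)
{
    if (mlock(buf, len) == -1)
        return -1;              /* for example, insufficient privilege */

    memset(buf, byte, len);     /* pages stay resident while locked */

    return munlock(buf, len);
}
```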

Removing a mapping also releases locks, either through being replaced with an mmap(2) operation or removed with munmap(2).

The copy-on-write event that is associated with a MAP_PRIVATE mapping transfers a lock on the source page to the destination page. Thus locks on an address range that includes MAP_PRIVATE mappings are retained transparently along with the copy-on-write redirection. For a discussion of this redirection, see Creating and Using Mappings.

Using mlockall and munlockall

mlockall(3C) and munlockall(3C) are similar to mlock and munlock, but mlockall and munlockall operate on entire address spaces. mlockall sets locks on all pages in the address space and munlockall removes all locks on all pages in the address space, whether established by mlock or mlockall.

Using msync

msync(3C) causes all modified pages in the specified address range to be flushed to the objects mapped by those addresses. This interface is similar to fsync(3C), which operates on files.
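The following sketch shows one way to combine a MAP_SHARED mapping with msync. The helper name store_and_sync() is an assumption. The MS_SYNC flag asks msync to return only after the write-back has completed.

```c
#define _DEFAULT_SOURCE /* glibc only: expose mkstemp/pread under strict -std modes */
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Overwrite the first len bytes of the file open on fd through a shared
 * mapping, then flush the modified pages to the file. fd must be open
 * for reading and writing, the file must hold at least len bytes, and
 * len must be greater than zero. Returns 0 on success, -1 on error.
 */
int
store_and_sync(int fd, const char *data, size_t len)
{
    char *p;
    int rc;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    memcpy(p, data, len);           /* modifies the mapped object */
    rc = msync(p, len, MS_SYNC);    /* wait for the write-back to finish */

    (void) munmap(p, len);
    return rc;
}
```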

Library-Level Dynamic Memory

Library-level dynamic memory allocation provides an easy-to-use interface to dynamic memory allocation.

Dynamic Memory Allocation

The most often used interfaces are malloc(3C), free(3C), and calloc(3C).

Other dynamic memory allocation interfaces are memalign(3C), valloc(3C), and realloc(3C).
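As a brief illustration of these interfaces working together, the following sketch implements a growable array of integers. The struct intvec type and its three functions are hypothetical names chosen for this example. The sketch uses malloc for the descriptor, calloc for zero-filled element storage, realloc to grow the storage, and free to release both.

```c
#include <stdlib.h>

/* A growable array of int; all names here are illustrative. */
struct intvec {
    int *data;
    size_t len;     /* elements in use */
    size_t cap;     /* elements allocated */
};

/* Allocate a vector with room for cap zero-filled elements (at least 1). */
struct intvec *
intvec_new(size_t cap)
{
    struct intvec *v = malloc(sizeof (*v));

    if (v == NULL)
        return NULL;
    if (cap == 0)
        cap = 1;
    v->data = calloc(cap, sizeof (*v->data));
    if (v->data == NULL) {
        free(v);
        return NULL;
    }
    v->len = 0;
    v->cap = cap;
    return v;
}

/* Append value, doubling the storage when full. Returns 0 or -1. */
int
intvec_push(struct intvec *v, int value)
{
    if (v->len == v->cap) {
        size_t ncap = v->cap * 2;
        int *nd = realloc(v->data, ncap * sizeof (*nd));

        if (nd == NULL)
            return -1;      /* the old block is still valid */
        v->data = nd;
        v->cap = ncap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Release the element storage and the descriptor. */
void
intvec_free(struct intvec *v)
{
    if (v != NULL) {
        free(v->data);
        free(v);
    }
}
```

Assigning the result of realloc to a temporary pointer, as in intvec_push(), avoids losing the original block when the call fails.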

Dynamic Memory Debugging

The Sun WorkShop package of tools is useful in finding and eliminating errors in dynamic memory use. The Run Time Checking (RTC) facility of the Sun WorkShop uses the functions that are described in this section to find errors in dynamic memory use.

RTC does not require the program to be compiled using -g in order to find all errors. However, symbolic (-g) information is sometimes needed to guarantee the correctness of certain error reports, particularly reports of reads from uninitialized memory (rui). For this reason, certain errors are suppressed if no symbolic information is available. These errors are rui for a.out and rui + aib + air for shared libraries. This behavior can be changed by using suppress and unsuppress.

check -access

The -access option turns on access checking. RTC reports the following errors:

baf

Bad free

duf

Duplicate free

maf

Misaligned free

mar

Misaligned read

maw

Misaligned write

oom

Out of memory

rua

Read from unallocated memory

rui

Read from uninitialized memory

wro

Write to read-only memory

wua

Write to unallocated memory

The default behavior is to stop the process after detecting each access error. This behavior can be changed using the rtc_auto_continue dbxenv variable. When set to on, RTC logs access errors to a file. The file name is determined by the value of the rtc_error_log_file_name dbxenv variable. By default, each unique access error is only reported the first time the error happens. Change this behavior using the rtc_auto_suppress dbxenv variable. The default setting of this variable is on.

check -leaks [-frames n] [-match m]

The -leaks option turns on leak checking. RTC reports the following errors:

aib

Possible memory leak – The only pointer points in the middle of the block

air

Possible memory leak – The pointer to the block exists only in register

mel

Memory leak – No pointers to the block

With leak checking turned on, you get an automatic leak report when the program exits. All leaks, including potential leaks, are reported at that time. By default, a non-verbose report is generated. This default is controlled by the rtc_mel_at_exit dbxenv variable. However, you can ask for a leak report at any time.

The -frames n variable displays up to n distinct stack frames when reporting leaks. The -match m variable combines leaks. If the call stack at the time of allocation for two or more leaks matches m frames, these leaks are reported in a single combined leak report. The default value of n is the larger of 8 or the value of m. The maximum value of n is 16. The default value of m is 2.

check -memuse [-frames n] [-match m]

The -memuse option turns on memory use (memuse) checking. Using check -memuse implies using check -leaks. In addition to a leak report at program exit, you also get a report listing blocks in use (biu). By default, a non-verbose report on blocks in use is generated. This default is controlled by the rtc_biu_at_exit dbxenv variable. At any time during program execution, you can see where the memory in your program has been allocated.

The -frames n and -match m options function as described for check -leaks.

check -all [-frames n] [-match m]

Equivalent to check -access; check -memuse [-frames n] [-match m]. The value of rtc_biu_at_exit dbxenv variable is not changed with check -all. So, by default, no memory use report is generated at exit.

check [funcs] [files] [loadobjects]

Equivalent to check -all; suppress all; unsuppress all in funcs files loadobjects. You can use this option to focus RTC on places of interest.

Other Memory Control Interfaces

This section discusses additional memory control interfaces.

Using sysconf

sysconf(3C) returns the system dependent size of a memory page. For portability, applications should not embed any constants that specify the size of a page. Note that varying page sizes are not unusual, even among implementations of the same instruction set.
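For example, a program that needs to size a mapping in whole pages can query the page size at run time. The following sketch assumes a hypothetical helper named round_up_to_page().

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Round len up to a whole number of pages, using the page size reported
 * by sysconf(3C) at run time. Returns 0 if len is 0 or if the page size
 * cannot be determined.
 */
size_t
round_up_to_page(size_t len)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    size_t ps;

    if (pagesize <= 0)
        return 0;
    ps = (size_t)pagesize;
    return ((len + ps - 1) / ps) * ps;
}
```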

Using mprotect

mprotect(2) assigns the specified protection to all pages in the specified address range. The protection cannot exceed the permissions that are allowed on the underlying object.
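One common use of mprotect is to fill a region while the region is writable and then remove write permission. The following is a minimal sketch under that assumption. The helper name sealed_copy() is hypothetical, and the region is obtained by mapping /dev/zero so that the start address is page-aligned.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Return a read-only copy of the len bytes at src, or NULL on failure.
 * The copy is made in a writable mapping of /dev/zero, and write
 * permission is then removed with mprotect(2).
 */
const void *
sealed_copy(const void *src, size_t len)
{
    void *p;
    int fd;

    if ((fd = open("/dev/zero", O_RDWR)) == -1)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    (void) close(fd);
    if (p == MAP_FAILED)
        return NULL;

    memcpy(p, src, len);

    if (mprotect(p, len, PROT_READ) == -1) {
        (void) munmap(p, len);
        return NULL;
    }
    return p;       /* a later store to this range raises a signal */
}
```

The caller releases the copy with munmap(2).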

Using brk and sbrk

A break is the greatest valid data address in the process image that is not in the stack. When a program starts executing, the break value is normally set by execve(2) to the greatest address defined by the program and its data storage.

Use brk(2) to set the break to a greater address. You can also use sbrk(2) to add an increment of storage to the data segment of a process. You can get the maximum possible size of the data segment by a call to getrlimit(2).

int
brk(void *addr);

void *
sbrk(intptr_t incr);

brk identifies the lowest data segment location not used by the caller as addr. This location is rounded up to the next multiple of the system page size.

sbrk, the alternate interface, adds incr bytes to the caller data space and returns a pointer to the start of the new data area.
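The following sketch shows sbrk and getrlimit(2) used together. The helper name grow_data_segment() is an assumption. The check against RLIMIT_DATA only rejects a single increment that is larger than the soft limit. The system still enforces the limit on the total size of the data segment.

```c
#define _DEFAULT_SOURCE /* glibc only: expose sbrk under strict -std modes */
#include <stdint.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Add incr bytes to the data segment. Returns a pointer to the start of
 * the new area, or NULL if incr is not positive, if incr alone exceeds
 * the RLIMIT_DATA soft limit, or if sbrk fails.
 */
void *
grow_data_segment(intptr_t incr)
{
    struct rlimit rl;
    void *old;

    if (incr <= 0)
        return NULL;

    /* Reject an increment that can never fit under the soft limit. */
    if (getrlimit(RLIMIT_DATA, &rl) == 0 &&
        rl.rlim_cur != RLIM_INFINITY && (rlim_t)incr > rl.rlim_cur)
        return NULL;

    old = sbrk(incr);       /* returns the previous break on success */
    return (old == (void *)-1) ? NULL : old;
}
```

Programs that also call malloc(3C) or related allocators should generally avoid moving the break directly, because the allocator might manage the same region.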

CPU Performance Counters

This section describes developer interfaces for use of CPU Performance counters (CPC). Solaris applications can use CPC independent of the underlying counter architecture.

API Additions to libcpc

This section covers recent additions to the libcpc(3LIB) library. Please see the libcpc man page for information on older interfaces.

Initialization Interfaces

An application preparing to use the CPC facility initializes the library with a call to the cpc_open() function. This function returns a cpc_t * handle that the other interfaces take as a parameter. The syntax for the cpc_open() function is as follows:

cpc_t *cpc_open(int ver);

The value of the ver parameter identifies the version of the interface that the application is using. The cpc_open() function fails if the underlying counters are inaccessible or unavailable.

Hardware Query Interfaces

uint_t cpc_npic(cpc_t *cpc);
uint_t cpc_caps(cpc_t *cpc);
void cpc_walk_events_all(cpc_t *cpc, void *arg,
          void (*action)(void *arg, const char *event));
void cpc_walk_events_pic(cpc_t *cpc, uint_t picno, void *arg, 
          void(*action)(void *arg, uint_t picno, const char *event));
void cpc_walk_attrs(cpc_t *cpc, void *arg,
          void (*action)(void *arg, const char *attr));

The cpc_npic() function returns the number of physical counters on the underlying processor.

The cpc_caps() function returns a uint_t value that is the bitwise inclusive-OR of the capabilities that the underlying processor supports. There are two capabilities. The CPC_CAP_OVERFLOW_INTERRUPT capability indicates that the processor can generate an interrupt when a counter overflows. The CPC_CAP_OVERFLOW_PRECISE capability indicates that the processor can determine which counter generated an overflow interrupt.

The kernel maintains a list of the events that the underlying processor supports. Different physical counters on a single chip do not have to use the same list of events. The cpc_walk_events_all() function calls the action() routine for each processor-supported event without regard to physical counter. The cpc_walk_events_pic() function calls the action() routine for each processor-supported event on a specific physical counter. Both of these functions pass the arg parameter uninterpreted from the caller to each invocation of the action() function.

The platform maintains a list of attributes that the underlying processor supports. These attributes enable access to advanced processor-specific features of the performance counters. The cpc_walk_attrs() function calls the action routine on each attribute name.

Configuration Interfaces

cpc_set_t *cpc_set_create(cpc_t *cpc);
int cpc_set_destroy(cpc_t *cpc, cpc_set_t *set);
int cpc_set_add_request(cpc_t *cpc, cpc_set_t *set, const char *event,
          uint64_t preset, uint_t flags, uint_t nattrs,
          const cpc_attr_t *attrs);
int cpc_set_request_preset(cpc_t *cpc, cpc_set_t *set, int index,
          uint64_t preset);

The opaque data type cpc_set_t represents collections of requests. The collections are called sets. The cpc_set_create() function creates an empty set. The cpc_set_destroy() function destroys a set and frees all the memory used by the set. Destroying a set releases the hardware resources the set uses.

The cpc_set_add_request() function adds requests to a set. The following list describes the parameters of a request.

event

A string that specifies the name of the event to count.

preset

A 64-bit unsigned integer that is used as the initial value of the counter.

flags

The result of the bitwise OR operation applied to a group of request flags.

nattrs

The number of attributes in the array that attrs points to.

attrs

A pointer to an array of cpc_attr_t structures.

The following list describes the valid request flags.

CPC_COUNT_USER

This flag enables counting of events that occur while the CPU is executing in user mode.

CPC_COUNT_SYSTEM

This flag enables counting of events that occur while the CPU is executing in privileged mode.

CPC_OVF_NOTIFY_EMT

This flag requests notification of hardware counter overflow.

The CPC interfaces pass attributes as an array of cpc_attr_t structures.

When the cpc_set_add_request() function returns successfully, it returns an index. The index references the data generated by the request added by the call to the cpc_set_add_request() function.

The cpc_set_request_preset() function changes the preset value of a request. This enables the re-binding of an overflowed set with new presets.

The cpc_walk_requests() function calls a user-provided action() routine on each request in cpc_set_t. The value of the arg parameter is passed to the user routine without interpretation. The cpc_walk_requests() function allows applications to print the configuration of each request in a set. The syntax for the cpc_walk_requests() function is as follows:

void cpc_walk_requests(cpc_t *cpc, cpc_set_t *set, void *arg,
          void (*action)(void *arg, int index, const char *event,
          uint64_t preset, uint_t flags, int nattrs,
          const cpc_attr_t *attrs));

Binding

The interfaces in this section bind the requests in a set to the physical hardware and set the counters to a starting position.

int cpc_bind_curlwp(cpc_t *cpc, cpc_set_t *set, uint_t flags);
int cpc_bind_pctx(cpc_t *cpc, pctx_t *pctx, id_t id, cpc_set_t *set,
          uint_t flags);
int cpc_bind_cpu(cpc_t *cpc, processorid_t id, cpc_set_t *set, 
          uint_t flags);
int cpc_unbind(cpc_t *cpc, cpc_set_t *set);

The cpc_bind_curlwp() function binds the set to the calling LWP. The set's counters are virtualized to this LWP and count the events that occur on the CPU while the calling LWP runs. The only flag that is valid for the cpc_bind_curlwp() routine is CPC_BIND_LWP_INHERIT.

The cpc_bind_pctx() function binds the set to an LWP in a process that is captured with libpctx(3LIB). This function has no valid flags.

The cpc_bind_cpu() function binds the set to the processor specified in the id parameter. Binding a set to a CPU invalidates existing performance counter contexts on the system. This function has no valid flags.

The cpc_unbind() function stops the performance counters and releases the hardware that is associated with the bound set. If a set is bound to a CPU, the cpc_unbind() function unbinds the LWP from the CPU and releases the CPC pseudo-device.

Sampling

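The interfaces described in this section enable the return of data from the counters to the application. Counter data resides in an opaque data structure named cpc_buf_t. This data structure holds a snapshot of the state of the counters in use by a bound set. The snapshot includes the value of each counter, the high-resolution timestamp of the sample, and a CPU clock cycle count, which the interfaces in Buffer Operations retrieve. The following interfaces create, destroy, and fill a buffer: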

cpc_buf_t *cpc_buf_create(cpc_t *cpc, cpc_set_t *set);
int cpc_buf_destroy(cpc_t *cpc, cpc_buf_t *buf);
int cpc_set_sample(cpc_t *cpc, cpc_set_t *set, cpc_buf_t *buf);

The cpc_buf_create() function creates a buffer that stores data from the set specified by the set parameter. The cpc_buf_destroy() function frees the memory that is associated with the given cpc_buf_t. The cpc_set_sample() function takes a snapshot of the counters that are counting on behalf of the specified set. The specified set must already be bound and have a buffer created before calling the cpc_set_sample() function.

Sampling into a buffer does not update the preset of the requests that are associated with that set. When a buffer is sampled with the cpc_set_sample() function, and the set is then unbound and bound again, counts start from the request's preset as in the original call to the cpc_set_add_request() function.

Buffer Operations

The following routines provide access to the data in a cpc_buf_t structure.

int cpc_buf_get(cpc_t *cpc, cpc_buf_t *buf, int index, uint64_t *val);
int cpc_buf_set(cpc_t *cpc, cpc_buf_t *buf, int index, uint64_t *val);
hrtime_t cpc_buf_hrtime(cpc_t *cpc, cpc_buf_t *buf);
uint64_t cpc_buf_tick(cpc_t *cpc, cpc_buf_t *buf);
int cpc_buf_sub(cpc_t *cpc, cpc_buf_t *result, cpc_buf_t *left,
      cpc_buf_t *right);
int cpc_buf_add(cpc_t *cpc, cpc_buf_t *result, cpc_buf_t *left,
      cpc_buf_t *right);
int cpc_buf_copy(cpc_t *cpc, cpc_buf_t *dest, cpc_buf_t *src);
void cpc_buf_zero(cpc_t *cpc, cpc_buf_t *buf);

The cpc_buf_get() function retrieves the value of the counter that is identified by the index parameter. The index parameter is a value that is returned by the cpc_set_add_request() function before the set is bound. The cpc_buf_get() function stores the value of the counter at the location indicated by the val parameter.

The cpc_buf_set() function sets the value of the counter that is identified by the index parameter. The index parameter is a value that is returned by the cpc_set_add_request() function before the set is bound. The cpc_buf_set() function sets the counter's value to the value at the location indicated by the val parameter. Neither the cpc_buf_get() function nor the cpc_buf_set() function changes the preset of the corresponding CPC request.

The cpc_buf_hrtime() function returns the high resolution timestamp that indicates when the hardware was sampled. The cpc_buf_tick() function returns the number of CPU clock cycles that have elapsed while the LWP is running.

The cpc_buf_sub() function computes the difference between the counter and tick values of the buffers that are specified in the left and right parameters. The cpc_buf_sub() function stores the results in result. A given invocation of the cpc_buf_sub() function must have all cpc_buf_t values originate from the same cpc_set_t structure. For each request index in the buffers, the result buffer contains the value of left - right. The result buffer also contains the tick difference. The cpc_buf_sub() function sets the high-resolution timestamp of the result buffer to the most recent time of the left or right buffers.

The cpc_buf_add() function computes the total of the counter and tick values of the buffers that are specified in the left and right parameters. The cpc_buf_add() function stores the results in result. A given invocation of the cpc_buf_add() function must have all cpc_buf_t values originate from the same cpc_set_t structure. For each request index in the buffers, the result buffer contains the value of left + right. The result buffer also contains the tick total. The cpc_buf_add() function sets the high-resolution timestamp of the result buffer to the most recent time of the left or right buffers.

The cpc_buf_copy() function makes dest identical to src.

The cpc_buf_zero() function sets everything in buf to zero.
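The initialization, configuration, binding, sampling, and buffer interfaces are typically used in sequence. The following is a minimal sketch of that sequence, not a definitive implementation. The helper name count_event() is an assumption, and the event name must be one that the processor supports, such as a name reported by cpc_walk_events_all(). The sketch also calls cpc_close(), which is documented with cpc_open() in the libcpc man pages rather than in this section. The libcpc calls are compiled only on Solaris, where the program is linked with -lcpc. On other systems the sketch reports failure.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__sun)
#include <libcpc.h>
#endif

/*
 * Count user-mode occurrences of `event` on the calling LWP while
 * work() runs (work may be NULL). On success, stores the count in
 * *count and returns 0. Returns -1 on any failure.
 */
int
count_event(const char *event, void (*work)(void), uint64_t *count)
{
#if defined(__sun)
    cpc_t *cpc;
    cpc_set_t *set = NULL;
    cpc_buf_t *buf = NULL;
    int idx;
    int bound = 0;
    int rc = -1;

    if ((cpc = cpc_open(CPC_VER_CURRENT)) == NULL)
        return -1;
    if ((set = cpc_set_create(cpc)) == NULL)
        goto out;

    /* Preset of 0, user mode only, no attributes. */
    idx = cpc_set_add_request(cpc, set, event, 0, CPC_COUNT_USER, 0, NULL);
    if (idx == -1)
        goto out;
    if ((buf = cpc_buf_create(cpc, set)) == NULL)
        goto out;
    if (cpc_bind_curlwp(cpc, set, 0) == -1)
        goto out;
    bound = 1;

    if (work != NULL)
        work();

    /* Snapshot the counters, then read the value for this request. */
    if (cpc_set_sample(cpc, set, buf) == 0 &&
        cpc_buf_get(cpc, buf, idx, count) == 0)
        rc = 0;
out:
    if (bound)
        (void) cpc_unbind(cpc, set);
    if (buf != NULL)
        (void) cpc_buf_destroy(cpc, buf);
    if (set != NULL)
        (void) cpc_set_destroy(cpc, set);
    (void) cpc_close(cpc);
    return rc;
#else
    (void) event;
    (void) work;
    (void) count;
    return -1;      /* libcpc is not available on this system */
#endif
}
```

Because the request uses a preset of 0, the sampled value is the number of events counted since the set was bound. To measure an interval inside a longer run, an application could instead sample into two buffers and subtract them with cpc_buf_sub().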

Activation Interfaces

This section describes activation interfaces for CPC.

int cpc_enable(cpc_t *cpc);
int cpc_disable(cpc_t *cpc);

These two interfaces respectively enable and disable counters of any set that is bound to the executing LWP. Use of these interfaces enables an application to designate code of interest while deferring the counter configuration to a controlling process by using libpctx.

Error Handling Interfaces

This section describes CPC's error handling interfaces.

typedef void (cpc_errhndlr_t)(const char *fn, int subcode, const char *fmt,
          va_list ap);
void cpc_seterrhndlr(cpc_t *cpc, cpc_errhndlr_t *errhndlr);

These two interfaces allow the passage of a cpc_t handle. The cpc_errhndlr_t handler takes an integer subcode in addition to a string. The integer subcode describes the specific error that was encountered by the function that the fn argument refers to. The integer subcode simplifies an application's recognition of error conditions. The string value of the fmt argument contains an internationalized description of the error subcode and is suitable for printing.