Programming Interfaces Guide

Chapter 4 Locality Group APIs

This chapter describes the APIs that applications use to interact with locality groups.

Locality Groups Overview describes the locality group abstraction.

Verifying the Interface Version describes the functions that give information about the interface.

Initializing the Locality Group Interface describes function calls that initialize and shut down the portion of the interface that is used to traverse the locality group hierarchy and to discover the contents of a locality group.

Locality Group Hierarchy describes function calls that navigate the locality group hierarchy and get characteristics of the locality group hierarchy.

Locality Group Contents describes function calls that retrieve information about a locality group's contents.

Locality Group Characteristics describes function calls that retrieve information about a locality group's characteristics.

Locality Groups and Thread and Memory Placement describes how to affect a thread's memory placement and other memory management techniques.

Examples of API usage contains code that performs example tasks by using the APIs that are described in this chapter.

Locality Groups Overview

Shared memory multiprocessor computers contain multiple CPUs. Each CPU can access all of the memory in the machine. In some shared memory multiprocessors, the memory architecture enables each CPU to access some areas of memory more quickly than other areas.

When a machine with such a memory architecture runs Solaris, giving the kernel information about the shortest access times between a given CPU and a given area of memory can improve the system's performance. The locality group (lgroup) abstraction has been introduced in Solaris to handle this information. The lgroup abstraction is part of the Memory Placement Optimization (MPO) feature.

An lgroup is a set of CPU–like and memory–like devices in which each member of the set can access another member of that set within a bounded latency interval. The latency value of each lgroup is chosen by the operating system.

Lgroups are hierarchical. The lgroup hierarchy is a Directed Acyclic Graph (DAG) and is similar to a tree, except that an lgroup may have more than one parent. Like a tree, there is a root. The root lgroup contains all the resources in the system and can include child lgroups. Furthermore, the root lgroup can be characterized as having the highest latency value of all the lgroups in the system. All of its child lgroups will have lower latency values. The lgroups closer to the root have a higher latency while lgroups closer to leaves have lower latency.

A computer in which all the CPUs can access all the memory in the same amount of time can be represented with a single lgroup (see Figure 4–1). A computer in which some of the CPUs can access some areas of memory in a shorter time than other areas can be represented using multiple lgroups (see Figure 4–2).

Figure 4–1 Single Locality Group Schematic

All CPUs in the machine can access the memory in a comparable time frame.

Figure 4–2 Multiple Locality Groups Schematic

The machine's CPU and memory resources are grouped by bounded latency intervals.

The organization of the lgroup hierarchy simplifies the task of finding the nearest resources in the system. Each thread is assigned a home lgroup upon creation. The operating system attempts to allocate resources for the thread from the thread's home lgroup by default. For example, the Solaris kernel attempts to schedule a thread to run on the CPUs in the thread's home lgroup and allocate the thread's memory in a way that optimizes for locality. If the desired resources are not available from the thread's home lgroup, the kernel can traverse the lgroup hierarchy to find the next nearest resources from parents of the home lgroup.
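The fallback search described above can be sketched with a small toy model. This is only an illustration under stated assumptions: struct toy_lgrp and toy_nearest_with_memory() are hypothetical names, not part of liblgrp or the kernel, and each lgroup is given a single parent to keep the sketch short, although a real lgroup may have more than one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one lgroup; not a liblgrp type. */
struct toy_lgrp {
	int	parent;		/* index of the parent, or -1 for the root */
	size_t	free_mem;	/* free bytes in this lgroup and its descendants */
};

/*
 * Return the index of the nearest lgroup, starting at home and moving
 * toward the root, that has at least need bytes free. Returns -1 when
 * no lgroup on that path can satisfy the request.
 */
int
toy_nearest_with_memory(const struct toy_lgrp *groups, int ngroups,
    int home, size_t need)
{
	int cur;

	assert(groups != NULL && home >= 0 && home < ngroups);
	for (cur = home; cur != -1; cur = groups[cur].parent) {
		if (groups[cur].free_mem >= need)
			return (cur);
	}
	return (-1);
}
```

Because each step toward the root covers more resources at a higher latency, the first match on the path is the closest lgroup that can satisfy the request in this model.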

The lgroup APIs export the lgroup abstraction for applications to use for observability and performance tuning. Applications can use the APIs to traverse the lgroup hierarchy, discover the contents and characteristics of a given lgroup, and affect the thread and memory placement on lgroups. A new library, called liblgrp, contains the new APIs.

Verifying the Interface Version

The lgrp_version() function discussed in this section must be used to verify the presence of a supported lgroup interface before using the lgroup API.

Using lgrp_version()

#include <sys/lgrp_user.h>
int lgrp_version(const int version);

The lgrp_version() function takes a version number for the lgroup interface as an argument and returns the lgroup interface version that the system supports. When the current implementation of the lgroup API supports the version number in the version argument, the lgrp_version() function returns that version number. Otherwise, the lgrp_version() function returns LGRP_VER_NONE.


Example 4–1 Example of lgrp_version() use

#include <stdio.h>
#include <stdlib.h>
#include <sys/lgrp_user.h>
if (lgrp_version(LGRP_VER_CURRENT) != LGRP_VER_CURRENT) {
    fprintf(stderr, "Built with unsupported lgroup interface %d\n",
        LGRP_VER_CURRENT);
    exit(1);
}

Initializing the Locality Group Interface

Applications must call lgrp_init() before using the APIs that traverse the lgroup hierarchy and discover its contents. The call to lgrp_init() gives the application a consistent snapshot of the lgroup hierarchy. The application developer can specify whether the snapshot contains only the resources available to the calling thread or the resources available to the operating system in general. The lgrp_init() function returns a cookie that is passed to the functions that navigate the lgroup hierarchy, retrieve the contents of an lgroup, and check whether the snapshot is still current.

Using lgrp_init()

The lgrp_init() function initializes the lgroup interface and takes a snapshot of the lgroup hierarchy.

#include <sys/lgrp_user.h>
lgrp_cookie_t lgrp_init(lgrp_view_t view);

When the lgrp_init() function is called with LGRP_VIEW_CALLER as the view, the function returns a snapshot that contains only the resources available to the calling thread. When the lgrp_init() function is called with LGRP_VIEW_OS as the view, the function returns a snapshot that contains the resources that are available to the operating system. When a thread successfully calls the lgrp_init() function, the function returns a cookie that is used by any function that interacts with the lgroup hierarchy.

The lgroup hierarchy consists of a root lgroup that contains all of the machine's CPU and memory resources. The root lgroup may contain other locality groups defined by bounded latency intervals.

The lgrp_init() function can return two errors. When a view is invalid, the function returns EINVAL. When there is insufficient memory to allocate the snapshot of the lgroup hierarchy, the function returns ENOMEM.

Using lgrp_fini()

The lgrp_fini() function ends the usage of a given cookie and frees the corresponding lgroup hierarchy snapshot.

#include <sys/lgrp_user.h>
int lgrp_fini(lgrp_cookie_t cookie);

The lgrp_fini() function takes a cookie which represents an lgroup hierarchy snapshot created by a previous call to lgrp_init(). The lgrp_fini() function frees the memory that is allocated to that snapshot. After the call to lgrp_fini(), the cookie is invalid. Do not use that cookie again.

When the cookie passed to the lgrp_fini() function is invalid, lgrp_fini() returns EINVAL.

Locality Group Hierarchy

The APIs that are described in this section enable the calling thread to navigate the lgroup hierarchy. The lgroup hierarchy is a directed acyclic graph that is similar to a tree, except that a node may have more than one parent. The root lgroup represents the whole machine and is the lgroup with the highest latency value in the system. Each of the child lgroups contains a subset of the hardware in the root lgroup and is bounded by a lower latency value. Locality groups that are closer to the root have more resources and a higher latency. Locality groups that are closer to the leaves have fewer resources and a lower latency.
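Because an lgroup may have more than one parent, a naive recursive walk from the root can reach the same lgroup along several paths. The following sketch shows one way to visit each lgroup once by keeping a "seen" table. The hierarchy here is a hypothetical hard-coded table, and toy_kids and toy_walk() are illustrative names rather than liblgrp interfaces.

```c
#include <assert.h>
#include <string.h>

#define	TOY_NLGRPS	5
#define	TOY_MAXKIDS	4

/* Hypothetical DAG: lgroup 3 has two parents (1 and 2); -1 ends each list. */
static const int toy_kids[TOY_NLGRPS][TOY_MAXKIDS] = {
	{ 1, 2, -1, -1 },	/* lgroup 0 is the root */
	{ 3, -1, -1, -1 },
	{ 3, 4, -1, -1 },
	{ -1, -1, -1, -1 },
	{ -1, -1, -1, -1 },
};

/*
 * Depth-first walk that marks each lgroup the first time it is reached.
 * Returns the number of lgroups newly visited by this call.
 */
int
toy_walk(int lgrp, unsigned char *seen)
{
	int count = 1;
	int i;

	assert(lgrp >= 0 && lgrp < TOY_NLGRPS);
	if (seen[lgrp])
		return (0);
	seen[lgrp] = 1;
	for (i = 0; i < TOY_MAXKIDS && toy_kids[lgrp][i] != -1; i++)
		count += toy_walk(toy_kids[lgrp][i], seen);
	return (count);
}
```

The caller zeroes the seen array before the first call. Without the seen check, lgroup 3 in this table would be visited twice.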

Using lgrp_cookie_stale()

The lgrp_cookie_stale() function determines whether the snapshot of the lgroup hierarchy represented by the given cookie is current.

#include <sys/lgrp_user.h>
int lgrp_cookie_stale(lgrp_cookie_t cookie);

The cookie returned by the lgrp_init() function can become stale for several reasons that depend on the view the snapshot represents. A cookie returned by calling the lgrp_init() function with the view set to LGRP_VIEW_OS can become stale due to changes in the lgroup hierarchy such as dynamic reconfiguration or a change in a CPU's online status. A cookie returned by calling the lgrp_init() function with the view set to LGRP_VIEW_CALLER can become stale due to changes in the calling thread's processor set or changes in the lgroup hierarchy. A stale cookie is refreshed by calling the lgrp_fini() function with the old cookie, followed by calling lgrp_init() to generate a new cookie.

The lgrp_cookie_stale() function returns EINVAL when the given cookie is invalid.

Using lgrp_view()

The lgrp_view() function determines the view with which a given lgroup hierarchy snapshot was taken.

#include <sys/lgrp_user.h>
lgrp_view_t lgrp_view(lgrp_cookie_t cookie);

The lgrp_view() function takes a cookie representing a snapshot of the lgroup hierarchy and returns the snapshot's view of the lgroup hierarchy. Snapshots taken with the view LGRP_VIEW_CALLER contain only the resources that are available to the calling thread. Snapshots taken with the view LGRP_VIEW_OS contain all the resources that are available to the operating system.

The lgrp_view() function returns EINVAL when the given cookie is invalid.

Using lgrp_nlgrps()

The lgrp_nlgrps() function returns the number of locality groups in the system. If a system has only one locality group, memory placement optimizations have no effect.

#include <sys/lgrp_user.h>
int lgrp_nlgrps(lgrp_cookie_t cookie);

The lgrp_nlgrps() function takes a cookie representing a snapshot of the lgroup hierarchy and returns the number of lgroups available in the hierarchy.

The lgrp_nlgrps() function returns EINVAL when the cookie is invalid.

Using lgrp_root()

The lgrp_root() function returns the root lgroup ID.

#include <sys/lgrp_user.h>
lgrp_id_t lgrp_root(lgrp_cookie_t cookie);

The lgrp_root() function takes a cookie representing a snapshot of the lgroup hierarchy and returns the root lgroup ID.

Using lgrp_parents()

The lgrp_parents() function takes a cookie representing a snapshot of the lgroup hierarchy and returns the number of parent lgroups for the specified lgroup.

#include <sys/lgrp_user.h>
int lgrp_parents(lgrp_cookie_t cookie, lgrp_id_t child,
                 lgrp_id_t *lgrp_array, uint_t lgrp_array_size);

If lgrp_array is not NULL and the value of lgrp_array_size is not zero, the lgrp_parents() function fills the array with parent lgroup IDs until the array is full or all parent lgroup IDs are in the array. The root lgroup has zero parents. When the lgrp_parents() function is called for the root lgroup, lgrp_array will not be filled in.

The lgrp_parents() function returns EINVAL when the cookie is invalid. The lgrp_parents() function returns ESRCH when the specified lgroup ID is not found.

Using lgrp_children()

The lgrp_children() function takes a cookie representing the calling thread's snapshot of the lgroup hierarchy and returns the number of child lgroups for the specified lgroup.

#include <sys/lgrp_user.h>
int lgrp_children(lgrp_cookie_t cookie, lgrp_id_t parent,
                  lgrp_id_t *lgrp_array, uint_t lgrp_array_size);

If lgrp_array is not NULL and the value of lgrp_array_size is not zero, the lgrp_children() function fills the array with child lgroup IDs until the array is full or all child lgroup IDs are in the array.

The lgrp_children() function returns EINVAL when the cookie is invalid. The lgrp_children() function returns ESRCH when the specified lgroup ID is not found.
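The array-filling contract described for lgrp_parents() and lgrp_children() suggests a count-then-fill idiom: ask for the count with a NULL array, allocate, then call again to get the IDs. The sketch below demonstrates the idiom against a toy stand-in so that it can run anywhere. toy_children() and toy_get_children() are hypothetical, plain int replaces lgrp_id_t, and the sketch assumes the return value is the total number of children regardless of the array size.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical fixed hierarchy: lgroup 0 is the root with children 1, 2, 3. */
static const int toy_child_ids[] = { 1, 2, 3 };

/*
 * Toy stand-in for the contract described above: return the total number
 * of children, and copy at most size IDs into array when array is not
 * NULL and size is not zero.
 */
int
toy_children(int parent, int *array, unsigned int size)
{
	unsigned int total = (parent == 0) ? 3 : 0;
	unsigned int i;

	if (array != NULL && size != 0) {
		for (i = 0; i < total && i < size; i++)
			array[i] = toy_child_ids[i];
	}
	return ((int)total);
}

/*
 * Count-then-fill: ask for the count, allocate, then ask for the IDs.
 * On success, stores a malloc()ed array in *out (NULL when there are no
 * children) and returns the count. Returns -1 on allocation failure.
 */
int
toy_get_children(int parent, int **out)
{
	int n = toy_children(parent, NULL, 0);

	assert(out != NULL);
	*out = NULL;
	if (n <= 0)
		return (n);
	*out = malloc((size_t)n * sizeof (int));
	if (*out == NULL)
		return (-1);
	return (toy_children(parent, *out, (unsigned int)n));
}
```

The caller frees the returned array. With the real interface, the same two calls would also pass the cookie, and the hierarchy could change between snapshots but not within one.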

Locality Group Contents

The following APIs retrieve information about the contents of a given lgroup.

Using lgrp_cpus()

The lgrp_cpus() function takes a cookie representing a snapshot of the lgroup hierarchy and returns the number of CPUs in a given lgroup.

#include <sys/lgrp_user.h>
int lgrp_cpus(lgrp_cookie_t cookie, lgrp_id_t lgrp, processorid_t *cpuids,
              uint_t count, int content);

If the cpuids argument is not NULL and the value of count is not zero, the lgrp_cpus() function fills the array with CPU IDs until the array is full or all the CPU IDs are in the array.

The content argument can have the following two values:

LGRP_CONTENT_HIERARCHY

The lgrp_cpus() function returns IDs for the CPUs in this lgroup and this lgroup's descendants.

LGRP_CONTENT_DIRECT

The lgrp_cpus() function returns IDs for the CPUs in this lgroup only.

The lgrp_cpus() function returns EINVAL when the cookie, lgroup ID, or one of the flags is not valid. The lgrp_cpus() function returns ESRCH when the specified lgroup ID is not found.

Using lgrp_mem_size()

The lgrp_mem_size() function takes a cookie representing a snapshot of the lgroup hierarchy and returns the size of installed or free memory in the given lgroup. The lgrp_mem_size() function reports memory sizes in bytes.

#include <sys/lgrp_user.h>
lgrp_mem_size_t lgrp_mem_size(lgrp_cookie_t cookie, lgrp_id_t lgrp,
                              int type, int content);

The type argument can have the following two values:

LGRP_MEM_SZ_FREE

The lgrp_mem_size() function returns the amount of free memory in bytes.

LGRP_MEM_SZ_INSTALLED

The lgrp_mem_size() function returns the amount of installed memory in bytes.

The content argument can have the following two values:

LGRP_CONTENT_HIERARCHY

The lgrp_mem_size() function returns the amount of memory in this lgroup and this lgroup's descendants.

LGRP_CONTENT_DIRECT

The lgrp_mem_size() function returns the amount of memory in this lgroup only.

The lgrp_mem_size() function returns EINVAL when the cookie, lgroup ID, or one of the flags is not valid. The lgrp_mem_size() function returns ESRCH when the specified lgroup ID is not found.

Locality Group Characteristics

The following API retrieves information about the characteristics of a given lgroup.

Using lgrp_latency()

The lgrp_latency() function returns the latency between a CPU in one lgroup and the memory in another lgroup.

#include <sys/lgrp_user.h>
int lgrp_latency(lgrp_id_t from, lgrp_id_t to);

The lgrp_latency() function returns a value that represents the latency between a CPU in the lgroup given by the value of the from argument and the memory in the lgroup given by the value of the to argument. If both arguments point to the same lgroup, the lgrp_latency() function returns the latency value within that lgroup.


Note –

The latency value returned by the lgrp_latency() function is defined by the operating system and is platform-specific. This value does not necessarily represent the actual latency between hardware devices and may only be used for comparison within one domain.


The lgrp_latency() function returns EINVAL when the lgroup ID is not valid. When the lgrp_latency() function does not find the specified lgroup ID, the 'from' lgroup does not contain any CPUs, or the 'to' lgroup does not have any memory, the lgrp_latency() function returns ESRCH.

Locality Groups and Thread and Memory Placement

This section discusses the APIs used to discover and affect thread and memory placement with respect to lgroups. The lgrp_home() function is used to discover thread placement. The meminfo(2) system call is used to discover memory placement. The MADV_ACCESS flags to the madvise(3C) function are used to affect memory allocation among lgroups. The lgrp_affinity_set() function can affect thread and memory placement by setting a thread's affinity for a given lgroup. A thread's lgroup affinities may specify an order of preference for the lgroups from which to allocate resources. The kernel needs information about the likely pattern of an application's memory use in order to allocate memory resources efficiently. The madvise() function, and its shared object analogue madv.so.1, provide this information to the kernel. A running process can gather memory usage information about itself by using the meminfo() system call.

Using lgrp_home()

The lgrp_home() function returns the home lgroup for the specified process or thread.

#include <sys/lgrp_user.h>
lgrp_id_t lgrp_home(idtype_t idtype, id_t id);

The lgrp_home() function returns EINVAL when the ID type is not valid. The lgrp_home() function returns EPERM when the effective user of the calling process is not the superuser and the calling process' real or effective user ID does not match the real or effective user ID of one of the threads. The lgrp_home() function returns ESRCH when the specified process or thread is not found.

Using madvise()

The madvise() function advises the kernel that a region of user virtual memory in the range starting at the address specified in addr and with length equal to the value of the len parameter is expected to follow a particular pattern of use. The kernel uses this information to optimize the procedure for manipulating and maintaining the resources associated with the specified range. Use of the madvise() function can increase system performance when used by programs that have specific knowledge of their access patterns over memory.

#include <sys/types.h>
#include <sys/mman.h>
int madvise(caddr_t addr, size_t len, int advice);

The madvise() function provides the following flags to affect how a thread's memory is allocated among lgroups:

MADV_ACCESS_DEFAULT

This flag resets the kernel's expected access pattern for the specified range to the default.

MADV_ACCESS_LWP

This flag advises the kernel that the next LWP to touch the specified address range is the LWP that will access that range the most. The kernel allocates the memory and other resources for this range and the LWP accordingly.

MADV_ACCESS_MANY

This flag advises the kernel that many processes or LWPs will access the specified address range randomly across the system. The kernel allocates the memory and other resources for this range accordingly.

The madvise() function returns EAGAIN when some or all of the mappings in the specified address range, from addr to addr+len, are locked for I/O. The madvise() function returns EINVAL when the value of the addr parameter is not a multiple of the page size as returned by sysconf(3C). The madvise() function returns EINVAL when the length of the specified address range is less than or equal to zero. The madvise() function returns EINVAL when the advice is invalid. The madvise() function returns EIO when an I/O error occurs while reading from or writing to the file system. The madvise() function returns ENOMEM when addresses in the specified address range are outside the valid range for the address space of a process, or the addresses in the specified address range specify one or more pages that are not mapped. The madvise() function returns ESTALE when the NFS file handle is stale.
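Since madvise() rejects a start address that is not a multiple of the page size, callers often align the range first. The following is a minimal sketch of one approach: round the start address down to a page boundary and lengthen the range so that it still ends at the same place. The helper names toy_page_floor() and toy_align_range() are hypothetical, and the sketch assumes the page size is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Round addr down to a multiple of pagesize, which must be a power of two. */
uintptr_t
toy_page_floor(uintptr_t addr, uintptr_t pagesize)
{
	assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);
	return (addr & ~(pagesize - 1));
}

/*
 * Move *addr down to a page boundary and grow *len by the same amount,
 * so the range still ends at the original end address.
 * Returns 0 on success, -1 if the page size is unavailable or *len is 0.
 */
int
toy_align_range(uintptr_t *addr, size_t *len)
{
	long ps = sysconf(_SC_PAGESIZE);
	uintptr_t start;

	if (ps <= 0 || *len == 0)
		return (-1);
	start = toy_page_floor(*addr, (uintptr_t)ps);
	*len += (size_t)(*addr - start);
	*addr = start;
	return (0);
}
```

Rounding down means the advice can also cover data that shares the first page with the range. A caller that must not affect neighboring data could instead round the start up and shrink the length.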

Using madv.so.1

The madv.so.1 shared object enables the selective configuration of virtual memory advice for launched processes and their descendants. To use the shared object, the following string must be present in the environment:

LD_PRELOAD=$LD_PRELOAD:madv.so.1

The madv.so.1 shared object applies memory advice as specified by the value of the MADV environment variable. The MADV environment variable specifies the virtual memory advice to use for all heap, shared memory, and mmap regions in the process address space. This advice is applied to all created processes. The following values of the MADV environment variable affect resource allocation among lgroups:

access_default

This value resets the kernel's expected access pattern to the default.

access_lwp

This value advises the kernel that the next LWP to touch an address range is the LWP that will access that range the most. The kernel allocates the memory and other resources for this range and the LWP accordingly.

access_many

This value advises the kernel that many processes or LWPs will access memory randomly across the system. The kernel allocates the memory and other resources accordingly.

The value of the MADVCFGFILE environment variable is the name of a text file that contains one or more memory advice configuration entries in the form <exec-name>:<advice-opts>.

The value of <exec-name> is the name of an application or executable. The value of <exec-name> can be a full pathname, a base name, or a pattern string.

The value of <advice-opts> is of the form <region>=<advice>. The values of <advice> are the same as the values for the MADV environment variable. Replace <region> with any of the following legal values:

madv

Advice applies to all heap, shared memory, and mmap(2) regions in the process address space.

heap

The heap is defined to be the brk(2) area. Advice applies to the existing heap and to any additional heap memory allocated in the future.

shm

Advice applies to shared memory segments. See shmat(2) for more information on shared memory operations.

ism

Advice applies to shared memory segments that are using the SHM_SHARE_MMU flag. The ism option takes precedence over shm.

dsm

Advice applies to shared memory segments that are using the SHM_PAGEABLE flag. The dsm option takes precedence over shm.

mapshared

Advice applies to mappings established by the mmap() system call using the MAP_SHARED flag.

mapprivate

Advice applies to mappings established by the mmap() system call using the MAP_PRIVATE flag.

mapanon

Advice applies to mappings established by the mmap() system call using the MAP_ANON flag. The mapanon option takes precedence when multiple options apply.
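To make the entry format concrete, the following sketch splits one configuration entry into its exec name and its list of region and advice pairs. It is only an illustration of the syntax, not a description of how madv.so.1 parses the file. struct toy_entry and toy_parse_entry() are hypothetical names, multiple options are assumed to be separated by commas, and leading whitespace is assumed to have been stripped already.

```c
#include <assert.h>
#include <string.h>

#define	TOY_MAX_OPTS	8

/* Hypothetical parsed form of one entry; not a madv.so.1 structure. */
struct toy_entry {
	char	*exec_name;
	int	nopts;
	char	*region[TOY_MAX_OPTS];
	char	*advice[TOY_MAX_OPTS];
};

/*
 * Parse "<exec-name>:<region>=<advice>[,<region>=<advice>...]" in place.
 * The line buffer is modified: separators are overwritten with '\0'.
 * An entry with nothing after the colon yields zero options.
 * Returns 0 on success, -1 on a malformed entry.
 */
int
toy_parse_entry(char *line, struct toy_entry *e)
{
	char *opts, *eq, *next;

	assert(line != NULL && e != NULL);
	opts = strchr(line, ':');
	if (opts == NULL || opts == line)
		return (-1);
	*opts++ = '\0';
	e->exec_name = line;
	e->nopts = 0;

	while (*opts != '\0') {
		next = strchr(opts, ',');
		if (next != NULL)
			*next++ = '\0';
		eq = strchr(opts, '=');
		if (eq == NULL || eq == opts || eq[1] == '\0' ||
		    e->nopts == TOY_MAX_OPTS)
			return (-1);
		*eq = '\0';
		e->region[e->nopts] = opts;
		e->advice[e->nopts] = eq + 1;
		e->nopts++;
		if (next == NULL)
			break;
		opts = next;
	}
	return (0);
}
```

For example, parsing foo*:ism=access_lwp yields the exec name foo* and a single pair with region ism and advice access_lwp.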

The value of the MADVERRFILE environment variable is the pathname where error messages are logged. In the absence of a MADVERRFILE location, the madv.so.1 shared object logs errors by using syslog(3C) with LOG_ERR as the severity level and LOG_USER as the facility descriptor.

Memory advice is inherited. A child process has the same advice as its parent. The advice is set back to the system default advice after a call to exec(2) unless a different level of advice is configured via the madv.so.1 shared object. Advice is only applied to mmap() regions explicitly created by the user program. Regions established by the run-time linker or by system libraries that make direct system calls are not affected.

madv.so.1 Usage Examples

The following examples illustrate specific aspects of the madv.so.1 shared object.


Example 4–2 Setting Advice for a Set of Applications

This configuration applies advice to all ISM segments for applications with exec names that begin with foo.

$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
        foo*:ism=access_lwp


Example 4–3 Excluding a Set of Applications From Advice

This configuration sets advice for all applications with the exception of ls.

$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADV=access_many
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADV MADVCFGFILE
$ cat $MADVCFGFILE
        ls:


Example 4–4 Pattern Matching in a Configuration File

Because the configuration specified in MADVCFGFILE takes precedence over the value set in MADV, specifying * as the <exec-name> of the last configuration entry is equivalent to setting MADV. This example is equivalent to the previous example.

$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
        ls:
        *:madv=access_many



Example 4–5 Advice for Multiple Regions

This configuration applies one type of advice for mmap() regions and different advice for heap and shared memory regions for applications whose exec names begin with foo.

$ LD_PRELOAD=$LD_PRELOAD:madv.so.1
$ MADVCFGFILE=madvcfg
$ export LD_PRELOAD MADVCFGFILE
$ cat $MADVCFGFILE
        foo*:madv=access_many,heap=sequential,shm=access_lwp

Using meminfo()

The meminfo() function gives the calling process information about the virtual memory and physical memory that the system has allocated to that process.

#include <sys/types.h>
#include <sys/mman.h>
int meminfo(const uint64_t inaddr[], int addr_count,
    const uint_t info_req[], int info_count, uint64_t outdata[],
    uint_t validity[]);

The meminfo() function can return the following types of information:

MEMINFO_VPHYSICAL

The physical memory address corresponding to the given virtual address.

MEMINFO_VLGRP

The lgroup to which the physical page corresponding to the given virtual address belongs.

MEMINFO_VPAGESIZE

The size of the physical page corresponding to the given virtual address.

MEMINFO_VREPLCNT

The number of replicated physical pages that correspond to the given virtual address.

MEMINFO_VREPL|n

The nth physical replica of the given virtual address.

MEMINFO_VREPL_LGRP|n

The lgroup to which the nth physical replica of the given virtual address belongs.

MEMINFO_PLGRP

The lgroup to which the given physical address belongs.

The meminfo() function takes the following parameters:

inaddr

An array of input addresses.

addr_count

The number of addresses that are passed to meminfo().

info_req

An array listing the types of information that are being requested.

info_count

The number of pieces of information that are requested for each address in the inaddr array.

outdata

An array where the meminfo() function places the results. The array's size is equal to the product of the values of the info_count and addr_count parameters.

validity

An array of size equal to the value of the addr_count parameter. The validity array contains bitwise result codes. The 0th bit of the result code evaluates the validity of the corresponding input address. Each successive bit in the result code evaluates the validity of the response to the members of the info_req array in turn.

The meminfo() function returns EFAULT when the area of memory that the outdata or validity arrays point to cannot be written to. The meminfo() function returns EFAULT when the area of memory that the info_req or inaddr arrays point to cannot be read from. The meminfo() function returns EINVAL when the value of info_count exceeds 31 or is less than 1. The meminfo() function returns EINVAL when the value of addr_count is less than zero.
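The layout of the outdata and validity arrays can be captured in a few small helpers. This is a sketch with hypothetical names (toy_addr_valid(), toy_info_valid(), toy_outdata_index()). It assumes that the results for each input address are stored contiguously in outdata, in info_req order, which is consistent with the array size described above.

```c
#include <assert.h>
#include <stddef.h>

/* Nonzero when bit 0 reports that the input address itself is valid. */
int
toy_addr_valid(unsigned int validity)
{
	return ((validity & 1u) != 0);
}

/*
 * Nonzero when the answer to request k is valid, where k is a 0-based
 * index into info_req. Request k is reported in bit k + 1.
 */
int
toy_info_valid(unsigned int validity, int k)
{
	assert(k >= 0 && k < 31);
	return ((validity & (1u << (k + 1))) != 0);
}

/* Index in outdata of the answer to request k for input address i. */
size_t
toy_outdata_index(size_t i, size_t info_count, size_t k)
{
	assert(k < info_count);
	return (i * info_count + k);
}
```

Note the parentheses around each mask operation. In C, == binds more tightly than &, so an expression such as validity & 1 == 0 does not test bit 0.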


Example 4–6 Use of meminfo() to print out physical pages and page sizes corresponding to a set of virtual addresses

#include <alloca.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/mman.h>

void
print_info(void **addrvec, int how_many)
{
        /* Two requests per address: physical address, then page size */
        static const uint_t info[] = {
                MEMINFO_VPHYSICAL,
                MEMINFO_VPAGESIZE};
        uint64_t * inaddr = alloca(sizeof(uint64_t) * how_many);
        uint64_t * outdata = alloca(sizeof(uint64_t) * how_many * 2);
        uint_t * validity = alloca(sizeof(uint_t) * how_many);

        int i;

        for (i = 0; i < how_many; i++)
                inaddr[i] = (uint64_t)(uintptr_t)addrvec[i];

        if (meminfo(inaddr, how_many,  info,
                    sizeof (info)/ sizeof(info[0]),
                    outdata, validity) < 0)
                ...

        for (i = 0; i < how_many; i++) {
                /* Bit 0: address is valid. Bits 1 and 2: info[0], info[1]. */
                if ((validity[i] & 1) == 0)
                        printf("address 0x%llx not part of address space\n",
                                inaddr[i]);

                else if ((validity[i] & 2) == 0)
                        printf("address 0x%llx has no physical page "
                                "associated with it\n",
                                inaddr[i]);

                else {
                        char buff[80];
                        if ((validity[i] & 4) == 0)
                                strcpy(buff, "<Unknown>");
                        else
                                sprintf(buff, "%lld", outdata[i * 2 + 1]);
                        printf("address 0x%llx is backed by physical "
                                "page 0x%llx of size %s\n",
                                inaddr[i], outdata[i * 2], buff);
                }
        }
}

Locality Group Affinity

The kernel assigns a thread to a locality group when the light weight process (LWP) for that thread is created. That lgroup is called the thread's home lgroup. The kernel runs the thread on the CPUs in the thread's home lgroup and allocates memory from that lgroup whenever possible. If resources from the home lgroup are unavailable, the kernel allocates resources from other lgroups. When a thread has affinity for more than one lgroup, the operating system allocates resources from lgroups chosen in order of affinity strength. There are three affinity levels:

  1. LGRP_AFF_STRONG indicates strong affinity. If this lgroup is the thread's home lgroup, the operating system avoids rehoming the thread to another lgroup if possible. Events such as dynamic reconfiguration, processor offlining, processor binding, and processor set binding and manipulation may still result in thread rehoming.

  2. LGRP_AFF_WEAK indicates weak affinity. If this lgroup is the thread's home lgroup, the operating system rehomes the thread if necessary for load balancing purposes.

  3. LGRP_AFF_NONE indicates no affinity. If a thread has no affinity to any lgroup, the operating system assigns the thread a home lgroup.

The operating system uses lgroup affinities as advice when allocating resources for a given thread. The advice is factored in with the other system constraints. Processor binding and processor sets do not change lgroup affinities, but may restrict the lgroups on which a thread can run.

Using lgrp_affinity_get()

The lgrp_affinity_get() function returns the affinity that an LWP or set of LWPs has for a given lgroup.

#include <sys/lgrp_user.h>
lgrp_affinity_t lgrp_affinity_get(idtype_t idtype, id_t id, lgrp_id_t lgrp);

The idtype and id arguments specify the LWP or set of LWPs that the lgrp_affinity_get() function examines. If the value of idtype is P_PID, the lgrp_affinity_get() function gets the lgroup affinity for one of the LWPs in the process whose process ID matches the value of the id argument. If the value of idtype is P_LWPID, the lgrp_affinity_get() function gets the lgroup affinity for the LWP of the current process whose LWP ID matches the value of the id argument. If the value of idtype is P_MYID, the lgrp_affinity_get() function gets the lgroup affinity for the current LWP or process.

The lgrp_affinity_get() function returns EINVAL when the given lgroup, affinity, or ID type is not valid. The lgrp_affinity_get() function returns EPERM when the effective user of the calling process is not the superuser and the calling process' ID does not match the real or effective user ID of one of the LWPs. The lgrp_affinity_get() function returns ESRCH when a given lgroup or LWP is not found.

Using lgrp_affinity_set()

The lgrp_affinity_set() function sets the affinity that an LWP or set of LWPs has for a given lgroup.

#include <sys/lgrp_user.h>
int lgrp_affinity_set(idtype_t idtype, id_t id, lgrp_id_t lgrp,
                      lgrp_affinity_t affinity);

The idtype and id arguments specify the LWP or set of LWPs the lgrp_affinity_set() function examines. If the value of idtype is P_PID, the lgrp_affinity_set() function sets the lgroup affinity for all of the LWPs in the process whose process ID matches the value of the id argument to the affinity level specified in the affinity argument. If the value of idtype is P_LWPID, the lgrp_affinity_set() function sets the lgroup affinity for the LWP of the current process whose LWP ID matches the value of the id argument to the affinity level specified in the affinity argument. If the value of idtype is P_MYID, the lgrp_affinity_set() function sets the lgroup affinity for the current LWP or process to the affinity level specified in the affinity argument.

The lgrp_affinity_set() function returns EINVAL when the given lgroup, affinity, or ID type is not valid. The lgrp_affinity_set() function returns EPERM when the effective user of the calling process is not the superuser and the calling process' ID does not match the real or effective user ID of one of the LWPs. The lgrp_affinity_set() function returns ESRCH when a given lgroup or LWP is not found.

Examples of API usage

This section contains code that performs example tasks by using the APIs that are described in this chapter.


Example 4–7 Move Memory to a Thread

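The following code sample advises the system that the next LWP to touch the memory in the range from the address specified by addr to the address specified by addr+len will access that memory most heavily. The MADV_ACCESS_LWP advice does not name a thread. The system uses the advice to place the memory near the thread that next accesses the range.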

#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Move memory to thread
 */
void
mem_to_thread(caddr_t addr, size_t len)
{
	if (madvise(addr, len, MADV_ACCESS_LWP) < 0)
		perror("madvise");
}
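
The madvise() function operates on whole pages, and implementations commonly reject a starting address that is not page-aligned. As a hedged sketch, assuming that the caller's buffer can start anywhere within a page, the hypothetical helper page_span() below widens a range to page boundaries before the range is passed to mem_to_thread(). The helper name and its parameters are illustrative only. The page size would normally come from sysconf(_SC_PAGESIZE).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: widen [addr, addr + len) to page boundaries.
 * pagesize must be a power of two, for example sysconf(_SC_PAGESIZE).
 * On return, *startp holds the aligned start and *lenp the aligned length.
 */
void
page_span(uintptr_t addr, size_t len, size_t pagesize,
    uintptr_t *startp, size_t *lenp)
{
	uintptr_t	mask;
	uintptr_t	end;

	assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

	mask = (uintptr_t)pagesize - 1;
	*startp = addr & ~mask;			/* round start down */
	end = (addr + len + mask) & ~mask;	/* round end up */
	*lenp = (size_t)(end - *startp);
}
```

Widening the range also applies the advice to any unrelated data that shares the first or last page, so this approach suits buffers that are large relative to the page size.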


Example 4–8 Move a Thread to Memory

This sample code uses the meminfo() function to determine the lgroup of the memory page at a given virtual address. The code then raises the calling thread's affinity for that lgroup with the lgrp_affinity_set() function and checks whether that lgroup has become the thread's home lgroup.

#include <stdio.h>
#include <sys/lgrp_user.h>
#include <sys/mman.h>
#include <sys/types.h>


/*
 * Move a Thread to Memory
 */
int
thread_to_memory(caddr_t va)
{
	uint64_t	addr;
	lgrp_id_t	home;
	uint64_t	lgrp;
	uint_t		request;
	uint_t		valid;

	addr = (uint64_t)(uintptr_t)va;
	request = MEMINFO_VLGRP;
	if (meminfo(&addr, 1, &request, 1, &lgrp, &valid) != 0) {
		perror("meminfo");
		return (1);
	}

	/*
	 * Bit 0 of valid covers the address, bit 1 the MEMINFO_VLGRP result
	 */
	if ((valid & 0x3) != 0x3) {
		(void) fprintf(stderr, "meminfo: no lgroup for address\n");
		return (1);
	}

	if (lgrp_affinity_set(P_LWPID, P_MYID, lgrp, LGRP_AFF_STRONG) != 0) {
		perror("lgrp_affinity_set");
		return (2);
	}

	home = lgrp_home(P_LWPID, P_MYID);
	if (home == -1) {
		perror("lgrp_home");
		return (3);
	}

	if (home != lgrp)
		return (-1);

	return (0);
}


Example 4–9 Walk the lgroup Hierarchy

The following sample code walks through and prints out the lgroup hierarchy.

#include <stdio.h>
#include <stdlib.h>
#include <sys/lgrp_user.h>
#include <sys/types.h>


/*
 * Walk and print lgroup hierarchy from given lgroup
 * through all its descendants
 */
int
lgrp_walk(lgrp_cookie_t cookie, lgrp_id_t lgrp, lgrp_content_t content)
{
	lgrp_affinity_t	aff;
	lgrp_id_t	*children;
	processorid_t	*cpuids;
	int		i;
	int		ncpus;
	int		nchildren;
	int		nparents;
	lgrp_id_t	*parents;
	lgrp_mem_size_t	size;

	/*
	 * Print given lgroup, caller's affinity for lgroup,
	 * and desired content specified
	 */
	printf("LGROUP #%d:\n", lgrp);

	/* P_MYID is a value for the id argument, so use P_LWPID as idtype */
	aff = lgrp_affinity_get(P_LWPID, P_MYID, lgrp);
	if (aff == -1)
		perror("lgrp_affinity_get");
	printf("\tAFFINITY: %d\n", aff);

	printf("CONTENT %d:\n", content);

	/*
	 * Get CPUs
	 */
	ncpus = lgrp_cpus(cookie, lgrp, NULL, 0, content);
	if (ncpus == -1) {
		perror("lgrp_cpus");
		return (-1);
	}
	printf("\t%d CPUS: ", ncpus);
	if (ncpus > 0) {
		cpuids = malloc(ncpus * sizeof (processorid_t));
		if (cpuids == NULL) {
			perror("malloc");
			return (-1);
		}
		ncpus = lgrp_cpus(cookie, lgrp, cpuids, ncpus, content);
		if (ncpus == -1) {
			/* Report the error before free() can change errno */
			perror("lgrp_cpus");
			free(cpuids);
			return (-1);
		}
		for (i = 0; i < ncpus; i++)
			printf("%d ", cpuids[i]);
		free(cpuids);
	}
	printf("\n");

	/*
	 * Get memory size
	 */
	printf("\tMEMORY: ");
	size = lgrp_mem_size(cookie, lgrp, LGRP_MEM_SZ_INSTALLED, content);
	if (size == -1) {
		perror("lgrp_mem_size");
		return (-1);
	}
	printf("installed bytes 0x%llx, ", size);
	size = lgrp_mem_size(cookie, lgrp, LGRP_MEM_SZ_FREE, content);
	if (size == -1) {
		perror("lgrp_mem_size");
		return (-1);
	}
	printf("free bytes 0x%llx\n", size);

	/*
	 * Get parents
	 */
	nparents = lgrp_parents(cookie, lgrp, NULL, 0);
	if (nparents == -1) {
		perror("lgrp_parents");
		return (-1);
	}
	printf("\t%d PARENTS: ", nparents);
	if (nparents > 0) {
		parents = malloc(nparents * sizeof (lgrp_id_t));
		if (parents == NULL) {
			perror("malloc");
			return (-1);
		}
		nparents = lgrp_parents(cookie, lgrp, parents, nparents);
		if (nparents == -1) {
			/* Report the error before free() can change errno */
			perror("lgrp_parents");
			free(parents);
			return (-1);
		}
		for (i = 0; i < nparents; i++)
			printf("%d ", parents[i]);
		free(parents);
	}
	printf("\n");

	/*
	 * Get children
	 */
	nchildren = lgrp_children(cookie, lgrp, NULL, 0);
	if (nchildren == -1) {
		perror("lgrp_children");
		return (-1);
	}
	printf("\t%d CHILDREN: ", nchildren);
	if (nchildren > 0) {
		children = malloc(nchildren * sizeof (lgrp_id_t));
		if (children == NULL) {
			perror("malloc");
			return (-1);
		}
		nchildren = lgrp_children(cookie, lgrp, children, nchildren);
		if (nchildren == -1) {
			/* Report the error before free() can change errno */
			perror("lgrp_children");
			free(children);
			return (-1);
		}
		for (i = 0; i < nchildren; i++)
			printf("%d ", children[i]);
		printf("\n");

		for (i = 0; i < nchildren; i++)
			lgrp_walk(cookie, children[i], content);

		free(children);
	}
	printf("\n");

	return (0);
}


Example 4–10 Find the Closest lgroup With Available Memory Outside a Given lgroup

#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/lgrp_user.h>
#include <sys/types.h>

/*
 * Find next closest lgroup outside given one with available memory
 */
lgrp_id_t
lgrp_next_nearest(lgrp_cookie_t cookie, lgrp_id_t from)
{
	lgrp_id_t	closest;
	int		i;
	int		latency;
	int		lowest;
	int		nparents;
	lgrp_id_t	*parents;
	lgrp_mem_size_t	size;


	/*
	 * Get number of parents
	 */
	nparents = lgrp_parents(cookie, from, NULL, 0);
	if (nparents == -1) {
		perror("lgrp_parents");
		return (LGRP_NONE);
	}

	/*
	 * No parents, so current lgroup is next nearest
	 */
	if (nparents == 0) {
		return (from);
	}

	/*
	 * Get parents
	 */
	parents = malloc(nparents * sizeof (lgrp_id_t));
	if (parents == NULL) {
		perror("malloc");
		return (LGRP_NONE);
	}
	nparents = lgrp_parents(cookie, from, parents, nparents);
	if (nparents == -1) {
		perror("lgrp_parents");
		free(parents);
		return (LGRP_NONE);
	}

	/*
	 * Find closest parent (i.e., the one with lowest latency)
	 */
	closest = LGRP_NONE;
	lowest = INT_MAX;
	for (i = 0; i < nparents; i++) {
		lgrp_id_t	lgrp;

		/*
		 * See whether parent has any free memory
		 */
		size = lgrp_mem_size(cookie, parents[i], LGRP_MEM_SZ_FREE,
		    LGRP_CONTENT_HIERARCHY);
		if (size > 0)
			lgrp = parents[i];
		else {
			if (size == -1)
				perror("lgrp_mem_size");

			/*
			 * Find nearest ancestor if parent doesn't
			 * have any memory
			 */
			lgrp = lgrp_next_nearest(cookie, parents[i]);
			if (lgrp == LGRP_NONE)
				continue;
		}

		/*
		 * Get latency within parent lgroup
		 */
		latency = lgrp_latency(lgrp, lgrp);
		if (latency == -1) {
			perror("lgrp_latency");
			continue;
		}

		/*
		 * Remember lgroup with lowest latency
		 */
		if (latency < lowest) {
			closest = lgrp;
			lowest = latency;
		}
	}

	free(parents);
	return (closest);
}


Example 4–11 Find Nearest lgroup With Free Memory

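This example code finds the lgroup with free memory that is nearest to the home lgroup of the calling thread. The lgrp_nearest() function relies on the lgrp_next_nearest() function that is shown in Example 4–10.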

#include <stdio.h>
#include <stdlib.h>
#include <sys/lgrp_user.h>
#include <sys/types.h>


/*
 * Find lgroup with memory nearest home lgroup of current thread
 */
lgrp_id_t
lgrp_nearest(lgrp_cookie_t cookie)
{
	lgrp_id_t	home;
	lgrp_mem_size_t	size;

	/*
	 * Get home lgroup
	 */
	home = lgrp_home(P_LWPID, P_MYID);
	if (home == -1) {
		perror("lgrp_home");
		return (LGRP_NONE);
	}

	/*
	 * See whether home lgroup has any memory available in its hierarchy
	 */
	size = lgrp_mem_size(cookie, home, LGRP_MEM_SZ_FREE,
	    LGRP_CONTENT_HIERARCHY);
	if (size == -1)
		perror("lgrp_mem_size");

	/*
	 * It does, so return the home lgroup.
	 */
	if (size > 0)
		return (home);

	/*
	 * Otherwise, find next nearest lgroup outside of the home.
	 */
	return (lgrp_next_nearest(cookie, home));
}
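
The search strategy that lgrp_next_nearest() and lgrp_nearest() use can be traced without a multi-lgroup machine by running the same logic against a small in-memory table. The sketch below is a simplified model and does not call the lgroup interface. The names struct model_lgrp, model_sample, model_next_nearest(), and model_nearest() are hypothetical, and the free-memory and latency figures are invented purely for illustration.

```c
#include <assert.h>
#include <limits.h>

#define	MODEL_NONE		(-1)
#define	MODEL_MAX_PARENTS	2

/* Hypothetical stand-in for what the lgroup interface reports */
struct model_lgrp {
	int		nparents;
	int		parents[MODEL_MAX_PARENTS];
	long long	free_bytes;	/* free memory in the hierarchy */
	int		latency;	/* latency within this lgroup */
};

/* Invented sample: 0 is the root, 1 its child, 2 and 3 leaves under 1 */
static const struct model_lgrp model_sample[] = {
	{ 0, { 0, 0 }, 4096, 30 },
	{ 1, { 0, 0 }, 4096, 20 },
	{ 1, { 1, 0 }, 0,    10 },
	{ 1, { 1, 0 }, 4096, 10 },
};

/* Same search order as lgrp_next_nearest(), applied to the table */
int
model_next_nearest(const struct model_lgrp *g, int from)
{
	int	closest = MODEL_NONE;
	int	lowest = INT_MAX;
	int	i;

	assert(from >= 0);

	/* No parents, so the current lgroup is the next nearest */
	if (g[from].nparents == 0)
		return (from);

	for (i = 0; i < g[from].nparents; i++) {
		int	p = g[from].parents[i];
		int	cand;

		/* Use the parent if it has free memory, else look higher */
		cand = (g[p].free_bytes > 0) ? p : model_next_nearest(g, p);
		if (cand == MODEL_NONE)
			continue;

		/* Remember the candidate with the lowest latency */
		if (g[cand].latency < lowest) {
			closest = cand;
			lowest = g[cand].latency;
		}
	}
	return (closest);
}

/* Same decision as lgrp_nearest(): prefer home, else search upward */
int
model_nearest(const struct model_lgrp *g, int home)
{
	if (g[home].free_bytes > 0)
		return (home);
	return (model_next_nearest(g, home));
}
```

With this table, a thread whose home is lgroup 3 stays in lgroup 3 because that lgroup has free memory, while a thread whose home is lgroup 2 is directed to its parent, lgroup 1.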