The ChorusOS operating system provides instrumentation to inform applications of the current use of the various resources managed by the system. Several kinds of instrumentation are exported by the system:
Attributes: static read-only values that show how the system is configured. These attributes are usually tunable values set when you build your system.
Counters: values that increase constantly, such as the number of bytes transferred on a disk or the number of packets received on a network interface. Counters can only be read by the application. Some counters can be reset.
Gauges: values that increase and decrease depending upon the activity of the system, such as the amount of memory used or the number of open file descriptors. Most of the time, gauges are associated with watermarks. The ChorusOS operating system manages one high and one low watermark per gauge. Gauges can only be read, while watermarks can be read or reset.
Gauges with watermarks can also be associated with either a high or a low threshold, depending upon the semantics of the resource being instrumented. A threshold is represented by two values:
a rise value, such that when the gauge's value passes the rise value, a system event is generated and posted to the application level
a clear value, such that when the gauge's value passes the clear value, another system event is generated and posted to the application level
Rise and clear values are illustrated in the following figures:
You can modify the value of the threshold rise and clear values dynamically. At system initialization time, the thresholds are disabled until they are set explicitly by an application.
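The rise and clear behavior described above can be sketched as a small state machine. The following C fragment is only an illustration under stated assumptions: the `gauge` type and every function name are hypothetical and are not part of the ChorusOS API, and it models a high threshold whose clear value is at or below its rise value, so that a rise event is always followed by a clear event before the next rise.

```c
#include <stdbool.h>

/* Hypothetical model of a gauge with watermarks and a high threshold.
 * None of these names come from the ChorusOS API. */
typedef enum { EV_NONE, EV_RISE, EV_CLEAR } gauge_event;

typedef struct {
    long value;      /* current gauge value */
    long low_mark;   /* lowest value seen since the last watermark reset */
    long high_mark;  /* highest value seen since the last watermark reset */
    bool enabled;    /* thresholds stay disabled until set explicitly */
    bool raised;     /* true between a rise event and its clear event */
    long rise;       /* rise value of the high threshold */
    long clear;      /* clear value, assumed to be <= rise */
} gauge;

void gauge_init(gauge *g, long initial)
{
    g->value = g->low_mark = g->high_mark = initial;
    g->enabled = false;
    g->raised = false;
    g->rise = g->clear = 0;
}

/* Set (or change) the threshold dynamically; this also enables it. */
void gauge_set_threshold(gauge *g, long rise, long clear)
{
    g->rise = rise;
    g->clear = clear;
    g->enabled = true;
    g->raised = false;
}

/* Watermarks can be reset; the gauge value itself is read-only. */
void gauge_reset_watermarks(gauge *g)
{
    g->low_mark = g->high_mark = g->value;
}

/* Record a new value and report which event, if any, would be posted. */
gauge_event gauge_update(gauge *g, long v)
{
    g->value = v;
    if (v < g->low_mark)  g->low_mark = v;
    if (v > g->high_mark) g->high_mark = v;
    if (!g->enabled) return EV_NONE;
    if (!g->raised && v >= g->rise)  { g->raised = true;  return EV_RISE; }
    if (g->raised  && v <= g->clear) { g->raised = false; return EV_CLEAR; }
    return EV_NONE;
}
```

Keeping the clear value below the rise value gives hysteresis: a gauge that oscillates around the rise value produces one rise event rather than a stream of them.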
In addition, the system exposes a number of tunable values that you can modify dynamically to affect the behavior of the system. These values might, for example, represent the maximum number of open file descriptors per process, or IP forwarding behavior.
The values exposed are given symbolic names according to a tree schema, or they can be accessed through an object identifier (OID), obtained from the symbolic name of the value. The API for reading and setting these values is based on the sysctl() facility defined by FreeBSD systems. See the following section for details.
The sysctl facility allows the retrieval of information from the system, and allows processes with appropriate privileges to set system information.
The information available from sysctl consists of integers, strings, tables, or opaque data structures. This information is organized in a tree structure containing two types of node:
Access data acquired dynamically on demand. These nodes transparently handle the information exposed by the microkernel
Represent the information exposed by the devices, as it appears and disappears dynamically
Only proxy leaf nodes have data associated with them.
The sysctl nodes are natively identified using a management information base (MIB) style name, an OID, in the form of a unique array of integers.
Two sysctl system calls are provided:
Function | Description |
---|---|
sysctl() | Get/set a value identified by its OID |
sysctlbyname() | Get/set a value identified by its name |
For details, see the sysctl(1M) man page.
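The relationship between a symbolic name and its OID can be sketched with a small in-memory tree. This is a self-contained model, not the ChorusOS implementation: the `mib_node` type, the `mib_name_to_oid()` function, and all integer identifiers are invented for the example. Only the names `kern`, `dev`, `kern.mkstats.mem`, and `kern.mkstats.cpu` come from this document.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical tree node; the integer ids below are invented. */
typedef struct mib_node {
    const char *name;
    int id;
    const struct mib_node *children;
    size_t nchildren;
} mib_node;

static const mib_node mkstats_children[] = {
    { "mem", 1, NULL, 0 },
    { "cpu", 2, NULL, 0 },
};
static const mib_node kern_children[] = {
    { "mkstats", 7, mkstats_children, 2 },
};
static const mib_node root_children[] = {
    { "kern", 1, kern_children, 1 },
    { "dev",  2, NULL, 0 },
};

/* Resolve a dotted name such as "kern.mkstats.mem" into an array of
 * integers. Returns the OID length, or -1 if a component is unknown
 * or the oid array is too small. */
int mib_name_to_oid(const char *name, int *oid, size_t max)
{
    const mib_node *level = root_children;
    size_t nlevel = sizeof root_children / sizeof root_children[0];
    size_t depth = 0;

    while (*name) {
        size_t len = strcspn(name, ".");   /* length of this component */
        const mib_node *found = NULL;

        for (size_t i = 0; i < nlevel; i++) {
            if (strlen(level[i].name) == len &&
                strncmp(level[i].name, name, len) == 0) {
                found = &level[i];
                break;
            }
        }
        if (!found || depth == max)
            return -1;
        oid[depth++] = found->id;
        level = found->children;           /* descend one level */
        nlevel = found->nchildren;
        name += len;
        if (*name == '.')
            name++;
    }
    return (int)depth;
}
```

On a real system this translation is what distinguishes the two calls in the table: sysctlbyname() accepts the dotted name, while sysctl() expects the integer array.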
The sysctl() facility is used to expose the instrumentation information maintained by the device drivers. This information is retrieved via the Device Driver Manager (DDM).
The Device Driver Manager is a system component that enables a supervisor application to manage devices. Only the devices that export a management DDI interface or that have a parent that exports this DDI can be managed in this way. The DDM is an abstraction of the DKI and the management DDI.
The DDM is implemented as a set of functions that are organized in a library, and can only be used by one client at a time.
The DDM implements a tree of manageable devices with the following properties and features:
A device can be in one of the following three run states: DDM_RUNSTATE_ONLINE, DDM_RUNSTATE_OFFLINE, and DDM_RUNSTATE_INACTIVE.
Independently of its run state, a device is always in one of the following two availability states: DDM_AVSTATE_ENABLED or DDM_AVSTATE_DISABLED.
A device in an online state is able to audit its own health, and export some statistics (in addition to standard operation).
A device in an offline state can only perform internal diagnostics.
A device in the inactive state does not perform any operations, although it is able to change its state to another value. One of the purposes of the inactive (shut-down) state is to allow a property of the device to be changed in the device tree.
A device in the DDM_AVSTATE_ENABLED state is able to have a driver running to manage it. However, a device in the DDM_AVSTATE_DISABLED state is locked and no drivers can be started to manage it.
The state of a device is changed on request from the DDM client or by external events, such as hardware failure or device hot swap. In both cases the DDM client is notified of the successful state change through a handler (callback) that is defined at the time of opening.
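The interaction between run states, availability states, and the notification handler can be sketched as follows. The state names are the ones listed above, but their numeric values, the `device` structure, and the functions are assumptions made for this illustration and do not reproduce the svDdm*() interfaces. In particular, the rule that a disabled device can only be moved to the inactive state is one possible reading of "locked".

```c
#include <stddef.h>

/* State names follow the text; values and types are assumptions. */
typedef enum {
    DDM_RUNSTATE_ONLINE,
    DDM_RUNSTATE_OFFLINE,
    DDM_RUNSTATE_INACTIVE
} run_state;

typedef enum { DDM_AVSTATE_ENABLED, DDM_AVSTATE_DISABLED } av_state;

/* Callback registered by the single DDM client when it opens the DDM. */
typedef void (*state_handler)(run_state new_state, void *cookie);

typedef struct {
    run_state run;
    av_state av;
    state_handler handler;
    void *cookie;
} device;

/* Example handler: counts the notifications it receives. */
void count_changes(run_state new_state, void *cookie)
{
    (void)new_state;
    ++*(int *)cookie;
}

/* Request a run-state change. Returns 0 on success, -1 if refused. */
int device_set_run_state(device *d, run_state target)
{
    /* A disabled device is locked: no driver may be started for it. */
    if (d->av == DDM_AVSTATE_DISABLED && target != DDM_RUNSTATE_INACTIVE)
        return -1;
    d->run = target;
    if (d->handler)
        d->handler(target, d->cookie);  /* notify only on success */
    return 0;
}

/* Audits need an online device; diagnostics need an offline one. */
int device_can_audit(const device *d) { return d->run == DDM_RUNSTATE_ONLINE; }
int device_can_diag(const device *d)  { return d->run == DDM_RUNSTATE_OFFLINE; }
```

A client would typically take a device offline, run diagnostics, and bring it back online, checking the return value at each step.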
The initial internal device tree is built by taking all devices that satisfy the following criteria:
All devices that export the mngt DDI.
All devices that export the diag DDI.
All devices that have a bus parent that exports the mngt DDI. This means that the child drivers can be shut down or initialized via their bus parent.
The devices that are exposed via the DDM are:
All devices that have a parent (so that they can be shut down or reinitialized).
All diagnostic devices, as they are generally leaf devices, and not bus parent nodes.
The device tree API is summarized in the following table:
Function | Description |
---|---|
svDdmAudit() | Runs non-intrusive tests on an online device |
svDdmClose() | Closes a previously made connection to the device driver manager |
svDdmDiag() | Runs diagnostics on a node that is currently offline |
svDdmDisable() | Locks the specified device node in the disabled state |
svDdmEnable() | Enables a client to set the availability state of the specified device node to DDM_AVSTATE_ENABLED |
svDdmGetInfo() | Enables the client of the DDM to obtain information on the specified node in the manageable device tree |
svDdmGetState() | Enables the client of the DDM to get the state value of the specified node |
svDdmGetStats() | Returns raw I/O statistics (counters) for an online device |
svDdmOffline() | Enables the DDM client to set the run state of the specified node to DDM_RUNSTATE_OFFLINE |
svDdmOnline() | Enables the DDM client to set the run state of the specified node to DDM_RUNSTATE_ONLINE |
svDdmOpen() | Opens a connection to the device driver manager and obtains access to the management of the current device driver instances |
svDdmShutdown() | Enables the DDM client to request that the driver running on the specified node is shut down |
A number of sysctl() entries are present in the sysctl tree. Each device appears as a sysctl node that holds per-device information, under the top-level dev node. Available information about the device includes:
Per-device information is stored in a sysctl node whose name derives from the canonical physical pathname of the device.
This string holds the device class, if provided by the DDM. If no value is supplied, the content of this entry defaults to '?'.
The integer contains both the availability and run status of the device, as provided by the DDM.
This structure holds the device-class-specific statistics. Reading this node returns an error if the device does not export statistics.
Writing a magic value (1) to this entry triggers the diagnostic process of the device; reading it retrieves the result of the last diagnostic. An error may be returned if the device does not support diagnostics or if the diagnostics cannot run because the device is not in the appropriate state.
Similar to device diagnostics, this entry triggers the audit process and retrieves the result of the previous audit.
The SYSTEM_EVENTS feature enables a user-level application to be notified of the occurrence of events in the system and/or drivers. The following events are posted by the system and received by the application:
Gauges crossing their threshold
Creation or destruction of processes and, optionally, actors
File system mounts and unmounts
Detection of error in a driver
Detection of error in the operating system
System events are carried by messages that are placed in different queues, depending upon the kind of events. In the ChorusOS operating system, the system events feature relies on the MIPC microkernel feature. The maximum number of system events that can be queued by the system is fixed by a tunable, set when you build the system.
The system events feature is also available to user-level applications to exchange events and is not restricted to system-level communication.
In the context of system events, the following terms are defined:
An event is something that happens inside one entity corresponding to a change in the abstract state of that subsystem or application. Events are not generally observable from outside the entity, and do not necessarily correspond to a change in the actual state of the entity. The entity in which the event occurs can notify certain user applications of the occurrence.
An event publisher is the entity that notifies other entities of the occurrences of a particular set of events. Notification of occurrences of events can be made directly to interested entities or through an intermediary dispatcher. The events can be generic to a particular technology or specific to the event publisher.
An event subscriber is an entity that is interested in the occurrence of certain events. It can subscribe its interest directly with the event publisher or with some intermediary entity to receive event notifications.
An event buffer is passed from an event publisher to an event consumer to indicate that an event has occurred. The buffer includes information to describe the occurrence of an event in a particular publisher. The event buffer can be passed directly from the publisher to the consumer, or through an intermediary dispatcher.
At a minimum, an event is described by its event type, event identifier and publisher ID. These three fields combine to form the event buffer header. The goal is to provide a simple and flexible way to describe the occurrence of an event. If additional information is required to describe the event further, a publisher can provide a list of self-defined attributes. Event attributes contain an event attribute name/value pair that combine to define that attribute. Event attributes are used in event objects to provide self-defining data as part of the event buffer. The name of the event attribute is a character string. The event attribute value is a self-defining data structure that contains a data-type specifier and the appropriate union member to hold the value of the data specified.
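The layout described above, a fixed header followed by optional self-defining attributes, can be sketched with a tagged union. All type and function names here are hypothetical and chosen only for the illustration; the real event buffer is opaque and is accessed through the API summarized later in this section.

```c
#include <stdint.h>
#include <string.h>

/* Data-type specifier that says which union member is valid. */
typedef enum { ATTR_INT32, ATTR_STRING } attr_type;

/* One self-defining attribute: a name plus a typed value. */
typedef struct {
    const char *name;
    attr_type type;
    union {
        int32_t i32;
        const char *str;
    } value;
} event_attr;

/* Minimum description of an event. */
typedef struct {
    const char *type;       /* event type */
    uint64_t id;            /* event identifier */
    const char *publisher;  /* publisher ID */
} event_header;

typedef struct {
    event_header hdr;
    const event_attr *attrs;  /* optional publisher-defined attributes */
    size_t nattrs;
} event_buffer;

/* Return the attribute with the given name, or NULL if absent. */
const event_attr *event_find_attr(const event_buffer *ev, const char *name)
{
    for (size_t i = 0; i < ev->nattrs; i++) {
        if (strcmp(ev->attrs[i].name, name) == 0)
            return &ev->attrs[i];
    }
    return NULL;
}
```

A subscriber checks the `type` field of the attribute it finds before reading the union, which is what makes the data self-defining.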
Applications are provided with the libnvpair library, which handles the attribute list and provides a set of interfaces for manipulating name-value pairs. The operations supported by the library include adding and deleting name-value pairs, looking up values, and packing the list into contiguous memory to pass it to another address space. The packed and unpacked data formats are freshened internally. New data types and encoding methods can be added with backward compatibility.
To enable the code of this library to be linked to the Solaris kernel or to the ChorusOS operating system, the standard errno variable is not used to notify the caller that an error occurred. Error values are returned by the library functions directly.
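Two of the ideas above, packing pairs into contiguous memory and returning error values directly instead of through errno, can be illustrated with a toy encoder. This is not libnvpair and does not use its data format: `pair_pack()` and `pair_lookup()` are hypothetical names, only 32-bit integer values are supported, and values are stored in native byte order.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Append one (name, int32) pair to buf as: name bytes, NUL, 4-byte value.
 * *off is the current fill level and is advanced on success.
 * Returns 0 or an errno-style code directly; errno is never set. */
int pair_pack(unsigned char *buf, size_t cap, size_t *off,
              const char *name, int32_t value)
{
    size_t nlen = strlen(name) + 1;

    if (*off > cap || nlen + sizeof value > cap - *off)
        return ENOMEM;
    memcpy(buf + *off, name, nlen);
    memcpy(buf + *off + nlen, &value, sizeof value);
    *off += nlen + sizeof value;
    return 0;
}

/* Search a buffer produced by pair_pack() for a name.
 * Assumes the first len bytes are well formed. */
int pair_lookup(const unsigned char *buf, size_t len,
                const char *name, int32_t *out)
{
    size_t off = 0;

    while (off < len) {
        const char *n = (const char *)buf + off;
        size_t nlen = strlen(n) + 1;

        if (strcmp(n, name) == 0) {
            memcpy(out, buf + off + nlen, sizeof *out);
            return 0;
        }
        off += nlen + sizeof *out;
    }
    return ENOENT;
}
```

Because the packed form contains no pointers, the filled part of the buffer can be copied as a single block to another address space and searched there.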
The system events API is summarized in the following table:
Function | Description |
---|---|
sysevent_get_class_name() | Get the class name of the event |
sysevent_get_subclass_name() | Get the subclass name |
sysevent_get_size() | Get the event buffer size |
sysevent_get_seq() | Get the event sequence number |
sysevent_get_time() | Get the time stamp |
sysevent_free() | Free memory for system event handle |
sysevent_post_event() | Post a system event from userland |
sysevent_get_event() | Wait for a system event |
sysevent_get_attr_list() | Get the attribute list pointer |
sysevent_get_vendor_name() | Get the publisher vendor name |
sysevent_get_pub_name() | Get the publisher name |
sysevent_get_pid() | Get the publisher PID |
sysevent_lookup_attr() | Search the attribute list |
sysevent_attr_next() | Returns the next attribute associated with event |
sysevent_dup() | Duplicate a system event |
The OS_GAUGES module generates system events related to the OS component of the ChorusOS operating system, following alarms or signals generated by gauges, counters and thresholds. These system events are passed to the C_OS.
The OS_GAUGES module has no dedicated system calls, but rather reads and controls counters, gauges and thresholds through sysctl(), sysctlbyname(), and the /PROCFS file system.
For details, see the INSTRUMENTATION(5FEA) man page.
Statistics regarding the microkernel are provided to the C_OS by the MKSTAT module. Statistics for events such as alarms and creation or deletion of ChorusOS actors and POSIX processes are retrieved by sysctl and /proc and then grouped by function type in the MKSTAT module.
For details, see the INSTRUMENTATION(5FEA) man page.
The MKSTAT API is summarized in the following table:
Function | Description |
---|---|
mkStatMem() | Memory statistics |
mkStatSvPages() | Supervisor page statistics |
mkStatActors(), mkStatThreads() | Execution statistics |
mkStatCpu() | CPU statistics |
mkStatActorMem() | Per-actor statistics |
mkStatActorSvPages() | Supervisor per-actor statistics |
mkStatThreadCpu() | Per-thread statistics |
mkStatEvtCtrl() | Event control statistics |
mkStatEvtWait() | Events waiting statistics |
The C_OS implements the microkernel memory instrumentation via the sysctl kern.mkstats.mem node. The OS_GAUGES feature must be set to true.
Instrumentation related to memory use comprises the following measurements:
Function | Instrument Type | Description |
---|---|---|
physPagesEquiped() | Attribute | Measures the amount of physical pages of memory available on the node |
physPagesavail() | Gauge (low threshold) | Measures the amount of physical pages of memory currently available |
allocFailures() | Counter | Number of memory allocation failures since boot |
pageSize() | Attribute | Size in bytes of physical page |
The C_OS implements the microkernel supervisor page instrumentation via the sysctl kern.mkstats.svpages node. The OS_GAUGES feature must be set to true.
Instrumentation related to use of supervisor pages comprises the following measurement:
Function | Instrument Type | Description |
---|---|---|
svPages() | Gauge (high threshold) | Measures number of supervisor pages currently allocated |
The C_OS implements the microkernel execution instrumentation via the sysctl kern.mkstats.actors and kern.mkstats.threads nodes. The OS_GAUGES feature must be set to true.
Instrumentation related to microkernel execution function comprises the following measurements:
Function | Instrument Type | Description |
---|---|---|
maxActors() | Attribute | Measures the maximum number of actors that can be created |
actors() | Gauge (high threshold) | Measures the current number of actors in use |
maxThreads() | Attribute | Measures the maximum number of threads that can be created |
threads() | Gauge (high threshold) | Measures the current number of threads in use |
The C_OS implements the microkernel CPU instrumentation via the sysctl kern.mkstats.cpu node.
Instrumentation related to microkernel CPU use comprises the following measurements:
Function | Instrument Type | Description |
---|---|---|
total_cpu() | Counter | Measures the number of milliseconds CPU has been used since boot |
external() | Counter | Measures the number of milliseconds the CPU has been used outside execution actor since boot (similar to UNIX supervisor mode) |
internal() | Counter | Measures the number of milliseconds the CPU has been used inside execution actor supervisor mode since boot (similar to UNIX user mode) |
This basic instrumentation provides only raw measurements on top of which applications can compute ratios of CPU use according to their needs.
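As an example of such a computation, the sketch below derives two ratios from a pair of counter samples taken some time apart. The `cpu_sample` structure and both functions are hypothetical; the sketch also assumes a single CPU and that the caller measures the elapsed wall-clock time between the two samples itself.

```c
#include <stdint.h>

/* One reading of the three CPU counters, all in milliseconds since boot. */
typedef struct {
    uint64_t total_ms;     /* total_cpu counter */
    uint64_t external_ms;  /* external counter */
    uint64_t internal_ms;  /* internal counter */
} cpu_sample;

/* Percentage of the interval during which the CPU was in use.
 * 'a' is the earlier sample, 'b' the later one. */
double cpu_busy_percent(const cpu_sample *a, const cpu_sample *b,
                        uint64_t elapsed_ms)
{
    if (elapsed_ms == 0)
        return 0.0;
    return 100.0 * (double)(b->total_ms - a->total_ms) / (double)elapsed_ms;
}

/* Share of the busy time that was spent in external mode. */
double cpu_external_share(const cpu_sample *a, const cpu_sample *b)
{
    uint64_t busy = b->total_ms - a->total_ms;

    if (busy == 0)
        return 0.0;
    return 100.0 * (double)(b->external_ms - a->external_ms) / (double)busy;
}
```

Because the counters only ever increase, ratios are always computed from the difference between two samples rather than from a single reading.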
The C_OS implements the microkernel POSIX process instrumentation via the sysctl kern.mkstats.procs node.
Instrumentation related to microkernel processes comprises the following measurements:
Function | Instrument Type | Description |
---|---|---|
procs() | Gauge (high threshold) | Measures the current number of processes in use on the node |
nb_syscalls() | Counter | Counts the number of system calls performed since boot |
nb_syscalls_failures() | Counter | Counts the number of failed system calls since boot |
nb_fork_failures() | Counter | Counts the number of failed fork() system calls since boot |
The C_OS implements the microkernel file instrumentation via the sysctl kern.mkstats.files node.
Instrumentation related to microkernel file use comprises the following measurements:
Function | Instrument Type | Description |
---|---|---|
open_files() | Gauge (high threshold) | Measures the current number of open files |
vnodes() | Gauge (high threshold) | Current number of used virtual nodes ( |
The following instrumentation is available for each mounted file system:
Function | Instrument Type | Description |
---|---|---|
fs_status() | Attribute | Determines availability of threshold controls (for example, a read-only mounted file system has no threshold control) |
fs_max_size() | Attribute | Size of the file system in blocks |
fs_bsize() | Attribute | Size in bytes of the block |
fs_space_free() | Gauge (low threshold) | Number of blocks currently available in the file system |
fs_max_files() | Attribute | Maximum number of files that can be created on the file system |
fs_nb_files() | Gauge | Current number of files created on the file system |
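The block-based values in this table combine in a straightforward way. The following sketch, in which the `fs_sample` structure and function names are hypothetical, converts free blocks to bytes and computes the percentage of the file system in use.

```c
#include <stdint.h>

/* One reading of the per-file-system instrumentation values. */
typedef struct {
    uint64_t max_size;    /* fs_max_size: total size in blocks */
    uint64_t bsize;       /* fs_bsize: block size in bytes */
    uint64_t space_free;  /* fs_space_free: free blocks */
} fs_sample;

/* Free space in bytes. */
uint64_t fs_free_bytes(const fs_sample *s)
{
    return s->space_free * s->bsize;
}

/* Used space as an integer percentage of the total, rounded down. */
unsigned fs_used_percent(const fs_sample *s)
{
    if (s->max_size == 0)
        return 0;
    return (unsigned)(((s->max_size - s->space_free) * 100) / s->max_size);
}
```

For a file system of 1000 blocks of 512 bytes with 250 blocks free, these return 128000 bytes free and 75 percent used.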
For each actor or process currently active on the system, the following information is available to the C_OS via the stats entry of the process directory in the /proc file system:
Function | Instrument Type | Description |
---|---|---|
virtpages() | Gauge (high threshold) | Counts the number of virtual memory pages used by an actor |
physPages() | Simple Gauge | Counts the number of physical memory pages used by an actor |
lockPages() | Simple Gauge | Number of locked memory pages used by an actor |
process_virt_pages() | Gauge (high threshold) | Number of virtual memory pages used by a process |
process_phys_pages() | Simple Gauge | Number of physical memory pages used by a process |
process_lock_pages() | Simple Gauge | Number of locked memory pages used by a process |
open_files() | Gauge (high threshold) | Current number of open file descriptors |
internal_cpu() | Counter | Cumulated (all threads) internal CPU usage in milliseconds (similar to user mode) |
external_cpu() | Counter | Cumulated (all threads) external CPU usage in milliseconds (similar to system mode) |
For each thread currently active on the system, the following information is available via the stats entry of the process directory in the /proc file system:
Function | Instrument Type | Description |
---|---|---|
internal_cpu() | Counter | Internal CPU time spent in milliseconds (similar to user mode) |
external_cpu() | Counter | External CPU time spent in milliseconds (similar to supervisor mode) |
waiting_cnt() | Counter | Number of times the thread has been blocked |