ChorusOS 5.0 Features and Architecture Overview

System Instrumentation

The ChorusOS operating system provides instrumentation to inform applications of the current use of the various resources managed by the system. Several kinds of instrumentation are exported by the system:

Attributes:

Static read-only values that show how the system is configured. These attributes are usually tunable values set when you build your system.

Counters:

Values that only ever increase, such as the number of bytes transferred on a disk or the number of packets received on a network interface. Applications can only read counters, although some counters can also be reset.

Gauges:

Values that increase and decrease depending upon the activity of the system, such as the amount of memory used or the number of open file descriptors. Most gauges are associated with watermarks. The ChorusOS operating system manages one high and one low watermark per gauge. Gauges can only be read, while watermarks can be read or reset.

Thresholds:

Gauges with watermarks can also be associated with either a high or a low threshold, depending upon the semantics of the resource being instrumented. A threshold is represented by two values:

  • a rise value, such that when the gauge's value passes the rise value, a system event is generated and posted to the application level

  • a clear value, such that when the gauge's value passes the clear value, another system event is generated and posted to the application level

Rise and clear values are illustrated in the following figures:

Figure 3-6 Rise and Clear Values for a High Threshold


Figure 3-7 Rise and Clear Values for a Low Threshold


You can modify the rise and clear values of a threshold dynamically. At system initialization, thresholds are disabled until an application sets them explicitly.
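The rise and clear behavior of a high threshold can be sketched in C as a small state machine. This is an illustration only, not ChorusOS code: the type and function names are hypothetical, the sketch returns an event code where the real system would post a system event, and the exact comparisons (at or above the rise value, at or below the clear value) are an assumption about what "passes" means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event codes; the real system posts system events instead. */
enum threshold_event { EV_NONE, EV_RISE, EV_CLEAR };

/* State for one high threshold. The sketch assumes clear < rise. */
struct high_threshold {
    long rise;    /* reaching this value while not raised yields EV_RISE */
    long clear;   /* falling back to this value while raised yields EV_CLEAR */
    bool raised;  /* true between a rise event and the next clear event */
};

static void high_threshold_init(struct high_threshold *t, long rise, long clear)
{
    assert(t != NULL && clear < rise);
    t->rise = rise;
    t->clear = clear;
    t->raised = false;
}

/* Feed one gauge reading; returns the event that reading would trigger. */
static enum threshold_event high_threshold_update(struct high_threshold *t,
                                                  long gauge)
{
    if (!t->raised && gauge >= t->rise) {
        t->raised = true;
        return EV_RISE;
    }
    if (t->raised && gauge <= t->clear) {
        t->raised = false;
        return EV_CLEAR;
    }
    return EV_NONE;
}
```

Keeping the clear value below the rise value means a gauge that hovers around the rise value produces one rise event rather than a stream of them. A low threshold would mirror the comparisons.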

In addition, the system exposes a number of tunable values that you can modify dynamically to affect its behavior. These values might represent, for example, the maximum number of open file descriptors per process, or IP forwarding behavior.

The exposed values are given symbolic names according to a tree schema. A value can be accessed either by its symbolic name or through an object identifier (OID) obtained from that name. The API for getting and setting these values is based on the sysctl() facility defined by FreeBSD systems. See the following section for details.

The sysctl Facility

The sysctl facility allows the retrieval of information from the system, and allows processes with appropriate privileges to set system information.

The information available from sysctl consists of integers, strings, tables, or opaque data structures. This information is organized in a tree structure containing two types of node:

Proxy leaf nodes

Access data acquired dynamically on demand. These nodes transparently handle the information exposed by the microkernel.

Dynamically created nodes

Represent the information exposed by the devices, as it appears and disappears dynamically.

Only proxy leaf nodes have data associated with them.

The sysctl nodes are natively identified using a management information base (MIB) style name, an OID, in the form of a unique array of integers.
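The relationship between a symbolic name and its OID can be illustrated with a toy tree. The sketch below is not the ChorusOS implementation: the table, the node numbers, and the function name toy_name_to_oid() are all invented for the example. It only shows how a dotted name might be resolved, one component at a time, into an array of integers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_node {
    int parent;        /* index of the parent in toy_tree, -1 for top level */
    int id;            /* integer identifying this node among its siblings */
    const char *name;  /* symbolic name of this path component */
};

/* Hypothetical tree: names and numbers are invented for the example. */
static const struct toy_node toy_tree[] = {
    { -1, 1, "kern" },
    {  0, 7, "mkstats" },
    {  1, 2, "mem" },
    {  1, 3, "cpu" },
    { -1, 9, "dev" },
};

/* Resolve "a.b.c" into oid[]; returns the OID length, or -1 on failure. */
static int toy_name_to_oid(const char *name, int *oid, int max_len)
{
    int parent = -1;
    int depth = 0;

    assert(name != NULL && oid != NULL);
    while (*name != '\0') {
        size_t len = strcspn(name, ".");   /* length of this component */
        int found = -1;

        for (size_t i = 0; i < sizeof toy_tree / sizeof toy_tree[0]; i++) {
            if (toy_tree[i].parent == parent &&
                strlen(toy_tree[i].name) == len &&
                strncmp(toy_tree[i].name, name, len) == 0) {
                found = (int)i;
                break;
            }
        }
        if (found < 0 || depth >= max_len)
            return -1;                     /* unknown name or oid[] too small */
        oid[depth++] = toy_tree[found].id;
        parent = found;
        name += len;
        if (*name == '.')
            name++;                        /* skip the separator */
    }
    return depth > 0 ? depth : -1;
}
```

With this table, "kern.mkstats.cpu" resolves to the three-element array {1, 7, 3}.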

sysctl API

Two sysctl system calls are provided:

Function          Description
sysctl()          Get/set a value identified by its OID
sysctlbyname()    Get/set a value identified by its name

For details, see the sysctl(1M) man page.
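As an illustration of reading a value by name, the following C sketch wraps a call that has the same parameter list as FreeBSD's sysctlbyname(). It assumes the ChorusOS call follows that signature, which the text above suggests but does not spell out. So that the example can be built without ChorusOS headers, the call is passed in as a function pointer and exercised with a stand-in; read_int_node(), fake_sysctlbyname(), and the node name example.pageSize are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same parameter list as FreeBSD's sysctlbyname(). */
typedef int (*sysctlbyname_fn)(const char *name, void *oldp, size_t *oldlenp,
                               const void *newp, size_t newlen);

/* Read an int-valued node by name; returns 0 on success, -1 on failure. */
static int read_int_node(sysctlbyname_fn fn, const char *name, int *value)
{
    size_t len = sizeof *value;

    assert(fn != NULL && name != NULL && value != NULL);
    /* newp == NULL and newlen == 0 request a read without setting anything. */
    if (fn(name, value, &len, NULL, 0) != 0)
        return -1;
    return len == sizeof *value ? 0 : -1;
}

/* Stand-in for the real call: serves a single invented node. */
static int fake_sysctlbyname(const char *name, void *oldp, size_t *oldlenp,
                             const void *newp, size_t newlen)
{
    static const int page_size = 4096;

    (void)newp;
    (void)newlen;
    if (strcmp(name, "example.pageSize") != 0 || *oldlenp < sizeof page_size)
        return -1;
    memcpy(oldp, &page_size, sizeof page_size);
    *oldlenp = sizeof page_size;
    return 0;
}
```

On a system that provides the call, the real sysctlbyname would be passed in place of the stand-in, for example read_int_node(sysctlbyname, name, &value).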

Device Instrumentation and Management

The sysctl() facility is used to expose the instrumentation information maintained by the device drivers. This information is retrieved via the Device Driver Manager (DDM).

The Device Driver Manager is a system component that enables a supervisor application to manage devices. Only the devices that export a management DDI interface or that have a parent that exports this DDI can be managed in this way. The DDM is an abstraction of the DKI and the management DDI.

The DDM is implemented as a set of functions that are organized in a library, and can only be used by one client at a time.

The DDM implements a tree of manageable devices with the following properties and features:

Availability and run states are completely independent of each other, despite the fact that a disabled device may eventually be inactive.

The state of a device is changed on request from the DDM client or by external events, such as hardware failure or device hot swap. In both cases the DDM client is notified of the successful state change through a handler (callback) that is defined at the time of opening.

Device Tree

The initial internal device tree is built by taking all devices that satisfy the following criteria:

The tree of devices exposed by the DDM to its client is only a subset of the internal tree managed by the DDM. This in turn is a subset of the complete device tree for the current board. The way in which it is built is described in the preceding section.

The devices that are exposed via the DDM are:

The device tree API is summarized in the following table:

Function          Description
svDdmAudit()      Runs non-intrusive tests on an online device
svDdmClose()      Closes a previously made connection to the device driver manager
svDdmDiag()       Runs diagnostics on a node that is currently offline
svDdmDisable()    Locks the specified device node in the disabled state
svDdmEnable()     Enables a client to set the availability state of the specified device node to DDM_AVSTATE_ENABLED
svDdmGetInfo()    Enables the client of the DDM to obtain information on the specified node in the manageable device tree
svDdmGetState()   Enables the client of the DDM to get the state value of the specified node
svDdmGetStats()   Returns raw I/O statistics (counters) for an online device
svDdmOffline()    Enables the DDM client to set the run state of the specified node to DDM_RUNSTATE_OFFLINE
svDdmOnline()     Enables the DDM client to set the run state of the specified node to DDM_RUNSTATE_ONLINE
svDdmOpen()       Opens a connection to the device driver manager and obtains access to the management of the current device driver instances
svDdmShutdown()   Enables the DDM client to request that the driver running on the specified node is shut down

Related sysctl() entries

A number of sysctl() entries are present in the sysctl tree. Each device appears as a sysctl node that holds per-device information, under the top-level dev node. Available information about the device includes:

Name

Per-device information is stored in a sysctl node whose name derives from the canonical physical pathname of the device.

Class

This string holds the device class, if provided by the DDM. If no value is supplied, the content of this entry defaults to '?'.

Status

This integer contains both the availability state and the run state of the device, as provided by the DDM.

Statistics

This structure holds the device-class-specific statistics. Reading this node returns an error if the device does not export statistics.

Diagnostics

Writing a magic value (1) to this entry triggers the diagnostic process of the device; reading the entry retrieves the result of the last diagnostic. An error may be returned if the device does not support diagnostics, or if the diagnostics cannot run because the device is not in the appropriate state.

Audit

Similar to device diagnostics, this entry triggers the audit process and retrieves the result of the previous audit.

System Events

The SYSTEM_EVENTS feature enables a user-level application to be notified of the occurrence of events in the system and/or drivers. The following events are posted by the system and received by the application:

System events are carried by messages that are placed in different queues, depending upon the kind of events. In the ChorusOS operating system, the system events feature relies on the MIPC microkernel feature. The maximum number of system events that can be queued by the system is fixed by a tunable, set when you build the system.

The system events feature is also available to user-level applications to exchange events and is not restricted to system-level communication.

In the context of system events, the following terms are defined:

At a minimum, an event is described by its event type, event identifier, and publisher ID. These three fields combine to form the event buffer header, which provides a simple and flexible way to describe the occurrence of an event. If additional information is required to describe the event further, a publisher can provide a list of self-defined attributes. Each event attribute is a name/value pair, carried in the event object as self-defining data within the event buffer. The name of the event attribute is a character string. The event attribute value is a self-defining data structure that contains a data-type specifier and the appropriate union member to hold a value of the specified type.

The libnvpair library is provided to applications to handle the attribute list. It offers a set of interfaces for manipulating name-value pairs. The operations supported by the library include adding and deleting name-value pairs, looking up values, and packing the list into contiguous memory to pass it to another address space. The packed and unpacked data formats are freshened internally. New data types and encoding methods can be added with backward compatibility.

To enable the code of this library to be linked to the Solaris kernel or to the ChorusOS operating system, the standard errno variable is not used to notify the caller that an error occurred. Error values are returned by the library functions directly.
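The shape of a self-defining attribute value, and the convention of returning error values directly rather than through errno, can be sketched as follows. This is not the libnvpair API: the types, the function attr_lookup_int32(), and the error codes are hypothetical and exist only to illustrate a type specifier paired with a union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Data-type specifier stored alongside each value. */
enum attr_type { ATTR_INT32, ATTR_UINT64, ATTR_STRING };

/* One name/value pair; 'type' says which union member is valid. */
struct event_attr {
    const char *name;
    enum attr_type type;
    union {
        int32_t i32;
        uint64_t u64;
        const char *str;
    } value;
};

/* Hypothetical error values, returned directly instead of via errno. */
enum { ATTR_OK = 0, ATTR_NOT_FOUND = 1, ATTR_WRONG_TYPE = 2 };

/* Look up an int32 attribute by name in an array of n attributes. */
static int attr_lookup_int32(const struct event_attr *list, size_t n,
                             const char *name, int32_t *out)
{
    assert(list != NULL && name != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(list[i].name, name) == 0) {
            if (list[i].type != ATTR_INT32)
                return ATTR_WRONG_TYPE;  /* name exists, other data type */
            *out = list[i].value.i32;
            return ATTR_OK;
        }
    }
    return ATTR_NOT_FOUND;
}
```

Because the error is the return value, the same source can be linked into an environment that has no per-thread errno, which is the motivation given above.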

System Events API

The system events API is summarized in the following table:

Function                        Description
sysevent_get_class_name()       Get the class name of the event
sysevent_get_subclass_name()    Get the subclass name
sysevent_get_size()             Get the event buffer size
sysevent_get_seq()              Get the event sequence number
sysevent_get_time()             Get the time stamp
sysevent_free()                 Free memory for system event handle
sysevent_post_event()           Post a system event from userland
sysevent_get_event()            Wait for a system event
sysevent_get_attr_list()        Get the attribute list pointer
sysevent_get_vendor_name()      Get the publisher vendor name
sysevent_get_pub_name()         Get the publisher name
sysevent_get_pid()              Get the publisher PID
sysevent_lookup_attr()          Search the attribute list
sysevent_attr_next()            Returns the next attribute associated with event
sysevent_dup()                  Duplicate a system event

OS_GAUGES

The OS_GAUGES module generates system events related to the OS component of the ChorusOS operating system, following alarms or signals generated by gauges, counters and thresholds. These system events are passed to the C_OS.

The OS_GAUGES module has no dedicated system calls, but rather reads and controls counters, gauges and thresholds through sysctl(), sysctlbyname(), and the /PROCFS file system.

For details, see the INSTRUMENTATION(5FEA) man page.

Microkernel Statistics (MKSTAT)

Statistics regarding the microkernel are provided to the C_OS by the MKSTAT module. Statistics for events such as alarms and creation or deletion of ChorusOS actors and POSIX processes are retrieved by sysctl and /proc and then grouped by function type in the MKSTAT module.

For details, see the INSTRUMENTATION(5FEA) man page.

MKSTAT API

The MKSTAT API is summarized in the following table:

Function              Description
mkStatMem()           Memory statistics
mkStatSvPages()       Supervisor page statistics
mkStatActors()        Execution statistics
mkStatThreads()       Execution statistics
mkStatCpu()           CPU statistics
mkStatActorMem()      Per-actor statistics
mkStatActorSvPages()  Supervisor per-actor statistics
mkStatThreadCpu()     Per-thread statistics
mkStatEvtCtrl()       Event control statistics
mkStatEvtWait()       Events waiting statistics

Microkernel Memory Instrumentation

The C_OS implements the microkernel memory instrumentation via the sysctl kern.mkstats.mem node. The OS_GAUGES feature must be set to true.

Instrumentation related to memory use comprises the following measurements:

Function                Instrument Type         Description
physPagesEquiped()      Attribute               Measures the number of physical pages of memory available on the node
physPagesavail()        Gauge (low threshold)   Measures the number of physical pages of memory currently available
allocFailures()         Counter                 Number of memory allocation failures since boot
pageSize()              Attribute               Size in bytes of a physical page

Microkernel Supervisor Page Instrumentation

The C_OS implements the microkernel supervisor page instrumentation via the sysctl kern.mkstats.svpages node. The OS_GAUGES feature must be set to true.

Instrumentation related to use of supervisor pages comprises the following measurement:

Function                Instrument Type         Description
svPages()               Gauge (high threshold)  Measures number of supervisor pages currently allocated

Microkernel Execution Instrumentation

The C_OS implements the microkernel execution instrumentation via the sysctl kern.mkstats.actors and kern.mkstats.threads nodes. The OS_GAUGES feature must be set to true.

Instrumentation related to microkernel execution function comprises the following measurements:

Function                Instrument Type         Description
maxActors()             Attribute               Measures the maximum number of actors that can be created
actors()                Gauge (high threshold)  Measures the current number of actors in use
maxThreads()            Attribute               Measures the maximum number of threads that can be created
threads()               Gauge (high threshold)  Measures the current number of threads in use

Microkernel CPU Instrumentation

The C_OS implements the microkernel CPU instrumentation via the sysctl kern.mkstats.cpu node.

Instrumentation related to microkernel CPU use comprises the following measurements:

Function                Instrument Type         Description
total_cpu()             Counter                 Measures the number of milliseconds the CPU has been used since boot
external()              Counter                 Measures the number of milliseconds the CPU has been used outside execution actor since boot (similar to UNIX supervisor mode)
internal()              Counter                 Measures the number of milliseconds the CPU has been used inside execution actor supervisor mode since boot (similar to UNIX user mode)

This basic instrumentation provides only raw measurements on top of which applications can compute ratios of CPU use according to their needs.
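As an example of computing such ratios, the C sketch below takes two samples of the raw counters and derives percentages from the differences. The struct and function names are hypothetical, and the sketch assumes that total_cpu counts busy time on a single CPU, so that dividing by the elapsed wall-clock interval gives a utilization figure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One reading of the three raw counters, all in milliseconds since boot. */
struct cpu_sample {
    uint64_t total_ms;
    uint64_t external_ms;
    uint64_t internal_ms;
};

/* Share of the CPU time used between two samples that was internal time.
 * Returns a percentage, or -1 if no CPU time was used in between. */
static int internal_share_percent(const struct cpu_sample *before,
                                  const struct cpu_sample *after)
{
    assert(before != NULL && after != NULL);
    uint64_t total = after->total_ms - before->total_ms;
    if (total == 0)
        return -1;
    return (int)((after->internal_ms - before->internal_ms) * 100 / total);
}

/* CPU use over a wall-clock interval measured by the caller, in percent.
 * Returns -1 if the interval is zero. */
static int cpu_busy_percent(const struct cpu_sample *before,
                            const struct cpu_sample *after,
                            uint64_t interval_ms)
{
    assert(before != NULL && after != NULL);
    if (interval_ms == 0)
        return -1;
    return (int)((after->total_ms - before->total_ms) * 100 / interval_ms);
}
```

Working on differences between samples matters because the counters accumulate from boot; a ratio of the raw values would describe the whole uptime rather than the recent interval.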

POSIX Process Instrumentation

The C_OS implements the microkernel POSIX process instrumentation via the sysctl kern.mkstats.procs node.

Instrumentation related to microkernel processes comprises the following measurements:

Function                Instrument Type         Description
procs()                 Gauge (high threshold)  Measures the current number of processes in use on the node
nb_syscalls()           Counter                 Counts the number of system calls performed since boot
nb_syscalls_failures()  Counter                 Counts the number of failed system calls since boot
nb_fork_failures()      Counter                 Counts the number of failed fork() system calls since boot

File Instrumentation

The C_OS implements the microkernel file instrumentation via the sysctl kern.mkstats.files node.

Instrumentation related to microkernel file use comprises the following measurements:

Function                Instrument Type         Description
open_files()            Gauge (high threshold)  Measures the current number of open files
vnodes()                Gauge (high threshold)  Current number of used virtual nodes (vnodes)

Per-File System Instrumentation

The following instrumentation is available for each mounted file system:

Function                Instrument Type         Description
fs_status()             Attribute               Determines availability of threshold controls (for example, a read-only mounted file system has no threshold control)
fs_max_size()           Attribute               Size of the file system in blocks
fs_bsize()              Attribute               Size in bytes of the block
fs_space_free()         Gauge (low threshold)   Number of blocks currently available in the file system
fs_max_files()          Attribute               Maximum number of files that can be created on the file system
fs_nb_files()           Gauge                   Current number of files created on the file system
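Because the sizes above are expressed in blocks, an application that wants bytes or a fill percentage has to combine several values. The following C sketch shows the arithmetic; the struct and function names are hypothetical, and the values are assumed to have been read already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values read for one mounted file system. */
struct fs_sample {
    uint64_t max_size_blocks;    /* fs_max_size: total size in blocks */
    uint64_t bsize_bytes;        /* fs_bsize: block size in bytes */
    uint64_t space_free_blocks;  /* fs_space_free: blocks currently free */
};

/* Free space in bytes: free blocks multiplied by the block size. */
static uint64_t fs_free_bytes(const struct fs_sample *fs)
{
    assert(fs != NULL);
    return fs->space_free_blocks * fs->bsize_bytes;
}

/* Used space in percent, or -1 if the sample is inconsistent. */
static int fs_used_percent(const struct fs_sample *fs)
{
    assert(fs != NULL);
    if (fs->max_size_blocks == 0 ||
        fs->space_free_blocks > fs->max_size_blocks)
        return -1;
    uint64_t used = fs->max_size_blocks - fs->space_free_blocks;
    return (int)(used * 100 / fs->max_size_blocks);
}
```

For a file system of 1000 blocks of 512 bytes with 250 blocks free, this gives 128000 free bytes and 75 percent used.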

Per-Actor and Per-Process Instrumentation

For each actor or process currently active on the system, the following information is available to the C_OS via the stats entry of the process directory in the /proc file system:

Function                Instrument Type         Description
virtpages()             Gauge (high threshold)  Counts the number of virtual memory pages used by an actor
physPages()             Simple Gauge            Counts the number of physical memory pages used by an actor
lockPages()             Simple Gauge            Number of locked memory pages used by an actor
process_virt_pages()    Gauge (high threshold)  Number of virtual memory pages used by a process
process_phys_pages()    Simple Gauge            Number of physical memory pages used by a process
process_lock_pages()    Simple Gauge            Number of locked memory pages used by a process
open_files()            Gauge (high threshold)  Current number of open file descriptors
internal_cpu()          Counter                 Cumulated (all threads) internal CPU usage in milliseconds (similar to user mode)
external_cpu()          Counter                 Cumulated (all threads) external CPU usage in milliseconds (similar to system mode)

Microkernel Per-Thread Instrumentation

For each thread currently active on the system, the following information is available via the stats entry of the process directory in the /proc file system:

Function                Instrument Type         Description
internal_cpu()          Counter                 Internal CPU time spent in milliseconds (similar to user mode)
external_cpu()          Counter                 External CPU time spent in milliseconds (similar to supervisor mode)
waiting_cnt()           Counter                 Number of times the thread has been blocked