ChorusOS 5.0 Application Developer's Guide

Software Black Box

The ChorusOS black box enables applications to log events to an event buffer and to read that buffer back. The black box feature can be defined as a set of microkernel ring buffers. Multiple black boxes can be configured, each identified by a unique integer identifier. The number and sizes of black boxes on your ChorusOS system are tunable. A specified black box can be frozen and subsequent events directed to another black box.
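The ring-buffer behavior described above can be illustrated with a small stand-alone model. This is only a sketch and not the ChorusOS implementation: the names ring_put and ring_get, the record layout, and the four-slot capacity are all hypothetical, chosen to show how a full ring overwrites its oldest event.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 4    /* hypothetical capacity; the real size is tunable */
#define MSG_MAX    32   /* hypothetical message limit */

/* Toy event record; the real black box record layout is not shown here. */
struct ring_event {
    int  severity;
    char msg[MSG_MAX];
};

struct ring {
    struct ring_event slot[RING_SLOTS];
    size_t next;    /* index of the slot the next event is written to */
    size_t count;   /* number of valid events, at most RING_SLOTS */
};

/* Append an event, overwriting the oldest one when the ring is full. */
void ring_put(struct ring *r, int severity, const char *msg)
{
    struct ring_event *e = &r->slot[r->next];

    assert(msg != NULL);
    e->severity = severity;
    strncpy(e->msg, msg, MSG_MAX - 1);
    e->msg[MSG_MAX - 1] = '\0';

    r->next = (r->next + 1) % RING_SLOTS;
    if (r->count < RING_SLOTS)
        r->count++;
}

/* Return the i-th oldest surviving event (0 is the oldest), or NULL. */
const struct ring_event *ring_get(const struct ring *r, size_t i)
{
    size_t oldest;

    if (i >= r->count)
        return NULL;
    oldest = (r->next + RING_SLOTS - r->count) % RING_SLOTS;
    return &r->slot[(oldest + i) % RING_SLOTS];
}
```

In this model, logging six events into a zero-initialized four-slot ring leaves only the last four readable, which is why freezing a black box promptly after a failure matters.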

Events are tagged with specific information (event producer identifier, event producer name, filter data, and timestamp). Some filtering is performed before events are inserted into a black box. Events can be filtered in or out based on the event producer name, and on the severity of the event. Descriptive event tags can be added to a black box along with each event to enable finer grained post-process filtering.

Black Box APIs

All black box APIs are thread-safe. Calling any one of these APIs produces valid results, even if the call is made concurrently by multiple threads. The bb_event and bb_freeze functions are async-signal-safe. This means that they can safely be called from a signal handler, and cannot be interrupted by a signal. No black box APIs are cancel-safe or interruptible.

Using the Black Box

At node startup, a specific number of black boxes are allocated. The node selects one of these black boxes to be active. This is performed sufficiently early in the boot process that all user applications are able to log events to it as soon as they start. Some black box parameters are configurable at node startup time, that is, before applications start running.

Before an application component manager launches an application, it prepares to catch the application in case it fails unexpectedly. If the application does fail, the component manager logs information about the failure to the black box, and triggers a freeze of the current black box.

Any application can log events in a black box at any point by calling bb_event. If multiple entities on the node are generating black box events, a specific thread may have fewer events than expected recorded in the black box, because all producers share the same ring buffer.

At some point, an event may occur which requires freezing the black box. This is accomplished by calling bb_freeze. When the current black box is frozen, a new black box is chosen (if one is available), and new events are routed into it.

The criteria by which event filtering is performed can be modified dynamically. See "Filtering APIs" for details on the capabilities of the filtering APIs.

A system process can locate a frozen black box, open it, and read its contents to diagnose a failure. After the failure has been diagnosed, the process calls bb_release to allow the system to reselect that black box.
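The freeze and release cycle described above can be modeled as a small state machine. The sketch below is illustrative only: pool_init, pool_freeze, and pool_release are hypothetical names, not the signatures of bb_freeze or bb_release, and the behavior when no free black box is available (no box is active until one is released) is an assumption made for the model.

```c
#include <assert.h>

#define NBOXES 3   /* hypothetical; the real number of black boxes is tunable */

enum box_state { BOX_FREE, BOX_ACTIVE, BOX_FROZEN };

struct box_pool {
    enum box_state state[NBOXES];
    int active;    /* index of the active box, or -1 if none is available */
};

/* At startup, every box is free except box 0, which is made active. */
void pool_init(struct box_pool *p)
{
    int i;

    for (i = 0; i < NBOXES; i++)
        p->state[i] = BOX_FREE;
    p->state[0] = BOX_ACTIVE;
    p->active = 0;
}

/*
 * Freeze the active box and select a free one, if any.
 * Returns the index of the box that was frozen, or -1 if none was active.
 */
int pool_freeze(struct box_pool *p)
{
    int frozen = p->active;
    int i;

    if (frozen < 0)
        return -1;
    p->state[frozen] = BOX_FROZEN;
    p->active = -1;
    for (i = 0; i < NBOXES; i++) {
        if (p->state[i] == BOX_FREE) {
            p->state[i] = BOX_ACTIVE;
            p->active = i;
            break;
        }
    }
    return frozen;
}

/* Release a frozen box so that it can be selected again. */
int pool_release(struct box_pool *p, int id)
{
    if (id < 0 || id >= NBOXES || p->state[id] != BOX_FROZEN)
        return -1;
    p->state[id] = BOX_FREE;
    if (p->active < 0) {          /* assumption: reuse it at once if needed */
        p->state[id] = BOX_ACTIVE;
        p->active = id;
    }
    return 0;
}
```

With three boxes, three consecutive freezes exhaust the pool in this model. Releasing a diagnosed box is what makes logging possible again.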

The black box API can also be used to store microkernel-level information. In this case the black box logs exceptions, traps, panics, and failed system calls. To use this feature, the kern.blackbox.kernelLogging tunable must be set to 1.

When the HOT_RESTART feature is selected, persistent memory can be used to store the black box data. To use persistent memory in this way, the BSP must be compiled with the -DBB_PERSISTENT flag.


Note -

When the BSP is compiled with the -DBB_PERSISTENT flag, the total size allocated for the black boxes is fixed and equally distributed for the specified number of black boxes.


Example 14-1 demonstrates the basic use of the black box API in an application.


Example 14-1 Basic Use of the Software Black Box

#include <sys/blackbox.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

#define SEV_LOW 4
#define SEV_HIGH 24
#define SEV_EMERG 31

int main(int argc, char **argv)
{
   int fd;

   /* Record startup; the tag "main" identifies where the event came from. */
   bb_event("main", SEV_LOW, "Starting with %d args", argc);

   if (argc > 1) {
       bb_event("main", SEV_LOW, "argv[1] = %s", argv[1]);
   } else {
       bb_event("main", SEV_EMERG, "no arguments - exiting");
       exit(1);
   }

   /* Log the failure with errno so that it can be diagnosed later. */
   if ((fd = open(argv[1], O_RDWR)) == -1) {
       bb_event("main", SEV_HIGH,
           "failed to open \"%s\": error %d", argv[1], errno);
   }
   ...
}


Note -

Depending on the node's filtering configuration, some of the events in this example may not be added to the black box.


Filtering APIs

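The filtering APIs include the interfaces discussed in this section: bb_setprodids, bb_getfilters, bb_setfilters, bb_getseverity, and bb_setseverity.

These interfaces enable three levels of pre-process event filtering: the filter list, the filtered severity bitmap, and the global severity bitmap.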

You can specify an event producer to use the filter list and filtered severity bitmap (also known as the fine grained filters) by identifying the producer in a set passed to bb_setprodids.

The filter list contains a set of pairs of tag and severity. These entities are described in the definition of bb_getseverity. An event is entered into the black box if:

The filter list can be updated by reading the current list with bb_getfilters, adding to it, subtracting from it, or modifying it, and then passing the new list to bb_setfilters.

The filtered severity bitmap is a node-wide severity bitmap. An event will be entered into the black box if:

The filtered severity bitmap can be updated by including a (tag, severity) filter whose tag is BB_ANY_TAG in a call to bb_setfilters.

The global severity bitmap is also a node-wide severity bitmap. If a call to bb_event does not find a match in the filter list or the filtered severity bitmap, or if the caller is not using these filters, the bb_event call will fall back to using the global severity bitmap. An event will be entered into the black box if the call to bb_event has a severity that is enabled in the global severity bitmap.

The global severity bitmap can be modified by calling bb_getseverity, modifying the bitmap, and passing the new bitmap to bb_setseverity.
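The "modify the bitmap" step can be pictured with a stand-in bitmap type. The sketch below is a model under stated assumptions, not the real bb_severity_t or its macros: it assumes severities range over 0 to 31 (SEV_EMERG is 31 in Example 14-1), that one bit enables one severity, and that a "limit" enables every severity at or above the given value, which is consistent with how BB_SEV_SET_LIMIT is used in the examples later in this section but is not confirmed by them. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SEV_MAX 31                /* assumption: severities are 0..31 */

typedef uint32_t sev_bitmap_t;    /* hypothetical stand-in for bb_severity_t */

/* Enable a single severity level. */
void sev_enable(sev_bitmap_t *m, unsigned sev)
{
    assert(sev <= SEV_MAX);
    *m |= (sev_bitmap_t)(UINT32_C(1) << sev);
}

/* Disable a single severity level. */
void sev_disable(sev_bitmap_t *m, unsigned sev)
{
    assert(sev <= SEV_MAX);
    *m &= (sev_bitmap_t)~(UINT32_C(1) << sev);
}

/* Return 1 if the severity level is enabled, 0 otherwise. */
int sev_is_enabled(sev_bitmap_t m, unsigned sev)
{
    return sev <= SEV_MAX && ((m >> sev) & 1u) != 0;
}

/*
 * Enable every severity at or above 'limit' and disable the rest.
 * This is the assumed meaning of a severity limit in this model.
 */
void sev_set_limit(sev_bitmap_t *m, unsigned limit)
{
    assert(limit <= SEV_MAX);
    *m = (sev_bitmap_t)(UINT32_MAX << limit);
}
```

In a real application the bitmap would be obtained with bb_getseverity, adjusted, and passed back with bb_setseverity, so that bits set by other parts of the system are preserved.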

The three filters are used in order, from the most to the least specific: the filter list, then the filtered severity bitmap, then the global severity bitmap.

If an event matches a filter in the filter list, it is added to the black box; otherwise it is tried against the filtered severity bitmap. If a match is found in the filtered severity bitmap, the event is added to the black box. If no match is found, it is tried against the global severity bitmap. A consequence of this approach is that there is no strict filter-out support. If the first filter tried does not match an event, another filter may find a match, even if the intent was for the more specific filter to filter the event out. In such a case, however, the filters as specified are inconsistent, and one consistent meaning should be chosen for them.
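The fall-through order can be sketched as a single decision function. This is a model, not the node's actual filtering code: the structure names are hypothetical, and because the exact match conditions are not spelled out here, the sketch assumes that a filter list entry matches when the tag is equal and the event's severity bit is set in that entry, and that each bitmap matches when the event's severity bit is set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical (tag, severity bitmap) pair, modeling one filter list entry. */
struct filter {
    const char *tag;
    uint32_t    sev_bits;
};

/* Hypothetical snapshot of the node's three filters. */
struct filter_config {
    const struct filter *filters;    /* the filter list */
    size_t               nfilters;
    uint32_t             filtered_sev;   /* filtered severity bitmap */
    uint32_t             global_sev;     /* global severity bitmap */
};

/* Return 1 if severity 'sev' is enabled in 'bits', 0 otherwise. */
int sev_on(uint32_t bits, unsigned sev)
{
    return sev < 32 && ((bits >> sev) & 1u) != 0;
}

/*
 * Return 1 if the event would be entered into the black box.
 * 'uses_fine_filters' models whether the producer was named in the set
 * passed to bb_setprodids.
 */
int event_accepted(const struct filter_config *c, int uses_fine_filters,
                   const char *tag, unsigned sev)
{
    size_t i;

    if (uses_fine_filters) {
        /* 1. Most specific: the filter list. */
        for (i = 0; i < c->nfilters; i++) {
            if (strcmp(c->filters[i].tag, tag) == 0 &&
                sev_on(c->filters[i].sev_bits, sev))
                return 1;
        }
        /* 2. The filtered severity bitmap. */
        if (sev_on(c->filtered_sev, sev))
            return 1;
    }
    /* 3. Least specific: the global severity bitmap. */
    return sev_on(c->global_sev, sev);
}
```

The model makes the lack of strict filter-out visible: an entry with an empty severity bitmap for a given tag does not suppress that tag's events, because an event that fails steps 1 and 2 is still accepted if its severity is enabled in the global bitmap.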

Using the Filtering APIs

The following example illustrates the use of tags in libraries and some filtering interfaces. It also shows what a black box dump should look like.


Example 14-2 Using the Filtering APIs

#include <sys/blackbox.h>
#include <stdlib.h>

#define SEV_TRACE 8
#define SEV_INIT  15
#define SEV_HIGH  28
#define SEV_OOM   30

int
foo_init(foo_t **fp, hat_t *hp)
{
  hat_t *nhp;
  bar_t *bp;
  uint_t total_reqs, i;

  *fp = NULL;

  if (hp == NULL) {
      bb_event("libA:foo_init", SEV_TRACE, "hat pointer is NULL, skipping");
      return (0);
  }

  /* Sum the requests over every hat in the list. */
  for (nhp = hp, total_reqs = 0; nhp != NULL; nhp = nhp->hat_next)
      total_reqs += nhp->hat_reqs;
  bb_event("libA:foo_init", SEV_TRACE, "total of %u requests", total_reqs);

  if (total_reqs == 0)
      return (0);

  *fp = (foo_t *)malloc(sizeof (foo_t));
  if (*fp == NULL) {
      bb_event("libA:foo_init", SEV_OOM, "out of memory for foo_t");
      return (-1);
  }

  bp = (bar_t *)malloc(total_reqs * sizeof (bar_t));
  if (bp == NULL) {
      bb_event("libA:foo_init", SEV_OOM,
               "no memory for %u bars", total_reqs);
      free(*fp);
      *fp = NULL;    /* do not hand a freed pointer back to the caller */
      return (-1);
  }

  (*fp)->foo_array = bp;
  (*fp)->foo_hatptr = hp;
  (*fp)->foo_reqcnt = total_reqs;

  /* Give each hat its own slice of the bar array. */
  for (nhp = hp; nhp != NULL; nhp = nhp->hat_next) {
      nhp->hat_array = bp;

      for (i = 0; i < nhp->hat_reqs; i++)
          bp[i].bar_id = i;

      bp += nhp->hat_reqs;
  }
  bb_event("libA:foo_init", SEV_INIT, "foo at %p created", *fp);

  return (0);
}

Within the program itself:

int process_hatlist(hat_t *hp)
{
  foo_t *fp;

  bb_event("process_hatlist", SEV_TRACE, "received hat list %p", hp);
		
  if (foo_init(&fp, hp) != 0) {
      bb_event("process_hatlist", SEV_HIGH,
               "can't make foo for hatlist %p", hp);
      return (-1);
  }

  if (fp != NULL) {
      foo_getreqs(fp);
      foo_enqueue(fp);
  }

  return (0);
}

Given the program from the previous example, the following examples illustrate filters and the potential black box traces that they may generate. (Note that this is simply an example trace format and may bear no resemblance to any output an administrative CLI program might display.)


Example 14-3 Filter and Trace Showing Successful Execution

Filter:

bb_filter_t filts[] = {
		{ 0, "libA:foo_init" },
};
bb_prodid_t prods[] = {
		{ BB_ALL_PROD, BB_ALL_PIDS }
};
bb_severity_t sev;
BB_SEV_SET_LIMIT(SEV_TRACE, &filts[0].bbf_severity);
BB_SEV_SET_LIMIT(SEV_HIGH, &sev);
bb_setfilters(filts, 1);
bb_setprodids(prods, 1);
bb_setseverity(sev);

Trace:

2000-03-15 18:56:02.928980 hatrouter[2583/1]: libA:foo_init(8):
		total of 93 requests
2000-03-15 18:56:02.929301 hatrouter[2583/1]: libA:foo_init(15):
		foo at 0x78a89380 created
2000-03-15 18:56:03.017327 foodb[2448/1]: libA:foo_init(8):
		total of 93 requests
2000-03-15 18:56:03.017936 foodb[2448/1]: libA:foo_init(15):
		foo at 0x300004bc8d8 created

In the previous example, the filter and trace indicate that foo_init was successful for both producers. No errors occurred, and the chain of 93 hat_ts was successfully passed from the hatrouter to the foodb. The number in parentheses after the tag libA:foo_init indicates the severity level of the event.



Example 14-4 Filter and Trace Showing Unsuccessful Execution

Filter:

bb_filter_t filts[] = {
		{ 0, BB_ALL_TAGS },
};
bb_prodid_t prods[] = {
		{ "hatrouter", BB_ALL_PIDS }
};
bb_severity_t sev;
BB_SEV_SET_LIMIT(SEV_TRACE, &filts[0].bbf_severity);
BB_SEV_SET_LIMIT(SEV_HIGH, &sev);
bb_setfilters(filts, 1);
bb_setprodids(prods, 1);
bb_setseverity(sev);

Trace:

2000-03-15 19:33:28.139821 hatrouter[2583/1]: process_hatlist(8):
		received hat list 0x7f78a190
2000-03-15 19:33:28.148295 hatrouter[2583/1]: libA:foo_init(30):
		no memory for 28173 bars
2000-03-15 19:33:28.148376 hatrouter[2583/1]: process_hatlist(28):
		can't make foo for hatlist 0x7f78a190

In the previous example, the filter and trace suggest that the hatrouter received an extremely long chain of hat_ts and could not allocate enough bar_ts to back them up.