Using System Web Interface to Diagnose Problems

The System Web Interface Dashboard provides a single-page overview of the status of a single Oracle Solaris instance. Select the name of the system to display information about that system's configuration. Select the Faults & Activity button to show FMA faults and alerts, along with system configuration changes such as SMF and audit events.

The Dashboard displays visualizations for key components of the system. These visualizations show current and recent performance, along with system faults and other events that are related to the statistics on the visualization. Each visualization has four subsections that show the data values for the following time ranges: last 7 days, last 24 hours, last hour, and last minute. Hover the mouse pointer over a data point to display a pop-up with the name, time stamp, and value of that data point. Related events are marked on the visualizations by blue triangular icons. Hover the mouse pointer over a triangle icon to display a pop-up with specific information about that event or alert. For example, a graph that shows a spike in CPU utilization might have an event icon at that same point that shows a CPU faulting or being taken offline.

The information on the Dashboard helps you quickly identify anomalies, which in turn indicate which system resources to examine to diagnose potential problems. Past system resource use is often a good indicator of whether the current behavior is anomalous.
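The idea of judging current behavior against past resource use can be illustrated with a minimal sketch. This is not part of the System Web Interface, which presents the comparison visually. The function name, the sample values, and the threshold of three standard deviations are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, current, threshold=3.0):
    """Return True if `current` is more than `threshold` standard
    deviations away from the mean of the historical samples.

    `history` is a hypothetical list of past utilization values, for
    example the value observed at the same time on previous days.
    """
    if len(history) < 2:
        # Not enough history to estimate a spread; make no claim.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past values were identical; any difference stands out.
        return current != mu
    return abs(current - mu) / sigma > threshold


# Hypothetical CPU utilization (percent) at this time on the last 7 days.
history = [22.0, 25.0, 24.0, 23.0, 26.0, 24.0, 25.0]

print(is_anomalous(history, 24.5))  # within the usual range
print(is_anomalous(history, 90.0))  # far outside the usual range
```

A value that is flagged this way is only a hint about where to look; the related sheets and event markers are still needed to determine the cause.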

If any graph looks anomalous, or if a resource in the graph is of particular interest, select that graph to open a sheet that contains more visualizations of related statistics and events. Each sheet is organized into sections, each section contains one or more groups, and each group contains one or more visualizations. Sections, groups, and visualizations have descriptions of the information that is displayed, possible causes of any anomalies, and references to other sheets and visualizations that might be helpful.

To review data on other sheets, select Oracle Solaris Analytics from the Applications menu, and select the thumbnail of the sheet that you want to view. The online help explains how to use the filtering options at the top of the page and how to mark a sheet as a favorite so that you can locate it more quickly.

If the time of the data values that you want to investigate is not shown on the new sheet, use the arrow buttons or time range button at the top of the sheet to adjust the view. To hold the display at that time range while you investigate, select the Pause All button at the top of the sheet.

Many of the predefined visualizations report utilization, saturation, and errors of key system resources.

Utilization is expressed as the currently achieved percentage of theoretical maximum throughput.
Saturation is the amount of work queued, that is, work that is waiting and not being actively serviced. Saturation is sometimes expressed as a multiple of the resource's maximum capacity.
Errors are expressed as occurrences per second.
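
The three definitions above can be sketched as simple calculations. The function names, parameters, and sample numbers below are hypothetical and are not taken from the System Web Interface or from any Oracle Solaris API. They only show how each metric relates to the raw quantities it is derived from.

```python
def utilization_pct(achieved_throughput, theoretical_max_throughput):
    """Utilization: achieved throughput as a percentage of the
    theoretical maximum throughput."""
    if theoretical_max_throughput <= 0:
        raise ValueError("theoretical maximum must be positive")
    return 100.0 * achieved_throughput / theoretical_max_throughput


def saturation_multiple(queued_work, capacity):
    """Saturation: queued (not yet serviced) work, expressed as a
    multiple of the resource's capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return queued_work / capacity


def error_rate(error_count, interval_seconds):
    """Errors: occurrences per second over the sampling interval."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return error_count / interval_seconds


# Hypothetical example values:
# a 10 Gbit/s link carrying 7.5 Gbit/s, 8 runnable threads waiting
# on a 4-CPU system, and 30 errors observed during 60 seconds.
print(utilization_pct(7.5, 10.0))   # 75.0
print(saturation_multiple(8, 4))    # 2.0
print(error_rate(30, 60))           # 0.5
```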

Other visualizations show these key metrics across all available resources for that measurement. Viewing all resources can provide a greater appreciation of the state of a particular resource. For example, high CPU utilization might be correlated with a backup event; you might see from historical data that similar utilization occurs every day at this time. Alternatively, high CPU utilization might be correlated with a fault event or other error, resulting in overload of remaining resources.

In addition to averages across resources, individual resource utilization is shown for many resources, enabling you to more effectively evaluate resources that are allocated directly to particular workloads. For example, an individual CPU might be allocated directly to a zone, and individual NICs are allocated to distinct networks.

Visualizations also provide partitioned views of statistics so that you can quickly identify whether individual resources are experiencing excessive utilization, saturation, or errors. The section "Viewing Different Aspects of the Same Data" shows an example of using a partitioned statistic to determine which applications are responsible for most of the network traffic on a system. Similarly, in addition to overall CPU usage data, you can select the zone partition of CPU usage data to determine which zones are using the most CPU. For example, one CPU might be overused because of one particular zone.
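
Conceptually, a partitioned view groups the same measurements by one attribute and ranks the groups. The following sketch illustrates that idea with invented sample records. The record layout, the field names `zone` and `cpu_seconds`, and the function name are assumptions for illustration only and do not reflect how the System Web Interface stores or retrieves its data.

```python
from collections import defaultdict


def top_consumers(samples, partition_key="zone", value_key="cpu_seconds"):
    """Sum `value_key` per distinct `partition_key` and return a list of
    (name, total, percent_of_all) tuples, largest consumer first."""
    totals = defaultdict(float)
    for sample in samples:
        totals[sample[partition_key]] += sample[value_key]

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, total, 100.0 * total / grand_total if grand_total else 0.0)
        for name, total in ranked
    ]


# Hypothetical CPU usage samples, each tagged with the zone that used it.
samples = [
    {"zone": "web", "cpu_seconds": 120.0},
    {"zone": "db", "cpu_seconds": 300.0},
    {"zone": "web", "cpu_seconds": 60.0},
    {"zone": "global", "cpu_seconds": 20.0},
]

for name, total, share in top_consumers(samples):
    print(f"{name:8s} {total:8.1f} s  {share:5.1f}%")
```

With these sample values, the `db` zone accounts for 60 percent of the CPU time, which is the kind of conclusion that the zone partition of the CPU usage visualization lets you reach at a glance.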

A single graph can show event data as well as statistical data. Events might include administrative changes to relevant resources, such as configuring a new datalink, or FMA events, such as a system service transitioning into the maintenance state.
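
Showing events on the same graph as a statistic makes it easy to see which events occurred close to an unusual value. The following sketch expresses that correlation in code, under stated assumptions: the tuple layouts, the five-minute window, the function name, and all sample data are hypothetical and are not part of the System Web Interface.

```python
from datetime import datetime, timedelta


def events_near_peak(samples, events, window=timedelta(minutes=5)):
    """Find the highest sample and return it with the events that
    occurred within `window` of that sample's time stamp.

    `samples` is a list of (timestamp, value) tuples.
    `events` is a list of (timestamp, description) tuples.
    """
    peak_time, peak_value = max(samples, key=lambda sample: sample[1])
    nearby = [
        event for event in events
        if abs(event[0] - peak_time) <= window
    ]
    return peak_time, peak_value, nearby


# Hypothetical CPU utilization samples (percent), one per minute.
start = datetime(2024, 1, 15, 10, 0)
values = [31.0, 33.0, 35.0, 97.0, 94.0, 40.0]
samples = [(start + timedelta(minutes=i), v) for i, v in enumerate(values)]

# Hypothetical events of the kind the graph marks with triangle icons.
events = [
    (datetime(2024, 1, 15, 9, 30), "datalink net1 configured"),
    (datetime(2024, 1, 15, 10, 2), "CPU 3 taken offline"),
]

peak_time, peak_value, nearby = events_near_peak(samples, events)
print(peak_time, peak_value)
for when, description in nearby:
    print(when, description)
```

Here only the CPU offline event falls within five minutes of the utilization peak, so it is the first candidate to investigate.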