NAME
er_kernel - generate an Analyzer experiment on the Solaris
kernel
SYNOPSIS
er_kernel args [load-command]
AVAILABILITY
Solaris systems with DTrace supported
DESCRIPTION
The er_kernel command can generate an experiment from the
Solaris kernel, using the DTrace functionality provided with
some Solaris releases. The data may be examined with a GUI
program, analyzer, or a command-line version, er_print.
The er_kernel command may be used only by a user with DTrace
privileges.
If an optional command to provide a load is given, er_kernel
forks, and the child sleeps for a quiet period, then exe-
cutes the command to provide a load. When the child exits,
er_kernel continues for another quiet period, and then
exits. The duration of the quiet period may be specified by
a -q argument. The load command is launched as specified,
and may either be a command or a shell script. If it is a
script, it should wait for any commands it spawns to ter-
minate before exiting, or the experiment may be terminated
prematurely.
If an optional -t argument is given, er_kernel will collect
data according to the -t argument, and then exit.
If neither is specified, er_kernel will run until ter-
minated. It may always be terminated by ctrl-C (SIGINT), or
by using the kill command and sending SIGINT, SIGQUIT, or
SIGTERM to the er_kernel process.
ARGUMENTS
If invoked with no arguments, print a usage message.
If invoked with -h without any other arguments, and if the
processor supports hardware counter overflow profiling,
print two lists containing information about hardware
counters. The first list contains "aliased" hardware
counters; the second list contains raw hardware counters.
For more details, see the "Hardware Counter Overflow Profil-
ing" section in the collect.1 man page.
-p option
Collect clock-based profiles. The allowed values of
option are:
Value     Meaning
off       turn off clock-based profiling
on        turn on clock-based profiling with the default
          profiling interval of approximately 10 milliseconds
lo[w]     turn on clock-based profiling with the low-resolution
          profiling interval of approximately 100 milliseconds
hi[gh]    turn on clock-based profiling with the high-resolution
          profiling interval of approximately 1 millisecond
n         turn on clock-based profiling with a profiling interval
          of n. The value may be an integer or floating-point
          number, with a suffix of u specifying microseconds, or
          m specifying milliseconds. If no suffix is used, the
          value is assumed to be in milliseconds.
If the value is smaller than the system clock profiling
minimum, it is set to the minimum; if it is not a
multiple of the clock profiling resolution, it is rounded
down to the nearest multiple of the clock profiling
resolution. If it exceeds the clock profiling maximum,
or if it is negative, an error is reported. If it is
zero, clock profiling is turned off.
The DTrace profile provider, used to obtain the data,
is available only for integer values in ticks per
second. The value specified will be converted to an
integer rate, and then converted back to the time
corresponding to the actual rate used.
If no explicit -p off argument is given, and hardware-
counter overflow profiling is not turned on, clock-
based profiling is turned on by default.
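The interval handling described above can be sketched as follows.
This is an illustrative model only, not er_kernel's code: the
function names are hypothetical, rounding to the nearest integer
rate is an assumption (the text says only that the value is
converted to an integer rate), and the system-specific minimum,
maximum, and resolution checks are left out.

```python
# Hypothetical model of -p interval parsing; not the er_kernel source.
PRESETS_USEC = {
    "on": 10_000.0,                       # approximately 10 milliseconds
    "lo": 100_000.0, "low": 100_000.0,    # approximately 100 milliseconds
    "hi": 1_000.0, "high": 1_000.0,       # approximately 1 millisecond
}


def parse_interval_usec(value: str) -> float:
    """Return the requested interval in microseconds (0.0 means off)."""
    if value == "off":
        return 0.0
    if value in PRESETS_USEC:
        return PRESETS_USEC[value]
    if value.endswith("u"):
        number, scale = value[:-1], 1.0        # microseconds
    elif value.endswith("m"):
        number, scale = value[:-1], 1000.0     # milliseconds
    else:
        number, scale = value, 1000.0          # no suffix: milliseconds
    usec = float(number) * scale
    if usec < 0:
        raise ValueError("profiling interval must not be negative")
    return usec


def effective_interval(usec: float) -> tuple[int, float]:
    """Convert to an integer rate in ticks per second, then back to time.

    Assumption: the rate is rounded to the nearest integer.
    """
    rate_hz = max(1, round(1_000_000 / usec))
    return rate_hz, 1_000_000 / rate_hz
```

Under these assumptions a request for 3 milliseconds becomes a rate
of 333 ticks per second, so the interval actually used is slightly
longer than the one requested.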
-h option Collect hardware-counter overflow profiles (using
the DTrace cpc provider). The option is specified
as for the collect command. Hardware-counter pro-
filing is not available on systems prior to Oracle
Solaris 11. If the overflow mechanism on the chip
allows the kernel to tell which counter
overflowed, as many counters as the chip provides
may be used; otherwise, only one counter may be
specified. Dataspace profiling is not supported,
and dataspace requests are ignored.
The system hardware-counter mechanism can be used
by multiple processes for user profiling, but
cannot be used for kernel profiling if any user
process, cputrack, or another er_kernel is using
the mechanism. In that case, er_kernel will
report "HW counter profiling is not supported on
this system."
-F option Provide system-wide profiling, including the kernel
and applications. Control whether or not application
processes should have their data recorded.
The allowed values of option are:
Value      Meaning
off        Do not record experiments on application processes;
           record on the kernel only (default).
on         Record experiments on all application processes as
           well as the kernel.
all        Record experiments on all application processes as
           well as the kernel.
=<regexp>  Record experiments on processes whose name or PID
           matches the regular expression. See "SYSTEM-WIDE
           PROFILING", below.
-T { pid/tid | 0/did }
-T is no longer supported.
-t duration
Collect data for the specified duration. duration
may be a single number, followed by either m,
specifying minutes, or s, specifying seconds
(default), or two such numbers separated by a -
sign. If one number is given, data will be col-
lected from the start of the run until the given
time; if two numbers are given, data will be col-
lected from the first time to the second. If the
second time is zero, data will be collected until
the end of the run. If two non-zero numbers are
given, the first must be less than the second.
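The -t rules above can be modelled with a short sketch. It is an
illustration under stated assumptions, not er_kernel's parser: the
function names are hypothetical, fractional values are accepted
only for convenience, and None stands for "until the end of the
run".

```python
# Hypothetical model of -t duration parsing; not the er_kernel source.
from typing import Optional, Tuple


def _to_seconds(token: str) -> float:
    """Convert one number with an optional m or s suffix to seconds."""
    if token.endswith("m"):
        return float(token[:-1]) * 60.0
    if token.endswith("s"):
        return float(token[:-1])
    return float(token)                 # seconds is the default unit


def parse_duration(spec: str) -> Tuple[float, Optional[float]]:
    """Return (start, end) in seconds for a -t argument."""
    first, dash, second = spec.partition("-")
    if not dash:
        # One number: collect from the start of the run until that time.
        return 0.0, _to_seconds(first)
    start, end = _to_seconds(first), _to_seconds(second)
    if end == 0:
        # A second time of zero means "until the end of the run".
        return start, None
    if start >= end:
        raise ValueError("the first time must be less than the second")
    return start, end
```

For example, a specification of 10-1m maps to collection from 10
seconds to 60 seconds into the run, and 5-0 maps to collection
from 5 seconds until the run ends.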
-q duration
Enforce a quiet period of length duration
(seconds) before and after running the specified
load. Default duration is 3 seconds. The quiet
period is ignored if no load is specified.
-S interval
Collect periodic samples at the interval specified
(in seconds). If interval is zero, do not collect
periodic samples. By default, enable periodic
sampling at 1-second intervals. The data recorded
in the samples is data for the er_kernel process,
and includes a timestamp and execution statistics
from the kernel, among other things. Samples are
markers within the data, and can be used for
filtering.
-C comment
Put the comment, either a single token, or a
quoted string, into the experiment. Up to ten
comments may be provided.
-o experiment_name
Use experiment_name as the name of the experiment
to be recorded. The experiment_name string must
end in the string .er; if not, report an error,
and do not run the experiment.
If -o is not specified, choose a name of the form
stem.n.er, where stem is a string, and n is a
number. If a -g argument is given, use the string
appearing before the .erg suffix in the group name
as the stem prefix; if no -g argument is given,
set the stem prefix to the string ktest.
If the name is not specified in the form
stem.n.er, and the given name is in use, print
an error message and do not run the experiment. If
the name is of that form, and the name is in use,
record the experiment under a name corresponding
to the first available value of n that is not in
use; issue a warning if the name is changed.
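The naming rules for -o can be summarized in a small sketch. It
is a model under assumptions, not er_kernel's implementation: the
function name and the in_use set are hypothetical, default names
are assumed to start at n = 1, and the search for a free n is
assumed to count upward from the requested value.

```python
# Hypothetical model of experiment-name selection; not the er_kernel source.
import re
from typing import Optional, Set

_STEM_N_ER = re.compile(r"^(?P<stem>.+)\.(?P<n>\d+)\.er$")


def resolve_name(requested: Optional[str], in_use: Set[str],
                 group: Optional[str] = None) -> str:
    """Pick the experiment name given -o (requested) and -g (group)."""
    if requested is None:
        # No -o: the stem comes from the group name, or is "ktest".
        stem = group[:-len(".erg")] if group else "ktest"
        n = 1                                   # assumed starting value
    else:
        if not requested.endswith(".er"):
            raise ValueError("experiment name must end in .er")
        match = _STEM_N_ER.match(requested)
        if match is None:
            # Not of the form stem.n.er: a name clash is an error.
            if requested in in_use:
                raise FileExistsError(requested)
            return requested
        stem, n = match.group("stem"), int(match.group("n"))
    # Form stem.n.er: advance n until the name is free (assumed upward).
    while f"{stem}.{n}.er" in in_use:
        n += 1
    return f"{stem}.{n}.er"
```

A caller comparing the returned name with the requested one can
tell when the name was changed, which is the case in which a
warning is issued.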
-l signal Record a sample point whenever the given signal is
delivered to the er_kernel process.
-y signal[,r]
Control recording of data with signal. Whenever
the given signal is delivered to the er_kernel
process, switch between paused (no data is
recorded) and resumed (data is recorded) states.
er_kernel is started in the resumed state if the
optional ,r flag is given, otherwise it is started
in the paused state. This option shall not affect
the recording of sample points.
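The pause and resume behaviour of -y amounts to a two-state
toggle, sketched below. The class and function names are
hypothetical, and the signal is treated as an opaque string
because the accepted signal spellings are not described here.

```python
# Hypothetical model of the -y signal[,r] toggle; not the er_kernel source.
from dataclasses import dataclass


@dataclass
class Recorder:
    signal: str        # signal given to -y, kept as an opaque string
    recording: bool    # True = resumed, False = paused

    def on_signal(self, delivered: str) -> bool:
        """Switch state when the -y signal arrives; return the new state."""
        if delivered == self.signal:
            self.recording = not self.recording
        return self.recording


def parse_y_option(value: str) -> Recorder:
    """Parse 'signal' or 'signal,r' into the initial recorder state."""
    signal, _, flag = value.partition(",")
    if flag not in ("", "r"):
        raise ValueError("only the optional ,r flag is recognized")
    # Started resumed only if ,r was given; otherwise started paused.
    return Recorder(signal=signal, recording=(flag == "r"))
```

In this model, a recorder created without ,r discards data until
the first delivery of the signal, and each later delivery flips
the state again.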
-d directory_name
Place the experiment in directory directory_name.
If none is given, record into the current working
directory.
-g group_name
Consider the experiment to be part of experiment
group group_name. The group_name string must end
in the string .erg; if not, report an error, and
do not run the experiment.
-L size Limit the amount of profiling and tracing data
recorded to size megabytes. The limit applies to
the sum of all profiling data and tracing data,
but not to sample points. The limit is only
approximate, and can be exceeded. Terminate the
experiment when the limit is reached. The allowed
values of size are:
Value              Meaning
unlimited or none  Do not impose a size limit on the experiment.
n                  Impose a limit of n MB; n must be greater than
                   zero.
There is no default limit on the amount of data
recorded.
-A option Control whether or not the kernel modules used
during the run are copied into the recorded exper-
iment. The allowed values of option are:
Value     Meaning
on        Archive the kernel modules.
off       Do not archive the kernel modules into the experiment.
copy      Copy the kernel modules into the experiment and archive
          them.
To copy experiments onto a different machine, or
read them from a different machine, the user
should specify -A copy.
The default setting for -A is copy.
-n Dry run: do not collect data, but print all the
details of the experiment that would be run. Turn
on -v.
-V Print the current version. No further arguments
are examined, and no further processing is done.
-v Print detailed information about the experiment
being run, including the current version.
SYSTEM-WIDE PROFILING
If the -F argument is used to follow user processes, a
subexperiment is created for each followed user process
detected during the er_kernel experiment. Each subexperiment
records data only when the process is in user mode, and
records only the user callstack.
The subexperiments are named as follows:
_process-name_PID_process-pid.1.er
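As a sketch of how the -F setting and the naming pattern fit
together, the following model decides whether a process is
followed and builds its subexperiment name. The function names
are hypothetical, and the use of an unanchored search against the
process name and the decimal PID is an assumption; the text says
only that the name or PID must match the regular expression.

```python
# Hypothetical model of -F selection and subexperiment naming.
import re


def is_followed(f_option: str, name: str, pid: int) -> bool:
    """Return True if a process gets a subexperiment under -F f_option."""
    if f_option == "off":
        return False
    if f_option in ("on", "all"):
        return True
    if f_option.startswith("="):
        pattern = re.compile(f_option[1:])
        # Assumption: an unanchored search on the name or the PID.
        return bool(pattern.search(name) or pattern.search(str(pid)))
    raise ValueError(f"unrecognized -F value: {f_option!r}")


def subexperiment_name(name: str, pid: int) -> str:
    """Build a name of the form _process-name_PID_process-pid.1.er."""
    return f"_{name}_PID_{pid}.1.er"
```

With a made-up process myapp and PID 1234, the subexperiment
would be named _myapp_PID_1234.1.er.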
DATA RECORDED
Clock Profiling
Clock profiling experiments support two metrics: "KCPU
Cycles" (metric name kcycles), for clock profile events
recorded in the kernel founder experiment, and "KUCPU
Cycles" (metric name kucycles), for clock profile events
recorded in user process subexperiments when the CPU is in
user mode. Data is recorded on a per-CPU basis: the CPU
number is recorded as the CPU, the PID of the process on
behalf of which the kernel is running is recorded as the
LWPID, and the kernel thread ID is recorded as the thread
in the raw data.
The kernel founder experiment will contain data for the
kcycles metric. When the CPU is in system-mode, the
kernel callstacks are recorded; when the CPU is idle, a
single-frame callstack for the pseudo-function named:
<IDLE>
is recorded; when the CPU is in user-mode, a single-
frame callstack attributed to the pseudo-function
named:
<process-name_PID_process-pid>
is recorded. In the kernel experiment, no callstack
information on the user processes is recorded.
If -F is used to specify following user processes, the
subexperiments for each followed process will contain
data for the kucycles metric. User-level callstacks
will be recorded for all clock profile events where
that process was running in user mode.
Hardware Counter Profiling
Hardware counter profiles are recorded with the metric
for the named counter, using the system callstacks as
described above for clock-profiling experiments, in the
founder experiment, and user callstacks in the user-
process subexperiments.
NOTE: Because the same metric is used in both the
founder experiment and the user-process subexperiments,
the HW counter metric will double count when the
CPU is in user mode. It will count against the user
callstacks in the user subexperiments, and against the
pseudo-function representing that process in the
founder experiment. To avoid the double counting,
filter the data to see only the kernel experiment or
only one or more user experiments.
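The double counting can be seen with a small arithmetic sketch.
All names and counts below are made up for illustration; they are
not output from er_kernel or er_print.

```python
# Made-up hardware-counter event counts for one followed process,
# "myapp" with PID 1234. None of these values come from a real run.
founder = {
    "<myapp_PID_1234>": 700,   # user-mode events, charged to the pseudo-function
    "kernel_func_a": 200,      # hypothetical kernel function
}
subexperiment = {
    "main": 300,               # the same 700 user-mode events, now charged
    "compute": 400,            # to hypothetical user callstacks
}


def total(*experiments: dict) -> int:
    """Sum the metric over every function in the given experiments."""
    return sum(sum(exp.values()) for exp in experiments)


unfiltered = total(founder, subexperiment)    # user-mode events counted twice
kernel_only = total(founder)                  # filter: founder experiment only
user_only = total(subexperiment)              # filter: user subexperiment only
```

Here the unfiltered total is 1600 although only 900 events
occurred; restricting the view to the founder experiment gives
900, and restricting it to the subexperiment gives the 700
user-mode events.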
PROFILING STATISTICS
When kernel profiling terminates, er_kernel will write
several lines of statistics for the driver, including any
counts for run time errors.
RUN TIME ERRORS
While er_kernel is running, it processes DTrace events.
Some of those events are delivered with inconsistent data.
Specifically, each event has the process PID in two places,
and they should be the same. However, for reasons not yet
understood, sometimes they are different. Such events are
recorded in the founder kernel experiment, against the
pseudo-function <INCONSISTENT_PID>. When these events
occur, er_kernel will also record an event in the subexperi-
ment corresponding to the PID in the reported user
callstack. The errors are also counted and, if verbose mode
is set, a message will be written to stderr.
DTrace also sometimes reports various errors. The most com-
mon of these is an invalid address, which appears to be
harmless. These errors are counted, and, if verbose mode is
set, they are logged to stderr.
The stack unwind done by DTrace may be incorrect, and, espe-
cially on x86/amd64 codes, may omit the caller of the
current leaf frame. These errors may occur on either the
kernel stack or the user stack.
SYSTEM SETUP FOR DTRACE
Normally, the DTrace driver is restricted to user root. To
use it as a regular user, username, that user must have
privileges assigned, and be in group sys.
To give privileges to the user, add a line:
username::::defaultpriv=basic,dtrace_kernel,dtrace_proc
to the file /etc/user_attr.
To put the user in group sys, add username to the sys line
in file /etc/group.
You must log out, and then log in again after making these
changes.
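The two file edits above can be sketched as text transformations.
This is only an illustration of the line formats: the function
names and the sample group line are hypothetical, the group line
is assumed to have the usual four colon-separated fields
(name, password, GID, members), and the sketch does not write to
/etc/user_attr or /etc/group.

```python
# Hypothetical helpers that build the lines described above.
def user_attr_entry(username: str) -> str:
    """Return the line to add to /etc/user_attr for username."""
    return f"{username}::::defaultpriv=basic,dtrace_kernel,dtrace_proc"


def add_to_group_line(line: str, username: str, group: str = "sys") -> str:
    """Return the group line with username appended to its member list."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 4 or fields[0] != group:
        return line                      # not the line for this group
    members = [m for m in fields[3].split(",") if m]
    if username not in members:
        members.append(username)
    fields[3] = ",".join(members)
    return ":".join(fields)
```

For a made-up user alice and a sample line sys::3:root,bin,adm,
the second helper yields sys::3:root,bin,adm,alice.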
SEE ALSO
dtrace(1M) (Solaris 10 or later), analyzer(1), collect(1),
er_archive(1), er_cp(1), er_export(1), er_mv(1),
er_print(1), er_rm(1), er_src(1), and the Performance
Analyzer manual.