er_kernel args [load-command]
Solaris systems with DTrace supported
The er_kernel command can generate an experiment from the Solaris kernel, using the DTrace functionality provided with some Solaris releases. The data may be examined with a GUI program, analyzer, or a command-line version, er_print.
The er_kernel command may be used only by a user with DTrace privileges.
If a -F argument is given, er_kernel will also collect user-level data on those processes which match the pattern supplied as the argument to -F. See "SYSTEM-WIDE PROFILING", below, for more information.
If an optional command to provide a load is given, er_kernel forks, and the child sleeps for a quiet period, then executes the command to provide a load. When the child exits, er_kernel continues for another quiet period, and then exits. The duration of the quiet period may be specified by a -q argument. The load command is launched as specified, and may either be a command or a shell script. If it is a script, it should wait for any commands it spawns to terminate before exiting, or the experiment may be terminated prematurely.
If an optional -t argument is given, er_kernel will collect data according to the -t argument, and then exit.
If neither is specified, er_kernel will run until terminated. It may always be terminated by ctrl-C (SIGINT), or by using the kill command and sending SIGINT, SIGQUIT, or SIGTERM to the er_kernel process.
If invoked with no arguments, print a usage message.
If invoked with -h without any other arguments, and if the processor supports hardware counter overflow profiling, print two lists containing information about hardware counters. The first list contains "aliased" hardware counters; the second list contains raw hardware counters. For more details, see the "Hardware Counter Overflow Profiling" section in the collect(1) man page.
Collect clock-based profiles. The allowed values of option are:
turn off clock-based profiling
turn on clock-based profiling with the default profiling interval of approximately 10 milliseconds
turn on clock-based profiling with the low-resolution profiling interval of approximately 100 milliseconds
turn on clock-based profiling with the high-resolution profiling interval of approximately 1 millisecond
turn on clock-based profiling with a profiling interval of n.
The value may be an integer or floating-point number, with a suffix of u specifying microseconds, or m specifying milliseconds. If no suffix is used, the value will be assumed to be in milliseconds.
If the value is smaller than the system clock profiling minimum it is set to the minimum; if it is not a multiple of the clock profiling resolution it is rounded down to the nearest multiple of the clock profiling resolution. If it exceeds the clock profiling maximum, an error is reported. If it is negative, an error is reported. If it is zero, clock profiling is turned off.
The DTrace profile provider, used to obtain the data, is available only for integer values in ticks per second. The value specified will be converted to an integer rate, and then converted back to the time corresponding to the actual rate used.
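The suffix handling and the rate round trip described above can be sketched as follows. This is an illustration only, not the er_kernel implementation: the function names are hypothetical, rounding to the nearest integer rate is an assumption (the text does not state the rounding mode), and clamping to the system minimum and maximum is omitted because those values are system-dependent.

```python
def parse_interval_us(value):
    """Parse a -p interval such as '10', '2.5m' or '500u' into microseconds."""
    text = value.strip()
    if text.endswith("u"):
        return float(text[:-1])           # microseconds
    if text.endswith("m"):
        return float(text[:-1]) * 1000.0  # milliseconds
    return float(text) * 1000.0           # no suffix: milliseconds


def effective_interval_us(requested_us):
    """Return (ticks per second, interval actually used in microseconds)."""
    if requested_us <= 0:
        raise ValueError("interval must be positive")
    # Assumption: round to the nearest integer rate, with a floor of 1.
    rate = max(1, round(1_000_000 / requested_us))
    return rate, 1_000_000 / rate
```

Under these assumptions a request of 3 milliseconds maps to a rate of 333 ticks per second, so the interval actually used is slightly longer than 3 milliseconds.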
If no explicit -p off argument is given, clock-based profiling is turned on by default. If -h high or -h low is specified, requesting the default counter set for that chip at high or low frequency, the default clock-profiling interval will also be set to high or low; an explicit -p argument will be respected.
Collect hardware-counter overflow profiles (using the DTrace cpc provider). The option is specified as for the collect(1) command. Hardware-counter profiling is not available on systems prior to Oracle Solaris 11. If the overflow mechanism on the chip allows the kernel to tell which counter overflowed, as many counters as the chip provides may be used; otherwise, only one counter may be specified.
Dataspace profiling is supported on SPARC systems running DTrace version 1.8 or later, only for precise counters. If requested on a system where it is not supported, the dataspace flag will be ignored, but the experiment will still be run.
The system hardware-counter mechanism can be used by multiple processes for user profiling, but cannot be used for kernel profiling if any user process, or cputrack, or another er_kernel is using the mechanism. In that case, er_kernel will report "HW counters are temporarily unavailable; they may be in use for system profiling."
Control whether or not profile events from idle CPUs are recorded. The allowed values of option are:
Do not record profile events from idle CPUs. (Default).
Record profile events from idle CPUs.
Provide system-wide profiling, including the kernel and applications. Control whether or not profile data for user-level processes should be recorded. The allowed values of option are:
Do not record experiments on any user-level processes; record on kernel only (Default).
Record experiments on all user-level processes for which the er_kernel user has the appropriate permissions, as well as the kernel.
Record experiments on user-level processes whose name or PID matches the regular-expression. For more information on the data recorded for user-level processes, see "SYSTEM-WIDE PROFILING", below. Note that the process name, as read from the /proc filesystem by er_kernel, is truncated by the OS to a maximum of 15 characters (plus a zero-byte). Patterns should be specified to match a process name so truncated.
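The effect of the 15-character truncation on pattern matching can be sketched as follows. The function name is hypothetical, and the use of an unanchored search is an assumption; the exact matching rules applied by er_kernel are not stated here.

```python
import re

PROC_NAME_MAX = 15  # characters of the process name visible through /proc


def matches_followed(pattern, name, pid):
    """True if the regular expression matches the truncated name or the PID."""
    truncated = name[:PROC_NAME_MAX]
    regex = re.compile(pattern)
    # Assumption: unanchored search against the name, then against the PID.
    return bool(regex.search(truncated) or regex.search(str(pid)))
```

With this sketch, a pattern anchored to the full name my_long_server_process does not match, while a pattern written against the truncated form my_long_server_ does.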
Collect data for the specified duration. duration may be a single number, followed by either m, specifying minutes, or s, specifying seconds (default), or two such numbers separated by a - sign. If one number is given, data will be collected from the start of the run until the given time; if two numbers are given, data will be collected from the first time to the second. If the second time is zero, data will be collected until the end of the run. If two non-zero numbers are given, the first must be less than the second.
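The duration rules above can be sketched as a small parser. The function names are hypothetical, and accepting non-integer numbers is an assumption; the text only says "number".

```python
def _to_seconds(token):
    """Convert '90', '90s' or '2m' to seconds."""
    token = token.strip()
    if token.endswith("m"):
        return float(token[:-1]) * 60.0
    if token.endswith("s"):
        return float(token[:-1])
    return float(token)  # no suffix: seconds


def parse_duration(spec):
    """Return (start, end) in seconds; end is None for 'until the end of the run'."""
    if "-" in spec:
        first, second = spec.split("-", 1)
        start, end = _to_seconds(first), _to_seconds(second)
        if end == 0:
            return start, None
        if start >= end:
            raise ValueError("first time must be less than the second")
        return start, end
    # A single number: collect from the start of the run until that time.
    return 0.0, _to_seconds(spec)
```

For example, a specification of 10s-1m yields a collection window from 10 to 60 seconds, and 5-0 yields a window from 5 seconds to the end of the run.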
Enforce a quiet period of length duration (seconds) before and after running the specified load. Default duration is 3 seconds. The quiet period is ignored if no load is specified.
Collect periodic samples at the interval specified (in seconds). If interval is zero, do not collect periodic samples. By default, enable periodic sampling at 1-second intervals. The data recorded in the samples is data for the er_kernel process, and includes a timestamp and execution statistics from the kernel, among other things. Samples are markers within the data, and can be used for filtering.
Put the comment, either a single token, or a quoted string, into the experiment. Up to ten comments may be provided.
Use experiment_name as the name of the experiment to be recorded. The experiment_name string must end in the string .er; if it does not, report an error, and do not run the experiment.
If -o is not specified, choose a name of the form stem.n.er, where stem is a string, and n is a number. If a -g argument is given, use the string appearing before the .erg suffix in the group name as the stem prefix; if no -g argument is given, set the stem prefix to the string ktest.
If the name is not specified in the form stem.n.er, and the given name is in use, print an error message and exit. If the name is of that form, and the name is in use, record the experiment under a name corresponding to the first available value of n that is not in use; issue a warning if the name is changed.
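The naming rules above can be sketched as follows. The function name is hypothetical, and two details are assumptions not stated in the text: default numbering starts at 1, and the search for a free n proceeds upward from the requested value.

```python
import re

_NUMBERED = re.compile(r"^(?P<stem>.+)\.(?P<n>\d+)\.er$")


def choose_experiment_name(existing, requested=None, group=None):
    """Pick an experiment name given the set of names already in use."""
    if requested is None:
        # Default stem: the group name without .erg, else ktest.
        stem = group[: -len(".erg")] if group else "ktest"
        requested = f"{stem}.1.er"  # assumption: numbering starts at 1
    if not requested.endswith(".er"):
        raise ValueError("experiment name must end in .er")
    if requested not in existing:
        return requested
    match = _NUMBERED.match(requested)
    if match is None:
        # Not of the form stem.n.er and already in use: an error.
        raise FileExistsError(f"{requested} is already in use")
    stem, n = match.group("stem"), int(match.group("n"))
    while f"{stem}.{n}.er" in existing:  # assumption: search upward from n
        n += 1
    return f"{stem}.{n}.er"
```

A caller would compare the returned name with the requested one to decide whether to issue the warning about a changed name.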
Record a sample point whenever the given signal is delivered to the er_kernel process.
Control recording of data with signal. Whenever the given signal is delivered to the er_kernel process, switch between paused (no data is recorded) and resumed (data is recorded) states. er_kernel is started in the resumed state if the optional ,r flag is given, otherwise it is started in the paused state. This option does not affect the recording of sample points.
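The pause and resume behavior can be modeled as a simple toggle. This is a sketch of the state logic only; the class name and the signal spelling in the argument are hypothetical, and no real signal handling is shown.

```python
class RecordingGate:
    """Tracks the paused/resumed state toggled by a control signal."""

    def __init__(self, spec):
        # spec looks like "USR1" or "USR1,r"; the signal spelling is illustrative.
        name, _, flag = spec.partition(",")
        self.signal_name = name
        # Start resumed only when the optional ,r flag is given.
        self.recording = flag == "r"

    def on_signal(self):
        """Flip the state on each delivery and return the new state."""
        self.recording = not self.recording
        return self.recording
```

Without the ,r flag the gate starts paused, so the first delivery of the signal begins recording and the second suspends it again.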
Place the experiment in directory directory_name. If none is given, record into the current working directory.
Consider the experiment to be part of experiment group group_name. The group_name string must end in the string .erg; if not, report an error, and exit.
Limit the amount of profiling and tracing data recorded to size megabytes. The limit applies to the sum of all profiling data and tracing data, but not to sample points. The limit is only approximate, and can be exceeded. Terminate the experiment when the limit is reached. The allowed values of size are:
Do not impose a size limit on the experiment.
Impose a limit of n MB; n must be greater than zero.
There is no default limit on the amount of data recorded.
Control whether or not the kernel modules used during the run are copied into the recorded experiment. The allowed values of option are:
Archive the kernel modules and copy them into the experiment.
Copy and archive all source files and .anc files that can be found into the experiment.
Copy and archive all source files and .anc files that are referenced in the recorded data and can be found into the experiment.
Do not archive the kernel modules or source files into the experiment.
To copy experiments onto a different machine, or read them from a different machine, the user should specify -A on.
The default setting for -A is on.
Dry run: do not collect data, but print all the details of the experiment that would be run. Turn on -v.
Print the current version. No further arguments are examined, and no further processing is done.
Print detailed information about the experiment being run, including the current version.
If the -F argument is used to specify following user processes detected during an er_kernel experiment, a sub-experiment for each such user process is created. The sub-experiments are named as follows:
The user process sub-experiment will only record data when an event occurs for a followed process, in either user mode or system mode, and will record the user callstack. User sub-experiments are almost identical to user-mode collect experiments. For clock-profiling, they only record User-CPU-Time and System-CPU-Time; no wait times are recorded.
Processes may be followed only if the user running er_kernel has permission to open and read the /proc entry for the process to be followed. Note that the process name is truncated to 15 characters (plus a zero-byte) when read from /proc, so the pattern should be specified to match the truncated name.
When er_kernel is running in a global zone, user processes in other zones are not accessible, and may not be followed. The name of the pseudo-function for user-mode time will be shown as noname-open.
Unlike collect, er_kernel data collection does not collect information on OpenMP runtime behavior, nor on Java runtime behavior and Java callstacks. Such user-level sub-experiments are comparable to collect experiments shown in machine mode. They do not have the data for the user mode displays in collect experiments.
Clock profiling kernel experiments support one metric labeled "Kernel CPU Time" (metric name kcpu), for clock profile events recorded in the kernel founder experiment. Data is recorded on a per-CPU basis, with the CPU number recorded as the CPU, the PID of the process on behalf of which the kernel is running recorded as the LWPID, and the kernel thread ID recorded as thread in the raw data.
The kernel founder experiment will contain data for the kcpu metric. When the CPU is in system-mode or idle, the kernel callstacks are recorded. When the CPU is in user-mode, a two-frame callstack is recorded, with a top frame named:
calling a pseudo-function named:
corresponding to the user process running. In the kernel experiment, no real callstack information on the user processes is recorded.
If -F is used to specify profiling user-level processes, the sub-experiments for each followed process will contain data for the User CPU Time and System CPU Time metrics, with user-level callstacks. They do not contain data for any wait time.
Hardware counter profiles are recorded with the metric for the named counter, using the system callstacks as described above for clock-profiling experiments, in the founder experiment, and user callstacks in the user-process sub-experiments.
For the founder kernel experiment, HW counter metric names will be prefaced by K_. For user-process sub-experiments, the HW counter metric names are as given, just as in collect experiments.
For dataspace profiling, the founder kernel experiment will contain data references for events occurring when the CPU is in system mode. Events occurring when the CPU is in either User or Idle mode will not show any recorded data addresses.
If -F is used to specify following user processes, sub-experiments for each followed process will be recorded as if dataspace profiling was not specified. (The DTrace provider only provides addresses for kernel data references.)
When kernel profiling terminates, er_kernel will write several lines of statistics for the driver, including any counts for run time errors.
While er_kernel is running, it processes DTrace events. DTrace sometimes reports various errors. The most common of these is an invalid address, which appears to be harmless. These errors are counted, and, if verbose mode is set, they are logged to stderr.
The stack unwind done by DTrace may be incorrect, and, especially on x86/amd64 codes, may omit the caller of the current leaf frame when the process is in the epilogue of a function. These errors may occur in either the kernel stack or the user stack.
Normally, the DTrace driver is restricted to user root. To use it as a regular user, username, that user must have privileges assigned, and be in group sys.
To give privileges to the user, add a line:
to the file /etc/user_attr.
To put the user in group sys, add username to the sys line in file /etc/group.
You must log out, and then log in again after making these changes.
er_kernel profiling does not work in a non-global zone. er_kernel profiling is disabled in guest OS's running under x86 OVM, to avoid triggering OVM bug 13094572 which can cause a reboot. er_kernel HW counter profiling does not work on Solaris systems earlier than Solaris 11.
Performance Analyzer manual