Oracle Developer Studio 12.5 Man Pages

Updated: June 2017
 
 

er_kernel(1)

Name

er_kernel - generate a Performance Analyzer experiment on the Oracle Solaris kernel

Synopsis

er_kernel args [load-command]

AVAILABILITY

Oracle Solaris systems with DTrace supported

Description

The er_kernel command can generate an experiment from the Oracle Solaris kernel, using the DTrace functionality provided with some Oracle Solaris releases. The data can be examined with Performance Analyzer (analyzer) or er_print.

The er_kernel command can be used only by a user with DTrace privileges. See the "SYSTEM SETUP FOR DTRACE" section below for more information.

If the -F argument is specified, er_kernel also collects user-level data on those processes that match the pattern supplied as the argument to -F. See "SYSTEM-WIDE PROFILING", below, for more information.

If an optional command to provide a load is given, er_kernel forks, and the child sleeps for a quiet period, then executes the command to provide a load. When the child exits, er_kernel continues for another quiet period, and then exits. The duration of the quiet period can be specified by the -q argument. The load command is launched as specified, and can either be a command or a shell script. If it is a script, it should wait for any commands it spawns to terminate before exiting, or the experiment might be terminated prematurely.
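The requirement that a load script wait for its children can be sketched as follows. This is a minimal illustration, not part of er_kernel: the function name run_load, the sleep commands standing in for real work, and the script name load.sh are all hypothetical.

```shell
#!/bin/sh
# Hypothetical load script for er_kernel. The two "sleep" commands stand in
# for real worker commands; they are placeholders only.
run_load() {
    sleep 1 &          # first background worker (placeholder)
    sleep 1 &          # second background worker (placeholder)
    wait               # block until every background child has exited
    echo "load finished"
}

run_load
# Saved as an executable script (for example ./load.sh, an assumed name),
# this could then be passed as the load command:
#   er_kernel -q 5 ./load.sh
```

Without the wait, the script would exit while its workers were still running, and the experiment might end before the load had finished.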

If the optional -t argument is given, er_kernel collects data for a period of time according to the -t argument and then exits.

If neither -q nor -t is specified, er_kernel runs until terminated. You can always terminate it by pressing Ctrl-C (SIGINT), or by using the kill command and sending SIGINT, SIGQUIT, or SIGTERM to the er_kernel process.

If no -F argument is given, but a load is specified, er_kernel collects user-level data on that load.

ARGUMENTS

If invoked with no arguments, print a usage message.

If invoked with -h without any other arguments, and if the processor supports hardware counter overflow profiling, print two lists containing information about hardware counters. The first list contains "aliased" hardware counters; the second list contains raw hardware counters. For more details, see the "Hardware Counter Overflow Profiling" section in the collect(1) man page.

-p option

Collect clock-based profiles. The allowed values of option are:

off

Turn off clock-based profiling.

lo[w]

Turn on clock-based profiling with a per-thread rate of approximately 10 samples per second.

on

Turn on clock-based profiling with a per-thread rate of approximately 100 samples per second.

hi[gh]

Turn on clock-based profiling with a per-thread rate of approximately 1000 samples per second.

n

Turn on clock-based profiling with a profiling interval of n.

The value may be an integer or floating-point number, with a suffix of u specifying microseconds, or m specifying milliseconds. If no suffix is used, the value is assumed to be in milliseconds.

If the value is smaller than the system clock profiling minimum, it is set to the minimum; if it is not a multiple of the clock profiling resolution, it is rounded down to the nearest multiple of the clock profiling resolution. If it exceeds the clock profiling maximum or is negative, an error is reported. If it is zero, clock profiling is turned off.

The DTrace profile provider, used to obtain the data, is available only for integer values in ticks per second. The value specified is converted to an integer rate, and then converted back to the time corresponding to the actual rate used.
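The round trip from a requested interval to an integer rate and back can be illustrated with a little shell arithmetic. This is only a sketch: the function name actual_interval_us is hypothetical, and it assumes the rate is obtained by truncating integer division, which this man page does not state.

```shell
# Sketch of the round trip: requested interval -> integer rate -> actual interval.
# Assumption: the rate is computed by truncating integer division; the exact
# rounding that er_kernel applies is not stated in this man page.
# The argument is the requested interval in microseconds and must be positive.
actual_interval_us() {
    requested_us=$1
    rate_hz=$((1000000 / requested_us))    # integer ticks per second
    if [ "$rate_hz" -lt 1 ]; then
        echo "interval too long for an integer rate" >&2
        return 1
    fi
    echo $((1000000 / rate_hz))            # interval implied by that rate
}

actual_interval_us 3000     # 3 ms requested: rate 333/s, prints 3003
actual_interval_us 10000    # 10 ms requested: rate 100/s, prints 10000
```

Under this assumption, an interval that does not divide one second evenly (such as 3 ms) comes back slightly different from the value requested.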

If no explicit -p off argument is given, clock-based profiling is turned on by default. If -h high or -h low is specified, requesting the default counter set for that chip at high or low frequency, the default clock-profiling rate is set to high or low, respectively; an explicit -p argument is respected.

-h option

Collect hardware-counter overflow profiles (using the DTrace cpc provider). The option is specified as for the collect(1) command. Hardware-counter profiling is not available on systems prior to Oracle Solaris 11. If the overflow mechanism on the chip allows the kernel to tell which counter overflowed, as many counters as the chip provides may be used; otherwise, only one counter may be specified.

Dataspace profiling is supported on SPARC systems running DTrace version 1.8 or later, only for precise counters. If requested on a system where it is not supported, the dataspace flag is ignored, but the experiment still runs.

The system hardware-counter mechanism can be used by multiple processes for user profiling, but cannot be used for kernel profiling if any user process, cputrack, or another er_kernel is using the mechanism. In that case, er_kernel reports "HW counters are temporarily unavailable; they may be in use for system profiling."

-x option

Control whether or not profile events from idle CPUs are recorded. The allowed values of option are:

on

Do not record profile events from idle CPUs. This is the default.

off

Record profile events from idle CPUs.

-F option

Provide system-wide profiling, including the kernel and applications. Control whether or not profile data for user-level processes should be recorded. The allowed values of option are:

off

Do not record experiments on any user-level processes; record on kernel only. This is the default if no -F is given and no load is specified. If a load is specified and no -F is given, an experiment is recorded on that load. If -F is given, the load is followed only if it matches the -F specification.

on | all

Record experiments on all user-level processes for which the er_kernel user has the appropriate permissions, as well as the kernel.

Note that specifying -F on might record a very large number of user-level experiments, which can use a lot of disk space and take a very long time to archive. In most cases, it is preferable to use -F =<regexp> and collect user-level data only for the named processes of interest.

=<regexp>

Record experiments on user-level processes whose name or PID matches the regular-expression. For more information on the data recorded for user-level processes, see "SYSTEM-WIDE PROFILING" below. Note that the process name, as read from the /proc filesystem by er_kernel, is truncated by the OS to a maximum of 15 characters (plus a zero-byte). Patterns should be specified to match a process name so truncated.
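Because patterns must match the truncated name, it can help to compute that 15-character form before writing the pattern. The helper below is a minimal sketch; the function name proc_name_15 and the process name are hypothetical.

```shell
# Truncate a process name to the 15 characters that /proc reports, so that a
# -F pattern can be written against the truncated form.
proc_name_15() {
    printf '%.15s\n' "$1"
}

name=$(proc_name_15 "my_long_running_daemon")   # hypothetical process name
echo "$name"                                    # prints my_long_running
# The truncated name can then be used in the pattern, for example:
#   er_kernel -F "=$name" -t 30
```

A pattern written against the full 22-character name would never match, since er_kernel only sees the first 15 characters.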

-t duration

Collect data for the specified duration. duration may be a single number, followed by either m, specifying minutes, or s, specifying seconds (default), or two such numbers separated by a - sign. If one number is given, data is collected from the start of the run until the given time; if two numbers are given, data will be collected from the first time to the second. If the second time is zero, data is collected until the end of the run. If two non-zero numbers are given, the first must be less than the second.
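To show how these duration values read, the sketch below converts them to seconds. It illustrates the syntax described above and is not er_kernel's own parser; the function names to_seconds and range_seconds are hypothetical, and only whole-number values are handled.

```shell
# Convert one -t time value ("90", "90s", or "2m") to seconds.
to_seconds() {
    case $1 in
        *m) echo $(( ${1%m} * 60 )) ;;   # minutes suffix
        *s) echo "${1%s}" ;;             # explicit seconds suffix
        *)  echo "$1" ;;                 # no suffix: seconds by default
    esac
}

# Split a "start-end" range and print both bounds in seconds.
# Call this only with a value that contains a "-" separator.
range_seconds() {
    start=${1%%-*}
    end=${1#*-}
    echo "$(to_seconds "$start") $(to_seconds "$end")"
}

to_seconds 2m          # prints 120
range_seconds 1m-5m    # prints 60 300
range_seconds 30-0     # prints 30 0 (from 30 seconds until the end of the run)
```

So, for example, -t 1m-5m would skip the first minute of the run and record the following four minutes.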

-q duration

Enforce a quiet period of length duration (seconds) before and after running the specified load. Default duration is 3 seconds. The quiet period is ignored if no load is specified.

-S interval

Periodically sample process-wide resource utilization at the interval specified (in seconds). The allowed values of interval are:

off

Turn off periodic sampling.

on

Turn on periodic sampling with the default sampling interval (1 second).

n

Turn on periodic sampling with a sampling interval of n in seconds; n must be positive.

-b ubufsize

Tell DTrace to use a buffer size of ubufsize. The default buffer size is 1M. The user-supplied argument is passed directly to DTrace, and is not checked for syntax errors. See the dtrace(1M) man page for more information.

-C comment

Put the comment, either a single token, or a quoted string, into the experiment. Up to ten comments may be provided.

-o experiment_name

Use experiment_name as the name of the experiment to be recorded. The experiment_name string must end in the string .er; if it does not, report an error, and do not run the experiment.

If -o is not specified, choose a name of the form stem.n.er, where stem is a string, and n is a number. If a -g argument is given, use the string appearing before the .erg suffix in the group name as the stem prefix; if no -g argument is given, set the stem prefix to the string ktest.

If the name is not specified in the form stem.n.er, and the given name is in use, print an error message and exit. If the name is of that form, and the name is in use, record the experiment under a name corresponding to the first available value of n that is not in use; issue a warning if the name is changed.
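The search for the first unused value of n can be sketched as a simple loop. This is an illustration under stated assumptions, not er_kernel's actual logic: the function name next_experiment_name is hypothetical, and numbering is assumed to start at 1, which this man page does not state.

```shell
# Print the first name of the form stem.n.er that does not exist in dir.
# Assumption: numbering starts at 1; the man page does not state the start.
next_experiment_name() {
    dir=$1
    stem=$2
    n=1
    while [ -e "$dir/$stem.$n.er" ]; do
        n=$((n + 1))
    done
    echo "$stem.$n.er"
}

# With the default stem, in the current directory:
next_experiment_name . ktest
```

If ktest.1.er and ktest.2.er already exist in the directory, the function prints ktest.3.er.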

-l signal

Sample the resource utilization of the er_kernel process whenever the given signal is delivered to the er_kernel process.

-y signal[,r]

Control recording of data with signal. Whenever the given signal is delivered to the er_kernel process, switch between paused (no data is recorded) and resumed (data is recorded) states. er_kernel is started in the resumed state if the optional ,r flag is given, otherwise it is started in the paused state. This option does not affect the recording of process-wide resource-utilization samples.

-d directory_name

Place the experiment in directory directory_name. If no -d argument is given, record into the current working directory.

-g group_name

Consider the experiment to be part of experiment group group_name. The group_name string must end in the string .erg; if not, report an error, and exit.

-O file

Append all output from er_kernel itself to the named file, but do not redirect the output from the spawned load. If file is set to /dev/null, suppress all output from er_kernel, including any error messages.

-L size

Limit the amount of profiling and tracing data recorded to size megabytes. The limit applies to the sum of all profiling data and tracing data, but not to process-wide resource-utilization samples. The limit is only approximate, and can be exceeded. Terminate the experiment when the limit is reached. The allowed values of size are:

unlimited | none

Do not impose a size limit on the experiment.

n

Impose a limit of n MB; n must be greater than zero.

There is no default limit on the amount of data recorded.

-A option

Control whether to perform archiving as part of data collection. Archiving is required to make an experiment self-contained and portable. The allowed values of option are:

on

Copy load objects (the target and any shared objects it uses) into the experiment. Also copy any ancillary files (.anc) and object files (.o) which have Stabs or DWARF information not in the load object.

src

In addition to copying load objects as in -A on, copy into the experiment all source files and ancillary files (.anc) that can be found.

usedsrc

Similar to -A src, but only copy source files, ancillary files (.anc), and load objects that are needed for analytics and can be found. This option might require additional processing time, but can result in smaller experiment sizes.

off

Do not copy or archive load objects or source files into the experiment.

The minimum archiving required that enables an experiment to be accessed on another machine is -A on. When using this option, note that on does not copy any sources or object files (.o's); it is your responsibility to ensure that those files are accessible from the machine where the experiment is being examined, and that they are not changed or rebuilt after the experiment was recorded.

The default setting is on.

-n

Dry run: do not collect data, but print all the details of the experiment that would be run. Turn on -v.

-V

Print the current version. No further arguments are examined, and no further processing is done.

-v

Print detailed information about the experiment being run, including the current version.

SYSTEM-WIDE PROFILING

If the -F argument is used to follow user processes, a sub-experiment is created for each matching user process detected during the er_kernel experiment.

If no -F is given but a load is specified, a sub-experiment for the load process is recorded.

The sub-experiments when -F is used are named as follows:

_process-name_PID_process-pid.1.er

The user process sub-experiment only records data when an event occurs for a followed process, in either user mode or system mode, and records the user callstack. User sub-experiments are almost identical to user-mode collect experiments. For clock-profiling, they only record User-CPU-Time and System-CPU-Time; no wait times are recorded.

Processes may be followed only if the user running er_kernel has permission to open and read the /proc entry for the process to be followed. Note that the process name is truncated to 15 characters (plus a zero-byte) when read from /proc, so the pattern should be specified to match the truncated name.

When er_kernel is running in a global zone, user processes in other zones are not accessible, and may not be followed. The name of the pseudo-function for user-mode time will be shown as noname-open.

Unlike collect, er_kernel data collection does not collect information on OpenMP runtime behavior, nor on Java runtime behavior and Java callstacks. Such user-level sub-experiments are comparable to collect experiments shown in machine mode. They do not have the data for the user mode displays in collect experiments.

DATA RECORDED

Clock Profiling

Clock profiling kernel experiments support one metric labeled "Kernel CPU Time" (metric name kcpu), for clock profile events recorded in the kernel founder experiment. Data is recorded on a per-CPU basis, with the CPU number recorded as the CPU, the PID of the process on behalf of which the kernel is running recorded as the LWPID, and the kernel thread ID recorded as thread in the raw data.

The kernel founder experiment contains data for the kcpu metric. When the CPU is in system mode or idle, the kernel callstacks are recorded. When the CPU is in user mode, a two-frame callstack is recorded, with a top frame named:

<USER_MODE>

calling a pseudo-function named:

<process-name_PID_process-pid>

corresponding to the user process running. In the kernel experiment, no real callstack information on the user processes is recorded.

If -F is used to specify profiling user-level processes, the sub-experiments for each followed process contain data for the User CPU Time and System CPU Time metrics, with user-level callstacks. They do not contain data for any wait time.

Hardware Counter Profiling

Hardware counter profiles are recorded with the metric for the named counter, using the system callstacks as described above for clock-profiling experiments, in the founder experiment, and user callstacks in the user-process sub-experiments.

For the founder kernel experiment, HW counter metric names are prefaced by K_. For user-process sub-experiments, the HW counter metric names are as given, just as in collect experiments.

For dataspace profiling, the founder kernel experiment has the data references for events occurring when the CPU is in system mode. Events occurring when the CPU is in either User or Idle mode do not show any recorded data addresses.

If -F is used to follow user processes, sub-experiments for each followed process are recorded as if dataspace profiling were not specified. (The DTrace provider only provides addresses for kernel data references.)

PROFILING STATISTICS

When kernel profiling terminates, er_kernel writes several lines of statistics for the driver, including any counts for run time errors.

RUN TIME ERRORS

While er_kernel is running, it processes DTrace events. DTrace sometimes reports various errors. The most common of these is an invalid address, which appears to be harmless. These errors are counted, and, if verbose mode is set, they are logged to stderr.

The stack unwind done by DTrace may be incorrect, and, especially on x86/amd64 codes, may omit the caller of the current leaf frame when the process is in the epilogue of a function. These errors may occur in either the kernel stack or the user stack.

SYSTEM SETUP FOR DTRACE

Normally, the DTrace driver is restricted to user root. To use it as a regular user, that user (username in the line below) must have DTrace privileges assigned.

To give privileges to the user, add a line:

username::::defaultpriv=basic,dtrace_kernel,dtrace_proc,dtrace_user

to the file /etc/user_attr.

You must log out, and then log in again after making these changes.
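As a small sketch of the setup step above, the function below prints the /etc/user_attr entry for a given user so that it can be reviewed before an administrator adds it. The function name dtrace_user_attr_line and the user name alice are hypothetical; the function does not modify any file.

```shell
# Print the /etc/user_attr entry for a given user. This only prints the line;
# it does not modify /etc/user_attr.
dtrace_user_attr_line() {
    printf '%s::::defaultpriv=basic,dtrace_kernel,dtrace_proc,dtrace_user\n' "$1"
}

dtrace_user_attr_line alice    # "alice" is a hypothetical user name
# After logging in again, the Solaris ppriv(1) command can be used to inspect
# the privilege sets of the current shell, for example:
#   ppriv $$
```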

LIMITATIONS

er_kernel profiling does not work in a non-global zone. er_kernel profiling is disabled in guest operating systems running under x86 Oracle VM, to avoid triggering Oracle VM bug 13094572, which can cause a reboot. er_kernel HW counter profiling does not work on Oracle Solaris systems earlier than Oracle Solaris 11.

See Also

analyzer(1), collect(1), er_archive(1), er_cp(1), er_export(1), er_mv(1), er_print(1), er_rm(1), er_src(1), dtrace(1M)

Performance Analyzer manual