Man Page er_kernel.1




NAME

     er_kernel - generate an Analyzer experiment on  the  Solaris
     kernel


SYNOPSIS

     er_kernel args [load-command]


AVAILABILITY

     Solaris systems with DTrace supported


DESCRIPTION

     The er_kernel command can generate	an  experiment	from  the
     Solaris kernel, using the DTrace functionality provided with
     some Solaris releases.  The data may be examined with a  GUI
     program, analyzer,	or a command-line version, er_print.

     The er_kernel command may be used only by a user with DTrace
     privileges.

     If	an optional command to provide a load is given,	er_kernel
     forks,  and  the  child sleeps for	a quiet	period,	then exe-
     cutes the command to provide a load.  When	the child  exits,
     er_kernel	continues  for	another	 quiet	period,	 and then
     exits.  The duration of the quiet period may be specified by
     a	-q  argument.  The load	command	is launched as specified,
     and may either be a command or a shell script.  If	it  is	a
     script,  it  should  wait for any commands	it spawns to ter-
     minate before exiting, or the experiment may  be  terminated
     prematurely.

     If	an optional -t argument	is given, er_kernel will  collect
     data according to the -t argument,	and then exit.

     If	neither	is  specified,	er_kernel  will	 run  until  ter-
     minated.  It may always be	terminated by ctrl-C (SIGINT), or
     by	using the kill command and sending  SIGINT,  SIGQUIT,  or
     SIGTERM to	the er_kernel process.
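
     As a hedged illustration of the three ways of bounding a run
     described above (a load command, a -t duration, or neither),
     the sketch below builds er_kernel argument lists in Python
     using only flags documented on this page.  The helper name
     er_kernel_argv and the script name ./load.sh are hypothetical.

```python
def er_kernel_argv(load=None, duration=None, quiet=None):
    """Build an argv list for er_kernel (hypothetical helper).

    load     -- optional load command as a list of strings
    duration -- optional value for -t, e.g. "30" or "1m"
    quiet    -- optional value for -q, in seconds
    """
    argv = ["er_kernel"]
    if duration is not None:
        argv += ["-t", str(duration)]
    if quiet is not None:
        argv += ["-q", str(quiet)]
    if load:
        # Per the SYNOPSIS, the load command follows the arguments.
        argv += list(load)
    return argv


# Run until the load command exits, with 5-second quiet periods.
with_load = er_kernel_argv(load=["./load.sh"], quiet=5)
# Collect for 30 seconds, then exit.
timed = er_kernel_argv(duration="30")
# Run until terminated by SIGINT, SIGQUIT, or SIGTERM.
until_signal = er_kernel_argv()
```

     On a Solaris system where the user has DTrace privileges, such
     a list could be handed to subprocess.run(); that step is left
     out here because it depends on the local installation.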


ARGUMENTS

     If	invoked	with no	arguments, print a usage message.

     If	invoked	with -h	without	any other arguments, and  if  the
     processor	supports  hardware  counter  overflow  profiling,
     print  two	 lists	containing  information	 about	 hardware
     counters.	  The  first  list  contains  "aliased"	 hardware
     counters; the second list contains	 raw  hardware	counters.
     For more details, see the "Hardware Counter Overflow Profil-
     ing" section in the collect.1 man page.

     -p	option
	  Collect clock-based profiles.	 The  allowed  values  of
	  option are:
	  Value	    Meaning

	  off	    turn off clock-based profiling

	  on	    turn  on  clock-based  profiling   with   the
		    default  profiling	interval of approximately
		    10 milliseconds

	  lo[w]	    turn on clock-based	profiling with	the  low-
		    resolution	profiling  interval  of	 approxi-
		    mately 100 milliseconds

	  hi[gh]    turn on clock-based	profiling with the  high-
		    resolution	profiling  interval  of	 approxi-
		    mately 1 millisecond

	  n	    turn on clock-based	profiling with a  profil-
		    ing	interval of n.

		    The	value may be an	integer	or floating-point
		    number,   with   a	suffix	of  u  specifying
		    microseconds, or m	specifying  milliseconds.
		    If	no  suffix  is	used,  the  value will be
		    assumed to be in milliseconds.


	  If the value is smaller than the system clock profiling
	  minimum, it is set to the minimum; if it is not a
	  multiple of the clock profiling resolution, it is
	  rounded down to the nearest multiple of the clock
	  profiling resolution.  If it exceeds the clock profiling
	  maximum, or if it is negative, an error is reported.  If
	  it is zero, clock profiling is turned off.
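
	  The rules for a numeric -p value can be sketched in
	  Python as follows.  This is a minimal model, not the
	  er_kernel implementation: the function names are
	  hypothetical, the minimum, maximum, and resolution are
	  system-dependent and are passed in as parameters, and
	  the order in which the checks are applied is an
	  assumption.  The keyword values (on, off, lo, hi) are
	  not handled.

```python
def parse_interval_us(value):
    """Parse a numeric -p value into whole microseconds.

    A 'u' suffix means microseconds, an 'm' suffix means
    milliseconds, and no suffix defaults to milliseconds.
    """
    text = value.strip()
    scale = 1000.0  # default unit: milliseconds
    if text.endswith("u"):
        scale, text = 1.0, text[:-1]
    elif text.endswith("m"):
        text = text[:-1]
    return round(float(text) * scale)


def normalize_interval_us(interval_us, minimum_us, maximum_us, resolution_us):
    """Apply the documented limits; return None when profiling is off."""
    if interval_us < 0:
        raise ValueError("negative profiling interval")
    if interval_us == 0:
        return None  # zero turns clock profiling off
    if interval_us > maximum_us:
        raise ValueError("profiling interval exceeds the maximum")
    if interval_us < minimum_us:
        return minimum_us
    # Round down to a multiple of the resolution (assumption: the
    # result is never allowed to fall below the minimum).
    rounded = (interval_us // resolution_us) * resolution_us
    return max(minimum_us, rounded)
```

	  For example, with an assumed minimum and resolution of
	  1000 microseconds, a request of 2.5m is rounded down to
	  2000 microseconds.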

	  The DTrace profile provider, used to obtain the data,
	  accepts only integer rates in ticks per second.  The
	  specified interval is converted to an integer rate, and
	  the interval actually used is the time corresponding to
	  that rate.
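
	  The round trip from interval to integer rate and back
	  can be illustrated with a short Python sketch.  The
	  function name is hypothetical, and rounding to the
	  nearest integer rate is an assumption; this page does
	  not say how er_kernel rounds.

```python
def effective_interval_us(requested_us):
    """Return (rate in ticks per second, actual interval in microseconds)."""
    # Assumption: round to the nearest integer rate, at least 1 tick/s.
    rate_hz = max(1, round(1_000_000 / requested_us))
    return rate_hz, 1_000_000 / rate_hz


rate, actual = effective_interval_us(3000)
```

	  Under that assumption a requested 3000-microsecond
	  interval maps to 333 ticks per second, so the interval
	  actually used is slightly longer than requested.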

	  If no	explicit -p off	argument is given, and	hardware-
	  counter  overflow  profiling	is  not	turned on, clock-
	  based	profiling is turned on by default.


     -h	option Collect hardware-counter	overflow profiles  (using
	       the DTrace cpc provider).  The option is	specified
	       as for the collect command.  Hardware-counter pro-
	       filing is not available on systems prior	to Oracle
	       Solaris 11.  If the overflow mechanism on the chip
	       allows	the   kernel   to   tell   which  counter
	       overflowed, as many counters as the chip	 provides
	       may  be	used;  otherwise, only one counter may be
	       specified.  Dataspace profiling is not  supported,
	       and dataspace requests are ignored.

	       The system hardware-counter mechanism can be used
	       by multiple processes for user profiling, but
	       cannot be used for kernel profiling if any user
	       process, cputrack, or another er_kernel is using
	       the mechanism.  In that case, er_kernel will
	       report "HW counter profiling is not supported on
	       this system."


     -F	option Provide system-wide profiling, including the
	       kernel and applications.  Control whether or not
	       application processes should have their data
	       recorded.  The allowed values of option are:

	       Value	 Meaning

	       off	 Do not	record experiments on application
			 processes;   record   on   kernel   only
			 (Default).

	       on	 Record	experiments  on	 all  application
			 processes as well as the kernel

	       all	 Record	experiments  on	 all  application
			 processes as well as the kernel

	       =<regexp> Record	experiments  on	 processes  whose
			 name or PID matches the regular
			 expression.  See "SYSTEM-WIDE
			 PROFILING", below.
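
	       A hedged sketch of the =<regexp> selection in
	       Python follows.  The function name is
	       hypothetical, Python's re syntax may differ from
	       the regular-expression syntax er_kernel accepts,
	       and matching anywhere in the string (rather than
	       the whole string) is an assumption.

```python
import re


def is_followed(pattern, process_name, pid):
    """Return True if the process name or the PID matches pattern."""
    rx = re.compile(pattern)
    return bool(rx.search(process_name) or rx.search(str(pid)))
```

	       With this model, the pattern ^java$ selects a
	       process named java, and ^4321$ selects the process
	       whose PID is 4321 regardless of its name.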


     -T	       { pid/tid | 0/did }
	       -T is no	longer supported.


     -t	duration
	       Collect data for	the specified duration.	 duration
	       may  be	a  single  number,  followed by	either m,
	       specifying  minutes,  or	 s,  specifying	  seconds
	       (default),  or  two  such numbers separated by a	-
	       sign.  If one number is given, data will	 be  col-
	       lected  from  the start of the run until	the given
	       time; if	two numbers are	given, data will be  col-
	       lected  from the	first time to the second.  If the
	       second time is zero, data will be collected  until
	       the  end	 of the	run.  If two non-zero numbers are
	       given, the first	must be	less than the second.
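
	       The duration syntax can be modeled with the
	       following Python sketch.  The function names are
	       hypothetical, and accepting fractional numbers is
	       an assumption; this page says only "number".

```python
def _to_seconds(token):
    """Convert one number with an optional m or s suffix to seconds."""
    if token.endswith("m"):
        return float(token[:-1]) * 60
    if token.endswith("s"):
        return float(token[:-1])
    return float(token)  # seconds is the default unit


def parse_duration(spec):
    """Return (start, end) in seconds; end is None for 'end of run'."""
    if "-" in spec:
        first, second = spec.split("-", 1)
        start, end = _to_seconds(first), _to_seconds(second)
        if end == 0:
            return (start, None)  # collect until the end of the run
        if start >= end:
            raise ValueError("first time must be less than the second")
        return (start, end)
    # A single number: collect from the start of the run.
    return (0.0, _to_seconds(spec))
```

	       For instance, 10-1m would mean "from 10 seconds to
	       60 seconds", and 5-0 would mean "from 5 seconds
	       until the end of the run".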

     -q	duration
	       Enforce	a  quiet  period   of	length	 duration
	       (seconds)  before  and after running the	specified
	       load.  Default duration is 3 seconds.   The  quiet
	       period is ignored if no load is specified.

     -S	interval
	       Collect periodic	samples	at the interval	specified
	       (in seconds).  If interval is zero, do not collect
	       periodic	samples.   By  default,	 enable	 periodic
	       sampling	at 1-second intervals.	The data recorded
	       in the samples is data for the er_kernel	 process,
	       and  includes a timestamp and execution statistics
	       from the	kernel,	among other things.  Samples  are
	       markers	within	the  data,  and	 can  be used for
	       filtering.


     -C	comment
	       Put the comment,	 either	 a  single  token,  or	a
	       quoted  string,	into  the  experiment.	Up to ten
	       comments	may be provided.

     -o	experiment_name
	       Use experiment_name as the name of the  experiment
	       to  be  recorded.  The experiment_name string must
	       end in the string .er; if not,  report  an  error,
	       and do not run the experiment.

	       If -o is	not specified, choose a	name of	the  form
	       stem.n.er,  where  stem	is  a  string, and n is	a
	       number.	If a -g	argument is given, use the string
	       appearing before	the .erg suffix	in the group name
	       as the stem prefix; if no -g  argument  is  given,
	       set the stem prefix to the string ktest.

	       If the name is not specified in the form
	       stem.n.er, and the given name is in use, print an
	       error message and do not run the experiment.  If
	       the name is of that form, and the name is in use,
	       record the experiment under a name corresponding
	       to the first available value of n that is not in
	       use; issue a warning if the name is changed.
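
	       The naming rules above can be sketched in Python.
	       This is a model under stated assumptions, not the
	       er_kernel implementation: the function name is
	       hypothetical, default numbering is assumed to
	       start at 1, and the search for a free n is assumed
	       to count upward from the requested value.

```python
import re

_NUMBERED = re.compile(r"^(?P<stem>.+)\.(?P<n>\d+)\.er$")


def pick_experiment_name(requested, in_use, group_name=None):
    """Return (name, renamed); renamed is True if a -o name was changed."""
    explicit = requested is not None
    if not explicit:
        stem = "ktest"
        if group_name is not None:
            if not group_name.endswith(".erg"):
                raise ValueError("group name must end in .erg")
            stem = group_name[: -len(".erg")]
        requested = stem + ".1.er"  # assumption: numbering starts at 1
    if not requested.endswith(".er"):
        raise ValueError("experiment name must end in .er")
    if requested not in in_use:
        return requested, False
    match = _NUMBERED.match(requested)
    if match is None:
        raise ValueError("experiment name is in use")
    stem, n = match.group("stem"), int(match.group("n"))
    while "%s.%d.er" % (stem, n) in in_use:
        n += 1  # assumption: search upward from the requested n
    return "%s.%d.er" % (stem, n), explicit
```

	       Here in_use stands for the set of names that
	       already exist in the target directory.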

     -l	signal Record a	sample point whenever the given	signal is
	       delivered to the	er_kernel process.

     -y	signal[,r]
	       Control recording of data with  signal.	 Whenever
	       the  given  signal  is  delivered to the	er_kernel
	       process,	 switch	 between  paused  (no	data   is
	       recorded)  and  resumed (data is	recorded) states.
	       er_kernel is started in the resumed state  if  the
	       optional	,r flag	is given, otherwise it is started
	       in the paused state.  This option does not affect
	       the recording of sample points.
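
	       The pause/resume behavior amounts to a two-state
	       toggle, sketched here in Python.  The class and
	       function names are hypothetical, and the sketch
	       models only the state changes, not signal
	       delivery.

```python
def parse_y_argument(arg):
    """Split 'signal[,r]' into (signal, start_resumed)."""
    signal_name, sep, flag = arg.partition(",")
    if sep and flag != "r":
        raise ValueError("only the ,r flag is documented")
    return signal_name, flag == "r"


class RecordingState:
    """Tracks whether data is currently being recorded."""

    def __init__(self, start_resumed):
        # Without ,r the run starts paused (no data recorded).
        self.recording = start_resumed

    def on_signal(self):
        """Toggle between paused and resumed; return the new state."""
        self.recording = not self.recording
        return self.recording
```

	       Note that without ,r no data is recorded until the
	       signal is first delivered.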

     -d	directory_name
	       Place the experiment in directory directory_name.
	       If none is given, record into the current working
	       directory.

     -g	group_name
	       Consider	the experiment to be part  of  experiment
	       group  group_name.  The group_name string must end
	       in the string .erg; if not, report an error, and
	       do not run the experiment.

     -L	size   Limit the amount	of  profiling  and  tracing  data
	       recorded	 to size megabytes.  The limit applies to
	       the sum of all profiling	data  and  tracing  data,
	       but  not	 to  sample  points.  The  limit  is only
	       approximate, and	can be exceeded.   Terminate  the
	       experiment when the limit is reached.  The allowed
	       values of size are:

	       Value	 Meaning

	       unlimited or none
			 Do not	impose a size limit on the exper-
			 iment

	       n	 Impose a limit of n MB; n must be
			 greater than zero.

	       There is no default limit on the amount of data
	       recorded.

     -A	option Control whether or not  the  kernel  modules  used
	       during the run are copied into the recorded exper-
	       iment.  The allowed values of option are:

	       Value	 Meaning

	       on	 Archive the kernel modules.

	       off	 Do not	archive	the kernel  modules  into
			 the experiment.

	       copy	 Copy the kernel modules into the experi-
			 ment and archive them.

	       To copy experiments onto	a different  machine,  or
	       read  them  from	 a  different  machine,	 the user
	       should specify -A copy.

	       The default setting for -A is copy.

     -n	       Dry run:	do not collect data, but  print	 all  the
	       details of the experiment that would be run.  This
	       option also turns on -v.

     -V	       Print the current version.  No  further	arguments
	       are examined, and no further processing is done.

     -v	       Print detailed information  about  the  experiment
	       being run, including the	current	version.



SYSTEM-WIDE PROFILING

     If the -F argument is used to follow user processes, a
     subexperiment is created for each followed user process
     detected during the er_kernel experiment.  Each subexperiment
     records data only while its process is in user mode, and
     records only the user callstack.
     The subexperiments are named as follows:
	  _process-name_PID_process-pid.1.er
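
     A minimal Python sketch of that naming pattern follows.  It
     assumes that process-name and process-pid are placeholders and
     that _PID_ is literal; the function name and the sample process
     are hypothetical.

```python
def subexperiment_name(process_name, pid):
    """Build a subexperiment name from a process name and PID."""
    return "_%s_PID_%d.1.er" % (process_name, pid)


example = subexperiment_name("java", 4321)
```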



DATA RECORDED

     Clock Profiling
	  Clock profiling experiments support two metrics: "KCPU
	  Cycles" (metric name kcycles), for clock profile events
	  recorded in the kernel founder experiment, and "KUCPU
	  Cycles" (metric name kucycles), for clock profile events
	  recorded in user process subexperiments when the CPU is
	  in user mode.  Data is recorded on a per-CPU basis, with
	  the CPU number recorded as the CPU, the PID of the
	  process on behalf of which the kernel is running
	  recorded as the LWPID, and the kernel thread ID recorded
	  as thread in the raw data.

	  The kernel founder experiment	will contain data for the
	  kcycles  metric.   When  the CPU is in system-mode, the
	  kernel callstacks are	recorded; when the CPU is idle,	a
	  single-frame callstack for the pseudo-function named:
		    <IDLE>
	  is recorded; when the	CPU is in  user-mode,  a  single-
	  frame	  callstack  attributed	 to  the  pseudo-function
	  named:
		    <process-name_PID_process-pid>
	  is recorded.	In the kernel  experiment,  no	callstack
	  information on the user processes is recorded.

	  If -F	is used	to specify following user processes,  the
	  subexperiments  for  each followed process will contain
	  data for the kucycles	 metric.   User-level  callstacks
	  will	be  recorded  for  all clock profile events where
	  that process was running in user mode.


     Hardware Counter Profiling
	  Hardware counter profiles are	recorded with the  metric
	  for  the  named counter, using the system callstacks as
	  described above for clock-profiling experiments, in the
	  founder  experiment,	and  user callstacks in	the user-
	  process subexperiments.

	  NOTE: Because the same metric is used in both the
	  founder experiment and the user-process subexperiments,
	  the HW counter metric will double count when the CPU is
	  in user mode.  It will count against the user callstacks
	  in the user subexperiments, and against the
	  pseudo-function representing that process in the founder
	  experiment.  To avoid the double counting, filter the
	  data to see only the kernel experiment or only one or
	  more user experiments.
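
	  The double counting can be made concrete with a small
	  Python illustration.  The counts and the function and
	  process names below are invented for the example; only
	  the naming patterns come from this page.

```python
def metric_total(experiments, selected):
    """Sum one HW counter metric over the selected experiments."""
    return sum(sum(experiments[name].values()) for name in selected)


# Invented data: 1000 overflow events, 300 of them in user mode.
experiments = {
    "founder": {"kernel_function": 700, "<app_PID_1234>": 300},
    "_app_PID_1234.1.er": {"app_main": 300},
}

unfiltered = metric_total(experiments, ["founder", "_app_PID_1234.1.er"])
kernel_only = metric_total(experiments, ["founder"])
user_only = metric_total(experiments, ["_app_PID_1234.1.er"])
```

	  The unfiltered total counts the 300 user-mode events
	  twice; either filtered view counts each event once.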



PROFILING STATISTICS

     When  kernel  profiling  terminates,  er_kernel  will  write
     several  lines  of	 statistics for	the driver, including any
     counts for	run time errors.



RUN TIME ERRORS

     While er_kernel is	 running,  it  processes  DTrace  events.
     Some  of  those events are	delivered with inconsistent data.
     Specifically, each	event has the process PID in two  places,
     and  they	should be the same.  However, for reasons not yet
     understood, sometimes they	are different.	Such  events  are
     recorded  in  the	founder	 kernel	 experiment,  against the
     pseudo-function  <INCONSISTENT_PID>.   When   these   events
     occur, er_kernel will also	record an event	in the subexperi-
     ment  corresponding  to  the  PID	in  the	  reported   user
     callstack.	 The errors are	also counted and, if verbose mode
     is	set, a message will be written to stderr.

     DTrace also sometimes reports various errors.  The	most com-
     mon  of  these  is	 an  invalid address, which appears to be
     harmless.	These errors are counted, and, if verbose mode is
     set, they are logged to stderr.

     The stack unwind done by DTrace may be incorrect, and, espe-
     cially  on	 x86/amd64  codes,  may	 omit  the  caller of the
     current leaf frame.  These	errors may occur  on  either  the
     kernel stack or the user stack.



SYSTEM SETUP FOR DTRACE

     Normally, the DTrace driver is restricted to user root.   To
     use  it  as  a  regular  user, username, that user	must have
     privileges	assigned, and be in group sys.

     To	give privileges	to the user, add a line:
	       username::::defaultpriv=basic,dtrace_kernel,dtrace_proc
     to	the file /etc/user_attr.

     To	put the	user in	group sys, add username	to the	sys  line
     in	file /etc/group.

     You must log out, and then	log in again after  making  these
     changes.
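
     As a hedged aid for checking the setup described above, the
     following Python sketch inspects the text of the two files.
     The function names are hypothetical, the parsing assumes the
     colon-separated layouts shown on this page, and membership in
     group sys through the user's primary group is not considered.

```python
def has_dtrace_privileges(user_attr_text, username):
    """Check a user_attr entry for dtrace_kernel and dtrace_proc."""
    for line in user_attr_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 5 or fields[0] != username:
            continue
        # The fifth field holds semicolon-separated key=value pairs.
        for attr in fields[4].split(";"):
            key, _, value = attr.partition("=")
            if key == "defaultpriv":
                privs = set(value.split(","))
                return {"dtrace_kernel", "dtrace_proc"} <= privs
    return False


def in_group_sys(group_text, username):
    """Check whether username is listed on the sys line of a group file."""
    for line in group_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) >= 4 and fields[0] == "sys":
            return username in fields[3].split(",")
    return False
```

     The functions take file contents rather than paths so that they
     can be tried on sample text before being pointed at
     /etc/user_attr and /etc/group.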


SEE ALSO

     dtrace(1M)	(Solaris 10 or later),	analyzer(1),  collect(1),
     er_archive(1),	er_cp(1),     er_export(1),	er_mv(1),
     er_print(1),  er_rm(1),  er_src(1),  and	the   Performance
     Analyzer manual.