This document is the annex to the document Practical Introduction to Achieving Determinism. Readme File: Links to all the Java RTS technical documents Contents
IntroductionThis document is an annex that complements the guide A Practical Introduction to Achieving Determinism. The basic Practical Introduction guide demonstrates and explains some simple examples of using Java RTS to quickly and easily achieve determinism. The "Getting Started" package (GettingStarted.zip) includes a shell script that automatically runs all the predefined scenarios for the example programs. If you simply want to run the script and observe the results, then you do not need to read this annex document. This annex contains additional information for those of you who want to understand in more detail how the example programs work, how to make changes in the script or in the program code, or how to test your own code for determinism.
Description of the Example ProgramsThe example programs in
this set are Java applications to generate Fibonacci
numbers, sleep for a specified time or wait for the next (calculated) period, and optionally allocate
memory to stress the garbage collector. The programs create
instances of the Purpose of the ProgramsThe purpose of the example programs is to use system resources (CPU time, memory) in a controllable way and measure execution time. At the end of the execution, the time measurements and various statistical values are printed to the system output. The measurements show how much each iteration deviates from the previous, mean, or "reference" value because of jitter. In addition, the measurements show whether or not the deviation is bounded within an acceptable limit. The embedded measurement functions in the examples measure the variations in the execution time, providing a wealth of information. The last two figures in the output are the most important: jitter and standard deviation on execution time. See the section Output from the Example Programs. Structure of the ProgramsAll the example programs are structured in the same way:
Although the examples force early class initialization and early JIT compilation, jitter still occurs at the beginning of program execution. That is why, for all the scenarios for these particular examples, we consider each execution of the bench thread to be in its steady state only at the beginning of the tenth loop, when we know that JIT compilation has completed. There are three programs in the example set:
Various relevant Java HotSpot VM options can be used with the runs.
In addition, the runs that are executed by the Java RTS VM can use the Java RTS VM options. . Processing LogicEach of the three example programs is built
using almost the same schema. The The threads are started in the following order: JLT stress thread, JLT or RTT bench thread, and finally RTT stress thread, if any. The
A How Much Stress is Produced?For comparison purposes, you can estimate how much stress is produced in a given run by counting the number of times the Fibonacci numbers are calculated and the amount of memory that was allocated. The total number of times the Fibonacci numbers are calculated is equal to: (<nb_outer_iterations> * <nb_inner_iterations> * <nb_stress_class_iterations> * 2) The factor 2 is included because the Fibonacci method is called twice. The parameter <nb_stress_class_iterations> also specifies how many Fibonacci numbers to calculate, that is, the limit of the recursion. (This number is the value of "n" in the formula for Fibonacci numbers in the next subsection.) The total amount of memory, in bytes,
that is allocated for the two arrays ( (<nb_outer_iterations> * <nb_inner_iterations> * The factor 2 is included because the allocation method is called twice. The arrays are two-dimensional, where the first dimension equals <nb_stress_class_iterations> and the second dimension equals <allocation_array_size>. Therefore, <total_array_size> equals: (<nb_stress_class_iterations> * <allocation_array_size> * 8) The factor 8 represents 4 bytes for each We must also add the size of the two allocated array objects, which is constant but VM-dependent. Calculation of the Fibonacci NumbersFibonacci calculation is performed by two simple Fibonacci classes (Fibonacci1 and Fibonacci2), which implement a recursive calculation of Fibonacci numbers: F(n) = F(n-1) + F(n-2) for n greater than 2, otherwise F(n) = 1. Embedding of Measurement EquipmentRecording of measurement is done
inside
each outer loop and simply consists of getting the wall-clock
time value and putting it into an array cell. This is done at the
beginning and
the end of each outer loop. The overhead of this recording is thus
minimal and constant. Computing and displaying the output is
done outside the bench thread when all the threads have terminated, in
the Input Parameters for the Example ProgramsA script file is provided with the example programs. If you run this script as-is, without any changes, then you can skip this section. You will simply execute the example programs with preset parameter values and then examine the output. If you want to execute the programs individually, specifying various values for the input parameters, read this section for a detailed description of the parameters. The input parameters for the example programs are in the following format: <bench-thread-parameters> <stress-thread-parameters>+ <verbosity-flag> The parameters have the following meanings:
The <bench-thread-parameters> and the <stress-thread-parameters> are mandatory. The verbosity flag is optional; if it is not given, the output is a summary. It is possible to suppress the creation of a stress thread by specifying zero as the value of any one of the first three stress thread parameters, that is, the parameters that determine the number of iterations (inner loop, outer loop, and internal stress class). Thread ParametersThe list of bench thread parameters and the list of stress thread parameters have the same format, as follows: <nb_outer_iterations> <nb_inner_iterations> <nb_stress_class_iterations> [<allocation_array_size>] <pause_time> <thread_priority> The meanings of the parameters are as follows:
As an example, the following thread parameters can be specified for the NonDeterministic program or the GCDeterministic program: "1000 10 20 50 30 norm" These parameter values mean: I want a stress or bench thread that will run 1000 outer iterations, each of which with 10 inner iterations, stressing classes (Fibonacci calculations and garbage production) with 20 iterations, the allocation array size is 50, the pause time inside the outer loop will be 30 milliseconds, and the priority will be at the normal priority level (either real-time or not, depending on the type of thread). Note that the array size is not to be specified for the Deterministic program. Input for NonDeterministic ProgramIn this case the threads are JLTs, because the program must be executable on a vanilla VM. In addition there are at most two threads running: the bench thread and the stress thread. Since the program also stresses the GC activity, the <allocation_array_size> parameter is needed in the specification of the thread parameters. The parameters are in the format: <JLT-bench-thread-parameters> <JLT-stress-thread-parameters> <verbosity-flag> Here is an example of the parameters: "1000 20 10 500 100 max 1000 20 10 200 10 min verbose" (The line above has been shortened for printability.) All the parameters except the verbosity flag are mandatory. The script file contains an example of the thread parameters for the NonDeterministic program. Input for Deterministic ProgramThe Deterministic program does not allocate anything and so the GC does not have to collect anything during the run. Thus the <allocation_array_size> parameter is not provided in any of the three thread parameters for this thread. There can be at most one bench RTT thread, one stress RTT thread, and one stress JLT running concurrently in this program. The parameters for this program have the following format: <RTT-bench-thread-parameters> <RTT-stress-thread-parameters> <JLT-stress-thread-parameters> <verbosity-flag> Here is an example of the parameters: "1000 20 10 0 norm 1000 20 10 200 norm 1000 20 10 10 min summary" (The line above has been shortened for printability.) All the parameters except the verbosity flag are mandatory. The script file contains an example of the thread parameters for the Deterministic program. Input for GCDeterministic ProgramIn addition to the functions of the Deterministic program, GCDeterministic also produces garbage to stress the RTGC. Thus the <allocation_array_size> parameter has to be provided for any of the runs of GCDeterministic. There can be at most one bench RTT thread, one stress RTT thread, and one stress JLT running concurrently in this program. So the parameters for this program have the following format: <RTT-bench-thread-parameters> <RTT-stress-thread-parameters> <JLT-stress-thread-parameters> <verbosity-flag> Here is an example of the parameters: "1000 10 10 20 100 max 1000 10 10 200 0 norm 1000 10 10 800 0 min verbose" (The line above has been shortened for printability.) All the parameters except the verbosity flag are mandatory. The script file contains an example of the thread parameters for the GCDeterministic program. Output from the Example ProgramsThis section describes in detail the output from the example programs. A great deal of output can be generated by these programs. You can request shorter output by using the verbosity flag for the run. If the verbosity flag is set to verbose, the output will contain the following information:
If the verbosity flag is set to any value other than verbose, just a subset of the output is printed:
All wall-clock times and elapsed times are expressed in microseconds. The "reference" execution time value used in the output is defined for a given run as the elapsed time value that occurred the most often. If there are several different elapsed times which occurred the same number of times, the first one found is the one that is chosen as the reference. Reference value make sense when the run is very stable and most of the elapsed time values are identical. Otherwise the "mean" execution time value makes more sense for a run with widely varying values. Both the reference value and the mean value are used in the calculations in the output. The script file that is provided for running the recommended scenarios sets the VM option flag PrintGC (and sometimes also the VM option flag PrintCompilation) to "on." Therefore, GC or compilation activity is also displayed in the output at various places, mixed in with the program output. This is to show clearly where garbage collection or compilation occurs. When reading the example program output, take care to differentiate this kind of output from the output from the programs. Reminder of Parameters for the RunAs a convenient reminder, in addition to the output from the example programs, the script prints the command that was executed, including all the input parameters. Example of OutputThe following is an example of this part of the output on Solaris OS. /usr/sbin/psrset -e 1 /net/amos/mackinac/releases/b43a/binaries/ solaris-i486/bin/java -Drtsj.precompile=nhrt.precompile -Drtsj.preload=itc.preload -Xms64m -Xmx64m -XX:+ITCRT -XX:RTGCCriticalReservedBytes=10m -XX:+PrintGC -XX:+PrintGC -classpath /net/amos/mackinac/work/olagneau/ docs-pass/getting-started-hands-on/ apps/mypacking/GettingStarted/dist/ GettingStarted.jar GCDeterministic 1000 10 10 500 4 max 1000 10 10 300 0 min 0 10 10 10 0 norm verbose (The lines above have been shortened for printability.) Output Header for the RunThe header for each run consists of the following:
Example of OutputThe following is an example of this part of the output. --------------- Running GCDeterministic Program --------------- Note: All time measurements are expressed in microsecond units, unless otherwise indicated, for example ms (milliseconds) and ns (nanoseconds). Static initialization finished: ==> Stressing Fibonacci and GarbageProducer classes initialized and loaded Computing stress cost for bench thread Calculated computation time for bench: (0 ms, 692252 ns) Calculated period for bench: (4 ms, 692252 ns) Stress cost time for bench computed Initialization and configuration of program finished ============================================ GCDeterministic program parameters ============================================ The options used for this run are: ---------------------------------------- Realtime bench thread parameters: ================================= Number of outer loops: 1000 Number of inner iterations: 10 Number of stress class iterations: 10 Size of elementary array to allocate: 500 Entered pause time: 4 milliseconds Calculated period (stress cost + pause): (4 ms, 692252 ns) Priority level: max Effective priority assigned: 69 Realtime stress thread parameters: ================================= Number of outer loops: 1000 Number of inner iterations: 10 Number of stress class iterations: 10 Size of elementary array to allocate: 300 Entered pause time: 4 milliseconds Priority level: min Effective priority assigned: 11 Output detail level: detailed ---------------------------------------- Time Measurement ResultsThe next output is a list of the raw time measurement results for each iteration of the outer loop. This output is printed only if the verbosity flag is set for the run. This is a large amount of data that will allow you to observe the evolution of the behavior of the run. In addition, it is formatted to be easily imported into any spreadsheet or statistical analysis tool, allowing you perform any analysis on a given run. The first ten iterations are printed out with their execution time only. The rest of this list is as follows:
Example of OutputThe following is an example of this part of the output. ================================ Time measurement results: ================================ Loop_no: 0 | exec_time: 5361 | Loop_no: 1 | exec_time: 5213 | Loop_no: 2 | exec_time: 5245 | Loop_no: 3 | exec_time: 5300 | Loop_no: 4 | exec_time: 5189 | Loop_no: 5 | exec_time: 5186 | Loop_no: 6 | exec_time: 5245 | Loop_no: 7 | exec_time: 5256 | Loop_no: 8 | exec_time: 5209 | Loop_no: 9 | exec_time: 5178 | start time: 1215790305050079335 | end time: 1215790305055257658 | mean_exec: 5178 | delta_previous: -31 | mean_delta_previous:31 | delta_mean_exec: 0 | delta_ref: 87 | min_exec: 5178 | max_exec: 5178 | std_dev: 0.0 | exec_jitter: 0 | Loop_no: 10 | exec_time: 5235 | start time: 1215790305056771280 | end time: 1215790305062006112 | mean_exec: 5206 | delta_previous: 57 | mean_delta_previous:44 | delta_mean_exec: 29 | delta_ref: 144 | min_exec: 5178 | max_exec: 5235 | std_dev: 19.79 | exec_jitter: 57 | ..... Lines omitted ..... Loop_no: 999 | exec_time: 4986 | start time: 1215790305468070019 | end time: 1215790305473056011 | mean_exec: 5140 | delta_previous: -79 | mean_delta_previous:61 | delta_mean_exec: -154 | delta_ref: -105 | min_exec: 4913 | max_exec: 5551 | std_dev: 97.84 | exec_jitter: 638 | Noticeable Deviations Based on Previous IterationThis output is printed only if the verbosity flag is set for the run. In addition, if no noticeable deviations were found, nothing is printed in this section of the output. Not counting the first ten iterations, the execution time is given for iterations where the elapsed time difference values vary more than 500 microseconds from the preceding value. This is a sign of any unexpected deviation that could be jitter. The shorter the execution time of the outer loop, the more relevant this data. Such deviations may indicate an occurrence of GC activity, a jitter occurrence, or simply an instability of the program due to some other reason (such as waiting for a resource or I/O). Example of OutputThe following is an example of this part of the output. =================================================== Noticeable deviations: =================================================== For loop 49, execution time: 5551, deviation: 614 Summary of Deviations Based on Previous IterationThis output summarizes the deviation occurrences for this run:
Example of OutputThe following is an example of this part of the output. ============================================ Summary of deviations based on previous: ============================================ Nb noticeable positive deviations ( > 500 microseconds) = 1 Nb noticeable negative deviations ( < -500 microseconds) = 0 Nb max identical results = 3 for value 5122 (The lines above have been shortened for printability.) Percentage Report of DeviationsThis part of the output reports within which deviation range the deviations have been found, both related to reference value and to mean value. Thus there are two deviations reported. This provides a valuable distribution of the deviations. This output is a list of deviations grouped by interval, for which is given the percentage of occurrence against the total number of loops. This output is ordered by time slice, from 0 microseconds to 1 millisecond, where the time slices are fine-grained up to 100 microseconds (5, 10, 20, 40, 50, 75, 100) and then coarser-grained from 100 to 1000 microseconds (200, 300, 400, 500, 750, 1000). The last value is for all deviations above 1 millisecond. If the run is very deterministic, the reference value will make much sense, and so the deviations from this reference are the most relevant. Otherwise the deviations to consider are rather the deviations from the mean value. Example of OutputThe following is an example of this part of the output. ==================================================== Percentage report of deviations: ==================================================== -- deviations based on most frequent value 5091 --- Lower values below are exclusive, higher value are inclusive: 0 to 5 microseconds deviations: 71, 7.10 % 5 to 10 microseconds deviations: 70, 7.0 % 10 to 20 microseconds deviations: 98, 9.8 % 20 to 40 microseconds deviations: 185, 18.5 % 40 to 50 microseconds deviations: 78, 7.8 % 50 to 75 microseconds deviations: 97, 9.70 % 75 to 100 microseconds deviations: 84, 8.4 % 100 to 200 microseconds deviations: 186, 18.6 % 200 to 300 microseconds deviations: 97, 9.70 % 300 to 400 microseconds deviations: 6, 0.6 % 400 to 500 microseconds deviations: 1, 0.1 % 500 to 750 microseconds deviations: 0, 0.0 % 750 to 1000 microseconds deviations: 0, 0.0 % More than 1000 microseconds deviations: 0, 0.0 % -- deviations based on mean execution time 5140 --- Lower values below are exclusive, higher value are inclusive: 0 to 5 microseconds deviations: 49, 4.9 % 5 to 10 microseconds deviations: 57, 5.7 % 10 to 20 microseconds deviations: 105, 10.5 % 20 to 40 microseconds deviations: 175, 17.5 % 40 to 50 microseconds deviations: 88, 8.8 % 50 to 75 microseconds deviations: 193, 19.30 % 75 to 100 microseconds deviations: 66, 6.6 % 100 to 200 microseconds deviations: 193, 19.30 % 200 to 300 microseconds deviations: 57, 5.7 % 300 to 400 microseconds deviations: 2, 0.2 % 400 to 500 microseconds deviations: 1, 0.1 % 500 to 750 microseconds deviations: 0, 0.0 % 750 to 1000 microseconds deviations: 0, 0.0 % More than 1000 microseconds deviations: 0, 0.0 % Ordered List of ValuesThis output is printed only if the verbosity flag is set for the run. This output summarizes the elapsed times in decreasing order of the number of occurrences in the loop. The elapsed times are grouped by the number of occurrences, and are in decreasing order of deviation from the “reference” value within a group. The deviation to mean execution time value is also given. Example of OutputThe following is an example of this part of the output. ======================================= Ordered list of values: ======================================= Execution time 5091 occurred 18 times, ref deviation is 0 microseconds, mean deviation is -49 microseconds Execution time 5122 occurred 13 times, ref deviation is 31 microseconds, mean deviation is -18 microseconds Execution time 5121 occurred 12 times, ref deviation is 30 microseconds, mean deviation is -19 microseconds Execution time 5086 occurred 12 times, ref deviation is -5 microseconds, mean deviation is -54 microseconds ..... Lines omitted ..... Execution time 4921 occurred 1 times, ref deviation is -170 microseconds, mean deviation is -219 microseconds Execution time 4913 occurred 1 times, ref deviation is -178 microseconds, mean deviation is -227 microseconds (The lines above have been shortened for printability.) Summary of ResultsThis part of the output provides the most relevant information about the determinism of the run.
Example of OutputThe following is an example of this part of the output. ============================================ Summary of results: ============================================ Mean execution time: 5140 microseconds Best execution time: 4913 microseconds Worst execution time: 5551 microseconds Most frequent execution time: 5091, 312 occurrences Execution time jitter: 638 Standard deviation: 97.84 Testing Determinism With Your Own CodeRead this section if you want to execute your own code and test it for determinism. First, be sure to read Description of the Example Programs. The example programs use "stress classes" to put stress on the system resources. There are two stress classes to calculate Fibonacci numbers (Fibonacci1 and Fibonacci2), and two stress classes to allocate memory that will be garbage-collected (GarbageProducer1 and GarbageProducer2). The structure of the example programs allows you to easily plug in your own code as an additional stress class. In this way you can observe the deterministic behavior of the programs as they execute your code. This provides you with a good benchmark for testing parts of your code with Java RTS. This section describes in detail how to plug in your code and have it executed by the example programs. For the purpose of plugging in external code, a simple hook has been put in the source code:
Therefore, to plug in your own code, you just have to create a class that will be a wrapper above your code and which will implement the StressingObject interface. You also have to uncomment the programs inside their static initialization part and run() methods. You then recompile, and you have finished: the programs will now use the stress provided by your own code. Note, however, that the stress methods of the programs instantiate only one StressingObject. This is different from the Fibonacci and GarbageProducer stressing methods, which instantiate two Fibonacci objects and two garbage-producer objects, respectively. |