Thedbwriter component writes and deletes the reporting data in the reporting database. It performs the following tasks:
Reads raw data from reporting files and writes this raw data into the reporting database.
Calculates derived values. You can configure which values are calculated and the rules of how to calculate them.
Deletes outdated data. You can configure how long to keep the data.
The sge_qmaster component generates the reporting files. You can configure the generation of the reporting files, see the attribute reporting_params in the man page sge_conf(5), and the attribute report_variables in the man page sge_host(5).
The installation procedure sets up these parameters. A script for starting up the dbwriter is provided with the reporting module. Please see the N1 Grid Engine 6 Installation Guide for details. The following parameters have to be set for dbwriter.
The dbwriter can connect to different brands of database systems (supported systems are PostgreSQL and Oracle). The following parameters have to be set:
DRIVER — to the name of the JDBC driver to use, for example, org.postgresql.Driver
DRIVERJAR — to the jar archive containing the JDBC driver, for example, lib/postgres43.jar.
Which database on which host to use is set by configuring the JDBC URL: The URL parameter is set to the JDBC URL of the database to use. Follow the guidelines of the database vendor for the syntax to use. For, example for a PostgreSQL database: jdbc:postgresql://<hostname>:5432/arco
The path where dbwriter will find reporting files is set in the REPORTING_FILE variable. The base directory is typically set to $SGE_ROOT/$SGE_CELL/common
The task of setting up these parameters will be done by the installation procedure. You can configure the behavior of the dbwriter with a number of command line parameters.
The -interval parameter sets the interval, in which dbwriter looks for new reporting files. If a reporting file is found, it is read and data is written to the reporting database.
The -pid parameter defines the path to the pid file. The dbwriter writes at startup a pid file. This contains the process id of the dbwriter. At shutdown of the dbwriter this file will be deleted.
The -pidCmd parameter defines a command which will be executed by the dbwriter to determine it's process id. This command should print it's parent process id to stdout. The dbwriter is a java application. The java virtual machine cannot determine its own process id. The default value of the pidCmd is $SGE_ROOT/utilbin/$ARCH/checkprog -ppid.
The -continuous parameter switches on the continuous mode. Without -continuous, dbwriter will perform its tasks just once. If continuous mode is switched on, it will run continuously and perform its tasks in each interval set with the -interval switch.
You can use the -debug option to configure the amount of messages output by dbwriter. A parameter to the -debug option is the debug level. In general, using the default debug level (-info) should be the preferred choice. Using more verbose debug levels greatly increases the amount of data output by dbwriter. You can specify the following debug levels:
-warning: Display only severe errors and warnings
info: Add a number of informational messages. This level is the default, if the -display switch isn't used
-config: Give additional information that is related to dbwriter configuration, e.g. about the processing of rules (derived values or delete rules)
-fine: Output more information. If this level is chosen, all SQL statements executed by dbwriter will be output.
-finer: for debugging
-finest: for debugging
-all: Display information for all levels (only for debugging purposes).
N1 Grid Engine 6 writes one report file containing data of different types:
host load values and consumables
The dbwriter command line parameter -reporting has to specify the path to the reporting file. The dbwriter component automatically parses the reporting file; once it has completed processing and has stored all the information into the database, it deletes the reporting file.
At dbwriter startup, and in continuous mode once an hour, derived values are calculated. You can configure which values to calculate in an XML file, which is by default in $SGE_ROOT/dbwriter/database/<database_type>/dbwriter.xml. <database_type> defines the type of database being used; currently, Oracle and Postgres are supported. The path to the configuration file is passed to dbwriter using the -calculation parameter.
The configuration file uses an XML format, and contains entries of rules for both derived values and deleted values (described in the next section). The rules for derived values have the following format.
The top-level start tag is <derive> It must be specified with three attributes:
object — which can be host, queue, user, group, department or project. Based on this attribute, the derived value is ultimately stored in one of: sge_host_values, sge_queue_values, sge_user_values, sge_group_values, sge_department_values, sge_project_values.
interval — which can be hour, day, month, or year.
variable — which is the name of the new derived value.
A second-level start tag, either <sql> or <auto>, describing the way the value should be derived. These tags are shown in detail as follows.
<sql> – This tag contains an SQL statement used for calculating the derived values. The exact syntax of the entries depends upon the type of database being used.
<auto> - for certain simple derived values, this tag can be used instead of a full SQL query. This tag has two attributes:
function — which gives the aggregate function to apply to the variable. This can be any function valid for the type of database being used. Some typical functions are AVG, SUM, VALUE, COUNT, MIN or MAX.
variable — which can be any variable tracked in the following tables: sge_host_values, sge_queue_values, sge_user_values, sge_group_values, sge_department_values, sge_project_values the variable specified must be from the table indicated by the object attribute of the enclosing <derive> tag, for example, if the object is host, the variable must be found in sge_host_values.
Two end tags matching the two start tags
Here is an example of a derivation rule using the <sql> tag. The sge_queue table has a composed primary key comprising q_qname and q_hostname. For a rule specified for the queue object_type, a query will be made for each entry in the sge_queue table, the placeholders __key_0__ will be replaced by the queue name,ilwill be replaced by the hostname.
<!-- average queue utilization per hour --> <derive object="queue" interval="hour" variable="h_utilized"> <sql> SELECT DATE_TRUNC( 'hour', qv_time_start) AS time_start, DATE_TRUNC( 'hour', qv_time_start) + INTERVAL '1 hour' AS time_end, AVG(qv_dvalue * 100 / qv_dconfig) AS value FROM sge_queue_values WHERE qv_variable = 'slots' AND qv_parent = (SELECT q_id FROM sge_queue WHERE q_qname = __key_0__ AND q_hostname = __key_1__) AND qv_time_start <= '__time_end__' AND qv_time_end > '__time_start__' GROUP BY time_start </sql> </derive>
Here is an example of a derivation rule using the <auto> tag.
<!-- average load per hour --> <derive object="host" interval="hour" variable="h_load"> <auto function="AVG" variable="np_load_avg" /> </derive>
At dbwriter startup, and in continuous mode once an hour, outdated records will be deleted. Which values to calculate can be configured in an XML file, by default in $SGE_ROOT/dbwriter/database/<database_type>/dbwriter.xml. <database_type> is the type of database being used; currently, Oracle and Postgres are supported. The path to the configuration file is passed to dbwriter using the -deletion parameter.
The configuration file uses an XML format, and contains entries of rules for both derived values and deleted values. Deletion rules are of the following format.
A top-level start tag <delete> with three attributes:
scope — which specifies the type of data to be deleted. Valid entries are job, job_log, share_log, ip, queue_values, ip, department_values, user_values, group_values. Based on this attribute, the values are deleted from the table with the same name with sge_ prepended.
time_range — which gives the unit of time_amount.
time_amount — which is the number of units (time_range) a record is kept.
An optional second-level start tag <sub_scope>, which specifies an additional condition for deletion. A subscope can be configured for all *_values scopes and the share_log scope.
One or Two end tags matching the two start tags
If a subscope is configured for a *_values rule, it contains a space separated list of variables to delete. If a subscope is specified for the share_log, it contains a space separated list of sharetree nodes to delete.
The following rule indicates that the four variables given in the subscope should be deleted from the table sge_host_values after 7 days.
<delete scope="host_values" time_range="day" time_amount="7"> <sub_scope>np_load_avg</sub_scope> <sub_scope>cpu</sub_scope> <sub_scope>mem_free</sub_scope> <sub_scope>virtual_free</sub_scope> </delete>
The following rule says to delete all variables from the table sge_host_values after 2 years.
<delete scope="host_values" time_range="year" time_amount="2"/>
The following rule says to delete all records for user fred after 1 month
<delete scope="share_log" time_range="month" time_amount="1"> <sub_scope>fred</sub_scope> </delete>