Configuration

The task of setting up these parameters will be done by the installation procedure. You can configure the behavior of the dbwriter with a number of command line parameters.

Interval

The -interval parameter sets the interval, in which dbwriter looks for new reporting files. If a reporting file is found, it is read and data is written to the reporting database.

Pid

The -pid parameter defines the path to the pid file. The dbwriter writes at startup a pid file. This contains the process id of the dbwriter. At shutdown of the dbwriter this file will be deleted.

PidCmd

The -pidCmd parameter defines a command which will be executed by the dbwriter to determine it's process id. This command should print it's parent process id to stdout. The dbwriter is a java application. The java virtual machine cannot determine its own process id. The default value of the pidCmd is $SGE_ROOT/utilbin/$ARCH/checkprog -ppid.

Continuous Mode

The -continuous parameter switches on the continuous mode. Without -continuous, dbwriter will perform its tasks just once. If continuous mode is switched on, it will run continuously and perform its tasks in each interval set with the -interval switch.

Debug Level

You can use the -debug option to configure the amount of messages output by dbwriter. A parameter to the -debug option is the debug level. In general, using the default debug level (-info) should be the preferred choice. Using more verbose debug levels greatly increases the amount of data output by dbwriter. You can specify the following debug levels:

-warning: Display only severe errors and warnings
info: Add a number of informational messages. This level is the default, if the -display switch isn't used
-config: Give additional information that is related to dbwriter configuration, e.g. about the processing of rules (derived values or delete rules)
-fine: Output more information. If this level is chosen, all SQL statements executed by dbwriter will be output.
-finer: for debugging
-finest: for debugging
-all: Display information for all levels (only for debugging purposes).

Reporting File

N1 Grid Engine 6 writes one report file containing data of different types:

host load values and consumables
queue consumables
job logging
job accounting
sharetree usage

The dbwriter command line parameter -reporting has to specify the path to the reporting file. The dbwriter component automatically parses the reporting file; once it has completed processing and has stored all the information into the database, it deletes the reporting file.

Calculation of Derived Values

At dbwriter startup, and in continuous mode once an hour, derived values are calculated. You can configure which values to calculate in an XML file, which is by default in $SGE_ROOT/dbwriter/database/<database_type>/dbwriter.xml. <database_type> defines the type of database being used; currently, Oracle and Postgres are supported. The path to the configuration file is passed to dbwriter using the -calculation parameter.

The configuration file uses an XML format, and contains entries of rules for both derived values and deleted values (described in the next section). The rules for derived values have the following format.

Derived Values Format

The top-level start tag is <derive> It must be specified with three attributes:
- object — which can be host, queue, user, group, department or project. Based on this attribute, the derived value is ultimately stored in one of: sge_host_values, sge_queue_values, sge_user_values, sge_group_values, sge_department_values, sge_project_values.
- interval — which can be hour, day, month, or year.
- variable — which is the name of the new derived value.
A second-level start tag, either <sql> or <auto>, describing the way the value should be derived. These tags are shown in detail as follows.
<sql> – This tag contains an SQL statement used for calculating the derived values. The exact syntax of the entries depends upon the type of database being used.
<auto> - for certain simple derived values, this tag can be used instead of a full SQL query. This tag has two attributes:
- function — which gives the aggregate function to apply to the variable. This can be any function valid for the type of database being used. Some typical functions are AVG, SUM, VALUE, COUNT, MIN or MAX.
- variable — which can be any variable tracked in the following tables: sge_host_values, sge_queue_values, sge_user_values, sge_group_values, sge_department_values, sge_project_values the variable specified must be from the table indicated by the object attribute of the enclosing <derive> tag, for example, if the object is host, the variable must be found in sge_host_values.
Two end tags matching the two start tags

Examples

Here is an example of a derivation rule using the <sql> tag. The sge_queue table has a composed primary key comprising q_qname and q_hostname. For a rule specified for the queue object_type, a query will be made for each entry in the sge_queue table, the placeholders __key_0__ will be replaced by the queue name,ilwill be replaced by the hostname.

<!--
        average queue utilization per hour
        -->
<derive object="queue" interval="hour" variable="h_utilized">
        <sql>
        SELECT DATE_TRUNC( 'hour', qv_time_start)
                        AS time_start,
                DATE_TRUNC( 'hour', qv_time_start) + INTERVAL '1 hour'
                        AS time_end,
                AVG(qv_dvalue * 100 / qv_dconfig)
                        AS value
                FROM sge_queue_values
                WHERE qv_variable = 'slots' AND
                        qv_parent = (SELECT q_id FROM sge_queue
                                WHERE q_qname = __key_0__
                                        AND q_hostname = __key_1__)
                        AND qv_time_start &lt;= '__time_end__' AND
                        qv_time_end &gt; '__time_start__'
                GROUP BY time_start
</sql>
</derive>

Here is an example of a derivation rule using the <auto> tag.

<!-- average load per hour -->
<derive object="host" interval="hour" variable="h_load">
        <auto function="AVG" variable="np_load_avg" />
</derive>

Deleting Outdated Records

At dbwriter startup, and in continuous mode once an hour, outdated records will be deleted. Which values to calculate can be configured in an XML file, by default in $SGE_ROOT/dbwriter/database/<database_type>/dbwriter.xml. <database_type> is the type of database being used; currently, Oracle and Postgres are supported. The path to the configuration file is passed to dbwriter using the -deletion parameter.

The configuration file uses an XML format, and contains entries of rules for both derived values and deleted values. Deletion rules are of the following format.

A top-level start tag <delete> with three attributes:
- scope — which specifies the type of data to be deleted. Valid entries are job, job_log, share_log, ip, queue_values, ip, department_values, user_values, group_values. Based on this attribute, the values are deleted from the table with the same name with sge_ prepended.
- time_range — which gives the unit of time_amount.
- time_amount — which is the number of units (time_range) a record is kept.
An optional second-level start tag <sub_scope>, which specifies an additional condition for deletion. A subscope can be configured for all *_values scopes and the share_log scope.
One or Two end tags matching the two start tags

If a subscope is configured for a *_values rule, it contains a space separated list of variables to delete. If a subscope is specified for the share_log, it contains a space separated list of sharetree nodes to delete.

Examples

The following rule indicates that the four variables given in the subscope should be deleted from the table sge_host_values after 7 days.

<delete scope="host_values" time_range="day" time_amount="7">
        <sub_scope>np_load_avg</sub_scope>
        <sub_scope>cpu</sub_scope>
        <sub_scope>mem_free</sub_scope>
        <sub_scope>virtual_free</sub_scope>
</delete>

The following rule says to delete all variables from the table sge_host_values after 2 years.

<delete scope="host_values" time_range="year" time_amount="2"/>

The following rule says to delete all records for user fred after 1 month

<delete scope="share_log" time_range="month" time_amount="1">
        <sub_scope>fred</sub_scope>
</delete>

Previous: Setup