Sun N1 Grid Engine 6.1 Administration Guide

Chapter 10 Configuring DBWriter

Thedbwriter component writes and deletes the reporting data in the reporting database. It performs the following tasks:

The sge_qmaster component generates the reporting files. You can configure the generation of the reporting files, see the attribute reporting_params in the man page sge_conf(5), and the attribute report_variables in the man page host_conf(5).

Setup

The installation procedure sets up these parameters. A script for starting up the dbwriter is provided with the reporting module. Please see the N1 Grid Engine 6 Installation Guide for details. The following parameters have to be set for dbwriter.

Database System

The dbwriter can connect to different brands of database systems (supported systems are PostgreSQL and Oracle). The following parameters have to be set:

Database Server

Which database on which host to use is set by configuring the JDBC URL: The URL parameter is set to the JDBC URL of the database to use. Follow the guidelines of the database vendor for the syntax to use. For, example for a PostgreSQL database: jdbc:postgresql://<hostname>:5432/arco

Base Directory for Reporting Files

The path where dbwriter will find reporting files is set in the REPORTING_FILE variable. The base directory is typically set to $SGE_ROOT/$SGE_CELL/common

Configuration

The task of setting up these parameters will be done by the installation procedure. You can configure the behavior of the dbwriter with a number of command line parameters.

Interval

The -interval parameter sets the interval, in which dbwriter looks for new reporting files. If a reporting file is found, it is read and data is written to the reporting database.

Pid

The -pid parameter defines the path to the pid file. The dbwriter writes at startup a pid file. This contains the process id of the dbwriter. At shutdown of the dbwriter this file will be deleted.

PidCmd

The -pidCmd parameter defines a command which will be executed by the dbwriter to determine it's process id. This command should print it's parent process id to stdout. The dbwriter is a java application. The java virtual machine cannot determine its own process id. The default value of the pidCmd is $SGE_ROOT/utilbin/$ARCH/checkprog -ppid.

Continuous Mode

The -continuous parameter switches on the continuous mode. Without -continuous, dbwriter will perform its tasks just once. If continuous mode is switched on, it will run continuously and perform its tasks in each interval set with the -interval switch.

Debug Level

You can use the -debug option to configure the amount of messages output by dbwriter. A parameter to the -debug option is the debug level. In general, using the default debug level (-info) should be the preferred choice. Using more verbose debug levels greatly increases the amount of data output by dbwriter. You can specify the following debug levels:

Reporting File

N1 Grid Engine 6 writes one report file containing data of different types:

The dbwriter command line parameter -reporting has to specify the path to the reporting file. The dbwriter component automatically parses the reporting file; once it has completed processing and has stored all the information into the database, it deletes the reporting file.

Calculation of Derived Values

At dbwriter startup, and in continuous mode once an hour, derived values are calculated. You can configure which values to calculate in an XML file, which is by default in $SGE_ROOT/dbwriter/database/<database_type>/dbwriter.xml. <database_type> defines the type of database being used; currently, Oracle and Postgres are supported. The path to the configuration file is passed to dbwriter using the -calculation parameter.

The configuration file uses an XML format, and contains entries of rules for both derived values and deleted values (described in the next section). The rules for derived values have the following format.

Derived Values Format

  1. The top-level start tag is <derive> It must be specified with three attributes:

    • object — which can be host, queue, user, group, department or project. Based on this attribute, the derived value is ultimately stored in one of: sge_host_values, sge_queue_values, sge_user_values, sge_group_values, sge_department_values, sge_project_values.

    • interval — which can be hour, day, month, or year.

    • variable — which is the name of the new derived value.

  2. A second-level start tag, either <sql> or <auto>, describing the way the value should be derived. These tags are shown in detail as follows.

  3. <sql> – This tag contains an SQL statement used for calculating the derived values. The exact syntax of the entries depends upon the type of database being used.

  4. <auto> - for certain simple derived values, this tag can be used instead of a full SQL query. This tag has two attributes:

    • function — which gives the aggregate function to apply to the variable. This can be any function valid for the type of database being used. Some typical functions are AVG, SUM, VALUE, COUNT, MIN or MAX.

    • variable — which can be any variable tracked in the following tables: sge_host_values, sge_queue_values, sge_user_values, sge_group_values, sge_department_values, sge_project_values the variable specified must be from the table indicated by the object attribute of the enclosing <derive> tag, for example, if the object is host, the variable must be found in sge_host_values.

  5. Two end tags matching the two start tags

Examples

Here is an example of a derivation rule using the <sql> tag. The sge_queue table has a composed primary key comprising q_qname and q_hostname. For a rule specified for the queue object_type, a query will be made for each entry in the sge_queue table, the placeholders __key_0__ will be replaced by the queue name,ilwill be replaced by the hostname.

<!--
        average queue utilization per hour
        -->
<derive object="queue" interval="hour" variable="h_utilized">
        <sql>
        SELECT DATE_TRUNC( 'hour', qv_time_start)
                        AS time_start,
                DATE_TRUNC( 'hour', qv_time_start) + INTERVAL '1 hour'
                        AS time_end,
                AVG(qv_dvalue * 100 / qv_dconfig)
                        AS value
                FROM sge_queue_values
                WHERE qv_variable = 'slots' AND
                        qv_parent = (SELECT q_id FROM sge_queue
                                WHERE q_qname = __key_0__
                                        AND q_hostname = __key_1__)
                        AND qv_time_start &lt;= '__time_end__' AND
                        qv_time_end &gt; '__time_start__'
                GROUP BY time_start
</sql>
</derive>

Here is an example of a derivation rule using the <auto> tag.

<!-- average load per hour -->
<derive object="host" interval="hour" variable="h_load">
        <auto function="AVG" variable="np_load_avg" />
</derive>

Deleting Outdated Records

At dbwriter startup, and in continuous mode once an hour, outdated records will be deleted. Which values to calculate can be configured in an XML file, by default in $SGE_ROOT/dbwriter/database/<database_type>/dbwriter.xml. <database_type> is the type of database being used; currently, Oracle and Postgres are supported. The path to the configuration file is passed to dbwriter using the -deletion parameter.

The configuration file uses an XML format, and contains entries of rules for both derived values and deleted values. Deletion rules are of the following format.

If a subscope is configured for a *_values rule, it contains a space separated list of variables to delete. If a subscope is specified for the share_log, it contains a space separated list of sharetree nodes to delete.

Examples

The following rule indicates that the four variables given in the subscope should be deleted from the table sge_host_values after 7 days.

<delete scope="host_values" time_range="day" time_amount="7">
        <sub_scope>np_load_avg</sub_scope>
        <sub_scope>cpu</sub_scope>
        <sub_scope>mem_free</sub_scope>
        <sub_scope>virtual_free</sub_scope>
</delete>

The following rule says to delete all variables from the table sge_host_values after 2 years.

<delete scope="host_values" time_range="year" time_amount="2"/>

The following rule says to delete all records for user fred after 1 month

<delete scope="share_log" time_range="month" time_amount="1">
        <sub_scope>fred</sub_scope>
</delete>