You can use the optional Accounting and Reporting Console (ARCo) to generate live accounting and reporting data from the grid engine system and store the data in the reporting database, which is a standard SQL database. ARCo supports the following SQL database systems:
PostgreSQL
Oracle
MySQL
ARCo also provides a web-based tool for generating information queries on the reporting database and for retrieving the results in tabular or graphical form. ARCo enables you to store queries for later use, to run predefined queries, and to run queries in batch mode. For more information about how to use ARCo, see Chapter 5, Accounting and Reporting, in Sun N1 Grid Engine 6.1 User’s Guide. For information about how to install ARCo, see Chapter 8, Installing the Accounting and Reporting Console, in Sun N1 Grid Engine 6.1 Installation Guide.
Raw reporting data is generated by sge_qmaster. This raw data is stored in a reporting file. The dbwriter program reads the raw data in the reporting file and writes it to the SQL reporting database, where it can be accessed by ARCo.
The dbwriter program performs the following tasks:
Reads raw data from the reporting file and writes this raw data to the reporting database.
Calculates derived values. You can configure which values to calculate, as well as the rules that govern the calculations.
Deletes outdated data. You can configure how long to keep data.
When dbwriter starts up, it calculates derived values. dbwriter also deletes outdated records at startup. If dbwriter runs in continuous mode, dbwriter continues to calculate derived values and to delete outdated records at hourly intervals, or at whatever interval you specify.
You can specify in an XML file the values that you want to calculate and the records that you want to delete. Use the -calculation option of the dbwriter command to specify the path to this XML file.
For detailed information about calculating derived values, see Calculating Derived Values With dbwriter.
For detailed information about deleting outdated records, see Deleting Outdated Records With dbwriter.
The reporting file contains the following types of data:
Host load values and consumable resources
Queue consumable resources
Job logging
Job accounting
Share-tree usage
When the grid engine system is first installed, the reporting file is disabled. To use ARCo, you must enable the reporting file for the cluster. Once enabled, the reporting file will be generated by sge_qmaster. By default, the reporting file is located in sge-root/cell/common. You can change the default with the -reporting option of the dbwriter command.
For information about configuring the generation of the reporting file, see the reporting_params parameter of the sge_conf(5) man page, and the report_variables parameter of the host_conf(5) man page.
To enable the reporting file with QMON, on the Main Control window click the Cluster Configuration button, select the global host, and then click Modify.
On the Cluster Settings dialog box, click the Advanced Settings tab.
In the Reporting Parameters field, set the following parameters:
Set accounting to true. true is the default value.
Set reporting to true.
Set flush_time to 00:00:15. 00:00:15 is the default value.
Set joblog to true.
Set sharelog to 00:00:00. 00:00:00 is the default value.
To enable the reporting file from the command line, use the qconf –mconf command to set the reporting_params attributes, as described in the preceding paragraph.
Once the reporting file is enabled, the dbwriter can read raw data from the reporting file and write it to the reporting database.
For more information about configuring the reporting file, see the reporting(5) man page. For complete details about installing and setting up ARCo, see Chapter 8, Installing the Accounting and Reporting Console, in Sun N1 Grid Engine 6.1 Installation Guide.
The rules for calculating derived values are specified in a derived tag, which is a sub tag of the DbWriterConfig tag. The following table lists the attributes of the derived tag:
Description
The object for which data is aggregated. The object is one of the following:
host
queue
project
department
user
group
The time range specifying how often to calculate the derived values. The time range is one of the following:
hour
day
month
year
The name of the variable to hold the calculated data.
The following table lists the subelements of the derived tag:
The SQL statement that calculates the derived values. The statement must produce the following columns:
time_start – Together with time_end, specifies the time period for the calculated value
time_end
value – The calculated derived value
The SQL statement can contain the following placeholders. dbwriter replaces the placeholders for each query, based on a rule:
__time_start__ – Start time for the query. dbwriter searches for the last previously calculated derived value from this rule, and uses this timestamp as the start time for the next query.
__time_end__ – End time for the query. This timestamp specifies the end of the last passed time range. For example, if the time range is day, and if derived values are calculated at 00:30, 00:00 is taken as time_end.
__key_0__, __key_1__, . . . , __key_n__ – Components of the primary key for the specified object type. For example, the sge_hosts table has the primary h_hostname. If a rule is processed for the host object type, one query is executed per entry in the sge_hosts table, the __key_0__ placeholder in the SQL statement is replaced by the hostname.
The sge_queue table has a composed primary key that is made up of q_qname and q_hostname.
dbwriter generates the SQL statement for the calculation of derived values.
The autogenerated SQL statement looks like the following template:
SELECT time_start, time_end, <function>(<value_field>) as value FROM ( SELECT TRUNC( <timestart_field>, <interval> ) as time_start TRUNC( <timestart_field>, <interval> ) + INTERVAL '1' <interval> as time_end, <value_field> FROM <object value table> WHERE <primary key field 0> = __key_0__ AND <primary key field 1> = __key_1__ AND . . . AND <parent key field> = (SELECT <parent key field> FROM <parent table> WHERE <parent filter> ) AND <timestart_field> <= {ts __time_start__ } AND <timeend_field> > {ts __time_end__ } GROUP BY time_start, time_end |
The SQL template parameters are as follows:
Description
Aggregate function for calculating the derived value. Comes from the function attribute of the auto tag in the XML file.
Depends on the object of the derived value.
Depends on the object of the derived value.
Depends on the object of the derived value.
Comes from the interval attribute of the derived tag
Name of the database table where the values are stored. Depends on the object (host => host_values, user => user_values, . . .)
Primary key that is necessary to join the value table to the parent table. Depends on the object.
Name of the field that holds the ID of the parent. Depends on the object.
Name of the parent database table. Depends on the object (host => host, user => user)
Filter for the parent table. A derived value for each entry of the parent table is calculated, for example, u_user = 'user1').
Here is an example of an autogenerated SQL statement:
<derive object="host" interval="day" variable="d_load"> <auto function="AVG" variable="h_load" /> </derive> SELECT time_start, time_end, AGE(hv_dvalue) FROM ( SELECT TRUNC( hv_time_start, 'day' ) as time_start, TRUNC( hv_time_start, 'day' ) + INTERVAL '1' day as time_end, hv_dvalue FROM sge_host_values WHERE hv_variable = 'h_load' AND hv_parent = (SELECT h_id FROM sge_host WHERE h_hostname = 'foo.bar') AND hv_time_start <= {ts '2004-05-21 00:00:00.0'} AND hv_time_end > {ts '2004-05-17 00:00:00.0'} ) GROUP BY time_start, time_end |
To delete outdated records in the reporting database, you must specify a deletion rule in the delete tag. The following table lists the attributes of the delete tag:
Description
The type of data to delete. Valid entries are the following:
job
job_log
share_log
host_values
queue_values
project_values
department_values
user_values
group_values
The unit of time_amount:
Number of units (time_range) during which a record is to be kept.
The following table lists a subelement of the delete tag:
For certain scopes, a subscope can be configured. The subscope specifies an additional condition for deletion. A subscope can be configured for all *_values scopes and for the share_log scope.
If a subscope is configured for a *_values rule, it contains a list of variables to delete, separated by spaces.
If a subscope is specified for the share_log, it contains a list of share-tree nodes to delete, separated by spaces.
If subscope are used, you should always have a fallback rule without subscope, which will delete all objects that are not explicitly named by the subscope.
Here is an example of a delete tag:
<?xml version="1.0" encoding="UTF-8"?> <DbWriterConfig> <!-- keep host values for 2 years --> <delete scope="host_values" time_range="year" time_amount="2"/> <!-- keep queue values one month --> <delete scope="queue_values" time_range="month" time_amount="1"> <sub_scope>slots</sub_scope> <sub_scope>state</sub_scope> </delete> </DbWriterConfig> |