Sun N1 Grid Engine 6.1 User's Guide

Chapter 6 Automating Grid Engine Functions Through the Distributed Resource Management Application API

You can automate N1 Grid Engine functions by writing scripts that run N1 Grid Engine commands and parse the results. However, for more consistent and efficient results, you can use the C or Java™ language and the Distributed Resource Management Application API (DRMAA). This chapter introduces the DRMAA concept and explains how to use it with the C and Java languages.

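The chapter includes the following information:

  • Introduction to Distributed Resource Management Application API (DRMAA)

  • Developing with the C Language Binding

  • Developing with the Java Language Binding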

Introduction to Distributed Resource Management Application API (DRMAA)

The Distributed Resource Management Application API (DRMAA, which is pronounced like drama) is an Open Grid Forum specification to standardize job submission, monitoring, and control in Distributed Resource Management Systems (DRMS). The objective of the DRMAA Working Group was to produce an API that is easy to learn and easy to implement, and that enables useful application integrations with DRMS in a standard way.

The DRMAA specification is language, platform, and DRMS agnostic. A wide variety of different systems should be able to implement the DRMAA specification. To provide additional guidance for DRMAA implementers in specific languages, the DRMAA Working Group also produced several DRMAA language binding specifications. These specifications define what a DRMAA implementation should resemble in a given language.

The DRMAA specification is currently at version 1.0. The DRMAA Java Language Binding Specification is also at version 1.0, as is the DRMAA C Language Binding Specification. N1 Grid Engine 6.1 provides implementations of both the 1.0 Java language binding and the 1.0 C language binding as well as older versions of each for backward compatibility. For more information about the DRMAA 1.0 specification and the language-specific binding specifications, see the Open Grid Forum DRMAA Working Group web site.

Developing with the C Language Binding

Important Files for the C Language Binding

To use the DRMAA C language binding implementation included with N1 Grid Engine 6.1, you need to know where to find the important files. The most important file is the DRMAA header file that you include from your C application to make the DRMAA functions available to your application. The DRMAA header file resides in sge-root/include/drmaa.h, where sge-root defaults to /usr/SGE. For detailed reference information about the DRMAA functions, see section 3 of the N1 Grid Engine man pages, located in the sge-root/man directory. To compile and link your application, use the DRMAA shared library at sge-root/lib/arch/libdrmaa.so.

Including the DRMAA Header File

To use the DRMAA functions in your application, every source file that uses a DRMAA function must include the DRMAA header file. To include the DRMAA header file in your source file, add the following line to your source code, usually near the top:

#include "drmaa.h"

Compiling Your C Application

When you compile your DRMAA application, you need to include some additional compiler directives to direct the compiler and linker to use DRMAA. The following directions apply for the Sun Studio Compiler Collection and for gcc. These instructions might not apply for other compilers and linkers. Consult the documentation for your specific compiler and linker products.

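You must include two directives:

  • An include directive, -Isge-root/include, so that the compiler can find the drmaa.h header file.

  • A library directive, -ldrmaa, so that the linker links your application with the DRMAA shared library.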

You also need to verify that the sge-root/lib/arch directory is included in your library search path (LD_LIBRARY_PATH on the Solaris Operating Environment and Linux). The sge-root/lib/arch directory is not included automatically when you set your environment using the settings.sh or settings.csh files.
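As a minimal sketch of the library search path setup described above, the following Bourne shell fragment appends sge-root/lib/arch to LD_LIBRARY_PATH only when the directory is missing. The helper name add_lib_dir is hypothetical. The fragment assumes that SGE_ROOT has already been set by settings.sh and that sge-root/util/arch prints the architecture string, as in the netbeans.conf fragment later in this chapter. csh users would use setenv instead.

```sh
# Hypothetical helper: append a directory to LD_LIBRARY_PATH unless it is
# already listed there.
add_lib_dir() {
  case ":${LD_LIBRARY_PATH:-}:" in
    *":$1:"*) ;;   # already on the search path, nothing to do
    *) LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1" ;;
  esac
  export LD_LIBRARY_PATH
}

# Typical use after sourcing settings.sh (assumption: SGE_ROOT is set and
# the util/arch script is executable). Skipped silently otherwise.
if [ -n "${SGE_ROOT:-}" ] && [ -x "$SGE_ROOT/util/arch" ]; then
  arch=`"$SGE_ROOT/util/arch"`
  add_lib_dir "$SGE_ROOT/lib/$arch"
fi
```

Calling the helper twice with the same directory leaves the variable unchanged, so the fragment is safe to place in a shell startup file.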


Example 6–1 Compiling Your C Application Using Sun Studio Compiler

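The following example shows how you would compile your DRMAA application using the Sun Studio Compiler. The following assumptions apply:

  • You are using the csh shell.

  • N1 Grid Engine is installed in /sge, and the cell name is default.

  • Your DRMAA application is stored in app.c.

Sample commands would look like the following: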

% source /sge/default/common/settings.csh
% cc -I/sge/include -ldrmaa app.c

Running Your C Application

To run your compiled DRMAA application, verify the following:

The sge-root/lib/arch directory must be included in the library search path (LD_LIBRARY_PATH on the Solaris Operating Environment and Linux). The sge-root/lib/arch directory is not included automatically when you set your environment using the settings.sh or settings.csh files.

You must be logged into a machine that is an N1 Grid Engine submit host. If the machine is not an N1 Grid Engine submit host, all DRMAA function calls will fail, returning DRMAA_ERRNO_DRM_COMMUNICATION_FAILURE.
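The submit host requirement can be checked from the shell before you run the application. The sketch below is one possible approach, not part of the product: the helper name host_in_list is hypothetical, and it assumes that qconf -ss prints the submit host list one name per line. Host names might be listed fully qualified while hostname prints a short name, so a failed match is a hint to investigate rather than proof.

```sh
# Hypothetical helper: succeed if host name $1 appears as a whole line
# (compared case-insensitively) in the host list read from standard input.
host_in_list() {
  grep -i -x -F -- "$1" >/dev/null
}

# Typical use, assuming qconf is on PATH after sourcing the settings file:
#   qconf -ss | host_in_list "`hostname`" || echo "not a submit host" >&2
```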

How to Use the DRMAA 0.95 C Language Binding

The DRMAA shared library, which is enabled by default, supports version 1.0 of the DRMAA C Language Binding Specification. For reasons of backward compatibility, however, Grid Engine also includes an implementation of the 0.95 version of the DRMAA C Language Binding Specification. You should develop all new applications with the 1.0 shared library, but you might occasionally discover an application that requires the 0.95 implementation.

To enable the 0.95 version of the shared library, follow these steps:

  1. Log in as a user that has permissions to modify the Grid Engine installation.

    % su -

  2. Change to the sge-root/lib/arch directory.

    % cd /sge/lib/sol-sparc64

  3. Remove the libdrmaa.so symbolic link.

    % rm libdrmaa.so

  4. Create a new symbolic link to the 0.95 library.

    % ln -s libdrmaa.so.0.95 libdrmaa.so

    On the Solaris and Linux platforms, the shared library is tagged with a version number. If the 0.95 version of the shared library is enabled, applications compiled and linked against the 1.0 version fail with an error stating that the library could not be found, and vice versa. On other platforms, a 1.0 application will load the 0.95 shared library successfully but might fail due to unknown symbols. A 0.95 application will load the 1.0 shared library successfully but will likely fail because DRMAA functions return unexpected error codes.

    • To restore the 1.0 version of the shared library, perform steps 1 through 3 and create a new symbolic link to the 1.0 library.

      % ln -s libdrmaa.so.1.0 libdrmaa.so
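If you switch between the two library versions often, steps 3 and 4 can be wrapped in a small Bourne shell function. This is only a sketch: the function name switch_drmaa_lib is hypothetical, and it assumes the versioned files are named libdrmaa.so.0.95 and libdrmaa.so.1.0 as in the commands above.

```sh
# Hypothetical helper: point libdrmaa.so in directory $1 at libdrmaa.so.$2,
# leaving the existing link untouched if that versioned library is missing.
switch_drmaa_lib() {
  libdir=$1
  version=$2
  if [ ! -f "$libdir/libdrmaa.so.$version" ]; then
    echo "no libdrmaa.so.$version in $libdir" >&2
    return 1
  fi
  # Same effect as steps 3 and 4: remove the old link, create the new one.
  rm -f "$libdir/libdrmaa.so" &&
    ln -s "libdrmaa.so.$version" "$libdir/libdrmaa.so"
}

# Typical use, run as a user allowed to modify the installation:
#   switch_drmaa_lib /sge/lib/sol-sparc64 0.95
#   switch_drmaa_lib /sge/lib/sol-sparc64 1.0
```

Checking for the target file first avoids leaving the installation without a usable libdrmaa.so link when a version name is mistyped.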

C Application Examples

The following examples illustrate some application interactions that use the C language bindings. You can find additional examples in the “How To” section of the Grid Engine Community Site.


Example 6–2 Starting and Stopping a Session

The following code segment shows the most basic DRMAA C binding program.

Every call to a DRMAA function returns an error code. If everything goes well, that code is DRMAA_ERRNO_SUCCESS. If an error occurs, an appropriate error code is returned. Every DRMAA function also takes at least two parameters: a string to populate with an error message in case of an error, and an integer representing the maximum length of the error string.

On line 8, the example calls drmaa_init(). This function sets up the DRMAA session and must be called before most other DRMAA functions. Some functions, like drmaa_get_contact(), can be called before drmaa_init(), but these functions only provide general information. Any function that performs an action, such as drmaa_run_job() or drmaa_wait(), must be called after drmaa_init() returns. If such a function is called before drmaa_init() returns, it will return the error code DRMAA_ERRNO_NO_ACTIVE_SESSION.

The drmaa_init() function creates a session and starts an event client listener thread. The session is used for organizing jobs submitted through DRMAA, and the thread is used to receive updates from the queue master about the state of jobs and the system in general. Once drmaa_init() has been called successfully, the calling application must also call drmaa_exit() before terminating. If an application does not call drmaa_exit() before terminating, the queue master might be left with a dead event client handle, which can decrease queue master performance.

At the end of the program, on line 17, drmaa_exit() cleans up the session and stops the event client listener thread. Most other DRMAA functions must be called before drmaa_exit(). Some functions, like drmaa_get_contact(), can be called after drmaa_exit(), but these functions only provide general information. Any function that performs an action, such as drmaa_run_job() or drmaa_wait(), must be called before drmaa_exit() is called. If such a function is called after drmaa_exit() is called, it will return the error code DRMAA_ERRNO_NO_ACTIVE_SESSION.

01: #include <stdio.h>
02: #include "drmaa.h"
03: 
04: int main(int argc, char **argv) {
05:    char error[DRMAA_ERROR_STRING_BUFFER];
06:    int errnum = 0;
07: 
08:    errnum = drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER);
09: 
10:    if (errnum != DRMAA_ERRNO_SUCCESS) {
11:       fprintf(stderr, "Could not initialize the DRMAA library: %s\n", error);
12:       return 1;
13:    }
14: 
15:    printf("DRMAA library was started successfully\n");
16:    
17:    errnum = drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER);
18: 
19:    if (errnum != DRMAA_ERRNO_SUCCESS) {
20:       fprintf(stderr, "Could not shut down the DRMAA library: %s\n", error);
21:       return 1;
22:    }
23: 
24:    return 0;
25: }


Example 6–3 Running a Job

The following code segment shows how to use the DRMAA C binding to submit a job to N1 Grid Engine. The beginning and end of this program are the same as in Example 6–2. The differences are on lines 16-59. On line 16, DRMAA allocates a job template. A job template is a structure used to store information about a job to be submitted. The same template can be reused for multiple calls to drmaa_run_job() or drmaa_run_bulk_jobs().

On line 22, the DRMAA_REMOTE_COMMAND attribute is set. This attribute tells DRMAA where to find the program to run. Its value is the path to the executable. The path can be relative or absolute. If relative, the path is relative to the DRMAA_WD attribute, which defaults to the user's home directory. For more information on DRMAA attributes, see the drmaa_attributes man page. For this program to work, the script sleeper.sh must be in your default path.

On line 32, the DRMAA_V_ARGV attribute is set. This attribute tells DRMAA what arguments to pass to the executable. For more information on DRMAA attributes, refer to the drmaa_attributes man page.

On line 43, drmaa_run_job() submits the job. DRMAA places the ID assigned to the job into the character array that is passed to drmaa_run_job(). The job is now running as though submitted by qsub. At this point, calling drmaa_exit() or terminating the program will have no effect on the job.

To clean things up, the job template is deleted on line 54. This frees the memory DRMAA set aside for the job template, but has no effect on submitted jobs.

Finally, on line 61, drmaa_exit() is called. The call to drmaa_exit() is outside of the if structure started on line 18 because once drmaa_init() has been called, drmaa_exit() must be called before terminating, regardless of whether the other commands succeed.

01: #include <stdio.h>
02: #include "drmaa.h"
03: 
04: int main(int argc, char **argv) {
05:    char error[DRMAA_ERROR_STRING_BUFFER];
06:    int errnum = 0;
07:    drmaa_job_template_t *jt = NULL;
08: 
09:    errnum = drmaa_init(NULL, error, DRMAA_ERROR_STRING_BUFFER);
10: 
11:    if (errnum != DRMAA_ERRNO_SUCCESS) {
12:       fprintf(stderr, "Could not initialize the DRMAA library: %s\n", error);
13:       return 1;
14:    }
15: 
16:    errnum = drmaa_allocate_job_template(&jt, error, DRMAA_ERROR_STRING_BUFFER);
17: 
18:    if (errnum != DRMAA_ERRNO_SUCCESS) {
19:       fprintf(stderr, "Could not create job template: %s\n", error);
20:    }
21:    else {
22:       errnum = drmaa_set_attribute(jt, DRMAA_REMOTE_COMMAND, "sleeper.sh",
23:                                     error, DRMAA_ERROR_STRING_BUFFER);
24: 
25:       if (errnum != DRMAA_ERRNO_SUCCESS) {
26:          fprintf(stderr, "Could not set attribute \"%s\": %s\n",
27:                   DRMAA_REMOTE_COMMAND, error);
28:       }
29:       else {
30:          const char *args[2] = {"5", NULL};
31:          
32:          errnum = drmaa_set_vector_attribute(jt, DRMAA_V_ARGV, args, error,
33:                                               DRMAA_ERROR_STRING_BUFFER);
34: 
35:          if (errnum != DRMAA_ERRNO_SUCCESS) {
36:             fprintf(stderr, "Could not set attribute \"%s\": %s\n",
37:                      DRMAA_V_ARGV, error);
38:          }
39:       }
40: 
41:       if (errnum == DRMAA_ERRNO_SUCCESS) {
42:          char jobid[DRMAA_JOBNAME_BUFFER];
43:          errnum = drmaa_run_job(jobid, DRMAA_JOBNAME_BUFFER, jt, error,
44:                                  DRMAA_ERROR_STRING_BUFFER);
45: 
46:          if (errnum != DRMAA_ERRNO_SUCCESS) {
47:             fprintf(stderr, "Could not submit job: %s\n", error);
48:          }
49:          else {
50:             printf("Your job has been submitted with id %s\n", jobid);
51:          }
52:       } /* if */
53: 
54:       errnum = drmaa_delete_job_template(jt, error, DRMAA_ERROR_STRING_BUFFER);
55: 
56:       if (errnum != DRMAA_ERRNO_SUCCESS) {
57:          fprintf(stderr, "Could not delete job template: %s\n", error);
58:       }
59:    } /* else */
60: 
61:    errnum = drmaa_exit(error, DRMAA_ERROR_STRING_BUFFER);
62: 
63:    if (errnum != DRMAA_ERRNO_SUCCESS) {
64:       fprintf(stderr, "Could not shut down the DRMAA library: %s\n", error);
65:       return 1;
66:    }
67: 
68:    return 0;
69: }

Developing with the Java Language Binding

Important Files for the Java Language Binding

To use the DRMAA Java language binding implementation included with N1 Grid Engine 6.1, you need to know where to find the important files. The most important file is the DRMAA JAR file sge-root/lib/drmaa.jar. To compile your DRMAA application, you must include the DRMAA JAR file in your CLASSPATH. The DRMAA classes are documented in the DRMAA Javadoc™, located in the sge-root/doc/javadocs directory. To access the Javadocs, open the file sge-root/doc/javadocs/index.html in your browser. When you are ready to run your application, you also need the DRMAA shared library, sge-root/lib/arch/libdrmaa.so, which provides the required native routines.

Importing the DRMAA Java Classes and Packages

To use the DRMAA classes in your application, your classes should import the DRMAA classes or packages. In most cases, only the classes in the org.ggf.drmaa package will be used. You can import these classes individually or use a wildcard package import. In some rare cases, you might need to reference the N1 Grid Engine DRMAA implementation classes found in the com.sun.grid.drmaa package. In those cases, you can import the classes individually or you can import all the classes in the package. The names of the com.sun.grid.drmaa classes do not overlap with the org.ggf.drmaa classes, so you can import both packages without creating a namespace collision.

Compiling Your Java Application

To compile your DRMAA application, you must include the sge-root/lib/drmaa.jar file in your CLASSPATH. The drmaa.jar file will not be included automatically when you set your environment using the settings.sh or settings.csh files.
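The CLASSPATH setup can be sketched in Bourne shell as follows. The helper name drmaa_classpath is hypothetical, /sge stands in for sge-root as in Example 6–1, and the javac command line shown in the comment is only one possible way to compile a class such as the Howto1 example later in this chapter.

```sh
# Hypothetical helper: print a CLASSPATH value with sge-root/lib/drmaa.jar
# placed ahead of any existing entries. $1 is sge-root; $2 is the current
# CLASSPATH and may be empty.
drmaa_classpath() {
  printf '%s\n' "$1/lib/drmaa.jar${2:+:$2}"
}

# Typical use, with /sge standing in for sge-root:
#   CLASSPATH=`drmaa_classpath /sge "$CLASSPATH"`; export CLASSPATH
#   javac -d . Howto1.java
```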

How to Use DRMAA with NetBeans 5.x

To use the DRMAA classes with your NetBeans 5.0 or 5.5 project, follow these steps:

  1. Click mouse button 3 on the project node and select Properties.

  2. Determine whether your project generates a build file or uses an existing file.

    • If your project uses a generated build file:

      1. Select Libraries in the left column.

      2. Click Add Library.

      3. Click Manage Libraries in the Libraries dialog box.

      4. Click New Library in the Library Management dialog box.

      5. Type DRMAA in the Library Name field in the New Library dialog box.

      6. Click OK to dismiss the New Library dialog box.

      7. Click Add JAR/Folder.

      8. Browse to the sge-root/lib directory in the file chooser dialog box and select the drmaa.jar file.

      9. Click Add JAR/Folder to dismiss the file chooser dialog box.

      10. Click OK to dismiss the Library Management dialog box.

      11. Select the DRMAA library and click Add Library to dismiss the Libraries dialog box.

    • If your project uses an existing build file:

      1. Select Java Sources Classpath in the left column.

      2. Click Add JAR/Folder.

      3. Browse to the sge-root/lib directory in the file chooser dialog box and select the drmaa.jar file.

      4. Click Choose to dismiss the file chooser dialog box.

  3. Click OK to dismiss the properties dialog box.

  4. Verify that the DRMAA shared library is in the library search path.

    To run your application from NetBeans, the DRMAA shared library file sge-root/lib/arch/libdrmaa.so must be included in the library search path (LD_LIBRARY_PATH on the Solaris Operating Environment and Linux). The sge-root/lib/arch directory is not included automatically when you set your environment using the settings.sh or settings.csh files. To set up the path for the shared library, perform one of the following:

    • Set up your environment in the shell before launching NetBeans.

    • Add lines to the netbeans-root/etc/netbeans.conf file that set up the environment, for example:

      # Setup environment for SGE
      . <sge-root>/<sge_cell>/common/settings.sh
      ARCH=`$SGE_ROOT/util/arch`
      LD_LIBRARY_PATH=$SGE_ROOT/lib/$ARCH; export LD_LIBRARY_PATH

Running Your Java Application

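To run your compiled DRMAA application, verify the following:

  • The sge-root/lib/drmaa.jar file must be included in your CLASSPATH. The drmaa.jar file is not included automatically when you set your environment using the settings.sh or settings.csh files.

  • The sge-root/lib/arch directory must be included in the library search path (LD_LIBRARY_PATH on the Solaris Operating Environment and Linux), because the DRMAA shared library provides the required native routines.

  • You must be logged into a machine that is an N1 Grid Engine submit host.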

Using the DRMAA 0.5 Java Language Binding

The DRMAA shared library, which is used by default, supports version 1.0 of the DRMAA Java Language Binding Specification. For reasons of backward compatibility, however, N1 Grid Engine also includes an implementation of the 0.5 version of the DRMAA Java Language Binding Specification. You should develop all new applications with the 1.0 shared library, but you might occasionally discover an application that requires the 0.5 implementation.

To use the 0.5 version of the drmaa.jar file, you should include the sge-root/lib/drmaa-0.5.jar file in your CLASSPATH either before or instead of the usual sge-root/lib/drmaa.jar file. In addition, the use of the 0.5 Java language binding requires enabling the 0.95 C language binding. See How to Use the DRMAA 0.95 C Language Binding.
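The ordering requirement matters because the class loader uses the first matching class it finds on the CLASSPATH. A minimal Bourne shell sketch of that ordering follows. The helper name drmaa05_classpath is hypothetical, and /sge again stands in for sge-root.

```sh
# Hypothetical helper: print a CLASSPATH value in which drmaa-0.5.jar comes
# before drmaa.jar, so that the 0.5 classes are found first. $1 is sge-root;
# $2 is the current CLASSPATH and may be empty.
drmaa05_classpath() {
  printf '%s\n' "$1/lib/drmaa-0.5.jar:$1/lib/drmaa.jar${2:+:$2}"
}

# Typical use, with /sge standing in for sge-root:
#   CLASSPATH=`drmaa05_classpath /sge "$CLASSPATH"`; export CLASSPATH
```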

Java Application Examples

The following examples illustrate some application interactions that use the Java language bindings. You can find additional examples in the “How To” section of the Grid Engine Community Site.


Example 6–4 Starting and Stopping a Session

The following code segment shows the most basic DRMAA Java language binding program.

Everything that you as a programmer do with DRMAA, you do through a Session object. You get the Session object from a SessionFactory. You get the SessionFactory from the static SessionFactory.getFactory() method. The reason for this chain is that the org.ggf.drmaa.* classes should be considered an immutable package to be used by every DRMAA Java language binding implementation. Because the package is immutable, to load a specific implementation, the SessionFactory uses a system property to find the implementation's session factory, which it then loads. That session factory is then responsible for creating the session in whatever way it sees fit. Note that even though there is a session factory, only one session may exist at a time.

On line 9, SessionFactory.getFactory() gets a session factory instance. On line 10, SessionFactory.getSession() gets the session instance. On line 13, Session.init() initializes the session. An empty string ("") is passed in as the contact string to create a new session because no initialization arguments are needed.

Session.init() creates a session and starts an event client listener thread. The session is used for organizing jobs submitted through DRMAA, and the thread is used to receive updates from the queue master about the state of jobs and the system in general. Once Session.init() has been called successfully, the calling application must also call Session.exit() before terminating. If an application does not call Session.exit() before terminating, the queue master might be left with a dead event client handle, which can decrease queue master performance. Use the Runtime.addShutdownHook() method to make sure Session.exit() gets called.

At the end of the program, on line 14, Session.exit() cleans up the session and stops the event client listener thread. Most other DRMAA methods must be called before Session.exit(). Some methods, like Session.getContact(), can be called after Session.exit(), but these methods only provide general information. Any method that performs an action, such as Session.runJob() or Session.wait(), must be called before Session.exit() is called. If such a method is called after Session.exit() is called, it will throw a NoActiveSessionException.

01: package com.sun.grid.drmaa.howto;
02:
03: import org.ggf.drmaa.DrmaaException;
04: import org.ggf.drmaa.Session;
05: import org.ggf.drmaa.SessionFactory;
06:
07: public class Howto1 {
08:    public static void main(String[] args) {
09:       SessionFactory factory = SessionFactory.getFactory();
10:       Session session = factory.getSession();
11:
12:       try {
13:          session.init("");
14:          session.exit();
15:       } catch (DrmaaException e) {
16:          System.out.println("Error: " + e.getMessage());
17:       }
18:    }
19: }


Example 6–5 Running a Job

The following code segment shows how to use the DRMAA Java language binding to submit a job to N1 Grid Engine. The beginning and end of this program are the same as Example 6–4. The differences are on lines 16-24.

On line 16, DRMAA allocates a JobTemplate. A JobTemplate is an object that is used to store information about a job to be submitted. The same template can be reused for multiple calls to Session.runJob() or Session.runBulkJobs().

On line 17, the remoteCommand attribute is set. This attribute tells DRMAA where to find the program to run. Its value is the path to the executable. The path can be relative or absolute. If relative, the path is relative to the workingDirectory attribute, which defaults to the user's home directory. For more information on DRMAA attributes, see the DRMAA Javadoc or the drmaa_attributes man page. For this program to work, the script sleeper.sh must be in your default path.

On line 18, the args attribute is set. This attribute tells DRMAA what arguments to pass to the executable. For more information on DRMAA attributes, see the DRMAA Javadoc or the drmaa_attributes man page.

On line 20, Session.runJob() submits the job. This method returns the ID assigned to the job by the queue master. The job is now running as though submitted by qsub. At this point, calling Session.exit() or terminating the program will have no effect on the job.

To clean things up, the job template is deleted on line 24. This action frees the memory DRMAA set aside for the job template, but has no effect on submitted jobs.

01: package com.sun.grid.drmaa.howto;
02:
03: import java.util.Collections;
04: import org.ggf.drmaa.DrmaaException;
05: import org.ggf.drmaa.JobTemplate;
06: import org.ggf.drmaa.Session;
07: import org.ggf.drmaa.SessionFactory;
08:
09: public class Howto2 {
10:    public static void main(String[] args) {
11:       SessionFactory factory = SessionFactory.getFactory();
12:       Session session = factory.getSession();
13:
14:       try {
15:          session.init("");
16:          JobTemplate jt = session.createJobTemplate();
17:          jt.setRemoteCommand("sleeper.sh");
18:          jt.setArgs(Collections.singletonList("5"));
19:
20:          String id = session.runJob(jt);
21:
22:          System.out.println("Your job has been submitted with id " + id);
23:
24:          session.deleteJobTemplate(jt);
25:          session.exit();
26:       } catch (DrmaaException e) {
27:          System.out.println("Error: " + e.getMessage());
28:       }
29:    }
30: }