Sun Cluster Data Services Developer's Guide for Solaris OS

Chapter 8 Sample DSDL Resource Type Implementation

This chapter describes a sample resource type, SUNW.xfnts, which is implemented with the Data Service Development Library (DSDL). This data service is written in C. The underlying application is the X Font Server, a TCP/IP-based service. Appendix C, DSDL Sample Resource Type Code Listings, contains the complete code for each method in the SUNW.xfnts resource type.

X Font Server

The X Font Server is a TCP/IP-based service that serves font files to its clients. Clients connect to the server to request a font set, and the server reads the font files off the disk and serves them to the clients. The X Font Server daemon consists of a server binary at /usr/openwin/bin/xfs. The daemon is normally started from inetd. However, for the current sample, assume that the correct entry in the /etc/inetd.conf file has been disabled (for example, by using the fsadmin -d command) so that the daemon is under sole control of the Sun Cluster software.

X Font Server Configuration File

By default, the X Font Server reads its configuration information from the file /usr/openwin/lib/X11/fontserver.cfg. The catalog entry in this file contains a list of font directories that are available to the daemon for serving. The cluster administrator can locate the font directories in the cluster file system. This location optimizes the use of the X Font Server on Sun Cluster by maintaining a single copy of the font database on the system. If the cluster administrator wants to change the location, the cluster administrator must edit fontserver.cfg to reflect the new paths for the font directories.

For ease of configuration, the cluster administrator can also place the configuration file itself in the cluster file system. The xfs daemon provides command-line arguments that override the default, built-in location of this file. The SUNW.xfnts resource type uses the following command to start the daemon under the control of the Sun Cluster software.

/usr/openwin/bin/xfs -config location-of-configuration-file/fontserver.cfg \
-port port-number

In the SUNW.xfnts resource type implementation, you can use the Confdir_list property to manage the location of the fontserver.cfg configuration file.

TCP Port Number

The TCP port number on which the xfs server daemon listens is normally the “fs” port, typically defined as 7100 in the /etc/services file. However, the -port option that the cluster administrator includes with the xfs command enables the cluster administrator to override the default setting.
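
As a rough illustration of where that default comes from, the following C sketch looks up a service name with the standard getservbyname() call and falls back to 7100 when the services database has no matching entry. The helper name lookup_fs_port() and the XFS_FALLBACK_PORT constant are assumptions made for this example. They are not part of the SUNW.xfnts code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <netdb.h>
#include <arpa/inet.h>

/* Hypothetical fallback, taken from the typical /etc/services value. */
#define	XFS_FALLBACK_PORT	7100

/*
 * Hypothetical helper: return the TCP port registered for the named
 * service, or the fallback if the services database has no entry.
 */
int
lookup_fs_port(const char *service)
{
	struct servent *sp;

	assert(service != NULL);
	sp = getservbyname(service, "tcp");
	if (sp == NULL)
		return (XFS_FALLBACK_PORT);
	/* s_port is stored in network byte order. */
	return ((int)ntohs((uint16_t)sp->s_port));
}
```

A caller would pass "fs" as the service name. The sample resource type itself does not perform this lookup, because the port comes from the Port_list property.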

You can use the Port_list property in the SUNW.xfnts resource type to set the default value and to enable the cluster administrator to use the -port option with the xfs command. You define the default value of this property as 7100/tcp in the RTR file. In the SUNW.xfnts Start method, you pass Port_list to the -port option on the xfs command line. Consequently, a user of this resource type is not required to specify a port number (the port defaults to 7100/tcp). The cluster administrator can specify a different value for the Port_list property when the cluster administrator configures the resource type.
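
The 7100/tcp notation pairs a port number with a protocol name. The following sketch shows one way to split such an entry in plain C. The parse_port_entry() helper is hypothetical; the DSDL already returns Port_list in parsed form, so the sample code does not need it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: split one Port_list entry such as "7100/tcp"
 * into a port number and a protocol name.
 * Returns 0 on success, -1 if the entry is malformed.
 */
int
parse_port_entry(const char *entry, int *port, char *proto, size_t proto_len)
{
	char buf[16];
	int value;

	assert(entry != NULL && port != NULL && proto != NULL);
	/* Expect digits, a slash, then a lowercase protocol name. */
	if (sscanf(entry, "%d/%15[a-z]", &value, buf) != 2)
		return (-1);
	if (value < 1 || value > 65535 || strlen(buf) >= proto_len)
		return (-1);
	*port = value;
	(void) strcpy(proto, buf);
	return (0);
}
```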

`SUNW.xfnts` RTR File

This section describes several key properties in the SUNW.xfnts RTR file. It does not describe the purpose of each property in the file. For such a description, see Setting Resource and Resource Type Properties.

The Confdir_list extension property identifies the configuration directory (or a list of directories), as follows:

{
        PROPERTY = Confdir_list;
        EXTENSION;
        STRINGARRAY;
        TUNABLE = AT_CREATION;
        DESCRIPTION = "The Configuration Directory Path(s)";
}

The Confdir_list property does not specify a default value. The cluster administrator must specify a directory name when the resource is created. This value cannot be changed later because tunability is limited to AT_CREATION.

The Port_list property identifies the port on which the server daemon listens, as follows:

{
        PROPERTY = Port_list;
        DEFAULT = 7100/tcp;
        TUNABLE = ANYTIME;
}

Because the property declares a default value, the cluster administrator can specify a new value or accept the default value when the resource is created. Because the declaration shown here sets TUNABLE to ANYTIME, the value can also be changed after the resource is created.

Naming Conventions for Functions and Callback Methods

You can identify the various pieces of the sample code by knowing these conventions:

RMAPI functions begin with scha_.
DSDL functions begin with scds_.
Callback methods begin with xfnts_.
User-written functions begin with svc_.

`scds_initialize()` Function

The DSDL requires that each callback method call the scds_initialize() function at the beginning of the method.

This function performs the following operations:

Checks and processes the command-line arguments (argc and argv) that the framework passes to the data service method. The method does not have to process any additional command-line arguments.
Sets up internal data structures for use by the other functions in the DSDL.
Initializes the logging environment.
Validates fault monitor probe settings.

Use the scds_close() function to reclaim the resources that are allocated by scds_initialize().

`xfnts_start` Method

The RGM runs the Start method on a cluster node or zone when the resource group that contains the data service resource is brought online on that node or zone or when the resource is enabled. In the SUNW.xfnts sample resource type, the xfnts_start method activates the xfs daemon on that node or zone.

The xfnts_start method calls scds_pmf_start() to start the daemon under the PMF. The PMF provides automatic failure notification and restart features, as well as integration with the fault monitor.

Note –

The first call in xfnts_start is to scds_initialize(), which performs some necessary housekeeping functions. scds_initialize() Function and the scds_initialize(3HA) man page contain more information.

Validating the Service Before Starting the X Font Server

Before the xfnts_start method attempts to start the X Font Server, it calls svc_validate() to verify that a correct configuration is in place to support the xfs daemon.

rc = svc_validate(scds_handle);
   if (rc != 0) {
      scds_syslog(LOG_ERR,
          "Failed to validate configuration.");
      return (rc);
   }

See xfnts_validate Method for details.

Starting the Service With `svc_start()`

The xfnts_start method calls the svc_start() method, which is defined in the xfnts.c file, to start the xfs daemon. This section describes svc_start().

The command to start the xfs daemon is as follows:

# xfs -config config-directory/fontserver.cfg -port port-number

The Confdir_list extension property identifies the config-directory while the Port_list system property identifies the port-number. The cluster administrator provides specific values for these properties when he or she configures the data service.

The xfnts_start method declares variables to hold these properties: a string array for Confdir_list and a port list structure for Port_list. The xfnts_start method obtains the values that the cluster administrator sets by using the scds_get_ext_confdir_list() and scds_get_port_list() functions. These functions are described in the scds_property_functions(3HA) man page.

scha_str_array_t *confdirs;
scds_port_list_t    *portlist;
scha_err_t   err;

   /* get the configuration directory from the confdir_list property */
   confdirs = scds_get_ext_confdir_list(scds_handle);

   (void) sprintf(xfnts_conf, "%s/fontserver.cfg", confdirs->str_array[0]);

   /* obtain the port to be used by XFS from the Port_list property */
   err = scds_get_port_list(scds_handle, &portlist);
   if (err != SCHA_ERR_NOERR) {
      scds_syslog(LOG_ERR,
          "Could not access property Port_list.");
      return (1);
   }

Note that the code uses only the first element (index 0) of the confdirs string array.

The xfnts_start method uses sprintf() to form the command line for xfs.

/* Construct the command to start the xfs daemon. */
   (void) sprintf(cmd,
       "/usr/openwin/bin/xfs -config %s -port %d 2>/dev/null",
       xfnts_conf, portlist->ports[0].port);

Note that standard error is redirected to /dev/null to suppress messages that are generated by the daemon.
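
The sprintf() calls shown above trust that the destination buffers are large enough. As a defensive variation, the following sketch builds the same command line with snprintf() and reports truncation. The build_xfs_cmd() helper and its return convention are assumptions for illustration and do not appear in the sample code.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the xfs command line into buf.
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
int
build_xfs_cmd(char *buf, size_t len, const char *conf_file, int port)
{
	int n;

	assert(buf != NULL && conf_file != NULL);
	n = snprintf(buf, len,
	    "/usr/openwin/bin/xfs -config %s -port %d 2>/dev/null",
	    conf_file, port);
	/* snprintf() reports the length it needed; compare it to len. */
	if (n < 0 || (size_t)n >= len)
		return (-1);
	return (0);
}
```

A method that used this helper could log an error and return failure instead of passing a truncated command to the PMF.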

The xfnts_start method passes the xfs command line to scds_pmf_start() to start the data service under the control of the PMF.

scds_syslog(LOG_INFO, "Issuing a start request.");
   err = scds_pmf_start(scds_handle, SCDS_PMF_TYPE_SVC,
      SCDS_PMF_SINGLE_INSTANCE, cmd, -1);

   if (err == SCHA_ERR_NOERR) {
      scds_syslog(LOG_INFO,
          "Start command completed successfully.");
   } else {
      scds_syslog(LOG_ERR,
          "Failed to start HA-XFS ");
   }

Note the following points about the call to scds_pmf_start():

The SCDS_PMF_TYPE_SVC argument identifies the program to start as a data service application. This method can also start a fault monitor or some other type of application.
The SCDS_PMF_SINGLE_INSTANCE argument identifies this as a single-instance resource.
The cmd argument is the command line that was generated previously.
The final argument, -1, specifies the child monitoring level. The -1 value specifies that the PMF monitor all children as well as the original process.
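
The child monitoring level can be pictured as a depth limit on the process tree. The following sketch models that idea. It is not DSDL code, and the treatment of positive levels is an assumption that is based on the -C option of pmfadm(1M), where the process that the PMF starts is at level 0 and its children are at level 1.

```c
#include <assert.h>

/*
 * Illustrative model of the child monitoring level (an assumption,
 * not DSDL code). depth 0 is the process that the PMF started,
 * depth 1 is its children, and so on.
 * Returns 1 if a process at that depth would be monitored, else 0.
 */
int
pmf_monitors_depth(int child_mon_level, int depth)
{
	assert(depth >= 0);
	if (child_mon_level < 0)
		return (1);	/* -1: monitor every descendant */
	return (depth <= child_mon_level);
}
```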

Before returning, svc_start() frees the memory that is allocated for the portlist structure.

scds_free_port_list(portlist);
return (err);

Returning From `svc_start()`

Even when svc_start() returns successfully, the underlying application might have failed to start. Therefore, svc_start() must probe the application to verify that it is running before returning a success message. The probe must also take into account that the application might not be immediately available because it takes some time to start. The svc_start() method calls svc_wait(), which is defined in the xfnts.c file, to verify that the application is running.

/* Wait for the service to start up fully */
   scds_syslog_debug(DBG_LEVEL_HIGH,
       "Calling svc_wait to verify that service has started.");

   rc = svc_wait(scds_handle);

   scds_syslog_debug(DBG_LEVEL_HIGH,
       "Returned from svc_wait");

   if (rc == 0) {
      scds_syslog(LOG_INFO, "Successfully started the service.");
   } else {
      scds_syslog(LOG_ERR, "Failed to start the service.");
   }

The svc_wait() function calls scds_get_netaddr_list() to obtain the network address resources that are needed to probe the application.

/* obtain the network resource to use for probing */
   if (scds_get_netaddr_list(scds_handle, &netaddr)) {
      scds_syslog(LOG_ERR,
          "No network address resources found in resource group.");
      return (1);
   }

   /* Return an error if there are no network resources */
   if (netaddr == NULL || netaddr->num_netaddrs == 0) {
      scds_syslog(LOG_ERR,
          "No network address resource in resource group.");
      return (1);
   }

The svc_wait() function obtains the Start_timeout and Probe_timeout values.

svc_start_timeout = scds_get_rs_start_timeout(scds_handle);
   probe_timeout = scds_get_ext_probe_timeout(scds_handle);

To account for the time the server might take to start, svc_wait() calls scds_svc_wait() and passes a timeout value equivalent to three percent of the Start_timeout value. The svc_wait() function calls the svc_probe() function to verify that the application has started. The svc_probe() method makes a simple socket connection to the server on the specified port. If it fails to connect to the port, svc_probe() returns a value of 100, which indicates a total failure. If the connect goes through but the disconnect to the port fails, svc_probe() returns a value of 50.

On failure or partial failure of svc_probe(), svc_wait() calls scds_svc_wait() with a timeout value of 5. The scds_svc_wait() method limits the frequency of the probes to every five seconds. This method also counts the number of attempts to start the service. If the number of attempts exceeds the value of the Retry_count property of the resource within the period that is specified by the Retry_interval property of the resource, the scds_svc_wait() function returns failure. In this case, the svc_start() function also returns failure.

#define    SVC_CONNECT_TIMEOUT_PCT    95
#define    SVC_WAIT_PCT       3
#define    SVC_WAIT_TIME      5
   if (scds_svc_wait(scds_handle, (svc_start_timeout * SVC_WAIT_PCT)/100)
      != SCHA_ERR_NOERR) {

      scds_syslog(LOG_ERR, "Service failed to start.");
      return (1);
   }

   do {
      /*
       * probe the data service on the IP address of the
       * network resource and the portname
       */
      rc = svc_probe(scds_handle,
          netaddr->netaddrs[0].hostname,
          netaddr->netaddrs[0].port_proto.port, probe_timeout);
      if (rc == SCHA_ERR_NOERR) {
         /* Success. Free up resources and return */
         scds_free_netaddr_list(netaddr);
         return (0);
      }

      /*
       * Call scds_svc_wait() so that if the service fails too
       * many times, we give up and return early.
       */
      if (scds_svc_wait(scds_handle, SVC_WAIT_TIME)
         != SCHA_ERR_NOERR) {
         scds_syslog(LOG_ERR, "Service failed to start.");
         return (1);
      }

   /* Rely on RGM to timeout and terminate the program */
   } while (1);
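
The shape of this probe-and-wait loop can be tried outside the cluster with a small stand-alone model. In the following sketch, the probe is a callback and a fixed attempt limit stands in for the Retry_count and Retry_interval accounting that scds_svc_wait() performs. The names wait_for_service(), probe_fn_t, and countdown_probe() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical probe callback: returns 0 when the service answers. */
typedef int (*probe_fn_t)(void *arg);

/*
 * Simplified model of the svc_wait() loop. Instead of sleeping and
 * consulting Retry_count, it gives up after max_attempts probes.
 * Returns 0 once a probe succeeds, 1 if every attempt fails.
 */
int
wait_for_service(probe_fn_t probe, void *arg, int max_attempts)
{
	int attempt;

	assert(probe != NULL);
	for (attempt = 0; attempt < max_attempts; attempt++) {
		if (probe(arg) == 0)
			return (0);
		/* The sample code calls scds_svc_wait() at this point. */
	}
	return (1);
}

/* Example probe: fails until the counter it is given reaches zero. */
int
countdown_probe(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (100);
	}
	return (0);
}
```

Unlike this model, the sample loop has no attempt limit of its own. It relies on scds_svc_wait() to report repeated failures and on the RGM to enforce Start_timeout.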

Note –

Before it exits, the xfnts_start method calls scds_close() to reclaim resources that are allocated by scds_initialize(). scds_initialize() Function and the scds_close(3HA) man page contain more information.

`xfnts_stop` Method

Because the xfnts_start method uses scds_pmf_start() to start the service under the PMF, xfnts_stop uses scds_pmf_stop() to stop the service.

Note –

The first call in xfnts_stop is to scds_initialize(), which performs some necessary housekeeping functions. scds_initialize() Function and the scds_initialize(3HA) man page contain more information.

The xfnts_stop method calls the svc_stop() method, which is defined in the xfnts.c file, as follows:

scds_syslog(LOG_INFO, "Issuing a stop request.");
   err = scds_pmf_stop(scds_handle,
       SCDS_PMF_TYPE_SVC, SCDS_PMF_SINGLE_INSTANCE, SIGTERM,
       scds_get_rs_stop_timeout(scds_handle));

   if (err != SCHA_ERR_NOERR) {
      scds_syslog(LOG_ERR,
          "Failed to stop HA-XFS.");
      return (1);
   }

   scds_syslog(LOG_INFO,
       "Successfully stopped HA-XFS.");
   return (SCHA_ERR_NOERR); /* Successfully stopped */

Note the following points about the call in svc_stop() to the scds_pmf_stop() function:

The SCDS_PMF_TYPE_SVC argument identifies the program to stop as a data service application. This method can also stop a fault monitor or some other type of application.
The SCDS_PMF_SINGLE_INSTANCE argument identifies this as a single-instance resource.
The SIGTERM argument identifies the signal to use to stop the resource instance. If this signal fails to stop the instance, scds_pmf_stop() sends SIGKILL to stop the instance, and if that fails, returns with a timeout error. See the scds_pmf_stop(3HA) man page for details.
The timeout value is that of the Stop_timeout property of the resource.
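
The escalation from SIGTERM to SIGKILL can be sketched with ordinary POSIX calls. The following example is not how scds_pmf_stop() is implemented; it only illustrates the pattern for a direct child process. The helper names and the polling interval are assumptions, and how the library divides Stop_timeout between the two signals is not modeled.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Poll until the child exits or tries runs out. Returns 1 if reaped. */
static int
wait_for_exit(pid_t pid, int tries)
{
	struct timespec pause_time = { 0, 10 * 1000 * 1000 };	/* 10 ms */

	while (tries-- > 0) {
		if (waitpid(pid, NULL, WNOHANG) == pid)
			return (1);
		(void) nanosleep(&pause_time, NULL);
	}
	return (0);
}

/*
 * Hypothetical helper: stop a child process with first_sig, then
 * escalate to SIGKILL if it is still running after the grace period.
 * Returns 0 if the child exited, 1 on timeout.
 */
int
stop_with_escalation(pid_t pid, int first_sig, int tries_per_signal)
{
	assert(pid > 0);
	(void) kill(pid, first_sig);
	if (wait_for_exit(pid, tries_per_signal))
		return (0);
	(void) kill(pid, SIGKILL);
	return (wait_for_exit(pid, tries_per_signal) ? 0 : 1);
}
```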

Note –

Before it exits, the xfnts_stop method calls scds_close() to reclaim resources that are allocated by scds_initialize(). scds_initialize() Function and the scds_close(3HA) man page contain more information.

`xfnts_monitor_start` Method

The RGM calls the Monitor_start method on a node or zone to start the fault monitor after a resource is started on the node or zone. The xfnts_monitor_start method uses scds_pmf_start() to start the monitor daemon under the PMF.

Note –

The first call in xfnts_monitor_start is to scds_initialize(), which performs some necessary housekeeping functions. scds_initialize() Function and the scds_initialize(3HA) man page contain more information.

The xfnts_monitor_start method calls the mon_start method, which is defined in the xfnts.c file, as follows:

scds_syslog_debug(DBG_LEVEL_HIGH,
      "Calling Monitor_start method for resource <%s>.",
      scds_get_resource_name(scds_handle));

    /* Call scds_pmf_start and pass the name of the probe. */
   err = scds_pmf_start(scds_handle, SCDS_PMF_TYPE_MON,
       SCDS_PMF_SINGLE_INSTANCE, "xfnts_probe", 0);

   if (err != SCHA_ERR_NOERR) {
      scds_syslog(LOG_ERR,
          "Failed to start fault monitor.");
      return (1);
   }

   scds_syslog(LOG_INFO,
       "Started the fault monitor.");

   return (SCHA_ERR_NOERR); /* Successfully started Monitor */
}

Note the following points about the call in mon_start() to the scds_pmf_start() function:

The SCDS_PMF_TYPE_MON argument identifies the program to start as a fault monitor. This method can also start a data service or some other type of application.
The SCDS_PMF_SINGLE_INSTANCE argument identifies this as a single-instance resource.
The xfnts_probe argument identifies the monitor daemon to start. The monitor daemon is assumed to be located in the same directory as the other callback programs.
The final argument, 0, specifies the child monitoring level. In this case, this value specifies that the PMF monitor the monitor daemon only.

Note –

Before it exits, the xfnts_monitor_start method calls scds_close() to reclaim resources that were allocated by scds_initialize(). scds_initialize() Function and the scds_close(3HA) man page contain more information.

`xfnts_monitor_stop` Method

Because the xfnts_monitor_start method uses scds_pmf_start() to start the monitor daemon under the PMF, xfnts_monitor_stop uses scds_pmf_stop() to stop the monitor daemon.

Note –

The first call in xfnts_monitor_stop is to scds_initialize(), which performs some necessary housekeeping functions. scds_initialize() Function and the scds_initialize(3HA) man page contain more information.

The xfnts_monitor_stop() method calls the mon_stop method, which is defined in the xfnts.c file, as follows:

scds_syslog_debug(DBG_LEVEL_HIGH,
      "Calling scds_pmf_stop method");

   err = scds_pmf_stop(scds_handle, SCDS_PMF_TYPE_MON,
       SCDS_PMF_SINGLE_INSTANCE, SIGKILL,
       scds_get_rs_monitor_stop_timeout(scds_handle));

   if (err != SCHA_ERR_NOERR) {
      scds_syslog(LOG_ERR,
          "Failed to stop fault monitor.");
      return (1);
   }

   scds_syslog(LOG_INFO,
       "Stopped the fault monitor.");

   return (SCHA_ERR_NOERR); /* Successfully stopped monitor */
}

Note the following points about the call in mon_stop() to the scds_pmf_stop() function:

The SCDS_PMF_TYPE_MON argument identifies the program to stop as a fault monitor. This method can also stop a data service or some other type of application.
The SCDS_PMF_SINGLE_INSTANCE argument identifies this as a single-instance resource.
The SIGKILL argument identifies the signal to use to stop the resource instance. If this signal fails to stop the instance, scds_pmf_stop() returns with a timeout error. See the scds_pmf_stop(3HA) man page for details.
The timeout value is that of the Monitor_stop_timeout property of the resource.

Note –

Before it exits, the xfnts_monitor_stop method calls scds_close() to reclaim resources that were allocated by scds_initialize(). scds_initialize() Function and the scds_close(3HA) man page contain more information.

`xfnts_monitor_check` Method

The RGM calls the Monitor_check method whenever the fault monitor attempts to fail over the resource group that contains the resource to another node or zone. The xfnts_monitor_check method calls the svc_validate() method to verify that a correct configuration is in place to support the xfs daemon. See xfnts_validate Method for details. The code for xfnts_monitor_check is as follows:

/* Process the arguments passed by RGM and initialize syslog */
   if (scds_initialize(&scds_handle, argc, argv) != SCHA_ERR_NOERR) {
      scds_syslog(LOG_ERR, "Failed to initialize the handle.");
      return (1);
   }

   rc =  svc_validate(scds_handle);
   scds_syslog_debug(DBG_LEVEL_HIGH,
       "monitor_check method "
       "was called and returned <%d>.", rc);

   /* Free up all the memory allocated by scds_initialize */
   scds_close(&scds_handle);

   /* Return the result of validate method run as part of monitor check */
   return (rc);
}

`SUNW.xfnts` Fault Monitor

The RGM does not directly call the PROBE method, but rather calls the Monitor_start method to start the monitor after a resource is started on a node or zone. The xfnts_monitor_start method starts the fault monitor under the control of the PMF. The xfnts_monitor_stop method stops the fault monitor.

The SUNW.xfnts fault monitor performs the following operations:

Periodically monitors the health of the xfs server daemon by using utilities that are specifically designed to check simple TCP-based services, such as xfs.
Tracks problems that the application encounters within a time window (using the Retry_count and Retry_interval properties) and decides whether to restart or fail over the data service if the application fails completely. The scds_fm_action() and scds_fm_sleep() functions provide built-in support for this tracking and decision mechanism.
Implements the failover or restart decision by using scds_fm_action().
Updates the resource state and makes the resource state available to administrative tools and GUIs.

`xfnts_probe` Main Loop

The xfnts_probe method implements a loop.

Before implementing the loop, xfnts_probe performs the following operations:

Retrieves the network address resources for the xfnts resource, as follows:

/* Get the ip addresses available for this resource */
   if (scds_get_netaddr_list(scds_handle, &netaddr)) {
      scds_syslog(LOG_ERR,
          "No network address resource in resource group.");
      scds_close(&scds_handle);
      return (1);
   }

   /* Return an error if there are no network resources */
   if (netaddr == NULL || netaddr->num_netaddrs == 0) {
      scds_syslog(LOG_ERR,
          "No network address resource in resource group.");
      return (1);
   }

Calls scds_fm_sleep() and passes the value of Thorough_probe_interval as the timeout value. The probe sleeps for the value of Thorough_probe_interval between probes, as follows:

timeout = scds_get_ext_probe_timeout(scds_handle);

   for (;;) {
      /*
       * sleep for a duration of thorough_probe_interval between
       *  successive probes.
       */
      (void) scds_fm_sleep(scds_handle,
          scds_get_rs_thorough_probe_interval(scds_handle));

The xfnts_probe method implements the following loop:

for (ip = 0; ip < netaddr->num_netaddrs; ip++) {
         /*
          * Grab the hostname and port on which the
          * health has to be monitored.
          */
         hostname = netaddr->netaddrs[ip].hostname;
         port = netaddr->netaddrs[ip].port_proto.port;
         /*
          * HA-XFS supports only one port and
          * hence obtain the port value from the
          * first entry in the array of ports.
          */
         ht1 = gethrtime(); /* Latch probe start time */
         scds_syslog(LOG_INFO, "Probing the service on port: %d.", port);

         probe_result =
         svc_probe(scds_handle, hostname, port, timeout);

         /*
          * Update service probe history,
          * take action if necessary.
          * Latch probe end time.
          */
         ht2 = gethrtime();

         /* Convert to milliseconds */
         dt = (ulong_t)((ht2 - ht1) / 1e6);

         /*
          * Compute failure history and take
          * action if needed
          */
         (void) scds_fm_action(scds_handle,
             probe_result, (long)dt);
      }   /* Each net resource */
   }    /* Keep probing forever */

The svc_probe() function implements the probe logic. The return value from svc_probe() is passed to scds_fm_action(), which determines whether to restart the application, fail over the resource group, or do nothing.
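
The loop measures each probe with gethrtime(), which returns nanoseconds, and divides by 1e6 to obtain milliseconds. On platforms without gethrtime(), the same arithmetic can be done on two clock_gettime(CLOCK_MONOTONIC) readings. The elapsed_ms() helper in the following sketch is an assumption for illustration and is not part of the sample code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/*
 * Portable stand-in for the gethrtime() arithmetic: elapsed
 * milliseconds between two CLOCK_MONOTONIC readings.
 */
long
elapsed_ms(const struct timespec *start, const struct timespec *end)
{
	long long ns;

	assert(start != NULL && end != NULL);
	ns = (long long)(end->tv_sec - start->tv_sec) * 1000000000LL +
	    (end->tv_nsec - start->tv_nsec);
	return ((long)(ns / 1000000LL));
}
```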

`svc_probe()` Function

The svc_probe() function makes a simple socket connection to the specified port by calling scds_fm_tcp_connect(). If the connect fails, svc_probe() returns a value of 100, which indicates a complete failure. If the connect succeeds, but the disconnect fails, svc_probe() returns a value of 50, which indicates a partial failure. If the connect and disconnect both succeed, svc_probe() runs the fsinfo command as a fuller health check. If that command fails, or if no time remains to run it, svc_probe() again returns 50. Otherwise, svc_probe() returns a value of 0, which indicates success.

The code for svc_probe() is as follows:

int svc_probe(scds_handle_t scds_handle,
char *hostname, int port, int timeout)
{
   int  rc;
   hrtime_t   t1, t2;
   int    sock;
   char   testcmd[2048];
   int    time_used, time_remaining;
   time_t      connect_timeout;


   /*
    * probe the data service by doing a socket connection to the port
    * specified in the port_list property to the host that is
    * serving the XFS data service. If the XFS service which is configured
    * to listen on the specified port, replies to the connection, then
    * the probe is successful. Else we will wait for a time period set
    * in probe_timeout property before concluding that the probe failed.
    */

   /*
    * Use the SVC_CONNECT_TIMEOUT_PCT percentage of timeout
    * to connect to the port
    */
   connect_timeout = (SVC_CONNECT_TIMEOUT_PCT * timeout)/100;
   t1 = (hrtime_t)(gethrtime()/1E9);

   /*
    * the probe makes a connection to the specified hostname and port.
    * The connection is timed for 95% of the actual probe_timeout.
    */
   rc = scds_fm_tcp_connect(scds_handle, &sock, hostname, port,
       connect_timeout);
   if (rc) {
      scds_syslog(LOG_ERR,
          "Failed to connect to port <%d> of resource <%s>.",
          port, scds_get_resource_name(scds_handle));
      /* this is a complete failure */
      return (SCDS_PROBE_COMPLETE_FAILURE);
   }

   t2 = (hrtime_t)(gethrtime()/1E9);

   /*
    * Compute the actual time it took to connect. This should be less than
    * or equal to connect_timeout, the time allocated to connect.
    * If the connect uses all the time that is allocated for it,
    * then the remaining value from the probe_timeout that is passed to
    * this function will be used as disconnect timeout. Otherwise,
    * the remaining time from the connect call will also be added to
    * the disconnect timeout.
    *
    */

   time_used = (int)(t2 - t1);

   /*
    * Use the remaining time(timeout - time_took_to_connect) to disconnect
    */

   time_remaining = timeout - (int)time_used;

   /*
    * If all the time is used up, use a small hardcoded timeout
    * to still try to disconnect. This will avoid the fd leak.
    */
   if (time_remaining <= 0) {
      scds_syslog_debug(DBG_LEVEL_LOW,
          "svc_probe used entire timeout of "
          "%d seconds during connect operation and exceeded the "
          "timeout by %d seconds. Attempting disconnect with timeout"
          " %d ",
          connect_timeout,
          abs(time_used),
          SVC_DISCONNECT_TIMEOUT_SECONDS);

      time_remaining = SVC_DISCONNECT_TIMEOUT_SECONDS;
   }

   /*
    * Return partial failure in case of disconnection failure.
    * Reason: The connect call is successful, which means
    * the application is alive. A disconnection failure
    * could happen due to a hung application or heavy load.
    * If it is the latter case, don't declare the application
    * as dead by returning complete failure. Instead, declare
    * it as partial failure. If this situation persists, the
    * disconnect call will fail again and the application will be
    * restarted.
    */
   rc = scds_fm_tcp_disconnect(scds_handle, sock, time_remaining);
   if (rc != SCHA_ERR_NOERR) {
      scds_syslog(LOG_ERR,
          "Failed to disconnect from port %d of resource %s.",
          port, scds_get_resource_name(scds_handle));
      /* this is a partial failure */
      return (SCDS_PROBE_COMPLETE_FAILURE/2);
   }

   t2 = (hrtime_t)(gethrtime()/1E9);
   time_used = (int)(t2 - t1);
   time_remaining = timeout - time_used;

   /*
    * If there is no time left, don't do the full test with
    * fsinfo. Return SCDS_PROBE_COMPLETE_FAILURE/2
    * instead. This will make sure that if this timeout
    * persists, server will be restarted.
    */
   if (time_remaining <= 0) {
      scds_syslog(LOG_ERR, "Probe timed out.");
      return (SCDS_PROBE_COMPLETE_FAILURE/2);
   }

   /*
    * The connection and disconnection to port is successful,
    * Run the fsinfo command to perform a full check of
    * server health.
    * Redirect stdout, otherwise the output from fsinfo
    * ends up on the console.
    */
   (void) sprintf(testcmd,
       "/usr/openwin/bin/fsinfo -server %s:%d > /dev/null",
       hostname, port);
   scds_syslog_debug(DBG_LEVEL_HIGH,
       "Checking the server status with %s.", testcmd);
   if (scds_timerun(scds_handle, testcmd, time_remaining,
      SIGKILL, &rc) != SCHA_ERR_NOERR || rc != 0) {

      scds_syslog(LOG_ERR,
         "Failed to check server status with command <%s>",
         testcmd);
      return (SCDS_PROBE_COMPLETE_FAILURE/2);
   }
   return (0);
}

When finished, svc_probe() returns a value that indicates success (0), partial failure (50), or complete failure (100). The xfnts_probe method passes this value to scds_fm_action().
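
The way svc_probe() divides the probe timeout between the connect and disconnect steps is plain integer arithmetic, which the following sketch isolates. The function names are hypothetical, and the fallback is passed as a parameter because the value of SVC_DISCONNECT_TIMEOUT_SECONDS is not shown in this chapter.

```c
#include <assert.h>

#define	CONNECT_TIMEOUT_PCT	95	/* mirrors SVC_CONNECT_TIMEOUT_PCT */

/* Share of the probe timeout that is granted to the connect step. */
int
connect_budget(int probe_timeout)
{
	assert(probe_timeout >= 0);
	return ((CONNECT_TIMEOUT_PCT * probe_timeout) / 100);
}

/*
 * Time left for the disconnect step. If the connect used the whole
 * timeout, fall back to a small fixed value (hypothetical parameter
 * standing in for SVC_DISCONNECT_TIMEOUT_SECONDS).
 */
int
disconnect_budget(int probe_timeout, int time_used, int fallback)
{
	int remaining = probe_timeout - time_used;

	return (remaining > 0 ? remaining : fallback);
}
```

With a probe timeout of 30 seconds, for example, the connect step is allowed 28 seconds because the integer division truncates 28.5.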

Determining the Fault Monitor Action

The xfnts_probe method calls scds_fm_action() to determine the action to take.

The logic in scds_fm_action() is as follows:

Maintain a cumulative failure history within the value of the Retry_interval property.
If the cumulative failure reaches 100 (complete failure), restart the data service. If Retry_interval is exceeded, reset the history.
If the number of restarts exceeds the value of the Retry_count property, within the time specified by Retry_interval, fail over the data service.

For example, suppose the probe makes a connection to the xfs server, but fails to disconnect. This indicates that the server is running, but could be hung or just under a temporary load. The failure to disconnect sends a partial (50) failure to scds_fm_action(). This value is below the threshold for restarting the data service, but the value is maintained in the failure history.

If during the next probe the server again fails to disconnect, a value of 50 is added to the failure history maintained by scds_fm_action(). The cumulative failure value is now 100, so scds_fm_action() restarts the data service.
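
The following sketch models the three rules and the example above as a small state machine. It is a simplified illustration and not the DSDL implementation. The fm_history structure, the fm_record() function, and the choice to clear the failure sum after each restart are assumptions.

```c
#include <assert.h>
#include <stddef.h>

enum fm_action { FM_NONE, FM_RESTART, FM_FAILOVER };

/* Hypothetical bookkeeping; the real history is internal to the DSDL. */
struct fm_history {
	long window_start;	/* start of the current window, in seconds */
	int failure_sum;	/* cumulative probe failure value */
	int restarts;		/* restarts issued in the window */
};

/*
 * Simplified model of the rules listed above.
 * now and retry_interval are in seconds.
 */
enum fm_action
fm_record(struct fm_history *h, long now, int probe_result,
    int retry_count, long retry_interval)
{
	assert(h != NULL);
	/* A window older than Retry_interval is discarded. */
	if (now - h->window_start > retry_interval) {
		h->window_start = now;
		h->failure_sum = 0;
		h->restarts = 0;
	}
	/* Accumulate the failure value reported by the probe. */
	h->failure_sum += probe_result;
	if (h->failure_sum < 100)
		return (FM_NONE);
	/* Cumulative failure reached 100: count a restart. */
	h->failure_sum = 0;
	h->restarts++;
	/* Too many restarts within the window: fail over instead. */
	if (h->restarts > retry_count)
		return (FM_FAILOVER);
	return (FM_RESTART);
}
```

In this model, two consecutive partial failures of 50 produce FM_RESTART on the second call, which matches the example above.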

`xfnts_validate` Method

The RGM calls the Validate method when a resource is created and when a cluster administrator updates the properties of the resource or its containing group. The RGM calls the Validate method before the creation or update is applied. A failure exit code from the method on any node or zone causes the creation or update to be canceled.

The RGM calls Validate only when a cluster administrator changes resource or resource group properties. The RGM does not call Validate when the RGM itself sets properties or when a monitor sets the Status and Status_msg resource properties.

Note –

The Monitor_check method also explicitly calls the Validate method whenever the PROBE method attempts to fail over the data service to a new node or zone.

The RGM calls Validate with additional arguments to those that are passed to other methods, including the properties and values that are being updated. The call to scds_initialize() at the beginning of xfnts_validate parses all the arguments that the RGM passes to xfnts_validate and stores the information in the scds_handle argument. The subroutines that xfnts_validate calls make use of this information.

The xfnts_validate method calls svc_validate(), which verifies the following conditions:

The Confdir_list property has been set for the resource and defines a single directory.

scha_str_array_t *confdirs;
   confdirs = scds_get_ext_confdir_list(scds_handle);

/* Return error if there is no confdir_list extension property */
   if (confdirs == NULL || confdirs->array_cnt != 1) {
      scds_syslog(LOG_ERR,
          "Property Confdir_list is not set properly.");
      return (1); /* Validation failure */
   }

The directory that is specified by Confdir_list contains the fontserver.cfg file.

(void) sprintf(xfnts_conf, "%s/fontserver.cfg", confdirs->str_array[0]);

   if (stat(xfnts_conf, &statbuf) != 0) {
      /*
       * suppress lint error because errno.h prototype
       * is missing void arg
       */
      scds_syslog(LOG_ERR,
          "Failed to access file <%s> : <%s>",
          xfnts_conf, strerror(errno));   /*lint !e746 */
      return (1);
   }

The server daemon binary is accessible on the cluster node or zone.

if (stat("/usr/openwin/bin/xfs", &statbuf) != 0) {
      scds_syslog(LOG_ERR,
          "Cannot access XFS binary : <%s> ", strerror(errno));
      return (1);
   }

The Port_list property specifies a single port.

scds_port_list_t   *portlist;
   err = scds_get_port_list(scds_handle, &portlist);
   if (err != SCHA_ERR_NOERR) {
      scds_syslog(LOG_ERR,
          "Could not access property Port_list: %s.",
         scds_error_string(err));
      return (1); /* Validation Failure */
   }

#ifdef TEST
   if (portlist->num_ports != 1) {
      scds_syslog(LOG_ERR,
          "Property Port_list must have only one value.");
      scds_free_port_list(portlist);
      return (1); /* Validation Failure */
   }
#endif

The resource group that contains the data service also contains at least one network address resource.

scds_net_resource_list_t *snrlp;
   if ((err = scds_get_rs_hostnames(scds_handle, &snrlp))
      != SCHA_ERR_NOERR) {
      scds_syslog(LOG_ERR,
          "No network address resource in resource group: %s.",
         scds_error_string(err));
      return (1); /* Validation Failure */
   }

   /* Return an error if there are no network address resources */
   if (snrlp == NULL || snrlp->num_netresources == 0) {
      scds_syslog(LOG_ERR,
          "No network address resource in resource group.");
      rc = 1;
      goto finished;
   }
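The configuration-file check in the list above can be tried outside the cluster with a small standalone function. This sketch is a variation, not the sample's code: it uses snprintf() in place of sprintf() so that an overlong directory name is rejected instead of overflowing the buffer, and it writes to stderr instead of calling scds_syslog(). The name check_conf_file() and the buffer size CONF_PATH_MAX are hypothetical.

```c
/* Request POSIX declarations (stat) when compiling in strict mode. */
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>

#define CONF_PATH_MAX 1024      /* hypothetical buffer size */

/*
 * Returns 0 if <confdir>/fontserver.cfg can be accessed with stat(),
 * and 1 (validation failure) otherwise.
 */
int
check_conf_file(const char *confdir)
{
    char path[CONF_PATH_MAX];
    struct stat statbuf;
    int n;

    /* Build the path; snprintf() reports how long it wanted to be. */
    n = snprintf(path, sizeof (path), "%s/fontserver.cfg", confdir);
    if (n < 0 || (size_t)n >= sizeof (path)) {
        (void) fprintf(stderr, "Configuration path is too long.\n");
        return (1);
    }

    if (stat(path, &statbuf) != 0) {
        (void) fprintf(stderr, "Failed to access file <%s> : <%s>\n",
            path, strerror(errno));
        return (1);
    }
    return (0);
}
```

The return convention matches svc_validate(): zero means the check passed, and any nonzero value is a validation failure.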

Before it returns, svc_validate() frees all allocated resources.

finished:
   scds_free_net_list(snrlp);
   scds_free_port_list(portlist);

   return (rc); /* return result of validation */
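The single cleanup label shown above is a common C idiom: every failure path sets rc and jumps to the same place, so each allocation is released exactly once. The sketch below demonstrates the idiom with stand-in types and a counting allocator so that it can run without the DSDL. All names in it (port_list_t, net_list_t, tracked_alloc(), tracked_free(), validate_sketch()) are hypothetical. The stand-in free function accepts NULL, which is a property of this sketch and is not a statement about scds_free_port_list() or scds_free_net_list().

```c
#include <stdlib.h>

/* Stand-ins for the DSDL list types (assumptions for illustration). */
typedef struct { int num_ports; } port_list_t;
typedef struct { int num_netresources; } net_list_t;

/* Counts outstanding allocations so that leaks are easy to detect. */
static int live_allocations = 0;

static void *
tracked_alloc(size_t size)
{
    void *p = calloc(1, size);

    if (p != NULL)
        live_allocations++;
    return (p);
}

static void
tracked_free(void *p)
{
    if (p != NULL) {
        live_allocations--;
        free(p);
    }
}

/* Returns 0 if both checks pass, 1 otherwise. Never leaks. */
int
validate_sketch(int num_ports, int num_netresources)
{
    port_list_t *portlist = NULL;
    net_list_t *snrlp = NULL;
    int rc = 0;

    portlist = tracked_alloc(sizeof (*portlist));
    if (portlist == NULL) {
        rc = 1;
        goto finished;
    }
    portlist->num_ports = num_ports;
    if (portlist->num_ports != 1) {
        rc = 1;         /* more or fewer than one port */
        goto finished;
    }

    snrlp = tracked_alloc(sizeof (*snrlp));
    if (snrlp == NULL) {
        rc = 1;
        goto finished;
    }
    snrlp->num_netresources = num_netresources;
    if (snrlp->num_netresources == 0) {
        rc = 1;         /* no network address resource */
        goto finished;
    }

finished:
    /* Pointers start as NULL, so this is safe on every path. */
    tracked_free(snrlp);
    tracked_free(portlist);
    return (rc);
}
```

Initializing both pointers to NULL is what makes the shared label safe: a path that fails before the second allocation still reaches the cleanup code with a well-defined value in snrlp.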

Note –

Before it exits, the xfnts_validate method calls scds_close() to reclaim resources that were allocated by scds_initialize(). For more information, see the scds_initialize() Function section and the scds_close(3HA) man page.

`xfnts_update` Method

The RGM calls the Update method to notify a running resource that its properties have changed. The only properties that can be changed for the xfnts data service pertain to the fault monitor. Therefore, whenever a property is updated, the xfnts_update method calls scds_pmf_restart_fm() to restart the fault monitor.

  /* check if the Fault monitor is already running and if so stop
   * and restart it. The second parameter to scds_pmf_restart_fm()
   * uniquely identifies the instance of the fault monitor that needs
   * to be restarted.
   */

   scds_syslog(LOG_INFO, "Restarting the fault monitor.");
   result = scds_pmf_restart_fm(scds_handle, 0);
   if (result != SCHA_ERR_NOERR) {
      scds_syslog(LOG_ERR,
          "Failed to restart fault monitor.");
      /* Free up all the memory allocated by scds_initialize */
      scds_close(&scds_handle);
      return (1);
   }

   scds_syslog(LOG_INFO,
       "Completed successfully.");

Note –

The second argument to scds_pmf_restart_fm() uniquely identifies the instance of the fault monitor to be restarted if there are multiple instances. The value 0 in the example indicates that there is only one instance of the fault monitor.
