Sun Cluster Data Services Developer's Guide for Solaris OS

Returning From svc_start()

Even when svc_start() returns successfully, the underlying application might have failed to start. Therefore, svc_start() must probe the application to verify that it is running before returning a success message. The probe must also take into account that the application might not be immediately available because it takes some time to start. The svc_start() method calls svc_wait(), which is defined in the xfnts.c file, to verify that the application is running.

   /* Wait for the service to start up fully */
   scds_syslog_debug(DBG_LEVEL_HIGH,
       "Calling svc_wait to verify that service has started.");

   rc = svc_wait(scds_handle);

   scds_syslog_debug(DBG_LEVEL_HIGH,
       "Returned from svc_wait");

   if (rc == 0) {
      scds_syslog(LOG_INFO, "Successfully started the service.");
   } else {
      scds_syslog(LOG_ERR, "Failed to start the service.");
   }

The svc_wait() function calls scds_get_netaddr_list() to obtain the network address resources that are needed to probe the application.

   /* obtain the network resource to use for probing */
   if (scds_get_netaddr_list(scds_handle, &netaddr)) {
      scds_syslog(LOG_ERR,
          "No network address resources found in resource group.");
      return (1);
   }

   /* Return an error if there are no network resources */
   if (netaddr == NULL || netaddr->num_netaddrs == 0) {
      scds_syslog(LOG_ERR,
          "No network address resource in resource group.");
      return (1);
   }

The svc_wait() function obtains the values of the Start_timeout property and the Probe_timeout extension property.

   svc_start_timeout = scds_get_rs_start_timeout(scds_handle);
   probe_timeout = scds_get_ext_probe_timeout(scds_handle);

To account for the time the server might take to start, svc_wait() calls scds_svc_wait() and passes a timeout value equal to three percent of the Start_timeout value. The svc_wait() function then calls the svc_probe() function to verify that the application has started. The svc_probe() function makes a simple socket connection to the server on the specified port. If it fails to connect to the port, svc_probe() returns a value of 100, which indicates a total failure. If the connection succeeds but the disconnect from the port fails, svc_probe() returns a value of 50, which indicates a partial failure.
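
The connect-then-disconnect check described above can be sketched with plain POSIX sockets. This is only an illustration, not the xfnts implementation: probe_tcp_port() and the PROBE_* names are hypothetical, the sketch accepts only a numeric IPv4 address, and it omits the probe timeout that the real svc_probe() receives.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical names for the return values described in the text */
#define PROBE_OK            0     /* connect and disconnect both succeeded */
#define PROBE_PARTIAL_FAIL  50    /* connected, but the disconnect failed */
#define PROBE_TOTAL_FAIL    100   /* could not connect at all */

/*
 * Hypothetical stand-in for svc_probe(): a blocking TCP connect to a
 * numeric IPv4 address, followed by a close. No timeout handling.
 */
int
probe_tcp_port(const char *ipv4_addr, int port)
{
   struct sockaddr_in sa;
   int sock;

   memset(&sa, 0, sizeof (sa));
   sa.sin_family = AF_INET;
   sa.sin_port = htons((unsigned short)port);

   /* Treat an unparsable address as a total failure */
   if (inet_pton(AF_INET, ipv4_addr, &sa.sin_addr) != 1)
      return (PROBE_TOTAL_FAIL);

   sock = socket(AF_INET, SOCK_STREAM, 0);
   if (sock < 0)
      return (PROBE_TOTAL_FAIL);

   /* Failure to connect means the service is not reachable */
   if (connect(sock, (struct sockaddr *)&sa, sizeof (sa)) != 0) {
      (void) close(sock);
      return (PROBE_TOTAL_FAIL);
   }

   /* Connected; a failed disconnect counts as a partial failure */
   if (close(sock) != 0)
      return (PROBE_PARTIAL_FAIL);

   return (PROBE_OK);
}
```

A caller would treat any nonzero result as "not started yet" and probe again after a delay.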

On failure or partial failure of svc_probe(), svc_wait() calls scds_svc_wait() with a timeout value of 5 seconds (SVC_WAIT_TIME). The scds_svc_wait() function limits the frequency of the probes to one every five seconds. This function also counts the number of attempts to start the service. If the number of attempts exceeds the value of the Retry_count property of the resource within the period that is specified by the Retry_interval property of the resource, the scds_svc_wait() function returns failure. In this case, the svc_start() function also returns failure.
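
The Retry_count and Retry_interval rule can be pictured as a sliding window over the times of recent start attempts. The sketch below is an assumption about how such counting could work, not the actual scds_svc_wait() internals: retry_tracker_t, record_start_attempt(), and MAX_TRACKED are hypothetical names, and times are plain integer seconds.

```c
#include <stddef.h>

#define MAX_TRACKED 32   /* hypothetical cap on remembered attempts */

/* Hypothetical tracker; not part of the DSDL API */
typedef struct {
   long times[MAX_TRACKED];   /* times (in seconds) of recent attempts */
   size_t count;              /* number of valid entries in times[] */
} retry_tracker_t;

/*
 * Record a start attempt at time `now`. Returns 1 if the number of
 * attempts within the last `retry_interval` seconds now exceeds
 * `retry_count`, and 0 otherwise.
 */
int
record_start_attempt(retry_tracker_t *t, long now, long retry_interval,
    int retry_count)
{
   size_t i;
   size_t kept = 0;

   /* Forget attempts that fall outside the retry interval */
   for (i = 0; i < t->count; i++) {
      if (now - t->times[i] < retry_interval)
         t->times[kept++] = t->times[i];
   }

   /* Remember this attempt if there is room left */
   if (kept < MAX_TRACKED)
      t->times[kept++] = now;
   t->count = kept;

   return ((int)t->count > retry_count);
}
```

With a Retry_count of 2 and a Retry_interval of 300 seconds, attempts at 0, 10, and 20 seconds trip the limit on the third call, while a later attempt at 400 seconds starts a fresh window.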

#define    SVC_CONNECT_TIMEOUT_PCT    95
#define    SVC_WAIT_PCT       3
#define    SVC_WAIT_TIME      5

   if (scds_svc_wait(scds_handle, (svc_start_timeout * SVC_WAIT_PCT)/100)
      != SCHA_ERR_NOERR) {

      scds_syslog(LOG_ERR, "Service failed to start.");
      return (1);
   }

   do {
      /*
       * probe the data service on the IP address of the
       * network resource and the portname
       */
      rc = svc_probe(scds_handle,
          netaddr->netaddrs[0].hostname,
          netaddr->netaddrs[0].port_proto.port, probe_timeout);
      if (rc == SCHA_ERR_NOERR) {
         /* Success. Free up resources and return */
         scds_free_netaddr_list(netaddr);
         return (0);
      }

      /*
       * Call scds_svc_wait() so that if the service fails too
       * often, we give up and return early.
       */
      if (scds_svc_wait(scds_handle, SVC_WAIT_TIME)
         != SCHA_ERR_NOERR) {
         scds_syslog(LOG_ERR, "Service failed to start.");
         return (1);
      }

   /* Rely on RGM to timeout and terminate the program */
   } while (1);
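
The timing of this loop can be explored without a cluster by replacing the DSDL calls with stubs. The following is a simulation sketch under stated assumptions: stub_probe(), stub_wait(), wait_for_service(), and the simulated clock are all hypothetical, and the bound on start_timeout stands in for the RGM, which terminates the real method when Start_timeout expires.

```c
#define SVC_WAIT_PCT    3   /* initial wait: percent of Start_timeout */
#define SVC_WAIT_TIME   5   /* seconds between probes */

/* Simulated clock in seconds; hypothetical, for illustration only */
static int sim_clock = 0;

/* Hypothetical stub for svc_probe(): fails until the clock reaches ready_at */
int
stub_probe(int ready_at)
{
   return (sim_clock >= ready_at ? 0 : 100);
}

/* Hypothetical stub for scds_svc_wait(): just advances the clock */
void
stub_wait(int seconds)
{
   sim_clock += seconds;
}

/*
 * Returns the number of probes made before the first success, or -1
 * if start_timeout elapses first (standing in for an RGM timeout).
 */
int
wait_for_service(int start_timeout, int ready_at)
{
   int probes = 0;

   sim_clock = 0;

   /* Initial wait of three percent of the start timeout */
   stub_wait((start_timeout * SVC_WAIT_PCT) / 100);

   while (sim_clock < start_timeout) {
      probes++;
      if (stub_probe(ready_at) == 0)
         return (probes);
      /* Probe failed: wait before trying again */
      stub_wait(SVC_WAIT_TIME);
   }
   return (-1);
}
```

For a Start_timeout of 300 seconds, the initial wait is 9 seconds. If the simulated service becomes ready at 20 seconds, the probes at 9, 14, and 19 seconds fail and the fourth probe, at 24 seconds, succeeds.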

Note –

Before it exits, the xfnts_start method calls scds_close() to reclaim resources that are allocated by scds_initialize(). See the "scds_initialize() Function" section and the scds_close(3HA) man page for more information.