Sun Cluster Data Service for PostgreSQL Guide for Solaris OS

Tuning the Sun Cluster HA for PostgreSQL Fault Monitor

The Sun Cluster HA for PostgreSQL fault monitor verifies that the data service is running in a healthy condition.

A Sun Cluster HA for PostgreSQL fault monitor is contained in each resource that represents the PostgreSQL instance. You created these resources when you registered and configured Sun Cluster HA for PostgreSQL. For more information, see Registering and Configuring Sun Cluster HA for PostgreSQL.

System properties and extension properties of the PostgreSQL resources control the behavior of the fault monitor. The default values of these properties determine the preset behavior of the fault monitor. Because the preset behavior should be suitable for most Sun Cluster installations, tune the Sun Cluster HA for PostgreSQL fault monitor only if you need to modify this preset behavior.

Tuning the Sun Cluster HA for PostgreSQL fault monitor involves the following tasks:

The Sun Cluster HA for PostgreSQL fault monitor differentiates between connection problems and definitive application failures. The value of NOCONRET in the PostgreSQL parameter file specifies the return code for connection problems. With this value, the fault monitor ignores a certain number of consecutive failed probes, as long as they all return the value of NOCONRET. The first successful probe resets this "failed probe counter" to zero. The maximum number of failed probes is calculated as 100 / NOCONRET. A definitive application failure results in an immediate restart or failover.
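The counting rule described above can be illustrated with a short Python sketch. It is not the fault monitor's actual implementation. The function name first_restart_probe and the representation of probe results as booleans are hypothetical, and rounding 100 / NOCONRET up to the next whole number is an assumption for values that do not divide 100 evenly.

```python
import math


def first_restart_probe(probe_results, noconret):
    """Return the 1-based number of the probe that would trigger a restart
    or failover, or None if the limit is never reached.

    probe_results: iterable of booleans, True for a successful connection,
                   False for a probe that returned NOCONRET.
    noconret:      the NOCONRET value from the parameter file.
    """
    if noconret <= 0:
        raise ValueError("NOCONRET must be a positive integer")

    # Maximum number of failed probes, as stated in the text: 100 / NOCONRET.
    # Rounding up is an assumption of this sketch.
    limit = math.ceil(100 / noconret)

    failed = 0  # the "failed probe counter"
    for number, connected in enumerate(probe_results, start=1):
        if connected:
            failed = 0  # first successful probe resets the counter
            continue
        failed += 1
        if failed >= limit:
            return number
    return None


# Ten consecutive connection failures with NOCONRET=10 reach the limit.
print(first_restart_probe([False] * 10, 10))
# A single successful probe in between resets the counter.
print(first_restart_probe([False] * 9 + [True] + [False] * 9, 10))
```

In this model, a smaller NOCONRET value tolerates a longer run of failed connections before the resource is considered failed.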

The value that you assign to NOCONRET selects one of two behaviors for failed database connections of a PostgreSQL resource.

  1. Retry the connection to the test database several times before considering the PostgreSQL resource as failed and triggering a restart or failover.

  2. Complain at every probe that the connection to the test database failed. No restart or failover will be triggered.

To achieve either of these behaviors, you need to consider the standard resource properties retry_interval and thorough_probe_interval.

The value 100/NOCONRET defines the maximum number of retries for the probe in the case of a failed connection.

Assume that the following resource parameters are set:

If you encounter, for example, a shortage of available database sessions for 7 minutes, you will see 7 complaints in /var/adm/messages, but no resource restart. If the shortage lasts 10 minutes, you will have a restart of the PostgreSQL resource after the 10th probe.

If you do not want a resource restart in the previous example, change the value of NOCONRET from 10 to 5 or less.
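The arithmetic behind this example can be sketched in Python as follows. This is a simplified model under stated assumptions, not the documented algorithm: it assumes a thorough_probe_interval of 60 seconds and a retry_interval of 900 seconds (values chosen only to be consistent with the example), and it assumes that a restart can occur only if the required number of failed probes fits within retry_interval. The function names are hypothetical.

```python
import math


def max_failed_probes(noconret):
    """Maximum number of failed probes, 100 / NOCONRET (rounded up by assumption)."""
    if noconret <= 0:
        raise ValueError("NOCONRET must be a positive integer")
    return math.ceil(100 / noconret)


def restart_expected(outage_seconds, thorough_probe_interval,
                     retry_interval, noconret):
    """Estimate whether a connection outage leads to a resource restart.

    Assumptions of this sketch: one probe per thorough_probe_interval, every
    probe during the outage returns NOCONRET, and failures count toward a
    restart only if enough of them fit inside retry_interval.
    """
    needed = max_failed_probes(noconret)
    failed = outage_seconds // thorough_probe_interval
    fits_in_window = needed * thorough_probe_interval <= retry_interval
    return fits_in_window and failed >= needed


# Assumed settings: 60-second probes, 900-second retry interval.
print(restart_expected(7 * 60, 60, 900, 10))   # 7-minute shortage, NOCONRET=10
print(restart_expected(10 * 60, 60, 900, 10))  # 10-minute shortage, NOCONRET=10
print(restart_expected(10 * 60, 60, 900, 5))   # 10-minute shortage, NOCONRET=5
```

Under these assumptions, NOCONRET=5 would require 20 failed probes, which take 1200 seconds and therefore cannot accumulate within the assumed retry_interval, so the resource only logs complaints.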

For more information, see Tuning Fault Monitors for Sun Cluster Data Services in Sun Cluster Data Services Planning and Administration Guide for Solaris OS.

Operation of the Sun Cluster HA for PostgreSQL Parameter File

The Sun Cluster HA for PostgreSQL resources use a parameter file to pass parameters to the start, stop, and probe commands. Changes to these parameters take effect, at the latest, the next time the resource is restarted or is disabled and re-enabled.

A change to any of the following parameters takes effect at the next probe of the PostgreSQL resource:


Note –

An incorrect change to the parameters while the PostgreSQL resource is enabled might result in an unplanned service outage. Therefore, disable the PostgreSQL resource first, make the change, and then re-enable the resource.


Operation of the Fault Monitor for Sun Cluster HA for PostgreSQL

The fault monitor for Sun Cluster HA for PostgreSQL ensures that all the requirements for the PostgreSQL component to run are met: