Internals for Script-Based Plug-Ins

This section describes the internals of the script-based plug-ins.

Plug-In Script Functional Requirements

A protection group has several global properties that are valid and relevant to both the primary and secondary clusters, and by extension all cluster nodes. Additionally, each replicated component has a set of local and global properties. Together, these properties describe and control the replication pertaining to one or more replicated services.

Plug-In Script Argument Validation

Each script provided in one of the protection group properties must be able to validate the arguments with which it is called, to determine whether those arguments are complete and acceptable. Validation ensures that scripts that are not called regularly, such as switchover_script and takeover_script, do not fail because their arguments have become incompatible. Failing to validate the arguments could make it impossible to switch over or take over in an emergency.

Scripts must therefore be able to validate the arguments defined by the administrator through the Oracle Solaris Cluster Manager browser interface or command-line interface (CLI), and return an exit code of zero if the arguments are correct. The script must not perform its real function at this stage, such as switching over, taking over, or creating a script-based plug-in configuration. If you do not want to perform these checks, the script must still return without performing any additional work in response to the validate arguments call.

The validate arguments step is denoted by the disaster recovery framework script-based plug-in MBean passing validate_parameters=true as one of the command-line arguments. When a script-based plug-in replication component is added to a protection group, all the replicated component-specific scripts listed in Protection Group Properties - Overview are called to validate their arguments. This call is made on one or more nodes per cluster, depending on the particular script-based plug-in replicated component configuration as defined in the configuration file. For more information, see configuration_file Property and Protection Group Properties - Overview.

The same validation calls are made under the following circumstances:

When the replication component is modified, because the modification might change the program arguments
When there are protection group validation calls in response to the geopg validate protection-group command
When the disaster recovery framework is starting and recreating the initial script-based plug-in replicated component objects that are stored in the Cluster Configuration Repository (CCR)
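
The validation behavior described above can be sketched as follows. This is a minimal illustration and not part of the framework: the function names are hypothetical, and the check itself (requiring a non-empty pg= argument) is only a stand-in for whatever checks a real script needs.

```python
import sys


def is_validation_call(argv):
    """Return True when the framework passed validate_parameters=true."""
    for arg in argv:
        key, sep, value = arg.partition("=")
        if sep and key == "validate_parameters":
            return value.lower() == "true"
    return False


def check_arguments(argv):
    """Hypothetical check: require a non-empty pg=... argument."""
    return any(a.startswith("pg=") and len(a) > len("pg=") for a in argv)


def main(argv):
    if is_validation_call(argv):
        # Validate only; never perform the real switchover or takeover here.
        if check_arguments(argv):
            return 0
        print("invalid or incomplete arguments", file=sys.stderr)
        return 1
    # The real work of the script (switchover, takeover, and so on) goes here.
    return 0
```

A script built this way would typically end by calling sys.exit(main(sys.argv[1:])), so that the return value becomes the exit code seen by the framework.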

There are also two protection group level program properties, add_app_rg_script and remove_app_rg_script, that have associated protection group argument properties.

Standardized Script Command-Line Arguments

All scripts are called using a standardized command-line structure. The format of the command line is as follows:

# developer-program-name administrator-supplied-program-arguments \
function=step-name \
validate_parameters={true|false} \
currentRole={PRIMARY|SECONDARY} \
pg=protection-group \
additional-function-dependent-arguments

where developer-program-name is the name of one of the externally developed scripts and administrator-supplied-program-arguments provides the arguments given for this script by the administrator when setting up a script-based plug-in configuration.

The function=step-name argument enables a script to determine which action it is being called to perform. This argument is especially important if a single script has been written to perform more than one task. Two scripts in particular need to be concerned with this argument: switchover_script and takeover_script.

The currentRole argument indicates the current role of the local cluster, while the pg argument denotes the name of the protection group containing the script-based plug-in configuration. Scripts should be prepared to deal with values in either uppercase or lowercase. The same is true of the newRole argument for switchover_script and takeover_script.
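
One way a script might read this command line is sketched below. The function name is hypothetical, and the sketch assumes that the administrator-supplied arguments do not reuse the framework's key names. Role values are normalized to uppercase because they can arrive in either case.

```python
def parse_framework_args(argv):
    """Split key=value arguments into a dict; keep other arguments in order."""
    named, positional = {}, []
    for arg in argv:
        key, sep, value = arg.partition("=")
        if sep and key:
            named[key] = value
        else:
            positional.append(arg)
    # Role values may be uppercase or lowercase, so normalize them.
    for role_key in ("currentRole", "newRole"):
        if role_key in named:
            named[role_key] = named[role_key].upper()
    return named, positional
```

For example, calling parse_framework_args with the arguments "-v", "function=switchover", "currentRole=primary", and "pg=sales-pg" (all illustrative values) yields a dictionary in which currentRole is PRIMARY, plus the positional list containing "-v".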

All scripts, if successful, must return a zero exit code. On failure, all scripts must return a nonzero exit code and generate a localized error message on standard error (stderr). Any output sent to standard output (stdout) is generally ignored (with the exception of create_config_script), unless common agent container logging is turned on. In that case, the output is saved in the /var/cacao/instances/default/logs/cacao.0 log file, along with other common agent container debugging information. Do not save debugging information as a matter of course because the volume of output can be substantial.
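
The exit code convention can be sketched with a small wrapper. The name run_step is hypothetical, and message localization is omitted for brevity.

```python
import sys


def run_step(step, *args):
    """Run one step and translate its outcome into a script exit code."""
    try:
        step(*args)
    except Exception as exc:
        # Failure: explain the problem on stderr and return a nonzero code.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    # Success: the framework expects a zero exit code.
    return 0
```

Keeping diagnostic text off standard output matters most for create_config_script, whose standard output is read by the framework.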

Script-Based Plug-In Replication Resource Groups and Resources

The name of the replication resource group for a particular protection group is defined by the value returned by create_config_script in the reprg= string sent to standard output. This resource group contains one or more replication resources, which are named by create_config_script in the reprs= string sent to standard output. For any one protection group, the reprg= value returned by create_config_script must always be identical.

The function of the replication resources is to monitor the state of the replication associated with the resource and thus the replicated component. The replication resource status, which is set by a probe method, is used to determine the overall status of the protection group. The start and stop methods of the replication resource do not start and stop the actual data replication.

The replication resource must be enabled by start_replication_script and disabled by stop_replication_script.

Protection Group Status Mapped from Replication Resource Status

The protection group status reflects the aggregated status of all replication resources in the replication resource group created by the developer-written create_config_script program.

The following table illustrates the mapping from the status of each replication resource to the protection group status. An X represents any possible status for the resource and demonstrates that the most restrictive status governs the overall status of the protection group.

Unknown	Faulted	Degraded	Online	Protection Group Status
True	X	X	X	UNKNOWN
False	True	X	X	FAULTED
False	False	True	X	DEGRADED
False	False	False	True	ONLINE
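
The precedence in the table can be expressed as a short function. This is an illustrative sketch, not the framework's code; the function name is hypothetical, and raising an error for an empty or unrecognized input is an assumption, because the table does not cover that case.

```python
# Most restrictive status first, following the table above.
PRECEDENCE = ("UNKNOWN", "FAULTED", "DEGRADED", "ONLINE")


def protection_group_status(resource_statuses):
    """Return the most restrictive status among the replication resources."""
    seen = {status.upper() for status in resource_statuses}
    for status in PRECEDENCE:
        if status in seen:
            return status
    # Assumption: the table does not define a result for this case.
    raise ValueError("no recognized replication resource status")
```

With this sketch, a group of three resources reporting Online, Degraded, and Online maps to DEGRADED, and a single Unknown resource is enough to make the whole group UNKNOWN.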

How the Disaster Recovery Framework Handles Password Properties

This section describes how the disaster recovery framework handles password properties when the entity added to a protection group (for example, an Oracle Data Guard or script-based plug-in configuration) requires a password property.

The password properties are read during the execution of the geopg command. These password properties are recognized by their conformance to the pattern *_password. When geopgi (a back-end program called by the geopg command) parses the protection group properties list, it looks for such arguments. If the password has been supplied in cleartext, as shown in the following example, then geopg warns the user that the password is insecure, but continues processing the password.

… -p sysdba_password=password …

For any password properties that have been specified without a value, the geopgi program enters non-echo mode and prompts for these passwords, as shown in the following example:

… -p local_service_password= -p remote_service_password= …
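
The pattern matching described above can be illustrated as follows. This sketch is not the geopgi implementation. The function name is hypothetical, and it assumes, based on the two examples, that a non-empty value means a cleartext password and an empty value means the password must be prompted for.

```python
from fnmatch import fnmatchcase


def classify_password_properties(properties):
    """Sort *_password properties into cleartext and to-be-prompted names."""
    cleartext, to_prompt = [], []
    for name, value in properties.items():
        if not fnmatchcase(name, "*_password"):
            continue  # Not a password property.
        if value:
            cleartext.append(name)  # Supplied on the command line: warn.
        else:
            to_prompt.append(name)  # Empty value: prompt without echo.
    return cleartext, to_prompt
```

In a Python program, the non-echo prompt itself could be done with the standard getpass.getpass function.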

Once all the arguments have been processed, the password property name and value pairs are written into an internal password file on the local node. This file is readable only by root. A separate internalPasswordFile argument is inserted into the properties list with the value hostname:filename.
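
The idea of an owner-only password file and a hostname:filename reference can be sketched as shown below. The function names and the one-pair-per-line file layout are assumptions made for illustration; the actual internal file format is not documented here.

```python
import os
import socket


def write_password_file(path, pairs):
    """Write name=value pairs to a new file that only its owner can read."""
    # O_EXCL refuses to reuse an existing file; 0o600 limits access to the owner.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        for name, value in pairs.items():
            handle.write(f"{name}={value}\n")  # Assumed layout.
    # Build a hostname:filename value like the one described above.
    return f"{socket.gethostname()}:{path}"


def split_password_file_ref(ref):
    """Split a hostname:filename value at the first colon."""
    host, _, filename = ref.partition(":")
    return host, filename
```

When such a program runs as root, the resulting file is readable only by root, which matches the restriction described above.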

Once in the core disaster recovery framework Java code, the internalPasswordFile argument is unpacked, and the file is read remotely through an internal call from one common agent container to another. If the rest of the properties are correct and complete, and the validation succeeds, the passwords are then converted, for security, into the hexadecimal representation of their character codes and written to the Oracle Solaris Cluster CCR.

When required, the passwords can be queried and converted back from the CCR and supplied to the appropriate programs to achieve the relevant switchovers, takeovers, or status queries.
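
The conversion to and from hexadecimal can be illustrated with a short round trip. This is a sketch only: the function names are hypothetical, and the use of UTF-8 for the character codes is an assumption, because the exact encoding used by the framework is not stated.

```python
def to_hex(password):
    """Encode a password as the hexadecimal form of its UTF-8 bytes."""
    return password.encode("utf-8").hex()


def from_hex(encoded):
    """Convert a hexadecimal string produced by to_hex back to the password."""
    return bytes.fromhex(encoded).decode("utf-8")
```

Under these assumptions, to_hex("abc") returns "616263", and from_hex reverses it. The encoding is reversible by design, which is what allows the stored passwords to be recovered and supplied to the appropriate programs.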