Planned and Unplanned Outage Solutions

Planned and unplanned outages might occur in your PeopleSoft environment. Learn about the Oracle solutions that are available to minimize application downtime. Downtime is measured for the PeopleSoft application as a whole, not for an individual component.

Unplanned Outage Solutions

The following are types of unplanned outages that might be caused by system or human failures in a PeopleSoft environment, and the technology solutions that you can use to recover and keep downtime to a minimum.

We recommend you test the basic scenarios below to ensure they are configured correctly in your environment, and to be confident you are ready to act if an emergency occurs.

Outage Type | Oracle Solution | Benefits | Recovery Time
Load balancer | Software load balancer, configuration replicated locally | Connections seamlessly migrate to surviving load balancer. | No downtime.
PeopleSoft PIA Web Server node or component failure | Redundant Web Servers without Coherence*Web cache server cluster | Connections are redistributed to surviving nodes. Surviving nodes continue processing. | No downtime. Re-authentication and re-submission of work may be required.
PeopleSoft PIA Web Server node or component failure | Redundant Web Servers with Coherence*Web cache server cluster | Connections are redistributed to surviving nodes, preserving session state. Surviving nodes continue processing. | No downtime and no re-authentication or re-submission of work.
PeopleSoft Application Domain Server node or component failure | Redundant application domain servers. PIA servers are configured with active connections load balanced across application servers and resubmit the work to a surviving application server. | Connections are redistributed to surviving nodes. Surviving nodes pick up the requests with no loss of context. | No downtime.
Database server or instance failure | Oracle RAC, Application Continuity, FAN events | Automatic recovery of work on the failed instance: sessions transparently fail over and updates are resubmitted automatically. | Seconds to minutes.
Site failure | Oracle Data Guard, rsync | Full site failover with minimal to no loss of data. | Less than 10 minutes after the decision is made, for database role transition, file system mount, and PeopleSoft application startup.
Storage failure | ASM | Mirroring and automatic rebalance. | No downtime.
Storage failure | Oracle RMAN with fast recovery area | Fully managed database recovery and disk-based backups. | Minutes to hours.
Storage failure | Region-local Oracle object storage | Cloud-managed database recovery and disk-based backups. | Minutes to hours.
Storage failure | Oracle Data Guard, rsync | Full site failover with minimal to no loss of data. | Less than 10 minutes after the decision is made, for database role transition, file system mount, and PeopleSoft application startup.
Human error | Oracle Data Guard with Flashback Database | Research on copy (standby). | Hours (research through data fix).
Data corruption | Oracle RMAN with fast recovery area | Online block media recovery and managed disk-based backups. | Minutes to hours.
Data corruption | Oracle Active Data Guard | Automatically detects and repairs corrupted blocks using the physical standby database. | No downtime, transparent to application.
Data corruption | Oracle Data Guard | Automatic validation and re-transmission of corrupted redo blocks. | No downtime, transparent to application.
Data corruption | Oracle Data Guard Broker | Fast failover to a local standby database, or full site failover to DR site. | Local standby: less than 5 minutes after the decision is made, for database role transition, file system mount, and PeopleSoft application startup. Full site failover: less than 10 minutes after the decision is made, for database role transition, file system mount, and PeopleSoft application startup.

Note:

It may be possible to recover quickly from a fault at the primary site and resume operations there, which may be less disruptive to the overall operation than switching to the secondary site. For this reason, the recovery times in the table above are measured from the point at which the decision to fail over is made, and they cover the time expected to perform a scripted transition. If you do not want to require a human decision before a failover to a DR site, configure Fast-Start Failover in the database.

If Fast-Start Failover is configured and the standby database apply lag is within the fast-start failover lag limit, then bringing up the DR site adds only the fast-start failover timeout threshold to the overall time to transition to the standby.
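
The timing relationship described above can be illustrated with a small worked example. This is only a sketch: the class and function names are hypothetical, and the 30-second lag limit, 30-second threshold, and 480-second scripted transition are illustrative values, not recommendations or measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FsfoSettings:
    """Illustrative Fast-Start Failover settings, in seconds (assumed values)."""
    lag_limit_s: int   # maximum standby apply lag tolerated for automatic failover
    threshold_s: int   # time the primary must be unreachable before failover starts


def automatic_failover_allowed(apply_lag_s: int, settings: FsfoSettings) -> bool:
    """The standby is eligible only while its apply lag is within the lag limit."""
    return apply_lag_s <= settings.lag_limit_s


def estimated_outage_s(scripted_transition_s: int, settings: FsfoSettings) -> int:
    """Overall time = failover timeout threshold + scripted transition time."""
    return settings.threshold_s + scripted_transition_s


if __name__ == "__main__":
    settings = FsfoSettings(lag_limit_s=30, threshold_s=30)
    print(automatic_failover_allowed(apply_lag_s=12, settings=settings))     # True
    print(estimated_outage_s(scripted_transition_s=480, settings=settings))  # 510
```

With these assumed numbers, an 8-minute scripted transition becomes roughly 8.5 minutes end to end once the threshold is added.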

Whether the action is taken automatically or not, the failover process should be fully scripted to ensure swift and accurate execution.
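
As a minimal sketch of what "fully scripted" might look like, the following runs the three phases named in the table (database role transition, file system mount, PeopleSoft application startup) in a fixed order and stops at the first failure. The step functions are hypothetical placeholders that always succeed; in a real environment each would call your own site-specific tooling.

```python
from typing import Callable, List, Tuple

# A step is a human-readable name plus an action that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_failover(steps: List[Step]) -> List[str]:
    """Run steps in order, stopping at the first failure.

    Returns the names of the completed steps; raises RuntimeError if a step fails.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(
                f"failover step failed: {name}; completed so far: {completed}"
            )
        completed.append(name)
    return completed


# Hypothetical placeholders: replace with calls to your own scripts.
def transition_database_role() -> bool:
    return True


def mount_file_systems() -> bool:
    return True


def start_peoplesoft() -> bool:
    return True


FAILOVER_STEPS: List[Step] = [
    ("database role transition", transition_database_role),
    ("file system mount", mount_file_systems),
    ("PeopleSoft application startup", start_peoplesoft),
]

if __name__ == "__main__":
    print(run_failover(FAILOVER_STEPS))
```

Keeping the order in one list means the same sequence runs whether the failover is triggered by an operator or automatically.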

Planned Maintenance Solutions

The following is a summary of planned maintenance activities that typically occur in a PeopleSoft environment, and the recommended technology solutions to keep downtime to a minimum.

Maintenance Activity | Solution | PeopleSoft Outage
Mid-Tier operating system or hardware upgrade | Load balancing, redundant services across Web and Tuxedo application servers. | No downtime, assuming Coherence*Web is running.
PeopleSoft (application and PeopleTools) | PeopleSoft out-of-place patching. | Minutes (no schema changes) to hours (schema changes required).
PeopleSoft application configuration change | PeopleSoft application rolling restart. | No downtime.
PeopleSoft upgrades | PeopleSoft out-of-place upgrades. | Hours to days (schema changes will be required; time depends on database size).*
Database tier operating system patching or hardware maintenance | Oracle RAC rolling, Standby-First. | No downtime.
Oracle Database Release Update patching | Oracle RAC rolling, Standby-First. | No downtime.
Oracle Database upgrades | Data Guard transient logical rolling upgrade. See: Reducing PeopleSoft Downtime Using a Local Standby Database. | Seconds to minutes.
Oracle Grid and Oracle Clusterware upgrade and patches | Oracle RAC rolling, Standby-First. | No downtime.

* In practice, there are ways to mitigate the impact of extended upgrade downtime, for example by providing a read-only replica. Oracle Consulting Services can help you plan and execute the upgrade.
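
Several rows in the planned maintenance table rely on rolling operations across redundant nodes. The general pattern can be sketched as follows, assuming hypothetical `restart` and `is_healthy` callbacks that you would supply for your own web or application servers. It illustrates the ordering only and is not a PeopleSoft or Oracle RAC procedure.

```python
from typing import Callable, List


def rolling_restart(
    nodes: List[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    min_in_service: int = 1,
) -> List[str]:
    """Restart nodes one at a time so the remaining nodes keep serving requests.

    Stops the rollout as soon as a restarted node fails its health check.
    """
    # With one node out at a time, len(nodes) - 1 nodes remain in service.
    if len(nodes) - 1 < min_in_service:
        raise ValueError("not enough redundant nodes for a rolling restart")
    restarted: List[str] = []
    for node in nodes:
        restart(node)
        if not is_healthy(node):
            raise RuntimeError(f"{node} did not come back healthy; stopping rollout")
        restarted.append(node)
    return restarted


if __name__ == "__main__":
    # Stub callbacks for demonstration; node names are made up.
    log: List[str] = []
    done = rolling_restart(
        ["app1", "app2", "app3"],
        restart=log.append,
        is_healthy=lambda node: True,
    )
    print(done)
```

Checking health before moving to the next node is what keeps a bad configuration change from taking down every node in turn.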