Bookshelf v7.8.2: Failover Mechanisms for Oracle BI Components

Failover Mechanisms for Oracle BI Components

This section describes the failover process for each Oracle BI component in a cluster.

BI Presentation Services Failure

Web Clients
Although an initial user session request can go to any BI Presentation Services instance, each user is then bound to a specific instance. Loss of that instance disconnects the session, and an error is relayed back to the browser. Any work in progress that was not saved to disk when the server was lost cannot be recovered. The user must log in again to establish a new connection to an available BI Presentation Services instance. If users log in through a Single Sign-On system such as Oracle Single Sign-On (SSO), this second login takes place automatically. The new BI Presentation Services session creates a new BI Server session.

NOTE: When a BI Presentation Services instance fails, there is a small interval of time before the system recognizes that the instance has failed and before users are migrated to a new BI Presentation Services instance. There may be some loss of session state.
iBots
When the BI Presentation Services instance serving an iBot fails, an error is relayed to the BI Scheduler, which logs the failure and then retries the job. The retry establishes a new connection to an available BI Presentation Services instance.
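
The log-then-retry behavior described above can be sketched as follows. This is only an illustration, not the BI Scheduler's actual code: the instance names, the InstanceDown exception, and the run_with_retry helper are all hypothetical, and the source does not say how many retries the Scheduler makes or in what order it picks instances.

```python
from typing import Callable, List, Set


class InstanceDown(Exception):
    """Raised when a (hypothetical) Presentation Services instance is unreachable."""


def run_with_retry(job: Callable[[str], str], instances: List[str], log: List[str]) -> str:
    """Try the job on each instance in turn, logging every failure before retrying."""
    for name in instances:
        try:
            return job(name)
        except InstanceDown as err:
            # Log the failure, then fall through to the next available instance.
            log.append(f"job failed on {name}: {err}")
    raise RuntimeError("no Presentation Services instance available")


def make_job(down: Set[str]) -> Callable[[str], str]:
    """Build a fake iBot job that fails on any instance listed in `down`."""
    def job(name: str) -> str:
        if name in down:
            raise InstanceDown("connection lost")
        return f"delivered via {name}"
    return job


failures: List[str] = []
result = run_with_retry(make_job({"ps1"}), ["ps1", "ps2"], failures)
```

Here the job fails on the first instance, the failure is logged, and the retry succeeds on the second instance.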

BI Server Failure

When BI Server failure occurs, an ODBC error is sent back to the client.

BI Presentation Services
Each web user of Oracle BI has requests served by one BI Server. If this BI Server becomes unavailable, the end user may see an error, but a browser refresh will cause a new session to be established with an available BI Server.
Administration Tool
The Administration Tool relays the ODBC error when the BI Server to which it is connected becomes unavailable, and then closes the connection. The administrator must reconnect.
iBots
When BI Server failure occurs, the error will be relayed to the Scheduler, which logs the failure and retries the job. This will cause a connection to be established with an available BI Server.
Third-Party Clients
Third-party clients use ODBC to connect to the BI Server. When BI Server failure occurs, the error is relayed to the client, and the session is closed and re-opened according to the ODBC standard.

Master BI Server Failure

If the Master BI Server is unavailable, online metadata changes cannot be performed. This is an administration operation and does not impact runtime availability. If the Master BI Server is permanently unavailable, one of the other Servers must be appointed as the new master. This will require reconfiguration of all the servers.

BI Scheduler Failure

The BI Scheduler is monitored and managed by the Cluster Controller. If the BI Scheduler is unavailable, the Cluster Controller will determine the next BI Scheduler instance to activate. If the previous primary Scheduler becomes available again, the primary role will not revert.
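
The activation rule above, including the fact that the primary role does not revert, can be modeled with a small sketch. The class and instance names are hypothetical, and choosing the next instance in configured order is an assumption; the source says only that the Cluster Controller determines the next instance to activate.

```python
from typing import Dict, List, Optional


class ClusterControllerSketch:
    """Toy model of Scheduler activation; not Oracle's implementation."""

    def __init__(self, schedulers: List[str]) -> None:
        # Every instance starts up; the first configured one is active.
        self.up: Dict[str, bool] = {name: True for name in schedulers}
        self.active: Optional[str] = schedulers[0]

    def mark_down(self, name: str) -> None:
        self.up[name] = False
        if self.active == name:
            # Assumption: activate the next instance that is up, in configured order.
            self.active = next((n for n, ok in self.up.items() if ok), None)

    def mark_up(self, name: str) -> None:
        # A recovered instance rejoins as a standby; the active role does not revert.
        self.up[name] = True
        if self.active is None:
            self.active = name


cc = ClusterControllerSketch(["sched_a", "sched_b"])
cc.mark_down("sched_a")
after_failure = cc.active    # sched_b takes over
cc.mark_up("sched_a")
after_recovery = cc.active   # sched_b stays active
```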

When the active BI Scheduler fails, open client connections do not receive an error. Because the Scheduler protocol is stateless, these connections fail over seamlessly to the newly active instance.

iBots
iBot executions maintain state in the Scheduler tables. When the next instance of Scheduler becomes active, it will read the state of all job instances that were in progress, and execute them. An iBot will only deliver to those recipients that it did not deliver to prior to the failure of the primary instance.
Java, Command Line, or Script Job
The jobs will be re-executed from the beginning with a new job instance.

NOTE: Any job instance can be manually re-run from the Job Manager. For an iBot, the re-run delivers only to those users who did not have successful deliveries. For example, if the mail server goes down halfway through an iBot execution, the re-run of the instance delivers only to those recipients who did not receive email because of the mail server crash.
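
The resume behavior for iBots relies on delivery state that is stored outside the failed process. A minimal sketch, assuming a per-recipient delivered flag (the dictionary below stands in for the Scheduler tables, whose real schema is not described here, and deliver_pending is a hypothetical helper):

```python
from typing import Callable, Dict, List


def deliver_pending(
    recipients: List[str],
    delivered: Dict[str, bool],
    send: Callable[[str], bool],
) -> List[str]:
    """Deliver only to recipients not yet marked delivered; record each success."""
    sent_now: List[str] = []
    for recipient in recipients:
        if delivered.get(recipient):
            continue  # already delivered before the failure, so skip
        if send(recipient):
            delivered[recipient] = True
            sent_now.append(recipient)
    return sent_now


recipients = ["ann", "bob", "cy", "dee"]
state: Dict[str, bool] = {}  # stand-in for persisted delivery state

# First run: the (simulated) mail server accepts only the first two recipients.
first_run = deliver_pending(recipients, state, lambda r: r in {"ann", "bob"})

# Re-run after recovery: only the recipients who were missed get the delivery.
second_run = deliver_pending(recipients, state, lambda r: True)
```

Because the state outlives the run, the second call skips ann and bob and delivers only to cy and dee. A Java, command line, or script job has no such per-recipient state, which is consistent with those jobs being re-executed from the beginning.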

Cluster Controller Failure

The Cluster Controller supports detection of BI Server or BI Scheduler failures and failover for clients of failed servers.

The Cluster Controllers work on an active-passive model. All clients first attempt to connect to the Primary Cluster Controller. In the case where the Primary Cluster Controller is unavailable, clients will then connect to the Secondary Cluster Controller. The Secondary Cluster Controller then directs requests to BI Servers based on load and availability and to the active BI Scheduler instance. If the Primary later becomes available, all requests will then go to the Primary again.
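
The active-passive lookup and the routing step can be sketched as two small functions. The function names are hypothetical, and picking the BI Server with the fewest sessions is an assumption: the source says only that requests are directed based on load and availability.

```python
from typing import Dict, Optional, Set


def pick_controller(primary_up: bool, secondary_up: bool) -> Optional[str]:
    """Clients always try the Primary first and use the Secondary only as a fallback."""
    if primary_up:
        return "primary"
    if secondary_up:
        return "secondary"
    return None  # neither Cluster Controller is reachable


def pick_server(sessions: Dict[str, int], available: Set[str]) -> str:
    """Assumed policy: route to the available BI Server with the fewest sessions."""
    candidates = {name: count for name, count in sessions.items() if name in available}
    if not candidates:
        raise RuntimeError("no BI Server available")
    return min(candidates, key=lambda name: candidates[name])


# bi3 has the lowest session count but is unavailable, so bi2 is chosen.
chosen = pick_server({"bi1": 5, "bi2": 2, "bi3": 1}, {"bi1", "bi2"})
```

Because pick_controller checks the Primary first on every call, requests return to the Primary as soon as it is available again.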

The Secondary Cluster Controller monitors the session count on each BI Server just like the Primary, but does not dictate the active Scheduler unless the Primary Cluster Controller is down.

The Primary and Secondary Cluster Controllers monitor each other's life cycle. This arrangement is susceptible to a "Split-Brain" failure if communication between the Cluster Controller instances is down while each instance is still up and can communicate with the other clients. In these cases, BI Servers are not affected, but the Scheduler may have two active instances at once. In rare cases, this may lead to double execution of jobs. When the line of communication comes back up, the Primary Cluster Controller dictates to the cluster that only one Scheduler should be active. The likelihood of a Split-Brain failure is minimized by the fact that the Cluster components must exist on the same Local Area Network (LAN) and that Multi-NIC is not supported for clustered deployments.
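
The Split-Brain condition can be reduced to a toy model. The function and Scheduler names are hypothetical, and the model assumes that a Secondary that cannot see the Primary treats it as failed and starts dictating the active Scheduler itself.

```python
from typing import Set


def active_schedulers(link_up: bool, primary_choice: str, secondary_choice: str) -> Set[str]:
    """Return the set of Scheduler instances that have been told to be active."""
    if link_up:
        # The Secondary can see the Primary and defers to its decision.
        return {primary_choice}
    # Link down: the Secondary assumes the Primary failed and dictates as well.
    return {primary_choice, secondary_choice}


during_split = active_schedulers(False, "sched_a", "sched_b")  # two active instances
after_heal = active_schedulers(True, "sched_a", "sched_b")     # Primary's choice only
```

If both controllers happen to choose the same instance, the set still has one member, which matches the statement that double execution occurs only in rare cases.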

If both Cluster Controllers are unavailable, BI Presentation Services returns an error to any new user attempting to log in. Existing sessions are not affected.

Oracle® Business Intelligence Enterprise Edition Deployment Guide. Copyright © 2006, Oracle. All rights reserved.