
How Service Failures Affect the Siebel Deployment


This topic describes how major architectural components in a Siebel deployment are affected when a service failure occurs. Services include both hardware platforms and software applications.

Web Clients

Client PC hardware failures and browser crashes are the most common causes of Web Client failure. Operating system crashes can also cause Web Client failure, but they are rare. When the Web Client fails, user sessions are lost, even though the sessions usually continue running on the Siebel Server.

This is because the Siebel session cookie is usually lost along with the Web Client. Without the cookie, the user cannot be routed back to the existing user session on the Siebel Server. Therefore, the user usually needs to log in again and start a new user session.
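The cookie-based routing described above can be illustrated with a minimal Python sketch. It assumes a simple in-memory table that maps a session cookie to a server-side session; `SessionTable` and `handle_request` are hypothetical names, not Siebel APIs, and real session routing involves the Web server and the Siebel Web Server Extension rather than a single dictionary.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SessionTable:
    """Hypothetical server-side session table keyed by session cookie value."""
    sessions: Dict[str, str] = field(default_factory=dict)  # cookie -> user

    def handle_request(self, cookie: Optional[str], user: str) -> str:
        """Return the cookie of the session that serves this request."""
        # A request carrying a known cookie is routed back to its existing session.
        if cookie is not None and cookie in self.sessions:
            return cookie
        # Without the cookie, the existing session cannot be found, so a new
        # session is started (in Siebel terms, the user logs in again).
        new_cookie = uuid.uuid4().hex
        self.sessions[new_cookie] = user
        return new_cookie


table = SessionTable()
first = table.handle_request(None, "jdoe")    # initial login
same = table.handle_request(first, "jdoe")    # cookie present: same session
# The browser crashes and the cookie is lost: the next request starts over.
second = table.handle_request(None, "jdoe")
```

After the simulated crash, the table still holds the first session, which mirrors the orphaned session that keeps running on the Siebel Server while the user works in a new one.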

Web Servers

Web servers can fail because of hardware or software issues. When a Web server fails, Web Clients typically cannot access Siebel applications, because requests must go through the Web server first. Existing connections from the Web server to Siebel Servers are also lost.

If Web servers are set up for high availability, for example with multiple load-balanced Web servers, then subsequent requests can be routed to another working Web server. When this occurs, affected Web Client user sessions usually continue to function without noticeable impact.
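As a rough illustration of how multiple Web servers mask a single failure, the following Python sketch routes requests round-robin and skips servers marked as down. `WebServerPool`, `mark_down`, and `route` are hypothetical names; real deployments use a dedicated HTTP load balancer with health checks rather than this in-process loop.

```python
from itertools import cycle
from typing import List


class WebServerPool:
    """Hypothetical round-robin pool that skips Web servers marked as down."""

    def __init__(self, servers: List[str]):
        self.healthy = {s: True for s in servers}
        self._ring = cycle(servers)
        self._size = len(servers)

    def mark_down(self, server: str) -> None:
        # Stands in for a failed health check against this Web server.
        self.healthy[server] = False

    def route(self) -> str:
        # Try each server at most once per request, skipping failed ones.
        for _ in range(self._size):
            server = next(self._ring)
            if self.healthy[server]:
                return server
        raise RuntimeError("no Web server available")


pool = WebServerPool(["web1", "web2", "web3"])
pool.mark_down("web2")
targets = [pool.route() for _ in range(4)]
```

Here `targets` is `["web1", "web3", "web1", "web3"]`: requests that would have gone to web2 land on the surviving servers. If every server is marked down, `route` raises an error, which corresponds to Web Clients being unable to reach the Siebel application.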

Third-Party HTTP Load Balancer for Siebel Servers

Third-party HTTP load balancers handle communication between Web servers and Siebel Servers. Causes of failure differ significantly between hardware-based and software-based solutions. When the load balancer fails, Web Clients and Web servers that go through it cannot communicate with Siebel Servers. In most cases, network connections are also severed, and user sessions are lost.

If there are multiple, clustered load balancers, then the backup load balancer can take over. Some load balancers can fail over TCP sessions to a backup load balancer. See the vendor's load balancer documentation for details.

When the backup load balancer takes over, user sessions continue without interruption. However, user sessions are lost if either of the following occurs:

  • A Web Client makes a request while the load balancers are failing over.
  • TCP sessions are not cleaned up properly on the Web servers.
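The conditions above can be summarized as a small decision function. This Python sketch is illustrative only: `FailoverEvent` and `session_survives` are hypothetical, and the assumption that sessions are lost whenever the load balancer cannot fail over TCP sessions is an inference from the preceding paragraphs, not a vendor specification.

```python
from dataclasses import dataclass


@dataclass
class FailoverEvent:
    """Hypothetical description of one user session during a load balancer failover."""
    lb_replicates_tcp_sessions: bool  # vendor feature: TCP sessions fail over to the backup
    request_during_failover: bool     # client sent a request while failover was in progress
    stale_tcp_on_web_server: bool     # Web server did not clean up the old TCP session


def session_survives(event: FailoverEvent) -> bool:
    """Return True if the user session is expected to continue after failover."""
    if not event.lb_replicates_tcp_sessions:
        # Assumption: without TCP session failover, connections are severed.
        return False
    # Even with TCP session failover, either listed condition loses the session.
    return not (event.request_during_failover or event.stale_tcp_on_web_server)
```

For example, `session_survives(FailoverEvent(True, False, False))` is `True`, while setting either of the last two flags makes it `False`.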

Siebel Servers

Siebel Servers can fail because of hardware or software issues. If the hardware platform or the Siebel Server software fails, then all Siebel Server components on that server are lost.

In other cases, individual Siebel Server components may fail. This can cause related user sessions or user requests to fail. The major groups of Siebel Server components are as follows:

  • Application Object Managers (AOMs). When AOM processes terminate unexpectedly, user sessions hosted by the AOM are lost. Users must log in to the Siebel application again.

    If users return to the same Siebel Server, SCBroker tries to route the user request to a running AOM process.

    If there is only one AOM process and it has failed, then the request is directed to a different Siebel Server, unless there is only one Siebel Server.

    If AutoStart is enabled, then the Siebel Server process tries to restart the terminated AOM process. If successful, the new AOM process can host new user sessions.

  • Batch-mode server components going through SRBroker. Most batch-mode server components receive server requests through SRBroker. An example is Workflow Manager. When a batch-mode component fails, the current server request fails as follows:
    • Synchronous server requests. An error is returned to the requesting component.
    • Asynchronous server requests. An error is logged but not returned to the requesting component.

    Subsequent requests for the failed batch-mode component are attempted against either a different instance of the component on the same Siebel Server or an instance of it on a different Siebel Server.

    If no instance of the batch-mode component is available, then the request is logged to the S_SRM_REQUEST table to be processed later.

  • Direct Object Manager requests. Examples include requests to the Siebel Configurator Object Manager and communication between AOMs and the Reports Server. Some of these components, such as the Reports Server and Configurator, have a native failover mechanism.

  • Other server components with location restrictions. Some specialized server components do not communicate through SRBroker; Siebel Remote Server is an example. Typically, requests to these components can be processed only by a specific Siebel Server. Therefore, if that server fails, requests to it fail until the server is restarted.
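The routing and fallback rules in the list above can be sketched in Python as follows. Everything here is hypothetical: `SiebelServer`, `route_login`, `report_failure`, and `dispatch_batch` are illustrative names rather than Siebel APIs, the `pending` list stands in for rows in the S_SRM_REQUEST table, and the order in which `dispatch_batch` tries servers (same server first) is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SiebelServer:
    """Hypothetical view of one Siebel Server's running processes."""
    name: str
    aom_processes_running: int
    batch_instances: Dict[str, int] = field(default_factory=dict)  # component -> live instances


def route_login(preferred: SiebelServer, servers: List[SiebelServer]) -> Optional[str]:
    """Choose a Siebel Server for a new AOM user session."""
    # SCBroker first tries a running AOM process on the same Siebel Server.
    if preferred.aom_processes_running > 0:
        return preferred.name
    # Otherwise the request goes to a different Siebel Server, if one is available.
    for server in servers:
        if server is not preferred and server.aom_processes_running > 0:
            return server.name
    return None  # for example, a single Siebel Server whose only AOM process failed


def report_failure(synchronous: bool) -> str:
    """Describe how a batch component failure surfaces for the current request."""
    # Synchronous requests return the error; asynchronous requests only log it.
    return "error returned to requester" if synchronous else "error logged only"


def dispatch_batch(component: str, origin: SiebelServer,
                   servers: List[SiebelServer], pending: List[str]) -> Optional[str]:
    """Send a subsequent batch request to a live instance, or queue it for later."""
    # Assumed order: the same Siebel Server first, then the other servers.
    candidates = [origin] + [s for s in servers if s is not origin]
    for server in candidates:
        if server.batch_instances.get(component, 0) > 0:
            return server.name
    # No instance is available: keep the request for later processing.
    pending.append(component)
    return None


app1 = SiebelServer("app1", aom_processes_running=0, batch_instances={"Workflow": 0})
app2 = SiebelServer("app2", aom_processes_running=2, batch_instances={"Workflow": 1})
pending: List[str] = []
```

With these values, a login that prefers app1 is routed to app2, and a Workflow request from app1 is dispatched to app2; if app2 also had no Workflow instance, the request would be appended to `pending` instead.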

Siebel Database

Access to the Siebel Database can fail due to a number of factors:

  • Database server hardware failure
  • Database server running out of resources
  • Disk failure
  • Network failure

The impact on the Siebel deployment can be either temporary or long-term. For example, a brief network interruption or a quick database server reboot results in a temporary disruption in service. A long-term interruption can occur when there is database corruption or a major server malfunction.

In general, user sessions are lost when there is a Siebel Database service interruption, and users must log in to the system again. Object Manager sessions continue trying to connect to the database; once the database is running again, and provided that the connection retry count has not been exceeded, the connection succeeds. Users should not notice the outage unless they are working at the time of the database failure; in that case, they get database error messages.

If the interruption is temporary, interactive server components and most batch-mode server components try to reconnect to the Siebel Database.

If the interruption is long-term, the Siebel deployment must be shut down and restarted once the database service is restored.
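The reconnect behavior described above follows a familiar bounded-retry pattern. The Python sketch below assumes a `connect` callable that raises `ConnectionError` while the database is unreachable; `connect_with_retry`, `retry_count`, `delay_seconds`, and the `flaky` test helper are hypothetical names and do not correspond to specific Siebel parameters.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], retry_count: int,
                       delay_seconds: float = 0.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Make one attempt plus up to retry_count retries, then give up."""
    attempts = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if attempts >= retry_count:
                # Retry count exceeded: treat the outage as long-term and give up.
                raise
            attempts += 1
            sleep(delay_seconds)


def flaky(failures: int) -> Callable[[], str]:
    """Hypothetical database that is unreachable for the first `failures` attempts."""
    state = {"left": failures}

    def connect() -> str:
        if state["left"] > 0:
            state["left"] -= 1
            raise ConnectionError("database unreachable")
        return "connected"

    return connect
```

With `retry_count=3`, a database that is down for two attempts is reached on the third (a temporary interruption). With `retry_count=1`, the same outage exhausts the retries and the error propagates, which corresponds to the long-term case that requires a restart.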

Impact of Service Failures

Table 6 summarizes the impact of service failures on the Siebel deployment, including specific services not already covered in this topic.

Table 6. How Service Failures Affect the Siebel Deployment

In this table, each failed service is followed by the components it affects; the impact on each component follows the colon.

Gateway Name Server
  • Siebel Server components and Siebel Configurator Object Manager: You cannot start or add any new components. Users can continue to log in to and out of Siebel applications. Existing user sessions are not interrupted, and server requests continue to be processed successfully, with the exceptions listed below.
  • Server administration functions: Unavailable.
  • Siebel Reports Server and report functionality: If connection information has been cached, the Reports Server can still be called. By default, connection information is cached when the connection is made.
  • Siebel Configurator Object Manager: You can still launch product configurator sessions, as long as the connection information has been cached. By default, the connection information is cached when the first connection is made.
  • Name Server database (siebns.dat): This database maintains server configuration information for the Siebel Enterprise Server. If this database is corrupted or lost, you must reinstall all Siebel Servers.

Siebel Server
  • AOM components: The Siebel application is unavailable.

    Siebel Connection Broker (SCBroker) failure: You cannot create new user sessions. If the SISNAPI connection between the Web server and the Object Manager fails, SWSE retries the connection. If the connection is still not available after a certain number of attempts, the connection fails completely and the user gets an error message.

    Existing user sessions are unaffected by SCBroker failures.
  • EAI: The interface to external applications is unavailable.
  • Batch components: Loss of functionality (components such as Assignment Manager or Workflow cannot process server requests).

File System
  • Attachments: Unavailable.
  • Correspondence: Unavailable.
  • Shared user preference files: Unavailable.
  • Docking transaction files from EIM: Unavailable.
  • Email Response: Unable to process inbound messages. Unable to send outbound messages with attachments.

File System Manager (FSM)
  • Components that access the FSM: Current requests fail.
  • Attachments: Unavailable.

Web server
  • Siebel Web Clients accessing Application Object Managers (AOMs): The Siebel application is unavailable to Web Clients. Mobile Web Clients are unaffected.
  • EAI inbound HTTP Adaptor: Unavailable.

Siebel database
  • Client access, background tasks, batch tasks: Siebel Business Applications cannot be accessed, and the Siebel Enterprise Server cannot function. The Mobile Web Client is the only client that is not immediately affected by a Siebel Database failure.
  • Batch and interactive components: Unavailable.
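Several Gateway Name Server entries in Table 6 depend on cached connection information. The following Python sketch, assuming a hypothetical `lookup` callable that raises `ConnectionError` while the Gateway Name Server is down, shows why a component that was already contacted stays reachable while a new lookup fails. `ConnectionInfoCache`, `lookup`, and the `app1/...` connection strings are illustrative only.

```python
from typing import Callable, Dict


class ConnectionInfoCache:
    """Hypothetical client-side cache of component connection information."""

    def __init__(self, lookup_from_gateway: Callable[[str], str]):
        self._lookup = lookup_from_gateway
        self._cache: Dict[str, str] = {}

    def resolve(self, component: str) -> str:
        # Cached entries are served without contacting the Gateway Name Server.
        if component in self._cache:
            return self._cache[component]
        # Cache miss: this lookup fails if the Gateway Name Server is down.
        info = self._lookup(component)
        # Cache the result when the connection is first made.
        self._cache[component] = info
        return info


gateway_up = {"value": True}


def lookup(component: str) -> str:
    """Stand-in for a Gateway Name Server query; the address format is made up."""
    if not gateway_up["value"]:
        raise ConnectionError("Gateway Name Server unavailable")
    return f"app1/{component}"


cache = ConnectionInfoCache(lookup)
cache.resolve("ConfiguratorOM")  # first connection while the Gateway is up
gateway_up["value"] = False      # the Gateway Name Server fails
```

After the simulated failure, `cache.resolve("ConfiguratorOM")` still succeeds from the cache, while resolving a component that was never contacted raises `ConnectionError`.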

Specific Failures and Associated Impact

Siebel Systems benchmarked potential failure scenarios in a running Siebel deployment. Table 8 summarizes the test results and the associated impact on the tested Siebel environment.

NOTE:  All tests were conducted in a test lab. Actual results in production may differ due to the complexity of a production environment.

Test Environment

The test environment included multiple load-balanced Web servers, with Web server load balancing provided by a hardware-based HTTP load balancer. Multiple Application Object Manager servers were deployed, with load balancing provided by either Siebel native load balancing or third-party HTTP load balancers, depending on the test scenario. Multiple batch component application servers were deployed, with request distribution provided by Service Request Broker (SRB) and Service Request Processor (SRP). Product Configurator Object Manager servers were load balanced by the load balancing scheme that Configurator provides. A clustered pair of database servers and a clustered Siebel Gateway Name Server were also deployed.

Table 8. Specific Failures and Associated Impact
Each entry lists the component tested, the failure scenario, and the observed behavior.

Siebel Database Server

Observe system behavior while driving server CPU load to 100%.

  • Significant response time impact.
  • No failures were observed.

Siebel Object Manager (eChannel)

Observe system behavior while driving server CPU load to 100%.

  • Minor response time impact.
  • No failures were observed.

Web Server

Observe system behavior while driving server CPU load to 100%.

  • Negligible response time impact.
  • No failures were observed.

Workflow Server

Observe system behavior while driving server CPU load to 100%.

  • Negligible response time impact.
  • No failures were observed.

Siebel Object Manager (eChannel)

Observe system behavior while server memory consumption is at 100%.

  • Significant response time impact.
  • Increased CPU usage and context switching were observed.
  • A few login failures were observed when attempting to log in additional users.

Workflow Server

Observe system behavior while server memory consumption is at 100%.

  • Major response time impact.
  • Increased CPU usage and context switching were observed.
  • A few login failures were observed when attempting to log in additional users.

Siebel Object Manager (eChannel)

Observe system behavior when all available disk space is consumed on the tested server.

  • Minor response time impact in some transactions.
  • Major response time impact when logging in new users.
  • No failures were observed.

Workflow Server

Observe system behavior when all available disk space is consumed on the tested server.

  • Significant impact on Workflow transaction response time.
  • Significant response time impact when additional users logged in.
  • Negligible increase in CPU and context switching.
  • No failures were observed.

SCBroker

Simulate a process crash for various task-based server components while the server is handling both synchronous and asynchronous server requests. Also note system recovery after bringing the process back up.

  • SCBroker automatically restarts upon receiving an SEGV signal; a new SCBroker process was started when the signal was received.
  • No failures were observed.

SRBroker

Simulate a process crash for various task-based server components while the server is handling both synchronous and asynchronous server requests. Also note system recovery after bringing the process back up.

  • SRBroker does not automatically restart upon receiving an SEGV signal.
  • When eScript invokes a workflow (WF), users get a "no server connect string" error message.
  • Failures were observed in this step.

WFProcMgr (Workflow)

Simulate a process crash for various task-based server components while the server is handling both synchronous and asynchronous server requests. Also note system recovery after bringing the process back up.

  • Shutting down WFProcMgr on one server caused a few failures initially; the system then stabilized with no further failures.
  • CPU and memory activity increased on the server still running WFProcMgr.
  • When the WFProcMgr on the other server was also shut down, many failures resulted.
  • After one WFProcMgr was brought back up, no further failures were observed.

Siebel Gateway Name Server

Simulate the failure of the Siebel Gateway Name Server.

  • srvrmgr could not connect, but transactions continued to pass.
  • After 100 more users were added, srvrmgr still could not connect, but no errors were observed.
  • Even after the Gateway Name Server was restarted, connecting to the Gateway Name Server still failed.

Siebel Object Manager

Consume all available tasks on an Object Manager and observe the result.

  • When MaxTasks is reached on an Object Manager, new sessions fail over to another Object Manager, as expected.
  • When all Object Managers are out of tasks, the user receives a "server busy" error message.
  • When some users log out, new users can connect to servers again.

Siebel Object Manager

Simulate resource leaks while server recycling is enabled, and verify how process recycling works under load.

  • A new Object Manager is created when MemoryLimit is reached.
  • The old Object Manager remains instantiated for a period of time, even when no more users are running on it, but it is eventually recycled.
  • When MemoryLimitPercent is reached, the whole component restarts, and all traffic goes to the other server.

Siebel Object Manager

Simulate applying a new SRF (simple SRF and browser script changes) and stopping and restarting each server.

  • Browser scripts are updated when the user visits a URL (as documented in Siebel Bookshelf).
  • The user who visits the URL hangs, even after the browser scripts are updated.

Siebel Object Manager

Simulate a hanging thread or process.

  • Users can still log in to the application server that has the spinning process.
  • After a hanging thread was simulated, 100 additional users were added. Killing the spinning process then caused about 40 running users to fail (the number of users on that Object Manager). This implies:
    • Object Managers with hanging threads still receive new connections.
    • You cannot safely kill a spinning process unless you take the whole component group or server offline or shut it down.
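The MaxTasks test in Table 8 describes new logins moving to another Object Manager when one is full, and a "server busy" response when all of them are full. The Python sketch below is an assumption-laden model of that behavior: `ObjectManager`, `assign_session`, and the `"server busy"` return value are hypothetical, and Siebel load balancing does not necessarily pick the first Object Manager with free capacity as this sketch does.

```python
from typing import List


class ObjectManager:
    """Hypothetical Object Manager with a MaxTasks-style limit on sessions."""

    def __init__(self, name: str, max_tasks: int):
        self.name = name
        self.max_tasks = max_tasks
        self.active = 0

    def logout(self) -> None:
        # A departing user frees one task slot.
        self.active -= 1


def assign_session(oms: List[ObjectManager]) -> str:
    """Place a new login on the first Object Manager below its task limit."""
    for om in oms:
        if om.active < om.max_tasks:
            om.active += 1
            return om.name
    # Every Object Manager is out of tasks.
    return "server busy"


oms = [ObjectManager("om1", max_tasks=1), ObjectManager("om2", max_tasks=1)]
logins = [assign_session(oms) for _ in range(3)]
oms[0].logout()
after_logout = assign_session(oms)
```

Here `logins` is `["om1", "om2", "server busy"]`, and once a user logs out of om1, the next login succeeds on om1 again, matching the observed recovery when some users log out.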