
Extension SDK for BEA WebLogic Network Gatekeeper


High availability

The following sections contain descriptions of high-availability aspects for extensions to WebLogic Network Gatekeeper:

- Introduction
- Plug-in Manager and SC Manager
- SC Manager
- SESPA and ESPA

Introduction

HA is important for both incoming and outgoing traffic, and it is handled mostly by the Plug-in Manager and the SC Manager. HA and load balancing between the underlying platform and Network Gatekeeper are managed by the SC Manager for northbound traffic and by the Plug-in Manager for southbound traffic. Recovery and distribution mechanisms apply only to newly created sessions: because there is no redundancy at the session level, ongoing sessions keep the reference already established between Network Gatekeeper and the underlying platform.

Network Gatekeeper normally runs on a cluster of servers operating in parallel to support high availability. The number of servers required ranges from two to eight, depending on the configuration. All SCs run on all machines and contain exactly the same information, but each has its own active sessions. Each server running SCs also executes a Plug-in Manager and an SC Manager, all synchronized and to be treated as equals. If one server crashes, applications can therefore continue to use the system uninterrupted, although all sessions that were active on the faulty machine are lost.

 


Plug-in Manager and SC Manager

Each Network Gatekeeper server has its own instance of the Plug-in Manager and the SC Manager. All these instances share the same information, so it makes no difference which instance is used. Once one manager is obtained, references to all other managers can be acquired, using the getAllResourceManagers method in the ResourceManager interface and the getAllSCSManagers method in the SCSMgr interface.

An external plug-in could poll the Plug-in Manager and the SC Manager at regular intervals to see if additional managers have started.
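The following sketch shows one way such a poll could be implemented. It is an illustration only: it assumes the SCSMgr and ResourceManager references have already been obtained, and it simplifies the return types of getAllSCSManagers and getAllResourceManagers to Object arrays; consult the actual interfaces for the real types.

import java.util.Timer;
import java.util.TimerTask;

// Illustrative poller only. The return types of getAllSCSManagers() and
// getAllResourceManagers() are simplified here; see the SCSMgr and
// ResourceManager interfaces for the actual signatures.
public class ManagerPoller {
   private final Timer timer = new Timer(true);

   public void start(final SCSMgr scsMgr, final ResourceManager pluginMgr) {
      timer.schedule(new TimerTask() {
         public void run() {
            try {
               // Refresh the view of all synchronized manager instances.
               Object[] scManagers = scsMgr.getAllSCSManagers();
               Object[] pluginManagers = pluginMgr.getAllResourceManagers();
               updateLocalView(scManagers, pluginManagers);
            } catch (Exception e) {
               // A failed poll is not fatal; retry on the next tick.
            }
         }
      }, 0L, 30000L); // poll every 30 seconds
   }

   private void updateLocalView(Object[] scManagers, Object[] pluginManagers) {
      // Plug-in specific: add any newly started managers to the local lists.
   }
}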

 


SC Manager

All plug-ins capable of detecting network-triggered sessions can use the SC Manager to obtain references to SCs.

Each Network Gatekeeper SLEE executes an instance of the SC Manager. All instances within one Network Gatekeeper node are synchronized and are to be treated as equals. Upon startup of a Network Gatekeeper SLEE, all SCs dealing with network-triggered sessions register their callback interfaces in the SC Manager executing in the same SLEE, and the change is propagated to all other SC Manager instances.

Plug-ins using SC Manager

If no SC is found to be active, or if all of them are under severe overload, the SC Manager raises an SCSMgmtException from the getSCS method call. Under such a condition the plug-in should abort the dialogue, since no suitable SC is available in the Network Gatekeeper cluster.

An SC returned by the SC Manager has always been checked and found working; however, something might happen to it during the time it takes the plug-in to invoke the reportNotification method. Under such a condition the plug-in can either call the getSCS method again or abort the dialogue.

There is a pinging mechanism between the SC Manager and the SCs. If an SC is found to be unreachable, it is put in a "zombie list" maintained by the SC Manager. All entries in the zombie list are checked periodically by the SC Manager, and zombies that are found working again are put back in the list of active SCs. This mechanism deals with the case where network connectivity between Network Gatekeeper hosts is lost for some time.

In the case of inactivity, the plug-in can periodically check that the SC Manager still exists by invoking the standard CORBA operation _non_existent() on the SCSManager reference. The behavior of this check can differ between ORBs, but since the plug-in and the SC Manager use the same ORB this is not a problem here.
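The check could look like the following sketch; it assumes that scsManager is a CORBA object reference obtained earlier.

// Liveness check on an SC Manager reference (sketch).
boolean managerGone;
try {
   managerGone = ((org.omg.CORBA.Object) scsManager)._non_existent();
} catch (org.omg.CORBA.SystemException se) {
   // Treat a communication failure the same way as a non-existent reference.
   managerGone = true;
}
if (managerGone) {
   // Discard the stale reference and acquire a manager instance again.
}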

Failure on notification reporting

If the plug-in gets a CORBA exception on reportNotification towards an SC, it must consider the type of exception.

There is one condition that should trigger the plug-in to retry with another SC, either by invoking SCSDiscovery.getSCS or by shifting to another SC if it uses the directly registered callbacks: the received exception is an org.omg.CORBA.SystemException and its completion status indicates org.omg.CORBA.CompletionStatus.COMPLETED_NO. The only way to guarantee that a broken network connection will not result in a lost relationship between Network Gatekeeper and the underlying platform for that SC is to put the SC in a zombie list and perform regular isActive checks on it.

Listing 6-1 Examining the type of exception

if ( ex instanceof org.omg.CORBA.SystemException ) {
   org.omg.CORBA.SystemException coSyEx = (org.omg.CORBA.SystemException) ex;
   // COMPLETED_NO means the request never reached the SC, so it is safe to retry.
   if ( coSyEx.completed == org.omg.CORBA.CompletionStatus.COMPLETED_NO )
      retry = true;
}

If the completion status indicates COMPLETED_YES or COMPLETED_MAYBE, the plug-in cannot know whether the notification has been handled, and it should therefore treat the call normally. The plug-in should either start an activity supervision timer on the call, which expires after a certain time if Network Gatekeeper performs no action on the call, or rely on supervision timers in the MSC, which cause a TC_ABORT from the MSC after some time.
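Putting these rules together, the handling around reportNotification could look like the following sketch. The variable names, the getSCS argument, and the startSupervisionTimer method are placeholders for plug-in specific code; only the completion-status logic reflects the rules described above.

try {
   sc.reportNotification(notification);
   // Treat the call normally and guard it with an activity supervision timer.
   startSupervisionTimer(notification);
} catch (org.omg.CORBA.SystemException se) {
   if (se.completed == org.omg.CORBA.CompletionStatus.COMPLETED_NO) {
      // The request never reached the SC; it is safe to retry towards another SC.
      sc = scsManager.getSCS(serviceType);
      sc.reportNotification(notification);
      startSupervisionTimer(notification);
   } else {
      // COMPLETED_YES or COMPLETED_MAYBE: the SC may already have handled the
      // notification, so do not retry. Rely on the supervision timer (or on the
      // MSC timers that eventually cause a TC_ABORT).
      startSupervisionTimer(notification);
   }
}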

Incoming traffic

This section describes HA for incoming traffic, that is, traffic from the telecom network.

When a network-triggered event that should be sent to an SC is received, the SC Manager can be used. This manager always returns an SC that is active at the time of the request.

Despite this, the SC may crash immediately after its reference was obtained. In this case, the plug-in has to retrieve a new SC using the SC Manager, which detects the error and returns another working SC instance, if one exists.

The plug-in can also use the callback interfaces that the SCs register directly in the plug-in. If the plug-in detects an error that is not transient, the faulty listener should be removed; the SC registers as a listener again once it is activated or restarted. On transient errors the plug-in should keep the callback interface and try to reuse it in subsequent calls. After several repeated errors the interface may be discarded in this case as well.
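A plug-in that manages directly registered callbacks could handle errors along the lines of the following sketch. The listener collection, the callback invocation, the transient-error classification, and the error counter are plug-in specific placeholders.

private void notifyListener(Object listener, Object notification) {
   try {
      invokeCallback(listener, notification);   // plug-in specific callback invocation
      errorCounts.remove(listener);             // a successful call resets the error count
   } catch (org.omg.CORBA.SystemException se) {
      if (!isTransient(se)) {
         // Non-transient error: remove the faulty listener. The SC registers
         // again once it is activated or restarted.
         listeners.remove(listener);
      } else if (incrementErrorCount(listener) > MAX_ERRORS) {
         // Several repeated transient errors: discard the interface anyway.
         listeners.remove(listener);
      }
      // Otherwise keep the callback interface and reuse it on subsequent calls.
   }
}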

If no SCs are available, or if an error (for example, a CORBA system exception) is encountered in an active session, the plug-in must take its own default action and also destroy all objects related to that session.

Outgoing traffic

This section describes HA for outgoing traffic, that is, traffic from the SCs to the plug-ins.

Outgoing traffic works in a similar way to incoming traffic. In this case the Plug-in Manager is responsible for delivering plug-ins to the SCs. If the Plug-in Manager detects an error in a plug-in (for example, a CORBA system exception), it removes that plug-in.

In the same way that several SCs may run in parallel, plug-ins can also run in parallel to allow the service to be used uninterrupted.

 


SESPA and ESPA

If a SESPA SC loses contact with the ESPA SC it is currently using, an automatic HA switch is performed for the ESPA session object and the ESPA Manager object. This is achieved by the SESPA SC, which registers a proxy to the SESPA object implementing the SESPA interface in the SLEE Common Loader. See Stateless adapter framework.

Below is an example of how the SESPA SC registers the proxy object in this HA handler.

Listing 6-2 Registering an object in the HA Handler

m_haHandler = HAHandler.createInstance(m_sc,
                                       this,
                                       null);
m_haProxy = (com.incomit.sespa.myservicecapability.MyServiceCapability)
   m_haHandler.getHAProxy(com.incomit.sespa.myservicecapability.MyServiceCapability.class);

The HA handler is obtained with the Service Context and the object implementing the SESPA SC provided as arguments. An additional parameter, a custom recovery manager, can also be provided; this is discussed later in this section.

When the HA handler has been retrieved, the class object representing the SESPA interface is provided to the HA handler, and an HA proxy object is returned. This HA proxy object is added to the SLEE Common Loader as described in Stateless adapter framework.

When an HA switch is performed between SESPA and ESPA, it is transparent to the SESPA, so the objects representing the ESPA session and the ESPA manager can still be used after the switch. If the SESPA implementation has objects that were created using either the session or the manager object, these are not automatically restored; the SESPA SC must implement this recovery functionality itself. The object that performs the recovery, the service specific recovery manager, is provided as a parameter to the createInstance(...) method. The recoverSession(...) method is called on the service specific recovery manager after the recovery manager has restored the ESPA Session and Manager objects.

An example of when to use a service specific recovery manager is a SESPA implementation on top of ESPA Messaging that keeps track of opened mailboxes and reopens them after an HA switch.
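A hypothetical sketch of such a recovery manager is shown below. The actual interface it must implement and the exact recoverSession(...) signature are defined by the SESPA framework and are not shown here; the class and helper names are placeholders that only illustrate the intent of reopening service specific state after the framework has restored the ESPA Session and Manager objects.

// Hypothetical recovery manager for a messaging SESPA SC; names are placeholders.
m_recoveryManager = new MessagingRecoveryManager(m_openMailboxes);
m_haHandler = HAHandler.createInstance(m_sc,
                                       this,
                                       m_recoveryManager); // instead of null

class MessagingRecoveryManager {
   private final java.util.List m_openMailboxes;

   MessagingRecoveryManager(java.util.List openMailboxes) {
      m_openMailboxes = openMailboxes;
   }

   public void recoverSession() {
      // Called after the ESPA Session and Manager objects have been restored.
      // Reopen each mailbox that was open before the HA switch.
      for (java.util.Iterator it = m_openMailboxes.iterator(); it.hasNext();) {
         reopenMailbox(it.next());
      }
   }

   private void reopenMailbox(Object mailbox) {
      // Service specific: open the mailbox again using the restored ESPA objects.
   }
}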

 
