Deployment Guide

Provisioning Host Computers

This chapter summarizes host requirements for deployment components.

The purpose of this chapter is to help you provision host computers for components you plan to deploy in a production environment.

Component-Host Worksheets provides example worksheets that characterize host requirements for typical deployment scenarios. Use those worksheets to record decisions for your deployments.

This chapter includes the following topics:

  • Component Host Requirements
  • Optimization Strategies
  • Load Balancing
  • Scaling Using Federated Portals

BEA provides the following resources to help you make these decisions:

  • Consulting Services: Get advice from the experts.
  • Knowledge Base: Register and log in to the Support Center, then search the Knowledge Base for capacity-planning topics such as "Load Balancing" and "Failover".

Component Host Requirements

The following table provides guidelines for provisioning host computers for ALI components.

Portal Service
Minimum
Refer to Evaluating Hardware for the Portal Component, for guidance in determining adequate hardware.
Recommended
  • Dual processor, 1 GHz or greater
  • 2 GB RAM
  • Separate host, or share with the Administrative Portal and/or Image Service.
Scaling Guide
For large deployments, install multiple Portal components and configure load balancing and failover.
Portal Load Balancing
The portal can be used with any load-balancing system that supports sticky IPs, such as Cisco LocalDirector, F5 BIG-IP, and Windows NLB. Session state is maintained on the ALUI Web servers themselves; therefore, if a Web server is taken out of the Web farm, sessions on that server are lost. Users who have not set their Web browsers to Remember My Password must log back in to the portal.
It is possible for the portal to become unresponsive while the Web site remains operational. In that case, a load balancer that checks only the Web site assumes the portal is still operational and continues to send it requests, so the load balancer should perform content verification to ensure that the portal is actually available. (A health-check configuration sketch appears at the end of this section.)
Because users exercise the Portal component in different ways, the load balancer should send requests to the computer with the most available resources rather than simply performing a round-robin distribution of requests.
For maximum fault tolerance, BEA recommends that load balancers be clustered, so if one load balancer fails, another will continue to distribute requests. Consult manufacturer guidelines on clustering load balancers.
Security Guide
Separate the Portal component from other system components to increase security. When you separate the Portal component and other components, persistent data (search and database) and back-end tasks (Automation Service) are not on the same computer.
If you run .NET portals in extranet environments, install the Portal component on its own computer and place that computer in a DMZ; install all other components (except the Image Service) behind the internal firewall.
Administrative Portal
Minimum
  • Can also function as a Portal component in a Web farm.
  • Can be installed on the same host as the Portal component and/or Image Service.
  • If not also functioning as a Portal component, can be on the same host as the Automation Service.
Recommended
Dedicate a CPU. Some administrative actions are CPU-intensive.
Scaling Guide
Install one Administrative Portal for your deployment.
Security Guide
If you prefer, you can install the Administrative Portal on a separate host that is located in a physical environment that only the ALI administrator can access.
Image Service
Minimum
256 MB RAM
Can be installed on the same host computer as the Portal component.
Recommended
512 MB RAM
More processing power is required if you use SSL or compression.
Scaling Guide
You can install one or many Image Service instances. Typically, you install one instance and specify this location when you install other ALUI components.
Security Guide
The Image Service contains static content that is typically not sensitive. Therefore, it is not imperative that you install the Image Service behind a firewall.
Document Repository Service
Minimum
256 MB RAM
Recommended
  • 512 MB RAM
  • Fault tolerant disk for doc storage.
Scaling Guide
Multiple instances of the Document Repository Service can be load balanced and failed over using IP load balancing, such as NLB or a hardware load balancer. This also provides partial failover for the Document Repository Service. However, the host for the Document Repository Service requires a single writable file-system backing store. This backing store cannot be load balanced, but it can be failed over with one of the following:
  • A shared local disk, failed over via MSCS
  • An external shared network drive, implemented using either NAS or MSCS
Security Guide
Install the Document Repository Service behind a firewall and restrict access so that only computers that host ALUI components can access the Document Repository Service host. End users do not need access to the Document Repository Service host.
In Windows deployments, the Document Repository Service runs as a Windows service.
In UNIX or Linux deployments, the Document Repository Service runs as a daemon or console process.
Automation Service
Minimum
Must be on a separate host from the Portal component; otherwise, you must schedule all jobs to run during off-peak hours.
Recommended
  • Dual processor, 1 GHz or greater
  • 1 GB RAM
  • Separate host, or share with the Administrative Portal and/or Image Service.
Scaling Guide
If you anticipate intensive use of Identity Services and Content Services jobs, install multiple Automation Services and configure load balancing. However, because Search performs document indexing and cannot be horizontally scaled, adding multiple Automation Services for the sole purpose of crawling content does not greatly improve system performance.
Automation Service Load Balancing
Automation Services do not require any special technology for load balancing or failover. Installing multiple instances of the Automation Service in a portal system provides load balancing, because jobs can be designated to run on any set of available servers. If a server fails mid-job, the job does not complete on another server; however, jobs are typically scheduled to recur, and the next run of any standard ALI job completes the processing.
Automation Services can be load balanced by registering job folders to multiple Automation Services. The Automation Services poll the database and pick up the next available job. Should one Automation Service fail, another Automation Service will run the necessary jobs.
Security Guide
Install the Automation Service behind a firewall and restrict access so that only computers that host ALI components can access the Automation Service host.
Search
Minimum
  • Small (up to 250,000 documents): dual CPU, 2 GB RAM
  • Medium (up to 500,000 documents): dual CPU, 4 GB RAM
  • Larger: 64-bit Solaris or AIX host; dual CPU, 1.2 GHz or greater; 4-8 GB RAM
Recommended
64-bit Solaris or AIX host; dual CPU, 1.2 GHz or greater; 4-8 GB RAM
Scaling Guide
The number of users the component can support is directly proportional to its CPU capacity.
Indexing speed is proportional to the speed of an individual CPU.
RAM supports internal caching done by Search. RAM requirements are proportional to the size and number of documents indexed.
Search Load Balancing
You can improve performance by installing multiple Search instances and dedicating one instance for indexing jobs and the remaining instances for serving queries.
The Search indexing instance cannot be load balanced and does not support failover. If you experience performance problems with the Search indexing instance, enhance host capacity.
You can implement three levels of load capacity, with increasing deployment complexity:
  1. Single server performing both indexing and serving queries.
  2. One server performing indexing, with another taking the query load.
  3. One server performing indexing, with two or more taking the query load. The query servers can be proxied through a third-party load balancer.
Failover
You can implement three levels of failover, with increasing deployment complexity:
  1. Single server. With a single server performing both indexing and querying, an additional server can be configured for query failover. The failover server will NOT provide load balancing in this configuration.
  2. Two servers. With two servers splitting indexing and querying, an additional server can be configured for query failover. The failover server will NOT provide load balancing in this configuration.
  3. Externally managed query server pool. The third party load balancer will provide failover to other servers in the pool.
Security Guide
Install Search behind a firewall and restrict access so that only computers that host ALI components can access the Search host.
In Windows deployments, Search runs as a Windows service.
In UNIX and Linux deployments, Search runs as a daemon.
Search connects to other components through TCP.
Analytics
Minimum
  • Dual processor, 1 GHz
  • 1 GB RAM
Recommended
Install on a separate host from the Portal component.
Scaling Guide
Install one Analytics instance.
Security Guide
Enable Unicast UDP on port 31314 for communication between Analytics and the Portal component.
End-user access to Analytics is gatewayed by the Portal component, so the Analytics host computer can reside behind a DMZ firewall.
Collaboration
Minimum
  • Dual processor, 1 GHz
  • 1 GB RAM
Can reside on the same host computer as other components that generate portlets, such as Publisher, Studio, and Analytics.
Recommended
Install Collaboration on a separate host computer from other components to preclude contention for the JVM.
Scaling Guide
For large deployments, install multiple Collaboration instances and configure load balancing and failover. For details, refer to the Collaboration documentation.
Security Guide
End-user access to Collaboration is gatewayed by the Portal component, so the Collaboration host computer can reside behind a DMZ firewall. Collaboration connects to the Portal component through portlets via HTTP, to the ALI API Service via HTTP/SOAP, to the Collaboration database and portal database through JDBC, and to Search through TCP.
Publisher
Minimum
  • Dual processor, 1.2 GHz
  • 1 GB RAM
Can reside on the same host computer as other components that generate portlets, such as Collaboration, Studio, and Analytics.
Recommended
Install Publisher on a separate host computer from other components to preclude contention for the JVM.
Scaling Guide
Install one Publisher. If capacity is an issue, install Publisher on a separate host with premium hardware.
Security Guide
End-user access to the Publisher is gatewayed by the Portal component, so the Publisher host computer can reside behind a DMZ firewall.
Publisher connects to the Portal component through portlets via HTTP, to the ALI API Service via HTTP/SOAP, to the Publisher database and portal database through JDBC, and to Search through TCP.
Publisher publishes HTML pages and image files to a Web server (called the "publishing target") via file copy or FTP. The publishing target can be on the same host as Publisher or on a separate host.
Studio
Minimum
  • Dual processor, 1 GHz
  • 1 GB RAM
Can reside on the same host computer as other components that generate portlets, such as Collaboration, Publisher, and Analytics.
Recommended
Install Studio on a separate host computer from other components to preclude contention for the JVM.
Scaling Guide
Install one Studio. If capacity is an issue, install Studio on a separate host with premium hardware.
Security Guide
End-user access to the Studio is gatewayed by the Portal component, so the Studio host computer can reside behind a DMZ firewall.
Studio connects to the Portal component through portlets via HTTP, to the ALI API Service via HTTP/SOAP, and to the Studio database and portal database through JDBC.
ALI API Service
Minimum
Can be on the same host as a Portal component.
Recommended
Install on the same host as a Portal component, unless you want to keep the SOAP API behind a firewall.
If subject to heavy use, consider one or more separate hosts.
Scaling Guide
Install one ALI API Service.
Security Guide
If you do not want to expose the SOAP API through the extranet, install the ALI API Service on a separate host from a Portal component and locate the ALI API Service host behind a firewall.
Database Server
Minimum
  • 1 CPU, 1 GHz
  • 1 GB RAM
Recommended
  • 2-8 CPU
  • 4 GB RAM
Install on separate host computer.
Scaling Guide
Database Server Load Balancing
The database server can be made highly available using any clustering technology compatible with your database; however, scaling currently can be provided only by a larger machine. If necessary, each portal database can be placed on a separate computer and scaled separately. On Windows, database failover can be provided with Microsoft Cluster Services, and geographic load balancing and failover can be provided with SQL Server replication. However, the replication method is technically and administratively challenging and is not recommended unless availability requirements cannot be met otherwise.
Oracle databases can be deployed for high availability. ALI supports both client-side and server-side connection failover with Oracle RAC. For details, see the Knowledge Base article DA_288256, "How to configure Plumtree products to use Oracle RAC." (A generic client-side failover sketch appears at the end of this section.)
Security Guide
Install the database server behind a firewall and restrict access so that only computers that host ALI components can access the database server host. End users do not need access to the database server host.
Remote Server - Identity Services (IDS)
Minimum
  • Dual processor, 1 GHz
  • 1 GB RAM
  • 2 GB disk space
Recommended
Install on a separate host from the Portal component.
To maximize performance, install in a network location that is in close proximity to back-end components.
Scaling Guide
Install additional Automation Services, as necessary, to accommodate a large number of IDS jobs.
Security Guide
End-user access to IDS portlets is gatewayed by the Portal component, so the IDS host computer can reside behind a DMZ firewall.
Remote Server - Content Services (CS)
Minimum
Install on a separate host from the Portal component.
Recommended
To maximize performance, install in a network location that is in close proximity to back-end data sources.
Scaling Guide
Install additional Automation Services, as necessary, to accommodate a large number of CS jobs.
Security Guide
End-user access to CS portlets is gatewayed by the Portal component, so the CS host computer can reside behind a DMZ firewall.
Remote Server - Portlets
Minimum
Can share a host with other portlets and Web services.
Recommended
Install on a separate host from the Portal component.
To maximize performance, install in a network location that is in close proximity to back-end components.
Scaling Guide
In general, caching enables static portlets with minimal personalization to scale very well to any number of users. Dynamic portlets with more personalization cannot be as effectively cached and so require more processing power. If necessary, you can improve performance by installing dynamic portlets on hosts with premium hardware.
Remote Server Load Balancing
Remote servers can be load balanced using Parallel Portal Engine load balancing; refer to Load Balancing for instructions on configuring this feature. Remote servers can also be load balanced similarly to Portal components, using the same kind of load-balancing hardware.
Security Guide
End-user access to portlets is gatewayed by the Portal component, so the remote server host computer for portlets can reside behind a DMZ firewall.
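
The Portal Load Balancing guidelines above call for sticky sessions, content verification, and distribution to the least-loaded server. As an illustration only, the following sketch expresses those behaviors in HAProxy; HAProxy is not covered by this guide and merely stands in for any sticky-capable load balancer, and the probe path /portal/server.pt is a hypothetical example to replace with a real page from your portal.

    # Hypothetical HAProxy back end for a portal Web farm (illustration only)
    backend portal_farm
        balance leastconn                      # favor the least-loaded server
        stick-table type ip size 200k expire 30m
        stick on src                           # sticky IPs: session state lives on each Web server
        option httpchk GET /portal/server.pt   # content verification, not just a TCP check
        server portal1 10.10.10.1:80 check
        server portal2 10.10.10.2:80 check

A probe that fetches an actual portal page catches the failure mode described above, in which the Web server responds but the portal itself is unresponsive.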
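
Likewise, for the database tier, the Knowledge Base article cited above is the authoritative reference for configuring Oracle RAC. As a generic sketch of what client-side connect-time failover looks like in a tnsnames.ora file, with hypothetical host names, service name, and net service name:

    # Hypothetical net service entry: two RAC nodes, connect-time failover
    PORTALDB =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac-node1)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = rac-node2)(PORT = 1521))
          (LOAD_BALANCE = yes)
          (FAILOVER = on)
        )
        (CONNECT_DATA =
          (SERVICE_NAME = portaldb)
          (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC))
        )
      )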

Optimization Strategies

The following table characterizes optimization strategies you might consider when you provision computer resources for your site.

Low initial hardware cost
Organizations optimizing for low initial hardware cost seek to buy the least expensive machines necessary to make the software work reliably. Given a choice between repurposing two existing single-processor 700 MHz Pentium III servers and spending $7,500 on one dual-processor 2.4 GHz Pentium 4 server, they would choose the former.
Low hardware maintenance cost
Organizations optimizing for low hardware maintenance costs seek to reduce the number of machines needed to host the software. Because each additional computer incurs a minimum fixed cost in terms of administrative overhead, power consumption, space, and operating system license, these organizations would rather combine multiple ALUI components on a single, more powerful computer than distribute those components over multiple, less expensive machines.
High availability
Organizations optimizing for high availability are willing to spend extra money and effort to ensure that the portal and other ALUI components are available reliably to their users at all times. Such organizations typically purchase more computers and load balance them where possible, creating redundant configurations.
Low software maintenance cost
Organizations optimizing for low software maintenance cost assume that at some point in the life of the system, some part of the software will malfunction, and they seek both to lessen the chance that malfunctions will occur and lessen their impact when they do occur. Such organizations would typically purchase more individual computers to ensure that system components do not interfere with one another, and to reduce the risk that taking a computer out of the system to install new software will impact multiple system functions.
Scalability
Organizations optimizing for scalability assume that their deployments will be required to handle a large number of users. Such organizations would typically purchase extra hardware, and more expensive hardware, in order to create excess capacity in the system.
Performance
Organizations optimizing for performance seek to make their systems operate as fast as possible, especially in their ability to render pages quickly for end-users. Like organizations seeking to lower software maintenance costs, these organizations would distribute system components across a larger number of computers to ensure that each component has unrestricted access to the computing power it needs to perform its tasks the moment those tasks are called for.
Network Security
Organizations optimizing for network security seek to ensure that end-users touch only machines hosting the smallest amount of code and data. Such organizations also typically install firewalls between layers of their deployment, to ensure that if an intruder compromises one layer, the potential damage is limited. Such organizations tend to purchase more computers in order to isolate the Portal component, which end-users touch directly, from other components.

Load Balancing

The Parallel Portal Engine Load Balancer (PPE-LB) is a built-in feature that allows you to load balance your Remote Servers to make better use of the Parallel Portal Engine (PPE). PPE-LB is a solution for middle-tier HTTP messaging (between the Portal component and the Remote Servers). It provides robust failover services for high availability and eliminates the need for a third-party load-balancing solution in front of portlets. PPE-LB is designed to be as easy to configure as round-robin DNS and readily solves the proxy and SSL problems that are typically encountered with load-balancing devices in middle-tier messaging.

On the DNS server, configure the Remote Server cluster name (for example, gs.portal.company.com) to resolve to multiple IP addresses. This is similar to setting up DNS round robin, except that PPE load balancing will fail over, provide stickiness, and act as a load balancer. Each Remote Server in a cluster must have a unique IP address and must have the same software installed.

Note: Editing the hosts file on a Windows machine is not equivalent to configuring the DNS server. Windows caches and returns only the first IP address, instead of returning multiple IP addresses the way a DNS server does. If you are not able to configure the DNS server, contact Customer Support for registry settings you can add to provide equivalent functionality.

Using BIND on a UNIX DNS server, the entries should look something like this:

remoteserver		60	IN	A	10.10.10.1
remoteserver		60	IN	A	10.10.10.2
remoteserver		60	IN	A	10.10.10.3

If the domain is company.com, the DNS server should resolve the remoteserver.company.com host name to this list of IP addresses.
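
Once the records are in place, you can confirm that the DNS server returns all of the addresses for the cluster name. The following is a sketch; exact output formatting varies by platform and DNS server:

    nslookup remoteserver.company.com

    Name:       remoteserver.company.com
    Addresses:  10.10.10.1, 10.10.10.2, 10.10.10.3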

Portlet Support

Most Portlets should work correctly with PPE load balancing, but some Portlets may do in-memory caching that assumes the underlying database will not be modified by another application. Consult the Portlet documentation or Portlet developer to determine if specific Portlets can be load balanced.

PPE Load Balancing and SSL

If your Remote Servers use Secure Sockets Layer (SSL), BEA recommends creating a single SSL certificate for the cluster name and installing it on each machine in the Remote Server cluster.
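
As a minimal sketch, assuming the cluster name gs.portal.company.com used earlier in this chapter, you might generate one key and certificate request for the cluster name with OpenSSL and then install the issued certificate on every machine in the cluster. The key size and request workflow shown here are illustrative assumptions; follow your certificate authority's actual procedure.

    # Hypothetical: one key pair and signing request for the cluster name
    openssl req -new -newkey rsa:2048 -nodes \
        -keyout remoteserver.key -out remoteserver.csr \
        -subj "/CN=gs.portal.company.com"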

Verifying That PPE Load Balancing is Configured Correctly

You can verify the DNS server configuration with the nslookup tool. For example, try using nslookup on www.microsoft.com:

  1. Open a command prompt.
  2. At the prompt, run nslookup against the host name:

    nslookup www.microsoft.com

    This command returns something similar to the following lines:

    Server:  plumdc1.plumtree.com
    Address:  10.1.88.4
    Non-authoritative answer:
    Name:    www.microsoft.akadns.net
    Addresses:  207.46.197.100, 207.46.197.102, 207.46.230.218
    Aliases:  www.microsoft.com

    Notice that www.microsoft.com uses round-robin DNS with three different IP addresses.

The PPE updates itself from the DNS server. The PPE algorithm refreshes the list of IP addresses in a Remote Server cluster more frequently as load on the cluster increases; the refresh is not based on a timer. PPE starts load balancing without requiring you to restart the server.

PPE Configuration Settings

PPE is implemented using OpenHTTP. OpenHTTP settings are configured through the Portal component's /settings/common/serverconfig.xml file. The default configuration includes many built-in and internal settings for your deployment. You can configure the following additional settings (a hypothetical excerpt follows this list).

ForceHttp10
Sends HTTP/1.0 requests instead of HTTP/1.1. The sockets are closed after sending a single request.
TraceBodyAndHeaders
For debugging only. Traces the values of headers and some parts of the body of requests and responses to PTSpy. Turned off by default because headers might contain passwords in cleartext.
HttpCacheSizeMb
Defines the maximum size of the cached data. The cache uses an LRU algorithm to decide which old entries are evicted to accommodate newer data.
ConnectionCacheTimeoutSec
Defines the time that a socket remains unused in the cache before being closed by OpenHTTP.
MinimumDNSThreads
Specifies the minimum number of threads that are used to perform DNS lookups.
MaximumDNSThreads
Specifies the maximum number of threads that are used to perform DNS lookups.
ProxyURL
Specifies the URL for a proxy host.
ProxyUser
Specifies an authentication user name for the proxy connection.
ProxyPassword
Specifies an authentication password for the proxy connection.
ProxyBypass
Contains a list of hosts accessed directly instead of through the proxy.
ProxyBypassLocal
Boolean flag specifying that hosts in the same domain should not be accessed through the proxy. A host name that does not contain any dots (".") is considered local, that is, in the same DNS domain.
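
The following hypothetical excerpt suggests how a few of these settings might look in serverconfig.xml. The element names, nesting, and values shown here are illustrative assumptions, not the file's actual syntax; confirm them against the serverconfig.xml shipped with your Portal component before editing.

    <!-- Hypothetical excerpt; confirm names and nesting against your file -->
    <OpenHTTP>
      <ForceHttp10>false</ForceHttp10>
      <HttpCacheSizeMb>64</HttpCacheSizeMb>
      <ConnectionCacheTimeoutSec>120</ConnectionCacheTimeoutSec>
      <MinimumDNSThreads>2</MinimumDNSThreads>
      <MaximumDNSThreads>10</MaximumDNSThreads>
      <ProxyURL>http://proxy.company.com:8080</ProxyURL>
      <ProxyBypassLocal>true</ProxyBypassLocal>
    </OpenHTTP>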

Before you configure other OpenHTTP settings, contact BEA Professional Services.

For more information on configuring OpenHTTP, see the Support Center Knowledge Base.

External Service Load Balancing

The portal depends on many other servers and services to function, and each of these services must provide some failover.

Scaling Using Federated Portals

One way of scaling a portal is to use multiple networked (federated) portals rather than one very large portal. This is especially useful if you require more than 25-50 GB of indexed content in your Knowledge Directory. It also makes sense if disparate departments need to share some data but use mostly different portlets, communities, and content. Sometimes the politics or organization of a company's business lends itself to a federated portal solution.

Different groups can administer and control their own content separately using smaller systems that require less planning and maintenance. This can also be accomplished to some degree in a large portal system by having different departments control remote servers that serve secure content to the portal. In a federated portal system, information is shared via federated search and, possibly, shared portlets.

For these systems, identify the scaling needs of each portal in the network and decide how the portals should be connected. It is very important that the various groups agree on how content is to be shared and how much load they can expect other portals to place on their portal. Size each system as you would a single large portal, but take into account potentially higher load on shared remote servers and federated search pages.

