
3 Grid Control Common Configurations

Oracle Enterprise Manager 10g Grid Control is based on a flexible architecture, which allows you to deploy the Grid Control components in the most efficient and practical manner for your organization. This chapter describes some common configurations that demonstrate how you can deploy the Grid Control architecture in various computing environments.

This chapter presents the common configurations in a logical progression, starting with the simplest configuration and ending with a complex configuration that involves the deployment of high availability components, such as server load balancers, Oracle Real Application Clusters, and Oracle Data Guard.

This chapter contains the following sections:

  • About Common Configurations

  • Summary of the Grid Control Architecture and Components

  • Deploying Grid Control Components on a Single Host

  • Managing Multiple Hosts and Deploying a Remote Management Repository

  • Using Multiple Management Service Installations

  • High Availability Configurations

  • Configuring Oracle Enterprise Manager 10.1 Agents for Use In Active/Passive Environments

  • Using Virtual Hostnames for Active/Passive High Availability Environments in Enterprise Manager Database Control

3.1 About Common Configurations

The common configurations described in this chapter are provided as examples only. The actual Grid Control configurations that you deploy in your own environment will vary depending upon the needs of your organization.

For example, the configurations in this chapter assume you are using the OracleAS Web Cache port to access the Grid Control Console. By default, when you first install Grid Control, you display the Grid Control Console by navigating to the default OracleAS Web Cache port. However, you can modify your own configuration so that administrators bypass OracleAS Web Cache and instead use a port that connects them directly to the Oracle HTTP Server.

For another example, in a production environment you will likely want to implement firewalls and other security measures. The common configurations described in this chapter are not meant to show how firewalls and security policies should be implemented in your environment.


See Also:

Chapter 4, "Enterprise Manager Security" for information about securing the connections between Grid Control components

Chapter 5, "Configuring Enterprise Manager for Firewalls" for information about configuring firewalls between Grid Control components


Besides providing a description of common configurations, this chapter can also help you understand the architecture and flow of data among the Grid Control components. Based on this knowledge, you can make better decisions about how to configure Grid Control for your specific management requirements.

3.2 Summary of the Grid Control Architecture and Components

The Grid Control architecture consists of the following software components:

  • Oracle Management Agent

  • Oracle Management Service

  • Oracle Management Repository

  • Grid Control Console


See Also:

Oracle Enterprise Manager Concepts for more information about each of the Grid Control components

The remaining sections of this chapter describe how you can deploy these components in a variety of combinations and across a single host or multiple hosts.

3.3 Deploying Grid Control Components on a Single Host

Figure 3-1 shows how each of the Grid Control components is configured to interact when you install Grid Control on a single host. This is the default configuration that results when you select the Enterprise Manager 10g Grid Control Using a New Database installation type during the Grid Control installation procedure.

Figure 3-1 Grid Control Components Installed on a Single Host


When you install all the Grid Control components on a single host, the management data travels along the following paths:

  1. Administrators use the Grid Control Console to monitor and administer the managed targets that are discovered by the Management Agents on each host. The Grid Control Console uses the default OracleAS Web Cache port (for example, port 7777 on UNIX systems and port 80 on Windows systems) to connect to the Oracle HTTP Server. The Management Service retrieves data from the Management Repository as it is requested by the administrator using the Grid Control Console.


    See Also:

    Oracle Application Server Web Cache Administrator's Guide for more information about the benefits of using OracleAS Web Cache
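
    For example, with the default ports, the Grid Control Console URL on a UNIX host might look like the following (the host name shown is hypothetical):

    http://mgmthost1.acme.com:7777/em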

  2. The Management Agent uploads its data (which includes management data about all of the managed targets on the host, including the Management Service and the Management Repository database) by way of the Oracle HTTP Server upload URL. The Management Agent uploads data directly to the Oracle HTTP Server and bypasses OracleAS Web Cache. The default port for the upload URL is 4889 (if it is available during the installation procedure). The upload URL is defined by the REPOSITORY_URL property in the following configuration file in the Management Agent home directory:

    AGENT_HOME/sysman/config/emd.properties (UNIX)
    AGENT_HOME\sysman\config\emd.properties (Windows)
    
    

    See Also:

    "Understanding the Enterprise Manager Directory Structure" for more information about the AGENT_HOME directory

  3. The Management Service uses JDBC connections to load data into the repository database and to retrieve information from the repository so it can be displayed in the Grid Control Console. The repository connection information is defined in the following configuration file in the Management Service home directory:

    ORACLE_HOME/sysman/config/emoms.properties (UNIX)
    ORACLE_HOME\sysman\config\emoms.properties (Windows)
    
    

    See Also:

    "Reconfiguring the Oracle Management Service" for more information on modifying the repository connection information in the emoms.properties file

  4. The Management Service sends data to the Management Agent by way of HTTP. The Management Agent software includes a built-in HTTP listener that listens on the Management Agent URL for messages from the Management Service. As a result, the Management Service can bypass the Oracle HTTP Server and communicate directly with the Management Agent. If the Management Agent is on a remote system, no Oracle HTTP Server is required on the Management Agent host.

    The Management Service uses the Management Agent URL to monitor the availability of the Management Agent, to submit Enterprise Manager jobs, and to perform other management functions.

    The Management Agent URL can be identified by the EMD_URL property in the following configuration file in the Management Agent home directory:

    AGENT_HOME/sysman/config/emd.properties (UNIX)
    AGENT_HOME\sysman\config\emd.properties (Windows)
    
    

    For example:

    EMD_URL=http://host1.acme.com:1831/emd/main/
    
    

    In addition, by default, the name of the Management Agent as it appears in the Grid Control Console consists of the Management Agent host name and the port used by the Management Agent URL.

3.4 Managing Multiple Hosts and Deploying a Remote Management Repository

Installing all the Grid Control components on a single host is an effective way to initially explore the capabilities and features available to you when you centrally manage your Oracle environment.

A logical progression from the single-host environment is to a more distributed approach, where the Management Repository database is on a separate host and does not compete for resources with the Management Service. Such a configuration is shown in Figure 3-2.

Figure 3-2 Grid Control Components Distributed on Multiple Hosts with One Management Service


In this more distributed configuration, data about your managed targets travels along the following paths so it can be gathered, stored, and made available to administrators by way of the Grid Control Console:

  1. Administrators use the Grid Control Console to monitor and administer the targets just as they do in the single-host scenario described in Section 3.3.

  2. Management Agents are installed on each host on the network, including the Management Repository host and the Management Service host. The Management Agents upload their data to the Management Service by way of the Management Service upload URL, which is defined in the emd.properties file in each Management Agent home directory. The upload URL bypasses OracleAS Web Cache and uploads the data directly through the Oracle HTTP Server.

  3. The Management Repository is installed on a separate machine that is dedicated to hosting the Management Repository database. The Management Service uses JDBC connections to load data into the repository database and to retrieve information from the repository so it can be displayed in the Grid Control Console. This remote connection is defined in the emoms.properties configuration file in the Management Service home directory.

  4. The Management Service communicates directly with each remote Management Agent over HTTP by way of the Management Agent URL. The Management Agent URL is defined by the EMD_URL property in the emd.properties file of each Management Agent home directory. As described in Section 3.3, the Management Agent includes a built-in HTTP listener so no Oracle HTTP Server is required on the Management Agent host.

3.5 Using Multiple Management Service Installations

In larger production environments, you may find it necessary to add additional Management Service installations to help reduce the load on the Management Service and improve the efficiency of the data flow.


Note:

When you add additional Management Service installations to your Grid Control configuration, be sure to adjust the parameters of your Management Repository database. For example, you will likely need to increase the number of processes allowed to connect to the database at one time. Although the number of required processes will vary depending on the overall environment and the specific load on the database, as a general guideline, you should increase the number of processes by 40 for each additional Management Service.

For more information, see the description of the PROCESSES initialization parameter in the Oracle Database Reference.
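
For example, if you add two Management Services to an environment where the PROCESSES parameter is currently 150, you might raise it to at least 230. A minimal sketch, assuming you can connect as SYSDBA on the repository database host (the value shown is illustrative only):

sqlplus / as sysdba <<EOF
SHOW PARAMETER processes
-- PROCESSES is a static parameter, so the change requires SCOPE=SPFILE
-- and a database restart to take effect.
ALTER SYSTEM SET processes=230 SCOPE=SPFILE;
EOF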


The following sections provide more information about this configuration:

3.5.1 Determining When to Use Multiple Management Service Installations

Management Services are not only the recipients of upload information from Management Agents. They also retrieve data from the Management Repository and render it in the form of HTML pages, which are requested by and displayed in the client Web browser. In addition, the Management Services perform background processing tasks, such as notification delivery and the dispatch of Enterprise Manager jobs.

As a result, the assignment of Management Agents to Management Services must be carefully managed. Improper distribution of load from Management Agents to Management Services may result in perceived problems such as:

  • Sluggish user interface response

  • Delays in delivering notification messages

  • Backlog in monitoring information being uploaded to the Management Repository

  • Delays in dispatching jobs

The following sections provide some tips for monitoring the load and response time of your Management Service installations:

3.5.1.1 Monitoring the Load on Your Management Service Installations

To keep the workload evenly distributed, you should always be aware of how many Management Agents are configured per Management Service and monitor the load on each Management Service.

At any time, you can view a list of Management Agents and Management Services using Setup on the Grid Control Console.

Use the charts on the Management Services and Repository Overview page to monitor:

  • Loader backlog (files)

    The Loader is part of the Management Service that pushes metric data into the repository at periodic intervals. If the Loader Backlog chart indicates that the backlog is high and Loader output is low, there is data pending load, which may indicate a system bottleneck or the need for another Management Service. The chart shows the total backlog of files totaled over all Oracle Management Services for the past 24 hours. Click the image to display loader backlog charts for each individual Management Service over the past 24 hours.

  • Notification delivery backlog

    The Notification Delivery Backlog chart displays the number of notifications to be delivered that could not be processed in the time allocated. Notifications are delivered by the Management Services. This number is summed across all Management Services and is sampled every 10 minutes. The graph displays the data for the last 24 hours. It is useful for determining a growing backlog of notifications. When this graph shows constant growth over the past 24 hours, then you may want to consider adding another Management Service, reducing the number of notification rules, and verifying that all rules and notification methods are useful and valid.

3.5.1.2 Monitoring the Response Time of the Enterprise Manager Web Application Target

The information on the Management Services and Repository page can help you determine the load being placed on your Management Service installations. More importantly, you should also consider how the performance of your Management Service installations is affecting the performance of the Grid Control Console.

Use the EM Website Web Application target to review the response time of the Grid Control Console pages:

  1. From the Grid Control Console, click the Targets tab and then click the Web Applications subtab.

  2. Click EM Website in the list of Web Application targets.

  3. In the Key Test Summary table, click homepage. The resulting page provides the response time of the Grid Control Console homepage URL.


    See Also:

    The Enterprise Manager online help for more information about using the homepage URL and Application Performance Management (also known as Application Performance Monitoring) to determine the performance of your Web Applications

  4. Click Page Performance to view the response time of some selected Grid Control Console pages.


    Note:

    The Page Performance page provides data generated only by users who access the Grid Control Console by way of the OracleAS Web Cache port (usually, 7777).

  5. Select 7 Days or 31 Days from the View Data menu to determine whether or not there are any trends in the performance of your Grid Control installation.

    Consider adding additional Management Service installations if the response time of all pages is increasing over time or if the response time is unusually high for specific popular pages within the Grid Control Console.


Note:

You can use Application Performance Management and Web Application targets to monitor your own Web applications. For more information, see Chapter 6, "Configuring Services"

3.5.2 Understanding the Flow of Management Data When Using Multiple Management Services

Figure 3-3 shows a typical environment where an additional Management Service has been added to improve the performance of the Grid Control environment.

Figure 3-3 Grid Control Architecture with Multiple Management Service Installations


In a multiple Management Service configuration, the management data moves along the following paths:

  1. Administrators can use one of two URLs to access the Grid Control Console. Each URL refers to a different Management Service installation, but displays the same set of targets, all of which are loaded in the common Management Repository. Depending upon the host name and port in the URL, the Grid Control Console obtains data from the Management Service (by way of OracleAS Web Cache and the Oracle HTTP Server) on one of the Management Service hosts.

  2. Each Management Agent uploads its data to a specific Management Service, based on the upload URL in its emd.properties file. That data is uploaded directly to the Management Service by way of Oracle HTTP Server, bypassing OracleAS Web Cache.


    Note:

    This is a known limitation and could lead to potential data loss if a Management Service fails before the data it has received is loaded into the Management Repository. To mitigate this potential data loss in version 10.2, you can use a shared OMS upload directory; see Section 3.6, "High Availability Configurations".

  3. Each Management Service communicates by way of JDBC with a common Management Repository, which is installed in a database on a dedicated Management Repository host. Each Management Service uses the same database connection information, defined in the emoms.properties file, to load data from its Management Agents into the Management Repository. The Management Service uses the same connection information to pull data from the Management Repository as it is requested by the Grid Control Console.

  4. Any Management Service in the system can communicate directly with any of the remote Management Agents defined in the common Management Repository. The Management Services communicate with the Management Agents over HTTP by way of the unique Management Agent URL assigned to each Management Agent.

    As described in Section 3.3, the Management Agent URL is defined by the EMD_URL property in the emd.properties file of each Management Agent home directory. Each Management Agent includes a built-in HTTP listener so no Oracle HTTP Server is required on the Management Agent host.

3.6 High Availability Configurations

When you configure Grid Control for high availability, your aim is to protect each component of the system, as well as the flow of management data in case of performance or availability problems, such as a failure of a host or a Management Service.

One way to protect your Grid Control components is to use high availability software deployment techniques, which usually involve the deployment of hardware server load balancers, Oracle Real Application Clusters, and Oracle Data Guard.


Note:

The following sections do not provide a comprehensive set of instructions for configuring Grid Control for high availability. Instead, they illustrate some common configurations of Grid Control components and are designed to help you understand some of your options when you deploy Grid Control in your environment.

For a complete discussion of configuring Oracle products for high availability, refer to Oracle High Availability Architecture and Best Practices


Refer to the following sections for more information about common Grid Control configurations that take advantage of high availability hardware and software solutions:

3.6.1 Load Balancing Connections Between the Management Agent and the Management Service

Before you implement a plan to protect the flow of management data from the Management Agents to the Management Service, you should be aware of some key concepts.

Specifically, Management Agents do not maintain a persistent connection to the Management Service. When a Management Agent needs to upload collected monitoring data or an urgent target state change, the Management Agent establishes a connection to the Management Service. If the connection is not possible, such as in the case of a network failure or a host failure, the Management Agent retains the data and re-attempts to send the information later.

To protect against the situation where a Management Service is unavailable, you can use a server load balancer between the Management Agents and the Management Services.

However, if you decide to implement such a configuration, be sure to review the following sections carefully before proceeding:

3.6.1.1 Configuring the Management Services for High Availability

The Management Service for Grid Control 10g Release 2 has a new high availability feature called the Shared Filesystem Loader. In the Shared Filesystem Loader, management data files received from Management Agents are stored temporarily on a common shared location called the shared receive directory. All Management Services are configured to use the same storage location for the shared receive directory. The Management Services coordinate internally and distribute amongst themselves the workload of uploading files into the Management Repository. Should a Management Service go down for some reason, its workload is taken up by surviving Management Services.

Configuring the Shared Filesystem Loader

To configure the Management Service to use Shared Filesystem Loader, you must use the emctl config oms loader command.

  1. Stop all the Management Services.

  2. Configure a shared receive directory that is accessible by all Management Services.

  3. Run emctl config oms loader -shared yes -dir <loader directory> individually on all Management Services hosts, where <loader directory> is the full path to the shared receive directory.

  4. Restart all Management Services.
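
For example, after all Management Services have been stopped, you might run the following on each Management Service host (the OMS home path and shared directory shown are hypothetical):

OMS_HOME=/u01/app/oracle/oms10g
$OMS_HOME/bin/emctl config oms loader -shared yes -dir /nfs/em/recv
$OMS_HOME/bin/emctl start oms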


Caution:

Shared Filesystem Loader mode should be configured on all the Management Services in your Grid Control deployment using the previous steps. Management Services will fail to start if all the Management Services are not configured to run in the same mode.

3.6.1.2 Understanding the Flow of Data When Load Balancing the Upload of Management Data

Figure 3-4 shows a typical scenario where a set of Management Agents upload their data to a server load balancer, which redirects the data to one of two Management Service installations.

Figure 3-4 Load Balancing Between the Management Agent and the Management Service


In this example, only the upload of Management Agent data is routed through the server load balancer. The Grid Control Console still connects directly to a single Management Service by way of that Management Service's console URL.

When you load balance the upload of Management Agent data to multiple Management Service installations, the data is directed along the following paths:

  1. Administrators can use one of two URLs to access the Grid Control Console just as they do in the multiple Management Service configuration described in Section 3.5.2.

  2. Each Management Agent uploads its data to a common server load balancer URL. This URL is defined in the emd.properties file for each Management Agent. In other words, the Management Agents connect to a virtual service exposed by the server load balancer. The server load balancer routes the request to any one of a number of available servers that provide the requested service.

  3. Each Management Service, upon receipt of data, stores it temporarily in a local file and acknowledges receipt to the Management Agent. The Management Services then coordinate amongst themselves and one of them loads the data in a background thread in the correct chronological order.

  4. Each Management Service communicates by way of JDBC with a common Management Repository, just as they do in the multiple Management Service configuration defined in Section 3.5.

  5. Each Management Service communicates directly with each Management Agent by way of HTTP, just as they do in the multiple Management Service configuration defined in Section 3.5.

3.6.1.3 Configuring a Server Load Balancer for Management Agent Data Upload

This section describes guidelines you can use for configuring a server load balancer to balance the upload of data from Management Agents to multiple Management Service installations.

In the following examples, assume that you have installed two OMS processes, one on Host A and one on Host B. For ease of installation, start with two hosts that have no application server processes installed. This ensures that the default ports are used, as shown in Table 3-1. The following examples use these default values for illustration purposes.

Table 3-1 OMS Ports

  • Secure Upload Port: default 1159. Used for secure upload of management data from Management Agents. Source: httpd_em.conf and emoms.properties. Defined by: Install; can be modified with the emctl secure oms -secure_port <port> command.

  • Agent Registration Port: default 4889. Used by Management Agents during the registration phase to download agent wallets, for example, during emctl secure agent. In an unlocked OMS, it can also be used for uploading management data to the OMS. Source: httpd_em.conf and emoms.properties. Defined by: Install.

  • Secure Console Port: default 4444. Used for secure (https) console access. Source: ssl.conf. Defined by: Install.

  • Unsecure Console Port: default 7777. Used for unsecure (http) console access. Source: httpd.conf. Defined by: Install.

  • Webcache Secure Port: default 4443. Used for secure (https) console access. Source: webcache.xml. Defined by: Install.

  • Webcache Unsecure Port: default 7779. Used for unsecure (http) console access. Source: webcache.xml. Defined by: Install.


By default, the service name on the OMS-side certificate uses the name of the OMS host. Management Agents do not accept this certificate when they communicate with the OMS through the server load balancer (SLB). You must run the following command on both Management Services to regenerate the certificate using the load balancer name:

emctl secure oms -sysman_pwd <sysman_pwd> -reg_pwd <agent_reg_password> -host slb.acme.com -secure_port 1159

Specifically, you should use the administration tools that are packaged with your server load balancer to configure a virtual pool that consists of the hosts and the services that each host provides. In the case of the Management Services pool, specify the host name and the Management Agent upload port. To ensure a highly available Management Service, you should have two or more Management Services defined within the virtual pool. A sample configuration is provided below:

  1. Create Pools

     Pool abc_upload (used for secure upload of management data from Management Agents to the Management Services)
       Members: hostA:1159, hostB:1159
       Persistence: None
       Load Balancing: round robin

     Pool abc_genWallet (used for securing new Management Agents)
       Members: hostA:4889, hostB:4889
       Persistence: Active HTTP Cookie, method: insert, expiration 5 minutes
       Load Balancing: round robin

     Pool abc_uiAccess (used for secure console access)
       Members: hostA:4444, hostB:4444
       Persistence: SSL Session ID, timeout: 3000 seconds (should be greater than the OC4J session timeout of 45 minutes)
       Load Balancing: round robin

  2. Create Virtual Servers

     Virtual Server for secure upload
       Address: slb.acme.com
       Service: 1159
       Pool: abc_upload

     Virtual Server for agent registration
       Address: slb.acme.com
       Service: 4889
       Pool: abc_genWallet

     Virtual Server for UI access
       Address: sslb.acme.com
       Service: https (443)
       Pool: abc_uiAccess

Modify the REPOSITORY_URL property in the emd.properties file located in the sysman/config directory of each Management Agent home directory. The host name and port specified must be those of the server load balancer virtual service.


See Also:

"Configuring the Management Agent to Use a New Management Service" for more information about modifying the REPOSITORY_URL property for a Management Agent

Declare the pool to use a load balancing policy, for example, Round Robin or Least Loaded. Do not configure persistence between Management Agents and Management Services.

This configuration allows the distribution of connections from Management Agents equally between Management Services. In the event a Management Service becomes unavailable, the load balancer should be configured to direct connections to the surviving Management Services.

To successfully implement this configuration, the load balancer can be configured to monitor the underlying Management Service. On some models, for example, you can configure a monitor on the server load balancer. The monitor defines the:

  • HTTP request that is to be sent to a Management Service

  • Expected result in the event of success

  • Frequency of evaluation

For example, the load balancer can be configured to check the state of the Management Service every 5 seconds. On three successive failures, the load balancer can then mark the component as unavailable and no longer route requests to it. The monitor should be configured to send the string GET /em/upload over HTTP and expect to get the response Http XML File receiver.

A sample monitor configuration is provided below.

Monitor mon_upload
  Type: https
  Interval: 60
  Timeout: 181
  Send String: GET /em/upload HTTP 1.0\n
  Receive Rule: Http Receiver Servlet active!
  Associate with: hostA:1159, hostB:1159

Monitor mon_genWallet
  Type: http
  Interval: 60
  Timeout: 181
  Send String: GET /em/genwallet HTTP 1.0\n
  Receive Rule: GenWallet Servlet activated
  Associate with: hostA:4889, hostB:4889

Monitor mon_uiAccess
  Type: https
  Interval: 5
  Timeout: 16
  Send String: GET /em/console/home HTTP/1.0\nUser-Agent: Mozilla/4.0 (compatible; MSIE 6.0, Windows NT 5.0)\n
  Receive Rule: /em/console/logon/logon;jsessionid=
  Associate with: hostA:4444, hostB:4444


Note:

The network bandwidth requirements on the Server Load Balancer need to be reviewed carefully. Monitor the traffic being handled by the load balancer using the administrative tools packaged with your load balancer. Ensure that the load balancer is capable of handling the traffic passing through it. For example, deployments with a large number of targets can easily exhaust a 100 Mbps Ethernet card. A Gigabit Ethernet card would be required in such cases.


See Also:

Your Server Load Balancer documentation for more information on configuring virtual pools, load balancing policies, and monitoring network traffic

3.6.2 Load Balancing Connections Between the Grid Control Console and the Management Service

Using a server load balancer to manage the flow of data from the Management Agents is not the only way in which a load balancer can help you configure a highly available Grid Control environment. You can also use a load balancer to balance the load and to provide a failover solution for the Grid Control Console.

The following sections provide more information about this configuration:

3.6.2.1 Understanding the Flow of Data When Load Balancing the Grid Control Console

Figure 3-5 shows a typical configuration where a server load balancer is used between the Management Agents and multiple Management Services, as well as between the Grid Control Console and multiple Management Services.

Figure 3-5 Load Balancing Between the Grid Control Console and the Management Service


In this example, a single server load balancer is used for the upload of data from the Management Agents and for the connections between the Grid Control Console and the Management Service.

When you use a server load balancer for the Grid Control Console, the management data uses the following paths through the Grid Control architecture:

  1. Administrators use one URL to access the Grid Control Console. This URL directs the browser to the server load balancer virtual service. The virtual service redirects the browser to one of the Management Service installations. Depending upon the host name and port selected by the server load balancer from the virtual pool of Management Service installations, the Grid Control Console obtains the management data by way of OracleAS Web Cache and the Oracle HTTP Server on one of the Management Service hosts.

  2. Each Management Agent uploads its data to a common server load balancer URL (as described in Section 3.6.1) and data is written to the shared receive directory.

  3. Each Management Service communicates by way of JDBC with a common Management Repository, just as they do in the multiple Management Service configuration defined in Section 3.5.

  4. Each Management Service communicates directly with each Management Agent by way of HTTP, just as they do in the multiple Management Service configuration defined in Section 3.5.

To configure the Management Services for high availability, you must additionally use a storage device that is shared by all the Management Services. This shared storage protects the data uploaded from the Management Agents to the OMS before it is processed by the OMS and loaded into the repository. The shared storage can be an NFS-mounted disk accessible to all Management Services. For a truly highly available deployment, a shareable file system such as Network Appliance™ Filer is recommended.

See "Configuring the Management Services for High Availability" for steps on configuring the Management Services for High Availability.

3.6.2.2 Configuring a Server Load Balancer for the Grid Control Console

Use the administration tools that are packaged with your server load balancer to configure a virtual pool that consists of the hosts and the services that each host provides. In the case of the Management Services pool, specify the host name and the default OracleAS Web Cache port. To ensure a highly available Management Service, you should have two or more Management Services defined within the virtual pool.

The load balancer distributes requests among the Management Service processes in its virtual pool. This keeps the Grid Control Console available even in the event of the failure of a Management Service.

The virtual pool for the Grid Control Console should be configured for session persistence, because all requests from one user must go to the same Management Service for the duration of a session. Use the persistence method provided by your load balancer. For example, if you have enabled Enterprise Manager Framework Security and you are running the Management Service in a secure environment (using HTTPS and SSL), use SSL Session ID based persistence. If you have not enabled Enterprise Manager Framework Security and you are running in an environment that is not secure (using HTTP), you can use Client IP or Cookie based persistence.

3.6.2.3 Configuring Oracle HTTP Server When Using a Server Load Balancer for the Grid Control Console

The Management Service is implemented as a J2EE Web application, which is deployed on an instance of Oracle Application Server. Like many Web-based applications, the Management Service often redirects the client browser to a specific set of HTML pages, such as a logon screen and a specific application component or feature.

When the Oracle HTTP Server redirects a URL, it sends the URL, including the Oracle HTTP Server host name, back to the client browser. The browser then uses that URL, which includes the Oracle HTTP Server host name, to reconnect to the Oracle HTTP Server. As a result, the client browser attempts to connect directly to the Management Service host and bypasses the server load balancer.

To prevent the browser from bypassing the load balancer when a URL is redirected, edit the ServerName directive defined in the Oracle HTTP Server configuration file. This directive will be found in one of two places:

  • If you have enabled Enterprise Manager Framework Security and you are running the Management Service in a secure environment (using HTTPS and SSL), the ServerName directive you must change is located in the following configuration file:

    ORACLE_HOME/Apache/Apache/conf/ssl.conf
    
    
  • If you have not enabled Enterprise Manager Framework Security and you are running in an environment that is not secure (using HTTP), the ServerName directive you must change is located in the following configuration file:

    ORACLE_HOME/Apache/Apache/conf/httpd.conf
    
    

Change the ServerName directive so it matches the name of the server load balancer virtual service that you configured in Section 3.6.2.2.
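
For example, if the load balancer virtual service for console access is sslb.acme.com, the directive in ssl.conf (or in httpd.conf for a non-secure configuration) might read:

ServerName sslb.acme.com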


See Also:

Oracle HTTP Server Administrator's Guide

3.6.3 Configuring the Management Repository for High Availability

When you configure Grid Control for high availability, there are several ways to configure the Management Repository to prevent the loss of management data stored in the database.

The following sections describe a typical configuration designed to safeguard your Management Repository:

3.6.3.1 Understanding the Flow of Data When Configuring the Management Repository for High Availability

Figure 3-6 shows a typical Grid Control high availability configuration, where server load balancers are balancing the load on the multiple Management Service installations and the Management Repository is protected by Oracle Real Application Clusters and Oracle Data Guard.

Figure 3-6 Grid Control High Availability Configuration


When you install the Management Repository in a RAC database and incorporate Oracle Data Guard into the configuration, the management data uses the following paths through the Grid Control architecture:

  1. Administrators use one URL to access the Grid Control Console. This URL directs the browser to the server load balancer virtual service as described in Section 3.6.2.

  2. Each Management Agent uploads its data to a common server load balancer URL as described in Section 3.6.1.


    Caution:

    Before deploying a server load balancer for the upload of Management Agent data, be sure to review Section 3.6.1.3, "Configuring a Server Load Balancer for Management Agent Data Upload"

  3. Each Management Service communicates by way of JDBC with a common Management Repository, which is installed in a Real Application Clusters instance. Each Management Service uses the same database connection information, defined in the emoms.properties file, to load data into the Management Repository. The Management Service uses the same connection information to pull data from the Management Repository as it is requested by the Grid Control Console.


    See Also:

    "Configuring the Management Service to Use Oracle Net Load Balancing and Failover" for information about configuring the connection to a Management Repository that is installed in a RAC database

    In addition, the Management Repository is also protected by Oracle Data Guard. Note that only physical Data Guard is supported for protecting the Management Repository.


    See Also:

    Oracle Data Guard Concepts and Administration

    For more information about Maximum Availability Architecture, see: http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm


  4. Each Management Service communicates directly with each Management Agent by way of HTTP, just as they do in the multiple Management Service configuration defined in Section 3.5.


See Also:

For information about Maximum Availability Architecture (MAA) refer to Oracle Application Server 10g High Availability Guide

3.6.3.2 Installing the Management Repository Configured for High Availability and Disaster Recovery

To install the Management Repository in a highly available configuration, use the following procedure:

  1. Install Oracle Database 10g Release 2 (10.2) software and create a RAC database, optionally using ASM for storage management. For a list of standard database best practices that should be applied to the repository database, refer to Chapter 2 of Oracle Database High Availability Overview 10g Release 2.

  2. Begin installing Grid Control, using the Enterprise Manager 10g Grid Control Using an Existing Database installation option. Using Grid Control version 10.2.0.2, you can install the OMS processes to the same node as the RAC instances. Allowing the OMS and repository to share the same nodes reduces the cost of implementing a high availability solution.

  3. When you are prompted for a database system identifier (SID) and port, specify the host name, listener port, and ORACLE_SID for one of the RAC instances.

  4. After the Grid Control installation is complete, modify the Management Service connection string to take advantage of client failover in the event of a RAC host outage. This allows the OMS process to continue communications with surviving repository instances.

When you use a RAC cluster, a standby system, or both to provide high availability for the Management Repository, the Management Service can be configured to use an Oracle Net connect string that takes advantage of redundancy in the repository. Correctly configured, the Management Service process will continue to process data from Management Agents even during a database node outage.

To configure the Management Service to take advantage of this feature, follow these steps:

  1. Use a text editor to open the following configuration file in the Management Service home directory:

    ORACLE_HOME/sysman/config/emoms.properties

  2. Locate the following entry in the emoms.properties file:

    oracle.sysman.eml.mntr.emdRepConnectDescriptor

  3. Edit the entry so it includes references to the individual nodes within the RAC database.

    The following example shows a connect string that supports a two-node RAC configuration. Note the backslash (\) before each equal sign (=), which is required when you enter the connect string within the emoms.properties configuration file.

    oracle.sysman.eml.mntr.emdRepConnectDescriptor=(DESCRIPTION\=(ADDRESS_LIST\=(FAILOVER\=ON)(ADDRESS\=(PROTOCOL\=TCP)(HOST\=haem1.us.oracle.com)(PORT\=1521))(ADDRESS\=(PROTOCOL\=TCP)(HOST\=haem2.us.oracle.com)(PORT\=1521)))(CONNECT_DATA\=(SERVICE_NAME\=em10)))


    See Also:

    "Enabling Advanced Features of Oracle Net Services" in the Oracle Database Net Services Administrator's Guide for more information about using the FAILOVER parameter and other advanced features within a database connect string.

  4. The Management Services rely on the connect time load balancing of the repository database listener to distribute connections between RAC instances. For the distribution to work optimally in Enterprise Manager Grid Control, ensure that the PREFER LEAST LOADED NODE <listener_name> property in listener.ora files is commented out or set to ON.

  5. For a disaster recovery site, use Enterprise Manager to configure a physical standby database for the repository database. Set the standby to Maximum Availability.

  6. Enable Fast Start Failover in Data Guard. This feature allows a third process called an "observer" to monitor the health of the primary and standby databases. In the event of a failure on the primary, the observer senses the fault and initiates a switch to the standby database.

  7. Create a failover trigger. New in Data Guard for version 10.2 of Enterprise Manager is the ability to fire a trigger when a database instance changes its role from primary to standby. You can use this feature to change the database services being offered by the primary and standby database instances. If you combine this feature with the connect string described in Step 3 above, you can change the service being offered to the active instances as opposed to the standby. A sample of the trigger used and the script that can be called is provided below.

    CREATE OR REPLACE TRIGGER set_rc_svc
    AFTER DB_ROLE_CHANGE ON DATABASE
    DECLARE
      role VARCHAR2(30);
    BEGIN
      SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
      IF role = 'PRIMARY' THEN
        DBMS_SERVICE.START_SERVICE('em_svc');
        BEGIN
          DBMS_SCHEDULER.CREATE_JOB(
            job_name   => 'oms_start',
            job_type   => 'executable',
            job_action => '/u01/app/oracle/product/gcha/oms10g/start_oms.ksh',
            enabled    => TRUE);
        END;
      ELSE
        DBMS_SERVICE.STOP_SERVICE('em_svc');
      END IF;
    END;
    /

    The em_svc service referenced by the trigger is created and started once, for example from SQL*Plus:

    exec DBMS_SERVICE.CREATE_SERVICE(service_name=>'em_svc',network_name=>'em_svc',aq_ha_notifications=>true,failover_method=>'BASIC',failover_type=>'SELECT',failover_retries=>180,failover_delay=>1);
    exec DBMS_SERVICE.START_SERVICE('em_svc');

    start_oms.ksh:

    #!/bin/ksh
    /u01/app/oracle/product/gcha/oms10g/bin/emctl start oms

3.6.3.3 Specifying the Size of the Management Repository Tablespaces in a RAC Database

When you install the Management Repository into a RAC database instance, you cannot set the size of the required Enterprise Manager tablespaces. You can, however, specify the name and location of data files to be used by the Management Repository schema. The default sizes for the initial data file extents depend on using the AUTOEXTEND feature and as such are insufficient for a production installation. This is particularly problematic when storage for the RAC database is on a raw device.

If the RAC database being used for the Management Repository is configured with raw devices, there are three options for increasing the size of the repository.

  • The preferred option for high availability installs is to use Oracle Automatic Storage Management (ASM) to manage database storage. The location string must be specified manually (for example, +DATA/<database_name>/<datafile_name>). If ASM storage is used, there is no need to disable any space management storage settings.

  • You can create multiple raw partitions, with the first one equal to the default size of the tablespace as defined by the installation process.

  • Alternatively, you can create the tablespace using the default size, create a dummy object that will increase the size of the tablespace to the end of the raw partition, then drop that object.

Regardless, if raw devices are used, disable the default space management for these objects, which is to auto-extend.

3.6.3.4 Configuring the Management Service to Use Oracle Net Load Balancing and Failover

When you use a RAC cluster, a standby system, or both to provide high availability for the Management Repository, the Management Service can be configured to use an Oracle Net connect string that will take advantage of redundancy in the repository. Correctly configured, the Management Service process will continue to process data from Management Agents even during a database node outage.

To configure the Management Service to take advantage of this feature:

  1. Use a text editor to open the following configuration file in the Management Service home directory:

    ORACLE_HOME/sysman/config/emoms.properties
    
    
  2. Locate the following entry in the emoms.properties file:

    oracle.sysman.eml.mntr.emdRepConnectDescriptor=
    
    
  3. Edit the entry so it includes references to the individual nodes within the RAC database.

    The following example shows a connect string that supports a two-node RAC configuration. Note the backslash (\) before each equal sign (=), which is required when you are entering the connect string within the emoms.properties configuration file:

    oracle.sysman.eml.mntr.emdRepConnectDescriptor=(DESCRIPTION\=(ADDRESS_LIST\=(FAILOVER\=ON)(ADDRESS\=(PROTOCOL\=TCP)(HOST\=haem1.us.oracle.com)(PORT\=1521))(ADDRESS\=(PROTOCOL\=TCP)(HOST\=haem2.us.oracle.com)(PORT\=1521)))(CONNECT_DATA\=(SERVICE_NAME\=em10)))
    

    See Also:

    "Enabling Advanced Features of Oracle Net Services" in the Oracle Database Net Services Administrator's Guide for more information about using the FAILOVER parameter and other advanced features within a database connect string

3.7 Configuring Oracle Enterprise Manager 10.1 Agents for Use In Active/Passive Environments

Oracle Enterprise Manager 10.1 plays a key role when deploying Oracle solutions by providing the tools for managing, monitoring, and alerting in these environments. High availability for Enterprise Manager itself is another critical piece of these configurations.

Active/Passive environments, also known as Cold Failover Cluster (CFC) environments, are one type of high availability solution that allows an application to run on only one node at a time. These environments generally use a combination of cluster software to provide a logical host name and IP address, along with interconnected host and storage systems, to provide a measure of high availability for applications. In a Cold Failover Cluster environment, one host is considered the active node, where applications run and access the data contained on the shared storage. The second node is the standby node, ready to run the same applications currently hosted on the primary node in the event of a failure. The cluster software is configured to present a logical host name and IP address. This address provides a generic location for running applications that is not tied to either the active node or the standby node.

In the event of a failure of the active node, applications can be terminated either by the hardware failure or by the cluster software and restarted on the passive node using the same logical host name and IP address, resuming operations with little disruption. Automating failover of the virtual host name and IP address, along with starting the applications on the passive node, requires the use of third-party cluster software. Several Oracle partner vendors provide high availability solutions in this area.

Oracle Enterprise Manager can be configured to support this Cold Failover Cluster configuration using the existing Enterprise Manager agents communicating with the Oracle Management Service (OMS) processes. The additional configuration step for this environment is the installation of an extra Management Agent using the logical host name and IP address generated through the cluster software. The targets monitored by each agent must be modified after this third agent is installed. To ensure continuous monitoring coverage by Enterprise Manager, the targets that will move between hardware nodes under the control of the cluster software are monitored by the agent installed using the floating IP address or virtual host name. This agent can then be moved along with the other applications configured for high availability using the cluster software.

In summary, this configuration results in the installation of three agents in total: one for each hardware node and one for the IP address generated by the cluster software. If the cluster software supports the generation of multiple virtual IP addresses to support multiple high availability environments, the solution outlined here should scale to support them. See the other sections of this chapter for alternative Enterprise Manager architectures and high availability configurations.

The remainder of this section reviews the installation and configuration steps required to configure Enterprise Manager to monitor Cold Failover Cluster environments. It also reviews the performance considerations for such a configuration and describes the test environment and assumptions used to validate these steps and recommendations.

3.7.1 Installation and Configuration

The table below documents the steps required to configure agents in a CFC environment:

Table 3-2 Steps Required to Configure Agents in a Cold Failover Cluster Environment

  1. Install the vendor-specific cluster software.

     Method: The installation method varies depending on the cluster vendor. The minimal requirement is a two-node cluster that supports virtual or floating IP addresses and shared storage.

     Verification: Use the ping command to verify the existence of the floating IP address. Use nslookup or an equivalent tool to verify that the IP address resolves correctly in your environment.

  2. Install agents to each physical node of the cluster, using the physical IP address or host name as the node name.

     Method: Use the Oracle Universal Installer (OUI) to install agents to each node of the cluster.

     Outcome: When complete, the OUI will have installed agents on each node that will be visible through the Enterprise Manager console.

     Verification: Check that the agent, host, and targets are visible in the Enterprise Manager environment.

  3. Delete the targets that will be configured for high availability using the cluster software.

     Method: Using the Enterprise Manager console, delete all targets discovered during the previous installation step that are managed by the cluster software, except for the agent and the host.

     Outcome: Enterprise Manager will display the agent, the hardware, and any target that is not configured for high availability.

     Verification: Inspect the Enterprise Manager console and verify that all targets that will be assigned to the agent running on the floating IP address have been deleted from the agents monitoring the fixed IP addresses.

  4. Install a third agent to the cluster, using the logical IP address or logical host name as the host specified in the OUI at install time.

     Note: This installation should not detect or install to more than one node.

     Method: This agent must follow the same conventions as any application using the cluster software to move between nodes (that is, it must be installed on the shared storage using the logical IP address). In version 10.1, this installation requires an additional option at the command line during installation. The OUI_HOSTNAME flag must be set, as in the following example:

     ./runInstaller OUI_HOSTNAME=<Logical IP address or hostname>

     Outcome: The third agent is installed and currently monitors all targets discovered on the host running the physical IP address.

     Verification: To verify that the agent is configured correctly, run emctl status agent at the command line and verify the use of the logical IP address or virtual host name. Also verify that the agent points to the correct OMS URL and that the agent is uploading files. When the agent is running and uploading, use the console to verify that it has correctly discovered the targets that will move to the standby node during a failover operation.

  5. Delete from the agent monitoring the logical IP address any targets that will not switch to the passive node during failover.

     Method: Use the Enterprise Manager console to delete any targets that will not move between hosts in a switchover or failover scenario. These might be targets that are not attached to this logical IP address for failover or are not configured for redundancy.

     Outcome: The Enterprise Manager environment is now running three agents. Any target that is configured for switchover using the cluster software will be monitored by an agent that will transition during switchover or failover operations.

     Verification: Inspect the Enterprise Manager user interface. By this step, all targets that will move between nodes should be monitored by the agent running on the virtual host name. All remaining targets should be monitored by an agent running on an individual node.

  6. Add the new logical host to the cluster definition.

     Method: Using the All Targets tab on the Enterprise Manager console, find the cluster target and add the newly discovered logical host to the existing cluster target definition. It is also possible (but not required) to use the Add Cluster Target option on the All Targets tab, making a new composite target using the nodes of the cluster.

     Outcome: The Grid Control console will now correctly display all the hosts associated with the cluster.

  7. Place the agent process running on the logical IP address under the control of the cluster software.

     Method: This will vary based on the cluster software vendor.

     Outcome: The agent will transition along with the applications. A suggested order of operations is covered in the next section.

     Verification: Verify that the agent can be stopped and restarted on the standby node using the cluster software.


3.7.2 Switchover Steps

Each cluster vendor implements the wrapper around the steps required to perform a switchover or failover in a different fashion. The steps themselves are generic and are listed below:

  • Shut down the Enterprise Manager agent

  • Shut down all the applications running on the virtual IP and shared storage

  • Switch the IP and shared storage to the new node

  • Restart the applications

  • Restart the Enterprise Manager agent

Stopping the agent first and restarting it after the other applications have started prevents Enterprise Manager from triggering any false 'target down' alerts that would otherwise occur during a switchover or failover.
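
A minimal sketch of such a wrapper script is shown below; the agent home path is hypothetical, and the vendor-specific steps are left as comments because they depend entirely on your cluster software:

#!/bin/ksh
AGENT_HOME=/app/oracle/product/agent10g    # example path on the shared storage
$AGENT_HOME/bin/emctl stop agent
# ... stop the applications running on the virtual IP and shared storage ...
# ... relocate the virtual IP address and shared storage to the new node ...
# ... restart the applications ...
$AGENT_HOME/bin/emctl start agent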

3.7.3 Performance Implications

While it is logical to assume that running two agent processes on the active host may have some performance implications, this was not shown during our testing. Keep in mind that if the agents are configured as described in this document, the agent monitoring the physical host IP address has only two targets to monitor. Therefore, the only additional overhead is the two agent processes themselves and the commands they issue to monitor an agent and the operating system. During our testing, we noticed an overhead of between 1% and 2% of CPU usage.

3.7.4 Test Environment

The goal of this testing was to validate the fundamental concept of switching agents between nodes in an active/passive environment, regardless of any particular vendor's cluster software. With that goal in mind, all operations normally managed by cluster software, such as migration of the IP address and storage between nodes, were done manually.

These tests were conducted in the following hardware and software environment:

  • 2 Sun Ultra-250's running Solaris 2.8 with applicable patches

  • Enterprise Manager Version was 10.1.0.3 for the Agent and OMS running with a 10.1.0.3 database for the repository.

  • Communication between Agent and OMS was tested in both a secured and unsecured fashion

3.7.5 Summary

Generically, configuring Enterprise Manager 10.1 to support Cold Failover Cluster environments is a matter of three steps.

  • Install an agent for each virtual host name that is presented by the cluster and ensure that the agent is correctly communicating with the OMS.

  • Configure the agent that will move between nodes to monitor the appropriate highly available targets.

  • Verify the agent can be stopped on the primary node and restarted on the secondary automatically by the cluster software in the event of a switchover or failover.

3.8 Using Virtual Hostnames for Active/Passive High Availability Environments in Enterprise Manager Database Control

This section provides information to database administrators about configuring a version 10g database in Cold Failover Cluster environments using Enterprise Manager Database Control.

Several conditions must be met for Database Control to service a database instance after it fails over to a different host in the cluster. The sections that follow describe the configuration and installation points you should consider before getting started.

3.8.1 Set Up the Alias for the Virtual Hostname/Virtual IP Address

You can set up the alias for the virtual host name and virtual IP address either by allowing the clusterware to set it up automatically or by setting it up manually before installing and starting Oracle services. The virtual host name must be static and resolvable consistently on the network. All nodes participating in the setup must resolve the virtual IP address to the same host name. Standard TCP/IP tools such as nslookup and traceroute can be used to verify this.
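
For example, the following commands can be run from each node to confirm that the alias resolves consistently (the virtual host name shown is hypothetical):

nslookup lxdb.acme.com
traceroute lxdb.acme.com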

3.8.2 Set Up Shared Storage

Shared storage can be storage managed by the clusterware that is in use, or you can use any supported shared file system volume (OCFS V1, for example, is not supported). The most common shared file system is NFS.

3.8.3 Set Up the Environment

Some operating system versions require specific operating system patches to be applied prior to installing version 10gR2 of the Oracle database. You must also have sufficient kernel resources available when you conduct the installation.

Before you launch the installer, specific environment variables must be verified. Each of the following variables must be set identically for the account you are using to install the software on all machines participating in the cluster. A quick check is sketched after this list.

  • Operating system variable TZ, timezone setting. You should unset this prior to the installation.

  • PERL variables. Variables like PERL5LIB should be unset to prevent the installation and Database Control from picking up the incorrect set of PERL libraries.

  • Paths used for dynamic libraries. Based on the operating system, the variables can be LD_LIBRARY_PATH, LIBPATH, SHLIB_PATH, or DYLD_LIBRARY_PATH. These variables should ONLY point to directories that are visible and usable on each node of the cluster.
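
For example, you might verify and clear these variables in the installing user's shell on each node before launching the installer:

echo "TZ=$TZ PERL5LIB=$PERL5LIB LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
unset TZ PERL5LIB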

3.8.4 Ensure That the Oracle USERNAME, ID, and GROUP NAME Are In Sync On All Cluster Members

The user and group of the software owner should be defined identically on all nodes of the cluster. You can verify this using the following command:

$ id -a
uid=1234(oracle) gid=5678(dba) groups=5678(dba)

3.8.5 Ensure That Inventory Files Are On the Shared Storage

To ensure that inventory files are on the shared storage, follow these steps:

  • Create your new ORACLE_HOME directory.

  • Create the Oracle Inventory directory under the new Oracle home:

    cd <shared oracle home>
    mkdir oraInventory
    
    
  • Create the oraInst.loc file. This file contains the Inventory directory path information required by the Universal Installer:

    1. vi oraInst.loc

    2. Enter the path information to the Oracle Inventory directory and specify the group of the software owner (the dba group in this example):

      inventory_loc=/app/oracle/product/10.2/oraInventory
      inst_group=dba
      

3.8.6 Start the Installer

To start the installer, point to the inventory location file oraInst.loc and specify the host name of the virtual group. The debug parameter in the example below is optional:

./runInstaller -invPtrLoc /app/oracle/share1/oraInst.loc ORACLE_HOSTNAME=lxdb.acme.com -debug

3.8.7 Start Services

You must start the services in the following order:

  1. Establish IP address on the active node

  2. Start the TNS listener

  3. Start the database

  4. Start dbconsole

  5. Test functionality

In the event of a failure of the active node, do the following on the failover node:

  1. Establish IP on failover box

  2. Start TNS listener

    lsnrctl start
    
    
  3. Start the database

    dbstart
    
    
  4. Start Database Console

    emctl start dbconsole
    
    
  5. Test functionality

To manually stop or shutdown a service, follow these steps:

  1. Stop the application.

  2. Stop Database Control

    emctl stop dbconsole
    
    
  3. Stop TNS listener

    lsnrctl stop
    
    
  4. Stop the database

    dbshut
    
    
  5. Stop IP