Oracle® Enterprise Manager Advanced Configuration
10g Release 3 (10.2.0.3.0)
Oracle Enterprise Manager 10g Grid Control is based on a flexible architecture, which allows you to deploy the Grid Control components in the most efficient and practical manner for your organization. This chapter describes some common configurations that demonstrate how you can deploy the Grid Control architecture in various computing environments.
The common configurations are presented in a logical progression, starting with the simplest configuration and ending with a complex configuration that involves the deployment of high availability components, such as load balancers, Oracle Real Application Clusters, and Oracle Data Guard.
This chapter contains the following sections:
The common configurations described in this chapter are provided as examples only. The actual Grid Control configurations that you deploy in your own environment will vary depending upon the needs of your organization.
For instance, the examples in this chapter assume you are using the OracleAS Web Cache port to access the Grid Control Console. By default, when you first install Grid Control, you display the Grid Control Console by navigating to the default OracleAS Web Cache port. In fact, you can modify your own configuration so administrators bypass OracleAS Web Cache and use a port that connects them directly to the Oracle HTTP Server.
For another example, in a production environment you will likely want to implement firewalls and other security considerations. The common configurations described in this chapter are not meant to show how firewalls and security policies should be implemented in your environment.
See Also: Chapter 5, "Enterprise Manager Security" for information about securing the connections between Grid Control components
Chapter 6, "Configuring Enterprise Manager for Firewalls" for information about configuring firewalls between Grid Control components
Besides providing a description of common configurations, this chapter can also help you understand the architecture and flow of data among the Grid Control components. Based on this knowledge, you can make better decisions about how to configure Grid Control for your specific management requirements.
The Grid Control architecture consists of the following software components:
Oracle Management Agent
Oracle Management Service
Oracle Management Repository
Oracle Enterprise Manager 10g Grid Control Console
See Also: Oracle Enterprise Manager Concepts for more information about each of the Grid Control components
The remaining sections of this chapter describe how you can deploy these components in a variety of combinations and across a single host or multiple hosts.
Figure 3-1 shows how each of the Grid Control components are configured to interact when you install Grid Control on a single host. This is the default configuration that results when you use the Grid Control installation procedure to install the Enterprise Manager 10g Grid Control Using a New Database installation type.
When you install all the Grid Control components on a single host, the management data travels along the following paths:
Administrators use the Grid Control Console to monitor and administer the managed targets that are discovered by the Management Agents on each host. The Grid Control Console uses the default OracleAS Web Cache port (for example, port 7777 on UNIX systems and port 80 on Windows systems) to connect to the Oracle HTTP Server. The Management Service retrieves data from the Management Repository as it is requested by the administrator using the Grid Control Console.
See Also: Oracle Application Server Web Cache Administrator's Guide for more information about the benefits of using OracleAS Web Cache
The Management Agent uploads its data (which includes management data about all the managed targets on the host, including the Management Service and the Management Repository database) by way of the Oracle HTTP Server upload URL. The Management Agent uploads data directly to the Oracle HTTP Server, bypassing OracleAS Web Cache. The default port for the upload URL is 4889 (if it is available during the installation procedure). The upload URL is defined by the REPOSITORY_URL property in the following configuration file in the Management Agent home directory:
AGENT_HOME/sysman/config/emd.properties (UNIX)
AGENT_HOME\sysman\config\emd.properties (Windows)
See Also: "Understanding the Enterprise Manager Directory Structure" for more information about the AGENT_HOME directory
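As a concrete illustration, the sketch below creates a sample emd.properties and pulls the REPOSITORY_URL value out of it with grep. The file location (/tmp) and all hostnames and ports are hypothetical; on a real Management Agent host the file lives under AGENT_HOME/sysman/config.

```shell
# Build a sample emd.properties for illustration; values are made up.
cat > /tmp/emd.properties <<'EOF'
REPOSITORY_URL=http://mgmthost.example.com:4889/em/upload/
EMD_URL=http://agenthost.example.com:3872/emd/main/
EOF

# Extract the upload URL the Management Agent would use.
upload_url=$(grep '^REPOSITORY_URL=' /tmp/emd.properties | cut -d= -f2-)
echo "Agent uploads to: $upload_url"
```

The same one-liner, pointed at the real file, is a quick way to confirm which Management Service a given agent uploads to.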
The Management Service uses JDBC connections to load data into the Management Repository database and to retrieve information from the Management Repository so it can be displayed in the Grid Control Console. The Management Repository connection information is defined in the following configuration file in the Management Service home directory:
ORACLE_HOME/sysman/config/emoms.properties (UNIX)
ORACLE_HOME\sysman\config\emoms.properties (Windows)
See Also: "Reconfiguring the Oracle Management Service" for more information on modifying the Management Repository connection information in the emoms.properties file
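For a sense of what the repository connection settings look like, the following sketch greps a sample emoms.properties. The oracle.sysman.eml.mntr.emdRep* property names follow the pattern used by Enterprise Manager 10g, but the exact names and all values here are illustrative, not copied from a real installation.

```shell
# Sample emoms.properties fragment; property names and values are illustrative.
cat > /tmp/emoms.properties <<'EOF'
oracle.sysman.eml.mntr.emdRepServer=repohost.example.com
oracle.sysman.eml.mntr.emdRepPort=1521
oracle.sysman.eml.mntr.emdRepSID=emrep
EOF

# Show which Management Repository database the Management Service would use.
grep 'emdRep' /tmp/emoms.properties
```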
The Management Service sends data to the Management Agent by way of HTTP. The Management Agent software includes a built-in HTTP listener that listens on the Management Agent URL for messages from the Management Service. As a result, the Management Service can bypass the Oracle HTTP Server and communicate directly with the Management Agent. If the Management Agent is on a remote system, no Oracle HTTP Server is required on the Management Agent host.
The Management Service uses the Management Agent URL to monitor the availability of the Management Agent, submit Enterprise Manager jobs, and other management functions.
The Management Agent URL can be identified by the EMD_URL property in the following configuration file in the Management Agent home directory:
AGENT_HOME/sysman/config/emd.properties (UNIX)
AGENT_HOME\sysman\config\emd.properties (Windows)
In addition, by default, the name of the Management Agent as it appears in the Grid Control Console consists of the Management Agent host name and the port used by the Management Agent URL.
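The host:port naming rule can be sketched with plain shell parameter expansion; the EMD_URL value below is hypothetical.

```shell
# A hypothetical Management Agent URL, as it would appear in emd.properties.
emd_url="http://agenthost.example.com:3872/emd/main/"

# Strip the scheme, then strip the path, leaving the host:port pair that
# Grid Control uses as the default Management Agent name.
agent_name=${emd_url#*://}
agent_name=${agent_name%%/*}
echo "$agent_name"   # agenthost.example.com:3872
```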
Installing all the Grid Control components on a single host is an effective way to initially explore the capabilities and features available to you when you centrally manage your Oracle environment.
A logical progression from the single-host environment is to a more distributed approach, where the Management Repository database is on a separate host and does not compete for resources with the Management Service. The benefit in such a configuration is scalability; the workload for the Management Repository and Management Service can now be split. This configuration also provides the flexibility to adjust the resources allocated to each tier, depending on the system load. (Such a configuration is shown in Figure 3-2.) See "Monitoring the Load on Your Management Service Installations" for additional information.
In this more distributed configuration, data about your managed targets travels along the following paths so it can be gathered, stored, and made available to administrators by way of the Grid Control Console:
Administrators use the Grid Control Console to monitor and administer the targets just as they do in the single-host scenario described in Section 3.2.
Management Agents are installed on each host on the network, including the Management Repository host and the Management Service host. The Management Agents upload their data to the Management Service by way of the Management Service upload URL, which is defined in the emd.properties file in each Management Agent home directory. The upload URL bypasses OracleAS Web Cache and uploads the data directly through the Oracle HTTP Server.
The Management Repository is installed on a separate machine that is dedicated to hosting the Management Repository database. The Management Service uses JDBC connections to load data into the Management Repository database and to retrieve information from the Management Repository so it can be displayed in the Grid Control Console. This remote connection is defined in the emoms.properties configuration file in the Management Service home directory.
The Management Service communicates directly with each remote Management Agent over HTTP by way of the Management Agent URL. The Management Agent URL is defined by the EMD_URL property in the emd.properties file of each Management Agent home directory. As described in Section 3.2, the Management Agent includes a built-in HTTP listener, so no Oracle HTTP Server is required on the Management Agent host.
In larger production environments, you may find it necessary to add additional Management Service installations to help reduce the load on the Management Service and improve the efficiency of the data flow.
Note: When you add additional Management Service installations to your Grid Control configuration, be sure to adjust the parameters of your Management Repository database. For example, you will likely need to increase the number of processes allowed to connect to the database at one time. Although the number of required processes will vary depending on the overall environment and the specific load on the database, as a general guideline, you should increase the number of processes by 40 for each additional Management Service.
For more information, see the description of the PROCESSES initialization parameter in the Oracle Database Reference.
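The sizing guideline above can be turned into simple arithmetic. The base value of 150 below is only a placeholder for whatever PROCESSES setting your database currently uses.

```shell
# Hypothetical current PROCESSES setting and number of additional
# Management Services being added.
base_processes=150
additional_oms=2

# Guideline from the text: roughly 40 more processes per
# additional Management Service.
required=$((base_processes + 40 * additional_oms))
echo "Set PROCESSES to at least $required"
```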
The following sections provide more information about this configuration:
Figure 3-3 shows a typical environment where an additional Management Service has been added to improve the scalability of the Grid Control environment.
In a multiple Management Service configuration, the management data moves along the following paths:
Administrators can use one of two URLs to access the Grid Control Console. Each URL refers to a different Management Service installation, but displays the same set of targets, all of which are loaded in the common Management Repository. Depending upon the host name and port in the URL, the Grid Control Console obtains data from the Management Service (by way of OracleAS Web Cache and the Oracle HTTP Server) on one of the Management Service hosts.
Each Management Agent uploads its data to a specific Management Service, based on the upload URL in its emd.properties file. That data is uploaded directly to the Management Service by way of Oracle HTTP Server, bypassing OracleAS Web Cache.
Whenever more than one Management Service is installed, it is a best practice to have the Management Service upload to a shared directory. This allows all Management Service processes to manage files that have been downloaded from any Management Agent. Configure this functionality from the command line of each Management Service process as follows:
emctl config oms loader -shared <yes|no> -dir <load directory>
Note: By adding a load balancer, you can avoid the following problems:
Should the Management Service fail, any Management Agent connected to it cannot upload data.
Because users connect to only one Management Service URL, they lose connectivity if that Management Service goes down, even if another Management Service is up.
See Section 3.5, "High Availability Configurations" for information regarding load balancers.
Each Management Service communicates by way of JDBC with a common Management Repository, which is installed in a database on a dedicated Management Repository host. Each Management Service uses the same database connection information, defined in the emoms.properties file, to load data from its Management Agents into the Management Repository. The Management Service uses the same connection information to pull data from the Management Repository as it is requested by the Grid Control Console.
Any Management Service in the system can communicate directly with any of the remote Management Agents defined in the common Management Repository. The Management Services communicate with the Management Agents over HTTP by way of the unique Management Agent URL assigned to each Management Agent.
As described in Section 3.2, the Management Agent URL is defined by the EMD_URL property in the emd.properties file of each Management Agent home directory. Each Management Agent includes a built-in HTTP listener, so no Oracle HTTP Server is required on the Management Agent host.
Management Services are not only receivers of upload information from Management Agents. They also retrieve data from the Management Repository. The Management Service renders this data in the form of HTML pages, which are requested by and displayed in the client Web browser. In addition, the Management Services perform background processing tasks, such as notification delivery and the dispatch of Enterprise Manager jobs.
As a result, the assignment of Management Agents to Management Services must be carefully managed. Improper distribution of load from Management Agents to Management Services may result in perceived:
Sluggish user interface response
Delays in delivering notification messages
Backlog in monitoring information being uploaded to the Management Repository
Delays in dispatching jobs
To keep the workload evenly distributed, you should always be aware of how many Management Agents are configured per Management Service and monitor the load on each Management Service.
At any time, you can view a list of Management Agents and Management Services using Setup on the Grid Control console.
Use the charts on the Overview page of Management Services and Repository to monitor:
The Loader is part of the Management Service that pushes metric data into the Management Repository at periodic intervals. If the Loader Backlog chart indicates that the backlog is high and Loader output is low, there is data pending load, which may indicate a system bottleneck or the need for another Management Service. The chart shows the total backlog of files totaled over all Oracle Management Services for the past 24 hours. Click the image to display loader backlog charts for each individual Management Service over the past 24 hours.
The Notification Delivery Backlog chart displays the number of notifications to be delivered that could not be processed in the time allocated. Notifications are delivered by the Management Services. This number is summed across all Management Services and is sampled every 10 minutes. The graph displays the data for the last 24 hours. It is useful for determining a growing backlog of notifications. When this graph shows constant growth over the past 24 hours, then consider adding another Management Service, reducing the number of notification rules, and verifying that all rules and notification methods are useful and valid.
The information on the Management Services and Repository page can help you determine the load being placed on your Management Service installations. More importantly, consider how the performance of your Management Service installations is affecting the performance of the Grid Control Console.
You can use the EM Website Web Application target to review the response time of the Grid Control Console pages:
From the Grid Control Console, click the Targets tab and then click the Web Applications subtab.
In the Key Test Summary table, click homepage. The resulting page provides the response time of the Grid Control Console homepage URL.
Click Page Performance to view the response time of some selected Grid Control Console pages.
Note: The Page Performance page provides data generated only by users who access the Grid Control Console by way of the OracleAS Web Cache port (usually, 7777).
Select 7 Days or 31 Days from the View Data menu to determine whether or not there are any trends in the performance of your Grid Control installation.
Consider adding additional Management Service installations if the response time of all pages is increasing over time or if the response time is unusually high for specific popular pages within the Grid Control Console.
Note: You can use Application Performance Management and Web Application targets to monitor your own Web applications. For more information, see Chapter 7, "Configuring Services".
When you configure Grid Control for high availability, your aim is to protect each component of the system, as well as the flow of management data in case of performance or availability problems, such as a failure of a host or a Management Service.
One way to protect your Grid Control components is to use high availability software deployment techniques, which usually involve the deployment of hardware load balancers, Oracle Real Application Clusters, and Oracle Data Guard.
Note: The following sections do not provide a comprehensive set of instructions for configuring Grid Control for high availability. They illustrate some common configurations of Grid Control components and are designed to help you understand some of your options when you deploy Grid Control in your environment.
For a complete discussion of configuring Oracle products for high availability, refer to the Oracle white paper Enterprise Manager Maximum Availability Architecture (MAA) Best Practices Enterprise Manager 10R2 available at:
Refer to the following sections for more information about common Grid Control configurations that take advantage of high availability hardware and software solutions. These configurations are part of the Maximum Availability Architecture (MAA).
Before you implement a plan to protect the flow of management data from the Management Agents to the Management Service, you should be aware of some key concepts.
Specifically, Management Agents do not maintain a persistent connection to the Management Service. When a Management Agent needs to upload collected monitoring data or an urgent target state change, the Management Agent establishes a connection to the Management Service. If the connection is not possible, such as in the case of a network failure or a host failure, the Management Agent retains the data and reattempts to send the information later.
To protect against the situation where a Management Service is unavailable, you can use a load balancer between the Management Agents and the Management Services.
However, if you decide to implement such a configuration, be sure to understand the flow of data when load balancing the upload of management data.
Figure 3-4 shows a typical scenario where a set of Management Agents upload their data to a load balancer, which redirects the data to one of two Management Service installations.
In this example, only the upload of Management Agent data is routed through the load balancer. The Grid Control Console still connects directly to a single Management Service by way of the unique Management Service upload URL. This abstraction allows Grid Control to present a consistent URL to both Management Agents and Grid Control consoles, regardless of the loss of any one Management Service component.
When you load balance the upload of Management Agent data to multiple Management Service installations, the data is directed along the following paths:
Administrators can use one of two URLs to access the Grid Control Console just as they do in the multiple Management Service configuration.
Each Management Agent uploads its data to a common load balancer URL. This URL is defined in the emd.properties file for each Management Agent. In other words, the Management Agents connect to a virtual service exposed by the load balancer. The load balancer routes the request to any one of a number of available servers that provide the requested service.
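Repointing an existing Management Agent at the load balancer amounts to rewriting REPOSITORY_URL in emd.properties. In this sketch, slb.example.com:1159 is a hypothetical load balancer virtual service, and the file lives in /tmp purely for illustration.

```shell
# Sample agent configuration pointing at a specific Management Service.
cat > /tmp/emd_slb.properties <<'EOF'
REPOSITORY_URL=https://hostA.example.com:1159/em/upload/
EOF

# Rewrite the upload URL to use the load balancer virtual service.
sed -i 's|^REPOSITORY_URL=.*|REPOSITORY_URL=https://slb.example.com:1159/em/upload/|' /tmp/emd_slb.properties
grep '^REPOSITORY_URL=' /tmp/emd_slb.properties
```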
Each Management Service, upon receipt of data, stores it temporarily in a local file and acknowledges receipt to the Management Agent. The Management Services then coordinate amongst themselves and one of them loads the data in a background thread in the correct chronological order.
Also, each Management Service communicates by way of JDBC with a common Management Repository, just as they do in the multiple Management Service configuration defined in Section 3.4.
Each Management Service communicates directly with each Management Agent by way of HTTP, just as they do in the multiple Management Service configuration defined in Section 3.4.
Using a load balancer to manage the flow of data from the Management Agents is not the only way in which a load balancer can help you configure a highly available Grid Control environment. You can also use a load balancer to balance the load and to provide a failover solution for the Grid Control Console.
The following sections provide more information about this configuration:
Figure 3-5 shows a typical configuration where a load balancer is used between the Management Agents and multiple Management Services, as well as between the Grid Control Console and multiple Management Services.
In this example, a single load balancer is used for the upload of data from the Management Agents and for the connections between the Grid Control Console and the Management Service.
When you use a load balancer for the Grid Control Console, the management data uses the following paths through the Grid Control architecture:
Administrators use one URL to access the Grid Control Console. This URL directs the browser to the load balancer virtual service. The virtual service redirects the browser to one of the Management Service installations. Depending upon the host name and port selected by the load balancer from the virtual pool of Management Service installations, the Grid Control Console obtains the management data by way of OracleAS Web Cache and the Oracle HTTP Server on one of the Management Service hosts.
Each Management Agent uploads its data to a common load balancer URL (as described in Section 3.5.1) and data is written to the shared receive directory.
Each Management Service communicates by way of JDBC with a common Management Repository, just as they do in the multiple Management Service configuration defined in Section 3.4.
Each Management Service communicates directly with each Management Agent by way of HTTP, just as they do in the multiple Management Service configuration defined in Section 3.4.
The Management Service is implemented as a J2EE Web application, which is deployed on an instance of Oracle Application Server. Like many Web-based applications, the Management Service often redirects the client browser to a specific set of HTML pages, such as a logon screen and a specific application component or feature.
When the Oracle HTTP Server redirects a URL, it sends the URL, including the Oracle HTTP Server host name, back to the client browser. The browser then uses that URL, which includes the Oracle HTTP Server host name, to reconnect to the Oracle HTTP Server. As a result, the client browser attempts to connect directly to the Management Service host and bypasses the load balancer.
To prevent the browser from bypassing the load balancer when a URL is redirected, edit the ServerName directive defined in the Oracle HTTP Server configuration file. This directive is found in one of two places:
If you have enabled Enterprise Manager Framework Security and you are running the Management Service in a secure environment (using HTTPS and SSL), the ServerName directive you must change is located in the following configuration file:
If you have not enabled Enterprise Manager Framework Security and you are running in an environment that is not secure (using HTTP), the ServerName directive you must change is located in the following configuration file:
Change the ServerName directive so it matches the name of the load balancer virtual service that you configured.
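The edit itself is a one-line change; the sketch below applies it with sed to a sample configuration file. The file path, hostnames, and port are hypothetical, and in a real installation you would edit the configuration file shipped with the Management Service instead.

```shell
# Sample Oracle HTTP Server configuration; names and port are hypothetical.
cat > /tmp/httpd.conf <<'EOF'
ServerName hostA.example.com
Listen 7778
EOF

# Point ServerName at the load balancer virtual service so redirected
# URLs keep going through the load balancer.
sed -i 's|^ServerName .*|ServerName slb.example.com|' /tmp/httpd.conf
grep '^ServerName' /tmp/httpd.conf
```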
See Also: Oracle HTTP Server Administrator's Guide
The Management Service for Grid Control 10g Release 2 has a high availability feature called the Shared Filesystem Loader. In the Shared Filesystem Loader, management data files received from Management Agents are stored temporarily on a common shared location called the shared receive directory. All Management Services are configured to use the same storage location for the shared receive directory. The Management Services coordinate internally and distribute amongst themselves the workload of uploading files into the Management Repository. Should a Management Service go down for some reason, its workload is taken up by surviving Management Services.
To configure the Management Service to use the Shared Filesystem Loader, you must use the emctl config oms loader command:
Stop all the Management Services.
Configure a shared receive directory that is accessible by all Management Services.
Run emctl config oms loader -shared yes -dir <loader directory> individually on all Management Service hosts, where <loader directory> is the full path to the shared receive directory.
Restart all Management Services.
Caution: Shared Filesystem Loader mode must be configured on all the Management Services in your Grid Control deployment using the previous steps. Management Services will fail to start if they are not all configured to run in the same mode.
This section describes guidelines you can use for configuring a load balancer to balance the upload of data from Management Agents to multiple Management Service installations.
In the following examples, assume that you have installed two Management Service processes on Host A and Host B. For ease of installation, start with two hosts that have no application server processes installed. This ensures that the default ports are used as seen in the following table. The examples use these default values for illustration purposes.
|Name||Default Value||Description||Source||Defined By|
|Secure Upload Port||1159||Used for secure upload of management data from Management Agents.||httpd_em.conf and emoms.properties||Install. Can be modified by|
|Agent Registration Port||4889||Used by Management Agents during the registration phase to download Management Agent wallets, for example, during||httpd_em.conf and emoms.properties|| |
|Secure Console Port||4444||Used for secure (https) console access.|| || |
|Unsecure Console Port|| ||Used for unsecure (http) console access.|| || |
|Webcache Secure Port|| ||Used for secure (https) console access.|| || |
|Webcache Unsecure Port||7777||Used for unsecure (http) console access.|| || |
By default, the service name on the Management Service-side certificate uses the name of the Management Service host. Management Agents do not accept this certificate when they communicate with the Management Service through a load balancer. You must run the following command to regenerate the certificate on both Management Services:
emctl secure -oms -sysman_pwd <sysman_pwd> -reg_pwd <agent_reg_password> -host slb.acme.com -secure_port 1159
Specifically, you should use the administration tools that are packaged with your load balancer to configure a virtual pool that consists of the hosts and the services that each host provides. In the case of the Management Services pool, specify the host name and Management Agent upload port. To ensure a highly available Management Service, you should have two or more Management Services defined within the virtual pool. A sample configuration follows.
In this sample, both pools and virtual servers are created.
Pool abc_upload: Used for secure upload of management data from Management Agents to Management Services
  Members: hostA:1159, hostB:1159
  Persistence: None
  Load Balancing: round robin
Pool abc_genWallet: Used for securing new Management Agents
  Members: hostA:4889, hostB:4889
  Persistence: Active HTTP Cookie, method-> insert, expiration 60 minutes
  Load Balancing: round robin
Pool abc_uiAccess: Used for secure console access
  Members: hostA:4444, hostB:4444
  Persistence: SSL Session ID, timeout-> 3000 seconds (should be greater than the OC4J session timeout of 45 minutes)
  Load Balancing: round robin
Create Virtual Servers
Virtual Server for secure upload
  Address: slb.acme.com
  Service: 1159
  Pool: abc_upload
Virtual Server for Management Agent registration
  Address: slb.acme.com
  Service: 4889
  Pool: abc_genWallet
Virtual Server for UI access
  Address: sslb.acme.com
  Service: https i.e. 443
  Pool: abc_uiAccess
Modify the REPOSITORY_URL property in the emd.properties file located in the sysman/config directory of the Management Agent home directory. The host name and port specified must be that of the load balancer virtual service.
See Also: "Configuring the Management Agent to Use a New Management Service" for more information about modifying the REPOSITORY_URL property for a Management Agent
Declare the pool to use a load balancing policy, for example, Round Robin or Least Loaded. Do not configure persistence between Management Agents and Management Services.
This configuration allows the distribution of connections from Management Agents equally between Management Services. In the event a Management Service becomes unavailable, the load balancer should be configured to direct connections to the surviving Management Services.
To successfully implement this configuration, the load balancer can be configured to monitor the underlying Management Service. On some models, for example, you can configure a monitor on the load balancer. The monitor defines the:
HTTP request that is to be sent to a Management Service
Expected result in the event of success
Frequency of evaluation
For example, the load balancer can be configured to check the state of the Management Service every 5 seconds. On three successive failures, the load balancer can then mark the component as unavailable and no longer route requests to it. The monitor should be configured to send the string GET /em/upload over HTTP and expect to get the response Http XML File receiver. See the following sample monitor configuration.
In this sample, three monitors are configured: mon_upload, mon_genWallet, and mon_uiAccess.
Monitor mon_upload
  Type: https
  Interval: 60
  Timeout: 181
  Send String: GET /em/upload HTTP 1.0\n
  Receive Rule: Http Receiver Servlet active!
  Associate with: hostA:1159, hostB:1159
Monitor mon_genWallet
  Type: http
  Interval: 60
  Timeout: 181
  Send String: GET /em/genwallet HTTP 1.0\n
  Receive Rule: GenWallet Servlet activated
  Associate with: hostA:4889, hostB:4889
Monitor mon_uiAccess
  Type: https
  Interval: 5
  Timeout: 16
  Send String: GET /em/console/home HTTP/1.0\nUser-Agent: Mozilla/4.0 (compatible; MSIE 6.0, Windows NT 5.0)\n
  Receive Rule: /em/console/logon/logon;jsessionid=
  Associate with: hostA:4444, hostB:4444
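The "three successive failures" rule such a monitor applies can be sketched as a small state machine. The probe results below are invented, with 0 standing for a successful health check and 1 for a failure.

```shell
# Simulated health-check results: 0 = success, 1 = failure (made-up data).
probes="0 1 1 0 1 1 1"

consecutive=0
status="available"
for p in $probes; do
  if [ "$p" -eq 1 ]; then
    consecutive=$((consecutive + 1))
    # Mark the Management Service down after three failures in a row.
    if [ "$consecutive" -ge 3 ]; then
      status="unavailable"
    fi
  else
    consecutive=0
    status="available"
  fi
done
echo "final status: $status"   # final status: unavailable
```

Note how the isolated pair of failures in the middle does not trip the rule; only the final run of three does.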
Note: The network bandwidth requirements on the load balancer need to be reviewed carefully. Monitor the traffic being handled by the load balancer using the administrative tools packaged with your load balancer. Ensure that the load balancer is capable of handling the traffic passing through it. For example, deployments with a large number of targets can easily exhaust a 100 Mbps Ethernet card. A Gigabit Ethernet card would be required in such cases.
See Also: Your load balancer documentation for more information on configuring virtual pools, load balancing policies, and monitoring network traffic
When you configure Grid Control for high availability, there are several ways to configure the Management Repository to prevent the loss of management data stored in the database. One way is to specify the size of the Management Repository tablespaces in a Real Application Cluster (RAC) database.
When you install the Management Repository into a RAC database instance, you cannot set the size of the required Enterprise Manager tablespaces. You can, however, specify the name and location of data files to be used by the Management Repository schema. The default sizes for the initial data file extents depend on using the AUTOEXTEND feature and as such are insufficient for a production installation. This is particularly problematic when storage for the RAC database is on a raw device.
The preferred option for high availability installs is to use Oracle Automatic Storage Management (ASM) to manage database storage. The location string must be specified manually (for example,
+DATA/<database_name>/<datafile_name>). If ASM storage is used, there is no need to disable any space management storage settings.
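As a hedged sketch of what this looks like in practice, the following adds a data file on ASM storage to the repository tablespace after installation. MGMT_TABLESPACE is the default Enterprise Manager repository tablespace name; the disk group name +DATA and the 1 GB size are assumptions to adapt to your environment.

```shell
#!/bin/sh
# Sketch only: add an ASM-managed data file to the repository tablespace.
# Run as a user with SYSDBA access on the repository database host.
sqlplus -s / as sysdba <<'EOF'
-- +DATA is an assumed disk group name; ASM names the file automatically.
ALTER TABLESPACE MGMT_TABLESPACE
  ADD DATAFILE '+DATA' SIZE 1G AUTOEXTEND ON;
EOF
```

Because ASM handles the physical layout, no space management settings need to be disabled afterward.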
You can create multiple raw partitions, with the first one equal to the default size of the tablespace as defined by the installation process.
Alternatively, you can create the tablespace using the default size, create a dummy object that will increase the size of the tablespace to the end of the raw partition, then drop that object.
In either case, if raw devices are used, disable the default space management for these data files, which is to autoextend.
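The dummy-object technique described above can be sketched as follows. This is illustrative only: the table name, the row count used to fill the partition, and the raw device path are all hypothetical and must be sized to your own partition.

```shell
#!/bin/sh
# Sketch only: grow a raw-device tablespace to the end of its partition
# with a throwaway object, then turn off autoextend. All names are
# illustrative; adjust the fill size to your raw partition.
sqlplus -s / as sysdba <<'EOF'
-- Dummy object that forces the tablespace to extend
CREATE TABLE sysman.dummy_fill (filler CHAR(2000))
  TABLESPACE mgmt_tablespace;
INSERT INTO sysman.dummy_fill
  SELECT RPAD('x', 2000, 'x') FROM dual CONNECT BY LEVEL <= 100000;
DROP TABLE sysman.dummy_fill;
-- Disable the default space management (autoextend) on the data file
ALTER DATABASE DATAFILE '/dev/raw/raw1' AUTOEXTEND OFF;
EOF
```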
The following sections document best practices for installation and configuration of each Grid Control component.
By default, the Management Agent is started manually. It is important that the Management Agent be started automatically when the host is booted to ensure monitoring of critical resources on the administered host. To that end, use any and all operating system mechanisms to automatically start the Management Agent. For example, on UNIX systems this is done by placing an entry in
/etc/init.d that calls the Management Agent on boot, or on Windows by setting the Management Agent service to start automatically.
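A minimal init script along these lines is sketched below. The AGENT_HOME path and the oracle user name are assumptions; substitute the values for your installation, and register the script with your platform's runlevel tooling.

```shell
#!/bin/sh
# Sketch of an /etc/init.d script that starts the Management Agent at
# boot. AGENT_HOME and the 'oracle' user are assumed values.
AGENT_HOME=/u01/app/oracle/agent10g

case "$1" in
  start)
    su - oracle -c "$AGENT_HOME/bin/emctl start agent"
    ;;
  stop)
    su - oracle -c "$AGENT_HOME/bin/emctl stop agent"
    ;;
  *)
    echo "Usage: $0 {start|stop}"
    exit 1
    ;;
esac
```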
Once the Management Agent is started, the watchdog process monitors the Management Agent and attempts to restart it in the event of a failure. The behavior of the watchdog is controlled by environment variables set before the Management Agent process starts. The variables that control this behavior follow. All testing discussed here was done with the default settings.
EM_MAX_RETRIES – This is the maximum number of times the watchdog will attempt to restart the Management Agent within the EM_RETRY_WINDOW. The default is to attempt restart of the Management Agent 3 times.
EM_RETRY_WINDOW - This is the time interval in seconds that is used together with the EM_MAX_RETRIES environmental variable to determine whether the Management Agent is to be restarted. The default is 600 seconds.
The watchdog will not restart the Management Agent if it detects that the Management Agent has already required more than EM_MAX_RETRIES restarts within the EM_RETRY_WINDOW time period.
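Because the watchdog reads these variables from the environment of the process that starts the Management Agent, export them before running 'emctl start agent'. The values below are illustrative, not recommendations; the tested defaults are 3 retries in 600 seconds.

```shell
#!/bin/sh
# Illustrative watchdog tuning: allow up to 5 restarts within 15 minutes.
EM_MAX_RETRIES=5       # maximum restart attempts within the window
EM_RETRY_WINDOW=900    # window length in seconds
export EM_MAX_RETRIES EM_RETRY_WINDOW

# Then start the Management Agent from this environment:
# emctl start agent    # commented out: requires an installed Management Agent
```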
The Management Agent persists its intermediate state and collected information using local files in the
$AGENT_HOME/$HOSTNAME/sysman/emd subtree under the Management Agent home directory.
In the event that these files are lost or corrupted before being uploaded to the Management Repository, a loss of monitoring data and any pending alerts not yet uploaded to the Management Repository occurs.
At a minimum, configure these subdirectories on striped redundant or mirrored storage. Availability would be further enhanced by placing the entire $AGENT_HOME on redundant storage. The Management Agent home directory can be determined by entering the command '
emctl getemhome' on the command line, or from the Management Services and Repository tab and Agents tab in the Grid Control console.
The Management Service stores intermediate collected data in the loader receive directory before it is loaded into the Management Repository. This directory is typically empty when the Management Service is able to load data as quickly as it is received. Once the files are received by the Management Service, the Management Agent considers them committed and therefore removes its local copy. If these files are lost before being uploaded to the Management Repository, data loss will occur. At a minimum, configure these subdirectories on striped redundant or mirrored storage. When Management Services are configured for the Shared Filesystem Loader, all services share the same loader receive directory. It is recommended that the shared loader receive directory be on a clustered file system, such as a NetApp Filer.
Similar to the Management Agent directories, availability would be further enhanced by placing the entire Management Service software tree on redundant storage. Its location can also be determined at the command line using '
emctl getemhome', or from the Management Services and Repository tab in the Grid Control console.
Grid Control comes preconfigured with a series of default rules to monitor many common targets. These rules can be extended to monitor the Grid Control infrastructure as well as the other targets on your network to meet specific monitoring needs.
The following list is a set of recommendations that extend the default monitoring performed by Enterprise Manager. Use the Notification Rules link on the Preferences page to adjust the default rules provided on the Configuration/Rules page:
Ensure the Agent Unreachable rule is set to alert on all agent unreachable and agent clear errors.
Ensure the Repository Operations Availability rule is set to notify on any unreachable problems with the Management Service or Management Repository nodes. Also modify this rule to alert on the Targets Not Providing Data condition and any database alerts that are detected against the database serving as the Management Repository.
Modify the Agent Upload Problems Rule to alert when the Management Service status has hit a warning or clear threshold.
Enterprise Manager provides error reporting mechanisms through e-mail notifications, PL/SQL packages, and SNMP alerts. Configure these mechanisms based on the infrastructure of the production site. If using e-mail for notifications, configure the notification rule through the Grid Control console to notify administrators using multiple SMTP servers if they are available. This can be done by modifying the default e-mail server setting on the Notification Methods option under Setup.
Backup procedures for the database are well-established standards. Configure backups for the Management Repository using the RMAN interface provided in the Grid Control console. Refer to the RMAN documentation or the Maximum Availability Architecture document for detailed implementation instructions.
In addition to the Management Repository, the Management Service and Management Agent should also have regular backups. Backups should be performed after any configuration change. Best practices for backing up these tiers are documented in Section 11.3, "Oracle Enterprise Manager Backup, Recovery, and Disaster Recovery Considerations".
In the event of a problem with Grid Control, the starting point for any diagnostic effort is the console itself. The Management System tab provides access to an overview of all Management Service operations and current alerts. Other pages summarize the health of Management Service processes and logged errors. These pages are useful for determining the causes of any performance problems, as the summary page shows a historical view of the number of files waiting to be loaded into the Management Repository and the amount of work waiting to be completed by Management Agents.
When assessing the health and availability of targets through the Grid Control console, remember that information may be slow to appear in the UI, especially after a Management Service outage. The state of a target in the Grid Control console may be delayed after a state change on the monitored host. Use the Management System page to gauge the backlog of pending files to be processed.
The model used by the Management Agent to assess the state of health of any particular monitored target is poll based. The Management Agent posts a notification to the Management Service as soon as a change in state is detected, but because the Management Agent polls, there is some potential delay before it actually detects that change in state.