Oracle® Enterprise Manager Cloud Control Advanced Installation and Configuration Guide
12c Release 1 (12.1.0.1)
E24089-15

13 Sizing Your Enterprise Manager Deployment

Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1 has the ability to scale for hundreds of users and thousands of systems and services on a single Enterprise Manager implementation.

This chapter describes techniques for achieving optimal performance using the Oracle Enterprise Manager application. It can also help you with capacity planning, sizing, and maximizing Enterprise Manager performance in a large-scale environment. By performing routine housekeeping and monitoring performance regularly, you ensure that you have the data required to make accurate forecasts of future sizing requirements. Establishing good baseline values for the Enterprise Manager Cloud Control vital signs and setting reasonable warning and critical thresholds on those baselines allows Enterprise Manager to monitor itself for you.

This chapter contains the following sections:

Overview of Oracle Enterprise Manager Cloud Control Architecture

The architecture for Oracle Enterprise Manager Cloud Control exemplifies two key concepts in application performance tuning: distribution and parallelization of processing. Each component of Cloud Control can be configured to apply both these concepts.

The components of Enterprise Manager Cloud Control include:

For more information about the Cloud Control architecture, see the Oracle Enterprise Manager Cloud Control 12c Release 12.1.0.1 documentation:

The Oracle Enterprise Manager 12c documentation is available at the following location on the Oracle Technology Network (OTN):

http://otn.oracle.com/documentation/oem.html

Enterprise Manager Cloud Control Sizing

Oracle Enterprise Manager provides a highly available and scalable deployment topology. This chapter lays out the basic minimum sizing and tuning recommendations for initial capacity planning for your Oracle Enterprise Manager deployment. This chapter assumes a basic understanding of Oracle Enterprise Manager components and systems. A complete description of Oracle Enterprise Manager can be obtained from http://docs.oracle.com/cd/E24628_01/doc.121/e25353/overview.htm. This information is a starting point for site sizing. Every site has its own characteristics and should be monitored and tuned as needed.

Sizing is a critical factor for Enterprise Manager performance. Inadequately sized Enterprise Manager deployments will result in frustrated users and the overall benefits of Enterprise Manager may be compromised. The resources required for Enterprise Manager OMS and Repository tiers will vary significantly based on the number of monitored targets. While there are many additional aspects to be considered when sizing Enterprise Manager infrastructure, the following guidelines provide a simple methodology that can be followed to determine the minimum required hardware resources and initial configuration settings for the OMS and Repository tiers.

Overview of Sizing Guidelines

The following sections provide an overview of the sizing guidelines.

Hardware Information

The sizing guidelines outlined in this chapter were obtained by running a virtual environment on the following hardware and operating system combination.

  • Hardware -- Oracle's Sun Fire X4170 M2

  • Hypervisor -- 64 bit Linux Oracle Virtual Server

  • Operating System of Virtual Machines -- 64 bit Oracle Enterprise Linux

The virtual environment setup had a one to one mapping of CPUs between the Oracle Virtual Server (OVS) host and the virtual machines running on it. The OVS servers had enough RAM to support all virtual machines without memory swapping.

This information is based on a 64 bit Oracle Enterprise Linux environment. If you are running on other platforms, you will need to convert the sizing information based on similar hardware performance. This conversion should be based on single-thread performance. Running on a machine with 24 slow cores is not equivalent to running on a machine with 12 fast cores, even though the total machine performance might be the same on a throughput benchmark. Single-thread performance is critical for good Enterprise Manager user interface response times.

Sizing Specifications

The sizing guidelines for Oracle Enterprise Manager are divided into three sizes: Small, Medium and Large. The definitions of each size are shown in Table 13-1.

Table 13-1 Oracle Enterprise Manager Site Sizes

Size    Agent Count      Target Count        Concurrent User Sessions
Small   < 100            < 1000              < 10
Medium  >= 100, < 1000   >= 1000, < 10,000   >= 10, < 25
Large   >= 1000          >= 10,000           >= 25, <= 50*

*For larger user loads see Large Concurrent UI Load.
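The size definitions in Table 13-1 can be expressed as a small lookup. The following Python sketch is illustrative only: the function name `classify_site` is hypothetical, and it assumes that when the three counts fall into different rows, the largest size applies (the table itself does not state how to resolve that case).

```python
def classify_site(agents: int, targets: int, sessions: int) -> str:
    """Return 'Small', 'Medium', or 'Large' using the Table 13-1 boundaries.

    Assumption (not stated in the guide): if the three counts point to
    different sizes, the largest size wins.
    """
    def rank(value: int, medium_floor: int, large_floor: int) -> int:
        if value >= large_floor:
            return 2
        if value >= medium_floor:
            return 1
        return 0

    size = max(
        rank(agents, 100, 1000),      # Agent Count column
        rank(targets, 1000, 10_000),  # Target Count column
        rank(sessions, 10, 25),       # Concurrent User Sessions column
    )
    return ("Small", "Medium", "Large")[size]


if __name__ == "__main__":
    print(classify_site(agents=80, targets=900, sessions=5))
    print(classify_site(agents=150, targets=900, sessions=5))
    print(classify_site(agents=1500, targets=15_000, sessions=40))
```

Sites expecting more than 50 concurrent sessions fall outside this table; see Large Concurrent UI Load.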

Sizing for Upgraded Installs

For upgraded installs, the following queries can be run as the sysman user to obtain the Management Agent and target counts for use in Table 13-1.

  • Agent count -- select count(*) from mgmt_targets where target_type = 'oracle_emd'

  • Target count -- select count(*) from mgmt_targets where target_type != 'oracle_emd'

Minimum Hardware Requirements

Table 13-2 lists the minimum hardware requirements for the three configurations.

Table 13-2 Oracle Enterprise Manager Minimum Hardware Requirements

Size    OMS Machine Count*  Cores per OMS  Memory per OMS (GB)  Database Machine Count*  Cores per Database Machine  Memory per Database Machine (GB)
Small   1                   2              6                    1                        2                           6
Medium  2                   4              8                    2 (RAC)                  4                           8
Large   2                   8              16                   2 (RAC)                  8                           16
Large   4                   4              8                    2 (RAC)                  8                           16

*The OMS and database instances are not co-located


Network Topology Considerations

A critical consideration when deploying Enterprise Manager Cloud Control is network performance between tiers. Enterprise Manager Cloud Control ensures tolerance of network glitches, failures, and outages between application tiers through error tolerance and recovery. The Management Agent in particular is able to handle a less performant or reliable network link to the Management Service without severe impact to the performance of Enterprise Manager as a whole. The scope of the impact, as far as a single Management Agent's data being delayed due to network issues, is not likely to be noticed at the Enterprise Manager Cloud Control system wide level.

The impact of slightly higher network latencies between the Management Service and Management Repository will be substantial, however. Implementations of Enterprise Manager Cloud Control have experienced significant performance issues when the network link between the Management Service and Management Repository is not of sufficient quality.

The Management Service host and Repository host should be located in close proximity to each other. Ideally, the round trip network latency between the two should be less than 1 millisecond.

Software Configurations

The following sections provide information about small, medium and large configurations.

Small Configuration

The Small configuration is based on the minimum requirements of the Oracle Enterprise Manager installer.

Minimum OMS Settings

No additional settings are required.

Minimum Database Settings

Table 13-3 lists the minimum recommended database settings.

Table 13-3 Small Site Minimum Database Settings

Parameter               Minimum Value
processes               300
pga_aggregate_target*   1024M
sga_target*             2G
redo log file size      300M
shared_pool_size        600M

*memory_target of 3GB can be used in place of sga_target and pga_aggregate_target


Medium Configuration

The Medium configuration modifies several out-of-box Oracle Enterprise Manager settings.

Minimum OMS Settings

The Oracle Management Service (OMS) heap size should be set to 4096m.

Minimum Repository Database Settings

Table 13-4 lists the minimum repository database settings that are recommended for a Medium configuration.

Table 13-4 Medium Site Minimum Database Settings

Parameter               Minimum Value
processes               600
pga_aggregate_target*   1280M
sga_target*             4G
redo log file size      600M
shared_pool_size        600M

*memory_target of 5.25GB can be used in place of sga_target and pga_aggregate_target


Large Configuration

The Large configuration modifies several out-of-box Oracle Enterprise Manager settings.

Minimum OMS Settings

Table 13-5 lists the minimum OMS settings that are recommended for Large configurations.

Table 13-5 Large Site Minimum OMS Settings

OMS Count   Heap Size Minimum Value
2           8192M
4           4096M


Minimum Repository Database Settings

Table 13-6 lists the minimum repository database settings that are recommended for a Large configuration.

Table 13-6 Large Site Minimum Database Settings

Parameter               Minimum Value
processes               1000
pga_aggregate_target*   1536M
sga_target*             6G
redo log file size      1000M
shared_pool_size        600M

*memory_target of 7.5GB can be used in place of sga_target and pga_aggregate_target
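The memory_target footnotes for the Small, Medium, and Large database settings are each the sum of the sga_target and pga_aggregate_target minimums for that size. The following Python sketch reproduces that arithmetic. It assumes 1G = 1024M, and the names `SETTINGS_MIB` and `memory_target_gib` are hypothetical.

```python
MIB_PER_GIB = 1024  # assumption: 1G = 1024M

# (sga_target, pga_aggregate_target) minimums in MiB, copied from the
# Small, Medium, and Large site minimum database settings tables.
SETTINGS_MIB = {
    "Small": (2 * 1024, 1024),
    "Medium": (4 * 1024, 1280),
    "Large": (6 * 1024, 1536),
}


def memory_target_gib(size: str) -> float:
    """Return sga_target + pga_aggregate_target for a site size, in GiB."""
    sga_mib, pga_mib = SETTINGS_MIB[size]
    return (sga_mib + pga_mib) / MIB_PER_GIB


if __name__ == "__main__":
    for size in SETTINGS_MIB:
        print(size, memory_target_gib(size), "GB")
```

The results match the 3GB, 5.25GB, and 7.5GB values given in the table footnotes.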


Repository Tablespace Sizing

Table 13-7 lists the minimum storage requirements for the Management Repository.

Table 13-7 Total Management Repository Storage

Minimum Tablespace Sizes*

Deployment Size  SYSTEM**  MGMT_TABLESPACE  MGMT_ECM_DEPOT_TS  MGMT_AD4J_TS  TEMP
Small            600 MB    50 GB            1 GB               100 MB        10 GB
Medium           600 MB    200 GB           4 GB               200 MB        20 GB
Large            600 MB    300 GB           Greater than 4 GB  400 MB        40 GB

*These are strictly minimum values and are intended as rough guidelines only. The actual size of the MGMT_TABLESPACE could vary widely from deployment to deployment due to variations in target type distribution, user customization, and several other factors. These tablespaces are defined with AUTOEXTEND set to ON by default to help mitigate space constraint issues. On raw file systems Oracle recommends using more than the minimum size to help prevent space constraint issues.
**The SYSTEM and TEMP tablespace sizes are minimums for Enterprise Manager only repositories. If Enterprise Manager is sharing the repository database with other application(s), these minimums may be too low.
Note: You cannot monitor tablespaces through the use of alerts with auto extended files in version 11g of Enterprise Manager. You can either set up TABLESPACE FULL alerts if you want greater control over the management of your tablespaces, or you can allow Oracle to grow your database through the AUTOEXTEND feature without alerting you. To exercise greater control through TABLESPACE FULL alerts, turn off autoextend.


Additional Configurations

Some Enterprise Manager installations may need additional tuning settings based on larger individual system loads. Additional settings are listed below.

Large Concurrent UI Load

If more than 50 concurrent users are expected per OMS, the following settings should be changed as shown in Table 13-8.

Table 13-8 Large Concurrent UI Load Additional Settings

Process   Parameter                      Value                                            Where To Set
OMS       -Djbo.recyclethreshold         Number of concurrent users / number of OMS       Per OMS
OMS       -Djbo.ampool.maxavailablesize  Number of concurrent users / number of OMS       Per OMS
OMS       Heap Size                      Additional 4GB for every increment of 50 users   Per OMS
Database  sga_target                     Additional 1GB for every increment of 50 users   Per Instance


Higher user loads will require more hardware capacity: add 2 cores to both the database and OMS hosts for every additional 50 concurrent users.

Example: A site with 1500 agents and 15,000 targets with 150 concurrent users would require at a minimum the setting modifications listed in Table 13-9 (based on a LARGE 2 OMS configuration).

Table 13-9 Large Concurrent UI Load Additional Settings Example for 2 OMS Configurations

Process   Parameter                      Value                   Calculation
OMS       -Djbo.recyclethreshold         75 (set on each OMS)    150 users / 2 OMS
OMS       -Djbo.ampool.maxavailablesize  75 (set on each OMS)    150 users / 2 OMS
OMS       Heap Size                      12GB (set on each OMS)  8GB (standard large setting) + ((150 users - 50 default large user load) / 2 OMS) * (4GB / 50 users)
Database  sga_target                     8GB                     6GB (standard large setting) + (150 users - 50 default large user load) * (1GB / 50 users)


The minimum additional hardware required is listed in Table 13-10.

Table 13-10 Large Concurrent UI Load Minimum Additional Hardware Example For 2 OMS Configuration

Tier      Parameter  Value                                  Calculation
OMS       CPU cores  24 (total between all OMS hosts)       8 cores * 2 OMS (default large core count) + (150 users - 50 default large user load) * (2 cores * 2 OMS / 50 users)
Database  CPU cores  24 (total between all Database hosts)  8 cores * 2 database hosts (default large core count) + (150 users - 50 default large user load) * (2 cores * 2 database hosts / 50 users)


The physical memory of each machine would have to be increased to support running this configuration as well.
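The calculations in this section can be collected into one function. The following Python sketch is illustrative only: the function and parameter names are hypothetical, the defaults are the Large 2 OMS values from this chapter, and it assumes the adjustments scale linearly with the user count (as in the worked example) rather than being rounded up to whole increments of 50 users.

```python
BASE_USERS = 50  # default Large user load used in the worked example


def large_ui_load(users, oms_count=2, db_hosts=2,
                  base_heap_gb=8.0, base_sga_gb=6.0, base_cores_per_host=8):
    """Estimate settings for a Large site with more than 50 concurrent users.

    Assumption: linear scaling, exactly as in the 150-user example.
    """
    extra_users = max(users - BASE_USERS, 0)
    return {
        # -Djbo.recyclethreshold and -Djbo.ampool.maxavailablesize, per OMS
        "jbo_per_oms": users / oms_count,
        # Additional 4GB heap per 50 users, spread across the OMS instances
        "heap_gb_per_oms": base_heap_gb + extra_users * 4 / (50 * oms_count),
        # Additional 1GB sga_target per 50 users, per instance
        "sga_target_gb": base_sga_gb + extra_users * 1 / 50,
        # Additional 2 cores per host for every 50 users
        "oms_cores_total": base_cores_per_host * oms_count
                           + extra_users * 2 * oms_count / 50,
        "db_cores_total": base_cores_per_host * db_hosts
                          + extra_users * 2 * db_hosts / 50,
    }


if __name__ == "__main__":
    for name, value in large_ui_load(users=150).items():
        print(f"{name}: {value}")
```

For 150 users on 2 OMS instances this reproduces the example values: 75 for both jbo parameters, a 12GB heap per OMS, an 8GB sga_target, and 24 total cores on each tier.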

BI Publisher Configuration

If you plan on integrating BI Publisher version 11.1.1.5 with Enterprise Manager Release 12c Cloud Control, which is required for BI Publisher reports to function, add 1.5 GB to the host memory requirements stated above.

Enterprise Manager Cloud Control Performance Methodology

An accurate predictor of capacity at scale is the actual metric trend information from each individual Enterprise Manager Cloud Control deployment. This information, combined with an established, rough, starting host system size and iterative tuning and maintenance, produces the most effective means of predicting capacity for your Enterprise Manager Cloud Control deployment. It also assists in keeping your deployment performing at an optimal level.

Here are the steps to follow to enact the Enterprise Manager Cloud Control sizing methodology:

  1. If you have not already installed Enterprise Manager Cloud Control, choose a rough starting host configuration as listed in Table 13-1.

  2. Periodically evaluate your site's vital signs (detailed later).

  3. Eliminate bottlenecks using routine DBA/Enterprise Manager administration housekeeping.

  4. Eliminate bottlenecks using tuning.

  5. Extrapolate linearly into the future to plan for future sizing requirements.

Step one need only be done once for a given deployment. Steps two, three, and four must be done on a regular basis for the life of the deployment, regardless of whether you plan to grow your Enterprise Manager Cloud Control site. These steps are essential to an efficient Enterprise Manager Cloud Control site regardless of its size or workload. It is critical that you complete steps two, three, and four before you continue on to step five. Step five is only required if you intend to grow the deployment size in terms of monitored targets. However, evaluating these trends regularly can be helpful in evaluating any other changes to the deployment.
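Step five, linear extrapolation, can be sketched as an ordinary least-squares line fitted to periodic measurements of a vital sign. The following Python example is a minimal sketch: the function names are hypothetical and the sample data (monthly repository size in GB) is invented for illustration, not taken from a real site.

```python
def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(xs, ys, future_x):
    """Project the fitted line out to future_x."""
    slope, intercept = linear_fit(xs, ys)
    return slope * future_x + intercept


if __name__ == "__main__":
    months = [0, 1, 2, 3]
    repo_size_gb = [50.0, 52.0, 54.0, 56.0]  # hypothetical measurements
    print("Projected size at month 12:", extrapolate(months, repo_size_gb, 12))
```

A projection like this is only meaningful once housekeeping and tuning (steps three and four) are up to date, because bottlenecks distort the trend.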

Step 1: Choosing a Starting Platform Cloud Control Deployment

For information about choosing a starting platform Cloud Control deployment, see Overview of Sizing Guidelines.

Step 2: Periodically Evaluating the Vital Signs of Your Site

This is the most important step of the five. Without some degree of monitoring and understanding of trends or dramatic changes in the vital signs of your Enterprise Manager Cloud Control site, you are placing site performance at serious risk. Every monitored target sends data to the Management Repository for loading and aggregation through its associated Management Agent. This adds up to a considerable volume of activity that requires the same level of management and maintenance as any other enterprise application.

Enterprise Manager has "vital signs" that reflect its health. These vital signs should be monitored for trends over time as well as against established baseline thresholds. You must establish realistic baselines for the vital signs when performance is acceptable. Once baselines are established, you can use built-in Oracle Enterprise Manager Cloud Control functionality to set baseline warning and critical thresholds. This allows you to be notified automatically when something significant changes on your Enterprise Manager site. The following table is a point-in-time snapshot of the Enterprise Manager Cloud Control vital signs for two sites:

Module / Metrics                          EM Site 1           EM Site 2
Site URL                                  emsite1.acme.com    emsite2.acme.com

Target Counts
  Database Targets                        192 (45 not up)     1218 (634 not up)
  Host Targets                            833 (12 not up)     1042 (236 not up)
  Total Targets                           2580 (306 not up)   12293 (6668 not up)

Loader Statistics
  Loader Threads                          6                   16
  Total Rows/Hour                         1,692,000           2,736,000
  Rows/hour/loader thread                 282,000             171,000
  Rows/second/loader thread               475                 187
  Percent of Hour Run                     15                  44

Rollup Statistics
  Rows per Second                         2,267               417
  Percent of Hour Run                     5                   19

Job Statistics
  Job Dispatchers                         2                   4
  Job Steps/second/dispatcher             32                  10

Event Statistics
  Events Processed (last hour)            536                 1,100

Management Service Host Statistics
  Average % CPU (Host 1)                  9 (emhost01)        13 (emhost01)
  Average % CPU (Host 2)                  6 (emhost02)        17 (emhost02)
  Average % CPU (Host 3)                  N/A                 38 (em6003)
  Average % CPU (Host 4)                  N/A                 12 (em6004)
  Number of CPUs per host                 2 X 2.8 (Xeon)      4 X 2.4 (Xeon)
  Memory per Host (GB)                    6                   6

Management Repository Host Statistics
  Average % CPU (Host 1)                  12 (db01rac)        32 (em6001rac)
  Average % CPU (Host 2)
  Average % CPU (Host 3)
  Average % CPU (Host 4)
  Number of CPUs per host
  Buffer Cache Size (MB)
  Memory per Host (GB)                    6                   12
  Total Management Repository Size (GB)   56                  98
  RAC Interconnect Traffic (MB/s)         1                   4
  Management Server Traffic (MB/s)        4                   4
  Total Management Repository I/O (MB/s)  6                   27

Enterprise Manager UI Page Response/Sec
  Home Page                               3                   6
  All Host Page                           3                   30+
  All Database Page                       6                   30+
  Database Home Page                      2                   2
  Host Home Page                          2                   2

The two Enterprise Manager sites are at the opposite ends of the scale for performance.

EM Site 1 is performing very well, with high loader rows/sec/thread and high rollup rows/sec. It also has a very low percent of hour run for both the loader and the rollup. The CPU utilization on both the OMS and Management Repository Server hosts is low. Most importantly, the UI Page Response times are excellent. To summarize, Site 1 is doing substantial work with minimal effort. This is how a well configured, tuned and maintained Oracle Enterprise Manager Cloud Control site should look.

Conversely, EM Site 2 is having difficulty. The loader and rollup are working hard and not moving many rows. Worst of all are the UI page response times. There is clearly a bottleneck on Site 2, possibly more than one.
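The per-thread loader figure in the snapshot is the total rows per hour divided by the number of loader threads. The following Python sketch reproduces that derivation for both sites. The dictionary layout and function name are hypothetical; the numbers are the ones in the snapshot above.

```python
# Loader statistics copied from the two-site snapshot.
SITES = {
    "EM Site 1": {"total_rows_per_hour": 1_692_000, "loader_threads": 6},
    "EM Site 2": {"total_rows_per_hour": 2_736_000, "loader_threads": 16},
}


def rows_per_hour_per_thread(total_rows_per_hour, loader_threads):
    """Average hourly throughput of a single loader thread."""
    return total_rows_per_hour / loader_threads


if __name__ == "__main__":
    for name, stats in SITES.items():
        per_thread = rows_per_hour_per_thread(
            stats["total_rows_per_hour"], stats["loader_threads"]
        )
        print(f"{name}: {per_thread:,.0f} rows/hour/thread")
```

Site 2 moves more rows in total but needs far more threads to do so, which is why its per-thread figure is lower than that of Site 1.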

The following table outlines metric guidelines for the different modules, based on tests that were run with the configurations listed. You can use it as a reference point when extrapolating from your own metrics, keeping in mind the differences between the test environment and your own.

Table 13-11 Metric Guidelines for Modules Based On Test Environments

Module / Metrics                            Value

Loader Statistics (Test Environment 1)
  NA                                        NA
  Total Rows/Hour                           4,270,652
  Rows/Hour/loaderthread                    427,065
  Rows/second/loaderthread                  120

Rollup Statistics
  Rows per second

Job Statistics
  Job Dispatchers                           1 x Number of OMS instances
  Job Steps/second/dispatcher

Notification Statistics (Test Environment 2)
  Notifications per second                  16

Alert Statistics (Test Environment 2)
  Alerts per hour                           7200

Management Service Host Statistics (Test Environment 1)
  Average % CPU (Host 1)                    31%
  Average % CPU (Host 2)                    34%
  Number of CPUs per host                   4 (Xeon)
  Memory per Host (GB)                      6

Management Repository Host Statistics (Test Environment 1)
  Average % CPU (Host 1)                    32%
  Average % CPU (Host 2)                    26%
  Number of CPUs per host                   4
  SGA Target                                2 GB
  Memory per Host (GB)                      6
  Total Management Repository Size (GB)     94
  RAC Interconnect Traffic (MB/s)           1
  Management Server Traffic (MB/s)
  Total Management Repository I/O (MB/s)

Enterprise Manager UI Page Response/Sec (Test Environment 2)
  Home Page                                 9.1 secs
  All Host Page                             9.8 secs
  All Database Page                         5.7 secs
  Database Home Page                        1.7 secs
  Host Home Page                            < 1 sec

Test Environment 1
  OMS Details: # of OMS Hosts = 2; # of CPU Per Host = 4 Intel Xeon; Memory = 6 GB
  Repository Details: # of Repository Nodes = 2; # of CPU per host = 4 Intel Xeon; Memory = 6 GB
  EM Details: Shared Recv Directory = Yes; # of Agents = 867; # of Hosts = 867; Total Targets = 1803
  The metrics were collected for 5 hours after 2 OMS instances were started and each agent had 50 MB of upload backlog files.

Test Environment 2
  OMS Details: # of OMS Hosts = 1; # of CPU Per Host = 4 Intel Xeon; Memory = 6 GB
  Repository Details: # of Repository Nodes = 1; # of CPU per host = 4 Intel Xeon; Memory = 6 GB
  EM Details: # of OMS instances = 1; # of Repository Nodes = 1; # of Agents = 2474; # of Hosts = 2474; DB Total Targets = 8361

These vital signs are all available from within the Enterprise Manager interface. Most values can be found on the All Metrics page for each host, or the All Metrics page for OMS. Keeping an eye on the trends over time for these vital signs, in addition to assigning thresholds for warning and critical alerts, allows you to maintain good performance and anticipate future resource needs. You should plan to monitor these vital signs as follows:

  • Take a baseline measurement of the vital sign values seen in the previous table when the Enterprise Manager Cloud Control site is running well.

  • Set reasonable thresholds and notifications based on these baseline values so you can be notified automatically if they deviate substantially. This may require some iteration to fine-tune the thresholds for your site. Receiving too many notifications is not useful.

  • On a daily (or weekly at a minimum) basis, watch for trends in the 7-day graphs for these values. This will not only help you spot impending trouble, but it will also allow you to plan for future resource needs.

The next step provides some guidance of what to do when the vital sign values are not within established thresholds. Also, it explains how to maintain your site's performance through routine housekeeping.
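The baseline comparison described above can be sketched as a simple percentage-deviation check. The following Python example is illustrative only and is not how Enterprise Manager evaluates thresholds internally. The function name and the 20% and 50% tolerances are hypothetical placeholders that you would replace with values tuned for your site.

```python
def classify_deviation(baseline, current, warn_pct=20.0, crit_pct=50.0):
    """Compare a vital sign against its baseline.

    Returns 'ok', 'warning', or 'critical' depending on how far the
    current value has moved from the baseline, in either direction.
    The default tolerances are placeholders, not recommended values.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    deviation_pct = abs(current - baseline) / abs(baseline) * 100
    if deviation_pct >= crit_pct:
        return "critical"
    if deviation_pct >= warn_pct:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Hypothetical baseline of 2,000 rollup rows/second.
    for observed in (1900, 1500, 800):
        print(observed, classify_deviation(2000, observed))
```

If a check like this fires too often, widen the tolerances; as noted above, receiving too many notifications is not useful.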

Step 3: Using DBA and Enterprise Manager Tasks To Eliminate Bottlenecks

It is critical to note that routine housekeeping helps keep your Enterprise Manager Cloud Control site running well. The following are lists of housekeeping tasks and the interval on which they should be done.

Offline Monthly Tasks

Enterprise Manager Administrators should monitor the database built-in Segment Advisor for recommendations on Enterprise Manager Repository segment health. The Segment Advisor advises administrators which segments need to be rebuilt/reorganized and provides the commands to do so.

For more information about Segment Advisor and issues related to system health, refer to notes 242736.1 and 314112.1 in the My Oracle Support Knowledge Base.

Step 4: Eliminating Bottlenecks Through Tuning

The most common causes of performance bottlenecks in the Enterprise Manager Cloud Control application are listed below (in order of most to least common):

  1. Housekeeping that is not being done (far and away the biggest source of performance problems)

  2. Hardware or software that is incorrectly configured

  3. Hardware resource exhaustion

When the vital signs are routinely outside of an established threshold, or are trending that way over time, you must address two areas. First, you must ensure that all previously listed housekeeping is up to date. Secondly, you must address resource utilization of the Enterprise Manager Cloud Control application. The vital signs listed in the previous table reflect key points of resource utilization and throughput in Enterprise Manager Cloud Control. The following sections cover some of the key vital signs along with possible options for dealing with vital signs that have crossed thresholds established from baseline values.

High CPU Utilization

When you are asked to evaluate a site for performance and notice high CPU utilization, there are a few common steps you should follow to determine what resources are being used and where.

  1. Use the Processes display on the Enterprise Manager Host home page to determine which processes are consuming the most CPU on any Management Service or Management Repository host that has crossed a CPU threshold.

  2. Once you have established that Enterprise Manager is consuming the most CPU, use Enterprise Manager to identify what activity is the highest CPU consumer. Typically this manifests itself on a Management Repository host where most of the Management Service's work is performed. Here are a few typical spots to investigate when the Management Repository appears to be using too many resources.

    1. Click the CPU Used database resource listed on the Management Repository's Database Performance page to examine the SQL that is using the most CPU at the Management Repository.

    2. Check the Database Locks on the Management Repository's Database Performance page looking for any contention issues.

High CPU utilization is probably the most common symptom of any performance bottleneck. Typically, the Management Repository is the biggest consumer of CPU, which is where you should focus. A properly configured and maintained Management Repository host system that is not otherwise hardware resource constrained should average roughly 40 percent or less total CPU utilization. An OMS host system should average roughly 20 percent or less total CPU utilization. These relatively low average values should allow sufficient headroom for spikes in activity. Allowing for activity spikes helps keep your page performance more consistent over time. If your Enterprise Manager Cloud Control site interface pages happen to be responding well (approximately 3 seconds) while there is no significant (constant) loader backlog, and it is using more CPU than recommended, you may not have to address it unless you are concerned it is part of a larger upward trend.
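The CPU guidelines above (roughly 40 percent or less for a Management Repository host and 20 percent or less for an OMS host) can be applied to measured averages as follows. This Python sketch uses hypothetical names, and the sample values are the EM Site 2 OMS host averages from the vital signs snapshot earlier in this chapter.

```python
# Average total CPU utilization guidelines from this section, in percent.
CPU_GUIDELINE_PCT = {"repository": 40, "oms": 20}


def over_guideline(tier, avg_cpu_pct):
    """Return True if a host's average CPU exceeds the guideline for its tier."""
    return avg_cpu_pct > CPU_GUIDELINE_PCT[tier]


if __name__ == "__main__":
    site2_oms_hosts = {"emhost01": 13, "emhost02": 17, "em6003": 38, "em6004": 12}
    for host, cpu in site2_oms_hosts.items():
        status = "over guideline" if over_guideline("oms", cpu) else "ok"
        print(f"{host}: {cpu}% ({status})")
```

A host that exceeds the guideline does not necessarily need attention if page response times are good and there is no constant loader backlog, as the paragraph above explains.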

The recommended path for tracking down the root cause of high Management Repository CPU utilization is captured under step 2 above. This allows you to start at the Management Repository Performance page and work your way down to the SQL that is consuming the most CPU in its processing. This approach has been used very successfully on several real world sites.

If you are running Enterprise Manager on Intel based hosts, the Enterprise Manager Cloud Control Management Service and Management Repository will both benefit from Hyper-Threading (HT) being enabled on the host or hosts on which they are deployed. HT is a function of certain late models of Intel processors, which allows the execution of some amount of CPU instructions in parallel. This gives the appearance of double the number of CPUs physically available on the system. Testing has shown that HT provides approximately 1.5 times the CPU processing power of the same system without HT enabled. This can significantly improve system performance. The Management Service and Management Repository both frequently have more than one process executing simultaneously, so they can benefit greatly from HT.

Loader Vital Signs

The vital signs for the loader indicate exactly how much data is continuously coming into the system from all the Enterprise Manager Agents. The most important items here are the percent of hour run and rows/second/thread. The (Loader) % of hour run indicates whether the loader threads configured are able to keep pace with the incoming data volume. As this value approaches 100%, it becomes apparent that the loading process is failing to keep pace with the incoming data volume. The lower this value, the more efficiently your loader is running and the fewer resources it requires from the Management Service host. Adding more loader threads to the OMS can help reduce the percent of hour run for the loader.

Rows/Second/Thread is a precise measure of each loader thread's throughput per second. The higher this number, the better. Rows/Second/Thread as high as 1200 have been observed on some smaller, well configured and maintained Enterprise Manager Cloud Control sites. If you have not increased the number of loader threads and this number is trending down, it may indicate a problem later. One way to overcome a decreasing rows/second/thread is to add more loader threads.

The number of Loader Threads is always set to one by default in the OMS configuration file. Each OMS can have a maximum of 10 loader threads. Adding loader threads to an OMS typically increases the overall host CPU utilization by 2% to 5% on an Enterprise Manager Cloud Control site with many Management Agents configured. Customers can change this value as their site requires. Most medium size and smaller configurations will never need more than one loader thread. Here is a simple guideline for adding loader threads:

Max total (across all OMS instances) loader threads = 2 X number of Management Repository host CPUs

There is a diminishing return when adding loader threads. You will not yield 100% capacity from the second, or higher, thread. There should be a positive benefit, however. As you add loader threads, you should see rows/second/thread decrease, but total rows/hour throughput should increase. If you are not seeing significant improvement in total rows/hour, and there is a constantly growing loader file backlog, it may not be worth the cost of the increase in loader threads. You should explore other tuning or housekeeping opportunities in this case.
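The loader thread guideline can be combined with the per-OMS limit of 10 threads. The following Python sketch uses hypothetical function names and assumes the threads are spread evenly across the OMS instances, which this guide does not require.

```python
MAX_THREADS_PER_OMS = 10  # per-OMS limit stated in this section


def max_total_loader_threads(repository_host_cpus):
    """Guideline: 2 x number of Management Repository host CPUs."""
    return 2 * repository_host_cpus


def max_threads_per_oms(repository_host_cpus, oms_count):
    """Upper bound per OMS, assuming an even split across OMS instances."""
    even_share = max_total_loader_threads(repository_host_cpus) // oms_count
    return max(1, min(MAX_THREADS_PER_OMS, even_share))


if __name__ == "__main__":
    # Example: repository hosts with 4 CPUs and 2 OMS instances.
    print("Total ceiling:", max_total_loader_threads(4))
    print("Per OMS ceiling:", max_threads_per_oms(4, 2))
```

These values are ceilings, not targets. Add threads one at a time and confirm that total rows/hour actually improves.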

To add more loader threads, you can change the following configuration parameter where n is a positive integer [1-10]:

em.loader.threadPoolSize=n

The default is 1 and any value other than [1-10] will result in the thread pool size defaulting to 1. This property file is located in the {ORACLE_HOME}/sysman/config directory. Changing this parameter will require a restart of the Management Service to be reloaded with the new value.
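The defaulting rule for em.loader.threadPoolSize can be modeled as shown below. This Python sketch only mirrors the behavior described above and is not the OMS implementation. The function name is hypothetical, and treating non-numeric input as invalid is an assumption.

```python
def effective_thread_pool_size(raw_value):
    """Return the thread pool size that would take effect for a raw setting.

    Values outside 1-10 fall back to the default of 1, as described above.
    Assumption: a non-numeric value is treated the same way.
    """
    try:
        n = int(str(raw_value).strip())
    except ValueError:
        return 1
    return n if 1 <= n <= 10 else 1


if __name__ == "__main__":
    for candidate in ("4", "10", "0", "11", "abc"):
        print(repr(candidate), "->", effective_thread_pool_size(candidate))
```

A check like this can catch a setting that would silently revert to a single loader thread after the Management Service restarts.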

The following two parameters are set for the Receiver module which receives files from agents.

  1. em.loader.maxDataRecvThreads=n (Default 75)

    Where n is a positive integer and default value is 75. This is used to limit the maximum number of concurrent data file receiver threads. So at the peak time only 75 receiver threads will be receiving files and an extra request will be rejected with a Server Busy error. These rejected requests will be resent by the agent after the default retry time.

    Care should be taken while setting this value as too high a value will put an increased load on the OMS machine and shared receiver directory box. If too low a value is set then data file receive throughput will be low.

  2. oracle.sysman.emRep.dbConn.maxConnForReceiver=n (Default 25)

    Where n is a positive integer and default value is 25. This is used to set the maximum number of Repository Database connections for the receive threads. Oracle recommends you set this value equal to em.loader.maxDataRecvThreads, as each Receiver thread gets one DB session and there will be no wait for DB connections.
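The recommendation that the two receiver settings match can be checked against a set of properties. The following Python sketch takes the properties as a plain dictionary; the function name is hypothetical, and the fallback values are the defaults stated above.

```python
RECV_THREADS_KEY = "em.loader.maxDataRecvThreads"
RECV_CONNS_KEY = "oracle.sysman.emRep.dbConn.maxConnForReceiver"


def receiver_settings(props):
    """Return (threads, connections, matched) using the documented defaults."""
    threads = int(props.get(RECV_THREADS_KEY, 75))
    conns = int(props.get(RECV_CONNS_KEY, 25))
    return threads, conns, threads == conns


if __name__ == "__main__":
    # With neither property set, the defaults (75 and 25) do not match.
    print(receiver_settings({}))
    # Both set to the same value, as recommended.
    print(receiver_settings({RECV_THREADS_KEY: "75", RECV_CONNS_KEY: "75"}))
```

When the connection limit is lower than the thread limit, receiver threads may wait for database connections at peak load.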

Rollup Vital Signs

The rollup process is the aggregation mechanism for Enterprise Manager Cloud Control. The two vital signs for the rollup are the rows/second and % of hour run. Due to the large volume of data rows processed by the rollup, it tends to be the largest consumer of Management Repository buffer cache space. Because of this, the rollup vital signs can be great indicators of the benefit of increasing buffer cache size.

Rollup rows/second shows exactly how many rows are being processed, or aggregated and stored, every second. This value is usually around 2,000 (+/- 500) rows per second on a site with a decent-sized buffer cache and reasonably speedy I/O. A downward trend over time for this value may indicate a future problem, but as long as % of hour run is under 100 your site is probably fine.
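As a rough illustration of how these two vital signs relate, the following Python sketch derives both from one rollup run. It assumes that % of hour run is the share of a 60-minute window that the rollup spends running; the function name and inputs are hypothetical and do not correspond to any Enterprise Manager API.

```python
def rollup_vital_signs(rows_processed, elapsed_seconds):
    """Return (rows/second, % of hour run) for a single hourly rollup run.

    Assumption: "% of hour run" is elapsed rollup time divided by one hour.
    """
    rows_per_second = rows_processed / elapsed_seconds
    pct_of_hour_run = elapsed_seconds / 3600 * 100
    return rows_per_second, pct_of_hour_run
```

Under this reading, a rollup that processes 3.6 million rows in 30 minutes runs at 2,000 rows/second and 50% of hour run. A value approaching 100 means the rollup barely finishes before the next one is due.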

If rollup % of hour run is trending up (or is higher than your baseline), and you have not yet set the Management Repository buffer cache to its maximum, it may be advantageous to increase the buffer cache setting. Usually, if there is going to be a benefit from increasing buffer cache, you will see an overall improvement in resource utilization and throughput on the Management Repository host. The loader statistics will appear a little better. CPU utilization on the host will be reduced and I/O will decrease. The most telling improvement will be in the rollup statistics. There should be a noticeable improvement in both rollup rows/second and % of hour run. If you do not see any improvement in any of these vital signs, you can revert the buffer cache to its previous size.

The old Buffer Cache Hit Ratio metric can be misleading. It has been observed in testing that Buffer Cache Hit Ratio will appear high when the buffer cache is significantly undersized and Enterprise Manager Cloud Control performance is struggling because of it.

There will be times when increasing buffer cache will not help improve performance for Cloud Control. This is typically due to resource constraints or contention elsewhere in the application. Consider using the steps listed in the High CPU Utilization section to identify the point of contention. Cloud Control also provides advice on buffer cache sizing from the database itself. This is available on the database Memory Parameters page.

One important thing to note when considering increasing buffer cache is that there may be operating system mechanisms that can help improve Enterprise Manager Cloud Control performance. One example of this is the "large memory" option available on Red Hat Linux. The Linux OS Red Hat Advanced Server™ 2.1 (RHAS) has a feature called big pages. In RHAS 2.1, bigpages is a boot up parameter that can be used to pre-allocate large shared memory segments. Use of this feature, in conjunction with a large Management Repository SGA, can significantly improve overall Cloud Control application performance. Starting in Red Hat Enterprise Linux™ 3, big pages functionality is replaced with a new feature called huge pages, which no longer requires a boot-up parameter.

Rollup Process

The Rollup process introduces the concept of a rollup participating instance, where rollup processing is distributed among all participating instances. To add a candidate instance to the participating EMROLLUP group, set the instance_groups parameter at the instance level as follows:

  • Add EMROLLUP_1 to the instance_groups parameter for node 1

    Add EMROLLUP_2 to the instance_groups parameter for node 2

  • Introduce the PQ and PW parallel processing modes where:

    • PQ is the parallel query/parallel dml mode. In this mode, each participating instance will have one worker utilizing the parallel degree specified.

    • PW is the parallel worker mode. In this mode, each participating instance will have a number of worker jobs equal to the parallel level specified.

  • Distribute the work load for all participating RAC instances as follows:

    • Each participating instance will be allocated an equal number of targets. So for (n) participating instances with a total workload of (tl) targets, each instance will be allocated (tl/n).

    • Each worker on any participating instance will be allocated an equal number of targets from that instance's workload. So for (il) targets per instance with (w) workers, each worker will be allocated (il/w).

    • For each worker, the load is further divided into batches to control the number of times the rollup SQL is executed. The number of rows per batch will be the total number of rows allocated for the worker divided by the number of batches.
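The distribution rules above reduce to three divisions. The following Python sketch works through them. The function and key names are hypothetical, and the sample numbers in the usage note are illustrative rather than taken from the test described in this section.

```python
def rollup_allocation(total_targets, instances, workers_per_instance,
                      rows_per_worker, batches):
    """Apply the workload distribution rules for parallel worker (PW) mode."""
    targets_per_instance = total_targets / instances                  # tl / n
    targets_per_worker = targets_per_instance / workers_per_instance  # il / w
    rows_per_batch = rows_per_worker / batches
    return {
        "targets_per_instance": targets_per_instance,
        "targets_per_worker": targets_per_worker,
        "rows_per_batch": rows_per_batch,
    }
```

For instance, 16,000 targets spread over 2 participating instances with 4 workers each gives 8,000 targets per instance and 2,000 targets per worker. A worker allocated 1,000,000 rows with 10 batches would process 100,000 rows per batch.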

Use the following recommendations as guidelines during the Rollup process:

  • Use the parallel worker (PW) mode, which is the recommended mode, and utilize the participating EMROLLUP_xx instance group.

  • Splitting the work among more workers will improve the performance and scalability until a certain point where the diminishing returns rule will apply. This is dependent on the number of CPUs available on each RAC node. In this test case, running with 10 workers was the optimal configuration, balancing the response time, machine CPU and IO utilization.

  • It is important to set a proper batch size (10 recommended). The optimal run was the one with 10 batches, attributed to balancing the number of executions of the main SQL (calling EMD_1HOUR_ROLLUP) and the sort space needed for each individual execution.

  • Start by setting the number of batches to 10, bearing in mind that the number of batches can be changed based on the data distribution.

The recommendations above will yield the following results. Using the multi-instance parallel worker (8 PW) mode (with the redesigned code described earlier) improves the performance by a factor of 9-13 when utilizing two participating RAC instances.

Rollup row count (in millions) in MGMT_METRICS_1HOUR   Time (min)   Workers   Batch Size
29.5                                                   30           8         1
9.4                                                    5            8         10

** For the entire test there were 15779 distinct TARGET_GUID

** The test produced “29.5 Million” new rollup rows in MGMT_METRICS_1HOUR

Run **              Rows/Workers   Batches/Workers   Rows/Batch   Time (min)
8 PW /1 instance    3945           3945              1            40
8 PW /2 instances   1973           1973              1            30

Job, Notification, and Alert Vital Signs

Jobs, notifications, and alerts are indicators of the processing efficiency of the Management Service(s) on your Enterprise Manager Cloud Control site. Any negative trends in these values are usually a symptom of contention elsewhere in the application. The best use of these values is to measure the benefit of running with more than one OMS. There is one job dispatcher in each OMS. Adding OMS instances will not always improve these values. In general, adding OMS instances will improve overall throughput for Cloud Control when the application is not otherwise experiencing resource contention issues. Job, Notification, and Alert vital signs can help measure that improvement.

I/O Vital Signs

Monitoring the I/O throughput of the different channels in your Enterprise Manager Cloud Control deployment is essential to ensuring good performance. At minimum, there are three different I/O channels on which you should have a baseline and alert thresholds defined:

  • Disk I/O from the Management Repository instance to its data files

  • Network I/O between the OMS and Management Repository

  • RAC interconnect (network) I/O (on RAC systems only)

You should understand the potential peak and sustained throughput I/O capabilities for each of these channels. Based on these and the baseline values you establish, you can derive reasonable thresholds for warning and critical alerts on them in Cloud Control. You will then be notified automatically if you approach these thresholds on your site. Some Cloud Control site administrators can be unaware or mistaken about what these I/O channels can handle on their sites. This can lead to Enterprise Manager Cloud Control saturating these channels, which in turn cripples performance on the site. In such an unfortunate situation, you would see that many vital signs would be impacted negatively.

To discover whether the Management Repository is involved, you can use Cloud Control to check the Database Performance page. On the Performance page for the Management Repository, click the wait graph showing the largest amount of time spent. From this you can continue to drill down into the actual SQL code or sessions that are waiting. This should help you to understand where the bottleneck is originating.

Another area to check is unexpected I/O load from non-Enterprise Manager Cloud Control sources like backups, another application, or a possible data-mining co-worker who engages in complex SQL queries, multiple Cartesian products, and so on.

Total Repository I/O trouble can be caused by two factors. The first is a lack of regular housekeeping. Some of the Cloud Control segments can be very badly fragmented causing a severe I/O drain. Second, there can be some poorly tuned SQL statements consuming much of the site I/O bandwidth. These two main contributors can cause most of the Cloud Control vital signs to plummet. In addition, the lax housekeeping can cause the Management Repository's allocated size to increase dramatically.

One important feature to take advantage of is asynchronous I/O. Enabling asynchronous I/O can dramatically improve overall performance of the Cloud Control application. The Sun Solaris™ and Linux operating systems have this capability, but it may be disabled by default. The Microsoft Windows™ operating system uses asynchronous I/O by default. Oracle strongly recommends enabling this operating system feature on the Management Repository hosts and on Management Service hosts as well.

Automatic Storage Management (ASM) is recommended for Enterprise Manager Cloud Control repository database storage.

About the Oracle Enterprise Manager Performance Page

There may be occasions when Enterprise Manager user interface pages are slow in the absence of any other performance degradation. The typical cause for these slowdowns is an area of Enterprise Manager housekeeping that has been overlooked. The first line of monitoring for Enterprise Manager page performance is the use of Enterprise Manager Beacons. This functionality is also useful for web applications other than Enterprise Manager.

Beacons are designed to be lightweight page performance monitoring targets. After defining a Beacon target on a Management Agent, you can then define UI performance transactions using the Beacon. These transactions are a series of UI page hits that you will manually walk through once. Thereafter, the Beacon will automatically repeat your UI transaction on a specified interval. Each time the Beacon transaction is run, Enterprise Manager will calculate its performance and store it for historical purposes. In addition, alerts can be generated when page performance degrades below thresholds you specify.

When you configure the Enterprise Manager Beacon, you begin with a single predefined transaction that monitors the home page you specify during this process. You can then add as many transactions as are appropriate. You can also set up additional Beacons from different points on your network against the same web application to measure the impact of WAN latency on application performance. This same functionality is available for all Web applications monitored by Enterprise Manager Cloud Control.

After you are alerted to a UI page that is performing poorly, you can then use the second line of page performance monitoring in Enterprise Manager Cloud Control. This new end-to-end (or E2E) monitoring functionality in Cloud Control is designed to allow you to break down processing time of a page into its basic parts. This will allow you to pinpoint when maintenance may be required to enhance page performance. E2E monitoring in Cloud Control lets you break down both the client side processing and the server side processing of a single page hit.

The next page down in the Middle Tier Performance section will break out the processing time by tier for the page. By clicking the largest slice of the Processing Time Breakdown pie chart (JDBC time in this example), you can get the SQL details. By clicking the SQL statement, you break out the performance of its execution over time.

The JDBC page displays the SQL calls the system is spending most of its page time executing. This SQL call could be an individual DML statement or a PL/SQL procedure call. In the case of an individual SQL statement, you should examine the segments (tables and their indexes) accessed by the statement to determine their housekeeping (rebuild and reorg) needs. The PL/SQL procedure case is slightly more involved because you must look at the procedure's source code in the Management Repository to identify the tables and associated indexes accessed by the call.

Once you have identified the segments, you can then run the necessary rebuild and reorganization statements for them with the OMS down. This should dramatically improve page performance. There are cases where page performance will not be helped by rebuild and reorganization alone, such as when excessive numbers of open alerts, system errors, and metric errors exist. The only way to improve these calls is to address (for example, clean up or remove) the numbers of these issues. After these numbers are reduced, then the segment rebuild and reorganization should be completed to optimize performance. These scenarios are covered in Step 3: Using DBA and Enterprise Manager Tasks To Eliminate Bottlenecks. If you stay current, you should not need to analyze UI page performance as often, if at all.

Determining the Optimum Number of Middle Tier OMS Servers

Determining the optimum number of middle tier OMS servers is not a trivial task. A number of data points must be considered for an informed, justified and acceptable decision for introducing additional OMS instances. The number of monitored targets is one of the first considerations, but its weight in decision making is normally not substantial.

The following items should be considered and examined as part of this exercise:

  • The volume of job automation and scheduling used

  • The number of administrators working simultaneously in the console

  • Network bandwidth and data channel robustness from agents to the OMS servers

  • Number of triggered violations and notifications

  • Speed and stability of the IO system the OMS servers use

Careful investigation of each category is essential to making an informed decision. In some cases, just adding an OMS server or providing more CPU or memory to the same host may not make any difference in performance enhancement. You can use the current running OMS instances to collect accurate statistics on current OMS performance to calculate the number of required OMS servers for current or future deployments. Enterprise Manager has "vital signs" that reflect its health. These vital signs should be monitored for trends over time as well as against established baseline thresholds.

Step 5: Extrapolating Linearly Into the Future for Sizing Requirements

Determining future storage requirements is an excellent example of effectively using vital sign trends. You can use two built-in Cloud Control charts to forecast this: the total number of targets over time and the Management Repository size over time.

Both of the graphs are available on the All Metrics page for the Management Service. It should be obvious that there is a correlation between the two graphs. A straight line applied to both curves would reveal a fairly similar growth rate. After a target is added to Enterprise Manager Cloud Control for monitoring, there is a 31-day period where Management Repository growth will be seen because most of the data that will consume Management Repository space for a target requires approximately 31 days to be fully represented in the Management Repository. A small amount of growth will continue for that target for the next year because that is the longest default data retention time at the highest level of data aggregation. This should be negligible compared with the growth over the first 31 days.

When you stop adding targets, the graphs will level off in about 31 days. When the graphs level off, you should see a correlation between the number of targets added and the amount of additional space used in the Management Repository. Tracking these values from early on in your Enterprise Manager Cloud Control deployment process helps you to manage your site's storage capacity proactively. This history is an invaluable tool.
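One way to apply a straight line to these curves is an ordinary least-squares fit. The following Python sketch is a minimal illustration under stated assumptions: the function names are hypothetical, and the sample data in the usage note is invented rather than taken from a real Management Repository.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(xs, ys, future_x):
    """Extrapolate the fitted straight line to a future x value."""
    slope, intercept = fit_line(xs, ys)
    return slope * future_x + intercept
```

For example, if the repository measured 100, 110, 120, and 130 GB on days 0, 30, 60, and 90, the fitted line predicts about 160 GB on day 180. The same fit can be run with target count on the x axis to estimate space per added target. A linear forecast is only reasonable while the growth pattern stays the same, so collect the samples after the 31-day ramp-up period for recently added targets.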

The same type of correlation can be made between CPU utilization and total targets to determine those requirements. There is a more immediate leveling off of CPU utilization as targets are added. There should be no significant increase in CPU cost over time after adding the targets beyond the relatively immediate increase. Introducing new monitoring to existing targets, whether new metrics or increased collections, would most likely lead to increased CPU utilization.

Using Returning Query Safeguards to Improve Performance

On the All Targets page, Enterprise Manager uses a safeguard that prevents a flood of data from slowing performance and consuming excessive resources within the OMS by limiting the number of rows that can be returned from a query. By default, the limit is set to 2000, but an Enterprise Manager administrator can modify the limit with the following command:

emctl set property -name oracle.sysman.emSDK.eml.maxRows -value 2000

Providing a value equal to 0 will turn off the safeguard and fetch all rows. The new value takes immediate effect; no OMS restart is required. If the value is less than 0, the default value (2000) will be used instead. The only way to indicate that no limiting should be performed is to set the value to exactly 0.
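The rules for interpreting the property value can be summarized in a few lines. The following Python sketch only mirrors the behavior described above. The function name is hypothetical, and the code is not part of Enterprise Manager.

```python
DEFAULT_MAX_ROWS = 2000


def effective_row_limit(configured_value):
    """Return the row limit in effect, or None when the safeguard is off."""
    if configured_value == 0:
        return None              # exactly 0 disables the safeguard
    if configured_value < 0:
        return DEFAULT_MAX_ROWS  # negative values fall back to the default
    return configured_value
```

For example, a configured value of 5000 limits results to 5000 rows, -1 falls back to 2000, and 0 fetches all rows.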

When there are too many results returned from a query and this limit comes into effect, the following message appears under the results table:

"This table of search results is limited to 2000 targets. Narrow the results by using Refine Search or Search Target Name. See the tuning guide for how to modify this limit."

Similar behaviors (and messages) are applied to other large tables throughout Enterprise Manager. The same OMS property (oracle.sysman.emSDK.eml.maxRows) controls the maximum limit for all of them together. This matches the behavior (and reuses the existing property) from previous Enterprise Manager releases.

Overview of Repository and Sizing Requirements for Fusion Middleware Monitoring

A Fusion Middleware target is like any other Enterprise Manager target. Therefore any repository or sizing guideline that is applicable for an Enterprise Manager target would be applicable on a Fusion Middleware target.

One major concern in the case of Fusion Middleware discovery is that too many targets may be discovered, created and monitored. This adds load on the OMS instance, repository and agent. In the case of a very large number of targets, Oracle recommends that users review all the targets and their respective metrics after target discovery.

Based on requirements, users should finalize which targets and metrics should be monitored and the frequency at which those targets should be monitored.

After discovery, Oracle recommends you allow Fusion Middleware/ADP/JVMD monitoring to run for some duration (a few days to possibly a few weeks) and continuously monitor the database size and Operating System file system growth (in the case of ADP; ADP Manager requires a minimum of 10GB of disk space) until it becomes constant. You can then fine tune various parameters associated with these different features.

In version 12c of Enterprise Manager, both ADP and JVMD use Enterprise Manager repository as their repository. Their data are stored in the MGMT_AD4J_TS tablespace.

ADP Monitoring

Use the following information when utilizing ADP Monitoring.

  • ADP Manager Resources Requirement

    The following resources are required to manage 70K managed entities. If the number of managed entities is higher, you must allocate resources accordingly.

    Resource Amount
    Physical Memory 2 GB
    Minimum Disk Space 10 GB

  • ADP Data requirement

    To monitor each entity per JVM, the MGMT_AD4J_TS tablespace must have 8 MB available.

  • ADP Data Retention Policy

    ADP maintains sophisticated multi-tiered logic for aggregation (or compression) of performance data. This helps to optimize performance of interaction with the internal data repository both when querying data for presentation or inserting new performance metrics.

    Users who want to store longer term data should look for this section in Acsera.properties:

    Example 13-1

    #########################
    # Production setting
    # NOTE: use Model.GlobalSamplingRateSecs to configure Metric.Grain.0
    #########################
    Metric.Grain.0 = 0s
    Metric.TableInterval.0 = 4h
    Metric.DataLife.0 = 2d
     
    Metric.Grain.1 = 3m
    Metric.TableInterval.1 = 1d
    Metric.DataLife.1 = 8d
     
    #Metric.Grain.2 = 30m
    #Metric.TableInterval.2 = 7d
    #Metric.DataLife.2 = 420d
    

    Uncomment the last 3 lines for the Metric.*.2 properties.

JVMD Monitoring

Use the following information when employing JVMD Monitoring.

  • JVMD Manager Resources Requirement

    To manage 200-300 JVMs, JVMD Manager requires 1 GB of physical memory. JVMD Manager caches monitoring data in the TEMP space for each pool and flushes to the database frequently. The size of these temporary cache files varies with the number of pools the manager is monitoring and the amount of data being gathered from each pool, but it is rare to see more than a few MBs for each pool. If this is a concern, the TEMP space should be allocated accordingly.

  • JVMD Data requirement

    To monitor every JVM with out-of-the-box (OOB) settings, the MGMT_AD4J_TS tablespace must have 50-100MB available.

  • JVM Diagnostics Historical Data and its Retention policy

    Historical data is available at three summary levels 0, 1 and 2.

    • Summary level 0 is raw sample data taken at the specified pool polling interval (default 2 seconds). If you look at data within one hour on the Performance Diagnostics page, it shows summary level 0 data. Level 0 data is retained for 24 hours and subsequently purged. This retention period can be changed via the Console Setup page, but before increasing the value, you should ensure that the repository is tuned properly to handle such large amounts of data.

    • Summary level 1 is aggregated data. If you view data that is more than one hour but less than five hours old, it is summary level 1 data. The default aggregation interval is 90 seconds. This value can be changed via the Console Setup page. Level 1 data is retained for 20 days and subsequently purged.

    • Summary level 2 is further aggregated data. If you view data more than five hours old, it is summary level 2 data. This data is aggregated every 60 minutes. Level 2 data is retained for 400 days and subsequently purged.
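The summary levels described above can be expressed as a small lookup. The following Python sketch is illustrative only: the names are hypothetical, the retention values are the defaults quoted in this section, and the treatment of data that is exactly one or five hours old is an assumption because the text does not specify those boundaries.

```python
# Default retention per summary level, in hours (24 hours, 20 days, 400 days).
RETENTION_HOURS = {0: 24, 1: 20 * 24, 2: 400 * 24}


def jvmd_summary_level(data_age_hours):
    """Return the summary level shown for data of the given age.

    Assumption: exactly 1 hour maps to level 0 and exactly 5 hours to level 2.
    """
    if data_age_hours <= 1:
        return 0
    if data_age_hours < 5:
        return 1
    return 2
```

For example, data viewed 30 minutes after collection is level 0, data from three hours ago is level 1, and data from the previous day is level 2.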

There are two JVMD features that can drastically affect MGMT_AD4J_TS tablespace usage:

  • JVMD Heap Dumps

    Analyzing heap dumps requires massive tablespace resources. Oracle recommends having free space in your tablespace equal to 5 times the size of the heap dump file you are loading. Since you will have the heap dump file and know its size before you run the load script, you should ensure that you have adequate space to accommodate the dump before you load it into your database.

  • Thread Traces

    While these are smaller than heaps by an order of magnitude, these are loaded into the database automatically by default when you initiate a trace at the console. The size of these traces can vary dramatically depending on the number of active threads during the trace, the duration of the trace, and the sample interval of the trace. They should generally be under 100MB each, but a user utilizing a large number of these could manually fill up the database quickly. Again, since these are created only by manual intervention, you should ensure that there is adequate space to accommodate traces before initiating them.
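The JVMD figures in this section can be combined into a rough free-space estimate for the MGMT_AD4J_TS tablespace. The following Python sketch is a planning aid under stated assumptions, not an Oracle sizing formula: the function name is hypothetical, the per-JVM range is the 50-100MB quoted above, and each thread trace is counted at the 100MB upper bound in the high estimate and at zero in the low estimate.

```python
JVMD_MB_PER_JVM_LOW = 50         # lower bound per monitored JVM
JVMD_MB_PER_JVM_HIGH = 100       # upper bound per monitored JVM
HEAP_DUMP_FREE_SPACE_FACTOR = 5  # free space needed per MB of heap dump
THREAD_TRACE_MB_MAX = 100        # assumed upper bound per thread trace


def estimate_jvmd_space_mb(jvm_count, heap_dump_mb=0, thread_traces=0):
    """Return a (low, high) estimate in MB of tablespace needed for JVMD."""
    heap = HEAP_DUMP_FREE_SPACE_FACTOR * heap_dump_mb
    low = jvm_count * JVMD_MB_PER_JVM_LOW + heap
    high = (jvm_count * JVMD_MB_PER_JVM_HIGH + heap
            + thread_traces * THREAD_TRACE_MB_MAX)
    return low, high
```

For example, monitoring 10 JVMs, loading one 2048 MB heap dump, and keeping 3 thread traces gives an estimate between 10,740 MB and 11,540 MB. Treat the result as a starting point and confirm it against observed tablespace growth.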