10 Configuring Resource Consumption Management

This chapter describes how to use resource consumption management (RCM) to ensure the fairness of resource allocation and to reduce the contention of shared resources by collocated domain partitions in the server instance. You can create RCM policies using Oracle Enterprise Manager Fusion Middleware Control (FMWC) or WebLogic Scripting Tool (WLST) to provide consistent performance of domain partitions in multitenant environments.


Configuring Resource Consumption Management: Overview

Resource consumption management (RCM) provides a flexible, dynamic mechanism for WebLogic Server system administrators to manage shared resources and provide consistent performance of domain partitions in multitenant (MT) environments.

RCM policies are configured as resource managers. A resource manager can be created with a global scope at the domain level and then used as the RCM policy for any partition within the domain. You can also create a partition-scoped resource manager if the partition has RCM characteristics specific to that partition. See "Configuring Resource Consumption Management: Main Steps".

Software Requirements for Using RCM

RCM requires Oracle Java Development Kit (JDK) 8u60 build 32 or later.

Why Do You Need Resource Consumption Management?

When applications are deployed to multiple domain partitions, sharing low-level resources such as CPU, heap, network, and file descriptors can result in unfair resource allocation. Unusual resource consumption requests may happen for various reasons such as high traffic, application design, or malicious code. These request types can overload the capacity of a shared resource, preventing another collocated domain partition's access to the resource. By employing appropriate RCM policies at the domain partition level, resource consumption management prevents applications in one partition from negatively affecting applications in other partitions.

How to Enable RCM

Set the following JVM arguments to enable RCM in your environment:

-XX:+UnlockCommercialFeatures -XX:+ResourceManagement -XX:+UseG1GC

These flags must be applied on every server instance (JVM) where RCM is enabled.

Alternatively, uncomment the following JAVA_OPTIONS line in the startWebLogic.sh file, which is created when a domain is created:

#JAVA_OPTIONS="-XX:+UnlockCommercialFeatures -XX:+ResourceManagement -XX:+UseG1GC ${SAVE_JAVA_OPTIONS}"

You must do this on every server instance, and it must be done prior to starting the WebLogic Server instance.
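Because all three flags must be present on every server instance, it can help to check a JAVA_OPTIONS value before starting the server. The following is a minimal sketch in plain Python, not part of WebLogic Server; the helper name missing_rcm_flags and the sample option string are assumptions made for illustration.

```python
import shlex

# Flags listed in this section as required to enable RCM.
REQUIRED_RCM_FLAGS = (
    "-XX:+UnlockCommercialFeatures",
    "-XX:+ResourceManagement",
    "-XX:+UseG1GC",
)


def missing_rcm_flags(java_options):
    """Return the required RCM flags that are absent from a JAVA_OPTIONS string."""
    present = set(shlex.split(java_options))
    return [flag for flag in REQUIRED_RCM_FLAGS if flag not in present]


# Hypothetical JAVA_OPTIONS value that omits the G1 collector flag.
options = "-Xmx2g -XX:+UnlockCommercialFeatures -XX:+ResourceManagement"
print(missing_rcm_flags(options))
```

An empty list means that all of the flags named in this section are present.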

If a Java Security Manager is used with WebLogic Server, then RCM requires the granting of the following permission in weblogic.policy:

permission java.lang.RuntimePermission "jdk.management.resource.getResourceContextFactory";

For more information about using the Java Security Manager to protect resources in WebLogic Server, see "Using the Java Security Manager to Protect WebLogic Resources" in Developing Applications with the WebLogic Security Service.

Supported Resources for RCM

The following shared resources can be managed through RCM policies:

  • FileOpen: The number of open file descriptors in use by a partition. This includes files opened through FileInputStream, FileOutputStream, RandomAccessFile, and NIO file channels.

  • HeapRetained: The amount of heap (in MB) retained or in use by a partition.

  • CpuUtilization: The percentage of CPU time used by a partition with respect to the available CPU time of the WebLogic Server process.

Configuring Resource Consumption Management: Main Steps

A system administrator specifies resource consumption management policies on shared resources for each partition in the domain using a resource manager. A resource manager consists of one or more policies for one or more shared resources. Each policy consists of a constraint value for a resource and a specified action a WebLogic Server instance takes when the constraint value is met.

The following RCM policy types are supported:

Triggers

A trigger defines the static constraint value for the allowed resource use. When the consumption of that resource exceeds the constraint value, a specified action is performed. This policy type is best suited for environments where the resource use by a partition in the domain is predictable.

A system administrator can select the following actions when creating a trigger policy:

  • Notify: Provides notification to the system administrator that the constraint value has been reached. You can add more than one Notify trigger for a resource. For example, Notify when the Open Files go beyond 20; Notify when the Open Files go beyond 50. Also, you can use the WebLogic Diagnostics Framework (WLDF) to create watch rules to listen to log messages and provide advanced notifications.

  • Slow: Slows the rate at which the resource is consumed. When a Slow action is triggered, the Partition Work Manager's fair share value is reduced, which results in reducing the thread-usage time made available to the partition. For more information about Partition Work Managers, see Configuring Partition Work Managers.

  • Fail: Fails resource consumption requests for a resource until use is below the constraint value.

    Note:

    Fail is applicable only for Open Files and not for other resources.

    For example, to limit the number of open files in partition P1 to fewer than 100 files, create a resource manager with a trigger policy that has a constraint value of 100 units for the Open Files resource and a Fail action for the P1 partition.

  • Shutdown: Initiates the shutdown of a partition while allowing cleanup. This action is useful when a partition exceeds a known constraint value and an adverse effect on shared resources used by other partitions in the domain is expected. A partition is shut down only in the Managed Server where the constraint value has been met, allowing continuous availability in clustered environments. Fail and Shutdown triggers should not be used together.

  • Restart: Restarts the partition on the server instance on which the partition's resource consumption quotas have been breached. Often an application misbehaves only temporarily, and a simple restart is enough to correct the situation. Without this action, a system administrator would have to intervene, which is less desirable than enabling automatic partition restart. Like other RCM actions, the partition restart action is scoped to a given Managed Server.

    Additionally, you may specify the following optional configuration properties to prevent a partition from restarting in a loop.

    • max-restart-allowed

      The maximum number of RCM initiated partition restarts allowed in the specified time interval, after which the partition will be halted upon an RCM initiated request to restart the partition.

    • max-restart-allowed-interval

      The max-restart-allowed-interval is a fixed-width, sliding-window time interval (in seconds) within which the specified number (in max-restart-allowed) of RCM initiated partition restarts are permitted. A request to restart the partition that exceeds the max-restart-allowed number in the max-restart-allowed-interval leads to the partition being halted rather than being restarted.

    • restart-delay

      A delay (in seconds) introduced before starting the partition.

    You can define these properties in the RCM policy, under <resource-manager> in the config.xml file. All the restart actions within a given RCM policy share the values for these configurable properties. Administrators can also specify a restart delay. The configured delay is introduced before the partition starts, which prevents the partition from restarting in a tight loop or restarting before a temporary external error condition has cleared.

    Below is a sample configuration for the RCM restart action.

    <resource-manager>
        <name>ResourceManager-1</name>
     
        <restart-loop-protection>
     
            <!--This indicates whether restart loop protection is enabled or 
                not. If you want to disable the restart loop protection,  
                set this flag to false. -->
            <restart-loop-protection-enabled>true</restart-loop-protection-enabled>
     
            <!-- The partition can be restarted a maximum of 3 times in 60 minutes
                 by the RCM restart action. Subsequently, the partition   
                 would be halted. -->
            <max-restart-allowed>3</max-restart-allowed>
     
            <!-- Within any contiguous interval of 60 minutes, 
                 a maximum number of 3 restarts are allowed,
                 as specified by the max-restart-allowed property. -->
            <max-restart-allowed-interval>3600</max-restart-allowed-interval> 
     
            <!-- Introduce a delay of 10 seconds before starting the 
                 partition. -->
            <restart-delay>10</restart-delay>   
        </restart-loop-protection>
     
        <file-open>
            <name>RM1-FileOpenResource</name>
            <trigger>
                <name>RM1-Trigger-1</name>
                <value>20</value>
                <action>restart</action>
            </trigger>
        </file-open>
    </resource-manager>
    

    When the configured triggers for a restart action are breached, a request to restart the partition is generated, as shown in the following illustration.

    The graphic shows a request to restart the partition.
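The interaction between max-restart-allowed and max-restart-allowed-interval can be sketched as a sliding-window counter. The following plain Python model is illustrative only and is not the WebLogic Server implementation; the class name, the method name, and the handling of a restart that falls exactly on the window boundary are assumptions.

```python
from collections import deque


class RestartLoopProtection:
    """Hypothetical model of max-restart-allowed within a sliding interval."""

    def __init__(self, max_restart_allowed, interval_seconds):
        self.max_restart_allowed = max_restart_allowed
        self.interval_seconds = interval_seconds
        self._restarts = deque()  # timestamps (in seconds) of granted restarts

    def request_restart(self, now):
        """Return 'restart' if the request is within quota, otherwise 'halt'."""
        # Drop restarts that have slid out of the window ending at `now`.
        # Treating the boundary as outside the window is an assumption.
        while self._restarts and now - self._restarts[0] >= self.interval_seconds:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restart_allowed:
            return "halt"
        self._restarts.append(now)
        return "restart"


# Mirrors the sample values: 3 restarts allowed in any 3600-second window.
rlp = RestartLoopProtection(max_restart_allowed=3, interval_seconds=3600)
decisions = [rlp.request_restart(t) for t in (0, 600, 1200, 1800)]
print(decisions)  # ['restart', 'restart', 'restart', 'halt']
```

The model does not track the halted state itself; it only shows when a restart request would be converted into a halt.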

Note:

Policy actions may be implemented synchronously or asynchronously depending on the action type. For instance, the Fail action synchronously uses the thread that requested the file open. Other actions, such as the Slow action configured for a Heap Retained resource, proceed asynchronously.
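The way several triggers on one resource relate to the current use can be sketched as follows. This is a plain Python illustration under stated assumptions, not WebLogic Server code: the function name and the trigger names are hypothetical, and triggers are modeled as (name, value, action) tuples.

```python
def breached_actions(triggers, current_use):
    """Return (name, action) pairs for triggers whose constraint value is exceeded.

    `triggers` is a list of (name, value, action) tuples for one resource.
    Results are ordered by ascending constraint value.
    """
    return [
        (name, action)
        for name, value, action in sorted(triggers, key=lambda t: t[1])
        if current_use > value
    ]


# Hypothetical Open Files policy: two Notify triggers and one Fail trigger.
file_open_triggers = [
    ("Notify20", 20, "notify"),
    ("Notify50", 50, "notify"),
    ("Fail100", 100, "fail"),
]
print(breached_actions(file_open_triggers, 60))
# [('Notify20', 'notify'), ('Notify50', 'notify')]
```

With 60 open files, both Notify triggers have been exceeded while the Fail trigger at 100 has not.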

Fair Share

The fair share policy allows a system administrator to allocate a share (a percentage of the available resource) based on a representative load of each partition in the domain. Fair share policies are used to provide efficient and fair use of resources when the exact use requirements of a resource cannot be determined or are not practical to implement in resource managers. When there is no contention between partitions in a domain for a given resource, each partition is able to use the amount of resources required for its immediate workload. If there is contention between partitions for a given resource, then each partition is constrained to use only its fair share of the available resource. Ensure that limits are set so that overall memory consumption does not exceed the maximum available memory, which would result in an out-of-memory exception.

Determining Fair Share Allocations for a Resource

A share is an allocation to use a specified amount of a resource. A system administrator allocates a share to a partition by specifying an integer value between 1 and 1000 in the associated resource manager fair share policy. For a given partition, the ratio of its configured fair share value to the sum total of all fair share values for the same resource in the domain determines the amount of resource allocated.

For example, a system administrator specifies a fair share value of 150 for a resource in partition P1 and a value of 100 for the same resource in partition P2. If the workload is heavy enough in both partitions to create contention for that resource, then the resource allocation for partition P1 is 150/(150+100) or 60 percent of the available resource.
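The ratio described above can be computed for any number of partitions. The following plain Python sketch reproduces the P1 and P2 arithmetic; the function name is an assumption, and the range check reflects the 1 to 1000 limit stated in this section.

```python
def fair_share_allocation(shares):
    """Map each partition to its fraction of a contended resource.

    `shares` maps partition names to configured fair share values (1 to 1000).
    """
    for partition, value in shares.items():
        if not 1 <= value <= 1000:
            raise ValueError("fair share for %s must be between 1 and 1000" % partition)
    total = sum(shares.values())
    return {partition: value / total for partition, value in shares.items()}


allocation = fair_share_allocation({"P1": 150, "P2": 100})
for partition, fraction in allocation.items():
    print(partition, round(fraction * 100), "percent")
# P1 60 percent
# P2 40 percent
```

These fractions apply only under contention. When there is no contention, each partition can use what its immediate workload requires.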

Creating a Resource Manager

A resource manager consists of one or more policies for one or more shared resources. Each policy consists of a constraint value for a resource and a specified action that a WebLogic Server instance takes when the constraint value is met.

To configure resource managers, see:

Example RCM Configuration in config.xml

The following example is an annotated RCM configuration similar to the WLST example displayed in Configuring Resource Consumption Management: WLST Example.

<domain>
...
   <!--Define RCM Configuration -->
   <resource-management>
        <resource-manager>
            <name>Approved</name>
            <file-open>
                <trigger>
                    <name>Approved2000</name>
                    <value>2000</value><!-- in units-->
                    <action>shutdown</action>
                </trigger>
                <trigger>
                    <name>Approved1700</name>
                    <value>1700</value>
                    <action>slow</action>
                </trigger>
                <trigger>
                    <name>Approved1500</name>
                    <value>1500</value>
                    <action>notify</action>
                </trigger>
            </file-open>           
            <heap-retained>
                <trigger>
                    <name>Approved2GB</name>
                    <value>2097152</value>
                    <action>shutdown</action>
                </trigger>                               
                <fair-share-constraint>
                    <name>FS-ApprovedShare</name>
                    <value>60</value>
                </fair-share-constraint>
            </heap-retained>
        </resource-manager>

        <resource-manager>
            <name>Trial</name>
            <file-open>
                <trigger>
                    <name>Trial1000</name>
                    <value>1000</value><!-- in units-->
                    <action>shutdown</action>
                </trigger>
                <trigger>
                    <name>Trial700</name>
                    <value>700</value>
                    <action>slow</action>
                </trigger>
                <trigger>
                    <name>Trial500</name>
                    <value>500</value>
                    <action>notify</action>
                </trigger>
            </file-open>
                     ...           
        </resource-manager>

        <resource-manager>
            <name>RestartPartition</name>
            <file-open>
                <name>FileOpenQuota</name>
                <trigger>
                    <name>MaxFileOpenAllowed</name>
                    <value>20</value>
                    <action>restart</action>
                </trigger>
            </file-open>
            <restart-loop-protection>
                <restart-loop-protection-enabled>true</restart-loop-protection-enabled>
                <max-restart-allowed>3</max-restart-allowed>
                <max-restart-allowed-interval>3600</max-restart-allowed-interval>
                <restart-delay>10</restart-delay>
            </restart-loop-protection>
        </resource-manager>
    </resource-management>

    <partition>
        <name>Partition-0</name>
        <resource-group>
            <name>ResourceTemplate-0_group</name>
            <resource-group-template>ResourceTemplate-0</resource-group-template>
        </resource-group>
        ...
        <partition-id>1741ad19-8ca7-4339-b6d3-78e56d8a5858</partition-id>
 
        <!-- RCM Managers are targeted to partitions during partition
             creation or later by system administrators. -->
        <resource-manager-ref>Approved</resource-manager-ref>
    ...
    </partition>
...
</domain>
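When reviewing a configuration like the one above, it can be convenient to list every trigger in tabular form. The following plain Python sketch parses a trimmed fragment modeled on the example. It assumes a namespace-free fragment (a real config.xml declares XML namespaces that a parser must account for), and the function name list_triggers is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free fragment modeled on the example configuration.
FRAGMENT = """
<resource-management>
  <resource-manager>
    <name>Approved</name>
    <file-open>
      <trigger><name>Approved2000</name><value>2000</value><action>shutdown</action></trigger>
      <trigger><name>Approved1500</name><value>1500</value><action>notify</action></trigger>
    </file-open>
  </resource-manager>
</resource-management>
"""


def list_triggers(xml_text):
    """Return (manager, resource, trigger name, value, action) tuples."""
    rows = []
    root = ET.fromstring(xml_text)
    for manager in root.findall("resource-manager"):
        manager_name = manager.findtext("name")
        # Each child element other than <name> is a resource such as <file-open>.
        for resource in manager:
            for trigger in resource.findall("trigger"):
                rows.append((
                    manager_name,
                    resource.tag,
                    trigger.findtext("name"),
                    int(trigger.findtext("value")),
                    trigger.findtext("action"),
                ))
    return rows


for row in list_triggers(FRAGMENT):
    print(row)
```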

Dynamic Reconfiguration of Resource Managers

You can dynamically apply or remove a resource management policy from a domain partition. Changes to a resource management policy will be applied to all domain partitions that use that policy.

If a policy update for an active domain partition sets trigger values for a resource that are lower than the current use of that resource, then the policy's recourse action is applied to subsequent use of that resource. If a change to a policy would result in an immediate shutdown of an active domain partition based on the current use value, then the change is not accepted as a dynamic reconfiguration change.
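The acceptance rule for dynamic changes can be sketched as a simple check. This plain Python model is an illustration under stated assumptions, not the WebLogic Server validation logic: the function name is hypothetical, triggers are modeled as (name, value, action) tuples, and "exceeds" is taken to mean strictly greater than the constraint value.

```python
def accept_dynamic_change(new_triggers, current_use):
    """Return False if the update would shut down an active partition at once."""
    for _name, value, action in new_triggers:
        if action == "shutdown" and current_use > value:
            return False
    return True


current_open_files = 1800  # hypothetical current use in an active partition

# Lowering a shutdown trigger below current use is rejected.
print(accept_dynamic_change([("Shutdown1500", 1500, "shutdown")], current_open_files))  # False

# Lowering a slow trigger is accepted; the slow action applies to subsequent use.
print(accept_dynamic_change([("Slow1500", 1500, "slow")], current_open_files))  # True
```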

Configuring Resource Consumption Management: Monitoring Resource Use

RCM metrics for shared resources in a partition are available through a PartitionResourceMetricsRuntimeMBean.

Use these metrics to:

  • Monitor the current resource use in a partition.

  • Profile and analyze the resource consumption of a partition to generate data such as representative loads, peak loads, and peak load times needed to create effective resource managers and WLDF watches and notifications.

To monitor resource managers in Fusion Middleware Control, see "Monitor resource managers" in Administering Oracle WebLogic Server with Fusion Middleware Control.

By default, eager registration of resource meters is turned off. As a result, they get created lazily the first time the resource consumption metrics are queried for a particular resource. In that case, where the resource accounting is started lazily, the values returned from the resource consumption metrics might be different from the actual values of the resource consumed by the partition.

To get a true reflection of the amount of a resource consumed by a partition, register the meters eagerly on partition startup. To enable eager registration of the resource meters, set the weblogic.rcm.enable-eager-resource-meter-registration property to true as a JVM argument when starting the WebLogic Server instance (for example, -Dweblogic.rcm.enable-eager-resource-meter-registration=true).

Best Practices and Considerations When Using Resource Consumption Management

The following sections provide best practices and considerations for system administrators developing resource management policies:

General Considerations

Recourse actions must be selected carefully by a system administrator. Many resources interact in complex ways. For instance, slowing down CPU use (resulting in fewer threads allocated to the domain partition) may result in increased heap residency, thereby affecting retained heap use.

For a slow recourse action to be effective, applications must not create or manage threads. Oracle recommends that applications use WebLogic Server capabilities such as EJB Timers, CommonJ Work Manager and Timers, Managed Executor Service, and the Batch API to manage tasks, so that the slow recourse action is effective.

Monitor Average and Peak Resource Use

Before specifying RCM policies, Oracle recommends that system administrators monitor average and peak resource use data and configure policies with sufficient headroom to balance efficient use of resources and meeting their Service Level Agreements (SLAs). See Configuring Resource Consumption Management: Monitoring Resource Use.
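One way to turn monitored data into a candidate constraint value is to summarize the samples and add headroom above the observed peak. The following plain Python sketch is illustrative only; the function name, the sample values, and the 20 percent headroom are assumptions, not Oracle recommendations.

```python
import math
from statistics import mean


def summarize_use(samples, headroom_percent=20):
    """Summarize resource use samples and suggest a trigger value.

    The suggested value is the observed peak plus `headroom_percent`,
    rounded up to a whole unit. The default headroom is a placeholder.
    """
    peak = max(samples)
    return {
        "average": mean(samples),
        "peak": peak,
        "suggested_trigger": math.ceil(peak * (100 + headroom_percent) / 100),
    }


# Hypothetical open file counts sampled from one partition.
summary = summarize_use([400, 450, 500, 900, 650])
print(summary)
```

Choose the headroom so that the resulting value still meets the Service Level Agreements of the other partitions in the domain.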

When to Use a Trigger

Use triggers when an administrator knows the precise limits at which the corresponding action must be executed. For some resources, such as open files, the trigger is executed when the configured threshold is exceeded. For other resources, such as heap and CPU, execution may be delayed.

When to Use Fair Share

A fair share policy is typically used by a system administrator to ensure that a bounded-size shared resource is shared effectively (and fairly) by competing consumers. A fair share policy may also be employed by a system administrator when a clear understanding of the exact use of a resource by a partition cannot be accurately determined in advance, and the system administrator would like efficient use of resources while ensuring fair allocation of shared resources to co-resident partitions. Use fair share policies in your environment when you have complementary workloads for a resource between partitions. See Use Complementary Workloads.

Use Complementary Workloads

When possible, maximize resource density by balancing the peak use times between partitions so that there is no overlap in peak use times and the sum of their averages is not above their maximum peak value. Antagonistic workloads, on the other hand, have overlapping peak use times, and the sum of their averages is greater than their maximum peak value.

Also consider collocating partition workloads that exercise resources differently. For instance, hosting a partition that has a predominantly CPU-bound workload with another partition that has a memory-bound workload could help in achieving better density and improving overall resource use.
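The distinction between complementary and antagonistic workloads can be checked against monitored data. The following plain Python sketch applies the two conditions described in this section to two partitions. It simplifies "peak use time" to the single time slot with the highest use; the function name and the sample series are assumptions.

```python
def is_complementary(use_a, use_b):
    """Check two equal-length series of resource use per time slot.

    Complementary here means the peak slots differ and the sum of the
    averages does not exceed the larger of the two peaks.
    """
    peak_slot_a = use_a.index(max(use_a))
    peak_slot_b = use_b.index(max(use_b))
    sum_of_averages = sum(use_a) / len(use_a) + sum(use_b) / len(use_b)
    max_peak = max(max(use_a), max(use_b))
    return peak_slot_a != peak_slot_b and sum_of_averages <= max_peak


# Hypothetical CPU use (percent) over four time slots.
day_partition = [90, 40, 10, 10]
night_partition = [10, 10, 40, 90]
busy_partition = [85, 50, 20, 10]

print(is_complementary(day_partition, night_partition))  # True
print(is_complementary(day_partition, busy_partition))   # False
```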

When to Use Partition-Scoped RCM Policies

Use partition-scoped dynamic RCM policies if a partition has unique resource requirements. These policies facilitate easy import and export of partition RCM policies to and from existing domains.

If no resource management policies are explicitly set on a partition, then that partition has unconstrained access to available shared resources.

Managing CPU Use

CPU use is an excellent metric to track contention of CPU by collocated domain partitions, and is especially useful in fair share policies for CPU-bound workloads. Consider the following when using RCM policies to maximize CPU use:

  • When considering the workload of all the partitions in a domain (the consolidated workload), the peak CPU use should not greatly exceed the average CPU use. Minimizing the gap between peak CPU load and average CPU load maximizes the CPU use for the domain.

  • Oracle recommends configuring RCM CPU policies so that about 75 percent of CPU use is used for applications housed in the partitions of a domain. The remaining 25 percent should be allocated approximately as: 10 percent for operational tasks (backup, scheduled tasks, and other administration) and 15 percent for cluster failover.

Managing Heap

Develop a memory use plan that supports the requirements of the partition applications while continuing to provide enough available heap (headroom) for the domain and other system work. When evaluating heap requirements for a domain, consider the low, average, steady-state, and peak Heap Retained use values for each partition's representative workload.

RCM Limitations

Be aware of the following RCM limitations:

  • Heap resource consumption tracking and management is supported only when run with the G1 garbage collector (there is no RCM support for other JDK collectors).

  • There is no support to measure and account for resource consumption metrics for activities happening in JNI/native code.

  • Measurements of retained heap and CPU use are performed asynchronously and hence do not represent current (a point-in-time) value.

  • Discrimination of heap use for objects in static fields and for singleton objects of classes loaded from system and shared classloaders is problematic and may not be accurately represented in the final accounting values. If an instance of a class loaded from system and shared classloaders is loaded by a partition, the instance's use of heap is accounted against that partition.

  • Garbage collection activity is not isolated to specific domain partitions in WebLogic Server 12.2.1 with Oracle JDK 8u40.

  • There is a detrimental performance effect to enabling the RCM feature due to the additional tracking and management of resource consumption in the server instance.

Configuring Resource Consumption Management: WLST Example

You can implement and monitor RCM policies using WLST.

RCM WLST Example: Overview

The following is an example of an RCM configuration created using WLST. In this example, a system administrator has defined:

  • An Approved resource manager representing the set of resource consumption management policies that the system administrator would like to establish for all production-tiered domain partitions in the domain. The Approved resource manager has policies for various resources.

    • For the FileOpen resource type, three triggers are specified. An Approved2000 trigger ensures the partition is shut down when the number of open file descriptors reaches 2000. An Approved1700 trigger specifies that when the number of open file descriptors exceeds 1700, the domain partition must be slowed down. An Approved1500 trigger specifies a notify action.

    • For the HeapRetained resource type, an Approved2GB trigger is created to ensure that when the partition's retained heap value reaches 2 GB, the partition must be shut down. A fair share value of 60 is assigned to the Approved resource manager.

  • A Trial resource manager that defines a similar but reduced set of policies.

  • A RestartPartition resource manager that restarts the partition when the number of open files exceeds 20, with restart loop protection enabled.

  • A partition named Partition-0.

At the completion of this script, Partition-0 has been assigned the Approved resource manager.

RCM WLST Example: WLST Script

The policy discussed in RCM WLST Example: Overview can be created using the WLST script in the following example:

startEdit()
 
cd('/ResourceManagement')
cd(domainName)
rm=cmo.createResourceManager('Approved')
fo=rm.createFileOpen('Approved-FO')
fo.createTrigger('Approved2000',2000,'shutdown')
fo.createTrigger('Approved1700',1700,'slow')
fo.createTrigger('Approved1500',1500,'notify')
hr=rm.createHeapRetained('Approved-HR')
hr.createTrigger('Approved2GB',2097152,'shutdown')
hr.createFairShareConstraint('FS-ApprovedShare', 60)
 
cd('/ResourceManagement')
cd(domainName)
rm=cmo.createResourceManager('Trial')
fo=rm.createFileOpen('Trial-FO')
fo.createTrigger('Trial1000',1000,'shutdown')
fo.createTrigger('Trial700',700,'slow')
fo.createTrigger('Trial500',500,'notify')
 
cd('/ResourceManagement')
cd(domainName)
rm = cmo.createResourceManager('RestartPartition')
rlp = rm.getRestartLoopProtection()
rlp.setRestartLoopProtectionEnabled(true)
rlp.setMaxRestartAllowed(3)
rlp.setMaxRestartAllowedInterval(3600)
rlp.setRestartDelay(10)
fo = rm.createFileOpen('FileOpenQuota')
fo.createTrigger('MaxFileOpenAllowed',20,'restart')
 
save()
activate()
 
startEdit()
cd('/Partitions')
cd('Partition-0')
cmo.setResourceManagerRef(getMBean('/ResourceManagement/'+domainName+'/ResourceManager/Approved'))
save()
activate()

Configuring Resource Consumption Management: Related Tasks and Links

This section provides additional information that may be useful when implementing RCM in your environment.

For more information, see "Multitenancy Tuning Recommendations" in Tuning Performance of Oracle WebLogic Server.