13 Monitoring a Coherence Cluster

After you have discovered the Coherence target and enabled Management Pack Access, you can start monitoring the health and performance of the cluster. You can monitor the entire cluster or drill down to individual entities of the cluster, such as nodes, caches, services, connection managers, and connections.

This chapter contains the following sections:

Cluster Level Pages
Detailed Pages
Performance Pages
Log File Monitoring
Cache Data Management
Reap Session Support
Push Replication Pattern
Transactional Cache Support
Integration with JVM Diagnostics
Viewing Performance Summary
Viewing Configuration Topology
Troubleshooting Coherence

13.1 Cluster Level Pages

At the cluster level, you can view the Home page for the cluster, view the performance of all nodes, caches, and connections in the cluster, and perform administrative tasks (compare and change configuration) for the different entities in the cluster.

13.1.1 Cluster Level Home Page

You can get a global view of the cluster from the Home page by following these steps:

From the Targets menu, select Middleware, then click on a Coherence Cluster target. The Coherence Cluster Home page is displayed.
Figure 13-1 Cluster Home Page
This page contains the following sections:
- General
- Graphs
- Cluster Management
- Services
- Applications
- Metric and Host Alerts

13.1.1.1 General

This section contains the following details:

Name and status of the cluster.
Availability%: The availability of the cluster management node, rather than of the cluster as a whole, over the last 24 hours.
Number of Up Nodes: The number of nodes that are Up. Click on the link to drill down to the Selected Nodes Performance page.
Number of Down Nodes: The number of nodes that are Down. Click on the link to drill down to the Down Nodes page.
Number of Departed Nodes: The number of nodes that have departed from the cluster.
Storage Enabled Nodes: Indicates the number of nodes that are storage enabled.
Number of Weak Nodes: The number of nodes that are weak and have communication and performance issues. Click on the link to drill down to the Node Performance page.
Weakest Node: The weakest node in the cluster. Click on the link to drill down to the Node Home page.
Node with Max Queue Size: Indicates the node with the maximum queue size value in the cluster.
Node with Minimum Memory: Indicates the node with the minimum available memory in the cluster.
Number of Caches and Objects: The number of caches in the cluster and the number of objects stored in all caches in the cluster. Click on the Number of Caches link to drill down to the Cache Performance page.
Publisher and Receiver Success Rates: The Publisher and Receiver success rate for this cluster node since the node statistics were last reset.
Management Bean Server Node: This is the management node that contains the Coherence MBeanServer. Click on the link to drill down to the Node Home page.
Monitoring Agent: The Oracle Management Agent monitoring the cluster.
MBean Server Host: The host on which the management node is running. If the node on the MBean Server Host is not accessible, monitoring of the entire cluster is affected. To avoid this, we recommend running at least two management nodes in the cluster. If a management node departs from the cluster, you must update the host and port target properties to point to a host on which a management node is running.
License Mode: The license mode that this cluster is using. Possible values are Evaluation, Development or Production.
Product Edition: The product edition that this cluster is running on. Possible values are: Standard Edition (SE), Enterprise Edition (EE), Grid Edition (GE).

13.1.1.2 Graphs

Graphs indicating the health of the cluster are displayed here. The following graphs are displayed:

Nodes Uptime: This graph groups nodes according to their uptime. Node Uptime is calculated as the difference between the Current Time and the Node Timestamp. Nodes with an uptime of less than a minute are displayed in the seconds bar, nodes with an uptime of less than an hour are displayed in the minutes bar, and so on.
Caches with Lowest Hits to Gets Ratio: This graph shows the caches (up to a maximum of 5) that have the lowest Hits to Gets ratio. Click the cache name in the legend section to drill down into the Cache Details page and investigate the reasons for the low hits to gets ratio.
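The two graphs above rest on simple rules: uptime is Current Time minus Node Timestamp, bucketed by magnitude, and caches are ranked by hits divided by gets. The sketch below restates those rules in code as an illustration only, not as Enterprise Manager's implementation. The function names, the input shapes (node ID to start timestamp, cache name to a (hits, gets) pair), the final "days" bar standing in for "and so on", and skipping caches with zero gets are all assumptions.

```python
from datetime import datetime, timedelta


def uptime_bucket(node_timestamp: datetime, now: datetime) -> str:
    """Return the bar a node falls into: seconds, minutes, hours, or days."""
    uptime = now - node_timestamp  # Node Uptime = Current Time - Node Timestamp
    if uptime < timedelta(minutes=1):
        return "seconds"
    if uptime < timedelta(hours=1):
        return "minutes"
    if uptime < timedelta(days=1):
        return "hours"
    return "days"  # assumption: "and so on" ends with a days bar


def group_by_uptime(timestamps: dict[int, datetime], now: datetime) -> dict[str, list[int]]:
    """Group node IDs by uptime bar."""
    groups: dict[str, list[int]] = {}
    for node_id, ts in sorted(timestamps.items()):
        groups.setdefault(uptime_bucket(ts, now), []).append(node_id)
    return groups


def lowest_hits_to_gets(stats: dict[str, tuple[int, int]], limit: int = 5) -> list[tuple[str, float]]:
    """Return up to `limit` (cache name, hits/gets) pairs, lowest ratio first."""
    ratios = [
        (name, hits / gets)
        for name, (hits, gets) in stats.items()
        if gets > 0  # assumption: caches with no gets have no meaningful ratio
    ]
    ratios.sort(key=lambda pair: pair[1])
    return ratios[:limit]


now = datetime(2024, 1, 1, 12, 0, 0)
print(group_by_uptime({1: now - timedelta(seconds=30),
                       2: now - timedelta(minutes=5),
                       3: now - timedelta(days=2)}, now))
# {'seconds': [1], 'minutes': [2], 'days': [3]}
```

A cache with a ratio close to 0 is mostly missing, which is the case the graph is meant to surface.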

Note:

The data in the General and Graphs section will be refreshed after one collection interval (from 5 to 15 minutes).

13.1.1.3 Cluster Management

In this section, you can start and stop one or more nodes, or stop a cluster with the default start script that is located in the emas/scripts/coherence directory of the Monitoring Agent.

You can do the following:

Start New Nodes: You can start one or more nodes based on an existing node. The new node will have the same configuration as the existing node.
Stop Nodes: You can stop all the nodes on a specific host.
Stop Cluster: You can stop an entire cluster if all the hosts are managed by Enterprise Manager Cloud Control.
Note:
You can set up a corrective action to start departed Coherence Nodes automatically. Before you define the corrective actions, you must set up preferred credentials on the hosts on which the departed nodes need to be restarted.
When a node departs, this corrective action is launched within 30 seconds and the new node is automatically started on the same host if the following variables have been defined:
- oracle.coherence.startscript: The absolute path to the start script needed to bring up a Coherence node. All customizations needed for starting this node should be in this script.
- oracle.coherence.home: The absolute path to the coherence folder ($INSTALL_DIR/coherence), which contains the Coherence binaries and libraries.
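Because the departed node is restarted automatically only when both variables above are defined, a pre-flight check can catch a misconfiguration before a node actually departs. The helper below is hypothetical and is not part of Enterprise Manager: how the corrective action stores and reads these variables is not described here, so the sketch simply takes them as a dictionary. The checks themselves (absolute path, existing file or directory, a folder named coherence) are assumptions drawn from the descriptions above.

```python
import os

REQUIRED_VARS = ("oracle.coherence.startscript", "oracle.coherence.home")


def check_restart_variables(variables: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the variables look usable."""
    problems = []
    for name in REQUIRED_VARS:
        value = variables.get(name)
        if not value:
            problems.append(f"{name} is not defined")
        elif not os.path.isabs(value):
            problems.append(f"{name} must be an absolute path: {value}")

    script = variables.get("oracle.coherence.startscript")
    if script and os.path.isabs(script) and not os.path.isfile(script):
        problems.append(f"start script not found: {script}")

    home = variables.get("oracle.coherence.home")
    if home and os.path.isabs(home):
        if not os.path.isdir(home):
            problems.append(f"Coherence home not found: {home}")
        elif os.path.basename(os.path.normpath(home)) != "coherence":
            # Assumption: the home points at the $INSTALL_DIR/coherence folder itself.
            problems.append(f"expected a folder named 'coherence': {home}")
    return problems
```

Running the check on the host where the node would be restarted, with the same values you define for the corrective action, reports every problem at once instead of stopping at the first.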

13.1.1.4 Services

This section shows all the services in the cluster. You can view the type of service (Cluster Service, Distributed Cache Service, Invocation Service, and Replicated Cache Service), status of the service (Machine-Safe, Node-Safe, and Endangered), the number of nodes in the service, storage enabled nodes, endangered nodes, and active transactions.

13.1.1.5 Applications

This section shows the applications that use this Coherence cluster to cache their HttpSession objects. You can view details of the Local Cache, Overflow Cache, and Servlet Context Cache.

13.1.1.6 Metric and Host Alerts

This section lists the alerts from all types of entities in the cluster (nodes, caches, services, connections, and connection managers), along with their severity and the date on which each alert was triggered. The alerts are generated based on the thresholds defined in the Metrics Collection file. To configure these threshold values, select the Monitoring menu option from the Oracle Coherence Cluster menu and select the Metric and Collection Settings submenu. Refer to the Enterprise Manager Online Help for detailed information on the parameters displayed on the screen.

13.1.1.7 Cluster Level Operations

From the Cluster Home page, you can perform several other operations, such as:

Viewing the Performance Summary: From the Oracle Coherence Cluster menu, select Monitoring, then select Performance Summary. You can view the performance of the cluster on this page.
Metric and Collection Settings: From the Oracle Coherence Cluster menu, select Monitoring, then select Metric and Collection Settings. You can set up corrective actions to add nodes and caches as Enterprise Manager targets.
Refreshing Cluster: From the Oracle Coherence Cluster menu, select Refresh Cluster. You can refresh a cluster to add newly discovered nodes and caches as Enterprise Manager targets or update any nodes whose attributes have been changed.
Coherence Node Provisioning: From the Oracle Coherence Cluster menu, select Coherence Node Provisioning. You can deploy a Coherence node across multiple targets in a farm.
Last Collected Configuration: From the Oracle Coherence Cluster menu, select Configuration, then select Last Collected. You can view the latest or saved configuration data for the Coherence cluster.
Topology: From the Oracle Coherence Cluster menu, select Configuration, then select Topology. The Configuration Topology Viewer provides a visual layout of the relationship of the Coherence cluster with other targets.

13.1.2 Cluster Level Node Performance Page

The Node Performance page displays a historical view of the metric data as it is stored in the repository. Click the Node Performance tab to view this page.

Figure 13-2 Coherence Node Performance Page

This page displays the performance of all the nodes in the cluster over a specified period of time. You can see charts showing the top nodes with the lowest available memory, maximum send queue size, maximum puts, and maximum gets. By default, all Performance pages show the average performance metrics for the last 24 hours. If a target has been added recently, the 24-hour performance metrics are not yet available, but you can view real-time data after one collection interval (5 to 15 minutes). To view the real-time charts, select one of the Real Time options in the View Data drop-down list on any Performance page. Using the View Data options, you can also view the average performance metrics for the last 7 or 31 days.

The Node Performance page shows the performance of all nodes in this cluster. If you click a link that represents multiple nodes, such as weak nodes or storage nodes, the performance of the selected nodes is displayed on this page. You can toggle between the two modes to see the performance of the selected nodes or of all the nodes.
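The charts on this page average each metric over the selected period and then pick the top nodes, lowest first for available memory and highest first for send queue size, puts, and gets. This minimal sketch shows that ranking. It assumes samples arrive as (node ID, value) pairs already limited to the selected period. The function names and the default of five nodes are illustrative, and how Enterprise Manager actually aggregates repository data is not specified here.

```python
from statistics import mean


def average_by_node(samples: list[tuple[int, float]]) -> dict[int, float]:
    """Average (node ID, value) samples per node over the selected period."""
    per_node: dict[int, list[float]] = {}
    for node_id, value in samples:
        per_node.setdefault(node_id, []).append(value)
    return {node_id: mean(values) for node_id, values in per_node.items()}


def top_nodes(averages: dict[int, float], n: int = 5, lowest: bool = False) -> list[int]:
    """Rank node IDs by their average value.

    Use lowest=True for Memory Available; keep the default (highest first)
    for Send Queue Size, Puts, and Gets.
    """
    return sorted(averages, key=averages.__getitem__, reverse=not lowest)[:n]
```

For example, feeding the Memory Available samples through `average_by_node` and then `top_nodes(..., lowest=True)` gives the nodes under the most memory pressure first.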

13.1.3 Cluster Level Cache Performance Page

This page displays the cache related performance over a specified period of time. You can view the performance of the top caches or all the caches.

Figure 13-3 Cluster Level Cache Performance

Select the All Caches option from the drop-down list to view the performance of all the caches in the cluster. If you select the All Caches option, you can see the total and average metric values over the selected period of time.

13.1.4 Cluster Level Connection Performance Page

This page shows the performance of all connection managers and connections in the cluster. The following Connection Manager graphs are displayed:

Top Connection Managers with Most Bytes Sent since the connection manager was last started.
Top Connection Managers with Most Bytes Received since the connection manager was last started.

A table with the list of Connection Managers is displayed with the following details:

Connection Manager: This is the name of the connection manager. It indicates the Service Name and the Node ID where the Service Name is the name of the service used by this Connection Manager. Click on the link to drill down to the Connection Manager Home page.
Service: The name of the service. Click on the link to drill down to the Service Home page.
Node ID: An ID is automatically assigned to any node that is part of the cluster.
Bytes Sent: The number of bytes sent per minute.
Bytes Received: The number of bytes received per minute.
Outgoing Buffer Pool Capacity: The maximum size of the outgoing buffer pool.
Outgoing Buffer Pool Size: The currently used value of the outgoing buffer pool.
Outgoing Message Backlog: The number of outgoing messages in the backlog.
Outgoing Byte Backlog: The number of outgoing bytes in the backlog.

The following Connection related graphs are displayed:

Top Connections with Most Bytes Sent since the connection was last started.
Top Connections with Most Bytes Received since the connection was last started.

A table with the list of connections is displayed. Click on the link to drill down to the Details page.

Remote Client: The host on which this connection exists.
Up Since: The date and time since which this connection has been running.
Connection Manager: This is the name of the connection manager. Click on the link to drill down to the Connection Manager Home page.
Service: The name of the service. Click on the link to drill down to the Service Home page.
Node ID: An ID is automatically assigned to any node that is part of the cluster.
Bytes Sent: The number of bytes sent per minute.
Bytes Received: The number of bytes received per minute.
Connection Time: The connection time in minutes.
Outgoing Message Backlog: The number of outgoing messages in the backlog.
Outgoing Byte Backlog: The number of outgoing bytes in the backlog.
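The graphs count bytes since the connection or connection manager was last started, while the tables report Bytes Sent and Bytes Received per minute. One common way to turn cumulative counters like these into per-minute rates is to divide the difference between two readings by the elapsed time. The sketch below illustrates that technique. It is not a description of how Enterprise Manager computes these columns. The function name is hypothetical, and treating a counter that goes backwards as a restart is an assumption.

```python
def per_minute_rate(prev_total: int, curr_total: int, elapsed_seconds: float) -> float:
    """Convert two cumulative counter readings into a per-minute rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    delta = curr_total - prev_total
    if delta < 0:
        # The counter went backwards: assume the connection (manager) restarted
        # between readings and counting began again from zero.
        delta = curr_total
    return delta * 60.0 / elapsed_seconds
```

With readings of 1000 and 4000 bytes taken five minutes apart, the rate is 600 bytes per minute.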

13.1.5 Cluster Level Administration Page

This page allows you to compare and change the configurations of nodes, caches, connection managers, connections, and services.

Figure 13-4 Cluster Level Administration Page

On this page, you can do the following:

Change Configuration: Select an entity (node, cache, connection manager, connection, or service) for which the configuration needs to be modified and click Go. The Change Configuration page is displayed. Enter the new values and click Update to save the values and return to the Coherence Node Administration page.
Compare Two Entities: To compare two entities (node, cache, connection managers, connections, and services), select the first entity from the first drop-down list, the second entity from the second drop-down list and click Go. You will see the Details page where the details of the selected entities are shown side by side in a two column table.
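The Details page shows the two entities side by side in a two-column table. This minimal sketch builds that layout with an extra flag marking the rows whose values differ. It assumes each entity's configuration is available as a dictionary of attribute names to string values. The attribute names in the test are illustrative, and showing a missing attribute as an empty string is an assumption.

```python
def compare_entities(first: dict[str, str], second: dict[str, str]) -> list[tuple[str, str, str, bool]]:
    """Build side-by-side rows: (attribute, first value, second value, differs)."""
    rows = []
    for attr in sorted(first.keys() | second.keys()):
        a = first.get(attr, "")   # assumption: a missing attribute shows as blank
        b = second.get(attr, "")
        rows.append((attr, a, b, a != b))
    return rows
```

Filtering the rows on the last field leaves only the settings that differ between the two nodes, caches, or services.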

13.2 Detailed Pages

From the cluster level pages, you can click any hyperlink and drill down to the detailed home page of each entity in the cluster. Hyperlinks from one entity allow you to go to another entity. For example, you can view all the nodes of a cache in the Cache Detailed page, identify the node that is performing poorly for this cache, click the node hyperlink, and drill down to the Node Detailed page for further investigation. You can compare that node's configuration with other nodes and change its runtime configuration if required.

From the detailed pages, you can also perform the following operations:

Viewing the Performance Summary: From the Oracle Coherence Cluster menu, select Monitoring, then select Performance Summary. You can view the performance of the node or cache on this page.
Metric and Collection Settings: From the Oracle Coherence Cluster menu, select Monitoring, then select Metric and Collection Settings. You can set up corrective actions to add nodes and caches as Enterprise Manager targets.
Last Collected Configuration: From the Oracle Coherence Cluster menu, select Configuration, then select Last Collected. You can view the latest or saved configuration data for the Coherence cluster.
Topology: From the Oracle Coherence Cluster menu, select Configuration, then select Topology. The Configuration Topology Viewer provides a visual layout of the relationship of the Coherence cluster with other targets.

13.2.1 Node Home Page

This page shows the details and controls of a specific selected node in the Coherence cluster.

Figure 13-5 Node Home Page

This page contains the following sections:

General
- Node ID: When a node becomes part of a cluster, an ID is automatically assigned to the node. This ID can be a new number or the number that was assigned to a Departed Node.
- Machine Name: The name of the machine on which the node is running.
  
  Note: You must set the machine name property when a node is started. To set the machine name, add the -Dtangosol.coherence.machine=HOST_NAME flag to the Coherence start script, where HOST_NAME is the complete name of the machine on which the node is running. This allows you to associate the machine name with the Host Details page. See the Coherence documentation for more details on setting this flag.
- Role Name: The role could be storage/data, application/process, proxy or management node.
- Site Name: The name of the site where the Coherence node is located.
- Rack Name: Name of the rack where the machine is located.
- Member Name: The unique name assigned to the Member.
- Process Name: Indicates the name of the process on which the node is hosted.
- Up Since: The date and time since which this node has been up and running.
  
  Note: If a node has dropped out of the cluster and rejoined it, the date and time at which it last joined the cluster is displayed.
- Product Edition: The product edition this Member is running. Possible values are: Standard Edition (SE), Enterprise Edition (EE), Grid Edition (GE).
Memory Usage: The maximum memory and the available memory on this node are displayed as line graphs.
Node Management: In this section, you can click Stop Node to stop this node and click Reset Statistics to reset all the statistics for this node.
Components: The list of components and the type of each component in the cluster are displayed here. Click on the link to drill down to the Component Home page.
Metric Alerts: This table shows a list of all the metric alerts specific to this node.
Host Alerts: This table shows a list of alerts from the host on which this node is running.
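The Machine Name note above asks for -Dtangosol.coherence.machine=HOST_NAME in the start script. If you generate start commands programmatically, a small helper like this hypothetical one can produce the flag. Defaulting to `socket.getfqdn()` treats the fully qualified domain name as the "complete name" of the machine, which is an assumption; pass the name explicitly if your hosts are registered differently in Enterprise Manager.

```python
import socket
from typing import Optional


def machine_name_flag(host_name: Optional[str] = None) -> str:
    """Build the -Dtangosol.coherence.machine system property for a node's JVM."""
    # Assumption: the machine's "complete name" is its fully qualified domain
    # name, which socket.getfqdn() returns for the local host.
    name = host_name if host_name is not None else socket.getfqdn()
    if not name or any(ch.isspace() for ch in name):
        raise ValueError(f"invalid host name: {name!r}")
    return f"-Dtangosol.coherence.machine={name}"
```

The returned string belongs among the JVM options in the start script, before the main class (for example, com.tangosol.net.DefaultCacheServer for a cache server).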

13.2.2 Cache Home Page

This page shows the details of the specific selected cache in the Coherence cluster.

Figure 13-6 Cache Home Page

This page contains the following sections:

General
- Status: Indicates the status of the cache.
- Availability: The percentage of time that the management agent was able to communicate with the cache. Click the percentage link to view the availability details for the past 24 hours.
- Name: The unique name assigned to the cache.
- Coherence Cluster: The name of the cluster. Click on the link to drill down to the Cluster Home page.
- Number of Nodes: The number of nodes on which the cache is running. Click on the link to drill down to the Node Home page.
- Service Name: The name of the service used by this cache.
- Number of Objects: Shows the number of objects in the cache.
- Memory Consumed: The amount of memory used by the cache in units.
- Queue Size: The size of the write-behind queue. Applicable only to the WRITE-BEHIND persistence type.
Cache Hits and Misses: This graph displays the total cache hits and misses per minute. Click on the graph to drill down to the Cache Hits Delta Sum metric page.
Nodes: This section lists all the nodes supporting this cache.
- Node ID: The ID number assigned to the node. Click on the link to drill down to the Node Home page.
- Persistence Type: The persistence type for this cache. Possible values include: NONE, READ-ONLY, WRITE-THROUGH, WRITE-BEHIND.
Metric Alerts: This table shows a list of all the metric alerts specific to this cache.
Host Alerts: This table shows a list of alerts from the host on which this cache is running.
Push Replication Tables: If a push replication enabled cache is present in your cluster, you can see the Publisher and Subscriber tables.

13.2.3 Connection Manager Home Page

Use this page to view the details and controls of a Connection Manager instance in the Coherence cluster.

Figure 13-7 Connection Manager Home Page

This page contains the following sections:

General
- Service Name: The unique name assigned to the service.
- Node ID: An ID is automatically assigned to any node that is part of the cluster. This can be a new number or a number that belongs to a Departed node.
- Connection Count: The number of connections associated with the connection manager instance.
- Host IP: The IP address of the host machine.
- Refresh Time: The date and time on which the connection manager instance was last refreshed.
Bytes Sent and Received: This graph displays the number of bytes that were sent and received per minute. Click on the graph to drill down to the Bytes Sent Metric page.
Connections
- Remote Client: A unique hexadecimal number assigned to each connection.
- Service Name: The unique name assigned to the service.
- Node ID: An ID is automatically assigned to any node that is part of the cluster. This can be a new number or a number that belongs to a Departed node.
- Outgoing Byte Backlog: The number of outgoing bytes in the backlog.
- Outgoing Message Backlog: The number of outgoing messages in the backlog.
- Up Since: The date and time since which the connection manager instance has been up.
- Bytes Received: The number of bytes received per minute.
- Bytes Sent: The number of bytes sent per minute.
Metric Alerts: This table shows a list of all the metric alerts specific to this connection manager.
Host Alerts: This table shows a list of alerts from the host on which this connection manager instance is running.

13.3 Performance Pages

Entity level performance pages give detailed views of the performance of a particular entity. For nodes and caches, there are two views: Charts and Metrics. You can select the option from the View drop-down list.

13.3.1 Node Performance Page

This page displays the performance of a specific node over a specified period of time. You can view either the charts or the actual metrics by selecting the appropriate option from the View drop-down list. If you select Charts option, you can see the average gets, puts, memory available, send queue size, and publisher and receiver success rates over the selected period of time. If you select the Metrics option, you can see the total and average metric values over the selected period of time.

Figure 13-8 Selected Node Performance Page

This page shows the following details:

Charts: The following line graphs are displayed for the selected nodes on this page.
- Puts: This graph shows the average number of put() operations per minute. Click on the graph to drill down to the Puts Sum Delta metric page.
- Gets: This graph shows the average number of get() operations per minute. Click on the graph to drill down to the Gets Sum Delta metric page.
  - Memory Usage: This graph shows the available memory for the node over a period of time. You can use this graph to view trends by setting different real-time and historical time options. Click the graph to drill down to the Memory Available (MB) metric page.
- Send Queue Size: This graph shows the Send Queue Size for this node. The Send Queue Size indicates the number of packets that are scheduled for delivery. This includes both packets that are to be sent immediately and packets that have already been sent but are waiting for acknowledgement. Click on the graph to drill down to the Send Queue Size metric page.
- Publisher and Receiver Success Rates: This graph shows the Publisher and Receiver success rate percentage. Click on the graph to drill down to the corresponding metric pages.
Metrics: The metric values for the selected period are displayed.
- Cache: This section shows the total and average metrics values for all the caches running on this node.
  - Name: Name of the cache.
  - Tier: Indicates whether it is the front or back tier.
  - Loader: The loader responsible for reading data from and writing data to the underlying database.
  - Number of Objects: Shows the number of objects in the cache.
  - Memory Consumed: The amount of memory used by the cache in units.
  - Cache Hits: The average number of cache hits in the last 24 hours. A cache hit is a get() operation that finds the requested entry in the cache.
  - Cache Misses: The average number of cache misses in the last 24 hours.
  - Gets: The average number of get() operations in the last 24 hours.
  - Puts: The average number of put() operations that have occurred during the specified period.
- Service: This section shows the average values for all the services running on this node.
  - Name: The unique name assigned to the service.
  - Partitions Endangered: The total number of partitions that are not currently backed up.
  - Status HA: The high availability status of the service. This can be Machine-Safe, Node-Safe, or Endangered.
  - Thread Count: The number of threads in the service thread pool.
  - Task Count: The average number of tasks that have been executed in the specified period.
- Storage Manager: This section shows all the storage manager instances running on this node.
  - Service Name: The unique name assigned to the service.
  - Cache Name: The name of the cache.
  - Events Dispatched: The total number of events dispatched by the Storage Manager per minute.
  - Eviction Count: The number of evictions from the backing map managed by this Storage Manager, caused by entry expiry or by insert operations that would make the underlying backing map reach its configured size limit.
  - Index Info: An array of information for each index applied to the portion of the partitioned cache managed by the Storage Manager. Each element is a string value that includes a Value Extractor description, ordered flag (true to indicate that the contents of the index are ordered; false otherwise), and cardinality (number of unique values indexed).
  - Insert Count: The number of inserts into the backing map managed by this Storage Manager. In addition to standard inserts caused by put and invoke operations, or synthetic inserts caused by get operations with a read-through backing map topology, this counter is incremented when distribution transfers move resources into the underlying backing map and is decremented when distribution transfers move data out of it.
  - Remove Count: The number of removes from the backing map managed by this Storage Manager caused by operations such as clear, remove or invoke.
- System: This section shows the system information for this node.
  - Maximum Memory (MB): The maximum amount of memory that the JVM will attempt to use in MB.
  - Memory Available (MB): The total amount of memory in the JVM available for new objects in MB.
  - CPU Count: The number of CPU cores for the machine on which this Member is running.
  - Input Arguments: The Input Arguments provide information about system parameters with which a Coherence node JVM is started. These are stored as configuration metrics in Repository and can be used to compare two different nodes. They also play an important role in stopping and restarting the node.
- Operating System
  - Committed Virtual Memory Size: The amount of virtual memory that is guaranteed to be available to the running process in bytes.
  - Free Physical Memory Size: The amount of free physical memory available.
  - Free Swap Space: The amount of free swap space available.
  - Total Physical Memory Size: The total amount of physical memory available.
  - Total Swap Space Size: The total amount of swap space.
  - Operating System Name: The name of the operating system being used.
  - Architecture: The architecture of the operating system.
  - Operating System Version: The version of the operating system being used.
  - Available Processors: The number of processors available to the Java Virtual Machine.
  - Max File Descriptor Count: The maximum number of file descriptors that the process can have open.
  - Open File Descriptor Count: The number of file descriptors currently open.
  - Process CPU Time: The CPU time used by the process on which the Java virtual machine is running.
- Platform Memory
  - Heap Memory: The runtime data area from which memory for all class instances and arrays are allocated.
  - Non Heap Memory: The Java virtual machine has a method area that is shared among all threads. The method area belongs to non-heap memory. It stores per-class structures such as a runtime constant pool, field and method data, and the code for methods and constructors.
- Platform Thread
  - Thread Count: The current number of live threads including both daemon and non-daemon threads.
  - Peak Thread Count: The peak live thread count since the Java virtual machine started or the peak was reset.
- Platform Garbage Collector
  - Garbage Collector Name: The name of the Garbage Collector.
  - Collection Count: The total number of collections that have occurred.
  - Collection Time Count: The approximate accumulated collection elapsed time in seconds.
  - Time Taken: The elapsed time of this garbage collection in milliseconds.
- Connection
  - Multicast Address: The IP address of the Member's Multicast Socket for group communication.
  - Multicast Enabled: Specifies whether this Member uses multicast for group communication.
  - Multicast Port: The port of the Member's Multicast Socket for group communication.
  - Multicast TTL: The time-to-live for multicast packets sent out on this Member's Multicast Socket.
  - Multicast Threshold: The percentage (0 to 100) of the servers in the cluster that a packet will be sent to, above which the packet is multicast and below which it is unicast.
  - Flow Control Enabled: Indicates whether Flow Control has been enabled.
  - Burst Count: The maximum number of packets that can be sent without pausing. Anything less than one (e.g. zero) means no limit.
  - Buffer Publish Size: The buffer size of the unicast datagram socket used by the Publisher, measured in the number of packets.
  - Buffer Receive Size: The buffer size of the unicast datagram socket used by the Receiver, measured in the number of packets.
  - Burst Delay: The number of milliseconds to pause between bursts. Anything less than one (e.g. zero) is treated as one millisecond.
  - Unicast Address: The IP address of the Member's Datagram Socket for point-to-point communication.
  - Unicast Port: The port of the Member's Datagram Socket for point-to-point communication.
- Network
  - Packet Delivery Efficiency: The efficiency of packet loss detection and retransmission. A low efficiency is an indication that there is a high rate of unnecessary packet retransmissions.
  - Packets Bundled: The total number of packets which were bundled prior to transmission. The total number of network transmissions is equal to (Packets Sent - Packets Bundled).
  - Packets Received: The number of packets received per minute.
  - Packets Repeated: The number of duplicate packets received per minute.
  - Packets Resent: The number of packets resent per minute. A packet is resent when there is no ACK received within a timeout period.
  - Packets Resent Early: The total number of packets resent ahead of schedule. A packet is resent ahead of schedule when there is a NACK indicating that the packet has not been received.
  - Packets Resent Excess: The total number of packet retransmissions that were later proven unnecessary.
  - Packets Sent: The number of packets sent per minute.
  - Resend Delay: The minimum number of milliseconds that a packet will remain queued in the Publisher's re-send queue before it is resent to the recipient(s) if the packet has not been acknowledged.
  - Send Queue Size: The number of packets currently scheduled for delivery. This number includes both packets that are to be sent immediately and packets that have already been sent and are awaiting acknowledgment. Packets that do not receive an acknowledgment within the Resend Delay interval are automatically resent.
  - Traffic Jam Count: The maximum total number of packets in the send and resend queues that forces the Publisher to pause client threads. Zero means no limit.
  - Traffic Jam Delay: The number of milliseconds to pause client threads when a traffic jam condition has been reached. Anything less than one (e.g. zero) is treated as one millisecond.
  - Weakest Channel: The ID of the cluster node to which this node is having the most difficulty communicating, or -1 if none is found. A channel is considered to be weak if either the point-to-point publisher or receiver success rates are below 1.0.
  - Receiver Packet Utilization: The receiver packet utilization for this cluster node since the socket was last reopened. This value is a ratio of the number of bytes received to the number that would have been received had all packets been full. A low utilization indicates that data is not being sent in large enough chunks to make efficient use of the network.
  - TCPRing Failures: The number of recovered TCP Ring disconnects per minute. A recoverable disconnect is an abnormal event that is registered when the TCP Ring peer drops the TCP connection but the connection recovers after no more than the maximum configured number of attempts.
  - TCPRing Timeouts: The number of TCP Ring Timeouts per minute.
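The following Java sketch restates three of the definitions above as code: Receiver Packet Utilization as the ratio of bytes actually received to the bytes full packets would have carried, the weak-channel rule, and the one-millisecond floor on Traffic Jam Delay. The class and parameter names, including the maximum packet size argument, are hypothetical; this is not Coherence or Enterprise Manager code.

```java
// Hypothetical helper that restates three network statistics definitions.
// It is not part of Coherence or Enterprise Manager.
final class NetworkStats {
    private NetworkStats() {}

    // Receiver Packet Utilization: bytes actually received divided by the bytes
    // that would have been received had every packet been full.
    static double receiverPacketUtilization(long bytesReceived, long packetsReceived, int maxPacketBytes) {
        if (packetsReceived <= 0 || maxPacketBytes <= 0) {
            return 0.0; // nothing received yet, so there is nothing to measure
        }
        return (double) bytesReceived / ((double) packetsReceived * maxPacketBytes);
    }

    // Weakest Channel rule: a channel is weak if either the point-to-point
    // publisher or receiver success rate is below 1.0.
    static boolean isWeakChannel(double publisherSuccessRate, double receiverSuccessRate) {
        return publisherSuccessRate < 1.0 || receiverSuccessRate < 1.0;
    }

    // Traffic Jam Delay: anything less than one millisecond is treated as one millisecond.
    static long effectiveTrafficJamDelayMillis(long configuredDelayMillis) {
        return Math.max(1L, configuredDelayMillis);
    }
}
```

For example, 7,200,000 bytes received in 10,000 packets with an assumed 1,440-byte maximum packet size gives a utilization of 0.5, meaning packets were on average only half full.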

13.3.1.1 Cache Performance Details Page

This page displays the performance of a specific cache over a specific period of time. You can view charts showing the number of cache hits, misses, store reads, and store writes. You can also see the aggregated totals and average metric values over the selected period of time.

Figure 13-9 Cache Performance Details Page

The following graphs are displayed:

Number of Objects: This graph shows the number of objects in the cache.
Memory Consumed (Units): This graph shows the amount of memory consumed by the cache per minute.
Hits: This graph shows the number of cache hits per minute.
Misses: This graph shows the number of cache misses per minute.
Puts: This graph shows the total number of put() operations per minute.
Gets: This graph shows the total number of get() operations per minute.
Store Reads: This graph shows the total number of load operations per minute.
Store Writes: This graph shows the total number of store and erase operations per minute.

Metrics: Select the Metrics option in the View By drop-down list to view the totals and averages for the cache.

Totals / Averages: Displays the aggregated total and average values across all nodes, as well as the total and average values for each node, for the cache during the selected period.
- Hits: The number of successful fetches of the cached objects per minute.
- Misses: The number of failed fetches of the cached objects per minute.
- Puts: The number of objects added to the cache per minute.
- Gets: The number of objects retrieved from the cache per minute.
- Prunes: The number of prune operations on the cache per minute. A prune operation occurs every time the cache reaches its high watermark.
- Store Reads: The number of reads from a data store per minute.
- Store Writes: The number of writes to a data store per minute.
- Number of Objects: The number of objects in the cache.
- Memory Consumed (Units): The amount of memory used by the cache in units.
- Average Time (ms): The average execution time for each operation (hits, misses, puts, gets, store reads, and store writes). The value is the number of operations since the last collection divided by the time those operations took. For example, if 100 hits took 20 milliseconds since the last collection, the value is calculated as 100/20 = 5.
Storage Manager
- Events Dispatched: The total number of events dispatched by the Storage Manager per minute.
- Eviction Count: The number of evictions from the backing map managed by this Storage Manager, caused by entry expiry or by insert operations that would cause the underlying backing map to reach its configured size limit.
- Insert Count: The number of inserts into the backing map managed by this Storage Manager. In addition to standard inserts caused by put and invoke operations or synthetic inserts caused by get operations with read-through backing map topology, this counter is incremented when distribution transfers move resources into the underlying backing map and is decremented when distribution transfers move data out.
- Remove Count: The number of removes from the backing map managed by this Storage Manager caused by operations such as clear, remove or invoke.
- Listener Filter Count: The number of listener filters.
- Listener Key Count: The number of listener keys.
- Listener Registrations: The number of listener registrations.
- Locks Granted: The number of locks granted.
- Locks Pending: The number of locks pending.
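A minimal Java sketch of the Average Time (ms) calculation exactly as described above: the operation count since the last collection divided by the milliseconds those operations took. The class name is hypothetical, and returning 0 when no time has elapsed is an assumption made to avoid dividing by zero.

```java
// Hypothetical restatement of the documented "Average Time (ms)" calculation:
// operations since the last collection divided by the milliseconds they took.
final class CacheAverageTime {
    private CacheAverageTime() {}

    static double documentedValue(long operationsSinceLastCollection, long elapsedMillis) {
        if (elapsedMillis <= 0) {
            return 0.0; // assumption: report 0 rather than divide by zero
        }
        return (double) operationsSinceLastCollection / elapsedMillis;
    }
}
```

With the figures from the example, `CacheAverageTime.documentedValue(100, 20)` returns 5.0.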

Apart from these pages, the detailed pages for the following entities are also available:

Service Performance Details Page: This page displays the performance of the selected service over a specific period of time. The Request Average Duration and the Request Max Duration charts are displayed. You can also see the average metric values over the selected period of time.
Connection Manager and Connection Performance Details Page: These pages display the performance of the selected connection manager or connection.

13.3.2 Administration Pages

You can drill down to the Administration page for a specific node, cache, connection, or connection manager. You can change the configuration of a specific entity or compare the configurations of two entities in the cluster.

13.3.2.1 Compare Configurations

You can compare the configurations of two nodes, caches, services, or connection managers. The comparison details for the two selected entities are shown in a four-column table. The first column shows the comparison result, the second column shows the attribute being compared, and the last two columns show the value of that attribute for each entity. When an attribute has the same value for both entities, the Result column shows an equal sign (=); otherwise, it shows a not-equal sign. Figure 13-10 shows the Compare Nodes page.

Figure 13-10 Compare Nodes Page
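The four-column layout can be modeled in a few lines of Java. This is a hypothetical sketch that compares two attribute maps, not the Enterprise Manager implementation; an attribute missing from one entity is shown as `null`, and rendering the not-equal sign as ≠ is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical model of the Compare page: each row is
// {Result, Attribute, value for the first entity, value for the second entity}.
final class ConfigComparison {
    private ConfigComparison() {}

    static List<String[]> compare(Map<String, String> first, Map<String, String> second) {
        // Union of attribute names, sorted so that the row order is stable.
        TreeSet<String> attributes = new TreeSet<>(first.keySet());
        attributes.addAll(second.keySet());

        List<String[]> rows = new ArrayList<>();
        for (String attribute : attributes) {
            String a = first.get(attribute);  // null if the attribute is absent
            String b = second.get(attribute);
            String result = Objects.equals(a, b) ? "=" : "\u2260"; // "≠" when values differ
            rows.add(new String[] {result, attribute, String.valueOf(a), String.valueOf(b)});
        }
        return rows;
    }
}
```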

13.3.2.2 Change Configuration

You can change the configuration of a node, cache, or a service in the cluster.

13.4 Log File Monitoring

Enterprise Manager monitors log files for occurrences of user-specified patterns to detect abnormal conditions. Log files are scanned periodically for one or more Perl patterns, and an alert is raised when a pattern occurs during a scan.

You can set up each Coherence node to log all messages into a log file on the host on which the node is running. You must use a specific naming pattern in the log file name to ensure that it is monitored. To configure the log file monitoring criteria, follow these steps:

From the Targets menu, select Middleware, then click on a Coherence cluster target.
Navigate to the Administration page of the node for which the Log file alerts need to be configured.

Figure 13-11 Node Administration Page
Click the Log Alert Setup link. The Metric and Policy Settings page of the host on which the Coherence node is running is displayed.
Select Metric with Thresholds in the View drop-down box and search for the Log File Pattern Matched Line Count metric.
Click the pencil icon in this row to navigate to the Edit Advanced Settings: Log File Pattern Matched Line Count page.

Figure 13-12 Log File Pattern Matched Line Count Page
Click Add to add a row to the Monitored Objects table.
In the Log File Name field, specify the name in the format <coherence_cluster_target_name>_<Node_name>_<....any other optional names>.log.
Specify the patterns to match and to ignore in the Match Pattern in Perl and Ignore Pattern in Perl fields. Lines matching the ignore pattern are discarded first; the remaining lines that match the specified match patterns result in one record being uploaded to the repository for each pattern.
Enter the Critical and Warning thresholds for this metric and click Continue. You will return to the Metrics and Collection Settings page. Click OK to update the metric.
Any alerts generated will be displayed in the Coherence Log Alerts section in the Cluster Home page.

Figure 13-13 Cluster Home Page - Coherence Log Alerts
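The naming rule and the ignore-then-match order from the steps above can be sketched in Java. This is a hypothetical illustration, not Enterprise Manager code: Enterprise Manager evaluates Perl patterns, and `java.util.regex` is Perl-like but not identical, so complex patterns may behave differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch of the log alert rules: file naming, then ignore-before-match counting.
final class LogAlertSketch {
    private LogAlertSketch() {}

    // Builds <cluster target name>_<node name>[_<optional part>].log
    static String logFileName(String clusterTargetName, String nodeName, String optionalPart) {
        String base = clusterTargetName + "_" + nodeName;
        if (optionalPart != null && !optionalPart.isEmpty()) {
            base = base + "_" + optionalPart;
        }
        return base + ".log";
    }

    // Returns, for each match pattern, how many non-ignored lines it matched.
    static Map<String, Integer> matchedLineCounts(List<String> lines, List<String> matchPatterns, String ignorePattern) {
        Pattern ignore = (ignorePattern == null || ignorePattern.isEmpty()) ? null : Pattern.compile(ignorePattern);
        Map<String, Pattern> compiled = new LinkedHashMap<>();
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String p : matchPatterns) {
            compiled.put(p, Pattern.compile(p));
            counts.put(p, 0);
        }
        for (String line : lines) {
            if (ignore != null && ignore.matcher(line).find()) {
                continue; // ignored lines are discarded before any match pattern is tried
            }
            for (Map.Entry<String, Pattern> e : compiled.entrySet()) {
                if (e.getValue().matcher(line).find()) {
                    counts.merge(e.getKey(), 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

For example, `LogAlertSketch.logFileName("MyCluster1", "Node50", "")` returns `MyCluster1_Node50.log`.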

13.5 Cache Data Management

The Cache Data Management feature allows you to define indexes and perform queries against currently cached data that meets a specified set of criteria.

Note:

This feature is available to users with Administration privileges only if the Cache Data Management MBean has been registered in the Coherence Cluster.

To perform cache data management operations, navigate to the Cache Administration page and click Go in the Cache Data Management section. In the Cache Data Management page, you can select an operation and a query to perform a data management operation on the cache. You can perform the following operations:

Add Indexes: To create an index, select the Add Index option in the Operation field. In the Value Extractor List field, specify a comma-separated list of expressions that identify the index, and enter the Host Credentials. A Value Extractor extracts an attribute from a given object so that the attribute can be indexed.
Remove Indexes: To remove an index, select the Remove Index option in the Operation field and specify the Value Extractor List that identifies the index. Specify the Host Credentials and click Execute to remove the index from the cache.
Export: You can export the queried data to a file. Select a query from the Query section or click Create to create a new query. Select the Export option in the Operation field and enter the absolute path to the file. The file is saved on the host machine on which the management node is running.
Import: You can import queried data from a file. This file should be present on the host machine on which the management node is running. Select the Import option in the operation field and enter the absolute path to the file.
Insert: Select the Insert option in the Operation field and specify a unique key-value pair to insert into the cache. The key-value pair can be provided from:
- UI Table on this Page: Select the Type of Keys and Type of Values and the Host Credentials.
- Text File on Management Host: If the key-value pairs are stored in a text file on the management host, select this option and specify the location of the file.
- Database Table: If the key-value pairs are stored in a database table, specify the Database URL, Credentials, the SQL Query Statement, and Properties.
Purge: Select Purge from the Operation drop-down list. Data matching the selected query will be deleted from the cache.
View: Select View from the Operation drop-down list and specify the number of key-value pairs to be displayed on each page. Data matching the criteria will be displayed.
Update: Select Update from the Operation drop-down list. Specify the credentials for the host. Select a query from the Query table or create a new query to update the data in the cache.
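To make the role of a Value Extractor concrete, the following plain-Java model shows how an index maps an extracted attribute to the keys that produced it, and how a query over that index can drive View or Purge. It deliberately does not use the Coherence API (in Coherence itself, indexes are added through `QueryMap.addIndex`, which `NamedCache` inherits); every class and method name here is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical model of extractor-based indexes; not the Coherence API.
final class CacheIndexModel<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Map<String, Function<V, Object>> extractors = new HashMap<>();
    private final Map<String, Map<Object, Set<K>>> indexes = new HashMap<>();

    // Insert or update, keeping every index in step with the cached data.
    void put(K key, V value) {
        V old = cache.put(key, value);
        for (Map.Entry<String, Function<V, Object>> e : extractors.entrySet()) {
            Map<Object, Set<K>> index = indexes.get(e.getKey());
            if (old != null) {
                unindex(index, e.getValue().apply(old), key);
            }
            index.computeIfAbsent(e.getValue().apply(value), x -> new HashSet<>()).add(key);
        }
    }

    void remove(K key) {
        V old = cache.remove(key);
        if (old == null) {
            return;
        }
        for (Map.Entry<String, Function<V, Object>> e : extractors.entrySet()) {
            unindex(indexes.get(e.getKey()), e.getValue().apply(old), key);
        }
    }

    // Add Index: apply the extractor to every current value.
    void addIndex(String name, Function<V, Object> extractor) {
        Map<Object, Set<K>> index = new HashMap<>();
        cache.forEach((k, v) -> index.computeIfAbsent(extractor.apply(v), x -> new HashSet<>()).add(k));
        extractors.put(name, extractor);
        indexes.put(name, index);
    }

    // Remove Index: drop the extractor and its index; cached data is untouched.
    void removeIndex(String name) {
        extractors.remove(name);
        indexes.remove(name);
    }

    // Query: keys whose extracted attribute equals 'expected'.
    Set<K> keysWhere(String indexName, Object expected) {
        Map<Object, Set<K>> index = indexes.get(indexName);
        if (index == null) {
            throw new IllegalArgumentException("No index named " + indexName);
        }
        return new HashSet<>(index.getOrDefault(expected, Collections.emptySet()));
    }

    // Purge: delete every entry that matches the query; returns how many were removed.
    int purge(String indexName, Object expected) {
        Set<K> keys = keysWhere(indexName, expected);
        keys.forEach(this::remove);
        return keys.size();
    }

    int size() {
        return cache.size();
    }

    private static <T> void unindex(Map<Object, Set<T>> index, Object attribute, T key) {
        Set<T> keys = index.get(attribute);
        if (keys != null) {
            keys.remove(key);
            if (keys.isEmpty()) {
                index.remove(attribute);
            }
        }
    }
}
```

The model updates every index on each put and remove, which is what lets a query answer from the index instead of scanning the whole cache.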

13.6 Reap Session Support

HTTP session objects for one or more Java EE applications deployed on application servers can be cached in the Coherence cluster. The Session Reaper cleans up these HTTP sessions and frees the associated memory. It is responsible for destroying sessions that are no longer used, which is determined by whether a session has timed out. The Session Reaper is configured to scan the entire set of sessions over a certain period of time, called the reaping cycle, and cleans up any expired sessions it finds.

To view the reap session metrics, click the Application link in the Applications table in the Cluster Home page. The following reap session metrics and graphs are displayed in the Application Home page.

Reap Duration: This graph shows the average reap duration in minutes.
Reaped Sessions: This graph shows the average number of reaped sessions in a reap cycle.
Reaped Sessions (24 Hour Averages): This table shows the average reap session data over the last 24 hours. The following details are displayed:
- Module Name: The name of the Coherence cluster on which the HTTP sessions have been cached.
- Node ID: When a node becomes part of a cluster, an ID is automatically assigned to the node. Click on the link to drill down to the Node Home page.
- Average Reap Duration: The average reap duration since the statistics were last reset.
- Average Reaped Sessions: The average number of reaped sessions since the statistics were last reset.
- Total Reaped Sessions: The total number of expired sessions that have been reaped since the statistics were last reset.
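The relationship between reaping cycles and the statistics in the table can be sketched in Java. This hypothetical model is not the Session Reaper itself: it represents sessions as a map from session ID to expiry time in milliseconds, and it measures durations in milliseconds rather than the minutes used by the graph.

```java
import java.util.Iterator;
import java.util.Map;

// Hypothetical model of reaping cycles and the statistics derived from them.
final class SessionReaperModel {
    private long cycles;
    private long totalReaped;
    private long totalDurationMillis;

    // One reaping cycle: remove every session whose expiry time is at or before nowMillis.
    int reap(Map<String, Long> sessionExpiryMillis, long nowMillis) {
        long start = System.nanoTime();
        int reaped = 0;
        for (Iterator<Map.Entry<String, Long>> it = sessionExpiryMillis.entrySet().iterator(); it.hasNext(); ) {
            if (it.next().getValue() <= nowMillis) {
                it.remove();
                reaped++;
            }
        }
        cycles++;
        totalReaped += reaped;
        totalDurationMillis += (System.nanoTime() - start) / 1_000_000;
        return reaped;
    }

    long totalReapedSessions() {
        return totalReaped;
    }

    double averageReapedSessions() {
        return cycles == 0 ? 0.0 : (double) totalReaped / cycles;
    }

    double averageReapDurationMillis() {
        return cycles == 0 ? 0.0 : (double) totalDurationMillis / cycles;
    }

    // Corresponds to "since the statistics were last reset".
    void resetStatistics() {
        cycles = 0;
        totalReaped = 0;
        totalDurationMillis = 0;
    }
}
```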

13.7 Push Replication Pattern

The Push Replication Pattern provides extensible, flexible, high-performance, highly available, and scalable infrastructure to support the replication of EntryOperations occurring in one Coherence Cluster to one or more globally distributed Coherence clusters. The Push Replication Pattern advocates that:

Operations (such as insert, update and delete) occurring on data in one Site should be pushed using one or more Publishers to an associated device.
A Publisher is responsible for optimistically replicating operations on or with the associated device, in the order in which the operations originally occurred.
If a device is unavailable for some reason, the operations to be replicated using the associated Publisher will be queued and executed (in the original order) at a later point in time.

Implementation of the Push Replication Pattern additionally advocates that:

The data on which operations occur are standard Coherence cache entries.
Operations replicated include inserts, updates, and deletes of cache entries. This includes NamedCache put and remove, as well as updates that result from invoking an AbstractProcessor.
The operations that are enqueued for replication are called EntryOperations. They contain key and value pairs of updated cache entries in Binary form.
A Site is a Coherence Cluster.
A Site may act as both a sender of Operations and receiver of Operations. That is, multi-way multi-site push replication is permitted.
A Device may be any of the following: a local cluster, a remote cluster, a file system, a database, an I/O stream, a logging system, and so on.

After the EntryOperations are captured, they are sent to the messaging layer for distribution to the Subscribers. The Publishers then publish the currently queued EntryOperations.

The Publisher and Subscriber tables list all the publishers and subscribers and display the data used by them. You can Suspend, Drain, or Resume publishing operations on a specific Publisher.
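The ordering and queuing rules above can be modeled with a simple queue. This Java sketch is hypothetical and is not the Push Replication Pattern API: EntryOperations are plain strings, the device is a list, and only the Suspend and Resume controls are modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical publisher model: operations are queued in order, published in order
// when the device is available and publishing is not suspended, otherwise kept queued.
final class OrderedPublisherModel {
    private final Deque<String> queue = new ArrayDeque<>();
    private final List<String> delivered = new ArrayList<>(); // stands in for the device
    private boolean suspended;

    void enqueue(String entryOperation) {
        queue.addLast(entryOperation);
    }

    void suspend() {
        suspended = true;
    }

    void resume() {
        suspended = false;
    }

    // Publishes all queued operations in their original order; returns how many were sent.
    int publish(boolean deviceAvailable) {
        if (suspended || !deviceAvailable) {
            return 0; // operations stay queued for a later attempt
        }
        int count = 0;
        while (!queue.isEmpty()) {
            delivered.add(queue.removeFirst());
            count++;
        }
        return count;
    }

    List<String> delivered() {
        return delivered;
    }

    int pending() {
        return queue.size();
    }
}
```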

13.8 Transactional Cache Support

Transactional caches are specialized distributed caches that provide transactional guarantees. Transactional caches are required whenever performing a transaction using the Transaction Framework API. Transactional caches cannot interoperate with non-transactional caches.

The CacheMBean managed resource provides attributes and operations for all caches, including transactional caches. Many of the MBean's attributes do not apply to transactional caches; reading such an attribute returns -1. A cluster node may have zero or more instances of cache managed beans for transactional caches. The object name uses the form:

type=Cache, service=service name, name=cache name, nodeId=cluster node's id

The following list describes the CacheMBean attributes that are supported for transactional caches.

AverageGetMillis: The average number of milliseconds per get() invocation.
AveragePutMillis: The average number of milliseconds per put() invocation since the cache statistics were last reset.
HighUnits: The limit of the cache size measured in units. The cache will prune itself automatically once it reaches its maximum unit level. This is often referred to as the high water mark of the cache.
Size: The number of entries in the current data set.
TotalGets: The total number of get() operations since the cache statistics were last reset.
TotalGetsMillis: The total number of milliseconds spent on get() operations since the cache statistics were last reset.
TotalPuts: The total number of put() operations since the cache statistics were last reset.
TotalPutsMillis: The total number of milliseconds spent on put() operations since the cache statistics were last reset.
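The attributes above can be read over JMX. The following Java sketch assumes that the cache MBeans are registered under the `Coherence` JMX domain (the object name form above omits the domain) and that you already have an `MBeanServerConnection` to the management node; the helper names are hypothetical. It queries with a trailing `,*` so that `nodeId` and any other key properties match, and it treats -1 as "not applicable".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helpers for reading transactional cache statistics over JMX.
final class TransactionalCacheStats {
    private TransactionalCacheStats() {}

    // Pattern for every CacheMBean of one cache on one service, across all nodes.
    // Assumes service and cache names contain no characters that need quoting.
    static ObjectName cachePattern(String service, String cache) throws MalformedObjectNameException {
        return new ObjectName("Coherence:type=Cache,service=" + service + ",name=" + cache + ",*");
    }

    // Reads one numeric attribute (for example "TotalGets") from each matching MBean.
    static Map<ObjectName, Long> readAttribute(MBeanServerConnection conn, String service,
                                               String cache, String attribute) throws Exception {
        Map<ObjectName, Long> values = new LinkedHashMap<>();
        for (ObjectName name : conn.queryNames(cachePattern(service, cache), null)) {
            values.put(name, ((Number) conn.getAttribute(name, attribute)).longValue());
        }
        return values;
    }

    // Average milliseconds per operation from a pair such as TotalGetsMillis / TotalGets.
    // Returns -1 if either input is -1 (attribute not applicable).
    static double averageMillis(long totalMillis, long totalOperations) {
        if (totalMillis == -1 || totalOperations == -1) {
            return -1.0;
        }
        return totalOperations == 0 ? 0.0 : (double) totalMillis / totalOperations;
    }
}
```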

13.9 Integration with JVM Diagnostics

JVM Diagnostics allows administrators to identify the root cause of performance problems in the production environment without having to reproduce them in the test or development environment. JVM Diagnostics is a part of WLS Management Pack EE.

You can drill down to a Coherence node's JVM to identify the method or thread that is causing a delay. This feature allows you to trace live threads, identify resource contention related to locks, and trace the Java session to the database.

You can view the JVM Diagnostics data if the JVM Diagnostics Manager and JVM Diagnostics Agent have been deployed on the host machine on which the OMS is running. You can deploy the JVM Diagnostics Manager and JVM Diagnostics Agent on:

Standalone Coherence
CoherenceWeb

After the JVM Diagnostics Manager and Agent have been deployed, navigate to the Node Home page and select JVM Diagnostics from the Node menu. For more details on the JVM Diagnostics Manager and Agent installation, see Oracle Enterprise Manager Cloud Control Basic Installation Guide.

13.9.1 Deploying on Standalone Coherence

To deploy the JVM Diagnostics Manager and the JVM Diagnostics Agent, follow these steps:

Follow the steps listed in the Oracle Enterprise Manager Cloud Control Basic Installation Guide to deploy the JVM Diagnostics Manager and the JVM Diagnostics Agent.
From the Setup menu, select Middleware Diagnostics, then click Setup JVM Diagnostics on the Middleware Diagnostics page. Click the JVMs and Pools tab, click Download, and download jamagent.war.
Start Coherence with the JVM Diagnostics Agent.
1. Specify the value for the SYS_OPT parameter as follows: SYS_OPT="$SYS_OPT -Doracle.coherence.jamjvmid=[CoherenceClusterName]/[NodeName]"
2. Add the jamagent.war to the CLASSPATH as follows:
  
  CLASSPATH="$CLASSPATH:/scratch/ssmith/jvmd/jamagent.war"
  
  export CLASSPATH
3. Start Coherence with the JVM Diagnostics Agent as follows:
  
  $JAVA_HOME/bin/java -cp $CLASSPATH $JVM_OPT $SYS_OPT jamagent.jamrun [$JAMAGENT_PARAMS_LIST] $TARGET_CLASS $TARGET_CLASS_PARAMS
  
  where [$JAMAGENT_PARAMS_LIST] refers to the JVM Diagnostics Agent parameters. The mandatory parameters are:
  
  jamconshost = JVM Diagnostics Manager Host
  
  jamconsport = Listening port of JVM Diagnostics Manager
  
  oracle.ad4j.groupidprop = [CoherenceClusterName]/[NodeName]
  
  jamjvmid=MyCluster1/Node50
  
  Note:
  The value of the -Doracle.coherence.jamjvmid parameter must be the same as the value of the oracle.ad4j.groupidprop parameter in $JAMAGENT_PARAMS_LIST.

An example is given below:

SYS_OPT="$SYS_OPT -Doracle.coherence.jamjvmid=MyCluster1/Node50"

CLASSPATH="$CLASSPATH:/scratch/xiawu/jvmd/jamagent.war"

$JAVA_HOME/bin/java -cp $CLASSPATH $JVM_OPT $SYS_OPT jamagent.jamrun \
  jamconshost=10.229.187.109 jamconsport=3800 \
  oracle.ad4j.groupidprop=MyCluster1/Node50 \
  oracle.sysman.integration.coherence.EMIntegrationServer
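Because the two identifiers must match, a quick pre-flight check of the command-line tokens can catch typos before the node starts. This Java sketch is hypothetical and only inspects argument strings; it does not launch Coherence or the JVM Diagnostics Agent.

```java
import java.util.List;

// Hypothetical pre-flight check: -Doracle.coherence.jamjvmid must equal
// oracle.ad4j.groupidprop, and jamconshost / jamconsport must be present.
final class JvmdArgsCheck {
    private JvmdArgsCheck() {}

    // Returns the text after the first argument that starts with prefix, or null.
    static String valueOf(List<String> args, String prefix) {
        for (String arg : args) {
            if (arg.startsWith(prefix)) {
                return arg.substring(prefix.length());
            }
        }
        return null;
    }

    static boolean isConsistent(List<String> args) {
        String jvmId = valueOf(args, "-Doracle.coherence.jamjvmid=");
        String groupId = valueOf(args, "oracle.ad4j.groupidprop=");
        return jvmId != null
                && jvmId.equals(groupId)
                && valueOf(args, "jamconshost=") != null
                && valueOf(args, "jamconsport=") != null;
    }
}
```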

13.9.2 Deploying on CoherenceWeb

To deploy the JVM Diagnostics Manager and the JVM Diagnostics Agent, follow these steps:

Follow the steps listed in the Oracle Enterprise Manager Cloud Control Basic Installation Guide to deploy the JVM Diagnostics Manager and the JVM Diagnostics Agent.
From the Setup menu, select Middleware Diagnostics, then click Setup JVM Diagnostics on the Middleware Diagnostics page. Click the JVMs and Pools tab, click Download, and download the JVM Diagnostics Agent WAR or Database Agent WAR (jamagent.war).

Update the web.xml file with the proper JVM Diagnostics Agent properties as follows:

<?xml version="1.0" ?>
<!DOCTYPE web-app PUBLIC "-//Sun Microsystems, Inc.//DTD Web Application 2.3//EN"
  "http://java.sun.com/dtd/web-app_2_3.dtd">
<web-app>
<servlet>
  <servlet-name>jamagent</servlet-name>
  <servlet-class>jamagent.jaminit</servlet-class>
  <init-param>
    <param-name>jamconshost</param-name>
    <param-value>localhost</param-value>
    <description>Default Jam Console host</description>
  </init-param>
  <init-param>
    <param-name>jamconsport</param-name>
    <param-value>3800</param-value>
    <description>Jam console port</description>
  </init-param>
  .....
</servlet>
</web-app>

Deploy the jamagent.war on the WebLogic Server.
1. In the WebLogic Server start scripts (startManagedWebLogic.sh or startWebLogic.sh), add the following JAVA_OPTIONS and restart the server.
  
  JAVA_OPTIONS="$JAVA_OPTIONS -Doracle.coherence.wlsserver=[DomainName]/[WLSServerName]"
2. To view the association between the Coherence node and the WebLogic Server on which it has been deployed, modify the -Doracle.coherence.wlsserver option as follows and restart the server:
  
  JAVA_OPTIONS="$JAVA_OPTIONS -Doracle.coherence.wlsserver=[DomainName]/[WLSServerName]/[WLSListenPort]/[WLSSSLListenPort]"
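The two forms of the -Doracle.coherence.wlsserver value can be parsed as in this hypothetical Java sketch. It assumes the port fields are numeric and that all fields are separated by slashes exactly as in the examples above; it is not how Enterprise Manager reads the property.

```java
// Hypothetical parser for Domain/Server or Domain/Server/ListenPort/SSLListenPort.
final class WlsServerProperty {
    final String domain;
    final String server;
    final Integer listenPort;    // null for the short form
    final Integer sslListenPort; // null for the short form

    private WlsServerProperty(String domain, String server, Integer listenPort, Integer sslListenPort) {
        this.domain = domain;
        this.server = server;
        this.listenPort = listenPort;
        this.sslListenPort = sslListenPort;
    }

    static WlsServerProperty parse(String value) {
        String[] parts = value.split("/");
        if (parts.length == 2) {
            return new WlsServerProperty(parts[0], parts[1], null, null);
        }
        if (parts.length == 4) {
            return new WlsServerProperty(parts[0], parts[1], Integer.valueOf(parts[2]), Integer.valueOf(parts[3]));
        }
        throw new IllegalArgumentException("Expected Domain/Server or Domain/Server/ListenPort/SSLListenPort: " + value);
    }

    // Reads the property from the running JVM; returns null if it was not set.
    static WlsServerProperty fromSystemProperty() {
        String value = System.getProperty("oracle.coherence.wlsserver");
        return value == null ? null : parse(value);
    }
}
```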

13.10 Viewing Performance Summary

You can use the Performance Summary page to monitor the performance of the cluster, a node, or a cache. From the Oracle Coherence Cluster (Node or Cache) menu, select Monitoring, then select Performance Summary.

A set of default performance charts is displayed, showing the values of specific performance metrics over time. You can customize these charts to help you isolate potential performance issues. You can also view a series of regions specific to the cluster, node, or cache target. For more details, see the Performance Summary Online Help.

13.11 Viewing Configuration Topology

The Configuration Topology Viewer provides a visual layout of the relationships between the Coherence target and other targets. From the Oracle Coherence Cluster (Node or Cache) menu, select Configuration, then select Topology. A topology graph for the Coherence target is displayed. You can use it to determine the source of a health problem and its impact on other targets. You can also view the members of the cluster and their relationships. For more details, see the Configuration Topology Viewer Online Help.

13.12 Troubleshooting Coherence

If you cannot collect metric data for any of the Coherence targets, check the following to ensure that the steps involved in discovering the target have been followed correctly.

Make sure that the management node has been successfully started and the host on which the management node is running is accessible from the Agent host.
Specify the appropriate User Name and Password if password authentication is enabled.
If you are not using SSL to start the management node, make sure that you have started the JVM with the -Dcom.sun.management.jmxremote.ssl=false option.
If you did not use the bulk operation MBean JAR to start the management node, you must leave the Bulk Operations MBean field blank during discovery.
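When metric data is missing, it can help to test the JMX connection by hand from the Agent host. This Java sketch uses the standard `javax.management.remote` API and assumes that the management node exposes the default RMI connector URL `service:jmx:rmi:///jndi/rmi://<host>:<port>/jmxrmi` and registers the cluster MBean as `Coherence:type=Cluster`; the host, port, and class names are placeholders.

```java
import java.net.MalformedURLException;
import java.util.HashMap;
import java.util.Map;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical manual connectivity check against the Coherence management node.
final class ManagementNodeCheck {
    private ManagementNodeCheck() {}

    static JMXServiceURL serviceUrl(String host, int port) throws MalformedURLException {
        return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    // Adds credentials only when password authentication is enabled (user is non-null).
    static Map<String, Object> environment(String user, String password) {
        Map<String, Object> env = new HashMap<>();
        if (user != null) {
            env.put(JMXConnector.CREDENTIALS, new String[] {user, password});
        }
        return env;
    }

    // Returns true if the connection succeeds and the cluster MBean is registered.
    static boolean check(String host, int port, String user, String password) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port), environment(user, password))) {
            return connector.getMBeanServerConnection().isRegistered(new ObjectName("Coherence:type=Cluster"));
        }
    }
}
```

For example, `ManagementNodeCheck.check("mgmt-host", 9991, null, null)` attempts an unauthenticated connection; pass a user name and password when password authentication is enabled. A connection exception points to the management node or network, while `false` suggests the node is reachable but is not exposing the cluster MBean.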