2 Isolate and Diagnose Application Performance Issues

Using Oracle Application Performance Monitoring, you can monitor the performance of your application by following transactions across servers to identify the exact tier causing an application issue, see whether the issue is specific to a geography, and see application logs automatically in the context of the application performance. Synthetic Monitoring helps in simulating a path in the application that a user would normally take, and ensures that the user can transition smoothly through the different web pages in the path. This helps in recognizing application performance issues before the end user experiences them.

Typical Workflow for Isolating Application Performance Issues

This section uses an example scenario to illustrate how you can isolate application performance issues. In this example scenario, as a DevOps administrator, you’re responsible for administering and supporting one of your enterprise applications used by your customers interested in carpool and vanpool services. Your line of business executives see a sudden drop in sales on your company website, and they ask you to investigate the reasons for the drop. The ordering application is critical to your business, and it’s used by your customers daily to place service orders on your website.

Enterprise application deployments are complex. They involve various software tiers comprising applications, databases, web servers, and so on. You need simple and effective ways to isolate application issues and troubleshoot problems quickly.

You start troubleshooting this specific problem by:

  • Viewing alerts to see if the average response time for any page has exceeded the threshold

  • Checking if the errors are specific to any geography and drilling down to isolate the exact problem location

  • Isolating the problem down to the application servers and databases

  • Drilling down to logs to determine the exact root cause of the problem causing the drop in sales on your website

Here are the common tasks to isolate application performance issues.

Task | Description | More Information
View alerts | From the list of alerts, pick an alert and view details. | View Alerts
Troubleshoot a slow page | View details about the page to identify a possible problem in the page. | Troubleshoot a Slow Page
Identify a slow request | Identify which request is slowing down the performance of the application. | Drill Down to Server Request Details
Identify issues in associated tiers | Inspect associated tiers and identify issues. | Find Issues in Associated Tiers
Identify issues based on location | Identify if issues are being seen in a specific country. | Find Issues in Pages Using Geomaps
View related logs | Drill down to related logs to identify issues. | Drill Down to Related Logs
View diagrams to spot issues | Study diagrams to easily spot issues. | Isolate Issues through Diagrams

View Alerts

Oracle Application Performance Monitoring notifies you of application performance issues. As a DevOps administrator, you start troubleshooting by viewing such alerts, which provide a starting point to isolate the problem.

You can view alerts from the Alerts page and also from the entity page on which the alert was created. Alerts are created for fixed thresholds or anomalies for metrics on Pages, AJAX calls, and Server Requests.

View alerts from the Alerts page

  1. In the Oracle Application Performance Monitoring home page, in the left navigation pane, click Alerts.

    OR

    In the Alerts tile on the Home Page, click the number of Alerts. You can also click the number of Critical Alerts, or Warnings to view only the specific alerts.

    The Alerts page displays all the alerts that need your attention.

  2. Select APM in the Service dropdown to view Alerts from Oracle Application Performance Monitoring. You can further filter based on severity. You can view the following details of an alert.

    Detail | Description
    Message | The last alert message seen on this object. Click the message to view details and history of the alert.
    Entity and Entity Type | The object for which the alert exists. Click the entity to open the object. Entity type is the type of object, such as a Page or an AJAX call.
    Duration | Duration for which this alert has been open.

  3. The icon in the first column indicates the status of the alert, which can be Fatal (the most severe), Critical, Warning, or Clear (closed). Click the arrow next to the icon to view more details of the alert, such as the Created Date, the Updated Date, and a brief history. In case of a closed alert, the Closed Date is displayed.

  4. Click the alert Message to view the details of the alert more closely.

  5. Click the Entity to view details of the entity on which the alert was created.

View alerts from the entity page

If an alert is created on an entity based on alert rules, the alert will be displayed in the entity page for that time period. You can view the alerts in the Alerts pane on the entity page.

  • The Alerts pane displays the number of new alerts, open alerts that were carried over from previous time periods, and alerts that are still open at the end of the selected time period.

  • The Alerts tab lists all the alerts during the selected time period and their current status. Click the arrow next to the status of the alert to view details.

  • If the alert is an early warning, a chart indicates when the trigger event occurred. A prediction of how soon an error might occur is indicated.
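
The three counts shown in the Alerts pane can be read as interval arithmetic over each alert's open and close times. The sketch below is illustrative only: the `Alert` record, its field names, and the function name are hypothetical and do not reflect how Oracle Application Performance Monitoring stores or counts alerts.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Alert:
    """Hypothetical alert record; field names are assumptions for illustration."""
    opened: datetime
    closed: Optional[datetime] = None  # None means the alert is still open


def alert_pane_counts(alerts, start, end):
    """Return (new, carried_over, still_open) for the period [start, end]."""
    new = carried_over = still_open = 0
    for alert in alerts:
        # Skip alerts that closed before the period or opened after it.
        if (alert.closed is not None and alert.closed < start) or alert.opened > end:
            continue
        if alert.opened >= start:
            new += 1
        else:
            carried_over += 1
        if alert.closed is None or alert.closed > end:
            still_open += 1
    return new, carried_over, still_open
```

An alert opened before the selected period and closed during it counts as carried over but not as still open.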

Troubleshoot a Slow Page

Oracle Application Performance Monitoring helps identify a page that is loading slowly and points to possible reasons for the decrease in speed.

With Oracle Application Performance Monitoring, you can get an insight into how your application is performing at the user end. The performance of your application’s web pages is monitored, and if your users are experiencing issues with any specific page, Oracle Application Performance Monitoring alerts you. You can also diagnose the reason for a slow page, and identify whether the actual issue is at the server level or with the browser.

To identify the reason for a slow page:

  1. You can start from the Pages tile in the home page or Pages list.
    The Pages list gives some quick information about the listed pages. Below the page icon, you can see which browser was used. The Apdex value indicates the overall performance of the application. The Page User Satisfaction graph indicates the user experience with the page. The Page Load Time and the Views and Errors provide further data points on page performance.
  2. If a particular page, say cart.jsp, seems to be taking an unusually long time to load, as indicated by Average Load Time or Max Load Time, click the page name to view page details.
    The Alerts pane shows the number of alerts for the page for the selected time period. The Apdex pane shows the Application Performance Index for the selected time period, along with a chart for user satisfaction. The Average Page Load Breakdown chart indicates which phase of the page load is taking a long time. In this example, you can see that loading the First Byte is taking the most time, indicating a server side issue. This needs to be inspected further.

    Page load time

    If you see a break in the Page Load Time graph, it could be because the Page was not used during that time period, and hence no data was recorded. This is applicable to all entities.

  3. You can diagnose performance of AJAX calls seen on the page. Click the Ajax Calls tab to view the list of AJAX calls on the page, and view details for any AJAX call with slow response time.
    AJAX requests are automatically correlated with their page details and server requests for rapid identification of problems.
  4. Click the Server Requests tab to see which server requests were called to serve the page. If a server request appears to be slow, you can click the server request to view details and locate an issue, if any.
  5. Click the Instances tab to view instances of pages and to navigate to sessions related to those page instances. You can diagnose a performance issue or an error in the context of a user session from here.
  6. Click the Alerts tab to view all the alerts which were active for the page during the selected time period. Click an alert to view history of the alert.
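
For readers unfamiliar with Apdex, the following sketch computes the score using the standard Apdex definition. The threshold T of 2 seconds and the sample load times are assumptions for illustration; the source does not state which threshold Oracle Application Performance Monitoring applies.

```python
def apdex(load_times, t=2.0):
    """Standard Apdex: (satisfied + tolerating / 2) / total samples.

    satisfied:  load time <= T
    tolerating: T < load time <= 4T
    frustrated: load time > 4T (contributes nothing to the score)
    """
    if not load_times:
        raise ValueError("at least one sample is required")
    satisfied = sum(1 for x in load_times if x <= t)
    tolerating = sum(1 for x in load_times if t < x <= 4 * t)
    return (satisfied + tolerating / 2) / len(load_times)


# Hypothetical page load times in seconds:
# 3 satisfied, 1 tolerating, 1 frustrated.
samples = [0.8, 1.5, 1.9, 3.0, 9.5]
print(f"Apdex: {apdex(samples):.2f}")
```

A score close to 1 means most users were satisfied; a single very slow load lowers the score without dominating it the way it would dominate a maximum.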

Drill Down to Server Request Details

Oracle Application Performance Monitoring automatically discovers, classifies, and measures all your server requests. You get the information you need to understand what tier, request, and operation the application issue resides in. Let’s start monitoring server requests to investigate which requests have issues.

To identify the reason for a slow request:
  1. Monitor the response time for top five requests in the home page, or across all the requests in the Server Request list.
  2. If you find that the response time of a particular request is more than what’s acceptable, then view the details of the request.

    In this example, the request checkout seems to have a high max response time of over 5 seconds, and over 30% errors. This means that, on average, users are not seeing this high response time, but at least some users are experiencing response times of over 5 seconds. Let us inspect this further.

    Request with errors

  3. Click to view details of the server request.
  4. In the Diagram tab, see a pictorial representation of the calls made by the server request. You can see how many calls the request has received, how many internal and external calls it has made and if there are any errors in any of these calls. Hover over the objects in the diagram to view details in the Tooltip pane.
  5. Click Metrics to view further details of the server request.

    You can see in this example that the max response time of the request peaked at 5.41 seconds at 5:31 AM. This is higher than the expected value of 2 seconds.

    Details of the server request

  6. Drill down to the details of the application server to check for any issues resulting in a slow request.

    Server Request

  7. Inspect further to see if an associated tier is causing a slow request. Drill down to the Database tab and see if the issue is originating in the Database tier.
  8. Click the Alerts tab to view all the alerts which were active for the server request during the selected time period. Click an alert to view history of the alert.
The details displayed in the Server Request page, the contextual details displayed in the Application Server page, and the logs for the application server will help you identify the reason for a slow request.
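
The gap between average and maximum response time in the checkout example can be illustrated with a few numbers. The response times below are invented for illustration and are not taken from the product.

```python
from statistics import mean

# Hypothetical response times in seconds for ten calls to the checkout request.
response_times = [0.4, 0.5, 0.5, 0.6, 0.4, 0.5, 0.6, 0.5, 0.4, 5.41]

average = mean(response_times)
maximum = max(response_times)
expected = 2.0  # expected response time in seconds, as in the example

# The average stays under the expected value while one call far exceeds it.
print(f"average={average:.2f}s max={maximum:.2f}s")
slow_calls = [t for t in response_times if t > expected]
print(f"{len(slow_calls)} of {len(response_times)} calls exceeded {expected}s")
```

This is why the walkthrough looks at the max response time and at individual instances rather than at the average alone.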

Find Issues in Associated Tiers

Oracle Application Performance Monitoring helps you recognize bottlenecks in tiers associated with server requests.

To isolate the application performance problem down to the application infrastructure, such as database or application servers, navigate from the server request to the Database tab or to the Application Server page. To recognize issues in associated tiers:
  1. Identify a slow request and view the request summary.
  2. Place the cursor at any time point on the Tier Average Response graph to view the response time of each tier at that point. The Tier Average Response cell displays the various response times of each tier — the App Server, the Database, or the External Tier.

    In this example, the bulk of the average response time is spent on processing in the application server tier. The Tier Average Response graph also shows how the tiers trend over time - in this example you can see that in the selected time range, the application server tier is consistently contributing more to the average response time than the other tiers.

    Average tier response cell

    In this example, the request’s response time peaked at 5.41 seconds at 5:31 AM. This is higher than the expected value of 2 seconds.

    Details of the server request

  3. In the Diagram tab, see if all operations in the application server are slow, or if one particular operation is slow.
  4. The Application Server tile shows a view of the performance of the application server. Drill down to the application server page to check if the issue is in the application server tier.
  5. Drill down to check if the issue is in the database tier. To diagnose issues with the database tier, go to the Database tab to analyze the specific SQL statements and view the database logs.
  6. If neither tier seems to have an issue, check the external tier for possible issues.
  7. Click the Instances tab to view details of a specific instance. You can drill down into the logs of an instance to get more details on the faults.

    Here’s an example of the faults in an instance.

    Faults in the instance

    You can also see these faults in the Diagram tab, and on the Links tab.
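
Reading the Tier Average Response graph amounts to comparing each tier's share of the total average response time. A minimal sketch, using made-up timings for the three tiers named in the steps above (the function name is hypothetical):

```python
def tier_shares(tier_times):
    """Return each tier's fraction of the total average response time."""
    total = sum(tier_times.values())
    if total == 0:
        return {tier: 0.0 for tier in tier_times}
    return {tier: value / total for tier, value in tier_times.items()}


# Hypothetical average response time per tier, in seconds.
sample = {"App Server": 1.8, "Database": 0.4, "External": 0.2}
shares = tier_shares(sample)
slowest = max(shares, key=shares.get)
print(f"{slowest} contributes {shares[slowest]:.0%} of the response time")
```

The tier with the largest share is the one to drill into first.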

Find Issues in Pages Using Geomaps

Oracle Application Performance Monitoring helps you isolate issues in pages based on geography.

You can inspect if any issues in pages are cropping up in a certain location through the Pages By Geography pane in the home page. To isolate issues related to pages in a specific geographic location:
  1. In the Oracle Application Performance Monitoring home page, select a measurement by which you want to see pages, in the Pages By Geography pane.
    The color coded map indicates the locations where page usage has been observed and measured.

    Pages by geography

  2. Click a color coded continent and click the Zoom In icon to drill down to the countries where the selected measurement is further applied.
  3. Select a country within the continent and click the Zoom In icon to view details specific to the country.
The countries are color-coded based on the values of the metric selected in the drop-down, with the lightest shade indicating the lowest value of the selected metric. Hover over a country to view all measurements Oracle Application Performance Monitoring has observed for page views within that country.
List of countries in the Geomap

You can drill-down and view regions within the following countries in the geomap:

  • 'BEL': Belgium
  • 'CHN': China
  • 'FRA': France
  • 'DEU': Germany
  • 'GBR': Great Britain / United Kingdom
  • 'IND': India
  • 'ITA': Italy
  • 'JPN': Japan
  • 'ESP': Spain
  • 'THA': Thailand
  • 'NLD': The Netherlands / Holland
  • 'USA': United States of America
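
The codes in this list are ISO 3166-1 alpha-3 country codes. If you process page data outside the product, a lookup such as the following can tell you whether region-level detail is available for a country. The dictionary and function names are hypothetical and are not part of any Oracle API.

```python
# Countries from the list above, keyed by ISO 3166-1 alpha-3 code.
REGION_DRILLDOWN_COUNTRIES = {
    "BEL": "Belgium",
    "CHN": "China",
    "FRA": "France",
    "DEU": "Germany",
    "GBR": "United Kingdom",
    "IND": "India",
    "ITA": "Italy",
    "JPN": "Japan",
    "ESP": "Spain",
    "THA": "Thailand",
    "NLD": "Netherlands",
    "USA": "United States of America",
}


def supports_region_drilldown(code):
    """Return True if the geomap lists region-level detail for this country code."""
    return code.strip().upper() in REGION_DRILLDOWN_COUNTRIES


print(supports_region_drilldown("deu"))
```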

Drill Down to Related Logs

To isolate the application performance problem further, you can view and inspect log events of a request instance or an application server that might be causing the problem.

You can view logs from:
  • Application Server details page to see logs for that one application server.

  • Server request details page to see logs for the application server and databases relevant to the server request.

  • Server request Database tab to see logs for the databases relevant to the server request.

  • Server request Instances to see logs for the application server and database(s) specific to that server request instance.

To view log events of a server request or an application server:
  1. Drill down to logs related to a server request:
    1. In the Server Request Details page, go to the Instances tab and select an instance.
    2. In the Server Request Instance page, click View Related Logs above the summary pane.
  2. Drill down to logs related to an application server:
    1. Select the application server and view details.
    2. Click View Log above the Application Server summary pane.

Here’s an example of how to drill down to related logs to isolate an issue.

  1. Let us start from the cart.jsp page, for which an alert was displayed, primarily because the response time for the page was very high.

    Page with high response time

  2. Drill down to view details of the checkout Ajax call.

    Ajax call with errors

    Notice that the Ajax call has encountered a high number of errors.

  3. The call processing and the response times indicate a very slow call along the timeline. The call has encountered some errors.

    Ajax call with high response time

  4. The corresponding server request checkout indicates errors. Let us drill down further to view details of the server request checkout. The errors seem to be very high, close to 40%.

    The server request has close to 40% errors

  5. The Diagram tab displays all calls made by the server request. Hover over an object or an arrow to view details of the object or the call.

    Diagram tab

  6. Let us drill down to Instances to see what operation is failing. In the Instances tab, pick an instance which has a fault. Click View Related Logs to inspect this further by viewing logs.

  7. The log points to the time when the fault occurred, and indicates issues at the application server and the database levels.

    Logs for the

Isolate Issues through Diagrams

The Diagram tab in the Server Request details page gives a quick diagrammatic view of all the objects associated with the server request.

Here is the diagram of a server request with all the connected objects like the SQL calls, server requests and AJAX calls. The question mark indicates an unknown caller.

Server Request Diagram

Using the Diagram

The diagram represents the server request in the center, with all the calls made to and from the server request represented by a node. Hover over any connector between two nodes to cut out other traffic, and view details about the specific call. Hover over any node to view only the connections to and from the selected node. This helps in isolating the specific call you are looking for, to enable quicker identification of issues. Here is an example of how hovering over a connector and a node cuts out other information from the diagram.

Connector and Nodes in a diagram

Using the Calls table

The Calls table that appears below the diagram lists all the calls made to and from the server request, showing only information pertaining to the object currently selected in the diagram. You can drill down further from this table to view the details of the server request, or of a related AJAX call to isolate a problem.

Using the Context menu

You can right-click on any node in the diagram to see a context menu through which you can easily move forward with troubleshooting and isolating the cause for an issue. The options available in the context menu depend on the type of object the selected node represents.

For example, right-click a SQL call and select Isolate this Operation’s Calls. This will remove all other nodes from your diagram. From among the remaining nodes, click on a server request node to display the operation’s inward and outward paths.
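
Conceptually, hovering over a node or isolating an operation filters the diagram's call graph down to the connections that touch one node. The sketch below models this with a plain list of (caller, callee) pairs. The node names are invented, and this is not how the product implements the diagram.

```python
def isolate_node(calls, node):
    """Keep only the calls made to or from the given node."""
    return [(src, dst) for src, dst in calls if node in (src, dst)]


# Hypothetical calls recorded around the checkout server request.
calls = [
    ("cart.jsp", "checkout"),
    ("checkout", "SQL: select orders"),
    ("checkout", "payment service"),
    ("login", "SQL: select users"),
]

# Only the single call that reaches this SQL node remains.
print(isolate_node(calls, "SQL: select orders"))
```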

Typical Workflow for Using Synthetic Monitoring

You can use Synthetic Monitoring to script or record user paths, and use these to simulate user transactions on the application. These paths can be continuously monitored through Application Performance Monitoring, and potential issues can be caught early, before the end user experiences them.

Note:

You can define and use Synthetic Monitoring only if you have installed the Cloud Agent on Linux.

Here’s a typical workflow for setting up and using Synthetic Monitoring:

Task | Description | More Information
Deploy Cloud Agents | This is a requirement before you can define locations. This is applicable only for private locations. | See Install Cloud Agents in Installing and Managing Oracle Management Cloud Agents. To ensure that you can define and use Synthetic Monitoring, the Cloud Agent should be installed on Linux.
Check for pre-requisites | Review the list of pre-requisites. This is applicable only for private locations. | Pre-requisites for Locations
Define Locations | Define locations. This is done by an APM Administrator. This is applicable only for private locations. | Define Locations
Define Synthetic Tests | You can define synthetic tests for an HTTP Ping, Page Load or a Scripted Action. | Define Synthetic Tests
Review Synthetic Test reports | Use the Synthetic Test reports to monitor the performance of your applications. | Monitor Application Performance through Synthetic Tests
View Sessions | For synthetic tests of type Scripted Actions, you can view details of the session when the test was run. This option is available only if you are running synthetic tests on an application that is also being monitored by Oracle Application Performance Monitoring.
View HAR Reports | View HAR reports for HTTP Ping or Scripted Action. This is available for public locations.
  • View HAR reports from the Instances tab.

  • Download HAR files.

See Monitor End User Experience through Sessions.

Define Synthetic Tests

You can schedule synthetic tests for various locations and ensure that the performance of the application is monitored at all times.

To define a synthetic test:
  1. In the left navigation pane, click Administration and select Synthetic Tests.
  2. In the Create Synthetic Monitoring Test window, choose the Type of test to create.
    • HTTP Ping — Testing the connectivity to and performance of your application

    • Page Load — Testing the performance of a single URL, being loaded by a browser

    • Scripted Actions — Testing the performance of a complete workflow recorded using Selenium scripting.

    • Rest Web Service — Testing the performance of a complete workflow that uses REST web service.

  3. If you are creating a synthetic test of the type HTTP Ping, provide these additional details:
    • Name: Provide a name for the synthetic test you are creating.

    • URL: Select HTTP or HTTPS and specify the URL you want to test.

    • Location: Choose the location/s from where you want to run the test.

    • Application: Optionally, select an application within which the result of this test will be displayed. Associating the test to an application ensures that the test results and alerts will be visible with the application reporting context in the Oracle Management Cloud UI.

    • Frequency: Specify at what interval you want the test to be executed.

    • Verify certificate: Check this option if you want to verify the validity of the SSL certificate during the tests.

    • Redirect: Check this option if you want the test to fail in case there is a redirection.

  4. If you are creating a synthetic test of the type Page Load, provide these additional details:
    • Name: Provide a name for the synthetic test you are creating.

    • URL: Select HTTP or HTTPS and specify the URL you want to test.

    • Location: Choose the location/s from where you want to run the test. Private locations defined by your administrator, and public locations that are configured are listed here.

    • Application: Optionally, select an application within which the result of this test will be displayed. Associating the test to an application ensures that the test results and alerts will be visible with the application reporting context in the Oracle Management Cloud UI.

    • Frequency: Specify at what interval you want the test to be executed.

  5. If you are creating a synthetic test of the type Scripted Actions, provide these additional details:
    • Name: Provide a name for the synthetic test you are creating.

    • Base URL: Select HTTP or HTTPS and specify the base URL on which to run the test. In the script, the Base URL will replace the URL from where you have recorded the Selenium test.

    • Select File: Click Choose File to browse and select the Selenium script for the synthetic test. The selected file can be of the format .java or .side.

      Note:

      1. If it is a .java file, ensure that the script is exported from Selenium either as a Java JUnit for WebDriver or a Java TestNG file. Ensure you run the complete recording in Selenium before exporting the script.
      2. If you are creating a .side file, note that you can create Test Suites using Selenium IDE. Ensure that your .side file contains only one test, and that you have run the complete recording in Selenium before exporting the script. To see the list of supported Selenium commands, see Supported Selenium Commands in Synthetic Tests.
    • Optionally, click Preview File to view the contents of the uploaded script. Note that any edits to the script should be done through Selenium IDE or other preferred tools.

    • Location: Choose the location/s from where you want to run the test. Private locations defined by your administrator, and public locations that are configured are listed here.

    • Application: Optionally, select an application within which the result of this test will be displayed. Associating the test to an application ensures that the test results and alerts will be visible with the application reporting context in the Oracle Management Cloud UI.

    • Frequency: Specify at what interval you want the test to be executed.

  6. If you are creating a synthetic test of the type Rest Web Service, provide these additional details:
    • Name: Provide a name for the synthetic test you are creating.

    • URL: Select HTTP or HTTPS and specify the REST URL on which to run the test.

    • Location: Choose the location/s from where you want to run the test. Private locations defined by your administrator, and public locations that are configured are listed here.

    • Application: Optionally, select an application within which the result of this test will be displayed. Associating the test to an application ensures that the test results and alerts will be visible with the application reporting context in the Oracle Management Cloud UI.

    • Set the time Interval and Request Time Out.

    • Select Authentication if required.

    • For Request Configuration, select a Method — either GET or POST. Specify the Query and Header parameters as required.

    • For Response Configuration, specify the Expected Http Status Code. Check the Verify Content option, and specify a regular expression to validate the response output as required.

    • Select the Redirect is Failure option if the test should fail on redirection.

  7. Click Save.
The schedule will be displayed in the list of Synthetic Tests, and will run as per the frequency specified in the schedule. You can edit a saved synthetic test by clicking the name of the test.
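
The Response Configuration options for a Rest Web Service test can be thought of as a small set of pass/fail checks. The following sketch shows one plausible way such checks combine. The function and parameter names are hypothetical, and the source does not say whether the product matches the regular expression against the whole response or any part of it; this sketch assumes a match anywhere in the body.

```python
import re


def response_passes(status, body, expected_status=200,
                    content_pattern=None, redirect_is_failure=False):
    """Apply status, redirect, and content checks to one HTTP response."""
    # Treat any 3xx status as a redirection.
    if redirect_is_failure and 300 <= status < 400:
        return False
    if status != expected_status:
        return False
    # Verify Content: the pattern must match somewhere in the body.
    if content_pattern is not None and re.search(content_pattern, body) is None:
        return False
    return True


# Hypothetical response from an order status endpoint.
print(response_passes(200, '{"status": "ok"}',
                      content_pattern=r'"status"\s*:\s*"ok"'))
```

Checking content as well as the status code catches the case where a service returns the expected status code with an error payload.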

Monitor Application Performance through Synthetic Tests

You can monitor the performance of your application through synthetic tests and identify possible issues before they occur.

Oracle Application Performance Monitoring enables you to define and run synthetic workflows on your application. You can define a test for an HTTP ping, test for a specific page, or record a Selenium-based script of a workflow on your application, and run these monitoring tests on your application anytime without having to wait for the actual workflow to occur.

You can view the results of the scheduled test and monitor the performance of the application, view the usage of resources and isolate possible issues.

To view the reports of a scheduled synthetic test:

  1. In the left navigation pane, select Synthetic Tests. All the scheduled synthetic tests are listed in the Synthetic Tests pane.

    From this pane, you can view high level data about the listed synthetic tests, like the type of test, application, location, frequency, execution time, and availability. Scan through these details to identify the synthetic test you would like to drill down into.

  2. You can sort the listed synthetic tests on a number of criteria. Sort the tests based on Status to view the tests with errors on top. A green check mark over the test icon indicates a successful test, and a red X indicates errors while running the test.

  3. Examine the metrics and select the synthetic test report to drill down into. Click the synthetic test. The Synthetic Test page displays details of the synthetic test.

  4. The Metrics tab displays information on availability, execution time, time breakdown, transfer rate and download size. The details in this tab are for all the tests executed across all locations, and depend on the type of synthetic test.

    • HTTP Ping: This report displays information like availability, execution time, transfer rate, download size and ping time breakdown.

    • Page Load: This report displays information like availability, execution time and total load time breakdown.

    • Scripted Action: This report displays information for multiple pages that are part of the script and includes data points like AJAX calls and total load time breakdown.

  5. The Instances tab displays details for individual tests that were run across all locations.

    You can view the status of the test run at a specific time, for a specific location. If there is an error in the test, the error message is displayed in this pane.

    • For synthetic tests of type Scripted Actions, you can view details of the session when the test was run. In the Instances tab, click View Session. This option is available only if you are running synthetic tests on an application that is also being monitored by Oracle Application Performance Monitoring. The Session page displays a timeline view of the session along with details of multiple pages accessed during a user session. You can further drill down into the details of the individual pages within the timeline. See Monitoring End User Experience through Sessions.

    • For synthetic tests created on public locations, and of the type Scripted Action or Page Load, you can view HAR reports. In the Instances tab, click View Har.

      The Har Statistics page displays details of the HTTP pages. You can view a summary of the data as graphs and detailed tables.

      Note:

      On Firefox, if you are trying to view HAR files, the browser might display an error ‘Unresponsive Script’. Click Continue and wait for the script to complete. This usually happens when the HAR files are large (above 600 KB).
    • To download the content, click Download Har or Download Screenshot. When downloading the content as a screenshot, it will be downloaded to the local host as a zip file.
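
A downloaded HAR file is a JSON document, so you can also summarize it outside the browser. The sketch below relies only on fields defined by the HAR 1.2 format (`log.entries`, `time`, `request.url`, `response.bodySize`); the function names and the sample data are made up for illustration.

```python
import json


def load_har(path):
    """Read a downloaded HAR file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_har(har, top=3):
    """Summarize request count, body sizes, and the slowest requests."""
    entries = har["log"]["entries"]
    # bodySize is -1 when the size is unknown, so negative values count as 0.
    total_bytes = sum(max(e["response"].get("bodySize", -1), 0) for e in entries)
    slowest = sorted(entries, key=lambda e: e["time"], reverse=True)[:top]
    return {
        "requests": len(entries),
        "total_body_bytes": total_bytes,
        "slowest": [(e["request"]["url"], e["time"]) for e in slowest],
    }


# Hand-written stand-in for a downloaded file; real files hold many more fields.
sample_har = {"log": {"entries": [
    {"time": 120.0, "request": {"url": "https://example.com/"},
     "response": {"bodySize": 5120}},
    {"time": 840.5, "request": {"url": "https://example.com/cart.jsp"},
     "response": {"bodySize": 20480}},
    {"time": 35.2, "request": {"url": "https://example.com/logo.png"},
     "response": {"bodySize": -1}},
]}}
print(summarize_har(sample_har, top=1))
```

For a real file, pass the result of `load_har` to `summarize_har`. Entry times are in milliseconds.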

Create Alert Rules Based on Synthetic Tests

Create alert rules that raise an alert when selected Synthetic Tests meet defined conditions, and send a notification when the alert is raised, worsens in severity, or is cleared.

To create a synthetic test alert rule:
  1. From the left-hand navigation, click APM, then select Alert Rules.
  2. Click Create Alert Rule. Enter a name and click Add Entities.
  3. In the Select Entities menu, select Individually and click a Synthetic Test.
  4. Click Add Condition and select a Test Failed metric with a warning or critical threshold greater than 0. Click Add.

    The "number of consecutive minutes that metric should be outside threshold before generating alert" dialog should be less than the Collection Frequency selected for the Synthetic Test.

    Figure 2-1 Test Failed Synthetic Alert Rule

    This image shows the configuration selected for the alert rule.
  5. Create a new condition with the same parameters for the Test Error metric.
  6. Add the required notification channels.
  7. Click Save.

Metric and Frequency Examples:

  • Number of consecutive minutes >= Test frequency: This will generate an alert on the first test failure.

  • Number of consecutive minutes >= 2*Test frequency: This will generate an alert on two consecutive failures.

  • Test Failed > 0: This will raise a warning alert when there is a test failure.

  • Test Failed >= 1: This will evaluate true when there is a test failure.

  • Test Failed > 1: This will not evaluate true when there is a test failure, because the metric value on failure is 1.

  • Test Failed > 0.5: This will evaluate true when there is a test failure.
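
The threshold examples follow from the Test Failed metric having the value 1 on a failed run and, by assumption here, 0 on a successful run. A minimal sketch of the comparisons (the function name is hypothetical):

```python
import operator

# Comparison operators used in the examples above.
OPS = {">": operator.gt, ">=": operator.ge}


def condition_met(metric_value, op, threshold):
    """Evaluate one alert condition against a metric value."""
    return OPS[op](metric_value, threshold)


FAILED = 1  # metric value on a failed test run
for op, threshold in [(">", 0), (">=", 1), (">", 1), (">", 0.5)]:
    result = condition_met(FAILED, op, threshold)
    print(f"Test Failed {op} {threshold}: {result}")
```

Only the `> 1` condition evaluates false on a failure, which is why it never raises an alert.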

To learn more about Alert Rules, see Create Alert Rules.

Troubleshoot Synthetic Tests

If you run into problems while using Synthetic Tests, here are some tips to debug.

Debug Cloud Agent Location

You can run Synthetic Tests on a private or a public location. For a test to run successfully on a cloud agent, a basic set of prerequisites, called Location Compatibility, must be in place. To check for Location Compatibility:

  1. On the Oracle Management Cloud home page, click APM. In the left navigation pane, select APM Admin, and then, Locations.

  2. For the required location, click Compatibility Check.

    A green tick mark indicates that all the prerequisites are met; otherwise, a warning or error is displayed.

  3. Click the status indication icon to view the following details:

    1. Agent Version — Indicates the version of the cloud agent. Ensure that the Cloud Agent version is 1.33 or higher.

    2. Firefox Version — Indicates the version of the browser. Ensure that you have the correct Firefox version for your system to successfully execute the Selenium tests.

      • Oracle Linux 6: Firefox version 45
      • Oracle Linux 7: Firefox version 61-66

      Note:

      Firefox is the only supported browser. Other Firefox versions, including Beta versions, are not supported.

      You can check if Firefox is present on the cloud agent machine by running the command firefox --version.

    3. Proxy Status — Indicates the status of the proxy. Ensure that the proxy specified is correct and reachable. Edit the location to correct the proxy information, if required.

    4. Proxy Error Message — Displays an error message in case of an error in the proxy settings.

    5. X-Server Unavailable Ports — Indicates the X-Server ports that are not available. Create X-Server on the ports that are missing.

    6. X-Server Unavailable Ports Message — Displays an error message if there are unavailable X-Server ports.
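
The agent and Firefox version requirements listed above can also be checked by hand on the cloud agent machine. The following is a minimal sketch under stated assumptions: it assumes `firefox --version` prints text such as `Mozilla Firefox 61.0.1`, and that the agent version is written as `major.minor`. The function names and the lookup table are hypothetical helpers, not part of the Cloud Agent.

```python
import re

# Supported Firefox major versions per operating system, from the list above.
SUPPORTED_FIREFOX = {
    "Oracle Linux 6": range(45, 46),  # version 45
    "Oracle Linux 7": range(61, 67),  # versions 61 through 66
}


def parse_major_version(version_output):
    """Extract the major version from text such as 'Mozilla Firefox 61.0.1'."""
    match = re.search(r"(\d+)(?:\.\d+)*", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def is_firefox_supported(os_name, version_output):
    """Check the Firefox major version against the table for this OS."""
    major = parse_major_version(version_output)
    return major in SUPPORTED_FIREFOX.get(os_name, ())


def is_agent_version_ok(version, minimum=(1, 33)):
    """Check that a 'major.minor' agent version is at least 1.33."""
    parts = tuple(int(p) for p in version.split(".")[:2])
    return parts >= minimum


print(is_firefox_supported("Oracle Linux 7", "Mozilla Firefox 61.0.1"))
print(is_firefox_supported("Oracle Linux 6", "Mozilla Firefox 61.0.1"))
print(is_agent_version_ok("1.33"))
```

The sketch compares major versions only and does not detect Beta builds, so the Compatibility Check in the console remains the authoritative result.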

Debug Cloud Agent Crash Due to Memory Issues

When running Synthetic Tests that generate large HAR files, the Cloud Agent may run into memory issues and crash. When the Cloud Agent crashes, it generates a log file named hs_err_pid<pid>.log. The log looks similar to the following:

Stack: [0x00007f771c697000,0x00007f771c798000],  sp=0x00007f771c795220,  free space=1016k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libzip.so+0x11d10]  newEntry.isra.4+0x60
C  [libzip.so+0x12b57]  ZIP_GetNextEntry+0x37
J 3024  java.util.zip.ZipFile.getNextEntry(JI)J (0 bytes) @ 0x00007f776995def6 [0x00007f776995de40+0xb6]
J 1477 C1 java.util.zip.ZipFile$ZipEntryIterator.next()Ljava/util/zip/ZipEntry; (212 bytes) @ 0x00007f77694d6b4c [0x00007f77694d68a0+0x2ac]
J 1475 C1 java.util.zip.ZipFile$ZipEntryIterator.nextElement()Ljava/lang/Object; (5 bytes) @ 0x00007f77694d5f84 [0x00007f77694d5ec0+0xc4]
j  oracle.sysman.emd.fetchlets.gfmsynmon.common.CommonUtil.addToZipfile(Ljava/io/File;Ljava/lang/String;Ljava/lang/String;Ljava/lang/String;)Ljava/io/File;+106
j  oracle.sysman.emd.fetchlets.gfmsynmon.selenium.HarFetchletUtils.prepareCombinedZip(Ljava/io/File;)Ljava/io/File;+349
j  oracle.sysman.emd.fetchlets.gfmsynmon.selenium.HarFetchlet.getMetric(Ljava/util/Properties;Ljava/util/ArrayList;Loracle/sysman/emSDK/agent/datacollection/CollectionFactory;Loracle/sysman/emSDK/agent/fetchlet/FetchletContext;Loracle/sysman/emSDK/agent/TargetID;Ljava/util/Map;Loracle/sysman/emSDK/agent/fetchlet/StateFullCallbacks;)Loracle/sysman/emSDK/agent/datacollection/CollectionResult;+428
j  oracle.sysman.gcagent.target.interaction.execution.FetchletFactory.getMetric(Ljava/util/Prop

To solve this issue, follow these steps:

  1. Navigate to your Cloud Agent installation folder, and edit emd.properties with a text editor.
  2. Search for the agentJavaDefines property and add the following flags:
    agentJavaDefines=-Xmx2G -XX:MaxPermSize=128M -Dsun.zip.disableMemoryMapping=true

    Note:

    The -Xmx2G flag sets 2 GB as the maximum size of the memory allocation pool for the Java Virtual Machine (JVM). The -Dsun.zip.disableMemoryMapping=true flag is needed for Cloud Agent versions 1.49 and earlier.
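
If agentJavaDefines already contains other flags, the new flags need to be merged into the existing value instead of overwriting it. The following is a minimal sketch of that merge on a single property line. It assumes the flags are separated by spaces and contain no quoted values; the helper names are hypothetical, and the sketch works on a string only and does not edit emd.properties.

```python
# Flags from step 2 above.
REQUIRED_FLAGS = [
    "-Xmx2G",
    "-XX:MaxPermSize=128M",
    "-Dsun.zip.disableMemoryMapping=true",
]


def flag_key(flag):
    """Return the option name, used to spot an existing setting of that option."""
    if flag.startswith("-Xmx"):
        return "-Xmx"
    return flag.split("=", 1)[0]


def merge_java_defines(line, required=REQUIRED_FLAGS):
    """Return the property line with the required flags applied.

    Existing flags are kept unless they set the same option as a required flag.
    """
    name, _, value = line.partition("=")
    required_keys = {flag_key(f) for f in required}
    kept = [f for f in value.split() if flag_key(f) not in required_keys]
    return f"{name.strip()}=" + " ".join(kept + list(required))


# Hypothetical existing value, for illustration only.
print(merge_java_defines("agentJavaDefines=-Xmx512M -Dexample.flag=true"))
```

Back up emd.properties before editing it, and apply the resulting line with a text editor as described in step 1.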

Debug Test Execution

You can check the status of synthetic tests, and debug tests that are not running properly, by following these steps:

  1. Create a synthetic test with a private or a public location. Wait for a few minutes before checking for its deployment. A test takes about 5 minutes to deploy on a private location, and about 15 minutes on a public location.

  2. On the Oracle Management Cloud home page, click APM. In the left navigation pane, select APM Admin, and then, Synthetic Test Definitions.

  3. For the required location, click Check Deployment. The status of the test is displayed in the Test Status dialog box.

    1. Location Name — Indicates the name of the private or public location.

    2. Deployment Status — Indicates whether the test got deployed on the Agent or the Cloud Container.

    3. Last Run Status — Indicates the time the test was last run. If the test was not executed, check for its location compatibility.

    4. Last Deployment Time — Indicates the last time the test was deployed onto the Agent or the Cloud Container.

      This time is first recorded when the test is created, and is updated each time the test is edited. If the Deployment Status is Failed and the test Run Status is Successful, the last update of the test failed.

Debug Log Location

You can check the logs to diagnose a failed test execution by following these steps:

  1. Change directory to the agent_inst folder:

    $ cd $AGENT_HOME/agent_inst

  2. Find the test name in emd/targets.xml and note its test_meid value:

    <Property NAME="test_meid" VALUE="06E1665FE0A82B8057506B2A45F8FFC6"/>

  3. Check the logs in the test_meid/log folder:

    $ cd $AGENT_HOME/sysman/ApplicationsState/beacon/06E1665FE0A82B8057506B2A45F8FFC6/log/*
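
When an agent runs many tests, looking up each test_meid by hand becomes tedious. The following is a minimal sketch that pulls every test_meid value out of the targets.xml text and builds the matching log folder path. It assumes the property always appears in the form shown in step 2 and that logs live under the path shown in step 3. The function names and the /opt/agent path are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Matches a property line in the form shown in step 2 above.
MEID_PATTERN = re.compile(
    r'<Property\s+NAME="test_meid"\s+VALUE="([0-9A-Fa-f]+)"\s*/>'
)


def find_test_meids(targets_xml_text):
    """Return every test_meid value found in the targets.xml text."""
    return MEID_PATTERN.findall(targets_xml_text)


def log_dir(agent_home, test_meid):
    """Build the log folder path for one test, following step 3 above."""
    return (PurePosixPath(agent_home) / "sysman" / "ApplicationsState"
            / "beacon" / test_meid / "log")


# Sample input taken from step 2; /opt/agent stands in for $AGENT_HOME.
sample = '<Property NAME="test_meid" VALUE="06E1665FE0A82B8057506B2A45F8FFC6"/>'
for meid in find_test_meids(sample):
    print(log_dir("/opt/agent", meid))
```

To run this against a real agent, read emd/targets.xml into a string and pass the value of $AGENT_HOME in place of the placeholder path.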