Clusters Visualization

Clustering uses machine learning to identify the pattern of log records, and then to group the logs that have a similar pattern.

Clustering helps significantly reduce the total number of log entries that you have to explore and easily points out the outliers. Grouped log entries are presented as Sample Message.

You can generate alerts for the cluster utilities like potential issues and outliers by using the link by clusters feature. See Generate Alerts for Cluster Utilities.

  1. Search for logs for a set of entities. See Search Logs by Entities.
  2. From the Visualize panel, select Cluster (open cluster).
You can see that similar log records are grouped in clusters along with a histogram view of all the records grouped by time interval. You can zoom in to a particular set of intervals (records grouped by time intervals in this case) in the histogram by keeping your left mouse button pressed and drawing a rectangle over the required set of intervals. After you zoom in, the cluster records change based on the selected interval.
Clicking the question mark (?) icon opens a window that displays help content on clusters.

The Cluster view displays a summary banner at the top showing the following tabs:

  • Total Clusters: Total number of clusters for the selected log records.

  • Potential Issues: Number of clusters that have potential issues based on log records containing words such as error, fatal, exception, and so on.

  • Outliers: Number of clusters that have occurred only once during a given time period.

  • Trends: Number of unique trends during the time period. Many clusters may have the same trend. So, clicking this panel shows a cluster from each of the trends.

    Note:

    If you hover your cursor over this panel, then you can also see the number of log records (for example, 22 clusters from 1,200 log records).

When you click any of the tabs, the histogram view of the cluster changes to display the records for the selected tab.

Each cluster pattern displays the following:

  • Trend: This column displays a sparkline representation of the trend (called trend shape) of the generation of log messages (of a cluster) based on the time range that you selected when you clustered the records. Each trend shape is identified by a Shape ID such as 1, 2, 3, and so on. This helps you to sort the clustered records on the basis of trend shapes.

    Clicking the arrow to the left of a trend entry displays the time series visualization of the cluster results. This visualization shows how the log records in a cluster were spread out based on the time range that was selected in the query. The trend shape is a sparkline representation of the time series.

  • ID: This column lists the cluster ID. The ID is unique within the collection.

  • Count: This column lists the number of log records having the same message signature.

  • Sample Message: This column displays a sample log record from the message signature.

  • Log Source: This column lists the log sources that generated the messages of the cluster.



You can click Show Similar Trends to sort clusters in an ascending order of trend shapes. You can also select a cluster ID or multiple cluster IDs and click Show Records to display all the records for the selected IDs.

You can also hide a cluster message or multiple clusters from the cluster results if the output seems cluttered. Right-click the required cluster and select Hide Cluster.



In each record, the variable values are highlighted. You can view all the similar variables in each cluster by clicking a variable in the Sample Message section. Clicking the variables shows all the values (in the entire record set) for that particular variable.



In the Sample Message section, some cluster patterns display a <n> more samples... link. Clicking this link displays more clusters that look similar to the selected cluster pattern.

Clicking Back to Trends takes you back to the previous page with context (it scrolls back to where you selected the variable to drill down further). The browser back button also takes you back to the previous page; however, the context won’t be maintained, because the cluster command is executed again in that case.

Cluster the Log Data Using SQL Fields

Here’s an example of clustering the SQL fields:

The large volume of log records are reduced to 89 clusters, thus offering you fewer groups of log data to analyze.

You can drill down the clusters by selecting the variables. For example, from the above set of clusters, select the cluster that has the sample message SELECT version FROM V$INSTANCE:



This displays the histogram visualization of the log records containing the specified sample message. You can now analyze the original log content. Click Back to Cluster to return to the cluster visualization.

The Trends panel shows the SQLs that have similar execution pattern.

The Outliers panel displays the SQLs that are rare and different.

Use Cluster Compare Utility

The cluster compare utility can be used to identify new issues by comparing the current set of clusters to a baseline and reducing the results by eliminating common or duplicate clusters. Some of the typical scenarios are:

Given two sets of log data, the cluster compare utility removes the data pertaining to the common clusters, and displays histogram data and the records table that are unique to each set. For example, when you compare the log data from week x and week y, the clusters that are common to both the weeks are removed for simplification, and the data unique to each week is displayed. This enables you to identify patterns that are unique for the specific week, and analyze the behavior.

For the syntax and other details of the clustercompare command, see Clustercompare Command in Using Oracle Log Analytics Search.

  1. In clusters open cluster visualization, select your current time range. By default, the query is *. You can refine the query to filter the log data.
  2. In the Visualize panel, click Cluster Compare.

    The cluster compare dialog box opens.



  3. You can notice that the current query and the current time range are displayed for reference.
    • Baseline Query: By default, this is the same as your current query. Click the edit icon and modify the baseline query, if required.
    • Baseline Time Range: By default, the cluster compare utility uses the Use Time Shift option to determine the baseline time range. Hence, the baseline time range is of the same duration as the current time range and is shifted to the period before the current time range. You can modify this by clicking the edit icon icon and selecting Use Custom Time or Use Current Time. If you select Use Custom Time, then specify the custom time range using the menu.
    • Click Compare.

      You can now view the cluster comparison between the two log sets.



      Click the button corresponding to each set to view the details like clusters, potential issues, outliers, trends, and records table that are unique to the set. The page also displays the number of clusters that are common between the two log sets.

      In the above example, there are 11 clusters found only in the current range, 4 clusters found only in the baseline range, and 30 clusters common in both the ranges. The histogram for the current time range displays the visualization using only the log data that is unique to the current time range.

Note:

Clusters found only in the current range are returned first, followed by clusters found only in the baseline range. The combined results are limited to 500 clusters. To reduce the cluster compare results, reduce the current time range or append a command to limit the number of results. For example, append | head 250 will limit both current and baseline clusters to 250 each. Use multi-select (click and drag hold) on the cluster histogram to reduce the current time range when using the custom time option. The time range shift value may be converted to minutes or seconds to ensure no time gaps or overlaps occur between the current and baseline time ranges.