Clusters Visualization

Clustering uses machine learning to identify the pattern of log records, and then to group the logs that have a similar pattern.

Clustering helps significantly reduce the total number of log entries that you have to explore and easily points out the outliers. Grouped log entries are presented as Sample Message.

Open the navigation menu and click Observability & Management. Under Log Analytics, click Log Explorer.
Search for logs for a set of entities. See Search Logs by Entities.
From the Visualize panel, select Cluster ().

You can see that similar log records are grouped in clusters along with a histogram view of all the records grouped by time interval. You can zoom in to a particular set of intervals (records grouped by time intervals in this case) in the histogram by keeping your left mouse button pressed and drawing a rectangle over the required set of intervals. After you zoom in, the cluster records change based on the selected interval.

The Cluster view displays a summary banner at the top showing the following tabs:

Total Clusters: Total number of clusters for the selected log records.

Note

If you hover your cursor over this panel, then you can also see the number of log records (for example, 260 clusters from 1,413,036 log records).
Potential Issues: Number of clusters that have potential issues based on log records containing words such as error, fatal, exception, and so on.
Outliers: Number of clusters that have occurred only once during a given time period.
Trends: Number of unique trends during the time period. Many clusters may have the same trend. So, clicking this panel shows a cluster from each of the trends.

When you click any of the tabs, the histogram view of the cluster changes to display the records for the selected tab.

Each cluster pattern displays the following:

Trend: This column displays a sparkline representation of the trend (called trend shape) of the generation of log messages (of a cluster) based on the time range that you selected when you clustered the records. Each trend shape is identified by a Shape ID such as 1, 2, 3, and so on. This helps you to sort the clustered records on the basis of trend shapes.

Clicking the arrow to the left of a trend entry displays the time series visualization of the cluster results. This visualization shows how the log records in a cluster were spread out based on the time range that was selected in the query. The trend shape is a sparkline representation of the time series.
ID: This column lists the cluster ID. The ID is unique within the collection.
Count: This column lists the number of log records having the same message signature.
Sample Message: This column displays a sample log record from the message signature.
Log Source: This column lists the log sources that generated the messages of the cluster.

You can click Show Similar Trends to sort clusters in an ascending order of trend shapes. You can also select a cluster ID or multiple cluster IDs and click Show Records to display all the records for the selected IDs.

You can also hide a cluster message or multiple clusters from the cluster results if the output seems cluttered. Right-click the required cluster and select Hide Cluster.

In each record, the variable values are highlighted. You can view all the similar variables in each cluster by clicking a variable in the Sample Message section. Clicking the variables shows all the values (in the entire record set) for that particular variable.

In the Sample Message section, some cluster patterns display a <n> more samples... link. Clicking this link displays more clusters that look similar to the selected cluster pattern.

Clicking Back to Trends takes you back to the previous page with context (it scrolls back to where you selected the variable to drill down further). The browser back button also takes you back to the previous page; however, the context won’t be maintained, because the cluster command is executed again in that case.

Cluster the Log Data Using SQL Fields

Here’s an example of clustering the SQL fields:

Description of the illustration cluster_sql.png

The large volume of log records are reduced to 89 clusters, thus offering you fewer groups of log data to analyze.

You can drill down the clusters by selecting the variables. For example, from the above set of clusters, select the cluster that has the sample message SELECT version FROM V$INSTANCE:

Description of cluster_sql_variable.png follows

Description of the illustration cluster_sql_variable.png

This displays the histogram visualization of the log records containing the specified sample message. You can now analyze the original log content. Click Back to Cluster to return to the cluster visualization.

The Trends panel shows the SQLs that have similar execution pattern.

The Outliers panel displays the SQLs that are rare and different.

Use Cluster Compare Utility

The cluster compare utility can be used to identify new issues by comparing the current set of clusters to a baseline and reducing the results by eliminating common or duplicate clusters. Some of the typical scenarios are:

Which clusters are different in this week compared to last week?
See Cluster Compare by Time Shift.
What's the difference between the cluster set of entity A and the set of entity B?
See Cluster Compare by Current Time.
Things were working well in the month X. What changed in this month?
See Cluster Compare by Custom Time.

Given two sets of log data, the cluster compare utility removes the data pertaining to the common clusters, and displays histogram data and the records table that are unique to each set. For example, when you compare the log data from week x and week y, the clusters that are common to both the weeks are removed for simplification, and the data unique to each week is displayed. This enables you to identify patterns that are unique for the specific week, and analyze the behavior.

For the syntax and other details of the clustercompare command, see clustercompare.

In clusters visualization, select your current time range. By default, the query is *. You can refine the query to filter the log data.
In the Visualize panel, click Cluster Compare.
The cluster compare dialog box opens.
You can notice that the current query and the current time range are displayed for reference.
- Baseline Query: By default, this is the same as your current query. Click the and modify the baseline query, if required.
- Baseline Time Range: By default, the cluster compare utility uses the Use Time Shift option to determine the baseline time range. Hence, the baseline time range is of the same duration as the current time range and is shifted to the period before the current time range. You can modify this by clicking the icon and selecting Use Custom Time or Use Current Time. If you select Use Custom Time, then specify the custom time range using the menu.
- Click Compare.
  You can now view the cluster comparison between the two log sets.
  
  Click the button corresponding to each set to view the details like clusters, potential issues, outliers, trends, and records table that are unique to the set. The page also displays the number of clusters that are common between the two log sets.
  
  In the above example, there are 11 clusters found only in the current range, 4 clusters found only in the baseline range, and 30 clusters common in both the ranges. The histogram for the current time range displays the visualization using only the log data that is unique to the current time range.

Note

Clusters found only in the current range are returned first, followed by clusters found only in the baseline range. The combined results are limited to 500 clusters. To reduce the cluster compare results, reduce the current time range or append a command to limit the number of results. For example, append | head 250 will limit both current and baseline clusters to 250 each. Use multi-select (click and drag hold) on the cluster histogram to reduce the current time range when using the custom time option. The time range shift value may be converted to minutes or seconds to ensure no time gaps or overlaps occur between the current and baseline time ranges.

Use Dictionary Lookup in Cluster

Use dictionary lookup after the cluster command to annotate clusters.

Consider the cluster results for Linux Syslog Logs. To define a dictionary to add labels based on the Cluster Sample field:

Create a CSV file with the following contents:

Operator,Condition,Issue,Area
CONTAINS IGNORE CASE,invalid compare operation,Compare Error,Unknown
CONTAINS IGNORE CASE REGEX,DNS-SD.*?Daemon not running,DNS Daemon Down,DNS
CONTAINS ONE OF REGEXES,"[[Cc]onnection refused,[Cc]onnection .*? closed]",Connection Error,Network
CONTAINS IGNORE CASE,syntax error,Syntax Error,Validation
CONTAINS IGNORE CASE REGEX,Sense.*?(?:Error|fail),Disk Sensing Error,Disk
CONTAINS IGNORE CASE REGEX,device.*?check failed,Device Error,Disk

Import this as a Dictionary type lookup using the name Linux Error Categories. This lookup contains two fields, Issue and Area that can be returned on a matching condition. See Create a Dictionary Lookup.

Use the dictionary in cluster to return a field:

Run the cluster command for Linux Syslog Logs. Add a lookup command after cluster, as shown below:
```
'Log Source' = 'Linux Syslog Logs' 
| cluster
| lookup table = 'Linux Error Categories' select Issue using 'Cluster Sample'
```
The value of Cluster Sample for each row is evaluated against the rules defined in the Linux Error Categories dictionary. The Issue field is returned from each matching row.
Return more than one field by selecting each field in the lookup command:
```
'Log Source' = 'Linux Syslog Logs'
| cluster
| lookup table = 'Linux Error Categories' select Issue as Category, Area using 'Cluster Sample'
```
The above query selects the Issue field, and also renames it to Category. Area field is also selected, but not renamed.
Filter the cluster results using the dictionary fields:

Use the where command on the specific fields to filter the clusters. Consider the following query:
```
'Log Source' = 'Linux Syslog Logs' 
| cluster 
| lookup table = 'Linux Error Categories' select Issue as Category, Area using 'Cluster Sample' 
| where Area in (Unknown, Disk)
```
This displays only those records that matches the specified values for Area field.

Semantic Clustering

Cluster Visualization allows you to cluster text messages in log records. Cluster works by grouping messages that have similar number of words in a sentence, and identifying the words that change within those sentences. Cluster does not consider the literal meaning of the words during the grouping.

The nlp command supports semantic clustering. Semantic Clustering is done by extracting the relevant keywords from a message and clustering based on these keywords. Two sets of messages that have similar words are grouped together. Each such group is given a deterministic Cluster ID.

The following example shows the usage of NLP clustering and keywords on Linux Syslog Logs:

'Log Source' = 'Linux Syslogs Logs'
| link Time, Entity, cluster()
| nlp cluster('Cluster Sample') as 'Cluster ID', 
                    keywords('Cluster Sample') as Keywords
| classify 'Start Time', Keywords, Count, Entity as 'Cluster Keywords'

For more example use cases of semantic clustering, see Examples of Semantic Clustering.

nlp Command

The nlp command can be used only after the link command and supports two functions. cluster() can be used to cluster the specified field, and keywords() can be used to extract keywords from the specified field.

nlp command can be used only after the link command. See nlp.

nlp cluster():

cluster() takes the name of a field generated in Link, and returns a Cluster ID for each clustered value. The returned Cluster ID is a number, represented as a string. The Cluster ID can be used in queries to filter the clusters.

For example:

nlp cluster('Description') as 'Description ID' - This would extract relevant keywords from the Description field. The Description ID field would contain a unique ID for each generated cluster.
nlp keywords():

Extracts keywords from the specified field values. The keywords are extracted based on a dictionary. The dictionary name can be supplied using the table option. If no dictionary is provided, the out-of-the-box default dictionary NLP General Dictionary is used.

For example:

nlp keywords('Description') as Summary - This would extract relevant keywords from the Description field. The keywords are accessible using the Summary field.

nlp table='My Issues' cluster('Description') as 'Description ID' - Instead of the default dictionary, use the custom dictionary My Issues.

NLP Dictionary

Semantic Clustering works by splitting a message into words, extracting the relevant words and then grouping the messages that have similar words. The quality of clustering thus depends on the relevance of the keywords extracted.

A dictionary is used to decide what words in a message should be extracted.
The order of items in the dictionary is important. An item in the first row has higher ranking than the item in the second row.
A dictionary is created as a .csv file, and imported using the Lookup user interface with Dictionary Type option.
It is not necessary to create a dictionary, unless you want to change the ranking of words. The default out-of-the-box NLP General Dictionary is used if no dictionary is specified. It contains pre-trained English words.

See Create a Dictionary Lookup.

Following is an example dictionary iSCSI Errors:

Operator	Condition	Value
`CONTAINS IGNORE CASE`	error	noun
`CONTAINS IGNORE CASE`	reported	verb
`CONTAINS IGNORE CASE`	iSCSI	noun
`CONTAINS IGNORE CASE`	connection	noun
`CONTAINS IGNORE CASE`	closed	verb

The first field is reserved for future use. Second field is a word. The third word specifies the type for that word. The type can be any string and can be referred to from the query using the category parameter.

In the above example, the word error has higher ranking than the words reported or iSCSI. Similarly, connection has higher ranking than closed.

Using a Dictionary

Suppose that the following text is seen in the Message field:

Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (2) 
Please verify the storage and network connection for additional faults

The above message is parsed and split into words. Non-alphabets are removed. Following are some of the unique words generated from the split:

Kernel
reported
iSCSI
connection
error
ERR
TCP
CONN
CLOSE
closed
state
...
...

There are a total of 24 words in the message. By default, semantic clustering would attempt to extract 20 words and use these words to perform clustering. In a case like the above, the system needs to know which words are important. This is achieved by using the dictionary.

The dictionary is an ordered list. If iSCSI Errors is used, then NLP would not extract ERR, TCP, or CONN because these words are not included in the dictionary. Similarly, the words error, reported, iSCSI, connection, and closed are given higher priority due to their ranking in the dictionary.

Oracle Cloud Infrastructure Documentation