nlp command can be used to extract keywords from a
string field, or to cluster records based on these extracted keywords. Keyword
extraction can be controlled using a custom NLP dictionary. If no dictionary is
provided, the default out-of-the box-dictionary is used.
For more information on semantic clustering, see Semantic Clustering Using Natural Language Processing.
Cluster Kernel Errors in Linux Syslog Logs
The following query clusters Kernel messages in Linux Syslog Logs:
'Log Source' = 'Linux Syslog Logs' and kernel | link cluster() | where 'Potential Issue' = '1' | nlp table = 'iSCSI Errors' cluster('Cluster Sample') as 'Cluster ID', keywords('Cluster Sample') as Summary | sort 'Cluster ID'
In the above query:
link cluster()runs the traditional cluster and returns a
nlp cluster('Cluster Sample')processes each
Cluster Sampleand assigns a Cluster ID. Messages that have similar meaning would get the same Cluster ID.
keywords('Cluster Sample')extracts the keywords used in clustering. This is returned in the
The following image shows the link results returned:
The first and second rows are not similar, and hence get different cluster IDs.
The third and fourth rows have similarity in the Cluster Sample. This can be seen in the overlap of keywords extracted in the
By default, a 70% overlap is required to form a cluster. This can be overridden using the
similarityparameter to cluster.
The Cluster ID generated is deterministic. Thus, the Cluster ID can be used as a shortcut for the list of keywords shown in the Summary column.
to Control the Number of Clusters
Running cluster using the default dictionary and a lower similarity threshold would produce fewer clusters:
'Log Source' = 'Linux Syslog Logs' and kernel | link cluster() | where 'Potential Issue' = '1' | nlp similarity=0.2 cluster('Cluster Sample') as 'Cluster ID', keywords('Cluster Sample') as Summary | sort 'Cluster ID'
This merged some of the rows into the existing clusters, as well as reduced the number of clusters:
Cluster the Database Alert Logs
The following query shows an example of semantically clustering Database Alert Logs:
'Log Source' = 'Database Alert Logs' | link cluster() | nlp cluster('Cluster Sample') as 'Cluster ID', keywords('Cluster Sample') as Summary | where Summary != null | classify 'Start Time', Summary, 'Cluster ID' as 'Database Messages'