Write Performant Queries for Searching and Exporting Logs

This section discusses some important aspects that you must consider when composing queries for searching and exporting logs.

Topics:

  • Understanding the Index
  • Long Time Ranges
  • Using Multiple Log Sets
  • Other Expensive Operations

Understanding the Index

During log ingestion, the log record is parsed using the parser and log source, and the fields, extended fields, and labels are extracted. All these values are stored in a query-efficient index. In addition, the keywords in the entire log record are also indexed. Queries that use the index are several times faster than queries that don't. If you are querying over a large amount of data, then ensure that the index is used.
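
For example, a query that filters only on fields extracted during ingestion can be answered from the index. The log source name and field values below are only illustrative:

    'Log Source' = 'Linux Syslog Logs' and Severity = error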

When is the index not used?

An index is not used when a query uses wildcards, or when a field is manipulated using one of the string-processing functions. Following are some examples:

  • Avoid using wildcards in the query

    A common use case is to look for a specific pattern in a field as follows:

    'Error Text' like '%NullPointerException%'

    The above query will not use the index because of the wildcards. If the Error Text field contains a large amount of data, or if the query is run over a large number of records, then it eventually times out. To avoid this, add a condition in the log source to detect this pattern and set a field. Then you can use that field directly in the query.

    For example, you can populate a field named Error Type in the log source, based on various conditions, and then update the query to use this field:

    'Error Type' = NullPointerException

    If you are not able to add the pattern to the log source, then an alternative is to change to a keyword search as follows:

    NullPointerException

    This searches for the word NullPointerException in the Original Log Content field, which represents the raw log record. This is useful if you don't need an exact match, because this query is several times faster than the earlier query that uses wildcards.

    Note

    The use of wildcards against a field that stores a large amount of data performs worse than against a small field. But if the number of log records involved is large, then even wildcards against a small field are slow.
  • Avoid using expensive string operators

    An index is not used on fields generated using string operations such as substr(), extract, and jsonExtract. Following are some examples where the index is skipped:

    * | eval Prefix = substr('Host Name (Server)', 1, 3) | where Prefix = DBC
    
    * | where indexOf('Host Name (Server)', 'DBC') != -1

    If you frequently need such string operations, then consider moving them to the parser or log source and creating separate fields.
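
    For example, if the host name prefix from the first query above were extracted into its own field during ingestion (the field name Host Prefix is only illustrative), the query could use the index directly:

    'Host Prefix' = DBC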

Long Time Ranges

Querying a large amount of data over a long time period is also expensive. Try to reduce the time range, or add more search criteria to reduce the amount of data to be processed.
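
For example, rather than running a broad keyword search on its own, you can combine the keyword with indexed field conditions so that less data has to be scanned. The host name value below is only illustrative:

    NullPointerException and 'Host Name (Server)' = host01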

Using Multiple Log Sets

If your tenancy has log partitioning enabled, then you would be using a log set to select a part of the log data. Log sets are optimized to reduce the load on the system. If you query against a single log set, then the query is kept to a smaller set of servers. But a query against a large number of log sets causes more data transfer and queries against a larger number of servers. This can significantly slow down the queries.

Avoid using more than five log sets in a query for the best performance.
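
As a hedged illustration, assuming your log sets are referenced through the Log Set field (the exact usage depends on how partitioning is configured in your tenancy, and the values below are hypothetical), restrict a query to a small, explicit list of log sets:

    'Log Set' in (prod-east, prod-west)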

Other Expensive Operations

Following are some other expensive operations that can slow down queries against large amounts of data:

  • sort
  • head or tail commands
  • stats or timestats commands with high-cardinality fields, that is, fields that have a large number of distinct values, such as Transaction ID or ECID
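
For example, grouping by a low-cardinality field such as Severity is much cheaper than grouping by a high-cardinality field such as Transaction ID, because the second query must track a separate group for every distinct value. This is only a sketch; the field names follow the examples above and may differ in your log sources:

    * | stats count by 'Severity'

    * | stats count by 'Transaction ID'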