C Sample Parse Expressions

You can refer to the following sample parse expressions to create a suitable parse expression for extracting values from your log file.

A log file comprises entries that are generated by concatenating multiple field values. You might not need all the field values to analyze a log file of a particular format. Using a parser, you can extract the values from only those fields that you want to view.

A parser extracts fields from a log file based on the parse expression that you define. A parse expression is a regular expression that defines a search pattern. In a parse expression, enclose in parentheses () the search pattern for each field that you want to extract from a log entry. Text that matches parts of the pattern outside the parentheses isn't extracted. Groups that start with (?:, which appear in the examples below, are non-capturing: they group part of the pattern, for example to make it optional, but don't extract a value themselves.
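The following sketch illustrates this rule with Java's java.util.regex package, chosen because its syntax includes the \p{Upper} and \p{Lower} classes used in Example 2. It lists the values that the capturing groups of an expression extract. The class name CaptureGroupDemo and the sample entry are hypothetical, and the sketch assumes the expression must match the whole entry; your log parser may apply expressions differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: shows which parts of a match a parse expression extracts.
// java.util.regex is used as a stand-in for the log parser's own engine.
class CaptureGroupDemo {

    // Returns the values of all capturing groups in order, or an empty list if the entry doesn't match.
    static List<String> extract(String parseExpression, String logEntry) {
        Matcher m = Pattern.compile(parseExpression).matcher(logEntry);
        List<String> fields = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i)); // null if an optional group didn't participate
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // The day (\d+) and the literal "host" in (?:host) are matched but not extracted.
        String expr = "(\\S+)\\s+\\d+\\s+(?:host)(\\w+)";
        System.out.println(extract(expr, "Jul 29 hostabc")); // [Jul, abc]
    }
}
```

Only the two parenthesized groups produce values; the non-capturing group (?:host) is still required for the match but contributes nothing to the extracted fields.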

Example 1

If you want to parse the following sample log entries:

Jun 20 15:19:29 hostabc rpc.gssd[2239]: ERROR: can't open clnt5aa9: No such file or directory
Jul 29 11:26:28 hostabc kernel: FS-Cache: Loaded
Jul 29 11:26:28 hostxyz kernel: FS-Cache: Netfs 'nfs' registered for caching

Use the following parse expression:

(\S+)\s+(\d+)\s(\d+):(\d+):(\d+)\s(\S+)\s(?:([^:\[]+)(?:\[(\d+)\])?:\s+)?(.+)

In the preceding example, some of the values that the parse expression captures are:

  • (\S+): One or more non-whitespace characters for the month

  • (\d+): One or more digits for the day

  • ([^:\[]+): (Optional) One or more characters other than : and [, for the service name, such as rpc.gssd or kernel

  • (?:\[(\d+)\])?: (Optional) The process ID in square brackets, such as 2239

  • (.+): All remaining characters, which form the primary message content
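As an illustration, the following Java sketch applies the Example 1 parse expression to the three sample entries. The class and method names are hypothetical, backslashes are doubled for Java string literals, and Matcher.matches() is assumed as a stand-in for how your parser applies the expression. Optional groups that don't participate in a match, such as the process ID in the kernel entries, come back as null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: applies the Example 1 parse expression with java.util.regex.
// Group numbers follow the order of the opening parentheses in the expression.
class SyslogParseDemo {

    // The Example 1 parse expression, with each backslash doubled for a Java string literal.
    static final Pattern SYSLOG = Pattern.compile(
        "(\\S+)\\s+(\\d+)\\s(\\d+):(\\d+):(\\d+)\\s(\\S+)\\s(?:([^:\\[]+)(?:\\[(\\d+)\\])?:\\s+)?(.+)");

    // Returns {month, day, hour, minute, second, host, service, pid, message},
    // or null if the entry doesn't match. Unmatched optional groups are null.
    static String[] parse(String entry) {
        Matcher m = SYSLOG.matcher(entry);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] entries = {
            "Jun 20 15:19:29 hostabc rpc.gssd[2239]: ERROR: can't open clnt5aa9: No such file or directory",
            "Jul 29 11:26:28 hostabc kernel: FS-Cache: Loaded",
            "Jul 29 11:26:28 hostxyz kernel: FS-Cache: Netfs 'nfs' registered for caching"
        };
        for (String e : entries) {
            String[] f = parse(e);
            System.out.printf("host=%s service=%s pid=%s message=%s%n", f[5], f[6], f[7], f[8]);
        }
    }
}
```

Because the service name and process ID sit inside (?:...)?, the kernel entries still match even though they have no bracketed process ID.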

Example 2

If you want to parse the following sample log entries:

####<Apr 27, 2014 4:01:42 AM PDT> <Info> <EJB> <host> <AdminServer> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <OracleSystemUser> <BEA1-13E2AD6CAC583057A4BD> <b3c34d62475d5b0b:6e1e6d7b:143df86ae85:-8000-000000000000cac6> <1398596502577> <BEA-010227> <EJB Exception occurred during invocation from home or business: weblogic.ejb.container.internal.StatelessEJBHomeImpl@2f9ea244 threw exception: javax.ejb.EJBException: what do i do: seems an odd quirk of the EJB spec. The exception is:java.lang.StackOverflowError>
####<Jul 30, 2014 8:43:48 AM PDT> <Info> <RJVM> <host.com> <> <Thread-9> <> <> <> <1406735028770> <BEA-000570> <Network Configuration for Channel "AdminServer" Listen Address host.com:7002 (SSL) Public Address N/A Http Enabled true Tunneling Enabled false Outbound Enabled false Admin Traffic Enabled true ResolveDNSName Enabled false> 

Use the following parse expression:

####<(\p{Upper}\p{Lower}{2})\s+([\d]{1,2}),\s+([\d]{4})\s+([\d]{1,2}):([\d]{2}):([\d]{2})\s+(\p{Upper}{2})(?:\s+(\w+))?>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<\d{10}\d{3}>\s+<(.*?)>\s+<(.*?)(?:\n(.*))?>\s*

In the preceding example, some of the values that the parse expression captures are:

  • (\p{Upper}\p{Lower}{2}): Three-letter month abbreviation, with an uppercase first letter followed by two lowercase letters

  • ([\d]{1,2}): 1-or-2-digit day

  • ([\d]{4}): 4-digit year

  • ([\d]{1,2}): 1-or-2-digit hour

  • ([\d]{2}): 2-digit minute

  • ([\d]{2}): 2-digit second

  • (\p{Upper}{2}): 2-letter AM/PM in uppercase

  • (?:\s+(\w+))?: (Optional; some entries may not include it) One or more word characters for the time zone, such as PDT

  • (.*?): Zero or more characters, matched lazily up to the next >. The first of these groups captures the severity level, in this case Info. Some fields, such as <>, can be empty.

  • (.*): (Optional) Additional message text that follows a line break, captured by (?:\n(.*))?
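The following sketch, again using java.util.regex with hypothetical class and method names, applies the Example 2 parse expression to the second sample entry. It assumes the whole entry must match. Empty fields such as <> are captured as empty strings, the 13-digit timestamp <\d{10}\d{3}> is matched but not extracted because it isn't in parentheses, and the optional trailing group (?:\n(.*))? returns null when the message has no line break.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo: applies the Example 2 parse expression with java.util.regex.
class WebLogicParseDemo {

    // The Example 2 parse expression, split across lines, with backslashes doubled for Java.
    static final Pattern ENTRY = Pattern.compile(
        "####<(\\p{Upper}\\p{Lower}{2})\\s+([\\d]{1,2}),\\s+([\\d]{4})\\s+"
        + "([\\d]{1,2}):([\\d]{2}):([\\d]{2})\\s+(\\p{Upper}{2})(?:\\s+(\\w+))?>"
        + "\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>"
        + "\\s+<\\d{10}\\d{3}>\\s+<(.*?)>\\s+<(.*?)(?:\\n(.*))?>\\s*");

    // Returns all 19 captured values in group order, or null if the entry doesn't match.
    static String[] parse(String entry) {
        Matcher m = ENTRY.matcher(entry);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1); // null for optional groups that didn't participate
        }
        return fields;
    }

    public static void main(String[] args) {
        String entry = "####<Jul 30, 2014 8:43:48 AM PDT> <Info> <RJVM> <host.com> <> <Thread-9> <> <> <> "
            + "<1406735028770> <BEA-000570> <Network Configuration for Channel \"AdminServer\" "
            + "Listen Address host.com:7002 (SSL) Public Address N/A Http Enabled true "
            + "Tunneling Enabled false Outbound Enabled false Admin Traffic Enabled true "
            + "ResolveDNSName Enabled false>";
        String[] f = parse(entry);
        // Indexes are group number minus one: f[7] is the time zone, f[8] the severity level.
        System.out.println("month=" + f[0] + " time zone=" + f[7] + " severity=" + f[8]);
        System.out.println("message id=" + f[16]);
        System.out.println("message=" + f[17]);
    }
}
```

The lazy (.*?) is what keeps each bracketed field from running past its own closing >; a greedy (.*) in the same position would try to consume the rest of the line first.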

Search Patterns

Some of the commonly used patterns are explained in the following table:

| Pattern | Description | Example |
| --- | --- | --- |
| . | Any character except a line break | d.f matches def, daf, dbf, and so on |
| * | Zero or more times | D*E*F* matches DDEEFF, DEF, DDFF, EEFF, and so on |
| ? | Once or not at all (optional) | colou?r matches both colour and color |
| + | One or more times | Stage \w-\w+ matches Stage A-b1_1, Stage B-a2, and so on |
| {2} | Exactly two times | [\d]{2} matches 01, 11, 21, and so on |
| {1,2} | One or two times | [\d]{1,2} matches 1, 12, and so on |
| {3,} | Three or more times | [\w]{3,} matches ten, hello, h2134, and so on |
| [ … ] | One of the characters in the brackets | [AEIOU] matches one uppercase vowel |
| [x-y] | One of the characters in the range from x to y | [A-Z]+ matches ACT, ACTION, BAT, and so on |
| [^x] | One character that is not x | [^\d]{2} matches AA, BB, AC, and so on |
| [^x-y] | One character that is not in the range from x to y | [^a-z]{2} matches A1, BB, B2, and so on |
| [\d\D] | One character that is a digit or a non-digit | [\d\D]+ matches any characters, including line breaks, which . doesn't match |
| \s | One whitespace character | (\S+)\s+(\d+) matches AA 123, a_ 221, and so on |
| \S | One character that is not a whitespace character | (\S+) matches abcd, ABC, A1B2C3, and so on |
| \n | A line break | (\d)\n(\w) matches 1 followed by A on the next line |
| \w | A word character: a letter, digit, or underscore | \w-\w\w\w matches a-123, 1-aaa, and so on |
| \p{Lower} | A lowercase letter | \p{Lower}{2} matches aa, ab, ac, bb, and so on |
| \p{Upper} | An uppercase letter | \p{Upper} matches A, B, C, and so on |
| \ followed by a special character such as ?, [, ], *, or . | Escape character; treats the character that follows as a literal | \? matches ? |
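To try a pattern from the table before you use it in a parse expression, you can test it in a regular expression engine. The following sketch uses java.util.regex, whose syntax includes \p{Lower} and \p{Upper}; the class and method names are hypothetical, and other engines can differ in details such as how . treats line breaks.

```java
import java.util.regex.Pattern;

// Hypothetical demo: checks a few rows of the pattern table with java.util.regex.
class PatternTableDemo {

    // True if the regex matches the entire input string.
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("[\\d]{1,2}", "12"));               // true: one or two digits
        System.out.println(fullMatch("[\\d]{1,2}", "123"));              // false: three digits
        System.out.println(fullMatch("[^\\d]{2}", "AC"));                // true: two non-digits
        System.out.println(fullMatch(".+", "1\nA"));                     // false: . skips the line break
        System.out.println(fullMatch("[\\d\\D]+", "1\nA"));              // true: [\d\D] matches any character
        System.out.println(fullMatch("(\\d)\\n(\\w)", "1\nA"));          // true
        System.out.println(fullMatch("\\w-\\w\\w\\w", "a-123"));         // true
        System.out.println(fullMatch("\\p{Upper}\\p{Lower}{2}", "Apr")); // true
        System.out.println(fullMatch("\\?", "?"));                       // true: escaped ? is a literal
    }
}
```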