Sample Parse Expressions

A log file comprises entries that are generated by concatenating multiple field values. To analyze a log file of a particular format, you may not need to view every field value. With a parser, you can extract the values of only the fields that you want to view.

A parser extracts fields from a log file based on the parse expression that you define. A parse expression is a regular expression that defines a search pattern. In a parse expression, enclose the search pattern for each field that you want to extract from a log entry in parentheses (). Text that matches a part of the pattern outside the parentheses is matched but isn't extracted.

For the supported regex constructs, see Java Regex Package Documentation.
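The following Java sketch illustrates how capturing groups work. It uses java.util.regex to apply a hypothetical expression to a hypothetical `host=... status=...` entry. Neither appears in the samples in this section. Only the two parenthesized groups are returned. The literal text `host=` and `status=` is matched but not extracted. This sketch is not the parser's implementation; it only shows the regex behavior the parser relies on. In a Java string literal, every backslash in the expression is doubled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureDemo {
    // Hypothetical parse expression: "host=" and "status=" are matched but not captured.
    static final Pattern EXPR = Pattern.compile("host=(\\S+)\\s+status=(\\d+)");

    // Returns the captured groups in order, or an empty array when the entry doesn't match.
    static String[] extract(String entry) {
        Matcher m = EXPR.matcher(entry);
        if (!m.matches()) {
            return new String[0];
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = extract("host=hostabc status=200");
        System.out.println(String.join(", ", fields)); // hostabc, 200
    }
}
```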

Example 1

If you want to parse the following sample log entries:

Jun 20 15:19:29 hostabc rpc.gssd[2239]: ERROR: can't open clnt5aa9: No such file or directory
Jul 29 11:26:28 hostabc kernel: FS-Cache: Loaded
Jul 29 11:26:28 hostxyz kernel: FS-Cache: Netfs 'nfs' registered for caching

Use the following parse expression:

(\S+)\s+(\d+)\s(\d+):(\d+):(\d+)\s(\S+)\s(?:([^:\[]+)(?:\[(\d+)\])?:\s+)?(.+)

In the preceding example, some of the values that the parse expression captures are:

(\S+): One or more non-whitespace characters for the month
(\d+): One or more digits for the day
([^:\[]+): (Optional) One or more characters other than : and [ for the service name
(.+): One or more characters for the primary message content
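The following Java sketch applies the Example 1 expression to the three sample entries so that you can check its behavior outside the parser. It assumes that the parser's regex semantics match java.util.regex. It uses Matcher.matches(), which requires the whole entry to match. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyslogParseDemo {
    // The Example 1 parse expression, with each backslash doubled for a Java string literal.
    static final Pattern EXPR = Pattern.compile(
        "(\\S+)\\s+(\\d+)\\s(\\d+):(\\d+):(\\d+)\\s(\\S+)\\s(?:([^:\\[]+)(?:\\[(\\d+)\\])?:\\s+)?(.+)");

    // Returns groups 1..9 (index = group number - 1), or null when the entry doesn't match.
    // Groups that don't take part in the match, such as a missing process ID, are null.
    static String[] parse(String entry) {
        Matcher m = EXPR.matcher(entry);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] entries = {
            "Jun 20 15:19:29 hostabc rpc.gssd[2239]: ERROR: can't open clnt5aa9: No such file or directory",
            "Jul 29 11:26:28 hostabc kernel: FS-Cache: Loaded",
            "Jul 29 11:26:28 hostxyz kernel: FS-Cache: Netfs 'nfs' registered for caching"
        };
        for (String entry : entries) {
            String[] f = parse(entry);
            // f[0] month, f[1] day, f[5] host, f[6] service, f[7] process ID, f[8] message
            System.out.printf("month=%s day=%s host=%s service=%s pid=%s message=%s%n",
                f[0], f[1], f[5], f[6], f[7], f[8]);
        }
    }
}
```

For the two kernel entries, the bracketed process ID group doesn't take part in the match. As a result, group 8 is null, while the service name and message are still extracted.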

Example 2

If you want to parse the following sample log entries:

####<Apr 27, 2014 4:01:42 AM PDT> <Info> <EJB> <host> <AdminServer> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <OracleSystemUser> <BEA1-13E2AD6CAC583057A4BD> <b3c34d62475d5b0b:6e1e6d7b:143df86ae85:-8000-000000000000cac6> <1398596502577> <BEA-010227> <EJB Exception occurred during invocation from home or business: weblogic.ejb.container.internal.StatelessEJBHomeImpl@2f9ea244 threw exception: javax.ejb.EJBException: what do i do: seems an odd quirk of the EJB spec. The exception is:java.lang.StackOverflowError>
####<Jul 30, 2014 8:43:48 AM PDT> <Info> <RJVM> <example.com> <> <Thread-9> <> <> <> <1406735028770> <BEA-000570> <Network Configuration for Channel "AdminServer" Listen Address example.com:7002 (SSL) Public Address N/A Http Enabled true Tunneling Enabled false Outbound Enabled false Admin Traffic Enabled true ResolveDNSName Enabled false>

Use the following parse expression:

####<(\p{Upper}\p{Lower}{2})\s+([\d]{1,2}),\s+([\d]{4})\s+([\d]{1,2}):([\d]{2}):([\d]{2})\s+(\p{Upper}{2})(?:\s+(\w+))?>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<\d{10}\d{3}>\s+<(.*?)>\s+<(.*?)(?:\n(.*))?>\s*

In the preceding example, some of the values that the parse expression captures are:

(\p{Upper}\p{Lower}{2}): 3-letter short name for the month, with the first letter in uppercase followed by two lowercase letters
([\d]{1,2}): 1-or-2-digit day
([\d]{4}): 4-digit year
([\d]{1,2}): 1-or-2-digit hour
([\d]{2}): 2-digit minute
([\d]{2}): 2-digit second
(\p{Upper}{2}): 2-letter AM/PM in uppercase
(?:\s+(\w+)): (Optional; some entries may not return a value for this) One or more word characters for the time zone
(.*?): (Optional; some entries may not return a value for this) Zero or more characters, matched lazily, for the severity level; in this case, Info
(.*): (Optional) Any additional details that follow a line break in the message
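The following Java sketch applies the Example 2 expression to the second sample entry. It assumes java.util.regex semantics. The expression is split into concatenated pieces for readability, and each backslash is doubled. The class and method names are hypothetical. The array index is the group number minus 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WebLogicParseDemo {
    // The Example 2 parse expression, split across string pieces with backslashes doubled.
    static final Pattern EXPR = Pattern.compile(
        "####<(\\p{Upper}\\p{Lower}{2})\\s+([\\d]{1,2}),\\s+([\\d]{4})\\s+"
        + "([\\d]{1,2}):([\\d]{2}):([\\d]{2})\\s+(\\p{Upper}{2})(?:\\s+(\\w+))?>"
        + "\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>"
        + "\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>"
        + "\\s+<\\d{10}\\d{3}>\\s+<(.*?)>\\s+<(.*?)(?:\\n(.*))?>\\s*");

    // Returns groups 1..19, or null when the entry doesn't match the whole expression.
    static String[] parse(String entry) {
        Matcher m = EXPR.matcher(entry);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String entry = "####<Jul 30, 2014 8:43:48 AM PDT> <Info> <RJVM> <example.com> <> <Thread-9> <> <> <> "
            + "<1406735028770> <BEA-000570> <Network Configuration for Channel \"AdminServer\" "
            + "Listen Address example.com:7002 (SSL) Public Address N/A Http Enabled true "
            + "Tunneling Enabled false Outbound Enabled false Admin Traffic Enabled true "
            + "ResolveDNSName Enabled false>";
        String[] f = parse(entry);
        // f[0] month, f[1] day, f[2] year, f[6] AM/PM, f[7] time zone, f[8] severity,
        // f[18] details after a line break (null here because the entry is a single line)
        System.out.println(f[0] + " " + f[1] + ", " + f[2] + " " + f[6] + " " + f[7]); // Jul 30, 2014 AM PDT
        System.out.println("severity=" + f[8]);  // severity=Info
        System.out.println("details=" + f[18]);  // details=null
    }
}
```

Empty fields such as `<>` still match, because `(.*?)` accepts zero characters. The lazy `(.*?)` also stops each field at the first `>`, so a field doesn't run into the fields that follow it.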

Search Patterns

Some of the commonly used patterns are explained in the following table:

Pattern	Description	Example
.	Any character except a line break	`d.f` matches def, daf, dbf, and so on
*	Zero or more times	`DEF*` matches DE, DEF, DEFF, and so on
?	Once or none; optional	`colou?r` matches both colour and color
+	One or more	`Stage \w-\w+` matches Stage A-b1_1, Stage B-a2, and so on
{2}	Exactly two times	`[\d]{2}` matches 01, 11, 21, and so on
{1,2}	One to two times	`[\d]{1,2}` matches 1, 12, and so on
{3,}	Three or more times	`[\w]{3,}` matches ten, hello, h2134, and so on
[ … ]	One of the characters in the brackets	`[AEIOU]` matches one uppercase vowel
[x-y]	One of the characters in the range from x to y	`[A-Z]+` matches ACT, ACTION, BAT, and so on
[^x]	One character that is not x	`[^\d]{2}` matches AA, BB, AC, and so on
[^x-y]	One of the characters not in the range from x to y	`[^a-z]{2}` matches A1, BB, B2, and so on
[\d\D]	One character that is a digit or a non-digit	`[\d\D]+` matches any characters, including line breaks, which the dot (.) doesn't match
\s	A whitespace character	`(\S+)\s+(\d+)` matches AA 123, a_ 221, and so on
\S	One character that is not a whitespace	`(\S+)` matches abcd, ABC, A1B2C3, and so on
\n	A line break	`(\d)\n(\w)` matches 1, followed by a line break, followed by A
\w	A word character: a letter, a digit, or an underscore	`\w-\w\w\w` matches a-123, 1-aaa, and so on
\p{Lower}	A lowercase letter	`\p{Lower}{2}` matches aa, ab, ac, bb, and so on
\p{Upper}	An uppercase letter	`\p{Upper}` matches A, B, C, and so on
\ followed by ?, [, ], *, or .	Escape character; treats the character after \ as a literal	`\?` matches ?