Sample Parse Expressions
You can refer to the following sample parse expressions to create a suitable parse expression for extracting values from your log file.
A log file comprises entries that are generated by concatenating multiple field values. You may not need to view all the field values for analyzing a log file of a particular format. Using a parser, you can extract the values from only those fields that you want to view.
A parser extracts fields from a log file based on the parse expression that you’ve defined. A parse expression is written as a regular expression that defines a search pattern. In a parse expression, enclose in parentheses () the search pattern for each field that you want to extract from a log entry. Any value matched by a search pattern that’s outside the parentheses isn’t extracted.
For the supported regex constructs, see Java Regex Package Documentation.
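To see how parentheses decide what is extracted, consider the following minimal Java sketch. The class name, the helper method, and the level=... msg=... entry format are hypothetical and are used only for illustration; they aren't part of any product API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupSketch {
    // Hypothetical expression: the level and the message are enclosed in
    // parentheses, so they are captured. The literal text "level=" and
    // " msg=" must match, but it sits outside the parentheses and is dropped.
    static final Pattern EXPRESSION = Pattern.compile("level=(\\w+) msg=(.+)");

    /** Returns the captured values, or an empty list if the entry doesn't match. */
    static List<String> extract(String entry) {
        Matcher m = EXPRESSION.matcher(entry);
        List<String> values = new ArrayList<>();
        if (m.matches()) {
            // Group 0 is the whole match; capturing groups start at 1.
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(m.group(i));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(extract("level=WARN msg=disk almost full"));
    }
}
```

For the entry shown in main, only WARN and disk almost full are returned; the surrounding literal text is matched but not extracted.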
Example 1
If you want to parse the following sample log entries:
Jun 20 15:19:29 hostabc rpc.gssd[2239]: ERROR: can't open clnt5aa9: No such file or directory
Jul 29 11:26:28 hostabc kernel: FS-Cache: Loaded
Jul 29 11:26:28 hostxyz kernel: FS-Cache: Netfs 'nfs' registered for caching
Use the following parse expression:
(\S+)\s+(\d+)\s(\d+):(\d+):(\d+)\s(\S+)\s(?:([^:\[]+)(?:\[(\d+)\])?:\s+)?(.+)
In the preceding example, some of the values that the parse expression captures are:
- (\S+): One or more non-whitespace characters for the month
- (\d+): One or more digits for the day
- ([^:\[]+): (Optional) One or more characters other than : and [ for the service name
- (.+): Primary message content
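You can try the Example 1 expression against the sample entries with a short Java sketch such as the following. The class and method names are hypothetical, and the sketch assumes that the whole entry must match the expression; a parser may apply the expression differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SyslogParseSketch {
    // The Example 1 parse expression, with backslashes doubled for a Java string literal.
    static final Pattern EXPRESSION = Pattern.compile(
        "(\\S+)\\s+(\\d+)\\s(\\d+):(\\d+):(\\d+)\\s(\\S+)\\s"
        + "(?:([^:\\[]+)(?:\\[(\\d+)\\])?:\\s+)?(.+)");

    /**
     * Returns the nine captured values in order (month, day, hour, minute,
     * second, host, service, process ID, message), or null if the entry
     * doesn't match. An optional group that didn't take part in the match is null.
     */
    static String[] parse(String entry) {
        Matcher m = EXPRESSION.matcher(entry);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] withPid = parse(
            "Jun 20 15:19:29 hostabc rpc.gssd[2239]: ERROR: can't open clnt5aa9: No such file or directory");
        System.out.println(withPid[6] + " / " + withPid[7]); // rpc.gssd / 2239

        String[] withoutPid = parse("Jul 29 11:26:28 hostabc kernel: FS-Cache: Loaded");
        System.out.println(withoutPid[6] + " / " + withoutPid[7]); // kernel / null
    }
}
```

The second entry has no process ID in brackets, so the group (\d+) inside the optional (?:\[(\d+)\])? doesn't capture anything for it.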
Example 2
If you want to parse the following sample log entries:
####<Apr 27, 2014 4:01:42 AM PDT> <Info> <EJB> <host> <AdminServer> <[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'> <OracleSystemUser> <BEA1-13E2AD6CAC583057A4BD> <b3c34d62475d5b0b:6e1e6d7b:143df86ae85:-8000-000000000000cac6> <1398596502577> <BEA-010227> <EJB Exception occurred during invocation from home or business: weblogic.ejb.container.internal.StatelessEJBHomeImpl@2f9ea244 threw exception: javax.ejb.EJBException: what do i do: seems an odd quirk of the EJB spec. The exception is:java.lang.StackOverflowError>
####<Jul 30, 2014 8:43:48 AM PDT> <Info> <RJVM> <example.com> <> <Thread-9> <> <> <> <1406735028770> <BEA-000570> <Network Configuration for Channel "AdminServer" Listen Address example.com:7002 (SSL) Public Address N/A Http Enabled true Tunneling Enabled false Outbound Enabled false Admin Traffic Enabled true ResolveDNSName Enabled false>
Use the following parse expression:
####<(\p{Upper}\p{Lower}{2})\s+([\d]{1,2}),\s+([\d]{4})\s+([\d]{1,2}):([\d]{2}):([\d]{2})\s+(\p{Upper}{2})(?:\s+(\w+))?>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<(.*?)>\s+<\d{10}\d{3}>\s+<(.*?)>\s+<(.*?)(?:\n(.*))?>\s*
In the preceding example, some of the values that the parse expression captures are:
- (\p{Upper}\p{Lower}{2}): 3-letter short name for the month, with the first letter in uppercase followed by two lowercase letters
- ([\d]{1,2}): 1-or-2-digit day
- ([\d]{4}): 4-digit year
- ([\d]{1,2}): 1-or-2-digit hour
- ([\d]{2}): 2-digit minute
- ([\d]{2}): 2-digit second
- (\p{Upper}{2}): 2-letter AM/PM in uppercase
- (?:\s+(\w+)): (Optional, some entries may not return any value for this) One or more alphanumeric characters for the time zone
- (.*?): (Optional, some entries may not return any value for this) Zero or more characters for the severity level; in this case, <Info>
- (.*): Any additional details along with the message
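The following Java sketch applies the Example 2 expression to the second sample entry. The class, method, and constant names are hypothetical, and the sketch assumes that the whole entry must match the expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WebLogicParseSketch {
    // The Example 2 parse expression, with backslashes doubled for a Java string literal.
    static final Pattern EXPRESSION = Pattern.compile(
        "####<(\\p{Upper}\\p{Lower}{2})\\s+([\\d]{1,2}),\\s+([\\d]{4})\\s+"
        + "([\\d]{1,2}):([\\d]{2}):([\\d]{2})\\s+(\\p{Upper}{2})(?:\\s+(\\w+))?>\\s+"
        + "<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+"
        + "<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+<(.*?)>\\s+"
        + "<\\d{10}\\d{3}>\\s+<(.*?)>\\s+<(.*?)(?:\\n(.*))?>\\s*");

    // The second sample entry from Example 2.
    static final String SAMPLE =
        "####<Jul 30, 2014 8:43:48 AM PDT> <Info> <RJVM> <example.com> <> <Thread-9> <> <> <> "
        + "<1406735028770> <BEA-000570> <Network Configuration for Channel \"AdminServer\" "
        + "Listen Address example.com:7002 (SSL) Public Address N/A Http Enabled true "
        + "Tunneling Enabled false Outbound Enabled false Admin Traffic Enabled true "
        + "ResolveDNSName Enabled false>";

    /** Returns the 19 captured values in order, or null if the entry doesn't match. */
    static String[] parse(String entry) {
        Matcher m = EXPRESSION.matcher(entry);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            fields[i - 1] = m.group(i);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parse(SAMPLE);
        System.out.println("time zone: " + fields[7]);   // PDT
        System.out.println("severity:  " + fields[8]);   // Info
        System.out.println("server:    [" + fields[11] + "]"); // [] because the entry has <>
        System.out.println("details:   " + fields[18]);  // null, the entry has no second line
    }
}
```

An empty field such as <> yields an empty string because (.*?) can match zero characters, whereas the optional (?:\n(.*))? group yields no value at all when the message has no second line. The 13-digit timestamp is matched by <\d{10}\d{3}> but isn't enclosed in parentheses, so it isn't extracted.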
Search Patterns
Some of the commonly used patterns are explained in the following table:
Pattern | Description | Example |
---|---|---|
. | Any character except a line break | d.f matches def, daf, dbf, and so on |
* | Zero or more times | D*E*F* matches DDEEFF, DEF, DDFF, EEFF, and so on |
? | Once or none; optional | colou?r matches both colour and color |
+ | One or more times | Stage \w-\w+ matches Stage A-b1_1, Stage B-a2, and so on |
{2} | Exactly two times | [\d]{2} matches 01, 11, 21, and so on |
{1,2} | One to two times | [\d]{1,2} matches 1, 12, and so on |
{3,} | Three or more times | [\w]{3,} matches ten, hello, h2134, and so on |
[ … ] | One of the characters in the brackets | [AEIOU] matches one uppercase vowel |
[x-y] | One of the characters in the range from x to y | [A-Z]+ matches ACT, ACTION, BAT, and so on |
[^x] | One character that is not x | [^\d]{2} matches AA, BB, AC, and so on |
[^x-y] | One of the characters not in the range from x to y | [^a-z]{2} matches A1, BB, B2, and so on |
[\d\D] | One character that is a digit or a non-digit | [\d\D]+ matches any character, including new lines, which the regular dot doesn't match |
\s | One whitespace character | (\S+)\s+(\d+) matches AA 123, a_ 221, and so on |
\S | One character that is not a whitespace | (\S+) matches abcd, ABC, A1B2C3, and so on |
\n | A new line | (\d)\n(\w) matches 1 followed by A on the next line |
\w | An alphanumeric character or an underscore | \w-\w\w\w matches a-123, 1-aaa, and so on |
\p{Lower} | A lowercase letter | \p{Lower}{2} matches aa, ab, ac, bb, and so on |
\p{Upper} | An uppercase letter | \p{Upper} matches A, B, C, and so on |
\ followed by ?, [], *, or . | Escape character; treats the character after \ as a literal | \? matches a literal ? |
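To check how a pattern from the table behaves before you use it in a parse expression, you can test it against a sample value. The following Java sketch is one way to do that; the class and method names are hypothetical, and it tests whether the entire input matches the pattern.

```java
import java.util.regex.Pattern;

public class PatternTableSketch {
    /** Returns true only if the whole input matches the pattern. */
    static boolean fullyMatches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // ? makes the preceding character optional.
        System.out.println(fullyMatches("colou?r", "color"));   // true
        System.out.println(fullyMatches("colou?r", "colour"));  // true

        // {1,2} allows one or two digits, so three digits don't match.
        System.out.println(fullyMatches("[\\d]{1,2}", "123"));  // false

        // The dot stops at a line break, but [\d\D] doesn't.
        System.out.println(fullyMatches(".+", "line1\nline2"));        // false
        System.out.println(fullyMatches("[\\d\\D]+", "line1\nline2")); // true

        // A backslash turns ? into a literal character.
        System.out.println(fullyMatches("\\?", "?"));            // true
    }
}
```

In a Java string literal, each backslash in a pattern is written twice, so \d becomes "\\d".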