Regular Expressions

You can use regular expressions to describe a set of strings based on common characteristics shared by each string in the set.

A regular expression is a sequence of characters that defines a search pattern. Regular expressions vary in complexity, but once you understand the basics of how they are constructed, you can decipher or create any regular expression.

String Literals

The most basic form of pattern matching is the match of a string literal. For example, if the regular expression is EMP and the input string is EMP, the match succeeds because the strings are identical. This regular expression also matches any string containing EMP, such as EMPLOYEE, TEMP, and TEMPERATURE.
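The following minimal sketch illustrates the literal match described above. It uses Python's re module purely for illustration; the engine behind Oracle Data Safe is not stated here and may differ in details. The name NAME is a hypothetical non-matching input added for contrast.

```python
import re

# EMP is the literal pattern from the text.
pattern = re.compile("EMP")

# The first four inputs come from the text; NAME is a hypothetical extra.
candidates = ["EMP", "EMPLOYEE", "TEMP", "TEMPERATURE", "NAME"]

# search() looks for the pattern anywhere in the string (a "contains" search).
literal_matches = [s for s in candidates if pattern.search(s)]
print(literal_matches)
```

Only NAME is left out of the result, because it is the only input that does not contain the letters EMP in sequence.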

Metacharacters

You can also use special characters that affect the way a pattern is matched. One of the most common is the dot (.), which matches any single character. For example, EMPLOYEE.ID matches EMPLOYEE_ID and EMPLOYEE-ID, but not EMPLOYEE_VERIFICATION_ID, because more than one character separates EMPLOYEE from ID. Here, the dot is a metacharacter, that is, a character with special meaning interpreted by the matcher.

Other metacharacters include ^ $ ? + * \ - [ ] ( ) { } and |.

If you want a metacharacter to be treated literally (as an ordinary character), you can use a backslash (\) to escape it. For example, the regular expression 9\+9 matches 9+9.
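As a rough illustration of the dot metacharacter and of escaping, the sketch below checks the examples from this section with Python's re module. Python is an assumption made for demonstration only, and the variable names are hypothetical.

```python
import re

# The dot matches exactly one character between EMPLOYEE and ID.
dot = re.compile(r"EMPLOYEE.ID")
dot_results = {
    name: dot.search(name) is not None
    for name in ["EMPLOYEE_ID", "EMPLOYEE-ID", "EMPLOYEE_VERIFICATION_ID"]
}

# Escaped plus: a literal "+" character between two 9s.
escaped = re.compile(r"9\+9")
# Unescaped plus: a quantifier meaning "one or more 9s, then another 9".
unescaped = re.compile(r"9+9")

escaped_hits_literal = escaped.search("9+9") is not None
unescaped_hits_literal = unescaped.search("9+9") is not None
unescaped_hits_digits = unescaped.search("999") is not None

print(dot_results)
print(escaped_hits_literal, unescaped_hits_literal, unescaped_hits_digits)
```

Without the backslash, the plus sign acts as a quantifier, so 9+9 finds runs of 9s such as 999 and no longer finds the text 9+9.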

Character Classes

A character class is a set of characters enclosed within square brackets. It matches a single character of the input string, provided that character belongs to the set.

The following table describes some common regular expression constructs.

Construct Description
[abc]

Matches one of the characters mentioned within square brackets.

Example: EMPLOYE[ER] matches EMPLOYEE and EMPLOYER.

[^abc]

Matches any character except the ones mentioned within square brackets.

Example: [^BC]AT matches RAT and HAT, but does not match BAT and CAT.

[A-Z0-9]

Matches any character in the range mentioned within square brackets. To specify a range, simply insert the dash metacharacter "-" between the first and last character to be matched; for example, [1-5] or [A-M]. You can also place different ranges beside each other within the class to further expand the match possibilities.

Example: [B-F]AT matches BAT, CAT, DAT, EAT, and FAT, but does not match AAT and GAT.

Oracle Data Safe also supports predefined character classes.
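The character class examples from the table can be checked with a short script. This is a sketch using Python's re module as a stand-in engine. The helper name matching and the input EMPLOYED are hypothetical additions.

```python
import re

def matching(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere."""
    regex = re.compile(pattern)
    return [s for s in candidates if regex.search(s)]

# Simple class: the last letter must be E or R.
simple = matching(r"EMPLOYE[ER]", ["EMPLOYEE", "EMPLOYER", "EMPLOYED"])

# Negated class: the first letter must be anything except B or C.
negated = matching(r"[^BC]AT", ["RAT", "HAT", "BAT", "CAT"])

# Range: the first letter must lie between B and F inclusive.
ranged = matching(r"[B-F]AT", ["AAT", "BAT", "CAT", "DAT", "EAT", "FAT", "GAT"])

print(simple)
print(negated)
print(ranged)
```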

Capturing Groups

You can use capturing groups to treat multiple characters as a single unit. A capturing group is created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (SSN) creates a single group containing the letters S, S, and N.
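To make the idea of a group concrete, the sketch below compiles (SSN) with Python's re module, used here only as an illustrative engine, and inspects the group. The input PATIENT_SSN is a hypothetical column name.

```python
import re

# (SSN) is the grouped pattern from the text.
group_pattern = re.compile(r"(SSN)")

# PATIENT_SSN is a hypothetical input that contains the grouped letters.
match = group_pattern.search("PATIENT_SSN")

group_count = group_pattern.groups  # number of capturing groups in the pattern
captured = match.group(1)           # text matched by the first group

print(group_count, captured)
```

The group behaves as one unit, which is what allows a quantifier or an alternation to apply to all of its characters at once.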

Quantifiers

You can use quantifiers to specify the number of occurrences to match against.

The following table describes some common quantifiers.

Quantifier Description
X?

Matches zero or one occurrence of the specified character or group of characters.

Example: SSN_NUMBERS? matches strings SSN_NUMBER and SSN_NUMBERS.

X*

Matches zero or more occurrences of the specified character or group of characters.

Example: TERM.*DATE matches strings like TERMDATE, TERM_DATE and LAST_TERMINATION_DATE.

X+

Matches one or more occurrences of the specified character or group of characters.

Example: TERM.+DATE matches strings like TERM_DATE and TERMINATION_DATE, but not TERMDATE.

X{n}

Matches the specified character or group of characters exactly n times.

Example: 9{3} matches 999, but not 99.

X{n,}

Matches the specified character or group of characters at least n times.

Example: 9{3,} matches 999, 9999, and 99999, but not 99.

X{n,m}

Matches the specified character or group of characters at least n times but not more than m times.

Example: 9{3,4} matches 999 and 9999, but not 99.

You can also use quantifiers with character classes and capturing groups.

An example of a regular expression that uses a character class is SSN[0-9]+, which matches strings like SSN0, SSN1, and SSN12. Here, [0-9] is a character class that must occur one or more times, so the regular expression does not match SSN on its own.

An example of a regular expression that uses a capturing group is SSN_NUM(BER)?, which matches SSN_NUM and SSN_NUMBER. Here, (BER) is a capturing group that may occur zero times or one time.
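The quantifier examples above can be verified with a small script. This sketch assumes Python's re module as the engine, which may differ from the one used by the product. The helper found and the input 9999 are hypothetical additions.

```python
import re

def found(pattern, text):
    """True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None

cases = [
    ("SSN_NUMBERS?", "SSN_NUMBER"),       # optional trailing S
    ("SSN_NUMBERS?", "SSN_NUMBERS"),
    ("TERM.*DATE", "TERMDATE"),           # zero characters in between
    ("TERM.*DATE", "LAST_TERMINATION_DATE"),
    ("TERM.+DATE", "TERM_DATE"),          # at least one character in between
    ("TERM.+DATE", "TERMDATE"),
    ("9{3}", "999"),
    ("9{3}", "99"),
    ("9{3}", "9999"),                     # contains 999, so a search succeeds
    ("9{3,}", "99999"),
    ("9{3,4}", "99"),
    ("SSN[0-9]+", "SSN12"),               # quantified character class
    ("SSN[0-9]+", "SSN"),
    ("SSN_NUM(BER)?", "SSN_NUM"),         # quantified capturing group
    ("SSN_NUM(BER)?", "SSN_NUMBER"),
]

quantifier_results = {(p, t): found(p, t) for p, t in cases}
for (p, t), ok in quantifier_results.items():
    print(f"{p!r} in {t!r}: {ok}")
```

Because the search looks anywhere in the input, 9{3} also succeeds on 9999. Limiting the total length requires the boundary matchers ^ and $.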

Boundary Matchers

You can use boundary matchers to make pattern matching more precise by specifying where in the string the match should take place. For example, you might be interested in finding a particular word, but only if it appears at the beginning or end of an input string.

The following table describes common boundary matchers.

Boundary Construct Description
^

Matches the specified character or group of characters at the beginning of a string (starts with search).

Example: ^VISA matches strings beginning with VISA.

$

Matches the specified character or group of characters at the end of a string (ends with search).

Example: NUMBER$ matches strings ending with NUMBER.

\b

Marks a word boundary. Matches the character or group of characters specified between a pair of \b only if it is a separate word (as opposed to substring within a longer string).

Example: \bAGE\b matches strings like EMPLOYEE AGE and PATIENT AGE INFORMATION, but does not match strings like AGEING and EMPLOYEEAGE.

If no boundary matcher is specified, a contains search is performed. For example, ELECTORAL matches strings containing ELECTORAL, such as ELECTORAL_ID, ID_ELECTORAL, and ELECTORALID.

An exact match search can be performed by using ^ and $ together. For example, ^ADDRESS$ searches for the exact string ADDRESS. It matches the string ADDRESS, but does not match strings like PRIMARY_ADDRESS and ADDRESS_HOME.
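The effect of each boundary matcher can be seen in the sketch below, which again uses Python's re module as an assumed stand-in engine. The inputs VISA_CARD, MY_VISA, PHONE_NUMBER, NUMBER_OF_ITEMS, and EMPLOYEE_AGE are hypothetical. The others come from the text.

```python
import re

def found(pattern, text):
    """True if the pattern occurs in the text, honoring any boundary matchers."""
    return re.search(pattern, text) is not None

starts_with = [found(r"^VISA", s) for s in ["VISA_CARD", "MY_VISA"]]
ends_with = [found(r"NUMBER$", s) for s in ["PHONE_NUMBER", "NUMBER_OF_ITEMS"]]

# In Python, "_" counts as a word character, so EMPLOYEE_AGE has no
# word boundary before AGE. Other engines may treat this differently.
whole_word = {
    s: found(r"\bAGE\b", s)
    for s in ["EMPLOYEE AGE", "PATIENT AGE INFORMATION", "AGEING",
              "EMPLOYEEAGE", "EMPLOYEE_AGE"]
}

exact = {
    s: found(r"^ADDRESS$", s)
    for s in ["ADDRESS", "PRIMARY_ADDRESS", "ADDRESS_HOME"]
}

print(starts_with, ends_with)
print(whole_word)
print(exact)
```

The EMPLOYEE_AGE case is worth noting when patterns are applied to column names, where underscores rather than spaces usually separate words.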

Logical Operators

To match any one of several alternative characters or groups of characters, separate the alternatives with the pipe, or vertical bar, character (|). For example, EMPLOY(EE|ER)_ID matches EMPLOYEE_ID and EMPLOYER_ID.
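The sketch below shows the alternation example and why the parentheses matter. It assumes Python's re module for illustration. EMPLOYED_ID, USER_ID, and the ungrouped pattern are hypothetical additions.

```python
import re

def found(pattern, text):
    """True if the pattern occurs anywhere in the text."""
    return re.search(pattern, text) is not None

# Grouped alternation: only the EE or ER part varies.
grouped = {
    s: found(r"EMPLOY(EE|ER)_ID", s)
    for s in ["EMPLOYEE_ID", "EMPLOYER_ID", "EMPLOYED_ID"]
}

# Without parentheses, the alternatives are the whole left and right sides:
# "EMPLOYEE" or "ER_ID". This also finds ER_ID inside USER_ID.
ungrouped_user_id = found(r"EMPLOYEE|ER_ID", "USER_ID")

print(grouped)
print(ungrouped_user_id)
```

Wrapping the alternatives in a group restricts the choice to the part of the pattern that is meant to vary.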

Examples

^JOB.*(TITLE|PROFILE|POSITION)$ matches strings beginning with JOB, followed by zero or more occurrences of any character, and ending with TITLE, PROFILE, or POSITION.

^[A-Z]{3}[0-9]{2}[A-Z0-9]$ matches strings that consist of exactly six characters: three letters, followed by two digits, followed by one letter or digit.

BIRTH.?(COUNTRY|PLACE)|(COUNTRY|PLACE).*BIRTH matches strings such as BIRTH COUNTRY, PATIENT_BIRTH_PLACE, PLACE_OF_BIRTH, and EMPLOYEE'S COUNTRY OF BIRTH.
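The three patterns above can be exercised together as follows. This is a minimal sketch under the assumption that Python's re module behaves like the target engine for these constructs. Every input string except those quoted in the text is hypothetical.

```python
import re

def found(pattern, text):
    """True if the pattern occurs in the text."""
    return re.search(pattern, text) is not None

job = r"^JOB.*(TITLE|PROFILE|POSITION)$"
code = r"^[A-Z]{3}[0-9]{2}[A-Z0-9]$"
birth = r"BIRTH.?(COUNTRY|PLACE)|(COUNTRY|PLACE).*BIRTH"

job_results = {
    s: found(job, s)
    for s in ["JOB_TITLE", "JOBPROFILE", "JOB_POSITION_ID", "MY_JOB_TITLE"]
}
code_results = {
    s: found(code, s)
    for s in ["ABC12X", "ABC123", "AB123X", "ABC12XY"]
}
birth_results = {
    s: found(birth, s)
    for s in ["BIRTH COUNTRY", "PATIENT_BIRTH_PLACE", "PLACE_OF_BIRTH",
              "EMPLOYEE'S COUNTRY OF BIRTH", "BIRTH_DATE"]
}

print(job_results)
print(code_results)
print(birth_results)
```

The first two patterns are anchored at both ends, so any extra leading or trailing text causes the match to fail. The third has no anchors and succeeds wherever either alternative appears.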