Regular Expressions
You can use regular expressions to describe a set of strings based on common characteristics shared by each string in the set.
A regular expression is a sequence of characters that defines a search pattern for matching text. Regular expressions vary in complexity, but once you understand the basics of how they are constructed, you can decipher or create any regular expression.
String Literals
The most basic form of pattern matching is the match of a string literal. For example, if the regular expression is EMP and the input string is EMP, the match succeeds because the strings are identical. This regular expression also matches any string containing EMP, such as EMPLOYEE, TEMP, and TEMPERATURE.
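The contains behavior described above can be sketched in Python. This is an illustration only: Python's re module is an assumption made for the example and is not necessarily the engine Oracle Data Safe uses, and the extra input DEPT is a hypothetical non-matching name.

```python
import re

# Literal pattern from the text. re.search scans the whole input,
# which mirrors the "contains" behavior described above.
pattern = re.compile("EMP")

candidates = ["EMP", "EMPLOYEE", "TEMP", "TEMPERATURE", "DEPT"]
matches = [name for name in candidates if pattern.search(name)]

print(matches)  # ['EMP', 'EMPLOYEE', 'TEMP', 'TEMPERATURE']
```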
Metacharacters
You can also use special characters that affect the way a pattern is matched. One of the most common is the dot (.) symbol, which matches any single character. For example, EMPLOYEE.ID matches EMPLOYEE_ID and EMPLOYEE-ID, but not EMPLOYEE_VERIFICATION_ID, because the dot stands for exactly one character. Here, the dot is a metacharacter: a character with special meaning interpreted by the matcher.
Some other metacharacters are: ^ $ ? + * \ - [ ] ( ) { }
If you want a metacharacter to be treated literally (as an ordinary character), you can use a backslash (\) to escape it. For example, the regular expression 9\+9 matches 9+9.
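A short sketch of the dot and of escaping, again assuming Python's re module purely for illustration. The unescaped pattern 9+9 and the input 999 are added here only to contrast with the escaped form and do not come from the text above.

```python
import re

# The dot matches exactly one character, so only a single separator fits.
dot_pattern = re.compile(r"EMPLOYEE.ID")
dot_results = {
    name: bool(dot_pattern.search(name))
    for name in ["EMPLOYEE_ID", "EMPLOYEE-ID", "EMPLOYEE_VERIFICATION_ID"]
}

# Escaped, the plus sign is an ordinary character.
escaped_plus = re.compile(r"9\+9")
literal_hit = bool(escaped_plus.search("9+9"))

# Unescaped, the plus is a quantifier: one or more 9s followed by a 9.
unescaped_plus = re.compile(r"9+9")
quantifier_on_literal = bool(unescaped_plus.search("9+9"))
quantifier_on_digits = bool(unescaped_plus.search("999"))

print(dot_results)
print(literal_hit, quantifier_on_literal, quantifier_on_digits)  # True False True
```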
Character Classes
A character class is a set of characters enclosed within square brackets. It specifies the characters that successfully match a single character from a given input string.
The following table describes some common regular expression constructs.
Construct | Description |
---|---|
[abc] | Matches any one of the characters listed within the square brackets. |
[^abc] | Matches any one character except those listed within the square brackets. |
[A-Z0-9] | Matches any one character in the ranges given within the square brackets. To specify a range, insert the dash metacharacter (-) between the first and last characters of the range. |
Oracle Data Safe also supports predefined character classes.
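The three bracket constructs in the table can be tried out with a minimal sketch. It assumes Python's re module, and the helper is_full_match is a hypothetical name introduced for this example. A full match is used so that each class is tested against exactly one character.

```python
import re

def is_full_match(pattern: str, text: str) -> bool:
    # fullmatch requires the entire text to match, which keeps the
    # single-character behavior of each class easy to see.
    return re.fullmatch(pattern, text) is not None

results = {
    "[abc] vs b": is_full_match(r"[abc]", "b"),        # listed character
    "[abc] vs d": is_full_match(r"[abc]", "d"),        # not listed
    "[^abc] vs d": is_full_match(r"[^abc]", "d"),      # anything but a, b, c
    "[^abc] vs a": is_full_match(r"[^abc]", "a"),      # excluded character
    "[A-Z0-9] vs Q": is_full_match(r"[A-Z0-9]", "Q"),  # inside A-Z
    "[A-Z0-9] vs 7": is_full_match(r"[A-Z0-9]", "7"),  # inside 0-9
    "[A-Z0-9] vs q": is_full_match(r"[A-Z0-9]", "q"),  # lowercase is outside both ranges
}

for label, matched in results.items():
    print(f"{label}: {matched}")
```

A character class always consumes one character, so [abc] does not match the two-character string ab in full.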
Capturing Groups
You can use capturing groups to treat multiple characters as a single unit. A capturing group is created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (SSN) creates a single group containing the letters S, S, and N.
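To illustrate what treating characters as a single unit means, the sketch below assumes Python's re module and uses hypothetical inputs. It applies the + quantifier (one or more occurrences) once to the group and once without parentheses.

```python
import re

# The parentheses capture the text they match as group 1.
match = re.search(r"(SSN)", "SSN_NUMBER")
captured = match.group(1)

# A quantifier placed after a group applies to the whole unit...
grouped_repeat = re.fullmatch(r"(SSN)+", "SSNSSN") is not None
# ...whereas without parentheses it applies only to the last character.
ungrouped_repeat = re.fullmatch(r"SSN+", "SSNSSN") is not None
ungrouped_trailing_n = re.fullmatch(r"SSN+", "SSNNN") is not None

print(captured, grouped_repeat, ungrouped_repeat, ungrouped_trailing_n)
# SSN True False True
```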
Quantifiers
You can use quantifiers to specify the number of occurrences to match against.
The following table describes some common quantifiers.
Quantifier | Description |
---|---|
X? | Matches zero or one occurrence of the specified character or group of characters. |
X* | Matches zero or more occurrences of the specified character or group of characters. |
X+ | Matches one or more occurrences of the specified character or group of characters. |
X{n} | Matches the specified character or group of characters exactly n times. |
X{n,} | Matches the specified character or group of characters at least n times. |
X{n,m} | Matches the specified character or group of characters at least n times, but not more than m times. |
You can also use quantifiers with character classes and capturing groups.
An example of a regular expression that uses a character class is SSN[0-9]+, which matches strings like SSN0, SSN1, and SSN12. Here, [0-9] is a character class that must occur one or more times. The regular expression does not match SSN, because no digit follows.
An example of a regular expression that uses a capturing group is SSN_NUM(BER)?, which matches SSN_NUM and SSN_NUMBER. Here, (BER) is a capturing group that may occur zero or one time.
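The two patterns above, along with the counted quantifiers from the table, can be checked with a small sketch. Python's re module is assumed for illustration, and the digit strings used for the counted forms are hypothetical inputs.

```python
import re

# One or more digits must follow SSN.
ssn_digits = re.compile(r"SSN[0-9]+")
digit_hits = [s for s in ["SSN0", "SSN1", "SSN12", "SSN"] if ssn_digits.search(s)]

# The group (BER) is optional: zero or one occurrence.
ssn_num = re.compile(r"SSN_NUM(BER)?")
optional_hits = [s for s in ["SSN_NUM", "SSN_NUMBER", "SSN_NO"] if ssn_num.search(s)]

def full(pattern: str, text: str) -> bool:
    # Full match, so the repetition count cannot be satisfied by a substring.
    return re.fullmatch(pattern, text) is not None

exactly_three = [full(r"[0-9]{3}", s) for s in ("12", "123", "1234")]
at_least_two = [full(r"[0-9]{2,}", s) for s in ("1", "12", "12345")]
two_to_four = [full(r"[0-9]{2,4}", s) for s in ("1", "12", "1234", "12345")]

print(digit_hits)     # ['SSN0', 'SSN1', 'SSN12']
print(optional_hits)  # ['SSN_NUM', 'SSN_NUMBER']
print(exactly_three, at_least_two, two_to_four)
```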
Boundary Matchers
You can use boundary matchers to make pattern matching more precise by specifying where in the string the match should take place. For example, you might be interested in finding a particular word, but only if it appears at the beginning or end of an input string.
The following table describes common boundary matchers.
Boundary Construct | Description |
---|---|
^ | Matches the specified character or group of characters at the beginning of a string (starts with search). |
$ | Matches the specified character or group of characters at the end of a string (ends with search). |
\b | Marks a word boundary. Matches the character or group of characters specified between a pair of \b markers only when it appears as a separate word. |
If no boundary matcher is specified, a contains search is performed. For example, ELECTORAL matches strings containing ELECTORAL, such as ELECTORAL_ID, ID_ELECTORAL, and ELECTORALID.
An exact match search can be performed by using ^ and $ together. For example, ^ADDRESS$ searches for the exact string ADDRESS. It matches the string ADDRESS, but does not match strings like PRIMARY_ADDRESS and ADDRESS_HOME.
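The difference between contains, starts with, ends with, and exact searches can be sketched as follows. Python's re module is assumed for illustration. The \bAGE\b pattern and its inputs are hypothetical and rely on Python's definition of a word character (letters, digits, and underscore), which may differ in other engines.

```python
import re

names = ["ADDRESS", "PRIMARY_ADDRESS", "ADDRESS_HOME"]

contains = [n for n in names if re.search(r"ADDRESS", n)]
starts_with = [n for n in names if re.search(r"^ADDRESS", n)]
ends_with = [n for n in names if re.search(r"ADDRESS$", n)]
exact = [n for n in names if re.search(r"^ADDRESS$", n)]

# An underscore counts as a word character, so it does not create a
# word boundary, but a space does.
word_inputs = ["EMPLOYEE AGE", "EMPLOYEE_AGE", "AGEING"]
whole_word = [s for s in word_inputs if re.search(r"\bAGE\b", s)]

print(contains)     # all three names
print(starts_with)  # ['ADDRESS', 'ADDRESS_HOME']
print(ends_with)    # ['ADDRESS', 'PRIMARY_ADDRESS']
print(exact)        # ['ADDRESS']
print(whole_word)   # ['EMPLOYEE AGE']
```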
Logical Operators
To match any one of several alternative characters or groups of characters, separate the alternatives with the pipe or vertical bar character (|). For example, EMPLOY(EE|ER)_ID matches EMPLOYEE_ID and EMPLOYER_ID.
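The sketch below, assuming Python's re module, shows the alternation from the example and why the parentheses matter. The ungrouped pattern and the names EMPLOYMENT_ID and OWNER_ID are hypothetical additions for contrast.

```python
import re

# Inside the group, the pipe chooses between EE and ER only.
grouped = re.compile(r"EMPLOY(EE|ER)_ID")
grouped_hits = [
    n for n in ["EMPLOYEE_ID", "EMPLOYER_ID", "EMPLOYMENT_ID"] if grouped.search(n)
]

# Without a group, the pipe splits the whole pattern into two
# alternatives: "EMPLOYEE" or "ER_ID".
ungrouped = re.compile(r"EMPLOYEE|ER_ID")
ungrouped_hits = [
    n for n in ["EMPLOYEE", "OWNER_ID", "EMPLOYER_ID"] if ungrouped.search(n)
]

print(grouped_hits)    # ['EMPLOYEE_ID', 'EMPLOYER_ID']
print(ungrouped_hits)  # ['EMPLOYEE', 'OWNER_ID', 'EMPLOYER_ID']
```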
Examples
^JOB.*(TITLE|PROFILE|POSITION)$ matches strings beginning with JOB, followed by zero or more occurrences of any character, and ending with TITLE, PROFILE, or POSITION.
^[A-Z]{3}[0-9]{2}[A-Z0-9]$ matches six-character strings that consist of three uppercase letters, followed by two digits, followed by one uppercase letter or digit.
BIRTH.?(COUNTRY|PLACE)|(COUNTRY|PLACE).*BIRTH matches strings such as BIRTH COUNTRY, PATIENT_BIRTH_PLACE, PLACE_OF_BIRTH, and EMPLOYEE'S COUNTRY OF BIRTH.
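The three example patterns can be exercised together in one sketch. It assumes Python's re module. The matching inputs for the third pattern come from the text, while the remaining inputs are hypothetical and were chosen to show one failure mode each.

```python
import re

examples = {
    # Must start with JOB and end with one of the three words.
    r"^JOB.*(TITLE|PROFILE|POSITION)$": [
        "JOB_TITLE", "JOBPROFILE", "JOB_TITLE_CODE", "OLD_JOB_TITLE",
    ],
    # Exactly six characters: three letters, two digits, one letter or digit.
    r"^[A-Z]{3}[0-9]{2}[A-Z0-9]$": [
        "ABC12X", "ABC123", "AB123X", "ABC12XY",
    ],
    # BIRTH before COUNTRY/PLACE with at most one character between,
    # or COUNTRY/PLACE before BIRTH with anything between.
    r"BIRTH.?(COUNTRY|PLACE)|(COUNTRY|PLACE).*BIRTH": [
        "BIRTH COUNTRY", "PATIENT_BIRTH_PLACE", "PLACE_OF_BIRTH",
        "EMPLOYEE'S COUNTRY OF BIRTH", "BIRTH_DATE", "BIRTH__PLACE",
    ],
}

results = {
    pattern: [text for text in inputs if re.search(pattern, text)]
    for pattern, inputs in examples.items()
}

for pattern, hits in results.items():
    print(pattern, "->", hits)
```

In this sketch, BIRTH__PLACE is rejected because .? allows at most one character between BIRTH and PLACE, and the second alternative requires PLACE to come before BIRTH.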