Regular Expression Conditions

The regex_like function performs regular expression matching. A regular expression is a pattern that the regular expression engine attempts to match with an input string. The syntax for invoking the regex_like function in a query is the same as all other functions, described in the Function Calls section. The regex_like function has 2 signatures with 2 and 3 parameters, respectively.

Syntax

boolean regex_like(any*, string)
 
boolean regex_like(any*, string, string)

Semantics

The regex_like function provides functionality similar to the LIKE operator in standard SQL, that is, it can be used to check if an input string matches a given pattern. The input string and the pattern are computed by the first and second arguments, respectively. A third, optional, argument specifies a set of flags that affect how the matching is done.

Normally, the regex_like function expects each of its arguments to return a single string. If that is not the case, it behaves as follows:

  • If it can be detected at compile time that the first argument will never return a string, it raises a compile-time error. Otherwise, it returns false if the first argument returns nothing, or more than one items, or a single item that is neither a string nor NULL.
  • It raises an error if the pattern or flags do not return a single string or NULL.
  • It returns NULL if any of the arguments returns a single NULL.

Otherwise, the regex_like function behaves as follows:

  • Raises an error if the pattern string is not valid or its length is greater than 512 characters.
  • Returns false if pattern does not match the input string.
  • Returns true if pattern matches input string.

The pattern string is the regular expression against which the input text is matched. The syntax of the pattern string is a subset of the one supported by the java Pattern class, see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html. Specifically, each character in a regular expression is either a literal character that matches itself, or a meta character, that specifies a "construct" having a special meaning. Only the following constructs are supported: quoted characters, the quotation constructs, the period (.), and the greedy quantifier (*).

The period (.) is a meta-character that matches every character expect a new line. The greedy quantifier (*) is a meta-character that indicates zero or more occurrences of the preceding element. For example, the regex "D.*" matches any string that starts with the character 'D' and followed by zero or more characters.

For the full list of supported predefined quoted characters see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html.

The flags string contains one or more characters, where each character is a flag specifying some particular behavior. The full list of acceptable characters and their semantics is listed in the following table:

Table 6-1 Predefined Quoted Characters

Flag Full Name Description
"d" UNIX_LINES

Enables Unix lines mode.

In this mode, only the '\n' line terminator is recognized in the behavior of period (.).

"i" CASE_INSENSITIVE

Enables case-insensitive matching.

By default, CASE_INSENSITIVE matching assumes that only characters in the US-ASCII character set are being matched. You can enable Unicode-aware CASE_INSENSITIVE by specifying the UNICODE_CASE flag in conjunction with this flag.

Specifying this flag may impose a slight performance penalty.

"x" COMMENTS

Permits white space and comments in pattern.

In this mode, white space is ignored, and embedded comments starting with # are ignored until the end of a line.

"l" LITERAL

When LITERAL is specified then the input string that specifies the pattern is treated as a sequence of literal characters. There is no special meaning for Metacharacters or escape sequences. The flags CASE_INSENSITIVE and UNICODE_CASE retain their impact on matching when used in conjunction with this flag. The other flags become superfluous.

"s" DOTALL

Enables DOTALL mode. In DOTALL mode, the expression dot (.) matches any character, including a line terminator. However, by default, the expression dot (.) does not match line terminators.

"u" UNICODE_CASE

When you enable the CASE_INSENSITIVE flag, by default, it does matching using only the characters in the US-ASCII character set. When you specify UNICODE_CASE then it does CASE_INSENSITIVE matching using the Unicode standard.

Specifying this flag may impose a performance penalty.

"c" CANON_EQ

When this flag is specified then two characters will be considered to match if, and only if, their full canonical decompositions match. When you specify this flag, the expression "a\u030A", for example, will match the string "\u00E5. By default, matching does not take canonical equivalence into account.

Specifying this flag may impose a performance penalty.

"U" UNICODE_CHARACTER_CLASS

Enables the Unicode version of Predefined character classes and POSIX character classes. When you specify this flag, then the (US-ASCII only) Predefined character classes and POSIX character classes are in conformance with Unicode Technical Standards. See http://unicode.org/reports/tr18/#Compatibility_Properties.

The flag implies UNICODE_CASE; it enables Unicode-aware case folding.Specifying this flag may impose a performance penalty.

Note:

The regex_like function will not be used for index scan optimization.

Example 6-39 Regular Expressions

To select all the people whose last name starts with 'S', use the pattern shown in the following query:

SELECT id, lastName FROM users where regex_like(lastName, "S.*");
 
 +----+----------+
 | id | lastName |
 +----+----------+
 | 10 | Smith    |
 +----+----------+
 
1 row returned

Example 6-40 Regular Expressions

To select all the people whose last name has at least one 'a' or 'A' in it, use the pattern and the flag for case insensitive shown in the following query:

SELECT id, lastName from users where regex_like(lastname,".*a.*","i");
 
 +----+----------+
 | id | lastName |
 +----+----------+
 | 20 | Ann      |
 | 30 | Paul     |
 +----+----------+
 
2 rows returned