Regular Expression Conditions

The regex_like function performs regular expression matching. A regular expression is a pattern that the regular expression engine attempts to match with an input string. The syntax for invoking the regex_like function in a query is the same as all other functions, described in the Function Calls section. The regex_like function has two signatures with two and three parameters, respectively.

Syntax

boolean regex_like(any*, string)
 
boolean regex_like(any*, string, string)

Semantics

The regex_like function provides functionality similar to the LIKE operator in standard SQL, that is, it can be used to check if an input string matches a given pattern. The input string and the pattern are computed by the first and second arguments, respectively. A third, optional, argument specifies a set of flags that affect how the matching is done.

Normally, the regex_like function expects each of its arguments to return a single string. If that is not the case, it behaves as follows:

  • If it can be detected at compile time that the first argument will never return a string, it raises a compile-time error. Otherwise, it returns false if the first argument returns nothing, more than one item, or a single item that is neither a string nor NULL.
  • It raises an error if the pattern or flags do not return a single string or NULL.
  • It returns NULL if any of the arguments returns a single NULL.

Otherwise, the regex_like function behaves as follows:

  • Raises an error if the pattern string is not valid or its length is greater than 512 characters.
  • Returns false if the pattern does not match the input string.
  • Returns true if the pattern matches the input string.
The pattern string is the regular expression against which the input text is matched. The syntax of the pattern string is a subset of the one supported by the Java Pattern class. Specifically, each character in a regular expression is either a literal character that matches itself (for example, the pattern string x matches the character 'x'), or a meta character, that specifies a construct having a special meaning. Only the following metacharacters are supported:
  • Period (.)

    You use period metacharacter to match every character except a new line

  • Greedy quantifier (*)

    You use the greedy quantifier (*) to indicate zero or more occurrences of the preceding element.

    For example, the character D with a combination of period (.) and (*) "D.*", matches any string that starts with the character 'D' and is followed by zero or more characters.

  • Quotation constructs (\, \Q, \E)

    You use the backslash '\' character as a quotation construct with other metacharacters to instruct the regular expression engine to interpret the following metacharacter as a literal character.

    For example, you use the '\*' pattern to match the literal asterisk '*' character and not interpret it as the greedy quantifier (*).

    You use '\Q' and '\E' quotation constructs in the query to match the exact string pattern that starts after the '\Q' character and ends at the '\E' character.

    For example, you use "\\Q**Houston**\\E" to match any string that has a value '**Houston**'.

    Note:

    In the SQL shell or other equivalent tools,you must use a double backslash '\\' in place of a single backslash '\' quotation construct.
  • Escape sequences

    You use the escape sequences to match certain predefined characters. The following escape sequences are supported:

The flags string contains one or more characters, where each character is a flag specifying some particular behavior. The full list of acceptable characters and their semantics is listed in the following table:

Table 6-2 Escape sequences

Character Description
\x{h...h} Matches the character with hexadecimal value 0xh...h (Character.MIN_CODE_POINT <= 0xh...h <= Character.MAX_CODE_POINT)
\xhh Matches the character with hexadecimal value 0xhh
\uhhhh

Matches the Unicode character specified by the hexadecimal number 0xhhhh.

Example: 'u000A' matches the newline character

\t Matches the tab character ('\u0009')
\r Matches the carriage-return character ('\u000D')
\n Matches the newline (line feed) character ('\u000A')
\f Matches the form-feed character ('\u000C')
\e Matches the escape character ('\u001B')
\cx Matches the control character corresponding to x
\a Matches the alert (bell) character ('\u0007')
\\ Matches the backslash '\' character ('\u005C')
\0nn Matches the character with octal value 0nn(0 <= n<= 7)
\0n Matches the character with octal value 0n (0 <= n <= 7)
\0mnn Matches the character with octal value 0mnn(0 <= m<= 3, 0 <= n<= 7)

The flag contains one or more characters, where each character flag specifies some particular behavior. The full list of flag characters and their semantics are listed in the following table:

Table 6-3 Predefined Quoted Characters

Flag Full Name Description
"d" UNIX_LINES

Enables Unix lines mode.

The new line character ('\n' ) is the only line termination method recognized in this mode.

"i" CASE_INSENSITIVE

Enables case-insensitive matching.

By default, CASE_INSENSITIVE matching assumes that only characters in the US-ASCII character set are being matched. You can enable Unicode-aware CASE_INSENSITIVE by specifying the UNICODE_CASE flag in conjunction with this flag.

Specifying this flag may impose a slight performance penalty.

"x" COMMENTS

Permits white space and comments in pattern.

In this mode, white space is ignored, and embedded comments starting with # are ignored until the end of a line.

"l" LITERAL

When LITERAL is specified then the input string that specifies the pattern is treated as a sequence of literal characters. There is no special meaning for Metacharacters or escape sequences. The flags CASE_INSENSITIVE and UNICODE_CASE retain their impact on matching when used in conjunction with this flag. The other flags become superfluous.

"s" DOTALL

Enables DOTALL mode. In DOTALL mode, the expression dot (.) matches any character, including a line terminator. However, by default, the expression dot (.) does not match line terminators.

"u" UNICODE_CASE

When you enable the CASE_INSENSITIVE flag, by default, it matches using only the characters in the US-ASCII character set. When you specify UNICODE_CASE then it performs CASE_INSENSITIVE matching using the Unicode standard.

Specifying this flag may impose a performance penalty.

"c" CANON_EQ

When this flag is specified then two characters will be considered to match if, and only if, their full canonical decompositions match. When you specify this flag, the expression "a\u030A", for example, will match the string "\u00E5. By default, matching does not take canonical equivalence into account.

Specifying this flag may impose a performance penalty.

"U" UNICODE_CHARACTER_CLASS

Enables the Unicode version of Predefined character classes and POSIX character classes. When you specify this flag, then the (US-ASCII only) Predefined character classes and POSIX character classes are in conformance with Unicode Technical Standards. See http://unicode.org/reports/tr18/#Compatibility_Properties.

The flag implies UNICODE_CASE; it enables Unicode-aware case folding.Specifying this flag may impose a performance penalty.

Note:

The regex_like function will not be used for index scan optimization.
If you want to follow along with the examples, create and load data into user data application table illustrated in this section and airline baggage tracking application table illustrated in the Tables used in the Examples topic.

Note:

Run the queries provided in the below examples from the SQL shell. Make sure that you use a double backslash '\\' in place of a single backslash '\' for quotation constructs as shown in the queries.

Example 6-68 Fetch from the user data application table the list of users whose last name starts with 'S'

SELECT id, firstName, lastName FROM users WHERE regex_like(lastName, "S.*")

Explanation: In the above query, you use the regex_like function to match the lastName field beginning with an 'S' character. You create a pattern with the first character as 'S' followed by the period metacharacter (.) and the greedy quantifier (* ) to match zero or more occurrences of any other character.

Output:

{"id":10,"firstName":"John","lastName":"Smith"}
{"id":20,"firstName":"Jane","lastName":"Smith"}
{"id":30,"firstName":"Adam","lastName":"Smith"}
{"id":40,"firstName":"Joanna","lastName":"Smith"}

Example 6-69 Fetch from the user data application table the list of users whose last name has at least one 'w' or 'W'

SELECT id, lastName FROM users WHERE regex_like(lastname,".*w.*","i")

Explanation: In the above query, use the regex_like function with the required pattern and the 'i' flag to enable matching that is not case-sensitive.

Output:
{"id":60,"lastName":"Law"}
{"id":50,"lastName":"Winslet"}

Example 6-70 Validate the format of an e-mail address in the user data application table

SELECT id, firstName, lastName, email FROM users WHERE
      regex_like(email,".*@.*\\..*")
Explanation: In the above query, you fetch the list of users with a valid email format. In the user data application table, the email field contains the e-mail address of the user. Assuming the email field has an '@' character and ends with a '.string' pattern such as '.com', '.us', '.in', and so forth, you can validate the e-mail address using the regex_like function as follows:
  • Use the '@' character preceded and followed by a combination of period metacharacter (.) and the greedy quantifier (* ). This combination creates a pattern to match zero or more occurrences of other characters before and after the '@' character in the email field.
  • To achieve a domain name pattern such as '.com', you use the quotation construct backslash '\' character to match the period as a literal '.' character and not a period (.) metacharacter. Further, use the combination of period metacharacter (.) and the greedy quantifier (* ) to allow any domain name.

Note:

This example only validates the e-mail address format by checking the pattern availability of '@' and '.' characters in the specified order. It does not assure the validity of the e-mail address itself. All the rows that include the mentioned character pattern are fetched.
Output:
{"id":50,"firstName":"Aubrey","lastName":"Winslet","email":"reachaubrey@somemail.co.us"}
{"id":60,"firstName":"Jimmy","lastName":"Law","email":"reachjimlaw@gotmail.co.us"}
{"id":20,"firstName":"Jane","lastName":"Smith","email":"jane.smith201@reachmail.com"}
{"id":10,"firstName":"John","lastName":"Smith","email":"john.smith@reachmail.com"}

Example 6-71 Fetch from the user data application table the list of users with a five-star rating for community service

SELECT id, firstName, lastName FROM users WHERE
      regex_like(communityService,"\\Q*****\\E")
Explanation: In the above query, you fetch the list of users actively involved in community service. Assuming each user is rated for their involvement in community service with a certain '*' rating, you retrieve the list of users from the user data application table who have five stars, that is, "*****" in the communityService field. You use the regex_like function with the quotation constructs '\Q' and '\E' to match the pattern of five asterisk '*' characters. In this query, the asterisk (* ) character is used as a literal '*' character and not as a greedy quantifier (* ).

Note:

If you create the pattern without quotation constructs, an error is generated indicating that the specified pattern for the regex_like function is invalid.
Output:
{"id":20,"firstName":"Jane","lastName":"Smith"}

Example 6-72 Fetch from the user data application table the list of users whose street attribute of the address field has a suspected data entry error due to the presence of a tab character

SELECT id, firstName, users.address.street FROM users WHERE regex_like(users.address.street, ".*\t.*")
Explanation: You can use the regex_like function to identify the rows that include escape sequences. For a detailed list of supported predefined characters, see Table 6-2. In this query, you fetch the list of users from the user data application table whose street attribute of the address field erroneously includes a tab character. You use the regex_like function with the following pattern:
  • Use escape sequence '\t' to identify the tab character.
  • Use the combination of period metacharacter (.) and the greedy quantifier (*) before and after the escape sequence to allow zero or more occurrences of any other character in the street attribute.
Output:
{"id":70,"firstName":"Dierdre","street":"Maine\t(Suburb)"}
Alternatively, you can match the escape sequences using the corresponding Unicode hexadecimal character as shown in the query below. The Unicode value 0x0009 corresponds to a horizontal tab or a tab character. You get the same output as above.
SELECT id, firstName, users.address.street FROM users WHERE regex_like(users.address.street,
      ".*\u0009.*")

Note:

Oracle NoSQL Database supports insertion of control characters (ASCII code 0~31) and characters with ASCII code > 128 using their Unicode hexadecimal values in the SQL shell or equivalent tools.

For example, you can insert the Escape (ESC) character using its Unicode hexadecimal value 0x001B as given in the user data application table (see the state field in the row with id = 70).

If you want to fetch the corresponding row, match the pattern through regular expression using '\e' as provided in the query below:
SELECT id, firstName, users.address[].state FROM users WHERE
      EXISTS(users.address[regex_like($element.state, ".*\\e")])

Explanation: In this query, you check whether or not the state attribute of the address field includes a string that ends with an escape character. You use the quotation construct '\' to match the escape character preceded by the combination of period metacharacter (.) and greedy quantifier (*) to allow zero or more occurrences of other characters before the escape character.

Output:
{"id":70,"firstName":"Dierdre","state":"TX\u001B"}

Example 6-73 Find all the bags that traveled through ORD airport as an intermediate hop between the source of a trip and the final destination

SELECT 
ticketNo AS TICKETNUM,
fullName AS NAME, 
BaggageInfo.bagInfo.routing[] AS ROUTING
FROM BaggageInfo WHERE regex_like(BaggageInfo.bagInfo.routing[],".*/ORD/.*")
Explanation: In an airline baggage tracking application, you can fetch the list of passengers who had to transit at certain airports. For each passenger, the routing field in the BaggageInfo table contains the airport codes in the format source/transit/destination. In this query, you use the regex_like function to match the airport code of the transit airport as follows:
  • Use the combination of the period (. ) metacharacter and greedy quantifier (*) to allow any source airport code characters.
  • Include the airport code of the transit airport ORD between the two forward slash '/' characters.
  • Use the combination of the period (. ) metacharacter and greedy quantifier (*) to allow any destination airport code characters.
Output:
{"TICKETNUM":176234463813,"NAME":"Henry Jenkins","ROUTING":"SFO/ORD/FRA"}
{"TICKETNUM":1762392135540,"NAME":"Adelaide Willard","ROUTING":"GRU/ORD/SEA"}

Example 6-74 Find all the passengers with area code 364 in their contact phone.

SELECT ticketNo, contactPhone, fullName FROM BaggageInfo WHERE regex_like(contactPhone,"364-.*")
Explanation: In an airline baggage tracking application, you can fetch the list of passengers having the same area code in their contact phones. The contactPhone field contains the US-based contact number in a three-digit area code followed by a seven-digit local number format. Assuming the contact phone pattern in the contactPhone field is XXX-YYY-ZZZZ where X, Y, and Z are digits between 0-9, you use the regex_like function as follows:
  • Use the area code 364 followed by a hyphen '-' character.
  • Use the combination of the period (.) metacharacter and greedy quantifier (*) to allow the pattern matching of any three-digit number followed by a hyphen '-' character and a four-digit number.
Output:
{"ticketNo":1762320369957,"contactPhone":"364-610-4444","fullName":"Lorenzo Phil"}
{"ticketNo":1762320569757,"contactPhone":"364-610-4455","fullName":"Lucinda Beckman"}

User data application table

The following code creates the user data application table - users table.
CREATE TABLE users ( id INTEGER,
                     firstName STRING,
                     lastName STRING,
                     otherNames ARRAY(RECORD(first STRING, last STRING)),
                     age INTEGER,
                     income INTEGER,
                     address JSON,
                     connections ARRAY(INTEGER),
                     email STRING,
                     communityService STRING,
PRIMARY KEY (id) );
The following code populates users tables with sample rows.
INSERT INTO users VALUES ( 

   10,   
   "John",
   "Smith",
   [ {"first" : "Johny", "last" : "Good"}, {"first" : "Johny2", "last" : "Brave"},{"first" : "Johny3", "last" : "Kind"},{"first" : "Johny4", "last" : "Humble"} ],
   22,
   45000,
   {
      "street" : "Pacific Ave",
      "number" : 101,
      "city" : "Santa Cruz",
      "state" : "CA",
      "zip" : 95008,
      "phones" : [
         { "area" : 408, "number" : 4538955, "kind" : "work" },
         { "area" : 831, "number" : 7533341, "kind" : "home" },
         { "area" : 831, "number" : 7533382, "kind" : "mobile" }
      ]
   },
   [ 30, 55, 43 ],
   "john.smith@reachmail.com",
   "****"
);

INSERT INTO users VALUES ( 

   20,   
   "Jane",
   "Smith",
   [ {"first" : "Jane", "last" : "BeGood"} ],
   22,
   55000,
   {
      "street" : "Atlantic Ave",
      "number" : 201,
      "city" : "San Jose",
      "state" : "CA",
      "zip" : 95005,
      "phones" : [
         { "area" : 608, "number" : 6538955, "kind" : "work" },
         { "area" : 931, "number" : 9533341, "kind" : "home" },
         { "area" : 931, "number" : 9533382, "kind" : "mobile" }
      ]
   },
   [ 40, 75, 63 ],
   "jane.smith201@reachmail.com",
   "*****"
);

INSERT INTO users VALUES ( 

   30,   
   "Adam",
   "Smith",
   [ {"first" : "Adam", "last" : "BeGood"} ],
   45,
   75000,
   {
      "street" : "Indian Ave",
      "number" : 301,
      "city" : "Houston",
      "state" : "TX",
      "zip" : 95075,
      "phones" : [
         { "area" : 618, "number" : 6618955, "kind" : "work" },
         { "area" : 951, "number" : 9613341, "kind" : "home" },
         { "area" : 981, "number" : 9613382, "kind" : "mobile" }
      ]
   },
   [ 60, 45, 73 ],
   "adam.smith201reachmail.com",
   "***"
);

INSERT INTO users VALUES ( 

   40,   
   "Joanna",
   "Smith",
   [ {"first" : "Joanna", "last" : "Smart"} ],
   NULL,
   75000,
   {
      "street" : "Tex Ave",
      "number" : 401,
      "city" : "Houston",
      "state" : "TX",
      "zip" : 95085,
      "phones" : [
         { "area" : NULL, "number" : 1618955, "kind" : "work" },
         { "area" : 451, "number" : 4613341, "kind" : "home" },
         { "area" : 481, "number" : 4613382, "kind" : "mobile" }
      ]
   },
   [ 70, 30, 40 ],
   NULL,
   "**"
);

INSERT INTO users VALUES ( 

   50,   
   "Aubrey",
   "Winslet",
   [ {"first" : "Aubrey", "last" : "Regal"} ],
   NULL,
   15000,
   {
      "street" : "Tex Ave",
      "number" :501,
      "city" : "Houston",
      "state" : "TX",
      "zip" : 95085,
      "phones" : [
         { "area" : 723, "number" : 7618955, "kind" : "work" },
         { "area" : 751, "number" : 7613341, "kind" : "home" },
         { "area" : 781, "number" : 7613382, "kind" : "mobile" }
      ]
   },
   [ 50, 20, 40 ],
   "reachaubrey@somemail.co.us",
   "****"
);

INSERT INTO users VALUES ( 

   60,   
   "Jimmy",
   "Law",
   [ {"first" : "Jimmy", "last" : "Smart"} ],
   NULL,
   25000,
   {
      "street" : "Maine",
      "number" :501,
      "city" : "San Jose",
      "state" : "TX",
      "zip" : 95085,
      "phones" : [
         { "area" : 223, "number" : 2618955, "kind" : "work" },
         { "area" : 251, "number" : 2613341, "kind" : "home" },
         { "area" : 281, "number" : 2613382, "kind" : "mobile" }
      ]
   },
   [ 50, 20, 40 ],
   "reachjimlaw@gotmail.co.us",
   "***"
);

INSERT INTO users VALUES ( 

   70,   
   "Dierdre",
   "Amador",
   [ {"first" : "Dierdre", "last" : "Kind"} ],
   NULL,
   25000,
   {
      "street" : "Maine\t(Suburb)",
      "number" :701,
      "city" : "San Jose",
      "state" : "TX\u001B",
      "zip" : 95085,
      "phones" : [
         { "area" : 223, "number" : 6718955, "kind" : "work" },
         { "area" : 251, "number" : 6213341, "kind" : "home" },
         { "area" : 281, "number" : 6213382, "kind" : "mobile" }
      ]
   },
   [ 10, 60, 40 ],
   NULL,
   "***"
);