String Functions (Regular Expressions)¶
These string functions perform operations that match a regular expression (often referred to as a “regex”).
In this Topic:
List of Regex Functions¶
Function |
Notes |
---|---|
Alias for RLIKE. |
|
Alias for RLIKE. |
|
General Usage Notes¶
In these notes, “subject” refers to the string to operate on and “pattern” refers to the regular expression:
The subject is typically a variable column, while the pattern is typically a constant, but this is not required; every argument to a regular expression function can be either a constant or variable.
Patterns support the full POSIX ERE (Extended Regular Expression) syntax. For details, see the POSIX basic and extended section (in Wikipedia).
Patterns also support the following Perl backslash-sequences:
\d
: decimal digit (0-9).\D
: not a decimal digit.\s
: whitespace character.\S
: not a whitespace character.\w
: “word” character (a-z, A-Z, underscore (“_”), or decimal digit).\W
: not a word character.\b
: word boundary.\B
: not a word boundary.
For details, see the Character classes section (in Wikipedia) or the Backslash sequences section (in the Perl documentation).
Note that the backslash character needs to be escaped (i.e.
\\
becomes\
). For more information, see Escape Characters and Caveats (in this topic).By default, the POSIX wildcard character
.
(in the pattern) does not include newline characters\n
(in the subject) as matches.To also match newline characters, either replace
.
with(.|\n)
in thepattern
argument, or use thes
parameter in theparameters
argument (described below).All the regular expression functions support Unicode. A single Unicode character always counts as one character (i.e. the POSIX meta-character
.
matches exactly one Unicode character), regardless of the byte-length of the corresponding binary representation of that character. Also, for functions that take or return subject offsets, a single Unicode character counts as 1.
Parameters¶
Most regular expression functions support an optional parameters
argument as the very last input. The parameters
argument is a VARCHAR string that specifies the matching
behavior of the regular expression function. The following parameters are supported:
Parameter |
Effect |
---|---|
|
Enables case-sensitive matching. |
|
Enables case-insensitive matching. |
|
Enables multi-line mode (i.e. meta-characters |
|
Extracts sub-matches; applies only to REGEXP_INSTR and REGEXP_SUBSTR. |
|
Enables the POSIX wildcard character |
The default string is simply c
, which specifies:
Case-sensitive matching.
Single-line mode.
No sub-match extraction, except for REGEXP_REPLACE, which always uses sub-match extraction.
POSIX wildcard character
.
does not match\n
newline characters.
When specifying multiple parameters, the string is entered with no spaces or delimiters.
For example, ims
specifies case-insensitive matching in multi-line mode with POSIX wildcard matching.
If both c
and i
are included in the parameters
string, the one that occurs last in the string dictates whether the function performs case-sensitive or case-insensitive
matching. For example, ci
specifies case-sensitive matching because the “i” occurs last in the string.
Backreferences¶
Snowflake does not support backreferences in regular expression patterns (known as “squares” in formal language theory); however, backreferences are supported in the replacement string of the REGEXP_REPLACE function.
Corner Cases¶
In most regexp functions, an empty pattern (i.e. ''
) matches nothing, not even an empty subject.
The exceptions are REGEXP_LIKE and its aliases REGEXP and RLIKE,
in which the empty pattern matches the empty subject because the pattern is implicitly anchored at both ends
(i.e. ''
automatically becomes '^$'
).
An empty group (i.e. sub-expression ()
), matches the space in between characters, including the beginning and end of the subject.
Escape Characters and Caveats¶
As mandated by the POSIX standard, the single backslash character \
is used to escape meta-characters (e.g. \*
or \?
). The backslash character is also used for so-called
backslash-sequences (e.g. \w
).
Note that the backslash character is further used to insert control characters into SQL strings (e.g. \n
to insert a newline). As a result, to insert a single backslash character into
a SQL string literal, the backslash character needs to be escaped (i.e. \\
becomes \
).
For example, to insert the backreference \1
into a replacement string literal of REGEXP_LIKE, you might need to use \\\\1
.