REGEXP functions: Collated arguments are accepted (Pending)

Attention

This behavior change is in the 2026_04 bundle.

For the current status of the bundle, refer to Bundle history.

The behavior of the regular expression functions [ NOT ] REGEXP, REGEXP_COUNT, REGEXP_INSTR, REGEXP_LIKE, REGEXP_REPLACE, REGEXP_SUBSTR, REGEXP_SUBSTR_ALL (including the REGEXP_EXTRACT_ALL alias), and [ NOT ] RLIKE changes when their arguments have a collation specification.

Before the change:

When a REGEXP function received an argument with a collation specification, the function returned an error indicating that collation isn’t supported. To call a REGEXP function on a collated column, you had to apply the COLLATE function with an empty specification (COLLATE '') to the argument first.

After the change:

REGEXP functions accept arguments with a collation specification. The collation has no effect on how the regular expression is evaluated: matching is always case-sensitive unless you pass the i flag in the parameters argument. The collation specification (for example, en-ci) doesn’t make the match case-insensitive on its own.

For functions that return a string (REGEXP_SUBSTR and REGEXP_REPLACE), the collation of the subject argument propagates to the return value. The collation of other arguments (such as the pattern) is ignored. For REGEXP_SUBSTR_ALL, the result is an ARRAY, so collation isn’t propagated. To get an uncollated result from REGEXP_SUBSTR or REGEXP_REPLACE, apply COLLATE '' to the subject.

For example, when this change is enabled:

SELECT REGEXP_LIKE('Hello' COLLATE 'en-ci', 'hello');       -- Returns FALSE (case-sensitive).
SELECT REGEXP_LIKE('Hello' COLLATE 'en-ci', 'hello', 'i');  -- Returns TRUE  (case-insensitive via the 'i' flag).

-- The result of REGEXP_SUBSTR inherits the subject's collation (en-ci),
-- so the comparison is case-insensitive:
SELECT REGEXP_SUBSTR('HELLO' COLLATE 'en-ci', '.+') = 'hello';  -- Returns TRUE.

-- Stripping collation with COLLATE '' gives an uncollated (case-sensitive) result:
SELECT REGEXP_SUBSTR('HELLO' COLLATE '', '.+') = 'hello';       -- Returns FALSE.

This change reduces friction when working with collated columns in REGEXP functions, because you no longer have to strip the collation before passing a column to a REGEXP function.

Ref: 2314