snowflake.snowpark.functions.regexp_substr_all¶

snowflake.snowpark.functions.regexp_substr_all(subject: Union[snowflake.snowpark.column.Column, str], pattern: Union[snowflake.snowpark.column.Column, str], position: Union[snowflake.snowpark.column.Column, str] = None, occurrence: Union[snowflake.snowpark.column.Column, str] = None, regex_parameters: Union[snowflake.snowpark.column.Column, str] = None, group_num: Union[snowflake.snowpark.column.Column, str] = None) → Column[source]¶

Returns all substrings that match a regular expression within a string.

Parameters:

subject (ColumnOrName) – The string to search for matches.
pattern (ColumnOrName) – The regular expression pattern to match.
position (ColumnOrName, optional) – The position in the string to start searching from (1-based). Defaults to 1.
occurrence (ColumnOrName, optional) – Which occurrence of the pattern to return.
regex_parameters (ColumnOrName, optional) – String of one or more characters that specifies the parameters for the regular expression. Default is ‘c’ (case-sensitive).
values (Supported) –
- c: Case-sensitive matching
- i: Case-insensitive matching
- m: Multi-line mode
- e: Extract submatches
- s: Single-line mode (POSIX wildcard character . matches n)
group_num (ColumnOrName, optional) – The group number in the regular expression to extract. Defaults to None, which extracts the entire match.

Returns:

An array containing all matching substrings.

Return type:

Column

Examples::

>>> from snowflake.snowpark.functions import col, lit
>>> df = session.create_dataframe([['a1_a2a3_a4A5a6']], schema=["subject"])
>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]')).alias("result")).collect()
[Row(RESULT='[\n  "a1",\n  "a2",\n  "a3",\n  "a4",\n  "a6"\n]')]

>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]'), lit(2)).alias("result")).collect()
[Row(RESULT='[\n  "a2",\n  "a3",\n  "a4",\n  "a6"\n]')]

>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]'), lit(1), lit(3)).alias("result")).collect()
[Row(RESULT='[\n  "a3",\n  "a4",\n  "a6"\n]')]

>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]'), lit(1), lit(1), lit('i')).alias("result")).collect()
[Row(RESULT='[\n  "a1",\n  "a2",\n  "a3",\n  "a4",\n  "A5",\n  "a6"\n]')]

>>> df.select(regexp_substr_all(col("subject"), lit('(a)([[:digit:]])'), lit(1), lit(1), lit('ie'), lit(1)).alias("result")).collect()
[Row(RESULT='[\n  "a",\n  "a",\n  "a",\n  "a",\n  "A",\n  "a"\n]')]

>>> df.select(regexp_substr_all(col("subject"), lit('b')).alias("result")).collect()
[Row(RESULT='[]')]