snowflake.snowpark.functions.regexp_substr_all

snowflake.snowpark.functions.regexp_substr_all(subject: Union[snowflake.snowpark.column.Column, str], pattern: Union[snowflake.snowpark.column.Column, str], position: Union[snowflake.snowpark.column.Column, str] = None, occurrence: Union[snowflake.snowpark.column.Column, str] = None, regex_parameters: Union[snowflake.snowpark.column.Column, str] = None, group_num: Union[snowflake.snowpark.column.Column, str] = None) Column[source]

Returns all substrings that match a regular expression within a string.

Parameters:
  • subject (ColumnOrName) – The string to search for matches.

  • pattern (ColumnOrName) – The regular expression pattern to match.

  • position (ColumnOrName, optional) – The position in the string to start searching from (1-based). Defaults to 1.

  • occurrence (ColumnOrName, optional) – Which occurrence of the pattern to return.

  • regex_parameters (ColumnOrName, optional) – String of one or more characters that specifies the parameters for the regular expression. Default is ‘c’ (case-sensitive).

  • values (Supported) –

    • c: Case-sensitive matching

    • i: Case-insensitive matching

    • m: Multi-line mode

    • e: Extract submatches

    • s: Single-line mode (POSIX wildcard character . matches n)

  • group_num (ColumnOrName, optional) – The group number in the regular expression to extract. Defaults to None, which extracts the entire match.

Returns:

An array containing all matching substrings.

Return type:

Column

Examples::
>>> from snowflake.snowpark.functions import col, lit
>>> df = session.create_dataframe([['a1_a2a3_a4A5a6']], schema=["subject"])
>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]')).alias("result")).collect()
[Row(RESULT='[\n  "a1",\n  "a2",\n  "a3",\n  "a4",\n  "a6"\n]')]
Copy
>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]'), lit(2)).alias("result")).collect()
[Row(RESULT='[\n  "a2",\n  "a3",\n  "a4",\n  "a6"\n]')]
Copy
>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]'), lit(1), lit(3)).alias("result")).collect()
[Row(RESULT='[\n  "a3",\n  "a4",\n  "a6"\n]')]
Copy
>>> df.select(regexp_substr_all(col("subject"), lit('a[[:digit:]]'), lit(1), lit(1), lit('i')).alias("result")).collect()
[Row(RESULT='[\n  "a1",\n  "a2",\n  "a3",\n  "a4",\n  "A5",\n  "a6"\n]')]
Copy
>>> df.select(regexp_substr_all(col("subject"), lit('(a)([[:digit:]])'), lit(1), lit(1), lit('ie'), lit(1)).alias("result")).collect()
[Row(RESULT='[\n  "a",\n  "a",\n  "a",\n  "a",\n  "A",\n  "a"\n]')]
Copy
>>> df.select(regexp_substr_all(col("subject"), lit('b')).alias("result")).collect()
[Row(RESULT='[]')]
Copy