This page documents an older version (1.46.0).

snowflake.snowpark.functions.regexp_substr

snowflake.snowpark.functions.regexp_substr(subject: Union[snowflake.snowpark.column.Column, str], pattern: Union[snowflake.snowpark.column.Column, str], position: Union[snowflake.snowpark.column.Column, str] = None, occurrence: Union[snowflake.snowpark.column.Column, str] = None, regex_parameters: Union[snowflake.snowpark.column.Column, str] = None, group_num: Union[snowflake.snowpark.column.Column, str] = None) → Column

Returns the portion of the subject that matches the regular expression pattern.

Parameters:

subject (ColumnOrName) – The string to search for matches.
pattern (ColumnOrName) – The regular expression pattern to match.
position (ColumnOrName, optional) – The position in the string to start searching from (1-based). Defaults to 1.
occurrence (ColumnOrName, optional) – Which occurrence of the pattern to return. Defaults to 1.
regex_parameters (ColumnOrName, optional) – A string of one or more characters that specify the regular expression parameters. Defaults to 'c' (case-sensitive). Supported values:
- c: Case-sensitive matching
- i: Case-insensitive matching
- m: Multi-line mode
- e: Extract submatches
- s: Single-line mode (the POSIX wildcard character . matches \n)
group_num (ColumnOrName, optional) – The group number in the regular expression to extract. Defaults to None, which extracts the entire match.
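To make the interaction between position, occurrence, regex_parameters, and group_num concrete, here is a minimal local sketch built on Python's re module. It is not how Snowpark or Snowflake implements the function: the helper name regexp_substr_sketch is hypothetical, Python's regex dialect differs from Snowflake's, and the handling of the flag characters (including ignoring e) is an assumption made for illustration only.

```python
import re
from typing import Optional


def regexp_substr_sketch(
    subject: str,
    pattern: str,
    position: int = 1,
    occurrence: int = 1,
    regex_parameters: str = "c",
    group_num: Optional[int] = None,
) -> Optional[str]:
    """Hypothetical local approximation of the parameter semantics above."""
    flags = 0
    for ch in regex_parameters:
        if ch == "i":
            flags |= re.IGNORECASE
        elif ch == "c":
            flags &= ~re.IGNORECASE
        elif ch == "m":
            flags |= re.MULTILINE
        elif ch == "s":
            flags |= re.DOTALL
        # "e" is ignored in this sketch; group_num alone selects the group.

    # position is 1-based, so convert it to a 0-based start offset.
    matches = re.compile(pattern, flags).finditer(subject, position - 1)
    for index, match in enumerate(matches, start=1):
        if index == occurrence:
            # Group 0 is the entire match.
            return match.group(group_num or 0)
    return None  # no match, or fewer matches than `occurrence`


text = "nevermore1, nevermore2, nevermore3."
print(regexp_substr_sketch(text, r"nevermore\d", occurrence=2))
print(regexp_substr_sketch("Hello world", "hello", regex_parameters="i"))
```

The sketch only mirrors the documented defaults (position 1, occurrence 1, case-sensitive matching, whole match when group_num is omitted). For real workloads, call regexp_substr on a DataFrame so that the matching runs inside Snowflake.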

Returns:

The substring that matches the pattern, or None if no match is found.

Return type:

Column

Examples::

    # Basic usage - only subject and pattern
    >>> from snowflake.snowpark.functions import col, lit, regexp_substr
    >>> df = session.create_dataframe([["nevermore1, nevermore2, nevermore3.", r"nevermore\d"]], schema=["subject", "pattern"])
    >>> df.select(regexp_substr(col("subject"), col("pattern")).alias("basic_match")).collect()
    [Row(BASIC_MATCH='nevermore1')]

    # With position parameter
    >>> df2 = session.create_dataframe([["Hello world", "world", 7]], schema=["subject", "pattern", "position"])
    >>> df2.select(regexp_substr(col("subject"), col("pattern"), col("position")).alias("position_match")).collect()
    [Row(POSITION_MATCH='world')]

    # With position and occurrence parameters
    >>> df3 = session.create_dataframe([["nevermore1, nevermore2, nevermore3.", r"nevermore\d", 1, 2]], schema=["subject", "pattern", "position", "occurrence"])
    >>> df3.select(regexp_substr(col("subject"), col("pattern"), col("position"), col("occurrence")).alias("second_occurrence")).collect()
    [Row(SECOND_OCCURRENCE='nevermore2')]

    # With position, occurrence, and regex_parameters
    >>> df5 = session.create_dataframe([["Hello world", "hello", 1, 1, "i"]], schema=["subject", "pattern", "position", "occurrence", "regex_parameters"])
    >>> df5.select(regexp_substr(col("subject"), col("pattern"), col("position"), col("occurrence"), col("regex_parameters")).alias("case_insensitive")).collect()
    [Row(CASE_INSENSITIVE='Hello')]

    # With all parameters including group_num
    >>> df6 = session.create_dataframe([["Hello (World) (Test)", r"(\w+)", 1, 1, "c", 1]], schema=["subject", "pattern", "position", "occurrence", "regex_parameters", "group_num"])
    >>> df6.select(regexp_substr(col("subject"), col("pattern"), col("position"), col("occurrence"), col("regex_parameters"), col("group_num")).alias("first_group")).collect()
    [Row(FIRST_GROUP='Hello')]

    # Skipping position - with occurrence only
    >>> df7 = session.create_dataframe([["nevermore1, nevermore2, nevermore3.", r"nevermore\d", "2"]], schema=["subject", "pattern", "occurrence"])
    >>> df7.select(regexp_substr(col("subject"), col("pattern"), occurrence=col("occurrence")).alias("skip_position")).collect()
    [Row(SKIP_POSITION='nevermore2')]

    # Skipping position and occurrence - with regex_parameters only
    >>> df9 = session.create_dataframe([["Hello World", "hello", "i"]], schema=["subject", "pattern", "regex_parameters"])
    >>> df9.select(regexp_substr(col("subject"), col("pattern"), regex_parameters=col("regex_parameters")).alias("skip_to_regexp_params")).collect()
    [Row(SKIP_TO_REGEXP_PARAMS='Hello')]

    # Skipping position, occurrence, and regex_parameters - with group_num only
    >>> df10 = session.create_dataframe([["Hello (world) (Test)", r"(\w+)", 1]], schema=["subject", "pattern", "group_num"])
    >>> df10.select(regexp_substr(col("subject"), col("pattern"), group_num=col("group_num")).alias("skip_to_group_num")).collect()
    [Row(SKIP_TO_GROUP_NUM='Hello')]

    # Skipping position and occurrence - with regex_parameters and group_num
    >>> df12 = session.create_dataframe([["Hello (World) (Test)", r"(\w+)", "c", 1]], schema=["subject", "pattern", "regex_parameters", "group_num"])
    >>> df12.select(regexp_substr(col("subject"), col("pattern"), regex_parameters=col("regex_parameters"), group_num=col("group_num")).alias("skip_to_params_and_group")).collect()
    [Row(SKIP_TO_PARAMS_AND_GROUP='Hello')]