You are viewing documentation about an older version (1.46.0). View latest version

snowflake.snowpark.functions.regexp_instr¶

snowflake.snowpark.functions.regexp_instr(subject: Union[snowflake.snowpark.column.Column, str], pattern: Union[snowflake.snowpark.column.Column, str], position: Union[snowflake.snowpark.column.Column, str] = None, occurrence: Union[snowflake.snowpark.column.Column, str] = None, option: Union[snowflake.snowpark.column.Column, str] = None, regexp_parameters: Union[snowflake.snowpark.column.Column, str] = None, group_num: Union[snowflake.snowpark.column.Column, str] = None) → Column[source]¶

Returns the position of the specified occurrence of the regular expression pattern in the string subject. If no match is found, returns 0.

Parameters:

subject (ColumnOrName) – The string to search in.
pattern (ColumnOrName) – The regular expression pattern to search for.
position (ColumnOrName, optional) – The position in the string to start the search. Default is 1.
occurrence (ColumnOrName, optional) – The occurrence of the pattern to find. Default is 1.
option (ColumnOrName, optional) – Specifies whether to return the position of the first character of the match (0) or the position of the first character following the match (1). Default is 0.
regexp_parameters (ColumnOrName, optional) – String of one or more characters that specifies the parameters for the regular expression. Default is ‘c’ (case-sensitive).
values (Supported) –
- c: Case-sensitive matching
- i: Case-insensitive matching
- m: Multi-line mode
- e: Extract submatches
- s: Single-line mode (POSIX wildcard character . matches n)
group_num (ColumnOrName, optional) – Specifies which capture group to return the position for. Default is None, which returns the position of the entire match.

Returns:

The position of the match, or 0 if no match is found

Return type:

Column

Examples::

# Basic usage - only subject and pattern >>> from snowflake.snowpark.functions import col, lit >>> df = session.create_dataframe([[“nevermore1, nevermore2, nevermore3.”, “nevermored”]], schema=[“subject”, “pattern”]) >>> df.select(regexp_instr(col(“subject”), col(“pattern”)).alias(“basic_match”)).collect() [Row(BASIC_MATCH=1)]

# With position parameter >>> df2 = session.create_dataframe([[“Hello world”, “world”, 7]], schema=[“subject”, “pattern”, “position”]) >>> df2.select(regexp_instr(col(“subject”), col(“pattern”), col(“position”)).alias(“position_match”)).collect() [Row(POSITION_MATCH=7)]

# With position and occurrence parameters >>> df3 = session.create_dataframe([[“nevermore1, nevermore2, nevermore3.”, “nevermored”, 1, 2]], schema=[“subject”, “pattern”, “position”, “occurrence”]) >>> df3.select(regexp_instr(col(“subject”), col(“pattern”), col(“position”), col(“occurrence”)).alias(“second_occurrence”)).collect() [Row(SECOND_OCCURRENCE=13)]

# With position, occurrence, and option parameters >>> df4 = session.create_dataframe([[“Hello world”, “world”, 1, 1, 1]], schema=[“subject”, “pattern”, “position”, “occurrence”, “option”]) >>> df4.select(regexp_instr(col(“subject”), col(“pattern”), col(“position”), col(“occurrence”), col(“option”)).alias(“after_match”)).collect() [Row(AFTER_MATCH=12)]

# With position, occurrence, option, and regexp_parameters >>> df5 = session.create_dataframe([[“Hello world”, “hello”, 1, 1, 0, “i”]], schema=[“subject”, “pattern”, “position”, “occurrence”, “option”, “regexp_parameters”]) >>> df5.select(regexp_instr(col(“subject”), col(“pattern”), col(“position”), col(“occurrence”), col(“option”), col(“regexp_parameters”)).alias(“case_insensitive”)).collect() [Row(CASE_INSENSITIVE=1)]

# With all parameters including group_num >>> df6 = session.create_dataframe([[“Hello (World) (Test)”, “(w+)”, 1, 1, 0, “c”, 1]], schema=[“subject”, “pattern”, “position”, “occurrence”, “option”, “regexp_parameters”, “group_num”]) >>> df6.select(regexp_instr(col(“subject”), col(“pattern”), col(“position”), col(“occurrence”), col(“option”), col(“regexp_parameters”), col(“group_num”)).alias(“first_group”)).collect() [Row(FIRST_GROUP=1)]

# Skipping position - with occurrence only >>> df7 = session.create_dataframe([[“nevermore1, nevermore2, nevermore3.”, “nevermored”, “2”]], schema=[“subject”, “pattern”, “occurrence”]) >>> df7.select(regexp_instr(col(“subject”), col(“pattern”), occurrence=col(“occurrence”)).alias(“skip_position”)).collect() [Row(SKIP_POSITION=13)]

# Skipping position and occurrence - with option only >>> df8 = session.create_dataframe([[“Hello world”, “world”, 1]], schema=[“subject”, “pattern”, “option”]) >>> df8.select(regexp_instr(col(“subject”), col(“pattern”), option=col(“option”)).alias(“skip_position_occurrence”)).collect() [Row(SKIP_POSITION_OCCURRENCE=12)]

# Skipping position, occurrence, and option - with regexp_parameters only >>> df9 = session.create_dataframe([[“Hello World”, “hello”, “i”]], schema=[“subject”, “pattern”, “regexp_parameters”]) >>> df9.select(regexp_instr(col(“subject”), col(“pattern”), regexp_parameters=col(“regexp_parameters”)).alias(“skip_to_regexp_params”)).collect() [Row(SKIP_TO_REGEXP_PARAMS=1)]

# Skipping position, occurrence, option, and regexp_parameters - with group_num only >>> df10 = session.create_dataframe([[“Hello (world) (Test)”, “(w+)”, 1]], schema=[“subject”, “pattern”, “group_num”]) >>> df10.select(regexp_instr(col(“subject”), col(“pattern”), group_num=col(“group_num”)).alias(“skip_to_group_num”)).collect() [Row(SKIP_TO_GROUP_NUM=1)]

# Skipping position and occurrence - with option and regexp_parameters >>> df11 = session.create_dataframe([[“Hello World”, “Hello”, 1, “i”]], schema=[“subject”, “pattern”, “option”, “regexp_parameters”]) >>> df11.select(regexp_instr(col(“subject”), col(“pattern”), option=col(“option”), regexp_parameters=col(“regexp_parameters”)).alias(“skip_position_occurrence_with_params”)).collect() [Row(SKIP_POSITION_OCCURRENCE_WITH_PARAMS=6)]

# Skipping position, occurrence, and option - with regexp_parameters and group_num >>> df12 = session.create_dataframe([[“Hello (World) (Test)”, “(w+)”, “c”, 1]], schema=[“subject”, “pattern”, “regexp_parameters”, “group_num”]) >>> df12.select(regexp_instr(col(“subject”), col(“pattern”), regexp_parameters=col(“regexp_parameters”), group_num=col(“group_num”)).alias(“skip_to_params_and_group”)).collect() [Row(SKIP_TO_PARAMS_AND_GROUP=1)]