This page documents an older version (1.47.0).

snowflake.snowpark.DataFrameAIFunctions.transcribe

DataFrameAIFunctions.transcribe(input_column: Union[snowflake.snowpark.column.Column, str], *, output_column: Optional[str] = None, **kwargs) → snowflake.snowpark.DataFrame

Transcribe the audio file referenced in each row into text, optionally with timestamps and speaker labels.

Parameters:

input_column – The column (Column object or column name as string) containing FILE references to audio files. Use to_file to convert staged paths to FILE type.
output_column – The name of the output column to be appended. If not provided, a column named AI_TRANSCRIBE_OUTPUT is appended.
**kwargs – Additional options forwarded to the underlying function, e.g. timestamp_granularity (set to 'word' or 'speaker' in the examples below).

Examples:

>>> import json
>>> from snowflake.snowpark.functions import col, to_file
>>> # Basic transcription without timestamps: stage an audio file first
>>> _ = session.sql("CREATE OR REPLACE TEMP STAGE mystage ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')").collect()
>>> _ = session.file.put("tests/resources/audio.ogg", "@mystage", auto_compress=False)
>>> df = session.create_dataframe([["@mystage/audio.ogg"]], schema=["audio_path"])  # staged file path
>>> result_df = df.ai.transcribe(
...     input_column=to_file(col("audio_path")),
...     output_column="transcript",
... )
>>> result_df.columns
['AUDIO_PATH', 'TRANSCRIPT']
>>> result = json.loads(result_df.collect()[0]["TRANSCRIPT"])
>>> result['audio_duration'] > 120
True
>>> "glad to see things are going well" in result['text'].lower()
True

>>> # Transcription with word-level timestamps
>>> result_df = df.ai.transcribe(
...     input_column=to_file(col("audio_path")),
...     output_column="transcript",
...     timestamp_granularity='word',
... )
>>> result = json.loads(result_df.collect()[0]["TRANSCRIPT"])
>>> len(result["segments"]) > 0
True
>>> result["segments"][0]["text"].lower()
'the'
>>> 'start' in result["segments"][0] and 'end' in result["segments"][0]
True

>>> # Transcription with speaker diarization (requires a multi-speaker audio file)
>>> _ = session.file.put("tests/resources/conversation.ogg", "@mystage", auto_compress=False)
>>> df = session.create_dataframe([["@mystage/conversation.ogg"]], schema=["audio_path"])
>>> result_df = df.ai.transcribe(
...     input_column=to_file(col("audio_path")),
...     output_column="transcript",
...     timestamp_granularity='speaker',
... )
>>> result = json.loads(result_df.collect()[0]["TRANSCRIPT"])
>>> result["audio_duration"] > 100 and len(result["segments"]) > 0
True
>>> result["segments"][0]["speaker_label"]
'SPEAKER_00'
>>> 'jenny' in result["segments"][0]["text"].lower()
True
>>> 'start' in result["segments"][0] and 'end' in result["segments"][0]
True

This function or method is experimental since 1.39.0.