snowflake.snowpark.functions.ai_transcribe
- snowflake.snowpark.functions.ai_transcribe(audio_file: Column, **kwargs) → Column
Transcribes an audio file into text, with optional timestamps and speaker labels.
AI_TRANSCRIBE supports numerous languages, which are detected automatically, and a single audio file can contain more than one language. Whether timestamps and speaker labels are returned depends on the specified timestamp granularity.
- Parameters:
audio_file – A FILE type column representing an audio file. The audio file must be on a Snowflake stage that uses server-side encryption and is accessible to the user. Use the to_file() function to create a reference to your staged file.
**kwargs –
Configuration settings specified as key/value pairs. Supported keys:
timestamp_granularity: A string specifying the desired timestamp granularity. Possible values are:
'word': The file is transcribed as a series of words, each with its own timestamp.
'speaker': The file is transcribed as a series of conversational “turns”, each with its own timestamp and speaker label.
If this key is omitted, the entire file is transcribed as a single segment without timestamps.
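Because timestamp_granularity accepts only the two values listed above, a caller may want to reject typos before the query reaches Snowflake. The sketch below assumes that client-side check is desirable; the helper name transcribe_options and the ALLOWED_GRANULARITIES constant are hypothetical, not part of Snowpark.

```python
from typing import Dict, Optional

# The two values documented for the timestamp_granularity key.
ALLOWED_GRANULARITIES = ("word", "speaker")


def transcribe_options(timestamp_granularity: Optional[str] = None) -> Dict[str, str]:
    """Build keyword arguments for ai_transcribe (hypothetical helper).

    Returns an empty dict when no granularity is requested, so the whole
    file comes back as a single segment without timestamps.
    """
    if timestamp_granularity is None:
        return {}
    if timestamp_granularity not in ALLOWED_GRANULARITIES:
        raise ValueError(
            f"timestamp_granularity must be one of {ALLOWED_GRANULARITIES}, "
            f"got {timestamp_granularity!r}"
        )
    return {"timestamp_granularity": timestamp_granularity}
```

The returned dict is meant to be unpacked into the call, for example ai_transcribe(audio_col, **transcribe_options("speaker")), where audio_col is a FILE column built with to_file().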
- Returns:
A string containing a JSON representation of the transcription result. The JSON object contains the following fields:
audio_duration: The total duration of the audio file in seconds.
text: The transcription of the complete audio file (when timestamp_granularity is not specified).
segments: An array of segments (when timestamp_granularity is set to ‘word’ or ‘speaker’). Each segment contains:
start: The start time of the segment in seconds.
end: The end time of the segment in seconds.
text: The transcription text for the segment.
speaker_label: The label of the speaker for the segment (only when timestamp_granularity is ‘speaker’). Labels are of the form “SPEAKER_00”, “SPEAKER_01”, etc.
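The result has two shapes: a single text field when no granularity is set, or a segments array otherwise. The sketch below turns either shape into readable lines using only the field names listed above. The helper name format_transcript is hypothetical, and the sample JSON in the test is hand-written to match the documented schema. It is not real AI_TRANSCRIBE output.

```python
import json
from typing import List


def format_transcript(result_json: str) -> List[str]:
    """Render the JSON string returned by AI_TRANSCRIBE as text lines (hypothetical helper)."""
    result = json.loads(result_json)
    segments = result.get("segments")
    if not segments:
        # No timestamp_granularity: the whole transcription is in "text".
        return [result.get("text", "")]
    lines = []
    for seg in segments:
        span = f"[{seg['start']:.2f}-{seg['end']:.2f}]"
        # speaker_label is present only when timestamp_granularity is 'speaker'.
        speaker = seg.get("speaker_label")
        prefix = f"{span} {speaker}:" if speaker else span
        lines.append(f"{prefix} {seg['text']}")
    return lines
```

In a Snowpark session, the input would typically be the value read from a collected Row for the ai_transcribe column.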
Note
Supported languages: Arabic, Bulgarian, Cantonese, Catalan, Chinese, Czech, Dutch, English, French, German, Greek, Hungarian, Indonesian, Italian, Japanese, Korean, Latvian, Polish, Portuguese, Romanian, Russian, Serbian, Slovenian, Spanish, Swedish, Thai, Turkish, Ukrainian.
Supported audio formats: FLAC, MP3, Ogg, WAV, WebM
Maximum file size: 700 MB
Maximum duration: 60 minutes with timestamps (timestamp_granularity set), 120 minutes without
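You can check the limits in this note before staging a file. The sketch below does that locally. It makes three assumptions: the format can be inferred from the file extension, "MB" means 10**6 bytes, and the caller already knows the audio duration. The function name check_audio_limits is hypothetical.

```python
from pathlib import PurePosixPath
from typing import List, Optional

# Limits copied from the note; treating MB as 10**6 bytes is an assumption.
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".ogg", ".wav", ".webm"}
MAX_FILE_BYTES = 700 * 10**6
MAX_SECONDS_WITH_TIMESTAMPS = 60 * 60
MAX_SECONDS_WITHOUT_TIMESTAMPS = 120 * 60


def check_audio_limits(
    filename: str,
    size_bytes: int,
    duration_seconds: float,
    timestamp_granularity: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means no documented limit is exceeded."""
    problems = []
    # Assumption: the extension reflects the actual audio format.
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    # Any granularity means timestamps are requested, so the shorter limit applies.
    limit = (
        MAX_SECONDS_WITH_TIMESTAMPS
        if timestamp_granularity is not None
        else MAX_SECONDS_WITHOUT_TIMESTAMPS
    )
    if duration_seconds > limit:
        problems.append(f"audio too long: {duration_seconds} s exceeds {limit} s")
    return problems
```

The same 90-minute file passes without timestamps but fails with timestamp_granularity set. This is why the duration limit takes the granularity as an input.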
Examples: