Error handling in high-performance Snowpipe Streaming¶

This topic outlines the error handling mechanisms available in the high-performance edition of Snowpipe Streaming. This enhanced approach provides detailed error information and improves the overall error handling process for a more robust and informative experience.

Key error handling features in the high-performance architecture¶

  • Enhanced channel status endpoint: This edition extends the channel status endpoint to provide more comprehensive error information.

  • Granular error details: The high-performance edition provides more detailed error information to help identify where it occurred and find the root causes of ingestion issues.

  • Improved client experience: The high-performance edition simplifies error handling for clients, reducing the complexity of error reasoning and recovery.

  • New channel history view: Snowflake also introduces SNOWPIPE_STREAMING_CHANNEL_HISTORY view, providing a historical record of channel activity to monitor and locate errors. This feature enables you to track error trends and proactively address potential issues.

Channel status endpoint details¶

The high-performance architecture introduces a new channel status endpoint to provide more detailed information. This endpoint returns point-in-time information about a channel.

In addition to the channel status information (statusCode, persistedOffsetToken) for the classic architecture, the high-performance architecture includes the following:

  • channel_status_code: Represents the current operational status of the streaming channel. This code provides a high-level indication of the channel’s health and ability to ingest data.

  • last_commited_offset_token: Indicates the offset token of the last row set that was successfully committed to the target table by Snowflake. This is crucial for tracking progress and ensuring data delivery.

  • created_on_ms: The timestamp, in milliseconds, indicating when the streaming channel was initially created within Snowflake.

  • database_name: The name of the database to which the streaming channel is configured to ingest data.

  • schema_name: The name of the schema within the specified database where the target table for the streaming channel resides.

  • pipe_name: The name of the Snowpipe object that is configured to utilize this Snowpipe Streaming channel for data ingestion into a specific target table.

  • channel_name: A user-created name for the specific Snowpipe Streaming channel instance.

  • rows_inserted: A count of the total number of data rows that have been successfully inserted into the target table through this streaming channel since its creation.

  • rows_parsed: A count of the total number of data rows that have been processed and parsed by the Snowpipe Streaming service for this channel. (but not necessarily inserted, for example, due to errors).

  • rows_error_count: A count of the total number of data rows that encountered errors during processing and were therefore rejected by the Snowpipe Streaming service for this channel.

  • last_error_offset_upper_bound: The upper bound of the offset token range of the last rowset that contained errors. This helps in identifying the approximate location of the most recent errors within the data stream.

  • last_error_message: A human-readable message corresponding to the latest error code.

  • last_error_timestamp: The timestamp indicating when the most recent error occurred on this streaming channel.

  • snowflake_avg_processing_latency_ms: The average latency, in milliseconds, observed by the Snowflake service in processing rowsets received by this channel. This metric provides insight into the performance of the ingestion pipeline within Snowflake.

Error-handling flow in the high-performance architecture¶

  • Client sends data: The client application uses the Snowpipe Streaming SDK to send data to Snowflake via the appendRow(s) API.

  • Server processing: The Snowflake service processes the data. This involves:

    • Buffering the data.

    • Parsing and validating the data.

    • Committing the data to the table.

  • Error detection: Errors can occur during any of the server-side processing stages.

  • Error recording: Snowflake records detailed information about the last occurred error, including:

    • The upper bound of the offset token range of the last rowset that contained errors. This helps in identifying the approximate location of the most recent errors within the data stream.

    • An error message.

    • A timestamp.

  • Error reporting:

    • The enhanced channel status endpoint provides access to the recorded error information.

    • Clients can query this endpoint to retrieve last occurred error details.

    • Snowflake also introduces SNOWPIPE_STREAMING_CHANNEL_HISTORY view, providing a historical record of channel activity, including errors and their offsets.

  • Client action: The client application uses the error information to:

    • Identify the cause of the error.

    • Implement appropriate error handling logic, such as:

      • Retrying the failed operation.

      • Logging the error.

      • Alerting an administrator.

      • Moving the erroneous data to a dead-letter queue.

      • Reopening channels.