Snowpipe Streaming Rowset API¶

Snowpipe Streaming Rowset API is a new set of REST APIs for interacting with the Snowpipe Streaming service. It lets you stream varying volumes of data directly into tables in an ordered, row-oriented fashion with minimal latency.

The Rowset API introduces thin client SDKs for programming languages like Python and Java. To load data from other languages and frameworks, call the REST APIs directly. The REST API and these thin client SDKs eliminate the need for client-heavy, Java-only implementations and move the buffering and processing of streamed data server-side to Snowflake.

The following pages describe the REST endpoints for the Rowset API (getting your rowset hostname, creating or opening a new channel, inserting a batch of rows to the given channel, dropping a channel, or getting channel status), configuring the Java SDK, and configuring the Python SDK.

To download the Python and Java SDKs for Snowpipe Streaming Rowset API, see this GitHub page.

Pipe and server-side processing¶

Snowpipe Streaming Rowset API inherits many concepts from our existing Snowpipe Streaming offering. See the Snowpipe Streaming documentation to become familiar with concepts such as clients, channels, and offset tokens.

Snowpipe Streaming Rowset API re-introduces the pipe object to support transformations, clustering, and default/autoincrement columns. The pipe object allows rows sent through Snowpipe Streaming to be buffered and processed server-side, so that any necessary filters or transformations are applied before the rows land in the table.

Considerations¶

Data error handling

  • Validation at the SDK level

    The SDK performs basic validation of the JSON format of each row. If it finds an error, the SDK immediately returns an Invalid row format error. Errors that the SDK does not detect but that occur during data processing on the Snowflake server are handled differently.

  • Validation on the server side

    Snowflake ignores rows with errors and continues processing the remaining rows without interruption. To review errors detected on the Snowflake server, query the snowflake.monitoring.streaming_rowset_channel_history function, which provides the error history for a specific pipe or channel.

In summary,

  • For the Rowset API, the SDK performs basic data validations.

  • On the Snowflake server, the ON_ERROR behavior is always set to CONTINUE.
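The client-side check described above can be illustrated with a minimal Python sketch. This is not the SDK's actual code: the function `validate_rows` and the exception class `InvalidRowFormatError` are hypothetical names, and the sketch assumes that each row must be a JSON object on its own line.

```python
import json


class InvalidRowFormatError(ValueError):
    """Hypothetical stand-in for the SDK's 'Invalid row format' error."""


def validate_rows(ndjson_payload: str) -> list:
    """Parse a newline-delimited JSON payload, rejecting malformed rows up front."""
    rows = []
    for line_number, line in enumerate(ndjson_payload.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines between rows (an assumption of this sketch)
        try:
            row = json.loads(line)
        except json.JSONDecodeError as exc:
            raise InvalidRowFormatError(
                f"Invalid row format on line {line_number}: {exc.msg}"
            ) from exc
        if not isinstance(row, dict):
            raise InvalidRowFormatError(
                f"Invalid row format on line {line_number}: expected a JSON object"
            )
        rows.append(row)
    return rows
```

A check like this only catches malformed JSON. A row that parses correctly but fails on the server, for example because of a type mismatch with the target column, would still be skipped there under the CONTINUE behavior.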

Case-sensitive and case-insensitive column names

In the Rowset API, the entire row is treated as a single variant data object, which is used to extract column definitions from the associated pipe.

  • Case-insensitive column names:

    Use the GET_IGNORE_CASE function (for example, GET_IGNORE_CASE($1, 'case_insensitive_column_name')).

  • Case-sensitive column names:

    Use the GET function on the variant object directly.
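To make the difference between the two lookups concrete, here is a small Python analogy that treats a row as a dictionary. It only illustrates the matching rules and is not how Snowflake implements GET or GET_IGNORE_CASE. The helper names `get` and `get_ignore_case` are hypothetical, and the exact-match-first behavior is an assumption of this sketch.

```python
from typing import Any, Optional


def get(row: dict, key: str) -> Optional[Any]:
    """Case-sensitive lookup: only an exact key match returns a value."""
    return row.get(key)


def get_ignore_case(row: dict, key: str) -> Optional[Any]:
    """Case-insensitive lookup: prefer an exact match, else a key equal ignoring case."""
    if key in row:
        return row[key]
    wanted = key.lower()
    for candidate, value in row.items():
        if candidate.lower() == wanted:
            return value
    return None


row = {"C1": 42, "c2": "hello"}
print(get(row, "c1"))              # None: "c1" and "C1" are different keys
print(get_ignore_case(row, "c1"))  # 42
```

If the keys in your streamed JSON do not consistently match the case of the column names, the case-insensitive lookup is the safer choice.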

Example table and rowset pipe creation¶

The following example creates a table and a pipe.

CREATE OR REPLACE TABLE MY_TABLE(data variant, c1 number, c2 string);

CREATE OR REPLACE PIPE MY_PIPE
AS
   COPY INTO MY_TABLE
   FROM TABLE (
       DATA_SOURCE (
           TYPE => 'STREAMING'
       )
   )
   MATCH_BY_COLUMN_NAME=CASE_INSENSITIVE
;

Limitations¶

The private preview Rowset API feature has the following limitations:

  • It supports AWS-only deployments.

  • The feature is tested for workloads up to 30 MB per second.

  • PrivateLink is not supported.

  • You are not charged for usage during the preview.

  • Only a newline-delimited JSON format string is accepted for the insertRows requests.

  • The ON_ERROR option for 'STREAMING' pipes supports only CONTINUE.

  • Clustered tables are currently not supported.

  • Default and autoincrement columns are supported, but we recommend not using current_timestamp() or similar datetime functions as a default value. This is similar to Snowpipe recommendations.

  • Replication is not supported when using the Rowset API.
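Because insertRows requests accept only a newline-delimited JSON string, a client that calls the REST API directly has to serialize its rows into that form first. The following is a minimal Python sketch of that serialization step only. The helper name `to_ndjson` is hypothetical, the column names c1 and c2 are taken from the example table above, and the sketch does not show the HTTP request itself.

```python
import json


def to_ndjson(rows: list) -> str:
    """Serialize rows as newline-delimited JSON: one compact JSON object per line."""
    return "\n".join(json.dumps(row, separators=(",", ":")) for row in rows)


rows = [
    {"c1": 1, "c2": "first"},
    {"c1": 2, "c2": "second"},
]
payload = to_ndjson(rows)
print(payload)
# {"c1":1,"c2":"first"}
# {"c1":2,"c2":"second"}
```

json.dumps escapes any newline characters inside string values, so each row stays on exactly one line of the payload.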