Snowpipe Streaming Best Practices

Cost Optimization

As a best practice, we recommend calling the API with fewer Snowpipe Streaming clients that each write more data per second. Aggregate data from multiple sources, such as IoT devices or sensors, in a Java or Scala application, then call the API through the Snowflake Ingest SDK to load the data at higher flow rates. The API efficiently aggregates data across multiple target tables in an account.
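
A minimal sketch of this pattern follows: one shared streaming client is built from a properties file and kept open for the lifetime of the application. The client name and the profile.properties file name are illustrative placeholders.

    import java.io.FileInputStream;
    import java.util.Properties;
    import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;
    import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClientFactory;

    public class SharedClientExample {
      public static void main(String[] args) throws Exception {
        // Connection settings (account URL, user, private key, role, ...).
        // "profile.properties" is a placeholder file name.
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream("profile.properties")) {
          props.load(in);
        }

        // One long-lived client shared by all data sources in this application.
        try (SnowflakeStreamingIngestClient client =
            SnowflakeStreamingIngestClientFactory.builder("AGGREGATING_CLIENT")
                .setProperties(props)
                .build()) {
          // Open channels and insert the aggregated rows here.
        }
      }
    }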

A single Snowpipe Streaming client can open multiple channels to send data, and you are charged only per active client; the number of channels does not affect the client cost. Therefore, we recommend using multiple channels per client for performance and cost optimization.
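
For example, a single client can fan out into one channel per upstream source. The sketch below assumes one channel per source ID; the database, schema, table, and channel names are placeholders.

    import java.util.ArrayList;
    import java.util.List;
    import net.snowflake.ingest.streaming.OpenChannelRequest;
    import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;
    import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;

    public class MultiChannelExample {
      // Opens one channel per source on a single shared client; the number of
      // channels does not change the per-client cost.
      static List<SnowflakeStreamingIngestChannel> openChannels(
          SnowflakeStreamingIngestClient client, List<String> sourceIds) {
        List<SnowflakeStreamingIngestChannel> channels = new ArrayList<>();
        for (String sourceId : sourceIds) {
          OpenChannelRequest request =
              OpenChannelRequest.builder("CHANNEL_" + sourceId)
                  .setDBName("MY_DATABASE")
                  .setSchemaName("MY_SCHEMA")
                  .setTableName("MY_TABLE")
                  .setOnErrorOption(OpenChannelRequest.OnErrorOption.CONTINUE)
                  .build();
          channels.add(client.openChannel(request));
        }
        return channels;
      }
    }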

Using the same tables for both batch and streaming ingestion can also result in reduced Snowpipe Streaming compute costs due to pre-empted file migration operations. If Automatic Clustering is also enabled on the same table that Snowpipe Streaming is inserting into, compute costs for file migration may be reduced. The clustering operation will optimize and migrate data in the same transaction.

Performance Recommendations

For optimal performance in high-throughput deployments, we recommend the following actions:

  • Pass values for TIME, DATE, and all TIMESTAMP columns as one of the supported types from the java.time package.

  • When creating a channel using OpenChannelRequest.builder, set the OnErrorOption to OnErrorOption.CONTINUE and manually check the return value of insertRows for potential ingestion errors (see the sketch after this list). This approach currently yields better performance than relying on the exceptions thrown when OnErrorOption.ABORT is used.

  • Keep the size of each row batch passed to insertRows below 16 MB.

  • If you are loading multiple rows, calling insertRows once with the whole batch is more performant and cost-effective than calling insertRow repeatedly, because less time is spent on locks.

  • When setting the default log level to DEBUG, make sure that the following loggers keep logging at INFO, because their DEBUG output is very verbose and can cause significant performance degradation (an example configuration follows this list).

    • net.snowflake.ingest.internal.apache.parquet

    • org.apache.parquet
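
The sketch below combines several of these recommendations: it passes java.time values for DATE and TIMESTAMP columns, sends a whole batch through a single insertRows call, and inspects the returned InsertValidationResponse rather than relying on exceptions. The column names, offset-token scheme, and row contents are illustrative placeholders.

    import java.time.Instant;
    import java.time.LocalDate;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import net.snowflake.ingest.streaming.InsertValidationResponse;
    import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;

    public class BatchInsertExample {
      // Builds one batch of rows and sends it with a single insertRows call,
      // then checks the validation response (OnErrorOption.CONTINUE reports
      // bad rows here instead of throwing).
      static void insertBatch(SnowflakeStreamingIngestChannel channel, int batchSize, long startOffset) {
        List<Map<String, Object>> rows = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
          Map<String, Object> row = new HashMap<>();
          row.put("EVENT_DATE", LocalDate.now()); // DATE column as java.time.LocalDate
          row.put("EVENT_TS", Instant.now());     // TIMESTAMP column as java.time.Instant
          row.put("READING", 42.0);
          rows.add(row);
        }

        // One call per batch; keep each batch below 16 MB.
        InsertValidationResponse response =
            channel.insertRows(rows, String.valueOf(startOffset + batchSize - 1));

        if (response.hasErrors()) {
          for (InsertValidationResponse.InsertError error : response.getInsertErrors()) {
            System.err.println("Row rejected: " + error.getException().getMessage());
          }
        }
      }
    }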
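
For the logging recommendation, one option, sketched below under the assumption that Logback is the SLF4J backend on the classpath, is to raise these two loggers to INFO programmatically; the same effect is normally achieved in the logging backend's configuration file.

    import ch.qos.logback.classic.Level;
    import ch.qos.logback.classic.Logger;
    import org.slf4j.LoggerFactory;

    public class LoggerLevels {
      // Pins the verbose Parquet loggers to INFO while leaving the root logger at DEBUG.
      // Assumes Logback (logback-classic) is the SLF4J backend.
      static void quietParquetLoggers() {
        ((Logger) LoggerFactory.getLogger("net.snowflake.ingest.internal.apache.parquet"))
            .setLevel(Level.INFO);
        ((Logger) LoggerFactory.getLogger("org.apache.parquet")).setLevel(Level.INFO);
        ((Logger) LoggerFactory.getLogger(Logger.ROOT_LOGGER_NAME)).setLevel(Level.DEBUG);
      }
    }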