Snowpipe Streaming Best Practices
As a best practice, we recommend calling the API with fewer Snowpipe Streaming clients that write more data per second. Aggregate data from multiple sources such as IoT devices or sensors using a Java or Scala application, then call the API to load data using the Snowflake Ingest SDK at higher flow rates. The API efficiently aggregates data across multiple target tables in an account.
A single Snowpipe Streaming client can open multiple channels to send data, and cost is charged only per active client; the number of channels does not affect the client cost. Therefore, we recommend using multiple channels per client for performance and cost optimization.
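As a rough sketch of this pattern with the Snowflake Ingest SDK's Java API (not runnable as-is: it requires the snowflake-ingest-sdk dependency and real connection properties, and the client, channel, database, schema, and table names below are placeholders), one client fans out several channels instead of creating more clients:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import net.snowflake.ingest.streaming.OpenChannelRequest;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClientFactory;

public class MultiChannelSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder: fill in url, user, private_key, role, etc.
        Properties props = new Properties();

        try (SnowflakeStreamingIngestClient client =
                SnowflakeStreamingIngestClientFactory.builder("MY_CLIENT")
                        .setProperties(props)
                        .build()) {
            // One client, several channels: cost accrues per active client,
            // so fan out channels rather than creating more clients.
            List<SnowflakeStreamingIngestChannel> channels = new ArrayList<>();
            for (int i = 0; i < 4; i++) {
                channels.add(client.openChannel(
                        OpenChannelRequest.builder("CHANNEL_" + i)
                                .setDBName("MY_DB")
                                .setSchemaName("MY_SCHEMA")
                                .setTableName("MY_TABLE")
                                .setOnErrorOption(
                                        OpenChannelRequest.OnErrorOption.CONTINUE)
                                .build()));
            }
            // Each channel can now receive insertRows calls independently.
        }
    }
}
```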
Using the same tables for both batch and streaming ingestion can also result in reduced Snowpipe Streaming compute costs due to pre-empted file migration operations. If Automatic Clustering is also enabled on the same table that Snowpipe Streaming is inserting into, compute costs for file migration may be reduced. The clustering operation will optimize and migrate data in the same transaction.
For optimal performance in high-throughput deployments, we recommend the following actions:
- Pass values for the TIME, DATE, and all TIMESTAMP columns as one of the supported types from the java.time package.

- When creating a channel using OpenChannelRequest.builder, set the OnErrorOption to OnErrorOption.CONTINUE, and manually check the return value from insertRows for potential ingestion errors. This approach currently leads to better performance than relying on the exceptions thrown when OnErrorOption.ABORT is used.

- Keep the size of each row batch passed to insertRows below 16 MB.

- If you are loading multiple rows, using insertRows is more performant and cost-effective than calling insertRow multiple times, because less time is spent on locks.
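A self-contained sketch of the last two recommendations (the column names, table layout, and helper class here are invented for illustration): rows carry java.time values for the TIME, DATE, and TIMESTAMP columns, and a helper groups many rows so that each group would become a single insertRows call rather than many insertRow calls. A production batcher should also bound the byte size of each batch, not just the row count.

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowBatcher {

    // A row destined for a hypothetical table with TIME, DATE, and
    // TIMESTAMP_TZ columns: values use java.time types.
    public static Map<String, Object> exampleRow(int i) {
        Map<String, Object> row = new HashMap<>();
        row.put("EVENT_TIME", LocalTime.of(12, 0).plusSeconds(i));   // TIME
        row.put("EVENT_DATE", LocalDate.of(2024, 1, 1).plusDays(i)); // DATE
        row.put("CREATED_AT", OffsetDateTime.now(ZoneOffset.UTC));   // TIMESTAMP_TZ
        row.put("DEVICE_ID", "sensor-" + i);
        return row;
    }

    // Split rows into fixed-size batches; each batch would be passed to
    // one insertRows call instead of calling insertRow per row.
    public static List<List<Map<String, Object>>> batch(
            List<Map<String, Object>> rows, int batchSize) {
        List<List<Map<String, Object>>> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            batches.add(new ArrayList<>(rows.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(exampleRow(i));
        }
        System.out.println(batch(rows, 4).size()); // prints 3
    }
}
```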
When setting the default log level to DEBUG, make sure that the following loggers keep logging on INFO, because their DEBUG output is very verbose and can cause significant performance degradation.
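The SDK logs through SLF4J, so with Logback as the backend this recommendation might be expressed as a configuration sketch like the following, where `org.example.verbose` is a placeholder for whichever loggers your deployment identifies as too chatty at DEBUG:

```xml
<!-- logback.xml sketch: root stays at DEBUG, verbose loggers pinned to INFO. -->
<configuration>
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d %-5level %logger - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- Placeholder logger name: keep such loggers on INFO. -->
  <logger name="org.example.verbose" level="INFO"/>

  <root level="DEBUG">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```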