Snowpipe Streaming best practices
As a best practice, we recommend calling the API with fewer Snowpipe Streaming clients that write more data per second. Use a Java or Scala application to aggregate data from multiple sources, such as IoT devices or sensors, and then use the Snowflake Ingest SDK to call the API to load data at higher flow rates. The API efficiently aggregates data across multiple target tables in an account.
A single Snowpipe Streaming client can open multiple channels to send data. Cost is charged per active client, not per channel, so the number of channels does not affect cost. Therefore, we recommend using multiple channels per client for performance and cost optimization.
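As an illustration, the one-client, many-channels pattern might look like the following sketch with the Snowflake Ingest SDK for Java. The client name, connection properties, source names, and the database, schema, and table names are all placeholders, and the `props` values must be replaced with your account's actual connection settings:

```java
import java.util.List;
import java.util.Properties;
import net.snowflake.ingest.streaming.OpenChannelRequest;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClient;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestClientFactory;

// Placeholder connection settings; see the SDK documentation for the full list.
Properties props = new Properties();
props.put("url", "https://<account>.snowflakecomputing.com:443");
props.put("user", "<user>");
props.put("private_key", "<private-key>");

// One client (the billed unit) fanning out to several channels.
try (SnowflakeStreamingIngestClient client =
        SnowflakeStreamingIngestClientFactory.builder("MY_CLIENT")
                .setProperties(props)
                .build()) {
    // Each source (for example, an IoT device) gets its own channel, but all
    // channels share one client and therefore one client charge.
    for (String source : List.of("sensor_a", "sensor_b", "sensor_c")) {
        SnowflakeStreamingIngestChannel channel = client.openChannel(
                OpenChannelRequest.builder("channel_" + source)
                        .setDBName("MY_DB")
                        .setSchemaName("MY_SCHEMA")
                        .setTableName("MY_TABLE")
                        .setOnErrorOption(OpenChannelRequest.OnErrorOption.CONTINUE)
                        .build());
        // ... route this source's rows to `channel` ...
    }
}
```

Because each channel tracks its own offset tokens, giving each upstream source its own channel keeps per-source ordering simple while still concentrating throughput on a single client.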
Using the same tables for both batch and streaming ingestion can also reduce Snowpipe Streaming compute costs, because file migration operations are preempted. If Automatic Clustering is also enabled on the table that Snowpipe Streaming is inserting into, compute costs for file migration may be reduced, because the clustering operation optimizes and migrates the data in the same transaction.
For optimal performance in high-throughput deployments, we recommend the following actions:
- Pass values for the TIME, DATE, and all TIMESTAMP columns as one of the supported types from the java.time package.
- When creating a channel using OpenChannelRequest.builder, set the OnErrorOption to OnErrorOption.CONTINUE, and manually check the return value from insertRows for potential ingestion errors. This approach currently leads to better performance than relying on the exceptions thrown when OnErrorOption.ABORT is used.
- Keep the size of each row batch passed to insertRows below 16 MB.
- If you are loading multiple rows, using insertRows will be more performant and cost-effective than calling insertRow multiple times, because less time is spent on locks.
- When setting the default log level to DEBUG, make sure that the loggers net.snowflake.ingest.internal.apache.parquet and org.apache.parquet keep logging on INFO: their DEBUG output is very verbose, which can lead to a significant performance degradation.
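Putting the channel-related recommendations above together, a minimal sketch might look like the following. It assumes an already-built SnowflakeStreamingIngestClient named `client`; the channel, table names, offset token, and the buildRows helper are placeholders:

```java
import java.util.List;
import java.util.Map;
import net.snowflake.ingest.streaming.InsertValidationResponse;
import net.snowflake.ingest.streaming.OpenChannelRequest;
import net.snowflake.ingest.streaming.SnowflakeStreamingIngestChannel;

// Open the channel with OnErrorOption.CONTINUE.
SnowflakeStreamingIngestChannel channel = client.openChannel(
        OpenChannelRequest.builder("my_channel")
                .setDBName("MY_DB")
                .setSchemaName("MY_SCHEMA")
                .setTableName("MY_TABLE")
                .setOnErrorOption(OpenChannelRequest.OnErrorOption.CONTINUE)
                .build());

// One insertRows call for the whole batch (kept below 16 MB) rather than
// many insertRow calls.
List<Map<String, Object>> rows = buildRows(); // hypothetical helper producing this batch
InsertValidationResponse response = channel.insertRows(rows, "<offset_token>");

// With CONTINUE, failed rows do not throw; check the response instead.
if (response.hasErrors()) {
    for (InsertValidationResponse.InsertError error : response.getInsertErrors()) {
        System.err.println("Row failed to ingest: " + error.getException());
    }
}
```

Valid rows in the batch are still ingested when individual rows fail, which is why the return value, not an exception, is the place to look for per-row errors.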
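To keep each row batch under the 16 MB limit, you can pre-chunk rows by estimated serialized size before handing them to the SDK. The helper below is a hypothetical illustration, not part of the Snowflake Ingest SDK; it models rows as UTF-8 strings purely to keep the size accounting concrete:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of the Snowflake Ingest SDK): splits rows into
 * batches whose estimated serialized size stays below the 16 MB insertRows
 * limit. Rows are modeled as UTF-8 strings to keep the size estimate simple.
 */
public class BatchSplitter {
    static final long MAX_BATCH_BYTES = 16L * 1024 * 1024; // 16 MB batch limit

    public static List<List<String>> split(List<String> rows) {
        List<List<String>> batches = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (String row : rows) {
            long rowBytes = row.getBytes(StandardCharsets.UTF_8).length;
            // Start a new batch when adding this row would exceed the limit.
            if (!current.isEmpty() && currentBytes + rowBytes > MAX_BATCH_BYTES) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(row);
            currentBytes += rowBytes;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            rows.add("x".repeat(1024 * 1024)); // forty 1 MiB rows
        }
        System.out.println("batches: " + split(rows).size()); // prints "batches: 3"
    }
}
```

Each resulting batch can then be passed to a single insertRows call, preserving the "few large calls" recommendation while staying under the size limit.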