JSON Data Partitioning

It is important to partition the event data in your S3 bucket using logical, granular paths. Create a partitioning structure that includes identifying details, such as application or location, along with the date when the event data was written to the S3 bucket. You can then copy any subset of the partitioned data into Snowflake with a single command: when you initially populate tables, you can load data by the hour, day, month, or even year (see the sample COPY command after the path definitions below).

For example:

s3://bucket_name/application_one/2016/07/01/11/

s3://bucket_name/application_two/location_one/2016/07/01/14/

Where:

bucket_name

The unique name of the S3 bucket that stores your data.

application_one, application_two, location_one, etc.

Identifying details for the source of all data in the path. The remaining path segments organize the data by the date when it was written (year/month/day), and an optional hour directory (in 24-hour format) further reduces the amount of data in each directory.
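
For instance, to load only the data that application_one wrote during a single hour, you can point a COPY command at the corresponding path prefix. The following is a minimal sketch: the target table my_table (assumed to have a single VARIANT column to receive the JSON) and the placeholder credentials are illustrative assumptions, not part of the example paths above.

-- Load one hour of partitioned JSON data written by application_one.
-- "my_table" is a hypothetical target table with a single VARIANT column;
-- replace the credential placeholders with your own AWS keys.
copy into my_table
  from 's3://bucket_name/application_one/2016/07/01/11/'
  credentials = (aws_key_id = '...' aws_secret_key = '...')
  file_format = (type = 'JSON');

To load a whole day instead, shorten the path prefix to s3://bucket_name/application_one/2016/07/01/; a month or year works the same way.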

Note

S3 transmits a directory list with each COPY statement that Snowflake executes, so reducing the number of files in each directory improves the performance of your COPY statements. You might even consider creating folders in 10- to 15-minute increments within each hour, as in the example below.
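
For example, data written at 11:15 might land in a quarter-hour folder such as the following (the /15/ segment is illustrative):

s3://bucket_name/application_one/2016/07/01/11/15/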

Next: Step 1. Copy Data Into the Target Table