Snowpipe REST API¶
You interact with a pipe by making calls to REST endpoints. This topic describes the Snowpipe REST API for defining the list of files to ingest and fetching reports of the load history.
Snowflake also provides Java and Python APIs that simplify working with the Snowpipe REST API.
Data file ingestion¶
The Snowpipe API provides a REST endpoint for defining the list of files to ingest.
Endpoint: insertFiles
¶
Informs Snowflake about the files to be ingested into a table. A successful response from this endpoint means that Snowflake has recorded the list of files to add to the table. It does not necessarily mean the files have been ingested. For more details, see the response codes below.
In most cases, Snowflake inserts fresh data into the target table within a few minutes.
- method
POST
- post url
https://{account}.snowflakecomputing.com/v1/data/pipes/{pipeName}/insertFiles?requestId={requestId}
- post body
A JSON object with the following attributes:
Attribute
Required
Description
account
Yes
Account identifier for your Snowflake account.
pipeName
Yes
Case-sensitive, fully-qualified pipe name. For example,
myDatabase.mySchema.myPipe
.requestId
No
String used to track requests through the system. We recommend providing a random string with each request, e.g. a UUID.
- content-type
text/plain
application/json
- header fields
Accepts:
text/plain
orapplication/json
Authorization: BEARER <jwt_token>
For
text/plain
, the contents are the list of paths and filenames, one per line. Thesize
parameter is not allowed.For
application/json
, the contents are the list of paths, filenames, and file sizes (optional but recommended for better performance). An example payload is as follows:{ "files":[ { "path":"filePath/file1.csv", "size":100 }, { "path":"filePath/file2.csv", "size":100 } ] }
Note that if you follow our recommended best practices by partitioning your data in the stage using logical, granular paths, the path values in the payload include the complete paths to the staged files.
Note
The post can contain at most 5000 files.
Each file path given must be <= 1024 bytes long when serialized as UTF-8.
- response body
Response Codes:
200 — Success. Files added to the queue of files to ingest.
400 — Failure. Invalid request due to an invalid format, or limit exceeded.
404 — Failure.
pipeName
not recognized.This error code can also be returned if the role used when calling the endpoint does not have sufficient privileges. For more information, see Granting access privileges.
429 — Failure. Request rate limit exceeded.
500 — Failure. Internal error occurred.
Response Payload:
With a successful API request (i.e. code 200), the response payload contains the
requestId
andstatus
elements in JSON format. If an error occurs, the response payload may contain details about the error.If the COPY INTO <table> statement in the pipe definition includes the PATTERN copy option, the
unmatchedPatternFiles
attribute lists any files submitted in the header that did not match the regular expression and were therefore skipped.
Load history reports¶
The Snowpipe API provides REST endpoints for fetching load reports.
Endpoint: insertReport
¶
Retrieves a report of files submitted via insertFiles
whose contents were recently ingested into a table. Note that for large files, this may only be part of the file.
Note the following limitations for this endpoint:
The 10,000 most recent events are retained.
Events are retained for a maximum of 10 minutes.
An event occurs when data from a file submitted via insertFiles
has been committed to the table and is available to queries. The insertReport
endpoint can be thought of like the UNIX command tail. By calling this command repeatedly, it is possible to see the full history of events on a pipe over time. Note that the command must be called often enough to not miss events. How often depends on the rate files are sent to insertFiles
.
- method
GET
- get url
https://<account_identifier>.snowflakecomputing.com/v1/data/pipes/<pipeName>/insertReport?requestId=<requestId>&beginMark=<beginMark>
- header fields
Accept: text/plain or application/json
Authorization : BEARER <jwt_token>
- get body
A JSON object with the following attributes:
Attribute
Required
Description
account_identifier
Yes
Unique identifier for your Snowflake account.
The preferred format of the account identifier is as follows:
organization_name-account_name
Names of your Snowflake organization and account. For details, see Format 1 (preferred): Account name in your organization.
Alternatively, specify your account locator, along with the region and cloud platform where the account is hosted, if required. For details, see Format 2: Account locator in a region.
pipeName
Yes
Case-sensitive, fully-qualified pipe name. For example,
myDatabase.mySchema.myPipe
.requestId
No
String used to track requests through the system. We recommend providing a random string with each request, e.g. a UUID.
beginMark
Yes
Marker, returned by a previous call to
insertReport
, that can be used to reduce the number of repeated events seen when repeatedly callinginsertReport
. Note that this is a hint, and repeated events may occasionally still be returned.
- response body
Response Codes:
200 — Success. Report returned.
400 — Failure. Invalid request due to an invalid format, or limit exceeded.
404 — Failure.
pipeName
not recognized.This error code can also be returned if the role used when calling the endpoint does not have sufficient privileges. For more information, see Granting access privileges.
429 — Failure. Request rate limit exceeded.
500 — Failure. Internal error occurred.
Response Payload:
A success response (200) contains information about files that have recently been added to the table. Note that this report may only represent a portion of a large file.
For example:
{ "pipe": "TESTDB.TESTSCHEMA.pipe2", "completeResult": true, "nextBeginMark": "1_39", "files": [ { "path": "data2859002086815673867.csv", "stageLocation": "s3://mybucket/", "fileSize": 57, "timeReceived": "2017-06-21T04:47:41.453Z", "lastInsertTime": "2017-06-21T04:48:28.575Z", "rowsInserted": 1, "rowsParsed": 1, "errorsSeen": 0, "errorLimit": 1, "complete": true, "status": "LOADED" } ] }
Response Fields:
Field
Type
Description
pipe
String
The fully-qualified name of the pipe.
completeResult
Boolean
false
if an event was missed between the suppliedbeginMark
and the first event in this report history. Otherwise,true
.nextBeginMark
String
beginMark
to use on the next request to avoid seeing duplicate records. Note that this value is a hint. Duplicates can still occasionally occur.files
Array
An array of JSON objects, one object for each file that is part of the history response.
path
String
The file path relative to the stage location.
stageLocation
String
Either the stage ID (internal stage) or the S3 bucket (external stage) defined in the pipe.
fileSize
Long
File size, in bytes.
timeReceived
String
Time that this file was received for processing. Format is ISO-8601 in UTC time zone.
lastInsertTime
String
Time that data from this file was last inserted into the table. Format is ISO-8601 in UTC time zone.
rowsInserted
Long
Number of rows inserted into the target table from the file.
rowsParsed
Long
Number of rows parsed from the file. Rows with errors may be skipped.
errorsSeen
Integer
Number of errors seen in the file
errorLimit
Integer
Number of errors allowed in the file before it is considered failed (based on ON_ERROR copy option).
firstError
[1]String
Error message for the first error encountered in this file.
firstErrorLineNum
[1]Long
Line number of the first error.
firstErrorCharacterPos
[1]Long
Character position of the first error.
firstErrorColumnName
[1]String
Column name where the first error occurred.
systemError
[1]String
General error describing why the file was not processed.
complete
Boolean
Indicates whether the file was completely processed successfully.
status
String
Load status for the file:
LOAD_IN_PROGRESS: Part of the file has been loaded into the table, but the load process has not completed yet.
LOADED: The entire file has been loaded into the table.
LOAD_FAILED: The file load failed.
PARTIALLY_LOADED: Some rows from this file were loaded successfully, but others were not loaded due to errors. Processing of this file is completed.
[1] Values are only supplied for these fields when files include errors.
Endpoint: loadHistoryScan
¶
Fetches a report about ingested files whose contents have been added to table. Note that for large files, this may only be part of the file. This endpoint differs from insertReport
in that it views the history between two points in time. There is a maximum of 10,000 items returned, but multiple calls can be issued to cover the desired time range.
Important
This endpoint is rate limited to avoid excessive calls. To help avoid exceeding the rate limit (error code 429), we recommend relying more heavily on insertReport
than loadHistoryScan
. When calling loadHistoryScan
, specify the most narrow time range that includes a set of data loads. For example, reading the last 10 minutes of history every 8 minutes would work well. Trying to read the last 24 hours of history every minute will result in 429 errors indicating a rate limit has been reached. The rate limits are designed to allow each history record to be read a handful of times.
For a more comprehensive view, without these limits, Snowflake provides an Information Schema table function, COPY_HISTORY, that returns the load history of a pipe or table.
- method
GET
- get url
https://{account}.snowflakecomputing.com/v1/data/pipes/{pipeName}/loadHistoryScan?startTimeInclusive=<startTime>&endTimeExclusive=<endTime>&requestId=<requestId>
- header fields
Accept: text/plain or application/json
Authorization : BEARER <jwt_token>
- get body
A JSON object with the following attributes:
Attribute
Required
Description
account
Yes
Account identifier for your Snowflake account.
pipeName
Yes
Case-sensitive, fully-qualified pipe name. For example,
myDatabase.mySchema.myPipe
.startTimeInclusive
Yes
Timestamp in ISO-8601 format. Start of the time range to retrieve load history data.
endTimeExclusive
No
Timestamp in ISO-8601 format. End of the time range to retrieve load history data. If omitted, then CURRENT_TIMESTAMP() is used as the end of the range.
requestId
No
String used to track requests through the system. We recommend providing a random string with each request (e.g. a UUID).
- response body
Response Codes:
200 — Success. Load History scan results are returned.
400 — Failure. Invalid request due to an invalid format, or limit exceeded.
404 — Failure.
pipeName
not recognized.429 — Failure. Request rate limit exceeded.
500 — Failure. Internal error occurred.
Response Payload:
A success response (200) contains information about files that have recently been added to the table. Note that this report may only represent a portion of a large file.
For example:
{ "pipe": "TESTDB.TESTSCHEMA.pipe2", "completeResult": true, "startTimeInclusive": "2017-08-25T18:42:31.081Z", "endTimeExclusive":"2017-08-25T22:43:45.552Z", "rangeStartTime":"2017-08-25T22:43:45.383Z", "rangeEndTime":"2017-08-25T22:43:45.383Z", "files": [ { "path": "data2859002086815673867.csv", "stageLocation": "s3://mystage/", "fileSize": 57, "timeReceived": "2017-08-25T22:43:45.383Z", "lastInsertTime": "2017-08-25T22:43:45.383Z", "rowsInserted": 1, "rowsParsed": 1, "errorsSeen": 0, "errorLimit": 1, "complete": true, "status": "LOADED" } ] }
Response Fields:
Field
Type
Description
pipe
String
Fully-qualified name of the pipe.
completeResult
Boolean
false
if the report is incomplete (i.e. the number of entries in the specified time range exceeds the 10,000 entry limit). Iffalse
, the user can specify the currentrangeEndTime
value as thestartTimeInclusive
value for the next request to proceed to the next set of entries.startTimeInclusive
String
Starting timestamp (in ISO-8601 format) provided in the request.
endTimeExclusive
String
Ending timestamp (in ISO-8601 format) provided in the request.
rangeStartTime
String
Timestamp (in ISO-8601 format) of the oldest entry in the files included in the response.
rangeEndTime
String
Timestamp (in ISO-8601 format) of the latest entry in the files included in the response.
files
Array
An array of JSON objects, one object for each file that is part of the history response. Within the array, the response fields are the same as those returned in the
insertReport
response.