External Function Concepts

This topic provides a high-level description of how external functions work.

How External Functions Work


Snowflake does not call a remote service directly. Instead, Snowflake calls the remote service through a cloud provider’s native HTTPS proxy service, for example, Amazon API Gateway on AWS.

The main steps to call an external function are:

  1. A user’s client program passes Snowflake a SQL statement that calls an external function.

  2. When evaluating the external function as part of the query execution, Snowflake reads the external function definition, which contains the URL of the proxy service and the name of the API integration that contains authentication information for that proxy service.

  3. Snowflake reads information from the API integration and composes an HTTP POST request that contains:

    • HTTP header information. (Details are documented in CREATE EXTERNAL FUNCTION.)

    • The data to be processed. This data is in JSON format.

    • The proxy service “resource” to use. The resource contains information about the remote service, such as the location of that service.

    • The authentication information for that proxy service resource.

    The POST request is sent to the proxy service.

  4. The proxy service receives the POST and then processes and forwards the request to the actual remote service. You can loosely think of the proxy service and resource as a “relay function” that calls the remote service.

  5. The remote service processes the data and returns the result, which is passed back through the chain to the original SQL statement.

  6. If the remote service responds with an HTTP code to signal asynchronous processing, then Snowflake sends one or more HTTP GET requests to retrieve the result from the remote service. Snowflake continues to send GET requests as long as it receives the response code to keep requesting, or until the external function times out or returns an error.
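The request and response bodies exchanged in steps 3 through 5 can be sketched as a minimal Lambda-style handler. The batch format shown (a "data" array of rows, each beginning with a row number) follows Snowflake's documented format for external functions; the upper-casing transformation itself is an arbitrary example:

```python
import json

def handler(event, context=None):
    """Minimal sketch of a remote service handler (Lambda-style).

    Snowflake sends a POST whose body is JSON of the form
    {"data": [[row_number, arg1, ...], ...]} and expects a response
    body of the same shape: one output row per input row, in order,
    each beginning with the same row number.
    """
    body = json.loads(event["body"])
    out_rows = []
    for row in body["data"]:
        row_number, text = row[0], row[1]
        # Example transformation: upper-case the input string.
        out_rows.append([row_number, text.upper()])
    return {"statusCode": 200, "body": json.dumps({"data": out_rows})}
```

Returning the row number with each result lets Snowflake match outputs back to the input rows of the query, regardless of how rows were batched.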

This diagram illustrates the basic steps described above. Some details for asynchronous handling are not shown.

Illustration of the flow of data from external function call to remote service.

Synchronous Remote Service

Before a user can call an external function, developers and Snowflake account administrators must configure Snowflake to access the proxy service. Typically, the steps are done in approximately the order shown below (starting from the right-hand side of the diagram above and moving leftward towards Snowflake).

  1. A developer must write the remote service, and that remote service must be exposed via the HTTPS proxy service. For example, the remote service might be a Python function running on AWS Lambda and exposed via a resource in the Amazon API Gateway.

  2. In Snowflake, an ACCOUNTADMIN or a role with the CREATE INTEGRATION privilege must create an “API integration” object that contains authentication information that enables Snowflake to communicate with the proxy service. The API integration is created with the SQL command CREATE API INTEGRATION.

  3. A Snowflake user must execute the SQL command CREATE EXTERNAL FUNCTION. The user must use a role that has USAGE privilege on the API integration and has sufficient privileges to create functions.


    The CREATE EXTERNAL FUNCTION command does not actually create an external function in the sense of loading code that will be “executed outside Snowflake”. Instead, the CREATE EXTERNAL FUNCTION command creates a database object that indirectly references the code that executes outside Snowflake. More precisely, the CREATE EXTERNAL FUNCTION command creates an object that contains:

    • The URL of the resource in the HTTPS proxy service that acts as a relay function.

    • The name of the API integration to use to authenticate to the proxy service.

    • A name that is effectively an alias for the remote service. This alias is used in SQL commands, for example SELECT MyAliasForRemoteServiceXYZ(col1) ...;

The alias in Snowflake, the HTTPS proxy service resource’s name, and the remote service’s name can all be different. (Using the same name for all three can simplify administration, however.)

Although the steps described above are the most common way of executing an external function, some variations are allowed. For example:

  • The remote service might not be the final step in the chain; the remote service could call yet another remote service to do part of the work.

  • If the remote service doesn’t accept and return JSON-formatted data, then the HTTPS proxy service’s resource (the relay function) could convert the data from JSON format to another format (and convert the returned data back to JSON).

  • Although Snowflake recommends that the remote service behave as a true function (i.e. a piece of code that accepts 0 or more input parameters and returns an output) that has no side effects and keeps no state information, this is not strictly required. The remote service could perform other tasks, for example sending alerts if a value (such as a temperature reading in the data) is dangerously high. In rare cases, the remote service might keep state information, for example the total number of alerts issued.
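The second variation above, a relay that adapts a non-JSON remote service, might look like the following sketch. Here `legacy_csv_service` is a hypothetical stand-in for a service that only accepts and returns CSV:

```python
import csv
import io
import json

def legacy_csv_service(csv_text):
    # Hypothetical stand-in for a remote service that speaks CSV
    # rather than JSON: doubles the numeric value in each row.
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out)
    for row_number, value in reader:
        writer.writerow([row_number, int(value) * 2])
    return out.getvalue()

def relay(snowflake_body):
    """Convert Snowflake's JSON batch to CSV, call the CSV-only
    service, and convert the result back to the JSON batch format."""
    rows = json.loads(snowflake_body)["data"]
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    result_csv = legacy_csv_service(buf.getvalue())
    out_rows = [[int(n), int(v)]
                for n, v in csv.reader(io.StringIO(result_csv))]
    return json.dumps({"data": out_rows})
```

In a real deployment this conversion would run in the proxy service's integration layer (for example, a mapping template or a thin relay function), so neither Snowflake nor the legacy service needs to change.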

Asynchronous Remote Service

An asynchronous remote service involves the same components (client, Snowflake, proxy service, and remote service) and the same general steps as described above. However, the details of the HTTP requests and responses are different.

Asynchronous remote services are useful when a remote service exceeds the timeouts built into components such as the proxy service. For example, Amazon’s API Gateway has a request timeout (which currently defaults to 29 seconds).

If your remote service frequently exceeds timeouts, then you can write your remote service to be compatible with the Snowflake asynchronous handling capability. This allows Snowflake to poll the remote service until results are available.
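The polling protocol can be sketched as a pair of handlers with an in-memory job table. This is an illustrative sketch only: a real service would persist state durably and perform the work in the background, and the exact status codes and headers are defined in the Snowflake documentation on asynchronous remote services:

```python
import json

# In-memory job store keyed by batch id; a real service would use
# durable storage keyed by Snowflake's per-batch identifier.
JOBS = {}

def post_handler(batch_id, body):
    """Accept a batch and immediately return HTTP 202 (in progress)."""
    rows = json.loads(body)["data"]
    JOBS[batch_id] = {"done": False, "rows": rows}
    return 202, ""

def finish(batch_id):
    # Stand-in for the background worker completing the batch.
    job = JOBS[batch_id]
    job["result"] = [[n, v * 10] for n, v in job["rows"]]
    job["done"] = True

def get_handler(batch_id):
    """Snowflake polls with GET; 202 means keep polling, 200 is done."""
    job = JOBS[batch_id]
    if not job["done"]:
        return 202, ""
    return 200, json.dumps({"data": job["result"]})
```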

The topic Synchronous vs. Asynchronous Remote Services describes the HTTP requests and responses that are used to implement an asynchronous remote service.


The frequency with which queries hit timeouts depends in part upon the scalability of the remote service. If your remote service times out frequently, then see also the discussion of Scalability.

Scalability

The remote service, the proxy service, and any other steps between Snowflake and the remote service must be able to handle the peak workloads sent to them.

Some cloud platform providers have default usage limits or other quotas for proxy services and remote services, which can limit the throughput of external function calls.

Larger Snowflake warehouse sizes can increase the concurrency with which requests are sent, which might exceed the proxy service’s quota.

Users can see how many times Snowflake had to retry sending request batches (due to throttling or other errors) for a query by looking at the value of “Retries due to transient errors” in the query profile.

Scalability of the Remote Service

Developers who write remote services should consider:

  • The frequency with which the remote service will be called.

  • The number of rows sent per call.

  • The resources required to process each row.

  • The time distribution of calls (peak vs. average).

Capacity might need to increase over time as the callers change from a few developers and testers to an entire organization. If the remote service is used by multiple organizations, capacity might need to increase as the number of organizations increases. Furthermore, as the number and diversity of organizations increase, the size and timing of workloads might become more difficult to predict.

The remote service provider is responsible for providing enough capacity to handle peak workloads. Different techniques can be used to scale the service. If the remote service is managed by the author of the remote service, then the author might need to explicitly provision the service with enough capacity to handle peaks. Alternatively, the author might decide to use a hosted auto-scaled/elastic service, such as AWS Lambda.

Remote services should return HTTP response code 429 when overloaded. If Snowflake sees HTTP 429, Snowflake scales back the rate at which it sends rows, and retries sending batches of rows that were not processed successfully.
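A remote service can signal overload with a simple admission check before doing any work. The capacity limit below is a hypothetical value for the sketch; a real service would base it on its actual resources:

```python
import json

MAX_IN_FLIGHT = 4                  # hypothetical capacity limit
state = {"in_flight": 0}           # batches currently being processed

def service_handler(body):
    """Return HTTP 429 when the service is at capacity; on seeing 429,
    Snowflake slows its send rate and retries the unprocessed batch."""
    if state["in_flight"] >= MAX_IN_FLIGHT:
        return 429, ""
    state["in_flight"] += 1
    try:
        rows = json.loads(body)["data"]
        # Example work: add 1 to the numeric argument of each row.
        return 200, json.dumps({"data": [[n, v + 1] for n, v in rows]})
    finally:
        state["in_flight"] -= 1
```

Rejecting the whole batch up front is safe because Snowflake retries the batch as a unit; partially processing a batch and then failing would complicate retry semantics.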

For more information about troubleshooting scalability issues, see Scalability.

If remote service invocations time out because each individual invocation takes a long time, rather than because the system is generally overloaded, then see the description of how to build an Asynchronous Remote Service.

Scalability of the Proxy Service

The proxy service should also be scalable. Fortunately, proxy services provided by major cloud providers are generally scalable.

However, some proxy services, including Amazon API Gateway and Azure API Management, have default usage limits. When the request rate exceeds the limit, these proxy services throttle requests. If necessary, you might need to ask AWS or Azure to increase your quota on your proxy service.

Users who develop or administer external functions should remember the following platform-specific information:

Amazon API Gateway

The Amazon API Gateway is itself a managed AWS service, which auto-scales to users’ workloads. Users should be familiar with the various limits of Amazon API Gateway.

The Amazon API Gateway can be configured to help scale the remote service. Specifically, the API Gateway can be configured to enable caching and/or throttling of requests to reduce the load on the remote service if needed.

Because throttling can affect timeouts and retries, users might also want to review information about how Snowflake handles timeouts and retries.

Azure API Management Service

For Azure API Management, the limits depend on the SKU chosen for the service. The limits are documented in the API Management limits section of the Azure Subscription Service Limits.

Because throttling can affect timeouts and retries, users might also want to review information about how Snowflake handles timeouts and retries.


Concurrency

Resource requirements depend upon the way that rows are distributed across calls (many parallel calls with a few rows each vs. one call with the same total number of rows). A system that supports high capacity does not necessarily support high concurrency, and vice versa. You should estimate the peak concurrency required, as well as the largest reasonable individual workloads, and provide enough resources to handle both types of peaks.

Furthermore, the concurrency estimate should take into account that Snowflake can parallelize external function calls. A single query from a single user might cause multiple calls to the remote service in parallel. Several factors affect the number of concurrent calls from Snowflake to a proxy service or remote service, including:

  • The number of concurrent users who are running queries with external functions.

  • The size of each user’s query.

  • The number of servers in the virtual warehouse (i.e. the warehouse size).

  • The number of warehouse clusters.
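A crude back-of-envelope estimate can combine these factors. The function below is an illustrative assumption, not a Snowflake-documented formula; in particular, the per-server parallelism factor and the example server counts are hypothetical:

```python
def peak_concurrent_requests(queries_per_cluster, servers_per_warehouse,
                             clusters, calls_per_server=1):
    """Crude upper bound on simultaneous requests reaching the proxy
    service. calls_per_server is an assumed parallelism factor per
    warehouse server, not a documented Snowflake constant."""
    return (queries_per_cluster * servers_per_warehouse
            * clusters * calls_per_server)

# e.g. 5 concurrent queries per cluster, 4 servers per warehouse,
# 2 clusters -> up to roughly 40 simultaneous requests
estimate = peak_concurrent_requests(5, 4, 2)
```

An upper bound like this is most useful for checking whether a proxy service quota (for example, a requests-per-second limit) could plausibly be hit at peak, before load testing confirms the real numbers.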

Handling concurrency properly can be particularly complex if external functions have side effects. The results can vary depending upon the order in which users’ rows are processed. (Snowflake recommends that you avoid writing or using remote services that have side effects.)


Reliability

Depending upon where the remote service is running, you might need to consider:

  • Reliability.

  • Error-handling.

  • Debugging.

  • Upgrading (if the remote service might add new features or need bug fixes).

If the remote service is not stateless, you might also need to consider recovery after failure. (Snowflake strongly recommends that remote services be stateless.)

For information about timeouts and retries, see Account for Timeout Errors and Do Not Assume that the Remote Service is Passed Each Row Exactly Once.