Introduction to External Functions¶
This topic describes external functions, which call executable code that is developed, maintained, stored, and executed outside Snowflake.
This topic helps you:
Understand what an external function is.
Decide whether an external function is the best way for you to implement a UDF (user-defined function).
Choose the cloud platform for your external function.
In this Topic:
What is an External Function?¶
An external function calls code that executes outside Snowflake; the executed code is known as a remote service.
Information sent to a remote service is usually relayed through a proxy service. The proxy service can increase security by authenticating requests to the remote service.
Some security-related information is stored in an API integration, which is a database object stored inside Snowflake.
The diagram below shows the basic information flow from a client program, through Snowflake, and to the remote service:
Users can write and call their own remote services, or call remote services written by third parties. These remote services can be written using any HTTP server stack, including cloud serverless compute services such as AWS Lambda.
From the perspective of a user running a SQL statement, an external function behaves like any other UDF . External functions follow these rules:
External functions return a value.
External functions can accept parameters.
Inside Snowflake, an external function is represented as a database object. That object is created in a specific database and schema, and can be referenced using dot notation (e.g.
An external function can appear in any clause of a SQL statement in which other types of UDF can appear. For example:
select my_external_function_2(column_1, column_2) from table_1; select col1 from table_1 where my_external_function_3(col2) < 0; create view view1 (col1) as select my_external_function_5(col1) from table9;
An external function can be part of a more complex expression:
select upper(zipcode_to_city_external_function(zipcode)) from address_table;
The returned value can be a compound value, such as a VARIANT that contains JSON.
External functions can be overloaded; two different functions can have the same name but different signatures (different numbers or data types of input parameters).
How External Functions Work¶
Snowflake does not call a remote service directly. Instead, Snowflake calls the remote service through a cloud provider’s native HTTPS proxy service, for example API Gateway on AWS.
The main steps to call an external function are:
A user’s client program passes Snowflake a SQL statement that calls an external function.
When evaluating the external function as part of the query execution, Snowflake reads the external function definition, which contains the URL of the proxy service and the name of the API integration that contains authentication information for that proxy service.
Snowflake reads information from the API integration and composes an HTTP POST request that contains:
HTTP header information. (Details are documented in CREATE EXTERNAL FUNCTION.)
The data to be processed. This data is in JSON format.
The proxy service “resource” to use. The resource contains information about the remote service, such as the location of that service.
The authentication information for that proxy service resource.
The POST request is sent to the proxy service.
The proxy service receives the POST and then processes and forwards the request to the actual remote service. You can loosely think of the proxy service and resource as a “relay function” that calls the remote service.
The remote service processes the data and returns the result, which is passed back through the chain to the original SQL statement.
If the remote service responds with an HTTP code to signal asynchronous processing, then Snowflake sends one or more HTTP GET requests to retrieve the result from the remote service. Snowflake continues to send GET requests as long as it receives the response code to keep requesting, or until the external function times out or returns an error.
Frequently-used Terms for External Functions¶
Below are definitions for frequently-used terms:
An asynchronous remote service can be polled while the caller waits for results. Contrast with synchronous .
Asynchronous handling reduces sensitivity to timeouts.
For more information about asynchronous services, see Microsoft’s description of Aynchronous Request-Reply Pattern . (The information is not limited to Microsoft Azure.)
For more details about how users implement asynchronous behavior with Snowflake, see:
A batch is a group of 1 or more rows sent to the remote service.
Typically, when a query has a large number of rows to send to a remote service, the rows are split into batches. Batches typically allow more parallelism and faster queries. In some cases, batches reduce overloading of the remote service.
A remote service returns 1 batch of rows for each batch received. For a scalar external function, the number of rows in the returned batch is equal to the number of rows in the received batch.
Each batch has a unique batch ID, which is included in each request sent from Snowflake to the remote service.
Retry operations (e.g. due to timeouts) are typically done at the batch level.
An external function is a Snowflake construct that makes a remote service appear as though it is inside Snowflake, so that the remote service can be called by a SQL statement. In other words, an external function makes a remote service look like a local function.
An external function is a set of information that Snowflake uses to call the remote service. This information includes the URL of the proxy service (for example, Amazon API Gateway) that relays information to and from the remote service. This information is specified as part of the “CREATE EXTERNAL FUNCTION” command and is stored inside a database object.
Snowflake does not call a remote service directly; instead, Snowflake calls a remote service indirectly via a proxy service. A proxy service relays a request or function call (including data) from one piece of software to another, for example from Snowflake to a remote service. Typically, it also relays the returned value(s) back to the caller. Using a proxy service increases flexibility and security.
The Amazon API Gateway is an example of a proxy service.
A remote service is stored and executed outside Snowflake, and returns a value. For example, remote services can be implemented as:
An AWS Lambda function.
An HTTPS server (e.g. Node.js) running on an EC2 instance.
To be called by the Snowflake external function feature, the remote service must:
Expose an HTTPS endpoint.
Accept JSON inputs and return JSON outputs.
For more information about Snowflake-compatible HTTP headers and JSON formatted data, see Remote Service Input and Output Data Formats.
A synchronous remote service does not send any response until the results are ready. It can’t be polled. Contrast with asynchronous .
Synchronous code is easier to implement than asynchronous code.
A (UDF (user-defined function)) is a function written by a user or by a third party, rather than by Snowflake. A UDF can be:
Internal: the executable code is stored and executed inside Snowflake.
External: an external function, as defined in this topic (i.e. a construct that makes a remote service look like a local function).
Advantages of External Functions¶
External functions have the following advantages over other UDFs:
The code for the remote service can be written in languages that other UDFs cannot be written in, including:
Remote services can use functions and libraries that can’t be accessed by internal UDFs.
Remote services can interface with commercially available third-party libraries, such as machine-learning scoring libraries.
Developers can write remote services that can be called both from Snowflake and from other software written to use the same interface.
Limitations of External Functions¶
External functions have the following limitations:
Snowflake calls remote services indirectly through a cloud HTTP proxy service (such as the Amazon API gateway), so the remote service for an external function must be callable from a proxy service. Fortunately, almost any function that can act as an HTTPS endpoint can be accessed as an external function via a proxy service. The function author must program the proxy service to call the remote service (e.g. a function running on AWS Lambda).
Some cloud platforms might have specific requirements. For example, on AWS, external functions require regional endpoints or private endpoints. For more details, see Supported Platforms. For more details about Amazon API Gateway regional and private endpoints, see Choose your Endpoint Type: Regional Endpoint vs. Private Endpoint.
Currently, external functions cannot be shared with data consumers via Secure Data Sharing.
Because the remote service is opaque to Snowflake, the optimizer might not be able to perform some optimizations that it could perform for equivalent internal functions.
External functions have more overhead than internal functions (both built-in functions and internal UDFs) and usually execute more slowly.
Currently, external functions must be scalar functions. A scalar external function returns a single value for each input row.
External functions cannot be used in the following situations:
DEFAULTclause of a
CREATE TABLEstatement. In other words, the default value for a column cannot be an expression that calls an external function. If you try to include an external function in a
DEFAULTclause, then the
CREATE TABLEstatement fails.
A database object (e.g. table, view, UDF, or masking policy) shared via Secure Data Sharing. External functions are currently incompatible with sharing.
Only functions, not stored procedures, can be written using the external functions feature.
Billing for External Functions Usage¶
Using external functions incurs normal costs associated with:
In addition, you might need to pay indirect or third-party charges, including:
Charges by the provider of the remote service.
Charges can vary from vendor to vendor.
Additional Considerations for External Functions¶
Before an external function can be called the first time, an administrator must do some configuration work. This work requires knowledge of the cloud platform (e.g. AWS), especially security.
Writing a custom remote service typically requires knowledge of topics such as HTTP and REST APIs.
External functions can raise additional security issues. For example, if you call a third party’s function, that party could keep copies of the data passed to the function.
Platforms that Support Calling an External Function¶
An external function can be called from a Snowflake account on any cloud platform that Snowflake supports:
Amazon Web Services (AWS)
Google Cloud Platform (GCP)
The SQL syntax for calling an external function is the same on all platforms.
The SQL statements (CREATE EXTERNAL FUNCTION and CREATE API INTEGRATION) that configure access to these services are the same for all platforms. However, the clauses within these statements vary, depending upon the platforms hosting the proxy service and the remote service.
Platforms that Support Creating an External Function’s Remote Service and Proxy Service¶
Although an external function can be called from any platform, the external function’s remote service and proxy service must each be created on specific supported platforms.
In many cases, the platform and account for the remote service are the same as the platform and account for the proxy service. However, that is not required. For example, a SQL query could call an Azure Function (remote service) via an AWS API Gateway (proxy service). The SQL query itself could be running on a Snowflake instance running on GCP.
Platforms that Support a Remote Service¶
You need an HTTP server stack to host the remote service. Any HTTP server stack that can support the remote service should be compatible with external functions.
To create your remote service, you typically need:
An account with a cloud platform’s provider (e.g. a Microsoft Azure account to create an Azure Function). This account provides storage and compute services for the remote service. This account is separate from your Snowflake account.
Snowflake provides instructions for creating a remote service as:
An AWS Lambda function.
A Microsoft Azure function.
Snowflake is compatible with remote services running on:
Google Cloud Platform (GCP).
However, Snowflake does not yet support Google’s API gateway as a proxy service. To access a remote service running on GCP, you must create a proxy service on a supported platform (see below).
Platforms that Support a Proxy Service¶
You need an instance of a native HTTP proxy service on a cloud platform.
To configure your proxy service, you typically need:
An account with a cloud platform’s provider (e.g. an Amazon account to use AWS). This account provides storage and compute services for the proxy service. This account is separate from your Snowflake account.
A cloud platform role that has the privileges required to configure a proxy service. This cloud platform role is separate from your Snowflake role(s).
The following HTTPS proxy services are supported:
Amazon API Gateway.
Microsoft Azure API Management Service.
The sections below contain platform-specific information that users should be aware of before choosing a platform.
This feature supports only regional and private endpoints for the Amazon API Gateway. (For a description of the different types of endpoints, see https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-basic-concept.html)
Snowflake external functions and API integrations do not support AWS custom domains. To access an Amazon API Gateway from Snowflake, use the default URL generated by AWS, which looks similar to the following: