Creating an External Function on AWS

This document shows one way to create an external function on the Amazon Web Services (AWS) cloud platform, including:

  • Creating a remote service (Lambda Function on AWS).

  • Creating a proxy service (on Amazon API Gateway).

  • Creating an API Integration in Snowflake.

  • Creating an external function in Snowflake.

  • Calling the external function.

These instructions assume that you are already familiar with AWS administration. These instructions specify the general steps that you need to execute, but do not walk you through the details of the AWS administration console because the details could change.

In this Topic:

Prerequisites

You need:

  • An account with AWS, including privileges to:

    • Create AWS roles via IAM (identity and access management).

    • Create AWS Lambda functions.

    • Create an API Gateway endpoint.

  • A Snowflake account in which you have ACCOUNTADMIN privileges or a role with the CREATE INTEGRATION privilege.

This document assumes that you are an experienced AWS administrator.

Tip

During the creation of the external function, you must copy authentication-related information from Snowflake to AWS, and from AWS to Snowflake.

As you follow the instructions below, Snowflake recommends that you record information in the worksheet below for use in later steps:

AWS Account ID.............: _____________________________________________
Lambda Function Name.......: _____________________________________________
New IAM Role Name..........: _____________________________________________
New IAM Role ARN...........: _____________________________________________
Snowflake VPC ID (optional):_____________________________________________
API Gateway Resource Name..: _____________________________________________
Resource Invocation URL....: _____________________________________________
Method Request ARN.........: _____________________________________________
API_AWS_IAM_USER_ARN.......: _____________________________________________
API_AWS_EXTERNAL_ID........: _____________________________________________

Step 1: Create the Remote Service (Lambda Function on AWS)

There are multiple possible ways to create a remote service. This section shows how to create a remote service that is implemented as a Python function executing in AWS Lambda.

This sample Python-language function merely returns its inputs.

This function accepts and returns data in the same format (JSON) that Snowflake sends and reads.

This Python function receives two parameters, event and context. The event parameter includes many sub-fields, one of which is body. The body is a dictionary that includes a key named data; the corresponding value for data is a string that holds the data that Snowflake sent in JSON format. Because AWS Lambda conveniently processes the HTTP POST request sent by Snowflake, extracts the body, and passes the body inside the event parameter, this example function does not need to parse the entire HTTP POST request.

After this Python function extracts the body as a string, the function calls a JSON library function to convert the string to a Python data structure. The function then extracts individual rows from that data structure, processes them, and returns a value for each of them.

The JSON for a typical return value from an AWS Lambda function looks like the following:

{
"statusCode": <http_status_code>,
"body":
        {
            "data":
                  [
                      [ 0, <value> ],
                      [ 1, <value> ]
                      ...
                  ]
        }
}

The data of the return value matches the format described earlier for the input data. On AWS, the convention for an HTTP-compatible service is to return the body inside a JSON object that also includes the HTTP status code.

To create this AWS Lambda function, follow the steps below:

  1. Log into the AWS Management Console, if you haven’t already.

  2. Record your AWS account ID in the template above.

    If you need to look up your AWS account ID, follow the AWS instructions.

  3. Select “Lambda”.

  4. Select “Create function”.

  5. Enter a function name.

    Record this name in the template above on the line labeled “Lambda Function Name”.

  6. Select the language to use. For this example, choose Python 3.7.

  7. Choose or create an execution role for this function.

    Select the appropriate option(s), typically “Create a new role with basic Lambda permissions”.

    (This role is separate from your cloud account role and separate from your Snowflake role(s).)

  8. Open the window into which you can paste your function’s source code.

  9. Enter the code for the function. If you have not already written your own function, you can replace the default function code with the code below, which will echo its input. You can replace or update this code later when you are ready to create a custom function.

    This sample code assumes that you are using Lambda proxy integration, as Snowflake recommends in the instructions to create the API Gateway endpoint.

    import json
    
    def lambda_handler(event, context):
    
        # 200 is the HTTP status code for "ok".
        status_code = 200
    
        # The return value will contain an array of arrays (one inner array per input row).
        array_of_rows_to_return = [ ]
    
        try:
            # From the input parameter named "event", get the body, which contains
            # the input rows.
            event_body = event["body"]
    
            # Convert the input from a JSON string into a JSON object.
            payload = json.loads(event_body)
            # This is basically an array of arrays. The inner array contains the
            # row number, and a value for each parameter passed to the function.
            rows = payload["data"]
    
            # For each input row in the JSON object...
            for row in rows:
                # Read the input row number (the output row number will be the same).
                row_number = row[0]
    
                # Read the first input parameter's value. For example, this can be a
                # numeric value or a string, or it can be a compound value such as
                # a JSON structure.
                input_value_1 = row[1]
    
                # Read the second input parameter's value.
                input_value_2 = row[2]
    
                # Compose the output based on the input. This simple example
                # merely echoes the input by collecting the values into an array that
                # will be treated as a single VARIANT value.
                output_value = ["Echoing inputs:", input_value_1, input_value_2]
    
                # Put the returned row number and the returned value into an array.
                row_to_return = [row_number, output_value]
    
                # ... and add that array to the main array.
                array_of_rows_to_return.append(row_to_return)
    
            json_compatible_string_to_return = json.dumps({"data" : array_of_rows_to_return})
    
        except Exception as err:
            # 400 implies some type of error.
            status_code = 400
            # Tell caller what this function could not handle.
            json_compatible_string_to_return = event_body
    
        # Return the return value and HTTP status code.
        return {
            'statusCode': status_code,
            'body': json_compatible_string_to_return
        }
    
  10. Optional but strongly recommended: Test your function.

    For the sample Python function provided by Snowflake, use the following test data (replace any default data with the data below):

    {
      "body":
        "{ \"data\": [ [ 0, 43, \"page\" ], [ 1, 42, \"life, the universe, and everything\" ] ] }"
    }
    

    The execution results should be similar to:

    Response:
    {
      "statusCode": 200,
      "body": "{\"data\": [[0, [\"Echoing inputs:\", 43, \"page\"]], [1, [\"Echoing inputs:\", 42, \"life, the universe, and everything\"]]]}"
    }
    ...
    

If the preceding worked correctly, you now have an AWS Lambda function that you can use as the remote service for your external function.

Step 2: Configure the Proxy Service (Amazon API Gateway) and Create the API Integration (in Snowflake)

Configuring Amazon API Gateway as the proxy service requires several steps, including:

  • Creating a new IAM role in your AWS account.

  • Creating an Amazon API Gateway endpoint and configuring it.

  • Securing Your Amazon API Gateway endpoint.

  • Creating an API integration object in Snowflake.

  • Setting up a trust relationship between Snowflake and the new IAM role.

The steps to create these are interleaved because the API integration needs information from the API Gateway (the role’s ARN), and the API Gateway needs information from the API integration (the API_AWS_EXTERNAL_ID and API_AWS_IAM_USER_ARN).

Create a New IAM Role in your AWS Account

For Snowflake to authenticate to your AWS account, a Snowflake-owned IAM (identity and access management) user must be granted permission to assume an IAM role in your AWS account. To do this, a trust relationship must be established. To establish the trust relationship, you must create an IAM role on your AWS account and configure it with the ARN of the Snowflake-owned IAM user, and you must create an API integration object in Snowflake and configure the API integration object with the information about which IAM role to assume.

  1. Create a new IAM role.

  2. When asked to select the type of trusted entity, choose “Another AWS account”.

  3. When asked to “Specify accounts that can use this role”, paste the previously-saved AWS Account ID from the template above.

  4. Click on “Next: Permissions”.

  5. Set permissions (“Attach permissions policies”) if necessary.

  6. Enter a role name.

    • Record the role name as the “New IAM Role Name”.

  7. After you create the role:

    • Record the “Role ARN” as the “New IAM Role ARN” in the template above.

Create and Configure an API in the Amazon API Gateway

Choose your Endpoint Type: Regional Endpoint vs. Private Endpoint

You access a proxy service (such as Amazon API Gateway) via a URI, sometimes referred to as an “endpoint”. For your Amazon API Gateway, you can create either:

  • A regional endpoint.

  • A private endpoint.

A regional endpoint can be accessed across regions or even across clouds. Your Snowflake instance, your proxy service, and your remote service can all be in different regions or even on different cloud platforms. For example, a Snowflake instance running on Azure could send requests to an Amazon API Gateway regional endpoint, which in turn could forward data to a remote service running on GCP.

A private endpoint can be configured to allow access from only a Snowflake VPC (Virtual Private Cloud) in the same AWS region via PrivateLink.

For more details about the different types of endpoints on AWS, see:

If you want to use a private endpoint, and you are not sure which region you are using, you can look up your region by doing either of the following:

  • Call the SQL function CURRENT_REGION(), e.g. SELECT CURRENT_REGION().

  • Check your Snowflake account hostname, which normally indicates the cloud provider and region. For more information about account hostnames, regions, and cloud providers, see Supported Regions.

To use a private endpoint, your account must meet the following requirements:

  • Business Critical (or higher) edition of Snowflake.

On AWS, Snowflake supports connecting Snowflake accounts to AWS VPCs. Private endpoints enable communication between VPCs (Virtual Private Clouds) via PrivateLink. For more information about Snowflake and AWS PrivateLink, see:

Create the API Gateway Endpoint

Before you create and configure your proxy service, choose whether to use a regional endpoint or a private endpoint. For more information, see Choose your Endpoint Type: Regional Endpoint vs. Private Endpoint.

Note

You can create an external function on non-AWS-hosted instances of Snowflake. If your virtual warehouse is on Azure or GCP (Google Cloud Platform), you can create an external function that accesses a remote service through an Amazon API Gateway.

The steps to create an API Gateway endpoint are below:

  1. If you plan to use a private endpoint, then get the VPC (Virtual Private Cloud) ID by executing the following command in the Snowflake web interface:

    select system$get_snowflake_platform_info();
    

    The output should look similar to the following:

    {"snowflake-vpc-id":["vpc-12345699"]}
    

    Record the VPC ID (e.g. “vpc-12345699”) in the “Snowflake VPC ID” field of the template above.

    (Note: You must use a VPC ID, not a VPC Endpoint ID. VPC Endpoint IDs can change over time.)

  2. In the AWS management console, select API Gateway.

  3. Select Create API.

  4. Select the type of endpoint (regional or private).

    • If you want a regional endpoint, then:

      • Find “REST API” and click on its “Build” button.

    • If you want a private endpoint, then:

      • Find “REST API private” and click on its “Build” button.

    Important

    Make sure that you choose “REST API” or “REST API private”. Do not select “HTTP API” or another option.

  5. Choose the REST protocol.

  6. Select the New API option.

  7. If you are creating a private endpoint, enter the Snowflake VPC ID recorded in the “Snowflake VPC ID” field in the template.

    (If you are not creating a private endpoint, you do not need to enter a VPC ID.)

  8. Finish creating the new API.

  9. Create a resource.

    Record the resource name as the “API Gateway Resource Name”; you will need it later.

  10. Select “Create Method” for this resource, and specify the POST option.

    The “Integration type” should be “Lambda Function”.

  11. Click the checkbox “Use Lambda Proxy Integration”.

    The JSON without Lambda proxy integration would be different from the JSON with Lambda proxy integration. For more information about Lambda proxy integration, see the AWS documentation for:

  12. In the “Lambda Function” field, paste the Lambda Function Name that you recorded earlier.

  13. Save.

  14. Select the “Deploy API” action.

  15. Select or create a stage for this function.

  16. Click on the “POST” and then record the “Invoke URL” for the POST request; put this in the “Resource Invocation URL” field in the template.

    Make sure that the invocation URL includes the name of the resource; if it doesn’t, you might have clicked on the invocation URL for the stage rather than the resource.

Secure Your Amazon API Gateway Endpoint

For an overview of securing proxy service endpoints, such as Amazon API Gateway endpoints, see Secure Your Proxy Service Endpoint.

To secure an Amazon API Gateway endpoint:

  1. Specify that the “Method Request” requires AWS_IAM authorization.

    Record the Method Request ARN in the template above.

  2. Set the resource policy for the API Gateway to specify who is authorized to invoke the gateway endpoint.

    • Regional Endpoint:

      If the endpoint is a regional endpoint, then the resource policy typically looks similar to the following (except that the Principal and Resource need to be customized):

      {
          "Version": "2012-10-17",
          "Statement":
          [
              {
              "Effect": "Allow",
              "Principal":
                  {
                  "AWS": "arn:aws:sts::<12-digit-number>:assumed-role/<external_function_role>/snowflake"
                  },
              "Action": "execute-api:Invoke",
              "Resource": "arn:aws:execute-api:us-west-2:123456789012:ljkfds890a/*/POST/MyResourceName"
              }
          ]
      }
      

      Replace the following portions of the resource policy:

      • Replace the <12-digit-number> with the 12-digit number (AWS account ID) from the New IAM Role ARN.

      • Replace the <external_function_role> with the role name from the New IAM Role Name.

        For example, if your AWS Role Name is:

        arn:aws:iam::987654321098:role/MyNewIAMRole
        

        then the result should be:

        "AWS": "arn:aws:sts::987654321098:assumed-role/MyNewIAMRole/snowflake"
        
      • The “Resource” needs to be set to the Method Request ARN for the resource’s POST command.

        Note

        Setting the Resource to the Method Request ARN specifies that the API Gateway should allow calls to only the specified resource. It is possible to specify a subset of the Method Request ARN as a prefix, which allows multiple resources to be called from the same API Gateway.

        For example, if the Method Request ARN is:

        arn:aws:execute-api:us-west-1:123456789012:a1b2c3d4e5/*/POST/MyResource
        

        then you could specify just the following prefix:

        arn:aws:execute-api:us-west-1:123456789012:a1b2c3d4e5/*/
        
    • Private Endpoint:

      If the endpoint is a private endpoint, then the resource policy needs an additional section titled “Condition” that specifies the VPC (Virtual Private Cloud) ID. (The specified VPC is the VPC used by Snowflake instances in the same region as the private endpoint.) The additional section looks similar to the following, except that the VPC varies:

      "Condition": {
          "StringEquals": {
              "aws:sourceVpc": "vpc-12345699"
          }
      }
      

      After the “Condition” section is included, a resource policy for a private endpoint typically looks similar to the following (except that the Principal, Resource, and VPC information need to be customized):

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": "arn:aws:sts::<12-digit-number>:assumed-role/<external_function_role>/snowflake"
                  },
                  "Action": "execute-api:Invoke",
                  "Resource": "arn:aws:execute-api:us-west-2:123456789012:ljkfds890a/*/POST/MyResourceName",
                  "Condition": {
                      "StringEquals": {
                          "aws:sourceVpc": "vpc-12345699"
                      }
                  }
              }
          ]
      }
      

      To create the JSON for the private endpoint, follow the steps for a regional endpoint, then do the following:

      1. Copy and paste the “Condition” section above. Put this directly below the “Resource” section.

      2. Replace the value of the “aws:sourceVpc” field with the Snowflake VPC ID for your region, which should be recorded in the “Snowflake VPC ID” field of the template.

  3. Deploy the API.

In the next few steps, you create a Snowflake API integration object. Do not close your AWS administration window now; you must return to it later.

Create an API Integration Object in Snowflake

  1. Open up (if you haven’t already) a Snowflake session, typically a Snowflake web interface session.

  2. Use a Snowflake role with ACCOUNTADMIN privileges or the CREATE INTEGRATION privilege, for example:

    use role has_accountadmin_privileges;
    
  3. Type the CREATE API INTEGRATION command to create an API integration. The command should look similar to the following:

    CREATE OR REPLACE API INTEGRATION my_api_integration_01
      api_provider = aws_api_gateway
      api_aws_role_arn = '<new_IAM_role_ARN>'
      api_allowed_prefixes = ('https://')
      enabled = true
    ;
    

    If you are using a private endpoint, the api_provider should be “aws_private_api_gateway”. Otherwise, the api_provider should be “aws_api_gateway”.

    The <new_IAM_role_ARN> should be the New IAM Role ARN that you recorded earlier in the worksheet above.

    The api_allowed_prefixes field should contain the resource invocation URL that you recorded earlier.

    You might also want to customize the name of the API integration, rather than use the value in the example.

    Below is an example of a complete CREATE API INTEGRATION statement:

    create or replace api integration demonstration_external_api_integration_01
        api_provider=aws_api_gateway
        api_aws_role_arn='arn:aws:iam::123456789012:role/my_cloud_account_role'
        api_allowed_prefixes=('https://xyz.execute-api.us-west-2.amazonaws.com/production/')
        enabled=true;
    
  4. Execute the CREATE API INTEGRATION command you typed above.

  5. Execute the DESCRIBE INTEGRATION command.

    DESCRIBE INTEGRATION <my_integration_name>;
    

    For example:

    DESCRIBE INTEGRATION my_api_integration_01;
    
  6. Look for the property named “API_AWS_IAM_USER_ARN” and then record that property’s property_value for later use.

  7. Find the property named “API_AWS_EXTERNAL_ID” and record that property’s property_value for later use.

    Note that the property_value of the API_AWS_EXTERNAL_ID often ends with an equals sign (“=”). That equals sign is part of the value; make sure that you cut and paste it along with the rest of the property_value.

In the next few steps, you will return to your AWS administration window. Do not close your Snowflake administration window now; you must return to it later.

Set Up the Trust Relationship(s) between Snowflake and the New IAM Role

In the AWS Management Console:

  1. Select IAM.

  2. Select Roles.

  3. Look for the New IAM Role Name that you created and recorded earlier, and select it.

  4. Click on the “Trust relationships” tab, then click on the button to edit the trust relationship.

    This should open the Policy Document into which you can add authentication information.

  5. In the Policy Document, find the Statement.Principal.AWS field and replace the value (not the key) with the API_AWS_IAM_USER_ARN that you saved earlier.

  6. Find the Statement.Condition field. Initially, this should contain only curly braces (“{}”).

  7. Paste the following between the curly braces: "StringEquals": { "sts:ExternalId": "xxx" }.

  8. Replace the “xxx” with the value for the API_AWS_EXTERNAL_ID that you recorded earlier.

  9. After you are done editing the “Policy Document” for the trust relationship, it should look similar to the following:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::1234567898012:user/development/development_user"
          },
          "Action": "sts:AssumeRole",
          "Condition": {"StringEquals": { "sts:ExternalId": "EXTERNAL_FUNCTIONS_SFCRole=3_8Hcmbi9halFOkt+MdilPi7rdgOv=" }}
        }
      ]
    }
    
  10. Click on the button to update the trust policy.

Step 3: Create the External Function

Now go back to your Snowflake administration window (where you earlier typed the CREATE API INTEGRATION command).

  1. Type the CREATE EXTERNAL FUNCTION command. It should look similar to the following:

    CREATE EXTERNAL FUNCTION my_external_function(n INTEGER, v VARCHAR)
        RETURNS VARIANT
        API_INTEGRATION = <api_integration_name>
        AS '<invocation_url>'
        ;
    

    The <api_integration_name> value should contain the name of the API integration that you created earlier.

    The <invocation_url> value should be the resource invocation URL you recorded earlier. Make sure that this URL includes the API Gateway resource name, not just the stage name.

    This example passes two arguments (an INTEGER and a VARCHAR ) because those are the arguments that the remote service (e.g. Lambda function) expects. When you create your own remote service, you will pass appropriate arguments for your remote service.

  2. If you have not already executed the CREATE EXTERNAL FUNCTION command that you typed above, execute it now.

Step 4: Call the External Function

  1. If appropriate, grant USAGE privilege on the external function to one or more Snowflake roles so that those roles can call the external function. (A role must have USAGE or OWNERSHIP privileges on that external function.)

  2. Execute your function by calling:

    SELECT my_external_function(99, 'Luftballoons');
    

Troubleshooting

See also the general external functions troubleshooting document.

Unable to read the requests body inside the Amazon AWS Lambda Function

Symptom:

The function is unable to read the request body in the HTTP POST command.

Possible Cause(s):

You might not have enabled lambda proxy integration.

Possible Solution(s):

Enable lambda proxy integration.

For more details, see the steps in Create and Configure an API in the Amazon API Gateway.

SQL execution error: Error assuming AWS_ROLE. Please verify the role and externalId are configured correctly in your AWS policy.

Symptom:

When trying to call the external function, you get the following error message:

SQL execution error: Error assuming AWS_ROLE. Please verify the role and externalId are configured correctly in your AWS policy.

Possible Causes:

  • In the AWS Trust Relationship Policy for your role, the AWS ARN is incorrect. Possible causes of that include:

    • You didn’t set it.

    • You set it, but you used the ARN of the AWS role (incorrect) instead of the user ARN, which you can see from the “DESCRIBE INTEGRATION” command in Snowflake. Make sure that you use the API_AWS_IAM_USER_ARN rather than the API_AWS_ROLE_ARN.

  • In your AWS Trust Relationship Policy, the std:ExternalId is incorrect. Possible causes of that include:

    • You didn’t set it.

    • You re-created the API integration object. Re-creating the API object changes its external ID.

Error: 403 '{"Message":"User: <ARN> is not authorized to perform: execute-api:Invoke on resource: <MethodRequestARN>"}'

Symptom:

When trying to call the external function, you get the following error message:

Request failed for external function <function_name>. Error: 403 '{"Message":"User: <ARN> is not authorized to perform: execute-api:Invoke on resource: <MethodRequestARN>"}'

Possible Cause(s):

  • Using the wrong IAM Role ARN in the Resource Policy for the API Gateway.

  • Using the wrong Method Request ARN in the Resource Policy for the API Gateway.

Possible Solution(s):

  • Make sure that you followed the template, but:

    • Replaced the 12-digit number with the New IAM Role ARN.

    • Replaced the <external_user_role> with the New IAM Role Name.

  • Make sure that the Resource is correct; it should be the Method Request ARN.

Error: 403 '{"Message":"User: anonymous is not authorized to perform: execute-api:Invoke on resource: <MethodRequestARN>"}'

Symptom:

When trying to call the external function, you get the following error message:

Request failed for external function <function_name>. Error: 403 '{"Message":"User: anonymous is not authorized to perform: execute-api:Invoke on resource: <MethodRequestARN>"}'

Possible Cause(s):

One possible cause is that when you were configuring authorization for the API Gateway, you might not have specified that the Method Request requires AWS_IAM authorization for the resource.

Possible Solution(s):

If you did not follow the instructions in secure the Amazon API Gateway, then please follow them now to specify AWS_IAM authorization.