Creating an External Function on AWS Using the Web Interface

This document shows one way to create an external function on the Amazon Web Services (AWS) cloud platform, including:

  • Creating a remote service (Lambda Function on AWS).

  • Creating a proxy service (on Amazon API Gateway).

  • Creating an API Integration in Snowflake.

  • Creating an external function in Snowflake.

  • Calling the external function.

These instructions assume that you are already familiar with AWS administration. They describe the general steps that you need to execute but do not walk you through the details of the AWS Management Console, because those details could change.

Planning Your External Function on AWS

Prerequisites for Creating an External Function on AWS

You need:

  • An account with AWS, including privileges to:

    • Create AWS roles via IAM (identity and access management).

    • Create AWS Lambda functions.

    • Create an API Gateway endpoint.

  • A Snowflake account in which you have ACCOUNTADMIN privileges or a role with the CREATE INTEGRATION privilege.

This document assumes that you are an experienced AWS administrator.

Worksheet

As you create your external function, you should record specific information that you enter (e.g. the API Gateway URL) so that you can use that information in subsequent steps. The worksheet below helps you track this information.

==============================================================================================
=========================================== Worksheet ========================================
==============================================================================================

------------------ Information about the Lambda Function (remote service) --------------------

AWS Account ID.............: _____________________________________________

Lambda Function Name.......: _____________________________________________


---------------------- Information about the API Gateway (proxy service) ---------------------

New IAM Role Name..........: _____________________________________________

New IAM Role ARN...........: _____________________________________________

Snowflake VPC ID (optional): _____________________________________________

New API Name...............: _____________________________________________

API Gateway Resource Name..: _____________________________________________

Resource Invocation URL....: _____________________________________________

Method Request ARN.........: _____________________________________________


------------ Information about the API Integration and External Function ---------------------

API Integration Name.......: _____________________________________________

API_AWS_IAM_USER_ARN.......: _____________________________________________

API_AWS_EXTERNAL_ID........: _____________________________________________

External Function Name.....: _____________________________________________

Additional Resources for Building External Functions on AWS (Optional Reading)

When you are ready to create the remote service for your own external function, you might want to look at the examples of remote services based on Lambda functions that Snowflake provides.

Step 1: Create the Remote Service (Lambda Function on AWS)

There are multiple possible ways to create a remote service. This section shows how to create a remote service that is implemented as a Python function executing in AWS Lambda.

This external function is synchronous. For information about creating an asynchronous external function, see Creating an Asynchronous Function on AWS.

This sample Python function merely echoes its inputs. The inputs are returned as a single SQL VARIANT value, which can contain one or more values.

This function accepts and returns data in the same format (JSON) that Snowflake sends and reads. (For more information about data formats, see Remote Service Input and Output Data Formats.)

This Python function receives two parameters, event and context. The event parameter includes many sub-fields, one of which is body. The body is a dictionary that includes a key named data; the corresponding value for data is a string that holds the data that Snowflake sent in JSON format. Because AWS Lambda conveniently processes the HTTP POST request sent by Snowflake, extracts the body, and passes the body inside the event parameter, this example function does not need to parse the entire HTTP POST request.

After this Python function extracts the body as a string, the function calls a JSON library function to convert the string to a Python data structure. The function then extracts individual rows from that data structure, processes them, and returns a value for each of them.

The JSON for a typical return value from an AWS Lambda function looks like the following:

{
    "statusCode": <http_status_code>,
    "body":
        {
            "data":
                [
                    [ 0, <value> ],
                    [ 1, <value> ],
                    ...
                ]
        }
}

The data of the return value matches the format described earlier for the input data. On AWS, the convention for an HTTP-compatible service is to return the body inside a JSON object that also includes the HTTP status code.
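
For example, a handler that follows this convention can build its reply by serializing the rows and then wrapping them in the HTTP envelope. Below is a minimal sketch; the helper name make_response is illustrative, not part of the AWS API:

    import json

    def make_response(status_code, rows):
        # rows is an array of [row_number, value] arrays.
        # The body must be a JSON string, so serialize it with json.dumps.
        return {
            "statusCode": status_code,
            "body": json.dumps({"data": rows})
        }

    # Example: a successful reply for two rows.
    # make_response(200, [[0, "a"], [1, "b"]])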

To create this AWS Lambda function, follow the steps below:

  1. Log into the AWS Management Console, if you haven’t already.

  2. Record your AWS account ID in the “AWS Account ID” field in the worksheet.

    If you need to look up your AWS account ID, follow the AWS instructions.

  3. Select Lambda.

  4. Select Create function.

  5. Enter a function name.

    Record this name in the “Lambda Function Name” field in the worksheet.

  6. Select the language to use. For this example, choose Python 3.7.

  7. Choose or create an execution role for this function.

    Select the appropriate option(s), typically Create a new role with basic Lambda permissions.

    (This role is separate from your cloud account role and separate from your Snowflake role(s).)

  8. Click on the Create Function button.

  9. In the lambda_function tab, enter the code for the function. If you have not already written your own function, you can replace the default function code with the code below, which echoes its input. You can replace or update this code later when you are ready to create a custom function.

    This sample code assumes that you are using Lambda proxy integration, as Snowflake recommends in the instructions to create the API Gateway endpoint.

    If you cannot paste into the edit window, try double-clicking on the function’s file name to enable editing.

    import json
    
    def lambda_handler(event, context):
    
        # 200 is the HTTP status code for "ok".
        status_code = 200
    
        # The return value will contain an array of arrays (one inner array per input row).
        array_of_rows_to_return = [ ]
    
        try:
            # From the input parameter named "event", get the body, which contains
            # the input rows.
            event_body = event["body"]
    
            # Convert the input from a JSON string into a JSON object.
            payload = json.loads(event_body)
            # This is basically an array of arrays. The inner array contains the
            # row number, and a value for each parameter passed to the function.
            rows = payload["data"]
    
            # For each input row in the JSON object...
            for row in rows:
                # Read the input row number (the output row number will be the same).
                row_number = row[0]
    
                # Read the first input parameter's value. For example, this can be a
                # numeric value or a string, or it can be a compound value such as
                # a JSON structure.
                input_value_1 = row[1]
    
                # Read the second input parameter's value.
                input_value_2 = row[2]
    
                # Compose the output based on the input. This simple example
                # merely echoes the input by collecting the values into an array that
                # will be treated as a single VARIANT value.
                output_value = ["Echoing inputs:", input_value_1, input_value_2]
    
                # Put the returned row number and the returned value into an array.
                row_to_return = [row_number, output_value]
    
                # ... and add that array to the main array.
                array_of_rows_to_return.append(row_to_return)
    
            json_compatible_string_to_return = json.dumps({"data" : array_of_rows_to_return})
    
        except Exception as err:
            # 400 implies some type of error.
            status_code = 400
            # Tell the caller what this function could not handle; fall back to
            # the error text if the "body" field itself was missing.
            json_compatible_string_to_return = event.get("body", str(err))
    
        # Return the return value and HTTP status code.
        return {
            'statusCode': status_code,
            'body': json_compatible_string_to_return
        }
    
  10. Click on the Deploy button to deploy the function.

  11. Optional but strongly recommended: Test your function.

    For the sample Python function provided by Snowflake, use the following test data (replace any default data with the data below):

    {
      "body":
        "{ \"data\": [ [ 0, 43, \"page\" ], [ 1, 42, \"life, the universe, and everything\" ] ] }"
    }
    

    The execution results should be similar to:

    {
      "statusCode": 200,
      "body": "{\"data\": [[0, [\"Echoing inputs:\", 43, \"page\"]], [1, [\"Echoing inputs:\", 42, \"life, the universe, and everything\"]]]}"
    }
    

If the preceding worked correctly, you now have an AWS Lambda function that you can use as the remote service for your external function.
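
You can also exercise the handler logic outside of AWS, which is sometimes convenient for debugging. A minimal sketch, assuming the handler code above is available in your local Python session:

    # The same test event that the Lambda console test uses.
    test_event = {
        "body": '{ "data": [ [ 0, 43, "page" ], [ 1, 42, "life, the universe, and everything" ] ] }'
    }

    # This handler ignores the context argument, so None is sufficient here.
    result = lambda_handler(test_event, None)
    print(result["statusCode"])  # expect 200
    print(result["body"])        # expect the echoed rows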

Step 2: Configure the Proxy Service (Amazon API Gateway) and Create the API Integration (in Snowflake)

Configuring Amazon API Gateway as the proxy service requires several steps, including:

  • Creating a new IAM role in your AWS account.

  • Creating an Amazon API Gateway endpoint and configuring it.

  • Securing your Amazon API Gateway endpoint.

  • Creating an API Integration object in Snowflake.

  • Setting up a trust relationship between Snowflake and the new IAM role.

The steps are interleaved because the API integration needs information from AWS (the ARN (Amazon Resource Name) of the new IAM role), and AWS needs information from the API integration (the API_AWS_EXTERNAL_ID and the API_AWS_IAM_USER_ARN).

Create a New IAM Role in your AWS Account

For Snowflake to authenticate to your AWS account, a Snowflake-owned IAM (identity and access management) user must be granted permission to assume an IAM role in your AWS account. Establishing that trust relationship requires two pieces: an IAM role in your AWS account configured with the ARN of the Snowflake-owned IAM user, and an API integration object in Snowflake configured with the information about which IAM role to assume.

  1. Create a new IAM role.

  2. When asked to select the type of trusted entity, choose Another AWS account.

  3. When asked to Specify accounts that can use this role, paste the value from the “AWS Account ID” field of the worksheet.

  4. Click on Next: Permissions.

  5. Set permissions (Attach permissions policies) if necessary.

  6. Enter a role name.

    • Record the role name in the “New IAM Role Name” field in the worksheet.

  7. Click on the Create role button. After you create the role:

    • Record the Role ARN in the “New IAM Role ARN” field in the worksheet.

Create and Configure an API in the Amazon API Gateway

Choose your Endpoint Type: Regional Endpoint vs. Private Endpoint

You access a proxy service (such as Amazon API Gateway) via a URI, often referred to as an endpoint. For your Amazon API Gateway, you can create either:

  • A regional endpoint.

  • A private endpoint.

A regional endpoint can be accessed across regions or even across clouds. Your Snowflake instance, your proxy service, and your remote service can all be in different regions or even on different cloud platforms. For example, a Snowflake instance running on Azure could send requests to an Amazon API Gateway regional endpoint, which in turn could forward data to a remote service running on GCP.

A private endpoint can be configured to allow access from only a Snowflake VPC (Virtual Private Cloud) in the same AWS region via PrivateLink.

For more details about the types of endpoints on AWS, see the AWS documentation.

If you want to use a private endpoint, and you are not sure which region you are using, you can look up your region by doing either of the following:

  • Call the SQL function CURRENT_REGION() (e.g. SELECT CURRENT_REGION()).

  • Check your Snowflake account hostname, which normally indicates the cloud provider and region. For more information about account hostnames, regions, and cloud providers, see Supported Cloud Regions.

To use a private endpoint, your account must meet the following requirements:

  • Business Critical (or higher) edition of Snowflake.

On AWS, Snowflake supports connecting Snowflake accounts to AWS VPCs. Private endpoints enable communication between VPCs (Virtual Private Clouds) via PrivateLink. For more information about Snowflake and AWS PrivateLink, see the Snowflake documentation.

Create the API Gateway Endpoint

Before you create and configure your API Gateway, choose whether to use a regional endpoint or a private endpoint. For more information, see Choose your Endpoint Type: Regional Endpoint vs. Private Endpoint.

Note

You can create an external function on non-AWS-hosted instances of Snowflake. If your Snowflake account is hosted on Azure or GCP (Google Cloud Platform), you can still create an external function that accesses a remote service through an Amazon API Gateway.

The steps to create an API Gateway endpoint are below:

  1. If you plan to use a private endpoint, then get the VPC (Virtual Private Cloud) ID by executing the following command in the Snowflake web interface:

    select system$get_snowflake_platform_info();
    

    The output should look similar to the following:

    {"snowflake-vpc-id":["vpc-12345699"]}
    

    Record the VPC ID (e.g. “vpc-12345699”) in the “Snowflake VPC ID” field of the worksheet.

    (Note: You must use a VPC ID, not a VPC Endpoint ID. VPC Endpoint IDs can change over time.)

  2. In the AWS management console, select API Gateway.

  3. Select Create API.

  4. Select the type of endpoint (regional or private).

    • If you want a regional endpoint, then:

      • Find REST API and click on its Build button.

    • If you want a private endpoint, then:

      • Find REST API private and click on its Build button.

    Important

    Make sure that you choose REST API or REST API private. Do not select HTTP API or another option.

  5. Select the New API option.

  6. Enter a name for the new API.

    Record this name in the “New API Name” field in the worksheet.

  7. If asked to select an Endpoint Type, select either Regional or Private.

  8. If you are creating a private endpoint, enter the Snowflake VPC ID recorded in the “Snowflake VPC ID” field in the worksheet.

    (If you are not creating a private endpoint, you do not need to enter a VPC ID.)

  9. Click on the Create API button.

    This should take you to a new screen that allows you to create resources.

  10. Create a resource.

    (You might need to click the Actions button to see the Create Resource option.)

    Record the resource name in the “API Gateway Resource Name” field of the worksheet.

    After you click the Create Resource button, the screen should display No methods defined for the resource.

  11. Create a new method. Click Actions and select Create Method for this resource.

    Specify the POST option. (If you do not see the POST option, click on the small drop-down menu box under the resource name.)

    This should display a new window pane that shows Integration type and other options.

  12. The Integration type should be Lambda Function. If that is not already selected, then select it.

  13. Click the checkbox Use Lambda Proxy integration.

    It is important to select Lambda proxy integration because, without it, API Gateway passes a different JSON structure to the Lambda function. For more information about Lambda proxy integration, see the AWS documentation.

  14. In the Lambda Function field, paste the Lambda Function Name that you recorded in the worksheet.

  15. Click on Save.

  16. Click on the Actions button, and select the Deploy API action.

  17. Select or create a stage for this function.

  18. Underneath the resource name, you should see POST.

    If you do not see this, you might need to expand the resource tree by clicking on the triangle that is to the left of the resource name.

  19. Click on POST, and then record the Invoke URL for the POST request in the “Resource Invocation URL” field in the worksheet.

    Make sure that the invocation URL includes the name of the resource that you created; if it doesn’t, you might have clicked on the invocation URL for the stage rather than the resource.

  20. Click on Save Changes.
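
Before you restrict access in the next section, you can optionally send an unsigned test request to the invocation URL to confirm that the gateway forwards requests to the Lambda function. A minimal sketch using the third-party requests library; the URL shown is a placeholder for the value in your worksheet:

    import requests

    # Placeholder: use the value from the "Resource Invocation URL" worksheet field.
    invoke_url = "https://xyz.execute-api.us-west-2.amazonaws.com/production/MyResource"

    # Send the same shape of payload that Snowflake sends: a "data" array of rows.
    response = requests.post(
        invoke_url,
        json={"data": [[0, 43, "page"], [1, 42, "life, the universe, and everything"]]}
    )
    print(response.status_code)  # expect 200 if the integration is working
    print(response.text)         # expect the echoed rows

Once the method requires AWS_IAM authorization (next section), unsigned requests like this one are rejected with a 403 error.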

Secure Your Amazon API Gateway Endpoint

For an overview of securing proxy service endpoints, such as Amazon API Gateway endpoints, see Secure the Proxy Service.

To secure an Amazon API Gateway endpoint:

  1. At this point, you should be on the screen that displays your API Gateway information, and you should see your resource and POST method.

    If you are not already there, do the following:

    1. In the AWS Management Console, go to the API Gateway page.

    2. Select your API Gateway if you have not already done so.

    3. In the left-hand pane, click on Resources.

    4. Click on the POST method. (If you don’t see this, expand the resource tree by clicking on the triangle to the left of the resource in the Resources pane, which is usually the second pane from the left.)

  2. Copy the Method Request ARN from the Method Request box to the “Method Request ARN” field in the worksheet.

  3. Click on the title Method Request.

  4. Specify that the Method Request requires AWS_IAM authorization.

    Note that after selecting AWS_IAM authorization from the drop-down menu, you must click on the small checkmark next to the menu to confirm your choice.

  5. Set the resource policy for the API Gateway to specify who is authorized to invoke the gateway endpoint.

    To reach the editing window that allows you to enter a resource policy, you might need to click on Resource Policy in the left-hand column of the window for the API.

    • Regional Endpoint:

      Paste the JSON-formatted resource policy template below into the resource policy editor, then replace the placeholders with the appropriate values from the worksheet, as described below.

      {
          "Version": "2012-10-17",
          "Statement":
          [
              {
              "Effect": "Allow",
              "Principal":
                  {
                  "AWS": "arn:aws:sts::<12-digit-number>:assumed-role/<external_function_role>/snowflake"
                  },
              "Action": "execute-api:Invoke",
              "Resource": "<method_request_ARN>"
              }
          ]
      }
      

      Replace the following portions of the resource policy:

      • Replace the <12-digit-number> with the AWS account ID, which you recorded in the worksheet.

      • Replace the <external_function_role> with the role name from the “New IAM Role Name” field in the worksheet.

        For example, if your new IAM role ARN is:

        arn:aws:iam::987654321098:role/MyNewIAMRole
        

        then the result should be:

        "AWS": "arn:aws:sts::987654321098:assumed-role/MyNewIAMRole/snowflake"
        
      • Replace the <method_request_ARN> with the value in the “Method Request ARN” field of the worksheet. This is the ARN of the resource’s POST method.

        Note

        Setting the Resource to the Method Request ARN specifies that the API Gateway should allow calls to only the specified resource. It is possible to specify a subset of the Method Request ARN as a prefix, which allows multiple resources to be called from the same API Gateway.

        For example, if the Method Request ARN is:

        arn:aws:execute-api:us-west-1:123456789012:a1b2c3d4e5/*/POST/MyResource
        

        then you could specify just the following prefix:

        arn:aws:execute-api:us-west-1:123456789012:a1b2c3d4e5/*/
        
    • Private Endpoint:

      Paste the resource policy template below into the resource policy editor, then replace the placeholders with the appropriate values from the worksheet, as described below.

      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": "arn:aws:sts::<12-digit-number>:assumed-role/<external_function_role>/snowflake"
                  },
                  "Action": "execute-api:Invoke",
                  "Resource": "<method_request_ARN>",
                  "Condition": {
                      "StringEquals": {
                          "aws:sourceVpc": "<VPC_ID>"
                      }
                  }
              }
          ]
      }
      

      Replace the following portions of the resource policy:

      • Replace the <12-digit-number>, <external_function_role>, and <method_request_ARN> as described above for a regional endpoint.

      • Replace the <VPC_ID> with the Snowflake VPC ID for your region, which should be recorded in the “Snowflake VPC ID” field of the worksheet.

  6. If you have not already clicked on Save to save the resource policy, do so now.

  7. Deploy the updated API.

In the next few steps, you create a Snowflake API integration object. Do not close your AWS administration window now; you must return to it later.

Create an API Integration Object in Snowflake

  1. Open up (if you haven’t already) a Snowflake session, typically a Snowflake web interface session.

  2. Use a Snowflake role with ACCOUNTADMIN privileges or the CREATE INTEGRATION privilege, for example:

    use role has_accountadmin_privileges;
    
  3. Type the CREATE API INTEGRATION command to create an API integration. The command should look similar to the following:

    CREATE OR REPLACE API INTEGRATION my_api_integration_01
      api_provider = aws_api_gateway
      api_aws_role_arn = '<new_IAM_role_ARN>'
      api_allowed_prefixes = ('https://')
      enabled = true;
    

    Customize the command:

    • If you are using a private endpoint, the api_provider clause should be set to aws_private_api_gateway. Otherwise, the api_provider clause should be set to aws_api_gateway.

    • The <new_IAM_role_ARN> should be the value in the “New IAM Role ARN” field in the worksheet.

    • The api_allowed_prefixes field should contain the resource invocation URL that you recorded earlier.

    • You might also want to customize the name of the API integration, rather than use the value in the example.

    Below is an example of a complete CREATE API INTEGRATION statement:

    create or replace api integration demonstration_external_api_integration_01
        api_provider=aws_api_gateway
        api_aws_role_arn='arn:aws:iam::123456789012:role/my_cloud_account_role'
        api_allowed_prefixes=('https://xyz.execute-api.us-west-2.amazonaws.com/production/')
        enabled=true;
    
  4. In the worksheet field titled “API Integration Name”, record the name of the API integration that you created. You need the API integration name when you execute the CREATE EXTERNAL FUNCTION command later.

  5. Execute the CREATE API INTEGRATION command you typed above.

  6. Execute the DESCRIBE INTEGRATION command.

    DESCRIBE INTEGRATION <my_integration_name>;
    

    For example:

    DESCRIBE INTEGRATION my_api_integration_01;
    
  7. Look for the property named API_AWS_IAM_USER_ARN and then record that property’s property_value in the worksheet.

  8. Find the property named API_AWS_EXTERNAL_ID and record that property’s property_value in the worksheet.

    Note that the property_value of the API_AWS_EXTERNAL_ID often ends with an equals sign (“=”). That equals sign is part of the value; make sure that you cut and paste it along with the rest of the property_value.

For the next few steps, you return to your AWS administration window. Do not close your Snowflake administration window now; you must return to it later.

Set Up the Trust Relationship(s) between Snowflake and the New IAM Role

In the AWS Management Console:

  1. Select IAM.

  2. Select Roles.

  3. In the worksheet, look up the value in the “New IAM Role Name” field, then look for the same value (role name) in the AWS Management Console.

  4. Click on the Trust relationships tab, then click on the button Edit trust relationship.

    This should open the Policy Document into which you can add authentication information.

  5. In the Policy Document, find the Statement.Principal.AWS field and replace the value (not the key) with the value in the “API_AWS_IAM_USER_ARN” field of the worksheet.

  6. Find the Statement.Condition field. Initially, this should contain only curly braces (“{}”).

  7. Paste the following between the curly braces: "StringEquals": { "sts:ExternalId": "xxx" }.

  8. Replace the xxx with the value for the “API_AWS_EXTERNAL_ID” field in the worksheet.

  9. After you are done editing the Policy Document for the trust relationship, it should look similar to the following:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::1234567898012:user/development/development_user"
          },
          "Action": "sts:AssumeRole",
          "Condition": {"StringEquals": { "sts:ExternalId": "EXTERNAL_FUNCTIONS_SFCRole=3_8Hcmbi9halFOkt+MdilPi7rdgOv=" }}
        }
      ]
    }
    
  10. Click on Update Trust Policy.

Step 3: Create the External Function

Now go back to the Snowflake web interface (where you earlier typed the CREATE API INTEGRATION command).

  1. Type the CREATE EXTERNAL FUNCTION command. It should look similar to the following:

    CREATE EXTERNAL FUNCTION my_external_function(n INTEGER, v VARCHAR)
        RETURNS VARIANT
        API_INTEGRATION = <api_integration_name>
        AS '<resource_invocation_url>';
    

    Customize the command:

    • The <api_integration_name> value should contain the name of the API integration that you created earlier.

    • The <resource_invocation_url> value should be the resource invocation URL you recorded in the worksheet. Make sure that this URL includes the API Gateway resource name, not just the stage name.

    • You might also want to customize the function name.

    This example passes two arguments (an INTEGER and a VARCHAR) because those are the arguments that the Lambda function expects. When you create your own Lambda function, you must pass appropriate arguments for your Lambda function.

  2. In the worksheet, record the name of the external function that you created in the field titled “External Function Name”.

  3. If you have not already executed the CREATE EXTERNAL FUNCTION command that you typed above, execute it now.

Step 4: Call the External Function

  1. If appropriate, grant USAGE privilege on the external function to one or more Snowflake roles so that those roles can call the external function. (A role must have USAGE or OWNERSHIP privileges on that external function.)

  2. Execute your function by calling:

    SELECT my_external_function(42, 'Adams');
    

    If you customized the function name in the CREATE EXTERNAL FUNCTION command, then replace “my_external_function” with the customized name.

    The returned value should be similar to:

    ["Echoing inputs:", 42, "Adams"]


    where 42 and "Adams" are the echoed inputs and the leading string is added by the sample function. (Snowflake uses the row number returned by the remote service to match each output value to its input row; the row number itself does not appear in the query result.)

Creating an Asynchronous Function on AWS (Optional Reading)

To keep this tutorial simple, it creates a synchronous sample external function. More sophisticated external functions can benefit from being asynchronous.

This section of the documentation provides information about creating an asynchronous external function on AWS. (Before implementing your first asynchronous external function, you might want to read the conceptual overview of asynchronous external functions.)

On AWS, asynchronous remote services must overcome the following restrictions:

  • Because the HTTP POST and GET are separate requests, the remote service must keep information about the workflow launched by the POST request so that the state can later be queried by the GET request.

    Typically, each HTTP POST and HTTP GET invokes a separate instance of the handler function(s) in a separate process or thread. The separate instances do not share memory. In order for the GET handler to read the status or the processed data, the GET handler must access a shared storage resource that is available on AWS.

  • The only way for the POST handler to send the initial HTTP 202 response code is via a return statement (or equivalent), which terminates the execution of the handler. Therefore, prior to returning HTTP 202, the POST handler must launch an independent process (or thread) to do the actual data processing work of the remote service. This independent process typically needs access to the storage that is visible to the GET handler.
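
For example, on AWS the POST handler can satisfy the second restriction by launching the processing work with a non-blocking Lambda invocation just before returning HTTP 202. A minimal sketch; the complete sample later in this section uses the same call, and the names here are illustrative:

    import json
    import boto3

    def launch_processing(batch_id, data, lambda_name, region):
        # InvocationType='Event' makes the call non-blocking, so the POST
        # handler can return HTTP 202 while the processing runs independently.
        lambda_client = boto3.client('lambda', region_name=region)
        return lambda_client.invoke(
            FunctionName=lambda_name,
            InvocationType='Event',
            Payload=json.dumps({"batch_id": batch_id, "data": data})
        )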

One way for an asynchronous remote service to overcome these restrictions is to use 3 processes (or threads) and shared storage:

(Figure: Illustration of the processes used by an asynchronous remote service.)

In this model, the processes have the following responsibilities:

  • The HTTP POST handler:

    • Reads the input data. In a Lambda Function, this is read from the body of the handler function’s event input parameter.

    • Reads the batch ID. In a Lambda Function, this is read from the header of the event input parameter.

    • Starts the data processing process, and passes it the data and the batch ID. The data is usually passed during the call, but could be passed by writing it to external storage.

    • Records the batch ID in shared storage that both the data processing process and the HTTP GET handler process can access.

    • If needed, records that the processing of this batch has not yet finished.

    • Returns HTTP 202 if no error was detected.

  • The data processing code:

    • Reads the input data.

    • Processes the data.

    • Makes the result available to the GET handler (either by writing the result data to shared storage, or by providing an API through which to query the results).

    • Typically, updates this batch’s status (e.g. from IN_PROGRESS to SUCCESS) to indicate that the results are ready to be read.

    • Exits. Optionally, this process can return an error indicator. Snowflake does not see this directly (Snowflake sees only the HTTP return codes from the POST handler and GET handler), but returning an error indicator from the data processing process might help during debugging.

  • The GET handler:

    • Reads the batch ID. In a Lambda Function, this is read from the header of the event input parameter.

    • Reads the storage to get the current status of this batch (e.g. IN_PROGRESS or SUCCESS).

    • If the processing is still in progress, then return 202.

    • If the processing has finished successfully, then:

      • Read the results.

      • Clean up storage.

      • Return the results along with HTTP code 200.

    • If the stored status indicates an error, then:

      • Clean up storage.

      • Return an error code.

    Note that the GET handler might be called multiple times for a batch if the processing takes long enough that multiple HTTP GET requests are sent.

There are many possible variations on this model. For example:

  • The batch ID and status could be written at the start of the data processing process rather than at the end of the POST process.

  • The data processing could be done in a separate function (e.g. a separate Lambda function) or even as a completely separate service.

  • The data processing code does not necessarily need to write to shared storage. Instead, the processed data could be made available another way. For example, an API could accept the batch ID as a parameter and return the data.

The implementation code should take into account the possibility that the processing will take too long or will fail, and therefore any partial results must be cleaned up to avoid wasting storage space.
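
One way to bound that risk on AWS is to give each stored item an expiry time and let DynamoDB's Time to Live (TTL) feature delete stale items automatically. A hedged sketch; the table name matches the sample code below, and the 'ttl' attribute name is an assumption:

    import time
    import boto3

    dynamodb = boto3.client('dynamodb', region_name='us-east-2')

    # One-time setup: expire items based on the epoch timestamp in the 'ttl' attribute.
    dynamodb.update_time_to_live(
        TableName='Jobs',
        TimeToLiveSpecification={'Enabled': True, 'AttributeName': 'ttl'}
    )

    # When writing an item, include a 'ttl' value; here, one hour in the future.
    expiry_epoch_seconds = int(time.time()) + 3600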

The storage mechanism must be sharable across multiple processes (or threads). For example, the sample code below uses DynamoDB, but other shared storage (such as Amazon S3) could also work.

The code for each of the 3 processes above can be written as 3 separate Lambda Functions (one for the POST handler, one for the data processing function, and one for the GET handler), or as a single function that can be invoked in different ways.

The sample Python code below is a single Lambda function that can be called separately for the POST, the data processing, and the GET processes. A sample query and its output follow the code.

The focus in this example is on the three processes and how they interact, not on the shared storage mechanism (DynamoDB) or the data transformation (sentiment analysis). The code is structured to make it easy to replace the example storage mechanism and data transformation with different ones.

For simplicity, this example:

  • Hard-codes some important values (e.g. the AWS region).

  • Assumes the existence of some resources (e.g. the Jobs table in Dynamo).

import json
import time
import boto3

HTTP_METHOD_STRING = "httpMethod"
HEADERS_STRING = "headers"
BATCH_ID_STRING = "sf-external-function-query-batch-id"
DATA_STRING = "data"
REGION_NAME = "us-east-2"

TABLE_NAME = "Jobs"
IN_PROGRESS_STATUS = "IN_PROGRESS"
SUCCESS_STATUS = "SUCCESS"

def lambda_handler(event, context):
    # This function handles the HTTP GET and POST requests; when no HTTP
    # method is present in the event, it was invoked directly to process data.
    if HTTP_METHOD_STRING in event:
        method = event[HTTP_METHOD_STRING]
        if method == "POST":
            return initiate(event, context)
        elif method == "GET":
            return poll(event, context)
        else:
            return create_response(400, "Function called from invalid method")

    # if not called from GET or POST, then this lambda was called to
    # process data
    else:
        return process_data(event, context)


# Reads batch_ID and data from the request, marks the batch_ID as being processed, and
# starts the processing service.
def initiate(event, context):
    batch_id = event[HEADERS_STRING][BATCH_ID_STRING]
    data = json.loads(event["body"])[DATA_STRING]

    lambda_name = context.function_name

    write_to_storage(batch_id, IN_PROGRESS_STATUS, "NULL")
    lambda_response = invoke_process_lambda(batch_id, data, lambda_name)

    # lambda response returns 202, because we are invoking it with
    # InvocationType = 'Event'
    if lambda_response["StatusCode"] != 202:
        response = create_response(400, "Error in initiate: processing lambda not started")
    else:
        response = {
            'statusCode': lambda_response["StatusCode"]
        }

    return response


# Processes the data passed to it from the POST handler. In this example,
# the processing is to perform sentiment analysis on text.
def process_data(event, context):
    data = event[DATA_STRING]
    batch_id = event[BATCH_ID_STRING]

    def process_data_impl(data):
        comprehend = boto3.client(service_name='comprehend', region_name=REGION_NAME)
        # create return rows
        ret = []
        for i in range(len(data)):
            text = data[i][1]
            sentiment_response = comprehend.detect_sentiment(Text=text, LanguageCode='en')
            sentiment_score = json.dumps(sentiment_response['SentimentScore'])
            ret.append([i, sentiment_score])
        return ret

    processed_data = process_data_impl(data)
    write_to_storage(batch_id, SUCCESS_STATUS, processed_data)

    return create_response(200, "No errors in process")


# Checks the status of the batch_ID; returns the result if the processing has
# completed, or HTTP 202 if it is still in progress. (Snowflake sends GET
# requests repeatedly until the batch completes.)
def poll(event, context):
    batch_id = event[HEADERS_STRING][BATCH_ID_STRING]
    processed_data = read_data_from_storage(batch_id)

    def parse_processed_data(response):
        # in this case, the response is the response from DynamoDB
        response_metadata = response['ResponseMetadata']
        status_code = response_metadata['HTTPStatusCode']

        # Take action depending on item status
        item = response['Item']
        job_status = item['status']
        if job_status == SUCCESS_STATUS:
            # the row number is stored at index 0 as a Decimal object,
            # we need to convert it into a normal int to be serialized to JSON
            data = [[int(row[0]), row[1]] for row in item['data']]
            return {
                'statusCode': 200,
                'body': json.dumps({
                    'data': data
                })
            }
        elif job_status == IN_PROGRESS_STATUS:
            return {
                'statusCode': 202,
                "body": "{}"
            }
        else:
            return create_response(500, "Error in poll: Unknown item status.")

    return parse_processed_data(processed_data)


def create_response(code, msg):
    return {
        'statusCode': code,
        'body': msg
    }


def invoke_process_lambda(batch_id, data, lambda_name):
    # Create payload to be sent to processing lambda
    invoke_payload = json.dumps({
        BATCH_ID_STRING: batch_id,
        DATA_STRING: data
    })

    # Invoke processing lambda asynchronously by using InvocationType='Event'.
    # This allows the processing to continue while the POST handler returns HTTP 202.
    lambda_client = boto3.client('lambda', region_name=REGION_NAME)
    lambda_response = lambda_client.invoke(
        FunctionName=lambda_name,
        InvocationType='Event',
        Payload=invoke_payload
    )
    # returns 202 on success if InvocationType = 'Event'
    return lambda_response


def write_to_storage(batch_id, status, data):
    # we assume that the table has already been created
    client = boto3.resource('dynamodb')
    table = client.Table(TABLE_NAME)

    # Put in progress item in table
    item_to_store = {
        'batch_id': batch_id,
        'status': status,
        'data': data,
        'timestamp': "{}".format(time.time())
    }
    db_response = table.put_item(
        Item=item_to_store
    )


def read_data_from_storage(batch_id):
    # we assume that the table has already been created
    client = boto3.resource('dynamodb')
    table = client.Table(TABLE_NAME)

    response = table.get_item(Key={'batch_id': batch_id},
                              ConsistentRead=True)
    return response

Here is an example call to the asynchronous external function, along with sample output, including the sentiment analysis results:

create table test_tb(a string);
insert into test_tb values
    ('hello world'),
    ('I am happy');
select ext_func_async(a) from test_tb;

Row | EXT_FUNC_ASYNC(A)
0   | {"Positive": 0.47589144110679626, "Negative": 0.07314028590917587, "Neutral": 0.4493273198604584, "Mixed": 0.0016409909585490823}
1   | {"Positive": 0.9954453706741333, "Negative": 0.00039307220140472054, "Neutral": 0.002452891319990158, "Mixed": 0.0017087293090298772}

Notes about the sample code:

  • The data processing function is invoked by calling:

    lambda_response = lambda_client.invoke(
        ...
        InvocationType='Event',
        ...
    )
    

    The InvocationType should be 'Event', as shown above, because the second process (or thread) must be asynchronous, and Event is the only non-blocking invocation type available through the invoke() method.

  • The data processing function returns an HTTP 200 code. However, this HTTP 200 code is not returned directly to Snowflake. Snowflake does not see any HTTP 200 until a GET polls the status and sees that the data processing function finished processing this batch successfully.

Troubleshooting External Functions on AWS

Platform-Independent Symptoms

Actual Return Values for a Data Type Do Not Match Expected Return Values

When passing arguments to or from an external function, ensure that the data types are appropriate. If the value sent can’t fit into the data type being received, the value might be truncated or corrupted in some other way.

For more details, see Ensure that Arguments to the External Function Correspond to Arguments Parsed By the Remote Service.

When you call your function from SQL, you get a message about row numbers being out of order

Possible Cause(s)

Remember that the row numbers you return within each batch should be monotonically ascending integers starting at 0. The input row numbers must also follow that rule, and each output row should match the corresponding input row (e.g. the output in output row 0 should correspond to the input in input row 0).

Possible Solution(s)
  1. Make sure that the row numbers you return are the same as the row numbers you received, and that each output value uses the row number of the corresponding input. That should work. If it doesn’t, then it is possible that the input row numbers were not right, or that you did not return the rows in the correct order, so move on to step 2 below.

  2. Make sure that the output row numbers start from 0, increase by 1, and are in order.

For more information about data input and output formats, see Remote Service Input and Output Data Formats.
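
A simple way to satisfy both rules is to reuse each input row's number when building the corresponding output row, rather than generating new numbers. A minimal sketch:

    import json

    def echo_with_input_row_numbers(event):
        rows = json.loads(event["body"])["data"]
        # row[0] is the input row number; copy it into the output row so the
        # output numbering always matches the input numbering and order.
        output_rows = [[row[0], row[1:]] for row in rows]
        return {"statusCode": 200, "body": json.dumps({"data": output_rows})}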

When trying to call the external function, you get the message “Error parsing JSON: Invalid response”

Possible Cause(s)

The most likely cause is that the JSON returned by the remote service (e.g. AWS Lambda function) is not constructed correctly.

Possible Solution(s)

Make sure that you return an array of arrays, with one inner array returned for each input row received. Review the description of the output format at Data Format Received by Snowflake.

An error message saying that the format of the returned value is not JSON

Possible Cause(s)

One possible cause of this is that your return value includes double quotes inside the value.

Possible Solution(s)

Although JSON string values are wrapped in double quotes when serialized, the value itself should not normally begin and end with quotation marks. If the embedded double quotes are incorrect, remove them.
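
Embedded quotes typically appear when a value is quoted twice: once by hand and once by the JSON serializer. A small illustration of the difference:

    import json

    value = "hello"

    # Correct: serialize once; the quotes around hello come from the JSON encoding.
    print(json.dumps({"data": [[0, value]]}))
    # {"data": [[0, "hello"]]}

    # Incorrect: pre-quoting the value embeds literal quote characters in the string.
    print(json.dumps({"data": [[0, '"' + value + '"']]}))
    # {"data": [[0, "\"hello\""]]}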

An error message saying that the function received the wrong number of rows

Possible Cause(s)

The remote service probably tried to return more or fewer rows than it received. (Remember, even though the function is nominally scalar, it might receive multiple rows in the “body” field of the “event” parameter, and should return exactly as many rows as it received.)

Possible Solution(s)

Make sure that the remote service returns one row for each row that it receives.

Platform-Specific Symptoms

API Gateway returns error 502 while the endpoint is using Lambda proxy integration

Possible Cause(s)

The Lambda function might have:

  • Timed out.

  • Thrown an exception.

  • Failed in some other way.

Possible Solution(s)

If the Lambda function's logs or the API Gateway's logs are available to you, examine them.

If the source code of the Lambda function is available to you, then analyze and debug the code in the Lambda function. In some cases, you might be able to execute a copy of that code in a simpler context (outside AWS) to help debug it.

Verify that the data sent to the Lambda function is in the format that the Lambda function expects. You might want to try sending a smaller, simpler data set to see whether that succeeds.

Verify that you are not sending too much data at a time.

In some cases, increasing the timeout might solve the problem, especially if the Lambda function requires a lot of CPU resources, or if the Lambda function itself calls other remote services and thus requires more time.
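
If you prefer to adjust the timeout programmatically rather than in the console, a boto3 call such as the following can raise it; the function name and timeout value are placeholders:

    import boto3

    lambda_client = boto3.client('lambda', region_name='us-east-2')

    # Raise the Lambda function's timeout to 60 seconds.
    lambda_client.update_function_configuration(
        FunctionName='my_lambda_function',
        Timeout=60
    )

Keep in mind that API Gateway enforces its own integration timeout, so a sufficiently long-running Lambda function can still surface as a 502 error at the gateway.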

Unable to read the request's body in the HTTP POST method inside the AWS Lambda function

Possible Cause(s)

You might not have enabled Lambda proxy integration.

Possible Solution(s)

Enable Lambda proxy integration.

For more details, see the steps in Create and Configure an API in the Amazon API Gateway.

Error: Error assuming AWS_ROLE

The full text of the message is:

SQL execution error: Error assuming AWS_ROLE. Please verify the role and externalId are configured correctly in your AWS policy.

Possible Cause(s)
  • In the AWS Trust Relationship Policy for your role, the AWS ARN is incorrect. Possible causes of that include:

    • You did not set it.

    • You set it, but you used the ARN of the AWS role (incorrect) instead of the ARN of the user, which you can see in the output of the DESCRIBE INTEGRATION command in Snowflake. Make sure that you use the value from the “API_AWS_IAM_USER_ARN” field of the worksheet rather than the value from the “New IAM Role ARN” field.

  • In your AWS Trust Relationship Policy, the sts:ExternalId is incorrect. Possible causes of that include:

    • You did not set it.

    • You re-created the API integration object. Re-creating the API object changes its external ID.

Error: 403 '{"Message":"User: <ARN> is not authorized to perform: execute-api:Invoke"}'

The full text of the message is:

Request failed for external function <function_name>. Error: 403 '{"Message":"User: <ARN> is not authorized to perform: execute-api:Invoke on resource: <MethodRequestARN>"}'

Possible Cause(s)
  • The API Gateway resource policy has:

    • The wrong IAM Role ARN.

    • The wrong assumed role.

    • The wrong Method Request ARN.

  • The IAM role doesn’t have the right policy attached.

Possible Solution(s)
  • Make sure that you followed the resource policy template in Secure Your Amazon API Gateway Endpoint. Specifically, make sure that your resource policy:

    • Replaced the <12-digit-number> with the value in the “AWS Account ID” field of the worksheet.

    • Replaced the <external_function_role> with the value in the “New IAM Role Name” field of the worksheet.

    • Replaced the <method_request_ARN> in the Resource field with the value in the “Method Request ARN” field of the worksheet.

  • If you need to make sure that the IAM role has the correct permissions policy attached, you can find the role’s permissions policy list by following the steps below:

    1. In AWS, select the role.

    2. View the Summary for the role.

    3. Click on the Permissions tab.

    4. Verify that the required policy is in the Permissions policies list.

Error: 403 '{"Message":"User: anonymous is not authorized to perform: execute-api:Invoke"}'

The full text of the message is:

Request failed for external function <function_name>. Error: 403 '{"Message":"User: anonymous is not authorized to perform: execute-api:Invoke on resource: <MethodRequestARN>"}'

Possible Cause(s)

One possible cause is that when you were configuring authorization for the API Gateway, you might not have specified that the Method Request requires AWS_IAM authorization for the resource.

Possible Solution(s)

If you did not follow the instructions in Secure Your Amazon API Gateway Endpoint, follow them now to specify AWS_IAM authorization.

Error: Error parsing JSON response ... Error: top-level JSON object must contain "data" JSON array element

The full text of the message is:

Error parsing JSON response for external function ... Error: top-level JSON object must contain "data" JSON array element

Possible Cause(s)
  • You might not have specified Lambda proxy integration for the POST command in your API Gateway resource.

Possible Solution(s)

Enable Lambda proxy integration for the POST method, as described in Create and Configure an API in the Amazon API Gateway.

Error: 403 '{"message":"Forbidden"}'

The full text of the message is:

Request failed for external function EXT_FUNC with remote service error: 403 '{"message":"Forbidden"}'

Possible Cause(s)

The proxy service requires an API key (typically for authentication or billing), and the API key is missing or incorrect.

Possible Solution(s)

Use the ALTER API INTEGRATION command to specify the correct API key.
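
A hedged example of the command; the integration name and key value are placeholders:

    ALTER API INTEGRATION my_api_integration_01
      SET API_KEY = '<your_api_key>';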