Creating an External Function on AWS

This document shows one way to create an external function on Amazon AWS, including:

  • Creating a remote service (Lambda Function on AWS)

  • Creating a proxy service on AWS API Gateway.

  • Creating an API Integration in Snowflake.

  • Creating an external function in Snowflake.

  • Calling the external function.

These instructions assume that you are already familiar with Amazon AWS administration. These instructions specify the general steps that you need to execute, but do not walk you through the details of the AWS administration console because the details could change.

In this Topic:

Prerequisites

You need:

  • An account with AWS, including privileges to:

    • Create AWS roles via IAM (identity and access management).

    • Create Lambda functions.

    • Create an API Gateway endpoint.

  • A Snowflake account in which you have ACCOUNTADMIN privileges or a role with the CREATE INTEGRATION privilege.

This document assumes that you are an experienced AWS administrator.

Tip

During the creation of the external function, you must copy authentication-related information from Snowflake to your cloud platform, and from your cloud platform administrative window to Snowflake.

As you follow the instructions below, Snowflake recommends that you record information in some steps for use in later steps:

Cloud Platform (IAM) Account Id: _____________________________________________
Lambda Function Name...........: _____________________________________________
New IAM Role Name..............: _____________________________________________
Cloud Platform (IAM) Role ARN..: _____________________________________________
Proxy Service Resource Name....: _____________________________________________
Resource Invocation URL........: _____________________________________________
Method Request ARN.............: _____________________________________________
API_AWS_IAM_USER_ARN...........: _____________________________________________
API_AWS_EXTERNAL_ID............: _____________________________________________

Step 1: Create the Remote Service (Lambda Function on AWS)

There are multiple possible ways to create a remote service. This section shows how to create a remote service that is implemented as a Python function executing in AWS Lambda.

This sample Python-language function merely returns its inputs.

This function accepts data in the same format that Snowflake sends, and returns data in the same format that Snowflake reads.

This Python function receives two parameters, event and context. The event parameter includes many sub-fields, one of which is body. The body is a dictionary that includes a key named data; the corresponding value for data is a string that holds the data that Snowflake sent in JSON format. Because AWS Lambda conveniently processes the HTTP POST request sent by Snowflake, extracts the body, and passes the body inside the event parameter, this example function does not need to parse the entire HTTP POST request.

After this Python function extracts the body as a string, the function calls a JSON library function to convert the string to a Python data structure. The function then extracts individual rows from that data structure, processes them, and returns a value for each of them.

The JSON for a typical return value from an AWS Lambda function looks like the following:

{
"statusCode": <http_status_code>,
"body":
        {
            "data":
                  [
                      [ 0, <value> ],
                      [ 1, <value> ]
                      ...
                  ]
        }
}

The data of the return value matches the format described earlier for the input data. On AWS, the convention for an HTTP-compatible service is to return the body inside a JSON object that also includes the HTTP status code.

To create this AWS Lambda function, follow the steps below:

  1. Log into the AWS Management Console, if you haven’t already.

  2. Record your cloud platform account ID in the template above.

  3. Select “Lambda”.

  4. Select “Create function”.

  5. Enter a function name.

    Record this name in the template above on the line labeled “Lambda Function Name”.

  6. Select the language to use. For this example, choose Python 3.7.

  7. Choose or create an execution role for this function.

    Select the appropriate option(s), typically “Create a new role with basic Lambda permissions”.

    (This role is separate from your cloud account role and separate from your Snowflake role(s).)

  8. Open the window into which you can paste your function’s source code.

  9. Enter the code for the function. If you have not already written your own function, you can replace the default function code with the code below, which will echo its input. You can replace or update this code later when you are ready to create a custom function.

    import json
    
    def lambda_handler(event, context):
    
        # 200 is the HTTP status code for "ok".
        status_code = 200
    
        # The return value will contain an array of arrays (one inner array per input row).
        array_of_rows_to_return = [ ]
    
        try:
            # From the input parameter named "event", get the body, which contains
            # the input rows.
            event_body = event["body"]
    
            # Convert the input from a JSON string into a JSON object.
            payload = json.loads(event_body)
            # This is basically an array of arrays. The inner array contains the
            # row number, and a value for each parameter passed to the function.
            rows = payload["data"]
    
            # For each input row in the JSON object...
            for row in rows:
                # Read the input row number (the output row number will be the same).
                row_number = row[0]
    
                # Read the first input parameter's value. For example, this can be a
                # numeric value or a string, or it can be a compound value such as
                # a JSON structure.
                input_value_1 = row[1]
    
                # Read the second input parameter's value.
                input_value_2 = row[2]
    
                # Compose the output based on the input. This simple example
                # merely echoes the input by collecting the values into an array that
                # will be treated as a single VARIANT value.
                output_value = [input_value_1, input_value_2]
    
                # Put the returned row number and the returned value into an array.
                row_to_return = [row_number, output_value]
    
                # ... and add that array to the main array.
                array_of_rows_to_return.append(row_to_return)
    
            json_compatible_string_to_return = json.dumps({"data" : array_of_rows_to_return})
    
        except Exception as err:
            # 400 implies some type of error.
            status_code = 400
            # Tell caller what this function could not handle.
            json_compatible_string_to_return = event_body
    
        # Return the return value and HTTP status code.
        return {
            'statusCode': status_code,
            'body': json_compatible_string_to_return
        }
    
  10. Optional but strongly recommended: Test your function.

    For the sample Python function provided by Snowflake, use the following test data (replace any default data with the data below):

    {
      "body":
        "{ \"data\": [ [ 0, 43, \"page\" ], [ 1, 42, \"life, the universe, and everything\" ] ] }"
    }
    

    The execution results should be similar to:

    Response:
    {
      "statusCode": 200,
      "body": "{\"data\": [[0, [43, \"page\"]], [1, [42, \"life, the universe, and everything\"]]]}"
    }
    ...
    

If the preceding worked correctly, you now have an AWS Lambda function that you can use as the remote service for your external function.

Step 2: Configure the Proxy Service (API Gateway on AWS) and Create the API Integration (in Snowflake)

Configuring AWS API Gateway as the cloud platform proxy service requires several steps, including:

  • Creating a new IAM role in your cloud platform account.

  • Creating an API Gateway (the proxy service) and configuring it.

  • Securing Your Proxy Service Endpoint.

  • Creating an API integration object in Snowflake.

  • Setting up a trust relationship between Snowflake and the new IAM role.

The steps to create these are interleaved because the API integration needs information from the cloud platform (the cloud platform role’s ARN), and the API gateway needs the API_AWS_EXTERNAL_ID and API_AWS_IAM_USER_ARN from the API integration.

Create a New IAM Role in your Cloud Platform Account

For Snowflake to authenticate to your AWS account, a Snowflake-owned IAM user must be granted permission to assume an IAM role in your AWS account. To do this, a trust relationship must be established. To establish the trust relationship, you must create an IAM role on your AWS account and configure it with the ARN of the Snowflake-owned IAM user, and you must create an API integration object in Snowflake and configure the API integration object with the information about which IAM role to assume.

  1. Create a new cloud platform role via IAM (identity and access management).

  2. When asked to select the type of trusted entity, choose “Another AWS account”.

  3. When asked to “Specify accounts that can use this role”, paste the previously-saved Cloud Platform Account Id.

  4. Click on “Next: Permissions”.

  5. Set permissions (“Attach permissions policies”) if necessary.

  6. Enter a role name.

    Record the role name as the “New IAM Role Name”.

  7. After you create the role:

    • Record the “Role ARN” as the “Cloud Platform (IAM) Role ARN”.

Create and Configure an API in the API Gateway (Proxy Service)

Make sure that you use a regional endpoint. The preview of this feature supports only regional endpoints for the AWS API Gateway. (For a description of the different types of endpoints, see https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-basic-concept.html)

Although the only type of proxy service that Snowflake currently supports is the AWS API Gateway, you can create an external function on non-AWS-hosted instances of Snowflake. If your virtual warehouse is on Azure or GCP, you can create an external function that accesses a remote service through an AWS API Gateway.

The steps to create an API Gateway are below:

  1. Select “API Gateway”.

  2. Select “Create API”.

  3. Find “REST API” and click on its “Build” button.

  4. Choose the REST protocol.

  5. Select the “New API” option.

  6. Finish creating the new API.

  7. Create a resource.

    Record the resource name as the “Proxy Server Resource Name”; you will need it later.

  8. Select “Create Method” for this resource, and specify the POST option.

    The “Integration type” should be “Lambda Function”.

  9. Click the checkbox “Use Lambda Proxy Integration”.

  10. In the “Lambda Function” field, paste the Lambda Function Name that you recorded earlier.

  11. Save.

  12. Select the “Deploy API” action.

  13. Select or create a stage for this function.

  14. Click on the “POST” and then record the “Invoke URL” for the POST request; put this in the “Resource Invocation URL” field in the template.

    Make sure that the invocation URL includes the name of the resource; if it doesn’t, you might have clicked on the invocation URL for the stage rather than the resource.

Secure Your AWS API Gateway Proxy Service Endpoint

For an overview of securing proxy service endpoints, see Secure Your Proxy Service Endpoint.

To secure an AWS API Gateway endpoint:

  1. Specify that the “Method Request” requires “AWS_IAM” authorization.

    Record the Method Request ARN in the template.

  2. Set the resource policy for the API Gateway to specify who is authorized to invoke the gateway endpoint.

    The resource policy typically looks similar to the following (except that the Principal and Resource need to be customized):

    {
        "Version": "2012-10-17",
        "Statement":
        [
            {
            "Effect": "Allow",
            "Principal":
                {
                "AWS": "arn:aws:sts::<12-digit-number>:assumed-role/<external_function_role>/snowflake"
                },
            "Action": "execute-api:Invoke",
            "Resource": "arn:aws:execute-api:us-west-2:123456789012:ljkfds890a/*/POST/MyResourceName"
            }
        ]
    }
    

    Replace the following portions of the resource policy:

    • Replace the <12-digit-number> with the 12-digit number from the Cloud Platform (IAM) Role ARN. Typically, this is the same as the Cloud Platform (IAM) Account ID.

    • Replace the <external_function_role> with the role name from the Cloud Platform (IAM) Role ARN. Typically, this is the same as the New IAM Role Name.

      For example, if your Cloud Platform (IAM) Role Name is:

      arn:aws:iam::987654321098:role/MyNewIAMRole
      

      then the result should be :

      "AWS": "arn:aws:sts::987654321098:assumed-role/MyNewIAMRole/snowflake"
      
    • The “Resource” needs to be set to the Method Request ARN for the resource’s POST command.

      Note

      Setting the Resource to the Method Request ARN specifies that the proxy service should allow calls to only the specified resource. It is possible to specify a subset of the Method Request ARN as a prefix, which allows multiple resources to be called from the same proxy service.

      For example, if the Method Request ARN is:

      arn:aws:execute-api:us-west-1:123456789012:a1b2c3d4e5/*/POST/MyResource
      

      then you could specify just the following prefix:

      arn:aws:execute-api:us-west-1:123456789012:a1b2c3d4e5/*/
      
  3. Deploy the API.

In the next few steps, you create a Snowflake API integration object. Do not close your cloud platform administration window now; you must return to it later.

Create an API Integration Object in Snowflake

  1. Open up (if you haven’t already) a Snowflake session, typically a GUI session.

  2. Use a Snowflake role with ACCOUNTADMIN privileges or a role with the CREATE INTEGRATION privilege, for example:

    use role has_accountadmin_privileges;
    
  3. Type the CREATE API INTEGRATION command to create an API integration. The command should look similar to the following:

    CREATE OR REPLACE API INTEGRATION my_api_integration_01
      api_provider = aws_api_gateway
      api_aws_role_arn = '<cloud_platform_role_ARN>'
      enabled = true
      api_allowed_prefixes = ('https://')
    ;
    

    The <cloud_platform_role_ARN> should be the Cloud Platform (IAM) Role ARN that you recorded earlier.

    The api_allowed_prefixes field should contain the resource invocation URL that you recorded earlier.

    You might also want to customize the name of the API integration, rather than use the value in the example.

    Below is an example of a complete CREATE API INTEGRATION statement:

    create or replace api integration demonstration_external_api_integration_01
        api_provider=aws_api_gateway
        api_aws_role_arn='arn:aws:iam::123456789012:role/my_cloud_account_role'
        api_allowed_prefixes=('https://xyz.execute-api.us-west-2.amazonaws.com/production/')
        enabled=true;
    
  4. Execute the CREATE API INTEGRATION command you typed above.

  5. Execute the DESCRIBE INTEGRATION command.

    DESCRIBE INTEGRATION <my_integration_name>;
    

    For example:

    DESCRIBE INTEGRATION my_api_integration_01;
    
  6. Look for the property named “API_AWS_IAM_USER_ARN” and then record that property’s property_value for later use.

  7. Find the property named “API_AWS_EXTERNAL_ID” and record that property’s property_value for later use.

    Note that the property_value of the API_AWS_EXTERNAL_ID often ends with an equals sign (“=”). That equals sign is part of the value; make sure that you cut and paste it along with the rest of the property_value.

In the next few steps, you will return to your cloud platform administration window. Do not close your Snowflake administration window now; you must return to it later.

Set Up the Trust Relationship(s) between Snowflake and the New IAM Role

In the AWS Management Console:

  1. Select IAM.

  2. Select Roles.

  3. Look for the New IAM Role Name that you created and recorded earlier, and select it.

  4. Click on the “Trust relationships” tab, then click on the button to edit the trust relationship.

    This should open the Policy Document into which you can add authentication information.

  5. In the Policy Document, find the Statement.Principal.AWS field and replace the value (not the key) with the API_AWS_IAM_USER_ARN that you saved earlier.

  6. Find the Statement.Condition field. Initially, this should contain only curly braces (“{}”).

  7. Paste the following between the curly braces: "StringEquals": { "sts:ExternalId": "xxx" }.

  8. Replace the “xxx” with the value for the API_AWS_EXTERNAL_ID that you recorded earlier.

  9. After you are done editing the “Policy Document” for the trust relationship, it should look similar to the following:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::1234567898012:user/development/development_user"
          },
          "Action": "sts:AssumeRole",
          "Condition": {"StringEquals": { "sts:ExternalId": "EXTERNAL_FUNCTIONS_SFCRole=3_8Hcmbi9halFOkt+MdilPi7rdgOv=" }}
        }
      ]
    }
    
  10. Click on the button to update the trust policy.

Step 3: Create the External Function

Now go back to your Snowflake administration window (where you earlier typed the CREATE API INTEGRATION command).

  1. Type the CREATE EXTERNAL FUNCTION command. It should look similar to the following:

    CREATE EXTERNAL FUNCTION my_external_function(n INTEGER, v VARCHAR)
        RETURNS VARIANT
        API_INTEGRATION = <api_integration_name>
        AS '<invocation_url>'
        ;
    

    The <api_integration_name> value should contain the name of the API integration that you created earlier.

    The <invocation_url> value should be the resource invocation URL you recorded earlier. Make sure that this includes the resource name, not just the stage name.

    This example passes two arguments (an INTEGER and a VARCHAR ) because those are the arguments that the remote service expects. When you create your own remote service, you will pass appropriate arguments for your remote service.

  2. If you have not already executed the CREATE EXTERNAL FUNCTION command that you typed above, execute it now.

Step 4: Call the External Function

  1. If appropriate, grant USAGE privilege on the external function to one or more roles so that those roles can call the external function. (A role must have USAGE or OWNERSHIP privileges on that external function.)

  2. Execute your function by calling:

    SELECT my_external_function(99, 'Luftballoons');
    

Troubleshooting

Symptom:

When trying to call the external function, you get the following error message:

“SQL execution error: Error assuming AWS_ROLE. Please verify the role and externalId are configured correctly in your AWS policy.”

Possible Causes:

  • In the AWS Trust Relationship Policy for your role, the AWS ARN is incorrect. Possible causes of that include:

    • You didn’t set it.

    • You set it, but you used the ARN of the cloud platform role (incorrect) instead of the user ARN, which you can see from the “DESCRIBE INTEGRATION” command in Snowflake. Make sure that you use the API_AWS_IAM_USER_ARN rather than the API_AWS_ROLE_ARN.

  • In your AWS Trust Relationship Policy, the std:ExternalId is incorrect. Possible causes of that include:

    • You didn’t set it.

    • You re-created the API integration object. Re-creating the API object changes its external ID.

Symptom:

When trying to call the external function, you get the following error message:

Request failed for external function <function_name>. Error: 403 ‘{“Message”:”User: <ARN> is not authorized to perform: execute-api:Invoke on resource: <MethodRequestARN>”}’

Possible Cause(s):

Using the wrong role ARN in the Resource Policy for the API Gateway.

Possible Solution(s):

Make sure that you followed the template, but:

  • Replaced the 12-digit number with the IAM Account ID.

  • Replaced the <external_user_role> with the New IAM Role Name.

Also, make sure that the Resource is correct; it should be the Method Request ARN.