Creating an External Function on Google Cloud Platform

These instructions show how to create an external function hosted on Google Cloud Platform (GCP) using a Google Cloud Function as the remote service and a Google Cloud API Gateway as the proxy service. You can use these instructions either to create the sample external function provided by Snowflake or as a guide to create your own external function.

This documentation assumes that you are already familiar with GCP administration. These instructions describe the general steps that you need to execute, but do not describe the user interface in detail because the details could change.

Google also provides a command-line interface that you can use for many of these steps. For details about that, see the GCP documentation.

In this Topic:

Planning to Create an External Function on GCP

Prerequisites

The prerequisites are:

  • You must have a Google Cloud project ID.

  • You must have the correct services enabled for your Google Cloud Project. For detailed requirements, check the Google Cloud API Gateway quick-start documentation .

Worksheet

As you create your external function, you should record specific information (for example, the Cloud Function trigger URL) that you enter so that you can use that information in subsequent steps. The worksheet below helps you track this information.

======================================================================================
======================================= Worksheet ====================================
======================================================================================

--------------- Cloud Function (remote service) Information ---------------

Cloud Function Trigger URL: _______________________________________________


---------------------- API Config File Information ------------------------

Path Suffix ..............: _______________________________________________

Configuration File Name ..: _______________________________________________


----------------- API Gateway (proxy service) Information -----------------

Managed Service Identifier: _______________________________________________

Gateway Base URL .........: _______________________________________________


------------ API Integration Information ------------

API Integration Name .....: _______________________________________________

API_GCP_SERVICE_ACCOUNT ..: _______________________________________________


------------ External Function Information ------------

External Function Name ...: _______________________________________________


--------------------------- Security Information --------------------------

Security Definition Name .: _______________________________________________

Step 1: Create the Remote Service (Google Cloud Function)

In this tutorial, a remote service is implemented as a Google Cloud Function.

  1. Create a Google Cloud Function by following Google’s instructions to create a Cloud Function . If you are creating the sample Python-language function that Snowflake supplies, then choose the Python Quickstart; otherwise, choose the appropriate QuickStart based on the language that you need. As you follow Google’s instructions, make sure that you do the following:

    • Specify that the trigger for the function is HTTP.

    • Copy the trigger URL to the « Cloud Function Trigger URL » field in the worksheet above.

    • In the Authentication section, select Require authentication.

      The GCP instructions say to select Allow unauthenticated invocations. That is acceptable for sample functions, including the sample function provided by Snowflake, but most production systems should require authentication.

    • If Require HTTPS is not already enabled, then enable it.

    • Click Save.

    • Select an appropriate Runtime. If you are creating the sample Python function supplied by Snowflake, then choose a Python runtime, for example Python 3.7.

      Note

      Select the appropriate Runtime value before you paste in the code.

    • Replace the default code with either the Snowflake sample code below (written in Python 3.7) or with your own custom code.

      This sample code combines the input parameter values into a single list (array) and returns that list as a single value of SQL type VARIANT.

      This function accepts and returns data in the same format (JSON) that Snowflake sends and reads. (For more information about data formats, see Remote Service Input and Output Data Formats .)

      import json
      
      HTTP_SUCCESS = 200
      HTTP_FAILURE = 400
      
      def echo(request):
          try:
              # The list of rows to return.
              return_value = []
      
              payload = request.get_json()
              rows = payload["data"]
      
              # For each input row
              for row in rows:
                  # Include the row number.
                  row_number = row[0]
                  # Combine the value(s) in the row into a Python list that will be treated as an SQL VARIANT.
                  row_value = row[1:]
                  row_to_return = [row_number, row_value]
                  return_value.append(row_to_return)
      
              json_compatible_string_to_return = json.dumps( { "data" : return_value } )
              return (json_compatible_string_to_return, HTTP_SUCCESS)
      
          except:
              return(request.data, HTTP_FAILURE)
      
    • Make sure that the Entry point matches the name of the function (in this case, echo).

  2. After you finish creating the Cloud Function, use the Testing tab to call the function to make sure that it works as expected.

    For the sample Python function provided by Snowflake, use the following test data (replace any default data in the Testing tab with the data below):

    { "data":
      [
        [ 0, 43, "page" ],
        [ 1, 42, "life, the universe, and everything" ]
      ]
    }
    

    The execution results should be similar to:

    {"data":
      [
        [0, [43, "page"] ],
        [1, [42, "life, the universe, and everything"] ]
      ]
    }
    

    (The results might be displayed differently from the example shown above.)

    If the preceding worked correctly, you now have a Google Cloud Function that you can use as the remote service for your external function.

Step 2: Create and Configure the Proxy Service (Google Cloud API Gateway)

Snowflake does not send data (HTTP POST requests) directly to the remote service (e.g. GCP Cloud Function). Instead, Snowflake sends the data to a proxy service that relays the data from Snowflake to the remote service, and from the remote service back to Snowflake. On GCP, Snowflake supports the Google Cloud API Gateway as the proxy service.

Create a Google Cloud API Gateway to Route Requests to the Cloud Function

If you use the Google Cloud Console, then the specific steps to execute are:

  1. Create an API definition by creating a configuration file.

  2. Create an API Gateway.

    1. Create an API.

    2. Create an API Config.

    3. Create a Gateway with the API Config.

These steps are documented in more detail below.

Create an API Definition

On your local file system, create and customize a YAML-formatted configuration file that specifies the API that you are creating. The file should have the extension .yaml or .yml. A template for the configuration file is below:

swagger: '2.0'
info:
  title: API Gateway config for Snowflake external function.
  description: This configuration file connects the API Gateway resource to the remote service (Cloud Function).
  version: 1.0.0
schemes:
  - https
produces:
  - application/json
paths:
  /<PATH>:
    post:
      summary: Echo the input.
      operationId: echo
      x-google-backend:
        address: <HTTP ENDPOINT TO ROUTE REQUEST TO>
        protocol: h2
      responses:
        '200':
          description: <DESCRIPTION>
          schema:
            type: string

Fill or update the following fields:

  1. Replace the <PATH> with a unique name. This will be incorporated into URLs, so use only characters that are valid in URLs. For example, enter demo-func-resource.

    Note that for the <PATH>, unlike the other fields in this configuration file, you should enter the value before the colon, rather than after the colon. For example, the following is correct:

    paths:
      /demo-func-resource:
    

    The path name should not contain a path parameter . (Google supports path parameters when setting the path to a URL . However, Snowflake does not support path parameters in the corresponding URL specified in the CREATE EXTERNAL FUNCTION statement.)

  2. Copy the path from the immediately preceding step (e.g. demo-func-resource) to the « Path Suffix » field in the worksheet.

  3. Find the address field under the x-google-backend field, and replace the <HTTP ENDPOINT TO ROUTE REQUEST TO> with the value in the « Cloud Function Trigger URL » field of the worksheet. The result should look similar to:

    x-google-backend:
      address: https:// ...
    

    The URL should not have quotation marks around it.

    The URL can be the path to any HTTP endpoint, not only an endpoint hosted by Google.

    If you selected Require HTTPS when you created the remote service, then ensure that the URL you enter into the address field starts with https.

  4. Optionally, you can update any of the following:

    • The title in the info section.

    • The description in the info section.

    • The operationId in the post subsection of the paths section.

    • The summary in the post subsection of the paths section.

  5. Review your sample configuration file. It should look similar to the following:

    swagger: '2.0'
    info:
      title: "API Gateway config for Snowflake external function"
      description: "This configuration file connects the API Gateway resource to the remote service (Cloud Function)."
      version: 1.0.0
    schemes:
      - https
    produces:
      - application/json
    paths:
      /demo-func-resource:
        post:
          summary: "echo the input"
          operationId: echo
          x-google-backend:
            address: https://my_dev.cloudfunctions.net/demo-cloud-function-01
            protocol: h2
          responses:
            '200':
              description: echo result
              schema:
                type: string
    
  6. Save this configuration file.

  7. Record the file’s path and name in the « Configuration File Name » field in the worksheet.

If you want to learn more about the API configuration file, see the following GCP documentation:

Create an API Gateway

Create an API

This step creates a GCP API, which is a container that can contain one or more API Gateways and one or more configuration files.

  1. If you have not already done so, go to the Google Cloud API Gateway screen by clicking on the GCP menu and selecting API Gateway.

  2. Click on CREATE GATEWAY.

  3. Enter the Display Name and the API ID (e.g. demo-api-display-name-for-external-function1 and demo-api-id-for-external-function1).

    You do not need to record these in the worksheet because you do not need to enter these later to create your external function. However, you might want to record the API ID so that you can delete it when you are done with it.

Create an API Config

Upload your configuration file to GCP, which creates an API Config.

  1. Scroll to the API Config section of the screen.

  2. Search for the field that contains Upload an API Spec.

    Click on BROWSE and select your configuration file. The name of your configuration file was recorded in the « Configuration File Name » field of the worksheet.

  3. Enter a display name into the field that contains Display Name.

  4. Select a service account.

    If you created the sample function, then in the field that contains Select a Service Account, select App Engine default service account.

    If you are creating a function to use in production (rather than as a sample), you might choose a different service account.

    The selected service account must have appropriate privileges, including privileges to call the Cloud Function.

Create a Gateway with the API Config

  1. Scroll to the Gateway details section of the screen.

  2. Enter the Display Name of the new API Gateway.

  3. Click in the Location field and select the appropriate region (e.g. us-central1).

  4. Click on CREATE GATEWAY.

    This takes you to the APIs screen and shows you a list of your APIs.

    If your new API is not visible immediately, wait a few minutes, then click the Refresh button.

  5. Copy the value of the API’s Managed Service to the « Managed Service Identifier » field in the worksheet.

  6. At this point, you should still see a list of your APIs. Click on the name of the API.

    You should see 4 tabs: OVERVIEW, DETAILS, CONFIGS, and GATEWAYS.

  7. Click on the GATEWAYS tab.

  8. Copy the Gateway URL to the « Gateway Base URL » field in the worksheet.

Step 3: Create the API Integration Object in Snowflake

  1. Open up (if you haven’t already) a Snowflake web interface session.

  2. Execute the USE ROLE command to use a Snowflake role with ACCOUNTADMIN privileges or a role with the CREATE INTEGRATION privilege, for example:

    use role has_accountadmin_privileges;
    
  3. Type the CREATE API INTEGRATION command to create an API integration. The command should look similar to the following:

    create or replace api integration <integration_name>
        api_provider = google_api_gateway
        google_audience = '<google_audience_claim>'
        api_allowed_prefixes = ('<url>')
        enabled = true;
    
  4. Replace the <integration_name> with a unique integration name.

  5. Record the integration name in the « API Integration Name » field of the worksheet. You need the API Integration name later when you execute the CREATE EXTERNAL FUNCTION command.

  6. In the google_audience clause, replace the <google_audience_claim> with the value in the « Managed Service Identifier » field in the worksheet.

    (During authentication, Snowflake gives Google a JWT (JSON Web Token). The JWT contains an « aud » (« audience ») claim, which Snowflake sets to the google_audience field’s value. For more information about authenticating with Google, please see the Google service account authentication documentation. )

  7. In the api_allowed_prefixes clause, replace the <url> with the value in the « Gateway Base URL » field in the worksheet.

    (This field allows you to restrict the URLs to which this API integration can be applied. You can use a value that is more restrictive than the Gateway Base URL.)

  8. If you have not already done so, then execute the CREATE API INTEGRATION command you typed above.

  9. Read and record the API integration’s API_GCP_SERVICE_ACCOUNT information.

    1. Execute the DESCRIBE INTEGRATION command. For example:

      describe integration my_api_integration_name;
      
    2. Record the value of the API_GCP_SERVICE_ACCOUNT in the « API_GCP_SERVICE_ACCOUNT » field of the worksheet.

For more details about creating an API integration, see CREATE API INTEGRATION.

Step 4: Create the External Function in Snowflake

You should be in a Snowflake web interface session.

  1. Paste the command CREATE EXTERNAL FUNCTION. The command looks similar to the following:

    create or replace external function <external_function_name>(<parameters>)
        returns variant
        api_integration = <api_integration_name>
        as '<function_URL>' ;
    
  2. Replace the <external_function_name> with a unique function name (e.g. echo).

  3. Record the function name in the « External Function Name » field in the worksheet.

  4. Replace the <parameters> with the names and SQL data types of the parameters, if any. For example: a integer, b varchar.

    The parameters must correspond to the parameters expected by the remote service. The parameter names do not need to match, but the data types need to be compatible.

  5. Record the parameter name and data types in the « External Function Name » field in the worksheet.

  6. In the api_integration clause, replace the <api_integration_name> with the value in the « API Integration Name » field in the worksheet.

  7. Replace the <function_URL> with the value in the Gateway Base URL field in the worksheet, followed by a slash (/) and the value in the Path Suffix field in the worksheet.

    The URL should look similar to:

    https://<gateway-base-url>/<path-suffix>
    
  8. If you have not already done so, then execute the CREATE EXTERNAL FUNCTION command that you typed above.

Step 5: Call the External Function

Test the external function by calling it. For example, execute a SQL command similar to the following:

select my_external_function(42, 'Life, the Universe, and Everything');

Replace the function name « my_external_function » with the actual external function name that you recorded in the « External Function Name » field of the worksheet.

The returned value should be similar to:

[42, "Life, the Universe, and Everything"]

Step 6: Set the Security Policy on the Google Cloud API Gateway (Proxy Service)

Update and Reload the Configuration File

The previous steps create a Google Cloud Function that can be called by anyone who has the correct Google Cloud API Gateway endpoint. Unless you want your endpoint to be open to the public, you should secure it by adding a customized securityDefinitions section to the configuration file for the API definition.

The name of the configuration file is recorded in the « Configuration File Name » field of the worksheet. The instructions below show you how to add a securityDefinitions section to the configuration file. After this is added, customized, loaded, and deplayed, only Snowflake can call your Cloud Function through the API Gateway.

  1. Add the following securityDefinitions section to the configuration file. Add this just above the schemes: section of the configuration file and at the same indentation level.

    securityDefinitions:
      <security-def-name>:
        authorizationUrl: ""
        flow: "implicit"
        type: "oauth2"
        x-google-issuer: "<gmail service account>"
        x-google-jwks_uri: "https://www.googleapis.com/robot/v1/metadata/x509/<gmail service account>"
    
  2. Replace the <security-def-name> with a unique security definition name (e.g. snowflakeAccess01).

  3. Record this name in the « Security Definition Name » field in the worksheet.

  4. Replace the <gmail service account> with the value in the « API_GCP_SERVICE_ACCOUNT » field of the worksheet. Make the change in two fields in the configuration file:

    1. The x-google-issuer field.

    2. The end of the x-google-jwks_uri field.

  5. Update the post: section of the configuration file to reference the security definition that you created above.

    1. Below the operationId field, add:

      security:
        - <security-def-name>: []
      

      This should be indented at the same level as the operationId field.

      Replace <security-def-name> with the value in the « Security Definition Name » field in the worksheet.

      Make sure to include a hyphen and a blank prior to the security definition name, as shown above.

      Make sure to include the empty square braces ([]) after the colon.

      For example:

      security:
        - snowflakeAccess01: []
      

      Your updated configuration file should look similar to the following:

      swagger: '2.0'
      info:
        title: API Gateway config for Snowflake external function
        description: This configuration file connects the API Gateway resource to the remote service (Cloud Function).
        version: 1.0.0
      securityDefinitions:
        snowflakeAccess01:
          authorizationUrl: ""
          flow: "implicit"
          type: "oauth2"
          x-google-issuer: "<API_GCP_SERVICE_ACCOUNT>"
          x-google-jwks_uri: "https://www.googleapis.com/robot/v1/metadata/x509/<API_GCP_SERVICE_ACCOUNT>"
      schemes:
        - https
      produces:
        - application/json
      paths:
        /demo-func-resource:
          post:
            summary: Echo the input
            operationId: operationID
            security:
              - snowflakeAccess01: []
            x-google-backend:
              address: <Cloud Function Trigger URL>
              protocol: h2
            responses:
              '200':
                description: <DESCRIPTION>
                schema:
                  type: string
      
  6. Save the configuration file.

  7. Upload the updated configuration file.

    1. On the Gateways page, click on the name of your gateway.

    2. Click on EDIT.

    3. Under API Config, click in the box titled Select a Config.

    4. Select the option Create new API config.

    5. In the box that contains Upload an API Spec, click on the BROWSE button.

    6. Select the desired YAML file, which you created previously. Check that it has the extension « .yaml » or « .yml ».

    7. Enter the Display Name. Use a new, unique name, not the name that you used previously.

    8. If you are asked to Select a Service Account, then select App Engine default service account.

      If you are creating a function to use in production (rather than as a sample), you might choose a different service account.

      The selected service account must have appropriate privileges, including privileges to call the Google Cloud Function.

    9. You should now be back on the page for your API gateway. If the Config field shows the old API config file’s display name, then:

      1. Click on EDIT.

      2. Under API Config, find the Select a Config box again, and click in the box.

      3. Select the new API config.

      4. Click the UPDATE button.

        This takes you back to the list of API gateways.

      5. You might need to wait a few minutes while the API Gateway is updated.

        You might see an icon to the left of the API gateway name that indicates that the gateway is being refreshed.

        You can click on the REFRESH button above the gateway name to check whether the refresh is still in progress. After the icon to the left of the gateway name disappears, the gateway should be fully refreshed, and you can continue to the next step.

  8. Call your external function again to make sure that it works with the new security restrictions.

Links to Related Google Documentation

If you wish to read Google’s documentation about its Cloud Console for these steps, links are below:

Some readers might find the following Google documentation helpful:

Troubleshooting

Platform-Independent Symptoms

Actual Return Values for a Data Type Do Not Match Expected Return Values

When passing arguments to or from an external function, ensure that the data types are appropriate. If the value sent can’t fit into the data type being received, the value might be truncated or corrupted in some other way.

For more details, see Assurez-vous que les arguments de la fonction externe correspondent aux arguments analysés par le service distant.

When you call your function from SQL, you get a message about row numbers being out of order

Possible Cause(s)

Remember that the row numbers you return within each batch should be monotonically ascending integers starting at 0. The input row numbers must also follow that rule, and each output row should match the corresponding input row (e.g. the output in output row 0 should correspond to the input in input row 0).

Possible Solution(s)
  1. Make sure that the row numbers you return are the same as the row numbers you received, and that each output value uses the row number of the corresponding input. That should work. If it doesn’t, then it is possible that the input row numbers were not right, or that you did not return the rows in the correct order, so move on to step 2 below.

  2. Make sure that the output row numbers start from 0, increase by 1, and are in order.

For more information about data input and output formats, see Formats des données d’entrée et de sortie des services à distance.

When trying to call the external function, you get the message « Error parsing JSON: Invalid response »

Possible Causes(s)

The most likely cause is that the JSON returned by the remote service (e.g. AWS Lambda function) is not constructed correctly.

Possible Solution(s)

Make sure that you return an array of arrays, with one inner array returned for each input row received. Review the description of the output format at Format de données reçu par Snowflake.

An error message saying that the format of the returned value is not JSON

Possible Cause(s)

One possible cause of this is that your return value includes double quotes inside the value.

Possible Solution(s)

Although JSON strings are delimited by double quotes, the string itself should not start and end with a quotation mark in most cases. If the embedded double quotes are incorrect, then remove them.

An error message saying that the function received the wrong number of rows

Possible Cause(s)

The remote service probably tried to return more or fewer rows than it received. (Remember, even though the function is nominally scalar, it might receive multiple rows in the « body » field of the « event » parameter, and should return exactly as many rows as it received.)

Possible Solution(s)

Make sure that the remote service returns one row for each row that it receives.

Platform-Specific Symptoms

Request fails with “{« message »: »Audiences in Jwt are not allowed », »code »:403} “

Possible Cause(s)
  • This failure indicates that the value in the API integration’s google_audience field is not allowed.

Possible Solution(s)
  • Please verify that the API Integration’s google_audience value matches the managed service name of your API, which should be recorded in the « Managed Service Identifier » field in the worksheet.

  • If you added an x-google-audiences field to the securityDefinitions section of your API config file, make sure that the value in x-google-audiences matches the value in the google_audience field of the API integration.

For more information about authenticating with Google, please see the Google service account authentication documentation.

Request fails with “{« message »: »Jwt is missing », »code »:401} “

Possible Cause(s)
  • The value of the x-google-issuer field in the securityDefinitions field in the configuration file might not match the value of the API_GCP_SERVICE_ACCOUNT in the API integration (recorded in the worksheet).

  • The value in x-google-issuer might contain extra whitespace.

Possible Solution(s)
  • Update the x-google-issuer to match the API_GCP_SERVICE_ACCOUNT.

  • Remove unneeded whitespace.

Request fails with “403 Forbidden”

Possible Cause(s)

One possible cause for this failure is that the service account using the config does not have the appropriate permissions on the backend.

Possible Solution(s)

Update the service account’s permissions.