Upload and run custom functions in a clean room¶
Overview¶
You can upload custom Python UDFs and UDTFs into your clean room and run them from your templates to perform complex data actions. These actions include machine learning or customized data manipulation within a query, as part of a single-step or multi-step flow. Python is the only coding language supported for custom UDFs.
Your uploaded code can import and use packages from an approved bundle of Python packages and the Snowpark API.
Both providers and consumers can upload custom Python code into a clean room, although the process is different for providers and consumers.
This topic shows how to upload and run custom Python UDFs and UDTFs as either a provider or a consumer.
Tip
For background information about how to develop your own Python UDFs in a clean room, see the following topics:
How UDFs work on Snowflake for general background on how to write Python functions in Snowflake.
How to write UDTFs in Snowflake if you want to return tables from your functions.
How to create and upload custom templates into a clean room. UDFs/UDTFs are called from a custom template.
Using Snowpark in clean rooms (if you want to call your UDFs from Snowpark).
Running uploaded code¶
Each bundle of uploaded code can define multiple functions that call each other, but a bundle exposes only one handler function. This handler function can be called by templates created or run by anyone who uses the clean room. If the code creates internal tables, these tables can be accessed as described in Designing multi-step flows.
For example, if you uploaded a function named simple_add
that takes in two numeric parameters, you can call it from a template as shown here. The function is always
referenced using the scope cleanroom
. For example, a template could call simple_add
like this:
SELECT cleanroom.simple_add({{ price | sqlsafe | int }}, {{ tax | sqlsafe | int }}) ...
Tip
If the provider wants to run the code above, they must alias all SELECT columns that use an aggregate or custom function, because a results table is generated behind the scenes:
SELECT
cleanroom.simple_add(
{{ price | sqlsafe | int }}, {{ tax | sqlsafe | int }}
) AS TOTAL_ITEM_COST
...
You can upload multiple functions in a single package, and functions within a single package can call each other, but functions can’t call functions within other packages. (They can call the handler functions, though.) For example:
Package 1 |
Package 2 |
---|---|
|
|
Notes:
Code uploaded by either side (provider and consumer) can be run by either side.
A template can call function A or function B, but not A1, A2, B1, B2.
Function A can call function B, and the reverse.
Function A can’t call B1 or B2 and Function B can’t call A1 or A2.
A1 can call A2 and the reverse. A1 and A2 can call B. A1 and A2 can’t call B1 or B2.
B1 can call B2 and the reverse. B1 and B2 can call A. B1 and B2 can’t call A1 or A2.
Updating or deleting custom functions¶
You can upload or overwrite an existing function or template that you uploaded, but you can’t delete an existing function or template. The only way to “remove” a function is to create a dummy function with the exact same name and signature that always succeeds.
Uploading a function with the exact same signature as one that you previously uploaded will overwrite the existing function. The signature is the case-insensitive function name of the external handler, and the data types of all parameters, in the same order. Parameter names don’t matter. You can’t overwrite a function uploaded by another account.
Because the signature must match when you update a function, you cannot change the signature of an existing function: if you upload the
function foo(name VARIANT age INTEGER)
and then upload the function foo(name VARIANT age FLOAT)
, the second function will be added
to the clean room in addition to the first, because the argument types differ.
Provider-submitted code¶
Provider-submitted functions can be uploaded as inline code or from a Snowflake stage. Both techniques are covered here.
Your uploaded code can natively import and use packages from an approved set of Python packages. If you need a non-default package, you must use Snowpark Container Services in a clean room to host your code.
You can’t view uploaded provider code, even your own code, so be sure to include a copy of exactly what you upload into a clean room.
Tip
After updating provider-written code, you should update the default release directive and then call
provider.create_or_update_cleanroom_listing
to propagate the changes to consumers. If you do not call
provider.create_or_update_cleanroom_listing
, your default version will not update for consumers that are currently using the clean
room.
Here is a high level view of how a provider adds code to a clean room:
The provider creates and configures the clean room in the normal way.
The provider uploads code by calling
provider.load_python_into_cleanroom
. You can either upload your code inline directly within that procedure, or upload a code file to a stage, then provide the stage location to that procedure.Although your code can include multiple functions, only one handler is exposed for each upload. To expose multiple functions to templates, upload each handler by calling
provider.load_python_into_cleanroom
.After each successful code upload, a new patch version of the clean room is generated. You must then increase the default version by calling
provider.set_default_release_directive
with the new patch number. If the clean room is exposed externally, security checks are run before installing your code, and you must callprovider.view_cleanroom_scan_status
to confirm that security checks have passed before incrementing the default version.If you want to upload multiple functions in a single patch, you can bulk upload your code. However, bulk uploads can make it more challenging to debug if the upload has a security scan issue, because the file that caused the problem isn’t reported in the error response.
You create and upload a custom template that calls your code. The template must call the handler function using the
cleanroom
scope, that is:cleanroom.my_function(...)
.The consumer runs your template the same way as any other template.
Tip
If the consumer encounters a mount error when they install a clean room with custom code, this can indicate a syntax error in the code.
You can find code examples demonstrating this flow in the provider-written code example section.
Important notes about versioning¶
Every time the provider uploads a function, it increases the patch number (and there is a limit of 99 patch numbers). Therefore, do your best to test and debug your code thoroughly before adding it to the clean room to reduce version updates during development.
If you do update a patch number, customers using the clean room UI might need to refresh the page to see the change. Customers using the API should see changes immediately, but there can be a delay, depending on the available resources. Learn more about clean room versioning.
Uploading provider-written inline functions¶
You can upload the code inline in the code
parameter of provider.load_python_into_cleanroom
. Here is an example of uploading a
simple function inline:
CALL samooha_by_snowflake_local_db.provider.load_python_into_cleanroom(
$cleanroom_name,
'simple_add', -- Name used to call the UDF from a template.
['first INTEGER', 'second INTEGER'], -- Arguments of the UDF, specified as '<variable_name> <SQL type>' pairs.
['numpy', 'pandas'], -- Packages imported by the UDF.
'INTEGER', -- SQL return type of UDF.
'add_two', -- Handler function in your code called when external name is called.
$$
import numpy as np # Not used, but you can load supported packages.
import pandas as pd
def add_two(first, second):
return first + second
$$
);
The calling template calls cleanroom.simple_add
to call this function.
The provider examples demonstrate how to upload inline code.
Uploading provider-written functions from a stage¶
You can upload Python files to a clean room stage and reference the stage when you call provider.load_python_into_cleanroom
. Loading
code from a stage allows you to develop the code in your local system in an editor, avoid copy/paste errors when loading it inline, and
also have better versioning control. Note that you can upload multiple files in a single procedure call, but only one handler function is
exposed for each upload.
Code is loaded from a stage into the clean room when you call load_python_into_cleanroom
; later changes to the code on the stage are not
propagated to the clean room.
To upload your UDF to a stage:
Create your .py file and make it available in a location where you can upload it to a Snowsight stage.
To get the name of the stage for your clean room, call
provider.get_stage_for_python_files($cleanroom_name)
. This stage is accessible by the clean room – you cannot use an arbitrary stage that you create.Upload the .py file to the stage for your clean room. There are several ways to do this, including using the CLI, Snowsight, or language-specific drivers.
Call
provider.load_python_into_cleanroom
with the stage location, handler, external name, arguments, and return type. Templates in your clean room can now call the function.
The following example code shows how to load code into a clean room from a stage.
-- Save the following code as reverser.py:
--import numpy as np
--def main(some_string):
-- '''Return the reverse of a string plus a random number 1-10'''
-- return some_string[::-1] + str(np.random.randint(1,10))
-- Get the stage for your clean room.
CALL samooha_by_snowflake_local_db.provider.get_stage_for_python_files($cleanroom_name);
-- Save the file to the stage. Here is how to do it by using the Snowflake CLI
PUT file://~/reverser.py <STAGE_NAME> overwrite=True auto_compress=False;
-- Load the code from the stage into the clean room.
CALL samooha_by_snowflake_local_db.provider.load_python_into_cleanroom(
$cleanroom_name,
'reverse', -- Name used to call the function
['some_string STRING'], -- Arguments and SQL types
['numpy'], -- Any required packages
['/reverser.py'], -- Relative path to file on stage
'STRING', -- Return type
'reverser.main' -- <FILE_NAME>.<FUNCTION_NAME>
);
-- Uploading code, even from a stage, increases the patch number.
CALL samooha_by_snowflake_local_db.provider.set_default_release_directive(
$cleanroom_name, 'V1_0', <NEW_PATCH_NUMBER>);
-- Upload a template that calls the function.
CALL samooha_by_snowflake_local_db.provider.add_custom_sql_template(
$cleanroom_name,
$udf_template_name,
$$
SELECT
p.status,
cleanroom.reverse(p.status)
FROM SAMOOHA_SAMPLE_DATABASE.DEMO.CUSTOMERS AS p
LIMIT 100;
$$
);
-- Switch to the consumer account and run the template to see the results.
The provider examples demonstrate uploading code from a stage.
Troubleshooting syntax errors or scan failures in uploaded code¶
If you upload a function that fails because of a syntax error, or if a security scan fails, an unpublishable patch can be generated. Therefore, you should thoroughly test your code before upload to ensure that it has no syntax errors.
You can see the list of packages, and their review status, by running the following SQL command, providing the clean room ID in the place indicated:
SHOW VERSIONS IN APPLICATION PACKAGE samooha_cleanroom_cleanroom_id;
Security scans¶
A security scan is run after any action that generates a new patch version in an external clean room, such as when the provider uploads Python into the clean room. (Consumer-submitted code, described on this page, does not trigger a security scan.) Internal clean rooms do not run security scans, but if you change an internal clean room to an external clean room, it will trigger a security scan for that patch. A clean roomm patch cannot be published externally until the patch has been scanned.
Snowflake Data Clean Rooms uses the Snowflake Native App security scan framework. Follow the native app security best practices to avoid security scan errors.
You can perform additional patch-creating actions before the last security scan is complete. However, you must wait for
provider.view_cleanroom_scan_status
to show success before you can update the default release directive in order to serve the
latest version of the clean room.
Uploading multiple Python functions in a single patch¶
If you want to upload multiple template-callable Python packages to your clean room, you can call prepare_python_for_cleanroom
multiple
times, then call load_prepared_python_into_cleanroom
once to scan, upload, and generate a single patch for your clean room. The
following example demonstrates uploading a UDF and a UDTF using bulk uploading:
---- Add custom inline UDF ----
CALL samooha_by_snowflake_local_db.provider.prepare_python_for_cleanroom(
$cleanroom_name,
'get_next_status', -- Name of the UDF. Can be different from the handler.
['status VARCHAR'], -- Arguments of the UDF, specified as (variable name, SQL type).
['numpy'], -- Packages needed by UDF.
[], -- When providing the code inline, this is an empty array.
'VARCHAR', -- Return type of UDF.
'get_next_status', -- Handler.
$$
import numpy as np
def get_next_status(status):
"""Return the next higher status, or a random status
if no matching status found or at the top of the list."""
statuses = ['MEMBER', 'SILVER', 'GOLD', 'PLATINUM', 'DIAMOND']
try:
return statuses[statuses.index(status.upper()) + 1]
except:
return 'NO MATCH'
$$
);
---- Add custom inline UDTF. ----
CALL samooha_by_snowflake_local_db.provider.prepare_python_for_cleanroom(
$cleanroom_name,
'get_info', -- Name of the UDTF. Can be different from the handler.
['hashed_email VARCHAR', 'days_active INT', 'status VARCHAR', 'income VARCHAR'], -- Name/Type arguments of the UDTF.
['numpy'], -- Packages used by UDTF.
[], -- When providing the code inline, this is an empty array.
'TABLE(hashed_email VARCHAR, months_active INT, level VARCHAR)', -- Return type of UDTF.
'GetSomeVals', -- Handler class name.
$$
class GetSomeVals:
def __init__(self):
self.month_days = 30
def process(self, hashed_email, days_active, status, income):
'''Change days into rough months, and also return whether we
think the user's membership status is lower, higher, or equal to
what is expected, based on their income.'''
months_active = days_active // self.month_days
brackets = ['0-50K', '50K-100K', '100K-250K', '250K+']
statuses = ['MEMBER', 'SILVER', 'GOLD', 'PLATINUM']
if(statuses.index(status) < brackets.index(income)):
level = 'low'
elif(statuses.index(status) > brackets.index(income)):
level = 'high'
else:
level = 'equal'
yield(hashed_email, months_active, level)
$$
);
-- Upload all stored procedures.
-- Note the new patch number returned by this procedure. Keep this number for later use.
CALL samooha_by_snowflake_local_db.provider.load_prepared_python_into_cleanroom($cleanroom_name);
-- Set the release directive specified by the last load_python_into_cleanroom call.
CALL samooha_by_snowflake_local_db.provider.set_default_release_directive($cleanroom_name, 'V1_0', <PATCH_NUMBER>);
Provider-written code examples¶
The following examples demonstrate adding provider-written UDFs and UDTFs to a clean room.
Download the following examples and then upload them as worksheet files in your Snowflake account. You need separate accounts for the provider and consumer, each with the clean rooms API installed. Replace the information as noted in the sample files.
Loading a file from a stage
. Run this notebook after you run the provider example to try loading a UDF from a stage.Uploading multiple Python functions in a single patch.
This is a single-account internal testing clean room; you can use the same account for both the provider role and the consumer role.
Consumer-submitted code¶
Consumer-uploaded code is bundled and uploaded with a custom template using the consumer template upload flow. The uploaded code can be called by any template in the clean room.
To upload code as a consumer, you should understand custom template syntax.
Note that any code uploaded by a consumer can be seen by the provider when they request permission to upload. The consumer code is also visible whenever a provider or consumer examines the template.
Here is an overview of the steps to upload custom consumer code:
The provider creates the clean room in the standard way and then invites the consumer.
The consumer installs and configures the clean room in the standard way.
The consumer prepares a template that calls the UDF or UDTF within the
cleanroom
namespace. For example, to call the consumer-definedcalculate_tax
function, a simple template might look like the following snippet:SELECT {{ cleanroom.calculate_tax(p.cost) }} AS Tax FROM my_db.my_sch.sales AS p;
The consumer prepares their Python code. We recommend using double quotation marks (
" "
) rather than single quotation marks (' '
) in your code to avoid extra escaping needed later. Your code can reference these supported Python libraries.The consumer passes their Python code into
consumer.generate_python_request_template
. The procedure returns the Python code as a stored procedure, with a placeholder for the custom JinjaSQL template. There are several multi-line strings in the template that use$$
as multi-line delimiters.The consumer replaces the template placeholder in the output from
generate_python_request_template
with their JinjaSQL template.In the combined template, escape any single quotes like this:
\'
. This is because single quotes will be used as the outermost delimiter for the entire multi-line procedure string when you upload it to the clean room. Here is an example of a stored procedure that includes the consumer Python code and custom template, with character escaping:BEGIN CREATE OR REPLACE FUNCTION CLEANROOM.custom_compare(min_status STRING, max_status STRING, this_status STRING) RETURNS boolean LANGUAGE PYTHON RUNTIME_VERSION = 3.10 PACKAGES = (\'numpy\') HANDLER = \'custom_compare\' AS $$ import numpy as np def custom_compare(min_status:str, max_status:str, this_status:str): statuses = [\'MEMBER\', \'SILVER\', \'GOLD\', \'PLATINUM\'] return ((statuses.index(this_status) >= statuses.index(min_status)) & (statuses.index(this_status) <= statuses.index(max_status))) $$; -- Custom template LET SQL_TEXT varchar := $$ SELECT c.status, c.hashed_email FROM IDENTIFIER( {{ my_table[0] }} ) as c WHERE cleanroom.custom_compare({{ min_status }}, {{ max_status }}, c.status); $$; LET RES resultset := (EXECUTE IMMEDIATE :SQL_TEXT); RETURN TABLE(RES); END;
The consumer calls
consumer.create_template_request
with the combined template. Use single quotation marks (' '
) instead of double dollar sign delimiters ($$...$$
) around the code you provide for stored procedure in thetemplate_definition
argument. For example:CALL samooha_by_snowflake_local_db.consumer.create_template_request( $cleanroom_name, $template_name, ' BEGIN -- First, define the Python UDF. CREATE OR REPLACE FUNCTION CLEANROOM.custom_compare(min_status STRING, max_status STRING, this_status STRING) RETURNS boolean LANGUAGE PYTHON RUNTIME_VERSION = 3.10 PACKAGES = (\'numpy\') HANDLER = \'custom_compare\' AS $$ import numpy as np def custom_compare(min_status:str, max_status:str, this_status:str): statuses = [\'MEMBER\', \'SILVER\', \'GOLD\', \'PLATINUM\'] return ((statuses.index(this_status) >= statuses.index(min_status)) & (statuses.index(this_status) <= statuses.index(max_status))) $$; -- Then define and execute the SQL query. LET SQL_TEXT varchar := $$ SELECT c.status, c.hashed_email FROM IDENTIFIER( {{ my_table[0] }} ) as c WHERE cleanroom.custom_compare({{ min_status }}, {{ max_status }}, c.status); $$; -- Execute the query and then return the result. LET RES resultset := (EXECUTE IMMEDIATE :SQL_TEXT); RETURN TABLE(RES); END; ');
The consumer and provider continue with the standard consumer-defined template flow:
The provider views the template request (
provider.list_pending_template_requests
) and then approves it by callingapprove_template_request
. In the request, the provider can see the template and the bundled code.The consumer checks the request status (
consumer.list_template_requests
), and when the status is APPROVED, runs the template (consumer.run_analysis
).
Consumer-written code examples¶
The following examples demonstrate adding provider-written UDFs to a clean room.
Download the following examples and then upload them as worksheet files in your Snowflake account. You need separate accounts for the provider and consumer, each with the clean rooms API installed. Replace the information as noted in the sample files: