Code specification
This specification defines one or more code functions, procedures, or ML Jobs that can be called by a template.
A code specification can contain a maximum of 5 ML jobs, and a maximum of 5 functions and procedures combined.
At least one of ml_jobs, functions, or procedures must be defined.
For examples of different kinds of code specs, see Example specs.
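As a minimal sketch of the limits above, the following Python check counts the entries in a code spec that has already been parsed into a dictionary. The function name check_spec_limits and the dictionary layout are assumptions made for illustration; they aren't part of the Collaboration API.

```python
MAX_ML_JOBS = 5
MAX_FUNCTIONS_AND_PROCEDURES = 5  # functions and procedures share one limit


def check_spec_limits(spec: dict) -> list:
    """Return a list of limit violations for a parsed code spec (empty if none)."""
    ml_jobs = spec.get("ml_jobs") or []
    functions = spec.get("functions") or []
    procedures = spec.get("procedures") or []

    problems = []
    if not (ml_jobs or functions or procedures):
        problems.append("define at least one of ml_jobs, functions, or procedures")
    if len(ml_jobs) > MAX_ML_JOBS:
        problems.append(f"too many ML jobs: {len(ml_jobs)} > {MAX_ML_JOBS}")
    combined = len(functions) + len(procedures)
    if combined > MAX_FUNCTIONS_AND_PROCEDURES:
        problems.append(
            f"too many functions and procedures combined: "
            f"{combined} > {MAX_FUNCTIONS_AND_PROCEDURES}"
        )
    return problems
```

An empty result means the spec is within the documented counts; it doesn't guarantee that registration succeeds.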
Identifiers in the code specification have the following general requirements:
- Names: Must be valid Snowflake identifiers that start with a letter and contain only alphanumeric characters and underscores.
- Quoted identifiers: Double-quoted identifiers are supported for names with special characters.
- Case sensitivity: Unquoted identifiers are case-insensitive; quoted identifiers preserve case.
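The identifier rules above can be approximated locally before you submit a spec. This is a sketch, not the validator that the service uses: the helper names are hypothetical, the regular expression assumes ASCII letters and digits, and the uppercase folding of unquoted names reflects how Snowflake resolves unquoted identifiers.

```python
import re
from typing import Optional

# Assumption: "alphanumeric" means ASCII letters and digits.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_unquoted_identifier(name: str, max_length: Optional[int] = None) -> bool:
    """Check the start-with-a-letter, alphanumeric-plus-underscore rule."""
    if max_length is not None and len(name) > max_length:
        return False
    return _UNQUOTED_IDENTIFIER.fullmatch(name) is not None


def normalize_identifier(name: str) -> str:
    """Quoted identifiers keep their case; unquoted identifiers fold to uppercase."""
    if len(name) >= 2 and name.startswith('"') and name.endswith('"'):
        # A doubled quote inside a quoted identifier stands for one quote.
        return name[1:-1].replace('""', '"')
    return name.upper()
```

For example, normalize_identifier("my_spec") and normalize_identifier("MY_SPEC") compare equal, while a quoted name such as "My Spec" keeps its case.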
api_version
The version of the Collaboration API used. Must be 2.0.0.
spec_type
Specification type identifier. Must be code_spec.
name: identifier
A unique name for this code spec within this registry. Must be a valid Snowflake identifier with a maximum of 75 characters. This is used as the last name segment when calling the function in a template:
cleanroom.code_spec_name$function_name
version: version_id
Custom version identifier. Must be alphanumeric with underscores, maximum 20 characters.
description: description_text (Optional)
A description of the code spec (maximum 1,000 characters).
artifacts (Optional)
A list of staged files or packages that can be imported by your functions or procedures, and optionally exposed via handler functions. Maximum of 5 per spec.
alias: identifier
An alias for referencing this artifact in imports. When referencing this alias within this spec, use the bare alias name rather than cleanroom.spec_name$alias; that is, use the bare function name to reference another function in this spec.
stage_path: stage_path
Full stage path to the artifact file. For example, @DB.SCHEMA.STAGE/path/file.whl.
- The stage must be internal. External stages aren’t supported.
- The stage must have DIRECTORY enabled: The stage containing artifacts must have DIRECTORY = (ENABLE = TRUE) set.
- Stage path format: Must follow the @[DB.]SCHEMA.STAGE/path/to/file.ext format.
- No path traversal: Stage paths can’t contain .. or \.
- This artifact must exist: The file must exist at the specified stage path when the code spec is registered.
- The stage must have SNOWFLAKE_SSE server-side encryption enabled. When creating or altering the stage, set ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE').
- If you push, delete, or update a staged code file, you must call ALTER STAGE {stage name} REFRESH to ensure that the collaboration has the latest information from the stage. Code updates are supported only before you register the code spec, because registration is when the version is assigned and the hash checksum is calculated.
description: description_text (Optional)
A description of the artifact (maximum 500 characters).
content_hash: sha256_hash (Optional)
Lowercase SHA-256 hash for integrity verification (64 hex characters).
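The stage_path rules and the content_hash format can be checked locally before registration. The following is a sketch under stated assumptions: the helper names are hypothetical, the regular expression is only an approximation of the @[DB.]SCHEMA.STAGE/path/to/file.ext format for unquoted names, and the hash is assumed to be taken over the raw bytes of the artifact file.

```python
import hashlib
import re

# Approximation of @[DB.]SCHEMA.STAGE/path/to/file.ext for unquoted names:
# one or two "NAME." prefixes, the stage name, then a non-empty path.
_ARTIFACT_STAGE_PATH = re.compile(r"@(?:[A-Za-z_]\w*\.){1,2}[A-Za-z_]\w*/\S+")


def is_valid_artifact_stage_path(stage_path: str) -> bool:
    """Reject path traversal ('..' or a backslash), then check the overall shape."""
    if ".." in stage_path or "\\" in stage_path:
        return False
    return _ARTIFACT_STAGE_PATH.fullmatch(stage_path) is not None


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the lowercase, 64-character SHA-256 hex digest of a local file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so that large wheels don't have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Compute the hash on the same file that you upload. If the upload compresses the file, the staged bytes no longer match a hash taken from the local copy.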
functions (Required if no procedures or ML jobs are defined)
A list of UDF or UDTF definitions.
name: identifier
The function name to expose to the calling template. Must be a valid Snowflake identifier.
type
The function type. One of UDF or UDTF.
language
The function language. Currently only PYTHON is supported.
runtime_version: python_version (Optional)
Python runtime version to use. Supported versions: 3.10 to 3.14.
handler: handler
The name of the handler function in the function code to call when name is called.
arguments (Optional)
Function arguments as a list of name-type pairs. Types must be valid Snowflake SQL types.
returns: sql_type
The return type. For UDFs, use a SQL type such as STRING or FLOAT. For UDTFs, use TABLE(column_definitions).
packages (Optional)
A list of packages used by this code. This can be any of the supported Anaconda Python packages or Snowpark API packages. For example: snowflake-snowpark-python, numpy.
imports (Optional)
A list of artifacts to import. These must be aliases from the artifacts list in this spec.
code_body (Optional)
Inline Python code. Mutually exclusive with staged imports. Maximum size is 12 MB.
description: description_text (Optional)
A description of the function (maximum 500 characters).
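To illustrate how the function fields fit together, the sketch below checks one function definition, held as a Python dictionary, against the constraints listed above. The helper name, the dictionary keys used for arguments, and the reading of "12 MB" as 12 × 1024 × 1024 UTF-8 bytes are assumptions; the service may validate differently.

```python
MAX_CODE_BODY_BYTES = 12 * 1024 * 1024  # assumption: "12 MB" read as 12 MiB
MAX_DESCRIPTION_CHARS = 500


def check_function_definition(fn: dict) -> list:
    """Return a list of problems found in one entry of the functions list."""
    problems = []
    # Fields that the reference doesn't mark as optional.
    for field in ("name", "type", "language", "handler", "returns"):
        if not fn.get(field):
            problems.append(f"missing required field: {field}")
    if fn.get("type") and fn["type"] not in ("UDF", "UDTF"):
        problems.append("type must be UDF or UDTF")
    if fn.get("language") and fn["language"] != "PYTHON":
        problems.append("language must be PYTHON")
    if fn.get("type") == "UDTF" and not str(fn.get("returns", "")).upper().startswith("TABLE("):
        problems.append("a UDTF must return TABLE(column_definitions)")
    if fn.get("code_body") and fn.get("imports"):
        problems.append("code_body and imports are mutually exclusive")
    if len(str(fn.get("code_body", "")).encode("utf-8")) > MAX_CODE_BODY_BYTES:
        problems.append("code_body exceeds 12 MB")
    if len(str(fn.get("description", ""))) > MAX_DESCRIPTION_CHARS:
        problems.append("description exceeds 500 characters")
    return problems


# A hypothetical inline UDF; the handler names a function defined in code_body.
example_function = {
    "name": "normalize_email",
    "type": "UDF",
    "language": "PYTHON",
    "runtime_version": "3.10",
    "handler": "normalize",
    "arguments": [{"name": "email", "type": "STRING"}],  # key names are assumed
    "returns": "STRING",
    "code_body": "def normalize(email):\n    return email.strip().lower()\n",
}
```

Calling check_function_definition(example_function) returns an empty list. Adding an imports entry to the same definition reports the conflict with code_body.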
procedures (Required if no functions or ML jobs are defined)
A list of stored procedure definitions. Fields are similar to functions, except there is no type field.
ml_jobs (Required if no functions or procedures are defined)
A list of ML job definitions. Maximum of 5 ML jobs per code spec. This limit is independent of the functions and procedures limit (which is also 5). Unlike functions and procedures, ML jobs run in an isolated container on a compute pool, which enables GPU acceleration, distributed training, arbitrary pip packages, and long-running asynchronous execution.
For more information, see ML Jobs in Data Clean Rooms.
name: identifier
The ML job name. Must be a valid Snowflake identifier with a maximum of 255 characters. Names are case-insensitive and must be unique within the code spec. This name is used in the template call pattern: cleanroom.code_spec_name$ml_job_name(compute_pool, num_instances, warehouse, args_json).
entrypoint: script.py
The Python file to execute as the entry point. Must be a .py file (case-sensitive), for example train.py or run_pipeline.py. Module path notation such as module.function isn’t supported: the entrypoint must always be a Python file.
stage_code_dir: stage_path
The stage directory containing the Python code to run. Must follow the format @DB.SCHEMA.STAGE/path and point to a directory, not a file.
- The stage must be internal. External stages aren’t supported.
- The stage must have DIRECTORY enabled: Set DIRECTORY = (ENABLE = TRUE) when creating the stage.
- The stage must have SNOWFLAKE_SSE encryption enabled: Set ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE') when creating the stage.
- No path traversal: Stage paths can’t contain .. or backslashes.
- Upload files without compression: When you upload files with PUT, set AUTO_COMPRESS = FALSE so that files are stored as-is. Compressed uploads change the file names and contents, which breaks the entrypoint and integrity verification.
- Refresh the directory after uploading files: Call ALTER STAGE stage_name REFRESH after uploading or updating any files. The collaboration uses a directory view to access staged code. Without a refresh, newly uploaded files aren’t visible to the collaboration and creation can stall.
- Supported contents: The directory can contain any file type, including Python scripts (.py), wheel packages (.whl), JAR files (.jar), shared libraries (.so), archive files (.zip, .tar.gz), configuration files (.json, .yaml), and serialized model artifacts (.pkl, .joblib). Nested subdirectories are supported. Only the entrypoint file must be a .py script.
- Maximum 500 files per directory: The stage_code_dir can contain at most 500 files (including all files in subdirectories). Registration fails if this limit is exceeded.
- Total directory size: Keep the combined size of the stage_code_dir under approximately 2 GB.
- Integrity verification: At registration, the system computes a SHA-256 hash for every file in the directory. When the collaboration loads the code, all hashes are re-verified. If any file has been modified, added, or removed since registration, the template request fails. Re-register the code spec after modifying staged files.
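Several of the stage_code_dir rules above (file count, total size, per-file hashes) can be checked against the local directory before you upload it. The sketch below is one way to do that. The function name is hypothetical, "approximately 2 GB" is read as 2 × 1024³ bytes, and the entrypoint is assumed to be given relative to the directory root; none of this is confirmed by the service.

```python
import hashlib
import os

MAX_FILES = 500
APPROX_MAX_BYTES = 2 * 1024 ** 3  # assumption: "approximately 2 GB"


def scan_code_dir(root: str, entrypoint: str) -> dict:
    """Walk a local code directory and report hashes, total size, and problems."""
    hashes = {}
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            relative = os.path.relpath(full_path, root).replace(os.sep, "/")
            digest = hashlib.sha256()
            with open(full_path, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            hashes[relative] = digest.hexdigest()
            total_bytes += os.path.getsize(full_path)

    problems = []
    if len(hashes) > MAX_FILES:
        problems.append(f"{len(hashes)} files exceeds the limit of {MAX_FILES}")
    if total_bytes > APPROX_MAX_BYTES:
        problems.append("combined size exceeds approximately 2 GB")
    if not entrypoint.endswith(".py"):
        problems.append("entrypoint must be a .py file")
    elif entrypoint not in hashes:
        problems.append(f"entrypoint {entrypoint} not found in the directory")
    return {"hashes": hashes, "total_bytes": total_bytes, "problems": problems}
```

Keeping the returned hashes gives you a local record to compare against if a later template request fails integrity verification.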
pip_requirements (Optional)
A list of pip package requirements to install in the container at runtime. For example: scikit-learn>=1.0, xgboost, pandas. Unlike function and procedure code specs, ML jobs can use any pip-installable package, not just the Anaconda-approved bundle.
description: description_text (Optional)
A description of the ML job (maximum 1,000 characters).
allow_monitoring: boolean (Optional)
Whether to allow the analysis runner to view container logs for this ML job. Default: false.
image_tag: tag (Optional)
The ML runtime image version to use. Maximum 64 characters. If omitted, the collaboration uses a default image version. Specifying a tag is recommended so that your code spec is pinned to a known set of pre-installed libraries and runtime behavior. Pinning prevents unexpected changes if the default image is updated in a future deployment.
For example:
image_tag: "2.5.0"
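The per-job constraints on name, entrypoint, description, and image_tag can be summarized in a short check. This is a sketch that assumes the ml_jobs list has been parsed into Python dictionaries keyed by the field names above; the helper name is hypothetical.

```python
def check_ml_jobs(ml_jobs: list) -> list:
    """Return a list of problems found across the entries of ml_jobs."""
    problems = []
    seen_names = set()
    for job in ml_jobs:
        name = str(job.get("name", ""))
        folded = name.upper()  # names are case-insensitive within the spec
        if not name or len(name) > 255:
            problems.append(f"invalid ML job name: {name!r}")
        elif folded in seen_names:
            problems.append(f"duplicate ML job name: {name}")
        seen_names.add(folded)

        entrypoint = str(job.get("entrypoint", ""))
        # The .py suffix is case-sensitive; module.function notation fails here too.
        if not entrypoint.endswith(".py"):
            problems.append(f"{name}: entrypoint must be a .py file")
        if len(str(job.get("image_tag", ""))) > 64:
            problems.append(f"{name}: image_tag exceeds 64 characters")
        if len(str(job.get("description", ""))) > 1000:
            problems.append(f"{name}: description exceeds 1,000 characters")
    return problems
```

For instance, two jobs named train and TRAIN are reported as duplicates, and an entrypoint of trainer.main is rejected because it isn't a Python file.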
The following example registers a code spec with two ML jobs:
Templates that reference ML Jobs code specs call the generated procedure using the pattern
cleanroom.<code_spec_name>$<ml_job_name>(compute_pool, num_instances, warehouse, args_json).
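The naming part of this pattern can be assembled mechanically. The sketch below builds the qualified procedure name and serializes job arguments to a JSON string for args_json. Both helper names are hypothetical, and the assumption that args_json carries a JSON object of job-specific settings isn't confirmed by this reference.

```python
import json


def ml_job_call_target(code_spec_name: str, ml_job_name: str) -> str:
    """Build the qualified name cleanroom.<code_spec_name>$<ml_job_name>."""
    return f"cleanroom.{code_spec_name}${ml_job_name}"


def ml_job_args_json(**job_args) -> str:
    """Serialize job arguments to a JSON string (assumed shape for args_json)."""
    return json.dumps(job_args, sort_keys=True)
```

With a code spec named my_spec and an ML job named train, ml_job_call_target returns cleanroom.my_spec$train.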