Using Cortex Code CLI with other data platforms¶
Cortex Code CLI supports plugins that extend its capabilities to external data platforms. Each platform plugin adds domain-specific skills that you can use through natural language prompts.
Prerequisites¶
Cortex Code CLI installed and authenticated to Snowflake.
AWS Glue¶
Cortex Code includes built-in skills for both preparing AWS-side Iceberg infrastructure and creating a Snowflake catalog integration with AWS Glue Data Catalog. You can use these skills end to end, taking raw data in S3 all the way to queryable Iceberg tables in Snowflake, or use either skill on its own if your data is already registered in Glue.
Note
These are built-in capabilities. No plugin needs to be enabled — the skills are available in every Cortex Code session.
Preparing data in AWS Glue¶
Use this skill to prepare the AWS side of an Iceberg pipeline: verify credentials, discover S3 data, create a Glue database and crawler, validate schemas, and convert parquet or CSV data to Iceberg format using Athena. At the end of this workflow, Cortex Code can hand off directly to the Snowflake catalog integration skill.
Capabilities¶
| Capability | Description | Example Prompt |
|---|---|---|
| AWS authentication | Verify AWS CLI credentials, identify the caller identity, and list accessible S3 buckets | “Check my AWS credentials and show me my S3 buckets” |
| S3 data discovery | Inventory source files in S3, identify formats, and validate directory structure for Athena compatibility | “What data files are in my S3 bucket and are they structured for Athena?” |
| Glue database & crawler setup | Create or reuse a Glue database and IAM crawler role, detect Lake Formation mode, run the crawler | “Set up a Glue database and crawler for my S3 data” |
| Lake Formation grants | Detect LF mode and grant the crawler role the minimum required permissions on the database and S3 location | “My Glue crawler is failing with a Lake Formation permission error — help me fix it” |
| Schema discovery & validation | List discovered tables, compare column types, fix duplicate partition columns from Snowflake-exported parquet | “Validate the schemas for the tables my crawler discovered” |
| Parquet-to-Iceberg conversion | Convert parquet, CSV, or JSON tables to Iceberg format using Athena CTAS or Glue Spark | “Convert my parquet tables to Iceberg using Athena” |
| Iceberg table registration | Register existing Iceberg tables in Glue Data Catalog with the correct `table_type` parameter and metadata location | “Register my existing Iceberg tables in Glue” |
| Crawler cleanup | Delete the crawler after schema discovery to avoid accidental re-crawls and ongoing cost | “Delete the Glue crawler now that schema discovery is done” |
AWS-side prerequisites¶
AWS CLI installed and configured. If it is not installed, Cortex Code provides installation instructions.
An AWS profile with sufficient permissions to create Glue databases, crawlers, and IAM roles, and to run Athena queries. See the minimum operator policy below.
Athena engine version 3 (Trino-based) workgroup if you are converting data to Iceberg format. Cortex Code checks the workgroup version and prompts you to upgrade if needed.
The following IAM policy covers the minimum permissions needed to run this skill:
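(The statement names below match the `IAMRoleCreation` and `LakeFormation` blocks referenced in the notes that follow; the action lists and placeholder ARNs are a representative sketch and should be scoped to your own buckets, roles, and account.)

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "GlueCatalogAndCrawler",
      "Effect": "Allow",
      "Action": [
        "glue:CreateDatabase",
        "glue:GetDatabase",
        "glue:CreateTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:UpdateTable",
        "glue:CreateCrawler",
        "glue:StartCrawler",
        "glue:GetCrawler",
        "glue:DeleteCrawler"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AthenaQueries",
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:GetWorkGroup"
      ],
      "Resource": "*"
    },
    {
      "Sid": "S3DataAndResults",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetObject",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<your-bucket>",
        "arn:aws:s3:::<your-bucket>/*"
      ]
    },
    {
      "Sid": "IAMRoleCreation",
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:GetRole",
        "iam:PutRolePolicy",
        "iam:AttachRolePolicy",
        "iam:PassRole"
      ],
      "Resource": "arn:aws:iam::<account-id>:role/<crawler-role>"
    },
    {
      "Sid": "LakeFormation",
      "Effect": "Allow",
      "Action": [
        "lakeformation:GetDataLakeSettings",
        "lakeformation:GrantPermissions"
      ],
      "Resource": "*"
    }
  ]
}
```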
If you already have a crawler IAM role, omit the `IAMRoleCreation` block. If Lake Formation is not enabled in your account, omit the `LakeFormation` block.
Getting started¶
Start a Cortex Code session. Cortex Code asks for your AWS CLI profile and region before running any commands.
If your data is already in Iceberg format in S3 and you want to register it in Glue:
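(Example prompt, taken from the capability table above.)

```
Register my existing Iceberg tables in Glue
```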
If your data is in parquet, CSV, or JSON and needs converting:
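(Example prompt, taken from the capability table above.)

```
Convert my parquet tables to Iceberg using Athena
```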
To run individual phases:
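(Example prompts for individual phases, taken from the capability table above.)

```
Check my AWS credentials and show me my S3 buckets
Set up a Glue database and crawler for my S3 data
Validate the schemas for the tables my crawler discovered
Delete the Glue crawler now that schema discovery is done
```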
At the end of AWS setup, Cortex Code asks whether you want to continue with Snowflake catalog integration. If yes, it passes all collected variables (account ID, region, Glue database name, IAM role ARN, table names) directly to the catalog integration skill.
Troubleshooting¶
- AWS credentials expired

  Symptom: Commands fail with `ExpiredTokenException` or `ExpiredToken`.

  Solution: Run `aws sso login --profile <profile>` to refresh your session, or run `aws configure --profile <profile>` to reconfigure credentials.

- Lake Formation permission error on crawler

  Symptom: The Glue crawler fails with `Insufficient Lake Formation permission(s): Required Describe on <database>`.

  Solution: Lake Formation is active in your account, so the crawler IAM role needs explicit LF grants on the Glue database. Use the prompt “My Glue crawler is failing with a Lake Formation permission error” and Cortex Code detects whether LF is in restrictive or IAM compatibility mode and generates the correct `grant-permissions` commands.

- Tables not discovered correctly after crawling Iceberg data

  Symptom: The crawler creates entries for metadata files (`metadata/`, `*.avro`) instead of the actual Iceberg tables, or table names don’t match expectations.

  Solution: The standard S3 Glue crawler does not natively recognize Iceberg format. Use the prompt “Register my existing Iceberg tables in Glue” to bypass the crawler and register tables directly with the correct `table_type = ICEBERG` parameter and metadata location.

- Athena CTAS fails with `HIVE_BAD_DATA`

  Symptom: Parquet-to-Iceberg conversion fails with a type mismatch error.

  Solution: The source table has column types that Athena cannot implicitly cast. Cortex Code regenerates the CTAS statement with explicit `CAST()` expressions for the affected columns. You can also ask “Switch to Glue Spark for the conversion” to use the `add_files` approach instead of Athena.

- Athena backtick syntax error

  Symptom: CTAS or SELECT queries fail with `SYNTAX_ERROR: backquoted identifiers are not supported`.

  Solution: Athena’s Trino engine (v3) uses double quotes for identifiers, not backticks. Cortex Code automatically uses the correct quoting in generated SQL.

- Duplicate partition column error

  Symptom: Athena queries on a partitioned table fail with a duplicate column error after crawling.

  Solution: This occurs when Snowflake-exported parquet files include the partition column both inside the file and in the Hive directory path (`column=value/`). Use the prompt “Fix duplicate partition columns in my Glue table” and Cortex Code runs `aws glue update-table` to remove the duplicate from `StorageDescriptor.Columns`.
Snowflake catalog integration¶
Use this skill to create a Snowflake catalog integration for AWS Glue Data Catalog, configure access delegation, and query Iceberg tables registered in Glue directly from Snowflake using the Iceberg REST Catalog (IRC) protocol.
Capabilities¶
| Capability | Description | Example Prompt |
|---|---|---|
| Catalog integration setup | Create a Snowflake catalog integration for AWS Glue Data Catalog using public connectivity or AWS PrivateLink | “Set up a catalog integration for my AWS Glue Data Catalog” |
| Access delegation | Configure catalog-vended credentials (Lake Formation) or external volume credentials for S3 data access | “Help me choose between vended credentials and external volume for Glue access” |
| IAM trust policy | Retrieve the Snowflake IAM user ARN and external ID, then configure the AWS trust policy | “What trust policy do I need to add to my IAM role for Snowflake?” |
| PrivateLink connectivity | Provision an outbound PrivateLink endpoint and create the catalog integration for private connectivity | “Set up a private Glue catalog integration using PrivateLink” |
| Verification | Test the catalog integration by listing namespaces and tables from Glue Data Catalog | “Verify my Glue catalog integration and list available tables” |
| Troubleshooting | Diagnose and fix connection errors, IAM permission issues, Lake Formation access failures, and PrivateLink problems | “My Glue catalog integration is returning a 403 error — help me fix it” |
Snowflake prerequisites¶
Snowflake:

- `ACCOUNTADMIN` role, or explicit grants for `CREATE INTEGRATION` and `CREATE EXTERNAL VOLUME` on the account.
- Business Critical Edition or higher if you are using PrivateLink connectivity.

AWS:

- An AWS account with Glue Data Catalog configured and Iceberg tables registered.
- An IAM role that Snowflake can assume to access Glue and S3.
- Lake Formation enabled and configured if you are using catalog-vended credentials (recommended).
Getting started¶
Start a Cortex Code session and use natural language to trigger the skill:
Cortex Code walks you through the setup one step at a time, collecting your AWS account ID, region, IAM role ARN, access delegation mode, and connectivity type before generating the SQL.
To verify an existing integration:
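(Prompt from the Verification capability above.)

```
Verify my Glue catalog integration and list available tables
```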
To troubleshoot a failing integration:
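(Prompt from the Troubleshooting capability above.)

```
My Glue catalog integration is returning a 403 error — help me fix it
```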
Authentication¶
The Glue catalog integration uses AWS SigV4 authentication. Snowflake assumes an IAM role in your AWS account to interact with Glue and S3.
During setup, Cortex Code generates the `CREATE CATALOG INTEGRATION` SQL with the role ARN you provide, then retrieves the Snowflake IAM user ARN and external ID. You add those values to your IAM role’s trust policy to authorize the connection.
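For reference, the generated SQL typically follows the shape below. This is a sketch assuming public connectivity and SigV4 authentication; the integration name, Glue database, account ID, region, and role name are placeholders you replace with your own values.

```sql
-- Sketch: Glue Iceberg REST catalog integration with SigV4 authentication
CREATE CATALOG INTEGRATION glue_catalog_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<glue_database>'
  REST_CONFIG = (
    CATALOG_URI = 'https://glue.<region>.amazonaws.com/iceberg'
    CATALOG_API_TYPE = AWS_GLUE
    CATALOG_NAME = '<aws_account_id>'
  )
  REST_AUTHENTICATION = (
    TYPE = AWS_SIGV4
    SIGV4_IAM_ROLE = 'arn:aws:iam::<aws_account_id>:role/<role_name>'
    SIGV4_SIGNING_REGION = '<region>'
  )
  ENABLED = TRUE;

-- Retrieve the Snowflake IAM user ARN and external ID for the role's trust policy
DESC CATALOG INTEGRATION glue_catalog_int;
```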
Troubleshooting¶
- Trust relationship not configured

  Symptom: Verification returns an `Access denied` or `InvalidClientTokenId` error.

  Solution: Run `DESC CATALOG INTEGRATION <name>` to retrieve the Snowflake IAM user ARN and external ID, then add them to your IAM role’s trust policy in the AWS console.

- External ID mismatch

  Symptom: Verification fails with an authentication error even though the trust policy exists.

  Solution: Re-run `DESC CATALOG INTEGRATION <name>` to get the current external ID and update the trust policy condition to match exactly.

- Lake Formation access denied

  Symptom: Tables are visible but queries return a `403 Forbidden` or Lake Formation permission error.

  Solution: Grant the IAM role `SELECT` permission on the Glue database and tables in the Lake Formation console.

- PrivateLink endpoint not available

  Symptom: `CREATE CATALOG INTEGRATION` with `CATALOG_API_TYPE = AWS_PRIVATE_GLUE` fails with a connectivity error.

  Solution: Run `SELECT SYSTEM$PROVISION_PRIVATELINK_ENDPOINT('com.amazonaws.<region>.glue', 'glue.<region>.amazonaws.com')` to provision the endpoint, wait for it to become available, then retry.

- Cross-region PrivateLink not supported

  Symptom: PrivateLink provisioning fails when your Snowflake account and Glue Data Catalog are in different regions.

  Solution: Use public connectivity, or ensure your Snowflake account and Glue Data Catalog are in the same AWS region. AWS does not support cross-region PrivateLink for the Glue service.
Databricks¶
Cortex Code provides a built-in Databricks plugin with skills for managing workspaces, browsing Unity Catalog, building ETL pipelines, deploying bundles, diagnosing performance, optimizing costs, and more.
Capabilities¶
| Capability | Description | Example Prompt |
|---|---|---|
| CLI Setup & Auth | Install the Databricks CLI and authenticate to your workspace using OAuth or a personal access token | “Help me install the Databricks CLI and authenticate to my workspace” |
| Unity Catalog | Browse catalogs, schemas, tables, and volumes in your Databricks workspace | “List all catalogs in my Databricks workspace” |
| Cluster & Job Management | List clusters, inspect job runs, view failures, and manage compute resources | “Show me recent job runs that failed” |
| PySpark ETL Pipelines | Build medallion-architecture ETL pipelines with PySpark notebooks that read and write to Delta tables | “Build a medallion ETL pipeline that reads and writes silver/gold tables” |
| dbt on Databricks | Create dbt projects with staging and marts layers, configured for Databricks and deployed with bundles | “Create a dbt project with staging and marts layers deployed via DAB” |
| Automation Bundles (DAB) | Initialize and deploy Databricks Asset Bundles with scheduled jobs, pipelines, and configuration | “Initialize a Databricks bundle project with a scheduled job that runs daily” |
| Local Testing | Generate pytest unit tests for PySpark notebooks with mocked dbutils and SparkSession | “Generate pytest unit tests for my PySpark notebook that mock dbutils and SparkSession” |
| Spark Performance | Diagnose slow Spark jobs, identify spill, skew, and shuffle bottlenecks, and recommend fixes | “My Spark job is slow and spilling to disk – help me diagnose and fix it” |
| Cost Optimization | Audit Databricks compute spend, identify waste, and recommend right-sizing and policy changes | “Audit my Databricks compute costs and recommend optimizations” |
| Notebook Refactoring | Refactor monolithic notebooks into modular Python packages with thin orchestrator notebooks | “Refactor my large notebook into modular Python packages with thin orchestrator notebooks” |
| Databricks SQL | Create and manage materialized views, use AI functions, and work with SQL warehouses | “Help me create a materialized view and use AI functions in Databricks SQL” |
Databricks prerequisites¶
A Databricks workspace with API access enabled.
Workspace admin or sufficient permissions for the operations you want to perform.
The Databricks CLI is installed automatically on first use if not already present.
Enabling the Databricks plugin¶
Enable the plugin and start a new Cortex Code session:
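The enable command is the same one referenced in the troubleshooting section below; start a new session afterward so the skills are loaded.

```bash
cortex plugin enable databricks
```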
After enabling, Databricks skills are available in all future sessions until you disable the plugin.
To disable the plugin:
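Assuming the plugin manager exposes a matching disable subcommand (this mirrors the enable command above):

```bash
cortex plugin disable databricks
```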
Getting started¶
After enabling the plugin, start a Cortex Code session and set up your connection:
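(Example prompt, taken from the capability table above.)

```
Help me install the Databricks CLI and authenticate to my workspace
```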
Once connected, explore your environment:
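(Example prompts, taken from the capability table above.)

```
List all catalogs in my Databricks workspace
Show me recent job runs that failed
```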
Then try more advanced workflows:
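(Example prompts, taken from the capability table above.)

```
Build a medallion ETL pipeline that reads and writes silver/gold tables
Audit my Databricks compute costs and recommend optimizations
```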
Authentication¶
The Databricks plugin uses the Databricks CLI’s authentication. You can authenticate using OAuth (recommended) or a personal access token.
- OAuth (recommended)
Run the following command and follow the browser prompts:
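A sketch using the CLI’s host flag; substitute your workspace URL:

```bash
databricks auth login --host https://<workspace-url>
```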
This creates a reusable OAuth token stored locally. Cortex Code uses it automatically.
- Personal access token
Set the following environment variables before starting Cortex Code:
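These are the standard Databricks CLI variables, shown with placeholder values:

```bash
export DATABRICKS_HOST=https://<workspace-url>
export DATABRICKS_TOKEN=<personal-access-token>
```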
Troubleshooting¶
- Plugin not found

  Symptom: `cortex plugin enable databricks` reports that the plugin doesn’t exist.

  Solution: Update Cortex Code CLI to the latest version. The Databricks plugin ships with Cortex Code starting from version 1.0.75.

- Authentication failures

  Symptom: Operations return 401 or 403 errors.

  Solution: Try the following steps:

  1. Run `databricks auth login` again to refresh your OAuth token.
  2. Verify that your personal access token hasn’t expired.
  3. Check that your user has the required workspace permissions for the operation.

- Workspace connectivity

  Symptom: Operations time out or fail to connect.

  Solution: Verify the workspace URL is correct and reachable from your machine. If you’re behind a VPN or firewall, make sure that outbound HTTPS traffic to your Databricks workspace is allowed.