description:: Migrating code with Snowflake CoCo and the Snowpark Migration Accelerator

Migrating with Snowflake CoCo¶

Snowflake CoCo is the primary migration tool for any Spark-to-Snowflake migration. The Snowflake CoCo CLI bundles skills that can migrate your code to Snowflake either alongside the Snowpark Migration Accelerator (SMA) or completely independently of it.

spark-migration: (User guide is on this page.) This skill is the primary orchestrator for any Spark to Snowflake migration.
snowpark-connect: (Spark to Snowpark Connect.) This skill migrates scripts and notebooks that reference Spark to Snowpark Connect using the Snowflake CoCo CLI. It is bundled as part of the spark-migration skill.

Spark Migration skill user guide¶

What this skill does¶

The spark-migration skill orchestrates an end-to-end migration of PySpark or Spark Scala code to Snowflake. It guides you through code conversion (LLM-powered or with the SMA), issue tracking and resolution, and test scaffolding. It can also convert notebooks that reference the Spark API.

Possible prerequisites¶

Before the skill begins, it checks that the following resources are available locally:

Git must be installed to run the validation component (auto-installs via Homebrew on macOS if missing)
The SMA CLI is required if you choose the Snowpark API conversion path (the skill will search for it or ask you to provide the path)

Step-by-step walkthrough¶

Step 1: Load configuration¶

This skill creates a configuration file to save settings and project characteristics that you may want to keep from project to project. The skill will check for saved project configurations from previous runs automatically.

What happens:

If saved configurations exist, you’ll see a numbered list and be asked whether to reuse one or create a new configuration.
If no configurations exist, you’ll be asked for basic project information.

The configuration file stores the fields listed below. If you do not want to save a specific configuration, you can tell Snowflake CoCo to “use default values for all fields”.

You will be prompted for:

Project Name (required): used as the config filename
Source Code Path (your PySpark source)
Output Folder (where converted code goes)
Customer Email
Customer Company
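
The on-disk format of the configuration file is not described on this page, so the following is only a sketch of how the fields above could be stored, assuming a JSON file named after the project. The class name, the field names, and the `.json` extension are all hypothetical and are not taken from the skill itself.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ProjectConfig:
    # Field names are illustrative assumptions; the skill's real keys may differ.
    project_name: str            # required; used as the config filename
    source_code_path: str = ""   # your PySpark source
    output_folder: str = ""      # where converted code goes
    customer_email: str = ""
    customer_company: str = ""


def save_config(config: ProjectConfig, config_dir: Path) -> Path:
    """Write the config to <config_dir>/<project_name>.json and return the path."""
    if not config.project_name:
        raise ValueError("project_name is required")
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / f"{config.project_name}.json"
    path.write_text(json.dumps(asdict(config), indent=2))
    return path


def load_config(path: Path) -> ProjectConfig:
    """Read a config previously written by save_config."""
    return ProjectConfig(**json.loads(path.read_text()))
```

Keying the filename on the project name mirrors the note above that the project name is used as the config filename, which is what would let a later run list saved configurations by name.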

Step 2: Review configuration¶

A full summary of all the settings will be displayed, including defaults for optional parameters.

What happens:

You see the complete configuration with current/default values.
You choose “Use these settings” or “Edit settings”.

You will be prompted if:

This is a first run (all parameters shown for you to fill in)
You choose to edit (you only specify the numbers you want to change)

Key settings you can configure here:

Setting	Options	Default
Conversion Type	`snowpark-connect` (Snowpark Connect) / `snowpark_api` (with the SMA CLI)	`snowpark-connect`
Migration Status	`migrate` (run conversion) / `already_migrated` (use existing output)	`migrate`
Run Notebook Migration	yes / no	yes
Run EWI Fixer	yes / no	yes
Run Stage Conversion	yes / no	yes
Run Validation (DVP) Orchestrator	yes / no	yes

Step 3: Route based on migration status¶

No user interaction: the skill reads your configuration and routes automatically:

already_migrated → Step 4 (validate existing output)
migrate → Step 5 (choose conversion tool)
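
The routing rule above amounts to a lookup on the Migration Status setting. A minimal sketch, in which the function name is hypothetical and the step numbers are the ones used on this page:

```python
def next_step(migration_status: str) -> int:
    """Return the step the workflow jumps to after Step 3."""
    routes = {
        "already_migrated": 4,  # validate existing output
        "migrate": 5,           # choose conversion tool
    }
    try:
        return routes[migration_status]
    except KeyError:
        raise ValueError(f"unknown migration status: {migration_status!r}") from None
```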

Step 4: Validate existing output (only if `already_migrated`)¶

If you already have output from a previous conversion, produced either by this skill or by the SMA, you can choose `already_migrated`. After the checks below, the workflow moves on to the Git setup step (Step 8).

What happens:

The skill validates that Output/ and Reports/ directories exist at your output path.
It auto-detects SMA v1 format (Conversion-* timestamped folders).

You will be prompted if:

The output path is invalid: you’ll be asked to provide the correct one.
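
As a rough illustration of the checks described in this step, the sketch below tests for the two directories and looks for Conversion-* folders. The function name and the shape of the returned dictionary are hypothetical, and how the skill actually treats an SMA v1 layout once it is detected is not specified here.

```python
from pathlib import Path


def check_existing_output(output_path: str) -> dict:
    """Report whether Output/ and Reports/ exist and list any Conversion-* folders."""
    root = Path(output_path)
    missing = [name for name in ("Output", "Reports") if not (root / name).is_dir()]
    # Conversion-* timestamped folders indicate the SMA v1 layout.
    v1_folders = sorted(p.name for p in root.glob("Conversion-*") if p.is_dir())
    return {"valid": not missing, "missing": missing, "sma_v1_folders": v1_folders}
```

An empty `missing` list corresponds to a valid output path; otherwise the caller would ask for a corrected path, as described above.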

Step 5: Choose conversion tool (only if `migrate`)¶

If the configuration file does not already specify a conversion path, which is unusual, you will be prompted to choose one.

You will be prompted for:

Output path: where to save the converted code
Conversion tool: two options:
- Snowpark API: uses the SMA CLI binary (requires installation)
- Snowpark Connect: uses the bundled snowpark-connect sub-skill (AI-driven conversion)

Step 6: Snowpark API conversion (if you chose that option)¶

What happens:

The SMA CLI binary is located/validated.
The conversion runs in the background (can take several minutes for large workloads).
Progress is monitored and reported to you.

You will be prompted for (if not already configured):

SMA CLI path (if not found automatically)
Enable Jupyter Conversion? (Y/N)
Source SQL flavor for any embedded SQL (SparkSql / HiveSql / Databricks)
Generate Checkpoints? (Y/N)

What to expect: The SMA CLI processes your script and notebook files, and produces converted Snowpark files in the output directory, along with CSV reports documenting all issues found during conversion.

Step 7: Snowpark Connect conversion (if you chose that option)¶

What happens:

The bundled snowpark-connect sub-skill is loaded.
It detects whether your code is Python or Scala and routes to the appropriate migration workflow.
An AI-driven analysis and fix pipeline converts your code.

You will NOT typically be prompted: the sub-skill runs autonomously with the project information already collected.

What to expect: The sub-skill analyzes your code for compatibility issues, creates a conversion folder, applies fixes, updates imports/session creation, adds migration headers, and generates reports similar to what would be generated by the SMA (Issues.csv, etc.).

Step 8: Initialize Git and verify output¶

To perform the validation component, the skill will create a Git repository or ask you to create one. This step may be skipped if you are not going to use the skill’s validation sub-skill.

What happens (automatic, no prompts):

A Git repository is initialized at the resolved output directory.
An initial commit captures the unmodified conversion output on the main branch.
A sma/migration-process branch is created for all subsequent modifications.
The output structure (Output/, Reports/Issues.csv) is verified.

You will be prompted if:

The directory is already a Git repo with uncommitted changes: you’ll choose to stash, commit, or abort.
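
The Git operations in this step correspond roughly to the command sequence below. This is a sketch of equivalent commands, not the skill's actual implementation: the function names and the commit message are hypothetical, `--initial-branch` assumes Git 2.28 or later, and the commit assumes `user.name` and `user.email` are already configured.

```python
import subprocess
from pathlib import Path


def git_setup_commands(branch: str = "sma/migration-process") -> list[list[str]]:
    """Commands that snapshot the conversion output on main, then branch off."""
    return [
        ["git", "init", "--initial-branch=main"],
        ["git", "add", "--all"],
        ["git", "commit", "-m", "Initial conversion output"],  # baseline on main
        ["git", "checkout", "-b", branch],                     # later fixes go here
    ]


def initialize_repo(output_dir: Path) -> None:
    """Run the setup commands inside the output directory, stopping on the first failure."""
    for command in git_setup_commands():
        subprocess.run(command, cwd=output_dir, check=True)
```

Keeping the untouched conversion output on main means every later modification can be reviewed as a diff against that baseline.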

Step 9: Dashboard generation¶

The dashboard is generated from the reports created in the conversion step(s) above. It is served by a local Python server that the skill creates.

What happens (automatic):

The sma-dashboard-generator skill parses Reports/Issues.csv.
An interactive EWI (Errors, Warnings, Issues) tracking dashboard is generated.
A local web server starts and opens the dashboard in your browser.

No user prompts. The dashboard opens automatically at http://localhost:8080 (or the next available port).
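
The "next available port" behavior can be pictured as probing ports upward from 8080 until one can be bound. The sketch below shows that general technique using only the standard library; the function name, the host, and the attempt limit are assumptions, and the dashboard server may select its port differently.

```python
import socket


def next_available_port(start: int = 8080, host: str = "127.0.0.1", attempts: int = 20) -> int:
    """Return the first port at or above `start` that can currently be bound."""
    for port in range(start, min(start + attempts, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is in use; try the next one
            return port
    raise RuntimeError(f"no free port found starting at {start}")
```

The probe socket is closed before the function returns, so another process could still claim the port before the server binds it; a real server would normally handle that by retrying.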

Step 10: Notebook migration¶

What happens:

If configured as yes (default): notebooks are scanned and converted automatically using the snowflake-notebooks-migration sub-skill that is bundled with the spark-migration skill.
If not configured: the skill still scans for notebook files and, if it finds any, prompts you to run the sub-skill.

You will be prompted if:

The setting was not pre-configured AND notebooks are found: you’ll be asked whether to run notebook migration.

What to expect: Notebook files (.ipynb, .python, .scala, .sql, Databricks .py) are converted to Snowflake Workspace format in-place.

Step 11: EWI fixer¶

Automatically resolves conversion issues (EWIs) in the converted code using AI. This step is necessary for Snowpark API runs. It is not necessary for Snowpark Connect runs, although the snowpark-connect sub-skill can still emit EWIs.

EWIs are recorded in a report (Reports/Issues.csv) and are also written as comments in the output code. In this step, you can choose whether to delete those inline comments.

What happens:

If configured as yes (default): runs automatically with saved options.
If not configured: you’re asked whether to run it.

You will be prompted for (if not pre-configured):

Run EWI Fixer? (Yes / No)
EWI comment handling: Mark (keep comments with [FIXED]/[NOT-FIXED] prefix) or Remove (delete after fixing).
Which EWIs to process: Only pending / Retry not_auto_resolved / Specific EWI code / All (reset)

What to expect: The fixer reads EWI comments in your converted files, attempts to resolve each one, and updates the SQLite database with results. The dashboard will reflect the updated status.

Step 12: Stage conversion¶

Replaces embedded file paths (s3://, hdfs://, etc.) with Snowflake stage references (@stage_name/...).

What happens:

If configured as yes (default): runs automatically with the configured stage name.
If not configured: you’re asked whether to run it.

You will be prompted if:

The setting was not pre-configured: you’ll be asked whether to replace embedded file paths.

What to expect: All cloud storage paths in your converted code are replaced with Snowflake internal stage references using the configured prefix (default: migration_stage).
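
To make the replacement concrete, here is a minimal regex-based sketch. It assumes that the scheme and the bucket or host are dropped and the remaining path is kept under the stage; the skill's actual mapping is not documented on this page, only the s3 and hdfs schemes named above (plus the s3a variant) are handled, and the function name is hypothetical.

```python
import re

# Matches "<scheme>://<bucket-or-host>/" for the schemes handled in this sketch.
_PATH_PREFIX = re.compile(r"(?:s3a?|hdfs)://[^/\s'\"]+/?")


def to_stage_reference(code: str, stage_name: str = "migration_stage") -> str:
    """Replace embedded storage path prefixes with @<stage_name>/."""
    return _PATH_PREFIX.sub(f"@{stage_name}/", code)


before = 'df = spark.read.parquet("s3://my-bucket/data/events")'
after = to_stage_reference(before)
# after == 'df = spark.read.parquet("@migration_stage/data/events")'
```

A replacement like this only rewrites the path strings. The files themselves still have to be present in the stage for the converted code to read them.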

Step 13: DVP orchestrator¶

Sets up a Data Validation Pipeline (DVP) workspace for testing the migrated code.

What happens:

If configured as yes (default): runs automatically. No prompt.
If configured as no: skips entirely.

What to expect: The DVP orchestrator creates a dvp/ workspace and runs up to 8 sub-skills:

Create DVP workspace structure
Convert notebooks to scripts (if applicable)
Generate Abstract Syntax Graph (ASG) from source files
Identify entrypoints in the code
Adapt code for testing
Identify I/O schemas (inputs/outputs)
Generate synthetic test data
Generate test setup and register test suites

Step 14: Final dashboard and summary¶

What happens (automatic):

The SMA Dashboard is reopened in your browser showing the final state (all EWI fixes, test registrations, etc.)
A final summary table is displayed showing the status of every step

Where you’ll usually be prompted: quick reference¶

Step	Prompt	When
1	Project name, source path, output path, email, company	First run or creating new config
2	“Use these settings” or “Edit settings”	Every run (with saved config)
4	Output path (if invalid)	Only if `already_migrated` and path is wrong
5	Output path, conversion tool choice	Only if `migrate`
6	SMA CLI path, Jupyter/SQL/Checkpoints options	Only if Snowpark API and not pre-configured
8	How to handle dirty git state	Only if existing repo has uncommitted changes
10	“Run Notebook Migration?”	Only if not pre-configured AND notebooks found
11	“Run EWI Fixer?”, comment mode, scope	Only if not pre-configured
12	“Run Stage Conversion?”	Only if not pre-configured

On subsequent runs: If you’ve saved a configuration, most prompts are skipped. The skill uses your saved preferences and runs through the pipeline with minimal interaction.

Output structure¶

If you went through all the steps, upon completion your output directory will contain:

<output>/
├── Output/              ← Converted Snowpark Python files
├── Reports/             ← CSV reports (Issues, Inventory, Dependencies)
├── Logs/                ← Execution logs
├── sma_storage.sqlite3  ← SQLite database tracking all EWIs and fixes
├── sma-dashboard/       ← Interactive web dashboard
└── dvp/                 ← Test validation workspace
    ├── 01-source/       ← Original PySpark source (copy)
    ├── 02-migrated/     ← Migrated code (copy)
    └── 03-tests/        ← Generated test suites

If you did not complete all the steps, only the artifacts related to the steps you executed will be present in the output.
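
Because the set of artifacts depends on which steps ran, a quick presence check can be handy. The sketch below simply tests for the paths shown in the tree above; the function name is hypothetical and the script is not part of the skill.

```python
from pathlib import Path

# Paths taken from the output tree above, relative to the output directory.
EXPECTED_ARTIFACTS = [
    "Output",
    "Reports",
    "Logs",
    "sma_storage.sqlite3",
    "sma-dashboard",
    "dvp/01-source",
    "dvp/02-migrated",
    "dvp/03-tests",
]


def artifact_status(output_dir: str) -> dict[str, bool]:
    """Map each expected artifact to whether it exists under output_dir."""
    root = Path(output_dir)
    return {name: (root / name).exists() for name in EXPECTED_ARTIFACTS}
```

For example, `[name for name, present in artifact_status("<output>").items() if not present]` lists the artifacts that have not been produced yet.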

Git branches¶

The skill maintains two branches:

main: the original, unmodified conversion output (your baseline)
sma/migration-process: all fixes and modifications applied by subsequent steps

Re-running the skill¶

Configurations are saved per-project. On subsequent runs:

You’ll see your saved config and can reuse it with a single selection
All defaults from your previous choices are preserved
You can selectively re-run individual sub-skills (EWI Fixer, Stage Conversion, etc.) independently by invoking them directly

Dashboard access¶

After the workflow completes, you can reopen the dashboard at any time:

cd "<output>/sma-dashboard" && python3 start_server.py
