Snowpark Migration Accelerator: Supported Filetypes¶

The Snowpark Migration Accelerator (SMA) scans files in your selected source directory during project creation. While some files are excluded based on their type, SMA generates a summary report showing the count of files by extension.

The SMA tool searches for specific file extensions when analyzing references to the Spark API, SQL Statements, and other elements that contribute to the Readiness Scores. The tool can analyze both code files and notebooks located in any directory or subdirectory of your project.

Code Files¶

The Snowpark Migration Accelerator scans the following file types to identify references to Spark API and other third-party APIs:

  • Files with the extension .scala

  • Files with the extension .py

  • Files with the extension .python

SQL statements written in Spark SQL or HiveQL can be detected in the following file types:

  • SQL files with the extension .sql

  • Hive Query Language files with the extension .hql

Notebooks¶

Both the Spark Scala and PySpark parsers in the Snowpark Migration Accelerator (SMA) automatically scan and process Jupyter Notebook files and exported Databricks files when they are present in the source code directory.

  • Jupyter Notebook files (*.ipynb)

  • Databricks Notebook files (*.dbc)

The SMA will analyze notebook files to identify:

  • References to the Spark API

  • References to other third-party APIs

  • SQL statements

The analysis is performed based on the cell type within each notebook. Notebooks can contain a mix of SQL, Python, and Scala cells. The SMA will create an inventory of all cell types in its output report.

Excluded Files and folders¶

By default, certain files and folders are excluded from scanning. These exclusions primarily consist of project configuration files and their associated directories.

Folders type excluded from the scanning:¶

  • Python package installer (pip) - A tool for installing Python packages

  • Distribution packages (dist) - A directory containing Python packages ready for distribution

  • Virtual environment (venv) - An isolated Python environment for managing project dependencies

  • Site-packages - A directory where Python packages are installed for use across the system

Files type excluded from the scanning:¶

  • input.wsp - Workspace input file

  • .DS_Store - macOS system file that stores custom folder attributes

  • build.gradle - Gradle build configuration file

  • build.sbt - Scala Build Tool configuration file

  • pom.xml - Maven Project Object Model configuration file

  • storage.lck - Storage lock file