Snowpark Migration Accelerator: Understanding the Assessment Summary

After running an assessment, you can view the initial results and summary in the Assessment Summary Report. To access this report, click the View Results button.

View Results

This will display your assessment report. Keep in mind that this report summarizes the information from the inventory files created in the Output Reports folder during the SMA execution. For a comprehensive analysis, please review the Detailed Report in the output directory.

The Assessment Results section of the application contains several components, which are explained in detail below.

Standard Assessment Summary

The summary will appear as shown below:

Assessment Summary

In the top-right corner of the report, there is a date dropdown menu showing when your analysis was run. If you have executed the accelerator several times within the same project, the dropdown menu will display multiple dates. These dates correspond only to executions from your currently open project.

At the top left of the report, click the link to learn how to read through the different readiness scores. Below this link, you’ll find detailed explanations for each readiness score.

Spark API Readiness Score

This report includes several items, with the Readiness Score being the most important metric.

Let’s examine each section in detail:

Spark API Readiness Score

  1. Readiness Score - The Spark API Readiness Score is the main metric that SMA uses to evaluate how ready your code is for migration. This score represents the percentage of Spark API references that can be converted to Snowpark API. While this score is useful, it only considers Spark API references and doesn’t account for third-party libraries or other factors. Therefore, use it as an initial assessment rather than a complete evaluation.

    The score is calculated by dividing the number of convertible Spark API references by the total number of Spark API references found in your code. For example, if the score shows 3541/3746, it means 3541 references can be converted out of 3746 total references. A higher score indicates better compatibility with Snowpark API. You can find this score on the first page of the detailed report.

  2. Suggestion on what to do next - Based on your readiness score, SMA provides recommendations for your next steps in the migration process.

  3. Explanation of the Readiness Score - This section provides details about what the Spark API Readiness score means and how to interpret your results.

  4. Readiness Score Breakdown - This section shows how your score was calculated using two key metrics:

    • Usages ready for conversion: The number of Spark API references (functions, elements, or import statements) that can be converted to Snowpark

    • Identified usages: The total number of Spark API references found in your code
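The breakdown above is a simple ratio. As a minimal sketch (reusing the 3541/3746 example from the report):

```python
def readiness_score(convertible: int, total: int) -> float:
    """Percentage of identified Spark API references that can convert to Snowpark."""
    if total == 0:
        return 0.0
    return 100.0 * convertible / total

# Example from the report: 3541 of 3746 references are convertible.
print(f"{readiness_score(3541, 3746):.1f}%")  # 94.5%
```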

Third-Party Libraries Readiness Score

The Third-Party Libraries Readiness Score will be displayed in the following format:

Third-Party Libraries Readiness Score

  1. Readiness Score - Displays your readiness score and its category (green, yellow, or red). The Third-Party Libraries Readiness Score shows what percentage of your imported libraries are supported by Snowflake. For more details, see the Third-Party API Readiness Score section.

  2. Next Steps - Based on your readiness score, SMA provides recommendations for your next actions.

  3. Score Explanation - Describes what the Third-Party Libraries Readiness score means and how to interpret your results.

  4. Score Breakdown - Shows how your Third-Party Libraries Readiness Score was calculated using this formula: (Number of library calls supported in Snowpark) ÷ (Total number of identified library calls)

    Where:

    • “Library calls supported in Snowpark” means calls to libraries that are supported in Snowpark

    • “Identified library calls” means all third-party library calls found in your code, including both Spark and non-Spark libraries, whether supported or not
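The same division applies to library calls. A sketch follows; the counts in it are hypothetical, not taken from any real report:

```python
def third_party_score(supported_calls: int, identified_calls: int) -> float:
    """Percentage of third-party library calls supported in Snowpark.

    identified_calls counts every third-party call found, Spark and
    non-Spark alike, supported or not.
    """
    if identified_calls == 0:
        return 0.0
    return 100.0 * supported_calls / identified_calls

# Hypothetical workload: 120 of 150 identified library calls are supported.
print(f"{third_party_score(120, 150):.1f}%")  # 80.0%
```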

SQL Readiness Score

The SQL Readiness Score will be displayed in the following format:

SQL Readiness Score

  • Readiness Score - Displays your readiness score and its category (green, yellow, or red). This score indicates how many SQL elements in your code can be successfully converted to Snowflake SQL. For more details, see the SQL Readiness Score section.

  • Next Steps - Based on your readiness score, SMA provides recommendations for actions to take before proceeding.

  • Score Explanation - Provides a clear explanation of the SQL Readiness score and how to interpret your results.

  • Score Breakdown - Shows a detailed calculation of your SQL Readiness Score, calculated as: (number of supported elements) ÷ (total number of elements).

Spark API Usages

Danger

The Spark API Usages section has been deprecated since version 2.0.2.

The report contains three main sections displayed as tabs:

  1. Overall Usage Classification

  2. Spark API Usage Categorization

  3. Spark API Usages By Status

We will examine each section in detail below.

Overall Usage Classification

This tab displays a table containing three rows that show:

  • Supported operations

  • Unsupported operations

  • Total usage statistics

Overall Usage Classification

The table provides the following details:

  1. Usages Count - The total number of times Spark API functions are referenced in your code. Each reference is classified as either supported or unsupported, with totals shown at the bottom.

  2. Files with at least 1 usage - The number of files that contain at least one Spark API reference. If this number is less than your total file count, it means some files don’t use Spark API at all.

  3. Percentage of All Files - Shows what portion of your files use Spark API. This is calculated by dividing the number of files with Spark API usage by the total number of code files, expressed as a percentage.

Spark API Usage Categorization

This tab displays the different types of Spark references detected in your codebase. It shows the overall Readiness Score (which is the same score shown at the top of the page) and provides a detailed breakdown of this score by category.

Spark API Usage Categorization

You can find all available categorizations in the Spark Reference Categories section.

Spark API Usages By Status

The final tab displays a categorical breakdown organized by mapping status.

Spark API Usages by Status

The SMA tool uses seven main mapping statuses, which indicate how well Spark code can be converted to Snowpark. For detailed information about these statuses, refer to the Spark Reference Categories section.

Import Calls

Danger

The Import Calls section has been removed since version 2.0.2.

The “Import Calls” section displays frequently used external library imports found in your codebase. Note that Spark API imports are excluded from this section, as they are covered separately in the “Spark API” section.

Import Calls

This table contains the following information:

  1. A table with 5 rows showing:

    • The 3 most frequently imported Python libraries

    • An “Other” row summarizing all remaining packages

    • A “Total” row showing the sum of all imports

  2. A “Supported in Snowpark” column indicating whether each library is included in Snowflake’s list of supported packages in Snowpark.

  3. An “Import Count” column showing how many times each library was imported across all files.

  4. A “File Coverage” column showing the percentage of files that contain at least one import of each library. For example:

    • If ‘sys’ appears 29 times in the import statements but is only used in 28.16% of files, this suggests it’s typically imported once per file where it’s used.

    • The “Other” category might show 56 imports occurring across 100% of files.

For detailed import information per file, refer to the ImportUsagesInventory.csv file in the Output Reports.
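If you want to tally imports yourself from that inventory, a minimal sketch follows. The column name used here (“PackageName”) is an assumption; check the header of your own ImportUsagesInventory.csv before relying on it:

```python
import csv
from collections import Counter

def import_counts(path: str) -> Counter:
    """Tally how many times each library is imported, per the inventory CSV.

    Assumes a "PackageName" column; adjust to match your file's header.
    """
    counts: Counter = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row["PackageName"]] += 1
    return counts
```

From the resulting Counter, `counts.most_common(3)` yields the top-three rows shown in the table above.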

File Summary

Danger

The File Summary section has been removed since version 2.0.2.

The summary report contains multiple tables displaying metrics organized by file type and size. These metrics provide insights into the codebase’s volume and help estimate the required effort for the migration project.

The Snowpark Migration Accelerator analyzes all files in your source codebase, including both code and non-code files. You can find detailed information about the scanned files in the files.csv report.

The File Summary contains multiple sections. Let’s examine each section in detail.

File Type Summary

The File Type Summary displays a list of all file extensions found in your scanned code repository.

File Type Summary

The file extensions listed indicate which types of code files SMA can analyze. For each file extension, you will find the following information:

  • Lines of Code - The total number of executable code lines across all files with this extension. This count excludes comments and empty lines.

  • File Count - The total number of files found with this extension.

  • Percentage of Total Files - The percentage that files with this extension represent out of all files in the project.

To analyze your workload, you can easily identify whether it primarily consists of script files (such as Python or R), notebook files (like Jupyter notebooks), or SQL files. This information helps determine the main types of code files in your project.
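The “Lines of Code” rule (executable lines only, excluding comments and blanks) can be sketched for Python-style comments; SMA’s actual counter is language-aware, so treat this as illustrative only:

```python
def count_code_lines(source: str) -> int:
    """Count lines that are neither empty nor '#' comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )

sample = "import os\n\n# helper\nprint(os.getcwd())\n"
print(count_code_lines(sample))  # 2
```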

Notebook Sizing by Language

The tool evaluates notebooks in your codebase and assigns them a “t-shirt” size (S, M, L, XL) based on the number of code lines they contain. This sizing helps estimate the complexity and scope of each notebook.

Notebook Sizing By Language

The notebook sizes are categorized according to the main programming language used within each notebook.
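A sizing function might look like the sketch below; the line-count thresholds are invented for illustration and are not SMA’s actual cutoffs:

```python
def notebook_size(lines_of_code: int) -> str:
    """Map a notebook's line count to a t-shirt size (illustrative thresholds)."""
    if lines_of_code < 200:
        return "S"
    if lines_of_code < 500:
        return "M"
    if lines_of_code < 1000:
        return "L"
    return "XL"

print(notebook_size(350))  # M
```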

Notebook Stats By Language

This table displays the total number of code lines and cells in all notebooks, organized by programming language.

Notebook Stats by Language

These notebooks are organized by the primary programming language used within them.

Code File Content

When running SMA, the tab name will change based on your source language:

  • For Python source files, the tab will display “Python File Content”

  • For Scala source files, the tab will display “Scala File Content”

The “Spark Usages” row shows how many files contain Spark API references. It displays:

  1. The number of files that use Spark APIs

  2. What percentage these files represent of the total codebase files analyzed

Code File Content

This metric also tells you, by complement, what percentage of files do not contain Spark API references. A low percentage of files with Spark usages suggests that many code files lack Spark dependencies, which could mean the migration effort is smaller than initially estimated.

Code File Sizing

The File Sizing tab name changes based on your source language:

  • For Python source files, it displays as “Python File Sizing”

  • For Scala source files, it displays as “Scala File Sizing”

The codebase files are categorized using “t-shirt” sizes (S, M, L, XL). Each size has specific criteria described in the “Size” column. The table also shows what percentage of all code files falls into each size category.

Code File Sizing

Understanding the file size distribution in your codebase can help assess workload complexity. A high percentage of small files typically suggests simpler, less complex workloads.

Issues Summary

The Issues Summary provides critical information about potential problems found during code scanning. When transitioning from assessment to conversion, you’ll see a list of EWIs (Errors, Warnings, and Issues) detected in your codebase. For a detailed explanation of these issues, please refer to the Issue Analysis section in the documentation.

Issues Summary

At the top of the issue summary, you will find a table that provides an overview of all identified issues.

Issues Summary - Summary Table

The table contains two rows:

  • The “Number of issues” represents the total count of all issue codes found in each category.

  • The “Number of unique issues” represents the count of distinct error codes found in each category.
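The two rows amount to a total count versus a distinct count over the same list of codes. A sketch (the list of EWI codes below is a made-up sample):

```python
from collections import Counter

def issue_summary(issue_codes: list[str]) -> tuple[int, int]:
    """Return (number of issues, number of unique issues)."""
    counts = Counter(issue_codes)
    return sum(counts.values()), len(counts)

codes = ["SPRKPY1002", "SPRKPY1002", "SPRKPY1050", "SPRKPY1002"]
print(issue_summary(codes))  # (4, 2)
```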

The problems are divided into three main categories:

  • Warnings indicate potential differences between source and target platforms that may not require immediate action but should be considered during testing. These could include slight variations in behavior for edge cases or notifications about changes in appearance compared to the source platform.

  • Conversion issues highlight elements that either failed to convert or need additional configuration to work properly in the target platform.

  • Parsing issues occur when the tool cannot interpret specific code elements. These are critical issues requiring immediate attention, typically caused by non-compiling source code or incorrect code extraction. If you believe your source code is correct but still receive parsing errors, it may be due to an unrecognized pattern in SMA. In such cases, please report an issue and include the problematic source code section.

The table summarizes the total count for each item.

Below this table, you will find a list of unique issue codes and their descriptions.

Issue Summary - Issue Code Table

Each issue code entry provides:

  • The unique issue identifier

  • A description of the issue

  • The number of occurrences

  • The severity level (Warning, Conversion Error, or Parsing Error)

You can click any issue code to view detailed documentation that includes:

  • A full description of the issue

  • Example code

  • Recommended solutions

For instance, clicking the first issue code shown above (SPRKPY1002) will take you to its dedicated documentation page.

By default, the table displays only the top 5 issues. To view all issues, click the SHOW ALL ISSUES button located below the table. You can also use the search bar above the table to find specific issues.

Understanding the remaining conversion work is crucial during assessment mode. You can find detailed information about each issue and its location in the issue inventory within the Reports folder.

Execution Summary

The execution summary provides a comprehensive overview of the tool’s recent analysis. It includes:

  • The code analysis score

  • User details

  • The unique execution ID

  • Version information for both SMA and Snowpark API

  • Project folder locations that were specified during Project Creation

Execution Summary

Appendixes

The appendixes contain additional reference information that can help you better understand the output generated by the SMA tool.

This guide contains general reference information about using the Snowpark Migration Accelerator (SMA). While the content may be updated periodically, it focuses on universal SMA usage rather than details about specific codebases.


This is what most users will see when they run the Snowpark Migration Accelerator (SMA). If you are using an older version, you might see the Abbreviated Assessment Summary instead, which is shown below.

Abbreviated Assessment Summary [Deprecated]

If your readiness score is low, your migration summary might appear as follows:

Assessment Summary

This summary contains the following information:

  • Execution Date: Shows when your analysis was performed. You can view results from any previous execution for this project.

  • Result: Indicates if your workload is suitable for migration based on the readiness score. The readiness score is a preliminary assessment tool and does not guarantee migration success.

  • Input Folder: Location of the source files that were analyzed.

  • Output Folder: Location where analysis reports and converted code files are stored.

  • Total Files: Number of files analyzed.

  • Execution Time: Duration of the analysis process.

  • Identified Spark References: Number of Spark API calls found in your code.

  • Count of Python (or Scala) Files: Number of source code files in the specified programming language.


Next Steps

The application provides several additional features, which can be accessed through the interface shown in the image below.

  • Retry Assessment - You can run the assessment again by clicking the Retry Assessment button on the Assessment Results page. This is useful when you make changes to the source code and want to see updated results.

  • View Log Folder - Opens the folder containing assessment execution logs. These text files provide detailed information about the assessment process and are essential for troubleshooting if the assessment fails. If technical support is needed, you may be asked to share these logs.

  • View Reports - Opens the folder containing assessment output reports. These include the detailed assessment report, Spark reference inventory, and other analyses of your source codebase. Each report type is explained in detail in this documentation.

  • Continue to Conversion - While this may seem like the next logical step, it’s important to review the assessment results thoroughly before proceeding. Note that running a conversion requires an access code. For more information, see the conversion section of this documentation.

The following pages provide detailed information about the reports generated each time the tool runs.