Snowpark Migration Accelerator: SMA Inventories

The Snowpark Migration Accelerator (SMA) analyzes your codebase and produces detailed data, which is stored in the Reports folder as spreadsheets (inventories). This data is used to create two types of reports:

  1. The assessment summary

  2. The curated reports

Understanding the inventory files may seem daunting at first, but they provide valuable insights into both your source workload and the converted workload. Below, we explain each output file and its columns in detail.

These inventories are also shared through telemetry data collection. For more details, please refer to the telemetry section of this documentation.

Assessment Report Details

The AssessmentReport.json file stores data that is displayed in both the Detailed Report and Assessment Summary sections of the application. This file is primarily used to populate these reports and may contain information that is also available in other spreadsheets.

Files Inventory

The files.csv file contains a complete list of all files processed during tool execution, including their file types and sizes.

  • Path: The file location relative to the root directory. For example, files in the root directory will show only their filename.

  • Technology: The programming language of the source code (Python or Scala)

  • FileKind: Identifies if the file contains source code or is another type (such as text or log files)

  • BinaryKind: Indicates if the file is human-readable text or a binary file

  • Bytes: The file size measured in bytes

  • SupportedStatus: Always shows “DoesNotApply” as file support status is not applicable in this context
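Because files.csv is a plain CSV, it is easy to summarize programmatically. The sketch below, using only the Python standard library and hypothetical sample rows that follow the columns described above, totals the bytes of code per technology:

```python
import csv
from collections import defaultdict
from io import StringIO

# Hypothetical sample rows mimicking the files.csv schema described above.
SAMPLE = """Path,Technology,FileKind,BinaryKind,Bytes,SupportedStatus
main.py,Python,SourceCode,Text,2048,DoesNotApply
utils/helpers.py,Python,SourceCode,Text,512,DoesNotApply
notes.txt,None,Text,Text,128,DoesNotApply
"""

def summarize_by_technology(csv_text):
    """Total file size in bytes per Technology value."""
    totals = defaultdict(int)
    for row in csv.DictReader(StringIO(csv_text)):
        totals[row["Technology"]] += int(row["Bytes"])
    return dict(totals)

print(summarize_by_technology(SAMPLE))
# {'Python': 2560, 'None': 128}
```

In practice you would pass the contents of the files.csv found in your Reports folder instead of the inline sample.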

Import Usages Inventory

The ImportUsagesInventory.csv file contains a list of all external library imports found in your codebase. An external library is any package or module that is imported into your source code files.

  • Element: The unique identifier for the Spark reference

  • ProjectId: The root directory name where the tool was executed

  • FileId: The relative path and filename containing the Spark reference

  • Count: Number of occurrences of the element in a single line

  • Alias: Optional alternative name for the element

  • Kind: Always empty/null as all elements are imports

  • Line: Source code line number where the element appears

  • PackageName: Package containing the element

  • Supported: Indicates if the reference can be converted (True/False)

  • Automated: Empty/null (deprecated column)

  • Status: Always “Invalid” (deprecated column)

  • Statement: The actual code using the element [Not included in telemetry]

  • SessionId: Unique identifier for each tool execution

  • SnowConvertCoreVersion: Version number of the tool’s core processing engine

  • SnowparkVersion: Available Snowpark API version for the specific technology

  • ElementPackage: Package name containing the imported element (when available)

  • CellId: For notebook files, indicates the cell number containing the element

  • ExecutionId: Unique identifier for this SMA execution

  • Origin: Source type of the import (BuiltIn, ThirdPartyLib, or blank)
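A common use of this inventory is to list the imports that SMA could not convert. The following sketch (sample rows are hypothetical, but the column names match the list above) filters on the Supported column:

```python
import csv
from io import StringIO

# Hypothetical rows using the ImportUsagesInventory.csv columns described above.
SAMPLE = """Element,FileId,Line,Supported,Origin
pyspark.sql,etl/job.py,3,True,ThirdPartyLib
graphframes,etl/job.py,4,False,ThirdPartyLib
os,etl/job.py,1,True,BuiltIn
"""

def unsupported_imports(csv_text):
    """Return (Element, FileId, Line) for imports flagged as unsupported."""
    return [(r["Element"], r["FileId"], r["Line"])
            for r in csv.DictReader(StringIO(csv_text))
            if r["Supported"] == "False"]

print(unsupported_imports(SAMPLE))
# [('graphframes', 'etl/job.py', '4')]
```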

Input Files Inventory

The InputFilesInventory.csv file contains a detailed list of all files, organized by their file types and sizes.

  • Element: The filename, which is identical to FileId

  • ProjectId: The name of the project, represented by the root directory where the tool was executed

  • FileId: The complete path to the file containing the Spark reference, shown as a relative path

  • Count: The number of files sharing this filename

  • SessionId: A unique identifier assigned to each tool session

  • Extension: The file extension type

  • Technology: The programming language or technology type, determined by the file extension

  • Bytes: The file size measured in bytes

  • CharacterLength: The total number of characters in the file

  • LinesOfCode: The total number of code lines in the file

  • ParsingResult: Indicates whether the file was successfully parsed (“Successful”) or encountered errors (“Error”)

Input and Output Files Inventory

The IOFilesInventory.csv file contains a list of all external files and resources that your code reads from or writes to.

  • Element: The specific item (file, variable, or component) being accessed for reading or writing operations

  • ProjectId: The name of the root directory where the tool was executed

  • FileId: The complete path and filename where Spark code was detected

  • Count: The number of occurrences of this filename

  • isLiteral: Indicates whether the read/write location is specified as a literal value

  • Format: The detected file format (such as CSV, JSON) if SMA can identify it

  • FormatType: Specifies if the identified format is explicit

  • Mode: Indicates whether the operation is “Read” or “Write”

  • Supported: Indicates if Snowpark supports this operation

  • Line: The line number in the file where the read or write operation occurs

  • SessionId: A unique identifier assigned to each tool session

  • OptionalSettings: Lists any additional parameters defined for the element

  • CellId: For notebook files, identifies the specific cell location (null for non-notebook files)

  • ExecutionId: A unique identifier for each time the tool is run

Issue Inventory

The Issues.csv file contains a detailed report of all conversion issues discovered in your codebase. For each issue, you will find:

  • A description explaining the problem

  • The precise location within the file where the issue occurs

  • A unique code identifier for the issue type

For more detailed information about specific issues, please refer to the issue analysis section of our documentation.

  • Code: A unique identifier assigned to each issue detected by the tool

  • Description: A detailed explanation of the issue, including the Spark reference name when applicable

  • Category: The type of issue found, which can be one of the following:

    • Warning

    • Conversion Error

    • Parser Error

    • Helper

    • Transformation

    • WorkAround

    • NotSupported

    • NotDefined

  • NodeType: The syntax node identifier where the issue was detected

  • FileId: The relative path and filename where the Spark reference was found

  • ProjectId: The root directory name where the tool was executed

  • Line: The specific line number in the source file where the issue occurs

  • Column: The specific character position in the line where the issue occurs
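To get a quick sense of where conversion effort will go, you can tally issues by Category. The sketch below uses hypothetical rows (the issue codes and descriptions are illustrative) that follow the columns listed above:

```python
import csv
from collections import Counter
from io import StringIO

# Hypothetical rows using the Issues.csv columns described above.
SAMPLE = """Code,Description,Category,NodeType,FileId,ProjectId,Line,Column
SPRKPY1001,Example issue,Warning,Attribute,etl/job.py,myproject,10,5
SPRKPY1002,Example issue,Conversion Error,Call,etl/job.py,myproject,22,9
SPRKPY1002,Example issue,Conversion Error,Call,etl/util.py,myproject,7,1
"""

def issues_by_category(csv_text):
    """Count how many issues fall into each Category."""
    return Counter(r["Category"] for r in csv.DictReader(StringIO(csv_text)))

print(issues_by_category(SAMPLE).most_common())
# [('Conversion Error', 2), ('Warning', 1)]
```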

Joins Inventory

The JoinsInventory.csv file contains a comprehensive list of all dataframe join operations found in the codebase.

  • Element: Line number indicating where the join starts (and ends, if spanning multiple lines)

  • ProjectId: Name of the root directory where the tool was executed

  • FileId: Path and name of the file containing the Spark reference

  • Count: Number of files with the same filename

  • isSelfJoin: TRUE if joining a table with itself, FALSE otherwise

  • HasLeftAlias: TRUE if an alias is defined for the left side of the join, FALSE otherwise

  • HasRightAlias: TRUE if an alias is defined for the right side of the join, FALSE otherwise

  • Line: Starting line number of the join

  • SessionId: Unique identifier assigned to each tool session

  • CellId: Identifier of the notebook cell containing the element (null for non-notebook files)

  • ExecutionId: Unique identifier for each tool execution
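Self-joins without aliases on either side are a frequent source of ambiguous-column problems after migration, so this inventory is useful for finding them up front. A sketch, with hypothetical rows following the columns above:

```python
import csv
from io import StringIO

# Hypothetical rows using the JoinsInventory.csv columns described above.
SAMPLE = """Element,FileId,isSelfJoin,HasLeftAlias,HasRightAlias,Line
12,etl/job.py,TRUE,FALSE,FALSE,12
34-36,etl/job.py,FALSE,TRUE,TRUE,34
"""

def self_joins_missing_aliases(csv_text):
    """Flag self-joins where neither side has an alias."""
    return [r["FileId"] + ":" + r["Line"]
            for r in csv.DictReader(StringIO(csv_text))
            if r["isSelfJoin"] == "TRUE"
            and r["HasLeftAlias"] == "FALSE"
            and r["HasRightAlias"] == "FALSE"]

print(self_joins_missing_aliases(SAMPLE))
# ['etl/job.py:12']
```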

Notebook Cells Inventory

The NotebookCellsInventory.csv file provides a detailed list of all cells within a notebook, including their source code content and the number of code lines per cell.

  • Element: The programming language used in the source code (Python, Scala, or SQL)

  • ProjectId: The name of the root directory where the tool was executed

  • FileId: The complete path and filename where Spark code was detected

  • Count: The number of files with this specific filename

  • CellId: For notebook files, the unique identifier of the cell containing the code (null for non-notebook files)

  • Arguments: This field is always empty (null)

  • LOC: The total number of code lines in the cell

  • Size: The total number of characters in the cell

  • SupportedStatus: Indicates whether all elements in the cell are supported (TRUE) or if there are unsupported elements (FALSE)

  • ParsingResult: Shows if the cell was successfully parsed (“Successful”) or if there were parsing errors (“Error”)

Notebook Size Inventory

The NotebookSizeInventory.csv file provides a summary of code lines for each programming language found in notebook files.

  • filename: The name of the spreadsheet file (identical to the FileId)

  • ProjectId: The name of the root directory where the tool was executed

  • FileId: The relative path and name of the file containing Spark references

  • Count: The number of files with this specific filename

  • PythonLOC: Number of Python code lines in notebook cells (zero for regular files)

  • ScalaLOC: Number of Scala code lines in notebook cells (zero for regular files)

  • SqlLOC: Number of SQL code lines in notebook cells (zero for regular files)

  • Line: This field is always empty (null)

  • SessionId: A unique identifier assigned to each tool session

  • ExecutionId: A unique identifier assigned to each tool execution
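Since each row carries per-language line counts, totaling lines of code across all notebooks is a simple aggregation. A sketch over hypothetical rows that follow the columns above:

```python
import csv
from io import StringIO

# Hypothetical rows using the NotebookSizeInventory.csv columns described above.
SAMPLE = """filename,FileId,PythonLOC,ScalaLOC,SqlLOC
nb1.ipynb,notebooks/nb1.ipynb,120,0,15
nb2.ipynb,notebooks/nb2.ipynb,40,0,5
"""

def total_loc_per_language(csv_text):
    """Sum notebook lines of code per language across all files."""
    totals = {"Python": 0, "Scala": 0, "SQL": 0}
    for r in csv.DictReader(StringIO(csv_text)):
        totals["Python"] += int(r["PythonLOC"])
        totals["Scala"] += int(r["ScalaLOC"])
        totals["SQL"] += int(r["SqlLOC"])
    return totals

print(total_loc_per_language(SAMPLE))
# {'Python': 160, 'Scala': 0, 'SQL': 20}
```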

Pandas Usages Inventory

The PandasUsagesInventory.csv file contains a comprehensive list of all Pandas API references found in your Python codebase during the scanning process.

  • Element: The unique identifier for the pandas reference

  • ProjectId: The root directory name where the tool was executed

  • FileId: The relative path to the file containing the Spark reference

  • Count: Number of occurrences of the element in a single line

  • Alias: The alternative name used for the element (only applies to imports)

  • Kind: The type of element, such as Class, Variable, Function, Import, etc.

  • Line: The source file line number where the element was found

  • PackageName: The package containing the element

  • Supported: Indicates if the reference is supported (True/False)

  • Automated: Indicates if the tool can automatically convert the element (True/False)

  • Status: Element classification: Rename, Direct, Helper, Transformation, WorkAround, NotSupported, or NotDefined

  • Statement: The context in which the element was used [Not included in telemetry]

  • SessionId: A unique identifier for each tool execution

  • SnowConvertCoreVersion: The version number of the tool’s core processing code

  • SnowparkVersion: The Snowpark API version available for the specific technology and tool run

  • PandasVersion: The pandas API version used to identify elements in the codebase

  • CellId: The cell identifier in the FileId (only for notebooks, null otherwise)

  • ExecutionId: A unique identifier for each tool execution

Spark Usages Inventory

The SparkUsagesInventory.csv file identifies where and how Spark API functions are used in your code. This information helps calculate the Readiness Score, which indicates how ready your code is for migration.

  • Element: The unique identifier for the Spark reference

  • ProjectId: The root directory name where the tool was executed

  • FileId: The relative path and filename containing the Spark reference

  • Count: Number of occurrences of the element in a single line

  • Alias: The element’s alias (only applies to import elements)

  • Kind: The element’s category (e.g., Class, Variable, Function, Import)

  • Line: The source file line number where the element was found

  • PackageName: The package name containing the element

  • Supported: Indicates if the reference is supported (True/False)

  • Automated: Indicates if the tool can automatically convert the element (True/False)

  • Status: Element categorization (Rename, Direct, Helper, Transformation, WorkAround, NotSupported, NotDefined)

  • Statement: The actual code where the element was used [NOTE: This column is not sent via telemetry]

  • SessionId: A unique identifier for each tool execution

  • SnowConvertCoreVersion: The tool’s core process version number

  • SnowparkVersion: The available Snowpark API version for the specific technology and tool run

  • CellId: For notebook files, the cell’s numerical location where the element was found

  • ExecutionId: A unique identifier for this specific SMA execution
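Since this inventory feeds the Readiness Score, a rough version of that calculation can be sketched as the share of Spark usages that are supported, weighted by the Count column. This is a simplified approximation over hypothetical sample rows, not the tool’s exact formula:

```python
import csv
from io import StringIO

# Hypothetical rows using the SparkUsagesInventory.csv columns described above.
SAMPLE = """Element,FileId,Count,Supported,Line
pyspark.sql.DataFrame.join,etl/job.py,1,True,12
pyspark.sql.functions.col,etl/job.py,2,True,13
pyspark.ml.feature.Word2Vec,etl/job.py,1,False,20
"""

def approximate_readiness(csv_text):
    """Supported usages divided by all usages, weighted by Count.

    A simplified approximation of the Readiness Score, not the
    exact formula used by SMA.
    """
    supported = total = 0
    for r in csv.DictReader(StringIO(csv_text)):
        n = int(r["Count"])
        total += n
        if r["Supported"] == "True":
            supported += n
    return supported / total if total else 0.0

print(f"{approximate_readiness(SAMPLE):.0%}")
# 75% (3 of 4 usages supported)
```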

SQL Statements Inventory

The SqlStatementsInventory.csv file contains a count of SQL keywords found in Spark SQL elements.

  • Element: Name of the code element containing the SQL statement

  • ProjectId: Root directory name where the tool was executed

  • FileId: Relative path to the file containing the Spark reference

  • Count: Number of occurrences of the element in a single line

  • InterpolationCount: Number of external elements inserted into this element

  • Keywords: Dictionary containing SQL keywords and their frequency

  • Size: Total character count of the SQL statement

  • LiteralCount: Number of string literals in the element

  • NonLiteralCount: Number of SQL components that are not string literals

  • Line: Line number where the element appears

  • SessionId: Unique identifier for each tool session

  • CellId: Identifier of the notebook cell containing the element (null if not in a notebook)

  • ExecutionId: Unique identifier for each tool execution

SQL Elements Inventory

The SQLElementsInventory.csv file contains a count of SQL statements found within Spark SQL elements.

Here are the fields included in the SQL analysis report:

  • Element: SQL code element type (Example: SqlSelect, SqlFromClause)

  • ProjectId: Root directory name where the tool was executed

  • FileId: Path to the file containing the SQL code

  • Count: Number of occurrences of the element in a single line

  • NotebookCellId: ID of the notebook cell

  • Line: Line number where the element appears

  • Column: Column number where the element appears

  • SessionId: Unique ID for each tool session

  • ExecutionId: Unique ID for each tool run

  • SqlFlavor: Type of SQL being analyzed (Example: Spark SQL, Hive SQL)

  • RootFullName: Complete name of the main code element

  • RootLine: Line number of the main element

  • RootColumn: Column number of the main element

  • TopLevelFullName: Complete name of the highest-level SQL statement

  • TopLevelLine: Line number of the highest-level statement

  • TopLevelColumn: Column number of the highest-level statement

  • ConversionStatus: Result of SQL conversion (Example: Success, Failed)

  • Category: Type of SQL statement (Example: DDL, DML, DQL)

  • EWI: Error Warning Information code

  • ObjectReference: Name of the SQL object being referenced (Example: table name, view name)

SQL Embedded Usage Inventory

The SqlEmbeddedUsageInventory.csv file contains a count of SQL keywords found within Spark SQL elements.

  • Element: The type of SQL component found in the code (such as Select statement, From clause, or Numeric literal)

  • ProjectId: The name of the root directory where the tool was executed

  • FileId: The location and relative path of the file containing the SQL reference

  • Count: How many times this element appears in a single line

  • ExecutionId: A unique ID assigned to each tool execution

  • LibraryName: The name of the library in use

  • HasLiteral: Shows if the element contains literal values

  • HasVariable: Shows if the element contains variables

  • HasFunction: Shows if the element contains function calls

  • ParsingStatus: The current parsing state (Success, Failed, or Partial)

  • HasInterpolation: Shows if the element contains string interpolations

  • CellId: The identifier for the notebook cell

  • Line: The line number where the element is found

  • Column: The column number where the element is found

Third Party Usages Inventory

The ThirdPartyUsagesInventory.csv file contains a list of all third-party references found in your codebase.

  • Element: The unique identifier for the third-party reference

  • ProjectId: The name of the project’s root directory where the tool was executed

  • FileId: The relative path to the file containing the Spark reference

  • Count: The number of occurrences of the element in a single line

  • Alias: The alternative name assigned to the element (if applicable)

  • Kind: The type classification of the element (variable, type, function, or class)

  • Line: The source file line number where the element was found

  • PackageName: The full package name (combination of ProjectId and FileId in Python)

  • Statement: The actual code where the element was used [NOTE: Not included in telemetry data]

  • SessionId: A unique identifier for each tool session

  • CellId: The notebook cell identifier where the element was found (null for non-notebook files)

  • ExecutionId: A unique identifier for each tool execution

Packages Inventory

The packagesInventory.csv file contains a list of all packages found in your codebase.

  • Package Name: The name of the package being analyzed.

  • Project Name: The name of the project, which corresponds to the root directory where the tool was executed.

  • File Location: The file path where the package was found, shown as a relative path.

  • Occurrence Count: The number of times this package appears on a single line of code.

Tool Execution Summary

The tool_execution.csv file contains essential information about the current execution of the Snowpark Migration Accelerator (SMA) tool.

  • ExecutionId: A unique identifier assigned to each time the tool runs.

  • ToolName: The name of the tool being used. Can be either PythonSnowConvert or SparkSnowConvert (for Scala).

  • Tool_Version: The version number of the software.

  • AssemblyName: The complete name of the code processor (a more detailed version of ToolName).

  • LogFile: Indicates if a log file was generated when an error or failure occurred.

  • FinalResult: Indicates at which point the tool stopped if an error or failure occurred.

  • ExceptionReport: Indicates if an error report was generated when a failure occurred.

  • StartTime: The date and time when the tool began running.

  • EndTime: The date and time when the tool finished running.

  • SystemName: The serial number of the machine where the tool was run (used only for troubleshooting and verifying licenses).