Using the SMA AI assistant

You can use the SMA AI assistant to analyze and answer questions based exclusively on assessment data and documentation produced during the SMA migration assessment process.

How the assistant works

The assistant works on top of the assessment reports generated by the Snowpark Migration Accelerator (SMA). The process is divided into four main phases:

  • Activation
  • Processing questions and generating SQL
  • Retrieving answers
  • Formatting and delivering the response

[Image: Assistant Architecture Flow]

Phase 1: Activation and data ingestion

  1. Activate the assistant.

    The assistant is launched directly from the SMA application. Upon launch, you’ll be prompted to enter your Snowflake account credentials.

    [Image: Assistant Activation]

  2. Configure authentication.

    Select the appropriate authentication method for connecting to the Snowflake account. The assistant supports multiple authentication types to accommodate different security requirements.

    [Image: Assistant Authentication Types]

  3. Test the connection.

    Before proceeding, test the connection to ensure proper access to the Snowflake account. A successful connection test is required to activate the assistant, which then creates the necessary infrastructure (tables, schemas, etc.) in your Snowflake environment.

    [Image: Assistant Test Connection]

  4. Provision infrastructure.

    Once the connection is validated, the assistant automatically provisions a dedicated table structure within your Snowflake account for storing and managing documentation.

    [Image: Assistant Activation]

  5. Ingest assessment data.

    Upload all relevant assessment reports generated by the SMA. The assistant accepts both .csv and .docx files. Each uploaded file is mapped to a separate table in Snowflake, maintaining a one-to-one relationship between source files and database tables.

    [Image: Assistant Upload Assessment Files]
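
To make the one-to-one file-to-table relationship concrete, here is a minimal sketch of a naming rule that reproduces most of the mapping listed later in the "Assessment files to table mapping" section. The function and the rule are illustrative assumptions, not the assistant's actual implementation; a few files (such as tool_execution.csv) do not follow this pattern.

```python
def table_name_for(filename: str) -> str:
    """Hypothetical naming rule: drop the extension, uppercase,
    and pluralize a trailing "Inventory" to "INVENTORIES"."""
    stem = filename.rsplit(".", 1)[0].upper()   # drop the extension, uppercase
    if stem.endswith("INVENTORY"):              # pluralize trailing "Inventory"
        stem = stem[: -len("INVENTORY")] + "INVENTORIES"
    return stem

print(table_name_for("ArtifactDependencyInventory.csv"))  # ARTIFACTDEPENDENCYINVENTORIES
print(table_name_for("Issues.csv"))                       # ISSUES
```

The authoritative mapping is the table in the "Assessment files to table mapping" section; the sketch only illustrates why each source file lands in exactly one table.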

Phase 2: Question processing and SQL generation (RAG)

Once the infrastructure is ready, you can begin asking questions through the chat interface. Each question triggers a sophisticated retrieval-augmented generation (RAG) process powered by Snowflake Cortex.

[Image: Assistant Chat Window]

LLM selection

The assistant interface allows you to select from different Large Language Models (LLMs) to power the question-answering process. The available LLM options vary depending on the region where your Snowflake account is hosted, as Snowflake Cortex model availability differs across geographic regions.

[Image: Assistant Select LLMs]

[Image: Assistant Select LLMs List]

Note

The list of available models is dynamically populated based on your Snowflake account’s region. Some advanced models might only be available in specific regions due to Snowflake’s regional deployment of Snowflake Cortex AI services.

  1. Semantic documentation search

    The assistant performs a semantic search against a vectorized documentation table in the Snowflake account to identify the most relevant documents and data sources that can potentially answer your query.

  2. Context-aware SQL generation

    Using the identified documentation as context, Snowflake Cortex generates a targeted SQL query designed to extract the precise information needed to answer your question.

  3. SQL validation (self-correction loop 1)

    The generated SQL query undergoes validation by passing it back to Snowflake Cortex along with the original question to ensure the query logically produces an answer that addresses the user’s intent.
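
The semantic search in step 1 can be sketched as a vector-similarity ranking over the stored documentation embeddings. The following is a self-contained approximation for illustration only: the assistant actually runs this search inside Snowflake against the EMBEDDING column of DOCUMENTATION_METADATA, and the helper names here are assumptions.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def most_relevant_tables(question_vec, docs, k=2):
    # docs: (table_name, embedding) pairs, as in DOCUMENTATION_METADATA;
    # returns the k table names whose documentation is closest to the question.
    ranked = sorted(docs, key=lambda d: cosine(question_vec, d[1]), reverse=True)
    return [name for name, _ in ranked[:k]]
```

The tables returned by this ranking become the context that Snowflake Cortex uses in step 2 to generate the SQL query.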

Phase 3: Answer retrieval and refinement

The answer retrieval process adapts based on the SQL validation results, ensuring the most accurate response possible.

One of the following scenarios applies.

Scenario A: Valid SQL query

  • The validated SQL query is executed against the assessment data tables.
  • The query results are extracted and prepared as the preliminary answer.

Scenario B: Invalid or insufficient SQL query

  • When SQL generation is not viable, the assistant falls back to the summary .docx file as context.
  • Snowflake Cortex generates a preliminary answer based on the documentation content rather than structured data.

Final answer verification (self-correction loop 2)

Before presenting the answer, the assistant performs a final quality check:

  • The preliminary answer is evaluated by Snowflake Cortex to verify it logically addresses the original question.
    • If validated: The answer is accepted and moves to formatting.
    • If not validated: The question is posed directly to Snowflake Cortex as a general inquiry (without specific documentation context), and this response becomes the final answer.
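
The two self-correction loops and the documentation fallback combine into a single control flow, sketched below. This is a hypothetical outline: the callables stand in for Snowflake Cortex calls and query execution, and their names are assumptions rather than the assistant's actual API.

```python
def answer_question(question, generate_sql, validate_sql, run_sql,
                    answer_from_docs, validate_answer, ask_llm_directly):
    # Loop 1: generate SQL, then validate it against the question's intent.
    sql = generate_sql(question)
    if sql is not None and validate_sql(sql, question):
        preliminary = run_sql(sql)                # Scenario A: valid SQL
    else:
        preliminary = answer_from_docs(question)  # Scenario B: docs fallback
    # Loop 2: verify the preliminary answer addresses the question.
    if validate_answer(preliminary, question):
        return preliminary
    return ask_llm_directly(question)             # general inquiry, no context
```

In the real assistant, `validate_sql` and `validate_answer` are the two Cortex-powered self-correction checks, and `ask_llm_directly` corresponds to posing the question to Cortex without documentation context.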

Phase 4: Response formatting and delivery

Natural language formatting

The final answer is processed one last time by Snowflake Cortex to transform it into a clear, conversational response that’s easy to understand and properly formatted for the chat interface.

[Image: Assistant Answer]

The formatted response is then displayed in the chat window, complete with proper structure, context, and any relevant details extracted from the assessment data.

Assessment report files and artifacts

The following files are uploaded to your account and serve as the data sources for the assistant:

| File Type | File Name | Purpose |
|---|---|---|
| CSV | DbxElementsInventory.csv | Lists Databricks (DBX) elements found inside notebooks. |
| CSV | ExecutionFlowInventory.csv | Lists the relations between different workload scopes based on function calls. |
| CSV | Checkpoints.csv | Lists generated checkpoints for the user workload. |
| CSV | DataFramesInventory.csv | Lists the DataFrame assignments found for generating checkpoints. |
| CSV | ArtifactDependencyInventory.csv | Lists the artifact dependencies of each file analyzed by the SMA. |
| CSV | Files.csv | Inventory of each file's type and size present in that execution. |
| CSV | ImportUsagesInventory.csv | Lists all referenced import calls in the codebase. |
| CSV | InputFilesInventory.csv | Lists every file by filetype and size. |
| CSV | IOFilesInventory.csv | Lists all external elements being read from or written to. |
| CSV | Issues.csv | Lists every conversion issue found, including description and location. |
| CSV | JoinsInventory.csv | Inventory of all DataFrame joins done in that codebase. |
| CSV | NotebookCellsInventory.csv | Inventory of all cells in a notebook. |
| CSV | NotebookSizeInventory.csv | Lists the size in lines of code of different source languages in notebook files. |
| CSV | PandasUsagesInventory.csv | Lists every reference to the Pandas API (Python only). |
| CSV | SparkUsagesInventory.csv | Shows the exact location and usage for each reference to the Spark API. |
| CSV | SqlStatementsInventory.csv | Count of SQL keywords present in SQL Spark elements. |
| CSV | SQLElementsInventory.csv | Count of SQL elements present in SQL Spark elements. |
| CSV | SqlEmbeddedUsageInventory.csv | Count of embedded SQL present in SQL Spark elements. |
| CSV | ThirdPartyUsagesInventory.csv | Lists the third-party references in the codebase. |

Database schema reference

This section describes the database tables created in your Snowflake account when using the assistant. These tables store migration assessment data, code inventories, and metadata used by the assistant to provide context-aware responses.

Schema overview

The assistant system creates tables to store:

  • Migration assessment data: Results from code analysis including dependencies, issues, and usage patterns
  • Code inventories: Detailed tracking of various code elements (imports, functions, DataFrames, etc.)
  • Documentation metadata: Vector embeddings for semantic search capabilities
  • Execution tracking: Records of tool executions and their results

Most tables include an EXECUTIONID column to associate records with specific migration assessment runs.
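
As a sketch of how EXECUTIONID scopes the data, the filter below selects the rows belonging to one assessment run from a table export. Representing rows as dictionaries keyed by column name is an assumption for illustration; in practice you would filter with a WHERE clause in Snowflake.

```python
def rows_for_execution(rows, execution_id):
    # Keep only the records produced by a single SMA assessment run.
    return [r for r in rows if r.get("EXECUTIONID") == execution_id]

issues = [
    {"EXECUTIONID": "run-1", "CODE": "ISSUE-001"},
    {"EXECUTIONID": "run-2", "CODE": "ISSUE-002"},
]
print(rows_for_execution(issues, "run-1"))  # only the run-1 record
```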

Table reference

DOCUMENTATION_METADATA

Stores documentation text with vector embeddings for semantic search capabilities. Used by the assistant to find relevant context based on user questions using vector similarity.

| Column | Type | Description |
|---|---|---|
| TABLE_NAME | VARCHAR | Name of the table the documentation describes |
| DOCUMENTATION_TEXT | VARCHAR | The documentation text content |
| EMBEDDING | VECTOR(FLOAT, 768) | Vector embedding for semantic search |

ARTIFACTDEPENDENCYINVENTORIES

Lists the artifact dependencies of each file analyzed by the SMA. This inventory allows the user to determine which artifacts are needed for the file to work properly in Snowflake.

The following are considered artifacts: a third-party library, SQL entity, source of a read or write operation, and another source code file in the workload.

| Column | Type | Description |
|---|---|---|
| EXECUTIONID | VARCHAR(16777216) | The identifier of the execution |
| FILEID | VARCHAR(16777216) | The identifier of the source code file |
| DEPENDENCY | VARCHAR(16777216) | The artifact dependency that the current file has |
| TYPE | VARCHAR(16777216) | The type of the artifact dependency |
| SUCCESS | BOOLEAN | If the artifact needs any intervention, it shows FALSE; otherwise, it shows TRUE |
| STATUSDETAIL | VARCHAR(16777216) | The status of the artifact dependency, based on the type |
| ARGUMENTS | VARCHAR(16777216) | Extra data of the artifact dependency, based on the type |
| LOCATION | VARCHAR(16777216) | The collection of cell ID and line number where the artifact dependency is being used in the source code file |
| INDIRECTDEPENDENCIES | VARCHAR(16777216) | A list of other files that this file relies on, even if not directly |
| TOTALINDIRECTDEPENDENCIES | NUMBER(38,0) | The total count of these indirect dependencies |
| DIRECTPARENTS | VARCHAR(16777216) | A list of files that directly use this file |
| TOTALDIRECTPARENTS | NUMBER(38,0) | The total count of these direct parent files |
| INDIRECTPARENTS | VARCHAR(16777216) | A list of files that use this file indirectly (through other files) |
| TOTALINDIRECTPARENTS | NUMBER(38,0) | The total count of these indirect parent files |

CHECKPOINTSINVENTORIES

Lists the generated checkpoints for the user workload. These checkpoints can be used directly with the Checkpoints feature of the Snowflake Extension.

| Column | Type | Description |
|---|---|---|
| NAME | VARCHAR(16777216) | The checkpoint name (using the format described before) |
| FILEID | VARCHAR(16777216) | The relative path of the file (starting from the input folder the user chose in the SMA tool) |
| CELLID | NUMBER(38,0) | The cell number where the DataFrame operation was found inside a notebook file |
| LINE | NUMBER(38,0) | Line number where the DataFrame operation was found |
| COLUMNLINE | NUMBER(38,0) | The column number where the DataFrame operation was found |
| TYPECHECKPOINT | VARCHAR(16777216) | The use case of the checkpoint (Collection or Validation) |
| DATAFRAMENAME | VARCHAR(16777216) | The name of the DataFrame |
| LOCATIONASSIGNMENT | NUMBER(38,0) | The assignment number of the DataFrame name |
| ENABLED | BOOLEAN | Indicates whether the checkpoint is enabled (True or False) |
| MODENUMBER | VARCHAR(16777216) | The mode number of the collection (Schema [1] or DataFrame [2]) |
| SAMPLEDATAFRAME | NUMBER(38,0) | The sample of the DataFrame |
| ENTRYPOINT | VARCHAR(16777216) | The entry point that guides the flow to execute the checkpoint |
| EXECUTIONID | VARCHAR(16777216) | The execution ID |

DATAFRAMESINVENTORIES

Lists the DataFrame assignments used when generating checkpoints for the user workload.

| Column | Type | Description |
|---|---|---|
| FULLNAME | VARCHAR(16777216) | The full name of the DataFrame |
| NAME | VARCHAR(16777216) | The simple name of the variable of the DataFrame |
| FILEID | VARCHAR(16777216) | The relative path of the file (starting from the input folder the user chose in the SMA tool) |
| CELLID | NUMBER(38,0) | The cell number where the DataFrame operation was found inside a notebook file |
| LINE | NUMBER(38,0) | The line number where the DataFrame operation was found |
| COLUMNLINE | NUMBER(38,0) | The column number where the DataFrame operation was found |
| ASSIGNMENTNUMBER | NUMBER(38,0) | The number of assignments for this particular identifier (not symbol) in the file |
| RELEVANTFUNCTION | VARCHAR(16777216) | The relevant function for which this element was collected |
| RELATEDDATAFRAMES | VARCHAR(16777216) | The fully qualified name of the DataFrame(s) involved in the operation (separated by semicolons) |
| ENTRYPOINTS | VARCHAR(16777216) | Empty for this phase |
| EXECUTIONID | VARCHAR(16777216) | The execution ID |

DBXELEMENTSINVENTORIES

Lists the DBX (Databricks) elements found inside notebooks.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | The DBX element name |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the element was found and the relative path to that file |
| COUNT | NUMBER(38,0) | The number of times that element shows up in a single line |
| CATEGORY | VARCHAR(16777216) | The element category |
| KIND | VARCHAR(16777216) | A category for each element. These could include Function or Magic |
| LINE | NUMBER(38,0) | The line number in the source files where the element was found |
| PACKAGENAME | VARCHAR(16777216) | The name of the package where the element was found |
| SUPPORTED | BOOLEAN | Whether this reference is "supported" or not (True/False) |
| AUTOMATED | BOOLEAN | Whether or not the tool can automatically convert it (True/False) |
| STATUS | VARCHAR(16777216) | The categorization of each element (Rename, Direct, Helper, Transformation, WorkAround, NotSupported, or NotDefined) |
| SESSIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |
| SNOWCONVERTCOREVERSION | VARCHAR(16777216) | The version number for the core code process of the tool |
| CELLID | NUMBER(38,0) | If this element was found in a notebook file, the numbered location of the cell where this element was in the file |
| EXECUTIONID | VARCHAR(16777216) | The unique identifier for this execution of the SMA |
| TECHNOLOGY | VARCHAR(16777216) | Source technology platform |

DETAILEDREPORTS

Stores detailed migration assessment reports for each execution. Used by the assistant to provide context-specific answers about migration assessments.

| Column | Type | Description |
|---|---|---|
| ID | NUMBER(38,0) | Auto-incrementing primary key |
| EXECUTION_ID | VARCHAR(16777216) | Unique identifier for the execution run |
| REPORT_TEXT | VARCHAR(16777216) | Full text of the detailed report |

EXECUTIONFLOWINVENTORIES

Lists the relations between the different workload scopes, based on the function calls found. This inventory’s main purpose is to serve as the base for the entry points identification.

| Column | Type | Description |
|---|---|---|
| CALLER | VARCHAR(16777216) | The full name of the scope where the call was found |
| CALLERTYPE | VARCHAR(16777216) | The type of the scope where the call was found. This can be: Function, Class, or Module |
| INVOKED | VARCHAR(16777216) | The full name of the element that was called |
| INVOKEDTYPE | VARCHAR(16777216) | The type of the element. This can be: Function or Class |
| FILEID | VARCHAR(16777216) | The relative path of the file (starting from the input folder the user chose in the SMA tool) |
| CELLID | NUMBER(38,0) | The cell number where the call was found inside a notebook file, if applicable |
| LINE | NUMBER(38,0) | The line number where the call was found |
| COLUMNLINE | NUMBER(38,0) | The column number where the call was found |
| EXECUTIONID | VARCHAR(16777216) | The execution ID |

IMPORTUSAGESINVENTORIES

Contains all the referenced import calls in the codebase. An import is classified as an external library that gets imported at any point in the file.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | The unique name for the actual Spark reference |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the Spark reference was found and the relative path to that file |
| COUNT | NUMBER(38,0) | The number of times that element shows up in a single line |
| ALIAS | VARCHAR(16777216) | The alias of the element (if any) |
| KIND | VARCHAR(16777216) | Null/empty value because all elements are imports |
| LINE | NUMBER(38,0) | The line number in the source files where the element was found |
| PACKAGENAME | VARCHAR(16777216) | The name of the package where the element was found |
| ISSNOWPARKANACONDASUPPORTED | BOOLEAN | Whether this reference is "supported" or not. Values: True/False |
| AUTOMATED | VARCHAR(16777216) | Null/empty. This column is deprecated |
| STATUS | VARCHAR(16777216) | Value: Invalid. This column is deprecated |
| STATEMENT | VARCHAR(16777216) | The code where the element was used. (Note: This column is not sent via telemetry) |
| SESSIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |
| SNOWCONVERTCOREVERSION | VARCHAR(16777216) | The version number for the core code process of the tool |
| SNOWPARKVERSION | VARCHAR(16777216) | The version of Snowpark API available for the specified technology and run of the tool |
| ELEMENTPACKAGE | VARCHAR(16777216) | The package name where the imported element is declared (when available) |
| CELLID | NUMBER(38,0) | If this element was found in a notebook file, the numbered location of the cell where this element was in the file |
| EXECUTIONID | VARCHAR(16777216) | The unique identifier for this execution of the SMA |
| ORIGIN | VARCHAR(16777216) | Category of the import reference. Possible values are BuiltIn, ThirdPartyLib, or blank |
| TECHNOLOGY | VARCHAR(16777216) | Source technology platform |
| FULLNAME | VARCHAR(16777216) | The correct full path for the current element |

INPUTFILESINVENTORIES

Similar to the files inventory, lists every file by filetype and size.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | Filename (same as FileId) |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the Spark reference was found and the relative path to that file |
| COUNT | NUMBER(38,0) | Count of files with that filename |
| SESSIONID | VARCHAR(16777216) | Unique identifier for each session of the tool |
| EXTENSION | VARCHAR(16777216) | The file's extension |
| TECHNOLOGY | VARCHAR(16777216) | The source file's technology based on extension |
| BYTES | NUMBER(38,0) | Size of the file in bytes |
| CHARACTERLENGTH | NUMBER(38,0) | Count of characters in the file |
| LINESOFCODE | NUMBER(38,0) | Lines of code in the file |
| PARSERESULT | VARCHAR(16777216) | "Successful" if the file was fully parsed, "Error" if it was not parsed |
| IGNORED | BOOLEAN | Whether the file was ignored |
| ORIGINFILEPATH | VARCHAR(16777216) | Original file path |
| EXECUTIONID | VARCHAR(16777216) | The unique identifier for this execution of the SMA |

IOFILESINVENTORIES

Lists all external elements that are being read from or written to in the codebase.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | The file, variable, or other element being read or written |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the Spark reference was found and the relative path to that file |
| COUNT | NUMBER(38,0) | Count of files with that filename |
| ISLITERAL | BOOLEAN | If the read/write location was in a literal |
| FORMAT | VARCHAR(16777216) | If the SMA can determine the format of the element (such as csv, json, etc.) |
| FORMATTYPE | VARCHAR(16777216) | If the format above is specific |
| MODE | VARCHAR(16777216) | Value will be Read or Write depending on whether there is a reader or writer |
| SUPPORTED | BOOLEAN | Whether this operation is supported in Snowpark |
| LINE | NUMBER(38,0) | The line in the file where the read or write occurs |
| SESSIONID | VARCHAR(16777216) | Unique identifier for each session of the tool |
| OPTIONSETTINGS | VARCHAR(16777216) | If a parameter is defined in the element, it will be listed here |
| CELLID | NUMBER(38,0) | Cell ID where that element was in that FileId (if in a notebook; null otherwise) |
| EXECUTIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |

ISSUES

Lists every conversion issue found in the codebase. A description, the exact location of the issue in the file, and a code associated with that issue will be reported.

| Column | Type | Description |
|---|---|---|
| CODE | VARCHAR(16777216) | The unique code for the issues reported by the tool |
| DESCRIPTION | VARCHAR(16777216) | The text describing the issue and the name of the Spark reference, when applicable |
| CATEGORY | VARCHAR(16777216) | The classification of each issue. Options: Warning, Conversion Error, Parser Error, Helper, Transformation, WorkAround, NotSupported, NotDefined |
| NODETYPE | VARCHAR(16777216) | The name associated with the syntax node where the issue was found |
| FILEID | VARCHAR(16777216) | File where the Spark reference was found and the relative path to that file |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| LINE | NUMBER(38,0) | The line number in the source file where the issue was found |
| COLUMNLINE | NUMBER(38,0) | The column position in the source file where the issue was found |
| URL | VARCHAR(16777216) | URL to documentation or more info |
| EXECUTIONID | VARCHAR(16777216) | The unique identifier for this execution of the SMA |

JOINSINVENTORIES

Contains an inventory of all DataFrame joins done in the codebase.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | Line number where the join begins (and ends, if not on a single line) |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the Spark reference was found and the relative path to that file |
| COUNT | NUMBER(38,0) | Count of files with that filename |
| ISSELFJOIN | BOOLEAN | TRUE if the join is a self join, FALSE if not |
| HASLEFTALIAS | BOOLEAN | TRUE if the join has a left alias, FALSE if not |
| HASRIGHTALIAS | BOOLEAN | TRUE if the join has a right alias, FALSE if not |
| LINE | NUMBER(38,0) | Line number where the join begins |
| KIND | VARCHAR(16777216) | Join type (INNER, LEFT, RIGHT, etc.) |
| SESSIONID | VARCHAR(16777216) | Unique identifier for each session of the tool |
| CELLID | NUMBER(38,0) | Cell ID where that element was in that FileId (if in a notebook; null otherwise) |
| EXECUTIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |

NOTEBOOKCELLSINVENTORIES

Gives an inventory of all cells in a notebook based on the source code for each cell and the lines of code in that cell.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | Source language (Python, Scala, or SQL) |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the Spark reference was found and the relative path to that file |
| COUNT | NUMBER(38,0) | Count of files with that filename |
| CELLID | NUMBER(38,0) | Cell ID where that element was in that FileId (if in a notebook; null otherwise) |
| ARGUMENTS | VARCHAR(16777216) | Null (this field will be empty) |
| LOC | NUMBER(38,0) | Lines of code in that cell |
| SIZE | NUMBER(38,0) | Count of characters in that cell |
| SUPPORTEDSTATUS | BOOLEAN | TRUE unless the element (source language) is not supported by the SMA tool (FALSE) |
| PARSINGRESULT | VARCHAR(16777216) | "Successful" if the cell was fully parsed; "Error" if it was not parsed |
| EXECUTIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |

NOTEBOOKSIZEINVENTORIES

Lists the size in lines of code of different source languages present in notebook files.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | Filename (for this spreadsheet, it is the same as the FileId) |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the Spark reference was found and the relative path to that file |
| COUNT | NUMBER(38,0) | Count of files with that filename |
| PYTHONLOC | NUMBER(38,0) | Python lines of code present in notebook cells (will be 0 for non-notebook files) |
| SCALALOC | NUMBER(38,0) | Scala lines of code present in notebook cells (will be 0 for non-notebook files) |
| SQLLOC | NUMBER(38,0) | SQL lines of code present in notebook cells (will be 0 for non-notebook files) |
| LINE | VARCHAR(16777216) | Null (this field will be empty) |
| SESSIONID | VARCHAR(16777216) | Unique identifier for each session of the tool |
| EXECUTIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |

PACKAGESINVENTORIES

Tracks package usage in the codebase.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | The name of the package |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the package was found and the relative path to that file |
| COUNT | NUMBER(38,0) | The number of times that element shows up in a single line |
| ALIAS | VARCHAR(16777216) | Package alias if used |
| KIND | VARCHAR(16777216) | Type of package |
| LINE | VARCHAR(16777216) | Line reference |
| PACKAGENAME | VARCHAR(16777216) | Full package name |
| SUPPORTED | VARCHAR(16777216) | Support status |
| AUTOMATED | VARCHAR(16777216) | Automation status |
| STATUS | VARCHAR(16777216) | Migration status |
| STATEMENT | VARCHAR(16777216) | Full import statement |
| SESSIONID | VARCHAR(16777216) | Session identifier |
| SNOWCONVERTCOREVERSION | VARCHAR(16777216) | SnowConvert core version used |
| SNOWPARKVERSION | VARCHAR(16777216) | Target Snowpark version |
| CELLID | VARCHAR(16777216) | Cell identifier (for notebooks) |
| EXECUTIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |
| PARAMETERSINFO | VARCHAR(16777216) | Parameter information |
| TECHNOLOGY | VARCHAR(16777216) | Source technology platform |

PANDASUSAGESINVENTORIES

[Python Only] Lists every reference to the Pandas API present in the scanned codebase.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | The unique name for the actual pandas reference |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the Spark reference was found and the relative path to that file |
| COUNT | NUMBER(38,0) | The number of times that element shows up in a single line |
| ALIAS | VARCHAR(16777216) | The alias of the element (applies just for import elements) |
| KIND | VARCHAR(16777216) | A category for each element. These could include Class, Variable, Function, Import and others |
| LINE | NUMBER(38,0) | The line number in the source files where the element was found |
| PACKAGENAME | VARCHAR(16777216) | The name of the package where the element was found |
| SUPPORTED | BOOLEAN | Whether this reference is "supported" or not. Values: True/False |
| AUTOMATED | BOOLEAN | Whether or not the tool can automatically convert it. Values: True/False |
| STATUS | VARCHAR(16777216) | The categorization of each element. Options: Rename, Direct, Helper, Transformation, WorkAround, NotSupported, NotDefined |
| STATEMENT | VARCHAR(16777216) | How that element was used. (Note: This column is not sent via telemetry) |
| SESSIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |
| SNOWCONVERTCOREVERSION | VARCHAR(16777216) | The version number for the core code process of the tool |
| PANDASVERSION | VARCHAR(16777216) | Version number of the pandas API that was used to identify elements in this codebase |
| CELLID | VARCHAR(16777216) | Cell ID where that element was in that FileId (if in a notebook; null otherwise) |
| EXECUTIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |
| PARAMETERSINFO | VARCHAR(16777216) | Parameter information |
| TECHNOLOGY | VARCHAR(16777216) | Source technology platform |

SPARKUSAGESINVENTORIES

Shows the exact location and usage for each reference to the Spark API. This information is used to build the Readiness Score.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | The unique name for the actual Spark reference |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the Spark reference was found and the relative path to that file |
| COUNT | NUMBER(38,0) | The number of times that element shows up in a single line |
| ALIAS | VARCHAR(16777216) | The alias of the element (applies just for import elements) |
| KIND | VARCHAR(16777216) | A category for each element. These could include Class, Variable, Function, Import and others |
| LINE | NUMBER(38,0) | The line number in the source files where the element was found |
| PACKAGENAME | VARCHAR(16777216) | The name of the package where the element was found |
| SUPPORTED | BOOLEAN | Whether this reference is "supported" or not. Values: True/False |
| AUTOMATED | BOOLEAN | Whether or not the tool can automatically convert it. Values: True/False |
| STATUS | VARCHAR(16777216) | The categorization of each element. Options: Rename, Direct, Helper, Transformation, WorkAround, NotSupported, NotDefined |
| STATEMENT | VARCHAR(16777216) | The code where the element was used. (Note: This column is not sent via telemetry) |
| SESSIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |
| SNOWCONVERTCOREVERSION | VARCHAR(16777216) | The version number for the core code process of the tool |
| SNOWPARKVERSION | VARCHAR(16777216) | The version of Snowpark API available for the specified technology and run of the tool |
| CELLID | NUMBER(38,0) | If this element was found in a notebook file, the numbered location of the cell where this element was in the file |
| EXECUTIONID | VARCHAR(16777216) | The unique identifier for this execution of the SMA |
| PARAMETERSINFO | VARCHAR(16777216) | Parameter information |
| TECHNOLOGY | VARCHAR(16777216) | Source technology platform |

SQLEMBEDDEDUSAGEINVENTORIES

Contains a count of SQL keywords present in SQL Spark elements.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | Name for the code element where the SQL was found (such as SqlFromClause, SqlSelect, SqlSelectBody, SqlSignedNumericLiteral) |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the SQL reference was found and the relative path to that file |
| COUNT | NUMBER(38,0) | The number of times that element shows up in a single line |
| EXECUTIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |
| LIBRARYNAME | VARCHAR(16777216) | Name of the library being used |
| HASLITERAL | BOOLEAN | Indicates whether the element contains literals |
| HASVARIABLE | BOOLEAN | Indicates whether the element contains variables |
| HASFUNCTION | BOOLEAN | Indicates whether the element contains functions |
| PARSINGSTATUS | VARCHAR(16777216) | Indicates the parsing status (such as Success, Failed, Partial) |
| HASINTERPOLATION | BOOLEAN | Indicates whether the element contains interpolations |
| CELLID | NUMBER(38,0) | The notebook cell ID |
| LINE | NUMBER(38,0) | The line number where that element occurs |
| COLUMNLINE | NUMBER(38,0) | The column number where that element occurs |

SQLFUNCTIONSINVENTORIES

Inventories SQL functions used in the code with their categories and migration status.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | SQL function name |
| PROJECTID | VARCHAR(16777216) | Project identifier |
| FILEID | VARCHAR(16777216) | Identifier for the source file |
| COUNT | NUMBER(38,0) | Number of occurrences |
| CATEGORY | VARCHAR(16777216) | Function category |
| MIGRATIONSTATUS | VARCHAR(16777216) | Migration status for this function |
| CELLID | NUMBER(38,0) | Cell identifier (for notebooks) |
| LINE | NUMBER(38,0) | Line number in the source file |
| COLUMNLINE | NUMBER(38,0) | Column position in the line |
| EXECUTIONID | VARCHAR(16777216) | Unique identifier for the execution run |
EXECUTIONIDVARCHAR(16777216)Unique identifier for the execution run

THIRDPARTYUSAGESINVENTORIES

Tracks third-party library and package usage found during code analysis.

| Column | Type | Description |
|---|---|---|
| ELEMENT | VARCHAR(16777216) | The unique name for the third-party reference |
| PROJECTID | VARCHAR(16777216) | Name of the project (root directory the tool was run on) |
| FILEID | VARCHAR(16777216) | File where the Spark reference was found and the relative path to that file |
| COUNT | NUMBER(38,0) | The number of times that element shows up in a single line |
| ALIAS | VARCHAR(16777216) | The alias of the element (if any) |
| KIND | VARCHAR(16777216) | Categorization of the element such as variable, type, function, or class |
| LINE | NUMBER(38,0) | The line number in the source files where the element was found |
| PACKAGENAME | VARCHAR(16777216) | Package name for the element (concatenation of ProjectId and FileId in Python) |
| STATEMENT | VARCHAR(16777216) | The code where the element was used. (Note: This column is not sent via telemetry) |
| SESSIONID | VARCHAR(16777216) | Unique identifier for each session of the tool |
| CELLID | NUMBER(38,0) | Cell ID where that element was in that FileId (if in a notebook; null otherwise) |
| EXECUTIONID | VARCHAR(16777216) | Unique identifier for each execution of the tool |
| PARAMETERSINFO | VARCHAR(16777216) | Parameter information |

TOOLEXECUTIONS

Tracks tool execution metadata including timing, results, and version information.

| Column | Type | Description |
|---|---|---|
| EXECUTIONID | VARCHAR(16777216) | Unique identifier for each run of the tool |
| TOOLNAME | VARCHAR(16777216) | The name of the tool. Values: PythonSnowConvert, SparkSnowConvert (Scala tool) |
| TOOLVERSION | VARCHAR(16777216) | The version number of the tool |
| ASSEMBLYNAME | VARCHAR(16777216) | The name of the code processor (essentially, a longer version of the ToolName) |
| LOGFILE | VARCHAR(16777216) | Whether a log file was sent on an exception/failure |
| FINALRESULT | VARCHAR(16777216) | Where the tool stopped if there was an exception/failure |
| EXCEPTIONREPORT | VARCHAR(16777216) | If an exception report was sent on an exception/failure |
| STARTTIME | NUMBER(38,0) | The timestamp for when the tool started executing |
| ENDTIME | NUMBER(38,0) | The timestamp for when the tool stopped executing |
| SYSTEMNAME | VARCHAR(16777216) | The serial number of the machine where the tool was executing (this is only used for troubleshooting and license validation purposes) |

Assessment files to table mapping

The following CSV files from assessment exports map to their corresponding database tables:

| CSV File | Database Table |
| --- | --- |
| ArtifactDependencyInventory.csv | ARTIFACTDEPENDENCYINVENTORIES |
| CheckpointsInventory.csv | CHECKPOINTSINVENTORIES |
| DataFramesInventory.csv | DATAFRAMESINVENTORIES |
| DbxElementsInventory.csv | DBXELEMENTSINVENTORIES |
| ExecutionFlowInventory.csv | EXECUTIONFLOWINVENTORIES |
| ImportUsagesInventory.csv | IMPORTUSAGESINVENTORIES |
| InputFilesInventory.csv | INPUTFILESINVENTORIES |
| IOFilesInventory.csv | IOFILESINVENTORIES |
| Issues.csv | ISSUES |
| JoinsInventory.csv | JOINSINVENTORIES |
| NotebookCellsInventory.csv | NOTEBOOKCELLSINVENTORIES |
| NotebookSizeInventory.csv | NOTEBOOKSIZEINVENTORIES |
| PackagesInventory.csv | PACKAGESINVENTORIES |
| PandasUsagesInventory.csv | PANDASUSAGESINVENTORIES |
| SparkUsagesInventory.csv | SPARKUSAGESINVENTORIES |
| SqlEmbeddedUsageInventory.csv | SQLEMBEDDEDUSAGEINVENTORIES |
| SqlFunctionsInventory.csv | SQLFUNCTIONSINVENTORIES |
| ThirdPartyUsagesInventory.csv | THIRDPARTYUSAGESINVENTORIES |
| tool_execution.csv | TOOLEXECUTIONS |
| DetailedReport.docx | DETAILEDREPORTS |

Schema notes

  • The DOCUMENTATION_METADATA table uses vector embeddings (768 dimensions) for semantic search.
  • Tables are created with CREATE OR REPLACE to ensure clean setup on each initialization.
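
To make the notes above concrete, the documentation table might be created along these lines. Only the 768-dimension vector column and the CREATE OR REPLACE behavior are stated in this document; the column names are illustrative:

```sql
-- Hypothetical shape of the DOCUMENTATION_METADATA table. The EMBEDDING
-- column's VECTOR(FLOAT, 768) type matches the stated 768 dimensions;
-- the other columns are assumptions for illustration.
CREATE OR REPLACE TABLE DOCUMENTATION_METADATA (
    DOC_ID    VARCHAR,
    CONTENT   VARCHAR,
    EMBEDDING VECTOR(FLOAT, 768)  -- used for semantic (cosine) search
);
```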

Stored procedures

The assistant creates several stored procedures in your Snowflake account to power the natural language query processing capabilities. These procedures leverage Snowflake Cortex AI to interpret questions, generate SQL, and format responses.

GET_CHATBOT_RESPONSE

The primary stored procedure that processes natural language questions and returns human-readable answers using Snowflake Cortex AI.

Signature

```sql
GET_CHATBOT_RESPONSE(IA_MODEL VARCHAR, QUESTION VARCHAR, EXECUTION_ID VARCHAR)
RETURNS VARCHAR
```

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| IA_MODEL | VARCHAR | The Snowflake Cortex AI model to use (such as 'llama3.1-70b' or 'mistral-large2') |
| QUESTION | VARCHAR | The natural language question from the user |
| EXECUTION_ID | VARCHAR | The execution ID to filter data to a specific assessment run |

Returns

VARCHAR - A human-readable answer to the user’s question

Workflow

The procedure implements a multi-step process:

  1. Context Retrieval (RAG)

    • Performs vector similarity search on DOCUMENTATION_METADATA table.
    • Uses SNOWFLAKE.CORTEX.EMBED_TEXT_768 with the ‘e5-base-v2’ model.
    • Retrieves the most relevant documentation using VECTOR_COSINE_SIMILARITY.
  2. SQL Generation

    • Constructs a prompt with the retrieved context and user question.
    • Calls SNOWFLAKE.CORTEX.COMPLETE to generate a SQL query.
    • The generated query is scoped to the specific EXECUTION_ID.
  3. SQL Validation (Self-Correction Loop 1)

    • Validates whether the generated SQL logically answers the question.
    • Uses Snowflake Cortex to perform a yes/no validation check.
  4. Fallback Path (if SQL is invalid)

    • Retrieves full report context from DETAILEDREPORTS table.
    • Attempts to answer using the detailed report as context.
    • Performs another validation check on the answer quality.
    • Falls back to general knowledge if context-based answer is insufficient.
  5. SQL Execution (if SQL is valid)

    • Executes the generated SQL query dynamically.
    • Stores results in a temporary table for processing.
    • Handles single results, multiple results (aggregates up to 10), and empty results.
  6. Response Formatting

    • Creates a final prompt combining the question and query results.
    • Calls Snowflake Cortex again to format the answer as a friendly, natural sentence.
    • Returns the human-readable response.

Error Handling

  • Returns "I could not find any data that matched your request." when query returns zero rows.
  • Automatically falls back through multiple strategies to ensure a meaningful answer.

Example

```sql
CALL GET_CHATBOT_RESPONSE(
    'llama3.1-70b',
    'How many Python files were analyzed?',
    'ABC123XYZ'
);
```

GET_CURRENT_REGION_DETAILS

Retrieves information about the current Snowflake account’s region, including cloud provider and region display name.

Signature

```sql
GET_CURRENT_REGION_DETAILS()
RETURNS TABLE (
    CLOUD_PROVIDER VARCHAR,
    CLOUD_REGION_NAME VARCHAR,
    CLOUD_REGION_DISPLAY_NAME VARCHAR
)
```

Parameters

None

Returns

A table with three columns:

| Column | Type | Description |
| --- | --- | --- |
| CLOUD_PROVIDER | VARCHAR | The cloud provider (such as 'AWS', 'Azure', or 'GCP') |
| CLOUD_REGION_NAME | VARCHAR | The technical region identifier (such as 'us-east-1') |
| CLOUD_REGION_DISPLAY_NAME | VARCHAR | The human-readable region name (such as 'US East (N. Virginia)') |

Workflow

  1. Executes SHOW REGIONS to populate the result set cache.
  2. Queries the result using RESULT_SCAN(LAST_QUERY_ID()).
  3. Filters to only the region matching CURRENT_REGION().
  4. Returns the filtered result as a table.
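
The workflow above amounts to the following pattern when written as plain SQL. The quoted column names are assumptions based on the typical SHOW REGIONS output, not taken from this document:

```sql
-- Populate the result cache, then filter it to the current region.
SHOW REGIONS;
SELECT "cloud", "region", "display_name"
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE "snowflake_region" = CURRENT_REGION();
```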

Purpose

This procedure is used to determine which Snowflake Cortex AI models are available, as model availability varies by region. The assistant uses this information to populate the LLM selection dropdown with region-specific options.

Example

```sql
CALL GET_CURRENT_REGION_DETAILS();
```

Output:

| CLOUD_PROVIDER | CLOUD_REGION_NAME | CLOUD_REGION_DISPLAY_NAME |
| --- | --- | --- |
| AWS | us-west-2 | US West (Oregon) |

Stored procedure notes

  • Both procedures are created with CREATE OR REPLACE to ensure clean setup.
  • GET_CHATBOT_RESPONSE executes as OWNER to access the necessary tables and Snowflake Cortex functions.
  • The procedures use Snowflake’s dynamic SQL capabilities (EXECUTE IMMEDIATE) for flexible query execution.
  • Vector embeddings use the ‘e5-base-v2’ model with 768 dimensions for semantic search.
  • The multi-step validation process ensures high-quality, contextually relevant answers.
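
The dynamic-SQL pattern mentioned above can be illustrated with a small Snowflake Scripting fragment. The query text and variable names here are stand-ins, not the procedures' actual code:

```sql
-- Illustrative anonymous block showing EXECUTE IMMEDIATE with a bind variable.
DECLARE
    exec_id VARCHAR DEFAULT 'ABC123XYZ';  -- hypothetical execution ID
    stmt VARCHAR DEFAULT
        'SELECT COUNT(*) FROM INPUTFILESINVENTORIES WHERE EXECUTIONID = ?';
    res RESULTSET;
BEGIN
    res := (EXECUTE IMMEDIATE :stmt USING (exec_id));
    RETURN TABLE(res);
END;
```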

Troubleshooting

Resetting the assistant infrastructure

If you need to re-create the assistant infrastructure from scratch, you must first delete the local configuration file.

Configuration file location

  • macOS/Linux: ~/.smachatbot/config.json
  • Windows: %USERPROFILE%\.smachatbot\config.json

This JSON configuration file contains a composite key consisting of:

  • snowflake_identifier
  • snowflake_user
  • snowflake_role
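
For reference, a config.json holding that composite key might look like the following. The values are placeholders, and the actual file may contain additional fields:

```json
{
  "snowflake_identifier": "myorg-myaccount",
  "snowflake_user": "MY_USER",
  "snowflake_role": "MY_ROLE"
}
```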

To reset the assistant:

  1. Delete the config.json file from the appropriate directory:

    • macOS/Linux: ~/.smachatbot/config.json
    • Windows: %USERPROFILE%\.smachatbot\config.json
  2. Re-run the assistant initialization process

    Note

    Deleting this file will remove all stored connection settings and require you to reconfigure the assistant with your Snowflake credentials.

Known issues / FAQs

Snowflake connection caching with VPN

Issue

When testing the connection to Snowflake while a VPN is required but not yet connected, the Snowflake driver caches the failed connection state. Even after successfully connecting to the VPN, subsequent connection attempts may still fail due to this cached state.

Workaround

If you experience connection failures after connecting to your VPN:

  1. Close and restart the SMA application.

  2. Attempt the connection test again with the VPN active from the start.

    Tip

    Always ensure your VPN is connected before initiating any Snowflake connection tests to avoid this caching behavior.