Tutorial: Snowflake Native SDK for Connectors Java connector template¶
Introduction¶
Welcome to our tutorial on using the connector template built on the Snowflake Native SDK for Connectors. This guide will help you set up a simple Connector Native Application.
In this tutorial you will learn how to:
- Deploy a Connector Native Application
- Configure a template connector to ingest data
- Customize a template connector to your own needs
The template contains various helpful comments in the code to make it easier for you to find the specific files that need to be modified. Look for comments with the following keywords. They will guide you and help you implement your own connector:
- TODO
- TODO: HINT
- TODO: IMPLEMENT ME
Before you begin this tutorial, you should prepare yourself by reviewing the following recommended content:
- Snowflake Native SDK for Connectors
- Tutorial: Snowflake Native SDK for Connectors example Java connector
Prerequisites¶
Before getting started please make sure that you meet the following requirements:
- Access to a Snowflake account with an ACCOUNTADMIN role
- Review Snowflake Native SDK for Connectors and keep it open while following this tutorial
- Review Tutorial: Snowflake Native SDK for Connectors example Java connector. That tutorial uses an example connector based on this template, and it can be referenced to check out example implementations of various components.
Prepare your local environment¶
Before proceeding you need to make sure all necessary software is installed on your machine and clone the connector template.
Java installation¶
Snowflake Native SDK for Connectors requires Java LTS (Long-Term Support) version 11 or higher. If the minimum required version of Java is not installed on your machine, you must install either Oracle Java or OpenJDK.
Oracle Java¶
The latest LTS release of the JDK is free to download and use under the Oracle No-Fee Terms and Conditions (NFTC). For download and installation instructions, go to the Oracle page.
OpenJDK¶
OpenJDK is an open-source implementation of Java. For download and installation instructions go to openjdk.org and jdk.java.net.
You may also use a 3rd party OpenJDK version, such as Eclipse Temurin or Amazon Corretto.
Snowflake CLI configuration¶
The Snowflake CLI tool is required to build, deploy and install the connector. If Snowflake CLI is not installed on your machine, install it by following the instructions in Installing Snowflake CLI.
After the tool is installed, you need to configure a connection to Snowflake in your configuration file.
If you do not have any connections configured, create a new one named native_sdk_connection. You
can find an example connection in the deployment/snowflake.toml file.
If you already have a connection configured and would like to use it with the connector, use its
name instead of native_sdk_connection wherever this connection is used in this tutorial.
Template cloning¶
To clone the connector template use the following command:
In place of <project_dir> enter the name of the directory (it must not exist) in which the Java
project of your connector will be created.
After executing the command you will be asked to provide additional information for the application instance and stage name configuration. You may provide any names, as long as they are valid unquoted Snowflake identifiers, or press Enter to use the default values, which are shown in square brackets.
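If you want to check candidate names before running the command, the rule for unquoted identifiers can be sketched in Python as below. The helper name is hypothetical and not part of the template or the Snowflake CLI. The check covers only the character and length rules, not reserved keywords.

```python
import re

# Unquoted Snowflake identifiers start with a letter or underscore and continue
# with letters, digits, underscores, or dollar signs (255 characters at most).
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]{0,254}")


def is_valid_unquoted_identifier(name: str) -> bool:
    """Return True if `name` follows the unquoted identifier character rules."""
    return _UNQUOTED_IDENTIFIER.fullmatch(name) is not None
```

For instance, a name such as MY_CONNECTOR_APP passes this check, while a name containing a hyphen or starting with a digit does not.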
An example command execution, providing custom application and stage names:
Connector build, deployment, and cleanup¶
The template can be deployed out of the box, even before any modification. The following sections will show you how to build, deploy and install the connector.
Build the connector¶
Building a connector created using Snowflake Native SDK for Connectors is a bit different from building a typical Java application. There are some things which must be done besides just building the .jar archives from the sources. Building the application consists of the following steps:
- Copying custom internal components to the build directory
- Copying SDK components to the build directory
Copy internal components¶
This step builds the connector .jar file and then copies it (along with the UI, manifest and setup
files) to the sf_build directory.
To run this step execute the command: ./gradlew copyInternalComponents.
Copy SDK components¶
This step copies the SDK .jar file (added as a dependency to the connector Gradle module) to the
sf_build directory and extracts bundled .sql files from the .jar archive.
Those .sql files allow you to customize which of the provided objects will be created during the
application installation. For first-time users customization is not recommended, because omitting
objects may cause some features to fail if done incorrectly. The template connector application uses
the all.sql file, which creates all recommended SDK objects.
To run this step execute the command: ./gradlew copySdkComponents.
Deploy the connector¶
To deploy a Native App an application package needs to be created inside Snowflake. After that all
the files from the sf_build directory need to be uploaded to Snowflake.
Please note that for development purposes version creation is optional: an application instance can be created directly from staged files. This approach allows you to see changes in most of the connector files without recreating the version and application instance.
The following operations will be performed:
- Create a new application package, if it does not already exist
- Create a schema and file stage inside the package
- Upload files from the sf_build directory to the stage (this step may take some time)
To deploy the connector execute the command: snow app deploy --connection=native_sdk_connection.
For more information about the snow app deploy command see snow app deploy.
The created application package will now be visible in the App packages tab, in the
Data products category, in the Snowflake UI of your account.
Install the connector¶
Installation of the application is the last step of the process. It creates an application from the application package created previously.
To install the connector execute the command: snow app run --connection=native_sdk_connection.
For more information about the snow app run command see snow app run.
The installed application will now be visible in the Installed apps tab, in the
Data products category, in the Snowflake UI of your account.
Update connector files¶
If at any point you wish to modify any of the connector files, you can easily upload the modified files into the application package stage. The upload command depends on which files were updated.
Before any of the update commands are run, you have to copy the new files of your connector to the
sf_build directory by running: ./gradlew copyInternalComponents
UI .py files or connector .java files¶
Use the snow app deploy --connection=native_sdk_connection command. The current application
instance will use the new files without reinstallation.
setup.sql or manifest.yml files¶
Use the snow app run --connection=native_sdk_connection command. The current application
instance will be reinstalled after the new files are uploaded to the stage.
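The choice between the two update commands can be summarized in a small Python sketch. The helper below is hypothetical and not part of the template. It only returns the commands described in this section, in the order they should be run, and assumes the connection is named native_sdk_connection.

```python
from pathlib import PurePath

# Files whose modification requires reinstalling the application instance.
REINSTALL_FILES = {"setup.sql", "manifest.yml"}
CONNECTION = "native_sdk_connection"


def update_commands(changed_files):
    """Return the commands to run, in order, after the given files were modified."""
    if not changed_files:
        return []
    names = {PurePath(path).name for path in changed_files}
    # `snow app run` uploads the files and reinstalls; `snow app deploy` only uploads.
    snow_command = "run" if names & REINSTALL_FILES else "deploy"
    return [
        "./gradlew copyInternalComponents",
        f"snow app {snow_command} --connection={CONNECTION}",
    ]
```

If both kinds of files were changed, the sketch picks snow app run, because that command also uploads the new files before reinstalling.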
Cleanup¶
After the tutorial is completed, or if for any reason you want to remove the application and its package, you can completely remove them from your account using the command:
snow app teardown --connection=native_sdk_connection --cascade --force
The --cascade option is needed to remove the destination database without transferring the ownership
to the account admin. In real connectors the database should not be removed, so that the ingested
data is preserved. It should either be owned by the account admin, or its ownership should be
transferred before uninstallation.
Please note that the connector will consume credits until it is paused or removed, even if no ingestion was configured!
Prerequisites step¶
Right after installation the Connector is in its Wizard phase. This phase consists of a few steps that guide the end user through all the necessary configurations.
The first step is the Prerequisites step. It is optional and might not be necessary for every connector. Prerequisites are usually actions required from the user outside of the application, e.g. running queries in the SQL worksheet, doing configuration on the source system side, etc.
Read more about prerequisites: Prerequisites
The contents of each prerequisite are retrieved directly from the STATE.PREREQUISITES table,
located inside the connector. They can be customized through the setup.sql script. However, keep
in mind that the setup.sql script is executed on every installation, upgrade and downgrade of the
application. Because of this the inserts must be idempotent, so it is recommended to use a merge query
as in the example below:
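The shape of such an idempotent merge query can be sketched with a small Python helper that renders the statement text. This is an illustration only: in the template the statement lives directly in setup.sql, the helper names are hypothetical, and the column names passed in (for example id, title, description, position) are assumptions that should be checked against the actual STATE.PREREQUISITES table.

```python
def sql_literal(value):
    """Render a Python value as a SQL literal (None becomes NULL)."""
    if value is None:
        return "NULL"
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def build_prerequisite_merge(rows, columns):
    """Build a MERGE that only inserts prerequisites whose id is not present yet."""
    column_list = ", ".join(columns)
    values = ",\n    ".join(
        "(" + ", ".join(sql_literal(row[c]) for c in columns) + ")" for row in rows
    )
    inserted = ", ".join(f"src.{c}" for c in columns)
    return (
        "MERGE INTO STATE.PREREQUISITES AS dest\n"
        f"USING (SELECT * FROM VALUES\n    {values}\n) AS src ({column_list})\n"
        "ON dest.id = src.id\n"
        "WHEN NOT MATCHED THEN\n"
        f"    INSERT ({column_list})\n"
        f"    VALUES ({inserted});"
    )
```

Because the statement contains only a WHEN NOT MATCHED branch, running it again on upgrade or downgrade leaves the existing rows untouched.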
Connector configuration step¶
The next step of the Wizard Phase is the connector configuration step. During this step you can configure database objects and permissions required by the connector. This step allows for the following configuration properties to be specified:
- warehouse
- operational_warehouse
- cortex_warehouse
- destination_database
- destination_schema
- global_schedule
- data_owner_role
- cortex_user_role
- agent_username
- agent_role
If you need any other custom properties, they can be configured in one of the next steps of the Wizard phase. For more information on each of the properties see: Connector configuration
Additionally, the Streamlit component (streamlit/wizard/connector_config.py) provided in the
template shows how to trigger the Native Apps Permission SDK
and request privilege grants from the end user. As long as the available properties satisfy the
needs of the connector, there is no need to overwrite any of the backend classes. Overwriting them
is still possible, in the same way as for the components in the further steps of the configuration.
For more information on internal procedures and Java objects see: Connector configuration reference
The provided Streamlit example allows for requesting the account level privileges configured in the
manifest.yml file: CREATE DATABASE and EXECUTE TASK. It also allows the user to specify
a warehouse reference through the Permission SDK popup.
In the template, the user is asked to provide only the destination_database and destination_schema.
However, a TODO comment in streamlit/wizard/connector_config.py contains commented
code that can be reused to display more input fields in the Streamlit UI.
Connection configuration step¶
The next step of the Wizard Phase is the connection configuration step. This step allows the end-user to configure external connectivity parameters for the connector. This configuration may include identifiers of objects like secrets, integrations, etc.
Because this information varies depending on the source system for the data ingested by the connector, this is the first place where bigger customizations have to be made in the source code.
For more information on connection configuration see:
Starting with the Streamlit UI side (streamlit/wizard/connection_config.py) you need to add text
inputs for all needed parameters. An example text input is implemented for you and if you search the
code in this file, you can find a TODO with commented code for a new field.
After the properties are added to the form, they need to be passed to the backend layer of the connector.
To do so, two additional places must be modified in the Streamlit files. The first one is the
finish_config function in the streamlit/wizard/connection_config.py file. The state of the
newly added text inputs must be read here. Additionally, it can be validated if needed, and then passed
to the set_connection_configuration function.
For example if additional_connection_property was added it would look like this after the edits:
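A minimal sketch of such an edited function is shown below, written so that it runs without Streamlit. Several things here are assumptions: session_state stands in for st.session_state, custom_connection_property is a hypothetical name for the example input that already exists in the template, the stand-in set_connection_configuration replaces the real proxy function, and the response dictionaries are illustrative only.

```python
def set_connection_configuration(custom_connection_property, additional_connection_property):
    """Stand-in for the proxy in streamlit/native_sdk_api/connection_config.py."""
    return {
        "response_code": "OK",
        "config": {
            "custom_connection_property": custom_connection_property,
            "additional_connection_property": additional_connection_property,
        },
    }


def finish_config(session_state):
    """Read the text-input state, validate it, and pass it to the backend proxy."""
    custom = session_state.get("custom_connection_property", "").strip()
    additional = session_state.get("additional_connection_property", "").strip()

    # Optional validation: reject blank values before calling the backend.
    if not custom or not additional:
        return {"response_code": "INVALID_INPUT", "message": "All properties are required."}

    return set_connection_configuration(
        custom_connection_property=custom,
        additional_connection_property=additional,
    )
```

In the real Streamlit file the error case would typically be reported in the UI rather than returned.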
Then the set_connection_configuration function must be edited. It can be found in the
streamlit/native_sdk_api/connection_config.py file. This function is a proxy between the Streamlit UI
and the underlying SQL procedure, which is an entry point to the backend of the connector.
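The proxy role of this function can be sketched as follows. The call_procedure helper is a stand-in defined here so the example runs on its own, the procedure name PUBLIC.SET_CONNECTION_CONFIGURATION and the property names are assumptions, and the JSON serialization only makes the argument easy to inspect in this sketch.

```python
import json


def call_procedure(name, arguments):
    """Stand-in for a template helper that calls a SQL procedure through the session."""
    return {"procedure": name, "arguments": arguments}


def set_connection_configuration(custom_connection_property, additional_connection_property):
    """Collect all connection properties in one config object and send it to the backend."""
    config = {
        "custom_connection_property": custom_connection_property,
        "additional_connection_property": additional_connection_property,
    }
    # The whole configuration is passed as a single argument, so a new property
    # only has to be added to the dictionary above.
    return call_procedure("PUBLIC.SET_CONNECTION_CONFIGURATION", [json.dumps(config)])
```

Keeping every property in one configuration object means the procedure signature does not change when more properties are added later.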
After doing this, the new property is saved in the internal connector table, which contains the configuration. However, this is not the end of the possible customizations. Some backend components can be customized too. Look for the following comments in the code to find them:
- TODO: IMPLEMENT ME connection configuration validate
- TODO: IMPLEMENT ME connection callback
- TODO: IMPLEMENT ME test connection
The validate part allows for any additional validation on the data received from the UI. It can also transform the data, e.g. change the character case, trim the provided data, or check if objects with provided names actually exist inside Snowflake.
Connection callback is a part that lets you perform any additional operation based on the config, e.g. alter procedures that need to use external access integrations, using a solution described in External integration setup reference.
Test connection is the final component of the connection configuration. It checks whether the connection can be established between the connector and the source system.
For more information on those internal components see:
Example implementations might look like this:
Finalize configuration step¶
The finalize connector configuration step is the final step of the Wizard Phase. This step has multiple responsibilities:
- Allows the user to specify any additional configuration needed by the connector
- Creates the sink database, schema and additional tables and views for the ingested data if needed
- Initializes internal components such as the scheduler and task reactor
For more information on configuration finalization see:
For more information on task reactor and scheduling see:
Similarly to the connection configuration step, customization can be started with the Streamlit UI.
The streamlit/wizard/finalize_config.py file contains a form with an example property. More
properties can be added according to the connector needs. To add another property, look in that file
for a TODO comment that contains example code for adding a new property.
After adding the text input for a new property it needs to be passed to the backend side. To do so,
modify the finalize_configuration function in the same file:
Next, open the streamlit/native_sdk_api/finalize_config.py file and add the new property to the
following function:
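Both edits can be sketched together in plain Python. In this sketch session_state stands in for st.session_state, call_procedure is a stand-in helper, and the names finalize_connector_configuration, custom_property, some_new_property and PUBLIC.FINALIZE_CONNECTOR_CONFIGURATION are assumptions to be checked against the template files.

```python
def call_procedure(name, arguments):
    """Stand-in for a template helper that executes a SQL procedure."""
    return {"response_code": "OK", "procedure": name, "arguments": arguments}


def finalize_connector_configuration(custom_property, some_new_property):
    """API layer: bundle the custom properties and send them to the backend."""
    config = {
        "custom_property": custom_property,
        "some_new_property": some_new_property,
    }
    return call_procedure("PUBLIC.FINALIZE_CONNECTOR_CONFIGURATION", [config])


def finalize_configuration(session_state):
    """UI layer: read the state of the form inputs and pass it to the API layer."""
    return finalize_connector_configuration(
        custom_property=session_state.get("custom_property"),
        some_new_property=session_state.get("some_new_property"),
    )
```

A new property therefore has to be added in two places: where the form state is read, and where the configuration object is built.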
Again, similarly to the connection configuration step, this step also allows for the customization of various backend components. They can be found using the following comments in the source code:
- TODO: IMPLEMENT ME validate source
- TODO: IMPLEMENT ME finalize internal
The validate source part is responsible for performing more sophisticated validations on the source systems. If the previous test connection only checked that a connection can be established, then validate source could check access to specific data in the system, e.g. extracting a single record of data.
Finalize internal is an internal procedure responsible for initializing task reactor and scheduler, creating a sink database and any necessary nested objects. It can also be used to save the configuration provided during the finalize step (this configuration is not saved by default).
More information on the internal components can be found in:
Additionally, input can be validated by implementing the FinalizeConnectorInputValidator interface and
providing it to the finalize handler. Check the TemplateFinalizeConnectorConfigurationCustomHandler file.
More information on using builders can be found in: Stored procedures and handlers customization.
Example implementation of the validate source might look like this:
Create resources¶
After the Wizard Phase is completed, the connector is ready to start ingesting data. But first, the resources must be implemented and configured. A resource is an abstraction describing a specific set of data in the source system, e.g. a table, an endpoint, a file, etc.
Different source systems might need different information about a resource - for that reason a resource
definition needs to be customized according to the specific needs. To do so, go to the streamlit/daily_use/data_sync_page.py
file. There you can find a TODO comment about adding text inputs for resource parameters. The
resource parameters should allow for the identification and retrieval of data from the source system.
Those parameters can then be extracted during the ingestion.
Once all necessary properties are added to the form, they can be passed to the backend side.
First, the state of the text fields has to be extracted and passed to the API level queue_resource
method in the streamlit/daily_use/data_sync_page.py file:
Then the create_resource function from the streamlit/native_sdk_api/resource_management.py file
needs to be updated:
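The two functions can be sketched together as below, without Streamlit so that the example is self-contained. Everything beyond the function and file names mentioned above is an assumption: session_state stands in for st.session_state, call_procedure is a stand-in helper, some_resource_property is a hypothetical parameter, and the argument order and ingestion configuration fields passed to PUBLIC.CREATE_RESOURCE should be verified against the SDK reference.

```python
import uuid


def call_procedure(name, arguments):
    """Stand-in for a template helper that executes a SQL procedure."""
    return {"response_code": "OK", "procedure": name, "arguments": arguments}


def create_resource(resource_name, some_resource_property):
    """API layer: build the resource definition and send it to the backend."""
    ingestion_config = [{
        "id": "ingestionConfig",
        "ingestionStrategy": "INCREMENTAL",
        "scheduleType": "INTERVAL",
        "scheduleDefinition": "60m",
    }]
    # Identifies the table, endpoint, file, etc. in the source system.
    resource_id = {"resource_name": resource_name}
    # Additional parameters go into the metadata, to be read during ingestion.
    resource_metadata = {"some_resource_property": some_resource_property}
    definition_id = f"{resource_name}_{uuid.uuid4().hex[:8]}"
    return call_procedure(
        "PUBLIC.CREATE_RESOURCE",
        [resource_name, resource_id, ingestion_config, definition_id, True, resource_metadata],
    )


def queue_resource(session_state):
    """UI layer: read the form state, validate it, and call the API layer."""
    resource_name = (session_state.get("resource_name") or "").strip()
    if not resource_name:
        return {"response_code": "INVALID_INPUT", "message": "Resource name cannot be empty"}
    return create_resource(resource_name, session_state.get("some_resource_property"))
```

The random suffix in the definition id is one way to keep ids unique when several resources share a name; a connector may prefer a deterministic id instead.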
Customizing CREATE_RESOURCE() procedure logic¶
The PUBLIC.CREATE_RESOURCE() procedure allows the developer to customize its execution by implementing
custom logic that is plugged into several places of the main execution flow. The SDK allows the developer to:
- Validate the resource before it’s created. The logic should be implemented in the PUBLIC.CREATE_RESOURCE_VALIDATE() procedure.
- Perform custom operations before the resource is created. The logic should be implemented in the PUBLIC.PRE_CREATE_RESOURCE() procedure.
- Perform custom operations after the resource is created. The logic should be implemented in the PUBLIC.POST_CREATE_RESOURCE() procedure.
More information about PUBLIC.CREATE_RESOURCE() procedure customization can be found here:
TemplateCreateResourceHandler.java¶
This class is a handler for the PUBLIC.CREATE_RESOURCE() procedure. Here, you can inject the Java
implementations of handlers for the callback procedures mentioned before. By default the template provides
mocked Java implementations of the callback handlers, so that the SQL callback procedures do not have to
be called, which would extend the procedure execution time. Java implementations make the execution faster. These
mocked implementations do nothing apart from returning a success response. You can either provide a
custom implementation in the callback classes prepared by the template, or create these callbacks
from scratch and inject them into the main procedure execution flow in the handler builder.
In order to implement the custom logic of callback methods that are called by default, look for the following comments in the code:
- TODO: IMPLEMENT ME create resource validate
- TODO: IMPLEMENT ME pre create resource callback
- TODO: IMPLEMENT ME post create resource callback
Ingestion¶
To perform ingestion of data you need to implement a class that will handle the connection with the source system and retrieve data based on the resource configuration. Scheduler and Task Reactor modules will take care of triggering and queueing of the ingestion tasks.
Ingestion logic is invoked from the TemplateIngestion class. Look for the TODO: IMPLEMENT ME ingestion
comment in the code and replace the random data generation with the data retrieval from the source system.
If you added custom properties to the resource definition, they can be fetched from the internal
connector tables using the ResourceIngestionDefinitionRepository and the properties available in the
TemplateWorkItem:
- resourceIngestionDefinitionId
- ingestionConfigurationId
Example of retrieving data from a webservice might look like this:
Manage resources lifecycle¶
Once the logic of creating resources and their ingestion is implemented, you can manage their lifecycle by calling the following procedures:
- PUBLIC.ENABLE_RESOURCE() enables a particular resource, meaning that it will be scheduled for ingestion
- PUBLIC.DISABLE_RESOURCE() disables a particular resource, meaning that its ingestion scheduling will be stopped
- PUBLIC.UPDATE_RESOURCE() allows you to update the ingestion configurations of a particular resource. It isn’t implemented in the Streamlit UI by default because sometimes it may be undesirable for the developer to allow the connector user to customize the ingestion configuration (revoke the grants on this procedure from the application role ADMIN in order to disallow its usage completely).
All these procedures have Java handlers and are extended with callbacks that allow you to customize their execution. You can inject custom implementations of callbacks using the builders for these handlers. By default the template provides mocked Java implementations of callback handlers. These mocked implementations do nothing apart from returning a success response. You can either provide the custom implementation to the callback classes prepared by the template or create these callbacks from scratch and inject them to the main procedure execution flow in the handler builders.
TemplateEnableResourceHandler.java¶
This class is a handler for the PUBLIC.ENABLE_RESOURCE() procedure, which can be extended with
the callbacks that are dedicated to:
- Validate the resource before it’s enabled. Look for the TODO: IMPLEMENT ME enable resource validate comment in the code to provide the custom implementation.
- Perform custom operations before the resource is enabled. Look for the TODO: IMPLEMENT ME pre enable resource comment in the code to provide the custom implementation.
- Perform custom operations after the resource is enabled. Look for the TODO: IMPLEMENT ME post enable resource comment in the code to provide the custom implementation.
Learn more from the PUBLIC.ENABLE_RESOURCE() procedure detailed documentation:
TemplateDisableResourceHandler.java¶
This class is a handler for the PUBLIC.DISABLE_RESOURCE() procedure, which can be extended with the callbacks that are
dedicated to:
- Validate the resource before it’s disabled. Look for the TODO: IMPLEMENT ME disable resource validate comment in the code to provide the custom implementation.
- Perform custom operations before the resource is disabled. Look for the TODO: IMPLEMENT ME pre disable resource comment in the code to provide the custom implementation.
Learn more from the PUBLIC.DISABLE_RESOURCE() procedure detailed documentation:
TemplateUpdateResourceHandler.java¶
This class is a handler for the PUBLIC.UPDATE_RESOURCE() procedure, which can be extended with
the callbacks that are dedicated to:
- Validate the resource before it’s updated. Look for the TODO: IMPLEMENT ME update resource validate comment in the code to provide the custom implementation.
- Perform custom operations before the resource is updated. Look for the TODO: IMPLEMENT ME pre update resource comment in the code to provide the custom implementation.
- Perform custom operations after the resource is updated. Look for the TODO: IMPLEMENT ME post update resource comment in the code to provide the custom implementation.
Learn more from the PUBLIC.UPDATE_RESOURCE() procedure detailed documentation:
Settings¶
The template contains a settings tab that lets you view all the configuration made previously.
However, if configuration properties were customized, this view needs to be customized as well.
Settings tab code can be found in the streamlit/daily_use/settings_page.py file.
To customize it, simply extract the values from the configuration for the keys that were added in
the respective configurations. For example, if earlier additional_connection_property was added
in the connection configuration step, then it could be added in the settings view like this:
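A minimal sketch of that extraction is shown below. The function name and the custom_connection_property key are hypothetical, and connection_config is assumed to be a dictionary holding the saved connection configuration. How the configuration is loaded is left to the existing code in the settings page.

```python
def connection_settings_fields(connection_config):
    """Return (label, value) pairs to render as read-only inputs in the settings tab."""
    return [
        (
            "Custom connection property",
            connection_config.get("custom_connection_property", ""),
        ),
        (
            "Additional connection property",
            connection_config.get("additional_connection_property", ""),
        ),
    ]
```

In the Streamlit page each pair could then be rendered with st.text_input(label, value=value, disabled=True), so that the values are visible but cannot be edited.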