Tutorial 1: Providers set up and test a CKE
Introduction
This tutorial shows providers how to set up and test a Cortex Knowledge Extension (CKE).
What you’ll learn
In this tutorial you’ll learn how to:
- Create Snowflake objects
- Load your data into Snowflake
- Chunk your documents
- Create the Cortex Search Service
- Verify the CKE is working correctly
- Share and test the CKE with a consumer account
Prerequisites
The following prerequisites are required to complete this tutorial:
- You have a Snowflake account and a user with a role that has the privileges needed to create a database, tables, virtual warehouse objects, Cortex Search services, and Streamlit apps.
Refer to Snowflake in 20 minutes for instructions on meeting these requirements.
Step 1: Create Snowflake objects
The first step is to create the Snowflake objects used in this tutorial:
- Use the accountadmin role.
- Create a warehouse named xsmall_cke_getting_started, which is used to create and update the search index.
- Create a separate role named cke_owner.
- Create and use a database named cke_getting_started.
- Create and use a schema named articles.
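You can script this step instead of running it by hand. Generate the statements in Python and run them one at a time, for example with `cursor.execute()` from the Snowflake Connector for Python. This is a minimal sketch under stated assumptions. The `setup_statements` helper, the `IF NOT EXISTS` guards, and the warehouse settings (X-Small size, 60-second auto-suspend) are illustrative choices, not values taken from this tutorial.

```python
import re

# Unquoted Snowflake identifiers start with a letter or underscore and may
# contain letters, digits, underscores, and dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def _checked(name: str) -> str:
    """Reject names that would need quoting, so they are safe to interpolate."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a simple identifier: {name!r}")
    return name


def setup_statements(
    warehouse: str = "xsmall_cke_getting_started",
    role: str = "cke_owner",
    database: str = "cke_getting_started",
    schema: str = "articles",
) -> list[str]:
    """Return the Step 1 statements in execution order (hypothetical helper)."""
    wh, rl, db, sc = (_checked(n) for n in (warehouse, role, database, schema))
    return [
        "USE ROLE accountadmin",
        # Assumed settings: X-Small size, suspend after 60 s idle, resume on demand.
        f"CREATE WAREHOUSE IF NOT EXISTS {wh} WITH WAREHOUSE_SIZE = 'XSMALL' "
        "AUTO_SUSPEND = 60 AUTO_RESUME = TRUE INITIALLY_SUSPENDED = TRUE",
        f"CREATE ROLE IF NOT EXISTS {rl}",
        f"CREATE DATABASE IF NOT EXISTS {db}",
        f"USE DATABASE {db}",
        f"CREATE SCHEMA IF NOT EXISTS {sc}",
        f"USE SCHEMA {sc}",
    ]


for statement in setup_statements():
    print(statement + ";")
```

The identifier check matters because the names are interpolated directly into SQL text. Anything that is not a plain unquoted identifier is refused rather than escaped.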
Step 2: Load your data into Snowflake
The next step is to load your data into Snowflake. Refer to Load data into Snowflake for more information.
This tutorial stores data in a Snowflake table named cke_simple_article with the following columns:
| Column name | Type | Description |
|---|---|---|
| DOCUMENT_ID | VARCHAR | The unique identifier for the document. This is the primary key of the table. |
| DOCUMENT_TITLE | VARCHAR | The title of the document. |
| SOURCE_URL | VARCHAR | A URL linking to the source of a document. |
| DOCUMENT_TEXT | VARCHAR | The document contents, parsed as text. This is the content that will be indexed and searched. |
You can include additional document metadata in your indexed dataset. This example includes only DOCUMENT_ID, DOCUMENT_TITLE, and SOURCE_URL as metadata, but you can add more columns, depending on your document source.
Create a simple table.
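The column layout in the table above maps directly onto a CREATE TABLE statement. The sketch below builds that DDL from a column list. It assumes unquoted lowercase column names and uses CREATE OR REPLACE, which drops any existing table of the same name. `create_table_sql` is a hypothetical helper, not part of any Snowflake library.

```python
# (column name, type, constraint) taken from the column table above.
ARTICLE_COLUMNS = [
    ("document_id", "VARCHAR", "PRIMARY KEY"),
    ("document_title", "VARCHAR", ""),
    ("source_url", "VARCHAR", ""),
    ("document_text", "VARCHAR", ""),
]


def create_table_sql(table: str = "cke_simple_article", columns=ARTICLE_COLUMNS) -> str:
    """Build a CREATE OR REPLACE TABLE statement from column tuples."""
    # Join the non-empty parts of each tuple, e.g. "document_id VARCHAR PRIMARY KEY".
    definitions = [" ".join(part for part in column if part) for column in columns]
    body = ",\n    ".join(definitions)
    return f"CREATE OR REPLACE TABLE {table} (\n    {body}\n)"


print(create_table_sql())
```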
Now insert some sample data into that table.
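Snowflake records PRIMARY KEY constraints on standard tables but does not enforce them. A repeated DOCUMENT_ID would therefore load without error. The sketch below checks rows on the client before inserting them and pairs that check with a parameterized INSERT that uses named `pyformat` placeholders, the Snowflake Connector for Python's default parameter style. The sample row is hypothetical: the ID, URL, and text are placeholders, not the tutorial's data.

```python
REQUIRED_COLUMNS = ("document_id", "document_title", "source_url", "document_text")

# Named pyformat placeholders, the connector's default paramstyle.
INSERT_SQL = (
    "INSERT INTO cke_simple_article (document_id, document_title, source_url, document_text) "
    "VALUES (%(document_id)s, %(document_title)s, %(source_url)s, %(document_text)s)"
)


def validate_rows(rows: list[dict]) -> list[dict]:
    """Fail fast on missing fields or repeated document IDs before loading."""
    seen = set()
    for row in rows:
        missing = [col for col in REQUIRED_COLUMNS if not row.get(col)]
        if missing:
            raise ValueError(f"row {row.get('document_id')!r} is missing {missing}")
        if row["document_id"] in seen:
            raise ValueError(f"duplicate document_id {row['document_id']!r}")
        seen.add(row["document_id"])
    return rows


# Hypothetical sample row; replace the URL and text with your own content.
sample_rows = validate_rows([
    {
        "document_id": "greenfield-biosphere",
        "document_title": "The Greenfield Biosphere",
        "source_url": "https://example.com/greenfield-biosphere",
        "document_text": "Placeholder article text to be chunked and indexed.",
    },
])
```

With an open connection, `cursor.executemany(INSERT_SQL, sample_rows)` should insert the validated rows using client-side binding.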
Step 3: Chunk your documents
Before creating a Cortex Search Service, we need to ensure that each “chunk” of indexed text is no longer than approximately 375 words. To do this, we apply a chunking algorithm through a Snowpark UDF that imports LangChain. First, we create a chunking UDF. Then, we apply that UDF to the cke_simple_article table and store the chunks in a cke_simple_article_chunks table. Finally, we verify that the chunks were created.
First, create the UDF that splits the articles into chunks for the Cortex Search Service. This process can take several minutes to complete.
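The tutorial's UDF relies on LangChain. The sketch below illustrates the same idea without that dependency. It splits text into windows of at most 375 words with a small overlap and is written in the shape of a Snowflake Python UDTF handler: a class whose `process` method yields one tuple per output row. The class name `TextChunker` and the 50-word overlap are assumptions. This is a plain word-window splitter, not LangChain's algorithm.

```python
from typing import Iterator


class TextChunker:
    """UDTF-style handler: process() yields one (chunk,) row per chunk."""

    def __init__(self, max_words: int = 375, overlap_words: int = 50):
        if not 0 <= overlap_words < max_words:
            raise ValueError("overlap_words must be >= 0 and < max_words")
        self.max_words = max_words
        self.overlap_words = overlap_words

    def process(self, text: str) -> Iterator[tuple[str]]:
        words = (text or "").split()
        step = self.max_words - self.overlap_words
        # Slide a max_words window forward by `step`, so neighbours share overlap_words words.
        for start in range(0, len(words), step):
            yield (" ".join(words[start:start + self.max_words]),)
            if start + self.max_words >= len(words):
                break  # this window already reached the end of the text
```

To use a handler like this in Snowflake, you would register it with a CREATE FUNCTION statement that declares RETURNS TABLE (chunk VARCHAR), LANGUAGE PYTHON, a RUNTIME_VERSION, and HANDLER = 'TextChunker'. The function name is up to you.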
Next, apply the UDF to the cke_simple_article table to split the documents into chunks for indexing, and store the results in the cke_simple_article_chunks table.
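Applying a table function to every article behaves like a lateral join. Each article row is repeated once per chunk, with its metadata copied alongside the chunk text. The in-memory sketch below reproduces that result for a list of article dictionaries. It makes three illustrative choices that are not details taken from the tutorial: a simple non-overlapping 375-word splitter, dropping the full DOCUMENT_TEXT from the output rows, and adding a `chunk_index` column.

```python
def chunk_words(text: str, max_words: int = 375) -> list[str]:
    """Split text into consecutive, non-overlapping windows of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def explode_articles(articles: list[dict], max_words: int = 375) -> list[dict]:
    """Emit one row per (article, chunk) pair, like a lateral join with a table function.

    Roughly the in-memory equivalent of (UDF name hypothetical):
      SELECT a.document_id, a.document_title, a.source_url, c.chunk
      FROM cke_simple_article a, TABLE(chunk_text(a.document_text)) c
    """
    rows = []
    for article in articles:
        metadata = {k: v for k, v in article.items() if k != "document_text"}
        for index, chunk in enumerate(chunk_words(article["document_text"], max_words)):
            rows.append({**metadata, "chunk": chunk, "chunk_index": index})
    return rows
```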
Finally, verify that the chunks were created.
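Besides inspecting a few rows, you can confirm that every article produced at least one chunk and that no chunk exceeds the word budget. The sketch below takes `(document_id, chunk)` pairs and reports per-document counts, over-long chunks, and articles with no chunks. The pairs could come from `SELECT document_id, chunk FROM cke_simple_article_chunks`, fetched with the Python connector. The `chunk` column name and the `summarize_chunks` helper are assumptions.

```python
from collections import Counter
from typing import Iterable


def summarize_chunks(
    rows: Iterable[tuple[str, str]],
    expected_ids: Iterable[str] = (),
    max_words: int = 375,
) -> dict:
    """Count chunks per document and flag problems (hypothetical helper)."""
    per_document = Counter()
    over_limit = []
    for document_id, chunk in rows:
        per_document[document_id] += 1
        # Whitespace word count, matching the ~375-word chunk budget.
        if len(chunk.split()) > max_words:
            over_limit.append(document_id)
    missing = sorted(set(expected_ids) - set(per_document))
    return {
        "chunks_per_document": dict(per_document),
        "over_limit": over_limit,
        "missing_documents": missing,
    }
```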
Step 4: Create the Cortex Search Service
Now configure a Cortex Search Service named cke_simple_cortex_search_service to run on warehouse xsmall_cke_getting_started and reference the chunked document table cke_simple_article_chunks. Note that this step can take considerable time to complete, depending on the size of the database.
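The sketch below generates the statement this step describes from Python. It assumes the chunk table stores its text in a column named `chunk`. It also assumes a `TARGET_LAG` of one day, which bounds how far the index may lag behind changes to the source table. Both are assumptions to adjust, and `create_search_service_sql` is a hypothetical helper.

```python
def create_search_service_sql(
    service: str = "cke_simple_cortex_search_service",
    warehouse: str = "xsmall_cke_getting_started",
    source_table: str = "cke_simple_article_chunks",
    search_column: str = "chunk",  # assumed name of the chunk text column
    target_lag: str = "1 day",
) -> str:
    """Build a CREATE CORTEX SEARCH SERVICE statement (hypothetical helper)."""
    return (
        f"CREATE OR REPLACE CORTEX SEARCH SERVICE {service}\n"
        f"  ON {search_column}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  TARGET_LAG = '{target_lag}'\n"
        "  AS (\n"
        f"    SELECT {search_column}, document_id, document_title, source_url\n"
        f"    FROM {source_table}\n"
        "  )"
    )


print(create_search_service_sql())
```

The columns selected in the AS query are the ones a search can return. Keeping SOURCE_URL there lets query results link back to the original article.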
Step 5: Test the CKE
To verify that the CKE is working correctly, issue a simple query to the Cortex Search Service. This confirms that the service has indexed your documents and that queries return relevant documents. The query should return the first chunk of the article “The Greenfield Biosphere” with a link to its source URL.
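One way to issue such a query is the SNOWFLAKE.CORTEX.SEARCH_PREVIEW function. It takes the service name and a JSON object of query parameters. The sketch below builds the JSON with `json.dumps` and passes both arguments as bind parameters, which avoids hand-escaping quotes. The query text, the returned columns, and the `chunk` column name are assumptions, and `search_preview_args` is a hypothetical helper.

```python
import json

# Two placeholders: the service name and the JSON query parameters.
SEARCH_SQL = (
    "SELECT PARSE_JSON(SNOWFLAKE.CORTEX.SEARCH_PREVIEW(%s, %s))['results'] AS results"
)


def search_preview_args(
    query: str = "What is the Greenfield Biosphere?",
    service: str = "cke_getting_started.articles.cke_simple_cortex_search_service",
    columns: tuple[str, ...] = ("chunk", "document_title", "source_url"),
    limit: int = 1,
) -> tuple[str, str]:
    """Return (service name, JSON parameters) to bind into SEARCH_SQL."""
    payload = json.dumps({"query": query, "columns": list(columns), "limit": limit})
    return service, payload
```

With the Snowflake Connector for Python, `cursor.execute(SEARCH_SQL, search_preview_args())` followed by `cursor.fetchone()` should return one row. Its `results` value should hold the best-matching chunk together with its title and source URL.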
Step 6: Share the CKE privately for testing
After the Cortex Search Service has been created and is correctly responding to queries, you can share it. This shared Cortex Search Service is the Cortex Knowledge Extension. In this step, you’ll create a private listing and share it with another account for testing. Then you’ll test the listing in the consumer account that you shared the CKE with.
Create the share
- Sign in to Snowsight.
- In the navigation menu, select Marketplace » Provider Studio.
- Select Listing in the upper-right corner and select Specified Consumers.
- Provide a title for the listing, and then click Next.
- For What’s in the listing?, click + Select.
- Select CKE_GETTING_STARTED.
- Expand ARTICLES.
- Expand Cortex Search Service.
- Select CKE_SIMPLE_CORTEX_SEARCH_SERVICE, and then select Done.
- Enter a description for the listing.
- Under Add consumer accounts, add the Snowflake account that you want to share and test the Cortex Knowledge Extension with. Note that this account must be in the same region as the provider account, and you must have access to it.
Test the share in a consumer account
- Sign in to Snowsight using the consumer account that you shared the CKE with above.
- In the navigation menu, select Data sharing » Internal sharing.
- You should see the CKE_GETTING_STARTED listing that you shared above. Select Get.
- Open a new worksheet and run a SQL command against the shared database to verify that the account has access to the shared data.
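The check in the consumer account confirms that the shared service is both visible and queryable. The sketch below lists the search services in the shared database and then runs a one-result SEARCH_PREVIEW against the service. It assumes the database is named as chosen in the Get dialog (CKE_GETTING_STARTED unless you changed it) and that the chunk text column is named `chunk`. `consumer_check_statements` is a hypothetical helper.

```python
import json
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def consumer_check_statements(database: str = "CKE_GETTING_STARTED") -> list[str]:
    """Statements to run in the consumer account (hypothetical helper)."""
    # The database name is interpolated, so only allow plain unquoted identifiers.
    if not _IDENTIFIER.fullmatch(database):
        raise ValueError(f"not a simple identifier: {database!r}")
    service = f"{database}.articles.cke_simple_cortex_search_service"
    # Fixed JSON with no single quotes or backslashes, so it is safe as a SQL literal.
    params = json.dumps({"query": "Greenfield Biosphere", "columns": ["chunk"], "limit": 1})
    return [
        f"SHOW CORTEX SEARCH SERVICES IN DATABASE {database}",
        f"SELECT PARSE_JSON(SNOWFLAKE.CORTEX.SEARCH_PREVIEW('{service}', '{params}'))"
        "['results'] AS results",
    ]


for statement in consumer_check_statements():
    print(statement + ";")
```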
Note
If you specified a name other than CKE_GETTING_STARTED in the Get dialog, use that name in the SQL command instead.
At this point, you have a functional Cortex Knowledge Extension!