Tutorial 1: Set up and test a CKE as a provider

Introduction

This tutorial shows providers how to set up and test a Cortex Knowledge Extension (CKE).

What you will learn

In this tutorial, you learn how to:

  • Create Snowflake objects

  • Load data into Snowflake

  • Chunk documents

  • Create a Cortex Search Service

  • Verify that the CKE works correctly

  • Share the CKE with a consumer account and test it

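Prerequisites

To complete this tutorial, you need the following:

  • A Snowflake account and a user with a role that grants the privileges needed to create databases, tables, virtual warehouse objects, Cortex Search Services, and Streamlit apps.

For instructions on meeting these requirements, see Snowflake in 20 minutes.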

Step 1: Create Snowflake objects

The first step is to create the Snowflake objects.

Use the account administrator role.

use role accountadmin;

Create a warehouse named xsmall_cke_getting_started to build and refresh the index.

create warehouse xsmall_cke_getting_started warehouse_size=xsmall;

Create a separate role named cke_owner. The example grants the role to a user named admin; substitute your own user name.

create role cke_owner;
grant role cke_owner to user admin;
grant usage on warehouse xsmall_cke_getting_started to role cke_owner;

Create and use a database named cke_getting_started.

grant create database on account to role cke_owner;
use role cke_owner;
create database cke_getting_started;
use database cke_getting_started;

Create and use a schema named articles.

create schema articles;
use schema articles;

Step 2: Load data into Snowflake

The next step is to load data into Snowflake. For more information, see Load data into Snowflake.

The example code below stores data in a Snowflake table named cke_simple_article with the following format:

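| Column name    | Type    | Description                                                                       |
|----------------|---------|-----------------------------------------------------------------------------------|
| DOCUMENT_ID    | VARCHAR | Unique identifier of the document. Primary key of the table.                      |
| DOCUMENT_TITLE | VARCHAR | Title of the document.                                                            |
| SOURCE_URL     | VARCHAR | URL that links to the source of the document.                                     |
| TEXT           | VARCHAR | Document content parsed as text. This is the content that is indexed and searched. |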

You can include additional document metadata in the indexed dataset. The example below includes only SOURCE_URL and DOCUMENT_ID, but you can add more columns depending on your document source.

Create a simple table.

create or replace table cke_simple_article (
    DOCUMENT_ID VARCHAR,
    DOCUMENT_TITLE VARCHAR,
    SOURCE_URL VARCHAR,
    text VARCHAR
);

Now insert sample data into the table.

INSERT INTO cke_simple_article (DOCUMENT_ID, DOCUMENT_TITLE, SOURCE_URL, TEXT)
VALUES
    ('DOC_001', 'Sample Article 1', 'https://example.com/article1', 'This is some sample text for the first article.'),
    ('DOC_002', 'Sample Article 2', 'https://example.com/article2', 'Another sample text entry for the second article.'),
    ('DOC_003', 'Sample Article 3', 'https://example.com/article3', 'Yet another piece of text for the third article.');

INSERT INTO cke_simple_article (
    DOCUMENT_ID,
    DOCUMENT_TITLE,
    SOURCE_URL,
    text
)
VALUES (
    'DOC-GREEN-001',
    'The Grand Opening of Greenfield Biosphere',
    'https://www.example.com/news/greenfield-biosphere',
    'Greenfield Biosphere, nestled in the heart of a once-industrial landscape, opened its doors to the public today amid great fanfare and curiosity. This ambitious environmental initiative, spanning over 120 acres of reclaimed land, has been designed to house thousands of diverse plant species and animals under one vast, transparent dome. Over the past decade, teams of botanists, engineers, and conservationists collaborated intensively to restore the soil quality, implement renewable energy solutions, and establish sustainable water sources. Their efforts have resulted in an oasis that stands as a testament to nature''s resilience and humanity''s unwavering determination to coexist with it.

    Upon entering the biosphere, visitors pass through a series of controlled airlocks that maintain precise temperature and humidity levels, ensuring the delicate balance required for each habitat. The moment they step inside, a multitude of colors and scents envelops them. Towering palm trees sway gently, nurtured by a carefully engineered irrigation system that recycles water across various sections of the dome. Exotic butterflies flutter past patches of vibrant orchids, while small reptiles scurry along the edge of meandering pathways. Every detail, from lighting angles to seed selection, has been meticulously planned to promote biodiversity in a space that once lay barren.

    Local officials and environmental organizations herald this project as a bold step toward reversing ecological decline. The region had suffered decades of industrial pollution, leaving the soil depleted and wildlife populations on the brink of collapse. Public interest soared once the Greenfield Biosphere project was announced, prompting unprecedented fundraising campaigns and private investments. Citizens volunteered their time to plant seedlings, build composting facilities, and educate children on the importance of ecological stewardship. Now, as thousands explore the dome on opening day, excitement mingles with a sense of responsibility, fueling hope that this initiative can serve as a catalyst for broader restoration efforts.

    Beyond merely a tourist attraction, the Greenfield Biosphere plays a crucial role in scientific research. Biologists and ecologists from universities around the globe have established research stations within the dome to study plant migration, cross-pollination, and microclimates. Through advanced sensor networks, they collect data on everything from soil moisture levels to carbon sequestration rates, aiming to develop cutting-edge conservation strategies. Already, preliminary findings suggest that certain flora species exhibit faster growth rates under partial shade, which could help inform future reforestation projects. This research extends to aquatic ecosystems as well, with scientists closely monitoring newly formed ponds and streams for indicators of ecosystem health.

    During the grand opening ceremony, Mayor Allison Pierce praised the community for its unwavering dedication to the biosphere''s development. She emphasized how interagency cooperation and community outreach were pivotal in transforming a polluted wasteland into a verdant sanctuary. In her address, she remarked on the significance of involving local youth, who contributed to the design through art projects and educational workshops. According to Mayor Pierce, the next phase of the project will include expanding the biosphere''s capacity for endangered species breeding programs. This could cement the region''s reputation as a global leader in ecological preservation and innovation.

    For many, the real highlight of the day was the unveiling of the arboretum wing, a temperature-controlled section featuring ancient tree species that have long faced threats from illegal logging and habitat loss. Towering redwoods, thought to be too large to grow under a dome, stand proudly after years of careful nurturing. Visitors stood in awe as the directors revealed that these trees'' root systems, painstakingly preserved and transplanted, are now thriving in custom-engineered soil mixtures. A sense of reverence filled the air, with many attendees describing the experience as spiritual. The seed of hope planted in the community has visibly taken root.

    The venture''s economic impact is another key talking point. Local shops and restaurants anticipate an influx of tourists, and hotels report reservations scheduled months in advance. Construction of new eco-lodges in the surrounding areas is already underway, promising a blend of comfortable accommodations with sustainable building practices. The city council has also approved additional funding to improve roads and public transportation to accommodate the expected rise in visitor numbers. Environmental advocates caution, however, that increased foot traffic could inadvertently strain the biosphere''s delicate ecosystems, calling for balanced planning and continued emphasis on conservation education.

    Inside the administrative office, a dedicated operations team monitors real-time data feeds, adjusting temperature, humidity, and nutrient levels to meet each species'' unique needs. Modular solar panels installed around the dome generate sufficient electricity to power the entire facility, showcasing how renewable energy can be integrated seamlessly with large-scale infrastructure. Outside, an innovative wastewater treatment plant recycles greywater for irrigation, minimizing resource consumption. The architects behind the biosphere believe these sustainable technologies can be replicated in other communities looking to rehabilitate degraded land, turning once-polluted sites into living laboratories for environmental stewardship.

    While the facility is only in its first phase, future expansions are already on the drawing board. There are plans to introduce a marine habitat zone featuring coral reef tanks that highlight threats to underwater ecosystems. Specially designed walkways will give visitors a close-up view of these aquatic wonders without disturbing the delicate organisms within. Meanwhile, education programs will be expanded to local schools, offering field trips where students can learn about biodiversity, climate change, and sustainable technologies. The hope is that exposure to this living exhibit will inspire the next generation of environmental scientists, engineers, and policymakers.

    As dusk settled over the glass dome, a soft, multi-colored illumination replaced the natural daylight, casting enchanting shadows across the tropical foliage. Families strolled slowly along the paths, pausing to read plaques about the origins of each plant or to marvel at the occasional flutter of nocturnal pollinators. Meanwhile, a gentle hum of conversation reverberated in the background, carrying sentiments of astonishment and gratitude. The first day at Greenfield Biosphere ended with a collective realization that, with mindful planning, community collaboration, and respect for nature''s inherent wisdom, it is indeed possible to transform a scarred landscape into a flourishing haven for life and innovation.'
);
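
The INSERT statements above double every apostrophe inside the article text (for example, nature''s), because a single quote would otherwise end the SQL string literal. When generating such statements from a script, the escaping can be automated. The following is a minimal sketch, not part of the original tutorial: the function names are hypothetical, and the backslash handling assumes that backslash acts as an escape character inside single-quoted literals. In practice, bind parameters offered by a database connector are usually a safer choice than building literals by hand.

```python
def sql_string_literal(value: str) -> str:
    """Return value wrapped as a single-quoted SQL string literal.

    Assumption: both the single quote and the backslash are special inside
    the literal, so each one is doubled.
    """
    escaped = value.replace("\\", "\\\\").replace("'", "''")
    return f"'{escaped}'"


def build_insert(document_id: str, title: str, url: str, text: str) -> str:
    """Build an INSERT statement for the cke_simple_article table."""
    values = ", ".join(
        sql_string_literal(v) for v in (document_id, title, url, text)
    )
    return (
        "INSERT INTO cke_simple_article "
        "(DOCUMENT_ID, DOCUMENT_TITLE, SOURCE_URL, TEXT) "
        f"VALUES ({values});"
    )


# Hypothetical sample row, used only to show the escaping.
statement = build_insert(
    "DOC_004",
    "Sample Article 4",
    "https://example.com/article4",
    "A testament to nature's resilience.",
)
print(statement)
```

The printed statement contains nature''s with a doubled apostrophe, matching the style of the hand-written INSERT above.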

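Step 3: Chunk the documents

Before you create the Cortex Search Service, make sure that each indexed "chunk" of text is about 375 words or fewer. To do this, you can apply a chunking algorithm through a Snowpark UDF that imports LangChain. First, create the chunking UDF. Next, apply the UDF to the cke_simple_article table and store the chunks in the cke_simple_article_chunks table. Finally, verify that the chunks were created.

Run the example below to create the UDF that chunks documents into pieces for the Cortex Search Service. This can take a few minutes to complete.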

CREATE OR REPLACE FUNCTION text_chunker(text STRING)
    RETURNS TABLE (chunk VARCHAR)
    LANGUAGE PYTHON
    RUNTIME_VERSION = '3.9'
    HANDLER = 'text_chunker'
    PACKAGES = ('snowflake-snowpark-python', 'langchain')
    AS
$$
# Only the modules that the handler actually uses are imported.
import pandas as pd
from langchain.text_splitter import RecursiveCharacterTextSplitter

class text_chunker:

    def process(self, text: str):
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size = 2000,  # Adjust this as needed
            chunk_overlap = 300,  # Overlap to keep chunks contextual
            length_function = len
        )

        chunks = text_splitter.split_text(text)
        df = pd.DataFrame(chunks, columns=['chunk'])

        yield from df.itertuples(index=False, name=None)
$$;

Run the example below to split the documents into chunks for indexing.

CREATE OR REPLACE TABLE cke_simple_article_chunks AS
    SELECT
        c.DOCUMENT_ID,
        c.DOCUMENT_TITLE,
        c.SOURCE_URL,
        t.chunk
    FROM cke_simple_article AS c, TABLE(text_chunker(CONCAT(c.DOCUMENT_TITLE, '\n', c.TEXT))) AS t;

Run the following to verify that the chunks were created.

select * from cke_simple_article_chunks;
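
The chunk_size and chunk_overlap values in the UDF are measured in characters (length_function = len), while the guideline for Cortex Search is expressed in words. To get a feel for how the two relate, the standalone sketch below splits text into fixed-size character windows with overlap and counts the words in each chunk. It is a simplification under stated assumptions: RecursiveCharacterTextSplitter prefers to break on separators such as paragraph and line boundaries, which this sketch does not attempt, and the function names are hypothetical.

```python
def split_with_overlap(
    text: str, chunk_size: int = 2000, chunk_overlap: int = 300
) -> list[str]:
    """Split text into character windows that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def word_counts(chunks: list[str]) -> list[int]:
    """Count whitespace-separated words in each chunk."""
    return [len(chunk.split()) for chunk in chunks]


# Synthetic sample text: 1000 short words, 5000 characters in total.
sample = "word " * 1000
chunks = split_with_overlap(sample)
for index, count in enumerate(word_counts(chunks)):
    status = "ok" if count <= 375 else "over 375 words"
    print(f"chunk {index}: {len(chunks[index])} chars, {count} words ({status})")
```

Text made of very short words can exceed the word guideline even when the character limit is respected, so it can be worth lowering chunk_size for such content.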

Step 4: Create the Cortex Search Service

Now create a Cortex Search Service named cke_simple_cortex_search_service that runs on the warehouse xsmall_cke_getting_started and references the chunked document table cke_simple_article_chunks. Depending on the size of the database, this step can take a considerable amount of time to complete.

CREATE OR REPLACE CORTEX SEARCH SERVICE cke_simple_cortex_search_service
  ON chunk
  ATTRIBUTES document_title
  WAREHOUSE = xsmall_cke_getting_started
  TARGET_LAG = '1 hour'
  AS (
    SELECT
        chunk,
        document_title,
        source_url
      FROM cke_simple_article_chunks
  );

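Step 5: Test the CKE

To verify that the CKE works correctly, run a simple query against the Cortex Search Service. This confirms that the service indexed the documents correctly and that relevant documents are returned for a query. The query below should return the first chunk of the Greenfield Biosphere document, along with a link to its source URL.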

select snowflake.cortex.search_preview(
 'cke_getting_started.articles.cke_simple_cortex_search_service',
 '{ "query": "whats happening with the greenfield biosphere?", "columns": ["chunk","document_title","source_url"] }');
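
The second argument to search_preview is a JSON string, and the function's return value is also JSON. When the test is scripted rather than typed into a worksheet, it can help to build and parse that JSON programmatically. The sketch below illustrates this with Python's standard json module. It makes two assumptions that are not stated in this tutorial: that the request accepts a limit field, and that the response is an object with a results array. The helper names and the sample response are hypothetical; the sample is hand-written and is not real service output.

```python
import json


def build_search_request(query: str, columns: list[str], limit: int = 5) -> str:
    """Serialize the query options into the JSON string the function expects."""
    return json.dumps({"query": query, "columns": columns, "limit": limit})


def extract_results(raw_response: str) -> list[dict]:
    """Return the list of result rows from a JSON response string.

    Assumption: the rows are stored under a top-level "results" key.
    """
    payload = json.loads(raw_response)
    return payload.get("results", [])


request = build_search_request(
    "whats happening with the greenfield biosphere?",
    ["chunk", "document_title", "source_url"],
)
print(request)

# Hand-written stand-in for a response, used only to exercise the parser.
sample_response = json.dumps(
    {
        "results": [
            {
                "chunk": "The Grand Opening of Greenfield Biosphere ...",
                "document_title": "The Grand Opening of Greenfield Biosphere",
                "source_url": "https://www.example.com/news/greenfield-biosphere",
            }
        ]
    }
)
for row in extract_results(sample_response):
    print(row["document_title"], row["source_url"])
```

Note that a query containing an apostrophe would need escaping before being embedded in a single-quoted SQL literal, which may be why the example query is written as "whats".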

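Step 6: Share the CKE privately for testing

Once the Cortex Search Service has been created and responds to queries correctly, you can share it. A shared Cortex Search Service is a Cortex Knowledge Extension. In this step, you create a private listing and share it with another account for testing. You then test the listing from the consumer account that you shared the CKE with.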

Create the share

  1. Sign in to Snowsight and go to Data Products » Provider Studio.

  2. In the upper right, select Listing, and then select Specified Consumers.

  3. Enter a title for the listing, and then click Next.

  4. Under What's in the listing?, click + Select.

  5. Select CKE_GETTING_STARTED.

  6. Expand ARTICLES.

  7. Expand Cortex Search Service.

  8. Select CKE_SIMPLE_CORTEX_SEARCH_SERVICE, and then select Done.

  9. Enter a description for the listing.

  10. Under Add consumer accounts, add the Snowflake account that you want to share with and use to test the Cortex Knowledge Extension. The account must be in the same region as the provider account, and you must have access to it.

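Test the share from the consumer account

  1. Sign in to Snowsight with the consumer account that you shared the CKE with above.

  2. Go to Data Products » Private Sharing.

  3. The CKE_GETTING_STARTED listing that you shared above appears here. Select Get.

  4. Open a new worksheet and run the SQL command below to verify that the account has access to the shared data.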

    select
      snowflake.cortex.search_preview(
       'CKE_GETTING_STARTED.ARTICLES.CKE_SIMPLE_CORTEX_SEARCH_SERVICE',
       '{ "query": "whats happening with the biosphere?", "columns": ["chunk","document_title"] }'
      );
    

    Note

    If you specified a name other than CKE_GETTING_STARTED in the Get dialog, change the name in the code snippet above accordingly.
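
Because the database name in the consumer query depends on what was entered in the Get dialog, a scripted test can assemble the fully qualified service name from its parts instead of hard-coding it. The following is a small sketch with hypothetical names (qualified_service_name, MY_CKE_DB). It only accepts unquoted identifiers, that is, names that start with a letter or underscore and contain only letters, digits, underscores, and dollar signs; quoted identifiers are out of scope here.

```python
import re

# Pattern for an unquoted identifier: a letter or underscore first,
# then letters, digits, underscores, or dollar signs.
_UNQUOTED_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def qualified_service_name(
    database: str,
    schema: str = "ARTICLES",
    service: str = "CKE_SIMPLE_CORTEX_SEARCH_SERVICE",
) -> str:
    """Join database, schema, and service into a dotted name."""
    parts = (database, schema, service)
    for part in parts:
        if not _UNQUOTED_IDENTIFIER.fullmatch(part):
            raise ValueError(f"not a valid unquoted identifier: {part!r}")
    return ".".join(parts)


# MY_CKE_DB stands in for whatever name was chosen in the Get dialog.
print(qualified_service_name("MY_CKE_DB"))
```

The resulting string can be passed as the first argument to search_preview in place of the literal name in the snippet above.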

You now have a working Cortex Knowledge Extension!