Categories: String & binary functions (AI Functions)

AI_EXTRACT

Extracts information from an input string or file.

Syntax

Extract information from an input string:

AI_EXTRACT( <text>, <responseFormat> )

AI_EXTRACT( text => <text>,
            responseFormat => <responseFormat> )

Extract information from a file:

AI_EXTRACT( <file>, <responseFormat> )

AI_EXTRACT( file => <file>,
            responseFormat => <responseFormat> )

Arguments

text

An input string for extraction.

file

A FILE for extraction.

Supported file formats:

PDF
PNG
PPTX, PPT
EML
DOC, DOCX
JPEG, JPG
HTM, HTML
TEXT, TXT
TIF, TIFF
BMP, GIF, WEBP
MD

Files must be smaller than 100 MB.
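
As a rough pre-flight check, the format list and size limit above could be sketched in Python before a file is staged. The helper name `is_supported_file` is hypothetical, and treating 100 MB as 100 × 1024 × 1024 bytes is an assumption; this page does not say which definition of a megabyte applies.

```python
from pathlib import Path

# Extensions taken from the supported file formats listed above.
SUPPORTED_EXTENSIONS = {
    "pdf", "png", "pptx", "ppt", "eml", "doc", "docx", "jpeg", "jpg",
    "htm", "html", "text", "txt", "tif", "tiff", "bmp", "gif", "webp", "md",
}

# Assumption: 1 MB = 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def is_supported_file(name: str, size_bytes: int) -> bool:
    """Return True if the extension is supported and the size is under the limit."""
    extension = Path(name).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS and size_bytes < MAX_FILE_BYTES
```

The check is case-insensitive on the extension, so `report.PDF` passes while a file with no extension does not.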

responseFormat

Information to be extracted, in one of the following response formats:

A simple object schema that maps each label to the information to be extracted. For example:
```
{'name': 'What is the last name of the employee?', 'address': 'What is the address of the employee?'}
```
An array of strings that describe the information to be extracted. For example:
```
['What is the last name of the employee?', 'What is the address of the employee?']
```
An array of arrays, each containing two strings (the label and the information to be extracted). For example:
```
[['name', 'What is the last name of the employee?'], ['address', 'What is the address of the employee?']]
```
A JSON schema that defines the structure of the extracted information. It supports entity and table extraction. For example:
```
{
  'schema': {
    'type': 'object',
    'properties': {
      'income_table': {
        'description': 'Income for FY2026Q2',
        'type': 'object',
        'column_ordering': ['month', 'income'],
        'properties': {
          'month': {
            'description': 'Month',
            'type': 'array'
          },
          'income': {
            'description': 'Income',
            'type': 'array'
          }
        }
      },
      'title': {
        'description': 'What is the title of the document?',
        'type': 'string'
      },
      'employees': {
        'description': 'What are the names of employees?',
        'type': 'array'
      }
    }
  }
}
```
Note
- You can't combine the JSON schema format with the other response formats. If responseFormat contains the schema key, you must define all questions within the JSON schema. Additional keys are not supported.
- The model accepts only certain shapes of JSON schema. The top-level type must always be an object that contains independently extracted sub-objects. A sub-object can be a table (an object of lists of strings, where each list represents a column), a list of strings, or a string.
  
  String is currently the only supported scalar type.
- Use the description field to provide context to the model; for example, to help the model locate the right table in a document. You can enter the column header name or describe the column in another way.
- Use the column_ordering field to specify the order of all columns in the extracted table. The column_ordering field is case-sensitive and must match the column names defined in the properties field. The order should reflect the order of the columns in the document.
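
The shape rules in the note above can be checked on the client before a query is sent. The following Python sketch is illustrative only: `validate_response_schema` is a hypothetical helper, not part of any Snowflake library, and it assumes that column_ordering may be omitted but must name exactly the defined columns when present.

```python
def validate_response_schema(response_format: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks acceptable."""
    problems = []
    # With the JSON schema format, 'schema' must be the only top-level key.
    if set(response_format) != {"schema"}:
        return ["responseFormat must contain exactly one key, 'schema'"]
    schema = response_format["schema"]
    if schema.get("type") != "object":
        problems.append("top-level type must be 'object'")
    for name, sub in schema.get("properties", {}).items():
        sub_type = sub.get("type")
        if sub_type in ("string", "array"):
            continue  # a single value or a list of strings
        if sub_type != "object":
            problems.append(f"{name}: unsupported type {sub_type!r}")
            continue
        # A table is an object whose properties are all arrays (one per column).
        columns = sub.get("properties", {})
        for column, spec in columns.items():
            if spec.get("type") != "array":
                problems.append(f"{name}.{column}: table columns must have type 'array'")
        # Assumption: column_ordering is optional here; the comparison is case-sensitive.
        ordering = sub.get("column_ordering")
        if ordering is not None and sorted(ordering) != sorted(columns):
            problems.append(f"{name}: column_ordering must list exactly the defined columns")
    return problems
```

Because the comparison is case-sensitive, a column_ordering entry such as 'Month' is reported as a mismatch against a property named 'month'.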

Returns

A JSON object containing the extracted information.

Example of an output that includes array, table, and single-value extraction:

{
  "error": null,
  "response": {
    "employees": [
      "Smith",
      "Johnson",
      "Doe"
    ],
    "income_table": {
      "income": ["$120 678","$130 123","$150 998"],
      "month": ["February", "March", "April"]
    },
    "title": "Financial report"
  }
}
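
Tables come back column-oriented, so client code often needs to pivot them into rows. The Python sketch below does this for the sample output above; `table_to_rows` is a hypothetical helper, and treating a non-null error field as a failure is an assumption, since this page does not describe the error contents.

```python
import json

# The sample output shown above, as it might arrive in a client as a JSON string.
raw = """
{
  "error": null,
  "response": {
    "employees": ["Smith", "Johnson", "Doe"],
    "income_table": {
      "income": ["$120 678", "$130 123", "$150 998"],
      "month": ["February", "March", "April"]
    },
    "title": "Financial report"
  }
}
"""


def table_to_rows(table: dict, column_ordering: list[str]) -> list[dict]:
    """Pivot a column-oriented table into a list of row dictionaries."""
    columns = [table[name] for name in column_ordering]
    # zip stops at the shortest column, so ragged columns are truncated.
    return [dict(zip(column_ordering, values)) for values in zip(*columns)]


result = json.loads(raw)
# Assumption: a non-null "error" means the extraction failed.
if result["error"] is not None:
    raise RuntimeError(result["error"])

rows = table_to_rows(result["response"]["income_table"], ["month", "income"])
```

The column order passed to `table_to_rows` mirrors the column_ordering field from the request, because the keys in the returned object are not guaranteed to follow it.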

Access control requirements

Users must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role. For more information about granting this privilege, see Cortex LLM privileges.

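Usage notes

You can't use both the text and file parameters in the same function call.
You can either ask questions in natural language or describe the information to be extracted (such as city, street, ZIP code). For example: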
{'address': 'City, street, ZIP', 'name': 'First and last name'}
The following languages are supported:
- Arabic
- Bengali
- Burmese
- Cebuano
- Chinese
- Czech
- Dutch
- English
- French
- German
- Hebrew
- Hindi
- Indonesian
- Italian
- Japanese
- Khmer
- Korean
- Lao
- Malay
- Persian
- Polish
- Portuguese
- Russian
- Spanish
- Tagalog
- Thai
- Turkish
- Urdu
- Vietnamese
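Documents must be no more than 125 pages long.
In a single AI_EXTRACT call, you can ask a maximum of 100 questions for entity extraction and a maximum of 10 questions for table extraction.

One table extraction question counts as 10 entity extraction questions. For example, you can ask 4 table extraction questions and 60 entity extraction questions in a single AI_EXTRACT call.
The maximum output length for entity extraction is 512 tokens per question. For table extraction, the model returns answers of up to 4096 tokens.
Client-side encrypted stages are not supported.
Confidence scores are not supported.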
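
The question limits above amount to a shared budget in which each table question costs as much as 10 entity questions. A minimal Python sketch of that arithmetic follows; the function name `fits_question_budget` is hypothetical, and reading the two limits as a single combined budget of 100 is an interpretation of the "4 tables and 60 entities" example.

```python
MAX_ENTITY_EQUIVALENT_QUESTIONS = 100  # entity-question budget per call
MAX_TABLE_QUESTIONS = 10               # table-question cap per call
TABLE_QUESTION_WEIGHT = 10             # one table question = 10 entity questions


def fits_question_budget(entity_questions: int, table_questions: int) -> bool:
    """Return True if the mix of questions fits in a single AI_EXTRACT call."""
    if table_questions > MAX_TABLE_QUESTIONS:
        return False
    weighted = entity_questions + TABLE_QUESTION_WEIGHT * table_questions
    return weighted <= MAX_ENTITY_EQUIVALENT_QUESTIONS
```

With these constants, 4 table questions and 60 entity questions weigh 4 × 10 + 60 = 100 and fit exactly, while adding one more entity question exceeds the budget.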

Examples

Extraction from an input string

The following example extracts information from the input text:

SELECT AI_EXTRACT(
  text => 'John Smith lives in San Francisco and works for Snowflake',
  responseFormat => {'name': 'What is the first name of the employee?', 'city': 'What is the address of the employee?'}
);

The following example extracts information from the input text, passing the response format as a JSON string parsed with PARSE_JSON:

SELECT AI_EXTRACT(
  text => 'John Smith lives in San Francisco and works for Snowflake',
  responseFormat => PARSE_JSON('{"name": "What is the first name of the employee?", "address": "What is the address of the employee?"}')
);
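
When the questions live in application code, the PARSE_JSON argument used above can be generated rather than typed by hand. The Python sketch below is one way to do that under stated assumptions: `response_format_literal` is a hypothetical helper, and the escaping assumes a single-quoted Snowflake string constant, in which both backslashes and single quotes need to be escaped.

```python
import json


def response_format_literal(questions: dict[str, str]) -> str:
    """Render a label-to-question mapping as a PARSE_JSON('...') SQL expression."""
    payload = json.dumps(questions)
    # Escape backslashes first, then single quotes, so the JSON text survives
    # inside a single-quoted SQL string constant.
    escaped = payload.replace("\\", "\\\\").replace("'", "''")
    return f"PARSE_JSON('{escaped}')"


expression = response_format_literal(
    {
        "name": "What is the first name of the employee?",
        "address": "What is the address of the employee?",
    }
)
```

The resulting expression can be placed after `responseFormat =>` in the query text. Questions that contain an apostrophe, such as "What is the employee's name?", are emitted with the quote doubled.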

Extraction from a file

The following example extracts information from the document.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files','document.pdf'),
  responseFormat => [['name', 'What is the first name of the employee?'], ['city', 'Where does the employee live?']]
);

The following example extracts information from all files in a directory on a stage:

Note

Make sure the directory table is enabled on the stage. For more information, see Manage directory tables.
```
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', relative_path),
  responseFormat => [
    'What is the document ID?',
    'What is the address of the company?'
  ]
) FROM DIRECTORY (@db.schema.files);
```

The following example extracts the title value from the report.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'title': {
          'description': 'What is the title of document?',
          'type': 'string'
        }
      }
    }
  }
);

The following example extracts the employees array from the report.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'employees': {
          'description': 'What are the surnames of employees?',
          'type': 'array'
        }
      }
    }
  }
);

The following example extracts the income_table table from the report.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'income_table': {
          'description': 'Income for FY2026Q2',
          'type': 'object',
          'column_ordering': ['month', 'income'],
          'properties': {
            'month': {
              'description': 'Month',
              'type': 'array'
            },
            'income': {
              'description': 'Income',
              'type': 'array'
            }
          }
        }
      }
    }
  }
);

The following example extracts a table (income_table), a single value (title), and an array (employees) from the report.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'income_table': {
          'description': 'Income for FY2026Q2',
          'type': 'object',
          'column_ordering': ['month', 'income'],
          'properties': {
            'month': {
              'description': 'Month',
              'type': 'array'
            },
            'income': {
              'description': 'Income',
              'type': 'array'
            }
          }
        },
        'title': {
          'description': 'What is the title of document?',
          'type': 'string'
        },
        'employees': {
          'description': 'What are the surnames of employees?',
          'type': 'array'
        }
      }
    }
  }
);

Regional availability

See Regional availability.

Legal notices

For legal notices, see Snowflake AI and ML.