Categories: String & binary functions (AI Functions)

AI_EXTRACT

Extracts information from an input string or file.

Syntax

Extract information from an input string:

AI_EXTRACT( <text>, <responseFormat> )

AI_EXTRACT( text => <text>,
            responseFormat => <responseFormat> )

Extract information from a file:

AI_EXTRACT( <file>, <responseFormat> )

AI_EXTRACT( file => <file>,
            responseFormat => <responseFormat> )

Arguments

text

An input string for extraction.

file

A FILE for extraction.

Supported file formats:

PDF
PNG
PPTX, PPT
EML
DOC, DOCX
JPEG, JPG
HTM, HTML
TEXT, TXT
TIF, TIFF
BMP, GIF, WEBP
MD

Files must be smaller than 100 MB.
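
The format and size limits above can be checked on the client before a file is staged. The following is a minimal Python sketch, not part of the function itself. The helper name `is_eligible` is hypothetical, and reading "100 MB" as 100 * 1024 * 1024 bytes is an assumption, because the documentation does not say whether decimal or binary megabytes apply.

```python
from pathlib import Path

# Extensions taken from the supported-formats list above (lowercased).
SUPPORTED_EXTENSIONS = {
    "pdf", "png", "pptx", "ppt", "eml", "doc", "docx", "jpeg", "jpg",
    "htm", "html", "text", "txt", "tif", "tiff", "bmp", "gif", "webp", "md",
}

# Assumption: 100 MB is interpreted as binary megabytes.
MAX_FILE_BYTES = 100 * 1024 * 1024


def is_eligible(file_name, size_bytes):
    """Return True if the file name and size pass the documented limits."""
    # Path.suffix yields ".PDF" for "report.PDF"; strip the dot and lowercase it.
    extension = Path(file_name).suffix.lstrip(".").lower()
    return extension in SUPPORTED_EXTENSIONS and 0 <= size_bytes < MAX_FILE_BYTES
```

A check like this only mirrors the documented limits; the service remains the authority on what it accepts.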

responseFormat

Information to be extracted, in one of the following response formats:

A simple object schema that maps each label to the information to be extracted; for example:

{'name': 'What is the last name of the employee?', 'address': 'What is the address of the employee?'}

An array of strings that contain the information to be extracted; for example:
```
['What is the last name of the employee?', 'What is the address of the employee?']
```
An array of arrays that each contain two strings (the label and the information to be extracted); for example:
```
[['name', 'What is the last name of the employee?'], ['address', 'What is the address of the employee?']]
```
A JSON schema that defines the structure of the extracted information. Supports entity and table extraction. For example:
```
{
  'schema': {
    'type': 'object',
    'properties': {
      'income_table': {
        'description': 'Income for FY2026Q2',
        'type': 'object',
        'column_ordering': ['month', 'income'],
        'properties': {
          'month': {
            'description': 'Month',
            'type': 'array'
          },
          'income': {
            'description': 'Income',
            'type': 'array'
          }
        }
      },
      'title': {
        'description': 'What is the title of the document?',
        'type': 'string'
      },
      'employees': {
        'description': 'What are the names of employees?',
        'type': 'array'
      }
    }
  }
}
```
Note
- You can't combine the JSON schema format with other response formats. If responseFormat contains the schema key, you must define all questions within the JSON schema. Additional keys are not supported.
- The model accepts only certain shapes of JSON schema. The top-level type must always be an object, which contains independently extracted sub-objects. A sub-object can be a table (an object of lists of strings that represent columns), a list of strings, or a string.
  
  String is currently the only supported scalar type.
- Use the description field to provide context to the model; for example, to help the model locate the right table in a document. You can enter the column header name, or describe the column in another way.
- Use the column_ordering field to specify the order of all columns in the extracted table. The column_ordering field is case-sensitive and must match the column names defined in the properties field. The order should reflect the order of the columns in the document.
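
The shape rules in the note above can be expressed as a small client-side check. The following Python sketch is illustrative only: the function names `classify` and `validate_response_format` are hypothetical, and treating column_ordering as optional (but complete when present) is an assumption drawn from the wording of the note, not a confirmed rule.

```python
def classify(sub_schema):
    """Classify a sub-object as 'string', 'array', or 'table'; raise otherwise."""
    kind = sub_schema.get("type")
    if kind in ("string", "array"):
        return kind
    if kind == "object":
        columns = sub_schema.get("properties", {})
        # A table is an object whose properties are all arrays (one per column).
        if not columns or any(c.get("type") != "array" for c in columns.values()):
            raise ValueError("a table must be an object whose properties are all arrays")
        ordering = sub_schema.get("column_ordering")
        # Case-sensitive comparison: every column must be listed exactly once.
        if ordering is not None and sorted(ordering) != sorted(columns):
            raise ValueError("column_ordering must match the property names exactly")
        return "table"
    raise ValueError(f"unsupported type: {kind!r}")


def validate_response_format(response_format):
    """Return {name: kind} for each sub-object, or raise ValueError."""
    if set(response_format) != {"schema"}:
        raise ValueError("only the 'schema' key is allowed in JSON schema mode")
    schema = response_format["schema"]
    if schema.get("type") != "object":
        raise ValueError("the top-level type must be 'object'")
    return {name: classify(sub) for name, sub in schema.get("properties", {}).items()}


# A reduced version of the schema shown above.
EXAMPLE = {
    "schema": {
        "type": "object",
        "properties": {
            "income_table": {
                "description": "Income for FY2026Q2",
                "type": "object",
                "column_ordering": ["month", "income"],
                "properties": {
                    "month": {"description": "Month", "type": "array"},
                    "income": {"description": "Income", "type": "array"},
                },
            },
            "title": {"description": "What is the title of the document?", "type": "string"},
            "employees": {"description": "What are the names of employees?", "type": "array"},
        },
    }
}

kinds = validate_response_format(EXAMPLE)
```

Here `kinds` maps each sub-object to the kind of extraction it requests, which is also what the question limits in the usage notes are counted against.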

Returns

A JSON object that contains the extracted information.

Example of an output that includes array, table, and single-value extraction:

{
  "error": null,
  "response": {
    "employees": [
      "Smith",
      "Johnson",
      "Doe"
    ],
    "income_table": {
      "income": ["$120 678","$130 123","$150 998"],
      "month": ["February", "March", "April"]
    },
    "title": "Financial report"
  }
}
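
Because an extracted table is returned column by column, client code often has to pivot it into rows. The following Python sketch shows one way to do that, assuming the result reaches the client as a JSON string with the shape shown above. The helper name `table_rows` is hypothetical.

```python
import json

# The example output above, as a JSON string.
raw = """
{
  "error": null,
  "response": {
    "employees": ["Smith", "Johnson", "Doe"],
    "income_table": {
      "income": ["$120 678", "$130 123", "$150 998"],
      "month": ["February", "March", "April"]
    },
    "title": "Financial report"
  }
}
"""


def table_rows(table, column_order):
    """Pivot a column-oriented table into a list of row dictionaries."""
    columns = [table[name] for name in column_order]
    # zip stops at the shortest column, so ragged columns are truncated.
    return [dict(zip(column_order, values)) for values in zip(*columns)]


result = json.loads(raw)
if result["error"] is not None:
    raise RuntimeError(f"extraction failed: {result['error']}")

rows = table_rows(result["response"]["income_table"], ["month", "income"])
```

Passing the column order explicitly keeps the rows aligned with the column_ordering used in the request, since the keys of the returned object are not guaranteed to arrive in that order.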

Access control requirements

Users must use a role that has been granted the SNOWFLAKE.CORTEX_USER database role. For information on granting this privilege, see Cortex LLM privileges.

Usage notes

You can't use the text and file parameters together in the same function call.
You can either ask questions in natural language or describe the information to be extracted (such as city, street, ZIP code); for example:
{'address': 'City, street, ZIP', 'name': 'First and last name'}
The following languages are supported:
- Arabic
- Bengali
- Burmese
- Cebuano
- Chinese
- Czech
- Dutch
- English
- French
- German
- Hebrew
- Hindi
- Indonesian
- Italian
- Japanese
- Khmer
- Korean
- Lao
- Malay
- Persian
- Polish
- Portuguese
- Russian
- Spanish
- Tagalog
- Thai
- Turkish
- Urdu
- Vietnamese
Documents must be no more than 125 pages long.
In a single AI_EXTRACT call, you can ask a maximum of 100 entity extraction questions and a maximum of 10 table extraction questions.

One table extraction question counts as 10 entity extraction questions. For example, you can ask 4 table extraction questions and 60 entity extraction questions in a single AI_EXTRACT call.
The maximum output length for entity extraction is 512 tokens per question. For table extraction, the model returns answers of at most 4,096 tokens.
Client-side encrypted stages are not supported.
Confidence scores are not supported.
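
The question limits in these notes read as a single shared budget: each table question costs as much as 10 entity questions, and the total may not exceed 100. The following Python sketch encodes that reading. The combined-budget interpretation is an assumption inferred from the "4 tables and 60 entities" example, and the function name `fits_question_budget` is hypothetical.

```python
MAX_BUDGET = 100          # maximum entity-question equivalents per call
MAX_TABLE_QUESTIONS = 10  # maximum table questions per call
TABLE_WEIGHT = 10         # one table question counts as 10 entity questions


def fits_question_budget(entity_questions, table_questions):
    """Return True if the mix of questions fits in one AI_EXTRACT call."""
    if entity_questions < 0 or table_questions < 0:
        raise ValueError("question counts must be non-negative")
    if table_questions > MAX_TABLE_QUESTIONS:
        return False
    return entity_questions + TABLE_WEIGHT * table_questions <= MAX_BUDGET
```

With these constants, 4 table questions plus 60 entity questions use the budget exactly (60 + 4 * 10 = 100), which matches the example in the notes; adding one more entity question would exceed it.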

Examples

Extraction from an input string

The following example extracts information from the input text:

SELECT AI_EXTRACT(
  text => 'John Smith lives in San Francisco and works for Snowflake',
  responseFormat => {'name': 'What is the first name of the employee?', 'city': 'What is the address of the employee?'}
);

The following example extracts information from the input text, passing the response format as a JSON string parsed with PARSE_JSON:

SELECT AI_EXTRACT(
  text => 'John Smith lives in San Francisco and works for Snowflake',
  responseFormat => PARSE_JSON('{"name": "What is the first name of the employee?", "address": "What is the address of the employee?"}')
);

Extraction from a file

The following example extracts information from the document.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files','document.pdf'),
  responseFormat => [['name', 'What is the first name of the employee?'], ['city', 'Where does the employee live?']]
);

The following example extracts information from all files on a stage:

Note

Make sure that the directory table is enabled. For more information, see Manage directory tables.
```
SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', relative_path),
  responseFormat => [
    'What is the document ID?',
    'What is the address of the company?'
  ]
) FROM DIRECTORY (@db.schema.files);
```

The following example extracts the title value from the report.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'title': {
          'description': 'What is the title of the document?',
          'type': 'string'
        }
      }
    }
  }
);

The following example extracts the employees array from the report.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'employees': {
          'description': 'What are the surnames of employees?',
          'type': 'array'
        }
      }
    }
  }
);

The following example extracts the income_table table from the report.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'income_table': {
          'description': 'Income for FY2026Q2',
          'type': 'object',
          'column_ordering': ['month', 'income'],
          'properties': {
            'month': {
              'description': 'Month',
              'type': 'array'
            },
            'income': {
              'description': 'Income',
              'type': 'array'
            }
          }
        }
      }
    }
  }
);

The following example extracts the table (income_table), the single value (title), and the array (employees) from the report.pdf file:

SELECT AI_EXTRACT(
  file => TO_FILE('@db.schema.files', 'report.pdf'),
  responseFormat => {
    'schema': {
      'type': 'object',
      'properties': {
        'income_table': {
          'description': 'Income for FY2026Q2',
          'type': 'object',
          'column_ordering': ['month', 'income'],
          'properties': {
            'month': {
              'description': 'Month',
              'type': 'array'
            },
            'income': {
              'description': 'Income',
              'type': 'array'
            }
          }
        },
        'title': {
          'description': 'What is the title of the document?',
          'type': 'string'
        },
        'employees': {
          'description': 'What are the surnames of employees?',
          'type': 'array'
        }
      }
    }
  }
);

Regional availability

See Regional availability.

Legal notices

For legal notices, see Snowflake AI and ML.