Categories:

Semi-structured and structured data functions (Extraction)

XMLGET¶

Extracts an XML element object (often referred to as simply a tag) from the content of the outer XML element based on the name and instance number of the specified tag.

(Note that an XML tag is not the same as a Snowflake data governance tag.)

  • If any argument of XMLGET is NULL, the result is NULL.

  • If the tag instance is not found, the result is NULL.

Syntax¶

XMLGET( <expression> , <tag_name> [ , <instance_number> ] )
Copy

Arguments¶

expression

The expression from which to extract the element.

The expression must evaluate to an OBJECT (or a VARIANT containing an OBJECT). The OBJECT must contain valid XML in the internal format that Snowflake supports. Typically, that means that the OBJECT was produced by one of the following:

  • Calling the PARSE_XML function.

  • Loading the data (e.g. via the COPY INTO <table> command) and specifying that the data is in XML format.

The XMLGET function does not operate directly on a VARCHAR expression even if that VARCHAR contains valid XML text.

tag_name

The name of an XML tag stored in the expression.

instance_number

If the XML contains multiple instances of tag_name, then use instance_number to specify which instance to retrieve. Like an array index, the instance_number is 0-based, not 1-based.

instance_number can be omitted, in which case the default value 0 is used.

Returns¶

The data type of the returned value is OBJECT.

See the Usage Notes for more details.

Usage notes¶

  • The result of XMLGET is not the content of the tag (i.e. the text between the tags), but the entire element (the opening tag, content, and closing tag). The return value is an OBJECT. From this OBJECT, you can extract the tag name, the tag’s attribute values, and the contents of the element (including nested tags) by using the GET function:

    • To extract attribute values, use GET(tag, '@attrname').

    • To extract the content, use GET(tag, '$').

    • To extract the tag name, use GET(tag, '@').

  • You can extract nested tags by nesting XMLGET() calls, e.g.:

    select xmlget(xmlget(my_xml_column, 'my_tag'), 'my_inner_tag') ...;
    
    Copy
  • Positions of the inner tags in the content can be obtained by using GET(tag, 'inner-tag-name'); if the content contains multiple elements, the positions will be represented as an array.

  • You cannot use XMLGET to extract the outermost element. To get the outermost element, simply select the expression itself.

Examples¶

The following example creates a table with an OBJECT that contains XML, then uses XMLGET() to extract elements from that OBJECT.

CREATE TABLE xml_demo (ID INTEGER, object_col OBJECT);
INSERT INTO xml_demo (id, object_col)
    SELECT 1001,
        PARSE_XML('<level1> 1 <level2> 2 <level3> 3A </level3> <level3> 3B </level3> </level2> </level1>');
Copy
SELECT object_col,
       XMLGET(object_col, 'level2'),
       XMLGET(XMLGET(object_col, 'level2'), 'level3', 1)
    FROM xml_demo;
+-------------------------+------------------------------+---------------------------------------------------+
| OBJECT_COL              | XMLGET(OBJECT_COL, 'LEVEL2') | XMLGET(XMLGET(OBJECT_COL, 'LEVEL2'), 'LEVEL3', 1) |
|-------------------------+------------------------------+---------------------------------------------------|
| <level1>                | <level2>                     | <level3>3B</level3>                               |
|   1                     |   2                          |                                                   |
|   <level2>              |   <level3>3A</level3>        |                                                   |
|     2                   |   <level3>3B</level3>        |                                                   |
|     <level3>3A</level3> | </level2>                    |                                                   |
|     <level3>3B</level3> |                              |                                                   |
|   </level2>             |                              |                                                   |
| </level1>               |                              |                                                   |
+-------------------------+------------------------------+---------------------------------------------------+
Copy

This example shows how to use GET() with XMLGET() to retrieve the content of an element. In this example, the level2 tag contains three items (text and two nested tags), so GET returns these items in an ARRAY. The nested tags are represented by OBJECTs (key-value pairs). The @ property contains the nested tag name and the $ property contains the nested tag contents.

SELECT object_col,
       GET(XMLGET(object_col, 'level2'), '$')
    FROM xml_demo;
+-------------------------+----------------------------------------+
| OBJECT_COL              | GET(XMLGET(OBJECT_COL, 'LEVEL2'), '$') |
|-------------------------+----------------------------------------|
| <level1>                | [                                      |
|   1                     |   2,                                   |
|   <level2>              |   {                                    |
|     2                   |     "$": "3A",                         |
|     <level3>3A</level3> |     "@": "level3"                      |
|     <level3>3B</level3> |   },                                   |
|   </level2>             |   {                                    |
| </level1>               |     "$": "3B",                         |
|                         |     "@": "level3"                      |
|                         |   }                                    |
|                         | ]                                      |
+-------------------------+----------------------------------------+
Copy

This example shows how to use GET() with XMLGET() to retrieve an attribute of a tag:

INSERT INTO xml_demo (id, object_col)
    SELECT 1002,
        PARSE_XML('<level1> 1 <level2 an_attribute="my attribute"> 2 </level2> </level1>');
Copy
SELECT object_col,
       GET(XMLGET(object_col, 'level2'), '@an_attribute')
    FROM xml_demo
    WHERE ID = 1002;
+--------------------------------------------------+----------------------------------------------------+
| OBJECT_COL                                       | GET(XMLGET(OBJECT_COL, 'LEVEL2'), '@AN_ATTRIBUTE') |
|--------------------------------------------------+----------------------------------------------------|
| <level1>                                         | "my attribute"                                     |
|   1                                              |                                                    |
|   <level2 an_attribute="my attribute">2</level2> |                                                    |
| </level1>                                        |                                                    |
+--------------------------------------------------+----------------------------------------------------+
Copy