Categories:

Semi-structured Data Functions (Parsing)

PARSE_XML

Interprets an input string as an XML document, producing an OBJECT value. If the input is NULL, the output is NULL.

See also:

CHECK_XML

Syntax

PARSE_XML( <string_containing_xml> [ , <disable_auto_convert> ] )
Copy

Arguments

Required:

string_containing_xml

This expression should evaluate to a VARCHAR. The VARCHAR should contain valid XML.

Optional:

disable_auto_convert

Boolean expression that specifies whether or not the function should attempt to convert numeric and boolean values in string_containing_xml to Snowflake data types. (For details about this conversion, see Usage Notes below.)

  • If you do not want the function to convert these values, set this to TRUE. This has an effect that is similar to the DISABLE_AUTO_CONVERT parameter in CREATE FILE FORMAT.

  • If you want the function to convert these values, set this to FALSE or omit this argument.

Default: FALSE

Returns

The data type of the returned value is OBJECT. The OBJECT contains an internal representation of the XML.

Usage Notes

  • The content of every element in XML documents is text. PARSE_XML attempts to convert some XML data from text to native (Snowflake SQL) Data Types.

    • NUMERIC and BOOLEAN:

      PARSE_XML attempts to convert obviously numeric and Boolean values to the native representation in a way that printing these values back produces textually identical results. For example, when parsing decimal numbers, PARSE_XML attempts to preserve exactness of the representation by treating 123.45 as NUMBER(5,2), not as a DOUBLE. However:

      • Numbers in scientific notation (i.e. 1.2345e+02) or numbers that cannot be stored as fixed-point decimals due to range or scale limitations are stored as DOUBLE.

      • If the content of an XML element is a number with digits after the decimal point, then PARSE_XML might truncate trailing zeros.

      If you do not want the function to perform this conversion, pass TRUE for the disable_auto_convert argument.

    • TIMESTAMP, DATE, TIME, BINARY:

      Because XML does not represent values such as TIMESTAMP, DATE, TIME, or BINARY natively, these have to be represented as strings in XML. PARSE_XML does not automatically recognize these values; they are retained as strings, so users must convert the values from strings to native SQL data types if needed.

  • XML attributes are an unordered collection of name/value pairs. The PARSE_XML function does not necessarily preserve order. For example, converting text to XML and back to text might result in a string that contains the original information in a different order.

  • You might see changes in whitespace between elements when converting from string to XML.

Examples

The following example demonstrates how to use the PARSE_XML function to convert a string of XML to an OBJECT that can be inserted into an OBJECT column:

CREATE TABLE xtab (v OBJECT);

INSERT INTO xtab SELECT PARSE_XML(column1) AS v
  FROM VALUES ('<a/>'), ('<a attr="123">text</a>'), ('<a><b>X</b><b>Y</b></a>');

SELECT * FROM xtab;

+-------------------------+
|           V             |
+-------------------------+
| <a></a>                 |
| <a attr="123">text</a>  |
| <a><b>X</b><b>Y</b></a> |
+-------------------------+
Copy

The following example demonstrates the differences between using and disabling the conversion of numeric values. In this example, when the conversion is not disabled, the function interprets a number in scientific notation as a DOUBLE.

SELECT PARSE_XML('<test>22257e111</test>'), PARSE_XML('<test>22257e111</test>', TRUE);

+-------------------------------------+-------------------------------------------+
| PARSE_XML('<TEST>22257E111</TEST>') | PARSE_XML('<TEST>22257E111</TEST>', TRUE) |
|-------------------------------------+-------------------------------------------|
| <test>2.225700000000000e+115</test> | <test>22257e111</test>                    |
+-------------------------------------+-------------------------------------------+
Copy