Changes to XML parsing and emitting behavior (Pending)

Attention

This behavior change is in the 2025_01 bundle.

For the current status of the bundle, refer to Bundle History.

When this behavior change bundle is enabled, parsing and emitting XML content changes when using the COPY INTO <table> command with the XML file format and when calling the following functions:

Before the change:

XML parsing and emitting behavior:

  • Some queries that call the CHECK_XML function return a string with an error message.

  • Some queries that call the PARSE_XML function fail.

After the change:

XML parsing and emitting behavior:

  • Some queries that returned a string with an error message when calling the CHECK_XML function before the change now return NULL.

  • Some queries that failed when calling the PARSE_XML function before the change now succeed, and the function returns the parsed XML.

  • Queries with XML strings containing angle brackets or apostrophes return different results after the change.

  • Queries with XML strings containing white space or XML attributes relating to preserving white space return different results after the change.

The following sections provide more details about the changes.

Parsing XML content that contains processing instructions

The following example uses the PARSE_XML function to parse XML content with question marks in the processing instructions:

SELECT PARSE_XML('<?PITarget PIContent ??><mytag />') AS mytag;
Copy
Returned before the change::
100100 (22P02): Error parsing XML: prematurely terminated XML document in processing instructions, pos 33
Returned after the change::
+-----------------+
| MYTAG           |
|-----------------|
| <mytag></mytag> |
+-----------------+

Parsing XML content that contains angle brackets or apostrophes

The following example uses the PARSE_XML function to parse XML content that contains angle brackets and apostrophes in XML attribute values. After the change, apostrophes and angle brackets in the XML attribute values are properly escaped in the return value and in the emitted XML:

SELECT PARSE_XML('<mytag myattr="&lt;&gt;\'"/>') AS mytag;
Copy
Returned before the change::
+------------------------------+
| MYTAG                        |
|------------------------------|
| <mytag myattr="<>'"></mytag> |
+------------------------------+
Returned after the change::
+-----------------------------------------+
| MYTAG                                   |
|-----------------------------------------|
| <mytag myattr="&lt;&gt;&apos;"></mytag> |
+-----------------------------------------+

Parsing XML content that contains user-defined entities

The following example uses the PARSE_XML function to parse XML content that contains user-defined entities:

SELECT PARSE_XML('<!DOCTYPE doc [<!ENTITY placeholder "some text">]><doc>&placeholder;</doc>')
  AS placeholder;
Copy
Returned before the change::
100100 (22P02): Error parsing XML: unknown entity &placeholder;, pos 68
Returned after the change::
+-------------------------------------------------------------+
| PLACEHOLDER                                                 |
|-------------------------------------------------------------|
| <!DOCTYPE doc [<!ENTITY placeholder "some                   |
| text">]><doc>some text</doc>                                |
+-------------------------------------------------------------+

Parsing XML content that preserves white space

This change was made so the behavior in Snowflake matches the XML specification regarding preservation of whitespace:

  • Before the change, whitespace is preserved for the xsl:space="preserve" attribute. After the change, whitespace isn’t preserved for the xsl:space="preserve" attribute.

  • Before the change, whitespace isn’t preserved for the xml:space="preserve" attribute. After the change, whitespace is preserved for the xml:space="preserve" attribute.

The following example uses the PARSE_XML function to parse XML content and specifies the xsl:space="preserve" attribute:

SELECT PARSE_XML('<mytag xsl:space="preserve"> my content </mytag>')
  AS space_preserve;
Copy
Returned before the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
|--------------------------------------------------|
| <mytag xsl:space="preserve"> my content </mytag> |
+--------------------------------------------------+
Returned after the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
|--------------------------------------------------|
| <mytag xsl:space="preserve">my content</mytag>   |
+--------------------------------------------------+

The following example uses the PARSE_XML function to parse XML content and specifies the xml:space="preserve" attribute:

SELECT PARSE_XML('<mytag xml:space="preserve"> my content </mytag>')
  AS space_preserve;
Copy
Returned before the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
|--------------------------------------------------|
| <mytag xml:space="preserve">my content</mytag>   |
+--------------------------------------------------+
Returned after the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
|--------------------------------------------------|
| <mytag xml:space="preserve"> my content </mytag> |
+--------------------------------------------------+

Loading XML content that preserves white space

The following example loads data into a table using the COPY INTO <table> command. The PRESERVE_SPACE parameter is set to TRUE to preserve white space:

COPY INTO mytable
  FROM @my_xml_stage
  FILE_FORMAT = (TYPE = 'XML' PRESERVE_SPACE = TRUE);
Copy
Loaded content before the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
|--------------------------------------------------|
| <mytag xsl:space="preserve"> my content </mytag> |
+--------------------------------------------------+
Loaded content after the change::
+--------------------------------------------------+
| SPACE_PRESERVE                                   |
|--------------------------------------------------|
| <mytag xml:space="preserve"> my content </mytag> |
+--------------------------------------------------+

Before and after the change, the content preserves the white space, but the attribute changes from xsl:space="preserve" to xml:space="preserve".

Ref: 1862