Changes to XML parsing and emitting behavior (Pending)¶
Attention
This behavior change is in the 2025_01 bundle.
For the current status of the bundle, refer to Bundle History.
When this behavior change bundle is enabled, parsing and emitting XML content changes when using the COPY INTO <table> command with the XML file format and when calling the following functions:
- Before the change:
XML parsing and emitting behavior:
Some queries that call the CHECK_XML function return a string with an error message.
Some queries that call the PARSE_XML function fail.
- After the change:
XML parsing and emitting behavior:
Some queries that returned a string with an error message when calling the CHECK_XML function before the change now return NULL.
Some queries that failed when calling the PARSE_XML function before the change now succeed, and the function returns the parsed XML.
Queries with XML strings containing angle brackets or apostrophes return different results after the change.
Queries with XML strings containing white space or XML attributes relating to preserving white space return different results after the change.
The following sections provide more details about the changes.
Parsing XML content that contains processing instructions¶
The following example uses the PARSE_XML function to parse XML content with question marks in the processing instructions:
SELECT PARSE_XML('<?PITarget PIContent ??><mytag />') AS mytag;
- Returned before the change::
100100 (22P02): Error parsing XML: prematurely terminated XML document in processing instructions, pos 33
- Returned after the change::
+-----------------+ | MYTAG | |-----------------| | <mytag></mytag> | +-----------------+
Parsing XML content that contains angle brackets or apostrophes¶
The following example uses the PARSE_XML function to parse XML content that contains angle brackets and apostrophes in XML attribute values. After the change, apostrophes and angle brackets in the XML attribute values are properly escaped in the return value and in the emitted XML:
SELECT PARSE_XML('<mytag myattr="<>\'"/>') AS mytag;
- Returned before the change::
+------------------------------+ | MYTAG | |------------------------------| | <mytag myattr="<>'"></mytag> | +------------------------------+
- Returned after the change::
+-----------------------------------------+ | MYTAG | |-----------------------------------------| | <mytag myattr="<>'"></mytag> | +-----------------------------------------+
Parsing XML content that contains user-defined entities¶
The following example uses the PARSE_XML function to parse XML content that contains user-defined entities:
SELECT PARSE_XML('<!DOCTYPE doc [<!ENTITY placeholder "some text">]><doc>&placeholder;</doc>')
AS placeholder;
- Returned before the change::
100100 (22P02): Error parsing XML: unknown entity &placeholder;, pos 68
- Returned after the change::
+-------------------------------------------------------------+ | PLACEHOLDER | |-------------------------------------------------------------| | <!DOCTYPE doc [<!ENTITY placeholder "some | | text">]><doc>some text</doc> | +-------------------------------------------------------------+
Parsing XML content that preserves white space¶
This change was made so the behavior in Snowflake matches the XML specification regarding preservation of whitespace:
Before the change, whitespace is preserved for the
xsl:space="preserve"
attribute. After the change, whitespace isn’t preserved for thexsl:space="preserve"
attribute.Before the change, whitespace isn’t preserved for the
xml:space="preserve"
attribute. After the change, whitespace is preserved for thexml:space="preserve"
attribute.
The following example uses the PARSE_XML function to parse XML content and specifies the xsl:space="preserve"
attribute:
SELECT PARSE_XML('<mytag xsl:space="preserve"> my content </mytag>')
AS space_preserve;
- Returned before the change::
+--------------------------------------------------+ | SPACE_PRESERVE | |--------------------------------------------------| | <mytag xsl:space="preserve"> my content </mytag> | +--------------------------------------------------+
- Returned after the change::
+--------------------------------------------------+ | SPACE_PRESERVE | |--------------------------------------------------| | <mytag xsl:space="preserve">my content</mytag> | +--------------------------------------------------+
The following example uses the PARSE_XML function to parse XML content and specifies the xml:space="preserve"
attribute:
SELECT PARSE_XML('<mytag xml:space="preserve"> my content </mytag>')
AS space_preserve;
- Returned before the change::
+--------------------------------------------------+ | SPACE_PRESERVE | |--------------------------------------------------| | <mytag xml:space="preserve">my content</mytag> | +--------------------------------------------------+
- Returned after the change::
+--------------------------------------------------+ | SPACE_PRESERVE | |--------------------------------------------------| | <mytag xml:space="preserve"> my content </mytag> | +--------------------------------------------------+
Loading XML content that preserves white space¶
The following example loads data into a table using the COPY INTO <table> command. The PRESERVE_SPACE parameter is set to TRUE to preserve white space:
COPY INTO mytable
FROM @my_xml_stage
FILE_FORMAT = (TYPE = 'XML' PRESERVE_SPACE = TRUE);
- Loaded content before the change::
+--------------------------------------------------+ | SPACE_PRESERVE | |--------------------------------------------------| | <mytag xsl:space="preserve"> my content </mytag> | +--------------------------------------------------+
- Loaded content after the change::
+--------------------------------------------------+ | SPACE_PRESERVE | |--------------------------------------------------| | <mytag xml:space="preserve"> my content </mytag> | +--------------------------------------------------+
Before and after the change, the content preserves the white space, but the attribute changes from
xsl:space="preserve"
to xml:space="preserve"
.
Ref: 1862