[jira] [Updated] (XERCESC-2120) DOM Serialization does not correctly validate Surrogate Pairs

JIRA xerces-c-dev@xml.apache.org

     [ https://issues.apache.org/jira/browse/XERCESC-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Blackton updated XERCESC-2120:
-------------------------------------
    Description:
When attempting to write an XML document containing valid UTF-16 surrogate pairs, an error occurs during validation, which causes the write to fail.

It appears as though this issue was introduced with https://issues.apache.org/jira/browse/XERCESC-1854 in the following commit http://svn.apache.org/viewvc/xerces/c/trunk/src/xercesc/dom/impl/DOMLSSerializerImpl.cpp?r1=768978&r2=1226891.

I have supplied a reproducible test case and a potential patch. The string validator should be responsible for determining whether a code unit is part of a surrogate pair. However, I would also argue that this may not be the right place to perform string validation, since a failure here leaves the output document in an inconsistent (half-written) state.
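
To illustrate both points, here is a minimal, self-contained sketch of surrogate-aware validation. It is not the attached patch and does not use the Xerces-C++ API: the names isHighSurrogate, isLowSurrogate, isXml10BmpChar, isValidXmlUtf16 and appendIfValid are hypothetical, and std::u16string stands in for the serializer's XMLCh buffers. It assumes the XML 1.0 Char production and checks the whole string before any output is produced, so a rejected string leaves the output untouched.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; not part of Xerces-C++.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// XML 1.0 Char production restricted to single (non-surrogate) code units:
// #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
inline bool isXml10BmpChar(char16_t c) {
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD);
}

// Returns true if every code unit is either a legal XML 1.0 character or
// belongs to a well-formed high/low surrogate pair. Any well-formed pair
// decodes to a code point in [#x10000-#x10FFFF], which XML 1.0 allows.
bool isValidXmlUtf16(const std::u16string& s) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        const char16_t c = s[i];
        if (isHighSurrogate(c)) {
            // A high surrogate must be followed by a low surrogate.
            if (i + 1 >= s.size() || !isLowSurrogate(s[i + 1])) {
                return false;
            }
            ++i; // consume the low surrogate as part of the pair
        } else if (!isXml10BmpChar(c)) {
            // Covers lone low surrogates and illegal control characters.
            return false;
        }
    }
    return true;
}

// Validate the complete string first, then append. On failure nothing is
// written, so the output is never left half-written.
bool appendIfValid(std::u16string& out, const std::u16string& text) {
    if (!isValidXmlUtf16(text)) {
        return false;
    }
    out += text;
    return true;
}
```

The key design choice in this sketch is that the pair check lives inside the validator, which looks one code unit ahead, rather than testing each code unit in isolation.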

  was:
When attempting to write an XML document containing valid UTF-16 surrogate pairs, an error occurs during validation, which causes the write to fail.

It appears as though this issue was introduced with https://issues.apache.org/jira/browse/XERCESC-1854 in the following commit http://svn.apache.org/viewvc/xerces/c/trunk/src/xercesc/dom/impl/DOMLSSerializerImpl.cpp?r1=768978&r2=1226891.

I have supplied a reproducible test case and a potential patch. The string validator should be responsible for determining whether a code unit is part of a surrogate pair.


> DOM Serialization does not correctly validate Surrogate Pairs
> -------------------------------------------------------------
>
>                 Key: XERCESC-2120
>                 URL: https://issues.apache.org/jira/browse/XERCESC-2120
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: DOM
>    Affects Versions: 3.2.0
>            Reporter: Andrew Blackton
>         Attachments: DOMCharacterValidationTest.cpp, DomStringValidation.patch
>
>



