When the XMLPARSE(XMLSS) compiler option is in effect, you can parse XML documents that are encoded in UTF-8 in a manner similar to parsing other XML documents, except that some additional requirements apply.
To parse an XML document that is encoded in UTF-8, you must specify CCSID 1208 in the ENCODING phrase of the XML PARSE statement, as shown in the following code fragment:
XML PARSE xml-document
    WITH ENCODING 1208
    PROCESSING PROCEDURE xml-event-handler
    . . .
END-XML
You define xml-document as an alphanumeric data item or alphanumeric group item in WORKING-STORAGE or LOCAL-STORAGE.
By default, the parser returns XML document fragments in the alphanumeric XML special registers XML-TEXT, XML-NAMESPACE, and XML-NAMESPACE-PREFIX. UTF-8 characters are encoded using a variable number of bytes per character. Most COBOL operations on alphanumeric data assume a single-byte encoding, where each character is encoded in one byte. When you operate on UTF-8 characters as alphanumeric data, you must ensure that the data is processed correctly. Avoid operations (such as reference modification and moves that involve truncation) that can split a multibyte character between bytes. You cannot reliably use statements such as INSPECT to process multibyte characters in alphanumeric data.
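The following minimal sketch illustrates the kind of operation to avoid. It is not taken from the product samples: the program name UTF8SPLT, the data names utf8-text and short-text, and the sample text are hypothetical. The five bytes in utf8-text are the UTF-8 encoding of "café", in which the final character occupies two bytes (X'C3A9').

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8SPLT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample: UTF-8 bytes for "cafe" with an acute
      * accent on the e: 63 61 66 C3 A9 (4 characters, 5 bytes).
       01  utf8-text   PIC X(5) VALUE X'636166C3A9'.
       01  short-text  PIC X(4).
       PROCEDURE DIVISION.
      * Truncating move: only 63 61 66 C3 is copied, so the
      * two-byte character is split and its lead byte is stranded.
           MOVE utf8-text TO short-text
      * Reference modification counts bytes, not characters, and
      * splits the same character in the same way.
           MOVE utf8-text(1:4) TO short-text
           GOBACK.
```

In both moves, short-text ends with an incomplete UTF-8 sequence, because lengths and offsets of alphanumeric data are counted in bytes rather than in characters.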
You can more reliably process UTF-8 document fragments by specifying the RETURNING NATIONAL phrase on the XML PARSE statement. With the RETURNING NATIONAL phrase, XML document fragments are efficiently converted to UTF-16 encoding and are returned to the application in the national special registers XML-NTEXT, XML-NNAMESPACE, and XML-NNAMESPACE-PREFIX. You can then process XML text fragments in national data items. (The UTF-16 encoding in national data items greatly facilitates Unicode processing in COBOL.)
The following code fragment illustrates the use of both the ENCODING phrase and the RETURNING NATIONAL phrase in parsing a UTF-8 XML document:
XML PARSE xml-document
    WITH ENCODING 1208 RETURNING NATIONAL
    PROCESSING PROCEDURE xml-event-handler
  ON EXCEPTION
    DISPLAY 'XML document error ' XML-CODE
    STOP RUN
  NOT ON EXCEPTION
    DISPLAY 'XML document was successfully parsed.'
END-XML
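As a fuller illustration, the following sketch places the same XML PARSE statement in a small program with a processing procedure. It assumes that the XMLPARSE(XMLSS) compiler option is in effect. The program name UTF8PARS, the data item content-n and its size, and the sample document (the UTF-8 bytes for an element named a whose content is a single accented character) are hypothetical and chosen only for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8PARS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical UTF-8 document: an <a> element whose content
      * is one two-byte character (X'C3A9'), 9 bytes in total.
       01  xml-document  PIC X(9)
                         VALUE X'3C613EC3A93C2F613E'.
      * National (UTF-16) item that receives element content.
       01  content-n     PIC N(20) USAGE NATIONAL.
       PROCEDURE DIVISION.
           XML PARSE xml-document
               WITH ENCODING 1208 RETURNING NATIONAL
               PROCESSING PROCEDURE xml-event-handler
             ON EXCEPTION
               DISPLAY 'XML document error ' XML-CODE
             NOT ON EXCEPTION
               DISPLAY 'XML document was successfully parsed.'
           END-XML
           GOBACK.
      * Processing procedure: the parser passes control here once
      * for each XML event.
       xml-event-handler.
           EVALUATE XML-EVENT
             WHEN 'CONTENT-CHARACTERS'
      *        With RETURNING NATIONAL, the fragment is in
      *        XML-NTEXT (UTF-16), not in XML-TEXT.
               MOVE XML-NTEXT TO content-n
               DISPLAY 'Content length in characters: '
                       FUNCTION LENGTH(XML-NTEXT)
             WHEN OTHER
               CONTINUE
           END-EVALUATE.
```

Because the fragment is held in a national data item, lengths and positions are counted in national character positions rather than in UTF-8 bytes, which avoids splitting a multibyte UTF-8 character.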
related references
XMLPARSE
XML-TEXT and XML-NTEXT
XML-NAMESPACE and XML-NNAMESPACE
XML-NAMESPACE-PREFIX and XML-NNAMESPACE-PREFIX
XML PARSE statement (Enterprise COBOL Language Reference)