Rational Developer for System z
Enterprise COBOL for z/OS, Version 4.1, Programming Guide


Parsing XML documents encoded in UTF-8

When the XMLPARSE(XMLSS) compiler option is in effect, you can parse XML documents that are encoded in UTF-8 in a manner similar to parsing other XML documents, except that some additional requirements apply.

To parse an XML document that is encoded in UTF-8, you must specify CCSID 1208 in the ENCODING phrase of the XML PARSE statement, as shown in the following code fragment:

XML PARSE xml-document
    WITH ENCODING 1208
    PROCESSING PROCEDURE xml-event-handler
    . . .
END-XML

You define xml-document as an alphanumeric data item or alphanumeric group item in WORKING-STORAGE or LOCAL-STORAGE.
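A minimal sketch of such a definition, assuming a hypothetical fixed-size buffer (the 4096-byte length and the choice of WORKING-STORAGE are illustrative only and do not come from this guide):

```cobol
       WORKING-STORAGE SECTION.
      * Hypothetical size. The item holds the raw UTF-8 bytes of
      * the document, so its length is a byte count, not a
      * character count.
       01  xml-document    PIC X(4096).
```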

By default, the parser returns XML document fragments in the alphanumeric XML special registers XML-TEXT, XML-NAMESPACE, and XML-NAMESPACE-PREFIX. UTF-8 encodes each character in a variable number of bytes (one to four), whereas most COBOL operations on alphanumeric data assume a single-byte encoding, in which each character occupies exactly one byte. When you operate on UTF-8 characters as alphanumeric data, you must therefore ensure that the data is processed correctly. Avoid operations, such as reference modification and moves that involve truncation, that can split a multibyte character partway through its byte sequence. You cannot reliably use statements such as INSPECT to process multibyte characters in alphanumeric data.
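The following sketch illustrates the hazard and one possible way around it. It assumes a hypothetical five-byte sample value and hypothetical data names (utf8-text, utf16-text). It uses the NATIONAL-OF intrinsic function with CCSID 1208 to convert the UTF-8 bytes to UTF-16 before any character-level operation is applied:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UTF8DEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample: 'caf' plus U+00E9, which UTF-8
      * encodes in two bytes (X'C3A9'), so 4 characters
      * occupy 5 bytes.
       01  utf8-text      PIC X(5) VALUE X'636166C3A9'.
       01  utf16-text     PIC N(4) USAGE NATIONAL.
       PROCEDURE DIVISION.
      * Risky: utf8-text(1:4) would end with the lone lead
      * byte X'C3', because reference modification on
      * alphanumeric data counts bytes, not characters.
      * Safer: convert from CCSID 1208 to UTF-16 first, then
      * address whole characters in the national item.
           MOVE FUNCTION NATIONAL-OF(utf8-text, 1208)
             TO utf16-text
           DISPLAY FUNCTION DISPLAY-OF(utf16-text(4:1))
           GOBACK.
```

In the national item, reference modification counts national character positions, so position 4 addresses the accented character as a whole rather than one of its UTF-8 bytes.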

You can process UTF-8 document fragments more reliably by specifying the RETURNING NATIONAL phrase on the XML PARSE statement. With the RETURNING NATIONAL phrase, XML document fragments are efficiently converted to UTF-16 encoding and are returned to the application in the national special registers XML-NTEXT, XML-NNAMESPACE, and XML-NNAMESPACE-PREFIX. You can then process XML text fragments in national data items. (The UTF-16 encoding in national data items greatly facilitates Unicode processing in COBOL.)

The following code fragment illustrates the use of both the ENCODING phrase and the RETURNING NATIONAL phrase in parsing a UTF-8 XML document:

XML PARSE xml-document
    WITH ENCODING 1208 RETURNING NATIONAL
    PROCESSING PROCEDURE xml-event-handler
  ON EXCEPTION
     DISPLAY 'XML document error ' XML-CODE
     STOP RUN
  NOT ON EXCEPTION
     DISPLAY 'XML document was successfully parsed.'
END-XML
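
When RETURNING NATIONAL is specified, the processing procedure reads document fragments from the national special registers. The following is a minimal sketch of what xml-event-handler might look like. It assumes two hypothetical national data items, element-name-n and content-n, that are defined elsewhere in the program, and it handles only two of the possible XML events:

```cobol
       xml-event-handler.
      * Hypothetical handler. element-name-n and content-n are
      * assumed to be national items defined elsewhere, for
      * example PIC N(100) USAGE NATIONAL.
           EVALUATE XML-EVENT
             WHEN 'START-OF-ELEMENT'
      *        Element name, already converted to UTF-16
               MOVE XML-NTEXT TO element-name-n
             WHEN 'CONTENT-CHARACTERS'
      *        Character content, already converted to UTF-16
               MOVE XML-NTEXT TO content-n
             WHEN OTHER
               CONTINUE
           END-EVALUATE.
```

Because both the sending and receiving items are national, the moves operate on national character positions rather than on individual UTF-8 bytes.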
