Rational Developer for System z
COBOL for Windows, Version 7.5, Programming Guide


Unicode and the encoding of language characters

COBOL for Windows provides basic runtime support for Unicode, which can handle tens of thousands of characters that cover all commonly used characters and symbols in the world.

A character set is a defined set of characters, but is not associated with a coded representation. A coded character set (also referred to in this documentation as a code page) is a set of unambiguous rules that relate the characters of the set to their coded representation. Each code page has a name and is like a table that sets up the symbols for representing a character set; each symbol is associated with a unique bit pattern, or code point. Each code page also has a coded character set identifier (CCSID), which is a value from 1 to 65,536.
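The idea that a code page is a table from characters to code points can be illustrated outside COBOL. The following Python sketch is only an illustration, not part of COBOL for Windows: the helper name `code_point` is hypothetical, and `cp1252`, `cp437`, and `cp037` are Python's codec names for two ASCII-based code pages and one EBCDIC code page.

```python
def code_point(char: str, code_page: str) -> int:
    """Return the single-byte code point that code_page assigns to char."""
    encoded = char.encode(code_page)
    if len(encoded) != 1:
        raise ValueError(f"{char!r} is not a single-byte character in {code_page}")
    return encoded[0]


if __name__ == "__main__":
    # The same character can have a different code point in each code page.
    for page in ("cp1252", "cp437", "cp037"):
        print(page, hex(code_point("A", page)))
```

The letter A has the same code point in the two ASCII-based code pages but a different one in the EBCDIC code page, which is why data must be interpreted with the code page in which it was encoded.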

Unicode has several encoding schemes, called Unicode Transformation Formats (UTFs), such as UTF-8, UTF-16, and UTF-32. COBOL for Windows uses UTF-16 in little-endian format (CCSID 1202) as the representation for national literals and for data items that have USAGE NATIONAL.

UTF-8 represents ASCII invariant characters a-z, A-Z, 0-9, and certain special characters such as ' @ , . + - = / * ( ) the same way that they are represented in ASCII. UTF-16 represents these characters as NX'00nn', where X'nn' is the representation of the character in ASCII.

For example, the string 'ABC' is represented in UTF-16 as NX'004100420043'. In UTF-8, 'ABC' is represented as X'414243'.
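The byte values in this example can be reproduced with a short Python sketch. It is an illustration only and assumes Python's standard `utf-8`, `utf-16-be`, and `utf-16-le` codecs; the helper name `hex_bytes` is hypothetical. The hexadecimal digits quoted above correspond to the values of the 2-byte encoding units, which is also the byte sequence of the big-endian form; the little-endian form holds the same units with the two bytes of each unit swapped.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Return the encoded bytes of text as an uppercase hexadecimal string."""
    return text.encode(encoding).hex().upper()


if __name__ == "__main__":
    for encoding in ("utf-8", "utf-16-be", "utf-16-le"):
        print(encoding, hex_bytes("ABC", encoding))
```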

One or more encoding units are used to represent a character from a coded character set. For UTF-16, an encoding unit takes 2 bytes of storage. Any character defined in any EBCDIC, ASCII, or EUC code page is represented in one UTF-16 encoding unit when the character is converted to the national data representation.
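As a rough check of the storage rule, the following Python sketch counts UTF-16 encoding units by dividing the encoded length by 2 bytes. The helper name `utf16_units` is hypothetical. The last case, a character outside the Basic Multilingual Plane that needs two encoding units, is a general property of UTF-16 and is not taken from this guide.

```python
def utf16_units(text: str) -> int:
    """Count the 2-byte UTF-16 encoding units needed to store text."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    print(utf16_units("A"))             # ASCII letter
    print(utf16_units("\u00e9"))        # Latin small letter e with acute
    print(utf16_units("\u65e5\u672c"))  # two CJK characters
    print(utf16_units("\U0001D11E"))    # musical symbol G clef (supplementary)
```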

Cross-platform considerations: COBOL for Windows supports UTF-16 in little-endian format in national data. Enterprise COBOL for z/OS and COBOL for AIX support UTF-16 in big-endian format (UTF-16BE) in national data. If you are porting Unicode data that is encoded in UTF-16BE representation to COBOL for Windows from another platform, you must convert that data to UTF-16 in little-endian format to process the data as national data. With COBOL for Windows, you can perform such conversions by using the NATIONAL-OF intrinsic function.
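At the byte level, converting between the two formats amounts to swapping the two bytes of every encoding unit. The following Python sketch shows only that effect on raw data; it is not the COBOL mechanism, which is the NATIONAL-OF intrinsic function named above. The function name `utf16be_to_utf16le` is hypothetical.

```python
def utf16be_to_utf16le(data: bytes) -> bytes:
    """Swap the two bytes of each 2-byte UTF-16 encoding unit."""
    if len(data) % 2 != 0:
        raise ValueError("UTF-16 data must contain an even number of bytes")
    swapped = bytearray(data)
    # Even positions receive the low-order bytes, odd positions the high-order bytes.
    swapped[0::2], swapped[1::2] = data[1::2], data[0::2]
    return bytes(swapped)


if __name__ == "__main__":
    big_endian = bytes.fromhex("004100420043")  # 'ABC' in UTF-16BE
    print(utf16be_to_utf16le(big_endian).hex().upper())
```

Because the swap is applied to each encoding unit independently, the same function applied to little-endian data yields the big-endian form.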

related tasks
Converting to or from national (Unicode) representation

related references
Storage of national data
Locales and code pages that are supported
Character sets and code pages (COBOL for Windows Language Reference)



Copyright IBM Corporation 1996, 2008.