The ILE C/C++ compilers support the following:
When EBCDIC wide characters are used, the CCSID of the EBCDIC characters depends on the CCSID of the LC_CTYPE category of the current locale. See Unicode Support for more information about Unicode characters.
The character conversion routines examine the CCSID setting for the LC_CTYPE category of the current locale to determine whether single-byte or multibyte characters are expected for the conversion from or to wide characters.
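In portable standard C, a program can ask the same "single-byte or multibyte?" question through the MB_CUR_MAX macro, which reports the largest number of bytes a character can occupy in the current LC_CTYPE locale. The sketch below assumes that MB_CUR_MAX on IBM i reflects the CCSID of the LC_CTYPE category in this way. The helper name is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the current LC_CTYPE locale may use more
 * than one byte per character (a multibyte locale), 0 otherwise. */
int locale_is_multibyte(void)
{
    /* MB_CUR_MAX is the maximum number of bytes in a multibyte character
     * for the LC_CTYPE category of the current locale. */
    return MB_CUR_MAX > 1;
}
```

A program normally calls setlocale(LC_CTYPE, "") (from <locale.h>) first, so that the answer reflects the environment's locale rather than the default "C" locale.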
The handling of wide character conversions (to and from single-byte or multibyte character strings) depends on the LOCALETYPE parameter value specified on the compilation command and on the shift state of the single-byte or multibyte character string. The mbtowc, mbstowcs, wctomb, and wcstombs functions maintain an internal shift state variable. The mbrtowc, mbsrtowcs, wcrtomb, and wcsrtombs functions instead allow the shift state variable to be passed as a parameter. The second set of functions is recommended because these functions are more versatile and are also threadsafe.
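The following sketch shows the recommended restartable style: the shift state lives in a caller-owned mbstate_t, so two threads (or two interleaved strings) never share hidden state. It uses only standard C functions; the helper name count_mb_chars is hypothetical.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the characters in a multibyte string using
 * mbrtowc(), which keeps the shift state in a caller-owned mbstate_t.
 * Returns (size_t)-1 if the string contains an invalid or truncated sequence. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);      /* start in the initial shift state */
    size_t remaining = strlen(s) + 1;     /* include the terminating null byte */
    size_t count = 0;

    for (;;) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, remaining, &state);
        if (n == 0)
            return count;                 /* converted the null wide character */
        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;            /* invalid or incomplete sequence */
        s += n;                           /* n includes any shift bytes consumed */
        remaining -= n;
        count++;
    }
}
```

Including the terminating null byte in the length lets a trailing shift-in byte be consumed together with the null character instead of being reported as an incomplete sequence.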
When converting from a single-byte CCSID to wide EBCDIC, the wide EBCDIC character is constructed by prefixing the single-byte character with a zero high-order byte. For example, the single-byte CCSID 37 character A (hexadecimal value 0xC1) has the hexadecimal value 0x00C1 when it is converted to a wide EBCDIC character.
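The zero-extension rule can be modelled in a few lines. This is a sketch of the rule described above, not the library's implementation; the function name is hypothetical, and wide EBCDIC values are represented as uint16_t for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper modelling the rule above: each single-byte EBCDIC
 * character becomes a 16-bit value whose high-order byte is zero.
 * Converts n bytes from src into dst, which must hold n elements. */
void sbcs_to_wide_ebcdic(const unsigned char *src, size_t n, uint16_t *dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (uint16_t)src[i];   /* 0xC1 ('A' in CCSID 37) -> 0x00C1 */
}
```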
When converting from a multibyte CCSID to wide EBCDIC, the conversion method depends on the shift state of the input string. In the initial shift state, characters are read exactly as if they were single-byte characters until a shift-out character (hexadecimal value 0x0E) is read. This character indicates a shift to the double-byte shift state. In the double-byte shift state, 2 bytes are read at a time: the first byte becomes the first (high-order) byte of the EBCDIC wide character, and the second byte becomes the second (low-order) byte. When the shift-in character (hexadecimal value 0x0F) is encountered, parsing returns to the initial shift state. For example, the multibyte string represented by the hexadecimal value C10E43DA0FC2 is translated to the EBCDIC wide character string with the hexadecimal value 00C143DA00C2.
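The shift-state parsing described above can be sketched as a small state machine. This is an illustrative model only; the function and macro names are hypothetical, and the real conversion routines also validate the double-byte code points against the CCSID, which this sketch does not do.

```c
#include <stddef.h>
#include <stdint.h>

#define EBCDIC_SO 0x0E   /* shift-out: enter the double-byte state */
#define EBCDIC_SI 0x0F   /* shift-in: return to the initial state */

/* Hypothetical decoder: converts n bytes of mixed single-/double-byte EBCDIC
 * in src to wide EBCDIC values in dst (which must hold at least n elements).
 * Returns the number of wide characters written, or (size_t)-1 if the input
 * ends in the middle of a double-byte character. */
size_t mixed_ebcdic_to_wide(const unsigned char *src, size_t n, uint16_t *dst)
{
    int double_byte = 0;     /* start in the initial (single-byte) shift state */
    size_t out = 0;
    size_t i = 0;

    while (i < n) {
        unsigned char b = src[i];
        if (!double_byte) {
            if (b == EBCDIC_SO)
                double_byte = 1;                 /* switch to 2-byte reads */
            else
                dst[out++] = (uint16_t)b;        /* zero high-order byte */
            i++;
        } else if (b == EBCDIC_SI) {
            double_byte = 0;                     /* back to 1-byte reads */
            i++;
        } else {
            if (i + 1 >= n)
                return (size_t)-1;               /* truncated double-byte pair */
            /* first byte -> high-order byte, second -> low-order byte */
            dst[out++] = (uint16_t)((src[i] << 8) | src[i + 1]);
            i += 2;
        }
    }
    return out;
}
```

Applied to the bytes C1 0E 43 DA 0F C2 from the example, the function produces the three values 0x00C1, 0x43DA, and 0x00C2.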
When converting from wide EBCDIC to a single-byte CCSID, if the character has a hexadecimal value greater than 0x00FF, EOF is returned; otherwise, the high-order byte is discarded and the low-order byte is returned. For example, the wide EBCDIC character with the hexadecimal value 0x00C1 is converted to the single-byte character whose hexadecimal value is 0xC1.
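The reverse rule is sketched below as a hypothetical helper, again representing wide EBCDIC values as uint16_t. It models the described behavior and is not the library routine itself.

```c
#include <stdint.h>
#include <stdio.h>   /* EOF */

/* Hypothetical helper modelling the rule above: values above 0x00FF have no
 * single-byte equivalent and yield EOF; otherwise the high-order byte is
 * dropped and the low-order byte is returned. */
int wide_ebcdic_to_sbcs(uint16_t wc)
{
    if (wc > 0x00FF)
        return EOF;
    return (int)(wc & 0xFF);   /* 0x00C1 -> 0xC1 */
}
```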
When converting from wide EBCDIC to a multibyte CCSID, the conversion method is determined by the shift state of the output string:
For example, the EBCDIC wide character string with the hexadecimal value 00C143DA00C2 is translated to a multibyte string with the hexadecimal value C10E43DA0FC2.
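The sketch below reproduces the example. The rules for this direction are given here only by the example, so the code assumes they mirror the decoding rules: values up to 0x00FF are written as one byte in the initial shift state, larger values are written as two bytes between a shift-out and a shift-in, and (also an assumption) a final shift-in is written if the string ends in the double-byte state. The function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EBCDIC_SO 0x0E   /* shift-out */
#define EBCDIC_SI 0x0F   /* shift-in */

/* Hypothetical encoder, assumed to be the inverse of the decoding rules.
 * dst must hold at least 3 * n + 1 bytes (worst case: alternating single-
 * and double-byte characters). Returns the number of bytes written. */
size_t wide_ebcdic_to_mixed(const uint16_t *src, size_t n, unsigned char *dst)
{
    int double_byte = 0;     /* output starts in the initial shift state */
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        uint16_t wc = src[i];
        if (wc <= 0x00FF) {
            if (double_byte) {               /* leave double-byte state */
                dst[out++] = EBCDIC_SI;
                double_byte = 0;
            }
            dst[out++] = (unsigned char)wc;
        } else {
            if (!double_byte) {              /* enter double-byte state */
                dst[out++] = EBCDIC_SO;
                double_byte = 1;
            }
            dst[out++] = (unsigned char)(wc >> 8);    /* high-order byte */
            dst[out++] = (unsigned char)(wc & 0xFF);  /* low-order byte */
        }
    }
    if (double_byte)
        dst[out++] = EBCDIC_SI;   /* assumption: end in the initial shift state */
    return out;
}
```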
If LOCALETYPE(*LOCALEUCS2) is specified on the compilation command, wide character values are 2-byte UCS-2 values. All conversions between UCS-2 strings and single-byte or multibyte strings are conducted as if the iconv() function were used. CCSID 13488 is used for the UCS-2 string, and the CCSID of the LC_CTYPE category of the current locale is used for the single-byte or multibyte string.
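The "as if the iconv() function were used" behavior can be illustrated with the POSIX iconv_open()/iconv()/iconv_close() calls. On IBM i the code sets would be identified by CCSID (the locale's CCSID and CCSID 13488); the sketch instead assumes glibc encoding names ("UTF-8" as the source and "UCS-2BE" as the UCS-2 target) purely so that it runs on a typical Linux system. The helper name is hypothetical.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: converts in_len bytes of text in from_code to
 * big-endian UCS-2 with iconv(). Returns the number of output bytes, or
 * (size_t)-1 on error. "UCS-2BE" and from_code are glibc-style names used
 * as stand-ins for CCSID 13488 and the LC_CTYPE CCSID. */
size_t narrow_to_ucs2(const char *from_code, const char *in, size_t in_len,
                      char *out, size_t out_size)
{
    iconv_t cd = iconv_open("UCS-2BE", from_code);   /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;       /* iconv() takes non-const input pointers */
    size_t in_left = in_len;
    char *outp = out;
    size_t out_left = out_size;

    size_t rc = iconv(cd, &inp, &in_left, &outp, &out_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_size - out_left;   /* bytes actually produced */
}
```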
If LOCALETYPE(*LOCALEUTF) is specified on the compilation command, wide character values are 4-byte UTF-32 values. All conversions between UTF-32 strings and single-byte or multibyte strings are conducted as if the iconv() function were used. Because UTF-32 is not supported by the iconv() function, conversions between a UTF-32 string and a single-byte or multibyte string use UTF-16 (CCSID 1200) as an intermediate encoding. Transformations between UTF-32 and UTF-16 are accomplished using the QlgTransformUCSData() API, and the iconv() API is used for the conversion between UTF-16 and the CCSID of the LC_CTYPE category of the current locale.
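To show what the UTF-32 to UTF-16 step involves, the following hypothetical helper performs the standard transformation, including surrogate pairs for code points above U+FFFF. It is a stand-in written for illustration; it is not the QlgTransformUCSData() API and does not reproduce that API's interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the UTF-32 -> UTF-16 step. dst must hold 2 * n
 * code units. Returns the number of units written, or (size_t)-1 if a value
 * is not a Unicode scalar value (a surrogate or greater than U+10FFFF). */
size_t utf32_to_utf16(const uint32_t *src, size_t n, uint16_t *dst)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = src[i];
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;                      /* not encodable */
        if (cp < 0x10000) {
            dst[out++] = (uint16_t)cp;              /* BMP: one code unit */
        } else {
            cp -= 0x10000;                          /* 20 bits remain */
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate */
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate */
        }
    }
    return out;
}
```

For example, U+0041 becomes the single unit 0x0041, and U+1F600 becomes the pair 0xD83D 0xDE00.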
Several routines, including fwprintf, vwprintf, vfwprintf, wprintf, fputwc, fputws, putwc, putwchar, and ungetwc can be used to write wide characters to a file. These routines are not available when either LOCALETYPE(*CLD) or SYSIFCOPT(*NOIFSIO) is specified on the compilation command.
If LOCALETYPE(*LOCALE) is specified on the compilation command, the wide characters that are written are assumed to be wide character equivalents of the code points in the file CCSID. The CCSID of the file is assumed to be a single-byte or multibyte EBCDIC CCSID.
If LOCALETYPE(*LOCALEUCS2) or LOCALETYPE(*LOCALEUTF) is specified on the compilation command, the wide characters that are being written are assumed to be Unicode characters. For LOCALETYPE(*LOCALEUCS2), they are assumed to be 2-byte UCS-2 characters. For LOCALETYPE(*LOCALEUTF), they are assumed to be 4-byte UTF-32 characters. If the file that is being written to is not one of the standard files, the Unicode characters are then written directly to the file as if the file had been opened for writing in binary mode. The CCSID of the file is assumed to be a Unicode CCSID that matches the locale setting. If the file that is being written to is a standard file, the Unicode input is converted to the CCSID of the job before being written to the file.
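A minimal sketch of writing wide characters with fputws() and fputwc() follows. It uses only standard C calls; the helper name is hypothetical, and how the characters are encoded in the file is governed by the LOCALETYPE rules above (on other platforms, by the locale), not by anything in the code.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: writes a wide string and a newline to the named file.
 * Returns 0 on success, -1 on failure. */
int write_wide_line(const char *path, const wchar_t *text)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* The first wide I/O call makes the stream wide-oriented. */
    int rc = 0;
    if (fputws(text, fp) == EOF || fputwc(L'\n', fp) == WEOF)
        rc = -1;

    if (fclose(fp) == EOF)
        rc = -1;
    return rc;
}
```

Once a stream is wide-oriented, the byte-oriented routines should not be used on it until it is closed or reopened.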
The non-wide character write routines (fprintf, vfprintf, vprintf, and printf) can also take a wide character as input.
In all cases, the wide characters are converted to multibyte character strings in the CCSID of the LC_CTYPE category of the current locale as if the wctomb function or the wcstombs function were used. The file CCSID is assumed to match the CCSID of the LC_CTYPE category of the current locale.
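In standard C, the %ls and %lc conversion specifiers are how the narrow printf family accepts wide input; the standard describes the conversion as if wcrtomb() were applied to each character. The sketch uses snprintf(), a member of the same family that is not in the list above, so that the result can be inspected in memory. The helper name is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: formats a wide string and a wide character with the
 * narrow printf family. %ls and %lc convert the wide input to multibyte
 * characters in the current LC_CTYPE locale. Returns the snprintf() result,
 * which is negative if a wide character cannot be converted. */
int format_wide(char *buf, size_t size, const wchar_t *ws, wchar_t wc)
{
    return snprintf(buf, size, "%ls=%lc", ws, (wint_t)wc);
}
```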
If LOCALETYPE(*LOCALEUTF) is specified on the compilation command and the file that is being written to is a standard file, the output will automatically be converted from the CCSID of the LC_CTYPE category of the current locale to the CCSID of the file (which usually matches the job CCSID).
The routines that can read wide characters from a file include fgetwc, fgetws, fwscanf, getwc, getwchar, vfwscanf, vwscanf, and wscanf. These routines are not available when either LOCALETYPE(*CLD) or SYSIFCOPT(*NOIFSIO) is specified on the compilation command.
If LOCALETYPE(*LOCALE) is specified on the compilation command, the wide characters read from the file are assumed to be EBCDIC wide character equivalents of the code points in the file CCSID.
If LOCALETYPE(*LOCALEUCS2) or LOCALETYPE(*LOCALEUTF) is specified on the compilation command, the input wide characters and the characters in the file are assumed to be Unicode characters. For LOCALETYPE(*LOCALEUCS2), they are assumed to be 2-byte UCS-2 characters. For LOCALETYPE(*LOCALEUTF), they are assumed to be 4-byte UTF-32 characters. If the file that is being read is not one of the standard files, the Unicode characters are read directly from the file as if the file had been opened in binary mode. The CCSID of the file is assumed to be a Unicode CCSID that matches the locale setting. If the file that is being read is a standard file, the input read from the file, which is in the job CCSID, is converted to the appropriate Unicode CCSID.
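Reading is the mirror image. The sketch below reads one line with fgetws(); it uses only standard C calls, the helper name is hypothetical, and the interpretation of the file's bytes follows the LOCALETYPE rules above (or the locale, on other platforms).

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: reads one line of wide characters (at most n - 1)
 * from the named file with fgetws(). Returns buf on success, or NULL on
 * error or end of file. */
wchar_t *read_wide_line(const char *path, wchar_t *buf, int n)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return NULL;
    wchar_t *result = fgetws(buf, n, fp);   /* stream becomes wide-oriented */
    fclose(fp);
    return result;
}
```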
The non-wide character read routines (fscanf, scanf, vfscanf, and vscanf) can produce a wide character as output.
In all cases, the wide characters are converted from multibyte character strings in the CCSID of the LC_CTYPE category of the current locale to the appropriate wide character type for the locale setting as if the mbtowc function or the mbstowcs function were used.
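In standard C, the %ls (and %lc) conversion specifiers let the narrow scanf family produce wide characters; the standard describes the conversion as if mbrtowc() were applied to the multibyte input. The sketch uses sscanf(), a member of the same family that is not in the list above, so it can read from memory. The helper name is hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: reads the first whitespace-delimited word of a narrow
 * string into a 16-element wide buffer. Returns 1 on success. */
int first_word_wide(const char *line, wchar_t out[16])
{
    /* The field width (15) leaves room for the terminating null wide char. */
    return sscanf(line, "%15ls", out);
}
```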