UVALID returns a FIXED BIN(31) value that is zero if a string contains valid UTF data.
Otherwise, the value is the index of the first invalid element in the string.
 >>-UVALID(x)---------------------------------------------------><
x
    Expression that must have CHARACTER or WIDECHAR type.
If x has CHARACTER type,
then UVALID(x) will return 0 if the string contains valid UTF-8 data,
and otherwise it will return the index of the byte where the first invalid UTF-8 data starts.
If x has WIDECHAR type,
then UVALID(x) will return 0 if the string contains valid UTF-16 data,
and otherwise it will return the index of the widechar where the first invalid UTF-16 data starts.
Note that UVALID indicates only whether the string is well-formed UTF data according to the rules below.
It does not indicate whether the encoded values have actually been assigned to represent any particular
character.
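The distinction between well-formed and assigned can be illustrated outside PL/I. The following is a small Python sketch, not a UVALID call: it uses U+FFFF, which Unicode reserves as a noncharacter, and shows that its UTF-8 encoding is still well-formed. The variable names are illustrative only, and Python's decoder is assumed here to be a reasonable stand-in for the well-formedness rules described on this page.

```python
import unicodedata

# U+FFFF is a Unicode noncharacter: it is never assigned to a character.
ch = "\uffff"

# Its UTF-8 form is 'ef'x 'bf'x 'bf'x, a well-formed 3-byte sequence.
encoded = ch.encode("utf-8")

# General category 'Cn' means "not assigned".
category = unicodedata.category(ch)

# A strict UTF-8 decode succeeds, because well-formedness is all it checks.
round_trip = encoded.decode("utf-8")
```

Under the rules below, a lead byte of 'ef'x followed by two bytes in the range '80'x to 'bf'x is valid, so a string holding these three bytes would be expected to pass validation even though no character is assigned to the value.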
For UTF-8 data, the validity of the byte that starts each character depends on its range as follows:
- '00'x - '7f'x, it is valid
- '80'x - 'c1'x, it is invalid
- 'c2'x - 'df'x, it is valid if followed by a second byte and if that byte is in the range '80'x to 'bf'x
- 'e0'x - 'ef'x, it is valid if followed by 2 more bytes that satisfy the following:
  - when the first byte is 'e0'x,
    the second and third bytes must be in the ranges 'a0'x to 'bf'x and '80'x to 'bf'x, respectively.
  - when the first byte is in the range 'e1'x to 'ec'x,
    the second and third bytes must both be in the range '80'x to 'bf'x.
  - when the first byte is 'ed'x,
    the second and third bytes must be in the ranges '80'x to '9f'x and '80'x to 'bf'x, respectively.
  - when the first byte is in the range 'ee'x to 'ef'x,
    the second and third bytes must both be in the range '80'x to 'bf'x.
- 'f0'x - 'f4'x, it is valid if followed by 3 more bytes that satisfy the following:
  - when the first byte is 'f0'x,
    the second, third and fourth bytes must be in the ranges '90'x to 'bf'x, '80'x to 'bf'x and '80'x to 'bf'x, respectively.
  - when the first byte is in the range 'f1'x to 'f3'x,
    the second, third and fourth bytes must all be in the range '80'x to 'bf'x.
  - when the first byte is 'f4'x,
    the second, third and fourth bytes must be in the ranges '80'x to '8f'x, '80'x to 'bf'x and '80'x to 'bf'x, respectively.
- 'f5'x - 'ff'x, it is invalid
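The UTF-8 rules above can be modeled in a few lines of Python. This is a minimal sketch for checking one's understanding of the rules, not the PL/I implementation. The function name uvalid_utf8 and the helper trailing_ranges are hypothetical, and the sketch assumes that the reported index is the 1-based position of the first byte of the offending sequence.

```python
CONT = (0x80, 0xBF)  # the usual range for a following byte


def trailing_ranges(lead):
    """Return the allowed range for each byte after `lead`,
    or None if `lead` cannot start a sequence."""
    if lead <= 0x7F:
        return []
    if 0xC2 <= lead <= 0xDF:
        return [CONT]
    if lead == 0xE0:
        return [(0xA0, 0xBF), CONT]
    if 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
        return [CONT, CONT]
    if lead == 0xED:
        return [(0x80, 0x9F), CONT]
    if lead == 0xF0:
        return [(0x90, 0xBF), CONT, CONT]
    if 0xF1 <= lead <= 0xF3:
        return [CONT, CONT, CONT]
    if lead == 0xF4:
        return [(0x80, 0x8F), CONT, CONT]
    return None  # '80'x - 'c1'x and 'f5'x - 'ff'x


def uvalid_utf8(data: bytes) -> int:
    """Return 0 if `data` follows the rules, else the 1-based index of
    the first byte of the first invalid sequence (assumed convention)."""
    i = 0
    while i < len(data):
        ranges = trailing_ranges(data[i])
        if ranges is None:
            return i + 1
        tail = data[i + 1 : i + 1 + len(ranges)]
        if len(tail) < len(ranges):
            return i + 1  # sequence is cut off by the end of the string
        if any(not lo <= b <= hi for b, (lo, hi) in zip(tail, ranges)):
            return i + 1
        i += 1 + len(ranges)
    return 0
```

Under these assumptions, uvalid_utf8(b"A\xc3") gives 2, because the 'c3'x byte at position 2 requires a second byte that is not present, and uvalid_utf8(b"\xed\xa0\x80") gives 1, because 'a0'x is outside the range '80'x to '9f'x allowed after 'ed'x.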
For UTF-16 data, the validity of a widechar varies as follows according to its range:
- '0000'wx - '007f'wx, it is valid and would be 1 byte if UTF-8
- '0080'wx - '07ff'wx, it is valid and would be 2 bytes if UTF-8
- '0800'wx - 'd7ff'wx, it is valid and would be 3 bytes if UTF-8
- 'd800'wx - 'dbff'wx, it is valid if followed by a second widechar with a value of at least 'dc00'wx.
  The two widechars form a Unicode surrogate pair and would be 4 bytes if UTF-8
- 'dc00'wx - 'ffff'wx, it is valid and would be 3 bytes if UTF-8
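The UTF-16 rules above can be modeled in the same way. This is a sketch that follows the list literally, not the PL/I implementation. The function name uvalid_utf16 is hypothetical, the input is assumed to be a sequence of 16-bit integer values (one per widechar), and the reported index is assumed to be the 1-based position of the offending widechar. Because the list states only that the second widechar must be at least 'dc00'wx, and treats a widechar in the range 'dc00'wx to 'ffff'wx as valid on its own, the sketch does the same.

```python
def uvalid_utf16(units) -> int:
    """Return 0 if `units` follows the rules, else the 1-based index of
    the first invalid widechar (assumed convention)."""
    i = 0
    n = len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # First half of a surrogate pair: it needs a following
            # widechar with a value of at least 'dc00'wx.
            if i + 1 >= n or units[i + 1] < 0xDC00:
                return i + 1
            i += 2  # consume both widechars of the pair
        else:
            # '0000'wx - 'd7ff'wx and 'dc00'wx - 'ffff'wx are valid alone.
            i += 1
    return 0
```

Under these assumptions, uvalid_utf16([0x0041, 0xD83D]) gives 2, because the widechar 'd83d'wx at position 2 is not followed by a second widechar, while uvalid_utf16([0xD83D, 0xDE00]) gives 0.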