Unicode Functions

Note:

Unicode functions are allowed only when Production Reporting is converting to Unicode internally.

Table 51. Unicode Functions

Function	Description
lengthp	Returns the string length in print positions. Half-width characters take one print position, full-width characters take two, and combining characters take zero. Syntax: dst_var = lengthp(source_value), where source_value = date or text literal, column, variable, or expression; dst_var = decimal, float, or integer variable. Example: let #printLen = lengthp($string)
lengtht	Returns the string length in bytes when converted (transformed) to a specified encoding. Encoding names are the same as those allowed in OPEN or in SQR.INI. String and column variables can be used in place of the literal encoding name. Syntax: dst_var = lengtht(source_value, encoding_value), where source_value = date or text literal, column, variable, or expression; encoding_value = text literal, column, variable, or expression; dst_var = decimal, float, or integer variable. Example: let #sjisLen = lengtht($string, 'shift-jis')
substrp	Returns a substring of a given string, starting at a specified print position and extending for a specified print length. When #printPos falls in the middle of a full-width character, Production Reporting "rounds up" to the next character. When #printLen ends in a partial character, Production Reporting "rounds down" to the previous character. Syntax: dst_var = substrp(source_value, offset_value, length_value), where source_value = date or text literal, column, variable, or expression; offset_value = decimal, float, or integer literal, column, variable, or expression (the value is always converted to integer); length_value = decimal, float, or integer literal, column, variable, or expression (the value is always converted to integer); dst_var = text variable. Example: let $sub = substrp(&string, #printPos, #printLen)
substrt	Returns a Unicode string equivalent to a byte-level substring of a given string after converting (transforming) the given string to a given encoding. If the substring of the converted string yields a partial character, that character is truncated. Syntax: dst_var = substrt(source_value, offset_value, length_value, encoding_value), where source_value = date or text literal, column, variable, or expression; offset_value = decimal, float, or integer literal, column, variable, or expression (the value is always converted to integer); length_value = decimal, float, or integer literal, column, variable, or expression (the value is always converted to integer); encoding_value = text literal, column, variable, or expression; dst_var = text variable. Example: let $sjisPrep = substrt($string, 1, 10, 'Shift-JIS')
transform	Returns a Unicode string that is the specified transform of a given string. Syntax: dst_var = transform(source_value, transform_value), where source_value = date or text literal, column, variable, or expression; transform_value = text literal, column, variable, or expression; dst_var = text variable. Example: let $hiragana = transform(&string, 'ToHiragana') Production Reporting supports the following transforms (source: Rosette API Reference):
	ToLowercase: Transforms all uppercase Latin letters to lowercase (this includes both "half-width" and "full-width" Latin characters).
	ToUppercase: Transforms all lowercase Latin letters to uppercase (this includes both "half-width" and "full-width" Latin characters).
	ToFullwidth: Transforms all half-width characters that also have a full-width representation to their full-width form. Characters with full-width representations are: Roman alphabet characters (A-Z, a-z), digits (0-9), Japanese katakana characters, and the most commonly used punctuation characters (including Space).
	ToHalfwidth: Transforms all full-width characters that also have a half-width representation to their half-width form. Characters with half-width representations are: Roman alphabet characters (A-Z, a-z), digits (0-9), Japanese katakana characters, and the most commonly used punctuation characters (including Space).
	ToHiragana: Transforms all full-width katakana characters to hiragana. To convert half-width katakana characters to hiragana, you must first convert the characters to full-width using the ToFullwidth transform.
	ToParagraphSeparator: Standardizes the line/paragraph separators in the text according to the following standards:
	Standard	Code Point	Line/Paragraph Separator
	Windows	0x0D0A	0x0D0A
	Macintosh	0x0D	ToCR
	UNIX	0x0A	ToLF
	Unicode	U+2028	ToLineSeparator
	Unicode	U+2029	ToParagraphSeparator
	EBCDIC	0x15	ToEBCDICNewLine
	HankakuKatakanaToZenkaku: Converts half-width (hankaku) Japanese katakana characters to the full-width (zenkaku) form. This conversion is almost identical to ToFullwidth, except that it automatically composes and combines katakana "accent" marks (dakuten and handakuten) appropriately, whereas ToFullwidth does not provide any special treatment for these marks.
	ZenkakuKatakanaToHankaku: Converts full-width (zenkaku) Japanese katakana characters to the half-width (hankaku) form. This conversion is almost identical to ToHalfwidth, except that it automatically decomposes and separates katakana "accent" marks (dakuten and handakuten) appropriately, whereas ToHalfwidth does not provide any special treatment for these marks.
unicode	Returns a Unicode string from the string of hexadecimal values provided. The syntax of the literal for UNICODE is '[whitespace | U+ | \u]XXXX…', where X is a valid hexadecimal digit: 0-9, a-f, or A-F. The hexadecimal value is always in big-endian form. Syntax: dst_var = unicode(source_value), where source_value = text literal, column, variable, or expression; dst_var = text variable. Example: let $uniStr = unicode('U+5E73 U+2294')
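
The counting and truncation rules in Table 51 can be illustrated outside Production Reporting. The following is a minimal Python sketch, not the product's implementation. It assumes that "full-width" corresponds to the Unicode East Asian Width classes F and W, that offsets are 1-based, and that each XXXX group in a UNICODE literal is exactly four hexadecimal digits. The function names mirror the table for readability only, and unicode_literal is a hypothetical name chosen to avoid clashing with built-ins.

```python
import re
import unicodedata


def print_width(ch: str) -> int:
    """Print positions for one character: 0 combining, 2 full-width, else 1.

    Assumption: 'full-width' means East Asian Width class F or W.
    """
    if unicodedata.combining(ch):
        return 0
    if unicodedata.east_asian_width(ch) in ("F", "W"):
        return 2
    return 1


def lengthp(s: str) -> int:
    """Length of s in print positions."""
    return sum(print_width(ch) for ch in s)


def lengtht(s: str, encoding: str) -> int:
    """Length of s in bytes after converting it to the given encoding."""
    return len(s.encode(encoding))


def substrp(s: str, pos: int, length: int) -> str:
    """Substring by 1-based print position and print length.

    A start inside a full-width character moves forward to the next character;
    an end inside one moves back to the previous character.
    """
    start, end = pos - 1, pos - 1 + length  # half-open range of print columns
    out, col, keep = [], 0, False
    for ch in s:
        w = print_width(ch)
        if w:  # base character: keep only if it fits entirely in the range
            keep = col >= start and col + w <= end
        if keep:  # combining marks follow the fate of their base character
            out.append(ch)
        col += w
    return "".join(out)


def substrt(s: str, offset: int, length: int, encoding: str) -> str:
    """Byte-level substring of s in the given encoding, decoded back to text.

    Bytes that do not form a complete character are dropped.
    """
    data = s.encode(encoding)[offset - 1 : offset - 1 + length]
    return data.decode(encoding, errors="ignore")


def unicode_literal(text: str) -> str:
    """Build a string from hex groups such as 'U+5E73 U+2294' (BMP only)."""
    groups = re.findall(r"(?:U\+|\\u)?([0-9A-Fa-f]{4})", text)
    return "".join(chr(int(g, 16)) for g in groups)


if __name__ == "__main__":
    sample = "a\u3042b"  # 'a', HIRAGANA LETTER A (full-width), 'b'
    print(lengthp(sample))               # print positions
    print(lengtht(sample, "shift_jis"))  # bytes in Shift-JIS
    print(substrp(sample, 1, 2))         # end falls inside the full-width character
    print(substrt(sample, 1, 2, "shift_jis"))
    print(unicode_literal("U+5E73 U+2294"))
```

In this sketch, the string "a", full-width "あ", "b" occupies four print positions. Asking for two print positions from position 1 yields only "a", because the full-width character would straddle the end of the range. This matches the "rounds down" rule described for substrp.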