Unicode Functions

Note:

Unicode functions are only allowed when converting to Unicode internally.

Table 51. Unicode Functions

Function

Description

lengthp

Returns the string length in print position. Half-width characters take one print position, full-width characters take two, and combining characters take zero.

Syntax: dst_var = lengthp(source_value)

  • source_value = date or text literal, column, variable, or expression

  • dst_var = decimal, float, or integer variable

Example: let #printLen = lengthp($string)
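The print-position rule above can be modeled outside Production Reporting. The following is a minimal Python sketch, not the product's implementation: the name print_length is hypothetical, and it assumes that "full-width" corresponds to the Unicode East Asian Width classes W and F and that "combining" corresponds to a non-zero canonical combining class.

```python
import unicodedata

def print_length(text: str) -> int:
    """Approximate print-position length: full-width = 2, combining = 0, other = 1."""
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # assumption: combining marks take no print position
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            total += 2  # wide or full-width character
        else:
            total += 1  # half-width, narrow, or neutral character
    return total

print(print_length("abc"))       # 3
print(print_length("日本語"))     # 6
print(print_length("e\u0301"))   # 1 (letter plus combining acute accent)
```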

lengtht

Returns the string length in bytes when converted (transformed) to a specified encoding. Encoding names are the same as those allowed in OPEN or in SQR.INI. String and column variables can be used in place of the literal encoding name.

Syntax: dst_var = lengtht(source_value, encoding_value)

  • source_value = date or text literal, column, variable, or expression

  • encoding_value = text literal, column, variable, or expression

  • dst_var = decimal, float, or integer variable

Example: let #sjisLen = lengtht($string, 'shift-jis')

substrp

Returns a substring of a given string, starting at a specified print position and spanning a specified print length. When #printPos falls in the middle of a full-width character, Production Reporting “rounds up” to the next character. When #printLen ends in a partial character, Production Reporting “rounds down” to the previous character.

Syntax: dst_var = substrp(source_value, offset_value, length_value)

  • source_value = date or text literal, column, variable, or expression.

  • offset_value = decimal, float, or integer literal, column, variable, or expression. The value is always converted to integer.

  • length_value = decimal, float, or integer literal, column, variable, or expression. The value is always converted to integer.

  • dst_var = text variable

Example: let $sub = substrp(&string, #printPos, #printLen)

substrt

Returns a Unicode string equivalent to a byte level substring of a given string after converting (transforming) the given string to a given encoding. If the substring of the converted string yields a partial character, that character will be truncated.

Syntax: dst_var = substrt(source_value, offset_value, length_value, encoding_value)

  • source_value = date or text literal, column, variable, or expression

  • offset_value = decimal, float, or integer literal, column, variable, or expression. The value is always converted to integer.

  • length_value = decimal, float, or integer literal, column, variable, or expression. The value is always converted to integer.

  • encoding_value = text literal, column, variable, or expression

  • dst_var = text variable

Example: let $sjisPrep = substrt($string, 1, 10, 'Shift-JIS')

transform

Returns a Unicode string that is the result of applying a specified transform to a given string.

Syntax: dst_var = transform (source_value, transform_value)

  • source_value = date or text literal, column, variable, or expression

  • transform_value = text literal, column, variable, or expression

  • dst_var = text variable

Example: let $hiragana = transform (&string, 'ToHiragana')

Production Reporting supports the following transforms:

(Source: Rosette API Reference)

  • ToLowercase—Transforms all uppercase Latin letters to lowercase (this includes both "half-width" and "full-width" Latin characters).

  • ToUppercase—Transforms all lowercase Latin letters to uppercase (this includes both "half-width" and "full-width" Latin characters).

  • ToFullwidth—Transforms all half-width characters that also have a full-width representation to their full-width form.

    Characters with full-width representations are: Roman alphabet characters (A-Z, a-z), digits (0-9), Japanese katakana characters, and the most commonly used punctuation characters (including Space).

  • ToHalfwidth—Transforms all full-width characters that also have a half-width representation to their half-width form.

    Characters with half-width representations are: Roman alphabet characters (A-Z, a-z), digits (0-9), Japanese katakana characters, and the most commonly used punctuation characters (including Space).

  • ToHiragana—Transforms all full-width katakana characters to hiragana.

    To convert half-width katakana characters to hiragana, you must first convert the characters to full-width using the ToFullwidth transform.

  • ToParagraphSeparator—Standardizes the line/paragraph separators in the text according to the following standards:

    Standard     Code Point   Line/Paragraph Separator
    Windows      0x0D0A       0x0D0A
    Macintosh    0x0D         ToCR
    UNIX         0x0A         ToLF
    Unicode      U+2028       ToLineSeparator
    Unicode      U+2029       ToParagraphSeparator
    EBCDIC       0x15         ToEBCDICNewLine

  • HankakuKatakanaToZenkaku—Converts half-width (hankaku) Japanese katakana characters to the full-width (zenkaku) form.

    This conversion is almost identical to ToFullwidth, except that it automatically composes and combines katakana "accent" marks (dakuten and handakuten) appropriately, whereas ToFullwidth does not provide any special treatment for these marks.

  • ZenkakuKatakanaToHankaku—Converts full-width (zenkaku) Japanese katakana characters to the half-width (hankaku) form.

    This conversion is almost identical to ToHalfwidth, except that it automatically decomposes and separates katakana "accent" marks (dakuten and handakuten) appropriately, whereas ToHalfwidth does not provide any special treatment for these marks.
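To make the katakana transforms concrete, the following Python sketch imitates ToHiragana and shows why half-width input needs a widening step first. It is only an approximation: to_hiragana is a hypothetical name, the mapping assumes the fixed 0x60 offset between the Unicode katakana (U+30A1 to U+30F6) and hiragana blocks, and NFKC normalization is used as a stand-in for HankakuKatakanaToZenkaku even though it also changes characters that transform would leave alone.

```python
import unicodedata

def to_hiragana(text: str) -> str:
    """Map full-width katakana to hiragana; leave every other character unchanged."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0x30A1 <= cp <= 0x30F6:     # full-width katakana range
            out.append(chr(cp - 0x60))  # same position in the hiragana block
        else:
            out.append(ch)
    return "".join(out)

print(to_hiragana("カタカナ"))   # かたかな
print(to_hiragana("ｶﾀｶﾅ"))      # ｶﾀｶﾅ (half-width input is not converted)

# Stand-in for the widening step: NFKC widens half-width katakana and
# composes the voicing mark with its base character.
widened = unicodedata.normalize("NFKC", "ｶﾞｷﾞ")
print(widened)                   # ガギ
print(to_hiragana(widened))      # がぎ
```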

unicode

Returns a Unicode string from the string of hexadecimal values provided. The syntax of the literal for unicode is:

'[whitespace | U+ | \u]XXXX…'

where X is a valid hexadecimal digit: 0-9, a-f, or A-F. The hexadecimal value will always be in big-endian form.

Syntax: dst_var = unicode(source_value)

  • source_value = text literal, column, variable, or expression

  • dst_var = text variable

Example: let $uniStr = unicode ('U+5E73 U+2294')
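The literal syntax can be read as a run of four-digit hexadecimal groups with optional separators. The Python sketch below parses such a literal under that reading; unicode_from_hex is a hypothetical name, and treating each group as one big-endian UTF-16 code unit is an assumption drawn from the big-endian remark above, not a documented detail.

```python
import re

def unicode_from_hex(literal: str) -> str:
    """Build a string from hex groups separated by whitespace, 'U+', or '\\u'."""
    # Strip the optional separators so that only hexadecimal digits remain.
    hex_digits = re.sub(r"\s+|U\+|\\u", "", literal)
    # Assumption: every four hex digits form one big-endian UTF-16 code unit.
    return bytes.fromhex(hex_digits).decode("utf-16-be")

print(unicode_from_hex("U+5E73 U+2294"))   # 平⊔
print(unicode_from_hex(r"\u5E73\u2294") == unicode_from_hex("5E732294"))  # True
```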