GENDICT(1) ICU 69.1 Manual GENDICT(1)
NAME
gendict - Compiles word list into ICU string trie dictionary
SYNOPSIS
gendict [ --uchars | --bytes --transform transform ] [ -h, -?, --help ]
[ -V, --version ] [ -c, --copyright ] [ -v, --verbose ]
[ -i, --icudatadir directory ] input-file output-file
DESCRIPTION
gendict reads the word list from input-file and creates a string trie
dictionary file, output-file. Normally this data file has the .dict
extension. Each word starts at the beginning of a line and ends at the
first whitespace character. Lines that begin with whitespace are ignored.
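The line format above can be illustrated with a short reader. This is a minimal Python sketch, not gendict's own parser: it assumes that an optional integer value follows the word on the same line, separated by whitespace, and it applies the value rules from CAVEATS (ASCII digits, hexadecimal with a 0x prefix or decimal). The function names parse_value and parse_word_list are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple


def parse_value(token: str) -> int:
    """Parse a value as hexadecimal (0x prefix) or decimal ASCII digits."""
    if token.startswith("0x"):
        digits, base, allowed = token[2:], 16, "0123456789abcdefABCDEF"
    else:
        digits, base, allowed = token, 10, "0123456789"
    # Reject anything but ASCII digits explicitly; Python's int() would
    # otherwise also accept non-ASCII decimal digits and underscores.
    if not digits or any(c not in allowed for c in digits):
        raise ValueError(f"not an ASCII integer: {token!r}")
    return int(digits, base)


def parse_word_list(lines: Iterable[str]) -> Iterator[Tuple[str, Optional[int]]]:
    """Yield (word, value) pairs from gendict-style input lines.

    Assumption (hypothetical layout): an optional value follows the word,
    separated by whitespace. Any further fields are ignored by this sketch.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        # Lines that begin with whitespace are ignored; this sketch also
        # skips empty lines, which contain no word.
        if not line or line[0].isspace():
            continue
        # The word runs from the start of the line to the first whitespace.
        fields = line.split()
        word = fields[0]
        value = parse_value(fields[1]) if len(fields) > 1 else None
        yield word, value
```

Since the input-file is assumed to be UTF-8, a caller would open it with an explicit encoding, e.g. `with open("words.txt", encoding="utf-8") as f: entries = list(parse_word_list(f))`, where words.txt is a hypothetical file name.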
OPTIONS
-h, -?, --help
Print help about usage and exit.
-V, --version
Print the version of gendict and exit.
-c, --copyright
Embed the standard ICU copyright notice into the output-file.
-v, --verbose
Display extra informative messages during execution.
-i, --icudatadir directory
Look for any necessary ICU data files in directory. For example,
the file pnames.icu must be located when ICU's data is not built
as a shared library. The default ICU data directory is specified
by the environment variable ICU_DATA. Most configurations of ICU
do not require this argument.
--uchars
Set the output trie type to UChar. Mutually exclusive with
--bytes.
--bytes
Set the output trie type to Bytes. Mutually exclusive with
--uchars.
--transform transform
Set the transform type. Should only be specified with --bytes.
The only currently supported transform is offset-<hex-number>,
which specifies an offset to subtract from all input characters.
The offset transform also maps U+200D (ZERO WIDTH JOINER) to 0xFF
and U+200C (ZERO WIDTH NON-JOINER) to 0xFE, for compatibility with
languages that require these characters. A transform must be
specified for a bytes trie, and when applied to the non-value
characters in the input-file it must produce output between 0x00
and 0xFF.
input-file
The source file to read.
output-file
The file to write the output dictionary to.
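The offset transform described under --transform can be sketched as a per-character mapping. This Python sketch assumes the transform string has the form offset-<hex-number>. The names parse_transform and offset_transform are hypothetical and do not reflect ICU's internal code. The offset 0x0E00 used in the test is only an illustration: it moves the Thai block (U+0E00 to U+0E7F) into the low byte range.

```python
# Characters with a fixed byte mapping under the offset transform.
RESERVED = {0x200D: 0xFF,  # U+200D ZERO WIDTH JOINER
            0x200C: 0xFE}  # U+200C ZERO WIDTH NON-JOINER


def parse_transform(spec: str) -> int:
    """Return the offset from an 'offset-<hex-number>' transform string."""
    prefix = "offset-"
    if not spec.startswith(prefix):
        raise ValueError(f"unsupported transform: {spec!r}")
    # int(..., 16) accepts both "0e00" and "0x0e00".
    return int(spec[len(prefix):], 16)


def offset_transform(word: str, offset: int) -> bytes:
    """Map each character of word to one byte by subtracting offset."""
    out = bytearray()
    for ch in word:
        cp = ord(ch)
        if cp in RESERVED:
            out.append(RESERVED[cp])
            continue
        b = cp - offset
        # The man page requires results in 0x00..0xFF; this sketch also
        # rejects 0xFE and 0xFF so that ordinary characters cannot collide
        # with the bytes reserved for U+200C and U+200D.
        if not 0x00 <= b <= 0xFD:
            raise ValueError(f"U+{cp:04X} is out of range for offset 0x{offset:X}")
        out.append(b)
    return bytes(out)
```

Rejecting 0xFE and 0xFF for ordinary characters is a design choice of this sketch. Without it, two different input characters could produce the same byte and become indistinguishable in the trie.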
CAVEATS
The input-file is assumed to be encoded in UTF-8. The integers in the
input-file that are used as values must be made up of ASCII digits.
They may be specified either in hexadecimal, using a 0x prefix, or in
decimal. Either --bytes or --uchars must be specified.
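The option rules above (exactly one of --uchars or --bytes, a transform required for --bytes and used only with it) can be checked before invoking the tool. This is a hypothetical Python helper, build_gendict_args, that only assembles an argument list from the options documented on this page. It assumes gendict is on the PATH and does not run it.

```python
from typing import List, Optional


def build_gendict_args(input_file: str, output_file: str, *, trie: str,
                       transform: Optional[str] = None,
                       embed_copyright: bool = False, verbose: bool = False,
                       icudatadir: Optional[str] = None) -> List[str]:
    """Build a gendict command line, enforcing the trie/transform rules."""
    if trie == "uchars":
        # --transform should only be specified with --bytes.
        if transform is not None:
            raise ValueError("--transform is only valid with --bytes")
        args = ["gendict", "--uchars"]
    elif trie == "bytes":
        # A transform must be specified for a bytes trie.
        if transform is None:
            raise ValueError("--bytes requires --transform")
        args = ["gendict", "--bytes", "--transform", transform]
    else:
        # Either --bytes or --uchars must be specified.
        raise ValueError("trie must be 'uchars' or 'bytes'")
    if embed_copyright:
        args.append("--copyright")
    if verbose:
        args.append("--verbose")
    if icudatadir is not None:
        args += ["--icudatadir", icudatadir]
    # Positional arguments come last, as in the SYNOPSIS.
    args += [input_file, output_file]
    return args
```

The resulting list could be passed to `subprocess.run(args, check=True)`. File names such as words.txt are placeholders.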
ENVIRONMENT
ICU_DATA Specifies the directory containing ICU data. Defaults to
${prefix}/share/icu/69.1/. Some tools in ICU depend on the
presence of the trailing slash. It is thus important to make
sure that it is present if ICU_DATA is set.
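Because some ICU tools rely on the trailing slash, a wrapper script that sets ICU_DATA can add the slash when it is missing. This is a minimal Python sketch with a hypothetical helper name, env_with_icu_data. It assumes POSIX-style "/" separators, as on Solaris.

```python
import os
from typing import Dict, Mapping, Optional


def env_with_icu_data(directory: str,
                      base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of base (default: os.environ) with ICU_DATA set.

    The directory is given a trailing "/" if it lacks one, since some ICU
    tools depend on it. The base mapping itself is not modified.
    """
    env = dict(os.environ if base is None else base)
    if not directory.endswith("/"):
        directory += "/"
    env["ICU_DATA"] = directory
    return env
```

The returned mapping can be supplied as the `env` argument of `subprocess.run`. A path such as /opt/icu/data would be a placeholder for the real data directory.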
AUTHORS
Maxime Serrano
VERSION
1.0
COPYRIGHT
Copyright (C) 2012 International Business Machines Corporation and others
ATTRIBUTES
See attributes(7) for descriptions of the following attributes:
+---------------+-----------------------+
|ATTRIBUTE TYPE | ATTRIBUTE VALUE |
+---------------+-----------------------+
|Availability | developer/icu |
+---------------+-----------------------+
|Stability | Pass-through volatile |
+---------------+-----------------------+
SEE ALSO
http://www.icu-project.org/userguide/boundaryAnalysis.html
NOTES
Source code for open source software components in Oracle Solaris can
be found at
https://www.oracle.com/downloads/opensource/solaris-source-code-downloads.html.
This software was built from source available at
https://github.com/oracle/solaris-userland. The original community
source was downloaded from
https://github.com/unicode-org/icu/releases/download/release-69-1/icu4c-69_1-src.tgz.
Further information about this software can be found on the open source
community website at http://site.icu-project.org/.
ICU MANPAGE 1 June 2012 GENDICT(1)