GENDICT(1)                      ICU 69.1 Manual                     GENDICT(1)

NAME
       gendict - Compiles word list into ICU string trie dictionary

SYNOPSIS
       gendict [ --uchars | --bytes --transform transform ] [ -h, -?, --help ]
       [ -V, --version ] [ -c, --copyright ] [ -v, --verbose ]
       [ -i, --icudatadir directory ] input-file output-file

DESCRIPTION
       gendict reads the word list from input-file and writes a string trie
       dictionary to output-file. Normally this data file has the .dict
       extension.

       Words begin at the beginning of a line and are terminated by the
       first whitespace. Lines that begin with whitespace are ignored.

OPTIONS
       -h, -?, --help
              Print help about usage and exit.

       -V, --version
              Print the version of gendict and exit.

       -c, --copyright
              Embed the standard ICU copyright into the output-file.

       -v, --verbose
              Display extra informative messages during execution.

       -i, --icudatadir directory
              Look for any necessary ICU data files in directory. For
              example, the file pnames.icu must be located when ICU's data
              is not built as a shared library. The default ICU data
              directory is specified by the environment variable ICU_DATA.
              Most configurations of ICU do not require this argument.

       --uchars
              Set the output trie type to UChar. Mutually exclusive with
              --bytes.

       --bytes
              Set the output trie type to Bytes. Mutually exclusive with
              --uchars.

       --transform transform
              Set the transform type. Should only be specified with --bytes.
              The only transform currently supported is offset-<hex-number>,
              which specifies an offset to subtract from all input
              characters. The offset transform also maps U+200D to 0xFF and
              U+200C to 0xFE, for compatibility with languages that require
              these characters. A transform must be specified for a bytes
              trie, and when applied to the non-value characters in the
              input-file it must produce output between 0x00 and 0xFF.

       input-file
              The source file to read.

       output-file
              The file to write the output dictionary to.

CAVEATS
       The input-file is assumed to be encoded in UTF-8.
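
Because the input-file is UTF-8 text while a bytes trie requires every transformed character to land between 0x00 and 0xFF, it can help to check a word list against a candidate offset before running gendict. The following is a minimal Python sketch of the offset transform as the OPTIONS text describes it, not gendict's own code. The function name `offset_transform`, the sample word, and the offset 0x0E00 are assumptions made for illustration, and the sketch does not check whether an ordinary character would collide with the two byte values reserved for U+200D and U+200C.

```python
ZWJ = 0x200D   # mapped to 0xFF according to the --transform description
ZWNJ = 0x200C  # mapped to 0xFE according to the --transform description


def offset_transform(word: str, offset: int) -> bytes:
    """Map each character of word to one byte, following the man page text.

    Hypothetical helper for illustration; it only mirrors the documented rule.
    """
    out = bytearray()
    for ch in word:
        cp = ord(ch)
        if cp == ZWJ:
            out.append(0xFF)
        elif cp == ZWNJ:
            out.append(0xFE)
        else:
            value = cp - offset  # "an offset to subtract from all input characters"
            if not 0x00 <= value <= 0xFF:
                raise ValueError(
                    f"U+{cp:04X} does not fit offset 0x{offset:04X}"
                )
            out.append(value)
    return bytes(out)


if __name__ == "__main__":
    # Assumed sample: two Thai letters with an offset at the start of the Thai block.
    print(offset_transform("\u0e01\u0e32", 0x0E00).hex())  # prints 0132
```

A word that raises ValueError here would, by the rule above, not be representable in a bytes trie with that offset, so either a different offset or --uchars would be needed.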
       The integers in the input-file that are used as values must be made
       up of ASCII digits. They may be specified either in hex, by using a
       0x prefix, or in decimal.

       Either --bytes or --uchars must be specified.

ENVIRONMENT
       ICU_DATA  Specifies the directory containing ICU data. Defaults to
                 ${prefix}/share/icu/69.1/. Some tools in ICU depend on the
                 presence of the trailing slash. It is thus important to
                 make sure that it is present if ICU_DATA is set.

AUTHORS
       Maxime Serrano

VERSION
       1.0

COPYRIGHT
       Copyright (C) 2012 International Business Machines Corporation and
       others

ATTRIBUTES
       See attributes(7) for descriptions of the following attributes:

       +---------------+-----------------------+
       |ATTRIBUTE TYPE |    ATTRIBUTE VALUE    |
       +---------------+-----------------------+
       |Availability   | developer/icu         |
       +---------------+-----------------------+
       |Stability      | Pass-through volatile |
       +---------------+-----------------------+

SEE ALSO
       http://www.icu-project.org/userguide/boundaryAnalysis.html

NOTES
       Source code for open source software components in Oracle Solaris can
       be found at
       https://www.oracle.com/downloads/opensource/solaris-source-code-downloads.html.

       This software was built from source available at
       https://github.com/oracle/solaris-userland. The original community
       source was downloaded from
       https://github.com/unicode-org/icu/releases/download/release-69-1/icu4c-69_1-src.tgz.

       Further information about this software can be found on the open
       source community website at http://site.icu-project.org/.

ICU MANPAGE                       1 June 2012                       GENDICT(1)
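
The input rules given in DESCRIPTION and CAVEATS (a word runs from the start of a line to the first whitespace, lines that begin with whitespace are ignored, values are ASCII integers in decimal or with a 0x prefix) can be sketched as a small Python pre-check for a word list. This is an illustration of the documented format only, not gendict's parser. The names `parse_value` and `parse_word_list` are hypothetical, and three behaviours are assumptions that the man page does not settle: empty lines are skipped, a word may appear without a value, and only a lowercase 0x prefix is accepted.

```python
import string


def parse_value(token: str) -> int:
    """Parse a value written in decimal or in hex with a 0x prefix."""
    if token.startswith("0x"):
        digits, base, allowed = token[2:], 16, string.hexdigits
    else:
        digits, base, allowed = token, 10, string.digits
    # Reject anything that is not made up of ASCII digits for the chosen base.
    if not digits or any(c not in allowed for c in digits):
        raise ValueError(f"not an ASCII integer: {token!r}")
    return int(digits, base)


def parse_word_list(lines):
    """Yield (word, value) pairs; value is None when the line has no integer."""
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: empty lines are skipped like lines that start with whitespace.
        if not line or line[0].isspace():
            continue
        parts = line.split(None, 1)  # the word ends at the first whitespace
        word = parts[0]
        value = parse_value(parts[1].strip()) if len(parts) == 2 else None
        yield word, value


if __name__ == "__main__":
    # Assumed sample content; a real file would be opened with encoding="utf-8".
    sample = ["apple 10\n", "  ignored line\n", "banana 0x1F\n", "cherry\n"]
    for word, value in parse_word_list(sample):
        print(word, value)
```

Running a check like this before gendict gives an early, line-level view of entries that do not match the documented format.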