man pages section 1: User Commands

Updated: Wednesday, July 27, 2022
 
 

gendict (1)

GENDICT(1)                      ICU 69.1 Manual                     GENDICT(1)



NAME
       gendict - Compiles word list into ICU string trie dictionary

SYNOPSIS
       gendict [ --uchars | --bytes --transform transform ] [ -h, -?, --help ]
       [ -V, --version ] [ -c, --copyright ] [ -v, --verbose ]
       [ -i, --icudatadir directory ] input-file output-file

DESCRIPTION
       gendict reads the word list from input-file and writes a string trie
       dictionary to output-file. By convention, the output file has the
       .dict extension.

       Each word starts at the beginning of a line and ends at the first
       whitespace character. Lines that begin with whitespace are ignored.
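
       The line rules above can be sketched in Python as follows. This is
       only an illustration of the description, not the parser gendict
       itself uses: the function name read_words is hypothetical, Python's
       notion of whitespace may differ from ICU's, and skipping empty lines
       is an assumption.

```python
def read_words(lines):
    """Collect the word from each line, following the rules described above."""
    words = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Assumption: empty lines carry no word. Lines that begin with
        # whitespace are ignored, as the description states.
        if not line or line[0].isspace():
            continue
        # The word runs from the start of the line to the first whitespace.
        words.append(line.split(None, 1)[0])
    return words


sample = ["apple 10\n", "  ignored line\n", "banana\t0x1F\n", "cherry\n"]
print(read_words(sample))
```

       Anything after the first whitespace on a line is left out of the
       word in this sketch.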

OPTIONS
       -h, -?, --help
              Print help about usage and exit.

       -V, --version
              Print the version of gendict and exit.

       -c, --copyright
              Embed the standard ICU copyright in the output-file.

       -v, --verbose
              Display extra informative messages during execution.

       -i, --icudatadir directory
              Look for any necessary ICU data files in directory. For example,
              the file pnames.icu must be located when ICU's data is not built
              as a shared library. The default ICU data directory is specified
              by the environment variable ICU_DATA. Most configurations of ICU
              do not require this argument.

       --uchars
              Set the output trie type to UChar. Mutually exclusive with
              --bytes.

       --bytes
              Set the output trie type to Bytes. Mutually exclusive with
              --uchars.

       --transform transform
              Set the transform type. Specify this option only together with
              --bytes. The currently supported transform is
              offset-<hex-number>, which specifies an offset to subtract from
              every input character. The offset transform also maps U+200D to
              0xFF and U+200C to 0xFE, for compatibility with languages that
              require these characters. A transform must be specified for a
              bytes trie, and applying it to the non-value characters in the
              input-file must produce values between 0x00 and 0xFF.
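
       As a rough illustration of the offset transform described above, the
       following Python sketch maps each character of a word to one byte.
       The function names are hypothetical. The sketch also rejects 0xFE and
       0xFF for ordinary characters so that they cannot collide with the two
       reserved values; that stricter bound is an assumption of the sketch,
       not something stated above.

```python
ZWJ = 0x200D   # ZERO WIDTH JOINER, mapped to 0xFF
ZWNJ = 0x200C  # ZERO WIDTH NON-JOINER, mapped to 0xFE


def offset_transform(ch, offset):
    """Map one character to a byte value by subtracting offset."""
    cp = ord(ch)
    if cp == ZWJ:
        return 0xFF
    if cp == ZWNJ:
        return 0xFE
    value = cp - offset
    # Assumption: 0xFE and 0xFF stay reserved for the two joiners.
    if not 0x00 <= value <= 0xFD:
        raise ValueError("U+%04X does not fit in one byte at offset 0x%X" % (cp, offset))
    return value


def transform_word(word, offset):
    """Apply the offset transform to every character of word."""
    return bytes(offset_transform(ch, offset) for ch in word)


# U+0E01 and U+0E32 with offset 0x0E00 become the bytes 0x01 and 0x32.
print(transform_word("\u0e01\u0e32", 0x0E00).hex())
```

       A character outside the one-byte window, such as a Latin letter with
       an offset of 0x0E00, raises ValueError in this sketch.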

       input-file
              The source file to read.

       output-file
              The file to write the output dictionary to.
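
       The constraints among --uchars, --bytes, and --transform can be
       sketched as a small Python helper that assembles an argument list.
       The helper name and the file names are hypothetical, and writing the
       hex number with a 0x prefix is an assumption about the accepted
       spelling of offset-<hex-number>.

```python
def gendict_argv(input_file, output_file, trie, offset=None):
    """Build an argument list for gendict from the options described above."""
    if trie not in ("uchars", "bytes"):
        raise ValueError("trie must be 'uchars' or 'bytes'")
    argv = ["gendict", "--" + trie]
    if trie == "bytes":
        # A bytes trie requires a transform.
        if offset is None:
            raise ValueError("--bytes requires --transform")
        # Assumption: the hex number is written with a 0x prefix.
        argv += ["--transform", "offset-0x%04x" % offset]
    elif offset is not None:
        # --transform is only meaningful together with --bytes.
        raise ValueError("--transform should only be used with --bytes")
    return argv + [input_file, output_file]


# Hypothetical file names; 0x0E00 is the start of the Thai block.
print(gendict_argv("words.txt", "words.dict", "bytes", offset=0x0E00))
print(gendict_argv("words.txt", "words.dict", "uchars"))
```

       The resulting list could be passed to subprocess.run(argv,
       check=True) on a system where gendict is installed.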

CAVEATS
       The input-file is assumed to be encoded in UTF-8. Integers used as
       values in the input-file must consist of ASCII digits and may be given
       either in decimal or, with a 0x prefix, in hexadecimal. Either --bytes
       or --uchars must be specified.
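
       A minimal Python sketch of the value syntax described above. The
       function name parse_value is hypothetical, and treating a leading
       zero as plain decimal is an assumption. The explicit pattern matters
       because Python's int() would otherwise accept non-ASCII digits.

```python
import re

# Either a 0x-prefixed hexadecimal number or a run of ASCII decimal digits.
_VALUE = re.compile(r"(?:0[xX][0-9A-Fa-f]+|[0-9]+)\Z")


def parse_value(text):
    """Parse a dictionary value given in decimal or 0x-prefixed hex."""
    if not _VALUE.match(text):
        raise ValueError("not an ASCII decimal or 0x-prefixed hex value: %r" % text)
    if text[:2].lower() == "0x":
        return int(text, 16)
    return int(text, 10)


print(parse_value("0x1F"), parse_value("42"))
```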

ENVIRONMENT
       ICU_DATA  Specifies the directory containing ICU data. Defaults to
                 ${prefix}/share/icu/69.1/. Some tools in ICU depend on the
                 presence of the trailing slash, so make sure it is present
                 whenever ICU_DATA is set.
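
       One way to guard against a missing trailing slash before launching an
       ICU tool, sketched in Python. The helper name with_trailing_slash and
       the example path are hypothetical.

```python
import os


def with_trailing_slash(path):
    """Return path with a trailing slash, leaving an empty value untouched."""
    if path and not path.endswith("/"):
        return path + "/"
    return path


# Copy the environment and normalize ICU_DATA only if it is set.
env = dict(os.environ)
if "ICU_DATA" in env:
    env["ICU_DATA"] = with_trailing_slash(env["ICU_DATA"])

print(with_trailing_slash("/opt/icu/data"))
```

       The env dictionary can then be handed to a child process, for example
       through the env argument of subprocess.run.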

AUTHORS
       Maxime Serrano

VERSION
       1.0

COPYRIGHT
       Copyright (C) 2012 International Business Machines Corporation and
       others


ATTRIBUTES
       See attributes(7) for descriptions of the following attributes:


       +---------------+-----------------------+
       |ATTRIBUTE TYPE |   ATTRIBUTE VALUE     |
       +---------------+-----------------------+
       |Availability   | developer/icu         |
       +---------------+-----------------------+
       |Stability      | Pass-through volatile |
       +---------------+-----------------------+

SEE ALSO
       http://www.icu-project.org/userguide/boundaryAnalysis.html




NOTES
       Source code for open source software components in Oracle Solaris can
       be found at
       https://www.oracle.com/downloads/opensource/solaris-source-code-downloads.html.

       This software was built from source available at
       https://github.com/oracle/solaris-userland. The original community
       source was downloaded from
       https://github.com/unicode-org/icu/releases/download/release-69-1/icu4c-69_1-src.tgz.

       Further information about this software can be found on the open source
       community website at http://site.icu-project.org/.



ICU MANPAGE                       1 June 2012                       GENDICT(1)