man pages section 1: User Commands

Updated: July 2014

gocr (1)


User Commands                                             GOCR(1)

     gocr - command line text recognition tool

     gocr [OPTION] [-i] pnm-file

     gocr is an optical character recognition program that can be
     used from the command line.  It takes input in PNM, PGM, PBM,
     PPM, or PCX format and writes the recognized text to stdout.
     If the pnm file is a single dash, PNM data is read from stdin.
     If gzip, bzip2 and netpbm-progs are installed and your system
     supports popen(3), then pnm.gz, pnm.bz2, png, jpg, jpeg, tiff,
     gif, bmp, ps (single pages only) and eps are also supported as
     input files (but not as an input stream), where pnm can be
     replaced by one of ppm, pgm and pbm.
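
     The two input paths described above (a named file, or a single
     dash for stdin) can be tried with a hand-written plain PBM file.
     This is only a sketch: the file name is arbitrary, the 3x3 image
     is far too small to contain readable text, and gocr is run only
     if it is found on the PATH.

```shell
#!/bin/sh
# Write a tiny 3x3 plain PBM (magic number P1; 1 = black, 0 = white).
# The file name is arbitrary and only used for this illustration.
pbm=${TMPDIR:-/tmp}/gocr-demo.pbm
printf 'P1\n3 3\n0 1 0\n1 1 1\n0 1 0\n' > "$pbm"

# Run gocr only if it is installed; this demonstrates the two input
# paths, not a meaningful recognition result.
if command -v gocr >/dev/null 2>&1; then
    gocr -i "$pbm"      # read from a named file
    gocr - < "$pbm"     # read the same data from stdin
fi
```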

     -h   show usage information

     -i file
          read input from file (or stdin if file is a single dash)

     -o file
          send output to file instead of stdout

     -e file
          send errors to file instead of stderr or to  stdout  if
          file is a dash

     -x file
          write progress output to file (file can be a file name, a
          fifo name or a file descriptor 1...255).  This is useful
          for GUI developers who want to show the OCR progress.  The
          file descriptor argument is only available if gocr was
          compiled with __USE_POSIX defined.

     -p path
          database  path, a final slash must be included, default
          is ./db/, this path will be populated  with  images  of
          learned characters

     -f format
          output  format  of  the  recognized text (ISO8859_1 TeX
          HTML XML UTF8 ASCII), XML will also output position and
          probability data

     -l level
          set  grey  level  to  level (0<160<=255, default: 0 for
          autodetect),  darker  pixels  belong   to   characters,
          brighter pixels are interpreted as background of the
          input image

Linux                Last change: 29 Mar 2009                   1

     -d size
          set dust size in pixels (clusters smaller than this are
          removed),  0 means no clusters are removed, the default
          is -1 for auto detection

     -s num
          set spacewidth between words in units of dots (default:
          0 for autodetect), wider widths are interpreted as word
          spaces, smaller as character spaces

     -v verbosity
          be verbose to stderr; verbosity is a bitfield

     -c string
          restrict verbose output on stderr to the characters in
          string; more output is generated for every character in
          the string, and the underscore stands for unknown chars.
          This is useful for limiting debug information to what is
          needed.

     -C string
          only recognise characters from string; this filter is
          useful when only part of the character set is of interest.
          You can use 0-9 or a-z to specify ranges; use -- to detect
          the minus sign.

     -a certainty
          set the required certainty of recognition (0..100;
          default: 95).  Characters with a higher certainty are
          accepted; characters with a lower certainty are treated
          as unknown (not recognized).  Set a higher value if you
          want only the more certain characters to be accepted.

     -u string
          output  this  string  for  every unrecognized character
          (default is "_")

     -m mode
          set operational mode; mode is a bitfield (default: 0)

     -n bool
          if bool is non-zero, only recognise  numbers  (this  is
          now obsolete, use -C "0123456789")
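
     As an illustration of how several of the options above combine,
     the following sketch prints a command line for digit-only
     recognition.  The helper name digits_cmd, the certainty value 98,
     the "?" placeholder and the file names are all assumptions chosen
     for this example; only the option letters come from this page.

```shell
#!/bin/sh
# Hypothetical helper: print a gocr command line that
#   -C "0-9"  recognises digits only,
#   -a 98     accepts only characters with certainty above 98,
#   -u "?"    prints "?" for every unrecognized character,
#   -o FILE   writes the text to FILE instead of stdout.
# $1 = input image, $2 = output text file.
digits_cmd() {
    printf 'gocr -C "0-9" -a 98 -u "?" -o %s -i %s\n' "$2" "$1"
}

digits_cmd meter.pbm meter.txt
```

     The helper only prints the command, so it can be inspected before
     being run against a real image.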

     The verbosity is specified as a bitfield:

     1         print more info

     2         list shapes of boxes (see -c) to stderr

     4         list pattern of boxes (see -c) to stderr

     8         print pattern after recognition for debugging

     16        print debug information about recognition of lines
               to stderr

     32        create outXX.png with boxes and  lines  marked  on
               each general OCR-step
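
     Because the verbosity is a bitfield, the value passed to -v is
     the bitwise OR (or sum) of the entries listed above.  A minimal
     sketch in shell arithmetic follows; the variable names are made
     up for this example and are not part of gocr.

```shell
#!/bin/sh
# Verbosity bits as listed in this page (names are illustrative).
V_INFO=1        # print more info
V_SHAPES=2      # list shapes of boxes
V_PATTERN=4     # list pattern of boxes
V_OUTPNG=32     # create outXX.png on each general OCR step

# Combine "more info" with the outXX.png dumps.
verbosity=$(( V_INFO | V_OUTPNG ))
echo "gocr -v $verbosity text1.pbm"    # prints: gocr -v 33 text1.pbm

# Combine info, shapes and pattern output.
debug=$(( V_INFO | V_SHAPES | V_PATTERN ))
echo "gocr -v $debug -c _YV text1.pbm" # prints: gocr -v 7 -c _YV text1.pbm
```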

     The operation modes are:

     2         use database to recognize characters which are not
               recognized by other algorithms (early development)

     4         switch on layout analysis or zoning (development)

     8         don't compare unrecognized characters to recognized
               ones

     16        don't  try to divide overlapping characters to two
               or three single characters

     32        don't do context correction

     64        character packing: before recognition starts, similar
               characters are searched for and only one of these
               characters will be sent to the recognition engine

     130       extend database: prompts the user for unidentified
               characters and extends the database with the user's
               answer (128+2, early development)

     256       switch  off  the  recognition  engine (makes sense
               together with -m 2)
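
     A mode value can be split back into its individual bits to see
     which of the entries above it enables; for example, 130 is
     128+2.  The function name mode_bits is an assumption made for
     this sketch and is not provided by gocr.

```shell
#!/bin/sh
# Print the powers of two that are set in a -m (or -v) value.
mode_bits() {
    value=$1
    bit=1
    out=""
    while [ "$bit" -le "$value" ]; do
        if [ $(( value & bit )) -ne 0 ]; then
            out="${out:+$out }$bit"     # append, space separated
        fi
        bit=$(( bit * 2 ))
    done
    echo "$out"
}

mode_bits 130    # prints: 2 128
mode_bits 258    # prints: 2 256
```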

     Joerg  Schulenburg  (see   for
     First version of man page by Tim Waugh

     This man page documents gocr, version 0.41.

     Report bugs to Joerg Schulenburg

     See attributes(5) for descriptions of the following
     attributes:

     |ATTRIBUTE TYPE | ATTRIBUTE VALUE  |
     |Availability   | image/gocr       |
     |Stability      | Volatile         |

     More details can be found at
     /usr/share/doc/gocr-X.XX/gocr.html.  Also read
     /usr/share/doc/gocr-X.XX/README to learn how to improve
     results.

     gocr -v 33 text1.pbm
          output verbose information, out30.png is created to see
          details of recognition process

     gocr -v 7 -c _YV text1.pbm
          verbose output for unknown chars and chars Y and V

     djpeg -pnm -gray text.jpg | gocr -
          convert a jpeg file to pnm format and pass it to gocr
          through a pipe (the single dash reads from stdin)

     This   software   was   built   from   source  available  at   The   original
     community   source   was   downloaded  from   http://prdown-

     Further information about this software can be found on  the
     open   source   community   website  at  http://jocr.source-
