Jive Forums API (5.5.20.2-oracle) Developer Javadocs

com.jivesoftware.base.util
Class WordExtractor

java.lang.Object
  extended by com.jivesoftware.base.util.WordExtractor
All Implemented Interfaces:
TextExtractor

public class WordExtractor
extends java.lang.Object
implements TextExtractor

Extracts the text from a Microsoft Word 6.0/95/97/2000/XP document.


Constructor Summary
WordExtractor()
          Creates a new WordExtractor.
 
Method Summary
 java.lang.String extractText(java.io.InputStream in)
          Gets the text from a Word document.
 java.lang.String getDescription()
          Returns the description for the extractor.
 java.lang.String getName()
          Returns the name of the extractor.
 java.util.List getSupportedFileTypes()
          Returns a list of file extensions that the extractor knows how to handle.
 java.lang.String getText(java.lang.String filename, java.io.File file)
          Retrieves text from a file.
 java.lang.String getText(java.lang.String filename, java.io.InputStream is)
          Retrieves text from a file.
static java.lang.String removeControlCharacters(java.lang.String str)
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

WordExtractor

public WordExtractor()
Creates a new WordExtractor.

Method Detail

getName

public java.lang.String getName()
Description copied from interface: TextExtractor
Returns the name of the extractor.

Specified by:
getName in interface TextExtractor

getDescription

public java.lang.String getDescription()
Description copied from interface: TextExtractor
Returns the description for the extractor.

Specified by:
getDescription in interface TextExtractor

getSupportedFileTypes

public java.util.List getSupportedFileTypes()
Description copied from interface: TextExtractor
Returns a list of file extensions that the extractor knows how to handle. For instance, an extractor that knows how to parse HTML documents would likely return '.html' and '.htm'. Extractors which implement this interface should return extensions in lowercase with a leading period.

Specified by:
getSupportedFileTypes in interface TextExtractor
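
The lowercase, leading-period convention described above suggests how a caller might match a filename against the returned list. The following is a minimal sketch, not part of the Jive API: the class ExtensionMatchSketch, the method isSupported, and the assumed ".doc" entry are all hypothetical, since this page does not list the extensions WordExtractor actually returns.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class ExtensionMatchSketch {
    // Assumption: a Word extractor would report ".doc"; the real list is not given on this page.
    static final List<String> ASSUMED_WORD_TYPES = Arrays.asList(".doc");

    /** Returns true if the filename's extension (lowercased, with leading period) is in the list. */
    static boolean isSupported(String realFilename, List<String> supportedTypes) {
        int dot = realFilename.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension at all
        }
        String ext = realFilename.substring(dot).toLowerCase(Locale.ROOT);
        return supportedTypes.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("Report.DOC", ASSUMED_WORD_TYPES)); // true
        System.out.println(isSupported("notes.txt", ASSUMED_WORD_TYPES));  // false
    }
}
```

Lowercasing the filename's extension before the lookup means the list itself never needs to carry mixed-case variants.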

getText

public java.lang.String getText(java.lang.String filename,
                                java.io.File file)
                         throws java.io.IOException
Description copied from interface: TextExtractor
Retrieves text from a file. Because Jive stores attachments under a filename different from the 'real' filename, the real filename should be passed in. If parsing fails for any reason, this method returns null.

Specified by:
getText in interface TextExtractor
Parameters:
filename - the 'real' filename of the file to parse
file - the actual file to parse
Returns:
the plain text contained in the file, or null if parsing fails
Throws:
java.io.IOException - if an error occurs reading the file
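
The contract above has two failure paths: a null return for parse failures and an IOException for read errors. The sketch below shows one way a caller might handle both. FileTextSource is a hypothetical stand-in that only mirrors the getText(String, File) signature so the example compiles on its own; textOrEmpty and the stored path are likewise illustrative assumptions.

```java
import java.io.File;
import java.io.IOException;

class GetTextFileSketch {
    /** Hypothetical stand-in mirroring the getText(String, File) signature documented above. */
    interface FileTextSource {
        String getText(String filename, File file) throws IOException;
    }

    /**
     * Returns the extracted text, or an empty string when the extractor
     * signals a parse failure by returning null. IOException still propagates.
     */
    static String textOrEmpty(FileTextSource extractor, String realFilename, File storedFile)
            throws IOException {
        String text = extractor.getText(realFilename, storedFile);
        return text != null ? text : "";
    }

    public static void main(String[] args) throws IOException {
        // The stored name differs from the real name, as the description notes for attachments.
        File stored = new File("attachment-1234.bin"); // hypothetical stored path
        FileTextSource failing = (name, file) -> null; // simulates a parse failure
        System.out.println(textOrEmpty(failing, "report.doc", stored).isEmpty()); // true
    }
}
```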

getText

public java.lang.String getText(java.lang.String filename,
                                java.io.InputStream is)
                         throws java.io.IOException
Description copied from interface: TextExtractor
Retrieves text from a file. Because Jive stores attachments under a filename different from the 'real' filename, the real filename should be passed in. If parsing fails for any reason, this method returns null.

The stream should be closed once the data has been read. The InputStream is already buffered, so additional buffering offers no advantage.

Specified by:
getText in interface TextExtractor
Parameters:
filename - the 'real' filename of the file to parse
is - an InputStream from which to read the file's content
Returns:
the plain text contained in the file, or null if parsing fails
Throws:
java.io.IOException - if an error occurs reading the file
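
The notes above on closing the stream and skipping extra buffering can be sketched as follows. The text does not say which side closes the stream; this sketch closes it in the caller with try-with-resources, which is harmless if the extractor has already closed it. StreamTextSource is a hypothetical stand-in for the getText(String, InputStream) signature, and TrackingStream exists only to make the close observable.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class GetTextStreamSketch {
    /** Hypothetical stand-in mirroring the getText(String, InputStream) signature documented above. */
    interface StreamTextSource {
        String getText(String filename, InputStream is) throws IOException;
    }

    /** Passes the stream through unwrapped (no extra buffering) and closes it afterwards. */
    static String extractAndClose(StreamTextSource extractor, String realFilename, InputStream in)
            throws IOException {
        try (InputStream stream = in) {
            return extractor.getText(realFilename, stream);
        }
    }

    /** Demo-only stream that records whether close() was called. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in extractor that simply echoes the bytes it reads as characters.
        StreamTextSource echo = (name, is) -> {
            StringBuilder sb = new StringBuilder();
            int b;
            while ((b = is.read()) != -1) {
                sb.append((char) b);
            }
            return sb.toString();
        };
        TrackingStream in = new TrackingStream("hi".getBytes(StandardCharsets.US_ASCII));
        System.out.println(extractAndClose(echo, "report.doc", in)); // hi
        System.out.println(in.closed);                               // true
    }
}
```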

extractText

public java.lang.String extractText(java.io.InputStream in)
                             throws java.lang.Exception
Gets the text from a Word document.

Parameters:
in - The InputStream representing the Word file.
Throws:
java.lang.Exception
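
Because extractText declares the broad java.lang.Exception while the getText methods declare only IOException and return null on parse failure, a caller invoking extractText directly might want to apply the same convention. The sketch below is one way to do that. It uses java.util.concurrent.Callable as a stand-in for the extractText call; the helper name callOrNull is hypothetical, and this page does not say how getText relates to extractText internally.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

class ExtractOrNullSketch {
    /**
     * Runs an extraction that may throw any Exception. Read errors (IOException)
     * propagate; every other failure is reported as null, matching the
     * convention documented for getText.
     */
    static String callOrNull(Callable<String> extraction) throws IOException {
        try {
            return extraction.call();
        } catch (IOException e) {
            throw e; // read error: let the caller see it
        } catch (Exception e) {
            return null; // treated as a parse failure
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(callOrNull(() -> "document text"));
        System.out.println(callOrNull(() -> {
            throw new IllegalStateException("simulated parse failure");
        }));
    }
}
```

With the real class, the Callable would wrap a call such as extractor.extractText(in).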

removeControlCharacters

public static final java.lang.String removeControlCharacters(java.lang.String str)
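
This page gives no description for removeControlCharacters. As an illustration only, and not the Jive implementation, the sketch below shows one plausible reading of the name: drop ISO control characters from a string while keeping tab, line feed, and carriage return. The class name, the method name stripControlChars, and the choice of which characters to keep are all assumptions.

```java
class ControlCharSketch {
    /**
     * Returns a copy of str without ISO control characters, except that
     * tab, line feed, and carriage return are kept. Returns null for null input.
     */
    static String stripControlChars(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            boolean keptWhitespace = c == '\t' || c == '\n' || c == '\r';
            if (!Character.isISOControl(c) || keptWhitespace) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u0007 (BEL) is removed; the tab and newline survive.
        System.out.print(stripControlChars("a\u0007b\tc\n"));
    }
}
```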


Copyright © 1999-2006 Jive Software.