Jive Forums API (5.5.20.2-oracle) Developer Javadocs

com.jivesoftware.base.util
Interface TextExtractor

All Known Implementing Classes:
HtmlExtractor, PDFExtractor, PowerPointExtractor, WordExtractor

public interface TextExtractor

A text extractor is a class that knows how to retrieve plain text from one or more file types, such as PDF, HTML, Word, or Excel.
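
The following minimal sketch makes the contract concrete. It is not one of the shipped extractors. `PlainTextExtractor` is a hypothetical class for `.txt` files that assumes UTF-8 content. It compiles against a local copy of the interface, which keeps the raw `List` return type shown in this Javadoc, rather than against the real `com.jivesoftware.base.util.TextExtractor`. It uses `InputStream.readAllBytes()`, which requires Java 9 or later.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class PlainTextExtractorDemo {
    public static void main(String[] args) throws IOException {
        // The stored name deliberately differs from the "real" filename.
        File stored = File.createTempFile("attachment", ".bin");
        stored.deleteOnExit();
        Files.write(stored.toPath(), "hello".getBytes(StandardCharsets.UTF_8));
        TextExtractor extractor = new PlainTextExtractor();
        System.out.println(extractor.getText("notes.txt", stored)); // prints "hello"
    }
}

// Local mirror of the documented interface (assumption: used so the sketch is
// self-contained). The raw List matches the pre-generics signature above.
interface TextExtractor {
    String getName();
    String getDescription();
    List getSupportedFileTypes();
    String getText(String filename, File file) throws IOException;
    String getText(String filename, InputStream is) throws IOException;
}

// Hypothetical extractor for plain-text files; not part of the Jive API.
class PlainTextExtractor implements TextExtractor {
    public String getName() { return "Plain Text"; }

    public String getDescription() { return "Extracts text from .txt files (assumed UTF-8)."; }

    public List<String> getSupportedFileTypes() {
        // Lowercase extensions with a leading period, as the contract asks.
        return Arrays.asList(".txt");
    }

    public String getText(String filename, File file) throws IOException {
        checkSupported(filename); // reject unsupported types before opening the file
        try (InputStream in = new FileInputStream(file)) {
            return getText(filename, in);
        }
    }

    public String getText(String filename, InputStream is) throws IOException {
        checkSupported(filename);
        // Assumption: content is UTF-8; malformed bytes become U+FFFD instead of failing.
        return new String(is.readAllBytes(), StandardCharsets.UTF_8);
    }

    // The type is judged from the real filename, never from the stored file's name.
    private void checkSupported(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        for (String ext : getSupportedFileTypes()) {
            if (lower.endsWith(ext)) return;
        }
        throw new IllegalArgumentException("Unsupported file type: " + filename);
    }
}
```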


Method Summary
 java.lang.String getDescription()
          Returns the description for the extractor.
 java.lang.String getName()
          Returns the name of the extractor.
 java.util.List getSupportedFileTypes()
          Returns a list of file extensions that the extractor knows how to handle.
 java.lang.String getText(java.lang.String filename, java.io.File file)
          Retrieves text from a file.
 java.lang.String getText(java.lang.String filename, java.io.InputStream is)
          Retrieves text from a file.
 

Method Detail

getName

java.lang.String getName()
Returns the name of the extractor.


getDescription

java.lang.String getDescription()
Returns the description for the extractor.


getSupportedFileTypes

java.util.List getSupportedFileTypes()
Returns a list of file extensions that the extractor knows how to handle. For instance, an extractor that parses HTML documents would likely return '.html' and '.htm'. Extractors that implement this interface should return extensions in lowercase with a leading period.
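
Because extensions follow a fixed form, routing a filename to an extractor reduces to a map lookup. The sketch below shows one way an application might do this. `ExtractorRegistry`, `register`, `lookup`, and `extensionOf` are hypothetical names, not part of the Jive API, and extractor class names stand in for extractor instances to keep the example short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class ExtensionRegistryDemo {
    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("HtmlExtractor", Arrays.asList(".html", ".htm"));
        registry.register("PDFExtractor", Arrays.asList(".pdf"));
        System.out.println(registry.lookup("Quarterly Report.HTM")); // prints "HtmlExtractor"
        System.out.println(registry.lookup("archive.zip"));          // prints "null"
    }
}

// Hypothetical helper: maps extensions (lowercase, leading period) to an extractor name.
class ExtractorRegistry {
    private final Map<String, String> byExtension = new HashMap<>();

    // Records every extension an extractor reports; a later registration
    // for the same extension replaces the earlier one.
    void register(String extractorName, List<String> supportedFileTypes) {
        for (String ext : supportedFileTypes) {
            byExtension.put(ext, extractorName);
        }
    }

    // Returns the registered extractor name, or null if no extractor claims the extension.
    String lookup(String filename) {
        return byExtension.get(extensionOf(filename));
    }

    // Normalizes to the documented form: lowercase with a leading period,
    // or "" when the filename has no extension.
    static String extensionOf(String filename) {
        int dot = filename.lastIndexOf('.');
        return dot < 0 ? "" : filename.substring(dot).toLowerCase(Locale.ROOT);
    }
}
```

Normalizing with `extensionOf` is what lets `"Report.HTM"` and `"report.htm"` resolve to the same extractor; it only works if every extractor honors the lowercase, leading-period convention described above.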


getText

java.lang.String getText(java.lang.String filename,
                         java.io.File file)
                         throws java.io.IOException
Retrieves text from a file. Because Jive stores attachments under a filename different from the 'real' filename, the caller should pass in the real filename. If parsing fails for any reason, this method returns null.

Parameters:
filename - the 'real' filename of the file to parse
file - the actual file to parse
Returns:
the plain text contained in the file, or null if parsing fails
Throws:
java.io.IOException - if an error occurs reading the file
java.lang.IllegalArgumentException - if the filename is not of a type that the extractor can extract text from
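
The sketch below shows a caller that passes the real filename alongside a stored file with a generated name. This matters presumably because the extractor identifies the file type from the filename, not from the stored name. The caller treats both documented failure signals, a null result and IllegalArgumentException, as "no text available". `FileTextSource` is a trimmed, hypothetical stand-in for the interface that declares only the `File` overload; `TxtOnlyExtractor` and `AttachmentIndexer` are illustrative classes.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Locale;

class AttachmentIndexerDemo {
    public static void main(String[] args) throws IOException {
        // Stored under a generated name, as the Javadoc says Jive does for attachments.
        File stored = File.createTempFile("jive-attach-", ".bin");
        stored.deleteOnExit();
        Files.write(stored.toPath(), "Meeting notes".getBytes(StandardCharsets.UTF_8));

        FileTextSource extractor = new TxtOnlyExtractor();
        System.out.println(AttachmentIndexer.extract(extractor, "notes.txt", stored));  // prints "Meeting notes"
        System.out.println(AttachmentIndexer.extract(extractor, "slides.ppt", stored)); // prints "null"
    }
}

// Hypothetical, trimmed stand-in for TextExtractor: only the File overload.
interface FileTextSource {
    String getText(String filename, File file) throws IOException;
}

// Hypothetical stub: decides support from the real filename, ignoring the stored name.
class TxtOnlyExtractor implements FileTextSource {
    public String getText(String filename, File file) throws IOException {
        if (!filename.toLowerCase(Locale.ROOT).endsWith(".txt")) {
            throw new IllegalArgumentException("Unsupported type: " + filename);
        }
        return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    }
}

class AttachmentIndexer {
    // Passes the real filename (not stored.getName()) and returns null both when
    // parsing fails (extractor returns null) and when the type is unsupported.
    static String extract(FileTextSource extractor, String realFilename, File stored) throws IOException {
        try {
            return extractor.getText(realFilename, stored);
        } catch (IllegalArgumentException unsupported) {
            return null;
        }
    }
}
```

IOException is deliberately left to propagate, since it signals a read error rather than an unsupported or unparseable file.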

getText

java.lang.String getText(java.lang.String filename,
                         java.io.InputStream is)
                         throws java.io.IOException
Retrieves text from a file. Because Jive stores attachments under a filename different from the 'real' filename, the caller should pass in the real filename. If parsing fails for any reason, this method returns null.

The stream should be closed once all data has been read. The InputStream is already buffered, so wrapping it in an additional buffer offers no advantage.

Parameters:
filename - the 'real' filename of the file to parse
is - an input stream from which to read the file's content
Returns:
the plain text contained in the file, or null if parsing fails
Throws:
java.io.IOException - if an error occurs reading the file
java.lang.IllegalArgumentException - if the filename is not of a type that the extractor can extract text from
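
The following sketch shows both sides of the stream contract under stated assumptions. The Javadoc does not say explicitly whether the caller or the extractor closes the stream. Here the caller closes it with try-with-resources, which stays safe even if an extractor also closes it, because the `Closeable.close()` contract says closing an already-closed stream has no effect. The extractor reads the supplied stream directly instead of wrapping it in another `BufferedInputStream`. `StreamTextSource` and `TxtStreamExtractor` are hypothetical names, and the interface is trimmed to the `InputStream` overload.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class StreamExtractionDemo {
    public static void main(String[] args) throws IOException {
        byte[] upload = "Release checklist".getBytes(StandardCharsets.UTF_8);
        StreamTextSource extractor = new TxtStreamExtractor();
        // Simulates a caller handing over a stream that is already buffered.
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(upload))) {
            System.out.println(extractor.getText("checklist.txt", in)); // prints "Release checklist"
        } // try-with-resources closes the stream once reading is done
    }
}

// Hypothetical, trimmed stand-in for TextExtractor: only the InputStream overload.
interface StreamTextSource {
    String getText(String filename, InputStream is) throws IOException;
}

// Hypothetical stub extractor for ".txt" content (assumed UTF-8).
class TxtStreamExtractor implements StreamTextSource {
    public String getText(String filename, InputStream is) throws IOException {
        if (!filename.toLowerCase(Locale.ROOT).endsWith(".txt")) {
            throw new IllegalArgumentException("Unsupported type: " + filename);
        }
        // Read the supplied stream directly; per the Javadoc it is already
        // buffered, so it is not wrapped in another BufferedInputStream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = is.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```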


Copyright © 1999-2006 Jive Software.