Jive Forums API (5.5.20.2-oracle) Developer Javadocs
public interface TextExtractor
A text extractor is a class which knows how to retrieve plain text from different file types such as PDF, HTML, Word, Excel, etc.
Method Summary

Return type | Method | Description |
---|---|---|
java.lang.String | getDescription() | Returns the description for the extractor. |
java.lang.String | getName() | Returns the name of the extractor. |
java.util.List | getSupportedFileTypes() | Returns a list of file extensions that the extractor knows how to handle. |
java.lang.String | getText(java.lang.String filename, java.io.File file) | Retrieves text from a file. |
java.lang.String | getText(java.lang.String filename, java.io.InputStream is) | Retrieves text from a file. |
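
Because getSupportedFileTypes() reports the extensions an extractor handles, calling code can use it to choose an extractor for a given filename. The following is a minimal sketch of that idea, not part of the Jive Forums API: ExtractorRegistry and StubExtractor are hypothetical names, the TextExtractor declaration is a local stand-in that mirrors the signatures listed above, and the code assumes extensions are stored in lower case without a leading dot.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Local stand-in mirroring the documented signatures; the real interface
// ships with Jive Forums and its package is not shown on this page.
interface TextExtractor {
    String getName();
    String getDescription();
    @SuppressWarnings("rawtypes")
    List getSupportedFileTypes();
    String getText(String filename, File file) throws IOException;
    String getText(String filename, InputStream is) throws IOException;
}

// Hypothetical stub that only reports a name and its supported extensions.
class StubExtractor implements TextExtractor {
    private final String name;
    private final List<String> types;

    StubExtractor(String name, String... types) {
        this.name = name;
        this.types = Arrays.asList(types);
    }

    public String getName() { return name; }
    public String getDescription() { return name + " (stub)"; }
    public List<String> getSupportedFileTypes() { return types; }
    public String getText(String filename, File file) { return ""; }
    public String getText(String filename, InputStream is) { return ""; }
}

// Hypothetical helper: picks an extractor by file extension.
class ExtractorRegistry {
    private final List<TextExtractor> extractors = new ArrayList<>();

    void register(TextExtractor extractor) {
        extractors.add(extractor);
    }

    // Returns the first registered extractor whose supported types include
    // the filename's extension, or null if there is none.
    TextExtractor find(String filename) {
        int dot = filename.lastIndexOf('.');
        if (dot < 0) {
            return null; // no extension to match on
        }
        // Assumption: extensions are lower case with no leading dot.
        String ext = filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (TextExtractor extractor : extractors) {
            if (extractor.getSupportedFileTypes().contains(ext)) {
                return extractor;
            }
        }
        return null;
    }
}

class RegistryDemo {
    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register(new StubExtractor("PDF", "pdf"));
        registry.register(new StubExtractor("Word", "doc", "rtf"));
        System.out.println(registry.find("Report.DOC").getName()); // prints "Word"
        System.out.println(registry.find("archive.zip"));          // prints "null"
    }
}
```

Returning null for an unknown extension lets the caller skip the file before getText is invoked, rather than relying on the IllegalArgumentException that getText is documented to throw.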
Method Detail

java.lang.String getName()

java.lang.String getDescription()

java.util.List getSupportedFileTypes()

java.lang.String getText(java.lang.String filename, java.io.File file) throws java.io.IOException
Parameters:
filename - the 'real' filename of the file to parse
file - the actual file to parse
Throws:
java.io.IOException - if an error occurs reading the file
java.lang.IllegalArgumentException - if the filename is not of a type that the extractor can extract text from

java.lang.String getText(java.lang.String filename, java.io.InputStream is) throws java.io.IOException
The stream should be closed once all data has been read. The InputStream is already buffered, so wrapping it in additional buffering offers no advantage.
Parameters:
filename - the 'real' filename of the file to parse
is - an input stream from which to read the file's content
Throws:
java.io.IOException - if an error occurs reading the file
java.lang.IllegalArgumentException - if the filename is not of a type that the extractor can extract text from
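
To make the contract above concrete, here is a minimal sketch of an implementation for plain-text files. It is an illustration under stated assumptions, not code from Jive Forums: PlainTextExtractor is a hypothetical class, the TextExtractor declaration is a local stand-in that mirrors the documented signatures, and the sketch assumes UTF-8 content and extensions listed without a leading dot. It also reads the stream-closing note as an instruction to the implementer, which this page does not state outright.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Local stand-in mirroring the documented signatures; the real interface
// ships with Jive Forums and its package is not shown on this page.
interface TextExtractor {
    String getName();
    String getDescription();
    @SuppressWarnings("rawtypes")
    List getSupportedFileTypes();
    String getText(String filename, File file) throws IOException;
    String getText(String filename, InputStream is) throws IOException;
}

// Hypothetical extractor for plain-text files.
class PlainTextExtractor implements TextExtractor {
    // Assumption: extensions are listed without a leading dot.
    private static final List<String> TYPES = Arrays.asList("txt", "text");

    public String getName() { return "Plain text"; }
    public String getDescription() { return "Extracts text from plain-text files."; }
    public List<String> getSupportedFileTypes() { return TYPES; }

    public String getText(String filename, File file) throws IOException {
        // Reject unsupported types before opening the file.
        checkSupported(filename);
        return getText(filename, new FileInputStream(file));
    }

    public String getText(String filename, InputStream is) throws IOException {
        // try-with-resources closes the stream once reading is complete,
        // including when the filename is rejected or a read fails.
        try (InputStream in = is) {
            checkSupported(filename);
            // No BufferedInputStream wrapper: the docs say the stream
            // passed in is already buffered.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // Assumption: the content is UTF-8 encoded.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Throws IllegalArgumentException when the extension is not supported.
    private static void checkSupported(String filename) {
        int dot = filename.lastIndexOf('.');
        String ext = dot < 0 ? "" : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (!TYPES.contains(ext)) {
            throw new IllegalArgumentException("Unsupported file type: " + filename);
        }
    }
}

class ExtractorDemo {
    public static void main(String[] args) throws IOException {
        TextExtractor extractor = new PlainTextExtractor();
        InputStream is = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(extractor.getText("notes.txt", is)); // prints "hello"
    }
}
```

The File overload delegates to the InputStream overload so that the reading and closing logic lives in one place.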