Jive Forums API (5.5.20.2-oracle) Developer Javadocs
java.lang.Object
  com.jivesoftware.base.util.HtmlExtractor
All Implemented Interfaces:
TextExtractor

public class HtmlExtractor
extends java.lang.Object
A parser for HTML. Parses the HTML, extracts all text, and discards everything else, such as comments and tags.
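The behaviour described above (keep text, drop comments and tags) can be illustrated with a small character scanner. This is a minimal sketch, not Jive's implementation: the class name HtmlTextSketch and the method extractText are hypothetical, and it assumes that tags and comments are removed without inserting separators, that character entities such as &amp; are left undecoded, and that the contents of script and style elements are treated as ordinary text.

```java
// Hypothetical sketch of HTML text extraction; not Jive's HtmlExtractor.
public final class HtmlTextSketch {

    /** Returns the text of html with comments and tags removed and whitespace collapsed. */
    public static String extractText(String html) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = html.length();
        while (i < n) {
            if (html.startsWith("<!--", i)) {
                // Skip a comment; an unterminated comment swallows the rest of the input.
                int end = html.indexOf("-->", i + 4);
                i = (end < 0) ? n : end + 3;
            } else if (html.charAt(i) == '<') {
                // Skip a tag up to the next '>'; an unterminated tag swallows the rest.
                // Assumption: no separator is inserted, so "<b>x</b>" contributes just "x".
                int end = html.indexOf('>', i + 1);
                i = (end < 0) ? n : end + 1;
            } else {
                // Ordinary character data is kept; entities are not decoded.
                out.append(html.charAt(i));
                i++;
            }
        }
        // Collapse runs of whitespace into single spaces and trim the ends.
        return out.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><!-- nav --><body><p>Hello, <b>world</b>!</p></body></html>";
        System.out.println(extractText(html));
    }
}
```

Running main prints "Hello, world!". Dropping tags without a separator keeps inline markup such as <b> from splitting words, at the cost of gluing together text from adjacent block elements like "<p>a</p><p>b</p>".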
Field Summary | |
---|---|
protected java.io.BufferedReader | _reader |
protected java.lang.String | parsedText |
Constructor Summary | |
---|---|
HtmlExtractor() | Constructor |
Method Summary | | |
---|---|---|
java.lang.String | getDescription() | Returns the description for the extractor. |
java.lang.String | getName() | Returns the name of the extractor. |
java.util.List | getSupportedFileTypes() | Returns a list of file extensions that the extractor knows how to handle. |
java.lang.String | getText(java.lang.String htmlContent) | |
java.lang.String | getText(java.lang.String filename, java.io.File file) | Retrieves text from a file. |
java.lang.String | getText(java.lang.String filename, java.io.InputStream is) | Retrieves text from a file. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected java.io.BufferedReader _reader
protected java.lang.String parsedText
Constructor Detail |
---|
public HtmlExtractor()
Method Detail |
---|
public java.lang.String getName()
Description copied from interface: TextExtractor
Returns the name of the extractor.
Specified by:
getName in interface TextExtractor
public java.lang.String getDescription()
Description copied from interface: TextExtractor
Returns the description for the extractor.
Specified by:
getDescription in interface TextExtractor
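public java.util.List getSupportedFileTypes()
Description copied from interface: TextExtractor
Returns a list of file extensions that the extractor knows how to handle.
Specified by:
getSupportedFileTypes in interface TextExtractor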
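Because getSupportedFileTypes() advertises file extensions, a caller can pick an extractor for a given filename. The following is a minimal sketch of such a lookup, not part of the Jive API: SimpleTextExtractor and ExtractorLookup are hypothetical stand-ins, the sketch uses List<String> where the real method returns a raw java.util.List, and it assumes extensions are stored in lower case without a leading dot (this page does not say which form is used).

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

// Hypothetical sketch of extension-based extractor selection; not the Jive API.
public final class ExtractorLookup {

    /** Hypothetical stand-in for the TextExtractor methods listed on this page. */
    interface SimpleTextExtractor {
        String getName();
        String getDescription();
        List<String> getSupportedFileTypes(); // assumed form: "html", not ".html"
    }

    /** Returns the text after the last '.', lower-cased, or "" if there is none. */
    static String extensionOf(String filename) {
        int dot = filename.lastIndexOf('.');
        return (dot < 0 || dot == filename.length() - 1)
                ? ""
                : filename.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Picks the first extractor whose supported types contain the file's extension. */
    static Optional<SimpleTextExtractor> find(List<SimpleTextExtractor> extractors, String filename) {
        String ext = extensionOf(filename);
        return extractors.stream()
                .filter(e -> e.getSupportedFileTypes().contains(ext))
                .findFirst();
    }

    public static void main(String[] args) {
        SimpleTextExtractor html = new SimpleTextExtractor() {
            public String getName() { return "HTML"; }
            public String getDescription() { return "Extracts text from HTML files"; }
            public List<String> getSupportedFileTypes() { return List.of("htm", "html"); }
        };
        List<SimpleTextExtractor> all = List.of(html);
        System.out.println(find(all, "Index.HTML").map(SimpleTextExtractor::getName).orElse("none"));
        System.out.println(find(all, "notes.txt").map(SimpleTextExtractor::getName).orElse("none"));
    }
}
```

Running main prints "HTML" and then "none". Lower-casing the extension before the lookup makes "Index.HTML" and "index.html" resolve to the same extractor.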
public java.lang.String getText(java.lang.String filename, java.io.File file) throws java.io.IOException
Description copied from interface: TextExtractor
Retrieves text from a file.
Specified by:
getText in interface TextExtractor
Parameters:
filename - the 'real' filename of the file to parse
file - the actual file to parse
Throws:
java.io.IOException - if an error occurs reading the file

public java.lang.String getText(java.lang.String filename, java.io.InputStream is) throws java.io.IOException
Description copied from interface: TextExtractor
Retrieves text from a file. The stream should be closed once all of its data has been read. The InputStream will already be buffered, so additional buffering offers no advantage.
Specified by:
getText in interface TextExtractor
Parameters:
filename - the 'real' filename of the file to parse
is - an input stream from which to read the file's content
Throws:
java.io.IOException - if an error occurs reading the file

public java.lang.String getText(java.lang.String htmlContent)
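The stream contract stated for getText(String, InputStream) (close the stream when done, do not add buffering) can be sketched as follows. This is a hypothetical illustration, not Jive's code: StreamContractSketch is an invented name, it assumes the extractor itself (rather than the caller) closes the stream, it assumes UTF-8 content, and it returns the raw text instead of stripping markup. The File overload opens a buffered stream and delegates, which is one plausible way to satisfy the "already buffered" guarantee.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the getText overloads' stream handling; not Jive's code.
public final class StreamContractSketch {

    /** Mirrors getText(filename, InputStream): reads everything, then closes the stream. */
    static String getText(String filename, InputStream is) throws IOException {
        // Closing the reader also closes the wrapped stream, even if reading fails.
        // No BufferedReader wrapper, since the stream is documented as already buffered.
        // Assumption: the content is UTF-8.
        try (Reader reader = new InputStreamReader(is, StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString(); // a real extractor would strip markup here
        }
    }

    /** Mirrors getText(filename, File): opens a buffered stream and delegates. */
    static String getText(String filename, File file) throws IOException {
        return getText(filename, new BufferedInputStream(new FileInputStream(file)));
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("<p>hi</p>".getBytes(StandardCharsets.UTF_8));
        System.out.println(getText("page.html", in));
    }
}
```

Using try-with-resources places the close in the same method that does the reading, so callers of the File overload never hold an open stream.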