Jive Forums API (5.5.20.2-oracle) Developer Javadocs

com.jivesoftware.base.util
Class HtmlExtractor

java.lang.Object
  extended by com.jivesoftware.base.util.HtmlExtractor
All Implemented Interfaces:
TextExtractor

public class HtmlExtractor
extends java.lang.Object
implements TextExtractor

A parser for HTML. Parses the HTML, extracts all text, and discards everything else, such as comments and tags.
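The general technique described above (keep character data, drop tags and comments) can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not Jive's implementation. The class name HtmlTextSketch is hypothetical. Entities such as &amp; are not decoded, the contents of script and style elements are kept as text, and whitespace runs are collapsed to single spaces.

```java
// Hypothetical sketch of HTML text extraction; NOT Jive's HtmlExtractor code.
// Assumptions: no entity decoding, no special handling of <script>/<style>,
// whitespace runs collapsed to one space, result trimmed.
final class HtmlTextSketch {
    private HtmlTextSketch() {}

    static String extract(String html) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        int n = html.length();
        while (i < n) {
            if (html.startsWith("<!--", i)) {
                // Skip the whole comment (checked first, so a '>' inside it is harmless).
                // An unterminated comment swallows the rest of the input.
                int end = html.indexOf("-->", i + 4);
                i = (end < 0) ? n : end + 3;
            } else if (html.charAt(i) == '<') {
                // Skip a tag such as <p>, </b> or <br/>.
                // An unterminated '<' drops the remainder of the input.
                int end = html.indexOf('>', i + 1);
                i = (end < 0) ? n : end + 1;
                // Treat every tag as a word boundary: this separates <p>one</p><p>two</p>,
                // but also splits a word around an inline tag, e.g. wo<b>rd</b>.
                out.append(' ');
            } else {
                // Ordinary character data is kept.
                out.append(html.charAt(i));
                i++;
            }
        }
        // Collapse whitespace runs and trim the ends.
        return out.toString().replaceAll("\\s+", " ").trim();
    }
}
```

For example, HtmlTextSketch.extract("<p>Hello <b>world</b><!-- hidden --></p>") returns "Hello world". A production extractor would normally use a real HTML tokenizer rather than this character scan.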


Field Summary
protected  java.io.BufferedReader _reader
           
protected  java.lang.String parsedText
           
 
Constructor Summary
HtmlExtractor()
          Constructor
 
Method Summary
 java.lang.String getDescription()
          Returns the description for the extractor.
 java.lang.String getName()
          Returns the name of the extractor.
 java.util.List getSupportedFileTypes()
          Returns a list of file extensions that the extractor knows how to handle.
 java.lang.String getText(java.lang.String htmlContent)
           
 java.lang.String getText(java.lang.String filename, java.io.File file)
          Retrieves text from a file.
 java.lang.String getText(java.lang.String filename, java.io.InputStream is)
          Retrieves text from a file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

_reader

protected java.io.BufferedReader _reader

parsedText

protected java.lang.String parsedText
Constructor Detail

HtmlExtractor

public HtmlExtractor()
Constructor

Method Detail

getName

public java.lang.String getName()
Description copied from interface: TextExtractor
Returns the name of the extractor.

Specified by:
getName in interface TextExtractor

getDescription

public java.lang.String getDescription()
Description copied from interface: TextExtractor
Returns the description for the extractor.

Specified by:
getDescription in interface TextExtractor

getSupportedFileTypes

public java.util.List getSupportedFileTypes()
Description copied from interface: TextExtractor
Returns a list of file extensions that the extractor knows how to handle. For instance, an extractor that knows how to parse HTML documents would likely return '.html' and '.htm'. Extractors that implement this interface should return extensions in lowercase with a leading period.

Specified by:
getSupportedFileTypes in interface TextExtractor
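Because extensions are returned in lowercase with a leading period, a caller can match a filename against getSupportedFileTypes() with a plain list lookup. The helper below is a hypothetical sketch, not part of the Jive API; it assumes the argument is a bare filename (such as the 'real' attachment name), not a full path.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical helper illustrating the getSupportedFileTypes() convention:
// extensions are lowercase and include the leading period (e.g. ".html").
final class ExtensionMatchSketch {
    private ExtensionMatchSketch() {}

    /** Returns the lowercase extension with its leading period, or "" if there is none. */
    static String extensionOf(String filename) {
        // Assumes a bare filename; a dot inside a directory name would confuse this.
        int dot = filename.lastIndexOf('.');
        // No dot, or a trailing dot: no usable extension.
        if (dot < 0 || dot == filename.length() - 1) {
            return "";
        }
        // Normalize with a fixed locale so ".HTML" reliably becomes ".html".
        return filename.substring(dot).toLowerCase(Locale.ROOT);
    }

    /** True if the filename's extension appears in the supported list. */
    static boolean supports(List<String> supportedTypes, String filename) {
        String ext = extensionOf(filename);
        return !ext.isEmpty() && supportedTypes.contains(ext);
    }
}
```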

getText

public java.lang.String getText(java.lang.String filename,
                                java.io.File file)
                         throws java.io.IOException
Description copied from interface: TextExtractor
Retrieves text from a file. Because Jive stores attachments under a filename different from the 'real' filename, the real filename should be passed in. If the parser fails to parse the text for any reason, this method returns null.

Specified by:
getText in interface TextExtractor
Parameters:
filename - the 'real' filename of the file to parse
file - the actual file to parse
Returns:
the plain text contained in the file, or null if parsing fails
Throws:
java.io.IOException - if an error occurs reading the file
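Since the stored file and the 'real' filename differ, a caller passes both and must handle a null result. The sketch below is hypothetical: FileTextExtractor is a reduced stand-in for TextExtractor declared only for this example, PlainTextOnly is a toy implementation, and mapping null to an empty string is one possible caller policy, not documented Jive behavior.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Locale;

// Hypothetical stand-in for TextExtractor, reduced to the one method used here.
interface FileTextExtractor {
    String getText(String filename, File file) throws IOException;
}

// Toy extractor: decides by the REAL filename (not the stored name) and returns
// null for unsupported input, mirroring the "null if parsing fails" contract.
final class PlainTextOnly implements FileTextExtractor {
    @Override
    public String getText(String filename, File file) throws IOException {
        if (!filename.toLowerCase(Locale.ROOT).endsWith(".txt")) {
            return null;
        }
        // Assumes UTF-8 content for this example.
        return new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    }
}

final class AttachmentTextSketch {
    private AttachmentTextSketch() {}

    /** Extracts text from a stored attachment; a null result is mapped to "". */
    static String textFor(FileTextExtractor extractor, String realName, File stored)
            throws IOException {
        String text = extractor.getText(realName, stored);
        return (text == null) ? "" : text;
    }
}
```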

getText

public java.lang.String getText(java.lang.String filename,
                                java.io.InputStream is)
                         throws java.io.IOException
Description copied from interface: TextExtractor
Retrieves text from a file. Because Jive stores attachments under a filename different from the 'real' filename, the real filename should be passed in. If the parser fails to parse the text for any reason, this method returns null.

The stream should be closed once all of the data has been read. The InputStream is already buffered, so wrapping it in additional buffering offers no advantage.

Specified by:
getText in interface TextExtractor
Parameters:
filename - the 'real' filename of the file to parse
is - an input stream from which to read the file's content
Returns:
the plain text contained in the file, or null if parsing fails
Throws:
java.io.IOException - if an error occurs reading the file
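Read as guidance for implementers (the text does not state explicitly which side closes the stream), the stream contract above amounts to: consume the data, close the stream when done, and skip extra buffering. A minimal sketch of that pattern follows; the class name PlainStreamExtractor and the UTF-8 decoding are assumptions for illustration, and the filename parameter is unused here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical implementer-side sketch of the getText(String, InputStream) contract.
final class PlainStreamExtractor {
    String getText(String filename, InputStream is) throws IOException {
        // try-with-resources closes the stream once reading completes (or fails).
        try (InputStream in = is) {
            // Read directly: the stream is documented as already buffered,
            // so no BufferedInputStream wrapper is added.
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            // Assumes UTF-8; malformed bytes are replaced rather than rejected.
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```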

getText

public java.lang.String getText(java.lang.String htmlContent)


Copyright © 1999-2006 Jive Software.