org.apache.nutch.parse
Interface HtmlParseFilter

All Superinterfaces:
Configurable, Pluggable

Deprecated. Extension point for DOM-based HTML parsers. Allows plugins to add metadata to, or otherwise modify, the result of parsing an HTML page. All plugins that implement this extension point are run sequentially on the parse.

public interface HtmlParseFilter
extends Pluggable, Configurable
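The sequential behavior described above can be pictured as a simple chain. This is an illustrative sketch, not Nutch's own code: `SimpleParse`, `SimpleHtmlParseFilter` and `SimpleHtmlParseFilterChain` are hypothetical stand-ins, the `Content` and `HTMLMetaTags` arguments are dropped, and it assumes that each filter receives the `Parse` returned by the filter before it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.w3c.dom.DocumentFragment;

/** Hypothetical stand-in for Nutch's Parse: extracted text plus metadata. */
class SimpleParse {
    final String text;
    final Map<String, String> metadata = new LinkedHashMap<>();

    SimpleParse(String text) {
        this.text = text;
    }
}

/** Hypothetical, simplified form of the extension point's filter method. */
interface SimpleHtmlParseFilter {
    SimpleParse filter(SimpleParse parse, DocumentFragment doc);
}

/** Runs every registered filter in order (assumed behavior). */
class SimpleHtmlParseFilterChain {
    private final List<SimpleHtmlParseFilter> filters = new ArrayList<>();

    void add(SimpleHtmlParseFilter filter) {
        filters.add(filter);
    }

    SimpleParse run(SimpleParse parse, DocumentFragment doc) {
        // Each filter sees the parse produced by the previous one,
        // so a filter may modify it in place or replace it entirely.
        for (SimpleHtmlParseFilter filter : filters) {
            parse = filter.filter(parse, doc);
        }
        return parse;
    }
}
```

Under this assumption, registration order matters whenever two filters read or write the same metadata key: a later filter can build on, or overwrite, what an earlier one stored.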


Field Summary
static String X_POINT_ID
          Deprecated. The name of the extension point.
 
Method Summary
 Parse filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc)
          Deprecated. Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
 
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
 

Field Detail

X_POINT_ID

static final String X_POINT_ID
Deprecated. 
The name of the extension point.
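This page does not show the constant's value. A common convention in Nutch is to set an extension point's ID to the fully qualified name of its interface, which here would presumably give `org.apache.nutch.parse.HtmlParseFilter`. The sketch below assumes that convention and uses a hypothetical interface name, `ExampleParseFilter`.

```java
/** Hypothetical extension-point interface illustrating the assumed ID convention. */
interface ExampleParseFilter {
    // Fields declared in an interface are implicitly public static final,
    // which is why the field is listed as "static final" above.
    // Assumed convention: the ID is the interface's fully qualified class name.
    String X_POINT_ID = ExampleParseFilter.class.getName();
}
```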

Method Detail

filter

Parse filter(Content content,
             Parse parse,
             HTMLMetaTags metaTags,
             DocumentFragment doc)
Deprecated. 
Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page.
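As an illustration of the kind of work `filter` might do, the sketch below walks the `DocumentFragment` depth-first and records the first `<h1>` heading and the number of `<a>` elements as metadata. It is a self-contained sketch under stated assumptions: `PageParse` and `HeadingAndLinkFilter` are hypothetical stand-ins, the `Content` and `HTMLMetaTags` arguments are omitted, and the metadata keys `h1` and `linkCount` are invented for the example. A real plugin would implement `HtmlParseFilter` itself and return the (possibly modified) `Parse`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical stand-in for Nutch's Parse: text plus a metadata map. */
class PageParse {
    final String text;
    final Map<String, String> metadata = new LinkedHashMap<>();

    PageParse(String text) {
        this.text = text;
    }
}

/** Hypothetical filter: records the first <h1> text and the count of <a> elements. */
class HeadingAndLinkFilter {
    private String firstH1;
    private int linkCount;

    PageParse filter(PageParse parse, DocumentFragment doc) {
        firstH1 = null;
        linkCount = 0;
        walk(doc);
        if (firstH1 != null) {
            parse.metadata.put("h1", firstH1);
        }
        parse.metadata.put("linkCount", Integer.toString(linkCount));
        return parse; // modified in place and returned, as the signature suggests
    }

    // Depth-first traversal of every node below 'node'.
    private void walk(Node node) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            String name = node.getNodeName().toLowerCase(Locale.ROOT);
            if (name.equals("h1") && firstH1 == null) {
                firstH1 = node.getTextContent().trim();
            } else if (name.equals("a")) {
                linkCount++;
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i));
        }
    }

    /** Helper for trying the filter: wraps well-formed XHTML in a DocumentFragment. */
    static DocumentFragment fragmentFromXml(String xml) {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            DocumentFragment frag = d.createDocumentFragment();
            frag.appendChild(d.getDocumentElement().cloneNode(true));
            return frag;
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
    }
}
```

Walking the tree once and collecting both values in a single pass keeps the cost of the filter linear in the size of the page, which matters because every configured filter runs on every parsed page.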



Copyright © 2007, 2012, Oracle and/or its affiliates. All rights reserved.