Package org.apache.nutch.parse

Interface Summary

Interface        Description
HtmlParseFilter  Deprecated. Extension point for DOM-based HTML parsers.
Parse            The result of parsing a page's raw content.
ParseFilter
Parser           A parser for content generated by a Protocol implementation.
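
The division of labor among these interfaces can be sketched in self-contained Java. A Parser turns raw content into a Parse. The sketch also assumes, as the name suggests, that a ParseFilter post-processes that result, with filters run one after another in a fixed order. This is not Nutch's API. The names SimpleParser, SimpleParse, SimpleParseFilter, parseAndFilter, plainTextParser and wordCountFilter are hypothetical, and the real interfaces work with Nutch's own content and metadata types. The record requires Java 16 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ParseContractSketch {

    /** Hypothetical stand-in for Parse: extracted text plus simple metadata. */
    record SimpleParse(String text, Map<String, String> data) {}

    /** Hypothetical stand-in for Parser: raw content in, parse result out. */
    interface SimpleParser {
        SimpleParse parse(byte[] content, String contentType);
    }

    /** Hypothetical stand-in for ParseFilter: post-processes a parse result. */
    interface SimpleParseFilter {
        SimpleParse filter(SimpleParse parse);
    }

    /** Runs the parser once, then passes the result through each filter in order. */
    static SimpleParse parseAndFilter(SimpleParser parser, List<SimpleParseFilter> filters,
                                      byte[] content, String contentType) {
        SimpleParse result = parser.parse(content, contentType);
        for (SimpleParseFilter filter : filters) {
            result = filter.filter(result);
        }
        return result;
    }

    /** A trivial parser that decodes bytes as UTF-8 and records the content type. */
    static SimpleParser plainTextParser() {
        return (content, contentType) -> {
            Map<String, String> data = new HashMap<>();
            data.put("contentType", contentType);
            return new SimpleParse(new String(content, StandardCharsets.UTF_8), data);
        };
    }

    /** A filter that adds a whitespace-delimited word count to the metadata. */
    static SimpleParseFilter wordCountFilter() {
        return parse -> {
            Map<String, String> data = new HashMap<>(parse.data());
            String trimmed = parse.text().trim();
            int words = trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
            data.put("wordCount", Integer.toString(words));
            return new SimpleParse(parse.text(), data);
        };
    }

    public static void main(String[] args) {
        SimpleParse result = parseAndFilter(plainTextParser(), List.of(wordCountFilter()),
                "hello nutch parse".getBytes(StandardCharsets.UTF_8), "text/plain");
        System.out.println(result.text() + " / words=" + result.data().get("wordCount"));
    }
}
```

Each filter returns a new result instead of mutating the previous one, so the order of the filter list is the only coupling between filters.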

Class Summary

Class             Description
HTMLMetaTags      Holds information about the HTML "meta" tags extracted from a page.
HtmlParseFilters  Deprecated. Creates and caches plugins that implement HtmlParseFilter.
Outlink
OutlinkExtractor  Extracts Outlinks (URLs) from plain text using regular expressions.
ParseData         Data extracted from a page's content.
ParseFilters
ParseImpl         The result of parsing a page's raw content.
ParserFactory     Creates and caches Parser plugins.
ParseStatus
ParseText
ParseUtil         A utility class with methods that simplify common parsing tasks, such as iterating through a preferred list of Parsers to obtain Parse objects.
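
OutlinkExtractor is described as pulling URLs out of plain text with regular expressions. The sketch below shows that technique in plain Java using java.util.regex. The class name, the extractUrls method, and the deliberately simple pattern are illustrative assumptions, not the pattern Nutch itself uses. The pattern covers only http, https and ftp URLs and trims trailing sentence punctuation.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlainTextLinkSketch {

    // Hypothetical, deliberately simple pattern: an http, https or ftp scheme
    // followed by characters that are neither whitespace nor common delimiters.
    private static final Pattern URL_PATTERN =
            Pattern.compile("\\b(?:https?|ftp)://[^\\s<>\"'()]+", Pattern.CASE_INSENSITIVE);

    /** Returns the distinct URLs found in plain text, in order of first appearance. */
    static List<String> extractUrls(String text) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = URL_PATTERN.matcher(text);
        while (m.find()) {
            // Trim sentence punctuation that the pattern may have swallowed at the end.
            String url = m.group().replaceAll("[.,;:!?]+$", "");
            found.add(url);
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        String text = "See https://nutch.apache.org/ and ftp://example.org/file.txt. "
                + "Also https://nutch.apache.org/ again.";
        System.out.println(extractUrls(text));
    }
}
```

The LinkedHashSet drops duplicate URLs while keeping their first-seen order. A production extractor would also need to handle cases this pattern ignores, such as bare hostnames like www.example.org.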
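
As an illustration of the kind of information HTMLMetaTags holds, the sketch below reads the content attribute of a robots meta tag (for example content="noindex, nofollow") into two boolean flags. The class, field and method names are hypothetical. Treating "none" as both noindex and nofollow follows the common robots meta convention and is not stated in this package summary. The real class may also track other meta tag data. The switch syntax requires Java 14 or later.

```java
import java.util.Locale;

public class RobotsMetaSketch {

    /** Hypothetical holder for robots directives, in the spirit of HTMLMetaTags. */
    static final class RobotsDirectives {
        boolean noIndex;
        boolean noFollow;
    }

    /** Parses a robots meta content value such as "noindex, nofollow" (case-insensitive). */
    static RobotsDirectives parseRobotsContent(String content) {
        RobotsDirectives d = new RobotsDirectives();
        for (String token : content.toLowerCase(Locale.ROOT).split(",")) {
            switch (token.trim()) {
                case "none" -> { d.noIndex = true; d.noFollow = true; }
                case "noindex" -> d.noIndex = true;
                case "nofollow" -> d.noFollow = true;
                default -> { } // ignore directives this sketch does not model
            }
        }
        return d;
    }

    public static void main(String[] args) {
        RobotsDirectives d = parseRobotsContent("NOINDEX, follow");
        System.out.println("noIndex=" + d.noIndex + " noFollow=" + d.noFollow);
    }
}
```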

Exception Summary

Exception         Description
ParseException
ParserNotFound
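
ParserFactory (creates and caches Parser plugins), ParseUtil (iterates through a preferred list of Parsers) and ParserNotFound fit together roughly as follows. The flow looks up the preferred parsers for a content type, builds each parser at most once, tries them in order, and fails distinctly when nothing is registered. The last step assumes, as its name suggests, that ParserNotFound signals that no parser is available. This is a self-contained sketch under stated assumptions, not Nutch's implementation. TextParser, the id-to-parser Function, both exception classes and the demoFactory wiring are hypothetical. The exceptions are unchecked only to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ParserFallbackSketch {

    /** Hypothetical parser contract: returns extracted text or throws on failure. */
    interface TextParser {
        String parse(byte[] content) throws Exception;
    }

    /** Hypothetical analogue of ParserNotFound: no parser is registered for a type. */
    static class NoParserForTypeException extends RuntimeException {
        NoParserForTypeException(String contentType) {
            super("No parser registered for " + contentType);
        }
    }

    /** Hypothetical analogue of a ParseException raised when every candidate fails. */
    static class AllParsersFailedException extends RuntimeException {
        AllParsersFailedException(String message) {
            super(message);
        }
    }

    private final Map<String, List<String>> preferredIdsByType; // content type -> parser ids, best first
    private final Function<String, TextParser> instantiate;     // builds a parser from its id
    private final Map<String, TextParser> cache = new ConcurrentHashMap<>();

    ParserFallbackSketch(Map<String, List<String>> preferredIdsByType,
                         Function<String, TextParser> instantiate) {
        this.preferredIdsByType = preferredIdsByType;
        this.instantiate = instantiate;
    }

    /** Factory step: returns parsers for a content type, best first, creating each at most once. */
    List<TextParser> getParsers(String contentType) {
        List<String> ids = preferredIdsByType.get(contentType);
        if (ids == null || ids.isEmpty()) {
            throw new NoParserForTypeException(contentType);
        }
        List<TextParser> parsers = new ArrayList<>();
        for (String id : ids) {
            parsers.add(cache.computeIfAbsent(id, instantiate));
        }
        return parsers;
    }

    /** Utility step: tries each preferred parser in order and returns the first success. */
    String parse(String contentType, byte[] content) {
        List<String> failures = new ArrayList<>();
        for (TextParser parser : getParsers(contentType)) {
            try {
                return parser.parse(content);
            } catch (Exception e) {
                failures.add(e.getMessage());
            }
        }
        throw new AllParsersFailedException("All parsers failed for " + contentType + ": " + failures);
    }

    /** Demo wiring: a strict parser that always fails, then a lenient tag-stripping one. */
    static ParserFallbackSketch demoFactory() {
        Map<String, List<String>> prefs = Map.of("text/html", List.of("strict-html", "lenient-html"));
        Function<String, TextParser> make = id -> {
            if (id.equals("strict-html")) {
                return content -> {
                    throw new Exception("strict parser rejected input");
                };
            }
            return content -> new String(content, StandardCharsets.UTF_8).replaceAll("<[^>]*>", "");
        };
        return new ParserFallbackSketch(prefs, make);
    }

    public static void main(String[] args) {
        ParserFallbackSketch factory = demoFactory();
        byte[] html = "<p>Hello <b>Nutch</b></p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(factory.parse("text/html", html)); // falls back to lenient-html
    }
}
```

ConcurrentHashMap.computeIfAbsent provides the build-once caching. Because the cache is keyed by parser id rather than content type, two content types that prefer the same parser share one instance.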

Copyright © 2007, 2012, Oracle and/or its affiliates. All rights reserved.