org.apache.nutch.parse
Class ParseUtil

java.lang.Object
  extended by org.apache.nutch.parse.ParseUtil

public class ParseUtil
extends Object

A utility class with convenience methods for parsing, such as iterating through a preferred list of Parsers to obtain Parse objects.

Author:
mattmann, Jérôme Charron, Sébastien Le Callonnec

Field Summary
static org.apache.log4j.Logger mLogger
           
 
Constructor Summary
ParseUtil(Configuration conf)
           
 
Method Summary
 boolean isParserRegistered(String extensionId)
           
 Parse parse(Content content)
          Performs a parse by iterating through a list of preferred Parsers until one succeeds and a Parse object is returned.
 Parse parseByExtensionId(String extId, Content content)
          Parses a Content object using the Parser identified by extId, the Parser's extension ID.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

mLogger

public static final org.apache.log4j.Logger mLogger
Constructor Detail

ParseUtil

public ParseUtil(Configuration conf)
Parameters:
conf - The Configuration used to initialize this ParseUtil.
Method Detail

parse

public Parse parse(Content content)
            throws ParseException
Performs a parse by iterating through a list of preferred Parsers until one parses the content successfully, and returns the resulting Parse object. If the parse is unsuccessful, a message is logged at the WARNING level and an empty parse is returned.

Parameters:
content - The content to parse.
Returns:
A Parse object containing the parsed data.
Throws:
ParseException - If no suitable parser is found to perform the parse.
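
The behavior described for parse can be illustrated with a minimal, self-contained sketch. This is not the Nutch source: SketchParse, SketchParser, SketchParseException, and the String content are hypothetical stand-ins for Parse, Parser, ParseException, and Content, and the assumption that an empty preferred list counts as "no suitable parser" is made for illustration only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

// Stand-in types only: none of these are the real Nutch classes.
final class PreferredParserSketch {
    private static final Logger LOG =
            Logger.getLogger(PreferredParserSketch.class.getName());

    /** Hypothetical stand-in for a Parse result plus its success status. */
    static final class SketchParse {
        final String text;
        final boolean success;

        private SketchParse(String text, boolean success) {
            this.text = text;
            this.success = success;
        }

        static SketchParse ok(String text) { return new SketchParse(text, true); }

        static SketchParse empty() { return new SketchParse("", false); }
    }

    /** Hypothetical stand-in for a Parser plugin. */
    interface SketchParser {
        SketchParse parse(String content);
    }

    /** Hypothetical stand-in for ParseException. */
    static final class SketchParseException extends Exception {
        SketchParseException(String message) { super(message); }
    }

    /** Tries each preferred parser in order and returns the first successful result. */
    static SketchParse parse(List<SketchParser> preferred, String content)
            throws SketchParseException {
        // Assumption: an empty preferred list means no suitable parser exists.
        if (preferred == null || preferred.isEmpty()) {
            throw new SketchParseException("No suitable parser found");
        }
        for (SketchParser parser : preferred) {
            SketchParse result = parser.parse(content);
            if (result != null && result.success) {
                return result;
            }
        }
        // Every parser failed: log a warning and fall back to an empty parse.
        LOG.warning("Unable to parse content successfully");
        return SketchParse.empty();
    }

    public static void main(String[] args) throws SketchParseException {
        SketchParser alwaysFails = c -> SketchParse.empty();
        SketchParser trims = c -> SketchParse.ok(c.trim());
        SketchParse parsed = parse(Arrays.asList(alwaysFails, trims), "  hello ");
        System.out.println(parsed.success + " " + parsed.text); // prints "true hello"
    }
}
```

The sketch separates the two failure modes named above: a missing parser raises an exception, while a failed parse yields an empty result the caller can inspect.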

parseByExtensionId

public Parse parseByExtensionId(String extId,
                                Content content)
                         throws ParseException
Parses a Content object using the Parser identified by extId, the Parser's extension ID. If no suitable Parser is found, a WARNING-level message is logged and a ParseException is thrown. If the parse is unsuccessful for any other reason, a WARNING-level message is logged and the result of ParseStatus.getEmptyParse() is returned.

Parameters:
extId - The extension implementation ID of the Parser to use to parse the specified content.
content - The content to parse.
Returns:
A Parse object if the parse is successful; otherwise, the result of ParseStatus.getEmptyParse().
Throws:
ParseException - If there is no suitable Parser found to perform the parse.
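
The lookup-by-ID behavior described for parseByExtensionId can be sketched in a similar way. Again, this is an illustration under stated assumptions rather than the Nutch implementation: the map-based registry, the register method, the "sketch.text" ID, and all Sketch-prefixed types are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Stand-in types only: none of these are the real Nutch classes.
final class ExtensionIdLookupSketch {
    private static final Logger LOG =
            Logger.getLogger(ExtensionIdLookupSketch.class.getName());

    /** Hypothetical stand-in for a Parse result plus its success status. */
    static final class SketchParse {
        final String text;
        final boolean success;

        private SketchParse(String text, boolean success) {
            this.text = text;
            this.success = success;
        }

        static SketchParse ok(String text) { return new SketchParse(text, true); }

        static SketchParse empty() { return new SketchParse("", false); }
    }

    /** Hypothetical stand-in for a Parser plugin. */
    interface SketchParser {
        SketchParse parse(String content);
    }

    /** Hypothetical stand-in for ParseException. */
    static final class SketchParseException extends Exception {
        SketchParseException(String message) { super(message); }
    }

    // Assumption: parsers are held in a simple map keyed by extension ID.
    private final Map<String, SketchParser> parsersById = new HashMap<>();

    void register(String extId, SketchParser parser) {
        parsersById.put(extId, parser);
    }

    SketchParse parseByExtensionId(String extId, String content)
            throws SketchParseException {
        SketchParser parser = parsersById.get(extId);
        if (parser == null) {
            // No parser under this ID: warn, then signal the caller.
            LOG.warning("No suitable parser found for extension ID " + extId);
            throw new SketchParseException("No parser for extension ID " + extId);
        }
        SketchParse result;
        try {
            result = parser.parse(content);
        } catch (RuntimeException e) {
            result = null; // treated as an unsuccessful parse
        }
        if (result == null || !result.success) {
            // Parser exists but failed: warn and return an empty parse.
            LOG.warning("Unable to parse content with extension ID " + extId);
            return SketchParse.empty();
        }
        return result;
    }

    public static void main(String[] args) throws SketchParseException {
        ExtensionIdLookupSketch util = new ExtensionIdLookupSketch();
        util.register("sketch.text", c -> SketchParse.ok(c.toUpperCase()));
        System.out.println(util.parseByExtensionId("sketch.text", "abc").text); // prints "ABC"
    }
}
```

A caller working against this contract would typically catch the exception to handle an unknown ID and check the returned status to detect a failed parse.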

isParserRegistered

public boolean isParserRegistered(String extensionId)


Copyright © 2007, 2012, Oracle and/or its affiliates. All rights reserved.