org.apache.nutch.parse
Class ParseUtil
java.lang.Object
org.apache.nutch.parse.ParseUtil
public class ParseUtil
- extends Object
A utility class containing methods that simplify parsing, such as
iterating through a preferred list of Parsers to obtain
Parse objects.
- Author:
- mattmann, Jérôme Charron, Sébastien Le Callonnec
Field Summary
static org.apache.log4j.Logger mLogger
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
mLogger
public static final org.apache.log4j.Logger mLogger
ParseUtil
public ParseUtil(Configuration conf)
- Parameters:
conf -
parse
public Parse parse(Content content)
throws ParseException
- Performs a parse by iterating through a List of preferred
Parsers until a parse succeeds and a Parse object is
returned. If no parse succeeds, a message is logged at the
WARNING level and an empty parse is returned.
- Parameters:
content - The content to try and parse.
- Returns:
- A Parse object containing the parsed data.
- Throws:
ParseException - If no suitable parser is found to perform the parse.
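The fallback behaviour described for parse can be illustrated with a minimal, self-contained sketch. It does not use the Nutch classes: SimpleParser, NoParserException, and EMPTY_PARSE are hypothetical stand-ins for Parser, ParseException, and the empty parse, and plain strings stand in for Content and Parse. It assumes that an empty list of preferred parsers is the "no suitable parser" case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.logging.Logger;

public class FallbackParseSketch {
    private static final Logger LOG =
        Logger.getLogger(FallbackParseSketch.class.getName());

    /** Hypothetical stand-in for a Parser: empty result means the parse failed. */
    interface SimpleParser {
        Optional<String> tryParse(String content);
    }

    /** Hypothetical stand-in for ParseException. */
    static class NoParserException extends Exception {
        NoParserException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for an empty parse. */
    static final String EMPTY_PARSE = "";

    /**
     * Tries each preferred parser in order and returns the first successful
     * result. Throws if there is no parser to try; logs a warning and returns
     * the empty parse if every parser fails.
     */
    static String parse(List<SimpleParser> preferred, String content)
            throws NoParserException {
        if (preferred == null || preferred.isEmpty()) {
            throw new NoParserException("No suitable parser found");
        }
        for (SimpleParser parser : preferred) {
            Optional<String> result = parser.tryParse(content);
            if (result.isPresent()) {
                return result.get();
            }
        }
        LOG.warning("Unable to successfully parse content");
        return EMPTY_PARSE;
    }

    public static void main(String[] args) throws Exception {
        SimpleParser failing = c -> Optional.empty();
        SimpleParser upper = c -> Optional.of(c.toUpperCase());
        // The first parser fails, so the second one produces the result.
        System.out.println(parse(Arrays.asList(failing, upper), "hello"));
    }
}
```

Running main prints HELLO, because the first stand-in parser fails and the second succeeds. If both parsers failed, the sketch would log a warning and return the empty string rather than throw.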
parseByExtensionId
public Parse parseByExtensionId(String extId,
Content content)
throws ParseException
- Parses a Content object using the Parser specified
by the parameter extId, i.e., the Parser's extension ID.
If a suitable Parser is not found, a WARNING
level message is logged and a ParseException is thrown. If the parse is
unsuccessful for any other reason, a WARNING level
message is logged and the result of ParseStatus.getEmptyParse() is
returned.
- Parameters:
extId - The extension implementation ID of the Parser to use
to parse the specified content.
content - The content to parse.
- Returns:
- A Parse object if the parse is successful; otherwise,
the result of ParseStatus.getEmptyParse().
- Throws:
ParseException - If there is no suitable Parser found
to perform the parse.
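The two failure modes of parseByExtensionId (unknown extension ID versus a parser that fails) can be sketched as follows. This is an illustration under assumptions, not the Nutch implementation: the registry map, SimpleParser, NoParserException, and EMPTY_PARSE are hypothetical stand-ins, and plain strings stand in for Content and Parse.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

public class ExtensionIdParseSketch {
    private static final Logger LOG =
        Logger.getLogger(ExtensionIdParseSketch.class.getName());

    /** Hypothetical stand-in for a Parser; may throw when parsing fails. */
    interface SimpleParser {
        String parse(String content) throws Exception;
    }

    /** Hypothetical stand-in for ParseException. */
    static class NoParserException extends Exception {
        NoParserException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for the result of ParseStatus.getEmptyParse(). */
    static final String EMPTY_PARSE = "";

    // Hypothetical registry mapping extension IDs to parsers.
    private final Map<String, SimpleParser> registry = new HashMap<>();

    void register(String extId, SimpleParser parser) {
        registry.put(extId, parser);
    }

    /**
     * Looks up the parser by extension ID. An unknown ID is logged and
     * reported with an exception; any other failure is logged and mapped
     * to the empty parse.
     */
    String parseByExtensionId(String extId, String content)
            throws NoParserException {
        SimpleParser parser = registry.get(extId);
        if (parser == null) {
            LOG.warning("No suitable parser found for extension ID " + extId);
            throw new NoParserException("No parser for extension ID " + extId);
        }
        try {
            return parser.parse(content);
        } catch (Exception e) {
            LOG.warning("Parse failed with " + extId + ": " + e.getMessage());
            return EMPTY_PARSE;
        }
    }

    public static void main(String[] args) throws Exception {
        ExtensionIdParseSketch util = new ExtensionIdParseSketch();
        util.register("upper", c -> c.toUpperCase());
        System.out.println(util.parseByExtensionId("upper", "hello"));
    }
}
```

The design point the sketch highlights is that callers must handle two distinct outcomes: an exception when no parser matches the ID, and an empty result when a matching parser fails.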
isParserRegistered
public boolean isParserRegistered(String extensionId)
Copyright © 2007, 2012, Oracle and/or its affiliates. All rights reserved.