org.apache.nutch.parse
Class ParseUtil
java.lang.Object
org.apache.nutch.parse.ParseUtil
public class ParseUtil
- extends Object
A utility class containing methods that perform common parsing tasks, such
as iterating through a preferred list of Parsers to obtain Parse objects.
- Author:
- mattmann, Jérôme Charron, Sébastien Le Callonnec
Field Summary
static org.apache.log4j.Logger mLogger
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
mLogger
public static final org.apache.log4j.Logger mLogger
ParseUtil
public ParseUtil(Configuration conf)
- Parameters:
conf
-
parse
public Parse parse(Content content)
throws ParseException
- Performs a parse by iterating through a List of preferred Parsers
until a successful parse is performed and a Parse object is
returned. If the parse is unsuccessful, a message is logged at the
WARNING level, and an empty parse is returned.
- Parameters:
content - The content to try and parse.
- Returns:
- A Parse object containing the parsed data.
- Throws:
ParseException - If no suitable parser is found to perform the parse.
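The fallback behaviour described for parse can be illustrated with a small, self-contained sketch. This is not Nutch's implementation: ParserChainSketch, SimpleParser, Result, NoParserException, and parseWithFallback are hypothetical stand-ins, content is modelled as a plain String, and the exception is unchecked for brevity, whereas the documented ParseException is declared in a throws clause.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Simplified model of "try each preferred parser until one succeeds".
// All types here are hypothetical stand-ins, not Nutch classes.
class ParserChainSketch {
    private static final Logger LOG =
            Logger.getLogger(ParserChainSketch.class.getName());

    /** Stand-in for a parse result; not Nutch's Parse interface. */
    static final class Result {
        final boolean success;
        final String text;

        Result(boolean success, String text) {
            this.success = success;
            this.text = text;
        }

        /** Models the "empty parse" returned when every parser fails. */
        static Result empty() {
            return new Result(false, "");
        }
    }

    /** Stand-in for a parser that turns raw content into a Result. */
    interface SimpleParser {
        Result parse(String content);
    }

    /** Models the "no suitable parser" case (unchecked here for brevity). */
    static final class NoParserException extends RuntimeException {
        NoParserException(String message) {
            super(message);
        }
    }

    static Result parseWithFallback(List<SimpleParser> preferred, String content) {
        if (preferred == null || preferred.isEmpty()) {
            throw new NoParserException("no suitable parser found");
        }
        for (SimpleParser parser : preferred) {
            Result result = parser.parse(content);
            if (result != null && result.success) {
                return result; // the first successful parse wins
            }
        }
        // Every candidate failed: log a warning and fall back to an empty parse.
        LOG.warning("Unable to successfully parse content");
        return Result.empty();
    }

    public static void main(String[] args) {
        List<SimpleParser> parsers = new ArrayList<>();
        parsers.add(c -> Result.empty());             // always fails
        parsers.add(c -> new Result(true, c.trim())); // succeeds
        System.out.println(parseWithFallback(parsers, "  hello  ").text);
    }
}
```

In this sketch an empty preferred list raises the exception, while a non-empty list whose parsers all fail yields the empty result, which mirrors the two outcomes the method description distinguishes.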
parseByExtensionId
public Parse parseByExtensionId(String extId,
Content content)
throws ParseException
- Parses a Content object using the Parser specified
by the parameter extId, i.e., the Parser's extension ID.
If a suitable Parser is not found, then a WARNING
level message is logged, and a ParseException is thrown. If the parse is
unsuccessful for any other reason, then a WARNING level
message is logged, and the result of ParseStatus.getEmptyParse() is
returned.
- Parameters:
extId - The extension implementation ID of the Parser to use
to parse the specified content.
content - The content to parse.
- Returns:
- A Parse object if the parse is successful, otherwise,
a ParseStatus.getEmptyParse().
- Throws:
ParseException - If there is no suitable Parser found
to perform the parse.
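The two failure modes described for parseByExtensionId (no parser for the ID versus a parser that fails) can be sketched as follows. This is a simplified model, not Nutch code: ExtensionIdLookupSketch, SimpleParser, NoParserException, register, and parseById are hypothetical names, the parser registry is assumed to be a plain map keyed by extension ID, and the empty parse is modelled as an empty String.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Logger;

// Simplified model of looking up a parser by extension ID.
// All types here are hypothetical stand-ins, not Nutch classes.
class ExtensionIdLookupSketch {
    private static final Logger LOG =
            Logger.getLogger(ExtensionIdLookupSketch.class.getName());

    /** Models the "empty parse" returned when the chosen parser fails. */
    static final String EMPTY_PARSE = "";

    /** Stand-in for a parser; returns null or throws on failure. */
    interface SimpleParser {
        String parse(String content);
    }

    /** Models the "no suitable parser" case (unchecked here for brevity). */
    static final class NoParserException extends RuntimeException {
        NoParserException(String message) {
            super(message);
        }
    }

    private final Map<String, SimpleParser> byId = new HashMap<>();

    void register(String extId, SimpleParser parser) {
        byId.put(extId, parser);
    }

    String parseById(String extId, String content) {
        SimpleParser parser = byId.get(extId);
        if (parser == null) {
            // Unknown ID: warn, then signal the error to the caller.
            LOG.warning("No suitable parser found for extension ID " + extId);
            throw new NoParserException("no parser registered for " + extId);
        }
        try {
            String parsed = parser.parse(content);
            if (parsed != null) {
                return parsed;
            }
            LOG.warning("Parser " + extId + " returned no result");
        } catch (RuntimeException e) {
            LOG.warning("Parser " + extId + " failed: " + e.getMessage());
        }
        // Any failure other than a missing parser yields the empty parse.
        return EMPTY_PARSE;
    }

    public static void main(String[] args) {
        ExtensionIdLookupSketch sketch = new ExtensionIdLookupSketch();
        sketch.register("upper", c -> c.toUpperCase());
        System.out.println(sketch.parseById("upper", "abc"));
    }
}
```

The design point the sketch tries to capture is that a missing parser is reported to the caller as an exception, while a parser that runs but fails is absorbed into an empty result after a warning.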
isParserRegistered
public boolean isParserRegistered(String extensionId)
Copyright © 2007, 2012, Oracle and/or its affiliates. All rights reserved.