Configuration.
String can be decoded in reverse and the
first character is represented by a terminal node.
String can be decoded and the last character is
represented by a terminal node.
position.
CircularDependencyException will be thrown if a circular
dependency is detected.
Configuration.
Configuration.
DataInput implementation that reads from an in-memory
buffer.
DataOutput implementation that writes to an in-memory
buffer.
application/octet-stream MimeType
WritableComparable
implementation.
o is a FloatWritable with the same value.
o is an IntWritable with the same value.
o is a LongWritable with the same value.
o is an MD5Hash whose digest contains the
same values.
o is a Text with the same contents.
o is a VIntWritable with the same value.
o is a VLongWritable with the same value.
Extension is a kind of listener descriptor that will be
installed on a concrete ExtensionPoint that acts as a kind of
Publisher.
ExtensionPoint provides meta information of an extension
point.
what in the backing
buffer, starting at position start.
name property.
name property, or null if no
such property exists.
name property.
WritableComparable implementation.
name property as a boolean.
name property as a Class.
name property as a Class.
name.
name.
name property as a float.
name property as an integer.
name property as a long.
name property, or null if no such
property exists.
Outlink from given plain text.
Outlink from given plain text and adds anchor
to the extracted Outlinks
Parser instance with the specified
extId, representing its extension ID.
Parsers for a given content type.
Plugin class.
null.
Protocol implementation for a url.
Content for a fetchlist entry.
name property as an array of
strings.
HtmlParseFilter implementing plugins.
false if the robots.txt file
prohibits us from accessing the given url, or
true otherwise.
s padded with leading spaces so
that its length is length.
input that is matched,
or null if no match exists.
- longestMatch(String) -
Method in class org.apache.nutch.util.SuffixStringMatcher
- Returns the longest suffix of
input that is matched,
or null if no match exists.
- longestMatch(String) -
Method in class org.apache.nutch.util.TrieStringMatcher
- Returns the longest substring of
input that is
matched by a pattern in the trie, or null if no match
exists.
- LongWritable - Class in org.apache.hadoop.io
- A WritableComparable for longs.
- LongWritable() -
Constructor for class org.apache.hadoop.io.LongWritable
-
- LongWritable(long) -
Constructor for class org.apache.hadoop.io.LongWritable
-
- LongWritable.Comparator - Class in org.apache.hadoop.io
- A Comparator optimized for LongWritable.
- LongWritable.Comparator() -
Constructor for class org.apache.hadoop.io.LongWritable.Comparator
-
- LongWritable.DecreasingComparator - Class in org.apache.hadoop.io
- A decreasing Comparator optimized for LongWritable.
- LongWritable.DecreasingComparator() -
Constructor for class org.apache.hadoop.io.LongWritable.DecreasingComparator
-
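The longestMatch entries above describe returning the longest stored pattern that matches the input, or null if none does. The following is a minimal sketch of that idea for prefixes, using a small character trie whose terminal nodes mark the end of a stored pattern. The class PrefixTrieSketch and its Node type are hypothetical names chosen for illustration; this is not the Nutch TrieStringMatcher implementation, only an assumption about how such a lookup could work.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not the Nutch TrieStringMatcher. */
class PrefixTrieSketch {

    /** One trie node; terminal is true when a stored pattern ends here. */
    private static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        boolean terminal;
    }

    private final Node root = new Node();

    /** Stores every given prefix in the trie, one character per level. */
    PrefixTrieSketch(String... prefixes) {
        for (String p : prefixes) {
            Node node = root;
            for (int i = 0; i < p.length(); i++) {
                node = node.children.computeIfAbsent(p.charAt(i), c -> new Node());
            }
            node.terminal = true;
        }
    }

    /** Returns the longest stored prefix of input, or null if no match exists. */
    String longestMatch(String input) {
        Node node = root;
        // Length of the longest match seen so far; -1 means no match yet.
        int best = root.terminal ? 0 : -1;
        for (int i = 0; i < input.length(); i++) {
            node = node.children.get(input.charAt(i));
            if (node == null) {
                break; // no stored pattern continues with this character
            }
            if (node.terminal) {
                best = i + 1;
            }
        }
        return best < 0 ? null : input.substring(0, best);
    }

    public static void main(String[] args) {
        PrefixTrieSketch matcher = new PrefixTrieSketch("http", "https://", "ftp");
        System.out.println(matcher.longestMatch("https://example.org"));
        System.out.println(matcher.longestMatch("gopher://example.org"));
    }
}
```

The walk keeps going after the first terminal node, which is what separates a longest match from a shortest one; a shortest-match variant would return as soon as it reaches a terminal node.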
java.util.HashMap.TrieStringMatcher.TrieNode visited, given that you are at
node, and the next character in the input is
the idx'th character of s.
String is matched by a
prefix in the trie
String is matched by a
suffix in the trie
String is matched by a
pattern in the trie
MissingDependencyException will be thrown if a plugin
dependency cannot be found.
WritableComparable instance.
Configurations that include Nutch-specific
resources.
Plugin System.
Outlinks
/ URLs from plain text using Regular Expressions.
Parsers
until a successful parse is performed and a Parse object is
returned.
Content object using the Parser specified
by the parameter extId, i.e., the Parser's extension ID.
Protocol
implementation.
Parser plugins.
Parsers to obtain
Parse objects.
FileSystem.
PluginClassLoader contains only classes of the runtime
libraries set up in the plugin manifest file and exported libraries of
the required plugins.
PluginDescriptor provides access to all meta information of
a nutch-plugin, as well as to the internationalizable resources and the
plugin's own classloader.
PluginManifestParser just parses the manifest files
in all plugin directories.
PluginRuntimeException will be thrown if an exception occurs in the
plugin management.
Strings against a set
of prefixes.
PrefixStringMatcher which will match
Strings with any prefix in the supplied array.
PrefixStringMatcher which will match
Strings with any prefix in the supplied
Collection.
ProtocolException instead.
Protocol plugins.
in.
CompressedWritable.readFields(DataInput).
Writable, String, primitive type, or an array of
the preceding.
Writable, String, primitive type, or an array of
the preceding.
false.
s padded with trailing spaces so
that its length is length.
org.apache.nutch.fetcher.Fetcher when processing
redirect URLs.
org.apache.nutch.crawl.Generator.
org.apache.nutch.crawl.Injector.
Outlink instances.
org.apache.nutch.crawl.PartitionUrlByHost.
name property.
baseHref.
name property to an integer.
name property to the name of a class.
name property to an integer.
name property to a long.
noCache to true.
noFollow to true.
noIndex to true.
name property.
refresh to the supplied value.
refreshHref.
refreshTime.
input that is matched,
or null if no match exists.
- shortestMatch(String) -
Method in class org.apache.nutch.util.SuffixStringMatcher
- Returns the shortest suffix of
input that is matched,
or null if no match exists.
- shortestMatch(String) -
Method in class org.apache.nutch.util.TrieStringMatcher
- Returns the shortest substring of
input that is
matched by a pattern in the trie, or null if no match
exists.
- shutDown() -
Method in class org.apache.nutch.plugin.Plugin
- Shutdown the plugin.
- Signature - Class in org.apache.nutch.crawl
-
- Signature() -
Constructor for class org.apache.nutch.crawl.Signature
-
- SIGNATURE_KEY -
Static variable in interface org.apache.nutch.metadata.Nutch
-
- SignatureComparator - Class in org.apache.nutch.crawl
-
- SignatureComparator() -
Constructor for class org.apache.nutch.crawl.SignatureComparator
-
- SignatureFactory - Class in org.apache.nutch.crawl
- Factory class, which instantiates a Signature implementation according to the
current Configuration configuration.
- simpleHostname(String) -
Static method in class org.apache.hadoop.util.StringUtils
- Given a full hostname, return the word up to the first dot.
- size() -
Method in class org.apache.nutch.crawl.MapWritable
-
- size() -
Method in class org.apache.nutch.metadata.Metadata
- Returns the number of metadata names in this metadata.
- skip(DataInput) -
Static method in class org.apache.hadoop.io.Text
- Skips over one Text in the input.
- skip(DataInput) -
Static method in class org.apache.nutch.parse.Outlink
- Skips over one Outlink in the input.
- skipCompressedByteArray(DataInput) -
Static method in class org.apache.hadoop.io.WritableUtils
-
- SOURCE -
Static variable in interface org.apache.nutch.metadata.DublinCore
- A reference to a resource from which the present resource is derived.
- SpellCheckedMetadata - Class in org.apache.nutch.metadata
- A decorator to Metadata that adds spellchecking capabilities to property
names.
- SpellCheckedMetadata() -
Constructor for class org.apache.nutch.metadata.SpellCheckedMetadata
-
- startUp() -
Method in class org.apache.nutch.plugin.Plugin
- Will be invoked when the plugin starts up.
- statNames -
Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- STATUS_BLOCKED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_DB_FETCHED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page was successfully fetched.
- STATUS_DB_GONE -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page no longer exists.
- STATUS_DB_MAX -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Maximum value of DB-related status.
- STATUS_DB_REDIR_PERM -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page permanently redirects to other page.
- STATUS_DB_REDIR_TEMP -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page temporarily redirects to other page.
- STATUS_DB_UNFETCHED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page was not fetched yet.
- STATUS_FAILED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_FAILURE -
Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_FETCH_CONTENT_LIMIT_EXCEEDED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching was successful but content was truncated.
- STATUS_FETCH_GONE -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching unsuccessful - page is gone.
- STATUS_FETCH_MAX -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Maximum value of fetch-related status.
- STATUS_FETCH_REDIR_PERM -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching permanently redirected to other page.
- STATUS_FETCH_REDIR_TEMP -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching temporarily redirected to other page.
- STATUS_FETCH_RETRY -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching unsuccessful, needs to be retried (transient errors).
- STATUS_FETCH_SUCCESS -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching was successful.
- STATUS_GONE -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_INJECTED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page was newly injected.
- STATUS_LINKED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page discovered through a link.
- STATUS_NOTFETCHING -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTFOUND -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTMODIFIED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTPARSED -
Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_REDIR_EXCEEDED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_RETRY -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_ROBOTS_DENIED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_SIGNATURE -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page signature.
- STATUS_SUCCESS -
Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_SUCCESS -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_WOULDBLOCK -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- stringifyException(Throwable) -
Static method in class org.apache.hadoop.util.StringUtils
- Make a string representation of the exception.
- stringToPath(String[]) -
Static method in class org.apache.hadoop.util.StringUtils
-
- stringToURI(String[]) -
Static method in class org.apache.hadoop.util.StringUtils
-
- StringUtil - Class in org.apache.nutch.util
- A collection of String processing utility methods.
- StringUtil() -
Constructor for class org.apache.nutch.util.StringUtil
-
- StringUtils - Class in org.apache.hadoop.util
- General string utils
- StringUtils() -
Constructor for class org.apache.hadoop.util.StringUtils
-
- SUBJECT -
Static variable in interface org.apache.nutch.metadata.DublinCore
- The topic of the content of the resource.
- SUCCESS -
Static variable in class org.apache.nutch.parse.ParseStatus
- Parsing succeeded.
- SUCCESS -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
- Content was retrieved without errors.
- SUCCESS_REDIRECT -
Static variable in class org.apache.nutch.parse.ParseStatus
- Parsed content contains a directive to redirect to another URL.
- suffix(String) -
Method in class org.apache.hadoop.fs.Path
- Adds a suffix to the final name in the path.
- SuffixStringMatcher - Class in org.apache.nutch.util
- A class for efficiently matching
Strings against a set
of suffixes.
- SuffixStringMatcher(String[]) -
Constructor for class org.apache.nutch.util.SuffixStringMatcher
- Creates a new
SuffixStringMatcher which will match
Strings with any suffix in the supplied array.
- SuffixStringMatcher(Collection) -
Constructor for class org.apache.nutch.util.SuffixStringMatcher
- Creates a new
SuffixStringMatcher which will match
Strings with any suffix in the supplied
Collection.
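The shortestMatch and longestMatch entries for suffixes, together with the two constructors above, can be illustrated with a small stand-alone sketch. It assumes only the behavior stated in the index: a matcher is built from an array or a Collection of suffixes, and each lookup returns the matching suffix or null. The class SuffixMatchSketch is a hypothetical name, and the linear endsWith scan is a deliberate simplification; it is not how the Nutch class is implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

/** Hypothetical illustration; not the Nutch SuffixStringMatcher. */
class SuffixMatchSketch {

    private final List<String> suffixes;

    /** Builds a matcher from an array of suffixes. */
    SuffixMatchSketch(String[] suffixes) {
        this(Arrays.asList(suffixes));
    }

    /** Builds a matcher from a Collection of suffixes. */
    SuffixMatchSketch(Collection<String> suffixes) {
        this.suffixes = new ArrayList<>(suffixes);
    }

    /** Returns true if input ends with any stored suffix. */
    boolean matches(String input) {
        return shortestMatch(input) != null;
    }

    /** Returns the shortest stored suffix of input, or null if no match exists. */
    String shortestMatch(String input) {
        String best = null;
        for (String s : suffixes) {
            if (input.endsWith(s) && (best == null || s.length() < best.length())) {
                best = s;
            }
        }
        return best;
    }

    /** Returns the longest stored suffix of input, or null if no match exists. */
    String longestMatch(String input) {
        String best = null;
        for (String s : suffixes) {
            if (input.endsWith(s) && (best == null || s.length() > best.length())) {
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        SuffixMatchSketch matcher =
                new SuffixMatchSketch(new String[] {".org", "e.org", ".com"});
        System.out.println(matcher.shortestMatch("example.org"));
        System.out.println(matcher.longestMatch("example.org"));
        System.out.println(matcher.matches("example.net"));
    }
}
```

A linear scan costs time proportional to the number of stored suffixes on every lookup, which is the cost a trie-based matcher is meant to avoid.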
StringUtil.toHexString(byte[], String, int), where
sep = null; lineLen = Integer.MAX_VALUE.
sizeLimit bytes, if necessary.
URLFilter implementing plugins.
VersionedWritable.readFields(DataInput) when the
version of an object being read does not match the current implementation
version as returned by VersionedWritable.getVersion().
DataInput and
DataOutput.
Writable and Comparable.
WritableComparables.
WritableComparable implementation.
out.
CompressedWritable.write(DataOutput).
Writable, String, primitive type, or an array of
the preceding.