Configuration.
String can be decoded in reverse and the first character is represented by a terminal node.
String can be decoded and the last character is represented by a terminal node.
position.
CircularDependencyException will be thrown if a circular dependency is detected.
Configuration.
Configuration.
DataInput implementation that reads from an in-memory buffer.
DataOutput implementation that writes to an in-memory buffer.
application/octet-stream MimeType
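The plugin-dependency exceptions in this index (a circular dependency here, a missing one further down) match the two failure cases of a depth-first walk over a dependency graph. Below is a minimal sketch of that walk. It assumes plugins are described by a hypothetical map from plugin id to the ids it requires. The DependencyResolver class and its use of IllegalStateException are illustrative; they are not Nutch's PluginRepository API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plugin id -> ids of the plugins it requires.
// Not Nutch's PluginRepository API.
class DependencyResolver {
  private enum State { VISITING, DONE }

  /** Returns plugin ids ordered so that each id follows its dependencies. */
  static List<String> loadOrder(Map<String, List<String>> requires) {
    Map<String, State> state = new HashMap<>();
    List<String> order = new ArrayList<>();
    for (String id : requires.keySet()) {
      visit(id, requires, state, order);
    }
    return order;
  }

  private static void visit(String id, Map<String, List<String>> requires,
                            Map<String, State> state, List<String> order) {
    State s = state.get(id);
    if (s == State.DONE) {
      return; // already placed in the order
    }
    if (s == State.VISITING) {
      // id is still on the DFS stack, so we followed a cycle back to it.
      throw new IllegalStateException("circular dependency at " + id);
    }
    List<String> deps = requires.get(id);
    if (deps == null) {
      // Nothing declares this id: the missing-dependency case.
      throw new IllegalStateException("missing dependency " + id);
    }
    state.put(id, State.VISITING);
    for (String dep : deps) {
      visit(dep, requires, state, order);
    }
    state.put(id, State.DONE);
    order.add(id); // every dependency was added before this point
  }
}
```

The same walk also produces a load order in which each plugin appears after everything it requires.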
WritableComparable implementation.
o is a FloatWritable with the same value.
o is an IntWritable with the same value.
o is a LongWritable with the same value.
o is an MD5Hash whose digest contains the same values.
o is a Text with the same contents.
o is a VIntWritable with the same value.
o is a VLongWritable with the same value.
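The equals entries above all follow one contract: two objects are equal only when o has the same type and holds the same value. Below is a minimal sketch of that contract on a hypothetical LongValue wrapper (not Hadoop's LongWritable). Its hashCode and compareTo are kept consistent with equals.

```java
// Hypothetical value wrapper illustrating the equals() contract
// "o is a ... with the same value"; not Hadoop's LongWritable.
final class LongValue implements Comparable<LongValue> {
  private final long value;

  LongValue(long value) {
    this.value = value;
  }

  long get() {
    return value;
  }

  @Override
  public boolean equals(Object o) {
    // Equal only when o is the same wrapper type holding the same value.
    if (!(o instanceof LongValue)) {
      return false;
    }
    return this.value == ((LongValue) o).value;
  }

  @Override
  public int hashCode() {
    // Equal values must produce equal hash codes.
    return Long.hashCode(value);
  }

  @Override
  public int compareTo(LongValue other) {
    return Long.compare(value, other.value);
  }
}
```

Because of the instanceof check, a java.lang.Long holding the same number is not considered equal. This mirrors the "o is a LongWritable" wording.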
Extension is a kind of listener descriptor that will be installed on a concrete ExtensionPoint that acts as a kind of publisher.
ExtensionPoint provides meta information about an extension point.
what in the backing buffer, starting at position start.
name property.
name property, or null if no such property exists.
name property.
WritableComparable implementation.
name property as a boolean.
name property as a Class.
name property as a Class.
name.
name.
name property as a float.
name property as an integer.
name property as a long.
name property, or null if no such property exists.
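The getter entries above read a string-valued name property and convert it to a boolean, a number, or a string array. Below is a sketch of that pattern over java.util.Properties. Three things are assumptions, not Hadoop's Configuration behaviour: the hypothetical TypedProperties class, comma-separated arrays, and falling back to a default when a value is missing or cannot be parsed.

```java
import java.util.Properties;

// Hypothetical typed getters over string-valued properties. Missing or
// unparsable values fall back to the caller's default (an assumption).
class TypedProperties {
  private final Properties props = new Properties();

  void set(String name, String value) {
    props.setProperty(name, value);
  }

  /** Raw value of the name property, or null if no such property exists. */
  String get(String name) {
    return props.getProperty(name);
  }

  int getInt(String name, int defaultValue) {
    String v = get(name);
    try {
      return v == null ? defaultValue : Integer.parseInt(v.trim());
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  long getLong(String name, long defaultValue) {
    String v = get(name);
    try {
      return v == null ? defaultValue : Long.parseLong(v.trim());
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  float getFloat(String name, float defaultValue) {
    String v = get(name);
    try {
      return v == null ? defaultValue : Float.parseFloat(v.trim());
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  boolean getBoolean(String name, boolean defaultValue) {
    String v = get(name);
    if (v == null) return defaultValue;
    v = v.trim();
    if (v.equalsIgnoreCase("true")) return true;
    if (v.equalsIgnoreCase("false")) return false;
    return defaultValue;
  }

  /** Comma-separated value as a trimmed array, or null if absent. */
  String[] getStrings(String name) {
    String v = get(name);
    if (v == null) return null;
    String[] parts = v.split(",");
    for (int i = 0; i < parts.length; i++) {
      parts[i] = parts[i].trim();
    }
    return parts;
  }
}
```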
Outlink from given plain text.
Outlink from given plain text and adds anchor to the extracted Outlinks.
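Extracting outlinks from plain text with a regular expression can be sketched as follows. The PlainTextLinks class and its deliberately simple URL pattern are assumptions for illustration; Nutch's own extractor uses its own expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: find http/https/ftp URLs in plain text. The pattern
// (scheme followed by non-whitespace) is an assumption, not Nutch's.
class PlainTextLinks {
  private static final Pattern URL =
      Pattern.compile("\\b(?:https?|ftp)://[^\\s<>\"]+", Pattern.CASE_INSENSITIVE);

  static List<String> extract(String text) {
    List<String> urls = new ArrayList<>();
    Matcher m = URL.matcher(text);
    while (m.find()) {
      String url = m.group();
      // Strip trailing punctuation that usually ends a sentence, not a URL.
      while (".,;:!?)".indexOf(url.charAt(url.length() - 1)) >= 0) {
        url = url.substring(0, url.length() - 1);
      }
      urls.add(url);
    }
    return urls;
  }
}
```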
Parser instance with the specified extId, representing its extension ID.
Parsers for a given content type.
Plugin class.
null.
Protocol implementation for a url.
Content for a fetchlist entry.
name property as an array of strings.
HtmlParseFilter implementing plugins.
false if the robots.txt file prohibits us from accessing the given url, or true otherwise.
s padded with leading spaces so that its length is length.
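Padding with leading spaces (this entry) and with trailing spaces (listed further down) can be sketched as two small helpers. The class name Pad is hypothetical. Returning a string that is already at least length characters long unchanged is an assumption about the edge case.

```java
// Hypothetical padding helpers; strings already long enough are returned
// unchanged (an assumption, not confirmed by the index).
class Pad {
  static String leftPad(String s, int length) {
    StringBuilder sb = new StringBuilder(Math.max(length, s.length()));
    for (int i = s.length(); i < length; i++) {
      sb.append(' '); // leading spaces first
    }
    return sb.append(s).toString();
  }

  static String rightPad(String s, int length) {
    StringBuilder sb = new StringBuilder(s);
    while (sb.length() < length) {
      sb.append(' '); // trailing spaces after the text
    }
    return sb.toString();
  }
}
```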
input that is matched, or null if no match exists.
- longestMatch(String) -
Method in class org.apache.nutch.util.SuffixStringMatcher
- Returns the longest suffix of input that is matched, or null if no match exists.
- longestMatch(String) -
Method in class org.apache.nutch.util.TrieStringMatcher
- Returns the longest substring of input that is matched by a pattern in the trie, or null if no match exists.
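The shortestMatch and longestMatch entries differ only in whether the trie walk stops at the first terminal node or keeps the last one it sees. Below is a minimal prefix-trie sketch. It assumes a HashMap of children per node and a hypothetical PrefixTrie class; it is not Nutch's TrieStringMatcher.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical prefix trie; not Nutch's PrefixStringMatcher implementation.
class PrefixTrie {
  private static final class Node {
    final Map<Character, Node> children = new HashMap<>();
    boolean terminal; // a stored prefix ends at this node
  }

  private final Node root = new Node();

  PrefixTrie(String... prefixes) {
    for (String p : prefixes) {
      Node n = root;
      for (int i = 0; i < p.length(); i++) {
        n = n.children.computeIfAbsent(p.charAt(i), c -> new Node());
      }
      n.terminal = true;
    }
  }

  /** Shortest stored prefix of input, or null if none matches. */
  String shortestMatch(String input) {
    Node n = root;
    for (int i = 0; i < input.length(); i++) {
      n = n.children.get(input.charAt(i));
      if (n == null) return null;
      if (n.terminal) return input.substring(0, i + 1); // stop at first hit
    }
    return null;
  }

  /** Longest stored prefix of input, or null if none matches. */
  String longestMatch(String input) {
    Node n = root;
    String best = null;
    for (int i = 0; i < input.length(); i++) {
      n = n.children.get(input.charAt(i));
      if (n == null) break;
      if (n.terminal) best = input.substring(0, i + 1); // keep the last hit
    }
    return best;
  }
}
```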
- LongWritable - Class in org.apache.hadoop.io
- A WritableComparable for longs.
- LongWritable() -
Constructor for class org.apache.hadoop.io.LongWritable
-
- LongWritable(long) -
Constructor for class org.apache.hadoop.io.LongWritable
-
- LongWritable.Comparator - Class in org.apache.hadoop.io
- A Comparator optimized for LongWritable.
- LongWritable.Comparator() -
Constructor for class org.apache.hadoop.io.LongWritable.Comparator
-
- LongWritable.DecreasingComparator - Class in org.apache.hadoop.io
- A decreasing Comparator optimized for LongWritable.
- LongWritable.DecreasingComparator() -
Constructor for class org.apache.hadoop.io.LongWritable.DecreasingComparator
-
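A comparator "optimized for LongWritable" typically compares serialized values directly instead of first deserializing them into objects. The sketch below assumes each value is stored as 8 big-endian bytes, which is the layout DataOutput.writeLong produces. RawLongComparators is a hypothetical name.

```java
import java.nio.ByteBuffer;

// Hypothetical raw comparators over serialized 8-byte big-endian longs.
final class RawLongComparators {
  private RawLongComparators() {}

  /** Reads the long stored at offset start (ByteBuffer is big-endian by default). */
  static long readLong(byte[] b, int start) {
    return ByteBuffer.wrap(b, start, 8).getLong();
  }

  /** Increasing order, computed without creating wrapper objects. */
  static int compare(byte[] b1, int s1, byte[] b2, int s2) {
    return Long.compare(readLong(b1, s1), readLong(b2, s2));
  }

  /** Decreasing order: the increasing comparison with operands swapped. */
  static int compareDecreasing(byte[] b1, int s1, byte[] b2, int s2) {
    return compare(b2, s2, b1, s1);
  }

  static byte[] serialize(long v) {
    return ByteBuffer.allocate(8).putLong(v).array();
  }
}
```

The decreasing variant swaps the operands rather than negating the result. Negation would break for any comparison that can return Integer.MIN_VALUE.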
java.util.HashMap.
TrieStringMatcher.TrieNode visited, given that you are at node, and the next character in the input is the idx'th character of s.
String is matched by a prefix in the trie.
String is matched by a suffix in the trie.
String is matched by a pattern in the trie.
MissingDependencyException will be thrown if a plugin dependency cannot be found.
WritableComparable instance.
Configurations that include Nutch-specific resources.
Plugin System.
Outlinks / URLs from plain text using Regular Expressions.
Parsers until a successful parse is performed and a Parse object is returned.
Content object using the Parser specified by the parameter extId, i.e., the Parser's extension ID.
Protocol implementation.
Parser plugins.
Parsers to obtain Parse objects.
FileSystem.
PluginClassLoader contains only classes of the runtime libraries set up in the plugin manifest file and exported libraries of the plugins it requires.
PluginDescriptor provides access to all meta information of a Nutch plugin, as well as to the internationalizable resources and the plugin's own classloader.
PluginManifestParser parses the manifest files in all plugin directories.
PluginRuntimeException is thrown when an exception occurs in plugin management.
Strings against a set of prefixes.
PrefixStringMatcher which will match Strings with any prefix in the supplied array.
PrefixStringMatcher which will match Strings with any prefix in the supplied Collection.
ProtocolException instead.
Protocol plugins.
in.
CompressedWritable.readFields(DataInput).
Writable, String, primitive type, or an array of the preceding.
Writable, String, primitive type, or an array of the preceding.
false.
s padded with trailing spaces so that its length is length.
org.apache.nutch.fetcher.Fetcher when processing redirect URLs.
org.apache.nutch.crawl.Generator.
org.apache.nutch.crawl.Injector.
Outlink instances.
org.apache.nutch.crawl.PartitionUrlByHost.
name property.
baseHref.
name property to an integer.
name property to the name of a class.
name property to an integer.
name property to a long.
noCache to true.
noFollow to true.
noIndex to true.
name property.
refresh to the supplied value.
refreshHref.
refreshTime.
input that is matched, or null if no match exists.
- shortestMatch(String) -
Method in class org.apache.nutch.util.SuffixStringMatcher
- Returns the shortest suffix of input that is matched, or null if no match exists.
- shortestMatch(String) -
Method in class org.apache.nutch.util.TrieStringMatcher
- Returns the shortest substring of input that is matched by a pattern in the trie, or null if no match exists.
- shutDown() -
Method in class org.apache.nutch.plugin.Plugin
- Shuts down the plugin.
- Signature - Class in org.apache.nutch.crawl
-
- Signature() -
Constructor for class org.apache.nutch.crawl.Signature
-
- SIGNATURE_KEY -
Static variable in interface org.apache.nutch.metadata.Nutch
-
- SignatureComparator - Class in org.apache.nutch.crawl
-
- SignatureComparator() -
Constructor for class org.apache.nutch.crawl.SignatureComparator
-
- SignatureFactory - Class in org.apache.nutch.crawl
- Factory class that instantiates a Signature implementation according to the current Configuration.
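A factory that picks an implementation "according to the current Configuration" usually reads a class name and instantiates that class reflectively. The sketch below assumes a hypothetical PageSignature interface. It also passes the class name in directly rather than reading an actual Nutch property.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical signature abstraction; not Nutch's Signature class.
interface PageSignature {
  byte[] calculate(byte[] content);
}

// Trivial example implementation: a 4-byte hash of the content bytes.
class HashCodeSignature implements PageSignature {
  @Override
  public byte[] calculate(byte[] content) {
    return ByteBuffer.allocate(4).putInt(Arrays.hashCode(content)).array();
  }
}

final class SignatureFactorySketch {
  private SignatureFactorySketch() {}

  /** Instantiates the named class, which must implement PageSignature. */
  static PageSignature create(String className) {
    try {
      Class<? extends PageSignature> cls =
          Class.forName(className).asSubclass(PageSignature.class);
      return cls.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      // Unknown class, wrong type, or no usable no-arg constructor.
      throw new IllegalArgumentException("cannot create signature " + className, e);
    }
  }
}
```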
- simpleHostname(String) -
Static method in class org.apache.hadoop.util.StringUtils
- Given a full hostname, returns the word up to the first dot.
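The simpleHostname description amounts to cutting the name at its first dot. Below is a minimal sketch of that behaviour. Hostnames is a hypothetical class, and treating a name without any dot as already simple is an assumption.

```java
// Hypothetical sketch of "return the word up to the first dot".
class Hostnames {
  static String simpleHostname(String fullHostname) {
    int dot = fullHostname.indexOf('.');
    // No dot: the name is already a simple hostname.
    return dot < 0 ? fullHostname : fullHostname.substring(0, dot);
  }
}
```

This rule also cuts a dotted IPv4 address such as 192.168.0.1 down to 192. Callers that may pass raw addresses would need to check for that case first.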
- size() -
Method in class org.apache.nutch.crawl.MapWritable
-
- size() -
Method in class org.apache.nutch.metadata.Metadata
- Returns the number of metadata names in this metadata.
- skip(DataInput) -
Static method in class org.apache.hadoop.io.Text
- Skips over one Text in the input.
- skip(DataInput) -
Static method in class org.apache.nutch.parse.Outlink
- Skips over one Outlink in the input.
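Skipping a record only requires reading its length prefix and discarding that many bytes without decoding them. The sketch below uses java.io in-memory streams. It assumes a hypothetical record format: a 4-byte int length followed by UTF-8 bytes. Hadoop's Text uses a variable-length prefix instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical length-prefixed records: 4-byte length, then UTF-8 bytes.
final class LengthPrefixed {
  private LengthPrefixed() {}

  static void write(DataOutput out, String s) throws IOException {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  static String read(DataInput in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }

  /** Skips one record without decoding it; skipBytes may skip fewer bytes. */
  static void skip(DataInput in) throws IOException {
    int remaining = in.readInt();
    while (remaining > 0) {
      int skipped = in.skipBytes(remaining);
      if (skipped <= 0) throw new EOFException("truncated record");
      remaining -= skipped;
    }
  }

  /** Writes three records to an in-memory buffer, skips one, reads the next. */
  static String secondOf(String a, String b, String c) {
    try {
      ByteArrayOutputStream buffer = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buffer);
      write(out, a);
      write(out, b);
      write(out, c);
      out.flush();
      DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
      skip(in);
      return read(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```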
- skipCompressedByteArray(DataInput) -
Static method in class org.apache.hadoop.io.WritableUtils
-
- SOURCE -
Static variable in interface org.apache.nutch.metadata.DublinCore
- A reference to a resource from which the present resource is derived.
- SpellCheckedMetadata - Class in org.apache.nutch.metadata
- A decorator to Metadata that adds spellchecking capabilities to property
names.
- SpellCheckedMetadata() -
Constructor for class org.apache.nutch.metadata.SpellCheckedMetadata
-
- startUp() -
Method in class org.apache.nutch.plugin.Plugin
- Invoked when the plugin starts up.
- statNames -
Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- STATUS_BLOCKED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_DB_FETCHED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page was successfully fetched.
- STATUS_DB_GONE -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page no longer exists.
- STATUS_DB_MAX -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Maximum value of DB-related status.
- STATUS_DB_REDIR_PERM -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page permanently redirects to another page.
- STATUS_DB_REDIR_TEMP -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page temporarily redirects to another page.
- STATUS_DB_UNFETCHED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page was not fetched yet.
- STATUS_FAILED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_FAILURE -
Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_FETCH_CONTENT_LIMIT_EXCEEDED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching was successful, but the content was truncated.
- STATUS_FETCH_GONE -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching unsuccessful - page is gone.
- STATUS_FETCH_MAX -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Maximum value of fetch-related status.
- STATUS_FETCH_REDIR_PERM -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching permanently redirected to another page.
- STATUS_FETCH_REDIR_TEMP -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching temporarily redirected to another page.
- STATUS_FETCH_RETRY -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching unsuccessful, needs to be retried (transient errors).
- STATUS_FETCH_SUCCESS -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Fetching was successful.
- STATUS_GONE -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_INJECTED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page was newly injected.
- STATUS_LINKED -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page discovered through a link.
- STATUS_NOTFETCHING -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTFOUND -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTMODIFIED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTPARSED -
Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_REDIR_EXCEEDED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_RETRY -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_ROBOTS_DENIED -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_SIGNATURE -
Static variable in class org.apache.nutch.crawl.CrawlDatum
- Page signature.
- STATUS_SUCCESS -
Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_SUCCESS -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_WOULDBLOCK -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- stringifyException(Throwable) -
Static method in class org.apache.hadoop.util.StringUtils
- Make a string representation of the exception.
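One common way to build a string representation of an exception, stack trace included, is to print it into a StringWriter. Below is a sketch using a hypothetical Exceptions class; it is not Hadoop's StringUtils source.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical helper: full stack trace of a Throwable as a String.
final class Exceptions {
  private Exceptions() {}

  static String stringify(Throwable t) {
    StringWriter sw = new StringWriter();
    try (PrintWriter pw = new PrintWriter(sw)) {
      t.printStackTrace(pw); // writes toString(), then one "\tat ..." line per frame
    }
    return sw.toString();
  }
}
```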
- stringToPath(String[]) -
Static method in class org.apache.hadoop.util.StringUtils
-
- stringToURI(String[]) -
Static method in class org.apache.hadoop.util.StringUtils
-
- StringUtil - Class in org.apache.nutch.util
- A collection of String processing utility methods.
- StringUtil() -
Constructor for class org.apache.nutch.util.StringUtil
-
- StringUtils - Class in org.apache.hadoop.util
- General string utilities.
- StringUtils() -
Constructor for class org.apache.hadoop.util.StringUtils
-
- SUBJECT -
Static variable in interface org.apache.nutch.metadata.DublinCore
- The topic of the content of the resource.
- SUCCESS -
Static variable in class org.apache.nutch.parse.ParseStatus
- Parsing succeeded.
- SUCCESS -
Static variable in class org.apache.nutch.protocol.ProtocolStatus
- Content was retrieved without errors.
- SUCCESS_REDIRECT -
Static variable in class org.apache.nutch.parse.ParseStatus
- Parsed content contains a directive to redirect to another URL.
- suffix(String) -
Method in class org.apache.hadoop.fs.Path
- Adds a suffix to the final name in the path.
- SuffixStringMatcher - Class in org.apache.nutch.util
- A class for efficiently matching Strings against a set of suffixes.
- SuffixStringMatcher(String[]) -
Constructor for class org.apache.nutch.util.SuffixStringMatcher
- Creates a new SuffixStringMatcher which will match Strings with any suffix in the supplied array.
- SuffixStringMatcher(Collection) -
Constructor for class org.apache.nutch.util.SuffixStringMatcher
- Creates a new SuffixStringMatcher which will match Strings with any suffix in the supplied Collection.
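A suffix matcher can reuse the prefix-trie idea. Each suffix is stored reversed, and the input is walked from its last character, so a terminal node marks where a matched suffix begins. Below is a sketch with a hypothetical SuffixTrie class built from a Collection; it is not Nutch's SuffixStringMatcher.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical suffix trie: suffixes are inserted last character first.
class SuffixTrie {
  private static final class Node {
    final Map<Character, Node> children = new HashMap<>();
    boolean terminal; // a stored suffix starts at this depth
  }

  private final Node root = new Node();

  SuffixTrie(Collection<String> suffixes) {
    for (String s : suffixes) {
      Node n = root;
      for (int i = s.length() - 1; i >= 0; i--) {
        n = n.children.computeIfAbsent(s.charAt(i), c -> new Node());
      }
      n.terminal = true;
    }
  }

  /** Longest stored suffix of input, or null if none matches. */
  String longestMatch(String input) {
    Node n = root;
    String best = null;
    for (int i = input.length() - 1; i >= 0; i--) {
      n = n.children.get(input.charAt(i));
      if (n == null) break;
      if (n.terminal) best = input.substring(i);
    }
    return best;
  }

  /** Shortest stored suffix of input, or null if none matches. */
  String shortestMatch(String input) {
    Node n = root;
    for (int i = input.length() - 1; i >= 0; i--) {
      n = n.children.get(input.charAt(i));
      if (n == null) return null;
      if (n.terminal) return input.substring(i);
    }
    return null;
  }
}
```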
StringUtil.toHexString(byte[], String, int), where sep = null; lineLen = Integer.MAX_VALUE.
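The convenience call above implies a three-argument form that takes a separator and a line length. The sketch below assumes that sep goes between byte pairs, that lineLen counts bytes per line, and that digits are lowercase. Hex is a hypothetical class, and the index does not confirm these semantics.

```java
// Hypothetical hex formatter with optional separator and line length.
final class Hex {
  private static final char[] DIGITS = "0123456789abcdef".toCharArray();

  private Hex() {}

  static String toHexString(byte[] buf, String sep, int lineLen) {
    StringBuilder sb = new StringBuilder(buf.length * 2);
    for (int i = 0; i < buf.length; i++) {
      if (i > 0) {
        if (lineLen > 0 && i % lineLen == 0) {
          sb.append('\n'); // start a new line every lineLen bytes
        } else if (sep != null) {
          sb.append(sep);
        }
      }
      int b = buf[i] & 0xff; // treat the byte as unsigned
      sb.append(DIGITS[b >>> 4]).append(DIGITS[b & 0x0f]);
    }
    return sb.toString();
  }

  /** Convenience call: no separator, effectively a single line. */
  static String toHexString(byte[] buf) {
    return toHexString(buf, null, Integer.MAX_VALUE);
  }
}
```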
sizeLimit bytes, if necessary.
URLFilter implementing plugins.
VersionedWritable.readFields(DataInput) when the version of an object being read does not match the current implementation version as returned by VersionedWritable.getVersion().
DataInput and DataOutput.
Writable and Comparable.
WritableComparables.
WritableComparable implementation.
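The VersionedWritable entry describes rejecting data whose stored version differs from the current one. The sketch below assumes a hypothetical VersionedRecord that writes a version byte before a long payload. On a mismatch it throws a hypothetical WrongVersionException, which is an IOException subclass.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical exception for a stored version that does not match.
class WrongVersionException extends IOException {
  WrongVersionException(byte expected, byte found) {
    super("expected version " + expected + ", found " + found);
  }
}

// Hypothetical record that writes a version byte before its payload.
class VersionedRecord {
  static final byte VERSION = 2;
  long payload;

  void write(DataOutput out) throws IOException {
    out.writeByte(VERSION);
    out.writeLong(payload);
  }

  void readFields(DataInput in) throws IOException {
    byte found = in.readByte();
    if (found != VERSION) {
      throw new WrongVersionException(VERSION, found); // refuse other layouts
    }
    payload = in.readLong();
  }

  /** Writes then reads back a record holding value. */
  static long roundTrip(long value) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      VersionedRecord r = new VersionedRecord();
      r.payload = value;
      r.write(new DataOutputStream(buf));
      VersionedRecord copy = new VersionedRecord();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
      return copy.payload;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** True if bytes tagged with version v are rejected on read. */
  static boolean rejects(byte v) {
    try {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(buf);
      out.writeByte(v);
      out.writeLong(0L);
      new VersionedRecord().readFields(
          new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
      return false;
    } catch (WrongVersionException e) {
      return true;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```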
out.
CompressedWritable.write(DataOutput).
Writable, String, primitive type, or an array of the preceding.