- generate(Content, Parse) - Method in interface com.endeca.itl.web.process.EndecaRecordGenerator
-
- GENERATE_DIR_NAME - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- GENERATE_TIME_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- GenericWritable - Class in org.apache.hadoop.io
-
A wrapper for Writable instances.
- GenericWritable() - Constructor for class org.apache.hadoop.io.GenericWritable
-
- get(String, Object) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property.
- get(String) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property, or null if no such property exists.
- get(String, String) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property.
- get() - Method in class org.apache.hadoop.io.ArrayWritable
-
- get() - Method in class org.apache.hadoop.io.BooleanWritable
-
Returns the value of the BooleanWritable.
- get() - Method in class org.apache.hadoop.io.BytesWritable
-
Get the data from the BytesWritable.
- get() - Method in class org.apache.hadoop.io.FloatWritable
-
Return the value of this FloatWritable.
- get() - Method in class org.apache.hadoop.io.GenericWritable
-
Return the wrapped instance.
- get() - Method in class org.apache.hadoop.io.IntWritable
-
Return the value of this IntWritable.
- get() - Method in class org.apache.hadoop.io.LongWritable
-
Return the value of this LongWritable.
- get() - Static method in class org.apache.hadoop.io.NullWritable
-
Returns the single instance of this class.
- get() - Method in class org.apache.hadoop.io.ObjectWritable
-
Return the instance, or null if none.
- get() - Method in class org.apache.hadoop.io.TwoDArrayWritable
-
- get() - Method in class org.apache.hadoop.io.VIntWritable
-
Return the value of this VIntWritable.
- get() - Method in class org.apache.hadoop.io.VLongWritable
-
Return the value of this VLongWritable.
- get(Class) - Static method in class org.apache.hadoop.io.WritableComparator
-
- get(Writable) - Method in class org.apache.nutch.crawl.MapWritable
-
- get(String) - Method in class org.apache.nutch.metadata.Metadata
-
Get the value associated with a metadata name.
- get(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- get(Configuration) - Static method in class org.apache.nutch.plugin.PluginRepository
-
- get(String, Configuration) - Static method in class org.apache.nutch.util.mime.MimeTypes
-
Return a MimeTypes instance.
- get(String, Log, Configuration) - Static method in class org.apache.nutch.util.mime.MimeTypes
-
Return a MimeTypes instance.
- getActionUrl() - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- getAnchor() - Method in class org.apache.nutch.parse.Outlink
-
- getArgs() - Method in class org.apache.nutch.parse.ParseStatus
-
- getArgs() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getAttribute(String) - Method in class org.apache.nutch.plugin.Extension
-
Returns an attribute value that is set up in the manifest file and defined by the extension point XML schema.
- getAuthConf() - Method in interface com.endeca.itl.web.auth.Authenticator
-
- getAuthenticator(String) - Method in class com.endeca.itl.web.auth.AuthenticatorManager
-
- getBaseHref() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getBaseUrl() - Method in class org.apache.nutch.protocol.Content
-
The base url for relative links contained in the content.
- getBoolean(String, boolean) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property as a boolean.
- getBytes() - Method in class org.apache.hadoop.io.Text
-
Returns the raw bytes.
- getCapacity() - Method in class org.apache.hadoop.io.BytesWritable
-
Get the capacity, which is the maximum size that could be handled without resizing the backing storage.
- getClass(String, Class) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property as a Class.
- getClass(String, Class, Class) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property as a Class.
- getClass(String, Configuration) - Static method in class org.apache.hadoop.io.WritableName
-
Return the class for a name.
- getClassByName(String) - Method in class org.apache.hadoop.conf.Configuration
-
Load a class by name.
- getClassLoader() - Method in class org.apache.hadoop.conf.Configuration
-
Get the class loader for this job.
- getClassLoader() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns a cached classloader for a plugin.
- getClassName() - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- getClazz() - Method in class org.apache.nutch.plugin.Extension
-
Returns the full class name of the extension point implementation.
- getCode() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the response code.
- getCode() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getConf() - Method in interface org.apache.hadoop.conf.Configurable
-
Return the configuration used by this object.
- getConf() - Method in class org.apache.hadoop.conf.Configured
-
- getConf() - Method in class org.apache.hadoop.io.ObjectWritable
-
- getConf() - Method in class org.apache.nutch.crawl.Signature
-
- getConf() - Method in class org.apache.nutch.fetcher.FetcherOutput
-
- getConf() - Method in class org.apache.nutch.parse.ParseData
-
- getConf() - Method in class org.apache.nutch.parse.ParseImpl
-
- getConfResourceAsInputStream(String) - Method in class org.apache.hadoop.conf.Configuration
-
Returns an input stream attached to the configuration resource with the given name.
- getConfResourceAsReader(String) - Method in class org.apache.hadoop.conf.Configuration
-
Returns a reader attached to the configuration resource with the given name.
- getContent() - Method in class org.apache.nutch.fetcher.FetcherOutput
-
- getContent() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the full content of the response.
- getContent() - Method in class org.apache.nutch.protocol.Content
-
The binary content retrieved.
- getContent() - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- getContentMeta() - Method in class org.apache.nutch.parse.ParseData
-
The original Metadata retrieved from content.
- getContentType() - Method in exception org.apache.nutch.parse.ParserNotFound
-
- getContentType() - Method in class org.apache.nutch.protocol.Content
-
The media type of the retrieved content.
- getCrawlDatum() - Method in class org.apache.nutch.fetcher.FetcherOutput
-
- getCrawlDelay() - Method in class org.apache.nutch.protocol.EmptyRobotRules
-
- getCrawlDelay() - Method in interface org.apache.nutch.protocol.RobotRules
-
Get Crawl-Delay, in milliseconds.
- getData() - Method in class org.apache.hadoop.io.DataOutputBuffer
-
Returns the current contents of the buffer.
- getData() - Method in interface org.apache.nutch.parse.Parse
-
Other data extracted from the page.
- getData() - Method in class org.apache.nutch.parse.ParseImpl
-
- getDebugStream(Log) - Static method in class org.apache.nutch.util.LogUtil
-
- getDeclaredClass() - Method in class org.apache.hadoop.io.ObjectWritable
-
Return the class this is meant to be.
- getDependencies() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of plugin IDs.
- getDescriptor() - Method in class org.apache.nutch.plugin.Extension
-
Returns the plugin descriptor.
- getDescriptor() - Method in class org.apache.nutch.plugin.Plugin
-
Returns the plugin descriptor.
- getDigest() - Method in class org.apache.hadoop.io.MD5Hash
-
Returns the digest bytes.
- getDOMRoot() - Method in class org.apache.nutch.parse.ParseData
-
Retrieve the DOM, if there is one.
- getEmptyParse(Configuration) - Method in class org.apache.nutch.parse.ParseStatus
-
A convenience method.
- getErrorStream(Log) - Static method in class org.apache.nutch.util.LogUtil
-
- getExpireTime() - Method in class org.apache.nutch.protocol.EmptyRobotRules
-
- getExpireTime() - Method in interface org.apache.nutch.protocol.RobotRules
-
Get the expire time.
- getExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of exported libraries as URLs.
- getExtensionInstance() - Method in class org.apache.nutch.plugin.Extension
-
Return an instance of the extension implementation.
- getExtensionList(Extension[]) - Method in class org.apache.nutch.parse.ParseFilters
-
- getExtensionPoint(String) - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns an extension point identified by an extension point ID.
- getExtensions(String) - Method in class org.apache.nutch.parse.ParserFactory
-
Finds the best-suited parse plugin for a given contentType.
- getExtensions() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns an array of extensions that listen to this extension point.
- getExtensions() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of extensions.
- getExtenstionPoints() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of extension points.
- getFactory(Class) - Static method in class org.apache.hadoop.io.WritableFactories
-
Return the factory defined for a class.
- getFatalStream(Log) - Static method in class org.apache.nutch.util.LogUtil
-
- getFetchInterval() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getFetchTime() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getFile(String, String) - Method in class org.apache.hadoop.conf.Configuration
-
Returns a local file name under a directory named in dirsProp with the given path.
- getFloat(String, float) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property as a float.
- getFormattedTimeWithDiff(DateFormat, long, long) - Static method in class org.apache.hadoop.util.StringUtils
-
Formats time in ms and appends the difference (finishTime - startTime) as returned by formatTimeDiff().
- getGeneralTags() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Returns all collected values of the general meta tags.
- getHeader(String) - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the value of a named header.
- getHeaders() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns all the headers.
- getHttpEquivTags() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Returns all collected values of the "http-equiv" meta tags.
- getId() - Method in class org.apache.nutch.plugin.Extension
-
Return the unique id of the extension.
- getId() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns the unique id of the extension point.
- getInfoStream(Log) - Static method in class org.apache.nutch.util.LogUtil
-
- getInt(String, int) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property as an integer.
- getKeyClass() - Method in class org.apache.hadoop.io.WritableComparator
-
Returns the WritableComparable implementation class.
- getLastModified() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getLength() - Method in class org.apache.hadoop.io.DataInputBuffer
-
Returns the length of the input.
- getLength() - Method in class org.apache.hadoop.io.DataOutputBuffer
-
Returns the length of the valid data currently in the buffer.
- getLength() - Method in class org.apache.hadoop.io.Text
-
Returns the number of bytes in the byte array.
- getLocalPath(String, String) - Method in class org.apache.hadoop.conf.Configuration
-
Returns a local file under a directory named in dirsProp with the given path.
- getLoginUrl() - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- getLong(String, long) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property as a long.
- getMajorCode() - Method in class org.apache.nutch.parse.ParseStatus
-
- getMessage() - Method in class org.apache.nutch.parse.ParseStatus
-
A convenience method.
- getMessage() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getMeta(String) - Method in class org.apache.nutch.metadata.MetaWrapper
-
Get metadata.
- getMeta(String) - Method in class org.apache.nutch.parse.ParseData
-
Get a single metadata value.
- getMetaData() - Method in class org.apache.nutch.crawl.CrawlDatum
-
Returns a MapWritable if it was set or read in readFields(DataInput); returns an empty map if the CrawlDatum was freshly created (lazily instantiated).
- getMetadata() - Method in class org.apache.nutch.metadata.MetaWrapper
-
Get all metadata.
- getMetadata() - Method in class org.apache.nutch.protocol.Content
-
Other protocol-specific data.
- getMetaTag() - Method in class org.apache.nutch.parse.ParseData
-
Returns the HTML meta tags which are populated by parsing the meta tags in the head of an HTML document.
- getMetaValues(String) - Method in class org.apache.nutch.metadata.MetaWrapper
-
Get multiple metadata.
- getMethod() - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- getMimeType(File) - Method in class org.apache.nutch.util.mime.MimeTypes
-
Find the Mime Content Type of a file.
- getMimeType(URL) - Method in class org.apache.nutch.util.mime.MimeTypes
-
Find the Mime Content Type of a document from its URL.
- getMimeType(String) - Method in class org.apache.nutch.util.mime.MimeTypes
-
Find the Mime Content Type of a document from its name.
- getMimeType(byte[]) - Method in class org.apache.nutch.util.mime.MimeTypes
-
Find the Mime Content Type of a stream from its content.
- getMimeType(String, byte[]) - Method in class org.apache.nutch.util.mime.MimeTypes
-
Find the Mime Content Type of a document from its name and its content.
- getMinLength() - Method in class org.apache.nutch.util.mime.MimeTypes
-
Return the minimum length of data to provide to analyzing methods based on the document's content in order to check all the known MimeTypes.
- getMinorCode() - Method in class org.apache.nutch.parse.ParseStatus
-
- getMode(String) - Static method in class org.apache.nutch.net.URLScopeFilter
-
- getModifiedTime() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getName() - Method in class org.apache.hadoop.fs.Path
-
Returns the final component of this path.
- getName(Class) - Static method in class org.apache.hadoop.io.WritableName
-
Return the name for a class.
- getName() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns the name of the extension point.
- getName() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the name of the plugin.
- getName() - Method in class org.apache.nutch.util.mime.MimeType
-
Return the name of this mime-type.
- getNoCache() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getNoFollow() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getNoIndex() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getNormalizedName(String) - Static method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
Get the normalized form of a metadata attribute name.
- getNotExportedLibUrls() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an array of libraries as URLs that are not exported by the plugin.
- getObject(String) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property, or null if no such property exists.
- getOutlinks(String, Configuration) - Static method in class org.apache.nutch.parse.OutlinkExtractor
-
Extracts Outlinks from the given plain text.
- getOutlinks(String, String, Configuration) - Static method in class org.apache.nutch.parse.OutlinkExtractor
-
Extracts Outlinks from the given plain text and adds the anchor to the extracted Outlinks.
- getOutlinks() - Method in class org.apache.nutch.parse.ParseData
-
The outlinks of the page.
- getParameters() - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- getParent() - Method in class org.apache.hadoop.fs.Path
-
Returns the parent of a path or null if at root.
- getParse() - Method in class org.apache.nutch.fetcher.FetcherOutput
-
- getParse(Content) - Method in interface org.apache.nutch.parse.Parser
-
Creates the parse for some content.
- getParseMeta() - Method in class org.apache.nutch.parse.ParseData
-
Other content properties.
- getParserById(String) - Method in class org.apache.nutch.parse.ParserFactory
-
Function returns a Parser instance with the specified extId, representing its extension ID.
- getParsers(String, String) - Method in class org.apache.nutch.parse.ParserFactory
-
Function returns an array of Parsers for a given content type.
- getPassword() - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- getPluginClass() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the fully qualified name of the class which implements the abstract Plugin class.
- getPluginDescriptor(String) - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns the descriptor of one plugin identified by a plugin id.
- getPluginDescriptors() - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns all registered plugin descriptors.
- getPluginFolder(String) - Method in class org.apache.nutch.plugin.PluginManifestParser
-
Return the named plugin folder.
- getPluginId() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the unique identifier of the plug-in or null.
- getPluginInstance(PluginDescriptor) - Method in class org.apache.nutch.plugin.PluginRepository
-
Returns an instance of a plugin.
- getPluginPath() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns the directory path of the plugin.
- getPosition() - Method in class org.apache.hadoop.io.DataInputBuffer
-
Returns the current position in the input.
- getPrimaryType() - Method in class org.apache.nutch.util.mime.MimeType
-
Return the primary type of this mime-type.
- getProperties() - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- getProtocol(String) - Method in class org.apache.nutch.protocol.ProtocolFactory
-
Returns the appropriate Protocol implementation for a url.
- getProtocolOutput(Text, CrawlDatum) - Method in interface org.apache.nutch.protocol.Protocol
-
Returns the Content for a fetchlist entry.
- getProviderName() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
- getRefresh() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getRefreshHref() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getRefreshTime() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
A convenience method.
- getResource(String) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the URL for the named resource.
- getResourceString(String, Locale) - Method in class org.apache.nutch.plugin.PluginDescriptor
-
Returns an I18N'd resource string.
- getResponseCode() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getResponseCode() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getRetriesSinceFetch() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getRobotRules(Text, CrawlDatum) - Method in interface org.apache.nutch.protocol.Protocol
-
Retrieve robot rules applicable for this url.
- getRobotsDelay() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getSchema() - Method in class org.apache.nutch.plugin.ExtensionPoint
-
Returns a path to the XML schema of an extension point.
- getScore() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getSignature() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getSignature(Configuration) - Static method in class org.apache.nutch.crawl.SignatureFactory
-
Return the default Signature implementation.
- getSite() - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- getSize() - Method in class org.apache.hadoop.io.BytesWritable
-
Get the current size of the buffer.
- getStatus() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- getStatus() - Method in class org.apache.nutch.parse.ParseData
-
The status of parsing the page.
- getStatus() - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- getStatusName(byte) - Static method in class org.apache.nutch.crawl.CrawlDatum
-
- getStrings(String) - Method in class org.apache.hadoop.conf.Configuration
-
Returns the value of the name property as an array of strings.
- getStrings(String) - Static method in class org.apache.hadoop.util.StringUtils
-
Returns an arraylist of strings.
- getSubType() - Method in class org.apache.nutch.util.mime.MimeType
-
Return the sub type of this mime-type.
- getTargetPoint() - Method in class org.apache.nutch.plugin.Extension
-
Returns the ID of the extension point that is implemented by this extension.
- getText() - Method in interface org.apache.nutch.parse.Parse
-
The textual content of the page.
- getText() - Method in class org.apache.nutch.parse.ParseImpl
-
- getText() - Method in class org.apache.nutch.parse.ParseText
-
- getTitle() - Method in class org.apache.nutch.parse.ParseData
-
The title of the page.
- getToUrl() - Method in class org.apache.nutch.parse.Outlink
-
- getTraceStream(Log) - Static method in class org.apache.nutch.util.LogUtil
-
- getTypes() - Method in class org.apache.hadoop.io.GenericWritable
-
Return all classes that may be wrapped.
- getUrl() - Method in interface org.apache.nutch.net.protocols.Response
-
Returns the URL used to retrieve this response.
- getUrl() - Method in exception org.apache.nutch.parse.ParserNotFound
-
- getUrl() - Method in class org.apache.nutch.protocol.Content
-
The url fetched.
- getUrl() - Method in exception org.apache.nutch.protocol.ProtocolNotFound
-
- getUserPrincipal() - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- getValueClass() - Method in class org.apache.hadoop.io.ArrayWritable
-
- getValues(String) - Method in class org.apache.nutch.metadata.Metadata
-
Get the values associated to a metadata name.
- getValues(String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- getVersion() - Method in class org.apache.hadoop.io.VersionedWritable
-
Return the version number of the current implementation.
- getVersion() - Method in class org.apache.nutch.parse.ParseData
-
- getVersion() - Method in class org.apache.nutch.parse.ParseStatus
-
- getVersion() - Method in class org.apache.nutch.parse.ParseText
-
- getVersion() - Method in class org.apache.nutch.plugin.PluginDescriptor
-
- getVersion() - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- getVIntSize(long) - Static method in class org.apache.hadoop.io.WritableUtils
-
Get the encoded length if an integer is stored in a variable-length format.
- getWarnStream(Log) - Static method in class org.apache.nutch.util.LogUtil
-
- GONE - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
Resource is gone.
- GZIPUtils - Class in org.apache.nutch.util
-
A collection of utility methods for working on GZIPed data.
- GZIPUtils() - Constructor for class org.apache.nutch.util.GZIPUtils
-
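The Configuration entries above (get(String), get(String, String), getInt(String, int), getBoolean(String, boolean)) follow a common pattern: look up a named property and fall back to a caller-supplied default. The block below is a minimal sketch of that pattern built on java.util.Properties. It is not the Hadoop class: TypedPropertiesSketch and the property names are hypothetical, and letting a malformed number propagate as NumberFormatException is an assumption of the sketch, not something this index states.

```java
import java.util.Properties;

// Hypothetical stand-in for a name/value configuration; not org.apache.hadoop.conf.Configuration.
final class TypedPropertiesSketch {
    private final Properties props = new Properties();

    void set(String name, String value) {
        props.setProperty(name, value);
    }

    // Returns the raw value, or null if no such property exists.
    String get(String name) {
        return props.getProperty(name);
    }

    // Returns the raw value, or defaultValue if the property is unset.
    String get(String name, String defaultValue) {
        return props.getProperty(name, defaultValue);
    }

    // Returns the value parsed as an int, or defaultValue if the property is unset.
    // Assumption of this sketch: a malformed number throws NumberFormatException.
    int getInt(String name, int defaultValue) {
        String raw = get(name);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }

    // Returns the value parsed as a boolean, or defaultValue if the property is unset.
    boolean getBoolean(String name, boolean defaultValue) {
        String raw = get(name);
        return raw == null ? defaultValue : Boolean.parseBoolean(raw.trim());
    }

    public static void main(String[] args) {
        TypedPropertiesSketch conf = new TypedPropertiesSketch();
        conf.set("example.threads", "10"); // hypothetical property name
        System.out.println(conf.getInt("example.threads", 1));
        System.out.println(conf.getInt("example.missing", 1));
        System.out.println(conf.get("example.missing", "fallback"));
    }
}
```

Keeping the string lookup in one place and layering the typed getters on top of it means each typed getter only has to decide how to parse and what to do when the value is absent.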
- PAGE_COUNT - Static variable in interface org.apache.nutch.metadata.Office
-
- Parse - Interface in org.apache.nutch.parse
-
The result of parsing a page's raw content.
- parse(Content) - Method in class org.apache.nutch.parse.ParseUtil
-
Performs a parse by iterating through a List of preferred Parsers until a successful parse is performed and a Parse object is returned.
- PARSE_DIR_NAME - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- parseByExtensionId(String, Content) - Method in class org.apache.nutch.parse.ParseUtil
-
Method parses a Content object using the Parser specified by the parameter extId, i.e., the Parser's extension ID.
- parseCharacterEncoding(String) - Static method in class org.apache.nutch.util.StringUtil
-
Parse the character encoding from the specified content type header.
- ParseData - Class in org.apache.nutch.parse
-
Data extracted from a page's content.
- ParseData() - Constructor for class org.apache.nutch.parse.ParseData
-
- ParseData(ParseStatus, String, Outlink[], Metadata) - Constructor for class org.apache.nutch.parse.ParseData
-
- ParseData(ParseStatus, String, Outlink[], Metadata, Metadata) - Constructor for class org.apache.nutch.parse.ParseData
-
- ParseData(ParseStatus, String, Outlink[], Metadata, Metadata, DocumentFragment, HTMLMetaTags) - Constructor for class org.apache.nutch.parse.ParseData
-
- ParseException - Exception in org.apache.nutch.parse
-
- ParseException() - Constructor for exception org.apache.nutch.parse.ParseException
-
- ParseException(String) - Constructor for exception org.apache.nutch.parse.ParseException
-
- ParseException(String, Throwable) - Constructor for exception org.apache.nutch.parse.ParseException
-
- ParseException(Throwable) - Constructor for exception org.apache.nutch.parse.ParseException
-
- ParseFilter - Interface in org.apache.nutch.parse
-
- PARSEFILTER_ORDER - Static variable in class org.apache.nutch.parse.HtmlParseFilters
-
Deprecated.
- ParseFilters - Class in org.apache.nutch.parse
-
- ParseFilters(Configuration) - Constructor for class org.apache.nutch.parse.ParseFilters
-
- ParseImpl - Class in org.apache.nutch.parse
-
The result of parsing a page's raw content.
- ParseImpl() - Constructor for class org.apache.nutch.parse.ParseImpl
-
- ParseImpl(Parse) - Constructor for class org.apache.nutch.parse.ParseImpl
-
- ParseImpl(String, ParseData) - Constructor for class org.apache.nutch.parse.ParseImpl
-
- ParseImpl(ParseText, ParseData) - Constructor for class org.apache.nutch.parse.ParseImpl
-
- parsePluginFolder(String[]) - Method in class org.apache.nutch.plugin.PluginManifestParser
-
Returns a list of all found plugin descriptors.
- Parser - Interface in org.apache.nutch.parse
-
A parser for content generated by a Protocol implementation.
- ParserFactory - Class in org.apache.nutch.parse
-
Creates and caches Parser plugins.
- ParserFactory(Configuration) - Constructor for class org.apache.nutch.parse.ParserFactory
-
- ParserNotFound - Exception in org.apache.nutch.parse
-
- ParserNotFound(String) - Constructor for exception org.apache.nutch.parse.ParserNotFound
-
- ParserNotFound(String, String) - Constructor for exception org.apache.nutch.parse.ParserNotFound
-
- ParserNotFound(String, String, String) - Constructor for exception org.apache.nutch.parse.ParserNotFound
-
- ParseStatus - Class in org.apache.nutch.parse
-
- ParseStatus() - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseStatus(int, int, String[]) - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseStatus(int) - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseStatus(int, String[]) - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseStatus(int, int) - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseStatus(int, int, String) - Constructor for class org.apache.nutch.parse.ParseStatus
-
Simplified constructor for passing just a text message.
- ParseStatus(int, String) - Constructor for class org.apache.nutch.parse.ParseStatus
-
Simplified constructor for passing just a text message.
- ParseStatus(Throwable) - Constructor for class org.apache.nutch.parse.ParseStatus
-
- ParseText - Class in org.apache.nutch.parse
-
- ParseText() - Constructor for class org.apache.nutch.parse.ParseText
-
- ParseText(String) - Constructor for class org.apache.nutch.parse.ParseText
-
- ParseUtil - Class in org.apache.nutch.parse
-
A utility class containing methods that simplify parsing tasks, such as iterating through a preferred list of Parsers to obtain Parse objects.
- ParseUtil(Configuration) - Constructor for class org.apache.nutch.parse.ParseUtil
-
- Path - Class in org.apache.hadoop.fs
-
- Path(String, String) - Constructor for class org.apache.hadoop.fs.Path
-
Resolve a child path against a parent path.
- Path(Path, String) - Constructor for class org.apache.hadoop.fs.Path
-
Resolve a child path against a parent path.
- Path(String, Path) - Constructor for class org.apache.hadoop.fs.Path
-
Resolve a child path against a parent path.
- Path(Path, Path) - Constructor for class org.apache.hadoop.fs.Path
-
Resolve a child path against a parent path.
- Path(String) - Constructor for class org.apache.hadoop.fs.Path
-
Construct a path from a String.
- Path(String, String, String) - Constructor for class org.apache.hadoop.fs.Path
-
Construct a Path from components.
- Pluggable - Interface in org.apache.nutch.plugin
-
Defines the capability of a class to be plugged into Nutch.
- Plugin - Class in org.apache.nutch.plugin
-
A nutch-plugin is a container for a set of custom logic that provides extensions to the Nutch core functionality or to another plugin that provides an API for extending.
- Plugin(PluginDescriptor, Configuration) - Constructor for class org.apache.nutch.plugin.Plugin
-
Constructor
- PluginClassLoader - Class in org.apache.nutch.plugin
-
The PluginClassLoader contains only classes of the runtime libraries set up in the plugin manifest file and the exported libraries of required plugins.
- PluginClassLoader(URL[], ClassLoader) - Constructor for class org.apache.nutch.plugin.PluginClassLoader
-
Constructor
- PluginDescriptor - Class in org.apache.nutch.plugin
-
The PluginDescriptor provides access to all meta information of a nutch-plugin, as well as to the internationalizable resources and the plugin's own classloader.
- PluginDescriptor(String, String, String, String, String, String, Configuration) - Constructor for class org.apache.nutch.plugin.PluginDescriptor
-
Constructor
- PluginManifestParser - Class in org.apache.nutch.plugin
-
The PluginManifestParser just parses the manifest files in all plugin directories.
- PluginManifestParser(Configuration, PluginRepository) - Constructor for class org.apache.nutch.plugin.PluginManifestParser
-
- PluginRepository - Class in org.apache.nutch.plugin
-
The plugin repository is a registry of all plugins.
- PluginRepository(Configuration) - Constructor for class org.apache.nutch.plugin.PluginRepository
-
- PluginRuntimeException - Exception in org.apache.nutch.plugin
-
PluginRuntimeException will be thrown when an exception occurs in the plugin management.
- PluginRuntimeException(Throwable) - Constructor for exception org.apache.nutch.plugin.PluginRuntimeException
-
- PluginRuntimeException(String) - Constructor for exception org.apache.nutch.plugin.PluginRuntimeException
-
- preCrawlAuthenticate(Protocol) - Method in interface com.endeca.itl.web.auth.Authenticator
-
Authenticates the crawler for a particular site before the crawl starts.
- preCrawlAuthenticate(Protocol) - Method in class com.endeca.itl.web.auth.AuthenticatorManager
-
- PrefixStringMatcher - Class in org.apache.nutch.util
-
A class for efficiently matching Strings against a set of prefixes.
- PrefixStringMatcher(String[]) - Constructor for class org.apache.nutch.util.PrefixStringMatcher
-
Creates a new PrefixStringMatcher which will match Strings with any prefix in the supplied array.
- PrefixStringMatcher(Collection) - Constructor for class org.apache.nutch.util.PrefixStringMatcher
-
Creates a new PrefixStringMatcher which will match Strings with any prefix in the supplied Collection.
- PROTO_NOT_FOUND - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
This protocol was not found.
- PROTO_STATUS_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- Protocol - Interface in org.apache.nutch.protocol
-
A retriever of url content.
- ProtocolException - Exception in org.apache.nutch.net.protocols
-
- ProtocolException() - Constructor for exception org.apache.nutch.net.protocols.ProtocolException
-
Deprecated.
- ProtocolException(String) - Constructor for exception org.apache.nutch.net.protocols.ProtocolException
-
Deprecated.
- ProtocolException(String, Throwable) - Constructor for exception org.apache.nutch.net.protocols.ProtocolException
-
Deprecated.
- ProtocolException(Throwable) - Constructor for exception org.apache.nutch.net.protocols.ProtocolException
-
Deprecated.
- ProtocolException - Exception in org.apache.nutch.protocol
-
- ProtocolException() - Constructor for exception org.apache.nutch.protocol.ProtocolException
-
- ProtocolException(String) - Constructor for exception org.apache.nutch.protocol.ProtocolException
-
- ProtocolException(String, Throwable) - Constructor for exception org.apache.nutch.protocol.ProtocolException
-
- ProtocolException(Throwable) - Constructor for exception org.apache.nutch.protocol.ProtocolException
-
- ProtocolFactory - Class in org.apache.nutch.protocol
-
- ProtocolFactory(Configuration) - Constructor for class org.apache.nutch.protocol.ProtocolFactory
-
- ProtocolNotFound - Exception in org.apache.nutch.protocol
-
- ProtocolNotFound(String) - Constructor for exception org.apache.nutch.protocol.ProtocolNotFound
-
- ProtocolNotFound(String, String) - Constructor for exception org.apache.nutch.protocol.ProtocolNotFound
-
- ProtocolNotFound(String, String, Throwable) - Constructor for exception org.apache.nutch.protocol.ProtocolNotFound
-
- ProtocolOutput - Class in org.apache.nutch.protocol
-
Simple aggregate to pass from protocol plugins both content and
protocol status.
- ProtocolOutput(Content, ProtocolStatus) - Constructor for class org.apache.nutch.protocol.ProtocolOutput
-
- ProtocolOutput(Content) - Constructor for class org.apache.nutch.protocol.ProtocolOutput
-
- ProtocolStatus - Class in org.apache.nutch.protocol
-
- ProtocolStatus() - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, String[]) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, String[], int) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, String[], long) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, String[], long, int) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, int) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, long) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, long, int) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, Object) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, Object, int) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, Object, long) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(int, Object, long, int) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(Throwable) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- ProtocolStatus(Throwable, int) - Constructor for class org.apache.nutch.protocol.ProtocolStatus
-
- PUBLISHER - Static variable in interface org.apache.nutch.metadata.DublinCore
-
An entity responsible for making the resource available.
- put(Writable, Writable) - Method in class org.apache.nutch.crawl.MapWritable
-
- putAll(MapWritable) - Method in class org.apache.nutch.crawl.MapWritable
-
- SCOPE_CRAWLDB - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used when updating the CrawlDb with new URLs.
- SCOPE_DEFAULT - Static variable in class org.apache.nutch.net.URLNormalizers
-
Default scope.
- SCOPE_FETCHER - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used by org.apache.nutch.fetcher.Fetcher when processing redirect URLs.
- SCOPE_GENERATE_HOST_COUNT - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used by org.apache.nutch.crawl.Generator.
- SCOPE_INJECT - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used by org.apache.nutch.crawl.Injector.
- SCOPE_LINKDB - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used when updating the LinkDb with new URLs.
- SCOPE_OUTLINK - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used when constructing new Outlink instances.
- SCOPE_PARTITION - Static variable in class org.apache.nutch.net.URLNormalizers
-
Scope used by org.apache.nutch.crawl.PartitionUrlByHost.
- SCORE_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- SEGMENT_NAME_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- SEPARATOR - Static variable in class org.apache.hadoop.fs.Path
-
The directory separator, a slash.
- SEPARATOR_CHAR - Static variable in class org.apache.hadoop.fs.Path
-
- set(String, Object) - Method in class org.apache.hadoop.conf.Configuration
-
Sets the value of the name property.
- set(Writable[]) - Method in class org.apache.hadoop.io.ArrayWritable
-
- set(boolean) - Method in class org.apache.hadoop.io.BooleanWritable
-
Set the value of the BooleanWritable.
- set(BytesWritable) - Method in class org.apache.hadoop.io.BytesWritable
-
Set the BytesWritable to the contents of the given newData.
- set(byte[], int, int) - Method in class org.apache.hadoop.io.BytesWritable
-
Set the value to a copy of the given byte range.
- set(float) - Method in class org.apache.hadoop.io.FloatWritable
-
Set the value of this FloatWritable.
- set(Writable) - Method in class org.apache.hadoop.io.GenericWritable
-
Set the instance that is wrapped.
- set(int) - Method in class org.apache.hadoop.io.IntWritable
-
Set the value of this IntWritable.
- set(long) - Method in class org.apache.hadoop.io.LongWritable
-
Set the value of this LongWritable.
- set(MD5Hash) - Method in class org.apache.hadoop.io.MD5Hash
-
Copy the contents of another instance into this instance.
- set(Object) - Method in class org.apache.hadoop.io.ObjectWritable
-
Reset the instance.
- set(String) - Method in class org.apache.hadoop.io.Text
-
Set to contain the contents of a string.
- set(byte[]) - Method in class org.apache.hadoop.io.Text
-
Set to a UTF-8 byte array.
- set(Text) - Method in class org.apache.hadoop.io.Text
-
Copy a text.
- set(byte[], int, int) - Method in class org.apache.hadoop.io.Text
-
Set the Text to a range of bytes.
- set(Writable[][]) - Method in class org.apache.hadoop.io.TwoDArrayWritable
-
- set(int) - Method in class org.apache.hadoop.io.VIntWritable
-
Set the value of this VIntWritable.
- set(long) - Method in class org.apache.hadoop.io.VLongWritable
-
Set the value of this VLongWritable.
- set(CrawlDatum) - Method in class org.apache.nutch.crawl.CrawlDatum
-
Copy the contents of another instance into this instance.
- set(String, String) - Method in class org.apache.nutch.metadata.Metadata
-
Set metadata name/value.
- set(String, String) - Method in class org.apache.nutch.metadata.SpellCheckedMetadata
-
- setActionUrl(String) - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- setAll(Properties) - Method in class org.apache.nutch.metadata.Metadata
-
Copy all key-value pairs from properties.
- setArgs(String[]) - Method in class org.apache.nutch.parse.ParseStatus
-
- setArgs(String[]) - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- setAuthConf(AuthenticatorConfiguration) - Method in interface com.endeca.itl.web.auth.Authenticator
-
Initializes the Authenticator with the given configuration.
- setBaseHref(URL) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets the baseHref.
- setBoolean(String, boolean) - Method in class org.apache.hadoop.conf.Configuration
-
Sets the value of the name property to a boolean.
- setCapacity(int) - Method in class org.apache.hadoop.io.BytesWritable
-
Change the capacity of the backing storage.
- setClass(String, Class, Class) - Method in class org.apache.hadoop.conf.Configuration
-
Sets the value of the name property to the name of a class.
- setClassLoader(ClassLoader) - Method in class org.apache.hadoop.conf.Configuration
-
Set the class loader that will be used to load the various objects.
- setClassName(String) - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- setClazz(String) - Method in class org.apache.nutch.plugin.Extension
-
Sets the class that implements the concrete extension; it is only used until
model creation at system start up.
- setCode(int) - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- setConf(Configuration) - Method in interface org.apache.hadoop.conf.Configurable
-
Set the configuration to be used by this object.
- setConf(Configuration) - Method in class org.apache.hadoop.conf.Configured
-
- setConf(Configuration) - Method in class org.apache.hadoop.io.ObjectWritable
-
- setConf(Object, Configuration) - Static method in class org.apache.hadoop.util.ReflectionUtils
-
Check and set 'configuration' if necessary.
- setConf(Configuration) - Method in class org.apache.nutch.crawl.Signature
-
- setConf(Configuration) - Method in class org.apache.nutch.fetcher.FetcherOutput
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.ParseData
-
- setConf(Configuration) - Method in class org.apache.nutch.parse.ParseImpl
-
- setContent(byte[]) - Method in class org.apache.nutch.protocol.Content
-
- setContent(Content) - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- setContentType(String) - Method in class org.apache.nutch.protocol.Content
-
- setDescriptor(PluginDescriptor) - Method in class org.apache.nutch.plugin.Extension
-
Sets the plugin descriptor and is only used until model creation at system
start up.
- setDigest(String) - Method in class org.apache.hadoop.io.MD5Hash
-
Sets the digest value from a hex string.
- setDOMRoot(DocumentFragment) - Method in class org.apache.nutch.parse.ParseData
-
Set the DOM.
- setFactory(Class, WritableFactory) - Static method in class org.apache.hadoop.io.WritableFactories
-
Define a factory for a class.
- setFetchInterval(float) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setFetchTime(long) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setId(String) - Method in class org.apache.nutch.plugin.Extension
-
Sets the unique extension Id and is only used until model creation at
system start up.
- setInt(String, int) - Method in class org.apache.hadoop.conf.Configuration
-
Sets the value of the name property to an integer.
- setLastModified(long) - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- setLoginUrl(String) - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- setLong(String, long) - Method in class org.apache.hadoop.conf.Configuration
-
Sets the value of the name property to a long.
- setMajorCode(byte) - Method in class org.apache.nutch.parse.ParseStatus
-
- setMessage(String) - Method in class org.apache.nutch.parse.ParseStatus
-
- setMessage(String) - Method in class org.apache.nutch.protocol.ProtocolStatus
-
- setMeta(String, String) - Method in class org.apache.nutch.metadata.MetaWrapper
-
Set metadata.
- setMetaData(MapWritable) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setMetadata(Metadata) - Method in class org.apache.nutch.protocol.Content
-
Other protocol-specific data.
- setMetaTag(HTMLMetaTags) - Method in class org.apache.nutch.parse.ParseData
-
- setMethod(String) - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- setMinorCode(short) - Method in class org.apache.nutch.parse.ParseStatus
-
- setModifiedTime(long) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setName(Class, String) - Static method in class org.apache.hadoop.io.WritableName
-
Set the name that a class should be known as to something other than the
class name.
- setNextFetchTime() - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setNoCache() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets noCache to true.
- setNoFollow() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets noFollow to true.
- setNoIndex() - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets noIndex to true.
- setObject(String, Object) - Method in class org.apache.hadoop.conf.Configuration
-
Sets the value of the name property.
- setParameters(List<NameValuePair>) - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- setParseMeta(Metadata) - Method in class org.apache.nutch.parse.ParseData
-
- setPreAuthenticate(boolean) - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- setProperties(Map<String, String>) - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- setQuietMode(boolean) - Method in class org.apache.hadoop.conf.Configuration
-
Make this class quiet.
- setRefresh(boolean) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets refresh to the supplied value.
- setRefreshHref(URL) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets the refreshHref.
- setRefreshTime(int) - Method in class org.apache.nutch.parse.HTMLMetaTags
-
Sets the refreshTime.
- setResponseCode(int) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setRetriesSinceFetch(int) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setRobotsDelay(long) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setScore(float) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setSignature(byte[]) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setSite(String) - Method in class com.endeca.itl.web.auth.AuthenticatorConfiguration
-
- setSize(int) - Method in class org.apache.hadoop.io.BytesWritable
-
Change the size of the buffer.
- setStatus(int) - Method in class org.apache.nutch.crawl.CrawlDatum
-
- setStatus(ProtocolStatus) - Method in class org.apache.nutch.protocol.ProtocolOutput
-
- setValueClass(Class) - Method in class org.apache.hadoop.io.ArrayWritable
-
- shortestMatch(String) - Method in class org.apache.nutch.util.PrefixStringMatcher
-
Returns the shortest prefix of input that is matched,
or null if no match exists.
- shortestMatch(String) - Method in class org.apache.nutch.util.SuffixStringMatcher
-
Returns the shortest suffix of input that is matched,
or null if no match exists.
- shortestMatch(String) - Method in class org.apache.nutch.util.TrieStringMatcher
-
Returns the shortest substring of input that is
matched by a pattern in the trie, or null if no match
exists.
- shutDown() - Method in class org.apache.nutch.plugin.Plugin
-
Shutdown the plugin.
- Signature - Class in org.apache.nutch.crawl
-
- Signature() - Constructor for class org.apache.nutch.crawl.Signature
-
- SIGNATURE_KEY - Static variable in interface org.apache.nutch.metadata.Nutch
-
- SignatureComparator - Class in org.apache.nutch.crawl
-
- SignatureComparator() - Constructor for class org.apache.nutch.crawl.SignatureComparator
-
- SignatureFactory - Class in org.apache.nutch.crawl
-
Factory class, which instantiates a Signature implementation according to the
current Configuration.
- simpleHostname(String) - Static method in class org.apache.hadoop.util.StringUtils
-
Given a full hostname, return the word up to the first dot.
- size() - Method in class org.apache.nutch.crawl.MapWritable
-
- size() - Method in class org.apache.nutch.metadata.Metadata
-
Returns the number of metadata names in this metadata.
- skip(DataInput) - Static method in class org.apache.hadoop.io.Text
-
Skips over one Text in the input.
- skip(DataInput) - Static method in class org.apache.nutch.parse.Outlink
-
Skips over one Outlink in the input.
- skipCompressedByteArray(DataInput) - Static method in class org.apache.hadoop.io.WritableUtils
-
- SOURCE - Static variable in interface org.apache.nutch.metadata.DublinCore
-
A reference to a resource from which the present resource is derived.
- SpellCheckedMetadata - Class in org.apache.nutch.metadata
-
A decorator to Metadata that adds spellchecking capabilities to property
names.
- SpellCheckedMetadata() - Constructor for class org.apache.nutch.metadata.SpellCheckedMetadata
-
- startUp() - Method in class org.apache.nutch.plugin.Plugin
-
Will be invoked at plugin start up.
- statNames - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
- STATUS_BLOCKED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_DB_FETCHED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page was successfully fetched.
- STATUS_DB_GONE - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page no longer exists.
- STATUS_DB_MAX - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Maximum value of DB-related status.
- STATUS_DB_REDIR_PERM - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page permanently redirects to other page.
- STATUS_DB_REDIR_TEMP - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page temporarily redirects to other page.
- STATUS_DB_UNFETCHED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page was not fetched yet.
- STATUS_FAILED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_FAILURE - Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_FETCH_CONTENT_LIMIT_EXCEEDED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching was successful but content was truncated.
- STATUS_FETCH_GONE - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching unsuccessful - page is gone.
- STATUS_FETCH_MAX - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Maximum value of fetch-related status.
- STATUS_FETCH_REDIR_PERM - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching permanently redirected to other page.
- STATUS_FETCH_REDIR_TEMP - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching temporarily redirected to other page.
- STATUS_FETCH_RETRY - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching unsuccessful, needs to be retried (transient errors).
- STATUS_FETCH_SUCCESS - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Fetching was successful.
- STATUS_GONE - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_INJECTED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page was newly injected.
- STATUS_LINKED - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page discovered through a link.
- STATUS_NOTFETCHING - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTFOUND - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTMODIFIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_NOTPARSED - Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_REDIR_EXCEEDED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_RETRY - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_ROBOTS_DENIED - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_SIGNATURE - Static variable in class org.apache.nutch.crawl.CrawlDatum
-
Page signature.
- STATUS_SUCCESS - Static variable in class org.apache.nutch.parse.ParseStatus
-
- STATUS_SUCCESS - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- STATUS_WOULDBLOCK - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
- stringifyException(Throwable) - Static method in class org.apache.hadoop.util.StringUtils
-
Make a string representation of the exception.
- stringToPath(String[]) - Static method in class org.apache.hadoop.util.StringUtils
-
- stringToURI(String[]) - Static method in class org.apache.hadoop.util.StringUtils
-
- StringUtil - Class in org.apache.nutch.util
-
A collection of String processing utility methods.
- StringUtil() - Constructor for class org.apache.nutch.util.StringUtil
-
- StringUtils - Class in org.apache.hadoop.util
-
General string utils
- StringUtils() - Constructor for class org.apache.hadoop.util.StringUtils
-
- SUBJECT - Static variable in interface org.apache.nutch.metadata.DublinCore
-
The topic of the content of the resource.
- SUCCESS - Static variable in class org.apache.nutch.parse.ParseStatus
-
Parsing succeeded.
- SUCCESS - Static variable in class org.apache.nutch.protocol.ProtocolStatus
-
Content was retrieved without errors.
- SUCCESS_REDIRECT - Static variable in class org.apache.nutch.parse.ParseStatus
-
Parsed content contains a directive to redirect to another URL.
- suffix(String) - Method in class org.apache.hadoop.fs.Path
-
Adds a suffix to the final name in the path.
- SuffixStringMatcher - Class in org.apache.nutch.util
-
A class for efficiently matching Strings against a set of suffixes.
- SuffixStringMatcher(String[]) - Constructor for class org.apache.nutch.util.SuffixStringMatcher
-
Creates a new SuffixStringMatcher which will match Strings with any suffix in the supplied array.
- SuffixStringMatcher(Collection) - Constructor for class org.apache.nutch.util.SuffixStringMatcher
-
Creates a new SuffixStringMatcher which will match Strings with any suffix in the supplied
Collection