Package | Description |
---|---|
com.endeca.itl.web.process | |
org.apache.nutch.crawl | Crawl control code. |
org.apache.nutch.fetcher | The Nutch robot. |
org.apache.nutch.parse | |
org.apache.nutch.protocol | |

Modifier and Type | Method and Description |
---|---|
List<com.endeca.itl.record.Record> | EndecaRecordGenerator.generate(Content content, Parse parse) |

Modifier and Type | Method and Description |
---|---|
byte[] | TextProfileSignature.calculate(Content content, Parse parse) |
abstract byte[] | Signature.calculate(Content content, Parse parse) |
byte[] | MD5Signature.calculate(Content content, Parse parse) |

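The `calculate` methods listed above each return a `byte[]` signature for a fetched page. As a rough illustration only, the sketch below shows one way an MD5-based signature could be derived from page bytes using the JDK's `MessageDigest`. The `PageContent`, `PageSignature`, and `Md5PageSignature` types are hypothetical stand-ins, not the Nutch classes; the `Parse` argument is omitted for brevity, and the fallback to the URL when no bytes are present is an assumption rather than documented behavior.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SignatureSketch {

    /** Hypothetical stand-in for a fetched page: a URL plus its raw bytes. */
    static final class PageContent {
        final String url;
        final byte[] bytes;

        PageContent(String url, byte[] bytes) {
            this.url = url;
            this.bytes = bytes;
        }
    }

    /** Hypothetical base type mirroring the shape of an abstract calculate method. */
    abstract static class PageSignature {
        abstract byte[] calculate(PageContent content);
    }

    /** Hashes the raw page bytes with MD5 (assumed behavior, not taken from the source). */
    static final class Md5PageSignature extends PageSignature {
        @Override
        byte[] calculate(PageContent content) {
            // Assumption: fall back to the URL when the page has no body.
            byte[] data = content.bytes != null
                    ? content.bytes
                    : content.url.getBytes(StandardCharsets.UTF_8);
            try {
                return MessageDigest.getInstance("MD5").digest(data);
            } catch (NoSuchAlgorithmException e) {
                // MD5 is a required algorithm on every Java platform.
                throw new IllegalStateException("MD5 not available", e);
            }
        }
    }

    /** Renders a digest as lowercase hex for display or comparison. */
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        PageContent page = new PageContent(
                "http://example.com/", "hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(toHex(new Md5PageSignature().calculate(page)));
    }
}
```

Two pages with identical bytes produce identical signatures under this sketch, which is what makes a content hash usable for duplicate detection.
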
Modifier and Type | Method and Description |
---|---|
Content | FetcherOutput.getContent() |

Constructor and Description |
---|
FetcherOutput(CrawlDatum crawlDatum, Content content, ParseImpl parse) |

Modifier and Type | Method and Description |
---|---|
Parse | ParseFilters.filter(Content content, Parse parse) Run all defined filters. |
Parse | ParseFilter.filter(Content content, Parse parse) Adds metadata or modifies parse |
Parse | HtmlParseFilters.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc) Deprecated. Run all defined filters. |
Parse | HtmlParseFilter.filter(Content content, Parse parse, HTMLMetaTags metaTags, DocumentFragment doc) Deprecated. Adds metadata or otherwise modifies a parse of HTML content, given the DOM tree of a page. |
Parse | Parser.getParse(Content c) Creates the parse for some content. |
Parse | ParseUtil.parse(Content content) |
Parse | ParseUtil.parseByExtensionId(String extId, Content content) |

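The table above pairs a single-filter method ("Adds metadata or modifies parse") with an aggregate one ("Run all defined filters"). A minimal sketch of that pattern follows, assuming the aggregate simply applies each filter in order and passes each result to the next. `ParsedPage`, `PageFilter`, and `runAll` are hypothetical names introduced for illustration and are not part of the listed API.

```java
import java.util.ArrayList;
import java.util.List;

public class FilterChainSketch {

    /** Hypothetical stand-in for a parse result: only the extracted text. */
    static final class ParsedPage {
        final String text;

        ParsedPage(String text) {
            this.text = text;
        }
    }

    /** Hypothetical single filter: takes the raw content and a parse, returns a parse. */
    interface PageFilter {
        ParsedPage filter(String rawContent, ParsedPage parse);
    }

    /** Applies every filter in list order, feeding each result into the next filter. */
    static ParsedPage runAll(List<PageFilter> filters, String rawContent, ParsedPage parse) {
        ParsedPage current = parse;
        for (PageFilter f : filters) {
            current = f.filter(rawContent, current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<PageFilter> filters = new ArrayList<>();
        // First filter trims surrounding whitespace from the extracted text.
        filters.add((raw, p) -> new ParsedPage(p.text.trim()));
        // Second filter appends the raw content length as a simple annotation.
        filters.add((raw, p) -> new ParsedPage(p.text + " [" + raw.length() + " chars]"));

        ParsedPage out = runAll(filters, "<p> hi </p>", new ParsedPage("  hi  "));
        System.out.println(out.text);
    }
}
```

Because each filter receives the previous filter's output, the order in which filters are registered affects the final result in this sketch.
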
Modifier and Type | Method and Description |
---|---|
Content | ProtocolOutput.getContent() |
static Content | Content.read(DataInput in) |

Modifier and Type | Method and Description |
---|---|
void | ProtocolOutput.setContent(Content content) |

Constructor and Description |
---|
ProtocolOutput(Content content) |
ProtocolOutput(Content content, ProtocolStatus status) |

Copyright © 2007, 2014, Oracle and/or its affiliates. All rights reserved.