public interface Protocol
A retriever of url content. Implemented by protocol extensions.
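A protocol extension provides the fetch logic for one or more URL schemes. The sketch below shows roughly what such an implementation looks like, with heavy simplification. `SimpleDatum`, `SimpleOutput`, `SimpleProtocol` and `InMemoryProtocol` are hypothetical stand-ins made up for illustration and are not Nutch types. Plain `String` replaces `Text`, and the integer status codes are an assumption. The toy extension serves pages from memory rather than from the network.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for CrawlDatum, reduced to one field.
final class SimpleDatum {
    final long fetchTime;
    SimpleDatum(long fetchTime) { this.fetchTime = fetchTime; }
}

// Hypothetical stand-in for ProtocolOutput: a status plus raw content.
final class SimpleOutput {
    final int status;      // assumption: 200 = success, 404 = not found
    final byte[] content;
    SimpleOutput(int status, byte[] content) {
        this.status = status;
        this.content = content;
    }
}

// Simplified mirror of the Protocol contract: turn a fetchlist entry
// (url + datum) into protocol output.
interface SimpleProtocol {
    SimpleOutput getProtocolOutput(String url, SimpleDatum datum);
}

// A toy "protocol extension" that serves content from an in-memory map
// instead of the network, e.g. for tests.
final class InMemoryProtocol implements SimpleProtocol {
    private final Map<String, byte[]> pages = new HashMap<>();

    void put(String url, String body) {
        pages.put(url, body.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public SimpleOutput getProtocolOutput(String url, SimpleDatum datum) {
        byte[] body = pages.get(url);
        return body == null
                ? new SimpleOutput(404, new byte[0])
                : new SimpleOutput(200, body);
    }
}
```

A real extension would do network I/O in `getProtocolOutput`. Keeping storage behind the same interface lets callers stay unaware of where the bytes come from.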
| Field Summary | |
|---|---|
| `static String` | `CHECK_BLOCKING`: Property name. |
| `static String` | `CHECK_ROBOTS`: Property name. |
| `static String` | `X_POINT_ID`: The name of the extension point. |
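`X_POINT_ID` names the extension point under which protocol plugins are registered. The hypothetical `ExtensionRegistry` below gives a rough idea of what such a name is used for. It keys implementations first by extension-point id and then by URL scheme. It is not Nutch's plugin repository API, and the id string in the usage note is a placeholder.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plugin registry: extension point id -> (URL scheme -> factory).
// Illustrates the role of an extension-point name; not Nutch's actual API.
final class ExtensionRegistry<T> {
    private final Map<String, Map<String, Supplier<T>>> points = new HashMap<>();

    void register(String pointId, String scheme, Supplier<T> factory) {
        points.computeIfAbsent(pointId, k -> new HashMap<>()).put(scheme, factory);
    }

    // Pick an implementation for the url's scheme under the given extension point.
    T lookup(String pointId, String url) {
        String scheme = URI.create(url).getScheme();
        Map<String, Supplier<T>> byScheme = points.getOrDefault(pointId, Map.of());
        Supplier<T> factory = byScheme.get(scheme);
        if (factory == null) {
            throw new IllegalArgumentException("no extension for scheme: " + scheme);
        }
        return factory.get();
    }
}
```

For example, after `reg.register("example.Protocol", "http", () -> "http-plugin")`, the call `reg.lookup("example.Protocol", "http://example.com/")` returns `"http-plugin"`. A URL with an unregistered scheme raises `IllegalArgumentException`.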
| Method Summary | |
|---|---|
| `ProtocolOutput` | `getProtocolOutput(Text url, CrawlDatum datum)`: Returns the Content for a fetchlist entry. |
| `RobotRules` | `getRobotRules(Text url, CrawlDatum datum)`: Retrieve robot rules applicable for this url. |
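A caller would usually consult `getRobotRules` before calling `getProtocolOutput`. The sketch below uses simplified, hypothetical types: `SimpleRobotRules` with an `isAllowed(String)` check, and a protocol that returns content as a `String`. `CrawlDatum` is left out for brevity. This shows one possible calling pattern and does not claim to match the Nutch fetcher's actual logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for RobotRules.
interface SimpleRobotRules {
    boolean isAllowed(String url);
}

// Hypothetical, simplified stand-in for Protocol (datum omitted).
interface RobotsAwareProtocol {
    SimpleRobotRules getRobotRules(String url);
    String getProtocolOutput(String url);  // content as a string, for brevity
}

final class PoliteFetcher {
    // Fetch each url only if the rules returned for that url allow it;
    // disallowed urls are not fetched and are appended to `skipped`.
    static List<String> fetchAll(RobotsAwareProtocol protocol, List<String> urls, List<String> skipped) {
        List<String> fetched = new ArrayList<>();
        for (String url : urls) {
            SimpleRobotRules rules = protocol.getRobotRules(url);
            if (rules.isAllowed(url)) {
                fetched.add(protocol.getProtocolOutput(url));
            } else {
                skipped.add(url);
            }
        }
        return fetched;
    }
}
```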
| Methods inherited from interface `org.apache.hadoop.conf.Configurable` |
|---|
| `getConf`, `setConf` |
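`Protocol` inherits `getConf` and `setConf` from `Configurable`, so an implementation receives its configuration after it is constructed. One common pattern is to store the configuration in `setConf` and derive settings such as the `CHECK_BLOCKING` and `CHECK_ROBOTS` switches once. The sketch below makes several assumptions. `java.util.Properties` stands in for Hadoop's `Configuration`. `SimpleConfigurable` and `ConfiguredProtocol` are hypothetical. The key strings and the `true` defaults are placeholders, not the real constant values.

```java
import java.util.Properties;

// Minimal mirror of the two methods Protocol inherits from Configurable;
// java.util.Properties stands in for org.apache.hadoop.conf.Configuration.
interface SimpleConfigurable {
    void setConf(Properties conf);
    Properties getConf();
}

final class ConfiguredProtocol implements SimpleConfigurable {
    // Placeholder key strings; the real keys are the values of
    // Protocol.CHECK_BLOCKING and Protocol.CHECK_ROBOTS.
    static final String CHECK_BLOCKING = "example.check.blocking";
    static final String CHECK_ROBOTS = "example.check.robots";

    private Properties conf;
    private boolean checkBlocking;
    private boolean checkRobots;

    @Override
    public void setConf(Properties conf) {
        this.conf = conf;
        // Derive settings once here rather than on every fetch.
        // Defaulting to true when a key is missing is an assumption.
        this.checkBlocking = Boolean.parseBoolean(conf.getProperty(CHECK_BLOCKING, "true"));
        this.checkRobots = Boolean.parseBoolean(conf.getProperty(CHECK_ROBOTS, "true"));
    }

    @Override
    public Properties getConf() {
        return conf;
    }

    boolean checkBlocking() { return checkBlocking; }
    boolean checkRobots() { return checkRobots; }
}
```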
| Field Detail |
|---|
| `static final String X_POINT_ID`: The name of the extension point. |
| `static final String CHECK_BLOCKING`: Property name. |
| `static final String CHECK_ROBOTS`: Property name. |
| Method Detail |
|---|
| `ProtocolOutput getProtocolOutput(Text url, CrawlDatum datum)`: Returns the Content for a fetchlist entry. |
| `RobotRules getRobotRules(Text url, CrawlDatum datum)`: Retrieve robot rules applicable for this url. Parameters: `url` - url to check; `datum` - page datum. |
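Robot rules apply per scheme, host and port. An implementation of `getRobotRules` can therefore reuse rules across URLs that share those three parts, instead of fetching `robots.txt` again for each URL. The hypothetical `RobotRulesCache` below sketches this idea with a loader supplied by the caller. The rules type is left generic. The cache is not thread-safe, so a concurrent fetcher would need something like `ConcurrentHashMap`.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper for a getRobotRules implementation: rules loaded for one
// url are reused for every url on the same scheme://host[:port].
final class RobotRulesCache<R> {
    private final Map<String, R> cache = new HashMap<>();
    private final Function<String, R> loader; // fetches and parses a robots.txt url
    private int loads = 0;

    RobotRulesCache(Function<String, R> loader) {
        this.loader = loader;
    }

    R rulesFor(String url) {
        URI uri = URI.create(url);
        // Host names are case-insensitive, so normalize before using as a key.
        String key = uri.getScheme() + "://" + uri.getHost().toLowerCase(Locale.ROOT)
                + (uri.getPort() == -1 ? "" : ":" + uri.getPort());
        return cache.computeIfAbsent(key, k -> {
            loads++;
            return loader.apply(k + "/robots.txt");
        });
    }

    int loadCount() { return loads; }
}
```

For example, `new RobotRulesCache<>(base -> "rules@" + base).rulesFor("http://Example.com/a")` returns `"rules@http://example.com/robots.txt"`. Later lookups for other paths on that host do not call the loader again.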