org.apache.nutch.crawl
Class CrawlDatum

java.lang.Object
  extended by org.apache.nutch.crawl.CrawlDatum
All Implemented Interfaces:
Cloneable, Comparable, Writable, WritableComparable

public class CrawlDatum
extends Object
implements WritableComparable, Cloneable


Nested Class Summary
static class CrawlDatum.Comparator
          A Comparator optimized for CrawlDatum.
 
Field Summary
static String FETCH_DIR_NAME
           
static String GENERATE_DIR_NAME
           
static String PARSE_DIR_NAME
           
static HashMap<Byte,String> statNames
           
static byte STATUS_DB_FETCHED
          Page was successfully fetched.
static byte STATUS_DB_GONE
          Page no longer exists.
static byte STATUS_DB_MAX
          Maximum value of DB-related status.
static byte STATUS_DB_REDIR_PERM
          Page permanently redirects to another page.
static byte STATUS_DB_REDIR_TEMP
          Page temporarily redirects to another page.
static byte STATUS_DB_UNFETCHED
          Page was not fetched yet.
static byte STATUS_FETCH_CONTENT_LIMIT_EXCEEDED
          Fetching was successful, but the content was truncated.
static byte STATUS_FETCH_GONE
          Fetching unsuccessful - page is gone.
static byte STATUS_FETCH_MAX
          Maximum value of fetch-related status.
static byte STATUS_FETCH_REDIR_PERM
          Fetching permanently redirected to another page.
static byte STATUS_FETCH_REDIR_TEMP
          Fetching temporarily redirected to another page.
static byte STATUS_FETCH_RETRY
          Fetching unsuccessful, needs to be retried (transient errors).
static byte STATUS_FETCH_SUCCESS
          Fetching was successful.
static byte STATUS_INJECTED
          Page was newly injected.
static byte STATUS_LINKED
          Page discovered through a link.
static byte STATUS_SIGNATURE
          Page signature.
 
Constructor Summary
CrawlDatum()
           
CrawlDatum(int status, float fetchInterval)
           
CrawlDatum(int status, float fetchInterval, float score)
           
 
Method Summary
 Object clone()
           
 int compareTo(Object o)
          Sort by decreasing score.
 boolean equals(Object o)
           
 float getFetchInterval()
           
 long getFetchTime()
           
 MapWritable getMetaData()
          Returns the MapWritable that was set explicitly or read in readFields(DataInput); returns an empty map if the CrawlDatum was freshly created (the map is lazily instantiated).
 long getModifiedTime()
           
 long getResponseCode()
           
 byte getRetriesSinceFetch()
           
 long getRobotsDelay()
           
 float getScore()
           
 byte[] getSignature()
           
 byte getStatus()
           
static String getStatusName(byte value)
           
static boolean hasDbStatus(CrawlDatum datum)
           
static boolean hasFetchStatus(CrawlDatum datum)
           
 int hashCode()
           
static CrawlDatum read(DataInput in)
           
 void readFields(DataInput in)
          Reads the fields of this object from in.
 void set(CrawlDatum that)
          Copy the contents of another instance into this instance.
 void setFetchInterval(float fetchInterval)
           
 void setFetchTime(long fetchTime)
           
 void setMetaData(MapWritable mapWritable)
           
 void setModifiedTime(long modifiedTime)
           
 void setNextFetchTime()
           
 void setResponseCode(int responseCode)
           
 void setRetriesSinceFetch(int retries)
           
 void setRobotsDelay(long robotsDelay)
           
 void setScore(float score)
           
 void setSignature(byte[] signature)
           
 void setStatus(int status)
           
 String toString()
           
 void write(DataOutput out)
          Writes the fields of this object to out.
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

GENERATE_DIR_NAME

public static final String GENERATE_DIR_NAME
See Also:
Constant Field Values

FETCH_DIR_NAME

public static final String FETCH_DIR_NAME
See Also:
Constant Field Values

PARSE_DIR_NAME

public static final String PARSE_DIR_NAME
See Also:
Constant Field Values

STATUS_DB_UNFETCHED

public static final byte STATUS_DB_UNFETCHED
Page was not fetched yet.

See Also:
Constant Field Values

STATUS_DB_FETCHED

public static final byte STATUS_DB_FETCHED
Page was successfully fetched.

See Also:
Constant Field Values

STATUS_DB_GONE

public static final byte STATUS_DB_GONE
Page no longer exists.

See Also:
Constant Field Values

STATUS_DB_REDIR_TEMP

public static final byte STATUS_DB_REDIR_TEMP
Page temporarily redirects to another page.

See Also:
Constant Field Values

STATUS_DB_REDIR_PERM

public static final byte STATUS_DB_REDIR_PERM
Page permanently redirects to another page.

See Also:
Constant Field Values

STATUS_DB_MAX

public static final byte STATUS_DB_MAX
Maximum value of DB-related status.

See Also:
Constant Field Values

STATUS_FETCH_SUCCESS

public static final byte STATUS_FETCH_SUCCESS
Fetching was successful.

See Also:
Constant Field Values

STATUS_FETCH_RETRY

public static final byte STATUS_FETCH_RETRY
Fetching unsuccessful, needs to be retried (transient errors).

See Also:
Constant Field Values

STATUS_FETCH_REDIR_TEMP

public static final byte STATUS_FETCH_REDIR_TEMP
Fetching temporarily redirected to another page.

See Also:
Constant Field Values

STATUS_FETCH_REDIR_PERM

public static final byte STATUS_FETCH_REDIR_PERM
Fetching permanently redirected to another page.

See Also:
Constant Field Values

STATUS_FETCH_GONE

public static final byte STATUS_FETCH_GONE
Fetching unsuccessful - page is gone.

See Also:
Constant Field Values

STATUS_FETCH_CONTENT_LIMIT_EXCEEDED

public static final byte STATUS_FETCH_CONTENT_LIMIT_EXCEEDED
Fetching was successful, but the content was truncated.

See Also:
Constant Field Values

STATUS_FETCH_MAX

public static final byte STATUS_FETCH_MAX
Maximum value of fetch-related status.

See Also:
Constant Field Values

STATUS_SIGNATURE

public static final byte STATUS_SIGNATURE
Page signature.

See Also:
Constant Field Values

STATUS_INJECTED

public static final byte STATUS_INJECTED
Page was newly injected.

See Also:
Constant Field Values

STATUS_LINKED

public static final byte STATUS_LINKED
Page discovered through a link.

See Also:
Constant Field Values

statNames

public static final HashMap<Byte,String> statNames

Constructor Detail

CrawlDatum

public CrawlDatum()

CrawlDatum

public CrawlDatum(int status,
                  float fetchInterval)

CrawlDatum

public CrawlDatum(int status,
                  float fetchInterval,
                  float score)

Method Detail

hasDbStatus

public static boolean hasDbStatus(CrawlDatum datum)

hasFetchStatus

public static boolean hasFetchStatus(CrawlDatum datum)
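
The field summary describes STATUS_DB_MAX and STATUS_FETCH_MAX as the maximum values of DB-related and fetch-related status, which suggests that hasDbStatus and hasFetchStatus could be simple range checks against those bounds. The sketch below illustrates that reading only. The class StatusRangeSketch, the numeric values, and both comparison rules are assumptions made for illustration; the real constants are listed under Constant Field Values and the actual implementation may differ.

```java
public class StatusRangeSketch {
    // Hypothetical values chosen only for illustration; the real constants
    // are listed under "Constant Field Values" and are not reproduced here.
    static final byte DB_UNFETCHED = 0x01;
    static final byte DB_MAX = 0x1f;
    static final byte FETCH_SUCCESS = 0x21;
    static final byte FETCH_MAX = 0x3f;
    static final byte INJECTED = 0x42;

    // Assumed rule: a DB status is any value up to and including DB_MAX.
    static boolean hasDbStatus(byte status) {
        return status <= DB_MAX;
    }

    // Assumed rule: a fetch status lies above DB_MAX and at or below FETCH_MAX.
    static boolean hasFetchStatus(byte status) {
        return status > DB_MAX && status <= FETCH_MAX;
    }

    public static void main(String[] args) {
        System.out.println("DB_UNFETCHED is a DB status: " + hasDbStatus(DB_UNFETCHED));
        System.out.println("FETCH_SUCCESS is a fetch status: " + hasFetchStatus(FETCH_SUCCESS));
        System.out.println("INJECTED is a fetch status: " + hasFetchStatus(INJECTED));
    }
}
```

Under these assumptions, a value such as the hypothetical INJECTED falls outside both ranges, so neither helper reports it.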

getStatus

public byte getStatus()

getStatusName

public static String getStatusName(byte value)
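
Since the class exposes a static HashMap<Byte,String> named statNames, getStatusName plausibly performs a lookup in that table. The following is a minimal sketch of such a lookup, not the documented behavior: the class StatusNameSketch, the byte values, the labels, and the "unknown" fallback are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class StatusNameSketch {
    // Hypothetical table: the byte values and labels are illustrative only.
    static final Map<Byte, String> NAMES = new HashMap<>();
    static {
        NAMES.put((byte) 0x01, "db_unfetched");
        NAMES.put((byte) 0x21, "fetch_success");
    }

    // Assumed fallback: return a placeholder for values missing from the table.
    static String statusName(byte value) {
        return NAMES.getOrDefault(value, "unknown");
    }

    public static void main(String[] args) {
        System.out.println(statusName((byte) 0x01));
        System.out.println(statusName((byte) 0x7f));
    }
}
```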

setStatus

public void setStatus(int status)

getFetchTime

public long getFetchTime()

setFetchTime

public void setFetchTime(long fetchTime)

getRobotsDelay

public long getRobotsDelay()

setRobotsDelay

public void setRobotsDelay(long robotsDelay)

getResponseCode

public long getResponseCode()

setResponseCode

public void setResponseCode(int responseCode)

setNextFetchTime

public void setNextFetchTime()

getModifiedTime

public long getModifiedTime()

setModifiedTime

public void setModifiedTime(long modifiedTime)

getRetriesSinceFetch

public byte getRetriesSinceFetch()

setRetriesSinceFetch

public void setRetriesSinceFetch(int retries)

getFetchInterval

public float getFetchInterval()

setFetchInterval

public void setFetchInterval(float fetchInterval)

getScore

public float getScore()

setScore

public void setScore(float score)

getSignature

public byte[] getSignature()

setSignature

public void setSignature(byte[] signature)

setMetaData

public void setMetaData(MapWritable mapWritable)

getMetaData

public MapWritable getMetaData()
Returns the MapWritable that was set explicitly or read in readFields(DataInput); returns an empty map if the CrawlDatum was freshly created (the map is lazily instantiated).
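
The lazy instantiation mentioned here can be pictured with a small stand-in class. This is a sketch under stated assumptions: LazyMetaSketch and its nested Datum are hypothetical, and a plain java.util.HashMap replaces MapWritable so that the example has no Hadoop dependency.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyMetaSketch {
    // Hypothetical stand-in for the datum; not the real CrawlDatum.
    static final class Datum {
        // Stays null until the map is set or first requested.
        private Map<String, String> metaData;

        void setMetaData(Map<String, String> map) {
            this.metaData = map;
        }

        // Creates the empty map on first access and reuses it afterwards.
        Map<String, String> getMetaData() {
            if (metaData == null) {
                metaData = new HashMap<>();
            }
            return metaData;
        }
    }

    public static void main(String[] args) {
        Datum fresh = new Datum();
        System.out.println("fresh datum has empty metadata: " + fresh.getMetaData().isEmpty());
    }
}
```

With this pattern, callers never receive null, and no map is allocated for instances whose metadata is never touched.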


read

public static CrawlDatum read(DataInput in)
                       throws IOException
Throws:
IOException

readFields

public void readFields(DataInput in)
                throws IOException
Description copied from interface: Writable
Reads the fields of this object from in. For efficiency, implementations should attempt to re-use storage in the existing object where possible.

Specified by:
readFields in interface Writable
Throws:
IOException

write

public void write(DataOutput out)
           throws IOException
Description copied from interface: Writable
Writes the fields of this object to out.

Specified by:
write in interface Writable
Throws:
IOException
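
The write and readFields pair follows the Writable contract: write emits the fields to a DataOutput, and readFields restores them into an existing instance in the same order. The sketch below shows that round trip using only java.io. The class WritableRoundTripSketch, its nested Datum, and the three-field layout are hypothetical; the real serialized layout of CrawlDatum is not described on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableRoundTripSketch {
    // Hypothetical stand-in with an illustrative field layout.
    static final class Datum {
        byte status;
        long fetchTime;
        float score;

        // Fields are written in a fixed order.
        void write(DataOutput out) throws IOException {
            out.writeByte(status);
            out.writeLong(fetchTime);
            out.writeFloat(score);
        }

        // Fields are read back in the same order, overwriting this instance.
        void readFields(DataInput in) throws IOException {
            status = in.readByte();
            fetchTime = in.readLong();
            score = in.readFloat();
        }
    }

    // Serializes to a byte array and deserializes into a new instance.
    static Datum roundTrip(Datum original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                original.write(out);
            }
            Datum copy = new Datum();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Datum d = new Datum();
        d.status = 2;
        d.fetchTime = 1234L;
        d.score = 1.5f;
        Datum copy = roundTrip(d);
        System.out.println(copy.status + " " + copy.fetchTime + " " + copy.score);
    }
}
```

Because readFields overwrites an existing object, one instance can be reused across many records, which matches the efficiency note in the Writable description.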

set

public void set(CrawlDatum that)
Copy the contents of another instance into this instance.


compareTo

public int compareTo(Object o)
Sort by decreasing score.

Specified by:
compareTo in interface Comparable
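
"Sort by decreasing score" means that a natural-order sort places the highest-scoring entries first. A minimal sketch of such an ordering follows. The class ScoreOrderSketch and its nested Datum are hypothetical, and the real compareTo may apply further tie-breaking on other fields that this page does not describe.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ScoreOrderSketch {
    // Hypothetical stand-in that carries only a score.
    static final class Datum implements Comparable<Datum> {
        final float score;

        Datum(float score) {
            this.score = score;
        }

        @Override
        public int compareTo(Datum that) {
            // Arguments are reversed so that higher scores sort first.
            return Float.compare(that.score, this.score);
        }
    }

    // Sorts the given scores using the natural ordering of Datum.
    static List<Float> sortedScores(float... scores) {
        List<Datum> data = new ArrayList<>();
        for (float s : scores) {
            data.add(new Datum(s));
        }
        Collections.sort(data);
        List<Float> result = new ArrayList<>();
        for (Datum d : data) {
            result.add(d.score);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sortedScores(0.5f, 2.0f, 1.0f));
    }
}
```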

toString

public String toString()
Overrides:
toString in class Object

equals

public boolean equals(Object o)
Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object

clone

public Object clone()
Overrides:
clone in class Object


Copyright © 2007, 2012, Oracle and/or its affiliates. All rights reserved.