org.apache.nutch.crawl
Class TextProfileSignature

java.lang.Object
  extended by org.apache.nutch.crawl.Signature
      extended by org.apache.nutch.crawl.TextProfileSignature
All Implemented Interfaces:
Configurable

public class TextProfileSignature
extends Signature

A page signature implementation that computes an MD5 hash of a plain-text "profile" of a page. If the page has no text, it falls back to the hash computed by MD5Signature.
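
The fallback described above can be sketched as follows. This is a minimal illustration, not the Nutch source: the class SignatureFallbackSketch and its methods are hypothetical, the page text stands in for the profile, and hashing the raw content bytes stands in for what MD5Signature is assumed to do.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Nutch.
final class SignatureFallbackSketch {

    // MD5 digest of a byte array (16 bytes).
    static byte[] md5(byte[] data) {
        try {
            return MessageDigest.getInstance("MD5").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumption: when there is no usable text, hash the raw content instead.
    static byte[] signature(String text, byte[] rawContent) {
        if (text == null || text.trim().isEmpty()) {
            return md5(rawContent);
        }
        return md5(text.getBytes(StandardCharsets.UTF_8));
    }

    // Lower-case hex rendering, for display and comparison only.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```

Calling signature(null, raw) and signature("   ", raw) both take the fallback branch, so a page without text still gets a stable signature derived from its raw bytes.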

The algorithm to calculate a page "profile" takes the plain text version of a page and performs the following steps:

This list is then submitted to an MD5 hash calculation.
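
One way to build such a profile is sketched below. The individual steps (normalizing to lower-case letters and digits, dropping short tokens, quantizing token frequencies, ordering by decreasing frequency), the constants MIN_TOKEN_LEN and QUANT_RATE with their values, the alphabetical tie-break, and the class name TextProfileSketch are all assumptions made for illustration. They are not taken from this page and may differ from the actual implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Nutch source.
final class TextProfileSketch {
    static final int MIN_TOKEN_LEN = 2;     // assumed: tokens this short or shorter are dropped
    static final float QUANT_RATE = 0.01f;  // assumed: quantum as a fraction of the top frequency

    static String profile(String text) {
        // Keep letters and digits in lower case; everything else becomes a separator.
        StringBuilder norm = new StringBuilder();
        for (char c : text.toCharArray()) {
            norm.append(Character.isLetterOrDigit(c) ? Character.toLowerCase(c) : ' ');
        }

        // Count tokens that are longer than MIN_TOKEN_LEN.
        Map<String, Integer> counts = new HashMap<>();
        int maxFreq = 0;
        for (String tok : norm.toString().trim().split("\\s+")) {
            if (tok.length() <= MIN_TOKEN_LEN) continue;
            int n = counts.merge(tok, 1, Integer::sum);
            maxFreq = Math.max(maxFreq, n);
        }

        // Quantum: at least 2 whenever some token repeats, so singletons fall away.
        int quant = Math.round(maxFreq * QUANT_RATE);
        if (quant < 2) quant = (maxFreq > 1) ? 2 : 1;

        // Round counts down to a multiple of the quantum; drop those below it.
        Map<String, Integer> quantized = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int q = (e.getValue() / quant) * quant;
            if (q >= quant) quantized.put(e.getKey(), q);
        }

        // Order by decreasing frequency; alphabetical tie-break keeps the output deterministic.
        List<String> tokens = new ArrayList<>(quantized.keySet());
        tokens.sort((a, b) -> {
            int byFreq = Integer.compare(quantized.get(b), quantized.get(a));
            return byFreq != 0 ? byFreq : a.compareTo(b);
        });

        // One "token count" pair per line.
        StringBuilder out = new StringBuilder();
        for (String t : tokens) {
            if (out.length() > 0) out.append('\n');
            out.append(t).append(' ').append(quantized.get(t));
        }
        return out.toString();
    }
}
```

The resulting string would then be passed to an MD5 digest, for example via java.security.MessageDigest. Because rare tokens and small frequency differences are discarded, two pages that differ only in minor details can end up with the same profile, which is the point of hashing a profile instead of the full text.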

Author:
Andrzej Bialecki <ab@getopt.org>

Field Summary
 
Fields inherited from class org.apache.nutch.crawl.Signature
conf
 
Constructor Summary
TextProfileSignature()
           
 
Method Summary
 byte[] calculate(Content content, Parse parse)
           
 
Methods inherited from class org.apache.nutch.crawl.Signature
getConf, setConf
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

TextProfileSignature

public TextProfileSignature()

Method Detail

calculate

public byte[] calculate(Content content,
                        Parse parse)
Specified by:
calculate in class Signature


Copyright © 2007, 2012, Oracle and/or its affiliates. All rights reserved.