The filter-by-md5 function only allows the first resource with a given MD5 checksum value. If the current resource’s MD5 has been seen in an earlier resource by this robot, the current resource is denied. As a result, duplication of identical resources or single resources with multiple URLs is prevented.
You can only call this function at the Data stage or later. It can be called no more than once per filter. The filter must invoke the generate-md5 function to generate an MD5 checksum before invoking filter-by-md5 function.
The following example shows the typical method of handling MD5 checksums by first generating the checksum and then filtering based on it:
Data fn=generate-md5 Data fn=filter-by-md5