public interface LinkExtractor
This interface is used to extract links from the downloaded markup during a crawl session. Users can supply their own implementation of the LinkExtractor interface to control how links are extracted from a downloaded resource. In the Site Capture context, each downloaded resource is represented as a WebResource.
An out-of-the-box implementation, PatternLinkExtractor, uses a regular expression to extract the links from the downloaded markup.
Refer to the developer guide for details and usage of PatternLinkExtractor.
Modifier and Type | Method and Description |
---|---|
List<ResourceURL> | extract(WebResource resource) Parses the WebResource and returns the list of links found by the implemented extraction algorithm. |
List<ResourceURL> extract(WebResource resource)
resource - A WebResource object that contains information about the resource downloaded during the crawl session.
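As a rough illustration of what a custom implementation might look like, the sketch below implements an interface with the same shape as LinkExtractor. It is not the Site Capture API: the WebResource and ResourceURL classes here are hypothetical stand-ins defined only so the example compiles on its own, and their fields (url, markup) are assumptions. The HrefLinkExtractor name and its href regular expression are likewise illustrative and are not how PatternLinkExtractor is known to work. Consult the developer guide for the actual types and their methods.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkExtractorSketch {

    // Hypothetical stand-in for Site Capture's WebResource:
    // it only carries the resource URL and the downloaded markup.
    static final class WebResource {
        final String url;
        final String markup;
        WebResource(String url, String markup) {
            this.url = url;
            this.markup = markup;
        }
    }

    // Hypothetical stand-in for Site Capture's ResourceURL.
    static final class ResourceURL {
        final String url;
        ResourceURL(String url) { this.url = url; }
        @Override public String toString() { return url; }
    }

    // Same method shape as the documented interface.
    interface LinkExtractor {
        List<ResourceURL> extract(WebResource resource);
    }

    // Illustrative custom extractor: collects href attribute values
    // and resolves them against the URL of the downloaded resource.
    static final class HrefLinkExtractor implements LinkExtractor {
        private static final Pattern HREF = Pattern.compile(
                "href\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

        @Override
        public List<ResourceURL> extract(WebResource resource) {
            List<ResourceURL> links = new ArrayList<>();
            URI base = URI.create(resource.url);
            Matcher m = HREF.matcher(resource.markup);
            while (m.find()) {
                try {
                    // Turn relative links into absolute ones.
                    URI resolved = base.resolve(m.group(1).trim());
                    links.add(new ResourceURL(resolved.toString()));
                } catch (IllegalArgumentException e) {
                    // Skip href values that are not valid URIs.
                }
            }
            return links;
        }
    }

    public static void main(String[] args) {
        WebResource page = new WebResource(
                "http://example.com/docs/index.html",
                "<a href=\"page2.html\">Next</a> <a href='/about'>About</a>");
        for (ResourceURL link : new HrefLinkExtractor().extract(page)) {
            System.out.println(link);
        }
    }
}
```

A regular expression keeps the sketch short, but it can miss or misread links in irregular markup; an implementation that needs more accuracy could use an HTML parser inside extract instead.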