Sun Java System Portal Server 7 Developer's Guide

The URLScraperProvider

The URLScraperProvider can retrieve and display content from a given URL. The following is a step-by-step description of how the URLScraperProvider retrieves and displays content from the specified URL.

Gets the timeout property for the provider.
Gets the url property for the provider.

This is the URL source for the content to be fetched. If a conditional property is defined in the display profile for the user’s clientType and locale, the url property is fetched from that conditional property. Otherwise, the default value is returned. When scraping content using the URLScraperProvider, avoid the following technologies:
- Frames and iFrames
- Dynamic Hypertext Markup Language (DHTML)
- Pages with extensive JavaScript™ technology that assumes the pages have the web browser to themselves
Gets the urlScraperRulesetID to be used by Rewriter as a string.

This process ensures that any HTML links within the content are correct when included in the desktop. Content included in the portal by using the URLScraperProvider often contains URLs to other documents. The Rewriter rewrites the URLs in the scraped content, changing relative links to absolute links, so that each link points back to the web site from which it was scraped. The Ruleset determines which tags in the HTML document contain URLs and must be rewritten. If the URLScraperProvider is scraping content that contains custom tags, extend the Ruleset to take these into account.
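The relative-to-absolute rewriting described above can be illustrated with a minimal sketch. This is not the Rewriter's implementation or API: the class name `LinkAbsolutizer`, the method `absolutize`, and the fixed choice of `href` and `src` attributes are hypothetical, whereas the real Rewriter takes the set of tags and attributes from the Ruleset. The sketch only handles double-quoted attribute values.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; the real Rewriter is driven by a Ruleset.
final class LinkAbsolutizer {
    // Assumption: only href="..." and src="..." with double quotes are rewritten.
    private static final Pattern LINK_ATTR =
            Pattern.compile("\\b(href|src)=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Resolves every matched link against the URL the content was scraped from.
    static String absolutize(String html, String baseUrl) {
        URI base = URI.create(baseUrl);
        Matcher m = LINK_ATTR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String link = m.group(2);
            String resolved;
            try {
                // Absolute links are returned unchanged by resolve().
                resolved = base.resolve(link).toString();
            } catch (IllegalArgumentException e) {
                // Leave links that are not valid URIs untouched.
                resolved = link;
            }
            String replacement = m.group(1) + "=\"" + resolved + "\"";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With a base of `http://example.com/docs/index.html`, a link such as `page.html` would resolve to `http://example.com/docs/page.html`, while a link that is already absolute is left as it is.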
Determines if the cookieName can be forwarded.

To determine this, the provider checks the value of the cookiesToForwardAll attribute in the display profile. If this value is false, it checks the cookiesToForwardList attribute in the display profile.
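The check described above can be sketched as follows. This is an illustration under stated assumptions, not the provider's code: the class `CookieForwardPolicy` and its method `canForward` are hypothetical names, the two constructor arguments stand in for the cookiesToForwardAll and cookiesToForwardList values read from the display profile, and it is assumed that a true cookiesToForwardAll value means every cookie may be forwarded.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the per-channel cookie forwarding decision.
final class CookieForwardPolicy {
    private final boolean forwardAll;       // mirrors cookiesToForwardAll
    private final Set<String> forwardList;  // mirrors cookiesToForwardList

    CookieForwardPolicy(boolean forwardAll, List<String> forwardList) {
        this.forwardAll = forwardAll;
        this.forwardList = new HashSet<>(forwardList);
    }

    // Returns true if the named cookie may be forwarded.
    boolean canForward(String cookieName) {
        if (forwardAll) {
            // Assumption: "forward all" short-circuits the list lookup.
            return true;
        }
        return forwardList.contains(cookieName);
    }
}
```

Keeping the list as a set makes the per-cookie lookup independent of how many cookie names the administrator has configured for the channel.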
Determines if cookies from the request should be forwarded.

The URLScraperProvider includes the functionality to forward cookies associated with the document intended for the client’s web browser. Administrators can control cookie forwarding on a per-channel basis, forwarding some or all cookies to the user.
Gets the content in one of the following ways:
- Retrieve content from the specified URL.
  
  That is, the provider sends an HTTP(S) request that contains information related to this request for content and gets an HTTP(S) response that allows the provider to influence the overall response for the Desktop page (besides generating the content).
- Take the fully qualified path name of a file and return a StringBuffer containing the data from the specified file, or null if the file does not exist or cannot be read.
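The two retrieval paths above can be sketched with standard JDK classes. This is a rough illustration, not the provider's actual code: the class `ContentFetchSketch` and the methods `fetchUrl` and `readFile` are hypothetical names, the timeout is assumed to be in milliseconds, the content is assumed to be UTF-8, and cookie forwarding and rewriting are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Paths;

// Hypothetical sketch of the two ways of obtaining content.
final class ContentFetchSketch {
    // Fetches content over HTTP(S); assumes an http or https URL.
    static StringBuffer fetchUrl(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new StringBuffer(new String(buf.toByteArray(), StandardCharsets.UTF_8));
        } finally {
            conn.disconnect();
        }
    }

    // Reads a file by its fully qualified path; returns null if the file
    // does not exist or cannot be read.
    static StringBuffer readFile(String path) {
        try {
            byte[] bytes = Files.readAllBytes(Paths.get(path));
            return new StringBuffer(new String(bytes, StandardCharsets.UTF_8));
        } catch (IOException | InvalidPathException | SecurityException e) {
            return null;
        }
    }
}
```

Returning null from `readFile` follows the behavior described in the list above. A caller would check for null before placing the content in the channel.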