Rewriter scans the content of web pages and identifies the URLs it finds on those pages. It uses a collection of rules, defined in a ruleset, to determine which elements of a web page to rewrite. Once Rewriter identifies a URL, it can rewrite the URL by expanding a relative URL to an absolute URL or by prefixing the gateway URL to the existing URL.
The URLScraperProvider is part of the core Portal Server 7.1 product. In a non-gateway scenario, the URLScraperProvider can be used to expand relative URLs to absolute URLs. For example, if a user is trying to access the site:
The Rewriter translates this to:
where http://www.yahoo.com/mail/ is the base URL of the page scraped.
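The expansion of relative URLs against a base URL can be illustrated with a minimal Python sketch. This is not the Rewriter's own implementation; it only shows standard relative-reference resolution using `urllib.parse.urljoin`. The base URL comes from the text above, while the function name `expand_url` and the sample links are hypothetical, chosen only for illustration.

```python
from urllib.parse import urljoin

# Base URL of the scraped page, taken from the example in the text.
BASE_URL = "http://www.yahoo.com/mail/"


def expand_url(base_url: str, link: str) -> str:
    """Resolve a link found in a scraped page against the page's base URL.

    Absolute links are returned unchanged; relative links are expanded.
    """
    return urljoin(base_url, link)


# Hypothetical links, used only to show the different cases.
for link in ("inbox.html", "../news/", "/index.html", "http://example.com/a"):
    print(link, "->", expand_url(BASE_URL, link))
```

A link relative to the current directory is appended to the base path, a link starting with `/` is resolved against the host, and a link that is already absolute is left as it is.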
The URLScraperProvider simply tries to display a designated URL in a channel. There is no way to specify which parts of the document at that URL to display. The URLScraperProvider acts much like an HTTP client, in that it makes a request for the content of the specified URL. Just as with a browser, the target URL to scrape must be visible on the network, or you must have a proxy configured.
The resulting URL scraper channel, however, is neither a mini-browser nor a frame. Therefore, a link in the content affects the whole page, not just the channel. You should not browse inside the URL scraper channel. If you select a link within the channel, the browser can interpret the link and replace the currently displayed page (your Portal Server Desktop) with the contents of the link location.
The URLScraperProvider is not well suited to the following cases:
When an Edit function of some sort is required so that the user can customize the channel.
When the data comes from a non-HTML, non-web server source, such as a database or mail server.
When the data needs to be reformatted in some way for the channel.
When a more efficient solution is required, because the URLScraperProvider performs a request and lookup for every Desktop display and for every user.
When the origin server sends cookies, they are forwarded back every time the web content is re-scraped. The origin server should therefore receive the cookies it sent when the content was first scraped whenever the portal Desktop is updated or reloaded. Those cookies are not expected to be sent back, however, when the user clicks a link in the URL scraper channel.
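The cookie behavior described above can be sketched as follows. This is an illustrative model only, not the URLScraperProvider's code: the class `ScraperChannel`, the `fetch` callback, and the fake origin server are all hypothetical names introduced here, and the `fetch` callback stands in for a real HTTP request.

```python
from typing import Callable, Dict, List, Tuple

# A fetch function takes (url, cookies_to_send) and returns
# (body, cookies_set_by_origin). It stands in for a real HTTP request.
Fetch = Callable[[str, Dict[str, str]], Tuple[str, Dict[str, str]]]


class ScraperChannel:
    """Hypothetical channel that replays origin cookies on each re-scrape."""

    def __init__(self, url: str, fetch: Fetch) -> None:
        self.url = url
        self._fetch = fetch
        self._cookies: Dict[str, str] = {}

    def render(self) -> str:
        # Send back whatever cookies the origin set on earlier scrapes.
        body, new_cookies = self._fetch(self.url, dict(self._cookies))
        # Remember cookies from this response for the next re-scrape.
        self._cookies.update(new_cookies)
        return body


# Fake origin server that records the cookies it receives on each request.
received: List[Dict[str, str]] = []


def fake_origin(url: str, cookies: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    received.append(cookies)
    return "<p>content</p>", {"session": "abc123"}


channel = ScraperChannel("http://origin.example/page", fake_origin)
channel.render()  # first scrape: no cookies are sent
channel.render()  # Desktop reload: the cookie from the first scrape is sent back
print(received)
```

In this model the stored cookies are only involved when the channel itself re-scrapes the content. A link the user clicks is followed by the browser directly, so the cookies held by the channel do not travel with that request.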
In an implementation with a gateway, such as Sun Java System Portal Server 7.1 SRA, the gateway acts as a proxy for the client: it accesses intranet sites and returns their responses to the client. The Rewriter translates URLs in downloaded pages so that they point back to the gateway rather than to the original site, by prefixing the gateway URL to the existing URL.
For example, if a user tries to access an HTML page on mymachine using the following URL:
The Rewriter prefixes this URL with a reference to the gateway as follows:
When a user selects a link associated with this anchor, the browser contacts the gateway. The gateway fetches the content of mypage.html from mymachine.intranet.com.
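The prefixing step can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the gateway's actual rewriting logic: the gateway address `https://gateway.example.com` is hypothetical, the sample intranet URL is assembled from the host and page named in the text, and plain concatenation with a `/` separator is assumed as the rewritten form.

```python
# Hypothetical gateway address, used only for illustration.
GATEWAY_URL = "https://gateway.example.com"


def prefix_gateway(gateway_url: str, original_url: str) -> str:
    """Prefix the gateway URL to an existing URL (assumed form: gateway/original)."""
    return gateway_url.rstrip("/") + "/" + original_url


def strip_gateway(gateway_url: str, rewritten_url: str) -> str:
    """Recover the original URL, as the gateway must do before fetching it."""
    prefix = gateway_url.rstrip("/") + "/"
    if not rewritten_url.startswith(prefix):
        raise ValueError("URL is not prefixed with the gateway URL")
    return rewritten_url[len(prefix):]


# Sample URL assembled from the host and page mentioned in the text.
original = "http://mymachine.intranet.com/mypage.html"
rewritten = prefix_gateway(GATEWAY_URL, original)
print(rewritten)
print(strip_gateway(GATEWAY_URL, rewritten))
```

Because the original URL is kept intact after the gateway prefix, the gateway can recover the intranet target from the request it receives and fetch the page on the client's behalf.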
See the Sun Java System Portal Server 7.1 SRA Administration Guide for more information on using the Rewriter to prefix a gateway URL to an existing URL.