com.sun.portal.providers.urlscraper
Class URLScraperProvider

java.lang.Object
  |
  +--com.sun.portal.providers.ProviderAdapter
        |
        +--com.sun.portal.providers.ProfileProviderAdapter
              |
              +--com.sun.portal.providers.urlscraper.URLScraperProvider
All Implemented Interfaces:
Provider, ProviderEditTypes, ProviderWidths
Direct Known Subclasses:
XMLProvider

public class URLScraperProvider
extends ProfileProviderAdapter

A URLScraperProvider is a content provider that can retrieve and display HTML content from a given URL.

URLScraperProvider acts as an HTTP client and makes a request for the content of the specified URL and then displays it in the channel.

Each URLScraper channel has its own timeout attribute. The channel will wait up to its individual timeout to receive content.

Forwarding of cookies
Each URLScraper channel has a cookiesToForwardList attribute that can be set in the display profile. If a cookie in the request coming from the browser is allowed by this attribute, it is forwarded to the web server specified by the URL. The allCookies attribute can be set to true to forward all cookies. A Set-Cookie request from that web server is sent back to the browser, modified so that the cookie is only sent back to the portal server.
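
The forwarding decision described above can be sketched as a simple allow-list check. This is only an illustration, not the provider's actual implementation: the class and method names are hypothetical, and only the attribute names cookiesToForwardList and allCookies come from the description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of com.sun.portal.providers.urlscraper.
final class CookieForwardingSketch {

    /**
     * Returns the names of the browser cookies that would be forwarded.
     * The parameters mirror the cookiesToForwardList and allCookies
     * display profile attributes; how the provider reads them is not shown.
     */
    static List<String> cookiesToForward(List<String> requestCookieNames,
                                         List<String> cookiesToForwardList,
                                         boolean allCookies) {
        List<String> forwarded = new ArrayList<>();
        for (String name : requestCookieNames) {
            // A cookie passes if everything is allowed or it is listed explicitly.
            if (allCookies || cookiesToForwardList.contains(name)) {
                forwarded.add(name);
            }
        }
        return forwarded;
    }

    public static void main(String[] args) {
        List<String> fromBrowser = Arrays.asList("JSESSIONID", "theme");
        List<String> allowed = Arrays.asList("JSESSIONID");
        System.out.println(cookiesToForward(fromBrowser, allowed, false));
    }
}
```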

URL Rewriting
The HTML content gathered by the channel will be rewritten if the rewriter is available. The ruleset used by the rewriter can be specified in the display profile attribute rulesetID. Relative URLs are converted to absolute URLs. For example, if your portal server is http://portal.iplanet.com/ and the web server specified in the URL is http://foo.sesta.com/ and the HTML file contains

<IMG SRC="/images/blah.gif">

then the HTML sent back to the browser via the portal server will be rewritten as:

<IMG SRC="http://foo.sesta.com/images/blah.gif">

Otherwise, the browser would attempt to read the image from http://portal.iplanet.com/images/blah.gif and would fail to resolve it.
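
The relative-to-absolute conversion in this example can be reproduced with the standard java.net.URI class. This is a minimal sketch of that one step only; the real rewriter is driven by the ruleset named in rulesetID, and the class and method names here are hypothetical.

```java
import java.net.URI;

// Hypothetical helper illustrating relative URL resolution only.
final class RelativeUrlSketch {

    /** Resolves a reference found in scraped HTML against the scraped page's URL. */
    static String toAbsolute(String baseUrl, String reference) {
        return URI.create(baseUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // Prints http://foo.sesta.com/images/blah.gif
        System.out.println(toAbsolute("http://foo.sesta.com/", "/images/blah.gif"));
    }
}
```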

SSL protected pages
In general, URLScraperProvider works with SSL pages. However, the specified URL must not require any interaction, because there is no way to pass that interaction on to the end user.

Timeouts
There are 2 timeout values to consider:

Each URLScraper channel has its own timeout attribute. The channel will wait up to its individual timeout to receive content.
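
As a rough illustration of a per-channel timeout on an HTTP client, the sketch below applies one timeout value to a standard HttpURLConnection. It does not show how the provider itself enforces the timeout attribute; the class name, method name, and the use of milliseconds are assumptions.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper; the unit (milliseconds) is an assumption.
final class TimeoutSketch {

    /** Prepares a connection that gives up after timeoutMillis. No request is sent here. */
    static HttpURLConnection openWithTimeout(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(timeoutMillis); // limit for establishing the connection
        conn.setReadTimeout(timeoutMillis);    // limit for waiting on response data
        return conn;
    }

    public static void main(String[] args) throws IOException {
        HttpURLConnection conn = openWithTimeout("http://foo.sesta.com/", 5000);
        System.out.println(conn.getReadTimeout());
    }
}
```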

Encoding
The encoding is determined in the following order:
1. HTTP header, if available (applies only to http(s) URLs)
2. inputEncoding property, if non-blank
3. Tag in the content, e.g. the meta tag in HTML or the XML header in XML, if available (applies only to HTML and XML, as determined by the MIME type)
4. System default
The MIME type is determined from the JVM table. If not set, it is determined from the file extension.
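
The lookup order above amounts to a first-non-blank fallback chain. The sketch below assumes the three candidate values have already been extracted by the caller; the class and method names are hypothetical and this is not the provider's own code.

```java
import java.nio.charset.Charset;

// Hypothetical helper showing the fallback order only.
final class EncodingOrderSketch {

    /** Returns the first usable encoding name, following the documented order. */
    static String resolveEncoding(String httpHeaderCharset,
                                  String inputEncoding,
                                  String contentTagCharset) {
        if (isSet(httpHeaderCharset)) return httpHeaderCharset;   // HTTP header
        if (isSet(inputEncoding)) return inputEncoding;           // inputEncoding property
        if (isSet(contentTagCharset)) return contentTagCharset;   // meta tag or XML header
        return Charset.defaultCharset().name();                   // system default
    }

    private static boolean isSet(String value) {
        return value != null && !value.trim().isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(resolveEncoding(null, "", "Shift_JIS"));
    }
}
```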

Proxy Configuration
A URLScraper channel uses a proxy to scrape the specified URL if the proxy is set in the jvm12.conf file for the web server. For example, the proxy can be set as:
http.proxyHost=
http.proxyPort=
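
These are the standard Java networking system properties, which the JVM's default HTTP handler honors on its own. The sketch below only makes that lookup explicit by turning the two property values into a java.net.Proxy; the class and method names are hypothetical, and the fallback to port 80 follows the documented default for http.proxyPort.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

// Hypothetical helper; it does not read jvm12.conf itself.
final class ProxySketch {

    /** Builds a Proxy from the two property values, or NO_PROXY if no host is set. */
    static Proxy proxyFromProperties(String host, String port) {
        if (host == null || host.trim().isEmpty()) {
            return Proxy.NO_PROXY;
        }
        int portNumber = (port == null || port.trim().isEmpty())
                ? 80
                : Integer.parseInt(port.trim());
        // createUnresolved avoids a DNS lookup at construction time.
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host.trim(), portNumber));
    }

    public static void main(String[] args) {
        Proxy proxy = proxyFromProperties(System.getProperty("http.proxyHost"),
                                          System.getProperty("http.proxyPort"));
        System.out.println(proxy);
    }
}
```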

The refreshTime attribute controls caching: if the page is reloaded within that time, the URL is not fetched again.
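
The effect of refreshTime can be pictured as a time-based cache in front of the fetch. The sketch below is an assumption-laden illustration: the class is hypothetical, the unit is taken to be milliseconds, and the clock value is passed in explicitly so the behavior is easy to follow.

```java
import java.util.function.Supplier;

// Hypothetical cache; not the desktop's actual caching mechanism.
final class RefreshCacheSketch {
    private final long refreshMillis;
    private String cached;
    private long fetchedAt;
    private boolean hasContent;

    RefreshCacheSketch(long refreshMillis) {
        this.refreshMillis = refreshMillis;
    }

    /** Returns cached content while it is fresh; otherwise fetches again. */
    String get(Supplier<String> fetcher, long nowMillis) {
        if (!hasContent || nowMillis - fetchedAt >= refreshMillis) {
            cached = fetcher.get();
            fetchedAt = nowMillis;
            hasContent = true;
        }
        return cached;
    }

    public static void main(String[] args) {
        RefreshCacheSketch cache = new RefreshCacheSketch(60_000);
        Supplier<String> fetcher = () -> "<html>scraped content</html>";
        System.out.println(cache.get(fetcher, System.currentTimeMillis()));
    }
}
```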

NOTE: getEdit() and processEdit() methods are not implemented in URLScraper.


Fields inherited from interface com.sun.portal.providers.ProviderWidths
WIDTH_FULL_BOTTOM, WIDTH_FULL_TOP, WIDTH_THICK, WIDTH_THIN
 
Fields inherited from interface com.sun.portal.providers.ProviderEditTypes
EDIT_COMPLETE, EDIT_SUBSET
 
Constructor Summary
URLScraperProvider()
          Default constructor.
 
Method Summary
 StringBuffer getContent(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res)
          Get the provider's content by retrieving HTML content from the specified URL.
 String getInputEncoding()
           Gets the inputEncoding to be used by content.
 
Methods inherited from class com.sun.portal.providers.ProfileProviderAdapter
existsBooleanProperty, existsIntegerProperty, existsListProperty, existsListProperty, existsStringProperty, existsStringProperty, getBooleanProperty, getBooleanProperty, getClientProperty, getIntegerProperty, getIntegerProperty, getListProperty, getListProperty, getMapProperty, getMapProperty, getMapProperty, getMapProperty, getStringAttribute, getStringProperty, getStringProperty, getStringProperty, getStringProperty, getTemplate, getTemplate, isAllowed, setBooleanProperty, setClientProperty, setIntegerProperty, setListProperty, setMapProperty, setStringAttribute, setStringProperty
 
Methods inherited from class com.sun.portal.providers.ProviderAdapter
getContent, getDescription, getEdit, getEdit, getEditType, getHelp, getHelp, getName, getProviderContext, getRefreshTime, getResourceBundle, getResourceBundle, getTitle, getWidth, init, isEditable, isPresentable, processEdit, processEdit
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

URLScraperProvider

public URLScraperProvider()
Default constructor.
Method Detail

getInputEncoding

public String getInputEncoding()
                        throws ProviderException

Gets the inputEncoding to be used by content. This method returns the inputEncoding that is used in encoding the scraped content.

Returns:
String value
Throws:
ProviderException - if there is an error getting the input encoding.
See Also:
ProviderException

getContent

public StringBuffer getContent(javax.servlet.http.HttpServletRequest req,
                               javax.servlet.http.HttpServletResponse res)
                        throws ProviderException

Get the provider's content by retrieving HTML content from the specified URL.

Overrides:
getContent in class ProviderAdapter
Parameters:
req - An HttpServletRequest that contains information related to this request for content.
res - An HttpServletResponse that allows the provider to influence the overall response for the desktop page (besides generating the content).
Returns:
Channel content
Throws:
ProviderException - if there was an error generating the content.
See Also:
ProviderException