java.lang.Object
  +--com.sun.portal.providers.ProviderAdapter
      +--com.sun.portal.providers.ProfileProviderAdapter
          +--com.sun.portal.providers.urlscraper.URLScraperProvider
A URLScraperProvider is a content provider that can retrieve and display HTML content from a given URL.
URLScraperProvider acts as an HTTP client and makes a request for the content of the specified URL and then displays it in the channel.
Each URLScraper channel has its own timeout attribute. The channel will wait up to its individual timeout to receive content.
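The effect of a per-channel timeout on an HTTP client can be sketched with the standard java.net API. This is only an illustration, not the provider's actual implementation: the class name ScrapeTimeoutSketch, the method prepare, and the use of milliseconds as the unit are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical helper for illustration; not part of the portal API.
final class ScrapeTimeoutSketch {
    private ScrapeTimeoutSketch() {}

    /**
     * Prepares (but does not open) an HTTP connection whose connect and read
     * phases each give up after timeoutMillis. The unit is an assumption.
     */
    static HttpURLConnection prepare(String url, int timeoutMillis) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeoutMillis); // limit on establishing the connection
            conn.setReadTimeout(timeoutMillis);    // limit on waiting for response data
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

No network traffic occurs until the caller connects or reads from the returned connection, so each channel can prepare its request with its own limit.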
Forwarding of cookies
Each URLScraper channel has a cookiesToForwardList attribute that can be set in the display profile. If a cookie is allowed by this attribute, that cookie in the request coming from the browser is forwarded to the web server specified in the URL. The allCookies attribute can be set to true to allow all cookies. A set-cookie request from that web server is sent back to the browser, modified so that the cookie is only sent back to the portal server.
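The forwarding rule above can be pictured as a simple filter over the browser's cookies. The following is a minimal sketch, assuming cookies are available as name/value pairs; the class CookieForwardFilter and the method buildCookieHeader are hypothetical names and do not exist in the portal API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the portal API.
final class CookieForwardFilter {
    private CookieForwardFilter() {}

    /**
     * Builds the value of a Cookie request header from the browser's cookies,
     * keeping only the names listed in cookiesToForward, or every cookie
     * when allCookies is true.
     */
    static String buildCookieHeader(Map<String, String> browserCookies,
                                    List<String> cookiesToForward,
                                    boolean allCookies) {
        List<String> pairs = new ArrayList<>();
        for (Map.Entry<String, String> cookie : browserCookies.entrySet()) {
            if (allCookies || cookiesToForward.contains(cookie.getKey())) {
                pairs.add(cookie.getKey() + "=" + cookie.getValue());
            }
        }
        return String.join("; ", pairs);
    }
}
```

With an empty list and allCookies set to false, the result is an empty string and nothing is forwarded.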
URL Rewriting
The HTML content gathered by the channel will be rewritten if
the rewriter is available. The ruleset used by the rewriter can be
specified in the display profile attribute rulesetID.
Relative URLs are converted to absolute URLs. For example, if your portal server is
http://portal.iplanet.com/
and the web server specified in the URL is
http://foo.sesta.com/
and the HTML file contains
<IMG SRC="/images/blah.gif">
then the HTML sent back to the browser via the portal server is rewritten as:
<IMG SRC="http://foo.sesta.com/images/blah.gif">
Otherwise the browser would attempt to read the image from
http://portal.iplanet.com/images/blah.gif
and would fail to resolve it.
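The relative-to-absolute conversion in this example matches standard URI reference resolution, which can be sketched with java.net.URI. This only illustrates the resolution step; the actual rewriter is driven by its ruleset, and the class RelativeUrlResolver is a hypothetical name introduced for the example.

```java
import java.net.URI;

// Hypothetical helper for illustration; not part of the portal API.
final class RelativeUrlResolver {
    private RelativeUrlResolver() {}

    /**
     * Resolves a reference found in scraped HTML against the URL of the
     * scraped page. Absolute references are returned unchanged.
     */
    static String toAbsolute(String scrapedPageUrl, String reference) {
        return URI.create(scrapedPageUrl).resolve(reference).toString();
    }
}
```

For instance, toAbsolute("http://foo.sesta.com/", "/images/blah.gif") yields "http://foo.sesta.com/images/blah.gif", the value shown in the rewritten IMG tag above.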
SSL protected pages
In general the URLScraperProvider will work with SSL pages. The
important thing to remember is that there can be no level of
interaction required by the specified URL as there is no way to
pass that information to the end user.
Timeouts
There are 2 timeout values to consider:
Encoding
The encoding is determined in the following order:
1. HTTP header, if available (only applies to http(s) URLs)
2. inputEncoding property, if non-blank
3. Tag in the content, e.g. meta tag in HTML or XML header in XML, if available (only applies to HTML and XML, determined based on the MIMEType)
4. System default
MIMEType is determined from the JVM table. If not set, it is determined from the file extension.
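The lookup order above amounts to a first-match fallback chain. A minimal sketch follows, assuming each candidate has already been extracted as a string (null or blank when unavailable); the class EncodingResolver and its parameter names are hypothetical, and a caller could pass java.nio.charset.Charset.defaultCharset().name() as the system default.

```java
// Hypothetical helper for illustration; not part of the portal API.
final class EncodingResolver {
    private EncodingResolver() {}

    /** Returns the first non-blank candidate, in the documented order. */
    static String resolve(String httpHeaderCharset,
                          String inputEncodingProperty,
                          String contentDeclaredCharset,
                          String systemDefault) {
        if (!isBlank(httpHeaderCharset)) {
            return httpHeaderCharset.trim();      // 1. HTTP header
        }
        if (!isBlank(inputEncodingProperty)) {
            return inputEncodingProperty.trim();  // 2. inputEncoding property
        }
        if (!isBlank(contentDeclaredCharset)) {
            return contentDeclaredCharset.trim(); // 3. meta tag or XML header
        }
        return systemDefault;                     // 4. system default
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```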
Proxy Configuration
The URLScraper channel uses a proxy to scrape the specified URL if the proxy is set in the jvm12.conf file for the web server. For example, the proxy can be set as:
http.proxyHost=
http.proxyPort=
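http.proxyHost and http.proxyPort are the standard Java networking property names. As a rough sketch of how a client could turn them into a proxy setting, the following reads both values from a Properties object. The class ProxyFromProperties is a hypothetical name, and the fallback to port 80 when no port is given follows the documented JDK default rather than anything stated for URLScraper.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.Properties;

// Hypothetical helper for illustration; not part of the portal API.
final class ProxyFromProperties {
    private ProxyFromProperties() {}

    /**
     * Returns an HTTP proxy built from http.proxyHost / http.proxyPort,
     * or Proxy.NO_PROXY when no host is configured.
     */
    static Proxy fromProperties(Properties props) {
        String host = props.getProperty("http.proxyHost");
        if (host == null || host.trim().isEmpty()) {
            return Proxy.NO_PROXY; // no proxy configured: connect directly
        }
        String port = props.getProperty("http.proxyPort");
        int portNumber = (port == null || port.trim().isEmpty())
                ? 80 // JDK default for http.proxyPort
                : Integer.parseInt(port.trim());
        // createUnresolved avoids a DNS lookup at configuration time.
        return new Proxy(Proxy.Type.HTTP,
                InetSocketAddress.createUnresolved(host.trim(), portNumber));
    }
}
```

Passing System.getProperties() would pick up values supplied to the JVM at startup.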
The refreshTime attribute is used for caching: the URL is not fetched again if the page is reloaded within that time.
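The caching behavior can be illustrated with a small time-based cache. This is a sketch only, not the provider's caching code: the class RefreshTimeCache is hypothetical, the unit of the refresh interval (milliseconds here) is an assumption, and the clock is injected so the behavior can be checked without waiting.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical helper for illustration; not part of the portal API.
final class RefreshTimeCache {
    private final long refreshMillis;
    private final LongSupplier clock;
    private String cached;
    private long fetchedAt;

    RefreshTimeCache(long refreshMillis, LongSupplier clock) {
        this.refreshMillis = refreshMillis;
        this.clock = clock;
    }

    /**
     * Returns the cached content while it is younger than refreshMillis;
     * otherwise calls the fetcher and stores the new content.
     */
    synchronized String get(Supplier<String> fetcher) {
        long now = clock.getAsLong();
        if (cached == null || now - fetchedAt >= refreshMillis) {
            cached = fetcher.get();
            fetchedAt = now;
        }
        return cached;
    }
}
```

In production use the clock would typically be System::currentTimeMillis.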
NOTE: The getEdit() and processEdit() methods are not implemented in URLScraper.
Fields inherited from interface com.sun.portal.providers.ProviderWidths
WIDTH_FULL_BOTTOM, WIDTH_FULL_TOP, WIDTH_THICK, WIDTH_THIN

Fields inherited from interface com.sun.portal.providers.ProviderEditTypes
EDIT_COMPLETE, EDIT_SUBSET

Constructor Summary
URLScraperProvider()
    Default constructor.

Method Summary
StringBuffer getContent(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res)
    Get the provider's content by retrieving HTML content from the specified URL.
String getInputEncoding()
    Gets the inputEncoding to be used by content.

Methods inherited from class com.sun.portal.providers.ProviderAdapter
getContent, getDescription, getEdit, getEdit, getEditType, getHelp, getHelp, getName, getProviderContext, getRefreshTime, getResourceBundle, getResourceBundle, getTitle, getWidth, init, isEditable, isPresentable, processEdit, processEdit

Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Constructor Detail
public URLScraperProvider()

Method Detail
public String getInputEncoding() throws ProviderException
Gets the inputEncoding to be used by content. This method returns the inputEncoding that is used in encoding the scraped content.
Throws:
ProviderException - if there is an error getting the input encoding.

public StringBuffer getContent(javax.servlet.http.HttpServletRequest req, javax.servlet.http.HttpServletResponse res) throws ProviderException
Get the provider's content by retrieving HTML content from the specified URL.
Overrides:
getContent in class ProviderAdapter
Parameters:
req - An HttpServletRequest that contains information related to this request for content.
res - An HttpServletResponse that allows the provider to influence the overall response for the desktop page (besides generating the content).
Throws:
ProviderException - if there was an error generating the content.