The HTML Content Gear retrieves the contents of a Web location, given its URL, then renders the location as gear content. The gear also ensures that all the URIs in the Web content are properly rewritten so that the links shown in the gear content point to the original Web server.

Instance Configuration

To configure an instance of the HTML Content Gear:

Configuring the HtmlFilterParser

The behavior of the HTML Content Gear is also affected by a Nucleus component named /atg/portal/gear/screenscraper/HtmlFilterParser. This component has three properties that you may want to configure:

tagsToRemove
A list of tags to remove from the source Web page so that they do not interfere with the rendering of the content in the gear’s content pages. Only the tags themselves are removed; the text between them is kept. For example, suppose the source content contains:

<title>This is the Title</title>

and you have specified title as one of the items in the tagsToRemove property. The string above is then rendered in the gear’s content pages as This is the Title, without the <title> tags.

tagsToRemoveWithBody
A list of tags to remove from the source Web page together with their contents. This property works like tagsToRemove, except that it removes not just the specified tags but also everything between the start and end tags.

For example, if this appears in the source content:

<title>This is the Title</title>

and you specified title as one of the items in the tagsToRemoveWithBody property, then the whole string is removed, including both the <title> tags and the This is the Title text.
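
The difference between tagsToRemove and tagsToRemoveWithBody can be illustrated with a small standalone Java sketch. It is not the HtmlFilterParser implementation: the class TagRemovalSketch and its two methods are hypothetical names, and the sketch uses simple regular expressions on a string purely to show the two outcomes for the <title> example.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the ATG Portal module.
public class TagRemovalSketch {

    // Mimics the effect described for tagsToRemove:
    // drops the start and end tags but keeps the text between them.
    static String removeTags(String html, String tag) {
        String q = Pattern.quote(tag);
        Pattern p = Pattern.compile("</?" + q + "(\\s[^>]*)?/?>",
                Pattern.CASE_INSENSITIVE);
        return p.matcher(html).replaceAll("");
    }

    // Mimics the effect described for tagsToRemoveWithBody:
    // drops the tags and everything between them.
    static String removeTagsWithBody(String html, String tag) {
        String q = Pattern.quote(tag);
        Pattern p = Pattern.compile(
                "<" + q + "(\\s[^>]*)?>.*?</" + q + "\\s*>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
        return p.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String source = "<title>This is the Title</title>";
        System.out.println(removeTags(source, "title"));                   // This is the Title
        System.out.println(removeTagsWithBody(source, "title").isEmpty()); // true
    }
}
```

With title listed in tagsToRemove, the text survives without its tags; with title listed in tagsToRemoveWithBody, nothing from the element reaches the gear content.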

replaceBodyTagWithTableTag
The parser replaces the <body> tag with a <table> tag so that the bgcolor or background attributes of the source page’s <body> tag do not disrupt the layout of the community page where the gear is installed. To turn off this behavior, set the replaceBodyTagWithTableTag property to false.

Extending the HtmlFilterParser

The Portal module includes the source code for the atg.portal.gear.screenscraper.HtmlFilterParser class in the <ATG10dir>/Portal/screenscraper/src/classes.jar/atg/portal/gear/screenscraper directory. You can modify the class to do your own custom parsing. You can also replace this parser with one of your own by subclassing atg.portal.gear.screenscraper.HtmlFilterParser and overriding the parse(Reader pIn, Writer pWriter) and parse(InputStream pIn, OutputStream pOut) methods. A custom parser could, for example, replace other tags in the source page.
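
The following is a minimal sketch of the subclassing approach, written so that it compiles without the ATG libraries. The nested HtmlFilterParser class is a stand-in for atg.portal.gear.screenscraper.HtmlFilterParser that simply copies its input; in a real module you would extend the ATG class instead. The names HeadingDowngradeSketch, HeadingDowngradeParser, and run are hypothetical, and the throws IOException clause on parse is an assumption, so check the shipped source for the actual signature.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class HeadingDowngradeSketch {

    // Stand-in for atg.portal.gear.screenscraper.HtmlFilterParser (assumption).
    // It only copies input to output so that the sketch is self-contained.
    static class HtmlFilterParser {
        public void parse(Reader pIn, Writer pWriter) throws IOException {
            char[] buf = new char[4096];
            int n;
            while ((n = pIn.read(buf)) != -1) {
                pWriter.write(buf, 0, n);
            }
        }
    }

    // Custom parser: runs the base parser into a buffer, then rewrites
    // <h1> start and end tags as <h3>, keeping any attributes.
    static class HeadingDowngradeParser extends HtmlFilterParser {
        @Override
        public void parse(Reader pIn, Writer pWriter) throws IOException {
            StringWriter filtered = new StringWriter();
            super.parse(pIn, filtered);
            String html = filtered.toString()
                    .replaceAll("(?i)<(/?)h1((?:\\s[^>]*)?)>", "<$1h3$2>");
            pWriter.write(html);
        }
    }

    // Convenience wrapper for trying the parser on a string.
    static String run(String html) {
        try {
            StringWriter out = new StringWriter();
            new HeadingDowngradeParser().parse(new StringReader(html), out);
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run("<h1 class=\"top\">News</h1>"));
        // prints <h3 class="top">News</h3>
    }
}
```

Calling super.parse first keeps the existing filtering in place and limits the subclass to the extra tag replacement. The parse(InputStream pIn, OutputStream pOut) method could be overridden in the same way.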