HTMLComponent

Chapter 10

The HTMLComponent class allows rendering of HTML documents that conform to the XHTML Mobile Profile 1.0 (XHTML-MP 1.0) standard.

XHTML-MP 1.0 is a subset of XHTML adapted for mobile devices. The standard supports most of the basic elements such as images, fonts, lists, tables, forms, and even WCSS (a subset of CSS2 for wireless). It does not support JavaScript or frames, and it does not support all CSS2 properties or values.

This chapter discusses HTMLComponent use cases, interfaces, and implementation details. To learn more about HTMLComponent check out the LWUITBrowser application from the LWUIT SVN repository and examine the code. LWUITBrowser uses most of HTMLComponent’s capabilities.

10.1 `HTMLComponent` Use Cases

HTMLComponent can be used to render local or remote documents. It extends Container and as such it can be added to any Form.

HTMLComponent uses an internal parser to parse the given HTML documents. The parser is not 100% strict and can tolerate some errors in the document, but some errors are fatal and stop the parsing. It is very important to stick to the XHTML-MP 1.0 standard. You must close all open tags in the correct hierarchical order.

Fox News Mobile Site Rendered with LWUIT Browser Using HTMLComponent

10.1.1 Rendering Rich Text

The simplest use case of HTMLComponent is rendering rich text:

HTMLComponent htmlC = new HTMLComponent(null);
htmlC.setBodyText("Hello <b>bold text</b>");

The only parameter the constructor expects is a class implementing the DocumentRequestHandler interface. This interface defines how links and external resources (such as images, CSS files) in the document are fetched.

Since the example does not use links, we can specify null instead of the document handler. In this case, if links or external resources are specified in the document body they are disabled or ignored.

setBodyText accepts a string containing any text with XHTML-MP 1.0 tags. The text is wrapped with the HTML and BODY tags and passed on for parsing.

If the text is encoded, you can specify the encoding as follows:

setBodyText(String htmlText,String encoding)

If you have a full HTML file and not just the body text, the following can be used:

setHTML(String htmlText,String encoding,String title,boolean isFullHTML)

To make the HTMLComponent visible add it to a form and display that form. For example:

Form form = new Form("HTML Test");
form.setLayout(new BorderLayout());
form.addComponent(BorderLayout.CENTER,htmlC);
form.show();

Rich Text Rendered Using HTMLComponent

10.1.2 Reading HTML and Enabling External Resources

The most common use case for HTMLComponent is reading HTML files from either a local or remote source, while enabling external resources such as images and CSS files, and allowing the user to follow links.

To support this use case you must first implement a DocumentRequestHandler interface that contains a single method:

InputStream resourceRequested(DocumentInfo docInfo)

This method is called by HTMLComponent (and other internal classes in the html package) to obtain the InputStream of the specified document. Requested documents are HTML files (followed links), referenced CSS files, and referenced images.

The requested document information is stored in a DocumentInfo object, which is populated automatically by HTMLComponent. The DocumentInfo values can be used to determine the document's path, file name, type, etcetera.

This example does not implement a DocumentRequestHandler. It uses the HttpRequestHandler (a ready-made implementation that can be found in the LWUITBrowser application) instead. LWUITBrowser can be checked out from the LWUIT SVN under MIDP/applications.

The HttpRequestHandler implementation supports fetching HTML documents both via HTTP and from a JAR file. It supports cookies, encoding, error handling, and caching via the Storage class (also available in LWUITBrowser).

The following sample code uses HTMLComponent and HttpRequestHandler to browse to the mobile Facebook site:

HttpRequestHandler handler = new HttpRequestHandler();
HTMLComponent htmlC = new HTMLComponent(handler);
htmlC.setPage("http://m.facebook.com");

The setPage method accepts a String containing the URL to be rendered. This can be a remote resource (such as an http:// address) or a local file in the JAR such as file:///somepath/somefile. Alternatively, it can be any kind of URL that our implementation of DocumentRequestHandler "understands". For example, if we want to allow fetching HTML documents via JSR75, we can define our own protocol identifier (for example jsr75://) and have our request handler detect that protocol and fetch the specified file via JSR75.
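As a rough illustration of how a request handler might recognize a custom protocol identifier, the sketch below splits a URL into its protocol and the remainder. The UrlProtocol class and its method names are hypothetical helpers written for this example and are not part of LWUIT; a real handler would call something similar from resourceRequested and branch on the result.

```java
// Hypothetical helper (not part of LWUIT): lets a DocumentRequestHandler
// decide how to fetch a document based on the URL's protocol identifier.
class UrlProtocol {

    // Returns the lower-cased text before "://", or "" if the URL has no protocol.
    static String protocolOf(String url) {
        int sep = url.indexOf("://");
        if (sep <= 0) {
            return "";
        }
        return url.substring(0, sep).toLowerCase();
    }

    // Returns everything after "://", or the URL unchanged if it has no protocol.
    static String pathOf(String url) {
        int sep = url.indexOf("://");
        return (sep <= 0) ? url : url.substring(sep + 3);
    }
}
```

A handler could then compare protocolOf(url) against "http", "file", or a custom value such as "jsr75" and pick the matching way of opening the stream.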

After showing the above HTMLComponent on a form, the user can view the HTML at the specified address (including images and CSS), and follow links.

Facebook Mobile Home Page Rendered Using HTMLComponent

10.2 `HTMLCallback`

During the lifecycle of HTMLComponent there are many events that the developer can respond to. Developers should implement the HTMLCallback interface and set it to the HTMLComponent.

The html package provides a default implementation of HTMLCallback named DefaultHTMLCallback. This implementation doesn't do much, but it does demonstrate how to implement the interface methods without harming HTMLComponent tasks (as there are several potential pitfalls). The methods are described in the following sections: 10.2.1 parsingError, 10.2.2 pageStatusChanged, 10.2.3 titleUpdated, 10.2.4 linkClicked, 10.2.5 getLinkProperties, and 10.2.6 Auto Complete.

10.2.1 `parsingError`

This method is called whenever the internal parser encounters an error during the document's parsing. This can occur while processing the main HTML document or its referenced CSS files.

You must return a boolean value denoting whether to continue the document processing despite the error (true) or to stop processing (false).

Detailed information on the error can be found in the parameters the method passes, especially errorId which holds the error code (one of the ERROR_* constants).

10.2.2 `pageStatusChanged`

This method reports changes in the page loading lifecycle. pageStatusChanged can help you display status information to the user, or delay certain flows until the page reaches a certain status.

A new HTMLComponent starts as STATUS_NONE. Shortly after a page URL is set it becomes STATUS_REQUESTED. After a successful connection to the input stream it changes to STATUS_CONNECTED. When the page is displayed (and this can be before images have been completely loaded) the status changes to STATUS_DISPLAYED and finally after all resources have been fully loaded the status becomes STATUS_COMPLETED.

If an error is encountered during the page loading, for example an unrecoverable parsing error, then the status is STATUS_ERROR. If the page loading was cancelled the status becomes STATUS_CANCELLED.

10.2.3 `titleUpdated`

This method is called after the document's title has been extracted from the TITLE tag of the HTML document.

10.2.4 `linkClicked`

This method is called whenever a link is clicked, to allow alternative or additional handling. Usually when a link is clicked the link is simply followed, but in some cases you might want to take additional actions, for example updating the UI outside the HTMLComponent.

The return value should be true if the regular link processing should proceed, and false if it should not.

10.2.5 `getLinkProperties`

This method is used to support Visited and Forbidden links.

Visited Links: Most browsers mark visited links in a different color. HTMLComponent does not have any information on which links have been visited before, but getLinkProperties can hook it to any implementation that tracks links, returning LINK_VISITED for visited links. (See LWUITBrowser for an example.)
Forbidden Links: Sometimes you may want to disallow the use of some links. A common use case is restricting the user from accessing links outside a defined domain. Another is blocking content types that HTMLComponent cannot render. When getLinkProperties is called, the implementation can look at the URL and decide whether to return LINK_REGULAR, which enables the link, or LINK_FORBIDDEN, which disables it.
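The domain restriction described above could be decided by a small check like the following. This is only a sketch: LinkPolicy, hostOf, and isAllowed are hypothetical names invented for this example, and the host extraction is deliberately simple (it ignores user info and other URL corner cases). A getLinkProperties implementation might return LINK_REGULAR when the check passes and LINK_FORBIDDEN otherwise.

```java
// Hypothetical helper (not part of LWUIT): decides whether a link stays
// inside an allowed domain.
class LinkPolicy {

    // Extracts the lower-cased host of an absolute URL, or "" if there is none.
    static String hostOf(String url) {
        int start = url.indexOf("://");
        if (start < 0) {
            return "";
        }
        start += 3;
        int end = url.indexOf('/', start);
        String host = (end < 0) ? url.substring(start) : url.substring(start, end);
        int port = host.indexOf(':');
        if (port >= 0) {
            host = host.substring(0, port); // drop an explicit port number
        }
        return host.toLowerCase();
    }

    // True if the URL's host is the allowed domain or one of its subdomains.
    static boolean isAllowed(String url, String allowedDomain) {
        String host = hostOf(url);
        String domain = allowedDomain.toLowerCase();
        return host.equals(domain) || host.endsWith("." + domain);
    }
}
```

Matching on "." plus the domain, rather than on the bare suffix, keeps look-alike hosts such as evil-example.com from passing as example.com.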

10.2.6 Auto Complete

The fieldSubmitted and getAutoComplete methods support an auto complete implementation.

fieldSubmitted is called whenever a field in an HTML form is submitted. In return, the implementation should return the actual field value to send to the form. This can be used to perform some content filtering if needed. When none is needed, the value should be returned as is. However, you get the chance to store the field value along with its name, the form URL etc.

Data collected with fieldSubmitted can be used to populate form fields with getAutoComplete, which is called while constructing forms to obtain values for the various fields. Returning null simply means that users must fill out the form themselves. You can also supply a value that is appropriate for the form's specific field, for example from a repository of stored values as recorded by fieldSubmitted.
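One possible shape for such a repository is sketched below. AutoCompleteStore and its methods are hypothetical names used only for illustration; the sketch keeps values in memory, keyed by form URL and field name, and leaves persistence and any content filtering to the application. It uses Hashtable without generics so that it stays close to what a Java ME environment offers.

```java
import java.util.Hashtable;

// Hypothetical in-memory repository (not part of LWUIT) that an HTMLCallback
// implementation could consult from fieldSubmitted and getAutoComplete.
class AutoCompleteStore {
    private final Hashtable values = new Hashtable();

    // Combines form URL and field name into one key; '\n' cannot appear in either.
    private static String key(String formUrl, String fieldName) {
        return formUrl + "\n" + fieldName;
    }

    // Intended for fieldSubmitted: remembers the value and returns it unchanged.
    String record(String formUrl, String fieldName, String value) {
        if (value != null) { // Hashtable does not accept null values
            values.put(key(formUrl, fieldName), value);
        }
        return value;
    }

    // Intended for getAutoComplete: returns the stored value, or null if none.
    String lookup(String formUrl, String fieldName) {
        return (String) values.get(key(formUrl, fieldName));
    }
}
```

Returning null from lookup matches the behavior described above: the field is left empty and the user fills it in.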

10.3 Fonts

When rendering HTML, HTMLComponent uses the font facilities described in the following sections:

10.3.1 Default Font
10.3.2 System Fonts in HTMLComponent
10.3.3 Bitmap Fonts
10.3.4 Font Tags

10.3.1 Default Font

The default font used is the system font with FACE_SYSTEM, STYLE_PLAIN and SIZE_MEDIUM. This can be changed using the setDefaultFont method that accepts a font key (see 10.3.3 Bitmap Fonts) and the font itself.

10.3.2 System Fonts in HTMLComponent

HTMLComponent automatically uses all available system fonts. For example if the <b> tag is encountered while rendering text with the default system font, the text in the tag is rendered with a system font with the style STYLE_BOLD. Same goes for the <big> tag which causes text to be rendered with a system font that has a size of SIZE_LARGE.

Note that not all system fonts, faces, styles, and sizes are available on all handsets. In fact it is very rare that a device has the full range of fonts representing all possible combinations of those properties. When HTMLComponent attempts to use unavailable fonts they are rendered according to fonts the device actually supports.

10.3.3 Bitmap Fonts

To enable HTMLComponent to use bitmap fonts, introduce them with the addFont method. This method accepts a String identifying the font (Font Key), and a LWUIT Font object. Usually this would be a bitmap font loaded from one of the resource files.

The font key is an important concept in HTMLComponent. A font key identifies font properties such as family, style, and size. While these properties are known for system fonts, they are unknown for bitmap fonts, and providing them to HTMLComponent allows the fonts to be used correctly while rendering documents.

For example, to add a bold Arial font with a size of 12 pixels, use:

Font font = Font.getBitmapFont("myarialfont");
addFont("arial.12.bold", font);

The format of the font key is the family, size, and style(s) delimited with periods. Order is irrelevant (for example, arial.12.bold is the same as 12.bold.arial).

Note that the name of the font in the resource file may differ from the font key (in our example it is called "myarialfont"), though it is good practice to name the font according to the font key.
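To make the order-independence concrete, the following sketch parses a font key into its parts and prints them back in one canonical order. It is an illustration only and not HTMLComponent's actual parser: the FontKey class is hypothetical, it recognizes only the "bold", "italic", and "plain" styles, and it does not handle the tag names discussed in 10.3.4.

```java
// Hypothetical illustration (not LWUIT's parser) of why token order in a
// font key does not matter: each token is classified by its content.
class FontKey {
    String family = "";
    int size = 0;
    boolean bold = false;
    boolean italic = false;

    static FontKey parse(String key) {
        FontKey result = new FontKey();
        String rest = key.toLowerCase();
        while (rest.length() > 0) {
            int dot = rest.indexOf('.');
            String token = (dot < 0) ? rest : rest.substring(0, dot);
            rest = (dot < 0) ? "" : rest.substring(dot + 1);
            if (token.length() == 0) {
                continue; // skip empty tokens such as in "arial..12"
            }
            if (isNumber(token)) {
                result.size = Integer.parseInt(token);
            } else if (token.equals("bold")) {
                result.bold = true;
            } else if (token.equals("italic")) {
                result.italic = true;
            } else if (!token.equals("plain")) {
                result.family = token; // anything else is taken as the family
            }
        }
        return result;
    }

    private static boolean isNumber(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Canonical form: family, size, then styles in a fixed order.
    public String toString() {
        return family + "." + size + (bold ? ".bold" : "") + (italic ? ".italic" : "");
    }
}
```

With this sketch, parsing "arial.12.bold" and "12.bold.arial" yields the same canonical string, and "plain" is simply dropped because it is the default style.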

Let's say that we add the following fonts as well:

// fonts = … 
// filling the fonts array with fonts from the resource file
setDefaultFont("arial.12", fonts[0]);
// Specifying "plain" as a style is optional
addFont("arial.12.bold.italic", fonts[1]);
addFont("timesnewroman.10", fonts[2]);
addFont("arial.14", fonts[3]);

And now we load the following HTML:

<html>
   <body>
      Default font
      <b>Bold font <i>Bold and Italic</i> </b>
      <big> Big font </big>
      <small> Small font </small>
   </body>
</html>

By specifying the font keys we allow HTMLComponent to know which font to assign when encountering font related tags (and also CSS attributes).

In the example above the words "Default font" are displayed in the arial.12 font, and the rest of the text is displayed according to the tags. However, the "Small font" text is displayed in the default font, because even though there is a smaller font ("timesnewroman.10") it is not of the same family. The font matching algorithm gives more weight to the family than to the size, and in fact is configured to match only fonts from the same family.

Font matching is sometimes done under less than ideal conditions. While HTML documents may be rich in fonts, the mobile client can offer only a limited number of system and bitmap fonts. You should try to match the content with the fonts available in the application.

Also note that system fonts are always matched with other system fonts and bitmap fonts only with other bitmap fonts.

10.3.4 Font Tags

HTML defines several tags that cause (among other things) a font change when rendered. The font selected to render these tags can be defined in a similar way to adding bitmap fonts. All you need to do is add the desired tag name to the font key. For example:

addFont("arial.20.bold.h1", myheaderfont);

Now text inside the <h1> tag is rendered with the specified font. Note that the font is added to the font pool and can also be used, for example, when the component seeks a matching bold and big font. Technically, you can prevent the component from using this font elsewhere by adding it with a font key of just "h1", but of course this is not recommended.

The tags that have associated fonts are: H1, H2, H3, H4, H5, H6, EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE and PRE.

By default these tags are assigned with the following system fonts:

EM, DFN, VAR, CITE: system, italic, medium
CODE, SAMP, KBD: monospace, plain, medium
STRONG, H3: system, bold, medium
H1: system, bold, large
H2: system, italic, large

Note that while usually there is no reason to add a system font (as they are all automatically used), there is a use case for defining a tag-related font as follows:

Font sysFont=Font.createSystemFont(Font.FACE_SYSTEM,
     Font.STYLE_BOLD, Font.SIZE_SMALL);
addFont("h4", sysFont);

Note that here it is totally unnecessary to provide any other font properties in the font key because system fonts are supported without explicit addition. But denoting the font key as "h4" makes the component render text inside the H4 tag with the specified font.

Small-caps Font

One special case worth noting is small-caps fonts. In CSS one can define the font-variant property to the small-caps value. In this case the text should be displayed all in caps, with large capital letters depicting “regular” capital letters, and small caps depicting regular text.

System and bitmap fonts do not have this effect. If the documents the application renders use small-caps text, add a suitable font to a resource file (in the resource editor, select a font that behaves like a small-caps font) and register it under the "small-caps" family:

addFont("small-caps.14.bold", smallcapsFont);

If no small-caps fonts are added, the font-variant: small-caps CSS directive is ignored if encountered.

10.4 Styles in HTMLComponent

HTMLComponent renders most of the HTML tags as regular LWUIT components and as such uses the defined styles for these components. For example, form buttons render as LWUIT's Button, and as such any style that is applied in the theme to Button is expressed in buttons inside the HTML document.

There are however some custom components with the following UIIDs:

HTMLLink: Used for links in the document.
HTMLHR: Used for the HTML hr tag (Horizontal separator)
HTMLFieldSet: Used to render the HTML fieldset tag
HTMLOptgroup: Used to render the title of an option group inside a ComboBox (option groups are defined by the optgroup html tag)
HTMLOptgroupItem: Used to render a ComboBox single item that is a part of an option group.
HTMLMultiComboBoxItem: Used to render an item in a multiple choice ComboBox.

One can define the style of these components in the theme by using the above UIIDs. The LWUITBrowser application contains a theme that includes standard definitions for these UIIDs and can be used as a starting point.

Page Styling

Pages rendered with HTMLComponent are rendered on an internal container. This means that setting styles to the HTMLComponent itself won't necessarily affect the page style.

To change the style of this internal container, one can use the setPageStyle method that accepts a Style object.

10.5 Character Entities

Some characters are represented by character entities (which can be compared to Java escape sequences), either because the characters are reserved or because the character has no matching key on the keyboard.

A character entity is represented either by its Unicode numeric value or by a verbal symbol. For instance the character > (greater than), which is reserved for HTML tags, is represented either by &#62; (its Unicode value) or by &gt; (gt is the symbol assigned to this character).

HTMLComponent translates any numeric value and displays the corresponding character (depending, of course, on its availability in the font used). As for symbols, it supports all the standard ISO 8859-1 symbols (up to a Unicode value of 255) and does not recognize symbols with a Unicode value greater than 255, except for two very common symbols: euro and bull (bullet).

If you need support for symbols above this range, they can be added using the static methods addCharEntity and addCharEntitiesRange. For example:

HTMLComponent.addCharEntity("spades",9824);
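The lookup behind character entities can be pictured with the sketch below, which resolves the text between "&" and ";" either as a decimal number or through a symbol table. The EntityDecoder class is a hypothetical illustration and not HTMLComponent's implementation; it registers only a handful of symbols and ignores hexadecimal entities.

```java
import java.util.Hashtable;

// Hypothetical illustration (not LWUIT code) of resolving character entities.
class EntityDecoder {
    private final Hashtable symbols = new Hashtable();

    EntityDecoder() {
        // A few well-known symbols; a real table would be much larger.
        add("lt", 60);
        add("gt", 62);
        add("amp", 38);
        add("euro", 8364);
        add("bull", 8226);
    }

    // Registers a symbol for a Unicode value, similar in spirit to addCharEntity.
    void add(String symbol, int code) {
        symbols.put(symbol, String.valueOf((char) code));
    }

    // body is the text between '&' and ';', for example "#62" or "gt".
    // Returns the decoded character as a String, or null if it is unknown.
    String decode(String body) {
        if (body.startsWith("#")) {
            try {
                return String.valueOf((char) Integer.parseInt(body.substring(1)));
            } catch (NumberFormatException e) {
                return null; // not a decimal number
            }
        }
        return (String) symbols.get(body);
    }
}
```

In this sketch "#62" and "gt" both decode to ">", while "spades" stays unknown until it is registered with its Unicode value 9824.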

10.6 HTMLComponent Settings

There are various settings you can control (or relay to the user's control) with HTMLComponent:

Image loading: Can be turned on/off using setShowImages(boolean). The default is true (showing images). When this is set to false, referenced images are not loaded nor are they displayed.
CSS loading: Can be turned on/off using setIgnoreCSS(boolean). The default is false (CSS are loaded). When this is set to true, all CSS directives are ignored including inline CSS, embedded CSS and external CSS files.
CSS media types: CSS references can specify which media types they are suitable for. For example an HTML document can have 2 separate CSS files, one for use with the "handheld" media type and the other with the "screen" media type. By default HTMLComponent accepts CSS files and segments that are defined as "handheld" or "all" (or if the media type is unspecified). To modify the supported media types one can use the setCSSSupportedMediaTypes method.
Max Threads: The number of threads used by HTMLComponent to load external referenced images and CSS files can be set with setMaxThreads(int). The default is 2.

10.7 CSS Support

HTMLComponent supports WCSS, which is a subset of CSS 2.0. It supports inline CSS directives, embedded CSS segments, and external CSS files. The following CSS properties are supported in HTMLComponent:

Fully supported CSS properties:

`Background`	background-color, background-image, background-repeat, background-attachment, background-position-x, background-position-y
`Border`	border-*-width, border-*-style, border-*-color
`Fonts`	font-family, font-size, font-style, font-weight, font-variant
`Lists`	list-style-image, list-style-position, list-style-type
`Margins`	margin-*, padding-*
`Text`	text-align, text-indent, text-transform
`Misc`	color, height, width, visibility
`WAP`	-wap-access-key, -wap-input-format, -wap-input-required
`Shorthand properties`	All shorthand properties are fully supported
	* represents `top`, `left`, `bottom`, `right`

Partially supported properties:

`display`	Supported: none, marquee. Unsupported: block, inline, list-item
`white-space`	Supported: normal, nowrap Unsupported: pre
`vertical-align`	Works only within tables

Unsupported properties:

clear, float

Known issues:

width or height work for simple elements, but may be problematic with complex elements (for example tables).
font-family accepts the first mentioned font and ignores all fallback fonts, since finding a matching font is very time consuming, and also since in the ME environment usually there aren't that many fonts anyway.
text-decoration is irrelevant: the only mandatory WCSS decoration value is 'none', which is usually used to remove underlines from links, and since HTMLComponent does not underline links it has no effect.
text-transform may have issues when overriding a parent which has a different transform.
Some properties are ignored if associated with a pseudo-class (such as a:focus or hover). LWUIT does have separate styles for the selected, unselected, and pressed states, and these styles include properties such as padding, margins, colors, background, and font. They do not include properties such as alignment or visibility, which affect the component in all of its states.

10.8 Implementing a DocumentRequestHandler

In the first example we have used a ready-made DocumentRequestHandler implementation. In this section we will create our own simple implementation that reads from files stored in the JAR.

Our implementation will accept URLs with the file:// protocol only, and fetch them from the JAR:

import com.sun.lwuit.html.DocumentInfo;
import com.sun.lwuit.html.DocumentRequestHandler;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
 
class FileRequestHandler implements DocumentRequestHandler {
 
   public InputStream resourceRequested(DocumentInfo docInfo) {
      // Get the full URL from the docInfo
      String url=docInfo.getUrl();
 
      if (!url.startsWith("file://")) { // We support only files
         return getErrorStream("This handler handles files only.");
      }
 
      if (docInfo.isPostRequest()) { // We don't support POST
         return getErrorStream("GET requests only please!");
      }
      url=url.substring(7); // Cut the file://
      return getClass().getResourceAsStream(url);
   }
   // Utility method to get a stream out of a string
   private InputStream getErrorStream(String err) {
      err="<html><body>"+err+"</body></html>";
      ByteArrayInputStream bais =
         new ByteArrayInputStream(err.getBytes());
      return bais;
   }
}

As we can see the implementation is quite simple. It uses the getResourceAsStream method to obtain an InputStream of the file and send it over, but before that it queries the passed DocumentInfo object to get some information on the requested page. This object is explained in detail in the next section.

10.9 DocumentInfo

The DocumentInfo is an object that is passed from the HTMLComponent to the DocumentRequestHandler, and can be used by the latter to obtain information about the document such as its location, type, encoding etcetera, and also to hint back to the HTMLComponent about attributes it found about the document.

When setPage is called on an HTMLComponent, it results in a call to the DocumentRequestHandler's resourceRequested method, with a populated DocumentInfo object. This method is also called when links are clicked or when referenced images and CSS files are needed. The remainder of this section discusses some useful DocumentInfo getters and setters that a DocumentRequestHandler implementation should consider:

10.9.1 getUrl
10.9.2 getEncoding and setEncoding
10.9.3 getParams
10.9.4 getExpectedContentType and setExpectedContentType
10.9.5 getFullUrl or getBaseUrl

10.9.1 `getUrl`

This method returns the absolute URL of the requested document. The absolute URL is calculated internally, relative to the page on which the link was clicked. Implementations can learn about the document's protocol (file, http, and so on) and about the document's domain and act accordingly. For example, it is possible to allow only certain protocols or domains, or to use custom protocol strings.

10.9.2 `getEncoding` and `setEncoding`

getEncoding and setEncoding are quite important when reading documents that can have different encodings. Encoding information of HTML and CSS documents can appear in multiple places. For example, when posting a form, its FORM tag can have an ENCTYPE property that specifies the form's encoding. This is one situation in which the encoding in the provided DocumentInfo differs from the default (which is ISO-8859-1), and thus has to be queried to set encoding headers appropriately. In the other direction, when requesting a document, the encoding can be specified by the response headers (charset in the content-type header). For HTMLComponent to read the document properly, the encoding must then be set using setEncoding. Note that the encoding can be indicated in other ways as well, such as a BOM (Byte Order Mark), and it is the responsibility of the DocumentRequestHandler to figure it out and relay that information to HTMLComponent via the DocumentInfo object.
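For the response-header case, a handler needs to pull the charset out of the content-type header value before passing it to setEncoding. The sketch below shows one way to do that; ContentTypeCharset and charsetOf are hypothetical names for this example, and the fallback to ISO-8859-1 simply mirrors the default mentioned above. BOM detection is not covered.

```java
// Hypothetical helper (not part of LWUIT): extracts the charset from a
// content-type header value such as "text/html; charset=UTF-8".
class ContentTypeCharset {
    static final String DEFAULT_ENCODING = "ISO-8859-1";

    static String charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT_ENCODING;
        }
        // Search case-insensitively, but cut the value from the original string.
        int pos = contentType.toLowerCase().indexOf("charset=");
        if (pos < 0) {
            return DEFAULT_ENCODING;
        }
        String value = contentType.substring(pos + "charset=".length());
        int end = value.indexOf(';');
        if (end >= 0) {
            value = value.substring(0, end); // drop any following parameters
        }
        value = value.trim();
        // Strip optional surrounding double quotes.
        if (value.length() >= 2 && value.charAt(0) == '"'
                && value.charAt(value.length() - 1) == '"') {
            value = value.substring(1, value.length() - 1);
        }
        return (value.length() == 0) ? DEFAULT_ENCODING : value;
    }
}
```

A handler would typically call this only when the expected content type is HTML or CSS, since images carry no text encoding.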

10.9.3 `getParams`

getParams returns the request parameters. It can be used, for example, to screen parameters before sending them to the server. It has a matching setter as well.

10.9.4 `getExpectedContentType` and `setExpectedContentType`

The expected content type is what the HTMLComponent expects to find when requesting the resource in question. This would be an HTML document (TYPE_HTML) when setting a page or clicking links, an image (TYPE_IMAGE) for image references, and a CSS file (TYPE_CSS) for CSS references. Querying the expected content type can help processing. For example, a handler may check the encoding only for HTML and CSS, but not for images, or it may cache images but not HTML documents.

10.9.5 `getFullUrl` or `getBaseUrl`

Other more informative methods include getFullUrl returning a string composed of the absolute URL plus the parameters of the request (if any, and only if this was a GET request). Another one is getBaseUrl returning the document base URL.

Lightweight UI Toolkit Developer’s Guide