Fast Infoset in Java Web Services Developer Pack, Version 1.6

Santiago Pericas-Geertsen

Paul Sandoz

April 12, 2005


Table of Contents

Introduction
Using Fast Infoset
Base64 Data
Fast Infoset Documents as Attachments
Web Services Security
Performance
Conclusions
References

Introduction

The Fast Infoset specification (ITU-T Rec. X.891 | ISO/IEC 24824-1) describes an open, standards-based "binary XML" format that is based on the XML Information Set [XML Information Set]. JWSDP 1.6 now supports this optimized encoding as part of JAX-RPC 1.1.3. For ease of deployment, this new version of JAX-RPC also supports a form of HTTP content negotiation that can be used to "turn on" Fast Infoset during message exchanges. By default, the Fast Infoset encoding is turned off. For more information on how to use this feature, see the section called “Using Fast Infoset”.

The XML Information Set specifies the result of parsing an XML document, referred to as an XML infoset (or simply an infoset), and a glossary of terms to identify infoset components, referred to as information items and properties. An XML infoset is an abstract model of the information stored in an XML document; it establishes a separation between data and information in a way that suits most common uses of XML. In fact, several of the concrete XML data models are defined by referring to XML infoset items and their properties. For example, SOAP Version 1.2 [SOAP 1.2] makes use of this abstraction to define the information in a SOAP message without ever referring to XML 1.X, and the SOAP HTTP binding specifically allows for alternative media types that "provide for at least the transfer of the SOAP XML Infoset".

The Fast Infoset specification is jointly standardized at the ITU-T and ISO. As of this writing, the ISO Final Committee Draft ballot has been completed, and the specification has gone for Consent to Last Call at the ITU-T Study Group 17 meeting in Moscow, 30 March - 8 April 2005. The specification is available to all ITU-T sector members and can also be obtained via the corresponding ISO national body in your location. These specifications recommend the use of the MIME type application/fastinfoset, which has recently been approved by the Internet Engineering Steering Group (IESG) for documents serialized using this format.

Fast Infoset @ Java.net [Fast Infoset @ Java.net] is an open-source project initiated by Sun Microsystems to provide access to a fast, fully-featured and robust implementation of the Fast Infoset specification. JAX-RPC 1.1.3 employs the basic Fast Infoset parsers and serializers that are available from that project.

Using Fast Infoset

Content negotiation is completely driven by the client and uses the standard HTTP headers Accept and Content-Type. The initial request is always encoded in XML, but the client has the option of including the MIME type application/fastinfoset as part of the HTTP Accept header list. If the request is received by a Fast Infoset-enabled service, the reply will be encoded in Fast Infoset. The remainder of the conversation between the client and the service will also be encoded in Fast Infoset as long as the client continues to use the same artifact (e.g., the same stub instance) to converse with the server. We call this form of negotiation pessimistic, in contrast to the optimistic case, in which a client directly initiates a message exchange using the more efficient encoding.
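The exchange described above can be modeled as a small state machine. The following is only an illustrative sketch, not the JAX-RPC implementation: the class and method names are hypothetical, text/xml is assumed as the XML media type, and the Accept header is matched with a plain substring test rather than a full HTTP header parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative model of pessimistic content negotiation (all names are hypothetical). */
final class PessimisticNegotiationSketch {
    static final String XML = "text/xml";                 // assumed XML media type
    static final String FI = "application/fastinfoset";   // MIME type named in the text

    /** Client-side state; one instance stands in for one stub instance. */
    static final class Client {
        private boolean serverSpeaksFastInfoset = false;

        /** Headers for the next request: XML until the server has replied in Fast Infoset. */
        Map<String, String> nextRequestHeaders() {
            Map<String, String> headers = new LinkedHashMap<>();
            headers.put("Content-Type", serverSpeaksFastInfoset ? FI : XML);
            headers.put("Accept", FI + ", " + XML);
            return headers;
        }

        /** Remember that the server answered in Fast Infoset. */
        void onResponse(String responseContentType) {
            if (FI.equals(responseContentType)) {
                serverSpeaksFastInfoset = true;
            }
        }
    }

    /** Service side: reply in Fast Infoset only if enabled and the client accepts it. */
    static String replyContentType(boolean fastInfosetEnabled, Map<String, String> requestHeaders) {
        String accept = requestHeaders.getOrDefault("Accept", "");
        return fastInfosetEnabled && accept.contains(FI) ? FI : XML;
    }

    public static void main(String[] args) {
        Client client = new Client();
        for (int i = 1; i <= 3; i++) {
            Map<String, String> request = client.nextRequestHeaders();
            String reply = replyContentType(true, request);
            System.out.println("request " + i + ": " + request.get("Content-Type")
                    + " -> reply: " + reply);
            client.onResponse(reply);
        }
    }
}
```

In this model only the first request pays the cost of XML; a service that is not Fast Infoset-enabled simply keeps answering in XML, and the client never switches.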

Content negotiation can be enabled in two different ways: (i) by setting a system property on the VM used to run the client, and (ii) by setting a property on the static/dynamic stub or DII instance of Call used by the client. In either case, both the property name and its value are identical.

A system-wide property can be set using the -D option of the Java command, as shown below:

Example 1. Setting the System Property

        java -Dcom.sun.xml.rpc.client.ContentNegotiation=pessimistic ...

The following example shows how to set the property on a stub:

Example 2. Setting Content Negotiation on a Stub

        stub = ...;    // Obtain reference to stub
        stub._setProperty(
             com.sun.xml.rpc.client.StubPropertyConstants.CONTENT_NEGOTIATION_PROPERTY,
             "pessimistic");     

These properties accept one of two values: none and pessimistic, with the former set as the default. If the client uses the Dynamic Invocation Interface (DII), the same property can be set as follows:

Example 3. Setting Content Negotiation on an instance of Call

        call = ...;    // Obtain reference to call
        call.setProperty(
             com.sun.xml.rpc.client.dii.CallPropertyConstants.CONTENT_NEGOTIATION_PROPERTY,
             "pessimistic");     

Even though a different Java constant was used in the DII case, the actual name of the property is the same as in the previous two examples (i.e., com.sun.xml.rpc.client.ContentNegotiation).
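Because the property name is a plain string, the system-wide variant can in principle also be set from code with the standard System.setProperty call instead of the -D option. The sketch below assumes that the JAX-RPC runtime reads the system property after it has been set, which this document does not confirm; the class and its helper methods are hypothetical, and only the property name and its two values come from the text.

```java
/** Illustrative helper around the content negotiation system property (hypothetical class). */
final class ContentNegotiationPropertySketch {
    // Property name as given in the text.
    static final String PROPERTY_NAME = "com.sun.xml.rpc.client.ContentNegotiation";

    /** Accept only the two values the text documents: "none" and "pessimistic". */
    static String validate(String value) {
        if ("none".equals(value) || "pessimistic".equals(value)) {
            return value;
        }
        throw new IllegalArgumentException("Unsupported content negotiation value: " + value);
    }

    /** Programmatic counterpart of passing -D on the java command line. */
    static void setSystemWide(String value) {
        System.setProperty(PROPERTY_NAME, validate(value));
    }

    /** Value currently in effect, falling back to the documented default "none". */
    static String effectiveValue() {
        return System.getProperty(PROPERTY_NAME, "none");
    }

    public static void main(String[] args) {
        System.out.println("before: " + effectiveValue());
        setSystemWide("pessimistic");
        System.out.println("after:  " + effectiveValue());
    }
}
```

Setting the property on a stub or Call instance, as in the examples above, remains the more targeted choice when only some clients in the VM should negotiate.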

Base64 Data

Because XML is a textual format, binary blobs must be represented using character sequences before they can be embedded in an XML document. A popular encoding that permits this embedding is known as base64 encoding, and it corresponds to the XML Schema data type xsd:base64Binary. In a Web services toolkit that supports a binding framework, as is the case in JAX-RPC, a value of this type must be encoded before transmission and decoded before binding. The encoding and decoding process is expensive, with a cost that is linear in the size of the binary object.
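To make the overhead concrete, the following sketch uses the standard java.util.Base64 class (available since Java 8, so it postdates the toolkit discussed here) to encode and decode a blob. It is not the code path used by JAX-RPC; the class name and the 3000-byte blob are illustrative assumptions. Base64 maps every 3 input bytes to 4 output characters, so the textual form is about one third larger, and both directions must touch every byte.

```java
import java.util.Arrays;
import java.util.Base64;

/** Illustrative look at the size cost of base64 (hypothetical class name). */
final class Base64OverheadSketch {

    /** Length of the padded base64 text for rawLength input bytes: 4 chars per 3-byte group. */
    static int encodedLength(int rawLength) {
        return 4 * ((rawLength + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] blob = new byte[3000];              // stand-in for a binary payload
        for (int i = 0; i < blob.length; i++) {
            blob[i] = (byte) i;
        }

        // Encoding step needed before the blob can be embedded in XML text.
        String text = Base64.getEncoder().encodeToString(blob);

        // Decoding step needed before the value can be bound on the receiving side.
        byte[] decoded = Base64.getDecoder().decode(text);

        System.out.println("raw bytes:     " + blob.length);
        System.out.println("base64 chars:  " + text.length());
        System.out.println("predicted:     " + encodedLength(blob.length));
        System.out.println("round trip ok: " + Arrays.equals(blob, decoded));
    }
}
```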

JAX-RPC 1.1.3 provides support for the so-called Fast Infoset base64 encoding algorithm. This is one of several built-in encoding algorithms in Fast Infoset that facilitate the encoding of character chunks (or text nodes) using a more efficient representation. More specifically, in the case of values of type xsd:base64Binary, the byte sequences that are used to represent these blobs in memory can be parsed and serialized directly without the need for any encoding or decoding step. In JAX-RPC 1.1.3, this optimization is supported only for element content of type xsd:base64Binary; it is currently not supported for attribute values.

The SOAP Message Transmission Optimization Mechanism [MTOM], paired with the XML-binary Optimized Packaging [XOP], was proposed to address the inefficiencies related to the transmission of binary data in SOAP documents. This solution proposes a method in which XML messages are dissected in order to transmit binary blobs as MIME attachments in a way that is transparent to the application. It is worth pointing out that, in order to preserve full infoset fidelity, this transformation is restricted to base64 content in canonical form as defined in [XSD Datatypes].

Conceptually, Fast Infoset and MTOM/XOP are similar enough with respect to the management of binary data. In particular, supporting either of them requires Web services toolkits to be updated to avoid base64 encoding and decoding during parsing and data binding. We believe Fast Infoset's solution is not only much simpler to implement (it is a binary format after all!) but is likely to result in even better performance because there is no need to use a packaging technology such as MIME.

Fast Infoset Documents as Attachments

JAX-RPC supports the encoding of procedure call parameters and return values as MIME attachments. A SOAP message package with attachments is constructed using the MIME multipart/related type. This is typically accomplished by employing a MIME binding in the WSDL file. For example, the following snippet shows how to bind the addPhoto operation so that it returns the status part as a MIME attachment.

Example 4. Binding a part to a MIME attachment

        <wsdl:operation name="addPhoto">
            <wsdl:input>
                ...
            </wsdl:input>
            
            <wsdl:output>
                <mime:multipartRelated>
                    <mime:part>
                        <soap:body use="literal"/>
                    </mime:part>
                    <mime:part>
                        <!-- Use application/fastinfoset to indicate an FI attachment -->
                        <mime:content part="status" type="application/fastinfoset"/>
                    </mime:part>
                </mime:multipartRelated>
            </wsdl:output>
        </wsdl:operation>

This binding indicates not only that the status part is bound to a MIME part but also that the type of this attachment is application/fastinfoset, that is, a Fast Infoset document.

JAX-RPC uses the JavaBeans Activation Framework to support various MIME content types. The WSDL/XML to Java mapping in JAX-RPC will map the status part defined above to an instance of javax.activation.DataHandler whose content is of type org.jvnet.fastinfoset.FastInfosetSource. Thus, the return statement of the addPhoto operation would create an instance of DataHandler as shown below.

Example 5. Fast Infoset Attachments in Java

        public javax.activation.DataHandler addPhoto(...)
            throws java.rmi.RemoteException 
        {
            ...
            return new DataHandler(
                new FastInfosetSource(...), "application/fastinfoset");
        }

Please refer to the [JAX-RPC 1.1 Specification] for more information on how to use MIME attachments in JAX-RPC and to [Fast Infoset @ Java.net] for more information on how to use Fast Infoset sources and results. The example used in this section is available as part of the set of JAX-RPC samples in $JWSDP_HOME/jaxrpc/samples.

Web Services Security

JWSDP 1.6 also includes support for XML and Web Services Security (XWSS). This library is integrated into JAX-RPC and can be used to define message-level security in Web services conversations. Message-level security has a number of advantages over transport-level solutions in that security information, as an intrinsic part of a message, can persist over the lifetime of a connection. In addition, XWSS supports features such as signing, which are not commonly supported in transport-level protocols.

Even though it is conceivable to modify the XWSS library to support Fast Infoset natively (e.g., by extending it to support a canonical version of Fast Infoset), we have decided against it at this point due to a lack of (i) interoperable standards and (ii) empirical evidence showing a significant performance improvement in switching to a binary encoding. Research in this area is currently under way at Sun as well as at the ITU-T and ISO study groups, so readers should anticipate more information about these research directions.

Conceptually, the operation of securing a message can be regarded as an infoset transformation, which is guided by certain configuration parameters defined by the application's developer. In this sense, securing a message takes place after its infoset is produced and before it is serialized; conversely, unsecuring a message takes place after a message is parsed and before it is consumed. Therefore, Fast Infoset can still be used as the on-the-wire encoding without affecting these two pre-serialization and post-parsing steps. [1] Note that this approach still requires the use of an XML parser and an XML serializer (or XML canonicalizer) during the message securing and unsecuring phases. In particular, an XML parser is required to process decrypted fragments and an XML canonicalizer is required to both generate and verify signatures.

Performance

As the saying goes, when it comes to performance, "your mileage will vary." Fast Infoset is designed to optimize parsing and serialization, so the key to understanding the potential gains associated with using this technology is understanding the percentage of time your application spends in these two tasks. The greater the percentage, the greater the improvement will be.
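This relationship can be estimated with Amdahl's law: if a fraction p of the total time is spent in parsing and serialization, and that part becomes s times faster, the overall speedup is 1 / ((1 - p) + p / s). The sketch below is a back-of-the-envelope aid rather than a measurement; the class name and the sample values of p and s are illustrative assumptions, not results from this document.

```java
/** Illustrative Amdahl's-law estimate of end-to-end speedup (hypothetical class name). */
final class SpeedupEstimateSketch {

    /**
     * @param codecFraction fraction of total time spent parsing and serializing, in [0, 1]
     * @param codecSpeedup  how many times faster parsing and serialization become, > 0
     * @return estimated overall speedup of the application
     */
    static double overallSpeedup(double codecFraction, double codecSpeedup) {
        if (codecFraction < 0.0 || codecFraction > 1.0) {
            throw new IllegalArgumentException("codecFraction must be in [0, 1]");
        }
        if (codecSpeedup <= 0.0) {
            throw new IllegalArgumentException("codecSpeedup must be positive");
        }
        // Time not spent in the codec is unchanged; time in the codec shrinks by codecSpeedup.
        return 1.0 / ((1.0 - codecFraction) + codecFraction / codecSpeedup);
    }

    public static void main(String[] args) {
        double[] fractions = {0.10, 0.50, 0.90};   // illustrative values only
        double codecSpeedup = 5.0;                 // illustrative value only
        for (double p : fractions) {
            System.out.println("fraction " + p + " -> overall speedup "
                    + overallSpeedup(p, codecSpeedup));
        }
    }
}
```

An application that spends little of its time in parsing and serialization sees an overall speedup close to 1 no matter how fast the codec becomes, which is why profiling should come before adopting the encoding.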

As part of the source code available from the Fast Infoset @ Java.net project [Fast Infoset @ Java.net], there is a tool called Japex that we have used to write a number of different micro-benchmarks for our Fast Infoset implementation. All these performance reports are available from the project's Web page. To summarize, parsing micro-benchmarks show an average improvement of 3-10X depending on the XML parser in question, and JAX-RPC micro-benchmarks show an improvement of 2-5X depending on the structure of the messages exchanged, with the higher improvements achieved when base64 encoding is avoided.

Conclusions

Fast Infoset @ Java.net [Fast Infoset @ Java.net] is still a young project, and the performance potential of this technology is yet to be fully explored (especially in relation to advanced features like encoding algorithms, restricted alphabets and external vocabularies). The integration of Fast Infoset into JAX-RPC 1.1.3 is just a first step towards understanding the benefits of using this technology to speed up Web services. Future versions of JAX-RPC and JWSDP will provide support for many more of the advanced features defined in the Fast Infoset specification.

Your experience and feedback using this technology are important! Please refer to the Java.net project page for information about becoming a participant in the Fast Infoset project.

References



[1] Care should be exercised if, in a given implementation, encoding algorithms do not preserve a datatype's lexical space because this may result in errors during the signature verification process.