New Technologies in Web Development: Fast XSLT

Introduction

This article reviews the birth and development of the promising compiled-XSLT engine, Apache XSLTC, and the fierce competition among developers of XSLT engines to be the performance leader.

I Had A Dream

All great revolutions begin with one person, one dream, one idea, and one vision. And such is the case with Jacek Ambroziak: originator, visionary, and father of the Apache XSLTC project. Exactly one year ago Jacek collected together his thoughts and personal perspective as he departed the project in a document I refer to as the XSLTC memoir. Like all noteworthy breakthroughs, this one was inspired by the serendipitous convergence of three bodies of knowledge: the Java Virtual Machines, Compiler Theory, and XML transformation -- which all came together for Jacek one dark and stormy night back in early 2000. From there, the project was born at Sun Microsystems' XML Technology Center. And life was good, for a while.

To Compile or Not To Compile

An ancient question, stretching back to nearly the dawn of computer science, has been, to compile or interpret? Procedural languages have long explored this issue especially with the C and C++ language compilers that have dominated the last few decades. Java is both compiled and interpreted, depending upon how hard you look at it.

Does compilation always improve performance significantly? No. For any given problem, this is called the "Compile Conjecture." In a nutshell the answer is in the testing: if when all is said and done, the final optimized XSLT engine, or whatever software product, spends only 10% of its CPU time in the generated compiled code and 90% in generic algorithm code, then the increased complexity by using the compiler tactic is probably not worth it.

One of the oldest examples of "problem to compiled code" is the Unix tool lex, which is basically a regular expression generator creating optimized C code on the fly to fulfill the task of transforming a byte stream into a token stream. In this situation all possible efficiency is useful due to handling a large number of bytes. So the key question to ask is what type of software problems benefit from compilation, and does XSLT fit into this model?

Overview of Apache XSLTC

Apache XSLTC began in early 2000 at Sun Microsystems, first with Jacek Ambroziak and eventually turning into a small team. XSLTC is most significantly differentiated from other transformer engines by its distinctive two-phase operation. The XSL stylesheet is first compiled into a Java class called a "translet" ("transformer servlet", not related to a web application servlet). This stand-alone translet can be loaded as a Java class and run in a different process or even on an entirely different machine. The core translet accepts a DOM-like object for input and produces a SAX-like output stream. Adaptors are provided to convert to all conventional standards.

While XSLTC demonstrates a significant speed advantage over other transformers (300% to 500% over Xalan-J), it is still a young product and has limited range of compliance and a limited feature set. However, its true claim to fame may be its small memory footprint. While this is an advantage in space limited situations, it could potentially be a deciding factor in large high-volume web applications where sometimes thousands of requests are fulfilled simultaneously. On a per-thread basis, a translet can easily consume one-tenth the space of a conventional XSLT engine. This characteristic mitigates a major concern to date of using any XSL technology in high-volume sites.

Few generic products can easily gain acceptance as a proprietary offering from a commercial firm. Sun Microsystems made the decision to donate this software to the Apache Xalan effort in hopes of it seeing wider acceptance and usage. Without this decision, XSLTC would have probably become just a footnote in XSLT history. XSLTC is presently bundled with Xalan-J, yet is still a clearly individual component. Both Sun and IBM contribute key engineering resources to this project.

Gregor: A Second Generation Design

But Jacek Ambroziak has wiped the plate completely clean and started a new XSLT engine, which he calls "Gregor", at his new firm Ambrosoft. As Jacek says, "In the case of XSLTC (and especially Gregor) it is not only compilation but sets of particular algorithmic ideas that matter a lot for performance (speed/memory)".

In early testing, Gregor is tentatively showing nearly ten times the performance of Xalan-J and twice that of XSLTC; but it is not a complete product yet. Benchmarks can be misleading (See the "Metrics of Metrics" section below).

In the end, the question always remains, "are their sufficiently radical and advantageous architectural approaches that can improve performance yet unexplored and untapped?" In the case of XSLT engines, the answer is "probably", since it's a very young field.

To Be King

The race among XLST engines is not an easy one. There are many disjoint objectives: performance, quality, compliance, small size, etc. It is not always clear which architectural strategy will produce the best results.

Developing Performance and Superiority

Each XSLT engine tends to discover key algorithms and data structures that have produced superior results. Slowly, over time, the various projects assimilate each other's tactics. And yet they also break new ground in order to differentiate themselves, much like any competitive software field.

Compilation to bytecode is the key approach being discussed here. However, Michael Kay, author of Saxon, believes other tactics such as tree-rewriting, tree-building, sorting, memory allocation, and in general optimization at the semantic level (that is, rewriting the stylesheet) have a much greater impact on performance.

Xalan-J pioneered the usage of an internal data structure called DTM (Document Table Model); XSLTC will soon use this same tactic, achieving more commonality between the two Apache products.

XSLT is a rapidly moving target. An XSLT transformer is dependent upon numerous additional standards such as XPath, W3C XML Schema, XML Namespaces, EXSLT, DOM, SAX, and potentially other proprietary APIs. While in its pure form XSLT seems theoretically sufficient and complete, in practice extensions such as those provided by the community extension project EXSLT are essential. All these factors make it difficult to be an XSLT contender. A project must be under continual development and evolution or otherwise will lose its edge and fall behind.

Hardware Acceleration

Still need more horsepower? You are not alone. DataPower has been pursuing the high throughput XSLT market for some time: first with their just-in-time compiled XSLT product, XSLJIT, and more recently with a dedicated hardware accelerator called XA35. I spoke in depth with a Eugene Kuznetsov about this product, and tried to squeeze out of him the secrets of why dedicated hardware makes such a difference. He was pretty tight lipped, but I have a sneaky suspicion that associative memory, or something similar not found in the classic von Neumann CPU architectures, may play a key role. In any case, if indeed XSLT processing is significantly accelerated by dedicated hardware, we may see a day where specialized XSLT coprocessors are designed into CPUs much like math coprocessors. In the mean time, such hardware accelerators are viable solutions for the most challenging situations.DataPower also supports a Java interface that handles the remote communication allowing the accelerator to appear as a simple software component.

The Metric of Metrics

Benchmarks, especially XSLT benchmarks, are fuzzy because usage is a complex issue. The key question is what constitutes typical usage. Benchmark programs are characteristically composed of endlessly detailed case situations and caveats making clear comparisons nearly impossible. The state of affairs is reminiscent of earlier procedural compiler benchmark comparisons, where the general trends seem clear but details are debatable.

One further element that adds complexity is the separation of XML parsing and serialization processing from XSL transformation processing. While this is fairly straightforward to separate out if all engines use a DOM object for input, what if a particular engine exhibits better performance using SAX? Should this transformer be assigned a DOM to SAX conversion time penalty?

Finally, the reported benchmarks below are only for performance in time (that is, speed or throughput) and make no reference to footprint, breadth of compliance, or other characteristics.

A summary of the dominant XSLT engine, plus the new Gregor, is shown below. All performance numbers are normalized to be 100% for Xalan-J. Much of this data is obtained from the particular XSLT development team itself. It is expected that the data is biased toward their own product because the teams have, inadvertently or otherwise, adjusted the test suite to emphasis the features and functionality more in line with their perception of the customer base and usage. And, further, they've optimized their product accordingly. Basically, this is a Darwinian divergence in progress and the correctness of their assumptions will directly influence the market acceptance.

Relative Performance	Apache Xalan-J	Saxon	Apache XSLTC	Gregor	XA35
Authors, Developers, Support Team	Sun + IBM	Michael Kay	Sun + IBM	Ambrosoft	DataPower
Key Architectural Characteristics	Document Table Model	Sax-Centric Tiny-Trees	Compiled to Byte-Code	Pattern matching meta-algorithm	Dedicated and Specialized Hardware
Performance
Apache (Using DataPower XSLTMark++)	100%	na	360%	na	na
Ambrosoft (Using DataPower XSLTMark)	100%	130%	600%	1600%	na
Sarvega XSLT Benchmark Study	100%	200%	na	na	na
DataPower XSLTMark	100%	200%	na	na	2100%

Conclusion

Compilation to bytecode is just one of many tactics being utilized in the XSLT performance race. It is possible that additional significant gains will also be obtained in other areas such as intelligent pre-optimization of stylesheets, clever internal data structures, or even a more hot-spot like run-time optimizer. Most likely the field will remain competitive, each product achieving improvements and gains on a year-to-year basis, new contenders coming onto the scene, and some old ones slipping off.

Performance is difficult to quantify and predict and highly dependent upon the exact customer usage. For performance in critical situations it is recommended that a small handful of the dominant engines be tested in a particular application before a final decision is made.