
Current XML APIs for streaming parsing

The XML 1.0 specification [6] was adopted as a W3C Recommendation in February 1998. A number of programming APIs have since been developed to parse XML. These include SAX, DOM, JDOM, DOM4J, libxml, RXP and NanoXML. Among these, SAX has emerged as the de facto standard for event-based XML parsing. It was developed as a collaborative effort by the members of the XML-DEV mailing list. The current version, SAX2, supports XML namespaces [3] and filter chains, and lets applications query and set features and properties in the parser.
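To make the event-based (push) model concrete, here is a minimal sketch that uses the SAX2 classes bundled with the JDK. The class name SaxPushExample and the helper startTags are illustrative names, not part of SAX. The sketch sets the standard SAX2 namespaces feature on the reader and records the namespace-qualified name of every start tag that the parser pushes to the handler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SaxPushExample {
    // Illustrative helper: returns "{namespaceURI}localName" for each start tag.
    public static List<String> startTags(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        // SAX2 lets the application set (and query) parser features by URI.
        reader.setFeature("http://xml.org/sax/features/namespaces", true);

        final List<String> seen = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The parser calls us; the application does not drive the loop.
                seen.add("{" + uri + "}" + localName);
            }
        });

        // parse() returns only after the whole document has been pushed.
        reader.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a xmlns='urn:x'><b/><c xmlns='urn:y'/></a>";
        System.out.println(startTags(xml));
    }
}
```

Note that control stays inside parse() until the document ends. The application can only react to callbacks, which is the property that pull parsing inverts.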

Some APIs for pull parsing exist, but none is in widespread use. For example, kXML [9] is a pull parser designed specifically for small devices, and it must be used with Java 2 Micro Edition. It provides a simple API; however, the API and the implementation are tied together. With kXML it is easy to create an XML object tree in memory. It is harder to achieve streaming performance, because a new object is created and returned to the user for every event, even when the user only wants to skip parts of the XML.

Xerces Pullable is a pull parser API that is used by Xalan. It follows a pullable SAX model, and the versions for Xerces 1 and Xerces 2 differ from each other. An application asks the Xerces parser [2] to parse only a portion of the input, and the parser stops as soon as it has invoked a SAX callback. The application can then ask the parser to continue whenever it needs more input. This is a very good approach for applications that have already invested in SAX infrastructure but want more control over parsing. However, this API is not standardized, and a more clearly defined pull API is desirable. That is exactly the intent of the XPP2 API, described in later sections: it allows the Xerces 2 Native Interface (XNI) to be used through an API designed specifically for XML pull parsing.
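The control flow of a pullable SAX model can be illustrated with a small sketch built only on the JDK. This is not the Xerces Pullable API and does not show how Xerces implements it. As a stand-in technique, it runs an ordinary SAX parser in a background thread and blocks that thread after every callback until the application asks for the next event. The class PullOverSax, its next() method and the event strings are all hypothetical names chosen for this example.

```java
import java.io.StringReader;
import java.util.concurrent.SynchronousQueue;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical adapter: exposes SAX callbacks one at a time through next().
public class PullOverSax {
    public static final String END = "END_DOCUMENT";

    // A SynchronousQueue has no capacity: put() blocks until take() is called,
    // so the parser thread pauses after each callback.
    private final SynchronousQueue<String> handoff = new SynchronousQueue<>();
    private volatile Exception failure;
    private boolean done;

    public PullOverSax(final String xml) {
        Thread worker = new Thread(() -> {
            try {
                SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String local,
                                                 String qName, Attributes a) {
                            hand("START " + qName);
                        }
                        @Override
                        public void endElement(String uri, String local,
                                               String qName) {
                            hand("END " + qName);
                        }
                    });
            } catch (Exception e) {
                failure = e;   // reported to the caller by next()
            } finally {
                hand(END);     // always signal the end of the stream
            }
        });
        worker.setDaemon(true); // do not keep the JVM alive if abandoned
        worker.start();
    }

    private void hand(String event) {
        try {
            handoff.put(event);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Resumes the parser just long enough to produce one more event.
    public String next() throws InterruptedException {
        if (done) {
            return END;
        }
        String event = handoff.take();
        if (event.equals(END)) {
            done = true;
            if (failure != null) {
                throw new IllegalStateException("parse failed", failure);
            }
        }
        return event;
    }

    public static void main(String[] args) throws Exception {
        PullOverSax parser = new PullOverSax("<a><b/><c/></a>");
        for (String e = parser.next(); !e.equals(END); e = parser.next()) {
            System.out.println(e);
        }
    }
}
```

The application, not the parser, now decides when parsing advances, and existing SAX handlers can be reused. The cost is an extra thread and a handoff per event, which is one reason a purpose-built pull API is attractive.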

GNOME C libxml [4] and XMLIO [7] are further examples of XML parsers. GNOME C libxml is not a streaming parser, as it only produces a list of events that can then be traversed. XMLIO allows data to be pulled from XML, but its API is designed specifically for unmarshalling data structures.


Aleksander Andrzej Slominski
2002-02-10