
The Parallel XML Processing Library for Multicore Systems


Overview

XML has emerged as the de facto standard interoperable data format for web services, databases, and document processing systems. XML processing, however, has been recognized as a performance bottleneck in those systems, so demand for high-performance XML processing is growing rapidly. On the hardware front, multicore processors are increasingly available on desktop machines, with quad-core systems shipping now and 16-core systems expected within two or three years. Unfortunately, almost all present XML processing algorithms still use a sequential processing model and therefore cannot take advantage of multicore resources. We believe a parallel XML processing model is a cost-effective solution to the XML performance problem in the multicore era.

ParaXML is a parallel XML processing library, written in C#, designed for multicore CPUs. ParaXML adopts the data-parallel paradigm and a work-stealing load-balancing scheme. As a proof of concept, ParaXML currently implements the following modules:

  1. Parallel XML traversal and searching (Note: searching here refers to simple depth-first element search; complete parallel XPath search is planned)
  2. Parallel XML serialization and C14N
  3. Parallel XML signature processing
  4. Parallel XML pull-parsing & Parallel XML DOM building
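
To illustrate how data-parallel traversal and work stealing fit together, here is a minimal sketch of a parallel element search. It is not ParaXML's code or API: it is written in Python rather than C#, the function name parallel_count and its parameters are hypothetical, and because of Python's global interpreter lock it shows only the scheduling structure, not a real speedup. Each worker keeps its own deque of subtrees, takes its newest task first (which keeps its traversal depth-first), and steals the oldest task from another worker when it runs dry.

```python
import threading
import time
import xml.etree.ElementTree as ET
from collections import deque


def parallel_count(root, tag, num_workers=4):
    """Count elements named `tag` in the tree under `root`.

    Hypothetical helper for illustration only; not part of ParaXML.
    """
    deques = [deque() for _ in range(num_workers)]  # one task deque per worker
    counts = [0] * num_workers  # per-worker tallies, summed at the end
    pending = [1]               # subtrees that are queued or being processed
    lock = threading.Lock()     # protects `pending`
    deques[0].append(root)      # all work starts on worker 0

    def take(me):
        # Own deque first: newest task (LIFO end) keeps the walk depth-first.
        try:
            return deques[me].pop()
        except IndexError:
            pass
        # Otherwise steal the oldest task (FIFO end) from another worker.
        for offset in range(1, num_workers):
            victim = deques[(me + offset) % num_workers]
            try:
                return victim.popleft()
            except IndexError:
                continue
        return None

    def worker(me):
        while True:
            node = take(me)
            if node is None:
                with lock:
                    if pending[0] == 0:  # nothing queued or running anywhere
                        return
                time.sleep(0.0005)       # back off briefly, then retry
                continue
            if node.tag == tag:
                counts[me] += 1
            children = list(node)
            # Account for the children before publishing them, and retire
            # this node, so `pending` never reaches 0 while work remains.
            with lock:
                pending[0] += len(children) - 1
            deques[me].extend(children)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)


if __name__ == "__main__":
    doc = ET.fromstring("<a><b/><c><b/><d><b/></d></c></a>")
    print(parallel_count(doc, "b"))
```

Stealing from the opposite end of the victim's deque tends to hand over subtrees near the top of the document, so a thief usually receives a large piece of work per steal and contention on any one deque stays low.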

Measurement

Our empirical study shows that these parallel implementations substantially improve performance and scale well on multicore machines.

Publications

ParaXML: a Parallel XML Processing Model on Multicore CPUs
by Wei Lu and Dennis Gannon.
submitted

Parallel XML Processing by Work Stealing. [pdf] [ppt]
by Wei Lu and Dennis Gannon.
Workshop on Service-Oriented Computing Performance, in conjunction with HPDC'07, Monterey Bay, CA.

People

Wei Lu [HOME]
Dennis Gannon [HOME]

Acknowledgments