The Parallel XML Parser for Multicore Systems


Overview

A language for semi-structured documents, XML has emerged as the core of the web services architecture and plays crucial roles in messaging systems, databases, and document processing. However, XML processing has a reputation for poor performance. A number of optimizations have been developed to address this problem from different perspectives, but none has been entirely satisfactory. PXP takes a novel approach: parallel XML parsing. PXP leverages the growing prevalence of multicore architectures in all sectors of the computer market and yields significant performance improvements. PXP consists of an initial preparsing phase that determines the structure of the XML document, followed by a full, parallel parse. The results of the preparsing phase are used to partition the XML document for data-parallel processing. Our parallel parsing phase is a modification of the libxml2 XML parser, which shows that our approach applies to real-world, production-quality parsers. Our empirical study shows that our parallel XML parsing algorithm improves XML parsing performance significantly and scales well.
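The two-phase idea (preparse to find structure, then partition and parse the pieces in parallel) can be illustrated with a small sketch. This is not the PXP implementation, which modifies libxml2; it is a toy under stated assumptions. The function names (`preparse`, `partition`, `parse_chunk`, `parallel_parse`) are hypothetical, the scanner assumes the input has no comments, CDATA sections, or `>` characters inside attribute values, and the fragments are assumed not to depend on namespace declarations from the root element. Threads are used only to keep the example short; in CPython they will not give a real CPU speedup.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def preparse(doc):
    """Lightweight scan: return (start, end) offsets of each child of the root.

    Toy scanner (assumption): no comments, CDATA, or '>' inside attribute values.
    """
    spans, depth, pos, start = [], 0, 0, None
    while True:
        lt = doc.find("<", pos)
        if lt == -1:
            break
        gt = doc.find(">", lt)
        if gt == -1:
            raise ValueError("unterminated tag")
        tag = doc[lt:gt + 1]
        pos = gt + 1
        if tag.startswith(("<?", "<!")):
            continue                      # skip XML declaration / DOCTYPE
        if tag.startswith("</"):
            depth -= 1
            if depth == 1:                # just closed a child of the root
                spans.append((start, gt + 1))
        elif tag.endswith("/>"):
            if depth == 1:                # self-closing child of the root
                spans.append((lt, gt + 1))
        else:
            if depth == 1:                # opening a child of the root
                start = lt
            depth += 1
    return spans


def partition(spans, n_parts):
    """Split the spans into at most n_parts contiguous groups of similar count."""
    if not spans:
        return []
    size = -(-len(spans) // n_parts)      # ceiling division
    return [spans[i:i + size] for i in range(0, len(spans), size)]


def parse_chunk(doc, chunk):
    """Fully parse one contiguous group of sibling elements."""
    fragment = doc[chunk[0][0]:chunk[-1][1]]
    # Wrap in a synthetic root so the fragment is a well-formed document.
    return list(ET.fromstring("<chunk>" + fragment + "</chunk>"))


def parallel_parse(doc, n_workers=4):
    """Preparse, partition, then parse each partition concurrently."""
    chunks = partition(preparse(doc), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda c: parse_chunk(doc, c), chunks))
    return [element for part in parts for element in part]
```

Calling `parallel_parse(text, 2)` on a document string returns the root's child elements in document order, because `pool.map` preserves the order of its inputs. Partitioning by element count is the simplest choice here; balancing by byte size would be closer in spirit to a static load-balancing scheme.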

Publications

Parallel XML Parsing Using Meta-DFAs
by Yinfei Pan, Kenneth Chiu, Ying Zhang and Wei Lu.
To appear in 3rd IEEE International Conference on e-Science and Grid Computing, Bangalore, India, 200

A Static Load-Balancing Scheme for Parallel XML Parsing on Multicore CPUs. [pdf]
by Yinfei Pan, Wei Lu, Ying Zhang and Kenneth Chiu.
CCGrid'07 (IEEE International Symposium on Cluster Computing and the Grid), 2007, Rio de Janeiro, Brazil.

A Parallel Approach to XML Parsing. [pdf]
by Wei Lu, Kenneth Chiu, and Yinfei Pan.
Grid'06 (The 7th IEEE/ACM International Conference on Grid Computing), 2006.

People

Wei Lu [HOME]
Kenneth Chiu @ SUNY Binghamton
Yinfei Pan @ SUNY Binghamton

Acknowledgments