Processing XML Streams with Deterministic Automata (2003)

by Todd J. Green, Gerome Miklau, Makoto Onizuka, Dan Suciu
Citations:107 - 3 self

BibTeX

@INPROCEEDINGS{Green03processingxml,
    author = {Todd J. Green and Gerome Miklau and Makoto Onizuka and Dan Suciu},
    title = {Processing XML Streams with Deterministic Automata},
    booktitle = {},
    year = {2003},
    pages = {173--189}
}


Abstract

We consider the problem of evaluating a large number of XPath expressions on an XML stream. Our main contribution is to show that Deterministic Finite Automata (DFA) can be used effectively for this problem: in our experiments we achieve a throughput of about 5.4MB/s, independent of the number of XPath expressions (up to 1,000,000 in our tests). The major problem we face is the size of the DFA. Since the number of states grows exponentially with the number of XPath expressions, it was previously believed that DFAs cannot be used to process large sets of expressions. We give a theoretical analysis of the number of states in the DFA resulting from XPath expressions, and consider both the case when it is constructed eagerly and the case when it is constructed lazily. Our analysis indicates that, when the automaton is constructed lazily, and under certain assumptions about the structure of the input XML data, the number of states in the lazy DFA is manageable. We also validate our findings experimentally, on both synthetic and real XML data sets.
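
The lazy construction mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `LazyDFA`, the function `run`, the restriction to linear paths built from `/`, `//`, element names and `*`, and the event-list input format are all assumptions made for this illustration. Each DFA state is a set of NFA states, and a transition is computed by subset construction only the first time the stream actually needs it.

```python
import re


class LazyDFA:
    """Lazily determinized automaton for a set of linear XPath paths (illustrative)."""

    def __init__(self, xpaths):
        # Split each path into (axis, name-test) steps, e.g. "/a//b" -> [("/", "a"), ("//", "b")].
        self.queries = [re.findall(r"(//|/)([^/]+)", xp) for xp in xpaths]
        # An NFA state is (query index, number of steps matched so far).
        self.start = frozenset((q, 0) for q in range(len(self.queries)))
        # (dfa_state, tag) -> dfa_state; filled in on demand, never eagerly.
        self.trans = {}

    def step(self, state, tag):
        key = (state, tag)
        if key not in self.trans:
            nxt = set()
            for q, i in state:
                steps = self.queries[q]
                if i == len(steps):
                    continue  # accepting NFA state: no outgoing edges
                axis, name = steps[i]
                if name in ("*", tag):
                    nxt.add((q, i + 1))  # this element satisfies the next step
                if axis == "//":
                    nxt.add((q, i))  # descendant axis: may also skip this element
            self.trans[key] = frozenset(nxt)
        return self.trans[key]

    def matches(self, state):
        # Indices of the queries that accept in this DFA state.
        return sorted(q for q, i in state if i == len(self.queries[q]))


def run(dfa, events):
    """Feed ("start", tag) / ("end", tag) events; return the matches per start tag."""
    stack = [dfa.start]  # one DFA state per currently open element
    hits = []
    for kind, tag in events:
        if kind == "start":
            stack.append(dfa.step(stack[-1], tag))
            hits.append((tag, dfa.matches(stack[-1])))
        else:
            stack.pop()
    return hits


# Events for <a><b/><d><b/><c/></d></a>
events = [("start", "a"), ("start", "b"), ("end", "b"), ("start", "d"),
          ("start", "b"), ("end", "b"), ("start", "c"), ("end", "c"),
          ("end", "d"), ("end", "a")]
dfa = LazyDFA(["/a/b", "//b", "/a/*/c"])
hits = run(dfa, events)
```

In this sketch only the transitions that the document exercises are ever stored in `dfa.trans`, so the table size depends on the structure of the data seen so far, not on the full exponential subset construction. Processing a second document with the same structure reuses the cached transitions and adds none.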
