## Structured Parallel Computation in Structured Documents (1995)

Venue: Journal of Universal Computer Science

Citations: 8 (2 self)

### BibTeX

@ARTICLE{Skillicorn95structuredparallel,
    author = {D. B. Skillicorn},
    title = {Structured Parallel Computation in Structured Documents},
    journal = {Journal of Universal Computer Science},
    year = {1995},
    volume = {3},
    pages = {42--68}
}

### Abstract

Document archives contain large amounts of data to which sophisticated queries are applied. The size of archives and the complexity of evaluating queries make the use of parallelism attractive. The use of semantically-based markup such as SGML makes it possible to represent documents and document archives as data types. We present a theory of trees and tree homomorphisms, modelling structured text archives and operations on them, from which it can be seen that:

- many apparently unrelated tree operations are homomorphisms;
- homomorphisms can be described in a simple parameterised way that gives standard sequential and parallel implementations for them;
- special classes of homomorphisms have parallel implementations of practical interest.

In particular, we develop an implementation for path expression search, a novel, powerful query facility for structured text, that takes time logarithmic in the text size.

Keywords: structured text, categorical data type, software developme...
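The idea that a tree homomorphism is fixed by a small set of parameters can be illustrated with a short sketch. The code below is not taken from the paper: the names `Leaf`, `Node`, `tree_hom`, `length_in_chars`, and `mentions` are hypothetical, and the binary tree with text at every node is an assumed simplification of a structured document. It shows only the sequential reading of a homomorphism. The two recursive calls are independent of each other, which is the property a parallel implementation would exploit.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar, Union

R = TypeVar("R")


@dataclass
class Leaf:
    text: str  # text held at a leaf of the document tree


@dataclass
class Node:
    text: str  # text attached to an internal node, e.g. a heading
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def tree_hom(f_leaf: Callable[[str], R],
             g_node: Callable[[R, str, R], R],
             tree: Tree) -> R:
    """Evaluate the homomorphism determined by f_leaf and g_node.

    f_leaf maps the text of a leaf to a result; g_node combines the
    results of the two subtrees with the text of the node joining them.
    """
    if isinstance(tree, Leaf):
        return f_leaf(tree.text)
    # The two subtree evaluations do not depend on each other.
    left = tree_hom(f_leaf, g_node, tree.left)
    right = tree_hom(f_leaf, g_node, tree.right)
    return g_node(left, tree.text, right)


def length_in_chars(tree: Tree) -> int:
    """Total number of characters in the document (a property of extent)."""
    return tree_hom(len, lambda l, t, r: l + len(t) + r, tree)


def mentions(word: str, tree: Tree) -> bool:
    """True if the word occurs in the text of any node."""
    return tree_hom(lambda t: word in t,
                    lambda l, t, r: l or word in t or r,
                    tree)


# A tiny assumed document: a title with one paragraph and one section.
doc = Node("Title", Leaf("a dog"), Node("Sec", Leaf("cat"), Leaf("")))
print(length_in_chars(doc))  # 16
print(mentions("dog", doc))  # True
```

Two apparently unrelated operations, document length and word search, differ here only in the pair of functions passed to `tree_hom`, which is the sense in which homomorphisms are described "in a simple parameterised way".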

### Citations

440 | Algorithmic Skeletons: Structured Management of Parallel Computation
- Cole
- 1989
Citation Context ...sms [Col93], the composition of a homomorphism with a projection, and this is often of practical importance. Skeletons are an important approach to parallel computing, beginning from the work of Cole [Col89]. Homomorphic skeletons are particularly attractive because they come equipped with a useful transformation system, including a guarantee of expressive completeness [Ski94]. In the next section we rev...

242 | Data Parallel Algorithms
- Hillis, Steele
- 1986
Citation Context ...tions. 6 Parallel Search of Flat Text There is a well-known parallel algorithm for recognizing whether a given string is a member of a regular language in time logarithmic in the length of the string [5, 9]. This algorithm is naturally parallel and readily adapted to document search, even on quite modest parallel computers. Although it was described for the Connection Machine [9] it never seems to have ...

221 | An introduction to the theory of lists
- Bird
- 1987
Citation Context ...al data types generalise abstract data types by encapsulating not only representation of the type, but also the implementation of homomorphisms on it. This approach was pioneered by Bird and Meertens [Bir87] and its use for software development by transformation has come to be known as the Bird-Meertens Formalism. In object-oriented terms, the only methods available on constructed types are homomorphisms....

109 | Signature Files: An Access Method for Documents and Its Analytical Performance Evaluation
- Faloutsos, Christodoulakis
- 1984
Citation Context ...varying between individuals). It is usually implemented using indexing. Parallelism can be used by partitioning the index. 2. Search on full text. This is implemented using a signature file technique [4, 22, 23] or special purpose hardware [10, 11]. Parallelism can be used by partitioning the signature file. 3. Search on non-hierarchical tagged regions. This is a more expressive variant of full text search i...

92 | Architecture-independent parallel computation
- Skillicorn
- 1990
Citation Context ...ition takes no longer than linear in the number of states of the automaton. The reduction itself takes time logarithmic in the size of the string being searched on a variety of parallel architectures [18]. The regular language recognition problem is easily adapted for query processing. Suppose we wish to determine if some regular expression, RE, is present in an input string. This regular expression d...

87 | A Simple Parallel Tree Contraction Algorithm
- Abrahamson, Dadoun, et al.
- 1989
Citation Context ...W PRAM and more practical architectures such as the hypercube, tree reduction can be computed in time logarithmic in the number of nodes of the tree, subject to some mild conditions on the function g [1, 16], described in Appendix A. This is a big improvement over the method suggested above, since a completely left or right branching tree with n nodes requires time proportional to n to reduce directly, w...

84 | The Art of Computer Programming, Volume I: Fundamental algorithms
- Knuth
- 1973
Citation Context ... text associated with each node of the document. Trees representing structured text have arbitrary degree. Fortunately there is a natural way to transform a tree of arbitrary degree into a binary one [Knu73] and this transformation can be incorporated into the algorithms described here in a straightforward way [Ski]. Thus we will, for simplicity, describe the algorithms in the rest of the paper using bin...

58 | A Cost Calculus for Parallel Functional Programming
- Skillicorn, Cai
- 1995
Citation Context ...ssarily constant time, so the computation of $f_2$ will not be; and the size of lists grows with the distance from the leaves, creating a communication cost that must be accounted for on real computers [21]. Another useful class of tree homomorphisms computing global properties are those that compute properties of extent. The most obvious example computes the length of a document in characters. It is f ...

58 | Parallel free-text search on the connection machine system
- Stanfill, Kahle
- 1986
Citation Context ...varying between individuals). It is usually implemented using indexing. Parallelism can be used by partitioning the index. 2. Search on full text. This is implemented using a signature file technique [4, 22, 23] or special purpose hardware [10, 11]. Parallelism can be used by partitioning the signature file. 3. Search on non-hierarchical tagged regions. This is a more expressive variant of full text search i...

43 | Faster Optimal Parallel Prefix sums and List Ranking
- Cole, Vishkin
- 1989
Citation Context ... algorithm for deciding where to apply the contraction operations is the following: 1. Number the leaves left to right beginning at 0 -- this can be done in O(log n) time using O(n / log n) processors [3]. 2. For every u such that u.l is an even numbered leaf, perform the contraction operation. 3. For every u that was not involved in the previous step, and for which u.r is an even numbered leaf, perfo...

42 | On the use of regular expressions for searching text
- Clarke, Cormack
- 1997
Citation Context ... size of the string being searched on a variety of parallel architectures [Ski90]. The regular language recognition problem is readily adapted for query processing (although there are some subtleties [CC95]). Suppose we wish to determine if some regular expression, RE, is present in an input string. This regular expression defines a language, L(RE), that is then extended to allow for the existence of ot...

40 | New Indices for Text
- Gonnet, Baeza-Yates, et al.
- 1992
Citation Context ...e expressive variant of full text search in which non-hierarchical tags are present in the text. Searches may include references to tags as well as to content. This approach is used in the PAT system [8] for searching the Oxford English Dictionary. The descriptive markup of historical documents is typically too ad hoc to be captured by SGML-style tags, but is nevertheless an important part of the org...

39 | Information processing -- Text and office systems -- Standard Generalized Markup Language
- ISO
Citation Context ...- It is hard to predict the performance of software on parallel machines without actually developing and executing it. The extensive use of semantically-based markup, and particularly the use of SGML [ISO86], means that most documents have a de facto tree structure. This makes it possible to model them by a data type with enough formality that useful theory can be applied. We will use the theory of categ...

30 | A query language for retrieving information from hierarchic text structures
- Macleod
- 1991
Citation Context ...context information (e.g. "dog" within the third section heading), and truncated term expansion is trivially available. No existing system has this capability, but the path expressions query language [14] allows such queries to be expressed, and we show in subsequent sections how such searches may be implemented efficiently. Fortunately, this fourth level of search is no more expensive to implement in...

25 | Parallel text search methods
- Salton, Buckley
- 1988
Citation Context ...l become common in the next decade. The use of parallelism has been suggested for document applications for some time. Some of the drawbacks have been pointed out by Stone [24] and Salton and Buckley [17] -- these centre around the need of most parallel applications to examine the entire text database where sequential algorithms examine only a small portion, and the consequent performance degradation ...

24 | Parallel Programming, List Homomorphisms and the Maximum Segment Sum Problem
- Cole
- 1993
Citation Context ...isms include many of the interesting functions on constructed data types. In particular, all injective functions are homomorphisms. Furthermore, all functions can be expressed as almost-homomorphisms [2], the composition of a homomorphism with a projection, and this is often of practical importance. In the next section we introduce the construction of a type for trees to represent structured text. We...

24 | Algebras for Tree Algorithms. D
- Gibbons
- 1991
Citation Context ...tial implementations, while for large p it gives almost logarithmic execution times. Binary trees are easily extended to trees in which each internal node has a list of subtrees (so-called Rose trees [6]). Rose trees much more naturally model SGML tagged text. The complexity of the algorithms we will present only changes by a constant factor, since any Rose tree can be replaced by a binary tree witho...

19 | Efficient parallel algorithms for tree accumulations
- Gibbons, Cai, et al.
- 1994
Citation Context ...sion of tree contraction; when a node u is removed, it is stacked by its remaining child. When this child receives its final value, it unstacks u and computes its final value. Details may be found in [7]. The sequential time complexity of an upwards accumulation is $t_1(\mathrm{UpAccum}(f_1, f_2)) = n\,(t_1(f_1) + t_1(f_2))$; its parallel time complexity is $t_n(\mathrm{UpAccum}(f_1, f_2)) = t_1(f_1) + ht \times$ ...

18 | Parallel Implementation of Tree Skeletons
- Skillicorn
- 1996
Citation Context ...s not normally itself a tree. However, with some care it is possible to get implementations of the fast parallel operations above in time complexity n/p + log p for trees of size n using p processors [20]. For small p, this gives almost linear speed-up over sequential implementations, while for large p it gives almost logarithmic execution times. Binary trees are easily extended to trees in which each...

14 | Optimal routing of parentheses on the hypercube
- Mayr, Werchner
- 1993
Citation Context ...h nearest neighbours, except for the tree contraction algorithm used in several places. This enables us to include communication costs in our complexity measures. We use a result of Mayr and Werchner [16] to justify the complexity of tree contraction on the hypercube. We will also assume that a tree of n nodes is processed by an n-processor system, so that there is a processor per tree node. We return...

9 | Foundations of Parallel Programming. Cambridge Series in Parallel Computation
- Skillicorn
- 1994
Citation Context ...documents have a de facto tree structure. This makes it possible to model them by a data type with enough formality that useful theory can be applied. We will use the theory of categorical data types [Ski94], a particular approach to initiality, emphasising its ability to hide those aspects of a computation that are most difficult in a parallel setting. Operations on structured text are expressed as homo...

7 | Faster optimal parallel prefix sums and list ranking
- Cole, Vishkin
- 1989
Citation Context ...e algorithm for deciding where to apply the contraction operations is the following: 1. Number the leaves left to right beginning at 0 -- this can be done in O(log n) time using O(n / log n) processors [3]. 2. For every u such that u.l is an even numbered leaf, perform the contraction operation. 3. For every u that was not involved in the previous step, and for which u.r is an even numbered leaf, perfo...

6 | On parsing Context-Free Languages in Parallel Environments
- Fischer
- 1975
Citation Context ...tions. 6 Parallel Search of Flat Text There is a well-known parallel algorithm for recognizing whether a given string is a member of a regular language in time logarithmic in the length of the string [5, 9]. This algorithm is naturally parallel and readily adapted to document search, even on quite modest parallel computers. Although it was described for the Connection Machine [9] it never seems to have ...

4 | Signature files: An access method for documents and its analytical performance evaluation
- Faloutsos, Christodoulakis
- 1984
Citation Context ...e varying between individuals). It is usually implemented using indexing. Parallelism can be used by partitioning the index. 2. Search on full text. This is implemented using a signature file technique [4, 22, 23] or special purpose hardware [10, 11]. Parallelism can be used by partitioning the signature file. 3. Search on non-hierarchical tagged regions. This is a more expressive variant of full text search in ...

3 | Foundations of Parallel Programming. Cambridge Series in Parallel Computation
- Skillicorn
- 1994
Citation Context ...documents have a de facto tree structure. This makes it possible to model them by a data type with enough formality that useful theory can be applied. We will use the theory of categorical data types [19], a particular approach to initiality, emphasising its ability to hide those aspects of a computation that are most difficult in a parallel setting. Categorical data types generalise abstract data typ...

2 | Special-purpose hardware for text searching: Past experience, future potential
- Hollaar
- 1991
Citation Context ...lly implemented using indexing. Parallelism can be used by partitioning the index. 2. Search on full text. This is implemented using a signature file technique [4, 22, 23] or special purpose hardware [10, 11]. Parallelism can be used by partitioning the signature file. 3. Search on non-hierarchical tagged regions. This is a more expressive variant of full text search in which non-hierarchical tags are pre...

2 | Path expressions as selectors for non-linear text
- Macleod
- 1993
Citation Context ...ribute at all the nodes that have passed the filter. More complex queries such as the last example above require more complex moves. This insight is the critical one in the design of path expressions [15], a general query language for structured text applications. The crucial property of path expressions that we require is that filters can be broken up into searches for patterns that are single paths,...

2 | Parallel querying of large databases
- Stone
- 1987
Citation Context ... archives. Such machines will become common in the next decade. The use of parallelism has been suggested for document applications for some time. Some of the drawbacks have been pointed out by Stone [24] and Salton and Buckley [17] -- these centre around the need of most parallel applications to examine the entire text database where sequential algorithms examine only a small portion, and the consequ...

1 | The Utah Text Search Engine: Implementation experiences and future plans
- Hollaar
- 1985
Citation Context ...lly implemented using indexing. Parallelism can be used by partitioning the index. 2. Search on full text. This is implemented using a signature file technique [4, 22, 23] or special purpose hardware [10, 11]. Parallelism can be used by partitioning the signature file. 3. Search on non-hierarchical tagged regions. This is a more expressive variant of full text search in which non-hierarchical tags are pre...

1 | Fast querying of concurrent hierarchies. submitted
- Raju, Skillicorn
Citation Context ...e extended to multiple trees sharing structure, such as those which arise in corpus linguistics and multiple document views [RS]. 9 Implementing Path Expression Search The parallel search algorithm described in the previous section is not, at present, competitive with existing index-based search algorithms. There are two reaso...

1 | A generalisation of indexing for parallel document search
- Skillicorn
- 1995
Citation Context ...reating and storing large indexes. Although almost nothing is known about indexing structures in a way that would improve the performance of parallel algorithms, there is some hope that new techniques [Ski95] will reduce this handicap. Our algorithm may also be useful in situations where the text archive is highly dynamic, making indexing impractical. It is also the case that most document structure resid...

1 | Information processing -- text and office systems -- standard generalized markup language (SGML)
- ISO
- 1986
Citation Context ...-- It is hard to predict the performance of software on parallel machines without actually developing and executing it. The extensive use of semantically-based markup, and particularly the use of SGML [ISO86], means that most documents have a de facto tree structure. This makes it possible to model them by a data type with enough formality that useful theory can be applied. We will use the theory of catego...