Skip graphs
- in SODA, 2003
Cited by 202 (8 self)
Skip graphs are a novel distributed data structure, based on skip lists, that provide the full functionality of a balanced tree in a distributed system where resources are stored in separate nodes that may fail at any time. They are designed for use in searching peer-to-peer systems, and by providing the ability to perform queries based on key ordering, they improve on existing search tools that provide only hash table functionality. Unlike skip lists or other tree data structures, skip graphs are highly resilient, tolerating a large fraction of failed nodes without losing connectivity. In addition, simple and straightforward algorithms can be used to construct a skip graph, insert new nodes into it, search it, and detect and repair errors in a skip graph introduced due to node failures.
A New Approach to Text Searching
Cited by 200 (14 self)
We introduce a family of simple and fast algorithms for solving the classical string matching problem, string matching with classes of symbols, don't care symbols and complement symbols, and multiple patterns. In addition we solve the same problems allowing up to k mismatches. Among the features of these algorithms are that they don't need to buffer the input, they are real time algorithms (for constant size patterns), and they are suitable to be implemented in hardware.
1 Introduction
String searching is a very important component of many problems, including text editing, bibliographic retrieval, and symbol manipulation. Recent surveys of string searching can be found in [17, 4]. The string matching problem consists of finding all occurrences of a pattern of length m in a text of length n. We generalize the problem allowing "don't care" symbols, the complement of a symbol, and any finite class of symbols. We solve this problem for one or more patterns, with or without mismatches. Fo...
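The problem statement in this entry (all occurrences of a pattern of length m in a text of length n, with symbol classes and don't-care positions) can be illustrated with a small bit-parallel matcher. This is only a sketch in the shift-and style and is not taken from the paper; the function name `find_matches` and the representation of the pattern as a list of sets (with `None` for a don't-care position) are assumptions made for illustration.

```python
def find_matches(text, pattern_classes):
    """Return start indices of all matches of the pattern in text.

    pattern_classes is a hypothetical representation: one entry per
    pattern position, either a set of allowed characters (a symbol
    class) or None for a don't-care position.
    """
    m = len(pattern_classes)
    if m == 0:
        return []

    # For each character in the text, precompute a bit mask whose bit i
    # is set when that character is allowed at pattern position i.
    masks = {}
    for ch in set(text):
        mask = 0
        for i, cls in enumerate(pattern_classes):
            if cls is None or ch in cls:
                mask |= 1 << i
        masks[ch] = mask

    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0              # bit i set: pattern[0..i] matches a suffix read so far
    hits = []
    for pos, ch in enumerate(text):
        # Extend every partial match by one character and start a new one.
        state = ((state << 1) | 1) & masks[ch]
        if state & accept:
            hits.append(pos - m + 1)
    return hits


# Pattern "ab?a": 'a', 'b', any symbol, 'a'.
print(find_matches("abracadabra", [{"a"}, {"b"}, None, {"a"}]))
```

Each text character is processed once and never revisited, which is consistent with the entry's remark that the input does not need to be buffered.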
Join Indices
- ACM Transactions on Database Systems, 1987
Cited by 188 (2 self)
In new application areas of relational database systems, such as artificial intelligence, the join operator is used more extensively than in conventional applications. In this paper, we propose a simple data structure, called a join index, for improving the performance of joins in the context of complex queries. For most of the joins, updates to join indices incur very little overhead. Some properties of a join index are (i) its efficient use of memory and adaptiveness to parallel execution, data type join predicates, (iv) its support for multirelation clustering, and (v) its use in representing directed graphs and in evaluating recursive queries. Finally, the analysis of the join algorithm using join indices shows its excellent performance.
Index-driven similarity search in metric spaces
- ACM Transactions on Database Systems, 2003
Cited by 118 (6 self)
Similarity search is a very important operation in multimedia databases and other database applications involving complex objects, and involves finding objects in a data set S similar to a query object q, based on some similarity measure. In this article, we focus on methods for similarity search that make the general assumption that similarity is represented with a distance metric d. Existing methods for handling similarity search in this setting typically fall into one of two classes. The first directly indexes the objects based on distances (distance-based indexing), while the second is based on mapping to a vector space (mapping-based approach). The main part of this article is dedicated to a survey of distance-based indexing methods, but we also briefly outline how search occurs in mapping-based methods. We also present a general framework for performing search based on distances, and present algorithms for common types of queries that operate on an arbitrary “search hierarchy.” These algorithms can be applied on each of the methods presented, provided a suitable search hierarchy is defined.
Reasoning Under Varying and Uncertain Resource Constraints
, 1988
Cited by 112 (19 self)
We describe the use of decision-theory to optimize the value of computation under uncertain and varying resource limitations. The research is motivated by the pursuit of formal models of rational decision making for computational agents, centering on the explicit consideration of preferences and resource availability. We focus here on the importance of identifying the multiattribute structure of partial results generated by approximation methods for making control decisions. Work on simple algorithms and on the control of decision-theoretic inference itself is described.
1 Computation Under Uncertainty
We are investigating the decision-theoretic control of problem solving under varying constraints in resources required for reasoning, such as time and memory. This work is motivated by the pursuit of formal models of rational decision making under resource constraints and our goal of extending foundational work on normative rationality to computational agents. We describe here a portion...
Reducing the braking distance of an SQL query engine
- In Proc. of the 24th VLDB Conf, 1998
Cited by 84 (6 self)
In a recent paper, we proposed adding a STOP AFTER clause to SQL to permit the cardinality of a query result to be explicitly limited by query writers and query tools. We demonstrated the usefulness of having this clause, showed how to extend a traditional cost-based query optimizer to accommodate it, and demonstrated via DB2-based simulations that large performance gains are possible when STOP AFTER queries are explicitly supported by the database engine. In this paper, we present several new strategies for efficiently processing STOP AFTER queries. These strategies, based largely on the use of range partitioning techniques, offer significant additional savings for handling STOP AFTER queries that yield sizeable result sets. We describe classes of queries where such savings would indeed arise and present experimental measurements that show the benefits and tradeoffs associated with the new processing strategies.
Efficient Routing in Networks with Long Range Contacts (Extended Abstract)
, 2001
Cited by 70 (9 self)
Lali Barrière, Pierre Fraigniaud, Evangelos Kranakis, and Danny Krizanc. Dept. de Matemàtica Aplicada i Telemàtica, Universitat Politècnica de Catalunya.
SIMPLE: A methodology for programming high performance algorithms on clusters of symmetric multiprocessors (SMPs)
- Journal of Parallel and Distributed Computing, 1999
Cited by 52 (13 self)
We describe a methodology for developing high performance programs running on clusters of SMP nodes. Our methodology is based on a small kernel (SIMPLE) of collective communication primitives that make efficient use of the hybrid shared and message passing environment. We illustrate the power of our methodology by presenting experimental results for sorting integers, two-dimensional fast Fourier transforms (FFT), and constraint-satisfied searching. Our testbed is a cluster of DEC AlphaServer 2100 4/275 nodes interconnected by an ATM switch.
Fast Concurrent Access to Parallel Disks
- In 11th ACM-SIAM Symposium on Discrete Algorithms, 1999
Cited by 44 (11 self)
High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/O step. This restriction leads to a load balancing problem that is perhaps the main inhibitor for adapting single-disk external memory algorithms to multiple disks. This paper shows that this problem can be solved efficiently using a combination of randomized placement, redundancy and an optimal scheduling algorithm. A buffer of O(D) blocks suffices to support efficient writing of arbitrary blocks if blocks are distributed uniformly at random to the disks (e.g., by hashing). If two randomly allocated copies of each block exist, N arbitrary blocks can be read within $\lceil N/D \rceil + 1$ I/O steps with high probability. In addition, the redundancy can be reduced from 2 to $1 + 1/r$ for any integer r. These results can be used to emulate the simple and powerful "single-disk multi-head" model of external computing [1] on the physically more realistic independent disk model [33] with small constant overhead. This is faster than a lower bound for deterministic emulation [3].
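To make the two bounds quoted in this entry concrete, the short sketch below evaluates them for sample values, reading the bounds as ceil(N/D) + 1 read steps and a redundancy of 1 + 1/r. The function names and the sample values are assumptions chosen for illustration; the code only does the arithmetic and does not implement the paper's placement or scheduling algorithm.

```python
from fractions import Fraction


def read_step_bound(n_blocks, n_disks):
    """High-probability bound ceil(N/D) + 1 on I/O steps for reading N blocks."""
    # -(-a // b) is integer ceiling division.
    return -(-n_blocks // n_disks) + 1


def redundancy(r):
    """Storage redundancy 1 + 1/r for a positive integer r."""
    return 1 + Fraction(1, r)


# Illustrative values only: 10 blocks on 4 disks, and r = 3.
print(read_step_bound(10, 4))  # ceil(10/4) + 1
print(redundancy(3))           # 1 + 1/3
```

With r = 1 the redundancy is 2, which corresponds to the two-copies case mentioned in the entry.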
AbstFinder, A Prototype Natural Language Text Abstraction Finder for Use in Requirements Elicitation
- Automated Software Engineering, 1997
Cited by 42 (0 self)
Abstraction identification is named as a key problem in requirements analysis. Typically, the abstractions must be found among the large mass of natural language text collected from the clients and users. This paper motivates and describes a new approach, based on traditional signal processing methods, for finding abstractions in natural language text and offers a new tool, AbstFinder, as an implementation of this approach. The advantages and disadvantages of the approach and the design of the tool are discussed in detail. Various scenarios for use of the tool are offered. Some of these scenarios were used in a case study of the effectiveness of the tool on an industrial-strength example of finding abstractions in a request for proposals.

