Results 1  10
of
57
External Memory Algorithms and Data Structures
, 1998
"... Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the ..."
Abstract

Cited by 320 (24 self)
 Add to MetaCart
Data sets in large applications are often too massive to fit completely inside the computer's internal memory. The resulting input/output communication (or I/O) between fast internal memory and slower external memory (such as disks) can be a major performance bottleneck. In this paper, we survey the state of the art in the design and analysis of external memory algorithms and data structures (which are sometimes referred to as "EM" or "I/O" or "outofcore" algorithms and data structures). EM algorithms and data structures are often designed and analyzed using the parallel disk model (PDM). The three machineindependent measures of performance in PDM are the number of I/O operations, the CPU time, and the amount of disk space. PDM allows for multiple disks (or disk arrays) and parallel CPUs, and it can be generalized to handle tertiary storage and hierarchical memory. We discuss several important paradigms for how to solve batched and online problems efficiently in external memory. Programming tools and environments are available for simplifying the programming task. The TPIE system (Transparent Parallel I/O programming Environment) is both easy to use and efficient in terms of execution speed. We report on some experiments using TPIE in the domain of spatial databases. The newly developed EM algorithms and data structures that incorporate the paradigms we discuss are significantly faster than methods currently used in practice.
Indexing the Positions of Continuously Moving Objects
, 2000
"... The coming years will witness dramatic advances in wireless communications as well as positioning technologies. As a result, tracking the changing positions of objects capable of continuous movement is becoming increasingly feasible and necessary. The present paper proposes a novel, R # tree base ..."
Abstract

Cited by 318 (18 self)
 Add to MetaCart
The coming years will witness dramatic advances in wireless communications as well as positioning technologies. As a result, tracking the changing positions of objects capable of continuous movement is becoming increasingly feasible and necessary. The present paper proposes a novel, R # tree based indexing technique that supports the efficient querying of the current and projected future positions of such moving objects. The technique is capable of indexing objects moving in one, two, and threedimensional space. Update algorithms enable the index to accommodate a dynamic data set, where objects may appear and disappear, and where changes occur in the anticipated positions of existing objects. A comprehensive performance study is reported.
Indexing moving points
, 2003
"... We propose three indexing schemes for storing a set S of N points in the plane, each moving along a linear trajectory, so that any query of the following form can be answered quickly: Given a rectangle R and a real value t; report all K points of S that lie inside R at time t: We first present an in ..."
Abstract

Cited by 169 (13 self)
 Add to MetaCart
We propose three indexing schemes for storing a set S of N points in the plane, each moving along a linear trajectory, so that any query of the following form can be answered quickly: Given a rectangle R and a real value t; report all K points of S that lie inside R at time t: We first present an indexing structure that, for any given constant e> 0; uses OðN=BÞ disk blocks and answers a query in OððN=BÞ 1=2þe þ K=BÞ I/Os, where B is the block size. It can also report all the points of S that lie inside R during a given time interval. A point can be inserted or deleted, or the trajectory of a point can be changed, in Oðlog 2 B NÞ I/Os. Next, we present a general approach that improves the query time if the queries arrive in chronological order, by allowing the index to evolve over time. We obtain a tradeoff between the query time and the number of times the index needs to be updated as the points move. We also describe an indexing scheme in which the number of I/Os required to answer a query depends monotonically on the difference between the query time stamp t and the current time. Finally, we develop an efficient indexing scheme to answer approximate
Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data
 Proc. 30th Int’l Conf. Very Large Data Bases (VLDB
, 2004
"... It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors, and resource limitations. In order to avoid drawing erroneous conclusions based upon stale data, the use of un ..."
Abstract

Cited by 104 (20 self)
 Add to MetaCart
It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors, and resource limitations. In order to avoid drawing erroneous conclusions based upon stale data, the use of uncertainty intervals that model each data item as a range and associated probability density function (pdf) rather than a single value has recently been proposed. Querying these uncertain data introduces imprecision into answers, in the form of probability values that specify the likeliness the answer satisfies the query. These queries are more expensive to evaluate than their traditional counterparts but are guaranteed to be correct and more informative due to the probabilities accompanying the answers. Although the answer probabilities are useful, for many applications, it is only necessary to know whether the probability exceeds a given threshold – we term these Probabilistic Threshold Queries (PTQ). In this paper we address the efficient computation of these types of queries. In particular, we develop two index structures and associated algorithms to efficiently answer PTQs. The first index scheme is based on the idea of augmenting uncertainty information to an Rtree. We establish the difficulty
External Memory Data Structures
, 2001
"... In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worstcase efficient external memory dynami ..."
Abstract

Cited by 79 (36 self)
 Add to MetaCart
In many massive dataset applications the data must be stored in space and query efficient data structures on external storage devices. Often the data needs to be changed dynamically. In this chapter we discuss recent advances in the development of provably worstcase efficient external memory dynamic data structures. We also briefly discuss some of the most popular external data structures used in practice.
Range Searching
, 1996
"... Range searching is one of the central problems in computational geometry, because it arises in many applications and a wide variety of geometric problems can be formulated as a rangesearching problem. A typical rangesearching problem has the following form. Let S be a set of n points in R d , an ..."
Abstract

Cited by 69 (1 self)
 Add to MetaCart
Range searching is one of the central problems in computational geometry, because it arises in many applications and a wide variety of geometric problems can be formulated as a rangesearching problem. A typical rangesearching problem has the following form. Let S be a set of n points in R d , and let R be a family of subsets; elements of R are called ranges . We wish to preprocess S into a data structure so that for a query range R, the points in S " R can be reported or counted efficiently. Typical examples of ranges include rectangles, halfspaces, simplices, and balls. If we are only interested in answering a single query, it can be done in linear time, using linear space, by simply checking for each point p 2 S whether p lies in the query range.
Online data structures in external memory
 Proc. Annual International Colloquium on Automata, Languages, and Programming, LNCS 1644
, 1999
"... The data sets for many of today's computer applications are too large to fit within the computer's internal memory and must instead be stored on external storage devices such as disks. A major performance bottleneck can be the input/output communication (or I/O) between the external and internal mem ..."
Abstract

Cited by 57 (6 self)
 Add to MetaCart
The data sets for many of today's computer applications are too large to fit within the computer's internal memory and must instead be stored on external storage devices such as disks. A major performance bottleneck can be the input/output communication (or I/O) between the external and internal memories. In this paper we discuss a variety of online data structures for external memory, some very old and some very new, such as hashing (for dictionaries), Btrees (for dictionaries and 1D range search), buffer trees (for batched dynamic problems), interval trees with weightbalanced Btrees (for stabbing queries), priority search trees (for 3sided 2D range search), and Rtrees and other spatial structures. We also discuss several open problems along the way.
Efficient Searching with Linear Constraints (Extended Abstract)
"... ) Pankaj K. Agarwal Lars Arge y Jeff Erickson z Paolo G. Franciosa x Jeffrey Scott Vitter  Abstract We show how to preprocess a set S of points in R d to get an external memory data structure that efficiently supports linearconstraint queries. Each query is in the form of a linear c ..."
Abstract

Cited by 56 (17 self)
 Add to MetaCart
) Pankaj K. Agarwal Lars Arge y Jeff Erickson z Paolo G. Franciosa x Jeffrey Scott Vitter  Abstract We show how to preprocess a set S of points in R d to get an external memory data structure that efficiently supports linearconstraint queries. Each query is in the form of a linear constraint a \Delta x b; the data structure must report all the points of S that satisfy the query. Our goal is to minimize the number of disk blocks required to store the data structure and the number of disk accesses (I/Os) required to answer a query. For d = 2, we present the first nearlinear size data structures that can answer linearconstraint queries using an optimal number of I/Os. We also present a linearsize data structure that can answer queries efficiently in the worst case. We combine these two approaches to obtain tradeoffs between space and query time. Finally, we show that some of our techniques extend to higher dimensions d. Center for Geometric Computing, Computer...
A General Model for Authenticated Data Structures
 Algorithmica
, 2001
"... Query answers from online databases can easily be corrupted by hackers or malicious database publishers. Thus it is important to provide mechanisms which allow clients to trust the results from online queries. Authentic publication is a novel approach which allows untrusted publishers to securely ..."
Abstract

Cited by 49 (1 self)
 Add to MetaCart
Query answers from online databases can easily be corrupted by hackers or malicious database publishers. Thus it is important to provide mechanisms which allow clients to trust the results from online queries. Authentic publication is a novel approach which allows untrusted publishers to securely answer queries from clients on behalf of trusted offline data owners. Publishers validate answers using compact, hardtoforge verification objects (VOs), which clients can check efficiently. This approach provides greater scalability (by adding more publishers) and better security (online publishers don't need to be trusted).
Type less, find more: fast autocompletion search with a succinct index
 IN 29TH CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR’06
, 2006
"... We consider the following fulltext search autocompletion feature. Imagine a user of a search engine typing a query. Then with every letter being typed, we would like an instant display of completions of the last query word which would lead to good hits. At the same time, the best hits for any of th ..."
Abstract

Cited by 43 (5 self)
 Add to MetaCart
We consider the following fulltext search autocompletion feature. Imagine a user of a search engine typing a query. Then with every letter being typed, we would like an instant display of completions of the last query word which would lead to good hits. At the same time, the best hits for any of these completions should be displayed. Known indexing data structures that apply to this problem either incur large processing times for a substantial class of queries, or they use a lot of space. We present a new indexing data structure that uses no more space than a stateoftheart compressed inverted index, but that yields an order of magnitude faster query processing times. Even on the large TREC Terabyte collection, which comprises over 25 million documents, we achieve, on a single machine and with the index on disk, average response times of one tenth of a second. We have built a fullfledged, interactive search engine that realizes the proposed autocompletion feature combined with support for proximity search, semistructured (XML) text, subword and phrase completion, and semantic tags.