Models and issues in data stream systems - In PODS, 2002
Cited by 520 (18 self)
In this overview paper we motivate the need for and research issues arising from a new model of data processing. In this model, data does not take the form of persistent relations, but rather arrives in multiple, continuous, rapid, time-varying data streams. In addition to reviewing past work relevant to data stream systems and current projects in the area, the paper explores topics in stream query languages, new requirements and challenges in query processing, and algorithmic issues.
Continuous Queries over Data Streams, 2004
Cited by 215 (8 self)
In many recent applications, data may take the form of continuous data streams, rather than finite stored data sets. Several aspects of data management need to be reconsidered in the presence of data streams, offering a new research direction for the database community. In this paper we focus primarily on the problem of query processing, specifically on how to define and evaluate continuous queries over data streams. We address semantic issues as well as efficiency concerns. Our main contributions are threefold. First, we specify a general and flexible architecture for query processing in the presence of data streams. Second, we use our basic architecture as a tool to clarify alternative semantics and processing techniques for continuous queries. The architecture also captures most previous work on continuous queries and data streams, as well as related concepts such as triggers and materialized views. Finally, we map out research topics in the area of query processing over data streams, showing where previous work is relevant and describing problems yet to be addressed.
Storing and querying ordered XML using a relational database system - In SIGMOD, 2002
Cited by 180 (1 self)
XML is quickly becoming the de facto standard for data exchange over the Internet. This is creating a new set of data management requirements involving XML, such as the need to store and query XML documents. Researchers have proposed using relational database systems to satisfy these requirements by devising ways to "shred" XML documents into relations, and translate XML queries into SQL queries over these relations. However, a key issue with such an approach, which has largely been ignored in the research literature, is how (and whether) the ordered XML data model can be efficiently supported by the unordered relational data model. This paper shows that XML's ordered data model can indeed be efficiently supported by a relational database system. This is accomplished by encoding order as a data value. We propose three order encoding methods that can be used to represent XML order in the relational data model, and also propose algorithms for translating ordered XPath expressions into SQL using these encoding methods. Finally, we report the results of an experimental study that investigates the performance of the proposed order encoding methods on a workload of ordered XML queries and updates.
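To make "encoding order as a data value" concrete, here is a minimal sketch, not the paper's method. The abstract does not name its three encodings, so the example assumes one simple possibility: a hypothetical `pos` column that stores each element's position among its siblings. The table name `node`, its columns, and the helper functions are all illustrative assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET


def shred(xml_text):
    """Shred an XML document into rows (id, parent, pos, tag, text).

    `pos` is the 1-based position of the element among its siblings;
    this is an assumed encoding chosen for illustration only.
    """
    rows = []

    def walk(elem, parent_id, pos):
        node_id = len(rows) + 1  # ids are assigned in document (pre-)order
        rows.append((node_id, parent_id, pos, elem.tag, (elem.text or "").strip()))
        for i, child in enumerate(elem, start=1):
            walk(child, node_id, i)

    walk(ET.fromstring(xml_text), None, 1)
    return rows


def load(rows):
    """Load shredded rows into an in-memory relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
        "pos INTEGER, tag TEXT, text TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def nth_child_text(conn, parent_tag, child_tag, n):
    """Rough SQL counterpart of the ordered XPath /parent_tag/child_tag[n].

    Assumes a single element carries parent_tag. Order is recovered purely
    from the stored `pos` value, since SQL tables are unordered.
    """
    row = conn.execute(
        "SELECT c.text FROM node c JOIN node p ON c.parent = p.id "
        "WHERE p.tag = ? AND c.tag = ? ORDER BY c.pos LIMIT 1 OFFSET ?",
        (parent_tag, child_tag, n - 1),
    ).fetchone()
    return row[0] if row else None
```

The relational table itself has no notion of order. The query recovers document order only through `ORDER BY c.pos`, which is the sense in which order becomes an ordinary data value.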
Temporal and Real-Time Databases: A Survey - IEEE Transactions on Knowledge and Data Engineering, 1995
Cited by 155 (9 self)
A temporal database contains time-varying data. In a real-time database transactions have deadlines or timing constraints. In this paper we review the substantial research in these two heretofore separate research areas. We first characterize the time domain, then investigate temporal and real-time data models. We evaluate temporal and real-time query languages along several dimensions. Temporal and real-time DBMS implementation is examined. We conclude with a summary of the major accomplishments of the research to date, and list several research questions that should be addressed next.
Keywords: object-oriented database, relational databases, query language, temporal data model, time-constrained database, transaction time, user-defined time, valid time
1 Introduction
Time is an important aspect of all real-world phenomena. Events occur at specific points in time; objects and the relationships among objects exist over time. The ability to model this temporal dimension of the real worl...
On Similarity-Based Queries for Time Series Data, 1999
Cited by 128 (5 self)
We study similarity queries for time series data where similarity is defined in terms of a set of linear transformations on the Fourier series representation of a sequence. We have shown in an earlier work that this set of transformations is rich enough to formulate operations such as moving average and time scaling. In this paper, we present a new algorithm for processing queries that define similarity in terms of multiple transformations instead of a single one. The idea is, instead of searching the index multiple times and each time applying a single transformation, to search the index only once and apply a collection of transformations simultaneously to the index. Our experimental results on both synthetic and real data show that the new algorithm for simultaneously processing multiple transformations is much faster than sequential scanning or index traversal using one transformation at a time. We also examine the possibility of composing transformations in a query or of rewriting...
On Similarity Queries for Time-Series Data: Constraint Specification and Implementation, 1995
Cited by 93 (4 self)
Constraints are a natural mechanism for the specification of similarity queries on time-series data. However, to realize the expressive power of constraint programming in this context, one must provide the matching implementation technology for efficient indexing of very large data sets. In this paper, we formalize the intuitive notions of exact and approximate similarity between time-series patterns and data. Our definition of similarity extends the distance metric used in [2, 7] with invariance under a group of transformations. Our main observation is that the resulting, more expressive, set of constraint queries can be supported by a new indexing technique, which preserves all the desirable properties of the indexing scheme proposed in [2, 7].
Tribeca: A System for Managing Large Databases of Network Traffic - In USENIX, 1998
Cited by 67 (0 self)
The engineers who analyze traffic on high bandwidth networks must filter and aggregate either recorded traces of network packets or live traffic from the network itself. These engineers perform operations similar to database queries, but cannot use conventional data managers because of performance concerns and a semantic mismatch between the analysis operations and the operations supported by commercial DBMSs. Traffic analysis does not require fast random access, transactional update, or relational joins. Rather, it needs fast sequential access to a stream of traffic records and the ability to filter, aggregate, define windows, demultiplex, and remultiplex the stream. Tribeca is an extensible, stream-oriented DBMS designed to support network traffic analysis. It combines ideas from temporal and sequence databases with an implementation optimized for databases stored on high speed ID-1 tapes or arriving in real time from the network. The paper describes Tribeca's query language, executo...
Similarity-Based Queries, 1995
Cited by 62 (6 self)
We develop a domain-independent framework for defining queries in terms of similarity of objects. Our framework has three components: a pattern language, a transformation rule language, and a query language. The pattern language specifies classes of objects, the transformation rule language defines similarity by specifying the similarity-preserving transformations, and the whole package is wrapped in a general query language. The framework can be "tuned" to the needs of a specific application domain, such as time sequences, molecules, text strings or images, by the choice of these languages. We demonstrate the framework by presenting a specific instance on a specific domain -- the domain of sequences. We start with sequences over a finite alphabet, and then consider sequences over infinite ordered domains. The basic pattern language we use is regular expressions, and the query language is calculus-based. We show that even when the pattern/query languages chosen are not too powerful, t...
Flexible time management in data stream systems - In PODS, 2004
Cited by 56 (3 self)
(DSMS) rely on time as a basis for windows on streams and for defining a consistent semantics for multiple streams and updatable relations. The system clock in a centralized DSMS provides a convenient and well-behaved notion of time, but often it is more appropriate for a DSMS application to define its own notion of time: its own clock(s), sequence numbers, or other forms of ordering and timestamping. Flexible application-defined time poses challenges to the DSMS, since streams may be out of order and uncoordinated with each other, they may incur latency reaching the DSMS, and they may pause or stop. We formalize these challenges and specify how to generate heartbeats so that queries can be evaluated correctly and continuously in an application-defined time domain. Our heartbeat generation algorithm is based on parameters capturing skew between streams, unordering within streams, and latency in streams reaching the DSMS. We also describe how to estimate these parameters at run-time, and we discuss how heartbeats can be used for processing continuous queries.
Efficient Retrieval of Similar Time Sequences Using DFT - In Proc. FODO Conference, Kobe, 1998
Cited by 48 (2 self)
We propose an improvement of the known DFT-based indexing technique for fast retrieval of similar time sequences. We use the last few Fourier coefficients in the distance computation without storing them in the index since every coefficient at the end is the complex conjugate of a coefficient at the beginning and as strong as its counterpart. We show analytically that this observation can accelerate the search time of the index by more than a factor of two. This result was confirmed by our experiments, which were carried out on real stock prices and synthetic data.
Keywords: similarity retrieval, time series indexing
1 Introduction
Time sequences constitute a large amount of data stored in computers. Examples include stock prices, exchange rates, weather data and biomedical measurements. We are often interested in similarity queries on time-series data [APWZ95, ALSS95]. For example, we may want to find stocks that behave in approximately the same way; or years when the temperature pat...
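The conjugate-symmetry observation in the abstract of this last entry can be illustrated with a short sketch. It is not the authors' implementation. It assumes real-valued sequences of equal length n, a unitary DFT (so that Euclidean distance is preserved), and an index that stores coefficients 1..k with k < n/2. The function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) unitary DFT, scaled by 1/sqrt(n) so distances are preserved."""
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def lower_bound_first_k(x, y, k):
    """Lower bound on the Euclidean distance from coefficients 1..k only."""
    X, Y = dft(x), dft(y)
    return math.sqrt(sum(abs(X[f] - Y[f]) ** 2 for f in range(1, k + 1)))


def lower_bound_with_symmetry(x, y, k):
    """Tighter lower bound from the same k coefficients.

    For real sequences, coefficient n-f is the complex conjugate of
    coefficient f, so each stored coefficient also accounts for its mirror
    at the end of the spectrum and its squared difference counts twice.
    """
    if not 0 < k < len(x) / 2:
        raise ValueError("k must satisfy 0 < k < n/2 so the mirrors do not overlap")
    X, Y = dft(x), dft(y)
    return math.sqrt(2 * sum(abs(X[f] - Y[f]) ** 2 for f in range(1, k + 1)))
```

Both functions never exceed the true Euclidean distance, so neither causes false dismissals. The symmetric bound is larger by a factor of sqrt(2) and can therefore prune more candidates, with no extra coefficients stored.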

