Results 1 - 10 of 157
A framework for clustering evolving data streams
- Proc. of VLDB’03
, 2003
"... Abstract The clustering problem is a difficult problem for the data stream domain. This is because the large volumes of data arriving in a stream render most traditional algorithms too inefficient. In recent years, a few one-pass clustering algorithms have been developed for the data stream proble ..."
Cited by 359 (36 self)
algorithm requires much greater functionality in discovering and exploring clusters over different portions of the stream. The widely used practice of viewing data stream clustering algorithms as a class of one-pass clustering algorithms is not very useful from an application point of view. For example, a
A Framework for Clustering Uncertain Data Streams
- Proc. 24th IEEE Int’l Conf. Data Eng. (ICDE
, 2008
"... Abstract — In recent years, uncertain data management applications have grown in importance because of the large number of hardware applications which measure data approximately. For example, sensors are typically expected to have considerable noise in their readings because of inaccuracies in data ..."
Cited by 43 (12 self)
for clustering uncertain data streams. We use a very general model of the uncertainty in which we assume that only a few statistical measures of the uncertainty are available. We will show that the use of even modest uncertainty information during the mining process is sufficient to greatly improve the quality
DCF: An Efficient Data Stream Clustering Framework for Streaming Applications
- Proc. DEXA 2006
"... Abstract. Streaming applications, such as environment monitoring and vehicle location tracking require handling high volumes of continuously arriving data and sudden fluctuations in these volumes while efficiently supporting multi-dimensional historical queries. The use of the traditional database m ..."
Cited by 2 (1 self)
management systems is inappropriate because they require an excessive number of disk I/Os when continuously updating massive data streams. In this paper, we propose DCF (Data Stream Clustering Framework), a novel framework that supports efficient data stream archiving for streaming applications. DCF can reduce a
A framework for clustering massive-domain data streams
- In IEEE 25th International Conference on Data Engineering (ICDE ’09
"... Abstract — In this paper, we will examine the problem of clustering massive domain data streams. Massive-domain data streams are those in which the number of possible domain values for each attribute are very large and cannot be easily tracked for clustering purposes. Some examples of such streams i ..."
Cited by 11 (2 self)
discrete values. The task of clustering is significantly more challenging in such cases, since the intermediate statistics for the different clusters cannot be maintained efficiently. In this paper, we propose a method for clustering massive-domain data streams with the use of sketches. We prove
Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters
- In HotCloud
, 2012
"... Many important “big data” applications need to process data arriving in real time. However, current programming models for distributed stream processing are relatively low-level, often leaving the user to worry about consistency of state across the system and fault recovery. Furthermore, the models ..."
Cited by 30 (2 self)
support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup solutions in streaming databases: parallel recovery of lost state across the cluster. We have prototyped D-Streams in an extension to the Spark cluster computing framework called Spark Streaming
Density-Based Clustering for Real-Time Stream Data
- Proc. of KDD ’07
, 2007
"... Existing data-stream clustering algorithms such as CluStream are based on k-means. These clustering algorithms are unable to find clusters of arbitrary shapes and cannot handle outliers. Further, they require knowledge of k and a user-specified time window. To address these issues, this paper ..."
Cited by 42 (0 self)
, this paper proposes D-Stream, a framework for clustering stream data using a density-based approach. The algorithm uses an online component which maps each input data record into a grid and an offline component which computes the grid density and clusters the grids based on the density. The algorithm adopts
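The online/offline split described in this snippet can be illustrated with a minimal sketch. It is not the D-Stream algorithm itself: the class name `GridDensityClusterer`, the parameters `width` and `dense_threshold`, the use of raw counts as density, and the rule that dense cells are grouped when they differ by one step in a single dimension are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def cell_of(point, width):
    """Map a record to the integer coordinates of its grid cell."""
    return tuple(int(x // width) for x in point)


class GridDensityClusterer:
    """Hypothetical sketch: count records per grid cell, then group dense cells."""

    def __init__(self, width=1.0, dense_threshold=3):
        self.width = width                      # side length of a grid cell (assumed)
        self.dense_threshold = dense_threshold  # minimum count for a dense cell (assumed)
        self.counts = defaultdict(int)

    def insert(self, point):
        """Online component: constant-time update of the cell that receives the record."""
        self.counts[cell_of(point, self.width)] += 1

    def clusters(self):
        """Offline component: connected components over dense, adjacent cells."""
        dense = {c for c, n in self.counts.items() if n >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            group, queue = [], deque([start])
            while queue:
                cell = queue.popleft()
                group.append(cell)
                # Neighbours differ by one step along exactly one dimension.
                for dim in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
            result.append(sorted(group))
        return sorted(result)
```

Because cells are grouped by adjacency rather than by distance to a centre, a sketch like this can follow clusters of arbitrary shape, and cells below the threshold are simply left out, which is one simple way to treat outliers.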
Communication-Efficient and Exact Clustering Distributed Streaming Data
"... A widely used approach to clustering a single data stream is the two-phased approach in which the online phase creates and maintains micro-clusters while the off-line phase generates the macro-clustering from the micro-clusters. We use this approach to propose a distributed framework for clustering ..."
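The two-phased approach mentioned in this entry can be sketched for a single, non-distributed stream as follows. This is an illustration under stated assumptions, not the paper's method: the `MicroCluster` summary (count plus linear sum), the `radius` rule for opening a new micro-cluster, and the weighted k-means used in the offline phase are all hypothetical choices.

```python
import math


class MicroCluster:
    """Additive summary of nearby points: a count and a per-dimension linear sum."""

    def __init__(self, point):
        self.n = 1
        self.linear_sum = list(point)

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def absorb(self, point):
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]


def online_update(micro_clusters, point, radius):
    """Online phase: absorb the point into the nearest micro-cluster within
    `radius` (an assumed parameter), otherwise open a new micro-cluster."""
    best = min(micro_clusters,
               key=lambda m: math.dist(m.centroid(), point),
               default=None)
    if best is not None and math.dist(best.centroid(), point) <= radius:
        best.absorb(point)
    else:
        micro_clusters.append(MicroCluster(point))


def offline_macro(micro_clusters, k, iterations=10):
    """Offline phase: weighted k-means over micro-cluster summaries,
    seeded with the first k micro-cluster centroids (an assumed choice)."""
    centers = [m.centroid() for m in micro_clusters[:k]]
    for _ in range(iterations):
        sums = [[0.0] * len(c) for c in centers]
        weights = [0] * len(centers)
        for m in micro_clusters:
            c = m.centroid()
            j = min(range(len(centers)), key=lambda i: math.dist(centers[i], c))
            weights[j] += m.n
            sums[j] = [s + x for s, x in zip(sums[j], m.linear_sum)]
        # Keep the old center when no micro-cluster was assigned to it.
        centers = [[s / w for s in row] if w else old
                   for row, w, old in zip(sums, weights, centers)]
    return centers
```

The offline phase never revisits raw points; it works only from the counts and sums kept online, which is what lets the stream be processed in one pass.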
A framework for clustering massive graph streams
- Statistical Analysis and Data Mining
, 2010
"... In this paper, we examine the problem of clustering massive graph streams. Graph clustering poses significant challenges because of the complex structures which may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult. Con ..."
Cited by 3 (2 self)
. Consequently, most techniques for clustering multidimensional data are difficult to generalize to the case of massive graphs. Recently, methods have been proposed for clustering graph data, though these methods are designed for static data, and are not applicable to the case of graph streams. Furthermore
Detecting the change of clustering structure in categorical data streams
- SIAM Data Mining Conference
, 2006
"... Analyzing clustering structures in data streams can provide critical information for making decisions in real time. Most research has been focused on clustering algorithms for data streams. We argue that, more importantly, we need to monitor the change of clustering structure online. In this paper, we ..."
Cited by 7 (3 self)
, we present a framework for detecting the change of critical clustering structure in categorical data streams, which is indicated by the change of the best number of clusters (Best K) in the data stream. The framework extends the work on determining the best K for static datasets (the BkPlot method
Detecting the Change of Clustering Structure in Categorical Data Streams
"... Analyzing clustering structures in data streams can provide critical information for making decisions in real time. In this paper, we present a framework for detecting the change of critical clustering structure in categorical data streams. The framework consists of the Hierarchical Entropy Tree struc ..."