The PARSEC benchmark suite: Characterization and architectural implications
- IN PRINCETON UNIVERSITY
, 2008
Cited by 150 (1 self)
This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs). Previously available benchmarks for multiprocessors have focused on high-performance computing applications and used a limited number of synchronization methods. PARSEC includes emerging applications in recognition, mining and synthesis (RMS) as well as systems applications that mimic large-scale multithreaded commercial programs. Our characterization shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic. The benchmark suite has been made available to the public.
Mining Frequent Patterns in Data Streams at Multiple Time Granularities
, 2002
Cited by 75 (5 self)
Although frequent-pattern mining has been widely studied and used, it is challenging to extend it to data streams. Compared to mining from a static transaction data set, the streaming case has far more information to track and far greater complexity to manage. Infrequent items can become frequent later on and hence cannot be ignored. The storage structure needs to be dynamically adjusted to reflect the evolution of itemset frequencies over time.
A regression-based temporal pattern mining scheme for data streams
- In VLDB
, 2003
Cited by 35 (5 self)
We devise in this paper a regression-based algorithm, called algorithm FTP-DS (Frequent Temporal Patterns of Data Streams), to mine frequent temporal patterns for data streams. While providing a general framework of pattern frequency counting, algorithm FTP-DS has two major features, namely one data scan for online statistics collection and regression-based compact pattern representation. To attain the feature of one data scan, the data segmentation and the pattern growth scenarios are explored for the frequency counting purpose. Algorithm FTP-DS scans online transaction flows and generates candidate frequent patterns in real time. The second important feature of algorithm FTP-DS is the regression-based compact pattern representation. Specifically, to meet the space constraint, we devise for pattern representation a compact ATF (standing for Accumulated Time and Frequency) form that aggregates all the information required for regression analysis. In addition, we develop the techniques of segmentation tuning and segment relaxation to enhance the functions of FTP-DS. With these features, algorithm FTP-DS is able not only to conduct mining with variable time intervals but also to perform trend detection effectively. Synthetic data and a real dataset which contains net-
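To illustrate how accumulated sums alone can carry everything a regression needs, here is a minimal Python sketch. It is not the ATF form from the paper, whose fields the abstract does not specify. The class name `RunningLineFit` and its fields are hypothetical, and the sketch assumes an ordinary least-squares line fitted to (time, frequency) pairs.

```python
from dataclasses import dataclass


@dataclass
class RunningLineFit:
    """Hypothetical accumulator: keeps five running sums instead of raw points."""
    n: int = 0
    sum_t: float = 0.0    # sum of time stamps
    sum_f: float = 0.0    # sum of observed frequencies
    sum_tt: float = 0.0   # sum of squared time stamps
    sum_tf: float = 0.0   # sum of time * frequency products

    def add(self, t: float, f: float) -> None:
        """Fold one (time, frequency) observation into the sums in O(1)."""
        self.n += 1
        self.sum_t += t
        self.sum_f += f
        self.sum_tt += t * t
        self.sum_tf += t * f

    def slope_intercept(self) -> tuple[float, float]:
        """Least-squares line f = intercept + slope * t from the sums alone."""
        denom = self.n * self.sum_tt - self.sum_t ** 2
        if self.n < 2 or denom == 0:
            raise ValueError("need at least two distinct time stamps")
        slope = (self.n * self.sum_tf - self.sum_t * self.sum_f) / denom
        intercept = (self.sum_f - slope * self.sum_t) / self.n
        return slope, intercept
```

Because each observation is folded into a fixed number of sums, storage stays constant regardless of stream length, and a positive or negative slope gives a rough trend indicator.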
SWAT: Hierarchical Stream Summarization in Large Networks
- In ICDE
, 2003
Cited by 30 (4 self)
The problem of statistics and aggregate maintenance over data streams has gained popularity in recent years, especially in telecommunications network monitoring, trend-related analysis, web-click streams, stock tickers, and other time-variant data. The amount of data generated in such applications can become too large to store or, if stored, too large to scan multiple times. We consider queries over data streams that are biased towards the more recent values. We develop a technique that summarizes a dynamic stream incrementally at multiple resolutions. This approximation can be used to answer point queries, range queries, and inner product queries. Moreover, the precision of answers can be changed adaptively by a client.
MAIDS: Mining Alarming Incidents from Data Streams
Cited by 20 (1 self)
Real-time surveillance systems, network and telecommunication systems, and other dynamic processes often generate a tremendous (potentially infinite) volume of stream data. Effective analysis of such stream data poses great challenges to database and data mining researchers because of the features it demands, such as single-scan algorithms, multi-dimensional online analysis, and fast response time.
Stardust: Fast Stream Indexing using Incremental Wavelet Approximations
Cited by 2 (1 self)
Monitoring thousands of data streams online poses a challenge in many data-centric applications such as telecommunications networks, traffic management, trend-related analysis, web-click streams, intrusion detection, and sensor networks. Stream mining techniques employed in these applications have to be efficient in terms of space and per-item processing time, while providing a high quality of answers to queries such as finding similar patterns, monitoring specified conditions, detecting correlations, and computing aggregates.
Single-pass algorithms for mining frequency change patterns with limited space in evolving append-only and dynamic transaction data streams
- in: Proc EEE
, 2004
Cited by 2 (0 self)
In this paper, we propose an online single-pass algorithm MFC-append (Mining Frequency Change patterns in append-only data streams) for online mining of frequent frequency change items in continuous append-only data streams. An online space-efficient data structure called Change-Sketch is developed to provide fast response time when computing dynamic frequency changes between data streams. A modified approach, MFC-dynamic (Mining Frequency Change patterns in dynamic data streams), is also presented to mine frequency changes in dynamic data streams. The theoretical analyses show that our algorithms meet the major performance requirements of single-pass, bounded storage, and real time for streaming data mining.
Discovering Evolutionary Classifier over High Speed Non-static Stream
Cited by 1 (1 self)
With the emergence of large-volume and high-speed streaming data, mining data streams has become a focus of increasing interest. The major new challenges in streaming data mining are as follows: (1) since streams may flow in and out indefinitely and at high speed, it is usually expected that a stream mining process can only scan a data stream once; and (2) since the characteristics of the data may evolve over time, it is desirable to incorporate evolving features of data streams. This paper investigates the issues of developing a high-speed classification method on streaming data with concept drifts. Among several popular classification techniques, the naïve Bayesian classifier is chosen due to its low construction cost, ease of incremental maintenance, and high accuracy. An efficient algorithm, called EvoClass (Evolutionary Classifier), is devised. EvoClass builds an incremental, evolutionary Bayesian classifier on streaming data. A train-and-test method is employed to discover the changes in the characteristics of the data and the need for construction of a new classifier. In addition, divergence is utilized to quantify the changes in the classifier and inform the user what aspect of the data characteristics has evolved. Finally, an intensive empirical study has been performed that demonstrates the effectiveness and efficiency of the EvoClass method.
Mining Data Streams
In this paper, we address the challenges of mining data streams and discuss some limitations of current research. To tackle these problems, our research focuses on designing computation- and memory-efficient algorithms that provide approximate results with high accuracy and confidence, and on developing system support to help mine useful information from data streams. Some specific research problems are identified. Meanwhile, the new techniques and algorithms we have developed for decision tree construction and frequent itemset mining on streaming data are presented, and some preliminary ideas of our ongoing work are discussed. Currently, we continue working on some of these problems.

