DEVise: Integrated Querying and Visual Exploration of Large Datasets (Demo Abstract)
In Proceedings of ACM SIGMOD, 1997
Cited by 70 (4 self)
M. Livny, R. Ramakrishnan, K. Beyer, G. Chen, D. Donjerkovic, S. Lawande, J. Myllymaki and K. Wenger
Department of Computer Sciences, University of Wisconsin-Madison, 1210 W. Dayton St., Madison, Wisconsin 53706
Tel: (608)262-6611, Fax: (608)262-9777
{miron,raghu,beyer,guangshu,donjerko,ssl,jussi,wenger}@cs.wisc.edu
Abstract: DEVise is a data exploration system that allows users to easily develop, browse, and share visual presentations of large tabular datasets (possibly containing or referencing multimedia objects) from several sources. The DEVise framework, implemented in a tool that has already been applied successfully to a variety of real applications by a number of user groups, makes several contributions. In particular, it combines support for extended relational queries with powerful data visualization features. Datasets much larger than available main memory can be handled (DEVise is currently being used to visualize datasets well in excess of 100MB), and data can be in...
Visual Exploration of Large Data Sets
In Proc. of SPIE -- Int. Soc. Opt. Eng., 1996
Cited by 3 (2 self)
DEVise is a data visualization and exploration system capable of handling large data sets using off-the-shelf hardware with minimal memory requirements. Data can be large in volume, complex in structure (multidimensional and/or hierarchical), and may be imported from different sources such as database servers, external programs, and World Wide Web resources. Commercial and scientific databases can also be linked to DEVise to allow the user to visualize and analyze related information from heterogeneous sources. Associations between data sources are developed interactively as the user gains more knowledge of the data being explored. To assist in handling large data sets, DEVise allows a user to logically split the data into more manageable units at different levels. The user selects a data source, a data stream within a data source (e.g., a time series), attributes of a stream, and a mapping of attributes to graphical objects. At each step, the selections made by the user reduce the data...
What's Next? Sequence Queries
In Proc. Int. Conf. Management of Data, 1994
Cited by 1 (1 self)
A large (and increasing) number of applications require the ability to manipulate sequences of data. While some of these applications use very specialized forms of sequences (e.g., video sequences), there is much to be gained by developing a general framework for managing and querying large sequence data sets. Important target domains include scientific modelling, stock market and financial analysis, analysis of trace data, temporal queries, and exploratory data analysis or 'data mining'. Current database systems cannot address the problems in these domains satisfactorily, and developing suitable systems and techniques is an important and interesting research problem. In this paper, our goal is primarily to motivate the importance of sequence query processing, and to discuss why current systems are unsatisfactory. We illustrate some of the technical challenges in developing suitable systems through examples. We also outline some of the work that we are doing in this area in the SEQ pr...
Interactive Classification of Very Large Datasets with BIRCH
Proc. of Workshop on Research Issues on Data Mining and Knowledge Discovery (in cooperation with ACM-SIGMOD'96)
Cited by 1 (1 self)
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters in a multi-dimensional dataset. Prior work has mostly been in the Statistics and Machine Learning communities, and does not adequately address the problem of large datasets and minimization of I/O costs. BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) is a data clustering algorithm especially suitable for very large datasets. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points. BIRCH can typically find a good clustering with a single scan of the data, and improve the clustering quality further with a few additional scans. It adjusts dynamically to the input dataset to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). In this paper, we demonstrate that BIRCH can be integrated with a...
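The incremental, single-scan behaviour described in this abstract can be illustrated with a deliberately simplified sketch. It is not BIRCH itself: it keeps a flat list of clusters rather than a hierarchy, and the function name `incremental_cluster` and the `threshold` parameter are assumptions made for illustration.

```python
import math


def incremental_cluster(points, threshold):
    """Cluster points in one pass over the data (illustrative only).

    Each point joins the cluster with the nearest centroid if that
    centroid lies within `threshold`; otherwise it starts a new cluster.
    Returns a list of (count, centroid) pairs.
    """
    clusters = []  # each entry is [count, per-dimension coordinate sums]
    for p in points:
        best, best_dist = None, math.inf
        for c in clusters:
            centroid = [s / c[0] for s in c[1]]
            d = math.dist(p, centroid)
            if d < best_dist:
                best, best_dist = c, d
        if best is not None and best_dist <= threshold:
            # Absorb the point by updating the running count and sums.
            best[0] += 1
            best[1] = [s + x for s, x in zip(best[1], p)]
        else:
            clusters.append([1, list(p)])
    return [(n, [s / n for s in sums]) for n, sums in clusters]


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.2, 0.0), (10.0, 10.0), (10.2, 10.0)]
    for count, centroid in incremental_cluster(data, threshold=1.0):
        print(count, centroid)
```

Because only counts and sums are stored, the raw points never need to be held in memory at once, which is the property the abstract emphasizes for very large datasets.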
BIRCH: A New Data Clustering . . .
Data Mining and Knowledge Discovery, 1998
Data clustering is an important technique for exploratory data analysis, and has been studied for several years. It has been shown to be useful in many practical domains such as data classification and image processing. Recently, there has been a growing emphasis on exploratory analysis of very large datasets to discover useful patterns and/or correlations among attributes. This is called data mining, and data clustering is regarded as a particular branch. However, existing data clustering methods do not adequately address the problem of processing large datasets with a limited amount of resources (e.g., memory and CPU cycles). As the dataset size increases, they do not scale up well in terms of memory requirement, running time, and result quality. In this paper, an efficient and scalable data clustering method is proposed, based on a new in-memory data structure called CF-tree, which serves as an in-memory summary of the data distribution. We have implemented it in a system called...
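As a rough illustration of what an in-memory summary of the data distribution might hold, the sketch below keeps a count, a per-dimension linear sum, and a sum of squared norms for each group of points. The truncated abstract does not define the contents of a CF-tree entry, so this triple, the class name `CF`, and its methods are assumptions here, not the authors' implementation.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class CF:
    """Hypothetical summary of a group of points (assumed layout)."""
    n: int            # number of points summarized
    ls: List[float]   # linear sum of the points, per dimension
    ss: float         # sum of squared norms of the points

    @classmethod
    def from_point(cls, p):
        return cls(1, list(p), sum(x * x for x in p))

    def merge(self, other):
        # Summaries are additive, so two groups combine without
        # revisiting any of the original points.
        return CF(self.n + other.n,
                  [a + b for a, b in zip(self.ls, other.ls)],
                  self.ss + other.ss)

    def centroid(self):
        return [s / self.n for s in self.ls]

    def radius(self):
        # Root-mean-square distance of the points from the centroid:
        # sqrt(SS / N - ||LS / N||^2), clamped against rounding error.
        c2 = sum(c * c for c in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))


if __name__ == "__main__":
    merged = CF.from_point((0.0, 0.0)).merge(CF.from_point((2.0, 0.0)))
    print(merged.n, merged.centroid(), merged.radius())
```

The design point is that the summary has fixed size regardless of how many points it covers, which is what lets memory use stay bounded as the dataset grows.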

