Results 1 - 7 of 7
Polaris: A System for Query, Analysis and Visualization of Multi-dimensional Relational Databases
IEEE Transactions on Visualization and Computer Graphics, 2002
Abstract - Cited by 93 (5 self)
In the last several years, large multi-dimensional databases have become common in a variety of applications such as data warehousing and scientific computing. Analysis and exploration tasks place significant demands on the interfaces to these databases. Because of the size of the data sets, dense graphical representations are more effective for exploration than spreadsheets and charts. Furthermore, because of the exploratory nature of the analysis, it must be possible for the analysts to change visualizations rapidly as they pursue a cycle involving first hypothesis and then experimentation.
Adaptive Sampling Methods for Scaling Up Knowledge Discovery Algorithms
Data Mining and Knowledge Discovery, 1999
Abstract - Cited by 35 (7 self)
Scalability is a key requirement for any KDD and data mining algorithm, and one of the biggest research challenges is to develop methods that allow the use of large amounts of data. One possible approach for dealing with huge amounts of data is to take a random sample and do data mining on it, since for many data mining applications approximate answers are acceptable. However, as argued by several researchers, random sampling is difficult to use because it is hard to determine an appropriate sample size. In this paper, we take a sequential sampling approach to solve this difficulty, and propose an adaptive sampling method that solves a general problem covering many actual problems arising in applications of discovery science. An algorithm following this method obtains examples sequentially in an online fashion, and it determines from the obtained examples whether it has already seen a large enough number of examples. Thus, sample size is not fixed a priori; instead, it adaptively depends on the situation. Due to this adaptiveness, if we are not in a worst-case situation, as fortunately happens in many practical applications, then we can solve the problem with a number of examples much smaller than that required in the worst case. We prove the correctness of our method and estimate its efficiency theoretically. To illustrate its usefulness, we consider one concrete example of using sampling, provide an algorithm based on our method, and show its efficiency by experimental evaluation.
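To illustrate the idea in the abstract above, here is a minimal Python sketch of sequential sampling with a data-dependent stopping rule. It is not the paper's algorithm: the function name `adaptive_estimate`, the Hoeffding-based half-width, and the relative-error stopping condition are all assumptions chosen for illustration.

```python
import math
import random


def adaptive_estimate(draw, eps, delta, max_samples=1_000_000):
    """Sequentially estimate the mean p of a 0/1-valued source.

    `draw` is a hypothetical callable returning 0/1 (or a bool) per example.
    Sampling stops as soon as the half-width alpha satisfies
    alpha <= eps * (p_hat - alpha), which implies |p_hat - p| <= eps * p
    whenever the confidence interval around p_hat holds.
    """
    successes = 0
    for t in range(1, max_samples + 1):
        successes += draw()
        p_hat = successes / t
        # Hoeffding half-width with a union bound over all steps t:
        # step t is allowed failure probability delta / (t * (t + 1)),
        # and these allowances sum to delta over t = 1, 2, ...
        alpha = math.sqrt(math.log(2 * t * (t + 1) / delta) / (2 * t))
        if alpha <= eps * (p_hat - alpha):
            return p_hat, t  # enough examples seen for this source
    # Assumed safety cap: give up adapting and return the current estimate.
    return successes / max_samples, max_samples


if __name__ == "__main__":
    rng = random.Random(0)
    # A frequent event and a rare event, same accuracy parameters.
    print(adaptive_estimate(lambda: rng.random() < 0.8, eps=0.1, delta=0.05))
    print(adaptive_estimate(lambda: rng.random() < 0.2, eps=0.1, delta=0.05))
```

Under these assumptions, a source with a larger success probability tends to meet the stopping condition after fewer draws than a rarer one, which is the sense in which the sample size depends on the situation rather than being fixed in advance.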
Scaling up a Boosting-Based Learner via Adaptive Sampling
In Proceedings of the Fourth Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2000
Abstract - Cited by 11 (5 self)
In this paper we present an experimental evaluation of a boosting-based learning system and show that it can be run efficiently over a large dataset. As its base learner, the system uses decision stumps: single-attribute decision trees with only two terminal nodes. To select the best decision stump at each iteration we use an adaptive sampling method. As a boosting algorithm, we use a modification of AdaBoost that is suitable to be combined with a base learner that does not use the entire dataset. We provide experimental evidence that our method is as accurate as the equivalent algorithm that uses the entire dataset, but much faster.
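The abstract above describes choosing a decision stump from a sample rather than from the full dataset. A rough Python sketch of that step follows. It is an assumption-laden simplification, not the paper's system: attributes are taken to be Boolean, labels are +1/-1, the helper name `best_stump` is hypothetical, and a fixed-size weighted resample stands in for the adaptive sampling method.

```python
import random


def best_stump(rows, labels, weights, sample_size, rng):
    """Return (error_rate, attribute, polarity) for the stump that makes
    the fewest mistakes on a weighted random subsample.

    A stump predicts `polarity` when the attribute is true and `-polarity`
    otherwise, so it has exactly two terminal nodes.
    """
    n_attrs = len(rows[0])
    # Draw example indices in proportion to their boosting weights
    # (fixed sample_size here; an adaptive scheme would choose it online).
    sample = rng.choices(range(len(rows)), weights=weights, k=sample_size)
    best = None
    for attr in range(n_attrs):
        for polarity in (1, -1):
            errors = sum(
                1
                for i in sample
                if (polarity if rows[i][attr] else -polarity) != labels[i]
            )
            rate = errors / sample_size
            if best is None or rate < best[0]:
                best = (rate, attr, polarity)
    return best


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data: the label is fully determined by attribute 2.
    rows = [[rng.random() < 0.5 for _ in range(5)] for _ in range(2000)]
    labels = [1 if r[2] else -1 for r in rows]
    weights = [1.0] * len(rows)
    print(best_stump(rows, labels, weights, sample_size=200, rng=rng))
```

Only 200 of the 2000 examples are inspected per call, which is where the speed-up over scanning the whole dataset would come from in this simplified setting.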
Simple Sampling Techniques for Discovery Science
2000
Abstract - Cited by 7 (2 self)
... this article. This work is supported in part by Grant-in-Aid for Scientific Research on Priority Areas (Discovery Science), 1999, the Ministry of Education, Science, Sports and Culture.
From Computational Learning Theory to Discovery Science
1999
Abstract - Cited by 5 (3 self)
Machine learning has been one of the important subjects of AI that is motivated by many real-world applications. In theoretical computer science, researchers have also introduced mathematical frameworks for investigating machine learning, and in these frameworks, many interesting results have been obtained. Now we are proceeding to a new stage to study how to apply these fruitful theoretical results to real problems. We point out in this paper that "adaptivity" is one of the important issues when we consider applications of learning techniques, and we propose one learning algorithm with this feature.
1 Introduction
Discovery science is a new area of computer science that aims at (i) developing efficient computational methods which enable automatic discoveries of scientific knowledge and decision making rules and (ii) understanding all the issues concerned with this goal. Of course, discovery science involves many areas, from practical to theoretical, of computer science. For exampl...
Sequential Sampling Algorithms: Unified Analysis and Lower Bounds
2001
Abstract - Cited by 5 (0 self)
Sequential sampling algorithms have recently attracted interest as a way to design scalable algorithms for data mining and KDD processes. In this paper, we identify an elementary sequential sampling task (estimation from examples), from which one can derive many other tasks appearing in practice. We present a generic algorithm to solve this task and an analysis of its correctness and running time that is simpler and more intuitive than those existing in the literature. For two specific tasks, frequency and advantage estimation, we derive lower bounds on running time in addition to the general upper bounds.
Faster Near-Optimal Reinforcement Learning: Adding Adaptiveness to the E³ Algorithm
In Algorithmic Learning Theory, 10th International Conference, ALT ’99, 1999
Abstract - Cited by 3 (0 self)
Recently, Kearns and Singh presented the first provably efficient and near-optimal algorithm for reinforcement learning in general Markov decision processes. One of the key contributions of the algorithm is its explicit treatment of the exploration-exploitation trade-off. In this paper, we show how the algorithm can be improved by replacing the exploration phase, which builds a model of the underlying Markov decision process by estimating the transition probabilities, with an adaptive sampling method more suitable for the problem. Our improvement is twofold. First, our theoretical bound on the worst-case time needed to converge to an almost optimal policy is significantly smaller. Second, due to the adaptiveness of the sampling method we use, we discuss how our algorithm might perform better in practice than the previous one.

