Results 1-10 of 11
Wrappers For Performance Enhancement And Oblivious Decision Graphs
, 1995
Cited by 116 (7 self)
In this doctoral dissertation, we study three basic problems in machine learning and two new hypothesis spaces with corresponding learning algorithms. The problems we investigate are: accuracy estimation, feature subset selection, and parameter tuning. The latter two problems are related and are studied under the wrapper approach. The hypothesis spaces we investigate are: decision tables with a default majority rule (DTMs) and oblivious read-once decision graphs (OODGs).
Static Versus Dynamic Sampling for Data Mining
 In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining
, 1996
Cited by 80 (0 self)
As data warehouses grow to the point where one hundred gigabytes is considered small, the computational efficiency of data-mining algorithms on large databases becomes increasingly important. Using a sample from the database can speed up the data-mining process, but this is only acceptable if it does not reduce the quality of the mined knowledge. To this end, we introduce the "Probably Close Enough" criterion to describe the desired properties of a sample. Sampling usually refers to the use of static statistical tests to decide whether a sample is sufficiently similar to the large database, in the absence of any knowledge of the tools the data miner intends to use. We discuss dynamic sampling methods, which take into account the mining tool being used and can thus give better samples. We describe dynamic schemes that observe a mining tool's performance on training samples of increasing size and use these results to determine when a sample is sufficiently large. We evaluate these sampl...
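The dynamic scheme described in this abstract (watch a mining tool's performance on samples of increasing size, then stop once the sample is large enough) might be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the doubling schedule, the `epsilon` threshold, and the toy accuracy curve are all assumptions made for the example.

```python
def dynamic_sample_size(data, evaluate, start=100, growth=2.0, epsilon=0.01):
    """Return a sample size at which the tool's accuracy has levelled off.

    Assumptions: `data` is already shuffled, so every prefix is a random
    sample, and `evaluate(sample)` is a hypothetical callback that trains the
    mining tool on `sample` and returns its estimated accuracy.
    """
    n = min(start, len(data))
    accuracy = evaluate(data[:n])
    while n < len(data):
        next_n = min(int(n * growth), len(data))
        next_accuracy = evaluate(data[:next_n])
        # Stop when the larger sample improves accuracy by less than epsilon.
        if next_accuracy - accuracy < epsilon:
            return n
        n, accuracy = next_n, next_accuracy
    return n


def toy_evaluate(sample):
    """Stand-in for a real mining tool: accuracy with diminishing returns."""
    return 0.9 - 0.5 * len(sample) ** -0.5


chosen = dynamic_sample_size(list(range(100_000)), toy_evaluate)
print(chosen)
```

With a real tool, `evaluate` would typically score the model on held-out data, and `epsilon` would reflect how much accuracy loss the analyst considers "close enough".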
Genetic Programming and Multi-Agent Layered Learning by Reinforcements
 In Genetic and Evolutionary Computation Conference
, 2002
Cited by 40 (4 self)
We present an adaptation of the standard genetic program (GP) to hierarchically decomposable, multi-agent learning problems. To break down a problem that requires cooperation of multiple agents, we use the team objective function to derive a simpler, intermediate objective function for pairs of cooperating agents. We apply GP to optimize first for the intermediate, then for the team objective function, using the final population from the earlier GP as the initial seed population for the next. This layered learning approach facilitates the discovery of primitive behaviors that can be reused and adapted towards complex objectives based on a shared team goal.
The Learning-Curve Sampling Method Applied to Model-Based Clustering
 Also in AI and Statistics
, 2001
Cited by 11 (0 self)
We examine the learning-curve sampling method, an approach for applying machine-learning algorithms to large data sets. The approach is based on the observation that the computational cost of learning a model increases as a function of the sample size of the training data, whereas the accuracy of a model has diminishing improvements as a function of sample size. Thus, the learning-curve sampling method monitors the increasing costs and performance as larger and larger amounts of data are used for training, and terminates learning when future costs outweigh future benefits. In this paper, we formalize the learning-curve sampling method and its associated cost-benefit tradeoff in terms of decision theory. In addition, we describe the application of the learning-curve sampling method to the task of model-based clustering via the expectation-maximization (EM) algorithm. In experiments on three real data sets, we show that the learning-curve sampling method produces models that are nearly as accurate as those trained on complete data sets, but with dramatically reduced learning times. Finally, we describe an extension of the basic learning-curve approach for model-based clustering that results in an additional speedup. This extension is based on the observation that the shape of the learning curve for a given model and data set is roughly independent of the number of EM iterations used during training. Thus, we run EM for only a few iterations to decide how many cases to use for training, and then run EM to full convergence once the number of cases is selected. Keywords: Learning-curve sampling method, clustering, scalability, decision theory, sampling
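The stopping rule in this abstract (end training once expected future costs outweigh expected future benefits) can be illustrated with a small sketch. It is not the paper's decision-theoretic formulation: it simply assumes the most recent increment's cost and benefit are a fair proxy for the next one. The names `train_and_score`, `cost_per_case`, and `value_per_score_unit`, as well as the toy score curve, are hypothetical.

```python
def learning_curve_sampling(sizes, train_and_score, cost_per_case, value_per_score_unit):
    """Return the training-set size at which learning is terminated.

    `sizes` is an increasing schedule of sample sizes, and `train_and_score(n)`
    is a hypothetical callback that trains on n cases and returns a quality
    score (higher is better). Costs and benefits share one arbitrary unit.
    """
    prev_n, prev_score = None, None
    for n in sizes:
        score = train_and_score(n)
        if prev_n is not None:
            benefit = value_per_score_unit * (score - prev_score)
            cost = cost_per_case * (n - prev_n)
            # Assumption: if the last increment did not pay for itself,
            # further increments will not either, so stop here.
            if benefit < cost:
                return n
        prev_n, prev_score = n, score
    return sizes[-1]


def toy_score(n):
    """Stand-in learning curve with diminishing improvements."""
    return 1.0 - n ** -0.5


stop_at = learning_curve_sampling(
    sizes=[1000, 2000, 4000, 8000, 16000],
    train_and_score=toy_score,
    cost_per_case=0.0005,
    value_per_score_unit=100.0,
)
print(stop_at)
```

For the EM extension mentioned in the abstract, `train_and_score` would run only a few EM iterations while choosing the size, followed by one full run to convergence at the selected size.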
Time Series Learning with Probabilistic Network Composites
 University of Illinois
, 1998
Cited by 10 (10 self)
The purpose of this research is to extend the theory of uncertain reasoning over time through integrated, multistrategy learning. Its focus is on decomposable, concept learning problems for classification of spatiotemporal sequences. Systematic methods of task decomposition using attribute-driven methods, especially attribute partitioning, are investigated. This leads to a novel and important type of unsupervised learning in which the feature construction (or extraction) step is modified to account for multiple sources of data and to systematically search for embedded temporal patterns. This modified technique is combined with traditional cluster definition methods to provide an effective mechanism for decomposition of time series learning problems. The decomposition process interacts with model selection from a collection of probabilistic models such as temporal artificial neural networks and temporal Bayesian networks. Models are chosen using a new quantitative (metric-based) approach that estimates expected performance of a learning architecture, algorithm, and mixture model on a newly defined subproblem. By mapping subproblems to customized configurations of probabilistic networks for time series learning, a hierarchical, supervised learning system with enhanced generalization quality can be automatically built. The system can improve data fusion
Modelling Classification Performance for Large Data Sets - An Empirical Study
, 2001
Cited by 5 (0 self)
For many learning algorithms, learning accuracy increases as the size of the training data increases, forming the well-known learning curve. Usually a learning curve can be fitted by interpolating or extrapolating some points on it with a specified model. The obtained learning curve can then be used to predict the maximum achievable learning accuracy or to estimate the amount of data needed to achieve an expected learning accuracy, both of which will be especially meaningful to data mining on large data sets. Although some models have been proposed to model learning curves, most of them do not test their applicability to large data sets. In this paper, we focus on this issue. We empirically compare six potentially useful models by fitting learning curves of two typical classification algorithms, C4.5 (decision tree) and LOG (logistic discrimination), on eight large UCI benchmark data sets. By using all available data for learning, we fit a full-length learning curve; by using a small portion of the data, we fit a part-length learning curve. The models are then compared in terms of two performances: (1) how well they fit a full-length learning curve, and (2) how well a fitted part-length learning curve can predict learning accuracy at the full length. Experimental results show that the power law ($y = a - b x^{-c}$) is the best among the six models in both the performances for the two algorithms and all the data sets. These results support the applicability of learning curves to data mining.
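As a rough illustration of fitting a part-length learning curve and extrapolating it, the sketch below fits a power law of the form y = a - b * x^(-c). Two things are assumptions: that form is a reading of the extraction-damaged formula in this abstract, and the fitting procedure (a grid search over c with closed-form least squares for a and b) is chosen for simplicity and is not taken from the paper. The data points are synthetic.

```python
def fit_power_law(xs, ys, c_grid=None):
    """Fit y = a - b * x**(-c) by grid search over c and least squares for a, b."""
    if c_grid is None:
        c_grid = [i / 100 for i in range(1, 201)]  # candidate exponents 0.01 .. 2.00
    n = len(xs)
    mean_y = sum(ys) / n
    best = None
    for c in c_grid:
        zs = [x ** -c for x in xs]  # for fixed c the model is linear in z = x**(-c)
        mean_z = sum(zs) / n
        var_z = sum((z - mean_z) ** 2 for z in zs)
        if var_z == 0:
            continue
        slope = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys)) / var_z
        a, b = mean_y - slope * mean_z, -slope
        sse = sum((a - b * z - y) ** 2 for z, y in zip(zs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    _, a, b, c = best
    return a, b, c


def predict(params, x):
    """Evaluate the fitted curve at training-set size x."""
    a, b, c = params
    return a - b * x ** -c


# Synthetic part-length curve generated from a = 0.9, b = 2.0, c = 0.5.
sizes = [100, 200, 400, 800, 1600, 3200]
accuracies = [0.9 - 2.0 * s ** -0.5 for s in sizes]

params = fit_power_law(sizes, accuracies)
full_length_estimate = predict(params, 6400)
print(params, full_length_estimate)
```

Here the fitted `a` plays the role of the maximum achievable accuracy, and `predict` estimates the accuracy at a larger, not yet used, training-set size.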
Noise Tolerance of EP-Based Classifiers
Cited by 2 (1 self)
Abstract. Emerging Pattern (EP)-based classifiers are a new type of classifier based on itemsets whose occurrence in one dataset varies significantly from that in another. These classifiers are very promising and have been shown to perform comparably with some popular classifiers. In this paper, we conduct two experiments to study the noise tolerance of EP-based classifiers. A primary concern is to ascertain whether overfitting occurs in them. Our results highlight the fact that the aggregating approach in constructing EP-based classifiers prevents them from overfitting. We further conclude that perfect training accuracy does not necessarily lead to overfitting of a classifier as long as there exists a suitable mechanism, such as an aggregating approach, to counterbalance any propensity to overfit.
Modelling modelled
 S.E.E.D. Journal
Cited by 1 (1 self)
A model is one of the most fundamental concepts: it is a formal and generalized explanation of a phenomenon. Only with models can we bridge the particulars and predict the unknown. Virtually all our intellectual work revolves around finding models, evaluating models, and using models. Because models are so pervasive, it makes sense to take a look at modelling itself. We will approach this problem, of course, by
Random Sampling for Classification on Large Data Sets
, 2002
Acknowledgement. First, I must express my gratitude to Dr Liu Bing, who has been a knowledgeable supervisor and an objective critic of my study. Working under his patient guidance has become a privilege and a pleasure. I have gained immensely from his deep insight, creative thinking, and rigorous scholarship. I have also received his financial support to attend conferences and an RA job in his project. Without his guidance and help, I would probably never have been able to complete the thesis. I must also thank Dr. Liu Huan, who was my previous supervisor, for his valuable guidance and continuous help in my study, especially his strong recommendation for my previous GA job. I have learned from him the skill of keeping the research focused and on schedule. My sincere thanks also go to Dr. Hu Feifang, who spent a lot of time helping me with statistics when he lectured at NUS. Many ideas in this thesis were inspired by discussions with him. Some conceptual errors were corrected thanks to his effort.