Results 1 - 10
of
33
Issues in Stacked Generalization
- Journal of Artificial Intelligence Research
, 1999
"... Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked gener ..."
Abstract
-
Cited by 71 (1 self)
- Add to MetaCart
Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones.
Inductive and Bayesian learning in medical diagnosis
- Applied Artificial Intelligence
, 1993
"... Abstract. Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper two di erent approaches to machine learning in medical appli-cations are compared: the system for inductive learning of decision trees Assistant, and t ..."
Abstract
-
Cited by 56 (9 self)
- Add to MetaCart
Abstract. Although successful in medical diagnostic problems, inductive learning systems were not widely accepted in medical practice. In this paper two di erent approaches to machine learning in medical appli-cations are compared: the system for inductive learning of decision trees Assistant, and the naive Bayesian classi er. Both methodologies were tested in four medical diagnostic problems: localization of primary tumor, prognostics of recurrence of breast cancer, diagnosis of thyroid diseases, and rheumatology. The accuracy of automatically acquired diagnostic knowledge from stored data records is compared and the interpretation of the knowledge and the explanation ability of the classi cation process of each system is discussed. Surprisingly, thenaiveBayesian classi er is superior to Assistant in classi cation accuracy and explanation ability, while the interpretation of the acquired knowledge seems to be equally valuable. In ad-dition, two extensions to naive Bayesian classi er are brie y described: dealing with continuous attributes, and discovering the dependencies among attributes.
Feature Subset Selection by Bayesian networks: a comparison with genetic and sequential algorithms
"... In this paper we perform a comparison among FSS-EBNA, a randomized, populationbased and evolutionary algorithm, and two genetic and other two sequential search approaches in the well known Feature Subset Selection (FSS) problem. In FSS-EBNA, the FSS problem, stated as a search problem, uses the E ..."
Abstract
-
Cited by 35 (13 self)
- Add to MetaCart
In this paper we perform a comparison among FSS-EBNA, a randomized, populationbased and evolutionary algorithm, and two genetic and other two sequential search approaches in the well known Feature Subset Selection (FSS) problem. In FSS-EBNA, the FSS problem, stated as a search problem, uses the EBNA (Estimation of Bayesian Network Algorithm) search engine, an algorithm within the EDA (Estimation of Distribution Algorithm) approach. The EDA paradigm is born from the roots of the GA community in order to explicitly discover the relationships among the features of the problem and not disrupt them by genetic recombination operators. The EDA paradigm avoids the use of recombination operators and it guarantees the evolution of the population of solutions and the discovery of these relationships by the factorization of the probability distribution of best individuals in each generation of the search. In EBNA, this factorization is carried out by a Bayesian network induced by a chea...
Optimization by learning and simulation of Bayesian and Gaussian networks
, 1999
"... Estimation of Distribution Algorithms (EDA) constitute an example of stochastics heuristics based on populations of individuals every of which encode the possible solutions to the optimization problem. These populations of individuals evolve in succesive generations as the search progresses -- organ ..."
Abstract
-
Cited by 34 (6 self)
- Add to MetaCart
Estimation of Distribution Algorithms (EDA) constitute an example of stochastics heuristics based on populations of individuals every of which encode the possible solutions to the optimization problem. These populations of individuals evolve in succesive generations as the search progresses -- organized in the same way as most evolutionary computation heuristics. In opposition to most evolutionary computation paradigms which consider the crossing and mutation operators as essential tools to generate new populations, EDA replaces those operators by the estimation and simulation of the joint probability distribution of the selected individuals. In this work, after making a review of the different approaches based on EDA for problems of combinatorial optimization as well as for problems of optimization in continuous domains, we propose new approaches based on the theory of probabilistic graphical models to solve problems in both domains. More precisely, we propose to adapt algorit...
An Analysis of Rule Evaluation Metrics
- Proceedings of the 20th International Conference on Machine Learning (ICML-03
, 2003
"... In this paper we analyze the most popular evaluation metrics for separate-and-conquer rule learning algorithms. Our results show that all commonly used heuristics, including accuracy, weighted relative accuracy, entropy, Gini index and information gain, are equivalent to one of two fundamental ..."
Abstract
-
Cited by 32 (9 self)
- Add to MetaCart
In this paper we analyze the most popular evaluation metrics for separate-and-conquer rule learning algorithms. Our results show that all commonly used heuristics, including accuracy, weighted relative accuracy, entropy, Gini index and information gain, are equivalent to one of two fundamental prototypes: precision, which tries to optimize the area under the ROC curve for unknown costs, and a cost-weighted difference between covered positive and negative examples, which tries to find the optimal point under known or assumed costs. We also show that a straightforward generalization of the m-estimate trades off these two prototypes.
Machine learning for medical diagnosis: history, state of the art and perspective
- Artificial Intelligence in Medicine
, 2001
"... The paper provides an overview of the development of intelligent data analysis in medicine from a machine learning perspective: a historical view, a state of the art view and a view on some future trends in this subfield of applied artificial intelligence. The paper is not intended to provide a com- ..."
Abstract
-
Cited by 25 (0 self)
- Add to MetaCart
The paper provides an overview of the development of intelligent data analysis in medicine from a machine learning perspective: a historical view, a state of the art view and a view on some future trends in this subfield of applied artificial intelligence. The paper is not intended to provide a com-prehensive overview but rather describes some subeareas and directions which from my personal point of view seem to be important for applying machine learning in medical diagnosis. In the historical overview I emphasize the naive Bayesian classifier, neural networks and decision trees. I present a comparison of some state of the art systems, representatives from each branch of machine learning, when applied to several medical diagnostic tasks. The future trends are illustrated by two case studies. The first describes a recently developed method for dealing with reliability of decisions of classifiers, which seems to be promising for intelligent data analysis in medicine. The second describes an ap-proach to using machine learning in order to verify some unexplained phenomena from complementary medicine, which is not (yet) approved by the orthodox medical community but could in the future play an important role in overall medical diagnosis and treatment. 1
Stacked Generalization: when does it work?
- in Procs. International Joint Conference on Artificial Intelligence
, 1997
"... Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked gener ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms.
Knowledge Discovery In Databases: An Attribute-Oriented Rough Set Approach
, 1995
"... Knowledge Discovery in Databases (KDD) is an active research area with the promise for a high payoff in many business and scientific applications. The grand challenge of knowledge discovery in databases is to automatically process large quantities of raw data, identify the most significant and meani ..."
Abstract
-
Cited by 23 (0 self)
- Add to MetaCart
Knowledge Discovery in Databases (KDD) is an active research area with the promise for a high payoff in many business and scientific applications. The grand challenge of knowledge discovery in databases is to automatically process large quantities of raw data, identify the most significant and meaningful patterns, and present this knowledge in an appropriate form for achieving the user's goal. Knowledge discovery systems face challenging problems from the real-world databases which tend to be very large, redundant, noisy and dynamic. Each of these problems has been addressed to some extent within machine learning, but few, if any, systems address them all. Collectively handling these problems while producing useful knowledge efficiently and effectively is the main focus of the thesis. In this thesis, we develop an attribute-oriented rough set approach for knowledge discovery in databases. The method adopts the artificial intelligent "learning from examples" paradigm combined with rough...
Probabilistic information retrieval approach for ranking of database query results
- ACM Transactions on Database Systems (TODS
, 2006
"... We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries, by adapting and applying principles of probabilistic models from Information Retrieval for structured ..."
Abstract
-
Cited by 18 (4 self)
- Add to MetaCart
We investigate the problem of ranking the answers to a database query when many tuples are returned. In particular, we present methodologies to tackle the problem for conjunctive and range queries, by adapting and applying principles of probabilistic models from Information Retrieval for structured data. Our solution is domain independent and leverages data and workload statistics and correlations. We evaluate the quality of our approach with a user survey on a real database. Furthermore, we present and experimentally evaluate algorithms to efficiently retrieve the top ranked results, which demonstrate the feasibility of our ranking system.
Discretization of Continuous-Valued Attributes and Instance-Based Learning
, 1994
"... Recent work on discretization of continuous-valued attributes in learning decision trees has produced some positive results. This paper adopts the idea of discretization of continuous-valued attributes and applies it to instance-based learning (Aha, 1990; Aha, Kibler & Albert, 1991). Our experiments ..."
Abstract
-
Cited by 15 (4 self)
- Add to MetaCart
Recent work on discretization of continuous-valued attributes in learning decision trees has produced some positive results. This paper adopts the idea of discretization of continuous-valued attributes and applies it to instance-based learning (Aha, 1990; Aha, Kibler & Albert, 1991). Our experiments have shown that instance-based learning (IBL) usually performs well in continuous-valued attribute domains and poorly in nominal attribute domains. Cost and Salzberg (1993) have devised the modified value-difference metric (MVDM) that raises the performance of IBL in nominal attribute domains. This paper explores a way in which continuous-valued attributes and nominal attributes can be treated cohesively in IBL. An algorithm which combines the discretization of continuous-valued attributes and IB1 (Aha, Kibler & Albert, 1991) using the modified value-difference metric is introduced. The empirical results show that the proposed algorithm, IB1-MVDM* achieves a substantial improvement over C4....

