Learning from Examples: Generation and Evaluation of Decision Trees for Software Resource Analysis
- IEEE Trans. Software Eng
, 1988
Cited by 51 (5 self)
Solutions to the problem of learning from examples will have far-reaching benefits, and therefore, the problem is one of the most widely studied in the field of machine learning. The purpose of this study is to investigate a general solution method for the problem, the automatic generation of decision (or classification) trees. The approach is to provide insights through in-depth empirical characterization and evaluation of decision trees for one problem domain, software resource data analysis. The purpose of the decision trees is to identify classes of objects (software modules) that had high development effort or faults, where "high" was defined to be in the uppermost quartile relative to past data. Sixteen software systems ranging from 3000 to 112,000 source lines have been selected for analysis from a NASA production environment. The collection and analysis of 74 attributes (or metrics), for over 4700 objects, capture a multitude of information about the objects: development effort...
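The "uppermost quartile relative to past data" definition of "high" can be illustrated with a minimal sketch. The function names and the choice of the standard library's default quantile method are assumptions made for illustration; the abstract does not say how the quartile boundary was computed.

```python
import statistics

def high_threshold(past_values):
    """Third-quartile cut point of past data (assumed meaning of 'high')."""
    # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
    return statistics.quantiles(past_values, n=4)[2]

def label_high(values, threshold):
    """Mark each value as high (True) if it lies above the threshold."""
    return [v > threshold for v in values]

# Hypothetical effort figures for past modules, used only as example data.
past_effort = list(range(1, 101))
cut = high_threshold(past_effort)
labels = label_high(past_effort, cut)
```

With this kind of labeling, roughly a quarter of past modules fall in the "high" class, which is the target a decision tree would then try to predict from the metrics.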
Dimensionality Reduction Using Genetic Algorithms
, 2000
Cited by 51 (4 self)
Pattern recognition generally requires that objects be described in terms of a set of measurable features. The selection and quality of the features representing each pattern have a considerable bearing on the success of subsequent pattern classification. Feature extraction is the process of deriving new features from the original features in order to reduce the cost of feature measurement, increase classifier efficiency, and allow higher classification accuracy. Many current feature extraction techniques involve linear transformations of the original pattern vectors to new vectors of lower dimensionality. While this is useful for data visualization and increasing classification efficiency, it does not necessarily reduce the number of features that must be measured, since each new feature may be a linear combination of all of the features in the original pattern vector. Here we present a new approach to feature extraction in which feature selection, feature extraction, and classifier training are performed simultaneously using a genetic algorithm. The genetic algorithm optimizes a vector of feature weights, which are used to scale the individual features in the original pattern vectors in either a linear or a nonlinear fashion. A masking vector is also employed to perform simultaneous selection of a subset of the features. We employ this technique in combination with the k-nearest-neighbor classification rule, and compare the results with classical feature selection and extraction techniques, including sequential floating forward feature selection and linear discriminant analysis. We also present results for identification of favorable water binding sites on protein surfaces, an important problem in biochemistry and drug design.
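How a weight vector and a masking vector might be scored with a k-nearest-neighbor rule can be sketched as follows. This shows only a candidate's fitness evaluation, not the genetic algorithm itself. The linear scaling, the squared Euclidean distance, the leave-one-out accuracy used as fitness, and all function names are assumptions for illustration rather than details taken from the abstract.

```python
from collections import Counter

def transform(x, weights, mask):
    # Keep only features whose mask bit is 1, each scaled by its weight.
    return [w * v for v, w, m in zip(x, weights, mask) if m]

def knn_predict(train, query, k):
    # train is a list of (vector, label); neighbors ranked by squared distance.
    nearest = sorted(
        train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], query))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def fitness(data, weights, mask, k=1):
    """Leave-one-out k-NN accuracy for one candidate (weights, mask)."""
    scaled = [(transform(x, weights, mask), y) for x, y in data]
    hits = 0
    for i, (x, y) in enumerate(scaled):
        rest = scaled[:i] + scaled[i + 1:]
        hits += knn_predict(rest, x, k) == y
    return hits / len(scaled)

# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
data = [([0.0, 10.0], "a"), ([0.1, -10.0], "a"),
        ([1.0, 10.1], "b"), ([1.1, -10.1], "b")]
all_features = fitness(data, [1.0, 1.0], [1, 1])
masked = fitness(data, [1.0, 1.0], [1, 0])
```

In this toy case, masking out the noisy second feature raises the leave-one-out accuracy, which is the kind of signal a search over weights and masks could exploit.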
Induction in Noisy Domains
, 1994
Cited by 38 (5 self)
This paper examines the induction of classification rules from examples using real-world data. Real-world data is almost always characterized by two features, which are important for the design of an induction algorithm. Firstly, there is often noise present, for example, due to imperfect measuring equipment used to collect the data. Secondly, the description language is often incomplete, such that examples with identical descriptions in the language will not always be members of the same class. Many induction systems make the 'noiseless domain' assumption that the examples do not contain errors and the description language is complete, and consequently constrain their search for rules to those for which no counterexamples exist in the data used for induction. However, in real-world domains, correlations between attributes and classes in a data set are rarely without exceptions. To locate such correlations and induce rules describing them, it is also necessary to consider rules which may not classify all the training examples correctly. This paper first discusses some of the problems presented by noise and proposes a top-down induction algorithm for induction in real-world domains. Secondly, an experimental comparison of this algorithm with other induction systems is presented using three sets of real-world medical data.
StatLog: Comparison of Classification Algorithms on Large Real-World Problems
, 1995
Cited by 37 (0 self)
This paper describes work in the StatLog project comparing classification algorithms on large real-world problems. The algorithms compared were from: symbolic learning (CART, C4.5, NewID, AC2, ITrule, Cal5, CN2), statistics (Naive Bayes, k-nearest neighbor, kernel density, linear discriminant, quadratic discriminant, logistic regression, projection pursuit, Bayesian networks), and neural networks (back-propagation, radial basis functions). Twelve datasets were used: five from image analysis, three from medicine, and two each from engineering and finance. We found that which algorithm performed best depended critically on the dataset investigated. We therefore developed a set of dataset descriptors to help decide which algorithms are suited to particular datasets. For example, datasets with extreme distributions (skew > 1 and kurtosis > 7) and with many binary/categorical attributes (> 38%) tend to favor symbolic learning algorithms. We suggest how classification algorith...
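The dataset descriptors named in this abstract can be sketched as follows. The moment-based definitions of skewness and kurtosis (with kurtosis equal to 3 for a normal distribution), the helper names, and the reading of the three thresholds as strict lower bounds that must all hold together are assumptions, not details confirmed by the abstract.

```python
def _central_moment(xs, order):
    # Population central moment; a constant column would give zero variance
    # and make the ratios below undefined.
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** order for x in xs) / len(xs)

def skewness(xs):
    # Third standardized moment: m3 / m2^(3/2).
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5

def kurtosis(xs):
    # Fourth standardized moment: m4 / m2^2 (not excess kurtosis).
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2

def suggests_symbolic(skew, kurt, categorical_fraction):
    # Hypothetical helper encoding the abstract's example as a rule of thumb.
    return skew > 1 and kurt > 7 and categorical_fraction > 0.38

column = [0, 0, 0, 0, 10]  # example attribute values, invented for the sketch
s, k = skewness(column), kurtosis(column)
```

The abstract presents this only as a tendency, so a rule like `suggests_symbolic` is best read as a starting heuristic rather than a selection procedure.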
Inductive Policy: The Pragmatics of Bias Selection
- MACHINE LEARNING
, 1995
Cited by 37 (9 self)
This paper extends the currently accepted model of inductive bias by identifying six categories of bias and separates inductive bias from the policy for its selection (the inductive policy). We analyze existing "bias selection" systems, examining the similarities and differences in their inductive policies, and identify three techniques useful for building inductive policies. We then present a framework for representing and automatically selecting a wide variety of biases and describe experiments with an instantiation of the framework addressing various pragmatic tradeoffs of time, space, accuracy, and the cost of errors. The experiments show that a common framework can be used to implement policies for a variety of different types of bias selection, such as parameter selection, term selection, and example selection, using similar techniques. The experiments also show that different tradeoffs can be made by the implementation of different policies; for example, from the same data different rule sets can be learned based on different tradeoffs of accuracy versus the cost of erroneous predictions.
Machine learning for medical diagnosis: history, state of the art and perspective
- Artificial Intelligence in Medicine
, 2001
Cited by 25 (0 self)
The paper provides an overview of the development of intelligent data analysis in medicine from a machine learning perspective: a historical view, a state of the art view and a view on some future trends in this subfield of applied artificial intelligence. The paper is not intended to provide a comprehensive overview but rather describes some subareas and directions which from my personal point of view seem to be important for applying machine learning in medical diagnosis. In the historical overview I emphasize the naive Bayesian classifier, neural networks and decision trees. I present a comparison of some state of the art systems, representatives from each branch of machine learning, when applied to several medical diagnostic tasks. The future trends are illustrated by two case studies. The first describes a recently developed method for dealing with reliability of decisions of classifiers, which seems to be promising for intelligent data analysis in medicine. The second describes an approach to using machine learning in order to verify some unexplained phenomena from complementary medicine, which is not (yet) approved by the orthodox medical community but could in the future play an important role in overall medical diagnosis and treatment.
Learning Logical Exceptions In Chess
, 1994
Cited by 16 (2 self)
This thesis is about inductive learning, or learning from examples. The goal has been to investigate ways of improving learning algorithms. The chess end-game "King and Rook against King" (KRK) was chosen, and a number of benchmark learning tasks were defined within this domain, sufficient to over-challenge state-of-the-art learning algorithms. The tasks comprised learning rules to distinguish (1) illegal positions and (2) legal positions won optimally in a fixed number of moves. From our experimental results with task (1) the best-performing algorithm was selected and a number of improvements were made. The principal extension to this generalisation method was to alter its representation from classical logic to a non-monotonic formalism. A novel algorithm was developed in this framework to implement rule specialisation, relying on the invention of new predicates. When experimentally tested this combined approach did not at first deliver the expected performance gains due to restrictio...
SHYSTER: A Pragmatic Legal Expert System
, 1993
Cited by 13 (2 self)
Most legal expert systems attempt to implement complex models of legal reasoning. Yet the utility of a legal expert system lies not in the extent to which it simulates a lawyer's approach to a legal problem, but in the quality of its predictions and of its arguments. A complex model of legal reasoning is not necessary: a successful legal expert system can be based upon a simplified model of legal reasoning.
Some researchers have based their systems upon a jurisprudential approach to the law, yet lawyers are patently able to operate without any jurisprudential insight. A useful legal expert system should be capable of producing advice similar to that which one might get from a lawyer, so it should operate at the same pragmatic level of abstraction as does a lawyer—not at the more philosophical level of jurisprudence.
A legal expert system called SHYSTER has been developed to demonstrate that a useful legal expert system can be based upon a pragmatic approach to the law. SHYSTER has a simple representation structure which simplifies the problem of knowledge acquisition. Yet this structure is complex enough for SHYSTER to produce useful advice.
SHYSTER is a case-based legal expert system (although it has been designed so that it can be linked with a rule-based system to form a hybrid legal expert system). Its advice is based upon an examination of, and an argument about, the similarities and differences between cases. SHYSTER attempts to model the way in which lawyers argue with cases, but it does not attempt to model the way in which lawyers decide which cases to use in those arguments. Instead, it employs statistical techniques to quantify the similarity between cases. It decides which cases to use in argument, and what prediction it will make, on the basis of that similarity measure.
SHYSTER is of a general design: it provides advice in areas of case law that have been specified by a legal expert using a specification language. Four different, and disparate, areas of law have been specified for SHYSTER, and its operation has been tested in each of those legal domains.
Testing of SHYSTER in these four domains indicates that it is exceptionally good at predicting results, and fairly good at choosing cases with which to construct its arguments. SHYSTER demonstrates the viability of a pragmatic approach to legal expert system design.
Classifier Learning from Noisy Data as Probabilistic Evidence Combination
, 1992
Cited by 10 (1 self)
This paper presents an approach to learning from noisy data that views the problem as one of reasoning under uncertainty, where prior knowledge of the noise process is applied to compute a posteriori probabilities over the hypothesis space. In preliminary experiments this maximum a posteriori (MAP) approach exhibits a learning rate advantage over the C4.5 algorithm that is statistically significant.
Introduction
The classifier learning problem is to use a set of labeled training data to induce a classifier that will accurately classify as yet unseen, unclassified testing data. Some approaches assume that the training data is correct [Mitchell, 1982]. Some assume that noise is present and simply tolerate it [Breiman et al., 1984; Quinlan, 1987]. Another approach is to exploit knowledge of the presence and nature of noise [Hirsh, 1990b]. This paper takes the third approach, and views classifier learning from noisy data as a problem of reasoning under uncertainty, where knowle...
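The general idea of choosing a maximum a posteriori hypothesis under a known noise process can be sketched as follows. This is a textbook-style illustration, not the paper's algorithm: the finite hypothesis set, the independent label-flip noise model with a known rate, and every name in the code are assumptions.

```python
import math

def log_posterior(hypothesis, data, prior, noise_rate):
    # Assumed noise model: each label is flipped independently with
    # probability noise_rate, so an agreeing example has likelihood
    # 1 - noise_rate and a disagreeing one has likelihood noise_rate.
    log_likelihood = sum(
        math.log(1 - noise_rate if hypothesis(x) == y else noise_rate)
        for x, y in data
    )
    return math.log(prior) + log_likelihood

def map_hypothesis(hypotheses, data, noise_rate=0.1):
    """Return the name of the hypothesis with the highest posterior.

    hypotheses maps a name to a (prior, classifier) pair.
    """
    return max(
        hypotheses,
        key=lambda name: log_posterior(
            hypotheses[name][1], data, hypotheses[name][0], noise_rate
        ),
    )

# Two hypothetical threshold classifiers with equal priors.
candidates = {
    "x>=2": (0.5, lambda x: x >= 2),
    "x>=5": (0.5, lambda x: x >= 5),
}
# Invented training data; the example at x=4 looks like a flipped label.
examples = [(1, False), (2, True), (3, True), (4, False), (6, True)]
best = map_hypothesis(candidates, examples)
```

Unlike a learner that demands consistency with every example, this scoring keeps a hypothesis that disagrees with a few labels as long as those disagreements are plausible under the assumed noise rate.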
Machine Learning in Prognosis of the Femoral Neck Fracture Recovery
, 1996
Cited by 8 (1 self)
We compare the performance of several machine learning algorithms in the problem of prognostics of the femoral neck fracture recovery: the K-nearest neighbours algorithm, the semi-naive Bayesian classifier, backpropagation with weight elimination learning of the multilayered neural networks, the LFC (lookahead feature construction) algorithm, and the Assistant-I and Assistant-R algorithms for top down induction of decision trees using information gain and RELIEFF as search heuristics, respectively. We compare the prognostic accuracy and the explanation ability of different classifiers. Among the different algorithms the semi-naive Bayesian classifier and Assistant-R seem to be the most appropriate. We analyze the combination of decisions of several classifiers for solving prediction problems and show that the combined classifier improves both performance and the explanation ability.
Keywords: learning from examples, estimating attributes, explanation ability, impurity function, empirica...

