Results 1–10 of 28
Decision templates for multiple classifier fusion: an experimental comparison
Pattern Recognition, 2001
Cited by 107 (9 self)

Abstract
Multiple classifier fusion may generate more accurate classification than each of the constituent classifiers. Fusion is often based on fixed combination rules like the product and average. Only under strict probabilistic conditions can these rules be justified. We present here a simple rule for adapting the class combiner to the application. c decision templates (one per class) are estimated with the same training set that is used for the set of classifiers. These templates are then matched to the decision profile of new incoming objects by some similarity measure. We compare 11 versions of our model with 14 other techniques for classifier fusion on the Satimage and Phoneme datasets from the ELENA database. Our results show that decision templates based on integral-type measures of similarity are superior to the other schemes on both data sets.
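The scheme the abstract describes can be sketched as follows. This is a hedged illustration of the decision-template idea, not the paper's exact procedure: the function names and toy numbers are invented, and squared Euclidean distance stands in for the "integral-type" similarity measures the paper compares.

```python
import numpy as np

def fit_decision_templates(profiles, labels, n_classes):
    """Estimate one decision template per class: the mean decision
    profile (L classifiers x c classes of soft outputs) over that
    class's training samples."""
    return np.stack([profiles[labels == k].mean(axis=0)
                     for k in range(n_classes)])

def classify(profile, templates):
    """Match a new object's decision profile to each template by
    squared Euclidean distance; smallest distance wins."""
    dists = ((templates - profile) ** 2).sum(axis=(1, 2))
    return int(np.argmin(dists))

# Toy setup: 2 classifiers, 2 classes, soft outputs per training sample.
profiles = np.array([
    [[0.9, 0.1], [0.8, 0.2]],   # class 0 samples
    [[0.8, 0.2], [0.7, 0.3]],
    [[0.2, 0.8], [0.1, 0.9]],   # class 1 samples
    [[0.3, 0.7], [0.2, 0.8]],
])
labels = np.array([0, 0, 1, 1])
templates = fit_decision_templates(profiles, labels, n_classes=2)
print(classify(np.array([[0.85, 0.15], [0.75, 0.25]]), templates))  # → 0
```

The combiner is "adapted to the application" in the sense that the templates are estimated from the same training data as the base classifiers, rather than fixed in advance like the product or average rule.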
Using Two-Class Classifiers for Multiclass Classification
Cited by 36 (1 self)

Abstract
The generalization from two-class classification to multiclass classification is not straightforward for discriminants which are not based on density estimation. Simple combining methods use voting, but this has the drawback of inconsistent labelings and ties. More advanced methods map the discriminant outputs to approximate posterior probability estimates and combine these, while other methods use error-correcting output codes. In this paper we show the possibilities of simple generalizations of two-class classification, using voting and combinations of approximate posterior probabilities.
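The voting approach mentioned above can be sketched as a one-vs-one scheme. This is a generic illustration, not the paper's implementation; the threshold discriminants and tie-breaking choice are assumptions.

```python
import numpy as np

def one_vs_one_vote(x, pairwise_classifiers, n_classes):
    """Majority voting over all (i, j) two-class discriminants.
    Ties fall to np.argmax's first maximum (the smallest label),
    illustrating the tie/labeling drawback noted in the abstract."""
    votes = np.zeros(n_classes, dtype=int)
    for (i, j), clf in pairwise_classifiers.items():
        votes[clf(x)] += 1          # each pairwise classifier returns i or j
    return int(np.argmax(votes))

# Hypothetical threshold discriminants on a 1-D feature.
pairwise = {
    (0, 1): lambda x: 0 if x < 1.0 else 1,
    (0, 2): lambda x: 0 if x < 2.0 else 2,
    (1, 2): lambda x: 1 if x < 3.0 else 2,
}
print(one_vs_one_vote(2.5, pairwise, n_classes=3))  # → 1
```

Mapping discriminant outputs to approximate posteriors and summing them, instead of casting hard votes, is the natural refinement the abstract contrasts with this scheme.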
Markov Monitoring with Unknown States
IEEE Journal on Selected Areas in Communications, 1993
Cited by 27 (1 self)

Abstract
Pattern recognition methods and hidden Markov models can be effective tools for online health monitoring of communications systems. Previous work has assumed that the states in the system model are exhaustive. This can be a significant drawback in real-world fault monitoring applications, where it is difficult, if not impossible, to model all the possible fault states of the system in advance. In this paper a method is described for extending the Markov monitoring approach to allow for unknown or novel states which cannot be accounted for when the model is being designed. The method is described and evaluated on data from one of the Jet Propulsion Laboratory's Deep Space Network antennas. The experimental results indicate that the method is both practical and effective, allowing both discrimination between known states and detection of previously unknown fault conditions.
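The core decision rule can be sketched as a likelihood-threshold test. This is a hedged stand-in for the paper's extension, not JPL's actual implementation: the threshold here is hand-picked, whereas in practice it would be calibrated from training data.

```python
import numpy as np

def monitor(log_likelihoods, threshold):
    """Pick the best-matching known state, but declare a novel fault
    when even the best known state model explains the observation
    poorly (log-likelihood below the calibrated threshold)."""
    best = int(np.argmax(log_likelihoods))
    return "novel" if log_likelihoods[best] < threshold else best

print(monitor(np.array([-3.2, -1.1, -8.0]), threshold=-5.0))  # → 1
print(monitor(np.array([-9.4, -7.6, -8.0]), threshold=-5.0))  # → novel
```

This combines the two capabilities the abstract claims: discrimination between known states (the argmax) and detection of previously unknown conditions (the threshold test).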
Unsupervised Discrimination of Clustered Data via Optimization of Binary Information Gain
Advances in Neural Information Processing Systems, 1993
Cited by 21 (9 self)

Abstract
We present the information-theoretic derivation of a learning algorithm that clusters unlabelled data with linear discriminants. In contrast to methods that try to preserve information about the input patterns, we maximize the information gained from observing the output of robust binary discriminators implemented with sigmoid nodes. We derive a local weight adaptation rule via gradient ascent in this objective, demonstrate its dynamics on some simple data sets, relate our approach to previous work, and suggest directions in which it may be extended.
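A local rule in this spirit can be sketched as follows. This is a hedged approximation of the idea, not the paper's exact derivation: each sigmoid unit's output is pushed away from its running mean, so that on average the binary discriminator conveys close to one bit about its input.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_gain_step(W, X, y_bar, lr=0.1, decay=0.9):
    """One local weight update (illustrative, not the paper's rule):
    move weights so each sigmoid output separates from its running
    mean, increasing the information gained from observing it."""
    y = 1.0 / (1.0 + np.exp(-X @ W))               # sigmoid discriminator outputs
    y_bar = decay * y_bar + (1 - decay) * y.mean(axis=0)
    W = W + lr * X.T @ (y - y_bar) / len(X)        # gradient-ascent-style step
    return W, y_bar

# Two 1-D clusters of unlabelled points; one sigmoid unit learns to split them.
X = np.r_[rng.normal(-2, 0.3, (50, 1)), rng.normal(2, 0.3, (50, 1))]
W, y_bar = np.array([[0.1]]), np.array([0.5])
for _ in range(200):
    W, y_bar = info_gain_step(W, X, y_bar)
```

After training, the unit's output is near 0 on one cluster and near 1 on the other, i.e. the discriminant has discovered the cluster boundary without labels.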
Reweighting Monte Carlo Mixtures
Journal of the American Statistical Association, 1991
Cited by 9 (1 self)

Abstract
Markov chain Monte Carlo (e.g., the Metropolis algorithm, Hastings algorithm, and Gibbs sampler) is a general multivariate simulation method applicable to a wide range of problems. It permits sampling from any stochastic process whose density is known up to a constant of proportionality. The Gibbs sampler has recently received much attention as a method of simulating from posterior distributions in Bayesian inference, but Markov chain Monte Carlo is no less important in frequentist inference, with applications in maximum likelihood, hypothesis testing, and the parametric bootstrap. It is most useful when combined with importance reweighting, so that a Monte Carlo sample from one distribution can be used for inference about many distributions. In Bayesian inference, reweighting permits the calculation of posteriors corresponding to a range of priors using a Monte Carlo sample from just one posterior. In likelihood inference, reweighting permits the calculation of the whole likelihood function using a Monte Carlo sample from just one distribution in the model. Given this estimate of the likelihood, a parametric bootstrap calculation of the sampling distribution of the maximum likelihood estimate can be done using just one more Monte Carlo sample. Although reweighting can save much calculation, it does not work well unless the distribution being reweighted places appreciable mass in all regions of interest. Hence it is often not advisable to sample from a distribution in the model. Reweighting a mixture of distributions in the model may perform much better. But using such a mixture gives rise to another problem when the densities are known only up to constants of proportionality. These normalizing constants must be calculated to obtain the mixture density. Direct Monte Carl...
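The basic reweighting mechanism the abstract relies on can be sketched with self-normalized importance weights. This toy uses i.i.d. normal draws rather than a Markov chain, and unnormalized log-densities to mirror the "known up to a constant of proportionality" setting; it is an illustration of the principle, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def reweighted_mean(sample, log_p_target, log_p_proposal):
    """Self-normalized importance reweighting: a Monte Carlo sample
    from one distribution serves for inference about another.
    Normalizing constants cancel in the weight ratio."""
    logw = log_p_target(sample) - log_p_proposal(sample)
    w = np.exp(logw - logw.max())       # stabilize before exponentiating
    return (w * sample).sum() / w.sum()

# Sample from N(0, 1); estimate the mean of N(1, 1) by reweighting.
x = rng.normal(0.0, 1.0, size=200_000)
est = reweighted_mean(x,
                      lambda s: -0.5 * (s - 1.0) ** 2,   # N(1,1), unnormalized
                      lambda s: -0.5 * s ** 2)           # N(0,1), unnormalized
print(round(est, 1))  # → 1.0
```

The caveat in the abstract shows up directly here: had the proposal placed little mass near the target's support, the weights would be dominated by a few points and the estimate would degrade, which is why the paper advocates reweighting a mixture instead.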
Estimating Risk and Rate Levels, Ratios, and Differences in Case-Control Studies
2001

Cited by 6 (2 self)

Abstract
Classic (or "cumulative") case-control sampling designs do not admit inferences about quantities of interest other than risk ratios, and then only by making the rare events assumption. Probabilities, risk differences, and other quantities cannot be computed without knowledge of the population incidence fraction. Similarly, density (or "risk set") case-control sampling designs do not allow inferences about quantities other than the rate ratio. Rates, rate differences, cumulative rates, risks, and other quantities cannot be estimated unless auxiliary information about the underlying cohort, such as the number of controls in each full risk set, is available. Most scholars who have considered the issue recommend reporting more than just risk and rate ratios, but the auxiliary population information needed to do this is not usually available. We address this problem by developing methods that allow valid inferences about all relevant quantities of interest from either type of case-control study when completely ignorant of, or only partially knowledgeable about, relevant auxiliary population information.
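The role of the population incidence fraction can be illustrated with a simple prior-correction sketch. This is not the paper's method (which handles ignorance or partial knowledge of such auxiliary information); it merely shows how a known incidence fraction turns a case-control probability into a population risk, from which risk differences become computable.

```python
def corrected_risk(p_cc, ybar, tau):
    """Convert a probability estimated from a case-control sample
    (case fraction ybar) into a population risk, given the population
    incidence fraction tau. Names and form are illustrative; the
    correction rescales the odds by the ratio of true to sampled
    prior odds."""
    odds = (p_cc / (1 - p_cc)) * (tau / (1 - tau)) / (ybar / (1 - ybar))
    return odds / (1 + odds)

# With a 50/50 case-control sample, a 50/50 prediction maps back to
# the population incidence itself.
print(round(corrected_risk(0.5, ybar=0.5, tau=0.1), 3))  # → 0.1
```

A risk difference then follows as `corrected_risk(p1, ybar, tau) - corrected_risk(p0, ybar, tau)` for two covariate profiles, which is exactly the kind of quantity the abstract notes is unavailable without the auxiliary information.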
Analysis of contingency tables by ideal point discriminant analysis
Psychometrika, 1987
Cited by 2 (0 self)

Abstract
Cross-classified data are frequently encountered in behavioral and social science research. The log-linear model and dual scaling (correspondence analysis) are two representative methods for analyzing such data. An alternative method, based on ideal point discriminant analysis (DA), is proposed for the analysis of contingency tables; in a certain sense it encompasses the two existing methods. A variety of interesting structures can be imposed on rows and columns of the tables through manipulations of predictor variables and/or direct constraints on model parameters. This, along with maximum likelihood estimation of the model parameters, allows interesting model comparisons. This is illustrated by the analysis of several data sets.
Sisterhood of Classifiers: A Comparative Study of Naive Bayes and Noisy-Or Networks
Cited by 1 (0 self)

Abstract
Classification is a task central to many machine learning problems. In this paper we examine two Bayesian network classifiers, the naive Bayes and the noisy-or models. They are of particular interest because of their simple structures. We compare them on two dimensions: expressive power and ability to learn. As it turns out, naive Bayes, noisy-or, and logistic regression classifiers all have equivalent expressiveness. We show mathematical derivations of how to transform a classifier in one model into the other two. These classifiers differ in their ability to learn, though. We conducted an experiment confirming the intuition that naive Bayes performs better than noisy-or when the data fits its independence assumptions, and vice versa. However, we still do not have a clear set of criteria for determining under exactly what conditions each classifier would excel. Further study of the strengths and weaknesses of each classifier should provide deeper insight into how to improve the current models. One possible extension would be to combine the naive Bayes and noisy-or models so that the network will more closely depict the actual relationship between the attributes.
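The equivalence between naive Bayes and logistic regression can be made concrete for binary features: both are linear in the log-odds, so naive Bayes parameters map exactly onto a logistic weight vector and bias. The sketch below shows that direction of the transformation (the noisy-or case from the paper is analogous but omitted here); parameter names and numbers are illustrative.

```python
import numpy as np

def nb_to_logistic(prior1, theta1, theta0):
    """Map binary-feature naive Bayes parameters to an equivalent
    logistic-regression (w, b): theta1[i] = P(x_i=1 | class 1),
    theta0[i] = P(x_i=1 | class 0). The log-odds of the naive Bayes
    posterior is exactly w @ x + b."""
    theta1, theta0 = np.asarray(theta1), np.asarray(theta0)
    w = np.log(theta1 / theta0) - np.log((1 - theta1) / (1 - theta0))
    b = (np.log(prior1 / (1 - prior1))
         + np.log((1 - theta1) / (1 - theta0)).sum())
    return w, b

# The logistic posterior matches the naive Bayes posterior exactly.
w, b = nb_to_logistic(0.5, [0.9, 0.2], [0.3, 0.6])
x = np.array([1, 0])
logistic = 1 / (1 + np.exp(-(w @ x + b)))
nb = (0.5 * 0.9 * 0.8) / (0.5 * 0.9 * 0.8 + 0.5 * 0.3 * 0.4)
print(np.isclose(logistic, nb))  # → True
```

Equal expressiveness in this sense is exactly why, as the abstract notes, the models differ only in how they learn from data, not in which decision boundaries they can represent.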
Efficient Learning by Combining Confidence-Rated Classifiers to Incorporate Unlabeled Medical Data
Cited by 1 (1 self)

Abstract
In this paper, we propose a new dynamic learning framework that requires a small amount of labeled data in the beginning, then incrementally discovers informative unlabeled data to be hand-labeled and incorporates them into the training set to improve learning performance. This approach has great potential to reduce the training expense in many medical image analysis applications. The main contributions lie in a new strategy to combine confidence-rated classifiers learned on different feature sets and a robust way to evaluate the “informativeness” of each unlabeled example. Our framework is applied to the problem of classifying microscopic cell images. The experimental results show that 1) our strategy is more effective than simply multiplying the predicted probabilities, 2) the error rate of high-confidence predictions is much lower than the average error rate, and 3) hand-labeling informative examples with low-confidence predictions improves performance efficiently, and the performance difference from hand-labeling all unlabeled data is very small.
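The two contributions can be sketched together. This is a hedged stand-in, not the paper's actual strategy: a confidence-weighted average over per-feature-set posteriors (contrasted with simply multiplying probabilities), and an informativeness score that flags low-confidence examples for hand-labeling. All names and numbers below are illustrative.

```python
import numpy as np

def combine_confidence_rated(probs, confidences):
    """Confidence-weighted average of the posteriors produced by
    classifiers trained on different feature sets. A more reliable
    classifier (higher confidence) dominates the combination."""
    probs, conf = np.asarray(probs), np.asarray(confidences)
    return (conf[:, None] * probs).sum(axis=0) / conf.sum()

def informativeness(p):
    """Examples whose combined prediction is least confident are the
    most informative candidates for hand-labeling."""
    return 1.0 - p.max()

# Two classifiers (different feature sets), one unlabeled example.
p = combine_confidence_rated([[0.9, 0.1], [0.6, 0.4]], [0.8, 0.2])
print(p.round(2))  # → [0.84 0.16]
```

Ranking the unlabeled pool by `informativeness` and hand-labeling from the top mirrors the incremental discovery loop the abstract describes.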