Structure learning in random fields for heart motion abnormality detection
In CVPR, 2008
Cited by 40 (5 self)
Coronary Heart Disease can be diagnosed by assessing the regional motion of the heart walls in ultrasound images of the left ventricle. Even for experts, ultrasound images are difficult to interpret, leading to high intra-observer variability. Previous work indicates that, to approach this problem, the interactions between the different heart regions and their overall influence on the clinical condition of the heart need to be considered. To do this, we propose a method for jointly learning the structure and parameters of conditional random fields, formulating these tasks as a convex optimization problem. We consider block-L1 regularization for each set of features associated with an edge, and formalize an efficient projection method to find the globally optimal penalized maximum likelihood solution. We perform extensive numerical experiments comparing the presented method with related methods that approach the structure learning problem differently. We verify the robustness of our method on echocardiograms collected in routine clinical practice at one hospital.
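A minimal sketch of the kind of projection step the abstract refers to. It assumes two things the abstract does not specify: the L2 norm is used inside each edge's block of weights (an L1,2 group penalty), and the problem is written in a constrained form where the sum of block norms must stay below a radius. Under those assumptions, the Euclidean projection reduces to projecting the vector of block norms onto an L1 ball (here with the standard sort-based method) and then rescaling each block. Blocks whose norm is driven to zero correspond to edges removed from the graph. The function names are hypothetical.

```python
import numpy as np


def project_l1_ball(v, radius):
    """Euclidean projection of vector v onto {x : sum(|x|) <= radius} (sort-based method)."""
    v = np.asarray(v, dtype=float)
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                 # magnitudes, largest first
    cssv = np.cumsum(u)
    idx = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * idx > cssv - radius)[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)   # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def project_group_l1_ball(blocks, radius):
    """Project weight blocks onto {W : sum_g ||w_g||_2 <= radius}.

    Assumption: one block per CRF edge, L2 norm inside each block (L1,2 penalty).
    """
    blocks = [np.asarray(b, dtype=float) for b in blocks]
    norms = np.array([np.linalg.norm(b) for b in blocks])
    new_norms = project_l1_ball(norms, radius)   # shrink the vector of block norms
    out = []
    for b, n, m in zip(blocks, norms, new_norms):
        # Rescale each block to its new norm; a zero norm means the edge is dropped.
        out.append(b * (m / n) if n > 0 else np.zeros_like(b))
    return out


blocks = project_group_l1_ball([[3.0, 4.0], [0.0, 1.0]], radius=3.0)
print(blocks)
```

Here the blocks [3, 4] and [0, 1] have norms 5 and 1; projecting onto a ball of radius 3 shrinks the first block to [1.8, 2.4] and zeroes the second, which in a CRF would remove that edge from the learned structure.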
nFOIL: Integrating Naïve Bayes and FOIL
2005
Cited by 24 (3 self)
We present the system nFOIL. It tightly integrates the naïve Bayes learning scheme with the inductive logic programming rule learner FOIL. In contrast to previous combinations, which have employed naïve Bayes only for post-processing the rule sets, nFOIL employs the naïve Bayes criterion to directly guide its search. Experimental evidence shows that nFOIL performs better than both its baseline algorithm FOIL and the post-processing approach, and is at the same time competitive with more sophisticated approaches.
Irrelevance and parameter learning in Bayesian networks
Artificial Intelligence, 1996
Cited by 24 (2 self)
Bayesian network classifiers have been widely used for classification problems. Given a fixed Bayesian network structure, parameter learning can take two different approaches: generative and discriminative learning. While generative parameter learning is more efficient, discriminative parameter learning is more effective. In this paper, we propose a simple, efficient, and effective discriminative parameter learning method, called Discriminative Frequency Estimate (DFE), which learns parameters by discriminatively computing frequencies from data. Empirical studies show that the DFE algorithm integrates the advantages of both generative and discriminative learning: it performs as well as the state-of-the-art discriminative parameter learning method ELR in accuracy, but is significantly more efficient.
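The abstract describes DFE only as computing frequencies discriminatively. One reading of that idea, sketched below under explicit assumptions (a naive Bayes structure, discrete features coded 0..V-1, Laplace-style initial counts, and a fixed number of passes over the data), is that each training example adds to its frequency counts an amount equal to how much probability the current model misses on its true class, 1 - P(c | e), instead of a full count of 1 as in generative frequency estimation. The names `dfe_fit` and `nb_posterior` are hypothetical, and the paper's exact update and stopping rule may differ.

```python
import numpy as np


def nb_posterior(class_counts, feat_counts, x):
    """Class posterior of a naive Bayes model whose parameters are frequency tables.

    class_counts: shape (C,); feat_counts: shape (F, C, V), where feat_counts[f, c, v]
    is the (possibly fractional) count of feature f taking value v with class c.
    """
    log_p = np.log(class_counts / class_counts.sum())
    for f, v in enumerate(x):
        log_p = log_p + np.log(feat_counts[f, :, v] / feat_counts[f].sum(axis=1))
    p = np.exp(log_p - log_p.max())   # stabilize before normalizing
    return p / p.sum()


def dfe_fit(X, y, n_classes, n_values, n_passes=5, prior=1.0):
    """Hypothetical DFE-style fit for a naive Bayes structure (assumptions as stated above)."""
    X = np.asarray(X, dtype=int)
    y = np.asarray(y, dtype=int)
    n_features = X.shape[1]
    class_counts = np.full(n_classes, prior)                    # Laplace-style start
    feat_counts = np.full((n_features, n_classes, n_values), prior)
    for _ in range(n_passes):
        for x, c in zip(X, y):
            # Discriminative weight: probability the current model misses on the true class.
            loss = 1.0 - nb_posterior(class_counts, feat_counts, x)[c]
            class_counts[c] += loss
            feat_counts[np.arange(n_features), c, x] += loss
    return class_counts, feat_counts


class_counts, feat_counts = dfe_fit([[0], [0], [1], [1]], [0, 0, 1, 1], n_classes=2, n_values=2)
print(nb_posterior(class_counts, feat_counts, [0]))
```

Replacing `loss` with 1.0 and running a single pass (`n_passes=1`) reduces the loop to ordinary Laplace-smoothed frequency counting, which is the generative baseline the abstract contrasts DFE with.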
Feature-Based Pronunciation Modeling for Automatic Speech Recognition
In Proc. HLT/NAACL, 2005
Cited by 17 (8 self)
Spoken language, especially conversational speech, is characterized by great variability in word pronunciation, including many variants that differ grossly from dictionary prototypes. This is one factor in the poor performance of automatic speech recognizers on conversational speech. One approach to handling this variation consists of expanding the dictionary with phonetic substitution, insertion, and deletion rules. Common rule sets, however, typically leave many pronunciation variants unaccounted for and increase word confusability due to the coarse granularity of phone units. We present an alternative approach, in which many types of variation are explained by representing a pronunciation as multiple streams of linguistic features rather than a single stream of phones. Features may correspond to the positions of the speech articulators, such as the lips and tongue, or to acoustic or perceptual categories. By
Efficient discriminative learning of Bayesian network classifier via boosted augmented naive Bayes
International Conference on Machine Learning, 2005
Cited by 16 (8 self)
The use of Bayesian networks for classification problems has received significant recent attention. Although computationally efficient, the standard maximum likelihood learning method tends to be suboptimal due to the mismatch between its optimization criterion (data likelihood) and the actual goal of classification (label prediction). Recent approaches to optimizing classification performance during parameter or structure learning show promise, but lack the favorable computational properties of maximum likelihood learning. In this paper we present the Boosted Augmented Naive Bayes (BAN) classifier. We show that a combination of discriminative data weighting with generative training of intermediate models can yield a computationally efficient method for discriminative parameter learning and structure selection.
On Discriminative Bayesian Network Classifiers and Logistic Regression
Machine Learning, 2005
Cited by 15 (1 self)
Discriminative learning of the parameters in the naive Bayes model is known to be equivalent to a logistic regression problem. Here we show that the same fact holds for much more general Bayesian network models, as long as the corresponding network structure satisfies a certain graph-theoretic property. The property holds not only for naive Bayes but also for more complex structures such as tree-augmented naive Bayes (TAN) and for mixed diagnostic-discriminative structures. Our results imply that, for networks satisfying our property, the conditional likelihood cannot have local, non-global maxima, so the global maximum can be found by simple local optimization methods. We also show that if this property does not hold, then in general the conditional likelihood can have local, non-global maxima. We illustrate our theoretical results by empirical experiments with local optimization in a conditional naive Bayes model. Furthermore, we provide a heuristic strategy for reducing the number of parameters and relevant features in such models. For many data sets, we obtain good results with heavily pruned submodels containing far fewer parameters than the original naive Bayes model.
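The equivalence stated in the first sentence rests on a standard fact: with a binary class and discrete attributes, the naive Bayes log-odds log P(c=1|x)/P(c=0|x) is a sum of one term per observed attribute value, i.e. a linear function of one-hot encoded inputs. The sketch below (hypothetical function names; binary class and discrete features assumed) converts naive Bayes parameters into a logistic-regression bias and weight table and checks that both give the same posterior. It illustrates only this one direction of the correspondence for plain naive Bayes, not the paper's general graph-theoretic condition.

```python
import numpy as np


def nb_posterior_class1(prior1, theta, x):
    """P(c=1 | x) for a binary-class naive Bayes model.

    prior1 = P(c=1); theta[c][f][v] = P(x_f = v | c) for discrete features.
    """
    p1, p0 = prior1, 1.0 - prior1
    for f, v in enumerate(x):
        p1 *= theta[1][f][v]
        p0 *= theta[0][f][v]
    return p1 / (p0 + p1)


def nb_to_logistic(prior1, theta):
    """Bias b and weight table w[f, v] of the equivalent logistic model on one-hot inputs.

    b = log P(c=1)/P(c=0),  w[f, v] = log P(x_f=v | c=1) / P(x_f=v | c=0).
    """
    theta = np.asarray(theta, dtype=float)
    bias = np.log(prior1 / (1.0 - prior1))
    weights = np.log(theta[1] / theta[0])   # shape (F, V)
    return bias, weights


def logistic_posterior(bias, weights, x):
    """Sigmoid of a linear score; summing w[f, x_f] equals a dot product with one-hot(x)."""
    score = bias + sum(weights[f, v] for f, v in enumerate(x))
    return 1.0 / (1.0 + np.exp(-score))


theta = [[[0.7, 0.3], [0.2, 0.8]],   # P(x_f = v | c = 0), illustrative values
         [[0.4, 0.6], [0.5, 0.5]]]   # P(x_f = v | c = 1), illustrative values
bias, weights = nb_to_logistic(0.3, theta)
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, nb_posterior_class1(0.3, theta, x), logistic_posterior(bias, weights, x))
```

Because every naive Bayes parameterization lands in this logistic family, maximizing the conditional likelihood over naive Bayes parameters searches the same set of conditional distributions as logistic regression on one-hot features.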
Classification using Hierarchical Naïve Bayes models
Machine Learning 2006, 2002
Cited by 11 (1 self)
Classification problems have a long history in the machine learning literature. One of the simplest, and yet most consistently well-performing, families of classifiers is the Naïve Bayes models. However, an inherent problem with these classifiers is the assumption that all attributes used to describe an instance are conditionally independent given the class of that instance. When this assumption is violated (which is often the case in practice), it can reduce classification accuracy due to "information double-counting" and interaction omission.
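A small numeric illustration of "information double-counting": if an attribute is duplicated, the two copies are perfectly dependent, yet a Naïve Bayes model multiplies in the same likelihood twice and becomes overconfident. The probabilities below are made up for illustration, and `nb_posterior` is a hypothetical helper.

```python
def nb_posterior(prior, likelihoods):
    """Naive Bayes posterior over classes.

    prior[c] = P(c); likelihoods holds one entry per attribute, each a list
    of P(observed value of that attribute | c) for every class c.
    """
    scores = list(prior)
    for lik in likelihoods:
        scores = [s * l for s, l in zip(scores, lik)]   # conditional independence assumed
    total = sum(scores)
    return [s / total for s in scores]


prior = [0.5, 0.5]
attr = [0.8, 0.4]                            # illustrative P(x = observed | c) for two classes
once = nb_posterior(prior, [attr])           # attribute used once
twice = nb_posterior(prior, [attr, attr])    # same attribute duplicated: evidence counted twice
print(once, twice)
```

With one copy the posterior of the first class is 2/3; with the duplicate it rises to 0.8, even though no new evidence was observed.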
Robust Bayesian linear classifier ensembles
Proc. 16th European Conf. Machine Learning, Lecture Notes in Computer Science, 2005
Cited by 10 (0 self)
Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Probabilistic classifiers usually restrict the set of models that can be learnt in order to lower computational cost. In these restricted spaces, where incorrect modelling assumptions may be made, uniform averaging sometimes performs even better than Bayesian model averaging (BMA). Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. Finally, we empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
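A sketch of how expectation maximization can fit maximum a posteriori weights for a linear mixture P(y|x) = sum_k w_k P_k(y|x). It assumes a symmetric Dirichlet prior with parameter alpha >= 1 on the weights; the abstract does not state which prior is used, and the second, constrained-optimization algorithm is not shown. The input is a matrix whose entry (i, k) is the probability that model k assigns to the true label of training example i. Starting EM from uniform weights makes the relationship explicit: uniform averaging is the particular case w_k = 1/K. Function names are hypothetical.

```python
import numpy as np


def mixture_predict(weights, probs):
    """Linear mixture of per-model class distributions; probs has shape (K, C)."""
    return np.asarray(weights, dtype=float) @ np.asarray(probs, dtype=float)


def em_mixture_weights(lik, alpha=2.0, n_iter=200):
    """MAP mixture weights by EM under an assumed symmetric Dirichlet(alpha) prior, alpha >= 1.

    lik[i, k] = probability that model k assigns to the true label of training example i.
    """
    lik = np.asarray(lik, dtype=float)
    n, k = lik.shape
    w = np.full(k, 1.0 / k)                 # uniform averaging as the starting point
    for _ in range(n_iter):
        resp = w * lik                       # E-step: unnormalized responsibilities
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: MAP update; the weights sum to 1 by construction.
        w = (resp.sum(axis=0) + alpha - 1.0) / (n + k * (alpha - 1.0))
    return w


lik = [[0.9, 0.5], [0.8, 0.5], [0.95, 0.4]]   # illustrative values; model 0 fits better
w = em_mixture_weights(lik)
print(w)
```

In the one-dependence-estimator setting of the abstract, each column of `lik` would hold the predictions of one estimator; any set of probabilistic classifiers can be plugged in the same way. The Dirichlet term keeps every weight strictly positive, so no model is discarded outright.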
Learning graphical models for hypothesis testing
In Proc. 14th IEEE Statist. Signal Process. Workshop, 2007
Cited by 9 (3 self)
Sparse graphical models have proven to be a flexible class of multivariate probability models for approximating high-dimensional distributions. In this paper, we propose techniques to exploit this modeling ability for binary classification by discriminatively learning such models from labeled training data, i.e., using both positive and negative samples to optimize the structures of the two models. We explain why it is difficult to adapt existing generative methods, and propose an alternative method consisting of two parts. First, we develop a novel method to learn tree-structured graphical models which optimizes an approximation of the log-likelihood ratio. We also formulate a joint objective to learn a nested sequence of optimal forest-structured models. Second, we construct a classifier by using ideas from boosting to learn a set of discriminative trees. The final classifier can be interpreted as a likelihood ratio test between two models with a larger set of pairwise features. We use cross-validation to determine the optimal number of edges in the final model. The algorithm presented in this paper also provides a method to identify a subset of the edges that are most salient for discrimination. Experiments show that the proposed procedure outperforms generative methods such as Tree Augmented Naïve Bayes and Chow-Liu as well as their boosted counterparts.
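For reference, the generative Chow-Liu baseline named in the last sentence builds a maximum-weight spanning tree whose edge weights are empirical mutual information values. The sketch below (discrete data as a list of tuples; hypothetical function names) shows that baseline only. The paper's method instead optimizes an approximation of the log-likelihood ratio between the positive and negative models, which is not reproduced here.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between discrete columns i and j."""
    n = len(data)
    pij = Counter((row[i], row[j]) for row in data)
    pi = Counter(row[i] for row in data)
    pj = Counter(row[j] for row in data)
    # p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (pi[a] * pj[b])) for (a, b), c in pij.items())


def chow_liu_tree(data):
    """Maximum-weight spanning tree (Kruskal) with mutual information as edge weight."""
    d = len(data[0])
    edges = sorted(((mutual_information(data, i, j), i, j)
                    for i, j in combinations(range(d), 2)), reverse=True)
    parent = list(range(d))                  # union-find forest over variables

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]    # path halving
            u = parent[u]
        return u

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                         # skip edges that would close a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


data = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]   # column 1 copies column 0
print(chow_liu_tree(data))
```

In a two-class generative setup, one such tree would be fit to the positive samples and another to the negative samples, and the classifier would compare their likelihoods.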