Results 1–10 of 16
A Bayesian system integrating expression data with sequence patterns for localizing proteins: comprehensive application to the yeast genome
J. Mol. Biol., 2000
Abstract

Cited by 74 (21 self)
We develop a probabilistic system for predicting the subcellular localization of proteins and estimating the relative population of the various compartments in yeast. Our system employs a Bayesian approach, updating a protein's probability of being in a compartment based on a diverse range of 30 features. These range from specific motifs (e.g. signal sequences or HDEL) to overall properties of a sequence (e.g. surface composition or isoelectric point) to whole-genome data (e.g. absolute mRNA expression levels or their fluctuations). The strength of our approach is the easy integration of many features, particularly the whole-genome expression data. We construct a training and testing set of ~1300 yeast proteins with an experimentally known localization by merging, filtering, and standardizing the annotation in the MIPS, Swiss-Prot and YPD databases, and we achieve 75% accuracy on individual protein predictions using this dataset. Moreover, we are able to estimate the relative protein population of the various compartments without requiring a definite localization for every protein. This approach, which is based on an ...
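The core update the abstract describes can be sketched as a single application of Bayes' rule; the paper chains one such update per feature, which is exactly a naive Bayes product over the feature set. All numbers below are hypothetical illustrations, not values from the paper:

```python
def bayes_update(prior, likelihoods):
    """One Bayes-rule update: posterior(c) is proportional to
    prior(c) * P(observed feature | c), renormalized."""
    posterior = {c: prior[c] * likelihoods[c] for c in prior}
    z = sum(posterior.values())
    return {c: p / z for c, p in posterior.items()}

# Hypothetical prior over three yeast compartments.
prior = {"cytoplasm": 0.5, "nucleus": 0.3, "mitochondrion": 0.2}
# Hypothetical likelihood of observing one motif feature per compartment.
likelihoods = {"cytoplasm": 0.05, "nucleus": 0.02, "mitochondrion": 0.01}
posterior = bayes_update(prior, likelihoods)  # cytoplasm becomes more probable
```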
On supervised selection of Bayesian networks
In UAI '99, 1999
Abstract

Cited by 18 (6 self)
Given a set of possible models (e.g., Bayesian network structures) and a data sample, in the unsupervised model selection problem the task is to choose the most accurate model with respect to the domain joint probability distribution. In contrast, in supervised model selection it is known a priori that the chosen model will be used in the future for prediction tasks involving more "focused" predictive distributions. Although focused predictive distributions can be produced from the joint probability distribution by marginalization, in practice the best model in the unsupervised sense does not necessarily perform well in supervised domains. In particular, the standard marginal likelihood score is a criterion for the unsupervised task and, although frequently used for supervised model selection as well, does not perform well in such tasks. In this paper we study the performance of the marginal likelihood score empirically in supervised Bayesian network selection tasks by using a large number of publicly available classification data sets, and compare the results to those obtained by alternative model selection criteria, including empirical cross-validation methods, an approximation of a supervised marginal likelihood measure, and a supervised version of Dawid's prequential (predictive sequential) principle. The results demonstrate that the marginal likelihood score does not perform well for supervised model selection, while the best results are obtained by using Dawid's prequential approach.
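A supervised selection criterion of the kind the abstract compares against marginal likelihood is cross-validated predictive accuracy: score each candidate model by how well it predicts held-out labels. A minimal sketch, where the fold scheme, toy data, and `fit_threshold` candidate model are all hypothetical:

```python
def kfold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def cv_accuracy(fit, data, k=5):
    """Supervised score: average held-out accuracy over k folds."""
    scores = []
    for train, test in kfold_indices(len(data), k):
        model = fit([data[i] for i in train])
        correct = sum(model(x) == y for x, y in (data[i] for i in test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

# Toy data: y = 1 exactly when x > 0; the candidate model matches it.
data = [(x, int(x > 0)) for x in range(-10, 10)]
def fit_threshold(train):
    return lambda x: int(x > 0)

score = cv_accuracy(fit_threshold, data, k=5)  # perfect model scores 1.0
```

Selecting the candidate with the highest `cv_accuracy` is the supervised analogue of picking the highest marginal likelihood.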
Maximum entropy and the glasses you are looking through
In: Proceedings of the Sixteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI 2000), 2000
Abstract

Cited by 11 (6 self)
We give an interpretation of the Maximum Entropy (MaxEnt) Principle in game-theoretic terms. Based on this interpretation, we make a formal distinction between different ways of applying Maximum Entropy distributions. MaxEnt has frequently been criticized on the grounds that it leads to highly representation-dependent results. Our distinction allows us to avoid this problem in many cases.
Mixnets: Factored Mixtures of Gaussians in Bayesian Networks with Mixed Continuous and Discrete Variables
2000
Abstract

Cited by 7 (2 self)
Recently developed techniques have made it possible to quickly learn accurate probability density functions from data in low-dimensional continuous spaces. In particular, mixtures of Gaussians can be fitted to data very quickly using an accelerated EM algorithm that employs multiresolution kd-trees (Moore, 1999). In this paper, we propose a kind of Bayesian network in which low-dimensional mixtures of Gaussians over different subsets of the domain's variables are combined into a coherent joint probability model over the entire domain. The network is also capable of modeling complex dependencies between discrete and continuous variables without requiring discretization of the continuous variables. We present efficient heuristic algorithms for automatically learning these networks from data, and perform comparative experiments illustrating how well these networks model real scientific data and synthetic data. We also briefly discuss some possible improvements to the networks, as well as possible applications.
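Fitting a mixture of Gaussians with EM, the building block the abstract starts from, can be sketched in one dimension as follows. This is a plain EM loop, not the accelerated kd-tree variant of Moore (1999); the data and initialization are illustrative:

```python
import math
import random

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_gaussians(xs, iters=50):
    """EM for a two-component 1-D Gaussian mixture (unaccelerated)."""
    mu = [min(xs), max(xs)]          # crude initialization at the extremes
    var = [1.0, 1.0]
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w[k] * gauss_pdf(x, mu[k], var[k]) for k in range(2)]
            z = sum(p)
            resp.append([pk / z for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(xs)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2
                             for r, x in zip(resp, xs)) / nk, 1e-6)
    return w, mu, var

random.seed(0)
xs = ([random.gauss(0.0, 1.0) for _ in range(200)]
      + [random.gauss(6.0, 1.0) for _ in range(200)])
w, mu, var = em_two_gaussians(xs)  # means recover roughly 0 and 6
```

The mixnets of the paper combine several such low-dimensional fits over variable subsets into one joint model.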
Local sparsity control for Naive Bayes with extreme misclassification costs
Abstract

Cited by 4 (2 self)
In applications of data mining characterized by highly skewed misclassification costs, certain types of errors become virtually unacceptable. This limits the utility of a classifier to a range in which such constraints can be met. Naive Bayes, which has proven to be very useful in text mining applications due to its high scalability, can be particularly affected. Although its 0/1 loss tends to be small, its misclassifications are often made with apparently high confidence. Aside from efforts to better calibrate Naive Bayes scores, it has been shown that its accuracy depends on document sparsity, and feature selection can lead to marked improvement in classification performance. Traditionally, sparsity is controlled globally, and the result for any particular document may vary. In this work we examine the merits of local sparsity control for Naive Bayes in the context of highly asymmetric misclassification costs. In experiments with three benchmark document collections we demonstrate clear advantages of document-level feature selection. In the extreme cost setting, multinomial Naive Bayes with local sparsity control is able to outperform even some of the recently proposed effective improvements to the Naive Bayes classifier. There are also indications that local feature selection may be preferable in different cost settings.
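Document-level (local) sparsity control can be read as: before scoring a document, keep only its m highest-scoring features, so every document ends up equally sparse. A hypothetical sketch; the `local_sparsify` helper, term counts, and score values are illustrative, not the paper's method:

```python
def local_sparsify(doc_counts, feature_score, m):
    """Keep only the m highest-scoring features present in one document
    (document-level sparsity control)."""
    ranked = sorted(doc_counts, key=lambda f: feature_score.get(f, 0.0),
                    reverse=True)
    keep = set(ranked[:m])
    return {f: c for f, c in doc_counts.items() if f in keep}

# Hypothetical term counts for one document, and global feature scores
# (e.g. information-gain values estimated on a training set).
doc = {"bayes": 3, "the": 10, "soccer": 1, "naive": 2}
scores = {"bayes": 2.1, "naive": 1.8, "soccer": 0.9, "the": 0.01}
sparse = local_sparsify(doc, scores, m=2)  # only "bayes" and "naive" survive
```

Global feature selection would instead fix one vocabulary for the whole corpus, leaving some documents with many surviving features and others with almost none.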
Integrating learning from examples into the search for diagnostic policies
Artificial Intelligence, 1998
Abstract

Cited by 4 (0 self)
This paper studies the problem of learning diagnostic policies from training examples. A diagnostic policy is a complete description of the decision-making actions of a diagnostician (i.e., tests followed by a diagnostic decision) for all possible combinations of test results. An optimal diagnostic policy is one that minimizes the expected total cost, which is the sum of measurement costs and misdiagnosis costs. In most diagnostic settings, there is a tradeoff between these two kinds of costs. This paper formalizes diagnostic decision making as a Markov Decision Process (MDP). The paper introduces a new family of systematic search algorithms based on the AO* algorithm to solve this MDP. To make AO* efficient, the paper describes an admissible heuristic that enables AO* to prune large parts of the search space. The paper also introduces several greedy algorithms, including some improvements over previously published methods. The paper then addresses the question of learning diagnostic policies from examples. When the probabilities of diseases and test results are computed from training data, there is a great danger of overfitting. To reduce overfitting, regularizers are integrated into the search algorithms. Finally, the paper compares the proposed methods on five benchmark diagnostic data sets. The studies show that in most cases the systematic search methods produce better diagnostic policies than the greedy methods. In addition, the studies show that for training sets of realistic size, the systematic search algorithms are practical on today's desktop computers.
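The measurement-cost/misdiagnosis-cost tradeoff the abstract formalizes can be illustrated with a one-test, one-disease policy comparison (all probabilities and costs below are hypothetical, not from the paper's benchmarks):

```python
def exp_cost_no_test(p_disease, miscost):
    """Diagnose immediately with the majority class: the expected cost is
    the misdiagnosis cost times the smaller class probability."""
    return miscost * min(p_disease, 1 - p_disease)

def exp_cost_with_test(p_disease, sens, spec, test_cost, miscost):
    """Pay for one test, then diagnose according to its result."""
    # Error if we follow the test: false negatives plus false positives.
    p_err = (1 - sens) * p_disease + (1 - spec) * (1 - p_disease)
    return test_cost + miscost * p_err

# Hypothetical numbers: here the test more than pays for itself.
c_skip = exp_cost_no_test(0.3, miscost=100)
c_test = exp_cost_with_test(0.3, sens=0.95, spec=0.90,
                            test_cost=5, miscost=100)
```

An optimal policy generalizes this comparison over all test sequences, which is what the AO*-based search computes.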
Tree Augmented Classification of Binary Data Minimizing Stochastic Complexity
2002
Abstract

Cited by 1 (1 self)
We establish the algorithms and procedures that augment by trees the classifiers of binary feature vectors in (Gyllenberg et al. 1993, 1997, Gyllenberg et al. 1999 and Gyllenberg and Koski 2002). The notion of augmenting a classifier by a tree is due to (Chow and Liu 1968) and, in a more extensive form, to (Friedman et al. 1997). These techniques will in another report be applied primarily to unsupervised classification of bacterial DNA fingerprints (or electrophoretic patterns), cf. (Gyllenberg and Koski 2001a, Rademaker et al. 1999). By classification we mean here both the (unsupervised) procedures of finding the classes in (training) data of items and the actual outcome of the procedure, i.e., a partitioning of the items. By identification we mean the procedures for assigning items to classes that are pre-established in one way or another. The distinction should be clear, although the algorithms of classification as given in the sequel will also...
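The tree augmentation the abstract builds on, Chow and Liu's (1968) construction, selects a maximum-weight spanning tree over pairwise mutual information between features. A minimal sketch for binary features on toy data (the paper's stochastic-complexity scoring is not shown):

```python
import math
from itertools import combinations

def mutual_info(xs, ys):
    """Empirical mutual information between two binary columns."""
    n = len(xs)
    mi = 0.0
    for a in (0, 1):
        pa = sum(1 for x in xs if x == a) / n
        for b in (0, 1):
            pb = sum(1 for y in ys if y == b) / n
            pab = sum(1 for x, y in zip(xs, ys) if x == a and y == b) / n
            if pab > 0:
                mi += pab * math.log(pab / (pa * pb))
    return mi

def chow_liu_edges(cols):
    """Maximum-weight spanning tree over pairwise MI, built greedily
    with Kruskal's algorithm and union-find."""
    pairs = sorted(combinations(range(len(cols)), 2),
                   key=lambda ij: mutual_info(cols[ij[0]], cols[ij[1]]),
                   reverse=True)
    parent = list(range(len(cols)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    edges = []
    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            edges.append((i, j))
    return edges

# Toy binary features: X1 duplicates X0; X2 is unrelated to both.
cols = [[0, 0, 1, 1, 0, 1, 0, 1],
        [0, 0, 1, 1, 0, 1, 0, 1],
        [0, 1, 0, 1, 1, 0, 0, 1]]
edges = chow_liu_edges(cols)  # the strongest edge links X0 and X1
```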
Bayesian Classification
1999
Abstract

Cited by 1 (0 self)
Bayesian classification addresses the classification problem by learning the distribution of instances given different class values. We review the basic notion of Bayesian classification, describe in some detail the naive Bayesian classifier, and briefly discuss some extensions.
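The naive Bayesian classifier the abstract reviews assumes features are conditionally independent given the class: learn P(class) and P(feature | class) from data, then predict the class with the highest posterior. A minimal sketch for binary features with Laplace smoothing (toy data, illustrative only):

```python
import math

def train_naive_bayes(rows, labels, alpha=1.0):
    """Estimate P(class) and P(feature=1 | class) for binary feature
    vectors, with Laplace smoothing alpha."""
    classes = sorted(set(labels))
    n_feat = len(rows[0])
    prior, cond = {}, {}
    for c in classes:
        subset = [r for r, y in zip(rows, labels) if y == c]
        prior[c] = len(subset) / len(rows)
        cond[c] = [(sum(r[j] for r in subset) + alpha) /
                   (len(subset) + 2 * alpha) for j in range(n_feat)]
    return prior, cond

def predict(prior, cond, x):
    """Return the class maximizing log P(c) + sum_j log P(x_j | c)."""
    def score(c):
        s = math.log(prior[c])
        for j, xj in enumerate(x):
            p = cond[c][j]
            s += math.log(p if xj else 1 - p)
        return s
    return max(prior, key=score)

# Toy data: the label copies the first feature.
rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
labels = [1, 1, 0, 0]
prior, cond = train_naive_bayes(rows, labels)
pred = predict(prior, cond, [1, 0])
```

The extensions the abstract mentions (e.g. tree augmentation) relax exactly this conditional-independence assumption.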
Performance Comparison of Item-to-Item Skills Models with the IRT Single Latent Trait Model
Abstract

Cited by 1 (0 self)
Assessing a learner's mastery of a set of skills is a fundamental issue in intelligent learning environments. We compare the predictive performance of two approaches for training a learner model with domain data. One is based on the principle of building the model solely from observable data items, such as exercises or test items. Skills modelling is not part of the training phase, but is instead dealt with at a later stage. The other approach incorporates a single latent skill in the model. We compare the capacity of both approaches to accurately predict item outcome (binary success or failure) from a subset of item outcomes. Three types of item-to-item models based on standard Bayesian modeling algorithms are tested: (1) Naive Bayes, (2) Tree-Augmented Naive Bayes (TAN), and (3) a K2 Bayesian classifier. Their performance is compared to the widely used IRT-2PL approach, which incorporates a single latent skill. The results show that the item-to-item approaches perform as well as, or better than, the IRT-2PL approach over four widely different data sets, but the differences vary considerably among the data sets. We discuss the implications of these results and the issues relating to the practical use of item-to-item models.
Online Detection of Rule Violations in Table Soccer
Abstract
In table soccer, humans cannot always thoroughly observe fast actions like rod spins and kicks. However, this is necessary in order to detect rule violations, for example in tournament play. We describe an automatic system using sensors on a regular soccer table to detect rule violations in real time. Naive Bayes is used for kick classification; its parameters are trained using supervised learning. In the online experiments, rule violations were detected at a higher rate than by the human players. The implementation proved its usefulness by being used by humans in real games and sets a basis for future research using probability models in table soccer.