Results 1–10 of 14
Using Bayesian networks to analyze expression data
 Journal of Computational Biology
, 2000
Abstract

Cited by 731 (16 self)
DNA hybridization arrays simultaneously measure the expression level for thousands of genes. These measurements provide a “snapshot” of transcription levels within the cell. A major challenge in computational biology is to uncover, from such measurements, gene/protein interactions and key biological features of cellular systems. In this paper, we propose a new framework for discovering interactions between genes based on multiple expression measurements. This framework builds on the use of Bayesian networks for representing statistical dependencies. A Bayesian network is a graph-based model of joint multivariate probability distributions that captures properties of conditional independence between variables. Such models are attractive for their ability to describe complex stochastic processes and because they provide a clear methodology for learning from (noisy) observations. We start by showing how Bayesian networks can describe interactions between genes. We then describe a method for recovering gene interactions from microarray data using tools for learning Bayesian networks. Finally, we demonstrate this method on the S. cerevisiae cell-cycle measurements of Spellman et al. (1998). Key words: gene expression, microarrays, Bayesian methods.
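The factorization property the abstract relies on can be sketched in a few lines. Below is a minimal, hypothetical three-gene network (gene A regulating genes B and C; all probabilities invented for illustration) showing how a Bayesian network encodes a joint distribution and a conditional independence:

```python
from itertools import product

# Hypothetical three-gene network: A -> B, A -> C.
# Each variable is binary: 1 = expressed, 0 = not expressed.
p_a = {1: 0.3, 0: 0.7}                       # P(A)
p_b_given_a = {1: {1: 0.9, 0: 0.1},          # P(B=b | A=a)
               0: {1: 0.2, 0: 0.8}}
p_c_given_a = {1: {1: 0.8, 0: 0.2},          # P(C=c | A=a)
               0: {1: 0.1, 0: 0.9}}

def joint(a, b, c):
    # The network factorizes the joint distribution as
    # P(A, B, C) = P(A) * P(B | A) * P(C | A).
    return p_a[a] * p_b_given_a[a][b] * p_c_given_a[a][c]

# The factorization defines a proper distribution: probabilities sum to 1.
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))

# Conditional independence encoded by the graph: given A, genes B and C
# are independent, so P(B=1, C=1 | A=1) = P(B=1 | A=1) * P(C=1 | A=1).
p_bc_given_a1 = joint(1, 1, 1) / p_a[1]
```

The zero/nonzero pattern of edges in the graph is exactly what the paper's learning procedure tries to recover from expression measurements.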
The Bayesian Structural EM Algorithm
, 1998
Abstract

Cited by 220 (12 self)
In recent years there has been a flurry of work on learning Bayesian networks from data. One of the hard problems in this area is how to effectively learn the structure of a belief network from incomplete data, that is, in the presence of missing values or hidden variables. In a recent paper, I introduced an algorithm called Structural EM that combines the standard Expectation Maximization (EM) algorithm, which optimizes parameters, with structure search for model selection. That algorithm learns networks based on penalized likelihood scores, which include the BIC/MDL score and various approximations to the Bayesian score. In this paper, I extend Structural EM to deal directly with Bayesian model selection. I prove the convergence of the resulting algorithm and show how to apply it for learning a large class of probabilistic models, including Bayesian networks and some variants thereof.
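The penalized-likelihood scoring that Structural EM alternates with parameter optimization can be illustrated on complete data (no missing values), where no E-step is needed. The sketch below uses toy binary data with invented counts and a BIC/MDL-style score, logL − (k/2)·log N with k free parameters, to compare two candidate structures:

```python
import math
from collections import Counter

# Toy dataset over two binary variables X, Y; counts are invented so the
# variables are strongly correlated.
data = [(0, 0)] * 40 + [(0, 1)] * 10 + [(1, 0)] * 10 + [(1, 1)] * 40

def bic_independent(data):
    # Structure 1: X and Y independent; k = 2 parameters P(X=1), P(Y=1).
    n = len(data)
    px = sum(x for x, _ in data) / n
    py = sum(y for _, y in data) / n
    ll = sum(math.log((px if x else 1 - px) * (py if y else 1 - py))
             for x, y in data)
    return ll - (2 / 2) * math.log(n)

def bic_dependent(data):
    # Structure 2: X -> Y; k = 3 parameters P(X=1), P(Y=1|X=0), P(Y=1|X=1).
    n = len(data)
    px = sum(x for x, _ in data) / n
    cnt = Counter(data)
    py_x = {x: cnt[(x, 1)] / (cnt[(x, 0)] + cnt[(x, 1)]) for x in (0, 1)}
    ll = sum(math.log((px if x else 1 - px) *
                      (py_x[x] if y else 1 - py_x[x]))
             for x, y in data)
    return ll - (3 / 2) * math.log(n)

# The correlated data should favor the X -> Y structure despite its
# extra-parameter penalty.
score_ind = bic_independent(data)
score_dep = bic_dependent(data)
```

Structural EM embeds exactly this kind of structure comparison inside an EM loop, scoring against expected (filled-in) statistics instead of complete-data counts.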
Model Selection for Probabilistic Clustering Using Cross-Validated Likelihood
 Statistics and Computing
, 1998
Abstract

Cited by 65 (4 self)
Cross-validated likelihood is investigated as a tool for automatically determining the appropriate number of components (given the data) in finite mixture modelling, particularly in the context of model-based probabilistic clustering. The conceptual framework for the cross-validation approach to model selection is direct in the sense that models are judged directly on their out-of-sample predictive performance. The method is applied to a well-known clustering problem in the atmospheric science literature using historical records of upper-atmosphere geopotential height in the Northern Hemisphere. Cross-validated likelihood provides strong evidence for three clusters in the data set, providing an objective confirmation of earlier results derived using non-probabilistic clustering techniques. 1 Introduction Cross-validation is a well-known technique in supervised learning to select a model from a family of candidate models. Examples include selecting the best classification tree using cr...
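A minimal sketch of the cross-validated likelihood idea, assuming a 1-D Gaussian mixture fit by plain EM on synthetic two-cluster data (all numbers invented): the model with the right number of components should score higher on held-out data.

```python
import math
import random

random.seed(0)
# Synthetic 1-D data from two well-separated clusters.
data = ([random.gauss(-5, 1) for _ in range(100)] +
        [random.gauss(5, 1) for _ in range(100)])
random.shuffle(data)

def gauss_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_mixture(xs, k, iters=50):
    # Plain EM for a 1-D Gaussian mixture; means start at spread-out
    # quantiles of the data to avoid degenerate initializations.
    xs_sorted = sorted(xs)
    mus = [xs_sorted[i * (len(xs) - 1) // max(1, k - 1)] for i in range(k)]
    vars_, ws = [1.0] * k, [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibilities of each component for each point.
        resp = []
        for x in xs:
            ps = [w * gauss_pdf(x, m, v) for w, m, v in zip(ws, mus, vars_)]
            z = sum(ps)
            resp.append([p / z for p in ps])
        # M-step: re-estimate weights, means, variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            ws[j] = nj / len(xs)
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            vars_[j] = max(1e-6, sum(r[j] * (x - mus[j]) ** 2
                                     for r, x in zip(resp, xs)) / nj)
    return ws, mus, vars_

def held_out_loglik(train, test, k):
    ws, mus, vars_ = fit_mixture(train, k)
    return sum(math.log(sum(w * gauss_pdf(x, m, v)
                            for w, m, v in zip(ws, mus, vars_)))
               for x in test)

# Two-fold split: models are judged on out-of-sample predictive likelihood.
train, test = data[:100], data[100:]
ll1 = held_out_loglik(train, test, 1)
ll2 = held_out_loglik(train, test, 2)
```

The selection rule is simply to pick the number of components with the highest held-out log-likelihood; here the two-component model should win clearly.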
Switching Kalman Filters
, 1998
Abstract

Cited by 58 (3 self)
We show how many different variants of Switching Kalman Filter models can be represented in a unified way, leading to a single, general-purpose inference algorithm. We then show how to find approximate Maximum Likelihood Estimates of the parameters using the EM algorithm, extending previous results on learning using EM in the non-switching case [DRO93, GH96a] and in the switching, but fully observed, case [Ham90]. 1 Introduction Dynamical systems are often assumed to be linear and subject to Gaussian noise. This model, called the Linear Dynamical System (LDS) model, can be defined as x_t = A_t x_{t-1} + v_t and y_t = C_t x_t + w_t, where x_t is the hidden state variable at time t, y_t is the observation at time t, and v_t ~ N(0, Q_t) and w_t ~ N(0, R_t) are independent Gaussian noise sources. Typically the parameters of the model Θ = {(A_t, C_t, Q_t, R_t)} are assumed to be time-invariant, so that they can be estimated from data using, e.g., EM [GH96a]. One of the main adva...
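The LDS equations above can be simulated and filtered directly. The sketch below uses a scalar system with invented parameters and the standard (non-switching) Kalman filter recursion; it checks that the filtered state estimate beats the raw observations:

```python
import random

random.seed(1)
# Scalar LDS: x_t = A x_{t-1} + v_t, y_t = C x_t + w_t,
# with v_t ~ N(0, Q) and w_t ~ N(0, R). Parameters are hypothetical.
A, C, Q, R = 0.95, 1.0, 0.1, 1.0

xs, ys = [], []
x = 0.0
for _ in range(500):
    x = A * x + random.gauss(0, Q ** 0.5)
    xs.append(x)
    ys.append(C * x + random.gauss(0, R ** 0.5))

# Standard Kalman filter recursion for this model.
mu, P = 0.0, 1.0              # filtered mean and variance
est = []
for y in ys:
    mu_pred, P_pred = A * mu, A * A * P + Q        # predict
    K = P_pred * C / (C * C * P_pred + R)          # Kalman gain
    mu = mu_pred + K * (y - C * mu_pred)           # update mean
    P = (1 - K * C) * P_pred                       # update variance
    est.append(mu)

# The filtered estimate should track the hidden state more closely than
# the noisy observations do.
mse_obs = sum((y - x) ** 2 for y, x in zip(ys, xs)) / len(xs)
mse_kf = sum((m - x) ** 2 for m, x in zip(est, xs)) / len(xs)
```

The switching variants in the paper run a bank of such filters, one per discrete regime, with a mechanism for mixing their posteriors.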
Factored sparse inverse covariance matrices
 In Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing
, 2000
Abstract

Cited by 38 (10 self)
Most HMM-based speech recognition systems use Gaussian mixtures as observation probability density functions. An important goal in all such systems is to improve parsimony. One method is to adjust the type of covariance matrices used. In this work, factored sparse inverse covariance matrices are introduced. Based on U'DU factorization, the inverse covariance matrix can be represented using linear regressive coefficients which 1) correspond to sparse patterns in the inverse covariance matrix (and therefore represent conditional independence properties of the Gaussian), and 2) result in a method of partial tying of the covariance matrices without requiring nonlinear EM update equations. Results show that the performance of full-covariance Gaussians can be matched by factored sparse inverse covariance Gaussians having significantly fewer parameters.
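The link between sparse patterns in the inverse covariance and conditional independence can be illustrated with a toy Gaussian chain (parameters invented; plain Gauss-Jordan inversion rather than the factorization used in the paper):

```python
# For a stationary AR(1)-style chain X1 -> X2 -> X3, X1 and X3 are
# conditionally independent given X2, so the (1,3) entry of the precision
# (inverse covariance) matrix is zero even though cov(X1, X3) is not.
a = 0.5
s = 1.0 / (1.0 - a * a)                      # stationary variance
sigma = [[s * a ** abs(i - j) for j in range(3)] for i in range(3)]

def invert(m):
    # Gauss-Jordan inversion of a small positive-definite matrix
    # (no pivoting needed here).
    n = len(m)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

prec = invert(sigma)
# sigma[0][2] = s * a^2 != 0, but prec[0][2] == 0: the sparse pattern in
# the inverse covariance is exactly the set of conditional independencies
# of the Gaussian, which is what the factored representation exploits.
```

In the paper, zeros are imposed directly on the factored form, so that each row of the factor plays the role of a sparse linear regression of one dimension on the others.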
Probabilistic Model-Based Clustering of Multivariate and Sequential Data
 In Proceedings of Artificial Intelligence and Statistics
, 1999
Abstract

Cited by 28 (1 self)
Probabilistic model-based clustering, based on finite mixtures of multivariate models, is a useful framework for clustering data in a statistical context. This general framework can be directly extended to clustering of sequential data, based on finite mixtures of sequential models. In this paper we consider the problem of fitting mixture models where both multivariate and sequential observations are present. A general EM algorithm is discussed and experimental results are demonstrated on simulated data. The problem is motivated by the practical task of clustering individuals into groups based on both their static characteristics and their dynamic behavior. 1 Introduction and Motivation Consider the following problem. We have a set of individuals (a random sample from a larger population) whom we would like to cluster into groups based on observational data. For each individual we can measure characteristics which are relatively static (e.g., their height, weight, income, age, sex, etc)...
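The combined likelihood for an individual with both static and sequential observations can be sketched as follows (all parameters hypothetical): each mixture component pairs a Gaussian for the static feature with a Markov chain for the sequence, and the component likelihoods multiply.

```python
import math

# Two-component mixture; each component k models a static feature with a
# Gaussian and a binary activity sequence with a Markov chain, so
# P(person | k) = N(x; mu_k, var_k) * prod_t T_k[s_{t-1}][s_t].
components = [
    {"w": 0.5, "mu": 170.0, "var": 25.0,
     "T": [[0.9, 0.1], [0.2, 0.8]]},       # "sedentary" cluster
    {"w": 0.5, "mu": 180.0, "var": 25.0,
     "T": [[0.3, 0.7], [0.6, 0.4]]},       # "active" cluster
]

def component_loglik(comp, x, seq):
    # Gaussian log-density for the static feature ...
    ll = (-((x - comp["mu"]) ** 2) / (2 * comp["var"])
          - 0.5 * math.log(2 * math.pi * comp["var"]))
    # ... plus Markov-chain log-likelihood for the sequence.
    for a, b in zip(seq, seq[1:]):
        ll += math.log(comp["T"][a][b])
    return ll

def responsibilities(x, seq):
    # Posterior cluster membership (the E-step quantity in the EM fit).
    ls = [math.log(c["w"]) + component_loglik(c, x, seq)
          for c in components]
    m = max(ls)
    zs = [math.exp(l - m) for l in ls]
    return [z / sum(zs) for z in zs]

# An individual whose static feature and dynamics both match component 0.
r = responsibilities(168.0, [0, 0, 0, 1, 1, 0, 0])
```

The paper's general EM algorithm re-estimates both sets of component parameters from responsibilities computed exactly like this.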
Inferring mixtures of Markov chains
, 2002
Abstract

Cited by 12 (0 self)
We define the problem of inferring a “mixture of Markov chains” based on observing a stream of interleaved outputs from these chains. We show a sharp characterization of the inference process. The problems we consider also have applications in gene finding, intrusion detection, and, more generally, in analyzing interleaved sequences.
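The interleaved-output setting can be sketched as follows. The snippet generates a single stream from two hypothetical chains and shows the oracle case where the hidden labels are known, in which each transition matrix is recovered by simple transition counts; the paper's inference problem is to do this without observing the labels.

```python
import random

random.seed(2)
# Two hidden Markov chains over {0, 1}; at each step one chain is chosen
# uniformly at random, advanced, and its symbol appended to one stream.
# Transition matrices are hypothetical.
T = [
    [[0.9, 0.1], [0.2, 0.8]],   # chain 0: "sticky"
    [[0.1, 0.9], [0.8, 0.2]],   # chain 1: "alternating"
]
state = [0, 0]
stream, labels = [], []
for _ in range(20000):
    c = random.randrange(2)                      # which chain fires (hidden)
    s = state[c]
    nxt = 0 if random.random() < T[c][s][0] else 1
    state[c] = nxt
    stream.append(nxt)
    labels.append(c)

# Oracle de-interleaving: with the labels known, maximum-likelihood
# estimation reduces to counting each chain's own transitions.
counts = [[[0, 0], [0, 0]] for _ in range(2)]
last = [None, None]
for sym, c in zip(stream, labels):
    if last[c] is not None:
        counts[c][last[c]][sym] += 1
    last[c] = sym
est = [[[n / max(1, sum(row)) for n in row] for row in chain]
       for chain in counts]
```

With 20,000 interleaved symbols, the oracle estimates land within a few hundredths of the true probabilities; the hard part characterized in the paper is attributing each symbol to its chain in the first place.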
Robust Bayesian linear classifier ensembles
 Proc. 16th European Conf. Machine Learning, Lecture Notes in Computer Science
, 2005
Abstract

Cited by 10 (0 self)
Ensemble classifiers combine the classification results of several classifiers. Simple ensemble methods such as uniform averaging over a set of models usually provide an improvement over selecting the single best model. Probabilistic classifiers usually restrict the set of possible models that can be learnt in order to lower computational complexity costs. In these restricted spaces, where incorrect modelling assumptions are possibly made, uniform averaging sometimes performs even better than Bayesian model averaging (BMA). Linear mixtures over sets of models provide a space that includes uniform averaging as a particular case. We develop two algorithms for learning maximum a posteriori weights for linear mixtures, based on expectation maximization and on constrained optimization. We provide a nontrivial example of the utility of these two algorithms by applying them to one-dependence estimators. We develop the conjugate distribution for one-dependence estimators and empirically show that uniform averaging is clearly superior to BMA for this family of models. We then empirically show that the maximum a posteriori linear mixture weights improve accuracy significantly over uniform aggregation.
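A maximum-likelihood (rather than the paper's MAP) version of learning linear-mixture weights by EM can be sketched in a few lines; the per-model predictive probabilities below are invented for illustration, and uniform averaging is the starting point.

```python
import math

# p[i][m] = probability model m assigns to the true class of validation
# example i (hypothetical numbers: model 2 is an uninformative 0.5 model).
p = [
    [0.9, 0.6, 0.5],
    [0.8, 0.5, 0.4],
    [0.7, 0.6, 0.5],
    [0.2, 0.9, 0.5],
    [0.9, 0.7, 0.5],
    [0.8, 0.8, 0.5],
]

w = [1 / 3] * 3                     # uniform averaging as a special case
for _ in range(200):
    # E-step: responsibility of each model for each example.
    resp = []
    for row in p:
        z = sum(wm * pm for wm, pm in zip(w, row))
        resp.append([wm * pm / z for wm, pm in zip(w, row)])
    # M-step: mixture weights are average responsibilities.
    w = [sum(r[m] for r in resp) / len(p) for m in range(3)]

def mix_loglik(weights):
    # Log-likelihood of the validation data under the linear mixture.
    return sum(math.log(sum(wm * pm for wm, pm in zip(weights, row)))
               for row in p)

ll_uniform = mix_loglik([1 / 3] * 3)
ll_learned = mix_loglik(w)
```

Because EM monotonically increases the mixture likelihood from the uniform start, the learned weights can only match or improve the in-sample score; the paper adds a prior over the weights to make this a MAP estimate.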
Finite Mixture Model of Bounded Semi-Naive Bayesian Networks for Classification
 In Joint 13th International Conference on Artificial Neural Networks (ICANN 2003) and 10th International Conference on Neural Information Processing (ICONIP 2003), long paper, Lecture Notes in Computer Science
, 2003
Abstract

Cited by 3 (1 self)
The Naive Bayesian (NB) network classifier, a probabilistic model with a strong assumption of conditional independence among features, shows surprisingly competitive prediction performance even when compared with some state-of-the-art classifiers. With a looser assumption of conditional independence, the Semi-Naive Bayesian (SNB) network classifier is superior to NB classifiers when features are combined. However, the problem for SNB is that its structure is still strongly constrained, which may generate inaccurate distributions for some datasets. A natural way to improve SNB is to extend it using the mixture approach. However, in obtaining the final structure, traditional SNBs use heuristic approaches to learn the structure from data locally, whereas the mixture approach uses the Expectation-Maximization (EM) method to obtain the structure iteratively; it is difficult to integrate the local heuristic into the maximization step, since the procedure may not converge. In this paper we first develop a Bounded Semi-Naive Bayesian network (BSNB) model, which restricts the number of variables that can be joined in a combined feature. In contrast to the local nature of traditional SNB models, our model enjoys a global nature and maintains a polynomial time cost. Overcoming the difficulty of integrating SNBs into the mixture model, we then propose an algorithm to extend it into a finite mixture structure, named Mixture of Bounded Semi-Naive Bayesian networks (MBSNB). We give theoretical derivations, an outline of the algorithm, an analysis of the algorithm, and a set of experiments to demonstrate the usefulness of MBSNB in some classification tasks. The novel finite MBSNB network shows good speed-up, ability to converge and ...
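The conditional-independence assumption that SNB relaxes is easiest to see in plain Naive Bayes, where P(c | f) ∝ P(c) · Π_j P(f_j | c). A minimal sketch over binary features with hand-made toy data (labels and counts invented):

```python
import math
from collections import defaultdict

# Toy training set: (binary feature vector, class label).
train = [
    ((1, 1, 0), "spam"), ((1, 0, 1), "spam"), ((1, 1, 1), "spam"),
    ((0, 0, 1), "ham"),  ((0, 1, 0), "ham"),  ((0, 0, 0), "ham"),
]

def fit(data, alpha=1.0):
    class_n = defaultdict(int)
    feat_n = defaultdict(lambda: defaultdict(int))
    for feats, c in data:
        class_n[c] += 1
        for j, f in enumerate(feats):
            feat_n[c][j] += f
    def predict(feats):
        best, best_lp = None, -math.inf
        for c, n in class_n.items():
            lp = math.log(n / len(data))          # log prior P(c)
            for j, f in enumerate(feats):
                # Laplace-smoothed P(f_j = 1 | c); independence assumed.
                p1 = (feat_n[c][j] + alpha) / (n + 2 * alpha)
                lp += math.log(p1 if f else 1 - p1)
            if lp > best_lp:
                best, best_lp = c, lp
        return best
    return predict

clf = fit(train)
```

An SNB classifier would replace some of the per-feature factors P(f_j | c) with joint factors over small groups of features; the BSNB model in the paper bounds the size of those groups.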
Initialization of Cluster Refinement Algorithms:
 In Proceedings of the International Joint Conference on Neural Networks (IJCNN)
, 2004
Abstract

Cited by 3 (0 self)
Various iterative refinement clustering methods are dependent on the initial state of the model and are capable of obtaining only one of their local optima. Since the task of identifying the global optimum is NP-hard, the study of initialization methods that lead toward a good sub-optimal solution is of great value. This paper reviews the various cluster initialization methods in the literature by categorizing them into three major families, namely random sampling methods, distance optimization methods, and density estimation methods. In addition, using a set of quantitative measures, we assess their performance on a number of synthetic and real-life data sets. Our controlled benchmark identifies two distance optimization methods, namely SCS and KKZ, as complementing the learning characteristics of K-Means toward better cluster separation in the output solution.
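A KKZ-style farthest-point initialization (one of the distance optimization methods reviewed) can be sketched on synthetic blobs; this is an illustration of the idea, not the exact algorithm benchmarked in the paper.

```python
import random

random.seed(3)
# Three well-separated 2-D blobs (synthetic data, invented centers).
centers = [(-10, 0), (0, 10), (10, 0)]
pts = [(cx + random.gauss(0, 1), cy + random.gauss(0, 1))
       for cx, cy in centers for _ in range(50)]

def d2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def kkz_init(points, k):
    # KKZ-style seeding: start from the point with the largest norm,
    # then repeatedly add the point farthest from all seeds chosen so far.
    seeds = [max(points, key=lambda p: p[0] ** 2 + p[1] ** 2)]
    while len(seeds) < k:
        seeds.append(max(points,
                         key=lambda p: min(d2(p, s) for s in seeds)))
    return seeds

seeds = kkz_init(pts, 3)
# Each seed should land in a different blob, so a subsequent K-Means run
# starts near the global structure instead of a poor local optimum.
nearest = sorted(min(range(3), key=lambda i: d2(s, centers[i]))
                 for s in seeds)
```

Compare this with purely random sampling, which with three clusters places at least two seeds in the same blob roughly seven times out of nine, often trapping the refinement in a local optimum.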