Hierarchical Latent Class Models for Cluster Analysis
Journal of Machine Learning Research, 2002
Cited by 53 (12 self)
Latent class models are used for cluster analysis of categorical data. Underlying such a model is the assumption that the observed variables are mutually independent given the class variable. A serious problem with the use of latent class models, known as local dependence, is that this assumption is often untrue. In this paper we propose hierarchical latent class models as a framework where the local dependence problem can be addressed in a principled manner. We develop a search-based algorithm for learning hierarchical latent class models from data. The algorithm is evaluated using both synthetic and real-world data.
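The independence assumption in the abstract above can be made concrete with a small sketch. It is not the paper's algorithm; it only evaluates the likelihood of a plain (non-hierarchical) latent class model, P(x) = sum over c of P(c) times the product over j of P(x_j | c). The function name, the parameter layout, and the toy numbers are all hypothetical.

```python
from math import prod


def latent_class_likelihood(x, class_prior, cond_probs):
    """Return P(x) under a latent class model with local independence.

    x           -- observed categorical values, one integer code per variable
    class_prior -- class_prior[c] = P(C = c)
    cond_probs  -- cond_probs[c][j][v] = P(X_j = v | C = c)
    """
    total = 0.0
    for c, p_c in enumerate(class_prior):
        # Given the class, the observed variables are assumed independent,
        # so their joint probability is a simple product.
        total += p_c * prod(cond_probs[c][j][v] for j, v in enumerate(x))
    return total


# Hypothetical toy model: 2 latent classes, 2 binary observed variables.
class_prior = [0.6, 0.4]
cond_probs = [
    [[0.9, 0.1], [0.8, 0.2]],  # class 0
    [[0.3, 0.7], [0.2, 0.8]],  # class 1
]

p00 = latent_class_likelihood((0, 0), class_prior, cond_probs)
```

Local dependence means that real data can violate the product inside the loop, which is the problem the hierarchical models are proposed to address.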
Last level cache (LLC) performance of data mining workloads on a CMP – a case study of parallel bioinformatics workloads
in HPCA ’07: Proceedings of the 12th International Symposium on High Performance Computer Architecture (HPCA, 2007
Data Perturbation for Escaping Local Maxima in Learning
In AAAI, 2002
Cited by 35 (3 self)
Almost all machine learning algorithms, be they for regression, classification or density estimation, seek hypotheses that optimize a score on training data. In most interesting cases, however, full global optimization is not feasible and local search techniques are used to discover reasonable solutions. Unfortunately,
Using guide trees to construct multiple-sequence evolutionary HMMs
Bioinformatics 19(Suppl, 2003
Cited by 31 (5 self)
Motivation: Score-based progressive alignment algorithms do dynamic programming on successive branches of a guide tree. The analogous probabilistic construct is an Evolutionary HMM. This is a multiple-sequence hidden Markov model (HMM) made by combining transducers (conditionally normalised Pair HMMs) on the branches of a phylogenetic tree. Methods: We present general algorithms for constructing an Evolutionary HMM from any Pair HMM and for doing dynamic programming to any Multiple-sequence HMM. Results: Our prototype implementation, Handel, is based on the Thorne-Kishino-Felsenstein evolutionary model and is benchmarked using structural reference alignments. Availability: Handel can be downloaded under GPL from www.biowiki.org/Handel
Phylogenetic hidden Markov models
In Statistical Methods in Molecular Evolution, 2005
Cited by 27 (6 self)
Phylogenetic hidden Markov models, or phylo-HMMs, are probabilistic models that consider not only the way substitutions occur through evolutionary history at each site of a genome, but also the way this process changes from one site to the next. By treating molecular evolution as a combination of two Markov processes, one that operates in the dimension of space (along a genome) and one that operates in the dimension of time (along the branches of a phylogenetic tree), these models allow aspects of both sequence structure and sequence evolution to be captured. Moreover, as we will discuss, they permit key computations to be performed exactly and efficiently. Phylo-HMMs allow evolutionary information to be brought to bear on a wide variety of problems of sequence “segmentation,” such as gene prediction and the identification of conserved elements. Phylo-HMMs were first proposed as a way of improving phylogenetic models that allow for variation among sites in the rate of substitution [8, 52]. Soon afterward, they were adapted for the problem of secondary structure
Using evolutionary expectation maximization to estimate indel rates
Bioinformatics 21, 2005
Cited by 17 (1 self)
Motivation: The Expectation Maximization (EM) algorithm, in the form of the Baum–Welch algorithm (for hidden Markov models) or the Inside-Outside algorithm (for stochastic context-free grammars), is a powerful way to estimate the parameters of stochastic grammars for biological sequence analysis. To use this algorithm for multiple-sequence evolutionary modelling, it would be useful to apply the EM algorithm to estimate not only the probability parameters of the stochastic grammar, but also the instantaneous mutation rates of the underlying evolutionary model (to facilitate the development of stochastic grammars based on phylogenetic trees, also known as Statistical Alignment). Recently, we showed how to do this for the point substitution component of the evolutionary process; here, we extend these results to the indel process. Results: We present an algorithm for maximum-likelihood estimation of insertion and deletion rates from multiple sequence alignments, using EM, under the single-residue indel model owing to Thorne, Kishino and Felsenstein (the ‘TKF91’ model). The algorithm converges extremely rapidly, gives accurate results on simulated data that are an improvement over parsimonious estimates (which are shown to underestimate the true indel rate), and gives plausible results on experimental data (coronavirus envelope domains). Owing to the algorithm’s close similarity to the Baum–Welch algorithm for training hidden Markov models, it can be used in an ‘unsupervised’ fashion to estimate rates for unaligned sequences, or estimate several sets of rates for sequences with heterogeneous rates. Availability: Software implementing the algorithm and the benchmark is available under GPL from
An Investigation of Phylogenetic Likelihood Methods
2003
Cited by 16 (0 self)
We analyze the performance of likelihood-based approaches used to reconstruct phylogenetic trees. Unlike other techniques such as Neighbor-Joining (NJ) and Maximum Parsimony (MP), relatively little is known regarding the behavior of algorithms founded on the principle of likelihood.
Combining Multiple Datasets in a Likelihood Analysis: Which Models are Best?
Cited by 12 (0 self)
Until recently, phylogenetic analyses have been routinely based on homologous sequences of a single gene. Given the vast number of gene sequences now available, phylogenetic studies are now based on the analysis of multiple genes. Thus, it has become necessary to devise statistical methods to combine multiple molecular datasets. Here, we compare several models for combining different genes for the purpose of evaluating the likelihood of tree topologies. Three methods of branch length estimation were studied: assuming all genes have the same branch lengths (concatenate model); assuming that branch lengths are proportional among genes (proportional model); or assuming that each gene has a separate set of branch lengths (separate model). We also compared three models of among-site rate variation: the homogeneous model, a model that assumes one gamma parameter for all genes, and a model that assumes one gamma parameter for each gene. On the basis of two nuclear and one mitochondrial amino-acid datasets, our results suggest that, depending on the dataset chosen, either the separate model or the proportional model represents the most appropriate method for branch length analysis. For all datasets examined, one gamma parameter for each gene represents the best model for among-site rate variation. Using these models, we analyzed alternative mammalian tree topologies and describe the effect of the assumed model on the maximum likelihood tree. We show that the choice of the model has an impact on the best phylogeny obtained.
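To see how the three branch-length models in the abstract above differ in complexity, the following sketch counts their free branch-length parameters. It assumes an unrooted, fully resolved tree (2n - 3 branches for n taxa) and, for the proportional model, one rate multiplier per gene with one gene fixed as the reference. The function name and the example sizes are hypothetical and are not taken from the paper.

```python
def branch_length_param_counts(num_taxa, num_genes):
    """Count free branch-length parameters for each combination model.

    Assumes an unrooted, fully resolved tree, which has 2n - 3 branches.
    """
    branches = 2 * num_taxa - 3
    return {
        # One shared set of branch lengths for all genes.
        "concatenate": branches,
        # One shared set plus a rate multiplier per gene (one gene fixed).
        "proportional": branches + (num_genes - 1),
        # An independent set of branch lengths for every gene.
        "separate": branches * num_genes,
    }


counts = branch_length_param_counts(num_taxa=5, num_genes=3)
```

Counts like these are what a likelihood-based comparison has to weigh against the improvement in fit each model gives.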
Selecton 2007: advanced models for detecting positive and purifying selection using a Bayesian inference approach
Nucleic Acids Research, 2007
Cited by 7 (1 self)
Biologically significant sites in a protein may be identified by contrasting the rates of synonymous (Ks) and nonsynonymous (Ka) substitutions. This enables the inference of site-specific positive Darwinian selection and purifying selection. We present here Selecton version 2.2 (http://selecton.bioinfo.tau.ac.il), a web server which automatically calculates the ratio between Ka and Ks (ω) at each site of the protein. This ratio is graphically displayed on each site using a color-coding scheme, indicating either positive selection, purifying selection or lack of selection. Selecton implements an assembly of different evolutionary models, which allow for statistical testing of the hypothesis that a protein has undergone positive selection. Specifically, the recently developed mechanistic-empirical model is introduced, which takes into account the physicochemical properties of amino acids. Advanced options were introduced to allow maximal fine tuning of the server to the user’s specific needs, including calculation of statistical support of the ω values, an advanced graphic display of the protein’s 3-dimensional structure, use of different genetic codes and inputting of a pre-built phylogenetic tree. Selecton version 2.2 is an effective, user-friendly and freely available web server which implements up-to-date methods for computing site-specific selection forces, and the visualization of these forces on the protein’s sequence and structure.
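The standard reading of the Ka/Ks ratio described above can be sketched as a simple threshold rule: a ratio above 1 suggests positive selection, below 1 purifying selection, and close to 1 no selection. This is not how Selecton works (the abstract describes Bayesian inference under several evolutionary models). The function name, the tolerance `tol`, and the example rates are hypothetical.

```python
def classify_site(ka, ks, tol=0.05):
    """Classify one site from its nonsynonymous (ka) and synonymous (ks) rates.

    Returns (ratio, label). `tol` is an illustrative band around 1 inside
    which the site is treated as showing no selection.
    """
    if ks <= 0:
        raise ValueError("ks must be positive to form the Ka/Ks ratio")
    ratio = ka / ks
    if ratio > 1 + tol:
        return ratio, "positive selection"
    if ratio < 1 - tol:
        return ratio, "purifying selection"
    return ratio, "no selection"


# Hypothetical per-site rate estimates, as (ka, ks) pairs.
sites = [(0.30, 0.10), (0.02, 0.20), (0.10, 0.10)]
labels = [classify_site(ka, ks)[1] for ka, ks in sites]
```

A hard threshold ignores estimation uncertainty, which is one reason a server would report statistical support for the ratio rather than the raw value alone.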
BioMed Central, 2006
Cited by 6 (3 self)
A novel approach to phylogenetic tree construction using stochastic optimization and clustering