Semisupervised learning of hierarchical latent trait models for data visualisation - IEEE Transactions on Knowledge and Data Engineering, 2005
Abstract
Cited by 6 (0 self)
Recently, we have developed the hierarchical Generative Topographic Mapping (HGTM), an interactive method for visualising large, high-dimensional, real-valued data sets. In this paper, we propose a more general visualisation system by extending HGTM in three ways, which allow the user to visualise a wider range of data sets and better support the model development process. (i) We integrate HGTM with noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM). This enables us to visualise data of an inherently discrete nature, e.g. collections of documents, in a hierarchical manner. (ii) We give the user a choice of initialising the child plots of the current plot in either interactive or automatic mode. In the interactive mode, the user selects “regions of interest”, whereas in the automatic mode, an unsupervised minimum message length (MML)-inspired construction of a mixture of LTMs is employed. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, which makes the interactive mode difficult to use. Such a situation often arises when visualising large data sets. (iii) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool for understanding the visualisation plots, since they can highlight the boundaries between data clusters. We illustrate our approach on a toy example and evaluate it on three more complex real data sets.
Hidden Markov Models That Use Predicted Local Structure for Fold Recognition: Alphabets of Backbone Geometry - PROTEINS: Structure, Function, and Genetics 51:504–514, 2003
Abstract
An important problem in computational biology is predicting the structure of the large number of putative proteins discovered by genome sequencing projects. Fold-recognition methods attempt to solve the problem by relating the target proteins to known structures, searching for template proteins homologous to the target. Remote homologs that may have significant structural similarity are often not detectable by sequence similarity alone. To address this, we incorporated predicted local structure, a generalization of secondary structure, into two-track profile hidden Markov models (HMMs). Rather than relying on a simple helix-strand-coil definition of secondary structure, we experimented with a variety of local structure descriptions, following a principled protocol to establish which descriptions are most useful for improving fold recognition and alignment quality. On a test set of 1298 nonhomologous proteins, HMMs incorporating a 3-letter STRIDE alphabet improved fold recognition accuracy by 15% over amino-acid-only HMMs and by 23% over PSI-BLAST, measured by ROC-65 numbers. We compared two-track HMMs to amino-acid-only HMMs on a difficult alignment test set of 200 protein pairs (structurally similar, with 3–24% sequence identity). HMMs with a 6-letter STRIDE secondary track improved alignment quality by 62% relative to DALI structural alignments, while HMMs with an STR track (an expanded DSSP alphabet that subdivides strands into six states) improved it by 40% relative to CE. Proteins 2003;51:504–514. © 2003 Wiley-Liss, Inc.
Key words: protein structure prediction; two-track HMM; multitrack HMM; information theory; neural network; alignment; secondary structure
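The fold-recognition gains above are reported as ROC-65 numbers. As a hedged sketch, the function below computes the common ROC_n score, ROC_n = (1/(nT)) Σ_{i=1}^{n} t_i, where hits are sorted by decreasing score, t_i is the number of true positives ranked above the i-th false positive, and T is the total number of true positives. Whether the paper uses exactly this normalisation, and whether the ranked list is pooled over all queries or scored per query, are assumptions; the function name and input format are hypothetical.

```python
def roc_n(ranked_is_true, n):
    """ROC_n score for hits already sorted by decreasing score.

    ranked_is_true: sequence of bools, True for a true positive (e.g. a same-fold template).
    n: number of false positives to consider (65 for ROC-65).
    """
    total_true = sum(1 for hit in ranked_is_true if hit)
    if total_true == 0 or n <= 0:
        raise ValueError("need at least one true positive and n > 0")
    true_seen = 0
    false_seen = 0
    area = 0
    for hit in ranked_is_true:
        if hit:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen  # t_i: true positives ranked above the i-th false positive
            if false_seen == n:
                break
    # Fewer than n false positives in the list: treat the missing ones as
    # ranked after every hit, so each contributes all true positives seen.
    area += (n - false_seen) * true_seen
    return area / (n * total_true)


if __name__ == "__main__":
    # Two true positives, one false positive, one more true positive, then a false positive.
    print(roc_n([True, True, False, True, False], 2))
```

A score of 1.0 means every true positive outranks the first n false positives, and 0.0 means the first n false positives all outrank every true positive; restricting to a small n focuses the metric on the top of the ranking, where fold-recognition predictions are actually used.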

