Markovian Structures in Biological Sequence Alignments
 Journal of the American Statistical Association
, 1999
Abstract

Cited by 22 (8 self)
In this article, we provide a coherent view of the two recent models used for multiple sequence alignment, the hidden Markov model (HMM) and the block-based motif model, in order to develop a set of new algorithms that enjoy both the sensitivity of the block-based model and the flexibility of the HMM. In particular, we decompose the standard HMM into two components: the insertion component, which is captured by the so-called "propagation model," and the deletion component, which is described by a deletion vector. Such a decomposition serves as a basis for rational compromise between biological specificity and model flexibility. Furthermore, we introduce a Bayesian model selection criterion that, in combination with the propagation model, genetic algorithm, and other computational aspects, forms the core of PROBE, a multiple alignment and database search methodology (software available via anonymous ftp at ftp://ncbi.nlm.nih.gov/pub/neuwald/probe1.0). The application of our method to a GTPase family of protein sequences yields an alignment that is confirmed by comparison with known tertiary structures.
A Case Study on the Choice, Interpretation and Checking of Multilevel Models for Longitudinal Binary Outcomes
Abstract

Cited by 18 (2 self)
Recent advances in statistical software have led to the rapid diffusion of new methods for modeling longitudinal data. Multilevel (also known as hierarchical or random effects) models for binary outcomes have been generally based on a logistic-normal specification, by analogy with earlier work for normally distributed data. The appropriate application and interpretation of these models remains somewhat unclear, especially when compared with the computationally more straightforward marginal modeling (GEE) approaches. In this paper we pose two interrelated questions. First, what limits should be placed on the interpretation of the coefficients and inferences derived from random effect models involving binary outcomes? Second, what are the minimum diagnostic checks that are required to evaluate whether such random effect models provide appropriate fits to the data? We address these questions by means of an extended case study using data on adolescent smoking from a large cohort study. Bay...
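The interpretation caveat this abstract raises can be illustrated numerically. The following is a minimal sketch, assuming a logistic-normal random-intercept model (all parameter values and variable names are illustrative, not from the paper): the subject-specific (conditional) slope is systematically larger than the population-averaged (marginal) slope one would get from a GEE-style analysis.

```python
import numpy as np

# Assumed model: y_ij ~ Bernoulli(inv_logit(a_i + b*x_ij)),
# with random intercepts a_i ~ Normal(0, sigma^2).
rng = np.random.default_rng(0)
sigma, b_cond = 2.0, 1.0                       # illustrative values
a = rng.normal(0.0, sigma, size=100_000)       # Monte Carlo intercept draws

def marginal_p(x):
    # average the subject-specific curve over the random-intercept distribution
    return np.mean(1.0 / (1.0 + np.exp(-(a + b_cond * x))))

def logit(p):
    return np.log(p / (1.0 - p))

# slope of the marginal logit at x = 0, by central finite differences
eps = 0.01
b_marg = (logit(marginal_p(eps)) - logit(marginal_p(-eps))) / (2 * eps)
print(f"conditional slope {b_cond:.2f} vs marginal slope {b_marg:.2f}")
```

The marginal slope comes out attenuated toward zero (roughly by the factor (1 + 0.346 sigma^2)^(-1/2) for the logistic-normal case), which is why conditional coefficients should not be quoted as population-averaged effects.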
A state-space model for National Football League scores
 Journal of the American Statistical Association
, 1998
Abstract

Cited by 17 (3 self)
This paper develops a predictive model for National Football League (NFL) game scores using data from the period 1988-1993. The parameters of primary interest, measures of team strength, are expected to vary over time. Our model accounts for this source of variability by modeling football outcomes using a state-space model that assumes team strength parameters follow a first-order autoregressive process. Two sources of variation in team strengths are addressed in our model: week-to-week changes in team strength due to injuries and other random factors, and season-to-season changes resulting from changes in personnel and other longer-term factors. Our model also incorporates a home-field advantage while allowing for the possibility that the magnitude of the advantage may vary across teams. The aim of the analysis is to obtain plausible inferences concerning team strengths and other model parameters, and to predict future game outcomes. Iterative simulation is used to obtain samples fro...
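The core state-space idea can be sketched in a few lines. This is a toy illustration under assumed values (the AR coefficient, innovation scale, home-field advantage, and score noise below are placeholders, not the paper's fitted parameters): team strengths evolve as a first-order autoregression, and a game's score margin is the strength difference plus the home-field term plus noise.

```python
import numpy as np

rng = np.random.default_rng(1)

n_teams, n_weeks = 4, 16
phi, tau = 0.98, 0.5            # assumed AR(1) coefficient and weekly noise SD
home_adv, sigma_game = 3.0, 13.0  # assumed home-field edge and score noise SD

# Latent team strengths: first-order autoregressive evolution week to week
strength = np.zeros((n_weeks, n_teams))
strength[0] = rng.normal(0.0, 5.0, n_teams)
for w in range(1, n_weeks):
    strength[w] = phi * strength[w - 1] + rng.normal(0.0, tau, n_teams)

def predicted_margin(week, home, away):
    """Expected home-team margin of victory for a given matchup."""
    return strength[week, home] - strength[week, away] + home_adv

def simulate_margin(week, home, away):
    """One simulated game outcome: expected margin plus score-level noise."""
    return predicted_margin(week, home, away) + rng.normal(0.0, sigma_game)

print(predicted_margin(5, home=0, away=1))
```

In the paper's full model the parameters are estimated by iterative simulation rather than fixed as here, and season-to-season strength changes add a second variance component.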
Exploratory Data Analysis for Complex Models
, 2002
Abstract

Cited by 15 (6 self)
"Exploratory" and "confirmatory" data analysis can both be viewed as methods for comparing observed data to what would be obtained under an implicit or explicit statistical model.
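That one-sentence summary can be made concrete with a small sketch (the data, fitted model, and test statistic below are illustrative assumptions): simulate replicated datasets under the fitted model and compare a test statistic on the observed data with its distribution over the replications, the basic posterior-predictive-style check.

```python
import numpy as np

rng = np.random.default_rng(2)

# Observed data with a heavy left tail that a normal model will miss.
y = np.concatenate([rng.normal(0, 1, 95), rng.normal(-6, 1, 5)])

# Fit the implicit model (here: normal with plug-in mean/sd), then draw
# replicated datasets and compare a test statistic, data vs. replications.
mu, sd, n = y.mean(), y.std(), y.size
T_obs = y.min()
T_rep = np.array([rng.normal(mu, sd, n).min() for _ in range(1000)])

p_value = np.mean(T_rep <= T_obs)
print(f"T_obs={T_obs:.2f}, p={p_value:.3f}")
```

An extreme p-value flags the misfit that a histogram (the "exploratory" view) would also reveal; the point of the paper is that both activities implicitly compare data with a model.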
Diagnostic Measures for Model Criticism
 Journal of the American Statistical Association
, 1996
Abstract

Cited by 15 (1 self)
... In this article we present the general outlook and discuss general families of elaborations for use in practice; the exponential connection elaboration plays a key role. We then describe model elaborations for use in diagnosing: departures from normality, goodness of fit in generalized linear models, and variable selection in regression and outlier detection. We illustrate our approach with two applications.
All Maps of Parameter Estimates Are Misleading
 Statistics in Medicine
, 1998
Abstract

Cited by 14 (8 self)
Maps are frequently used to display spatial distributions of parameters of interest, such as cancer rates or average pollutant concentrations by county. It's well known that plotting observed rates can have serious drawbacks when sample sizes vary by area, since very high (and low) observed rates are found disproportionately in poorly sampled areas. Unfortunately, adjusting the observed rates to account for the effects of small-sample noise can introduce an opposite effect, in which the highest adjusted rates tend to be found disproportionately in well-sampled areas. In either case, the maps can be difficult to interpret because the display of spatial variation in the underlying parameters of interest is confounded with spatial variation in sample sizes. As a result, spatial patterns occur in adjusted rates even if there is no spatial structure in the underlying parameters of interest, and adjusted rates tend to look too uniform in areas with little data. We introduce two models (normal...
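Both artifacts described above are easy to reproduce. The sketch below uses assumed values (a common true rate, a simple shrinkage rule with an assumed prior weight `m`, not the paper's models): raw rates are noisiest, and hence most extreme, in small areas, while shrinking every rate toward the overall mean pulls small-area rates in hardest, leaving the remaining extremes in better-sampled areas.

```python
import numpy as np

rng = np.random.default_rng(3)

true_rate = 0.1
n = rng.integers(20, 5000, size=200)       # widely varying area sample sizes
events = rng.binomial(n, true_rate)        # every area shares the same rate
raw = events / n

# Simple shrinkage toward the overall rate with an assumed prior weight m:
# shrunk - true_rate = (n / (n + m)) * (raw - true_rate), so small-n areas
# are pulled almost all the way to the mean.
m = 500
shrunk = (events + m * true_rate) / (n + m)

print("sample size of area with highest raw rate:   ", n[np.argmax(raw)])
print("sample size of area with highest shrunk rate:", n[np.argmax(shrunk)])
```

Any spatial pattern in the sample sizes therefore shows up in the adjusted rates even when the underlying parameters are spatially flat, which is the paper's motivating problem.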
Multiple imputation for model checking: Completed-data plots with missing and latent data
 Biometrics
, 2005
Abstract

Cited by 13 (3 self)
Summary. In problems with missing or latent data, a standard approach is to first impute the unobserved data, then perform all statistical analyses on the completed dataset—corresponding to the observed data and imputed unobserved data—using standard procedures for complete-data inference. Here, we extend this approach to model checking by demonstrating the advantages of the use of completed-data model diagnostics on imputed completed datasets. The approach is set in the theoretical framework of Bayesian posterior predictive checks (but, as with missing-data imputation, our methods of missing-data model checking can also be interpreted as "predictive inference" in a non-Bayesian context). We consider the graphical diagnostics within this framework. Advantages of the completed-data approach include: (1) One can often check model fit in terms of quantities that are of key substantive interest in a natural way, which is not always possible using observed data alone. (2) In problems with missing data, checks may be devised that do not require modeling the missingness or inclusion mechanism; the latter is useful for the analysis of ignorable but unknown data collection mechanisms, such as are often assumed in the analysis of sample surveys and observational studies. (3) In many problems with latent data, it is possible to check qualitative features of the model (for example, independence of two variables) that can be naturally formalized with the help of the latent data. We illustrate with several applied examples.
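The workflow in this summary can be sketched on a toy problem (the normal model, the 30% missingness rate, and the skewness diagnostic below are illustrative assumptions): impute the missing values several times under the fitted model, then apply an ordinary complete-data diagnostic to each completed dataset and look at its spread across imputations.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: y ~ Normal(mu, 1), with about 30% of values missing at random.
y = rng.normal(2.0, 1.0, 200)
missing = rng.random(200) < 0.3
y_obs = y[~missing]

mu_hat, n_mis = y_obs.mean(), int(missing.sum())

def skewness(v):
    # ordinary complete-data diagnostic: standardized third moment
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))

# Draw several imputed completed datasets under the fitted model, then apply
# the completed-data diagnostic to each one.
stats = []
for _ in range(20):
    y_comp = y.copy()
    y_comp[missing] = rng.normal(mu_hat, 1.0, n_mis)  # one imputation draw
    stats.append(skewness(y_comp))

print(f"completed-data skewness across imputations: "
      f"{np.mean(stats):.3f} +/- {np.std(stats):.3f}")
```

Here the model is correct, so the diagnostic hovers near zero across imputations; under a misspecified imputation model it would drift systematically, which is the kind of signal the completed-data plots in the paper are designed to show.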
Document Structure Analysis and Performance Evaluation
, 1999
Abstract

Cited by 13 (0 self)
Document Structure Analysis and Performance Evaluation by Jisheng Liang. Chair of Supervisory Committee: Professor Robert M. Haralick, Electrical Engineering. The goal of document structure analysis is to find an optimal solution that partitions the set of glyphs on a given document into a hierarchical tree structure, where entities within the hierarchy are associated with their physical properties and semantic labels. In this dissertation, we present a unified document structure extraction algorithm that is probability based, where the probabilities are estimated from an extensive training set of various kinds of measurements of distances between the terminal and nonterminal entities with which the algorithm works. The offline probabilities estimated in training then drive all decisions in the online segmentation module. An iterative, relaxation-like method is used to find the partitioning solution that maximizes the joint probability. This approach can be uniformly applied to the cons...