Dynamic topic models
In ICML, 2006
Cited by 656 (28 self)
Scientists need new tools to explore and browse large collections of scholarly literature. Thanks to organizations such as JSTOR, which scan and index the original bound archives of many journals, modern scientists can search digital libraries spanning hundreds of years. A scientist, suddenly
Model-Based Clustering and Data Transformations for Gene Expression Data
2001
Cited by 196 (9 self)
Motivation: Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. The issues of selecting a 'good' clustering method and determining the 'correct' number of clusters are reduced to model selection problems in the probability framework. Gaussian mixture models have been shown to be a powerful tool for clustering in many applications.
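The idea of reducing "how many clusters?" to a model selection problem can be illustrated with a small sketch. It is not the paper's procedure: it assumes one-dimensional data, a plain EM fit with a variance floor, and the Bayesian information criterion in the "lower is better" form. The names `fit_gmm_1d`, `bic` and the synthetic data are hypothetical.

```python
import math
import random


def log_likelihood(data, weights, means, variances):
    """Log-likelihood of 1-D data under a Gaussian mixture."""
    total = 0.0
    for x in data:
        dens = sum(w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                   for w, m, v in zip(weights, means, variances))
        total += math.log(max(dens, 1e-300))  # guard against underflow
    return total


def fit_gmm_1d(data, k, iters=200, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative only)."""
    n = len(data)
    xs = sorted(data)
    # Deterministic start: means at evenly spaced quantiles, shared variance.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    spread = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var, var_floor)  # floor avoids degenerate spikes
    return weights, means, variances


def bic(data, k):
    """BIC = -2 log L + p ln n, with p = 3k - 1 free parameters (lower is better)."""
    weights, means, variances = fit_gmm_1d(data, k)
    loglik = log_likelihood(data, weights, means, variances)
    return -2.0 * loglik + (3 * k - 1) * math.log(len(data))


# Synthetic data (an assumption for illustration): two well-separated groups.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(60)] + [rng.gauss(8.0, 1.0) for _ in range(60)]
scores = {k: bic(data, k) for k in (1, 2, 3)}
print("BIC by number of components:", scores)
```

Comparing the entries of `scores` plays the role of choosing the number of clusters. Note that some software reports BIC with the opposite sign, in which case larger values are preferred.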
Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction
In Proceedings of NAACL-HLT 2009. Shay, 2009
Cited by 65 (11 self)
We present a family of priors over probabilistic grammar weights, called the shared logistic normal distribution. This family extends the partitioned logistic normal distribution, enabling factored covariance between the probabilities of different derivation events in the probabilistic grammar, providing a new way to encode prior knowledge about an unknown grammar. We describe a variational EM algorithm for learning a probabilistic grammar based on this family of priors. We then experiment with unsupervised dependency grammar induction and show significant improvements using our model for both monolingual learning and bilingual learning with a nonparallel, multilingual corpus.
A Statistical Model for Multiparty Electoral Data
American Political Science Review, 1999
Cited by 59 (12 self)
We propose a comprehensive statistical model for analyzing multiparty, district-level elections. This model, which provides a tool for comparative politics research analogous to that which regression analysis provides in the American two-party context, can be used to explain or predict how geographic distributions of electoral results depend upon economic conditions, neighborhood ethnic compositions, campaign spending, and other features of the election campaign or aggregate areas. We also provide new graphical representations for data exploration, model evaluation, and substantive interpretation. We illustrate the use of this model by attempting to resolve a controversy over the size of and trend in the electoral advantage of incumbency in Britain. Contrary to previous analyses, all based on measures now known to be biased, we demonstrate that the advantage is small but meaningful, varies substantially across the parties, and is not growing. Finally, we show how to estimate the party from which each party's advantage is predominantly drawn. We propose the first internally consistent statistical model for analyzing multiparty, district-level aggregate election data. Our model can
A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases
In Proceedings of the First Workshop on Social Media Analytics, SOMA ’10
Cited by 53 (4 self)
Political scientists lack methods to efficiently measure the priorities political actors emphasize in statements. To address this limitation, I introduce a statistical model that attends to the structure of political rhetoric when measuring expressed priorities: statements are naturally organized by author. The expressed agenda model exploits this structure to simultaneously estimate the topics in the texts, as well as the attention political actors allocate to the estimated topics. I apply the method to a collection of over 64,000 press releases from senators from 2005–2007, which I demonstrate is an ideal medium to measure how senators explain their work in Washington to constituents. A set of examples validates the estimated priorities and demonstrates that the additional information included in the model provides better classification than expert human coders or statistical models for clustering that ignore the author of a document. The statistical model and its extensions will be made available in a forthcoming free software package for the R computing language and the press release data will be made available for download. ∗PhD Candidate, Harvard University Department of Government. I thank the Center for American Political Studies
Bayesian wavelet regression on curves with application to a spectroscopic calibration problem
Journal of the American Statistical Association, 2001
Cited by 46 (5 self)
Motivated by calibration problems in near-infrared (NIR) spectroscopy, we consider the linear regression setting in which the many predictor variables arise from sampling an essentially continuous curve at equally spaced points and there may be multiple predictands. We tackle this regression problem by calculating the wavelet transforms of the discretized curves, then applying a Bayesian variable selection method using mixture priors to the multivariate regression of predictands on wavelet coefficients. For prediction purposes, we average over a set of likely models. Applied to a particular problem in NIR spectroscopy, this approach was able to find subsets of the wavelet coefficients with overall better predictive performance than the more usual approaches. In the application, the available predictors are measurements of the NIR reflectance spectrum of biscuit dough pieces at 256 equally spaced wavelengths. The aim is to predict the composition (i.e., the fat, flour, sugar, and water content) of the dough pieces using the spectral variables. Thus we have a multivariate regression of four predictands on 256 predictors with quite high intercorrelation among the predictors. A training set of 39 samples is available to fit this regression. Applying a wavelet transform replaces the 256 measurements on each spectrum with 256 wavelet coefficients that carry the same information. The variable selection method could use subsets of these coefficients that gave good predictions for all four compositional variables on a separate test set of samples. Selecting in the wavelet domain rather than from the original spectral variables is appealing in this application, because a single wavelet coefficient can carry information from a band of wavelengths in the original spectrum. This band can be narrow or wide, depending on the scale of the wavelet selected.
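The statement that 256 measurements become 256 coefficients carrying the same information can be checked with a small sketch. It uses the orthonormal Haar wavelet purely as a simple stand-in (the abstract does not say which wavelet family the authors used), and the synthetic `spectrum` is an assumption, not the biscuit dough data.

```python
import math


def haar_forward(signal):
    """Full orthonormal Haar transform of a list whose length is a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    root2 = math.sqrt(2.0)
    approx = list(signal)
    details = []  # coarsest-scale details end up first
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / root2 for a, b in pairs]
        approx = [(a + b) / root2 for a, b in pairs]
        details = detail + details
    return approx + details


def haar_inverse(coeffs):
    """Invert haar_forward, rebuilding the signal from coarse to fine scales."""
    root2 = math.sqrt(2.0)
    approx = list(coeffs[:1])
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        finer = []
        for a, d in zip(approx, detail):
            finer.extend([(a + d) / root2, (a - d) / root2])
        pos += len(approx)
        approx = finer
    return approx


# Hypothetical smooth "spectrum" sampled at 256 equally spaced points.
spectrum = [math.sin(2 * math.pi * i / 256) + 0.3 * math.cos(6 * math.pi * i / 256)
            for i in range(256)]
coeffs = haar_forward(spectrum)
restored = haar_inverse(coeffs)
max_error = max(abs(a - b) for a, b in zip(spectrum, restored))
print(len(coeffs), max_error)
```

Because the transform is orthonormal and invertible, selecting a subset of `coeffs` is a selection among features that each summarise a narrow or wide band of the original curve, which is the property the abstract appeals to.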
Canonical community ordination. Part I: Basic theory and linear methods
Ecoscience, 1994
Cited by 43 (1 self)
Canonical community ordination comprises a collection of methods that relate species assemblages to their environment, in both observational studies and designed experiments. Canonical ordination differs from ordination sensu stricto in that species and environment data are analyzed simultaneously. Part I reviews the theory in a nonmathematical way with emphasis on new insights for the interpretation of ordination diagrams. The interpretation depends on the ordination method used to create the diagram. After the basic theory, Part I is focused on the ordination diagrams in linear methods of canonical community ordination, in particular principal components analysis, redundancy analysis and canonical correlation analysis. Special attention is devoted to the display of qualitative environmental variables. Key words: principal components analysis, redundancy analysis, canonical correlation analysis, biplot, ordination diagram, species-environment relations.
Sex differences in brain gray and white matter in healthy young adults: correlations with cognitive performance
J. Neurosci, 1999
Cited by 40 (2 self)
Sex-related differences in behavior are extensive, but their neuroanatomic substrate is unclear. Indirect perfusion data have suggested a higher percentage of gray matter (GM) in left hemisphere cortex and in women, but differences in volumes of the major cranial compartments have not been examined for the entire brain in association with cognitive performance. We used volumetric segmentation of dual echo (proton density and T2-weighted) magnetic resonance imaging (MRI) scans in healthy volunteers (40 men, 40 women) age 18–45. Supertentorial volume was segmented into GM, white matter (WM), and CSF. We confirmed that women have a higher percentage of GM, whereas men have a higher percentage of WM and of CSF. These differences sustained a correction for total intracranial volume. In men the slope of the relation between cranial volume and GM paralleled that for WM, whereas in women the increase
Biplots of compositional data
Applied Statistics, 2002
Cited by 39 (12 self)
The singular value decomposition and its interpretation as a linear biplot has proved to be a powerful tool for analysing many forms of multivariate data. Here we adapt biplot methodology to the specific case of compositional data consisting of positive vectors each of which is constrained to have unit sum. These relative variation biplots have properties relating to special features of compositional data: the study of ratios, subcompositions and models of compositional relationships. The methodology is demonstrated on a data set consisting of six-part colour compositions in 22 abstract paintings, showing how the singular value decomposition can achieve an accurate biplot of the colour ratios and how possible models interrelating the colours can be diagnosed.
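The emphasis on ratios and subcompositions can be made concrete with a short sketch of the centred log-ratio (clr) transform, a standard preprocessing step for unit-sum data. This is only an illustration of that one step under stated assumptions, not the paper's full biplot construction (no singular value decomposition is computed here), and the six-part `palette` is a made-up composition rather than data from the paintings.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to one."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(parts):
    """Centred log-ratio: log of each part minus the mean log of all parts."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical six-part colour composition (proportions summing to one).
palette = closure([0.35, 0.25, 0.15, 0.12, 0.08, 0.05])
z = clr(palette)

# Differences of clr values are the pairwise log-ratios of the original parts.
log_ratio_01 = z[0] - z[1]

# Ratios survive the move to a subcomposition of the first three parts.
sub = closure(palette[:3])
sub_log_ratio_01 = math.log(sub[0] / sub[1])

print(log_ratio_01, sub_log_ratio_01)
```

The two printed values agree up to rounding, which is why working with log-ratios rather than raw proportions gives results that do not depend on which other parts happen to be included.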
Regression analysis of variates observed on (0, 1): percentages, proportions, and fractions
2003
Cited by 38 (0 self)
Many types of studies examine the influence of selected variables on the conditional expectation of a proportion or vector of proportions, for example, market shares, rock composition, and so on. We identify four distributional categories into which such data can be put, and focus on regression models for