Learning Subjective Language
 Computational Linguistics
, 1993
Subjectivity in natural language refers to aspects of language used to express opinions, evaluations, and speculations. There are numerous natural language processing applications for which subjectivity analysis is relevant, including information extraction and text categorization. The goal of this work is learning subjective language from corpora. Clues of subjectivity are generated and tested, including low-frequency words, collocations, and adjectives and verbs identified using distributional similarity. The features are also examined working in concert. The features, generated from different data sets using different procedures, exhibit consistency in performance in that they all do better and worse on the same data sets. In addition, this article shows that the density of subjectivity clues in the surrounding context strongly affects how likely it is that a word is subjective, and it provides the results of an annotation study assessing the subjectivity of sentences with high-density features. Finally, the clues are used to perform opinion piece recognition (a type of text categorization and genre detection) to demonstrate the utility of the knowledge acquired in this article.
Maximum likelihood estimation of observer error-rates using the EM algorithm
 Applied Statistics
, 1979
Learning Subjective Adjectives from Corpora
 In AAAI
, 2000
Subjectivity tagging is distinguishing sentences used to present opinions and evaluations from sentences used to objectively present factual information. There are numerous applications for which subjectivity tagging is relevant, including information extraction and information retrieval. This paper identifies strong clues of subjectivity using the results of a method for clustering words according to distributional similarity (Lin 1998), seeded by a small amount of detailed manual annotation. These features are then further refined with the addition of lexical semantic features of adjectives, specifically polarity and gradability (Hatzivassiloglou & McKeown 1997), which can be automatically learned from corpora. In 10-fold cross-validation experiments, features based on both similarity clusters and the lexical semantic features are shown to have higher precision than features based on each alone.
Development and Use of a Gold-Standard Data Set for Subjectivity Classifications
, 1999
and improving intercoder reliability in discourse tagging using statistical techniques. Bias-corrected tags are formulated and successfully used to guide a revision of the coding manual and develop an automatic classifier.
Beyond SEM: General latent variable modeling
 Behaviormetrika
, 2002
This article gives an overview of statistical analysis with latent variables. Using traditional structural equation modeling as a starting point, it shows how the idea of latent variables captures a wide variety of statistical concepts, including random effects, missing data, sources of variation in hierarchical data, finite mixtures, latent classes, and clusters. These latent variable applications go beyond the traditional latent variable usage in psychometrics with its focus on measurement error and hypothetical constructs measured by multiple indicators. The article argues for the value of integrating statistical and psychometric modeling ideas. Different applications are discussed in a unifying framework that brings together in one general model such different analysis types as factor models, growth curve models, multilevel models, latent class models and discrete-time survival models. Several possible combinations and extensions of these models are made clear due to the unifying framework.
The Theoretical Status of Latent Variables
 Psychological Review
, 2003
This article examines the theoretical status of latent variables as used in modern test theory models. First, it is argued that a consistent interpretation of such models requires a realist ontology for latent variables. Second, the relation between latent variables and their indicators is discussed. It is maintained that this relation can be interpreted as a causal one but that in measurement models for interindividual differences the relation does not apply to the level of the individual person. To substantiate intraindividual causal conclusions, one must explicitly represent individual level processes in the measurement model. Several research strategies that may be useful in this respect are discussed, and a typology of constructs is proposed on the basis of this analysis. The need to link individual processes to latent variable models for interindividual differences is emphasized. Consider the following sentence: “Einstein would not have been able to come up with his e = mc² had he not possessed such an extraordinary intelligence.” What does this sentence express? It relates observable behavior (Einstein’s writing e = mc²) to an unobservable attribute (his extraordinary intelligence), and it does so by assigning to the unobservable attribute a causal role in
Algebraic Geometry of Bayesian Networks
 Journal of Symbolic Computation
, 2005
We study the algebraic varieties defined by the conditional independence statements of Bayesian networks. A complete algebraic classification is given for Bayesian networks on at most five random variables. Hidden variables are related to the geometry of higher secant varieties.
Identifiability of parameters in latent structure models with many observed variables
 ANN. STATIST
, 2009
While hidden class models of various types arise in many statistical applications, it is often difficult to establish the identifiability of their parameters. Focusing on models in which there is some structure of independence of some of the observed variables conditioned on hidden ones, we demonstrate a general approach for establishing identifiability utilizing algebraic arguments. A theorem of J. Kruskal for a simple latent-class model with finite state space lies at the core of our results, though we apply it to a diverse set of models. These include mixtures of both finite and nonparametric product distributions, hidden Markov models and random graph mixture models, and lead to a number of new results and improvements to old ones. In the parametric setting, this approach indicates that for such models, the classical definition of identifiability is typically too strong. Instead, generic identifiability holds, which implies that the set of nonidentifiable parameters has measure zero, so that parameter inference is still meaningful. In particular, this sheds light on the properties of finite mixtures of Bernoulli products, which have been used for decades despite being known to have nonidentifiable parameters. In the nonparametric setting, we again obtain identifiability only when certain restrictions are placed on the distributions that are mixed, but we explicitly describe the conditions.
Intergenerational Solidarity and the Structure of Adult Child-Parent Relationships in American Families
 American Journal of Sociology
, 1997
The authors investigate the structure of intergenerational cohesion by examining social-psychological, structural, and transactional aspects of adult child–parent relations. The authors use latent class analysis to develop a typology based on three underlying dimensions of intergenerational solidarity: affinity, opportunity structure, and function. The same five types are found for relations with both mothers and fathers: tight-knit, sociable, intimate but distant, obligatory, and detached. Relationship types are also differentiated by sociodemographic characteristics; relations with fathers and divorced parents tended to have the weakest cohesion. The authors conclude that adult intergenerational relationships in American families are structurally diverse but generally possess the potential to serve their members’ needs.
Stratified exponential families: Graphical models and model selection
 ANNALS OF STATISTICS
, 2001
