Results 1-10 of 103
Discrete Multivariate Analysis: Theory and Practice
, 1975
Abstract

Cited by 420 (34 self)
the collaboration of Richard J. Light and Frederick Mosteller.
Classification by pairwise coupling
, 1998
Abstract

Cited by 273 (0 self)
We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in real and simulated datasets. Classifiers used include linear discriminants, nearest neighbors, and the support vector machine.
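The coupling step this abstract describes can be sketched in a few lines. The following is an illustrative implementation, not the paper's exact algorithm: it uses the iterative scheme usually attributed to this line of work, with uniform pairwise weights (a simplification I've made), and the function name is my own.

```python
import numpy as np

def couple_pairwise(r, n_iter=500):
    """Couple pairwise class-probability estimates r[i, j] ~ P(class i | i or j)
    into a single probability vector p. Assumes r[i, j] + r[j, i] = 1 and
    uniform weights across pairs. Each step rescales p[i] by the ratio of
    observed to implied pairwise probabilities, then renormalizes."""
    k = r.shape[0]
    p = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        for i in range(k):
            num = sum(r[i, j] for j in range(k) if j != i)
            den = sum(p[i] / (p[i] + p[j]) for j in range(k) if j != i)
            p[i] *= num / den
        p /= p.sum()
    return p
```

If the pairwise estimates are exactly consistent with some underlying distribution, the iteration recovers it; with noisy estimates it finds a compromise.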
A Comparison of Algorithms for Maximum Entropy Parameter Estimation
Abstract

Cited by 228 (2 self)
A comparison of algorithms for maximum entropy parameter estimation Conditional maximum entropy (ME) models provide a general purpose machine learning technique which has been successfully applied to fields as diverse as computer vision and econometrics, and which is used for a wide variety of classification problems in natural language processing. However, the flexibility of ME models is not without cost. While parameter estimation for ME models is conceptually straightforward, in practice ME models for typical natural language tasks are very large, and may well contain many thousands of free parameters. In this paper, we consider a number of algorithms for estimating the parameters of ME models, including iterative scaling, gradient ascent, conjugate gradient, and variable metric methods. Surprisingly, the standardly used iterative scaling algorithms perform quite poorly in comparison to the others, and for all of the test problems, a limited-memory variable metric algorithm outperformed the other choices.
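One of the simpler methods compared here, plain gradient ascent, is easy to sketch. The key maxent fact it relies on is that the log-likelihood gradient equals observed feature counts minus the model's expected feature counts. This is a toy illustration (function name, learning rate, and iteration count are my own choices, not the paper's setup):

```python
import numpy as np

def maxent_gradient_ascent(X, y, n_classes, lr=0.5, n_iter=500):
    """Fit a conditional maximum entropy (multinomial logistic) model by
    gradient ascent on the log-likelihood. The gradient has the classic
    maxent form: E_data[features] - E_model[features]."""
    n, d = X.shape
    W = np.zeros((n_classes, d))
    Y = np.eye(n_classes)[y]                        # one-hot labels
    for _ in range(n_iter):
        scores = X @ W.T
        scores -= scores.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(scores)
        P /= P.sum(axis=1, keepdims=True)            # model p(y | x)
        grad = (Y - P).T @ X / n                     # observed minus expected
        W += lr * grad
    return W
```

The limited-memory variable metric methods the abstract favors (e.g. L-BFGS) optimize the same objective but use curvature information to converge far faster on large models.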
Traffic Matrix Estimation: Existing Techniques and New Directions
, 2002
Abstract

Cited by 172 (13 self)
Very few techniques have been proposed for estimating traffic matrices in the context of Internet traffic. Our work on POP-to-POP traffic matrices (TM) makes two contributions. The primary contribution is the outcome of a detailed comparative evaluation of the three existing techniques. We evaluate these methods with respect to the estimation errors yielded, sensitivity to prior information required and sensitivity to the statistical assumptions they make. We study the impact of characteristics such as path length and the amount of link sharing on the estimation errors. Using actual data from a Tier-1 backbone, we assess the validity of the typical assumptions needed by the TM estimation techniques. The secondary contribution of our work is the proposal of a new direction for TM estimation based on using choice models to model POP fanouts. These models allow us to overcome some of the problems of existing methods because they can incorporate additional data and information about POPs and they enable us to make a fundamentally different kind of modeling assumption. We validate this approach by illustrating that our modeling assumption matches actual Internet data well. Using two initial simple models we provide a proof of concept showing that the incorporation of knowledge of POP features (such as total incoming bytes, number of customers, etc.) can reduce estimation errors. Our proposed approach can be used in conjunction with existing or future methods in that it can be used to generate good priors that serve as inputs to statistical inference techniques.
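A common way to generate the kind of prior the abstract mentions is a gravity model: assume traffic from POP i to POP j is proportional to the total traffic entering at i times the total leaving at j. This sketch is an illustration of that general idea, not the paper's specific choice-model approach:

```python
import numpy as np

def gravity_tm(origin_totals, dest_totals):
    """Gravity-model prior for a POP-to-POP traffic matrix:
    T[i, j] = origin_totals[i] * dest_totals[j] / total_traffic.
    Such a prior can seed a statistical TM inference method."""
    o = np.asarray(origin_totals, float)
    d = np.asarray(dest_totals, float)
    return np.outer(o, d) / d.sum()
```

When total inbound equals total outbound traffic, the resulting matrix matches both sets of marginal totals exactly, which is what makes it a convenient starting point for inference.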
Models of Translational Equivalence among Words
 Computational Linguistics
, 2000
Abstract

Cited by 143 (2 self)
This article presents methods for biasing statistical translation models to reflect these properties. Evaluation with respect to independent human judgments has confirmed that translation models biased in this fashion are significantly more accurate than a baseline knowledge-free model. This article also shows how a statistical translation model can take advantage of pre-existing knowledge that might be available about particular language pairs. Even the simplest kinds of language-specific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that reflect knowledge about the model domain combine the best of both the rationalist and empiricist paradigms.
The Unified propagation and scaling algorithm
 In Advances in Neural Information Processing Systems
, 2002
Abstract

Cited by 42 (9 self)
In this paper we will show that a restricted class of constrained minimum divergence problems, named generalized inference problems, can be solved by approximating the KL divergence with a Bethe free energy. The algorithm we derive is closely related to both loopy belief propagation and iterative scaling. This unified propagation and scaling algorithm reduces to a convergent alternative to loopy belief propagation when no constraints are present. Experiments show the viability of our algorithm.
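The "scaling" half of the picture is classical iterative proportional fitting (IPF): repeatedly rescale a joint table so its marginals match given constraints, which performs a minimum-KL projection onto the constraint set. This small sketch illustrates that component only, not the paper's unified algorithm:

```python
import numpy as np

def ipf(Q, row_targets, col_targets, n_iter=100):
    """Iterative proportional fitting: rescale a nonnegative table Q so its
    row and column sums match the given targets. Each sweep alternately
    enforces one set of marginal constraints."""
    P = Q.astype(float).copy()
    row_t = np.asarray(row_targets, float)
    col_t = np.asarray(col_targets, float)
    for _ in range(n_iter):
        P *= (row_t / P.sum(axis=1))[:, None]   # match row marginals
        P *= (col_t / P.sum(axis=0))[None, :]   # match column marginals
    return P
```

On tree-structured problems both belief propagation and IPF are exact; the paper's contribution is combining them when the graph has loops and external constraints.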
Finding social groups: A meta-analysis of the southern women data
 Dynamic Social Network Modeling and Analysis. The National Academies
, 2003
Abstract

Cited by 33 (0 self)
For more than 100 years, sociologists have been concerned with relatively small, cohesive social groups (Tönnies, [1887] 1940; Durkheim [1893] 1933; Spencer 1895-97; Cooley, 1909). The groups that concern sociologists are not simply categories—like redheads or people more than
Soft Evidential Update for Probabilistic Multiagent Systems
 INTERNATIONAL JOURNAL OF APPROXIMATE REASONING
, 2000
Abstract

Cited by 26 (5 self)
We address the problem of updating a probability distribution represented by a Bayesian network upon presentation of soft evidence. Our motivation
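One standard formalization of updating on soft evidence is Jeffrey's rule: keep the conditional distribution of the other variables given B fixed, and replace the marginal on B with the observed soft-evidence distribution. This sketch illustrates that rule on a small joint table; the paper's method for full Bayesian networks is more general:

```python
import numpy as np

def jeffreys_update(joint, q_b):
    """Soft-evidence update via Jeffrey's rule on a joint table P(A, B)
    (rows index A, columns index B): hold P(A | B) fixed and swap the
    marginal P(B) for the soft-evidence distribution q(B)."""
    p_b = joint.sum(axis=0)                  # current marginal on B
    return joint / p_b * np.asarray(q_b)     # P(A|B=b) * q(b), columnwise
```

By construction the updated table has marginal q(B) while every conditional P(A | B = b) is unchanged, which is exactly what distinguishes soft evidence from ordinary (hard) conditioning.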
Poststratification Into Many Categories Using Hierarchical Logistic Regression
, 1997
Abstract

Cited by 20 (11 self)
A standard method for correcting for unequal sampling probabilities and nonresponse in sample surveys is poststratification: that is, dividing the population into several categories, estimating the distribution of responses in each category, and then counting each category in proportion to its size in the population. We consider poststratification as a general framework that includes many weighting schemes used in survey analysis (see Little, 1993). We construct a hierarchical logistic regression model for the mean of a binary response variable conditional on poststratification cells. The hierarchical model allows us to fit many more cells than is possible using classical methods, and thus to include much more population-level information, while at the same time including all the information used in standard survey sampling inferences. We are thus combining the modeling approach often used in small-area estimation with the population information used in poststratification. We apply the...
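The poststratification step described in the abstract's opening sentence is just a population-weighted average of per-cell estimates. A minimal sketch (the function name is my own; in the paper's setting the cell estimates would come from the hierarchical logistic regression rather than raw cell means):

```python
import numpy as np

def poststratified_mean(cell_estimates, pop_counts):
    """Poststratified estimate of a population mean: weight each cell's
    estimated response by that cell's share of the population."""
    w = np.asarray(pop_counts, float)
    return float(np.asarray(cell_estimates, float) @ (w / w.sum()))
```

For example, with two cells whose population shares are 75% and 25%, cell estimates of 0.2 and 0.6 combine to 0.75 * 0.2 + 0.25 * 0.6 = 0.3, regardless of how many sampled respondents each cell happened to contain.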