Results 1  10
of
231,916
Automatic Word Sense Discrimination
 Journal of Computational Linguistics
, 1998
"... This paper presents contextgroup discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a highdimensional, realvalued space in which closen ..."
Abstract

Cited by 530 (1 self)
 Add to MetaCart
This paper presents contextgroup discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a highdimensional, realvalued space in which
Understanding Normal and Impaired Word Reading: Computational Principles in QuasiRegular Domains
 PSYCHOLOGICAL REVIEW
, 1996
"... We develop a connectionist approach to processing in quasiregular domains, as exemplified by English word reading. A consideration of the shortcomings of a previous implementation (Seidenberg & McClelland, 1989, Psych. Rev.) in reading nonwords leads to the development of orthographic and phono ..."
Abstract

Cited by 583 (94 self)
 Add to MetaCart
We develop a connectionist approach to processing in quasiregular domains, as exemplified by English word reading. A consideration of the shortcomings of a previous implementation (Seidenberg & McClelland, 1989, Psych. Rev.) in reading nonwords leads to the development of orthographic
Finding community structure in networks using the eigenvectors of matrices
, 2006
"... We consider the problem of detecting communities or modules in networks, groups of vertices with a higherthanaverage density of edges connecting them. Previous work indicates that a robust approach to this problem is the maximization of the benefit function known as “modularity ” over possible div ..."
Abstract

Cited by 500 (0 self)
 Add to MetaCart
divisions of a network. Here we show that this maximization process can be written in terms of the eigenspectrum of a matrix we call the modularity matrix, which plays a role in community detection similar to that played by the graph Laplacian in graph partitioning calculations. This result leads us to a
Reversible jump Markov chain Monte Carlo computation and Bayesian model determination
 Biometrika
, 1995
"... Markov chain Monte Carlo methods for Bayesian computation have until recently been restricted to problems where the joint distribution of all variables has a density with respect to some xed standard underlying measure. They have therefore not been available for application to Bayesian model determi ..."
Abstract

Cited by 1330 (24 self)
 Add to MetaCart
Markov chain Monte Carlo methods for Bayesian computation have until recently been restricted to problems where the joint distribution of all variables has a density with respect to some xed standard underlying measure. They have therefore not been available for application to Bayesian model
Maximum entropy markov models for information extraction and segmentation
, 2000
"... Hidden Markov models (HMMs) are a powerful probabilistic tool for modeling sequential data, and have been applied with success to many textrelated tasks, such as partofspeech tagging, text segmentation and information extraction. In these cases, the observations are usually modeled as multinomial ..."
Abstract

Cited by 554 (18 self)
 Add to MetaCart
as multinomial distributions over a discrete vocabulary, and the HMM parameters are set to maximize the likelihood of the observations. This paper presents a new Markovian sequence model, closely related to HMMs, that allows observations to be represented as arbitrary overlapping features (such as word
Inducing Features of Random Fields
 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
, 1997
"... We present a technique for constructing random fields from a set of training samples. The learning paradigm builds increasingly complex fields by allowing potential functions, or features, that are supported by increasingly large subgraphs. Each feature has a weight that is trained by minimizing the ..."
Abstract

Cited by 664 (14 self)
 Add to MetaCart
introduced in this paper differ from those common to much of the computer vision literature in that the underlying random fields are nonMarkovian and have a large number of parameters that must be estimated. Relations to other learning approaches, including decision trees, are given. As a demonstration
Property Testing and its connection to Learning and Approximation
"... We study the question of determining whether an unknown function has a particular property or is fflfar from any function with that property. A property testing algorithm is given a sample of the value of the function on instances drawn according to some distribution, and possibly may query the fun ..."
Abstract

Cited by 498 (68 self)
 Add to MetaCart
w.r.t the vertex set). Our graph property testing algorithms are probabilistic and make assertions which are correct with high probability, utilizing only poly(1=ffl) edgequeries into the graph, where ffl is the distance parameter. Moreover, the property testing algorithms can be used
The information bottleneck method
 University of Illinois
, 1999
"... We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken. ..."
Abstract

Cited by 545 (38 self)
 Add to MetaCart
We define the relevant information in a signal x ∈ X as being the information that this signal provides about another signal y ∈ Y. Examples include the information that face images provide about the names of the people portrayed, or the information that speech sounds provide about the words spoken
A comparison of event models for Naive Bayes text classification
, 1998
"... Recent work in text classification has used two different firstorder probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multivariate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey ..."
Abstract

Cited by 1002 (27 self)
 Add to MetaCart
Recent work in text classification has used two different firstorder probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multivariate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e
Results 1  10
of
231,916