Results 1 - 10
of
4,911
A Maximum Entropy approach to Natural Language Processing
- COMPUTATIONAL LINGUISTICS
, 1996
"... The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper we des ..."
Abstract
-
Cited by 846 (5 self)
- Add to MetaCart
The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihood approach for automatically constructing maximum entropy models and describe how to implement this approach efficiently, using as examples several problems in natural language processing.
An information-maximization approach to blind separation and blind deconvolution
- NEURAL COMPUTATION
, 1995
"... ..."
Network Information Flow
- IEEE TRANSACTIONS ON INFORMATION THEORY
, 2000
"... We introduce a new class of problems called network information flow which is inspired by computer network applications. Consider a point-to-point communication network on which a number of information sources are to be mulitcast to certain sets of destinations. We assume that the information source ..."
Abstract
-
Cited by 698 (6 self)
- Add to MetaCart
We introduce a new class of problems called network information flow which is inspired by computer network applications. Consider a point-to-point communication network on which a number of information sources are to be mulitcast to certain sets of destinations. We assume that the information sources are mutually independent. The problem is to characterize the admissible coding rate region. This model subsumes all previously studied models along the same line. In this paper, we study the problem with one information source, and we have obtained a simple characterization of the admissible coding rate region. Our result can be regarded as the Max-flow Min-cut Theorem for network information flow. Contrary to one’s intuition, our work reveals that it is in general not optimal to regard the information to be multicast as a “fluid” which can simply be routed or replicated. Rather, by employing coding at the nodes, which we refer to as network coding, bandwidth can in general be saved. This finding may have significant impact on future design of switching systems.
Text Classification from Labeled and Unlabeled Documents using EM
- Machine Learning
, 1999
"... . This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large qua ..."
Abstract
-
Cited by 633 (16 self)
- Add to MetaCart
. This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large quantities of unlabeled documents are readily available. We introduce an algorithm for learning from labeled and unlabeled documents based on the combination of Expectation-Maximization (EM) and a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates to convergence. This basic EM procedure works well when the data conform to the generative assumptions of the model. However these assumptions are often violated in practice, and poor performance can result. We present two extensions to the algorithm that improve ...
A comparison of event models for Naive Bayes text classification
, 1998
"... Recent work in text classification has used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey ..."
Abstract
-
Cited by 620 (23 self)
- Add to MetaCart
Recent work in text classification has used two different first-order probabilistic models for classification, both of which make the naive Bayes assumption. Some use a multi-variate Bernoulli model, that is, a Bayesian Network with no dependencies between words and binary word features (e.g. Larkey and Croft 1996; Koller and Sahami 1997). Others use a multinomial model, that is, a uni-gram language model with integer word counts (e.g. Lewis and Gale 1994; Mitchell 1997). This paper aims to clarify the confusion by describing the differences and details of these two models, and by empirically comparing their classification performance on five text corpora. We find that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial performs usually performs even better at larger vocabulary sizes---providing on average a 27% reduction in error over the multi-variate Bernoulli model at any vocabulary size.
The space complexity of approximating the frequency moments
- JOURNAL OF COMPUTER AND SYSTEM SCIENCES
, 1996
"... The frequency moments of a sequence containing mi elements of type i, for 1 ≤ i ≤ n, are the numbers Fk = �n i=1 mki. We consider the space complexity of randomized algorithms that approximate the numbers Fk, when the elements of the sequence are given one by one and cannot be stored. Surprisingly, ..."
Abstract
-
Cited by 570 (13 self)
- Add to MetaCart
The frequency moments of a sequence containing mi elements of type i, for 1 ≤ i ≤ n, are the numbers Fk = �n i=1 mki. We consider the space complexity of randomized algorithms that approximate the numbers Fk, when the elements of the sequence are given one by one and cannot be stored. Surprisingly, it turns out that the numbers F0, F1 and F2 can be approximated in logarithmic space, whereas the approximation of Fk for k ≥ 6 requires nΩ(1) space. Applications to data bases are mentioned as well.
No Free Lunch Theorems for Optimization
, 1997
"... A framework is developed to explore the connection between effective optimization algorithms and the problems they are solving. A number of “no free lunch ” (NFL) theorems are presented which establish that for any algorithm, any elevated performance over one class of problems is offset by performan ..."
Abstract
-
Cited by 516 (8 self)
- Add to MetaCart
A framework is developed to explore the connection between effective optimization algorithms and the problems they are solving. A number of “no free lunch ” (NFL) theorems are presented which establish that for any algorithm, any elevated performance over one class of problems is offset by performance over another class. These theorems result in a geometric interpretation of what it means for an algorithm to be well suited to an optimization problem. Applications of the NFL theorems to information-theoretic aspects of optimization and benchmark measures of performance are also presented. Other issues addressed include time-varying optimization problems and a priori “head-to-head” minimax distinctions between optimization algorithms, distinctions that result despite the NFL theorems’ enforcing of a type of uniformity over all algorithms.
Quantization
- IEEE TRANS. INFORM. THEORY
, 1998
"... The history of the theory and practice of quantization dates to 1948, although similar ideas had appeared in the literature as long ago as 1898. The fundamental role of quantization in modulation and analog-to-digital conversion was first recognized during the early development of pulsecode modula ..."
Abstract
-
Cited by 515 (10 self)
- Add to MetaCart
The history of the theory and practice of quantization dates to 1948, although similar ideas had appeared in the literature as long ago as 1898. The fundamental role of quantization in modulation and analog-to-digital conversion was first recognized during the early development of pulsecode modulation systems, especially in the 1948 paper of Oliver, Pierce, and Shannon. Also in 1948, Bennett published the first high-resolution analysis of quantization and an exact analysis of quantization noise for Gaussian processes, and Shannon published the beginnings of rate distortion theory, which would provide a theory for quantization as analog-to-digital conversion and as data compression. Beginning with these three papers of fifty years ago, we trace the history of quantization from its origins through this decade, and we survey the fundamentals of the theory and many of the popular and promising techniques for quantization.
Cooperative diversity in wireless networks: efficient protocols and outage behavior
- IEEE Trans. Inform. Theory
, 2004
"... Abstract—We develop and analyze low-complexity cooperative diversity protocols that combat fading induced by multipath propagation in wireless networks. The underlying techniques exploit space diversity available through cooperating terminals’ relaying signals for one another. We outline several str ..."
Abstract
-
Cited by 512 (24 self)
- Add to MetaCart
Abstract—We develop and analyze low-complexity cooperative diversity protocols that combat fading induced by multipath propagation in wireless networks. The underlying techniques exploit space diversity available through cooperating terminals’ relaying signals for one another. We outline several strategies employed by the cooperating radios, including fixed relaying schemes such as amplify-and-forward and decode-and-forward, selection relaying schemes that adapt based upon channel measurements between the cooperating terminals, and incremental relaying schemes that adapt based upon limited feedback from the destination terminal. We develop performance characterizations in terms of outage events and associated outage probabilities, which measure robustness of the transmissions to fading, focusing on the high signal-to-noise ratio (SNR) regime. Except for fixed decode-and-forward, all of our cooperative diversity protocols are efficient in the sense that they achieve full diversity (i.e., second-order diversity in the case of two terminals), and, moreover, are close to optimum (within 1.5 dB) in certain regimes. Thus, using distributed antennas, we can provide the powerful benefits of space diversity without need for physical arrays, though at a loss of spectral efficiency due to half-duplex operation and possibly at the cost of additional receive hardware. Applicable to any wireless setting, including cellular or ad hoc networks—wherever space constraints preclude the use of physical arrays—the performance characterizations reveal that large power or energy savings result from the use of these protocols. Index Terms—Diversity techniques, fading channels, outage probability, relay channel, user cooperation, wireless networks. I.

