Word Association Norms, Mutual Information, and Lexicography
1990
This paper will propose an objective measure, based on the information-theoretic notion of mutual information, for estimating word association norms from computer-readable corpora. (The standard method of obtaining word association norms, testing a few thousand subjects on a few hundred words, is both costly and unreliable.) The proposed measure, the association ratio, estimates word association norms directly from computer-readable corpora, making it possible to estimate norms for tens of thousands of words.
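A minimal sketch of an association ratio in the mutual-information sense, log2(P(x, y) / (P(x) P(y))), counted over ordered word pairs. It assumes that P(x) and P(y) are unigram relative frequencies, that P(x, y) is the number of times y follows x within a fixed window (5 words here) divided by the corpus size, and that low-count pairs can be filtered with a `min_count` cutoff; the function name and parameters are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def association_ratios(tokens, window=5, min_count=1):
    """Association ratio log2(P(x, y) / (P(x) P(y))) for ordered pairs (x, y)
    in which y occurs within the `window` words following x.

    Assumptions of this sketch: P(x), P(y) are unigram relative frequencies and
    P(x, y) is the windowed pair count divided by the corpus size N. Pairs seen
    fewer than `min_count` times are dropped, since tiny counts are unreliable."""
    n = len(tokens)
    unigram = Counter(tokens)
    pairs = Counter()
    for i, x in enumerate(tokens):
        # count every y in the next `window` positions after x (order matters)
        for y in tokens[i + 1 : i + 1 + window]:
            pairs[(x, y)] += 1
    ratios = {}
    for (x, y), f_xy in pairs.items():
        if f_xy < min_count:
            continue
        p_xy = f_xy / n
        ratios[(x, y)] = math.log2(p_xy / ((unigram[x] / n) * (unigram[y] / n)))
    return ratios


if __name__ == "__main__":
    # hypothetical toy corpus, for illustration only
    corpus = "strong tea and strong coffee but powerful computers".split()
    scores = association_ratios(corpus, window=5)
    print(scores[("strong", "tea")])
```

Because the window is directional, the ratio for (x, y) generally differs from that for (y, x), which preserves word-order information that a symmetric mutual-information score would discard.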
Text-translation alignment
1988
We present an algorithm for aligning texts with their translations that is based only on internal evidence. The relaxation process rests on a notion of which word in one text corresponds to which word in the other, based essentially on the similarity of their distributions. It exploits a partial alignment at the word level to induce a maximum-likelihood alignment at the sentence level, which is in turn used, in the next iteration, to refine the word-level estimate. The algorithm appears to converge to the correct sentence alignment in only a few iterations.
1. The Problem
To align a text with a translation of it in another language is, in the terminology of this paper, to show which of its parts are translated by which parts of the second text. The result takes the form of a list of pairs of items (words, sentences, paragraphs, or whatever) from the two texts. A pair <a, b> is on the list if a is translated, in whole or in part, by b. If <a, b> and <a, c> are on the list, it is because a is translated partly by b and partly by c. We say that the alignment is partial if only some of the items of the chosen kind from one or other of the texts are represented in the pairs. Otherwise, it is complete.
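The definitions of an alignment as a list of pairs, and of partial versus complete alignments, can be sketched as a small data model. In this sketch the function names are hypothetical and items are arbitrary hashable labels (for example sentence IDs); it covers only the definitions, not the relaxation algorithm itself.

```python
from typing import Hashable, Iterable, Set, Tuple

# An alignment is a set of pairs <a, b>: item a of the first text is
# translated, in whole or in part, by item b of the second text.
Pair = Tuple[Hashable, Hashable]


def translations_of(alignment: Set[Pair], a: Hashable) -> Set[Hashable]:
    """All items b with <a, b> on the list, i.e. the parts of the second
    text that (partly) translate a."""
    return {b for (x, b) in alignment if x == a}


def is_complete(alignment: Set[Pair], items1: Iterable[Hashable],
                items2: Iterable[Hashable]) -> bool:
    """Complete if every item of both texts appears in some pair;
    otherwise the alignment is partial."""
    left = {a for a, _ in alignment}
    right = {b for _, b in alignment}
    return set(items1) <= left and set(items2) <= right


if __name__ == "__main__":
    # hypothetical sentence labels: s2 is translated partly by t2 and partly by t3
    al = {("s1", "t1"), ("s2", "t2"), ("s2", "t3")}
    print(translations_of(al, "s2"))
    print(is_complete(al, ["s1", "s2", "s3"], ["t1", "t2", "t3"]))
```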
Information-Theoretic Determination of Minimax Rates of Convergence
Ann. Stat., 1997
In this paper, we present some general results determining minimax bounds on statistical risk for density estimation based on certain information-theoretic considerations. These bounds depend only on metric entropy conditions and are used to identify the minimax rates of convergence.
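To illustrate how a metric entropy condition alone can pin down a rate, here is the usual balance equation, stated informally and as our reading of the general form of such results rather than a quotation of the paper's theorems (regularity conditions are omitted). Here $M(\varepsilon)$ denotes the logarithm of the $\varepsilon$-packing number of the density class $\mathcal{F}$ under a suitable distance $d$ (for example Hellinger), and the smoothness exponent $\alpha$ in the worked example is an assumption chosen for illustration.

```latex
% Balance equation: the rate eps_n equates metric entropy with n eps_n^2
M(\varepsilon_n) \asymp n\,\varepsilon_n^{2}
\quad\Longrightarrow\quad
\inf_{\hat f}\,\sup_{f \in \mathcal{F}} \mathbb{E}\, d^{2}\!\bigl(f, \hat f\bigr) \asymp \varepsilon_n^{2}

% Worked example: if M(\varepsilon) \asymp \varepsilon^{-1/\alpha}, then
% \varepsilon_n^{-1/\alpha} \asymp n\,\varepsilon_n^{2}, i.e.
% \varepsilon_n^{2 + 1/\alpha} \asymp n^{-1}, so
\varepsilon_n^{2} \asymp n^{-2\alpha/(2\alpha+1)}
```

Richer classes (larger $M(\varepsilon)$) force a larger $\varepsilon_n$ and hence a slower rate, which is the sense in which the bounds "depend only on metric entropy conditions."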
The Method of Types
1998
The method of types is one of the key technical tools in Shannon theory, and it is valuable in other fields as well. In this paper, some key applications will be presented in enough detail to give an interested non-specialist a working knowledge of the method, and a wide selection of further applications will be surveyed. These range from hypothesis testing and large deviations theory, through error exponents for discrete memoryless channels and the capacity of arbitrarily varying channels, to multiuser problems. While the method of types is suited primarily to discrete memoryless models, its extensions to certain models with memory will also be discussed. Index Terms: arbitrarily varying channels, choice of decoder, counting approach, error exponents, extended type concepts, hypothesis testing, large deviations, multiuser problems, universal coding.
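As a small illustration of the counting at the heart of the method, the sketch below computes the type (empirical distribution) of a sequence, the size of its type class as a multinomial coefficient, and checks the standard bounds (n+1)^(-|X|) 2^(nH(P)) <= |T(P)| <= 2^(nH(P)). The function names and the example sequence are hypothetical; the bounds are the textbook ones, not a specific result from this paper.

```python
import math
from collections import Counter
from fractions import Fraction


def type_of(seq):
    """Type P of a sequence: the relative frequency of each symbol, kept exact."""
    n = len(seq)
    return {a: Fraction(c, n) for a, c in Counter(seq).items()}


def type_class_size(seq):
    """|T(P)|: number of length-n sequences sharing this type,
    the multinomial coefficient n! / prod_a (n P(a))!."""
    size = math.factorial(len(seq))
    for c in Counter(seq).values():
        size //= math.factorial(c)  # each partial quotient is an integer
    return size


def entropy_bits(p):
    """Shannon entropy H(P) in bits."""
    return -sum(float(q) * math.log2(float(q)) for q in p.values() if q > 0)


def type_class_bounds_hold(seq, alphabet_size):
    """Check (n+1)^(-|X|) 2^(nH(P)) <= |T(P)| <= 2^(nH(P))."""
    n = len(seq)
    upper = 2 ** (n * entropy_bits(type_of(seq)))
    lower = upper / (n + 1) ** alphabet_size
    return lower <= type_class_size(seq) <= upper


if __name__ == "__main__":
    s = "aabbbc"  # hypothetical sequence over the alphabet {a, b, c}
    print(type_of(s), type_class_size(s), type_class_bounds_hold(s, 3))
```

The polynomial factor (n+1)^(-|X|) is what makes type-class sizes "exponentially equal" to 2^(nH(P)), which is why the method suits discrete memoryless models so well.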
Mutual Information in Learning Feature Transformations
In Proceedings of the 17th International Conference on Machine Learning, 2000
We present feature transformations useful for exploratory data analysis or for pattern recognition. Transformations are learned from example data sets by maximizing the mutual information between transformed data and their class labels. We make use of Rényi's quadratic entropy, and we extend the work of Principe et al. to mutual information between continuous multidimensional variables and discrete-valued class labels.
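A minimal sketch of one quadratic mutual information between continuous vectors and discrete labels, in the spirit of Principe et al.'s quadratic-entropy estimators. It assumes the Euclidean-distance form, the sum over classes c of the integral of (p(c, y) - P(c) p(y))^2, with Gaussian Parzen windows of width `sigma`; the exact estimator used in the paper may differ, and all names here are hypothetical. Because the integral of a product of two Gaussian kernels is another Gaussian kernel with doubled variance, the criterion reduces to pairwise kernel sums.

```python
import math


def _gauss(d2, var, dim):
    """Isotropic Gaussian kernel with per-axis variance `var`, at squared distance d2."""
    return math.exp(-d2 / (2 * var)) / (2 * math.pi * var) ** (dim / 2)


def _sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def quadratic_mi(ys, labels, sigma=1.0):
    """Euclidean-distance quadratic MI between vectors `ys` and discrete `labels`,
    using Parzen estimates of width sigma. Integrating two such kernels gives a
    kernel of variance 2 sigma^2, so I = V_in + V_all - 2 V_btw (pairwise sums)."""
    n, dim = len(ys), len(ys[0])
    var = 2 * sigma ** 2
    k = [[_gauss(_sqdist(a, b), var, dim) for b in ys] for a in ys]  # O(N^2)
    classes = {}
    for i, c in enumerate(labels):
        classes.setdefault(c, []).append(i)
    total = sum(map(sum, k))
    # within-class interactions: sum_c integral p(c, y)^2
    v_in = sum(k[i][j] for idx in classes.values() for i in idx for j in idx) / n ** 2
    # sum_c P(c)^2 integral p(y)^2
    v_all = sum((len(idx) / n) ** 2 for idx in classes.values()) * total / n ** 2
    # sum_c P(c) integral p(c, y) p(y)
    v_btw = sum((len(idx) / n) * sum(sum(k[i]) for i in idx)
                for idx in classes.values()) / n ** 2
    return v_in + v_all - 2 * v_btw


if __name__ == "__main__":
    ys = [[0.0], [0.1], [5.0], [5.1]]  # hypothetical 1-D transformed data
    print(quadratic_mi(ys, [0, 0, 1, 1]))  # labels follow the clusters
    print(quadratic_mi(ys, [0, 1, 0, 1]))  # labels cut across the clusters
```

A transformation learned by maximizing this quantity would push same-class samples together and different classes apart; the gradient with respect to each y can be taken through the pairwise kernels.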
Effects of disfluencies, predictability, and utterance position on word form variation in English conversation
2003
Function words, especially frequently occurring ones such as (the, that, and, and of), vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation or a more reduced or lenited pronunciation. It is based on over 8,000 occurrences of the ten most frequent English function words in a four-hour sample of conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and in whether final obstruents were present. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that make high-frequency monosyllabic function words more likely to be longer or to have a fuller form (1) when neighboring disfluencies (such as the filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; and (3) when the word is either utterance-initial or utterance-final. Looking at the phenomenon the other way around, frequent function words are more likely to be shorter and to have less full forms in fluent speech, in predictable positions or multi-word collocations, and utterance-internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example), and some of the dif...
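One common way to quantify "less predictable in context" is the conditional probability of a word given its neighbor. The sketch below assumes a simple maximum-likelihood bigram estimate of P(w_i | w_{i-1}); it is not necessarily the exact set of predictability measures used in the study, and the function name and toy input are hypothetical.

```python
from collections import Counter


def conditional_probabilities(tokens):
    """Estimate P(w | prev) by relative frequency:
    count(prev, w) / count(prev occurring with a following word)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    left = Counter(tokens[:-1])  # every token that has a right neighbor
    return {(prev, w): c / left[prev] for (prev, w), c in bigrams.items()}


if __name__ == "__main__":
    # hypothetical toy transcript, for illustration only
    toks = "i think that the the dog and the cat".split()
    probs = conditional_probabilities(toks)
    print(probs[("the", "dog")])  # lower values mean the word was less expected
```

In a regression such as the ones described above, a (log) conditional probability like this would enter as one predictor alongside rate of speech, segmental context, and position in the utterance.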
A Multistage Representation of the Wiener Filter Based on Orthogonal Projections
IEEE Transactions on Information Theory, 1998
The Wiener filter is analyzed for stationary complex Gaussian signals from an information-theoretic point of view. A dual-port analysis of the Wiener filter leads to a decomposition based on orthogonal projections and results in a new multistage method for implementing the Wiener filter using a nested chain of scalar Wiener filters. This new representation of the Wiener filter provides the capability to perform an information-theoretic analysis of previous, basis-dependent, reduced-rank Wiener filters. This analysis demonstrates that the recently introduced cross-spectral metric is optimal in the sense that it maximizes mutual information between the observed and desired processes. A new reduced-rank Wiener filter is developed based on this new structure which evolves a basis using successive projections of the desired signal onto orthogonal, lower-dimensional subspaces. The performance is evaluated using a comparative computer analysis model and it is demonstrated that the low-complexity multistage reduced-rank Wiener filter is capable of outperforming the more complex eigendecomposition-based methods.
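A sketch of a reduced-rank Wiener filter whose basis is grown by successive orthogonal projections, starting from the normalized cross-correlation vector. It does not reproduce the paper's forward and backward recursions of scalar Wiener filters. Instead it uses the closed form w = K (K^T R K)^(-1) K^T p over the subspace span{p, Rp, ..., R^(D-1) p}, which, to our understanding, is the subspace a D-stage multistage filter spans. For simplicity it assumes real-valued statistics (the paper treats complex signals), and all function names are hypothetical. With D equal to the full dimension it recovers the ordinary Wiener solution R^(-1) p.

```python
import math


def matvec(a, x):
    """Dense matrix-vector product for lists of lists."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def reduced_rank_wiener(r, p, rank):
    """Wiener filter restricted to span{p, Rp, ..., R^(rank-1) p}.
    r: covariance of the observation x (real, symmetric positive definite);
    p: cross-correlation E[x d] with the desired signal d."""
    basis = []
    v = list(p)
    for _ in range(rank):
        for h in basis:  # project out the directions already found
            c = dot(v, h)
            v = [vi - c * hi for vi, hi in zip(v, h)]
        norm = math.sqrt(dot(v, v))
        if norm < 1e-12:  # the subspace stopped growing
            break
        h = [vi / norm for vi in v]
        basis.append(h)
        v = matvec(r, h)
    # reduced-dimension statistics K^T R K and K^T p, then solve the small system
    r_basis = [matvec(r, h) for h in basis]
    r_red = [[dot(hi, rhj) for rhj in r_basis] for hi in basis]
    p_red = [dot(h, p) for h in basis]
    alpha = solve(r_red, p_red)
    return [sum(a * h[i] for a, h in zip(alpha, basis)) for i in range(len(p))]


if __name__ == "__main__":
    R = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]]  # hypothetical statistics
    p = [1.0, 0.5, -0.3]
    for d in (1, 2, 3):
        w = reduced_rank_wiener(R, p, d)
        print(d, dot(p, w))  # MSE = var(d) - p^T w, so larger is better
```

Since the subspaces are nested, adding a stage can only reduce the mean-square error; a low rank often captures most of the achievable reduction at much lower cost than an eigendecomposition.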
A Methodology for Information Theoretic Feature Extraction
In World Congress on Computational Intelligence, 1998
We discuss an unsupervised feature extraction method driven by an information-theoretic criterion: mutual information. While information-theoretic signal processing has been examined by many authors, the method presented here is more closely related to the approaches of Linsker (1988, 1990), Bell and Sejnowski (1995), and Viola et al. (1996). The method we discuss differs from previous work in several respects. It extends to a feedforward multilayer perceptron with an arbitrary number of layers. No assumptions are made about the underlying PDF of the input space. It exploits a property of entropy coupled with a saturating nonlinearity, resulting in a method for entropy manipulation whose computational complexity is proportional to the square of the number of data samples. This represents a significant computational saving over previous methods (Viola et al., 1996). As mutual information is a function of two entropy terms, the method for entropy manipulation can be directly appl...
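The abstract does not spell out its entropy criterion, so the following is only a sketch under a stated assumption: for outputs confined to [-1, 1] by a saturating nonlinearity such as tanh, entropy is maximized by the uniform density, so one can minimize the integrated squared difference between a Gaussian Parzen estimate of the outputs and that uniform density. The pairwise kernel term is what makes the cost O(N^2) in the number of samples. The function name and the kernel width `sigma` are hypothetical.

```python
import math


def _gauss(x, var):
    """1-D Gaussian density with variance `var`, evaluated at x."""
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)


def _norm_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def distance_to_uniform(ys, sigma=0.1):
    """Integral over the real line of (fhat(y) - u(y))^2, where fhat is a Gaussian
    Parzen estimate of the outputs ys and u = 1/2 on [-1, 1] (the range of tanh).
    Smaller values mean the outputs are closer to uniform, i.e. higher entropy."""
    n = len(ys)
    # integral of fhat^2: pairwise kernels with variance 2 sigma^2, O(N^2)
    pair = sum(_gauss(a - b, 2 * sigma ** 2) for a in ys for b in ys) / n ** 2
    # integral of fhat * u: kernel mass falling inside [-1, 1], times 1/2
    cross = 0.5 * sum(_norm_cdf((1 - y) / sigma) - _norm_cdf((-1 - y) / sigma)
                      for y in ys) / n
    # integral of u^2 over [-1, 1] is 2 * (1/2)^2 = 1/2
    return pair - 2 * cross + 0.5


if __name__ == "__main__":
    spread = [-0.9 + 0.2 * i for i in range(10)]  # hypothetical outputs
    clustered = [0.01 * i for i in range(10)]
    print(distance_to_uniform(spread), distance_to_uniform(clustered))
```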
A Theory of Term Weighting Based on Exploratory Data Analysis
Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 1998
Techniques of exploratory data analysis are used to study the weight of evidence that the occurrence of a query term provides in support of the hypothesis that a document is relevant to an information need. In particular, the relationship between the document frequency and the weight of evidence is investigated. A correlation between document frequency normalized by collection size and the mutual information between relevance and term occurrence is uncovered. This correlation is found to be robust across a variety of query sets and document collections. Based on this relationship, a theoretical explanation of the efficacy of inverse document frequency for term weighting is developed which differs in both style and content from theories previously put forth. The theory predicts that a "flattening" of idf at both low and high frequency should result in improved retrieval performance. This altered idf formulation is tested on all TREC query sets. Retrieval results corroborate the predicti...
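The two quantities the abstract relates can both be computed from simple counts. The sketch below computes the mutual information (in bits) between term occurrence and relevance from a 2x2 contingency table, alongside the classic inverse document frequency log(N/df). The table layout, the log bases, and the function names are assumptions of this sketch; the paper's own normalizations and its "flattened" idf are not reproduced here.

```python
import math


def mutual_information(n11, n10, n01, n00):
    """MI in bits between term occurrence T and relevance R, from counts:
    n11 relevant docs containing the term, n10 non-relevant docs containing it,
    n01 relevant docs without it, n00 non-relevant docs without it."""
    n = n11 + n10 + n01 + n00
    cells = {(1, 1): n11, (1, 0): n10, (0, 1): n01, (0, 0): n00}
    p_t = {1: (n11 + n10) / n, 0: (n01 + n00) / n}  # P(term present / absent)
    p_r = {1: (n11 + n01) / n, 0: (n10 + n00) / n}  # P(relevant / non-relevant)
    mi = 0.0
    for (t, r), c in cells.items():
        if c:  # empty cells contribute 0 (0 log 0 = 0)
            p = c / n
            mi += p * math.log2(p / (p_t[t] * p_r[r]))
    return mi


def idf(df, n_docs):
    """Classic inverse document frequency, log(N / df) (natural log here)."""
    return math.log(n_docs / df)


if __name__ == "__main__":
    # hypothetical counts for one term in a 1000-document collection
    print(mutual_information(8, 12, 2, 978), idf(20, 1000))
```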
Word informativeness and automatic pitch accent modeling
In EMNLP/VLC, 1999
To appear in Proc. of EMNLP/VLC, 1999. In intonational phonology and speech synthesis research, it has been suggested that the relative informativeness of a word can be used to predict pitch prominence: the more information a word conveys, the more likely it is to be accented. Others, however, have expressed doubts about such a correlation. In this paper, we provide empirical evidence supporting such a correlation by employing two widely accepted measures of informativeness. Our experiments show that there is a positive correlation between the informativeness of a word and its pitch accent assignment. They also show that informativeness enables statistically significant improvements in pitch accent prediction. Word informativeness is inexpensive to compute and can easily be incorporated into speech synthesis systems.
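The abstract does not name its two measures at this point. As an illustration only, here is information content, IC(w) = -log2 P(w) with P(w) estimated by relative frequency, which is one widely used measure of informativeness; treating it as one of the paper's measures is an assumption, and the function name and toy corpus are hypothetical.

```python
import math
from collections import Counter


def information_content(tokens):
    """IC(w) = -log2 P(w), with P(w) the relative frequency of w in the corpus.
    Rarer words receive higher IC, i.e. are treated as more informative."""
    counts = Counter(tokens)
    n = len(tokens)
    return {w: -math.log2(c / n) for w, c in counts.items()}


if __name__ == "__main__":
    # hypothetical toy corpus; a real system would use a large background corpus
    corpus = "the cat saw the dog and the bird".split()
    ic = information_content(corpus)
    print(sorted(ic.items(), key=lambda kv: -kv[1]))
```

In a pitch accent predictor, a score like this would be one feature per word, and it costs only a frequency-table lookup at synthesis time.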