### Word Sense Disambiguation Using Co-Occurrence Statistics on Random Labels

"... In this paper we present experiments using Random Indexing for “query expansion ” in Word Sense Disambiguation. Random Indexing is an efficient, scalable and incremental latent semantic indexing method somewhat akin to LSA, and has in these experiments shown promising results on a small test set for ..."

Abstract
- Add to MetaCart

In this paper we present experiments using Random Indexing for “query expansion” in Word Sense Disambiguation. Random Indexing is an efficient, scalable, and incremental latent semantic indexing method somewhat akin to LSA; in these experiments it has shown promising results on a small test set for Swedish, reaching an accuracy of up to 80% with relatively little training data. We also compare it to results obtained when applying a Naïve Bayes classifier to the same training and test sets, which reaches a maximum accuracy of 56%.
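Random Indexing accumulates word meanings as sums of sparse random ternary index vectors of co-occurring words. A minimal sketch under our own assumptions (the dimensionality, window size, and function names are illustrative choices, not the paper's):

```python
import numpy as np

def random_index_vector(dim=512, nonzero=10, rng=None):
    """Sparse ternary index vector: a few random +1/-1 entries, rest zero."""
    rng = rng or np.random.default_rng()
    v = np.zeros(dim)
    idx = rng.choice(dim, size=nonzero, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=nonzero)
    return v

def build_context_vectors(corpus, window=2, dim=512, seed=0):
    """Accumulate each word's context vector as the sum of the fixed
    random index vectors of the words co-occurring within the window."""
    rng = np.random.default_rng(seed)
    index, context = {}, {}
    def iv(w):
        if w not in index:
            index[w] = random_index_vector(dim, rng=rng)
        return index[w]
    for sent in corpus:
        for i, w in enumerate(sent):
            ctx = context.setdefault(w, np.zeros(dim))
            for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                if j != i:
                    ctx += iv(sent[j])
    return context

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

Words that occur in similar contexts (here "cat" and "dog") end up with high cosine similarity, which is the signal such a system would exploit for disambiguation.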

### Incremental Learning and Forgetting in Stochastic Turn-Taking Models (INTERSPEECH 2011)

"... We present a computational framework for stochastically modeling dyad interaction chronograms. The framework’s most novel feature is the capacity for incremental learning and forgetting. To showcase its flexibility, we design experiments answering four concrete questions about the systematics of spo ..."

Abstract
- Add to MetaCart

We present a computational framework for stochastically modeling dyad interaction chronograms. The framework’s most novel feature is the capacity for incremental learning and forgetting. To showcase its flexibility, we design experiments answering four concrete questions about the systematics of spoken interaction. The results show that: (1) individuals are clearly affected by one another; (2) there is individual variation in interaction strategy; (3) strategies wander in time rather than converge; and (4) individuals exhibit similarity with their interlocutors. We expect the proposed framework to be capable of answering many such questions with little additional effort. Index Terms: interaction, chronogram modeling, turn-taking, incremental learning.
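Incremental learning with forgetting can be approximated by exponentially decaying transition counts, so old evidence is gradually discounted as new observations arrive. A toy sketch under our own assumptions (the paper's framework is richer than this binary bigram; the decay scheme here is illustrative):

```python
class DecayingBigramModel:
    """Incremental bigram model of a binary speech/silence sequence with
    exponential forgetting of transition counts."""
    def __init__(self, decay=0.95, alpha=1.0):
        self.decay = decay   # per-step forgetting factor
        self.alpha = alpha   # additive smoothing
        self.counts = {}     # (prev, cur) -> decayed count
        self.prev = None

    def update(self, state):
        # Forget a little of every old transition, then add the new one.
        for k in self.counts:
            self.counts[k] *= self.decay
        if self.prev is not None:
            key = (self.prev, state)
            self.counts[key] = self.counts.get(key, 0.0) + 1.0
        self.prev = state

    def prob(self, prev, cur, states=(0, 1)):
        num = self.counts.get((prev, cur), 0.0) + self.alpha
        den = (sum(self.counts.get((prev, s), 0.0) for s in states)
               + self.alpha * len(states))
        return num / den
```

Because counts decay geometrically, a sudden change in an interlocutor's behaviour overtakes the old statistics after a bounded number of updates.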

### Summer Term

"... The decision tree is a well-known methodology for classification and regression. In this dissertation, we focus on the minimization of the misclassification rate for decision tree classifiers. We derive the necessary equations that provide the optimal tree prediction, the estimated risk of the tree’ ..."

Abstract
- Add to MetaCart

The decision tree is a well-known methodology for classification and regression. In this dissertation, we focus on the minimization of the misclassification rate for decision tree classifiers. We derive the necessary equations that provide the optimal tree prediction, the estimated risk of the tree’s prediction, and the reliability of the tree’s risk estimation. We carry out an extensive analysis of the application of Lidstone’s law of succession for the estimation of the class probabilities. In contrast to existing research, we not only compute the expected values of the risks but also calculate the corresponding reliability of the risk (measured by standard deviations). We also provide an explicit expression of the k-norm estimation for the tree’s misclassification rate that combines both the expected value and the reliability. Furthermore, our proposed and proven theorem on k-norm estimation suggests an efficient pruning algorithm that has a clear theoretical interpretation, is easily implemented, and does not require a validation set. Our experiments show that our proposed pruning algorithm quickly produces accurate trees that compare very favorably with two other well-known pruning algorithms, CCP of CART and EBP of C4.5. Finally, our work provides a …
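Lidstone's law of succession, used above for estimating class probabilities at a tree leaf, adds a pseudo-count λ to every class before normalizing; λ = 1 recovers Laplace's law, and λ → 0 the raw frequencies. A minimal sketch:

```python
def lidstone(counts, lam=0.5):
    """Lidstone's law of succession: P(i) = (n_i + lam) / (N + k * lam).
    lam = 1 gives Laplace's law; lam -> 0 recovers the raw frequencies."""
    total, k = sum(counts), len(counts)
    return [(c + lam) / (total + k * lam) for c in counts]
```

For example, a leaf holding 3 examples of one class and 1 of another yields P = [4/6, 2/6] under Laplace (λ = 1) instead of the raw [0.75, 0.25].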

### Athena: Mining-based Interactive Management of Text Databases

"... We describeAthena: a system for creating, exploiting, and maintaining a hierarchical arrangement of textual documents through interactive mining-based operations. Requirements of any suchsystem include speed and minimal end-user e ort. Athena satis es these requirements through linear-time classi ca ..."

Abstract
- Add to MetaCart

We describe Athena: a system for creating, exploiting, and maintaining a hierarchical arrangement of textual documents through interactive mining-based operations. Requirements of any such system include speed and minimal end-user effort. Athena satisfies these requirements through linear-time classification and clustering engines which are applied interactively to speed the development of accurate models. Naive Bayes classifiers are recognized to be among the best for classifying text. We show that our specialization of the Naive Bayes classifier is considerably more accurate (7 to 29% absolute increase in accuracy) than a standard implementation. Our enhancements include using Lidstone's law of succession instead of Laplace's law, under-weighting long documents, and over-weighting author and subject. We also present a new interactive clustering algorithm, C-Evolve, for topic discovery. C-Evolve first finds highly accurate cluster digests (partial clusters), gets user feedback to merge and correct these digests, and then uses the classification algorithm to complete the partitioning of the data. By allowing this interactivity in the clustering process, C-Evolve achieves considerably higher clustering accuracy (10 to 20% absolute increase in our experiments) than the popular K-Means and agglomerative clustering methods.
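Two of the enhancements named above, Lidstone smoothing instead of Laplace's law and under-weighting long documents, can be sketched in a minimal multinomial Naive Bayes classifier. The exact weighting scheme here is our illustrative assumption, not Athena's implementation, and the author/subject over-weighting is omitted:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, lam=0.1):
    """Multinomial Naive Bayes with Lidstone smoothing (lam < 1 rather than
    Laplace's lam = 1) and per-document length normalization, so long
    documents do not dominate the class statistics."""
    class_counts, word_counts, vocab = Counter(), defaultdict(Counter), set()
    for words, label in docs:
        class_counts[label] += 1
        for w in words:
            word_counts[label][w] += 1.0 / len(words)  # under-weight long docs
            vocab.add(w)
    return class_counts, word_counts, vocab, lam

def classify(model, words):
    class_counts, word_counts, vocab, lam = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for c in class_counts:
        denom = sum(word_counts[c].values()) + lam * len(vocab)
        lp = math.log(class_counts[c] / total)
        lp += sum(math.log((word_counts[c][w] + lam) / denom) for w in words)
        if lp > best_lp:
            best_lp, best = lp, c
    return best
```

Dividing each word count by the document length makes every training document contribute one unit of evidence to its class, regardless of verbosity.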

### Maximum Likelihood Set for Estimating a Probability Mass Function (letter communicated by Liam Paninski)

"... We propose a new method for estimating the probability mass function (pmf) of a discrete and finite random variable from a small sample. We focus on the observed counts—the number of times each value appears in the sample—and define the maximum likelihood set (MLS) as the set of pmfs that put more m ..."

Abstract
- Add to MetaCart

We propose a new method for estimating the probability mass function (pmf) of a discrete and finite random variable from a small sample. We focus on the observed counts—the number of times each value appears in the sample—and define the maximum likelihood set (MLS) as the set of pmfs that put more mass on the observed counts than on any other set of counts possible for the same sample size. We characterize the MLS in detail in this article. We show that the MLS is a diamond-shaped subset of the probability simplex [0, 1]^k bounded by at most k × (k − 1) hyperplanes, where k is the number of possible values of the random variable. The MLS always contains the empirical distribution, as well as a family of Bayesian estimators based on a Dirichlet prior, particularly the well-known Laplace estimator. We propose to select from the MLS the pmf that is closest to a fixed pmf that encodes prior knowledge. When using Kullback-Leibler distance for this selection, the optimization problem comprises finding the minimum of a convex function over a domain defined by linear inequalities, for which standard numerical procedures are available. We apply this estimate to language modeling using Zipf’s law to encode prior knowledge and show that this method permits obtaining state-of-the-art results while being conceptually simpler than most competing methods.
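The hyperplane characterization can be tested directly: moving one observation from value i to value j multiplies the multinomial probability of the counts by (n_i / (n_j + 1)) · (p_j / p_i), so, on our reading of the paper's result, p lies in the MLS iff n_i · p_j ≤ (n_j + 1) · p_i for all i ≠ j. A sketch:

```python
import numpy as np

def empirical(counts):
    n = np.asarray(counts, dtype=float)
    return n / n.sum()

def dirichlet_estimate(counts, alpha=1.0):
    """Posterior mean under a symmetric Dirichlet(alpha) prior;
    alpha = 1 is the Laplace estimator."""
    n = np.asarray(counts, dtype=float)
    return (n + alpha) / (n.sum() + alpha * len(n))

def in_mls(p, counts):
    """p is in the MLS iff shifting one observation from value i to value j
    never increases the multinomial probability of the observed counts,
    i.e. n_i * p_j <= (n_j + 1) * p_i for all i != j."""
    n = np.asarray(counts, dtype=float)
    for i in range(len(n)):
        for j in range(len(n)):
            if i != j and n[i] * p[j] > (n[j] + 1) * p[i] + 1e-12:
                return False
    return True
```

For counts [3, 1], both the empirical distribution and the Laplace estimator pass the test, while the uniform pmf does not: the MLS excludes it.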

### COMPARING RNNS AND LOG-LINEAR INTERPOLATION OF IMPROVED SKIP-MODEL ON FOUR BABEL LANGUAGES: CANTONESE, PASHTO, TAGALOG, TURKISH

"... Recurrent neural networks (RNNs) are a very recent technique to model long range dependencies in natural languages. They have clearly outperformed trigrams and other more advanced language modeling techniques by using non-linearly modeling long range dependencies. An alternative is to use log-linear ..."

Abstract
- Add to MetaCart

Recurrent neural networks (RNNs) are a very recent technique for modeling long-range dependencies in natural languages. They have clearly outperformed trigrams and other more advanced language modeling techniques through non-linear modeling of long-range dependencies. An alternative is to use log-linear interpolation of skip models (i.e., skip bigrams and skip trigrams). The method as such has been published earlier. In this paper we investigate the impact of different smoothing techniques on the skip models as a measure of their overall performance. One option is to use automatically trained distance clusters (both hard and soft) to increase robustness and to combat sparseness in the skip model. We also investigate alternative smoothing techniques on the word level. For skip bigrams where a small number of words is skipped, Kneser-Ney smoothing (KN) is advantageous. For a larger number of skipped words, Dirichlet smoothing performs better. In order to exploit the advantages of both KN and Dirichlet smoothing, we propose a new unified smoothing technique. Experiments are performed on four Babel languages: Cantonese, Pashto, Tagalog and Turkish. RNNs and log-linearly interpolated skip models are on par if the skip models are trained with standard smoothing techniques. Using the improved smoothing of the skip models along with distance clusters, we can clearly outperform RNNs by about 8-11% in perplexity across all four languages. Index Terms: RNNs, log-linear interpolation, skip models, smoothing, under-researched languages
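Dirichlet smoothing, which the abstract reports works better for larger skip distances, interpolates a history's counts with a background unigram via a pseudo-count mass μ. A sketch, with names of our own choosing:

```python
def dirichlet_smoothed(counts, unigram, mu=100.0):
    """Dirichlet-smoothed conditional distribution for one history h:
    P(w | h) = (c(h, w) + mu * P(w)) / (c(h) + mu),
    where counts maps w -> c(h, w) for a fixed (skip) history h and
    unigram maps w -> background P(w)."""
    total = sum(counts.values())
    return {w: (counts.get(w, 0.0) + mu * pw) / (total + mu)
            for w, pw in unigram.items()}
```

Large μ pulls the estimate toward the background unigram, which is the desired behaviour when the skip context is very sparse.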

### A Model-Based Approach to Constructing Music Similarity Functions (doi:10.1155/2007/24602)

"... Several authors have presented systems that estimate the audio similarity of two pieces of music through the calculation of a distance metric, such as the Euclidean distance, between spectral features calculated from the audio, related to the timbre or pitch of the signal. These features can be augm ..."

Abstract
- Add to MetaCart

Several authors have presented systems that estimate the audio similarity of two pieces of music through the calculation of a distance metric, such as the Euclidean distance, between spectral features calculated from the audio, related to the timbre or pitch of the signal. These features can be augmented with other, temporally or rhythmically based features such as zero-crossing rates, beat histograms, or fluctuation patterns to form a more well-rounded music similarity function. It is our contention that perceptual or cultural labels, such as the genre, style, or emotion of the music, are also very important features in the perception of music. These labels help to define complex regions of similarity within the available feature spaces. We demonstrate a machine-learning-based approach to the construction of a similarity metric, which uses this contextual information to project the calculated features into an intermediate space where a music similarity function that incorporates some of the cultural information may be calculated.

### A Comparison of Smoothing Techniques for Bilingual Lexicon Extraction from Comparable Corpora

"... Smoothing is a central issue in language modeling and a prior step in different natural language processing (NLP) tasks. However, less attention has been given to it for bilingual lexicon extraction from comparable corpora. If a first work to improve the extraction of low frequency words showed sign ..."

Abstract
- Add to MetaCart

Smoothing is a central issue in language modeling and a prior step in various natural language processing (NLP) tasks. However, it has received less attention in bilingual lexicon extraction from comparable corpora. While a first attempt to improve the extraction of low-frequency words showed significant improvement using distance-based averaging (Pekar et al., 2006), no investigation of the many smoothing techniques has been carried out so far. In this paper, we present a study of some widely used smoothing algorithms for n-gram language modeling (Laplace, Good-Turing, Kneser-Ney, etc.). Our main contribution is to investigate how the different smoothing techniques affect the performance of the standard approach (Fung, 1998) traditionally used for bilingual lexicon extraction. We show that using smoothing as a preprocessing step of the standard approach significantly increases its performance.

### Harmonising Chorales in the Style of

"... This dissertation describes a chorale harmonisation system which uses Hidden Markov Models. We use a standard data set of chorale harmonisations composed by Johann Sebastian Bach. This data set provides a large number of stylistically similar harmonisations, and is freely available in a machine-read ..."

Abstract
- Add to MetaCart

This dissertation describes a chorale harmonisation system which uses Hidden Markov Models. We use a standard data set of chorale harmonisations composed by Johann Sebastian Bach. This data set provides a large number of stylistically similar harmonisations, and is freely available in a machine-readable format. We divide the data into training and test sets, and compare the predictive power of various models, as measured by cross-entropy, the negative log likelihood per symbol. Using Hidden Markov Models we create a harmonisation system which learns its harmonic rules by example, without a pre-programmed knowledge base. We assume that we only need to take into account short-term dependencies in the local context. However, we generate globally probable harmonisations, rather than choosing the locally most likely outcome at each decision. The results produced by the system show that pre-programmed harmonic rules are not necessary for automatic harmonisation. Statistical observation of training examples provides the harmonic knowledge needed to generate reasonable chorale harmonisations.
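Generating globally probable harmonisations rather than the locally most likely chord at each step is exactly what the Viterbi algorithm provides for an HMM. A log-space sketch (the states, chords, and probabilities below are illustrative, not the dissertation's model):

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Globally most probable hidden state sequence for an HMM,
    computed in log space to avoid underflow."""
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at time t.
            prev = max(states, key=lambda p: V[t - 1][p] + math.log(trans_p[p][s]))
            V[t][s] = (V[t - 1][prev] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs[t]]))
            back[t][s] = prev
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

Because the backtrace commits to a chord only after the whole melody has been scored, an individually plausible chord can be rejected when it leads to an improbable continuation.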