Random number generation
Cited by 123 (30 self)
Random numbers are the nuts and bolts of simulation. Typically, all the randomness required by the model is simulated by a random number generator whose output is assumed to be a sequence of independent and identically distributed (IID) U(0, 1) random variables (i.e., continuous random variables distributed uniformly over the interval
Development and Use of a Gold-Standard Data Set for Subjectivity Classifications
, 1999
Cited by 48 (7 self)
and improving intercoder reliability in discourse tagging using statistical techniques. Bias-corrected tags are formulated and successfully used to guide a revision of the coding manual and develop an automatic classifier.
Comparing Corpora using Frequency Profiling
- In Proceedings of the Workshop on Comparing Corpora, held in conjunction with ACL 2000, October 2000, Hong Kong
, 2000
Cited by 46 (2 self)
This paper describes a method of comparing corpora which uses frequency profiling. The method can be used to discover key words in the corpora which differentiate one corpus from another. Using annotated corpora, it can be applied to discover key grammatical or word-sense categories. This can be used as a quick way to find the differences between the corpora and is shown to have applications in the study of social differentiation in the use of English vocabulary, profiling of learner English and document analysis in the software engineering process.
Fishing for Exactness
- In Proceedings of the South-Central SAS Users Group Conference
, 1996
Cited by 34 (4 self)
Statistical methods for automatically identifying dependent word pairs (i.e. dependent bigrams) in a corpus of natural language text have traditionally been performed using asymptotic tests of significance. This paper suggests that Fisher's exact test is a more appropriate test due to the skewed and sparse data samples typical of this problem. Both theoretical and experimental comparisons between Fisher's exact test and a variety of asymptotic tests (the t-test, Pearson's chi-square test, and the likelihood-ratio chi-square test) are presented. These comparisons show that Fisher's exact test is more reliable in identifying dependent word pairs. The usefulness of Fisher's exact test extends to other problems in statistical natural language processing, as skewed and sparse data appear to be the rule in natural language. The experiment presented in this paper was performed using PROC FREQ of the SAS System.
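As a rough illustration of the test named in this abstract, the following sketch computes a one-sided Fisher's exact p-value for a 2x2 bigram contingency table. It is not the paper's SAS PROC FREQ procedure; the function name `fisher_exact_right_tail`, the cell layout, and the example counts are assumptions made here for illustration only.

```python
from math import comb


def fisher_exact_right_tail(n11, n12, n21, n22):
    """One-sided Fisher's exact p-value for a 2x2 table.

    Assumed (hypothetical) cell layout for a bigram "w1 w2":
        n11 = count(w1 followed by w2)
        n12 = count(w1 followed by another word)
        n21 = count(another word followed by w2)
        n22 = count(neither)
    Returns P(X >= n11) with all margins held fixed, i.e. the
    hypergeometric probability of at least n11 co-occurrences
    if the two words were independent.
    """
    row1 = n11 + n12
    col1 = n11 + n21
    total = n11 + n12 + n21 + n22
    denom = comb(total, col1)
    upper = min(row1, col1)
    p_value = 0.0
    for k in range(n11, upper + 1):
        # Hypergeometric probability of exactly k co-occurrences.
        p_value += comb(row1, k) * comb(total - row1, col1 - k) / denom
    return p_value


if __name__ == "__main__":
    # Made-up counts for a sparse, skewed table of the kind the abstract describes.
    print(fisher_exact_right_tail(8, 12, 15, 9965))
```

Because the probabilities are summed exactly from binomial coefficients, the result does not rely on the large-sample approximations behind the t-test or chi-square tests, which is the property the abstract argues matters for sparse counts.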
Csiszár’s divergences for non-negative matrix factorization: Family of new algorithms
- LNCS
, 2006
Cited by 32 (15 self)
In this paper we discuss a wide class of loss (cost) functions for non-negative matrix factorization (NMF) and derive several novel algorithms with improved efficiency and robustness to noise and outliers. We review several approaches which allow us to obtain generalized forms of multiplicative NMF algorithms and unify some existing algorithms. We also give flexible and relaxed forms of the NMF algorithms to increase convergence speed and impose some desired constraints such as sparsity and smoothness of components. Moreover, the effects of various regularization terms and constraints are clearly shown. The scope of these results is vast, since the proposed generalized divergence functions include quite a large number of useful loss functions such as the squared Euclidean distance, Kullback-Leibler divergence, Itakura-Saito, Hellinger, Pearson’s chi-square, and Neyman’s chi-square distances, etc. We have successfully applied the developed algorithms to blind (or semi-blind) source separation (BSS), where sources can generally be statistically dependent; however, they satisfy some other conditions or additional constraints such as nonnegativity, sparsity and/or smoothness.
Kernel measures of conditional dependence
- In Adv. NIPS
, 2008
Cited by 31 (24 self)
We propose a new measure of conditional dependence of random variables, based on normalized cross-covariance operators on reproducing kernel Hilbert spaces. Unlike previous kernel dependence measures, the proposed criterion does not depend on the choice of kernel in the limit of infinite data, for a wide class of kernels. At the same time, it has a straightforward empirical estimate with good convergence behaviour. We discuss the theoretical properties of the measure, and demonstrate its application in experiments.
A kernel statistical test of independence
, 2008
Cited by 25 (20 self)
Although kernel measures of independence have been widely applied in machine learning (notably in kernel ICA), there is as yet no method to determine whether they have detected statistically significant dependence. We provide a novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC). The resulting test costs O(m^2), where m is the sample size. We demonstrate that this test outperforms established contingency table and functional correlation-based tests, and that this advantage is greater for multivariate data. Finally, we show the HSIC test also applies to text (and to structured data more generally), for which no other independence test presently exists.
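To make the O(m^2) cost concrete, here is a minimal sketch of the commonly used biased empirical HSIC statistic, trace(K H L H) / m^2, for scalar samples. It only computes the statistic and does not implement the paper's significance test or its null distribution. The Gaussian kernel, the bandwidth `sigma`, and all function names are assumptions chosen for illustration.

```python
import math


def gaussian_gram(xs, sigma=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples (assumed kernel choice)."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in xs] for a in xs]


def center(gram):
    """Double-center a symmetric Gram matrix, i.e. compute H G H with H = I - (1/m) 11^T."""
    m = len(gram)
    row_means = [sum(row) / m for row in gram]
    grand_mean = sum(row_means) / m
    # The matrix is symmetric, so column means equal row means.
    return [
        [gram[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(m)]
        for i in range(m)
    ]


def hsic_biased(xs, ys, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / m^2, computed in O(m^2) time."""
    m = len(xs)
    kc = center(gaussian_gram(xs, sigma))
    lc = center(gaussian_gram(ys, sigma))
    # trace(K H L H) equals the elementwise inner product of the centered matrices.
    return sum(kc[i][j] * lc[i][j] for i in range(m) for j in range(m)) / (m * m)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    print(hsic_biased(xs, xs))                     # dependent samples
    print(hsic_biased(xs, [5.0, 5.0, 5.0, 5.0]))   # constant second sample
```

Every step touches each pair of samples a constant number of times, which is where the quadratic cost in the sample size m comes from.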
Significant lexical relationships
- Proceedings of the 13th National Conference on Artificial Intelligence (AAAI-96)
, 1996
Cited by 24 (14 self)
Statistical NLP inevitably deals with a large number of rare events. As a consequence, NLP data often violates the assumptions implicit in traditional statistical procedures such as significance testing. We describe a significance test, an exact conditional test, that is appropriate for NLP data and can be performed using freely available software. We apply this test to the study of lexical relationships and demonstrate that the results obtained using this test are both theoretically more reliable and different from the results obtained using previously applied tests.
Decomposable Modeling in Natural Language Processing
, 1999
Cited by 23 (6 self)
In this paper, we describe a framework for developing probabilistic classifiers in natural language processing. Our focus is on formulating models that capture the most important interdependencies among features, to avoid overfitting the data while also characterizing the data well. The class of probability models and the associated inference techniques described here were developed in mathematical statistics, and are widely used in artificial intelligence and applied statistics. Our goal is to make this model selection framework accessible to researchers in NLP, and provide pointers to available software and important references. In addition, we describe how the quality of the three determinants of classifier performance (the features, the form of the model, and the parameter estimates) can be separately evaluated. We also demonstrate the classification performance of these models in a large-scale experiment involving the disambiguation of 34 words taken from the HECTOR word sense corpus (Hanks 1996). In 10-fold cross-validations, the model search procedure performs significantly better than naive Bayes on 6 of the words without being significantly worse on any of them.
Non-Negative Matrix Factorization with Quasi-Newton Optimization
- In Eighth International Conference on Artificial Intelligence and Soft Computing, ICAISC
, 2006
Cited by 19 (6 self)
Non-negative matrix factorization (NMF) is an emerging method with a wide spectrum of potential applications in data analysis, feature extraction and blind source separation. Currently, most applications use relatively simple multiplicative NMF learning algorithms which were proposed by Lee and Seung, and are based on minimization of the Kullback-Leibler divergence and Frobenius norm. Unfortunately, these algorithms are relatively slow and often need a few thousand iterations to achieve a local minimum. In order to increase the convergence rate and to improve the performance of NMF, we proposed to use a more general cost function: the so-called Amari alpha divergence. Taking into account a special structure of the Hessian of this cost function, we derived a relatively simple second-order quasi-Newton method for NMF. The validity and performance of the proposed algorithm have been extensively tested for blind source separation problems, both for signals and images. The performance of the developed NMF algorithm is illustrated for separation of statistically dependent signals and images from their linear mixtures.
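For context, the baseline that this abstract contrasts against can be sketched as follows: the Lee and Seung multiplicative updates for the Frobenius-norm cost. This is the simple first-order baseline, not the paper's quasi-Newton method or its alpha-divergence cost. The helper names, the random initialisation, and the small constant `eps` guarding against division by zero are assumptions made for this illustration.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf_multiplicative(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)] for i in range(n)]
    return w, h


if __name__ == "__main__":
    v = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # toy rank-1 non-negative matrix
    w, h = nmf_multiplicative(v, rank=1)
    print(frobenius_error(v, w, h))
```

Because each update only rescales entries by non-negative ratios, W and H stay non-negative without any explicit projection. On larger problems this scheme can need many iterations, which is the slowness the abstract sets out to address.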

