An Information-Theoretic External Cluster-Validity Measure
- Research Report RJ 10219, IBM
, 2001
Cited by 48 (2 self)
In this paper we propose a measure of similarity/association between two partitions of a set of objects. Our motivation is the desire to use the measure to characterize the quality or accuracy of clustering algorithms by somehow comparing the clusters they produce with "ground truth" consisting of classes assigned to the patterns by manual means or some other means in whose veracity there is confidence. Such measures are referred to as "external". Our measure also allows clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels of the patterns are as predictors of their class labels. When all clusterings to be compared have the same number of clusters, the measure is equivalent to the mutual information between the cluster labels and the class labels. In cases where the numbers of clusters are different, however, it computes the reduction in the number of bits that w...
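For the equal-number-of-clusters case, where the abstract says the measure is equivalent to the mutual information between cluster labels and class labels, a minimal sketch of that quantity follows. The function name `mutual_information_bits` and the plug-in (empirical frequency) estimate are assumptions made for illustration; the report's own correction for differing numbers of clusters is not reproduced here.

```python
from collections import Counter
from math import log2


def mutual_information_bits(cluster_labels, class_labels):
    """Empirical mutual information (in bits) between two labelings.

    Illustrative plug-in estimate from co-occurrence counts; the name
    and signature are hypothetical, not taken from the report.
    """
    if len(cluster_labels) != len(class_labels):
        raise ValueError("both labelings must cover the same objects")
    n = len(cluster_labels)
    joint = Counter(zip(cluster_labels, class_labels))
    cluster_counts = Counter(cluster_labels)
    class_counts = Counter(class_labels)
    mi = 0.0
    for (k, c), n_kc in joint.items():
        # p(k, c) * log2( p(k, c) / (p(k) * p(c)) ), written with counts
        mi += (n_kc / n) * log2(n_kc * n / (cluster_counts[k] * class_counts[c]))
    return mi
```

With two balanced classes, a clustering that reproduces the classes exactly yields 1 bit, and a clustering that is independent of the classes yields 0 bits.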
On the Index of Dissimilarity for Lack of Fit in Log Linear Models
Cited by 3 (0 self)
The index of dissimilarity, often denoted by Delta, is commonly used, especially in social science and with large datasets, to describe the lack of fit of models for categorical data. In this paper the definition and sampling properties of the index are investigated for general log-linear models. It is argued that in some applications a standardized version of the index is appropriate for interpretation. A simple, approximate variance formula is derived for the index, whether standardized or not. A simple bias reduction formula is also given. The accuracy of these formulae and of confidence intervals based upon them is investigated in a simulation study based on large-scale social mobility data. Key words: bias reduction; dissimilarity index; extended hypergeometric; folded normal; iterative proportional fitting; iterative scaling; stratified sampling.
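The abstract does not state the formula for Delta. The sketch below assumes the usual definition, half the sum of absolute differences between observed and fitted cell proportions, which can be read as the share of cases that would have to change cells for the data to match the model. The function name `dissimilarity_index` is hypothetical, and the standardized version, variance formula and bias reduction discussed in the paper are not shown.

```python
def dissimilarity_index(observed, fitted):
    """Index of dissimilarity between observed and fitted cell counts.

    Assumes the common definition Delta = 0.5 * sum_i |p_i - pi_i|,
    where p_i and pi_i are observed and fitted cell proportions.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same cells")
    n_obs = sum(observed)
    n_fit = sum(fitted)
    if n_obs <= 0 or n_fit <= 0:
        raise ValueError("cell counts must have a positive total")
    return 0.5 * sum(
        abs(o / n_obs - f / n_fit) for o, f in zip(observed, fitted)
    )
```

For observed counts (30, 20, 50) and fitted counts (25, 25, 50), this gives 0.05: five percent of cases would need to move to a different cell.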
Analysis of Local or Asymmetric Dependencies in Contingency Tables using the Imprecise Dirichlet Model
- Zaffalon (Eds.), Proc. 3rd Int. Symp. on Imprecise Probabilities and Their Applications (ISIPTA ’03), Proceedings in Informatics, Vol. 18, Carleton Scientific
, 2003
Cited by 1 (0 self)
We consider the statistical problem of analyzing the association between two categorical variables from cross-classified data. The focus is put on measures which enable one to study the dependencies at a local level and to assess whether the data support some more or less strong association model.
ON CHARACTERIZING DEPENDENCE IN JOINT DISTRIBUTIONS
, 1967
Ways of characterizing the dependence of one random variable on another (or several others) are investigated. In particular, an index of dependence of X on Y is introduced which (i) always exists, (ii) lies between zero and unity inclusive, (iii) is zero if and only if X and Y are independent, (iv) is unity if X is a function of Y (and only if whenever X has finite variance), (v) may assume every value between zero and unity inclusive by varying the joint distribution but holding the marginal distributions fixed (assuming Y continuously distributed), (vi) is invariant under linear transformation of X and one-to-one transformation of Y, and (vii) equals k/m whenever X and Y are sums of (non-degenerate) independent and identically distributed random variables Z_1, Z_2, ..., X being the sum of the first m Z_i's and Y the sum of the first k Z_i's (m > k). When the correlation ratio exists, its square cannot exceed the dependence index, and when (X,Y) is either bivariate normal or trinomial in distribution then the index equals the square of the correlation coefficient. The index is derived by first introducing and investigating a dependence characteristic, defined as the correlation ratio of exp(itX) on Y as a function of t. A correlation characteristic and index are also introduced. A brief survey of correlation and regression theory for complex-valued random variables is included. (No statistical aspects of dependence are considered.)
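The value k/m in property (vii) matches the squared Pearson correlation of the two partial sums: when the Z_i have finite variance s^2, Cov(X, Y) = Var(Y) = k s^2 and Var(X) = m s^2, so the squared correlation is k/m. The simulation below checks only that squared correlation, not the dependence index itself, which the paper defines through exp(itX). The function name, the uniform distribution for the Z_i, and the sample size are all assumptions chosen for illustration.

```python
import random


def squared_correlation_of_partial_sums(k, m, n_samples=20000, seed=0):
    """Estimate corr(X, Y)^2 for X = Z_1+...+Z_m and Y = Z_1+...+Z_k.

    The Z_i are drawn i.i.d. uniform on (0, 1); this choice is an
    illustrative assumption, any finite-variance distribution works.
    """
    if not 0 < k < m:
        raise ValueError("need 0 < k < m")
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_samples):
        z = [rng.uniform(0.0, 1.0) for _ in range(m)]
        xs.append(sum(z))      # X: sum of the first m variables
        ys.append(sum(z[:k]))  # Y: sum of the first k variables
    mean_x = sum(xs) / n_samples
    mean_y = sum(ys) / n_samples
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)
```

Calling `squared_correlation_of_partial_sums(2, 5)` should return a value close to 2/5, up to sampling error.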
AN ANALYSIS FOR COMPOUNDED LOGARITHMIC-EXPONENTIAL-LINEAR FUNCTIONS OF CATEGORICAL DATA
- Institute of Statistics Mimeo Series No. 801, February 1972
One area of application which has become increasingly important to statisticians and other researchers is the analysis of categorical data. Often the principal
BYUNG SOO KIM. Studies of Multinomial Mixture Models
, 1984
(Under the direction of Barry H. Margolin) We investigate certain inferential aspects of mixtures of multinomial distributions, both in nonparametric and parametric contexts. As a nonparametric mixture model we propose a k-population finite mixture of binomial distributions, which can be applied to the analysis of non-i.i.d. data generated from a series of toxicological experiments. A necessary and sufficient identifiability condition for the k-population finite mixture of binomials is obtained. The maximum likelihood estimates (MLE's) of the k-population finite mixture of binomials are computed via the EM algorithm (Dempster, Laird and Rubin, 1977), and the asymptotic properties of the MLE's are discussed. The identifiability condition is equivalent to the positive definiteness of the information matrix for the parameters. The MLE's and their sampling distributions, together with the data mentioned above, provide an empirical check of the statistical procedures proposed by Margolin, Kaplan and Zeiger (1981).
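As a rough illustration of how EM can fit a finite mixture of binomials, the sketch below alternates responsibilities (E-step) with weighted updates of the mixing weights and success probabilities (M-step). It is a generic textbook-style version, not necessarily the thesis's formulation: the function name `em_binomial_mixture`, the equal starting weights, the fixed iteration count and the clamping constant `eps` are all assumptions.

```python
def em_binomial_mixture(data, init_probs, n_iter=200, eps=1e-9):
    """Fit a k-component binomial mixture by EM (illustrative sketch).

    data: list of (successes, trials) pairs.
    init_probs: k starting success probabilities, each strictly in (0, 1).
    Returns (weights, probs).
    """
    if not data:
        raise ValueError("data must not be empty")
    if any(not 0.0 < p < 1.0 for p in init_probs):
        raise ValueError("initial probabilities must lie strictly in (0, 1)")
    k = len(init_probs)
    probs = list(init_probs)
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component per observation.
        # The binomial coefficient cancels in the ratio, so it is omitted.
        resp = []
        for x, n in data:
            lik = [w * p ** x * (1.0 - p) ** (n - x)
                   for w, p in zip(weights, probs)]
            total = sum(lik)
            resp.append([value / total for value in lik])
        # M-step: responsibility-weighted proportions.
        for j in range(k):
            weights[j] = sum(r[j] for r in resp) / len(data)
            successes = sum(r[j] * x for r, (x, n) in zip(resp, data))
            trials = sum(r[j] * n for r, (x, n) in zip(resp, data))
            if trials > 0:
                # Clamp away from 0 and 1 to keep likelihoods positive.
                probs[j] = min(max(successes / trials, eps), 1.0 - eps)
    return weights, probs
```

On data with two well-separated groups, such as three experiments near 1 success in 10 and three near 9 in 10, starting from probabilities 0.3 and 0.7 the estimates settle near the pooled proportions of each group with roughly equal weights.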
AUTOMATIC CLASSIFICATION
In this chapter I shall attempt to present a coherent account of classification in such a way that the principles involved will be sufficiently understood for anyone wishing to use classification techniques in IR to do so without too much difficulty. The emphasis will be
Suggested Citation
Kasprzyk for their helpful comments on earlier drafts of this paper. Clerical assistance was
Prediction of Solvability Dependencies between Dichotomous Test Items: A Local Order-Theoretic Measure of Association
Summary. Solvability dependencies between dichotomous test items play an important role in the psychometric theory of knowledge spaces. Knowledge space theory (KST), based on hypothesized solvability dependencies between dichotomous items, has been successfully applied for the computerized, adaptive assessment and training of knowledge. For instance, see the ALEKS system, a fully automated math tutor on the Internet:

