Results 21–30 of 122
Knowledge transformation from word space to document space
In Proc. of SIGIR '08, 2008
Abstract

Cited by 13 (5 self)
In most IR clustering problems, we directly cluster the documents, working in the document space, using cosine similarity between documents as the similarity measure. In many real-world applications, however, we usually have knowledge on the word side and wish to transform this knowledge to the document (concept) side. In this paper, we provide a mechanism for this knowledge transformation. To the best of our knowledge, this is the first model for this type of knowledge transformation. The model uses a non-negative matrix factorization X = FSGᵀ, where X is the word–document semantic matrix, F is the posterior probability of a word belonging to a word cluster and represents knowledge in the word space, G is the posterior probability of a document belonging to a document cluster and represents knowledge in the document space, and S is a scaled matrix factor which provides a condensed view of X. We show how knowledge on words can improve document clustering, i.e., how knowledge in the word space is transformed into the document space. We perform extensive experiments to validate our approach.
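The tri-factorization above can be sketched concretely: X ≈ FSGᵀ can be fit with plain multiplicative updates minimizing the Frobenius reconstruction error. The helper below is a minimal illustration only (the paper's actual update rules, normalization, and the mechanism for injecting prior word-side knowledge into F differ):

```python
import numpy as np

def tri_factorize(X, k_words, k_docs, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative X (words x docs) as F @ S @ G.T.

    F: words x k_words  (word-cluster membership)
    S: k_words x k_docs (condensed view of X)
    G: docs x k_docs    (document-cluster membership)
    Plain multiplicative updates for ||X - F S G^T||_F^2; a sketch,
    not the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k_words))
    S = rng.random((k_words, k_docs))
    G = rng.random((m, k_docs))
    for _ in range(n_iter):
        FS = F @ S
        G *= (X.T @ FS) / (G @ (FS.T @ FS) + eps)      # update doc factors
        SG = S @ G.T
        F *= (X @ SG.T) / (F @ (SG @ SG.T) + eps)      # update word factors
        S *= (F.T @ X @ G) / (F.T @ F @ S @ (G.T @ G) + eps)
    return F, S, G
```

Each update multiplies a factor by the ratio of the negative to the positive part of the gradient, which keeps all factors nonnegative throughout.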
Classification with Partial Labels
Abstract

Cited by 11 (0 self)
In this paper, we address the problem of learning when some cases are fully labeled while other cases are only partially labeled, in the form of partial labels. Partial labels are represented as a set of possible labels for each training example, one of which is the correct label. We introduce a discriminative learning approach that incorporates partial-label information into the conventional margin-based learning framework. The partial-label learning problem is formulated as a convex quadratic optimization minimizing the L2-norm-regularized empirical risk using the hinge loss. We also present an efficient algorithm for classification in the presence of partial labels. Experiments with different data sets show that partial-label information improves the performance of classification when traditional fully labeled data is available, and also yields reasonable performance in the absence of any fully labeled data.
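A minimal sketch of the loss at the heart of such a formulation, assuming a linear model and one common margin form (the best candidate-label score should beat the best non-candidate score by a margin of 1; the paper's actual QP adds L2 regularization and is solved jointly):

```python
import numpy as np

def partial_label_hinge(W, X, candidates):
    """Average hinge loss for linear scores S = X @ W.T under partial labels.

    candidates[i] is the set of admissible labels for example i; the loss
    encourages the best candidate score to exceed the best non-candidate
    score by a margin of 1.  Illustrative only -- details vary by paper.
    """
    scores = X @ W.T                          # (n_examples, n_classes)
    losses = []
    for s, cand in zip(scores, candidates):
        mask = np.zeros(len(s), dtype=bool)
        mask[sorted(cand)] = True
        pos = s[mask].max()                   # best score inside candidate set
        neg = s[~mask].max()                  # best competing outside score
        losses.append(max(0.0, 1.0 - (pos - neg)))
    return float(np.mean(losses))
```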
Active learning using smooth relative regret approximations with applications (full version)
In arXiv:1110.2136, 2012
Abstract

Cited by 10 (3 self)
The disagreement coefficient of Hanneke has become a central data-independent invariant in proving active learning rates. It has been shown in various ways that a concept class with low complexity, together with a bound on the disagreement coefficient at an optimal solution, allows active learning rates that are superior to passive learning ones. We present a different tool for pool-based active learning which follows from the existence of a certain uniform version of a low disagreement coefficient, but is not equivalent to it. In fact, we present two fundamental active learning problems of significant interest for which our approach allows non-trivial active learning bounds, whereas any general-purpose method relying only on disagreement-coefficient bounds fails to guarantee any useful bounds for these problems. The applications of interest are learning to rank from pairwise preferences, and clustering with side information (a.k.a. semi-supervised clustering). The tool we use is based on the learner's ability to compute an estimator of the difference between the loss of any hypothesis and some fixed "pivotal" hypothesis, to within an absolute error of at most ε times the disagreement measure (ℓ1 distance) between the two hypotheses. We prove that such an estimator implies the existence of a learning algorithm which, at each iteration, reduces its in-class excess risk to within a constant factor. Each iteration replaces the current pivotal hypothesis with the minimizer of the estimated loss-difference function with respect to the previous pivotal hypothesis. The label complexity essentially becomes that of computing this estimator.
Learnable Similarity Functions and Their Applications to Clustering and Record Linkage
2004
Abstract

Cited by 10 (0 self)
… rship (Xing et al. 2003), and relative comparisons (Schultz & Joachims 2004). These approaches have shown improvements over traditional similarity functions for different data types such as vectors in Euclidean space, strings, and database records composed of multiple text fields. While these initial results are encouraging, there still remains a large number of similarity functions that are currently unable to adapt to a particular domain. In our research, we attempt to bridge this gap by developing both new learnable similarity functions and methods for their application to particular problems in machine learning and data mining. In preliminary work, we proposed two learnable similarity functions for strings that adapt distance computations given training pairs of equivalent and non-equivalent strings (Bilenko & Mooney 2003a). The first function is based on a probabilistic model of edit distance with affine gaps (Gus …
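Affine-gap edit distance itself can be computed with the classic three-matrix (Gotoh-style) dynamic program. The sketch below uses fixed costs, whereas the learnable version described above fits the substitution and gap parameters from equivalent / non-equivalent training pairs:

```python
def affine_gap_distance(a, b, mismatch=1.0, gap_open=1.0, gap_extend=0.5):
    """Edit distance with affine gap costs (Gotoh-style DP).

    A gap of length L costs gap_open + (L - 1) * gap_extend, so long gaps
    are cheaper per character than many separate gaps.  Costs here are
    fixed constants; the learned variant estimates them from data.
    """
    INF = float("inf")
    n, m = len(a), len(b)
    # M: both chars consumed; X: gap in b (deletion); Y: gap in a (insertion)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    X = [[INF] * (m + 1) for _ in range(n + 1)]
    Y = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = sub + min(M[i-1][j-1], X[i-1][j-1], Y[i-1][j-1])
            X[i][j] = min(M[i-1][j] + gap_open, X[i-1][j] + gap_extend)
            Y[i][j] = min(M[i][j-1] + gap_open, Y[i][j-1] + gap_extend)
    return min(M[n][m], X[n][m], Y[n][m])
```

Because the extension cost (0.5) is below the opening cost (1.0), deleting two adjacent characters costs 1.5 rather than 2.0, which is what makes the model "affine".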
Clustering with model-level constraints
In SDM Conference, 2005
Abstract

Cited by 9 (1 self)
In this paper we describe a systematic approach to uncovering multiple clusterings underlying a dataset. In contrast to previous approaches, the proposed method uses information about structures that are not desired, and consequently is very useful in an exploratory data-mining setting. Specifically, the problem is formulated as constrained model-based clustering where the constraints are placed at the model level. Two variants of an EM algorithm for this constrained model are derived. The performance of both variants is compared against a state-of-the-art information bottleneck algorithm on both synthetic and real datasets.
Learning from Noisy Side Information by Generalized Maximum Entropy Model
Abstract

Cited by 8 (4 self)
We consider the problem of learning from noisy side information in the form of pairwise constraints. Although many algorithms have been developed to learn from side information, most of them assume perfect pairwise constraints. Because pairwise constraints are often extracted from data sources such as paper citations, they tend to be noisy and inaccurate. In this paper, we introduce a generalization of the maximum entropy model and propose a framework for learning from noisy side information based on this generalized maximum entropy model. Theoretical analysis shows that, under certain assumptions, the classification model trained from the noisy side information can be very close to the one trained from the perfect side information. Extensive empirical studies verify the effectiveness of the proposed framework.
K-means with Large and Noisy Constraint Sets
Abstract

Cited by 8 (1 self)
We focus on the problem of clustering with soft instance-level constraints. Recently, the CVQE algorithm was proposed in this context. It modifies the objective function of traditional K-means to include penalties for violated constraints. CVQE was shown to efficiently produce high-quality clusterings of UCI data. In this work, we examine the properties of CVQE and propose a modification that results in a more intuitive objective function with lower computational complexity. We present extensive experimentation, which provides insight into CVQE and shows that our new variant can dramatically improve clustering quality while reducing run time. We show its superiority in a large-scale surveillance scenario with noisy constraints.
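A minimal, simplified stand-in for this family of algorithms: K-means whose assignment step charges a flat penalty w for each violated must-link / cannot-link pair. This is only a sketch of the idea; CVQE's actual penalties are distance-based terms in the vector-quantization objective, not a constant:

```python
import numpy as np

def penalized_kmeans(X, k, must, cannot, w=10.0, n_iter=20):
    """Lloyd-style K-means with constraint-violation penalties.

    must / cannot: lists of (i, j) index pairs.  A flat penalty w is
    added to the assignment cost of every cluster choice that would
    violate a constraint.  Simplified illustration of CVQE-style methods.
    """
    centers = X[:k].astype(float).copy()     # naive deterministic init
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # assignment step: points updated sequentially, so each point sees
        # the current labels of its constraint partners (ICM-style sweep)
        for i in range(len(X)):
            costs = ((centers - X[i]) ** 2).sum(axis=1)
            for a, b in must:
                if i in (a, b):
                    j = b if i == a else a
                    costs = costs + w * (np.arange(k) != labels[j])
            for a, b in cannot:
                if i in (a, b):
                    j = b if i == a else a
                    costs = costs + w * (np.arange(k) == labels[j])
            labels[i] = int(costs.argmin())
        # update step: recompute each non-empty cluster's centroid
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels, centers
```

With soft constraints, a point may still violate a constraint when the geometric cost of satisfying it exceeds w, which is exactly the behavior wanted for noisy constraint sets.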
Supervised Clustering: Algorithms and Application
In Suffolk University Law Review, 2005
Abstract

Cited by 7 (3 self)
This work centers on a novel data mining technique we term supervised clustering. Unlike traditional clustering, supervised clustering assumes that the examples are classified and has the goal of identifying class-uniform clusters that have high probability densities. Three representative-based algorithms for supervised clustering are introduced: two greedy algorithms, SRIDHCR and SPAM, and an evolutionary computing algorithm named SCEC. The three algorithms were evaluated using a benchmark consisting of UCI machine learning datasets. A study of the solution landscape for the fitness function used by supervised clustering shows that the landscape seems to have a "Canyonland" shape, thereby increasing the difficulty of the clustering task for the greedy algorithms. Furthermore, we introduce a technique for class decomposition and demonstrate with experimental results how it can enhance the performance of simple classifiers. We also present a dataset-editing technique, which we call supervised clustering editing (SCE), that replaces the examples of a learned cluster by the cluster representative. Our experimental results demonstrate how dataset-editing techniques in general, and the SCE technique in particular, enhance the performance of NN classifiers. Other potential applications of supervised clustering, such as summary generation, discovery of interesting regions in spatial databases, and distance function learning, are discussed as well.
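The SCE editing step can be sketched as follows, assuming the cluster representative is the medoid and the representative's label is the cluster's majority class (in the paper, the representatives come from its own supervised clustering algorithms rather than this simple rule):

```python
import numpy as np

def sce_edit(X, y, cluster_labels):
    """Supervised-clustering-editing sketch: collapse each cluster to one
    representative (the medoid here) labeled by the cluster's majority
    class, producing a much smaller edited training set."""
    reps, rep_labels = [], []
    for c in np.unique(cluster_labels):
        members = np.where(cluster_labels == c)[0]
        centroid = X[members].mean(axis=0)
        # medoid: the member closest to the cluster centroid
        medoid = members[np.argmin(((X[members] - centroid) ** 2).sum(axis=1))]
        reps.append(X[medoid])
        rep_labels.append(int(np.bincount(y[members]).argmax()))
    return np.array(reps), np.array(rep_labels)

def nn_classify(x, reps, rep_labels):
    """1-NN classification on the edited (condensed) training set."""
    return int(rep_labels[((reps - x) ** 2).sum(axis=1).argmin()])
```

The payoff is that a 1-NN classifier now compares each query against one representative per cluster instead of every training example, which is how editing can both speed up and smooth NN classification.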
Penalized probabilistic clustering
In Advances in Neural Information Processing Systems 17, 2005
Abstract

Cited by 6 (0 self)
While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMMs) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree to which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of datasets show that penalized probabilistic clustering (PPC) can consistently improve clustering results.
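The effect of such a penalized prior can be illustrated for a single pair of points. Assuming a prior weight of exp(w·[zᵢ = zⱼ]) on joint assignments (a common form for pairwise assignment priors; the paper's exact parameterization may differ), positive w pulls the pair toward the same cluster and negative w pushes them apart:

```python
import numpy as np

def pairwise_penalized_posterior(lik_i, lik_j, w):
    """Joint cluster posterior for two points under a GMM-style model with
    a pairwise prior proportional to exp(w * [z_i == z_j]).

    lik_i, lik_j: per-cluster weighted likelihoods pi_z * p(x | z) for
    each of the two points.  Illustrative sketch of how a pairwise prior
    reweights joint assignments before normalization.
    """
    k = len(lik_i)
    # np.eye(k) is 1 on the diagonal (same cluster), 0 elsewhere,
    # so exp(w * eye) boosts (w > 0) or suppresses (w < 0) agreement.
    joint = np.outer(lik_i, lik_j) * np.exp(w * np.eye(k))
    return joint / joint.sum()
```

With w = 0 the prior is flat and the posterior factorizes as in an ordinary GMM; as w grows, nearly all posterior mass moves onto the diagonal (same-cluster) assignments.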