## Self-taught Clustering

Citations: 19 (5 self)

### BibTeX

    @MISC{Dai_self-taughtclustering,
      author = {Wenyuan Dai and Qiang Yang and Gui-rong Xue and Yong Yu},
      title  = {Self-taught Clustering},
      year   = {}
    }

### Abstract

This paper focuses on a new clustering task, called self-taught clustering. Self-taught clustering is an instance of unsupervised transfer learning, which aims at clustering a small collection of unlabeled target data with the help of a large amount of unlabeled auxiliary data. The target and auxiliary data can differ in topic distribution. We show that even when the target data are insufficient for learning a high-quality feature representation on their own, useful features can be learned with the help of the auxiliary data, and the target data can then be clustered effectively under those features. We propose a co-clustering-based self-taught clustering algorithm that tackles this problem by clustering the target and auxiliary data simultaneously, allowing the feature representation learned from the auxiliary data to influence the target data through a common set of features. Under the new data representation, clustering of the target data is improved. Our experiments on image clustering show that our algorithm greatly outperforms several state-of-the-art clustering methods when utilizing irrelevant unlabeled auxiliary data.
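The pipeline the abstract describes (learn a shared feature representation from target plus auxiliary data, then cluster the target under the new representation) can be illustrated with a minimal pure-Python sketch. This is not the paper's actual STC algorithm, which is information-theoretic; a plain k-means stands in for co-clustering here, and all function names are hypothetical:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means over lists of equal-length vectors; returns a cluster
    index per point. A simple stand-in, not the paper's co-clustering step."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        for i, p in enumerate(points):
            assign[i] = min(range(k),
                            key=lambda c: sum((a - b) ** 2
                                              for a, b in zip(p, centers[c])))
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def self_taught_cluster(target, auxiliary, n_feat_clusters, n_target_clusters):
    """Cluster the shared feature columns on pooled target + auxiliary rows,
    re-represent each target row over the feature clusters, then cluster the
    target in the reduced space."""
    pooled = target + auxiliary               # rows share one feature space
    columns = [list(col) for col in zip(*pooled)]
    z_assign = kmeans(columns, n_feat_clusters)
    reduced = []
    for row in target:
        agg = [0.0] * n_feat_clusters
        for j, v in enumerate(row):
            agg[z_assign[j]] += v             # collapse onto feature clusters
        reduced.append(agg)
    return kmeans(reduced, n_target_clusters)
```

The auxiliary rows only influence the feature clustering step, which is the sense in which irrelevant unlabeled data can still improve the target-side representation.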

### Citations

8550 | Elements of Information Theory
- Cover, Thomas
- 1991
Citation Context: ...get data $X$ and their feature space $Z$ for illustration, the objective function can be expressed as $I(X;Z) - I(\tilde{X};\tilde{Z})$, (2) where $I(\cdot\,;\cdot)$ denotes the mutual information between two random variables (Cover & Thomas, 1991): $I(X;Z) = \sum_{x \in X} \sum_{z \in Z} p(x,z) \log \frac{p(x,z)}{p(x)\,p(z)}$. Moreover, $I(\tilde{X};\tilde{Z})$ is computed from the joint probability distribution $p(\tilde{X},\tilde{Z})$, which is defined as $p(\tilde{x},\tilde{z}) = \sum_{x \in \tilde{x}} \sum_{z \in \tilde{z}} p(x,z)$. (3) ... |
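The quantities in the snippet above are easy to check numerically. A small sketch (function names are illustrative; nested lists stand in for the joint distribution $p(X,Z)$ and cluster assignments map each row/column to its cluster):

```python
import math

def mutual_information(p):
    """I(X;Z) for a joint distribution given as nested lists p[x][z]."""
    px = [sum(row) for row in p]                 # marginal p(x)
    pz = [sum(col) for col in zip(*p)]           # marginal p(z)
    return sum(p[x][z] * math.log(p[x][z] / (px[x] * pz[z]))
               for x in range(len(p))
               for z in range(len(p[0]))
               if p[x][z] > 0)

def clustered_joint(p, x_assign, z_assign, kx, kz):
    """p(x~, z~) of Eq. (3): sum p(x,z) over all cells whose row falls in
    cluster x~ and whose column falls in cluster z~."""
    q = [[0.0] * kz for _ in range(kx)]
    for x in range(len(p)):
        for z in range(len(p[0])):
            q[x_assign[x]][z_assign[z]] += p[x][z]
    return q
```

Coarsening $X$ and $Z$ into clusters can only lose information, so $I(\tilde{X};\tilde{Z}) \le I(X;Z)$ and the objective $I(X;Z) - I(\tilde{X};\tilde{Z})$ measures how much mutual information the co-clustering discards.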

5086 | Distinctive image features from Scale-Invariant keypoints
- Lowe
- 2004
Citation Context: ...ed data. For data preprocessing, we used the “bag-of-words” method (Li & Perona, 2005) to represent images in our experiments. Interesting points in images are found and described by the SIFT descriptor (Lowe, 2004). Then, we clustered all the interesting points to get the codebook, and set the number of clusters to 800. Using this codebook, each image can be represented as a vector in the subsequent learning p... |

2138 | Algorithms for Clustering Data
- Jain, Dubes
- 1988
Citation Context: ...eriments on image clustering show that our algorithm can greatly outperform several state-of-the-art clustering methods when utilizing irrelevant unlabeled auxiliary data. 1. Introduction Clustering (Jain & Dubes, 1988) aims at partitioning objects into groups, so that objects in the same group are relatively similar, while objects in different groups are relatively dissimilar. Clustering has a long histor... |

1848 | Some methods for classification and analysis of multivariate observations - MacQueen - 1967 |

540 | A Bayesian hierarchical model for learning natural scene categories
- Fei-Fei, Perona
- 2005
Citation Context: ...ta from the corresponding categories as target unlabeled data, while the data from the remaining categories serve as the auxiliary unlabeled data. For data preprocessing, we used the “bag-of-words” method (Li & Perona, 2005) to represent images in our experiments. Interesting points in images are found and described by the SIFT descriptor (Lowe, 2004). Then, we clustered all the interesting points to get the codebook, and s... |

466 | Multitask learning
- Caruana
- 1997
Citation Context: ...tance of transfer learning, which makes use of knowledge gained from one learning task to improve the performance of another, even when these learning tasks or domains follow different distributions (Caruana, 1997). However, since all the data are unlabeled, we can consider it as an instance of unsupervised transfer learning (Teh et al., 2006). This unsupervised transfer learning problem could also be viewed a... |

326 | Constrained k-means clustering with background knowledge
- Wagstaff, Cardie, et al.
- 1999
Citation Context: ...1967), and recent works on clustering research have focused on improving the clustering performance using prior knowledge in semi-supervised clustering (Wagstaff et al., 2001) and supervised clustering (Finley & Joachims, 2005). In the past, semi-supervised clustering incorporated pairwise supervision, such as must-link or cannot-link constraints (Wagstaff et al., 2001), ... |
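The pairwise supervision mentioned in this snippet is simple to state in code: a must-link pair must end up in the same cluster, a cannot-link pair must not. A minimal checker (function name and pair format are illustrative, not from the cited works):

```python
def satisfies_constraints(labels, must_link, cannot_link):
    """True iff the clustering given by `labels` respects every constraint:
    must-link pairs (i, j) share a cluster, cannot-link pairs do not."""
    return (all(labels[i] == labels[j] for i, j in must_link) and
            all(labels[i] != labels[j] for i, j in cannot_link))
```

Semi-supervised methods differ mainly in whether such constraints are enforced hard or merely used to bias the objective, as the snippet's "to bias clustering results" wording suggests.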

249 | Information-theoretic co-clustering
- Dhillon, Mallela, et al.
- 2003
Citation Context: ...“metal”. In this situation, the auxiliary data can be used to help uncover a better data representation to benefit the target data set. Our approach to tackling this problem is to use co-clustering (Dhillon et al., 2003), so that commonality can be found in the feature spaces that correspond to similar semantic meanings. In our solution to the self-taught clustering problem, two clustering operations, on the ta... |

245 | Caltech-256 object category dataset
- Griffin, Holub, et al.
- 2007
Citation Context: ...tering algorithm STC on the image clustering tasks, and show the effectiveness of STC. 4.1. Data Sets We conduct our experiments on eight clustering tasks generated from the Caltech-256 image corpus (Griffin et al., 2007). There are a total of 256 categories in the Caltech-256 data set, of which we randomly chose 20. For each category, 70 images are randomly selected to form our clustering ta... |

184 | Self-taught learning: Transfer learning from unlabeled data
- Raina, Battle, et al.
- 2007
Citation Context: ... consider it as an instance of unsupervised transfer learning (Teh et al., 2006). This unsupervised transfer learning problem could also be viewed as a clustering version of self-taught learning (Raina et al., 2007), which uses irrelevant unlabeled data to help supervised learning. Thus, we refer to our problem as self-taught clustering. [Figure 1: (a) diamond, (b) platinum, (c) ring, (d) titanium; example for common...] |

183 | A probabilistic framework for semi-supervised clustering - Basu, Bilenko, et al. |

144 | Semi-supervised clustering by seeding - Basu, Banerjee, et al. - 2002 |

86 | Constructing informative priors using transfer learning - Raina, Ng, et al. - 2006 |

65 | Improving SVM accuracy by training on auxiliary data sources - Wu, Dietterich - 2004 |

58 | Supervised clustering with support vector machines
- Finley, Joachims
- 2005
Citation Context: ...and recent works on clustering research have focused on improving the clustering performance using prior knowledge in semi-supervised clustering (Wagstaff et al., 2001) and supervised clustering (Finley & Joachims, 2005). In the past, semi-supervised clustering incorporated pairwise supervision, such as must-link or cannot-link constraints (Wagstaff et al., 2001), to bias clustering results. Supervised clustering me... |

26 | A Bayesian model for supervised clustering with the Dirichlet process prior - Daume, Marcu - 2005 |

11 | Intractability and clustering with constraints
- Davidson, Ravi
- 2007
Citation Context: ... Markov random fields that combine the constraints and clustering distortion measures in a general framework. Recent semi-supervised clustering works include (Nelson & Cohen, 2007; Davidson & Ravi, 2007). Supervised clustering is another branch of work designed to improve clustering performance with the help of a collection of auxiliary labeled data. To address the supervised clustering problem, Fin... |

5 | Revisiting probabilistic models for clustering with constraints
- Nelson, Cohen
- 2007
Citation Context: ...k based on hidden Markov random fields that combines the constraints and clustering distortion measures in a general framework. Recent semi-supervised clustering works include (Nelson & Cohen, 2007; Davidson & Ravi, 2007). Supervised clustering is another branch of work designed to improve clustering performance with the help of a collection of auxiliary labeled data. To address the supervised ... |