Results 1 - 10 of 26
Non-metric affinity propagation for unsupervised image categorization
"... Unsupervised categorization of images or image parts is often needed for image and video summarization or as a preprocessing step in supervised methods for classification, tracking and segmentation. While many metric-based techniques have been applied to this problem in the vision community, often, ..."
Abstract
-
Cited by 52 (2 self)
- Add to MetaCart
(Show Context)
Unsupervised categorization of images or image parts is often needed for image and video summarization or as a preprocessing step in supervised methods for classification, tracking and segmentation. While many metric-based techniques have been applied to this problem in the vision community, often the most natural measures of similarity (e.g., the number of matching SIFT features) between pairs of images or image parts are non-metric. Unsupervised categorization by identifying a subset of representative exemplars can be performed efficiently with the recently proposed “affinity propagation” algorithm. In contrast to k-centers clustering, which iteratively refines an initial randomly chosen set of exemplars, affinity propagation simultaneously considers all data points as potential exemplars and iteratively exchanges messages between data points until a good solution emerges. When applied to the Olivetti face data set using a translation-invariant non-metric similarity, affinity propagation achieves a much lower reconstruction error and nearly halves the classification error rate compared to state-of-the-art techniques. For the more challenging problem of unsupervised categorization of images from the Caltech101 data set, we derived non-metric similarities between pairs of images by matching SIFT features. Affinity propagation successfully identifies meaningful categories, which provide a natural summarization of the training images and can be used to classify new input images.
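A minimal sketch of the kind of non-metric similarity described here, counting SIFT matches between two images; it assumes OpenCV's SIFT implementation and Lowe's ratio test, and the paper's exact matching criteria may differ:

```python
import cv2

def sift_match_similarity(img_a, img_b, ratio=0.75):
    """Similarity = number of SIFT matches passing Lowe's ratio test.
    Match counts are not symmetric and violate the triangle inequality,
    i.e., this is a non-metric similarity. img_a, img_b: grayscale arrays."""
    sift = cv2.SIFT_create()
    _, des_a = sift.detectAndCompute(img_a, None)
    _, des_b = sift.detectAndCompute(img_b, None)
    if des_a is None or des_b is None:
        return 0.0
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_a, des_b, k=2)
    # keep a match only if it is clearly better than the runner-up
    good = [p for p in matches if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return float(len(good))
```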
Can We Understand van Gogh’s Mood? Learning to Infer Affects from Images in Social Networks
"... Can we understand van Gogh’s mood from his artworks? For many years, people have tried to capture van Gogh’s affects from his artworks so as to understand the essential meaning behind the images and catch on why van Gogh created these works. In this paper, we study the problem of inferring affects f ..."
Abstract
-
Cited by 14 (7 self)
- Add to MetaCart
(Show Context)
Can we understand van Gogh's mood from his artworks? For many years, people have tried to capture van Gogh's affects from his artworks so as to understand the essential meaning behind the images and grasp why van Gogh created these works. In this paper, we study the problem of inferring affects from images in social networks. In particular, we aim to answer: What are the fundamental features that reflect the authors' affects in images? How can social network information be leveraged to help detect these affects? We propose a semi-supervised framework that formulates the problem as a factor graph model. Experiments on 20,000 randomly downloaded Flickr images show that our method can achieve a precision of 49% with a recall of 24% when inferring authors' affects across 16 categories. Finally, we demonstrate the effectiveness of the proposed method on automatically understanding van Gogh's mood from his artworks and on inferring the trend of public affects around special events.
Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery
"... Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can efficiently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved ..."
Abstract
-
Cited by 13 (3 self)
- Add to MetaCart
(Show Context)
Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can efficiently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved efficiently using convex programming. The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each of the representatives. We obtain the range of the regularization parameter for which the solution changes from selecting one representative for all data points to selecting all data points as representatives. When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i.e., it does not require that the pairwise dissimilarities come from a metric. We demonstrate the effectiveness of the proposed algorithm on synthetic data as well as real-world image and text data.
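A sketch of one plausible instantiation of this optimization program in CVXPY, using l2 row norms as the row-sparsity regularizer (the paper may use a different row norm, and the threshold for reading off representatives is an arbitrary choice):

```python
import cvxpy as cp
import numpy as np

def find_representatives(D, lam):
    """D[i, j]: dissimilarity of point j to candidate representative i.
    Returns the assignment-probability matrix Z and the indices of the
    selected representatives (the nonzero rows of Z)."""
    n = D.shape[0]
    Z = cp.Variable((n, n), nonneg=True)
    encoding_cost = cp.sum(cp.multiply(D, Z))      # = trace(D^T Z)
    row_sparsity = cp.sum(cp.norm(Z, 2, axis=1))   # sum of row l2-norms
    constraints = [cp.sum(Z, axis=0) == 1]         # each column is a probability vector
    cp.Problem(cp.Minimize(encoding_cost + lam * row_sparsity), constraints).solve()
    row_norms = np.linalg.norm(Z.value, axis=1)
    return Z.value, np.where(row_norms > 1e-3)[0]  # threshold is an assumption
```

Larger values of `lam` drive more rows of Z to zero, moving the solution from all points as representatives toward a single representative, matching the regularization range described in the abstract.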
Improving Personalization Solutions through Optimal Segmentation of Customer Bases
- in ICDM, 2006
"... On the Web, where the search costs are low and the competition is just a mouse click away, it is crucial to segment the customers intelligently in order to offer more targeted and personalized products and services to them. Traditionally, customer segmentation is achieved using statistics-based meth ..."
Abstract
-
Cited by 10 (2 self)
- Add to MetaCart
(Show Context)
On the Web, where search costs are low and the competition is just a mouse click away, it is crucial to segment customers intelligently in order to offer them more targeted and personalized products and services. Traditionally, customer segmentation is achieved using statistics-based methods that compute a set of statistics from the customer data and group customers into segments by applying distance-based clustering algorithms in the space of these statistics. In this paper, we present a direct-grouping-based approach to computing customer segments that groups customers not based on computed statistics, but in terms of optimally combining the transactional data of several customers to build a data mining model of customer behavior for each group. Building customer segments then becomes a combinatorial optimization problem of finding the best partitioning of the customer base into disjoint groups. The paper shows that finding an optimal customer partition is NP-hard, proposes a suboptimal direct grouping segmentation method, and empirically compares it against traditional statistics-based segmentation and 1-to-1 methods across multiple experimental conditions. We show that the direct grouping method significantly dominates the statistics-based and 1-to-1 approaches across all the experimental conditions, while still being computationally tractable. We also show that the best direct grouping method generates very few size-one customer segments and that micro-segmentation provides the best approach to personalization.
Index Terms: customer segmentation, marketing application, personalization, 1-to-1 marketing, customer profiles
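As a rough illustration of direct grouping as combinatorial optimization, the sketch below greedily merges segments whenever pooling their transactional data improves a model-fit score; `fit_score` is a hypothetical callback, and the paper's actual suboptimal search strategy may differ:

```python
from itertools import combinations

def direct_grouping(customers, fit_score):
    """Greedy bottom-up sketch: start from 1-to-1 segments and merge the
    pair of segments whose pooled data yields the largest improvement.
    fit_score(segment) is a hypothetical callback that trains a
    customer-behavior model on the segment's pooled transactions and
    returns a validation score (higher is better)."""
    segments = [[c] for c in customers]
    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(segments)), 2):
            gain = (fit_score(segments[a] + segments[b])
                    - fit_score(segments[a]) - fit_score(segments[b]))
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return segments          # no merge improves the fit
        a, b = best_pair
        segments[a] = segments[a] + segments[b]
        del segments[b]
```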
AFFINITY PROPAGATION: CLUSTERING DATA BY PASSING MESSAGES
, 2009
"... Clustering data by identifying a subset of representative examples is important for detecting patterns in data and in processing sensory signals. Such “exemplars ” can be found by randomly choosing an initial subset of data points as exemplars and then iteratively refining it, but this works well on ..."
Abstract
-
Cited by 8 (0 self)
- Add to MetaCart
(Show Context)
Clustering data by identifying a subset of representative examples is important for detecting patterns in data and in processing sensory signals. Such “exemplars” can be found by randomly choosing an initial subset of data points as exemplars and then iteratively refining it, but this works well only if that initial choice is close to a good solution. This thesis describes a method called “affinity propagation” that simultaneously considers all data points as potential exemplars, exchanging real-valued messages between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. Affinity propagation takes as input a set of pairwise similarities between data points and finds clusters by maximizing the total similarity between data points and their exemplars. Similarity can be defined simply as negative squared Euclidean distance for compatibility with other algorithms, or it can incorporate richer domain-specific models (e.g., translation-invariant distances for comparing images). Affinity propagation's computational and memory requirements scale linearly with the number of input similarities; for non-sparse problems where all possible similarities are computed, these requirements scale quadratically with the number of data points. Affinity propagation is demonstrated on several applications.
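The message-passing updates behind this description are well documented; a minimal dense NumPy sketch of the standard responsibility/availability updates with damping (the diagonal of S holds the “preferences” that control how many exemplars emerge):

```python
import numpy as np

def affinity_propagation(S, damping=0.5, iterations=200):
    """Dense sketch of the standard affinity propagation updates.
    S: n x n similarity matrix; returns the indices of the exemplars."""
    n = S.shape[0]
    R = np.zeros((n, n))          # responsibilities r(i, k)
    A = np.zeros((n, n))          # availabilities  a(i, k)
    idx = np.arange(n)
    for _ in range(iterations):
        # r(i,k) <- s(i,k) - max_{k' != k} [a(i,k') + s(i,k')]
        AS = A + S
        arg = AS.argmax(axis=1)
        first = AS[idx, arg]
        AS[idx, arg] = -np.inf
        second = AS.max(axis=1)
        maxes = np.tile(first[:, None], (1, n))
        maxes[idx, arg] = second          # the argmax column must exclude itself
        R = damping * R + (1 - damping) * (S - maxes)
        # a(i,k) <- min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k)))
        Rp = np.maximum(R, 0)
        Rp[idx, idx] = R[idx, idx]        # keep r(k,k) un-clipped
        Anew = Rp.sum(axis=0)[None, :] - Rp
        diag = Anew[idx, idx].copy()
        Anew = np.minimum(Anew, 0)
        Anew[idx, idx] = diag             # a(k,k) is not clipped at 0
        A = damping * A + (1 - damping) * Anew
    return np.where(np.diag(A + R) > 0)[0]
```

Points i are then assigned to the exemplar k maximizing a(i,k) + r(i,k); a production implementation would also add convergence checks and tie-breaking noise.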
A Fuzzy-Statistics-Based Affinity Propagation Technique for Clustering in Multispectral Images
- IEEE Transactions on Geoscience and Remote Sensing
, 2010
"... Abstract—Due to a high number of spectral channels and a large information quantity, multispectral remote-sensing images are difficult to be classified with high accuracy and efficiency by conventional classification methods, particularly when training data are not available and when unsupervised cl ..."
Abstract
-
Cited by 4 (0 self)
- Add to MetaCart
(Show Context)
Due to the high number of spectral channels and the large quantity of information they carry, multispectral remote-sensing images are difficult to classify with high accuracy and efficiency by conventional classification methods, particularly when training data are not available and unsupervised clustering techniques must be considered for data analysis. In this paper, we propose a novel image clustering method, called fuzzy-statistics-based affinity propagation (FS-AP), which is based on a fuzzy statistical similarity measure (FSS) to extract land-cover information from multispectral imagery. AP is a recently proposed clustering algorithm that exhibits fast execution and finds clusters with small error, particularly on large datasets. FSS yields objective estimates of how closely two pixel vectors resemble each other. The proposed method simultaneously considers all data points to be equally suitable as initial exemplars, thus reducing the dependence of the final clustering on the initialization. Results obtained on three kinds of multispectral images (Landsat-7 ETM+, Quickbird, and Moderate Resolution Imaging Spectroradiometer) by comparing the proposed technique with K-means, fuzzy K-means, and AP based on Euclidean distance (ED-AP) demonstrate the good efficiency and high accuracy of FS-AP.
Index Terms: Affinity propagation (AP), clustering, fuzzy clustering, fuzzy sets, fuzzy statistical similarity measure (FSS), image classification, unsupervised classification.
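A sketch of how such a custom similarity measure can be plugged into an off-the-shelf AP implementation; `similarity` is a placeholder for the paper's FSS measure, which is not reproduced here, and substituting the negative squared Euclidean distance recovers the ED-AP baseline:

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

def cluster_with_similarity(X, similarity):
    """X: sequence of pixel vectors; similarity(x, y): stand-in for FSS.
    Returns exemplar indices and a cluster label per pixel vector."""
    n = len(X)
    S = np.array([[similarity(X[i], X[j]) for j in range(n)] for i in range(n)])
    ap = AffinityPropagation(affinity="precomputed", random_state=0).fit(S)
    return ap.cluster_centers_indices_, ap.labels_

# ED-AP baseline: negative squared Euclidean distance as the similarity
# exemplars, labels = cluster_with_similarity(X, lambda x, y: -np.sum((x - y) ** 2))
```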
Clustering for Data Reduction: A Divide and Conquer Approach
, 2007
"... We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes fo ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
We consider the problem of reducing a potentially very large dataset to a subset of representative prototypes. Rather than searching over the entire space of prototypes, we first roughly divide the data into balanced clusters using bisecting k-means and spectral cuts, and then find the prototypes for each cluster by affinity propagation. We apply our algorithm to text data, where we perform an order of magnitude faster than simply looking for prototypes on the entire dataset. Furthermore, our “divide and conquer” approach actually performs more accurately on datasets which are well bisected, as the greedy decisions of affinity propagation are confined to classes of already similar items. In data reduction, we want to reduce our original data of size n to a smaller but representative subset of size m, where m ≪ n; this is related to feature selection in dimensionality reduction tasks. A reduced dataset might have a direct interpretation: for instance, the objects in question might be sentences, and the resulting prototypes would span only the most essential
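A compact sketch of the divide-and-conquer pipeline, assuming scikit-learn; the paper also uses spectral cuts, which this sketch omits in favor of bisecting k-means only:

```python
import numpy as np
from sklearn.cluster import KMeans, AffinityPropagation

def bisect(X, idx, max_size):
    """Recursively split with 2-means until each part has at most
    max_size points (the spectral-cut alternative is omitted)."""
    if len(idx) <= max_size:
        return [idx]
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
    return bisect(X, idx[labels == 0], max_size) + bisect(X, idx[labels == 1], max_size)

def divide_and_conquer_prototypes(X, max_size=500):
    prototypes = []
    for idx in bisect(X, np.arange(len(X)), max_size):
        # scikit-learn's default similarity is negative squared Euclidean distance
        ap = AffinityPropagation(random_state=0).fit(X[idx])
        prototypes.extend(idx[ap.cluster_centers_indices_])
    return np.array(sorted(prototypes))
```

Because AP's cost is roughly quadratic in the number of points, running it inside balanced pieces of size at most `max_size` rather than on all n points is what yields the order-of-magnitude speedup claimed above.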
Dynamic Micro Targeting: Fitness-Based Approach to Predicting Individual Preferences
- In ICDM
, 2007
"... Customer segmentation, such as customer grouping by the level of family income, education, or any other demographic variable, is considered as one of the standard techniques used by marketers for a long time [10]. Its popularity comes from the fact that segmented models usually outperform aggregated ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
Customer segmentation, such as grouping customers by level of family income, education, or any other demographic variable, has long been considered one of the standard techniques used by marketers [10]. Its popularity comes from the fact that segmented models usually outperform aggregated models of customer behavior [11]. More recently, there has been much interest in the marketing and data mining communities in learning individual models of customer behavior within the context of 1-to-1 marketing [9] and personalization [4], where models of customer behavior are learned from data pertaining only to a particular customer. These learned individualized models of customer behavior are stored as parts of customer profiles and are subsequently used for recommending and delivering personalized products and services to the customers [2]. As was shown in [7], it is a non-trivial problem to compare segmented and individual customer models because of the tradeoff between the sparsity of data (bias) for individual customer models and customer heterogeneity (variance) in aggregate models: individual models may suffer from sparse data, while aggregate models suffer from high levels of customer heterogeneity. A typical approach to customer segmentation is the statistics-based approach, which computes a set of statistics from the customer's demographic and transactional data [3, 7, 12], such as the average time it
Exemplar-based Robust Coherent Biclustering
"... The biclustering, co-clustering, or subspace clustering problem involves simultaneously grouping the rows and columns of a data matrix to uncover biclusters or sub-matrices of the data matrix that optimize a desired objective function. In coherent biclustering, the objective function contains a cohe ..."
Abstract
-
Cited by 2 (0 self)
- Add to MetaCart
(Show Context)
The biclustering, co-clustering, or subspace clustering problem involves simultaneously grouping the rows and columns of a data matrix to uncover biclusters, or sub-matrices of the data matrix, that optimize a desired objective function. In coherent biclustering, the objective function contains a coherence measure of the biclusters. We introduce a novel formulation of the coherent biclustering problem and use it to derive two algorithms. The first algorithm is based on loopy message passing, and the second relies on a greedy strategy, yielding an algorithm that is significantly faster than the first. A distinguishing feature of these algorithms is that they identify an exemplar, or prototypical member, of each bicluster. We note the interference from background elements in biclustering and offer a means to circumvent such interference using additional regularization. Our experiments with synthetic as well as real-world datasets show that our algorithms are competitive with the current state-of-the-art algorithms for finding coherent biclusters.
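The abstract does not define its coherence measure; for orientation, one standard choice is Cheng and Church's mean squared residue, sketched below (the paper's actual objective may differ):

```python
import numpy as np

def mean_squared_residue(B):
    """Coherence of a bicluster as Cheng and Church's mean squared
    residue; B is the data matrix restricted to the bicluster's rows
    and columns, and lower values mean a more coherent bicluster."""
    residue = B - B.mean(axis=1, keepdims=True) - B.mean(axis=0, keepdims=True) + B.mean()
    return float((residue ** 2).mean())
```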
Novel Methods to Elucidate Core Classes in Multi-Dimensional Biomedical Data
"... Breast cancer, which is the most common cancer in women, is a complex disease characterised by multiple molecular alterations. Current routine clinical management relies on availability of robust clinical and pathologic prognostic and predictive factors, like the Nottingham Prognostic Index, to supp ..."
Abstract
-
Cited by 1 (0 self)
- Add to MetaCart
(Show Context)
Breast cancer, which is the most common cancer in women, is a complex disease characterised by multiple molecular alterations. Current routine clinical management relies on the availability of robust clinical and pathologic prognostic and predictive factors, like the Nottingham Prognostic Index, to support decision making. Recent advances in high-throughput molecular technologies have provided evidence of the biologic heterogeneity of breast cancer. This thesis is a multi-disciplinary work involving both computer scientists and molecular pathologists. It focuses on the development of advanced computational models for the classification of breast cancer into sub-types of the disease based on the protein expression levels of selected markers. In a previous study conducted at the University of Nottingham, it was suggested that immunohistochemical analysis may be used to identify distinct biological classes of breast cancer. The objectives of this work related to both clinical and technical aspects. From a clinical point of view, the aim was to encourage a multiple-techniques approach