Results 1 - 10 of 34
Support Vector Clustering
2001
Cited by 124 (1 self)
We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several components, each enclosing a separate cluster of points. We present a simple algorithm for identifying these clusters. The width of the Gaussian kernel controls the scale at which the data is probed, while the soft margin constant helps cope with outliers and overlapping clusters. The structure of a dataset is explored by varying the two parameters, maintaining a minimal number of support vectors to assure smooth cluster boundaries. We demonstrate the performance of our algorithm on several datasets.
Using relative novelty to identify useful temporal abstractions in reinforcement learning
In Proceedings of the Twenty-First International Conference on Machine Learning, 2004
Cited by 51 (11 self)
We present a new method for automatically creating useful temporal abstractions in reinforcement learning. We argue that states that allow the agent to transition to a different region of the state space are useful subgoals, and propose a method for identifying them using the concept of relative novelty. When such a state is identified, a temporally extended activity (e.g., an option) is generated that takes the agent efficiently to this state. We illustrate the utility of the method in a number of tasks.
Content-Boosted Collaborative Filtering
In Proceedings of the 2001 SIGIR Workshop on Recommender Systems, 2001
Cited by 50 (0 self)
Most recommender systems use Collaborative Filtering or Content-based methods to predict new items of interest for a user. While both methods have their own advantages, individually they fail to provide good recommendations in many situations. Incorporating components from both methods, a hybrid recommender system can overcome these shortcomings. In this paper, we present an elegant and effective framework for combining content and collaboration. Our approach uses a content-based predictor to enhance existing user data, and then provides personalized suggestions through collaborative filtering. We present experimental results that show how this approach, Content-Boosted Collaborative Filtering, performs better than a pure content-based predictor, pure collaborative filter, and a naive hybrid approach. We also discuss methods to improve the performance of our hybrid system.
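The two-stage idea in this abstract (a content-based predictor fills in missing ratings, then collaborative filtering runs on the filled-in data) can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation: the toy data, the `genre_mean_predictor` stand-in for a content-based predictor, and the plain Pearson-weighted neighbourhood scheme are all assumptions made for the example.

```python
import math


def pseudo_ratings(ratings, items, content_predict):
    """Keep actual ratings; fill missing ones with content-based predictions."""
    return {
        user: {
            item: rated[item] if item in rated else content_predict(user, item)
            for item in items
        }
        for user, rated in ratings.items()
    }


def pearson(u, v, items):
    """Pearson correlation of two dense rating vectors over the same items."""
    mu = sum(u.values()) / len(items)
    mv = sum(v.values()) / len(items)
    num = sum((u[i] - mu) * (v[i] - mv) for i in items)
    den = math.sqrt(sum((u[i] - mu) ** 2 for i in items)
                    * sum((v[i] - mv) ** 2 for i in items))
    return num / den if den else 0.0


def predict(active, item, dense, items):
    """Neighbourhood-based collaborative filtering on the dense pseudo-ratings."""
    mean = {u: sum(r.values()) / len(items) for u, r in dense.items()}
    num = den = 0.0
    for other in dense:
        if other == active:
            continue
        w = pearson(dense[active], dense[other], items)
        num += w * (dense[other][item] - mean[other])
        den += abs(w)
    return mean[active] + (num / den if den else 0.0)


# Hypothetical toy data: item genres and a sparse user-item rating table.
GENRE = {"m1": "action", "m2": "action", "m3": "drama", "m4": "drama"}
RATINGS = {
    "ann": {"m1": 5, "m3": 1},
    "bob": {"m1": 5, "m2": 4, "m3": 2, "m4": 1},
    "cat": {"m1": 1, "m2": 2, "m3": 5, "m4": 4},
}


def genre_mean_predictor(user, item):
    """Toy content-based predictor: the user's mean rating within the item's genre."""
    same = [r for i, r in RATINGS[user].items() if GENRE[i] == GENRE[item]]
    pool = same or list(RATINGS[user].values())
    return sum(pool) / len(pool)


ITEMS = sorted(GENRE)
dense = pseudo_ratings(RATINGS, ITEMS, genre_mean_predictor)
ann_m2 = predict("ann", "m2", dense, ITEMS)
print(f"predicted rating of m2 for ann: {ann_m2:.2f}")
```

Because every user vector is dense after the first stage, the similarity computation no longer depends on users having co-rated the same items, which is the sparsity problem the hybrid approach is meant to ease.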
Enhanced Word Clustering for Hierarchical Text Classification
2002
Cited by 37 (1 self)
In this paper we propose a new information-theoretic divisive algorithm for word clustering applied to text classification. In previous work, such "distributional clustering" of features has been found to achieve improvements over feature selection in terms of classification accuracy, especially at lower number of features [2, 28]. However the existing clustering techniques are agglomerative in nature and result in (i) sub-optimal word clusters and (ii) high computational cost. In order to explicitly capture the optimality of word clusters in an information theoretic framework, we first derive a global criterion for feature clustering. We then present a fast, divisive algorithm that monotonically decreases this objective function value, thus converging to a local minimum. We show that our algorithm minimizes the "within-cluster Jensen-Shannon divergence" while simultaneously maximizing the "between-cluster Jensen-Shannon divergence". In comparison to the previously proposed agglomerative strategies our divisive algorithm achieves higher classification accuracy especially at lower number of features. We further show that feature clustering is an effective technique for building smaller class models in hierarchical classification. We present detailed experimental results using Naive Bayes and Support Vector Machines on the 20 Newsgroups data set and a 3-level hierarchy of HTML documents collected from Dmoz Open Directory.
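The objective in this abstract is stated in terms of Jensen-Shannon divergence. As a rough illustration of that quantity only (not of the divisive clustering algorithm itself), the sketch below computes the weighted Jensen-Shannon divergence of several discrete distributions. The function names and the toy word-by-class distributions are assumptions made for the example.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(dists, weights):
    """Weighted Jensen-Shannon divergence of several discrete distributions.

    JS = H(sum_i w_i * p_i) - sum_i w_i * H(p_i), with weights summing to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    size = len(dists[0])
    mixture = [sum(w * p[k] for w, p in zip(weights, dists)) for k in range(size)]
    return entropy(mixture) - sum(w * entropy(p) for w, p in zip(weights, dists))


# Hypothetical class distributions p(class | word) for three words.
p_a = [0.9, 0.1]
p_b = [0.8, 0.2]
p_c = [0.1, 0.9]

similar = js_divergence([p_a, p_b], [0.5, 0.5])
different = js_divergence([p_a, p_c], [0.5, 0.5])
print(f"JS(a, b) = {similar:.4f} bits, JS(a, c) = {different:.4f} bits")
```

Words whose class distributions are close, such as `p_a` and `p_b` here, have a small divergence and are natural candidates for the same cluster, while `p_a` and `p_c` are far apart.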
A second-order perceptron algorithm
2005
Cited by 34 (12 self)
Kernel-based linear-threshold algorithms, such as support vector machines and Perceptron-like algorithms, are among the best available techniques for solving pattern classification problems. In this paper, we describe an extension of the classical Perceptron algorithm, called second-order Perceptron, and analyze its performance within the mistake bound model of on-line learning. The bound achieved by our algorithm depends on the sensitivity to second-order data information and is the best known mistake bound for (efficient) kernel-based linear-threshold classifiers to date. This mistake bound, which strictly generalizes the well-known Perceptron bound, is expressed in terms of the eigenvalues of the empirical data correlation matrix and depends on a parameter controlling the sensitivity of the algorithm to the distribution of these eigenvalues. Since the optimal setting of this parameter is not known a priori, we also analyze two variants of the second-order Perceptron algorithm: one that adaptively sets the value of the parameter in terms of the number of mistakes made so far, and one that is parameterless, based on pseudoinverses.
Statistical Framework for Model-based Image Retrieval in . . .
2003
Cited by 29 (9 self)
Recently, research in the field of content-based image retrieval has attracted a lot of attention. Nevertheless, most existing methods cannot be easily applied to medical image databases, as global image descriptions based on color, texture, or shape do not supply sufficient semantics for medical applications. The concept for content-based image retrieval in medical applications (IRMA) is therefore based on the separation of the following processing steps: categorization of the entire image; registration with respect to prototypes; extraction and query-dependent selection of local features; hierarchical blob representation including object identification; and finally, image retrieval. Within the first step of processing, images are classified according to image modality, body orientation, anatomic region, and biological system. The statistical classifier for the anatomic region is based on Gaussian kernel densities within a probabilistic framework for multiobject recognition. Special emphasis is placed on invariance, employing a probabilistic model of variability based on tangent distance and an image distortion model. The performance of the classifier is evaluated using a set of 1617 radiographs from daily routine, where the error rate of 8.0% in this six-class problem is an excellent result, taking into account the difficulty of the task. The computed posterior probabilities are furthermore used in the subsequent steps of the retrieval process.
Towards Principled Feature Selection: Relevancy, Filters and Wrappers
In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, 2003
Cited by 23 (9 self)
In an influential paper, Kohavi and John [7] presented a number of disadvantages of the filter approach to the feature selection problem, steering research towards algorithms adopting the wrapper approach. We show here that neither approach is inherently better and that any practical feature selection algorithm needs to at least consider the learner used for classification and the metric used for evaluating the learner's performance. In the process we formally define the feature selection problem, re-examine the relationship between relevancy and filter algorithms, and establish a connection between Kohavi and John's definition of relevancy and the Markov Blanket of a target variable in a Bayesian Network faithful to some data distribution.
Facial Asymmetry Quantification for Expression Invariant Human Identification
2002
Cited by 23 (10 self)
We investigate facial asymmetry as a biometric under expression variation. For the first time, we have defined two types of quantified facial asymmetry measures that are easily computable from facial images and videos. Our findings show that the asymmetry measures of automatically selected facial regions capture individual differences that are relatively stable to facial expression variations. More importantly, a synergy is achieved by combining facial asymmetry information with conventional EigenFace and FisherFace methods. We have assessed the generality of these findings across two publicly available face databases: Using a random subset of 110 subjects from the FERET database, a 38% classification error reduction rate is obtained. Error reduction rates of 45% to 100% are achieved on 55 subjects from the Cohn-Kanade AU-coded facial expression database. These results suggest that facial asymmetry may provide complementary discriminative information to human identification methods, which has been missing in automatic human identification.
Robust Feature Selection by Mutual Information Distributions
Proceedings of the 18th International Conference on Uncertainty in Artificial Intelligence (UAI-2002), 2002
Cited by 21 (6 self)
Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.
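For orientation, the "empirical mutual information" that this abstract uses as its baseline can be sketched as a plug-in estimate from sample counts. The snippet below illustrates only that baseline and does not implement the paper's Bayesian distribution of mutual information; the function name and toy samples are assumptions made for the example.

```python
import math
from collections import Counter


def empirical_mutual_information(pairs):
    """Plug-in estimate of I(X;Y) in nats from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    x_counts = Counter(x for x, _ in pairs)
    y_counts = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), n_xy in joint.items():
        # (n_xy / n) * log( (n_xy / n) / ((n_x / n) * (n_y / n)) )
        mi += (n_xy / n) * math.log(n_xy * n / (x_counts[x] * y_counts[y]))
    return mi


# Hypothetical toy samples: (feature value, class label).
dependent = [(0, "a"), (0, "a"), (1, "b"), (1, "b")]
independent = [(0, "a"), (0, "b"), (1, "a"), (1, "b")]

print(empirical_mutual_information(dependent))
print(empirical_mutual_information(independent))
```

With so few samples the plug-in value says nothing about its own reliability, which is the gap the abstract's sample-to-population treatment addresses.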

