Results 1–10 of 119
Kernel Conditional Random Fields: Representation and Clique Selection. In ICML, 2004.
Cited by 96 (5 self)
Abstract:
Kernel conditional random fields (KCRFs) are introduced as a framework for discriminative modeling of graph-structured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. A procedure for greedily selecting cliques in the dual representation is then proposed, which allows sparse representations. By incorporating kernels and implicit feature spaces into conditional graphical models, the framework enables semi-supervised learning algorithms for structured data through the use of graph kernels.
Discriminative batch mode active learning, 2007.
Cited by 45 (2 self)
Abstract:
Active learning sequentially selects unlabeled instances to label, with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time, retraining in each iteration. Recently, a few batch mode active learning approaches have been proposed that select a set of the most informative unlabeled instances in each iteration under the guidance of heuristic scores. In this paper, we propose a discriminative batch mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can apply a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning approach is more effective than current state-of-the-art batch mode active learning algorithms.
Making Logistic Regression A Core Data Mining Tool: A Practical Investigation of Accuracy, Speed, and Simplicity, 2004.
Cited by 40 (0 self)
Abstract:
Binary classification is a core data mining task. For large datasets or real-time applications, desirable classifiers are accurate, fast, and need no parameter tuning. We present a simple implementation of logistic regression that meets these requirements. A combination of regularization, truncated Newton methods, and iteratively reweighted least squares makes it faster and more accurate than modern SVM implementations, and relatively insensitive to parameters. It is robust to linear dependencies and some scaling problems, making most data preprocessing unnecessary.
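The core combination the abstract names can be sketched compactly. Below is a minimal illustration of L2-regularized logistic regression fit by iteratively reweighted least squares (here a full Newton step rather than the truncated-Newton variant the paper uses); the function name and defaults are illustrative, not the authors' code.

```python
# Minimal sketch: L2-regularized logistic regression via IRLS (Newton).
# Illustrative only; not the paper's truncated-Newton implementation.
import numpy as np

def fit_logreg_irls(X, y, lam=1.0, n_iter=20):
    """X: (n, d) features, y: (n,) labels in {0, 1}, lam: ridge penalty."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))              # predicted probabilities
        W = p * (1.0 - p)                              # IRLS weights (diagonal)
        grad = X.T @ (p - y) + lam * w                 # gradient of penalized loss
        H = X.T @ (X * W[:, None]) + lam * np.eye(d)   # Hessian
        w -= np.linalg.solve(H, grad)                  # Newton step
    return w

# toy usage: linearly separable 1-D data yields a positive weight
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logreg_irls(X, y)
```

A truncated-Newton variant would replace the exact `solve` with a few conjugate-gradient iterations on the same Hessian system, which is what makes the method scale to large, sparse data.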
A Discriminative Learning Framework with Pairwise Constraints for Video Object Classification. In Proc. of CVPR, 2004.
Cited by 38 (5 self)
Abstract:
In video object classification, insufficient labeled data may at times be easily augmented with pairwise constraints on sample points, i.e., whether or not they are in the same class. In this paper, we propose a discriminative learning approach which incorporates pairwise constraints into a conventional margin-based learning framework. The proposed approach offers several advantages over existing approaches dealing with pairwise constraints. First, as opposed to learning distance metrics, the new approach derives its classification power by directly modeling the decision boundary. Second, most previous work handles labeled data by converting them to pairwise constraints, which leads to much more computation. The proposed approach can handle pairwise constraints together with labeled data, so the computation is greatly reduced. The proposed approach is evaluated on a people classification task with two surveillance video datasets.
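One simple way to see how pairwise constraints can enter a margin-based objective is sketched below: hinge loss on the labeled points plus a squared penalty pulling must-link pairs toward the same decision value, optimized by subgradient descent on a linear model. This is an illustrative convex stand-in, not the formulation proposed in the paper; all names and defaults are hypothetical.

```python
# Illustrative sketch: margin-based learning with must-link pairwise
# constraints folded in as a squared penalty on score differences.
# Not the paper's formulation.
import numpy as np

def fit_with_constraints(X, y, labeled, must_link, lam=0.1, c=1.0,
                         lr=0.05, n_iter=500):
    """X: (n, d); y[i] in {-1, +1} for i in `labeled`; must_link: index pairs."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        g = lam * w                                   # ridge term
        for i in labeled:                             # hinge subgradient
            if y[i] * (X[i] @ w) < 1.0:
                g -= y[i] * X[i]
        for i, j in must_link:                        # pull pair scores together
            diff = (X[i] - X[j]) @ w
            g += c * diff * (X[i] - X[j])
        w -= lr * g
    return w

# toy usage: two labeled points, one must-link pair of unlabeled points
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.5, 1.0], [0.6, -1.0]])
y = np.array([1.0, -1.0, 0.0, 0.0])
w = fit_with_constraints(X, y, labeled=[0, 1], must_link=[(2, 3)])
```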
Exponentiated gradient algorithms for log-linear structured prediction. In Proc. ICML, 2007.
Cited by 32 (5 self)
Abstract:
Conditional log-linear models are a commonly used method for structured prediction, so efficient learning of parameters in these models is an important problem. This paper describes an exponentiated gradient (EG) algorithm for training such models. EG is applied to the convex dual of the maximum likelihood objective; this results in both sequential and parallel update algorithms, where in the sequential algorithm parameters are updated in an online fashion. We provide a convergence proof for both algorithms. Our analysis also simplifies previous results on EG for max-margin models, and leads to a tighter bound on convergence rates. Experiments on a large-scale parsing task show that the proposed algorithm converges much faster than conjugate-gradient and L-BFGS approaches, both in terms of optimization objective and test error.
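The multiplicative form of EG updates can be shown in isolation. Below is a minimal sketch of exponentiated gradient on the probability simplex applied to a toy convex objective; it illustrates only the update rule (multiply by an exponentiated negative gradient, then renormalize), not the structured-prediction dual the paper actually optimizes.

```python
# Minimal sketch of exponentiated-gradient (EG) updates on the simplex.
# Illustrative; the paper applies this form of update to the dual
# variables of the maximum likelihood objective.
import numpy as np

def eg_minimize(grad, x0, eta=0.5, n_iter=200):
    """Minimize a convex function over the probability simplex via EG."""
    x = x0.copy()
    for _ in range(n_iter):
        x = x * np.exp(-eta * grad(x))   # multiplicative (EG) step
        x /= x.sum()                     # renormalize onto the simplex
    return x

# toy usage: minimize 0.5 * ||x - t||^2 over the simplex, where t is
# itself a distribution, so the minimizer is t
t = np.array([0.5, 0.3, 0.2])
x = eg_minimize(lambda x: x - t, np.ones(3) / 3.0)
```

Because the update is multiplicative, iterates stay strictly inside the simplex, which is what makes EG natural for dual variables that are distributions over candidate structures.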
Learning the unified kernel machines for classification. In Proc. KDD, 2006.
Cited by 31 (13 self)
Abstract:
Kernel machines are among the state-of-the-art learning techniques for classification. In this paper, we propose a novel general framework for learning Unified Kernel Machines (UKM) from both labeled and unlabeled data. Our proposed framework integrates supervised learning, semi-supervised kernel learning, and active learning in a unified solution. Within the suggested framework, we focus in particular on designing a new semi-supervised kernel learning method, Spectral Kernel Learning (SKL), which is built on the principles of kernel target alignment and unsupervised kernel design. Our algorithm reduces to an equivalent quadratic programming problem that can be solved efficiently. Empirical results show that our method is more effective and robust than traditional approaches at learning semi-supervised kernels. Based on the framework, we present a specific paradigm of unified kernel machines with respect to Kernel Logistic Regression (KLR), namely Unified Kernel Logistic Regression (UKLR). We evaluate our proposed UKLR classification scheme against traditional solutions; the promising results show that the UKLR paradigm is more effective than traditional classification approaches.
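Kernel target alignment, one of the two principles SKL builds on, has a compact standard definition that can be sketched directly: the normalized Frobenius inner product between a kernel matrix and the ideal kernel yyᵀ induced by the labels. The helper below is an illustrative implementation of that score, not the authors' code.

```python
# Sketch of kernel-target alignment: how well a kernel matrix K matches
# the ideal kernel y y^T induced by labels y in {-1, +1}. Illustrative
# helper, not the authors' implementation.
import numpy as np

def kernel_target_alignment(K, y):
    """Returns <K, yy^T>_F / (||K||_F * ||yy^T||_F), in [-1, 1]."""
    Kyy = np.outer(y, y)
    num = np.sum(K * Kyy)                           # Frobenius inner product
    den = np.linalg.norm(K) * np.linalg.norm(Kyy)   # Frobenius norms
    return num / den

# toy usage: the ideal kernel aligns perfectly with itself
y = np.array([1, 1, -1, -1])
K_ideal = np.outer(y, y).astype(float)
a = kernel_target_alignment(K_ideal, y)  # → 1.0
```

Maximizing this score over a parameterized family of kernels (e.g. spectral combinations of graph-Laplacian eigenvectors) is the kind of objective that reduces to the quadratic program mentioned in the abstract.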
Face liveness detection from a single image with sparse low rank bilinear discriminative model. In Computer Vision–ECCV, 2010.
Cited by 30 (1 self)
Abstract:
Spoofing with a photograph or video is one of the most common ways to circumvent a face recognition system. In this paper, we present a real-time and non-intrusive method to address this, based on individual images from a generic web camera. The task is formulated as a binary classification problem in which the distributions of positive and negative samples overlap heavily in the input space, so a suitable representation space is of particular importance. Using the Lambertian model, we propose two strategies to extract the essential information about the different surface properties of a live human face or a photograph, in terms of latent samples. Based on these, we develop two new extensions to the sparse logistic regression model which allow quick and accurate spoof detection. Preliminary experiments on a large photo imposter database show that the proposed method gives preferable detection performance compared to others.
A Convex Optimization Approach to Modeling Consumer Heterogeneity in Conjoint Estimation
Cited by 23 (5 self)
Abstract:
We propose and test a new approach for modeling consumer heterogeneity in conjoint estimation, which extends individual-level methods based on convex optimization and statistical machine learning. We develop methods for both metric and choice data. Like HB, our methods shrink individual-level part-worth estimates towards a population mean. However, while HB samples from a posterior distribution that depends on exogenous parameters (the parameters of the second-stage priors), we minimize a convex loss function that depends on an endogenous parameter (determined from the calibration data using cross-validation). As a result, the amounts of shrinkage differ between the two approaches, leading to different estimation accuracies. Comparisons based on simulations as well as empirical data sets suggest that the new approach overall outperforms standard HB (i.e., with relatively diffuse second-stage priors) with both metric and choice data.
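For metric data, the shrinkage idea can be sketched as a convex least-squares problem solved by block-coordinate descent: per-respondent ridge fits pulled toward a shared population mean, with the weight `gamma` standing in for the endogenous parameter that would be set by cross-validation. The loss and all names below are simplified illustrations, not the authors' method.

```python
# Illustrative sketch: individual-level part-worths shrunk toward a
# population mean via a convex squared loss. The real approach selects
# gamma by cross-validation; here it is fixed for simplicity.
import numpy as np

def fit_heterogeneous(Xs, ys, gamma=1.0, n_iter=200):
    """Xs, ys: per-respondent design matrices and metric responses.
    Alternates per-respondent ridge solves toward the population mean
    with recomputing that mean (block-coordinate descent)."""
    d = Xs[0].shape[1]
    W = np.zeros((len(Xs), d))
    w_bar = np.zeros(d)
    for _ in range(n_iter):
        for i, (X, y) in enumerate(zip(Xs, ys)):
            A = X.T @ X + gamma * np.eye(d)
            W[i] = np.linalg.solve(A, X.T @ y + gamma * w_bar)
        w_bar = W.mean(axis=0)  # population mean of part-worths
    return W, w_bar

# toy usage: two respondents with opposing part-worths get shrunk together
Xs = [np.vstack([np.eye(2)] * 5)] * 2
ys = [Xs[0] @ np.array([2.0, 0.0]), Xs[1] @ np.array([0.0, 2.0])]
W, w_bar = fit_heterogeneous(Xs, ys, gamma=100.0)
```

Larger `gamma` pulls the individual estimates closer to the pooled mean, which is the convex analogue of tighter second-stage priors in HB.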
Fast algorithms for large scale conditional 3D prediction. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
Cited by 23 (4 self)
Abstract:
The potential success of discriminative learning approaches to 3D reconstruction relies on the ability to efficiently train predictive algorithms using sufficiently many examples that are representative of the typical configurations encountered in the application domain. Recent research indicates that sparse conditional Bayesian Mixture of Experts (cMoE) models (e.g. BME [21]) are adequate modeling tools that not only provide contextual 3D predictions for problems like human pose reconstruction, but can also represent the multiple interpretations that result from depth ambiguities or occlusion. However, training conditional predictors requires sophisticated double-loop algorithms that scale unfavorably with the input dimension and the training set size, which so far has limited their usage to about 10,000 examples or less. In this paper we present large-scale algorithms, referred to as fBME, that combine forward feature selection and bound optimization in order to train probabilistic BME models with one order of magnitude more data (100,000 examples and up), more than one order of magnitude faster. We present several large-scale experiments, including monocular evaluation on the HumanEva dataset [19], demonstrating how the proposed methods overcome the scaling limitations of existing ones.