Results 1–10 of 66
Kernel Conditional Random Fields: Representation and Clique Selection
 in ICML
, 2004
Abstract

Cited by 73 (4 self)
Kernel conditional random fields (KCRFs) are introduced as a framework for discriminative modeling of graph-structured data. A representer theorem for conditional graphical models is given which shows how kernel conditional random fields arise from risk minimization procedures defined using Mercer kernels on labeled graphs. A procedure for greedily selecting cliques in the dual representation is then proposed, which allows sparse representations. By incorporating kernels and implicit feature spaces into conditional graphical models, the framework enables semi-supervised learning algorithms for structured data through the use of graph kernels.
Making Logistic Regression A Core Data Mining Tool: A Practical Investigation of Accuracy, Speed, and Simplicity
, 2004
Abstract

Cited by 33 (0 self)
Binary classification is a core data mining task. For large datasets or real-time applications, desirable classifiers are accurate, fast, and need no parameter tuning. We present a simple implementation of logistic regression that meets these requirements. A combination of regularization, truncated Newton methods, and iteratively reweighted least squares makes it faster and more accurate than modern SVM implementations, and relatively insensitive to parameters. It is robust to linear dependencies and some scaling problems, making most data preprocessing unnecessary.
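As a rough illustration of the combination this abstract describes (regularization plus Newton-style iteratively reweighted least squares), here is a minimal sketch of L2-regularized logistic regression fit by IRLS. The function name `logistic_irls` and all parameter choices are illustrative, not from the paper, and the paper's truncated-Newton refinements are omitted.

```python
import numpy as np

def logistic_irls(X, y, lam=1.0, iters=20):
    """L2-regularized logistic regression fit by iteratively
    reweighted least squares (Newton's method). Labels y in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))              # predicted probabilities
        W = p * (1.0 - p)                             # IRLS weights (Hessian diagonal)
        g = X.T @ (p - y) + lam * w                   # gradient of penalized loss
        H = X.T @ (X * W[:, None]) + lam * np.eye(d)  # penalized Hessian
        w -= np.linalg.solve(H, g)                    # Newton step
    return w
```

The ridge term lam * I also keeps the Hessian invertible under linear dependencies among features, which is one way to read the abstract's robustness claim.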
Exponentiated gradient algorithms for log-linear structured prediction
 In Proc. ICML
, 2007
Abstract

Cited by 29 (5 self)
Conditional log-linear models are a commonly used method for structured prediction. Efficient learning of parameters in these models is therefore an important problem. This paper describes an exponentiated gradient (EG) algorithm for training such models. EG is applied to the convex dual of the maximum likelihood objective; this results in both sequential and parallel update algorithms, where in the sequential algorithm parameters are updated in an online fashion. We provide a convergence proof for both algorithms. Our analysis also simplifies previous results on EG for max-margin models, and leads to a tighter bound on convergence rates. Experiments on a large-scale parsing task show that the proposed algorithm converges much faster than conjugate-gradient and L-BFGS approaches both in terms of optimization objective and test error.
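The core of any exponentiated gradient method is a multiplicative update on the probability simplex followed by renormalization. A minimal, generic sketch of one such step (this is not the paper's dual formulation for log-linear models; `eg_step` and the step size `eta` are illustrative):

```python
import numpy as np

def eg_step(u, grad, eta=0.5):
    """One exponentiated-gradient update on the probability simplex:
    u_i <- u_i * exp(-eta * grad_i), then renormalize to sum to 1."""
    v = u * np.exp(-eta * grad)
    return v / v.sum()
```

Iterating this on a fixed linear objective drives the weight vector toward the simplex vertex with the smallest cost, while every iterate stays a valid distribution, which is what makes EG natural for dual variables constrained to the simplex.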
A Discriminative Learning Framework with Pairwise Constraints for Video Object Classification
 In Proc. of CVPR
, 2004
Abstract

Cited by 27 (5 self)
In video object classification, insufficient labeled data may at times be easily augmented with pairwise constraints on sample points, i.e., whether they are in the same class or not. In this paper, we propose a discriminative learning approach which incorporates pairwise constraints into a conventional margin-based learning framework. The proposed approach offers several advantages over existing approaches dealing with pairwise constraints. First, as opposed to learning distance metrics, the new approach derives its classification power by directly modeling the decision boundary. Second, most previous work handles labeled data by converting it to pairwise constraints, which leads to much more computation. The proposed approach can handle pairwise constraints together with labeled data, so the computation is greatly reduced. The proposed approach is evaluated on a people classification task with two surveillance video datasets.
Learning the unified kernel machines for classification
 In Proc. KDD
, 2006
Abstract

Cited by 27 (14 self)
Kernel machines have been shown to be state-of-the-art learning techniques for classification. In this paper, we propose a novel general framework for learning Unified Kernel Machines (UKM) from both labeled and unlabeled data. Our proposed framework integrates supervised learning, semi-supervised kernel learning, and active learning in a unified solution. In the suggested framework, we particularly focus our attention on designing a new semi-supervised kernel learning method, i.e., Spectral Kernel Learning (SKL), which is built on the principles of kernel target alignment and unsupervised kernel design. Our algorithm is related to an equivalent quadratic programming problem that can be efficiently solved. Empirical results show that our method is more effective and robust than traditional approaches at learning semi-supervised kernels. Based on the framework, we present a specific paradigm of unified kernel machines with respect to Kernel Logistic Regression (KLR), i.e., Unified Kernel Logistic Regression (UKLR). We evaluate our proposed UKLR classification scheme in comparison with traditional solutions. The promising results show that our proposed UKLR paradigm is more effective than traditional classification approaches.
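Kernel target alignment, one of the two principles SKL builds on, measures how well a kernel matrix matches the ideal target kernel yy^T built from the labels. A minimal sketch of the standard alignment score (the function name and the {-1, +1} label convention are assumptions, not taken from the paper):

```python
import numpy as np

def kernel_target_alignment(K, y):
    """Frobenius-inner-product alignment between a kernel matrix K and
    the ideal target kernel y y^T, for labels y in {-1, +1}.
    Returns a value in [-1, 1]; 1 means K perfectly matches the labels."""
    Y = np.outer(y, y)
    return (K * Y).sum() / (np.linalg.norm(K) * np.linalg.norm(Y))
```

A semi-supervised kernel learner can then score candidate kernels (e.g. spectral transformations of a graph kernel) by their alignment on the labeled subset.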
Discriminative batch mode active learning
, 2007
Abstract

Cited by 23 (2 self)
Active learning sequentially selects unlabeled instances to label, with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at a time, retraining in each iteration. Recently a few batch-mode active learning approaches have been proposed that select a set of the most informative unlabeled instances in each iteration under the guidance of heuristic scores. In this paper, we propose a discriminative batch-mode active learning approach that formulates the instance selection task as a continuous optimization problem over auxiliary instance selection variables. The optimization is formulated to maximize the discriminative classification performance of the target classifier, while also taking the unlabeled data into account. Although the objective is not convex, we can employ a quasi-Newton method to obtain a good local solution. Our empirical studies on UCI datasets show that the proposed active learning approach is more effective than current state-of-the-art batch-mode active learning algorithms.
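For context, the heuristic-score baseline this abstract contrasts with can be as simple as ranking unlabeled instances by predictive entropy under the current classifier and taking the top k. A sketch of that baseline (not the paper's discriminative optimization; `select_batch` is an illustrative name):

```python
import numpy as np

def select_batch(probs, k):
    """Heuristic batch-mode selection: given an (n, classes) array of
    predicted class probabilities for the unlabeled pool, return the
    indices of the k instances with highest predictive entropy."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(-entropy)[:k]
```

The weakness the paper targets is visible here: the score treats each instance independently, so a batch chosen this way can be highly redundant.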
Computational development of ψ-learning
 In Proc. 2005 SIAM Int. Conf. Data Mining
, 2005
Abstract

Cited by 14 (7 self)
unseen outcome via relevant knowledge gained from data, where accuracy of generalization is the key. In the context of classification, we argue that higher generalization accuracy is achievable via ψ-learning, when a certain class of non-convex rather than convex cost functions is employed. To deliver the attainable higher generalization accuracy, we propose two computational strategies via a global optimization technique, difference convex programming, which relies on a decomposition of the cost function into a difference of two convex functions. The first strategy solves sequential quadratic programs. The second strategy, combining this with the method of Branch-and-Bound, is more computationally intensive but is capable of producing global optima. Numerical experiments suggest that the algorithms realize the desired generalization ability of ψ-learning.
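A difference-of-convex (DC) algorithm iterates by linearizing the subtracted convex part at the current point and minimizing the resulting convex surrogate. A toy one-dimensional sketch of this scheme (not the paper's ψ-learning objective; the function f(x) = x^4 - 2x^2 and its decomposition are chosen purely for illustration):

```python
def dca(x0, iters=50):
    """DC iteration for f(x) = x^4 - 2x^2, written as g(x) = x^4
    (convex) minus h(x) = 2x^2 (convex). Each step linearizes h at
    x_k and minimizes the convex surrogate g(x) - h'(x_k) * x, i.e.
    x^4 - 4*x_k*x, whose stationarity condition 4x^3 = 4x_k gives
    the closed-form update x_{k+1} = cbrt(x_k)."""
    x = x0
    for _ in range(iters):
        x = x ** (1.0 / 3.0) if x >= 0 else -((-x) ** (1.0 / 3.0))
    return x
```

Each surrogate is convex, so every subproblem is tractable, and the iterates monotonically decrease f; here they converge to the local minimizers x = ±1. The Branch-and-Bound layer the abstract mentions is what would be needed to certify a global optimum.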
On l1-norm multiclass support vector machines: Methodology and theory
 Journal of the American Statistical Association
Abstract

Cited by 14 (3 self)
Binary Support Vector Machines (SVMs) have proven effective in classification. However, problems remain with respect to feature selection in multiclass classification. This article proposes a novel multiclass SVM, which performs classification and feature selection simultaneously via L1-norm penalized sparse representations. The proposed methodology, together with our developed regularization solution path, permits feature selection within the framework of classification. The operational characteristics of the proposed methodology are examined via both simulated and benchmark examples, and compared to some competitors in terms of accuracy of prediction and feature selection. The numerical results suggest that the proposed methodology is highly competitive.
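To illustrate how an L1 penalty yields classification and feature selection in one step, here is a hedged sketch of a sparse linear classifier trained with a squared-hinge loss via proximal gradient descent (ISTA), where the soft-thresholding proximal operator zeroes out uninformative weights. This is a generic binary stand-in, not the article's multiclass SVM or its solution-path algorithm; all names and parameters are illustrative.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of the L1 norm: shrink toward zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def l1_svm_ista(X, y, lam=0.1, eta=0.01, iters=500):
    """Sparse linear classifier: squared-hinge loss + lam * ||w||_1,
    fit by proximal gradient (ISTA). Labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        m = y * (X @ w)                  # margins
        active = m < 1                   # examples violating the margin
        g = -2 * (X[active] * (y[active] * (1 - m[active]))[:, None]).sum(0)
        w = soft_threshold(w - eta * g, eta * lam)  # gradient + prox step
    return w
```

Features whose gradient contribution never outweighs the threshold stay exactly zero, which is the mechanism behind simultaneous feature selection.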
Fast algorithms for large scale conditional 3D prediction
 In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR)
, 2008
Abstract

Cited by 13 (3 self)
The potential success of discriminative learning approaches to 3D reconstruction relies on the ability to efficiently train predictive algorithms using sufficiently many examples that are representative of the typical configurations encountered in the application domain. Recent research indicates that sparse conditional Bayesian Mixture of Experts (cMoE) models (e.g. BME [21]) are adequate modeling tools that not only provide contextual 3D predictions for problems like human pose reconstruction, but can also represent multiple interpretations that result from depth ambiguities or occlusion. However, training conditional predictors requires sophisticated double-loop algorithms that scale unfavorably with the input dimension and the training set size, which has so far limited their usage to 10,000 examples or less. In this paper we present large-scale algorithms, referred to as fBME, that combine forward feature selection and bound optimization in order to train probabilistic BME models with one order of magnitude more data (100,000 examples and up) and more than one order of magnitude faster. We present several large-scale experiments, including monocular evaluation on the HumanEva dataset [19], demonstrating how the proposed methods overcome the scaling limitations of existing ones.
Regression and Classification with Regularization
, 2002
Abstract

Cited by 12 (6 self)
The purpose of this chapter is to present a theoretical framework for the problem of learning from examples. Learning from examples can be regarded [13] as the problem of approximating a multivariate function from sparse data. The function can be real valued as in regression or binary valued as in classification. The problem of approximating a function from sparse data is ill-posed and a classical solution is regularization theory [19]. Regularization theory, as we will consider here, formulates the regression problem as a variational problem of finding the function f that minimizes the functional

H[f] = \frac{1}{\ell} \sum_{i=1}^{\ell} V(y_i, f(x_i)) + \lambda \|f\|_K^2   (6.1)

where V(\cdot,\cdot) is a loss function (in the classical formulation the square loss was used), \|f\|_K is a norm in a Reproducing Kernel Hilbert Space (RKHS) H defined by the positive definite function K, \ell is the number of data points or examples (the \ell training pairs (x_i, y_i)), and \lambda is the regularization parameter. Under rather general conditions [14, 22, ...
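With the square loss, the representer theorem reduces the variational problem above to a finite linear system: the minimizer is f(x) = Σ_i c_i K(x, x_i) with coefficients solving (K + λℓI)c = y. A minimal numerical sketch with an RBF kernel (the function name, kernel choice, and parameters are illustrative, not from the chapter):

```python
import numpy as np

def kernel_ridge(X, y, lam=0.1, gamma=1.0):
    """Minimize (1/l) * sum_i (y_i - f(x_i))^2 + lam * ||f||_K^2 with an
    RBF kernel K(x, x') = exp(-gamma * ||x - x'||^2). By the representer
    theorem f(x) = sum_i c_i K(x, x_i), with c solving
    (K + lam * l * I) c = y."""
    l = len(y)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    K = np.exp(-gamma * sq)                              # RBF Gram matrix
    c = np.linalg.solve(K + lam * l * np.eye(l), y)      # regularized linear system
    return K, c
```

As lam goes to zero the fitted values K @ c approach the training targets, and larger lam trades fit for a smaller RKHS norm, which is exactly the balance the functional (6.1) encodes.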