Results 1–10 of 3,137
A Study on L2-Loss (Squared Hinge-Loss) Multi-Class SVM
"... Crammer and Singer’s method is one of the most popular multiclass SVMs. It considers L1 loss (hinge loss) in a complicated optimization problem. In SVM, squared hinge loss (L2 loss) is a common alternative to L1 loss, but surprisingly we have not seen any paper studying details of Crammer and Singe ..."
Distance metric learning for large margin nearest neighbor classification
In NIPS, 2006
"... We show how to learn a Mahalanobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven ..."
Cited by 695 (14 self)
convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.
Greedy Function Approximation: A Gradient Boosting Machine
Annals of Statistics, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additi ..."
Cited by 1000 (13 self)
for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multi-class logistic likelihood for classification. Special enhancements are derived for the particular case where the individual
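The abstract above describes the boosting recipe concretely enough to sketch: at each stage, fit a weak learner to the negative gradient of the loss and add it to the expansion. The sketch below uses the least-squares criterion with a depth-one stump; the stump learner, learning rate, round count, and toy dataset are illustrative choices, not specifics from the paper.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression stump (one threshold, two leaf means) to residuals r."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda z: np.where(z <= t, lo, hi)

def boost(x, y, rounds=50, lr=0.1):
    """Stagewise additive expansion: for squared loss, the negative
    gradient at each stage is simply the current residual."""
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(rounds):
        residual = y - pred              # negative gradient of 0.5*(y - F)^2
        h = fit_stump(x, residual)
        pred = pred + lr * h(x)          # take a small step along the fitted direction
    return pred

x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)
pred = boost(x, y)
print(float(np.mean((y - pred) ** 2)))   # training MSE well below the variance of y
```

Swapping the residual for the gradient of another loss (absolute deviation, Huber) changes only the `residual =` line, which is the point of the general paradigm.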
The Performance of TCP/IP for Networks with High Bandwidth-Delay Products and Random Loss.
IEEE/ACM Trans. Networking, 1997
"... This paper examines the performance of TCP/IP, the Internet data transport protocol, over wide-area networks (WANs) in which data traffic could coexist with real-time traffic such as voice and video. Specifically, we attempt to develop a basic understanding, using analysis and simulation, ..."
Cited by 465 (6 self)
The following key results are obtained. First, random loss leads to significant throughput deterioration when the product of the loss probability and the square of the bandwidth-delay product is larger than one. Second, for multiple connections sharing a bottleneck link, TCP is grossly unfair toward connections
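The first key result gives a crisp threshold: degradation sets in once the loss probability times the square of the bandwidth-delay product exceeds one. A quick numerical check, with hypothetical link parameters that are not from the paper:

```python
# Bandwidth-delay product in packets (made-up link parameters for illustration).
bandwidth_pps = 10_000        # link rate, packets per second
rtt_s = 0.1                   # round-trip time, seconds
bdp = bandwidth_pps * rtt_s   # packets "in flight" on the path

# Sweep random-loss probabilities and test the paper's condition p * BDP^2 > 1.
for loss_prob in (1e-7, 1e-6, 1e-5):
    severe = loss_prob * bdp ** 2 > 1
    print(f"p={loss_prob:g}: p*BDP^2={loss_prob * bdp ** 2:g}, severe degradation: {severe}")
```

The sweep shows why fat, long pipes are fragile: at BDP = 1000 packets, even a one-in-a-hundred-thousand random loss rate crosses the threshold.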
The Dantzig selector: statistical estimation when p is much larger than n
2005
"... In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y = Ax + z, where x ∈ R^p is a parameter vector of interest, A is a data matrix with possibly far fewer rows than columns, n ≪ ..."
Cited by 879 (14 self)
‖x̂ − x‖²_ℓ2 ≤ C² · 2 log p · (σ² + ∑_i min(x_i², σ²)). Our results are non-asymptotic and we give values for the constant C. In short, our estimator achieves a loss within a logarithmic factor of the ideal mean squared error one would achieve with an oracle which would supply perfect information
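The oracle benchmark inside that bound, ∑_i min(x_i², σ²), is easy to compute directly; the sparse vector below is an illustrative example, not data from the paper.

```python
import numpy as np

def ideal_risk(x, sigma):
    """Oracle mean-squared-error benchmark: sum_i min(x_i^2, sigma^2).
    An oracle keeps coordinate i only when |x_i| exceeds the noise level,
    paying sigma^2 in variance for kept coordinates and x_i^2 in
    squared bias for dropped ones."""
    return float(np.sum(np.minimum(x ** 2, sigma ** 2)))

x = np.zeros(1000)
x[:10] = 5.0        # 10 large coefficients, the rest exactly sparse
sigma = 1.0
print(ideal_risk(x, sigma))   # only the 10 above-noise coordinates contribute: 10.0
```

The Dantzig selector's guarantee says its loss matches this oracle quantity up to the C² · 2 log p factor, without knowing which coordinates are large.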
Online passive-aggressive algorithms
 JMLR
, 2006
"... We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the non-realizable case. The end result is new alg ..."
Cited by 435 (24 self)
algorithms and accompanying loss bounds for hinge-loss regression and uniclass. We also get refined loss bounds for previously studied classification algorithms.
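As a rough sketch of one member of this family: the PA-I classification update suffers the hinge loss, then moves the weights just enough to restore a margin of 1, with the step capped by an aggressiveness parameter C. The example vector and C value below are made up for illustration.

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One passive-aggressive (PA-I style) step for binary classification.
    Passive when the margin constraint already holds; aggressive otherwise,
    taking the smallest step that fixes the violation, capped by C."""
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss == 0.0:
        return w                              # passive: no change
    tau = min(C, loss / np.dot(x, x))         # capped closed-form step size
    return w + tau * y * x

w = np.zeros(2)
x = np.array([1.0, 2.0])
w = pa_update(w, x, +1)
print(w, max(0.0, 1.0 - np.dot(w, x)))        # hinge loss on this example is now zero
```

When the cap C does not bind, the updated weights satisfy y·(w·x) = 1 exactly, which is the "just enough" character of the update.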
Clustering with Bregman Divergences
Journal of Machine Learning Research, 2005
"... A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergence ..."
Cited by 443 (57 self)
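A key property behind such algorithms is that for any Bregman divergence, the best cluster representative is the plain arithmetic mean, so hard clustering looks exactly like k-means with only the assignment distance swapped. A minimal sketch using relative entropy as the divergence; the toy probability vectors, deterministic initialization, and fixed iteration count are illustrative choices.

```python
import numpy as np

def kl(p, q):
    """Relative entropy, the Bregman divergence generated by negative entropy."""
    return float(np.sum(p * np.log(p / q)))

def bregman_kmeans(points, k, divergence, steps=20):
    """Hard Bregman clustering: assign each point to the center with the
    smallest divergence; re-estimate each center as the arithmetic mean
    (optimal for every Bregman divergence)."""
    centers = points[:k].copy()               # simple deterministic init
    for _ in range(steps):
        labels = np.array([np.argmin([divergence(p, c) for c in centers])
                           for p in points])
        centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Two groups of strictly positive probability vectors (hypothetical data).
a = np.array([[0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.75, 0.15, 0.10]])
b = np.array([[0.10, 0.10, 0.80], [0.10, 0.20, 0.70], [0.15, 0.10, 0.75]])
labels, _ = bregman_kmeans(np.vstack([a, b]), k=2, divergence=kl)
print(labels)   # the two groups land in different clusters
```

Replacing `kl` with squared Euclidean distance recovers ordinary k-means, which is what makes the Bregman family a strict generalization.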
Climate change, coral bleaching and the future of the world’s coral reefs
Marine and Freshwater Research, 1999
"... Sea temperatures in the tropics have increased by almost 1°C over the past 100 years and are currently increasing at the rate of approximately 1–2°C per century. Reef-building corals, which are central to healthy coral reefs, are currently living close to their thermal maxima. They become stressed i ..."
Cited by 428 (16 self)
the coral host. Corals tend to die in great numbers immediately following coral bleaching events, which may stretch across thousands of square kilometers of ocean. Bleaching events in 1998, the worst year on record, saw the complete loss of live coral in some parts of the world. This paper reviews our
How to Use Expert Advice
Journal of the Association for Computing Machinery, 1997
"... We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the ..."
Cited by 377 (79 self)
is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance
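The multiplicative-weights idea behind such bounds can be sketched in a few lines: predict with the weighted majority, then exponentially down-weight the experts that erred. The learning rate η, expert count, and synthetic data below are illustrative choices, not the paper's tuned parameters.

```python
import numpy as np

def exp_weights(expert_preds, outcomes, eta=0.5):
    """Weighted-majority prediction with exponential weight updates.
    Experts that err lose weight by a factor exp(-eta), so the master's
    mistake count tracks that of the best expert in hindsight."""
    w = np.ones(expert_preds.shape[0])
    mistakes = 0
    for t, y in enumerate(outcomes):
        vote = np.dot(w, expert_preds[:, t]) / w.sum()   # weighted average in [0, 1]
        guess = 1 if vote >= 0.5 else 0
        mistakes += int(guess != y)
        w *= np.exp(-eta * (expert_preds[:, t] != y))    # penalize wrong experts
    return mistakes

rng = np.random.default_rng(1)
T = 200
outcomes = rng.integers(0, 2, T)
preds = rng.integers(0, 2, (5, T))     # four experts guess at random...
preds[0] = outcomes                    # ...and one (illustrative) expert is perfect
print(exp_weights(preds, outcomes))    # far fewer mistakes than random guessing
```

With a perfect expert present, each master mistake shrinks the total weight by a constant factor while the best expert's weight never drops, which is the mechanism behind the logarithmic-in-N mistake bounds.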
∇_w J = −2 ∑_i max(0, 1 − y_i x_iᵀw) y_i x_i
"... Given the training examples {x_i, y_i}, the squared hinge loss is written as: J = ∑_{i=1}^n max(0, 1 − y_i x_iᵀw)² and its gradient is: ∇_w J = −2 ∑_{i=1}^n max(0, 1 − y_i x_iᵀw) y_i x_i ..."
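With both the loss and its gradient written out, the pair is easy to verify with a finite-difference check; the random data below is illustrative.

```python
import numpy as np

def loss(w, X, y):
    """Squared hinge loss J = sum_i max(0, 1 - y_i x_i^T w)^2."""
    return float(np.sum(np.maximum(0.0, 1.0 - y * (X @ w)) ** 2))

def grad(w, X, y):
    """Closed-form gradient: -2 * sum_i max(0, 1 - y_i x_i^T w) * y_i * x_i."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w))
    return -2.0 * X.T @ (slack * y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sign(rng.normal(size=20))       # labels in {-1, +1}
w = rng.normal(size=3)

# Central finite differences along each coordinate axis.
eps = 1e-6
g = grad(w, X, y)
num = np.array([(loss(w + eps * e, X, y) - loss(w - eps * e, X, y)) / (2 * eps)
                for e in np.eye(3)])
print(np.max(np.abs(g - num)))         # analytic and numeric gradients agree closely
```

Unlike the plain hinge, the squared hinge is continuously differentiable, so the finite-difference check is well behaved even for examples sitting near the margin.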