Results 1–10 of 3,137
A Study on L2-Loss (Squared Hinge-Loss) Multi-Class SVM
"... Crammer and Singer’s method is one of the most popular multiclass SVMs. It considers L1 loss (hinge loss) in a complicated optimization problem. In SVM, squared hinge loss (L2 loss) is a common alternative to L1 loss, but surprisingly we have not seen any paper studying details of Crammer and Singe ..."
Distance metric learning for large margin nearest neighbor classification
In NIPS, 2006
"... We show how to learn a Mahalanobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven ..."
Cited by 695 (14 self)
convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.
Greedy Function Approximation: A Gradient Boosting Machine
Annals of Statistics, 2000
"... Function approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient-descent "boosting" paradigm is developed for additi ..."
Cited by 1000 (13 self)
for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least-absolute-deviation, and Huber-M loss functions for regression, and multi-class logistic likelihood for classification. Special enhancements are derived for the particular case where the individual
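The abstract above describes the boosting recipe concretely enough to sketch: at each stage, fit a weak learner to the negative gradient of the loss and add it to the expansion. The sketch below uses the least-squares criterion with a depth-one stump; the stump learner, learning rate, round count, and toy dataset are illustrative choices, not specifics from the paper.

```python
import numpy as np

def fit_stump(x, r):
    """Fit a depth-1 regression stump (one threshold, two leaf means) to residuals r."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda z: np.where(z <= t, lo, hi)

def boost(x, y, rounds=50, lr=0.1):
    """Stagewise additive expansion: for squared loss, the negative
    gradient at each stage is simply the current residual."""
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(rounds):
        residual = y - pred              # negative gradient of 0.5*(y - F)^2
        h = fit_stump(x, residual)
        pred = pred + lr * h(x)          # take a small step along the fitted direction
    return pred

x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x)
pred = boost(x, y)
print(float(np.mean((y - pred) ** 2)))   # training MSE well below the variance of y
```

Swapping the residual for the gradient of another loss (absolute deviation, Huber) changes only the `residual =` line, which is the point of the general paradigm.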
The Performance of TCP/IP for Networks with High Bandwidth-Delay Products and Random Loss.
IEEE/ACM Trans. Networking, 1997
"... This paper examines the performance of TCP/IP, the Internet data transport protocol, over wide-area networks (WANs) in which data traffic could coexist with real-time traffic such as voice and video. Specifically, we attempt to develop a basic understanding, using analysis and simulation, ..."
Cited by 465 (6 self)
The following key results are obtained. First, random loss leads to significant throughput deterioration when the product of the loss probability and the square of the bandwidth-delay product is larger than one. Second, for multiple connections sharing a bottleneck link, TCP is grossly unfair toward connections
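The first key result gives a crisp threshold: degradation sets in once the loss probability times the square of the bandwidth-delay product exceeds one. A quick numerical check, with hypothetical link parameters that are not from the paper:

```python
# Bandwidth-delay product in packets (made-up link parameters for illustration).
bandwidth_pps = 10_000        # link rate, packets per second
rtt_s = 0.1                   # round-trip time, seconds
bdp = bandwidth_pps * rtt_s   # packets "in flight" on the path

# Sweep random-loss probabilities and test the paper's condition p * BDP^2 > 1.
for loss_prob in (1e-7, 1e-6, 1e-5):
    severe = loss_prob * bdp ** 2 > 1
    print(f"p={loss_prob:g}: p*BDP^2={loss_prob * bdp ** 2:g}, severe degradation: {severe}")
```

The sweep shows why fat, long pipes are fragile: at BDP = 1000 packets, even a one-in-a-hundred-thousand random loss rate crosses the threshold.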
The Dantzig selector: statistical estimation when p is much larger than n
2005
"... In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y = Ax + z, where x ∈ R^p is a parameter vector of interest, A is a data matrix with possibly far fewer rows than columns, n ≪ ..."
Cited by 879 (14 self)
‖x̂ − x‖²_ℓ2 ≤ C² · 2 log p · (σ² + ∑_i min(x_i², σ²)). Our results are non-asymptotic and we give values for the constant C. In short, our estimator achieves a loss within a logarithmic factor of the ideal mean squared error one would achieve with an oracle which would supply perfect information
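The oracle benchmark inside that bound, ∑_i min(x_i², σ²), is easy to compute directly; the sparse vector below is an illustrative example, not data from the paper.

```python
import numpy as np

def ideal_risk(x, sigma):
    """Oracle mean-squared-error benchmark: sum_i min(x_i^2, sigma^2).
    An oracle keeps coordinate i only when |x_i| exceeds the noise level,
    paying sigma^2 in variance for kept coordinates and x_i^2 in
    squared bias for dropped ones."""
    return float(np.sum(np.minimum(x ** 2, sigma ** 2)))

x = np.zeros(1000)
x[:10] = 5.0        # 10 large coefficients, the rest exactly sparse
sigma = 1.0
print(ideal_risk(x, sigma))   # only the 10 above-noise coordinates contribute: 10.0
```

The Dantzig selector's guarantee says its loss matches this oracle quantity up to the C² · 2 log p factor, without knowing which coordinates are large.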
Online passive-aggressive algorithms
 JMLR
, 2006
"... We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the non-realizable case. The end result is new alg ..."
Cited by 435 (24 self)
algorithms and accompanying loss bounds for hinge-loss regression and uniclass. We also get refined loss bounds for previously studied classification algorithms.
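As a rough sketch of one member of this family: the PA-I classification update suffers the hinge loss, then moves the weights just enough to restore a margin of 1, with the step capped by an aggressiveness parameter C. The example vector and C value below are made up for illustration.

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One passive-aggressive (PA-I style) step for binary classification.
    Passive when the margin constraint already holds; aggressive otherwise,
    taking the smallest step that fixes the violation, capped by C."""
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss == 0.0:
        return w                              # passive: no change
    tau = min(C, loss / np.dot(x, x))         # capped closed-form step size
    return w + tau * y * x

w = np.zeros(2)
x = np.array([1.0, 2.0])
w = pa_update(w, x, +1)
print(w, max(0.0, 1.0 - np.dot(w, x)))        # hinge loss on this example is now zero
```

When the cap C does not bind, the updated weights satisfy y·(w·x) = 1 exactly, which is the "just enough" character of the update.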
Clustering with Bregman Divergences
Journal of Machine Learning Research, 2005
"... A wide variety of distortion functions are used for clustering, e.g., squared Euclidean distance, Mahalanobis distance and relative entropy. In this paper, we propose and analyze parametric hard and soft clustering algorithms based on a large class of distortion functions known as Bregman divergence ..."
Cited by 443 (57 self)
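A key property behind such algorithms is that for any Bregman divergence, the best cluster representative is the plain arithmetic mean, so hard clustering looks exactly like k-means with only the assignment distance swapped. A minimal sketch using relative entropy as the divergence; the toy probability vectors, deterministic initialization, and fixed iteration count are illustrative choices.

```python
import numpy as np

def kl(p, q):
    """Relative entropy, the Bregman divergence generated by negative entropy."""
    return float(np.sum(p * np.log(p / q)))

def bregman_kmeans(points, k, divergence, steps=20):
    """Hard Bregman clustering: assign each point to the center with the
    smallest divergence; re-estimate each center as the arithmetic mean
    (optimal for every Bregman divergence)."""
    centers = points[:k].copy()               # simple deterministic init
    for _ in range(steps):
        labels = np.array([np.argmin([divergence(p, c) for c in centers])
                           for p in points])
        centers = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# Two groups of strictly positive probability vectors (hypothetical data).
a = np.array([[0.80, 0.10, 0.10], [0.70, 0.20, 0.10], [0.75, 0.15, 0.10]])
b = np.array([[0.10, 0.10, 0.80], [0.10, 0.20, 0.70], [0.15, 0.10, 0.75]])
labels, _ = bregman_kmeans(np.vstack([a, b]), k=2, divergence=kl)
print(labels)   # the two groups land in different clusters
```

Replacing `kl` with squared Euclidean distance recovers ordinary k-means, which is what makes the Bregman family a strict generalization.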
Climate change, coral bleaching and the future of the world’s coral reefs
Marine and Freshwater Research, 1999
"... Sea temperatures in the tropics have increased by almost 1°C over the past 100 years and are currently increasing at the rate of approximately 1–2°C per century. Reef-building corals, which are central to healthy coral reefs, are currently living close to their thermal maxima. They become stressed i ..."
Cited by 428 (16 self)
the coral host. Corals tend to die in great numbers immediately following coral bleaching events, which may stretch across thousands of square kilometers of ocean. Bleaching events in 1998, the worst year on record, saw the complete loss of live coral in some parts of the world. This paper reviews our
How to Use Expert Advice
Journal of the Association for Computing Machinery, 1997
"... We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the ..."
Cited by 377 (79 self)
is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance
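The multiplicative-weights idea behind such bounds can be sketched in a few lines: predict with the weighted majority, then exponentially down-weight the experts that erred. The learning rate η, expert count, and synthetic data below are illustrative choices, not the paper's tuned parameters.

```python
import numpy as np

def exp_weights(expert_preds, outcomes, eta=0.5):
    """Weighted-majority prediction with exponential weight updates.
    Experts that err lose weight by a factor exp(-eta), so the master's
    mistake count tracks that of the best expert in hindsight."""
    w = np.ones(expert_preds.shape[0])
    mistakes = 0
    for t, y in enumerate(outcomes):
        vote = np.dot(w, expert_preds[:, t]) / w.sum()   # weighted average in [0, 1]
        guess = 1 if vote >= 0.5 else 0
        mistakes += int(guess != y)
        w *= np.exp(-eta * (expert_preds[:, t] != y))    # penalize wrong experts
    return mistakes

rng = np.random.default_rng(1)
T = 200
outcomes = rng.integers(0, 2, T)
preds = rng.integers(0, 2, (5, T))     # four experts guess at random...
preds[0] = outcomes                    # ...and one (illustrative) expert is perfect
print(exp_weights(preds, outcomes))    # far fewer mistakes than random guessing
```

With a perfect expert present, each master mistake shrinks the total weight by a constant factor while the best expert's weight never drops, which is the mechanism behind the logarithmic-in-N mistake bounds.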
∇_w J = −2 ∑_i max(0, 1 − y_i x_iᵀw) y_i x_i
"... Given the training examples {x_i, y_i}, the squared hinge loss is written as: J = ∑_{i=1}^n max(0, 1 − y_i x_iᵀw)² and its gradient is: ∇_w J = −2 ∑_{i=1}^n max(0, 1 − y_i x_iᵀw) y_i x_i ..."
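With both the loss and its gradient written out, the pair is easy to verify with a finite-difference check; the random data below is illustrative.

```python
import numpy as np

def loss(w, X, y):
    """Squared hinge loss J = sum_i max(0, 1 - y_i x_i^T w)^2."""
    return float(np.sum(np.maximum(0.0, 1.0 - y * (X @ w)) ** 2))

def grad(w, X, y):
    """Closed-form gradient: -2 * sum_i max(0, 1 - y_i x_i^T w) * y_i * x_i."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w))
    return -2.0 * X.T @ (slack * y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.sign(rng.normal(size=20))       # labels in {-1, +1}
w = rng.normal(size=3)

# Central finite differences along each coordinate axis.
eps = 1e-6
g = grad(w, X, y)
num = np.array([(loss(w + eps * e, X, y) - loss(w - eps * e, X, y)) / (2 * eps)
                for e in np.eye(3)])
print(np.max(np.abs(g - num)))         # analytic and numeric gradients agree closely
```

Unlike the plain hinge, the squared hinge is continuously differentiable, so the finite-difference check is well behaved even for examples sitting near the margin.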