Results 1–10 of 1,152
Training Linear SVMs in Linear Time
, 2006
"... Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n ..."
Cited by 549 (6 self)
as well as a large number of features N, while each example has only s ≪ N nonzero features. This paper presents a Cutting-Plane Algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems. The algorithm
Pegasos: Primal Estimated sub-GrAdient SOlver for SVM
"... We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a singl ..."
Cited by 542 (20 self)
runtime of our method is Õ(d/(λɛ)), where d is a bound on the number of nonzero features in each example. Since the runtime does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non
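The stochastic subgradient step this abstract describes can be sketched in a few lines (a minimal illustration of a Pegasos-style update on the hinge-loss SVM objective; function and variable names are my own, not the authors' code):

```python
import numpy as np

def pegasos_step(w, x, y, lam, t):
    """One stochastic subgradient step with learning rate 1/(lam*t)."""
    eta = 1.0 / (lam * t)
    if y * np.dot(w, x) < 1.0:              # margin violated: hinge subgradient
        return (1.0 - eta * lam) * w + eta * y * x
    return (1.0 - eta * lam) * w            # margin satisfied: shrink only

def pegasos(X, y, lam=0.1, epochs=20, seed=0):
    """Run Pegasos-style SGD over the dataset for a few epochs."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            w = pegasos_step(w, X[i], y[i], lam, t)
    return w
```

Because each step touches a single example, the cost per iteration depends only on that example's nonzero features, which is the source of the Õ(d/(λɛ)) runtime claim.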
Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals
, 1996
"... Abstract. Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these op ..."
Cited by 860 (11 self)
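The N-dimensional generalization this abstract describes can be illustrated with a naive sketch that aggregates a measure over every subset of the dimensions, using a placeholder value for dimensions that are aggregated away (an illustrative toy, not the relational operator itself; names are hypothetical):

```python
from itertools import combinations
from collections import defaultdict

ALL = "ALL"  # placeholder for a dimension that has been aggregated away

def data_cube(rows, dims, measure):
    """Naive CUBE: sum `measure` over every subset of `dims`.
    Each row is a dict; returns {tuple of dim values (or ALL): total}."""
    cube = defaultdict(float)
    subsets = (c for r in range(len(dims) + 1) for c in combinations(dims, r))
    for subset in subsets:
        for row in rows:
            key = tuple(row[d] if d in subset else ALL for d in dims)
            cube[key] += row[measure]
    return dict(cube)
```

With N dimensions this produces all 2^N group-bys at once, which is exactly the explosion of aggregates the CUBE operator was introduced to express in a single query.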
Loopy belief propagation for approximate inference: An empirical study. In:
 Proceedings of Uncertainty in AI,
, 1999
"... Abstract Recently, researchers have demonstrated that "loopy belief propagation" (the use of Pearl's polytree algorithm in a Bayesian network with loops) can perform well in the context of error-correcting codes. The most dramatic instance of this is the near Shannon-limit performanc ..."
Cited by 676 (15 self)
marginals at the last two iterations. We only plot the diseases which had non-negligible posterior probability. [Figure: Loopy Belief Propagation over a range of priors.] To test this hypothesis, we reparameterized the pyramid network as follows: we set the prior probability of the "1"
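The message passing that "loopy belief propagation" runs on a graph with cycles can be sketched generically (a sum-product illustration on a small pairwise binary MRF, not the paper's experimental code; all names and potentials are my own):

```python
import numpy as np

def loopy_bp(unary, pair, edges, iters=50):
    """Sum-product loopy BP on a pairwise binary MRF.
    unary[i]: length-2 potential for node i; pair[(i,j)]: 2x2 potential
    indexed [x_i, x_j]. Returns normalized marginal beliefs per node."""
    msgs = {(i, j): np.ones(2)
            for (a, b) in edges for (i, j) in [(a, b), (b, a)]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # product of node i's potential and incoming messages, except from j
            prod = unary[i].copy()
            for (k, l) in msgs:
                if l == i and k != j:
                    prod *= msgs[(k, i)]
            psi = pair[(i, j)] if (i, j) in pair else pair[(j, i)].T
            m = psi.T @ prod                 # marginalize out x_i
            new[(i, j)] = m / m.sum()
        msgs = new                           # synchronous update
    beliefs = {}
    for i in unary:
        b = unary[i].copy()
        for (k, l) in msgs:
            if l == i:
                b *= msgs[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs
```

On a tree this is exact (Pearl's polytree algorithm); on a loopy graph the fixed point is only an approximation, which is precisely what the paper studies empirically.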
Evidence of Non-Zero Mass Features for the Neutrinos Emitted at Supernova LMC’87A
, 2008
"... The observation of the neutrinos that arrived from Supernova LMC’87A shows, with a good confidence level, the existence of two massive neutrinos. For the unobserved third neutrino mass, one could speculate on two possibilities: either that this mass is close to one of the two observed values, or that this neutri ..."
The observation of the neutrinos that arrived from Supernova LMC’87A shows, with a good confidence level, the existence of two massive neutrinos. For the unobserved third neutrino mass, one could speculate on two possibilities: either that this mass is close to one of the two observed values, or that this neutrino has a negligible electronic flavor component.
Efficient Maximum Margin Clustering via Cutting Plane Algorithm
"... Maximum margin clustering (MMC) is a recently proposed clustering method, which extends support vector machine theory to the unsupervised scenario and aims at finding the maximum margin hyperplane that separates the data from different classes. Traditionally, MMC is formulated as a nonconve ..."
Cited by 18 (3 self)
in this sequence could be efficiently solved using the constrained concave-convex procedure (CCCP). Moreover, we prove theoretically that the CPMMC algorithm takes time O(sn) to converge with guaranteed accuracy, where n is the total number of samples in the dataset and s is the average number of nonzero features
Training Linear Discriminant Analysis in Linear Time
"... Abstract—Linear Discriminant Analysis (LDA) has been a popular method for extracting features which preserve class separability. It has been widely used in many fields of information processing, such as machine learning, data mining, information retrieval, and pattern recognition. However, the compu ..."
Cited by 2 (0 self)
computation and the use of regularization techniques. Our theoretical analysis shows that SRDA can be computed with O(ms) time and O(ms) memory, where s (≤ n) is the average number of nonzero features in each sample. Extensive experimental results on four real-world data sets demonstrate the effectiveness
The Minimum Feature Set Problem
 Neural Networks
, 1994
"... One approach to improving the generalization power of a neural net is to try to minimize the number of nonzero weights used. We examine two issues relevant to this approach, for single-layer nets. First we bound the VC dimension of the set of linear-threshold functions that have nonzero weights for ..."
Cited by 16 (1 self)
for at most s of n inputs. Second, we show that the problem of minimizing the number of nonzero input weights used (without misclassifying training examples) is both NP-hard and difficult to approximate. Keywords: linear threshold, minimization, learning, complexity, approximation algorithm. Minimum Feature
Large scale training methods for linear RankRLS
"... Abstract. RankRLS is a recently proposed state-of-the-art method for learning ranking functions by minimizing a pairwise ranking error. The method can be trained by solving a system of linear equations. In this work, we investigate the use of conjugate gradient and regularization by iteration for li ..."
of the proposed methods, when learning from data with utility scores, is O(tms), where t is the required number of iterations, m the number of training examples and s the average number of nonzero features per example. In addition, the complexity of learning from pairwise preferences is O(tms+tl), where l
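The conjugate gradient training this abstract mentions amounts to iteratively solving a symmetric positive definite linear system, where each of the t iterations costs one matrix-vector product. A textbook CG sketch (generic, not the paper's implementation):

```python
import numpy as np

def conjugate_gradient(A, b, iters=100, tol=1e-10):
    """Solve A x = b for symmetric positive definite A by conjugate
    gradient. Each iteration needs only one product A @ p, so sparse or
    structured A keeps the per-iteration cost low."""
    x = np.zeros_like(b)
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Stopping after t iterations gives the O(tms)-style complexity quoted above when each matrix-vector product exploits the s nonzero features per example.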
An Online Algorithm for Large Scale Image Similarity Learning
"... Learning a measure of similarity between pairs of objects is a fundamental problem in machine learning. It stands in the core of classification methods like kernel machines, and is particularly useful for applications like searching for images that are similar to a given image or finding videos that ..."
Cited by 29 (0 self)
similarity. We describe OASIS, a method for learning pairwise similarity that is fast and scales linearly with the number of objects and the number of nonzero features. Scalability is achieved through online learning of a bilinear model over sparse representations using a large margin criterion
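The bilinear model with online large-margin updates described here can be sketched as a single passive-aggressive step (my reconstruction of an OASIS-style update from the abstract; the names and the aggressiveness parameter C are assumptions, not the authors' code):

```python
import numpy as np

def oasis_step(W, p, p_pos, p_neg, C=0.1):
    """One passive-aggressive update of the bilinear similarity
    S_W(a, b) = a @ W @ b, given a triplet where p should score higher
    with p_pos than with p_neg by a margin of 1."""
    loss = max(0.0, 1.0 - p @ W @ p_pos + p @ W @ p_neg)
    if loss > 0.0:                            # only update on margin violation
        V = np.outer(p, p_pos - p_neg)        # rank-one gradient direction
        tau = min(C, loss / (V * V).sum())    # PA-I style step size
        W = W + tau * V
    return W
```

Because V is a rank-one outer product of sparse vectors, each update touches only the nonzero features of the triplet, which is what makes the method scale linearly as the abstract claims.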