Results 1–10 of 1,152
Training Linear SVMs in Linear Time
, 2006
"... Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n ..."
Cited by 549 (6 self)
as well as a large number of features N, while each example has only s ≪ N nonzero features. This paper presents a Cutting-Plane Algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems. The algorithm
Pegasos: Primal Estimated sub-GrAdient SOlver for SVM
"... We describe and analyze a simple and effective stochastic subgradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy ɛ is Õ(1/ɛ), where each iteration operates on a singl ..."
Cited by 542 (20 self)
runtime of our method is Õ(d/(λɛ)), where d is a bound on the number of nonzero features in each example. Since the runtime does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non
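The stochastic subgradient step this abstract describes can be sketched in a few lines (a minimal illustration of a Pegasos-style update on the hinge-loss SVM objective; function and variable names are my own, not the authors' code):

```python
import numpy as np

def pegasos_step(w, x, y, lam, t):
    """One stochastic subgradient step with learning rate 1/(lam*t)."""
    eta = 1.0 / (lam * t)
    if y * np.dot(w, x) < 1.0:              # margin violated: hinge subgradient
        return (1.0 - eta * lam) * w + eta * y * x
    return (1.0 - eta * lam) * w            # margin satisfied: shrink only

def pegasos(X, y, lam=0.1, epochs=20, seed=0):
    """Run Pegasos-style SGD over the dataset for a few epochs."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            w = pegasos_step(w, X[i], y[i], lam, t)
    return w
```

Because each step touches a single example, the cost per iteration depends only on that example's nonzero features, which is the source of the Õ(d/(λɛ)) runtime claim.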
Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals
, 1996
"... Abstract. Data analysis applications typically aggregate data across many dimensions looking for anomalies or unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional aggregates. Applications need the N-dimensional generalization of these op ..."
Cited by 860 (11 self)
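The N-dimensional generalization this abstract describes can be illustrated with a naive sketch that aggregates a measure over every subset of the dimensions, using a placeholder value for dimensions that are aggregated away (an illustrative toy, not the relational operator itself; names are hypothetical):

```python
from itertools import combinations
from collections import defaultdict

ALL = "ALL"  # placeholder for a dimension that has been aggregated away

def data_cube(rows, dims, measure):
    """Naive CUBE: sum `measure` over every subset of `dims`.
    Each row is a dict; returns {tuple of dim values (or ALL): total}."""
    cube = defaultdict(float)
    subsets = (c for r in range(len(dims) + 1) for c in combinations(dims, r))
    for subset in subsets:
        for row in rows:
            key = tuple(row[d] if d in subset else ALL for d in dims)
            cube[key] += row[measure]
    return dict(cube)
```

With N dimensions this produces all 2^N group-bys at once, which is exactly the explosion of aggregates the CUBE operator was introduced to express in a single query.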
Loopy belief propagation for approximate inference: An empirical study. In:
 Proceedings of Uncertainty in AI,
, 1999
"... Abstract Recently, researchers have demonstrated that "loopy belief propagation" (the use of Pearl's polytree algorithm in a Bayesian network with loops) can perform well in the context of error-correcting codes. The most dramatic instance of this is the near Shannon-limit performanc ..."
Cited by 676 (15 self)
marginals at the last two iterations. We only plot the diseases which had non-negligible posterior probability. [Figure: Loopy Belief Propagation over a range of priors.] To test this hypothesis, we reparameterized the pyramid network as follows: we set the prior probability of the "1"
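The message passing that "loopy belief propagation" runs on a graph with cycles can be sketched generically (a sum-product illustration on a small pairwise binary MRF, not the paper's experimental code; all names and potentials are my own):

```python
import numpy as np

def loopy_bp(unary, pair, edges, iters=50):
    """Sum-product loopy BP on a pairwise binary MRF.
    unary[i]: length-2 potential for node i; pair[(i,j)]: 2x2 potential
    indexed [x_i, x_j]. Returns normalized marginal beliefs per node."""
    msgs = {(i, j): np.ones(2)
            for (a, b) in edges for (i, j) in [(a, b), (b, a)]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # product of node i's potential and incoming messages, except from j
            prod = unary[i].copy()
            for (k, l) in msgs:
                if l == i and k != j:
                    prod *= msgs[(k, i)]
            psi = pair[(i, j)] if (i, j) in pair else pair[(j, i)].T
            m = psi.T @ prod                 # marginalize out x_i
            new[(i, j)] = m / m.sum()
        msgs = new                           # synchronous update
    beliefs = {}
    for i in unary:
        b = unary[i].copy()
        for (k, l) in msgs:
            if l == i:
                b *= msgs[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs
```

On a tree this is exact (Pearl's polytree algorithm); on a loopy graph the fixed point is only an approximation, which is precisely what the paper studies empirically.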
Evidence of Non-Zero Mass Features for the Neutrinos Emitted at Supernova LMC’87A
, 2008
"... The observation of the neutrinos that arrived from Supernova LMC’87A shows, with a good confidence level, the existence of two massive neutrinos. For the unobserved third neutrino mass, one could speculate on two possibilities: either that this mass is close to one of the two observed values, or that this neutri ..."
The observation of the neutrinos that arrived from Supernova LMC’87A shows, with a good confidence level, the existence of two massive neutrinos. For the unobserved third neutrino mass, one could speculate on two possibilities: either that this mass is close to one of the two observed values, or that this neutrino has a negligible electronic flavor component.
Efficient Maximum Margin Clustering via Cutting Plane Algorithm
"... Maximum margin clustering (MMC) is a recently proposed clustering method, which extends support vector machine theory to the unsupervised scenario and aims at finding the maximum margin hyperplane that separates the data from different classes. Traditionally, MMC is formulated as a nonconve ..."
Cited by 18 (3 self)
in this sequence could be efficiently solved using the constrained concave-convex procedure (CCCP). Moreover, we prove theoretically that the CPMMC algorithm takes time O(sn) to converge with guaranteed accuracy, where n is the total number of samples in the dataset and s is the average number of nonzero features
Training Linear Discriminant Analysis in Linear Time
"... Abstract—Linear Discriminant Analysis (LDA) has been a popular method for extracting features which preserve class separability. It has been widely used in many fields of information processing, such as machine learning, data mining, information retrieval, and pattern recognition. However, the compu ..."
Cited by 2 (0 self)
computation and the use of regularization techniques. Our theoretical analysis shows that SRDA can be computed with O(ms) time and O(ms) memory, where s (≤ n) is the average number of nonzero features in each sample. Extensive experimental results on four real-world data sets demonstrate the effectiveness
The Minimum Feature Set Problem
 Neural Networks
, 1994
"... One approach to improving the generalization power of a neural net is to try to minimize the number of nonzero weights used. We examine two issues relevant to this approach, for single-layer nets. First we bound the VC dimension of the set of linear-threshold functions that have nonzero weights for ..."
Cited by 16 (1 self)
for at most s of n inputs. Second, we show that the problem of minimizing the number of nonzero input weights used (without misclassifying training examples) is both NP-hard and difficult to approximate. Keywords: linear threshold, minimization, learning, complexity, approximation algorithm. Minimum Feature
Large scale training methods for linear RankRLS
"... Abstract. RankRLS is a recently proposed state-of-the-art method for learning ranking functions by minimizing a pairwise ranking error. The method can be trained by solving a system of linear equations. In this work, we investigate the use of conjugate gradient and regularization by iteration for li ..."
of the proposed methods, when learning from data with utility scores, is O(tms), where t is the required number of iterations, m the number of training examples and s the average number of nonzero features per example. In addition, the complexity of learning from pairwise preferences is O(tms+tl), where l
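The conjugate gradient training this abstract mentions amounts to iteratively solving a symmetric positive definite linear system, where each of the t iterations costs one matrix-vector product. A textbook CG sketch (generic, not the paper's implementation):

```python
import numpy as np

def conjugate_gradient(A, b, iters=100, tol=1e-10):
    """Solve A x = b for symmetric positive definite A by conjugate
    gradient. Each iteration needs only one product A @ p, so sparse or
    structured A keeps the per-iteration cost low."""
    x = np.zeros_like(b)
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(iters):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Stopping after t iterations gives the O(tms)-style complexity quoted above when each matrix-vector product exploits the s nonzero features per example.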
An Online Algorithm for Large Scale Image Similarity Learning
"... Learning a measure of similarity between pairs of objects is a fundamental problem in machine learning. It stands in the core of classification methods like kernel machines, and is particularly useful for applications like searching for images that are similar to a given image or finding videos that ..."
Cited by 29 (0 self)
similarity. We describe OASIS, a method for learning pairwise similarity that is fast and scales linearly with the number of objects and the number of nonzero features. Scalability is achieved through online learning of a bilinear model over sparse representations using a large margin criterion
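The bilinear model with online large-margin updates described here can be sketched as a single passive-aggressive step (my reconstruction of an OASIS-style update from the abstract; the names and the aggressiveness parameter C are assumptions, not the authors' code):

```python
import numpy as np

def oasis_step(W, p, p_pos, p_neg, C=0.1):
    """One passive-aggressive update of the bilinear similarity
    S_W(a, b) = a @ W @ b, given a triplet where p should score higher
    with p_pos than with p_neg by a margin of 1."""
    loss = max(0.0, 1.0 - p @ W @ p_pos + p @ W @ p_neg)
    if loss > 0.0:                            # only update on margin violation
        V = np.outer(p, p_pos - p_neg)        # rank-one gradient direction
        tau = min(C, loss / (V * V).sum())    # PA-I style step size
        W = W + tau * V
    return W
```

Because V is a rank-one outer product of sparse vectors, each update touches only the nonzero features of the triplet, which is what makes the method scale linearly as the abstract claims.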