Online Learning with Kernels
, 2003
Cited by 2831 (123 self)
Kernel-based algorithms such as support vector machines have achieved considerable success in various problems in the batch setting, where all of the training data is available in advance. Support vector machines combine the so-called kernel trick with the large margin idea. There has been little use of these methods in an online setting suitable for real-time applications. In this paper we consider online learning in a Reproducing Kernel Hilbert Space. By considering classical stochastic gradient descent within a feature space, and the use of some straightforward tricks, we develop simple and computationally efficient algorithms for a wide range of problems such as classification, regression, and novelty detection. In addition to allowing the exploitation of the kernel trick in an online setting, we examine the value of large margins for classification in the online setting with a drifting target. We derive worst-case loss bounds and moreover we show the convergence of the hypothesis to the minimiser of the regularised risk functional. We present some experimental results that support the theory as well as illustrating the power of the new algorithms for online novelty detection.
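The stochastic-gradient idea in this abstract can be illustrated with a minimal sketch. It assumes a hinge loss with a margin, a Gaussian kernel, and a fixed learning rate; the class name `OnlineKernelClassifier` and the parameters `eta`, `lam`, `margin`, and `gamma` are illustrative choices, not taken from the paper. The hypothesis is stored as a kernel expansion: each step shrinks the existing coefficients (the gradient of the regulariser) and adds a new term only when the current example violates the margin.

```python
import math


def rbf_kernel(x, z, gamma):
    """Gaussian (RBF) kernel between two equal-length sequences of floats."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


class OnlineKernelClassifier:
    """Hypothetical online learner: SGD on a regularised hinge loss in an RKHS."""

    def __init__(self, gamma=1.0, eta=0.5, lam=0.01, margin=1.0):
        self.gamma = gamma    # kernel width (assumed value)
        self.eta = eta        # learning rate (assumed constant)
        self.lam = lam        # regularisation strength
        self.margin = margin  # required margin for the hinge loss
        self.points = []      # stored examples of the kernel expansion
        self.alphas = []      # their coefficients

    def decision(self, x):
        """Evaluate f(x) = sum_i alpha_i * k(x_i, x)."""
        return sum(a * rbf_kernel(p, x, self.gamma)
                   for a, p in zip(self.alphas, self.points))

    def update(self, x, y):
        """Process one example with label y in {-1, +1}; return the score before the update."""
        score = self.decision(x)
        # Gradient of the regulariser: shrink every existing coefficient.
        decay = 1.0 - self.eta * self.lam
        self.alphas = [a * decay for a in self.alphas]
        # Gradient of the hinge loss: add a term only on a margin violation.
        if y * score < self.margin:
            self.points.append(x)
            self.alphas.append(self.eta * y)
        return score


# Usage: a few passes over an XOR-style stream that no linear model can separate.
stream = [((0.0, 0.0), -1), ((0.0, 1.0), 1), ((1.0, 0.0), 1), ((1.0, 1.0), -1)]
learner = OnlineKernelClassifier(gamma=2.0)
for _epoch in range(5):
    for x, y in stream:
        learner.update(x, y)
```

The expansion grows by one term per margin violation, so a practical version would probably need to bound the number of stored terms, for example by dropping old terms once the decay has made their coefficients negligible. The sketch leaves that out.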
Efficient structure learning of Markov networks using L1-regularization
 In NIPS
, 2006
Cited by 144 (3 self)
Markov networks are widely used in a wide variety of applications, in problems ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to the lack of effective algorithms for learning Markov network structure from data. In this paper, we provide a computationally effective method for learning Markov network structure from data. Our method is based on the use of L1 regularization on the weights of the log-linear model, which has the effect of biasing the model towards solutions where many of the parameters are zero. This formulation converts the Markov network learning problem into a convex optimization problem in a continuous space, which can be solved using efficient gradient methods. A key issue in this setting is the (unavoidable) use of approximate inference, which can lead to errors in the gradient computation when the network structure is dense. Thus, we explore the use of different feature introduction schemes and compare their performance. We provide results for our method on synthetic data, and on two real-world data sets: modeling the joint distribution of pixel values in the MNIST data, and modeling the joint distribution of genetic sequence variations in the human HapMap data. We show that our L1-based method achieves considerably higher generalization performance than the more standard L2-based method (a Gaussian parameter prior) or pure maximum-likelihood learning. We also show that we can learn MRF network structure at a computational cost that is not much greater than learning parameters alone, demonstrating the existence of a feasible method for this important problem.
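The claim that an L1 penalty drives many parameters to exactly zero can be seen on a much smaller model than a Markov network. The sketch below is an assumption-laden stand-in, not the paper's method: it uses logistic regression (about the simplest log-linear model) and proximal gradient descent with soft-thresholding, which is one common way to handle the non-smooth L1 term. The function names `soft_threshold` and `fit_logistic_l1` and the toy data are hypothetical.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v towards zero by t, clipping at zero."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_l1(X, y, lam, step=0.5, iters=200, w0=None):
    """Minimise mean logistic loss + lam * ||w||_1 by proximal gradient descent.

    X is a list of feature rows, y a list of 0/1 labels. No bias term is used.
    """
    n, d = len(X), len(X[0])
    w = list(w0) if w0 is not None else [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j in range(d):
                grad[j] += (p - yi) * xi[j] / n
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad)]
    return w


# Toy data: feature 0 determines the label, feature 1 is pure noise.
X = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
y = [1, 1, 0, 0]
w_l1 = fit_logistic_l1(X, y, lam=0.1, w0=[0.0, 0.5])     # noise weight ends at exactly 0.0
w_plain = fit_logistic_l1(X, y, lam=0.0, w0=[0.0, 0.5])  # noise weight only decays towards 0
```

Starting the noise weight away from zero makes the contrast visible: with the penalty it is clipped to exactly zero after a few iterations, while without it the weight shrinks geometrically but stays nonzero.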
Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation
Cited by 131 (2 self)
Accurate modeling of geographic distributions of species is crucial to various applications in ecology and conservation. The best performing techniques often require some parameter tuning, which may be prohibitively time-consuming to do separately for each species, or unreliable for small or biased datasets. Additionally, even with the abundance of good quality data, users interested in the application of species models need not have the statistical knowledge required for detailed tuning. In such cases, it is desirable to use “default settings”, tuned and validated on diverse datasets. Maxent is a recently introduced modeling technique, achieving high predictive accuracy and enjoying several additional attractive properties. The performance of Maxent is influenced by a moderate number of parameters. The first contribution of this paper is the empirical tuning of these parameters. Since many datasets lack information about species absence, we present a tuning method that uses presence-only data. We evaluate our method on independently collected high-quality presence-absence data. In addition to tuning, we introduce several concepts that improve the predictive accuracy and running time of Maxent. We introduce “hinge features” that model more complex relationships in the training data; we describe a new logistic output format that gives an estimate of probability of presence; finally we explore “background sampling” strategies that cope with sample selection bias and decrease model-building time. Our evaluation, based on a diverse dataset of 226 species from 6 regions, shows: 1) default settings tuned on presence-only data achieve performance which is almost as good as if they had been tuned on the evaluation data itself; 2) hinge features substantially improve model
A maximum entropy approach to species distribution modeling
 In Proceedings of the Twenty-First International Conference on Machine Learning
, 2004
Cited by 104 (7 self)
We study the problem of modeling species geographic distributions, a critical problem in conservation biology. We propose the use of maximum-entropy techniques for this problem, specifically, sequential-update algorithms that can handle a very large number of features. We describe experiments comparing maxent with a standard distribution-modeling tool, called GARP, on a dataset containing observation data for North American breeding birds. We also study how well maxent performs as a function of the number of training examples and training time, analyze the use of regularization to avoid overfitting when the number of examples is small, and explore the interpretability of models constructed using maxent.
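For readers unfamiliar with maximum-entropy density estimation, the following minimal sketch shows the basic fitting problem on a tiny finite domain. It is not the sequential-update algorithm the abstract refers to: it uses plain gradient ascent on the log-likelihood, has no regularization, and the names `gibbs` and `fit_maxent`, the five-cell domain, and the two binary features are all invented for illustration. The fitted model is a Gibbs distribution whose feature expectations match the empirical averages over the observed cells.

```python
import math


def gibbs(features, lam):
    """Gibbs distribution p(x) proportional to exp(sum_j lam[j] * features[x][j])."""
    scores = [sum(l * f for l, f in zip(lam, fx)) for fx in features]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def fit_maxent(features, samples, step=1.0, iters=500):
    """Fit lam by gradient ascent so that model expectations match sample averages.

    features[x] is the feature vector of cell x; samples is a list of observed cells.
    """
    d = len(features[0])
    target = [sum(features[x][j] for x in samples) / len(samples)
              for j in range(d)]
    lam = [0.0] * d
    for _ in range(iters):
        p = gibbs(features, lam)
        expected = [sum(p[x] * features[x][j] for x in range(len(features)))
                    for j in range(d)]
        # Gradient of the average log-likelihood: empirical minus model expectation.
        lam = [lam[j] + step * (target[j] - expected[j]) for j in range(d)]
    return lam, gibbs(features, lam)


# Hypothetical domain of five cells with two binary environmental features each.
features = [(0, 1), (0, 0), (1, 1), (1, 0), (1, 1)]
presences = [0, 2, 2, 3, 4, 1, 2, 4]  # cells where the species was observed
lam, p = fit_maxent(features, presences)
```

Cells 2 and 4 share the same feature vector, so the fitted model assigns them the same probability even though their observation counts differ; the model only sees the data through feature averages.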
ACAS: Automated construction of application signatures
 In SIGCOMM’05 MineNet Workshop
, 2005
Cited by 91 (1 self)
An accurate mapping of traffic to applications is important for a broad range of network management and measurement tasks. Internet applications have traditionally been identified using well-known default server network-port numbers in the TCP or UDP headers. However, this approach has become increasingly inaccurate. An alternate, more accurate technique is to use specific application-level features in the protocol exchange to guide the identification. Unfortunately, deriving the signatures manually is very time-consuming and difficult. In this paper, we explore automatically extracting application signatures from IP traffic payload content. In particular, we apply three statistical machine learning algorithms to automatically identify signatures for a range of applications. The results indicate that this approach is highly accurate and scales to allow online application identification on high-speed links. We also discovered that content signatures still work in the presence of encryption. In these cases we were able to derive content signatures for unencrypted handshakes negotiating the encryption parameters of a particular connection.
Bootstrapping Pay-As-You-Go Data Integration Systems
 In SIGMOD
, 2008
Cited by 75 (7 self)
Data integration systems offer a uniform interface to a set of data sources. Despite recent progress, setting up and maintaining a data integration application still requires significant upfront effort of creating a mediated schema and semantic mappings from the data sources to the mediated schema. Many application contexts involving multiple data sources (e.g., the web, personal information management, enterprise intranets) do not require full integration in order to provide useful services, motivating a pay-as-you-go approach to integration. With that approach, a system starts with very few (or inaccurate) semantic mappings and these mappings are improved over time as deemed necessary. This paper describes the first completely self-configuring data integration system. The goal of our work is to investigate how advanced a starting point we can provide for a pay-as-you-go system. Our system is based on the new concept of a probabilistic mediated schema that is automatically created from the data sources. We automatically create probabilistic schema mappings between the sources and the mediated schema. We describe experiments in multiple domains, including 50-800 data sources, and show that our system is able to produce high-quality answers with no human intervention.
Unifying Divergence Minimization and Statistical Inference via Convex Duality
 Proc. of Conf. on Learning Theory (COLT)
, 2006
Cited by 53 (10 self)
In this paper we unify divergence minimization and statistical inference by means of convex duality. In the process of doing so, we prove that the dual of approximate maximum entropy estimation is maximum a posteriori estimation. Moreover, our treatment leads to stability and convergence bounds for many statistical learning problems. Finally, we show how an algorithm by Zhang can be used to solve this class of optimization problems efficiently.
Learning to rank networked entities
 In KDD
, 2006
Cited by 51 (4 self)
Several algorithms have been proposed to learn to rank entities modeled as feature vectors, based on relevance feedback. However, these algorithms do not model network connections or relations between entities. Meanwhile, PageRank and variants find the stationary distribution of a reasonable but arbitrary Markov walk over a network, but do not learn from relevance feedback. We present a framework for ranking networked entities based on Markov walks with parameterized conductance values associated with the network edges. We propose two flavors of conductance learning problems in our framework. In the first setting, relevance feedback comparing node pairs hints that the user has one or more hidden preferred communities with large edge conductance, and the algorithm must discover these communities. We present a constrained maximum entropy network flow formulation whose dual can be solved efficiently using a cutting-plane approach and a quasi-Newton optimizer. In the second setting, edges have types, and relevance feedback hints that each edge type has a potentially different conductance, but this is fixed across the whole network. Our algorithm learns the conductances using an approximate Newton method.
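To make the second setting concrete, the sketch below computes the stationary distribution of a Markov walk in which every edge type carries one conductance value shared across the whole network. It covers only the forward computation for given conductances, not the learning step described in the abstract. The function name `stationary_distribution`, the edge-type labels, and the PageRank-style teleport probability are assumptions made for illustration.

```python
def stationary_distribution(n, edges, conductance, teleport=0.15, iters=200):
    """Power iteration for a walk whose transition weights depend on edge type.

    edges is a list of (src, dst, edge_type) triples over nodes 0..n-1;
    conductance maps each edge type to a non-negative weight.
    """
    outgoing = [[] for _ in range(n)]
    for src, dst, edge_type in edges:
        outgoing[src].append((dst, conductance[edge_type]))

    rank = [1.0 / n] * n
    for _ in range(iters):
        nxt = [teleport / n] * n
        for src in range(n):
            total = sum(weight for _dst, weight in outgoing[src])
            if total == 0.0:
                # Dangling node: spread its mass uniformly over all nodes.
                for dst in range(n):
                    nxt[dst] += (1.0 - teleport) * rank[src] / n
            else:
                # Follow each edge with probability proportional to its conductance.
                for dst, weight in outgoing[src]:
                    nxt[dst] += (1.0 - teleport) * rank[src] * weight / total
        rank = nxt
    return rank


# Node 0 points to node 1 with a "cite" edge and to node 2 with a "link" edge.
edges = [(0, 1, "cite"), (0, 2, "link"), (1, 0, "link"), (2, 0, "link")]
ranks = stationary_distribution(3, edges, {"cite": 3.0, "link": 1.0})
```

Raising the conductance of one edge type shifts rank towards the nodes reached by edges of that type, which is the lever a conductance-learning method would adjust to agree with relevance feedback.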
Coordinate Descent Method for Large-scale L2-loss Linear SVM
Cited by 46 (12 self)
Linear support vector machines (SVM) are useful for classifying large-scale sparse data. Problems with sparse features are common in applications such as document classification and natural language processing. In this paper, we propose a novel coordinate descent algorithm for training linear SVM with the L2-loss function. At each step, the proposed method minimizes a one-variable subproblem while fixing other variables. The subproblem is solved by Newton steps with the line search technique. The procedure globally converges at the linear rate. Experiments show that our method is more efficient and stable than state-of-the-art methods such as Pegasos and TRON.
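The one-variable subproblem described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: it handles dense lists instead of sparse data, recomputes the objective from scratch at every trial step, takes a single Newton step per coordinate, and uses a basic backtracking rule. The function names and the parameters `beta` and `sigma` are hypothetical.

```python
def objective(w, X, y, C):
    """0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * w.x_i)^2 (L2-loss SVM, no bias)."""
    reg = 0.5 * sum(wj * wj for wj in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        slack = 1.0 - yi * sum(wj * xj for wj, xj in zip(w, xi))
        if slack > 0.0:
            loss += slack * slack
    return reg + C * loss


def coordinate_descent_l2svm(X, y, C=1.0, sweeps=50, beta=0.5, sigma=0.01):
    """Cycle over coordinates; update each with a Newton step plus backtracking."""
    d = len(X[0])
    w = [0.0] * d
    history = [objective(w, X, y, C)]
    for _sweep in range(sweeps):
        for j in range(d):
            # First and (generalised) second derivative of the objective in w[j].
            g, h = w[j], 1.0
            for xi, yi in zip(X, y):
                slack = 1.0 - yi * sum(wk * xk for wk, xk in zip(w, xi))
                if slack > 0.0:
                    g -= 2.0 * C * yi * xi[j] * slack
                    h += 2.0 * C * xi[j] * xi[j]
            step = -g / h
            current = objective(w, X, y, C)
            old_wj = w[j]
            accepted = False
            for _trial in range(30):
                w[j] = old_wj + step
                # Sufficient-decrease test; shrink the step if it fails.
                if objective(w, X, y, C) <= current - sigma * step * step:
                    accepted = True
                    break
                step *= beta
            if not accepted:
                w[j] = old_wj  # keep the old value if no trial step helped
        history.append(objective(w, X, y, C))
    return w, history


# Usage on a small linearly separable toy problem.
X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w, history = coordinate_descent_l2svm(X, y)
```

The backtracking loop matters because the squared hinge loss is only piecewise quadratic: a full Newton step computed from the currently violated examples can overshoot once other examples start violating the margin.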
Correcting sample selection bias in maximum entropy density estimation
 In Advances in Neural Information Processing Systems
, 2005
Cited by 38 (4 self)
We study the problem of maximum entropy density estimation in the presence of known sample selection bias. We propose three bias correction approaches. The first one takes advantage of unbiased sufficient statistics which can be obtained from biased samples. The second one estimates the biased distribution and then factors the bias out. The third one approximates the second by only using samples from the sampling distribution. We provide guarantees for the first two approaches and evaluate the performance of all three approaches in synthetic experiments and on real data from species habitat modeling, where maxent has been successfully applied and where sample selection bias is a significant problem.