Results 1 - 10 of 318
Learning the Kernel Matrix with Semi-Definite Programming
, 2002
Cited by 545 (25 self)
Kernel-based learning algorithms work by embedding the data into a Euclidean space, and then searching for linear relations among the embedded data points. The embedding is performed implicitly, by specifying the inner products between each pair of points in the embedding space. This information is contained in the so-called kernel matrix, a symmetric and positive definite matrix that encodes the relative positions of all points. Specifying this matrix amounts to specifying the geometry of the embedding space and inducing a notion of similarity in the input space: classical model selection problems in machine learning. In this paper we show how the kernel matrix can be learned from data via semi-definite programming (SDP) techniques. When applied
Selective sampling using the Query by Committee algorithm
 Machine Learning
, 1997
Cited by 334 (7 self)
We analyze the "query by committee" algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of perceptrons.
A framework for learning predictive structures from multiple tasks and unlabeled data
 Journal of Machine Learning Research
, 2005
Cited by 320 (3 self)
One of the most important issues in machine learning is whether one can improve the performance of a supervised learning algorithm by including unlabeled data. Methods that use both labeled and unlabeled data are generally referred to as semi-supervised learning. Although a number of such methods have been proposed, at the current stage, we still don’t have a complete understanding of their effectiveness. This paper investigates a closely related problem, which leads to a novel approach to semi-supervised learning. Specifically, we consider learning predictive structures on hypothesis spaces (that is, what kind of classifiers have good predictive power) from multiple learning tasks. We present a general framework in which the structural learning problem can be formulated and analyzed theoretically, and relate it to learning with unlabeled data. Under this framework, algorithms for structural learning will be proposed, and computational issues will be investigated. Experiments will be given to demonstrate the effectiveness of the proposed algorithms in the semi-supervised learning setting.
A Critical Point For Random Graphs With A Given Degree Sequence
, 2000
Cited by 287 (6 self)
Given a sequence of nonnegative real numbers $\lambda_0, \lambda_1, \ldots$ which sum to 1, we consider random graphs having approximately $\lambda_i n$ vertices of degree $i$. Essentially, we show that if $\sum_i i(i-2)\lambda_i > 0$ then such graphs almost surely have a giant component, while if $\sum_i i(i-2)\lambda_i < 0$ then almost surely all components in such graphs are small. We can apply these results to $G_{n,p}$, $G_{n,M}$, and other well-known models of random graphs. There are also applications related to the chromatic number of sparse random graphs.
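The sign condition in this abstract, read here as the sum over degrees i of i(i - 2) times the fraction of vertices with degree i, can be evaluated directly for a given degree distribution. The sketch below is illustrative only: the function name `molloy_reed_q` and the list-based input format are assumptions, not something taken from the paper, and the code checks only the sign of the sum, not the asymptotic statement itself.

```python
def molloy_reed_q(degree_fractions):
    """Return Q = sum_i i * (i - 2) * lambda_i.

    degree_fractions[i] is the fraction lambda_i of vertices with degree i
    (hypothetical input format chosen for this sketch).
    """
    total = sum(degree_fractions)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("degree fractions must sum to 1")
    return sum(i * (i - 2) * lam for i, lam in enumerate(degree_fractions))


# Every vertex has degree 3: Q = 3 * 1 * 1 = 3, so Q > 0.
q_cubic = molloy_reed_q([0.0, 0.0, 0.0, 1.0])

# Half the vertices have degree 1, half have degree 2: Q = -0.5, so Q < 0.
q_sparse = molloy_reed_q([0.0, 0.5, 0.5])
```

A positive value corresponds to the giant-component regime described in the abstract, a negative value to the regime where all components are small.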
SybilGuard: Defending against sybil attacks via social networks
 In ACM SIGCOMM ’06
, 2006
Cited by 207 (6 self)
Peer-to-peer and other decentralized, distributed systems are known to be particularly vulnerable to sybil attacks. In a sybil attack, a malicious user obtains multiple fake identities and pretends to be multiple, distinct nodes in the system. By controlling a large fraction of the nodes in the system, the malicious user is able to “out vote” the honest users in collaborative tasks such as Byzantine failure defenses. This paper presents SybilGuard, a novel protocol for limiting the corruptive influences of sybil attacks. Our protocol is based on the “social network” among user identities, where an edge between two identities indicates a human-established trust relationship. Malicious users can create many identities but few trust relationships. Thus, there is a disproportionately small “cut” in the graph between the sybil nodes and the honest nodes. SybilGuard exploits this property to bound the number of identities a malicious user can create. We show the effectiveness of SybilGuard both analytically and experimentally.
Models of Random Regular Graphs
 In Surveys in combinatorics
, 1999
Cited by 156 (32 self)
In a previous paper we showed that a random 4-regular graph asymptotically almost surely (a.a.s.) has chromatic number 3. Here we extend the method to show that a random 6-regular graph a.a.s. has chromatic number 4 and that the chromatic number of a random d-regular graph for other d between 5 and 10 inclusive is a.a.s. restricted to a range of two integer values: {3, 4} for d = 5, {4, 5} for d = 7, 8, 9, and {5, 6} for d = 10. The proof uses efficient algorithms which a.a.s. colour these random graphs using the number of colours specified by the upper bound. These algorithms are analysed using the differential equation method, including an analysis of certain systems of differential equations with discontinuous right-hand sides.
Correcting sample selection bias by unlabeled data
Cited by 130 (9 self)
We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
Tail Bounds for Occupancy and the Satisfiability Threshold Conjecture
, 1995
Cited by 97 (1 self)
The classical occupancy problem is concerned with studying the number of empty bins resulting from a random allocation of m balls to n bins. We provide a series of tail bounds on the distribution of the number of empty bins. These tail bounds should find application in randomized algorithms and probabilistic analysis. Our motivating application is the following well-known conjecture on threshold phenomenon for the satisfiability problem. Consider random 3-SAT formulas with cn clauses over n variables, where each clause is chosen uniformly and independently from the space of all clauses of size 3. It has been conjectured that there is a sharp threshold for satisfiability at c ≈ 4.2. We provide a strong upper bound on the value of c, showing that for c > 4.758 a random 3-SAT formula is unsatisfiable with high probability. This result is based on a structural property, possibly of independent interest, whose proof needs several applications of the occupancy tail bounds. Supporte...
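As a small illustration of the occupancy setting only (not of the paper's tail bounds, which are not reproduced here), the sketch below computes the expected number of empty bins, n(1 - 1/n)^m, which follows from linearity of expectation, and runs one simulated allocation for comparison. The function names and the seeded generator are assumptions made for this example.

```python
import random


def expected_empty_bins(m, n):
    """Expected number of empty bins after throwing m balls into n bins.

    Each bin stays empty with probability (1 - 1/n)**m, so by linearity
    of expectation the mean is n * (1 - 1/n)**m.
    """
    return n * (1.0 - 1.0 / n) ** m


def simulate_empty_bins(m, n, rng):
    """Throw m balls uniformly at random into n bins; count empty bins."""
    occupied = set()
    for _ in range(m):
        occupied.add(rng.randrange(n))
    return n - len(occupied)


# Hypothetical usage: compare one simulated run with the mean.
rng = random.Random(0)  # fixed seed so the run is repeatable
mean_empty = expected_empty_bins(100, 100)
observed_empty = simulate_empty_bins(100, 100, rng)
```

Tail bounds of the kind the abstract describes quantify how far `observed_empty` is likely to stray from `mean_empty`.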