Results 1 - 10 of 101
Combining labeled and unlabeled data with co-training
1998
"... We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the ta ..."
Abstract
-
Cited by 1633 (28 self)
- Add to MetaCart
(Show Context)
We consider the problem of using a large unlabeled sample to boost performance of a learning algorithm when only a small set of labeled examples is available. In particular, we consider a setting in which the description of each example can be partitioned into two distinct views, motivated by the task of learning to classify web pages. For example, the description of a web page can be partitioned into the words occurring on that page, and the words occurring in hyperlinks that point to that page. We assume that either view of the example would be sufficient for learning if we had enough labeled data, but our goal is to use both views together to allow inexpensive unlabeled data to augment a much smaller set of labeled examples. Specifically, the presence of two distinct views of each example suggests strategies in which two learning algorithms are trained separately on each view, and then each algorithm's predictions on new unlabeled examples are used to enlarge the training set of the other. Our goal in this paper is to provide a PAC-style analysis for this setting, and, more broadly, a PAC-style framework for the general problem of learning from both labeled and unlabeled data. We also provide empirical results on real web-page data indicating that this use of unlabeled examples can lead to significant improvement of hypotheses in practice. As part of our analysis, we provide new re...
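A minimal sketch of the two-view strategy the abstract describes: two classifiers, one per feature view, each pseudo-labeling its most confident unlabeled examples to enlarge the shared training set. The choice of GaussianNB, the round count, and the per-round quota are illustrative assumptions, not the paper's experimental setup.

```python
# Two-view co-training sketch; classifier choice and pool sizes are assumptions.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1, X2, y, labeled_idx, rounds=10, per_round=5):
    labeled = list(labeled_idx)
    unlabeled = [i for i in range(len(y)) if i not in set(labeled)]
    y_work = np.asarray(y).copy()   # known labels; pseudo-labels written in place
    h1, h2 = GaussianNB(), GaussianNB()
    for _ in range(rounds):
        h1.fit(X1[labeled], y_work[labeled])
        h2.fit(X2[labeled], y_work[labeled])
        for h, X in ((h1, X1), (h2, X2)):
            if not unlabeled:
                break
            proba = h.predict_proba(X[unlabeled])
            # move this view's most confident predictions into the labeled pool
            best = np.argsort(proba.max(axis=1))[-per_round:]
            for t in sorted(best, reverse=True):
                j = unlabeled.pop(t)
                y_work[j] = h.classes_[proba[t].argmax()]
                labeled.append(j)
    return h1, h2
```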
An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001
"... After [10, 15, 12, 2, 4] minimum cut/maximum flow algorithms on graphs emerged as an increasingly useful tool for exact or approximate energy minimization in low-level vision. The combinatorial optimization literature provides many min-cut/max-flow algorithms with different polynomial time compl ..."
Abstract
-
Cited by 1315 (53 self)
- Add to MetaCart
(Show Context)
After [10, 15, 12, 2, 4], minimum cut/maximum flow algorithms on graphs emerged as an increasingly useful tool for exact or approximate energy minimization in low-level vision. The combinatorial optimization literature provides many min-cut/max-flow algorithms with different polynomial time complexity. Their practical efficiency, however, has to date been studied mainly outside the scope of computer vision. The goal of this paper...
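For concreteness, here is a toy instance of the construction these algorithms run on: binary segmentation of a 2x2 "image" posed as an s-t minimum cut, computed with networkx's generic min-cut routine rather than any of the specialized algorithms the paper compares. The intensities, unary costs, and smoothness weight are made-up values.

```python
# Tiny graph-cut segmentation: terminal edges carry unary data costs,
# neighbor edges carry a smoothness penalty; all numbers are illustrative.
import networkx as nx

pixels = [(0, 0), (0, 1), (1, 0), (1, 1)]
intensity = {(0, 0): 0.9, (0, 1): 0.8, (1, 0): 0.2, (1, 1): 0.1}
lam = 0.5  # smoothness weight (assumption)

G = nx.DiGraph()
for p in pixels:
    G.add_edge('s', p, capacity=intensity[p])        # cost of labeling p background
    G.add_edge(p, 't', capacity=1.0 - intensity[p])  # cost of labeling p foreground
for p in pixels:
    for q in pixels:
        if abs(p[0] - q[0]) + abs(p[1] - q[1]) == 1:
            G.add_edge(p, q, capacity=lam)           # pairwise smoothness term

cut_value, (source_side, sink_side) = nx.minimum_cut(G, 's', 't')
print('energy =', cut_value)                          # 1.6 for these numbers
print('foreground =', sorted(source_side - {'s'}))    # the two bright pixels
```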
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
"... Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for ..."
Abstract
-
Cited by 253 (6 self)
- Add to MetaCart
(Show Context)
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition...
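A short sketch of the two-stage pipeline the abstract outlines: a randomized range finder identifies the subspace, then the compressed matrix is factored deterministically. The matrix sizes, target rank k, and oversampling p are arbitrary illustrative choices.

```python
# Randomized range finder + SVD of the compressed matrix; sizes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 300))  # rank-10 test matrix
k, p = 10, 5   # target rank and oversampling

# Stage 1: random sampling identifies a subspace capturing most of A's action.
Omega = rng.standard_normal((A.shape[1], k + p))
Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis for the sample range

# Stage 2: compress A to that subspace and factor the small matrix exactly.
B = Q.T @ A                              # (k+p) x 300 reduced matrix
U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
U = Q @ U_small                          # lift left factors back to original space

A_k = (U[:, :k] * s[:k]) @ Vt[:k]        # rank-k approximation
print('relative error:', np.linalg.norm(A - A_k) / np.linalg.norm(A))
```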
A new approach to the minimum cut problem
Journal of the ACM, 1996
"... Abstract. This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph’s minimum cut form an extremely small fraction of the graph’s edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds th ..."
Abstract
-
Cited by 128 (9 self)
- Add to MetaCart
Abstract. This paper presents a new approach to finding minimum cuts in undirected graphs. The fundamental principle is simple: the edges in a graph's minimum cut form an extremely small fraction of the graph's edges. Using this idea, we give a randomized, strongly polynomial algorithm that finds the minimum cut in an arbitrarily weighted undirected graph with high probability. The algorithm runs in O(n^2 log^3 n) time, a significant improvement over the previous Õ(mn) time bounds based on maximum flows. It is simple and intuitive and uses no complex data structures. Our algorithm can be parallelized to run in RNC with n^2 processors; this gives the first proof that the minimum cut problem can be solved in RNC. The algorithm does more than find a single minimum cut; it finds all of them. With minor modifications, our algorithm solves two other problems of interest. Our algorithm finds all cuts with value within a multiplicative factor of α of the minimum cut's in expected Õ(n^(2α)) time, or in RNC with n^(2α) processors. The problem of finding a minimum multiway cut of a graph into r pieces is solved in expected Õ(n^(2(r-1))) time, or in RNC with n^(2(r-1)) processors. The "trace" of the algorithm's execution on these two problems forms a new compact data structure for representing all small cuts and all multiway cuts in a graph. This data structure can be efficiently transformed into the...
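A bare-bones sketch of the contraction idea the abstract builds on, assuming an unweighted multigraph given as an edge list: contract random edges until two supernodes remain, and the surviving crossing edges form a cut. One pass succeeds with probability at least 2/(n(n-1)), so it is repeated; this illustrates the principle, not the paper's faster recursive algorithm.

```python
# Random edge-contraction sketch (illustration only, not the paper's algorithm).
import random

def contract_once(edges, n):
    parent = list(range(n))
    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    components = n
    order = edges[:]
    random.shuffle(order)             # random order = random edge choices
    for u, v in order:
        if components == 2:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv           # contract edge (u, v)
            components -= 1
    # edges whose endpoints ended in different supernodes form the cut
    return [(u, v) for u, v in edges if find(u) != find(v)]

def karger_min_cut(edges, n, trials=200):
    return min((contract_once(edges, n) for _ in range(trials)), key=len)

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (2, 4)]  # toy graph, min cut 2
print(karger_min_cut(edges, 5))
```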
Minimum cuts in near-linear time
Proc. of the 28th STOC, 1996
"... Abstract. We significantly improve known time bounds for solving the minimum cut problem on undirected graphs. We use a "semiduality" between minimum cuts and maximum spanning tree packings combined with our previously developed random sampling techniques. We give a randomized (Monte Carl ..."
Abstract
-
Cited by 95 (12 self)
- Add to MetaCart
(Show Context)
Abstract. We significantly improve known time bounds for solving the minimum cut problem on undirected graphs. We use a "semiduality" between minimum cuts and maximum spanning tree packings combined with our previously developed random sampling techniques. We give a randomized (Monte Carlo) algorithm that finds a minimum cut in an m-edge, n-vertex graph with high probability in O(m log^3 n) time. We also give a simpler randomized algorithm that finds all minimum cuts with high probability in O(n^2 log n) time. This variant has an optimal NC parallelization. Both variants improve on the previous best time bound of O(n^2 log^3 n). Other applications of the tree-packing approach are new, nearly tight bounds on the number of near-minimum cuts a graph may have and a new data structure for representing them in a space-efficient manner.
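Not the tree-packing algorithm described above, but a quick way to get a global minimum cut from an off-the-shelf routine: networkx's deterministic Stoer-Wagner implementation, shown on a toy weighted graph.

```python
# Global minimum cut via networkx's Stoer-Wagner routine, a deterministic
# alternative to the paper's randomized algorithm; graph is illustrative.
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([
    (0, 1, 3), (0, 2, 1), (1, 2, 3),
    (2, 3, 1), (3, 4, 3), (2, 4, 1),
])
cut_value, (side_a, side_b) = nx.stoer_wagner(G)
print(cut_value, sorted(side_a), sorted(side_b))  # cut value 2, isolating {3, 4}
```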
Improved Approximation Algorithms for Uniform Connectivity Problems
J. Algorithms
"... The problem of finding minimum weight spanning subgraphs with a given connectivity requirement is considered. The problem is NP-hard when the connectivity requirement is greater than one. Polynomial time approximation algorithms for various weighted and unweighted connectivity problems are given. Th ..."
Abstract
-
Cited by 79 (3 self)
- Add to MetaCart
(Show Context)
The problem of finding minimum weight spanning subgraphs with a given connectivity requirement is considered. The problem is NP-hard when the connectivity requirement is greater than one. Polynomial time approximation algorithms for various weighted and unweighted connectivity problems are given. The following results are presented:
1. For the unweighted k-edge-connectivity problem, an approximation algorithm that achieves a performance ratio of 1.85 is described. This is the first polynomial-time algorithm that achieves a constant less than 2 for all k.
2. For the weighted k-vertex-connectivity problem, a constant factor approximation algorithm is given, assuming that the edge weights satisfy the triangle inequality. This is the first constant factor approximation algorithm for this problem.
3. For the case of biconnectivity, with no assumptions about the weights of the edges, an algorithm that achieves a factor asymptotically approaching 2 is described. This matches the previous best...
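For context on why 2 is the baseline that result 1 improves on: any minimal k-edge-connected spanning subgraph has at most kn edges while every k-edge-connected graph has at least kn/2, so reverse-delete gives a folklore factor-2 baseline. The sketch below is that baseline, not the paper's algorithm; the use of nx.edge_connectivity and the toy graph are illustrative.

```python
# Folklore 2-approximation for unweighted k-edge-connectivity: delete edges
# greedily while the subgraph stays k-edge-connected; the result is minimal.
# Each deletion test runs a full edge-connectivity computation (slow but simple).
import networkx as nx

def minimal_k_edge_connected(G, k):
    H = G.copy()
    for u, v in list(G.edges()):
        H.remove_edge(u, v)
        if nx.edge_connectivity(H) < k:
            H.add_edge(u, v)   # edge is essential, put it back
    return H

G = nx.complete_graph(6)       # 15 edges, 5-edge-connected
H = minimal_k_edge_connected(G, 2)
print(H.number_of_edges(), nx.edge_connectivity(H))  # e.g. 6 edges, still 2-edge-connected
```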
Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions
2009
"... Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys recent research which demonstrates that randomization offers a powerful tool for performing l ..."
Abstract
-
Cited by 62 (4 self)
- Add to MetaCart
(Show Context)
Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. In particular, these techniques offer a route toward principal component analysis (PCA) for petascale data. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider...
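Complementing the hand-rolled sketch above, the same family of algorithms is available as a library call: scikit-learn's randomized_svd utility (which the scikit-learn documentation attributes to this line of work). The matrix sizes and iteration count below are illustrative.

```python
# Randomized truncated SVD via scikit-learn; sizes are arbitrary assumptions.
import numpy as np
from sklearn.utils.extmath import randomized_svd

rng = np.random.default_rng(1)
A = rng.standard_normal((2000, 50)) @ rng.standard_normal((50, 800))  # rank-50 matrix
U, s, Vt = randomized_svd(A, n_components=20, n_iter=4, random_state=0)
print(s[:5])  # leading singular values
```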
Graph sketches: sparsification, spanners, and subgraphs
In PODS, 2012
"... When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses are sketches, i.e., tho ..."
Abstract
-
Cited by 46 (10 self)
- Add to MetaCart
(Show Context)
When processing massive data sets, a core task is to construct synopses of the data. To be useful, a synopsis data structure should be easy to construct while also yielding good approximations of the relevant properties of the data set. A particularly useful class of synopses is sketches, i.e., those based on linear projections of the data. These are applicable in many models, including various parallel, stream, and compressed sensing settings. A rich body of analytic and empirical work exists for sketching numerical data such as the frequencies of a set of entities. Our work investigates graph sketching, where the graphs of interest encode the relationships between these entities. The main challenge is to capture this richer structure and build the necessary synopses with only linear measurements. In this paper we consider properties of graphs including the size of the cuts, the distances between nodes, and the prevalence of...
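A minimal illustration of the defining property mentioned above: sketches are linear projections, so an edge deletion is just a negative update and sketches of separate streams can be summed. The Gaussian projection and the edge-count estimator below are illustrative stand-ins, far simpler than the paper's constructions.

```python
# Linear sketch of the edge-indicator vector of a graph stream: insertions
# and deletions are +/- column updates, and the number of surviving edges is
# estimated as the squared norm (Johnson-Lindenstrauss style). Dimensions
# and the estimator are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 20
num_pairs = n * (n - 1) // 2
d = 400                                    # sketch dimension
P = rng.standard_normal((d, num_pairs)) / np.sqrt(d)

def pair_index(u, v):                      # index of unordered pair {u, v}
    u, v = min(u, v), max(u, v)
    return u * n - u * (u + 1) // 2 + (v - u - 1)

sketch = np.zeros(d)
stream = [(0, 1, +1), (0, 2, +1), (1, 2, +1), (0, 1, -1)]  # insert 3, delete 1
for u, v, sign in stream:
    sketch += sign * P[:, pair_index(u, v)]  # one linear update per stream item

print('estimated #edges:', sketch @ sketch)  # close to 2, the true count
```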
Approximating Minimum-Size k-Connected Spanning Subgraphs via Matching
SIAM J. Comput., 1998
"... Abstract: An efficient heuristic is presented for the problem of finding a minimum-size k- connected spanning subgraph of an (undirected or directed) simple graph G =(V#E). There are four versions of the problem, and the approximation guarantees are as follows: minimum-size k-node connected spann ..."
Abstract
-
Cited by 43 (3 self)
- Add to MetaCart
Abstract: An efficient heuristic is presented for the problem of finding a minimum-size k-connected spanning subgraph of an (undirected or directed) simple graph G = (V, E). There are four versions of the problem, and the approximation guarantees are as follows: minimum-size k-node-connected spanning subgraph of an undirected graph, 1 + 1/k; minimum-size k-node-connected spanning subgraph of a directed graph, 1 + 1/k; minimum-size k-edge-connected spanning subgraph of an undirected graph, 1 + 2/(k + 1); and minimum-size k-edge-connected spanning subgraph of a directed graph, 1 + 4/√k.
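A quick numeric check of how the four stated guarantees behave as k grows (plain arithmetic on the formulas above; the printed values are derived, not taken from the paper).

```python
# Evaluating the four approximation guarantees for a few values of k.
from math import sqrt

for k in (2, 4, 16, 64):
    print(f"k={k:3d}  node (undir/dir): {1 + 1/k:.3f}  "
          f"edge-undir: {1 + 2/(k + 1):.3f}  edge-dir: {1 + 4/sqrt(k):.3f}")
```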