Results 1 - 10 of 309
Similarity flooding: A versatile graph matching algorithm
2002
Cited by 322 (9 self)
Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, and even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and, if necessary, adjust the results. In fact, we evaluate the ‘accuracy’ of the algorithm by counting the number of needed adjustments. We conducted a user study in which our accuracy metric was used to estimate the labor savings that users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.
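The fixpoint computation described above can be sketched roughly as follows. This is a simplified illustration, not the authors' implementation. It assumes graphs given as (source, label, target) triples, a uniform initial similarity of 1.0 for every node pair, propagation weights of 1/k for the k same-labelled edges leaving (or entering) a node pair, and the basic update σ ← normalize(σ + φ(σ)). The paper's filters and its alternative update formulas are omitted, and the function name `similarity_flooding` is hypothetical.

```python
from collections import defaultdict


def similarity_flooding(edges1, edges2, iters=50, eps=1e-6):
    """Simplified similarity-flooding sketch (hypothetical helper).

    edges1, edges2: lists of (source, label, target) triples.
    Returns a dict mapping node pairs (n1, n2) to a similarity in [0, 1].
    """
    # Pairwise connectivity graph: pair (a, a2) links to pair (b, b2) whenever
    # both graphs contain an edge with the same label between those nodes.
    pcg = [((a, a2), lab, (b, b2))
           for (a, lab, b) in edges1
           for (a2, lab2, b2) in edges2
           if lab == lab2]
    if not pcg:
        return {}

    # Count same-labelled edges leaving / entering each node pair.
    out_count = defaultdict(int)
    in_count = defaultdict(int)
    for src, lab, dst in pcg:
        out_count[(src, lab)] += 1
        in_count[(dst, lab)] += 1

    # Induced propagation graph: similarity flows along each PCG edge in both
    # directions, split evenly among same-labelled sibling edges (assumption).
    incoming = defaultdict(list)  # node pair -> [(neighbour pair, weight)]
    for src, lab, dst in pcg:
        incoming[dst].append((src, 1.0 / out_count[(src, lab)]))
        incoming[src].append((dst, 1.0 / in_count[(dst, lab)]))

    nodes = {src for src, _, _ in pcg} | {dst for _, _, dst in pcg}
    sigma = {n: 1.0 for n in nodes}  # assumed uniform initial similarity
    for _ in range(iters):
        # Basic fixpoint step: keep own value, add weighted neighbour values.
        new = {n: sigma[n] + sum(sigma[m] * w for m, w in incoming[n])
               for n in nodes}
        top = max(new.values())
        new = {n: v / top for n, v in new.items()}  # normalize so the max is 1
        delta = max(abs(new[n] - sigma[n]) for n in nodes)
        sigma = new
        if delta < eps:
            break
    return sigma
```

For example, with `g1 = [("a", "l", "b"), ("a", "l", "c"), ("b", "m", "d")]` and `g2 = [("x", "l", "y"), ("x", "l", "z"), ("y", "m", "w")]`, the pair `("b", "y")` ends up more similar than `("b", "z")`, because it also receives similarity from the matching `m` edges. A human would then filter and adjust this ranking, as the abstract describes.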
Bidding and Allocation in Combinatorial Auctions
In ACM Conference on Electronic Commerce, 2000
Cited by 218 (12 self)
When an auction of multiple items is performed, it is often desirable to allow bids on combinations of items, as opposed to only on single items. Such an auction is often called "combinatorial", and the exponential number of possible combinations makes many aspects of such an auction computationally intractable. This paper considers two of these aspects: the bidding language and the allocation algorithm. First we consider which kinds of bids on combinations are allowed and how, i.e., in what language, they are specified. The basic tradeoff is the expressiveness of the language versus its simplicity. We consider and formalize several bidding languages and compare their strengths. We prove exponential separations between the expressive power of different languages, and show that one language, "OR-bids with phantom items", can polynomially simulate the others. We then consider the problem of determining the best allocation, a problem known to be computationally intractable. We suggest an approach based on Linear Programming (LP) and motivate it. We prove that the LP approach finds an optimal allocation if and only if prices can be attached to single items in the auction. We pinpoint several classes of auctions where this is the case, and suggest greedy and branch-and-bound heuristics based on LP for other cases.
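To make the allocation problem concrete: under OR semantics, any collection of pairwise-disjoint atomic bids can win together, so the seller must pick the disjoint collection with the highest total price (a weighted set-packing problem). The exhaustive search below is a hypothetical baseline for tiny inputs only, exponential in the number of bids. It is not the LP-based approach proposed in the paper, and it ignores phantom items and other bidding-language features.

```python
from itertools import combinations


def best_allocation(bids):
    """Exhaustive winner determination for OR-combined atomic bids.

    bids: list of (frozenset of items, price).
    Returns (best total price, tuple of indices of the winning bids).
    """
    best_value, best_set = 0, ()
    for r in range(1, len(bids) + 1):
        for combo in combinations(range(len(bids)), r):
            sold = set()
            feasible = True
            for i in combo:
                items = bids[i][0]
                if sold & items:  # two winning bids may not share an item
                    feasible = False
                    break
                sold |= items
            if feasible:
                value = sum(bids[i][1] for i in combo)
                if value > best_value:
                    best_value, best_set = value, combo
    return best_value, best_set
```

For example, with bids of 5 on {A, B}, 3 on {A}, 3 on {B}, and 1 on {C}, the best allocation accepts the three single-item bids for a total of 7, beating the combination bid plus {C} (total 6).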
The Markov Chain Monte Carlo method: an approach to approximate counting and integration
1996
Cited by 203 (13 self)
In the area of statistical physics, Monte Carlo algorithms based on Markov chain simulation have been in use for many years. The validity of these algorithms depends crucially on the rate of convergence to equilibrium of the Markov chain being simulated. Unfortunately, the classical theory of stochastic processes hardly touches on the sort of non-asymptotic analysis required in this application. As a consequence, it had previously not been possible to make useful, mathematically rigorous statements about the quality of the estimates obtained. Within the last ten years, analytical tools have been devised with the aim of correcting this deficiency. As well as permitting the analysis of Monte Carlo algorithms for classical problems in statistical physics, the introduction of these tools has spurred the development of new approximation algorithms for a wider class of problems in combinatorial enumeration and optimization. The “Markov chain Monte Carlo” method has been applied to a variety of such problems, and often provides the only known efficient (i.e., polynomial time) solution technique.
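As a toy illustration of the MCMC idea (our choice of example, not one taken from the survey): a lazy single-site ("Glauber") chain on the independent sets of a graph has symmetric transition probabilities, so its stationary distribution is uniform, and averaging along one long run estimates quantities such as the fraction of independent sets that contain each vertex. The sketch makes no attempt at the rigorous mixing-time analysis the abstract emphasizes; the step count is simply assumed to be large enough, and the function names are hypothetical.

```python
import random
from itertools import combinations


def glauber_marginals(adj, steps, seed=0):
    """Estimate, per vertex, the fraction of independent sets containing it.

    adj: adjacency lists of an undirected graph on vertices 0..n-1.
    Uses a lazy single-site chain whose stationary distribution is uniform
    over independent sets (no burn-in period, for simplicity).
    """
    rng = random.Random(seed)
    n = len(adj)
    current = set()  # start from the empty independent set
    counts = [0] * n
    for _ in range(steps):
        v = rng.randrange(n)
        if rng.random() < 0.5:
            current.discard(v)  # with probability 1/2, remove v (if present)
        elif all(u not in current for u in adj[v]):
            current.add(v)  # otherwise add v, but only if the set stays independent
        for u in current:
            counts[u] += 1
    return [c / steps for c in counts]


def exact_marginals(adj):
    """Brute-force enumeration of all independent sets, for comparison."""
    n = len(adj)
    sets = [s for r in range(n + 1) for s in combinations(range(n), r)
            if all(u not in s for v in s for u in adj[v])]
    return [sum(v in s for s in sets) / len(sets) for v in range(n)]
```

On the path 0-1-2 there are five independent sets, so the exact marginals are 0.4, 0.2 and 0.4; a run of a few hundred thousand steps should land close to these values. Enumeration, by contrast, blows up exponentially on larger graphs, which is where the chain becomes useful.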
An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence, 1993
Cited by 200 (0 self)
A novel graph theoretic approach for data clustering is presented and its application to the image segmentation problem is demonstrated. The data to be clustered are represented by an undirected adjacency graph G with arc capacities assigned to reflect the similarity between the linked vertices. Clustering is achieved by removing arcs of G to form mutually exclusive subgraphs such that the largest inter-subgraph maximum flow is minimized. For graphs of moderate size (around 2000 vertices), the optimal solution is obtained by partitioning a flow and cut equivalent tree of G, which can be efficiently constructed using the Gomory-Hu algorithm. However, for larger graphs this approach is impractical. New theorems for subgraph condensation are derived and then used to develop a fast algorithm that hierarchically constructs and partitions a partially equivalent tree of much reduced size. This algorithm yields an optimal solution equivalent to that obtained by partitioning the complete equivalent tree and can handle very large graphs with several hundred thousand vertices. The new clustering algorithm is applied to the image segmentation problem. Segmentation is achieved by effectively searching for closed contours of edge elements (equivalent to minimum cuts in G), which consist mostly of strong edges, while rejecting contours containing isolated strong edges. This method accurately locates region boundaries and at the same time guarantees the formation of closed edge contours.
Index Terms: Clustering, edge contours, flow and cut equivalent tree, graph theory, image segmentation, subgraph condensation.
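The clustering criterion can be illustrated on small graphs. The sketch below is a hypothetical illustration rather than the paper's algorithm. It builds an equivalent flow tree with Gusfield's variant of the Gomory-Hu construction (no vertex contraction; Edmonds-Karp max flow on a dense capacity matrix), then forms k clusters by deleting the k-1 lightest tree edges. Because the maximum flow between two vertices equals the smallest capacity on their tree path, every inter-cluster maximum flow is then bounded by the largest deleted capacity. The subgraph-condensation speedups that let the paper scale to hundreds of thousands of vertices are not attempted.

```python
from collections import deque


def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp max flow on a symmetric capacity matrix (undirected graph).

    Returns (flow value, set of vertices on the source side of a minimum cut).
    """
    n = len(cap)
    residual = [row[:] for row in cap]
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            # No augmenting path: vertices still reachable from s form the cut side.
            return flow, {v for v in range(n) if parent[v] != -1}
        # Find the bottleneck capacity along the path, then push that much flow.
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def equivalent_flow_tree(cap):
    """Gusfield's construction: n-1 max-flow computations, no contraction."""
    n = len(cap)
    parent, weight = [0] * n, [0] * n
    for s in range(1, n):
        t = parent[s]
        weight[s], side = max_flow_min_cut(cap, s, t)
        for i in range(s + 1, n):
            if i in side and parent[i] == t:
                parent[i] = s
    return [(s, parent[s], weight[s]) for s in range(1, n)]


def cluster(cap, k):
    """Split into k groups by deleting the k-1 lightest equivalent-tree edges."""
    n = len(cap)
    edges = sorted(equivalent_flow_tree(cap), key=lambda e: e[2])
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, v, _ in edges[k - 1:]:  # keep all but the k-1 lightest edges
        root[find(u)] = find(v)
    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())
```

On two triangles with internal capacities 5 joined by a single bridge of capacity 1, `cluster(cap, 2)` separates the triangles, since the bridge is the only tree edge of weight 1.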
Improved Steiner Tree Approximation in Graphs
2000
Cited by 168 (6 self)
The Steiner tree problem in weighted graphs seeks a minimum-weight connected subgraph containing a given subset of the vertices (terminals). We present a new polynomial-time heuristic with an approximation ratio approaching $1 + \frac{\ln 3}{2} \approx 1.55$, which improves upon the previously best-known approximation algorithm of [10], with performance ratio 1.59.
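For contrast with the 1.55 ratio, the classic baseline is the minimum-spanning-tree heuristic with ratio 2: compute shortest-path distances between terminals, take a minimum spanning tree of that metric closure, and expand each closure edge back into a shortest path. The sketch below shows only this textbook heuristic, not the paper's algorithm. The adjacency-dict input format and the function names are assumptions, and the graph is assumed to be connected.

```python
import heapq


def dijkstra(adj, src):
    """Shortest-path distances and predecessor links from src."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def steiner_mst_approx(adj, terminals):
    """MST heuristic on the terminals' metric closure (ratio at most 2).

    adj: dict vertex -> list of (neighbour, weight), undirected and connected.
    Returns (set of undirected edges as frozensets, their total weight).
    """
    info = {t: dijkstra(adj, t) for t in terminals}
    in_tree, edges = {terminals[0]}, set()
    while len(in_tree) < len(terminals):
        # Prim step: cheapest closure edge from the current tree to a new terminal.
        _, a, b = min((info[a][0][b], a, b)
                      for a in in_tree for b in terminals if b not in in_tree)
        in_tree.add(b)
        # Expand the closure edge (a, b) into a shortest a-b path.
        prev, v = info[a][1], b
        while v != a:
            edges.add(frozenset((prev[v], v)))
            v = prev[v]
    weight = {frozenset((u, v)): w for u in adj for v, w in adj[u]}
    # Shared path segments are counted once, so the total never exceeds the MST weight.
    return edges, sum(weight[e] for e in edges)
```

The ratio-2 bound follows because the MST of the metric closure weighs at most twice the optimal Steiner tree. The paper's heuristic improves on this by also considering small groups of terminals joined through Steiner vertices.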
An interactive clustering-based approach to integrating source query interfaces on the deep web
In SIGMOD, 2004
Cited by 73 (14 self)
An increasing number of data sources are becoming available on the Web, but often their contents are only accessible through query interfaces. For a domain of interest, there often exist many such sources with varied coverage or querying capabilities. As an important step toward the integration of these sources, we consider the integration of their query interfaces. More specifically, we focus on the crucial step of the integration: accurately matching the interfaces. While the integration of query interfaces has received more attention recently, current approaches are not sufficiently general: (a) they all model interfaces with flat schemas; (b) most of them only consider 1:1 mappings of fields over the interfaces; (c) they all perform the integration in a blackbox-like fashion, and the whole process has to be restarted from scratch if anything goes wrong; and (d) they often require laborious parameter tuning. In this paper, we propose an interactive, clustering-based approach to matching query interfaces. The hierarchical nature of interfaces is captured with ordered trees. Varied types of complex mappings of fields are examined, and several approaches are proposed to effectively identify these mappings. We put the human integrator back in the loop and propose several novel approaches to the interactive learning of parameters and the resolution of uncertain mappings. Extensive experiments are conducted, and the results show that our approach is highly effective.
The Traveling Salesman Problem and Its Variations
2002
Cited by 72 (3 self)
Introduction. The Maximum Traveling Salesman Problem (MAX TSP), also known informally as the "taxicab ripoff problem", is stated as follows: given an $n \times n$ real matrix $c = (c_{ij})$, called a weight matrix, find a Hamiltonian cycle $i_1 \mapsto i_2 \mapsto \dots \mapsto i_n \mapsto i_1$ for which the maximum value of $c_{i_1 i_2} + c_{i_2 i_3} + \dots + c_{i_{n-1} i_n} + c_{i_n i_1}$ is attained. Here $(i_1, \dots, i_n)$ is a permutation of the set $\{1, \dots, n\}$. Of course, in this general setting, the Maximum Traveling Salesman Problem is equivalent to the Minimum Traveling Salesman Problem, since the maximum-weight Hamiltonian cycle with the weight matrix $c$ corresponds to the minimum-weight Hamiltonian cycle with the weight matrix $-c$. What makes the MAX TSP special is that there are some interesting and natural special cases of weights $c_{ij}$, not preserved by the sign reversal, where much more can be said about the problem than in the general case. Be ...
(Partially supported by NSF Grant DMS 9734138.)
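The definition and the sign-reversal equivalence can be checked directly on a tiny instance. This brute-force sketch enumerates the Hamiltonian cycles that start at vertex 0 and returns the best ordering; maximizing with $c$ and minimizing with $-c$ select cycles of the same weight. The helper names are hypothetical, vertices are numbered from 0 rather than 1, and the running time is factorial, so the sketch only suits very small $n$.

```python
from itertools import permutations


def cycle_weight(c, order):
    """Weight c[i1][i2] + ... + c[i_{n-1}][i_n] + c[i_n][i_1] of a closed tour."""
    n = len(order)
    return sum(c[order[k]][order[(k + 1) % n]] for k in range(n))


def best_cycle(c, maximize=True):
    """Brute-force best Hamiltonian cycle; vertex 0 is fixed as the start."""
    n = len(c)
    tours = ((0,) + p for p in permutations(range(1, n)))
    pick = max if maximize else min
    return pick(tours, key=lambda order: cycle_weight(c, order))
```

For example, on `c = [[0, 1, 5, 1], [1, 0, 3, 7], [5, 3, 0, 4], [1, 7, 4, 0]]` the maximum cycle 0, 1, 3, 2 has weight 17, the minimum cycle has weight 9, and minimizing over the negated matrix again yields a cycle of weight 17 under `c`.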
Provisioning a Virtual Private Network: A network design problem for multicommodity flow
In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing, 2001
Cited by 69 (12 self)
Consider a setting in which a group of nodes, situated in a large underlying network, wishes to reserve bandwidth on which to support communication. Virtual private networks (VPNs) are services that support such a construct; rather than building a new physical network on the group of nodes that must be connected, bandwidth in the underlying network is reserved for communication within the group, forming a virtual “sub-network.” Provisioning a virtual private network over a set of terminals gives rise to the following general network design problem. We have bounds on the cumulative amount of traffic each terminal can send and receive; we must choose a path for each pair of terminals, and a bandwidth allocation for each edge of the network, so that any traffic matrix consistent with the given upper bounds can be feasibly routed. Thus, we are seeking to design a network that can support a continuum of possible traffic scenarios. We provide optimal and approximate algorithms for several variants of this problem, depending on whether the traffic matrix is required to be symmetric, and on whether the designed network is required to be a tree (a natural constraint in a number of basic applications). We also establish a relation between this collection of network design problems and a variant of the facility location problem introduced by Karger and Minkoff; we extend their results by providing a stronger approximation algorithm for this latter problem.
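For one special case, the reservation on a fixed tree is easy to state. Assume the symmetric variant, where each terminal i may send and receive at most b_i in total and pairs exchange undirected traffic. Deleting a tree edge e splits the terminals into sides L and R. Every unit crossing e uses budget at one terminal on each side, so the traffic over e is at most min(b(L), b(R)), and routing a transportation between the two sides attains that bound. The sketch below computes this reservation per edge and the resulting cost of a given tree. It evaluates a tree rather than choosing one (choosing the tree is the optimization problem the paper addresses), assumes unit edge costs unless a cost dict is given, and its function names are hypothetical.

```python
from collections import defaultdict


def tree_reservations(tree_edges, bounds):
    """Per-edge bandwidth needed on a tree in the symmetric model.

    tree_edges: list of (u, v) pairs forming a tree.
    bounds: dict terminal -> traffic bound b_i (non-terminals may be omitted).
    Returns dict (u, v) -> min(b(L), b(R)).
    """
    adj = defaultdict(list)
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    total = sum(bounds.values())

    def side_sum(start, blocked):
        # Sum of bounds in the component of `start` after removing edge (start, blocked).
        seen, stack, acc = {start, blocked}, [start], 0
        while stack:
            x = stack.pop()
            acc += bounds.get(x, 0)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        return acc

    result = {}
    for u, v in tree_edges:
        left = side_sum(u, v)
        result[(u, v)] = min(left, total - left)
    return result


def tree_cost(tree_edges, bounds, cost=None):
    """Total reservation cost; edge costs default to 1 (an assumption)."""
    cost = cost or {}
    res = tree_reservations(tree_edges, bounds)
    return sum(cost.get(e, 1) * r for e, r in res.items())
```

For a star with a non-terminal hub `h` and terminals with bounds 1, 2 and 3, each spoke reserves exactly its own terminal's bound (1, 2 and 3), because no single terminal holds more than half of the total budget of 6.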
The objective method: Probabilistic combinatorial optimization and local weak convergence
2003
"... ..."

