Results 1-10 of 402
Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching
, 2002
Cited by 436 (11 self)
Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and if necessary adjust the results. As a matter of fact, we evaluate the 'accuracy' of the algorithm by counting the number of needed adjustments. We conducted a user study, in which our accuracy metric was used to estimate the labor savings that the users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.
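To make the idea of a fixpoint computation over two graphs concrete, here is a minimal, simplified sketch. It is not the authors' implementation. It assumes graphs given as lists of (source, label, target) triples, and it builds a pairwise connectivity graph whose nodes are node pairs. It also assumes a propagation scheme in which a pair node's similarity is split evenly over its same-label edges. The loop then iterates the basic update "new = normalize(old + propagated)" until the values stop changing. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def similarity_flooding(g1, g2, init=None, max_iter=100, eps=1e-6):
    """Simplified similarity-flooding fixpoint (illustrative sketch, not the paper's code).

    g1, g2: lists of (source, label, target) triples.
    init: optional dict {(node1, node2): initial similarity}; missing pairs start at 1.0.
    Returns a dict {(node1, node2): similarity in [0, 1]}.
    """
    # Pairwise connectivity graph: (a, b) -> (a2, b2) whenever g1 has a -l-> a2
    # and g2 has b -l-> b2 with the same label l.
    pcg = [((a, b), la, (a2, b2))
           for a, la, a2 in g1
           for b, lb, b2 in g2
           if la == lb]
    if not pcg:
        return {}
    # Assumed propagation coefficients: weight is split evenly over all
    # same-label edges leaving (forward) or entering (backward) a pair node.
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for src, lab, dst in pcg:
        out_deg[src, lab] += 1
        in_deg[dst, lab] += 1
    propagate = defaultdict(list)  # pair node -> [(neighbour, coefficient)]
    for src, lab, dst in pcg:
        propagate[src].append((dst, 1.0 / out_deg[src, lab]))
        propagate[dst].append((src, 1.0 / in_deg[dst, lab]))

    init = init or {}
    sigma = {n: init.get(n, 1.0) for n in propagate}
    for _ in range(max_iter):
        # Basic update: sigma_next = normalize(sigma + phi(sigma)).
        nxt = dict(sigma)
        for node, nbrs in propagate.items():
            for nbr, coeff in nbrs:
                nxt[nbr] += sigma[node] * coeff
        top = max(nxt.values()) or 1.0  # guard against an all-zero vector
        nxt = {n: v / top for n, v in nxt.items()}
        done = max(abs(nxt[n] - sigma[n]) for n in nxt) < eps
        sigma = nxt
        if done:
            break
    return sigma
```

A filter, for example keeping only pairs above a threshold, would then select the subset of the mapping that a human reviews.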
An optimal graph theoretic approach to data clustering: Theory and its application to image segmentation
 IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1993
Cited by 268 (0 self)
A novel graph-theoretic approach for data clustering is presented and its application to the image segmentation problem is demonstrated. The data to be clustered are represented by an undirected adjacency graph G with arc capacities assigned to reflect the similarity between the linked vertices. Clustering is achieved by removing arcs of G to form mutually exclusive subgraphs such that the largest inter-subgraph maximum flow is minimized. For graphs of moderate size (~2000 vertices), the optimal solution is obtained through partitioning a flow and cut equivalent tree of G, which can be efficiently constructed using the Gomory-Hu algorithm. However, for larger graphs this approach is impractical. New theorems for subgraph condensation are derived and are then used to develop a fast algorithm which hierarchically constructs and partitions a partially equivalent tree of much reduced size. This algorithm results in an optimal solution equivalent to that obtained by partitioning the complete equivalent tree and is able to handle very large graphs with several hundred thousand vertices. The new clustering algorithm is applied to the image segmentation problem. The segmentation is achieved by effectively searching for closed contours of edge elements (equivalent to minimum cuts in G), which consist mostly of strong edges, while rejecting contours containing isolated strong edges. This method is able to accurately locate region boundaries and at the same time guarantees the formation of closed edge contours. Index Terms: clustering, edge contours, flow and cut equivalent tree, graph theory, image segmentation, subgraph condensation.
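The baseline the abstract describes for moderate-size graphs (build a flow and cut equivalent tree, then partition it) can be sketched as follows. This is an illustration under stated assumptions, not the paper's hierarchical condensation algorithm. It uses Gusfield's simplification of the Gomory-Hu construction (n-1 max-flow runs, here Edmonds-Karp on a dense capacity matrix). It also assumes the desired number of clusters k is given, and it obtains k clusters by deleting the k-1 lightest tree edges. All function names are hypothetical.

```python
from collections import deque


def max_flow_min_cut(cap, s, t):
    """Edmonds-Karp on a symmetric capacity matrix.

    Returns (flow value, set of vertices on s's side of a minimum cut)."""
    n = len(cap)
    res = [row[:] for row in cap]  # residual capacities
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:  # BFS for a shortest augmenting path
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and res[u][v] > 0:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            # No augmenting path: vertices reachable from s form the source side.
            return flow, {v for v in range(n) if parent[v] != -1}
        bottleneck, v = float("inf"), t
        while v != s:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:
            res[parent[v]][v] -= bottleneck
            res[v][parent[v]] += bottleneck
            v = parent[v]
        flow += bottleneck


def gomory_hu_tree(cap):
    """Gusfield's algorithm: returns tree edges (u, parent_u, min-cut value)."""
    n = len(cap)
    parent, weight = [0] * n, [0] * n
    for i in range(1, n):
        weight[i], side = max_flow_min_cut(cap, i, parent[i])
        for j in range(i + 1, n):
            if j in side and parent[j] == parent[i]:
                parent[j] = i
    return [(i, parent[i], weight[i]) for i in range(1, n)]


def min_cut_clusters(cap, k):
    """Split into k clusters by deleting the k-1 lightest equivalent-tree edges."""
    kept = sorted(gomory_hu_tree(cap), key=lambda e: e[2])[k - 1:]
    comp = list(range(len(cap)))  # union-find over the remaining tree edges

    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    for u, v, _ in kept:
        comp[find(u)] = find(v)
    groups = {}
    for v in range(len(cap)):
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)
```

For example, two triangles with internal capacity 5 joined by a single capacity-1 arc come back as the two triangles when k = 2, because the only tree edge of weight 1 is the one separating them.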
Bidding and Allocation in Combinatorial Auctions
 In ACM Conference on Electronic Commerce
, 2000
Cited by 242 (11 self)
When an auction of multiple items is performed, it is often desirable to allow bids on combinations of items, as opposed to only on single items. Such an auction is often called "combinatorial", and the exponential number of possible combinations results in computational intractability of many aspects regarding such an auction. This paper considers two of these aspects: the bidding language and the allocation algorithm. First we consider which kinds of bids on combinations are allowed and how, i.e., in what language, they are specified. The basic tradeoff is the expressibility of the language versus its simplicity. We consider and formalize several bidding languages and compare their strengths. We prove exponential separations between the expressive power of different languages, and show that one language, "OR-bids with phantom items", can polynomially simulate the others. We then consider the problem of determining the best allocation, a problem known to be computationally intractable. We suggest an approach based on Linear Programming (LP) and motivate it. We prove that the LP approach finds an optimal allocation if and only if prices can be attached to single items in the auction. We pinpoint several classes of auctions where this is the case, and suggest greedy and branch-and-bound heuristics based on LP for other cases.
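The allocation problem can be illustrated with a brute-force sketch. This is a naive exhaustive search, not the paper's LP-based approach. It assumes each bid names one bundle and one price, and it accepts any set of bids whose bundles are pairwise disjoint. Trying every subset of bids shows directly why the search space grows exponentially. The function name is hypothetical.

```python
from itertools import combinations


def best_allocation(bids):
    """Exhaustive winner determination for single-bundle bids (toy instances only).

    bids: list of (set_of_items, price).
    Returns (best total price, tuple of accepted bid indices)."""
    best_value, best_accepted = 0, ()
    for r in range(1, len(bids) + 1):
        for accepted in combinations(range(len(bids)), r):
            sold = set()
            feasible = True
            for i in accepted:
                bundle = bids[i][0]
                if sold & bundle:  # an item would be sold twice
                    feasible = False
                    break
                sold |= bundle
            if feasible:
                value = sum(bids[i][1] for i in accepted)
                if value > best_value:
                    best_value, best_accepted = value, accepted
    return best_value, best_accepted
```

With a bid of 5 on {A, B} and bids of 3 on {A} and on {B}, the search accepts the two single-item bids for a total of 6.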
The Markov Chain Monte Carlo method: an approach to approximate counting and integration
, 1996
Cited by 234 (13 self)
In the area of statistical physics, Monte Carlo algorithms based on Markov chain simulation have been in use for many years. The validity of these algorithms depends crucially on the rate of convergence to equilibrium of the Markov chain being simulated. Unfortunately, the classical theory of stochastic processes hardly touches on the sort of non-asymptotic analysis required in this application. As a consequence, it had previously not been possible to make useful, mathematically rigorous statements about the quality of the estimates obtained. Within the last ten years, analytical tools have been devised with the aim of correcting this deficiency. As well as permitting the analysis of Monte Carlo algorithms for classical problems in statistical physics, the introduction of these tools has spurred the development of new approximation algorithms for a wider class of problems in combinatorial enumeration and optimization. The "Markov chain Monte Carlo" method has been applied to a variety of such problems, and often provides the only known efficient (i.e., polynomial time) solution technique.
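A small example of Markov chain simulation in a combinatorial setting is sketched below. It is a standard textbook-style chain chosen for illustration, not taken from the survey. The chain walks over the independent sets of a graph. At each step it picks a vertex at random and then, with equal probability, either tries to add it (allowed only if no neighbour is present) or removes it. The moves are symmetric and the chain is lazy, so its equilibrium distribution is uniform over independent sets. How fast it gets there is exactly the convergence question the abstract raises. The function name is hypothetical.

```python
import random
from collections import Counter


def sample_independent_sets(adj, steps, seed=0):
    """Simulate a lazy Markov chain on independent sets; return visit counts per state.

    adj: dict mapping each vertex to the set of its neighbours."""
    rng = random.Random(seed)
    vertices = sorted(adj)
    current = set()  # start from the empty independent set
    counts = Counter()
    for _ in range(steps):
        v = rng.choice(vertices)
        if rng.random() < 0.5:
            # Try to add v: legal only if v has no neighbour in the current set.
            if v not in current and not (adj[v] & current):
                current.add(v)
        else:
            current.discard(v)  # remove v if present (otherwise stay put)
        counts[frozenset(current)] += 1
    return counts
```

On the path 0-1-2, the five independent sets ({}, {0}, {1}, {2}, {0, 2}) should each be visited roughly one fifth of the time after enough steps.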
Improved Steiner Tree Approximation in Graphs
, 2000
Cited by 197 (8 self)
The Steiner tree problem in weighted graphs seeks a minimum weight connected subgraph containing a given subset of the vertices (terminals). We present a new polynomial-time heuristic with an approximation ratio approaching $1 + \frac{\ln 3}{2} \approx 1.55$, which improves upon the previously best-known approximation algorithm of [10] with performance ratio $1.59$.
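For comparison with the ratios above, the classic starting point is the minimum-spanning-tree heuristic. It is not the 1.55 algorithm of this paper. It computes shortest-path distances (the metric closure), then takes a minimum spanning tree over the terminals only, which is known to cost at most twice the optimum. The sketch below returns that MST weight, which upper-bounds the cost of the Steiner tree obtained by expanding each MST edge into a shortest path. Function names are hypothetical.

```python
INF = float("inf")


def metric_closure(n, edges):
    """All-pairs shortest-path distances (Floyd-Warshall) for an undirected weighted graph."""
    d = [[INF] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def steiner_mst_bound(n, edges, terminals):
    """Weight of an MST on the terminals in the metric closure (Prim's algorithm)."""
    d = metric_closure(n, edges)
    terms = list(terminals)
    in_tree = {terms[0]}
    total = 0
    while len(in_tree) < len(terms):
        # Cheapest closure edge from the current tree to a new terminal.
        w, v = min((d[u][t], t) for u in in_tree for t in terms if t not in in_tree)
        in_tree.add(v)
        total += w
    return total
```

On a star whose centre (a non-terminal) is joined to three terminals by unit edges, the heuristic pays 4 while the optimal Steiner tree (the star itself) costs 3, which shows the gap that better approximation algorithms aim to close.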
The Traveling Salesman Problem and Its Variations
, 2002
Cited by 96 (4 self)
Introduction. The Maximum Traveling Salesman Problem (MAX TSP), also known informally as the "taxicab rip-off problem", is stated as follows: Given an $n \times n$ real matrix $c = (c_{ij})$, called a weight matrix, find a Hamiltonian cycle $i_1 \mapsto i_2 \mapsto \dots \mapsto i_n \mapsto i_1$ for which the maximum value of $c_{i_1 i_2} + c_{i_2 i_3} + \dots + c_{i_{n-1} i_n} + c_{i_n i_1}$ is attained. Here $(i_1, \dots, i_n)$ is a permutation of the set $\{1, \dots, n\}$. Of course, in this general setting, the Maximum Traveling Salesman Problem is equivalent to the Minimum Traveling Salesman Problem, since the maximum weight Hamiltonian cycle with the weight matrix $c$ corresponds to the minimum weight Hamiltonian cycle with the weight matrix $-c$. What makes the MAX TSP special is that there are some interesting and natural special cases of weights $c_{ij}$, not preserved by the sign reversal, where much more can be said about the problem than in the general case. ... (Partially supported by NSF Grant DMS 9734138.)
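The definition and the sign-reversal equivalence can be checked on tiny instances with an exhaustive search over permutations. This sketch takes O(n!) time and only illustrates the problem statement; it is not an algorithm from the text. Cities are 0-indexed here and city 0 is fixed as the start of every tour. The function names are hypothetical.

```python
from itertools import permutations


def max_tsp(c):
    """Brute-force maximum-weight Hamiltonian cycle for a square weight matrix c (n >= 2).

    Returns (best weight, tour as a tuple of city indices starting at 0)."""
    n = len(c)
    best_value, best_tour = None, None
    for rest in permutations(range(1, n)):
        tour = (0,) + rest
        # Sum c[i_k][i_{k+1}] around the cycle, including the closing edge back to 0.
        value = sum(c[tour[k]][tour[(k + 1) % n]] for k in range(n))
        if best_value is None or value > best_value:
            best_value, best_tour = value, tour
    return best_value, best_tour


def min_tsp(c):
    """Minimum TSP through the sign reversal: a max-weight cycle for -c is a min-weight cycle for c."""
    value, tour = max_tsp([[-x for x in row] for row in c])
    return -value, tour
```

For an asymmetric 4-city matrix that rewards the cyclic order 0, 1, 2, 3, the maximum tour is that cycle, and running the same search on the negated matrix yields the minimum tour.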
The objective method: Probabilistic combinatorial optimization and local weak convergence
, 2003
An interactive clustering-based approach to integrating source query interfaces on the deep web
 In SIGMOD
, 2004
Cited by 89 (14 self)
An increasing number of data sources are now available on the Web, but often their contents are only accessible through query interfaces. For a domain of interest, there often exist many such sources with varied coverage or querying capabilities. As an important step toward the integration of these sources, we consider the integration of their query interfaces. More specifically, we focus on the crucial step of the integration: accurately matching the interfaces. While the integration of query interfaces has received more attention recently, current approaches are not sufficiently general: (a) they all model interfaces with flat schemas; (b) most of them only consider 1:1 mappings of fields over the interfaces; (c) they all perform the integration in a black-box fashion, and the whole process has to be restarted from scratch if anything goes wrong; and (d) they often require laborious parameter tuning. In this paper, we propose an interactive, clustering-based approach to matching query interfaces. The hierarchical nature of interfaces is captured with ordered trees. Varied types of complex mappings of fields are examined and several approaches are proposed to effectively identify these mappings. We put the human integrator back in the loop and propose several novel approaches to the interactive learning of parameters and the resolution of uncertain mappings. Extensive experiments are conducted and results show that our approach is highly effective.
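The general idea of clustering-based field matching can be illustrated with a deliberately simple toy. It does not reproduce the paper's method, which uses ordered trees, complex mappings and interactive parameter learning. Here field labels collected from several interfaces are compared by token-set Jaccard similarity. Any pair at or above an assumed threshold is merged (single-link clustering via union-find), and each resulting cluster is read as one matched concept. The names and the 0.5 threshold are hypothetical choices.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of the lowercase word sets of two labels."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def cluster_fields(labels, threshold=0.5):
    """Single-link clustering of field labels; returns sorted lists of labels."""
    parent = list(range(len(labels)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(labels)), 2):
        if jaccard(labels[i], labels[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters
    groups = {}
    for i, label in enumerate(labels):
        groups.setdefault(find(i), []).append(label)
    return sorted(groups.values())
```

Fields such as "Author" and "Author Name" from two bookstore interfaces would land in the same cluster, while "Price" stays on its own. A purely label-based rule like this cannot resolve uncertain cases, which is where the abstract puts the human integrator back in the loop.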
Fairness and Load Balancing in Wireless LANs Using Association Control
Cited by 85 (3 self)
Recent studies on operational wireless LANs (WLANs) have shown that the traffic load is often unevenly distributed among the access points (APs). Such load imbalance results in unfair bandwidth allocation among users. We argue that the load imbalance and consequent unfair bandwidth allocation can be greatly alleviated by intelligently associating users to APs, termed association control, rather than having users associate with the APs of strongest signal strength. In this paper, we present an efficient algorithmic solution to determine the user-AP associations for max-min fair bandwidth allocation. We provide a rigorous formulation of the association control problem, considering bandwidth constraints of both the wireless and backhaul links. We show the strong correlation between fairness and load balancing, which enables us to use load balancing techniques for obtaining optimal max-min fair bandwidth allocation. As this problem is NP-hard, we devise algorithms that achieve constant-factor approximation. In particular, we present a 2-approximation algorithm for unweighted users and a 3-approximation algorithm for weighted users. In our algorithms, we first compute a fractional association solution, in which users can be associated with multiple APs simultaneously. This solution guarantees the fairest bandwidth allocation in terms of max-min fairness. Then, by utilizing a rounding method, we obtain the integral solution from the fractional solution. We also consider time fairness and present a polynomial-time algorithm for optimal integral solution. We further extend our schemes for the online case where users may join and leave dynamically. Our simulations demonstrate that the proposed algorithms achieve close to optimal load balancing (i.e., max-min fairness) and they outperform commonly used heuristic approaches.
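The contrast between strongest-signal association and load-aware association can be shown with a small sketch. It is not the paper's fractional-solution-plus-rounding algorithm. It assumes that an AP's load is the total airtime needed to deliver one unit of traffic to each associated user (the sum of 1/rate). Under that assumption, if an AP gives all its users equal throughput, each gets 1/load. The highest available rate stands in for signal strength. The greedy heuristic is a hypothetical baseline that places each user on the AP that keeps the maximum load smallest.

```python
def ap_loads(assignment, rates, n_aps):
    """Assumed load model: sum over associated users of 1 / rate(user, AP)."""
    loads = [0.0] * n_aps
    for user, ap in enumerate(assignment):
        loads[ap] += 1.0 / rates[user][ap]
    return loads


def strongest_signal(rates):
    """Each user joins the AP offering its highest rate (proxy for strongest signal)."""
    return [max(range(len(r)), key=lambda a: r[a]) for r in rates]


def greedy_balanced(rates):
    """Hypothetical greedy: add users one at a time to the AP that minimizes the maximum load."""
    n_aps = len(rates[0])
    loads = [0.0] * n_aps
    assignment = []
    for r in rates:
        def max_load_if(a):
            # Maximum AP load if this user were associated with AP a.
            return max(loads[b] + (1.0 / r[a] if b == a else 0.0) for b in range(n_aps))
        best = min(range(n_aps), key=max_load_if)
        loads[best] += 1.0 / r[best]
        assignment.append(best)
    return assignment
```

With three users who each see rate 10 at one AP and rate 5 at the other, strongest-signal association puts everyone on the first AP (maximum load about 0.3), while the greedy split lowers the maximum load to 0.2. That raises the throughput of the worst-off users, which is the max-min fairness goal.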
Provisioning a Virtual Private Network: A network design problem for multicommodity flow
 In Proceedings of the 33rd Annual ACM Symposium on Theory of Computing
, 2001
Cited by 80 (12 self)
Consider a setting in which a group of nodes, situated in a large underlying network, wishes to reserve bandwidth on which to support communication. Virtual private networks (VPNs) are services that support such a construct; rather than building a new physical network on the group of nodes that must be connected, bandwidth in the underlying network is reserved for communication within the group, forming a virtual "subnetwork." Provisioning a virtual private network over a set of terminals gives rise to the following general network design problem. We have bounds on the cumulative amount of traffic each terminal can send and receive; we must choose a path for each pair of terminals, and a bandwidth allocation for each edge of the network, so that any traffic matrix consistent with the given upper bounds can be feasibly routed. Thus, we are seeking to design a network that can support a continuum of possible traffic scenarios. We provide optimal and approximate algorithms for several variants of this problem, depending on whether the traffic matrix is required to be symmetric, and on whether the designed network is required to be a tree (a natural constraint in a number of basic applications). We also establish a relation between this collection of network design problems and a variant of the facility location problem introduced by Karger and Minkoff; we extend their results by providing a stronger approximation algorithm for this latter problem.
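For the tree-shaped, symmetric variant, the bandwidth an edge must reserve follows from a simple counting argument. It is stated here as an assumption for illustration, not as the paper's algorithm. Suppose each terminal i may exchange at most b_i units of traffic in total and all routes follow the unique paths of a given tree. Removing an edge splits the terminals into two sides S and T, and the worst-case traffic across that edge is min(b(S), b(T)). The sketch computes this per edge with subtree sums. The function name is hypothetical, and vertex 0 is assumed to be the root.

```python
def tree_reservations(n, tree_edges, b):
    """Worst-case reservation per tree edge under symmetric per-terminal bounds b.

    tree_edges: list of (u, v) pairs forming a spanning tree on vertices 0..n-1.
    b: list of bounds (0 for non-terminals). Returns {(parent, child): bandwidth}."""
    adj = {v: [] for v in range(n)}
    for u, v in tree_edges:
        adj[u].append(v)
        adj[v].append(u)
    total = sum(b)
    parent = {0: None}
    order = [0]  # BFS order from the root; the list grows while it is scanned
    for u in order:
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
    sub = list(b)  # sub[v] = total bound of terminals in v's subtree
    for v in reversed(order):
        if parent[v] is not None:
            sub[parent[v]] += sub[v]
    # The edge above v separates its subtree (sub[v]) from the rest (total - sub[v]).
    return {(parent[v], v): min(sub[v], total - sub[v])
            for v in order if parent[v] is not None}
```

For a star whose non-terminal centre connects terminals with bounds 1, 2 and 3, the three edges need 1, 2 and 3 units. The heaviest leaf's edge is capped at 3 because the other two terminals can absorb at most 3 units in total.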