## Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching (2002)

Citations: 436 (11 self)

### BibTeX

@MISC{Melnik02similarityflooding:,

author = {Sergey Melnik and Hector Garcia-Molina and Erhard Rahm},

title = {Similarity Flooding: A Versatile Graph Matching Algorithm and Its Application to Schema Matching},

year = {2002}

}

### Abstract

Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and if necessary adjust the results. As a matter of fact, we evaluate the ‘accuracy’ of the algorithm by counting the number of needed adjustments. We conducted a user study, in which our accuracy metric was used to estimate the labor savings that the users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.
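
The fixpoint computation described in the abstract can be illustrated with a small sketch. The abstract does not spell out the propagation rule, so the code below makes assumptions: node pairs are linked when both graphs contain an equally labeled edge, each pair splits a weight of 1.0 evenly among the neighbors it reaches through one label, and every round adds the incoming similarity and rescales by the maximum. The function name `similarity_flooding`, its parameters, and the two toy graphs are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt


def similarity_flooding(edges_a, edges_b, initial=None, max_iter=100, eps=1e-4):
    """Return a similarity score for node pairs of two labeled graphs.

    edges_a, edges_b: iterables of (source, label, target) triples.
    initial: optional dict {(node_a, node_b): score}; defaults to 1.0 everywhere.
    """
    # Pair (a, b) is connected to (a2, b2) when both graphs have an edge
    # with the same label: a -> a2 in the first and b -> b2 in the second.
    pair_edges = [((a, b), label, (a2, b2))
                  for a, label, a2 in edges_a
                  for b, label_b, b2 in edges_b
                  if label == label_b]

    out_count = defaultdict(int)
    in_count = defaultdict(int)
    for src, label, dst in pair_edges:
        out_count[(src, label)] += 1
        in_count[(dst, label)] += 1

    # Similarity flows along each pair edge in both directions (assumption:
    # equal split of 1.0 among same-label neighbors in each direction).
    incoming = defaultdict(list)  # pair -> [(neighbor pair, weight)]
    for src, label, dst in pair_edges:
        incoming[dst].append((src, 1.0 / out_count[(src, label)]))
        incoming[src].append((dst, 1.0 / in_count[(dst, label)]))

    pairs = set(incoming)
    if initial is None:
        sigma = {p: 1.0 for p in pairs}
    else:
        sigma = {p: initial.get(p, 0.0) for p in pairs}

    for _ in range(max_iter):
        # One flooding round: keep the old score and add what neighbors send.
        nxt = {p: sigma[p] + sum(sigma[q] * w for q, w in incoming[p])
               for p in pairs}
        top = max(nxt.values(), default=1.0) or 1.0
        nxt = {p: v / top for p, v in nxt.items()}
        residual = sqrt(sum((nxt[p] - sigma[p]) ** 2 for p in pairs))
        sigma = nxt
        if residual < eps:  # scores stopped changing: fixpoint reached
            break
    return sigma


if __name__ == "__main__":
    # Two hypothetical toy graphs with edge labels l1 and l2.
    graph_a = [("a", "l1", "a1"), ("a", "l1", "a2"), ("a1", "l2", "a2")]
    graph_b = [("b", "l1", "b1"), ("b", "l2", "b2"), ("b2", "l2", "b1")]
    scores = similarity_flooding(graph_a, graph_b)
    for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(pair, round(score, 2))
```

The result is a ranked multimapping rather than a final answer: as the abstract notes, a filter then selects a subset of pairs, and a human reviews the outcome.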

### Citations

3249 | The anatomy of a large-scale hypertextual web search engine
- Brin, Page
- 1998
Citation Context ...ponds to random walks over graphs [16], as explained in [14]. A well-known example of using fixpoint computation for ranking nodes in graphs is the PageRank algorithm used in the Google search engine [3]. Unlike PageRank, our algorithm has two source graphs and extensively uses and depends on edge labeling. The filters that we proposed for choosing subsets of multimappings are based on the intuition ...

1872 | Randomized Algorithms
- Motwani, Raghavan
- 1995
Citation Context ...than those performed on models to be matched. In designing our algorithm and the filters, we borrowed ideas from three research areas. The fixpoint computation corresponds to random walks over graphs [16], as explained in [14]. A well-known example of using fixpoint computation for ranking nodes in graphs is the PageRank algorithm used in the Google search engine [3]. Unlike PageRank, our algorithm ha...

749 | Resource Description Framework (RDF) Model and Syntax Specification, W3C Recommendation
- Lassila, Swick
- 1999
Citation Context ...tric that we use for evaluating the algorithm is related to the precision/recall metrics developed in the context of information retrieval. The data model used in this paper is based on the RDF model [10]. For transforming native data into graphs we use graph-based models defined for different applications (see e.g. [1, 17, 7]). Proceedings of the 18th International Conference on Data Engineering (ICD...

504 | Matching Theory
- Lovász, Plummer
- 1986
Citation Context ... our selection dilemma is closely related to well-known matching problems in bipartite graphs, so that we can build on intuitions and algorithms developed for solving this class of problems (see e.g. [12, 8]). In the graph matching literature, a matching is defined as a mapping with cardinality [0,1]-[0,1], i.e., a set of edges no two of which are incident on the same node. A bipartite graph is one whose...

480 | Object exchange across heterogeneous information sources
- Papakonstantinou, Garcia-Molina, et al.
- 1995
Citation Context ...nformation retrieval. The data model used in this paper is based on the RDF model [10]. For transforming native data into graphs we use graph-based models defined for different applications (see e.g. [1, 17, 7]). [Page footer: Proceedings of the 18th International Conference on Data Engineering (ICDE’02), 1063-6382/02 $17.00 © 2002 IEEE] 8. Conclusion. In this paper we presented a simple structural algorithm based on fixpoi...

459 | Generic schema matching with Cupid
- Madhavan, Bernstein, et al.
- 2001
Citation Context ...oduces additional preparing and training effort. Concurrently and independently to the work reported in this paper, a generic schema matching approach called Cupid was developed at Microsoft Research [13]. It uses a comprehensive name matching based on synonym tables and other thesauri as well as a new structural matching approach considering data types and topological adjacency of schema elements. Ma...

351 | Reconciling Schemas of Disparate Data Sources: A Machine Learning Approach
- Doan, Domingos, et al.
Citation Context ...e how different filters and parameters of the algorithm affect the match results. For our study we used nine relatively simple match problems (see footnote 2). Some of the problems were borrowed from research papers [15, 6, 18]. Others were derived from data used on the web [Footnote 2: The complete specification of the match tasks handed out to the users is available at http://wwwdb.stanford.edu/~melnik/mm/sfa/] Proceedings of the 18th...

233 | The Stable Marriage Problem: Structure and Algorithms
- Gusfield, Irving
- 1989
Citation Context ... our selection dilemma is closely related to well-known matching problems in bipartite graphs, so that we can build on intuitions and algorithms developed for solving this class of problems (see e.g. [12, 8]). In the graph matching literature, a matching is defined as a mapping with cardinality [0,1]-[0,1], i.e., a set of edges no two of which are incident on the same node. A bipartite graph is one whose...
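
To make the link between selecting a one-to-one mapping and stable-marriage matching concrete, here is a minimal sketch in the Gale-Shapley style. The excerpt does not state which selection procedure the authors use, so this is an illustration under assumptions only: the function name `stable_marriage`, the input format, and the similarity scores are hypothetical.

```python
def stable_marriage(sim):
    """Pick a one-to-one matching from similarity scores.

    sim: dict {(left, right): score}. Both sides prefer partners with a
    higher score. Returns a dict {left: right}; nodes may stay unmatched.
    """
    lefts = sorted({left for left, _ in sim})
    # Each left node ranks its candidate right nodes, best score first.
    prefs = {
        left: sorted((r for (l, r) in sim if l == left),
                     key=lambda r, left=left: -sim[(left, r)])
        for left in lefts
    }
    next_choice = {left: 0 for left in lefts}
    engaged = {}  # right node -> left node currently holding it
    free = list(lefts)

    while free:
        left = free.pop()
        if next_choice[left] >= len(prefs[left]):
            continue  # no candidates remain; this node stays unmatched
        right = prefs[left][next_choice[left]]
        next_choice[left] += 1
        current = engaged.get(right)
        if current is None:
            engaged[right] = left
        elif sim[(left, right)] > sim[(current, right)]:
            engaged[right] = left  # right node switches to the better partner
            free.append(current)
        else:
            free.append(left)      # rejected; try the next candidate later

    return {left: right for right, left in engaged.items()}


if __name__ == "__main__":
    # Hypothetical scores: both x and y would like p best.
    scores = {("x", "p"): 0.9, ("y", "p"): 0.8,
              ("x", "q"): 0.7, ("y", "q"): 0.1}
    print(stable_marriage(scores))
```

No two selected pairs share a node, which matches the definition of a matching quoted above. A stable result is not necessarily the one with the largest total similarity, so other selection criteria remain possible.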

214 | Integration of heterogeneous databases without common domains using queries based on textual similarity
- Cohen
- 1998
Citation Context ...idering data types and topological adjacency of schema elements. Many other studies have used more sophisticated linguistic (name/text) matchers compared to our very simple string matcher, e.g. WHIRL [5]. The work in [15] addresses the related problem of determining mapping expressions between matching elements. In general, match algorithms developed by different researchers are hard to compare since...

192 | Schema Mapping as Query Discovery
- Miller, Haas, et al.
- 2000
Citation Context ...s and topological adjacency of schema elements. Many other studies have used more sophisticated linguistic (name/text) matchers compared to our very simple string matcher, e.g. WHIRL [5]. The work in [15] addresses the related problem of determining mapping expressions between matching elements. In general, match algorithms developed by different researchers are hard to compare since most of them are ...

135 | A vision for management of complex models
- Bernstein, Halevy, et al.
Citation Context ...rage [14]. The flooding algorithm is relatively insensitive to ‘errors’ in initial similarity values. 7. Related Work. Our work was inspired by model management scenarios presented in the vision paper [2] by Bernstein et al. In particular, our scripts use similar high-level operations on models. Such an approach can significantly simplify the development of metadata-based tasks and applications compar...

128 | SEMINT: A Tool for Identifying Attribute Correspondences in Heterogeneous Databases Using Neural Networks
- Li, Clifton
- 2000
Citation Context ...ings obtained by using an automatic matcher as accuracy of match result, defined as $1 - \frac{(n - c) + (m - c)}{m}$. In a perfect match, $c = n = m$, resulting in accuracy 1. Notice that $c/m$ and $c/n$ correspond to recall and precision of matching [11]. Hence, we can express match accuracy as a function of recall and precision as follows: $\text{Accuracy} = \text{Recall} \cdot \left(2 - \frac{1}{\text{Precision}}\right)$. In the above definition, the notion of accuracy only makes sense if precision...
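
A short worked example may help relate the adjustment-counting view of accuracy to recall and precision. It assumes that accuracy is one minus the number of needed deletions and additions divided by the size of the intended result, and that this equals Recall · (2 − 1/Precision). The function names and the sample pairs are hypothetical.

```python
def match_accuracy(suggested, intended):
    """Accuracy from counting the adjustments a human would have to make."""
    suggested, intended = set(suggested), set(intended)
    if not intended:
        raise ValueError("intended result must not be empty")
    correct = len(suggested & intended)
    deletions = len(suggested) - correct  # wrong suggestions to remove
    additions = len(intended) - correct   # missed pairs to add by hand
    return 1 - (deletions + additions) / len(intended)


def accuracy_from_recall_precision(recall, precision):
    """Same quantity expressed through recall and precision (assumed form)."""
    return recall * (2 - 1 / precision)


if __name__ == "__main__":
    # Hypothetical match result: 4 suggested pairs, 4 intended, 3 in common.
    suggested = {("a", "b"), ("a1", "b1"), ("a2", "b2"), ("a3", "b9")}
    intended = {("a", "b"), ("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
    correct = len(suggested & intended)
    recall = correct / len(intended)
    precision = correct / len(suggested)
    print(match_accuracy(suggested, intended))
    print(accuracy_from_recall_precision(recall, precision))
```

Under this form the value drops to zero at a precision of 0.5 and turns negative below it, meaning that fixing the suggestions would cost more than matching by hand.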

115 | Meaningful change detection in structured data
- Chawathe, Garcia-Molina
- 1997
Citation Context ...asuring the matching accuracy [11, 6] did not consider the extra work caused by wrong match proposals. Our accuracy metric is similar in spirit to measuring the length of edit scripts as suggested in [4]. However, we are counting the edit operations on mappings, rather than those performed on models to be matched. In designing our algorithm and the filters, we borrowed ideas from three research areas...

76 | On matching schemas automatically
- Rahm, Bernstein
- 2001
Citation Context ...e how different filters and parameters of the algorithm affect the match results. For our study we used nine relatively simple match problems (see footnote 2). Some of the problems were borrowed from research papers [15, 6, 18]. Others were derived from data used on the web [Footnote 2: The complete specification of the match tasks handed out to the users is available at http://wwwdb.stanford.edu/~melnik/mm/sfa/] Proceedings of the 18th...

32 | Post-genome informatics
- Kanehisa
- 2000
Citation Context ...lopers. In this scenario, matching helps identify moved or modified elements in these complex data structures. In bioinformatics, matching has been used for network analysis of molecular interactions [9]. In this domain, data instances represent metabolic networks of chemical compounds, or molecular assembly maps. Matching of molecular networks and biochemical pathways helps to predict metabolism of ...