Generic Schema Matching with Cupid
In The VLDB Journal, 2001
Cited by 593 (17 self)
Schema matching is a critical step in many applications, such as XML message mapping, data warehouse loading, and schema integration. In this paper, we investigate algorithms for generic schema matching, outside of any particular data model or application. We first present a taxonomy for past
Interior Point Methods in Semidefinite Programming with Applications to Combinatorial Optimization
SIAM Journal on Optimization, 1993
Cited by 557 (12 self)
and maximum stable set problems in perfect graphs, the maximum k partite subgraph problem in graphs, and va...
Community detection in graphs
2009
Cited by 801 (1 self)
The modern science of networks has brought significant advances to our understanding of complex systems. One of the most relevant features of graphs representing real systems is community structure, or clustering, i.e., the organization of vertices in clusters, with many edges joining vertices of the same cluster and comparatively few edges joining vertices of different clusters. Such
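As a concrete handle on the informal definition above (dense links within clusters, sparse links between them), the sketch below scores a partition with Newman-Girvan modularity, one widely used quality function for community structure. It compares the fraction of edges inside each cluster with the fraction expected under degree-preserving random wiring. The truncated abstract does not itself name modularity, so this illustrates the concept rather than the review's own content; the function name `modularity` and the two-triangle toy graph are hypothetical.

```python
from collections import defaultdict

def modularity(edges, community):
    """Q = sum over communities c of l_c / m - (d_c / 2m)^2.

    edges: list of undirected (u, v) pairs, no duplicates or self-loops.
    community: dict mapping each vertex to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # l_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: total degree of the vertices in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {v: "a" for v in range(6)}
print(modularity(edges, split))   # 5/14, about 0.357
print(modularity(edges, merged))  # 0.0
```

Splitting the graph along its single bridge scores higher than lumping every vertex into one cluster, matching the intuition of many intra-cluster and few inter-cluster edges.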
Biclustering of Expression Data
2000
Cited by 591 (0 self)
An efficient node-deletion algorithm is introduced to find submatrices...
Exact Sampling with Coupled Markov Chains and Applications to Statistical Mechanics
1996
Cited by 548 (13 self)
For many applications it is useful to sample from a finite set of objects in accordance with some particular distribution. One approach is to run an ergodic (i.e., irreducible aperiodic) Markov chain whose stationary distribution is the desired distribution on this set; after the Markov chain has run for M steps, with M sufficiently large, the distribution governing the state of the chain approximates the desired distribution. Unfortunately it can be difficult to determine how large M needs to be. We describe a simple variant of this method that determines on its own when to stop, and that outputs samples in exact accordance with the desired distribution. The method uses couplings, which have also played a role in other sampling schemes; however, rather than running the coupled chains from the present into the future, one runs from a distant point in the past up until the present, where the distance into the past that one needs to go is determined during the running of the al...
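The procedure described above (run coupled chains from a point in the past up to the present, going further back until the outcome no longer depends on the starting state) can be sketched on a toy chain. Everything concrete here is an assumption for illustration: the chain is a lazy plus-or-minus-one walk on {0, ..., n} whose transition matrix is symmetric, so its stationary distribution is uniform; its update rule is monotone, so tracking only the bottom and top starting states suffices; and the look-back time T is doubled on each retry. The essential detail is that the random numbers for times already visited are reused, not redrawn.

```python
import random

def update(x, u, n):
    """Monotone update for a toy lazy walk on {0, ..., n}: up if u < 0.5, else down, clamped."""
    return min(x + 1, n) if u < 0.5 else max(x - 1, 0)

def cftp(n, rng):
    """Coupling from the past: returns one exact sample from the walk's stationary law."""
    us = []  # us[k] drives the step taken at time -(k + 1); reused across restarts
    T = 1
    while True:
        while len(us) < T:
            us.append(rng.random())  # extend further into the past, keeping old randomness
        lo, hi = 0, n  # by monotonicity, bottom and top chains sandwich every other start
        for k in range(T - 1, -1, -1):  # run from time -T up to time 0
            lo = update(lo, us[k], n)
            hi = update(hi, us[k], n)
        if lo == hi:
            return lo  # every starting state has coalesced by time 0
        T *= 2  # look twice as far back and try again


rng = random.Random(1)
samples = [cftp(4, rng) for _ in range(20000)]
print([samples.count(s) / len(samples) for s in range(5)])  # each close to 1/5
```

Redrawing `us` on each retry instead of extending it would bias the output; reusing it is what makes the returned state an exact draw rather than an approximation.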
Good Error-Correcting Codes based on Very Sparse Matrices
1999
Cited by 741 (23 self)
We study two families of error-correcting codes defined in terms of very sparse matrices. "MN" (MacKay-Neal) codes were recently invented, and "Gallager codes" were first investigated in 1962 but appear to have been largely forgotten, in spite of their excellent properties. The decoding of both codes can be tackled with a practical sum-product algorithm. We prove that these codes are "very good," in that sequences of codes exist which, when optimally decoded, achieve information rates up to the Shannon limit. This result holds not only for the binary-symmetric channel but also for any channel with symmetric stationary ergodic noise. We give experimental results for binary-symmetric channels and Gaussian channels demonstrating that practical performance substantially better than that of standard convolutional and concatenated codes can be achieved; indeed, the performance of Gallager codes is almost as close to the Shannon limit as that of turbo codes.
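The abstract notes that both code families can be decoded with a practical sum-product algorithm. The sketch below is a generic log-likelihood-ratio (LLR) version of sum-product, i.e. belief propagation, decoding for hard-decision output of a binary-symmetric channel. The function name and the tiny repetition-code parity-check matrix are illustrative assumptions, not an MN or Gallager code from the paper, which would use a large, very sparse H.

```python
import math

def sum_product_decode(H, y, p, max_iter=50):
    """Log-domain sum-product decoding of a hard-decision BSC output.

    H: parity-check matrix as a list of 0/1 rows; y: received bits; p: crossover probability.
    Returns the decoded bits, stopping early once every parity check is satisfied.
    """
    m, n = len(H), len(H[0])
    llr0 = math.log((1 - p) / p)
    L = [llr0 if bit == 0 else -llr0 for bit in y]  # channel LLRs; positive favours bit 0
    vars_of = [[v for v in range(n) if H[c][v]] for c in range(m)]
    checks_of = [[c for c in range(m) if H[c][v]] for v in range(n)]
    edges = [(c, v) for c in range(m) for v in vars_of[c]]
    c2v = {e: 0.0 for e in edges}
    x = list(y)
    for _ in range(max_iter):
        # Variable-to-check: channel LLR plus messages from all *other* checks.
        v2c = {(c, v): L[v] + sum(c2v[(d, v)] for d in checks_of[v] if d != c)
               for (c, v) in edges}
        # Check-to-variable: tanh rule over all *other* variables in the check.
        for (c, v) in edges:
            prod = 1.0
            for w in vars_of[c]:
                if w != v:
                    prod *= math.tanh(v2c[(c, w)] / 2)
            prod = max(min(prod, 1 - 1e-12), -1 + 1e-12)  # keep atanh finite
            c2v[(c, v)] = 2 * math.atanh(prod)
        # Tentative hard decision from the full posterior LLR of each bit.
        x = [0 if L[v] + sum(c2v[(c, v)] for c in checks_of[v]) >= 0 else 1
             for v in range(n)]
        if all(sum(x[v] for v in vars_of[c]) % 2 == 0 for c in range(m)):
            break
    return x


# Toy example: length-3 repetition code (codewords 000 and 111), middle bit flipped.
H = [[1, 1, 0], [0, 1, 1]]
print(sum_product_decode(H, [0, 1, 0], p=0.1))  # [0, 0, 0]
```

On this toy Tanner graph, which has no cycles, the messages converge to exact bitwise posteriors; on the loopy sparse graphs of real Gallager or MN codes the same update rules are applied iteratively as an approximation.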
Empirical exchange rate models of the Seventies: do they fit out of sample?
Journal of International Economics, 1983
Cited by 831 (12 self)
This study compares the out-of-sample forecasting accuracy of various structural and time-series exchange rate models. We find that a random walk model performs as well as any estimated model at one- to twelve-month horizons for the dollar/pound, dollar/mark, dollar/yen and trade-weighted dollar exchange rates. The candidate structural models include the flexible-price (Frenkel-Bilson) and sticky-price (Dornbusch-Frankel) monetary models, and a sticky-price model which incorporates the current account (Hooper-Morton). The structural models perform poorly despite the fact that we base their forecasts on actual realized values of future explanatory variables.
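The evaluation design above (compare each model's out-of-sample forecasts with those of a random walk) can be sketched as a rolling loop that forecasts h steps ahead using only the data available at each forecast origin. The abstract does not name an accuracy metric; root mean squared error is used here as an assumption. The `drift` competitor, all function names, and the trending toy series are hypothetical and are not the study's models or data.

```python
import math

def rmse(preds, actuals):
    """Root mean squared forecast error."""
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(preds, actuals)) / len(actuals))

def out_of_sample(series, start, h, forecaster):
    """Forecast series[t + h] using only series[: t + 1], for every origin t >= start."""
    preds, actuals = [], []
    for t in range(start, len(series) - h):
        preds.append(forecaster(series[: t + 1], h))
        actuals.append(series[t + h])
    return preds, actuals

def random_walk(history, h):
    return history[-1]  # no-change forecast, whatever the horizon

def drift(history, h):
    # Hypothetical competitor: extrapolate the average past change (needs >= 2 points).
    steps = [b - a for a, b in zip(history, history[1:])]
    return history[-1] + h * sum(steps) / len(steps)


# Toy series with a steady trend (not exchange-rate data).
series = [1.0, 2.0, 3.0, 4.0, 5.0]
print(rmse(*out_of_sample(series, 1, 1, random_walk)))  # 1.0
print(rmse(*out_of_sample(series, 1, 1, drift)))        # 0.0
```

On this artificial trending series the drift rule wins easily; the study's finding is that on actual exchange-rate data the random walk performed as well as any estimated model at one- to twelve-month horizons.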
An Efficient Boosting Algorithm for Combining Preferences
1999
Cited by 707 (18 self)
The problem of combining preferences arises in several applications, such as combining the results of different search engines. This work describes an efficient algorithm for combining multiple preferences. We first give a formal framework for the problem. We then describe and analyze a new boosting algorithm for combining preferences called RankBoost. We also describe an efficient implementation of the algorithm for certain natural cases. We discuss two experiments we carried out to assess the performance of RankBoost. In the first experiment, we used the algorithm to combine different WWW search strategies, each of which is a query expansion for a given domain. For this task, we compare the performance of RankBoost to the individual search strategies. The second experiment is a collaborative-filtering task for making movie recommendations. Here, we present results comparing RankBoost to nearest-neighbor and regression algorithms.
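A boosting algorithm for preferences can be sketched as follows, in the general pair-based form of RankBoost as we understand it: keep a distribution over "crucial" pairs of items, pick a weak ranker each round, give it a weight alpha = 0.5 * ln((1 + r) / (1 - r)) where r measures its weighted agreement with the pairs, and multiplicatively up-weight the pairs it orders wrongly. This is a sketch under stated assumptions: the threshold weak rankers, the exhaustive choice of the best weak ranker, and all names are hypothetical, and the paper's efficient implementation for special cases is not reproduced.

```python
import math

def rankboost(pairs, weak_rankers, rounds):
    """Pair-based boosting sketch with weak rankers valued in [0, 1].

    pairs: (lo, hi) tuples meaning item `hi` should be ranked above item `lo`.
    Returns a scoring function H(x) = sum_t alpha_t * h_t(x).
    """
    D = {pair: 1.0 / len(pairs) for pair in pairs}  # distribution over crucial pairs
    ensemble = []
    for _ in range(rounds):
        # Choose the weak ranker whose weighted pairwise agreement |r| is largest.
        best_h, best_r = None, 0.0
        for h in weak_rankers:
            r = sum(w * (h(hi) - h(lo)) for (lo, hi), w in D.items())
            if abs(r) > abs(best_r):
                best_h, best_r = h, r
        if best_h is None:
            break  # no weak ranker carries any signal under D
        r = max(min(best_r, 1 - 1e-9), -1 + 1e-9)  # avoid an infinite weight
        alpha = 0.5 * math.log((1 + r) / (1 - r))
        ensemble.append((alpha, best_h))
        # Up-weight pairs best_h orders wrongly, down-weight pairs it orders correctly.
        for pair in D:
            lo, hi = pair
            D[pair] *= math.exp(alpha * (best_h(lo) - best_h(hi)))
        z = sum(D.values())
        for pair in D:
            D[pair] /= z
    return lambda x: sum(a * h(x) for a, h in ensemble)


# Toy feedback: items 0..3 should be ranked in increasing order.
pairs = [(0, 1), (1, 2), (2, 3), (0, 3)]
thresholds = [lambda x, k=k: 1.0 if x > k else 0.0 for k in range(3)]
score = rankboost(pairs, thresholds, rounds=3)
print(sorted(range(4), key=score))  # [0, 1, 2, 3]
```

Each round shifts weight toward the pairs the chosen weak ranker got wrong, so later rounds favour weak rankers that fix those pairs; here the three thresholds are picked in turn and their weighted sum reproduces the full order.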
Distributed hierarchical processing in the primate cerebral cortex
Cerebral Cortex, 1991
Cited by 901 (6 self)
In recent years, many new cortical areas have been identified in the macaque monkey. The number of identified connections between areas has increased even more dramatically. We report here on (1) a summary of the layout of cortical areas associated with vision and with other modalities, (2) a computerized database for storing and representing large amounts of information on connectivity patterns, and (3) the application of these data to the analysis of hierarchical organization of the cerebral cortex. Our analysis concentrates on the visual system, which includes 25 neocortical areas that are predominantly or exclusively visual in function, plus an additional 7 areas that we regard as visual-association areas on the basis of their extensive visual inputs. A total of 305 connections among these 32 visual and
Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations
2005
Cited by 534 (48 self)
How do real graphs evolve over time? What are “normal” growth patterns in social, technological, and information networks? Many studies have discovered patterns in static graphs, identifying properties in a single snapshot of a large network, or in a very small number of snapshots; these include heavy tails for in- and out-degree distributions, communities, small-world phenomena, and others. However, given the lack of information about network evolution over long periods, it has been hard to convert these findings into statements about trends over time. Here we study a wide range of real graphs, and we observe some surprising phenomena. First, most of these graphs densify over time, with the number of edges growing superlinearly in the number of nodes. Second, the average distance between nodes often shrinks over time, in contrast to the conventional wisdom that such distance parameters should increase slowly as a function of the number of nodes (like O(log n) or O(log(log n))). Existing graph generation models do not exhibit these types of behavior, even at a qualitative level. We provide a new graph generator, based on a “forest fire” spreading process, that has a simple, intuitive justification, requires very few parameters (like the “flammability” of nodes), and produces graphs exhibiting the full range of properties observed both in prior work and in the present study.
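The "forest fire" idea can be sketched as a growth loop: each new node links to a randomly chosen ambassador, then the fire spreads recursively, with each burned node passing it to a random number of its not-yet-burned neighbours, all of which the new node also links to. This is a simplified undirected variant and an assumption on our part: the published model has more than one burning parameter, which we fold into a single "flammability" p, and all names are hypothetical. The `densification_exponent` helper fits a log-log slope a in edges ~ nodes^a, one straightforward way to quantify the superlinear edge growth the abstract describes (a > 1).

```python
import math
import random

def geometric(rng, p):
    """Number of draws below p before the first draw that is not; mean p / (1 - p)."""
    k = 0
    while rng.random() < p:
        k += 1
    return k

def forest_fire(n, p, seed=0):
    """Simplified undirected forest-fire growth; returns adjacency sets and (nodes, edges) snapshots."""
    rng = random.Random(seed)
    adj = {0: set()}
    edge_count = 0
    snapshots = []
    for v in range(1, n):
        adj[v] = set()
        ambassador = rng.randrange(v)  # uniformly random existing node
        visited = {ambassador}
        frontier = [ambassador]
        while frontier:
            w = frontier.pop()
            adj[v].add(w)  # the new node links to every burned node
            adj[w].add(v)
            edge_count += 1
            # Burn a geometrically distributed number of w's not-yet-visited neighbours.
            candidates = sorted(u for u in adj[w] if u not in visited and u != v)
            for u in rng.sample(candidates, min(geometric(rng, p), len(candidates))):
                visited.add(u)
                frontier.append(u)
        snapshots.append((v + 1, edge_count))
    return adj, snapshots

def densification_exponent(snapshots):
    """Least-squares slope of log(edges) against log(nodes), i.e. a in e(t) ~ n(t)^a."""
    xs = [math.log(nodes) for nodes, _ in snapshots]
    ys = [math.log(edges) for _, edges in snapshots]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))


adj, snapshots = forest_fire(2000, p=0.37, seed=1)
print(densification_exponent(snapshots))  # a value above 1 would indicate densification
```

Because every burned node becomes a neighbour of the newcomer, a larger p lets fires spread further and adds more edges per arriving node.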
