Results 1-10 of 189
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
 IEEE Transactions on Knowledge and Data Engineering
, 2006
Cited by 122 (18 self)
Abstract: This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the “length” of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation to the “Fiedler vector,” widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called “statistical relational learning” framework, could also be used to compute document or word similarities, and, more generally, it could be applied to machine-learning and pattern-recognition tasks involving a relational database. Index Terms: Graph analysis, graph and database mining, collaborative recommendation, graph kernels, spectral clustering, Fiedler vector, proximity measures, statistical relational learning.
A Clustering Algorithm based on Graph Connectivity
 Information Processing Letters
, 1999
Cited by 102 (3 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques.
Mining email social networks
 in Proceedings of the 3rd International Workshop on Mining Software Repositories
, 2006
Cited by 94 (11 self)
Communication & Coordination activities are central to large software projects, but are difficult to observe and study in traditional (closed-source, commercial) settings because of the prevalence of informal, direct communication modes. OSS projects, on the other hand, use the internet as the communication medium, and typically conduct discussions in an open, public manner. As a result, the email archives of OSS projects provide a useful trace of the communication and coordination activities of the participants. However, there are various challenges that must be addressed before this data can be effectively mined. Once this is done, we can construct social networks of email correspondents, and begin to address some interesting questions. These include questions relating to participation in the email; the social status of different types of OSS participants; the relationship of email activity and commit activity (in the CVS repositories) and the relationship of social status with commit activity. In this paper, we begin with a discussion of our infrastructure and then discuss our approach to mining the email archives; and finally we present some preliminary results from our data analysis.
Landscapes and Their Correlation Functions
, 1996
Cited by 90 (15 self)
Fitness landscapes are an important concept in molecular evolution. Many important examples of landscapes in physics and combinatorial optimization, which are widely used as model landscapes in simulations of molecular evolution and adaptation, are "elementary", i.e., they are (up to an additive constant) eigenfunctions of a graph Laplacian. It is shown that elementary landscapes are characterized by their correlation functions. The correlation functions are in turn uniquely determined by the geometry of the underlying configuration space and the nearest neighbor correlation of the elementary landscape. Two types of correlation functions are investigated here: the correlation of a time series sampled along a random walk on the landscape and the correlation function with respect to a partition of the set of all vertex pairs.
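As a rough illustration of the "elementary landscape" idea in this abstract, the sketch below checks a toy case numerically. The choice of configuration space (the 3-dimensional hypercube), the landscape (number of ones in a bit string), and all names in the code are assumptions made for illustration; none of them are taken from the paper. On a regular graph of degree D, an eigenfunction with Laplacian eigenvalue lam is scaled by (1 - lam/D) at each random-walk step, which is what the autocorrelation check relies on.

```python
import itertools
import numpy as np

n = 3  # hypercube dimension (illustrative assumption)
vertices = list(itertools.product([0, 1], repeat=n))
index = {v: i for i, v in enumerate(vertices)}
N = len(vertices)

# Adjacency matrix: two bit strings are neighbors if they differ in one bit.
A = np.zeros((N, N))
for v in vertices:
    for k in range(n):
        w = list(v)
        w[k] ^= 1
        A[index[v], index[tuple(w)]] = 1.0

# Graph Laplacian L = D - A; every vertex of the hypercube has degree n.
L = n * np.eye(N) - A

# Toy landscape: f(x) = number of ones, centered by subtracting its mean.
f = np.array([sum(v) for v in vertices], dtype=float)
f_centered = f - f.mean()

# "Elementary" means the centered landscape is a Laplacian eigenfunction.
lam = 2.0
is_elementary = np.allclose(L @ f_centered, lam * f_centered)

# Transition matrix of the simple random walk on the regular graph.
T = A / n

def autocorrelation(s):
    """Autocorrelation of f along a stationary random walk at lag s."""
    walked = np.linalg.matrix_power(T, s) @ f_centered
    return float(f_centered @ walked / (f_centered @ f_centered))

print(is_elementary, [autocorrelation(s) for s in range(4)])
```

In this toy case the autocorrelation at lag s equals (1 - lam/n)**s, so it decays geometrically with the number of steps.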
A dynamic survey of graph labellings
 Electron. J. Combin., Dynamic Surveys(6):95pp
, 2001
Cited by 82 (0 self)
A graph labeling is an assignment of integers to the vertices or edges, or both, subject to certain conditions. Graph labelings were first introduced in the late 1960s. In the intervening years dozens of graph labeling techniques have been studied in over 1000 papers. Finding out what has been done for any particular kind of labeling and keeping up with new discoveries is difficult because of the sheer number of papers and because many of the papers have appeared in journals that are not widely available. In this survey I have collected everything I could find on graph labeling. For the convenience of the reader the survey includes a detailed table of contents and index.
Generic Properties of Combinatory Maps: Neutral Networks of RNA Secondary Structures
, 1995
Cited by 81 (36 self)
Random graph theory is used to model relationships between sequences and secondary structures of RNA molecules. Sequences folding into identical structures form neutral networks which percolate sequence space if the fraction of neutral nearest neighbors exceeds a threshold value. The networks of any two different structures almost touch each other, and sequences folding into almost all "common" structures can be found in a small ball of an arbitrary location in sequence space. The results from random graph theory are compared with data obtained by folding large samples of RNA sequences. Differences are explained in terms of RNA molecular structures.
The Principal Components Analysis of a Graph, and its Relationships to Spectral Clustering
 Proceedings of the 15th European Conference on Machine Learning (ECML 2004). Lecture Notes in Artificial Intelligence
, 2004
Cited by 69 (18 self)
This work presents a novel procedure for computing (1) distances between nodes of a weighted, undirected graph, called the Euclidean Commute Time Distance (ECTD), and (2) a subspace projection of the nodes of the graph that preserves as much variance as possible in terms of the ECTD: a principal components analysis of the graph. It is based on a Markov-chain model of random walk through the graph. The model assigns transition probabilities to the links between nodes, so that a random walker can jump from node to node. A quantity called the average commute time is the average time taken by a random walker to reach node j when starting from node i and then come back to node i. The square root of this quantity, the ECTD, is a distance measure between any two nodes, and has the nice property of decreasing when the number of paths connecting two nodes increases and when the "length" of any path decreases. The ECTD can be computed from the pseudoinverse of the Laplacian matrix of the graph, which is a kernel. We finally define the Principal Components Analysis (PCA) of a graph as the subspace projection that preserves as much variance as possible, in terms of the ECTD. This graph PCA has some interesting links with spectral graph theory, in particular spectral clustering.
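The computation described in this abstract can be sketched in a few lines of NumPy. This is a minimal sketch, not the authors' implementation: it assumes a connected graph given as a symmetric weight matrix, and it uses the standard identity that the average commute time between nodes i and j is V * (l+_ii + l+_jj - 2 l+_ij), where l+ denotes entries of the Laplacian pseudoinverse and V is the sum of all node degrees. The function name `commute_time_distances` and the example graph are hypothetical.

```python
import numpy as np

def commute_time_distances(A):
    """Return (average commute times, ECTD) for a connected, weighted,
    undirected graph given by its symmetric weight matrix A."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A          # graph Laplacian
    L_plus = np.linalg.pinv(L)        # Moore-Penrose pseudoinverse
    volume = degrees.sum()            # V: sum of all node degrees
    d = np.diag(L_plus)
    # n(i, j) = V * (l+_ii + l+_jj - 2 * l+_ij)
    commute = volume * (d[:, None] + d[None, :] - 2.0 * L_plus)
    commute = np.maximum(commute, 0.0)  # clip tiny negatives from round-off
    return commute, np.sqrt(commute)

# Example: the path graph 0 - 1 - 2 with unit weights.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
commute, ectd = commute_time_distances(A)
print(commute)
print(ectd)
```

For the path graph, the commute time between the two end nodes comes out twice as large as between adjacent nodes, which matches the intuition that longer paths mean larger distances.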
Distribution of power in exchange networks: Theory and experimental results
 American Journal of Sociology
, 1983
Cited by 58 (3 self)
An Algorithm for Clustering cDNAs for Gene Expression Analysis
 In RECOMB99: Proceedings of the Third Annual International Conference on Computational Molecular Biology
, 1999
Cited by 45 (4 self)
We have developed a novel algorithm for cluster analysis that is based on graph theoretic techniques. A similarity graph is defined and clusters in that graph correspond to highly connected subgraphs. A polynomial algorithm to compute them efficiently is presented. Our algorithm produces a clustering with some provably good properties. The application that motivated this study was gene expression analysis, where a collection of cDNAs must be clustered based on their oligonucleotide fingerprints. The algorithm has been tested intensively on simulated libraries and was shown to outperform extant methods. It demonstrated robustness to high noise levels. In a blind test on real cDNA fingerprint data the algorithm obtained very good results. Utilizing the results of the algorithm would have saved over 70% of the cDNA sequencing cost on that data set. 1 Introduction Cluster analysis seeks grouping of data elements into subsets, so that elements in the same subset are in some sense more cl...
Why is the Snowflake Schema a Good Data Warehouse Design?
 Information Systems
Cited by 27 (0 self)
Database design for data warehouses is based on the notion of the snowflake schema and its important special case, the star schema. The snowflake schema represents a dimensional model which is composed of a central fact table and a set of constituent dimension tables which can be further broken up into subdimension tables. We formalise the concept of a snowflake schema in terms of an acyclic database schema whose join tree satisfies certain structural properties. We then define a normal form for snowflake schemas which captures its intuitive meaning with respect to a set of functional and inclusion dependencies. We show that snowflake schemas in this normal form are independent as well as separable when the relation schemas are pairwise incomparable. This implies that relations in the data warehouse can be updated independently of each other as long as referential integrity is maintained. In addition, we show that a data warehouse in snowflake normal form can be queried by joining the relation over the fact table with the relations over its dimension and subdimension tables. We also examine an information-theoretic interpretation of the snowflake schema and show that the redundancy of the primary key of the fact table is zero. Key words. Data warehouse design, star and snowflake schema, independent and separable database schema, acyclic database schema.