Results 1 - 10 of 10
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation
IEEE Transactions on Knowledge and Data Engineering, 2006
Cited by 55 (12 self)
Abstract—This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted and undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes and have the nice property of increasing when the number of paths connecting those elements increases and when the “length” of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel matrix (its elements are inner products closely related to commute times). A principal component analysis (PCA) of the graph is introduced for computing the subspace projection of the node vectors in a manner that preserves as much variance as possible in terms of the Euclidean commute-time distance. This graph PCA provides a nice interpretation of the “Fiedler vector,” widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called “statistical relational learning” framework, could also be used to compute document or word similarities and, more generally, could be applied to machine-learning and pattern-recognition tasks involving a relational database.
Index Terms—Graph analysis, graph and database mining, collaborative recommendation, graph kernels, spectral clustering, Fiedler vector, proximity measures, statistical relational learning.
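The commute-time quantities described in the abstract above can be illustrated with a small sketch. It assumes a connected graph given as a symmetric, non-negative weight matrix and uses the standard identity L+ = (L + ee^T/n)^-1 - ee^T/n for the pseudoinverse of the Laplacian L = D - W, together with the average commute time n(i, j) = V_G (l+_ii + l+_jj - 2 l+_ij), where V_G is the sum of all node degrees. It is written in plain Python (no NumPy) to stay self-contained; the function names are hypothetical and this is not the authors' implementation.

```python
from typing import List, Optional

Matrix = List[List[float]]


def laplacian(w: Matrix) -> Matrix:
    """L = D - W for a symmetric, non-negative weight matrix W."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


def invert(m: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(m)
    # Augment [M | I] and reduce the left half to the identity.
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (is the graph connected?)")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def laplacian_pinv(w: Matrix) -> Matrix:
    """L+ = (L + ee^T/n)^-1 - ee^T/n, valid for a connected graph."""
    n = len(w)
    lap = laplacian(w)
    inv = invert([[lap[i][j] + 1.0 / n for j in range(n)] for i in range(n)])
    return [[inv[i][j] - 1.0 / n for j in range(n)] for i in range(n)]


def commute_time(w: Matrix, i: int, j: int,
                 lplus: Optional[Matrix] = None) -> float:
    """Average commute time n(i, j) = V_G * (l+_ii + l+_jj - 2 l+_ij)."""
    if lplus is None:
        lplus = laplacian_pinv(w)
    volume = sum(sum(row) for row in w)  # V_G: sum of all node degrees
    return volume * (lplus[i][i] + lplus[j][j] - 2.0 * lplus[i][j])


if __name__ == "__main__":
    path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # unit-weight path 0-1-2
    ct = commute_time(path, 0, 2)
    print(ct, ct ** 0.5)  # commute time and Euclidean commute-time distance
```

On the unit-weight path 0-1-2 the volume is 4 and the effective resistance between the endpoints is 2, so the commute time between nodes 0 and 2 is 8 (up to floating-point rounding); its square root is the Euclidean commute-time distance mentioned in the abstract, and the entries of L+ serve as kernel similarities.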
A Systematic Approach for Optimizing Complex Mining Tasks on Multiple Databases
Cited by 8 (4 self)
and iterative process. In order to support this process, one of the long-term goals of data mining research has been to build a Knowledge Discovery and Data Mining System (KDDMS). Along this line, much research has been done to provide database support for mining operations.
Comparison of graph-based and logic-based multi-relational data mining
SIGKDD Explorations, 2005
Cited by 8 (1 self)
The goal of this paper is to generate insights about the differences between graph-based and logic-based approaches to multi-relational data mining by performing a case study of the graph-based system Subdue and the inductive logic programming system CProgol. We identify three key factors for comparing graph-based and logic-based multi-relational data mining: the ability to discover structurally large concepts, the ability to discover semantically complicated concepts, and the ability to effectively utilize background knowledge. We perform an experimental comparison of Subdue and CProgol on the Mutagenesis domain and various artificially generated Bongard problems. Experimental results indicate that Subdue can significantly outperform CProgol when discovering structurally large multi-relational concepts. It is also observed that CProgol is better at learning semantically complicated concepts and tends to use background knowledge more effectively than Subdue.
Recommendation on item graphs
In ICDM’06, 2006
Cited by 5 (2 self)
A novel scheme for item-based recommendation is proposed in this paper. In our framework, the items are described by an undirected weighted graph $G = (V, E)$, where $V$ is the node set, which is identical to the item set, and $E$ is the edge set. Associated with each edge $e_{ij} \in E$ is a weight $w_{ij} > 0$, which represents the similarity between items $i$ and $j$. Without loss of generality, we assume that any user’s ratings of the items should be sufficiently smooth with respect to the intrinsic structure of the items, i.e., a user should give similar ratings to similar items. A simple algorithm is presented to achieve such a “smooth” solution. Encouraging experimental results are provided to show the effectiveness of our method.
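The abstract does not spell out its “simple algorithm”, so the following is only a hedged stand-in showing what a smoothness assumption on an item graph can look like in practice: a generic graph-regularised smoothing that minimises the sum over edges of w_ij (f_i - f_j)^2 plus mu times the squared error on the items a user has rated, solved by Jacobi iterations. The parameter mu, the function name smooth_ratings and the mean-rating initialisation are assumptions made for illustration.

```python
from typing import Dict, List


def smooth_ratings(w: List[List[float]], ratings: Dict[int, float],
                   mu: float = 1.0, iters: int = 1000,
                   tol: float = 1e-9) -> List[float]:
    """Jacobi iterations for
    min_f  sum_{i<j} w_ij (f_i - f_j)^2 + mu * sum_{i rated} (f_i - y_i)^2.
    """
    if not ratings:
        raise ValueError("need at least one rated item")
    n = len(w)
    default = sum(ratings.values()) / len(ratings)
    # Start rated items at their rating and unrated items at the mean rating.
    f = [ratings.get(i, default) for i in range(n)]
    for _ in range(iters):
        new = []
        for i in range(n):
            # Setting the gradient w.r.t. f_i to zero gives a weighted average
            # of the neighbours' scores and (if rated) the user's own rating.
            num = sum(w[i][j] * f[j] for j in range(n) if j != i)
            den = sum(w[i][j] for j in range(n) if j != i)
            if i in ratings:
                num += mu * ratings[i]
                den += mu
            # Isolated, unrated items keep their current score.
            new.append(num / den if den > 0 else f[i])
        delta = max(abs(a - b) for a, b in zip(new, f))
        f = new
        if delta < tol:
            break
    return f
```

For example, with items 0, 1 and 2 on a unit-weight path, a rating of 5 for item 0, a rating of 1 for item 2 and mu = 1, the smoothed scores are 4, 3 and 2: the unrated middle item receives a score between those of its neighbours, and larger values of mu pull rated items closer to the user's own ratings.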
A Method for Multi-Relational Classification Using Single and Multi-Feature Aggregation Functions
Cited by 4 (1 self)
Abstract. This paper presents a novel method for multi-relational classification via an aggregation-based Inductive Logic Programming (ILP) approach. We extend the classical ILP representation with aggregation over multiple features, which aids the classification process by allowing the analysis of relationships and dependencies between different features. To learn rules in this rich format efficiently, we present a novel algorithm capable of performing aggregation using virtual joins of the data. By using aggregation predicates that are more expressive than the existential quantifier used in standard ILP methods, we improve the accuracy of multi-relational classification. This claim is supported by an experimental evaluation on three different real-world datasets.
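To make the contrast with the existential quantifier concrete, here is a toy sketch using a hypothetical accounts/transactions schema. Child rows are grouped by foreign key instead of materialising a joined table, loosely in the spirit of the virtual joins mentioned above; the schema, thresholds and helper names are all invented for illustration, and this is not the paper's learning algorithm.

```python
from collections import defaultdict
from statistics import mean
from typing import Callable, Dict, List, Set

# Hypothetical toy data: target table (account -> class label) and a
# one-to-many child table of (account_id, amount) transactions.
ACCOUNTS: Dict[str, str] = {"a1": "good", "a2": "bad", "a3": "bad"}
TRANSACTIONS = [
    ("a1", 1000.0), ("a1", 10.0), ("a1", 10.0), ("a1", 10.0),
    ("a2", 400.0), ("a2", 350.0), ("a2", 300.0),
    ("a3", 500.0), ("a3", 450.0),
]


def group_children(rows) -> Dict[str, List[float]]:
    """Index child values by foreign key instead of building a joined table."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for key, amount in rows:
        groups[key].append(amount)
    return groups


def covered(groups: Dict[str, List[float]],
            literal: Callable[[List[float]], bool]) -> Set[str]:
    """Accounts whose related transactions satisfy the rule-body literal."""
    return {acc for acc in ACCOUNTS if literal(groups.get(acc, []))}


def exists_greater(t: float) -> Callable[[List[float]], bool]:
    # Standard ILP-style literal: there exists a transaction with amount > t.
    return lambda vals: any(v > t for v in vals)


def mean_greater(t: float) -> Callable[[List[float]], bool]:
    # Aggregate literal: the mean amount over all related transactions > t.
    return lambda vals: bool(vals) and mean(vals) > t


groups = group_children(TRANSACTIONS)
bad = {acc for acc, label in ACCOUNTS.items() if label == "bad"}
print(covered(groups, mean_greater(300.0)) == bad)  # True
```

In this toy data no threshold on the existential literal separates the two bad accounts from the good one: any threshold low enough to cover account a2 is also satisfied by a1's single large transaction. The aggregate literal on the mean amount does separate them.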
Exploring the power of heuristics and links in multi-relational data mining
In Foundations of Intelligent Systems (ISMIS), 2008
Cited by 3 (0 self)
Abstract. Relational databases are the most popular repository for structured data and are thus one of the richest sources of knowledge in the world. Because of the complexity of relational data, designing efficient and scalable data mining approaches for relational databases is a challenging task. In this paper we discuss two methodologies to address this issue. The first is to use heuristics to guide the data mining procedure, in order to avoid aimless, exhaustive search in relational databases. The second is to assign certain properties to each object in the database and let different objects interact with each other along the links. Experiments show that both approaches achieve high efficiency and accuracy in real applications.
Cost-Based Query Optimization for Complex Pattern Mining on Multiple Databases
For complex data mining queries, query optimization issues arise that are similar to those for traditional database queries. However, few works have applied cost-based query optimization, the key technique in optimizing traditional database queries, to complex mining queries. In this work, we develop a cost-based query optimization framework for an important collection of data mining queries, i.e., frequent pattern mining across multiple databases. Specifically, we make the following contributions: 1) We present a rich class of queries on mining frequent itemsets across multiple datasets, supported by an SQL-based mechanism. 2) We present an approach to enumerate all possible query plans for the mining queries, and develop a dynamic programming approach and a branch-and-bound approach based on the enumeration algorithm to find optimal query plans with the least mining cost. 3) We introduce models to estimate the cost of individual mining operators. 4) We evaluate our query optimization techniques on both real and synthetic datasets and show significant performance improvements.
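The following sketch shows why plan choice matters for a mining query over multiple databases. It assumes a hypothetical query, itemsets with absolute support of at least min_count in D1 but below min_count in D2, and compares two equivalent plans using the number of support computations as a crude cost proxy. The naive Apriori miner, the cost proxy and all names are assumptions for illustration; the paper's SQL-based query class, cost models, dynamic-programming and branch-and-bound search are not reproduced here.

```python
from itertools import combinations
from typing import FrozenSet, List, Set

Itemset = FrozenSet[str]


class CostMeter:
    """Crude cost proxy: how many support computations (database scans) ran."""

    def __init__(self) -> None:
        self.support_checks = 0


def support(db: List[Set[str]], itemset: Itemset, meter: CostMeter) -> int:
    meter.support_checks += 1
    return sum(1 for t in db if itemset <= t)


def apriori(db: List[Set[str]], min_count: int, meter: CostMeter) -> Set[Itemset]:
    """Naive level-wise Apriori returning all frequent itemsets."""
    level = [frozenset([i]) for i in sorted({i for t in db for i in t})]
    frequent: Set[Itemset] = set()
    k = 1
    while level:
        current = [c for c in level if support(db, c, meter) >= min_count]
        frequent.update(current)
        k += 1
        # Join frequent (k-1)-itemsets; prune candidates with an infrequent subset.
        prev = set(current)
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k
                      and all(frozenset(s) in prev for s in combinations(a | b, k - 1))}
        level = sorted(candidates, key=sorted)
    return frequent


def plan_mine_then_check(d1, d2, min_count, meter):
    """Plan 1: mine D1, then count only D1's frequent itemsets in D2."""
    return {s for s in apriori(d1, min_count, meter)
            if support(d2, s, meter) < min_count}


def plan_mine_both(d1, d2, min_count, meter):
    """Plan 2: mine D1 and D2 independently, then take the set difference."""
    return apriori(d1, min_count, meter) - apriori(d2, min_count, meter)


D1 = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]
D2 = [{"a"}, {"b"}, {"c"}, {"a", "b"}]
m1, m2 = CostMeter(), CostMeter()
r1 = plan_mine_then_check(D1, D2, 2, m1)
r2 = plan_mine_both(D1, D2, 2, m2)
print(r1 == r2, m1.support_checks, m2.support_checks)  # True 11 10
```

On this toy data both plans return {c}, {a, b} and {a, c}, but plan 2 needs 10 support computations against 11 for plan 1. With other data the ordering can flip, which is why estimating operator costs before choosing a plan pays off.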
Tamil Nadu, India
Classification is an important task in data mining and machine learning that has been studied extensively and has a wide range of applications. Many algorithms have been proposed to build accurate and scalable classifiers. Most of these algorithms can only be applied to a single “flat” relation, whereas in the real world most data are stored in multiple tables. Because converting data from multiple relations into a single flat relation usually causes many problems, the development of classification across multiple database relations has become important. In this paper, we present several kinds of classification methods across multiple database relations, including Inductive Logic Programming (ILP), relational database, emerging pattern and associative approaches, describe their characteristics, and compare them in detail. We group multi-relational classification (MRC) into four main categories: (i) ILP-based MRC (LBRC), (ii) relational-database-based MRC (RBRC), (iii) emerging-pattern-based MRC and (iv) associative MRC. Figure 1 shows the four categories of classification across multiple database relations. An extensive survey of the literature was made to identify various research issues in this field. The following five sections present the different methods and the research directions in each area.
Data Profiling Using Attribute Clustering
Abstract — Finding trends in database data is hard when the data sets contain many attributes (columns). The difficulty increases when the data are in text fields, which may include large summary or remarks fields. This paper discusses an approach that uses attribute-level clustering to discover trends or profiles in the data. It differs from traditional uses of clustering in that each attribute is clustered separately and the results are then combined to define profiles. For example, in a case study of the Global Terrorism Database (GTD) data set, there are 98 columns (attributes) in the data. A profile might be defined by a particular group, attack type and weapon type, and by specific information found in larger remarks-type fields. The profiles show the values of these attributes along with all the records that match each profile.
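A minimal sketch of the attribute-level idea, assuming that categorical attributes are “clustered” simply by their values and that a remarks field is bucketed by the first matching keyword from a hand-picked list. Both clustering choices, the field names and the toy records are hypothetical stand-ins for whatever clustering the paper actually uses; the point is the structure: cluster each attribute separately, then combine the per-attribute labels into a profile key.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Labeler = Callable[[str], str]


def keyword_labeler(keywords: List[str]) -> Labeler:
    """Hypothetical text 'clustering': bucket by the first keyword found."""
    def label(text: str) -> str:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                return kw
        return "other"
    return label


def build_profiles(records: List[Record],
                   labelers: Dict[str, Labeler]) -> Dict[Tuple[str, ...], List[int]]:
    attrs = list(labelers)
    # Step 1: cluster each attribute on its own, independently of the others.
    per_attr = {a: [labelers[a](r.get(a, "")) for r in records] for a in attrs}
    # Step 2: combine the per-attribute cluster labels into a profile key.
    profiles: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for idx in range(len(records)):
        profiles[tuple(per_attr[a][idx] for a in attrs)].append(idx)
    return dict(profiles)


records = [
    {"group": "G1", "attack": "bombing", "remarks": "Vehicle device near market"},
    {"group": "G1", "attack": "bombing", "remarks": "vehicle parked outside"},
    {"group": "G2", "attack": "assault", "remarks": "Gunmen at checkpoint"},
]
identity: Labeler = lambda v: v  # categorical attribute: one cluster per value
labelers = {"group": identity, "attack": identity,
            "remarks": keyword_labeler(["vehicle", "checkpoint"])}
for profile, members in build_profiles(records, labelers).items():
    print(profile, members)
```

Each profile key lists one cluster label per attribute, and the associated list holds the indices of all records matching that profile, mirroring the abstract's description of profiles shown together with their matching records.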
Multi Relational Data Mining Approaches: A Data Mining Technique
The multi-relational data mining (MRDM) approach has developed as an alternative way of handling structured data such as that stored in an RDBMS, because it mines multiple tables directly. In MRDM, the patterns are spread across multiple tables (relations) of a relational database. Because the data are distributed over many tables, several problems arise in the practice of data mining. To deal with this, one either constructs a single table by propositionalisation or uses a multi-relational data mining algorithm. MRDM approaches have been successfully applied in the area of bioinformatics. Three popular pattern-finding techniques (classification, clustering and association) are frequently used in MRDM. We use the MRDM technique to avoid expensive join operations and semantic losses. This paper focuses on some of the application areas of MRDM and future directions, as well as the ...
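Propositionalisation, named above as the alternative to a multi-relational algorithm, can be sketched as building one flat row per target object from aggregates of its related rows. The molecule/atom schema, the chosen aggregates (count, mean, per-category counts) and the function name are hypothetical choices for illustration only.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple, Union

Row = Dict[str, Union[str, int, float]]


def propositionalise(target_ids: List[str],
                     child_rows: List[Tuple[str, str, float]],
                     categories: List[str]) -> List[Row]:
    """Flatten a one-to-many relation into one feature row per target id.

    child_rows are (parent_id, category, value) tuples.
    """
    grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for parent, cat, value in child_rows:
        grouped[parent].append((cat, value))
    table: List[Row] = []
    for pid in target_ids:
        rows = grouped.get(pid, [])
        values = [v for _, v in rows]
        cat_counts = Counter(c for c, _ in rows)
        row: Row = {
            "id": pid,
            "n_children": len(rows),
            "mean_value": sum(values) / len(values) if values else 0.0,
        }
        for c in categories:
            row[f"count_{c}"] = cat_counts[c]  # Counter returns 0 if absent
        table.append(row)
    return table


atoms = [("m1", "C", 1.0), ("m1", "O", -3.0), ("m1", "C", 2.0), ("m2", "N", 5.0)]
for row in propositionalise(["m1", "m2", "m3"], atoms, ["C", "N", "O"]):
    print(row)
```

The resulting flat table keeps counts and averages but drops, for example, which atoms are linked to which, an illustration of the kind of information a single table can lose.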

