## Unsupervised Link Discovery in Multi-relational Data via Rarity Analysis (2003)


Venue: IEEE International Conference on Data Mining

Citations: 33 (2 self)

### BibTeX

@INPROCEEDINGS{Lin03unsupervisedlink,

author = {Shou-de Lin},

title = {Unsupervised Link Discovery in Multi-relational Data via Rarity Analysis},

booktitle = {IEEE International Conference on Data Mining},

year = {2003},

pages = {171--178}

}


### Abstract

A significant portion of knowledge discovery and data mining research focuses on finding patterns of interest in data. Once a pattern is found, it can be used to recognize satisfying instances. The new area of link discovery requires a complementary approach, since patterns of interest might not yet be known or might have too few examples to be learnable. This paper presents an unsupervised link discovery method aimed at discovering unusual, interestingly linked entities in multi-relational datasets. Various notions of rarity are introduced to measure the "interestingness" of sets of paths and entities. These measurements have been implemented and applied to a real-world bibliographic dataset where they give very promising results.
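To make the idea of rarity over typed paths concrete, here is a minimal Python sketch. It is an illustration only, not the paper's actual measures: the abstract does not give the definitions, so the scoring rule used here (a path type is rarer when fewer nodes have any path of that type), the function names `path_types` and `rarity_scores`, and the toy bibliographic triples are all assumptions.

```python
from collections import Counter, defaultdict


def path_types(edges, source, max_len=2):
    """Count relation-label sequences ("path types") of simple paths from source.

    edges is an iterable of (head, relation, tail) triples (assumed format).
    """
    adjacency = defaultdict(list)
    for head, relation, tail in edges:
        adjacency[head].append((relation, tail))

    counts = Counter()
    stack = [(source, (), (source,))]
    while stack:
        node, labels, visited = stack.pop()
        if labels:
            counts[labels] += 1
        if len(labels) == max_len:
            continue
        for relation, nxt in adjacency[node]:
            if nxt not in visited:  # keep paths simple (no repeated nodes)
                stack.append((nxt, labels + (relation,), visited + (nxt,)))
    return counts


def rarity_scores(edges, max_len=2):
    """Score each node by the mean rarity of the path types it starts.

    Hypothetical rarity of a path type: 1 - (share of nodes having that type).
    """
    edges = list(edges)
    nodes = sorted({n for head, _, tail in edges for n in (head, tail)})
    per_node = {n: path_types(edges, n, max_len) for n in nodes}

    holders = Counter()
    for counts in per_node.values():
        holders.update(counts.keys())  # one vote per node per path type

    scores = {}
    for node, counts in per_node.items():
        if not counts:
            scores[node] = 0.0
            continue
        rarities = [1 - holders[p] / len(nodes) for p in counts]
        scores[node] = sum(rarities) / len(rarities)
    return scores


# Toy multi-relational data: authors write or edit papers, papers cite papers.
edges = [
    ("alice", "writes", "p1"),
    ("bob", "writes", "p2"),
    ("carol", "writes", "p3"),
    ("carol", "edits", "p2"),
    ("p1", "cites", "p2"),
    ("p3", "cites", "p2"),
]
scores = rarity_scores(edges)
```

In this toy graph, `carol` is the only node with an `edits` link, so she scores higher than the other authors. No labels or training examples are needed, which is the unsupervised property the abstract emphasizes.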

### Citations

3296 | The anatomy of a large-scale hypertextual web search engine
- Brin, Page
- 1998
Citation Context ...requires the occurrence of very common patterns in the data. Other analysis algorithms such as PageRank compute the importance of links through the connections between nodes in an unsupervised manner [12, 13]. In that framework, however, all relations are treated as identical (that is, “A kills B” is not different from “A writes to B”); therefore, this approach is not suitable for the multi-relational ...

1722 | Social Network Analysis: Methods and Applications
- Wasserman, Faust
- 1994
Citation Context ...es. This kind of data can naturally be represented by a labeled graph such as the one shown in Figure 1 where nodes stand for entities and links for binary relations. For example, social network data [16] or Web-pages with proper classification on hyperlinks can be represented in this way. We also assume that the data employs a rich vocabulary of relations where different link types represent differen...

268 | The KDD Process for Extracting Useful Knowledge from Volumes of Data
- Fayyad, Piatetsky-Shapiro, et al.
- 1996
Citation Context ...raditionally, knowledge discovery and data mining research focuses on discovering and extracting previously unknown, valid, novel, potentially useful and understandable patterns from lower-level data [20]. Such patterns can be represented as association rules, classification rules, clusters, sequential patterns, time series, contingency tables, etc. [9]. Identifying “interesting” information in large, ...

259 | Algorithms for Mining Distance-Based Outliers
- Knorr, Ng
- 1998
Citation Context ... the multi-relational NLD problem. The area of outlier detection in data mining and statistics aims at detecting points that are considerably dissimilar or inconsistent with the remainder of the data [2, 3, 7, 14, 15]. This is conceptually related to our use of rarity analysis to solve the NLD problem. Current research on outlier detection, however, analyzes primarily numerical entity-attribute data instead of mul...

225 | Efficient Algorithms for Mining Outliers from Large
- Ramaswamy, Rastogi, et al.
- 2000
Citation Context ... the multi-relational NLD problem. The area of outlier detection in data mining and statistics aims at detecting points that are considerably dissimilar or inconsistent with the remainder of the data [2, 3, 7, 14, 15]. This is conceptually related to our use of rarity analysis to solve the NLD problem. Current research on outlier detection, however, analyzes primarily numerical entity-attribute data instead of mul...

161 | Outlier detection for high dimensional data
- Aggarwal, Yu
- 2001
Citation Context ... the multi-relational NLD problem. The area of outlier detection in data mining and statistics aims at detecting points that are considerably dissimilar or inconsistent with the remainder of the data [2, 3, 7, 14, 15]. This is conceptually related to our use of rarity analysis to solve the NLD problem. Current research on outlier detection, however, analyzes primarily numerical entity-attribute data instead of mul...

141 | Visual Data Mining
- Macedo, Cook, et al.
- 2000
Citation Context ...rch is on learning patterns from complex multi-relational data. For example, inductive logic programming has been applied to learn relational patterns [11]. Additionally, graph-based methods such as [6] have been used to learn subgraph categories and isomorphisms. These approaches either require training examples or learn things at the structure/schema level, while for the NLD problem it is necessar...

113 | Fish-oil, Raynaud’s syndrome and undiscovered public knowledge
- Swanson
- 1986
Citation Context ...n of novel, interesting, plausible and intelligible knowledge about the objects of study. In this sense the novel link discovery problem is similar to literature-based discovery introduced by Swanson [18, 19], since they both intend to find interesting facts and connections in large amounts of data. Since 1986 Swanson has triggered interesting discoveries in biomedicine strictly by looking for mediators t...

52 | Interestingness measures for association patterns: a perspective
- Tan, Kumar
- 2000
Citation Context ...dealing with a discovery and not a learning problem. There is a significant body of work in data mining that deals with measuring the interestingness of discovered association or classification rules [4, 8, 9]; however, these interestingness measures are not appropriate for the NLD problem. The reasons are twofold. First, most of these methods assume the data is in the form of a feature-vector (a single re...

48 | Knowledge discovery and interestingness measures: A survey
- Hilderman, Hamilton
- 1999
Citation Context ...nd understandable patterns from lower-level data [20]. Such patterns can be represented as association rules, classification rules, clusters, sequential patterns, time series, contingency tables, etc. [9]. Identifying “interesting” information in large, multi-relational data sets without using a pattern, on the other hand, has not received much attention at all. We argue, however, that patterns and ru...

45 | Detecting graph-based spatial outlier: algorithms and applications (a summary of results)
- Shekhar, Lu, et al.
- 2001

36 | Relational Data Mining with Inductive Logic Programming for Link Discovery
- Mooney, Melville, et al.
- 2002
Citation Context .... Introduction Link discovery is a relatively new form of data mining with the goal of automatically identifying abnormal or threatening activities in large and heterogeneous data sets. Mooney et al. [10] describe it as the task of “identifying known, complex, multi-relational patterns that indicate potentially threatening activities in large amounts of relational data.” Under this view of link discov...

31 | OPTICS-OF: Identifying Local Outliers
- Breunig, Kriegel, et al.
- 1999

26 | Principles of human-computer collaboration for knowledge discovery
- Valdés-Pérez
- 1999
Citation Context ...rality analysis uses only the connectivity (the number of paths) to judge the significance while our algorithm considers not only the quantity but also the quality (rarity) of the paths. Valdes-Perez [21] characterizes discovery in science as the generation of novel, interesting, plausible and intelligible knowledge about the objects of study. In this sense the novel link discovery problem is similar ...

15 | On rule interestingness measures. Knowledge-Based Systems 12
- Freitas
- 1999
Citation Context ...dealing with a discovery and not a learning problem. There is a significant body of work in data mining that deals with measuring the interestingness of discovered association or classification rules [4, 8, 9]; however, these interestingness measures are not appropriate for the NLD problem. The reasons are twofold. First, most of these methods assume the data is in the form of a feature-vector (a single re...

15 | Somatomedin C and arginine: Implicit connections between mutually isolated literatures
- Swanson
- 1990
Citation Context ...n of novel, interesting, plausible and intelligible knowledge about the objects of study. In this sense the novel link discovery problem is similar to literature-based discovery introduced by Swanson [18, 19], since they both intend to find interesting facts and connections in large amounts of data. Since 1986 Swanson has triggered interesting discoveries in biomedicine strictly by looking for mediators t...

7 | Reasoning for web document associations and its applications in site map construction
- Candan, Li
- 2002
Citation Context ...requires the occurrence of very common patterns in the data. Other analysis algorithms such as PageRank compute the importance of links through the connections between nodes in an unsupervised manner [12, 13]. In that framework, however, all relations are treated as identical (that is, “A kills B” is not different from “A writes to B”); therefore, this approach is not suitable for the multi-relational ...

3 | Evidence Extraction and Link Discovery Program. Presentation at DARPATech 2002 Symposium, August 2. Available at http://www.darpa.mil/DARPATech2002/presentations/iao_pdf/speeches/SENATOR.pdf
- Senator
- 2002
Citation Context ...s and corruption of the data. Its biggest limitation is, however, that it can only detect instances of known patterns and cannot cope with previously unknown or evolving patterns of interest. Senator [17] describes link discovery more broadly as the process of looking for “evidence of known patterns and, perhaps more important, for unexplainable connections that may indicate previously unknown but sig...

1 | Correlation of complex evidences and link discovery
- Kovalerchuk, Vityaev
Citation Context ...cture/schema level, while for the NLD problem it is necessary to perform discovery at the instance level by using unsupervised methods. Kovalerchuk and Vityaev’s hybrid evidence correlation technique [1] first identifies common patterns via standard data mining techniques and then hypothesizes interesting or unusual patterns by negating some of the statistically significant patterns found. It is conc...

1 | Dealing with Dirty Data, DBMS Magazine
- Kimball
- 1996
Citation Context ...prior to the analysis. Our approach is a general-purpose method and can be applied to arbitrary multi-relational datasets. Potential applications are in law enforcement, threat detection, data cleaning [5] and scientific discovery. The experiment shows that our approach can capture interesting connections that are representative of meaningful real-world relationships. Future work will include more exte...