
Disambiguating web appearances of people in a social network (2005)

by R Bekkerman, A McCallum
Results 1 - 10 of 50

Ontology-Driven Automatic Entity Disambiguation in Unstructured Text

by Joseph Hassell, Boanerges Aleman-Meza, I. Budak Arpinar - In International Semantic Web Conference, 2006
Cited by 22 (2 self)
Precisely identifying entities in web documents is essential for document indexing, web search and data integration. Entity disambiguation is the challenge of determining the correct entity out of various candidate entities. Our novel method utilizes background knowledge in the form of a populated ontology. Additionally, it does not rely on the existence of any structure in a document or the appearance of data items that can provide strong evidence, such as email addresses, for disambiguating person names. The originality of our method is demonstrated in the way it uses different relationships in a document as well as from the ontology to provide clues in determining the correct entity. We demonstrate the applicability of our method by disambiguating names of researchers appearing in a collection of DBWorld posts using a large-scale, real-world ontology extracted from the DBLP bibliography website. The precision and recall measurements provide encouraging results. Keywords: Entity disambiguation, ontology, semantic web, DBLP, DBWorld.

ArnetMiner: Extraction and Mining of Academic Social Networks

by Jie Tang, Jing Zhang, Limin Yao, Juanzi Li, Li Zhang, Zhong Su
Cited by 22 (7 self)
This paper addresses several key issues in the ArnetMiner system, which aims at extracting and mining academic social networks. Specifically, the system focuses on: 1) Extracting researcher profiles automatically from the Web; 2) Integrating the publication data into the network from existing digital libraries; 3) Modeling the entire academic network; and 4) Providing search services for the academic network. So far, 448,470 researcher profiles have been extracted using a unified tagging approach. We integrate publications from online Web databases and propose a probabilistic framework to deal with the name ambiguity problem. Furthermore, we propose a unified modeling approach to simultaneously model topical aspects of papers, authors, and publication venues. Search services such as expertise search and people association search have been provided based on the modeling results. In this paper, we describe the architecture and main features of the system. We also present the empirical evaluation of the proposed methods.

Social Network Extraction of Academic Researchers

by Jie Tang, Duo Zhang, Limin Yao - In Proc. of ICDM’07
Cited by 11 (7 self)
This paper addresses the issue of extraction of an academic researcher social network. By researcher social network extraction, we aim at finding, extracting, and fusing the ‘semantic’-based profiling information of a researcher from the Web. Previously, social network extraction was often undertaken separately in an ad-hoc fashion. This paper first gives a formalization of the entire problem. Specifically, it identifies the ‘relevant documents’ from the Web by a classifier. It then proposes a unified approach to perform the researcher profiling using Conditional Random Fields (CRF). It integrates publications from the existing bibliography datasets. In the integration, it proposes a constraints-based probabilistic model for name disambiguation. Experimental results on an online system show that the unified approach to researcher profiling significantly outperforms the baseline methods of using rule learning or classification. Experimental results also indicate that our method for name disambiguation performs better than the baseline method using unsupervised learning. The methods have been applied to expert finding. Experiments show that the accuracy of expert finding can be significantly improved by using the proposed methods.

Person resolution in person search results: WebHawk

by Xiaojun Wan, Jianfeng Gao, Mu Li, Binggong Ding - Conference on Information and Knowledge Management, 2005
Cited by 11 (0 self)
Finding information about people on the Web using a search engine is difficult because there is a many-to-many mapping between person names and specific persons (i.e. referents). This paper describes a person resolution system, called WebHawk. Given a list of pages obtained by submitting a person query to a search engine, WebHawk facilitates person search in three steps: First of all, a filter removes those pages that contain no information about any person. Secondly, a cluster groups the remaining pages into different clusters, each for one specific person. To make the resulting clusters more meaningful, an extractor is used to induce query-oriented personal information from each page. Finally, a namer generates an informative description for each cluster so that users can find any specific person easily. The architecture of WebHawk is presented, and the four components are discussed in detail, with a separate evaluation of each component presented where appropriate. A user study shows that WebHawk complements most existing search engines and successfully improves users' experience of person search on the Web.

Disambiguating Authors in Academic Publications using Random Forests

by Pucktada Treeratpituk, C. Lee Giles - Joint Conference on Digital Libraries, 2009
Cited by 9 (3 self)
Users of digital libraries usually want to know the exact author or authors of an article. But different authors may share the same names, either as full names or as initials and last names (complete name change examples are not considered here). In such a case, the user would like the digital library to differentiate among these authors. Name disambiguation can help in many cases; one being a user in a search of all articles written by a particular author. Disambiguation also enables better bibliometric analysis by allowing a more accurate counting and grouping of publications and citations. In this paper, we describe an algorithm for pairwise disambiguation of author names based on a machine learning classification algorithm, random forests. We define a set of similarity profile features to assist in author disambiguation. Our experiments on the Medline database show that the random forest model outperforms other previously proposed techniques such as those using support-vector machines (SVM). In addition, we demonstrate that the variable importance produced by the random forest model can be used in feature selection with little degradation in the disambiguation accuracy. In particular, the inverse document frequency of author last name and the middle name’s similarity alone achieves an accuracy of almost 90%.
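
The pairwise approach described in this abstract can be illustrated with a minimal sketch of a "similarity profile" for two author records. This is not the authors' feature set or code: the record layout, the function names (`last_name_idf`, `middle_name_similarity`, `similarity_profile`), the 1.0 / 0.5 / 0.0 scoring scheme and the toy records are all assumptions made for illustration, and the random forest classifier that would consume these features is left out.

```python
import math
from collections import Counter


def last_name_idf(records):
    """Map each last name to log(N / document frequency); rarer names score higher."""
    total = len(records)
    frequency = Counter(record["last"] for record in records)
    return {name: math.log(total / count) for name, count in frequency.items()}


def middle_name_similarity(a, b):
    """1.0 for identical middle names, 0.5 when one is an initial of the other, else 0.0."""
    if not a or not b:
        return 0.0  # a missing middle name gives no evidence either way
    if a == b:
        return 1.0
    if a[0] == b[0] and (len(a) == 1 or len(b) == 1):
        return 0.5
    return 0.0


def similarity_profile(first, second, idf):
    """Feature dictionary for one pair of author records (hypothetical layout)."""
    same_last = first["last"] == second["last"]
    return {
        "last_name_idf": idf[first["last"]] if same_last else 0.0,
        "middle_similarity": middle_name_similarity(first["middle"], second["middle"]),
    }


# Hypothetical toy records, not taken from Medline.
records = [
    {"last": "zhang", "middle": "a"},
    {"last": "zhang", "middle": "andrew"},
    {"last": "zhang", "middle": "b"},
    {"last": "okonkwo", "middle": ""},
]
idf = last_name_idf(records)
profile = similarity_profile(records[0], records[1], idf)
print(profile)
```

In a full pipeline, one such profile per record pair would be passed to a trained classifier that decides whether the two records refer to the same author.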

Graph-based word clustering using web search engine

by Yutaka Matsuo, Kôki Uchiyama - In Proc. of EMNLP 2006
Cited by 7 (0 self)
Word clustering is important for automatic thesaurus construction, text classification, and word sense disambiguation. Recently, several studies have reported using the web as a corpus. This paper proposes an unsupervised algorithm for word clustering based on a word similarity measure by web counts. Each pair of words is queried to a search engine, which produces a co-occurrence matrix. By calculating the similarity of words, a word co-occurrence graph is obtained. A new kind of graph clustering algorithm called Newman clustering is applied for efficiently identifying word clusters. Evaluations are made on two sets of word groups derived from a web directory and WordNet.
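
The pipeline in this abstract (pairwise counts, then a similarity score, then a co-occurrence graph, then clusters) can be sketched roughly as follows. Several pieces are assumptions rather than the paper's method: pointwise mutual information stands in for the similarity measure, connected components stand in for Newman clustering, the threshold is arbitrary, and the counts are made-up numbers rather than real search-engine hits.

```python
import math
from itertools import combinations


def pmi(count_a, count_b, count_ab, total):
    """Pointwise mutual information from raw counts; 0.0 when the pair never co-occurs."""
    if count_ab == 0:
        return 0.0
    return math.log((count_ab * total) / (count_a * count_b))


def build_graph(words, single, pair, total, threshold):
    """Link two words when their PMI exceeds the threshold; returns adjacency sets."""
    graph = {word: set() for word in words}
    for a, b in combinations(words, 2):
        score = pmi(single[a], single[b], pair.get(frozenset((a, b)), 0), total)
        if score > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Group words reachable from one another (a simple stand-in for Newman clustering)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(sorted(component))
    return sorted(components)


# Hypothetical counts standing in for search-engine hit counts.
words = ["printer", "scanner", "apple", "banana"]
single = {word: 1000 for word in words}
pair = {
    frozenset(("printer", "scanner")): 200,
    frozenset(("apple", "banana")): 150,
    frozenset(("printer", "apple")): 1,
}
graph = build_graph(words, single, pair, total=1_000_000, threshold=1.0)
clusters = connected_components(graph)
print(clusters)
```

Connected components merge any two groups joined by a single edge, which is one reason a modularity-based method such as the Newman clustering named in the abstract may be preferred on denser graphs.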

Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search.

by Dmitri V. Kalashnikov, Rabia Nuray-Turan, Sharad Mehrotra
Cited by 7 (5 self)
Searching for people on the Web is one of the most common query types to web search engines today. However, when a person name is queried, the returned webpages often contain documents related to several distinct namesakes who have the queried name. The task of disambiguating and finding the webpages related to the specific person of interest is left to the user. Many Web People Search (WePS) approaches have been developed recently that attempt to automate this disambiguation process. Nevertheless, the disambiguation quality of these techniques leaves major room for improvement. This paper presents a new server-side WePS approach. It is based on collecting co-occurrence information from the Web and thus it uses the Web as an external data source. A skyline-based classification technique ...

Web people search via connection analysis

by Dmitri V. Kalashnikov, Zhaoqi (Stella) Chen, Sharad Mehrotra, Rabia Nuray-Turan - IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), 2008
Cited by 7 (5 self)
Nowadays, searches for the web pages of a person with a given name constitute a notable fraction of queries to Web search engines. Such a query would normally return web pages related to several namesakes, who happened to have the queried name, leaving the burden of disambiguating and collecting pages relevant to a particular person (from among the namesakes) on the user. In this paper, we develop a Web People Search approach that clusters web pages based on their association to different people. Our method exploits a variety of semantic information extracted from web pages, such as named entities and hyperlinks, to disambiguate among namesakes referred to on the web pages. We demonstrate the effectiveness of our approach by testing the efficacy of the disambiguation algorithms and its impact on person search. Index Terms: Web people search, entity resolution, graph-based disambiguation, social network analysis, clustering.

Extracting Social Networks among Various Entities on the Web

by Yingzi Jin, Yutaka Matsuo, Mitsuru Ishizuka
Cited by 6 (2 self)
Social networks have recently attracted much attention for their importance to the Semantic Web. Several methods exist to extract social networks for people (particularly researchers) from the web using a search engine. Our goal is to expand existing techniques to obtain social networks among various entities. This paper proposes two improvements, i.e. relation identification and threshold tuning, which enable us to deal with complex and inhomogeneous communities. Social networks among firms and artists (of contemporary) are extracted as examples. Several evaluations emphasize the effectiveness of these methods. Our system was used at the International Triennale of Contemporary Art (Yokohama Triennale 2005) to facilitate navigation of artists' information. This study contributes to the Semantic Web in that we increase the applicability of social network extraction for several studies.

Disambiguation algorithm for people search on the web

by Dmitri V. Kalashnikov, Sharad Mehrotra, Zhaoqi Chen, Rabia Nuray-Turan, Naveen Ashish - In ICDE poster, 2007
Cited by 6 (6 self)
Searching for entities, i.e., webpages related to a person, location, organization or other types of entities is a common activity in internet search today. For instance “people search”, i.e., searching for webpages related to a person accounts ...