Web people search via connection analysis
- IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE), 2008
Cited by 29 (11 self)
Searches for the web pages of a person with a given name constitute a notable fraction of queries to Web search engines. Such a query normally returns web pages related to several namesakes who happen to share the queried name, leaving the burden of disambiguating and collecting pages relevant to a particular person (from among the namesakes) on the user. In this paper, we develop a Web People Search approach that clusters web pages based on their association to different people. Our method exploits a variety of semantic information extracted from web pages, such as named entities and hyperlinks, to disambiguate among namesakes referred to on the web pages. We demonstrate the effectiveness of our approach by testing the efficacy of the disambiguation algorithms and their impact on person search.
Index Terms: Web people search, entity resolution, graph-based disambiguation, social network analysis, clustering.
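The clustering idea in this abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes each page has already been reduced to a set of extracted named entities and links, and it simply joins pages whose sets overlap enough. The input format, the Jaccard measure, the `threshold` value, and the function names are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_pages(page_entities, threshold=0.3):
    """Group pages whose entity sets overlap, using union-find.

    page_entities: dict mapping a page id to the set of named entities
    and links found on that page (hypothetical input format).
    Returns a sorted list of clusters, each a sorted list of page ids.
    """
    parent = {page: page for page in page_entities}

    def find(page):
        # Walk up to the cluster representative, compressing the path.
        while parent[page] != page:
            parent[page] = parent[parent[page]]
            page = parent[page]
        return page

    # Join every pair of pages that is similar enough.
    for p, q in combinations(page_entities, 2):
        if jaccard(page_entities[p], page_entities[q]) >= threshold:
            parent[find(p)] = find(q)

    clusters = {}
    for page in page_entities:
        clusters.setdefault(find(page), []).append(page)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: two namesakes, one an academic and one a musician.
pages = {
    "p1": {"UC Irvine", "databases", "SIGMOD"},
    "p2": {"UC Irvine", "SIGMOD", "entity resolution"},
    "p3": {"guitar", "Nashville"},
    "p4": {"Nashville", "guitar", "country music"},
}
clusters = cluster_pages(pages)
```

With the toy data above, the two academic pages end up in one cluster and the two musician pages in another. A fixed threshold is the simplest possible choice; it stands in for whatever decision rule a real system would use.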
Exploiting context analysis for combining multiple entity resolution systems
- In Proceedings of the 35th SIGMOD International Conference on Management of Data, 2009
Cited by 22 (9 self)
Entity Resolution (ER) is an important real-world problem that has attracted significant research interest over the past few years. It deals with determining which object descriptions co-refer in a dataset. Due to its practical significance for data mining and data analysis tasks, many different ER approaches have been developed to address the ER challenge. This paper proposes a new ER Ensemble framework. The task of ER Ensemble is to combine the results of multiple base-level ER systems into a single solution with the goal of increasing the quality of ER. The framework proposed in this paper leverages the observation that no single ER method consistently outperforms the other ER techniques in terms of quality. Instead, different ER solutions perform better in different contexts. The framework employs two novel combining approaches, which are based on supervised learning. The two approaches learn a mapping of the clustering decisions of the base-level ER systems, together with the local context, into a combined clustering decision. The paper empirically studies the framework by applying it to different domains. The experiments demonstrate that the proposed framework achieves significantly higher disambiguation quality compared to the current state-of-the-art solutions.
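To make the notion of combining base-level ER systems concrete, here is a minimal sketch that uses plain majority voting over pairwise decisions. This is a deliberately simpler stand-in, not the supervised, context-aware combiners the abstract describes; the record ids, the three toy systems, and the function names are hypothetical.

```python
from itertools import combinations


def same_cluster_pairs(clustering):
    """Return the set of record pairs that one base system puts together."""
    pairs = set()
    for cluster in clustering:
        pairs.update(combinations(sorted(cluster), 2))
    return pairs


def majority_vote(clusterings):
    """Keep a pair as co-referent when more than half the systems agree."""
    votes = {}
    for clustering in clusterings:
        for pair in same_cluster_pairs(clustering):
            votes[pair] = votes.get(pair, 0) + 1
    return {pair for pair, count in votes.items() if 2 * count > len(clusterings)}


# Three hypothetical base-level systems clustering records r1..r4.
system_a = [{"r1", "r2", "r3"}, {"r4"}]
system_b = [{"r1", "r2"}, {"r3", "r4"}]
system_c = [{"r1", "r2"}, {"r3"}, {"r4"}]
merged = majority_vote([system_a, system_b, system_c])
```

Here only the pair (r1, r2) is accepted, since it is the only one that at least two of the three systems agree on. The output is a set of pairwise decisions rather than a full clustering; turning it into clusters would need a further step such as transitive closure.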
A semantics-based approach for speech annotation of images
- IEEE Trans. Knowl. Data Eng., 2011
Cited by 4 (2 self)
Associating textual annotations/tags with multimedia content is among the most effective approaches to organize and to support search over digital images and multimedia databases. Despite advances in multimedia analysis, effective tagging remains largely a manual process wherein users add descriptive tags by hand, usually when uploading or browsing the collection, long after the pictures have been taken. This approach, however, is not convenient in all situations or for many applications, e.g., when users would like to publish and share pictures with others in real time. An alternative approach is to use a speech interface through which users specify image tags that are transcribed into textual annotations by automated speech recognizers. Such a speech-based approach has all the benefits of human tagging without the cumbersomeness and impracticality typically associated with human tagging in real time. The key challenge in such an approach is the potentially low recognition quality of state-of-the-art recognizers, especially in noisy environments. In this paper, we explore how semantic knowledge in the form of co-occurrence between image tags can be exploited to boost the quality of speech recognition. We formulate the problem of speech annotation as that of disambiguating among multiple alternatives offered by the recognizer. An empirical evaluation has been conducted over both real speech recognizer output and synthetic data sets. The results demonstrate significant advantages of the proposed approach compared to the recognizer's output under varying conditions.
Index Terms: Using speech for tagging and annotation, using semantics to improve ASR, maximum entropy approach, correlation-based approach, branch and bound algorithm.
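The disambiguation step described above can be sketched as follows. This is an illustration only: it scores every combination of recognizer alternatives by brute force, whereas the index terms mention maximum entropy, correlation-based, and branch and bound methods. The additive scoring function, the confidence values, and the co-occurrence table are assumptions invented for the example.

```python
from itertools import product


def pick_tags(alternatives, cooccurrence):
    """Choose one candidate per spoken tag, maximizing recognizer
    confidence plus pairwise co-occurrence among the chosen tags.

    alternatives: one list per utterance of (word, confidence) pairs.
    cooccurrence: dict mapping frozenset({w1, w2}) to a score in [0, 1].
    """
    best, best_score = None, float("-inf")
    # Exhaustive search over all combinations (fine for short tag lists).
    for combo in product(*alternatives):
        words = [word for word, _ in combo]
        score = sum(conf for _, conf in combo)
        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                score += cooccurrence.get(frozenset((words[i], words[j])), 0.0)
        if score > best_score:
            best, best_score = words, score
    return best


# Hypothetical N-best lists for two spoken tags.
n_best = [
    [("peach", 0.5), ("beach", 0.4)],
    [("sand", 0.6), ("send", 0.3)],
]
tag_cooccurrence = {frozenset(("beach", "sand")): 0.8}
chosen = pick_tags(n_best, tag_cooccurrence)
```

In this toy case the recognizer's top choice for the first tag is "peach", but the co-occurrence of "beach" and "sand" outweighs the small confidence gap, so the pair "beach", "sand" is selected. Exhaustive search grows exponentially with the number of tags, which is why a pruning strategy would be needed in practice.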
WEST: Modern Technologies for Web People Search
Cited by 3 (2 self)
In this paper we describe the WEST (Web Entity Search Technologies) system that we have developed to improve people search over the Internet. Recently the problem of Web People Search (WePS) has attracted significant attention from both
Exploiting Web querying for Web People Search in WePS2
Cited by 2 (1 self)
Searching for people on the Web is one of the most common query types submitted to web search engines today. However, when a person name is queried, the returned result often contains webpages related to several distinct namesakes who have the queried name. The task of disambiguating and finding the webpages related to the specific person of interest is left to the user. Many Web People Search (WePS) approaches have been developed recently that attempt to automate this disambiguation process. Nevertheless, the disambiguation quality of these techniques leaves considerable room for improvement. In this paper we describe our experience of applying our WePS approaches developed in [20] in the context of the WePS-2 Clustering Task [14]. The approach is based on extracting named entities from the web pages and then querying the web to collect co-occurrence statistics, which are used as additional similarity measures.
Fuzzy Ants Clustering for Web People Search
- In: 2nd Web People Search Evaluation Workshop (WePS 2009), 18th WWW Conference, 2009
Cited by 1 (1 self)
A search engine query for a person's name often brings up web pages corresponding to several people who share the same name. The Web People Search (WePS) problem involves organizing such search results for an ambiguous name query into meaningful clusters that group together all web pages corresponding to a single individual. A particularly challenging aspect of this task is that it is in general not known beforehand how many clusters to expect. In this paper we therefore propose the use of a Fuzzy Ants clustering algorithm that does not rely on prior knowledge of the number of clusters that need to be found in the data. An evaluation on benchmark data sets from SemEval's WePS1 and WePS2 competitions shows that the resulting system is competitive with the agglomerative clustering algorithm Agnes. This is particularly interesting as the latter involves manual setting of a similarity threshold (or estimating the number of clusters in advance) while the former does not.
Cluster Based Web Search
The rapid growth of the Internet has made the Web a popular place for collecting information. Internet users access billions of web pages online using search engines. Information on the Web comes from many sources, including websites of companies, organizations, communications, and personal homepages. Effective representation of Web search results remains an open problem in the Information Retrieval community, and it also puts the burden of collecting pages relevant to the query on the user. We develop an approach to organize search results into semantically meaningful groups of web pages and present these to the user as clusters, one for each meaning of the query. With each cluster, our approach provides a summary description that is representative of the real entity associated with that cluster (for the query "mouse", the summary description may be a list of words such as "a small mammal and a popular pet"). The user can home in on the cluster of interest to her and get all pages in that cluster.
Index Terms: Web search result, clustering.
tokyo.ac.jp
In this paper, we report our system that disambiguates person names in Web search results. The system uses named entities, compound key words, and URLs as features for document similarity calculation, which typically yield clustering results with high precision but low recall. We propose a two-stage clustering algorithm that uses bootstrapping to improve the low recall values: the clustering results of the first stage are used to extract the features used in the second-stage clustering. Experimental results revealed that our algorithm yields a better score than the best systems at the latest WePS workshop.
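The two-stage idea can be sketched roughly as below. The abstract does not give the actual features or merging rules, so everything here is an assumption: stage one groups documents that share any named entity, stage two learns "bootstrapped" words that recur inside a stage-one cluster and uses them to attach leftover singleton documents. The document format, `min_overlap`, and the function names are hypothetical.

```python
from collections import Counter


def stage_one(docs):
    """High-precision pass: group documents that share a named entity."""
    clusters = []
    for doc_id, doc in docs.items():
        merged = {"ids": {doc_id}, "entities": set(doc["entities"])}
        keep = []
        for c in clusters:
            if c["entities"] & doc["entities"]:
                # Absorb every existing cluster this document links to.
                merged["ids"] |= c["ids"]
                merged["entities"] |= c["entities"]
            else:
                keep.append(c)
        clusters = keep + [merged]
    return [c["ids"] for c in clusters]


def stage_two(docs, clusters, min_overlap=2):
    """Recall pass: attach singletons using words learned from stage one."""
    multi = [set(c) for c in clusters if len(c) > 1]
    singles = [next(iter(c)) for c in clusters if len(c) == 1]
    # Bootstrapped features: words used by at least two documents in a cluster.
    features = []
    for c in multi:
        counts = Counter(w for d in c for w in docs[d]["words"])
        features.append({w for w, n in counts.items() if n >= 2})
    leftovers = []
    for d in singles:
        overlaps = [len(f & docs[d]["words"]) for f in features]
        best = max(range(len(multi)), key=overlaps.__getitem__, default=None)
        if best is not None and overlaps[best] >= min_overlap:
            multi[best].add(d)
        else:
            leftovers.append({d})
    return multi + leftovers


docs = {
    "d1": {"entities": {"Tokyo University"}, "words": {"nlp", "parsing", "corpus"}},
    "d2": {"entities": {"Tokyo University", "ACL"}, "words": {"nlp", "corpus", "workshop"}},
    "d3": {"entities": set(), "words": {"nlp", "corpus", "homepage"}},
    "d4": {"entities": {"Osaka"}, "words": {"baseball", "pitcher"}},
}
first = stage_one(docs)
final = stage_two(docs, first)
```

In the toy data, stage one joins d1 and d2 through a shared entity but leaves d3 alone because it has no entities at all. Stage two then attaches d3 because it shares the words "nlp" and "corpus" that recur in the first cluster, while d4 stays separate.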
Directorate of Technological Education Ministry of Manpower
A person may have multiple personal name aliases on the web. Identifying aliases of a name is useful in information retrieval and knowledge management, sentiment analysis, relation extraction, and name disambiguation. The objective of detecting aliases from the web is to retrieve all the information pertaining to a person whose name appears under different nicknames in different web documents. At present, the web contains aliases of popular personalities in various domains such as sports, politics, medicine, music, and cinema, but does not contain alias information about the common man. Recently, methods for extracting aliases through lexical-pattern-based retrieval have been tested using real-world name-alias pairs in Japanese and English as training data, limited to a few domains. In this paper, we discuss how this information retrieval process has developed, what the future directions are, and the scope of alias extraction in interdisciplinary fields.
Web People Search using Ontology Based Decision Tree
The Internet plays an important role in the search for information. So far, most of the searches done on the Internet are related to person search. A person search query returns a set of web pages related to distinct persons with the given name. For this type of search, the job of finding the web page of interest is left to the user. In this paper, we develop a technique for web people search that clusters the web pages based on semantic information and maps them using an ontology-based decision tree, making it easier for the user to access the information. This technique uses the concept of ontology, thus reducing the number of inconsistencies. The results show that the ontology-based decision tree and clustering help increase the efficiency of the overall search.