A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches
Cited by 33 (3 self)
This paper presents and compares WordNet-based and distributional similarity approaches. The strengths and weaknesses of each approach regarding similarity and relatedness tasks are discussed, and a combination is presented. Each of our methods independently provides the best results in its class on the RG and WordSim353 datasets, and a supervised combination of them yields the best published results on all datasets. Finally, we pioneer cross-lingual similarity, showing that our methods are easily adapted for a cross-lingual task with minor losses.
Learning about the world through long-term query logs
ACM Trans. Web, 2008
Cited by 12 (0 self)
In this article, we demonstrate the value of long-term query logs. Most work on query logs to date considers only short-term (within-session) query information. In contrast, we show that long-term query logs can be used to learn about the world we live in. There are many applications of this that lead not only to improving the search engine for its users, but also potentially to advances in other disciplines such as medicine, sociology, economics, and more. In this article, we will show how long-term query logs can be used for these purposes, and that their potential is severely reduced if the logs are limited to short time horizons. We show that query effects are long-lasting, provide valuable information, and might be used to automatically make medical discoveries, build concept hierarchies, and generally learn about the sociological behavior of users. We believe these applications are only the beginning of what can be done with the information contained in long-term query logs, and see this work as a step toward unlocking their potential.
FastSum: fast and accurate query-based multi-document summarization
In Proceedings of ACL-08: HLT
Cited by 9 (3 self)
We present a fast query-based multi-document summarizer called FastSum based solely on word-frequency features of clusters, documents and topics. Summary sentences are ranked by a regression SVM. The summarizer does not use any expensive NLP techniques such as parsing, tagging of names or even part-of-speech information. Still, the achieved accuracy is comparable to the best systems presented in recent academic competitions (i.e., Document Understanding Conference (DUC)). Because of a detailed feature analysis using Least Angle Regression (LARS), FastSum can rely on a minimal set of features leading to fast processing times: 1250 news documents in 60 seconds.
Web-based Measure of Semantic Relatedness
In Proc. of 9th International Conference on Web Information Systems Engineering (WISE 2008), Auckland (New Zealand), 2008
Cited by 8 (4 self)
Semantic relatedness measures quantify the degree to which some words or concepts are related, considering not only similarity but any possible semantic relationship among them. Relatedness computation is of great interest in different areas, such as Natural Language Processing, Information Retrieval, or the Semantic Web. Different methods have been proposed in the past; however, current relatedness measures lack some desirable properties for a new generation of Semantic Web applications: maximum coverage, domain independence, and universality. In this paper, we explore the use of a semantic relatedness measure between words that uses the Web as a knowledge source. This measure exploits the information about frequencies of use provided by existing search engines. Furthermore, taking this measure as a basis, we define a new semantic relatedness measure among ontology terms. The proposed measure fulfils the above-mentioned desirable properties to be used on the Semantic Web. We have extensively tested this semantic measure to show that it correlates well with human judgment, and helps solve some particular tasks, such as word sense disambiguation or ontology matching.
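As a rough illustration of how search-engine frequencies can be turned into a relatedness score, the sketch below uses a Normalized Google Distance-style formula and maps it to the range [0, 1]. The abstract does not state the paper's formula, so this particular formula, the function names, and the page counts passed in are assumptions made for illustration only.

```python
import math


def normalized_web_distance(fx, fy, fxy, n_pages):
    """NGD-style distance from page counts (an assumed, illustrative formula).

    fx, fy  : hypothetical hit counts for each term alone
    fxy     : hypothetical hit count for both terms together
    n_pages : assumed size of the search index; must exceed min(fx, fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur (or never co-occur) are treated as unrelated.
        return math.inf
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n_pages) - min(lx, ly))


def web_relatedness(fx, fy, fxy, n_pages):
    """Map the distance to (0, 1]; a larger value means more related."""
    return math.exp(-2.0 * normalized_web_distance(fx, fy, fxy, n_pages))
```

With these definitions, two terms that always co-occur score 1.0 and terms that never co-occur score 0.0; in practice the counts would come from a search engine, whose reported hit counts can themselves be noisy.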
Towards Breaking the Quality Curse. A Web-Querying Approach to Web People Search.
Cited by 7 (5 self)
Searching for people on the Web is one of the most common query types to the web search engines today. However, when a person name is queried, the returned webpages often contain documents related to several distinct namesakes who have the queried name. The task of disambiguating and finding the webpages related to the specific person of interest is left to the user. Many Web People Search (WePS) approaches have been developed recently that attempt to automate this disambiguation process. Nevertheless, the disambiguation quality of these techniques leaves major room for improvement. This paper presents a new server-side WePS approach. It is based on collecting co-occurrence information from the Web and thus it uses the Web as an external data source. A skyline-based classification technique
Query-Sensitive Mutual Reinforcement Chain and Its Application in Query-Oriented Multi-Document Summarization
Cited by 3 (0 self)
Sentence ranking is the issue of most concern in document summarization. Early researchers have presented the mutual reinforcement principle (MR) between sentence and term for simultaneous key phrase and salient sentence extraction in generic single-document summarization. In this work, we extend the MR to the mutual reinforcement chain (MRC) of three different text granularities, i.e., document, sentence and terms. The aim is to provide a general reinforcement framework and a formal mathematical modeling for the MRC. Going one step further, we incorporate the query influence into the MRC to cope with the need for query-oriented multi-document summarization. While the previous summarization approaches often calculate the similarity regardless of the query, we develop a query-sensitive similarity to measure the affinity between the pair of texts. When evaluated on the DUC 2005 dataset, the experimental results suggest that the proposed query-sensitive MRC (Qs-MRC) is a promising approach for summarization.
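The two-level mutual reinforcement principle mentioned above (a sentence is salient if it contains salient terms, and a term is salient if it appears in salient sentences) can be sketched as a simple alternating update. This is a minimal sketch of the generic sentence-term principle only, not the paper's three-level chain or its query-sensitive similarity; the weight matrix, function names, and iteration count are assumptions.

```python
import math


def _normalize(vec):
    """Scale a vector to unit Euclidean length (unchanged if all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def mutual_reinforcement(weights, iterations=50):
    """Alternately update sentence and term scores.

    weights[i][j] is a hypothetical affinity between sentence i and term j
    (for example, 1.0 if the term occurs in the sentence, else 0.0).
    Returns (sentence_scores, term_scores), each normalized to unit length.
    """
    n_sent, n_term = len(weights), len(weights[0])
    sent = [1.0] * n_sent
    term = [1.0] * n_term
    for _ in range(iterations):
        # A sentence scores highly when it contains highly scored terms.
        sent = _normalize(
            [sum(w * t for w, t in zip(row, term)) for row in weights]
        )
        # A term scores highly when it occurs in highly scored sentences.
        term = _normalize(
            [sum(weights[i][j] * sent[i] for i in range(n_sent))
             for j in range(n_term)]
        )
    return sent, term
```

For example, with `weights = [[1, 1, 1], [1, 0, 0], [0, 0, 1]]` the first sentence, which covers every term, receives the highest sentence score.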
Robust Estimation of Google Counts for Social Network Extraction
Cited by 2 (0 self)
Various studies within NLP and Semantic Web use the so-called Google count, which is the hit count on a query returned by a search engine (not only Google). However, sometimes the Google count is unreliable, especially when the count is large, or when advanced operators such as OR and NOT are used. In this paper, we propose a novel algorithm that estimates the Google count robustly. It (i) uses the co-occurrence of terms as evidence to estimate the occurrence of a given word, and (ii) integrates multiple pieces of evidence for robust estimation. We evaluated our algorithm on more than 2000 queries across three datasets using the Google, Yahoo! and MSN search engines. Our algorithm also provides estimated counts for any classifier that judges a web page as positive or negative. Consequently, we can estimate the number of documents on the entire web that include references to a particular person (among namesakes).
Exploiting Web querying for Web People Search in WePS2
Cited by 2 (1 self)
Searching for people on the Web is one of the most common query types to the web search engines today. However, when a person name is queried, the returned result often contains webpages related to several distinct namesakes who have the queried name. The task of disambiguating and finding the webpages related to the specific person of interest is left to the user. Many Web People Search (WePS) approaches have been developed recently that attempt to automate this disambiguation process. Nevertheless, the disambiguation quality of these techniques leaves major room for improvement. In this paper we describe our experience of applying our WePS approaches developed in [20] in the context of the WePS-2 Clustering Task [14]. The approach is based on extracting named entities from the web pages and then querying the web to collect co-occurrence statistics, which are used as additional similarity measures.
Thomson Reuters at TAC 2008: Aggressive Filtering with FastSum for Update and Opinion Summarization
Cited by 2 (1 self)
In TAC 2008 we participated in the main task (Update Summarization) as well as the Sentiment Summarization pilot task. We modified the FastSum system (Schilder and Kondadadi, 2008) and added more aggressive filtering in order to adapt the system to update summarization and sentiment summarization. For the Update Summarization task, we show that a classifier that identifies sentences that are similar to typical first sentences of a news article improves the overall linguistic quality of the generated summaries. For the Sentiment Summarization pilot task, we use a simple sentiment classifier based on a gazetteer of positive and negative sentiment words derived from the General Inquirer and other sources to produce opinion-based summaries for a collection of blog posts given a set of positive and negative questions.
Towards bridging the web and the semantic web
In Proc. of WI/IAT 2009, 2009
Cited by 1 (1 self)
The World Wide Web (WWW) has provided us with a plethora of information. However, given its unstructured format, this information is useful mainly to humans and cannot be effectively interpreted by machines. The Semantic Web provides information in computer-understandable structures (e.g., RDF), but the amount of information on the Semantic Web is limited compared to the amount of information available on the Web. The problem of generating a bridge between the Web and the Semantic Web has recently gained a lot of attention. In this paper, we propose a Concept Extractor and Relationship Identifier (CE-RI) system, which acts as a bridge between the Web and the Semantic Web by providing a “semantic” way of presenting the search results to the user. The Concept Extractor (CE) component of our system makes use of the power of existing search engines coupled with the elegance of PageRank to extract high-quality concepts related to the given query. The Relationship Identifier (RI) component finds relationships between the extracted concepts and the given query and presents them to the user in the form of a graph. It also stores the generated results formally, in the form of RDF triples, to facilitate better inferences as compared to traditional search engines. We evaluate our system by comparing its components CE and RI with other similar “state of the art” concept detection and relationship identification systems, respectively. The results produced by our system are similar to or better than those generated by other systems.

