Results 11 - 20
of
1,502
Generating query substitutions
- In WWW
, 2006
"... We introduce the notion of query substitution, that is, generating a new query to replace a user’s original search query. Our technique uses modifications based on typical substitutions web searchers make to their queries. In this way the new query is strongly related to the original query, containi ..."
Abstract
-
Cited by 124 (5 self)
- Add to MetaCart
We introduce the notion of query substitution, that is, generating a new query to replace a user’s original search query. Our technique uses modifications based on typical substitutions web searchers make to their queries. In this way the new query is strongly related to the original query, containing terms closely related to all of the original terms. This contrasts with query expansion through pseudo-relevance feedback, which is costly and can lead to query drift. This also contrasts with query relaxation through boolean or TFIDF retrieval, which reduces the specificity of the query. We define a scale for evaluating query substitution, and show that our method performs well at generating new queries related to the original queries. We build a model for selecting between candidates, by using a number of features relating the query-candidate pair, and by fitting the model to human judgments of relevance of query suggestions. This further improves the quality of the candidates generated. Experiments show that our techniques significantly increase coverage and effectiveness in the setting of sponsored search.
Extended gloss overlaps as a measure of semantic relatedness
- In Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence
, 2003
"... This paper presents a new measure of semantic relatedness between concepts that is based on the number of shared words (overlaps) in their definitions (glosses). This measure is unique in that it extends the glosses of the concepts under consideration to include the glosses of other concepts to whic ..."
Abstract
-
Cited by 121 (5 self)
- Add to MetaCart
This paper presents a new measure of semantic relatedness between concepts that is based on the number of shared words (overlaps) in their definitions (glosses). This measure is unique in that it extends the glosses of the concepts under consideration to include the glosses of other concepts to which they are related according to a given concept hierarchy. We show that this new measure reasonably correlates to human judgments. We introduce a new method of word sense disambiguation based on extended gloss overlaps, and demonstrate that it fares well on the SENSEVAL-2 lexical sample data. 1
Models of Translational Equivalence among Words
- Computational Linguistics
, 2000
"... This article presents methods for biasing statistical translation models to reflect these properties. Evaluation with respect to independent human judgments has confirmed that translation models biased in this fashion are significantly more accurate than a baseline knowledge-free model. This article ..."
Abstract
-
Cited by 121 (2 self)
- Add to MetaCart
This article presents methods for biasing statistical translation models to reflect these properties. Evaluation with respect to independent human judgments has confirmed that translation models biased in this fashion are significantly more accurate than a baseline knowledge-free model. This article also shows how a statistical translation model can take advantage of preexisting knowledge that might be available about particular language pairs. Even the simplest kinds of languagespecific knowledge, such as the distinction between content words and function words, are shown to reliably boost translation model performance on some tasks. Statistical models that reflect knowledge about the model domain combine the best of both the rationalist and empiricist paradigms
Semantic taxonomy induction from heterogenous evidence
- In Proceedings of COLING/ACL 2006
, 2006
"... We propose a novel algorithm for inducing semantic taxonomies. Previous algorithms for taxonomy induction have typically focused on independent classifiers for discovering new single relationships based on hand-constructed or automatically discovered textual patterns. By contrast, our algorithm flex ..."
Abstract
-
Cited by 121 (1 self)
- Add to MetaCart
We propose a novel algorithm for inducing semantic taxonomies. Previous algorithms for taxonomy induction have typically focused on independent classifiers for discovering new single relationships based on hand-constructed or automatically discovered textual patterns. By contrast, our algorithm flexibly incorporates evidence from multiple classifiers over heterogenous relationships to optimize the entire structure of the taxonomy, using knowledge of a word’s coordinate terms to help in determining its hypernyms, and vice versa. We apply our algorithm on the problem of sense-disambiguated noun hyponym acquisition, where we combine the predictions of hypernym and coordinate term classifiers with the knowledge in a preexisting semantic taxonomy (WordNet 2.1). We add 10, 000 novel synsets to WordNet 2.1 at 84 % precision, a relative error reduction of 70 % over a non-joint algorithm using the same component classifiers. Finally, we show that a taxonomy built using our algorithm shows a 23 % relative F-score improvement over WordNet 2.1 on an independent testset of hypernym pairs. 1
Mining the Web for Synonyms: PMI-IR Versus LSA on TOEFL
, 2001
"... This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of wo ..."
Abstract
-
Cited by 118 (10 self)
- Add to MetaCart
This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
Clustering with instance-level constraints
- In Proceedings of the Seventeenth International Conference on Machine Learning
, 2000
"... One goal of research in artificial intelligence is to automate tasks that currently require human expertise; this automation is important because it saves time and brings problems that were previously too large to be solved into the feasible domain. Data analysis, or the ability to identify meaningf ..."
Abstract
-
Cited by 116 (6 self)
- Add to MetaCart
One goal of research in artificial intelligence is to automate tasks that currently require human expertise; this automation is important because it saves time and brings problems that were previously too large to be solved into the feasible domain. Data analysis, or the ability to identify meaningful patterns and trends in large volumes of data, is an important task that falls into this category. Clustering algorithms are a particularly useful group of data analysis tools. These methods are used, for example, to analyze satellite images of the Earth to identify and categorize different land and foliage types or to analyze telescopic observations to determine what distinct types of astronomical bodies exist and to categorize each observation. However, most existing clustering methods apply general similarity techniques rather than making use of problem-specific information. This dissertation first presents a novel method for converting existing clustering algorithms into constrained clustering algorithms. The resulting methods are able to accept domain-specific information in the form of constraints on the output clusters. At the most general level, each constraint is an instance-level statement
Imagenet: A large-scale hierarchical image database
- In CVPR
, 2009
"... The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce her ..."
Abstract
-
Cited by 110 (7 self)
- Add to MetaCart
The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a largescale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond. 1.
Text2Onto - A Framework for Ontology Learning and Data-driven Change Discovery
, 2005
"... In this paper we present Text2Onto, a framework for ontology learning from textual resources. Three main features distinguish Text2Onto from our earlier framework TextToOnto as well as other state-of-the-art ontology learning frameworks. First, by representing the learned knowledge at a meta-lev ..."
Abstract
-
Cited by 101 (18 self)
- Add to MetaCart
In this paper we present Text2Onto, a framework for ontology learning from textual resources. Three main features distinguish Text2Onto from our earlier framework TextToOnto as well as other state-of-the-art ontology learning frameworks. First, by representing the learned knowledge at a meta-level in the form of instantiated modeling primitives within a so called Probabilistic Ontology Model (POM), we remain independent of a concrete target language while being able to translate the instantiated primitives into any (reasonably expressive) knowledge representation formalism. Second, user interaction is a core aspect of Text2Onto and the fact that the system calculates a confidence for each learned object allows to design sophisticated visualizations of the POM. Third, by incorporating strategies for data-driven change discovery, we avoid processing the whole corpus from scratch each time it changes, only selectively updating the POM according to the corpus changes instead. Besides increasing e#ciency in this way, it also allows a user to trace the evolution of the ontology with respect to the changes in the underlying corpus.
Overview of the TREC 2001 Question Answering Track
- In Proceedings of the Tenth Text REtrieval Conference (TREC
, 2001
"... The TREC question answering track is an effort to bring the benefits of loxge-scale evaluation to beox on the question answering problem. In its third yeox, the track continued to focus on retrieving small snippets of text that contain an answer to a question. However, several new conditions were ..."
Abstract
-
Cited by 94 (3 self)
- Add to MetaCart
The TREC question answering track is an effort to bring the benefits of loxge-scale evaluation to beox on the question answering problem. In its third yeox, the track continued to focus on retrieving small snippets of text that contain an answer to a question. However, several new conditions were added to increase the realism, nd the difficulty, of the task. In the main task, questions were no longer guoxanteed to have n nswer in the collection; systems returned a response of 'NIL' to indicate their belief that no nswer was present. In the new list task, systems assembled a set of instances as the response for a question, requiring the ability to distinguish aong instances found in multiple documents. Another new task, the context task, required systems to track discourse objects through a series of questions.
Opinion Observer: Analyzing and Comparing Opinions on the Web
- In WWW ’05: Proceedings of the 14th international conference on World Wide Web
, 2005
"... The Web has become an excellent source for gathering consumer opinions. There are now numerous Web sites containing such opinions, e.g., customer reviews of products, forums, discussion groups, and blogs. This paper focuses on online customer reviews of products. It makes two contributions. First, i ..."
Abstract
-
Cited by 91 (8 self)
- Add to MetaCart
The Web has become an excellent source for gathering consumer opinions. There are now numerous Web sites containing such opinions, e.g., customer reviews of products, forums, discussion groups, and blogs. This paper focuses on online customer reviews of products. It makes two contributions. First, it proposes a novel framework for analyzing and comparing consumer opinions of competing products. A prototype system called Opinion Observer is also implemented. The system is such that with a single glance of its visualization, the user is able to clearly see the strengths and weaknesses of each product in the minds of consumers in terms of various product features. This comparison is useful to both potential customers and product manufacturers. For a potential customer, he/she can see a visual side-by-side and feature-by-feature comparison of consumer opinions on these products, which helps him/her to decide which product to buy. For a product manufacturer, the comparison enables it to easily gather marketing intelligence and product benchmarking information. Second, a new technique based on language pattern mining is proposed to extract product features from Pros and Cons in a particular type of reviews. Such features form the basis for the above comparison. Experimental results show that the technique is highly effective and outperform existing methods significantly.

