Using WordNet in a Knowledge-Based Approach to Information Retrieval. Working Paper CA-0395 (1995)

by R Richardson, A F Smeaton
Results 1 - 10 of 49

Semantic similarity based on corpus statistics and lexical taxonomy

by Jay J. Jiang, David W. Conrath - Proc. of 10th International Conference on Research in Computational Linguistics, ROCLING’97, 1997
Cited by 396 (0 self)
This paper presents a new approach for measuring semantic similarity/distance between words and concepts. It combines a lexical taxonomy structure with corpus statistical information so that the semantic distance between nodes in the semantic space constructed by the taxonomy can be better quantified with the computational evidence derived from a distributional analysis of corpus data. Specifically, the proposed measure is a combined approach that inherits the edge-based approach of the edge counting scheme, which is then enhanced by the node-based approach of the information content calculation. When tested on a common data set of word pair similarity ratings, the proposed approach outperforms other computational models. It gives the highest correlation value (r = 0.828) with a benchmark based on human similarity judgements, whereas an upper bound (r = 0.885) is observed when human subjects replicate the same task.
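The abstract does not state the formula. As a minimal sketch, the measure is commonly cited in a simplified form, dist(c1, c2) = IC(c1) + IC(c2) - 2 * IC(lcs(c1, c2)), where IC(c) = -log p(c) and lcs is the lowest common subsumer in the taxonomy. The toy taxonomy, the counts, and all function names below are hypothetical and chosen only for illustration; any further weighting used in the paper's full formulation is not reproduced here.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "entity": None,
    "animal": "entity",
    "vehicle": "entity",
    "dog": "animal",
    "cat": "animal",
    "car": "vehicle",
}

# Hypothetical corpus counts of direct mentions of each concept.
DIRECT_COUNT = {"entity": 0, "animal": 10, "vehicle": 10,
                "dog": 40, "cat": 30, "car": 10}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def cumulative_counts():
    """A concept's count includes the counts of every concept below it."""
    totals = {c: 0 for c in PARENT}
    for concept, n in DIRECT_COUNT.items():
        for a in ancestors(concept):
            totals[a] += n
    return totals


TOTALS = cumulative_counts()
ROOT_TOTAL = TOTALS["entity"]


def information_content(concept):
    """IC(c) = -log p(c), with p(c) estimated from cumulative counts."""
    return -math.log(TOTALS[concept] / ROOT_TOTAL)


def lowest_common_subsumer(c1, c2):
    """Return the deepest concept that is an ancestor of both inputs."""
    seen = set(ancestors(c1))
    for a in ancestors(c2):
        if a in seen:
            return a
    raise ValueError("concepts share no ancestor")


def jc_distance(c1, c2):
    """Simplified distance: IC(c1) + IC(c2) - 2 * IC(lcs)."""
    lcs = lowest_common_subsumer(c1, c2)
    return (information_content(c1) + information_content(c2)
            - 2 * information_content(lcs))
```

With these toy counts, `jc_distance("dog", "cat")` is smaller than `jc_distance("dog", "car")`, because "dog" and "cat" share the more specific subsumer "animal".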

Determining Semantic Similarity among Entity Classes from Different Ontologies

by M. Andrea Rodríguez, Max J. Egenhofer - IEEE Transactions on Knowledge and Data Engineering, 2003
Cited by 119 (3 self)
Semantic similarity measures play an important role in information retrieval and information integration. Traditional approaches to modeling semantic similarity compute the semantic distance between definitions within a single ontology. This single ontology is either a domain-independent ontology or the result of the integration of existing ontologies. We present an approach to computing semantic similarity that relaxes the requirement of a single ontology and accounts for differences in the levels of explicitness and formalization of the different ontology specifications. A similarity function determines similar entity classes by using a matching process over synonym sets, semantic neighborhoods, and distinguishing features that are classified into parts, functions, and attributes. Experimental results with different ontologies indicate that the model gives good results when ontologies have complete and detailed representations of entity classes. While the combination of word matching and semantic neighborhood matching is adequate for detecting equivalent entity classes, feature matching allows us to discriminate among similar, but not necessarily equivalent, entity classes.

Semantic E-Workflow Composition

by Jorge Cardoso, Amit Sheth - Journal of Intelligent Information Systems, 2003
Cited by 112 (19 self)
Systems and infrastructures are currently being developed to support Web services. The main idea is to encapsulate an organization’s functionality within an appropriate interface and advertise it as Web services. While in some cases Web services may be utilized in an isolated form, it is normal to expect Web services to be integrated as part of workflow processes. The composition of workflow processes that model e-service applications differs from the design of traditional workflows, in terms of the number of tasks (Web services) available to the composition process, in their heterogeneity, and in their autonomy. Therefore, two problems need to be solved: how to efficiently discover Web services – based on functional and operational requirements – and how to facilitate the interoperability of heterogeneous Web services. In this paper, we present a solution within the context of the emerging Semantic Web, that includes use of ontologies to overcome some of the problems. We start by illustrating the steps involved in the composition of a workflow. Two of these steps are the discovery of Web services and their posterior integration into a workflow. To assist designers with those two steps, we have devised an algorithm to simultaneously discover Web services and resolve heterogeneity among their interfaces and the workflow host. Finally, we describe a prototype that has been implemented to illustrate how discovery and interoperability functions are achieved.

Indexing with WordNet synsets can improve text retrieval

by Julio Gonzalo, Felisa Verdejo, Irina Chugur, Juan Cigarrin, 1998
Cited by 110 (2 self)
The classical vector space model for text retrieval is shown to give better results (up to 29% better in our experiments) if WordNet synsets are chosen as the indexing space, instead of word forms. This result is obtained for a manually disambiguated test collection (of queries and documents) derived from the SEMCOR semantic concordance. The sensitivity of retrieval performance to (automatic) disambiguation errors when indexing documents is also measured. Finally, it is observed that if queries are not disambiguated, indexing by synsets performs (at best) only as well as standard word indexing.

Experiments on Using Semantic Distances Between Words in Image Caption Retrieval

by Alan Smeaton, Ian Quigley, 1996
Cited by 83 (2 self)
Traditional approaches to information retrieval are based upon representing a user's query as a bag of query terms and a document as a bag of index terms and computing a degree of similarity between the two based on the overlap or number of query terms in common between them. Our long-term approach to IR applications is based upon precomputing semantically-based word-word similarities, work which is described elsewhere, and using these as part of the document-query similarity measure. A basic premise of our word-to-word similarity measure is that the input to this computation is the correct or intended word sense but in information retrieval applications, automatic and accurate word sense disambiguation remains an unsolved problem. In this paper we describe our first successful application of these ideas to an information retrieval application, specifically the indexing and retrieval of captions describing the content of images. We have hand-captioned 2714 images and to circumvent, fo...

An effective approach to document retrieval via utilizing wordnet and recognizing phrases

by Shuang Liu, Fang Liu, Clement Yu, Weiyi Meng - In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, 2004
Cited by 50 (9 self)
Noun phrases in queries are identified and classified into four types: proper names, dictionary phrases, simple phrases and complex phrases. A document has a phrase if all content words in the phrase are within a window of a certain size. The window sizes for different types of phrases are different and are determined using a decision tree. Phrases are more important than individual terms. Consequently, documents in response to a query are ranked with matching phrases given a higher priority. We utilize WordNet to disambiguate word senses of query terms. Whenever the sense of a query term is determined, its synonyms, hyponyms, words from its definition and its compound words are considered for possible additions to the query. Experimental results show that our approach yields between 23% and 31% improvements over the best-known results on the TREC 9, 10 and 12 collections for short (title only) queries, without using Web data.
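The window criterion in the abstract above ("a document has a phrase if all content words in the phrase are within a window of a certain size") can be illustrated with a minimal sketch. The function name `has_phrase`, the interpretation of a window as a run of consecutive tokens, and the decision to ignore word order are all assumptions made here; the abstract does not specify them, and the paper's window sizes come from a decision tree that is not reproduced.

```python
def has_phrase(doc_tokens, phrase_words, window):
    """Return True if all phrase words occur within some run of
    `window` consecutive tokens of the document (order ignored)."""
    needed = set(phrase_words)
    for start in range(len(doc_tokens)):
        # Tokens covered by the window beginning at this position.
        if needed <= set(doc_tokens[start:start + window]):
            return True
    return False


# Hypothetical example document and phrases.
doc = "the new bridge in york is far from the old city".split()
print(has_phrase(doc, ["new", "york"], 2))   # words are 3 tokens apart
print(has_phrase(doc, ["new", "york"], 4))   # a 4-token window covers both
```

A larger window admits looser matches, which is presumably why the abstract reports different window sizes for different phrase types.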

Exploiting Hierarchical Domain Structure to Compute Similarity

by Prasanna Ganesan, Hector Garcia-Molina, Jennifer Widom - ACM Transactions on Information Systems, 2003
Cited by 50 (0 self)
Abstract not found

Comparing Geospatial Entity Classes: An Asymmetric and Context-Dependent Similarity Measure

by M. Andrea Rodríguez, Max J. Egenhofer - International Journal of Geographical Information Science, 2004
Cited by 35 (1 self)
Semantic similarity plays an important role in geographic information systems as it supports the identification of objects that are conceptually close, but not identical. Similarity assessments are particularly important for retrieval of geospatial data in such settings as digital libraries, heterogeneous databases, and the World Wide Web. Although some computational models for semantic similarity assessment exist, these models are typically limited by their inability to handle such important cognitive properties of similarity judgments as their inherent asymmetry and their dependence on context. This paper defines the Matching-Distance Similarity Measure (MDSM) for determining semantic similarity among spatial entity classes, taking into account the distinguishing features of these classes (parts, functions, and attributes) and their semantic interrelations (is-a and part-whole relations). A matching process is combined with a semantic-distance calculation to obtain asymmetric values of similarity that depend on the degree of generalization of entity classes. MDSM's matching process is also driven by contextual considerations, where the context determines the relative importance of distinguishing features. Based on a human-subject experiment, MDSM results correlate well with people's judgments of similarity. When contextual information is used for determining the importance of distinguishing features, this correlation increases; however, the major component of the correlation between MDSM results and people's judgments is due to a detailed definition of entity classes.

Relevance Weighting Using Distance Between Term Occurrences

by David Hawking, Paul Thistlewaite, 1996
Cited by 25 (6 self)
Recent work has achieved promising retrieval performance using distance between term occurrences as a primary estimator of document relevance. A major benefit of this approach is that relevance scoring does not rely on collection frequency statistics. A theoretical framework for lexical spans is now proposed which encompasses these approaches and suggests a number of important directions for future experimental work. Based on the formalism, approaches to issues such as scoring partial spans, treatment of repeated term occurrences within spans, and the importance of ordering are proposed. Consideration is given to the practical application of the formalism to both locating and scoring concept intersections and to locating phrases (with an estimate of confidence) despite intervening or substituted words.

1 Introduction
The idea that the relative positions of query terms within a document may supply information about relevance arose nearly forty years ago. As early as 1958, Luhn [6] wrot...

Semantic Structure Matching for Assessing Web Service Similarity

by Yiqiao Wang, Eleni Stroulia - 1st International Conference on Service Oriented Computing (ICSOC03), 2003
Cited by 21 (3 self)
The web-services stack of standards is designed to support the reuse and interoperation of software components on the web. A critical step in the process of developing applications based on web services is service discovery, i.e., the identification of existing web services that can potentially be used in the context of a new web application. UDDI, the standard API for publishing web-services specifications, provides a simple browsing-by-business-category mechanism for developers to review and select published services. To support programmatic service discovery, we have developed a suite of methods that utilizes both the semantics of the identifiers of WSDL descriptions and the structure of their operations, messages and data types to assess the similarity of two WSDL files. Given only a textual description of the desired service, a semantic information-retrieval method can be used to identify and order the most similar service-description files. This step assesses the similarity of the provided description of the desired service with the available services. If a (potentially partial) specification of the desired service behavior is also available, this set of likely candidates can be further refined by a semantic structure-matching step assessing the structural similarity of the desired vs. the retrieved services and the semantic similarity of their identifiers. In this paper, we describe and experimentally evaluate our suite of service-similarity assessment methods.