Dependency-based construction of semantic space models
- Computational Linguistics
, 2007
Cited by 79 (6 self)
Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that take syntactic relations into account. We introduce a formalization for this class of models which allows linguistic knowledge to guide the construction process. We evaluate our framework on a range of tasks relevant for cognitive science and natural language processing: semantic priming, synonymy detection and word sense disambiguation. In all cases, our framework obtains results that are comparable or superior to the state of the art.
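The idea of a semantic space built from syntactic relations rather than plain co-occurrence windows can be illustrated with a minimal sketch. It is not the framework from the article: the toy triples, the `(relation, head)` feature scheme, and the names `build_space` and `cosine` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy input: (head, relation, dependent) triples,
# of the kind a dependency parser might emit.
TRIPLES = [
    ("drink", "obj", "tea"),
    ("drink", "obj", "coffee"),
    ("brew", "obj", "tea"),
    ("brew", "obj", "coffee"),
    ("drive", "obj", "car"),
    ("park", "obj", "car"),
]


def build_space(triples):
    """Map each dependent word to counts over (relation, head) features."""
    space = defaultdict(Counter)
    for head, rel, dep in triples:
        space[dep][(rel, head)] += 1
    return space


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


space = build_space(TRIPLES)
sim_tea_coffee = cosine(space["tea"], space["coffee"])  # identical contexts
sim_tea_car = cosine(space["tea"], space["car"])        # no shared contexts
```

In this toy data, "tea" and "coffee" occur as objects of the same verbs and so end up close together, while "tea" and "car" share no syntactic contexts.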
Collection Statistics for Fast Duplicate Document Detection
- ACM TRANSACTIONS ON INFORMATION SYSTEMS
, 2002
Integrating Structured Data and Text: A relational approach
- Journal of the American Society of Information Science
, 1997
Cited by 50 (27 self)
We integrate structured data and text using the unchanged, standard relational model. We started with the premise that a relational system could be used to implement an Information Retrieval (IR) system. After implementing a prototype to verify that premise, we then began to investigate the performance of a parallel relational database system for this application. We also tested the effect of query reduction on accuracy and found that queries can be reduced prior to their implementation without incurring a significant loss in precision/recall. This reduction also serves to improve run-time performance. After comparing our results to a special purpose IR system, we conclude that the relational model offers scalable performance and includes the ability to integrate structured data and text in a portable fashion. 1 Introduction Increasingly, applications integrate structured and unstructured data, responding to requests such as "Find articles containing vehicle and sales published in jou...
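A rough sketch of how keyword retrieval and a structured filter might be combined in unmodified SQL is given below. It uses Python's built-in `sqlite3` module rather than the parallel database system from the paper. The schema (`doc`, `posting`), the `source` attribute, the sample rows, and the summed-term-frequency score are hypothetical choices, not the authors' design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc (doc_id INTEGER PRIMARY KEY, source TEXT);  -- structured data
CREATE TABLE posting (doc_id INTEGER, term TEXT, tf INTEGER); -- inverted index
""")
conn.executemany(
    "INSERT INTO doc VALUES (?, ?)",
    [(1, "Auto Weekly"), (2, "Garden News"), (3, "Auto Weekly")],
)
conn.executemany(
    "INSERT INTO posting VALUES (?, ?, ?)",
    [(1, "vehicle", 3), (1, "sales", 2), (2, "sales", 1), (3, "vehicle", 1)],
)


def search(terms, source):
    """Return (doc_id, score) for documents from `source` containing all terms."""
    placeholders = ",".join("?" for _ in terms)
    sql = f"""
        SELECT p.doc_id, SUM(p.tf) AS score
        FROM posting p JOIN doc d ON d.doc_id = p.doc_id
        WHERE p.term IN ({placeholders}) AND d.source = ?
        GROUP BY p.doc_id
        HAVING COUNT(DISTINCT p.term) = ?
        ORDER BY score DESC
    """
    return conn.execute(sql, (*terms, source, len(terms))).fetchall()


results = search(["vehicle", "sales"], "Auto Weekly")
```

The text condition (`posting`) and the structured condition (`doc.source`) are resolved in a single join, which is the kind of integration the abstract describes.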
A structured vector space model for word meaning in context
, 2008
Cited by 30 (5 self)
We address the task of computing vector space representations for the meaning of word occurrences, which can vary widely according to context. This task is a crucial step towards a robust, vector-based compositional account of sentence meaning. We argue that existing models for this task do not take syntactic structure sufficiently into account. We present a novel structured vector space model that addresses these issues by incorporating the selectional preferences for words’ argument positions. This makes it possible to integrate syntax into the computation of word meaning in context. In addition, the model performs at and above the state of the art for modeling the contextual adequacy of paraphrases.
Semantic Structure Matching for Assessing Web Service Similarity
- 1st International Conference on Service Oriented Computing (ICSOC03)
, 2003
Cited by 21 (3 self)
The web-services stack of standards is designed to support the reuse and interoperation of software components on the web. A critical step in the process of developing applications based on web services is service discovery, i.e., the identification of existing web services that can potentially be used in the context of a new web application. UDDI, the standard API for publishing web-services specifications, provides a simple browsing-by-business-category mechanism for developers to review and select published services. To support programmatic service discovery, we have developed a suite of methods that utilizes both the semantics of the identifiers of WSDL descriptions and the structure of their operations, messages and data types to assess the similarity of two WSDL files. Given only a textual description of the desired service, a semantic information-retrieval method can be used to identify and order the most similar service-description files. This step assesses the similarity of the provided description of the desired service with the available services. If a (potentially partial) specification of the desired service behavior is also available, this set of likely candidates can be further refined by a semantic structure-matching step assessing the structural similarity of the desired vs. the retrieved services and the semantic similarity of their identifiers. In this paper, we describe and experimentally evaluate our suite of service-similarity assessment methods.
Flexible interface matching for web-service discovery
- Proceedings of the Fourth International Conference on Web Information Systems Engineering (WISE), pp. 147–156
, 2003
Cited by 18 (5 self)
The web-services stack of standards is designed to support the reuse and interoperation of software components on the web. A critical step, to that end, is service discovery, i.e., the identification of existing web services that can potentially be used in the context of a new web application. UDDI, the standard API for publishing web-services specifications, provides a simple browsing-by-business-category mechanism for developers to review and select published services. In our work, we have developed a flexible service discovery method for identifying potentially useful services and assessing their relevance to the task at hand. Given a textual description of the desired service, a traditional information-retrieval method is used to identify the most similar service description files, and to order them according to their similarity. Next, given this set of likely candidates and a (potentially partial) specification of the desired service behavior, a structure-matching step further refines and assesses the quality of the candidate service set. In this paper, we describe and experimentally evaluate our web-service discovery process.
Constructing Semantic Space Models from Parsed Corpora
- IN PROCEEDINGS OF ACL-03
, 2003
Cited by 14 (1 self)
Traditional vector-based models use word co-occurrence counts from large corpora to represent lexical meaning. In this paper we present a novel approach for constructing semantic spaces that takes syntactic relations into account. We introduce a formalisation for this class of models and evaluate their adequacy on two modelling tasks: semantic priming and automatic discrimination of lexical relations.
Measures and Applications of Lexical Distributional Similarity
, 2003
Cited by 14 (0 self)
This thesis is concerned with the measurement and application of lexical distributional similarity. Two words are said to be distributionally similar if they appear in similar contexts. This loose definition, however, has led to many measures being proposed or adopted from fields such as geometry, statistics, Information Retrieval (IR) and Information Theory. Our aim is to investigate the properties which make a good measure of lexical distributional similarity. We start by introducing the concept of lexical distributional similarity. We discuss potential applications, which can be roughly divided into distributional or language modelling applications and semantic applications, and methods of evaluation (Chapter 2). We look at existing measures of distributional similarity and carry out an empirical comparison of fifteen of these measures, paying particular attention to the effects of word frequency (Chapter 3). We propose a new general framework for distributional similarity based on the context of lexical substitutability, which we measure using the IR concepts of precision and recall. This framework allows us to investigate the key factors in similarity of asymmetry, the relative influence of different contexts and the extent to which words share a context (Chapter 4). Finally, we consider the application of distributional similarity in language modelling (Chapter 5) and as a predictor of semantic similarity using human judgements of similarity and a spelling correction task (Chapter 6).
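Treating distributional similarity as precision and recall over contexts can be sketched as follows. This is an unweighted, set-based simplification and not the framework defined in the thesis. The function name `context_precision_recall` and the toy context sets are hypothetical.

```python
def context_precision_recall(candidate, target):
    """Score how well `candidate`'s contexts 'retrieve' those of `target`.

    precision: share of the candidate's contexts also seen with the target
    recall:    share of the target's contexts also seen with the candidate
    """
    shared = candidate & target
    precision = len(shared) / len(candidate) if candidate else 0.0
    recall = len(shared) / len(target) if target else 0.0
    return precision, recall


# Hypothetical context sets (e.g. verbs each noun occurs with as subject).
dog = {"bark", "eat", "run", "sleep"}
animal = {"eat", "run", "sleep", "live", "die", "breed"}

animal_for_dog = context_precision_recall(animal, dog)
dog_for_animal = context_precision_recall(dog, animal)
```

Swapping the arguments swaps precision and recall, which is one simple way to see the asymmetry the abstract mentions: a general word and a specific word do not substitute for each other equally well.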
On the Design and Evaluation of a Multi-dimensional Approach to Information Retrieval
, 2000
Cited by 12 (1 self)
We present a method of searching text collections that takes advantage of the inherent hierarchical information within documents and integrates searches of structured and unstructured data. Multidimensional databases (MDB), designed for accessing data along hierarchical dimensions, are effective for information retrieval. MDB are frequently used for On-line Analytic Processing (OLAP) applications with great success. We demonstrate a method of using OLAP techniques on a text collection. This combines traditional information retrieval and the slicing, dicing, drill-down, and roll-up of OLAP. We demonstrate use of a prototype for searching documents from the TREC collection. 1 Introduction We propose a technique for searching text and structured data using an On-line Analytic Processing (OLAP) tool. We present how text is modeled and accessed in a multi-dimensional database, taking advantage of hierarchical information in both structured data and text. Hierarchies perme...
Improving Accuracy and Run-Time Performance for TREC-4
Cited by 7 (4 self)
For TREC-4, we enhanced our existing prototype that implements relevance ranking using the AT&T DBC-1012 Model 4 parallel database machine to support the entire document collection. Additionally, we developed a special purpose IR prototype to test a new index compression algorithm and to provide performance comparisons to the relational approach. We submitted official results for both automatic and manual ad hoc queries for the entire 2GB English collection and the provided Spanish collection. Additionally, we submitted results using n-grams to process the corrupted data. In addition to implementing the vector-space model, we experimented with query reduction based on term frequency. Query reduction was shown to result in dramatically improved run-time performance and, in many cases, resulted in little or no degradation of precision/recall. 1 Introduction For TREC-4, we implemented relevance ranking queries using SQL on an AT&T DBC-1012 (formerly Teradata) parallel database machi...
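The abstract does not say how query reduction based on term frequency was carried out. One plausible reading, sketched here purely as an assumption, is to drop query terms that occur in a large fraction of the collection, since they add processing cost while discriminating poorly between documents. The function `reduce_query`, the `max_fraction` threshold, and the sample statistics are hypothetical.

```python
def reduce_query(query_terms, doc_freq, num_docs, max_fraction=0.5):
    """Keep only query terms found in at most `max_fraction` of the documents.

    doc_freq maps a term to the number of documents containing it;
    terms missing from doc_freq are treated as occurring in no document.
    """
    return [
        term
        for term in query_terms
        if doc_freq.get(term, 0) / num_docs <= max_fraction
    ]


# Hypothetical collection statistics for a 1000-document collection.
doc_freq = {"the": 980, "system": 610, "database": 120, "parallel": 45}
reduced = reduce_query(["the", "parallel", "database", "system"], doc_freq, 1000)
```

Here the two very common terms are removed and the shorter query keeps only "parallel" and "database", so fewer postings need to be joined at query time.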

