On Expanding Query Vectors with Lexically Related Words
, 1994
Cited by 36 (2 self)
Experiments performed on small collections suggest that expanding query vectors with words that are lexically related to the original query words can improve retrieval effectiveness. Prior experiments using WordNet to automatically expand vectors in the large TREC-1 collection were inconclusive regarding effectiveness gains from lexically related words since any such effects were dominated by the choice of words to expand. This paper specifically investigates the effect of expansion by selecting query concepts to be expanded by hand. Concepts are represented by WordNet synonym sets and are expanded by following the typed links included in WordNet. Experimental results suggest that this query expansion technique makes little difference in retrieval effectiveness within the TREC environment, presumably because the TREC topic statements provide such a rich description of the information being sought. 1 Introduction The IR group at Siemens Corporate Research is investigating how concept ...
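The expansion step described in this abstract (a concept represented as a synonym set, expanded by following typed links) can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the `LINKS` and `WORDS` tables are hypothetical toy data rather than WordNet content, and `expand`, its `link_types` argument, and its `max_depth` cut-off are names assumed here for the example.

```python
from collections import deque

# Hypothetical toy lexical graph: synset id -> list of (link_type, target synset id).
# These entries are illustrative only and are not taken from WordNet.
LINKS = {
    "car.n": [("hypernym", "vehicle.n"), ("hyponym", "taxi.n")],
    "vehicle.n": [("hyponym", "car.n"), ("hyponym", "truck.n")],
    "taxi.n": [("hypernym", "car.n")],
    "truck.n": [("hypernym", "vehicle.n")],
}

# Hypothetical mapping from synset id to the words in that synonym set.
WORDS = {
    "car.n": ["car", "automobile"],
    "vehicle.n": ["vehicle"],
    "taxi.n": ["taxi", "cab"],
    "truck.n": ["truck", "lorry"],
}


def expand(synset, link_types, max_depth=1):
    """Return the sorted words reachable from `synset` via the allowed link types.

    The starting synset's own words are always included; links are followed
    breadth-first for at most `max_depth` steps.
    """
    seen = {synset}
    queue = deque([(synset, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for link_type, target in LINKS.get(node, []):
            if link_type in link_types and target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return sorted(word for s in seen for word in WORDS.get(s, []))
```

Restricting `link_types` to a single relation (for example only `"hyponym"`) mirrors the idea of expanding along one typed link at a time, which makes it possible to compare the effect of each relation separately.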
Streams, Structures, Spaces, Scenarios, Societies (5S): A Formal Model for Digital Libraries
- ACM Trans. Inf. Syst
, 2004
Cited by 27 (3 self)
Digital libraries (DLs) are complex information systems and therefore demand formal foundations lest development efforts diverge and interoperability suffers. In this paper, we propose the fundamental abstractions of Streams, Structures, Spaces, Scenarios, and Societies (5S), which contribute to define digital libraries rigorously and usefully. Streams are sequences of abstract items used to describe static and dynamic content. Structures can be defined as labeled directed graphs, which impose organization. Spaces are sets of abstract items and operations on those sets that obey certain rules. Scenarios consist of sequences of events or actions that modify states of a computation in order to accomplish a functional requirement. Societies comprehend entities and the relationships between and among them. Together these abstractions relate and unify concepts, among others, of digital objects, metadata, collections, and services required to formalize and elucidate "digital libraries". The applicability, versatility and unifying power of the theory is demonstrated through its use in three distinct applications: building and interpretation of a DL taxonomy, analysis of case studies of digital libraries, and utilization as a formal basis for a DL description language. Keywords: digital libraries, theory, foundations, definitions, applications 1 Motivation Digital libraries are extremely complex information systems. The proper concept of a digital library seems hard to completely understand and evades definitional consensus. Different views (e.g., historical, technological) and perspectives (e.g., from the library and information science, information retrieval, or human-computer interaction communities) have led to a myriad of differing definitions. Licklider, in his seminal ...
Building A Large Thesaurus For Information Retrieval
, 1988
Cited by 21 (2 self)
Information retrieval systems that support searching of large textual databases are typically accessed by trained search intermediaries who provide assistance to end users in bridging the gap between the languages of authors and inquirers. We are building a thesaurus in the form of a large semantic network to support interactive query expansion and search by end users. Our lexicon is being built by analyzing and merging data from several large English dictionaries; testing of its value for retrieval is with the SMART and CODER systems.
Cheese: an overview
- In Cheese: Chemistry, Physics and Microbiology, Vol 1: General Aspects
, 2004
Discovery of Similarity Computations of Search Engines
- In Proceedings of the 9th International Conference on Information and Knowledge Management (CIKM). ACM
, 2000
Cited by 6 (2 self)
Two typical situations in which it is of practical interest to determine the similarities of text documents to a query due to a search engine are: (1) a global search engine, constructed on top of a group of local search engines, wishes to retrieve the set of local documents globally most similar to a given query; and (2) an organization wants to compare the retrieval performance of search engines. The dot-product function is a widely used similarity function. For a search engine using such a function, we can determine its similarity computations if how the search engine sets the weights of terms is known, which is usually not the case. In this paper, techniques are presented to discover certain mathematical expressions of these formulas and the values of embedded constants when the dot-product similarity function is used. Preliminary results from experiments on the WebCrawler search engine are given to illustrate our techniques. 1 Categories and Subject Descriptors H.3 [Information...
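For readers unfamiliar with the dot-product similarity function mentioned in this abstract, a minimal sketch follows. It assumes that queries and documents are represented as sparse term-to-weight dictionaries; how a given search engine actually sets those weights is exactly what the paper says is usually unknown, so the weights below are placeholders, and the function names `dot_product_similarity` and `rank` are chosen here for illustration.

```python
def dot_product_similarity(query_weights, doc_weights):
    """Sum the products of weights over terms that occur in both vectors."""
    return sum(
        weight * doc_weights[term]
        for term, weight in query_weights.items()
        if term in doc_weights
    )


def rank(query_weights, docs):
    """Return (doc_id, similarity) pairs sorted by decreasing similarity.

    `docs` maps a document id to its term-weight dictionary.
    """
    scored = [
        (doc_id, dot_product_similarity(query_weights, weights))
        for doc_id, weights in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because the score is a plain sum of per-term products, the same query and document can receive very different similarities under different weighting schemes, which is why knowing the weighting formula matters for the two situations the abstract lists.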
Using Syntactic Information in Handling Natural Language Queries for Extended Boolean Retrieval Model
- In “Proceedings of the 4th international workshop on information retrieval with Asian languages (IRAL99)”, Academia Sinica
, 1999
Cited by 5 (0 self)
There is considerable evidence that trained users can achieve good search effectiveness through structured Boolean queries rather than simple keyword queries, because Boolean operators can help to represent users' information needs more accurately. However, it is not normally easy for ordinary users to construct effective Boolean queries using appropriate Boolean operators. In this paper, we propose a syntax-based technique for handling natural language queries and phrases for the extended Boolean retrieval model in order to pursue both search effectiveness and user convenience. First, natural language queries are syntactically analyzed using a Korean natural language parser, and the resulting syntactic trees are structurally simplified using a tree-simplifying mechanism in order to capture the logical relationships between keywords. Second, in a simplified tree, plausible noun phrases are identified and added to the tree as additional keywords for more precise retrie...
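The abstract does not say which formulation of the extended Boolean model is used. As background only, the sketch below shows the commonly cited p-norm formulation of extended Boolean retrieval, under the assumption that all query terms carry equal weight. The function names and the default `p=2.0` are choices made for this example, not details from the paper.

```python
def pnorm_or(doc_weights, p=2.0):
    """p-norm OR score for document term weights in [0, 1].

    The score grows with the distance from the all-zeros point,
    where none of the query terms is present.
    """
    n = len(doc_weights)
    return (sum(w ** p for w in doc_weights) / n) ** (1.0 / p)


def pnorm_and(doc_weights, p=2.0):
    """p-norm AND score for document term weights in [0, 1].

    The score shrinks with the distance from the all-ones point,
    where every query term is fully present.
    """
    n = len(doc_weights)
    return 1.0 - (sum((1.0 - w) ** p for w in doc_weights) / n) ** (1.0 / p)
```

With `p=1` both operators reduce to the mean of the term weights, so the query behaves like a simple keyword query; larger values of `p` make AND and OR progressively stricter, approaching conventional Boolean behaviour.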
TSSP: A Reinforcement Algorithm to Find Related Papers
- In Proceedings of the Web Intelligence
, 2004
Cited by 5 (0 self)
Content analysis and citation analysis are two common methods in recommender systems. Compared with content analysis, citation analysis can discover more implicitly related papers. However, citation-based methods may introduce more noise into the citation graph and cause topic drift. Some work combines content with citation to improve similarity measurement. The problem is that the two features are not used to reinforce each other to get a better result. To solve this problem, we propose a new algorithm, Topic Sensitive Similarity Propagation (TSSP), to effectively integrate content similarity into similarity propagation. TSSP has two parts: citation-context-based propagation and iterative reinforcement. First, citation contexts provide clues as to which papers are topically related and filter out less relevant citations. Second, iteratively integrating content and citation similarity enables them to reinforce each other during the propagation. The experimental results of a user study show that TSSP outperforms other algorithms in almost all cases.
A Brief Review of Information Retrieval Models
, 2007
Cited by 5 (0 self)
Information retrieval models have been studied for decades, leading to a huge body of literature on the topic. In this paper, we briefly review this body of literature along with a discussion of some recent trends.