Results 1 - 10 of 36
Improving Browsing in Digital Libraries with Keyphrase Indexes
, 1998
Cited by 49 (9 self)
Browsing accounts for much of people's interaction with digital libraries, but it is poorly supported by standard search engines. Conventional systems often operate at the wrong level, indexing words when people think in terms of topics, and returning documents when people want a broader view. As a result, users cannot easily determine what is in a collection, how well a particular topic is covered, or what kinds of queries will provide useful results. We have built
A Parallel Computing Approach to Creating Engineering Concept Spaces for Semantic Retrieval: The Illinois Digital Library Initiative Project
- IEEE Transactions on Pattern Analysis and Machine Intelligence
, 1996
Cited by 37 (12 self)
This research presents preliminary results generated from the semantic retrieval research component of the Illinois Digital Library Initiative (DLI) project. Using a variation of the automatic thesaurus generation techniques, to which we refer as the concept space approach, we aimed to create graphs of domain-specific concepts (terms) and their weighted co-occurrence relationships for all major engineering domains. Merging these concept spaces and providing traversal paths across different concept spaces could potentially help alleviate the vocabulary (difference) problem evident in large-scale information retrieval. We have experimented previously with such a technique for a smaller molecular biology domain (Worm Community System, with 10+ MBs of document collection) with encouraging results. In order to address the scalability issue related to large-scale information retrieval and analysis for the current Illinois DLI project, we recently conducted experiments using the concept sp...
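The abstract above describes a concept space as a graph of terms linked by weighted co-occurrence relationships. The sketch below illustrates that general idea only. The function name `build_concept_space`, the document-level co-occurrence counting, and the asymmetric weight (co-occurrence count divided by the source term's document frequency) are assumptions made for illustration, not the weighting scheme used in the paper.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_concept_space(docs):
    """Build a toy concept space from documents given as lists of terms.

    Assumed weighting (not from the paper):
    weight(a -> b) = (#docs containing both a and b) / (#docs containing a)
    """
    doc_freq = Counter()   # number of documents each term appears in
    pair_freq = Counter()  # number of documents each unordered term pair shares
    for terms in docs:
        unique = sorted(set(terms))
        doc_freq.update(unique)
        pair_freq.update(combinations(unique, 2))

    graph = defaultdict(dict)
    for (a, b), shared in pair_freq.items():
        # The weight is asymmetric: a rare term points strongly to a common one.
        graph[a][b] = shared / doc_freq[a]
        graph[b][a] = shared / doc_freq[b]
    return dict(graph)


# Hypothetical three-document collection.
example_docs = [
    ["bridge", "steel"],
    ["bridge", "steel", "load"],
    ["bridge"],
]
example_space = build_concept_space(example_docs)
```

In this toy collection every document containing "steel" also contains "bridge", so the link from "steel" to "bridge" gets the maximum weight, while the reverse link is weaker.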
Automatic Subject Indexing Using An Associative Neural Network
- In: Proceedings of the 3rd ACM International Conference on Digital Libraries (DL'98)
, 1998
Cited by 20 (6 self)
The global growth in popularity of the World Wide Web has been enabled in part by the availability of browser-based search tools, which in turn have led to an increased demand for indexing techniques and technologies. As the amount of globally accessible information in community repositories grows, it is no longer cost-effective for such repositories to be indexed by professional indexers who have been trained to be consistent in subject assignment from controlled vocabulary lists. The era of amateur indexers is thus upon us, and the information infrastructure needs to provide support for such indexing if search of the Net is to produce useful results. In this paper
Updateable PAT-Tree Approach to Chinese Key Phrase Extraction Using Mutual . . .
, 1999
Cited by 20 (9 self)
There has been renewed research interest in using the statistical approach to extraction of key phrases from Chinese documents because existing approaches do not allow online frequency updates after phrases have been extracted. This consequently results in inaccurate, partial extraction. In this paper, we present an updateable PAT-tree approach. In our experiment, we compared our approach with that of Lee-Feng Chien; the comparison showed an improvement in recall from 0.19 to 0.43 and in precision from 0.52 to 0.70. This paper also reviews the requirements for a data structure that facilitates implementation of any statistical approaches to key-phrase extraction, including PAT-tree, PAT-array and suffix array with semi-infinite strings.
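The abstract above mentions suffix arrays over semi-infinite strings as one data structure for statistical key-phrase extraction. The sketch below shows how such a structure can answer the basic question "how often does this candidate phrase occur?". It is a static, naively built suffix array, not the updateable PAT-tree the paper proposes, and the function names and sample text are assumptions for illustration.

```python
def build_suffix_array(text):
    """Return start positions of all suffixes of text, in lexicographic order.

    Naive construction (sorts whole suffixes); fine for a small example.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def count_occurrences(text, suffix_array, phrase):
    """Count occurrences of phrase by binary search over the suffix array."""

    def first_index(after_equal):
        # First position whose suffix prefix is >= phrase,
        # or > phrase when after_equal is True.
        lo, hi = 0, len(suffix_array)
        while lo < hi:
            mid = (lo + hi) // 2
            start = suffix_array[mid]
            prefix = text[start:start + len(phrase)]
            if prefix < phrase or (after_equal and prefix == phrase):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_index(True) - first_index(False)


# Hypothetical sample text: "digital library and digital information".
sample_text = "数字图书馆与数字信息"
sample_sa = build_suffix_array(sample_text)
sample_count = count_occurrences(sample_text, sample_sa, "数字")
```

Because all suffixes sharing a prefix sit next to each other in the sorted order, the frequency of any character sequence is the width of one contiguous range. A statistical extractor could feed such counts into an association measure.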
Semantic Indexing for a Complete Subject Discipline
- In 4th Int ACM Conf on Digital Libraries
, 1999
Cited by 16 (4 self)
As part of the Illinois Digital Library Initiative (DLI) project we developed "scalable semantics" technologies. These statistical techniques enabled us to index large collections for deeper search than word matching. Through the auspices of the DARPA Information Management program, we are developing an integrated analysis environment, the Interspace Prototype, that uses "semantic indexing" as the foundation for supporting concept navigation. These semantic indexes record the contextual correlation of noun phrases, and are computed generically, independent of subject domain. Using this technology, we were able to compute semantic indexes for a subject discipline. In particular, in the summer of 1998, we computed concept spaces for 9.3M MEDLINE bibliographic records from the National Library of Medicine (NLM) which extensively covered the biomedical literature for the period from 1966 to 1997. In this experiment, we first partitioned the collection into smaller collections (repositorie...
Augmenting Thesaurus Relationships: Possibilities for Retrieval
- Journal of Digital Information
, 2001
Cited by 11 (2 self)
This paper discusses issues concerning the augmentation of thesaurus relationships, in light of new application possibilities for retrieval. We first discuss a case study that explored the retrieval potential of an augmented set of thesaurus relationships by specialising standard relationships into richer subtypes, in particular hierarchical geographical containment and the associative relationship. We then locate this work in a broader context by reviewing various attempts to build taxonomies of thesaurus relationships and conclude by discussing the feasibility of hierarchically augmenting the core set of thesaurus relationships, particularly the associative relationship. We discuss the possibility of enriching the specification and semantics of RT relationships, while maintaining compatibility with traditional thesauri via a limited hierarchical extension of the associative (and hierarchical) relationships. This would be facilitated by distinguishing the type of term from the (sub)type of relationship and explicitly specifying semantic categories for terms following a faceted approach. We first illustrate how hierarchical spatial relationships can be used to provide more flexible retrieval for queries incorporating place names in applications employing online gazetteers and geographical thesauri. We then employ a set of experimental scenarios to investigate key issues affecting use of the associative (RT) thesaurus relationships in semantic distance measures. Previous work has noted the potential of RTs in thesaurus search aids but also the problem of uncontrolled expansion of result sets. Results presented in this paper suggest a potential for taking account of the hierarchical context of an RT link and specialisations of the RT relationship.
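The abstract above describes using hierarchical spatial (containment) relationships to make place-name queries more flexible. The sketch below shows one simple reading of that idea: expanding a query place to every place it contains. The `NARROWER` table is a tiny illustrative gazetteer and `expand_place` is a hypothetical helper; neither comes from the paper.

```python
# Toy containment hierarchy: each place maps to the places it directly contains.
NARROWER = {
    "United Kingdom": ["England", "Scotland", "Wales", "Northern Ireland"],
    "Wales": ["Cardiff", "Swansea"],
    "Scotland": ["Edinburgh", "Glasgow"],
}


def expand_place(term, narrower):
    """Return the term plus every place reachable through containment links."""
    expanded = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current in expanded:
            continue  # guard against cycles or repeated entries
        expanded.add(current)
        stack.extend(narrower.get(current, []))
    return expanded


wales_terms = expand_place("Wales", NARROWER)
```

A search for "Wales" could then also match records indexed only under "Cardiff" or "Swansea". Following only the containment links keeps the expansion bounded, in contrast to the uncontrolled result-set growth the abstract attributes to unrestricted RT expansion.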
High-Performance Distributed Digital Libraries: Building the Interspace on the Grid
- 7TH IEEE SYMP HIGH-PERFORMANCE DISTRIBUTED COMPUTING
, 1998
Cited by 9 (0 self)
The Net of the 21st Century will radically transform interaction with knowledge. Users will navigate in the Interspace, across logical spaces of semantic indexes, rather than in the Internet, across physical networks of computer servers. Correlation across indexed collections is the most important feature of this infrastructure. Over ten years of research, the author has developed scalable technology for generating the necessary semantic indexes. Construction of large-scale models of the Interspace is feasible now under controlled laboratory conditions. Community repositories for entire scientific disciplines have been constructed using supercomputer simulations on millions of documents. A model Interspace is a set of community repositories, interconnected by concept switching networks to support information analysis across subject domains. CANIS has constructed several model testbeds with increasingly better infrastructure technology. We propose a PACI Interspace for the NSF flagship efforts of the HPDC community. The Interspace would provide concept switching for the users while the Grid would provide object switching for the sources.
Detecting Emerging Concepts in Textual Data Mining
- In Computational Information Retrieval
, 2001
Cited by 9 (4 self)
This article summarizes our research to date in the automatic identification of emerging trends in textual data. Applications are numerous: the detection of trends in warranty repair claims, for example, is of genuine interest to NCSA industrial partners Caterpillar and Boeing. Technology forecasting is another example with numerous applications of both academic and practical interest. In general, trending analysis of textual data can be performed in any domain that involves written records of human endeavors whether scientific or artistic in nature
Bibliometric Information Retrieval System (BIRS): A Web Search Interface Utilizing Bibliometric Research Results
- Journal of the American Society for Information Science
, 2000
Cited by 8 (0 self)
Introduction. The Internet and WWW have already established themselves as major factors in the operation of scholarly communities worldwide. Today, the Internet is used in all spheres of life for exchange of information. Information resources on the Internet are increasing tremendously. Gordon and Pathak (1999) suggested that the primary use of the Internet is for information retrieval. Search engines are considered as the most important tool for retrieving information on the Web, and consequently form a critical area of research (Gaines, Chen, & Shaw, 1997; Lawrence & Giles, 1998). Despite the effectiveness of Internet-based or online information retrieval, problems still exist. Woodward (1996) argued that the Internet is currently in a state of near chaos in terms of access and organization of information. Voorbij (1999) found that 67% of the Internet users agree or strongly agree with the difficulty to perform subject searches on the Internet. Users, especially the novice and irregular users, find it difficult to phrase their information needs due to the lack of knowledge literacy in search domain (Bates, 1986, 1998). Alth
The Domain-Specific Task of CLEF - Specific Evaluation Strategies in Cross-Language Information Retrieval
- In C. Peters (Ed.), Proceedings of the CLEF 2000 evaluation forum
, 2001
Cited by 8 (2 self)
Abstract. This paper describes the domain-specific cross-language information retrieval (CLIR) task of CLEF, why and how it is important, and how it differs from the general cross-language retrieval problem associated with the general CLEF collections. The inclusion of a domain-specific document collection and topics has both advantages and disadvantages

