Results 1 - 3 of 3
Efficient Single-Pass Index Construction for Text Databases
- Journal of the American Society for Information Science and Technology, 2003
Cited by 31 (2 self)
Efficient construction of inverted indexes is essential to provision of search over large collections of text data. In this paper, we review the principal approaches to inversion, analyse their theoretical cost, and present experimental results. We identify the drawbacks of existing inversion approaches and propose a single-pass inversion method that, in contrast to previous approaches, does not require the complete vocabulary of the indexed collection in main memory, can operate within limited resources, and does not sacrifice speed with high temporary storage requirements. We show that the performance of the single-pass approach can be improved by constructing inverted files in segments, reducing the cost of disk accesses during inversion of large volumes of data.
The Application Protocol Information Base World Wide Web Gateway
- National Institute of Standards and Technology, NISTIR 5868, 1996
Cited by 2 (2 self)
The Application Protocol Information Base (APIB) is an on-line repository of documents for the Standard for the Exchange of Product model data (STEP, officially ISO 10303 -- Product Data Representation and Exchange). Document types in the APIB include STEP Application Protocols and Integrated Resources. Application Protocols are standards that are intended to be implemented in software systems, and Integrated Resources are used by them as building blocks. Application Protocols and Integrated Resources are represented in the Standard Generalized Markup Language (SGML) in the APIB in order to facilitate efficient information search and retrieval. This paper describes a World Wide Web gateway to the APIB, implemented using the Common Gateway Interface (CGI) standard. The APIB gateway allows STEP developers to efficiently search for ISO 10303 standards and supporting information. The only client software required to use the APIB gateway is a third-party web browser.
A scalable architecture for XML retrieval
- 2003
While documents in classical text collections are regarded as atomic units, XML collections are treated as nested elements of varying granularity. This augmented view increases the number of potentially retrieved objects, e.g. documents, elements within documents, or aggregations of elements or of documents. Even for XML collections of comparatively small size (several hundred MB), the increase in the number of objects to be indexed and retrieved makes it necessary to apply strategies for scalability, such as parallel and distributed processing and term, document, and database pre-selection. We report in this paper on our approach for dealing with XML collections in general, and with the INEX collection in particular, using a scalable indexing and retrieval architecture.

