
CiteSeerX

Results 1 - 10 of 927

Document compaction for efficient query biased snippet generation

by Yohannes Tsegay, Simon J. Puglisi, Andrew Turpin - ECIR 2009, 31st European Conference on Information Retrieval, volume 5478 of LNCS, 2009
"... Abstract. Current web search engines return query-biased snippets for each document they list in a result set. For efficiency, search engines operating on large collections need to cache snippets for common queries, and to cache documents to allow fast generation of snippets for uncached queries. To ..."
Cited by 8 (2 self)

Color and Texture Descriptors

by B. S. Manjunath, Jens-Rainer Ohm, Vinod V. Vasudevan, Akio Yamada - IEEE Transactions on Circuits and Systems for Video Technology, 2001
"... Abstract—This paper presents an overview of color and texture descriptors that have been approved for the Final Committee Draft of the MPEG-7 standard. The color and texture descriptors that are described in this paper have undergone extensive evaluation and development during the past two years. Ev ..."
Cited by 353 (0 self)
descriptor, and a color layout descriptor. The three texture descriptors include one that characterizes homogeneous texture regions and another that represents the local edge distribution. A compact descriptor that facilitates texture browsing is also defined. Each of the descriptors is explained in detail

TileBars: Visualization of Term Distribution Information in Full Text Information Access

by Marti A. Hearst, 1995
"... The field of information retrieval has traditionally focused on textbases consisting of titles and abstracts. As a consequence, many underlying assumptions must be altered for retrieval from full-length text collections. This paper argues for making use of text structure when retrieving from full te ..."
Cited by 341 (10 self)
text documents, and presents a visualization paradigm, called TileBars, that demonstrates the usefulness of explicit term distribution information in Boolean-type queries. TileBars simultaneously and compactly indicate relative document length, query term frequency, and query term distribution

Structural analysis of hypertexts: Identifying hierarchies and useful metrics

by Rodrigo A. Botafogo, Ehud Rivlin, Ben Shneiderman - ACM Transactions on Information Systems, 1992
"... Hypertext users often suffer from the “lost in hyperspace” problem: disorientation from too many jumps while traversing a complex network. One solution to this problem is improved authoring to create more comprehensible structures. This paper proposes several authoring tools, based on hypertext str ..."
Cited by 213 (2 self)
different views of the same hypertext. The second part helps authors by identifying properties of the hypertext document. Multiple metrics are developed, including compactness and stratum. Compactness indicates the intrinsic connectedness of the hypertext, and stratum reveals to what degree the hypertext

Compacting XML Documents

by Miklos Kalman, Ferenc Havasi, Tibor Gyimothy
"... Nowadays one of the most common formats for storing information is XML. The size of XML documents can be rather large, and they may contain redundant attributes which can be calculated from others. The main idea behind our paper is based on a relationship between XML documents and attribute gram ..."
grammars. Using this relationship it is possible to define semantic rules for XML attributes using a metalanguage called SRML. With this metalanguage we decided to develop a method for compacting XML documents. After compaction it is possible to use XML compressors to make the compacted document

Large-scale bayesian logistic regression for text categorization

by Alexander Genkin, David D. Lewis, David Madigan - Technometrics
"... Logistic regression analysis of high-dimensional data, such as natural language text, poses computational and statistical challenges. Maximum likelihood estimation often fails in these applications. We present a simple Bayesian logistic regression approach that uses a Laplace prior to avoid overfitt ..."
Cited by 191 (13 self)
overfitting and produces sparse predictive models for text data. We apply this approach to a range of document classification problems and show that it produces compact predictive models at least as effective as those produced by support vector machine classifiers or ridge logistic regression combined

PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities

by Francisco Matias Cuenca-Acuna, Christopher Peery, Richard P. Martin, Thu D. Nguyen, 2003
"... Abstract. We present PlanetP, a peer-to-peer (P2P) content search and retrieval infrastructure targeting communities wishing to share large sets of text documents. P2P computing is an attractive model for information sharing between ad hoc groups of users because of its low cost of entry and explici ..."
Cited by 194 (11 self)
systems. PlanetP takes the novel approach of replicating the global directory and a compact summary index at every peer using gossiping. PlanetP then leverages this information to approximate a state-of-the-art document ranking algorithm to help users locate relevant information within the large communal

Semi-supervised learning of compact document representations with deep networks

by Martin Szummer - International Conference on Machine Learning, 2008
"... Finding good representations of text documents is crucial in information retrieval and classification systems. Today the most popular document representation is based on a vector of word counts in the document. This representation neither captures dependencies between related words, nor handles syno ..."
Cited by 35 (1 self)
synonyms or polysemous words. In this paper, we propose an algorithm to learn text document representations based on semi-supervised autoencoders that are stacked to form a deep network. The model can be trained efficiently on partially labeled corpora, producing very compact representations of documents

Faster Compact Top-k Document Retrieval

by Roberto Konow, Gonzalo Navarro
"... An optimal index solving top-k document retrieval [Navarro and Nekrich, SODA’12] takes O(m + k) time for a pattern of length m, but its space is at least 80n bytes for a collection of n symbols. We reduce it to 1.5n–3n bytes, with O(m + (k + log log n) log log n) time, on typical texts. The index is u ..."
Cited by 8 (5 self)

Using Wikipedia Categories for Compact Representations of Chemical Documents

by Benjamin Köhncke, Wolf-Tilo Balke
"... Today, Web pages are usually accessed using text search engines, whereas documents stored in the deep Web are accessed through domain-specific Web portals. These portals rely on external knowledge bases, respectively ontologies, mapping documents to more general concepts allowing for suitable classi ..."
Cited by 3 (2 self)
the content of chemical documents in a compact form. We compare the results to the domain-specific ChEBI ontology and the results show that Wikipedia categories indeed allow useful descriptions for chemical documents that are even better than descriptions from the ChEBI ontology.

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University