Stylistic Experiments For Information Retrieval
, 2000
Cited by 47 (8 self)
Information retrieval systems are built to handle texts as topical items: texts are tabulated by occurrence frequencies of content words in them, under the assumption that text topic is reasonably well modeled by content word occurrence. But texts have several interesting characteristics beyond topic. The experiments described in this text investigate stylistic variation. Roughly put, style is the difference between two ways of saying the same thing, and systematic stylistic variation can be used to characterize the genre of documents. These experiments investigate whether stylistic information is distinguishable using simple language engineering methods and, if so, whether this type of information can be used to improve information retrieval systems.
A Framework for Creating a Facetted Classification for Genres: Addressing Issues of Multidimensionality
, 2004
Cited by 27 (4 self)
People recognize and use document genres as a way of identifying useful information and of participating in mutually understood communicative acts. Crowston and Kwasnik [1] discuss the possibility of improving information access in large digital collections through the identification and use of document genre metadata. They draw on the definition of genre proposed by Orlikowski and Yates [3], who describe genre as "a distinctive type of communicative action, characterized by a socially recognized communicative purpose and common aspects of form" (p. 543). Scholars in fields such as rhetoric and library science have attempted to describe and systematize the notion of genre, and have offered many different definitions of genre. We like Orlikowski and Yates's definition because it takes into account all three aspects of genre that we recognize as fundamental: content, form, and purpose.
Iterative Information Retrieval Using Fast Clustering and Usage-Specific Genres
- Paper presented at the Eighth DELOS Workshop: User Interface in Digital Libraries
, 1998
Cited by 12 (0 self)
This paper describes how collection-specific, empirically defined, stylistics-based genre prediction can be brought together with rapid topical clustering to build an interactive information retrieval interface with multi-dimensional presentation of search results. The prototype presented addresses two specific problems of information retrieval: how to enrich the information seeking dialog by encouraging and supporting iterative refinement of queries, and how to enrich the document representation past the shallow semantics allowed by term frequencies.
Searching For More Than Words
Today's tools for searching information in a document database are based on term occurrence in texts. The searcher enters a number of terms, and a number of documents where those terms or closely related terms appear comparatively frequently are retrieved and presented by the system in list form. This method works well up to a point. It is intuitively understandable, and for competent users and well e...
Can Document-Genre Metadata Improve Information Access to Large Digital Collections
, 2003
Cited by 10 (3 self)
We discuss the issues of resolving the information-retrieval problem in large digital collections through the identification and use of document genres. Explicit identification of genre seems particularly important for such collections because any search usually retrieves documents with a diversity of genres that are undifferentiated by obvious clues as to their identity. As well, because most genres are characterized by both form and purpose, identifying the genre of a document provides information as to the document's purpose and its fit to the user's situation, which can be otherwise difficult to assess. We begin by outlining the possible role of genre identification in the information-retrieval process. Our assumption is that genre identification would enhance searching, first because we know that topic alone is not enough to define an information problem and second because search results containing genre information would be more easily understandable. Next, we discuss how information professionals have traditionally tackled the issues of representing genre in settings where topical representation is the norm. Finally, we address the issues of studying the efficacy of identifying genre in large digital collections. Because genre is often an implicit notion, studying it in a systematic way presents many problems. We outline a research protocol that would provide guidance for identifying Web document genres, for observing how genre is used in searching and evaluating search results, and finally for representing and visualizing genres.
Web-Specific Genre Visualization
- In Proceedings of the Webnet World Conference on the WWW and Internet
, 1998
Cited by 10 (2 self)
User interfaces to WWW search engines typically present results as ranked lists of documents. Such lists give users little help in understanding document variation: we propose a richer representation of retrieval results in the search interface. Fundamental to us is the notion of document grouping. We use both stylistic genre-based document categorization and statistical content-based clustering, and organize documents along these criteria in a highly interactive visualization front-end to WWW search engines, enabling quick overview and incremental query refinement.
Introduction
The vast majority of user interfaces to WWW search engines are still based on an exceedingly simple interaction model where a linear list of hits, i.e. document items, is sorted by so-called "relevance" with inner workings and metrics hidden and all but incomprehensible to most users: "This is appealing in its simplicity, but users are often frustrated as they do not know what the results mean, nor can th...
Opportunities from open source search
- In Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence
, 2005
Cited by 4 (4 self)
Internet search has a strong business model that permits a free service to users, so it is difficult to see why, if at all, there should be open source offerings as well. This paper first discusses open source search, and a rationale for the computer science community at large to get involved. Because there is no shortage of core open source components for at least some of the tasks involved, the Alvis Consortium is building infrastructure for open source search engines using peer-to-peer and subject-specific technology as its core, based on this rationale. We view open source search as a rich future playground in which information extraction and retrieval components can be used and intelligent agents can operate.
Building a Text Corpus for Representing the Variety of Medical Language
, 2001
Medical language processing has focused until recently on a few types of textual documents. However, a much larger variety of document types is used in different settings. It has been shown that Natural Language Processing (NLP) tools can exhibit very different behavior on different types of texts. Without better informed knowledge about the differential performance of NLP tools on a variety of medical text types, it will be difficult to control the extension of their application to different medical documents. We endeavored to provide a basis for such informed assessment: the construction of a large corpus of medical text samples. We propose a framework for designing such a corpus: a set of descriptive dimensions and a standardized encoding of both meta-information (implementing these dimensions) and content. We present a proof-of-concept demonstration by encoding an initial corpus of text samples according to these principles.
Problems in the Use-Centered Development of a Taxonomy of Web Genres
, 2008
Much of the recent research on Web genres has been driven by the assumption that information seeking on the Web can be made easier and more effective by information retrieval (IR) tools that differentiate among indexed Web documents in terms of their genre (or genres).

