Stylistic Experiments For Information Retrieval
, 2000
Cited by 47 (8 self)
Information retrieval systems are built to handle texts as topical items: texts are tabulated by occurrence frequencies of content words in them, under the assumption that text topic is reasonably well modeled by content word occurrence. But texts have several interesting characteristics beyond topic. The experiments described in this text investigate stylistic variation. Roughly put, style is the difference between two ways of saying the same thing -- and systematic stylistic variation can be used to characterize the genre of documents. These experiments investigate if stylistic information is distinguishable using simple language engineering methods, and if in that case this type of information can be used to improve information retrieval systems.
Integrating Automatic Genre Analysis into Digital Libraries
- IN FIRST ACM-IEEE JOINT CONF ON DIGITAL LIBRARIES
, 2001
Cited by 39 (2 self)
With the number and types of documents in digital library systems increasing, tools for automatically organizing and presenting the content have to be found. While many approaches focus on topic-based organization and structuring, hardly any system incorporates automatic structural analysis and representation. Yet, genre information (unconsciously) forms one of the most distinguishing features in conventional libraries and in information searches. In this paper we present an approach to automatically analyze the structure of documents and to integrate this information into an automatically created content-based organization. In the resulting visualization, documents on similar topics, yet representing different genres, are depicted as books in differing colors. This representation supports users intuitively in locating relevant information presented in a relevant form.
Genre Classification of Web Pages: User Study and Feasibility Analysis
- IN: BIUNDO S., FRUHWIRTH T., PALM G. (EDS.): ADVANCES IN ARTIFICIAL INTELLIGENCE
, 2004
Cited by 10 (0 self)
Genre classification means to discriminate between documents by means of their form, their style, or their targeted audience. Put another way, genre classification is orthogonal to a classification based on the documents’ contents. While most of the existing investigations of an automated genre classification are based on news articles corpora, the idea here is applied to arbitrary Web pages. We see genre classification as a powerful instrument to bring Web-based search services closer to a user’s information need. This objective raises two questions: (1) What are useful genres when searching the WWW? (2) Can these genres be reliably identified? The paper in hand presents results from a user study on Web genre usefulness as well as results from the construction of a genre classifier using discriminant analysis, neural network learning, and support vector machines. Particular attention is turned to a classifier’s underlying feature set: Aside from the standard feature types we introduce new features that are based on word frequency classes and that can be computed with minimum computational effort. They allow us to construct compact feature sets with few elements, with which a satisfactory genre diversification is achieved. About 70 % of the Web-documents are assigned to their true genre; note in this connection that no genre classification benchmark for Web pages has been published so far.
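The abstract above mentions features based on word frequency classes but does not define them here. As a rough sketch only, assuming the common definition in which a word's class is floor(log2(f(w*) / f(w))), with f(w*) the count of the most frequent word in a reference corpus, such features might be computed as follows. The function names, the number of classes, and the normalized-histogram feature vector are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter


def word_frequency_classes(reference_tokens):
    """Map each word to its frequency class relative to the most frequent word.

    Assumed definition: class(w) = floor(log2(f(w*) / f(w))), so the most
    frequent word gets class 0 and rarer words get higher classes.
    """
    counts = Counter(reference_tokens)
    if not counts:
        return {}
    top = max(counts.values())
    return {w: int(math.floor(math.log2(top / c))) for w, c in counts.items()}


def class_histogram(doc_tokens, classes, n_classes=8):
    """Share of a document's known tokens falling into each frequency class.

    Classes at or above n_classes - 1 are pooled into the last bin; tokens
    missing from the reference vocabulary are ignored (an assumption).
    """
    hist = [0.0] * n_classes
    known = 0
    for token in doc_tokens:
        if token in classes:
            hist[min(classes[token], n_classes - 1)] += 1
            known += 1
    return [h / known for h in hist] if known else hist


# Hypothetical toy reference corpus and document.
reference = ["the"] * 8 + ["cat"] * 2 + ["sat"]
classes = word_frequency_classes(reference)
features = class_histogram(["the", "the", "cat", "dog"], classes, n_classes=4)
```

A histogram like this yields one small fixed-length vector per document, which is in line with the abstract's remark about compact feature sets that are cheap to compute.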
Modular System Design for Multimedial Information Handling
Often, information retrieval from various other media is analogous to text-based retrieval; however, accessing documents in e.g. audio or video formats causes some extra problems, in particular with respect to document segmentation, choice of indexing features, and robustness. We review these difficulties, together with some previous attempts to overcome them, and then describe a very flexible, modular IR system which has been designed with a specific eye towards these issues.
The Workshop “Towards Genre-Enabled Search Engines: The Impact of Natural Language
, 2007
Genres are textual categories that organise and structure communication. By definition, each genre brings with it a set of conventions that can be conceptualised as expectations regarding a textual instance of a specific genre. For example, the conventions of the blog genre — a genre that is only instantiated on the World Wide Web — comprise a sequence of more or less daily postings that contain narratives,
Distinguishing Topic from Genre
Abstract: This paper contributes to a facet from the area of Web Information Retrieval that has recently received much attention: The satisfaction of a user’s personal information need with respect to text type, presentation type, or information quality. We imply that such properties can be quantified for all kinds of Web documents, and we subsume them under the term “Web genre” or “genre”. Recent surveys show that there is, to a certain degree, a common understanding of Web genre. However, the strictness by which genre and non-genre aspects of a document are experienced is an individual matter. To get a better understanding of the challenges of Web genre identification and its possible limits we investigate in this paper a very interesting question, which has not been posed by now: Given a categorization C of documents (or bookmarks, links, document identifiers), can we provide a reliable assessment whether C is governed by topic or by genre considerations? We present instruments to answer this question as well as to make a distinct statement about the homogeneity of a categorization.

