Stylistic Experiments for Information Retrieval
2000
Cited by 47 (8 self)
Information retrieval systems are built to handle texts as topical items: texts are tabulated by the occurrence frequencies of the content words in them, under the assumption that a text's topic is reasonably well modeled by content-word occurrence. But texts have several interesting characteristics beyond topic. The experiments described in this text investigate stylistic variation. Roughly put, style is the difference between two ways of saying the same thing, and systematic stylistic variation can be used to characterize the genre of documents. These experiments investigate whether stylistic information can be distinguished using simple language engineering methods and, if so, whether this type of information can be used to improve information retrieval systems.
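The contrast between a topical and a stylistic description of a text can be sketched as follows. This is a minimal illustration, not the feature set used in these experiments: the tiny function-word list and the four style measures (`mean_word_length`, `mean_sentence_length`, `first_second_person_rate`, `type_token_ratio`) are assumptions, chosen only to show the kind of simple, largely topic-independent measure that can be computed next to the usual content-word counts.

```python
import re
from collections import Counter

# Assumption: a deliberately tiny, hypothetical function-word list for illustration.
FUNCTION_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "it", "that", "i", "you", "we"}
# Assumption: first- and second-person pronouns as a crude marker of interactive style.
PERSON_PRONOUNS = {"i", "you", "we"}


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens; a crude tokenizer, good enough for a sketch."""
    return re.findall(r"[a-z']+", text.lower())


def topical_profile(text: str) -> Counter:
    """Content-word occurrence counts: the classic 'topical' view of a text."""
    return Counter(t for t in tokenize(text) if t not in FUNCTION_WORDS)


def stylistic_profile(text: str) -> dict[str, float]:
    """A few simple style measures that do not depend on what the text is about."""
    tokens = tokenize(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(tokens) or 1  # avoid division by zero on empty input
    return {
        "mean_word_length": sum(len(t) for t in tokens) / n,
        "mean_sentence_length": len(tokens) / (len(sentences) or 1),
        "first_second_person_rate": sum(t in PERSON_PRONOUNS for t in tokens) / n,
        "type_token_ratio": len(set(tokens)) / n,
    }
```

Two texts on the same topic can share most of their `topical_profile` yet differ sharply in `stylistic_profile`, which is the kind of variation the experiments set out to exploit.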

Integrating Automatic Genre Analysis into Digital Libraries
In First ACM-IEEE Joint Conference on Digital Libraries, 2001
Cited by 39 (2 self)
With the number and types of documents in digital library systems increasing, tools for automatically organizing and presenting the content have to be found. While many approaches focus on topic-based organization and structuring, hardly any system incorporates automatic structural analysis and representation. Yet, genre information (unconsciously) forms one of the most distinguishing features in conventional libraries and in information searches. In this paper we present an approach to automatically analyze the structure of documents and to integrate this information into an automatically created content-based organization. In the resulting visualization, documents on similar topics, yet representing different genres, are depicted as books in differing colors. This representation supports users intuitively in locating relevant information presented in a relevant form.

Architectural Elements of Language Engineering Robustness
Journal of Natural Language Engineering, Special Issue on Robust Methods in Analysis of Natural Language Data, 2002
Cited by 22 (13 self)
We discuss robustness in language engineering (LE) systems from the perspective of engineering, and the predictability of both outputs and construction process that this entails. We present an architectural system that contributes to engineering robustness and low-overhead systems development (GATE, a General Architecture for Text Engineering). To verify our ideas we present results from the development of a multi-purpose cross-genre Named Entity recognition system. This system aims to be robust across diverse input types, and to reduce the need for costly and time-consuming adaptation of systems to new applications, with its capability to process texts from widely differing domains and genres.

A Computational Theory of Goal-Directed Style in Syntax
Computational Linguistics, 1993

Genres, registers, text types, domains and styles: clarifying the concepts and navigating a path through the BNC jungle
Technology, 2001
Cited by 17 (0 self)
In this paper, an attempt is first made to clarify and tease apart the somewhat confusing terms genre, register, text type, domain, sublanguage, and style. The use of these terms by various linguists and literary theorists working under different traditions or orientations will be examined, and a possible way of synthesising their insights will be proposed and illustrated with reference to the disparate categories used to classify texts in various existing computer corpora. With this terminological problem resolved, a personal project which involved giving each of the 4,124 British National Corpus (BNC, version 1) files a descriptive "genre" label will then be described. The result of this work, a spreadsheet/database (the "BNC Index") containing genre labels and other types of information about the BNC texts, will then be presented and its usefulness shown. It is envisaged that this resource will allow linguists, language teachers, and other users to navigate through or scan the huge BNC jungle more easily, to quickly ascertain what is there (and how much), and to make informed selections from the mass of texts available. It should also greatly facilitate genre-based research (e.g., EAP, ESP, discourse analysis, lexicogrammatical, and collocational studies) and focus everyday classroom concordancing activities by making it easy for people to restrict their searches to highly specified subsets of the BNC using PC-based concordancers such as WordSmith, MonoConc, or the Web-based BNCWeb.

Weight functions impact on LSA performance
EuroConference RANLP'2001 (Recent Advances in NLP), 2001
Cited by 13 (3 self)
This paper presents experimental results of using latent semantic analysis (LSA) to analyze English literary texts. Several preliminary transformations of the text-document frequency matrix, using different weight functions, are tested on control subsets. Additional clustering based on the correlation matrix is applied in order to reveal the latent structure. The algorithm creates a shaded form matrix via singular values and vectors. The results are interpreted as a measure of the quality of the transformations and compared with the control set tests.
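A weight function of the kind tested before the LSA step can be sketched as follows. The abstract does not say which functions were compared; this example assumes log-entropy weighting, a common choice in the LSA literature, applied to a small count matrix with terms as rows and documents as columns. The truncated singular value decomposition that LSA then performs on the weighted matrix is omitted, since it needs a linear-algebra library.

```python
import math


def log_entropy(matrix: list[list[float]]) -> list[list[float]]:
    """Apply log-entropy weighting to a term-by-document count matrix.

    Rows are terms, columns are documents (an assumed layout).
      local weight  l_ij = log(1 + tf_ij)
      global weight g_i  = 1 + sum_j p_ij * log(p_ij) / log(n_docs),
                    where p_ij = tf_ij / gf_i and gf_i is the term's total count.
    The weighted entry is l_ij * g_i.
    """
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        gf = sum(row)  # global frequency of this term over all documents
        entropy_sum = 0.0
        for tf in row:
            if tf > 0:  # 0 * log(0) is taken as 0
                p = tf / gf
                entropy_sum += p * math.log(p)
        # With a single document the entropy term is undefined; fall back to 1.
        g = 1.0 + entropy_sum / math.log(n_docs) if n_docs > 1 else 1.0
        weighted.append([math.log(1 + tf) * g for tf in row])
    return weighted
```

A term spread evenly over all documents gets a global weight of 0, while a term concentrated in one document keeps a global weight of 1; evenly distributed terms carry little information for separating documents, so they are damped before the decomposition.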

The Glass Box User Model for Filtering
1994
Cited by 12 (4 self)
The first requirement on an interactive system in a domain such as information filtering is to be an interface to knowledge, rather than just a knowledgeable interface. We borrow the computation instruction metaphor of a system as "a black box in a glass box" as a means to conceptualize the problem of giving a user control over the actions of an interactive system. The application domain we work in is that of information filtering. In the "black box", we hide complex knowledge of the domain objects, such as facts and assumptions about text genre identification, while the "glass box", which is what the user sees, shows only the neat top-level knowledge of the domain's conceptual categories, such as categorization rules.
Keywords: Information filtering, user modelling, interface design, Usenet News
Authors: Jussi Karlgren, Kristina Höök, Ann Lantz, Jacob Palme, Daniel Pargman. Departments of Computer and Systems Sciences, Computational Linguistics, and Psy...

Web-Specific Genre Visualization
In Proceedings of the Webnet World Conference on the WWW and Internet, 1998
Cited by 10 (2 self)
User interfaces to WWW search engines typically present results as ranked lists of documents. Such lists give users little help in understanding document variation: we propose a richer representation of retrieval results in the search interface. Fundamental to us is the notion of document grouping. We use both stylistic genre-based document categorization and statistical content-based clustering, and organize documents along these criteria in a highly interactive visualization front-end to WWW search engines, enabling quick overview and incremental query refinement.

Introduction
The vast majority of user interfaces to WWW search engines are still based on an exceedingly simple interaction model, where a linear list of hits, i.e. document items, is sorted by so-called "relevance", with inner workings and metrics hidden and all but incomprehensible to most users: "This is appealing in its simplicity, but users are often frustrated as they do not know what the results mean, nor can th...
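Organizing hits along both a genre criterion and a content criterion can be sketched as a two-level index over the ranked list. This is a minimal sketch under stated assumptions: the `Hit` record, its `genre` label (assumed to come from some stylistic genre classifier) and its `cluster` id (assumed to come from some content-based clustering step) are hypothetical, and the interactive visualization itself is not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    url: str
    score: float  # relevance score from the underlying search engine
    genre: str    # hypothetical label from a stylistic genre classifier
    cluster: int  # hypothetical id from a content-based clustering step


def group_hits(hits: list[Hit]) -> dict[str, dict[int, list[Hit]]]:
    """Organize a hit list into genre -> content cluster -> hits.

    Within each group, hits keep their relevance order (highest score first),
    so the familiar ranking is preserved inside every cell of the overview.
    """
    groups: defaultdict = defaultdict(lambda: defaultdict(list))
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        groups[hit.genre][hit.cluster].append(hit)
    # Convert to plain dicts so callers get ordinary, non-auto-creating mappings.
    return {genre: dict(clusters) for genre, clusters in groups.items()}
```

Keeping the relevance order inside each group means the grouped view refines the linear list rather than replacing it, which fits the incremental query refinement the abstract describes.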

