An evaluation of techniques for clustering search results
1996
Cited by 35 (3 self)
The ability to effectively organize retrieval results becomes more important as the focus of Information Retrieval (IR) shifts towards interactive search processes. Automatic classification techniques are capable of providing the necessary information organization by arranging the retrieved data into groups of documents with common subjects. In this paper, we compare classification methods from IR and Machine Learning (ML) for clustering search results. Issues such as document representation, classification algorithms, and cluster representation are discussed. We introduce several evaluation techniques and use them in preliminary experiments. These experiments indicate that the proposed techniques have promise, but it is clear that user experiments are required to carry out more thorough evaluation.
Lower dimensional representation of text data based on centroids and least squares
BIT, 2003
Cited by 35 (12 self)
Dimension reduction in today's vector space based information retrieval system is essential for improving computational efficiency in handling massive amounts of data. A mathematical framework for lower dimensional representation of text data in vector space based information retrieval is proposed using minimization and a matrix rank reduction formula. We illustrate how the commonly used Latent Semantic Indexing based on the Singular Value Decomposition (LSI/SVD) can be derived as a method for dimension reduction from our mathematical framework. Then two new methods for dimension reduction based on the centroids of data clusters are proposed and shown to be more efficient and effective than LSI/SVD when we have a priori information on the cluster structure of the data. Several advantages of the new methods in terms of computational efficiency and data representation in the reduced space, as well as their mathematical properties, are discussed. Experimental results are presented to illustrate the effectiveness of our methods on certain classification problems in a reduced dimensional space. The results indicate that for a successful lower dimensional representation of the data, it is important to incorporate a priori knowledge in the dimension reduction algorithms.
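As a rough illustration of the centroid idea in this abstract, the sketch below represents each document by the least-squares coordinates y that minimize ||C y - a||_2, where the rows of `cents` are the cluster centroids and a is the document's term vector. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function names are hypothetical, the toy data is made up, cluster labels are assumed known in advance and every cluster non-empty, centroids are assumed linearly independent, and the normal equations are solved directly for brevity (a production implementation would more likely use a QR factorization for numerical stability).

```python
def centroids(docs, labels, k):
    """Return k centroid vectors; centroid j is the mean of documents labeled j."""
    m = len(docs[0])
    sums = [[0.0] * m for _ in range(k)]
    counts = [0] * k
    for doc, lab in zip(docs, labels):
        counts[lab] += 1
        for i, v in enumerate(doc):
            sums[lab][i] += v
    return [[s / counts[j] for s in sums[j]] for j in range(k)]


def solve(G, b):
    """Solve the small square system G y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(G)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y


def reduce_doc(cents, doc):
    """Least-squares coordinates of doc in the span of the centroids (normal equations)."""
    k = len(cents)
    G = [[sum(a * b for a, b in zip(cents[i], cents[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(a * b for a, b in zip(cents[i], doc)) for i in range(k)]
    return solve(G, rhs)


# Toy term vectors (4 terms), two clusters known a priori (hypothetical data).
docs = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 1, 2], [0, 0, 2, 1]]
labels = [0, 0, 1, 1]
cents = centroids(docs, labels, k=2)
reduced = [reduce_doc(cents, d) for d in docs]  # each document becomes a 2-vector
```

With k clusters, every m-dimensional document is mapped to k coordinates, so the reduced dimension equals the number of clusters rather than a rank chosen by truncating an SVD.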
The Application of Classical Information Retrieval Techniques to Spoken Documents
1995
Cited by 32 (1 self)
Table 4.1: Message classes in classification experiments of Rose et al.: Object Description, General Discussion, Map Reading, Photographic Interpretation, Cartoon Description. Now, an estimate of $I(C_i; w_k)$ can be calculated by a four-way partition of the set of test messages, depending on (a) whether or not a message belongs to topic class $C_i$ and (b) whether or not it contains word $w_k$. If $N$ is the number of messages in the test collection, $R_i$ is the number belonging to topic class $C_i$, $n_k$ is the number of messages containing word $w_k$ and $r_{ik}$ is the number of messages in class $C_i$ containing word $w_k$, then, estimating the probabilities by frequency counts, $I(C_i; w_k) = \log \frac{r_{ik}/R_i}{n_k/N}$. This is actually identical to a form of retrospective term relevance weight, initially proposed in the IR literature by both Barkla [66] and Miller [67], and reviewed by Robertson and Sparck Jones in their classic paper on the subject [42]. Moreover, Rose proposed, but did no...
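The frequency-count estimate in this excerpt can be computed directly from the four counts it defines. The sketch below assumes the formula reads as the log of the ratio (r_ik / R_i) / (n_k / N), that is, the fraction of class C_i messages containing w_k divided by the fraction of all messages containing w_k; the function name and the example counts are hypothetical.

```python
import math


def class_word_information(N, R_i, n_k, r_ik):
    """Estimate I(C_i; w_k) = log((r_ik / R_i) / (n_k / N)) from message counts.

    N    : messages in the test collection
    R_i  : messages belonging to topic class C_i
    n_k  : messages containing word w_k
    r_ik : messages in class C_i that contain word w_k
    """
    if min(N, R_i, n_k, r_ik) <= 0:
        raise ValueError("all counts must be positive for the logarithm to be defined")
    # Algebraically the same ratio, arranged as a single division.
    return math.log((r_ik * N) / (R_i * n_k))


# Hypothetical counts: the word appears in 20% of class messages but 5% overall.
informative = class_word_information(N=1000, R_i=100, n_k=50, r_ik=20)
# Same word frequency inside and outside the class: no information about the class.
uninformative = class_word_information(N=1000, R_i=100, n_k=50, r_ik=5)
```

A positive value means the word is over-represented in the class, zero means it occurs at the collection-wide rate, and a negative value means it is under-represented. The estimate is undefined when r_ik is zero, which is why the sketch rejects non-positive counts instead of smoothing them.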
Structure preserving dimension reduction for clustered text data based on the generalized singular value decomposition
SIAM Journal on Matrix Analysis and Applications, 2003
Cited by 31 (15 self)
In today’s vector space information retrieval systems, dimension reduction is imperative for efficiently manipulating the massive quantity of data. To be useful, this lower-dimensional representation must be a good approximation of the full document set. To that end, we adapt and extend the discriminant analysis projection used in pattern recognition. This projection preserves cluster structure by maximizing the scatter between clusters while minimizing the scatter within clusters. A common limitation of trace optimization in discriminant analysis is that one of the scatter matrices must be nonsingular, which restricts its application to document sets in which the number of terms does not exceed the number of documents. We show that by using the generalized singular value decomposition (GSVD), we can achieve the same goal regardless of the relative dimensions of the term-document matrix. In addition, applying the GSVD allows us to avoid the explicit formation of the scatter matrices in favor of working directly with the data matrix, thus improving the numerical properties of the approach. Finally, we present experimental results that confirm the effectiveness of our approach.
Effects of OCR errors on ranking and feedback using the vector space model
Inf. Proc. and Management, 1996
Cited by 29 (12 self)
We report on the performance of the vector space model in the presence of OCR errors. We show that average precision and recall are not affected for our full text document collection when the OCR version is compared to its corresponding corrected set. We do see divergence, though, between the relevant document rankings of the OCR and corrected collections with different weighting combinations. In particular, we observed that cosine normalization plays a considerable role in the disparity seen between the collections. Furthermore, we show that even though feedback improves retrieval for both collections, it cannot be used to compensate for OCR errors caused by badly degraded documents.
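One plausible reading of why cosine normalization matters here is that spurious tokens lengthen a document's vector without adding query matches. The sketch below illustrates that effect only; it is not the paper's experiment. It assumes raw term-frequency weights, and the query, the documents, and the junk tokens standing in for OCR debris are all made up.

```python
import math
from collections import Counter


def dot_score(query, doc):
    """Unnormalized inner product of raw term-frequency vectors."""
    q, d = Counter(query), Counter(doc)
    return sum(q[t] * d[t] for t in q)


def cosine_score(query, doc):
    """Inner product divided by the product of the two vector lengths."""
    q, d = Counter(query), Counter(doc)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in d.values()))
    return dot_score(query, doc) / norm if norm else 0.0


query = ["vector", "model"]
corrected = ["vector", "space", "model", "ranking"]
# Hypothetical OCR version: same words plus two junk tokens from misrecognized marks.
ocr = corrected + ["xq", "zt"]

raw_gap = dot_score(query, corrected) - dot_score(query, ocr)
cosine_gap = cosine_score(query, corrected) - cosine_score(query, ocr)
```

In this toy case the unnormalized scores are identical, while the cosine score of the OCR version is lower because its vector is longer, so the two versions of the same document can land at different ranks.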
NATURAL LANGUAGE CALL ROUTING: A Robust, Self-Organizing Approach
In Proceedings of the Fifth International Conference on Spoken Language Processing, 1998
Cited by 29 (2 self)
We have developed a domain-independent, automatically trained call router that directs customer calls based on their response to an open-ended "How may I direct your call?" query. Routing behavior is trained from a corpus of transcribed and hand-routed calls and then carried out using vector-based information retrieval techniques. Terms consist of sequences of morphologically reduced content words. Documents representing routing destinations consist of weighted term frequencies derived from calls to that destination in the training corpus. In this paper, we evaluate our approach in the context of a large financial services call center with thousands of possible customer activities and dozens of routing destinations. We evaluate the system's performance on ambiguous and unambiguous calls when given either accurate transcriptions or fairly noisy real-time speech recognizer output. We conclude that in a highly complex call center, our system performs at roughly the same level of accurac...
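The routing scheme described in this abstract, one term vector per destination built from training calls and a similarity match against the caller's request, can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' system: it uses single lowercase whitespace tokens and raw term counts instead of morphologically reduced word sequences and weighted frequencies, and the destination names and training calls are hypothetical.

```python
import math
from collections import Counter


def build_destination_vectors(training_calls):
    """Sum term counts over all training calls that were hand-routed to each destination."""
    vectors = {}
    for transcript, destination in training_calls:
        vectors.setdefault(destination, Counter()).update(transcript.lower().split())
    return vectors


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(call, vectors):
    """Return the destination whose vector is most similar to the caller's request."""
    query = Counter(call.lower().split())
    return max(vectors, key=lambda dest: cosine(query, vectors[dest]))


# Hypothetical hand-routed training corpus.
training_calls = [
    ("check my account balance", "balances"),
    ("what is the balance on my savings account", "balances"),
    ("i lost my credit card", "cards"),
    ("report a stolen card", "cards"),
]
vectors = build_destination_vectors(training_calls)
best = route("my card was stolen", vectors)
```

A fuller version would also need a confidence threshold so that requests matching several destinations about equally well can be flagged as ambiguous instead of being routed to the top-scoring one.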
Selective Text Utilization and Text Traversal
In Hypertext '93 Proceedings, 1995
Cited by 27 (6 self)
Many large collections of full-text documents are currently stored in machine-readable form and processed automatically in various ways. These collections may include different types of documents, such as messages, research articles, and books, and the subject matter may vary widely. To process such collections, robust text analysis methods must be used, capable of handling materials in arbitrary subject areas, and flexible access must be provided to texts and text excerpts of varying size. In this study, global text comparison methods are used to identify similarities between text elements, followed by local context-checking operations that resolve ambiguities and distinguish superficially similar texts from texts that actually cover identical topics. A linked text structure is then created that relates similar texts at various levels of detail. In particular, text links are available for full texts, as well as text sections, paragraphs, and sentence groups. The linked structures are ...
Semantic-audio retrieval
IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2002
Cited by 27 (0 self)
© 2002 IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.
MIMIC: An Adaptive Mixed Initiative Spoken Dialogue System for Information Queries
2000
Cited by 26 (0 self)
This paper describes MIMIC, an adaptive mixed initiative spoken dialogue system that provides movie showtime information. MIMIC improves upon previous dialogue systems in two respects. First, it employs initiative-oriented strategy adaptation to automatically adapt response generation strategies based on the cumulative effect of information dynamically extracted from user utterances during the dialogue. Second, MIMIC's dialogue management architecture decouples its initiative module from the goal and response strategy selection processes, providing a general framework for developing spoken dialogue systems with different adaptation behavior.
The Effect of Accessing Non-Matching Documents on Relevance Feedback
ACM Transactions on Information Systems, 1997
Cited by 23 (0 self)
Traditional information retrieval (IR)... This paper shows that, in systems that allow access to non-matching documents (e.g. hybrid hypertext and information retrieval systems), the strength of the effect of giving relevance feedback varies between matching and non-matching documents. For positive feedback the results shown here are encouraging as they can be justified by an intuitive view of the process. However, for negative feedback the results show behaviour that cannot easily be justified and that varies greatly depending on the model of feedback used.

