Evaluating Natural Language Processing Systems
, 1993
Cited by 104 (0 self)
This report presents a detailed analysis and review of NLP evaluation, in principle and in practice. Part 1 examines evaluation concepts and establishes a framework for NLP system evaluation. This makes use of experience in the related area of information retrieval and the analysis also refers to evaluation in speech processing. Part 2 surveys significant evaluation work done so far, for instance in machine translation, and discusses the particular problems of generic system evaluation. The conclusion is that evaluation strategies and techniques for NLP need much more development, in particular to take proper account of the influence of system tasks and settings. Part 3 develops a general approach to NLP evaluation, aimed at methodologically-sound strategies for test and evaluation motivated by comprehensive performance factor identification. The analysis throughout the report is supported by extensive illustrative examples. This work was carried out under the UK Science and Engineeri...
Evaluating Spoken Dialogue Agents with PARADISE: Two Case Studies
, 1998
Cited by 32 (3 self)
This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating and comparing the performance of spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity. After presenting PARADISE, we illustrate its application to two different spoken dialogue agents. We show how to derive a performance function for each agent and how to generalize results across agents. We then show that once such a performance function has been derived, it can be used both for making predictions about future versions of an agent and as feedback to the agent, so that the agent can learn to optimize its behavior based on its experiences with users over time.
Empirical studies in discourse
- Computational Linguistics
, 1997
Cited by 8 (0 self)
Computational theories of discourse are concerned with the context-based interpretation or generation of discourse phenomena in text and dialogue. In the past, research in
Statistical Source Channel Models for Natural Language Understanding
, 1996
Cited by 8 (1 self)
d my ignorance in the field. He was always patient, and took the time to explain his answers at a level I could understand. Dr. Todd Ward, a colleague of mine at IBM, has also "been there" for me. I cannot count the number of times that Todd helped me figure out a solution to a problem, either mathematical or programming. Whenever I was not sure about a solution to a problem, Todd was my sounding board. I'm sure that his individual research efforts were slowed by our meetings, but that never stopped him from helping me. Todd also acted as a counselor, providing insight on how to complete a doctorate! Former IBMer, Dr. Stephen Della Pietra, is without a doubt the brightest mathematician with whom I have ever worked. Like Salim and Todd, he knows statistical modeling at a much greater depth than I do, and he never minded "bringing down" the level of his explanations to one where I could understand and absorb the material. Stephen was my mentor, and without his expert tutelag
Evaluating Interactive Dialogue Systems: Extending Component Evaluation to Integrated System Evaluation
- In Proceedings of the ACL/EACL Workshop on Interactive Spoken Dialogue Systems
, 1997
Cited by 3 (0 self)
This paper discusses the range of ways in which spoken dialogue system components have been evaluated and discusses approaches to evaluation that attempt to integrate component evaluation into an overall view of system performance. We will

