Towards a Framework for Software Measurement Validation
- IEEE Transactions on Software Engineering, 1995
Cited by 100 (0 self)
Abstract: In this paper we propose a framework for validating software measurement. We start by defining a measurement structure model that identifies the elementary component of measures and the measurement process, and then consider five other models involved in measurement: unit definition models, instrumentation models, attribute relationship models, measurement protocols and entity population models. We consider a number of measures from the viewpoint of our measurement validation framework and identify a number of shortcomings; in particular we identify a number of problems with the construction of function points. We also compare our view of measurement validation with ideas presented by other researchers and identify a number of areas of disagreement. Finally, we suggest several rules that practitioners and researchers can use to avoid measurement problems, including the use of measurement vectors rather than artificially contrived scalars. Index Terms: Measurement theory, software measurement, software metrics validation.
Inter-Coder Agreement for Computational Linguistics
- Computational Linguistics, 2008
Cited by 54 (1 self)
This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff’s alpha as well as Scott’s pi and Cohen’s kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in Computational Linguistics, may be more appropriate for many corpus annotation tasks – but that their use makes the interpretation of the value of the coefficient even harder.
Criteria for evaluating usability evaluation methods
- International Journal of Human-Computer Interaction, 2001
Cited by 38 (0 self)
The current variety of alternative approaches to usability evaluation methods (UEMs) designed to assess and improve usability in software systems is offset by a general lack of understanding of the capabilities and limitations of each. Practitioners need to know which methods are more effective and in what ways and for what purposes. However, UEMs cannot be evaluated and compared reliably because of the lack of standard criteria for comparison. In this article, we present a practical discussion of factors, comparison criteria, and UEM performance measures useful in studies comparing UEMs. In demonstrating the importance of developing appropriate UEM evaluation criteria, we offer operational definitions and possible measures of UEM performance. We highlight specific challenges that researchers and practitioners face in comparing UEMs and provide a point of departure for further discussion and refinement of the principles and techniques used to approach UEM evaluation and comparison.
A reference collection for Web spam
- SIGIR Forum, 2006
Cited by 36 (12 self)
We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating whether or not the hosts include Web spam aspects. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.
Spotting "Hot Spots" in Meetings: Human Judgments and Prosodic Cues
- In Proc. Eurospeech, 2003
Cited by 30 (3 self)
Recent interest in the automatic processing of meetings is motivated by a desire to summarize, browse, and retrieve important information from lengthy archives of spoken data. One of the most useful capabilities such a technology could provide is a way for users to locate "hot spots" or regions in which participants are highly involved in the discussion (e.g. heated arguments, points of excitement, etc.). We ask two questions about hot spots in meetings in the ICSI Meeting Recorder corpus. First, we ask whether involvement can be judged reliably by human listeners. Results show that despite the subjective nature of the task, raters show significant agreement in distinguishing involved from non-involved utterances. Second, we ask whether there is a relationship between human judgments of involvement and automatically extracted prosodic features of the associated regions. Results show that there are significant differences in both F0 and energy between involved and non-involved utterances. These findings suggest that humans do agree to some extent on the judgment of hot spots, and that acoustic-only cues could be used for automatic detection of hot spots in natural meetings.
Data Management and Analysis Methods
- In Denzin N, Lincoln Y (Eds.), Handbook of Qualitative Research, 2nd ed., Thousand Oaks, CA: Sage Publications, 2000
Cited by 28 (1 self)
This chapter is about methods for managing and analyzing qualitative data. By qualitative data we mean text: newspapers, movies, sitcoms, e-mail traffic, folktales, life histories. We also mean narratives—narratives about getting divorced, about being sick, about surviving hand-to-hand combat, about selling sex, about trying to quit smoking. In fact, most of the archaeologically recoverable information about human thought and human behavior is text, the good stuff of social science. Scholars in content analysis began using computers in the 1950s to do statistical analysis of texts (Pool, 1959), but recent advances in technology are changing the economics of the social
The Repeatability of Code Defect Classifications
- International Software Engineering Research Network, 1998
Cited by 16 (0 self)
Counts of defects found during the various defect detection activities in software projects and their classification provide a basis for product quality evaluation and process improvement. However, since defect classifications are subjective, it is necessary to ensure that they are repeatable (i.e., that the classification is not dependent on the individual). In this paper we evaluate a slight adaptation of a commonly used defect classification scheme that has been applied in IBM's Orthogonal Defect Classification work, and in the SEI's Personal Software Process. The evaluation utilizes the Kappa statistic. We use defect data from code inspections conducted during a development project. Our results indicate that the classification scheme is in general repeatable. We further evaluate classes of defects to find out if confusion between some categories is more common, and suggest a potential improvement to the scheme. Keywords: defect classification, software inspections, measurement rel...
The prosody of backchannels in American English
- In ICPhS, 2007
Cited by 11 (3 self)
We examine prosodic and contextual factors characterizing the backchannel function of single affirmative words. Data is drawn from collaborative task-oriented dialogues between speakers of Standard American English. Despite high lexical variability, backchannels are prosodically well defined: they have higher pitch and intensity and greater pitch slope than affirmative words expressing other pragmatic functions. Additionally, we identify phrase-final rising pitch as a salient trigger for backchanneling.
The role of context and prosody in the interpretation of okay
- In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, 2007
Cited by 10 (2 self)
We examine the effect of contextual and acoustic cues in the disambiguation of three discourse-pragmatic functions of the word okay. Results of a perception study show that contextual cues are stronger predictors of discourse function than acoustic cues. However, acoustic features capturing the pitch excursion at the right edge of okay feature prominently in disambiguation, whether other contextual cues are present or not.
Interrater Agreement in SPICE-Based Assessments: Some Preliminary Results
- In Proceedings of the International Conference on the Software Process, 1996