Results 1 - 10
of
91
Automatic Word Sense Discrimination
- Journal of Computational Linguistics
, 1998
"... This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closen ..."
Abstract
-
Cited by 272 (0 self)
- Add to MetaCart
This paper presents context-group discrimination, a disambiguation algorithm based on clustering. Senses are interpreted as groups (or clusters) of similar contexts of the ambiguous word. Words, contexts, and senses are represented in Word Space, a high-dimensional, real-valued space in which closeness corresponds to semantic similarity. Similarity in Word Space is based on second-order co-occurrence: two tokens (or contexts) of the ambiguous word are assigned to the same sense cluster if the words they co-occur with in turn occur with similar words in a training corpus. The algorithm is automatic and unsupervised in both training and application: senses are induced from a corpus without labeled training insta,nces or other external knowledge sources. The paper demonstrates good performance of context-group discrimination for a sample of natural and artificial ambiguous words
Projections for Efficient Document Clustering
, 1997
"... Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the distance calculations at the heart of clustering routines. We study two techniques for improving the ..."
Abstract
-
Cited by 86 (0 self)
- Add to MetaCart
Clustering is increasing in importance, but linear- and even constant-time clustering algorithms are often too slow for real-time applications. A simple way to speed up clustering is to speed up the distance calculations at the heart of clustering routines. We study two techniques for improving the cost of distance calculations, LSI and truncation, and determine both how much these techniques speed up clustering and how much they affect the quality of the resulting clusters. We find that the speed increase is significant while --- surprisingly --- the quality of clustering is not adversely affected. We conclude that truncation yields clusters as good as those produced by full-profile clustering while offering a significant speed advantage.
An experimental determination of sufficient mutant operators
- ACM Transactions on Software Engineering Methodology
, 1996
"... Mutation testing is a technique for unit testing software that, although powerful, is computationally expensive. The principal expense of mutation is that many variants of the test program, called mutants, must be repeatedly executed. This paper quantifies the expense of mutation in terms of the num ..."
Abstract
-
Cited by 75 (2 self)
- Add to MetaCart
Mutation testing is a technique for unit testing software that, although powerful, is computationally expensive. The principal expense of mutation is that many variants of the test program, called mutants, must be repeatedly executed. This paper quantifies the expense of mutation in terms of the number of mutants that are created, then proposes and evaluates a technique that reduces the number of mutants by an order of magnitude. Selective mutation reduces the cost of mutation testing by reducing the number of mutants. This paper reports experimental results that compare selective mutation testing with standard, or non-selective, mutation testing, and results that quantify the savings achieved by selective mutation testing. The results support the hypothesis that selective mutation is almost as strong as non-selective mutation; in experimental trials selective mutation provides almost the same coverage as non-selective mutation, with a four-fold or more reduction in the number of mutants.
Test case prioritization: An empirical study
- In Proceedings of the International Conference on Software Maintenance
, 1999
"... Test case prioritization techniques schedule test cases for execution in an order that attempts to maximize some objective function. A variety of objective functions are applicable; one such function involves rate of fault detection — a measure of how quickly faults are detected within the testing p ..."
Abstract
-
Cited by 55 (12 self)
- Add to MetaCart
Test case prioritization techniques schedule test cases for execution in an order that attempts to maximize some objective function. A variety of objective functions are applicable; one such function involves rate of fault detection — a measure of how quickly faults are detected within the testing process. An improved rate of fault detection during regression testing can provide faster feedback on a system under regression test and let debuggers begin their work earlier than might otherwise be possible. In this paper, we describe several techniques for prioritizing test cases and report our empirical results measuring the effectiveness of these techniques for improving rate of fault detection. The results provide insights into the tradeoffs among various techniques for test case prioritization. 1.
The Impact of Database Selection on Distributed Searching
- SIGIR
, 2000
"... The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts -- database selection, query processing, and results merging. In this paper we examine the effect of database selection on retriev ..."
Abstract
-
Cited by 53 (12 self)
- Add to MetaCart
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Distributed searching is cast in three parts -- database selection, query processing, and results merging. In this paper we examine the effect of database selection on retrieval performance. We look at retrieval performance in three different distributed retrieval testbeds and distill some general results. First we find that good database selection can result in better retrieval effectiveness than can be achieved in a centralized database. Second we find that good performance can be achieved when only a few sites are selected and that the performance generally increases as more sites are selected. Finally we find that when database selection is employed, it is not necessary to maintain collection wide information (CWI), e.g. global idf. Local information can be used to achieve superior performance. This means that distributed systems can be engineered with more autonomy and less cooperation. This work suggests that improvements in database selection can lead to broader improvements in retrieval performance, even in centralized (i.e. single database) systems. Given a centralized database and a good selection mechanism, retrieval performance can be improved by decomposing that database conceptually and employing a selection step.
An experimental determination of sufficient mutation operators
- ACM Transactions on Software Engineering Methodology
, 1996
"... Mutation testing is a technique for unit testing software that, although powerful, is computationally expensive. The principal expense of mutation is that many variants of the test program, called mutants, must be repeatedly executed. Selective mutation is a way to reduce the cost of mutation testin ..."
Abstract
-
Cited by 27 (8 self)
- Add to MetaCart
Mutation testing is a technique for unit testing software that, although powerful, is computationally expensive. The principal expense of mutation is that many variants of the test program, called mutants, must be repeatedly executed. Selective mutation is a way to reduce the cost of mutation testing by reducing the number of mutants that must be executed. This paper reports experimental results that compare selective mutation testing to standard, or non-selective, mutation testing. The results support the hypothesis that selective mutation is almost as strong as non-selective mutation; in experimental trials selective mutation provides almost the same coverage as non-selective mutation, with a four-fold or more reduction in cost.
A Statistical Word-Level Translation Model for Comparable Corpora
- IN PROCEEDINGS OF THE CONFERENCE ON CONTENT-BASED MULTIMEDIA INFORMATION ACCESS (RIAO
, 2000
"... In this paper, we present a model of statistical word-level mapping for comparable corpora. The approach is based on the assumption that if two terms have close distributional profiles, their corresponding translations' distributional profiles should be close in a comparable corpus. The proposed mod ..."
Abstract
-
Cited by 27 (1 self)
- Add to MetaCart
In this paper, we present a model of statistical word-level mapping for comparable corpora. The approach is based on the assumption that if two terms have close distributional profiles, their corresponding translations' distributional profiles should be close in a comparable corpus. The proposed model is described. A preliminary investigation on intralanguage comparable corpora is laid out. The preliminary results are >92% accurate, suggesting the feasibility of the model. The model needs to undergo some improvements and should be tested cross linguistically before assessing its significance.
Test Case Prioritization
, 1999
"... Test case prioritization techniques schedule test cases for execution in an order that attempts to increase their effectiveness in meeting some performance goal. Various goals are possible; one involves rate of fault detection --- a measure of how quickly faults are detected within the testing proce ..."
Abstract
-
Cited by 27 (13 self)
- Add to MetaCart
Test case prioritization techniques schedule test cases for execution in an order that attempts to increase their effectiveness in meeting some performance goal. Various goals are possible; one involves rate of fault detection --- a measure of how quickly faults are detected within the testing process. An improved rate of fault detection during testing can provide faster feedback on the system under test, and let software engineers begin correcting faults earlier than might otherwise be possible. In this paper, we describe several techniques for prioritizing test cases, and report the results of empirical studies investigating the effectiveness of these techniques for improving rate of fault detection. Our results suggest that several techniques can significantly improve rate of fault detection, and illustrate several tradeoffs between the techniques.
Investigating Reading Techniques for Object-Oriented Framework Learning
- IEEE TRANSACTIONS ON SOFTWARE ENGINEERING
, 2000
"... The empirical study described in this paper addresses software reading for construction: how application developers obtain an understanding of a software artifact for use in new system development. This study focuses on the processes that developers would engage in when learning and using object-o ..."
Abstract
-
Cited by 23 (3 self)
- Add to MetaCart
The empirical study described in this paper addresses software reading for construction: how application developers obtain an understanding of a software artifact for use in new system development. This study focuses on the processes that developers would engage in when learning and using object-oriented frameworks. We analyzed 15 student software development projects using both qualitative and quantitative methods to gain insight into what processes occurred during framework usage. The contribution of the study is not to test predefined hypotheses but to generate well-supported hypotheses for further investigation. The main hypotheses we produce are that example-based techniques are well suited to use by beginning learners while hierarchy-based techniques are not because of a larger learning curve. Other more specific hypotheses are proposed and discussed.
Problem Difficulty for Tabu Search in Job-Shop Scheduling
- Artificial Intelligence
, 2002
"... Tabu search algorithms are among the most effective approaches for solving the job-shop scheduling problem (JSP). Yet, we have little understanding of why these algorithms work so well, and under what conditions. We develop a model of problem difficulty for tabu search in the JSP, borrowing from sim ..."
Abstract
-
Cited by 18 (7 self)
- Add to MetaCart
Tabu search algorithms are among the most effective approaches for solving the job-shop scheduling problem (JSP). Yet, we have little understanding of why these algorithms work so well, and under what conditions. We develop a model of problem difficulty for tabu search in the JSP, borrowing from similar models developed for SAT and other NP - complete problems. We show that the mean distance between random local optima and the nearest optimal solution is highly correlated with the cost of locating optimal solutions to typical, random JSPs. Additionally, this model accounts for the cost of locating suboptimal solutions, and provides an explanation for differences in the relative difficulty of square versus rectangular JSPs. We also identify two important limitations of our model. First, model accuracy is inversely correlated with problem difficulty, and is exceptionally poor for rare, very high-cost problem instances. Second, the model is significantly less accurate for structured, non-random JSPs. Our results are also likely to be useful in future research on difficulty models of local search in SAT, as local search cost in both SAT and the JSP is largely dictated by the same search space features. Similarly, our research represents the first attempt to quantitatively model the cost of tabu search for any NP -complete problem, and may possibly be leveraged in an effort to understand tabu search in problems other than job-shop scheduling.

