Results 1 - 10 of 67
Does code decay? Assessing the evidence from change management data
In IEEE Transactions on Software Engineering, 2001
Cited by 145 (12 self)
Abstract: A central feature of the evolution of large software systems is that change, which is necessary to add new functionality, accommodate new hardware, and repair faults, becomes increasingly difficult over time. In this paper, we approach this phenomenon, which we term code decay, scientifically and statistically. We define code decay and propose a number of measurements (code decay indices) on software and on the organizations that produce it, that serve as symptoms, risk factors, and predictors of decay. Using an unusually rich data set (the fifteen-plus year change history of the millions of lines of software for a telephone switching system), we find mixed, but on the whole persuasive, statistical evidence of code decay, which is corroborated by developers of the code. Suggestive indications that perfective maintenance can retard code decay are also discussed. Index Terms: Software maintenance, metrics, statistical analysis, fault potential, span of changes, effort modeling.
On Locally Adaptive Density Estimation
1996
Cited by 40 (5 self)
In this paper, theoretical and practical aspects of the sample-point adaptive positive kernel density estimator are examined. A closed-form expression for the mean integrated squared error is obtained through the device of preprocessing the data by binning. With this expression, the exact behavior of the optimally adaptive smoothing parameter function is studied for the first time. The approach differs from most earlier techniques in that bias of the adaptive estimator remains $O(h^2)$ and is not "improved" to the rate $O(h^4)$. A practical algorithm is constructed using a modification of least-squares cross-validation. Simulated and real examples are presented, including comparisons with a fixed bandwidth estimator and a fully automatic version of Abramson's adaptive estimator. The results are very promising. KEY WORDS: Kernel Function, Variable Bandwidth, Binning, Cross-Validation. Stephan R. Sain is Research Associate, Department of Statistical Science, Southern Methodist U...
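As a rough illustration of what a sample-point adaptive estimator looks like, the sketch below gives every data point its own bandwidth and averages the resulting kernels. It is not the paper's algorithm: the binning step and the modified least-squares cross-validation are omitted, and the bandwidths are chosen with an Abramson-style square-root rule from a fixed-bandwidth pilot estimate, which is an assumption made here for concreteness. The Gaussian kernel, the function names, and the toy data are likewise hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the (positive) kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(x, data, h):
    """Ordinary kernel density estimate at x with one global bandwidth h."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def abramson_bandwidths(data, h):
    """Assumed rule: h_i = h * sqrt(g / pilot(x_i)), g = geometric mean of the pilot."""
    pilot = [fixed_kde(xi, data, h) for xi in data]
    geo_mean = math.exp(sum(math.log(p) for p in pilot) / len(pilot))
    return [h * math.sqrt(geo_mean / p) for p in pilot]


def sample_point_kde(x, data, bandwidths):
    """Sample-point estimator: each observation x_i carries its own bandwidth h_i."""
    total = sum(
        gaussian_kernel((x - xi) / hi) / hi for xi, hi in zip(data, bandwidths)
    )
    return total / len(data)


# Toy data: a tight cluster plus one isolated point.
data = [0.0, 0.1, 0.2, 0.3, 5.0]
bandwidths = abramson_bandwidths(data, h=0.5)
density_at_cluster = sample_point_kde(0.15, data, bandwidths)
density_at_outlier = sample_point_kde(5.0, data, bandwidths)
```

With this rule the isolated point receives a wider kernel than the clustered points, so the estimate is smoothed more where data are sparse and less where they are dense.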
Visualizing Software Changes
INTERACTIONS, 2002
Cited by 37 (3 self)
Visualizations of software changes are presented that complement existing visualizations of software structure. The principal metaphors are matrix views, cityscapes, bar and pie charts, data sheets and networks. Linked by selection mechanisms, multiple views are combined to form perspectives that both enable discovery of high-level structure in software change data and allow effective access to details of those data. Use of the views and perspectives is illustrated in two important contexts: understanding software change by exploration of software change data and management of software development.
Geoadditive Models
2000
Cited by 33 (1 self)
this paper is a recent article on model-based geostatistics by Diggle, Tawn and Moyeed (1998) where pure kriging (i.e. no covariates) is the focus. Our paper inherits some of its aspects: model-based and with mixed model connections. In particular the comment by Bowman (1998) in the ensuing discussion suggested that additive modelling would be a worthwhile extension. This paper essentially follows this suggestion. However, this paper is not the first to combine the notions of geostatistics and additive modelling. References known to us are Kelsall and Diggle (1998), Durban Reguera (1998) and Durban, Hackett, Currie and Newton (2000). Nevertheless, we believe that our approach has a number of attractive features (see (1)-(4) above), not all shared by these references. Section 2 describes the motivating application and data in detail. Section 3 shows how one can express additive models as a mixed model, while Section 4 does the same for kriging and merges the two into the geoadditive model. Issues concerning the amount of smoothing are discussed in Section 5 and inferential aspects are treated in Section 6. Our analysis of the Upper Cape Cod reproductive data is presented in Section 7. Section 8 discusses extension to the generalised context. We close the paper with some discussion in Section 9.
Long range dependence analysis of Internet traffic: Summary page for LRD project
2010
Cited by 10 (7 self)
Long Range Dependent time series are endemic in the statistical analysis of Internet traffic. The Hurst Parameter provides a good summary of important self-similar scaling properties. We compare a number of different Hurst parameter estimation methods and some important variations. This is done in the context of a wide range of simulated, laboratory-generated and real data sets. Important differences between the methods are highlighted. Deep insights are revealed on how well the laboratory data mimic the real data. Nonstationarities that are local in time are seen to be central issues, and lead to both conceptual and practical recommendations.
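The abstract does not say which Hurst parameter estimators are compared. Purely as an illustration of what such an estimator does, the sketch below uses the aggregated-variance method, a standard textbook choice that is assumed here rather than taken from the project. It averages the series over blocks of size m, uses the scaling Var(block mean) ∝ m^(2H-2), and recovers H from the slope of a log-log least-squares fit. All function names and the test series are hypothetical.

```python
import math
import random


def block_means(series, m):
    """Average the series over consecutive non-overlapping blocks of length m."""
    k = len(series) // m
    return [sum(series[j * m:(j + 1) * m]) / m for j in range(k)]


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def hurst_aggregated_variance(series, block_sizes):
    """Estimate H from the slope of log Var(block mean) against log m."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(sample_variance(block_means(series, m))) for m in block_sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    # Var ~ m^(2H - 2), so slope = 2H - 2 and H = 1 + slope / 2.
    return 1.0 + slope / 2.0


# Independent Gaussian noise has no long-range dependence, so the
# estimate should come out close to H = 0.5.
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(8192)]
h_est = hurst_aggregated_variance(white_noise, [1, 2, 4, 8, 16, 32, 64])
```

A long-range dependent series would give a shallower slope and hence an estimate above 0.5. Estimators of this kind assume stationarity, so local nonstationarities of the sort the abstract highlights can bias them.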
Polynomial spline confidence bands for regression curves
2007
Cited by 10 (6 self)
Abstract: Asymptotically exact and conservative confidence bands are obtained for a nonparametric regression function, using piecewise constant and piecewise linear spline estimation, respectively. Compared to the pointwise confidence interval of Huang (2003), the confidence bands are inflated by a factor proportional to $\{\log(n)\}^{1/2}$, with the same width order as the Nadaraya-Watson bands of Härdle (1989), and the local polynomial bands of Xia (1998) and Claeskens and Van Keilegom (2003). Simulation experiments corroborate the asymptotic theory. The linear spline band has been used to identify an appropriate polynomial trend for fossil data.
Confidence Intervals for Nonparametric Curve Estimates Based on Local Smoothing
J. Am. Stat. Assoc., 1998

Cited by 8 (0 self)
Numerous nonparametric regression methods exist which yield consistent estimators of function curves. Often one is also interested in constructing confidence intervals for the unknown function. Pointwise confidence intervals are available using globally cross-validated smoothing spline (GCV) estimation. When the function estimate is based on a single global smoothing parameter, the resulting confidence intervals may hold their desired confidence level $1 - \alpha$ on average, but because bias in nonparametric estimation is not uniform, they do not hold the desired level uniformly at all design points. To deal with this problem, a new smoothing spline estimator is developed which uses a local cross-validation (LCV) criterion to determine a separate smoothing parameter for each design point. The local smoothing parameters are then used to compute the point estimators of the regression curve and the corresponding pointwise confidence intervals. Incorporation of local information th...
Feature Significance for Multivariate Kernel Density Estimation
Cited by 8 (1 self)
Multivariate kernel density estimation provides information about structure in data. Feature significance is a technique for deciding whether features – such as local extrema – are statistically significant. This paper proposes a framework for feature significance in d-dimensional data which combines kernel density derivative estimators and hypothesis tests for modal regions. For the gradient and curvature estimators distributional properties are given, and pointwise test statistics are derived. The hypothesis tests extend the two-dimensional feature significance ideas of Godtliebsen et al. (2002). The theoretical framework is complemented by novel visualisation for three-dimensional data. Applications to real data sets show that tests based on the kernel curvature estimators perform well in identifying modal regions. These results can be enhanced by corresponding tests with kernel gradient estimators.
Application of Local Rank Tests to Nonparametric Regression
1999
Cited by 7 (3 self)
Let $Y_i = f(x_i) + E_i$ ($1 \le i \le n$) with given covariates $x_1 < x_2 < \cdots < x_n$, an unknown regression function $f$ and independent random errors $E_i$ with median zero. It is shown how to apply several linear rank test statistics simultaneously in order to test monotonicity of $f$ in various regions and to identify its local extrema. Keywords and phrases: exponential inequality, linear rank statistic, modality, monotonicity, multiscale testing, quadratic complexity. 1 Introduction. Suppose that one observes $(x_1, Y_1), (x_2, Y_2), \ldots, (x_n, Y_n)$, where $x_1 < x_2 < \cdots < x_n$ are given real numbers, and the $Y_i$ are independent random variables with continuous distribution functions $G_i(\cdot) := \mathbb{P}\{Y_i \le \cdot\}$. With $G := ((x_i, G_i))_{1 \le i \le n}$ we call $G$ increasing on an interval $J \subset \mathbb{R}$ if $G_i \le_{\mathrm{st.}} G_j$ whenever $x_i, x_j \in J$ and $x_i \le x_j$. Here $G_i \le_{\mathrm{st.}} G_j$ means that $G_i$ is stochastically smaller than $G_j$, that means, $G_i \ge G_j$ pointwise. Analogously we call $G$ decre...