Does Code Decay? Assessing the Evidence from Change Management Data
IEEE Transactions on Software Engineering, 1998
Cited by 124 (8 self)
A central feature of the evolution of large software systems is that change -- which is necessary to add new functionality, accommodate new hardware and repair faults -- becomes increasingly difficult over time. In this paper we approach this phenomenon, which we term code decay, scientifically and statistically. We define code decay, and propose a number of measurements (code decay indices) on software, and on the organizations that produce it, that serve as symptoms, risk factors and predictors of decay. Using an unusually rich data set (the fifteen-plus year change history of the millions of lines of software for a telephone switching system), we find mixed but on the whole persuasive statistical evidence of code decay, which is corroborated by developers of the code. Suggestive indications that perfective maintenance can retard code decay are also discussed.
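The abstract does not define the individual code decay indices, so the following is only a hypothetical illustration of the general idea of a decay symptom computed from change-management data. It assumes each change record carries a year and the number of files the change touched (both invented here), and tracks the average number of files per change over time; a steadily rising value would be one crude sign that changes are getting harder.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical change-log records: (year, number of files touched by the change).
# Illustrative values only; not data from the paper.
changes = [
    (1990, 2), (1990, 3), (1990, 1),
    (1995, 4), (1995, 5),
    (2000, 6), (2000, 8), (2000, 7),
]

def mean_span_by_year(records):
    """Average number of files touched per change, grouped and ordered by year."""
    by_year = defaultdict(list)
    for year, n_files in records:
        by_year[year].append(n_files)
    return {year: mean(spans) for year, spans in sorted(by_year.items())}

def is_nondecreasing(series):
    """True if the yearly values never drop (a crude, hypothetical decay symptom)."""
    values = list(series.values())
    return all(a <= b for a, b in zip(values, values[1:]))

spans = mean_span_by_year(changes)
```

A real analysis would need to control for change type and size, which is part of what makes the statistical evidence in the paper "mixed".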
Visualizing Software Changes
Interactions, 2002
Cited by 32 (1 self)
Visualizations of software changes are presented that complement existing visualizations of software structure. The principal metaphors are matrix views, cityscapes, bar and pie charts, data sheets and networks. Linked by selection mechanisms, multiple views are combined to form perspectives that both enable discovery of high-level structure in software change data and allow effective access to details of those data. Use of the views and perspectives is illustrated in two important contexts: understanding software change by exploration of software change data and management of software development.
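A matrix view of change data starts from a simple aggregation: rows for files, columns for time periods, cells holding change counts. The sketch below builds that matrix from hypothetical (file, period) records; the record format and names are assumptions for illustration, not the paper's data model. Row and column sums of the same matrix give the per-file and per-period totals that a linked bar chart would show.

```python
import numpy as np

# Hypothetical change records: (file name, period index).
records = [("a.c", 0), ("a.c", 0), ("b.c", 1), ("a.c", 2), ("c.c", 2), ("b.c", 2)]

def change_matrix(records, n_periods):
    """Return sorted file names and a files x periods matrix of change counts."""
    files = sorted({f for f, _ in records})
    row = {f: i for i, f in enumerate(files)}
    m = np.zeros((len(files), n_periods), dtype=int)
    for f, period in records:
        m[row[f], period] += 1
    return files, m

files, m = change_matrix(records, n_periods=3)
per_file = m.sum(axis=1)    # total changes per file (e.g. for a bar chart)
per_period = m.sum(axis=0)  # total changes per period
```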
On Locally Adaptive Density Estimation
1996
Cited by 30 (4 self)
In this paper, theoretical and practical aspects of the sample-point adaptive positive kernel density estimator are examined. A closed-form expression for the mean integrated squared error is obtained by preprocessing the data through binning. With this expression, the exact behavior of the optimally adaptive smoothing parameter function is studied for the first time. The approach differs from most earlier techniques in that the bias of the adaptive estimator remains $O(h^2)$ and is not "improved" to the rate $O(h^4)$. A practical algorithm is constructed using a modification of least-squares cross-validation. Simulated and real examples are presented, including comparisons with a fixed-bandwidth estimator and a fully automatic version of Abramson's adaptive estimator. The results are very promising. Key words: kernel function, variable bandwidth, binning, cross-validation. Stephan R. Sain is Research Associate, Department of Statistical Science, Southern Methodist U...
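A sample-point adaptive estimator gives each observation $X_i$ its own bandwidth $h_i$, so that $\hat f(x) = n^{-1}\sum_i K_{h_i}(x - X_i)$. The sketch below uses Gaussian kernels and Abramson's square-root rule $h_i = h\,(\tilde f(X_i)/g)^{-1/2}$ with a fixed-bandwidth pilot $\tilde f$ and $g$ the geometric mean of the pilot values. This is the comparison estimator named in the abstract, not the authors' binned, cross-validated method; the global bandwidth `h` is an assumed input rather than being chosen by cross-validation.

```python
import numpy as np

def gaussian_kde_fixed(x_eval, data, h):
    """Fixed-bandwidth Gaussian kernel density estimate evaluated at x_eval."""
    u = (x_eval[:, None] - data[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(data) * h * np.sqrt(2 * np.pi))

def sample_point_adaptive_kde(x_eval, data, h):
    """Sample-point adaptive KDE with Abramson-style per-point bandwidths.

    h_i = h * (pilot(X_i) / g) ** -0.5, where pilot is a fixed-bandwidth KDE and
    g is the geometric mean of the pilot values at the data points.
    """
    pilot = gaussian_kde_fixed(data, data, h)
    g = np.exp(np.mean(np.log(pilot)))
    h_i = h * (pilot / g) ** -0.5
    u = (x_eval[:, None] - data[None, :]) / h_i[None, :]
    k = np.exp(-0.5 * u**2) / (h_i[None, :] * np.sqrt(2 * np.pi))
    return k.mean(axis=1)  # average of per-point kernels, each integrating to 1
```

Because every kernel integrates to one, the estimate is a proper density regardless of how the $h_i$ are chosen; the choice only affects bias and variance.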
Geoadditive Models
2000
Cited by 24 (0 self)
... this paper is a recent article on model-based geostatistics by Diggle, Tawn and Moyeed (1998), where pure kriging (i.e. no covariates) is the focus. Our paper inherits some of its aspects: it is model-based and has mixed model connections. In particular, the comment by Bowman (1998) in the ensuing discussion suggested that additive modelling would be a worthwhile extension. This paper essentially follows this suggestion. However, this paper is not the first to combine the notions of geostatistics and additive modelling. References known to us are Kelsall and Diggle (1998), Durban Reguera (1998) and Durban, Hackett, Currie and Newton (2000). Nevertheless, we believe that our approach has a number of attractive features (see (1)-(4) above), not all shared by these references. Section 2 describes the motivating application and data in detail. Section 3 shows how one can express additive models as a mixed model, while Section 4 does the same for kriging and merges the two into the geoadditive model. Issues concerning the amount of smoothing are discussed in Section 5 and inferential aspects are treated in Section 6. Our analysis of the Upper Cape Cod reproductive data is presented in Section 7. Section 8 discusses extension to the generalised context. We close the paper with some discussion in Section 9.
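The step "express additive models as a mixed model" can be illustrated for a single covariate with a penalized spline. Writing $y = \beta_0 + \beta_1 x + \sum_k u_k (x - \kappa_k)_+ + \varepsilon$ and treating the $u_k$ as random effects with variance $\sigma_u^2$, the best linear unbiased predictor equals a ridge fit with penalty $\lambda = \sigma_\varepsilon^2/\sigma_u^2$ on the $u_k$ only. The sketch assumes a truncated-linear basis and fixes $\lambda$ by hand instead of estimating the variance components (for example by REML); it covers neither the kriging component nor multiple covariates.

```python
import numpy as np

def penalized_spline_fit(x, y, knots, lam):
    """Fit y ~ b0 + b1*x + sum_k u_k * (x - knot_k)_+ with ridge penalty lam on u.

    Fixed effects (intercept, slope) are unpenalized; the spline coefficients u
    play the role of random effects, and lam = sigma_eps^2 / sigma_u^2.
    """
    X = np.column_stack([np.ones_like(x), x])        # fixed-effects design
    Z = np.maximum(x[:, None] - knots[None, :], 0.0)  # random-effects design
    C = np.hstack([X, Z])
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))     # penalize only the u_k
    coef = np.linalg.solve(C.T @ C + lam * D, C.T @ y)
    return C @ coef
```

As $\lambda \to \infty$ the $u_k$ are shrunk to zero and the fit collapses to the least-squares line; as $\lambda \to 0$ it approaches an unpenalized piecewise-linear regression.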
Long range dependence analysis of Internet traffic: Summary page for LRD project
2010
Cited by 10 (7 self)
Long-range dependent time series are endemic in the statistical analysis of Internet traffic. The Hurst parameter provides a good summary of important self-similar scaling properties. We compare a number of different Hurst parameter estimation methods and some important variations. This is done in the context of a wide range of simulated, laboratory-generated and real data sets. Important differences between the methods are highlighted. Deep insights are revealed into how well the laboratory data mimic the real data. Non-stationarities that are local in time are seen to be central issues, and lead to both conceptual and practical recommendations.
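The abstract does not list the estimators compared, so as one standard example, the aggregated-variance method is sketched below. It relies on the scaling $\operatorname{Var}(\bar X^{(m)}) \propto m^{2H-2}$ for block means of length $m$, so the slope $\beta$ of a log-log fit gives $\hat H = 1 + \beta/2$. The block sizes are an assumed input; this is not presented as one of the paper's specific variants.

```python
import numpy as np

def hurst_aggregated_variance(x, block_sizes):
    """Aggregated-variance estimate of the Hurst parameter H.

    For each block size m, average x over non-overlapping blocks of length m and
    take the variance of the block means. Under Var ~ c * m^(2H - 2), the slope
    beta of log(variance) against log(m) gives H = 1 + beta / 2.
    """
    x = np.asarray(x, dtype=float)
    variances = []
    for m in block_sizes:
        n_blocks = len(x) // m
        means = x[: n_blocks * m].reshape(n_blocks, m).mean(axis=1)
        variances.append(means.var())
    slope, _ = np.polyfit(np.log(block_sizes), np.log(variances), 1)
    return 1 + slope / 2
```

For independent noise the block-mean variance falls like $1/m$, giving $H \approx 0.5$; values well above 0.5 indicate long-range dependence, though, as the abstract stresses, local non-stationarities can produce the same signature.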
Confidence Intervals for Nonparametric Curve Estimates Based on Local Smoothing
J. Am. Stat. Assoc., 1998
Cited by 5 (0 self)
Numerous nonparametric regression methods exist that yield consistent estimators of function curves. Often one is also interested in constructing confidence intervals for the unknown function. Pointwise confidence intervals are available using globally cross-validated smoothing spline (GCV) estimation. When the function estimate is based on a single global smoothing parameter, the resulting confidence intervals may hold their desired confidence level $1 - \alpha$ on average, but because bias in nonparametric estimation is not uniform, they do not hold the desired level uniformly at all design points. To deal with this problem, a new smoothing spline estimator is developed that uses a local cross-validation (LCV) criterion to determine a separate smoothing parameter for each design point. The local smoothing parameters are then used to compute the point estimators of the regression curve and the corresponding pointwise confidence intervals. Incorporation of local information th...
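The problem the abstract describes can be seen with any linear smoother $\hat f = S y$: the pointwise variance is $\sigma^2 \sum_j S_{ij}^2$, but the interval $\hat f_i \pm z\,\widehat{\mathrm{se}}_i$ ignores bias, which varies with the curvature of $f$. For brevity the sketch swaps in a Nadaraya-Watson smoother with one global bandwidth `h` in place of a smoothing spline; it shows the global-parameter intervals the paper improves on, not its LCV estimator.

```python
import numpy as np

def nw_smoother_matrix(x, h):
    """Nadaraya-Watson smoother matrix S with one global bandwidth h (rows sum to 1)."""
    w = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    return w / w.sum(axis=1, keepdims=True)

def pointwise_intervals(x, y, h, z=1.96):
    """Fitted values with approximate pointwise intervals from a global bandwidth.

    Var(f_hat_i) = sigma^2 * sum_j S_ij^2; sigma^2 is estimated from residuals
    with n - tr(S) degrees of freedom. Bias is ignored, so coverage can fall
    short where the curve bends sharply or near the boundary.
    """
    S = nw_smoother_matrix(x, h)
    fit = S @ y
    resid = y - fit
    sigma2 = resid @ resid / (len(y) - np.trace(S))
    se = np.sqrt(sigma2 * (S**2).sum(axis=1))
    return fit, fit - z * se, fit + z * se
```

A local approach would instead choose a separate bandwidth (or smoothing parameter) per design point, trading variance for bias where the curve demands it.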
Polynomial spline confidence bands for regression curves
2007
Cited by 5 (3 self)
Asymptotically exact and conservative confidence bands are obtained for a nonparametric regression function, using piecewise constant and piecewise linear spline estimation, respectively. Compared to the pointwise confidence interval of Huang (2003), the confidence bands are inflated by a factor proportional to $\{\log(n)\}^{1/2}$, with the same width order as the Nadaraya-Watson bands of Härdle (1989) and the local polynomial bands of Xia (1998) and Claeskens and Van Keilegom (2003). Simulation experiments corroborate the asymptotic theory. The linear spline band has been used to identify an appropriate polynomial trend for fossil data.
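The piecewise constant spline estimator is easy to write down: with indicator basis functions on a set of bins, the least-squares coefficients are the bin means, and the pointwise standard error in bin $k$ is $\hat\sigma/\sqrt{n_k}$. The sketch assumes equal-width bins and that no bin is empty. A simultaneous band would widen these pointwise intervals by a factor of order $\{\log(n)\}^{1/2}$; the exact critical value derived in the paper is not reproduced here.

```python
import numpy as np

def piecewise_constant_fit(x, y, n_bins):
    """Least-squares piecewise constant spline on equal-width bins of [min x, max x].

    Returns bin edges, bin means (the spline coefficients) and pointwise standard
    errors sigma_hat / sqrt(n_k). Assumes every bin contains at least one point.
    """
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Bin index of each x; the right endpoint is folded into the last bin.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    means = np.bincount(idx, weights=y, minlength=n_bins) / counts
    resid = y - means[idx]
    sigma2 = resid @ resid / (len(y) - n_bins)
    se = np.sqrt(sigma2 / counts)
    return edges, means, se
```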
Application of Local Rank Tests to Nonparametric Regression
1999
Cited by 5 (2 self)
Let $Y_i = f(x_i) + E_i$ ($1 \le i \le n$) with given covariates $x_1 < x_2 < \cdots < x_n$, an unknown regression function $f$ and independent random errors $E_i$ with median zero. It is shown how to apply several linear rank test statistics simultaneously in order to test monotonicity of $f$ in various regions and to identify its local extrema. Keywords and phrases: exponential inequality, linear rank statistic, modality, monotonicity, multiscale testing, quadratic complexity.

1 Introduction

Suppose that one observes $(x_1, Y_1), (x_2, Y_2), \ldots, (x_n, Y_n)$, where $x_1 < x_2 < \cdots < x_n$ are given real numbers, and the $Y_i$ are independent random variables with continuous distribution functions $G_i(\cdot) := \mathbb{P}\{Y_i \le \cdot\}$. With $G := ((x_i, G_i))_{1 \le i \le n}$ we call $G$ increasing on an interval $J \subset \mathbb{R}$ if $G_i \le_{\mathrm{st.}} G_j$ whenever $x_i, x_j \in J$ and $x_i \le x_j$. Here $G_i \le_{\mathrm{st.}} G_j$ means that $G_i$ is stochastically smaller than $G_j$, that is, $G_i \ge G_j$ pointwise. Analogously we call $G$ decre...
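A single local linear rank statistic for one window can be sketched as follows: rank the $Y_i$ inside the window, pair the ranks $R_i$ with centered positions $c_i$, and form $T = \sum_i c_i R_i$. If all rank orders in the window are equally likely (as when the $Y_i$ there are i.i.d.), then $\mathbb{E}T = 0$ and $\operatorname{Var}T = \sum_i c_i^2 \cdot m(m+1)/12$ for window length $m$. The position scores and the function name are assumptions for illustration; the paper's procedure combines many such statistics over multiple windows simultaneously, which this sketch does not attempt.

```python
import numpy as np

def local_rank_statistic(y, start, stop):
    """Standardized linear rank statistic for an increasing trend in y[start:stop].

    T = sum_i c_i * R_i with centered positions c_i and within-window ranks R_i.
    Under exchangeability, E[T] = 0 and Var[T] = sum(c_i^2) * m * (m + 1) / 12.
    Large positive values suggest an increase, large negative values a decrease.
    """
    window = np.asarray(y[start:stop], dtype=float)
    m = len(window)
    ranks = np.argsort(np.argsort(window)) + 1  # ranks 1..m, assuming no ties
    c = np.arange(m) - (m - 1) / 2               # centered positions, sum to 0
    t = c @ ranks
    var = (c @ c) * m * (m + 1) / 12
    return t / np.sqrt(var)
```

For a strictly increasing window the standardized value equals $\sqrt{m-1}$, its maximum possible value.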
Mixed model-based hazard estimation
Journal of Computational and Graphical Statistics, 2002
Cited by 4 (0 self)
We propose a new method for estimation of the hazard function from a set of censored failure time data, with a view to extending the general approach to more complicated models. The approach is based on a mixed model representation of penalized spline hazard estimators. One payoff is the automation of the smoothing parameter choice through restricted maximum likelihood. Another is the option to use standard mixed model software for automatic hazard estimation. Key words: Non-parametric regression; Restricted maximum likelihood; Variance component; Survival analysis.
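The abstract does not give the estimator's details, so the sketch below shows only a common starting point for smoothing a hazard from censored data: a piecewise-constant occurrence/exposure estimate on a time grid (failures in each interval divided by total time at risk in it). A penalized spline in mixed-model form would then smooth such raw estimates; whether the paper uses this binning is an assumption, and the interval edges are user-chosen inputs.

```python
import numpy as np

def binned_hazard(times, events, edges):
    """Piecewise-constant hazard: failures in (lo, hi] divided by time at risk there.

    times:  observed times (failure or censoring)
    events: 1 if the time is a failure, 0 if censored
    edges:  increasing interval boundaries starting at 0
    Assumes every interval has positive exposure.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    n_int = len(edges) - 1
    deaths = np.zeros(n_int)
    exposure = np.zeros(n_int)
    for k in range(n_int):
        lo, hi = edges[k], edges[k + 1]
        # Time each subject spends at risk inside (lo, hi].
        exposure[k] = np.clip(times - lo, 0.0, hi - lo).sum()
        deaths[k] = np.sum((events == 1) & (times > lo) & (times <= hi))
    return deaths / exposure
```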
Visualization of Multivariate Density Estimates with Level Set Trees
Cited by 4 (1 self)
We present a method for visualizing multivariate functions. The method is based on a tree structure, built from separated parts of level sets of a function, which we call a level set tree. The method is applied to the visualization of estimates of multivariate density functions. With different graphical representations of level set trees we may visualize the number and location of modes, the excess masses associated with the modes, and certain shape characteristics of the estimate. We present simulation examples where projecting the data to two dimensions does not help to reveal the modes of the density, but with the help of level set trees one may detect the modes.
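The building block of a level set tree is the set of separated parts of $\{x : f(x) \ge \lambda\}$ at each level $\lambda$; the tree branches where one part splits into several. The sketch below works on a one-dimensional grid, where a separated part is simply a contiguous run of grid points, and reports how many parts exist at each level. This is a simplifying assumption: in the multivariate setting of the paper, parts would have to be found as connected components of a grid graph.

```python
import numpy as np

def level_set_components(values, level):
    """Separated parts of {x : f(x) >= level} on a 1-D grid, as (first, last) index pairs."""
    comps = []
    start = None
    for i, above in enumerate(values >= level):
        if above and start is None:
            start = i                      # a new run begins
        elif not above and start is not None:
            comps.append((start, i - 1))   # the current run ends
            start = None
    if start is not None:
        comps.append((start, len(values) - 1))
    return comps

def branch_counts(values, levels):
    """Number of separated parts at each level; an increase marks a branching."""
    return [len(level_set_components(values, lv)) for lv in levels]
```

For a two-component normal mixture with modes at $\pm 2$, a low level gives one part, an intermediate level two parts (one per mode), and a level above both peaks none.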

