## Machine learning methods for predicting failures in hard drives: A multiple-instance application (2005)


### Download Links

- jmlr.csail.mit.edu
- www.jmlr.org
- hebb.mit.edu
- jmlr.org
- DBLP

### Other Repositories/Bibliography

Venue: Journal of Machine Learning Research

Citations: 31 (1 self-citation)

### BibTeX

@ARTICLE{Murray05machinelearning,

author = {Joseph F. Murray and Gordon F. Hughes and Kenneth Kreutz-Delgado},

title = {Machine learning methods for predicting failures in hard drives: A multiple-instance application},

journal = {Journal of Machine Learning Research},

year = {2005},

volume = {6},

pages = {783--816}

}


### Abstract

We compare machine learning methods applied to a difficult real-world problem: predicting computer hard-drive failure using attributes monitored internally by individual drives. The problem is one of detecting rare events in a time series of noisy and nonparametrically distributed data. We develop a new algorithm based on the multiple-instance learning framework and the naive Bayesian classifier (mi-NB), which is specifically designed for the low false-alarm case and is shown to have promising performance. Other methods compared are support vector machines (SVMs), unsupervised clustering, and nonparametric statistical tests (rank-sum and reverse arrangements). The failure-prediction performance of the SVM, rank-sum and mi-NB algorithms is considerably better than that of the threshold method currently implemented in drives, while maintaining low false-alarm rates. Our results suggest that nonparametric statistical tests should be considered for learning problems involving the detection of rare events in time series data. An appendix details the calculation of rank-sum significance probabilities in the case of discrete, tied observations, and we give new recommendations about when the exact calculation should be used instead of the commonly used normal approximation. These normal approximations may be particularly inaccurate for rare-event problems like hard drive failures.

### Citations

9946 | Statistical Learning Theory - Vapnik - 1998

2497 | A tutorial on support vector machines for pattern recognition - Burges - 1998

Citation Context: ... $\xi_i + L^{-} \sum_{\forall i \mid y_i = -1} \xi_i$, where $w$ and $b$ are the parameters of the hyperplane $\hat{y} = w^T \phi(x) + b$ and $\phi(\cdot)$ is the mapping to the high-dimensional space implicit in the kernel $k(x_j, x_k) = \phi(x_j)^T \phi(x_k)$ (Burges, 1998). In the hard-drive failure problem, $L^+$ penalizes false alarms, and $L^-$ penalizes missed detections. Since $C$ is multiplied by both $L^+$ and $L^-$, there are only two independent parameters and we set ...
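The excerpt shows only the tail of the SVM training objective. As a hedged sketch, assuming the usual soft-margin primal with a linear kernel ($\phi(x) = x$), hinge slacks $\xi_i$, and slack costs weighted by $L^+$ for examples with $y_i = +1$ and by $L^-$ for $y_i = -1$, the objective can be evaluated as below. The function name and the assumed leading terms are illustrative, not taken from the paper.

```python
import numpy as np


def weighted_svm_objective(w, b, X, y, C, L_pos, L_neg):
    """Soft-margin linear SVM primal with class-dependent slack costs (assumed form):

        0.5 * ||w||^2 + C * (L_pos * sum_{i: y_i=+1} xi_i + L_neg * sum_{i: y_i=-1} xi_i)

    where xi_i = max(0, 1 - y_i * (w^T x_i + b)) is the hinge slack.
    """
    w = np.asarray(w, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = X @ w + b                         # y_hat = w^T x + b (linear kernel assumed)
    slack = np.maximum(0.0, 1.0 - y * scores)  # xi_i >= 0
    penalty = L_pos * slack[y == 1].sum() + L_neg * slack[y == -1].sum()
    return float(0.5 * (w @ w) + C * penalty)


# Two points sit inside the margin (slack 0.5 each); L_pos weights the y = +1 one twice as much.
X = [[2.0], [0.5], [-0.5], [-2.0]]
y = [1, 1, -1, -1]
print(weighted_svm_objective([1.0], 0.0, X, y, C=1.0, L_pos=2.0, L_neg=1.0))  # 2.0
```

Only the products $C L^+$ and $C L^-$ enter the objective, so doubling $C$ while halving both $L^+$ and $L^-$ leaves it unchanged. That is the sense in which there are only two independent parameters.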

645 | On the optimality of the simple Bayesian classifier under zero-one loss - Domingos, Pazzani - 1997

623 | Sparse Bayesian learning and the Relevance Vector Machine - Tipping - 2001

505 | Individual Comparisons by Ranking Methods - Wilcoxon - 1945

Citation Context: ... Approximate Calculation of the Wilcoxon-Mann-Whitney Significance Probabilities. The Wilcoxon-Mann-Whitney test is a widely used statistical procedure for comparing two sets of single-variate data (Wilcoxon, 1945; Mann and Whitney, 1947). The test makes no assumptions about the parametric form of the distributions each set is drawn from and so belongs to the class of nonparametric or distribution-free tests. ...

457 | Supervised and Unsupervised Discretization of Continuous Features - Dougherty, Kohavi, et al. - 1995

Citation Context: ... Performance comparison of the preprocessing is given in Section 5. The first type of preprocessing is binning (or discretization), which takes one of two forms: equal-frequency or equal-width (Dougherty et al., 1995). In equal-frequency binning, an attribute's values are converted into discrete levels such that the number of counts at each level is the same (the discrete levels are percentile groups). In equal-width ...
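The two binning forms can be contrasted in a few lines. This is a minimal sketch assuming NumPy; the function names are hypothetical, and the example data (nine small counts plus one large outlier, loosely mimicking a rare burst of drive errors) is made up for illustration.

```python
import numpy as np


def equal_width_bins(x, n_bins):
    """Equal-width binning: split [min(x), max(x)] into n_bins intervals of equal length."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    # Only interior edges are needed; values equal to max(x) land in the last bin.
    return np.digitize(x, edges[1:-1])


def equal_frequency_bins(x, n_bins):
    """Equal-frequency binning: cut points at percentiles, so each level holds
    roughly the same number of values (exactly equal only when there are no ties)."""
    x = np.asarray(x, dtype=float)
    cuts = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(x, cuts, right=True)


counts = [0, 1, 2, 3, 4, 5, 6, 7, 8, 100]   # one large outlier
print(equal_width_bins(counts, 2))      # [0 0 0 0 0 0 0 0 0 1]
print(equal_frequency_bins(counts, 2))  # [0 0 0 0 0 1 1 1 1 1]
```

A single extreme value stretches the equal-width range so that nearly everything falls into one level, while percentile cut points keep the levels balanced.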

394 | On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes - Ng, Jordan

313 | On a test of whether one of two random variables is stochastically larger than the other - Mann, Whitney - 1947

Citation Context: ... Calculation of the Wilcoxon-Mann-Whitney Significance Probabilities. The Wilcoxon-Mann-Whitney test is a widely used statistical procedure for comparing two sets of single-variate data (Wilcoxon, 1945; Mann and Whitney, 1947). The test makes no assumptions about the parametric form of the distributions each set is drawn from and so belongs to the class of nonparametric or distribution-free tests. It tests the null hypothesis ...
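A minimal sketch of the rank-sum statistic with its normal approximation, assuming tied values receive midranks and using the standard tie-corrected variance $\frac{nm}{12}\left[(N+1) - \frac{\sum_k (t_k^3 - t_k)}{N(N-1)}\right]$, where $t_k$ is the size of each tie group. No continuity correction is applied (an assumption). This normal approximation is exactly what the abstract warns may be inaccurate for rare events. Function names are hypothetical, and the variance is zero, so the code fails, if every observation is tied.

```python
import math

import numpy as np


def midranks(values):
    """Ranks 1..N; tied values all receive the average of the positions they occupy."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def rank_sum_normal(x, y):
    """Rank sum W of sample x in the pooled data, its z-score, and the one-sided
    normal-approximation p-value P(Z >= z) for 'x tends to be larger than y'."""
    n, m = len(x), len(y)
    pooled = np.concatenate([np.asarray(x, float), np.asarray(y, float)])
    W = float(midranks(pooled)[:n].sum())
    N = n + m
    mean = n * (N + 1) / 2.0
    _, counts = np.unique(pooled, return_counts=True)
    tie_term = float((counts ** 3 - counts).sum())  # zero when there are no ties
    var = n * m / 12.0 * ((N + 1) - tie_term / (N * (N - 1)))
    z = (W - mean) / math.sqrt(var)
    return W, z, 0.5 * math.erfc(z / math.sqrt(2.0))


W, z, p = rank_sum_normal([3, 4, 5], [1, 2])
print(W, round(z, 3))  # 12.0 1.732
```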

198 | Solving the multiple instance problem with axis-parallel rectangles - Dietterich, Lathrop, et al. - 1997

Citation Context: ... of prediction algorithms, there are two main novel algorithmic contributions of the present work. First, we cast the hard drive failure prediction problem as a multiple-instance (MI) learning problem (Dietterich et al., 1997) and develop a new algorithm termed multiple-instance naive Bayes (mi-NB). The mi-NB algorithm adheres to the strict MI assumption (Xu, 2003) and is specifically designed with the low false-alarm case ...
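The strict MI assumption ties a bag's label to its instances: a bag (here, one drive) is positive if and only if at least one instance (one time sample) is positive. The sketch below shows only that bag-labeling rule, not mi-NB's training procedure, which the excerpt does not describe. The second helper adds an independence assumption that is not in the source, to illustrate why the low false-alarm case matters: a small per-sample false-alarm rate compounds over many samples per drive. All names and the instance rule are hypothetical.

```python
from typing import Callable, Sequence

Instance = Sequence[float]


def bag_is_positive(bag: Sequence[Instance],
                    instance_is_positive: Callable[[Instance], bool]) -> bool:
    """Strict MI assumption: a bag is positive iff at least one of its instances is."""
    return any(instance_is_positive(x) for x in bag)


def bag_false_alarm_rate(per_instance_rate: float, n_instances: int) -> float:
    """Chance a good drive is flagged if each of its n_instances samples is falsely
    flagged independently (an assumption) with probability per_instance_rate."""
    return 1.0 - (1.0 - per_instance_rate) ** n_instances


# Hypothetical instance rule: flag a sample whose first attribute exceeds 1.0.
drive = [[0.1], [0.2], [5.0]]
print(bag_is_positive(drive, lambda x: x[0] > 1.0))  # True
print(round(bag_false_alarm_rate(0.01, 100), 3))     # 0.634
```

Under these assumptions, a 1% per-sample false-alarm rate over 100 samples flags roughly 63% of good drives, so the per-instance classifier must be far more conservative than the drive-level target.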

196 | Support vector machines for multiple-instance learning - Andrews, Tsochantaridis, et al. - 2003

Citation Context: ... by Dietterich et al. (1997) is called axis-parallel-rectangles, and other algorithms were subsequently developed based on many of the paradigms in machine learning such as support vector machines (Andrews et al., 2003), neural networks, expectation-maximization, nearest-neighbor (Wang and Zucker, 2000), as well as special-purpose algorithms like the diverse-density algorithm. An extended discussion of many of these ...

168 | On Changing Continuous Attributes into Ordered Discrete Attributes - Catlett - 1991

Citation Context: ... Binning (as a form of discretization) is a common type of preprocessing in machine learning and can provide certain advantages in performance, generalization and computational efficiency (Frank and Witten, 1999; Dougherty et al., 1995; Catlett, 1991). As shown by Dougherty et al. (1995), discretization can provide performance improvements for certain classifiers (such as naive Bayes), and that while more complex discretization methods (such as ...

73 | Nonparametric tests against trend - Mann - 1945

Citation Context: ... compared are support vector machines (SVMs), unsupervised clustering using the Autoclass software of Cheeseman and Stutz (1995) and the reverse-arrangements test (another nonparametric statistical test) (Mann, 1945). The best performance overall was achieved with SVMs, although computational times were much longer and there were many more parameters to set. The methods described here can be used in other applications ...

70 | Kernel matching pursuit - Vincent, Bengio - 2002

Citation Context: ... researchers have noticed this property of SVMs and have developed algorithms that create smaller sets of support vectors, such as the relevance vector machine (Tipping, 2001), kernel matching pursuit (Vincent and Bengio, 2002) and Bayesian neural networks (Liang, 2003). The SMART failure prediction algorithms (as currently implemented in hard drives) run on the internal CPUs of the drive and have rather limited memory and ...

66 | Learning to Predict Rare Events in Event Sequences - Weiss, Hirsh - 1998

Citation Context: ... Greenland, 2000), financial forecasting such as predicting business failures and personal bankruptcies (Theodossiou, 1993), and predicting mechanical and electronic device failure (Preusser and Hadley, 1991; Weiss and Hirsh, 1998). [Section 1.1: Previous Work in Hard Drive Failure Prediction] In our previous work (Hughes et al., 2002) we studied the SMART failure prediction problem, comparing the manufacturer-selected decision threshold ...

58 | Statistical inference based on ranks - Hettmansperger - 1984

Citation Context: ... from a larger distribution than R, against the hypothesis of identical distributions. Multivariate nonparametric rank-based tests that exploit correlations between attribute values have been developed (Hettmansperger, 1984; Dietz and Killeen, 1981; Brunner et al., 2002). A different multivariate rank-sum test was successfully applied to early SMART data (Hughes et al., 2002). It exploits the fact that error counts are ...

45 | Always Good Turing: Asymptotically optimal probability estimation, Science 302(5644) - Orlitsky, Santhanam, et al. - 2003

33 | Bayesian approaches to failure prediction for disk drives - Hamerly, Elkan - 2001

Citation Context: ... that by using nonparametric statistical tests, the accuracy of correctly detected failures can be improved to as much as 40-60% while maintaining acceptably low false alarm rates (Hughes et al., 2002; Hamerly and Elkan, 2001). In addition to providing a systematic comparison of prediction algorithms, there are two main novel algorithmic contributions of the present work. First, we cast the hard drive failure prediction problem ...

27 | Random Data - Bendat, Piersol - 1971

Citation Context: ... detection algorithms (see Section 4). [Section 3.1: Reverse Arrangements Test] The reverse arrangements test is a nonparametric test for trend which is applied to each attribute in the data set (Mann, 1945; Bendat and Piersol, 2000). It is used here based on the idea that a pattern of increasing drive errors is indicative of failure. Suppose we have a time sequence of observations of a random variable, $x_i$, $i = 1, \ldots, N$. In our case ...
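A minimal sketch of the reverse arrangements test, assuming one common convention (as in Bendat and Piersol): a reverse arrangement is a pair $i < j$ with $x_i > x_j$, and $A$ counts such pairs. For continuous data with no trend and no ties, $A$ has mean $N(N-1)/4$ and variance $N(N-1)(2N+5)/72$. An increasing trend produces few reverse arrangements, so the lower tail is the one of interest for growing error counts. The normal approximation and the function names are assumptions, not the paper's implementation.

```python
import math
from typing import Sequence


def reverse_arrangements(x: Sequence[float]) -> int:
    """Count pairs i < j with x[i] > x[j] (O(N^2), fine for short series)."""
    n = len(x)
    return sum(1 for i in range(n) for j in range(i + 1, n) if x[i] > x[j])


def reverse_arrangements_test(x: Sequence[float]):
    """Return (A, z, p_lower), where p_lower = P(Z <= z) under a normal approximation
    to the no-trend null (assumes N >= 2 and no ties). Small p_lower suggests an
    increasing trend."""
    n = len(x)
    a = reverse_arrangements(x)
    mean = n * (n - 1) / 4.0
    var = n * (n - 1) * (2 * n + 5) / 72.0
    z = (a - mean) / math.sqrt(var)
    return a, z, 0.5 * math.erfc(-z / math.sqrt(2.0))


a, z, p = reverse_arrangements_test([1, 2, 3, 4, 5, 6, 7, 8])
print(a, p < 0.01)  # 0 True
```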

25 | The Advanced Theory of Statistics, volume 1 - Kendall, Stuart - 1969

Citation Context: ... The inaccuracies of normal approximations in small sample data size situations is a known aspect of the central limit theorem. It is particularly weak for statistics dependent on extreme values (Kendall, 1969). [page 813; running head: MURRAY, HUGHES AND KREUTZ-DELGADO] [Recommendations] Based on the results of the comparisons between the exact calculation of $p_0$ and the normal approximation (Tables 7 and 7), we offer recommendations ...
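The weakness of the normal approximation in the extreme tail can be seen in the smallest possible case. This sketch assumes no ties, enumerates all $\binom{n+m}{n}$ equally likely rank subsets by brute force, and omits any continuity correction; the function names are hypothetical and this is not the appendix's algorithm.

```python
import itertools
import math


def exact_upper_p(n, m, w_obs):
    """Exact P(W >= w_obs) under H0 with no ties: W is the sum of n ranks drawn
    from 1..n+m, and all C(n+m, n) subsets are equally likely."""
    subsets = itertools.combinations(range(1, n + m + 1), n)
    hits = sum(1 for s in subsets if sum(s) >= w_obs)
    return hits / math.comb(n + m, n)


def normal_upper_p(n, m, w_obs):
    """Normal approximation to the same tail (no continuity correction)."""
    N = n + m
    mean = n * (N + 1) / 2.0
    sd = math.sqrt(n * m * (N + 1) / 12.0)
    return 0.5 * math.erfc((w_obs - mean) / (sd * math.sqrt(2.0)))


# n = m = 3 and the most extreme outcome: X holds ranks 4, 5, 6, so W = 15.
print(exact_upper_p(3, 3, 15))             # 0.05 (1 of 20 subsets)
print(round(normal_upper_p(3, 3, 15), 3))  # 0.025
```

At the most extreme outcome the approximation reports roughly half the exact tail probability, which would overstate significance in exactly the rare-event regime the recommendations address.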

20 | Improved disk-drive failure warnings - Hughes, Murray, et al. - 2002

Citation Context: ... Hard drive manufacturers have been developing self-monitoring technology in their products since 1994, in an effort to predict failures early enough to allow users to back up their data (Hughes et al., 2002). This Self-Monitoring and Reporting Technology (SMART) system uses attributes collected during normal operation (and during off-line tests) to set a failure prediction flag. The SMART flag is a one- ...

17 | Making better use of global discretization - Frank, Witten - 1999

Citation Context: ... (see 4.4). Binning (as a form of discretization) is a common type of preprocessing in machine learning and can provide certain advantages in performance, generalization and computational efficiency (Frank and Witten, 1999; Dougherty et al., 1995; Catlett, 1991). As shown by Dougherty et al. (1995), discretization can provide performance improvements for certain classifiers (such as naive Bayes), and that while more complex ...

17 | Upper Saddle - Hall - 1995

14 | On obtaining permutation distributions in polynomial time - Pagano, Tritchler - 1983

Citation Context: ... for computing the exact probabilities. Here we outline how to calculate the exact value of $p_0$ but keep in mind there are other more efficient (but more complicated) algorithms (Mehta et al., 1988a,b; Pagano and Tritchler, 1983). Each element in $X$ and $Y$ can take one of $c$ values, $z_1 < z_2 < \cdots < z_c$. The probability that $x_i$ will take on a value $z_k$ is $p_k$: $P(x_i = z_k) = p_k$, $i = 1, \ldots, n$, $k = 1, \ldots, c$. Similarly for $y_j$: $P(y_j = z_k) = r_k$ ...

11 | Predicting shifts in the mean of a multivariate time series process: an application in predicting business failures - Theodossiou - 1993

Citation Context: ... time series including medical diagnosis of rare diseases (Bridge and Sawilowsky, 1999; Rothman and Greenland, 2000), financial forecasting such as predicting business failures and personal bankruptcies (Theodossiou, 1993), and predicting mechanical and electronic device failure (Preusser and Hadley, 1991; Weiss and Hirsh, 1998). [Section 1.1: Previous Work in Hard Drive Failure Prediction] In our previous work (Hughes et al., 2002) ...

10 | The Wilcoxon, Ties, and the Computer - Klotz - 1966

Citation Context: ... $V$ gives the ties configuration of the concatenated set. See Table 7 for an example of how to calculate $T$. Under the null hypothesis $H_0$, the probability of observing ties configuration $U$ is given by (Klotz, 1966)

$$P(U \mid T) = \frac{\binom{t_1}{u_1}\binom{t_2}{u_2}\cdots\binom{t_c}{u_c}}{\binom{n+m}{n}}.$$

To find $p_0$, we must find all the $U$ such that $W_U > W_x$, where $W_U$ is the rank sum of a set with ties configuration $U$: $p_0 = \sum_{U_i \in \ldots} P(U_i \mid T)$ ...
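The formula above suggests a direct, if brute-force, way to compute $p_0$: enumerate every ties configuration $U = (u_1, \ldots, u_c)$ with $0 \le u_k \le t_k$ and $\sum_k u_k = n$, weight it by $P(U \mid T)$, and add up those whose rank sum exceeds the observed one. This sketch assumes tied values receive midranks (the excerpt does not show how ranks are assigned) and keeps doubled midranks as integers so that equality comparisons are exact. The excerpt writes the strict inequality $W_U > W_x$; `strict=False` gives the conventional $\geq$ tail. It is exponential in $c$, unlike the more efficient algorithms cited (Mehta et al.; Pagano and Tritchler), and the function name is hypothetical.

```python
from math import comb
from typing import List, Sequence


def exact_rank_sum_p0(x_counts: Sequence[int], tie_counts: Sequence[int],
                      strict: bool = True) -> float:
    """Exact upper-tail probability of the rank sum of sample X, with ties.

    tie_counts[k] = t_k: pooled observations at level z_k (z_1 < ... < z_c).
    x_counts[k]   = u_k: how many of those belong to X (observed configuration).
    Each configuration U has P(U|T) = prod_k C(t_k, u_k) / C(n+m, n).
    """
    c, n, N = len(tie_counts), sum(x_counts), sum(tie_counts)
    # Doubled midrank of level k: 2 * (#values below z_k) + t_k + 1, always an integer.
    doubled_mid: List[int] = []
    below = 0
    for t in tie_counts:
        doubled_mid.append(2 * below + t + 1)
        below += t
    w_obs = sum(u * r for u, r in zip(x_counts, doubled_mid))

    favourable = 0  # number of equally likely assignments with an extreme rank sum

    def visit(k: int, remaining: int, w: int, ways: int) -> None:
        nonlocal favourable
        if k == c:
            if remaining == 0 and (w > w_obs if strict else w >= w_obs):
                favourable += ways
            return
        for u in range(min(tie_counts[k], remaining) + 1):
            visit(k + 1, remaining - u, w + u * doubled_mid[k],
                  ways * comb(tie_counts[k], u))

    visit(0, n, 0, 1)
    return favourable / comb(N, n)


# Two tied pairs, X holds both high values: 1 of 6 equally likely arrangements.
print(exact_rank_sum_p0([0, 2], [2, 2], strict=False))  # 1/6, i.e. 0.1666...
```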

8 | Increasing physicians' awareness of the impact of statistics on research outcomes: Comparative power of the t-test and Wilcoxon Rank-Sum test in small samples applied research - Bridge, Sawilowsky - 1999

Citation Context: ... were many more parameters to set. The methods described here can be used in other applications where it is necessary to detect rare events in time series including medical diagnosis of rare diseases (Bridge and Sawilowsky, 1999; Rothman and Greenland, 2000), financial forecasting such as predicting business failures and personal bankruptcies (Theodossiou, 1993), and predicting mechanical and electronic device failure (Preusser and Hadley, 1991 ...

7 | A Nonparametric Multivariate Test for Monotone Trend with Pharmaceutical Applications - Dietz, Killeen - 1981

Citation Context: ... distribution than R, against the hypothesis of identical distributions. Multivariate nonparametric rank-based tests that exploit correlations between attribute values have been developed (Hettmansperger, 1984; Dietz and Killeen, 1981; Brunner et al., 2002). A different multivariate rank-sum test was successfully applied to early SMART data (Hughes et al., 2002). It exploits the fact that error counts are always positive. Here, we ...

6 | Importance sampling for estimating exact probabilities in permutational inference - Mehta, Patel, et al. - 1988

Citation Context: ... nor give algorithms for computing the exact probabilities. Here we outline how to calculate the exact value of $p_0$ but keep in mind there are other more efficient (but more complicated) algorithms (Mehta et al., 1988a,b; Pagano and Tritchler, 1983). Each element in $X$ and $Y$ can take one of $c$ values, $z_1 < z_2 < \cdots < z_c$. The probability that $x_i$ will take on a value $z_k$ is $p_k$: $P(x_i = z_k) = p_k$, $i = 1, \ldots$; similarly for $y_j$ ...

5 | An effective Bayesian neural network classifier with a comparison study to support vector machine - Liang - 2003

4 | Hard drive failure prediction using non-parametric statistical methods - 2003

3 | Advances in Knowledge Discovery and Data Mining, chapter Bayesian Classification (AutoClass) - Cheeseman, Stutz - 1995

Citation Context: ... FTWARE/MYSVM. [Section 4.5: Clustering (Autoclass)] Unsupervised clustering algorithms can be used for anomaly detection. Here, we use the Autoclass package (Cheeseman and Stutz, 1995) to learn a probabilistic model of the training data from only good drives. If any pattern is an anomaly (outlier) from the learned statistical model of good drives, then that drive is predicted to fail ...
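The decision rule in the excerpt (model good drives only; flag a drive if any of its patterns is an outlier) can be sketched without Autoclass. Autoclass fits Bayesian mixture models; this hypothetical stand-in instead fits a single diagonal Gaussian and sets the outlier threshold at a low quantile of the training log-likelihoods. Both the model and the `quantile=0.001` default are assumptions chosen for illustration, and the data is synthetic.

```python
import numpy as np


class GoodDriveModel:
    """Hypothetical stand-in for the clustering step: one diagonal Gaussian fit to
    good-drive patterns only (Autoclass itself fits a Bayesian mixture model)."""

    def __init__(self, train_good, quantile=0.001):
        X = np.asarray(train_good, dtype=float)
        self.mean = X.mean(axis=0)
        self.var = X.var(axis=0) + 1e-9  # small floor avoids division by zero
        # Assumed threshold: a low quantile of the training log-likelihoods.
        self.threshold = np.quantile(self.loglik(X), quantile)

    def loglik(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        return -0.5 * (np.log(2 * np.pi * self.var)
                       + (X - self.mean) ** 2 / self.var).sum(axis=1)

    def predict_fail(self, drive_patterns):
        # Following the excerpt: a drive is flagged if ANY of its patterns is an outlier.
        return bool((self.loglik(drive_patterns) < self.threshold).any())


rng = np.random.default_rng(0)
model = GoodDriveModel(rng.normal(size=(1000, 3)))                 # synthetic "good" drives
print(model.predict_fail([[0.0, 0.0, 0.0]]))                       # False
print(model.predict_fail([[0.0, 0.0, 0.0], [50.0, 50.0, 50.0]]))   # True
```

Raising `quantile` flags more drives (more detections, more false alarms); lowering it does the opposite, which is the same trade-off the paper tunes for the low false-alarm case.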

3 | Constructing exact significance tests with restricted randomization rules - Mehta, Patel, et al. - 1988

3 | Motor current signature analysis as a predictive maintenance tool - Preusser, Hadley - 1991

Citation Context: ... 1999; Rothman and Greenland, 2000), financial forecasting such as predicting business failures and personal bankruptcies (Theodossiou, 1993), and predicting mechanical and electronic device failure (Preusser and Hadley, 1991; Weiss and Hirsh, 1998). [Section 1.1: Previous Work in Hard Drive Failure Prediction] In our previous work (Hughes et al., 2002) we studied the SMART failure prediction problem, comparing the manufacturer-selected ...

2 | Exact and approximate distributions for the Wilcoxon statistic with ties - Lehman - 1961

1 | The multivariate nonparametric Behrens-Fisher problem - Brunner, Munzel, et al. - 2002

1 | A note on the Wilcoxon-Mann-Whitney test for 2 × k ordered tables - Emerson, Moses - 1985