## Simultaneous Truth and Performance Level Estimation (STAPLE): An Algorithm for the Validation of Image Segmentation (2004)

### Other Repositories/Bibliography

Venue: IEEE Trans. Med. Imag.

Citations: 130 (10 self-citations)

### BibTeX

@ARTICLE{Warfield04simultaneoustruth,

author = {Simon K. Warfield and Kelly H. Zou and William M. Wells},

title = {Simultaneous Truth and Performance Level Estimation (STAPLE): An Algorithm for the Validation of Image Segmentation},

journal = {IEEE TRANS. MED. IMAG},

year = {2004},

volume = {23},

pages = {903--921}

}

### Abstract

Characterizing the performance of image segmentation approaches has been a persistent challenge. Performance analysis is important since segmentation algorithms often have limited accuracy and precision. Interactive drawing of the desired segmentation by human raters has often been the only acceptable approach, and yet suffers from intra-rater and inter-rater variability. Automated algorithms have been sought in order to remove the variability introduced by raters, but such algorithms must be assessed to ensure they are suitable for the task. The performance of raters...

### Citations

8921 | Maximum likelihood from incomplete data via the EM algorithm
- Dempster, Laird, et al.
- 1977
Citation Context ...ction may be an appropriately trained human rater or raters, or it may be an automated segmentation algorithm. Our algorithm is formulated as an instance of the Expectation-Maximization (EM) algorithm [31], [32] and builds upon our earlier work [33], [34]. In the formulation of our algorithm described here, the expert segmentation decision at each voxel is directly observable, the hidden true segmentat...
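The context above describes the core STAPLE setup: each rater's decision at each voxel is observed, the true segmentation is a hidden binary variable per voxel, and the algorithm is an instance of EM. The following is a minimal Python sketch of a binary, voxelwise-independent version under stated assumptions, not the authors' implementation: each rater j is summarized by a sensitivity `p[j]` and a specificity `q[j]`, a single global foreground prior (the mean vote over all raters) is used, and no spatial (MRF) prior is applied. The function name `staple_binary` and the parameters `init`, `max_iter` and `tol` are hypothetical.

```python
def staple_binary(decisions, prior=None, init=0.99, max_iter=100, tol=1e-8):
    """Minimal binary STAPLE-style EM sketch (no spatial prior).

    decisions[i][j] is rater j's 0/1 label at voxel i.
    Returns (W, p, q): W[i] is the estimated P(T_i = 1); p[j] and q[j] are
    rater j's estimated sensitivity and specificity.
    """
    n_voxels, n_raters = len(decisions), len(decisions[0])
    if prior is None:
        # Assumption: one global foreground prior, the mean vote over all raters.
        prior = sum(map(sum, decisions)) / (n_voxels * n_raters)
    p = [init] * n_raters
    q = [init] * n_raters
    W = [0.0] * n_voxels
    for _ in range(max_iter):
        # E-step: posterior probability that each voxel is truly foreground.
        for i, row in enumerate(decisions):
            a, b = prior, 1.0 - prior
            for j, d in enumerate(row):
                a *= p[j] if d else 1.0 - p[j]
                b *= 1.0 - q[j] if d else q[j]
            W[i] = a / (a + b) if a + b > 0 else 0.5
        # M-step: re-estimate each rater's sensitivity and specificity.
        fg = sum(W)
        bg = n_voxels - fg
        new_p, new_q = [], []
        for j in range(n_raters):
            tp = sum(w * row[j] for w, row in zip(W, decisions))
            tn = sum((1.0 - w) * (1 - row[j]) for w, row in zip(W, decisions))
            new_p.append(tp / fg if fg > 0 else p[j])
            new_q.append(tn / bg if bg > 0 else q[j])
        change = max(abs(x - y) for x, y in zip(new_p + new_q, p + q))
        p, q = new_p, new_q
        if change < tol:
            break
    return W, p, q


# Three hypothetical raters on ten voxels; rater 2 misses two foreground
# voxels and adds one false positive.
decisions = [[1, 1, 1]] * 3 + [[1, 1, 0]] * 2 + [[0, 0, 1]] + [[0, 0, 0]] * 4
W, p, q = staple_binary(decisions)
print([round(w, 2) for w in W])
print([round(x, 2) for x in p], [round(x, 2) for x in q])
```

In this toy case the two consistent raters end up with sensitivity and specificity near 1, while the noisy rater is estimated at roughly 0.6 sensitivity and 0.8 specificity, so its disagreements are outvoted without any fixed vote threshold.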

1093 | On combining classifiers
- Kittler, Hatef, et al.
- 1998
Citation Context ...dering, class probability combining strategies (the Product Rule and the Sum Rule, which can be used to express the Min Rule, the Max Rule, the Median Rule and the Majority Vote Rule as described by [22]), and strategies which assume each classifier has expertise in a subset of the decision domain [23]–[25]. Similar issues in combining decisions from multiple raters or studies arise in content-based ...
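The fixed combination rules named in this context can disagree on the same inputs. A small Python sketch, assuming equal class priors (under which the prior terms of the Product and Sum rules drop out) and per-classifier posteriors supplied as plain lists; `combine` and `RULES` are hypothetical names, not an API from the cited work:

```python
import math
from statistics import median

# Reducers applied across classifiers for each class (equal priors assumed).
RULES = {
    "product": math.prod,
    "sum": sum,
    "max": max,
    "min": min,
    "median": median,
}


def combine(posteriors, rule):
    """posteriors[k][c] is classifier k's posterior for class c.

    Returns the index of the class selected by `rule`.
    """
    n_classes = len(posteriors[0])
    if rule == "vote":
        # Majority vote: each classifier votes for its most probable class.
        scores = [0] * n_classes
        for probs in posteriors:
            best = max(range(n_classes), key=lambda c: probs[c])
            scores[best] += 1
    else:
        reduce_fn = RULES[rule]
        scores = [reduce_fn([probs[c] for probs in posteriors]) for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: scores[c])


# Two mildly confident classifiers favour class 0; one very confident one favours class 1.
posteriors = [[0.6, 0.4], [0.6, 0.4], [0.1, 0.9]]
for rule in ("product", "sum", "max", "min", "median", "vote"):
    print(rule, combine(posteriors, rule))
```

Here the Median and Vote rules pick class 0 while the Product, Sum, Max and Min rules pick class 1, which illustrates why the most appropriate combining strategy is not obvious.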

1061 | The EM Algorithm and Extensions
- McLachlan, Krishnan
Citation Context ...may be an appropriately trained human rater or raters, or it may be an automated segmentation algorithm. Our algorithm is formulated as an instance of the Expectation-Maximization (EM) algorithm [31], [32] and builds upon our earlier work [33], [34]. In the formulation of our algorithm described here, the expert segmentation decision at each voxel is directly observable, the hidden true segmentation is...

988 | On the statistical analysis of dirty pictures
- Besag
- 1986
Citation Context ...ors which use MRF models are usually more complex to implement, exact estimates may be obtained in reasonable (polynomial) time [42], [43] and efficient approximation schemes are also available [44], [45]. The Hammersley-Clifford theorem [46] establishes a one-to-one correspondence between MRFs and probability models written in Gibbs form, as follows: $f(T) = \frac{1}{Z}\exp\left(-\frac{E(T)}{\tau}\right)$, $Z = \sum_{T}\exp\left(-\frac{E(T)}{\tau}\right)$ ...

832 | Adaptive mixtures of local experts
- Jacobs, Jordan, et al.
- 1991

759 | Hierarchical mixtures of experts and the EM algorithm
- Jordan, Jacobs
- 1994
Citation Context ...press the Min Rule, the Max Rule, the Median Rule and the Majority Vote Rule as described by [22]), and strategies which assume each classifier has expertise in a subset of the decision domain [23]–[25]. Similar issues in combining decisions from multiple raters or studies arise in content-based collaborative filtering [26] and in meta-analysis of diagnostic tests [27], [28]. Estimating performance ...

404 | Theoretical improvements in algorithmic efficiency for network flow problems
- Edmonds, Karp
- 1972
Citation Context ... voxels for which $T_i = 0$ are on the sink side of the minimum cut and all the voxels for which $T_i = 1$ are on the other side. Our implementation uses the Edmonds-Karp maximum flow-minimum cut algorithm [50] with the multi-resolution solution strategy suggested by Greig et al. [42]. This model allows us to prescribe a spatially correlated true segmentation in our estimation framework, and creates a more ...
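The context describes reading the binary true segmentation off a minimum cut, with $T_i = 1$ voxels on the source side. The Python sketch below shows only the generic Edmonds-Karp algorithm (breadth-first shortest augmenting paths) on a small dense capacity matrix and recovers the source side of the minimum cut. The construction of the graph from voxels and MRF weights, and the multi-resolution strategy of Greig et al., are not shown; `edmonds_karp` and the toy graph are hypothetical.

```python
from collections import deque


def edmonds_karp(capacity, source, sink):
    """Max flow by BFS augmenting paths (Edmonds-Karp).

    capacity: n x n list of lists of non-negative capacities.
    Returns (flow_value, source_side), where source_side is the set of
    nodes still reachable from source in the final residual graph, i.e.
    the source side of a minimum cut.
    """
    n = len(capacity)
    residual = [row[:] for row in capacity]
    flow = 0
    while True:
        # BFS finds a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and residual[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:
            break  # no augmenting path left: flow is maximal
        # Bottleneck capacity along the path.
        bottleneck = float("inf")
        v = sink
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        # Push flow and update forward and reverse residual capacities.
        v = sink
        while v != source:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
    # The last BFS explored everything reachable from the source.
    source_side = {v for v in range(n) if parent[v] != -1}
    return flow, source_side


# Toy graph: node 0 is the source, node 3 the sink.
cap = [
    [0, 3, 2, 0],
    [0, 0, 1, 2],
    [0, 0, 0, 3],
    [0, 0, 0, 0],
]
value, side = edmonds_karp(cap, 0, 3)
print(value, side)  # 5 {0}
```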

358 | Measures of the amount of ecological association between species (Ecology 26:297–302)
- Dice
- 1945
Citation Context ...e different boundaries [9] and so alternative measures have been sought [10]. Other measures used in practice include measures of spatial overlap, such as the Dice and Jaccard Similarity Coefficients [11], [12]. For example, spatial overlap has been used to compare manual segmentations with segmentations obtained through nonrigid registration [13]. Furthermore, measures inspired by information theory ...
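For reference, the Dice coefficient is $2|A \cap B| / (|A| + |B|)$ and the Jaccard coefficient is $|A \cap B| / |A \cup B|$. A minimal Python sketch for two binary masks of equal length, flattened to 0/1 sequences; the function names are hypothetical, and treating two empty masks as a perfect match is an assumed convention:

```python
def overlap_counts(a, b):
    """Return (|A & B|, |A|, |B|, |A | B|) for two equal-length 0/1 masks."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    size_a = sum(1 for x in a if x)
    size_b = sum(1 for y in b if y)
    return inter, size_a, size_b, size_a + size_b - inter


def dice(a, b):
    inter, size_a, size_b, _ = overlap_counts(a, b)
    # Assumed convention: two empty masks agree perfectly.
    return 2.0 * inter / (size_a + size_b) if size_a + size_b else 1.0


def jaccard(a, b):
    inter, _, _, union = overlap_counts(a, b)
    return inter / union if union else 1.0


a = [1, 1, 1, 0]
b = [0, 1, 1, 1]
print(dice(a, b), jaccard(a, b))  # about 0.667 and 0.5
```

The two measures are monotonically related by $J = D / (2 - D)$, so they rank segmentations identically; here $D = 2/3$ gives $J = 1/2$.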

344 | Exact maximum a posteriori estimation for binary images
- Greig, Porteous, et al.
- 1989
Citation Context ...cluding medical image segmentation problems [39]–[41]. While the estimators which use MRF models are usually more complex to implement, exact estimates may be obtained in reasonable (polynomial) time [42], [43] and efficient approximation schemes are also available [44], [45]. The Hammersley-Clifford theorem [46] establishes a one-to-one correspondence between MRFs and probability models written in Gi...

330 | Decision combination in multiple classifier systems
- Ho, Hull, et al.
- 1994
Citation Context ...referred choice of the voters. Preferential voting strategies, operating on votes or class probabilities, have been examined in the context of classifier fusion [20]. Examples include the Borda count [21] for preferential vote ordering, class probability combining strategies (the Product Rule and the Sum Rule which can be used to express the Min Rule, the Max Rule, the Median Rule and the Majority V...

313 | Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm
- Zhang, Brady, et al.
Citation Context ...d. Beginning with the Ising [38] models of ferromagnetism, these models have frequently been used to model phenomena that exhibit spatial coherence, including medical image segmentation problems [39]–[41]. While the estimators which use MRF models are usually more complex to implement, exact estimates may be obtained in reasonable (polynomial) time [42], [43] and efficient approximation schemes are al...

251 | Spatial registration and normalization of images
- Friston, Ashburner, et al.
- 1995

236 | Content-boosted collaborative filtering for improved recommendations
- Melville, Mooney, et al.
- 2002
Citation Context ...assume each classifier has expertise in a subset of the decision domain [23]–[25]. Similar issues in combining decisions from multiple raters or studies arise in content-based collaborative filtering [26] and in meta-analysis of diagnostic tests [27], [28]. Estimating performance in the presence of an imperfect or limited reference standard has also been explored [29], [30]. The most appropriate vote ...

184 | Design and construction of a realistic digital brain phantom
- Collins, Zijdenbos, et al.
- 1998
Citation Context ...the realism of the model by incorporating the imaging system characteristics, but also reduces the fidelity with which the true segmentation is known. Although such physical and digital phantoms [2], [3] have an important role to play in quantifying algorithm performance, such phantoms don’t fully reflect clinical images due to the difficulty of constructing phantoms that reproduce the full range of ...

182 | Beitrag zur Theorie des Ferromagnetismus (Contribution to the Theory of Ferromagnetism), Zeitschrift für Physik
- Ising
Citation Context ...tion is the MRF model, in which the conditional dependence of a given voxel on all of the others is equal to its conditional dependence on the voxels in a local neighborhood. Beginning with the Ising [38] models of ferromagnetism, these models have frequently been used to model phenomena that exhibit spatial coherence, including medical image segmentation problems [39]–[41]. While the estimators which...

159 | Exact optimization for Markov random fields with convex priors
- Ishikawa
Citation Context ...g medical image segmentation problems [39]–[41]. While the estimators which use MRF models are usually more complex to implement, exact estimates may be obtained in reasonable (polynomial) time [42], [43] and efficient approximation schemes are also available [44], [45]. The Hammersley-Clifford theorem [46] establishes a one-to-one correspondence between MRFs and probability models written in Gibbs fo...

149 | Parallel and deterministic algorithms from MRFs: Surface reconstruction and integration
- Geiger, Girosi
- 1990
Citation Context ...s [48]. The mean field approximation has been used for constrained surface reconstruction, and its use has been motivated by the fact that it is the minimum variance Bayes estimator of the true field [49]. The mean field of the estimated true segmentation at iteration $k$ (that is, the mean field of $W_{si}^{(k)}$) can be found by an embedded iteration process. It is found by initializing with the voxelwise...

93 | Morphometric analysis of white matter lesions in MR images: Method and validation
- Zijdenbos
- 1994
Citation Context ...ifferences is useful and the Hausdorff measure and modifications have been used for this [16]. Agreement measures for comparing different experts, such as the kappa statistic, have also been explored [17]. A reference standard has sometimes been formed by taking a combination of expert segmentations. For example, a voting rule as often used in practice selects all voxels where some majority of experts...

92 | template moderated, spatially varying statistical classification
- Warfield, Kaus, et al.
Citation Context ...is an underestimate of the peripheral zone. 4) Assessment of an Automated Segmentation Algorithm: Figure 8 illustrates STAPLE applied to the analysis of expert segmentations of a brain tumor [53], [54]. The hidden true segmentation was estimated from three expert segmentations, requiring 0.15 s to compute. The segmentation generated by the program was then assessed by evaluating the sensitivi...

89 | The distribution of flora in the alpine zone
- Jaccard
- 1912
Citation Context ...erent boundaries [9] and so alternative measures have been sought [10]. Other measures used in practice include measures of spatial overlap, such as the Dice and Jaccard Similarity Coefficients [11], [12]. For example, spatial overlap has been used to compare manual segmentations with segmentations obtained through nonrigid registration [13]. Furthermore, measures inspired by information theory have b...

80 | Automated model-based bias field correction of MR images of the brain
- Leemput, Maes, et al.
- 1999

75 | Valmet: A new validation tool for assessing and improving 3D object segmentation
- Gerig, Jomier, et al.
- 2001
Citation Context ...res inspired by information theory have been applied [14], [15]. In many applications, assessment of boundary differences is useful and the Hausdorff measure and modifications have been used for this [16]. Agreement measures for comparing different experts, such as the kappa statistic, have also been explored [17]. A reference standard has sometimes been formed by taking a combination of expert segmen...
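The Hausdorff measure mentioned here is the largest distance from any boundary point of one segmentation to the nearest boundary point of the other, taken in both directions. A brute-force Python sketch, assuming boundaries are given as non-empty lists of coordinate tuples; the names are hypothetical, and the modified (for example, averaged) variants are not shown:

```python
import math


def directed_hausdorff(src, dst):
    """Largest distance from a point in src to its nearest point in dst."""
    return max(min(math.dist(p, q) for q in dst) for p in src)


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(points_a, points_b),
               directed_hausdorff(points_b, points_a))


# Hypothetical boundary points (voxel coordinates) from two segmentations.
boundary_a = [(0, 0), (1, 0)]
boundary_b = [(0, 0), (3, 0)]
print(hausdorff(boundary_a, boundary_b))  # 2.0
```

This costs $O(|A| \cdot |B|)$ distance evaluations, which is acceptable for small boundaries; the measure is driven by a single worst point, which is one motivation for the modified versions.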

54 | Parametric estimate of intensity inhomogeneities applied to MRI
- Styner, Brechbuhler, et al.
- 2000
Citation Context ...ases the realism of the model by incorporating the imaging system characteristics, but also reduces the fidelity with which the true segmentation is known. Although such physical and digital phantoms [2], [3] have an important role to play in quantifying algorithm performance, such phantoms don’t fully reflect clinical images due to the difficulty of constructing phantoms that reproduce the full rang...

46 | The visible human male: A technical report
- Spitzer, Ackerman
- 1996
Citation Context ...rements on a phantom to the results expected in practice. Cadavers provide a more realistic model of anatomy, but the true segmentation can only be estimated, and such models differ from in vivo data [4], [5]. Patient data provides the most realistic model for a given application task, but is the most difficult for which to identify a reference standard. A common alternative to phantom studies has be...

39 | Spatial registration and normalisation of images
- Friston, Ashburner, et al.
- 1995

39 | Routine quantitative analysis of brain and cerebrospinal fluid spaces with MR imaging (J Magn Reson Imaging 1992;2:619–629)
- Kikinis, Shenton, et al.
Citation Context ...king a combination of expert segmentations. For example, a voting rule as often used in practice selects all voxels where some majority of experts agree the structure to be segmented is present [18], [19]. This simple approach unfortunately does not provide guidance as to how many experts should agree before the structure is considered to be present. Furthermore, vote counting strategies treat each vot...
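The vote-counting rule described here, and its unresolved choice of how many experts must agree, can be made concrete with a short Python sketch. It assumes binary rater labels per voxel; `vote_fusion` and `min_votes` are hypothetical names:

```python
def vote_fusion(decisions, min_votes=None):
    """Label a voxel foreground when at least `min_votes` raters marked it.

    decisions[i][j] is rater j's 0/1 label at voxel i. By default a strict
    majority is required (more than half of the raters).
    """
    n_raters = len(decisions[0])
    if min_votes is None:
        min_votes = n_raters // 2 + 1
    return [1 if sum(row) >= min_votes else 0 for row in decisions]


decisions = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0]]
for k in (1, 2, 3):
    print(k, vote_fusion(decisions, k))
# 1 [1, 1, 1, 0]
# 2 [1, 0, 1, 0]
# 3 [0, 0, 1, 0]
```

The fused result changes with the threshold, and every rater's vote carries the same weight regardless of that rater's reliability, which are the two limitations the context points out.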

34 | An automated registration algorithm for measuring MRI subcortical brain structures
- Iosifescu, Shenton, et al.
- 1997
Citation Context ...entation to a group of expert segmentations is so far unclear. A number of metrics have been proposed to compare segmentations. Simply measuring the volume of segmented structures has often been used [6], [7]. Two segmentation methods may be compared by assessing the limits of agreement [8] of volume estimates derived from the segmentations. However, volume estimates may be quite similar when the seg...

34 | Quantitative magnetic resonance imaging of brain development in premature and mature newborns (Ann Neurol 43:224–235)
- Hüppi, Warfield, et al.
- 1998
Citation Context ...ment of tissue types, including cortical gray matter, subcortical gray matter, myelinated white matter, unmyelinated white matter, and cerebrospinal fluid, from volumetric MRI of newborn infants [54]–[56] has been useful in understanding and characterising normal brain development and the processes of maturation [56], and the impact of injury upon brain development [57], [58]. We evaluated the impact ...

31 | Validation of image segmentation and expert quality with an expectation-maximization algorithm
- Warfield, Zou, et al.
Citation Context ...or raters, or it may be an automated segmentation algorithm. Our algorithm is formulated as an instance of the Expectation-Maximization (EM) algorithm [31], [32] and builds upon our earlier work [33], [34]. In the formulation of our algorithm described here, the expert segmentation decision at each voxel is directly observable, the hidden true segmentation is a binary variable for each voxel, and the p...

27 | Evaluating image segmentation algorithms using the Pareto front
- Everingham, Muller, et al.
- 2002
Citation Context ...wever, volume estimates may be quite similar when the segmented structures are located differently, have different shapes or have different boundaries [9] and so alternative measures have been sought [10]. Other measures used in practice include measures of spatial overlap, such as the Dice and Jaccard Similarity Coefficients [11], [12]. For example, spatial overlap has been used to compare manual seg...

27 | GPU-based level sets for 3D segmentation
- Lefohn, Cates, et al.
Citation Context ...of the structure of interest, and with Markov Random Field models for spatial homogeneity. Recently the application of the STAPLE algorithm [34] to the evaluation of tumor segmentations was described [59]. The mean and standard deviation of sensitivity and specificity parameters, as estimated by STAPLE, were used to compare a rapid interactive level-set based segmentation algorithm to hand contouring ...

25 | 3D Model-Based Segmentation of Individual Brain Structures from Magnetic Resonance Imaging Data
- Collins
- 1994
Citation Context ...egmentations of white matter, a probabilistic atlas of the distribution of white matter derived from a large group of subjects can provide the prior true segmentation probabilities at each voxel [35]–[37]. However, a probabilistic atlas is not available for all structures of interest. A second mechanism for incorporating spatial homogeneity is to introduce a Markov Random Field (MRF) model. This allow...

24 | The Application of Mean Field Theory to Image Motion Estimation
- Zhang, Hanauer
- 1995
Citation Context ...tensive investigation of this has been reported by Elfadel [44] and it has been used to impose a spatial homogeneity constraint for image segmentation [39], [47] and for motion estimation from images [48]. The mean field approximation has been used for constrained surface reconstruction, and its use has been motivated by the fact that it is the minimum variance Bayes estimator of the true field [49]. ...

23 | Enhanced spatial priors for segmentation of magnetic resonance imagery (presented at MICCAI'98)
- Kapur, Grimson, et al.
- 1998
Citation Context ...orhood. Beginning with the Ising [38] models of ferromagnetism, these models have frequently been used to model phenomena that exhibit spatial coherence, including medical image segmentation problems [39]–[41]. While the estimators which use MRF models are usually more complex to implement, exact estimates may be obtained in reasonable (polynomial) time [42], [43] and efficient approximation schemes a...

19 | Automatic identification of grey matter structures from MRI to improve the segmentation of white matter lesions
- Warfield, Dengler, et al.
- 1995
Citation Context ... by taking a combination of expert segmentations. For example, a voting rule as often used in practice selects all voxels where some majority of experts agree the structure to be segmented is present [18], [19]. This simple approach unfortunately does not provide guidance as to how many experts should agree before the structure is considered to be present. Furthermore, vote counting strategies treat ea...

17 | Measuring global and local spatial correspondence using information theory
- Bello, Colchester
- 1998
Citation Context ...e, spatial overlap has been used to compare manual segmentations with segmentations obtained through nonrigid registration [13]. Furthermore, measures inspired by information theory have been applied [14], [15]. In many applications, assessment of boundary differences is useful and the Hausdorff measure and modifications have been used for this [16]. Agreement measures for comparing different experts,...

17 | three-dimensional finite element-based deformable registration of pre- and intraoperative prostate imaging
- Bharatha, Hirose, et al.
- 2001
Citation Context ...onventional MRI scan (T2w acquisition, 0.46875 × 0.46875 × 3.0 mm³). The goal of the operator was to segment the prostate peripheral zone to enable radiation dose planning in support of brachytherapy [52]. The STAPLE algorithm ran to convergence in 0.48 s of wall-clock time. Table II: Quality estimates for the five prosta...

15 | Applying the right statistics: Analyses of measurement studies
- Bland, Altman
- 2003
Citation Context ...been proposed to compare segmentations. Simply measuring the volume of segmented structures has often been used [6], [7]. Two segmentation methods may be compared by assessing the limits of agreement [8] of volume estimates derived from the segmentations. However, volume estimates may be quite similar when the segmented structures are located differently, have different shapes or have different bound...
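Limits of agreement in the Bland-Altman sense are usually reported as the mean paired difference (the bias) plus or minus 1.96 standard deviations of the differences. A minimal Python sketch for paired volume estimates from two segmentation methods; the function name and the example volumes are hypothetical, and the 1.96 factor assumes roughly normal differences:

```python
from statistics import mean, stdev


def limits_of_agreement(volumes_a, volumes_b, z=1.96):
    """Bland-Altman bias and approximate 95% limits of agreement.

    Returns (lower, bias, upper), where bias is the mean paired difference
    and the limits are bias -/+ z times the sample SD of the differences.
    """
    diffs = [a - b for a, b in zip(volumes_a, volumes_b)]
    bias = mean(diffs)
    spread = z * stdev(diffs)  # stdev needs at least two pairs
    return bias - spread, bias, bias + spread


# Hypothetical volumes (mL) of one structure, measured by two methods.
method_a = [10.0, 12.0, 14.0, 16.0]
method_b = [9.0, 12.0, 13.0, 14.0]
print(limits_of_agreement(method_a, method_b))
```

As the context notes, narrow limits on volume do not imply that the two segmentations overlap, since structures of equal volume can differ in location, shape and boundary.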

14 | A novel nonrigid registration algorithm and applications. Paper read at
- Rexilius, Warfield, et al.
- 2001
Citation Context ...ation estimate. An interesting strategy for certain problems is to use a probabilistic atlas to provide an initial true segmentation. For example, one may use a probabilistic atlas of the brain [37], [51] to provide an initial true segmentation estimate for the tissues of the brain. We have used the estimated true segmentation from the voxelwise independent STAPLE to initialize the STAPLE algorithm wh...

14 | Mesial temporal sclerosis and temporal lobe epilepsy: MR imaging deformation-based segmentation of the hippocampus in five patients
- Hogan, Mark, et al.
- 2000
Citation Context ..., such as the Dice and Jaccard Similarity Coefficients [11], [12]. For example, spatial overlap has been used to compare manual segmentations with segmentations obtained through nonrigid registration [13]. Furthermore, measures inspired by information theory have been applied [14], [15]. In many applications, assessment of boundary differences is useful and the Hausdorff measure and modifications have...

12 | Toward a common validation methodology for segmentation and registration algorithms
- Yoo, Ackerman, et al.
- 2000
Citation Context ...ts on a phantom to the results expected in practice. Cadavers provide a more realistic model of anatomy, but the true segmentation can only be estimated, and such models differ from in vivo data [4], [5]. Patient data provides the most realistic model for a given application task, but is the most difficult for which to identify a reference standard. A common alternative to phantom studies has been to...

12 | Using a combination of reference tests to assess the accuracy of a new diagnostic test (Stat. Med.)
- Alonzo, Pepe
- 1999
Citation Context ...ent-based collaborative filtering [26] and in meta-analysis of diagnostic tests [27], [28]. Estimating performance in the presence of an imperfect or limited reference standard has also been explored [29], [30]. The most appropriate vote ordering or decision combining strategy remains unclear. We present here a new algorithm, Simultaneous Truth And Performance Level Estimation (STAPLE), which takes a ...

12 | White matter injury in the premature infant: A comparison between serial cranial sonographic and MR findings at term
- Inder, Anderson, et al.
- 2003
Citation Context ...ric MRI of newborn infants [54]–[56] has been useful in understanding and characterising normal brain development and the processes of maturation [56], and the impact of injury upon brain development [57], [58]. We evaluated the impact of repeated selection of tissue type prototypes upon the classification of an MRI of a newborn infant at term equivalent age. A single operator repeatedly selected samp...

11 | Reliability parameters to improve combination strategies in multi-expert systems
- Cordella, Foggia, et al.
- 1999
Citation Context ...nique and may not reflect the overall preferred choice of the voters. Preferential voting strategies, operating on votes or class probabilities, have been examined in the context of classifier fusion [20]. Examples include the Borda count [21] for preferential vote ordering, class probability combining strategies (the Product Rule and the Sum Rule which can be used to express the Min Rule, the Max Rul...

11 | Expectation maximization strategies for multi-atlas multi-label segmentation
- Rohlfing, Russakoff, et al.
Citation Context ...r intracranial cavity segmentation of brain MRI images [60]. The STAPLE algorithm [34] and generalizations have also been applied to the evaluation of segmentations derived from nonrigid registration [61], [62]. In this work comparisons were made between a majority vote rule, pairwise binary and unordered multi-category label estimates of the hidden true segmentation and performance. The advantage of ...

9 | Physics-Based Nonrigid Registration for Medical Image Analysis
- Rexilius
- 2001
Citation Context ...ain segmentations of white matter, a probabilistic atlas of the distribution of white matter derived from a large group of subjects can provide the prior true segmentation probabilities at each voxel [35]–[37]. However, a probabilistic atlas is not available for all structures of interest. A second mechanism for incorporating spatial homogeneity is to introduce a Markov Random Field (MRF) model. This ...

8 | Automated model-based bias field correction
- Leemput, Maes, et al.
- 1999

8 | Automated segmentation of MRI brain tumors
- Kaus, Warfield, et al.
Citation Context ...ntation 4 is an under-estimate of the peripheral zone. 4) Assessment of an Automated Segmentation Algorithm: Fig. 8 illustrates STAPLE applied to the analysis of expert segmentations of a brain tumor [53], [54]. The hidden true segmentation was estimated from three expert segmentations, requiring 0.15 s to compute. The segmentation generated by the program was then assessed by evaluating the sensitivi...
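The truncated context assesses an automated segmentation by its sensitivity (and, presumably, specificity) against the estimated true segmentation; it does not show exactly how that was computed. One plausible Python sketch, assuming the reference is given as per-voxel foreground probabilities (such as a STAPLE estimate) and that the reference contains both foreground and background mass; `sensitivity_specificity` is a hypothetical name:

```python
def sensitivity_specificity(decisions, truth_prob):
    """Score one binary segmentation against a (possibly soft) reference.

    decisions: 0/1 label per voxel from the segmentation under test.
    truth_prob: reference probability that each voxel is foreground.
    With 0/1 reference values this reduces to TP/(TP+FN) and TN/(TN+FP).
    """
    fg = sum(truth_prob)                 # expected number of foreground voxels
    bg = len(truth_prob) - fg            # expected number of background voxels
    tp = sum(w * d for w, d in zip(truth_prob, decisions))
    tn = sum((1.0 - w) * (1 - d) for w, d in zip(truth_prob, decisions))
    return tp / fg, tn / bg


# Hard reference: one hit, one miss, one correct rejection, one false alarm.
print(sensitivity_specificity([1, 0, 0, 1], [1, 1, 0, 0]))  # (0.5, 0.5)
# Soft reference, as a probabilistic estimate of the true segmentation would give.
print(sensitivity_specificity([1, 1, 0, 0], [0.9, 0.8, 0.1, 0.0]))
```

Using the soft reference weights each voxel by how confident the estimate is, instead of first thresholding it at an arbitrary level.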

7 | Morphometric Analysis of White
- Zijdenbos, Dawant, et al.
- 1994
Citation Context ...ifferences is useful and the Hausdorff measure and modifications have been used for this [16]. Agreement measures for comparing different experts, such as the kappa statistic, have also been explored [17]. A reference standard has sometimes been formed by taking a combination of expert segmentations. For example, a voting rule used in practice selects all voxels where some majority of experts agree th...

6 | Simultaneous validation of image segmentation and assessment of expert quality
- Warfield, Zou, et al.
Citation Context ...rater or raters, or it may be an automated segmentation algorithm. Our algorithm is formulated as an instance of the Expectation-Maximization (EM) algorithm [31], [32] and builds upon our earlier work [33], [34]. In the formulation of our algorithm described here, the expert segmentation decision at each voxel is directly observable, the hidden true segmentation is a binary variable for each voxel, and...