## Scale-Based Gaussian Coverings: Combining Intra and Inter Mixture Models in Image Segmentation (2009)

Citations: 1 (0 self)

### BibTeX

@MISC{Murtagh09scale-basedgaussian,
  author = {Fionn Murtagh and Pedro Contreras and Jean-Luc Starck},
  title = {Scale-Based Gaussian Coverings: Combining Intra and Inter Mixture Models in Image Segmentation},
  year = {2009}
}

### Abstract

By a “covering” we mean a Gaussian mixture model fit to observed data. Approximations of the Bayes factor can be availed of to judge model fit to the data within a given Gaussian mixture model. Between families of Gaussian mixture models, we propose the Rényi quadratic entropy as an excellent and tractable model comparison framework. We exemplify this using the segmentation of an MRI image volume, based (1) on a direct Gaussian mixture model applied to the marginal distribution function, and (2) on a Gaussian model fit through k-means applied to the 4D multivalued image volume furnished by the wavelet transform. Visual preference for one model over another is not immediate. The Rényi quadratic entropy allows us to show clearly that one of these modelings is superior to the other.
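The tractability claimed for the Rényi quadratic entropy H_R2 = −log ∫ f(x)² dx rests on the fact that the integral has a closed form for a Gaussian mixture. A minimal 1-D numeric sketch (not the authors' code; the function name is illustrative):

```python
import numpy as np

def renyi_quadratic_entropy_gmm(weights, means, variances):
    """Closed-form Renyi quadratic entropy H_R2 = -log integral f(x)^2 dx
    for a 1-D Gaussian mixture f = sum_i w_i N(mu_i, sigma_i^2), using
    integral N(x;mu_i,s_i) N(x;mu_j,s_j) dx = N(mu_i - mu_j; 0, s_i + s_j)."""
    w = np.asarray(weights, float)
    mu = np.asarray(means, float)
    s = np.asarray(variances, float)
    S = s[:, None] + s[None, :]          # pairwise variance sums
    D = mu[:, None] - mu[None, :]        # pairwise mean differences
    cross = np.exp(-0.5 * D**2 / S) / np.sqrt(2.0 * np.pi * S)
    integral = np.sum(w[:, None] * w[None, :] * cross)
    return -np.log(integral)

# Single standard Gaussian: exact value is 0.5 * log(4 * pi)
print(renyi_quadratic_entropy_gmm([1.0], [0.0], [1.0]))
```

For a single N(0, σ²) the exact value is 0.5 log(4πσ²), which the sketch reproduces.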

### Citations

6968 |
A mathematical theory of communication
- SHANNON
- 1948
Citation Context ...e most probable image is obtained by maximizing p(g|f). This leads to algorithms for noise filtering and to deconvolution [29]. We need a probability density p(g) of the data. The Shannon entropy, HS [26], is the summing of the following for each pixel: HS(g) = − Σ_{k=1}^{Nb} pk log pk (6), where X = {g1, ..., gn} is an image containing integer values, Nb is the number of possible values of a given pixel g... |
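The histogram entropy of equation (6) in the excerpt can be sketched as follows; `shannon_entropy` and the bin count are illustrative names, not from the paper:

```python
import numpy as np

def shannon_entropy(g, n_bins=256):
    """Shannon entropy HS(g) = -sum_k p_k log p_k over the Nb histogram
    bins of an integer-valued image g (equation (6) of the excerpt)."""
    counts = np.bincount(np.asarray(g).ravel(), minlength=n_bins)
    p = counts / counts.sum()
    p = p[p > 0]                 # 0 log 0 is taken as 0
    return -np.sum(p * np.log(p))

# A constant image has zero entropy; a flat histogram gives log(Nb)
print(shannon_entropy(np.zeros((8, 8), dtype=int)))
print(shannon_entropy(np.arange(256)))
```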

311 | How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis
- Fraley, Raftery
Citation Context ...rmation Criterion (BIC), which approximates the Bayes factor of posterior ratios, takes the form of the same penalized likelihood, − log f(X | θ̂) + (k/2) log n, where θ̂ is the ML or MAP estimate of θ. See [7] for case studies using BIC. 3 Segmentation of Arbitrary Signal through a Gaussian Mixture Model Notwithstanding the fact that often signal is not Gaussian, cf. the illustration of Figure 1, we can fi... |
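The penalized-likelihood form of BIC quoted in the excerpt can be illustrated with a toy model comparison; the data and function names here are hypothetical, not the paper's:

```python
import numpy as np

def neg_log_lik_gauss(x, mu, var):
    """Negative log-likelihood of x under N(mu, var)."""
    n = x.size
    return 0.5 * n * np.log(2 * np.pi * var) + np.sum((x - mu)**2) / (2 * var)

def bic(neg_log_lik, k, n):
    """BIC in the penalized-likelihood form of the excerpt:
    -log f(X | theta_hat) + (k/2) log n, with k free parameters."""
    return neg_log_lik + 0.5 * k * np.log(n)

rng = np.random.default_rng(0)
x = rng.normal(3.0, 1.0, size=500)

# Model A: fixed N(0, 1), no fitted parameters (k = 0)
bic_a = bic(neg_log_lik_gauss(x, 0.0, 1.0), k=0, n=x.size)
# Model B: N(mu_hat, var_hat), two ML-fitted parameters (k = 2)
bic_b = bic(neg_log_lik_gauss(x, x.mean(), x.var()), k=2, n=x.size)
print(bic_a > bic_b)  # the fitted model wins despite its penalty
```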

156 | Model selection and the principle of minimum description length
- Hansen, Yu
- 2001
Citation Context ...scribe it does not cater for hierarchically embedded segments or clusters. An example of where hierarchical embedding, or nested clusters, come into play can be found in [20]. Following Hansen and Yu [12], we consider a model class, Θ, and an instantiation of this involving parameters θ to be estimated, yielding θ̂. We have θ ∈ R^k, so the parameter space is k-dimensional. Our observation vectors, o... |

138 |
Maximum entropy spectral analysis
- Burg
- 1975
Citation Context ...his kind of entropy definition is not easy to use for signal restoration, because its gradient is not easy to compute. For these reasons, other entropy functions are generally used, including: • Burg [2]: HB(g) = − Σ_{k=1}^{n} ln(gk) (7) • Frieden [8]: HF(g) = − Σ_{k=1}^{n} gk ln(gk) (8) • Gull and Skilling [11]: HG(g) = Σ_{k=1}^{n} [gk − Mk − gk ln(gk/Mk)], where M is a given model, usually taken as a flat im... |
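The three entropy forms in the excerpt can be sketched directly; the function names are illustrative, and the Burg sign convention follows the reconstruction above:

```python
import numpy as np

def burg_entropy(g):
    """HB(g) = -sum_k ln(g_k); requires strictly positive g."""
    return -np.sum(np.log(g))

def frieden_entropy(g):
    """HF(g) = -sum_k g_k ln(g_k)."""
    return -np.sum(g * np.log(g))

def gull_skilling_entropy(g, M):
    """HG(g) = sum_k [g_k - M_k - g_k ln(g_k / M_k)],
    with M a model image, usually taken as flat; maximal (zero) at g = M."""
    g = np.asarray(g, float)
    M = np.asarray(M, float)
    return np.sum(g - M - g * np.log(g / M))

g = np.array([0.5, 1.0, 2.0])
flat = np.full_like(g, g.mean())
print(burg_entropy(g), frieden_entropy(g), gull_skilling_entropy(g, flat))
```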

65 | Entropy measures and unconditional security in cryptography. (ETH series in information security and cryptography, vol 1) Hartung-Gorre
- Cachin
- 1997
Citation Context ...entropy: HS = − ∫ f(x) log f(x) dx • Rényi entropy: HRα = (1/(1−α)) log ∫ f(x)^α dx for α > 0, α ≠ 1. We have: lim_{α→1} HRα = HS. So HR1 = HS. We also have: HRβ ≥ HS ≥ HRγ for 0 < β < 1 and 1 < γ (see e.g. [3], section 3.3). When α = 2, HR2 is quadratic entropy. Both Shannon and Rényi quadratic entropy are additive, a property which will be availed of by us below, for example when we define entropy for a... |
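The ordering HRβ ≥ HS ≥ HRγ quoted in the excerpt can be checked numerically for a discrete distribution (a sketch, not the paper's code):

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Discrete Renyi entropy H_alpha = (1/(1-alpha)) log sum_k p_k^alpha."""
    p = np.asarray(p, float)
    return np.log(np.sum(p**alpha)) / (1.0 - alpha)

def shannon_entropy(p):
    """Discrete Shannon entropy HS = -sum_k p_k log p_k."""
    p = np.asarray(p, float)
    return -np.sum(p * np.log(p))

p = np.array([0.7, 0.2, 0.1])
hs = shannon_entropy(p)
# HR_beta >= HS >= HR_gamma for 0 < beta < 1 < gamma
print(renyi_entropy(p, 0.5) >= hs >= renyi_entropy(p, 2.0))  # True
```

The α → 1 limit recovering Shannon entropy can be checked the same way by evaluating at α close to 1.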

52 |
Information Theoretic Clustering
- Gokcay, Príncipe
- 2002
Citation Context ...alysis. An additional reason for discussing the work reported on in this section is the common processing platform provided by entropy. Often the entropy provides the optimization criterion used (see [10, 24, 29], and many other works besides). In keeping with entropy as having a key role in a common processing platform we instead want to use entropy for cross-model selection. Note that it complements other cr... |

38 |
Astronomical Image and Data Analysis
- Starck, Murtagh
- 2002
Citation Context ...A) − 2 log p(B) = − log p²(A) − log p²(B). 5 The Entropy of a Wavelet Transformed Signal The wavelet transform is a resolution-based decomposition – hence with an in-built spatial model: see e.g. [29, 30]. A redundant wavelet transform is most appropriate, even if decimated alternatives can be considered straightforwardly too. This is because segmentation, taking information into account at all availa... |
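A redundant wavelet transform of the kind the excerpt prefers can be sketched as a 1-D "à trous" decomposition with the B3-spline kernel; this minimal implementation is illustrative, not the authors' code:

```python
import numpy as np

def atrous_decompose(signal, n_scales=3):
    """Redundant (undecimated) 'a trous' wavelet decomposition of a 1-D
    signal with the B3-spline kernel. Returns n_scales detail arrays plus
    the final smooth array, all at full resolution; their sum reconstructs
    the input exactly."""
    h = np.array([1, 4, 6, 4, 1], float) / 16.0
    c = np.asarray(signal, float)
    details = []
    for j in range(n_scales):
        # Dilate the kernel by inserting 2^j - 1 zeros between taps
        hj = np.zeros(4 * 2**j + 1)
        hj[::2**j] = h
        smooth = np.convolve(np.pad(c, len(hj) // 2, mode="reflect"), hj, "valid")
        details.append(c - smooth)   # detail at scale j
        c = smooth
    return details, c

x = np.sin(np.linspace(0, 4 * np.pi, 64))
details, smooth = atrous_decompose(x)
print(np.allclose(sum(details) + smooth, x))  # True: exact reconstruction
```

Summing per-scale entropies of the detail coefficients is then straightforward, which is where the additivity discussed above comes into play.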

30 |
State of the art in pattern recognition
- Nagy
- 1968
Citation Context ...hical algorithms as more versatile than their partitional counterparts (for example, k-means or Gaussian mixture models) since the latter tend to work well only on data sets having isotropic clusters [23]. So in [20], we segmented astronomical images of different observing filters, that had first been matched such that they related to exactly the same fields of view and pixel resolution. For the segme... |

16 |
A Survey of Algorithms for Contiguity-Constrained Clustering and Related Problems, The Computer Journal
- MURTAGH
- 1985
Citation Context ...rm we have obtained expresses interactions between pairs. Function fij is a Gaussian. There are evident links here with Parzen kernels [4, 13] and clustering through mode detection (see e.g. [14], and [17] and references therein). For segmentation we will simplify further expression (20) to take into account just the equiweighted segments reduced to their mean (cf. [4]). In line with how we defined mut... |

14 |
Image enhancement and restoration
- Frieden
- 1979
Citation Context ...se for signal restoration, because its gradient is not easy to compute. For these reasons, other entropy functions are generally used, including: • Burg [2]: HB(g) = − Σ_{k=1}^{n} ln(gk) (7) • Frieden [8]: HF(g) = − Σ_{k=1}^{n} gk ln(gk) (8) • Gull and Skilling [11]: HG(g) = Σ_{k=1}^{n} [gk − Mk − gk ln(gk/Mk)], where M is a given model, usually taken as a flat image. In all definitions n is the number of pixel... |

10 |
The extendibility of statistical models
- Wit, McCullagh
- 2001
Citation Context ...y sequence or adjacency relationships. Often we will use interchangeably the terms image, image volume if relevant, signal and data. The word “model” is used, in general, in many senses – statistical [16], mathematical, physical models; mixture model; linear model; noise model; neural network model; sparse decomposition model; even, in different senses, data model. In practice, firstly and foremostly ... |

10 | Overcoming the curse of dimensionality in clustering by means of the wavelet transform
- Murtagh, Berry
- 2000
Citation Context ... filtering allows, as a special case, thresholding and reading off segmented regions. Such approaches have been used for very fast – indeed one could say with justice, turbo-charged – clustering. See [21, 22]. Noise models are particularly important in the physical sciences (cf. CCD, charge-coupled device, detectors) and the following approach was developed in [28]. Observed data f in the physical science... |

7 |
Probability, Statistical Optics, and Data Testing: A Problem Solving Approach
- Frieden
- 1983
Citation Context ...mber of occurrences in the histogram’s kth bin. The trouble with this approach is that, because the number of occurrences is finite, the estimate pk will be in error by an amount proportional to mk [9]. The error becomes significant when mk is small. Furthermore this kind of entropy definition is not easy to use for signal restoration, because its gradient is not easy to compute. For these reasons,... |

7 | On-line Bayesian tree-structured transformation of HMMs with optimal model selection for speaker adaptation
- Wang, Zhao
Citation Context ...probability and large-valued noise events can be modeled as Gaussian components in the tail of the distribution. A fit of this fat tail distribution by a Gaussian mixture model is commonly carried out [31]. As in Wang and Zhao [31], one can allow Gaussian component PDFs to recombine to provide the clusters which are sought. These authors also found that using priors with heavy tails, rather than using ... |

6 |
Quantization from Bayes factors with application to multilevel thresholding
- Murtagh, Starck
- 2003
Citation Context ...l mixing proportions. Figures 2 and 3 illustrate long-tailed behavior and show how marginal density Gaussian model fitting works in practice. The ordinates give frequencies. See further discussion in [19, 18]. 4 Additive Entropy Background on entropy can be found e.g. in [24]. Following Hartley’s 1928 treatment of equiprobable events, Shannon in 1948 developed his theory around expectation. In 1960 Rényi ... |

5 | Approximation of correlated non-Gaussian noise PDFs using Gaussian mixture models
- Blum, Zhang, et al.
- 1999
Citation Context ... or Laplace distribution. For 0 < β < 2, the distribution is heavy tailed. For β > 2, the distribution is light tailed. Heavy tailed noise can be modeled by a Gaussian mixture model with enough terms [1]. Similarly, in speech and audio processing, low-probability and large-valued noise events can be modeled as Gaussian components in the tail of the distribution. A fit of this fat tail distribution by ... |

4 |
Minimum entropy, k-means, spectral clustering
- Lee, Choi
- 2004
Citation Context ...(14) and also restricting the weights, αi, αj = 1, ∀ i ≠ j. The term we have obtained expresses interactions between pairs. Function fij is a Gaussian. There are evident links here with Parzen kernels [4, 13] and clustering through mode detection (see e.g. [14], and [17] and references therein). For segmentation we will simplify further expression (20) to take into account just the equiweighted segments r... |

4 |
A summary of entropy statistics, Kybernetika
- Esteban, D
- 1995
Citation Context ...ent of equiprobable events, Shannon in 1948 developed his theory around expectation. In 1960 Rényi developed a recursive rather than linear estimation. Various other forms of entropy are discussed in [6]. Consider density f with support in R^m. Then: • Shannon entropy: HS = − ∫ f(x) log f(x) dx • Rényi entropy: HRα = (1/(1−α)) log ∫ f(x)^α dx for α > 0, α ≠ 1. We have: lim_{α→1} HRα = HS. So HR1 = HS. We a... |

4 |
Bayesian inference for multiband image segmentation via model-based cluster trees
- Murtagh, Raftery, et al.
Citation Context ...thms as more versatile than their partitional counterparts (for example, k-means or Gaussian mixture models) since the latter tend to work well only on data sets having isotropic clusters [23]. So in [20], we segmented astronomical images of different observing filters, that had first been matched such that they related to exactly the same fields of view and pixel resolution. For the segmentation we u... |

4 | Pattern clustering based on noise modeling in wavelet space
- Murtagh, Starck
- 1998
Citation Context ... filtering allows, as a special case, thresholding and reading off segmented regions. Such approaches have been used for very fast – indeed one could say with justice, turbo-charged – clustering. See [21, 22]. Noise models are particularly important in the physical sciences (cf. CCD, charge-coupled device, detectors) and the following approach was developed in [28]. Observed data f in the physical science... |

3 |
Bayes factors for edge detection from wavelet product spaces, Optical Engineering 2003
- Murtagh, Starck
Citation Context ...l mixing proportions. Figures 2 and 3 illustrate long-tailed behavior and show how marginal density Gaussian model fitting works in practice. The ordinates give frequencies. See further discussion in [19, 18]. 4 Additive Entropy Background on entropy can be found e.g. in [24]. Following Hartley’s 1928 treatment of equiprobable events, Shannon in 1948 developed his theory around expectation. In 1960 Rényi ... |

2 |
Information Theory, Inference, and Learning Algorithms
- MacKay
- 2003
Citation Context ...ts can use a BIC approximation to the Bayes factor (see section 2) for selection of model, k. Each of the functions fi comprising the new basis for the observed density f can be termed a radial basis [15]. A radial basis network, in this context, is an iterative EM-like fit optimization algorithm. An alternative view of parsimony is the view of a sparse basis, and model fitting is sparsification. This... |

1 |
Exploration of parameter spaces in a Virtual Observatory, in Astronomical Data Analysis
- Djorgovski, Mahabal, et al.
- 2001
Citation Context ...least as relates to resolution scale. Our motivation is to have a rigorous model-based approach to data clustering or segmentation, that also and in addition encompasses resolution scale. In Figure 1 [5], the clustering task is portrayed in its full generality. One way to address it is to build up parametrized clusters, for example using a Gaussian mixture model (GMM), so that the cluster “pods” are ... |

1 |
Function-point cluster analysis, Systematic Zoology
- Katz, Rohlf
- 1973
Citation Context ... j. The term we have obtained expresses interactions between pairs. Function fij is a Gaussian. There are evident links here with Parzen kernels [4, 13] and clustering through mode detection (see e.g. [14], and [17] and references therein). For segmentation we will simplify further expression (20) to take into account just the equiweighted segments reduced to their mean (cf. [4]). In line with how we d... |

1 |
Information-theoretic learning using Rényi’s quadratic entropy
- Principe, Xu
- 1999
Citation Context ...d show how marginal density Gaussian model fitting works in practice. The ordinates give frequencies. See further discussion in [19, 18]. 4 Additive Entropy Background on entropy can be found e.g. in [24]. Following Hartley’s 1928 treatment of equiprobable events, Shannon in 1948 developed his theory around expectation. In 1960 Rényi developed a recursive rather than linear estimation. Various other f... |

1 |
An astronomer’s perspective on SCMA III, in Statistical Challenges in Modern Astronomy
- Silk
- 2003
Citation Context ... for example using a Gaussian mixture model (GMM), so that the cluster “pods” are approximated by the mixture made up of the cluster component “peas” (a viewpoint expressed by A.E. Raftery, quoted in [27]). A step beyond a pure “peas” in a “pod” approach to clustering is a hierarchical approach. Application specialists often consider hierarchical algorithms as more versatile than their partitional cou... |