## On tests for global maximum of the log-likelihood function (2004)

### Download Links

- [www.eecs.umich.edu]
- [web.eecs.umich.edu]

### Other Repositories/Bibliography

- DBLP

Venue: IEEE Trans. Inform. Theory

Citations: 1 (1 self)

### BibTeX

@ARTICLE{Blatt04ontests,

author = {Doron Blatt and Alfred O. Hero},

title = {On tests for global maximum of the log-likelihood function},

journal = {IEEE Trans. Inform. Theory},

year = {2004}

}

### Abstract

Given the location of a relative maximum of the log-likelihood function, how can one assess whether it is the global maximum? This paper investigates a statistical tool that answers this question by posing it as a hypothesis testing problem. A general framework for constructing tests for the global maximum is given. The characteristics of the tests are investigated for two cases: a correctly specified model and model mismatch. A finite sample approximation to the power is given, which provides a tool for performance prediction and a measure for comparing tests. The sensitivity of the tests to model mismatch is analyzed in terms of the Rényi divergence and the Kullback-Leibler distance between the true underlying distribution and the assumed parametric class, and tests that are insensitive to small deviations from the model are derived. The tests are illustrated for three applications: passive localization or direction finding using an array of sensors, estimating the parameters of a Gaussian mixture model, and estimation of superimposed exponentials in noise. All three are problems known to suffer from local maxima.

Index Terms: Parameter estimation, maximum likelihood, global optimization, local maxima, array processing, Gaussian
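As a rough illustration of posing the question as a hypothesis test, the following Python sketch handles only the simplest scalar case (a single moment condition, Q = 1). It is not the authors' implementation. The function names, the argument `moments` (values of a moment function evaluated at the candidate maximum), and the user-supplied asymptotic `variance` are all assumptions made for the example. The statistic is the quadratic form S_n = n h_n^2 / V, compared against a chi-square threshold, and the power uses the non-central chi-square approximation with non-centrality n·δ that the citation contexts quote from the paper.

```python
import math
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, used for quantiles and CDF values


def chi2_1df_threshold(level):
    """Threshold c with P(chi-square_1 > c) = level, using chi-square_1 = Z**2."""
    z = STD_NORMAL.inv_cdf(1.0 - level / 2.0)
    return z * z


def m_statistic(moments, variance):
    """S_n = n * h_n**2 / V, where h_n is the sample mean of the moment values.

    `variance` (V) is assumed to be supplied by the caller; this sketch does
    not derive it from the model.
    """
    n = len(moments)
    h_n = fmean(moments)
    return n * h_n * h_n / variance


def reject_global_maximum(moments, variance, level=0.05):
    """True when the hypothesis 'the candidate is the global maximum' is rejected."""
    return m_statistic(moments, variance) > chi2_1df_threshold(level)


def approximate_power(n, delta, level=0.05):
    """1 - F(c) for a non-central chi-square with 1 d.o.f. and non-centrality n*delta.

    For one degree of freedom the variable equals (Z + sqrt(n*delta))**2,
    so the tail probability reduces to two normal CDF evaluations.
    """
    root_c = math.sqrt(chi2_1df_threshold(level))
    root_nc = math.sqrt(n * delta)
    upper_tail = 1.0 - STD_NORMAL.cdf(root_c - root_nc)
    lower_tail = STD_NORMAL.cdf(-root_c - root_nc)
    return upper_tail + lower_tail
```

With `delta = 0` the approximate power equals the level of the test, and it increases toward 1 as `n * delta` grows. Restricting the sketch to Q = 1 keeps it within the standard library; a general Q would need a non-central chi-square routine from a numerical package.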

### Citations

8176 | Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion) - Dempster, Laird, et al. - 1977
Citation Context ...plied. These methods are based on an initial guess (often found by a simpler method) which is followed by a local, often iterative, optimization procedure (e.g. the expectation maximization algorithm [9] and its variations [10], Fisher scoring [10], the Gauss-Newton method [11], and majorizing or minorizing algorithms [12], [13]). As a consequence, the performance of these methods highly depends on t...

3036 | Weak Convergence of Probability Measures - Billingsley - 1999
Citation Context ..., hence $\tilde{\theta}_n = \hat{\theta}_n$. Using the mean value theorem we obtain $h_n(\hat{\theta}_n) = h_n(\theta^0) + \nabla^T h_n(\theta)(\hat{\theta}_n - \theta^0)$ a.s. Using the martingale central limit theorem with the filtration $\{F_t = \sigma(e_1, \ldots, e_t)\}$ [48], we obtain that $h_n(\theta^0)$ converges in distribution to a zero-mean Gaussian random variable with variance $\sigma^2/2$. Next, we show that the second term is $o_P(1)$. First split the second term into two com...

1847 | Robust Statistics - Huber - 1981
Citation Context ...mple in which the measurements are i.n.i.d. was treated in Sec. V-C. The concept of using a statistical test for discriminating between global and local maxima can be generalized to other M-estimators [2], or any other optimization problem in which a statistical characterization of the global maximum is available. APPENDIX I: ASYMPTOTIC DISTRIBUTION OF M-TESTS. The proof follows White’s methodology [29]...

950 | The EM Algorithm and Extensions - McLachlan, Krishnan - 1997
Citation Context ...e based on an initial guess (often found by a simpler method) which is followed by a local, often iterative, optimization procedure (e.g. the expectation maximization algorithm [9] and its variations [10], Fisher scoring [10], the Gauss-Newton method [11], and majorizing or minorizing algorithms [12], [13]). As a consequence, the performance of these methods highly depends on the starting The material...

808 | Fundamentals of Statistical Signal Processing: Estimation Theory - Kay - 1993
Citation Context ...in noise. I. INTRODUCTION The maximum likelihood (ML) estimation method is one of the standard tools for parameter estimation. Among its appealing properties are consistency and asymptotic efficiency [1]–[3]. However, a major drawback of this method when applied to non-linear estimation problems is the fact that the associated likelihood equations required for the derivation of the estimator rarely h...

476 | Linear Statistical Inference and Its Applications - Rao - 1973
Citation Context ... the elements of $H_n(\theta)$, we have $H_n(\theta) \xrightarrow{a.s.} H(\theta)$ uniformly in $\theta$, and therefore using Lemma 3.1 of White [?], $H_n(\theta_n) - H(\theta^*) \xrightarrow{a.s.} 0$. Using these intermediate results we obtain from 2c.4(xa) of Rao [46] that $[H_n(\theta_n) - H(\theta^*)]\sqrt{n}(\hat{\theta}_n - \theta^*) \xrightarrow{P} 0$ (I.57). Equation (A.2) of [20] asserts that $A^{-1}(\theta^*) \frac{1}{\sqrt{n}} \sum_{t=1}^{n} \nabla \log f(y_t, \theta^*) + \sqrt{n}(\hat{\theta}_n - \theta^*) \xrightarrow{P} 0$. Therefore, by the finiteness of $H(\theta^*)$...

419 | Maximum Likelihood Estimation of Misspecified Models - White - 1982
Citation Context ... $\theta_n$ will be equal to one of the local-MLEs $\hat{\theta}_n^m$, w.p. 1. The local-MLE $\hat{\theta}_n^m$ is the MLE associated with the model $\{f(y, \theta) : \theta \in \Theta_m\}$ and therefore falls into the mismatch model framework of White [20]. Hence we have the following. Corollary 1: For all $m$: 1) $\hat{\theta}_n^m \xrightarrow{a.s.} \theta^m$ as $n \to \infty$, and 2) $\sqrt{n}(\hat{\theta}_n^m - \theta^m) \xrightarrow{D} N(0, C(\theta^m))$. In addition, by (15)-(17) we obtain the following: Corollary 2: For ...

274 | Unsupervised learning of finite mixture models - Figueiredo, Jain
Citation Context ... [Fig. 5. The likelihood function of the Gaussian mixture distribution; vertical axis f(Y;θ), with the local maximum and the global maximum marked.] (see e.g. [44] and references therein). The MLE for this problem is usually found by using the EM algorithm [10]. In [44], the authors describe a method that finds the global maximum with good performance. However,...

237 | MUSIC, Maximum Likelihood, and Cramer-Rao Bound - Stoica, Nehorai - 1989
Citation Context ...del mismatch has been recently addressed in [39] and [35]. [Fig. 1. Geometrical interpretation of the construction of tests insensitive to Pitman drift.] Here we adopt the standard narrow band model of [40]. We consider the estimation of the directions of two uncorrelated narrow band Gaussian sources using a uniform linear array of P = 4 sensors with λ/2 spacing between elements (λ is the wavelength of ...

230 | Two decades of array signal processing research - Krim, Viberg - 1996
Citation Context ...at are orthogonal to deviations from the model are demonstrated. A. Direction Finding in Array Signal Processing. For a review of the problem of direction finding using antenna arrays see e.g. [37] or [38]. The characterization of the MLE under possible model mismatch has been recently addressed in [39] and [35]. [Fig. 1. Geometrical interpretation of the construction of tests insensitive to Pitman drif...

228 | Differential-Geometrical Methods in Statistics. Springer-Verlag, Berlin - Amari - 1985

204 | Continuous univariate distributions - Johnson, Kotz, et al. - 1994
Citation Context ...articularly when $\tilde{\theta}_n = \hat{\theta}_n^m$, the distribution of the test statistic $S_n$ is approximately non-central $\chi^2_Q$ with non-centrality parameter $n\delta^m = n h^T(\theta^m) V^{-1}(\theta^m) h(\theta^m)$, denoted by $\chi^2_Q(n\delta^m)$ [34]. We denote the $\chi^2_Q(n\delta^m)$ cumulative distribution function by $F_{\chi^2_Q(n\delta^m)}(\cdot)$. The finite sample power of the test against a local maximum at $\theta^m$ can be approximated by [34, p. 468] $1 - F_{\chi^2}$...

191 | Detection, Estimation, and Modulation Theory - Van Trees - 1968
Citation Context ...oise. I. INTRODUCTION The maximum likelihood (ML) estimation method is one of the standard tools for parameter estimation. Among its appealing properties are consistency and asymptotic efficiency [1]–[3]. However, a major drawback of this method when applied to non-linear estimation problems is the fact that the associated likelihood equations required for the derivation of the estimator rarely have ...

156 | Asymptotic Properties of Non-Linear Least Squares Estimators - Jennrich - 1969
Citation Context ...maximum is available. APPENDIX I: ASYMPTOTIC DISTRIBUTION OF M-TESTS. The proof follows White’s methodology [29]. Given the assumptions, the mean value theorem for random functions, given as Lemma 3 in [33], guarantees the existence of measurable $\Theta$-valued functions $\theta_n$ such that $\sqrt{n}\,h_n(\hat{\theta}_n) = \sqrt{n}\,h_n(\theta^*) + H_n(\theta_n)\sqrt{n}(\hat{\theta}_n - \theta^*)$ (I.56) where each $\theta_n$ lies on the segment joining $\hat{\theta}_n$ and $\theta^*$. Each row of...

138 | Automatic lag selection in covariance matrix estimation - Newey, West - 1994
Citation Context ...cessarily positive definite. A number of authors investigated ways of estimating the covariance matrix in scenarios in which unexpected dependencies between the measurements may occur (see e.g. [29], [31] and references therein). Methods for eliminating the requirement for covariance matrix estimation altogether were recently proposed in [32] for the problem of model testing in non-linear regression. ...

87 | Exact maximum likelihood parameter estimation of superimposed exponential signals in noise - Bresler, Macovski - 1986

81 | Tests of separate families of hypotheses - Cox - 1961

74 | Monotonic algorithms for transmission tomography - Erdogan, Fessler
Citation Context ... iterative, optimization procedure (e.g. the expectation maximization algorithm [9] and its variations [10], Fisher scoring [10], the Gauss-Newton method [11], and majorizing or minorizing algorithms [12], [13]). As a consequence, the performance of these methods highly depends on the starting The material in this paper will be presented in part at the 2005 IEEE International Conference on Acoustics, ...

68 | A tutorial on MM algorithms - Hunter, Lange - 2004
Citation Context ...tive, optimization procedure (e.g. the expectation maximization algorithm [9] and its variations [10], Fisher scoring [10], the Gauss-Newton method [11], and majorizing or minorizing algorithms [12], [13]). As a consequence, the performance of these methods highly depends on the starting The material in this paper will be presented in part at the 2005 IEEE International Conference on Acoustics, Speech...

53 | Maximum likelihood specification testing and conditional moment tests - Newey - 1985
Citation Context ...e MLE under a possible model mismatch and pose the problem of discriminating between local and global maxima as a statistical hypothesis testing problem. The general framework for constructing M-tests [26]–[28] is presented, and it is shown that two of the available tests in the literature are special cases of M-tests. In Sec. III, the consistency of the tests is established and an approximation of the...

53 | Distributed EM algorithms for density estimation and clustering in sensor networks - Nowak - 2003
Citation Context ...fected by this type of model mismatch. B. Estimation of Gaussian Mixture Parameters. The problem of estimation of Gaussian mixture parameters arises in both non-parametric density estimation (see e.g. [43] and references therein) and a variety of clustering problems [figure: plot of the likelihood function f(Y;θ) with the local maximum marked]...

52 | Diagnostic Testing and Evaluation of Maximum Likelihood Models - Tauchen - 1985

26 | Estimation, Inference and Specification Analysis - White - 1994
Citation Context ...ified but the specification of the mean is correct, the condition $h(\theta^*) = E\{y\} - \mu(\theta^*) = 0$ (27) will still hold if the parametric class $\{f(y; \theta) : \theta \in \Theta\}$ belongs to the linear exponential family [29]. If the mean of the data does not depend on $\theta$ or is weakly dependent, one can improve the test by including higher order moments. For example, one can specify $e(y, \theta)$ as one or more elements of the d...

22 | Specification Testing in Dynamic Models - White - 1987
Citation Context ... under a possible model mismatch and pose the problem of discriminating between local and global maxima as a statistical hypothesis testing problem. The general framework for constructing M-tests [26]–[28] is presented, and it is shown that two of the available tests in the literature are special cases of M-tests. In Sec. III, the consistency of the tests is established and an approximation of the fini...

15 | Maximum likelihood estimation for array processing in colored noise - Nagesha, Kay - 1993
Citation Context ...ariance matrix $C(\theta, \gamma) = D(\theta) K_s D^H(\theta) + \sigma^2 R(\gamma)$, (52) where $R(\gamma)$ is a symmetric Toeplitz matrix whose first row is $[1, \gamma, \gamma^2, \gamma^3]$, which corresponds to a first order AR spatial noise covariance [41], and in the simulation $\gamma = 0.1$. For both Biernacki’s test and the covariance based test the effect of model mismatch on the level was evaluated for three cases: (a) The increase in level due to model...

12 | Nonlinear regression on cross-section data - White - 1980
Citation Context ...uarantees that $V_n^{-1}(\hat{\theta}_n)$ exists for sufficiently large $n$, since the determinant of a matrix is a continuous function of its elements. The last part of the theorem follows from Lemma 3.3 of White [47] and the proof is completed. APPENDIX II: ASYMPTOTIC DISTRIBUTION OF THE TEST STATISTIC FOR EXPONENTIALS IN NOISE. The derivation is given under the null hypothesis, hence $\tilde{\theta}_n = \hat{\theta}_n$. Using the mean val...

11 | Estimation of the minimum of a function using order statistics - de Haan - 1981
Citation Context ...based on an asymptotic (in the number of starting points) result on the total probability of unobserved outcomes due to Bickel and Yahav [15]. Veall [16] used an order statistic result due to de Haan [17] that characterizes the distribution of the ordered values of a smooth function, sampled at random points. Given a relative maximum, the log-likelihood function is evaluated at a large number of random...

8 | The determination of the location of the global maximum of a function in the presence of several local extrema - Slump, Hoenders - 1985
Citation Context ...tion: Given a location of a relative maximum of the log-likelihood function, how to assess whether this is the global maximum? One approach to this question is the Kronecker-Picard integral framework [6]. However, the computation of this multi-dimensional integral is difficult, indeed equivalent to the complexity involved in finding the global maximum, rendering this approach impractical. Instead, in...

8 | Maximum likelihood localization of diversely polarized sources by simulated annealing - Ziskind, Wax
Citation Context ...ization problem. Solving this problem by applying numerical methods is usually computationally prohibitive. To date, there have been few global optimization methods applied to ML estimation (e.g. [4]–[8]) because of the computational complexity involved. More commonly, initiate and converge methods are applied. These methods are based on an initial guess (often found by a simpler method) which is fol...

8 | A test for global maximum - Gan, Jiang - 1999
Citation Context ...n that a model mismatch is likely, the hypothesis that the relative maximum is the global one is rejected. Otherwise, the relative maximum is declared the final estimate. Independently, Gan and Jiang [19] made the same observation and proposed White’s information matrix test [20] as a test for global maximum. More recently, Biernacki [21], [22] proposed a new test, which is closely related to Cox’s te...

8 | Eliminating multiple root problems in estimation - Small, Wang, et al. - 2000
Citation Context ...he problem of testing a relative maximum is related to the problem of eliminating spurious maxima in scenarios in which the ML estimator (MLE) is not necessarily consistent or may not even exist (see [25] and references therein). Although some of the results apply to that problem as well, we do not pursue this connection here. In Sec. II, we review the properties of the MLE under a possible model mism...

8 | Simple Robust Testing of Hypotheses in Non-linear Models - Bunzel, Kiefer, et al. - 2001
Citation Context ...ndencies between the measurements may occur (see e.g. [29], [31] and references therein). Methods for eliminating the requirement for covariance matrix estimation altogether were recently proposed in [32] for the problem of model testing in non-linear regression. III. POWER ANALYSIS. In order to derive the power function, the asymptotic distribution of $\tilde{\theta}_n$ under $H_1$ needs to be determined. Therefore, a...

7 | Probabilistic measures of adequacy of a numerical search for a global maximum - Finch, Mendell, et al. - 1989
Citation Context ...were based on sampling the domain of the log-likelihood function. Given a sequence of random starting points and the corresponding set of relative maxima found by a local search method, Finch et al. [14] proposed a statistical method to assess the probability that the global maximum has not yet been found based on an asymptotic (in the number of starting points) result on the total probability of uno...

6 | Simulated annealing for maximum a posteriori parameter estimation of hidden Markov models - Andrieu, Doucet - 2000
Citation Context ...ptimization problem. Solving this problem by applying numerical methods is usually computationally prohibitive. To date, there have been few global optimization methods applied to ML estimation (e.g. [4]–[8]) because of the computational complexity involved. More commonly, initiate and converge methods are applied. These methods are based on an initial guess (often found by a simpler method) which is...

6 | Newton algorithm for conditional and unconditional maximum likelihood estimation of the parameters of exponential signals in noise - Starer, Nehorai - 1992
Citation Context ...er method) which is followed by a local, often iterative, optimization procedure (e.g. the expectation maximization algorithm [9] and its variations [10], Fisher scoring [10], the Gauss-Newton method [11], and majorizing or minorizing algorithms [12], [13]). As a consequence, the performance of these methods highly depends on the starting The material in this paper will be presented in part at the 200...

6 | Asymptotic behavior of maximum likelihood estimates of superimposed exponential signals - Rao, Zhao - 1993
Citation Context ... are distributed as non-zero time-varying mean circular Gaussian process. Hence, the treatment in Sec. II-A does not cover this problem. Furthermore, since the MLE for this problem is super efficient [45], the more general framework of White [29] for constructing tests in dynamical models does not cover this problem either. However, a detailed statistical asymptotic analysis for this problem is availa...

4 | A bound on mean-square estimation error with background parameter mismatch - Xu, Baggeroer, et al.
Citation Context ...$\gamma) : \theta \in \Theta, \gamma \in \Gamma \subset \mathbb{R}^{K'}\}$ such that $f(y; \theta) = \tilde{f}(y; \theta, \gamma_0)$ for all $\theta \in \Theta$, and that the true underlying density is $g(y) = \tilde{f}(y; \theta_0, \gamma_1)$, with $\theta_1$ close to $\theta_0$. This setting was recently treated in [35], where the parameter vector $\gamma$ was referred to as the background parameter. In this case, the local equivalence and symmetry of f-divergence measures [36, p. 85] can be used to approximate the Renyi di...

4 | Robust least-squares estimation with a relative entropy constraint - Levy, Nikoukhah - 2004
Citation Context ...iteration, given a relative maximum $\tilde{\theta}_n$, $c = \max_{\gamma \in [0,1]} D_1\big(\tilde{f}(y; \tilde{\theta}_n, \gamma) \,\|\, f(y; \tilde{\theta}_n)\big)$ was computed, using the known formula for the Kullback-Leibler distance between two Gaussian densities (e.g. [42]), [figure: level (probability of false alarm); legend: Covariance based, Biernacki, Covariance based CT, Biernacki CT, Covariance based Orthogonal, Biernacki Orthogonal, nominal v...]

3 | Detection of spurious maxima through random draw tests and specification tests - Dorsey, Mayer - 2000
Citation Context ...sionality and do not generalize well to high dimensional problems. Yet high dimensional problems are exactly those in which global optimization methods are computationally demanding. Dorsey and Mayer [18] reported poor performance of Veall’s method and, as an alternative, proposed to use the available methods for testing parametric models to answer the question at hand. They observed that a local maxi...

3 | Un test pour le maximum global de vraisemblance. 35ièmes journées de statistiques - Biernacki - 2003
Citation Context ...um is declared the final estimate. Independently, Gan and Jiang [19] made the same observation and proposed White’s information matrix test [20] as a test for global maximum. More recently, Biernacki [21], [22] proposed a new test, which is closely related to Cox’s tests for separate families of hypotheses [23], [24], and showed through simulations that his new test outperforms White’s information mat...

3 | Consequences and detection of misspecified nonlinear regression models - White - 1981

3 | Highlights of statistical signal and array processing - Hero, et al. - 1998
Citation Context ...tests that are orthogonal to deviations from the model are demonstrated. A. Direction Finding in Array Signal Processing. For a review of the problem of direction finding using antenna arrays see e.g. [37] or [38]. The characterization of the MLE under possible model mismatch has been recently addressed in [39] and [35]. [Fig. 1. Geometrical interpretation of the construction of tests insensitive to Pit...

2 | Maximum likelihood parameter estimation of F-ARIMA processes using the genetic algorithm in the frequency domain - Bon-Sen, Bore-Kuen, et al. - 2002

2 | On estimating the total probability of the unobserved outcomes of an experiment, in Adaptive statistical procedures and related - Bickel, Yahav - 1986
Citation Context ...the probability that the global maximum has not yet been found based on an asymptotic (in the number of starting points) result on the total probability of unobserved outcomes due to Bickel and Yahav [15]. Veall [16] used an order statistic result due to de Haan [17] that characterizes the distribution of the ordered values of a smooth function, sampled at random points. Given a relative maximum, the ...

2 | Testing for a global maximum in an econometric context - Veall - 1990
Citation Context ...ity that the global maximum has not yet been found based on an asymptotic (in the number of starting points) result on the total probability of unobserved outcomes due to Bickel and Yahav [15]. Veall [16] used an order statistic result due to de Haan [17] that characterizes the distribution of the ordered values of a smooth function, sampled at random points. Given a relative maximum, the log-likelihoo...

2 | Further results on tests of separate families of hypotheses - Cox - 1962
Citation Context ...s information matrix test [20] as a test for global maximum. More recently, Biernacki [21], [22] proposed a new test, which is closely related to Cox’s tests for separate families of hypotheses [23], [24], and showed through simulations that his new test outperforms White’s information matrix test. A drawback of the methods of [18], [19], and [22] is that they are sensitive to model mismatch. In parti...

2 | General asymptotic analysis of the generalized likelihood ratio test for a Gaussian point source under statistical or spatial mismodeling - Friedmann, Fishler, et al. - 2002
Citation Context ...al Processing. For a review of the problem of direction finding using antenna arrays see e.g. [37] or [38]. The characterization of the MLE under possible model mismatch has been recently addressed in [39] and [35]. [Fig. 1. Geometrical interpretation of the construction of tests insensitive to Pitman drift.] Here we adopt the standard narrow band model of [40]. We consider the estimation of the directio...