## Subset selection in noise based on diversity measure minimization (2003)

Venue: IEEE Trans. Signal Processing

Citations: 40 (10 self-citations)

### BibTeX

@ARTICLE{Rao03subsetselection,

author = {Bhaskar D. Rao and Kjersti Engan and Shane F. Cotter and Jason Palmer and Kenneth Kreutz-Delgado},

title = {Subset selection in noise based on diversity measure minimization},

journal = {IEEE Trans. Signal Processing},

year = {2003},

pages = {760--770}

}

### Abstract

In this paper, we develop robust methods for subset selection based on the minimization of diversity measures. A Bayesian framework is used to account for noise in the data, and a maximum a posteriori (MAP) estimation procedure leads to an iterative procedure that is a regularized version of the FOCal Underdetermined System Solver (FOCUSS) algorithm. The convergence of the regularized FOCUSS algorithm is established, and it is shown that the stable fixed points of the algorithm are sparse. We investigate three different criteria for choosing the regularization parameter: quality of fit, a sparsity criterion, and the L-curve. The L-curve method, as applied to the problem of subset selection, is found not to be robust, and we propose a novel modified L-curve procedure that solves this problem. Each of the regularized FOCUSS algorithms is evaluated through simulation of a detection problem, and the results are compared with those obtained using a sequential forward selection algorithm termed orthogonal matching pursuit (OMP). In each case, the regularized FOCUSS algorithm is shown to be superior to OMP in noisy environments.

Index Terms: Diversity measures, linear inverse problems, matching pursuit, regularization, sparsity, subset selection, underdetermined systems.
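A minimal sketch of the regularized FOCUSS iteration summarized above follows, in dependency-free Python. It is not the authors' code. It assumes the update $x_{k+1} = W_{k+1}A_{k+1}^{T}(A_{k+1}A_{k+1}^{T} + \lambda I)^{-1}b$ with $A_{k+1} = AW_{k+1}$ and $W_{k+1} = \mathrm{diag}(|x_k[i]|^{1-p/2})$, a fixed regularization parameter `lam` (rather than any of the three selection criteria studied in the paper), and an all-ones starting point. The helper names `solve` and `regularized_focuss` are hypothetical.

```python
def solve(M, y):
    """Solve the square system M z = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(t)] for row, t in zip(M, y)]  # augmented [M | y]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def regularized_focuss(A, b, p=0.8, lam=1e-3, iters=50):
    """Iterate x <- W (AW)^T ((AW)(AW)^T + lam I)^{-1} b with W = diag(|x|**(1 - p/2))."""
    m, n = len(A), len(A[0])
    x = [1.0] * n                  # W = I at first, so step 1 is a regularized min-norm fit
    for _ in range(iters):
        w = [abs(v) ** (1.0 - p / 2.0) for v in x]
        Aw = [[A[i][j] * w[j] for j in range(n)] for i in range(m)]          # A_{k+1} = A W
        G = [[sum(Aw[i][j] * Aw[r][j] for j in range(n)) + (lam if i == r else 0.0)
              for r in range(m)] for i in range(m)]                           # A_{k+1} A_{k+1}^T + lam I
        z = solve(G, b)
        q = [sum(Aw[i][j] * z[i] for i in range(m)) for j in range(n)]       # q = A_{k+1}^T z
        x = [w[j] * q[j] for j in range(n)]                                  # x = W q
    return x


A = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
b = [1.0, 1.0]        # b equals the third column, so (0, 0, 1) is the sparsest exact solution
print(regularized_focuss(A, b))   # first two entries collapse to ~0, the third ends near 1
```

The first step spreads energy over all three columns, as a minimum $\ell_2$-norm fit does; the larger third entry then receives the larger weight and the other two are driven to zero, which is the sparse fixed-point behavior described above. A small `lam` keeps the inner system well conditioned once columns drop out.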

### Citations

1957 | Matrix Computations
- Golub, Van Loan
- 1996
Citation Context ...sure leads to the FOCUSS algorithm [4], [5]. The algorithm is iterative and produces intermediate approximate solutions according to $x_{k+1} = W_{k+1}(AW_{k+1})^{+}b$ (3), where $W_{k+1} = \mathrm{diag}(|x_k[i]|^{1-p/2})$, and $(\cdot)^{+}$ is used to denote the Moore–Penrose pseudoinverse [19]. The properties of this algorithm have been examined in depth in [4], [5], and [18]. Intuitively, the algorithm can be explained by noting that there is competition between the columns of $A$ to repr...

1652 | Atomic decomposition by basis pursuit
- Chen, Donoho, et al.
- 2001
Citation Context ...olution based on the smallest $\ell_2$-norm is not appropriate. The minimum $\ell_2$-norm criterion favors solutions with many small nonzero entries, which is a property that is contrary to the goal of sparsity [4], [15]. Consequently, there is a need to consider the minimization of alternative measures that promote sparsity. In this context, of particular interest are diversity measures that are functionals that mea...

1048 | Matching pursuits with time-frequency dictionaries
- Mallat, Zhang
- 1993
Citation Context ...are many solutions to the system of equations in (1) and the subset selection problem corresponds to identifying a few columns of the matrix $A$, which can be used to represent the data vector $b$ [1], [2], [14]. This corresponds to finding a solution with few nonzero entries that satisfies (1), and such a solution is said to be sparse. Finding an optimal solution to this problem generally requires a combina...

1041 | Introduction to Linear and Nonlinear Programming
- Luenberger
- 1973
Citation Context ...his is presented next in Lemma 1, which specializes more general results to be found in [24] and [25]. Lemma 1: (17) where diag . Proof: Consider the scalar function sgn , , and . Since it is concave [26] Hence sgn sgn Substituting and , we have sgn sgn The above inequality applies to each of the components of leading to IV. CONVERGENCE RESULTS Throughout this discussion, a sparse solution refers to a b...

257 | Learning overcomplete representations
- Lewicki, Sejnowski
Citation Context ...ntrols the shape, and is a generalized variance. For instance, setting reduces this generalized form to that of a Laplacian distribution that has been assumed as the prior distribution of in [15] and [22]. If we set and , this distribution reduces to the standard normal distribution. If a unit variance distribution is desired, i.e., , then becomes a function of as given by Therefore, only one paramete...

216 | Sparse signal reconstruction from limited data using FOCUSS: a re-weighted norm minimization algorithm
- Gorodnitsky, Rao
- 1997
Citation Context ...inatorial search that is computationally unattractive. Therefore, suboptimal techniques are usually employed [1], [2]. We discuss one such method called FOCUSS, which has been extensively examined in [4] and [5]. The FOCUSS method was motivated by the observation that if a sparse solution is desired then choosing a solution based on the smallest $\ell_2$-norm is not appropriate. The minimum $\ell_2$-norm criterion f...

179 | Analysis of Discrete Ill-Posed Problems by Means of the L-Curve
- Hansen
- 1992
Citation Context ...d uses a small number of vectors from the available dictionary. Finally, we experiment with an L-curve criterion, which seeks to trade off the representation error and the size of the selection subset [8], [9]. This criterion is applicable to the problem of dictionary/frame learning as considered in [10] and [11]. However, as applied to the problem of subset selection, we find that the L-curve method d...

110 | The use of the L-curve in the regularization of discrete ill-posed problems
- Hansen, O’Leary
- 1993
Citation Context ...st and second derivatives respectively. Alternatively, as in [9] and [28], the curvature computation may be done in the log–log scale, that is, , . The argument made for the adoption of this scale in [9] is that the corner is found to be more distinct in the log–log scale. However, a problem pointed out in [29] is that the L-curve in the log–log scale is, in general, no longer convex. In [30], a linea...
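The maximum-curvature rule for locating the L-curve corner can be sketched with finite differences. This sketch assumes the curve is available as points $(\rho_i, \eta_i)$ sampled at equally spaced values of the parameter (for example a residual measure against a solution-size or sparsity measure over a grid of $\lambda$), and it uses the signed curvature $\kappa = (\rho'\eta'' - \rho''\eta')/(\rho'^2 + \eta'^2)^{3/2}$. It works in whatever scale it is given, so the log–log variant is obtained by passing logarithms. The function name `lcurve_corner` is hypothetical, and the sign convention assumes a convex curve traversed with $\rho$ increasing.

```python
import math


def lcurve_corner(rho, eta):
    """Return (index, curvature) of the interior point with the largest signed curvature.

    Derivatives are central differences with respect to the sample index; the curvature
    formula is invariant to a constant parameter step, so that step cancels out.
    """
    best_i, best_k = None, -math.inf
    for i in range(1, len(rho) - 1):
        d_rho = (rho[i + 1] - rho[i - 1]) / 2.0             # rho'
        d_eta = (eta[i + 1] - eta[i - 1]) / 2.0             # eta'
        dd_rho = rho[i + 1] - 2.0 * rho[i] + rho[i - 1]     # rho''
        dd_eta = eta[i + 1] - 2.0 * eta[i] + eta[i - 1]     # eta''
        k = (d_rho * dd_eta - dd_rho * d_eta) / (d_rho ** 2 + d_eta ** 2) ** 1.5
        if k > best_k:
            best_i, best_k = i, k
    return best_i, best_k


# Synthetic convex "L" shape: the hyperbola rho * eta = 1 sampled at t in [-2, 2].
# Its sharpest bend is the vertex (1, 1) at t = 0, i.e. sample index 200.
ts = [(i - 200) / 100.0 for i in range(401)]
rho = [math.exp(t) for t in ts]
eta = [math.exp(-t) for t in ts]
idx, kappa = lcurve_corner(rho, eta)   # idx == 200, kappa close to 1/sqrt(2)
```

Applying the same function to `[math.log(r) for r in rho]` and `[math.log(e) for e in eta]` gives the log–log version, which, as the excerpt notes, need not be convex and may show several competing corners.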

78 | An affine scaling methodology for best basis selection
- Rao, Kreutz-Delgado
- 1999
Citation Context ...ables I and II for different values of SNR. In these tables, is an additional factor used in the FOCUSS algorithm as given in (14) and can be used to trade off convergence speed against sparsity [4], [5]. is the user-chosen factor that determines the error bound used when running FOCUSS and using the discrepancy principle to determine . The column headed by # gives the average number of vectors selec...

75 | Adaptive time-frequency decompositions
- Davis, Mallat, et al.
- 1994
Citation Context ...rithms. The results obtained using each of the regularized FOCUSS algorithms are compared to the results of an improved sequential forward selection algorithm termed orthogonal matching pursuit (OMP) [12], [13]. We conclude that the regularized FOCUSS procedures give much better results than OMP in detecting the correct subset in noisy environments. The outline of this paper is as follows. In Section ...

73 | Neuromagnetic source imaging with FOCUSS: a recursive weighted minimum norm algorithm (Electroencephalography and Clinical Neurophysiology 95)
- Gorodnitsky, George, et al.
- 1995
Citation Context ...n proposed for finding suboptimal solutions to the problem, including algorithms based on forward sequential search or elimination of elements from the full dictionary available [2]. In previous work [3]–[5], an iterative algorithm termed FOCal Underdetermined System Solver (FOCUSS) has been developed based on the minimization of diversity measures. This algorithm essentially removes elements from th...

32 | Parametric generalized Gaussian density estimation
- Varanasi, Aazhang
- 1989
Citation Context ...or this purpose [5], [17]. The elements are assumed to be i.i.d. random variables with a generalized Gaussian distribution. The pdf of the generalized Gaussian distribution family is defined as [20], [21] (6) where is the standard gamma function. The factor controls the shape, and is a generalized variance. For instance, setting reduces this generalized form to that of a Laplacian distribution that ha...
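The symbols in this excerpt did not survive extraction, and the generalized Gaussian family has several equivalent parameterizations. The sketch below assumes the common form $f(x) = \frac{p}{2\beta\Gamma(1/p)}\exp(-|x/\beta|^{p})$ with shape $p$ and scale $\beta$ (names chosen here, not taken from the paper). Under that assumption $p = 1$ gives a Laplacian, $p = 2$ with $\beta = \sqrt{2}$ gives the standard normal, and unit variance requires $\beta = \sqrt{\Gamma(1/p)/\Gamma(3/p)}$, which leaves only the shape parameter free. The function names are hypothetical.

```python
import math


def gen_gaussian_pdf(x, p, beta):
    """Generalized Gaussian density in the assumed form p / (2 beta Gamma(1/p)) * exp(-|x/beta|**p)."""
    return p / (2.0 * beta * math.gamma(1.0 / p)) * math.exp(-abs(x / beta) ** p)


def unit_variance_beta(p):
    """Scale giving unit variance, from Var = beta**2 * Gamma(3/p) / Gamma(1/p)."""
    return math.sqrt(math.gamma(1.0 / p) / math.gamma(3.0 / p))


x = 0.7
# p = 2 with beta = sqrt(2) reproduces the standard normal density: the two values agree.
print(gen_gaussian_pdf(x, 2.0, math.sqrt(2.0)), math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi))
# Unit-variance scales: sqrt(2) for the Gaussian (p = 2), 1/sqrt(2) for the Laplacian (p = 1).
print(unit_variance_beta(2.0), unit_variance_beta(1.0))
```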

24 | Forward sequential algorithms for best basis selection
- Cotter, Adler, et al.
- 1999
Citation Context .... The results obtained using each of the regularized FOCUSS algorithms are compared to the results of an improved sequential forward selection algorithm termed orthogonal matching pursuit (OMP) [12], [13]. We conclude that the regularized FOCUSS procedures give much better results than OMP in detecting the correct subset in noisy environments. The outline of this paper is as follows. In Section II, we...
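For comparison with FOCUSS, a minimal orthogonal matching pursuit can be written as below. This is the textbook greedy procedure (select the column most correlated with the current residual, refit by least squares on all selected columns, repeat), not necessarily the exact variant of [12], [13]. It assumes nonzero columns and a known number of atoms `n_atoms`; the names `solve` and `omp` are hypothetical.

```python
def solve(M, y):
    """Solve the square system M z = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(t)] for row, t in zip(M, y)]  # augmented [M | y]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def omp(A, b, n_atoms):
    """Greedy OMP: add the column most correlated with the residual, then refit on the support."""
    m, n = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(n)]          # columns of A
    norms = [sum(v * v for v in c) ** 0.5 for c in cols]             # assumed non-zero
    support, coef, residual = [], [], list(b)
    for _ in range(n_atoms):
        # Normalized correlation with the residual; already chosen columns are skipped.
        scores = [-1.0 if j in support else
                  abs(sum(u * r for u, r in zip(cols[j], residual))) / norms[j]
                  for j in range(n)]
        support.append(max(range(n), key=lambda j: scores[j]))
        # Least-squares refit on the whole support via the normal equations (Gram matrix).
        G = [[sum(u * v for u, v in zip(cols[a], cols[c])) for c in support] for a in support]
        rhs = [sum(u * v for u, v in zip(cols[a], b)) for a in support]
        coef = solve(G, rhs)
        residual = [b[i] - sum(cols[j][i] * c for j, c in zip(support, coef))
                    for i in range(m)]
    x = [0.0] * n
    for j, c in zip(support, coef):
        x[j] = c
    return x, support


A = [[1.0, 0.0, 0.0, 1.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
x, support = omp(A, [2.0, 0.0, 3.0], n_atoms=2)   # x == [2.0, 0.0, 3.0, 0.0], support == [2, 0]
```

Because each decision is final once made, an early wrong pick caused by noise cannot be undone, which is one intuition for why a joint reweighting method such as regularized FOCUSS can do better at low SNR.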

22 | Signal processing with the sparseness constraint
- Rao
- 1998
Citation Context ...y, subset selection, underdetermined systems. I. INTRODUCTION Subset selection algorithms have received a lot of attention in recent years because of the large number of applications in which they arise [1]. The task of a subset selection algorithm can be viewed, in many instances, as that of selecting a small number of elements or vectors from a large collection of elements (termed a dictionary) that a...

22 | A Regularization Parameter in Discrete Ill-Posed Problems
- Regińska
- 1996
Citation Context ...done in the log–log scale, that is, , . The argument made for the adoption of this scale in [9] is that the corner is found to be more distinct in the log–log scale. However, a problem pointed out in [29] is that the L-curve in the log–log scale is, in general, no longer convex. In [30], a linear scale L-curve is used, and in [31], both linear and log–log scale L-curves are mentioned. In fact, experiment...

15 | Using the L-curve for determining optimal regularization parameters
- Engl, Grever
- 1994
Citation Context ... be more distinct in the log–log scale. However, a problem pointed out in [29] is that the L-curve in the log–log scale is, in general, no longer convex. In [30], a linear scale L-curve is used, and in [31], both linear and log–log scale L-curves are mentioned. In fact, experiments have shown that the log–log curve often has several corners, and finding the maximum curvature in this scale does not necess...

10 | An application of the Wiener-Kolmogorov smoothing theory to matrix inversion
- Foster
- 1995
Citation Context ...his paper, the algorithm is also useful in the overdetermined context. D. Interpretation as Regularized FOCUSS The algorithm given in (14) has an interesting interpretation as Tikhonov regularization [23] applied to (4). This can be readily seen by rewriting (14) as $x_{k+1} = W_{k+1}q_{k+1}$, where $q_{k+1}$ is obtained as $q_{k+1} = A_{k+1}^{T}(A_{k+1}A_{k+1}^{T} + \lambda I)^{-1}b$ with $A_{k+1} = AW_{k+1}$ (15). Alternately and equivalently, $q_{k+1}$ can be shown to be the solution to the following optimization problem: $\min_{q} \|A_{k+1}q - b\|^{2} + \lambda\|q\|^{2}$, where (1...
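The Tikhonov reading of the regularized update rests on the identity $B^{T}(BB^{T} + \lambda I)^{-1}b = (B^{T}B + \lambda I)^{-1}B^{T}b$ for $\lambda > 0$, whose right-hand side is the normal-equation solution of $\min_q \|Bq - b\|^2 + \lambda\|q\|^2$. The sketch below checks the identity numerically, with $B$ standing in for the weighted matrix $AW_{k+1}$; all function names are hypothetical. The left form needs only an $m \times m$ solve, which is the cheaper one when the system is underdetermined ($m < n$).

```python
def solve(M, y):
    """Solve the square system M z = y by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [[float(v) for v in row] + [float(t)] for row, t in zip(M, y)]  # augmented [M | y]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                       # back substitution
        s = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def transpose(M):
    return [list(row) for row in zip(*M)]


def matmul(X, Y):
    return [[sum(u * w for u, w in zip(row, col)) for col in zip(*Y)] for row in X]


def matvec(M, v):
    return [sum(u * w for u, w in zip(row, v)) for row in M]


def q_focuss_form(B, b, lam):
    """B^T (B B^T + lam I)^{-1} b: a single m x m solve."""
    G = matmul(B, transpose(B))
    for i in range(len(G)):
        G[i][i] += lam
    return matvec(transpose(B), solve(G, b))


def q_tikhonov_form(B, b, lam):
    """(B^T B + lam I)^{-1} B^T b: normal equations of min ||B q - b||^2 + lam ||q||^2."""
    H = matmul(transpose(B), B)
    for i in range(len(H)):
        H[i][i] += lam
    return solve(H, matvec(transpose(B), b))


B = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0]]      # stands in for the weighted matrix A W_{k+1}
b = [1.0, 2.0]
print(q_focuss_form(B, b, 0.1))
print(q_tikhonov_form(B, b, 0.1))   # the same vector up to rounding
```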

9 | An efficient implementation of the backward greedy algorithm for sparse signal reconstruction
- Reeves
- 1999
Citation Context ...the FOCUSS algorithm in the manner suggested in Section V-A. If the FOCUSS procedure returns more columns than desired, one can prune the selected subset using OMP or a backward elimination procedure [27]. At this stage, we can choose to proceed using either the OMP or FOCUSS generated solution, depending on which is better. C. Modified L-Curve Method A final possibility is that the number of dictionar...

8 | Frame based signal representation and compression
- Engan
- 2001
Citation Context ...e criterion, which seeks to trade off the representation error and the size of the selection subset [8], [9]. This criterion is applicable to the problem of dictionary/frame learning as considered in [10] and [11]. However, as applied to the problem of subset selection, we find that the L-curve method did not provide robust solutions. This leads us to develop a novel modified L-curve procedure to determ...

5 | Subset selection algorithms with applications
- Cotter
- 2001

5 | Sparse basis selection, ICA, and majorization: towards a unified perspective
- Kreutz-Delgado, Rao
- 1999
Citation Context ...e diversity measures that are functionals that measure the lack of concentration/sparsity and algorithms for minimizing these measures to obtain sparse solutions. There are many measures of diversity [16], [17], but a set of diversity measures that has been found to produce very good results as applied to the subset selection problem is the diversity measure given by [5], [18] $E^{(p)}(x) = \mathrm{sgn}(p)\sum_{i=1}^{n}|x[i]|^{p},\ p \le 1$ (2). Minimization...
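To see why minimizing a diversity measure favors sparse solutions where the $\ell_2$-norm does not, it helps to compare two vectors with the same $\ell_2$-norm. The sketch assumes the measure $E^{(p)}(x) = \mathrm{sgn}(p)\sum_i |x[i]|^{p}$ and restricts itself to $0 < p \le 1$, so that $\mathrm{sgn}(p) = 1$ and zero entries contribute nothing; the function names `diversity` and `l2_norm` are hypothetical.

```python
def diversity(x, p=0.5):
    """E^(p)(x) = sgn(p) * sum_i |x[i]|**p, sketched only for 0 < p <= 1 (so sgn(p) = 1)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("this sketch only covers 0 < p <= 1")
    return sum(abs(v) ** p for v in x)


def l2_norm(x):
    return sum(v * v for v in x) ** 0.5


sparse = [1.0, 0.0, 0.0, 0.0]
dense = [0.5, 0.5, 0.5, 0.5]
print(l2_norm(sparse), l2_norm(dense))        # 1.0 1.0: the l2-norm cannot tell them apart
print(diversity(sparse), diversity(dense))    # 1.0 versus about 2.83: the sparse vector scores lower
```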

4 | Optimized signal expansions for sparse representation
- Aase, Husoy, et al.
- 2001
Citation Context ...on, which seeks to trade off the representation error and the size of the selection subset [8], [9]. This criterion is applicable to the problem of dictionary/frame learning as considered in [10] and [11]. However, as applied to the problem of subset selection, we find that the L-curve method did not provide robust solutions. This leads us to develop a novel modified L-curve procedure to determine the r...

3 | A globally convergent algorithm for maximum likelihood estimation in the Bayesian linear model with nonGaussian source and noise priors
- Palmer, Kreutz-Delgado
- 2002
Citation Context ...need a preparatory result that helps connect to the quadratic cost function being minimized at each iteration. This is presented next in Lemma 1, which specializes more general results to be found in [24] and [25]. Lemma 1: (17) where diag . Proof: Consider the scalar function sgn , , and . Since it is concave [26] Hence sgn sgn Substituting and , we have sgn sgn The above inequality applies to each of ...

2 | Regularized FOCUSS for subset selection in noise
- Engan, Rao, et al.
- 2000
Citation Context .... Digital Object Identifier 10.1109/TSP.2002.808076 shown to outperform other subset selection algorithms in low noise environments. The goal of this paper, which expands on work presented in [6] and [7], is to extend the FOCUSS algorithm so that it can be used in subset selection problems where the signal-to-noise ratio (SNR) is low. A formal methodology is developed for deriving algorithms that can...

2 | An L-curve approach to optimal determination of regularization parameter in image restoration
- Leung, Lu
- 1993
Citation Context ... scale in [9] is that the corner is found to be more distinct in the log–log scale. However, a problem pointed out in [29] is that the L-curve in the log–log scale is, in general, no longer convex. In [30], a linear scale L-curve is used, and in [31], both linear and log–log scale L-curves are mentioned. In fact, experiments have shown that the log–log curve often has several corners, and finding the max...

1 | Basis selection in the presence of noise
- 1998
Citation Context ....his.no). Digital Object Identifier 10.1109/TSP.2002.808076 shown to outperform other subset selection algorithms in low noise environments. The goal of this paper, which expands on work presented in [6] and [7], is to extend the FOCUSS algorithm so that it can be used in subset selection problems where the signal-to-noise ratio (SNR) is low. A formal methodology is developed for deriving algorithms ...

1 | Measures and algorithms for best basis selection
- 1998
Citation Context ...ny measures of diversity [16], [17], but a set of diversity measures that has been found to produce very good results as applied to the subset selection problem is the diversity measure given by [5], [18] $E^{(p)}(x) = \mathrm{sgn}(p)\sum_{i=1}^{n}|x[i]|^{p},\ p \le 1$ (2). Minimization of this diversity measure leads to the FOCUSS algorithm [4], [5]. The algorithm is iterative and produces intermediate approximate solutions according to $x_{k+1} = W_{k+1}(AW_{k+1})^{+}b$, where $W_{k+1} = \mathrm{diag}(|x_k[i]|^{1-p/2})$, and $(\cdot)^{+}$ is ...

1 | Function curvature, relative concavity, and a new criterion for sub- and super-gaussianity
- Palmer
- 2002
Citation Context ...eparatory result that helps connect to the quadratic cost function being minimized at each iteration. This is presented next in Lemma 1, which specializes more general results to be found in [24] and [25]. Lemma 1: (17) where diag . Proof: Consider the scalar function sgn , , and . Since it is concave [26] Hence sgn sgn Substituting and , we have sgn sgn The above inequality applies to each of the compo...

1 | Limitations of the L-curve method in discrete ill-posed problems (BIT)
- Hanke
- 1996
Citation Context ...at a good choice of value for is the one corresponding to the corner in the . Furthermore, it is suggested that the corner of the L-shaped curve can be found by finding the maximum curvature [8], [9], [28]. The plot of versus can be shown to be convex [9], and the point of maximum curvature represents a tradeoff point between sparsity and accuracy. The curvature can be computed by means of the formula ...