## Nonnegativity Constraints in Numerical Analysis

Citations: 11 (2 self)

### BibTeX

```bibtex
@MISC{Chen_nonnegativityconstraints,
  author = {Donghui Chen and Robert J. Plemmons},
  title  = {Nonnegativity Constraints in Numerical Analysis},
  year   = {}
}
```

### Abstract

A survey of the development of algorithms for enforcing nonnegativity constraints in scientific computation is given. Special emphasis is placed on such constraints in least squares computations in numerical linear algebra and in nonlinear optimization. Techniques involving nonnegative low-rank matrix and tensor factorizations are also emphasized. Details are provided for some important classical and modern applications in science and engineering. For completeness, this report also includes an effort toward a literature survey of the various algorithms and applications of nonnegativity constraints in numerical analysis. Key Words: nonnegativity constraints, nonnegative least squares, matrix and tensor factorizations, image processing, optimization.

### Citations

9735 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context: ...tivity constraints. 5.1 Support vector machines. Support Vector machines were introduced by Vapnik and co-workers [13, 24], theoretically motivated by Vapnik-Chervonenkis theory (also known as VC theory [88, 89]). Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. They belong to a family of generalized linear classifiers. They are based on ...

2980 | Eigenfaces for Recognition - Turk, Pentland - 1990

2336 | Support Vector Networks
- Cortes, Vapnik
- 1995
Citation Context: ...mputed and multiplied with the nonzero elements of T. 5 Some Applications of Nonnegativity Constraints. 5.1 Support Vector Machines. Support Vector machines were introduced by Vapnik and co-workers [13, 23], theoretically motivated by Vapnik-Chervonenkis theory (also known as VC theory [87, 88]). Support vector machines (SVMs) are a set of related supervised learning methods used for classification and r...

2113 | Numerical Optimization
- Nocedal, Wright
Citation Context: ...tually variants of a simple optimization technique that has been used for decades, and are known under various names such as alternating variables, coordinate search, or the method of local variation [63]. While statements about global convergence in the most general cases have not been proven for the method of alternating variables, a bit has been said about certain special cases. For instance, [74] ...

1620 | Eigenfaces vs. fisherfaces: Recognition using class specific linear projection
- Belhumeur, Hespanha, et al.
- 1997
Citation Context: ...the underlying parameters represent quantities that can take on only nonnegative values, e.g., amounts of materials, chemical concentrations, pixel intensities, to name a few. In such a case, problem (3) must be modified to include nonnegativity constraints on the model parameters x. The resulting problem is called Nonnegative Least Squares (NNLS), and is formulated as follows: ...

1488 | Practical optimization
- Gill, Murray, et al.
- 1986
Citation Context: ...or Point Method, Bro and Jong's Fast NNLS, Projected Landweber method, Principal Block Pivoting method, Fast Combinatorial NNLS, Sequential Coordinate-wise Alg. 3.2.1 Active set methods. Active-set methods [31] are based on the observation that only a small subset of constraints are usually active (i.e. satisfied exactly) at the solution. There are n inequality constraints in the NNLS problem. The ith constrain...
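The active set strategy described in this context is what SciPy's `scipy.optimize.nnls` implements (a Lawson-Hanson style solver); a minimal usage sketch on made-up data:

```python
import numpy as np
from scipy.optimize import nnls

# Made-up overdetermined system; here the unconstrained least squares
# solution has a negative first component, which NNLS clamps to zero.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
b = np.array([1.0, -1.0, 1.0])

x, rnorm = nnls(A, b)    # min ||Ax - b||_2  subject to  x >= 0
```

Here `x` comes back entrywise nonnegative and `rnorm` is the residual norm ||Ax - b||.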

1392 | A training algorithm for optimal margin classifiers
- Boser, Guyon, et al.
- 1992
Citation Context: ...mputed and multiplied with the nonzero elements of T. 5 Some Applications of Nonnegativity Constraints. 5.1 Support Vector Machines. Support Vector machines were introduced by Vapnik and co-workers [13, 23], theoretically motivated by Vapnik-Chervonenkis theory (also known as VC theory [87, 88]). Support vector machines (SVMs) are a set of related supervised learning methods used for classification and r...

1387 | On information and sufficiency - Kullback, Leibler - 1951

1073 | Learning the parts of objects by nonnegative matrix factorization
- Lee, Seung
- 1999
Citation Context: ...on, the reduced quadratic approximation, and the descent search. Specific implementations generally can be categorized into alternating least squares algorithms [65], multiplicative update algorithms [42, 53, 54], gradient descent algorithms, and hybrid algorithms [68, 70]. Some general assessments of these methods can be found in [20, 57]. It appears that there is much room for improvement of numerical metho...

1028 | Face Recognition Using Eigenfaces - Turk, Pentland - 1991

976 | The Elements of Statistical Learning
- Hastie, Tibshirani, et al.
- 2001
Citation Context: ...unit length and usually nonnegative. ... The indexing matrix contains a lot of information for retrieval. In the context of latent semantic indexing (LSI) applications [10, 38], for example, suppose a query represented by a row vector q^T = [q1, ..., qm] ∈ R^m, where qi denotes the weight of term i in the query q, is submitted. One way to measure how the query q matches the ...
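The match score alluded to at the end of this context is typically the cosine between the query and each document column; a small made-up term-by-document example:

```python
import numpy as np

# Hypothetical 3-term, 3-document indexing matrix Y and a query q.
Y = np.array([[1., 0., 2.],
              [0., 1., 1.],
              [3., 1., 0.]])
q = np.array([1., 0., 2.])

# Cosine score of q against each document (each column of Y).
scores = (q @ Y) / (np.linalg.norm(q) * np.linalg.norm(Y, axis=0))
best = int(np.argmax(scores))   # index of the best-matching document
```

Documents whose term weights point in nearly the same direction as the query score close to 1.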

954 | Face recognition: A literature survey - Zhao, Chellappa, et al. - 2003

801 | Algorithms for non-negative matrix factorization
- Lee, Seung
- 2001
Citation Context: ...on, the reduced quadratic approximation, and the descent search. Specific implementations generally can be categorized into alternating least squares algorithms [65], multiplicative update algorithms [42, 53, 54], gradient descent algorithms, and hybrid algorithms [68, 70]. Some general assessments of these methods can be found in [20, 57]. It appears that there is much room for improvement of numerical metho...
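The multiplicative update rules of Lee and Seung cited here, for the Frobenius objective ||X - WH||^2, can be sketched in a few lines of NumPy (random data for illustration; the small epsilon guarding the divisions is an implementation convenience, not part of the published rules):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 30, 4
X = rng.random((m, n))        # made-up nonnegative data matrix
W = rng.random((m, k))        # random nonnegative initializations
H = rng.random((k, n))
eps = 1e-12                   # guards the element-wise divisions

err0 = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
for _ in range(200):
    # Lee-Seung multiplicative updates for min ||X - WH||_F^2;
    # nonnegativity is preserved because every factor is nonnegative.
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

The objective is nonincreasing under these updates, so `err` never exceeds the initial `err0`.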

674 | Matrix Iterative Analysis
- Varga
- 1962
Citation Context: ...ory of nonnegative matrices, such as the classical Perron-Frobenius theory, have been included in various books. For more details the reader is referred to the books, in chronological order, by Varga [90], by Berman and Plemmons [8], and by Bapat and Raghavan [6]. This topic leads naturally to the concepts of inverse-positivity, monotonicity and iterative methods, and M-matrix computations. For exampl...
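Perron-Frobenius theory, mentioned in this context, guarantees that an irreducible nonnegative matrix has a real dominant eigenvalue with an entrywise nonnegative eigenvector; a quick numerical illustration on a made-up matrix:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])      # nonnegative, irreducible matrix

w, V = np.linalg.eig(A)
i = int(np.argmax(w.real))       # dominant eigenvalue = spectral radius
v = V[:, i].real
v = v if v.sum() >= 0 else -v    # eig may return the flipped sign

# v is the Perron eigenvector: entrywise nonnegative
```

For this matrix the dominant eigenvalue is 4 with eigenvector proportional to (1, 2, 1).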

323 | Concept Decompositions for Large Sparse Text Data Using Clustering
- Dhillon, Modha
Citation Context: ...however, the matrix Y is never exact. A major challenge in the field has been to represent the indexing matrix and the queries in a more compact form so as to facilitate the computation of the scores [25, 66]. The idea of representing Y by its nonnegative matrix factorization approximation seems plausible. In this context, the standard parts wi indicated in (31) may be interpreted as subcollections of som...

317 | Non-negative matrix factorization with sparseness constraints
- Hoyer
- 2004
Citation Context: ...ition. It is suggested that the factorization in the linear model would enable the identification and classification of intrinsic “parts” that make up the object being imaged by multiple observations [19, 43, 53, 55]. More specifically, each column xj of a nonnegative matrix X now represents m pixel values of one image. The columns wi of W are basis elements in R^m. The columns of H, belonging to R^k, can be th...

291 | Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values (Environmetrics)
- Paatero, Tapper
- 1994
Citation Context: ...irection iterations, the projected Newton, the reduced quadratic approximation, and the descent search. Specific implementations generally can be categorized into alternating least squares algorithms [65], multiplicative update algorithms [42, 53, 54], gradient descent algorithms, and hybrid algorithms [68, 70]. Some general assessments of these methods can be found in [20, 57]. It appears that there ...

289 | Foundations of the PARAFAC Procedure: Models and Conditions for an ”Explanatory” Multi-modal Factor Analysis
- Harshman
- 1970
Citation Context: ...ation to the tensor T. Fig. 2: An illustration of 3-D tensor factorization. Alternating least squares for NTF. A common approach to solving Equation (18) is an alternating least squares (ALS) algorithm [29, 37, 85], due to its simplicity and ability to handle constraints. At each inner iteration, we compute an entire factor matrix while holding all the others fixed. Starting with random initializations for X, Y...
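The ALS scheme this context describes (update one factor matrix at a time, holding the others fixed) can be sketched for a 3-way nonnegative CP model in NumPy. This is a hypothetical illustration: a plain least-squares solve followed by clipping at zero stands in for a true NNLS solve.

```python
import numpy as np

def khatri_rao(A, B):
    """Columnwise Kronecker (Khatri-Rao) product of A (a x r) and B (b x r)."""
    r = A.shape[1]
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, r)

def nn_cp_als(T, r, iters=100, seed=0):
    """Sketch of ALS for a rank-r nonnegative CP decomposition of a
    3-way tensor T; clipping at zero stands in for a proper NNLS step."""
    rng = np.random.default_rng(seed)
    m, n, p = T.shape
    X, Y, Z = (rng.random((d, r)) for d in (m, n, p))
    T1 = T.reshape(m, -1)                      # mode-1 unfolding
    T2 = T.transpose(1, 0, 2).reshape(n, -1)   # mode-2 unfolding
    T3 = T.transpose(2, 0, 1).reshape(p, -1)   # mode-3 unfolding
    for _ in range(iters):
        X = np.clip(np.linalg.lstsq(khatri_rao(Y, Z), T1.T, rcond=None)[0].T, 0, None)
        Y = np.clip(np.linalg.lstsq(khatri_rao(X, Z), T2.T, rcond=None)[0].T, 0, None)
        Z = np.clip(np.linalg.lstsq(khatri_rao(X, Y), T3.T, rcond=None)[0].T, 0, None)
    return X, Y, Z
```

The reconstruction is T[i,j,k] ≈ Σ_r X[i,r] Y[j,r] Z[k,r], i.e. `np.einsum('ir,jr,kr->ijk', X, Y, Z)`.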

271 | Nonnegative matrices in the mathematical sciences
- Berman, Plemmons
- 1979
Citation Context: ...uch as the classical Perron-Frobenius theory, have been included in various books. For more details the reader is referred to the books, in chronological order, by Varga [89], by Berman and Plemmons [8], and by Bapat and Raghavan [6]. This topic leads naturally to the concepts of inverse-positivity, monotonicity and iterative methods, and M-matrix computations. For example, M-matrices A have positiv...

242 | An efficient method for finding the minimum of a function of several variables without calculating derivatives
- Powell
- 1964
Citation Context: ...f alternating variables, a bit has been said about certain special cases. For instance, [74] proved that every limit point of a sequence of alternating variable iterates is a stationary point. Others [72, 73, 91] proved convergence for special classes of objective functions, such as convex quadratic functions. Furthermore, it is known that an ALS algorithm that properly enforces nonnegativity, for example, th...

219 | Face recognition by independent component analysis
- Bartlett, Movellan, et al.
Citation Context: ...l f(x) = (1/2)‖Ax − b‖², i.e. min_x f(x) = (1/2)‖Ax − b‖², (4) subject to x ≥ 0. The gradient of f(x) is ∇f(x) = A^T(Ax − b) and the KKT optimality conditions for the NNLS problem (4) are x ≥ 0, ∇f(x) ≥ 0, ∇f(x)^T x = 0. (5) Some of the iterative methods for solving (4) are based on the solution of the corresponding linear complementarity problem (LCP). Linear Complementarity Problem: Given a matrix A ∈ R ...
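The KKT conditions quoted in this context (x ≥ 0, ∇f(x) ≥ 0, ∇f(x)^T x = 0) are easy to check numerically on an NNLS solution; a sketch using SciPy's `nnls` on a made-up problem:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
A = rng.random((8, 3))            # made-up problem data
b = rng.standard_normal(8)

x, _ = nnls(A, b)                 # min ||Ax - b||_2  s.t.  x >= 0
g = A.T @ (A @ x - b)             # gradient of f(x) = (1/2)||Ax - b||^2

assert np.all(x >= 0)             # primal feasibility
assert np.all(g >= -1e-6)         # dual feasibility: grad f(x) >= 0
assert abs(g @ x) < 1e-6          # complementarity: grad f(x)^T x = 0
```

Components with x_i > 0 have zero gradient; components pinned at x_i = 0 have nonnegative gradient, which is exactly the complementarity structure of the associated LCP.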

148 | Projected gradient methods for non-negative matrix factorization
- Lin
- 2005
Citation Context: ...f the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past few years. Recently, many papers, like [9, 43, 53, 56, 65], have proved that Nonnegative Matrix Factorization (NMF) is a good method to obtain a representation of data using non-negativity constraints. These constraints lead to a part-based representation bec...

132 | Learning spatially localized, parts-based representation
- Li, Hou, et al.
- 2001
Citation Context: ...ition. It is suggested that the factorization in the linear model would enable the identification and classification of intrinsic “parts” that make up the object being imaged by multiple observations [19, 43, 53, 55]. More specifically, each column xj of a nonnegative matrix X now represents m pixel values of one image. The columns wi of W are basis elements in R^m. The columns of H, belonging to R^k, can be th...

126 | Algorithms and Applications for Approximate Nonnegative Matrix Factorization
- Berry, Browne, et al.
- 2007
Citation Context: ...A suitable representation for data is essential to applications in fields such as statistics, signal and image processing, machine learning, and data mining. (See, e.g., the survey by Berry, et al. [9].) Low rank constraints on high dimensional massive data sets are prevalent in dimensionality reduction and data analysis across numerous scientific disciplines. Techniques for dimensionality reductio...

88 | On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering
- Ding, He, et al.
- 2005
Citation Context: ...asis elements in R^m. The columns of H, belonging to R^k, can be thought of as coefficient sequences representing the n images in the basis elements. In other words, the relationship x_j = Σ_{i=1}^{k} w_i h_ij, (27) can be thought of as saying that there are standard parts w_i in a variety of positions and that each image, represented as a vector x_j making up the factor W of basis elements, is made by superposing thes...
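The relationship xj = Σi wi hij quoted here is just the column view of the product X ≈ WH; a short NumPy check on random factors:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((6, 3))           # columns w_i: the basis "parts"
H = rng.random((3, 4))           # column j: coefficients of image j
X = W @ H

j = 2
# column j of WH is the superposition sum_i w_i * h_ij
recon = sum(W[:, i] * H[i, j] for i in range(3))
```

The reconstructed column matches `X[:, j]` exactly, which is why each image is read as a nonnegative superposition of the parts.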

85 | Non-negative tensor factorization with applications to statistics and computer vision - Shashua, Hazan - 2005

79 | On the solution of large quadratic programming problems with bound constraints - Moré, Toraldo - 1991

73 | Human and Machine Recognition of Faces: A Survey
- Chellappa, Wilson, et al.
- 1995
Citation Context: ...norm of the matricized array, i.e., the square root of the sum of squares of all its elements. Nonnegative Rank-k Tensor Decomposition Problem: min_{x^(i), y^(i), z^(i)} ‖T − Σ_{i=1}^{r} x^(i) ∘ y^(i) ∘ z^(i)‖, (18) subject to x^(i) ≥ 0, y^(i) ≥ 0, z^(i) ≥ 0, where T ∈ R^{m×n×p}, x^(i) ∈ R^m, y^(i) ∈ R^n, z^(i) ∈ R^p. Note that Equation (18) defines matrices X which is m×k, Y which is n×k, and Z which is p×k...

73 | Speech recognition using SVMs - Smith, Gales - 2002

57 | A fast non-negativity-constrained least squares algorithm
- Bro, Jong
- 1997
Citation Context: ...on the unconstrained subset of the variables. The NNLS algorithm of Lawson and Hanson [49] is an active set method, and was the de facto method for solving (34) for many years. Recently, Bro and Jong [15] modified it and developed a method called Fast NNLS (FNNLS), which often speeds up the basic algorithm, especially in the presence of multiple right-hand sides, by avoiding unnecessary recomputations...

52 | Linear image coding for regression and classification using tensor-rank principle - Shashua, Levin

47 | Text mining using non-negative matrix factorizations
- Pauca, Shahnaz, et al.
- 2004
Citation Context: ...Specific implementations generally can be categorized into alternating least squares algorithms [65], multiplicative update algorithms [42, 53, 54], gradient descent algorithms, and hybrid algorithms [68, 70]. Some general assessments of these methods can be found in [20, 57]. It appears that there is much room for improvement of numerical methods. Although schemes and approaches are different, any numeri...

44 | Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares - Kim, Park - 2006

40 | Nonnegative Matrix Factorization for Spectral Data Analysis
- Pauca, Piper, et al.
Citation Context: ...n-imaging data such as spectra of visible and NIR range, with different spectral resolutions and in the presence of noise and atmospheric turbulence (see, e.g., [69] or [70, 71]). This is the research area of space object identification (SOI). A primary goal of using remote sensing image data is to identify materials present in the object or scene being imaged a...

39 | Lower dimensional representation of text data based on centroids and least squares
- Park, Jeon, et al.
- 2003
Citation Context: ...however, the matrix Y is never exact. A major challenge in the field has been to represent the indexing matrix and the queries in a more compact form so as to facilitate the computation of the scores [26, 67]. The idea of representing Y by its nonnegative matrix factorization approximation seems plausible. In this context, the standard parts wi indicated in (27) may be interpreted as subcollections of som...

35 | A multi-span language modelling framework for large vocabulary speech recognition
- Bellegarda
- 1998
Citation Context: ...ch as word trigger models, high-order n-grams, cache models, etc., have been used in combination with standard n-gram models. One such method, a Latent Semantic Analysis based model, has been proposed [2]. A word-document occurrence matrix X_{m×n} is formed (m = size of the vocabulary, n = number of documents), using a training corpus explicitly segmented into a collection of documents. A Singular Val...
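The Singular Value Decomposition step this context describes, truncated to a small rank, is a one-liner in NumPy; a sketch on a made-up occurrence matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((50, 20))              # made-up word-document matrix
k = 5                                 # retained rank

U, s, Vt = np.linalg.svd(X, full_matrices=False)
Xk = U[:, :k] * s[:k] @ Vt[:k, :]     # best rank-k approximation of X
```

By the Eckart-Young theorem, the Frobenius error ||X - Xk|| equals the norm of the discarded singular values `s[k:]`.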

34 | Optimality, computation and interpretation of nonnegative matrix factorizations. Preprint. Available online at http://www4.ncsu.edu/∼mtchu/Research/Papers/nnmf.ps
- Chu, Diele, et al.
- 2005
Citation Context: ...ng least squares algorithms [64], multiplicative update algorithms [40, 51, 52], gradient descent algorithms, and hybrid algorithms [67, 69]. Some general assessments of these methods can be found in [20, 56]. It appears that there is much room for improvement of numerical methods. Although schemes and approaches are different, any numerical method is essentially centered around satisfying the first order...

33 | Fast Newton-type methods for the least squares nonnegative matrix approximation problem - Kim, Sra, et al. - 2007

33 | Distinctive feature detection using support vector machines - Niyogi, Burges, et al. - 1999

32 | Enforcing nonnegativity in image reconstruction algorithms
- Nagy, Strakoš
Citation Context: ...− b‖², subject to x ≥ 0. (30) Thus, we can use NNLS to solve this problem. Experiments show that enforcing a nonnegativity constraint can produce a much more accurate approximate solution, see e.g., [36, 45, 61, 78]. 5.3 Text mining. Assume that the textual documents are collected in a matrix Y = [yij] ∈ R^{m×n}. Each document is represented by one column in Y. The entry yij represents the weight of one particula...

29 | On the convergence of the block nonlinear Gauss-Seidel method under convex constraints
- Grippo, Sciandrone
- 2000
Citation Context: ...it is known that an ALS algorithm that properly enforces nonnegativity, for example, through the nonnegative least squares (NNLS) algorithm of Lawson and Hanson [49], will converge to a local minimum [11, 32, 54]. 4.2 Nonnegative Tensor Decomposition. Nonnegative Tensor Factorization (NTF) is a natural extension of NMF to higher dimensional data. In NTF, high-dimensional data, such as hyperspectral or other im...

28 | Non-negative sparse coding (Neural Networks for Signal Processing XII)
- Hoyer
- 2002
Citation Context: ...on, the reduced quadratic approximation, and the descent search. Specific implementations generally can be categorized into alternating least squares algorithms [65], multiplicative update algorithms [42, 53, 54], gradient descent algorithms, and hybrid algorithms [68, 70]. Some general assessments of these methods can be found in [20, 57]. It appears that there is much room for improvement of numerical metho...

26 | Fast algorithm for the solution of large-scale non-negativity-constrained least squares problems
- Benthem, Keenan
Citation Context: ...which often speeds up the basic algorithm, especially in the presence of multiple right-hand sides, by avoiding unnecessary recomputations. A recent variant of FNNLS, called fast combinatorial NNLS [4], appropriately rearranges calculations to achieve further speedups in the presence of multiple right-hand sides. However, all of these approaches still depend on A^T A, or the normal equations in fac...

26 | Solutions to some functional equations and their applications to characterization of probability distributions
- Khatri, Rao
- 1968
Citation Context: ...The symbol ⊗ denotes the Kronecker product, i.e. A ⊗ B = [A_11 B ⋯ A_1n B; ⋮ ⋱ ⋮; A_m1 B ⋯ A_mn B], (16) and the symbol ⊙ denotes the Khatri-Rao product (columnwise Kronecker) [44]: A ⊙ B = (A_1 ⊗ B_1 ⋯ A_n ⊗ B_n), (17) where A_i, B_i are the columns of A, B respectively. The concept of matricizing or unfolding is simply a rearrang...
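Definitions (16) and (17) in this context can be checked directly in NumPy: `np.kron` gives the Kronecker product, and the Khatri-Rao product stacks the columnwise Kronecker products of matching columns:

```python
import numpy as np

def khatri_rao(A, B):
    # columnwise Kronecker product: column i is kron(A[:, i], B[:, i])
    return np.column_stack([np.kron(A[:, i], B[:, i])
                            for i in range(A.shape[1])])

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [5., 6.]])

K = np.kron(A, B)       # (16): block matrix with blocks A_ij * B
KR = khatri_rao(A, B)   # (17): one Kronecker column per input column
```

Note the shapes: for 2x2 inputs, `A ⊗ B` is 4x4 while `A ⊙ B` is 4x2, since the Khatri-Rao product pairs up columns instead of forming all block products.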

26 | Minimizing a function without calculating derivatives
- Zangwill
- 1967
Citation Context: ...f alternating variables, a bit has been said about certain special cases. For instance, [74] proved that every limit point of a sequence of alternating variable iterates is a stationary point. Others [72, 73, 91] proved convergence for special classes of objective functions, such as convex quadratic functions. Furthermore, it is known that an ALS algorithm that properly enforces nonnegativity, for example, th...

25 | Quasi-Newton approach to nonnegative image restorations
- Hanke, Nagy, et al.
- 2000
Citation Context: ...− b‖², subject to x ≥ 0. (30) Thus, we can use NNLS to solve this problem. Experiments show that enforcing a nonnegativity constraint can produce a much more accurate approximate solution, see e.g., [36, 45, 61, 78]. 5.3 Text mining. Assume that the textual documents are collected in a matrix Y = [yij] ∈ R^{m×n}. Each document is represented by one column in Y. The entry yij represents the weight of one particula...

23 | Recent developments in CANDECOMP/PARAFAC algorithms: A critical review
- Faber, Bro, et al.
Citation Context: ...on to the tensor T. Figure 2: An illustration of 3-D tensor factorization. Alternating least squares for NTF. A common approach to solving Equation (22) is an alternating least squares (ALS) algorithm [28, 36, 84], due to its simplicity and ability to handle constraints. At each inner iteration, we compute an entire factor matrix while holding all the others fixed. Starting with random initializations for X, Y...

22 | Hierarchical als algorithms for nonnegative matrix and 3d tensor factorization
- Cichocki, Zdunek, et al.
- 2007
Citation Context: ...t depend on the initial frame of reference. Recently, tensor analysis techniques have become a widely applied tool, especially in the processing of massive data sets. (See the work of Cichocki et al. [22] and Ho [39], as well as the program for the 2008 Stanford Workshop on Modern Massive Data Sets on the web page http://www.stanford.edu/group/mmds/). Together, NNLS, NMF and NTF are used in various ap...

22 | Receptor Modeling in Environmental Chemistry - Hopke - 1985

21 | Image processing software for imaging spectrometry data analysis: Remote Sensing of Environment - Mazer, Martin, et al. - 1988