## Nonnegative approximations of nonnegative tensors (2009)

Venue: Jour. Chemometrics

Citations: 12 - 6 self

### BibTeX

@ARTICLE{Lim09nonnegativeapproximations,

author = {Lek-heng Lim and Pierre Comon},

title = {Nonnegative approximations of nonnegative tensors},

journal = {Jour. Chemometrics},

year = {2009},

pages = {432--441}

}

### Abstract

We study the decomposition of a nonnegative tensor into a minimal sum of outer products of nonnegative vectors and the associated parsimonious naïve Bayes probabilistic model. We show that the corresponding approximation problem, which is central to nonnegative parafac, will always have optimal solutions. The result holds for any choice of norms and, under a mild assumption, even Brègman divergences.

hal-00410056, version 1 - 16 Aug 2009

1. Dedication. This article is dedicated to the memory of our late colleague Richard Allan Harshman. It is loosely organized around two of Harshman’s best known works, parafac [19] and lsi [13], and answers two questions that he posed. We target this article to a technometrics readership. In Section 4, we discussed a few aspects of nonnegative tensor factorization and Hofmann’s plsi, a variant of the lsi model co-proposed by Harshman [13]. In Section 5, we answered a question of Harshman on why the apparently unrelated construction of Bini, Capovani, Lotti, and Romani in [1] should be regarded as the first example of what he called ‘parafac degeneracy’ [27]. Finally in Section 6, we showed that such parafac degeneracy will not happen for nonnegative approximations of nonnegative tensors, answering another question of his. 2.
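As a rough sketch of the approximation problem the abstract refers to, the nonnegative rank-r approximation of a nonnegative tensor can be written as below. The symbols A, r, d_i and x_p^{(i)} are notation chosen here for illustration and may not match the paper's own; the norm is left unspecified because the abstract states that the result holds for any choice of norm.

```latex
% Assumed notation: A is a nonnegative tensor of order k, r is the target rank,
% and each x_p^{(i)} is a nonnegative vector of length d_i.
\[
  \inf \Bigl\{\,
    \Bigl\| A - \sum_{p=1}^{r} x_p^{(1)} \otimes \cdots \otimes x_p^{(k)} \Bigr\|
    \;:\; x_p^{(i)} \in \mathbb{R}_{+}^{d_i},\ i = 1,\dots,k,\ p = 1,\dots,r
  \,\Bigr\},
  \qquad A \in \mathbb{R}_{+}^{d_1 \times \cdots \times d_k}.
\]
```

In this notation, the abstract's claim that the problem "will always have optimal solutions" reads as the statement that the infimum is attained by some choice of nonnegative vectors.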

### Citations

2907 | Indexing by latent semantic analysis
- Deerwester, Dumais, et al.
- 1990
Citation Context ...1. Dedication. This article is dedicated to the memory of our late colleague Richard Allan Harshman. It is loosely organized around two of Harshman’s best known works, parafac [19] and lsi =-=[13]-=-, and answers two questions that he posed. We target this article to a technometrics readership. In Section 4, we discussed a few aspects of nonnegative tensor factorization and Hofmann’s plsi, a vari... |

1413 |
Independent component analysis, a new concept
- Comon
- 1994
Citation Context ...gonal approximations We have often been asked about norm-regularized and orthogonal approximations of tensors that are not necessarily nonnegative. These approximation problems are useful in practice =-=[10, 20, 32]-=-. Nevertheless these always have optimal solutions for a much simpler reason: they are continuous optimization problems over a compact feasible set, so the existence of a global minimum is immediate fro... |

1345 |
On information and sufficiency
- Kullback, Leibler
- 1951
Citation Context ...compositions [29, 34]. In fact, one of the main novelties of nmf as introduced by Lee and Seung [29] over the earlier studies in technometrics [9, 26, 33] is their use of the Kullback-Leibler divergence =-=[28]-=- as a proximity measure. The kl divergence is defined for nonnegative matrices in [29] but it is straightforward to extend the definition to nonnegative tensors. For A ∈ R_+^{d1×···×dk} and B ∈ ri(R d... |

1056 |
Learning the parts of objects by nonnegative matrix factorization
- Lee, Seung
- 1999
Citation Context ..., in the context of chemometrics, sample concentration and spectral intensity often cannot assume negative values [5, 6, 9, 26, 31, 33]. Nonnegativity can also be motivated by the data analytic tenet =-=[29]-=- that the way ‘basis functions’ combine to build ‘target objects’ is an exclusively additive process and should not involve any cancellations between the basis functions. For k = 2, this is the motiva... |

851 | Probabilistic Latent Semantic Indexing
- Hofmann
- 1999
Citation Context ...nnegative tensor on the probability simplex, decomposes in a nonnegative rank-revealing manner that parallels the matrix singular value decomposition. This generalizes Hofmann’s probabilistic variant =-=[23]-=- of latent semantic indexing (lsi), a well-known technique in natural language processing and information retrieval that Harshman played a role in developing [13]. Nonnegative tensor decompositions we... |

387 | Gaussian elimination is not optimal - Strassen - 1969 |

339 |
Analysis of individual differences in multidimensional scaling via an n-way generalization of Eckart-Young decomposition
- Carroll, Chang
- 1970
Citation Context ...f vectors, probably first surfaced as data analytic models in psychometrics in the work of Harshman [19], who called his model parafac (for Parallel Factor Analysis), and the work of Carroll and Chang =-=[8]-=-, who called their model candecomp (for Canonical Decomposition). The candecomp/parafac model, sometimes abbreviated as cp model, essentially asks for a solution to the following problem: given a tens... |

282 |
Foundations of the PARAFAC procedure: models and conditions for an explanatory multi-modal factor analysis, UCLA
- Harshman
- 1970
Citation Context ...1. Dedication. This article is dedicated to the memory of our late colleague Richard Allan Harshman. It is loosely organized around two of Harshman’s best known works, parafac =-=[19]-=- and lsi [13], and answers two questions that he posed. We target this article to a technometrics readership. In Section 4, we discussed a few aspects of nonnegative tensor factorization and Hofmann’s... |

280 |
Positive matrix factorization: a nonnegative factor model with optimal utilization of error estimates of data values. Environmetrics 5:111–126
- Paatero, Tapper
- 1994
Citation Context ...ed to be nonnegative. Such nonnegativity arises naturally in applications. For example, in the context of chemometrics, sample concentration and spectral intensity often cannot assume negative values =-=[5, 6, 9, 26, 31, 33]-=-. Nonnegativity can also be motivated by the data analytic tenet [29] that the way ‘basis functions’ combine to build ‘target objects’ is an exclusively additive process and should not involve any can... |

277 |
The relaxation method to find the common point of convex sets and its applications to the solution of problems in convex programming
- Bregman
- 1967
Citation Context ...ntropy, margin, spectral separation, volume, etc., are often used as loss functions in matrix and tensor approximations. Such measures may not even be a metric, an example being the Brègman divergence =-=[3, 14, 24]-=-, a class of proximity measures that often have information theoretic or probabilistic interpretations. In the definition below, ri(Ω) d... |

255 | The approximation of one matrix by another of lower rank - Eckart, Young - 1936 |

83 | Non-negative tensor factorization with applications to statistics and computer vision
- Shashua, Hazan
- 2005
Citation Context ...neralization of nmf to tensors of higher order yields a model known as nonnegative parafac [9, 26, 31], which has also been studied more recently under the name nonnegative tensor factorization (ntf) =-=[34]-=-. As we have just mentioned, a general tensor can fail to have a best low-rank approximation. So the first question that one should ask in a multilinear generalization of a bilinear model is whether t... |

74 | Tensor rank and the ill-posedness of the best low-rank approximation problem
- Silva, Lim
Citation Context ...responding replacement of the Euclidean inner product in (10) by the Hermitian inner product) though a minor caveat is that the tensor rank as defined in (6) depends on the choice of base fields (see =-=[12]-=- for a discussion). 4. Nonnegative decomposition of nonnegative tensors. We will see that a finite collection of discrete random variables satisfying both the naïv... |

59 | Algebraic geometry of bayesian networks
- Garcia, Stillman, et al.
- 2005
Citation Context ...e technometrics communities [5, 6, 9, 26, 31]. The interpretation as a naïve Bayes decomposition of probability distributions into conditional distributions was due to Garcia, Stillman, and Sturmfels =-=[16]-=- and Shashua and Hazan [34]. It is perhaps worth taking this opportunity to point out a minor detail that had somehow been neglected in [16, 34]: the naïve Bayes hypothesis is not sufficient to guarant... |

55 |
Algebraic Complexity Theory. Grundlehren der mathematischen Wissenschaften 315
- Bürgisser, Clausen, et al.
- 1997
Citation Context ... tensor into a minimal sum of outer products of vectors was first studied by Hitchcock [21, 22] in 1927. The topic has a long and illustrious history in algebraic computational complexity theory (cf. =-=[7]-=- and the nearly 600 references in its bibliography) dating back to Strassen’s celebrated result [36]. It has also recently found renewed interest, coming most notably from algebraic statistics and qu... |

52 |
Relation between plsa and nmf and implications
- Gaussier, Goutte
- 2005
Citation Context ... the simplest possible, i.e. the hidden variable Θ be minimally supported. For the case k = 2, (15) is Hofmann’s plsi [23], a probabilistic variant of latent semantic indexing [13]. While it is known =-=[17]-=- that the multiplicative updating rule for nmf with kl divergence in [29] is equivalent to the use of the em algorithm for maximum likelihood estimation of plsi in [23], this is about the equivalence of t... |

40 | Symmetric tensors and symmetric tensor rank
- Comon, Golub, et al.
Citation Context ...ence). Moreover such failures can occur with positive probability and in some cases with certainty, i.e. where the infimum in (16) is never attained. This phenomenon also extends to symmetric tensors =-=[11]-=-. This poses some serious conceptual difficulties — if one cannot guarantee a solution a priori, then what is one trying to compute in instances where there are no solutions? We often get the answer “... |

37 | Multi-way analysis: Applications in the chemical sciences - Smilde, Bro, et al. - 2004 |

32 | Seung HS: Learning the parts of objects by non-negative matrix factorization. Nature - DD - 1999 |

29 |
The Expression of a Tensor or a Polyadic as a Sum of Products
- Hitchcock
- 1927
Citation Context ... approximations of nonnegative tensors, answering another question of his. 2. Introduction The decomposition of a tensor into a minimal sum of outer products of vectors was first studied by Hitchcock =-=[21, 22]-=- in 1927. The topic has a long and illustrious history in algebraic computational complexity theory (cf. [7] and the nearly 600 references in its bibliography) dating back to Strassen’s celebrated res... |

29 |
Multiple invariants and generalized rank of a p-way matrix or tensor
- Hitchcock
Citation Context ... approximations of nonnegative tensors, answering another question of his. 2. Introduction The decomposition of a tensor into a minimal sum of outer products of vectors was first studied by Hitchcock =-=[21, 22]-=- in 1927. The topic has a long and illustrious history in algebraic computational complexity theory (cf. [7] and the nearly 600 references in its bibliography) dating back to Strassen’s celebrated res... |

26 |
O(n^2.7799) complexity for n × n approximate matrix multiplication
- Bini, Capovani, et al.
- 1979
Citation Context ...’s plsi, a variant of the lsi model co-proposed by Harshman [13]. In Section 5, we answered a question of Harshman on why the apparently unrelated construction of Bini, Capovani, Lotti, and Romani in =-=[1]-=- should be regarded as the first example of what he called ‘parafac degeneracy’ [27]. Finally in Section 6, we showed that such parafac degeneracy will not happen for nonnegative approximations of non... |

25 |
The Art of Computer Programming, Vol. 2: Seminumerical Algorithms
- Knuth
- 1981
Citation Context ...cy and continue to credit the much later work of Paatero [32]. The truth is that such constructions are well-known in algebraic computational complexity; in addition to [1], one may also find them in =-=[2, 7, 25]-=-, all predating [32]. As a small public service, we will translate the original construction of Bini, Capovani, Lotti, and Romani into notation more familiar to the technometrics communities. In [... |

24 |
A weighted non-negative least squares algorithm for three-way “PARAFAC” factor analysis
- Paatero
- 1997
Citation Context ...ed to be nonnegative. Such nonnegativity arises naturally in applications. For example, in the context of chemometrics, sample concentration and spectral intensity often cannot assume negative values =-=[5, 6, 9, 26, 31, 33]-=-. Nonnegativity can also be motivated by the data analytic tenet [29] that the way ‘basis functions’ combine to build ‘target objects’ is an exclusively additive process and should not involve any can... |

22 | Foundations of the PARAFAC procedure: Models and conditions for an ‘exploratory’ multi-modal factor analysis - RA - 1970 |

17 |
Data preprocessing and the extended PARAFAC model. In Research methods for multimode data analysis, edited by
- Harshman, Lundy
- 1984
Citation Context ...gonal approximations We have often been asked about norm-regularized and orthogonal approximations of tensors that are not necessarily nonnegative. These approximation problems are useful in practice =-=[10, 20, 32]-=-. Nevertheless these always have optimal solutions for a much simpler reason: they are continuous optimization problems over a compact feasible set, so the existence of a global minimum is immediate fro... |

16 |
Construction and analysis of degenerate parafac models
- Paatero
Citation Context ...il today, many remain unconvinced that the construction in [1] indeed provides an explicit example of parafac degeneracy and continue to credit the much later work of Paatero =-=[32]-=-. The truth is that such constructions are well-known in algebraic computational complexity; in addition to [1], one may also find them in [2, 7, 25], all predating [32]. As a small public service,... |

13 | Approximate solutions for the bilinear form computational problem - Bini, Lotti, et al. - 1980 |

12 |
How 3-MFA data can cause degenerate Parafac solutions, among other relationships
- Kruskal, Harshman, et al.
- 1989
Citation Context ...answered a question of Harshman on why the apparently unrelated construction of Bini, Capovani, Lotti, and Romani in [1] should be regarded as the first example of what he called ‘parafac degeneracy’ =-=[27]-=-. Finally in Section 6, we showed that such parafac degeneracy will not happen for nonnegative approximations of nonnegative tensors, answering another question of his. 2. Introduction The decompositi... |

9 | Least squares algorithms under unimodality and non-negativity constraints
- Bro, Sidiropoulos
- 1998
Citation Context ...ed to be nonnegative. Such nonnegativity arises naturally in applications. For example, in the context of chemometrics, sample concentration and spectral intensity often cannot assume negative values =-=[5, 6, 9, 26, 31, 33]-=-. Nonnegativity can also be motivated by the data analytic tenet [29] that the way ‘basis functions’ combine to build ‘target objects’ is an exclusively additive process and should not involve any can... |

9 | editors. Multiway data analysis - Coppi, Bolasco - 1989 |

8 | Chang JJ. Analysis of individual differences in multidimensional scaling via an N-way generalization of ‘Eckart-Young’ decomposition. Psychometrika - JD - 1970 |

7 |
Optimal solutions to non-negative parafac/multilinear nmf always exist
- Lim
Citation Context ... never observed when fitting nonnegative-valued data with a nonnegative parafac model. This then led Harshman to conjecture that this is always the case. The text of his e-mail had been reproduced in =-=[30]-=-. The conjectured result involves demonstrating the existence of global minima over a non-compact feasible region and is thus not immediate. Nevertheless the proof is still straightforward by the foll... |

6 |
Optimization: Insights and Applications
- Brinkhuis, Tikhomirov
- 2005
Citation Context ...cting the assumption that f(T_n) ≤ α for all n. □ The proof essentially shows that the function f is coercive: a real-valued function f is said to be coercive for minimization if lim_{‖x‖→+∞} f(x) = +∞ =-=[4]-=-. This is a standard condition often used to guarantee that a continuous function on a noncompact domain attains its global minimum and is equivalent to saying that f has bounded sublevel sets. A mino... |

5 |
A fast non-negativity constrained least squares algorithm
- Bro, Jong
- 1997

5 |
Fitting of the latent class model via iteratively reweighted least squares CANDECOMP with nonnegativity constraints
- Carroll, Soete, et al.
- 1989

4 |
Gaussian elimination is not optimal, Numer
- Strassen
- 1969
Citation Context ...1927. The topic has a long and illustrious history in algebraic computational complexity theory (cf. [7] and the nearly 600 references in its bibliography) dating back to Strassen’s celebrated result =-=[36]-=-. It has also recently found renewed interest, coming most notably from algebraic statistics and quantum computing. However, the study of the corresponding approximation problem, i.e. the approximatio... |

3 |
Matrix nearness problems using Brègman divergences
- Dhillon, Tropp
- 2006
Citation Context ...ntropy, margin, spectral separation, volume, etc., are often used as loss functions in matrix and tensor approximations. Such measures may not even be a metric, an example being the Brègman divergence =-=[3, 14, 24]-=-, a class of proximity measures that often have information theoretic or probabilistic interpretations. In the definition below, ri(Ω) d... |

3 | How 3-MFA data can cause degenerate PARAFAC solutions, among other relationships - JB, RA, et al. |

3 | The Art of Computer Programming: Seminumerical Algorithms, Vol. 2. 3rd edition - DE - 2003 |

3 | Data preprocessing and the extended PARAFAC model - ME - 1984 |

2 |
Contrastvrije oplossingen van het CANDECOMP/PARAFAC-model
- Krijnen, Berge
- 1991

1 |
“Bregman distance,” and “Bregman function,” pp. 152–154 in
- Iusem
- 1997
Citation Context ...ntropy, margin, spectral separation, volume, etc., are often used as loss functions in matrix and tensor approximations. Such measures may not even be a metric, an example being the Brègman divergence =-=[3, 14, 24]-=-, a class of proximity measures that often have information theoretic or probabilistic interpretations. In the definition below, ri(Ω) d... |

1 | The expression of a tensor or a polyadic as a sum of products - FL |

1 | Multiple invariants and generalized rank of a p-way matrix or tensor - FL - 1927 |

1 | de Jong S. A fast non-negativity constrained least squares algorithm - Bro - 1997 |

1 | De Soete G, Pruzansky S. Fitting of the latent class model via iteratively reweighted least squares CANDECOMP with nonnegativity constraints - JD |

1 | Ten Berge JMF. Contrastvrije oplossingen van het CANDECOMP/PARAFAC-model. Kwantitatieve Methoden - WP - 1991 |

1 | L-H, Mourrain B. Symmetric tensors and symmetric tensor rank - Comon, Golub, et al. |

1 | Low-rank approximation of generic p × q × 2 arrays and diverging components in the CANDECOMP/PARAFAC model - Stegeman |