On the equivalence of nonnegative matrix factorization and spectral clustering
- in SIAM International Conference on Data Mining
, 2005
Cited by 60 (7 self)
Current nonnegative matrix factorization (NMF) deals with factorizations of the type $X = FG^T$. We provide a systematic analysis and extensions of NMF to the symmetric $W = HH^T$ and the weighted $W = HSH^T$. We show that (1) $W = HH^T$ is equivalent to kernel K-means clustering and Laplacian-based spectral clustering, and (2) $X = FG^T$ is equivalent to simultaneous clustering of the rows and columns of a bipartite graph. Algorithms are given for computing these symmetric NMFs.
Orthogonal nonnegative matrix tri-factorizations for clustering
- In SIGKDD
, 2006
Cited by 45 (12 self)
Currently, most research on nonnegative matrix factorization (NMF) focuses on the 2-factor $X = FG^T$ factorization. We provide a systematic analysis of the 3-factor $X = FSG^T$ NMF. While unconstrained 3-factor NMF is equivalent to unconstrained 2-factor NMF, constrained 3-factor NMF brings new features to constrained 2-factor NMF. We study the orthogonality constraint because it leads to a rigorous clustering interpretation. We provide new rules for updating $F$, $S$, and $G$ and prove the convergence of these algorithms. Experiments on 5 datasets and a real-world case study show the capability of bi-orthogonal 3-factor NMF to cluster the rows and columns of the input data matrix simultaneously. We provide a new approach for evaluating the quality of word clustering using class aggregate distribution and multi-peak distribution. We also provide an overview of various NMF extensions and examine their relationships.
Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks
, 2008
Cited by 35 (1 self)
Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, $O(1/\varepsilon)$ EG updates are required to reach a given accuracy $\varepsilon$ in the dual; in contrast, for log-linear models only $O(\log(1/\varepsilon))$ updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of $n$ less computation to reach a desired accuracy than the batch EG algorithm, where $n$ is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be
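The abstract above describes EG updates for minimizing a convex function under simplex constraints. The following is a minimal sketch of the generic multiplicative EG step on a toy problem, not the paper's dual objective: the quadratic objective, the names `eg_step` and `minimize_on_simplex`, and the step size `eta` are all assumptions made for illustration.

```python
import math

def eg_step(w, grad, eta):
    """One exponentiated gradient step: multiplicative update, then renormalize.

    The result stays on the probability simplex whenever w does.
    """
    unnorm = [wi * math.exp(-eta * gi) for wi, gi in zip(w, grad)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def minimize_on_simplex(target, eta=0.5, steps=2000):
    """Toy problem (assumed): minimize 0.5 * ||w - target||^2 over the simplex."""
    n = len(target)
    w = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(steps):
        grad = [wi - ti for wi, ti in zip(w, target)]  # gradient of the quadratic
        w = eg_step(w, grad, eta)
    return w
```

Calling `minimize_on_simplex([0.7, 0.2, 0.1])` drives `w` toward the target, since the target already lies on the simplex. Because the update is multiplicative, every coordinate stays strictly positive and no explicit projection is needed.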
Convex and Semi-Nonnegative Matrix Factorizations
, 2006
Cited by 29 (3 self)
We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form $X = FG^T$, we focus on algorithms in which $G$ is restricted to contain nonnegative entries, but allow the data matrix $X$ to have mixed signs, thus extending the applicable range of NMF methods. We also consider algorithms in which the basis vectors of $F$ are constrained to be convex combinations of the data points. This is used for a kernel extension of NMF. We provide algorithms for computing these new factorizations and we provide supporting theoretical analysis. We also analyze the relationships between our algorithms and clustering algorithms, and consider the implications for sparseness of solutions. Finally, we present experimental results that explore the properties of these new methods.
Probability Density Estimation from Optimally Condensed Data Samples
- IEEE Trans. Pattern Analysis and Machine Intelligence
, 2003
Cited by 26 (0 self)
The requirement to reduce the computational cost of evaluating a point probability density estimate when employing a Parzen window estimator is a well-known problem. This paper presents the Reduced Set Density Estimator, which provides a kernel-based density estimator that employs a small percentage of the available data sample and is optimal in the $L_2$ sense. While only requiring $O(N^2)$ optimization routines to estimate the required kernel weighting coefficients, the proposed method provides similar levels of performance accuracy and sparseness of representation as Support Vector Machine density estimation, which requires $O(N^3)$ optimization routines, and which has previously been shown to consistently outperform Gaussian Mixture Models. It is also demonstrated that the proposed density estimator consistently provides superior density estimates for similar levels of data reduction to that provided by the recently proposed Density-Based Multiscale Data Condensation algorithm and, in addition, has comparable computational scaling. The additional advantage of the proposed method is that no extra free parameters are introduced such as regularization, bin width, or condensation ratios, making this method a very simple and straightforward approach to providing a reduced set density estimator with comparable accuracy to that of the full sample Parzen density estimator. Index Terms: Kernel density estimation, Parzen window, data condensation, sparse representation.
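To make the cost mentioned in the abstract above concrete, here is a minimal sketch of a full-sample Parzen window estimate in one dimension. The Gaussian kernel, the bandwidth `h`, and the function name `parzen_density` are assumptions for illustration; this is the baseline estimator, not the Reduced Set Density Estimator itself.

```python
import math

def parzen_density(x, samples, h):
    """Gaussian Parzen window estimate of the density at point x.

    Every one of the N samples contributes a kernel term, so a single
    evaluation costs O(N); this is the cost a reduced set aims to cut.
    """
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
```

A reduced-set estimator would replace the uniform weight `1/N` with per-sample weights, most of which are zero, so that only a small fraction of the kernel terms has to be evaluated.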
The relationships among various nonnegative matrix factorization methods for clustering
- In ICDM
, 2006
Cited by 17 (5 self)
Nonnegative matrix factorization (NMF) has recently been shown to be useful for clustering. Various extensions of NMF have also been proposed. In this paper we present an overview and theoretically analyze the relationships among them. In addition, we clarify previously unaddressed issues, such as NMF normalization, cluster posterior probability, and NMF algorithm convergence rate. Experiments are also conducted to empirically evaluate and compare various factorization methods.
On the convergence of bound optimization algorithms
- in: Proc. 19th Conference in Uncertainty in Artificial Intelligence (UAI ’03)
, 2003
Cited by 15 (0 self)
Many practitioners who use EM and related algorithms complain that they are sometimes slow. When does this happen, and what can be done about it? In this paper, we study the general class of bound optimization algorithms – including EM, Iterative Scaling, Non-negative Matrix Factorization, CCCP – and their relationship to direct optimization algorithms such as gradient-based methods for parameter learning. We derive a general relationship between the updates performed by bound optimization methods and those of gradient and second-order methods and identify analytic conditions under which bound optimization algorithms exhibit quasi-Newton behavior, and under which they possess poor, first-order convergence. Based on this analysis, we consider several specific algorithms, interpret and analyze their convergence properties and provide some recipes for preprocessing input to these algorithms to yield faster convergence behavior. We report empirical results supporting our analysis and showing that simple data preprocessing can result in dramatically improved performance of bound optimizers in practice.
1 Bound Optimization Algorithms
Many problems in machine learning and pattern recognition ultimately reduce to the optimization of a scalar-valued function $L(\Theta)$ of a free parameter vector $\Theta$. For example, in supervised and unsupervised probabilistic modeling the objective function may be the (conditional) data likelihood or the posterior over parameters. In discriminative learning we may use a classification or regression score; in reinforcement learning an average discounted reward. Optimization may also arise during inference; for example we may want to reduce the cross entropy between two distributions or minimize a function such as the Bethe free energy. Bound optimization (BO) algorithms take advantage of the fact that many objective functions arising in practice have a
Efficient estimation of detailed single-neuron models
- Journal of Neurophysiology
, 2006
Cited by 12 (7 self)
Running head: Efficient estimation of detailed single-neuron models
Nonnegative mixed-norm preconditioning for microscopy image segmentation
- Proc. Int. Conf. Information Processing in Med. Imaging
, 2009
Cited by 9 (7 self)
Image segmentation in microscopy, especially in interference-based optical microscopy modalities, is notoriously challenging due to inherent optical artifacts. We propose a general algebraic framework for preconditioning microscopy images. It transforms an image that is unsuitable for direct analysis into an image that can be effortlessly segmented using global thresholding. We formulate preconditioning as the minimization of nonnegative-constrained convex objective functions with smoothness and sparseness-promoting regularization. We propose efficient numerical algorithms for optimizing the objective functions. The algorithms were extensively validated on simulated differential interference contrast (DIC) microscopy images and challenging real DIC images of cell populations. With preconditioning, we achieved unprecedented segmentation accuracy of 97.9% for CNS stem cells, and 93.4% for human red blood cells in challenging images.
On the convergence of concave-convex procedure
- In NIPS Workshop on Optimization for Machine Learning
, 2009
Cited by 7 (0 self)
The concave-convex procedure (CCCP) is a majorization-minimization algorithm that solves d.c. (difference of convex functions) programs as a sequence of convex programs. In machine learning, CCCP is extensively used in many learning algorithms such as sparse support vector machines (SVMs), transductive SVMs, sparse principal component analysis, etc. Though widely used in many applications, the convergence behavior of CCCP has not received much specific attention. Yuille and Rangarajan analyzed its convergence in their original paper; however, we believe the analysis is not complete. Although the convergence of CCCP can be derived from the convergence of the d.c. algorithm (DCA), its proof is more specialized and technical than actually required for the specific case of CCCP. In this paper, we follow a different reasoning and show how Zangwill’s global convergence theory of iterative algorithms provides a natural framework to prove the convergence of CCCP, allowing a more elegant and simple proof. This underlines Zangwill’s theory as a powerful and general framework to deal with the convergence issues of iterative algorithms, after also being used to prove the convergence of algorithms like expectation-maximization, generalized alternating minimization, etc. In this paper, we provide a rigorous analysis of the convergence of CCCP by addressing these questions: (i) When does CCCP find a local minimum or a stationary point of the d.c. program under consideration? (ii) When does the sequence generated by CCCP converge? We also present an open problem on the issue of local convergence of CCCP.
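As a small illustration of the "sequence of convex programs" idea in the abstract above, the sketch below runs CCCP on a one-dimensional d.c. function. The objective $f(x) = x^4 - 2x^2$, its split into $u(x) = x^4$ and $v(x) = 2x^2$, and the function names are assumptions chosen so that each convex subproblem has a closed-form solution; none of this comes from the paper.

```python
import math

def objective(x):
    """Toy d.c. objective f(x) = u(x) - v(x), with u(x) = x**4 and v(x) = 2*x**2."""
    return x ** 4 - 2.0 * x ** 2

def cccp_iterates(x0, steps=60):
    """Run CCCP: linearize the concave part -v at x_k, then minimize the convex surrogate.

    The surrogate is u(x) - v'(x_k) * x = x**4 - 4*x_k*x. Setting its derivative
    to zero gives 4*x**3 = 4*x_k, so the next iterate is the real cube root of x_k.
    """
    x = x0
    history = [x]
    for _ in range(steps):
        x = math.copysign(abs(x) ** (1.0 / 3.0), x)  # sign-preserving cube root
        history.append(x)
    return history
```

Starting from `x0 = 0.5`, the iterates increase toward the stationary point at `x = 1` and the objective values never increase, which is the monotone descent property that majorization-minimization schemes provide. Starting exactly at `x0 = 0` leaves the iterate at the stationary point `0`, which is a local maximum of this toy objective.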

