Results 1-10 of 44
On the equivalence of nonnegative matrix factorization and spectral clustering
In SIAM International Conference on Data Mining, 2005
Cited by 85 (10 self)
Abstract:
Current nonnegative matrix factorization (NMF) deals with the X = FG^T type. We provide a systematic analysis and extensions of NMF to the symmetric W = HH^T and the weighted W = HSH^T. We show that (1) W = HH^T is equivalent to kernel K-means clustering and Laplacian-based spectral clustering; (2) X = FG^T is equivalent to simultaneous clustering of the rows and columns of a bipartite graph. Algorithms are given for computing these symmetric NMFs.
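The symmetric factorization W ≈ HH^T described above can be computed with a multiplicative update. Below is a minimal sketch using the damped rule H ← H ∘ (1 − β + β (WH) / (HH^T H)); the function name, initialization, and test matrix are illustrative assumptions, not the authors' exact algorithm:

```python
import numpy as np

def symmetric_nmf(W, k, n_iter=200, beta=0.5, seed=0, eps=1e-9):
    """Approximate a symmetric nonnegative W by H @ H.T (sketch).

    Damped multiplicative update: H <- H * (1 - beta + beta * (W H) / (H H^T H)).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[0], k))
    for _ in range(n_iter):
        numer = W @ H
        denom = H @ (H.T @ H) + eps
        H *= (1.0 - beta) + beta * numer / denom
    return H

# Two-block similarity matrix; the dominant column of H indicates the cluster.
W = np.array([[1.0, 0.9, 0.1, 0.1],
              [0.9, 1.0, 0.1, 0.1],
              [0.1, 0.1, 1.0, 0.9],
              [0.1, 0.1, 0.9, 1.0]])
H = symmetric_nmf(W, k=2)
labels = H.argmax(axis=1)
```

The argmax-over-columns readout is what gives the factorization its clustering interpretation.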
Orthogonal nonnegative matrix tri-factorizations for clustering
In SIGKDD, 2006
Cited by 66 (18 self)
Abstract:
Currently, most research on nonnegative matrix factorization (NMF) focuses on 2-factor X = FG^T factorization. We provide a systematic analysis of 3-factor X = FSG^T NMF. While unconstrained 3-factor NMF is equivalent to unconstrained 2-factor NMF, constrained 3-factor NMF brings new features to constrained 2-factor NMF. We study the orthogonality constraint because it leads to a rigorous clustering interpretation. We provide new rules for updating F, S, and G and prove the convergence of these algorithms. Experiments on 5 datasets and a real-world case study are performed to show the capability of bi-orthogonal 3-factor NMF on simultaneously clustering rows and columns of the input data matrix. We provide a new approach to evaluating the quality of clustering on words using class aggregate distribution and multi-peak distribution. We also provide an overview of various NMF extensions and examine their relationships.
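The claim that unconstrained 3-factor NMF is equivalent to 2-factor NMF is easy to verify numerically: the middle factor S can be absorbed into the left factor without leaving the nonnegative cone. A minimal sketch (matrices randomly generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
F = rng.random((6, 3))   # left factor
S = rng.random((3, 2))   # middle factor
G = rng.random((5, 2))   # right factor

X3 = F @ S @ G.T         # 3-factor form  X = F S G^T
F2 = F @ S               # absorb S into the left factor; F2 stays nonnegative
X2 = F2 @ G.T            # equivalent 2-factor form  X = F' G^T
```

With orthogonality constraints on F and G this absorption is no longer free, which is why the constrained 3-factor problem is genuinely different.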
Exponentiated gradient algorithms for conditional random fields and max-margin Markov networks
2008
Cited by 59 (1 self)
Abstract:
Log-linear and maximum-margin models are two commonly used methods in supervised machine learning, and are frequently used in structured prediction problems. Efficient learning of parameters in these models is therefore an important problem, and becomes a key factor when learning from very large data sets. This paper describes exponentiated gradient (EG) algorithms for training such models, where EG updates are applied to the convex dual of either the log-linear or max-margin objective function; the dual in both the log-linear and max-margin cases corresponds to minimizing a convex function with simplex constraints. We study both batch and online variants of the algorithm, and provide rates of convergence for both cases. In the max-margin case, O(1/ε) EG updates are required to reach a given accuracy ε in the dual; in contrast, for log-linear models only O(log(1/ε)) updates are required. For both the max-margin and log-linear cases, our bounds suggest that the online EG algorithm requires a factor of n less computation to reach a desired accuracy than the batch EG algorithm, where n is the number of training examples. Our experiments confirm that the online algorithms are much faster than the batch algorithms in practice. We describe how the EG updates factor in a convenient way for structured prediction problems, allowing the algorithms to be ...
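An exponentiated gradient update over the simplex has the multiplicative form w_i ← w_i exp(−η g_i) followed by renormalization. A minimal sketch on a toy convex objective (the objective, step size, and iteration count are illustrative assumptions, not the paper's structured-prediction setting):

```python
import numpy as np

def eg_minimize(grad, w0, eta=0.2, n_iter=2000):
    """Exponentiated gradient: the multiplicative update plus renormalization
    keeps every iterate on the probability simplex."""
    w = w0.copy()
    for _ in range(n_iter):
        w = w * np.exp(-eta * grad(w))
        w /= w.sum()
    return w

# Toy objective f(w) = ||w - p||^2 over the simplex, minimized at w = p.
p = np.array([0.5, 0.3, 0.2])
grad = lambda w: 2.0 * (w - p)
w = eg_minimize(grad, np.full(3, 1.0 / 3.0))
```

Because the update is multiplicative, nonnegativity never has to be enforced explicitly; that is the structural property the paper exploits for the simplex-constrained duals.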
Convex and Semi-Nonnegative Matrix Factorizations
2008
Cited by 45 (4 self)
Abstract:
We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = FG^T, we focus on algorithms in which G is restricted to contain nonnegative entries, but allow the data matrix X to have mixed signs, thus extending the applicable range of NMF methods. We also consider algorithms in which the basis vectors of F are constrained to be convex combinations of the data points. This is used for a kernel extension of NMF. We provide algorithms for computing these new factorizations and supporting theoretical analysis. We also analyze the relationships between our algorithms and clustering algorithms, and consider the implications for sparseness of solutions. Finally, we present experimental results that explore the properties of these new methods.
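A semi-NMF of this kind (X mixed-sign, G ≥ 0) alternates a least-squares solve for F with a multiplicative update for G built from positive and negative parts. The sketch below follows that positive/negative-part form; the function name, test matrix, and iteration counts are illustrative assumptions:

```python
import numpy as np

def _pos(A):
    return (np.abs(A) + A) / 2.0

def _neg(A):
    return (np.abs(A) - A) / 2.0

def seminmf(X, k, n_iter=100, seed=0, eps=1e-9):
    """Semi-NMF sketch: X ~ F @ G.T with X, F mixed-sign and G >= 0."""
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[1], k))
    for _ in range(n_iter):
        # F solves the unconstrained least-squares subproblem exactly.
        F = X @ G @ np.linalg.pinv(G.T @ G)
        # Multiplicative update for G from positive/negative parts.
        XtF, FtF = X.T @ F, F.T @ F
        G *= np.sqrt((_pos(XtF) + G @ _neg(FtF)) /
                     (_neg(XtF) + G @ _pos(FtF) + eps))
    return F, G

X = np.array([[ 1.0, -0.5,  2.0],
              [-1.0,  0.5, -2.0],
              [ 2.0, -1.0,  4.0],
              [ 0.5,  1.5, -0.5]])
F, G = seminmf(X, k=2)
err = np.linalg.norm(X - F @ G.T)
```

Because G starts positive and the update multiplies by a positive ratio, nonnegativity of G is preserved automatically while X and F carry the signs.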
Probability Density Estimation from Optimally Condensed Data Samples
IEEE Trans. Pattern Analysis and Machine Intelligence, 2003
Cited by 36 (0 self)
Abstract:
The requirement to reduce the computational cost of evaluating a point probability density estimate when employing a Parzen window estimator is a well-known problem. This paper presents the Reduced Set Density Estimator, a kernel-based density estimator which employs a small percentage of the available data sample and is optimal in the L2 sense. While only requiring O(N^2) optimization routines to estimate the required kernel weighting coefficients, the proposed method provides similar levels of performance accuracy and sparseness of representation as Support Vector Machine density estimation, which requires O(N^3) optimization routines and has previously been shown to consistently outperform Gaussian Mixture Models. It is also demonstrated that the proposed density estimator consistently provides superior density estimates for similar levels of data reduction to those provided by the recently proposed Density-Based Multiscale Data Condensation algorithm and, in addition, has comparable computational scaling. The additional advantage of the proposed method is that no extra free parameters are introduced, such as regularization, bin width, or condensation ratios, making this method a very simple and straightforward approach to providing a reduced-set density estimator with comparable accuracy to that of the full-sample Parzen density estimator. Index Terms: Kernel density estimation, Parzen window, data condensation, sparse representation.
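For reference, the full-sample Parzen window estimator that the reduced-set method approximates is simply an average of kernels centered at the data points. A minimal 1-D sketch with a Gaussian kernel (bandwidth, sample size, and query points are illustrative choices):

```python
import numpy as np

def parzen_kde(x_query, samples, h):
    """Parzen-window density estimate with a Gaussian kernel (1-D sketch):
    p_hat(x) = (1 / (N h)) * sum_i phi((x - x_i) / h)."""
    z = (x_query[:, None] - samples[None, :]) / h
    k = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return k.mean(axis=1) / h

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=2000)    # N(0, 1) data
xs = np.array([-1.0, 0.0, 1.0])
p_hat = parzen_kde(xs, samples, h=0.3)
```

Each query costs O(N) kernel evaluations, which is exactly the expense the reduced-set estimator cuts by keeping only a small weighted subset of the samples.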
Efficient estimation of detailed single-neuron models
Journal of Neurophysiology, 2006
Cited by 30 (14 self)
Abstract:
Running head: Efficient estimation of detailed single-neuron models
The relationships among various nonnegative matrix factorization methods for clustering
In ICDM, 2006
Cited by 28 (8 self)
Abstract:
Nonnegative matrix factorization (NMF) has recently been shown to be useful for clustering. Various extensions of NMF have also been proposed. In this paper we present an overview and theoretically analyze the relationships among them. In addition, we clarify previously unaddressed issues such as NMF normalization, cluster posterior probability, and NMF algorithm convergence rate. Experiments are also conducted to empirically evaluate and compare various factorization methods.
On the convergence of the concave-convex procedure
In NIPS Workshop on Optimization for Machine Learning, 2009
Cited by 21 (0 self)
Abstract:
The concave-convex procedure (CCCP) is a majorization-minimization algorithm that solves d.c. (difference of convex functions) programs as a sequence of convex programs. In machine learning, CCCP is extensively used in many learning algorithms, such as sparse support vector machines (SVMs), transductive SVMs, and sparse principal component analysis. Though widely used in many applications, the convergence behavior of CCCP has not received much specific attention. Yuille and Rangarajan analyzed its convergence in their original paper; however, we believe the analysis is not complete. Although the convergence of CCCP can be derived from the convergence of the d.c. algorithm (DCA), its proof is more specialized and technical than actually required for the specific case of CCCP. In this paper, we follow a different reasoning and show how Zangwill's global convergence theory of iterative algorithms provides a natural framework to prove the convergence of CCCP, allowing a more elegant and simple proof. This underlines Zangwill's theory as a powerful and general framework for dealing with the convergence issues of iterative algorithms, after also being used to prove the convergence of algorithms such as expectation-maximization and generalized alternating minimization. In this paper, we provide a rigorous analysis of the convergence of CCCP by addressing these questions: (i) When does CCCP find a local minimum or a stationary point of the d.c. program under consideration? (ii) When does the sequence generated by CCCP converge? We also present an open problem on the issue of local convergence of CCCP.
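CCCP's linearize-the-concave-part iteration is easy to see on a toy d.c. objective. Below, f(x) = x^4/4 − x^2/2 is split as u(x) = x^4/4 (convex) minus v(x) = x^2/2 (convex); each CCCP step linearizes v at the current point and solves the resulting convex surrogate exactly. The example is illustrative, not from the paper:

```python
import numpy as np

def cccp_step(x_t):
    # Surrogate: minimize u(x) - v'(x_t) * x = x**4 / 4 - x_t * x.
    # Setting the derivative to zero gives x**3 = x_t.
    return np.cbrt(x_t)

f = lambda x: x ** 4 / 4.0 - x ** 2 / 2.0   # d.c. objective

x = 2.0
trace = [x]
for _ in range(50):
    x = cccp_step(x)
    trace.append(x)
```

Starting from x = 2, the iterates descend monotonically to the stationary point x = 1, illustrating the descent property whose general conditions the paper analyzes via Zangwill's theory.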
On the convergence of bound optimization algorithms
In Proc. 19th Conference on Uncertainty in Artificial Intelligence (UAI '03), 2003
Cited by 20 (0 self)
Abstract:
Many practitioners who use EM and related algorithms complain that they are sometimes slow. When does this happen, and what can be done about it? In this paper, we study the general class of bound optimization algorithms, including EM, Iterative Scaling, Nonnegative Matrix Factorization, and CCCP, and their relationship to direct optimization algorithms such as gradient-based methods for parameter learning. We derive a general relationship between the updates performed by bound optimization methods and those of gradient and second-order methods, and identify analytic conditions under which bound optimization algorithms exhibit quasi-Newton behavior, and under which they possess poor, first-order convergence. Based on this analysis, we consider several specific algorithms, interpret and analyze their convergence properties, and provide some recipes for preprocessing input to these algorithms to yield faster convergence behavior. We report empirical results supporting our analysis and showing that simple data preprocessing can result in dramatically improved performance of bound optimizers in practice.
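The connection between bound optimizers and first-order methods can be sketched on 1-D logistic regression: each loss term's curvature is at most x_i^2/4, so minimizing the resulting quadratic upper bound gives a fixed-step update that is literally gradient descent with step 1/L. The data and the specific bound below are illustrative assumptions, not the paper's experiments:

```python
import numpy as np

# Logistic loss f(w) = sum_i log(1 + exp(-y_i * x_i * w)) in one dimension.
# Each term's curvature is at most x_i**2 / 4, so
#   f(w) <= f(w_t) + f'(w_t) (w - w_t) + (L / 2) (w - w_t)**2,  L = sum(x**2) / 4.
# Minimizing this bound yields the fixed-step update w <- w - f'(w) / L:
# a bound optimizer behaving exactly like first-order gradient descent.

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = np.sign(x + 0.5 * rng.normal(size=100))  # noisy labels, not separable

def f(w):
    return np.sum(np.log1p(np.exp(-y * x * w)))

def grad(w):
    return np.sum(-y * x / (1.0 + np.exp(y * x * w)))

L = np.sum(x ** 2) / 4.0   # global curvature bound
w = 0.0
for _ in range(500):
    w -= grad(w) / L       # minimize the quadratic upper bound at each step
```

Because the bound's curvature L can greatly overestimate the curvature near the optimum, convergence is only first-order here, which is the slow regime the paper characterizes.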
Nonnegative mixed-norm preconditioning for microscopy image segmentation
In Proc. Int. Conf. Information Processing in Med. Imaging, 2009
Cited by 10 (7 self)
Abstract:
Image segmentation in microscopy, especially in interference-based optical microscopy modalities, is notoriously challenging due to inherent optical artifacts. We propose a general algebraic framework for preconditioning microscopy images. It transforms an image that is unsuitable for direct analysis into an image that can be effortlessly segmented using global thresholding. We formulate preconditioning as the minimization of nonnegative-constrained convex objective functions with smoothness- and sparseness-promoting regularization. We propose efficient numerical algorithms for optimizing the objective functions. The algorithms were extensively validated on simulated differential interference contrast (DIC) microscopy images and challenging real DIC images of cell populations. With preconditioning, we achieved unprecedented segmentation accuracy of 97.9% for CNS stem cells and 93.4% for human red blood cells in challenging images.