### Citations

7427 | Convex optimization
- Boyd, Vandenberghe
- 2004
Citation Context ...ubproblems explicitly to §7. 5.2 Examples Boolean PCA. Suppose $A \in \{-1, 1\}^{m \times n}$, and we wish to approximate this Boolean matrix. We may take the loss to be $L(u, a) = (1 - au)_+$, which is the hinge loss =-=[BV04]-=-, and solve the problem (17) with or without regularization. When the regularization is sum of squares ($r(x) = \lambda\|x\|_2^2$, $\tilde r(y) = \lambda\|y\|_2^2$), fixing X and minimizing over $y_j$ is equivalent to training a supp... |

2963 |
Robust Statistics
- Huber
- 1981
Citation Context ...r function is defined as $\mathrm{huber}(x) = \begin{cases} (1/2)x^2 & |x| \le 1 \\ |x| - (1/2) & |x| > 1. \end{cases}$ Using Huber loss, $L_{ij}(u, a) = \mathrm{huber}(u - a)$, in place of $\ell_1$ loss also yields an estimator robust to occasionally large outliers =-=[Hub11]-=-. The Huber function is less sensitive to small errors $|u - a|$ than the $\ell_1$ norm, but becomes linear in the error for large errors. This choice of loss function results in a generalized low rank model f... |

2707 | Atomic decomposition by basis pursuit - Chen, Donoho, et al. |

2226 | Finding Groups in Data: An Introduction to Cluster Analysis - Kaufman, Rousseeuw - 1990 |

1663 |
Learning the parts of objects by non-negative matrix factorization
- Lee, Seung
- 1999
Citation Context ...ument is in C and ∞ otherwise.) Then a solution to problem (9) gives the matrix best approximating A that has a nonnegative factorization (i.e., a factorization into elementwise nonnegative matrices) =-=[LS99]-=-. The nonnegative matrix factorization problem has a rich analytical structure [BRRT12, DS14] and a wide range of uses in practice [LS99, SBPP06, BBL+07, Vir07, KP07, FBD09]; hence the literature on i... |

1220 | Algorithms for Non-negative Matrix Factorization - Lee, Seung - 2001 |

995 | Distributed optimization and statistical learning via the alternating direction method of multipliers - Boyd, Parikh, et al. - 2011 |

973 | Quantile Regression - Koenker - 2005 |

958 | Analysis of a complex of statistical variables into principal components - Hotelling - 1933 |

955 | Sparse coding with an overcomplete basis set: a strategy employed by V1?, Vision Res. 37 - Olshausen, Field - 1997 |

928 | K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation
- Aharon, Elad, et al.
- 2006
Citation Context ...e that minimizes the least squares error by setting $x_i$ to be the solution to the corresponding least squares problem. Many other algorithms for this problem have also been proposed, such as the k-SVD =-=[AEB06]-=- and sparse subspace clustering [EV09], some with provable guarantees on the quality of the recovered solution [SC12]. Supervised learning. Sometimes we want to understand the variation that a cert... |

888 |
A technique for the measurement of attitudes
- Likert
- 1932
Citation Context ...<a $(1 - u + a')_+ + \sum_{a' > a} (1 + u - a')_+$, which generalizes the hinge loss to ordinal data. This loss function may be useful for encoding Likert-scale data indicating degrees of agreement with a question =-=[Lik32]-=-. For example, we might have $F_j$ = {strongly disagree, disagree, neither agree nor disagree, agree, strongly agree}. Interval PCA. Suppose that the data $A_{ij} \in \mathbf{R}^2$ are tuples denoting the endpoints of an... |

867 | Exact matrix completion via convex optimization. - Candes, Recht - 2009 |

814 | On lines and planes of closest fit to systems of points in space - Pearson - 1901 |

796 | Signal recovery from random measurements via orthogonal matching pursuit - Tropp, Gilbert - 2007 |

721 | Solving multiclass learning problems via error-correcting output codes
- Dietterich, Bakiri
- 1995
Citation Context ...rs of the model give the optimal matrix Y , while the implied features will populate the optimal matrix X. For example, it is possible to use loss functions derived from error-correcting output codes =-=[DB95]-=-; the Directed Acyclic Graph SVM [PCST99]; the Crammer-Singer multi-class loss [CS02]; or the multi-category SVM [LLW04]. Ordinal PCA. We saw in §5 one way to fit a GLRM to ordinal data. Here, we use ... |

562 | Robust principal component analysis - Candès, Li, et al. |

555 | On the algorithmic implementation of multiclass kernel-based vector machines
- Crammer, Singer
Citation Context ...the optimal matrix X. For example, it is possible to use loss functions derived from error-correcting output codes [DB95]; the Directed Acyclic Graph SVM [PCST99]; the Crammer-Singer multi-class loss =-=[CS02]-=-; or the multi-category SVM [LLW04]. Ordinal PCA. We saw in §5 one way to fit a GLRM to ordinal data. Here, we use a larger embedding dimension for ordinal features. The multi-dimensional embedding wi... |

477 | k-means++: The advantages of careful seeding - Arthur, Vassilvitskii - 2007 |

441 | Efficient sparse coding algorithms - Lee, Battle, et al. |

369 | Large margin DAGs for multiclass classification
- Platt, Cristianini, et al.
Citation Context ... Y , while the implied features will populate the optimal matrix X. For example, it is possible to use loss functions derived from error-correcting output codes [DB95]; the Directed Acyclic Graph SVM =-=[PCST99]-=-; the Crammer-Singer multi-class loss [CS02]; or the multi-category SVM [LLW04]. Ordinal PCA. We saw in §5 one way to fit a GLRM to ordinal data. Here, we use a larger embedding dimension for ordinal ... |

357 | The power of convex relaxation: Near-optimal matrix completion - Candes, Tao - 2009 |

297 | Self-taught learning: transfer learning from unlabeled data - Raina, Battle, et al. - 2007 |

276 | Sparse principal component analysis
- Zou, Hastie, et al.
Citation Context ... or exact solutions (for small s) using the branch and bound method [LW66, BM03]. This regularization can be relaxed to a convex, but still sparsifying, penalty by letting $r(x) = \gamma\|x\|_1$, $\tilde r(y) = \gamma\|y\|_1$ =-=[ZHT06]-=-. Orthogonal nonnegative matrix factorization. One well known property of PCA is that the principal components obtained (i.e., the columns of X and rows of Y) are orthogonal (i.e., $X^T X$ and $Y Y^T$ are ... |

274 | A direct formulation for sparse PCA using semidefinite programming - d’Aspremont, Ghaoui, et al. |

274 | Projected Gradient Methods for Nonnegative Matrix Factorization - Lin - 2007 |

260 | Maximum-margin matrix factorization
- Srebro, Rennie, et al.
- 2005
Citation Context ...d framework to loss functions derived from the generalized Bregman divergence of any convex function, which includes models such as Independent Components Analysis (ICA). Nathan Srebro’s PhD thesis =-=[SRJ04]-=- summarizes a number of models extending the framework to other loss functions (e.g., hinge loss and KL-divergence loss), and adding nuclear norm and max-norm regularization. In [SG08], the authors of... |

254 | Matrix completion with noise - Candes, Plan |

246 | Fast maximum margin matrix factorization for collaborative prediction.
- Rennie, Srebro
- 2005
Citation Context ...ion minimization has been found to underperform relative to other methods [SG08], while semidefinite programming becomes computationally intractable for very large (or even just large) scale problems =-=[RS05]-=-. We have not previously seen a treatment of algorithms for this entire class of problems that can handle large scale data or take advantage of parallel computing resources. Below, we give a number of... |

243 |
Neural networks and principal component analysis: Learning from examples without local minima
- Baldi, Hornik
- 1989
Citation Context ... minima. A.2 Fixed points of alternating minimization Theorem 1. The quadratically regularized PCA problem (2) has only one local minimum, which is the global minimum. Our proof is similar to that of =-=[BH89]-=-, who proved a related theorem for the case of PCA (1). Proof. We showed above that every stationary point of (2) has the form $XY = \sum_{i \in \Omega} u_i d_i v_i^T$, with $\Omega \subseteq \{1, \ldots, k'\}$, $|\Omega| \le k$, and $d_i = \sigma_i - \gamma$.... |

243 | Online dictionary learning for sparse coding - Mairal, Bach, et al. |

238 | Sparse subspace clustering
- Elhamifar, Vidal
- 2009
Citation Context ...r by setting $x_i$ to be the solution to the corresponding least squares problem. Many other algorithms for this problem have also been proposed, such as the k-SVD [AEB06] and sparse subspace clustering =-=[EV09]-=-, some with provable guarantees on the quality of the recovered solution [SC12]. Supervised learning. Sometimes we want to understand the variation that a certain set of features can explain, and t... |

235 | Convex analysis and nonlinear optimization: theory and examples - Borwein, Lewis - 2000 |

211 | Spark: cluster computing with working sets - Zaharia, Chowdhury, et al. - 2010 |

196 | Algorithms and applications for approximate nonnegative matrix factorization - Berry, Browne, et al. - 2007 |

196 | Branch-and-bound methods: A survey - Lawler, Wood - 1966 |

196 | Weighted low-rank approximations
- Srebro, Jaakkola
- 2003
Citation Context ...nt in the matrix A. In the generalized low rank model, we let $L_{ij}(u - a) = w_{ij}(a - u)^2$, where $w_{ij}$ is a weight, and take $r = \tilde r = 0$. Unlike PCA, the weighted PCA problem has no known analytical solution =-=[SJ03]-=-; in fact, it is NP-hard to find an exact solution to weighted PCA [GG11], although it is not known whether approximate solutions of moderate accuracy may always be efficiently obtained. Robust PCA. D... |

194 | Matrix completion from a few entries
- Keshavan, Montanari, et al.
Citation Context ...a few of them. It is a surprising recent result that this is possible: if at least |Ω| = O(nk log n) entries are observed, then the solution to (8) exactly recovers the matrix A with high probability =-=[KMO10]-=-. Alternating minimization. The problem (8) has no known analytical solution, but it is still easy to find a local minimum using alternating minimization. Alternating minimization has been shown to co... |

193 | Supervised dictionary learning - Mairal, Ponce, et al. - 2009 |

182 | Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria - Virtanen - 2007 |

159 | A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization - Burer, Monteiro |

158 | Hogwild: A lock-free approach to parallelizing stochastic gradient descent - Niu, Recht, et al. - 2011 |

153 | A generalization of principal component analysis to the exponential family
- Collins, Dasgupta, et al.
- 2001
Citation Context ...trix factorization algorithms may be viewed in a unified framework, parametrized by a small number of modeling decisions. The first instance we find in the literature of this unified view appeared in =-=[CDS01]-=-, extending PCA to any probabilistic model in the exponential family. Gordon’s Generalized² Linear² models [Gor02] further extended the unified framework to loss functions derived from the generalized... |

152 | Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis - Févotte, Bertin, et al. - 2009 |

147 | Robust principal component analysis: Exact recovery of corrupted low-rank matrices by convex optimization - Wright, Ganesh, et al. |

100 | Document clustering using nonnegative matrix factorization - Shahnaz, Berry, et al. - 2006 |

100 | Sparse principal component analysis via regularized low rank matrix approximation - Shen, Huang - 2008 |

98 | A new approach to collaborative filtering: Operator estimation with spectral regularization - Abernethy, Bach, et al. |

94 | The scree test for the number of factors. Multivariate behavioral research - Cattell - 1966 |

93 |
Large scale image annotation: learning to rank with joint word-image embeddings
- Weston, Bengio, et al.
Citation Context ...ing the top ranked choices. These losses include the area under the curve loss [Ste07], ordered weighted average of pairwise classification losses [UBG09], the weighted approximate-rank pairwise loss =-=[WBU10]-=-, the k-order statistic loss [WYW13], and the accuracy at the top loss [BCMR12]. 6.2 Offsets and scaling Just as in the previous section, better practical performance can often be achieved by allowing... |

89 | Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis
- Kim, Park
- 2007
Citation Context ...of the above properties. As an example, one may require that both X and Y be simultaneously sparse and nonnegative by choosing $r(x) = \|x\|_1 + I_+(x) = \mathbf{1}^T x + I_+(x)$, and similarly for $\tilde r(y)$. Similarly, =-=[KP07]-=- show how to obtain a nonnegative matrix factorization in which one factor is sparse by using $r(x) = \|x\|_1^2 + I_+(x)$ and $\tilde r(y) = \|y\|_2^2 + I_+(y)$; they go on to use this factorization as a clustering techn... |

86 | Robust PCA via outlier pursuit
- Xu, Caramanis, et al.
- 2012
Citation Context ...e matrix $A = L + S + N$ into a low rank matrix L, a sparse matrix S, and a matrix with small Gaussian entries N by minimizing the loss $\|L\|_* + \|S\|_1 + (1/2)\|N\|_F^2$ over all decompositions $A = L + S + N$ of A =-=[XCS12]-=-. In fact, this formulation is equivalent to Huber PCA with quadratic regularization on the factors X and Y. The argument showing this is very similar to the one we made above for robust PCA. The onl... |

83 |
Orthogonal nonnegative matrix t-factorizations for clustering
- Ding, Li, et al.
- 2006
Citation Context ...terpreted as a point in $\mathbf{R}^n$, defines a ray from the origin passing through that point. Orthogonal nonnegative matrix factorization models each row of X as a point along one of these rays. Some authors =-=[DLPP06]-=- have also considered how to obtain a bi-orthogonal nonnegative matrix factorization, in which both X and $Y^T$ have orthogonal columns. By the same argument as above, we see this is equivalent to requi... |

83 | Non-negative matrix factorization based on alternating non-negativity constrained least squares and active set method - Kim, Park - 2008 |

72 | Regression quantiles. Econometrica - Koenker, Bassett - 1978 |

72 | Parallel stochastic gradient algorithms for large-scale matrix completion. Optimization Online - Recht, Ré - 2011 |

64 | A geometric analysis of subspace clustering with outliers
- Soltanolkotabi, Candes
Citation Context ...Many other algorithms for this problem have also been proposed, such as the k-SVD [AEB06] and sparse subspace clustering [EV09], some with provable guarantees on the quality of the recovered solution =-=[SC12]-=-. Supervised learning. Sometimes we want to understand the variation that a certain set of features can explain, and the variance that remains unexplainable. To this end, one natural strategy would... |

57 | A generalized linear model for principal component analysis of binary data
- Schein, Saul, et al.
- 2003
Citation Context ... exp(−au)). With this loss, fixing X and minimizing over yj is equivalent to using logistic regression to predict the labels Aij. This model has been previously considered under the name logistic PCA =-=[SSU03]-=-. Poisson PCA. Now suppose the data Aij are nonnegative integers. We can use any loss function that might be used in a regression framework to predict integral data to construct a generalized low rank... |

56 | A unified view of matrix factorization models
- Singh, Gordon
- 2008
Citation Context ...o’s PhD thesis [SRJ04] summarizes a number of models extending the framework to other loss functions (e.g., hinge loss and KL-divergence loss), and adding nuclear norm and max-norm regularization. In =-=[SG08]-=-, the authors offer a complete view of the state of the literature on matrix factorization as of 2008 in Table 1 of their paper. They note that by changing the loss function and regularization, one ma... |

54 | Low-rank matrix completion using alternating minimization.
- Jain, Netrapalli, et al.
- 2013
Citation Context ...o find a local minimum using alternating minimization. Alternating minimization has been shown to converge geometrically to the global solution when the initial values of X and Y are chosen carefully =-=[JNS13]-=-, but in general the method should be considered a heuristic. 2.5 Interpretations and applications The recovered matrices X and Y in the quadratically regularized PCA problems (2) and (8) admit a numb... |

52 | Incremental gradient, subgradient, and proximal methods for convex optimization: a survey. Optimization for
- Bertsekas
Citation Context ...k = 1/k guarantees convergence to the globally optimal X if Y is fixed, while using a fixed, but sufficiently small, step size α guarantees convergence to a small O(α) neighborhood around the optimum =-=[Ber11]-=-. In numerical experiments, we find that using a fixed step size α on the order of $1/\|g\|_2$ gives fast convergence in practice. Stochastic gradients. Instead of computing the full gradient of L with res... |

51 | Proximal algorithms
- Parikh, Boyd
Citation Context ...ases where the regularizer and loss are (finite) real valued. When either the loss or the regularizer takes on infinite values, we can use a proximal gradient method. The proximal operator of a function f =-=[PB13]-=- is $\mathbf{prox}_f(z) = \operatorname{argmin}_x \left( f(x) + \frac{1}{2}\|x - z\|_2^2 \right)$. If f is the indicator function of a set C, the proximal operator of f is just (Euclidean) projection onto C. A proximal gradient update $\mathrm{update}_{L,r}$ is imp... |

48 | Local minima and convergence in low-rank semidefinite programming. - Burer, Monteiro - 2005 |

45 | Deflation methods for sparse PCA - Mackey - 2009 |

44 | A simple and fast algorithm for k-medoids clustering - Park, Jun - 2009 |

43 | Practical large-scale optimization for max-norm regularization - Lee, Recht, et al. - 2010 |

40 | Nearest q-flat to m points - Tseng |

39 | k-means projective clustering - Agarwal, Mustafa - 2004 |

38 | Factoring nonnegative matrices with linear programs - Bittorf, Recht, et al. - 2012 |

38 | Toward faster nonnegative matrix factorization: a new algorithm and comparisons - Kim, Park - 2008 |

34 | Ranking with ordered weighted pairwise classification
- Usunier, Buffoni, et al.
- 2009
Citation Context ...proposed above, or which prioritize correctly predicting the top ranked choices. These losses include the area under the curve loss [Ste07], ordered weighted average of pairwise classification losses =-=[UBG09]-=-, the weighted approximate-rank pairwise loss [WBU10], the k-order statistic loss [WYW13], and the accuracy at the top loss [BCMR12]. 6.2 Offsets and scaling Just as in the previous section, better pr... |

33 | Fast nonnegative matrix factorization: An active-set-like method and comparisons - Kim, Park - 2011 |

29 | Proximal alternating linearized minimization for nonconvex and nonsmooth problems - Bolte, Sabach, et al. - 2013 |

28 | A tutorial on subspace clustering.
- Vidal
- 2010
Citation Context ...mates a data set by a single low dimensional subspace. We may also be interested in approximating a data set as a union of low dimensional subspaces. This problem is known as subspace clustering (see =-=[Vid10]-=- and references therein). Subspace clustering may also be thought of as generalizing quadratic clustering to assign each data vector to a low dimensional subspace rather than to a single cluster centr... |

28 | Matrix estimation by universal singular value thresholding - Chatterjee |

27 | Julia: A fast dynamic language for technical computing. arXiv preprint arXiv:1209.5145
- Bezanson, Karpinski, et al.
- 2012
Citation Context ...l description and up-to-date information about available functionality, we encourage the reader to consult the on-line documentation. 8.1 Julia implementation LowRankModels is a code written in Julia =-=[BKSE12]-=- for modelling and fitting GLRMs. The implementation is available on-line at https://github.com/madeleineudell/LowRankModels.jl. We discuss some aspects of the usage and features of the code here. For... |

26 |
Generalized² linear² models
- Gordon
- 2002
Citation Context ...isions. The first instance we find in the literature of this unified view appeared in [CDS01], extending PCA to any probabilistic model in the exponential family. Gordon’s Generalized² Linear² models =-=[Gor02]-=- further extended the unified framework to loss functions derived from the generalized Bregman divergence of any convex function, which includes models such as Independent Components Analysis (ICA).... |

25 | Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization. arXiv - Agarwal, Anandkumar, et al. - 2013 |

20 | The Gifi system of nonlinear multivariate analysis. Data analysis and informatics - Leeuw - 1984 |

19 |
Subgradient methods. Lecture notes for EE364b
- Boyd, Xiao, et al.
- 2003
Citation Context ...adient step on the objective. This method can be used as long as $L$, $r$, and $\tilde r$ do not take infinite values. (If any of these functions f is not differentiable, replace $\nabla f$ below by any subgradient of f =-=[BXM03]-=-.) We implement $\mathrm{update}_{L,r}$ as follows. Let $g = \sum_{j : (i,j) \in \Omega} \nabla L_{ij}(x_i y_j, A_{ij}) y_j + \nabla r(x_i)$. Then set $x_i^k = x_i^{k-1} - \alpha_k g$, for some step size $\alpha_k$. For example, a common step size rule is $\alpha_k = 1/k$, which guaran... |

17 | Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework - Kim, He, et al. - 2014 |

16 | 1-bit matrix completion. arXiv preprint arXiv:1209.3672 - Davenport, Plan, et al. - 2012 |

15 | Low-rank matrix approximation with weights or missing data is NP-hard
- Gillis, Glineur
Citation Context ...$w_{ij}(a - u)^2$, where $w_{ij}$ is a weight, and take $r = \tilde r = 0$. Unlike PCA, the weighted PCA problem has no known analytical solution [SJ03]; in fact, it is NP-hard to find an exact solution to weighted PCA =-=[GG11]-=-, although it is not known whether approximate solutions of moderate accuracy may always be efficiently obtained. Robust PCA. Despite its widespread use, PCA is very sensitive to outliers. Many author... |

13 | Gifi methods for optimal scaling in R: The package homals - Leeuw, Mair - 2009 |

7 | Accuracy at the top
- Boyd, Cortes, et al.
- 2012
Citation Context ... [Ste07], ordered weighted average of pairwise classification losses [UBG09], the weighted approximate-rank pairwise loss [WBU10], the k-order statistic loss [WYW13], and the accuracy at the top loss =-=[BCMR12]-=-. 6.2 Offsets and scaling Just as in the previous section, better practical performance can often be achieved by allowing an offset in the model as described in §3.3, and scaling loss functions as des... |

7 |
Hinge rank loss and the area under the ROC curve
- Steck
- 2007
Citation Context ...iterature that interpolate between the first two loss functions proposed above, or which prioritize correctly predicting the top ranked choices. These losses include the area under the curve loss =-=[Ste07]-=-, ordered weighted average of pairwise classification losses [UBG09], the weighted approximate-rank pairwise loss [WBU10], the k-order statistic loss [WYW13], and the accuracy at the top loss [BCMR12]... |

7 |
The information bottleneck method. arXiv preprint physics/0004057
- Tishby, Pereira, et al.
- 2000
Citation Context ...r auto-encoder for the data; among all bilinear low rank encodings (X) and decodings (Y ) of the data, PCA minimizes the squared reconstruction error. Compression. We impose an information bottleneck =-=[TPB00]-=- on the data by using a low rank auto-encoder to fit the data. PCA finds X and Y to maximize the information transmitted through this k-dimensional information bottleneck. We can interpret the solutio... |

7 | It’s not carpal tunnel syndrome!: RSI theory and therapy for computer professionals - Damany, Bellis - 2000 |

6 | Matrix completion and low-rank svd via fast alternating least squares. arXiv
- Hastie, Mazumder, et al.
- 2014
Citation Context ...+10], for modelling and fitting GLRMs. The implementation is available on-line at http://git.io/glrmspark. Design. In SparkGLRM, the data matrix A is split entry-wise across many machines, just as in =-=[HMLZ14]-=-. The model (factors X and Y ) is replicated and stored in memory on every machine. Thus the total computation time required to fit the model is proportional to the number of nonzeros divided by the n... |

5 |
Robust subspace clustering. arXiv preprint arXiv:1301.2603
- Soltanolkotabi, Elhamifar, et al.
- 2013
Citation Context ...th an $\ell_1$ or Huber loss function. For example, k-medoids [KR09, PJ09] is obtained by using $\ell_1$ loss in place of quadratic loss in the quadratic clustering problem. Similarly, robust subspace clustering =-=[SEC13]-=- can be obtained by using an $\ell_1$ or Huber penalty in the subspace clustering problem. Quantile PCA. For some applications, it can be much worse to overestimate the entries of A than to underestimate... |

3 | Random projections for non-negative matrix factorization. arXiv preprint arXiv:1405.4275 - Damle, Sun - 2014 |

3 | Alternating maximization: unifying framework for 8 sparse PCA formulations and efficient parallel codes. arXiv preprint arXiv:1212.4137 - Richtárik, Takáč, et al. - 2012 |

2 | Smallk is a C++/Python high-performance software library for nonnegative matrix factorization (NMF) and hierarchical and flat clustering using the NMF; current version 1.2.0. http://smallk.github.io - Boyd, Drake, et al. - 2014 |

2 | Branch and bound methods. Lecture notes for EE364b - Boyd, Mattingley - 2003 |

2 | Learning to rank recommendations with the k-order statistic loss
- Weston, Yee, et al.
- 2013
Citation Context ...sses include the area under the curve loss [Ste07], ordered weighted average of pairwise classification losses [UBG09], the weighted approximate-rank pairwise loss [WBU10], the k-order statistic loss =-=[WYW13]-=-, and the accuracy at the top loss [BCMR12]. 6.2 Offsets and scaling Just as in the previous section, better practical performance can often be achieved by allowing an offset in the model as described... |

1 |
Personality, motivation and cognitive performance: Final report to the army research institute on contract MDA 903-93-K-0008
- Revelle, Anderson
- 1998
Citation Context ...and ignoring any missing (NA) element in the data frame. This GLRM can then be fit with the function fit. Example. As an example, we fit a GLRM to the Motivational States Questionnaire (MSQ) data set =-=[RA98]-=-. This data set measures 3896 subjects on 92 aspects of mood and personality type, as well as recording the time of day the data were collected. The data include real-valued, Boolean, and ordinal measu... |

1 | Factorbird — a parameter server approach to distributed matrix factorization
- Schelter, Satuluri, et al.
- 2014
Citation Context ...by the number of cores, with the restriction that the model should fit in memory. (The authors leave to future work an extension to models that do not fit in memory, e.g., by using a parameter server =-=[SSZ14]-=-.) Where possible, hardware acceleration (via breeze and BLAS) is used for local linear algebraic operations. ... |

1 | Random walks on context spaces: Towards an explanation of the mysteries of semantic word embeddings. arXiv preprint arXiv:1502.03520 - Arora, Li, et al. - 2015 |

1 | Parallel prefix polymorphism permits parallelization, presentation & proof. arXiv preprint arXiv:1410.6449 - Chen, Edelman - 2014 |

1 | Quadratic programing solver for non-negative matrix factorization with spark - Das, Das - 2014 |