## Recommender Systems Using Linear Classifiers (2002)

Venue: Journal of Machine Learning Research

Citations: 24 (0 self)

### BibTeX

@ARTICLE{Zhang02recommendersystems,

author = {Tong Zhang and Vijay S. Iyengar},

title = {Recommender Systems Using Linear Classifiers},

journal = {Journal of Machine Learning Research},

year = {2002},

volume = {2},

pages = {313--334}

}

### Abstract

Recommender systems use historical data on user preferences and other available data on users (for example, demographics) and items (for example, taxonomy) to predict items a new user might like. Applications of these methods include recommending items for purchase and personalizing the browsing experience on a website. Collaborative filtering methods have focused on using just the history of user preferences to make the recommendations.

### Citations

9102 | The Nature of Statistical Learning Theory
- Vapnik
- 1995
Citation Context ...rom the sample complexity theory. Slightly different from our approach of forcing threshold $\theta = 0$, and then compensating by appending 1 to each data vector, the standard linear support vector machine (Vapnik, 1998) explicitly includes $\theta$ in a quadratic formulation as follows: $(\hat{w}, \hat{\theta}) = \arg\inf_{w,\theta} \left[ \frac{1}{n} \sum_{i=1}^{n} \xi_i + \lambda w^2 \right]$ (3), s.t. $y_i (w^T x_i - \theta) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \ldots, n$. By eliminating $\xi_i$,... |

5010 | C4.5: Programs for Machine Learning
- Quinlan
- 1993
Citation Context ...decision tree is that we take advantage of the sparse data representation. The algorithm has low memory and time complexity for sparse data. A standard decision-tree package such as the C4.5 program (Quinlan, 1993) that does not take advantage of sparsity in the data will not be able to handle our problems. Although a decision-tree rule-based system is efficient and interpretable, it does not provide the best pe... |

2211 | Support-vector networks - Cortes, Vapnik - 1995 |

1975 | Matrix Computations - Golub, Van Loan - 1996 |

1721 | Text Categorization with Support Vector Machines: Learning with Many Relevant Features
- Joachims
- 1998
Citation Context ... In this paper, we explore the use of various linear classifiers in a model-based approach to the recommendation task. Linear classifiers have been quite successful in the text classification domain (Joachims, 1998, Yang and Chute, 1994, Zhang and Oles, 2001). Some of the characteristics shared between the text and CF domains include the high dimensionality and sparseness of the data in these domains. The main ... |

1142 | GroupLens: an open architecture for collaborative filtering of netnews
- Resnick, Iacovou, et al.
- 1994
Citation Context ...ers and items to recommend items that might be interesting to a new user. One of the earliest techniques developed for recommendations was based on nearest-neighbor collaborative filtering algorithms (Resnick et al., 1994, Shardanand and Maes, 1995) that used just the history of user preferences as input. Sometimes in the literature the term collaborative filtering is used to refer to just these methods. However, we w... |

1129 | Pattern Recognition and Neural Networks
- Ripley
- 1996
Citation Context ...ainly been associated with regression problems, it can also be used in classification. Examples include use in text categorization (Yang and Chute, 1994) and uses in combination with neural networks (Ripley, 1996). The solution of (1) is given by $\hat{w} = \left( \sum_{i=1}^{n} x_i x_i^T \right)^{-1} \left( \sum_{i=1}^{n} x_i y_i \right)$. One problem with the above formulation is that the matrix $\sum_{i=1}^{n} x_i x_i^T$ may be singular or ill-conditioned. Th... |

1049 | Empirical analysis of predictive algorithms for collaborative filtering
- Breese, Heckerman, et al.
- 1998
Citation Context ...ased method uses clustering to group users based on their past preferences. The parameters for this clustering model can be estimated by methods like Gibbs sampling and EM (Ungar and Foster, 1998a,b, Breese et al., 1998). The clustering model explored by Breese et al. (1998) was outperformed by the model-based approach using Bayesian networks and by the memory-based approach CR+ described by Breese et al. (1998). In... |

900 | Social information filtering: Algorithms for automating ‘word of mouth’
- Shardanand, Maes
- 1995
Citation Context ...mend items that might be interesting to a new user. One of the earliest techniques developed for recommendations was based on nearest-neighbor collaborative filtering algorithms (Resnick et al., 1994, Shardanand and Maes, 1995) that used just the history of user preferences as input. Sometimes in the literature the term collaborative filtering is used to refer to just these methods. However, we will follow the taxonomy int... |

770 | A comparison of event models for naive bayes text classification
- McCallum, Nigam
- 1998
Citation Context ...of regularization $\lambda$ can significantly affect the performance. There is another difference between the naive Bayes method described above and the standard naive Bayes method used in text categorization (McCallum and Nigam, 1998). A standard naive Bayes method would have computed $w_j$ as $w_j^1$ and $\theta$ as $\theta^1$. That is, it only uses the in-class data. Although this approach is reasonable in text categorization (in that the quant... |

500 | Ridge regression: biased estimation for nonorthogonal problems. Technometrics
- Hoerl, Kennard
- 1970
Citation Context ...n order to handle large sparse systems, we need to use iterative algorithms which do not rely on matrix factorization techniques. Therefore in this paper, we use the standard ridge regression method (Hoerl and Kennard, 1970) that adds a regularization term to (1): $\hat{w} = \arg\min_{w} \left[ \frac{1}{n} \sum_{i=1}^{n} (w^T x_i y_i - 1)^2 + \lambda w^2 \right]$ (2), where $\lambda$ is an appropriately chosen regularization parameter. The solution... |

500 | An evaluation of statistical approaches to text categorization
- Yang
- 1999
Citation Context ...CR+. CR+ achieves very good performance in the recommender system application. As a comparison, it is also known that nearest neighbor algorithms achieve very good performance in text categorization (Yang, 1999). However, the major disadvantage of this method is the large computational and memory complexity. Consequently, simplifying heuristics have to be used to overcome this problem in practical applicati... |

364 | Analysis of recommendation algorithms for e-commerce
- Sarwar, Karypis, et al.
- 2000
Citation Context ...cussed by Breese et al. (1998). Scalability is an issue with nearest-neighbor methods. The use of dimension reduction techniques like latent semantic indexing has been proposed to address this issue (Sarwar et al., 2000). © 2002 Tong Zhang and Vijay S. Iyengar. In contrast, model-based CF methods use the historical data to build models which are then used for predicting new preferences. A model-ba... |

161 | Clustering methods for collaborative filtering
- Ungar, Foster
- 1998
Citation Context ...al., 2000). Another model-based method uses clustering to group users based on their past preferences. The parameters for this clustering model can be estimated by methods like Gibbs sampling and EM (Ungar and Foster, 1998a,b, Breese et al., 1998). The clustering model explored by Breese et al. (1998) was outperformed by the model-based approach using Bayesian networks and by the memory-based approach CR+ described by ... |

157 | Dependency networks for inference, collaborative filtering, and data visualization - Heckerman, Chickering, et al. - 2000 |

116 | An example-based mapping method for text categorization and retrieval
- Yang, Chute
- 1994
Citation Context ...we explore the use of various linear classifiers in a model-based approach to the recommendation task. Linear classifiers have been quite successful in the text classification domain (Joachims, 1998, Yang and Chute, 1994, Zhang and Oles, 2001). Some of the characteristics shared between the text and CF domains include the high dimensionality and sparseness of the data in these domains. The main computational cost of ... |

79 | The context tree weighting method: Basic properties
- Willems, Shtarkov, et al.
- 1995
Citation Context ...on-tree package, the splitting criterion during tree growth is a modified version of entropy and the tree pruning is done using a Bayesian model combination approach originated from data compression (Willems et al., 1995, Zhang, 1998). A similar approach has been suggested by Kearns and Mansour (1998). One useful aspect of our decision tree is that we take advantage of the sparse data representation. The algorithm ha... |

73 | Maximizing text-mining performance
- Weiss, Apté, et al.
- 1999
Citation Context ...boosting" procedure, which votes on a large number of decision trees. In fact, the best text categorization result on the standard Reuters evaluation dataset is achieved using boosted decision trees (Weiss et al., 1999). Unfortunately, this approach requires a large number of trees (one hundred in Weiss et al., 1999). The resulting system not only loses the interpretability of a single decision tree, but is also co... |

34 | A Fast, Bottom-Up Decision Tree Pruning Algorithm with Near-Optimal Generalization - Kearns, Mansour - 1998 |

26 | Nonlinear markov networks for continuous variables
- Hofmann, Tresp
- 1997
Citation Context ...ch using Bayesian networks was found to be comparable to the memory-based approach of Breese et al. (1998). More recently, models based on a newer graphical representation called dependency networks (Hofmann and Tresp, 1997) have been applied to this problem (Heckerman et al., 2000). For this task, dependency network models seem to have slightly poorer accuracy but require significantly less computation when compared to... |

26 | A formal statistical approach to collaborative filtering - Ungar, Foster - 1998 |

3 | Compression by model combination
- Zhang
- 1998
Citation Context ...plitting criterion during tree growth is a modified version of entropy and the tree pruning is done using a Bayesian model combination approach originated from data compression (Willems et al., 1995, Zhang, 1998). A similar approach has been suggested by Kearns and Mansour (1998). One useful aspect of our decision tree is that we take advantage of the sparse data representation. The algorithm has low memory ... |

1 | Inductive learning algorithms and representations for text categorization
- Dumais, Platt, et al.
- 1998
Citation Context ...in this paper. It is known that in text categorization, the same level of performance achieved by boosted decision trees can be achieved by computationally more efficient linear classification methods (Dumais et al., 1998, Joachims, 1998, Zhang and Oles, 2001). It is thus natural to consider linear classification in the context of CF. 2.2 Linear Models We formally define a two-class categorization problem as one to de... |

1 | A fast, bottom-up decision tree pruning algorithm with near-optimal generalization - Kearns, Mansour - 1998 |