Results 1 - 10
of
274
Projected gradient methods for non-negative matrix factorization
- Neural Computation
, 2007
"... Non-negative matrix factorization (NMF) can be formulated as a minimiza-tion problem with bound constraints. Although bound-constrained optimization has been studied extensively in both theory and practice, so far no study has formally applied its techniques to NMF. In this paper, we propose two pro ..."
Abstract
-
Cited by 76 (1 self)
- Add to MetaCart
Non-negative matrix factorization (NMF) can be formulated as a minimiza-tion problem with bound constraints. Although bound-constrained optimization has been studied extensively in both theory and practice, so far no study has formally applied its techniques to NMF. In this paper, we propose two projected gradient methods for NMF, both of which exhibit strong optimization properties. We discuss efficient implementations and demonstrate that one of the proposed methods converges faster than the popular multiplicative update approach. A simple MATLAB code is also provided. 1
Letor: Benchmark dataset for research on learning to rank for information retrieval
- In Proceedings of SIGIR 2007 Workshop on Learning to Rank for Information Retrieval
, 2007
"... This paper is concerned with learning to rank for information retrieval (IR). Ranking is the central problem for information retrieval, and employing machine learning techniques to learn the ranking function is viewed as a promising approach to IR. Unfortunately, there was no benchmark dataset that ..."
Abstract
-
Cited by 73 (11 self)
- Add to MetaCart
This paper is concerned with learning to rank for information retrieval (IR). Ranking is the central problem for information retrieval, and employing machine learning techniques to learn the ranking function is viewed as a promising approach to IR. Unfortunately, there was no benchmark dataset that could be used in comparison of existing learning algorithms and in evaluation of newly proposed algorithms, which stood in the way of the related research. To deal with the problem, we have constructed a benchmark dataset referred to as LETOR and distributed it to the research communities. Specifically we have derived the LETOR data from the existing data sets widely used in IR, namely, OHSUMED and TREC data. The two collections contain queries, the contents of the retrieved documents, and human judgments on the relevance of the documents with respect to the queries. We have extracted features from the datasets, including both conventional features, such as term frequency, inverse document frequency, BM25, and language models for IR, and features proposed recently at SIGIR, such as HostRank, feature propagation, and topical PageRank. We have then packaged LETOR with the extracted features, queries, and relevance judgments. We have also provided the results of several state-ofthe-arts learning to rank algorithms on the data. This paper describes in details about LETOR.
Feature generation for text categorization using world knowledge
- In IJCAI’05
, 2005
"... We enhance machine learning algorithms for text categorization with generated features based on domain-specific and common-sense knowledge. This knowledge is represented using publicly available ontologies that contain hundreds of thousands of concepts, such as the Open Directory; these ontologies a ..."
Abstract
-
Cited by 62 (13 self)
- Add to MetaCart
We enhance machine learning algorithms for text categorization with generated features based on domain-specific and common-sense knowledge. This knowledge is represented using publicly available ontologies that contain hundreds of thousands of concepts, such as the Open Directory; these ontologies are further enriched by several orders of magnitude through controlled Web crawling. Prior to text categorization, a feature generator analyzes the documents and maps them onto appropriate ontology concepts, which in turn induce a set of generated features that augment the standard bag of words. Feature generation is accomplished through contextual analysis of document text, implicitly performing word sense disambiguation. Coupled with the ability to generalize concepts using the ontology, this approach addresses the two main problems of natural language processing—synonymy and polysemy. Categorizing documents with the aid of knowledge-based features leverages information that cannot be deduced from the documents alone. Experimental results confirm improved performance, breaking through the plateau previously reached in the field. 1
Multi-label classification: An overview
- Int J Data Warehousing and Mining
, 2007
"... Nowadays, multi-label classification methods are increasingly required by modern applications, such as protein function classification, music categorization and semantic scene classification. This paper introduces the task of multi-label classification, organizes the sparse related literature into a ..."
Abstract
-
Cited by 46 (6 self)
- Add to MetaCart
Nowadays, multi-label classification methods are increasingly required by modern applications, such as protein function classification, music categorization and semantic scene classification. This paper introduces the task of multi-label classification, organizes the sparse related literature into a structured presentation and performs comparative experimental results of certain multi-label classification methods. It also contributes the definition of concepts for the quantification of the multi-label nature of a data set.
Text Categorization with Many Redundant Features: Using Aggressive Feature Selection to Make SVMs Competitive with C4.5
- In ICML’04
, 2004
"... Text categorization algorithms usually represent documents as bags of words and consequently have to deal with huge numbers of features. Most previous studies found that the majority of these features are relevant for classification, and that the performance of text categorization with support ..."
Abstract
-
Cited by 43 (4 self)
- Add to MetaCart
Text categorization algorithms usually represent documents as bags of words and consequently have to deal with huge numbers of features. Most previous studies found that the majority of these features are relevant for classification, and that the performance of text categorization with support vector machines peaks when no feature selection is performed.
A geometric approach to monitoring threshold functions over distributed data streams
- In ACM SIGMOD
, 2006
"... Monitoring data streams in a distributed system is the focus of much research in recent years. Most of the proposed schemes, however, deal with monitoring simple aggregated values, such as the frequency of appearance of items in the streams. More involved challenges, such as the important task of fe ..."
Abstract
-
Cited by 40 (5 self)
- Add to MetaCart
Monitoring data streams in a distributed system is the focus of much research in recent years. Most of the proposed schemes, however, deal with monitoring simple aggregated values, such as the frequency of appearance of items in the streams. More involved challenges, such as the important task of feature selection (e.g., by monitoring the information gain of various features), still require very high communication overhead using naive, centralized algorithms. We present a novel geometric approach by which an arbitrary global monitoring task can be split into a set of constraints applied locally on each of the streams. The constraints are used to locally filter out data increments that do not affect the monitoring outcome, thus avoiding unnecessary communication. As a result, our approach enables monitoring of arbitrary threshold functions over distributed data streams in an efficient manner. We present experimental results on real-world data which demonstrate that our algorithms are highly scalable, and considerably reduce communication load in comparison to centralized algorithms. 1.
Large scale transductive svms
- JMLR
"... We show how the Concave-Convex Procedure can be applied to Transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is a ..."
Abstract
-
Cited by 38 (2 self)
- Add to MetaCart
We show how the Concave-Convex Procedure can be applied to Transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at
Efficient Projections onto the ℓ1-Ball for Learning in High Dimensions
"... We describe efficient algorithms for projecting a vector onto the ℓ1-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors k of whose elements are perturbed outside the ℓ1-ball, proje ..."
Abstract
-
Cited by 36 (8 self)
- Add to MetaCart
We describe efficient algorithms for projecting a vector onto the ℓ1-ball. We present two methods for projection. The first performs exact projection in O(n) expected time, where n is the dimension of the space. The second works on vectors k of whose elements are perturbed outside the ℓ1-ball, projecting in O(k log(n)) time. This setting is especially useful for online learning in sparse feature spaces such as text categorization applications. We demonstrate the merits and effectiveness of our algorithms in numerous batch and online learning tasks. We show that variants of stochastic gradient projection methods augmented with our efficient projection procedures outperform interior point methods, which are considered state-of-the-art optimization techniques. We also show that in online settings gradient updates with ℓ1 projections outperform the exponentiated gradient algorithm while obtaining models with high degrees of sparsity. 1.
Trust region Newton method for large-scale logistic regression
- In Proceedings of the 24th International Conference on Machine Learning (ICML
, 2007
"... Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in ..."
Abstract
-
Cited by 35 (5 self)
- Add to MetaCart
Large-scale logistic regression arises in many applications such as document classification and natural language processing. In this paper, we apply a trust region Newton method to maximize the log-likelihood of the logistic regression model. The proposed method uses only approximate Newton steps in the beginning, but achieves fast convergence in the end. Experiments show that it is faster than the commonly used quasi Newton approach for logistic regression. We also compare it with existing linear SVM implementations. 1

