Results 1 - 10 of 721,561
Approximate Classification via Earthmover Metrics
In SODA ’04: Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2004
"... Given a metric space (X, d), a natural distance measure on probability distributions over X is the earthmover metric. We use randomized rounding of earthmover metrics to devise new approximation algorithms for two well-known classification problems, namely, metric labeling and 0-extension. ..."
Cited by 21 (4 self)
An extensive empirical study of feature selection metrics for text classification
J. of Machine Learning Research, 2003
"... Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison ..."
Cited by 483 (15 self)
in different situations. The results reveal that a new feature selection metric we call ‘Bi-Normal Separation’ (BNS) outperformed the others by a substantial margin in most situations. This margin widened in tasks with high class skew, which is rampant in text classification problems and is particularly
Approximate Signal Processing
1997
"... It is increasingly important to structure signal processing algorithms and systems to allow for trading off between the accuracy of results and the utilization of resources in their implementation. In any particular context, there are typically a variety of heuristic approaches to managing these tra ..."
Cited by 516 (2 self)
these tradeoffs. One of the objectives of this paper is to suggest that there is the potential for developing a more formal approach, including utilizing current research in Computer Science on Approximate Processing and one of its central concepts, Incremental Refinement. Toward this end, we first summarize a
Distance Metric Learning, With Application To Clustering With Side-Information
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 15, 2003
"... Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the us ..."
Cited by 799 (14 self)
Similarity search in high dimensions via hashing
1999
"... The nearest- or near-neighbor query problems arise in a large variety of database applications, usually in the context of similarity searching. Of late, there has been increasing interest in building search/index structures for performing similarity search over high-dimensional data, e.g., image dat ..."
Cited by 622 (13 self)
to 20, searching in k-d trees and related structures involves the inspection of a large fraction of the database, thereby doing no better than brute-force linear search. It has been suggested that since the selection of features and the choice of a distance metric in typical applications is rather
Determining the Number of Factors in Approximate Factor Models
2000
Cited by 538 (29 self)
In this paper we develop some statistical theory for factor models of large dimensions. The focus is the determination of the number of factors, which is an unresolved issue in the rapidly growing literature on multifactor models. We propose a panel Cp criterion and show that the number of factors can be consistently estimated using the criterion. The theory is developed under the framework of large cross-sections (N) and large time dimensions (T). No restriction is imposed on the relation between N and T. Simulations show that the proposed criterion yields almost precise estimates of the number of factors for configurations of the panel data encountered in practice. The idea that variations in a large number of economic variables can be modelled by a small number of reference variables is appealing and is used in many economic analyses. In the finance literature, the arbitrage pricing theory (APT) of Ross (1976) assumes that a small number of factors can be used to explain a large number of asset returns.
A Guided Tour to Approximate String Matching
ACM COMPUTING SURVEYS, 1999
Cited by 584 (38 self)
We survey the current techniques to cope with the problem of string matching allowing errors. This is becoming a more and more relevant issue for many fast growing areas such as information retrieval and computational biology. We focus on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms and their complexities. We present a number of experiments to compare the performance of the different algorithms and show which are the best choices according to each case. We conclude with some future work directions and open problems.
Text Classification from Labeled and Unlabeled Documents using EM
MACHINE LEARNING, 1999
"... This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is important because in many text classification problems obtaining training labels is expensive, while large qua ..."
Cited by 1033 (19 self)
Local features and kernels for classification of texture and object categories: a comprehensive study
International Journal of Computer Vision, 2007
"... Recently, methods based on local image features have shown promise for texture and object recognition tasks. This paper presents a large-scale evaluation of an approach that represents images as distributions (signatures or histograms) of features extracted from a sparse set of keypoint locations an ..."
Cited by 644 (35 self)
the influence of background correlations on recognition performance via extensive tests on the PASCAL database, for which ground-truth object localization information is available. Our experiments demonstrate that image representations based on distributions of local features are surprisingly effective