Results 1 - 10
of
73,548
An extensive empirical study of feature selection metrics for text classification
- J. of Machine Learning Research
, 2003
"... Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison ..."
Abstract
-
Cited by 496 (15 self)
- Add to MetaCart
Machine learning for text classification is the cornerstone of document categorization, news filtering, document routing, and personalization. In text domains, effective feature selection is essential to make the learning task efficient and more accurate. This paper presents an empirical comparison of twelve feature selection methods (e.g. Information Gain) evaluated on a benchmark of 229 text classification problem instances that were gathered from Reuters, TREC, OHSUMED, etc. The results are analyzed from multiple goal perspectives—accuracy, F-measure, precision, and recall—since each is appropriate in different situations. The results reveal that a new feature selection metric we call ‘Bi-Normal Separation ’ (BNS), outperformed the others by a substantial margin in most situations. This margin widened in tasks with high class skew, which is rampant in text classification problems and is particularly challenging for induction algorithms. A new evaluation methodology is offered that focuses on the needs of the data mining practitioner faced with a single dataset who seeks to choose one (or a pair of) metrics that are most likely to yield the best performance. From this perspective, BNS was the top single choice for all goals except precision, for which Information Gain yielded the best result most often. This analysis also revealed, for example, that Information Gain and Chi-Squared have correlated failures, and so they work poorly together. When choosing optimal pairs of metrics for each of the four performance goals, BNS is consistently a member of the pair—e.g., for greatest recall, the pair BNS + F1-measure yielded the best performance on the greatest number of tasks by a considerable margin.
A Language Modeling Approach to Information Retrieval
, 1998
"... Models of document indexing and document retrieval have been extensively studied. The integration of these two classes of models has been the goal of several researchers but it is a very difficult problem. We argue that much of the reason for this is the lack of an adequate indexing model. This sugg ..."
Abstract
-
Cited by 1154 (42 self)
- Add to MetaCart
Models of document indexing and document retrieval have been extensively studied. The integration of these two classes of models has been the goal of several researchers but it is a very difficult problem. We argue that much of the reason for this is the lack of an adequate indexing model
An Empirical Study of Smoothing Techniques for Language Modeling
, 1998
"... We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Br ..."
Abstract
-
Cited by 1224 (21 self)
- Add to MetaCart
We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e
Some informational aspects of visual perception
- Psychol. Rev
, 1954
"... The ideas of information theory are at present stimulating many different areas of psychological inquiry. In providing techniques for quantifying situations which have hitherto been difficult or impossible to quantify, they suggest new and more precise ways of conceptualizing these situations (see M ..."
Abstract
-
Cited by 643 (2 self)
- Add to MetaCart
Miller [12] for a general discussion and bibliography). Events ordered in time are particularly amenable to informational analysis; thus language sequences are being extensively studied, and other sequences, such as those of music, plainly invite research. In this paper I shall indicate some of the ways
A theory of timed automata
, 1999
"... Model checking is emerging as a practical tool for automated debugging of complex reactive systems such as embedded controllers and network protocols (see [23] for a survey). Traditional techniques for model checking do not admit an explicit modeling of time, and are thus, unsuitable for analysis of ..."
Abstract
-
Cited by 2651 (32 self)
- Add to MetaCart
using finitely many real-valued clock variables. Automated analysis of timed automata relies on the construction of a finite quotient of the infinite space of clock valuations. Over the years, the formalism has been extensively studied leading to many results establishing connections to circuits
Local features and kernels for classification of texture and object categories: a comprehensive study
- International Journal of Computer Vision
, 2007
"... Recently, methods based on local image features have shown promise for texture and object recognition tasks. This paper presents a large-scale evaluation of an approach that represents images as distributions (signatures or histograms) of features extracted from a sparse set of keypoint locations an ..."
Abstract
-
Cited by 653 (34 self)
- Add to MetaCart
the influence of background correlations on recognition performance via extensive tests on the PASCAL database, for which ground-truth object localization information is available. Our experiments demonstrate that image representations based on distributions of local features are surprisingly effective
A calculus for cryptographic protocols: The spi calculus
- Information and Computation
, 1999
"... We introduce the spi calculus, an extension of the pi calculus designed for the description and analysis of cryptographic protocols. We show how to use the spi calculus, particularly for studying authentication protocols. The pi calculus (without extension) suffices for some abstract protocols; the ..."
Abstract
-
Cited by 898 (50 self)
- Add to MetaCart
We introduce the spi calculus, an extension of the pi calculus designed for the description and analysis of cryptographic protocols. We show how to use the spi calculus, particularly for studying authentication protocols. The pi calculus (without extension) suffices for some abstract protocols
How practical is network coding?
, 2006
"... With network coding, intermediate nodes between the source and the receivers of an end-to-end communication session are not only capable of relaying and replicating data messages, but also of coding incoming messages to produce coded outgoing ones. Recent studies have shown that network coding is ..."
Abstract
-
Cited by 1016 (23 self)
- Add to MetaCart
layer. We present our observations based on an extensive series of experiments, draw conclusions from a wide range of scenarios, and are more cautious and less optimistic as compared to previous studies.
Matching words and pictures
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
"... We present a new approach for modeling multi-modal data sets, focusing on the specific case of segmented images with associated text. Learning the joint distribution of image regions and words has many applications. We consider in detail predicting words associated with whole images (auto-annotation ..."
Abstract
-
Cited by 665 (40 self)
- Add to MetaCart
, including several which explicitly learn the correspondence between regions and words. We study multi-modal and correspondence extensions to Hofmann’s hierarchical clustering/aspect model, a translation model adapted from statistical machine translation (Brown et al.), and a multi-modal extension to mixture
Happiness is everything, or is it? Explorations on the meaning of psychological well-being
, 1989
"... Reigning measures of psychological well-being have little theoretical grounding, despite an extensive literature on the contours of positive functioning. Aspects of well-being derived from this literature (i.e., self-acceptance, positive relations with others, autonomy, environmental mastery, purpos ..."
Abstract
-
Cited by 595 (14 self)
- Add to MetaCart
Reigning measures of psychological well-being have little theoretical grounding, despite an extensive literature on the contours of positive functioning. Aspects of well-being derived from this literature (i.e., self-acceptance, positive relations with others, autonomy, environmental mastery
Results 1 - 10
of
73,548