Results 1 - 10 of 12,172
Word Sequences as Features in Text-Learning
- In Proceedings of the 17th Electrotechnical and Computer Science Conference (ERK98)
, 1998
"... This paper proposes an efficient algorithm for the generation of new features that enrich the known bag-of-words document representation. New features are generated based on word sequences of different length. Learning is performed using Naive Bayesian classifier on feature-vectors, where only highl ..."
Cited by 58 (5 self)
Word-sequence kernels
- JOURNAL OF MACHINE LEARNING RESEARCH
, 2003
"... We address the problem of categorising documents using kernel-based methods such as Support Vector Machines. Since the work of Joachims (1998), there is ample experimental evidence that SVM using the standard word frequencies as features yield state-of-the-art performance on a number of benchmark pr ..."
Cited by 22 (0 self)
problems. Recently, Lodhi et al. (2002) proposed the use of string kernels, a novel way of computing document similarity based on matching non-consecutive subsequences of characters. In this article, we propose the use of this technique with sequences of words rather than characters. This approach has
Word Sequences and Intersection Numbers
"... There are certain sequences of words in the generators of a two-generator subgroup of SL(2,C) that frequently arise in the Teichmüller theory of hyperbolic three-manifolds and Kleinian groups. In this paper we establish the connection between two such families, the family of Farey words that have ..."
Numerals and word sequences Summary
"... According to Spelke and Tsivkin (2001), numerals are a linguistic and cognitive bridge between two types of “core” knowledge, that is, subitization of small quantities and approximate representation of large quantities. In this paper I go somewhat their way but I also introduce some a priori constrai ..."
is primarily neither a syntactic nor a semantic peculiarity. It is instead in their morphology. Mastering numerals and names for days of the week means assigning them a certain non-standard morphology, whereby any numeral is mandatorily a non-independent part of a longer sequence. It is hypothesized
Unsupervised learning of human action categories using spatial-temporal words
- In Proc. BMVC
, 2006
"... Imagine a video taken on a sunny beach, can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking and lying on the beach? To automatically classify or localize different actions in video sequences ..."
Cited by 494 (8 self)
Discovery of frequent word sequences in text
- In Proceedings of Pattern Detection and Discovery, pages 180–189
"... Abstract. We have developed a method that extracts all maximal frequent word sequences from the documents of a collection. A sequence is said to be frequent if it appears in more than σ documents, in which σ is the frequency threshold given. Furthermore, a sequence is maximal, if no other frequent ..."
Cited by 13 (0 self)
Authorship Attribution using Word Sequences
"... Abstract. Authorship attribution is the task of identifying the author of a given text. The main concern of this task is to define an appropriate characterization of documents that captures the writing style of authors. This paper proposes a new method for authorship attribution supported on the ide ..."
Cited by 14 (5 self)
on the idea that a proper identification of authors must consider both stylistic and topic features of texts. This method characterizes documents by a set of word sequences that combine functional and content words. The experimental results on poem classification demonstrated that this method outperforms most
Using Word Sequences for Text Summarization
"... Abstract. Traditional approaches for extractive summarization score/classify sentences based on features such as position in the text, word frequency and cue phrases. These features tend to produce satisfactory summaries, but have the inconvenience of being domain dependent. In this paper, we propos ..."
Cited by 5 (0 self)
propose to tackle this problem representing the sentences by word sequences (n-grams), a widely used representation in text categorization. The experiments demonstrated that this simple representation not only diminishes the domain and language dependency but also enhances the summarization performance.
Estimation of probabilities from sparse data for the language model component of a speech recognizer
- IEEE Transactions on Acoustics, Speech and Signal Processing
, 1987
"... Abstract-The description of a novel type of m-gram language model is given. The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. Wh ..."
Cited by 799 (2 self)
, and it is a problem that one always encounters while collecting frequency statistics on words and word sequences (m-grams) from a text of finite size. This means that even for a very large data collection, the maximum likelihood estimation method does not allow Turing’s estimate PT for a probability of a