CiteSeerX

Results 1 - 10 of 12,172

Word Sequences as Features in Text-Learning

by Dunja Mladenic, Marko Grobelnik - In Proceedings of the 17th Electrotechnical and Computer Science Conference (ERK98), 1998
"... This paper proposes an efficient algorithm for the generation of new features that enrich the known bag-of-words document representation. New features are generated based on word sequences of different length. Learning is performed using Naive Bayesian classifier on feature-vectors, where only highl ..."
Cited by 58 (5 self)

Word-sequence kernels

by Nicola Cancedda, Eric Gaussier, Cyril Goutte, Jean-Michel Renders - Journal of Machine Learning Research, 2003
"... We address the problem of categorising documents using kernel-based methods such as Support Vector Machines. Since the work of Joachims (1998), there is ample experimental evidence that SVM using the standard word frequencies as features yield state-of-the-art performance on a number of benchmark pr ..."
Cited by 22 (0 self)
problems. Recently, Lodhi et al. (2002) proposed the use of string kernels, a novel way of computing document similarity based on matching non-consecutive subsequences of characters. In this article, we propose the use of this technique with sequences of words rather than characters. This approach has

Word Sequences and Intersection Numbers

by Jane Gilman, Linda Keen
"... There are certain sequences of words in the generators of a two-generator subgroup of SL(2,C) that frequently arise in the Teichmüller theory of hyperbolic three-manifolds and Kleinian groups. In this paper we establish the connection between two such families, the family of Farey words that have ..."

Numerals and word sequences

by Roberto Casati
"... According to (Spelke and Tsivkin 2001) numerals are a linguistic and cognitive bridge between two types of “core” knowledge, that is, subitization of small quantities and approximate representation of large quantities. In this paper I go somewhat their way but I also introduce some a priori constrai ..."
is primarily neither a syntactic nor a semantic peculiarity. It is instead in their morphology. Mastering numerals and names for days of the week is assigning them a certain non-standard morphology, whereby any numeral is mandatorily a non-independent part of a longer sequence. It is hypothesized

On Collatz’ Words, Sequences and Trees

by unknown authors, 2014
Abstract not found

Unsupervised learning of human action categories using spatial-temporal words

by Juan Carlos Niebles, Hongcheng Wang, Li Fei-Fei - In Proc. BMVC, 2006
"... Imagine a video taken on a sunny beach, can a computer automatically tell what is happening in the scene? Can it identify different human activities in the video, such as water surfing, people walking and lying on the beach? To automatically classify or localize different actions in video sequences ..."
Cited by 494 (8 self)

Discovery of frequent word sequences in text

by Helena Ahonen-Myka - In Proceedings of Pattern Detection and Discovery, pages 180–189
"... Abstract. We have developed a method that extracts all maximal frequent word sequences from the documents of a collection. A sequence is said to be frequent if it appears in more than σ documents, in which σ is the frequency threshold given. Furthermore, a sequence is maximal, if no other frequent ..."
Cited by 13 (0 self)
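The frequency criterion quoted in this abstract (a sequence is frequent if it appears in more than σ documents) can be illustrated with a minimal sketch. This is not the paper's method: the function name `frequent_word_sequences`, the restriction to contiguous sequences of a fixed length `n`, and the whitespace tokenization are all assumptions made for illustration, and the maximality condition is left out because the snippet truncates its definition.

```python
from collections import defaultdict


def frequent_word_sequences(documents, n, sigma):
    """Return contiguous length-n word sequences found in more than sigma documents.

    Hypothetical helper for illustration only; it covers the document-frequency
    test and nothing else.
    """
    doc_counts = defaultdict(int)
    for text in documents:
        words = text.lower().split()  # assumption: naive whitespace tokenization
        # Build a set so that each sequence is counted at most once per document.
        seen = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
        for seq in seen:
            doc_counts[seq] += 1
    # "Frequent" means strictly more than sigma documents contain the sequence.
    return {seq for seq, count in doc_counts.items() if count > sigma}


docs = [
    "the cat sat on the mat",
    "the cat sat down",
    "a dog sat on the mat",
]
result = frequent_word_sequences(docs, n=2, sigma=1)
```

With these three toy documents and σ = 1, `result` holds the word pairs that occur in at least two documents, such as `("the", "cat")` and `("sat", "on")`.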

Authorship Attribution using Word Sequences

by Rosa María Coyotl-Morales, Luis Villaseñor-Pineda, Manuel Montes-y-Gómez, Paolo Rosso, Laboratorio De Tecnologías Del Lenguaje, Informáticos Computación
"... Abstract. Authorship attribution is the task of identifying the author of a given text. The main concern of this task is to define an appropriate characterization of documents that captures the writing style of authors. This paper proposes a new method for authorship attribution supported on the ide ..."
Cited by 14 (5 self)
on the idea that a proper identification of authors must consider both stylistic and topic features of texts. This method characterizes documents by a set of word sequences that combine functional and content words. The experimental results on poem classification demonstrated that this method outperforms most

Using Word Sequences for Text Summarization

by Esaú Villatoro-Tello, Luis Villaseñor-Pineda, Manuel Montes-y-Gómez
"... Abstract. Traditional approaches for extractive summarization score/classify sentences based on features such as position in the text, word frequency and cue phrases. These features tend to produce satisfactory summaries, but have the inconvenience of being domain dependent. In this paper, we propos ..."
Cited by 5 (0 self)
propose to tackle this problem representing the sentences by word sequences (n-grams), a widely used representation in text categorization. The experiments demonstrated that this simple representation not only diminishes the domain and language dependency but also enhances the summarization performance.

Estimation of probabilities from sparse data for the language model component of a speech recognizer

by Slava M. Katz - IEEE Transactions on Acoustics, Speech and Signal Processing, 1987
"... Abstract. The description of a novel type of m-gram language model is given. The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. Wh ..."
Cited by 799 (2 self)
... and it is a problem that one always encounters while collecting frequency statistics on words and word sequences (m-grams) from a text of finite size. This means that even for a very large data collection, the maximum likelihood estimation method does not allow Turing’s estimate PT for a probability of a

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University