Results 1 - 10 of 96
Adaptive Information Filtering: Evolutionary Computation and N-gram Representation
2000
"... Adaptive Information Filtering (AIF) is concerned with filtering information streams in changing environments. The changes may occur both on the transmission side (the nature of the streams can change) and on the reception side (the interests of a user can change). The research described i ..."
Cited by 1 (0 self)
in this paper details the progress made in a prototype AIF system based on weighted n-gram analysis and evolutionary computation. A major advance is the design and implementation of an n-gram class library allowing experimentation with different values of n instead of solely with 3-grams as in the past. The new
Smoothed bloom filter language models: Tera-scale LMs on the cheap
In Proc. of ACL, 2007
"... A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements fall significantly below lossless information-theoretic lower bounds but it produces false positives with some quantifiable probability. Here we present a general framework for deriving smoothed lan ..."
Cited by 22 (1 self)
probabilities can be derived efficiently from this randomised representation. Our proposal takes advantage of the one-sided error guarantees of the BF and simple inequalities that hold between related n-gram statistics in order to further reduce the BF storage requirements and the error rate of the derived
Learning Distributed Representations for Statistical Language Modelling and Collaborative Filtering
2010
"... With the increasing availability of large datasets machine learning techniques are becoming an increasingly attractive alternative to expert-designed approaches to solving complex problems in domains where data is abundant. In this thesis we introduce several models for large sparse discrete datase ..."
-valued vectors. Two of the models are based on the Restricted Boltzmann Machine (RBM) architecture while the third one is a simple deterministic model. We show that the deterministic model outperforms the widely used n-gram models and learns sensible word representations. To reduce the time complexity
Privacy-Preserving Spam Filtering
"... Email is a private medium of communication, and the inherent privacy constraints form a major obstacle in developing effective spam filtering methods which require access to a large amount of email data belonging to multiple users. To mitigate this problem, we envision a privacy preserving spam f ..."
and security, and perform experiments of a prototype system on a large scale spam filtering task. State of the art spam filters often use character n-grams as features which result in large sparse data representation, which is not feasible to be used directly with our training and evaluation protocols. We
Request for Comments: 5784
"... Sieve Email Filtering: Sieves and Display Directives in XML This document describes a way to represent Sieve email filtering language scripts in XML. Representing Sieves in XML is intended not as an alternate storage format for Sieve but rather as a means to facilitate manipulation of scripts using ..."
Texplore - Exploring Expository Texts Via Hierarchical Representation
"... Exploring expository texts presents an interesting and important challenge. They are read routinely and extensively in the form of online newspapers, web-based articles, reports, technical and academic papers. We present a system, called Texplore, which assists readers in exploring the content of ..."
with hierarchical agglomerative clustering. The list of concepts are discovered by n-gram analysis filtered by part-of-speech patterns. Rather than the common presentation of documents by static abstracts, Texplore provides dynamic presentation of the text's content, where the user controls the level
Feature Representation for Effective Action-Item Detection
In ACM SIGIR Special Interest Group on Information Retrieval, 2005
"... E-mail users face an ever-growing challenge in managing their inboxes due to the growing centrality of email in the workplace for task assignment, action requests, and other roles beyond information dissemination. Whereas Information Retrieval and Machine Learning techniques are gaining initial acce ..."
Cited by 3 (2 self)
, action-item detection requires inferring the sender’s intent, and as such responds less well to pure bag-of-words classification. However, using enriched feature sets, such as n-grams (up to n=4) with chi-squared feature selection, and contextual cues for action-item location improve performance by up
A Computational Approach to the Discovery and Representation of Lexical Chunks
"... Knowledge of lexical chunks is now recognized as an essential skill for second-language learning. We study two of the main problems that chunks pose in lexicography and present computational methods for solving ..."
the learner's disposal. To solve the first problem, we propose a greedy algorithm, run on a 20-million-word corpus from the BNC, that repeats word-association measures over increasingly long n-grams. This approach prioritizes high recall and
Representation Models for Text Classification: a comparative analysis over three Web document types
"... Text classification constitutes a popular task in Web research with various applications that range from spam filtering to sentiment analysis. To address it, patterns of cooccurring words or characters are typically extracted from the textual content of Web documents. However, not all documents are ..."
Cited by 2 (0 self)
. In addition, we consider a novel approach that improves the performance of topic classification across all types of Web documents: namely the n-gram graphs. This model goes beyond the established bag-of-words one, representing each document as a graph. Individual graphs can be combined into a class graph
Comment on ‘‘Low frequency variability in globally integrated tropical
"... Res. Lett., 34, L11703, doi:10.1029/2006GL028283. [1] Sriver and Huber [2006] (hereinafter referred to as SH06), in an effort to examine low frequency tropical cyclone (TC) intensity trends, utilized atmospheric reanalysis data (ERA40 [Uppala et al., 2005] and NNR [Kalnay et al., 1996]) to develop a ..."
that the ERA40 TC PD climatology was an independent, uncorrected, and robust representation of trends in global TC activity. Furthermore, SH06 concluded that the power dissipation index (PDI) developed by E05 was an accurate estimate of the PD. In this comment, we