Results 1 - 5 of 5
Incremental Clustering and Dynamic Information Retrieval
1997
Abstract

Cited by 153 (5 self)
Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering which is based on a careful analysis of the requirements of the information retrieval application, and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms which have provably good performance. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem where the clusters are of fixed diameter, and the goal is to minimize the number of clusters. 1 Introduction We consider the following problem: as a sequence of points from a metric...
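To make the incremental setting concrete, the following is a minimal Python sketch of one way to keep at most k clusters as points arrive. It is an illustration only, not the algorithm analyzed in the paper: the class name `IncrementalClusterer`, the `threshold` field, and the threshold-doubling merge rule are assumptions made for this example, and no approximation guarantee is claimed for it.

```python
import math
from itertools import combinations


class IncrementalClusterer:
    """Keeps at most k cluster centers as points arrive one at a time (assumes k >= 1)."""

    def __init__(self, k, dist=math.dist):
        self.k = k
        self.dist = dist        # any metric; Euclidean by default
        self.threshold = 0.0    # merge threshold (hypothetical name), doubled on overflow
        self.centers = []

    def insert(self, p):
        # A point close enough to an existing center joins that cluster.
        if any(self.dist(p, c) <= self.threshold for c in self.centers):
            return
        # Otherwise it opens a new cluster, which may exceed the budget k.
        self.centers.append(p)
        while len(self.centers) > self.k:
            self._grow_and_merge()

    def _grow_and_merge(self):
        if self.threshold == 0.0:
            # First overflow: start from the distance of the closest pair of centers.
            self.threshold = min(
                self.dist(a, b) for a, b in combinations(self.centers, 2)
            )
        else:
            self.threshold *= 2
        # Keep a center only if it is farther than the threshold from every kept center.
        kept = []
        for c in self.centers:
            if all(self.dist(c, q) > self.threshold for q in kept):
                kept.append(c)
        self.centers = kept


clusterer = IncrementalClusterer(k=2)
for point in [(0, 0), (1, 0), (10, 0)]:
    clusterer.insert(point)
print(clusterer.centers)  # [(0, 0), (10, 0)]
```

Note that `threshold` is only a merge rule here. After merges, a point absorbed earlier may lie farther than `threshold` from the surviving center, so the value should not be read as a bound on cluster diameter.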
Anti-Aliasing on the Web
2004
Abstract

Cited by 20 (4 self)
It is increasingly common for users to interact with the web using a number of different aliases. This trend is a double-edged sword. On one hand, it is a fundamental building block in approaches to online privacy. On the other hand, there are economic and social consequences to allowing each user an arbitrary number of free aliases. Thus, there is great interest in understanding the fundamental issues in obscuring the identities behind aliases.
AdROSA – Adaptive personalization of web advertising
Information Sciences, 2007
Abstract

Cited by 15 (4 self)
One of the greatest and most recent challenges for online advertising is the use of adaptive personalization at the same time that the Internet continues to grow as a global market. Most existing solutions to online advertising placement are based on demographic targeting or on information gained directly from the user. The AdROSA system for automatic web banner personalization, which integrates web usage and content mining techniques to reduce user input and to respect users' privacy, is presented in the paper. Furthermore, certain advertising policies, important factors for both publishers and advertisers, are taken into consideration. The integration of all the relevant information is accomplished in one vector space to enable online and fully personalized advertising.
Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity
Abstract

Cited by 6 (0 self)
Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-center clustering problem. We give a streaming $(4+\epsilon)$-approximation algorithm using $O(\epsilon^{-1} kz)$ memory for the problem with outliers, in which the clustering is allowed to drop up to $z$ of the input points; previous work used a random sampling approach which yields only a bicriteria approximation. We also give a streaming $(6+\epsilon)$-approximation algorithm using $O(\epsilon^{-1} \ln(\epsilon^{-1}) k + k^2)$ memory for a variation motivated by anonymity considerations in which each cluster must contain at least a certain number of input points.
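The outlier objective described in this abstract can be made concrete with a short Python sketch. It only evaluates the cost of a given set of centers when up to z points may be dropped. It is not the streaming algorithm from the paper, and the function name `radius_with_outliers` and its parameters are assumptions made for this example.

```python
import math


def radius_with_outliers(points, centers, z, dist=math.dist):
    """Smallest radius such that balls of that radius around `centers`
    cover all but at most z of `points` (hypothetical helper)."""
    # Distance from each point to its nearest center, in increasing order.
    nearest = sorted(min(dist(p, c) for c in centers) for p in points)
    # Dropping the z farthest points leaves the first len(points) - z distances.
    kept = nearest[: max(len(nearest) - z, 0)]
    return kept[-1] if kept else 0.0


points = [(0, 0), (1, 0), (10, 0), (100, 0)]
centers = [(0, 0), (10, 0)]
print(radius_with_outliers(points, centers, z=0))  # 90.0
print(radius_with_outliers(points, centers, z=1))  # 1.0
```

In this toy input, allowing a single outlier reduces the radius from 90.0 to 1.0, which is why the variant with outliers is treated as a separate problem from plain k-center.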
Integration And Reuse Of Heterogeneous XML DTDs For Information Agents
2001
Abstract
This paper proposes a novel approach to integrating heterogeneous XML DTDs