Results 1 - 5 of 5
Incremental Clustering and Dynamic Information Retrieval
, 1997
Cited by 129 (3 self)
Motivated by applications such as document and image classification in information retrieval, we consider the problem of clustering dynamic point sets in a metric space. We propose a model called incremental clustering, which is based on a careful analysis of the requirements of the information retrieval application and which should also be useful in other applications. The goal is to efficiently maintain clusters of small diameter as new points are inserted. We analyze several natural greedy algorithms and demonstrate that they perform poorly. We propose new deterministic and randomized incremental clustering algorithms with provably good performance. We complement our positive results with lower bounds on the performance of incremental algorithms. Finally, we consider the dual clustering problem, where the clusters are of fixed diameter and the goal is to minimize the number of clusters.
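As a rough illustration of the dual problem mentioned at the end of this abstract (clusters of bounded size, as few clusters as possible), here is a minimal sketch of a simple greedy rule. It is not the algorithm from the paper: each arriving point joins the first cluster whose center lies within a fixed radius, and otherwise opens a new cluster. The function name `greedy_fixed_radius`, the Euclidean metric, and the `radius` parameter are assumptions made only for this example.

```python
import math

def greedy_fixed_radius(points, radius):
    """Process points one at a time (hypothetical helper, not from the paper).

    A point joins the first existing cluster whose center is within `radius`;
    otherwise it becomes the center of a new cluster. By the triangle
    inequality, every cluster then has diameter at most 2 * radius.
    """
    centers = []   # one representative point per cluster
    clusters = []  # clusters[i] holds the points assigned to centers[i]
    for p in points:
        for i, c in enumerate(centers):
            if math.dist(p, c) <= radius:
                clusters[i].append(p)
                break
        else:
            # No existing center is close enough: open a new cluster.
            centers.append(p)
            clusters.append([p])
    return centers, clusters

# Example stream of 2-D points arriving in order (made-up data).
stream = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 4.9), (0.1, 0.3)]
centers, clusters = greedy_fixed_radius(stream, radius=1.0)
```

This rule never reassigns a point once it is placed, which keeps the per-insertion work proportional to the current number of clusters; it carries no claim about how close the cluster count is to optimal.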
Anti-Aliasing on the Web
, 2004
Cited by 17 (4 self)
It is increasingly common for users to interact with the web using a number of different aliases. This trend is a double-edged sword. On one hand, it is a fundamental building block in approaches to online privacy. On the other hand, there are economic and social consequences to allowing each user an arbitrary number of free aliases. Thus, there is great interest in understanding the fundamental issues in obscuring the identities behind aliases.
AdROSA – Adaptive personalization of web advertising
- Information Sciences
, 2007
Cited by 7 (2 self)
One of the greatest and most recent challenges for online advertising is the use of adaptive personalization at the same time that the Internet continues to grow as a global market. Most existing solutions to online advertising placement are based on demographic targeting or on information gained directly from the user. The AdROSA system for automatic web banner personalization, which integrates web usage and content mining techniques to reduce user input and to respect users' privacy, is presented in the paper. Furthermore, certain advertising policies, important factors for both publishers and advertisers, are taken into consideration. The integration of all the relevant information is accomplished in one vector space to enable online and fully personalized advertising.
Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity
Cited by 5 (0 self)
Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-center clustering problem. We give a streaming $(4+\epsilon)$-approximation algorithm using $O(\epsilon^{-1} k z)$ memory for the problem with outliers, in which the clustering is allowed to drop up to $z$ of the input points; previous work used a random sampling approach which yields only a bicriteria approximation. We also give a streaming $(6+\epsilon)$-approximation algorithm using $O(\epsilon^{-1} \ln(\epsilon^{-1}) k + k^2)$ memory for a variation motivated by anonymity considerations in which each cluster must contain at least a certain number of input points.
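To make the outlier objective in this abstract concrete, the following minimal sketch evaluates the cluster radius of a given set of centers when up to z points may be dropped. It only computes the cost being approximated; it is not the streaming algorithm from the paper. The function name `radius_with_outliers`, the Euclidean metric, and the sample data are assumptions made for illustration.

```python
import math

def radius_with_outliers(points, centers, z):
    """Cluster radius for `centers` when the z farthest points are dropped.

    Hypothetical helper: each point is charged its distance to the nearest
    center, the z largest charges are discarded as outliers, and the largest
    remaining charge is returned (0.0 if nothing remains).
    """
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    kept = dists[: max(len(dists) - z, 0)]
    return kept[-1] if kept else 0.0

# Made-up data: two tight groups plus one far-away point.
points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0), (100.0, 100.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

radius_all = radius_with_outliers(points, centers, z=0)      # far point dominates
radius_dropped = radius_with_outliers(points, centers, z=1)  # far point dropped
```

With z = 0 the single distant point determines the radius, while allowing one outlier reduces the radius to that of the two tight groups, which is the effect the outlier variant is designed to capture.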
Integration And Reuse Of Heterogeneous XML DTDs For Information Agents
, 2001
This paper proposes a novel approach to integrating heterogeneous XML DTDs

