Searching for authors named "Dennis Fetterly" – sorted by Relevance.
-
Detecting phrase-level duplication on the world wide web
- Two years ago, we conducted a study on the evolution of web pages over time. In the course of that study, we discovered a large number of machine-generated “spam” web pages emanating from a handful of web servers in Germany. These spam web pages were dynamically assembled by stitching together gram
- Cited by 23 (1 self)
-
Measuring the Search Effectiveness of a Breadth-First Crawl
- Previous scalability experiments found that early precision improves as collection size increases. However, that was under the assumption that a collection’s documents are all sampled with uniform probability from the same population. We contrast this to a large breadth-first web crawl, an
-
Spam, Damn Spam, and Statistics: Using statistical analysis to locate spam web pages
- The increasing importance of search engines to commercial web sites has given rise to a phenomenon we call "web spam", that is, web pages that exist only to mislead search engines into (mis)leading users to certain web sites. Web spam is a nuisance to users as well as search engines: users have a ha
- Cited by 6 (0 self)
-
On the Evolution of Clusters of Near-Duplicate Web Pages
- This paper expands on a 1997 study of the amount and distribution of near-duplicate pages on the World Wide Web. We downloaded a set of 150 million web pages on a weekly basis over the span of 11 weeks. We then determined which of these pages are near-duplicates of one another, and tracked how clust
- Cited by 27 (2 self)
-
Kumar Chellapilla Microsoft Live Labs
- Adversarial IR in general, and search engine spam, in particular, are engaging research topics with a real-world impact for Web users, advertisers and publishers. The AIRWeb workshop will bring researchers and practitioners in these areas together, to present and discuss state-of-the-art techniques
-
Implementing Portable Desktops: A New Option and Comparisons. Microsoft Corporation
- We consider the problem of using a wide variety of computers, dispersed geographically and with varied connectivity, while wanting to have each of them provide you with the same, personalized, desktop environment: operating system, customizations, applications and your documents. We descri
- Cited by 2 (1 self)
-
The Impact of Crawl Policy on Web Search Effectiveness
- Crawl selection policy has a direct influence on Web search effectiveness, because a useful page that is not selected for crawling will also be absent from search results. Yet there has been little or no work on measuring this effect. We introduce an evaluation framework, based on relevance judgment
-
A large-scale study of the evolution of web pages
- How fast does the web change? Does most of the content remain unchanged once it has been authored, or are the documents continuously updated? Do pages change a little or a lot? Is the extent of change correlated to any other property of the page? All of these questions are of interest to those who m
- Cited by 102 (5 self)
-
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language
- DryadLINQ is a system and a set of language extensions that enable a new programming model for large scale distributed computing. It generalizes previous execution environments such as SQL, MapReduce, and Dryad in two ways: by adopting an expressive data model of strongly typed .NET objects; and by s
- Cited by 11 (2 self)

