Results 1 - 10 of 13
Methodologies for Crawler Based Web Surveys
, 2002
Cited by 13 (6 self)
There have been many attempts to study the content of the web, either through human or automatic agents. Five different previously used web survey methodologies are described and analysed, each justifiable in its own right, but a simple experiment is presented that demonstrates concrete differences between them. The concept of crawling the web also bears further inspection, including the scope of the pages to crawl, the method used to access and index each page, and the algorithm for the identification of duplicate pages. The issues involved here will be well-known to many computer scientists but, with the increasing use of crawlers and search engines in other disciplines, they now require a public discussion in the wider research community. This paper concludes that any scientific attempt to crawl the web must make available the parameters under which it is operating so that researchers can, in principle, replicate experiments or be aware of and take into account differences between methodologies. A new hybrid random page selection methodology is also introduced.
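The abstract lists duplicate-page identification as one of the crawl parameters that should be reported, but it does not say which algorithm is meant. As a minimal sketch of one simple option (exact-duplicate detection by hashing whitespace-normalised, lower-cased page content), assuming hypothetical function names and example URLs that are not taken from the paper:

```python
import hashlib


def content_fingerprint(page_text):
    """Return a SHA-256 fingerprint of the page content.

    Assumption: two pages count as duplicates when their text is identical
    after collapsing whitespace and lower-casing. A real survey would need
    to state its own normalisation rule.
    """
    normalized = " ".join(page_text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """Map each duplicate URL to the first URL seen with the same content.

    pages: dict of url -> page text, processed in insertion order.
    Returns a list of (duplicate_url, first_seen_url) pairs.
    """
    first_seen = {}
    duplicates = []
    for url, text in pages.items():
        fingerprint = content_fingerprint(text)
        if fingerprint in first_seen:
            duplicates.append((url, first_seen[fingerprint]))
        else:
            first_seen[fingerprint] = url
    return duplicates


# Hypothetical example pages (illustrative only).
example_pages = {
    "http://a.example/": "<p>Hello  World</p>",
    "http://b.example/": "<p>hello world</p>\n",
    "http://c.example/": "<p>Other</p>",
}
example_duplicates = find_duplicates(example_pages)
```

Changing the normalisation rule changes which pages are counted as duplicates, which is one reason such parameters need to be published alongside the results.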
Adaptive Combination of Evidence for Information Retrieval
, 1999
Cited by 10 (1 self)
I Introduction
II Background
  A. Overview
  B. Information Retrieval
    1. The Information Retrieval Problem
    2. Relevance
    3. Evaluation
    4. Relevance Feedback
    5. Approaches to IR - Sources of Evidence
  C. Neural Networks
    1. Optimization Techniques
  D. Machine Learning in IR
  ...
How Much More is Better? - Characterizing the Effects of Adding More IR Systems to a Combination
- In Content-Based Multimedia Information Access (RIAO
, 2000
Cited by 9 (0 self)
We present the results of some expansion experiments for solving the routing, data fusion problem using TREC5 systems. The experiments address the question "How much more is better?" when combining the results of multiple information retrieval systems using a linear combination (weighted sum) model. By investigating all 2-way, 3-way, 4-way and 10-way combinations of 10 IR systems on 10 queries, we show that: (1) one can expect potentially significant amounts of improvement in performance over the best system used in the combination if enough systems are used, (2) for this number of candidate systems, the point of diminishing returns is reached when around four systems are used in the combination, (3) queries generally have too few relevant documents, causing little correlation in performance between the training set and test set; thus making it difficult to get test set improvement even when multiple systems are used, and (4) if one knows the relative past performance of the candidate s...
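To make the linear combination (weighted sum) model named in this abstract concrete, here is a minimal Python sketch. The function name, the document scores, and the weights are hypothetical illustrations and are not taken from the paper, which may normalise scores or learn weights differently.

```python
from collections import defaultdict


def weighted_sum_fusion(system_scores, weights):
    """Fuse per-system document scores with a weighted sum.

    system_scores: list of dicts (one per IR system), doc id -> score.
    weights: list of floats, one per system (hypothetical values here).
    Assumption: a document missing from a system's results contributes 0.
    """
    if len(system_scores) != len(weights):
        raise ValueError("need exactly one weight per system")
    fused = defaultdict(float)
    for scores, weight in zip(system_scores, weights):
        for doc_id, score in scores.items():
            fused[doc_id] += weight * score
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Illustrative scores for two hypothetical systems.
system_a = {"d1": 0.9, "d2": 0.4}
system_b = {"d2": 0.8, "d3": 0.5}
ranking = weighted_sum_fusion([system_a, system_b], [0.5, 0.5])
ranked_ids = [doc_id for doc_id, _ in ranking]
```

With equal weights, a document retrieved by both systems (d2) can outrank one that only a single system scores highly (d1), which is the basic effect that combination experiments of this kind measure.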
Knowledge Discovery for Automatic Query Expansion on the World Wide Web
- In Workshop on the World-Wide Web and Conceptual Modeling (WWWCM'99), LNCS 1727
, 1999
Cited by 6 (3 self)
The World-Wide Web is an enormous, distributed, and heterogeneous information space. Currently, with the growth of available data, finding interesting information is difficult. Search engines like Altavista are useful, but their results are not always satisfactory. In this paper, we present a method called Knowledge Discovery on the Web for extracting connections between terms. The knowledge in these connections is used for query expansion. We present experiments performed with our system, which is based on the SMART retrieval system. We used the comparative precision method for evaluating our system against three well-known Web search engines on a collection of 60,000 Web pages.
Methodologies for Crawler Based Web Surveys
- Internet Research: Electronic Networking and Applications
, 2002
Cited by 4 (4 self)
There have been many attempts to study the content of the Web, either through human or automatic agents. Describes five different previously used Web survey methodologies, each justifiable in its own right, but presents a simple experiment that demonstrates concrete differences between them. The concept of crawling the Web also bears further inspection, including the scope of the pages to crawl, the method used to access and index each page, and the algorithm for the identification of duplicate pages. The issues involved here will be well-known to many computer scientists but, with the increasing use of crawlers and search engines in other disciplines, they now require a public discussion in the wider research community. Concludes that any scientific attempt to crawl the Web must make available the parameters under which it is operating so that researchers can, in principle, replicate experiments or be aware of and take into account differences between methodologies. Also introduces a new hybrid random page selection methodology.
Technology issues in user access to Web-based medical information
- Proceedings of the AMIA Symposium
, 1999
Cited by 4 (0 self)
We conducted a study of user queries to the National Library of Medicine Web site over a three-month period. Our purpose was to study the nature and scope of these queries in order to understand how to improve users' access to the information they are seeking on our site. The results show that the queries are primarily medical in content (94%), with only a small percentage (5.5%) relating to library services, and with a very small percentage (0.5%) not being medically relevant at all. We characterize the data set, and conclude with a discussion of our plans to develop a UMLS-based terminology server to assist NLM Web users.
The new literacy
, 1990
Cited by 2 (0 self)
What is literally digital about literacy today is how much of what is read and written has been electronically conveyed as binary strings of ones and zeros, before appearing as letters, words, numbers, symbols, and images on the screens and pages of our literate lives. This digital aspect of literacy, invisible to the naked eye, is the very currency that drives the global information economy. Yet what we see of this literacy is remarkably continuous with the literacy of print culture, right down to the very serifs that grace many of the fonts of digital literacy. So begins the paradox that while digital literacy constitutes an entirely new medium for reading and writing, it is but a further extension of what writing first made of language. On the one hand, long-standing scholars of this new medium, such as Donald Leu, favor treating digital literacy as itself a “great transformation,” holding that such technologies do nothing less than “rapidly and continuously redefine the nature of literacy.” We tend, on the other hand, to look to the continuities and extensions achieved through the introduction of digital literacy into a print culture, while seeking to understand how these developments encourage what is most admirable about the nature of literacy.
Why Autonomy Makes the Agent
, 2001
Cited by 2 (0 self)
This paper works on the premise that the position stated by Jennings et al. [17] is correct. Specifically that, amongst other things, the agent metaphor is a useful extension of the object-oriented metaphor. Object-oriented (OO) programming [29] is programming where data-abstraction is achieved by users defining their own data-structures (see figure 1), or "objects". These objects encapsulate data and methods for operating on that data; and the OO framework allows new objects to be created that inherit the properties (both data and methods) of existing objects. This allows archetypal objects to be defined and then extended by different programmers, who needn't have complete understanding of exactly how the underlying objects are implemented
I Still Haven’t Found What I’m Looking For: Web Searching as Query Refinement
, 2002
"... Dedicated to the memory of ..."

