Detecting session boundaries from Web user logs
- In Proceedings of the BCS-IRSG 22nd Annual Colloquium on Information Retrieval Research, 2000
Cited by 31 (2 self)
Detecting session boundaries on the Web is important for several reasons. First, it establishes a common context for various statistics relating to user sessions and the frequency of user activities. More specifically, boundaries must be detected in order to group related information together for other applications, such as learning techniques for adaptive search engines. To date, however, the notion of a session on the Web has not been consistently defined, if it has been defined at all. The tendency has been to group all log data available from one user or IP address under the umbrella of a single session, regardless of the length of time covered by the logs. This tendency lacks a more user-oriented view. Our argument is that a session on the Web can be defined as a group of user activities that are related to each other not only through an evolving information need but also through close proximity in time. Thus, we describe and discuss the investigatio...
Location Awareness in Unstructured Peer-To-Peer Systems
- IEEE Transactions on Parallel and Distributed Systems, 2005
Cited by 19 (4 self)
Peer-to-Peer (P2P) computing has emerged as a popular model that aims to make further use of Internet information and resources. However, because peers choose logical neighbors randomly, without any knowledge of the underlying physical topology, a serious topology mismatch can arise between the P2P overlay network and the underlying physical network. This topology mismatch places great stress on the Internet infrastructure and greatly limits the performance gain from various search or routing techniques. Meanwhile, due to the inefficient overlay topology, flooding-based search mechanisms generate a large volume of unnecessary traffic. To alleviate the mismatch problem and reduce unnecessary traffic, we propose a location-aware topology matching (LTM) technique. LTM builds an efficient overlay by disconnecting slow connections and choosing physically closer nodes as logical neighbors, while still retaining the search scope and reducing query response time. LTM is scalable and completely distributed in the sense that it does not require any global knowledge of the whole overlay network. The effectiveness of LTM is demonstrated through simulation studies.
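The idea of disconnecting slow connections while retaining the search scope can be illustrated with a simplified, centralized sketch. Whenever three peers are mutually connected, the slowest of the three logical links can be dropped, because its endpoints remain reachable through the third peer. This is an assumption-laden illustration, not LTM's distributed message protocol. The adjacency/delay representation and the function name `prune_slow_links` are hypothetical, and the sketch does not model choosing physically closer nodes as new neighbors.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set

Edge = FrozenSet[str]

def prune_slow_links(adj: Dict[str, Set[str]],
                     delay: Dict[Edge, float]) -> Dict[str, Set[str]]:
    """Repeatedly remove the slowest link of any triangle in the overlay.

    adj maps each peer to its logical neighbors (must be symmetric); delay maps
    every link, keyed as frozenset({u, v}), to a measured delay. Hypothetical model.
    """
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for u in sorted(adj):
            for v, w in combinations(sorted(adj[u]), 2):
                if w in adj[v]:  # u, v and w form a triangle
                    links = [frozenset((u, v)), frozenset((u, w)), frozenset((v, w))]
                    a, b = tuple(max(links, key=lambda e: delay[e]))
                    # Drop the slowest link; a and b stay connected via the third peer.
                    adj[a].discard(b)
                    adj[b].discard(a)
                    changed = True
                    break
            if changed:
                break
    return adj

overlay = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
delays = {frozenset({"a", "b"}): 10.0, frozenset({"b", "c"}): 20.0,
          frozenset({"a", "c"}): 100.0}
print(prune_slow_links(overlay, delays))  # the slow a-c link is removed
```

Because every removed link belongs to a triangle whose other two links are still present, the overlay stays connected. In this toy model, that is how the search scope is preserved.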
A Security Model for Full-Text File System Search in Multi-User Environments
- In Proceedings of the USENIX Conference on File and Storage Technologies (FAST), 2005
Cited by 18 (3 self)
Most desktop search systems maintain per-user indices to keep track of file contents. In a multi-user environment, this is not a viable solution, because the same file has to be indexed many times, once for every user that may access the file, causing both space and performance problems. Having a single system-wide index for all users, on the other hand, allows for efficient indexing but requires special security mechanisms to guarantee that the search results do not violate any file permissions. We present a security model for full-text file system search, based on the UNIX security model, and discuss two possible implementations of the model. We show that the first implementation, based on a postprocessing approach, allows an arbitrary user to obtain information about the content of files for which he does not have read permission. The second implementation does not share this problem. We give an experimental performance evaluation for both implementations and point out query optimization opportunities for the second one.
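The postprocessing approach looks up a term in the single system-wide index and then discards every hit the querying user may not read. The sketch below assumes a simplified UNIX read check that consults owner, then group, then other bits. It has no root override, no ACLs, and no directory search permissions. The toy in-memory index and the names `FileMeta`, `can_read`, and `search_postprocess` are hypothetical. The filter only hides unreadable files from the final list. Per the abstract, this approach still lets a user learn something about the content of files they cannot read, so the sketch should not be taken as a secure design.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class FileMeta:
    owner: int  # uid of the file owner
    group: int  # gid of the file's group
    mode: int   # permission bits, e.g. 0o640

def can_read(meta: FileMeta, uid: int, gids: Set[int]) -> bool:
    """Simplified UNIX read check: owner bits, else group bits, else other bits."""
    if uid == meta.owner:
        return bool(meta.mode & 0o400)
    if meta.group in gids:
        return bool(meta.mode & 0o040)
    return bool(meta.mode & 0o004)

def search_postprocess(index: Dict[str, List[str]], files: Dict[str, FileMeta],
                       term: str, uid: int, gids: Set[int]) -> List[str]:
    """Query the system-wide index, then drop files the user cannot read."""
    hits = index.get(term, [])
    return [path for path in hits if can_read(files[path], uid, gids)]

files = {
    "alice.txt": FileMeta(owner=1000, group=100, mode=0o600),
    "shared.txt": FileMeta(owner=1000, group=100, mode=0o640),
    "public.txt": FileMeta(owner=1001, group=200, mode=0o644),
}
index = {"budget": ["alice.txt", "shared.txt", "public.txt"]}
print(search_postprocess(index, files, "budget", uid=1002, gids={100}))
# ['shared.txt', 'public.txt']
```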
Analysing Web search logs to determine session boundaries for user-oriented learning
- In Proceedings of Adaptive Hypermedia and Adaptive Web-based Systems, 2000
Cited by 12 (0 self)
Incremental learning approaches based on user search activities provide a means of building adaptive information retrieval systems. To develop more effective user-oriented learning techniques for the Web, we need to be able to identify a meaningful session unit from which we can learn. Without this, we run a high risk of grouping together activities that are unrelated or perhaps not from the same user. We are interested in detecting the boundaries of sequences of related activities (sessions) so that the activities can be grouped for a learning purpose. Session boundaries in Reuters transaction logs were detected automatically, and the generated boundaries were compared with human judgements. The comparison confirmed that a meaningful session threshold for establishing these session boundaries was confined to an 11-15 minute range.
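The time-proximity part of this idea can be sketched as follows: split one user's time-ordered activities into sessions wherever the gap between consecutive activities exceeds a threshold. This minimal sketch assumes each log entry has already been reduced to a timestamp in seconds. The function name `split_sessions` is hypothetical, and the 15-minute default is taken from the upper end of the reported range. It is not the authors' detection procedure, which also considers how activities relate to an evolving information need.

```python
from typing import List

def split_sessions(timestamps: List[float],
                   threshold_s: float = 15 * 60) -> List[List[float]]:
    """Split one user's activity timestamps (seconds) into time-based sessions.

    A new session starts whenever the gap since the previous activity exceeds
    threshold_s (a hypothetical default of 15 minutes).
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= threshold_s:
            sessions[-1].append(t)  # close in time: continue the current session
        else:
            sessions.append([t])    # first entry or long gap: start a new session
    return sessions

# Activities at 0, 5 and 12 minutes, then again at 40 and 44 minutes.
print(split_sessions([0, 300, 720, 2400, 2640]))  # [[0, 300, 720], [2400, 2640]]
```

Re-running the split with several thresholds and comparing the resulting boundaries against hand-labelled sessions is one way to explore which cutoff best matches human judgement.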
Digital Video in Education
- 1998
Cited by 6 (1 self)
Digital video is an exciting new medium with the potential to revolutionize the way organizations train their employees. However, several questions must be answered first. How practical is video? What is the demand? What is the best use of video? In this paper, we compare the quality and performance of video over a 100 Mbps switched Ethernet network using a RealVideo Server, the Oracle Video Server, and a file server. We compare the performance and quality of the RealVideo stream with MPEG-1 distributed from a file server using various client and server combinations. We also analyze 50,000 queries from an Excite database to determine the current demand for video. Finally, we present the results of a study that explores the value of digital video in an educational environment.

Introduction: The United States Army has approximately 475,000 soldiers deployed across the United States, Europe, Asia, Central America, and South America. Regardless of their location, soldiers r...
Web Information Retrieval - an Algorithmic Perspective
- In Proceedings of the 8th Annual European Symposium on Algorithms (ESA), 2000
Cited by 5 (0 self)
In this paper we survey algorithmic aspects of Web information retrieval. As an example, we discuss ranking of search engine results using connectivity analysis.
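PageRank is one widely known form of connectivity analysis for ranking search results. The power-iteration sketch below is a generic illustration, not the survey's own formulation. Several common choices are assumed here: a damping factor of 0.85, a fixed iteration count, and spreading the score of pages without out-links uniformly over all pages.

```python
from typing import Dict, List

def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power iteration for PageRank over a directed link graph."""
    nodes = set(links) | {v for outs in links.values() for v in outs}
    n = len(nodes)
    if n == 0:
        return {}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Every page receives the "random jump" share.
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling page: spread its score evenly over all pages.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank

links = {"a": ["c"], "b": ["c"], "c": ["a"]}
scores = pagerank(links)
print(sorted(scores, key=scores.get, reverse=True))  # ['c', 'a', 'b']
```

Page `c` ranks first because it is linked from both other pages. Page `b` ranks last because nothing links to it, so it only receives the random-jump share.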
DiCAS: An efficient distributed caching mechanism for P2P systems
- IEEE Transactions on Parallel and Distributed Systems
Cited by 5 (0 self)
Peer-to-peer networks are widely criticized for their inefficient flooding search mechanism. Distributed Hash Table (DHT) algorithms have been proposed to improve search efficiency by mapping the index of a file to a unique peer based on predefined hash functions. However, the tight coupling between indices and hosting peers incurs high maintenance costs in a highly dynamic network. To properly balance the tradeoff between the costs of indexing and searching, we propose the distributed caching and adaptive search (DiCAS) algorithm, in which indices are passively cached in a group of peers based on a predefined hash function. Guided by the same function, adaptive search selectively forwards queries to “matched” peers that have a high probability of caching the desired indices. The search cost is reduced because the search space shrinks. Unlike DHT solutions, distributed caching loosely maps the index of a file to a group of peers in a passive fashion, which saves the cost of updating indices. Our simulation study shows that the DiCAS protocol can significantly reduce network search traffic with only a small amount of cache space contributed by each individual peer.
Index Terms: Peer-to-peer, query response, flooding, distributed caching and adaptive search, search efficiency.
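The shared hash function that drives both passive caching and adaptive forwarding can be sketched as follows. Each peer belongs to one of M groups. A peer caches an index entry it sees in a query response only if the file name hashes to its own group, and it forwards a query only to neighbors in the group the file name hashes to. Several details are assumptions, since the abstract does not describe them: SHA-1 as the hash, four groups, assigning a peer's group by hashing its ID, and falling back to all neighbors when none match. The names `group_of`, `Peer`, `observe_response`, and `forward_targets` are hypothetical.

```python
import hashlib
from typing import Dict, List

NUM_GROUPS = 4  # hypothetical number of peer groups (M)

def group_of(key: str, m: int = NUM_GROUPS) -> int:
    """Deterministically hash a peer ID or file name into one of m groups."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % m

class Peer:
    def __init__(self, peer_id: str) -> None:
        self.peer_id = peer_id
        self.group = group_of(peer_id)    # hypothetical group assignment
        self.cache: Dict[str, str] = {}   # file name -> address of a holder
        self.neighbors: List["Peer"] = []

    def observe_response(self, filename: str, holder: str) -> None:
        """Passive caching: keep an index entry only if it maps to this peer's group."""
        if group_of(filename) == self.group:
            self.cache[filename] = holder

    def forward_targets(self, filename: str) -> List["Peer"]:
        """Adaptive search: forward only to neighbors in the file's group."""
        target = group_of(filename)
        matched = [p for p in self.neighbors if p.group == target]
        return matched or list(self.neighbors)  # fallback to flooding (assumption)

p = Peer("peer-0")
p.neighbors = [Peer(f"peer-{i}") for i in range(1, 9)]
print(len(p.forward_targets("song.mp3")), "of", len(p.neighbors), "neighbors selected")
```

A stable cryptographic hash is used instead of Python's built-in `hash()`, whose string hashing is randomized per process. Every peer must compute the same group for the same file name.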
Improving Document Retrieval by Automatic Query Expansion Using Collaborative Learning of Term-Based Concepts
- In Proceedings of the 5th International Workshop on Document Analysis Systems (DAS), volume 2423 of Lecture Notes in Computer Science, 2002
Cited by 3 (1 self)
Query expansion methods have been studied for a long time, with debatable success in many instances. In this paper, a new approach is presented based on using term concepts learned from other queries. Two important issues with query expansion are addressed: the selection and the weighting of additional search terms. In contrast to other methods, the query is expanded by adding those terms that are most similar to the concepts of individual query terms, rather than selecting terms that are similar to the complete query or directly similar to the query terms. Experiments have shown that this kind of query expansion yields notable improvements in retrieval effectiveness, as measured by recall/precision, in comparison to the standard vector space model and to pseudo-relevance feedback. This approach can be used to improve document retrieval in digital libraries, document management systems, the WWW, and similar settings.
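Expanding by the concept of each individual query term, rather than by the whole query, can be pictured as follows. Each query term has a concept vector, and candidate terms are ranked by cosine similarity to that concept. The top `k` candidates for each query term are added with a reduced weight. The concept vectors here are hypothetical sparse vectors (for example, weights over documents) standing in for what would be learned from other users' queries. The values of `k` and `alpha` and the weighting rule are assumptions for illustration, not the paper's selection and weighting scheme.

```python
import math
from typing import Dict, List, Tuple

Vector = Dict[str, float]  # sparse vector: dimension -> weight

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def expand_query(query: List[str], concepts: Dict[str, Vector],
                 term_vectors: Dict[str, Vector], k: int = 2,
                 alpha: float = 0.5) -> Dict[str, float]:
    """Add the k terms closest to each query term's concept, weighted alpha * sim."""
    weights: Dict[str, float] = {t: 1.0 for t in query}  # original terms keep weight 1
    for t in query:
        concept = concepts.get(t)
        if concept is None:
            continue  # no learned concept for this term
        scored: List[Tuple[float, str]] = sorted(
            ((cosine(concept, vec), cand) for cand, vec in term_vectors.items()
             if cand not in query),
            reverse=True)
        for sim, cand in scored[:k]:
            if sim > 0:
                weights[cand] = max(weights.get(cand, 0.0), alpha * sim)
    return weights

concepts = {"car": {"d1": 1.0, "d2": 1.0}}
term_vectors = {"auto": {"d1": 1.0, "d2": 1.0}, "vehicle": {"d1": 1.0},
                "banana": {"d3": 1.0}}
print(expand_query(["car"], concepts, term_vectors))  # adds 'auto' and 'vehicle'
```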
Collaborative Learning of Term-Based Concepts for Automatic Query Expansion
- In Proceedings of the 13th European Conference on Machine Learning (ECML 2002), 2002
Cited by 3 (1 self)
Information retrieval systems have been studied in computer science for decades. The traditional ad-hoc task is to find all documents relevant to a given ad-hoc query, but the accuracy of ad-hoc document retrieval systems has plateaued in recent years. At DFKI, we are working on so-called collaborative information retrieval (CIR) systems, which unintrusively learn from their users' search processes.
Personalized concept-based clustering of search engine queries
- IEEE Transactions on Knowledge and Data Engineering, 2008
Cited by 3 (2 self)
A major problem of current Web search is that search queries are usually short and ambiguous, and are thus insufficient for specifying precise user needs. To alleviate this problem, some search engines suggest terms that are semantically related to the submitted queries, so that users can choose from the suggestions the ones that reflect their information needs. In this paper, we introduce an effective approach that captures the user’s conceptual preferences in order to provide personalized query suggestions. We achieve this goal with two new strategies. First, we develop online techniques that extract concepts from the web snippets of the search results returned for a query and use those concepts to identify related queries for that query. Second, we propose a new two-phase personalized agglomerative clustering algorithm that is able to generate personalized query clusters. To the best of the authors’ knowledge, no previous work has addressed personalization for query suggestions. To evaluate the effectiveness of our technique, a Google middleware was developed for collecting clickthrough data for the experimental evaluation. Experimental results show that our approach has better precision and recall than existing query clustering methods.
Index Terms: Clickthrough, concept-based clustering, personalization, query clustering, search engine.
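The clustering step can be illustrated with plain, non-personalized agglomerative clustering. Each query starts as its own cluster, described by its extracted concepts. The two clusters whose concept sets overlap most, measured by Jaccard similarity, are merged repeatedly until no pair reaches a threshold. This is a generic technique swapped in for illustration, not the paper's two-phase personalized algorithm. The concept sets, the Jaccard measure, the 0.3 threshold, and the names `jaccard` and `cluster_queries` are assumptions.

```python
from typing import Dict, List, Set, Tuple

def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_queries(query_concepts: Dict[str, Set[str]],
                    threshold: float = 0.3) -> List[Set[str]]:
    """Greedy agglomerative clustering of queries by concept-set overlap."""
    # Each cluster is (queries in the cluster, union of their concepts).
    clusters: List[Tuple[Set[str], Set[str]]] = [
        ({q}, set(c)) for q, c in query_concepts.items()]
    while len(clusters) > 1:
        best = (-1.0, -1, -1)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i][1], clusters[j][1])
                if sim > best[0]:
                    best = (sim, i, j)
        sim, i, j = best
        if sim < threshold:
            break  # no remaining pair is similar enough to merge
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return [queries for queries, _ in clusters]

queries = {
    "apple iphone": {"apple", "smartphone"},
    "iphone price": {"smartphone", "price"},
    "apple pie": {"apple", "recipe", "dessert"},
}
print(cluster_queries(queries))
```

Personalization could enter by building each query's concept set only from the concepts a given user actually clicked. That variation is left out here.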

