Personalizing search via automated analysis of interests and activities
, 2005
Cited by 303 (29 self)
We formulate and study search algorithms that consider a user’s prior interactions with a wide variety of content to personalize that user’s current Web search. Rather than relying on the unrealistic assumption that people will precisely specify their intent when searching, we pursue techniques that leverage implicit information about the user’s interests. This information is used to re-rank Web search results within a relevance feedback framework. We explore rich models of user interests, built from both search-related information, such as previously issued queries and previously visited Web pages, and other information about the user such as documents and email the user has read and created. Our research suggests that rich representations of the user and the corpus are important for personalization, but that it is possible to approximate these representations and provide efficient client-side algorithms for personalizing search. We show that such personalization algorithms can significantly improve on current Web search.
Reference reconciliation in complex information spaces
- In SIGMOD
, 2005
Cited by 168 (2 self)
Reference reconciliation is the problem of identifying when different references (i.e., sets of attribute values) in a dataset correspond to the same real-world entity. Most previous literature assumed references to a single class that had a fair number of attributes (e.g., research publications). We consider complex information spaces: our references belong to multiple related classes and each reference may have very few attribute values. A prime example of such a space is Personal Information Management, where the goal is to provide a coherent view of all the information on one’s desktop. Our reconciliation algorithm has three principal features. First, we exploit the associations between references to design new methods for reference comparison. Second, we propagate information between reconciliation decisions to accumulate positive and negative evidence. Third, we gradually enrich references by merging attribute values. Our experiments show that (1) we considerably improve precision and recall over standard methods on a diverse set of personal information datasets, and (2) there are advantages to using our algorithm even on a standard citation dataset benchmark.
BLINKS: Ranked keyword searches on graphs
, 2007
Cited by 139 (9 self)
Query processing over graph-structured data is enjoying a growing number of applications. A top-k keyword search query on a graph finds the top k answers according to some ranking criteria, where each answer is a substructure of the graph containing all query keywords. Current techniques for supporting such queries on general graphs suffer from several drawbacks, e.g., poor worst-case performance, not taking full advantage of indexes, and high memory requirements. To address these problems, we propose BLINKS, a bi-level indexing and query processing scheme for top-k keyword search on graphs. BLINKS follows a search strategy with provable performance bounds, while additionally exploiting a bi-level index for pruning and accelerating the search. To reduce the index space, BLINKS partitions a data graph into blocks: the bi-level index stores summary information at the block level to initiate and guide search among blocks, and more detailed information for each block to accelerate search within blocks. Our experiments show that BLINKS offers orders-of-magnitude performance improvement over existing approaches.
Principles of dataspace systems
- In PODS
, 2006
Cited by 126 (9 self)
The most acute information management challenges today stem from organizations relying on a large number of diverse, interrelated data sources, but having no means of managing them in a convenient, integrated, or principled fashion. These challenges arise in enterprise and government data management, digital libraries, “smart” homes and personal information management. We have proposed dataspaces as a data management abstraction for these diverse applications and DataSpace Support Platforms (DSSPs) as systems that should be built to provide the required services over dataspaces. Unlike data integration systems, DSSPs do not require full semantic integration of the sources in order to provide useful services. This paper lays out specific technical challenges to realizing DSSPs and ties them to existing work in our field. We focus on query answering in DSSPs, the DSSP’s ability to introspect on its content, and the use of human attention to enhance the semantic relationships in a dataspace.
“Stuff goes into the computer and doesn't come out”: A cross-tool study of personal information management
- Proc. CHI, ACM
, 2004
Cited by 117 (1 self)
This paper reports a study of Personal Information Management (PIM), which advances research in two ways: (1) rather than focusing on one tool, we collected cross-tool data relating to file, email and web bookmark usage for each participant, and (2) we collected longitudinal data for a subset of the participants. We found that individuals employ a rich variety of strategies both within and across PIM tools, and we present new strategy classifications that reflect this behaviour. We discuss synergies and differences between tools that may be useful in guiding the design of tool integration. Our longitudinal data provides insight into how PIM behaviour evolves over time, and suggests that the supporting nature of PIM discourages reflection by users on their strategies. We discuss how users may benefit if tools and organizations promote increased reflection on PIM.
Semantic annotation, indexing, and retrieval
- Journal of Web Semantics
, 2004
Cited by 108 (4 self)
The Semantic Web realization depends on the availability of a critical mass of metadata for the web content, associated with the respective formal knowledge about the world. We claim that the Semantic Web, at its current stage of development, is in critical need of metadata generation and usage schemata that are specific, well-defined and easy to understand. This paper introduces our vision for a holistic architecture for semantic annotation, indexing, and retrieval of documents with regard to extensive semantic repositories. A system (called KIM), implementing this concept, is presented in brief and it is used for the purposes of evaluation and demonstration. A particular schema for semantic annotation with respect to real-world entities is proposed. The underlying philosophy is that a practical semantic annotation is impossible without some particular knowledge modelling commitments. Our understanding is that a system for such semantic annotation should be based upon a simple model of real-world entity classes, complemented with extensive instance knowledge. To ensure the efficiency, ease of sharing, and reusability of the metadata,
Information re-retrieval: repeat queries in Yahoo’s logs
- In SIGIR ’07: Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
, 2007
Cited by 104 (22 self)
People often repeat Web searches, both to find new information on topics they have previously explored and to re-find information they have seen in the past. The query associated with a repeat search may differ from the initial query but can nonetheless lead to clicks on the same results. This paper explores repeat search behavior through the analysis of a one-year Web query log of 114 anonymous users and a separate controlled survey of an additional 119 volunteers. Our study demonstrates that as many as 40% of all queries are re-finding queries. Re-finding appears to be an important behavior for search engines to explicitly support, and we explore how this can be done. We demonstrate that changes to search engine results can hinder re-finding, and provide a way to automatically detect repeat searches and predict repeat clicks.
Do Life-logging Technologies Support Memory for the Past? An Experimental Study Using Sensecam
- In Proc. CHI 2007, ACM Press
, 2007
Cited by 83 (7 self)
We report on the results of a study using SenseCam, a “lifelogging” technology in the form of a wearable camera, which aims to capture data about everyday life in order to support people’s memory for past, personal events. We find evidence that SenseCam images do facilitate people’s ability to connect to their past, but that images do this in different ways. We make a distinction between “remembering” the past, and “knowing” about it, and provide evidence that SenseCam images work differently over time in these capacities. We also compare the efficacy of user-captured images with automatically captured images and discuss the implications of these findings and others for how we conceive of and make claims about life-logging technologies.
Don't Take My Folders Away! Organizing Personal Information to Get Things Done
- Paper presented at the Conference on Human Factors in Computing Systems (CHI 2005)
, 2005
Cited by 79 (11 self)
A study explores the way people organize information in support of projects (“teach a course”, “plan a wedding”, etc.). The folder structures used to organize project information – especially electronic documents and other files – frequently resembled a “divide and conquer” problem decomposition with subfolders corresponding to major components (subprojects) of the project. Folders were clearly more than simply a means to one end: organizing for later retrieval. Folders were information in their own right – representing, for example, a person’s evolving understanding of a project and its components. Unfortunately, folders are often “overloaded” with information. For example, folders sometimes included leading characters to force an ordering (“aa”, “zz”). And folder hierarchies frequently reflected a tension between organizing information for current use vs. repeated re-use.
A Platform for Personal Information Management and Integration
Cited by 74 (6 self)
The explosion of the amount of information available in digital form has made search a hot research topic for the Information Management Community. While most of the research on search is focused on the WWW, individual computer users have developed their own vast collections of data on their desktops, and these collections are in critical need of good search tools. We describe the Semex System that offers users a flexible platform for personal information management. Semex has two main goals. The first goal is to enable browsing personal information by semantically meaningful associations. The challenge is to automatically create such associations between data items on one’s desktop, and to create enough of them so Semex becomes an indispensable tool. Our second goal is to leverage the personal information space we created to increase users’ productivity. As our first target, Semex leverages the personal information to enable lightweight information integration tasks that are discouragingly difficult to perform with today’s tools.