Using Wikipedia to Bootstrap Open Information Extraction
Cited by 9 (0 self)
We often use ‘Data Management’ to refer to the manipulation of relational or semi-structured information, but much of the world’s data is unstructured, for example the vast amount of natural-language text on the Web. The ability to manage
Overview-Based Example Selection in End-User Interactive Concept Learning
Cited by 4 (3 self)
Interaction with large unstructured datasets is difficult because existing approaches, such as keyword search, are not always suited to describing concepts corresponding to the distinctions people want to make within datasets. One possible solution is to allow end-users to train machine learning systems to identify desired concepts, a strategy known as interactive concept learning. A fundamental challenge is to design systems that preserve end-user flexibility and control while also guiding them to provide examples that allow the machine learning system to effectively learn the desired concept. This paper presents our design and evaluation of four new overview-based approaches to guiding example selection. We situate our explorations within CueFlik, a system examining end-user interactive concept learning in Web image search. Our evaluation shows our approaches not only guide end-users to select better training examples than the best-performing previous design for this application, but also reduce the impact of not knowing when to stop training the system. We discuss challenges for end-user interactive concept learning systems and identify opportunities for future research on the effective design of such systems. ACM Classification: H5.2 [Information Interfaces and
Compiling a Massive, Multilingual Dictionary via Probabilistic Inference
2009
Cited by 4 (1 self)
Can we automatically compose a large set of Wiktionaries and translation dictionaries to yield a massive, multilingual dictionary whose coverage is substantially greater than that of any of its constituent dictionaries? The composition of multiple translation dictionaries leads to a transitive inference problem: if word A translates to word B, which in turn translates to word C, what is the probability that C is a translation of A? The paper introduces a novel algorithm that solves this problem for 10,000,000 words in more than 1,000 languages. The algorithm yields PANDICTIONARY, a novel multilingual dictionary. PANDICTIONARY contains more than four times as many translations as the largest Wiktionary at precision 0.90, and over 200,000,000 pairwise translations in over 200,000 language pairs at precision 0.8.
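The transitive inference question in this abstract can be illustrated with a toy sketch. The code below is not the paper's algorithm; it assumes hypothetical per-entry confidence scores, treats translation as symmetric, multiplies probabilities along each path, and merges distinct paths to the same word with a noisy-OR. The function name, the edge format, and the `max_hops` parameter are all illustrative assumptions.

```python
from collections import defaultdict


def transitive_translation_scores(edges, source, max_hops=2):
    """Score words reachable from `source` within `max_hops` dictionary links.

    edges: dict mapping (word_a, word_b) -> assumed probability that the
    pair is a correct translation (hypothetical confidences, not real data).
    """
    # Build an undirected graph: translation is treated as symmetric here.
    graph = defaultdict(dict)
    for (a, b), p in edges.items():
        graph[a][b] = p
        graph[b][a] = p

    # miss[w] = probability that every path found so far to w is wrong.
    miss = defaultdict(lambda: 1.0)

    def walk(node, prob, visited, hops):
        if hops == max_hops:
            return
        for nxt, p in graph[node].items():
            if nxt in visited:
                continue
            path_prob = prob * p  # naive: edges assumed independent
            miss[nxt] *= 1.0 - path_prob  # noisy-OR across paths
            walk(nxt, path_prob, visited | {nxt}, hops + 1)

    walk(source, 1.0, {source}, 0)
    return {word: 1.0 - m for word, m in miss.items()}


# Hypothetical example: A links to C through B and through D.
example_edges = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.8,
    ("A", "D"): 0.5,
    ("D", "C"): 0.5,
}
scores = transitive_translation_scores(example_edges, "A")
```

Under these assumptions, a word reached by two paths scores higher than either path alone. The independence assumption is the weak point: a polysemous intermediate word can link two words that share no sense, so a naive path product can overestimate the probability that the endpoints translate each other.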
Intelligence in Wikipedia
Cited by 3 (0 self)
The Intelligence in Wikipedia project at the University of Washington is combining self-supervised information extraction (IE) techniques with a mixed-initiative interface designed to encourage communal content creation (CCC). Since IE and CCC are each powerful ways to produce large amounts of structured information, they have been studied extensively, but only in isolation. By combining the two methods in a virtuous feedback cycle, we aim for substantial synergy. While previous papers have described the details of individual aspects of our endeavor [25, 26, 24, 13], this report provides an overview of the project’s progress and vision.
The Trouble With Social Computing Systems Research
Cited by 1 (0 self)
Social computing has led to an explosion of research in understanding users, and has the potential to similarly revolutionize systems research. However, the number of papers designing and building new sociotechnical systems has not kept pace. In this paper we analyze the reasons for this disparity, which range from misaligned methodological incentives and evaluation expectations to research relevance compared to industry. We suggest improvements for the community to consider and evolve so that we can chart the future of our field. General Terms: Social computing, systems research, evaluation. ° These authors contributed equally to the paper and are listed alphabetically.
Using Interaction to Improve Intelligence: How Intelligent Systems Should Ask Users for Input
Intelligent systems will often need to collect input from users, to provide labels for training data or to correct mistakes the system makes. One interesting avenue of research is how to formulate the questions an intelligent system asks a user, in order to obtain the most accurate responses. In this paper, we study the impact of varying five dimensions of questions on response accuracy: indicating uncertainty, amount of context, level of context, suggesting an answer, and asking for supplemental information. In a study of an email sorting task, we show that there is a combination that results in higher levels of accuracy than other combinations and validate this combination in a comparison to questions that a panel of HCI and email experts chose. The contributions of the paper are the approach to determine the best combination of dimensions, the validated combination, and a demonstration of how this type of question interaction can improve intelligent systems.
H5.2. Information interfaces and presentation (e.g., HCI): User Interfaces (Evaluation/Methodology). General Terms
Social computing has led to an explosion of research in understanding users, and it has the potential to similarly revolutionize systems research. However, the number of papers designing and building new sociotechnical systems has not kept pace. We analyze challenges facing social computing systems research, including misaligned methodological incentives, evaluation expectations, double standards, and relevance compared to industry. We suggest improvements for the community to consider so that we can chart the future of our field. ° These authors contributed equally to the paper and are listed alphabetically.
SALEEMA AMERSHI RESEARCH STATEMENT
The unprecedented opportunity for big data to enhance our capabilities and improve our lives is limited by our ability to use that data. Machine learning can give us this ability by transforming raw data into the building blocks necessary for configuring automated behaviors. However, the complexity of machine learning has largely restricted its use to experts and skilled developers. For example, trained developers can employ machine learning to automatically detect objects, organize information, and understand and predict behaviors. In contrast, ill-equipped end-users are limited to using machine learning for simple personalization based on developer-conceived notions of interest or similarity. Even within personalization systems, end-user control over the machine learning is typically and intentionally reduced to simple object labeling. As a human-computer interaction researcher, I aim to put the full potential of machine learning in the hands of everyday people. CURRENT RESEARCH My dissertation examines the fundamental process by which end-users interact with machine learning systems (Figure 1). In this process, a person iteratively guides a
WiGipedia: A Tool for Improving Structured Data in Wikipedia
Wikipedia is emerging as the dominant global knowledge repository. Recently, large numbers of users have collaborated to produce more structured information in the so-called “infoboxes”. However, editing this data requires even more care than editing standard wikitext, as one must follow arcane template syntax. This paper describes WiGipedia, a novel tool that provides an alternative to the traditional approach by supporting editing of structured wiki data through two intuitive and interactive interfaces, facilitating user input on both tabular and graph-based representations of structured data. The tool allows users to identify and correct inconsistencies that are otherwise hidden across multiple articles. Furthermore, a novel recommendation algorithm is applied to assist users in their contribution to the wiki. The paper discusses design, implementation details, and results of a usability study in which the system compares significantly well against the traditional approach to editing Wikipedia infoboxes.

