Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment (Supplementary Materials)
Cited by 9 (1 self)
... the main paper with additional details in the caption. Supplementary Figure 3 shows additional data points for Figure 7 in the main paper.

2. Expert-Authored Concepts in Information Visualization

We conducted a survey asking ten experienced information visualization (InfoVis) researchers to identify what they consider to be significant and coherent areas of research in their field. Participants were asked to label each area and to describe it with lists of exemplary terms and documents. We focused on InfoVis research because of its relevance, scope, and our familiarity with it. Analysis of academic publications is one of the common real-world uses of topic modeling ...
HierarchicalTopics: Visually exploring large text collections using topic hierarchies
IEEE TVCG
Cited by 9 (1 self)
Accepted for publication by IEEE. ©2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/
Crowd synthesis: extracting categories and clusters from complex data
In Proc. CSCW, 2014
Cited by 6 (2 self)
Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming and cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help this process through on-demand harnessing of flexible and powerful human cognition, but it incurs other challenges, including limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and creating consistent and accurate categories when each worker sees only a small portion of the data. To address these challenges, we present an empirical study of a two-stage approach that enables crowds to create an accurate and useful overview of a dataset: (A) we draw on cognitive theory to assess how re-representing data can shorten it and focus it on salient dimensions; and (B) we introduce an iterative clustering approach that provides workers with a global overview of the data. We demonstrate that a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.
Visualizing and verifying directed social queries
IEEE Workshop on Interactive Visual Text Analytics, 2012
Cited by 5 (5 self)
Abstract—We present a novel visualization system that automatically classifies social network data in order to support a user’s directed social queries and that, furthermore, allows the user to quickly verify the accuracy of the classifications. We model a user’s friends’ interests in particular topics through the creation of a crowd-sourced knowledge base composed of terms related to user-specified semantic categories. Modeling friends in terms of these topics enables precise and efficient social querying that effectively fulfills a user’s information needs. That is, our system makes it possible to quickly identify friends who have a high probability of being able to answer particular questions or of having a shared interest in particular topics. Our initial investigations indicate that our model is effective at correlating friends to these topics even without sentiment or other lexical analyses. However, even the most robust system may produce false positives or false negatives due to inaccurate classifications stemming from incorrect, polysemous, or sparse data. To mitigate these errors, and to allow more fine-grained control over the selection of friends for directed social queries, an interactive visualization exposes the results of our model and enables a human-in-the-loop approach to result analysis and verification. A qualitative analysis of our verification system indicates that the transparent representation of our shared-interest modeling algorithm increases the effectiveness of the model.

Index Terms—Interactive verification, user classification, shared-interest modeling, topic modeling, social network visualization.
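The abstract describes scoring friends against a crowd-sourced list of terms per user-specified category. A minimal sketch of that idea, assuming a simple term-overlap score (the share of a friend's tokens that appear in the category's term set); `interest_scores`, `rank_friends`, and the input shapes are hypothetical and are not the authors' model.

```python
def interest_scores(friend_posts, category_terms):
    """Score each friend by the fraction of their tokens found in one category's term set.

    friend_posts: hypothetical dict mapping friend name -> list of post strings
    category_terms: set of lowercase terms for one user-specified category
    """
    scores = {}
    for friend, posts in friend_posts.items():
        # Crude tokenization: whitespace split, strip common punctuation, lowercase.
        tokens = [tok.strip(".,!?").lower() for post in posts for tok in post.split()]
        tokens = [t for t in tokens if t]
        if not tokens:
            scores[friend] = 0.0
            continue
        hits = sum(1 for t in tokens if t in category_terms)
        scores[friend] = hits / len(tokens)
    return scores


def rank_friends(friend_posts, category_terms, top_k=3):
    """Return the top_k friends by score; ties are broken alphabetically."""
    scores = interest_scores(friend_posts, category_terms)
    return sorted(scores, key=lambda f: (-scores[f], f))[:top_k]
```

Raw term overlap is exactly where the polysemy and sparsity errors the abstract mentions would surface, which is why a human-verification view on top of such scores is useful.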
Topic Models and Metadata for Visualizing Text Corpora
Cited by 3 (0 self)
Effectively exploring and analyzing large text corpora requires visualizations that provide a high-level summary. Past work has relied on faceted browsing of document metadata or on natural language processing of document text. In this paper, we present a new web-based tool that integrates topics learned from an unsupervised topic model into a faceted browsing experience. The user can manage topics, filter documents by topic, and summarize views with metadata and topic graphs. We report a user study of the usefulness of topics in our tool.
Low-dimensional Embeddings for Interpretable Anchor-based Topic Inference
Cited by 3 (0 self)
The anchor words algorithm performs provably efficient topic model inference by finding an approximate convex hull in a high-dimensional word co-occurrence space. However, the existing greedy algorithm often selects poor anchor words, reducing topic quality and interpretability. Rather than finding an approximate convex hull in a high-dimensional space, we propose to find an exact convex hull in a visualizable 2- or 3-dimensional space. Such low-dimensional embeddings both improve topics and clearly show users why the algorithm selects certain words.
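To illustrate the 2-D case: once each word has a point in a low-dimensional embedding, the exact convex hull can be computed directly, and words at hull vertices become anchor candidates. A minimal sketch, assuming the 2-D coordinates are already computed by some dimensionality-reduction step (not shown) and using Andrew's monotone chain for the hull; `select_anchors` is a hypothetical name, and returning every hull vertex rather than a fixed number of anchors is a simplification.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain: exact 2-D hull, counter-clockwise, collinear points dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Last point of each chain is the first point of the other; drop duplicates.
    return lower[:-1] + upper[:-1]


def select_anchors(embedding):
    """embedding: hypothetical dict word -> (x, y), assumed precomputed."""
    hull = set(convex_hull(list(embedding.values())))
    return sorted(w for w, xy in embedding.items() if xy in hull)
```

Because the hull is exact in 2-D, plotting the embedding with hull vertices highlighted shows why each anchor was chosen, which matches the interpretability argument above.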
Diagnoses, Decisions, and Outcomes: Web Search as Decision Support for Cancer
Cited by 2 (1 self)
People diagnosed with a serious illness often turn to the Web for their rising information needs, especially when decisions are required. We analyze the search and browsing behavior of searchers who show a surge of interest in prostate cancer. Prostate cancer is the most common serious cancer in men and is a leading cause of cancer-related death. Diagnoses of prostate cancer typically involve reflection and decision making about treatment based on assessments of preferences and outcomes. We annotated timelines of treatment-related queries from nearly 300 searchers with tags indicating different phases of treatment, including decision making, preparation, and recovery. Using this corpus, we present a variety of analyses toward the goal of understanding search and decision making about treatments. We characterize search queries and the content of accessed pages for different treatment phases, model search behavior during the decision-making phase, and create an aggregate alignment of treatment timelines illustrated with a variety of visualizations. The experiments provide insights about how people who are engaged in intensive searches about prostate cancer over an extended period of time pursue and access information from the Web.
Large-scale examination of academic publications using statistical models
In International Working Conference on Advanced Visual Interfaces (AVI): Workshop on Supporting Asynchronous Collaboration in Visual Analytics Systems, 2012
Cited by 1 (1 self)
We describe our experiences in three collaborative visual analytics projects. The projects center on large-scale examination of academic publications using statistical models. Each project involves a multidisciplinary team of social scientists,
Infrastructure for Supporting Exploration and Discovery in Web Archives
Cited by 1 (1 self)
Web archiving initiatives around the world capture ephemeral web content to preserve our collective digital memory. However, unlocking the potential of web archives requires tools that support exploration and discovery of captured content. These tools need to be scalable and responsive, and to this end we believe that modern “big data” infrastructure can provide a solid foundation. We present Warcbase, an open-source platform for managing web archives built on the distributed datastore HBase. Our system provides a flexible data model for storing and managing raw content as well as metadata and extracted knowledge. Tight integration with Hadoop provides powerful tools for analytics and data processing. Relying on HBase for storage infrastructure simplifies the development of scalable and responsive applications. We describe a service that provides temporal browsing and an interactive visualization based on topic models that allows users to explore archived content.
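Temporal browsing over a versioned store reduces to two operations: locating the row for a URL, and picking the capture closest in time to the one requested. A minimal in-memory sketch, assuming a row key built from the reversed host name plus path (a common web-table convention, assumed here rather than taken from Warcbase) and integer timestamps such as Unix seconds. `CaptureTable` and `row_key` are hypothetical and are not the Warcbase or HBase API.

```python
from bisect import bisect_left
from urllib.parse import urlsplit


def row_key(url):
    """Hypothetical row key: reversed host labels plus path, e.g. 'org.example.www/about'."""
    parts = urlsplit(url)
    host = ".".join(reversed(parts.hostname.split(".")))
    return host + (parts.path or "/")


class CaptureTable:
    """In-memory stand-in for a versioned table: row key -> timestamped captures."""

    def __init__(self):
        self.rows = {}  # row key -> (sorted list of timestamps, {timestamp: content})

    def put(self, url, timestamp, content):
        stamps, cells = self.rows.setdefault(row_key(url), ([], {}))
        if timestamp not in cells:
            stamps.insert(bisect_left(stamps, timestamp), timestamp)
        cells[timestamp] = content  # re-putting the same timestamp overwrites it

    def nearest(self, url, timestamp):
        """Return (timestamp, content) of the capture closest to `timestamp`, or None."""
        entry = self.rows.get(row_key(url))
        if entry is None:
            return None
        stamps, cells = entry
        i = bisect_left(stamps, timestamp)
        # The closest capture is either just before or at/after the insertion point.
        candidates = stamps[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda t: (abs(t - timestamp), t))
        return best, cells[best]
```

Reversing the host keeps pages from the same domain adjacent in a sorted key space, which suits range scans over a site.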
A Topic-Based Search, Visualization, and Exploration System
From literature surveys to legal document collections, people need to organize and explore large numbers of documents. During these tasks, students and researchers search for documents based on particular themes. In this paper, we use a popular topic modeling algorithm, Latent Dirichlet Allocation, to derive topic distributions for articles. We allow users to specify a personal topic distribution to contextualize the exploration experience. We introduce three types of exploration: user-model re-weighted keyword search, topic-based search, and topic-based exploration. We demonstrate these methods using a scientific citation data set and a Wikipedia article collection. We also describe the user interaction model.
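One way to picture "user-model re-weighted keyword search" is to keep only documents that match the query keywords and then blend a keyword score with the similarity between each document's topic distribution and the user's personal one. A minimal sketch, assuming term-frequency keyword scoring, cosine similarity, and a linear mix controlled by a hypothetical `alpha`. The abstract does not give the authors' actual weighting, and `reweighted_search` is an illustrative name.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length topic vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def reweighted_search(query, docs, user_topics, alpha=0.5):
    """Rank keyword matches, boosted by topic similarity to the user's profile.

    docs: list of (doc_id, text, topic_distribution); topic vectors assumed from LDA
    user_topics: the user's personal distribution over the same K topics
    alpha: hypothetical weight on topic similarity (0 = plain keyword ranking)
    """
    terms = query.lower().split()
    results = []
    for doc_id, text, theta in docs:
        words = text.lower().split()
        keyword = sum(words.count(t) for t in terms) / max(len(words), 1)
        if keyword == 0:
            continue  # keyword search: skip documents containing no query term
        score = (1 - alpha) * keyword + alpha * cosine(theta, user_topics)
        results.append((score, doc_id))
    # Highest score first; ties broken by document id for a stable order.
    return [doc_id for _, doc_id in sorted(results, key=lambda r: (-r[0], r[1]))]
```

With `alpha=0` this degrades to ordinary keyword ranking, so the personal topic distribution acts purely as a re-weighting layer on top of the keyword matches.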