Termite: visualization techniques for assessing textual topic models (2012)

by J Chuang, C D Manning, J Heer
Venue: AVI

Results 1 - 10 of 26

Topic Model Diagnostics: Assessing Domain Relevance via Topical Alignment (Supplementary Materials)

by Jason Chuang, Sonal Gupta, Christopher D. Manning, Jeffrey Heer
Abstract - Cited by 9 (1 self)
the main paper with additional details in the caption. Supplementary Figure 3 shows additional data points for Figure 7 in the main paper. 2. Expert-Authored Concepts in Information Visualization We conducted a survey asking ten experienced information visualization (InfoVis) researchers to identify what they consider to be significant and coherent areas of research in their field. Participants were asked to label each area, and describe it with lists of exemplary terms and documents. We focused on InfoVis research due to relevance, scope and familiarity. Analysis of academic publications is one of the common real-world uses of topic modeling

HierarchicalTopics: Visually exploring large text collections using topic hierarchies

by Wenwen Dou, Li Yu, Xiaoyu Wang, Zhiqiang Ma, William Ribarsky - IEEE TVCG
Abstract - Cited by 9 (1 self)
Accepted for publication by IEEE. ©2013 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/

Citation Context

...tions of topical results, visual text analytics researchers have designed algorithms and visual representations that make the probabilistic topic results legible and exploratory to a broader audience [8, 9, 10, 14, 15, 26, 34]. Examples of the utility of these topic-based visualization interfaces include the analysis of social media users based on the content they generated [22], depiction of the temporal evolution of topi...

Crowd synthesis: extracting categories and clusters from complex data

by Aniket Kittur, Steven P. Dow - In Proc. CSCW, 2014
Abstract - Cited by 6 (2 self)
Analysts synthesize complex, qualitative data to uncover themes and concepts, but the process is time-consuming, cognitively taxing, and automated techniques show mixed success. Crowdsourcing could help this process through on-demand harnessing of flexible and powerful human cognition, but incurs other challenges including limited attention and expertise. Further, text data can be complex, high-dimensional, and ill-structured. We address two major challenges unsolved in prior crowd clustering work: scaffolding expertise for novice crowd workers, and creating consistent and accurate categories when each worker only sees a small portion of the data. To address these challenges we present an empirical study of a two-stage approach to enable crowds to create an accurate and useful overview of a dataset: A) we draw on cognitive theory to assess how re-representing data can shorten and focus the data on salient dimensions; and B) introduce an iterative clustering approach that provides workers a global overview of data. We demonstrate a classification-plus-context approach elicits the most accurate categories at the most useful level of abstraction.

Visualizing and verifying directed social queries

by Angus Graeme Forbes, Saiph Savage, Tobias Höllerer - IEEE Workshop on Interactive Visual Text Analytics, 2012
Abstract - Cited by 5 (5 self)
Abstract—We present a novel visualization system that automatically classifies social network data in order to support a user’s directed social queries and, furthermore, that allows the user to quickly verify the accuracy of the classifications. We model a user’s friends’ interests in particular topics through the creation of a crowd-sourced knowledge base comprised of terms related to user-specified semantic categories. Modeling friends in terms of these topics enables precise and efficient social querying to effectively fulfill a user’s information needs. That is, our system makes it possible to quickly identify friends who have a high probability of being able to answer particular questions or of having a shared interest in particular topics. Our initial investigations indicate that our model is effective at correlating friends to these topics even without sentiment or other lexical analyses. However, even the most robust system may produce results that have false positives or false negatives due to inaccurate classifications stemming from incorrect, polysemous, or sparse data. To mitigate these errors, and to allow for more fine-grained control over the selection of friends for directed social queries, an interactive visualization exposes the results of our model and enables a human-in-the-loop approach for result analysis and verification. A qualitative analysis of our verification system indicates that the transparent representation of our shared-interest modeling algorithm leads to an increased effectiveness of the model. Index Terms—Interactive verification, user classification, shared-interest modeling, topic modeling, social network visualization.

Citation Context

...akes insights from early work on explanations in recommender systems [7] and more recent research on revealing the inner workings of recommendations and topic modeling using interactive visualization [4, 5, 6, 20]. 2 SHARED-INTEREST MODELING We classify each of the user’s friends in terms of their potential to fulfill the user’s information needs. Although our model could be extended to other social networks, ...

Topic Models and Metadata for Visualizing Text Corpora

by Justin Snyder, Rebecca Knowles, Mark Dredze, Matthew R. Gormley, Travis Wolfe
Abstract - Cited by 3 (0 self)
Effectively exploring and analyzing large text corpora requires visualizations that provide a high-level summary. Past work has relied on faceted browsing of document metadata or on natural language processing of document text. In this paper, we present a new web-based tool that integrates topics learned from an unsupervised topic model in a faceted browsing experience. The user can manage topics, filter documents by topic and summarize views with metadata and topic graphs. We report a user study of the usefulness of topics in our tool.

Low-dimensional Embeddings for Interpretable Anchor-based Topic Inference

by Moontae Lee, David Mimno
Abstract - Cited by 3 (0 self)
The anchor words algorithm performs provably efficient topic model inference by finding an approximate convex hull in a high-dimensional word co-occurrence space. However, the existing greedy algorithm often selects poor anchor words, reducing topic quality and interpretability. Rather than finding an approximate convex hull in a high-dimensional space, we propose to find an exact convex hull in a visualizable 2- or 3-dimensional space. Such low-dimensional embeddings both improve topics and clearly show users why the algorithm selects certain words.

Citation Context

...ol the subjectivity in labelings between annotators, which is open to interpretive errors. There has been considerable interest in automating the labeling process (Mei et al., 2007; Lau et al., 2011; Chuang et al., 2012). (Chuang et al., 2012) propose a measure of saliency: a good summary term should be both distinctive specifically to one topic and probable in that topic. Anchor words are by definition optimally di...
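
The saliency measure mentioned in this citation context can be illustrated with a minimal sketch. The listing itself gives no formula, so the code assumes one common formulation: saliency(w) = P(w) × distinctiveness(w), where distinctiveness(w) is the KL divergence between P(T | w) and the topic prior P(T). The function name `saliency` and the input layout (`topic_word[t][w]` = P(w | t), `topic_prior[t]` = P(t)) are hypothetical choices made for this example.

```python
import math


def saliency(topic_word, topic_prior):
    """Score each word by P(w) * KL(P(T | w) || P(T)).

    topic_word[t][w] is assumed to hold P(w | t) and topic_prior[t] to hold P(t).
    Both names and the layout are illustrative assumptions.
    """
    n_topics = len(topic_word)
    n_words = len(topic_word[0])
    scores = []
    for w in range(n_words):
        # Marginal word probability: P(w) = sum_t P(w | t) P(t).
        p_w = sum(topic_word[t][w] * topic_prior[t] for t in range(n_topics))
        if p_w == 0.0:
            scores.append(0.0)
            continue
        distinctiveness = 0.0
        for t in range(n_topics):
            # Bayes' rule: P(t | w) = P(w | t) P(t) / P(w).
            p_t_given_w = topic_word[t][w] * topic_prior[t] / p_w
            if p_t_given_w > 0.0:
                distinctiveness += p_t_given_w * math.log(p_t_given_w / topic_prior[t])
        scores.append(p_w * distinctiveness)
    return scores
```

Under this formulation, a word spread evenly across all topics scores zero however frequent it is, while a word concentrated in a single topic scores in proportion to its overall probability.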

Diagnoses, Decisions, and Outcomes: Web Search as Decision Support for Cancer

by Michael J. Paul, Ryen W. White, Eric Horvitz
Abstract - Cited by 2 (1 self)
People diagnosed with a serious illness often turn to the Web for their rising information needs, especially when decisions are required. We analyze the search and browsing behavior of searchers who show a surge of interest in prostate cancer. Prostate cancer is the most common serious cancer in men and is a leading cause of cancer-related death. Diagnoses of prostate cancer typically involve reflection and decision making about treatment based on assessments of preferences and outcomes. We annotated timelines of treatment-related queries from nearly 300 searchers with tags indicating different phases of treatment, including decision making, preparation, and recovery. Using this corpus, we present a variety of analyses toward the goal of understanding search and decision making about treatments. We characterize search queries and the content of accessed pages for different treatment phases, model search behavior during the decision-making phase, and create an aggregate alignment of treatment timelines illustrated with a variety of visualizations. The experiments provide insights about how people who are engaged in intensive searches about prostate cancer over an extended period of time pursue and access information from the Web.

Citation Context

...m search queries and webpage bodies—and domain names that are most associated with retrieval in each phase. We wish to identify features that are salient—both probable and representative of the phase [8]. We achieve this with a two-component mixture model that mixes phase-specific feature distributions with a phase-independent background distribution which accounts for common features that are not re...
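
The idea of mixing a phase-specific distribution with a shared background distribution can be sketched as follows. This is not the authors' model, which the excerpt does not specify; it is a generic two-component mixture fitted by EM, assuming a fixed background distribution and a fixed mixing weight. The function name `fit_specific` and the parameters `lam` and `iters` are hypothetical.

```python
def fit_specific(counts, background, lam=0.5, iters=50):
    """Estimate a phase-specific feature distribution by EM.

    counts[f] is the count of feature f within one phase; background[f] is
    P(f | background). The mixing weight lam (share of the specific component)
    is held fixed, which is an assumption made for this sketch.
    """
    total = sum(counts.values())
    # Start from the raw relative frequencies.
    specific = {f: c / total for f, c in counts.items()}
    for _ in range(iters):
        # E-step: probability that an occurrence of f came from the specific component.
        resp = {}
        for f in counts:
            num = lam * specific[f]
            den = num + (1.0 - lam) * background.get(f, 0.0)
            resp[f] = num / den if den > 0.0 else 0.0
        # M-step: re-estimate the specific distribution from the weighted counts.
        weighted = {f: counts[f] * resp[f] for f in counts}
        norm = sum(weighted.values())
        if norm == 0.0:
            break
        specific = {f: v / norm for f, v in weighted.items()}
    return specific
```

Features that the background already explains well lose probability mass in the specific component, so the features left with high probability are both frequent in the phase and not merely common everywhere.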

Large-scale examination of academic publications using statistical models

by Jason Chuang, Daniel Ramage, Daniel A. Mcfarl, Christopher D. Manning, Jeffrey Heer - In International Working Conference on Advanced Visual Interfaces (AVI): Workshop on Supporting Asynchronous Collaboration in Visual Analytics Systems, 2012
Abstract - Cited by 1 (1 self)
We describe our experiences in three collaborative visual analytics projects. The projects center on large-scale examination of academic publications using statistical models. Each project involves a multidisciplinary team of social scientists,

Citation Context

...ing a shared representation to enable effective communication. We are currently investigating visualizations that expose topic models in one more level of details — at the level of word distributions [4]. Aligning models and visualizations can be labor intensive as demonstrated by the manual grouping and labeling of areas. We are interested in modeling techniques and HCI approaches that can speed up ...

Infrastructure for Supporting Exploration and Discovery in Web Archives

by Jimmy Lin, Milad Gholami, Jinfeng Rao
Abstract - Cited by 1 (1 self)
Web archiving initiatives around the world capture ephemeral web content to preserve our collective digital memory. However, unlocking the potential of web archives requires tools that support exploration and discovery of captured content. These tools need to be scalable and responsive, and to this end we believe that modern “big data” infrastructure can provide a solid foundation. We present Warcbase, an open-source platform for managing web archives built on the distributed datastore HBase. Our system provides a flexible data model for storing and managing raw content as well as metadata and extracted knowledge. Tight integration with Hadoop provides powerful tools for analytics and data processing. Relying on HBase for storage infrastructure simplifies the development of scalable and responsive applications. We describe a service that provides temporal browsing and an interactive visualization based on topic models that allows users to explore archived content.

Citation Context

...llection as a sequence of temporal slices, where each slice corresponds to a monthly crawl. On each slice we run LDA, and the induced topic models are then visualized with a custom variant of Termite [6], as shown in Figure 2. The main visualization area displays a person-by-topic matrix, where the rows represent websites that are associated with U.S. senators and the columns represent the topics (du...

A Topic-Based Search, Visualization, and Exploration System

by Christan Grant, Clint P. George, Virupaksha Kanjilal, Supriya Nirkhiwale, Joseph N. Wilson, Daisy Zhe Wang
Abstract
From literature surveys to legal document collections, people need to organize and explore large amounts of documents. During these tasks, students and researchers will search for documents based on particular themes. In this paper, we use a popular topic modeling algorithm, Latent Dirichlet Allocation, to derive topic distributions for articles. We allow users to specify personal topic distribution to contextualize the exploration experience. We introduce three types of exploration: user model re-weighted keyword search, topic-based search, and topic-based exploration. We demonstrate these methods using a scientific citation data set and a Wikipedia article collection. We also describe the user interaction model.