Results 1 -
9 of
9
Active Learning with Constrained Topic Model
"... Latent Dirichlet Allocation (LDA) is a topic modeling tool that automatically discovers topics from a large collection of documents. It is one of the most popular text analysis tools currently in use. In practice however, the topics discovered by LDA do not al-ways make sense to end users. In this e ..."
Abstract
-
Cited by 1 (1 self)
- Add to MetaCart
(Show Context)
Latent Dirichlet Allocation (LDA) is a topic modeling tool that automatically discovers topics from a large collection of documents. It is one of the most popular text analysis tools currently in use. In practice however, the topics discovered by LDA do not al-ways make sense to end users. In this ex-tended abstract, we propose an active learn-ing framework that interactively and itera-tively acquires user feedback to improve the quality of learned topics. We conduct exper-iments to demonstrate its effectiveness with simulated user input on a benchmark dataset. 1
New Topic Detection in Microblogs and Topic Model Evaluation using Topical Alignment
, 2014
"... Copyright by ..."
(Show Context)
Interactive Exploration of Asynchronous Conversations: Applying a User-centered Approach to Design a Visual Text Analytic System
"... Exploring an online conversation can be very difficult for a user, especially when it becomes a long complex thread. We fol-low a human-centered design approach to tightly integrate text mining methods with interactive visualization techniques to sup-port the users in fulfilling their informa-tion n ..."
Abstract
- Add to MetaCart
(Show Context)
Exploring an online conversation can be very difficult for a user, especially when it becomes a long complex thread. We fol-low a human-centered design approach to tightly integrate text mining methods with interactive visualization techniques to sup-port the users in fulfilling their informa-tion needs. The resulting visual text ana-lytic system provides multifaceted explo-ration of asynchronous conversations. We discuss a number of open challenges and possible directions for further improve-ment including the integration of interac-tive human feedback in the text mining loop, applying more advanced text analy-sis methods with visualization techniques, and evaluating the system with real users. 1
Scalable and interpretable data representation for high-dimensional, complex data
"... The majority of machine learning research has been fo-cused on building models and inference techniques with sound mathematical properties and cutting edge perfor-mance. Little attention has been devoted to the develop-ment of data representation that can be used to improve a user’s ability to inter ..."
Abstract
- Add to MetaCart
The majority of machine learning research has been fo-cused on building models and inference techniques with sound mathematical properties and cutting edge perfor-mance. Little attention has been devoted to the develop-ment of data representation that can be used to improve a user’s ability to interpret the data and machine learn-ing models to solve real-world problems. In this paper, we quantitatively and qualitatively evaluate an efficient, accurate and scalable feature-compression method us-ing latent Dirichlet allocation for discrete data. This representation can effectively communicate the charac-teristics of high-dimensional, complex data points. We show that the improvement of a user’s interpretability through the use of a topic modeling-based compres-sion technique is statistically significant, according to a number of metrics, when compared with other repre-sentations. Also, we find that this representation is scal-able — it maintains alignment with human classifica-tion accuracy as an increasing number of data points are shown. In addition, the learned topic layer can semanti-cally deliver meaningful information to users that could potentially aid human reasoning about data characteris-tics in connection with compressed topic space.
Learning Frames from Text with an Unsupervised Latent Variable Model
, 2014
"... We develop a probabilistic latent-variable model to discover semantic frames—types of events and their participants—from corpora. We present a Dirichlet-multinomial model in which frames are latent cate-gories that explain the linking of verb-subject-object triples, given document-level sparsity. We ..."
Abstract
- Add to MetaCart
We develop a probabilistic latent-variable model to discover semantic frames—types of events and their participants—from corpora. We present a Dirichlet-multinomial model in which frames are latent cate-gories that explain the linking of verb-subject-object triples, given document-level sparsity. We analyze what the model learns, and compare it to FrameNet, noting it learns some novel and interesting frames. This document also contains a discussion of inference issues, including concentration parameter learn-ing; and a small-scale error analysis of syntactic parsing accuracy. Note: this work was originally posted online October 2012 as part of CMU MLD’s Data Analysis Project requirement. This version has no new experiments or results, but has added some discussion of new related work. 1