Semi-supervised Extraction of Entity Aspects Using Topic Models
BibTeX
@MISC{Sharifi_semi-supervisedextraction,
author = {Mehrbod Sharifi and Jaime Carbonell and William Cohen},
title = {Semi-supervised Extraction of Entity Aspects Using Topic Models},
year = {}
}
Abstract
Information extraction techniques (such as Named Entity Recognition) have long been used to extract useful pieces of information from text. The types of information to be extracted are generally fixed and well defined (e.g., names of people, organizations, etc.). However, in some cases, the user's goal is more abstract and information types cannot be narrowly defined. For example, a reader of online user reviews typically has the goal of making a good choice and is interested in learning about the different aspects or attributes that people have mentioned for an entity (e.g., quality of service for a restaurant or battery life of a digital camera). Some of these aspects may be known to the reader, while others may need to be discovered from the inherent text structure in a large collection. Even for the known aspects (such as “service” for a restaurant), the challenge is to recognize various expressions (e.g., “long wait” or “friendly waiter”). In this thesis, we model the entity aspects as topics with identifiable word distributions across documents. We review several probabilistic graphical models (such as Latent Dirichlet Allocation) and propose a new model that can operate in a semi-supervised setting. We







