A Survey of Modern Authorship Attribution Methods
- JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY
Cited by 18 (0 self)
Authorship attribution supported by statistical or computational methods has a long history, starting in the 19th century and marked by the seminal study of Mosteller and Wallace (1964) on the authorship of the disputed Federalist Papers. During the last decade, this scientific field has developed substantially, taking advantage of research advances in areas such as machine learning, information retrieval, and natural language processing. The plethora of available electronic texts (e.g., e-mail messages, online forum messages, blogs, source code) indicates a wide variety of applications for this technology, provided that it can handle short and noisy text from multiple candidate authors. In this paper, a survey of recent advances in automated approaches to attributing authorship is presented, examining their characteristics for both text representation and text classification. The focus of this survey is on computational requirements and settings rather than linguistic or literary issues. We also discuss evaluation methodologies and criteria for authorship attribution studies and list open questions that will attract future work in this area.
Individual and Domain Adaptation in Sentence Planning for Dialogue
Cited by 8 (3 self)
One of the biggest challenges in the development and deployment of spoken dialogue systems is the design of the spoken language generation module. This challenge arises from the need for the generator to adapt to many features of the dialogue domain, user population, and dialogue context. A promising approach is trainable generation, which uses general-purpose linguistic knowledge that is automatically adapted to the features of interest, such as the application domain, individual user, or user group. In this paper we present and evaluate a trainable sentence planner for providing restaurant information in the MATCH dialogue system. We show that trainable sentence planning can produce complex information presentations whose quality is comparable to the output of a template-based generator tuned to this domain. We also show that our method easily supports adapting the sentence planner to individuals, and that the individualized sentence planners generally perform better than models trained and tested on a population of individuals. Previous work has documented and utilized individual preferences for content selection, but to our knowledge, these results provide the first demonstration of individual preferences for sentence planning operations, affecting the content order, discourse structure, and sentence structure of system responses. Finally, we evaluate the contribution of different feature sets and show that, in our application, n-gram features often do as well as features based on higher-level linguistic representations.
MULTI-CLASS FEATURE SELECTION WITH SUPPORT VECTOR MACHINES
- 2008
Cited by 2 (0 self)
We consider feature selection in a multi-class setting where the goal is to find a small set of features for all the classes simultaneously. We develop an embedded method for this problem using the idea of scaling factors. Training involves the solution of a convex program for which we give a scalable algorithm. The method is closely related to extensions of L1 regularization and recursive feature elimination. These methods are shown to be very effective in text classification.
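As a rough illustration of how an extension of L1 regularization can select one feature set for all classes at once, the sketch below applies row-wise (group) soft-thresholding to a weight matrix with one row per feature and one column per class. This is an assumption chosen for illustration, not the scaling-factor method of the paper; the function names and the toy weights are hypothetical.

```python
import math

def group_soft_threshold(weights, lam):
    """Shrink each feature's row of per-class weights toward zero.

    weights: list of rows, one row per feature, one column per class.
    A row whose Euclidean norm is at most lam becomes all zeros, which
    drops that feature for every class simultaneously.
    """
    shrunk = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        shrunk.append([scale * w for w in row])
    return shrunk

def selected_features(weights):
    """Indices of features that keep a non-zero weight in any class."""
    return [i for i, row in enumerate(weights) if any(w != 0.0 for w in row)]

# Hypothetical weights: 3 features, 2 classes.
toy_weights = [[3.0, 4.0], [0.3, -0.4], [0.0, 2.0]]
kept = selected_features(group_soft_threshold(toy_weights, lam=1.0))
```

Because the threshold acts on the whole row, a feature is either kept for all classes or removed for all of them, which matches the "small set of features for all the classes simultaneously" goal stated in the abstract.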
Investigating topic influence in authorship attribution
Cited by 1 (0 self)
The aim of this paper is to explore text topic influence in authorship attribution. Specifically, we test the widely accepted belief that stylometric variables commonly used in authorship attribution are topic-neutral and can be used in multi-topic corpora. To investigate this hypothesis, we created a special corpus that was controlled for topic and author simultaneously. The corpus consists of 200 Modern Greek newswire articles written by two authors on two different topics. Many commonly used stylometric variables were calculated, and for each one we performed a two-way ANOVA test to estimate the main effects of author and topic and the interaction between them. The results showed that most of the variables exhibit considerable correlation with the text topic and that their exploitation in authorship analysis should be done with caution.
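The two-way ANOVA described in this abstract can be sketched for a balanced author-by-topic design as follows. This is a minimal illustration under stated assumptions: every cell holds the same number of texts, the measurements are invented toy values rather than data from the paper, and only the F statistics are computed (p-values would need an F distribution, which the standard library does not provide). The function name and the data layout are hypothetical.

```python
from statistics import fmean

def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells maps (author, topic) -> list of measurements of one stylometric
    variable. Returns sums of squares and F statistics for the author
    effect, the topic effect, and their interaction.
    """
    authors = sorted({a for a, _ in cells})
    topics = sorted({t for _, t in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(authors) * len(topics):
        raise ValueError("every author/topic combination needs a cell")
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("cells must all hold the same number (>= 2) of values")

    grand = fmean(x for v in cells.values() for x in v)
    cell_mean = {k: fmean(v) for k, v in cells.items()}
    author_mean = {a: fmean(cell_mean[(a, t)] for t in topics) for a in authors}
    topic_mean = {t: fmean(cell_mean[(a, t)] for a in authors) for t in topics}

    ss_author = len(topics) * n * sum((m - grand) ** 2 for m in author_mean.values())
    ss_topic = len(authors) * n * sum((m - grand) ** 2 for m in topic_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_author - ss_topic
    ss_error = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_author, df_topic = len(authors) - 1, len(topics) - 1
    df_inter = df_author * df_topic
    df_error = len(authors) * len(topics) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "ss": {"author": ss_author, "topic": ss_topic,
               "interaction": ss_inter, "error": ss_error},
        "F": {"author": (ss_author / df_author) / ms_error,
              "topic": (ss_topic / df_topic) / ms_error,
              "interaction": (ss_inter / df_inter) / ms_error},
    }

# Invented toy values: 2 authors x 2 topics, 2 texts per cell.
toy = {("A1", "T1"): [1, 3], ("A1", "T2"): [5, 7],
       ("A2", "T1"): [2, 4], ("A2", "T2"): [6, 8]}
anova = two_way_anova(toy)
```

In the toy data the topic effect dominates the author effect, which is the kind of pattern that would make a variable a poor topic-neutral style marker.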
Authorship Attribution with Latent Dirichlet Allocation
The problem of authorship attribution – attributing texts to their original authors – has been an active research area since the end of the 19th century, attracting increased interest in the last decade. Most of the work on authorship attribution focuses on scenarios with only a few candidate authors, but recently considered cases with tens to thousands of candidate authors were found to be much more challenging. In this paper, we propose ways of employing Latent Dirichlet Allocation in authorship attribution. We show that our approach yields state-of-the-art performance for both a few and many candidate authors, in cases where these authors wrote enough texts to be modelled effectively.
Multi-Class Feature Selection with Support Vector Machines
We consider feature selection in a multi-class setting where the goal is to find a small set of features for all the classes simultaneously. We develop an embedded method for this problem using the idea of scaling factors. Training involves the solution of a convex program for which we give a scalable algorithm. The method is closely related to extensions of L1 regularization and recursive feature elimination. These methods are shown to be very effective in text classification.
Author Identification using Compression Models
- 2009 10th International Conference on Document Analysis and Recognition
In this paper we discuss the use of compression algorithms for author identification. We present the basic background on compression algorithms and introduce the Prediction by Partial Matching (PPM) algorithm, which was used in our experiments. To better compare the results produced by the PPM algorithm, we present some experiments using stylometric features that are often used by forensic examiners. In this case the authors are modeled using Support Vector Machines. Comprehensive experiments performed on a database composed of 20 different authors show that the PPM algorithm is an interesting alternative for author identification, since the entire process of feature definition, extraction, and selection can be avoided.
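The general idea of compression-based author identification can be sketched as below. Note the substitution: the abstract uses PPM, which is not in the Python standard library, so this sketch swaps in zlib (DEFLATE) purely for illustration. The scoring rule (extra compressed bytes when the unknown text is appended to an author's known writing), the function names, and the toy texts are all assumptions, not details taken from the paper.

```python
import zlib
from typing import Dict

def compressed_size(text: str) -> int:
    """Number of bytes after DEFLATE compression at the highest level."""
    return len(zlib.compress(text.encode("utf-8"), 9))

def attribute(unknown: str, corpora: Dict[str, str]) -> str:
    """Return the candidate author whose known text best 'predicts' unknown.

    The score is the number of extra bytes needed to compress the unknown
    text once it is appended to an author's corpus; fewer extra bytes means
    the corpus already contains similar character sequences.
    """
    def extra_bytes(known: str) -> int:
        return compressed_size(known + unknown) - compressed_size(known)

    return min(corpora, key=lambda author: extra_bytes(corpora[author]))

# Invented toy corpora for two hypothetical authors.
corpora = {
    "alice": "the quick brown fox jumps over the lazy dog. "
             "the lazy dog sleeps under the old oak tree. ",
    "bob": "stock markets fell sharply today as investors weighed new "
           "interest rate fears. analysts expect further volatility. ",
}
guess = attribute("the quick brown fox sleeps under the old oak tree.", corpora)
```

No features are defined, extracted, or selected here: the compressor's own model of the known text does that work, which is the property the abstract highlights.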

