Inter-Coder Agreement for Computational Linguistics
- COMPUTATIONAL LINGUISTICS
, 2008
Cited by 54 (1 self)
This article is a survey of methods for measuring agreement among corpus annotators. It exposes the mathematics and underlying assumptions of agreement coefficients, covering Krippendorff’s alpha as well as Scott’s pi and Cohen’s kappa; discusses the use of coefficients in several annotation tasks; and argues that weighted, alpha-like coefficients, traditionally less used than kappa-like measures in Computational Linguistics, may be more appropriate for many corpus annotation tasks – but that their use makes the interpretation of the value of the coefficient even harder.
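As a rough illustration of two of the coefficients named in this abstract, the sketch below computes Cohen's kappa and Scott's pi for two annotators labelling the same items. Both share the form (A_o - A_e) / (1 - A_e) and differ only in how expected agreement A_e is estimated. The function names and the toy labels are assumptions made for this example and are not taken from the article.

```python
from collections import Counter


def observed_agreement(a, b):
    """Proportion of items on which the two annotators chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Cohen's kappa: expected agreement uses each annotator's own label distribution."""
    n = len(a)
    a_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    a_e = sum((count_a[k] / n) * (count_b[k] / n) for k in set(a) | set(b))
    return (a_o - a_e) / (1 - a_e)


def scott_pi(a, b):
    """Scott's pi: expected agreement uses one label distribution pooled over both annotators."""
    n = len(a)
    a_o = observed_agreement(a, b)
    pooled = Counter(a) + Counter(b)
    a_e = sum((pooled[k] / (2 * n)) ** 2 for k in pooled)
    return (a_o - a_e) / (1 - a_e)


# Hypothetical part-of-speech labels from two annotators on four tokens.
annotator_1 = ["N", "N", "V", "V"]
annotator_2 = ["N", "V", "V", "V"]
kappa = cohen_kappa(annotator_1, annotator_2)
pi = scott_pi(annotator_1, annotator_2)
```

Neither function guards against the degenerate case where expected agreement equals 1 (both annotators always use a single label), which a real implementation would need to handle.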
Interpreting social science link analysis research: A theoretical framework
- Journal of the American Society for Information Science and Technology
, 2006
Cited by 7 (1 self)
Link analysis in various forms is now an established technique in many different subjects, reflecting the perceived importance of links and that of the web. A critical but very difficult issue is how to interpret the results of social science link analyses. It is argued that the dynamic nature of the web, its lack of quality control and the online proliferation of copying and imitation mean that methodologies operating within a highly positivist, quantitative framework are ineffective. Conversely, the sheer variety of the web makes qualitative methodologies and pure reason very problematic to apply to large-scale studies. Methodology triangulation is consequently advocated, in combination with a warning that the web is incapable of giving definitive answers to large-scale link analysis research questions concerning social factors underlying link creation. Finally, it is claimed that whilst theoretical frameworks with which to guide research are appropriate, a Theory of Link Analysis is not possible.
Hyperlinks as a data source for science mapping
- JOURNAL OF INFORMATION SCIENCE
, 2004
Cited by 6 (4 self)
Hyperlinks between academic web sites, like citations, can potentially be used to map disciplinary structures and identify evidence of connections between disciplines. In this paper we classified a sample of links originating in three different disciplines: maths, physics and sociology. Links within a discipline were found to be different in character to links between pages in different disciplines. There were also disciplinary differences in both types of link. As a consequence, we argue that interpretations of web science maps covering multiple disciplines will need to be sensitive to the contexts of the links mapped.
CommentSpace: Structured Support for Collaborative Visual Analysis
Cited by 1 (0 self)
Collaborative visual analysis tools can enhance sensemaking by facilitating social interpretation and parallelization of effort. These systems enable distributed exploration and evidence gathering, allowing many users to pool their effort as they discuss and analyze the data. We explore how adding lightweight tag and link structure to comments can aid this analysis process. We present CommentSpace, a collaborative system in which analysts comment on visualizations and websites and then use tags and links to organize findings and identify others' contributions. In a pair of studies comparing CommentSpace to a system without support for tags and links, we find that a small, fixed vocabulary of tags (question, hypothesis, to-do) and links (evidence-for, evidence-against) helps analysts more consistently and accurately classify evidence and establish common ground. We also find that managing and incentivizing participation is important for analysts to progress from exploratory analysis to deeper analytical tasks. Finally, we demonstrate that tags and links can help teams complete evidence gathering and synthesis tasks and that organizing comments using tags and links improves analytic results.
Author Keywords: Information visualization, asynchronous collaboration, social data analysis, tagging
The Statistics of Text: New methods for Content Analysis
Computer content analysis (CCA) is used across the social sciences, and is beginning to find a range of applications in political science. These have traditionally been concentrated on political communication and policy analysis in America and Western Europe (Laver and Garry, 2000; Pennings and Keman, 2002), although CCA is potentially appropriate anywhere traditional discourse analysis might normally be considered (Neuendorf, 2002; Abdelal et al., 2003). In the dominant approach to CCA, the researcher constructs a category system or ‘dictionary’ that associates a set of words with each theoretically relevant concept, and summarizes a document’s content in a vector of category occurrence frequencies. More linguistically sophisticated methods have been used for particular research problems; two important examples are the use of partial parsing and information
Yoshikoder: An Open Source Multilingual Content Analysis Tool for Social Scientists
This short paper is about the Yoshikoder, an open-source desktop tool for performing classical computer-aided content analysis in multiple languages. The paper starts with some background on content analysis, continues with a short technical characterization of the Yoshikoder as a content analysis tool, and concludes with some necessarily brief examples of the kind of analysis the Yoshikoder makes possible.
Classical Content Analysis
By classical content analysis I mean the tradition of examining word frequencies, creating concordances, and building content dictionaries in order to operationalize substantively interesting aspects of document meaning (see West, 2001; Neuendorf, 2002, for reviews). There are, of course, other traditions of content analysis, e.g. discourse analysis, cognitive mapping, and collocational clustering, with specialized software often available to apply each method (see Herrera and Braumoeller, 2004, for some comparisons). Content analysis also borrows technology from computational linguistics (Manning and Schütze, 2000; Jurafsky and Martin, 2000). However, the Yoshikoder is designed primarily for classical content analysis, so I will not discuss alternative methods.
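The two core operations of classical content analysis mentioned above, counting dictionary categories and building a concordance, can be sketched in a few lines. This is a minimal illustration, not the Yoshikoder's implementation: the tokenizer, the function names, and the toy dictionary are all assumptions made for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_counts(tokens, dictionary):
    """Count how many tokens fall into each category of a content dictionary."""
    counts = Counter()
    for category, words in dictionary.items():
        counts[category] = sum(1 for token in tokens if token in words)
    return counts


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for every occurrence of keyword."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


# Hypothetical two-category dictionary and a toy document.
toy_dictionary = {"economy": {"taxes", "growth"}, "education": {"schools"}}
tokens = tokenize("Taxes fund schools. Lower taxes help growth.")
counts = category_counts(tokens, toy_dictionary)
kwic = concordance(tokens, "taxes", window=1)
```

The ASCII-only tokenizer is a simplification; multilingual text would need a Unicode-aware tokenizer.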
Measuring Qualitative Information in Capital Markets Research
, 2010
A growing stream of research in accounting and finance tests the extent to which the tone of financial disclosure narrative, also referred to as its qualitative information, affects security prices, over and above the disclosed financial performance. These studies typically measure tone by counting the relative frequency of positive versus negative words in a given disclosure such as an earnings press release. Critical to word-frequency based analysis is the list of words deemed to be positive or negative. Because general wordlists (GI or Diction) likely omit words that would be considered positive or negative in the context of financial disclosure and include words that would not, we expect these general wordlists to be less powerful for hypothesis testing than wordlists built specifically for the domain of financial disclosure (FD). Using a sample of 29,712 earnings press releases, we find that the context-specific FD wordlist produces a more powerful predictor of market reaction than the general wordlists. Additionally, in smaller samples (demonstrated here with 250 regressions using randomly-selected subsamples ranging in size from 50 to 2,000) the domain-specific FD wordlist retains predictive ability, with rejection rates exceeding 97 percent for samples of 2,000, while the rejection rates for the general wordlists are less than 30 percent. The FD wordlist also performs better than an alternative, domain-specific wordlist. Overall, our findings indicate that the domain-specific FD wordlist provides an alternative, more powerful measure of tone for capital markets researchers. Finally, we show that equal weighting of word occurrences is more intuitive, easier to implement, and more amenable to replication than alternative sample-dependent weighting methodologies advocated by certain concurrent research.
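A word-count tone measure of the kind described in this abstract can be sketched as below, with every word occurrence weighted equally. The two tiny wordlists are hypothetical placeholders, not the FD, GI, or Diction lists, and the normalization (positive minus negative counts, divided by total words) is one common convention assumed here rather than the paper's stated formula.

```python
import re

# Hypothetical placeholder wordlists, for illustration only.
POSITIVE = frozenset({"growth", "improved", "strong"})
NEGATIVE = frozenset({"loss", "decline", "weak"})


def tone(text, positive=POSITIVE, negative=NEGATIVE):
    """Equal-weighted tone: (positive count - negative count) / total word count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0  # an empty document carries no tone
    pos = sum(token in positive for token in tokens)
    neg = sum(token in negative for token in tokens)
    return (pos - neg) / len(tokens)


score = tone("Strong growth offset a small loss")
```

Swapping in a different wordlist only requires passing other sets as the `positive` and `negative` arguments, which is how a general and a domain-specific list could be compared on the same text.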
Climate change and journalistic norms: A case-study of US
, 2001
"... www.elsevier.com/locate/geoforum ..."
Influence of Pre-annotation on POS-tagged Corpus Development
This article details a series of carefully designed experiments aiming at evaluating the influence of automatic pre-annotation on the manual part-of-speech annotation of a corpus, in terms of both quality and time, with specific attention paid to biases. For this purpose, we manually annotated parts of the Penn Treebank corpus (Marcus et al., 1993) under various experimental setups, either from scratch or using various pre-annotations. These experiments confirm and detail the gain in quality observed before (Marcus et al., 1993; Dandapat et al., 2009; Rehbein et al., 2009), while showing that biases do appear and should be taken into account. Finally, they demonstrate that even a not-so-accurate tagger can help improve annotation speed.
ECPR Ljubljana Course 17: Quantitative Text Analysis Course Details
, 2009
The course is intended to survey and characterize methods for systematically extracting information from text for social scientific purposes, as well as to teach students how to apply these methods in practical research. It takes as a starting point more traditional methods of content analysis, but is aimed at the most recent advances in quantitative content analysis that treat words as data to be analysed using statistical tools. The course surveys several of these methods but also applies the statistical framework to more traditional non-automated coding schemes such as the Comparative Manifesto Project. It is also designed to cover many fundamental issues such as inter-coder agreement, reliability, validation, accuracy, and precision. Lessons will consist of a mixture of theoretical grounding in content analysis approaches and techniques, with hands-on analysis of real texts using content analytic and statistical software.
Prior Knowledge
Ideally, students in this course will have prior knowledge in the following areas:
• A basic understanding of probability and statistics at the level of an introductory postgraduate social science course. Understanding of regression analysis is presumed.
• Familiarity with the language R.

