Towards Deeper Understanding of the LSA Performance
In Proc. Recent Advances in Natural Language Processing, 2003
Cited by 9 (2 self)
The paper presents ongoing work towards a deeper understanding of the factors influencing the performance of Latent Semantic Analysis (LSA). Unlike previous attempts that concentrate on problems such as matrix element weighting, space dimensionality selection, similarity measure etc., we primarily study the impact of another, often neglected, but fundamental element of LSA (and of any text processing technique): the definition of "word". For this purpose, a balanced corpus of Bulgarian newspaper texts was carefully created to allow for in-depth observations of LSA performance, and a series of experiments was performed in order to understand and compare (with respect to the task of text categorisation) six possible inputs with different levels of linguistic quality, including graphemic form as met in the text, stem, lemma, phrase, lemma&phrase and part-of-speech annotation.
A Research Taxonomy for Latent Semantic Analysis-Based Educational Applications
IN, 2005
Cited by 5 (3 self)
The paper presents a taxonomy that summarises and highlights the major research into Latent Semantic Analysis (LSA) based educational applications. The taxonomy identifies five main research themes and emphasises the point that even after more than 15 years of research, much is left to be discovered to bring the LSA theory to maturity. The paper provides a framework for LSA researchers to publish their results in a format that is comprehensive, relatively compact, and useful to other researchers.
Parameters Driving Effectiveness of Automated Essay Scoring with LSA
Cited by 5 (0 self)
Automated essay scoring with latent semantic analysis (LSA) has recently been subject to increasing interest. Although previous authors have achieved grade ranges similar to those awarded by humans, it is still not clear which parameters improve or decrease the effectiveness of LSA, and how. This paper presents an analysis of the effects of these parameters, such as text preprocessing, weighting, singular value dimensionality and type of similarity measure, and benchmarks this effectiveness by comparing machine-assigned with human-assigned scores in a real-world case. We show that each of the identified factors significantly influences the quality of automated essay scoring and that the factors are not independent of each other.
Design and Evaluation of Inflectional Stemmer for Bulgarian
In Proceedings of Workshop on Balkan Language Resources and Tools (1st Balkan Conference in Informatics), 1998
Cited by 2 (0 self)
The paper starts with an overview of some important approaches to stemming for English and other languages. Then, the design, implementation and evaluation of the BulStem inflectional stemmer for Bulgarian are presented. The problem is addressed from a machine-learning perspective using a large morphological dictionary. A detailed automatic evaluation in terms of under-stemming, over-stemming and coverage is provided. In addition, the effect of stemming and of BulStem parameter settings is demonstrated on a particular task: text categorisation using kNN+LSA.
Towards pertinent evaluation methodologies for word-space models
In Proceedings of the 5th International Conference on Language Resources and Evaluation, 2006
Cited by 2 (0 self)
This paper discusses evaluation methodologies for a particular kind of meaning models known as word-space models, which use distributional information to assemble geometric representations of meaning similarities. Word-space models have received considerable attention in recent years, and have begun to see employment outside the walls of computational linguistics laboratories. However, the evaluation methodologies of such models remain infantile, and lack efforts at standardization. Very few studies have critically assessed the methodologies used to evaluate word spaces. This paper attempts to fill some of this void. It is the central goal of this paper to answer the question “how can we determine whether a given word space is a good word space?”
BulStem: Design and Evaluation of Inflectional Stemmer for Bulgarian
Cited by 1 (0 self)
The paper starts with an overview of some important approaches to stemming for English and other languages. Then, the design, implementation and evaluation of BulStem, a freely available inflectional stemmer for Bulgarian, are presented. The problem is addressed from a machine-learning perspective using a large morphological dictionary. A detailed automatic evaluation in terms of under-stemming, over-stemming and coverage is provided. In addition, the effect of stemming and the impact of stemmer parameter tuning are demonstrated and assessed on a particular task: text categorisation using kNN+LSA. Results show BulStem is competitive with manually checked lemmatisation.
Arguments
2009
The basic idea of latent semantic analysis (LSA) is that texts have a higher-order (latent semantic) structure which, however, is obscured by word usage (e.g. through the use of synonyms or polysemy). By using conceptual indices that are derived statistically via a truncated singular value decomposition (a two-mode factor analysis) over a given document-term matrix, this variability problem can be overcome.
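The truncated singular value decomposition mentioned in this description can be illustrated with a minimal NumPy sketch. It is not the code of the package being described: the function names `lsa_truncated_svd` and `cosine`, the toy matrix `X` and the choice of `k = 2` are all assumptions made for illustration.

```python
import numpy as np

def lsa_truncated_svd(doc_term, k):
    """Return rank-k document and term coordinates (hypothetical helper)."""
    # Thin SVD: doc_term = U @ diag(s) @ Vt, singular values in descending order.
    U, s, Vt = np.linalg.svd(doc_term, full_matrices=False)
    # Keep only the k largest singular values (the "latent" dimensions).
    doc_vecs = U[:, :k] * s[:k]        # one row per document
    term_vecs = Vt[:k, :].T * s[:k]    # one row per term
    return doc_vecs, term_vecs

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy document-term counts (assumed data).
# Columns: car, automobile, engine, flower, petal
X = np.array([
    [1, 0, 1, 0, 0],   # doc 0 says "car"
    [0, 1, 1, 0, 0],   # doc 1 says "automobile"
    [0, 0, 0, 1, 1],   # doc 2 is about flowers
    [0, 0, 0, 1, 1],   # doc 3 is about flowers
], dtype=float)

docs, terms = lsa_truncated_svd(X, k=2)
raw_sim = cosine(X[0], X[1])           # similarity on raw counts
latent_sim = cosine(docs[0], docs[1])  # similarity in the reduced space
```

On this toy matrix the two synonym documents share only "engine", so their raw cosine is 0.5, while in the two-dimensional latent space they become nearly identical and stay orthogonal to the flower documents. Real corpora need a weighting step and a far larger `k`, neither of which this sketch attempts to choose.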
Latent Semantic Analysis Parameters for Essay Evaluation using Small-Scale Corpora
Some previous studies (e.g. that carried out by Van Bruggen et al. in 2004) have pointed to a need for additional research in order to firmly establish the usefulness of LSA (latent semantic analysis) parameters for automatic evaluation of academic essays. The extreme variability in approaches to this technique makes it difficult to identify the most efficient parameters and the optimum combination. With this goal in mind, we conducted a broad-spectrum study to investigate the efficiency of some of the major LSA parameters in small-scale corpora. We used two domain-specific corpora that differed in the structure of the text (one containing only technical terms and the other with more tangential information). Using these corpora we tested different semantic spaces, formed by applying different parameters and different methods of comparing the texts. Parameters varied included weighting functions (Log-IDF or Log-Entropy), dimensionality reduction (truncating the matrices after SVD to a set percentage of dimensions), methods of forming pseudo-documents (vector sum and folding-in) and measures of similarity (cosine similarity or Euclidean distance). We also included two groups of essays to be
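Three of the parameters named in this abstract (log-entropy weighting, folding-in and cosine similarity) can be sketched as follows. The abstract does not give formulas, so the sketch assumes the commonly cited definitions: local weight log(1 + tf), global weight 1 + Σ p log p / log n, and folding-in as projection onto the top-k right singular vectors scaled by the inverse singular values. All function names and the `counts` matrix are hypothetical.

```python
import numpy as np

def log_entropy_weights(counts):
    """Log-entropy weighting of a document-term count matrix (documents as rows).

    Assumed definitions: local weight log(1 + tf); global weight
    1 + sum_j p_ij * log(p_ij) / log(n), with p_ij = tf_ij / gf_i, n documents (n > 1).
    """
    counts = np.asarray(counts, dtype=float)
    n_docs = counts.shape[0]
    gf = counts.sum(axis=0)                           # total occurrences per term
    p = np.divide(counts, gf, out=np.zeros_like(counts), where=gf > 0)
    safe_p = np.where(p > 0, p, 1.0)                  # log(1) = 0 stands in for 0 * log(0)
    global_w = 1.0 + (p * np.log(safe_p)).sum(axis=0) / np.log(n_docs)
    return np.log1p(counts) * global_w, global_w

def fold_in(new_counts, global_w, Vt_k, s_k):
    """Project an unseen document into an existing k-dimensional space."""
    weighted = np.log1p(np.asarray(new_counts, dtype=float)) * global_w
    return (weighted @ Vt_k.T) / s_k

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy counts (assumed data): 3 documents, 4 terms; the last term is evenly spread.
counts = np.array([
    [2, 0, 1, 1],
    [0, 3, 1, 1],
    [1, 1, 0, 1],
], dtype=float)

W, g = log_entropy_weights(counts)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
pseudo_doc = fold_in(counts[0], g, Vt[:k], s[:k])
```

Folding a training document back in reproduces its own row of `U[:, :k]`, which is a convenient sanity check, and a term spread evenly over all documents receives a global weight of zero. The vector-sum alternative for pseudo-documents and the Log-IDF weighting are not shown here.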
Date of delivery Contractual: 01-11-2009 Actual: 15-01-2010 Code name D4.2 Version: 1.0 Draft Final Type of deliverable Report Security (distribution level) Contributors Authors (Partner)
"... WP/Task responsible ..."

