Document and Corpus Level Inference For Unsupervised and Transductive Learning of Information Structure of Scientific Documents
Abstract - Cited by 3 (2 self)
Inferring the information structure of scientific documents has proved useful for supporting information access across scientific disciplines. Current approaches are largely supervised and expensive to port to new disciplines. We investigate primarily unsupervised discovery of information structure. We introduce a novel graphical model that can consider different types of prior knowledge about the task: within-document discourse patterns, cross-document sentence similarity information based on linguistic features, and prior knowledge about the correct classification of some of the input sentences when this information is available. We apply the model to the Argumentative Zoning (AZ) scheme and evaluate it in a fully unsupervised learning scenario and two transduction scenarios where the categories of some test sentences are known. The model substantially outperforms similarity-based and topic-model-based clustering approaches as well as traditional transduction algorithms. Title and abstract in Finnish (translated): An unsupervised machine learning technique based on document- and corpus-level inference for scientific ...
Diverse M-best solutions in MRFs
- In Workshop on Discrete Optimization in Machine Learning, NIPS, 2011
Abstract - Cited by 2 (0 self)
Current methods for computing the M most probable configurations under a probabilistic model produce solutions that tend to be very similar to the MAP solution and each other. This is often an undesirable property. In this paper we propose an algorithm for the M-Best Mode problem, which involves finding a diverse set of highly probable solutions under a discrete probabilistic model. Given a dissimilarity function measuring closeness of two solutions, our formulation maximizes a linear combination of the probability and dissimilarity to previous solutions. Our formulation generalizes the M-Best MAP problem and we show that for certain families of dissimilarity functions we can guarantee that these solutions can be found as easily as the MAP solution.
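The objective described in this abstract (model score plus a weighted dissimilarity to earlier solutions) can be illustrated on a toy problem. The following is only a minimal sketch under stated assumptions, not the paper's algorithm: it enumerates every labeling of a tiny chain MRF by brute force, uses an unnormalized log-score in place of the probability, and assumes Hamming distance as the dissimilarity. The names `score`, `hamming`, `diverse_m_best`, and the weight `lam` are hypothetical.

```python
from itertools import product


def score(y, unary, pairwise):
    """Unnormalized log-score of labeling y on a chain MRF (toy model)."""
    s = sum(unary[i][y[i]] for i in range(len(y)))
    s += sum(pairwise[y[i]][y[i + 1]] for i in range(len(y) - 1))
    return s


def hamming(a, b):
    """Assumed dissimilarity: number of positions where two labelings differ."""
    return sum(x != y for x, y in zip(a, b))


def diverse_m_best(unary, pairwise, m, lam):
    """Greedily pick m labelings; each one maximizes the model score plus
    lam times the summed Hamming distance to the labelings already chosen.
    Brute-force enumeration, so only suitable for very small models."""
    n_vars, n_labels = len(unary), len(unary[0])
    candidates = list(product(range(n_labels), repeat=n_vars))
    solutions = []
    for _ in range(m):
        best = max(
            candidates,
            key=lambda y: score(y, unary, pairwise)
            + lam * sum(hamming(y, prev) for prev in solutions),
        )
        solutions.append(best)
    return solutions


if __name__ == "__main__":
    # Three binary variables; label 0 is preferred and neighbors like to agree.
    unary = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    pairwise = [[0.5, 0.0], [0.0, 0.5]]
    print(diverse_m_best(unary, pairwise, m=2, lam=2.0))  # [(0, 0, 0), (1, 1, 1)]
```

With `lam=0.0` the sketch simply returns the highest-scoring labeling repeatedly, which shows why the dissimilarity term is what pushes later solutions away from the first one.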
Examiner Examiner Guide
Abstract
I declare that this written submission represents my ideas in my own words, and where others' ideas or words have been included, I have adequately cited and referenced the original sources. I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented, fabricated, or falsified any idea, data, fact, or source in my submission. I understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.
Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
Abstract
State-of-the-art statistical parsers and POS taggers perform very well when trained with large amounts of in-domain data. When training data is out-of-domain or limited, accuracy degrades. In this paper, we aim to compensate for the lack of available training data by exploiting similarities between test set sentences. We show how to augment sentence-level models for parsing and POS tagging with inter-sentence consistency constraints. To deal with the resulting global objective, we present an efficient and exact dual decomposition decoding algorithm. In experiments, we add consistency constraints to the MST parser and the Stanford part-of-speech tagger and demonstrate significant error reduction in the domain adaptation and the lightly supervised settings across five languages.
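To give a feel for how dual decomposition can enforce agreement between otherwise independent decisions, here is a toy sketch. It is not the paper's decoder: it assumes just two occurrences of the same word in different sentences, each with its own local tag scores, and a hard constraint that both receive the same tag. The function name `agree_by_dual_decomposition`, the score vectors, and the `1/k` step size are all hypothetical choices for illustration.

```python
def agree_by_dual_decomposition(s1, s2, max_iters=100):
    """Find one tag for two positions with local scores s1 and s2,
    maximizing s1[t] + s2[t], via subgradient updates on Lagrange
    multipliers u (one per tag). Returns the agreed tag, or None if
    the two subproblems have not agreed within max_iters rounds."""
    n = len(s1)
    u = [0.0] * n
    for k in range(1, max_iters + 1):
        # Each subproblem is solved independently with adjusted scores.
        y1 = max(range(n), key=lambda t: s1[t] + u[t])
        y2 = max(range(n), key=lambda t: s2[t] - u[t])
        if y1 == y2:
            # Agreement certifies that this tag is optimal for the joint problem.
            return y1
        step = 1.0 / k  # assumed decaying step size
        u[y1] -= step   # make the first model's choice less attractive to it
        u[y2] += step   # and pull the first model toward the second's choice
    return None


if __name__ == "__main__":
    s1 = [2.0, 1.0, 0.0]  # first occurrence prefers tag 0
    s2 = [0.0, 1.5, 1.0]  # second occurrence prefers tag 1
    print(agree_by_dual_decomposition(s1, s2))
```

In this toy setting the joint optimum could be read off directly by summing the two score vectors; the point of the decomposition is that each model is only ever asked to solve its own local problem, which is what makes the approach attractive when the local problems are full parsers or taggers.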
CZECH TECHNICAL UNIVERSITY IN PRAGUE DOCTORAL THESIS STATEMENT
, 2013
Abstract
Doctoral thesis statement for obtaining the academic title of “Doctor”, abbreviated to “Ph.D.”