Language Modeling Using Efficient Best-First Bottom-Up Parsing
- In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2003
Cited by 13 (6 self)
Abstract
In this paper we present a two-stage best-first bottom-up word-lattice parser that we use as a language model for speech recognition. The parser uses a "Figure of Merit" that selects lattice paths while simultaneously selecting syntactic category edges for parsing. Additionally, we introduce a modified version of the Inside-Outside algorithm, used as a pruning stage between syntactic context-free parsing and lexicalized context-dependent parsing. We report our results in terms of Word Error Rate on the HUB-1 word-lattices and compare them to other syntactic language modeling techniques.
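To make agenda-driven, best-first bottom-up parsing concrete, here is a minimal Python sketch. It is not the authors' parser. It works on a plain word string rather than a word lattice, and it uses a hypothetical toy PCFG in Chomsky normal form (`LEXICON`, `BINARY`). Its figure of merit is an assumed stand-in for the paper's: the edge's inside log-probability divided by its span length. Edges leave a priority queue in figure-of-merit order, and parsing stops as soon as a complete `S` edge is popped, so low-ranked edges may never be built.

```python
import heapq
import itertools
import math

# Hypothetical toy PCFG in Chomsky normal form (illustrative, not from the paper).
LEXICON = {  # word -> [(preterminal, lexical probability)]
    "dogs": [("NP", 1.0)],
    "bark": [("V", 0.6), ("NP", 0.4)],
    "loudly": [("ADV", 1.0)],
}
BINARY = {  # (left child, right child) -> [(parent, rule probability)]
    ("V", "ADV"): [("VP", 1.0)],
    ("NP", "VP"): [("S", 1.0)],
}


def fom(logprob, start, end):
    """Assumed figure of merit: inside log-probability per word of the span."""
    return logprob / (end - start)


def best_first_parse(words, goal="S"):
    """Finalise edges best-FOM-first; stop at the first goal edge covering all words.

    Returns (log-probability of that edge or None, chart, number of finalised edges).
    """
    tie = itertools.count()  # tie-breaker so heapq never compares back-pointers
    agenda = []  # heapq is a min-heap, so the FOM is negated
    chart = {}   # (label, start, end) -> (logprob, back-pointer)

    def push(label, start, end, logprob, backptr):
        heapq.heappush(agenda, (-fom(logprob, start, end), next(tie),
                                logprob, label, start, end, backptr))

    for i, word in enumerate(words):
        for tag, p in LEXICON.get(word, []):
            push(tag, i, i + 1, math.log(p), word)

    n = len(words)
    while agenda:
        _, _, lp, label, s, e, bp = heapq.heappop(agenda)
        if (label, s, e) in chart:
            continue  # same edge already finalised with a score at least as high
        chart[(label, s, e)] = (lp, bp)
        if label == goal and s == 0 and e == n:
            return lp, chart, len(chart)
        # Combine the new edge with every finalised edge adjacent to it.
        for (l2, s2, e2), (lp2, _) in list(chart.items()):
            if e2 == s:    # finalised edge lies immediately to the left
                pair = ((l2, s2, e2, lp2), (label, s, e, lp))
            elif s2 == e:  # finalised edge lies immediately to the right
                pair = ((label, s, e, lp), (l2, s2, e2, lp2))
            else:
                continue
            (ll, ls, le, llp), (rl, rs, re_, rlp) = pair
            for parent, p in BINARY.get((ll, rl), []):
                push(parent, ls, re_, math.log(p) + llp + rlp,
                     ((ll, ls, le), (rl, rs, re_)))
    return None, chart, len(chart)
```

On the toy sentence "dogs bark loudly", the search stops after finalising five edges. The competing `NP` reading of "bark" stays on the agenda and is never combined. Because the stopping rule is heuristic, the first complete parse found this way is not guaranteed to be the most probable one.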
Rule filtering by pattern for efficient hierarchical translation
- In Proceedings of the EACL, 2009
Cited by 6 (1 self)
Abstract
We describe refinements to hierarchical translation search procedures intended to reduce both search errors and memory usage through modifications to hypothesis expansion in cube pruning and reductions in the size of the rule sets used in translation. Rules are put into syntactic classes based on their number of non-terminals and their pattern, and various filtering strategies are then applied to assess the impact on translation speed and quality. Results are reported on the 2008 NIST Arabic-to-English evaluation task.
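The following Python sketch illustrates grouping hierarchical rules into classes by pattern. It is not the authors' code and rests on three assumptions. Each rule side is a space-separated string. Non-terminals are written `X1`, `X2`. A pattern is obtained by collapsing every maximal run of terminals into a single placeholder `w`. The exclusion set passed to `filter_rules` is hypothetical; the paper evaluates its own filtering strategies.

```python
import re

NONTERMINAL = re.compile(r"X\d+")  # assumed notation for non-terminals: X1, X2, ...


def pattern(side):
    """Collapse each maximal run of terminals into 'w', keeping non-terminals."""
    out = []
    for tok in side.split():
        if NONTERMINAL.fullmatch(tok):
            out.append(tok)
        elif not out or out[-1] != "w":
            out.append("w")
    return " ".join(out)


def rule_class(source, target):
    """Return (number of non-terminals, source/target pattern) for a rule."""
    src, tgt = pattern(source), pattern(target)
    n_nonterminals = sum(1 for tok in src.split() if tok != "w")
    return n_nonterminals, f"<{src} , {tgt}>"


def filter_rules(rules, excluded_patterns):
    """Keep only rules whose pattern is not in the (hypothetical) exclusion set."""
    return [r for r in rules if rule_class(*r)[1] not in excluded_patterns]


# Placeholder tokens s1, s2 (source) and t1 (target) stand for real words.
rules = [
    ("s1 X1 s2", "t1 X1"),     # pattern <w X1 w , w X1>
    ("X1 s1 X2", "X1 t1 X2"),  # pattern <X1 w X2 , X1 w X2>
    ("s1 s2", "t1"),           # pattern <w , w>
]
kept = filter_rules(rules, {"<w X1 w , w X1>"})
```

Because a filter can be keyed on the pattern string, a whole class of rules can be dropped or kept at once, and its effect on speed and quality measured separately.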
A Single Generative Model for Joint Morphological Segmentation and Syntactic Parsing
Cited by 6 (1 self)
Abstract
Morphological processes in Semitic languages produce space-delimited words that introduce multiple, distinct syntactic units into the structure of the input sentence. These words are in turn highly ambiguous, breaking the assumption underlying most parsers that the yield of a tree for a given sentence is known in advance. Here we propose a single joint model for performing both morphological segmentation and syntactic disambiguation, which bypasses the associated circularity. Using a treebank grammar, a data-driven lexicon, and a linguistically motivated technique for handling unknown tokens, our model outperforms previous pipelined, integrated, or factorized systems for Hebrew morphological and syntactic processing, yielding an error reduction of 12% over the best published results so far.
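A joint model of this kind needs an input representation that exposes every segmentation of every space-delimited token to the parser at once. This minimal Python sketch assumes a segmentation lattice for that purpose. Each token contributes one path per candidate analysis between shared start and end states, so a path through the whole lattice is one segmentation of the sentence. The entries in `ANALYSES` are illustrative placeholders, not the paper's lexicon. A lattice parser would then treat each arc as a terminal spanning its two states.

```python
from collections import defaultdict
from itertools import count

# Illustrative placeholder analyses: token -> list of candidate segmentations.
ANALYSES = {
    "bclm": [["b", "cl", "m"], ["bcl", "m"], ["bclm"]],
    "hneim": [["h", "neim"], ["hneim"]],
}


def build_lattice(tokens, analyses):
    """Return (arcs, final_state); each arc is (from_state, to_state, segment).

    State ids are allocated so that every arc goes from a lower to a higher id.
    """
    arcs = []
    ids = count(1)
    start = 0
    for tok in tokens:
        pending = []  # (state, last segment) pairs that must reach the token's end state
        for segments in analyses.get(tok, [[tok]]):  # assumption: unknown tokens stay whole
            src = start
            for seg in segments[:-1]:
                mid = next(ids)
                arcs.append((src, mid, seg))
                src = mid
            pending.append((src, segments[-1]))
        end = next(ids)  # end state shared by all analyses of this token
        arcs.extend((s, end, seg) for s, seg in pending)
        start = end
    return arcs, start


def count_segmentations(arcs, final_state):
    """Count lattice paths (= sentence-level segmentations) by dynamic programming."""
    paths = defaultdict(int)
    paths[0] = 1
    for src, dst, _ in sorted(arcs):  # ids are topologically ordered
        paths[dst] += paths[src]
    return paths[final_state]


arcs, final = build_lattice(["bclm", "hneim"], ANALYSES)
```

Here nine arcs encode 3 × 2 = 6 segmentations without enumerating them. This compactness is what lets a parser choose the segmentation and the tree together instead of committing to one segmentation first.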
Better Arabic parsing: Baselines, evaluations, and analysis
- 2010
Cited by 3 (0 self)
Abstract
In this paper, we offer broad insight into the underperformance of Arabic constituency parsing by analyzing the interplay of linguistic phenomena, annotation choices, and model design. First, we identify sources of syntactic ambiguity understudied in the existing parsing literature. Second, we show that although the Penn Arabic Treebank is similar to other treebanks in gross statistical terms, annotation consistency remains problematic. Third, we develop a human-interpretable grammar that is competitive with a latent-variable PCFG. Fourth, we show how to build better models for three different parsers. Finally, we show that in application settings, the absence of gold segmentation lowers parsing performance by 2–5% F1.
Parsing n-best lists of handwritten sentences
- In 7th Int. Conference on Document Analysis and Recognition, 2003
Cited by 2 (2 self)
Abstract
This paper investigates the application of a probabilistic natural-language parser to the list of the N-best sentences produced by an off-line recognition system for cursive handwritten sentences. To generate the N-best sentence list, an HMM-based recognizer including a bigram language model is used. The sentences are parsed by a bottom-up chart parser for stochastic context-free grammars, which produces the parse tree of the input sentence as well as the word tags. From a collection of corpora we extract the linguistic resources to build the lexicon, a word bigram model, and the stochastic context-free grammar. Experimental results indicate an increase in the word and sentence recognition rates when the proposed combination scheme is used.
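The abstract does not spell out the combination scheme. With that caveat, this Python sketch shows one common option: add a weighted parser log-probability to each N-best hypothesis's recognizer log score, then re-rank. `parse_log_prob`, `weight`, `unparsable_penalty` and the lookup table `TOY_PARSES` are all hypothetical. In practice the weight would be tuned on held-out data, and the parser would be a stochastic context-free chart parser like the one described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Hypothesis:
    words: List[str]
    recognizer_log_score: float  # HMM log score, including the bigram language model


def rescore(
    nbest: List[Hypothesis],
    parse_log_prob: Callable[[List[str]], Optional[float]],
    weight: float = 0.5,
    unparsable_penalty: float = -50.0,
) -> List[Tuple[float, Hypothesis]]:
    """Re-rank an N-best list by recognizer score + weight * parser log-probability."""
    scored = []
    for hyp in nbest:
        lp = parse_log_prob(hyp.words)
        if lp is None:  # the grammar has no parse for this hypothesis
            lp = unparsable_penalty
        scored.append((hyp.recognizer_log_score + weight * lp, hyp))
    # Sort on the score only, so ties never try to compare Hypothesis objects.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy stand-in for the SCFG parser: a lookup table of sentence log-probabilities.
TOY_PARSES = {("the", "cat", "sat"): -4.0}

nbest = [
    Hypothesis(["the", "cat", "sad"], -9.5),  # best for the recognizer alone
    Hypothesis(["the", "cat", "sat"], -10.0),
]
ranked = rescore(nbest, lambda words: TOY_PARSES.get(tuple(words)))
```

In this toy case the recognizer alone prefers "the cat sad". The grammar cannot parse it, and the penalty lets the parsable "the cat sat" move to the top.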
ISIS: Interaction through Speech with Information Systems
- In Proceedings of the 3rd International Workshop, TSD 2000, 2000
Cited by 2 (2 self)
Abstract
We present the results of an experimental system aimed at performing robust semantic analysis of analyzed speech input in the area of information system access. The goal of this experiment was to investigate the effectiveness of such a system in a pipelined architecture, where no control is possible over the morpho-syntactic analysis that precedes the semantic analysis and query formation.
1 Introduction
The general applicative framework of the ISIS project was to design an NLP interface to an information system for automated telephone-based phone-book inquiry. The objective of the project was to define an architecture that improves speech recognition results by integrating higher-level linguistic knowledge. The availability of a huge collection of annotated telephone calls querying the Swiss phone-book database (i.e., the Swiss French PolyPhone corpus [6]) allowed us to propose and evaluate a very first functional prototype of a software architecture for vocal access to databases through...
Monte-Carlo Sampling for NP-Hard Maximization Problems in the Framework of Weighted Parsing
- In Natural Language Processing – NLP 2000, number 1835 in Lecture Notes in Artificial Intelligence, 2000
Cited by 1 (1 self)
Abstract
The purpose of this paper is (1) to provide a theoretical justification for the use of Monte-Carlo sampling for the approximate resolution of NP-hard maximization problems in the framework of weighted parsing, and (2) to show how such sampling techniques can be efficiently implemented with explicit control of the error probability. We provide an algorithm to compute the local sampling probability distribution that guarantees that the global sampling probability indeed corresponds to the targeted theoretical score. The proposed sampling strategy differs significantly from existing methods, thereby revealing the bias induced by those methods.
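Here is a minimal Python sketch of the general idea, under stated assumptions. When several derivations project onto the same parse, a parse's score is a sum over its derivations, and maximizing such a sum is a typical example of the NP-hard problems mentioned above. The sketch samples derivations top-down from a hypothetical toy weighted forest (`FOREST`). At each node it picks a hyperedge with local probability proportional to the edge weight times the inside weights of its children. Each complete derivation is therefore drawn with probability equal to its weight divided by the root's inside weight. The most frequently sampled parse estimates the best parse. The sketch does not implement the paper's explicit error-probability control; the sample size is simply fixed.

```python
import random
from collections import Counter
from functools import lru_cache

# Hypothetical weighted forest: node -> [(weight, child nodes, visible label)].
FOREST = {
    "R": [(0.5, ("N",), "R"), (0.5, ("M",), "R")],
    "N": [(0.4, (), "A"), (0.3, (), "B")],
    "M": [(0.3, (), "B")],
}


@lru_cache(maxsize=None)
def inside(node):
    """Total weight of all derivations rooted at node."""
    total = 0.0
    for weight, kids, _ in FOREST[node]:
        prod = weight
        for kid in kids:
            prod *= inside(kid)
        total += prod
    return total


def sample_parse(node, rng):
    """Sample one derivation top-down and return the parse string it projects to."""
    edges = FOREST[node]
    # Unnormalised local scores; rng.choices normalises them, which amounts to
    # dividing by inside(node).
    scores = []
    for weight, kids, _ in edges:
        s = weight
        for kid in kids:
            s *= inside(kid)
        scores.append(s)
    _, kids, label = rng.choices(edges, weights=scores, k=1)[0]
    if not kids:
        return label
    return f"{label}({' '.join(sample_parse(kid, rng) for kid in kids)})"


def monte_carlo_best_parse(root, n_samples, seed=0):
    """Return the most frequently sampled parse and its relative frequency."""
    rng = random.Random(seed)
    counts = Counter(sample_parse(root, rng) for _ in range(n_samples))
    parse, c = counts.most_common(1)[0]
    return parse, c / n_samples
```

In this toy forest the single most probable derivation yields `R(A)` with weight 0.2. `R(B)`, however, collects two derivations with total weight 0.3. Sampling should therefore return `R(B)` with a frequency close to 0.3 / 0.5 = 0.6, which picking the best derivation alone would miss.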
Joint Hebrew Segmentation and Parsing using a PCFG-LA Lattice Parser
Abstract
We experiment with extending a lattice parsing methodology for parsing Hebrew (Goldberg and Tsarfaty, 2008; Goldberg et al., 2009) to make use of a stronger syntactic model: the
Nonlexical Chart Parsing for TAG
Abstract
Bangalore and Joshi (1999) investigate supertagging as “almost parsing”. In this paper we explore this claim further by replacing their Lightweight Dependency Analyzer with a nonlexical probabilistic chart parser. Our approach is still in the spirit of their work in the sense that lexical information is only used during supertagging; the parser and its probabilistic model only see supertags.

