Formal structure of Sanskrit text: Requirements analysis for a mechanical Sanskrit processor
Cited by 9 (3 self)
Abstract. We discuss the mathematical structure of various levels of representation of Sanskrit text in order to guide the design of computer aids aiming at useful processing of the digitalised Sanskrit corpus. Two main levels are identified, respectively called the linear and functional level. The design space of these two levels is sketched, and the computational implications of the main design choices are discussed. Current solutions to the problems of mechanical segmentation, tagging, and parsing of Sanskrit text are briefly surveyed in this light. An analysis of the requirements of relevant linguistic resources is provided, in view of justifying standards allowing interoperability of computer tools. This paper does not attempt to provide definitive solutions to the representation of Sanskrit at the various levels. It should rather be considered as a survey of various choices, allowing an open discussion of such issues in a formally precise general framework.
A Distributed Platform for Sanskrit Processing
Cited by 2 (1 self)
Sanskrit, the classical language of India, presents specific challenges for computational linguistics: exact phonetic transcription in writing that obscures word boundaries, rich morphology and an enormous corpus, among others. Recent international cooperation has developed innovative solutions to these problems and significant resources for linguistic research. Solutions include efficient segmenting and tagging algorithms and dependency parsers based on constraint programming. The integration of lexical resources, text archives and linguistic software is achieved by distributed interoperable Web services. Resources include a morphological tagger and tagged corpus.
Completeness Analysis of a Sanskrit Reader
Cited by 2 (1 self)
Abstract. We analyse in this paper differences of linguistic treatment of Sanskrit in the Sanskrit Heritage platform and in the Paninian grammatical tradition. 1 General methodology. The general assumption behind the design of the Heritage Sanskrit Reader is that sentences from Classical Sanskrit may be generated as the image by a regular relation $R$ of the Kleene closure $W^*$ of a regular set $W$ of words over a finite alphabet $\Sigma$. Think of $W$ as the vocabulary of (inflected) words (padas) and of $R$ as sandhi. The computerized lexer underlying the Heritage Reader essentially proceeds by inverting relation $R$ over the candidate sentence $w$ in order to produce a finite sequence $w_1, w_2, \ldots, w_n$ of word forms, together with a proof that $w \in R(w_1 \cdot w_2 \cdots w_n)$. The word forms $w_i$ must be justified as being valid word forms of Sanskrit (i.e. padas), and some justification must be offered that the combination of such word forms makes sense. The first justification consists in exhibiting $w_i$ as the lemmatization of some root stem, according to valid rules of morphology. The second justification consists in giving some dependency analysis of sentence $w$ using assignments of semantic roles for the individual $w_i$'s consistent with their morphological analysis. Both kinds of justifications must be ultimately related to the traditional methods of Sanskrit grammar (vyākaraṇa). That is, each $w_i$ corresponds to some $w'_i$, obtainable by a valid Paninian derivation sequence, and the concatenation of the sequence $w'_1, w'_2, \ldots, w'_n$ yields by some valid Paninian derivation a final sequence of phonemes $w'$ equivalent in a strong sense to the original $w$. Thus, to fix ideas with a concrete trivial example, the sentence (in Roman transliteration): rāmogrāmaṃgacchati, equivalently represented as: rāmogrāmaṅgacchati, may be analysed as the sequence:
Computing with Relational Machines
, 2008
Abstract. We give a quick presentation of the X-machines of Eilenberg, a generalisation of finite state automata suitable for general nondeterministic computation. Such machines complement an automaton, seen as its control component, with a computation component over a data domain specified as an action algebra. Actions are interpreted as binary relations over the data domain, structured by regular expression operations. We show various strategies for the sequential simulation of our relational machines, using variants of the reaction engine. In a particular case of finite machines, we show that bottom-up search yields an efficient complete simulator. Relational machines may be composed in a modular fashion, since atomic actions of one machine may be mapped to the characteristic relation of other relational machines acting as its parameters. The control component of a machine is compiled from regular expressions. Several such translations have been proposed in the literature, which we briefly survey. Our view of machines is completely applicative. They may be defined constructively in type theory, where the correctness of their simulation may be formally checked. From formal proofs in the Coq proof assistant, efficient functional programs in the Objective Caml programming language may be mechanically extracted. Most of this material is extracted from the (forthcoming) Ph.D. thesis of Benoît Razet.
Automates, machines, moteurs réactifs (Automata, Machines, Reactive Engines)
, 2008
1.1 Automata over a monoid of actions. Let $S = \langle S, \cdot, 1 \rangle$ be a monoid whose carrier is a set $S$ of elements called actions, equipped with an associative operation $\cdot$ called product and with an element $1$ that is a left and right unit for this product. An $S$-automaton is a tuple $\langle Q, I, T, \delta \rangle$ where: $Q$ is a finite set of states; $I \subseteq Q$ is the set of initial states; $T \subseteq Q$ is the set of terminal states; $\delta : Q \to \wp(S \times Q)$ is the transition relation, which maps every state to a finite set of pairs $(a, q)$ made of an action $a$ and a state $q$. The support of the automaton $A = \langle Q, I, T, \delta \rangle$ is the finite set of actions $\Phi_A = \{a \in S \mid \exists q, q' \in Q,\ (a, q') \in \delta(q)\}$. The inverse of the automaton $A = \langle Q, I, T, \delta \rangle$, written $\tilde{A}$, is the automaton with the same support $\langle Q, T, I, \delta' \rangle$ such that $\delta'(q') = \{(a, q) \mid (a, q') \in \delta(q)\}$. Of course, the inverse of $\tilde{A}$ is $A$.
1.2 Behaviour of an automaton. A path of the automaton $A = \langle Q, I, T, \delta \rangle$ is a sequence $p = q_0 \xrightarrow{a_1} q_1 \xrightarrow{a_2} q_2 \cdots \xrightarrow{a_n} q_n$ ($n \ge 0$) with $q_i \in Q$ for all $0 \le i \le n$ and $(a_{i+1}, q_{i+1}) \in \delta(q_i)$ for all $0 \le i < n$. The action associated with the path $p$ is defined as $\mathrm{act}(p) = 1$ if $n = 0$, and $\mathrm{act}(p) = a_1 \cdot \ldots \cdot a_n$ otherwise. The path is said to be accepting if $q_0 \in I$ and $q_n \in T$, and we write $\mathrm{pa}(A)$ for the set of accepting paths of $A$. The behaviour of the automaton $A$ is the set $|A| = \{\mathrm{act}(p) \mid p \in \mathrm{pa}(A)\}$. We say that $q_n$ is $A$-accessible from $q_0$, and that $q$ is $A$-accessible if it is $A$-accessible from an initial state of $A$. We say that $q$ is $A$-coaccessible if it is $\tilde{A}$-accessible, and $A$-useful if it is both $A$-accessible and $A$-coaccessible. We say that $A$ is trim iff all its states are useful. Every automaton can be reduced to a trim automaton with the same behaviour.
1.3 Examples. The monoid of actions may be the free monoid $\Sigma^*$ generated by a finite alphabet $\Sigma$. A $\Sigma^*$-automaton is then the generalisation of a nondeterministic finite automaton in which a transition may be labelled by an arbitrary word, and not only by a letter. A special case is that of automata with spontaneous transitions ($\varepsilon$-moves). Transducers from an alphabet $\Sigma$ to an alphabet $\Sigma'$ are obtained by considering the (non-free) monoid product of the free monoids $\Sigma^*$ and $\Sigma'^*$.
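The definitions in this abstract can be illustrated with a small Python sketch. It assumes the monoid of actions is the free monoid of Python strings (product is concatenation, the unit is the empty string). The class name `Automaton` and its method names are hypothetical and do not come from the paper. Because the behaviour of an automaton may be infinite, `behaviour` only enumerates accepting paths up to a bound on their length.

```python
from collections import deque


class Automaton:
    """Finite automaton whose transitions carry actions from a monoid.

    Assumption for this sketch: actions are Python strings, the product
    is concatenation and the unit is "" (a spontaneous move).
    """

    def __init__(self, states, initial, terminal, delta):
        self.states = set(states)
        self.initial = set(initial)
        self.terminal = set(terminal)
        # delta maps each state to a set of (action, next_state) pairs.
        self.delta = {q: set(delta.get(q, ())) for q in self.states}

    def support(self):
        """Set of actions that label at least one transition."""
        return {a for q in self.states for (a, _) in self.delta[q]}

    def inverse(self):
        """Reverse every transition and swap initial and terminal states."""
        rev = {q: set() for q in self.states}
        for q in self.states:
            for (a, q2) in self.delta[q]:
                rev[q2].add((a, q))
        return Automaton(self.states, self.terminal, self.initial, rev)

    def accessible(self):
        """States reachable from some initial state."""
        seen = set(self.initial)
        todo = deque(seen)
        while todo:
            q = todo.popleft()
            for (_, q2) in self.delta[q]:
                if q2 not in seen:
                    seen.add(q2)
                    todo.append(q2)
        return seen

    def trim(self):
        """Keep only useful states (accessible and coaccessible)."""
        useful = self.accessible() & self.inverse().accessible()
        delta = {q: {(a, q2) for (a, q2) in self.delta[q] if q2 in useful}
                 for q in useful}
        return Automaton(useful, self.initial & useful,
                         self.terminal & useful, delta)

    def behaviour(self, max_steps):
        """Actions of accepting paths with at most max_steps transitions."""
        result = set()
        frontier = {(q, "") for q in self.initial}
        for _ in range(max_steps + 1):
            result |= {act for (q, act) in frontier if q in self.terminal}
            frontier = {(q2, act + a)
                        for (q, act) in frontier
                        for (a, q2) in self.delta[q]}
        return result


# Example: state 3 is not accessible, so trimming removes it.
example = Automaton(
    states={0, 1, 2, 3},
    initial={0},
    terminal={2},
    delta={0: {("ab", 1), ("", 2)}, 1: {("c", 2)}, 3: {("d", 2)}},
)
```

On this example, `example.behaviour(2)` collects the unit action of the spontaneous move from state 0 to state 2 and the product "abc" of the two-step path through state 1, and `example.trim()` has the same bounded behaviour.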
Sanskrit Segmentation
We discuss in this paper the topic of Sanskrit segmentation, that is, how to solve by computer software the problem of identifying in a Sanskrit sentence the division of a continuous enunciation into a sequence of discrete word forms. This ...
A Collaborative Platform for Sanskrit Processing
Sanskrit, the classical language of India, presents specific challenges for computational linguistics: exact phonetic transcription in writing that obscures word boundaries, rich morphology and an enormous corpus, among others. Recent international cooperation has developed innovative solutions to these problems and significant resources for linguistic research. Solutions include efficient segmenting and tagging algorithms and dependency parsers based on constraint programming. The integration of lexical resources, text archives and linguistic software is achieved by distributed interoperable Web services. Resources include a morphological tagger and tagged corpus.
Simulating Finite Eilenberg Machines with a Reactive Engine
Eilenberg machines were introduced in 1974 in the field of formal language theory. They are finite automata for which the alphabet is interpreted by mathematical relations over an abstract set. They generalize many finite state machines. We consider in the present work the subclass of finite Eilenberg machines, for which we provide an executable complete simulator. This program is specified using the Coq proof assistant. The correctness of the algorithm is also proved formally and mechanically verified using Coq. Through its extraction mechanism, the Coq proof assistant allows the specification to be translated into an executable OCaml program. The algorithm and specification are inspired by the reactive engine of Gérard Huet. The finite Eilenberg machines model includes deterministic and nondeterministic automata (DFA and NFA) but also real-time transducers. As an example, we present a pushdown automaton (PDA) recognizing ambiguous terms of the λ-calculus. We show that this pushdown automaton is a finite Eilenberg machine, so the simulation using the reactive engine provides a complete recognizer for this particular context-free language.
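The idea of an automaton whose labels are interpreted as relations over a data domain can be sketched in Python as follows. This is a plain depth-first search over (state, datum) cells and is not the Coq-verified reactive engine described in the abstract. All names (`simulate`, `read`, `finished`, `recognises`) are hypothetical, and the example machine is an assumption chosen for brevity: it recognises the context-free language of words made of n letters `a` followed by n letters `b`, rather than the λ-term language of the paper. Each relation is modelled as a function from a datum to an iterable of data, and data are assumed to be hashable.

```python
def simulate(transitions, initial, terminal, datum):
    """Yield every datum that reaches a terminal state.

    transitions maps a state to a list of (relation, next_state) pairs,
    where relation is a function from a datum to an iterable of data.
    """
    stack = [(q, datum) for q in initial]
    seen = set(stack)  # visited (state, datum) cells
    while stack:
        q, d = stack.pop()
        if q in terminal:
            yield d
        for relation, q2 in transitions.get(q, ()):
            for d2 in relation(d):
                cell = (q2, d2)
                if cell not in seen:
                    seen.add(cell)
                    stack.append(cell)


def read(letter, delta):
    """Relation that consumes `letter` and adds `delta` to the counter."""
    def relation(d):
        rest, count = d
        if rest.startswith(letter) and count + delta >= 0:
            yield (rest[1:], count + delta)
    return relation


def identity(d):
    """Identity relation, used here as a spontaneous move."""
    yield d


def finished(d):
    """Partial identity: defined only on empty input with counter zero."""
    rest, count = d
    if rest == "" and count == 0:
        yield d


# Data are pairs (remaining input, counter); the counter plays the stack.
TRANSITIONS = {
    0: [(read("a", +1), 0), (identity, 1)],
    1: [(read("b", -1), 1), (finished, 2)],
}


def recognises(word):
    """True if some computation on (word, 0) reaches terminal state 2."""
    return any(True for _ in simulate(TRANSITIONS, [0], {2}, (word, 0)))
```

The search terminates on this machine because every `read` shortens the remaining input, so only finitely many cells are reachable from any start datum.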
Shallow syntax analysis in Sanskrit guided by semantic nets constraints
We present the state of the art of a computational platform for the analysis of classical Sanskrit. The platform comprises modules for phonology, morphology, segmentation and shallow syntax analysis, organized around a structured lexical database. It relies on the Zen toolkit for finite state automata and transducers, which provides data structures and algorithms for the modular construction and execution of finite state machines, in a functional framework. Some of the layers proceed in bottom-up synthesis mode: for instance, noun and verb morphological modules generate all inflected forms from stems and roots listed in the lexicon. Morphemes are assembled through internal sandhi, and the inflected forms are stored with morphological tags in dictionaries usable for lemmatizing. These dictionaries are then compiled into transducers, implementing the analysis of external sandhi, the phonological process which merges words together by euphony. This provides a tagging segmenter, which analyses a sentence presented as a stream of phonemes and produces a stream of tagged lexical entries, hyperlinked to the lexicon. The next layer is a syntax analyser, guided by semantic nets constraints expressing dependencies between the word forms. Finite verb forms demand semantic roles, according to valency patterns depending on the voice (active, passive) of the form and the governance (transitive, etc.) of the root. Conversely, noun/adjective forms provide actors which may fill those roles, provided agreement constraints are satisfied. Tool words are mapped to transducers operating on tagged streams, allowing the modeling of linguistic phenomena such as coordination by abstract interpretation of actor streams. The parser ranks the various interpretations (matching actors with roles) with penalties, and returns to the user the minimum penalty analyses, for final validation of ambiguities. The whole platform is organized as a Web service, allowing the piecewise tagging of a Sanskrit text.
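The tagging segmenter described here can be illustrated with a toy Python sketch that inverts external sandhi by generate-and-test. It is not the Zen toolkit or the transducer-based implementation of the platform. It assumes a three-entry lexicon of tagged inflected forms and only two simplified sandhi rules (final -aḥ becomes -o before a voiced consonant, and final -m becomes anusvāra before a consonant). All function and variable names are hypothetical.

```python
import unicodedata


def nfc(s):
    """Normalise to composed form so each diacritic letter is one character."""
    return unicodedata.normalize("NFC", s)


VISARGA = "\u1e25"   # ḥ
ANUSVARA = "\u1e43"  # ṃ
VOICED = set(nfc("gjḍdbnmyrlvh"))           # toy subset of voiced initials
CONSONANTS = VOICED | set(nfc("kcṭtpśṣs"))  # toy subset of consonants


def apply_sandhi(form, nxt):
    """Surface shape of `form` before a word starting with `nxt`.

    `nxt` is None at the end of the sentence. Only two simplified
    rules are modelled; real external sandhi has many more.
    """
    if nxt is None:
        return form
    if form.endswith("a" + VISARGA) and nxt in VOICED:
        return form[:-2] + "o"
    if form.endswith("m") and nxt in CONSONANTS:
        return form[:-1] + ANUSVARA
    return form


# Toy dictionary of inflected forms with morphological tags.
LEXICON = {nfc(form): tag for form, tag in {
    "rāmaḥ": "rāma, nom. sg. m.",
    "grāmam": "grāma, acc. sg. m.",
    "gacchati": "gam, pres. 3 sg.",
}.items()}


def _segment(text, lexicon):
    if text == "":
        yield []
        return
    for form, tag in lexicon.items():
        for cut in range(1, len(text) + 1):
            rest = text[cut:]
            nxt = rest[0] if rest else None
            # Keep the cut only if sandhi maps `form` onto this chunk.
            if apply_sandhi(form, nxt) == text[:cut]:
                for tail in _segment(rest, lexicon):
                    yield [(form, tag)] + tail


def segment(text, lexicon=LEXICON):
    """All analyses of `text` as a list of lists of (form, tag) pairs."""
    return list(_segment(nfc(text), lexicon))
```

Calling `segment("rāmogrāmaṃgacchati")` returns a single analysis whose forms are rāmaḥ, grāmam and gacchati, each paired with its tag. A production segmenter would compile the lexicon and the sandhi rules into a transducer instead of trying every cut point.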