Results 1 - 10 of 39
Automatic Sanskrit Segmentizer Using Finite State Transducers
"... In this paper, we propose a novel method for automatic segmentation of a Sanskrit string into different words. The input for our segmentizer is a Sanskrit string either encoded as a Unicode string or as a Roman transliterated string and the output is a set of possible splits with weights associated ..."
Cited by 4 (0 self)
with each of them. We followed two different approaches to segment a Sanskrit text using sandhi rules extracted from a parallel corpus of manually sandhi-split text. While the first approach augments the finite state transducer used to analyze Sanskrit morphology and traverses it to segment a word
Lexicon-directed Segmentation and Tagging of Sanskrit
- in XIIth World Sanskrit Conference, 2003
"... We propose a methodology for Sanskrit processing by computer. The first layer of this software, which analyses the linear structure of a Sanskrit sentence as a set of possible interpretations under sandhi analysis, is operational. Each interpretation proposes a segmentation of the sentence as a list ..."
Cited by 5 (0 self)
Completeness Analysis of a Sanskrit Reader
"... We analyse in this paper differences of linguistic treatment of Sanskrit in the Sanskrit Heritage platform and in the Paninian grammatical tradition. General methodology. The general assumption behind the design of the Heritage Sanskrit Reader is that sentences from Classical Sanskrit m ..."
Cited by 2 (1 self)
relation R over the candidate sentence w in order to produce a finite sequence w_1, w_2, ..., w_n of word forms, together with a proof that w ∈ R(w_1 · w_2 · ... · w_n). The word forms w_i must be justified as being valid word forms of Sanskrit (i.e. padas), and some justification must be offered that the combination
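The idea in this excerpt, inverting a sandhi relation R to recover word forms w_1 ... w_n from a sentence w, can be illustrated with a small sketch. This is not the Sanskrit Heritage implementation: the rule table `RULES`, the mini-lexicon `LEXICON`, and the function names `sandhi_join` and `segment` are all hypothetical, the rules are a tiny subset of vowel sandhi, and treating plain juxtaposition as always permitted is a simplifying assumption.

```python
from typing import List, Set, Tuple

# Toy external-sandhi rules (x, y, z): word-final x followed by word-initial y
# surfaces as z. A tiny illustrative subset, not a complete rule set.
RULES: List[Tuple[str, str, str]] = [
    ("a", "a", "ā"),
    ("a", "i", "e"),
    ("a", "u", "o"),
    ("a", "e", "ai"),
]

# Hypothetical mini-lexicon of valid word forms (padas).
LEXICON: Set[str] = {"na", "asti", "ca", "iha", "eva"}


def sandhi_join(words: List[str]) -> Set[str]:
    """Forward relation R: every surface string obtainable from a non-empty word list."""
    surfaces = {words[0]}
    for nxt in words[1:]:
        joined = set()
        for s in surfaces:
            joined.add(s + nxt)  # assumption: juxtaposition without sandhi is allowed
            for x, y, z in RULES:
                if s.endswith(x) and nxt.startswith(y):
                    joined.add(s[: -len(x)] + z + nxt[len(y):])
        surfaces = joined
    return surfaces


def segment(w: str) -> List[List[str]]:
    """Inverse of R: every list of lexicon words whose combination can surface as w."""
    results: List[List[str]] = []

    def search(rest: str, acc: List[str]) -> None:
        if rest in LEXICON:
            results.append(acc + [rest])
        for u in sorted(LEXICON):  # sorted only to make the output order stable
            # Case 1: u is followed by the next word with no sound change.
            if rest.startswith(u) and len(rest) > len(u):
                search(rest[len(u):], acc + [u])
            # Case 2: the end of u fused with the start of the next word.
            for x, y, z in RULES:
                if not u.endswith(x):
                    continue
                prefix = u[: -len(x)] + z
                if rest.startswith(prefix):
                    restored = y + rest[len(prefix):]  # undo the fusion
                    if len(restored) < len(rest):  # guard so the search terminates
                        search(restored, acc + [u])

    search(w, [])
    return results
```

With this toy data, `segment("nāsti")` returns `[["na", "asti"]]`, and checking `"nāsti" in sandhi_join(["na", "asti"])` plays the role of the justification that the proposed word forms recombine to the input. A realistic lexicon would usually yield several candidate segmentations per sentence.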
External Sandhi and its Relevance to Syntactic Treebanking
"... External sandhi is a linguistic phenomenon which refers to a set of sound changes that occur at word boundaries. These changes are similar to phonological processes such as assimilation and fusion when they apply at the level of prosody, such as in connected speech. External sandhi formati ..."
, we argue, necessitates the introduction of a sandhi splitting stage in the generic annotation pipeline currently being followed for the treebanking of Indian languages. We identify one type of external sandhi widely occurring in the previous version of the Telugu treebank (version 0.2) and manually
Analysis of Sanskrit text: Parsing and semantic relations
- Proceedings, First International Symposium on Sanskrit Computational Linguistics, 2007
"... In this paper, we are presenting our work towards building a dependency parser for the Sanskrit language that uses deterministic finite automata (DFA) for morphological analysis and the ‘utsarga apavaada’ approach for relation analysis. A computational grammar based on the framework of Panini is being devel ..."
Cited by 1 (0 self)
Telugu Bigram Splitting using Consonant-based and Phrase-based Splitting
"... Splitting is a conventional process in most Indian languages according to their grammar rules. It is called ‘pada vicchEdanam’ (a Sanskrit term for word splitting) and is widely used by most of the Indian languages. Splitting plays a key role in Machine Translation (MT) particularly whe ..."
An Algorithm Based on Empirical Methods, for Text-to-Tuneful-Speech Synthesis of Sanskrit Verse
- 2010
"... The rendering of Sanskrit poetry from text to speech is a problem that has not been solved before. One reason may be the complications in the language itself. We present unique algorithms based on extensive empirical analysis, to synthesize speech from a given text input of Sanskrit verses. Using a ..."
A Functional Toolkit for Morphological and Phonological Processing, Application to a Sanskrit Tagger
- Under consideration for publication in J. Functional Programming
"... We present the Zen toolkit for morphological and phonological processing of natural languages. This toolkit is presented in literate programming style, in the Pidgin ML subset of the Objective Caml functional programming language. This toolkit is based on a systematic representation of finite state ..."
and completeness are formally proved. An application to the segmentation of Sanskrit by sandhi analysis is demonstrated. Dedicated to Rod Burstall on the occasion of his 65th birthday
Former Dean of Computing Sciences,
"... Splitting of compound Telugu words into their components or root words is one of the important, tedious and yet inaccurate tasks of Natural Language Processing (NLP). Except in a few special cases, at least one vowel is necessarily involved in Telugu conjunctions. As a result, vowels are ofte ..."
are often repeated as they are or are converted into other vowels or consonants. This paper describes issues involved in vowel based splitting of a Telugu bigram into proper root words using Telugu grammar conjunction (‘sandhi’) rules for MT. Keywords—Telugu word splitting; vowel based splitting; compound
Samāsa-Kartā: An Online Tool for Producing Compound Words using IndoWordNet
"... Samāsa or compounds are a regular feature of Indian Languages. They are also found in other languages like German, Italian, French, Russian, Spanish, etc. A compound word is constructed from two or more words to form a single word. The meaning of this word is derived from each of the individual words ..."
Net. The Samāsa-Kartā can be used for various applications, viz. compound categorization, sandhi creation, morphological analysis, paraphrasing, synset creation, etc.