Results 1 - 5 of 5
Table 6: Chunking accuracies of the teams for Hindi, Bengali and Telugu.
in Organizers (2007)
"... In PAGE 12: ...nd 66.85 % respectively. This also supports the fact that Telugu is richer and hence harder to learn than Hindi and Bengali. Table 6 also shows that systems which have been trained with richer features (Avinesh and Karthik, 2007; Himanshu, 2007; Sandi-... In PAGE 22: ...-VG 74.0 72.7 69.6 Table 6: Recall values in percentage. 4 Conclusion: The TnT tagger uses the Viterbi algorithm. Its processing time may be reduced by introducing a beam search, which can greatly increase the speed of execution.... In PAGE 24: ... Secondly, TnT on average performs better than Qtag. Table 6 shows the result of the TnT tagger (best approach) on the test data. The following section explains our approach to shallow parsing.... In PAGE 24: ...9 84.7 Table 6: Part-of-speech tagging results on test data. Language Accuracy Hindi 76.68 Bengali 74.... In PAGE 25: ...onverted to [BIIE]. S is used for singleton chunks. Table 8 shows the improvement obtained by mapping BIO to BIES alone. (1) The comparative performance for the variant CRF chunking models given in Tables 6-8 on development data is based on the use of gold-standard part-of-speech tags for the development data, rather than model-derived tags. Table 8: Gain from BIO to BIES: CRF Method F1 CRF + BIO 79.... In PAGE 34: ... The rule-based chunking system is designed for Bengali, and the same chunker is then applied to the Hindi and Telugu test data. After removing the chunk labels and chunk boundary markers (keeping the POS tags intact) of the manually chunked corpus (development test sets), the chunker is evaluated, and the obtained results are presented in Table 5 and Table 6 for the development test sets and unannotated test sets respectively. The following abbreviations are used in this table: No.... In PAGE 34: ...65 Telugu 5193 2760 53.15 Table 6: Chunking Results (Unannotated sets). The accuracy figures of the chunking results for the Hindi and Telugu test sets are not satisfactory, as the rules of the chunker were mainly designed for Bengali. These accuracy figures can be increased by deriving chunking rules for Hindi and Telugu.... In PAGE 38: ...47 68.59 Table 6: POS Tagging and Chunking accuracies (in %) for the different languages on the test data set. 7 Conclusion: In this report we have described an approach for automatic stochastic tagging of natural-language text. The models described here are very simple and efficient for automatic tagging even when the amount of available labeled text is small.... ..."
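One snippet above reports a gain from re-mapping chunk tags from the BIO scheme to BIES before CRF training. The cited papers give no code; a minimal sketch of such a re-mapping (hypothetical `bio_to_bies` helper, assuming tags of the form `B-NP`, `I-NP`, `O`) might look like:

```python
def bio_to_bies(tags):
    """Re-map one sentence's BIO chunk tags to BIES.

    B -> B (chunk continues) or S (singleton chunk);
    I -> I (chunk continues) or E (last token of chunk);
    O stays O. Illustrative sketch only; assumes valid BIO input
    where I- tags never start a chunk.
    """
    out = []
    for i, tag in enumerate(tags):
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        prefix, _, ctype = tag.partition("-")
        continues = nxt.startswith("I-")  # next token stays inside this chunk
        if prefix == "B":
            out.append(("B-" if continues else "S-") + ctype)
        elif prefix == "I":
            out.append(("I-" if continues else "E-") + ctype)
        else:
            out.append("O")
    return out

print(bio_to_bies(["B-NP", "I-NP", "O", "B-NP"]))
# prints ['B-NP', 'E-NP', 'O', 'S-NP']
```

The finer-grained BIES labels let a sequence model score chunk-final and singleton tokens separately, which is the effect the snippet's "Gain from BIO to BIES" table quantifies.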
Table 6: Recall values in percentage
in Organizers (2007)
"... In PAGE 12: ...85 71.48 Table 6: Chunking accuracies of the teams for Hindi, Bengali and Telugu. Decision Forests for chunking too, with features from words within a window size of 2.... In PAGE 12: ...nd 66.85 % respectively. This also supports the fact that Telugu is richer and hence harder to learn than Hindi and Bengali. Table 6 also shows that systems which have been trained with richer features (Avinesh and Karthik, 2007; Himanshu, 2007; Sandi-... ..."
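The snippets above note that TnT's Viterbi decoding can be sped up by a beam search that drops low-scoring tag states at each position. A toy illustrative sketch of beam-pruned Viterbi over a two-tag HMM (all names and probability tables are made up for illustration; this is not TnT's actual implementation):

```python
import math

def viterbi_beam(words, tags, start, trans, emit, beam=2):
    """Viterbi decoding with beam pruning: at each position, keep only
    the `beam` highest-scoring tag states. Toy sketch, log-space scores."""
    # scores: tag -> (log-prob of best path ending in tag, that path)
    scores = {t: (start[t] + emit[t].get(words[0], -9.0), [t]) for t in tags}
    scores = dict(sorted(scores.items(), key=lambda kv: kv[1][0], reverse=True)[:beam])
    for w in words[1:]:
        nxt = {}
        for t in tags:
            best_path, best_score = None, -math.inf
            for p, (s, path) in scores.items():  # only surviving states
                cand = s + trans[p].get(t, -9.0) + emit[t].get(w, -9.0)
                if cand > best_score:
                    best_path, best_score = path, cand
            nxt[t] = (best_score, best_path + [t])
        # beam pruning: discard all but the top `beam` states
        scores = dict(sorted(nxt.items(), key=lambda kv: kv[1][0], reverse=True)[:beam])
    return max(scores.values(), key=lambda v: v[0])[1]

# toy two-tag model: nouns and verbs
tags = ["N", "V"]
start = {"N": math.log(0.7), "V": math.log(0.3)}
trans = {"N": {"N": math.log(0.4), "V": math.log(0.6)},
         "V": {"N": math.log(0.8), "V": math.log(0.2)}}
emit = {"N": {"dogs": math.log(0.9), "run": math.log(0.1)},
        "V": {"dogs": math.log(0.1), "run": math.log(0.9)}}
print(viterbi_beam(["dogs", "run"], tags, start, trans, emit))
# prints ['N', 'V']
```

Pruning trades exactness for speed: with a full tagset the inner loop runs over `beam` states instead of all tags, which is the speedup the snippet alludes to.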
Table 3: Part-of-speech tagging results on the development data (all words)
in Organizers (2007)
"... In PAGE 11: ... Also uses NER. Him: Conditional Random Fields, unannotated corpus, word window of 2, last characters 4, word length. San: Maximum Entropy model, Morphological Analyser, word window of 1 for Bengali, |suffix| ≤ 4 and |prefix| ≤ 4. San: Decision Forests - features based on Syllables, Phonemes and Onset-Vowel-Code. Table 3: Summary of the approaches followed by the participants for POS tagging. Avinesh and Karthik (2007) proposed a two-level training approach where Transformation Based Learning (TBL) was applied on top of a CRF-based model.... In PAGE 17: ... The results for all three languages are shown below in Table 1. Evaluation results of the text chunker are shown in Table 2 and Table 3. The results in Table 2 are for the text chunker applied to the gold-standard POS-tagged input text, which means there are no errors from the POS tagger.... In PAGE 17: ... The results shown in Table 3 are obtained when the input to the text chunker is POS-tagged text produced by executing our hybrid POS tagger; Table 3 thus shows the combined results of the POS tagger and text chunker. Language No.... In PAGE 17: ...17 50.38 Table 3: Combined results of the text chunker with POS-tagged input by the system. In POS tagging, when a hybrid system is used, the disadvantages of one component are either nullified or reduced by the other component.... In PAGE 20: ...RB 67.2 75.0 73.4 Table 3: Recall values in percentage. 3 Chunking: We have attempted to implement a novel method to obtain the chunks using the Cocke-Kasami-Younger (CKY) algorithm. The following subsections outline the existing techniques in chunking, our approach, and the results obtained by testing our method on the test data.... In PAGE 33: ...05 Telugu 3898 63.93 Table 3: POS tagger results (development sets) (CTT: number of tokens correctly tagged by the POS tagger). Test Set CTT Accuracy (in %) Bengali 4061 77.... In PAGE 33: ...87 Telugu 3505 67.49 Table 4: POS tagger results (unannotated sets). The POS tagger has been tested with all three development test sets, and the results are reported in Table 3. The POS tagger was then tested with three unannotated test sets, and those results are presented in Table 4.... In PAGE 34: ...observed from Table 3 and Table 4 that the POS tagger performs best for the Bengali development and unannotated test sets. The key to this higher accuracy, compared to Hindi and Telugu, is the mechanism for handling unknown words.... In PAGE 37: ... In order to measure the performance of the system we use annotated test data (development data set and test data set, in two phases) for each language. Table 3 summarizes the size of the data and the number of POS and chunk categories used for each language. Language Bengali Hindi Telugu; Training data (in words) 20,396 21,470 21,416; Development data (in words) 5,023 5,681 6,098; Test data (in words) 5,226 4,924 5,193; No.... In PAGE 37: ... of POS tags 27 25 25; No. of chunk labels 6 7 6. Table 3: Data size and tags used for the languages. 5 System Performance. 5.1 Tagging Accuracy for the Development Data Set: We define the tagging accuracy as the ratio of the correctly tagged words to the total number of words.... In PAGE 42: ...2 Chunking: In the feature set discussed in Section 4 for chunking, CART and the Decision Forest give the percentages shown in Table 3. Table 3: Performance of Chunking using CART and Decision Forest (in %). Method Hindi Bengali Telugu CART 69.11 69.... ..."
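One snippet above defines tagging accuracy as the ratio of correctly tagged words to the total number of words. That definition is trivial to state in code; a sketch with a hypothetical `tagging_accuracy` helper:

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Tagging accuracy = correctly tagged words / total words,
    as defined in the snippet. Assumes parallel tag sequences."""
    assert len(gold_tags) == len(predicted_tags)
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)

print(tagging_accuracy(["NN", "VB", "NN", "JJ"],
                       ["NN", "VB", "NN", "NN"]))  # 3 of 4 correct -> 0.75
```

Note this is per-token accuracy; the chunking results quoted elsewhere in these snippets are instead reported as recall/F1 over chunks, which is a different measure.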
Table 3: Data size and tags used for the languages
in Organizers (2007)
"... In PAGE 24: ...4 tently performs well on out-of-vocabulary words. Table 3: Part-of-speech tagging results on the development data (all words). Method Hindi Bengali Telugu INV 78.9 73.... In PAGE 37: ... In order to measure the performance of the system we use annotated test data (development data set and test data set, in two phases) for each language. Table 3 summarizes the size of the data and the number of POS and chunk categories ... ..."
Table 3: Performance of Chunking using CART and Decision Forest (in %)
in Organizers (2007)
"... In PAGE 42: ...2 Chunking: In the feature set discussed in Section 4 for chunking, CART and the Decision Forest give the percentages shown in Table 3. Table 3: Performance of Chunking using CART and Decision Forest (in %). Method Hindi Bengali Telugu CART 69.11 69.... ..."