Hindi CLIR in thirty days
- ACM Transactions on Asian Language Information Processing (TALIP), 2003
- Cited by 6 (0 self)
As participants in the TIDES Surprise Language exercise, researchers at the University of Massachusetts helped collect Hindi-English resources and developed a cross-language information retrieval system. Components included normalization, stop-word removal, transliteration, structured query translation, and language modeling using a probabilistic dictionary derived from a parallel corpus. Existing technology was successfully applied to Hindi. The biggest stumbling blocks were collection of parallel English and Hindi text and dealing with numerous proprietary encodings.
Language-specific models in multilingual topic tracking
- In Proceedings of SIGIR 2004, 2004
- Cited by 4 (1 self)
Topic tracking is complicated when the stories in the stream occur in multiple languages. Typically, researchers have trained only English topic models because the training stories have been provided in English. In tracking, non-English test stories are then machine translated into English to compare them with the topic models. We propose a native language hypothesis stating that comparisons would be more effective in the original language of the story. We first test and support the hypothesis for story link detection. For topic tracking the hypothesis implies that it should be preferable to build separate language-specific topic models for each language in the stream. We compare different methods of incrementally building such native language topic models.
Unstructured Content Analysis & Classification System for the IRS
- R. Palson Kennedy
Creating ontological approaches to personalizing queries of unstructured data requires intensive use of XML-based tables and schemas. From the legacy design efforts for CSDL to the many approaches to XML schema development, including XIRQL, hybrid XML retrieval, and XML queries, the adoption of advanced techniques for unstructured content management is progressing rapidly. Paralleling these research advances is the pervasive adoption of cloud computing platforms, including Software-as-a-Service (SaaS), driven by the growth of Amazon Web Services and other platforms. The intent of this thesis proposal is to define an XML schema that aggregates unstructured content which, when combined according to the individualized taxonomies and ontological preferences of system users, delivers highly relevant and timely data. The proposed XML Schema Model for Unstructured Content Personalization is shown in Figure 1. This model is further supported by the development and continual fine-tuning of Quantum Information Algorithms that define approximate taxonomies and approaches to role-based queries, which serve as the basis for creating personalization pathways in the data. Quantum Information Theory also makes it possible to create enterprise-wide networks of knowledge management systems that can effectively "learn" over time, using latent semantic indexing (LSI) to create linguistic models that represent the data. It thus provides the basis for an entire network of systems that learn over time, continually fueling new insights into the knowledge base of the systems themselves.

