Results 1 - 10 of 30,571

N-Grams Conflation Approach for Arabic Text

by Farag Ahmed, Andreas Nürnberger
Cited by 1 (0 self)
In this paper we present a language-independent approach to conflation that does not depend on predefined rules or prior knowledge of the target language. Unlike prior studies on Arabic text that use pure n-gram models without any attempt at further enhancement on the basis of refined n-gram ...
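The abstract contrasts its approach with "pure n-gram models". For reference only, the sketch below shows one common pure n-gram conflation measure: the Dice coefficient over the sets of unique character bigrams of two words, followed by a naive greedy grouping. It is not the refined method of Ahmed and Nürnberger. The function names (`bigrams`, `dice`, `conflate`) and the 0.4 threshold are illustrative assumptions. Because the measure only compares character sequences, it needs no language-specific rules and applies to Arabic words directly.

```python
def bigrams(word: str) -> set[str]:
    """Set of unique adjacent character pairs in `word`."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def dice(a: str, b: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over unique bigram sets."""
    A, B = bigrams(a), bigrams(b)
    if not A or not B:
        # Single-character words have no bigrams: fall back to exact match.
        return 1.0 if a == b else 0.0
    return 2 * len(A & B) / (len(A) + len(B))


def conflate(words: list[str], threshold: float = 0.4) -> list[list[str]]:
    """Greedy single pass (hypothetical grouping rule): a word joins the first
    group whose seed (first) word scores at least `threshold`; otherwise it
    starts a new group."""
    groups: list[list[str]] = []
    for w in words:
        for group in groups:
            if dice(w, group[0]) >= threshold:
                group.append(w)
                break
        else:
            groups.append([w])
    return groups


# Arabic examples: kitab "book", kutub "books", maktaba "library", qalam "pen".
print(conflate(["كتاب", "كتب", "مكتبة", "قلم"]))
```

Here كتاب and كتب share the bigram كت and score exactly 0.4, so they are grouped. مكتبة ("library") shares the root k-t-b but starts its own group, because against the seed كتاب it scores only 2/7 ≈ 0.29. The prefix م and the suffix ة dilute the bigram overlap, which shows how affixes can weaken a plain bigram measure.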

Evaluation of N-Grams Conflation Approach in Text-Based Information Retrieval

by Serhiy Kosinov - Proceedings of International Workshop on Information Retrieval, 2001
Cited by 6 (0 self)
This paper examines a conflation method based on the N-grams approach and evaluates its performance relative to the results achieved by other techniques, such as the Porter algorithm and successor variety stemming. In addition, an alternative way of enhancing the N-grams method, derived from the ...
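Successor variety stemming is one of the baselines the abstract names. It segments a word by counting how many distinct letters follow each of its prefixes across a corpus, on the idea that a jump in that count suggests a morpheme boundary. The sketch below is a simplified illustration, not a detail of Kosinov's paper. The toy corpus `CORPUS`, the end-of-word marker `$`, and the rule of cutting at the prefix with the highest successor variety are all assumptions made here. The successor-variety literature describes several segmentation rules.

```python
END = "$"  # end-of-word marker, counted as a successor (an assumption)

# Hypothetical toy corpus; a real run would use a large word list.
CORPUS = {"connect", "connected", "connecting", "connection", "connects",
          "contact", "contain"}


def successor_variety(prefix: str, corpus: set[str]) -> int:
    """Count the distinct symbols that follow `prefix` in corpus words."""
    followers = set()
    for w in corpus:
        if w.startswith(prefix):
            followers.add(w[len(prefix)] if len(w) > len(prefix) else END)
    return len(followers)


def sv_stem(word: str, corpus: set[str]) -> str:
    """Simplified rule (assumption): cut after the proper prefix with the
    highest successor variety; ties go to the shortest such prefix."""
    if len(word) < 2:
        return word
    varieties = [successor_variety(word[:i], corpus) for i in range(1, len(word))]
    best = max(range(len(varieties)), key=varieties.__getitem__)
    return word[:best + 1]


for w in ("connecting", "connection", "connects"):
    print(w, "->", sv_stem(w, CORPUS))
```

With this toy corpus, the successor varieties of the prefixes of "connecting" (lengths 1 to 9) are 1, 1, 2, 1, 1, 1, 4, 2, 1, so the cut falls after "connect". The smaller peak after "con" comes from "contact" and "contain". Spurious boundaries of this kind are why the choice of segmentation rule matters.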

An evaluation of statistical approaches to text categorization

by Yiming Yang - Journal of Information Retrieval, 1999
Cited by 664 (23 self)
This paper focuses on a comparative evaluation of a wide range of text categorization methods, including previously published results on the Reuters corpus and new results of additional experiments. A controlled study using three classifiers, kNN, LLSF and WORD, was conducted to examine ...
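Of the three classifiers named, kNN is the easiest to sketch. A common formulation, assumed here rather than quoted from Yang's exact setup, scores a category for a test document x by summing the cosine similarities of x's k nearest training documents that belong to that category: score(c | x) = Σ over d in kNN(x) of sim(x, d) · [d ∈ c]. The vectorizer below uses raw term frequencies with no stemming or IDF weighting. The training data `TRAIN` is a hypothetical toy set in which each document carries one label, whereas Reuters documents can carry several.

```python
import math
from collections import Counter

# Hypothetical toy training set: (text, single category label).
TRAIN = [
    ("wheat prices rise on export demand", "grain"),
    ("corn and wheat harvest forecast", "grain"),
    ("oil prices fall as crude supply grows", "crude"),
    ("opec cuts crude oil output", "crude"),
]


def vectorize(text: str) -> Counter:
    """Raw term frequencies; no stemming, stop-word removal or IDF (assumptions)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_scores(doc: str, train: list[tuple[str, str]], k: int = 3) -> dict[str, float]:
    """Sum the similarities of the k nearest training documents per category."""
    x = vectorize(doc)
    ranked = sorted(((cosine(x, vectorize(text)), label) for text, label in train),
                    reverse=True)
    scores: dict[str, float] = {}
    for sim, label in ranked[:k]:
        scores[label] = scores.get(label, 0.0) + sim
    return scores


scores = knn_scores("wheat export prices", TRAIN, k=3)
print(max(scores, key=scores.get))  # grain
```

A decision step normally follows, such as picking the top-scoring category or applying a per-category score threshold. That step is left out of this sketch.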

N-gram-based text categorization

by William B. Cavnar, John M. Trenkle - In Proc. of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994
Cited by 431 (0 self)
Text categorization is a fundamental task in document processing, allowing the automated handling of enormous streams of documents in electronic form. One difficulty in handling some classes of documents is the presence of different kinds of textual errors, such as spelling and grammatical errors in email, and character recognition errors in documents that come through OCR. Text categorization must work reliably on all input, and thus must tolerate some level of these kinds of problems. We describe here an N-gram-based approach to text categorization that is tolerant of textual errors. The system ...
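The abstract stops before the algorithm itself. In Cavnar and Trenkle's scheme, each category and each document gets a rank-ordered profile of its most frequent character N-grams, built from blank-padded words. The document is then assigned to the category whose profile is closest under an "out-of-place" rank distance. The sketch below follows that general scheme under simplifying assumptions: N-grams of length 1 to 3 only, one '_' of padding on each side of a word, a 300-entry profile cap, alphabetical tie-breaking, and a missing-N-gram penalty equal to the category profile length. The function names and the `SAMPLES` data are hypothetical.

```python
import re
from collections import Counter


def profile(text: str, max_n: int = 3, size: int = 300) -> list[str]:
    """Character n-grams (n = 1..max_n) ranked by frequency, most frequent first.
    Letters-only tokens, each padded with one '_' per side (assumptions)."""
    counts: Counter = Counter()
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{word}_"
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Descending frequency; alphabetical tie-break keeps the ranking deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [gram for gram, _ in ranked[:size]]


def out_of_place(doc_profile: list[str], cat_profile: list[str]) -> int:
    """Sum of rank displacements; an n-gram absent from the category
    profile costs len(cat_profile), the maximum penalty used here."""
    cat_rank = {gram: r for r, gram in enumerate(cat_profile)}
    penalty = len(cat_profile)
    return sum(abs(r - cat_rank[gram]) if gram in cat_rank else penalty
               for r, gram in enumerate(doc_profile))


def classify(text: str, samples: dict[str, str]) -> str:
    """Return the category whose profile is closest to the text's profile."""
    doc = profile(text)
    return min(samples, key=lambda cat: out_of_place(doc, profile(samples[cat])))


# Hypothetical toy training samples, one per category.
SAMPLES = {"english": "the the the and and", "german": "der die das und und"}
print(classify("the and", SAMPLES))  # english
```

For "the and", the out-of-place distance to the English profile is 138. By contrast, 12 of the document's 21 N-grams are missing from the 35-entry German profile, which alone costs 12 × 35 = 420. A misspelling changes only a few of a word's N-grams, so most of a document's profile still lines up with the right category. This is what makes the measure tolerant of textual errors.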

Using Linear Algebra for Intelligent Information Retrieval

by Michael W. Berry, Susan T. Dumais - SIAM Review, 1995
Cited by 672 (18 self)
Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users' requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical ... by 200-300 of the largest singular vectors are then matched against user queries. We call this retrieval method Latent Semantic Indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely ...
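The retained fragment says that queries are matched against a space spanned by the 200-300 largest singular vectors. The usual LSI formulation is stated here as background, and the exact scoring variant is an assumption. A is the m × n term-document matrix, U_k and V_k hold the first k left and right singular vectors, and Σ_k is the diagonal matrix of the k largest singular values. A query vector q is folded in as if it were an extra document column, and documents are ranked by cosine similarity in the reduced space:

```latex
\begin{aligned}
A &\approx A_k = U_k \Sigma_k V_k^{\top}, && k \approx 200 \text{ to } 300,\\
\hat{q} &= \Sigma_k^{-1} U_k^{\top} q, && \text{query folded into the } k\text{-dimensional space},\\
\operatorname{sim}(q, d_j) &= \frac{\hat{q}^{\top} v_j}{\lVert \hat{q} \rVert \, \lVert v_j \rVert}, && v_j^{\top} = \text{row } j \text{ of } V_k.
\end{aligned}
```

Column j of A_k equals U_k Σ_k v_j, and U_k has orthonormal columns, so applying Σ_k^{-1} U_k^T to a document column recovers v_j. The fold-in formula applies the same map to q. Some implementations compare Σ_k-scaled vectors instead, which reweights the k dimensions without changing the overall method.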

Understanding Code Mobility

by Alfonso Fuggetta, Gian Pietro Picco, Giovanni Vigna - IEEE Computer Science Press, 1998
Cited by 549 (34 self)
The technologies, architectures, and methodologies traditionally used to develop distributed applications exhibit a variety of limitations and drawbacks when applied to large-scale distributed settings (e.g., the Internet). In particular, they fail to provide the desired degree of configurability, ... code mobility is generating a growing body of scientific literature and industrial developments. Nevertheless, the field is still characterized by the lack of a sound and comprehensive body of concepts and terms. As a consequence, it is rather difficult to understand, assess, and compare the existing ...

The English noun phrase in its sentential aspect

by Richard Larson, Steven Paul Abney, 1987
Cited by 509 (4 self)
This dissertation is a defense of the hypothesis that the noun phrase is headed by a functional element (i.e., “non-lexical” category) D, identified with the determiner. In this way, the structure of the noun phrase parallels that of the sentence, which is headed by Infl(ection), under assumptions now standard within the Government-Binding (GB) framework. The central empirical problem addressed is the question of the proper analysis of the so-called “Poss-ing” gerund in English. This construction possesses simultaneously many properties of sentences, and many properties of noun phrases. The problem of capturing this dual aspect of the Poss-ing construction is heightened by current restrictive views of X-bar theory, which, in particular, rule out the obvious structure for Poss-ing, [NP NP VPing], by virtue of its exocentricity. Consideration of languages in which nouns, even the most basic concrete nouns, show agreement (AGR) with their possessors, points to an analysis ...

Head-Driven Statistical Models for Natural Language Parsing

by Michael Collins, 1999
Cited by 1145 (16 self)
Abstract not found

Pig Latin: A Not-So-Foreign Language for Data Processing

by Christopher Olston, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, Andrew Tomkins
Cited by 584 (12 self)
There is a growing need for ad-hoc analysis of extremely large data sets, especially at internet companies where innovation critically depends on being able to analyze terabytes of data collected every day. Parallel database products, e.g., Teradata, offer a solution, but are usually prohibitively expensive at this scale. Besides, many of the people who analyze this data are entrenched procedural programmers, who find the declarative, SQL style to be unnatural. The success of the more procedural map-reduce programming model, and its associated scalable implementations on commodity hardware, is evidence of the above. However, the map-reduce paradigm is too low-level and rigid, and leads to a great deal of custom user code that is hard to maintain and reuse. We describe a new language called Pig Latin that we have designed to fit in a sweet spot between the declarative style of SQL and the low-level, procedural style of map-reduce. The accompanying system, Pig, is fully implemented, and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source map-reduce implementation. We give a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required for the development and execution of their data analysis tasks, compared to using Hadoop directly. We also report on a novel debugging environment that comes integrated with Pig and can lead to even higher productivity gains. Pig is an open-source, Apache-incubator project, and is available for general use.
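The "sweet spot" is easier to see with an example. A Pig Latin program is a sequence of named steps, and each step applies one relational-style operator (LOAD, FILTER, GROUP, FOREACH ... GENERATE) to the output of an earlier step. The snippet below gives an illustrative script of that shape in comments, then mimics each step in plain Python over an in-memory list. It only sketches the dataflow style. Pig itself compiles such scripts into map-reduce plans executed on Hadoop. The relation names, fields, data, and thresholds here are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Illustrative Pig Latin-style script (relation names, fields, thresholds hypothetical):
#   urls       = LOAD 'urls' AS (url, category, pagerank);
#   good_urls  = FILTER urls BY pagerank > 0.2;
#   groups     = GROUP good_urls BY category;
#   big_groups = FILTER groups BY COUNT(good_urls) > 1;
#   result     = FOREACH big_groups GENERATE group, AVG(good_urls.pagerank);

# Toy stand-in for LOAD: (url, category, pagerank) tuples.
urls = [
    ("a.com", "news", 0.75),
    ("b.com", "news", 0.5),
    ("c.com", "news", 0.125),
    ("d.com", "sports", 0.625),
    ("e.com", "sports", 0.0625),
]

# FILTER: keep tuples whose pagerank exceeds 0.2.
good_urls = [t for t in urls if t[2] > 0.2]

# GROUP ... BY category: map each key to the bag of its tuples
# (the grouping a map-reduce job performs in its shuffle phase).
groups = defaultdict(list)
for t in good_urls:
    groups[t[1]].append(t)

# FILTER on a per-group aggregate: keep groups with more than one tuple.
big_groups = {cat: bag for cat, bag in groups.items() if len(bag) > 1}

# FOREACH ... GENERATE: one (category, average pagerank) tuple per group.
result = sorted((cat, mean(t[2] for t in bag)) for cat, bag in big_groups.items())
print(result)  # [('news', 0.625)]
```

Each Python statement corresponds to exactly one named step. The pipeline reads top to bottom like procedural code, yet every step is a high-level operator rather than hand-written map and reduce functions.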

Strategies of Discourse Comprehension

by Teun A. Van Dijk, Walter Kintsch, 1983
Cited by 601 (27 self)
El Salvador, Guatemala is a study in black and white. On the left is a collection of extreme Marxist-Leninist groups led by what one diplomat calls “a pretty faceless bunch of people.” On the right is an entrenched elite that has dominated Central America’s most populous country since a CIA-backed coup deposed the reformist government of Col. Jacobo Arbenz Guzmán in 1954. Moderates of the political center, embattled but alive in El Salvador, have virtually disappeared in Guatemala, joining more than 30,000 victims of terror over the last fifteen years. “The situation in Guatemala is much more serious than in El Salvador,” declares one Latin American diplomat. “The oligarchy is that much more reactionary, and the choices are far fewer.” ‘Zero’: The Guatemalan oligarchs hated Jimmy Carter for cutting off U.S. military aid in 1977 to protest human-rights abuses, and the right-wingers hired marimba bands and set off firecrackers on the night Ronald Reagan was elected. They considered Reagan an ideological kinsman and believed they had a special ...

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University