Results 1 - 10 of 185

Mining the Web for Bilingual Text

by Philip Resnik - In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999
"... STRAND (Resnik, 1998) is a language-independent system for automatic discovery of text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance. ..."
Cited by 135 (7 self)

Automatic keyword extraction from historical document images

by Kengo Terasawa, Takeshi Nagasaki, Toshio Kawashima - DAS06, 2006
"... This paper presents an automatic keyword extraction method from historical document images. The proposed method is language independent because it is purely appearance based, where neither lexical information nor any other statistical language models are required. Moreover, since it does ..."
Cited by 3 (0 self)

Using word sense discrimination on historic document collections

by Nina Tahmasebi, Kai Niklas, Thomas Risse, Thomas Theuerkauf - In JCDL ’10: Proceedings of the 10th ACM/IEEE-CS Joint Conference on Digital Libraries, Gold, 2010
"... Word sense discrimination is the first, important step towards automatic detection of language evolution within large, historic document collections. By comparing the found word senses over time, we can reveal and use important information that will improve understanding and accessibility of a digit ..."
Cited by 10 (5 self)

Corpus and Evaluation Measures for Automatic Plagiarism Detection

by Alberto Barrón-Cedeño, Martin Potthast, Paolo Rosso, Benno Stein, Andreas Eiselt
"... The simple access to texts on digital libraries and the WWW has led to an increased number of plagiarism cases in recent years, which renders manual plagiarism detection infeasible at large. Various methods for automatic plagiarism detection have been developed whose objective is to assist human exp ..."
Cited by 3 (0 self)

Non-interactive OCR post-correction for giga-scale digitization projects

by Martin Reynaert - In Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing'08, 2008
"... This paper proposes a non-interactive system for reducing the level of OCR-induced typographical variation in large text collections, contemporary and historical. Text-Induced Corpus Clean-up or ticcl (pronounce ’tickle’) focuses on high-frequency words derived from the corpus to be cleane ..."
Cited by 18 (5 self)

Automatic program repair with evolutionary computation

by Westley Weimer, Stephanie Forrest, Claire Le Goues, Thanhvu Nguyen - Commun. ACM, vol. 53, no. 5, May 2010
"... There are many methods for detecting and mitigating software errors but few generic methods for automatically repairing errors once they are discovered. This paper highlights recent work combining program analysis methods with evolutionary computation to automatically repair bugs in off-th ..."
Cited by 34 (5 self)
generations. These results show that GP can automatically discover repairs for a variety of documented bugs in production C programs. ... The trial terminates if it discovers a primary repair. We performed 100 trials for each program

Measuring Historical Word Sense Variation

by David Bamman, Gregory Crane
"... We describe here a method for automatically identifying word sense variation in a dated collection of historical books in a large digital library. By leveraging a small set of known translation book pairs to induce a bilingual sense inventory and labeled training data for a WSD classifier, we are ab ..."
Cited by 9 (2 self)

Comparative Classifier Evaluation for Web-scale Taxonomies Using Power Law

by Rohit Babbar, Ioannis Partalas, Cornelia Metzig, Eric Gaussier
"... In the context of web-scale taxonomies such as Mozilla and Yahoo! directories, previous works have shown the existence of power law distribution in the size of the categories for every level in the taxonomy. In this work, we analyse how such high-level semantics can be leveraged to evaluate accuracy of hierarchical classifiers which automatically assign the unseen documents to leaf-level categories in the taxonomy. Commonly used evaluation method, which relies on k-fold cross-validation suffers from computational challenges for such large scale taxonomies. The proposed technique ..."

Automatic Measurement of Quality Metrics for Colonoscopy Videos

by Sae Hwang, Junghwan Oh, Jeongkyu Lee, Yu Cao, Wallapak Tavanapong, Danyu Liu, Johnny Wong, Piet C. De Groen
"... Colonoscopy is the accepted screening method for detection of colorectal cancer or its precursor lesions, colorectal polyps. Indeed, colonoscopy has contributed to a decline in the number of colorectal cancer related deaths. However, not all cancers or large polyps are detected at the time of colono ..."
method allows future quality control in the day-to-day medical practice setting on a large scale. In addition, our method can be adapted to other healthcare procedures. Last but not least, our method may be useful to assess progress during colonoscopy training, or as part of endoscopic skills assessment

Toward a Predictive Statistical Model of Task-based Performance Using Automatic MT Evaluation Metrics

by unknown authors
"... Can automatic intrinsic metrics predict when an MT engine is “good enough” to support people performing sense-making tasks, such as the extraction of who, when, and where information from translated documents? This paper reports on the challenges and results of a large-scale, task-based experiment ..."
Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University