Results 1 - 10 of 185
Mining the Web for Bilingual Text
In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999
"... STRAND (Resnik, 1998) is a language-independent system for automatic discovery of text in parallel translation on the World Wide Web. This paper extends the preliminary STRAND results by adding automatic language identification, scaling up by orders of magnitude, and formally evaluating performance. ..."
Cited by 135 (7 self)
Automatic keyword extraction from historical document images
DAS06, 2006
"... This paper presents an automatic keyword extraction method for historical document images. The proposed method is language-independent because it is purely appearance-based: neither lexical information nor any other statistical language model is required. Moreover, since it does ..."
Cited by 3 (0 self)
Using word sense discrimination on historic document collections
In JCDL ’10: Proceedings of the 10th ACM/IEEE-CS Joint Conference on Digital Libraries, Gold ..., 2010
"... Word sense discrimination is the first, important step towards automatic detection of language evolution within large, historic document collections. By comparing the found word senses over time, we can reveal and use important information that will improve understanding and accessibility of a digit ..."
Cited by 10 (5 self)
Corpus and Evaluation Measures for Automatic Plagiarism Detection
"... Easy access to texts in digital libraries and on the WWW has led to an increased number of plagiarism cases in recent years, which renders manual plagiarism detection infeasible at scale. Various methods for automatic plagiarism detection have been developed whose objective is to assist human exp ..."
Cited by 3 (0 self)
Non-interactive OCR post-correction for giga-scale digitization projects
In Proceedings of the 9th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing'08, 2008
"... This paper proposes a non-interactive system for reducing the level of OCR-induced typographical variation in large text collections, both contemporary and historical. Text-Induced Corpus Clean-up, or TICCL (pronounced ‘tickle’), focuses on high-frequency words derived from the corpus to be cleane ..."
Cited by 18 (5 self)
Automatic program repair with evolutionary computation
Commun. ACM, vol. 53, no. 5, May 2010
"... There are many methods for detecting and mitigating software errors but few generic methods for automatically repairing errors once they are discovered. This paper highlights recent work combining program analysis methods with evolutionary computation to automatically repair bugs in off-th ..."
Cited by 34 (5 self)
... generations. These results show that genetic programming (GP) can automatically discover repairs for a variety of documented bugs in production C programs. ... The trial terminates if it discovers a primary repair. We performed 100 trials for each program ...
Measuring Historical Word Sense Variation
"... We describe here a method for automatically identifying word sense variation in a dated collection of historical books in a large digital library. By leveraging a small set of known translation book pairs to induce a bilingual sense inventory and labeled training data for a WSD classifier, we are ab ..."
Cited by 9 (2 self)
Comparative Classifier Evaluation for Web-scale
"... In the context of web-scale taxonomies such as the Mozilla and Yahoo! directories, previous work has shown the existence of a power-law distribution in the size of the categories at every level of the taxonomy. In this work, we analyse how such high-level semantics can be leveraged to eval ..."
... to evaluate the accuracy of hierarchical classifiers, which automatically assign unseen documents to leaf-level categories in the taxonomy. The commonly used evaluation method, which relies on k-fold cross-validation, suffers from computational challenges for such large-scale taxonomies. The proposed technique ...
Automatic Measurement of Quality Metrics for
"... Colonoscopy is the accepted screening method for detection of colorectal cancer or its precursor lesions, colorectal polyps. Indeed, colonoscopy has contributed to a decline in the number of colorectal cancer-related deaths. However, not all cancers or large polyps are detected at the time of colono ..."
... method allows future quality control in the day-to-day medical practice setting on a large scale. In addition, our method can be adapted to other healthcare procedures. Last but not least, our method may be useful to assess progress during colonoscopy training, or as part of endoscopic skills assessment ...
Toward a Predictive Statistical Model of Task-based Performance Using Automatic MT Evaluation Metrics
"... Can automatic intrinsic metrics predict when an MT engine is “good enough” to support people performing sense-making tasks, such as the extraction of who, when, and where information from translated documents? This paper reports on the challenges and results of a large-scale, task-based experiment ..."