A Semi-Supervised Active Learning Algorithm for Information Extraction from Textual Data.” Journal of the American Society for Information Science 107 document is a research report submitted to the U.S. Department of Justice. This report has not been publ (0)

by T Wu, W M Pottenger

Results 1 - 6 of 6

Regular expression learning for information extraction

by Yunyao Li, Rajasekar Krishnamurthy, Sriram Raghavan, Shivakumar Vaithyanathan, H. V. Jagadish - In EMNLP , 2008
Cited by 8 (5 self)
Regular expressions have served as the dominant workhorse of practical information extraction for several years. However, there has been little work on reducing the manual effort involved in building high-quality, complex regular expressions for information extraction tasks. In this paper, we propose ReLIE, a novel transformation-based algorithm for learning such complex regular expressions. We evaluate the performance of our algorithm on multiple datasets and compare it against the CRF algorithm. We show that ReLIE, in addition to being an order of magnitude faster, outperforms CRF under conditions of limited training data and cross-domain data. Finally, we show how the accuracy of CRF can be improved by using features extracted by ReLIE.

Distributed Higher Order Association Rule Mining Using Information Extracted from Textual Data

by Shenzhi Li - SIGKDD Explorations, Volume 7, Issue , 2005
Cited by 6 (5 self)
The burgeoning amount of textual data in distributed sources combined with the obstacles involved in creating and maintaining central repositories motivates the need for effective distributed information extraction and mining techniques. Recently, as the need to mine patterns across distributed databases has grown, Distributed Association Rule Mining (D-ARM) algorithms have been developed. These algorithms, however, assume that the databases are either horizontally or vertically distributed. In the special case of databases populated from information extracted from textual data, existing D-ARM algorithms cannot discover rules based on higher-order associations between items in distributed textual documents that are neither vertically nor horizontally distributed, but rather a hybrid of the two. In this article we present D-HOTM, a framework for Distributed Higher Order Text Mining. D-HOTM is a hybrid approach that combines information extraction and distributed data mining. We employ a novel information extraction technique to extract meaningful entities from unstructured text in a distributed environment. The information extracted is stored in local databases and a mapping function is applied to identify globally unique keys. Based on the extracted information, a novel distributed association rule mining algorithm is applied to discover higher-order associations between items (i.e., entities) in records fragmented across the distributed databases using the keys. Unlike existing algorithms, D-HOTM requires neither knowledge of a global schema nor that the distribution of data be horizontal or vertical. Evaluation methods are proposed to incorporate the performance of the mapping function into the traditional support metric used in ARM evaluation. An example application of the algorithm on distributed law enforcement data demonstrates the relevance of D-HOTM in the fight against terrorism. 
Keywords: Distributed data mining, distributed association rule mining, knowledge discovery, artificial intelligence, machine learning, data mining, association rule mining, text mining, evaluation, privacy-preserving, terrorism, law enforcement, criminal justice

Robust document image understanding technologies

by Henry S. Baird, Daniel Lopresti, Brian D. Davison, William M. Pottenger - In HDP ’04: Proceedings of the 1st ACM Workshop on Hardcopy Document Processing, pages 9–14, 2004
Cited by 5 (0 self)
No existing document image understanding technology, whether experimental or commercially available, can guarantee high accuracy across the full range of documents of interest to industrial and government agency users. Ideally, users should be able to search, access, examine, and navigate among document images as effectively as they can among encoded data files, using familiar interfaces and tools as fully as possible. We are investigating novel algorithms and software tools at the frontiers of document image analysis, information retrieval, text mining, and visualization that will assist in the full integration of such documents into collections of textual document images as well as “born digital” documents. Our approaches emphasize versatility first: that is, methods which work reliably across the broadest possible range of documents.

Link Analysis Survey

by William M. Pottenger, Ph.D., Xiaoning Yang, Stephen V. Zanias - NIJ Proposal Number , 2005
Cited by 1 (1 self)
The author(s) shown below used Federal funds provided by the U.S. Department of Justice and prepared the following final report:

Automated criminal link analysis based on domain knowledge

by Jennifer Schroeder, Jennifer Xu, Hsinchun Chen, Michael Chau - Journal of the American Society for Information Science and Technology, 2007
Cited by 1 (1 self)
Link (association) analysis has been used in the criminal justice domain to search large datasets for associations between crime entities in order to facilitate crime investigations. However, link analysis still faces many challenging problems, such as information overload, high search complexity, and heavy reliance on domain knowledge. To address these challenges, this article proposes several techniques for automated, effective, and efficient link analysis. These techniques include the co-occurrence analysis, the shortest path algorithm, and a heuristic approach to identifying associations and determining their importance. We developed a prototype system called CrimeLink Explorer based on the proposed techniques. Results of a user study with 10 crime investigators from the Tucson Police Department showed that our system could help subjects conduct link analysis more efficiently than traditional single-level link analysis tools. Moreover, subjects believed that association paths found based on the heuristic approach were more accurate than those found based solely on the co-occurrence analysis and that the automated link analysis system would be of great help in crime investigations.

for Information Extraction in Criminal Justice Information Systems”

by William M. Pottenger, Tianhao Wu
2.1.1. Context Pattern Discovery; 2.1.2. Active selection of true segments