Information Extraction: Beyond Document Retrieval
- COMPUTATIONAL LINGUISTICS AND CHINESE LANGUAGE PROCESSING
, 1998
Cited by 48 (10 self)
In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE), whose function is to extract information about a pre-specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960's and 70's till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.
Adaptive Performance Prediction for Distributed Data-Intensive Applications
, 1999
Cited by 34 (3 self)
The computational grid is becoming the platform of choice for large-scale distributed data-intensive applications. Accurately predicting the transfer times of remote data files, a fundamental component of such applications, is critical to achieving application performance. In this paper, we introduce a performance prediction method, ARM (Adaptive Regression Modeling), to determine data transfer times for network-bound distributed data-intensive applications. We demonstrate the effectiveness of the ARM method on two distributed data applications, SARA (Synthetic Aperture Radar Atlas) and SRB (Storage Resource Broker), and discuss how it can be used for application scheduling. Our experiments demonstrate that applying the ARM method to these applications predicted data transfer times in wide-area multi-user grid environments with accuracy of 88% or better.
Text Mining: Generating Hypotheses from MEDLINE
- Journal of the American Society for Information Science and Technology
Cited by 34 (2 self)
Hypothesis generation, a crucial initial step for making scientific discoveries, relies on prior knowledge, experience and intuition. Chance connections made between seemingly distinct subareas sometimes turn out to be fruitful. The goal in text mining is to assist in this process by automatically discovering a small set of interesting hypotheses from a suitable text collection.
Data Mining
- TO APPEAR IN THE HANDBOOK OF TECHNOLOGY MANAGEMENT, H. BIDGOLI (ED.)
, 2010
Cited by 26 (1 self)
The amount of data being generated and stored is growing exponentially, due in large part to the continuing advances in computer technology. This presents tremendous opportunities for those who can
Variable selection in data mining: Building a predictive model for bankruptcy
- Journal of the American Statistical Association
, 2004
Cited by 24 (7 self)
We predict the onset of personal bankruptcy using least squares regression. Although well publicized, only 2,244 bankruptcies occur in our data set of 2.9 million months of credit-card activity. We use stepwise selection to find predictors from a mix of payment history, debt load, demographics, and their interactions. This combination of rare responses and over 67,000 possible predictors leads to a challenging modeling question: How does one separate coincidental from useful predictors? We show that three modifications turn stepwise regression into an effective methodology for predicting bankruptcy. Our version of stepwise regression (1) organizes calculations to accommodate interactions, (2) exploits modern decision theoretic criteria to choose predictors, and (3) conservatively estimates p-values to handle sparse data and a binary response. Omitting any one of these leads to poor performance. A final step in our procedure calibrates regression predictions. With these modifications, stepwise regression predicts bankruptcy as well as, if not better than, recently developed data-mining tools. When sorted, the largest 14,000 resulting predictions hold 1000 of the 1800 bankruptcies hidden in a validation sample of 2.3 million observations. If the cost of missing a bankruptcy is 200 times that of a false positive, our predictions incur less than 2/3 of the costs of classification errors produced by the tree-based classifier C4.5. Key Phrases: AIC, Cp, Bonferroni, calibration, hard thresholding, risk inflation criterion (RIC),
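The validation figures quoted in this abstract can be turned into the usual ranking metrics with a few lines of arithmetic. The sketch below uses only the four numbers given in the abstract; the variable names are illustrative, and the cost line rests on an assumption the abstract does not state, namely that the 14,000 cutoff is itself used as the classification threshold.

```python
# Figures quoted in the abstract.
flagged = 14_000          # largest predictions examined
hits = 1_000              # bankruptcies found among the flagged cases
total_positives = 1_800   # bankruptcies in the validation sample
sample_size = 2_300_000   # validation observations

precision = hits / flagged                 # share of flagged cases that are bankruptcies
recall = hits / total_positives            # share of all bankruptcies that were flagged
base_rate = total_positives / sample_size  # bankruptcy rate in the whole sample
lift = precision / base_rate               # improvement over picking cases at random

# ASSUMPTION (not stated in the abstract): the 14,000 cutoff is the decision
# threshold, so every flagged non-bankruptcy is a false positive and every
# unflagged bankruptcy is a miss. Costs are in units of one false positive.
false_positives = flagged - hits
missed = total_positives - hits
cost = false_positives * 1 + missed * 200

print(f"precision={precision:.3f} recall={recall:.3f} lift={lift:.1f} cost={cost}")
```

Under that assumed threshold, missed bankruptcies dominate the total cost, which is why the 200-to-1 cost ratio matters so much when comparing classifiers on such rare events.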
Text mining and ontologies in biomedicine: making sense of raw text
- Briefings in Bioinformatics
, 2005
Cited by 24 (0 self)
The volume of biomedical literature is increasing at such a rate that it is becoming difficult to locate, retrieve and manage the reported information without text mining, which aims to automatically distill information, extract facts, discover implicit links and generate hypotheses relevant to user needs. Ontologies, as conceptual models, provide the necessary framework for semantic representation of textual information. The principal link between text and an ontology is terminology, which maps terms to domain-specific concepts. In this article, we summarize different approaches in which ontologies have been used for text mining applications in biomedicine.
Noise-Tolerant Windowing
- JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH
, 1998
Cited by 23 (4 self)
Windowing has been proposed as a procedure for efficient memory use in the ID3 decision tree learning algorithm. However, it was shown that it may often lead to a decrease in performance, in particular in noisy domains. Following up on previous work, where we have demonstrated that the ability of rule learning algorithms to learn rules independently can be exploited for more efficient windowing procedures, we demonstrate in this paper how this property can be exploited to achieve noise-tolerance in windowing.
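For readers unfamiliar with the term, basic windowing can be sketched as follows: train on a small subset (the window), test on the remaining examples, move some misclassified examples into the window, and repeat until none remain. This is a minimal sketch of the basic procedure only, not the noise-tolerant variant the paper proposes. The function names, the `max_add` cap, and the one-feature threshold learner standing in for ID3 are all assumptions made for illustration.

```python
import random

def learn_stump(examples):
    """Hypothetical stand-in learner: best single threshold on one numeric feature."""
    xs = sorted({x for x, _ in examples})
    candidates = [xs[0] - 1] + xs  # one threshold below all values, then each value
    best = None
    for t in candidates:
        for positive_above in (True, False):
            errors = sum(((x > t) == positive_above) != y for x, y in examples)
            if best is None or errors < best[0]:
                best = (errors, t, positive_above)
    _, t, positive_above = best
    return lambda x: (x > t) == positive_above

def windowing(examples, learn, init_size, max_add, seed=0):
    """Basic windowing loop; init_size must be at least 1."""
    rng = random.Random(seed)
    pool = list(examples)
    rng.shuffle(pool)
    window, rest = pool[:init_size], pool[init_size:]
    while True:
        model = learn(window)
        wrong = [ex for ex in rest if model(ex[0]) != ex[1]]
        if not wrong:
            return model, window
        right = [ex for ex in rest if model(ex[0]) == ex[1]]
        # Move up to max_add misclassified examples into the window.
        window.extend(wrong[:max_add])
        rest = wrong[max_add:] + right

# Noise-free toy data: the label is True exactly when x >= 50.
examples = [(x, x >= 50) for x in range(100)]
model, window = windowing(examples, learn_stump, init_size=10, max_add=5)
print(len(window), all(model(x) == y for x, y in examples))
```

The loop also makes the weakness in noisy domains visible: any example the current model misclassifies, including a mislabeled one, is eventually pulled into the window.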
Iterate: A conceptual clustering algorithm for data mining
- IEEE TRANSACTIONS ON SYSTEMS, MAN AND CYBERNETICS
, 1998
Cited by 17 (0 self)
The data exploration task can be divided into three interrelated subtasks: (i) feature selection, (ii) discovery, and (iii) interpretation. This paper describes an unsupervised discovery method with biases geared toward partitioning objects into clusters that improve interpretability. The algorithm, ITERATE, employs: (i) a data ordering scheme and (ii) an iterative redistribution operator to produce maximally cohesive and distinct clusters. Cohesion or intra-class similarity is measured in terms of the match between individual objects and their assigned cluster prototype. Distinctness or inter-class dissimilarity is measured by an average of the variance of the distribution match between clusters. We demonstrate that interpretability, from a problem solving viewpoint, is addressed by the intra- and inter-class measures. Empirical results demonstrate the properties of the discovery algorithm, and its applications to problem solving.
Machine Learning in Games: A Survey
- MACHINES THAT LEARN TO PLAY GAMES, CHAPTER 2
, 2000
Cited by 16 (3 self)
This paper provides a survey of previously published work on machine learning in game playing. The material is organized around a variety of problems that typically arise in game playing and that can be solved with machine learning methods. This approach, we believe, allows both researchers in game playing to find appropriate learning techniques for helping to solve their problems and machine learning researchers to identify rewarding topics for further research in game-playing domains. The paper covers learning techniques that range from neural networks to decision tree learning in games that range from poker to chess.
Using artificial intelligence planning to automate science data analysis for large image database
- In Proc. 1997 Conference on Knowledge Discovery and Data Mining
, 1997
Cited by 15 (1 self)
This paper describes the use of AI planning techniques to represent scientific, image processing, and software tool knowledge to automate knowledge discovery and data mining (e.g., science data analysis) of large image databases. In particular, we describe two fielded systems. The first, the Multimission VICAR Planner (MVP), has been deployed for 2 years and is currently supporting science product generation for the Galileo mission. MVP has reduced time to fill certain classes of requests from 4 hours to 15 minutes. The second, the Automated SAR Image Processing system (ASIP), is currently in use by the Dept. of Geology at ASU, supporting aeolian science analysis of synthetic aperture radar images. ASIP reduces the number of manual inputs in science product generation by 10-fold.

