Adverse Drug Reactions: A Retrospective
Spontaneous Reporting System Modelling for Data Mining Methods Evaluation in Pharmacovigilance
Abstract
Pharmacovigilance aims at detecting adverse effects of marketed drugs. It is based on the spontaneous reporting of events that are suspected to be adverse effects of drugs. The Spontaneous Reporting System (SRS) supplies huge databases that pharmacovigilance experts cannot exhaustively exploit without data mining tools. Data mining methods have been proposed in the literature, but there is no consensus on the applicability and efficiency of any of them, mainly because the methods are difficult to evaluate on real data. In this context, the aim of this paper is to propose a model of the SRS in order to simulate realistic data that would make it possible to complete the evaluation and comparison of the methods, with a view to helping define surveillance strategies. Because the status of the drug-event relations is known in the simulated dataset, the signals generated by the data mining methods can be labelled as "true" or "false". The spontaneous reporting process is viewed as a Poisson process depending on the drugs' exposure frequency, on the delay since the drugs' launch, on the adverse events' background incidence and seriousness, and on a reporting probability. This reporting probability, quantitatively unknown, is derived from the qualitative knowledge found in the literature and expressed by experts. This knowledge is represented and exploited by means of a fuzzy characterisation of variables and a set of fuzzy rules. Simulated data are described and two Bayesian data mining methods are applied to illustrate the kind of information on method performance that can be derived from the SRS modelling and from the data simulation.
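The simulation idea in this abstract can be illustrated with a minimal sketch. It is not the authors' model: the multiplicative form of the Poisson rate, the field names, and the numbers in `EXAMPLE_PAIRS` are all assumptions made for illustration, and the fuzzy-rule derivation of the reporting probability is replaced by a fixed value. The sketch only shows why simulated data allow signals to be labelled true or false: the underlying relative risk is known for every drug-event pair.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method.

    Adequate for the small rates used in this sketch; slow for large lam.
    """
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_reports(pairs, seed=0):
    """Simulate spontaneous report counts for drug-event pairs.

    ASSUMPTION: the Poisson rate is the product of exposure, background
    incidence, relative risk and reporting probability. The paper's actual
    rate (including launch delay and seriousness) is not reproduced here.
    """
    rng = random.Random(seed)
    rows = []
    for p in pairs:
        rate = (p["exposed"] * p["background_incidence"]
                * p["relative_risk"] * p["reporting_probability"])
        rows.append({
            "drug": p["drug"],
            "event": p["event"],
            "reports": poisson_draw(rng, rate),
            # Known by construction, so any generated signal can be scored.
            "true_association": p["relative_risk"] > 1.0,
        })
    return rows


# Hypothetical inputs, invented for illustration only.
EXAMPLE_PAIRS = [
    {"drug": "drug_A", "event": "rash", "exposed": 50_000,
     "background_incidence": 1e-3, "relative_risk": 3.0,
     "reporting_probability": 0.05},
    {"drug": "drug_B", "event": "rash", "exposed": 50_000,
     "background_incidence": 1e-3, "relative_risk": 1.0,
     "reporting_probability": 0.05},
]
```

Calling `simulate_reports(EXAMPLE_PAIRS)` returns one row per pair with a simulated report count and the known association status, which is the ingredient needed to count true and false signals for any detection method.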
A Sequential Monte Carlo Method for Bayesian
- Journal of Knowledge Discovery and Data Mining, 2003
Abstract
Markov chain Monte Carlo (MCMC) techniques revolutionized statistical practice in the 1990s by providing an essential toolkit for making the rigor and flexibility of Bayesian analysis computationally practical. At the same time, the increasing prevalence of massive datasets and the expansion of the field of data mining have created the need for statistically sound methods that scale to these large problems. Except for the most trivial examples, current MCMC methods require a complete scan of the dataset for each iteration, which rules them out as feasible data mining techniques.
Significance Tests for Unsupervised Pattern Discovery in Large Continuous Multivariate Data Sets
- Richard J. Bolton
Abstract
In this paper we consider the question of uncertainty of discovered patterns in data mining. In particular, we develop statistical tests for flagged patterns found in continuous data, where such patterns are perhaps more familiar to statisticians as local modes in the data. We indicate the significance of these patterns in terms of the probability that they have occurred by chance. We examine the performance of these tests on patterns discovered in several large data sets, including a data set describing the locations of earthquakes in California and another describing flow cytometry measurements on phytoplankton.
Keywords: Data mining, pattern discovery, mode analysis, local structure, uncertainty, significance tests
Corresponding author. Telephone +44 (0)20 7589 5111 ext 58600; Fax +44 (0)20 7594 8517.
A Rigorous Statistical Approach for Identifying Significant Itemsets
Abstract
As advances in technology allow for the collection, storage, and mining of vast amounts of data, the task of screening and assessing the significance of the discovered patterns is becoming a major challenge in data mining applications. In this work, we address significance in the context of frequent itemset mining. Specifically, we develop a novel methodology to identify a meaningful support threshold s for a dataset, such that the family of frequent itemsets with respect to s embodies a substantial deviation from what would be expected in a random dataset; these itemsets can therefore be flagged as significant. Our methodology hinges on a Poisson approximation of the distribution of the number of frequent itemsets of a given size, which is the main theoretical result of the paper. A crucial feature of our approach is that, unlike previous work, it takes into account the entire dataset rather than individual discoveries. It is therefore able to distinguish between significant observations and random fluctuations in the data, resulting in fewer false discoveries. Extensive experiments are reported that substantiate the effectiveness of our methodology.
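The comparison against "a random dataset" described in this abstract can be made concrete with a small sketch. It does not implement the paper's Poisson approximation. As a stand-in, it estimates by plain Monte Carlo how many item pairs would reach support s if every item were placed independently with its observed frequency. The function names, the restriction to pairs, and the independence null model are assumptions chosen for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def count_frequent_pairs(transactions, s):
    """Count item pairs that occur together in at least s transactions."""
    support = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            support[pair] += 1
    return sum(1 for c in support.values() if c >= s)


def random_like(transactions, rng):
    """Build a random dataset with the same size and item frequencies.

    ASSUMPTION: each item appears in each transaction independently,
    with probability equal to its observed relative frequency.
    """
    n = len(transactions)
    freq = Counter(item for t in transactions for item in t)
    return [
        {item for item, c in freq.items() if rng.random() < c / n}
        for _ in range(n)
    ]


def expected_frequent_pairs(transactions, s, trials=200, seed=0):
    """Monte Carlo estimate of the number of frequent pairs under the null."""
    rng = random.Random(seed)
    total = sum(
        count_frequent_pairs(random_like(transactions, rng), s)
        for _ in range(trials)
    )
    return total / trials
```

If the observed count from `count_frequent_pairs` is far above the estimate from `expected_frequent_pairs` at a given s, the frequent pairs at that threshold are unlikely to be random fluctuations. This judges the whole family of itemsets at once rather than each discovery separately, which is the feature the abstract emphasizes.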
unknown title
Abstract
Dependency derivation is the search for combinations of variables (or states of variables) in a database that co-occur unexpectedly often. In Bayesian dependency derivation, indications are ranked primarily by their estimated strengths, but an adjustment is made to account for uncertainty when data is scarce. This reduces the risk of highlighting spurious associations. This report presents refined methods for IC analysis, one method for Bayesian dependency derivation. The disproportionality measure in IC analysis is the Information Component (IC) [BLE+98]. It relates the observed joint frequency of two particular states of two different variables to the frequency expected under the assumption of independence. In the current implementation of IC analysis, estimates for the lower 95% credibility interval limit are derived based on a normal approximation to the posterior IC distribution [OLBL00]. In this report, the validity of these approximations is examined through Monte Carlo simulation. Monte Carlo simulation is also proposed and used as a general tool to study the IC distribution.
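A minimal sketch of the two ideas in this abstract follows: the IC as a base-2 logarithm of the observed joint frequency over the frequency expected under independence, and Monte Carlo sampling of a posterior IC distribution to read off a lower credibility limit. The prior is an assumption: a symmetric Dirichlet over the four cells of the 2x2 table with a hypothetical `prior` weight of 0.5 per cell, which is not necessarily the prior used in the report. The function names are likewise illustrative.

```python
import math
import random


def ic_point_estimate(n11, n10, n01, n00):
    """Plug-in IC: log2 of observed joint frequency over the expected one."""
    n = n11 + n10 + n01 + n00
    p_xy = n11 / n
    p_x = (n11 + n10) / n
    p_y = (n11 + n01) / n
    return math.log2(p_xy / (p_x * p_y))


def ic_posterior_samples(n11, n10, n01, n00, draws=5000, prior=0.5, seed=0):
    """Sample the IC from a Dirichlet posterior over the four table cells.

    ASSUMPTION: symmetric Dirichlet prior with weight `prior` per cell.
    Dirichlet draws are built from normalised gamma variates.
    """
    rng = random.Random(seed)
    alphas = [n11 + prior, n10 + prior, n01 + prior, n00 + prior]
    samples = []
    for _ in range(draws):
        g = [rng.gammavariate(a, 1.0) for a in alphas]
        total = sum(g)
        p11, p10, p01 = g[0] / total, g[1] / total, g[2] / total
        samples.append(math.log2(p11 / ((p11 + p10) * (p11 + p01))))
    return samples


def lower_limit(samples, level=0.025):
    """Empirical lower quantile, here the lower end of a 95% interval."""
    ordered = sorted(samples)
    return ordered[int(level * len(ordered))]
```

With the same proportions but ten times fewer observations, the simulated lower limit drops well below the point estimate. This is the adjustment for scarce data that the abstract describes, obtained here by simulation rather than by a normal approximation.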

