Model based document classification and clustering. Manuscript in preparation, 2001
Cited by 5 (2 self)
WORK IN PROGRESS, DO NOT QUOTE. In this paper we develop a complete methodology for document classification and clustering. We start by investigating how the choice of document features, such as weights, transformations, and dimensionality reduction, influences the performance of document classification. We then use these findings to construct a model-based document clustering (MBDC) algorithm suitable for document collections. This method explicitly models the data as being drawn from a Gaussian mixture. The mixture model is used to construct clusters based on the likelihood of the data and to classify documents according to the Bayes rule. One main advantage of our approach is the ability to automatically select the number of clusters present in the document collection via Bayes factors. Our experiments with the Topic Detection and Tracking corpus demonstrate the ability of MBDC to choose a sensible number of clusters as well as meaningful partitions of the data.
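As a rough illustration of selecting the number of clusters, the sketch below compares Gaussian mixture models by BIC, a common large-sample approximation to Bayes-factor comparison. Whether the manuscript uses BIC is an assumption; the function names and the log-likelihood values in the usage note are hypothetical, made-up numbers for illustration only.

```python
import math


def gmm_param_count(k, d):
    """Free parameters of a k-component, d-dimensional Gaussian mixture
    with full covariance matrices: weights + means + covariances."""
    return (k - 1) + k * d + k * d * (d + 1) // 2


def bic(log_likelihood, k, d, n):
    """Bayesian information criterion (lower is better)."""
    return -2.0 * log_likelihood + gmm_param_count(k, d) * math.log(n)


def select_num_clusters(log_likelihoods, d, n):
    """Pick k with the lowest BIC.

    log_likelihoods maps k -> maximized log-likelihood of the fitted
    k-component mixture (the fitting itself, e.g. by EM, is not shown).
    """
    return min(log_likelihoods, key=lambda k: bic(log_likelihoods[k], k, d, n))
```

For example, with illustrative values `{1: -400.0, 2: -350.0, 3: -345.0}` for 100 two-dimensional points, `select_num_clusters` picks two components: the small likelihood gain from a third component does not offset its extra parameters.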
Some Studies of Expectation Maximization Clustering Algorithm to Enhance Performance
Cited by 1 (1 self)
Abstract: Expectation Maximization (EM) is an efficient mixture-model-based clustering method. In this paper, the authors attempt to scale up the algorithm by reducing the computation time required for the quadratic term, without sacrificing accuracy. EM requires calculating the probability density function (pdf), which involves evaluating a quadratic term. Three recursive approaches are introduced for computing the quadratic term. As per our observation, standard EM needs O(d^2) computations for the quadratic term, where d is the number of dimensions. The proposed recursive EM approaches have a time complexity of O(d^2/2) for the quadratic term computation.
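The abstract does not describe the three recursive approaches, so the sketch below is not the authors' method. It only illustrates, under the assumption that the quadratic term has the form y^T A y with A symmetric (as an inverse covariance matrix is), how visiting just the upper triangle cuts the number of matrix entries read from d^2 to roughly d^2/2. The function names are hypothetical.

```python
def quadratic_full(y, a):
    """Evaluate y^T A y by visiting all d*d entries of A."""
    d = len(y)
    return sum(y[i] * a[i][j] * y[j] for i in range(d) for j in range(d))


def quadratic_symmetric(y, a):
    """Evaluate y^T A y for symmetric A using only the diagonal and the
    upper triangle, i.e. d*(d+1)/2 entries instead of d*d."""
    d = len(y)
    total = 0.0
    for i in range(d):
        # Diagonal contribution a_ii * y_i^2.
        total += a[i][i] * y[i] * y[i]
        # Each off-diagonal pair (i, j) with j > i counts twice by symmetry.
        off_diagonal = 0.0
        for j in range(i + 1, d):
            off_diagonal += a[i][j] * y[j]
        total += 2.0 * y[i] * off_diagonal
    return total
```

Both functions return the same value for a symmetric matrix; the second simply skips the lower triangle.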
Analysis of Breast Feeding Data Using Data Mining Methods. Proc. Fifth Australasian Data Mining Conference (AusDM2006)
The purpose of this study is to demonstrate the benefit of using common data mining techniques on survey data where statistical analysis is routinely applied. The statistical survey is commonly used to collect quantitative information about an item in a population. Statistical analysis is usually carried out on survey data to test hypotheses. We report in this paper an application of data mining methodologies to data from a breast feeding survey that was conducted and analysed by statisticians. The purpose of the research is to study the factors that lead to the decision whether or not to breast feed a newborn baby. Various data mining methods are applied to the data. Feature (variable) selection is conducted to select the most discriminative and least redundant features using an information-theory-based method and a statistical approach. Decision tree and regression approaches are tested on classification tasks using the selected features. A risk pattern mining method is also applied to identify groups with a high risk of not breast feeding. The success of data mining in this study suggests that data mining approaches will be applicable to other similar survey data. The data mining methods, which enable a search for hypotheses, may be used as a survey data analysis tool complementary to traditional statistical analysis.
An Advanced Concept-Based Mining Model to Enrich Text Clustering
Abstract: Text mining techniques are based on the statistical analysis of a term, either a word or a phrase. Statistical analysis of term frequency captures the importance of the term within a document only. However, two terms can have the same frequency in their documents while one term contributes more to the meaning of its sentences than the other. A new concept-based mining model that analyzes terms on the sentence, document, and corpus levels is introduced. The concept-based mining model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. The similarity between documents is calculated using a new concept-based similarity measure. The proposed similarity measure takes full advantage of the concept analysis measures on the sentence, document, and corpus levels in calculating the similarity between documents. The experiments provide an extensive comparison between the concept-based analysis and the traditional analysis. Experimental results demonstrate a substantial enhancement of clustering quality using the sentence-based, document-based, corpus-based, and combined-approach concept analysis.
A Consistent Web Documents Based Text Clustering Using Concept Based Mining Model
Text mining is a growing, innovative field that endeavors to collect significant information from natural language text. It may be loosely characterized as the process of examining texts to extract information that is useful for particular purposes. In this case, the mining model can capture terms that identify the concepts of the sentence or document, which tends to reveal the subject of the document. In existing work, the concept-based mining model is used only for clustering normal text documents: it clusters the text parts of the documents and efficiently discovers noteworthy matching concepts among documents, according to the semantics of the sentences. The downside is that the existing work cannot be applied to web document clustering, and its text classification of the documents is unreliable. To make the text clustering more consistent, in our work we plan to present Conceptual Rule Mining On Text clusters to evaluate the more related and influential sentences contributing to the document topic. In this paper, the conceptual text clustering extends to web documents, which contain various markup language formats associated with the documents (term extraction mode). Based on markup types such as presentational, procedural, and descriptive markup, the web document's text clustering is done efficiently using the concept-based mining model. Experiments are conducted with web documents extracted from research repositories to evaluate the efficiency of the proposed consistent web document's text clustering using conceptual rule mining with an existing An
Stock Trend Analysis and Trading Strategy
This paper outlines a data mining approach to analysis and prediction of the trend of stock prices. The approach consists of three steps, namely partitioning, analysis and prediction. A modification of the commonly used k-means clustering algorithm is used to partition stock price time series data. After data partition, linear regression is used to analyse the trend within each cluster. The results of the linear regression are then used for trend prediction for windowed time series data. The approach is efficient and effective at predicting forward trends of stock prices. Using our trend prediction methodology, we propose a trading strategy TTP (Trading based on Trend Prediction). Some preliminary results of applying TTP to stock trading are reported.
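The partition-then-regress idea can be sketched as follows. This is a minimal illustration using plain k-means on fixed-size windows and an ordinary least-squares slope per cluster centre; the paper's modified k-means, its windowing scheme, and the TTP strategy are not specified in the abstract, so every name and choice here (window size, initialisation from the first k windows) is an assumption.

```python
def make_windows(series, size):
    """Split a price series into non-overlapping windows of equal length."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, size)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; centres start at the first k points (an assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centre for each window.
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def trend_slope(values):
    """Least-squares slope of values against time steps 0, 1, 2, ..."""
    n = len(values)
    mean_t = (n - 1) / 2
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in enumerate(values))
    denominator = sum((t - mean_t) ** 2 for t in range(n))
    return numerator / denominator
```

A positive slope for a cluster centre would mark that cluster as an upward trend and a negative slope as a downward trend; a new window could then be assigned to its nearest centre to obtain a predicted trend.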
Efficient Algorithms for Extension of Mobile Backbone Networks. International Journal of Electronics and Computer Science Engineering, available online at www.ijecse.org, ISSN 2277-1956
Abstract: Network-wide communication is an essential criterion for all wireless sensor networks, in order to transmit the data collected from the environment to the base station (sink node) efficiently. Wide network coverage is provided by constructing Mobile Backbone Networks (MBN), which are dynamic networks. These networks have two types of nodes: mobile backbone nodes and regular nodes. The mobile backbone nodes have superior mobility and communication capability compared with regular nodes. All information needs to be routed through mobile backbone nodes to regular nodes. Communication between clusters is done through the backbone nodes so that transmission overhead is lower. In this paper, we mainly concentrate on throughput optimization and on assigning new regular nodes. First the throughput range is calculated for each cluster, and then data packets are transmitted in such a way that the calculated throughput range for each cluster is satisfied. The number of regular nodes that can be successfully assigned to mobile backbone nodes can be improved by adopting a network design formulation technique. In case of failure of a mobile backbone node, a new cluster head is elected using the high energy first (HEF) algorithm, in which the residual energy of the nodes is considered for the election. Keywords: MBN, HEF, Network design formulation.
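The election step can be pictured with a small sketch. It assumes that HEF simply picks the surviving node with the highest residual energy, which is all the abstract states; the function name, the dictionary representation, and the tie-breaking behaviour (first node encountered wins) are hypothetical.

```python
def elect_cluster_head(residual_energy, failed=()):
    """Return the id of the non-failed node with the highest residual energy.

    residual_energy maps node id -> remaining energy (any consistent unit).
    failed is a collection of node ids that cannot be elected.
    """
    candidates = {node: energy for node, energy in residual_energy.items()
                  if node not in failed}
    if not candidates:
        raise ValueError("no surviving node available for election")
    # max() keeps the first maximal node on ties (an arbitrary choice here).
    return max(candidates, key=candidates.get)
```

For instance, with energies `{"a": 0.4, "b": 0.9, "c": 0.7}` and node `"b"` failed, the function returns `"c"`.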
Concept Based Mining Model for Text Clustering
Abstract: The common techniques in text mining are based on the statistical analysis of a term, either a word or a phrase. Statistical analysis of term frequency captures the importance of the term within a document only. Two terms can have the same frequency in their documents while one term contributes more to the meaning of its sentences than the other. A new concept-based mining model that analyzes terms on the sentence, document, and corpus levels is introduced. The concept-based mining model can effectively discriminate between terms that are unimportant with respect to sentence semantics and terms that hold the concepts representing the sentence meaning. The proposed model consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure for calculating the similarity between documents.