A Probabilistic Analysis of the Rocchio Algorithm with TFIDF for Text Categorization (1997)

by Thorsten Joachims
Results 11 - 20 of 456

Improving Text Classification by Shrinkage in a Hierarchy of Classes

by Andrew McCallum, Ronald Rosenfeld, Tom Mitchell, Andrew Ng, 1998
Abstract - Cited by 289 (6 self)
When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples.

Trust management for the semantic web

by Matthew Richardson, Rakesh Agrawal, Pedro Domingos - In ISWC, 2003
Abstract - Cited by 271 (3 self)
Abstract. Though research on the Semantic Web has progressed at a steady pace, its promise has yet to be realized. One major difficulty is that, by its very nature, the Semantic Web is a large, uncensored system to which anyone may contribute. This raises the question of how much credence to give each source. We cannot expect each user to know the trustworthiness of each source, nor would we want to assign top-down or global credibility values due to the subjective nature of trust. We tackle this problem by employing a web of trust, in which each user provides personal trust values for a small number of other users. We compose these trusts to compute the trust a user should place in any other user in the network. A user is not assigned a single trust rank. Instead, different users may have different trust values for the same user. We define properties for combination functions which merge such trusts, and define a class of functions for which merging may be done locally while maintaining these properties. We give examples of specific functions and apply them to data from Epinions and our BibServ bibliography server. Experiments confirm that the methods are robust to noise, and do not put unreasonable expectations on users. We hope that these methods will help move the Semantic Web closer to fulfilling its promise. 1.
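The trust-composition scheme this abstract describes can be sketched concretely. The sketch below assumes one simple choice of combination functions (multiplication along a path for concatenation, maximum across alternative paths for aggregation); the paper itself defines and compares a broader class of such functions, and the graph data here is purely illustrative:

```python
import heapq

def inferred_trust(trust, source, target):
    """Best-path trust from source to target: multiply trust values along a
    path, take the maximum over paths. A Dijkstra-style search works because
    multiplying by weights in [0, 1] can never increase path trust."""
    best = {source: 1.0}
    heap = [(-1.0, source)]
    while heap:
        neg, u = heapq.heappop(heap)
        t = -neg
        if u == target:
            return t
        if t < best.get(u, 0.0):
            continue  # stale entry
        for v, w in trust.get(u, {}).items():
            cand = t * w
            if cand > best.get(v, 0.0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return 0.0

# Hypothetical web of trust: edge u -> v holds u's personal trust in v.
web = {"alice": {"bob": 0.9, "carol": 0.5},
       "bob":   {"dave": 0.8},
       "carol": {"dave": 0.9}}
t = inferred_trust(web, "alice", "dave")  # max(0.9 * 0.8, 0.5 * 0.9) = 0.72
```

Note that trust is naturally asymmetric here: different sources can arrive at different values for the same target, matching the abstract's point that a user is not assigned a single global rank.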

Citation Context

...mework as well. The analog of belief combination for the WWW is estimating the quality and relevance of web pages. Information retrieval methods based solely on the content of the page (such as TFIDF [20]) are useful, but are outperformed by methods that also involve the connectivity between pages [12][23][26]. Gil and Ratnaker [19] present an algorithm that involves a more complex, though qualitative...

Analyzing the Effectiveness and Applicability of Co-training

by Kamal Nigam, Rayid Ghani, 2000
Abstract - Cited by 263 (7 self)
Recently there has been significant interest in supervised learning algorithms that combine labeled and unlabeled data for text learning tasks. The co-training setting [1] applies to datasets that have a natural separation of their features into two disjoint sets. We demonstrate that when learning from labeled and unlabeled data, algorithms explicitly leveraging a natural independent split of the features outperform algorithms that do not. When a natural split does not exist, co-training algorithms that manufacture a feature split may outperform algorithms not using a split. These results help explain why co-training algorithms are both discriminative in nature and robust to the assumptions of their embedded classifiers.
Categories and Subject Descriptors: I.2.6 [Artificial Intelligence]: Learning; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval---Information Filtering
Keywords: co-training, expectation-maximization, learning with labeled and unlabeled...
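The co-training loop this abstract refers to can be sketched as follows. The sketch assumes two views per example and uses a deliberately toy per-view scorer (class centroids of word counts); the original setting uses real classifiers such as naive Bayes, and all names and data below are illustrative:

```python
from collections import Counter

def train_view(examples):
    """Toy per-view model: one word-count centroid per class (illustrative,
    standing in for a real classifier such as naive Bayes)."""
    cents = {}
    for view_words, label in examples:
        cents.setdefault(label, Counter()).update(view_words)
    return cents

def score(cents, view_words):
    """Return (best_label, confidence margin) by word overlap with centroids."""
    sims = {c: sum(cnt[w] for w in view_words) for c, cnt in cents.items()}
    ranked = sorted(sims, key=sims.get, reverse=True)
    margin = sims[ranked[0]] - (sims[ranked[1]] if len(ranked) > 1 else 0)
    return ranked[0], margin

def co_train(labeled, unlabeled, rounds=3, per_round=1):
    """Each view's model labels its most confident unlabeled examples and
    adds them to the shared training pool; repeat for a few rounds."""
    pool = list(unlabeled)
    train = list(labeled)  # items: ((view1_words, view2_words), label)
    for _ in range(rounds):
        if not pool:
            break
        for view in (0, 1):
            model = train_view([(x[view], y) for x, y in train])
            scored = sorted(((score(model, x[view]), x) for x in pool),
                            key=lambda s: -s[0][1])
            for (lab, _), x in scored[:per_round]:
                train.append((x, lab))
                pool.remove(x)
    return train

# Illustrative two-view data: (page words, inlink words) per example.
labeled = [((["soccer"], ["sports-link"]), "sport"),
           ((["election"], ["politics-link"]), "politics")]
unlabeled = [(["soccer", "goal"], ["sports-link"]),
             (["election", "vote"], ["politics-link"])]
final = co_train(labeled, unlabeled)
```

The key structural point from the abstract survives even in this toy form: each view's model only ever consults its own feature subset, so confident labels from one view act as fresh training signal for the other.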

Citation Context

...ent over EM. 5.1 The News 2x2 dataset To test this question, we create a semi-artificial dataset that has the independence properties we seek. We select four newsgroups from the 20 Newsgroups dataset [11]. We create a two-class problem with class-conditional independence by joining together randomly selected documents from each of the first two newsgroups to make positive examples, and joining togethe...

Learning to Construct Knowledge Bases from the World Wide Web

by Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew McCallum, Tom Mitchell, Kamal Nigam, Sean Slattery, 2000
Abstract - Cited by 242 (5 self)
The World Wide Web is a vast source of information accessible to computers, but understandable only to humans. The goal of the research described here is to automatically create a computer understandable knowledge base whose content mirrors that of the World Wide Web. Such a knowledge base would enable much more effective retrieval of Web information, and promote new uses of the Web to support knowledge-based inference and problem solving. Our approach is to develop a trainable information extraction system that takes two inputs. The first is an ontology that defines the classes (e.g., company, person, employee, product) and relations (e.g., employed_by, produced_by) of interest when creating the knowledge base. The second is a set of training data consisting of labeled regions of hypertext that represent instances of these classes and relations. Given these inputs, the system learns to extract information from other pages and hyperlinks on the Web. This article describes our general a...

Citation Context

... Pr(c, v_i) log( Pr(c, v_i) / (Pr(c) Pr(v_i)) ) (8) This feature selection method has been found to perform best among several alternatives [63], and has been used in many text classification studies [25, 29, 30, 45, 40]. 5.1.2. Experimental Evaluation We evaluate our method using the data sets and cross-validation methodology described in Section 4. On each iteration of the cross-validation run, we train a classifie...
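Equation (8) in the snippet above is the pointwise term of the mutual-information criterion for feature selection. A small sketch, assuming document-level presence counts and summing over both classes and presence/absence values (the toy corpus is illustrative):

```python
import math

def mutual_information(docs, labels, word):
    """I(C; W) = sum over classes c and presence values w of
    Pr(c, w) * log(Pr(c, w) / (Pr(c) Pr(w))), estimated from
    document-level presence counts."""
    n = len(docs)
    mi = 0.0
    for c in set(labels):
        p_c = sum(1 for l in labels if l == c) / n
        for present in (True, False):
            p_w = sum(1 for d in docs if (word in d) == present) / n
            joint = sum(1 for d, l in zip(docs, labels)
                        if l == c and (word in d) == present) / n
            if joint > 0:  # 0 * log(0) is taken as 0
                mi += joint * math.log(joint / (p_c * p_w))
    return mi

# Illustrative corpus: "ball" perfectly indicates the class,
# so its mutual information with the label is log 2 (in nats).
docs = [{"ball"}, {"ball"}, {"law"}, {"law"}]
labels = ["sport", "sport", "politics", "politics"]
mi = mutual_information(docs, labels, "ball")
```

Ranking vocabulary words by this score and keeping the top-k is the selection step the snippet describes.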

Learning to Classify Text from Labeled and Unlabeled Documents

by Kamal Nigam, Andrew McCallum, Tom Mitchell, 1998
Abstract - Cited by 188 (20 self)
. This paper shows that the accuracy of learned text classifiers can be improved by augmenting a small number of labeled training documents with a large pool of unlabeled documents. This is significant because in many important text classification problems obtaining classification labels is expensive, while large quantities of unlabeled documents are readily available. We present a theoretical argument showing that, under common assumptions, unlabeled data contain information about the target function. We then introduce an algorithm for learning from labeled and unlabeled text, based on the combination of Expectation-Maximization with a naive Bayes classifier. The algorithm first trains a classifier using the available labeled documents, and probabilistically labels the unlabeled documents. It then trains a new classifier using the labels for all the documents, and iterates. Experimental results, obtained using text from three different real-world tasks, show that the use of unlabeled...
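The procedure this abstract describes (train naive Bayes on the labeled documents, probabilistically label the unlabeled pool, retrain on everything, iterate) can be sketched as below. The multinomial model, the smoothing constant, and the toy data are assumptions of this sketch, not details taken from the paper:

```python
import math

def train_nb(docs, soft_labels, classes, vocab, alpha=1.0):
    """Multinomial naive Bayes that accepts fractional (soft) class weights."""
    prior = {c: alpha for c in classes}
    counts = {c: {w: alpha for w in vocab} for c in classes}
    for doc, soft in zip(docs, soft_labels):
        for c in classes:
            p = soft.get(c, 0.0)
            prior[c] += p
            for w in doc:
                counts[c][w] += p
    n = sum(prior.values())
    log_prior = {c: math.log(prior[c] / n) for c in classes}
    log_cond = {}
    for c in classes:
        z = sum(counts[c].values())
        log_cond[c] = {w: math.log(counts[c][w] / z) for w in vocab}
    return log_prior, log_cond

def posterior(doc, model, classes):
    """P(class | doc) under the trained model, normalized over classes."""
    log_prior, log_cond = model
    logp = {c: log_prior[c] + sum(log_cond[c][w] for w in doc) for c in classes}
    m = max(logp.values())
    unnorm = {c: math.exp(logp[c] - m) for c in classes}
    z = sum(unnorm.values())
    return {c: unnorm[c] / z for c in classes}

def em_nb(labeled, unlabeled, classes, vocab, iters=5):
    """Train on labeled docs, soft-label the unlabeled pool,
    retrain on everything with those soft labels, and iterate."""
    docs = [d for d, _ in labeled]
    soft = [{c: 1.0} for _, c in labeled]
    model = train_nb(docs, soft, classes, vocab)
    for _ in range(iters):
        u_soft = [posterior(d, model, classes) for d in unlabeled]
        model = train_nb(docs + list(unlabeled), soft + u_soft, classes, vocab)
    return model

# Hypothetical toy data: two labeled documents, two unlabeled.
vocab = {"ball", "game", "vote", "law"}
labeled = [(["ball", "game"], "sport"), (["vote", "law"], "politics")]
unlabeled = [["ball", "ball", "game"], ["law", "vote", "vote"]]
model = em_nb(labeled, unlabeled, ["sport", "politics"], vocab)
p = posterior(["game", "ball"], model, ["sport", "politics"])
```

The unlabeled documents sharpen the per-class word distributions even though no human ever labeled them, which is the effect the experiments in the paper quantify.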

Citation Context

...tion of text documents are all violated in practice, and yet empirically, naive Bayes does a good job of classifying text documents (Lewis & Ringuette 1994; Craven et al. 1998; Yang & Pedersen 1997; Joachims 1997). This paradox is explained by the fact that classification estimation is only a function of the sign (in binary cases) of the function estimation (Domingos & Pazzani 1997; Friedman 1997). Also note ...

Understanding inverse document frequency: On theoretical arguments for IDF

by Stephen Robertson - Journal of Documentation, 2004
Abstract - Cited by 168 (2 self)
The term weighting function known as IDF was proposed in 1972, and has since been extremely widely used, usually as part of a TF*IDF function. It is often described as a heuristic, and many papers have been written (some based on Shannon’s Information Theory) seeking to establish some theoretical basis for it. Some of these attempts are reviewed, and it is shown that the Information Theory approaches are problematic, but that there are good theoretical justifications of both IDF and TF*IDF in the traditional probabilistic model of information retrieval.
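The weighting function under discussion is commonly written idf(t) = log(N / df_t) and tfidf(t, d) = tf(t, d) * idf(t), where N is the number of documents and df_t the number containing term t; many variants exist (smoothed, normalized), and this sketch picks the plain form with an illustrative toy corpus:

```python
import math
from collections import Counter

def tfidf(corpus):
    """Per-document tf*idf weights with idf(t) = log(N / df_t)."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in corpus:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

# Illustrative toy corpus of tokenized documents.
docs = [["rocchio", "tfidf", "text"],
        ["tfidf", "retrieval"],
        ["text", "retrieval", "model"]]
w = tfidf(docs)  # "tfidf" occurs in 2 of 3 docs, so its idf is log(3/2)
```

Note the behavior the paper tries to justify theoretically: a term in every document gets idf log(N/N) = 0, while rarer terms weigh more.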

Citation Context

... example of an attempt to use information theory for a derivation of a TF*IDF weighting scheme – and we have already seen some problems with such a formulation (10). The approach taken here (Joachims, 1997) appeals not to information theory, but to naïve Bayes machine learning models, as used in the relevance weighting theory (Joachims’ task is not the usual ranked retrieval task, but a categorisation ...

A Simple Relational Classifier

by Sofus A. Macskassy, Foster Provost - Proceedings of the Second Workshop on Multi-Relational Data Mining (MRDM-2003) at KDD-2003, 2003
Abstract - Cited by 111 (12 self)
We analyze a Relational Neighbor (RN) classifier, a simple relational predictive model that predicts based only on the class labels of related neighbors, using no learning and no inherent attributes. We show that it performs surprisingly well by comparing it to more complex models such as Probabilistic Relational Models and Relational Probability Trees on three data sets from published work.
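A classifier that predicts from neighbors' class labels alone, with no learning and no attributes, reduces to a vote over the labeled neighborhood. A minimal unweighted sketch (the graph and labels are illustrative; the paper's version weights neighbors):

```python
from collections import Counter

def rn_predict(graph, labels, node):
    """Relational Neighbor sketch: majority class among the node's labeled
    neighbors; no learning, no node attributes."""
    votes = Counter(labels[n] for n in graph[node] if n in labels)
    return votes.most_common(1)[0][0] if votes else None

# Illustrative graph: node "a" links to three labeled neighbors.
graph = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
labels = {"b": "x", "c": "x", "d": "y"}
pred = rn_predict(graph, labels, "a")  # two "x" votes beat one "y" vote
```

Its appeal as a baseline is exactly this simplicity: any relational model that cannot beat a neighborhood vote is not exploiting much beyond link structure.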

Citation Context

...no learning may perform quite well. While results reported on the relational classifiers (PRMs, RPTs and RBCs) have been compared to non-relational baseline learners (e.g., the naive Bayes classifier [5, 15, 19] or C4.5 [26]), a simple relational classifier is an equally important, and perhaps a more appropriate, point of comparison. We analyze here the Relational Neighbor (RN) [25] classifier as such a simp...

Active + Semi-Supervised Learning = Robust Multi-View Learning

by Ion Muslea, Steven Minton, Craig A. Knoblock - Proceedings of ICML-02, 19th International Conference on Machine Learning, 2002
Abstract - Cited by 110 (7 self)
In a multi-view problem, the features of the domain can be partitioned into disjoint subsets (views) that are sufficient to learn the target concept.

Determining the semantic orientation of terms through gloss classification

by Andrea Esuli, Fabrizio Sebastiani - In Proc. CIKM-05, 2005
Abstract - Cited by 104 (4 self)
Sentiment classification is a recent subdiscipline of text classification which is concerned not with the topic a document is about, but with the opinion it expresses. It has a rich set of applications, ranging from tracking users’ opinions about products or about political candidates as expressed in online forums, to customer relationship management. Functional to the extraction of opinions from text is the determination of the orientation of “subjective” terms contained in text, i.e. the determination of whether a term that carries opinionated content has a positive or a negative connotation. In this paper we present a new method for determining the orientation of subjective terms. The method is based on the quantitative analysis of the glosses of such terms, i.e. the definitions that these terms are given in on-line dictionaries,

Citation Context

...g algorithms we have tested are the naive Bayesian learner using the multinomial model (NB), support vector machines using linear kernels, and the PrTFIDF probabilistic version of the Rocchio learner [8]. 5. RESULTS The various combinations of choices of seed set, expansion method (also considering the variable number of expansion steps), method for the creation of textual representations, ...

Text-learning and related intelligent agents: A survey.

by D. Mladenic - IEEE Intell. Syst., 1999
Abstract - Cited by 90 (2 self)
Abstract not found

Developed at and hosted by The College of Information Sciences and Technology

© 2007-2019 The Pennsylvania State University