Document Categorization and Query Generation on the World Wide Web Using WebACE
- AI Review
, 1999
Cited by 71 (25 self)
We present WebACE, an agent for exploring and categorizing documents on the World Wide Web based on a user profile. The heart of the agent is an unsupervised categorization of a set of documents, combined with a process for generating new queries that is used to search for new related documents and for filtering the resulting documents to extract the ones most closely related to the starting set. The document categories are not given a priori. We present the overall architecture and describe two novel algorithms which provide significant improvement over traditional clustering algorithms and form the basis for the query generation and search component of the agent. We report on the results of our experiments comparing these new algorithms with more traditional clustering algorithms and we show that our algorithms are fast and scalable.
On the performance of bisecting K-means and PDDP
- Proceedings of the First SIAM International Conference on Data Mining (ICDM-2001)
, 2001
Cited by 13 (4 self)
The problem this paper focuses on is the unsupervised clustering of a data-set. The data-p×
“Nonparametric identification of finite mixture models of dynamic discrete choices,” Queen’s University Working Paper, forthcoming in Econometrica. Available at http://www.econ.queensu.ca/faculty/shimotsu/papers/initialEM.pdf
- Journal of Political Economy
, 2008
Cited by 12 (3 self)
The copyright to this Article is held by the Econometric Society. It may be downloaded, printed and reproduced only for educational or research purposes, including use in course packs. No downloading or copying may be done for any commercial purpose without the explicit permission of the Econometric Society. For such commercial purposes contact the Office of the Econometric Society (contact information may be found at the website http://www.econometricsociety.org or in the back cover of Econometrica). This statement must be included on all copies of this Article that are made available electronically or in any other
Cluster Selection in Divisive Clustering Algorithms
- SIAM International Conference on Data Mining
, 2002
Cited by 9 (1 self)
The problem this paper focuses on is the classical problem of unsupervised clustering of a data-set. In particular, the bisecting divisive clustering approach is considered here. This approach consists in recursively splitting a cluster into two sub-clusters, starting from the main data-set. This is one of the more basic and common problems in fields like pattern
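The recursive splitting described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's method: the names `two_means` and `bisecting_clustering` are hypothetical, the split uses plain 2-means, and the rule for choosing which cluster to split next (always the largest one) is an assumption made here for simplicity.

```python
import random

def two_means(points, iters=20, seed=0):
    """Split a list of point tuples into two groups with plain 2-means."""
    rng = random.Random(seed)
    centers = rng.sample(points, 2)  # two starting centers drawn from the data
    groups = ([], [])
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[0 if d[0] <= d[1] else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller handles it
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) for g in groups]
    return groups

def bisecting_clustering(points, k):
    """Start from the whole data set and bisect until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        # Selection rule (an assumption of this sketch): split the largest cluster.
        clusters.sort(key=len)
        target = clusters.pop()
        if len(target) < 2:
            clusters.append(target)
            break
        left, right = two_means(target)
        if not left or not right:
            clusters.append(target)
            break
        clusters += [left, right]
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for cluster in bisecting_clustering(points, 2):
    print(sorted(cluster))
```

Other selection rules (for example, splitting the cluster with the largest scatter) fit the same loop by changing only the sort key.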
Hierarchical Taxonomies using Divisive Partitioning
, 1998
Cited by 8 (3 self)
We propose an unsupervised divisive partitioning algorithm for document data sets which enjoys many favorable properties. In particular, the algorithm shows excellent scalability to large data collections and produces high-quality clusters which are competitive with other clustering methods. The algorithm yields information on the significant and distinctive words within each cluster, and these words can be inserted into the naturally occurring hierarchical structure produced by the algorithm. The result is an automatically generated hierarchical topical taxonomy of a document set. In this paper, we show how the algorithm's cost scales up linearly with the size of the data, illustrate experimentally the quality of the clusters produced, and show how the algorithm can produce a hierarchical topical taxonomy.
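One common way to perform a single divisive split on document vectors is to project the mean-centered documents onto their leading principal direction and divide them by the sign of the projection. The sketch below assumes that approach; the abstract does not spell out the splitting rule, so treating it this way is an assumption, and the function names and the toy term-count rows are hypothetical. The principal direction is approximated with power iteration to keep the example dependency-free.

```python
def principal_direction(rows, iters=100):
    """Return (mean, v): the column means and the leading principal direction."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero starting vector
    for _ in range(iters):
        # One power-iteration step on the scatter matrix C = A^T A (A = centered rows).
        proj = [sum(c[j] * v[j] for j in range(d)) for c in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    return mean, v

def split_by_principal_direction(rows):
    """Divide rows into two groups by the sign of their principal projection."""
    mean, v = principal_direction(rows)
    left, right = [], []
    for r in rows:
        score = sum((r[j] - mean[j]) * v[j] for j in range(len(v)))
        (left if score <= 0 else right).append(r)
    return left, right

# Toy term-count rows (hypothetical): two documents per topic.
docs = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
left, right = split_by_principal_direction(docs)
print(left, right)
```

Applying the same split recursively to each resulting group yields a binary tree of clusters, which is the kind of hierarchical structure the abstract refers to.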
Nonparametric Identification and Estimation of Finite Mixture Models of Dynamic Discrete Choices
, 2006
Cited by 3 (2 self)
In dynamic discrete choice analysis, controlling for unobserved heterogeneity is an important issue, and finite mixture models provide flexible ways to account for unobserved heterogeneity. This paper studies nonparametric identifiability of type probabilities and type-specific component distributions in finite mixture models of dynamic discrete choices. We derive sufficient conditions for nonparametric identification for various finite mixture models of dynamic discrete choices used in applied work. Three elements emerge as the important determinants of identification: the time-dimension of panel data, the number of values the covariates can take, and the heterogeneity of the response of different types to changes in the covariates. For example, in a simple case, a time-dimension of T = 3 is sufficient for identification, provided that the number of values the covariates can take is no smaller than the number of types, and that the changes in the covariates induce sufficiently heterogeneous variations in the choice probabilities across types. Type-specific components are identifiable even when state dependence is present as long as the panel has a moderate time-dimension (T ≥ 6). We also develop a series logit estimator for finite mixture models of dynamic discrete choices and derive its convergence rate.
Nonparametric Identification and Estimation of Multivariate Mixtures
, 2008
Cited by 3 (0 self)
This article analyzes the identifiability of k-variate, M-component finite mixture models without making parametric assumptions on the component distributions. We consider the identifiability of both the number of components and the component distributions. Under the assumption of conditionally independent marginals that have been used in the existing literature, we reveal an important link between the number of variables (k), the number of values each variable can take, and the number of identifiable components. The number of components (M) is nonparametrically identifiable if k ≥ 2 and each element of the variables takes at least M different values. The mixing proportions and the component distributions are nonparametrically identified if k ≥ 3 and each element of the variables takes at least M different values. Our requirement on k substantially improves the existing work, which requires either k ≥ 2M − 1 or k ≥ 6M log M. The number of components is identified by the rank of a matrix constructed from the distribution function of the data. Exploiting this property, we propose a procedure to nonparametrically estimate the number of components.
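The rank argument can be illustrated at the population level with a small sketch. It assumes k = 2 discrete variables that are independent within each component, so the joint probability matrix is a sum of M rank-one terms; the function names and the probabilities are hypothetical, and the paper's own matrix (built from the distribution function of the data) and its estimation procedure are not reproduced here.

```python
from fractions import Fraction as F

def joint_matrix(weights, p1, p2):
    """P[a][b] = sum_m weights[m] * p1[m][a] * p2[m][b] (conditional independence)."""
    return [[sum(w * r[a] * c[b] for w, r, c in zip(weights, p1, p2))
             for b in range(len(p2[0]))]
            for a in range(len(p1[0]))]

def rank(matrix):
    """Exact matrix rank by Gaussian elimination over Fractions."""
    m = [row[:] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

# Hypothetical two-component mixture; each variable takes 3 values (>= M = 2).
weights = [F(1, 2), F(1, 2)]
p1 = [[F(1, 2), F(1, 4), F(1, 4)], [F(1, 4), F(1, 4), F(1, 2)]]
p2 = [[F(3, 5), F(1, 5), F(1, 5)], [F(1, 5), F(1, 5), F(3, 5)]]

P = joint_matrix(weights, p1, p2)
print(rank(P))  # rank of the 3x3 joint probability matrix
```

Because the two component distributions of each variable are linearly independent, the 3×3 joint matrix has rank 2, matching the number of components. With sample data the matrix is estimated, so an exact rank computation like this one would have to be replaced by a statistical rank test.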
Contents lists available at ScienceDirect Linear Algebra and its Applications
A concise proof of Kruskal’s theorem on tensor
The Effect Restoration from Measurement Bias in Causal Inference
This paper highlights several areas where graphical techniques can be harnessed to address the problem of measurement errors in causal inference. In particular, the paper discusses the control of partially observable confounders in parametric and nonparametric models and the computational problem of obtaining bias-free effect estimates in such models.

