## Classification in Very High Dimensional Problems with Handfuls of Examples

### Download Links

- [www.ri.cmu.edu]
- [www.cs.cmu.edu]
- [www-cgi.cs.cmu.edu]
- DBLP

### BibTeX

```bibtex
@MISC{Palatucci_classificationin,
  author = {Mark Palatucci and Tom M. Mitchell},
  title  = {Classification in Very High Dimensional Problems with Handfuls of Examples},
  year   = {}
}
```

### Abstract

Modern classification techniques perform well when the number of training examples exceeds the number of features. If, however, the number of features greatly exceeds the number of training examples, these same techniques can fail. To address this problem, we present a hierarchical Bayesian framework that shares information between features by modeling similarities between their parameters. We believe this approach is applicable to many sparse, high dimensional problems and especially relevant to those with both spatial and temporal components. One such problem is fMRI time series, and we present a case study showing how we can successfully classify in this domain with 80,000 original features and only 2 training examples per class.

### Citations

1457 | Bayesian Data Analysis
- Gelman, Carlin, et al.
- 1995
Citation Context: ...of time. 1.2 Related Work Hierarchical Bayesian methods have been used for quite some time within the statistics community for combining data from similar experiments. A good introduction is given in [6]. In general, these methods fall under “shrinkage” estimators. If we want to estimate several quantities that we believe are related, then in the absence of large sample sizes for the individual quant...
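The shrinkage idea described in this context can be illustrated with a toy sketch (the data, the number of groups, and the fixed shrinkage weight `lam` are all invented for illustration, not taken from the paper): each group's noisy sample mean is pulled toward the grand mean, which typically lowers total squared error when per-group sample sizes are small.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic setup: 30 related quantities, each observed through a very
# noisy sample mean (noise std 2.0 dwarfs the spread of the true means).
true_means = rng.normal(0.0, 1.0, size=30)
sample_means = true_means + rng.normal(0.0, 2.0, size=30)

# Shrink each noisy estimate toward the grand mean with a fixed weight.
grand_mean = sample_means.mean()
lam = 0.7  # arbitrary shrinkage strength, chosen for illustration
shrunk = lam * grand_mean + (1 - lam) * sample_means

# Compare mean squared error of the raw vs. shrunken estimates.
mse_raw = np.mean((sample_means - true_means) ** 2)
mse_shrunk = np.mean((shrunk - true_means) ** 2)
```

With noise this large relative to the spread of the true means, the shrunken estimates have much lower MSE than the raw sample means, which is the motivation for shrinkage cited here.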

497 | Multitask learning
- Caruana
- 1997
Citation Context: ...ed, this can provide a better estimate of the individual means. Shrinkage estimators are very similar in spirit to multi-task learning algorithms within the machine learning community. With multi-task [3] or “lifelong” learning, the goal is to leverage “related” information from similar tasks to help the current learning task [14]. The overall goal in both these communities is to learn concepts with f...

204 | On bias, variance, 0/1 - loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery 1:55–77
- Friedman
- 1997
Citation Context: ...x (CALC). however, that sharing the variance parameter plays the larger role in improving overall classification accuracy. While this might seem surprising at first, some interesting theoretical work [5] shows that in the bias/variance decomposition under 0/1 loss, the variance dominates the bias. This may suggest why sharing the variance parameters caused the larger increase in performance. 4 Conclu...

142 | Learning to decode cognitive states from brain images
- Mitchell, Hutchinson, et al.
Citation Context: ...t it is possible to classify cognitive states from fMRI data. For example, researchers have been able to determine the category of words that a person is reading (e.g. fruits, buildings, tools, etc.) [10] by analyzing fMR images of their neural activity. Others have shown that it is possible to classify between drug addicted persons and non-drug using controls [15]. One study even used fMRI data to cl...

92 | Exploiting task relatedness for multiple task learning
- Ben-David, Schuller
- 2003
Citation Context: ...h fewer data. A good example of using hierarchical Bayes in a multi-task learning application is [7]. There has also been some interesting theoretical work to explain why these methods are beneficial [2, 1]. Hierarchical Bayesian methods have been applied successfully within the fMRI domain to the task of multiple subject classification. [13] demonstrates a hierarchical model that improves classificatio...

77 | A Bayesian/information theoretical model of learning to learn via multi-task sampling
- Baxter
- 1997
Citation Context: ...h fewer data. A good example of using hierarchical Bayes in a multi-task learning application is [7]. There has also been some interesting theoretical work to explain why these methods are beneficial [2, 1]. Hierarchical Bayesian methods have been applied successfully within the fMRI domain to the task of multiple subject classification. [13] demonstrates a hierarchical model that improves classificatio...

43 | Classifying spatial patterns of brain activity with machine learning methods: Application to lie detection
- Davatzikos, Ruparel, et al.
- 2005
Citation Context: ...ve shown that it is possible to classify between drug addicted persons and non-drug using controls [15]. One study even used fMRI data to classify whether participants were lying or telling the truth [4]. Classification in this domain is tricky. The data are very high dimensional and noisy, and training examples are sparse. A typical experiment takes a 3D volumetric image of the brain every second. E...

25 | Bayesian Statistics
- Lee
- 1989
Citation Context: ...ld then require simulation to calculate the posterior for θj. Another, more tractable approach is to estimate the hyperparameters directly from the data. This technique is often called empirical Bayes [9] and uses point estimates for the hyperparameters:

$$\hat{\mu} = \frac{1}{J}\sum_{j=1}^{J} \bar{X}_{\bullet j}, \qquad \hat{\tau}^2 = \frac{1}{J}\sum_{j=1}^{J} \left(\bar{X}_{\bullet j} - \hat{\mu}\right)^2$$

Here we are just taking the sample mean and variance for all the individual sample means. We use a...
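The empirical Bayes point estimates described in this context are just the sample mean and variance of the group-level sample means. A minimal sketch (the data here are synthetic and the variable names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup: J groups, each contributing a sample mean X̄_j of n draws.
J, n = 50, 10
group_means = rng.normal(loc=2.0, scale=1.5, size=J)               # latent θ_j
data = rng.normal(loc=group_means[:, None], scale=1.0, size=(J, n))
xbar = data.mean(axis=1)                                           # X̄_j per group

# Empirical Bayes point estimates of the hyperparameters (µ, τ²):
mu_hat = xbar.mean()                       # µ̂  = (1/J) Σ_j X̄_j
tau2_hat = np.mean((xbar - mu_hat) ** 2)   # τ̂² = (1/J) Σ_j (X̄_j − µ̂)²
```

These point estimates then take the place of the hyperparameters, avoiding the simulation the context says a full posterior would require.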

22 | Bayesian network learning with parameter constraints
- Niculescu, Mitchell, et al.
Citation Context: ...data from multiple subjects within the same study. Our model, by contrast, focuses on sharing information between features of a single subject. The most similar work to ours within the fMRI domain is [11, 12]. This work demonstrates that sharing parameters between voxels can lead to more accurate models of the fMRI data. Our work, by comparison, does not directly couple the parameters of shared features, b...

16 | Solving a huge number of similar tasks: a combination of multi-task learning and hierarchical Bayesian modeling
- Heskes
- 1998
Citation Context: ...elp the current learning task [14]. The overall goal in both these communities is to learn concepts with fewer data. A good example of using hierarchical Bayes in a multi-task learning application is [7]. There has also been some interesting theoretical work to explain why these methods are beneficial [2, 1]. Hierarchical Bayesian methods have been applied successfully within the fMRI domain to the t...

9 | Hidden process models
- Hutchinson, Mitchell, et al.
- 2006
Citation Context: ...$\hat{\beta}_{k \to j}\,\bar{X}_{\bullet kt}$. This allows us to find estimates $\hat{\beta}_{k \to j}$ using the usual method of least squares:

$$\hat{\beta}_{k \to j} = \min_{\beta} \sum_{t=1}^{T} \left(\bar{X}_{\bullet jt} - \beta\,\bar{X}_{\bullet kt}\right)^2$$

which is given by:

$$\hat{\beta}_{k \to j} = \frac{\sum_{t=1}^{T} \bar{X}_{\bullet jt}\,\bar{X}_{\bullet kt}}{\sum_{t=1}^{T} \bar{X}_{\bullet kt}^2} \qquad (8)$$

Now that we have our sharing groups and parameter transformation functions we can define a hierarchical model for fMRI:

$$X_{ivt} \mid \theta_{vt} \sim N(\theta_{vt}, \sigma^2), \qquad \theta_{vt} \sim N(\mu_{vt}, \tau^2_{vt})$$

Combining all these equations to...
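The closed-form coefficient in equation (8) is an ordinary least-squares fit with no intercept. A minimal sketch with synthetic voxel time courses (the 0.8 scale factor, noise level, and series length are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic mean time courses for two voxels k and j over T time points,
# where voxel j's course is roughly a scaled copy of voxel k's.
T = 200
xbar_k = rng.normal(size=T)                         # X̄_{•kt}
xbar_j = 0.8 * xbar_k + 0.05 * rng.normal(size=T)   # X̄_{•jt}

# Equation (8): β̂_{k→j} = Σ_t X̄_{•jt} X̄_{•kt} / Σ_t X̄_{•kt}²
beta_hat = np.dot(xbar_j, xbar_k) / np.dot(xbar_k, xbar_k)
```

With enough time points and small noise, `beta_hat` recovers the underlying scale factor, which is what makes it a usable transformation between shared voxels' parameters.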

9 | Exploiting Parameter Domain Knowledge for Learning in Bayesian Networks
- Niculescu
- 2005
Citation Context: ...data from multiple subjects within the same study. Our model, by contrast, focuses on sharing information between features of a single subject. The most similar work to ours within the fMRI domain is [11, 12]. This work demonstrates that sharing parameters between voxels can lead to more accurate models of the fMRI data. Our work, by comparison, does not directly couple the parameters of shared features, b...

7 | Learning To Learn: Introduction
- Thrun
- 1996
Citation Context: ...earning algorithms within the machine learning community. With multi-task [3] or “lifelong” learning, the goal is to leverage “related” information from similar tasks to help the current learning task [14]. The overall goal in both these communities is to learn concepts with fewer data. A good example of using hierarchical Bayes in a multi-task learning application is [7]. There has also been some inte...

6 | Exploring temporal information in functional magnetic resonance imaging brain data
- Zhang, Samaras, et al.
- 2005
Citation Context: ...e.g. fruits, buildings, tools, etc.) [10] by analyzing fMR images of their neural activity. Others have shown that it is possible to classify between drug addicted persons and non-drug using controls [15]. One study even used fMRI data to classify whether participants were lying or telling the truth [4]. Classification in this domain is tricky. The data are very high dimensional and noisy, and trainin...

1 | Hierarchical gaussian naive bayes classifier for multiple-subject fmri data
- Rustandi
- 2006
Citation Context: ...heoretical work to explain why these methods are beneficial [2, 1]. Hierarchical Bayesian methods have been applied successfully within the fMRI domain to the task of multiple subject classification. [13] demonstrates a hierarchical model that improves classification of a single human subject by combining data from multiple subjects within the same study. Our model, by contrast, focuses on sharing inf...