## Alignment by Maximization of Mutual Information (1995)

Citations: 757 (13 self)

### BibTeX

@MISC{Viola95alignmentby,
    author = {Paul A. Viola},
    title = {Alignment by Maximization of Mutual Information},
    year = {1995}
}

### Citations

8564 |
Elements of Information Theory
- Cover, Thomas
- 2006
Citation Context ... maximum likelihood sense with respect to samples drawn from the random variables. This approach is equivalent to minimizing the cross entropy of the estimated distribution with the true distribution [8]. Such techniques for density estimation are described in [9]. For simplicity, we assume that the covariance matrices are diagonal, $\psi = \mathrm{DIAG}(\sigma_1^2, \sigma_2^2, \ldots)$ (11). We formulate the likelihood... |

3921 |
Pattern Classification and Scene Analysis
- Duda, Hart
- 1973
Citation Context ... $\exp\left(-\tfrac{1}{2} x^{T} \psi^{-1} x\right)$. This method of density estimation is widely known as the Parzen window method. A good discussion of density estimation can be found in the textbook by Duda and Hart [2]. Next we approximate statistical expectation with the sample average over another set of samples $B$ drawn from $z$, $E_z(f(z)) \approx \frac{1}{N_B} \sum_{z_i \in B} f(z_i)$. We may now express an approximation for the entropy... |
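
The excerpt above describes two steps: a Parzen window density estimate built from Gaussian kernels, and a sample average over a second sample B that stands in for the expectation. A minimal one-dimensional sketch in Python follows. The function names, the scalar variance `var`, and the final entropy expression (the negative average log density, which the truncated excerpt does not show) are assumptions made for illustration, not code from the thesis.

```python
import math


def gaussian(x, var):
    """One-dimensional zero-mean Gaussian density with variance `var`."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def parzen_density(z, sample_a, var):
    """Parzen window estimate of p(z): average of kernels centred on sample A."""
    return sum(gaussian(z - a, var) for a in sample_a) / len(sample_a)


def entropy_estimate(sample_a, sample_b, var):
    """Approximate entropy as minus the sample average of log p over sample B."""
    log_densities = [math.log(parzen_density(b, sample_a, var)) for b in sample_b]
    return -sum(log_densities) / len(sample_b)
```

Using two separate samples keeps the density estimate (from A) independent of the points at which it is evaluated (from B), which mirrors the "another set of samples B" wording in the excerpt.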

3629 |
Neural Networks: A Comprehensive Foundation
- Haykin
- 1994
Citation Context ...enetrate small local minima. Such local minima are often characteristic of continuous alignment schemes, and we too have found that local minima can be overcome in this manner. The textbook by Haykin [5] discusses the use of such algorithms by the neural network community. An excellent discussion of stochastic approximation appears in the textbook by Ljung and Soderstrom [6]. Simulated annealing [7] ... |

3531 | Optimization by simulated annealing
- Kirkpatrick, Gelatt Jr., Vecchi
- 1983
Citation Context ... [5] discusses the use of such algorithms by the neural network community. An excellent discussion of stochastic approximation appears in the textbook by Ljung and Soderstrom [6]. Simulated annealing [7] is another method that has been used in high-dimensional optimization problems having numerous local minima. It too is a descent technique where noise is explicitly added to the current solution. In ... |

3098 | A computational approach to edge detection
- Canny
- 1986
Citation Context ...ottom. These 10 images demonstrate that even for a simple Lambertian surface, image variation can be significant. Below this we show the output of a Canny edge detector run on the 10 different images [10]. The variation between the different edge images of the same object is quite striking. Our model consists of 7500 three dimensional points that are uniformly distributed on the surface of the 3 bumps... |

2650 |
Introduction to Statistical Pattern Recognition
- Fukunaga
- 1990
Citation Context ...the random variables. This approach is equivalent to minimizing the cross entropy of the estimated distribution with the true distribution [8]. Such techniques for density estimation are described in [9]. For simplicity, we assume that the covariance matrices are diagonal, $\psi = \mathrm{DIAG}(\sigma_1^2, \sigma_2^2, \ldots)$ (11). We formulate the likelihood (with respect to variance) of a sample $B$ drawn from $z$: $L(\psi)$ ... |

1403 |
Robot Vision
- Horn
- 1986
Citation Context ... [Table 1: Curved Surface Alignment Data] ... represent $T$, the imaging transformation from model to image coordinates, as a double quaternion followed by a perspective operation [11]. We use a dot product metric for the normals where the component density has $\sigma = 0.7$. We use $\sigma = 0.5$ for the image intensities for both the joint and marginal distributions. The size of the random ... |

1070 | An Information-Maximization Approach to Blind Separation and Blind Deconvolution
- Bell, Sejnowski
- 1995
Citation Context ...t they wish to extract (i.e. they train their disparity detectors on random dot stereograms). Finally, Bell has used a measure of information to separate signals that have been linearly mixed together [24]. His technique assumes that the different mixed signals carry little mutual information. While he does not assume that the distribution has a particular functional form, he does assume that the distr... |

941 |
Face recognition using eigenfaces
- Turk, Pentland
- 1991
Citation Context ...with changes in lighting and pose. Turk and Pentland have used a large collection of face images to train a system to construct representations that are invariant to some changes in lighting and pose [20]. These representations are a projection onto the largest eigenvectors of the distribution of images within the collection. Their system addresses the problem of recognition rather than alignment, and... |

533 |
Adaptive switching circuits
- Widrow, Hoff
- 1960
Citation Context ...ochastic Search Techniques Non-linear stochastic gradient descent is commonly used in the neural network literature, where it is often called the LMS rule. It was introduced there by Widrow and Hoff (Widrow and Hoff, 1960) and has been used extensively with good results. Since a stochastic estimate for the gradient of error is much cheaper to compute than a true estimate of the gradient, for many real problems LMS is ... |
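
The LMS rule mentioned in this excerpt can be sketched as a stochastic gradient step on squared error, taken from one randomly drawn sample at a time. The version below is a hedged illustration for a linear model only; the function name `lms_fit`, the learning rate, the step count, and the seed are all assumptions chosen for the example and do not come from the thesis.

```python
import random


def lms_fit(samples, n_features, lr=0.05, steps=200, seed=0):
    """Fit linear weights with the LMS (Widrow-Hoff) update.

    `samples` is a list of (x, y) pairs, where x is a tuple of features.
    Each step uses a single random sample as a cheap, noisy gradient estimate.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    for _ in range(steps):
        x, y = rng.choice(samples)
        # Prediction error on the single drawn sample.
        err = y - sum(wi * xi for wi, xi in zip(w, x))
        # Move the weights along the negative gradient of the squared error.
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w
```

For example, `lms_fit([((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 6.0)], 1)` returns a single weight close to 2, because every sample satisfies y = 2x.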

449 |
Perceptual organization and visual recognition
- Lowe
- 1999
Citation Context ...excellent survey articles: [17][18]). A smooth, optimizable version of this metric can be defined by introducing a penalty both for unmatched edges and for the distance between those that are matched [19] [15]. This metric can then be used both for image/model comparison and for pose refinement. Edge based metrics can work under a variety of different lighting conditions, but they make two very str... |

374 |
Theory and practice of recursive identification
- Ljung, Soderstrom
- 1983
Citation Context ...r. The textbook by Haykin [5] discusses the use of such algorithms by the neural network community. An excellent discussion of stochastic approximation appears in the textbook by Ljung and Soderstrom [6]. Simulated annealing [7] is another method that has been used in high-dimensional optimization problems having numerous local minima. It too is a descent technique where noise is explicitly added to ... |

343 | Multimodal volume registration by maximization of mutual information - Wells, Viola, et al. - 1996 |

246 |
Hierarchical chamfer matching: A parametric edge matching algorithm
- Borgefors
- 1988
Citation Context ...pproximation of this analogy is due to the dissimilarity between max and softmax. Equation 12 is essentially the measure used in chamfer matching techniques, such as the method described by Borgefors [13]. Huttenlocher [14] has used a related measure in feature matching applications, the Hausdorff distance, which uses maximum instead of the sum that appears in Equation 12. The similarity between geome... |

222 | Three-dimensional object recognition - Besl, Jain - 1985 |

192 |
From basic network principles to neural architecture: emergence of spatial-opponent cells
- Linsker
- 1986
Citation Context ...either binomial or Gaussian. This both simplifies and limits such approaches. Linsker has used the concept of information maximization to motivate a theory of development in the primary visual cortex [22]. He has been able to predict the development of receptive fields that are very reminiscent of the ones found in primate visual cortex. He uses a Gaussian model both for the signal and the noise. Beck... |

160 | Model-based recognition in robot vision
- Chin, Dyer
- 1986
Citation Context ...es that represent models and images by collections of edges and define a distance metric between them that is proportional to the number of edges that coincide (see the excellent survey articles: [17][18]). A smooth, optimizable version of this metric can be defined by introducing a penalty both for unmatched edges and for the distance between those that are matched [19] [15]. This metric can then be ... |

98 |
Training stochastic model recognition algorithms as networks can lead to maximum mutual information estimates of parameters
- Bridle
- 1989
Citation Context ... $D_\psi(z) \equiv z^{T} \psi^{-1} z$. Thus, $W_z(z_i, z_j)$ is an indicator of the degree of match between its arguments, in a "soft" sense. It is equivalent to using the "softmax" function of neural networks [3] on the negative of the Mahalanobis distance to indicate correspondence between $z_i$ and elements of $A$. Equation 7 may also be expressed as $\frac{d}{dT} H(z(T)) \approx \frac{1}{N_B} \sum_{z_i \in B} \sum_{z_j \in A} W_z(z_i, z_j) \frac{d}{dT}\, 1 \ldots$ |
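
The "soft" correspondence weight described in this excerpt can be illustrated with a softmax over negated Mahalanobis distances. The sketch below is one-dimensional, with a scalar variance `var` in place of a covariance matrix. The function name and the factor of one half in the score (chosen so the weights match normalised Gaussian kernels) are assumptions for illustration, since the excerpt does not show the full definition of the weight.

```python
import math


def softmax_weights(z_i, sample_a, var):
    """Soft match of z_i to each element of sample A.

    Scores are minus half the squared Mahalanobis distance (scalar case),
    passed through a numerically stable softmax so the weights sum to one.
    """
    scores = [-0.5 * (z_i - a) ** 2 / var for a in sample_a]
    top = max(scores)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

As `var` shrinks, the weights approach a hard assignment to the nearest element of A; as it grows, they approach a uniform average over A.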

89 |
The upper envelope of Voronoi surfaces and its applications. Discrete Comput. Geom
- Huttenlocher, Kedem, et al.
- 1993
Citation Context ...s analogy is due to the dissimilarity between max and softmax. Equation 12 is essentially the measure used in chamfer matching techniques, such as the method described by Borgefors [13]. Huttenlocher [14] has used a related measure in feature matching applications, the Hausdorff distance, which uses maximum instead of the sum that appears in Equation 12. The similarity between geometrical matching and... |
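
The contrast drawn in this excerpt, a sum over nearest-point distances for chamfer-style matching versus a maximum for the Hausdorff distance, can be sketched as follows. Equation 12 itself is not shown in the excerpt, so the plain (unsquared) Euclidean distance, the directed form of the Hausdorff distance, and all function names here are assumptions for illustration.

```python
import math


def nearest_distance(p, points):
    """Euclidean distance from point p to its nearest neighbour in `points`."""
    return min(math.dist(p, q) for q in points)


def chamfer_sum(model, image):
    """Chamfer-style measure: sum of nearest-neighbour distances."""
    return sum(nearest_distance(p, image) for p in model)


def directed_hausdorff(model, image):
    """Directed Hausdorff distance: the largest nearest-neighbour distance."""
    return max(nearest_distance(p, image) for p in model)
```

The sum rewards a good fit on average, while the maximum is determined entirely by the single worst-matched model point.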

73 | Geometry and Photometry in 3D Visual Recognition
- Shashua
- 1992
Citation Context ...rk to the problem of pose refinement. On a related note Shashua has shown that all of the images, under different lighting, of a Lambertian surface are a linear combination of any three of the images [21]. This bears a clear relation to the work of Pentland in that the eigenvectors of a set of images of the same object should span this three dimensional space. Entropy is playing an ever increasing role wi... |

50 | 3-D multi-modality medical image registration using feature space clustering. In: Proc 1st int conf on computer vision, virtual reality and robotics in medicine - Collignon, Vandermeulen, et al. - 1995 |

46 |
Voxel similarity measures for automated image registration
- Hill, Studholme, et al.
- 1994
Citation Context ...so works well in domains having surface property discontinuities and silhouette information (see Section 3.3). Alignment by extremizing properties of the joint signal has been used by Hill and Hawkes [12] to align MRI, CT, and other medical image modalities. They use third order moments to characterize the clustering of the joint data. We believe that joint information is perhaps a more direct measure... |

30 | Statistical Object Recognition - Wells - 1992 |

23 | Geometry and Photometry - Shashua - 1992 |

5 | Posterior Marginal Pose Estimation - Wells - 1992 |

5 | Statistical Gain Correction and Segmentation of MRI Data - Wells, Grimson, et al. - 1994 |

5 |
Learning to make coherent predictions in domains with discontinuities
- Becker, Hinton
- 1992
Citation Context ...e noise. Becker and Hinton have used the maximization of mutual information as a framework for learning different low-level processing algorithms such as disparity estimation and curvature estimation [23]. They assume that the signals whose mutual information is to be maximized are Gaussian. In addition, they assume that the only joint information between images is the information that they wish to ex... |