## Learnable Similarity Functions and Their Applications to Clustering and Record Linkage (2004)

### Download Links

- [www.cs.utexas.edu]
- DBLP

Citations: 7 (0 self-citations)

### BibTeX

@MISC{Bilenko04learnablesimilarity,

author = {Mikhail Bilenko},

title = {Learnable Similarity Functions and Their Applications to Clustering and Record Linkage},

year = {2004}

}

### Abstract

…rship (Xing et al. 2003), and relative comparisons (Schultz & Joachims 2004). These approaches have improved on traditional similarity functions for several data types, such as vectors in Euclidean space, strings, and database records composed of multiple text fields. While these initial results are encouraging, a large number of similarity functions still cannot adapt to a particular domain. In our research, we attempt to bridge this gap by developing both new learnable similarity functions and methods for applying them to particular problems in machine learning and data mining. In preliminary work, we proposed two learnable similarity functions for strings that adapt their distance computations given training pairs of equivalent and non-equivalent strings (Bilenko & Mooney 2003a). The first function is based on a probabilistic model of edit distance with affine gaps (Gus…

Copyright © 2004, American Association for Artificial Intelligence
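For readers unfamiliar with affine gaps, the sketch below shows a plain, fixed-cost edit distance with affine gap penalties, computed with the standard three-matrix (Gotoh-style) dynamic program. It is not the learnable probabilistic model described above: the parameter names `sub`, `gap_open`, and `gap_extend` and their default values are illustrative assumptions. In a learnable variant, costs like these would be adapted from training pairs rather than fixed by hand.

```python
import math


def affine_gap_distance(s: str, t: str, sub: float = 1.0,
                        gap_open: float = 1.0, gap_extend: float = 0.5) -> float:
    """Edit distance with affine gap costs (fixed, hand-set costs; assumed defaults).

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    Matching characters cost 0; a mismatch costs `sub`.
    """
    n, m = len(s), len(t)
    inf = math.inf
    # M[i][j]: best cost aligning s[:i] with t[:j], ending in a match/mismatch.
    # X[i][j]: same, but ending with s[i-1] aligned against a gap (deletion).
    # Y[i][j]: same, but ending with t[j-1] aligned against a gap (insertion).
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    # Leading gaps: one opening followed by extensions.
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0.0 if s[i - 1] == t[j - 1] else sub
            M[i][j] = cost + min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Extending an open gap is cheaper than opening a new one.
            X[i][j] = min(M[i - 1][j] + gap_open,
                          X[i - 1][j] + gap_extend,
                          Y[i - 1][j] + gap_open)
            Y[i][j] = min(M[i][j - 1] + gap_open,
                          Y[i][j - 1] + gap_extend,
                          X[i][j - 1] + gap_open)
    return min(M[n][m], X[n][m], Y[n][m])
```

With these assumed costs, `affine_gap_distance("abcdef", "abf")` treats "cde" as one gap and returns 2.0, whereas unit-cost Levenshtein distance would charge 3 for the same three deletions. This illustrates why affine gaps suit strings such as abbreviated names or truncated fields, where one long gap is more plausible than several scattered edits.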