## A maximum entropy approach to species distribution modeling (2004)

Venue: | In Proceedings of the Twenty-First International Conference on Machine Learning |

Citations: | 39 - 6 self |

### BibTeX

@INPROCEEDINGS{Phillips04amaximum,
    author = {Steven J. Phillips and Miroslav Dudík and Robert E. Schapire},
    title = {A maximum entropy approach to species distribution modeling},
    booktitle = {Proceedings of the Twenty-First International Conference on Machine Learning},
    year = {2004}
}

### Abstract

We study the problem of modeling species geographic distributions, a critical problem in conservation biology. We propose the use of maximum-entropy techniques for this problem, specifically, sequential-update algorithms that can handle a very large number of features. We describe experiments comparing maxent with a standard distribution-modeling tool, called GARP, on a dataset containing observation data for North American breeding birds. We also study how well maxent performs as a function of the number of training examples and training time, analyze the use of regularization to avoid overfitting when the number of examples is small, and explore the interpretability of models constructed using maxent.

### Citations

1084 | A maximum entropy approach to natural language processing
- Berger, Della Pietra, et al.
- 1996
Citation Context ...hine learning community. To address this problem, we propose the application of maximum-entropy (maxent) techniques which have been so effective in other domains, such as natural language processing (Berger et al., 1996). Briefly, in maxent, one is given a set of samples from a distribution over some space, as well as a set of features (real-valued functions) on this space. The idea of maxent is to estimate the targ... |

551 | Inducing features of random fields
- Della Pietra, Della Pietra, et al.
- 1997
Citation Context ...al variables (or functions thereof). See Figure 1 for an example. In Section 2, we describe the basics of maxent in greater detail. Iterative scaling and its variants (Darroch & Ratcliff, 1972; Della Pietra et al., 1997) are standard algorithms for computing the maximum entropy distribution. We use our own variant which iteratively updates the weights on... |

429 | Generalized iterative scaling for log-linear models
- Darroch, Ratcliff
- 1972
Citation Context ...the features are the environmental variables (or functions thereof). See Figure 1 for an example. In Section 2, we describe the basics of maxent in greater detail. Iterative scaling and its variants (Darroch & Ratcliff, 1972; Della Pietra et al., 1997) are standard algorithms for computing the maximum entropy distribution. We use our own variant which iteratively updates the weights on... |

230 | A comparison of algorithms for maximum entropy parameter estimation
- Malouf
- 2002
Citation Context ...thms for finding the maxent distribution, especially iterative scaling and its variants (Darroch & Ratcliff, 1972; Della Pietra et al., 1997) as well as the gradient and second-order descent methods (Malouf, 2002; Salakhutdinov et al., 2003). In this paper, we used a sequential-update algorithm that modifies one weight λj at a time, as explored by Collins, Schapire and Singer (2002) in a similar setting. We ... |

163 | Representing twentieth-century space-time climate variability. Part I: Development of a 1961–90 mean monthly terrestrial climatology
- New, Hulme, et al.
- 1999
Citation Context ...re cells, and are all included with the GARP distribution, available at http://www.lifemapper.org/desktopgarp. Some coverages are derived from weather station readings during the period 1961 to 1990 (New et al., 1999). Out of these we use annual precipitation, number of wet days, average daily temperature and temperature range. The remaining coverages are derived from a digital elevation model for North America, ... |

86 | A Survey of Smoothing Techniques for ME Models
- Chen, Rosenfeld
- 2000
Citation Context ...tor λ is bounded. (See (Dudík et al., 2004) for details.) In a Bayesian framework, Eq. (3) corresponds to a negative log posterior given a Laplace prior. Other priors studied for maxent are Gaussian (Chen & Rosenfeld, 2000) and exponential (Goodman, 2003). Laplace priors have been studied in the context of neural networks by Williams (1995). The regularized formulation can be solved using a simple modification of the a... |

68 | Concluding remarks
- Hutchinson
- 1957
Citation Context ...tical problem in conservation biology: to save a threatened species, one first needs to know where the species prefers to live, and what its requirements are for survival, i.e., its ecological niche (Hutchinson, 1957). The data available for this problem typically consists of a list of georeferenced occurrence localities, i.e., a set of geographic coordinates where the species has been observed. In addition, ther... |

60 | The GARP modelling system: problems and solutions to automated spatial prediction
- Stockwell, Peters
- 1999
Citation Context ..., generalized additive models, bioclimatic envelopes and more; see Elith (2002) for a comparison. From these, we selected the Genetic Algorithm for Ruleset Prediction (GARP) (Stockwell & Noble, 1992; Stockwell & Peters, 1999), because it has seen widespread recent use to study diverse topics such as global warming (Thomas et al., 2004), infectious diseases (Peterson & Shaw, 2003) and invasive species (Peterson & Robins, ... |

57 | Exponential priors for maximum entropy models
- Goodman
- 2004
Citation Context ...4) for details.) In a Bayesian framework, Eq. (3) corresponds to a negative log posterior given a Laplace prior. Other priors studied for maxent are Gaussian (Chen & Rosenfeld, 2000) and exponential (Goodman, 2003). Laplace priors have been studied in the context of neural networks by Williams (1995). The regularized formulation can be solved using a simple modification of the above algorithm. On each round, a... |

52 | Performance guarantees for regularized maximum entropy density estimation
- Dudík, Phillips, et al.
- 2004
Citation Context ...reases the possibility of overfitting, leading others to use feature selection for maxent (Berger et al., 1996). We instead use a regularization approach, introduced in a companion theoretical paper (Dudík et al., 2004), which allows one to prove bounds on the performance of maxent using finite data, even when the number of features is very large or even uncountably infinite. Here we investigate in detail the pract... |

50 | Algorithms for maximum-likelihood logistic regression
- Minka
- 2001
Citation Context ...teration: evaluate the log loss when λj is incremented by 2^i δ for i = 0, 1, ... in turn, and choose the last i before the log loss decreases. This is similar to line search methods described in (Minka, 2001). 2.2. Regularization The basic approach described above computes the maximum entropy distribution π̂ for which π̂[fj] = π̃[fj]. However, we do not expect π̃[fj] to be equal to π[fj] but only clo... |

44 | Evaluating predictive models of species’ distributions: criteria for selecting optimal models
- Anderson, Lew, et al.
- 2003
Citation Context ... showing how sensitive our results are to the choice of β. To reduce the variability inherent in GARP’s random search procedure, we made composite GARP predictions using the “best-subsets” procedure (Anderson et al., 2003), as was done in recent applications (Peterson et al., 2003; Raxworthy et al., 2004). We generated 100 binary models, using GARP version 1.1.3 with default parameter values, then eliminated models wi... |

41 | Extinction risk from climate change - Thomas, Cameron, et al. - 2004 |

28 | Sequential Conditional Generalized Iterative Scaling - Goodman |

28 | Lutzomyia vectors for cutaneous leishmaniasis in Southern Brazil: ecological niche models, predicted geographic distributions, and climate change effects
- Peterson, Shaw
- 2003
Citation Context ...iction (GARP) (Stockwell & Noble, 1992; Stockwell & Peters, 1999), because it has seen widespread recent use to study diverse topics such as global warming (Thomas et al., 2004), infectious diseases (Peterson & Shaw, 2003) and invasive species (Peterson & Robins, 2003); many further applications are cited in these references. GARP was also selected because it is one of the few methods available that does not require a... |

26 | Induction of sets of rules from animal distribution data: a robust and informative method of data analysis
- Stockwell, Noble
- 1992
Citation Context ...generalized linear models, generalized additive models, bioclimatic envelopes and more; see Elith (2002) for a comparison. From these, we selected the Genetic Algorithm for Ruleset Prediction (GARP) (Stockwell & Noble, 1992; Stockwell & Peters, 1999), because it has seen widespread recent use to study diverse topics such as global warming (Thomas et al., 2004), infectious diseases (Peterson & Shaw, 2003) and invasive sp... |

21 | On the convergence of bound optimization algorithms
- Salakhutdinov, Roweis, et al.
Citation Context ...ng the maxent distribution, especially iterative scaling and its variants (Darroch & Ratcliff, 1972; Della Pietra et al., 1997) as well as the gradient and second-order descent methods (Malouf, 2002; Salakhutdinov et al., 2003). In this paper, we used a sequential-update algorithm that modifies one weight λj at a time, as explored by Collins, Schapire and Singer (2002) in a similar setting. We chose this coordinate-wise de... |

20 | The North American Breeding Bird Survey, Results and Analysis 1966–2005. Version 6.2
- Sauer, Hines, et al.
- 2005
Citation Context ...ecause it is one of the few methods available that does not require absence data (negative examples). We compare GARP and maxent using data derived from the North American Breeding Bird Survey (BBS) (Sauer et al., 2001), an extensive dataset consisting of thousands of occurrence localities for North American birds and used previously for species distribution modeling, in particular for evaluating GARP (Peterson, 20... |

17 | Predicting distributions of known and unknown reptile species in Madagascar. Nature 426:837–841
- Raxworthy, Martinez-Meyer, et al.
- 2003
Citation Context ...y inherent in GARP's random search procedure, we made composite GARP predictions using the "best-subsets" procedure (Anderson et al., 2003), as was done in recent applications (Peterson et al., 2003; Raxworthy et al., 2004). We generated 100 binary models, using GARP version 1.1.3 with default parameter values, then eliminated models with more than 5% intrinsic omission (negative prediction of training localities). If ... |

16 | Quantitative methods for modeling species habitat: Comparative performance and an application to Australian plants. Pages 39–58 in Quantitative methods for conservation biology
- Elith
- 2002
Citation Context ...irable for a species distribution model to allow interpretation to deduce the most important limiting factors for the species. A noted limitation of GARP is the difficulty of interpreting its models (Elith, 2002). We show how the models generated by maxent can be put into a form that is easily understandable and interpretable by humans. 2. The Maximum Entropy Approach In this section, we describe our approac... |

16 | Predicting species geographic distributions based on ecological niche modeling
- Peterson
- 2001
Citation Context ...et al., 2001), an extensive dataset consisting of thousands of occurrence localities for North American birds and used previously for species distribution modeling, in particular for evaluating GARP (Peterson, 2001). The comparison suggests that maxent methods hold great promise for species distribution modeling, often achieving substantially superior performance in controlled experiments relative to GARP. In a... |

9 | Evaluation of museum collection data for use in biodiversity assessment
- Ponder, Carter, et al.
- 2001
Citation Context ... often the case that only presence data is available indicating the occurrence of the species. Natural history museum and herbarium collections constitute the richest source of occurrence localities (Ponder et al., 2001; Stockwell & Peterson, 2002). Their collections typically have no information about the failure to observe the species at any given location; in addition, many locations have not been surveyed. In th... |

7 | Modeling species' geographic distributions for preliminary conservation assessments: an implementation with the spiny pocket mice (Heteromys) of Ecuador
- Anderson, Martínez-Meyer
- 2004
Citation Context ...ographic region of interest. The goal is to predict which areas within the region satisfy the requirements of the species’ ecological niche, and thus form part of the species’ potential distribution (Anderson & Martínez-Meyer, 2004). The potential distribution describes where conditions are suitable for survival of the species, and is thus of great importance for conservation. It can also be used to estimate the species’ realiz... |

6 | Predicting the potential invasive distributions of four alien plant species in North America
- Peterson, Papes, et al.
- 2003
Citation Context ...o reduce the variability inherent in GARP’s random search procedure, we made composite GARP predictions using the “best-subsets” procedure (Anderson et al., 2003), as was done in recent applications (Peterson et al., 2003; Raxworthy et al., 2004). We generated 100 binary models, using GARP version 1.1.3 with default parameter values, then eliminated models with more than 5% intrinsic omission (negative prediction of t... |

4 | Controlling bias in biodiversity data
- Stockwell, Peterson
- 2002
Citation Context ...only presence data is available indicating the occurrence of the species. Natural history museum and herbarium collections constitute the richest source of occurrence localities (Ponder et al., 2001; Stockwell & Peterson, 2002). Their collections typically have no information about the failure to observe the species at any given location; in addition, many locations have not been surveyed. In the lingo of machine learning,... |

3 | Using ecological-niche modeling to predict barred owl invasions with implications for spotted owl conservation
- Peterson, Robins
- 2003
Citation Context ...well & Peters, 1999), because it has seen widespread recent use to study diverse topics such as global warming (Thomas et al., 2004), infectious diseases (Peterson & Shaw, 2003) and invasive species (Peterson & Robins, 2003); many further applications are cited in these references. GARP was also selected because it is one of the few methods available that does not require absence data (negative examples). We compare GAR... |

3 | Niche modeling and geographic range predictions in the marine environment using a machine-learning algorithm
- Wiley, McNyset, et al.
- 2003
Citation Context ..., we must interpret as "negative examples" all grid cells with no occurrence localities, even if they support good environmental conditions for the species. The maximum AUC is therefore less than one (Wiley et al., 2003), and is smaller for wider-ranging species. 4. Results 4.1. Equalized Area Test The results of the equalized area test are in Table 2. With a threshold of 1, GARP predicts large areas as having suita... |