Northeast Brook Trout Occupancy

Northeast Brook Trout Occupancy Model

View the Project on GitHub Conte-Ecology/Northeast_Bkt_Occupancy


Coming Soon


  1. Evaluate landscape, land-use, and climate factors affecting the probability of Brook Trout occupancy in the eastern United States
  2. Predict Brook Trout occupancy in each stream reach (confluence to confluence) across the region currently
  3. Forecast Brook Trout occupancy under future conditions
  4. Examine the tolerance of Brook Trout to warming and forest change across space
  5. Visualize climate mitigation potential through forest change


We used a logistic mixed effects model to estimate the effects of landscape, land-use, and climate variables on the probability of Brook Trout occupancy in stream reaches (confluence to confluence). We included random effects of HUC10 (watershed) because the probability of occupancy and the effects of covariates are likely to be more similar within a watershed than among watersheds. Our fish data came primarily from state and federal agencies (see below). We considered a stream reach occupied if any Brook Trout were caught during any electrofishing survey between 1991 and 2010.
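In generic notation, a model of this kind can be written as below. The symbols are assumptions introduced here for illustration and do not come from the project code: $y_{ij}$ is the observed presence (1) or absence (0) in reach $i$ of HUC10 watershed $j$, $\psi_{ij}$ is the probability of occupancy, $x_{k,ij}$ are the covariates, $\beta$ are the fixed effects, $R$ is the subset of covariates given watershed-specific slopes, and $b_j$ are the watershed-level random effects. The covariance structure $\Sigma$ is left unspecified because it is not described here.

```latex
\begin{aligned}
y_{ij} &\sim \operatorname{Bernoulli}(\psi_{ij}) \\
\operatorname{logit}(\psi_{ij}) &= \beta_0 + b_{0j}
  + \sum_{k=1}^{K} \beta_k \, x_{k,ij}
  + \sum_{m \in R} b_{mj} \, x_{m,ij} \\
b_j &\sim \mathcal{N}(0, \Sigma)
\end{aligned}
```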

Observed Presence-Absence Data (Dependent Data)

state data_source n_reaches min_yr max_yr range_yrs
CT CTDEEP 1267 1991 2010 19
CT NYDEC 2 1991 2010 19
MA CTDEEP 9 1991 2010 19
MA MADFW 321 1991 2010 19
MA NYDEC 4 2008 2010 2
MD Hitt 2 1991 2010 19
MD PFBC 1 1991 2010 19
ME MEIFW 1881 1991 2010 19
NH CTDEEP 1 1991 2010 19
NH MADFW 4 1991 2010 19
NH MEIFW 6 1995 2010 15
NH VTFWD 1 1991 2010 19
NJ NYDEC 2 1992 1994 2
NY CTDEEP 3 1991 2010 19
NY MADFW 2 2008 2010 2
NY NYDEC 4350 1991 2010 19
PA Hitt 5 1999 2005 6
PA NYDEC 6 1991 2010 19
PA PFBC 857 1991 2010 19
RI CTDEEP 2 1991 2010 19
VT MADFW 1 2004 2004 0
VT NYDEC 1 1995 1995 0
VT VTFWD 319 1991 2010 19
NA MEIFW 6 2007 2007 0
NA VTFWD 2 2006 2009 3

Predictor Variables

Documentation related to the landscape, land-use, streams, catchment delineation, and climate variable data sources and processing can be found at

Variable | Description | Source | Processing | GitHub Repository
Total Drainage Area | The total contributing drainage area from the entire upstream network | The SHEDS Data project | The individual polygon areas are summed for all of the catchments in the contributing network | NHDHRDV2
Riparian Forest Cover | The percentage of the upstream 200ft riparian buffer area that is covered by trees taller than 5 meters | The National LandCover Database (NLCD) | All of the NLCD forest type classifications are combined and attributed to each riparian buffer polygon using GIS tools. All upstream polygon values are then aggregated. | nlcdLandCover
Daily Precipitation | The daily precipitation record for the individual local catchment | Daymet Daily Surface Weather and Climatological Summaries | Daily precipitation records are spatially assigned to each catchment based on overlapping grid cells using the zonalDaymet R package | daymet
Upstream Impounded Area | The total area in the contributing drainage basin that is covered by wetlands, lakes, or ponds that intersect the stream network | U.S. Fish & Wildlife Service (FWS) National Wetlands Inventory | All freshwater surface water bodies are attributed to each catchment using GIS tools. All upstream polygon values are then aggregated. | fwsWetlands
Percent Agriculture | The percentage of the contributing drainage area that is covered by agricultural land (e.g. cultivated crops, orchards, and pasture), including fallow land | The National LandCover Database | All of the NLCD agricultural classifications are combined and attributed to each catchment polygon using GIS tools. All upstream polygon values are then aggregated. | nlcdLandCover
Percent High Intensity Developed | The percentage of the contributing drainage area covered by places where people work or live in high numbers (typically defined as areas covered by more than 80% impervious surface) | The National LandCover Database | The NLCD high intensity developed classification is attributed to each catchment polygon using GIS tools. All upstream polygon values are then aggregated. | nlcdLandCover

General Results

Table of Model Results

Fixed Effects:

Parameter Estimate Std. Error z value P-value
(Intercept) 0.314 0.11 2.84 0.00445
area -0.416 0.0591 -7.04 1.89e-12
summer_prcp_mm 0.385 0.0978 3.94 8.14e-05
meanJulyTemp -0.706 0.0719 -9.82 9.03e-23
forest 0.413 0.0686 6.02 1.71e-09
surfcoarse 0.165 0.0586 2.81 0.00494
allonnet -0.291 0.0568 -5.13 2.83e-07
devel_hi -0.0996 0.0569 -1.75 0.0799
agriculture -0.664 0.0995 -6.67 2.57e-11
area:summer_prcp_mm 0.0217 0.0503 0.432 0.666
meanJulyTemp:forest -0.034 0.0501 -0.678 0.498
summer_prcp_mm:forest 0.127 0.0585 2.17 0.0302

Random Effects (HUC10):

Parameter SD Variance
(Intercept) 1.34 1.79
area 0.211 0.0443
agriculture 0.353 0.124
summer_prcp_mm 0.534 0.285
meanJulyTemp 0.235 0.0553

These results indicate that mean July stream temperature had the largest (negative) effect on the probability of Brook Trout occupancy. Forest cover within the 200-foot riparian buffer had a strong positive effect on occupancy, whereas agriculture within the entire upstream drainage had a negative effect. Mean summer precipitation had a positive effect on occupancy; this effect increased with riparian forest cover but did not depend on stream drainage area. The total impounded area on the stream network (allonnet) had a negative effect on Brook Trout occupancy, as did the upstream drainage area. Surficial coarseness was positively associated with the presence of Brook Trout, which may reflect better physical habitat structure or indicate local groundwater upwelling.
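As a rough illustration of how the fixed-effect estimates translate into occupancy probabilities, the sketch below applies the inverse logit to a linear predictor built from the fixed-effects table. It assumes the covariates were standardized (mean 0, SD 1) before fitting, which this page does not state, and it sets the HUC10 random effects to zero, so it describes only a "typical" watershed. The names `FIXED_EFFECTS` and `occupancy_probability` are hypothetical.

```python
import math

# Fixed-effect estimates copied from the table above (logit scale).
FIXED_EFFECTS = {
    "(Intercept)": 0.314,
    "area": -0.416,
    "summer_prcp_mm": 0.385,
    "meanJulyTemp": -0.706,
    "forest": 0.413,
    "surfcoarse": 0.165,
    "allonnet": -0.291,
    "devel_hi": -0.0996,
    "agriculture": -0.664,
    "area:summer_prcp_mm": 0.0217,
    "meanJulyTemp:forest": -0.034,
    "summer_prcp_mm:forest": 0.127,
}


def occupancy_probability(covariates):
    """Predicted occupancy probability for one reach.

    `covariates` maps main-effect names to values; missing names default to 0.
    ASSUMPTION: values are standardized covariates and all random effects are 0.
    """
    eta = FIXED_EFFECTS["(Intercept)"]
    for name, beta in FIXED_EFFECTS.items():
        if name == "(Intercept)":
            continue
        # Interaction terms ("a:b") are the product of their component covariates.
        value = 1.0
        for part in name.split(":"):
            value *= covariates.get(part, 0.0)
        eta += beta * value
    # Inverse logit maps the linear predictor to a probability.
    return 1.0 / (1.0 + math.exp(-eta))


baseline = occupancy_probability({})                   # every covariate at 0
warmer = occupancy_probability({"meanJulyTemp": 1.0})  # July temperature one unit higher
print(round(baseline, 2), round(warmer, 2))
```

Under these assumptions, a reach with every covariate at zero has a predicted probability of about 0.58 (the inverse logit of the intercept), and raising mean July temperature by one unit lowers it to about 0.40.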

Effect of mean July stream temperature on Brook Trout Occupancy

Effect of riparian forest cover on Brook Trout Occupancy

The average occupancy across the range of observed catchments was 0.58.

The effects of these landscape and climate characteristics are similar to those observed in other Brook Trout studies. (A more detailed comparison with Downstream Strategies and DeWeber and Wagner (2014) is coming soon.)

Model Fit

We examined the false positive and false negative rates and used the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) to assess the model fit.


  • AUC - measures a model’s ability to determine which locations are occupied (Zipkin et al. (2012) Ecological Applications)
  • Sensitivity - true positive rate (=“recall rate”)
  • Specificity - true negative rate
  • 1-Specificity - false positive rate (Type I Error rate; =“fall-out rate”)
  • 1-Sensitivity - false negative rate (Type II Error rate; =“miss rate”)
  • Accuracy - proportion of all locations classified correctly (true positives plus true negatives)
  • ROC - The curve is created by plotting the true positive rate (sensitivity) against the false positive rate (1-specificity) at various threshold settings
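For reference, the rates listed above can be written in terms of the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) obtained at a given threshold. These are the standard textbook definitions; the abbreviations are introduced here and are not taken from the project code.

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN} \\
\text{Specificity} &= \frac{TN}{TN + FP} \\
\text{False positive rate} &= 1 - \text{Specificity} = \frac{FP}{FP + TN} \\
\text{False negative rate} &= 1 - \text{Sensitivity} = \frac{FN}{FN + TP} \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}
\end{aligned}
```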

The model outputs (predictions) are probabilities of occupancy, but the data are observed presences and absences (1 or 0), so predictions cannot be compared with observations directly. The probabilities of occupancy must first be converted to presence-absence predictions. We do this over a range of thresholds (= cutoffs). The threshold is the probability above which the stream is assumed to be occupied (Brook Trout = present). For example, if the probability of occupancy for a stream is 0.45 and we set a threshold of 0.50, we would classify the stream as unoccupied (absent). However, if we used a threshold of 0.40, this same stream would be classified as occupied (present). If the true (observed) state of the stream was occupied, a threshold of 0.50 would produce a false negative (predicted absent when really present), whereas a threshold of 0.40 would correctly classify the stream as occupied (true positive). Choosing a threshold is therefore a trade-off between false positives and false negatives, and the balance depends on the tolerance for the consequences of Type I and Type II errors.
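The thresholding step can be sketched in a few lines. The example below is a minimal illustration, not the project's code: the six reaches and their probabilities are made up, and the function names `classify` and `confusion_rates` are hypothetical. It assumes the observed data contain at least one presence and one absence.

```python
def classify(probabilities, threshold):
    """Convert occupancy probabilities to predicted presence (1) or absence (0).

    A reach is predicted occupied when its probability is above the threshold.
    """
    return [1 if p > threshold else 0 for p in probabilities]


def confusion_rates(observed, predicted):
    """Return sensitivity, specificity, and accuracy for 0/1 vectors."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical data: observed state and predicted probability for six reaches.
observed = [1, 1, 1, 0, 0, 0]
probs = [0.80, 0.45, 0.60, 0.30, 0.42, 0.10]

for threshold in (0.5, 0.4):
    rates = confusion_rates(observed, classify(probs, threshold))
    print(threshold, rates)
```

With these made-up reaches, a threshold of 0.5 misses the occupied reach predicted at 0.45 (sensitivity 2/3, specificity 1), whereas a threshold of 0.4 recovers it but also flags the unoccupied reach predicted at 0.42 (sensitivity 1, specificity 2/3).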

AUC ranges from 0 to 1. An AUC value of 0.5 indicates that the model does no better than random chance at discriminating occupied from unoccupied locations. Models with AUC > 0.7 are considered to have good discrimination in assessing the probability of occupancy.
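One way to see what AUC measures is to compute it directly as the proportion of (occupied, unoccupied) pairs in which the occupied location received the higher predicted probability, with ties counted as one half. This pairwise proportion equals the area under the ROC curve. The sketch below uses made-up data and a hypothetical function name `auc`; it is not how the project computed its AUC.

```python
def auc(observed, probabilities):
    """AUC as the share of presence/absence pairs ranked correctly (ties = 0.5)."""
    pos = [p for o, p in zip(observed, probabilities) if o == 1]
    neg = [p for o, p in zip(observed, probabilities) if o == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # occupied reach ranked above unoccupied reach
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Hypothetical data: observed state and predicted probability for six reaches.
observed = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.7, 0.35, 0.6, 0.3, 0.1]
print(round(auc(observed, probs), 3))
```

In this toy example 8 of the 9 pairs are ranked correctly, so the AUC is about 0.89. A model that assigned every location the same probability would score 0.5.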