Title: | Spatial Methods and Indices |
---|---|
Description: | Functions for spatial methods based on generalized estimating equations (GEE) and wavelet-revised methods (WRM), functions for scaling by wavelet multiresolution regression (WMRR), conducting multi-model inference, and stepwise model selection. Further, contains functions for spatially corrected model accuracy measures. |
Authors: | Gudrun Carl [cre, aut], Ingolf Kuehn [aut], Sam Levin [aut] |
Maintainer: | Sam Levin <[email protected]> |
License: | GPL-3 |
Version: | 2.2.1 |
Built: | 2024-10-24 04:29:18 UTC |
Source: | https://github.com/levisc8/spind |
A function for calculating spatial autocorrelation using Moran's I.
acfft(coord, f, lim1 = 1, lim2 = 2, dmax = 10)
acfft(coord, f, lim1 = 1, lim2 = 2, dmax = 10)
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
f |
A vector which is the same length as |
lim1 |
Lower bound for first bin. Default is 1 |
lim2 |
Upper bound for first bin. Default is 2 |
dmax |
Number of distance bins to examine. Bins are formed by annuli of gradually increasing radii. Default is 10. |
A vector of Moran's I values for each distance bin.
Gudrun Carl
data(musdata) coords <- musdata[ ,4:5] mglm <- glm(musculus ~ pollution + exposure, "poisson", musdata) ac <- acfft(coords, resid(mglm, type = "pearson"), lim1 = 0, lim2 = 1) ac
data(musdata) coords <- musdata[ ,4:5] mglm <- glm(musculus ~ pollution + exposure, "poisson", musdata) ac <- acfft(coords, resid(mglm, type = "pearson"), lim1 = 0, lim2 = 1) ac
Adjusts actual presence/absence data based on the autocorrelation in the predictions of a model. The function will optionally plot results of model predictions, un-modified actual presence/absence, and adjusted values.
adjusted.actuals(data, coord, plot.maps = FALSE, color.maps = FALSE)
adjusted.actuals(data, coord, plot.maps = FALSE, color.maps = FALSE)
data |
a dataframe or matrix containing actual presence/absence (binary, 0 or 1) values in 1st column and predicted values (numeric between 0 and 1) in 2nd column. |
coord |
a matrix of two columns of the same length providing integer,
consecutively numbered coordinates for each occurrence and prediction in
|
plot.maps |
A logical indicating whether maps should be plotted. Default is FALSE. |
color.maps |
A logical value. If |
A vector of adjusted actual values.
Gudrun Carl
data(hook) data <- hook[ ,1:2] coord <- hook[ ,3:4] aa <- adjusted.actuals(data, coord, plot.maps = TRUE)
data(hook) data <- hook[ ,1:2] coord <- hook[ ,3:4] aa <- adjusted.actuals(data, coord, plot.maps = TRUE)
Calculates AIC and AICc
aic.calc(formula, family, data, mu, n.eff = NULL)
aic.calc(formula, family, data, mu, n.eff = NULL)
formula |
A model formula |
family |
Family used to fit the model. |
data |
A data frame |
mu |
Fitted values from a model |
n.eff |
Effective number of observations. Default is NULL |
A list with the following components
loglik
Log likelihood of the model
df
Degrees of freedom
AIC
AIC score for the specified model
AICc
AIC score corrected for small sample sizes
Gudrun Carl, Sam Levin
data(musdata) coords <- musdata[ ,4:5] mglm <- glm(musculus ~ pollution + exposure, "poisson", musdata) aic <- aic.calc(musculus ~ pollution + exposure, "poisson", musdata, mglm$fitted) aic$AIC
data(musdata) coords <- musdata[ ,4:5] mglm <- glm(musculus ~ pollution + exposure, "poisson", musdata) aic <- aic.calc(musculus ~ pollution + exposure, "poisson", musdata, mglm$fitted) aic$AIC
A data frame containing simulated count data for the thistle, Carlina horrida.
carlinadata
carlinadata
A data frame with 961 rows and 5 columns
integer - Simulated count data
numeric - Simulated aridity index values. This variable has high spatial autocorrelation values.
numeric - Simulated land use intensity. This variable has no spatial autocorrelation.
integer - x-coordinates for each grid cell
integer - y-coordinates for each grid cell
Plots the wavelet variance or covariance for the specified formula. The scale-dependent results are graphically displayed.
covar.plot( formula, data, coord, wavelet = "haar", wtrafo = "dwt", plot = "covar", customize_plot = NULL )
covar.plot( formula, data, coord, wavelet = "haar", wtrafo = "dwt", plot = "covar", customize_plot = NULL )
formula |
With specified notation according to names in data frame. |
data |
Data frame. |
coord |
A matrix of 2 columns with corresponding x,y-coordinates which have to be integer. |
wavelet |
Type of wavelet: |
wtrafo |
Type of wavelet transform: |
plot |
Either |
customize_plot |
Additional plotting parameters passed to |
Each variable or pair of variables in formula
is passed to wavevar
or
wavecovar
internally, and the result is plotted as a function of level
.
A list containing
1. result
= a vector of results.
2. plot
= a ggplot
object
Gudrun Carl
data(carlinadata) coords <- carlinadata[,4:5] covariance <- covar.plot(carlina.horrida ~ aridity + land.use - 1, data = carlinadata, coord = coords, wavelet = "d4", wtrafo = 'modwt', plot = 'covar') covariance$plot covariance$result variance <- covar.plot(carlina.horrida ~ aridity + land.use - 1, data = carlinadata, coord = coords, wavelet = "d4", wtrafo = 'modwt', plot = 'var') variance$plot variance$result
data(carlinadata) coords <- carlinadata[,4:5] covariance <- covar.plot(carlina.horrida ~ aridity + land.use - 1, data = carlinadata, coord = coords, wavelet = "d4", wtrafo = 'modwt', plot = 'covar') covariance$plot covariance$result variance <- covar.plot(carlina.horrida ~ aridity + land.use - 1, data = carlinadata, coord = coords, wavelet = "d4", wtrafo = 'modwt', plot = 'var') variance$plot variance$result
GEE
provides GEE-based methods from the packages gee and geepack
to account for spatial autocorrelation in multiple linear regressions
GEE( formula, family, data, coord, corstr = "fixed", cluster = 3, moran.params = list(), plot = FALSE, scale.fix = FALSE, customize_plot = NULL ) ## S3 method for class 'GEE' plot(x, ...) ## S3 method for class 'GEE' predict(object, newdata, ...) ## S3 method for class 'GEE' summary(object, ..., printAutoCorPars = TRUE)
GEE( formula, family, data, coord, corstr = "fixed", cluster = 3, moran.params = list(), plot = FALSE, scale.fix = FALSE, customize_plot = NULL ) ## S3 method for class 'GEE' plot(x, ...) ## S3 method for class 'GEE' predict(object, newdata, ...) ## S3 method for class 'GEE' summary(object, ..., printAutoCorPars = TRUE)
formula |
Model formula. Variable names must match variables in |
family |
|
data |
A data frame with variable names that match the variables
specified in |
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
corstr |
Expected autocorrelation structure:
|
cluster |
Cluster size for cluster models
|
moran.params |
A list of parameters for calculating Moran's I.
|
plot |
A logical value indicating whether autocorrelation of
residuals should be plotted. NOW DEPRECATED in favor of |
scale.fix |
A logical indicating whether or not the scale parameter should
be fixed. The default is |
customize_plot |
Additional plotting parameters passed to |
x |
An object of class |
... |
Not used. |
object |
An object of class |
newdata |
A data frame containing variables to base the predictions on. |
printAutoCorPars |
A logical indicating whether to print the working autocorrelation parameters |
GEE can be used to fit linear models for response variables with
different distributions: gaussian
, binomial
, or poisson
.
As a spatial model, it is a generalized linear model in which the residuals
may be autocorrelated. It accounts for spatial (2-dimensional)
autocorrelation of the residuals in cases of regular gridded datasets
and returns corrected parameter estimates. The grid cells are assumed to be square.
Furthermore, this function requires that all predictor variables
be continuous.
An object of class GEE
. This consists of a list with the
following elements:
call
Call
formula
Model formula
family
Family
coord
Coordinates used for the model
corstr
User-selected correlation structure
b
Estimate of regression parameters
s.e.
Standard errors of the estimates
z
Depending on the family
, either a z or t value
p
p-values for each parameter estimate
scale
Scale parameter (dispersion parameter) of the distribution's variance
scale.fix
Logical indicating whether scale
has fixed value
cluster
User-specified cluster size for clustered models
fitted
Fitted values from the model
resid
Normalized Pearson residuals
w.ac
Working autocorrelation parameters
Mat.ac
Working autocorrelation matrix
QIC
Quasi Information Criterion. See qic.calc
for further details
QLik
Quasi-likelihood. See qic.calc
for further details
plot
Logical value indicating whether autocorrelation should be plotted
moran.params
Parameters for calculating Moran's I
v2
Parameter variance of the GEE
model
var.naive
Parameter variance of the independence
model
ac.glm
Autocorrelation of GLM residuals
ac.gee
Autocorrelation of GEE residuals
plot
An object of class ggplot
containing information
on the autocorrelation of residuals from the fitted GEE
and a
GLM
Elements can be viewed using the summary.GEE
methods included in
the package.
When using corstr = "fixed"
on large data sets, the function
may return an error, as the resulting variance-covariance matrix is too
large for R to handle. If this happens, one will have to use one of the
cluster models (i.e quadratic, exchangeable
).
Gudrun Carl, Sam Levin
Carl G & Kuehn I, 2007. Analyzing Spatial Autocorrelation in Species Distributions using Gaussian and Logit Models, Ecol. Model. 207, 159 - 170
Carey, V. J., 2006. Ported to R by Thomas Lumley (versions 3.13, 4.4, version 4.13)., B. R. gee: Generalized Estimation Equation solver. R package version 4.13-11.
Yan, J., 2004. geepack: Generalized Estimating Equation Package. R package version 0.2.10.
data(musdata) coords<- musdata[,4:5] ## Not run: mgee <- GEE(musculus ~ pollution + exposure, family = "poisson", data = musdata, coord = coords, corstr = "fixed", scale.fix = FALSE) summary(mgee, printAutoCorPars = TRUE) pred <- predict(mgee, newdata = musdata) library(ggplot2) plot(mgee) my_gee_plot <- mgee$plot # move the legend to a new position print(my_gee_plot + ggplot2::theme(legend.position = 'top')) ## End(Not run)
data(musdata) coords<- musdata[,4:5] ## Not run: mgee <- GEE(musculus ~ pollution + exposure, family = "poisson", data = musdata, coord = coords, corstr = "fixed", scale.fix = FALSE) summary(mgee, printAutoCorPars = TRUE) pred <- predict(mgee, newdata = musdata) library(ggplot2) plot(mgee) my_gee_plot <- mgee$plot # move the legend to a new position print(my_gee_plot + ggplot2::theme(legend.position = 'top')) ## End(Not run)
A data frame containing actual presence absence data and predicted probability of occurrence values.
hook
hook
A data frame with 100 rows and 4 columns
integer - Presence/absence records
numeric - predicted probabilities of occurrence
integer - x-coordinates for each grid cell
integer - y-coordinates for each grid cell
mmiGEE is a multimodel inference approach evaluating the relative
importance of predictors used in GEE
.
@details It performs automatically generated model selection and creates a model selection table according to the approach of multi-model inference (Burnham & Anderson, 2002). QIC is used to obtain model selection weights and to rank the models. Moreover, mmiGEE calculates relative variable importance of a given model. Finally, this function requires that all predictor variables be continuous.
mmiGEE(object, data, trace = FALSE)
mmiGEE(object, data, trace = FALSE)
object |
A model of class |
data |
A data frame or set of vectors of equal length. |
trace |
A logical indicating whether or not to print results to console. |
Calculates the relative importance of each variable
using multi-model inference methods in a Generalized Estimating Equations
framework implemented in GEE
.
mmiGEE
returns a list containing the following elements
result
A matrix containing slopes, degrees of freedom, quasilikelihood, QIC, delta, and weight values for the set of candidate models. The models are ranked by QIC.
rvi
A vector containing the relative importance of each variable in the regression.
Gudrun Carl, Sam Levin
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multimodel inference. Springer, New York.
Carl G & Kuehn I, 2007. Analyzing Spatial Autocorrelation in Species Distributions using Gaussian and Logit Models, Ecol. Model. 207, 159 - 170
# data (for demonstration only) library(MASS) data(birthwt) # impose an artificial (not fully appropriate) grid structure x <- rep(1:14, 14) y <- as.integer(gl(14, 14)) coords <- cbind(x[-(190:196)], y[-(190:196)]) ## Not run: formula <- formula(low ~ race + smoke + bwt) mgee <- GEE(formula, family = "gaussian", data = birthwt, coord = coords, corstr = "fixed", scale.fix = TRUE) mmi <- mmiGEE(mgee, birthwt) ## End(Not run)
# data (for demonstration only) library(MASS) data(birthwt) # impose an artificial (not fully appropriate) grid structure x <- rep(1:14, 14) y <- as.integer(gl(14, 14)) coords <- cbind(x[-(190:196)], y[-(190:196)]) ## Not run: formula <- formula(low ~ race + smoke + bwt) mgee <- GEE(formula, family = "gaussian", data = birthwt, coord = coords, corstr = "fixed", scale.fix = TRUE) mmi <- mmiGEE(mgee, birthwt) ## End(Not run)
mmiWMRR is a multimodel inference approach evaluating the relative
importance of predictors used in scaleWMRR
.
mmiWMRR(object, data, scale, detail = TRUE, trace = FALSE)
mmiWMRR(object, data, scale, detail = TRUE, trace = FALSE)
object |
A model of class |
data |
Data frame. |
scale |
0 or higher integers possible (limit depends on sample size).
|
detail |
Remove smooth wavelets? If |
trace |
Logical value indicating whether to print results to console. |
It performs automatically
generated model selection and creates a model
selection table according to the approach of multi-model inference
(Burnham & Anderson, 2002). The analysis is carried out for scale-specific
regressions (i.e. where scaleWMRR
can be used). AIC is
used to obtain model
selection weights and to rank the models.
Furthermore, this function requires that all predictor variables
be continuous.
mmiWMRR
returns a list containing the following elements
result
A matrix containing slopes, degrees of freedom, likelihood, AIC, delta, and weight values for the set of candidate models. The models are ranked by AIC.
level
An integer corresponding to scale
Gudrun Carl
Burnham, K.P. & Anderson, D.R. (2002) Model selection and multimodel inference. Springer, New York.
Carl G, Doktor D, Schweiger O, Kuehn I (2016) Assessing relative variable importance across different spatial scales: a two-dimensional wavelet analysis. Journal of Biogeography 43: 2502-2512.
aic.calc
, rvi.plot
,
MuMIn, WRM
data(carlinadata) coords <- carlinadata[ ,4:5] ## Not run: wrm <- WRM(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, level = 1, wavelet = "d4") mmi <- mmiWMRR(wrm, data = carlinadata, scale = 3, detail = TRUE, trace = FALSE) ## End(Not run)
data(carlinadata) coords <- carlinadata[ ,4:5] ## Not run: wrm <- WRM(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, level = 1, wavelet = "d4") mmi <- mmiWMRR(wrm, data = carlinadata, scale = 3, detail = TRUE, trace = FALSE) ## End(Not run)
A data frame containing simulated count data of a house mouse.
musdata
musdata
A data frame with 400 rows and 5 columns
integer - Simulated count data for Mus musculus
numeric - Simulated variable that describes degree of pollution in corresponding grid cell
numeric - Simulated variable that describes degree of exposure for each grid cell
integer - x-coordinates for each grid cell
integer - y-coordinates for each grid cell
A function for calculating quasi-likelihood and Quasi-Information Criterion values based on the method of Hardin & Hilbe (2003).
qic.calc(formula, family, data, mu, var.robust, var.indep.naive)
qic.calc(formula, family, data, mu, var.robust, var.indep.naive)
formula |
a model formula |
family |
|
data |
a data frame |
mu |
fitted values from a model |
var.robust |
variance of model parameters |
var.indep.naive |
naive variance of model parameters under the
|
A list with the following components:
QIC
quasi-information criterion
loglik
quasi-likelihood
Hardin, J.W. & Hilbe, J.M. (2003) Generalized Estimating Equations. Chapman and Hall, New York.
Barnett et al. Methods in Ecology & Evolution 2010, 1, 15-24.
Creates model selection tables, calculates and plots relative variable importance based on the scale level of a given model.
rvi.plot( formula, family, data, coord, maxlevel, detail = TRUE, wavelet = "haar", wtrafo = "dwt", n.eff = NULL, trace = FALSE, customize_plot = NULL )
rvi.plot( formula, family, data, coord, maxlevel, detail = TRUE, wavelet = "haar", wtrafo = "dwt", n.eff = NULL, trace = FALSE, customize_plot = NULL )
formula |
A model formula |
family |
|
data |
A data frame or set of vectors of equal length. |
coord |
X,Y coordinates for each observation. Coordinates should be consecutive integers. |
maxlevel |
An integer for maximum scale level |
detail |
Remove smooth wavelets? If |
wavelet |
Type of wavelet: |
wtrafo |
Type of wavelet transform: |
n.eff |
A numeric value of effective sample size |
trace |
Should R print progress updates to the console? Default is FALSE |
customize_plot |
Additional plotting parameters passed to |
Calculates the relative importance of each variable
using multi-model inference methods in a wavelet multi-resolution regression
framework implemented in mmiWMRR
. The scale level dependent
results are then graphically displayed.
A list containing
1. A matrix containing the relative importance of each variable in the regression at each value of the scale level.
2. A ggplot
object containing a plot of the relative
variable importance
data(carlinadata) coords<- carlinadata[,4:5] ## Not run: wrm <- WRM(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, level = 1, wavelet = "d4") mmi <- mmiWMRR(wrm, data = carlinadata, scale = 3, detail = TRUE) # Plot scale-dependent relative variable importance rvi <- rvi.plot(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, maxlevel = 4, detail = TRUE, wavelet = "d4") rvi$plot rvi$rvi ## End(Not run)
data(carlinadata) coords<- carlinadata[,4:5] ## Not run: wrm <- WRM(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, level = 1, wavelet = "d4") mmi <- mmiWMRR(wrm, data = carlinadata, scale = 3, detail = TRUE) # Plot scale-dependent relative variable importance rvi <- rvi.plot(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, maxlevel = 4, detail = TRUE, wavelet = "d4") rvi$plot rvi$rvi ## End(Not run)
scaleWMRR performs a scale-specific regression based on a wavelet multiresolution analysis.
scaleWMRR( formula, family, data, coord, scale = 1, detail = TRUE, wavelet = "haar", wtrafo = "dwt", b.ini = NULL, pad = list(), control = list(), moran.params = list(), trace = FALSE )
scaleWMRR( formula, family, data, coord, scale = 1, detail = TRUE, wavelet = "haar", wtrafo = "dwt", b.ini = NULL, pad = list(), control = list(), moran.params = list(), trace = FALSE )
formula |
With specified notation according to names in data frame. |
family |
|
data |
Data frame. |
coord |
Corresponding coordinates which have to be integer. |
scale |
0 (which is equivalent to GLM) or higher integers possible (limit depends on sample size). |
detail |
Remove smooth wavelets? If |
wavelet |
Type of wavelet: |
wtrafo |
Type of wavelet transform: |
b.ini |
Initial parameter values. Default is |
pad |
A list of parameters for padding wavelet coefficients.
|
control |
A list of parameters for controlling the fitting process.
|
moran.params |
A list of parameters for calculating Moran's I.
|
trace |
A logical value indicating whether to print parameter estimates to the console |
This function fits generalized linear models while taking the
two-dimensional grid structure of
datasets into account. The following error distributions (in
conjunction with appropriate link functions) are allowed: gaussian
,
binomial
, or poisson
. The model provides scale-specific
results for data sampled on a contiguous geographical area. The
dataset is assumed to be regular gridded and the grid cells are
assumed to be square. A function from the package 'waveslim' is used
for the wavelet transformations (Whitcher, 2005).
Furthermore, this function requires that all predictor variables
be continuous.
scaleWMRR returns a list containing the following elements
call
Model call
b
Estimates of regression parameters
s.e.
Standard errors of the parameter estimates
z
Z values (or corresponding values for statistics)
p
p-values for each parameter estimate
df
Degrees of freedom
fitted
Fitted values
resid
Pearson residuals
converged
Logical value whether the procedure converged
trace
Logical. If TRUE:
ac.glm
Autocorrelation of glm.residuals
ac
Autocorrelation of wavelet.residuals
Gudrun Carl
Carl G, Doktor D, Schweiger O, Kuehn I (2016) Assessing relative variable importance across different spatial scales: a two-dimensional wavelet analysis. Journal of Biogeography 43: 2502-2512.
Whitcher, B. (2005) Waveslim: basic wavelet routines for one-, two- and three-dimensional signal processing. R package version 1.5.
waveslim,mra.2d
data(carlinadata) coords <- carlinadata[ ,4:5] ## Not run: # scaleWMRR at scale = 0 is equivalent to GLM ms0 <- scaleWMRR(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, scale = 0, trace = TRUE) # scale-specific regressions for detail components ms1 <- scaleWMRR(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, scale = 1, trace = TRUE) ms2 <- scaleWMRR(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, scale = 2, trace = TRUE) ms3<- scaleWMRR(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, scale = 3, trace = TRUE) ## End(Not run)
data(carlinadata) coords <- carlinadata[ ,4:5] ## Not run: # scaleWMRR at scale = 0 is equivalent to GLM ms0 <- scaleWMRR(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, scale = 0, trace = TRUE) # scale-specific regressions for detail components ms1 <- scaleWMRR(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, scale = 1, trace = TRUE) ms2 <- scaleWMRR(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, scale = 2, trace = TRUE) ms3<- scaleWMRR(carlina.horrida ~ aridity + land.use, family = "poisson", data = carlinadata, coord = coords, scale = 3, trace = TRUE) ## End(Not run)
The spind
package provides convenient implementation of Generalized
estimating equations (GEEs) and Wavelet-revised models (WRMs)
in the context of spatial models. It also provides tools for
multi-model inference, stepwise model selection, and spatially
corrected model diagnostics. This help section provides brief descriptions
of each function and is organized by the type of model they apply to
or the scenarios in which you might use them. Of course, these are
recommendations - feel free to use them as you see fit. For a more
detailed description of the package and its functions, please see
the vignette Intro to spind (browseVignettes('spind')
).
The GEE
function fits spatial models using a generalized
estimating equation and a set of gridded data. The package
also includes S3 methods for summary
and predict
so you can interact with these models in the same way you might
interact with a glm
or lm
.
The WRM
function fits spatial models using a wavelet-revised
model and a set of gridded data. The package
also includes S3 methods for summary
and predict
so you can interact with these models in the same way you might
interact with a glm
or lm
. There are also a number
of helper functions that help you fine tune the fitting process
that are specific to WRMs. Please see the documentation for
WRM
for more details on those.
WRM
also has a few other features specific to it. For example, if
you are interested in viewing the variance or covariance of your variables
as a function of level
, covar.plot
is useful. upscale
will plot your matrices as a function of level
so you can examine the
effect of cluster resolution on your results.
spind
includes a couple of functions to help you find the best fit
for your data. The first two are multimodel inference tools specific to GEEs
and WRMs and are called mmiGEE
and mmiWMRR
. These generate
outputs very similar to those from the MuMIn
package. If you would
like to see how variable importance changes as a function of the scale
of
the WMRR, you can call rvi.plot
. This will generate a model selection
table for each degree of level
(from 1 to maxlevel
) and then
plot the weight of each variable as a function of level
.
spind
also includes a function for stepwise model selection that is
loosely based on step
and stepAIC
. step.spind
differs
from these in that it is specific to classes WRM
and GEE
. It
performs model selection using AIC or AICc for WRMs and QIC for GEEs.
Finally, spind
has a number of functions that provide spatially
corrected goodness of fit diagnostics for any type of model (i.e. they
are not specific to classes WRM
or GEE
). These first appeared
in spind v1.0
and have not been updated in this release. The first two are
divided into whether or not they are threshold dependent or not. Threshold
dependent metrics can be calculated using th.dep
and threshold
independent metrics can be calculated using th.indep
.
acfft
calculates spatial autocorrelation of residuals from a model
using Moran's I. You can set the number of distance bins you'd
like to examine using dmax
argument and the size of those bins
using lim1
and lim2
.
The vignette titled Intro to spind provides more information on these functions and some example workflows that will demonstrate them in greater depth than this document. Of course, if you have suggestions on how to improve this document or any of the other ones in here, please don't hesitate to contact us.
Stepwise model selection by AIC or AICc for WRMS and QIC for GEEs
step.spind(object, data, steps = NULL, trace = TRUE, AICc = FALSE)
step.spind(object, data, steps = NULL, trace = TRUE, AICc = FALSE)
object |
A model of class |
data |
The data used to fit that model. |
steps |
Number of iterations the procedure should go through before concluding. The default is to use the number of variables as the number of iterations. |
trace |
Should R print progress updates and the final, best model found
to the console? Default is |
AICc |
Logical. In the case of model selection with |
This function performs stepwise variable elimination for model comparison. Each iteration will try to find the best combination of predictors for a given number of variables based on AIC, AICc, or QIC, and then use that as the base model for the next iteration until there are no more variables to eliminate. Alternatively, it will terminate when reducing the number of variables while respecting the model hierarchy no longer produces lower information criterion values.
A list with components model
and table
.
model
is always formula for the best model found by the procedure.
table
is always a data frame, but the content varies for each type of
model.
For WRM
's, the columns
returned are
Deleted.Vars
Variables retained from the previous iteration
which were tested in the current iteration.
LogLik
Log-likelihood of the model.
AIC
AIC score for the model.
AICc
AICc score for the model.
For GEE
s:
Deleted.Vars
Variables retained from the previous iteration
which were tested in the current iteration.
QIC
Quasi-information criterion of the model.
Quasi.Lik
Quasi-likelihood of the model.
Currently, the function only supports backwards model selection (i.e. one must start with a full model and subtract variables). Forward and both directions options may be added later.
Sam Levin
Hardin, J.W. & Hilbe, J.M. (2003) Generalized Estimating Equations. Chapman and Hall, New York.
qic.calc
, aic.calc
, add1
,
step
, stepAIC
# For demonstration only. We are artificially imposing a grid structure # on data that is not actually spatial data library(MASS) data(birthwt) x <- rep(1:14, 14) y <- as.integer(gl(14, 14)) coords <- cbind(x[-(190:196)], y[-(190:196)]) ## Not run: formula <- formula(low ~ age + lwt + race + smoke + ftv + bwt) mgee <- GEE(formula, family = "gaussian", data = birthwt, coord = coords, corstr = "fixed", scale.fix = TRUE) ss <- step.spind(mgee, birthwt) best.mgee <- GEE(ss$model, family = "gaussian", data = birthwt, coord = coords, corstr = "fixed", scale.fix = TRUE) summary(best.mgee, printAutoCorPars = FALSE) ## End(Not run)
# For demonstration only. We are artificially imposing a grid structure # on data that is not actually spatial data library(MASS) data(birthwt) x <- rep(1:14, 14) y <- as.integer(gl(14, 14)) coords <- cbind(x[-(190:196)], y[-(190:196)]) ## Not run: formula <- formula(low ~ age + lwt + race + smoke + ftv + bwt) mgee <- GEE(formula, family = "gaussian", data = birthwt, coord = coords, corstr = "fixed", scale.fix = TRUE) ss <- step.spind(mgee, birthwt) best.mgee <- GEE(ss$model, family = "gaussian", data = birthwt, coord = coords, corstr = "fixed", scale.fix = TRUE) summary(best.mgee, printAutoCorPars = FALSE) ## End(Not run)
Calculates spatially corrected, threshold-dependent metrics for an observational data set and model predictions (Kappa and confusion matrix)
th.dep(data, coord, thresh = 0.5, spatial = TRUE)
th.dep(data, coord, thresh = 0.5, spatial = TRUE)
data |
A data frame or matrix with two columns. The first column should contain actual presence/absence data (binary, 0 or 1) and the second column should contain model predictions of probability of occurrence (numeric, between 0 and 1). |
coord |
A data frame or matrix with two columns containing x,y coordinates for each actual and predicted value. Coordinates must be integer and consecutively numbered. |
thresh |
A cutoff value for classifying predictions as modeled presences or modeled absences. Default is 0.5. |
spatial |
A logical indicating whether spatially corrected indices (rather than classical indices) should be computed. |
A list with the following components:
kappa
Kappa statistic
cm
Confusion matrix
sensitivity
Sensitivity
specificity
Specificity
actuals
Actual occurrence data or adjusted actual occurrence data
splitlevel.pred
Level splitting of predicted values
splitlevel.act
Level splitting of actuals or adjusted actuals
splitposition.pred
Position splitting of predicted values
splitposition.act
Position splitting of actuals or adjusted actuals
Gudrun Carl
Carl G, Kuehn I (2017) Spind: a package for computing spatially corrected accuracy measures. Ecography 40: 675-682. DOI: 10.1111/ecog.02593
data(hook) data <- hook[ ,1:2] coord <- hook[ ,3:4] si1 <- th.dep(data, coord, spatial = TRUE) si1$kappa si1$cm
data(hook) data <- hook[ ,1:2] coord <- hook[ ,3:4] si1 <- th.dep(data, coord, spatial = TRUE) si1$kappa si1$cm
Calculates spatially corrected, threshold-independent metrics for an observational data set and model predictions (AUC, ROC, max-TSS)
th.indep(data, coord, spatial = TRUE, plot.ROC = FALSE, customize_plot = NULL)
th.indep(data, coord, spatial = TRUE, plot.ROC = FALSE, customize_plot = NULL)
data |
A data frame or matrix with two columns. The first column should contain actual presence/absence data (binary, 0 or 1) and the second column should contain model predictions of probability of occurrence (numeric, between 0 and 1). |
coord |
A data frame or matrix with two columns containing x,y coordinates for each actual and predicted value. Coordinates must be integer and consecutively numbered. |
spatial |
A logical value indicating whether spatial corrected indices (rather than classical indices) should be computed. |
plot.ROC |
A logical indicating whether the ROC should be plotted. NOW DEPRECATED. |
customize_plot |
Additional plotting parameters passed to |
A list with the following components:
AUC
Area under curve
opt.thresh
optimal threshold for maximum TSS value
TSS
Maximum TSS value
sensitivity
Sensitivity
Specificity
Specificity
AUC.plot
A ggplot
object
Gudrun Carl
Carl G, Kuehn I (2017) Spind: a package for computing spatially corrected accuracy measures. Ecography 40: 675-682. DOI: 10.1111/ecog.02593
data(hook) data <- hook[ ,1:2] coord <- hook[ ,3:4] si2 <- th.indep(data, coord, spatial = TRUE) si2$AUC si2$TSS si2$opt.thresh si2$plot
data(hook) data <- hook[ ,1:2] coord <- hook[ ,3:4] si2 <- th.indep(data, coord, spatial = TRUE) si2$AUC si2$TSS si2$opt.thresh si2$plot
The analysis is based a wavelet multiresolution analysis using only smooth wavelet components. It is a 2D analysis taking the grid structure and provides scale-specific results for data sampled on a contiguous geographical area. The dataset is assumed to be regular gridded and the grid cells are assumed to be square. The scale-dependent results are graphically displayed.
upscale( f, coord, wavelet = "haar", wtrafo = "dwt", pad = mean(f), color.maps = FALSE )
upscale( f, coord, wavelet = "haar", wtrafo = "dwt", pad = mean(f), color.maps = FALSE )
f |
A vector. |
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
wavelet |
Name of wavelet family. |
wtrafo |
Type of wavelet transform. Either |
pad |
A numeric value for padding the matrix into a bigger square. Default is set to mean(f). |
color.maps |
A logical value. If |
A set of plots showing the matrix image at each value for
level
.
Gudrun Carl
data(carlinadata) coords <- carlinadata[ ,4:5] # Upscaling of smooth components upscale(carlinadata$land.use, coord = coords)
data(carlinadata) coords <- carlinadata[ ,4:5] # Upscaling of smooth components upscale(carlinadata$land.use, coord = coords)
Calculates the wavelet covariance based on a wavelet multiresolution analysis.
wavecovar(f1, f2, coord, wavelet = "haar", wtrafo = "dwt")
wavecovar(f1, f2, coord, wavelet = "haar", wtrafo = "dwt")
f1 |
A vector of length n. |
f2 |
A vector of length n. |
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
wavelet |
Name of wavelet family. |
wtrafo |
Type of wavelet transform. Either |
Wavelet covariance for f1
and f2
.
Gudrun Carl
waveslim, WRM
, covar.plot
,
scaleWMRR
data(carlinadata) coords <- carlinadata[ ,4:5] pc <- covar.plot(carlina.horrida ~ aridity + land.use, data = carlinadata, coord = coords, wavelet = 'd4', wtrafo = 'modwt', plot = 'covar') pc$plot
data(carlinadata) coords <- carlinadata[ ,4:5] pc <- covar.plot(carlina.horrida ~ aridity + land.use, data = carlinadata, coord = coords, wavelet = 'd4', wtrafo = 'modwt', plot = 'covar') pc$plot
Calculates the wavelet variance based on a wavelet multiresolution analysis.
wavevar(f, coord, wavelet = "haar", wtrafo = "dwt")
wavevar(f, coord, wavelet = "haar", wtrafo = "dwt")
f |
A vector |
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
wavelet |
Name of wavelet family. |
wtrafo |
Type of wavelet transform. Either |
Wavelet variance for f
.
Gudrun Carl
waveslim, WRM
, covar.plot
,
scaleWMRR
data(carlinadata) coords <- carlinadata[ ,4:5] pv <- covar.plot(carlina.horrida ~ aridity + land.use, data = carlinadata, coord = coords, wavelet = 'd4', wtrafo = 'modwt', plot = 'var') pv$plot
data(carlinadata) coords <- carlinadata[ ,4:5] pv <- covar.plot(carlina.horrida ~ aridity + land.use, data = carlinadata, coord = coords, wavelet = 'd4', wtrafo = 'modwt', plot = 'var') pv$plot
A wavelet-based method to remove spatial autocorrelation in multiple linear regressions. Wavelet transforms are implemented using waveslim (Whitcher, 2005).
WRM( formula, family, data, coord, level = 1, wavelet = "haar", wtrafo = "dwt", b.ini = NULL, pad = list(), control = list(), moran.params = list(), plot = FALSE, customize_plot = NULL ) ## S3 method for class 'WRM' plot(x, ...) ## S3 method for class 'WRM' summary(object, ...) ## S3 method for class 'WRM' predict(object, newdata, sm = FALSE, newcoord = NA, ...)
WRM( formula, family, data, coord, level = 1, wavelet = "haar", wtrafo = "dwt", b.ini = NULL, pad = list(), control = list(), moran.params = list(), plot = FALSE, customize_plot = NULL ) ## S3 method for class 'WRM' plot(x, ...) ## S3 method for class 'WRM' summary(object, ...) ## S3 method for class 'WRM' predict(object, newdata, sm = FALSE, newcoord = NA, ...)
formula |
Model formula. Variable names must match variables in |
family |
|
data |
A data frame with variable names that match the variables specified in |
coord |
A matrix of two columns with corresponding cartesian coordinates. Currently only supports integer coordinates. |
level |
An integer specifying the degree of wavelet decomposition
|
wavelet |
Name of wavelet family. |
wtrafo |
Type of wavelet transform. Either |
b.ini |
Initial parameter values. Default is NULL. |
pad |
A list of parameters for padding wavelet coefficients.
|
control |
a list of parameters for controlling the fitting process.
|
moran.params |
A list of parameters for calculating Moran's I.
|
plot |
A logical value indicating whether to plot autocorrelation of
residuals by distance bin. NOW DEPRECATED in favor of |
customize_plot |
Additional plotting parameters passed to |
x |
An object of class |
... |
Not used |
object |
An object of class |
newdata |
A data frame containing variables used to make predictions. |
sm |
Logical. Should part of smooth components be included? |
newcoord |
New coordinates corresponding to observations in |
WRM can be used to fit linear models for response vectors of different
distributions: gaussian
, binomial
, or poisson
.
As a spatial model, it is a generalized linear model in which the residuals
may be autocorrelated. It corrects for 2-dimensional residual
autocorrelation for regular gridded data sets using the wavelet
decomposition technique. The grid cells are assumed to be square.
Furthermore, this function requires that all predictor variables
be continuous.
An object of class WRM
. This consists of a list with the
following elements:
call
Call
formula
Model formula
family
Family
coord
Coordinates used in the model
b
Estimate of regression parameters
s.e.
Standard errors
z
Depending on the family
, either a z or t value
p
p-values
fitted
Fitted values from the model
resid
Pearson residuals
b.sm
Parameter estimates of neglected smooth part
fitted.sm
Fitted values of neglected smooth part
level
Selected level of wavelet decomposition
wavelet
Selected wavelet
wtrafo
Selected wavelet transformation
padzone
Selected padding zone expansion factor
padform
Selected matrix padding type
n.eff
Effective number of observations
AIC
Akaike information criterion
AICc
AIC score corrected for small sample sizes
LogLik
Log likelihood of the model
ac.glm
Autocorrelation of GLM residuals
ac.wrm
Autocorrelation of WRM residuals
b.ini
Initial parameter values
control
Control parameters for the fitting process
moran.params
Parameters for calculating Moran's I
pad
List of parameters for padding wavelet coefficients
plot
An object of class ggplot
containing information
on the autocorrelation of residuals from the fitted WRM
and a
GLM
For those interested in multimodel inference approaches, WRM
with
level = 1
is identical to mmiWMRR
with scale = 1
.
Gudrun Carl, Sam Levin
Carl, G., Kuehn, I. (2010): A wavelet-based extension of generalized linear models to remove the effect of spatial autocorrelation. Geographical Analysis 42 (3), 323 - 337
Whitcher, B. (2005) Waveslim: basic wavelet routines for one-, two- and three-dimensional signal processing. R package version 1.5.
mmiWMRR
, predict.WRM
, summary.WRM
,
aic.calc
data(musdata) coords <- musdata[,4:5] ## Not run: mwrm <- WRM(musculus ~ pollution + exposure, family = "poisson", data = musdata, coord = coords, level = 1) pred <- predict(mwrm, newdata = musdata) summary(mwrm) plot(mwrm) library(ggplot2) my_wrm_plot <- mwrm$plot # increase axis text size print(my_wrm_plot + ggplot2::theme(axis.text = element_text(size = 15))) ## End(Not run)
data(musdata) coords <- musdata[,4:5] ## Not run: mwrm <- WRM(musculus ~ pollution + exposure, family = "poisson", data = musdata, coord = coords, level = 1) pred <- predict(mwrm, newdata = musdata) summary(mwrm) plot(mwrm) library(ggplot2) my_wrm_plot <- mwrm$plot # increase axis text size print(my_wrm_plot + ggplot2::theme(axis.text = element_text(size = 15))) ## End(Not run)