Title: | Geographically Weighted Zero Inflated Negative Binomial Regression |
---|---|
Description: | Fits a geographically weighted regression model using zero inflated probability distributions. Has the zero inflated negative binomial distribution (zinb) as default, but also accepts the zero inflated Poisson (zip), negative binomial (negbin) and Poisson distributions. Can also fit the global versions of each regression model. Da Silva, A. R. & De Sousa, M. D. R. (2023). "Geographically weighted zero-inflated negative binomial regression: A general case for count data", Spatial Statistics <doi:10.1016/j.spasta.2023.100790>. Brunsdon, C., Fotheringham, A. S., & Charlton, M. E. (1996). "Geographically weighted regression: a method for exploring spatial nonstationarity", Geographical Analysis, <doi:10.1111/j.1538-4632.1996.tb00936.x>. Yau, K. K. W., Wang, K., & Lee, A. H. (2003). "Zero-inflated negative binomial mixed regression modeling of over-dispersed count data with extra zeros", Biometrical Journal, <doi:10.1002/bimj.200390024>. |
Authors: | Jéssica Vasconcelos [aut, cre], Juliana Rosa [aut], Alan da Silva [aut] |
Maintainer: | Jéssica Vasconcelos <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0 |
Built: | 2024-11-08 04:12:10 UTC |
Source: | https://github.com/jessicavasconcelos/gwzinbr |
Runs a Golden Section Search (GSS) algorithm for determining the optimum bandwidth for the geographically weighted zero inflated negative binomial regression and other spatial regression models.
Golden( data, formula, xvarinf, weight, lat, long, globalmin = TRUE, method, model = "zinb", bandwidth = "cv", offset, force = FALSE, maxg = 100, distancekm = FALSE )
data |
name of the dataset. |
formula |
regression model formula as in |
xvarinf |
name of the covariates for the zero inflated part of the model, default value is |
weight |
name of the variable containing the sample weights, default value is |
lat |
name of the variable containing the latitudes in the dataset. |
long |
name of the variable containing the longitudes in the dataset. |
globalmin |
logical value indicating whether to find a global minimum in the optimization process, default value is TRUE. |
method |
indicates the method to be used for the bandwidth calculation ( |
model |
indicates the model to be used for the regression ( |
bandwidth |
indicates the criterion to be used for the bandwidth calculation ( |
offset |
name of the variable containing the offset values, if null then is set to a vector of zeros, default value is |
force |
logical value indicating whether to force the indicated model even if it is not the best fit for the data, default value is FALSE. |
maxg |
integer indicating the maximum number of iterations for the zero inflated part of the model, default value is 100. |
distancekm |
logical value indicating whether to calculate the distances in km, default value is FALSE. |
A list that contains:
h_values
- Initial values tested for the bandwidth.
iterations
- All bandwidth values tested and respective cv/aic results for each Golden Section Search executed.
gss_results
- Optimum bandwidth found and respective cv/aic.
min_bandwidth
- Optimum bandwidth.
## Data
data(southkorea_covid19)

## GSS algorithm
gss <- Golden(data = southkorea_covid19,
              formula = n_covid1 ~ Morbidity + high_sch_p + Healthcare_access,
              xvarinf = NULL, weight = NULL, lat = "x", long = "y",
              offset = "ln_total", model = "poisson", method = "fixed_g",
              bandwidth = "cv", globalmin = FALSE, distancekm = TRUE,
              force = TRUE)

## Bandwidth
gss$min_bandwidth

## Iterations
gss$iterations
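To make the idea behind a golden section search concrete, the following is a minimal sketch of the generic algorithm applied to a one-dimensional criterion. It is not the code used inside Golden(): the function name `golden_section_min`, its arguments, and the toy criterion `toy_cv` are all hypothetical, and the sketch assumes the criterion is unimodal on the search interval.

```r
# Golden section search for the minimum of a unimodal function f on [lower, upper].
# Hypothetical helper for illustration only; not the implementation inside Golden().
golden_section_min <- function(f, lower, upper, tol = 1e-6, max_iter = 200) {
  r <- (sqrt(5) - 1) / 2             # inverse golden ratio, about 0.618
  x1 <- upper - r * (upper - lower)  # lower interior point
  x2 <- lower + r * (upper - lower)  # upper interior point
  f1 <- f(x1)
  f2 <- f(x2)
  iter <- 0
  while ((upper - lower) > tol && iter < max_iter) {
    if (f1 < f2) {
      # Minimum lies in [lower, x2]; the old x1 becomes the new upper interior point
      upper <- x2
      x2 <- x1
      f2 <- f1
      x1 <- upper - r * (upper - lower)
      f1 <- f(x1)
    } else {
      # Minimum lies in [x1, upper]; the old x2 becomes the new lower interior point
      lower <- x1
      x1 <- x2
      f1 <- f2
      x2 <- lower + r * (upper - lower)
      f2 <- f(x2)
    }
    iter <- iter + 1
  }
  h <- (lower + upper) / 2
  list(minimum = h, objective = f(h), iterations = iter)
}

# Toy criterion standing in for a cv or aic score as a function of the bandwidth h
toy_cv <- function(h) (h - 3)^2 + 1
res <- golden_section_min(toy_cv, lower = 0, upper = 10)
res$minimum
```

Each iteration shrinks the interval by the same factor and reuses one of the two previous evaluations, so only one new evaluation of the criterion is needed per step. This matters when each evaluation means refitting a spatial model.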
Fits a geographically weighted regression model using zero inflated probability distributions. Has the zero inflated negative binomial distribution (zinb) as default, but also accepts the zero inflated Poisson (zip), negative binomial (negbin) and Poisson distributions. Can also fit the global versions of each regression model.
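As a reminder of what a zero inflated negative binomial distribution looks like, here is a minimal sketch of its probability mass function in base R. The helper name `dzinb` and the parameterization (mean `mu`, overdispersion `alpha` with variance `mu + alpha * mu^2`, and structural-zero probability `pi0`) are assumptions chosen for illustration; the package may parameterize the model differently.

```r
# Sketch of a ZINB probability mass function (hypothetical helper, not part of the package).
# y:     non-negative integer counts
# mu:    mean of the negative binomial component
# alpha: overdispersion, assumed here to give variance mu + alpha * mu^2
# pi0:   probability of a structural (excess) zero
dzinb <- function(y, mu, alpha, pi0) {
  nb <- dnbinom(y, size = 1 / alpha, mu = mu)
  # Zeros can come from either the structural-zero process or the count process
  ifelse(y == 0, pi0 + (1 - pi0) * nb, (1 - pi0) * nb)
}

# The probabilities over a wide range of counts should sum to approximately 1
probs <- dzinb(0:500, mu = 2, alpha = 0.5, pi0 = 0.3)
sum(probs)
```

Under this parameterization, setting `pi0 = 0` gives the plain negative binomial, and letting `alpha` approach zero gives the zero inflated Poisson, which is one way to see why the four distributions named above form a nested family.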
gwzinbr( data, formula, xvarinf = NULL, weight = NULL, lat, long, grid = NULL, method, model = "zinb", offset = NULL, distancekm = FALSE, force = FALSE, int_inf = TRUE, maxg = 100, h = NULL )
data |
name of the dataset. |
formula |
regression model formula as in |
xvarinf |
name of the covariates for the zero inflated part of the model, default value is NULL. |
weight |
name of the variable containing the sample weights, default value is NULL. |
lat |
name of the variable containing the latitudes in the dataset. |
long |
name of the variable containing the longitudes in the dataset. |
grid |
name of the dataset containing the coordinates for the model locations, default value is NULL. |
method |
indicates the method to be used for the bandwidth calculation ( |
model |
indicates the model to be used for the regression ( |
offset |
name of the variable containing the offset values; if NULL, the offset is set to a vector of zeros. Default value is NULL. |
distancekm |
logical value indicating whether to calculate the distances in km, default value is FALSE. |
force |
logical value indicating whether to force the indicated model even if it is not the best fit for the data, default value is FALSE. |
int_inf |
logical value indicating whether to include an intercept in the zero inflated part of the model, default value is TRUE. |
maxg |
integer indicating the maximum number of iterations for the zero inflated part of the model, default value is 100. |
h |
integer indicating the bandwidth value (obtained from |
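The examples on this page pass method values "fixed_g" and "adaptive_bsq". Reading these as a fixed Gaussian kernel and an adaptive bisquare kernel is an assumption based on the names alone, and the exact kernels used by the package are not shown here. Under that assumption, a minimal sketch of how such kernels turn distances into weights could look like this (both helper names are hypothetical):

```r
# Fixed Gaussian kernel (assumed form): h is a distance, weights decay smoothly
gaussian_weights <- function(d, h) {
  exp(-0.5 * (d / h)^2)
}

# Adaptive bisquare kernel (assumed form): k is a number of neighbours, and the
# local bandwidth is the distance to the k-th nearest observation
adaptive_bisquare_weights <- function(d, k) {
  h <- sort(d)[k]
  ifelse(d < h, (1 - (d / h)^2)^2, 0)
}

# Distances from one regression point to five observations
d <- c(0, 1, 2, 3, 4)
wg <- gaussian_weights(d, h = 2)
wb <- adaptive_bisquare_weights(d, k = 4)
wg
wb
```

In this sketch a fixed bandwidth is expressed in distance units, while an adaptive bandwidth is a count of neighbours, so observations at or beyond the k-th nearest distance receive zero weight.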
A list that contains:
bandwidth
- Bandwidth value.
measures
- Goodness of fit statistics and other measures.
qntls_gwr_param_estimates
- Quantiles of GWR parameter estimates.
descript_stats_gwr_param_estimates
- Descriptive statistics of GWR parameter estimates.
t_test_gwr_param_estimates
- Results for the parameters significance t tests.
qntls_gwr_se
- Quantiles of GWR standard errors.
descript_stats_gwr_se
- Descriptive statistics of GWR standard errors.
qntls_gwr_zero_infl_param_estimates
- Quantiles of GWR zero inflated parameter estimates.
descript_stats_gwr_zero_infl_param_estimates
- Descriptive statistics of GWR zero inflated parameter estimates.
t_test_gwr_zero_infl_param_estimates
- Results for the zero inflated parameters significance t tests.
qntls_gwr_zero_infl_se
- Quantiles of GWR zero inflated standard errors.
descript_stats_gwr_zero_infl_se
- Descriptive statistics of GWR zero inflated standard errors.
non_stationary_test
- Results for the Non-Stationary Test for GWR parameter estimates.
non_stationary_test_zero_infl
- Results for the Non-Stationary Test for GWR zero inflated parameter estimates.
global_param_estimates
- Parameter estimates for the global model.
analysis_max_like_zero_infl_param_estimated
- Analysis of Maximum Likelihood Zero Inflation Parameter Estimates.
analysis_max_like_gof_measures
- Goodness of fit measures for the Analysis of Maximum Likelihood Zero Inflation Parameter Estimates.
variance_covariance_matrix
- Variance-covariance matrix.
residuals
- Model residuals.
param_estimates_grid
- GWR parameter estimates using grid dataset.
alpha_estimates
- Estimates for the alpha parameter (for zinb and negbin).
gwr_param_estimates
- GWR parameter estimates.
## Data
data(southkorea_covid19)

## Model
mod <- gwzinbr(data = southkorea_covid19,
               formula = n_covid1 ~ Morbidity + high_sch_p + Healthcare_access +
                 diff_sd + Crowding + Migration + Health_behavior,
               lat = "x", long = "y", offset = "ln_total",
               method = "adaptive_bsq", model = "negbin",
               distancekm = TRUE, h = 230, force = TRUE)

## Bandwidth
mod$bandwidth

## Goodness of fit measures
mod$measures
COVID-19 data for South Korea from January 20th 2020 to March 20th 2020.
data(southkorea_covid19)
A data frame with 244 observations on the following 11 variables:
n_covid1
- number of COVID-19 cases in the early phase of the pandemic (pre-quarantine)
Morbidity
- area morbidity rate
high_sch_p
- percentage of high school educated people
Healthcare_access
- access to healthcare
diff_sd
- difficulty of social distancing
Crowding
- area crowding
Migration
- population mobility
Health_behavior
- an index calculated from habits such as alcohol drinking, current smoking, etc.
x
- a numeric vector of x coordinates
y
- a numeric vector of y coordinates
ln_total
- log transformation of the province's total population
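The ln_total variable is used as an offset in the examples on this page. As a rough illustration of what a log-population offset does in a count model, the sketch below fits a global Poisson regression with base R's glm() on simulated data. The simulated data frame `sim` and its columns `cases` and `x` are made up for this illustration and are unrelated to the southkorea_covid19 dataset.

```r
set.seed(1)
n <- 200
pop <- round(runif(n, 1e4, 1e5))        # simulated population sizes
x <- rnorm(n)                           # simulated covariate
rate <- exp(-8 + 0.5 * x)               # true cases per person
cases <- rpois(n, lambda = pop * rate)  # simulated counts
sim <- data.frame(cases = cases, x = x, ln_total = log(pop))

# The offset enters the linear predictor with its coefficient fixed at 1, so the
# model describes the rate per person rather than the raw count
fit <- glm(cases ~ x + offset(ln_total), family = poisson, data = sim)
coef(fit)
```

With the offset in place, the estimated coefficients should land close to the values used to simulate the per-person rate (-8 and 0.5), even though the counts themselves scale with population size.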