The goal of surveysd is to combine all steps necessary to use
calibrated bootstrapping with custom estimation functions. This vignette
covers the usage of the most important functions. For insights into
the theory used in this package, refer to vignette("methodology").
A test data set based on data(eusilc, package = "laeken") can be created with demo.eusilc():
library(surveysd)
set.seed(1234)
eusilc <- demo.eusilc(n = 2, prettyNames = TRUE)
eusilc[1:5, .(year, povertyRisk, gender, pWeight)]
year | povertyRisk | gender | pWeight |
---|---|---|---|
2010 | FALSE | female | 504.5696 |
2010 | FALSE | male | 504.5696 |
2010 | FALSE | male | 504.5696 |
2010 | FALSE | female | 493.3824 |
2010 | FALSE | male | 493.3824 |
Use stratified resampling without replacement to generate 10 samples. Those samples are consistent with respect to the reference periods.
dat_boot <- draw.bootstrap(eusilc, REP = 10, hid = "hid", weights = "pWeight",
                           strata = "region", period = "year")
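The idea of stratified resampling without replacement can be illustrated with a minimal base R sketch on a made-up household table. It is not the implementation of draw.bootstrap(): the names hh and draw_one are hypothetical, the half-sample size per stratum is an assumption, and the rescaling of weights as well as the handling of periods are omitted.

```r
set.seed(1)

# Toy household table (made up): 8 households in 2 regions.
hh <- data.frame(
  hid    = 1:8,
  region = rep(c("A", "B"), each = 4)
)

# Within each region, draw half of the households without replacement.
# The half-sample size is an assumption made for this sketch.
draw_one <- function(hh) {
  picked <- lapply(split(hh$hid, hh$region), function(ids) {
    ids[sample.int(length(ids), size = floor(length(ids) / 2))]
  })
  hh$hid %in% unlist(picked)
}

# Three replicates, one logical column per sample.
selected <- replicate(3, draw_one(hh))
colSums(selected)   # number of households selected in each replicate
```

Because sampling happens separately inside each region, every replicate contains the same number of households per stratum.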
Calibrate each sample according to the distribution of gender (on a
personal level) and region (on a household level).
dat_boot_calib <- recalib(dat_boot, conP.var = "gender", conH.var = "region",
                          epsP = 1e-2, epsH = 2.5e-2, verbose = FALSE)
dat_boot_calib[1:5, .(year, povertyRisk, gender, pWeight, w1, w2, w3, w4)]
year | povertyRisk | gender | pWeight | w1 | w2 | w3 | w4 |
---|---|---|---|---|---|---|---|
2010 | FALSE | female | 504.5696 | 1025.360 | 0.4581938 | 0.4456302 | 0.4520549 |
2010 | FALSE | male | 504.5696 | 1025.360 | 0.4581938 | 0.4456302 | 0.4520549 |
2010 | FALSE | male | 504.5696 | 1025.360 | 0.4581938 | 0.4456302 | 0.4520549 |
2011 | FALSE | female | 504.5696 | 1024.862 | 0.4721126 | 0.4582807 | 0.4608312 |
2011 | FALSE | male | 504.5696 | 1024.862 | 0.4721126 | 0.4582807 | 0.4608312 |
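To illustrate what calibrating to a distribution means, the following base R sketch rescales a set of weights so that their totals per gender match given population totals. All values, including the targets, are made up, and only a single margin is handled. recalib() works with several margins and the tolerances epsP and epsH, so this shows the basic idea only, not the package's procedure.

```r
# Toy person-level data (made up).
gender <- c("female", "female", "male", "male", "male")
w      <- c(100, 150, 120, 80, 50)   # weights before calibration

# Hypothetical population totals the weights should reproduce.
target <- c(female = 300, male = 200)

# Current weighted totals per gender.
current <- tapply(w, gender, sum)

# Scaling factor per gender, then applied to each person.
scale_by_gender <- target[names(current)] / as.numeric(current)
w_cal <- unname(w * scale_by_gender[gender])

# Weighted totals after calibration match the targets.
tapply(w_cal, gender, sum)
```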
Estimate the share of persons at risk of poverty per period and gender.
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = "gender")
err.est$Estimates
year | n | N | gender | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|
2010 | 7267 | 3979572 | male | 12.02660 | 0.5882841 |
2010 | 7560 | 4202650 | female | 16.73351 | 0.7473909 |
2010 | 14827 | 8182222 | NA | 14.44422 | 0.6626295 |
2011 | 7267 | 3979572 | male | 12.81921 | 0.6059190 |
2011 | 7560 | 4202650 | female | 16.62488 | 0.7355060 |
2011 | 14827 | 8182222 | NA | 14.77393 | 0.6631967 |
The output contains estimates (val_povertyRisk) as well as standard
errors (stE_povertyRisk), both measured in percent. The rows with
gender = NA denote the aggregate over all genders for the
corresponding year.
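How an estimate and its bootstrap standard error relate can be sketched in base R on made-up data. The helper weighted_share is hypothetical and stands in for a weighted ratio in percent. The sketch assumes the standard error is taken as the standard deviation of the estimates across replicate weights; see vignette("methodology") for the definition the package actually uses.

```r
# Toy data: 1 = at risk of poverty, 0 = not at risk (values are made up).
x <- c(1, 0, 0, 1, 0, 0)
w <- c(10, 20, 30, 40, 50, 50)   # base weights

# Hypothetical helper: weighted share of x == 1, in percent.
weighted_share <- function(x, w) {
  100 * sum(w[x == 1]) / sum(w)
}

# Point estimate from the base weights: 100 * 50 / 200 = 25.
est <- weighted_share(x, w)

# Made-up replicate weights, one column per bootstrap sample.
w_rep <- cbind(
  w1 = c(20, 0, 60, 40, 80, 0),
  w2 = c(0, 40, 30, 80, 0, 50),
  w3 = c(10, 20, 0, 60, 50, 60)
)

# Re-estimate with each set of replicate weights ...
rep_est <- apply(w_rep, 2, function(wi) weighted_share(x, wi))

# ... and use the spread of these estimates as the standard error.
st_err <- sd(rep_est)
```

With these toy numbers the replicate estimates are 30, 40 and 35 percent, so the resulting standard error is 5 percentage points.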
Estimate the share of persons at risk of poverty per period for each
region, each gender, and each combination of both.
group <- list("gender", "region", c("gender", "region"))
err.est <- calc.stError(dat_boot_calib, var = "povertyRisk", fun = weightedRatio, group = group)
head(err.est$Estimates)
year | n | N | gender | region | val_povertyRisk | stE_povertyRisk |
---|---|---|---|---|---|---|
2010 | 261 | 122741.8 | male | Burgenland | 17.414524 | 3.831697 |
2010 | 288 | 137822.2 | female | Burgenland | 21.432598 | 3.243412 |
2010 | 359 | 182732.9 | male | Vorarlberg | 12.973259 | 1.869263 |
2010 | 374 | 194622.1 | female | Vorarlberg | 19.883637 | 3.112974 |
2010 | 440 | 253143.7 | male | Salzburg | 9.156964 | 1.809600 |
2010 | 484 | 282307.3 | female | Salzburg | 17.939382 | 2.587059 |
## skipping 54 more rows