Getting started with resquin
Source:vignettes/getting_started_with_resquin.Rmd
getting_started_with_resquin.Rmd
An introduction to resquin
This short tutorial describe the functions in resquin
and how you can use them on a technical level. For a more substantive
introduction see the (forthcoming) article Using
resquin in practice.
Functions in resquin
calculate response quality
indicators for survey data stored in a data frame or tibble. The
functions assume that the input data frame is structured in the
following way:
- The data frame is in wide format, meaning each row represents one respondent, each column represents one variable.
- All variables have the same number of response options.
- The variables are in same the order as the questions respondents saw while taking the survey.
- All responses have integer values.
- Missing values are set to
NA
. - (For
resp_styles()
) Reverse keyed variables are in their original form. No items were recoded.
Example dataset of survey responses
Consider the following (fake) data set of survey responses.
# A test data set with three items and ten respondents
testdata <- data.frame(
var_a = c(1,4,3,5,3,2,3,1,3,NA),
var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
var_c = c(1,2,3,NA,3,4,4,5,NA,NA))
testdata
#> var_a var_b var_c
#> 1 1 2 1
#> 2 4 5 2
#> 3 3 2 3
#> 4 5 3 NA
#> 5 3 4 3
#> 6 2 1 4
#> 7 3 NA 4
#> 8 1 2 5
#> 9 3 NA NA
#> 10 NA NA NA
The data set contains responses to three survey questions
(var_a,var_b and var_c) from ten respondents. All three survey question
allow responses on a scale from 1 to 5. Some respondents have missing
values, which are set to NA
.
Lets use this data set to calculate response quality indicators.
resp_styles()
: Response style indicators
Response styles capture systematic shifts in respondents response behavior. For example, respondents with an extreme response style may only choose the lowest and highest categories (in our example 1 and 5) while mid-point responder only choose the midpoint of a scale (in our example 3).
To calculate response styles we can use the
resp_styles()
function. First, we need to specify our data
argument x
. Then, we need to specify the minimum and
maximum of the scales used in our questionnaire (scale_min
and scale_max
respectively). Remember that all questions
included must have the same number of response options. We will discuss
the arguments min_valid_responses
and
normalize
later.
library(resquin)
# Calculating response style indicators for all respondents with no missing values
results_response_styles <- resp_styles(
x = testdata,
scale_min = 1,
scale_max = 5,
min_valid_responses = 1, # Excludes respondents with less than 100% valid responses
normalize = T) # Presents results in percent of all responses
round(results_response_styles,2)
#> MRS ARS DRS ERS NERS
#> 1 0.00 0.00 1.00 0.67 0.33
#> 2 0.00 0.67 0.33 0.33 0.67
#> 3 0.67 0.00 0.33 0.00 1.00
#> 4 NA NA NA NA NA
#> 5 0.67 0.33 0.00 0.00 1.00
#> 6 0.00 0.33 0.67 0.33 0.67
#> 7 NA NA NA NA NA
#> 8 0.00 0.33 0.67 0.67 0.33
#> 9 NA NA NA NA NA
#> 10 NA NA NA NA NA
The resulting data frame contains five columns corresponding to the
middle response style (MRS), acquiescence response style (ARS),
disaquiescence response style (DRS), extreme response style (ERS), and
non-extreme response style (NERS) - you can learn more about the
response styles in the help file of the function using
?resp_styles
.
Each respondent receives one value for each indicator, given that
they can be calculated. Because normalize
is set to
TRUE
the values are expressed as the share of responses of a
respondent that can be attributed to a response style. For example,
respondent one has an ERS value of 0.67 meaning that two out of three
responses can be identified as extreme responses. On the other hand,
respondent one does not have any mid-point response, leading to a value
of 0 in the MRS column.
Instead of calculating proportions, we can extract the counts of
responses that can be attributed to a response option by setting
normalize
to FALSE
.
Finally, we can decide to include or exclude respondents from
receiving response style values by setting
min_valid_responses
, which can take values from 0 to 1.
min_valid_responses
sets the share of valid responses
(i.e. non-missing responses) a respondent must have to receive response
style values. A value of 0 indicates that response style values should
be calculated for all respondents, regardless of whether or not they
have missing values. A value of 1 indicates that response styles should
only be calculated for respondents who have valid responses on all
variables. Values between 0 and 1 indicate the share of responses that
need to be valid to be included in the response style calculations.
resp_distributions()
: Intra-individual response
distribution indicators
resp_distributions()
calculates indicators which reflect
the location and variability of responses within a respondent.
resp_distributions()
works similar to
resp_styles()
: We need to specify the data argument and we
can include or exclude respondents from the calculations based on amount
of missing data they exhibit (for an explanation see paragraph
above).
# Calulating response distribution indicators for all respondents with no missing values
results_resp_distributions <- resp_distributions(
x = testdata,
min_valid_responses = 1) # Excludes respondents with less than 100% valid responses
round(results_resp_distributions,2)
#> n_na prop_na ii_mean ii_sd ii_median mahal
#> 1 0 0.00 1.33 0.58 1 2.04
#> 2 0 0.00 3.67 1.53 4 1.60
#> 3 0 0.00 2.67 0.58 3 1.38
#> 4 1 0.33 NA NA NA NA
#> 5 0 0.00 3.33 0.58 3 0.97
#> 6 0 0.00 2.33 1.53 2 1.38
#> 7 1 0.33 NA NA NA NA
#> 8 0 0.00 2.67 2.08 2 1.88
#> 9 2 0.67 NA NA NA NA
#> 10 3 1.00 NA NA NA NA
The resulting data frame contains eight columns:
n_na: number of intra-individual missing answers
prop_na: proportion of intra-individual missing responses
ii_mean: intra-individual mean
ii_median: intra-individual median
ii_sd: intra-individual standard deviation
mahal: Mahalanobis distance per respondent.
You can learn more about the response distribution indicators using
?resp_distributions