Compute response pattern indicators for responses to multi-item scales or matrix questions.
Usage
resp_patterns(
x,
min_valid_responses = 1,
defined_patterns,
arbitrary_patterns,
min_repetitions = 2
)
Arguments
- x
A data frame containing survey responses in wide format. For more information see section "Data requirements" below.
- min_valid_responses
Numeric value between 0 and 1. Defines the minimum share of valid (non-missing) responses a respondent must have for response pattern indicators to be calculated. Default is 1.
- defined_patterns
A character vector of response patterns to search for. This indicator is not computed if the argument is not specified or if an empty vector is supplied.
- arbitrary_patterns
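A vector of integer values or a list containing vectors of integer values. The values determine the lengths of the patterns that are searched for. This indicator is not computed if the argument is not specified or if 0 is supplied.
- min_repetitions
Minimum number of times a pattern must repeat to be reported by the arbitrary pattern search. Default is 2.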
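The effect of min_valid_responses can be illustrated with a short base R sketch. It assumes that a respondent is included when the share of non-missing values in their row is greater than or equal to min_valid_responses; this is an illustration of that assumed rule, not resquin's internal code, and share_valid is a hypothetical name.

```r
# Hypothetical sketch, not resquin's internal code: which respondents
# reach a given share of valid (non-missing) responses?
testdata <- data.frame(
  var_a = c(1, 4, 3, 5, 3, 2, 3, 1, 3, NA),
  var_b = c(2, 5, 2, 3, 4, 1, NA, 2, NA, NA),
  var_c = c(1, 2, 3, NA, 3, 4, 4, 5, NA, NA))

# Share of non-missing responses per row (respondent)
share_valid <- rowMeans(!is.na(testdata))

# Assumed rule: include a respondent if share_valid >= min_valid_responses
share_valid >= 1    # default min_valid_responses = 1
share_valid >= 0.2  # min_valid_responses = 0.2
```

With this test data the first comparison is FALSE for respondents 4, 7, 9 and 10, and the second only for respondent 10, which lines up with the rows reported as NA in the two example calls in the Examples section.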
Value
Returns a data frame with response quality indicators per respondent. Dimensions:
Rows: Equal to the number of rows in x.
Columns: One column per response pattern indicator (see section "Details").
Details
The following response pattern indicators are calculated per respondent:
n_transitions: Number of times two consecutive response options differ.
mean_string_length: Mean length of strings of identical answers.
longest_string_length: Length of the longest string of identical answers.
(optional) defined_pattern: A list column that contains one named vector per respondent. The names of the vector are the patterns specified in the argument "defined_patterns". The values of the vector are how often each of these patterns occurs in the responses of the respondent. See section "Defined patterns" for more information.
(optional) arbitrary_patterns: A list column that contains one named vector per respondent. The names of the vector are the repeating patterns found in the responses of the respondent. The values of the vector are how often each pattern occurs. See section "Arbitrary patterns" for more information.
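The three string-based indicators can be sketched in base R with rle(), which splits a vector into runs of identical consecutive values. This is a minimal illustration for a single respondent with complete responses (no NA handling), not resquin's implementation; pattern_indicators is a hypothetical helper.

```r
# Hypothetical helper, not part of resquin: computes the three
# string-based indicators for one respondent's complete responses.
pattern_indicators <- function(responses) {
  runs <- rle(responses)  # runs of identical consecutive answers
  c(
    # every boundary between two runs is one change of response
    n_transitions = length(runs$lengths) - 1,
    mean_string_length = mean(runs$lengths),
    longest_string_length = max(runs$lengths)
  )
}

pattern_indicators(c(1, 2, 1))
pattern_indicators(c(3, 3, 3, 5, 5, 1))
```

For respondent 1 of the test data in the Examples section (responses 1, 2, 1) this gives 2, 1 and 1, matching the output shown there. The second call has runs of length 3, 2 and 1, so it gives 2 transitions, a mean string length of 2 and a longest string length of 3.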
Defined and arbitrary pattern indicators:
Responses of an individual respondent can follow patterns, such as zig-zagging across the response scale over multiple items. There may be a priori knowledge about which response patterns could occur and might indicate low-quality responding. In this case, the defined_patterns argument can be used to specify one or more patterns whose presence is checked for each respondent. If no a priori knowledge exists, the arbitrary_patterns argument can be used to check for all patterns of a specified length.
Defined patterns:
Patterns are defined by providing one or more patterns in a character vector. A few examples: resp_patterns(x, defined_patterns = "123") checks how often the response pattern "123" occurs in the responses of a single respondent. resp_patterns(x, defined_patterns = c("123", "321")) checks how often each of the two patterns "123" and "321" occurs in the responses of a single respondent. There can be an arbitrary number of patterns.
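Counting a defined pattern can be sketched by collapsing a respondent's responses into a string and comparing every window of the pattern's length with the pattern. This sketch assumes single-digit response codes and counts overlapping occurrences; whether resquin counts overlaps the same way is an assumption, and count_defined_patterns is a hypothetical helper.

```r
# Hypothetical helper, not part of resquin: counts (possibly overlapping)
# occurrences of each pattern in one respondent's responses.
# Assumes single-digit response codes, e.g. a 1 to 5 scale.
count_defined_patterns <- function(responses, patterns) {
  s <- paste0(responses, collapse = "")  # e.g. c(1, 2, 3) -> "123"
  vapply(patterns, function(p) {
    k <- nchar(p)
    if (k > nchar(s)) return(0L)
    starts <- seq_len(nchar(s) - k + 1)
    # compare every window of length k with the pattern
    sum(substring(s, starts, starts + k - 1) == p)
  }, integer(1))
}

count_defined_patterns(c(1, 2, 3, 2, 1, 2, 3), c("123", "321"))
```

For the responses 1, 2, 3, 2, 1, 2, 3 the result is a named vector with "123" occurring twice and "321" once, mirroring the named-vector layout described for the defined_pattern column.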
Arbitrary patterns
Checks for arbitrary patterns are defined by providing one or more integer values in a numeric vector. The integers must be greater than or equal to two. A few examples: resp_patterns(x, arbitrary_patterns = 2) checks for sequences of responses of length two that repeat at least two times. resp_patterns(x, arbitrary_patterns = c(2, 3, 4, 5)) checks for sequences of responses of length two, three, four and five that repeat at least two times.
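A search for arbitrary patterns of one length can be sketched by sliding a window over the responses, tabulating the windows and keeping those that occur often enough. The parameter min_repetitions mirrors the argument in the usage above, but the exact counting rule (overlapping windows, single-digit codes) is an assumption, and find_arbitrary_patterns is a hypothetical helper.

```r
# Hypothetical helper, not part of resquin: finds every sequence of `len`
# consecutive responses that occurs at least `min_repetitions` times.
# Overlapping windows are counted; assumes single-digit response codes.
find_arbitrary_patterns <- function(responses, len, min_repetitions = 2) {
  n <- length(responses)
  if (len < 2 || len > n) return(integer(0))
  starts <- seq_len(n - len + 1)
  # each window of `len` consecutive responses, collapsed to e.g. "12"
  windows <- vapply(starts, function(i) {
    paste0(responses[i:(i + len - 1)], collapse = "")
  }, character(1))
  counts <- table(windows)
  counts <- setNames(as.vector(counts), names(counts))
  counts[counts >= min_repetitions]
}

find_arbitrary_patterns(c(1, 2, 1, 2, 1, 2), len = 2)
```

For a respondent who zig-zags between 1 and 2 over six items, the sequence "12" occurs three times and "21" twice, so both are reported. Running the helper once per requested length corresponds to supplying several lengths, as in arbitrary_patterns = c(2, 3, 4, 5).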
Data requirements:
resp_patterns() assumes that the input data frame is structured in the following way:
The data frame is in wide format, meaning each row represents one respondent and each column represents one variable.
The variables are in the same order as the questions respondents saw while taking the survey.
Reverse keyed variables are in their original form. No items were recoded.
All responses have integer values.
Questions have the same number of response options.
Missing values are set to NA.
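Survey exports often use numeric codes for missing answers, which would otherwise be treated as valid responses. The sketch below assumes a hypothetical export where missing answers are coded as -99 and shows one way to recode them to NA and check that the remaining responses are whole numbers; the data and the -99 code are made up for illustration.

```r
# Hypothetical raw export in which missing answers are coded as -99
raw <- data.frame(
  item_1 = c(1, 5, -99),
  item_2 = c(2, -99, 3),
  item_3 = c(4, 4, 2))

prepared <- raw
# Recode the assumed missing-value code to NA
prepared[prepared == -99] <- NA

# Check that all non-missing responses are whole numbers
values <- unlist(prepared)
all_integer <- all(values == round(values), na.rm = TRUE)
all_integer
```

After recoding, prepared contains two NA values and all other responses are whole numbers, so all_integer is TRUE.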
See also
resp_styles() for calculating response style indicators.
resp_distributions() for calculating response distribution indicators.
resp_nondifferentiation() for calculating response nondifferentiation indicators.
Examples
# A small test data set with ten respondents
# and responses to three survey questions
# with response scales from 1 to 5.
testdata <- data.frame(
var_a = c(1,4,3,5,3,2,3,1,3,NA),
var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
var_c = c(1,2,3,NA,3,4,4,5,NA,NA))
# Calculate response pattern indicators
resp_patterns(x = testdata) |>
round(2)
#> # A tibble: 10 × 3
#> n_transitions mean_string_length longest_string_length
#> <dbl> <dbl> <dbl>
#> 1 2 1 1
#> 2 2 1 1
#> 3 2 1 1
#> 4 NA NA NA
#> 5 2 1 1
#> 6 2 1 1
#> 7 NA NA NA
#> 8 2 1 1
#> 9 NA NA NA
#> 10 NA NA NA
# Include respondents with NA values by decreasing the
# required share of valid responses per respondent.
resp_patterns(
x = testdata,
min_valid_responses = 0.2) |>
round(2)
#> # A tibble: 10 × 3
#> n_transitions mean_string_length longest_string_length
#> <dbl> <dbl> <dbl>
#> 1 2 1 1
#> 2 2 1 1
#> 3 2 1 1
#> 4 2 1 1
#> 5 2 1 1
#> 6 2 1 1
#> 7 2 1 1
#> 8 2 1 1
#> 9 2 1 1
#> 10 NA NA NA