Compute response pattern indicators (under development)

Compute response pattern indicators for responses to multi-item scales or matrix questions.

Usage

resp_patterns(
  x,
  min_valid_responses = 1,
  defined_patterns,
  arbitrary_patterns,
  min_repetitions = 2,
  id = TRUE
)

Arguments

x: A data frame containing survey responses in wide format. For more information see section "Data requirements" below.
min_valid_responses: A single numeric value between 0 and 1. Defines the minimum share of valid (non-missing) responses a respondent must have for response pattern indicators to be calculated. Default is 1.
defined_patterns: An integer vector specifying a single pattern to search for, or a list of integer vectors specifying several patterns. Will not be computed if not specified or if an empty vector is supplied.
arbitrary_patterns: A vector of integer values or a list containing vectors of integer values. The values determine the length of the patterns that should be searched for. Will not be computed if not specified or if 0 is supplied.
min_repetitions: Defines the number of times an arbitrary pattern has to be repeated to be retained in the results. Default is 2.
id: Default is TRUE. If the default value is supplied, a column named id with integer ids will be created. If FALSE is supplied, no id column will be created. Alternatively, a numeric or character vector of unique values identifying each respondent can be supplied. It must have the same length as the number of rows of x.

Value

Returns a data frame with response quality indicators per respondent. Dimensions:

Rows: Equal to number of rows in x.
Columns: Three response pattern indicators + one column for defined patterns (if specified) + one column for arbitrary patterns (if specified) + one id column (if specified).

Details

The following response pattern indicators are calculated per respondent:

n_transitions: Number of times two consecutive response options differ.
mean_string_length: Mean length of strings of identical answers.
longest_string_length: Longest length of string of identical answers.
(optional) defined_pattern: A list column that contains one named vector per respondent. The names of the vector are the patterns specified in the argument "defined_patterns" that were found in the responses of a respondent. The values of the vector are how often each pattern occurs. See section "Defined patterns" for more information.
(optional) arbitrary_patterns: A list column that contains one named vector per respondent. The names of the vector are repeating patterns found in the responses of a respondent. The values of the vector are how often the pattern occurred. See "Arbitrary patterns" for more information.
Note: min_repetitions is an argument, not an indicator. It defines the number of times an arbitrary pattern has to be repeated to be retained in the results.
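
The three run-based indicators can be illustrated with base R's rle(). The following is a minimal sketch for a single respondent with complete responses; the helper name pattern_indicators() is hypothetical and this is not necessarily how resp_patterns() computes the values internally.

```r
# Hypothetical helper, not part of the package: computes the three
# run-based indicators for one respondent's responses, given in
# questionnaire order. Assumes there are no missing values.
pattern_indicators <- function(responses) {
  runs <- rle(responses)  # runs of identical consecutive answers
  c(
    n_transitions         = length(runs$lengths) - 1,
    mean_string_length    = mean(runs$lengths),
    longest_string_length = max(runs$lengths)
  )
}

pattern_indicators(c(1, 2, 1))        # every answer differs from the previous one
pattern_indicators(c(3, 3, 3, 4, 4))  # two runs, of length 3 and 2
```

For the responses 1, 2, 1 the sketch yields two transitions and a mean and longest string length of one, in line with the first respondent in the examples below.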

Defined and arbitrary pattern indicators:

Responses of an individual respondent can follow patterns, such as zig-zagging across the response scale over multiple items. There might be a priori knowledge about which response patterns could occur and might be indicative of low-quality responding. In this case, the defined_patterns argument can be used to specify one or more patterns whose presence will be checked for each respondent. If no a priori knowledge exists, the arbitrary_patterns argument can be used to check for all repeating patterns of a specified length.

Defined patterns:

A pattern is defined by providing an integer vector, or a list of integer vectors for more than one pattern. A few examples: resp_patterns(x, defined_patterns = c(1,2,3)) checks how often the response pattern "123" occurs in the responses of a single respondent. resp_patterns(x, defined_patterns = list(c(1,2,3), c(3,2,1))) checks how often the two patterns 1 2 3 and 3 2 1 occur individually in the responses of a single respondent. There can be an arbitrary number of patterns.
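
What counting a defined pattern could look like is sketched below in base R. The helper count_pattern() is hypothetical. The sketch assumes that a pattern must appear in consecutive items, that overlapping occurrences are counted, and that responses are complete; the package may handle these cases differently.

```r
# Hypothetical helper, not part of the package: counts how often
# `pattern` occurs as a run of consecutive responses.
# Assumes no missing values; overlapping occurrences are counted.
count_pattern <- function(responses, pattern) {
  k <- length(pattern)
  n <- length(responses)
  if (k == 0 || n < k) return(0L)
  starts <- seq_len(n - k + 1)  # every possible start position
  hits <- vapply(
    starts,
    function(i) all(responses[i:(i + k - 1)] == pattern),
    logical(1)
  )
  sum(hits)
}

# One respondent, two patterns of interest
responses <- c(1, 2, 3, 2, 1, 1, 2, 3)
patterns <- list(c(1, 2, 3), c(3, 2, 1))

counts <- vapply(patterns, count_pattern, integer(1), responses = responses)
names(counts) <- vapply(patterns, paste, character(1), collapse = "")
counts
```

Here the pattern "123" starts at the first and the sixth item and "321" at the third item, so the named vector holds the counts 2 and 1.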

Arbitrary patterns:

Checks for arbitrary patterns are defined by providing one or more integer values in a numeric vector. The integers must be greater than or equal to two. A few examples: resp_patterns(x, arbitrary_patterns = 2) will check for sequences of responses of length two which repeat at least two times. resp_patterns(x, arbitrary_patterns = c(2,3,4,5)) will check for sequences of responses of length two, three, four and five that repeat at least two times.
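
The idea of searching for arbitrary patterns can be sketched by sliding a window of the given length over the responses and tabulating the sequences it sees. The helper find_repeated_patterns() is hypothetical. The sketch assumes complete, single-digit responses and counts overlapping windows, which may differ from the package's own rules.

```r
# Hypothetical helper, not part of the package: finds all sequences of
# `pattern_length` consecutive responses that occur at least
# `min_repetitions` times. Assumes no missing values and single-digit
# response options (sequences are pasted into strings such as "12").
find_repeated_patterns <- function(responses, pattern_length,
                                   min_repetitions = 2) {
  n <- length(responses)
  if (n < pattern_length) return(integer(0))
  starts <- seq_len(n - pattern_length + 1)
  windows <- vapply(
    starts,
    function(i) paste(responses[i:(i + pattern_length - 1)], collapse = ""),
    character(1)
  )
  counts <- table(windows)
  counts <- counts[counts >= min_repetitions]  # drop rare sequences
  setNames(as.integer(counts), names(counts))
}

find_repeated_patterns(c(1, 2, 1, 2, 3, 1, 2), pattern_length = 2)
```

In this example only the sequence "12" is retained, because it is the only window of length two that occurs at least twice (three times). Raising min_repetitions makes the filter stricter.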

Data requirements:

resp_patterns() assumes that the input data frame is structured in the following way:

The data frame is in wide format, meaning each row represents one respondent, each column represents one variable.
The variables are in the same order as the questions respondents saw while taking the survey.
Reverse-keyed variables are in their original form. No items were recoded.
All responses have integer values.
Questions have the same number of response options.
Missing values are set to NA.
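
Some of these requirements can be checked before calling resp_patterns(). The sketch below uses a hypothetical helper, check_input(), that verifies the integer requirement and returns each respondent's share of valid responses, the quantity that min_valid_responses refers to. Item order and reverse keying cannot be checked from the data alone. Comparing the share with ">=" is an assumption about how the threshold is applied.

```r
# Hypothetical helper, not part of the package: checks that x is a data
# frame of integer-valued responses (or NA) and returns the share of
# valid (non-missing) responses per respondent.
check_input <- function(x) {
  stopifnot(is.data.frame(x))
  values <- unlist(x, use.names = FALSE)
  is_integerish <- is.numeric(values) &&
    all(values == round(values), na.rm = TRUE)
  if (!is_integerish) stop("All responses must be integer values or NA.")
  unname(rowMeans(!is.na(x)))
}

testdata <- data.frame(
  var_a = c(1, 4, 3, 5, 3, 2, 3, 1, 3, NA),
  var_b = c(2, 5, 2, 3, 4, 1, NA, 2, NA, NA),
  var_c = c(1, 2, 3, NA, 3, 4, 4, 5, NA, NA))

valid_share <- check_input(testdata)

# Respondents that would meet a threshold of 0.2 (assuming ">=")
which(valid_share >= 0.2)
```

With this data, respondents 4, 7, 9 and 10 have a share below 1, and only respondent 10 has no valid responses at all.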

References

Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4–19. https://doi.org/10.1016/j.jesp.2015.07.006

Author

Matthias Roth, Thomas Knopf

Examples

# A small test data set with ten respondents
# and responses to three survey questions
# with response scales from 1 to 5.
testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

# Calculate response pattern indicators
resp_patterns(x = testdata) |>
    round(2)
#> # A tibble: 10 × 4
#>       id n_transitions mean_string_length longest_string_length
#>    <dbl>         <dbl>              <dbl>                 <dbl>
#>  1     1             2                  1                     1
#>  2     2             2                  1                     1
#>  3     3             2                  1                     1
#>  4     4            NA                 NA                    NA
#>  5     5             2                  1                     1
#>  6     6             2                  1                     1
#>  7     7            NA                 NA                    NA
#>  8     8             2                  1                     1
#>  9     9            NA                 NA                    NA
#> 10    10            NA                 NA                    NA

# Include respondents with NA values by decreasing the
# required share of valid responses per respondent.
resp_patterns(
  x = testdata,
  min_valid_responses = 0.2) |>
  round(2)
#> # A tibble: 10 × 4
#>       id n_transitions mean_string_length longest_string_length
#>    <dbl>         <dbl>              <dbl>                 <dbl>
#>  1     1             2                  1                     1
#>  2     2             2                  1                     1
#>  3     3             2                  1                     1
#>  4     4             2                  1                     1
#>  5     5             2                  1                     1
#>  6     6             2                  1                     1
#>  7     7             2                  1                     1
#>  8     8             2                  1                     1
#>  9     9             2                  1                     1
#> 10    10            NA                 NA                    NA