Compute response pattern indicators for responses to multi-item scales or matrix questions.

Usage

resp_patterns(
  x,
  min_valid_responses = 1,
  defined_patterns,
  arbitrary_patterns,
  min_repetitions = 2,
  id = T
)

Arguments

x

A data frame containing survey responses in wide format. For more information see section "Data requirements" below.

min_valid_responses

Numeric between 0 and 1 of length 1. Minimum share of valid (non-missing) responses a respondent must have for response pattern indicators to be calculated. Default is 1.

defined_patterns

An integer vector describing a single pattern to search for, or a list of integer vectors describing several patterns. Will not be computed if not specified or if an empty vector is supplied.

arbitrary_patterns

A vector of integer values or a list containing vectors of integer values. Each value sets the length of the response sequences that should be searched for. Will not be computed if not specified or if 0 is supplied.

id

Default is TRUE. With the default, a column named id containing integer ids is created. If FALSE is supplied, no id column is created. Alternatively, a numeric or character vector of unique values identifying each respondent can be supplied. It must have the same length as the number of rows of x.

Value

Returns a data frame with response quality indicators per respondent. Dimensions:

  • Rows: Equal to number of rows in x.

  • Columns: Three response pattern indicators + one column for defined patterns (if specified) + one column for arbitrary patterns (if specified) + one id column (unless id = FALSE).

Details

The following response pattern indicators are calculated per respondent:

  • n_transitions: Number of times two consecutive response options differ.

  • mean_string_length: Mean length of strings of identical answers.

  • longest_string_length: Longest length of string of identical answers.

  • (optional) defined_pattern: A list column that contains one named vector per respondent. The names of the vector are repeating patterns found in the responses of a respondent. The values of the vector are how often the pattern specified in the argument "defined_patterns" occurs. See section "Defined patterns" for more information.

  • (optional) arbitrary_patterns: A list column that contains one named vector per respondent. The names of the vector are repeating patterns found in the responses of a respondent. The values of the vector are how often the pattern occurred. See "Arbitrary patterns" for more information.

  • (optional) min_repetitions: Argument that sets the minimum number of times an arbitrary pattern has to be repeated to be retained in the results. Default is 2.
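
The three string-based indicators can be illustrated with a minimal base R sketch built on rle(). The helper string_indicators() is hypothetical and is not part of the package; it assumes a single respondent's responses with no missing values, and the package's own implementation may differ.

```r
# Hypothetical helper, not the package's implementation: computes the
# three string-based indicators for one respondent's responses.
# Assumes `responses` contains no NA values.
string_indicators <- function(responses) {
  runs <- rle(responses)  # runs ("strings") of identical consecutive answers
  c(
    n_transitions = length(runs$lengths) - 1,   # changes between runs
    mean_string_length = mean(runs$lengths),    # average run length
    longest_string_length = max(runs$lengths)   # longest run
  )
}

# Runs are (1, 1), (2), (3, 3, 3): two transitions, mean length 2, longest 3
string_indicators(c(1, 1, 2, 3, 3, 3))
```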

Defined and arbitrary pattern indicators:

Responses of an individual respondent can follow patterns, such as zig-zagging across the response scale over multiple items. There might be a priori knowledge about which response patterns could occur and might be indicative of low-quality responding. For this case the defined_patterns argument can be used to specify one or more patterns whose presence will be checked for each respondent. If no a priori knowledge exists, the arbitrary_patterns argument can be used to check for all patterns of a specified length.

Defined patterns:

A pattern is defined by providing an integer vector, or a list of integer vectors for more than one pattern. A few examples: resp_patterns(x, defined_patterns = c(1,2,3)) checks how often the response pattern "123" occurs in the responses of a single respondent. Supplying list(c(1,2,3), c(3,2,1)) checks how often the two patterns 1 2 3 and 3 2 1 occur individually in the responses of a single respondent. Any number of patterns can be supplied.
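
To make the idea of counting a defined pattern concrete, here is a small base R sketch. The helper count_pattern() is hypothetical and only illustrates the concept for one respondent. In particular, counting overlapping occurrences separately is an assumption of this sketch, and resp_patterns() may count differently.

```r
# Hypothetical helper, not the package's implementation: counts how often
# `pattern` occurs as a run of consecutive responses.
# Assumption: overlapping occurrences are counted separately.
count_pattern <- function(responses, pattern) {
  k <- length(pattern)
  n <- length(responses)
  if (k == 0 || n < k) return(0L)
  starts <- seq_len(n - k + 1)  # every possible start position
  hits <- vapply(
    starts,
    # isTRUE() treats comparisons involving NA as "no match"
    function(i) isTRUE(all(responses[i:(i + k - 1)] == pattern)),
    logical(1)
  )
  sum(hits)
}

responses <- c(1, 2, 3, 2, 1, 1, 2, 3)
count_pattern(responses, c(1, 2, 3))  # 2 (positions 1 and 6)
count_pattern(responses, c(3, 2, 1))  # 1 (position 3)
```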

Arbitrary patterns:

Checks for arbitrary patterns are defined by providing one or more integer values in a numeric vector. Each integer is a sequence length and must be greater than or equal to two. A few examples: resp_patterns(x, arbitrary_patterns = 2) will check for sequences of responses of length two which repeat at least two times. resp_patterns(x, arbitrary_patterns = c(2,3,4,5)) will check for sequences of responses of length two, three, four and five that repeat at least two times.
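
The search for arbitrary patterns can be sketched in base R as follows. The helper find_repeated_patterns() and its argument len are hypothetical names used for illustration only. The sketch assumes one respondent's responses without missing values and counts overlapping windows, which may not match how resp_patterns() counts repetitions.

```r
# Hypothetical helper, not the package's implementation: finds every
# sequence of `len` consecutive responses that occurs at least
# `min_repetitions` times. Assumption: overlapping windows are counted.
find_repeated_patterns <- function(responses, len, min_repetitions = 2) {
  n <- length(responses)
  if (n < len) return(integer(0))
  starts <- seq_len(n - len + 1)
  # Represent each window as a string such as "12"
  windows <- vapply(
    starts,
    function(i) paste(responses[i:(i + len - 1)], collapse = ""),
    character(1)
  )
  counts <- table(windows)
  counts <- counts[counts >= min_repetitions]
  # Named integer vector: names are patterns, values are counts
  setNames(as.integer(counts), names(counts))
}

# Windows of length two: "12" occurs 3 times, "21" 2 times, "23" once
find_repeated_patterns(c(1, 2, 1, 2, 1, 2, 3), len = 2)
```

Raising min_repetitions in this sketch drops the less frequent sequences, mirroring the role the min_repetitions argument is described as having.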

Data requirements:

resp_patterns() assumes that the input data frame is structured in the following way:

  • The data frame is in wide format, meaning each row represents one respondent, each column represents one variable.

  • The variables are in the same order as the questions respondents saw while taking the survey.

  • Reverse keyed variables are in their original form. No items were recoded.

  • All responses have integer values.

  • Questions have the same number of response options.

  • Missing values are set to NA.
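
Some of these requirements can be checked before calling resp_patterns(). The base R sketch below uses illustrative data and shows one way to verify that responses are integer-valued and to compute each respondent's share of valid responses. Comparing that share with a threshold using >= is an assumption about how min_valid_responses is applied, and the threshold of 0.5 is chosen only for illustration.

```r
# Illustrative data: three respondents, three items, NA marks missing values
x <- data.frame(
  var_a = c(1, 4, NA),
  var_b = c(2, NA, NA),
  var_c = c(1, 2, NA)
)

# Check that all non-missing responses are integer-valued
is_integer_valued <- all(unlist(x) %% 1 == 0, na.rm = TRUE)

# Share of valid (non-missing) responses per respondent
valid_share <- rowMeans(!is.na(x))

# Respondents meeting an illustrative threshold of 0.5
# (assumption: the share is compared with >=)
meets_threshold <- valid_share >= 0.5
```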

References

Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4–19. https://doi.org/10.1016/j.jesp.2015.07.006

See also

resp_styles() for calculating response style indicators. resp_distributions() for calculating response distribution indicators. resp_nondifferentiation() for calculating response nondifferentiation indicators.

Author

Matthias Roth, Thomas Knopf

Examples

# A small test data set with ten respondents
# and responses to three survey questions
# with response scales from 1 to 5.
testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

# Calculate response pattern indicators
resp_patterns(x = testdata) |>
    round(2)
#> # A tibble: 10 × 4
#>       id n_transitions mean_string_length longest_string_length
#>    <dbl>         <dbl>              <dbl>                 <dbl>
#>  1     1             2                  1                     1
#>  2     2             2                  1                     1
#>  3     3             2                  1                     1
#>  4     4            NA                 NA                    NA
#>  5     5             2                  1                     1
#>  6     6             2                  1                     1
#>  7     7            NA                 NA                    NA
#>  8     8             2                  1                     1
#>  9     9            NA                 NA                    NA
#> 10    10            NA                 NA                    NA

# Include respondents with NA values by decreasing the
# necessary number of valid responses per respondent.

resp_patterns(
      x = testdata,
      min_valid_responses = 0.2) |>
   round(2)
#> # A tibble: 10 × 4
#>       id n_transitions mean_string_length longest_string_length
#>    <dbl>         <dbl>              <dbl>                 <dbl>
#>  1     1             2                  1                     1
#>  2     2             2                  1                     1
#>  3     3             2                  1                     1
#>  4     4             2                  1                     1
#>  5     5             2                  1                     1
#>  6     6             2                  1                     1
#>  7     7             2                  1                     1
#>  8     8             2                  1                     1
#>  9     9             2                  1                     1
#> 10    10            NA                 NA                    NA