Compute response distribution indicators for responses to multi-item scales or matrix questions.

Usage

resp_distributions(x, min_valid_responses = 1)

Arguments

x

A data frame containing survey responses in wide format. For more information see section "Data requirements" below.

min_valid_responses

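Numeric between 0 and 1. Defines the minimum share of valid (non-missing) responses a respondent must have for the response quality indicators to be calculated. Default is 1, in which case indicators other than n_na and prop_na are only calculated for respondents without missing responses.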

Value

Returns a data frame with response quality indicators per respondent. Dimensions:

  • Rows: Equal to number of rows in x.

  • Columns: Six, one for each response distribution indicator.

Details

The following response distribution indicators are calculated per respondent:

  • n_na: number of intra-individual missing answers

  • prop_na: proportion of intra-individual missing responses

  • ii_mean: intra-individual mean

  • ii_median: intra-individual median

  • ii_sd: intra-individual standard deviation

  • mahal: Mahalanobis distance per respondent.

Intra-individual response variability (ii_sd) has been proposed to measure insufficient effort responding (Dunn et al., 2018) and to distinguish between random and conscientious responding (Marjanovic et al., 2015).

Intra-individual location indicators can be used to assess the average location of responses on a set of questions (ii_mean, ii_median).

Mahalanobis distance is an outlier detection indicator. It represents the distance of a participant's responses from the center of a multivariate normal distribution defined by the data of all respondents.
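The row-wise indicators can be approximated with base R, as in the minimal sketch below. It is not the package's implementation: the threshold rule (share of valid responses >= min_valid_responses) is an assumption inferred from the examples further down, and all object names are illustrative.

```r
# Illustrative data: five respondents, three items on a 1-5 scale
items <- data.frame(
  var_a = c(1, 4, 3, 5, 3),
  var_b = c(2, 5, 2, 3, 4),
  var_c = c(1, 2, 3, NA, 3)
)
m <- as.matrix(items)

# Missingness per respondent (row)
n_na    <- rowSums(is.na(m))
prop_na <- n_na / ncol(m)

# Location and variability per respondent, ignoring missing responses
indicators <- data.frame(
  n_na      = n_na,
  prop_na   = prop_na,
  ii_mean   = apply(m, 1, mean,   na.rm = TRUE),
  ii_sd     = apply(m, 1, sd,     na.rm = TRUE),
  ii_median = apply(m, 1, median, na.rm = TRUE)
)

# Assumed threshold rule: keep respondents whose share of valid
# responses is at least min_valid_responses
min_valid_responses <- 1
keep <- (1 - prop_na) >= min_valid_responses
indicators[!keep, c("ii_mean", "ii_sd", "ii_median")] <- NA

round(indicators, 2)
```

With min_valid_responses = 1, the fourth respondent (one missing response) keeps n_na and prop_na but gets NA for the other indicators.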
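As a rough illustration of the idea, the distance can be computed with stats::mahalanobis(), which returns squared distances. The sketch below restricts the data to complete cases and takes the square root; both choices are assumptions made for illustration, and how resp_distributions() treats respondents with missing values is not shown here.

```r
testdata <- data.frame(
  var_a = c(1, 4, 3, 5, 3, 2, 3, 1, 3, NA),
  var_b = c(2, 5, 2, 3, 4, 1, NA, 2, NA, NA),
  var_c = c(1, 2, 3, NA, 3, 4, 4, 5, NA, NA)
)

# Assumption: use only respondents without missing values
complete <- testdata[complete.cases(testdata), ]

# Center and covariance are estimated from all (complete) respondents
center <- colMeans(complete)
covmat <- cov(complete)

# mahalanobis() returns squared distances; take the square root
d2 <- mahalanobis(complete, center = center, cov = covmat)
d  <- sqrt(d2)

round(d, 2)
```

For the first respondent this gives 2.04, which agrees with the mahal column of the first example in the Examples section.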

Data requirements

resp_distributions() assumes that data comes from multi-item scales or matrix questions, which have the same number and labeling of response options for many questions. The input data frame must be structured in the following way:

  • The data frame is in wide format, meaning each row represents one respondent, each column represents one variable.

  • All responses have integer values.

  • Missing values are set to NA.
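The requirements above can be checked before calling resp_distributions(). The sketch below is one possible way to do so; the helper check_resp_data() and the missing-value code -99 are hypothetical and not part of the package.

```r
# Hypothetical helper: TRUE if x is a data frame whose columns are all
# numeric with whole-number values (NA is allowed)
check_resp_data <- function(x) {
  if (!is.data.frame(x)) return(FALSE)
  is_whole <- vapply(
    x,
    function(col) is.numeric(col) && all(col == round(col), na.rm = TRUE),
    logical(1)
  )
  all(is_whole)
}

# Illustrative raw data in which missing values were coded as -99
raw <- data.frame(
  var_a = c(1, 4, -99, 5),
  var_b = c(2, 5, 2, -99)
)

# Recode the (assumed) missing-value code to NA, column by column
cleaned <- raw
cleaned[] <- lapply(cleaned, function(col) replace(col, col %in% -99, NA))

check_resp_data(cleaned)
```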

Reverse coding of variables

The interpretation of the indicators depends on whether response data of negatively worded questions was reversed or not:

  • Do not reverse data of negatively worded questions if you want to assess average response patterns (Dunn et al., 2018).

  • Reverse data of negatively worded questions if you want to assess whether responses are distributed randomly or not with respect to an assumed latent variable (Marjanovic et al., 2015).
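If reversal is wanted, one common way to reverse an item is to subtract each response from the sum of the lowest and highest scale values. The sketch below assumes a 1 to 5 response scale; reverse_code() is a hypothetical helper, not a function of this package.

```r
# Hypothetical helper: reverse responses on a scale from
# min_value to max_value; NA stays NA
reverse_code <- function(x, min_value = 1, max_value = 5) {
  (min_value + max_value) - x
}

items <- data.frame(
  var_a = c(1, 4, 3),
  var_b = c(2, 5, NA)  # assume var_b is the negatively worded item
)

items_reversed <- items
items_reversed$var_b <- reverse_code(items$var_b)
items_reversed$var_b
```

Keeping both the original and the reversed data frame makes it possible to calculate the indicators under either interpretation.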

Mahalanobis distance could not be calculated

Under certain circumstances, the Mahalanobis distance cannot be calculated. This can happen if there is high collinearity (strong correlation between variables) or if there are too many missing values. Although this can occur in ordinary survey data, the message can also indicate that something in the data is "off" for one of these reasons. A manual inspection for low-quality responses can be a next step.
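A manual inspection could start with the share of missing values per variable, the pairwise correlations, and the rank of the covariance matrix. The sketch below uses made-up data in which var_b is an exact multiple of var_a; it illustrates the diagnostics only and is not how the package decides to issue the message.

```r
# Illustrative data: var_b is exactly 2 * var_a (perfect collinearity)
items <- data.frame(
  var_a = c(1, 2, 3, 4, 5),
  var_b = c(2, 4, 6, 8, 10),
  var_c = c(1, 3, 2, 5, 4)
)

# Share of missing values per variable
share_missing <- colMeans(is.na(items))

# Largest absolute correlation between two different variables
r <- cor(items, use = "pairwise.complete.obs")
diag(r) <- 0
max_abs_cor <- max(abs(r))

# A covariance matrix with rank below the number of variables
# cannot be inverted, so no Mahalanobis distance is available
covmat <- cov(items, use = "complete.obs")
cov_rank <- qr(covmat)$rank

max_abs_cor             # 1: var_a and var_b are perfectly correlated
cov_rank < ncol(items)  # TRUE: covariance matrix is rank deficient
```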

References

Dunn, Alexandra M., Eric D. Heggestad, Linda R. Shanock, and Nels Theilgard. 2018. “Intra-Individual Response Variability as an Indicator of Insufficient Effort Responding: Comparison to Other Indicators and Relationships with Individual Differences.” Journal of Business and Psychology 33(1):105–21. doi: 10.1007/s10869-016-9479-0.

Marjanovic, Zdravko, Ronald Holden, Ward Struthers, Robert Cribbie, and Esther Greenglass. 2015. “The Inter-Item Standard Deviation (ISD): An Index That Discriminates between Conscientious and Random Responders.” Personality and Individual Differences 84:79–83. doi: 10.1016/j.paid.2014.08.021.

See also

resp_styles() for calculating response style indicators.

Author

Matthias Roth, Matthias Bluemke & Clemens Lechner

Examples

# A small test data set with ten respondents
# and responses to three survey questions
# with response scales from 1 to 5.
testdata <- data.frame(
  var_a = c(1,4,3,5,3,2,3,1,3,NA),
  var_b = c(2,5,2,3,4,1,NA,2,NA,NA),
  var_c = c(1,2,3,NA,3,4,4,5,NA,NA))

# Calculate response distribution indicators
resp_distributions(x = testdata) |>
    round(2)
#>    n_na prop_na ii_mean ii_sd ii_median mahal
#> 1     0    0.00    1.33  0.58         1  2.04
#> 2     0    0.00    3.67  1.53         4  1.60
#> 3     0    0.00    2.67  0.58         3  1.38
#> 4     1    0.33      NA    NA        NA    NA
#> 5     0    0.00    3.33  0.58         3  0.97
#> 6     0    0.00    2.33  1.53         2  1.38
#> 7     1    0.33      NA    NA        NA    NA
#> 8     0    0.00    2.67  2.08         2  1.88
#> 9     2    0.67      NA    NA        NA    NA
#> 10    3    1.00      NA    NA        NA    NA

# Include respondents with NA values by decreasing the
# necessary share of valid responses per respondent.
resp_distributions(
  x = testdata,
  min_valid_responses = 0.2) |>
    round(2)
#>    n_na prop_na ii_mean ii_sd ii_median mahal
#> 1     0    0.00    1.33  0.58       1.0  2.27
#> 2     0    0.00    3.67  1.53       4.0  1.68
#> 3     0    0.00    2.67  0.58       3.0  1.05
#> 4     1    0.33    4.00  1.41       4.0  2.21
#> 5     0    0.00    3.33  0.58       3.0  1.24
#> 6     0    0.00    2.33  1.53       2.0  1.29
#> 7     1    0.33    3.50  0.71       3.5  0.71
#> 8     0    0.00    2.67  2.08       2.0  2.24
#> 9     2    0.67    3.00   NaN       3.0  0.24
#> 10    3    1.00      NA    NA        NA    NA