The Impact of Scale Direction on Data Quality

Survey Research Methods
ISSN 1864-3361
doi:10.18148/srm/2025.v19i2.8384
Ting Yan, tyanuconn@gmail.com, NORC at the University of Chicago, Chicago, Illinois, USA
Alexandru Cernat, alexandru.cernat@manchester.ac.uk, University of Manchester, Manchester, UK
Florian Keusch, f.keusch@uni-mannheim.de, University of Mannheim, Mannheim, Germany
© 2025 European Survey Research Association

Survey literature has documented a tendency for respondents to select scale options closer to the start of a scale, resulting in scale direction effects. This paper builds upon the literature and expands it in two important ways. First, we experimentally manipulated scale length, scale labeling, and scale type, and fully crossed them with scale direction. This experimental setup allows us to examine the moderating effects of other scale features on scale direction effects. Second, previous literature largely examined the impact of scale direction on distributional properties of answers. In this paper, we conducted a systematic assessment of reliability and validity of answers as well as measurement equivalence by scale direction, allowing us to gain a deeper understanding and a more complete picture of the impact of scale direction on resultant answers. The findings will have important implications for question writers and will provide practical guidance on the direction of scales.

Supplementary Information

The online version of this article (https://doi.org/10.18148/srm/2025.v19i2.8384) contains supplementary information.

1 Introduction

Scale direction effects refer to respondents’ tendency to select scale points closer to the beginning of a response scale, holding other scale features constant (Yan & Keusch, 2015). Unlike response order effects observed for unordered response categories, which are manifested as primacy effects when the communication channel is visual (e.g., a web or paper survey) and recency effects when the response option list is read to respondents (e.g., a telephone survey), scale direction effects are not contingent upon the mode of data collection. Scale direction is empirically shown to produce primacy effects across a variety of modes of data collection such as telephone surveys (e.g., Yan & Keusch, 2015), face-to-face surveys (e.g., Carp, 1974), web surveys (e.g., Garbarski, Schaeffer, & Dykema, 2015, 2019; Höhne, Krebs, & Kühnel, 2023; Keusch & Yan, 2018; Tourangeau et al., 2017), mobile web surveys (e.g., Keusch & Yan, 2017; Tourangeau et al., 2017), and paper questionnaires (e.g., Höhne & Krebs, 2018; Israel, 2006).

Two mechanisms have been proposed to account for scale direction effects. According to the satisficing account (Krosnick, 1991, 1999), respondents unwilling or unable to exert the cognitive effort required to provide an optimal response satisfice by taking a cognitive short-cut and selecting the first acceptable or satisfactory scale point, leading to scale direction effects. Researchers resorting to the satisficing notion to explain scale direction effects essentially treat scale direction effects as a special case of primacy effects (e.g., Krosnick & Presser, 2010). The second possible mechanism is the anchoring-and-adjustment heuristic proposed by Tversky and Kahneman (1974). According to this account, respondents who are provided a response scale use the beginning scale point as an anchor and then make adjustments to that anchor until they reach a satisfactory scale point. Yan and Keusch (2015) demonstrated empirically that the anchoring-and-adjustment heuristic is at work for scale direction effects. Because both mechanisms predict primacy effects, it is hard to pinpoint which mechanism is at work under what circumstances. Conceptually speaking, the two mechanisms differ with respect to moderators of the scale direction effects. The satisficing account argues for a stronger scale direction effect for difficult tasks and among respondents with limited cognitive capacity and decreased motivation (Krosnick, 1991, 1999). However, empirical research showed that satisficing cannot entirely account for scale direction effects as these tend to be observed across the board among respondents who were at a high risk of satisficing and those who were not (e.g., Keusch & Yan, 2018) and under conditions that were conducive to satisficing and conditions that were not (e.g., Mingay & Greenwell, 1989; Carp, 1974). By contrast, conditions facilitating the use of anchoring-and-adjustment heuristic include relevant anchors (Mussweiler & Strack, 1999), plausible anchors (Wegener & Petty, 1995), respondents with a high need for cognition (Epley & Gilovich, 2006), respondents with high agreeableness (Eroglu & Croxton, 2010), respondents without required knowledge (Wilson et al., 1996), and respondents attentive to the anchor (Wilson et al., 1996). So far, only Yan and Keusch (2015) demonstrated empirically a stronger scale direction effect among respondents without the necessary knowledge, consistent with the predictions of the anchoring-and-adjustment process.

There are a few additional gaps in the literature. First, most of the research on scale direction effects focused on rating scales with attitudinal items. Example rating scales examined in earlier research include agreement scales (e.g., Höhne & Krebs, 2018; Leon et al., 2022; Yan & Keusch, 2018), satisfaction scales (e.g., Smyth et al., 2019), and evaluative scales (e.g., Garbarski et al., 2015, 2019). By contrast, only three studies investigated scale direction effects for questions using frequency scales (Carp, 1974; Keusch & Yan, 2019; Tourangeau et al., 2017). Carp (1974) examined 10 questions on frequency of trips using an 8‑point fully labeled frequency scale but failed to find evidence indicating that scale direction affected answers to these behavioral questions. Keusch and Yan (2019) varied the scale direction, scale alignment, and verbal labeling of a 5-point fully labeled unipolar frequency scale applied to 10 survey items. Again, they did not find any significant effect of scale direction on resultant answers. Tourangeau and colleagues (2017) assessed scale direction effects for six behavior items using frequency scales and found significant scale direction effects only when the frequency scale had seven scale points but not when a 5-point scale was used. To advance the scale direction literature, this paper examines the impact of scale direction on answers to questions that measure the same constructs but use either agreement or frequency scales.

Second, most studies on scale direction effects did not attempt to take into consideration potential confounding or moderating impact of other characteristics of the question and the scale. They tend to use scales as they are, without experimentally manipulating other features of the question and the scale. As a result, scale direction effects are observed in existing empirical research on both end-labeled scales (e.g., Keusch & Yan, 2018) and fully-labeled scales (e.g., Leon et al., 2022), on 5‑point scales (e.g., Garbarski et al., 2015, 2019) and longer scales (Krebs and Hoffmeyer-Zlotnik, 2010), on scales vertically aligned (e.g., Christian et al., 2009; Höhne and Lenzner, 2015) and horizontally aligned (e.g., Keusch & Yan, 2018), and on bipolar scales (e.g., Hofmans et al., 2007) and unipolar scales (e.g., Höhne et al., 2023). But it is not clear what conditions are more prone to scale direction effects.

A secondary data analysis demonstrates stronger scale direction effects for longer scales, questions with both subjective and behavioral components, and survey items appearing earlier in a questionnaire (Yan et al., 2018). In addition, the moderating impact of question type, question location, and scale length on scale direction effects is more pronounced for items administered via Computer-Assisted Personal Interviewing (CAPI) than in self-administration.

Two studies experimentally varied scale length in addition to scale direction (Höhne et al., 2023; Tourangeau et al., 2017). Both found scale direction effects for 7‑point scales only, but not for 5‑point scales. Three studies experimentally varied scale alignment but failed to find significant interaction effects between scale direction and scale alignment (Tourangeau et al., 2017; Keusch and Yan, 2019; Garbarski, Schaeffer, & Dykema, 2019). Tourangeau and colleagues (2017) also varied scale labeling so that half of respondents received fully-labeled scales and the other half end-labeled scales. They did not find a significant interaction between scale direction and scale labeling. Keusch and Yan (2019) manipulated verbal labels of frequency scales; frequency scales were labeled with quantifiers only (e.g., never, a little of the time, some of the time, most of the time, all of the time), precise frequency labels (e.g., zero days, one or two days, three or four days, five or six days, seven days), or a combination of both (e.g., never [zero days], a little of the time [one or two days], some of the time [three or four days], most of the time [five or six days], all of the time [seven days]). Although verbal labeling had a significant main effect on resultant answers, it did not interact with scale direction.

This paper reports findings from an experiment fully crossing scale direction with three other scale features: scale length (5- or 7‑point), scale labeling (end-labeled or fully-labeled), and scale type (agreement or frequency). This experiment allows us to clearly tease apart the moderating effects of scale features on scale direction effects.

Third, most studies (especially the earlier ones) examined scale direction effects in terms of respondents’ selection of response options closer to the beginning of a scale (e.g., Israel, 2006; Tourangeau et al., 2017) and means (e.g., Yan & Keusch, 2015; Garbarski et al., 2015). Only a few studies used latent variable models in their analysis of scale direction effects. Three studies showed measurement invariance by scale direction (Höhne et al., 2018; 2021; Krebs and Hoffmeyer-Zlotnik, 2010). However, Höhne and colleagues (2023) found that measurement invariance was achieved for 5‑point scales but not for 7‑point scales. Shifts in latent means were found for agreement scales (Höhne & Krebs, 2018) and 7‑point scales (Höhne et al., 2023), but were not found for the item-specific questions in the Höhne and Krebs (2018) study, which used 7‑point end-labeled scales. Liu and Keusch (2017) showed that the latent content factor did not differ by scale direction. Two studies examined reliability or validity. Höhne and colleagues (2023) found that composite reliability did not differ for 7‑point scales, but was higher for the 5‑point scale running from a high (or positive) end to a low (or negative) end than for the 5‑point scale progressing from a low (or negative) end to a high (or positive) end. Saris and Gallhofer (2007) showed that reliability and validity estimates did not significantly differ by scale direction in a meta-analysis of MultiTrait-MultiMethod (MTMM) experiments. In terms of indirect indicators of data quality, Liu and Keusch (2017) found that scale direction affected acquiescence, but Yan and Keusch (2018) did not find evidence that scale direction affected acquiescence, mid-point response style, straightlining, or internal consistency in four surveys conducted face-to-face and online.

In this paper, we conducted a comprehensive analysis of scale direction effects by examining the impact of scale direction on means, validity, reliability, and other indicators of data quality including acquiescence, straightlining, extreme answers, and midpoint answers.

This paper uses data from two waves of a web survey collected about a month apart. In the first wave of the web survey, a 2 × 2 × 2 × 2 experiment was implemented on a set of 15 survey questions, fully crossing scale direction (ascending, progressing from the low/negative end to the high/positive end, vs. descending, progressing from the high/positive end to the low/negative end), scale length (5-point vs. 7-point), scale labeling (fully labeled vs. end labeled), and scale type (agreement scale vs. frequency scale). In the second wave, half of respondents were randomly assigned to receive the same scale direction assignment as in the first wave whereas the other half received a different scale direction. Taking advantage of the between-subject and within-subject design, we answer three research questions:

How does scale direction affect the distribution of answers in terms of means? How do other scale features (scale length, scale labeling, scale type, using same direction in both waves) moderate the impact of scale direction on means?

Both satisficing and anchoring-and-adjustment predict a primacy effect. As a result, we hypothesize smaller means for ascending scales than for descending scales since ascending scales start with a low/negative end and descending scales begin from a high/positive end.

Based on the existing empirical literature, we expect a stronger scale direction effect for 7‑point scales, fully-labeled scales, and attitudinal scales than for 5‑point scales, end-labeled scales, and frequency scales.

How does scale direction affect reliability, validity, and measurement invariance?

Based on the current literature, we do not expect scale direction to affect reliability, validity, or measurement invariance under the satisficing account.

How does scale direction affect proxy indicators of data quality in terms of straightlining, acquiescence, extreme responses, and midpoint answers? How do other scale features (scale length, scale labeling, scale type, using same direction in both waves) moderate the impact of scale direction on data quality?

We examine these quality indicators to help identify a combination of scale features that yields the best data quality. As a result, we generally do not have a priori hypotheses on the impact of scale direction on straightlining, acquiescence, extreme responses, and midpoint answers, nor on how other scale features moderate that impact. However, we expect a higher prevalence of acquiescing answers for descending agreement scales starting with agreement options because acquiescing answers are conflated with scale direction effects.

2 Data and Method

This paper uses data from two web surveys conducted as part of the LISS (Longitudinal Internet Studies for the Social Sciences) panel administered by CentERdata (Tilburg University, the Netherlands). The LISS panel builds on a probability sample of households drawn from the Dutch population register and aims to represent the Dutch-speaking population permanently residing in the Netherlands. Recruited households without Internet access are provided with a computer and Internet connection. LISS panel members are invited to complete online questionnaires every month (see Scherpenzeel & Das (2010) and Scherpenzeel (2011) for more details on LISS). For the purpose of this study, the same set of 15 survey items was asked twice, about one month apart, in the LISS panel. Six items ask about mindfulness and nine about political efficacy. The exact question wording and response options are displayed in the Supplementary Materials (Appendix A).

The first wave of data was collected in September 2014, achieving a total of 3007 completes at a response rate of 83% using the AAPOR RR1 formula. Wave 2 was conducted in October 2014 with a total of 2740 completes and a response rate of 75%, again using AAPOR RR1. Neither response rate takes panel recruitment into account. Panelists were allowed to use their own device to complete the two web surveys; only about 5% of respondents completed the web survey on a mobile device. We pooled data across devices for the analysis but controlled for device use in the regression models.

We experimentally manipulated scale direction so that half of the respondents received scales in an ascending order, that is, the scale began with the low (i.e., never) or the negative end (i.e., totally disagree) and progressed to the high (i.e., always) or the positive end (i.e., totally agree), and the other half received scales in a descending order (i.e., starting with the high or positive end). The 15 target items used either a bipolar agreement scale (ranging either from totally agree to totally disagree or from totally disagree to totally agree) or a unipolar frequency scale (from never to always or from always to never). Scales had either five or seven points. Either all scale points had a verbal label (fully-labeled conditions) or only the two endpoints did (end-labeled conditions). All items were presented individually, that is, one item per screen, with response options displayed vertically.

All four experimental factors were fully crossed, and respondents were randomly assigned to one of the sixteen cells at Wave 1 and Wave 2 separately. The assignment of scale type, scale length, and scale labeling was kept the same across waves. For scale direction, however, a random half of respondents were given the same scale direction across waves, while the other half received scales of the opposite direction (for instance, if they were assigned scales in ascending order at Wave 1, they would get scales in descending order at Wave 2). All survey data collected as part of the experiment are available at https://doi.org/10.17026/dans-z3f-jc65.
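As a concrete illustration, the sixteen experimental cells can be enumerated in R (the language used for the analyses below); this is only a toy sketch, and the factor labels are ours rather than the exact wording used in the LISS questionnaires.

```r
# Toy sketch of the fully crossed 2 x 2 x 2 x 2 design; labels are illustrative only.
design <- expand.grid(
  direction = c("ascending", "descending"),
  length    = c("5-point", "7-point"),
  labeling  = c("fully labeled", "end labeled"),
  type      = c("agreement", "frequency")
)
nrow(design)  # 16 cells; respondents were randomized to one cell per wave
```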

To answer RQ1, we re-scaled answers to all questions on a 0–1 scale where 1 represents more agreement or higher frequency. We then compared the means of the re-scaled answers to every item by scale direction. We restructured the data to the long format and ran a multilevel model to test the impact of scale direction on the rescaled means. As shown in the formula below, we accounted for the nested structure of the data by allowing for a random intercept at the individual level. We also estimated separate multilevel models testing interactions between the scale direction and each of the three moderating factors (scale type, scale length, and scale labeling).

The formula used is (Snijders & Bosker, 2012):

$Y_{ij} = \gamma_0 + \sum_{h}\gamma_h x_{hij} + u_{0j} + \varepsilon_{ij}$

where $Y_{ij}$ is the rescaled response to question $i$ by individual $j$, $\gamma_0$ is the intercept of the regression, and the $\gamma_h x_{hij}$ terms are a set of $h$ predictors of the outcome (e.g., experimental group assignment). The variance is decomposed into individual-level variation ($u_{0j}$) and a residual ($\varepsilon_{ij}$).
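A minimal sketch of this model in R with lme4, assuming a long-format data frame with one row per respondent-item and answers already rescaled to 0–1, might look as follows; the toy data are fabricated for illustration and are not the LISS data.

```r
library(lme4)

# Fabricated toy data: 200 respondents x 15 items (illustration only)
set.seed(42)
long <- data.frame(
  id        = rep(1:200, each = 15),
  direction = rep(sample(c("ascending", "descending"), 200, replace = TRUE), each = 15),
  length    = rep(sample(c("5-point", "7-point"), 200, replace = TRUE), each = 15),
  y01       = runif(200 * 15)   # answers rescaled to 0 (low/negative) .. 1 (high/positive)
)

# Main effect of scale direction with a random intercept per respondent
m_main <- lmer(y01 ~ direction + length + (1 | id), data = long)

# Separate model adding one interaction at a time, e.g. moderation by scale length
m_len <- lmer(y01 ~ direction * length + (1 | id), data = long)
summary(m_len)
```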

To address RQ2 on reliability by scale direction, we utilized the within-person design of the study and estimated reliability as the correlation between the Wave 1 and Wave 2 measures of each question. We ran the correlations separately by scale direction as well as by scale direction and four moderator variables: scale type, scale length, scale labeling, and an indicator of whether the scale direction in Wave 2 was changed from Wave 1. To test whether the difference is significant, we first estimated reliability separately for each combination of scale direction and the three moderators (scale labeling, scale length, and scale type). We then pooled all reliability estimates into a dataset of 240 estimates and used a t‑test to compare reliability by scale direction.
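The reliability comparison could be sketched along the following lines, assuming a merged data frame with one row per respondent-item containing the Wave 1 and Wave 2 answers and the design factors; the column names are ours, not those in the released data files.

```r
library(dplyr)

# Test-retest reliability per item within each design cell (sketch; 'merged' is assumed)
rel <- merged %>%
  group_by(item, direction, length, labeling, type) %>%
  summarise(r = cor(answer_w1, answer_w2, use = "complete.obs"), .groups = "drop")

# 15 items x 2 directions x 8 moderator combinations = 240 reliability estimates
t.test(r ~ direction, data = rel)
```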

To address RQ2 on validity by scale direction, we developed a MultiTrait-MultiError model (Cernat & Oberski, 2019, 2022, 2023) in which scale direction is considered as a potential source of systematic bias. Due to the within-subject experimental setup, two measurements are available from each individual. Furthermore, a random half of respondents received a different scale direction at Wave 2. Consequently, we can estimate the following model in the Structural Equation Modeling framework (Bollen, 1989):

$y^{*}_{td} = \lambda^{T}_{td} T_t + \lambda^{D}_{td} D + \varepsilon_{td}$

where $y^{*}_{td}$ is the observed variable measuring a particular trait or topic, $t$, using a particular scale direction, $d$. We decompose the observed variance into three sources of variation: $T_t$, measuring the trait variance, $D$, measuring the scale direction variance, and an item-specific random error, $\varepsilon_{td}$. The trait variance represents the valid source of variation that measures the concept of interest. The direction variance is systematic measurement error, as it represents consistent answering patterns due to the format of the response scale and not its content. The random error represents noise in the data that can bias confidence intervals and multivariate analyses. Fig. 1 visually represents the model for the six mindfulness items as an illustration.

Fig. 1 MTME model for the mindfulness scale, where circles represent latent variables while squares are observed variables. The T latent variables measure the concept of interest while D and A are systematic variance due to the scale direction. Residuals, estimates of random error, are not shown for ease of reading
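To make the decomposition concrete, the following lavaan sketch shows an MTMM-style trait/direction/error decomposition for three hypothetical items; the variable names (m1_asc, m1_desc, ...), the restriction to three items, and the handling of the planned-missingness design are our simplifying assumptions and do not reproduce the exact MTME specification of Cernat and Oberski (2019).

```r
library(lavaan)

# Simplified trait/direction/error decomposition (sketch only; 'wide_data' is assumed)
mtme <- '
  # trait factors: one per item, indicated by its ascending and descending versions
  T1 =~ 1*m1_asc + 1*m1_desc
  T2 =~ 1*m2_asc + 1*m2_desc
  T3 =~ 1*m3_asc + 1*m3_desc

  # direction (method) factors: systematic variance tied to scale direction
  ASC  =~ 1*m1_asc  + 1*m2_asc  + 1*m3_asc
  DESC =~ 1*m1_desc + 1*m2_desc + 1*m3_desc

  # keep method variance orthogonal to the traits and to each other
  ASC  ~~ 0*T1 + 0*T2 + 0*T3
  DESC ~~ 0*T1 + 0*T2 + 0*T3
  ASC  ~~ 0*DESC
'
fit <- sem(mtme, data = wide_data, missing = "fiml")
summary(fit, standardized = TRUE)  # squared standardized loadings give variance shares
```

In a parameterization of this kind, the squared standardized trait and direction loadings correspond to the validity and direction-variance shares of the type plotted in Figs. 4 and 5.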

To address RQ2 on measurement equivalence by scale direction, we ran a Confirmatory Factor Analysis (CFA) in a sequence of models on Wave 1 and Wave 2 data separately:

1. a model without any constraints across groups (known as the configural model),

2. a model constraining all loadings to be equal across groups (known as the metric model),

3. a model additionally constraining all intercepts to be equal across groups (known as the scalar model), and

4. a model additionally constraining the means of the latent variables to be equal across groups.

This sequence of models allows us to identify potential causes of differences across groups. If the final model is not significantly worse, we can conclude that the measurement model is the same regardless of scale direction. We used the Comparative Fit Index (CFI) to assess whether the restrictions made the models significantly worse. A decrease of more than 0.01 in CFI was considered an indicator of a decrease in fit (Chen, 2007).
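A hedged lavaan sketch of this sequence for one battery is shown below; the model string, item names, and grouping variable are assumptions rather than the authors' code.

```r
library(lavaan)

# One-factor model for the six mindfulness items (item names assumed)
model <- 'mindfulness =~ m1 + m2 + m3 + m4 + m5 + m6'

configural <- cfa(model, data = wave1, group = "direction")
metric     <- cfa(model, data = wave1, group = "direction",
                  group.equal = "loadings")
scalar     <- cfa(model, data = wave1, group = "direction",
                  group.equal = c("loadings", "intercepts"))
means      <- cfa(model, data = wave1, group = "direction",
                  group.equal = c("loadings", "intercepts", "means"))

# A CFI drop larger than 0.01 at any step would indicate non-invariance (Chen, 2007)
sapply(list(configural, metric, scalar, means), fitMeasures, fit.measures = "cfi")
```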

To answer RQ3, we calculated a number of proxy data quality indicators at the respondent level and investigated how they are affected by scale direction. Acquiescence was evaluated on questions using agreement scales and was calculated as the percentage of times a respondent selected “totally agree.” For straightlining, we created a binary indicator flagging respondents who gave the identical answer to every item in either battery. Extreme response style was calculated as the proportion of times one of the two most extreme categories was chosen. Middle response style was calculated as the proportion of times the middle category was chosen.
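These respondent-level indicators are simple proportions; a minimal sketch in base R, using fabricated item responses coded 1 to k with k denoting "totally agree" (an assumption about coding, not the LISS files), is given below.

```r
# Fabricated toy responses: 200 respondents x 9 items on a 7-point scale (illustration only)
set.seed(1)
k <- 7
items <- matrix(sample(1:k, 200 * 9, replace = TRUE), nrow = 200)

acquiescence <- rowMeans(items == k)                                 # share of "totally agree" answers
straightline <- apply(items, 1, function(x) length(unique(x)) == 1)  # identical answer to every item
extreme      <- rowMeans(items == 1 | items == k)                    # endpoint selections
midpoint     <- rowMeans(items == (k + 1) / 2)                       # middle category (odd-length scales)
```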

To test whether differences due to scale direction are statistically significant, we ran OLS regression models (or logistic regression models if the outcome was dichotomous) with the experimental factors as predictors in addition to conducting t‑tests and chi squared tests. We then specified separate models testing interactions between scale direction and each of the three moderator variables. We ran all the models for Wave 1 data and Wave 2 data separately. For models on Wave 2 data, we also included, as another possible moderator variable, whether respondents received the same or different scale direction. All regression models used to address RQ3 are at the respondent level, and are not multilevel models.
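Continuing the toy objects from the sketch above, the respondent-level models could be specified along these lines; the experimental-factor columns are fabricated for illustration.

```r
# Fabricated respondent-level data combining the toy indicators with design factors
resp <- data.frame(
  direction    = sample(c("ascending", "descending"), nrow(items), replace = TRUE),
  length       = sample(c("5-point", "7-point"), nrow(items), replace = TRUE),
  extreme      = extreme,
  straightline = straightline
)

# Continuous indicator: OLS with a direction-by-length interaction
m_extreme <- lm(extreme ~ direction * length, data = resp)

# Dichotomous indicator: logistic regression
m_straight <- glm(straightline ~ direction * length, family = binomial, data = resp)
summary(m_straight)
```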

All data were cleaned and analyzed in R 4.3.2. The multilevel models were run using the lme4 package (Bates et al., 2015), while the equivalence testing and the MTME models were estimated using lavaan (Rosseel, 2012).

3 Results

3.1 RQ1: Scale Direction Effects on Means

We first examined the impact of scale direction on the means to answer RQ1. The averages of answers to survey items are displayed in Table A1 in the Appendix separately for each scale direction. The trend is that a descending scale starting with the high/positive end tends to elicit higher means (more positive attitudes or higher frequency) than an ascending scale starting with the low/negative end, regardless of whether questions were worded positively or negatively. The multilevel model fit on the long dataset shows that this difference by scale direction is statistically significant (Table A2 in the Appendix), demonstrating the presence of scale direction effects despite the small overall difference (descending = 0.567 vs. ascending = 0.551). (We reran the multilevel model including an indicator for question battery to account for the nesting of individual items within question battery. Conclusions remain the same.)

The moderation effects of scale length, scale type, and scale labeling are plotted in Fig. 2, which shows the predicted values from the multilevel model (Table A2 in the Appendix). Scale direction effects are moderated by scale length. In particular, a significant scale direction effect was observed for 7‑point scales but not for 5‑point scales (see Table A2 in the Appendix). Neither scale labeling nor scale type moderated the impact of scale direction on means.

Fig. 2 Wave 1: Predicted means with 95% confidence intervals for scale direction and moderating factors based on multilevel models (Table A2 in the Appendix)

The means of answers to the 15 survey items at Wave 2 by scale direction are displayed in Table S1 in the Supplemental Materials. Significant scale direction effects are also found in Wave 2 data (see Table S2 in the Supplemental Materials) but we did not find significant moderating effects of scale length, scale type, scale labeling, and whether or not the same scale direction was used in Wave 2 (Table S2 in the Supplemental Materials).

3.2 RQ2: Scale Direction Effects on Reliability

Fig. 3 presents the reliability of answers to each of the 15 survey items by scale direction. Scale direction does not seem to have a consistent impact on reliability. The overall difference in reliability by scale direction is small and not statistically significant (p > 0.05). The average reliability is 0.56 for the ascending order and 0.57 for the descending order (t = −0.58, p = 0.55). Scale length, labeling, and type did not moderate the impact of scale direction on reliability (see Table A3 in the Appendix). The interaction term between scale direction and whether or not scale direction was changed from Wave 1 to Wave 2 is also not statistically significant (Table A3 in the Appendix).

Fig. 3 Reliability by item and scale direction

3.3 RQ2: Scale Direction Effects on Validity

We decomposed the variance of the responses to the mindfulness questions using the MTME approach and plotted in Fig. 4 the estimated validity, the systematic variance due to the scale direction, and the random error. Overall, the validity for the mindfulness questions is relatively low, at around 50%. The systematic variation due to the scale direction is around 9%, while the remaining variance is due to random error (Fig. 4). Validity did not seem to differ by scale direction, though. We carried out the variance decomposition for each of the six items measuring mindfulness. As shown in Figure A4 in the Appendix, validity varies considerably, with the lowest validity for the “absorbed” item, which is worded in the opposite direction to the rest of the mindfulness items. However, the effect of scale direction seems relatively consistent across items.

Fig. 4 Pooled variance decomposition based on the MTME model for mindfulness questions by scale direction

By comparison, validity for the nine political efficacy questions is larger than 0.50, and the effect of scale direction is smaller (around 3%, Fig. 5). Validity differs by item but the effect of scale direction is rather consistent across the nine items on political efficacy (Figure A5 in Appendix).

Fig. 5 Pooled variance decomposition based on the MTME model for political efficacy questions by scale direction

3.4 RQ2: Scale Direction Effects on Measurement Equivalence

We next ran the increasingly restrictive models for both sets of questions to investigate measurement equivalence across scale direction. The initial models had moderate to low fit, with CFI values of 0.85 and 0.62 and RMSEA values of 0.15 and 0.18. However, our focus here is on relative fit after invariance restrictions are added. As shown in Table 1, increasingly restrictive models are not significantly worse than the less restrictive models. For instance, the metric model restricting all factor loadings to be equal across scale direction is not significantly worse than the configural model with no restrictions, supporting metric invariance of the mindfulness and political efficacy questions across scale direction. The scalar model is not significantly worse than the metric model, and the final model is not significantly worse than the scalar model, supporting equivalence of intercepts and means of the latent variables across scale direction. Therefore, both the mindfulness and political efficacy questions are found to be invariant across scale direction. We ran the same models on Wave 2 data and, again, found that the two scales are equivalent across scale direction (see Table S4 in the Supplemental Materials).

Table 1 Equivalence testing for Wave 1 data

Mindfulness Questions
Model        Chisq    Df   CFI    RMSEA   AIC       BIC
Configural   661.93   18   0.85   0.15    −3906.1   −3689.8
Metric       661.93   18   0.85   0.15    −3906.1   −3689.8
Scalar       661.93   18   0.85   0.15    −3906.1   −3689.8
Means        661.93   18   0.85   0.15    −3906.1   −3689.8

Political Efficacy Questions
Model        Chisq    Df   CFI    RMSEA   AIC       BIC
Configural   2612.2   54   0.62   0.18    −5678.8   −5354.5
Metric       2612.2   54   0.62   0.18    −5678.8   −5354.5
Scalar       2612.2   54   0.62   0.18    −5678.8   −5354.5
Means        2612.2   54   0.62   0.18    −5678.8   −5354.5

3.5 RQ3: Scale Direction Effects on Proxy Quality Indicators

To examine the scale direction effect on acquiescence, we computed the percentage of times respondents selected “totally agree” to questions using the agreement scale. We found that respondents reported “totally agree” to more items at Wave 1 when the agreement scale started with that option (that is, the descending order) than when the agreement scale was in the ascending order starting with “totally disagree” (3% vs. 12%, t = −1.81, df = 1474, p = 0.07, Fig. 6). The scale direction effect on acquiescence does not seem to be moderated by any of the other scale features (see Table A6 in the Appendix). At Wave 2, the differences in how often respondents selected “totally agree” by scale direction are not significant (see Figure S4 and Table S6 in the Supplemental Materials).

Fig. 6 Scale direction effects on proxy quality indicators at Wave 1

To examine the scale direction effect on straightlining, we examined the percent of respondents who provided the same answers to either the questions on mindfulness or the questions on political efficacy by scale direction. We did not find significant differences by scale direction in respondents’ propensity to straightline in Wave 1 (5% for the descending order vs. 4% for the ascending order, X-squared = 0.14469, df = 1, p-value = 0.7037, Fig. 6). We also did not find evidence of moderating effects of other scale features (see Table A7 in the Appendix). Wave 2 results are consistent with Wave 1 findings (Figure S5 in the Supplemental Materials); scale direction did not influence straightlining in Wave 2, and no other scale feature moderated the scale direction effect on straightlining (Table S7 in the Supplemental Materials).

We found no significant differences in respondents’ likelihood to select extreme responses (18% vs. 18%, t = −1.129, df = 2997.6, p-value = 0.259) and middle responses by scale direction at both Wave 1 (26% vs. 26%, t = 0.13981, df = 2985.3, p-value = 0.8888, Fig. 6) and Wave 2 (Figure S5 in the Supplemental Materials). Furthermore, there were no moderating effects of any of the three scale features on extreme responses and middle responses (Tables A8 and A9 in the Appendix for Wave 1 results, and Tables S8 and S9 in the Supplemental Materials for Wave 2 results).

4 Discussion

We conducted an experiment in two waves of a web survey collected one month apart in the LISS panel, a probability panel of the adult population in the Netherlands. The experiment fully crossed the manipulation of four scale features (scale direction, scale type, scale length, and scale labeling) on 15 items at both waves. In addition, at Wave 2, the experiment further varied whether or not respondents received scales of the same direction as Wave 1 or scales of the opposite direction from Wave 1. We then conducted comprehensive analyses to examine scale direction effects on multiple aspects of data quality and summarized results in Table 2.

Table 2 Summary of Findings

Outcomes Evaluated               Scale Direction Effects   Moderators
RQ1: Means                       Yes                       Wave 1: scale length; Wave 2: no
RQ2: Reliability                 No                        No
RQ2: Validity                    No                        Not examined
RQ2: Measurement Equivalence     No                        Not examined
RQ3: Acquiescence                No                        No
RQ3: Straightlining              No                        No
RQ3: Extreme Response Style      No                        No
RQ3: Middle Response Style       No                        No

Following the majority of earlier empirical research on scale direction effects, we examined the impact of scale direction on the means of resultant answers. Consistent with the literature, we found a significant scale direction effect on means for both Wave 1 and Wave 2 data; means are larger for descending scales starting with the high/positive end than for ascending scales beginning with the low/negative end. For Wave 1 data, only one scale feature significantly moderated the impact of scale direction on means: a scale direction effect was found for seven-point scales but not for five-point scales, consistent with the literature (e.g., Höhne et al., 2023; Tourangeau et al., 2017; Yan et al., 2018). The literature is mixed on whether scale direction affects answers to frequency scales. Two studies found no impact of scale direction for frequency scales (Carp, 1974; Keusch & Yan, 2019), and one study found a significant scale direction effect on 7‑point frequency scales, but not 5‑point scales (Tourangeau et al., 2017). We did not find a significant interaction effect of scale direction and scale type in this study.

We further found that scale direction had no significant impact on test-retest reliability and that none of the other scale features we varied in our experiment moderated the effect of scale direction on reliability; this differs from the findings on composite reliability (Höhne et al., 2023) but is consistent with reliability estimates from MTMM models (Saris and Gallhofer, 2007). Furthermore, scale direction was found to have no impact on validity estimated from the MTME model, consistent with earlier MTMM-based research by Saris and Gallhofer (2007). We found evidence of measurement equivalence by scale direction, consistent with Höhne and colleagues (2023) and Liu and Keusch (2017).

Consistent with Yan and Keusch (2018), we found no evidence of scale direction affecting straightlining, extreme responses, and middle responses. However, scale direction did marginally affect the proportion of times respondents selected “totally agree” at Wave 1, but not at Wave 2.

Our findings have important practical implications. Survey researchers and practitioners have been concerned about scale direction effects and have searched for evidence-based practical guidelines on which scale direction to use (see the Discussion in Yan and Keusch, 2015). Our experimental findings provide good news for researchers and practitioners who are concerned with reliability, validity, measurement equivalence, straightlining, extreme responses, and middle responses. However, if survey researchers and practitioners are to use the means of resultant answers for classification and comparison purposes, they should decide on one scale direction and use that direction consistently throughout the questionnaire and across different waves of a panel study. At the same time, users should be mindful of scale direction when making comparisons across surveys and/or waves.

The survey literature recommends that researchers and practitioners avoid using agreement scales and use construct- or item-specific scales instead (Saris, Revilla, Krosnick, & Shaeffer, 2010). We did not find evidence supporting worse performance of the agreement scales than the frequency scales. However, if an agreement scale has to be used, we suggest using an ascending order starting with the disagree options to reduce acquiescing answers inflated by scale direction effects.

We attempted to understand the mechanisms accounting for the observed scale direction effects, but due to the lack of informative moderators (such as the perceived relevance of anchors), we could not draw conclusions on whether the observed scale direction effects were due to satisficing or to the use of the anchoring-and-adjustment heuristic. We interpreted the presence of a moderating impact of scale length as a piece of evidence supporting the satisficing notion as the mechanism accounting for scale direction effects, because longer scales are cognitively harder to process than shorter scales while the anchors (that is, the scale endpoints) are the same regardless of scale length. We interpret the absence of scale direction effects on validity, reliability, and measurement equivalence as support for the anchoring-and-adjustment heuristic as the working mechanism, because satisficing, by definition, induces data of lower quality. However, studies are needed that systematically vary the moderators described in the Introduction in order to tease apart the two mechanisms. We recommend continuing the research on establishing mechanisms accounting for scale direction effects and on uncovering circumstances under which the quality of answers differs by scale direction.

Supplementary Information

Appendix A. Question Wording and Response Scales Used in the Experiment

Supplemental Materials

References

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01.

Bollen, K. A. (1989). Structural equations with latent variables. John Wiley & Sons. https://doi.org/10.1002/9781118619179.

Carp, F. M. (1974). Position effects on interview response. Journal of Gerontology, 29, 581–587.

Cernat, A., & Oberski, D. L. (2019). Extending the within-persons experimental design: the Multitrait-Multierror (MTME) approach. In P. Lavrakas, M. Traugott, C. Kennedy, A. Holbrook, E. de Leeuw & B. West (Eds.), Experimental methods in survey research (1st edn., pp. 481–500). New York: Wiley. https://doi.org/10.1002/9781119083771.ch24.

Cernat, A., & Oberski, D. L. (2022). Estimating stochastic survey response errors using the Multitrait-Multierror model. Journal of the Royal Statistical Society Series A: Statistics in Society, 185(1), 134–155. https://doi.org/10.1111/rssa.12733.

Cernat, A., & Oberski, D. L. (2023). Estimating measurement error in longitudinal data using the longitudinal MultiTrait MultiError approach. Structural Equation Modeling: A Multidisciplinary Journal, 30(4), 592–603. https://doi.org/10.1080/10705511.2022.2145961.

Chen, F. F. (2007). Sensitivity of goodness of fit indexes to lack of measurement invariance. Structural Equation Modeling: A Multidisciplinary Journal, 14(3), 464–504. https://doi.org/10.1080/10705510701301834.

Christian, L. M., Parsons, N. L., & Dillman, D. A. (2009). Designing scalar questions for web surveys. Sociological Methods & Research, 37, 393–425.

Epley, N., & Gilovich, T. (2006). The anchoring-and-adjustment heuristic. Why the adjustments are insufficient. Psychological Science, 17, 311–318.

Eroglu, C., & Croxton, K. L. (2010). Biases in judgmental adjustments of statistical forecasts: the role of individual differences. International Journal of Forecasting, 26, 116–133. https://doi.org/10.1016/j.ijforecast.2009.02.005.

Garbarski, D., Schaeffer, N. C., & Dykema, J. (2015). The effects of response option order and question order on self-rated health. Quality of Life Research, 24(6), 1443–1453. https://doi.org/10.1007/s11136-014-0861-y.

Garbarski, D., Schaeffer, N. C., & Dykema, J. (2019). The effects of features of survey measurement on self-rated health: response option order and scale orientation. Applied Research in Quality of Life, 14(2), 545–560.

Hofmans, J., Theuns, P., Baekelandt, S., et al. (2007). Bias and change in perceived intensity of verbal qualifiers effected by scale orientation. Survey Research Methods, 1, 97–108.

Höhne, J. K., & Krebs, D. (2018). Scale direction effects in agree/disagree and item-specific questions: a comparison of question formats. International Journal of Social Research Methodology, 21(1), 91–103. https://doi.org/10.1080/13645579.2017.1325566.

Höhne, J. K., & Lenzner, T. (2015). Investigating response order effects in web surveys using eye tracking. Psihologija, 48, 361–377.

Höhne, J. K., Krebs, D., & Kühnel, S. M. (2023). Investigating direction effects in rating scales with five and seven points in a probability-based online panel. Survey Research Methods, 17(2), 193–204. https://doi.org/10.18148/srm/2023.v17i2.8006.

Israel, G. D. (2006). Visual cues and response format effects in mail surveys. Revised version of the paper presented at the Annual Meeting of the Southern Rural Sociological Association, Orlando, FL (7 February).

Keusch, F., & Yan, T. (2017). Web versus mobile web: an experimental study of device effects and self-selection effects. Social Science Computer Review, 35(6), 751–769. https://doi.org/10.1177/0894439316675566.

Keusch, F., & Yan, T. (2018). Is satisficing responsible for response order effects in rating scale questions? Survey Research Methods, 12(3), 259–270. https://doi.org/10.18148/srm/2018.v12i3.7263.

Keusch, F., & Yan, T. (2019). Impact of response scale features on survey responses to factual/behavioral questions. In P. J. Lavrakas, M. W. Traugott, C. Kennedy, A. Holbrook, E. de Leeuw & B. T. West (Eds.), Experimental methods in survey research: techniques that combine random sampling with random assignment (pp. 131–150). Hoboken: John Wiley & Sons, Inc.

Krebs, D., & Hoffmeyer-Zlotnik, J. H. P. (2010). Positive first or negative first? Effects of the order of answering categories on response behavior. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 6, 118–127.

Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5, 213–236.

Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567.

Krosnick, J. A., & Presser, S. (2010). Question and questionnaire design. In P. V. Marsden & J. D. Wright (Eds.), Handbook of survey research (pp. 263–313). Bingley: Emerald Group Publishing.

Leon, C. M., Aizpurua, E., & van der Valk, S. (2022). Agree or disagree: does it matter which comes first? An examination of scale direction effects in a multi-device online survey. Field Methods, 34(2), 125–142. https://doi.org/10.1177/1525822X211012259.

Liu, M., & Keusch, F. (2017). Effects of scale direction on response style of ordinal rating scales. Journal of Official Statistics, 33, 137–154.

Mingay, D. J., & Greenwell, M. T. (1989). Memory bias and response-order effects. Journal of Official Statistics, 5, 253–263.

Mussweiler, T., & Strack, F. (1999). Hypothesis-consistent testing and semantic priming in the anchoring paradigm: A selective accessibility model. Journal of Experimental Social Psychology, 35, 136–164.

Rosseel, Y. (2012). lavaan: an R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36. https://doi.org/10.18637/jss.v048.i02.

Saris, W. E., & Gallhofer, I. N. (2007). Design, evaluation, and analysis of questionnaires for survey research. Hoboken: John Wiley & Sons, Inc. https://doi.org/10.1002/9780470165195.

Saris, W. E., Revilla, M., Krosnick, J. A., & Shaeffer, E. M. (2010). Comparing questions with agree/disagree response options to questions with item-specific response options. Survey Research Methods, 4(1), 61–79. https://doi.org/10.18148/srm/2010.v4i1.2682.

Scherpenzeel, A. (2011). Data collection in a probability-based internet panel: how the LISS panel was built and how it can be used. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique, 109, 56–61. https://doi.org/10.1177/0759106310387713.

Scherpenzeel, A., & Das, J. (2010). True longitudinal and probability-based internet panels. In J. Das, P. Ester & L. Kaczmirek (Eds.), Social and behavioral research and the internet (pp. 77–103). Boca Raton: Taylor & Francis.

Smyth, J. D., Israel, G. D., Newberry, M. G., & Hull, R. G. (2019). Effects of stem and response order on response patterns in satisfaction ratings. Field Methods, 31(3), 260–276. https://doi.org/10.1177/1525822X19860648.

Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: an introduction to basic and advanced multilevel modeling (2nd edn.). London: SAGE.

Tourangeau, R., Maitland, A., Rivero, G., Sun, H., Williams, D., & Yan, T. (2017). Web surveys by smartphone and tablets: effects on survey responses. Public Opinion Quarterly, 81, 896–929. https://doi.org/10.1093/poq/nfx035.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: heuristics and biases. Science, 185, 1124–1131.

Wegener, D. T., & Petty, R. E. (1995). Flexible correction processes in social judgment: the role of naive theories in corrections for perceived bias. Journal of Personality and Social Psychology, 68, 36–51.

Wilson, T. D., Houston, C., Etling, K. M., & Brekke, N. (1996). A new look at anchoring effects: basic anchoring and its antecedents. Journal of Experimental Psychology, 125, 387–402.

Yan, T., & Keusch, F. (2015). The effects of the direction of rating scales on survey responses in a telephone survey. Public Opinion Quarterly, 79, 145–165.

Yan, T., & Keusch, F. (2018). Direction of agree-disagree rating scales and data quality. Paper presented at the Annual Conference of the American Association for Public Opinion Research.

Yan, T., Keusch, F., & He, L. (2018). The impact of question and scale characteristics on scale direction effects. Survey Practice. https://doi.org/10.29115/SP-2018-0008.