Data from household and firm sample surveys represent the primary data collection method in microeconomic research and a relevant source of information for policymakers, increasingly used to guide economic policy choices. For this reason, the accuracy of survey data has assumed growing importance, and a large part of the literature has investigated the presence of any feature capable of distorting the information collected (Batini et al. 2009; Bound et al. 2001; Braunsberger et al. 2007; DeCastellarnau 2018).
Answers to survey questionnaires may be influenced by many factors, leading in some cases to the collection of biased information. These effects may sometimes be extensive, especially when sensitive or complex information is collected, or, in other cases, confined to specific segments of the population or subjects. Therefore, to optimize survey questions, it is necessary to implement effective measurement instruments, able not only to gauge the latent trait under evaluation but also to adjust for the impact of design features on the empirical evidence and its interpretation.
To date, the literature has mainly focused on the influence of survey mode, questions’ wording, and response format on measurement errors and non-response patterns (Bowling 2005; Dillman et al. 2009; Jäckle et al. 2015; Jäckle et al. 2010; Roberts et al. 2014; Vannieuwenhuyze et al. 2010) without clarifying the effects of these elements on the subjective cognitive process underlying the formation of the response.
The goal of this paper is to analyze how response choices may be influenced by three specific factors: the mode of interview, the visual formulation of the question (in the case of a self-administered questionnaire), and the presence or absence of the “don’t know” option. These issues are particularly relevant when dealing with data collected through mixed-mode surveys or questionnaires administered in different countries or periods, using diverse survey modes, questions’ wording or visual representations (Vannieuwenhuyze 2013).
Our research goals will be addressed by resorting to the class of CUB models (D’Elia and Piccolo 2005; Piccolo 2003), specified as a mixture of a (discrete) Uniform and a (shifted) Binomial distribution.
These models allow disentangling the respondent’s inner feeling towards the item (which represents the intrinsic awareness of the respondent and can be interpreted as the agreement towards the object) while accounting for factors contributing to the fuzziness of the response choice, including both subjective elements (personal indecision, lack of knowledge, interest or comprehension of the question, time dedicated to the choice, response styles, among others) and concomitant circumstances (for instance, the presence of the interviewer and the resulting social desirability bias, the survey mode, the type of response scales and categories). All these factors may bias the correct assessment of the genuine latent feeling expressed through the ordinal score, and thus they are assumed to concur in forming the uncertainty component of the rating choice under the selected CUB rationale.
Furthermore, CUB models enable studying, in a simple way, the effect of specific covariates on the aforementioned components, thereby investigating how such components are related to subjective characteristics and, as this paper confirms, to the modalities of the questions (included as objects’ covariates).
The empirical analysis draws from data collected by the Bank of Italy on households and firms, in particular from the Survey on Households Income and Wealth (SHIW), the Survey on Italian Households (SHIW-I), the web survey on Italian households (WEBIT), and the Business Outlook Survey of Industrial and Service Firms (BOSIF). These data are particularly challenging for our analysis as they have been collected with different survey modes and different formulations of the questions, including alternative graphical representations in a self-administered questionnaire and the inclusion or exclusion of the “don’t know” option for sub-samples. Notice that the data result from representative samples obtained by nation-wide survey agencies with well-trained professional interviewers.
The paper is organized as follows. Section 2 provides an overview of the relevant literature on the factors affecting the cognitive response patterns to survey questions with a focus on the modality of responses. Section 3 introduces CUB models and the specific extensions needed to address our research questions. Section 4 briefly describes the Bank of Italy surveys on households and firms used in the empirical analysis, and discusses the results obtained. Section 5 concludes.
Due to the increasing use of survey data in supporting both economic research and policy decisions, a large share of the literature has tried to identify the causes of possible distortions in the collection of sample data, to warn users of their potential effects on data accuracy (for an extensive review, see DeCastellarnau 2018). In this section, we report research on the three main sources of bias that may be influenced by decisions taken by the survey agency, related to the choices of (i) the mode of questionnaire administration, (ii) the visualization of the categories of responses, and (iii) the presence of the “don’t know” option. These factors may affect respondents’ behaviour in different ways, also in an interrelated manner.
The main modes of data collection are:
Paper-and-pencil interviewing (PAPI): the questionnaire is administered face-to-face by an interviewer using a traditional paper questionnaire.
Computer-assisted personal interviewing (CAPI): the questionnaire is administered face-to-face by an interviewer using an electronic device (PC, tablet, mobile), which manages the questionnaire through a specifically designed program/application.
Computer-assisted telephone interviewing (CATI): the interview is administered on the telephone. Interviewers insert responses directly into the computer where the questionnaire is managed by a specifically designed program.
Computer-assisted web interviewing (CAWI): the questionnaire is self-administered by the respondent using a specifically programmed web questionnaire.
Among these methods, PAPI is gradually falling into disuse, as the other methods, based on the aid of a computer program, considerably reduce the errors associated with data entry: the software can customize the flow of the questionnaire based on the answers provided and perform pre-established consistency or range checks. Although CAPI can be considered the survey mode which ensures the greatest data accuracy, it also entails the greatest costs, due to the necessity for interviewers to physically reach respondents, while CATI and CAWI questionnaires can be administered remotely.
In particular, the CAWI mode has gained popularity in recent years due to its low cost, its high potential to reach a global audience, and the possibility to support the administration of the questionnaire with pictures, audio and video clips, links to different web pages, etc. Moreover, with regard to the specific topic of the research study, web-based surveys seem to be more effective especially when dealing with sensitive themes. Likely reasons may be related to the anonymity of the process, leading to an increased willingness to answer truthfully, as well as to a lower feeling of stigmatisation (Rhodes et al. 2003). This evidence seems to be more marked in the case of young people and young adults using mobile devices to answer online self-administered questionnaires (Regmi et al. 2016). Using this survey mode, respondents may also choose to respond at their own convenience, thus, in principle, increasing their ability to focus on the topics covered (Couper 2013).
The main distinction among survey modes is related to the presence of the interviewer. As a matter of fact, the interviewer can introduce distortions in the answers induced by his/her specific behaviour, such as directly or indirectly suggesting answers (interviewer effect), or simply because of his/her presence, leading the respondent to provide answers more in line with behaviours or opinions considered convenient or adequate (social desirability bias). The effect of the interviewer may vary between face-to-face and telephone interviews, and it can be linked to the specific socio-demographic characteristics of the interviewers with respect to those of the respondents (Davis et al. 2009). As a consequence, the absence of the interviewer in the CAWI mode allows the social desirability bias to be reduced, making this method more appropriate for sensitive questions (Kreuter et al. 2008; Montagni et al. 2019; Tourangeau and Yan 2007). More generally, respondents to CAWI surveys show a lower social desirability bias and tend to be more honest about their inner feelings due to the absence of the interviewer (Chang and Krosnick 2010; Sarracino et al. 2017; Tourangeau et al. 2020).
Empirical evidence also indicates that, in the self-administered mode, answers are more accurate and respondents show fewer satisficing behaviours (Krosnick 1991), as they have more time to check for relevant information before answering the questions and can choose when to fill in the questionnaire, so as to be more focused and less distracted (Fricker et al. 2005). While self-administered surveys have the advantage of making respondents feel freer to express their opinion, they may, on the other hand, encounter problems, mainly related to the rate of participation and the misunderstanding of questions. The presence of a professional interviewer is, in fact, able to favour the active involvement of the respondent, clarify questions in case of doubts, motivate respondents and prevent them from dropping the interview, and support the completion of the entire questionnaire.
Visual features of survey questions may affect respondents’ choices in several ways (Maloshonok and Terentev 2016; Tourangeau et al. 2004), and the effects are mostly consistent even when varying the mode of survey administration (De Leeuw et al. 2011; Melani et al. 2008). The response order itself (positively/negatively oriented) may affect the selected category in several ways (Keusch and Yang 2018). In general, in multiple choice questions, respondents are more prone to choose the first options available (primacy effect), especially when the options are positively ordered (Tourangeau et al. 2013), for less educated respondents, or for those who fill out the questionnaire most quickly (Malhotra 2008). There is also evidence that some interviewees select the first category that (approximately) matches their basic orientation towards the item. The use of different response formats, such as radio buttons, slider scales, Likert scales, and text boxes, which may require a higher effort from respondents, can also influence the responses provided and the respondent’s involvement. Slider user interfaces and text answer boxes may increase item non-response and dropout rates (Couper et al. 2001; Funke 2016). In scales, full labelling is in general preferred to labelling endpoints only (De Leeuw et al. 2011), as the former adds further information, in principle reducing respondents’ indecision. Furthermore, some response formats, such as the use of a grid in web surveys, may induce respondents to speed and straight-line their responses (Chan and Conrad 2014).
Finally, the presence or absence of a “neutral” response category (Kankaras and Capecchi 2025), as well as that of the “don’t know/no answer” option (DK hereafter), can influence the information provided and the quality of the response (Velez and Ashworth 2007). According to some authors, respondents may yield no-opinion or don’t-know answers if they are not able to understand the question (Dillman 2002). In this perspective, the selection of “don’t know” could also be regarded as a proxy for the respondent’s lack of knowledge of the answer. According to other interpretations, this option, if “neutral”, may be treated as a midpoint response on an ordinal scale (Coombs and Coombs 1976) or merely as missing data. Furthermore, if, on the one hand, the presence of the “don’t know” option could induce the respondent to adopt a satisficing behaviour (Simon 1957), avoiding an adequate answer in the attempt to reduce the response burden, especially in the case of a complex question, on the other hand, forcing the respondent to provide an answer when he/she cannot may introduce another source of errors. The pros and cons of using this option have been extensively discussed in the literature, and the most widely reached conclusion is that the “don’t know” choice should be offered only for more complex topics and in general avoided in other cases (Krosnick et al. 2002). The optimal approach can only be obtained through the help of the interviewer, who can be trained to maximize the benefit of the inclusion of this option. In particular, the interviewer should try to get the answer to the question, proposing the “no answer” option only when he/she has the impression that the respondent would not be able to provide useful information.
Summing up, the aforementioned literature found only marginal differences between answers provided using alternative survey modes and question designs (Melani et al. 2008; Sarracino et al. 2017). Some evidence, for instance, emerged that individuals’ characteristics, such as age or education, may affect respondents’ cognitive skills (Chang and Krosnick 2010; Malhotra 2008). With respect to this state of the art, the application of CUB models hereafter proposed can provide added value to disentangle the effects of survey features on respondents’ inner feelings and, in particular, on the fuzziness of the rating choices when faced with different questionnaire designs. Thus, the approach can be useful in determining which survey characteristics most affect the measurement of the actual signal, by generating greater uncertainty around it.
Ordinal evaluations are a very popular tool in survey questions since they allow assessing the extent to which agreement, belief, satisfaction, etc. hold for respondents. In this regard, scholars in psychological, medical, social sciences and marketing disciplines have adopted different approaches and introduced varied methodologies to tackle responses which are substantially qualitative in nature.
The literature on statistical methods and regression for ordinal data analysis follows two main research threads.
The apparently most consolidated one, also for historical contingencies, adheres to the general construct of latent variables: this includes generalized linear models, hierarchical models for longitudinal and nested responses, as well as structural equation modelling (see Agresti 2010; McCullagh and Nelder 1989; Skrondal and Rabe-Hesketh 2004; Tutz 2012 for general references). In this setting, cumulative link models (McCullagh 1980) have gained over time the role of a benchmark statistical framework for the regression of ordinal data, lending themselves to a variety of applications and to both computational and methodological developments. The paradigm is grounded on a regression construct over the latent continuous trait underlying a given ordinal response, with variants to account for category-specific effects of covariates, or for dispersion and hierarchical structures (see Tutz for a collection of recent results). With respect to psychometric studies, there has been a significant focus on Item Response Theory (Hambleton 1991) to provide a unified framework to jointly measure person- and item-specific effects of responses to multiple survey items, typically addressing a unique trait. The most popular and versatile specifications include the graded response model (Samejima 1969) and the partial credit model (Masters 1982), mostly used in the educational setting. Being specified on multiple ordinal responses, item response models are also a fruitful setting to investigate response styles, occurring when consistent response patterns are observed independently of the actual content of the questions and of individual perceptions and opinions (Van Vaerenbergh and Thomas 2012).
The second research thread involves directly the discrete support to parameterize relevant features of the ordinal distribution: two of the most versatile instances are provided by the Binomial and the Discretized Beta models (see Sciandra et al. 2024; Ursino and Gasparini 2018 and the references therein), as well as finite mixtures of discrete distributions to deal with multimodal responses (Simone 2022; Sur et al. 2015). Mixture modelling on the discrete support can also be exploited to operationalize a relevant amount of psychological literature casting the paradigm that the response choice can be assumed to be a combination of perceptual aspects of the selection and of the uncertainty surrounding the choice due to non-contingent aspects, such as the response support, the time dedicated to the answer, and so on (Tourangeau et al. 2000). According to this paradigm, an increasing number of statistical structures have been successfully introduced, leading to the so-called class of CUB models, a general mixture representation for rating data (Piccolo and Simone 2019a; Piccolo and Simone 2019b) which includes several parsimonious and flexible models able to assess both feeling and uncertainty of ordinal evaluations, where those components may eventually be functions of subjects’ and objects’ covariates. This setting forms the reference model for the forthcoming investigation: Sect. 3.2 is dedicated to dwelling on its statistical background and framing it with respect to the research goals of the paper.
Let us assume that a sample consists of the ordinal responses of n interviewees, collected along with some subjects’ characterizations (socio-demographic, cultural and economic variables, for instance). For a given number of ordinal categories m > 3, a CUB model for the response $R_i$ is defined as a Combination of a (shifted) Binomial distribution for feeling and a (discrete) Uniform distribution for uncertainty. Formally, its probability distribution is specified by:

$$\Pr(R_i = r \mid \boldsymbol{y}_i, \boldsymbol{w}_i) \;=\; \pi_i\, b_r(\xi_i) \;+\; (1-\pi_i)\,\frac{1}{m}, \qquad r = 1, \ldots, m, \tag{1}$$

for $i = 1, \ldots, n$, and:

$$\operatorname{logit}(\pi_i) = \boldsymbol{y}_i\,\boldsymbol{\beta}, \qquad \operatorname{logit}(\xi_i) = \boldsymbol{w}_i\,\boldsymbol{\gamma}.$$

The estimable parameters are $\boldsymbol{\beta}$ and $\boldsymbol{\gamma}$, whereas $\boldsymbol{y}_i$ and $\boldsymbol{w}_i$ are row vectors of the covariates values for the $i$-th respondent. In (1), the probability mass of a (shifted) Binomial distribution at category $r$ is denoted by: $b_r(\xi_i) = \binom{m-1}{r-1}(1-\xi_i)^{r-1}\,\xi_i^{\,m-r}$, $r = 1, \ldots, m$.
Among possible alternatives, the selection of such distributions obeys the criteria of parsimony and consistency: the Binomial accounts for the combinatorial alternatives faced by respondents when ordinal ratings have to be singled out, whereas the Uniform is the least informative distribution among the discrete ones with finite support. In addition, this choice for the uncertainty component yields model parsimony, since no estimable parameter is involved, and it bears an evocative interpretation of intrinsic fuzziness of the response, due to both subjective and contingent factors (see, also, Golia 2015 for a discussion on further interpretation of uncertainty parameters1). The selected parametrization implies that $1-\pi_i$ is the weight attached to the uncertainty distribution, and $1-\xi_i$ is a measure of feeling towards the item in a positively oriented scale. Although several statistical implications are discussed in Piccolo and Simone 2019a and related references, a fundamental issue should be emphasized with regard to CUB models in order to corroborate their application to our research goals: the CUB specification implies that the response of each individual is a mixture between the expression of the substantial feeling (driven by $\xi_i$) and an individual propensity to be influenced by contingent factors when marking a score to evaluate the item (conveyed by $1-\pi_i$), letting the feeling assessment be more or less fuzzy accordingly. In other words, CUB prescribes that $\pi_i$ quantifies the probability that an observed score is generated solely from the background overall feeling. Finally, among several admissible alternatives, the logistic link $\operatorname{logit}^{-1}(t) = (1+e^{-t})^{-1}$, for any real $t$, is motivated by easiness and robustness reasons (Iannario et al. 2016; Iannario et al. 2017).
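To make the mixture in (1) concrete, the following minimal Python sketch (not the authors' code) computes the CUB probability distribution for given parameter values, using the shifted Binomial mass defined above:

```python
from math import comb

def cub_pmf(m, pi, xi):
    """Probability distribution of a CUB model with m categories:
    a mixture of a shifted Binomial (feeling component, weight pi)
    and a discrete Uniform (uncertainty component, weight 1 - pi)."""
    # shifted Binomial mass at category r: C(m-1, r-1) (1-xi)^(r-1) xi^(m-r)
    shifted_binomial = [comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
                        for r in range(1, m + 1)]
    # mixture: pi weights the feeling part, (1 - pi)/m is the uniform part
    return [pi * b + (1 - pi) / m for b in shifted_binomial]

# Example: m = 5 categories, low uncertainty (pi = 0.8), high feeling (1 - xi = 0.8)
probs = cub_pmf(5, 0.8, 0.2)
```

With these (hypothetical) parameter values the distribution is shifted towards the highest categories, consistently with the positive orientation of the feeling measure 1 − ξ.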
With respect to the classical approach to ordinal regression models (see Agresti 2010; Tutz 2012, among others), a remarkable advantage of the class of CUB models is that subjects’ covariates are an important qualification of model (1), yet they are not compulsory. Indeed, assuming a unique probability model for the respondents, a CUB model can be estimated for the ordinal response R without covariates, in which case $\pi$ determines the relative weight of the feeling component or, dually, its complement $1-\pi$ quantifies the heterogeneity of the distribution, whereas the probability parameter $\xi$ summarizes the overall feeling gauged from the ordered evaluations. Although a full interpretation of the parameters is related to the context of analysis (appreciation, evaluation, fear, worry, etc.), generally the feeling measure is simply associated with the location of the distribution (in terms of modal value), whereas uncertainty relates to heterogeneity in the responses (in terms of the Gini heterogeneity index, see Capecchi and Iannario 2016). Beyond allowing a twofold interpretation of the response distribution, evidence is found that properly accounting for CUB uncertainty in preference models also enhances prediction performance (Simone 2023).
Different survey items or groups of responses can be compared by plotting the estimated feeling and uncertainty coefficients as points over the parameter space (the unit square). This visual representation, first proposed in (D’Elia and Piccolo 2005; Piccolo and D’Elia 2008), offers distinctive insights of the CUB modelling approach since the possible effects of covariates (hereafter, the different modes of presentation of the questionnaire) may be immediately checked in size, direction and significance. This circumstance should not be underestimated since, in many instances, ordinal regression models may identify different significant covariates for different items, making comparisons among respondents and items less straightforward.
Further refinements of CUB models can be obtained to take into account the presence of an inflated category (Corduas et al. 2009; Iannario 2012). A shelter category is a modality $s$ of the support of $R_i$ which receives an upward bias of preference with respect to the expected response pattern. This effect can be accommodated in the CUB model by specifying a degenerate distribution $D_r^{(s)}$, with probability mass concentrated at $r = s$. Thus, the inflated CUB model becomes:

$$\Pr(R_i = r) \;=\; \delta_i\, D_r^{(s)} \;+\; (1-\delta_i)\left[\pi_i\, b_r(\xi_i) + (1-\pi_i)\,\frac{1}{m}\right],$$

for $r = 1, \ldots, m$ and $0 \le \delta_i < 1$. Here, $\operatorname{logit}(\delta_i) = \boldsymbol{x}_i\,\boldsymbol{\nu}$, so that it is possible to check if relevant covariates are possibly modifying the shelter effect2.
Then, the special case in which the mixing weight of the Binomial distribution collapses to zero leads to a CUSH model, that is, a Combination of a discrete Uniform and a SHelter effect (Capecchi and Piccolo 2017). It is defined by:

$$\Pr(R_i = r) \;=\; \delta\, D_r^{(s)} \;+\; (1-\delta)\,\frac{1}{m}, \qquad r = 1, \ldots, m,$$

where $s$ is the known location of the shelter effect.
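The shelter-inflated CUB and the CUSH distributions can be sketched in the same illustrative style (again a sketch, not the authors' implementation); note that the CUSH model is recovered from the inflated CUB when the Binomial weight pi collapses to zero:

```python
from math import comb

def cub_shelter_pmf(m, pi, xi, delta, s):
    """Shelter-inflated CUB: a degenerate mass at category s (weight delta)
    mixed with a plain CUB distribution (weight 1 - delta)."""
    binom = [comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
             for r in range(1, m + 1)]
    cub = [pi * b + (1 - pi) / m for b in binom]  # plain CUB part
    return [delta * (1.0 if r == s else 0.0) + (1 - delta) * p
            for r, p in enumerate(cub, start=1)]

def cush_pmf(m, delta, s):
    """CUSH model: shelter effect plus discrete Uniform (pi -> 0 case)."""
    return [delta * (1.0 if r == s else 0.0) + (1 - delta) / m
            for r in range(1, m + 1)]
```

In both functions the shelter category s receives the extra mass delta on top of the baseline distribution, which is how the upward bias of preference described above is encoded.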
Thus, exploiting the CUB paradigm, the objective of this paper will be pursued with the following approach: ad-hoc dummy variables will be included within the specification of CUB regression models, with both subjects’ and objects’ covariates, on the grouped responses to:
test and compare the effect of the mode of interview and/or the visual layout for the rating questions;
check the effect of the presence/absence of DK option on the response support.
If necessary, CUB extensions will be considered under the same research scheme to encompass the presence of a shelter effect.
From the methodological point of view, it is worth emphasizing that CUB methods currently lack a complete multivariate extension that would make possible their application and validation on consolidated scales in psychometric and educational studies. Some extensions to multivariate ordered ratings have been proposed in Ip and Wu 2024, using copula-based approaches assuming CUB margins; in Colombi and Giordano 2016, by assuming a mixture of multivariate Uniform and Sarmanov distributions with CUB margins; and in Simone et al. 2020, in the setting of random effect models to account for subjective heterogeneity in response attitude: the latter proposal included a subject-specific random intercept in the regression link involving the uncertainty parameter to account for the individual propensity to a meditated response choice following the feeling model.
CUB models with covariates have been broadly applied to identify response profiles in terms of subjects’ characteristics (Capecchi and Piccolo 2017; Corduas et al. 2009; Fin et al. 2017) or objects’ features (Capecchi et al. 2016; Piccolo and D’Elia 2008). With respect to the state of the art on the topics, we propose to resort to CUB models to identify if there is any significant difference in response features (feeling, uncertainty, possible shelter effects) among independent groups of responses corresponding to different survey modes or questionnaire features. This circumstance applies, for instance, if a given ordered evaluation is collected on independent groups of respondents (yet homogeneous with respect to relevant covariates) via different survey modes (CATI, CAPI, CAWI), various visual layouts (vertical, horizontal, etc.), or with different scales (with or without a don’t know option, with different numbers of categories, labelled or not labelled categories, etc.). This task can be accomplished with the definition of suitable dummy variables identifying the different groups of responses, to be then included as covariates in a CUB model specification.
Then, the proposed method is analogous to the introduction of objects’ covariates in the CUB statistical framework (Capecchi et al. 2016; Piccolo and D’Elia 2008), with the important difference that the requirement of independence of the response groups is strictly respected under our setting: thus, all the inferential results fully hold.
The general CUB model specification with covariates could be usefully applied also to determine if independent groups of respondents to the same ordinal evaluations express different feelings and/or uncertainty. Assume, for instance, that two independent and homogeneous samples:

$$\boldsymbol{R}^{(1)} = \left(R^{(1)}_1, \ldots, R^{(1)}_{n_1}\right), \qquad \boldsymbol{R}^{(2)} = \left(R^{(2)}_1, \ldots, R^{(2)}_{n_2}\right)$$

of ordinal evaluations are collected for the same survey item over a scale with m categories, but the survey has been administered in two different ways to the two samples (say, different survey modes or questionnaire features). Then, the two samples can be merged to derive a unique sample $\boldsymbol{R}$ of $n = n_1 + n_2$ observations for the given survey item. Therefore, a dummy variable $D_i$ can be defined to flag the two samples according to the way the survey has been administered, namely:

$$D_i = \begin{cases} 0 & \text{if the } i\text{-th response belongs to } \boldsymbol{R}^{(1)},\\ 1 & \text{if the } i\text{-th response belongs to } \boldsymbol{R}^{(2)}. \end{cases}$$
Specifying a CUB model with $D_i$ explaining the possible effect on feeling and uncertainty:

$$\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 D_i, \qquad \operatorname{logit}(\xi_i) = \gamma_0 + \gamma_1 D_i, \tag{5}$$

or also on the shelter category, when present, with the extra specification of:

$$\operatorname{logit}(\delta_i) = \nu_0 + \nu_1 D_i,$$

for $i = 1, \ldots, n$, could reveal if the chosen survey feature entails any difference in either the uncertainty or the feeling components of the rating response process, or modifies the attitude to take refuge in the shelter category.
To this aim, it is sufficient to test the significance of β1 and/or γ1 according to classical likelihood-based inference. As to interpretation, the positive (resp. negative) sign of these parameters implies that the survey feature identified by $D_i = 1$ (namely, the one corresponding to $\boldsymbol{R}^{(2)}$) decreases (resp. increases) the corresponding uncertainty $(1-\pi_i)$ and/or feeling $(1-\xi_i)$, respectively. Eventually, by including more respondents’ covariates in the model specification (5), possible interaction effects could be further tested. For instance, to check if covariate X has an effect on either the feeling or the uncertainty of the response, as well as if there is any interaction effect of X with the survey feature identified by $D_i$, the following model (7) can be fitted to the observations drawn from the grouped sample $\boldsymbol{R}$:

$$\operatorname{logit}(\pi_i) = \beta_0 + \beta_1 D_i + \beta_2 X_i + \beta_3 X_i D_i, \qquad \operatorname{logit}(\xi_i) = \gamma_0 + \gamma_1 D_i + \gamma_2 X_i + \gamma_3 X_i D_i. \tag{7}$$

Then, $\beta_3$ and $\gamma_3$ measure the contribution that the combination of covariate value $X_i$ and survey mode $D_i$ induces on uncertainty and feeling (on the logit scale), respectively, with respect to the linear effect of both covariate and survey mode/design feature.
Analogously, to analyse the direct effect of covariate X on the shelter category or its interaction with the survey feature identified by $D_i$, we use the following model:

$$\operatorname{logit}(\delta_i) = \nu_0 + \nu_1 D_i + \nu_2 X_i + \nu_3 X_i D_i,$$

so that $\nu_3$ measures the contribution that the combination of covariate value $X_i$ and survey mode $D_i$ induces on the probability to shelter, with respect to the linear effect of both covariate and survey mode/design feature (on the logit scale).
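The sign interpretation of β1 and γ1 discussed above can be verified numerically. The following sketch uses purely hypothetical coefficient values (not estimates from any of the surveys) and the inverse logistic link to show that a positive β1 lowers the uncertainty weight 1 − π, and a positive γ1 lowers the feeling measure 1 − ξ, for the group flagged by the dummy:

```python
from math import exp

def expit(t):
    """Inverse of the logistic (logit) link used in the CUB specification."""
    return 1.0 / (1.0 + exp(-t))

# Hypothetical coefficients, chosen only to illustrate the sign rule:
beta0, beta1 = 0.2, 0.9     # uncertainty equation: logit(pi_i) = beta0 + beta1*D_i
gamma0, gamma1 = -0.4, 0.5  # feeling equation:     logit(xi_i) = gamma0 + gamma1*D_i

uncertainty, feeling = {}, {}
for D in (0, 1):
    pi = expit(beta0 + beta1 * D)
    xi = expit(gamma0 + gamma1 * D)
    uncertainty[D] = 1 - pi  # weight of the Uniform component
    feeling[D] = 1 - xi      # feeling on a positively oriented scale
# positive beta1 -> lower uncertainty for D = 1; positive gamma1 -> lower feeling
```

Swapping the signs of the hypothetical coefficients reverses both inequalities, which is exactly the interpretation rule stated above.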
For CUB models, parameters can be effectively estimated by maximum likelihood (ML) methods using available software. A devoted library for the R environment is available on the official CRAN repository (Iannario et al. 2024). Dedicated libraries for CUB models are available also for Gretl (Simone et al. 2019), and for STATA (Cerulli 2020; Cerulli et al. 2022). An accelerated version of the EM algorithm and the corresponding implementation of best-subset variable selection is available within the R library (Simone 2020), also on CRAN. To address the research goals of the paper, the CUB estimation procedure has been adapted to account for sample weights: the code is available as online supplementary material.
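To convey the flavour of maximum likelihood estimation for a CUB model without covariates, the following stdlib-only sketch simulates ratings and recovers the parameters by a crude grid search over the likelihood; it is an illustration under simplifying assumptions (no sample weights, no EM acceleration), not the weighted procedure or the R/Gretl/STATA code cited above:

```python
import random
from math import comb, log

def cub_loglik(counts, m, pi, xi):
    """Log-likelihood of observed category counts under a CUB(pi, xi) model."""
    ll = 0.0
    for r in range(1, m + 1):
        b = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        ll += counts[r - 1] * log(pi * b + (1 - pi) / m)
    return ll

def simulate_cub(n, m, pi, xi, rng):
    """Draw n ratings: shifted Binomial with probability pi, Uniform otherwise."""
    draws = []
    for _ in range(n):
        if rng.random() < pi:  # feeling component
            draws.append(1 + sum(rng.random() < 1 - xi for _ in range(m - 1)))
        else:                  # uncertainty component
            draws.append(rng.randint(1, m))
    return draws

rng = random.Random(42)
m, n = 5, 3000
sample = simulate_cub(n, m, pi=0.7, xi=0.3, rng=rng)
counts = [sample.count(r) for r in range(1, m + 1)]

# crude grid-search MLE; dedicated software relies on the EM algorithm instead
grid = [k / 100 for k in range(2, 99)]
pi_hat, xi_hat = max(((p, x) for p in grid for x in grid),
                     key=lambda px: cub_loglik(counts, m, px[0], px[1]))
```

With a few thousand simulated ratings, the grid-search estimates land close to the generating values (π = 0.7, ξ = 0.3), illustrating that the mixture parameters are well identified for m > 3.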
To check the usefulness of the proposed approach, three datasets will be considered: the first case study concerns measurements of the perceived value of future inheritance, collected within the 2016 questionnaire of the Survey on Households Income and Wealth (SHIW), in order to verify the possible modifications induced by the presence/absence of a DK option in the response support (see Sect. 4.1). Then, two case studies based on the WEBIT/SHIW‑I survey on households (Sect. 4.2) and the BOSIF survey on enterprises (Sect. 4.3) will be discussed to investigate the effects of different survey modes on the cognitive response process. The WEBIT survey will also be used to test for the effects of different visual representations in self-administered questionnaires.
To analyse the effect of the introduction of the “don’t know/no answer” option, we draw on the dataset stemming from the 2016 edition of the Survey of Households Income and Wealth (SHIW). The survey, conducted periodically by the Bank of Italy since 1962, collects information about the economic conditions of Italian households, both with respect to the real and financial assets held and to their sources of income, together with a complete set of information about the socio-demographic characteristics of each of the family members (Baffigi et al. 2016; Bank of Italy 2018). Data are collected by professional interviewers, specifically trained, using the CAPI method.
The SHIW adopts a two-stage stratified sampling design. Provided weights adjust for unequal selection probability and non-response, account for the correlation in the panel component and are post-stratified to external information about the socio-demographic characteristics of the reference population3. In order to deal with the complex survey design, without sharing respondents’ characteristics protected by privacy (such as the stratum they belong to), replication weights are disseminated with data.
Among the several questions listed in the questionnaire submitted in the 2016 survey, the main interest of this paper concerns the item related to the interviewee’s opinion about the global monetary value of the parents’ house on December 31, 2016. More specifically, in the 2016 questionnaire a section was dedicated to inspecting the value of future inheritance, asking first for the number of dwellings owned by parents not living in the household and then for an estimate of their value. Due to the potential difficulty on the part of the respondents in providing an answer to this question, amounts have been expressed using ordinal categories. Furthermore, the “don’t know/no answer” option was randomly inserted for half of the sample (leading to two formulations of the question, D50a and D50b), allowing us to test whether only those who were not aware of the phenomenon, or even those who adopted a satisficing behaviour, made use of it. In fact, to limit the latter conduct, interviewers were trained not to explicitly read this option even when available.
Both formulations of the question use the same wording and offer m = 5 ordinal categories for evaluation; the only difference is the absence (Question D50a) or presence (Question D50b) of a sixth response option, “I don’t know/I don’t remember”, hereafter simply denoted DK.
Can you give me even a rough estimate of the total value of these properties on 31/12/2016? Choose one of the ranges listed below:
up to 50,000 euros … 1
from 50,000 to 150,000 euros … 2
from 150,000 to 300,000 euros … 3
from 300,000 to 500,000 euros … 4
over 500,000 euros … 5
Don’t know … 6
The statistical problem is to measure the significance of the effect of the DK option on the expressed evaluations and to assess whether this effect differs across specific clusters (defined by gender, age, family composition, marital status, geographical area, income, financial education, etc.). To this end, the sample was randomly split into two groups (statistically equivalent with respect to the main socio-demographic and economic variables), consisting, respectively, of 678 interviewees who received a questionnaire with Question D50a (without DK) and 635 interviewees who received a questionnaire with Question D50b (with DK).
Table 1 and Fig. 1 show the frequency distributions of the two groups. The normalized Laakso and Taagepera index4 confirms a substantially equal heterogeneity for the distributions of the two sub-samples. Furthermore, the DK option was selected by only approximately 8% of respondents,5 so the distributions over the ordered support are quite similar, except for the categories whose relative frequencies shrink in the presence of DK (see Fig. 1). However, after removing the DK responses, the relative frequency distribution of the observed rating variable, reported in the last column of Table 1, appears more similar to the distribution of responses to Question D50a.6
Table 1 Distribution of response options for the expected value of real assets in future inheritance
Categories | Absolute frequencies (without DK) | Absolute frequencies (with DK) | Relative frequencies (without DK) | Relative frequencies (with DK) | Relative frequencies (reduced)
(1) up to 50,000 euros | 60 | 55 | 0.088 | 0.087 | 0.094 |
(2) from 50,000 to 150,000 euros | 273 | 239 | 0.403 | 0.376 | 0.408 |
(3) from 150,000 to 300,000 euros | 228 | 192 | 0.336 | 0.302 | 0.328 |
(4) from 300,000 to 500,000 euros | 79 | 61 | 0.117 | 0.096 | 0.104 |
(5) over 500,000 euros | 38 | 38 | 0.056 | 0.060 | 0.066 |
(6) Don’t know (DK) | – | 50 | – | 0.079 | – |
Total | 678 | 635 | 1.000 | 1.000 | 1.000 |
Laakso and Taagepera index | – | – | 0.584 | 0.572 | 0.589 |
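The heterogeneity index used throughout (see footnote 4) is the inverse Herfindahl concentration index rescaled to the unit interval. A short sketch, assuming the common normalization (1/Σp² − 1)/(m − 1), reproduces the values in Table 1 from its published relative frequencies:

```python
def lt_normalized(p):
    """Normalized Laakso-Taagepera index over m categories:
    (1/sum(p_i^2) - 1)/(m - 1); 0 for a degenerate distribution,
    1 for the uniform one."""
    m = len(p)
    total = sum(p)
    p = [q / total for q in p]  # guard against rounding in the input
    return (1.0 / sum(q * q for q in p) - 1.0) / (m - 1.0)

# relative frequencies from Table 1
without_dk = [0.088, 0.403, 0.336, 0.117, 0.056]
with_dk = [0.087, 0.376, 0.302, 0.096, 0.060, 0.079]
reduced = [0.094, 0.408, 0.328, 0.104, 0.066]

print(round(lt_normalized(without_dk), 3))  # 0.584
print(round(lt_normalized(with_dk), 3))     # 0.572
print(round(lt_normalized(reduced), 3))     # 0.589
```

Note that the number of categories m enters the normalization, so the five- and six-category distributions are each rescaled by their own maximum.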
Thus, according to the exploratory evidence, the presence/absence of a DK option has no relevant impact on the rating distribution. To verify this statement with the statistical models introduced in Sect. 3.3, Table 2 reports the estimation results for model (5). Standard errors of the parameters are obtained from the replication weights via Jackknife Repeated Replication (JRR). Results indicate that the presence of a “don’t know” response option does not significantly modify either the uncertainty or the actual feeling of the observed scores.
Since some inflation of the frequencies of the second and third response categories is observed with respect to the CUB fit, a CUB model with a shelter at each of these categories was tested to check whether the presence of the “don’t know” option in the response scale significantly modifies this refuge attitude.
Results for model (6) are reported in Table 2 as well, showing that no significant effect is found.
Table 2 Parameter estimates and corresponding standard errors (in parentheses) for Model (5) and Model (6) (with shelter at second or third category)
Model (5) | – | – | ||||
Model (6, she(2)) | ||||||
Model (6, she(3)) |
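For readers less familiar with this model class, the CUB mixture with an optional shelter category can be sketched as follows. This is a minimal, illustrative parametrization: the paper’s models (5) and (6) additionally link the parameters to covariates, and feeling conventions vary across references.

```python
from math import comb

def cub_shelter_pmf(m, pi, xi, delta=0.0, c=None):
    """CUB probability mass with an optional shelter category c:
    P(R=r) = delta*1{r=c} + (1-delta)*[pi*b_r(xi) + (1-pi)/m],
    where b_r is a shifted binomial on {1,...,m}. One common
    parametrization, used here as an assumption."""
    pmf = []
    for r in range(1, m + 1):
        b_r = comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
        p = pi * b_r + (1 - pi) / m
        if delta:
            p = (1 - delta) * p + (delta if r == c else 0.0)
        pmf.append(p)
    return pmf

# hypothetical parameter values, not estimates from the paper
probs = cub_shelter_pmf(m=5, pi=0.7, xi=0.4, delta=0.1, c=2)
assert abs(sum(probs) - 1.0) < 1e-12
```

The shelter term simply moves a share δ of probability mass onto the refuge category, which is what the tests above probe.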
Thus, it may be safely inferred that the presence of a DK option alongside the rating scale does not modify the way the rating options are perceived and used.
Model (7) can be used to check for covariate effects and their interaction with the presence of a DK option in the rating scale. Accordingly, Table 3 reports the Wald statistics for the tests of significance of the corresponding effects. Aside from the effects of increasing income, holding a university degree and living in central Italy, all of which increase the feeling of the respondents (i.e., they report on average higher values for the parents’ dwellings), interaction effects between subjects’ characteristics and the presence of the DK option that are relevant for our analysis are found only in the uncertainty component. Specifically, for a fixed level of income and ceteris paribus, heterogeneity is lower if the DK option is present (D50b) than if it is absent (D50a). In addition, for D50a there is no significant income effect on the heterogeneity of the distribution, whereas the heterogeneity of D50b increases with income. Responses of individuals from Central Italy are significantly less heterogeneous than those of the other respondents, to a greater extent for respondents to D50a than for respondents to D50b. It is worth stressing that these results are revealed only by the specific extension of CUB models derived for this analysis.
Table 3 Wald statistics for parameters of model (7)
* Male = 0, Female = 1. *p < 0.05 | ||||
Has children | 2.579 | −0.700 | −0.569 | 0.194 |
Gender* | 2.796 | −1.427 | −0.056 | 0.773 |
Income | 2.219 | 6.304* | −0.396 | −29.632* |
Northern Italy | 2.216 | −1.760 | −1.904 | 1.661 |
Central Italy | 3.412 | 0.089 | 3.760* | −3.074* |
University Degree | 4.676 | 0.254 | 0.232 | −0.154 |
Home ownership | 2.950 | 0.323 | 0.932 | −1.144 |
Other real estate | 0.580 | −0.437 | −0.502 | 0.452 |
– | ||||
Has children | 4.534 | −0.906 | −0.766 | 0.873 |
Gender | 5.009 | −0.207 | −1.174 | 0.131 |
Income | 7.531 | −0.568 | −5.084* | −0.581 |
Northern Italy | 7.859 | −0.640 | −1.118 | 0.419 |
Central Italy | 6.521 | 0.244 | −3.149* | −1.643 |
University Degree | 9.781 | 0.012 | −5.120* | −1.446 |
Home ownership | 3.881 | −0.832 | −1.456 | 0.699 |
Other real estate | 6.273 | −0.107 | 0.022 | −0.500 |
To investigate the effect of survey mode for households, we gather information from the Web Survey on Italian Households (WEBIT) and the Intermediate Survey in Italian Households (SHIW-I), administered using the CAWI and the CAPI modes, respectively. The two surveys were conducted in parallel, between two editions of the SHIW, with a shorter questionnaire containing mainly qualitative items. The WEBIT was managed jointly by the Bank of Italy and ISTAT (the Italian National Institute of Statistics) to investigate the use of web surveys for collecting data on household income and wealth, on a probabilistic sample of about 1000 individuals (Barcaroli et al. 2019). At the same time, the SHIW-I was carried out by the Bank of Italy using the traditional CAPI mode on a sample of about 2000 households selected from those who had participated in the 2014 edition of the SHIW. To make the two surveys as comparable as possible, participants in both surveys were drawn from the population of the same municipalities with a similar two-stage sample design, and the questionnaires were designed to contain common questions and the same information about respondents’ socio-demographic characteristics and economic conditions, to be used as covariates in the analysis (Gambacorta et al. 2018).
To compare answers using different survey modes, we refer to the question regarding the subjective perception of the economic condition of the household. The question was present in both questionnaires adopting the same wording as follows:
Is your household’s income sufficient to see you through to the end of the month … ?
with great difficulty … 1
with difficulty … 2
with some difficulty … 3
fairly easily … 4
easily … 5
very easily … 6
Table 4 presents the corresponding frequency distributions for the CAPI and the CAWI surveys. The normalized Laakso and Taagepera index indicates a larger heterogeneity within the CAWI results than within the CAPI results.
Table 4 Distribution of response options for the subjective economic condition between CAPI and CAWI surveys
Categories | Absolute frequencies (CAWI) | Absolute frequencies (CAPI) | Relative frequencies (CAWI) | Relative frequencies (CAPI)
(1) with great difficulty | 115 | 359 | 0.136 | 0.181 |
(2) with difficulty | 90 | 277 | 0.106 | 0.140 |
(3) with some difficulty | 207 | 522 | 0.245 | 0.264 |
(4) fairly easily | 237 | 576 | 0.280 | 0.291 |
(5) easily | 122 | 190 | 0.144 | 0.096 |
(6) very easily | 75 | 54 | 0.089 | 0.028 |
Total | 846 | 1978 | 1.000 | 1.000 |
Laakso-Taagepera Index | – | – | 0.816 | 0.723 |
The same question can also be used to investigate how the visual features of survey questions may influence respondents’ choices. Indeed, in the WEBIT survey, this question was asked using two different visual presentations on random sub-samples of respondents. In particular, response options were organized in a traditional vertical list of categories, as reported above, for half of the sample (vert-traditional) and with horizontal radio buttons for the remaining part (horiz-radio) (Table 5).
Table 5 Horizontal response options to question: “Is your household’s income sufficient to see you through to the end of the month ...?”
1 | 2 | 3 | 4 | 5 | 6 |
○ | ○ | ○ | ○ | ○ | ○ |
With great difficulty | Very easily |
Table 6 summarizes the frequency distributions for the vertical and the horizontal option layouts. In this comparison, a substantial difference in heterogeneity, as measured by the Laakso and Taagepera index, emerges between the horizontal-radio and the vertical-traditional layouts.
Table 6 Distribution of response options for subjective economic condition by options visualization feature
Categories | Absolute frequencies (vert-trad) | Absolute frequencies (horiz-radio) | Relative frequencies (vert-trad) | Relative frequencies (horiz-radio)
(1) with great difficulty | 51 | 64 | 0.122 | 0.150 |
(2) with difficulty | 45 | 45 | 0.107 | 0.105 |
(3) with some difficulty | 115 | 92 | 0.274 | 0.215 |
(4) fairly easily | 135 | 102 | 0.322 | 0.239 |
(5) easily | 56 | 66 | 0.134 | 0.155 |
(6) very easily | 17 | 58 | 0.041 | 0.136 |
Total | 419 | 427 | 1.000 | 1.000 |
Laakso-Taagepera Index | – | – | 0.690 | 0.915 |
To test the effect of both survey mode and visual representation on the feeling and the uncertainty components, we modify model (5) to include two dummy variables as covariates: (1) the CAPI dummy, identifying the survey mode (equal to 1 for CAPI respondents and 0 otherwise), and (2) the Horiz dummy, identifying the visual presentation of the question (equal to 0 for the vertical list of response options and 1 for the horizontal sequence of response options). Thus, the reference mode is the CAWI-vertical combination, against which the effect of the CAPI mode and of the horizontal layout will be separately tested under the model:
No significant effect of survey mode or visual representation is found for the feeling component. With respect to the uncertainty component, no significant difference emerges between CAPI and CAWI respondents, while a significant difference is observed between the vertical and horizontal layouts within the CAWI mode (see Eqs. 9 and 10)7.
In particular, results show that uncertainty rises when the horizontal layout is used, leading to less homogeneous response patterns and, consequently, higher fuzziness around the actual response signal. These results are in line with the literature, which finds that the use of radio buttons increases uncertainty, as does labelling only the extreme categories instead of all of them.
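A specification of this kind can be sketched by linking both CUB parameters to the two dummies through logit functions. The parameter names below are illustrative assumptions, not the paper’s notation; maximizing this likelihood numerically and applying a Wald test to the coefficient of Horiz in the uncertainty equation mirrors the test described above.

```python
import numpy as np
from math import comb

def cub_pmf(m, pi, xi):
    """Plain CUB mixture on {1,...,m} (one common parametrization)."""
    return [pi * comb(m - 1, r - 1) * (1 - xi) ** (r - 1) * xi ** (m - r)
            + (1 - pi) / m for r in range(1, m + 1)]

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

def loglik(params, ratings, capi, horiz, m=6):
    """Log-likelihood of a CUB model whose uncertainty (pi) and
    feeling (xi) parameters depend on the CAPI and Horiz dummies.
    params = (w0, w1, w2, g0, g1, g2) -- illustrative names."""
    w0, w1, w2, g0, g1, g2 = params
    ll = 0.0
    for r, d_capi, d_horiz in zip(ratings, capi, horiz):
        pi = logistic(w0 + w1 * d_capi + w2 * d_horiz)
        xi = logistic(g0 + g1 * d_capi + g2 * d_horiz)
        ll += np.log(cub_pmf(m, pi, xi)[r - 1])
    return ll

# toy evaluation on fabricated data, zero coefficients
ll = loglik((0, 0, 0, 0, 0, 0), [1, 3, 6, 4], [1, 0, 1, 0], [0, 1, 0, 1])
```

In practice the maximization would be delegated to a general-purpose optimizer over the six coefficients.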
Next, we investigate whether there is any interaction between the vertical and horizontal layouts and relevant socio-demographic covariates X (gender: male = 0, female = 1; presence of children in the household; having a university degree; age: young if under 35 years, elderly if over 64; main-residence ownership; number of household members). Specifically, the following CUB specification, with covariate effects only in the uncertainty component, has been tested:
Results are reported in Table 7:
Table 7 Estimates of parameters and standard errors for Model (11)
X | |||||||
Gender | Has children | Degree | Young | Elderly | Homeowner | Household size | |
It turns out that responses given by homeowners are more homogeneous than those given by non-homeowners. The general conclusion, that CAWI responses collected on the horizontal layout are more heterogeneous than responses collected via CAPI or CAWI responses on the vertical layout, is not modified when controlling for covariates.
Next, we performed a model selection within the class of CUB models for each response group separately, to test for possible shelter effects. Table 8 reports some fitting indicators for the competing models. Accordingly, Fig. 2 shows that a richer CUB specification is needed to also account for the frequency inflation at the first category in all groups.
Table 8 Fitting results for competing models for different sub-groups of responses (best performances highlighted in bold fonts)
Model | Loglik (CAPI, SHIW-I) | BIC (CAPI, SHIW-I) | Loglik (CAWI-vert, WEBIT) | BIC (CAWI-vert, WEBIT) | Loglik (CAWI-horiz, WEBIT) | BIC (CAWI-horiz, WEBIT)
CUB | −3148.101 | 6311.381 | −581.228 | 1174.532 | −681.710 | 1375.534 |
CUB + she(1) | −3018.891 | 6060.551 | −556.097 | 1130.308 | −674.446 | 1367.062 |
CUB + she(3) | −3148.255 | 6319.279 | −581.315 | 1180.744 | −679.107 | 1376.385 |
CUB + she(4) | −3078.720 | 6180.209 | −566.933 | 1151.979 | −681.710 | 1381.591 |

Table 9 reports the estimated parameters and standard errors: focusing on responses on the vertical scale, one notices that the CUB with shelter at c = 1 reduces to a Binomial with shelter. In particular, the weight of the Binomial component in the mixture is similar in the two groups, whereas the feeling measure is higher for CAPI than for CAWI. Although the difference is not statistically significant, this circumstance could be due to a social desirability bias affecting CAPI respondents along the whole scale.
Table 9 Estimated parameters (and standard errors) of CUB and CUB with shelter(1), given survey mode (see Table 8)
CUB | CUB+she(1) | ||||
Binomial weight | Uniform weight | ||||
CAPI | |||||
CAWI-vert | |||||
CAWI-horiz | |||||
Indeed, comparing CAPI with CAWI vertical responses, we observe a larger tendency in the CAWI mode to choose options indicating relevant economic difficulties, possibly due to a social desirability bias, which reduces the choice of these categories in favour of those indicating a better economic situation when the interviewer is present. Furthermore, while in the vertical layout there is a larger tendency to choose the category “fairly easily” (a circumstance that can be identified as a central tendency bias, since this option can be seen as close to a neutral category, indicating neither economic distress nor extreme affluence), there is a larger tendency to choose the extreme categories when the horizontal layout is adopted. The extreme categories are the only labelled ones, and thus can be identified more clearly by the respondents, whereas neutral responses between the third and fourth categories cannot be clearly distinguished, owing to the absence of complete labelling.
To test these hypotheses (social desirability bias and central tendency bias), we report the z‑test for the comparison of two independent proportions8.
To verify the presence of a social desirability bias, we compare the responses in the two lowest categories (conveying actual economic difficulties) obtained with the CAPI mode with those from the CAWI mode, considering in the latter survey first only the responses collected with the vertical representation (Table 10) and then all answers (Table 11). Both results confirm evidence of a social desirability bias at the 5% significance level.
Table 10 Testing for social desirability bias for CAPI versus CAWI respondents on the basis of the observed weighted distribution for responses collected on vertical scales
CAPI | 0.295 |
CAWI-vert | 0.337 |
p-value for one-sided z‑test | 0.045 |
Table 11 Testing for social desirability bias for CAPI versus CAWI respondents on the basis of the observed weighted distribution
CAPI | 0.295 |
CAWI | 0.334 |
p-value for one-sided z‑test | 0.021 |
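The one-sided z-tests above can be sketched as follows. Plugging in the proportions from Table 10, and assuming the unweighted group sizes from Tables 4 and 6 as denominators (an assumption, since the paper uses weighted distributions), gives a p-value close to the reported 0.045:

```python
from math import erf, sqrt

def ztest_two_props(p1, n1, p2, n2):
    """One-sided z-test for H0: p1 <= p2 vs H1: p1 > p2,
    using the pooled-proportion standard error."""
    pool = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = sqrt(pool * (1 - pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 0.5 * (1 - erf(z / sqrt(2)))  # upper-tail p-value

# Table 10: share of the two lowest categories, CAWI-vert vs CAPI;
# group sizes taken (unweighted) from Tables 4 and 6
z, p = ztest_two_props(0.337, 419, 0.295, 1978)
print(round(p, 3))  # ~0.045
```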
Similarly, Table 12 reports the z‑test comparing the frequency of the central categories (third and fourth, conveying a pseudo-neutral evaluation) between the vertical layout (with all categories labelled) and the horizontal layout (with radio buttons, and labels only for the extreme categories). Significant evidence is found of a strong tendency towards the central categories in responses collected on the vertical layout, due to the complete labelling of the categories.
Table 12 Testing for central tendency bias for Vertical versus Horizontal response layout on the basis of the observed weighted distribution
Vertical | 0.593 | 0.562 |
Horizontal | 0.424 | 0.454 |
p-value for one-sided z‑test |
The Business Outlook Survey of Industrial and Service Firms (BOSISF hereafter) has been conducted annually by the Bank of Italy since 1993 to collect qualitative information on firms’ performance and on the main economic variables (Bank of Italy 2017b). The survey covers about 4500 firms (3000 industrial firms with 20 or more workers, 1000 firms in non-financial private services and 500 construction firms with 10 or more workers). Firms are contacted by e‑mail and can decide either to fill in the questionnaire on the web (CAWI) or to provide the information by telephone (CATI).9 Telephone interviews are administered by officials of the Bank of Italy’s local branches, specially trained to conduct business surveys (Bank of Italy 2017a).
To study the effect of survey mode on businesses’ answer elicitation mechanism, we consider two questions. The first collects the “realization rate of investment”, i.e. how much of the investment expenditure planned in the previous year was actually realized in the current year (Q1). The second collects the “expected investment growth rate”, that is, the change in investment expenditure expected for the coming year with respect to the current one (Q2). In particular, we refer to the 2020 edition of the survey, since more answer options were provided for these questions owing to the larger volatility of investment associated with the economic crisis caused by the Covid-19 pandemic.
Namely, the questions are the following:
Lower by more than −50% … 1
Lower by between −50% and −25% … 2
Lower by between −25% and −10% … 3
Lower by between −10% and −3% … 4
Stable between −3% and +3% … 5
Higher by between +3.1 and 10% … 6
Higher by between +10.1 and 50% … 7
Higher by more than +50% … 8
Do not know, do not wish to answer … 9
Tables 13 and 14 summarize the frequency distributions respectively of the variation in realised (Q1) and expected (Q2) investment for CATI and CAWI respondents.
Table 13 Distribution of response options for realized investment variation (Q1) in CATI and CAWI surveys
Categories | Absolute frequencies (CATI) | Absolute frequencies (CAWI) | Relative frequencies (CATI) | Relative frequencies (CAWI)
(1) Lower by more than −50% | 173 | 227 | 0.111 | 0.086 |
(2) Lower by between −50% and −25% | 111 | 264 | 0.071 | 0.100 |
(3) Lower by between −25% and −10% | 139 | 259 | 0.090 | 0.099 |
(4) Lower by between −10% and −3% | 102 | 200 | 0.066 | 0.076 |
(5) Stable between −3% and +3% | 819 | 1297 | 0.527 | 0.494 |
(6) Higher by between +3.1 and 10% | 82 | 138 | 0.053 | 0.053 |
(7) Higher by between +10.1 and 50% | 68 | 90 | 0.044 | 0.034 |
(8) Higher by more than +50% | 31 | 31 | 0.020 | 0.012 |
(9) Do not know, do not wish to answer | 28 | 122 | 0.018 | 0.046 |
Total | 1553 | 2628 | 1.000 | 1.000 |
Laakso-Taagepera Index | – | – | 0.274 | 0.316 |
Table 14 Distribution of response options for expected investment variation (Q2) in CATI and CAWI surveys
Categories | Absolute frequencies (CATI) | Absolute frequencies (CAWI) | Relative frequencies (CATI) | Relative frequencies (CAWI)
(1) Lower by more than −50% | 76 | 94 | 0.049 | 0.036 |
(2) Lower by between −50% and −25% | 33 | 105 | 0.021 | 0.040 |
(3) Lower by between −25% and −10% | 68 | 131 | 0.044 | 0.050 |
(4) Lower by between −10% and −3% | 68 | 99 | 0.044 | 0.038 |
(5) Stable between −3% and +3% | 663 | 1201 | 0.427 | 0.457 |
(6) Higher by between +3.1 and 10% | 221 | 367 | 0.142 | 0.139 |
(7) Higher by between +10.1 and 50% | 162 | 205 | 0.104 | 0.078 |
(8) Higher by more than +50% | 69 | 49 | 0.045 | 0.019 |
(9) Do not know, do not wish to answer | 193 | 377 | 0.124 | 0.143 |
Total | 1553 | 2628 | 1.000 | 1.000 |
Laakso-Taagepera Index | – | – | 0.401 | 0.352 |
It should be noted that the intrinsic uncertainty in the answers to the two questions is by construction different, since the first question concerns an observable quantity while the second requires the formulation of an expectation. Therefore, considering both items, we can test how respondents’ choices are realized under different survey modes in dissimilar uncertainty frameworks.
Before delving into a model-based analysis, it is worth pursuing a preliminary investigation of the target rating variables Q1 and Q2: from Tables 13 and 14, it can be inferred that, overall, the number of “don’t know” responses is higher when evaluations refer to the future (Q2) than when they refer to the past (Q1) (the association is highly significant according to a χ² test).
With respect to survey mode, a standard χ² test shows that “don’t know” responses occur more frequently among CAWI than among CATI respondents only for Q1: there is no significant association between the occurrence of “don’t know” responses and survey mode (CATI-CAWI) for Q2. This first result might indicate a tendency to adopt a satisficing behaviour when providing information on the variation between planned and realized investment if the interview takes place via the web.
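These tests can be approximately reproduced from the published counts (unweighted, so the weighted results reported in the paper may differ slightly). A self-contained sketch for the 2×2 tables of DK occurrence by survey mode:

```python
from math import erf, sqrt

def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

def pvalue_1df(chi2):
    """Upper-tail p-value of a chi-squared(1) variable:
    P(X > x) = 2*(1 - Phi(sqrt(x)))."""
    return 2 * (1 - 0.5 * (1 + erf(sqrt(chi2) / sqrt(2))))

# DK vs non-DK counts by survey mode, from Tables 13 (Q1) and 14 (Q2);
# first pair is the CATI row, second pair the CAWI row
chi2_q1 = chi2_2x2(28, 1553 - 28, 122, 2628 - 122)
chi2_q2 = chi2_2x2(193, 1553 - 193, 377, 2628 - 377)
print(round(chi2_q1, 2), round(pvalue_1df(chi2_q1), 4))  # significant
print(round(chi2_q2, 2), round(pvalue_1df(chi2_q2), 4))  # not significant at 5%
```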
In order to provide a unified summary of these results, we fitted a logistic regression on the indicator variable Di, reporting whether the response is “don’t know” (Di = 1) or observed (Di = 0), after merging the Q1 and Q2 evaluations and thus assuming that they are conditionally independent given the chosen explanatory variables, namely a dummy variable CAWI flagging the survey mode (equal to 1 for CAWI respondents and 0 for CATI respondents), and a dummy variable Xi distinguishing past evaluations (Xi = 0, namely Q1) from future ones (Xi = 1, namely Q2). It follows that, overall, both the CAWI modality of the questionnaire and the fact that the question requires an assessment about the future (rather than the past) contribute to significantly increasing the probability of observing a “don’t know” response. This result confirms that satisficing behaviours that may lead to the DK choice are more effectively countered by the interviewer in the CATI mode than in a self-administered mode, and that expectations are generally subject to more uncertainty than realized outcomes.
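A sketch of this logistic regression, fitted by iteratively reweighted least squares to the aggregated DK counts of Tables 13 and 14 (unweighted, so the coefficients will differ from the paper’s estimates, though both slopes come out positive as reported):

```python
import numpy as np

def logit_irls(X, successes, totals, iters=25):
    """Binomial logistic regression via IRLS on aggregated data.
    X: (k, p) design matrix; successes/totals: (k,) cell counts."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta
        mu = 1 / (1 + np.exp(-eta))              # P(D=1) per cell
        w = totals * mu * (1 - mu)               # IRLS weights
        z = eta + (successes - totals * mu) / w  # working response
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))
    return beta

# cells: (intercept, CAWI, X=future), with DK counts from Tables 13-14
X = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]], float)
dk = np.array([28, 122, 193, 377], float)
n = np.array([1553, 2628, 1553, 2628], float)
b0, b_cawi, b_future = logit_irls(X, dk, n)
```

Both `b_cawi` and `b_future` are positive, in line with the conclusion that the CAWI mode and future-oriented questions raise the probability of a DK response.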
After omitting “don’t know” responses from the analysis, the Spearman correlation coefficient between the observed ratings for Q1 and Q2 is slightly negative, indicating weak dependence between the two ratings: for an increasing observed variation of nominal expenditure in the current year, there is a slight tendency to expect a lower variation in 2021 than that observed in 2020. This result is in line with the fact that the latent continuous variables to which the two qualitative outcomes refer both contain the realized investment in the current year, in the numerator for the realized rate and in the denominator for the expected rate.
With reference to the observed frequency distributions reported in Tables 13 and 14, the response distributions for Q1 and Q2 both present a strong frequency inflation at category 5 (conveying overall stability). This result, referred to as central tendency bias (Pimentel 2019), reflects the tendency to choose the neutral category of a Likert scale when available, and it is particularly pronounced in this case. Thus, CUB models cannot be expected to fit the data adequately, even after including a possible shelter effect: indeed, the shelter parameter δ is not significant. This may be due to the fact that the modal value coincides with the category where the inflation is observed and to the large heterogeneity of the distribution. This conclusion continues to hold after controlling for selected covariates. For this reason, in the following we focus on the fitting results obtained with the CUSH model (4) (Capecchi and Piccolo 2017), which allows a more careful investigation of the variables related to the central tendency bias (even if the scale is not balanced around the centre), which assumes extreme importance in this circumstance10.
First, consider the CUSH model:
for both Q1 and Q2, where CAWI is a dummy factor identifying CAWI respondents (CAWI = 1) against CATI respondents (CAWI = 0). Results are reported in Table 15.11
Table 15 Estimation results for CUSH model for Q1 and Q2 in terms of CAWI covariate
ν0 | ν1 | BIC | |
Q1 | 10942.01 | ||
Q2 | 11061.26 |
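The CUSH fit with a single mode dummy admits a simple closed form: the MLE of the shelter weight within each group is δ̂ = (m·f_c − 1)/(m − 1), where f_c is the observed share of the shelter category, and with a logit link the slope ν1 is the difference of the two group logits. A sketch on the Q2 counts of Table 14 (unweighted, so only indicative of the sign of the estimate in Table 15):

```python
from math import log

def cush_delta_mle(freq_at_shelter, m):
    """Closed-form MLE of the shelter weight in a CUSH model
    P(R=r) = delta*1{r=c} + (1-delta)/m:
    delta = (m*f_c - 1)/(m - 1), truncated at 0."""
    return max(0.0, (m * freq_at_shelter - 1.0) / (m - 1.0))

def logit(p):
    return log(p / (1.0 - p))

m = 8  # response categories 1-8, DK responses removed
# shares of category 5 ("stable") from Table 14, DK excluded
f5_cati = 663 / (1553 - 193)
f5_cawi = 1201 / (2628 - 377)

d_cati = cush_delta_mle(f5_cati, m)
d_cawi = cush_delta_mle(f5_cawi, m)
# with logit(delta_i) = nu0 + nu1*CAWI_i, the two-group MLE gives
# nu1 as a difference of logits (nu names follow Table 15)
nu1 = logit(d_cawi) - logit(d_cati)
print(round(nu1, 3))  # positive: more inflation at c = 5 under CAWI
```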
Significant differences in central tendency bias between CAWI and CATI respondents are found only for the Q2 evaluation. In particular, the positive sign of the estimated ν1 for Q2 is in line with the finding that the interviewer may help to reduce satisficing behaviours, here related to the tendency to choose the neutral category given the difficulty of providing information about a future event.
When stratifying responses by sector of activity, a significant CAWI effect on the inflation at category c = 5 is found for the chemicals, rubber and plastics industry in Q1 and for the retail trade and food industries in Q2. For completeness, Table 16 reports estimates of the parameters and standard errors (in parentheses) for Model (12) fitted to the Q1 and Q2 responses of enterprises in each economic sector, to check for differences in the inflation at c = 5.
Table 16 Estimates of parameters and standard errors for CUSH, Model (12) fitted to Q1 and Q2 responses for each economic sector.
Economic Sector | Q1: | Q1: | Q2: | Q2: |
*p < 0.05 | ||||
Food beverages and tobacco | ||||
Textiles, clothing, leather, footwear | ||||
Chemicals, rubber, plastics | ||||
Basic metals | ||||
Engineering | ||||
Other manufacturing | ||||
Energy and mining | ||||
Retail trade | ||||
Hotels and restaurants | ||||
Transport, storage and communication | ||||
Other services | ||||
For both Q1 and Q2, no significant effects modifying the central tendency bias are found for the covariates and their interactions with survey mode: this statement follows from the estimation of model (8) for each covariate X (geographical region, size class and export quota).
Next, focusing on responses to Q2 only, the following model can be estimated to explain whether the frequency inflation at c = 5 can be interpreted in terms of a negative or a positive variation declared for Q1:
All the effects are significant: inflation at c = 5 for Q2 increases for CAWI respondents, also when accounting for the responses provided to the question about current investment (Q1). Notice also that inflation at c = 5 for Q2 decreases for negative or positive variations reported in the Q1 evaluations (mostly for negative variations). This means that firms reporting either a negative or a positive variation for Q1 have a lower probability of inflating category 5 for Q2, and thus expect a non-stable variation for Q2.12
Finally, to investigate differences across economic sectors in the joint effect of Q1 ratings and survey mode on Q2 evaluations we estimate, for each sector, the following model:
Results are reported in Table 17 and indicate that inflation at the category conveying stability, c = 5, is significantly higher for CAWI respondents in the food, beverages and tobacco, retail trade and engineering sectors (in decreasing order of the estimated effect). Ratings on Q1, instead, significantly affect the central tendency bias for other manufacturing industries, transport, storage and communication, textiles, clothing, leather and footwear, and engineering (in decreasing order of the estimated effect)13.
Table 17 Estimates of parameters and standard errors for model (14) for ratings on Q2.
Economic Sector | |||
*p < 0.05 | |||
Food beverages and tobacco | |||
Textiles, clothing, leather, footwear | |||
Chemicals, rubber, plastics | |||
Basic metals | |||
Engineering | |||
Other manufacturing | |||
Energy and mining | |||
Retail trade | |||
Hotels and restaurants | |||
Transport, storage and communication | |||
Other services | |||
In order to contribute to the literature investigating possible sources of distortion in micro-data from sample surveys, this work examined several factors related to survey mode and questionnaire design that can influence the response choice. In particular, using data from official surveys on both households and firms, we focused on the effects of different survey modes, of the visual representation of survey questions and of the presence/absence of the don’t know (DK) option.
The novelty of the approach introduced here, with respect to the state of the art in CUB model applications, lies in the possibility of testing for a possible effect of the different survey modes or questionnaire features (DK option, visual features) on both the feeling and the uncertainty of the responses in a more straightforward way than classical methods, such as scale-location cumulative link models.
Although referring to specific cases, the results show that, with respect to the feeling component, neither the presence of the “don’t know” option nor the survey mode or the graphical representation appears to significantly modify the way the rating options are perceived and used. On the other hand, we found evidence of the effects of these features on uncertainty and on shelter choices. In particular:
Survey mode: Our study provided evidence that the CAWI collection mode leads to an increase in CUB uncertainty in the responses provided by firms regarding expected investment. For the same question, we also found that firms choose more often the options related to a neutral position or to the absence of knowledge when the CAWI mode is used. This result may be due to the fact that, in the absence of an interviewer, respondents may more easily adopt a satisficing behaviour to reduce their effort, especially for more complex questions. On the other hand, for household surveys, results show that the social desirability bias is reduced when using the CAWI mode instead of CAPI.
Don’t know option: This option is used more frequently for questions about expectations and sometimes interacts with variables related to the size of the phenomenon. In the case of the reported estimates of the value of parents’ dwellings, the weight of CUB uncertainty was higher when the DK option was present for households with a higher level of income or living in Central Italy, who also reported on average higher evaluations for the item.
Visual representation: Comparing a horizontal layout, where labels are provided only for the extreme classes, with a classical vertical representation, where all options are labelled, we observe an increase in the heterogeneity of the responses in the former layout and an increase in the central tendency bias in the latter.
These results are in line with the literature, providing evidence that CUB models are a robust tool for interpreting and comparing data collected with different survey techniques and questionnaire designs. However, the case-specific nature of these results suggests treating them as non-exhaustive and continuing to carry out appropriate tests whenever different techniques are used in sub-samples, in order to exclude any possible source of distortion in the results for the examined population.
Partially supported by grant SI-WCWB from University of Naples Federico II (FRA 2022), DR n 3429, 07/09/2023 (CUP: E65F22000050001). The authors wish to thank Lucia Modugno and Andrea Neri for helpful comments. The views in this paper are those of the authors only and do not necessarily reflect those of the Bank of Italy.
Data used for the case studies in Sect. 4.1 and Sect. 4.2 are available as online supplementary material.
The microdata used in Sect. 4.3 are not publicly shareable due to confidentiality constraints. Access to data from the Business Outlook Survey of Industrial and Service Firms (SONDTEL) is only possible through the Bank of Italy’s Research Data Center, which provides secure access to authorized researchers through the REX remote processing system or the laboratory on the Bank of Italy’s premises. Detailed information on how to apply for access is available on the Bank of Italy’s official website: Banca d’Italia—Microdata of Industrial and Service Firms. Additionally, some of the data used in the paper, specifically those regarding the survey mode used (e.g., CATI vs CAWI), are available only to staff affiliated with the Bank of Italy who are directly responsible for conducting the survey. Therefore these data cannot be accessed through the Research Data Center.
Agresti, A. (2010). Analysis of ordinal categorical data (2nd edn.). Wiley.
Baffigi, A., Cannari, L., & D’Alessio, G. (2016). Cinquanta anni di indagini sui bilanci delle famiglie italiane: Storia, metodi, prospettive (Fifty years of household income and wealth surveys: History, methods and future prospects). Bank of Italy Occasional Paper, 368.
Bank of Italy (2017a). Business outlook survey of industrial and service firms. Methods and Sources: Methodological Notes, November. https://www.bancaditalia.it/pubblicazioni/metodi-e-fonti-note/metodi-note-2017/en-metodologia_sondaggio_impr_industr_serv.pdf?language_id=1
Bank of Italy (2017b). Survey of industrial and service firms. Methods and Sources: Methodological Notes, July. https://www.bancaditalia.it/pubblicazioni/metodi-e-fonti-note/metodi-note-2017/en_survey_methodology_invind.pdf?language_id=1
Bank of Italy (2018). The survey on household income and wealth. Methods and Sources: Methodological Notes, March. https://www.bancaditalia.it/pubblicazioni/metodi-e-fonti-note/metodi-note-2018/MOP_IBF_en.pdf?language_id=1
Barcaroli, G., Gambacorta, R., Conte, L. L., Murgia, M., Neri, A., & Zanichelli, F. (2019). L’indagine sperimentale web sulle famiglie italiane: Una valutazione della tecnica CAWI per rilevare informazioni sul reddito e la ricchezza (The experimental web survey on Italian households: An assessment of the CAWI technique for collecting information on income and wealth). ISTAT Metodi Letture Statistiche. https://www.istat.it/it/archivio/228589
Batini, C., Cappiello, C., Francalanci, C., & Maurino, A. (2009). Methodologies for data quality assessment and improvement. ACM Computing Surveys. https://doi.org/10.1145/1541880.1541883
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57, 289–300.
Bound, J., Brown, C., & Mathiowetz, N. (2001). Chapter 59—Measurement error in survey data. In J. Heckman & E. Leamer (Eds.), Handbook of econometrics (Vol. 5). Elsevier. https://doi.org/10.1016/S1573-4412(01)05012-7
Bowling, A. (2005). Mode of questionnaire administration can have serious effects on data quality. Journal of Public Health, 27(3), 281–291. https://doi.org/10.1093/pubmed/fdi031
Braunsberger, K., Wybenga, H., & Gates, R. (2007). A comparison of reliability between telephone and web-based surveys. Journal of Business Research, 60(7), 758–764.
Capecchi, S., & Iannario, M. (2016). Gini heterogeneity index for detecting uncertainty in ordinal data surveys. METRON, 74, 223–232.
Capecchi, S., & Piccolo, D. (2017). Dealing with heterogeneity in ordinal responses. Quality & Quantity, 51, 2375–2393.
Capecchi, S., Endrizzi, I., Gasperi, F., & Piccolo, D. (2016). A multi-product approach for detecting subjects’ and objects’ covariates in consumer preferences. British Food Journal, 118, 515–526.
Cerulli, G. (2020). CUB: Stata module to estimate ordinal outcome model estimated by a mixture of a uniform and a shifted binomial. Statistical Software Components, S458727. Boston College Department of Economics.
Cerulli, G., Simone, R., Di Iorio, F., Piccolo, D., & Baum, C. (2022). Fitting mixture models for feeling and uncertainty for rating data analysis. The Stata Journal, 22(1), 195–223.
Zhang, C., & Conrad, F. (2014). Speeding in web surveys: The tendency to answer very fast and its association with straightlining. Survey Research Methods, 8(2), 127–135.
Chang, L., & Krosnick, J. (2010). Comparing oral interviewing with self-administered computerized questionnaires: An experiment. Public Opinion Quarterly, 74(1), 154–167. https://doi.org/10.1093/poq/nfp090
Colombi, R., & Giordano, S. (2016). A class of mixture models for multidimensional ordinal data. Statistical Modelling, 16, 322–340.
Coombs, C., & Coombs, L. (1976). “Don’t know”: Item ambiguity or respondent uncertainty? Public Opinion Quarterly, 40, 495–514.
Corduas, M., Iannario, M., & Piccolo, D. (2009). A class of statistical models for evaluating services and performances. In M. Bini, P. Monari, D. Piccolo & L. Salmaso (Eds.), Statistical methods for the evaluation of educational services and quality of products, Contributions to Statistics (pp. 99–117). Springer.
Couper, M. (2013). Is the sky falling? New technology, changing media, and the future of surveys. Survey Research Methods, 7(3), 145–156.
Couper, M., Traugott, M., & Lamias, M. (2001). Web survey design and administration. Public Opinion Quarterly, 65(2), 230–253.
Davis, R., Couper, M., Janz, N., Caldwell, C., & Resnicow, K. (2009). Interviewer effects in public health surveys. Health Education Research, 25(1), 14–26. https://doi.org/10.1093/her/cyp046
De Leeuw, E., Hox, J., & Scherpenzeel, A. (2011). Mode effect or question wording? Measurement error in mixed mode surveys. In Proceedings of the Survey Research Methods Section (pp. 5959–5967). American Statistical Association.
DeCastellarnau, A. (2018). A classification of response scale characteristics that affect data quality. Quality & Quantity, 52, 1523–1559.
D’Elia, A., & Piccolo, D. (2005). A mixture model for preference data analysis. Computational Statistics & Data Analysis, 49, 917–934.
Dillman, D. (2002). Survey nonresponse in design, data collection, and analysis. In Survey nonresponse (pp. 3–26).
Dillman, D., Phelps, G., Tortora, R., Swift, K., Kohrell, J., Berck, J., & Messer, B. (2009). Response rate and measurement differences in mixed-mode surveys using mail, telephone, interactive voice response (IVR) and the internet. Social Science Research, 38(1), 1–18.
Faiella, I., & Gambacorta, R. (2007). The weighting process in the SHIW. Bank of Italy Temi di Discussione (Working Paper), 636.
Fin, F., Iannario, M., Simone, R., & Piccolo, D. (2017). The effect of uncertainty on the assessment of individual performance: Empirical evidence from professional soccer. Electronic Journal of Applied Statistical Analysis, 10, 677–692.
Fricker, S., Galesic, M., Tourangeau, R., & Ting, Y. (2005). An experimental comparison of web and telephone surveys. Public Opinion Quarterly, 69(3), 370–392.
Funke, F. (2016). A web experiment showing negative effects of slider scales compared to visual analogue scales and radio button scales. Social Science Computer Review, 34(2), 244–254.
Gambacorta, R., Conte, M. L., Murgia, M., Neri, A., Rizzi, R., & Zanichelli, F. (2018). Mind the mode: Lessons from a web survey on household finances. Bank of Italy Occasional Paper, 437.
Golia, S. (2015). On the interpretation of the uncertainty parameter in CUB models. Electronic Journal of Applied Statistical Analysis, 8, 312–328.
Hambleton, R. K. (1991). Fundamentals of item response theory (Vol. 2). Sage.
Iannario, M. (2012). Modelling shelter choices in a class of mixture models for ordinal responses. Statistical Methods and Applications, 21, 1–22.
Iannario, M., Monti, A. C., & Piccolo, D. (2016). Robustness issues in CUB models. TEST, 25(4), 731–750.
Iannario, M., Monti, A., Piccolo, D., & Ronchetti, E. (2017). Robust inference for ordinal response models. Electronic Journal of Statistics, 11, 3407–3445.
Iannario, M., Piccolo, D., & Simone, R. (2024). CUB: A class of mixture models for ordinal data. R package version 1.1.5.
Ip, R., & Wu, K. (2024). A mixture distribution for modelling bivariate ordinal data. Statistical Papers, 65, 4453–4488.
Jäckle, A., Roberts, C., & Lynn, P. (2010). Assessing the effect of data collection mode on measurement. International Statistical Review, 78(1), 3–20. https://doi.org/10.1111/j.1751-5823.2010.00102.x
Jäckle, A., Lynn, P., & Burton, J. (2015). Going online with a face-to-face household panel: Effects of a mixed mode design on item and unit non-response. Survey Research Methods, 9(1), 57–70.
Kankaras, M., & Capecchi, S. (2025). Neither agree nor disagree: Use and misuse of the neutral response category in Likert-type scales. Metron, 83, 111–140.
Keusch, F., & Yang, T. (2018). Is satisficing responsible for response order effects in rating scale questions? Survey Research Methods, 12, 259–270.
Kreuter, F., Presser, S., & Tourangeau, R. (2008). Social desirability bias in CATI, IVR, and web surveys: The effects of mode and question sensitivity. Public Opinion Quarterly, 72(5), 847–865.
Krosnick, J. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5, 213–236.
Krosnick, J., Holbrook, A., Berent, M., Carson, R., Hanemann, M., Kopp, R., Mitchell, C., Presser, S., Ruud, P., Smith, V. K., Moody, W., Green, M., & Conaway, M. (2002). The impact of “no opinion” response options on data quality: Non-attitude reduction or an invitation to satisfice? Public Opinion Quarterly, 66(3), 371–403. https://doi.org/10.1086/341394
Malhotra, N. (2008). Completion time and response order effects in web surveys. Public Opinion Quarterly, 72(5), 914–934.
Maloshonok, N., & Terentev, E. (2016). The impact of visual design and response formats on data quality in a web survey of MOOC students. Computers in Human Behavior, 62, 506–515.
Manisera, M., & Zuccolotto, P. (2014). Modeling “don’t know” responses in rating scales. Pattern Recognition Letters, 45, 226–234.
Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149–174.
McCullagh, P. (1980). Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B, 42, 109–142.
McCullagh, P., & Nelder, J. (1989). Generalized linear models. Chapman & Hall.
Christian, L. M., Dillman, D., & Smyth, J. (2008). The effects of mode and format on answers to scalar questions in telephone and web surveys. In Advances in telephone survey methodology (pp. 250–275). Wiley.
Montagni, I., Cariou, T., Tzourio, C., & Gonzalez-Caballero, J. (2019). “I don’t know”, “I’m not sure”, “I don’t want to answer”: A latent class analysis explaining the informative value of nonresponse options in an online survey on youth health. International Journal of Social Research Methodology, 22(6), 651–667.
Piccolo, D. (2003). On the moments of a mixture of uniform and shifted binomial random variables. Quaderni di Statistica, 5, 85–104.
Piccolo, D., & D’Elia, A. (2008). A new approach for modelling consumers’ preferences. Food Quality and Preference, 19, 247–259.
Piccolo, D., & Simone, R. (2019a). The class of CUB models: Statistical foundations, inferential issues and empirical evidence (with discussions and rejoinder). Statistical Methods & Applications, 28, 389–493.
Piccolo, D., & Simone, R. (2019b). Rejoinder to the discussion of “The class of CUB models: Statistical foundations, inferential issues and empirical evidence”. Statistical Methods & Applications, 28, 477–493.
Pimentel, J. (2019). Some biases in Likert scaling usage and its correction. International Journal of Science: Basic and Applied Research, 45(1), 183–191.
Regmi, P., Waithaka, E., Paudyal, A., Simkhada, P., & Teijlingen, E. V. (2016). Guide to the design and application of online questionnaire surveys. Nepal Journal of Epidemiology, 6(4), 640–644.
Rhodes, S., Bowie, D., & Hergenrather, K. (2003). Collecting behavioural data using the World Wide Web: Considerations for researchers. Journal of Epidemiology & Community Health, 57, 68–73.
Roberts, C., Vandenplas, C., & Ernst Stähli, M. (2014). Evaluating the impact of response enhancement methods on the risk of nonresponse bias and survey costs. Survey Research Methods, 8, 67–80.
Samejima, F. (1969). Estimation of ability using a response pattern of graded scores. Psychometric Monograph No. 17. Psychometric Society.
Sarracino, F., Riillo, C., & Mikucka, M. (2017). Comparability of web and telephone survey modes for the measurement of subjective well-being. Survey Research Methods, 11, 141–169. https://doi.org/10.18148/srm/2017.v11i2.6740
Sciandra, M., Fasola, S., Albano, A., & Plaia, A. (2024). Discrete beta and shifted beta-binomial models for rating and ranking data. Environmental and Ecological Statistics, 31(5), 317–338.
Simon, H. (1957). Models of man. Wiley.
Simone, R. (2020). FastCUB: Fast EM and best-subset selection for CUB models for rating data. R package version 0.0.2.
Simone, R. (2022). On finite mixtures of discretized beta model for ordered responses. TEST, 31, 828–855.
Simone, R. (2023). Uncertainty diagnostics of binomial regression trees for ordered rating data. Journal of Classification, 40, 79–105.
Simone, R., Di Iorio, F., & Lucchetti, R. (2019). CUB for GRETL. In F. Di Iorio & R. Lucchetti (Eds.), GRETL 2019: Proceedings of the International Conference on GNU Regression, Econometrics and Time series Library (pp. 147–166). FedOA University Press.
Simone, R., Tutz, G., & Iannario, M. (2020). Subjective heterogeneity in response attitude for multivariate ordinal outcomes. Econometrics and Statistics, 14, 145–158.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. Chapman & Hall/CRC.
Sur, P., Shmueli, G., Bose, S., & Dubey, P. (2015). Modeling bimodal discrete data using Conway–Maxwell–Poisson mixture models. Journal of Business and Economic Statistics, 33, 352–365.
Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. Psychological Bulletin, 133(5), 859.
Tourangeau, R., Rips, L., & Rasinski, K. (2000). The psychology of survey response. Cambridge University Press.
Tourangeau, R., Couper, M., & Conrad, F. (2004). Spacing, position, and order: Interpretive heuristics for visual features of survey questions. Public Opinion Quarterly, 68(3), 368–393.
Tourangeau, R., Couper, M., & Conrad, F. (2013). “Up means good”: The effect of screen position on evaluative ratings in web surveys. Public Opinion Quarterly, 77(S1), 69–88. https://doi.org/10.1093/poq/nfs063
Tourangeau, R., Yan, T., & Sun, H. (2020). Who can you count on? Understanding the determinants of reliability. Journal of Survey Statistics and Methodology, 8, 903–931.
Tutz, G. (2022). Ordinal regression: A review and a taxonomy of models. WIREs Computational Statistics, 14(2), e1545.
Tutz, G. (2012). Regression for categorical data. Cambridge University Press.
Ursino, M., & Gasparini, M. (2018). A new parsimonious model for ordinal longitudinal data with application to subjective evaluation of a gastrointestinal disease. Statistical Methods in Medical Research, 27(5), 1376–1393.
Van Vaerenbergh, Y., & Thomas, T. (2012). Response styles in survey research: A literature review of antecedents, consequences, and remedies. International Journal of Public Opinion Research, 25(2), 195–217.
Vannieuwenhuyze, J. (2013). On the relative advantage of mixed-mode versus single-mode surveys. Survey Research Methods, 8, 31–42.
Vannieuwenhuyze, J., Loosveldt, G., & Molenberghs, G. (2010). A method for evaluating mode effects in mixed-mode surveys. Public Opinion Quarterly, 74(5), 1027–1045. https://doi.org/10.1093/poq/nfq059
Velez, P., & Ashworth, S. D. (2007). The impact of item readability on the endorsement of the midpoint response in surveys. Survey Research Methods, 12, 69–74.