Robust Lavallée-Hidiroglou stratified sampling strategy
Keywords: robust regression, stratified design, auxiliary data
Abstract
There are several reasons why robust regression techniques are useful tools in sampling design. First of all, when stratified samples are considered, one needs to deal with three main issues: the sample size, the determination of the strata bounds, and the allocation of the sample among the strata. Since the target variable Y, the focus of the survey, is unknown, auxiliary information X, known for the entire population from which the sample is drawn, is used instead. Such information is helpful because it is typically strongly correlated with the target Y. However, discrepancies between these variables may arise. As Rivest (2002) has shown, the use of auxiliary information, combined with the choice of an appropriate statistical model for the relationship between Y and X, is crucial for determining the strata bounds, the sample size and the sampling rates needed to reach a chosen precision level for the estimates. Nevertheless, this regression-based approach is highly sensitive to the presence of contaminated data. Since the key tool for stratified sampling is the measure of scale of Y conditional on the auxiliary X, this paper proposes a robust approach based on the S-estimator of regression. The aim is to allow robust determination of the sample size and strata bounds, together with optimal sample allocation. Simulation results based on data from the Construction sector of a Structural Business Survey illustrate the advantages of the proposed method.
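To make the sensitivity argument concrete, the following is a minimal sketch, not the paper's implementation, of how an S-estimator of regression yields a scale of Y given X that resists contamination, whereas the ordinary least squares (OLS) residual standard deviation does not. It assumes a simple homoscedastic linear model Y = a + bX (the paper's actual working model may differ), Tukey's biweight rho with tuning constant c = 1.547645 and right-hand side delta = 0.5 (the usual 50% breakdown choice, consistent at the normal), and it approximates the S-estimate by searching random elemental subsets rather than using a refined algorithm such as fast-S. The function names `rho_biweight`, `m_scale` and `s_regression` are hypothetical, and the data are synthetic, not the survey data.

```python
import numpy as np

C_BIWEIGHT = 1.547645  # assumed tuning constant: 50% breakdown, consistent at the normal
DELTA = 0.5            # assumed right-hand side of the M-scale equation


def rho_biweight(u, c=C_BIWEIGHT):
    """Tukey biweight rho, normalised so that its supremum equals 1."""
    t = np.clip(np.abs(u) / c, 0.0, 1.0)
    return 1.0 - (1.0 - t**2) ** 3


def m_scale(r, c=C_BIWEIGHT, delta=DELTA, tol=1e-10, max_iter=200):
    """Solve mean(rho(r / s)) = delta for the M-scale s by fixed-point iteration."""
    r = np.asarray(r, dtype=float)
    s = np.median(np.abs(r)) / 0.6745  # MAD-type starting value
    if s == 0.0:
        return 0.0  # at least half of the residuals are exactly zero
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(rho_biweight(r / s, c)) / delta)
        if abs(s_new - s) <= tol * s:
            return s_new
        s = s_new
    return s


def s_regression(x, y, n_subsets=500, seed=0):
    """Approximate S-estimate of y = a + b*x: the line through two sampled
    points whose residuals have the smallest M-scale (elemental-subset search)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    best = (np.nan, np.nan, np.inf)  # (intercept, slope, scale)
    for _ in range(n_subsets):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # degenerate pair: no unique line through it
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        s = m_scale(y - a - b * x)
        if s < best[2]:
            best = (a, b, s)
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(1.0, 10.0, 200)               # synthetic auxiliary variable X
    y = 2.0 + 3.0 * x + rng.normal(0.0, 1.0, 200)  # synthetic target Y
    y[:40] += 60.0                                 # contaminate 20% of the units

    b_ols, a_ols = np.polyfit(x, y, 1)             # polyfit returns [slope, intercept]
    sd_ols = np.std(y - a_ols - b_ols * x, ddof=2)
    a_s, b_s, s_s = s_regression(x, y)
    print(f"OLS: a={a_ols:.2f}  b={b_ols:.2f}  residual sd={sd_ols:.2f}")
    print(f"S:   a={a_s:.2f}  b={b_s:.2f}  M-scale={s_s:.2f}")
```

On this contaminated sample the OLS residual standard deviation is inflated by the gross errors, while the M-scale from the S-fit stays close to the error scale of the clean units. Because stratum bounds, allocation and sample size all depend on this conditional scale, an inflated value would propagate into every design quantity.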
Copyright for articles published in this journal is retained by the authors, with first publication rights granted to the journal. By virtue of their appearance in this open access journal, users can use, reuse and build upon the material published in the journal but only for non-commercial purposes and with proper attribution.