﻿* Encoding: UTF-8.
***Syntax for data preparation of the EU-SILC data of 2008 to 2012.
***The syntax is identical for each year of the analyzed time period.
***Date of syntax: 20.09.2016, by Verena Ortmanns.
***Refers to the article "Can we assess representativeness of cross-national surveys using the education variable?"
   by Verena Ortmanns and Silke L. Schneider, published in Survey Research Methods.



*Restrict the sample to respondents aged 25 to 64.
*pb140 is the year of birth. The survey year in the compute command has to be changed manually for each dataset.
compute age = 2008 - pb140.
exe.
Formats age (F2.0).
freq age.
crosstab pb140 by age.
select if (age ge 25 and age le 64).
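*Optional check, not part of the original syntax: after the selection, the minimum and maximum of age should be 25 and 64.
descriptives variables=age /statistics=min max.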

*Show distribution of the education variable.
freq pe040.

*Recode the education variable into five levels: ISCED 0 is merged into category 1, system-missing becomes 9.
*First remove the user-missing definitions from pe040.
missing values pe040 ().
compute ISCED97_5 = pe040.
recode ISCED97_5 (0=1) (sysmis=9).

variable labels ISCED97_5 "5-level version of ISCED 1997".
value labels ISCED97_5
1 "ISCED 0-1"
2 "ISCED 2"
3 "ISCED 3"
4 "ISCED 4"
5 "ISCED 5-6"
9 "Missing".

formats ISCED97_5 (F1.0).
Missing values ISCED97_5 (9).

freq ISCED97_5.
crosstab pe040 by ISCED97_5.

*Use the design weight.
weight by pb040.

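*Split the file by the country variable and calculate the distribution of the ISCED97_5 variable.
*split file expects the cases to be sorted by the split variable, so sort by country first.
sort cases by pb020.
split file separate by pb020.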
freq ISCED97_5.
*These distributions can be found in the provided Excel table.
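
*Optional cleanup, not part of the original syntax: switch off the split and the weight so that later commands run on the full, unweighted file.
split file off.
weight off.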