README: Replication of "Invitation Messages for Business Surveys: A Multi-Armed Bandit Experiment"
by Johannes J. Gaul, Florian Keusch, Davud Rostam-Afschar and Thomas Simon



Overview:

This replication package contains the code used to conduct the analysis in "Invitation Messages for Business Surveys: A Multi-Armed Bandit Experiment" using Python, Jupyter, and Stata.
Two main scripts execute the full analysis, generating the data for six main figures, four appendix figures, and three appendix tables. The replication process is expected to complete in under 30 minutes.



Software Requirements:

The analysis is conducted using Python and Stata, with the following configurations:
- Python configuration: See "environment.yml" or "environment.txt" (e.g., using the Python software distribution Anaconda)
- Stata: Version 17 or higher. Stata executes a Python script for the bbandits code (Kemper and Rostam-Afschar, 2025). Therefore, the following Python packages must be installed in the Python distribution accessed by Stata: scipy, pandas, scikit-learn, and statsmodels.
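
A quick way to confirm that these packages are visible is to run a short check in the Python interpreter that Stata uses (the Stata command "python query" reports which interpreter that is). The following is a minimal sketch and not part of the replication package; the helper name missing_packages is hypothetical. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Packages named in the software requirements, mapped to their import names.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED = {
    "scipy": "scipy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "statsmodels": "statsmodels",
}


def missing_packages(required=REQUIRED):
    """Return the names of packages whose import name cannot be located."""
    return [
        name
        for name, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages()
print("Missing packages:", ", ".join(missing) if missing else "none")
```

If the check lists any package, install it into that same Python distribution before running the Stata do-file.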



Hardware Requirements (example configuration):

Manufacturer - LENOVO
Model - 20WNS2XP00
CPU - 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
NumberOfCores - 4
NumberOfLogicalProcessors - 8
Physical Memory - 15.7 GB



Data Availability and Provenance Statements:

Online accessible replication: The file "GKRS2025_business_survey_mab" contains the experiment's data. Using this file and the publicly available code, all results can be replicated, except for the maps in Appendix A.2.
A second, extended version of this dataset "GKRS2025_business_survey_mab_all" includes the geocoordinates of the contacted firms. Due to data protection regulations, we are not permitted to make this geocoding data publicly available. However, independent third-party replicators may request on-site or controlled remote access to this dataset after signing a Data Use Agreement.
While we cannot provide the full dataset publicly, we offer unrestricted online access to all program code used to generate the results. With access to the extended dataset, the maps in Appendix A.2 can be replicated using openly available shapefiles of Germany:

Verwaltungsgebiete 1:250 000 Stand 01.01.2022 (VG250 01.01.)
Data Provider: Bundesamt für Kartographie und Geodäsie
Licence: Data Licence Germany – Attribution – Version 2.0 (dl-de/by-2-0)
Dataset Reference (URI): https://gdz.bkg.bund.de/index.php/default/open-data/verwaltungsgebiete-1-250-000-stand-01-01-vg250-01-01.html
Changes: -



Statement about Rights:

We certify that the author(s) of this manuscript have legitimate access to the data used in this study and have obtained the necessary permissions for its use.



Summary of Availability:

- "GKRS2025_business_survey_mab" is available to the public: https://doi.org/10.7802/2836.
- "GKRS2025_business_survey_mab_all", which includes the geocoordinates of the contacted firms, is available for on-site replication only due to data protection regulations.



Memory, Runtime, Storage Requirements:

Runtime: 10-30 minutes
2 GB to 25 GB



Description of Programs/Code:

- The folder "01_Inputs" contains the data.
- The folder "02_Outputs" contains all outputs.
- "GKRS_Empirical_Replication_STATA.do" replicates Figures 4-7, Figures A.1 and A.3 of the Appendix and Tables A.1 - A.3.
- "GKRS_Empirical_Replication_Python.ipynb" replicates Figures 2 and 3 and Figure A.4 of the Appendix. Figure A.2 can be replicated on-site using the extended input dataset (see above).
- "GKRS_Empirical_Replication_Python.ipynb" also provides the functions that calculate the multi-armed bandit distribution weights (equations 2 and 3) and a calculation example following Scott (2010).
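
For orientation, the general idea behind bandit distribution weights in the spirit of Scott (2010) can be sketched as follows: each arm's weight is the estimated posterior probability that it has the highest success rate. This is an illustrative sketch only and not the code from the notebook. It assumes binary outcomes and independent Beta(1, 1) priors, and the function name thompson_weights and the counts are hypothetical; the paper's equations 2 and 3 may differ in detail.

```python
import random


def thompson_weights(successes, trials, draws=10_000, seed=0):
    """Estimate each arm's posterior probability of being the best arm.

    Assumes binary outcomes and independent Beta(1, 1) priors, so the
    posterior of arm k is Beta(1 + successes[k], 1 + failures[k]).
    """
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(draws):
        # One joint draw from the posteriors of all arms.
        samples = [
            rng.betavariate(1 + s, 1 + n - s)
            for s, n in zip(successes, trials)
        ]
        # Credit the arm with the highest sampled success rate.
        wins[samples.index(max(samples))] += 1
    return [w / draws for w in wins]


# Hypothetical counts for three messages (not taken from the study).
starts = [30, 45, 38]
sent = [400, 400, 400]
weights = thompson_weights(starts, sent)
print([round(w, 3) for w in weights])
```

The weights sum to one and can be read as the shares of the next batch assigned to each message. More Monte Carlo draws reduce the simulation noise in the estimates.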



Sequence of Running the Replication Files:

1. Ensure that all package requirements are met, preferably by creating a new Python environment in Anaconda using the provided environment files.
2. Verify that, in addition to meeting the Python environment requirements, the Python distribution accessed by Stata 17 or higher includes the following packages: scipy, pandas, scikit-learn, and statsmodels.
3. Update the global input and output paths in the Stata do-file "GKRS_Empirical_Replication_STATA.do" (lines 17 and 18).
4. Run: "GKRS_Empirical_Replication_STATA.do" to replicate Figures 4–7, Figures A.1 and A.3 in the Appendix, and Tables A.1–A.3. The output will be saved in the "02_Outputs" folder.
5. Run: "GKRS_Empirical_Replication_Python.ipynb" to replicate Figures 2 and 3 and Figure A.4 in the Appendix. The output will be saved in the "02_Outputs" folder.
6. For replicating Figure A.2 in the appendix, run the commented-out section of "GKRS_Empirical_Replication_Python.ipynb". This part of the code requires access to the on-site version of the dataset containing geocoordinates.
7. Furthermore, "GKRS_Empirical_Replication_Python.ipynb" contains the algorithm that calculates the MAB distribution weights and a calculation example.



List of Figures, Tables, and Programs:

Figure 1	(Not computed; graphically constructed for illustrative purposes)
Figure 2	Jupyter (Python) file - "GKRS_Empirical_Replication_Python.ipynb"
Figure 3	Jupyter (Python) file - "GKRS_Empirical_Replication_Python.ipynb"
Figure 4	Stata (17 or higher) file - "GKRS_Empirical_Replication_STATA.do"
Figure 5	Stata (17 or higher) file - "GKRS_Empirical_Replication_STATA.do"
Figure 6	Stata (17 or higher) file - "GKRS_Empirical_Replication_STATA.do"
Figure 7	Stata (17 or higher) file - "GKRS_Empirical_Replication_STATA.do"
Table A.1	Stata (17 or higher) file - "GKRS_Empirical_Replication_STATA.do"
Table A.2	Stata (17 or higher) file - "GKRS_Empirical_Replication_STATA.do"
Table A.3	Stata (17 or higher) file - "GKRS_Empirical_Replication_STATA.do"
Figure A.1	Stata (17 or higher) file - "GKRS_Empirical_Replication_STATA.do"
Figure A.2	Jupyter (Python) file - "GKRS_Empirical_Replication_Python.ipynb"
Figure A.3	Stata (17 or higher) file - "GKRS_Empirical_Replication_STATA.do"
Figure A.4	Stata (17 or higher) file - "GKRS_Empirical_Replication_STATA.do" (replicates the calculations using bbandits as provided by Kemper and Rostam-Afschar, 2025); Jupyter (Python) file - "GKRS_Empirical_Replication_Python.ipynb" (contains the visualization for graphical consistency).



Description of Variables:

Name			Availability		Description

'id'			public			Unique identifier of the message recipient.
'batch'			public			Experimental week in which the message was sent.
'message'		public			Encoded characteristics of the sent message (e.g., P1A0U1D0M1).
'opened'		public			Indicator variable (1 = message was opened).
'started'		public			Indicator variable (1 = survey was started).
'latitude' 		on-site			Latitude of the company that received the invitation message, based on the street address.
'longitude' 		on-site			Longitude of the company that received the invitation message, based on the street address.
'finished'		public			Indicator variable (1 = survey was completed).
'personalized'		public			Indicator variable (1 = personalization treatment was applied).
'authority'		public			Indicator variable (1 = authority treatment was applied).
'topurl'		public			Indicator variable (1 = top-URL treatment was applied).
'dataprotection'	public			Indicator variable (1 = data protection treatment was applied).
'mode'			public			Indicator variable (1 = mode of address treatment was applied).
'previous'		public			Indicator variable (1 = respondent had participated in a previous survey, conditional on opening this survey).
'employees'		public			Number of employees, sourced from the Bureau van Dijk Orbis database.
'formereastgermany'	public			Indicator variable (1 = company is located in one of the former East German states or in Berlin).
'batch4opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 5).
'batch4started'		public			Indicator variable (1 = subject started the survey before the weight calculation for batch 5).
'batch5opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 6).
'batch5started'		public			Indicator variable (1 = subject started the survey before the weight calculation for batch 6).
'batch6opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 7).
'batch6started'		public			Indicator variable (1 = subject started the survey before the weight calculation for batch 7).
'batch7opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 8).
'batch7started'		public			Indicator variable (1 = subject started the survey before the weight calculation for batch 8).
'batch8opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 9).
'batch8started'		public			Indicator variable (1 = subject started the survey before the weight calculation for batch 9).
'batch9opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 10).
'batch9started'		public			Indicator variable (1 = subject started the survey before the weight calculation for batch 10).
'batch10opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 11).
'batch10started'	public			Indicator variable (1 = subject started the survey before the weight calculation for batch 11).
'batch11opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 12).
'batch11started'	public			Indicator variable (1 = subject started the survey before the weight calculation for batch 12).
'batch12opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 13).
'batch12started'	public			Indicator variable (1 = subject started the survey before the weight calculation for batch 13).
'batch13opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 14).
'batch13started'	public			Indicator variable (1 = subject started the survey before the weight calculation for batch 14).
'batch14opened'		public			Indicator variable (1 = subject opened the email before the weight calculation for batch 15).
'batch14started'	public			Indicator variable (1 = subject started the survey before the weight calculation for batch 15).
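
The 'message' code can be split back into the five treatment indicators. The sketch below is illustrative and not part of the replication package. It assumes that the letters P, A, U, D, and M correspond to 'personalized', 'authority', 'topurl', 'dataprotection', and 'mode', which is inferred from the variable names rather than stated in the data documentation. The function name decode_message is hypothetical.

```python
import re

# Assumed mapping from code letters to the treatment indicator variables.
LETTER_TO_VARIABLE = {
    "P": "personalized",
    "A": "authority",
    "U": "topurl",
    "D": "dataprotection",
    "M": "mode",
}


def decode_message(code):
    """Split a code such as 'P1A0U1D0M1' into a dict of 0/1 indicators."""
    pairs = re.findall(r"([A-Z])([01])", code)
    # Reject codes with characters that do not fit the letter-digit pattern.
    if "".join(letter + digit for letter, digit in pairs) != code:
        raise ValueError(f"Unexpected message code: {code!r}")
    return {LETTER_TO_VARIABLE[letter]: int(digit) for letter, digit in pairs}


print(decode_message("P1A0U1D0M1"))
```

Under the assumed mapping, comparing the decoded values with the indicator columns in the dataset is a simple consistency check.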



References:
Kemper, J. and Rostam-Afschar, D. (2024). Earning While Learning: How to Run Batched Bandit Experiments. Unpublished.
Scott, S. L. (2010). A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6):639–658.