Checklist for Web Experiments
(The following is not a comprehensive checklist for experimenters planning to run web-based procedures. It covers only some of the most important steps experimenters should take to ensure their procedure is functional and their analysis is sound. Many other considerations apply, depending on the particular method employed and the population targeted.)
Testing your procedure
- ❑ Create a Mechanical Turk worker account so you can test your procedure as a participant before launching publicly
- ❑ Set a custom qualification and assign it only to your own worker account
- ❑ Launch your experiment with the custom qualification assigned to your worker account, preventing other workers from viewing the HIT
- ❑ Log into your worker account and complete your HIT at least 7 times, each time using one of the following 7 configurations (browser + operating system; note that Safari is available only on macOS):
- Chrome + Mac
- Chrome + Windows
- Firefox + Mac
- Firefox + Windows
- Edge + Mac
- Edge + Windows
- Safari + Mac
- ❑ Check the URLs for each page in your procedure to ensure they do not contain any information that might reveal your experimental manipulation
- ❑ Check your logged data to make sure all activity is being tracked appropriately
- ❑ Have a few friends complete your survey and time the duration of their responses to each question type, so you have some sense of how long it *should* take the average participant to complete the procedure
Protecting yourself against low quality data
- ❑ Always include a neutral answer (e.g. “Choose here”) as the preselected or default response in form elements such as drop-down lists
- ❑ Test multiple response styles (e.g. radio buttons, sliders, drop-down lists) to determine whether response type may bias results
- ❑ Do not advertise eligibility requirements as part of the study sign-up process; instead, use qualification restrictions (such as country, number of studies completed, etc.) or a pre-survey to restrict eligibility (Siegel et al., 2015)
- ❑ Prevent retakes by assigning custom qualifications through the Mechanical Turk API, or using Qualtrics’s built-in system
- ❑ Collect IP addresses and location data, as these can be used to detect bot activity
- ❑ Use a transcription task at the start of your experiment that requires participants to transcribe a photograph of handwritten words
- ❑ Use a comprehension check with a non-obvious pattern (i.e. the correct answers to multiple-choice comprehension questions should not all occupy the top radio button in each choice list)
- ❑ Include a mandatory free-form response at the end of your survey (you can instruct participants simply to type “na” if they have no response/comments)
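The comprehension-check advice above can be made concrete with a small helper that shuffles answer positions when survey items are generated, then flags a draft whose correct answers all land in the same slot. This is an illustrative sketch; the questions and field names are invented:

```python
import random

def build_comprehension_item(question, correct, distractors, rng=random):
    """Return the question with its options shuffled, so the correct
    answer does not always occupy the same position across items."""
    options = [correct] + list(distractors)
    rng.shuffle(options)
    return {
        "question": question,
        "options": options,
        "correct_index": options.index(correct),
    }

items = [
    build_comprehension_item(
        "How many rounds does the task have?",
        correct="10",
        distractors=["5", "20", "As many as you like"],
    ),
    build_comprehension_item(
        "What should you do between rounds?",
        correct="Wait for the next prompt",
        distractors=["Refresh the page", "Close the tab"],
    ),
]

# Flag a draft whose correct answers all sit in the same slot --
# exactly the pattern a click-through participant (or bot) can exploit.
positions = {item["correct_index"] for item in items}
predictable = len(items) > 1 and len(positions) == 1
```

If your survey platform randomizes option order for you (Qualtrics can), prefer that built-in feature; the point is simply that correct answers must not follow a guessable pattern.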
Reducing the negative impact of attrition
- ❑ Ask personal demographic questions, and announce chances for additional financial incentives at the start of your procedure, as both of these measures have been shown to reduce dropouts (Reips, 2002a)
- ❑ Introduce a high-hurdle task by concentrating motivationally adverse factors toward the start of the procedure
- ❑ Ask participants to estimate the likelihood they will complete the whole experiment before they begin the procedure
- ❑ Use a prewarning with an appeal to conscience (e.g. “Our research depends on good quality data, and that quality is compromised whenever participants fail to complete the full procedure.”) (Zhou & Fishbach, 2016)
- ❑ Use practice trials, pilot new experimental protocols, or ask participants to complete a boring task prior to starting the focal experimental procedure
Analyzing your data
- ❑ Wait until all data have posted to your server (or deactivate your Qualtrics survey) before beginning your analysis; some platforms (e.g. jsPsych, Qualtrics) may not post unfinished responses to the server until up to one week after a participant begins responding, and you cannot measure attrition rates until these data have posted
- ❑ Check every free-form response you have collected, looking for suspicious patterns (e.g. one script I’ve detected in my data copies text from the surrounding page, mixes up the words, then pastes the result into the free-form response box)
- ❑ Check for repeated IP addresses and latitude/longitude coordinates, as significant numbers of such repeated information may be a sign of bot activity (Bai, 2018)
- ❑ Include browser and operating system as factors in your analysis, as system configurations may impact participants’ interaction with your procedure, and have been found to be correlated with important individual differences that may affect your results (Buchanan & Reips, 2001)
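A minimal sketch of the repeated-IP/coordinate check described above, assuming responses have already been exported as dictionaries with `ip` and `lat_long` fields (the field names and sample values are illustrative):

```python
from collections import Counter

def flag_repeats(records, keys=("ip", "lat_long"), threshold=2):
    """Return the records whose IP address or GPS coordinates appear
    `threshold` or more times in the data set -- a possible sign of
    bot activity or of many workers routed through one server."""
    counts = {k: Counter(r[k] for r in records) for k in keys}
    return [
        r for r in records
        if any(counts[k][r[k]] >= threshold for k in keys)
    ]

responses = [
    {"id": 1, "ip": "203.0.113.7",  "lat_long": (40.71, -74.00)},
    {"id": 2, "ip": "198.51.100.2", "lat_long": (40.71, -74.00)},
    {"id": 3, "ip": "203.0.113.7",  "lat_long": (34.05, -118.24)},
    {"id": 4, "ip": "192.0.2.55",   "lat_long": (51.51, -0.13)},
]

# ids 1 and 3 share an IP; ids 1 and 2 share coordinates,
# so records 1, 2, and 3 are flagged for manual review.
suspicious = flag_repeats(responses)
```

Flagged records warrant inspection rather than automatic exclusion: shared IPs can also reflect legitimate workers behind one network (see Dennis, Goodson, & Pearson, 2018, on VPS use).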
References and Additional Resources
Arechar, A. A., Kraft-Todd, G. T., & Rand, D. G. (2017). Turking overtime: how participant characteristics and behavior vary over time and day on Amazon Mechanical Turk. Journal of the Economic Science Association, 3(1), 1-11.
Bai, H. (2018). Evidence that A Large Amount of Low Quality Responses on MTurk Can Be Detected with Repeated GPS Coordinates. Retrieved from: https://www.maxhuibai.com/blog/evidence-that-responses-from-repeating-gps-are-random
Behrend, T. S., Sharek, D. J., Meade, A. W., & Wiebe, E. N. (2011). The viability of crowdsourcing for survey research. Behavior Research Methods, 43(3), 800.
Buchanan, T., & Reips, U.-D. (2001). Platform-dependent biases in Online Research: Do Mac users really think different? In K. J. Jonas, P. Breuer, B. Schauenburg, & M. Boos (Eds.), Perspectives on Internet Research: Concepts and Methods.
Casey, L. S., Chandler, J., Levine, A. S., Proctor, A., & Strolovitch, D. Z. (2017). Intertemporal differences among MTurk workers: Time-based sample variations and implications for online data collection. SAGE Open, 7(2), 2158244017712774.
Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46, 112–130.
Chandler, J. J., & Paolacci, G. (2017). Lie for a dime: When most prescreening responses are honest but most study participants are impostors. Social Psychological and Personality Science, 8(5), 500-508.
Dennis, S. A., Goodson, B. M., & Pearson, C. (2018). MTurk workers’ use of low-cost ‘virtual private servers’ to circumvent screening methods: A research note. Available at SSRN: https://ssrn.com/abstract=3233954
Krantz, J. H., & Reips, U. D. (2017). The state of web-based research: A survey and call for inclusion in curricula. Behavior Research Methods, 49(5), 1621-1629.
Necka, E. A., Cacioppo, S., Norman, G. J., & Cacioppo, J. T. (2016). Measuring the prevalence of problematic respondent behaviors among MTurk, campus, and community participants. PLoS ONE, 11(6): e0157732.
Plant, R. (2016). A reminder on millisecond timing accuracy and potential replication failure in computer-based psychology experiments: An open letter. Behavior Research Methods, 48(1), 408-411.
Reips, U. D. (2002a). Standards for Internet-based experimenting. Experimental psychology, 49(4), 243.
Reips, U. D. (2002b). Internet-based psychological experimenting: Five dos and five don’ts. Social Science Computer Review, 20 (3), 241-249.
Reips, U. D. (2010). Design and formatting in Internet-based research. Advanced methods for conducting online behavioral research, 29-43.
Sharpe Wessling, K., Huber, J., & Netzer, O. (2017). MTurk character misrepresentation: Assessment and solutions. Journal of Consumer Research, 44(1), 211-230.
Siegel, J. T., Navarro, M. A., & Thomson, A. L. (2015). The impact of overtly listing eligibility requirements on MTurk: An investigation involving organ donation, recruitment scripts, and feelings of elevation. Social Science & Medicine, 142, 256-260.
Zhou, H., & Fishbach, A. (2016). The pitfall of experimenting on the web: How unattended selective attrition leads to surprising (yet false) research conclusions. Journal of Personality and Social Psychology, 111(4), 493.