PlanAlyzer: Assessing Threats to the Validity of Online Experiments

Abstract

Online experiments have become a ubiquitous aspect of design and engineering processes within Internet firms. As the scale of experiments has grown, so has the complexity of their design and implementation. In response, firms have developed software frameworks for designing and deploying online experiments. Ensuring that experiments in these frameworks are correctly designed and that their results are trustworthy—referred to as internal validity—can be difficult. Currently, verifying internal validity requires manual inspection by someone with substantial expertise in experimental design.

We present the first approach for statically checking the internal validity of online experiments. Our checks are based on well-known problems that arise in experimental design and causal inference. Our analyses target PlanOut, a widely deployed, open-source experimentation framework that uses a domain-specific language to specify and run complex experiments. We have built a tool called PlanAlyzer that checks PlanOut programs for a variety of threats to internal validity, including failures of randomization, treatment assignment, and causal sufficiency. PlanAlyzer uses its analyses to automatically generate contrasts, a key type of information required to perform valid statistical analyses over the results of these experiments. We demonstrate PlanAlyzer’s utility on a corpus of PlanOut scripts deployed in production at Facebook, and we evaluate its ability to identify threats to validity on a mutated subset of this corpus. PlanAlyzer has both precision and recall of 92% on the mutated corpus, and 82% of the contrasts it generates match hand-specified data.

Tosch's primary interests are in programming language (PL) applications to data collection and analysis for social science domains. Her early work on SurveyMan, a language and framework for designing, debugging, and deploying surveys, won first place in the 2014 ACM Student Research Competition at PLDI, a best paper award at OOPSLA 2014, and a 2015 Outstanding Synthesis Award in the College of Computer and Information Sciences at the University of Massachusetts. Her recent work on PlanAlyzer, a static analysis tool for programmatically defined experiments, was recognized as a SIGPLAN Research Highlight in 2020 and as a Research Highlight in the September 2021 issue of Communications of the ACM.

Emma Tosch earned her B.A. in English Literature from Wellesley College in 2008 before working at a healthcare IT startup. She obtained a post-baccalaureate certificate and an M.A. in Computer Science from Brandeis University in 2011 before earning her Ph.D. from the University of Massachusetts Amherst in 2020.