Challenges with Locality and State in ExPL Annotations, Part I
ExPL, the experiment specification language of Helical, relies on user-provided annotations to infer an implicit post-interventional causal structure and query set. Annotations are attached to ExPL expressions and provide a link to the program variables in HyPL. Because ExPL encodes stateful executions, we need to be able to reason about the scope over which stateful operations apply. In this two-part blog post we'll talk about some challenges associated with using these annotations.
Preliminaries: ExPL as IMP
Let's start by giving an abstract grammar for ExPL. I wanted ExPL to feel like the languages that people typically use to code up their experiments.1 To this end, the language can be thought of as a variation on IMP:
Boolean Expression bexp ::= true | false | id | not <bexp>
| <bexp> and <bexp> | <bexp> or <bexp>
| <aexp> = <aexp> | <aexp> < <aexp>
Arithmetic Expression aexp ::= n | id
| <aexp> + <aexp> | <aexp> - <aexp>
| <aexp> * <aexp> | <aexp> / <aexp>
Statement stmt ::= id = <aexp>
| <stmt>; <stmt>
| if <bexp> then <stmt> else <stmt> fi
| while <bexp> do <stmt> done
Program prgm ::= <stmt>
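To make the grammar concrete, here is a hypothetical sketch (not Helical's actual implementation) of the arithmetic fragment as Python dataclasses, with a toy big-step evaluator; all class and function names are my own:

```python
# Sketch of ExPL's IMP-like arithmetic expressions as an AST, plus a
# small evaluator. Illustrative only; names are not from Helical.
from dataclasses import dataclass
from typing import Dict, Union

Env = Dict[str, int]  # variable name -> value

@dataclass
class Num:
    n: int

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str            # one of "+", "-", "*", "/"
    left: "Aexp"
    right: "Aexp"

Aexp = Union[Num, Var, BinOp]

def eval_aexp(e: Aexp, env: Env) -> int:
    """Big-step evaluation of an arithmetic expression under an environment."""
    if isinstance(e, Num):
        return e.n
    if isinstance(e, Var):
        return env[e.name]
    l, r = eval_aexp(e.left, env), eval_aexp(e.right, env)
    return {"+": l + r, "-": l - r, "*": l * r, "/": l // r}[e.op]

# (prog_input * 2) with prog_input bound to 8
print(eval_aexp(BinOp("*", Var("prog_input"), Num(2)), {"prog_input": 8}))  # 16
```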
ExPL's annotations connect the values of various ExPL expressions with their corresponding high level variables, as defined in the associated HyPL program. We can add annotations anywhere we use expressions. Since in our evaluation semantics we will discard annotations, we can just add an arm to the expression productions to capture annotated expressions:
Annotation annt ::= @(observes id)
| @(controls id)
| @(intervenes id)
| <annt> <annt>
Boolean Expression bexp ::= <bexp> <annt> | ...
Arithmetic Expression aexp ::= <aexp> <annt> | ...
Statement stmt ::= ...
Program prgm ::= ...
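Since the evaluation semantics discard annotations, stripping them can be a simple unwrapping pass over the new annotated-expression arm. A hedged sketch in Python, with illustrative names not taken from Helical:

```python
# Sketch: annotated expressions as a wrapper node, and the pass that
# discards annotations before evaluation. Names are my own invention.
from dataclasses import dataclass
from typing import Union

@dataclass
class Var:
    name: str

@dataclass
class Annotated:
    expr: "Expr"
    kind: str      # "observes" | "controls" | "intervenes"
    target: str    # the HyPL variable id the annotation links to

Expr = Union[Var, Annotated]

def strip_annotations(e: Expr) -> Expr:
    """Peel off any stack of annotations, returning the bare expression."""
    while isinstance(e, Annotated):
        e = e.expr
    return e

# prog_input @(intervenes X) @(observes Y), then stripped for evaluation
e = Annotated(Annotated(Var("prog_input"), "intervenes", "X"), "observes", "Y")
print(strip_annotations(e))
```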
We also need to capture the idea that ExPL programs execute effectful foreign programs. We do this with a run (or exec) statement that has positional and/or keyword arguments, which are expressions, and whose outputs are annotated:
Annotation annt ::= ...
Boolean Expression bexp ::= ...
Arithmetic Expression aexp ::= ...
Statement stmt ::= run id <parg> <karg> @> <annt>
Positional Arguments parg ::= <aexp> | <bexp> | <parg> <parg>
Keyword Arguments karg ::= id = <aexp> | id = <bexp> | <karg> <karg>
Program prgm ::= ...
Minimal syntactic forms from IMP needed for ExPL
ExPL, by design, isn't a Turing-complete language. There are two major reasons for this: ExPL programs must terminate and ExPL programs must meaningfully test causal or associational relations.
IMP is a Turing-complete language; these two requirements for ExPL programs each impact an arm of IMP's statement productions.
While-loops and termination
ExPL is a specification language, so there are actually two termination states we would like to guarantee: the termination of the compilation from ExPL to the target language (e.g., bash) and the termination of the resulting compiled program (modulo foreign calls).2 Our goal is to have no loops in the compiled program. For example, we would like to be able to compile this program:
prog_input = 1;
while prog_input <= 16
do
run testprog prog_input @(intervenes X) @> @(observes Y);
prog_input = prog_input * 2
done
into something like the following unrolled sequence of calls (note that the ordering is arbitrary):
/path/to/testprog 4 | /path/to/parsing/prog
/path/to/testprog 1 | /path/to/parsing/prog
/path/to/testprog 2 | /path/to/parsing/prog
/path/to/testprog 16 | /path/to/parsing/prog
/path/to/testprog 8 | /path/to/parsing/prog
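The unrolled listing above can be produced by interpreting the bounded loop at compile time and emitting one shell pipeline per iteration. A minimal sketch, reusing the placeholder paths from the example:

```python
# Sketch of loop unrolling during compilation: the loop is fully
# evaluated at compile time, so no loop survives into the emitted bash.
from typing import List

def compile_example() -> List[str]:
    lines = []
    prog_input = 1
    while prog_input <= 16:          # evaluated by the compiler, not at runtime
        lines.append(f"/path/to/testprog {prog_input} | /path/to/parsing/prog")
        prog_input *= 2
    return lines

for line in compile_example():
    print(line)
```

Because the iterations are independent, the compiled lines could also be emitted or scheduled in any order, which is what the scrambled listing above is gesturing at.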
However, the following would not be a valid or meaningful ExPL program:
prog_input = 1;
while true
do
run testprog prog_input @(intervenes X) @> @(observes Y);
prog_input = prog_input * 2
done
That is, we want to restrict ourselves to bounded loops. Therefore, we need to modify the while-loop arm of the statement production. The syntactic form we will use looks like so:
stmt ::= for id <annt> in ??? <annt> do <stmt> done
One way to re-interpret our desire to have bounded loops is that what we actually want are iterators, or at least the statement version of an iterator expression. In fact, iteration gives us the abstraction necessary to produce the non-deterministic ordering of the output implied by the unrolled compiled program above.
To ensure that our iteration is finite, we simply restrict the iterable expression to be finite:
Annotation annt ::= ...
Boolean Expression bexp ::= ...
Arithmetic Expression aexp ::= ...
Iterable Expression iexp ::= [] | id
| [<aexp>, ...] | [<bexp>, ...] | [<iexp>, ...]
| <iexp> <annt>
Statement stmt ::= ...
| for id in <iexp> do <stmt> done
Positional Arguments parg ::= ...
Keyword Arguments karg ::= ...
Program prgm ::= ...
This modification removes our ability to loop indefinitely during the loop unrolling of compilation.
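One way to picture the unrolling: when the iterable is a literal list, each iteration expands by substituting the loop variable into the body, and finiteness of the list guarantees termination. A hypothetical sketch (the token-wise textual substitution is my own simplification, not how Helical compiles):

```python
# Sketch: unrolling `for id in <iexp> do <stmt> done` when the iterable
# is a literal list. The expansion terminates because the list is finite.
from typing import List

def unroll_for(var: str, iterable: list, body: List[str]) -> List[str]:
    """Expand the loop body once per element, substituting the loop variable.

    Substitution here is naive token-wise replacement, purely for illustration.
    """
    out = []
    for value in iterable:
        out.extend(
            " ".join(repr(value) if tok == var else tok for tok in stmt.split())
            for stmt in body
        )
    return out

print(unroll_for("x", [1, 2, 4], ["run testprog x"]))
```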
If-statements and causal structures
One of the things I noticed when working on PlanAlyzer was that if-statements in programming languages can map to two distinct ideas in causal modeling: conditional distributions and causal-mechanistic dependence. For example, the program below describes sampling from the joint assignment distribution for (flip, banner_color):
flip = bernoulliTrial(p=0.6, unit=userid)
if (flip) {
banner_color = uniformChoice(["red", "green", "blue"], unit=userid)
} else {
banner_color = weightedChoice(["red", "green", "blue"], weights=[1, 2, 4], unit=userid)
}
With respect to the marginal distribution of banner_color, the above program is equivalent to the following:
flip = bernoulliTrial(p=0.6, unit=userid)
banner_color = weightedChoice(["red", "green", "blue"], weights=[9, 11, 15], unit=userid)
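We can check the flattening arithmetic by mixing the two branch distributions with P(flip) = 0.6; the resulting marginal is proportional to 9 : 11 : 15:

```python
# Verifying the flattened weights [9, 11, 15] with exact rational arithmetic.
from fractions import Fraction as F

p_flip = F(3, 5)                          # bernoulliTrial(p=0.6)
uniform = [F(1, 3)] * 3                   # uniformChoice over 3 colors
weighted = [F(w, 7) for w in (1, 2, 4)]   # weightedChoice weights [1, 2, 4]

# Mixture: P(color) = P(flip) * P(color | flip) + P(!flip) * P(color | !flip)
marginal = [p_flip * u + (1 - p_flip) * w for u, w in zip(uniform, weighted)]
print(marginal)  # Fractions 9/35, 11/35, 15/35, i.e. weights [9, 11, 15]
```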
However, if we instead have the program:
flip = bernoulliTrial(p=0.6, unit=userid)
if (flip) {
banner_color = "red"
} else {
banner_color = uniformChoice(["green", "blue"], unit=userid)
}
then banner_color is mechanistically determined in certain circumstances by the value of flip. We cannot flatten the program as we did before.
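Enumerating the second program's joint distribution exactly makes the mechanistic coupling visible: observing banner_color = "red" pins down flip with certainty, a dependence that no flattened single-assignment version could reproduce:

```python
# Exact enumeration of the joint distribution (flip, banner_color) for the
# second program. The then-branch is deterministic: flip = true forces red.
from fractions import Fraction as F

joint = {}  # (flip, banner_color) -> probability
joint[(True, "red")] = F(3, 5)                   # then-branch: banner_color = "red"
for color in ("green", "blue"):
    joint[(False, color)] = F(2, 5) * F(1, 2)    # else-branch: uniform over 2

p_red = sum(p for (f, c), p in joint.items() if c == "red")
p_flip_given_red = joint[(True, "red")] / p_red
print(p_flip_given_red)  # red occurs only via flip = true
```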
There are additional oddities we see in PlanOut that we do not want in ExPL. For example, now let's consider an assignment distribution for (flip, banner_color, background_color), written in PlanOut with the form:
flip = bernoulliTrial(p=0.6, unit=userid)
if (flip) {
banner_color = uniformChoice(["red", "green", "blue"], unit=userid)
# insert code here to do some stuff that sets the background_color
# deterministically on the basis of banner_color
} else {
background_color = weightedChoice(["red", "green", "blue"], weights=[1, 2, 4], unit=userid)
# insert code here to do some stuff that sets the banner_color
# deterministically on the basis of background_color
}
Now the direction of the causal arrow, insofar as it exists, is reversed in each branch. I say "insofar as it exists" because the rewrite we did earlier to produce a flattened program makes it clear that control flow is an imperfect proxy for causal dependence. Things get even weirder when we allow programs such as:
flip = bernoulliTrial(p=0.6, unit=userid)
if (flip) {
banner_color = uniformChoice(["red", "green", "blue"], unit=userid)
} else {
background_color = weightedChoice(["red", "green", "blue"], weights=[1, 2, 4], unit=userid)
}
This program effectively describes two disjoint experiments.
ExPL has different affordances from PlanOut — for example, ExPL does not include specific assignment probabilities; this information, if needed, would be included in a configuration file. However, including conditionals could lead to similar problems. Therefore, I excluded all branching from ExPL proper. I'll leave further discussion of the implications of this design choice to a later blog post. For now, it simply means that we remove the conditional arm of the statement production.
Now that we have modified IMP to represent ExPL — adding annotations, adding a run/exec statement, removing branching, and removing unbounded loops — we can proceed to reason about two interpretation functions: \(\mathcal{G}[[\cdot]]\) and \(\mathcal{Q}[[\cdot]]\), which we will discuss in part 2 of this blog post.
1. Aside: What languages do people "typically use" to encode their experiments? What follows isn't meant to be an answer to this question, but was a starting point/justification for the approach I would take. A few years ago I searched the ACM DL for SIGPLAN papers published between 2018 and 2020 that had artifact badges and used the word "benchmarks" somewhere. I ordered by citation count and then selected the top 50 papers. The represented conferences included ASPLOS, PPoPP, LCTES, CGO, PLDI, and CPP. Forty of these papers reported running experiments. I then recorded information about the type of experiment (intervention, observation, existential), whether there was a baseline comparison, and the language(s)/platform(s)/tool(s) used to encode experimentation logic. The list of technologies I observed for the latter task in this small set included: Python, bash, Jupyter, C++, make, and C. These technologies are what I would expect. I would have also expected to see some slurm or qsub scripts, but it's possible that only scripts for local and sequential evaluation were provided as part of the artifact. ↩
2. While I'm not a compilers person, I have read the occasional compilers paper and written plenty of program->program programs. I don't recall termination of the compiler itself ever being raised as an issue in an academic paper I've read. In my own experience writing translators or compilers, I have written rewrite rules that led to cycles, but those were all easy to diagnose and fix. I generally have the sense that termination of the compiler is not an interesting problem. ↩