POPL 2015 Artifact Evaluation Revisited
Just over a decade ago POPL ran its first artifact evaluation and I was on it! I'm listed in the proceedings as "Emma F. Tosch" as an inside joke with Arjun, who ran the committee. I have what only I think is a Very Interesting Tale of reviewing, but that's not what this post is about. Instead, this post is about tracking down those old artifacts with just a pinch of oral history.
This post is the first in what I think might be a few where I try to re-run artifact evaluation from older conferences. It will read more like a lab notebook/diary than a blog post with a beginning, middle, and end.
Artifact evaluation today vs. yesteryear
Anyone who has submitted an artifact for review recently has of course had the benefit of over a decade of process debugging. Back then, artifact evaluation really was just a review process, but for (mostly) code. The purpose was largely to validate that a paper's claims — that an artifact existed, and broad strokes such as a particular process being automated — were actually reflected in reality. If you had an implementation — especially if you had one that you wanted to demo or have other people use — you just included a link to the relevant resource in your paper. Thus, the artifact review process was entirely separate from the archiving and distribution of research artifacts. You could choose to upload your files to the ACM Digital Library's supplementary material server to make them discoverable from the DL, but there was no expectation to do so.
I'd also note that back then we only had one badge, which indicated whether your paper passed evaluation or not. Today there are five badges covering functionality, reusability, availability, and validated results (reproduced and replicated). Back then the point was just to say that the stuff you claimed existed and "worked" actually existed and mostly worked.
Revisiting yesteryear
Given this context, re-evaluating some of these older artifacts is harder than you might think. At a high level, I'd note that I was unable to find any references or pointers to artifacts or the artifact evaluation website in the proceedings. I did find the URL for the old website (it's still indexed!), but it's no longer live, since it was hosted on a university subdomain and the faculty member responsible for it has since moved to a different institution.
Hunting for artifacts, case-by-case
Let's start by taking a look at the most cited papers from POPL 2015 and see if we can find some artifacts!
The most cited paper (by a lot!) is the JSNice paper. While the DL only contains the talk in the supplementary material, the paper contains a link to the demo. The link (http://jsnice.org) actually still works! I ran it on an example and everything seems to still work. Cool! Unfortunately, there doesn't seem to be a link to the source code in the paper, nor on the website. Not as cool! I found a GitHub repo by someone who is neither an author nor listed in any acknowledgements (this paper has no acknowledgements); it has not been maintained since 2020. I didn't find much else.
The next most cited paper from POPL 2015 is Iris: Monoids and Invariants as an Orthogonal Basis for Concurrent Reasoning, which apparently won a test of time award this year. Cool! Again, the only supplementary material is a video of the talk. It has an artifact badge on the first page, so the authors definitely submitted for review. However, I can't find any references to artifacts in the paper or external links. I tried searching for the paper title and the word "artifact". I found the press release for the award, through which I found the Iris website. I also found the 2015 talk, which mentions mechanization in Rocq(Coq) on slide 24. It turns out that the Iris website actually includes the old mechanization (marked as historic)! Cool, let's dive in!
Executing ten-year-old proofs
So the first thing I did, of course, was naively just execute the Makefile without reading anything. I like to live dangerously like that. In doing so, I got the error:
Error: Cannot find a physical path bound to logical path Ssreflect.ssreflect.
Welp. IIRC, SSreflect is part of the Rocq(Coq) standard library, so the next thing I did was check whether I have Rocq installed on this machine...and I do not!
Okay, so before installing the latest Rocq — which has definitely changed in the last decade! — I actually looked at the README to see if they listed a version. They do not! Okay then, let's go.
sudo apt install coq
Although the official name of this proof assistant is now Rocq, I assumed that the rename might cause problems, so I tried installing it under the old name. Here's what I got:
The Coq Proof Assistant, version 8.18.0
compiled with OCaml 5.2.0
make -j
I executed this command from iris-20151106/coq-ho and am still getting the SSreflect error. Upon closer inspection, I'm seeing the following error:
"coqc" -q -R "." _ -I "." lib/ModuRes/Util
Warning: File "./lib/ModuRes/Util" has been implicitly expanded to
"./lib/ModuRes/Util.v" [file-no-extension,filesystem,default]
File "./lib/ModuRes/Util.v", line 1, characters 0-35:
Error: Cannot find a physical path bound to logical path Ssreflect.ssreflect.
make: *** [Makefile:185: lib/ModuRes/Util.vo] Error 1
Okay, so maybe I didn't adequately watch my doors and corners on this one. Let's try installing Rocq The Right Way: via the rocq-prover platform. Hang tight.
Getting and installing the platform
wget https://github.com/coq/platform/archive/refs/tags/2025.01.0.zip
unzip 2025.01.0.zip
cd platform-2025.01.0
./coq_platform_make.sh
Now, when running the install script I have the option of selecting a Coq version from as far back as 2021! Let's just go ahead and select the most recent version...
[ERROR] Package dune-configurator has no version 3.16.1.
Well well well. There's a Stack Overflow post from a few months ago that discusses this error. Since I don't strictly require this version of opam, I'm going to try installing the second newest version, this time from 2024. Tellingly, it took much longer to install (and then fail!) than the more recent version.1
After running for over an hour, I got the following output:
### output ###
# Package 'gtk+-3.0' was not found
Coooool. So it turns out I have gtk4 installed. According to the AI summary that Google is forcing upon me as its top hit, one can have gtk3 and gtk4 installed simultaneously. I'm a little frustrated that the platform install script didn't check this dependency before tying up my computer for over an hour, but it is what it is. I went ahead and installed libgtk-3-dev and tried again. After over an hour of running through the installation process again, I got a new error!
# Package 'gtksourceview-3.0', required by 'virtual:world', not found
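As an aside, the kind of up-front dependency check I wished for above is only a few lines of shell. A hedged sketch — the package names are taken from the two error messages above, and the check_pkg helper is hypothetical, not something the platform actually provides:

```shell
# Hypothetical pre-flight check: look for the system packages the platform
# build later fails on, before tying up the machine for an hour.
# Package names come from the two pkg-config errors above.
check_pkg() {
  if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists "$1"; then
    echo "found: $1"
  else
    echo "missing: $1"
  fi
}

for pkg in gtk+-3.0 gtksourceview-3.0; do
  check_pkg "$pkg"
done
```

Running this before the install script would surface a missing gtk3 or gtksourceview immediately instead of an hour in.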
Rather than chase down this dependency, I next tried something different: a minimal installation (no fancy crypto or univalent libraries, which take a looooong time to install; no IDEs, which I'm guessing are causing the GTK issues). This took significantly less time and appeared to terminate successfully. Yay!! I switched to the newly installed opam switch and enabled it (i.e., opam switch CP.2025.01.0~8.19~2024.10; eval $(opam env)). Time to try running the Iris proof code!
Aside: The January 2025 release is the latest stable release of the Rocq platform, and it is still unearthing issues. I'd note that any analysis of the human factors regarding ease of reproducibility of historic artifacts would need to use comparable current projects as a baseline: the above error concerns only installing the most recent Rocq platform and is unrelated to the artifact.
make -j part 2
Does it work? Definitely not. A quick search for this issue reveals that all the recent responses involve a _CoqProject file, which I am not seeing in this source code. Instead, the Makefile refers to a COQBIN environment variable, which is definitely not set in my current shell.
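For reference, here is roughly what a _CoqProject for this tree might look like — a hypothetical sketch, with the flags copied from the coqc invocations visible in the build output above, and the file list truncated to the one file the build actually reached:

```shell
# Hypothetical _CoqProject for this artifact, mirroring the flags the
# Makefile already passes to coqc (-R "." _ and -I ".").
cat > _CoqProject <<'EOF'
-R . _
-I .
lib/ModuRes/Util.v
EOF
```

Modern Coq/Rocq tooling (coq_makefile, most IDEs) reads this file to resolve logical paths, which is presumably why recent answers to this error all assume one exists.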
The Makefile calls coqtop, which exists on my system. I decided to try to import the first two libraries interactively:
Coq < Require Import Ssreflect.ssreflect.
Toplevel input, characters 0-35:
> Require Import Ssreflect.ssreflect.
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Error: Cannot find a physical path bound to logical path Ssreflect.ssreflect.
Coq < Require Import Setoid SetoidClass.
[Loading ML file extraction_plugin.cmxs (using legacy method) ... done]
Interesting! While importing Ssreflect fails, importing Setoid does not. This suggests a problem with the Ssreflect module specifically, rather than with imports generally. My current working hypothesis is that Ssreflect has been moved or renamed. I found the ssreflect chapter of the Rocq manual, and it looks like the rename/movement hypothesis has some support:
Coq < From Coq Require Import ssreflect.
[Loading ML file ssrmatching_plugin.cmxs (using legacy method) ... done]
[Loading ML file ssreflect_plugin.cmxs (using legacy method) ... done]
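Rather than patching files one at a time, the rename can be applied across the whole tree mechanically. A hedged sketch, assuming GNU sed (and that you back up the tree first):

```shell
# Hypothetical bulk fix: rewrite the decade-old Ssreflect import into the
# form the current standard library expects, in every .v file in the tree.
find . -name '*.v' -exec \
  sed -i 's/Require Import Ssreflect\.ssreflect/From Coq Require Import ssreflect/' {} +
```

This only handles the one import form seen above; other Ssreflect-prefixed imports in the tree would need their own rewrites.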
Re-running make -j with an edit to the ssreflect import statement, we now get through all the import statements, but fail on the last line:
"coqdep" -c -R "." _ -I "." "lib/ModuRes/Util.v" > "lib/ModuRes/Util.v.d" || ( RV=$?; rm -f "lib/ModuRes/Util.v.d"; exit ${RV} )
Warning: Unknown option "-c". [unknown-option,default]
"coqc" -q -R "." _ -I "." lib/ModuRes/Util
Warning: File "./lib/ModuRes/Util" has been implicitly expanded to
"./lib/ModuRes/Util.v" [file-no-extension,filesystem,default]
File "./lib/ModuRes/Util.v", line 73, characters 48-60:
Error: The reference Wf_nat.lt_wf was not found in the current environment.
make: *** [Makefile:185: lib/ModuRes/Util.vo] Error 1
Once again, I'm sure the issue is that a module has been moved around. I'm going to stop my efforts to re-execute the proofs here.
Some thoughts
There's a reason Iris received Test of Time recognition — seeing the long list of work built on this foundation is impressive! It's equally impressive that the research staff maintaining the Iris website have kept this historic progression of work available.
Because the Iris team continued this line of work, the fact that we cannot run the old scripts with newer Rocq versions is more of an interesting fact than something indicative of how we do research. The research team presumably iterated upon and expanded their proof scripts. There is new Rocq code available on the Iris website from as recently as this year!
However, what would have happened to all of this work had the core team lost momentum? Imagine what it would be like for there to be a fallow period, after which a student tries to pick up this work. Connecting past work to current work appears quite laborious, especially for someone outside the core research team.2 There are both philosophical and software engineering problems here: in order to build upon the promises of e.g. POPLMark, we either need backwards compatibility in theorem proving systems, or we need these programs to alert the user that they are attempting to execute a deprecated version of the code. The Iris team helpfully marked this implementation as "historic," but it's a little silly to rely on informal annotations for an automated system!
There's no doubt that a containerized distribution of the original proof scripts would have made this process easier to repeat. That said, repeatability is a distinct goal from reproducibility. A major unanswered question that arises when using containers for reproducibility is whether a failure to reproduce results using a non-containerized installation procedure is due to a fundamental change in the underlying system (e.g., correcting an error) or due to unrelated factors (e.g., library or module name changes).
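To make that point concrete: a containerized distribution could have pinned the whole toolchain in a handful of lines. A hypothetical sketch — the coqorg/coq Docker image family is real, but the specific tag (and whether it matches whatever Coq version the artifact actually targeted, which the README does not record) is an assumption:

```dockerfile
# Hypothetical Dockerfile for the 2015 artifact. The tag below is an
# assumption; the artifact's README does not record a Coq version.
FROM coqorg/coq:8.4pl6
COPY . /artifact
WORKDIR /artifact
RUN make -j
```

With something like this checked in alongside the proofs, the decade-later question would be "does Docker still run this image?" rather than "which of ten years of stdlib renames broke the build?"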
-
I'd note that I initially tried to install the 2025 platform while commuting. This is generally not a great idea, since I have to tether (my train has no Wi-Fi) and my route passes through at least two cellular dead zones. ↩
-
It's possible that I was one of the reviewers for this artifact a decade ago (I remember reviewing some very intense Coq scripts, but I have no idea if this was the paper; this was, after all, a POPL paper!); while I am not an active Rocq user, I am familiar enough with this tool to be able to get things running. I now have decades of debugging experience; even if a new graduate student has more knowledge of the current Rocq ecosystem, I'm guessing it would take them at least as much time as it took me to work through the issues. ↩