Talks
- A Practical Guide to Applying Data-Oriented Design: This was a really nice talk – I’d heard of data-oriented design before, but this talk gave a lot of really concrete examples of exactly what that means. Some of it was stuff I was familiar with (e.g. struct-of-arrays instead of array-of-structs), but the part on data encodings (e.g. storing things in two arrays instead of adding a boolean field) was new and interesting (see the sketch after this list).
- CHERI: A Modern Capability Architecture – this was a very nice introduction to CHERI, an experimental architecture that tries to build memory safety in at the hardware level. I was vaguely aware of what CHERI was, but it was nice to get very concrete details about what specifically it does and how you can use it to enforce safety guarantees. Highly recommended!
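To make the encoding idea concrete, here’s a tiny sketch of both tricks (my own illustration in Python, not an example from the talk):

```python
# Array-of-structs: one record per entity, fields interleaved in memory.
particles_aos = [
    {"x": 1.0, "y": 2.0, "alive": True},
    {"x": 3.0, "y": 4.0, "alive": False},
]

# Struct-of-arrays: one array per field, so a pass that only touches
# x-coordinates streams through contiguous data.
xs = [1.0, 3.0]
ys = [2.0, 4.0]
alive = [True, False]

# The data-encoding trick: instead of storing an `alive` boolean per
# element, keep live and dead particles in separate arrays. Membership
# in an array *is* the boolean, and iterating over live particles no
# longer branches on a flag.
live_xs, live_ys = [1.0], [2.0]
dead_xs, dead_ys = [3.0], [4.0]
```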
Articles
- Sharing for a Lifetime – this was an article about how you define a formal semantics for Rust. I think the key idea (that types get to decide for themselves what the shared & invariant means) was super interesting. I haven’t done enough Rust to really run into this in practice (I’ve only written “toy” Rust programs that haven’t needed much interior mutability so far), but I’m sure it’ll help me in the future!
- How They Made It: Bread – this was a nice overview of historical agriculture: what was involved, who actually did it, and the social relations (and power structures) that created it. A great complement to some of the papers below.
Papers
A Series on Perennial Grains
After somehow running across the Land Institute, I ended up falling down a strange rabbit hole investigating perennial grains. Apparently, all the staple crops we commonly rely on (e.g. wheat, maize, rice, potatoes) are annual plants that need to be replanted every year. This replanting is both expensive (requiring huge amounts of fertilizer and pesticide) and environmentally destructive (e.g. creating huge amounts of soil runoff, to the extent that a third of US agricultural land has entirely lost its topsoil!). The Land Institute’s hope is that perennial grains can avoid these problems while being even more productive (since they don’t have to keep regrowing the same structures each year).
I read the following four papers about this overall vision:
- Perennials as Future Grain Crops: Opportunities and Challenges
- Perennial Grains for Africa: Possibility or Pipe Dream?
- The Strong Perennial Vision: A Critical Review
- The Strong Perennial Vision: A Response
I think the basic TLDR is that everyone agrees the problems with modern agriculture are very real, but that (1) making perennial grains economically viable is a serious challenge (apparently, the naive approach yields only 10-20% of what annual grains do), and (2) it’s unclear whether this is the best way to solve the problem. Smaje’s the only one who thinks that perennial grains might be outright impossible, but personally I didn’t find their argument very compelling.
I also tried to dig into what modern agricultural breeding research actually looks like and took a look at two papers:
- Speed breeding is a powerful tool to accelerate crop research and breeding – this is basically a protocol for growing plants faster so you can get more generations of breeding per year (up to 6 generations a year). That’s pretty cool, but TBH even 6 generations a year seems painfully slow to my computer science sensibilities (I get frustrated at work when experiments take more than a couple of days to run!).
- Roadmap for Accelerated Domestication of an Emerging Perennial Grain Crop – this identifies some useful traits of domesticated grains and associated genetic markers that can be specifically targeted for faster domestication. The details went over my head, but the most striking thing was how little we actually understood about the genetics involved.
Overall, I thought this was a really cool rabbit hole, and a good forcing function to review a bunch of basic botany and agriculture knowledge.
Metastable failures in distributed systems
https://dl.acm.org/doi/10.1145/3458336.3465286
This was a really neat paper – it defines a new category of “metastable” failures: failures that don’t self-heal because some feedback loop sustains the failure over time. A retry storm is a good example: if a service gets overloaded enough that requests start to fail (and thus get retried by clients), those retried requests keep the service overloaded, which causes even more retries later.
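To make the feedback loop concrete, here’s a toy simulation I sketched (all numbers invented; none of this is from the paper) showing how a big enough spike tips the system into a state that retries alone sustain:

```python
# A toy model of a retry storm (all numbers invented for illustration).
CAPACITY = 100   # useful requests/tick when healthy
BASE_LOAD = 80   # steady offered load, comfortably under capacity

def effective_capacity(offered):
    # Under overload, queueing and timed-out work eat into useful
    # throughput -- this degradation is the sustaining feedback loop.
    if offered <= CAPACITY:
        return CAPACITY
    return CAPACITY * CAPACITY // offered

def simulate(spike_load, ticks=10):
    retries = 0
    for t in range(ticks):
        fresh = spike_load if t == 0 else BASE_LOAD  # one-tick load spike
        offered = fresh + retries
        failed = max(0, offered - effective_capacity(offered))
        retries = failed  # every failed request is retried next tick
        print(f"tick {t}: offered={offered} failed={failed}")

simulate(spike_load=110)  # small spike: backlog drains, system heals
simulate(spike_load=200)  # big spike: retries sustain the overload forever
```

With the small spike the backlog drains within a tick or two; with the big one, failed retries keep offered load above what the degraded service can handle, so it never recovers on its own.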
Although the paper (unfortunately) didn’t go into much detail about how to avoid metastable failures, just noticing the general features of this type of problem and providing a clear mental model was nice.
DBOS: a DBMS-oriented operating system
https://vldb.org/pvldb/vol15/p21-skiadopoulos.pdf
This was a nice vision paper about using distributed databases as a layer for defining distributed operating systems (instead of the million ad-hoc interfaces from things like Kubernetes, Slurm, ZooKeeper, etc.).
I’ve been complaining for years that operating systems should expose a more database-like API for querying their internals (e.g. instead of having to parse /proc files), so I’m naturally on board with this vision, but I wouldn’t say the paper did enough implementation work to convince me that this would actually work on real-world workloads. Hopefully in a future paper!
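Here’s roughly the interface I’m imagining, faked in userspace (this is not the DBOS API; it uses the third-party psutil library plus the stdlib’s sqlite3):

```python
import sqlite3

import psutil  # third-party; stands in for a kernel-maintained table

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE proc (pid INTEGER, name TEXT, rss INTEGER)")
db.executemany(
    "INSERT INTO proc VALUES (?, ?, ?)",
    [
        (p.info["pid"], p.info["name"], p.info["memory_info"].rss)
        for p in psutil.process_iter(attrs=["pid", "name", "memory_info"])
        if p.info["memory_info"] is not None
    ],
)

# The kind of query I'd like the OS to answer natively, instead of
# making me parse /proc by hand: the five biggest processes by RSS.
for row in db.execute(
    "SELECT pid, name, rss FROM proc ORDER BY rss DESC LIMIT 5"
):
    print(row)
```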
XRP: In-Kernel Storage Functions with eBPF
https://www.usenix.org/conference/osdi22/presentation/zhong
This paper details XRP, which uses eBPF to run logic directly in the Linux storage layer (similar to what XDP does for networking). The basic idea makes a ton of sense (modern SSDs are fast and syscall overhead can be pretty substantial), although I don’t have enough experience building high-performance database systems to say much about the actual API presented (or its technical limitations).
TBH, I’m a little surprised no one built this earlier (XDP is like 6 years old now!), but I’m glad it’s finally happening.
Finding missed optimizations through the lens of dead code elimination
This was the first of two really clever testing papers exploiting dead code elimination. The basic idea is that by inserting “markers” (i.e. special inline assembly instructions) into code, you can tell whether a compiler optimized out an entire basic block (e.g. because it was dead code). Then, by comparing two different compilers, you can identify bugs: either one compiler is missing optimizations, or one compiler is incorrectly optimizing out live code. By randomly generating C code with Csmith and repeating many, many times, you get yourself an automated bug finder.
A nice, clever idea to identify bugs.
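Here’s my rough reconstruction of the marker trick (not the paper’s actual tooling – it just assumes gcc and clang are on your PATH):

```python
# Insert a call to a unique external function into each basic block,
# compile to assembly, and check which calls survive. A marker that one
# compiler eliminates but another keeps points at a DCE difference.
import os
import subprocess
import tempfile

PROGRAM = """
void marker0(void);
void marker1(void);

int f(int x) {
    if (x > 0) { marker0(); return 1; }
    if (x > 1) { marker1(); return 2; }  /* dead: x > 1 implies x > 0 */
    return 0;
}
"""

def surviving_markers(compiler):
    with tempfile.TemporaryDirectory() as d:
        src, asm = os.path.join(d, "t.c"), os.path.join(d, "t.s")
        with open(src, "w") as f:
            f.write(PROGRAM)
        subprocess.run([compiler, "-O2", "-S", src, "-o", asm], check=True)
        text = open(asm).read()
        return {m for m in ("marker0", "marker1") if m in text}

# Any difference between the two sets is either a missed optimization
# or a compiler incorrectly deleting live code.
print(surviving_markers("gcc"), surviving_markers("clang"))
```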
Compiler validation via equivalence modulo inputs
https://web.cs.ucdavis.edu/~su/publications/emi.pdf
This is a 2nd approach to finding compiler bugs – the basic idea is to:
- Generate random C programs (e.g. using Csmith)
- Execute those programs on random inputs while collecting code coverage
- Identify the dead code of the program for those inputs
- Arbitrarily mutate that dead code (in this paper, they just delete it) and then re-execute the new program on the same input
As long as your mutations avoid introducing undefined behavior, any mismatch between the outputs of steps 2 and 4 is indicative of a bug.
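Here’s a very rough sketch of that loop (my simplification, not the paper’s tool). It assumes a Csmith-style program orig.c that takes no input and prints a checksum, and uses gcc’s coverage tooling to find unexecuted lines:

```python
import subprocess

def sh(cmd):
    return subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout

def compile_and_run(src, flags="-O2"):
    sh(f"gcc {flags} {src} -o prog")
    return sh("./prog")

# Steps 1-3: profile one execution and collect the unexecuted lines
# (gcov marks lines that never ran with '#####').
sh("gcc -O0 --coverage orig.c -o prog_cov && ./prog_cov && gcov orig.c")
dead = set()
for line in open("orig.c.gcov"):
    count, _, rest = line.partition(":")
    if count.strip() == "#####":
        dead.add(int(rest.split(":", 1)[0]))

# Step 4: delete the dead lines to get an EMI variant, then re-execute.
lines = open("orig.c").read().splitlines()
variant = "\n".join(l for i, l in enumerate(lines, 1) if i not in dead)
with open("emi.c", "w") as f:
    f.write(variant)

# The outputs must agree; a mismatch means one compilation is wrong.
assert compile_and_run("orig.c") == compile_and_run("emi.c")
```

Deleting whole source lines is much cruder than the statement-level pruning the paper does, but it shows the shape of the pipeline.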
I think this was a fairly clever idea, but found the paper itself to be weirdly obfuscated and unclear. They spent a lot of time with formal definitions that IMO really weren’t necessary and didn’t contribute much.
Using Lightweight Formal Methods to Validate a Key-Value Storage Node in Amazon S3
https://dl.acm.org/doi/10.1145/3477132.3483540
A nice experience report about how Amazon used property-based testing + fault injection to better test S3. It was fun to read, but nothing really stood out to me as particularly novel (a lot of it just seemed like straightforward property-based testing against a known oracle). I think the most interesting thing was the degree to which they used “default” random test case generators and minimization heuristics – the fact that those worked out so well is more good evidence that they’re pretty well-tuned at this point.
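For flavor, here’s the kind of oracle-based property test I mean, written with the Hypothesis library (ToyStore is a stand-in of my own, not anything from the paper):

```python
from hypothesis import given
from hypothesis import strategies as st

class ToyStore:
    """Stand-in for the storage node under test."""
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

# Drive the store and a plain dict (the oracle) with the same random
# operations; any observable divergence is a bug in the store.
@given(st.lists(st.tuples(st.text(), st.binary())))
def test_matches_oracle(ops):
    store, oracle = ToyStore(), {}
    for key, value in ops:
        store.put(key, value)
        oracle[key] = value
        assert store.get(key) == oracle[key]

test_matches_oracle()
```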
I did find the title slightly amusing – as someone who writes a ton of property-based tests at work, I have never described them as a “lightweight formal method”, but maybe I should start!