What I've Been Reading (Sep 2022)

Books

Dancing in the Glory of Monsters: The Collapse of the Congo and the Great War of Africa

This was probably my favorite book of the year – it’s a history of the Congo wars, and it’s absolutely gut-wrenching. The author doesn’t flinch from just how brutal this war is¹, how complicated the conflict are (there are like 10 armies and 20 militia groups involved), how complicit in the atrocities all the major leaders are, and how little the West cares. I’m sure it would’ve been easy to try to reduce the story into a simple “look how savage those Africans are”, and I appreciated the insistence on centering the voices of the victims, on understanding the political dynamics (and personal relationships) that drove the conflict, and how leaders deliberately cultivate and shape international opinion so that the West turns a blind eye.

It’s written in a very journalistic style, which is bot ha strength and a weakness – there are tons of really revealing anecdotes (e.g. Laurent Kabila hiding Congo’s treasury in his bathroom) that give a lot of color to the events, but it also means some parts of the story felt a little glossed over (e.g. what exactly is going in Angola during this time?).

Reading this book made me feel ashamed about how little I knew about this conflict before-hand:

I knew vaguely that DRC was a failed state, but I had no idea that they were embroiled in massive war up until 2003 (and with an unofficial war that basically continues to the present day). How I missed the bloodiest war since World War II that happened while I was alive is crazy to me.
I knew about the Rwandan genocide, but had no idea that it directly spiraled into the Congo wars. Even when I learned about in an International Humans Rights course I took in high school, we (in retrospect) took a very US-centric view, focusing on US foreign policy and interventionism rather than, you know, the actual people who suffered in the massacre and the aftermath.
I did not know pretty much any of the names of the leaders involved in any of the countries (with the notable exception of Robert Mugabe in Zimbabwe because of news articles about hyperinflation). Even many of the independence movement leaders were new to me
In general, I think it’s telling that the only books I’ve read aobut the Congo are this and King Leopold’s Ghost…

Overall, one of my favorite books of the year. Would definitely recommend it.

Breaks of the Game

This book is ostensibly about the Portland Trail Blazer’s 1979-80 NBA season, but I think it’s really a pretty poignant description about American culture at the time. There’s a lot of subtle observations about race relations in America (between white fans and black athletes, between black athletes and white coaches, and between white athletes and black athletes), the effect of money on the sport (e.g. the generational divide between the “young” athletes making millions of dollars and the “older” generation who barely made any money), and what labor rights look like when you’re a rich and famous athlete (e.g. why are NBA stars so much wealthier and more famous than, say, NFL stars?).

Style-wise, the book’s a little weird. It honestly feels more like an anthology of long-form magazine articles than a single book – each chapter can pretty much be read stand-alone (if you know the names of everyone involved). It’s also written very journalistically, with lots of emphasis on (and empathy with) a handful of individual players. There’s one chapter that dives into Kermit Washington’s life that I particularly loved.

Even as someone who’s not a basketball fan, I found this an enjoyable read that I’d recommend to anyone interest in modern-ish American culture.

Dealers of Lightning

This book is essentially a history of Xerox PARC, a legendary research center that invented GUIs, the Smalltalk language, VLSI chip design, and Ethernet (among other things). Overall, I enjoyed it even though I didn’t find it particularly noteworthy. Mainly, I found it a useful counterweight to the sheer amount of hagiography that Silicon Valley has about PARC. It might just be because my job (trying to invent a new human interface technology with neural interfaces in service of a new computing paradigm with AR/VR), but I meet so many people who just totally drink the Kool-Aid about how PARC was this magical place with brilliant research management and genius researchers who did no wrong.

And like, yes, there were lots of smart people there. And the research management was undeniably effective (although I learned that PARC could recruit such good talent because they were willing to pay top dollar at a time when the government was cutting CS research funding). But it was nice to learn about the office politics between research groups, of the prima donna displays for intellectual dominance, and the groupthink that captured PARC even in its heydey. The mythology is that all of these problems were only about the friction between PARC and “those corporate suits at Xerox who just didn’t get it”, but this book makes it clear that these problems all existed within the walls of PARC (and even within the walls of Bob Taylors' research group) as well.

Honestly, this gives me more hope about my own future career. Reading the book, PARC felt more like a research group with an unusual density of talent and an unusually charismatic manager – something that’s rare to have, but not something that’s impossible to find.

Overall, I enjoyed the book, but am not sure I’d recommend it unless you’re interested in PARC for external reasons.

Papers

Graham: Synchronizing Clocks by Leveraging Local Clock Properties

https://www.usenix.org/conference/nsdi22/presentation/najafi

This paper argues that by actually modeling out the sources of clock error, you can actually get incredibly accurate clocks using comododity hardware on Linux, which (combined with NTP) lets you get very tightly synchronized clocks (on the order of 100 ppb, or 100 ns of drift per second). It includes a nice description of how clocks work in modern systems, but I was not interested in clock synchronization enough to engage deeply with the details of their model. Still, anyone who’s interested in tight clock synchronization (e.g. for something like Spanner’s TrueTime API) might be interested in this.

NWGraph: A Library of Generic Graph Algorithms and Data Structures in C++20

https://drops.dagstuhl.de/opus/volltexte/2022/16259/

I thought this was going to be about a different way to represent graph that is generically good compared to the standard “soup of pointers” or “adjacency matrix as sparse matrix” that I’m used to. Instead, this was about defining an abstract interface for graphs (roughly abstracting over adjacency lists) and algorithm implementations that operate over that abstract interfaces. There didn’t seem to be anything super special about the API design –it seemed like sort of the obvious thing to do if you want to define an abstract graph interface (although maybe that speaks to how well-written the exposition is)?

It was nice to see some modern C++ things (I had no idea about std::execution or std::async), but otherwise I didn’t find this paper to be that interesting. I’m sure this was a well-engineered library, but nothing really stuck out to me as that noteworthy.

Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC

https://arxiv.org/pdf/1507.04544.pdf

This goes through a couple different choices for how to do model selection with Bayesian inference. The paper assumes that you are trying to maximize the expected log predictive density, or

\[\sum_k \ln \mathbb{E}_{\theta} p(y_k | \theta)] \]

for some new, unobserved dataset. You can’t calculate this this directly, but there are two “obvious” sort of ways to try and get at it:

You could plug in your observed data as \(y\) and try to penalize the penalize the “double-dipping” since you also use them to fit the posterior. The paper claims that the WAIC (the Widely Applicable Information Criterion) is the best way to do that.
Alternatively, you could try to estimate it using cross validation. This seems to do better empirically than WAIC, but is (obviously) expensive to compute. The paper summarizes you can apply importance sampling to efficiently approximate leave-one-out CV, and then how to apply “Pareto smoothing” to your importance weights to get a lower variance estimate (and makes sure that the variance is even finite in the first place!).

The paper generally recommends this 2nd approach (or vanilla k-fold CV when you have data points that are correlated with each other in your data).

Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory

http://jmlr.org/papers/v11/watanabe10a.html

This paper goes through the theory of WAIC, which is defined as (all expectations are with respect to the posterior distribution):

\[\frac{1}{n} \sum_k \ln \mathbb{E}_{\theta}[p(y_k | \theta)] - \frac{1}{n} \sum_{k=1}^n \mathrm{Var}_{\theta}[\ln p(y_k | \theta)] \]

I quite like this definition – frankly, I think it makes a lot more intuitive sense than the AIC (for instance). I assume in practice, you approximate those expectations by drawing samples from the posterior.

The paper was quite heavy mathematically and I admit I didn’t have the patience to go through and understand all the proofs step-by-step. The paper lies heavily on “singular learning theory” (e.g. learning theory when the Fisher information matrix is singular), which I was unfamiliar with. Still, even the high-level skim gave me a sense of what this new subfield is and how the logic is put together.

Know when to fold ‘em: An empirical description of risk management in public research funding

https://doi.org/10.1016/j.respol.2019.103873

A relatively straightforward empirical paper that examines ARPA-E. The ARPA “style” of research funding gives a lot of discretion to individual program directors, who get to decide who actually gets funding within a program (among other things). The paper basically shows that, yes, ARPA-E program directors do in fact exercise their power to:

Lengthen contracts for high-performing groups
Directing additional money to high-performing groups
Renegotiate technical milestones and deliverables
Terminating contracts for groups who are not hitting their milestones

And that research groups with “bonus” funding + extended contracts tend to yield better research as far as we can tell, although I found this claim a little iffy given the limited measurements we have of “goodness of research”.

On one hand, this is kind of obvious, but as the paper says:

First, we could have found instead that PDs used their options sparingly, consistent with expectations for program managers in traditional funding programs. Second, we could have found that PDs chose to expand struggling projects, seeing them in greater need of support, while cutting short projects that show signs of success and therefore require less public support than anticipated. And third, we could have found that PDs modified projects ineffectually, without any discernable productivity gains.

In general, nothing here was too surprising but nice to get confirmation that when you give program directors flexibility to actively manage their program, they seem to (1) both use that power and (2) use it well.

Are You Sure You Want to Use MMAP in Your Database Management System?

https://db.cs.cmu.edu/mmap-cidr2022/

As the title implies, this paper argues that you shouldn’t use MMAPs when implementing a database. I’m generally sympathetic to this argument – I remember using MongoDB when it still ran on a mmap-based storage engine and seeing crazily bad performance once your working set didn’t fit in RAM² – but found this paper a mixed bag.

Honestly, the most convincing section was the historical list of databases that was the list of databases that originally used mmaps and moved off of the (Mongo, InfluxDB, SingleStore aka MemSQL, and apparently the LevelDB to RocksDB fork). That’s a fairly impressive “here be monsters” list.

On the more technical side, the authors made 4 main critiques:

Transactional Safety: honestly, I don’t understand this critique. It’s good to point out that mmaps need special care (since the OS can flush dirty pages to disk at any time), but these protocols (especially the user-space copy-on-write) don’t strike me as that complicated compared to implementing a full buffer manager (and if you do MVCC you might need a separate scratch space anyways).
I/O stalls: basically, mmap gives very limited control over what’s on disk vs in-memory, so arbitrary pointer dereferences might block on disk I/O. I had thought that mlock would’ve solved the problem, but apparently not (although I would’ve appreciated more detail on why Linux limits the amount of memory you can mlock).
Error Handling: this was something I didn’t think about – because every pointer access can actually go to disk, every pointer access might generate SIGBUS / SIGSEGV errors to signal IO problems, which makes robustness harder.
mmap Performance Issues (as in, problems with the implementation of mmap), although the paper doesn’t really make claims that these are “inherent” problems vs fixable performance issues.

Overall, I thought the paper was pretty good about identifying problems w/ mmap, but didn’t do a good job of comparing it to the alternative of building your own buffer manager. I’m not aware of any off-the-shelf buffer manager you can just use, and writing a high-quality buffer manager strikes me as fairly complicated nowadays (and a lot of the pieces I imagine you’d use, like io_uring are pretty new).

(This is probably an opportunity for someone, to see if they can build a “buffer-manager library” that other people can use, or at least “parts from which you can assemble your own buffer manager”).

An interesting counterpoint is from https://ravendb.net/articles/re-are-you-sure-you-want-to-use-mmap-in-your-database-management-system which mostly mirrors my takes and replies to the 4 problems with:

Copy-on-write is not that complicated
Apparently you can use madvise with MADV_WILLNEED to get most of what you want
On IO Errors, you should probably just crash anyways. Error handling is basically impossible to get right anyways (see fsyncgate in Postgres). I believe the high-level idea (I vaguely remember fsync too), but I’m (1) unconvinced that signal handlers are good ways of doing teardowns or that (2) there aren’t some subset of IO errors that could be recoverable.
Basically just a “not a problem in practice”, which might be true for them but presumably would be pretty workload specific?

Effective Mimicry of Belady’s MIN Policy

https://www.cs.utexas.edu/~lin/papers/hpca22.pdf

This is a fun paper about implementing a hardware CPU caching algorithm (think CPU cache). The basic idea is to try to run Bélády’s algorithm by trying to learn the “next use time” of every member of cache. There are three main parts here:

The “reuse distance predictor” (RDP), which is just a map from Program Counters to estimated “time to reuse”. In this world, time is just a counter that increments anytime the cache is accessed. The big innovation seems to be thta previous versions of this approach trained a classifier of “should evict or not” rather than a regression (or multi-class classifier, since time is discrete here).
A “sample cache”, which you use to actually train the reuse distance predictor. It’s essentially a map from memory address to (timestamp, PC) pairs. When an address is reused, then the corresponding \(\Delta\) is used to train the reuse predictor.
The actual cache, which uses the RDP to simulate Belady’s algorithm.

I was a little surprised about why the RDP is only indexed by the PC and not by the actual memory address, but after thinking about it I actually think that makes sense – there are probably way more addresses you want to load than PC points and b

I was also surprised by how small the performance improvements over a simple LRU cache was – on the ballpark of 5% for a “state of the art” algorithm.

I didn’t know all too much about how the CPU caches worked under-the-hood, so this was a nice read. I think the main “missing piece” for me here is (1) how the different levels of the cache work together (I would’ve assumed that the L1/L2/L3 cache algorithms are designed collaboratively) and (2) some insight about whta needs to change for multi-core caching (the paper runs experiments on multi-core workloads but doesn’t talk about how those might influence the cache design).

Rethinking Belady’s Algorithm to Accommodate Prefetching

https://www.cs.utexas.edu/~lin/papers/isca18.pdf

Another paper about hardware caches. Basically, they observe that Bélády’s algorithm doesn’t take into account the difference between loads (where you actually use the data) vs prefetches. The only downside of a cache miss for prefetches is extra memory traffic, so you usually want to trade off extra cache misses for prefetches to get get more cache hits for loads.

It’s a simple idea, and one that was well-communicated. I found their actual “Flex-Min” algorithm a little complicated in how it traded of cache misses for prefetches (IMO the obvious thing to do would be to just multiply prefetch’s “next fetch time” by some constant so they’re evicted more eagerly), but overall still a good paper.

Moderates

https://www.cambridge.org/core/journals/american-political-science-review/article/moderates/71A6A9BD7EC7A5C94F975703417F866F

When you poll people’s policy preferences and try to fit a 1-dimensional spatial model to them, you find that a lot of people end up being ideological moderates. But are they moderate because they actually have middle-of-the-road views, or are they moderate because they have some very left-learning views and some very right-learning views in ways that “cancel out”.

This paper tries to answer this question by fitting a mixture model to policy preference survey data with 3 components:

People who fit the traditional 1-dimensional model of ideology
People who select choices totally randomly (e.g. since they’re not paying attention to the survey). That is, Pr[y = 1] = 0.5 always.
People who “cluster” into together independently of the other two options (that is, the probability of answering “yes” on question Q is some fixed \(p_Q\) for everyone in this cluster).

I think modeling people’s ideology as a mixture model is clever (and I definitely like the inclusion of bucket 2), but I’m pretty skeptical about this approach in other ways:

I don’t think that this 3rd bucket captures “inconsistent” moderates. There’s nothing that claims that such inconsistent moderates agree with each other (AFAIK), and there’s no regularization imposing constraints on \(p_Q\) such that they’d be actually moderate.
They doesn’t do a great job establishing that their model’s actually fitting the data reasonably. In particular, I would’ve liked to know why they chose only a single cluster! Would the results have changed if we used 2 clusters? 3 clusters? A Chinese restaurnat process to learn the # of clusters as well?
Why are we even fitting a mixture model? I feel like we could answer the stated question by looking at people’s responses after we fit the 1D ideological model – their model also gives us the “ideological center” of each policy position, so can’t we directly use those to find if people are middle-of-the-road vs ideologically extreme but inconsistent?

It was an interesting problem and an interesting approach to the problem, but I’m not sure I fully trust the analysis.

Power Laws in Economics: An Introduction

https://www.aeaweb.org/articles?id=10.1257/jep.30.1.185

This was sort of a meh paper. It basically tries to argue that power law distributions are empirically common in economics, comes up with some toy models for it, and concludes with implications for the broader field.

The empirical part was pretty disappointing. The way it establishes empirical power laws is to just graph things on a log-log plot, perform a linear regression, and argue that the \(r^2\) is high. This is, IMO, pretty unconvincing – lots of non-Pareto distributions get high \(r^2\) (e.g. a log-normal distribution) if you do this and honestly the relationships don’t actually look that linear by eye (the errors definitely have some trends within them). I’m a little surprised this was the extent of the analysis since I think these problems have been well-established since at least 2009.

The “What Causes Power Laws” section gave a couple different generating processes and was generally well-done (although I wish it went into more depth about the mathematical intuition). That said, I wasn’t sure how seriously to take it given that I was unconvinced about the empirics of the power laws presented.

I think the last section about broader implications was the best part, mostly because it wasn’t really about power laws at all! It was more concerned with arbitrary heavy-tailed distributions (where you can’t just average out variance / fluctuations), which seems like the concept we actually care about.

War as an Economic Activity in the “Long” Eighteenth Century

https://journals.sagepub.com/doi/abs/10.1177/084387141002200202?journalCode=ijha

This is sort of a weird paper – it’s at once a history paper about British economic history, a position paper for other historians to “take war’s effect on economic development seriously”, and a bit of a polemic dunking on other economic fields. I’m not an expert enough in British economic history to judge the validity of its main claims, but it seemed plausible to me baesd on other things I know. If I had to summarize the argument, I’d say it was that:

18th century Britain should be understood as a “fiscal-military state” – an organization built to raise money in order to fight wars.
Britain devoted much more of its economy and manpower to war compared to its European rivals (it didn’t “save” on war because it was protected by the ocean)
Britain’s huge war expenditures caused its huge economic boom, even before the “mineral” economy of the industrial revolution. The paper speculates that this was because of the British spending on the navy (vs the army for other European countries).

Why Organizations Fail: Models and Cases

https://www.aeaweb.org/articles?id=10.1257/jel.54.1.137

This was a nice review article chronicling a couple different organizational failure modes along with some crisp toy models to explain the main intuition. It didn’t really delve into “which of these failure modes actually happen” (relying more on “empirical sketches” than full-throated anlysis), but I found the models to be generally well done (although some of them were a little too artificial for my tastes).

What If We Don’t Pop the Stack? The Return of 2nd-Class Values

https://drops.dagstuhl.de/opus/volltexte/2022/16243

The paper basically asks whether we can return variably-sized data from functions without having to heap allocate them (e.g. arrays of dynamic size, or objects that satisfy an interface, or closures). They show that yes, you can. The main idea is to not fully pop the stack when you return – this is ok, as long as you pop off multiple stack frames at once later to make up for it.

To do this safety, they introduce a typing discipline that enforces “safe stack pop deferring”. I admit I didn’t really follow the details of how this type system works, but it was a neat idea!

Articles

Logs vs Structured Events: This article basically advocates for “canonical” structured events (a la canonical log lines) – that is, a single structured event that is emitted per request. This event is (essentially) a “wide” JSON blob with potentially 100s of fields (things like “git commit of service” or “host IP address”) to help you retroactively figure out what happened. Because you only emit 1 event per request, adding new fields is cheap (and you should reduce storage costs by dynamically sampling events instead of discarding fields).
When to use Bazel: a nice experience report about people moving to Bazel. I think the TLDR is that moving to Bazel’s a big change, but it does deliver on its promises on fast + correct builds. It’s definitely worth it when you’re big enough, but for smaller shops it’s unclear how important that is.
Where do Economies of Scale Come From: nothing here is particularly shocking, but it’s a nice succinct writeup of exactly what the title says.
Project Cambria: this details a system for decentralized schema migrations. Basically, you create “lenses” which are automated ways to go to/from schema A to schema B and vice-versa. You can then create a graph between all the schemas and use lenses to traverse this graph. A neat idea, and their prototype system seemed pretty cool (e.g. a declarative YAML for defining these lenses and then auto-generate the code, the graph, and types for TypeScript). As the authors note, there are still some challenges (no silver bullet!), but it’s still a clever idea.
How Superhuman Built an Engine to Find Product Market Fit: This is one of those typical business think pieces, full of self-aggrandizement and fluff without much actual intellectual rigor. The core proposal is an “algorithm” for knowing how good your product is when you’re in some kind of beta mode and can’t rely on actual sales data. Basically, you divide customers (via survey) into those who would be “very disappointed”, “somewhat disappointed”, and “not very disappointed” if your product disappeared tomorrow. You then (1) look at common characteristics of the “very disappointed” customers to define your target market and further target your analysis, (2) identify core value propositions among those “very disappointed” users to prioritize and avoid regressing, and (3) ask for pain points from the “somewhat disappointed” users to prioritize feature work (hoping to convert them to “very disappointed” users). Certainly seems plausible, but I don’t know why they needed so many words to convey this idea.

One of the most nauseating elements was a description of the aftermath of a church massacre. After killing the villagers, the soldiers then basically plaid human origami with the corpses – I vividly remember the example of a corpse who had slits cut into their belly and had their hands inserted into their own insides (like pockets). And the reaction of the survivors was also wrenching – something to the effect of “I thought we were desensitized to the violence, but this was too much”. ↩︎
Although looking back, I’m not sure I can blame mmap for this design choice vs Mongo’s rather… subpar implementation quality more generally. These were the days of the giant collection-level lock (which I’m told was an imporvement over the global database-level lock that preceded it)! ↩︎

Amateur Hour