The paper conducts careful inference, using the two leading approaches for
studies with small numbers of clusters: randomization inference and wild cluster bootstrap-t.
This is the first paper to show the performance of both methods alongside each other. We
show that both perform quite well in our data, with a slight tendency for randomization
inference to either do very well or slightly under-reject (and the converse for wild cluster
bootstrap-t).
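
To make the comparison concrete, the following is a minimal sketch of how the two p-values could be computed side by side. It is not the paper's implementation. It assumes a single binary treatment assigned at the cluster level, an OLS regression of the outcome on a constant and the treatment indicator, a common small-sample correction for the cluster-robust variance, Rademacher weights, and a null of zero effect imposed in the bootstrap. The function names (`ols_cluster_t`, `randomization_p`, `wild_cluster_boot_p`) are hypothetical.

```python
import numpy as np

def ols_cluster_t(y, d, g):
    """OLS of y on a constant and d; returns (coefficient on d, cluster-robust t)."""
    X = np.column_stack([np.ones_like(d, dtype=float), d.astype(float)])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Sum the scores X_i * e_i within each cluster to build the sandwich "meat".
    clusters = np.unique(g)
    meat = np.zeros((2, 2))
    for c in clusters:
        s = X[g == c].T @ resid[g == c]
        meat += np.outer(s, s)
    G, n, k = len(clusters), len(y), X.shape[1]
    scale = G / (G - 1) * (n - 1) / (n - k)  # assumed small-sample correction
    V = scale * XtX_inv @ meat @ XtX_inv
    return beta[1], beta[1] / np.sqrt(V[1, 1])

def randomization_p(y, d_cluster, g_idx, n_perm, rng):
    """Randomization inference: re-assign treatment across clusters.

    d_cluster holds one treatment value per cluster; g_idx maps each
    observation to its cluster (integers 0..G-1).
    """
    _, t_obs = ols_cluster_t(y, d_cluster[g_idx], g_idx)
    t_perm = np.empty(n_perm)
    for b in range(n_perm):
        d_fake = rng.permutation(d_cluster)[g_idx]  # placebo assignment
        t_perm[b] = ols_cluster_t(y, d_fake, g_idx)[1]
    return np.mean(np.abs(t_perm) >= np.abs(t_obs))

def wild_cluster_boot_p(y, d_cluster, g_idx, n_boot, rng):
    """Wild cluster bootstrap-t with the null (zero effect) imposed."""
    d = d_cluster[g_idx]
    _, t_obs = ols_cluster_t(y, d, g_idx)
    fitted_r = y.mean()        # restricted model: constant only
    resid_r = y - fitted_r
    G = len(d_cluster)
    t_boot = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=G)      # one Rademacher draw per cluster
        y_star = fitted_r + w[g_idx] * resid_r   # flip residual signs by cluster
        t_boot[b] = ols_cluster_t(y_star, d, g_idx)[1]
    return np.mean(np.abs(t_boot) >= np.abs(t_obs))
```

Both functions compare the observed t-statistic with a reference distribution: randomization inference builds it by re-assigning treatment across clusters, while the wild cluster bootstrap builds it by flipping the signs of restricted residuals cluster by cluster.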