Hill for the data scientist: an xkcd story

data-science
epidemiology
xkcd
NSSD
This was inspired by Hilary Parker & Roger Peng’s Not So Standard Deviations Episode 28. It was suggested that it would be useful to lay out Hill’s criteria for data scientists, and I agree!
Author

Lucy D’Agostino McGowan

Published

December 15, 2016




[causation: An event or outcome B is influenced by a change in A]


Sir Austin Bradford Hill, a statistician and epidemiologist, created a list of guidelines for evaluating whether there is evidence of a causal relationship.[1] He determined the following aspects of associations ought to be considered when assessing causality. When thinking about this problem, an xkcd comic I have seen in every lecture on this topic came to mind:

correlation


This inspired me to attempt to explain Hill’s criteria using xkcd comics, both because it seemed fun and to motivate causal inference instructors to have some variety in which xkcd comic they include in lectures (bear with me, some of these are a stretch 🙈💁🏻).


Strength

  • How big is the effect you are seeing?
  • Note: Hill suggests that huge effects can point to causality; however, this does not mean small effects cannot be causal.
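
One common way to put a number on "how big is the effect" is a risk ratio: the risk of the outcome among the exposed divided by the risk among the unexposed. Here is a minimal sketch in Python. The function name `risk_ratio` and the counts are hypothetical, made up purely for illustration; they do not come from Hill's paper.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of the outcome among the exposed divided by risk among the unexposed."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


# Hypothetical counts, invented for illustration only.
rr = risk_ratio(exposed_cases=90, exposed_total=1000,
                unexposed_cases=10, unexposed_total=1000)
print(f"risk ratio: {rr:.1f}")
```

A ratio far from 1 is harder to explain away by a modest unmeasured confounder, which is roughly the intuition behind this criterion. A ratio close to 1 does not rule out a causal effect.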

Note: I am using this idea for a talk and I found a strip (to the left) that I think better represents this concept. The original post had this one.

strength


Consistency

  • This is essentially reproducibility & replicability
  • Can your analysis be reproduced?
  • Has anyone been able to replicate your findings?

consistency


Specificity

  • Can the association be pinpointed to a specific cause with no other plausible explanation?
  • I appreciate Hill’s caveat here, “if specificity exists we may be able to draw conclusions without hesitation; if it is not apparent, we are not thereby necessarily left sitting irresolutely on the fence.”

specificity


Temporality

  • Does the timeline make sense?
  • In general, the exposure ought to come before the outcome it is said to cause.

temporality


Biological gradient

  • The wording of this point makes it a bit difficult to untangle from the medical application, but generally this refers to a dose effect.
  • Does increasing an exposure yield a change in the outcome?
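
A crude way to look for a dose effect is to compute the risk of the outcome at each exposure level and check whether it rises as the dose rises. The sketch below assumes the data arrive as (dose, cases, total) rows; the numbers and the helper names `risks_by_dose` and `is_monotone_increasing` are hypothetical, invented for illustration.

```python
# Hypothetical (dose level, cases, group size) rows, invented for illustration only.
dose_groups = [
    (0, 10, 1000),
    (1, 20, 1000),
    (2, 40, 1000),
    (3, 80, 1000),
]


def risks_by_dose(groups):
    """Return (dose, risk) pairs, where risk is cases divided by group size."""
    return [(dose, cases / total) for dose, cases, total in groups]


def is_monotone_increasing(values):
    """True when each value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))


risks = risks_by_dose(dose_groups)
gradient_present = is_monotone_increasing([risk for _, risk in risks])

for dose, risk in risks:
    print(f"dose {dose}: risk {risk:.3f}")
print("risk rises with every dose level:", gradient_present)
```

This is only a descriptive check. A real analysis would model the trend and its uncertainty, and a gradient does not have to be strictly increasing to be meaningful.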

biological-gradient


Plausibility

  • Does the causal relationship make sense?
  • This is also a tricky one since plausibility depends on knowledge at the time. If we found it perfectly plausible, we may not need statistics to show the relationship.

plausibility


Coherence

  • Similar to plausibility, is there a logical argument that can be made by/to experts in the field regarding causality?
  • Does it fit into the understanding of the field? (Author’s note: this should have caveats too…the field could be wrong.)

coherence


Experiment

  • If a controlled experiment can take place, this can strengthen the argument for causality.
  • I view this as a general attempt to implement a counterfactual analysis.
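
To see why a controlled experiment helps, here is a small simulation sketch. Everything in it is an assumption made for illustration: a hidden confounder `u` drives the outcome, the exposure has no true effect at all, and the only thing that changes between the two runs is whether exposure is assigned by a coin flip or is influenced by `u`.

```python
import random
from statistics import mean


def simulate(n, randomized, rng):
    """Return the naive difference in mean outcome, exposed minus unexposed.

    By construction the exposure has zero true effect on the outcome.
    """
    exposed, unexposed = [], []
    for _ in range(n):
        u = rng.random() < 0.5                 # hidden confounder
        if randomized:
            p_exposed = 0.5                    # coin flip, ignores u
        else:
            p_exposed = 0.8 if u else 0.2      # u makes exposure more likely
        a = rng.random() < p_exposed
        y = (1.0 if u else 0.0) + rng.gauss(0, 1)  # outcome depends on u only
        (exposed if a else unexposed).append(y)
    return mean(exposed) - mean(unexposed)


rng = random.Random(2016)  # fixed seed so the sketch is reproducible
observational_diff = simulate(20_000, randomized=False, rng=rng)
randomized_diff = simulate(20_000, randomized=True, rng=rng)

print(f"observational difference: {observational_diff:.2f}")
print(f"randomized difference:    {randomized_diff:.2f}")
```

In the observational run the naive comparison shows a sizeable difference even though the exposure does nothing, because the exposed group contains more people with `u`. Under randomization the two groups are balanced on `u` on average, so the difference sits near zero, which is closer to the counterfactual comparison we actually want.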

experiment


Analogy

  • Have we seen a similar effect from a similar exposure?

analogy

[1] Hill, A. B. (1965). The Environment and Disease: Association or Causation? Proceedings of the Royal Society of Medicine, 58(5), 295–300.
