The dire consequences of tests for linearity

This is a tale of the dire type 1 error consequences that occur when you test for linearity.

the scream

Edvard Munch’s The Scream (1893), coincidentally also the face Frank Harrell makes when he sees students testing for linearity.

First, my favorite explanation of type 1 error 🐺:

We generally fix (or claim to fix) this type 1 error at 0.05, but sometimes our procedures can make this go awry!

I’ve prepared a very basic simulation.

  • generate 100 data points from two independent random normal distributions, an outcome \(y\) and a predictor \(x\) (Since these are generated randomly, we would not expect there to be an association between \(x\) and \(y\). If all goes as planned, our type 1 error would be 0.05)
  • fit simple linear model with a restricted cubic spline on the predictor \(x\)
  • test whether the nonlinear terms are significant
  • if they are, leave them in and test the association between \(x\) and \(y\)
  • if they are not, remove them and refit the model with only a linear term for \(x\) & proceed to test the association between \(x\) and \(y\).
  • calculate the type 1 error, how many times we detected a spurious significant association between \(x\) and \(y\).

Here’s my simulation code (run it yourself!):


sim <- function(wrong = TRUE){
#generate completely random data
y <- rnorm(100)
x <- rnorm(100)
#fit a model with a restricted cubic spline
mod <- ols(y~rcs(x))

if (wrong == TRUE & anova(mod)[3,5] > 0.05){
  #if the test for non-linearity is not "significant", remove nonlinear terms
  mod <- ols(y~x)
 #save the p-value

[Type 1 error when removing non-significant nonlinear terms]

test <- replicate(10000, sim()) 
cat("The type 1 error is", mean(test <= 0.05)) 
## The type 1 error is 0.0853

Uh oh! That type 1 error is certainly higher than the nominal 0.05 we claim!

[Type 1 error when not removing non-significant nonlinear terms]

We would expect the type 1 error to be 0.05 – I perform the same simulation omitting the step of removing non-significant nonlinear terms and calculate the type 1 error again.

test <- replicate(10000, sim(wrong = FALSE))
cat("The type 1 error is", mean(test <= 0.05))
## The type 1 error is 0.0501

Much better 👯

The conclusion: fit flexible models - skip the tests for linearity!

This has been elegently demonstrated by others, check out Grambsch and O’Brien.