The dire consequences of tests for linearity

rstats
rms
type 1 error
nonlinearity
This is a tale of the dire (type 1 error) consequences that occur when you test for linearity 😱
Author

Lucy D’Agostino McGowan

Published

February 18, 2017

This is a tale of the dire type 1 error consequences that occur when you test for linearity.

the scream

Edvard Munch’s The Scream (1893), coincidentally also the face Frank Harrell makes when he sees students testing for linearity.

First, my favorite explanation of type 1 error 🐺:

We generally fix (or claim to fix) this type 1 error at 0.05, but sometimes our procedures can make this go awry!

I’ve prepared a very basic simulation.

Here’s my simulation code (run it yourself!):

hank you Pua Yong Hao for pointing out a typo in the original version of this function – it has been updated!

library('rms')

sim <- function(wrong = TRUE){
#generate completely random data
y <- rnorm(100)
x <- rnorm(100)
#fit a model with a restricted cubic spline
mod <- ols(y ~ rcs(x))

if (wrong == TRUE & anova(mod)[2, 5] > 0.05){
  #if the test for non-linearity is not "significant", remove nonlinear terms
  mod <- ols(y ~ x)
} 
 #save the p-value
 anova(mod)[1, 5]
}

[Type 1 error when removing non-significant nonlinear terms]

test <- replicate(10000, sim()) 
cat("The type 1 error is", mean(test <= 0.05))
The type 1 error is 0.0812

Uh oh! That type 1 error is certainly higher than the nominal 0.05 we claim!

[Type 1 error when not removing non-significant nonlinear terms]

We would expect the type 1 error to be 0.05 – I perform the same simulation omitting the step of removing non-significant nonlinear terms and calculate the type 1 error again.

test <- replicate(10000, sim(wrong = FALSE))
cat("The type 1 error is", mean(test <= 0.05))
The type 1 error is 0.0501

Much better 👯

The conclusion: fit flexible models - skip the tests for linearity!

This has been elegently demonstrated by others, check out Grambsch and O’Brien.