When is complete case analysis unbiased?
I have been thinking about scenarios in which it makes sense to use imputation for prediction models, and I am struggling to come up with a case. Yikes! Even for inference, as long as you use a doubly robust approach, I’m not sure I see the value (other than for precision, but then it is no longer a question of bias and thus a question for a different day!)
It’s just a linear model: neural networks edition
I created a little Shiny application to demonstrate that neural networks are just souped-up linear models: https://lucy.shinyapps.io/neural-net-linear/
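One way to see the connection (a minimal sketch, not the app’s code): if every activation is the identity, stacking layers just multiplies weight matrices, so the whole network collapses into a single linear model. The layer sizes and random weights below are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 observations, 3 features

# A two-layer network with identity activations (an assumption for illustration)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)

def network(X):
    hidden = X @ W1 + b1      # identity activation: no nonlinearity applied
    return hidden @ W2 + b2   # output layer

# The same predictions from one linear model: y = X @ beta + intercept
beta = W1 @ W2
intercept = b1 @ W2 + b2
linear_pred = X @ beta + intercept

print(np.allclose(network(X), linear_pred))
```

With a nonlinear activation the collapse no longer happens, but the final layer is still a linear model fit to the learned hidden features.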
Causal Quartets
On this week’s episode of Casual Inference we talk about a “Causal Quartet”: a set of four datasets generated under different mechanisms, all with the same statistical summaries (including visualizations!) but different true causal effects.
Transparency in Public Health
Transparency in public health messaging matters. In a randomized trial published today in PLOS ONE, Hannah Mendoza and I looked at how providing transparent information about why a public health recommendation is being made can increase its uptake.
The Peril of Power when Prioritizing a Point Estimate
I recently noticed that the Pfizer immunobridging trials, presumably set up to demonstrate that their COVID-19 vaccines elicit the same antibody response in children as was seen in 16- to 25-year-olds, for whom efficacy has previously been demonstrated, have a strange criterion for “success”.
Would seeing Spider-Man: No Way Home decrease COVID-19 Cases?
In SNL’s cold open last night, President Joe Biden suggested that the COVID-19 surge we are seeing in the US is due to people seeing Spider-Man: No Way Home. If people would just stop seeing this film, he argues, cases would go back down! Interesting hypothesis, let’s take a looksy at the data, shall we?
Survival Model Detective: Part 2
A paper by Grein et al. was recently published in the New England Journal of Medicine examining a cohort of patients with COVID-19 who were treated with compassionate-use remdesivir. This paper had a flaw in its main statistical analysis. Let’s learn a bit about competing risks!
Survival Model Detective: Part 1
A paper by Grein et al. was recently published in the New England Journal of Medicine examining a cohort of patients with COVID-19 who were treated with compassionate-use remdesivir. This paper had a very cool figure; here’s how to recreate it in R!
First year as faculty
I just completed my first year as a faculty member - here is what I’ve learned! I’ll start by giving some context for where I am, what my university is like, etc. Then I’ll describe four recommendations, summarized by: discover your institution’s culture, become a peer, find harmony, and build community!
Prevalence of a disease plays an important role in your probability of having COVID-19 given you tested positive
The prevalence of a disease plays an important role in your probability of having it given you test positive.
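A quick worked example of why prevalence matters, via Bayes’ theorem: P(disease | +) = sensitivity × prevalence / [sensitivity × prevalence + (1 − specificity) × (1 − prevalence)]. The sensitivity and specificity below are made-up values for illustration only, not the characteristics of any real COVID-19 test.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' theorem."""
    true_pos = sensitivity * prevalence                # P(+ and disease)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(+ and no disease)
    return true_pos / (true_pos + false_pos)

# Hypothetical test characteristics, chosen only for illustration
sens, spec = 0.90, 0.95
for prev in (0.01, 0.10, 0.50):
    p = prob_disease_given_positive(prev, sens, spec)
    print(f"prevalence {prev:.2f}: P(disease | +) = {p:.3f}")
```

With these made-up numbers, the same test gives roughly a 0.15 probability of disease after a positive result at 1% prevalence, about 0.67 at 10%, and about 0.95 at 50%.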
Bayes Theorem and the Probability of Having COVID-19
I’ve seen a few papers describing the characteristics of people who tested positive for SARS-CoV-2, and these descriptions are sometimes being interpreted as the probability of infection for people with certain characteristics. Let’s talk about why that’s likely not true.
IHME Model Uncertainty: A quick explainer
There has been a lot of talk about the IHME COVID-19 projection model. Ellie Murray & I chat about it on Episode 10 of Casual Inference; here is a quick description of what is going on, with a focus on the uncertainty.
Pulling co-authors for grant docs
I just submitted my first grant. It turns out you need tons of little things when you submit a grant (who knew!) and one of the little things is a list of all of the coauthors you’ve published with in the past four years. Instead of tracking that down, I automated the process using R and then stuck the code here so I have it for next time!
PheWAS-ME, an app for exploration of multimorbidity patterns in PheWAS
This post is a longer-form and less-formal accompaniment to the manuscript “PheWAS-ME: A web-app for interactive exploration of multimorbidity patterns in PheWAS” and accompanying application. As the first of three papers that make up my PhD dissertation, the project represents a significant collaborative effort bringing together Electronic Health Records (EHR) and Biobank data using R and Shiny.
Building a data-driven CV with R
Updating your CV or resume can be a pain. It usually involves lots of copying and pasting, and then if you decide to tweak the style you may need to repeat the whole process. I decided to switch things up and design my CV so the format is just a wrapper around the underlying data. This post will help you do the same.
Extending the analogy: The boy who cried wolf was p-hacking!
During my postdoc with Jeff Leek, we worked on a few p-value, study design, and p-hacking “explainers”. Two of these were incorporated into TED-Ed cartoons (the totally ironically named (NOT BY ME) “This one weird trick will help you spot clickbait” and the less ironic “Can you spot the problem with these headlines?”), but the analogy written about here was never used, so here it is!
Understanding propensity score weighting
Come enjoy a graphical exploration of various propensity score weighting schemes.
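For reference, here are four commonly used weighting schemes, written with $Z_i$ as the treatment indicator and $e_i = P(Z_i = 1 \mid X_i)$ as the propensity score. This is a general summary of standard definitions, not necessarily the exact set of schemes explored in the post: the ATE weights target the whole population, the ATT and ATC weights target the treated and untreated groups respectively, and the overlap (ATO) weights emphasize units whose propensity scores are near 0.5.

$$
\begin{aligned}
w_i^{\text{ATE}} &= \frac{Z_i}{e_i} + \frac{1 - Z_i}{1 - e_i},\\
w_i^{\text{ATT}} &= Z_i + (1 - Z_i)\,\frac{e_i}{1 - e_i},\\
w_i^{\text{ATC}} &= Z_i\,\frac{1 - e_i}{e_i} + (1 - Z_i),\\
w_i^{\text{ATO}} &= Z_i\,(1 - e_i) + (1 - Z_i)\,e_i.
\end{aligned}
$$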
One year to dissertate
I’ve compiled some resources that I used when completing my dissertation, and I wanted to share them with YOU! Throughout this post, I link to a bunch of different templates that I used during my process; you can find them all in a GitHub repo. This how-to has gotten a biiiiiit long. This post contains the whole kit and caboodle, but I will also be releasing these as a series of smaller posts over the next couple of weeks.
p-value thoughts: A Twitter follow-up
A conversation about how “convincing” various studies were based on sample size and p-values led me to post a poll on Twitter. Here I discuss some thoughts that came up based on the results. tl;dr: p-values are hard, and Twitter is a fun way to spur stats convos!
network3d - a 3D network visualization and layout library
Recently, I have found myself needing to visualize networks. There are plenty of lovely options in R for visualizing networks in 2D, but I have found that many of the networks I want to visualize work much better in 3D, and there the options are much more limited. This prompted me to build the package network3d. This post is a brief intro to using it.
A year as told by fitbit
Of all of the important things that happened in 2017, probably the most impactful on the world is that I managed to wear a fitbit the entire year. Here I download my entire year’s worth of heart rate and step data to see what my 2017 looked like in terms of heartbeats and steps.
Leveraging uncertainty information from deep neural networks for disease detection - a summary
I was recently sent this fantastic paper on using uncertainty in deep neural networks. In it, the authors demonstrate a practical use of approximate Bayesian inference via dropout in the context of massively complicated computer vision models for diagnosing disease. The paper, while well written, is very long. Here I summarize its main points and comment on their impact.
MCMC and the case of the spilled seeds
For a long time I was confused by MCMC. I didn’t understand what it was, how it worked, or why we needed it. In this post I attempt to clear up those questions and let you play with the Metropolis-Hastings algorithm as it searches for a posterior that helps solve the mystery of two messy birds.
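To make the idea concrete, here is a minimal sketch of random-walk Metropolis, the special case of Metropolis-Hastings with a symmetric proposal (so the proposal terms cancel in the acceptance ratio). The target is a hypothetical posterior for a proportion with a flat Beta(1, 1) prior and 7 successes in 10 trials; it is not the bird-seed example from the post.

```python
import math
import random

def log_posterior(p, successes=7, trials=10):
    """Log posterior (up to a constant) for a proportion with a flat Beta(1, 1) prior."""
    if not 0 < p < 1:
        return -math.inf  # zero posterior density outside (0, 1)
    return successes * math.log(p) + (trials - successes) * math.log(1 - p)

def metropolis(n_samples=20000, start=0.5, step=0.1, seed=1):
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0, step)  # symmetric random-walk proposal
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); 1 - random() avoids log(0)
        if math.log(1 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)  # a rejected proposal repeats the current value
    return samples

draws = metropolis()[2000:]  # discard burn-in
print(sum(draws) / len(draws))
```

The exact posterior here is Beta(8, 4), so the printed sample mean should land near 8/12 ≈ 0.667; the step size of 0.1 is an arbitrary tuning choice.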
Commentary and follow up to p<0.005 suggestion
A recent paper, “Redefine Statistical Significance” by 72 co-authors, has caused quite a stir in the statistical community. Our student-run journal club at Vanderbilt will be discussing this contribution at our meeting led by Nathan James this week, so I’ve attempted to compile a list of significant responses and commentary that have come out since this paper was posted on PsyArXiv.
The traveling metallurgist
Here I attempt to explain the concepts behind the optimization technique of simulated annealing and the combinatorial optimization problem known as the traveling salesman problem, first in words and then, more excitingly, in an interactive visualization.
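A compact sketch of simulated annealing on a small traveling salesman instance, assuming random cities in the unit square, segment-reversal (“2-opt”) moves as the neighborhood, and geometric cooling. The starting temperature, cooling rate, and step count are arbitrary illustrative choices, not the settings behind the post’s visualization.

```python
import math
import random

def tour_length(tour, cities):
    """Total length of a closed tour that visits cities in the given order."""
    return sum(math.dist(cities[tour[i]], cities[tour[i - 1]]) for i in range(len(tour)))

def anneal(cities, temp=1.0, cooling=0.995, steps=20000, seed=0):
    rng = random.Random(seed)
    tour = list(range(len(cities)))
    rng.shuffle(tour)
    current = tour_length(tour, cities)
    best, best_len = tour[:], current
    for _ in range(steps):
        # Propose a neighbor by reversing a random segment (a "2-opt" move)
        i, j = sorted(rng.sample(range(len(tour)), 2))
        candidate = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
        cand_len = tour_length(candidate, cities)
        delta = cand_len - current
        # Always accept improvements; accept worse tours with probability exp(-delta / temp)
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            tour, current = candidate, cand_len
            if current < best_len:
                best, best_len = tour[:], current
        temp *= cooling  # cool down so uphill moves become rarer over time
    return best, best_len

# Usage: 20 random cities in the unit square
rng = random.Random(42)
cities = [(rng.random(), rng.random()) for _ in range(20)]
best, length = anneal(cities)
print(best, round(length, 3))
```

Early on, the high temperature lets the search accept longer tours and escape poor local optima; as it cools, it behaves more and more like greedy 2-opt improvement.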
A Simple Slack Bot With Plumber
I’ve been excited about the R package Plumber ever since hearing about it for the first time at useR! 2017. So when I finally found an application that would let me use it (sending cat and dog photos over Slack), I jumped at the opportunity.
Why you maybe shouldn’t care about that p-value
Recently, there seems to have been an uptick in citations of studies or statistics about this or that in the news and on the internet. Often these studies claim validity on the basis of a p-value. Through a small contrived example I make the point that in some situations we may want to ignore the forest and focus on the trees.
Twitter trees
A little over a week ago, Hilary Parker tweeted out a poll about sending calendar invites that generated quite the repartee. It was so popular that I couldn’t possibly keep up with all of the replies! I personally am quite dependent on my calendar, but I was intrigued to see what others had to say. This inspired me to try out some swanky R packages for visualizing trees.
Introducing the tuftesque blogdown theme
If you like the way our blog looks, you too can have your own blogdown driven site just like it! In this post I walk through how to set up an RMarkdown driven blog from scratch using blogdown and the tuftesque theme constructed for Live Free Or Dichotomize.
ENAR in words
I had an absolutely delightful time at ENAR this year. There was lots of talk about the intersection between data science & statistics, diversity, and great advancements in statistical methods. Since there was quite a bit of Twitter action, I thought I’d do a quick tutorial on scraping Twitter data in R.
The dire consequences of tests for linearity
This is a tale of the dire (type 1 error) consequences that occur when you test for linearity 😱
Custom JavaScript visualizations in RMarkdown
Recently RStudio added JavaScript chunks to RMarkdown. This makes many exciting things possible. Among these things is making your own custom JavaScript visualizations of data managed in R, all without leaving the .Rmd document. This is a quick walkthrough of doing just that.