I just submitted my first grant. It turns out you need tons of little things when you submit a grant (who knew!) and one of the little things is a list of all of the coauthors you've published with in the past four years. Instead of tracking that down, I automated the process using R and then stuck the code here so I have it for next time!
This post is a longer-form and less-formal accompaniment to the manuscript "PheWAS-ME: A web-app for interactive exploration of multimorbidity patterns in PheWAS" and accompanying application. As the first of three papers that make up my PhD dissertation, the project represents a significant collaborative effort bringing together Electronic Health Records (EHR) and Biobank data using R and Shiny.
Updating your CV or resume can be a pain. It usually involves lots of copying and pasting, and if you decide to tweak the style you may need to repeat the whole process. I decided to switch things up and design my CV so the format is just a wrapper around the underlying data. This post will help you do the same.
During my postdoc with Jeff Leek, we worked on a few p-value, study design, and p-hacking "explainers". Two of these were incorporated into TED-Ed cartoons (The totally ironically named (NOT BY ME) _This one weird trick will help you spot clickbait_ and the less ironic _Can you spot the problem with these headlines?_), but the analogy written about here was never used, so here it is!
Recently I was tasked with parsing 25 TB of raw genotype data. This is the story of how I brought the query time and cost down from 8 minutes and $20 to a tenth of a second and less than a penny, plus the lessons learned along the way.
Come enjoy a graphical exploration of various propensity score weighting schemes.
I've compiled some resources that I used when completing my dissertation and I wanted to share them with YOU! Throughout this post, I link to a bunch of different templates that I used throughout my process. You can find them all in a GitHub repo. This how-to has gotten a biiiiiit long. This post contains the whole kit and caboodle, but I will also be releasing these in a series of smaller posts over the next couple of weeks.
A conversation about how "convincing" various studies were based on sample size and p-values led me to post a poll on twitter. Here I discuss some thoughts that came up based on these results. tl;dr: p-values are hard, twitter is a fun way to spur stats convos!
A brief intro to, and tutorial for, the new function in the shinysense package: shinyviewr. This function allows you to take photos using the camera on your computer or phone and send them directly into your shiny applications.
My husband's family throws a family reunion every year and this year we've been tasked with co-planning it. We were trying to decide on the best location for everyone, so I embarked on a mission to find the center of all of our residences.
I always love discussions about R release names and their origin. I have been working on this list for a while -- with the release of "Short Summer" today, I thought it'd be a good time to post!
Recently, I have found myself needing to visualize networks. There are plenty of lovely options in R for visualizing networks in 2d, but I have found that many of the networks I want to visualize work much better in 3d, and here the options are far fewer. This has prompted me to build the package network3d. This post will be a brief intro to using it.
How different is the warmest day from the coldest day all around the country? Using readings from 7,000+ NOAA weather stations across the country we can find out.
Since twitter threads are excessively cumbersome to navigate, Maëlle asked me to relocate the list of #rstats Data Day Texas slides to a blog post, so here we are!
Recently I tweeted a small piece of advice re: when to set a seed in your script. Jenny pointed out that this may be blog post-worthy, so here we are!
Of all of the important things that happened in 2017, probably the most impactful on the world is that I managed to wear a fitbit the entire year. Here I download my entire year's worth of heart rate and step data to see what my 2017 looked like, in terms of heart beats and steps.
I was recently sent this fantastic paper on using uncertainty in deep neural networks. In it the authors demonstrate a practical use of approximate Bayesian inference by dropout in the context of massively complicated computer vision models for diagnosing disease. The paper, while well written, is very long. Here I summarize its main points and comment on their impact.
'Tis the season for white elephant / גמד וענק / Yankee swap / secret santa-ing! We thought it'd be particularly fun to do it #rstats style.
Thanksgiving 🦃 is right around the corner 🎉 -- this year we are hosting 17 people 😱. If you too are hosting way more than your kitchen normally cooks for, perhaps this will be of interest!
One thing I always found confusing when learning what an LSTM does is understanding intuitively why it's doing what it does. Here I attempt to give an example of how an LSTM hidden layer can be thought of through baseball.
For a long time I was confused by MCMC. I didn't understand what it was, how it worked, and why we needed to do it. In this post I attempt to clear up those questions and allow you to play with the Metropolis-Hastings algorithm as it attempts to find a posterior to help solve a mystery of two messy birds.
A recent paper, Redefine Statistical Significance by 72 co-authors, has caused quite a stir in the statistical community. Our student-run journal club at Vanderbilt will be discussing this contribution at our meeting led by Nathan James this week, so I've attempted to create a list of significant responses/commentary that have come out since this paper was posted on PsyArXiv.
Here I attempt to explain the concepts behind the optimization technique simulated annealing and the combinatorial optimization problem of the traveling salesman. First in words, and then more excitingly in an interactive visualization.
I've been excited about the R package Plumber ever since hearing about it for the first time at useR!2017. So when I finally found an application that would allow me to use it, sending cat and dog photos over Slack, I jumped at the opportunity.
I find series expansions fascinating. I also find any math involving e to be fascinating. Here I explain some of the facets of the exponential power series and its connection to my favorite distribution, the Poisson.
Recently, there seems to have been an uptick in citations of studies or statistics about this or that in the news and on the internet. Often these studies claim validity on the basis of a p-value. Through a small contrived example I make the point that in some situations we may want to ignore the forest and focus on the trees.
Interested in creating your personal website with R Markdown? We've updated our R Markdown website tutorial to depend on RStudio for simplicity, making website building easy as 🍰!
Recently I overhauled the drawr function of my package shinysense. Some bugs were fixed but, potentially more interesting, new features were added. Among these are support for time series and the ability to use the function outside of Shiny. This post covers what changed and how to use the new features.
A little over a week ago, Hilary Parker tweeted out a poll about sending calendar invites that generated quite the repartee. It was quite popular -- so much so that I couldn't possibly keep up with all of the replies! I personally am quite dependent on my calendar, but I was intrigued to see what others had to say. This inspired me to try out some swanky R packages for visualizing trees.
Maëlle and I created a mosaic of R-Ladies for the JSM Data Art Show. Here is a quick tutorial if you are interested in trying something similar!
HAPPY world emoji day! In honor of this momentous occasion, I have decided to analyze the emojis used on rOpenSci's Slack.
If you like the way our blog looks, you too can have your own blogdown driven site just like it! In this post I walk through how to set up an RMarkdown driven blog from scratch using blogdown and the tuftesque theme constructed for Live Free Or Dichotomize.
We both recently attended useR!2017 in Brussels. It was a blast to say the least. Here we will cover our favorite things about the conference and the lessons we learned.
I had such a delightful time at rOpenSci's unconference. Not only was it extremely productive (21 packages were produced!), but in between the crazy productivity was some epic community building.
I had an absolutely delightful time at ENAR this year. Lots of talk about the intersection between data science & statistics, diversity, and great advancements in statistical methods. Since there was quite a bit of twitter action, I thought I'd do a quick tutorial on scraping twitter data in R.
Recently we have been working on a shiny app that mimics tinder for preprints. One of the more exciting things we've done in this app is implement a swiping input. Now you can too with the package shinyswipr.
Lucy and I have made a simple package that allows you to pull down a collaborative google doc directly into an RMD file on your computer, hopefully speeding up the process of writing collaborative statistical documents.
This is a tale of the dire (type 1 error) consequences that occur when you test for linearity 😱
For today's rendition of I am curious about everything, in Hilary Parker & Roger Peng's Not So Standard Deviations Episode 32, Roger suggested the prevalence of drunk podcasting has dramatically increased - so I thought I'd dig into it.
A New Year's resolution for all of our models: get more flexible! By flexible, we mean let's be more intentional about fitting nonlinear parametric models.
Lara Harmon has put in countless hours to build and uplift the ASA Student community. We are SO grateful.
Recently RStudio added JavaScript chunks to RMarkdown. This makes many exciting things possible. Among these things is making your own custom JavaScript visualizations of data managed in R, all without leaving the .Rmd document. This is a quick walkthrough of doing just that.
Nick and I are starting a series following Frank Harrell's Regression Modeling Strategies course. Get ready for some crazy fun.
It's that post-holiday time of year to write some thank yous! I'm getting excited to attend rstudio::conf next week, so in that spirit, I have put together a little thank you using dplyr.
P-values are annoying; let's understand them so we don't get beaten by them.
This was inspired by Hilary Parker & Roger Peng's Not So Standard Deviations Episode 28. It was suggested that it would be useful to lay out Hill's criteria for data scientists. I agree!