NSF BIGDATA and BDHubs Joint PI Meeting: a student's perspective

This post was originally published on the South Big Data Hub’s blog.

This summer I was funded by the South Big Data Hub’s DataStart program to intern at a local startup. Through this opportunity, I met the co-executive director, Dr. Lea Shanely, who invited me to attend the Hub’s annual Principal Investigator (PI) meeting if I agreed to blog about it (easy sell!). The goal of the 2017 Joint PI Meeting was to gather PIs who are funded through the NSF’s BIGDATA research program and Big Data Hubs and Spokes programs, along with industry and government invitees, to discuss current research, identify challenges, and examine promising opportunities and future directions of data research and education.

By the numbers

The meeting brought together:

  • More than 150 PIs
  • Representatives of 84 universities
  • 16 students
  • 5 postdocs
  • 9 government agencies
  • 3 chief data officers

Themes

A few themes that really jumped out were:

  • “Analytics in motion”
  • Access to data
  • Diversity
  • Community-based participatory research
  • Collaboration

Analytics in motion

New York City Chief Analytics Officer Amen Ra Mashariki from the Mayor’s Office of Data Analytics (MODA) discussed the necessity of analytics in motion, and the concept was reiterated throughout the remaining panels and discussions. He contrasted analytics in motion with restful analytics: using analytics to anticipate a crisis, as opposed to waiting for a crisis before running analytics. This is a topic I have heard discussed at great length in the statistics community, specifically when discussing what we can learn from data journalists. For example, Andrew Flowers, former writer at FiveThirtyEight, recently said at RStudio’s annual conference that a key principle of data journalism is the deadline; therefore, be fast or be left behind.

I think this is a really nuanced topic that requires much more discussion. In theory, being fast is a great aspiration and I am 100 percent on board with the idea of analytics in motion, but certainly not at the expense of accuracy. Teams at organizations such as MODA and FiveThirtyEight do a great job of striking this balance.

Access to data

There was much discussion about sharing data and finding ways to better incentivize data sharing.

In a Federal Big Data panel moderated by our co-executive director Lea Shanely, Steve Dennis from the Department of Homeland Security stated that the future is not about owning all the data, but about having access to it. Dan Morgan of the U.S. Department of Transportation made a call to put publicly available data to work.

There was also a push to replace “publish or perish” with something more data-sharing oriented.

Diversity

An attendee commented that the meeting comprised a very diverse group of leaders, both in disciplines and backgrounds. In an age where all-white, all-male panels are still very common, this was a breath of fresh air. There were nine panels, each composed of a diverse set of leaders. Diversity in discipline was also refreshing. I sat at lunch with biologists, bioinformaticians, statisticians, and engineers. In a panel of the Hub Executive Directors, Meredith Lee stated that diversity makes us stronger and diversity makes us better - I agree!

Community-based participatory research

Community-based participatory research was my area of interest for my biostatistics master’s degree, so it is very near and dear to my heart. There was a lot of emphasis on community engagement and community buy-in. The structure of the Big Data Hubs - with regional and local hubs and spokes - lends itself well to community engagement. I think this is one of the most valuable aspects of this program. I’d like to briefly highlight one project that is particularly great at this.

Gari Clifford leads a study titled Large-Scale Medical Informatics for Patient Care Coordination and Engagement. His team looks at outcomes, such as cardiovascular disease, using “data in the wild” (I love this phrase 📈🌾) such as personal fitness devices, mobile phone usage, local weather, pollution, or even fast-food restaurant maps. A key aim of this study is to “provide educational outreach and community participation, particularly in minority populations, to design a system which benefits users in both the short term (through employment and education) and the long term (through increased engagement and trust)”1. This translates into having members of the community actively engaged in each stage of the study process, from app design to analysis. Community members are active members of the research team. Additionally, the group teaches coding and app development to members of the community.

Collaboration

Community building and collaboration were huge themes.

Summary

All in all, this was such an awesome opportunity. I believe the future is quite bright for the Big Data movement, and I am very grateful to NSF for funding these amazing programs.