Open Data During Times of Crisis with Nick Hart of the Data Foundation
The Podcast That Helps You Understand Federal Data Privacy and Governance
This week on FEDSpace it’s all about data in a time of crisis. Nick Hart, President of the Data Foundation, joins Active Navigation’s Jesse Rauch, VP of Federal, to discuss the importance of open data – rooted in data privacy – during unprecedented situations.
As US policymakers continue to adapt their strategies in response to the coronavirus pandemic, they must use relevant and reliable data to aid in making decisions. Enter: The COVID Impact Survey. Released by the Data Foundation in collaboration with NORC at the University of Chicago and other partners, the survey collects data from random samples in 18 geographic locations across the US and offers valuable insights into the behaviors and feelings of the American people as the world around us changes.
What insights have the results of the COVID Impact Survey offered so far? How can policymakers best use the data collected? Why is it important to openly share data within the federal government, especially in times like these?
During this episode, you’ll learn about:
• How to prioritize the concept of privacy when collecting and using data
• Why the COVID Impact Survey was launched and how the data can influence pandemic-related decisions
• The advantages of public/private partnerships when sharing data
• What the future holds for open data in the federal government
• What to consider when downloading a coronavirus tracing app
Marissa Wharton: Welcome to the FEDSpace podcast – your source for all things federal data privacy and governance. I’m Marissa Wharton with Active Navigation and I’m joined today by my colleague Jesse Rauch. During this episode, we’ll be talking about the importance of data during a time of crisis. And bringing his expertise to the conversation is our guest, Nick Hart, President of the Data Foundation.
Nick, thanks so much for joining us!
Nick Hart: Thanks for inviting me.
Marissa Wharton: Yeah, we’re excited! And as you can imagine, there’s a lot to unpack here. So, Jesse, I’ll pass it right over to you so we can jump right in.
Jesse Rauch: Thank you, Marissa. It’s great to be here today. And again, Nick, thank you for joining us. So, as Marissa mentioned, you are currently the President of the Data Foundation.
So, as we kick this off, can you tell us a little bit about how you found your way to the Data Foundation and of course, the mission of Data Foundation?
Nick Hart: So, I’ll start by talking about the Data Foundation. It’s a relatively small Washington, DC-based nonprofit that really focuses on encouraging our government to create strategies for better using its own data.
So fundamentally, we believe that good data is necessary for good evidence, and that’s good for society when decision-makers can have reliable information to make decisions. So the things that we focus on include everything from data sharing strategies, data access, data standards, even evidence-based policymaking.
This is kind of a natural fit for my own background. I spent about a decade working in the White House’s Office of Management and Budget on a variety of issues, but one of the themes that was through all of the issues that I worked on at OMB was really how we better use data to make decisions.
I had the opportunity to serve as a senior staffer of the US Commission on Evidence-Based Policymaking back in 2016 and 2017. This was a group that Senator Patty Murray and then-Speaker Paul Ryan had created to basically develop a strategy for better using data. And fortunately, we’ve seen a lot of progress on that commission’s recommendations, but there’s still a lot of work to be done.
And that’s one of the things that we’re really prioritizing in all of the threads of activities we have going on at the Data Foundation is how to both implement those recommendations and consider some next steps, but also just consider the practicalities of where the government is today, as conditions change, as leadership roles change, and even as pandemics come into play and affect our daily way of life and operations.
So there’s a lot of work to be done and we’re really excited to be carrying this banner at the Data Foundation because we think it’s important.
Jesse Rauch: That’s excellent! With that, we, of course, as a vendor, are all about data, analysis and sharing, and one of our big focus areas is data privacy.
And I know your website talks about sharing data in a way that’s useful and insightful while still protecting confidentiality and sensitive information. In that area, can you talk a little bit about how that principle is important to the Data Foundation and how you translate some of those aspects into practice as well?
Nick Hart: Absolutely. You know, privacy is really a principle that cuts through all of our work and should cut through the work of anyone who’s engaged on data issues. It’s not really a barrier to data sharing. It’s a component of ethical use. It’s an element that’s essential for maintaining public trust in not just our government, but even the information that our government is putting out.
And so the Evidence Commission really led with this topic back when it put out its report in 2017, and I think increasingly we’ve seen a focus on this topic across the data community in DC. New Chief Data Officers that were established by the Foundations for Evidence-Based Policymaking Act, for example, are specifically tasked with thinking about privacy in some of the work that they’re doing.
There are a lot of different ways that we prioritize the concept, right? Privacy is a broad construct. You alluded to confidentiality, which is a specific component of privacy. And I think this is really an area where the federal government has led practice for quite a long time. The development of disclosure avoidance techniques, for example, in the federal statistical system is really something that we’ve known how to do for decades. But how we implement those continually has to change as more data is made publicly available.
For example, it makes the job of ensuring confidentiality when we’re making public-use data files and open data all that much more complicated. There’s also a lot of emerging technology and emerging approaches that I think will fundamentally change how we think about implementing this concept going forward.
There’s an idea called secure multi-party computation – we’ve known how to do this from a computer science perspective for decades, but we haven’t really figured out how to implement it across public administration systems. At the same time, secure multi-party computation, or SMC, allows us to do data sharing in a highly privacy-protective manner.
And that’s an example of something that our government should be pushing forward faster and should be doing more pilot projects on, more demonstration projects, to figure out how we take this next step of technology, the next step in the approaches for protecting privacy for the American public and making it real.
I’ve been excited actually, that there are some pilot projects underway in the federal government, including the intelligence community, but also in the federal statistical system using that concept of multi-party computation. But the pilots that we have in place are few and far between. We need to vastly improve the number and the scale of those projects in order to really figure out how we do this going forward.
And multi-party computation is just one example. There are a whole bunch of other approaches that we should be exploring rapidly to ensure that when we’re talking about privacy inside government, or we’re talking about privacy outside of government, that we’re really putting our best foot forward and protecting data.
And again, that’s really essential for people trusting those of us who are using data that we’re doing it ethically.
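To make the idea of secure multi-party computation a little more concrete, here is a minimal sketch of one classic building block, additive secret sharing, used to compute a joint total. It is only an illustration and not a description of any federal pilot project: the three agency counts, the function names, and the choice of modulus are all assumptions made for the example. Each party splits its private number into random shares, so no single share reveals anything about the original value, yet the shares still add up to the correct total.

```python
import secrets

# Modulus for share arithmetic (an illustrative choice, not a standard).
PRIME = 2**61 - 1


def make_shares(value, n_parties):
    """Split `value` into n_parties random shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_values):
    """Compute the total of the parties' values using only shares."""
    n = len(private_values)
    # Each party i splits its value and hands share j to party j.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds up the shares it received; it never sees a raw value.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    # Only the partial sums are published and combined.
    return sum(partial_sums) % PRIME


# Hypothetical counts held privately by three different organizations.
agency_counts = [1200, 450, 3875]
print(secure_sum(agency_counts))  # 5525
```

In this toy version every party learns the combined total and nothing else, provided the parties do not pool their shares. Production systems add protections this sketch leaves out, such as encrypted channels and defenses against parties that deviate from the protocol.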
Jesse Rauch: Excellent – well said.
And speaking of using data, we’ve really been talking about the processes for the government at large, but I understand the Data Foundation has been doing some data work itself.
Specifically, we wanted to talk about the COVID Impact Survey that the Data Foundation ran itself. Can you tell us a little bit about the survey and why the Data Foundation did its own survey?
Nick Hart: Well, we launched the COVID Impact Survey about six weeks ago, and it was a project that really launched by accident.
It started as an idea on social media in fact! Someone who’s now working with us on this project posted a tweet that said, gee, wouldn’t it be nice if someone conducted a large scale random sample survey so we could understand the impacts of what’s happening across the American public facing the current pandemic?
And you know, frankly, government should be doing something like this, and I’m happy to say government has started. But for those of us in the private sector, we recognized we would be able to do it very quickly. And so, within the course of just several weeks, we pulled together a team of some of the leading researchers across the country to help design a survey.
And, in a partnership with NORC at the University of Chicago, we rapidly got this survey in the field. And it fills a data gap. We knew we could do it fast. We knew we could also do a really credible job providing reliable information that could be useful for both decision-makers as well as researchers.
So this whole process was, again, just in the course of weeks. And I think there are a couple of benefits to the strategy that we’ve used that also led us to launch this project. One, we noticed there were a lot of surveys that were starting, including from some of the leading universities in the country, that were based on convenience samples.
That’s one way to rapidly develop and launch a survey, but the challenge is when you receive the data and you’re trying to understand what it all means, there’s some unknown level of bias in convenience sampling that creates a challenge for knowing whether the information’s reliable. So our decision-makers need good information to make decisions.
It’s not clear that a convenience sample’s really going to accomplish that. So we wanted to design something that was based on a random sample. We also recognized that the pandemic has localized effects. Just simply doing a survey at the national level was not going to tell someone in Kansas City or Atlanta who’s trying to work with the mayor’s office to determine how to set a policy that’s relevant for that geography – it doesn’t tell them what to do.
And so again, the COVID Impact Survey really specifically addresses that problem. We weren’t able to survey at every local geography across the country, but we have 18 subnational geographies, so that includes 10 states or cities.
So the survey is underway. We’ve released two waves of data so far, and in addition to creating some summary narratives that we’re publishing, we also released the de-identified microdata. And I’m happy to say we were able to get this information out in the public domain three weeks ahead of a survey that the Census Bureau was able to field and our microdata is four weeks ahead of theirs.
So it was really filling in this gap that existed across our country and just having insights that are relevant for knowing what to do next.
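The concern about convenience samples raised above can be illustrated with a small simulation. This is a sketch under invented assumptions, not an analysis of the COVID Impact Survey: the population, the 30% share of "easy to reach" people, and the outcome rates are all hypothetical numbers chosen to show the mechanism. When the people who are easy to reach differ from everyone else, sampling only from them gives a biased estimate no matter how many responses come in, while a random sample of the same size lands close to the true rate.

```python
import random


def simulate(seed=0, pop_size=100_000, sample_size=2_000):
    """Compare a random sample with a convenience sample on a made-up population."""
    rng = random.Random(seed)

    # Hypothetical population: 30% are easy to reach, and they report the
    # outcome at a 20% rate; everyone else reports it at a 50% rate.
    population = []
    for _ in range(pop_size):
        easy_to_reach = rng.random() < 0.30
        outcome = rng.random() < (0.20 if easy_to_reach else 0.50)
        population.append((easy_to_reach, outcome))

    true_rate = sum(outcome for _, outcome in population) / pop_size

    # Random sample: every person has the same chance of selection.
    random_sample = rng.sample(population, sample_size)
    random_estimate = sum(outcome for _, outcome in random_sample) / sample_size

    # Convenience sample: drawn only from the easy-to-reach group.
    reachable = [person for person in population if person[0]]
    convenience_sample = rng.sample(reachable, sample_size)
    convenience_estimate = (
        sum(outcome for _, outcome in convenience_sample) / sample_size
    )

    return true_rate, random_estimate, convenience_estimate


true_rate, random_estimate, convenience_estimate = simulate()
print(f"true rate:            {true_rate:.3f}")
print(f"random sample:        {random_estimate:.3f}")
print(f"convenience sample:   {convenience_estimate:.3f}")
```

In a real survey the size and direction of the gap are unknown, which is the difficulty described above: the analyst cannot tell how far off the convenience estimate is.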
Jesse Rauch: Excellent. And in those insights, is there anything surprising or insightful that you’ve found so far that you’d like to share with us today?
Nick Hart: So the COVID Impact Survey had results that in many ways surprised us, and in other ways did not. Some of the top-line statistics that we derived were on things like food insecurity and economic insecurity, and just through the course of the media, I think we’ve seen that the effects on the American people are striking. The survey really validates that both perceived and real food insecurity are tremendously high for the American public, and that creates a policy problem that we need to figure out how to solve.
Another area that the COVID Impact Survey focused on was mental health. And so there’s a series of questions about depression and hopelessness, anxiety. And I think we also observed here that there are a lot of concerns about mental health across the American public as the pandemic has really taken its toll on our daily lives.
And that suggests that policymakers potentially need to prioritize the systems and the infrastructure that help the American people respond appropriately to mental health concerns, whether it’s services or simply knowing who to call or talk to.
There are other things that we did find interesting. For example, the increase in communication with family members is something that we’ve documented through the survey. Very striking increases in daily and weekly contact. And you know, it’s not surprising that we’re all able to pick up our phones maybe a little easier now and hop on digital phone calls via various webcasting and meeting platforms, but it is something that’s very, very strikingly higher.
I think the last thing that I’d mention here is just that there’s a lot of behavioral questions that we ask. For example, who’s wearing face masks, who are complying with social distancing practices, who are minimizing contact with at-risk populations?
And across the two waves of data that we’ve released so far – one from April, one from May – in a lot of jurisdictions we started to see declines in some of these behaviors, and that’s perhaps not surprising as some states are beginning to relax some of the requirements that have been put in place. But one of the subpopulations that we were particularly surprised by was young individuals.
Those who are 18 to 22 seem to be complying with some of the guidance that had been in place at a much lower rate. And that suggests that maybe there’s a need for increased education about how to comply with these things and maybe even what behaviors we need to have in place as the economy opens back up.
So there are a lot of interesting findings coming out of the COVID Impact Survey and, you know, hopefully we’ll be able to continue releasing those kinds of findings in the coming weeks as well.
Jesse Rauch: Those are great insights. So, you work with policymakers. Have they been receptive to this information?
And does it impact policy, or can it even impact policy on a timeline that’s fast enough to make a difference?
Nick Hart: Well, I hate to over-attribute causal relationships between our survey and what policymakers are doing, but we are, of course, as you noted, in close contact with folks in Congress across all of the jurisdictions that we are including in the survey.
And, you know, decision-makers need information to make a decision. And I think we’ve provided a really critical resource for them. And there are also some things that I know we’re helping frame future policy solutions for.
For example, one thing that we’ve seen in the survey is that sick people in some cases are still going to work. So those who have symptoms like a fever and who work outside of their homes are still indicating that they’re working. And that’s an issue that policymakers should probably think about as we go into the next steps of minimizing the spread of coronavirus across the country. Messaging around behaviors, as I alluded to a minute ago, you know, very clearly needs to be consistent and something that’s salient for those who are on the receiving end of the message.
So, hopefully policymakers can use this information. I think we’ve seen some evidence that they are, and we’ll try to do as good as we can.
Jesse Rauch: Yeah. Excellent. And, of course, this is a privacy-focused podcast, so how did you maintain privacy over the course of this survey? What are some of the controls that you’re using yourselves?
Nick Hart: Well, this survey is a partnership with NORC at the University of Chicago, and of course, as part of a survey data collection like this, there is an institutional review board approval process. So we’ve done our due diligence in ensuring compliance with the IRB protocols. But importantly, when we’re releasing the data, as one of our goals is to make de-identified public datasets available, we are following best practices in disclosure avoidance.
So, the short version here is that these practices are in full compliance with federal guidance that comes out of what’s called Working Paper 22 from the Federal Committee on Statistical Methodology. So that includes things like cell suppression. These are traditional techniques that we use for disclosure avoidance. And you know, in short, we are specifically doing what we can to protect confidentiality. Everything goes through multiple rounds of review. So again, we’re really trying to follow the best practices here in maintaining confidentiality.
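Cell suppression, one of the techniques mentioned above, can be sketched in a few lines. This is a simplified illustration of primary suppression only, and it is not the procedure used for the COVID Impact Survey: the table of counts, the cell labels, and the threshold of 10 are hypothetical values picked for the example. The idea is that any published cell based on very few respondents is withheld, because a small count makes it easier to work out who those respondents are.

```python
def suppress_small_cells(table, threshold=10, marker=None):
    """Return a copy of `table` with counts below `threshold` replaced by `marker`.

    `table` maps a cell label (any hashable key) to a respondent count.
    The default threshold is an assumption made for this example.
    """
    return {
        cell: (count if count >= threshold else marker)
        for cell, count in table.items()
    }


# Hypothetical respondent counts by age group and reported behavior.
counts = {
    ("18-22", "wears mask"): 4,
    ("18-22", "no mask"): 37,
    ("23-64", "wears mask"): 212,
    ("23-64", "no mask"): 9,
}

published = suppress_small_cells(counts)
for cell, count in published.items():
    print(cell, "suppressed" if count is None else count)
```

A real release would go further. If row or column totals are also published, a single suppressed cell can be recovered by subtraction, so additional cells are usually withheld as well (often called complementary suppression), and the result is reviewed before publication.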
Jesse Rauch: Always critical. So with the focus on open data, is there some additional work that you’re doing with the federal government and policymakers as a whole to bring not only this data set, of course, but all data sets into open arenas with good standards?
Can you tell us a little bit about not only how you’re sharing this data, but data sharing in general, and what the future is going to look like?
Nick Hart: Well, I think it’s hard to predict what the future holds here! But the Evidence Act that passed through Congress almost unanimously back in 2018, signed by the President early in the next year, created some new infrastructure that is really instrumental for how we think about our government responsibly sharing data to generate insights that are relevant, whether it’s for researchers or for policymakers.
That infrastructure is, of course, challenged right now by the pandemic. Many of the regulations and the guidance documents we were expecting to come forward have been, you know, delayed a little bit as a consequence of a variety of issues across the federal government.
Even when that guidance is in place, even when those regulations are in place, if anything, I think the pandemic highlights the value of doing this data sharing. We have so many questions right now about what’s going on across society, what’s happening in health effects, and some level of data sharing would really make a big difference here.
So, there is a part of the Evidence Act that specifically allows for the better use of administrative records within a really strong privacy-protective legal framework. And that requires the Office of Management and Budget to issue a regulation. They could issue that regulation today to enable that authority to exist, and it would rapidly improve the federal statistical system’s capability to use administrative records for generating statistics, generating those key summary insights that are relevant, and doing it all in a privacy-protective way.
Because OMB hasn’t issued that regulation yet, these agencies’ hands are sort of tied – they can’t use that provision of the new law, the new Evidence Act, to really make this happen. But it’s a key aspect of the future of data sharing.
We need that legal framework to be operational and the work that we’re doing at the Data Foundation, I mean, these are the kinds of things that we’re trying to rapidly encourage our government to move forward on. And it’s not just the legal framework.
There are also capabilities that we need to have inside our government. For example, there’s a concept that’s been floated out there for a couple of years. It was really spearheaded by the Evidence Commission, and it’s called the National Secure Data Service. It was envisioned as a secure place to do data linkages and to do this in a temporary way where we can bring together the power of different data sets, potentially collected by different entities, and enable the improved use of that information to generate increasingly valuable insights.
Well, the Data Service doesn’t exist today. Technically the legal authority to do it doesn’t exist today. And this is an area where we know there’s a lot of work to be done, particularly to ensure both transparency and privacy protections that are promised in this infrastructure. So, long way to say we have a lot of work to do.
But I think that the direction is very clear that data sharing and data combination activities will be increasingly common inside the Federal Government infrastructure. And there’s a lot of support for this across the Federal Government – the Chief Data Officers, the federal statistical system – there’s a lot of folks that really want to think about how we implement all of this work, but do it in a responsible way.
We’re not talking about blindly sharing information. This is all for projects and activities that have a clear purpose. And we’re trying to do everything we can, going back to the earlier comment about trust, to protect public trust in this information and in government’s ability to do these activities responsibly.
Jesse Rauch: Absolutely. And I know we’re working with the NIH on some data sharing, but it’s really focused on sharing data across government projects and government health research. I understand you also work a lot around sharing data – not just between government entities – but with the private sector.
Can you talk a little bit about the advantages of, not just sharing data internally for the government, but really helping bring in the industry?
Nick Hart: Well, I worked inside government for quite a long time in my career. And government has excellent people – well-credentialed staff, very intelligent, with expertise that can do a lot of productive work.
But the power of operating in public-private partnerships can’t be lost on anyone. Bringing in researchers from universities or experts from think tanks or even private sector companies has real value in ensuring that we’re applying the absolute best techniques and capabilities we can to generate insights rapidly.
So we no longer live in a world where we can wait three, five years for someone to produce a summary statistic that tells us what the state of the world looks like. The COVID pandemic suggests that we need that information on an almost real-time basis – daily, weekly, maybe even hourly. And we can’t just turn the entire government infrastructure on a dime and expect that it’s able to produce those real-time insights, for a whole number of reasons.
But public-private partnerships can really help us facilitate this in a much more rapid way. And so if one of the goals is to, again, be able to produce insights that are relevant for the American people and decision-makers, then we have to be able to deploy these strategies as effectively as possible.
And, you know, I don’t say this lightly, but I think government can’t do it alone. The world is a complex place. There are increasingly these wicked problems that we need to figure out how to solve. And the more we can apply data, the better strategies that we can use inside government and outside government to actually understand what the data is telling us, the more likely we’ll be to actually solve these problems.
Jesse Rauch: And on those activities, pushing for time is critical. Recently the Federal Data Strategy team actually extended some of the target dates in the 2020 Action Plan. What are your thoughts on that extension? Can we afford to wait?
Nick Hart: Well, we can’t afford to wait, but we also can’t afford to get it wrong. I mean, the Federal Data Strategy is a really novel approach inside our government where we’ve recognized that we can’t keep operating by developing these short-term, quick band-aid fixes to our data problems. We need to develop a long-term strategy, and this is exactly what the Federal Data Strategy was intended to be – this 10-year plan.
We’re making some pretty big changes to how we operate and how we do business. That of course is broken up into one-year action plans. And what you’re alluding to is the change in some of the deadlines on the one-year action plan. I don’t think there’s any reason that that should affect the second year’s action plan.
And my hope is that the work is still proceeding on developing that action plan. We always knew the first year would kind of bleed into the second year. There might be some items that get delayed for one reason or another, but we can still keep making progress. The Chief Data Officers have started their meetings, many have established their governance infrastructures across agencies, and that’s, you know, one of the core things that has to happen as a starting point.
But there are so many other elements of the data strategy that we have to continue making progress on, including some of the pilot projects that were envisioned, like creating the one-stop shop for researchers to know how to access government data.
So, I’ve been excited to see some of the progress that’s already been made and, you know, really hope that OMB and those who are working on the strategy across the Federal Government will keep pushing as fast and as hard as they can because we just can’t afford to let this slip through the cracks.
Jesse Rauch: Excellent. Yeah. Thank you so much for your insight. Before we wrap up, in the Impact Survey, you ask participants about contact tracing apps, and so my team is curious – would you download
Nick Hart: Well, it depends! It depends on, in the consent statement, what they’re going to tell me the data is used for and how they’ll protect the information.
I’m a purist on these topics, and even when I take surveys myself, I’m always interested in knowing how this data will be used and how it’s going to be protected. All that said, I would also be happy to take a test if anyone wanted to give me one of those!
Jesse Rauch: So true! Well, that is a great insight. We of course always appreciate that continued focus on privacy, and it’s nice to know that you appreciate your privacy as we talk about open data as well. With that, thank you so much for joining us today. And Marissa, I’ll pass it back to you.
Marissa Wharton: Thanks Jesse, and thank you again, Nick, so much for joining us.
Nick Hart: Of course – thank you!
Marissa Wharton: I think it was really interesting how the COVID Impact Survey started from a tweet – that was a great story. And again, there were so many great insights and takeaways for our listeners.
Thank you, of course, for listening and tuning in! Leave us a rating or review if you like what you heard, hit the subscribe button and tune in to the next episode of the FEDSpace Podcast – produced by Active Navigation.