From the course: The Data Science of Healthcare, Medicine, and Public Health

Data science and COVID-19

- [Instructor] Now, one of the challenges of talking about data science and how it connects to healthcare, medicine, and public health at this point in time is that you simply can't do it without prominently addressing, right up front, COVID-19 and the global pandemic that resulted. The reason for that is, as everybody who has lived through it knows, and that's everybody, COVID has impacted pretty much everything everywhere and, as it turns out, data science is no exception.

There are a few places where data science was really important and very public. One place, for instance, was in tracking COVID. It turns out that tracking diseases is difficult, especially diseases that have a long incubation time. And so there was a lot of work that went into tracking COVID, as well as mapping the prevalence of the disease across places and across time, and so people saw maps like this on a regular basis. There was also the extraordinary work that went into developing vaccines in record time, producing them, distributing them, and trying to persuade people to take them. Data science has been involved in a substantial part of all of these.

But these are really the most obvious, most public ways. There are a few other ways that data science and COVID have interacted. One, for instance, is data in the media. People got used to looking at charts, especially logarithmic growth charts, which under normal circumstances are confusing. But when you're looking at something that grows exponentially, like an infectious disease, a logarithmic scale lets people compare growth rates across time and from one place to another. Also, people got more accustomed to looking at rates: not just the total number of infections or deaths in a place, but the infections or deaths per 100,000 residents, a way of making more valid comparisons. Also, people became more aware of the tracking that is done passively on them.
It's been going on for years and they may have had some idea of it, but because tracking became so important to monitoring the disease and warning people when they've been exposed, people became more keenly, explicitly aware of tracking in cell phones, in social media, and in other forms of technology. And then finally there was the development of COVID-related apps, things that allowed you to do some kind of self-diagnosis, find out where you could get tested, qualify for tests, get your test results, do contact tracing, carry your vaccine passport, and so on. These are all different ways that data science contributed in one way or another to the global response to COVID-19.

There are a few other ways that have been unexpected. One is that a lot of data scientists who work, for instance, for private companies have had to start acting like in-house epidemiologists, not so much trying to find where the disease came from, but trying to predict the disease patterns so they know how it affects their company. Say, for instance, you're in transportation: it's going to have a huge impact. Say you're in manufacturing: you're concerned about the supply chain. Say you're in restaurants or hospitality. You are going to have people in your company developing their own models of how COVID is affecting your business and how you can best react to it.

Next is disrupted analyses. One of the most interesting results is that, because the pandemic created a data environment so different from anything else in recent history, predictive models that were developed before the pandemic using a whole host of machine learning algorithms were no longer valid. They couldn't necessarily adapt to the new circumstances. And so a number of organizations were taking all of their predictive work, putting it off to the side, and, in fact, just going back to descriptive models: just tell us what we're dealing with right now.
And instead of trying to project five years into the future, let's try to project five days or even five hours. It changed the way that analyses were conducted during the pandemic and the way people are preparing them afterwards, because they understand that there can be these huge disruptors within the data and the modeling. It requires you to have a different method.

And actually that leads to the idea that a number of companies now are involved in what's called disaster modeling. The idea is that we may have another pandemic, or another global or local emergency, and people are understanding that they need to try to be prepared for these future disruptions. Some companies are already very good at that, developing a plan B for their supply chain, but this is something that more people are becoming aware of, especially when it comes to making long-term projections, which is one of the important goals of data science work.

A few other places where the pandemic has affected data science include things like understanding political participation. COVID affected people's ability to participate in public rallies or maybe even protests, but it didn't affect all regions or all groups equally. Some places had higher infection rates. Some places had groups that were less hesitant to go out, even when there was an infection. And that makes it harder to estimate support for candidates or positions. Also, it created a huge surge in voting by mail and absentee ballots, and that dramatically affects the way that political predictions are made.

There's also an issue of crime and causality. Estimating crime is hard, and understanding the causes of changes in crime is a difficult task under the best of circumstances. But with COVID, there were so many confounding changes, other things that could be explanations for any observed changes in crime rates.
For instance, there were lockdowns, which made it very difficult for people to move around and left a lot of people at home. A lot of people became unemployed, and there were very serious financial problems. A lot of people had children at home all day, and schools were empty. There were so many different things that affected the potential exposure to crime and the motivations for crime. And it's hard to know whether COVID itself caused the changes in crime or whether some of these other changes did. And it turns out that no matter what kind of analyses you're doing, teasing causality out of observational data is always difficult.

And then there's something as simple as school attendance. The idea normally is that you just count the number of heads in class. But schools basically everywhere went online, or people were learning at home, or schools went to hybrid arrangements that some people could do and some people didn't have to. That made even counting attendance difficult. Attendance affects things like, obviously, learning and students' ability to take standardized tests, a lot of which got abandoned, and it also affects local politics. And all of that was affected by the massive shifts of the pandemic. And so some of these data questions, even something as simple as counting how many students are in fact attending school or learning from school, became much more complicated because of the pandemic.

Now, I'm going to specifically focus on the health-related elements of data science and how it relates to things like the pandemic and other diseases and other health issues. But I wanted to address that major issue, the elephant in the room, the pandemic, before we move on to any of the rest of this, to give you an idea of the context in which the recent developments in data science and health have taken place.
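
Two of the ideas mentioned earlier, rates per 100,000 residents and logarithmic growth charts, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the place names, case counts, populations, and the 10% daily growth rate are all made up, and the function names are hypothetical rather than taken from the course. It shows that a place with fewer total cases can have a higher rate, and that exponential growth becomes a straight line (equal steps) on a log scale, which is what makes growth rates easy to compare.

```python
import math


def rate_per_100k(count, population):
    """Events per 100,000 residents, so places of different size can be compared."""
    return count * 100_000 / population


def exponential_series(start, daily_growth, days):
    """Hypothetical case counts growing by a fixed fraction each day."""
    return [start * (1 + daily_growth) ** day for day in range(days)]


def log10_steps(series):
    """Day-to-day differences on a log10 scale (constant for exponential growth)."""
    logs = [math.log10(value) for value in series]
    return [later - earlier for earlier, later in zip(logs, logs[1:])]


def doubling_time(daily_growth):
    """Days needed for counts to double at a constant daily growth rate."""
    return math.log(2) / math.log(1 + daily_growth)


# Made-up numbers, for illustration only.
places = {
    "Big City": {"cases": 5_000, "population": 2_000_000},
    "Small Town": {"cases": 500, "population": 50_000},
}

rates = {
    name: rate_per_100k(data["cases"], data["population"])
    for name, data in places.items()
}
for name, rate in rates.items():
    print(f"{name}: {places[name]['cases']} cases, {rate:.0f} per 100,000")

# Assumed 10% daily growth starting from 100 cases.
series = exponential_series(start=100, daily_growth=0.10, days=8)
steps = log10_steps(series)
print("log10 steps:", [round(step, 4) for step in steps])
print(f"doubling time: {doubling_time(0.10):.1f} days")
```

In this made-up example the small town has a tenth of the city's cases but a higher rate per 100,000, which is why rates make for more valid comparisons than raw totals. The equal log-scale steps are what appear as a straight line on a logarithmic chart; a steeper line means faster growth.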
