AI demystifies proteins, the building blocks of life : Short Wave

As artificial intelligence seeps into some realms of society, it rushes into others. One area where it's making a big difference is protein science, as in the "building blocks of life," proteins! Producer Berly McCoy talks to host Emily Kwong about the newest advance in protein science: AlphaFold 3, an AI program from Google DeepMind. Plus, they talk about the wider field of AI protein science and why researchers hope it will solve a range of problems, from disease to the climate.

Have other aspects of AI you want us to cover? Email us at shortwave@npr.org.

AI gets scientists one step closer to mapping the organized chaos in our cells

EMILY KWONG, HOST:

You're listening to SHORT WAVE from NPR.

Hey, hey, SHORT WAVErs - Emily Kwong here with producer Berly McCoy. What's up, Berly?

BERLY MCCOY, BYLINE: Hey, Emily.

MCCOY: OK, so, Emily, today I want to dig into how AI has shaken up the field of protein science, as in the fundamental building blocks of life, proteins.

KWONG: I've heard of them. Yeah. I mean, this is like what you studied back in your scientist days.

MCCOY: Yes, yes. I love proteins.

KWONG: Oh, we love that you love them. How has AI moved the needle in this field, though?

MCCOY: Well, scientists have used it to dig into a problem that protein scientists have struggled with for more than 60 years. And that is, what do these building blocks, of which there are millions, look like?

KWONG: Like their shape.

MCCOY: Like their shape. Yeah, exactly.

KWONG: Oh. And why is that so important?

MCCOY: Well, the ability of a protein to do its specific job - so, like, carry oxygen through your body or turn light into sugar. That relies wholly on its unique, complicated shape. So to understand how it works, you need to know its shape.

KWONG: But why can't scientists just run an experiment to determine the shape?

MCCOY: They can for some proteins, but those experiments can take years and years. And, Emily, that's because a scientist essentially needs to take the equivalent of a molecular photo of the protein to map its complicated shape. But getting the protein to cooperate to get that photo - so, like, to hold still, for example, without falling apart - that can be super-tricky. And it could take a grad student's entire Ph.D. program to figure out a single protein. And other proteins were just abandoned because they would never cooperate.

KWONG: Proteins sound difficult, honestly. So the challenge is how do you figure out a protein's shape without running these super-tedious experiments? Is this where AI comes in?

MCCOY: Yeah. And to give you a sense of how AI has changed the protein game, there's this protein competition that scientists run every other year.

KWONG: Get out - a protein competition. OK.

MCCOY: Yeah, and they've run it for the past 30 years where groups will basically compete on who can accurately guess the most protein shapes. It's, like, nerd central, for sure.

KWONG: We love.

MCCOY: And for most of that 30-year history, participants have really only made incremental progress. But in 2020, Google DeepMind used AlphaFold 2. That's its AI protein prediction model. And, Emily, AlphaFold 2 blew the other competition out of the water completely.

KWONG: Wow. OK.

MCCOY: Game-changer. Now the Google DeepMind team has taken this AI tool to the next level by expanding it beyond proteins.

KWONG: So today on the show, how scientists have taken a huge step to understanding the building blocks of life using AI.

MCCOY: Plus, how other researchers are using the tech to design brand-new proteins, ones never before seen in nature.

KWONG: And how AI could help us solve the biggest problems we face today, from disease to climate. You are listening to SHORT WAVE, the science podcast from NPR.

(SOUNDBITE OF EXMOOR EMPEROR'S "FASTER SPEEDS")

KWONG: OK, Berly. So scientists, it seems, have been trying to figure out the complicated shapes of proteins for decades to better understand how they work. Why has this been such a complicated thing to figure out?

MCCOY: Well, the short answer, Emily, is that there are so many theoretical ways a single protein could fold that it's a big problem to solve. So if you unfolded a protein, it would look like a bunch of beads on a long string. Those beads are little molecules called amino acids.

KWONG: Oh, I remember this from biology. There are, like, 20 types...

MCCOY: Yeah.

KWONG: ...Of amino acids.

MCCOY: Yep.

KWONG: Each one is a little different.

MCCOY: Right. So each one has a slightly different shape, and that kind of dictates how that part of the string can be folded up. Because proteins often have a hundred or more amino acids, you can see how imagining all the ways it could fold would get complicated.

KWONG: Yeah. It just sounds like thousands of different shapes or - what? - hundreds of thousands of different shapes.

MCCOY: OK, try billions of trillions, Emily. Like, there are theoretically more ways for one single protein to fold than there are stars in our night sky.
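To give a rough sense of the scale described here, the count can be sketched with back-of-the-envelope arithmetic. The figure of 3 possible local shapes per amino acid is an assumption made purely for illustration (a common textbook simplification), not a number from the episode, and `count_conformations` is a hypothetical helper name.

```python
def count_conformations(num_residues: int, states_per_residue: int = 3) -> int:
    """Count theoretical folds if every amino acid independently picks
    one of `states_per_residue` local shapes (an illustrative assumption)."""
    if num_residues < 0 or states_per_residue < 1:
        raise ValueError("need num_residues >= 0 and states_per_residue >= 1")
    # Each residue multiplies the number of possibilities by the same factor.
    return states_per_residue ** num_residues


# A protein with "a hundred or more" amino acids, as mentioned in the episode.
total = count_conformations(100)

# "Billions of trillions" is roughly 10**9 * 10**12 = 10**21.
billions_of_trillions = 10**9 * 10**12

print(f"Theoretical conformations: {total:.2e}")
print(f"Exceeds billions of trillions: {total > billions_of_trillions}")
```

Even under this deliberately small assumption, the total for a 100-amino-acid chain is far beyond "billions of trillions," which is why checking every possible fold one by one is not practical.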

KWONG: This sounds like a glorious nightmare.

MCCOY: Right?

KWONG: I'm so curious. OK. So you said that AI has helped us make some leaps and bounds towards a solution. How does this technology work?

MCCOY: So this AlphaFold model is a type of AI called a deep learning program, which is this huge network of data processing points called nodes. And the purpose of this network is to learn and then make predictions based on what it's learned. In AlphaFold's case and other models like it, it learns about proteins from a huge collection of protein structures that scientists have been building on for decades from their experimental data.

KWONG: OK. So the idea is that after these models use all of that carefully gathered experimental data to learn, they can then predict the shapes of proteins they do not know yet.

MCCOY: Exactly.

KWONG: OK. And going back to the protein competition in 2020, how did AlphaFold blow away the competition?

MCCOY: So they essentially changed the whole architecture of their model. They had been using AI before, but remember the beads on a string analogy? If amino acids are the beads, even if one bead is far from another on the string, when it all folds up, they could be right next to each other. So with AlphaFold 2, the model looked at distances between all the different amino acids and previous knowledge from solved protein structures.
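The beads-on-a-string point can be illustrated with a small sketch: given 3D positions for each bead, compute the distance between every pair and list the beads that are far apart along the string but close together in space. The coordinates, cutoff values, and function names below are hypothetical, and this is not AlphaFold 2's actual architecture, only an illustration of what "distances between all the different amino acids" means.

```python
import math

# Hypothetical 3D positions (arbitrary units), one per amino acid "bead",
# for a short chain that folds back on itself like a hairpin.
coords = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (2.0, 1.0, 0.0),
    (1.0, 1.0, 0.0),
    (0.0, 1.0, 0.0),
]


def pairwise_distances(points):
    """Return an n x n table of straight-line distances between all beads."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def close_but_far_in_sequence(points, max_distance=1.5, min_separation=3):
    """List bead pairs at least `min_separation` apart on the string
    that end up within `max_distance` of each other once folded."""
    dist = pairwise_distances(points)
    n = len(points)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if dist[i][j] <= max_distance
    ]


contacts = close_but_far_in_sequence(coords)
print(contacts)
```

In this toy chain, the first and last beads are five steps apart on the string but sit right next to each other after folding, which is the situation McCoy describes.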

KWONG: Awesome.

MCCOY: And the accuracy and speed of the predictions went way up.

KWONG: OK. And I'm assuming that made a huge difference for scientists everywhere studying proteins.

MCCOY: Totally. Julien Bergeron, a structural biologist at King's College London, is one of them. He studies the tail-like appendage that propels bacteria. So it's called a flagellum and it's pretty complicated.

JULIEN BERGERON: It's this huge assembly. So it's longer than the bacterial cell itself. It consists of 20 to 25 different proteins, but many of them have hundreds of thousands of copies of that protein.

MCCOY: And these huge propeller machines are what gives some bacteria the ability to make you sick or build plaque on your teeth. So Julien's lab is trying to figure out how these giant machines work, what their pieces look like and how it all fits together. And so when the AlphaFold 2 model came out, he just had to try it.

BERGERON: And I input a sequence, and then a few hours later, I had the model. And I was like, oh, my God. This just did it, and we'd been struggling with that problem for, you know, months if not years. And all of a sudden, I messaged my lab, and I said, remodel everything (laughter). And we've had dozens of projects that immediately progressed thanks to this.

KWONG: OK. So it sounds like overnight, AlphaFold changed the trajectory of his lab.

MCCOY: Yeah.

KWONG: But how do you know that using AlphaFold 2 would actually work?

MCCOY: Yeah. So the accuracy is super-important - right? - especially when you're basing all of your other experiments on the results. And it's important to note that, like other AI, AlphaFold 2 isn't right a hundred percent of the time. So you can't just take the results at face value. But unlike some other AI, included in the results is a score basically telling you how accurate each part of the structure is.
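A minimal sketch of how a researcher might use such a per-part score: AlphaFold reports a confidence value for each amino acid on a 0 to 100 scale (called pLDDT), and stretches that fall below a chosen cutoff can be flagged for caution. The scores below are made up, the cutoff of 70 is an illustrative choice, and `low_confidence_regions` is a hypothetical helper, not part of any AlphaFold tool.

```python
def low_confidence_regions(scores, threshold=70.0):
    """Return (start, end) index ranges, inclusive, where the per-residue
    confidence score stays below `threshold`."""
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score < threshold:
            if start is None:
                start = i  # a low-confidence stretch begins here
        elif start is not None:
            regions.append((start, i - 1))  # the stretch ended at the previous residue
            start = None
    if start is not None:
        regions.append((start, len(scores) - 1))  # stretch runs to the end of the chain
    return regions


# Made-up confidence scores for an 8-residue toy prediction.
example_scores = [92.1, 88.4, 65.0, 48.3, 51.7, 90.2, 95.5, 60.1]

flagged = low_confidence_regions(example_scores)
print(flagged)
```

Regions flagged this way are the parts of a predicted structure that should not be taken at face value before follow-up experiments are built on them.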

KWONG: OK. And are others in the field using AlphaFold 2?

MCCOY: Yeah. So this is something that actually sets AlphaFold apart from other protein prediction AI models. It's extremely user-friendly. So essentially, anyone who works on a protein or even just has a sequence of a protein can plug it in and get results. I talked to Pushmeet Kohli, vice president of research at Google DeepMind, and he told me why it was important for them to make this tool open-access.

PUSHMEET KOHLI: The mission statement that we have for the science program at Google DeepMind is to leverage AI to accelerate and advance science.

KWONG: OK, so I'm scrolling through the AlphaFold website, and I'm seeing scientists using this model for all kinds of things. They're working on malaria and cancer research, drug discovery, plastic-eating enzymes.

MCCOY: And last week DeepMind released a new version, AlphaFold 3, which can predict the 3D structure of proteins and other kinds of biomolecules that they attach to.

KWONG: Why are those other biomolecules important?

MCCOY: Yes. So I know we talked about how much proteins are super-important. I love them. But I have to admit they rarely work alone. And if we actually want to know how biology works as a whole, we need to understand how proteins work with their partner molecules.

KOHLI: So it really gives you a detailed and more accurate picture of what is happening inside the body, where proteins are just - not just existing in isolation. They are interacting in a very rich biological space or soup of RNA and DNA and small molecules. And it really sheds light into those rich interactions.

MCCOY: Now, previous versions of these protein prediction softwares would model where each amino acid was located. But in this new version, AlphaFold 3, it maps things on an even smaller level. So it models where individual atoms are.

KWONG: Wow.

MCCOY: So they can predict the structure of multi-protein complexes like the bacterial flagellum or something like proteins in the blood, which attach to iron atoms.

KWONG: That is powerful. OK. What are the limits to AlphaFold predictions?

MCCOY: Yeah, there are definitely limitations. Pushmeet says that the model works best when a protein has a single defined structure. But some proteins have more than one shape, or they have sections that are kind of flimsy. Think cooked versus uncooked spaghetti.

KWONG: OK. So the model has, sounds like, some trouble with prediction in some cases, and the results show that.

MCCOY: Yeah. The idea is that these results would say, hey. I'm not so confident in this area of the protein, just so, like, users know. Another limitation is that the prediction ability depends on the amount of what's called training data available. So I mentioned that there's a lot of training data for proteins. But...

KOHLI: Some categories have much less training data available. For example, there's much less structural data available for RNA.

KWONG: OK, so the prediction is only as good as the data.

MCCOY: Exactly, exactly. But, Emily...

KWONG: But, Berly...

MCCOY: There's another way scientists can use AI in the protein world.

KWONG: OK, what's that?

MCCOY: To generate brand-new proteins, ones, like, not found in nature anywhere.

DAVID BAKER: Humans face new problems today, and, you know, we live longer. We're polluting and heating up the planet. And it's reasonable to think that if, with another million - more millions of years of evolution - that some of these problems would be solved. But we don't want to wait that long. So the idea is that we can now create completely new proteins that solve these problems that weren't really relevant during evolution to make the world a better place.

MCCOY: So this is David Baker. He's a biochemist and the director of the Institute for Protein Design at the University of Washington. And he's been working on proteins for years. He actually developed one of the earlier protein prediction models. His lab has a similar AI program to AlphaFold 3. It's called RoseTTAFold All-Atom. But his big focus is designing these brand-new proteins.

KWONG: This sounds so futuristic.

MCCOY: Right?

KWONG: Like, what kind of new proteins?

MCCOY: So far, they've done things like design new protein antibodies, which are important for fighting infections - in this case, to fight influenza. They've made something called a switch protein that could be used as an environmental sensor. And they've also made proteins that could help store carbon, which is a huge hurdle for fighting climate change.

BAKER: I think really across, you know, medicine, sustainability, technology, I think there's huge opportunities to transform the current ways we do things with protein design.

MCCOY: So these predictive and generative AI models have fundamentally changed the protein science landscape. And, again, there's definitely room for improving the prediction power. But with what the field has shifted to, like, in terms of prediction accuracy and design potential, I mean, it's really gotten this retired protein fanatic, like, missing my science days.

(SOUNDBITE OF MUSIC)

KWONG: Berly, thank you so much for bringing us this big story about the little things in life.

MCCOY: Thanks, Emily.

(SOUNDBITE OF MUSIC)

KWONG: This episode was produced by Rachel Carlson. It was edited by our showrunner, Rebecca Ramirez. Berly checked the facts. Ko Takasugi-Czernowin was the audio engineer. Special thanks to Geoff Brumfiel. Beth Donovan is our senior director, and Colin Campbell is our senior vice president of podcasting strategy. I'm Emily Kwong. Thank you for listening to SHORT WAVE from NPR.

(SOUNDBITE OF MUSIC)

Copyright © 2024 NPR. All rights reserved. Visit our website terms of use and permissions pages at www.npr.org for further information.

NPR transcripts are created on a rush deadline by an NPR contractor. This text may not be in its final form and may be updated or revised in the future. Accuracy and availability may vary. The authoritative record of NPR’s programming is the audio record.