Last month I wrote about the wonders and perils of AlphaFold2 (AF2), the artificial intelligence program that predicts 3D protein structures. As an ion channel enthusiast, I naturally wanted to know how AlphaFold2 performs at predicting the structures of proteins embedded in cell membranes. When I searched PubMed for articles that mention both "AlphaFold" and "ion channel", I got only 34 hits. This surprised me, given the hype and the general paranoia around AI replacing humanity. If we use these search results as a proxy for the state of the ion channel protein structure prediction field, I'd say the juice is still in the coconut.
To find out how well AF2 would do at predicting an ion channel protein structure, I asked it to generate the structure of Kv2.1, a voltage-gated potassium ion channel that I studied during graduate school. Kv2.1 is a pretty important protein. It regulates the electrical activity of cells throughout the brain and body, where it helps us learn new stuff, regulate our blood sugar, and contract our muscles, among other things. And we already know a lot about its structure from decades of research using techniques like site-directed mutagenesis, electrophysiology, microscopy, and cryo-EM, as well as from studies of human pathophysiology. So AF2 should know exactly what the structure of this protein looks like, right? Figure 1 shows the AF2 output.
It definitely did get some things right. For example, I can clearly see the six well-known alpha helices that span the entire membrane (S1-S6). I can also see a few smaller helices near the inner and outer surfaces of the membrane, one in the S4-S5 linker region and one in the pore loop. But overall the returned structure is quite...round? This, among other things, just isn't right.
The most obvious problem with the structure is that it only shows a single subunit. Voltage-gated potassium channels like Kv2.1 don't exist in nature as depicted in Figure 1: four subunits like this one have to come together in your cells to make a functional channel. Although tetramerization is a critical aspect of the structure/function relationship for this protein, the output structure doesn't communicate it clearly. Instead, it's buried in the text of the "Biological Function" section of the AlphaFold2 results page, which is mostly copied from a different protein database called UniProt. Does this flaw make the output of AF2 patently 'wrong'? Well, technically, no. But AF2 users who want to use its outputs to generate hypotheses about the biological function of their input sequence should bring a healthy dose of skepticism with them.
Another problem with the structure is that some floppy loops appear to sit within the plane of the membrane, right alongside the transmembrane helices. There's a reason the segments that span the membrane are helical. In an ordered helix, the polar groups of the protein backbone hydrogen-bond with each other, which minimizes exposure of hydrophilic groups to the hydrophobic interior of the cell membrane and in turn minimizes the free energy of the structure in its environment. A floppy loop can't do that. Again, while this is a well-understood concept in structural biology, AF2 doesn't seem to get it yet.
Yet another problem, which I alluded to above, is the overall round-ish shape of the output structure. I suspect this is an artifact of AF2's training data. AF2 was trained using the world's largest existing bank of high-quality protein structures, the Protein Data Bank (PDB). Well, the easiest proteins to purify at the lab bench are the soluble, round-ish ones, so naturally those are the ones whose structures have ended up in the PDB over the years. It's no wonder that AF2 came out of its training with a strong impression that all proteins should be kinda round.
To the AlphaFold2 creators' credit, the output structure is color-coded based on how confident the AI is in each part of its prediction. Dark blue means "hell yeah, this is definitely right" while yellow and orange are "be skeptical" and "big nope!", respectively. Accordingly, the highly ordered helices are blue while the floppy disordered parts of the sequence are yellow and orange. Also, there is a large button soliciting feedback near the top of the AlphaFold2 results page to make it easy for experts to share their opinion on what AlphaFold2 is doing well and what it isn't. And yes, I did click the button and provide the feedback I've shared here, so I've done my duty.
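For the curious: in the AlphaFold Protein Structure Database, that color scale reflects a per-residue confidence score called pLDDT, which runs from 0 to 100. Roughly, above 90 is "very high" (dark blue), 70 to 90 is "confident" (light blue), 50 to 70 is "low" (yellow), and below 50 is "very low" (orange). Here's a minimal Python sketch of my own (not part of any AlphaFold tooling) that tallies how many residues of a model land in each band. It assumes a PDB-format model file in which pLDDT is stored in the B-factor column, as in the database's downloads; the function names and the handling of exact boundary values (90, 70, 50) are my own assumptions.

```python
from collections import Counter

# pLDDT bands, highest first: (lower bound, label). Colors follow the
# AlphaFold DB scheme; treating boundary values as the higher band is an assumption.
BANDS = [
    (90.0, "very high"),  # dark blue
    (70.0, "confident"),  # light blue
    (50.0, "low"),        # yellow
    (0.0, "very low"),    # orange
]


def band_for(plddt):
    """Return the confidence label for one pLDDT value (0-100)."""
    for lower_bound, label in BANDS:
        if plddt >= lower_bound:
            return label
    return BANDS[-1][1]  # anything below 0 is lumped into "very low"


def residue_plddt(pdb_lines):
    """Map (chain, residue number) -> pLDDT, read from each residue's CA atom.

    Assumes fixed-column PDB format with pLDDT in the B-factor field
    (columns 61-66), as in AlphaFold DB model files.
    """
    scores = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resnum = int(line[22:26])
            scores[(chain, resnum)] = float(line[60:66])
    return scores


def summarize(pdb_lines):
    """Count how many residues fall into each confidence band."""
    return Counter(band_for(score) for score in residue_plddt(pdb_lines).values())
```

Usage would look something like `with open("model.pdb") as f: print(summarize(f))`, where `model.pdb` is a hypothetical file name for a downloaded model. I read only the alpha carbon of each residue because every atom in a residue carries the same pLDDT in these files, so this counts each residue once.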