AlphaFold2 Part 2: The ion channel challenge

Last month I wrote about the wonders and perils of the artificial intelligence program that predicts 3D protein structures, AlphaFold2. As an ion channel enthusiast, I naturally wanted to know how AlphaFold2 performs at predicting the structures of proteins embedded in cell membranes. When I search PubMed for articles that mention both "AlphaFold" and "ion channel" I only get 34 hits. This surprised me, given the hype and the general paranoia around AI replacing humanity. If we use these search results as a proxy for the state of the ion channel protein structure prediction field, I'd say the juice is still in the coconut.

I wanted to know how well AF2 would do at predicting an ion channel protein structure, so I asked it to generate the structure of Kv2.1, a voltage-gated potassium ion channel that I studied during graduate school. Kv2.1 is a pretty important protein. It regulates neuron firing throughout the brain and body where it helps us learn new stuff, regulate our blood sugar, and contract our muscles, among other things. And we already know a lot about its structure from decades of research using bench techniques like site-directed mutagenesis, electrophysiology, microscopy, human pathophysiology, cryo-EM, etc. So AF2 should know exactly what the structure of this protein looks like, right? The output of AF2 is in Figure 1. 


Figure 1 - Structure of a single human Kv2.1 potassium channel subunit generated by AlphaFold2. Source.

It definitely did get some things right. For example, I can clearly see the six well-known alpha helices that span the entire membrane (S1-S6). I can also see a few smaller helices that sit near the inner and outer faces of the membrane, one in the S4-S5 linker region and one in the pore loop. But overall the returned structure is quite...round? This, among other things, just isn't right.

The most obvious problem with the structure is that it only shows a single subunit. Voltage-gated potassium channels like Kv2.1 don't exist in nature as depicted in Figure 1. The structure in Figure 1 shows one subunit, but four subunits like this have to come together in your cells to make a functional channel. Despite tetramerization being a critical aspect of the structure/function relationship for this protein sequence, this fact is not communicated clearly in the output structure. Instead, it's buried in text in the "Biological Function" section of the AlphaFold2 results page, which is mostly a clone of the content from a different protein sequence/structure database called UniProt. Does this flaw make the output of AF2 patently 'wrong'? Well, technically, no. But any AF2 users who want to use AF2 outputs to generate hypotheses about the biological function of their input sequence should bring a healthy dose of skepticism with them.

Another problem with the structure is that there are floppy loops that appear to sit in the same plane as the transmembrane helices. There's a reason the segments that span the membrane are helical. It's because the ordered helix minimizes exposure of hydrophilic groups, including the polar protein backbone, to the hydrophobic interior of the cell membrane, in turn minimizing the free energy of the structure in its environment. Again, while this is a well-understood concept in structural biology, AF2 doesn't seem to get it yet.
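To make the hydrophobicity argument a bit more concrete, here's a minimal Python sketch of a classic pre-AI trick: sliding a window along a sequence and averaging Kyte-Doolittle hydropathy values, where stretches with high averages are the ones that look comfortable inside a membrane. To be clear, this is my own illustration and not something AF2 does internally. The function name, the 19-residue default window, and the toy sequence in the usage lines are assumptions I made for the example, and the toy sequence is made up (it is not Kv2.1).

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic, negative = hydrophilic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    `window_hydropathy` is a hypothetical helper for this post, not part of
    any AlphaFold tool. Unknown residue letters raise a KeyError.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


if __name__ == "__main__":
    # Made-up toy sequence: charged ends flanking a greasy middle stretch.
    toy = "DKDEKR" + "LIVLLAVIFLLVIALLVIA" + "KRDEDK"
    profile = window_hydropathy(toy)
    best = max(range(len(profile)), key=profile.__getitem__)
    print(f"most hydrophobic window starts at residue {best + 1}")
```

A floppy loop full of charged residues would score strongly negative here, which is one simple way to see why it has no business sitting in the middle of the membrane.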

Yet another problem, which I alluded to above, is the overall round-ish shape of the output structure. I suspect this is an artifact that comes from AF2's training data set. AF2 was trained using the world's largest existing bank of high-quality protein structures, the Protein Data Bank (PDB). Well, the easiest proteins to purify at the lab bench are the round-ish ones, so naturally those are the ones whose structures have ended up in the PDB over the years. It's no wonder that AF2 came out of its training with a strong impression that all proteins should be kinda round.

To the AlphaFold2 creators' credit, the output structure is color-coded based on how confident the AI is in its prediction. Dark blue means "hell yeah, this is definitely right" while yellow and orange are "be skeptical" and "big nope!", respectively. Accordingly, the highly ordered helices are blue while the floppy disordered parts of the sequence are yellow and orange. Also, there is a large button soliciting feedback near the top of the AlphaFold2 results page to make it easy for experts to share their opinion on what AlphaFold2 is doing well and what it isn't. And yes, I did click the button and provide the feedback I've shared here, so I've done my duty.
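If you'd rather read those confidence colors as numbers, here's a minimal Python sketch. It leans on two things: AlphaFold reports a per-residue confidence score called pLDDT (0 to 100), and the downloadable PDB files store that score in the B-factor column. The band cutoffs (90, 70, 50) follow the legend on the AlphaFold database pages, which also has a light blue band between dark blue and yellow. How I treat values that land exactly on a cutoff is my assumption, and the function names are made up for this post. The parser also assumes a single-chain file, like the one behind Figure 1.

```python
def plddt_band(score):
    """Map a pLDDT score (0-100) to a confidence band and display color.

    Hypothetical helper; exact handling of boundary values is an assumption.
    """
    if score > 90:
        return "very high (dark blue)"
    if score > 70:
        return "confident (light blue)"
    if score > 50:
        return "low (yellow)"
    return "very low (orange)"


def per_residue_plddt(pdb_text):
    """Return {residue number: pLDDT} using the alpha-carbon of each residue.

    Relies on the fixed-column PDB layout: atom name in columns 13-16,
    residue number in columns 23-26, B-factor (pLDDT here) in columns 61-66.
    Assumes one chain, so residue numbers are unique.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residue_number = int(line[22:26])
            scores[residue_number] = float(line[60:66])
    return scores


def summarize(pdb_text):
    """Count how many residues fall into each confidence band."""
    counts = {}
    for score in per_residue_plddt(pdb_text).values():
        band = plddt_band(score)
        counts[band] = counts.get(band, 0) + 1
    return counts
```

Calling `summarize(open("my_model.pdb").read())` on a downloaded model (the file name is a placeholder) gives a quick tally of how much of the structure falls in the blue bands versus the yellow and orange ones.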

I'm not the first one to notice these quirks. This assessment of AF2 performance for transmembrane proteins from 2023 highlighted some of my exact points above. The good news is that the scientific community has already created "add-ons" to AF2 that let users tell it to consider multiple subunits (AlphaFold-Multimer) or to treat the input sequence as a transmembrane protein (TmAlphaFold). But these tools will only be so useful given the current AF2 training data set. Don't get me wrong, I think AF2 is a fantastic step forward in the field of structural biology, but it would be folly to think this AI could replace the experimentation scientists are carrying out all over the world. You know, the 3D, beautiful, messy, real world where new learning gets done.