Advances in computational biology celebrated by the Nobel Prize for Chemistry 

How did they select the winners for this year’s Nobel Prize for Chemistry? Photo credit: National Cancer Institute via Unsplash


This year, the Nobel Prize for Chemistry celebrated the significant contribution computational biology has made to advancing our understanding of fundamental biochemical principles. The prize was co-awarded to Sir Demis Hassabis and Dr John Jumper, both based at Google DeepMind, for protein structure prediction, and to Professor David Baker for computational protein design. This work is crucial for our understanding of naturally occurring proteins, as well as for the design of novel ones.

Much of life is powered by proteins. Although all are built from combinations of the same 20 amino acids, they achieve a huge diversity of functions, from speeding up reactions in cells, to regulating the transport of substances, to decoding DNA sequences, a list that barely scratches the surface. Working out the rules by which a given amino acid sequence adopts a single three-dimensional structure has been an aim of scientists for many years, yet the complexity of the problem has made progress slow.

The number of possible interactions and individual amino acid positions is enormous: in a 1969 thought experiment, Cyrus Levinthal estimated that a single sequence of 100 amino acids could adopt 10⁴⁷ potential structures. Since many proteins are longer than 100 amino acids, the problem becomes exponentially more complicated. Thankfully, proteins do not simply sample all possible conformations at random. Instead, folding follows a distinct series of steps involving the formation of local interactions and the rearrangement of larger structural elements, which in turn interact with each other to produce the overall fold.
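
To get a feel for the scale of Levinthal's estimate, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the sampling rate of 10¹³ conformations per second, and the assumption that the count grows by the same factor for every added amino acid, are ours rather than figures from Levinthal or from this article.

```python
import math

# Figure quoted in the article: 10^47 structures for a 100-amino-acid chain.
LOG10_CONFORMATIONS_PER_100 = 47
# ASSUMPTION (illustrative only): the chain tries 10^13 conformations per second.
SAMPLES_PER_SECOND = 1e13
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_conformations(n_residues):
    """log10 of the number of structures, assuming constant growth per residue."""
    return LOG10_CONFORMATIONS_PER_100 * n_residues / 100


def log10_search_years(n_residues, samples_per_second=SAMPLES_PER_SECOND):
    """log10 of the years needed to try every structure exactly once."""
    return (
        log10_conformations(n_residues)
        - math.log10(samples_per_second)
        - math.log10(SECONDS_PER_YEAR)
    )


for n in (100, 200, 300):
    print(f"{n} amino acids: about 10^{log10_search_years(n):.1f} years")
```

Under these assumed numbers, a 100-amino-acid chain would need roughly 3 × 10²⁶ years to try every conformation, vastly longer than the age of the universe (about 1.4 × 10¹⁰ years), which is why folding cannot be a random search.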

The complex relationship between the sequence, structure, and function of proteins has puzzled biochemists for many years. Untangling it is integral not only to our understanding of biochemical pathways but also to speeding up drug discovery and engineering new proteins with target functions. Even though we have access to over 3 billion DNA sequences, only 200 million protein sequences and even fewer structures have been deposited in online databases. Determining structures experimentally, using methods such as X-ray crystallography or cryo-electron microscopy, can be time-consuming and is challenging for certain proteins, which explains our current lack of data.

Reflecting the need for rapid computational prediction methods, the Critical Assessment of Protein Structure Prediction (CASP) was launched in 1994. This biennial experiment aims to objectively evaluate the progress made in our ability to predict the three-dimensional structures of proteins from sequence alone. In each round, the participating teams are given 100 protein sequences with unknown structures, and they apply computational methods to predict how these sequences will fold. In parallel, a team of experimentalists determines the structures in the lab, and these are then compared with the predictions.
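
The comparison step can be illustrated with a simplified score in the spirit of GDT_TS, the main accuracy measure used in CASP. The sketch below assumes the predicted and experimental structures have already been superimposed and are given as matching lists of alpha-carbon coordinates in ångströms; the real measure also searches for the best superposition, which is omitted here. The function name and the toy coordinates are made up for illustration.

```python
import math


def gdt_ts_like(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score on a 0-100 scale.

    For each distance cutoff, compute the fraction of residues whose predicted
    position lies within that cutoff of the experimental position, then
    average the fractions. ASSUMPTION: the structures are pre-aligned.
    """
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("structures must be non-empty and of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy four-residue example: the prediction is off by 0.5, 1.5, 3.0 and 10.0 Å.
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 10.0, 0.0)]

print(gdt_ts_like(predicted, experimental))
```

For the toy coordinates the score is 56.25, while a prediction identical to the experimental structure scores 100.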

The 14th edition of CASP, held in 2020, brought the announcement that the long-awaited solution to the structure prediction problem had been found. Using the AlphaFold2 program, the team at DeepMind had predicted structures with a median score of around 90 out of 100 on CASP's accuracy measure, on par with experimental structures. The original iteration of AlphaFold won CASP13 in 2018, but improvements to the software's neural network architecture produced a significant leap in accuracy at the following competition.

Without knowing anything about a protein, aside from its sequence, deep learning had now paved the way for scientists to start making predictions about how it may behave. The paper published by DeepMind in 2021 is one of the most cited publications of all time, demonstrating the impact of the AlphaFold2 program. Upon receiving the Nobel Prize, Hassabis said he hoped AlphaFold would act as ‘the first proof point of AI’s incredible potential to accelerate scientific discovery’. 

The AlphaFold2 program has now predicted the structures of over 200 million proteins, which are stored in a database maintained by EMBL’s European Bioinformatics Institute. However, it is important to note that it has not made experimental structure determination obsolete. Many protein structures are predicted with a low confidence score by AlphaFold, and further research is required to validate aspects of the protein that are computationally predicted, especially when proteins have flexible regions.
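
AlphaFold reports its per-residue confidence as a score called pLDDT, on a scale from 0 to 100, and its model files store this score in the B-factor column of each atom record. As a rough sketch of how low-confidence regions might be flagged, the Python below reads PDB-format text and lists residues scoring under 70, a commonly used cut-off for low confidence. The function names and the four-residue demo model are hypothetical.

```python
def atom_line(serial, res_seq, plddt):
    """Build a minimal PDB-format ATOM record for an alpha carbon (demo data only)."""
    return (
        f"ATOM  {serial:5d}  CA  ALA A{res_seq:4d}    "
        f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}"
    )


def low_confidence_residues(pdb_text, threshold=70.0):
    """Return (residue number, pLDDT) pairs for residues below the threshold.

    Uses the fixed-column PDB layout: atom name in columns 13-16, residue
    number in columns 23-26, and B-factor (here: pLDDT) in columns 61-66.
    """
    flagged = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_seq = int(line[22:26])
            plddt = float(line[60:66])
            if plddt < threshold:
                flagged.append((res_seq, plddt))
    return flagged


# Hypothetical four-residue model with made-up confidence scores.
demo_model = "\n".join(
    atom_line(i, i, score)
    for i, score in enumerate([95.2, 88.0, 62.5, 41.3], start=1)
)

print(low_confidence_residues(demo_model))
```

In this made-up model, residues 3 and 4 fall below the cut-off, the kind of region where experimental validation would matter most.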

Despite this, it has proved to be an extremely helpful tool for scientists making initial predictions or supporting experimental work. AlphaFold has proved important in the lab of Martin Beck, where it helped researchers piece together the very large nuclear pore complex: using AlphaFold in conjunction with experimental methods improved the model coverage from 30% to 60%. In only a short amount of time, the work of researchers at DeepMind has made possible what was inconceivable only a few years ago and, beyond that, made it possible on an accessible timescale. This year saw the release of AlphaFold3, which adds the capability of predicting how a protein interacts with other molecules. Although the Nobel Prize committee chose to celebrate its achievements now, the full impact of AlphaFold on accelerating scientific discovery remains to be seen.

Whilst AlphaFold has accelerated our progress in connecting sequence to structure, the co-awardee of the 2024 prize, David Baker, has been instrumental in enabling scientists to design novel proteins with a desired function through the development of Rosetta. Originally built as a structure prediction tool like AlphaFold, the Rosetta suite of tools has since been adapted to tackle the problem in reverse: generating amino acid sequences that will fold into a specific three-dimensional shape.

Since the late twentieth century, scientists have repeatedly attempted to generate novel proteins. Baker’s lab made a significant breakthrough in protein design in 2003 with Top7, a structure generated from a sequence unlike those found in nature. Experimental work validated that the sequence could fold into the target structure, demonstrating Rosetta’s ability to produce synthetic proteins.

Rosetta has been used in a wide range of applications, from designing new enzymes to engineering antibodies. During the COVID-19 pandemic, these tools were successfully applied to develop a vaccine that was approved for use in several countries, making it the first entirely computationally designed medicine. David Baker has published more than 640 peer-reviewed papers, been awarded over 100 patents, and co-founded 21 biotechnology companies, demonstrating his substantial contribution to the field. 

Like AlphaFold, many of the newer tools in the Rosetta family are built on machine learning algorithms. This year’s Nobel Prize for Chemistry shows how artificial intelligence can significantly benefit humanity when used responsibly. The co-awardees have played a pivotal role in decoding an age-old problem in molecular biology and advancing our ability to study and design the proteins that power life.

