MIPT scientists accelerated the search for new drugs with the help of machine learning
In recent years, computer modeling has greatly facilitated the creation of new drugs by predicting the structure of molecules and their interactions. However, even such “purely computational” screening can be too expensive and difficult when millions of substances are involved. The authors of a new article in the Journal of Chemical Information and Modeling, researchers from MIPT and the Universities of Groningen and Grenoble, have made this process much faster and more efficient with the help of active machine learning.
The biological properties of molecules – their ability to perform their functions and interact with other compounds – depend directly on their structure. This also applies to proteins such as receptors and enzymes, which are the targets of various pharmacological drugs. Recent advances in predicting the structure of protein molecules using artificial intelligence have opened up new opportunities for drug design. However, it is also important to know how proteins interact with other compounds.
Testing the activity of many thousands of ligands (molecules that bind to a protein) in a real experiment is too expensive and time-consuming. Scientists therefore start with virtual screening, that is, a computational search for substances with the required properties. For example, the interactions of future drugs with their targets are modeled using molecular docking, that is, a search for the optimal mutual arrangement of the molecules upon binding.
Virtual screening makes it possible to assess the biological effects of a substance much faster and more cheaply. The most promising candidate molecules are then tested in a real experiment and, if successful, in preclinical studies on animals and only then in patients.
However, virtual screening of large molecular libraries has its own difficulties. Such libraries usually contain tens of millions of compounds, and checking that many requires significant computing resources. Machine time (that is, how long the processor is running) translates directly into monetary cost. For instance, docking just one ligand takes a few seconds of CPU time. Processing a large library of tens of millions of ligands using cloud services would require decades of CPU time and cost tens of thousands of dollars. Scientists are therefore trying to make this process faster and more affordable.
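To get a feel for these orders of magnitude, here is a back-of-the-envelope calculation in Python. The three input values are illustrative assumptions chosen to match "a few seconds" and "tens of millions"; they are not figures reported by the study, and the cloud price is hypothetical.

```python
# Rough estimate of the cost of docking an entire library by brute force.
# All three constants are illustrative assumptions, not data from the study.
SECONDS_PER_LIGAND = 10      # assumed CPU time to dock one ligand
LIBRARY_SIZE = 80_000_000    # assumed library size ("tens of millions")
USD_PER_CPU_HOUR = 0.05      # hypothetical cloud price per core-hour


def docking_cost(n_ligands, seconds_per_ligand, usd_per_cpu_hour):
    """Return (CPU-years, cost in USD) for docking n_ligands one by one."""
    cpu_hours = n_ligands * seconds_per_ligand / 3600
    cpu_years = cpu_hours / (24 * 365)
    return cpu_years, cpu_hours * usd_per_cpu_hour


cpu_years, usd = docking_cost(LIBRARY_SIZE, SECONDS_PER_LIGAND, USD_PER_CPU_HOUR)
print(f"{cpu_years:.1f} CPU-years, about ${usd:,.0f}")
```

Under these assumed inputs the estimate comes to roughly 25 CPU-years and about 11,000 dollars, which is why docking only a fraction of the library is attractive.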
A possible solution to this problem was described by the authors of a new article in the highly rated Journal of Chemical Information and Modeling: scientists from the Research Center for Molecular Mechanisms of Aging and Age-Related Diseases of the Moscow Institute of Physics and Technology (MIPT), as well as the Universities of Groningen (Netherlands) and Grenoble (France).
“Due to the huge number of substances to be tested, virtual screening takes a lot of machine time, even if we use modern computing resources. Moreover, the chemical space of potential drugs is constantly expanding, which requires an increase in the efficiency of the process,” said Valentyn Borshchevskiy, who led the study and is deputy director of the Research Center for Molecular Mechanisms of Aging and Age-Related Diseases at MIPT.
The authors of the new study used libraries containing docking results for a million ligands for each of the four proteins studied. These are the human adenosine A2A receptor (AA2AR), cannabinoid receptor type 2 (CB2), dopamine D4 receptor (D4), and beta-lactamase AmpC, an enzyme that makes bacteria resistant to antibiotics.
Next, the authors determined which machine learning (ML) model is best suited for predicting docking results in this case. It turned out to be linear regression, a fairly simple method compared with ML heavyweights such as random forests, decision trees, or deep learning.
Linear regression was then used within an active learning scheme. The process ran in stages: a small portion of the library was docked first, and at each new step a basic model was trained on the results and used to pick out the ligands with the best predicted scores, which were docked at the next stage. As a result, screening just 10 percent of the library with this scheme recovered 48 to 91 percent of the ligands in the top 1 percent by activity. The quality metrics of the models were comparable to those previously obtained using much more “sophisticated” models.
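The staged procedure can be sketched in Python roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the binary descriptors, the batch sizes, the `dock` stand-in (which simply looks up a precomputed score), and the convention that a higher score means a better binder are all assumptions made for the example.

```python
import numpy as np


def dock(indices, true_scores):
    """Hypothetical stand-in for an expensive docking run (a table lookup)."""
    return true_scores[indices]


def active_learning_screen(features, true_scores, budget_frac=0.10,
                           n_rounds=5, seed=0):
    """Dock budget_frac of the library in n_rounds batches.

    The first batch is random; each later batch consists of the undocked
    ligands that a linear regression model predicts to score highest.
    Returns a boolean mask of the ligands that were docked.
    """
    rng = np.random.default_rng(seed)
    n = len(features)
    batch = round(n * budget_frac / n_rounds)
    X = np.hstack([features, np.ones((n, 1))])  # append an intercept column
    docked = np.zeros(n, dtype=bool)
    scores = np.zeros(n)

    chosen = rng.choice(n, size=batch, replace=False)
    for round_idx in range(n_rounds):
        scores[chosen] = dock(chosen, true_scores)
        docked[chosen] = True
        if round_idx == n_rounds - 1:
            break
        # Fit ordinary least squares on everything docked so far.
        w, *_ = np.linalg.lstsq(X[docked], scores[docked], rcond=None)
        pred = X @ w
        pred[docked] = -np.inf  # never pick a ligand twice
        chosen = np.argsort(pred)[-batch:]
    return docked


def top_recall(docked, true_scores, top_frac=0.01):
    """Fraction of the best top_frac of ligands that ended up being docked."""
    k = max(1, int(len(true_scores) * top_frac))
    top = np.argsort(true_scores)[-k:]
    return docked[top].mean()


# Synthetic library: random binary descriptors and a noisy linear "score".
rng = np.random.default_rng(42)
n_ligands, n_bits = 20_000, 64
features = rng.integers(0, 2, size=(n_ligands, n_bits)).astype(float)
weights = rng.normal(size=n_bits)
true_scores = features @ weights + rng.normal(scale=0.5, size=n_ligands)

docked = active_learning_screen(features, true_scores)
recall = top_recall(docked, true_scores)
print(f"docked {docked.mean():.0%} of the library, "
      f"recovered {recall:.0%} of the top 1%")
```

Because the synthetic scores are almost exactly linear in the descriptors, the recall here will be far more optimistic than for real docking data; the point of the sketch is only the loop structure of dock, train, predict, and select.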
“We have demonstrated that machine learning can significantly accelerate the search for promising substances. It turned out that it is not necessary to evaluate the affinity (that is, the strength of binding) of every molecule. It is enough to select a small number of molecules from the list, evaluate their affinity, train artificial intelligence on them, and then accurately predict promising substances from the remaining list. This makes it possible to significantly speed up the process of developing new drugs,” Valentyn Borshchevskiy concluded.
The work was carried out with the support of the Russian Science Foundation (grant 22-24-00454). The authors would like to thank the Data Processing Center of the Moscow State Technical University for high-performance computing infrastructure and technical support.