  • Published: 26 April 2023

Computational approaches streamlining drug discovery

  • Anastasiia V. Sadybekov (ORCID: orcid.org/0000-0003-3925-983X) 1,2 &
  • Vsevolod Katritch (ORCID: orcid.org/0000-0003-3883-4505) 1,2,3

Nature volume 616, pages 673–685 (2023)


  • Cheminformatics
  • Virtual screening

Computer-aided drug discovery has been around for decades, although the past few years have seen a tectonic shift towards embracing computational technologies in both academia and pharma. This shift is largely defined by the flood of data on ligand properties and binding to therapeutic targets and their 3D structures, abundant computing capacities and the advent of on-demand virtual libraries of drug-like small molecules in their billions. Taking full advantage of these resources requires fast computational methods for effective ligand screening. This includes structure-based virtual screening of gigascale chemical spaces, further facilitated by fast iterative screening approaches. Highly synergistic are developments in deep learning predictions of ligand properties and target activities in lieu of receptor structure. Here we review recent advances in ligand discovery technologies, their potential for reshaping the whole process of drug discovery and development, as well as the challenges they encounter. We also discuss how the rapid identification of highly diverse, potent, target-selective and drug-like ligands to protein targets can democratize the drug discovery process, presenting new opportunities for the cost-effective development of safer and more effective small-molecule treatments.


Despite amazing progress in basic life sciences and biotechnology, drug discovery and development (DDD) remain slow and expensive, taking on average approximately 15 years and approximately US$2 billion to make a small-molecule drug 1 . Although it is accepted that clinical studies are the priciest part of the development of each drug, most time-saving and cost-saving opportunities reside in the earlier discovery and preclinical stages. Preclinical efforts themselves account for more than 43% of expenses in pharma, in addition to major public funding 1 , driven by the high attrition rate at every step from target selection to hit identification and lead optimization to the selection of clinical candidates. Moreover, the high failure rate in clinical trials (currently 90%) 2 is largely explained by issues rooted in early discovery such as inadequate target validation or suboptimal ligand properties. Finding fast and accessible ways to discover more diverse pools of higher-quality chemical probes, hits and leads with optimal absorption, distribution, metabolism, excretion and toxicology (ADMET) and pharmacokinetics (PK) profiles at the early stages of DDD would improve outcomes in preclinical and clinical studies and facilitate more effective, accessible and safer drugs.

The concept of computer-aided drug discovery 3 was developed in the 1970s and popularized by Fortune magazine in 1981, and has since been through several cycles of hype and disillusionment 4 . There have been success stories along the way 5 and, in general, computer-assisted approaches have become an integral, yet modest, part of the drug discovery process 6 , 7 . In the past few years, however, several scientific and technological breakthroughs resulted in a tectonic shift towards embracing computational approaches as a key driving force for drug discovery in both academia and industry. Pharmaceutical and biotech companies are expanding their computational drug discovery efforts or hiring their first computational chemists. Numerous new and established drug discovery companies have raised billions in the past few years with business models that heavily rely on a combination of advanced physics-based molecular modelling with deep learning (DL) and artificial intelligence (AI) 8 . Although it is too early yet to expect approved drugs from the most recent computationally driven discovery efforts, they are producing a growing number of clinical candidates, with some campaigns specifically claiming target-to-lead times as low as 1–2 months 9 , 10 , or target-to-clinic time under 1 year 11 . Are these the signs of a major shift in the role that computational approaches have in drug discovery or just another round of the hype cycle?

Let us look at the key factors defining the recent changes (Fig. 1 ). First, the structural revolution—from automation in crystallography 12 to microcrystallography 13 , 14 and most recently cryo-electron microscopy technology 15 , 16 —has made it possible to reveal 3D structures for the majority of clinically relevant targets, often in a state or molecular complex relevant to its biological function. Especially impressive has been the recent structural turnaround for G protein-coupled receptors (GPCRs) 17 and other membrane proteins that mediate the action of more than 50% of drugs 18 , providing 3D templates for ligand screening and lead optimization. The second factor is a rapid and marked expansion of drug-like chemical space, easily accessible for hit and lead discovery. Just a few years ago, this space was limited to several million on-shelf compounds from vendors and in-house screening libraries in pharma. Now, screening can be done with ultra-large virtual libraries and chemical spaces of drug-like compounds, which can be readily made on-demand, rapidly growing beyond billions of compounds 19 , and even larger generative spaces with theoretically predicted synthesizability (Box 1 ). The third factor involves emerging computational approaches that strive to take full advantage of the abundance of 3D structures and ligand data, supported by the broad availability of cloud and graphics processing unit (GPU) computing resources to support these methods at scale. This includes structure-based virtual screening of ultra-large libraries 20 , 21 , 22 , using accelerated 23 , 24 , 25 and modular 26 screening approaches, as well as recent growth of data-driven machine learning (ML) and DL methods for predicting ADMET and PK properties and activities 27 .

Fig. 1

a, More than 200,000 protein structures in the PDB, plus private collections, have more than 90% of protein families covered with high-resolution X-ray and more recently cryo-electron microscopy structures, often in distinct functional states, with remaining gaps also filled by homology or AlphaFold2 models. b, The chemical space available for screening and fast synthesis has grown from about 10^7 on-shelf compounds in 2015 to more than 3 × 10^10 on-demand compounds in 2022, and can be rapidly expanded beyond 10^15 diverse and novel compounds. c, Computational methods for VLS include advances in fast flexible docking, modular fragment-based algorithms, DL models and hybrid approaches. d, Computational tools are supported by rapid growth of affordable cloud computing, GPU acceleration and specialized chips.

Although the impacts of the recent structural revolution 17 and computing hardware in drug discovery 28 are comprehensively reviewed elsewhere, here we focus on the ongoing expansion of accessible drug-like chemical spaces as well as current developments in computational methods for ligand discovery and optimization. We detail how emerging computational tools applied in gigaspace can facilitate the cost-effective discovery of hundreds or even thousands of highly diverse, potent, target-selective and drug-like ligands for a desired target, and put them in the context of experimental approaches (Table 1 ). Although the full impact of new computational technologies is only starting to affect clinical development, we suggest that their synergistic combination with experimental testing and validation in the drug discovery ecosystem can markedly improve its efficiency in producing better therapeutics.

Box 1: Types of chemical libraries and spaces for drug discovery

Pharma companies amass collections of compounds for screening in-house, whereas in-stock collections from vendors (see the figure, part a ) allow fast (less than 1 week) delivery, contain unique and advanced chemical scaffolds, are easily searchable and are HTS compatible. However, the high cost of handling physical libraries, their slow linear growth, limited size and novelty constrain their applications.

More recently, virtual on-demand chemical databases (fully enumerated) and spaces (not enumerated) allow fast parallel synthesis from available building blocks, using validated or optimized protocols, with synthetic success of more than 80% and delivery in 2–3 weeks (see the figure, part b ). The virtual chemical spaces assure high chemical novelty and allow fast polynomial growth with the addition of new synthons and reaction scaffolds, including 4+ component reactions. Examples include Enamine REAL, Galaxy by WuXi, CHEMriya by Otava and private databases and spaces at pharmaceutical companies.

Generative spaces, unlike on-demand spaces, comprise theoretically possible molecules and collectively could comprise all chemical space (see the figure, part c ). Such spaces are limited only by theoretical plausibility, estimated as 10^23–10^60 drug-like compounds. Although they allow comprehensive space coverage, the reaction path and success rate of generated compounds are unknown, and thus require computational prediction of their practical synthesizability. Examples of generative spaces and their subsets include GDB-13, GDB-17, GDB-18 and GDBChEMBL.


Expansion of accessible chemical space

Why bigger is better.

The limited size and diversity of screening libraries have long been a bottleneck for detection of novel potent ligands and for the whole process of drug discovery. An average ‘affordable’ high-throughput screening (HTS) campaign 29 uses screening libraries of about 50,000–500,000 compounds and is expected to yield only a few true hits after secondary validation. Those hits, if any, are usually rather weak, non-selective, have suboptimal ADMET and PK properties and unknown binding mode, so their discovery entails years of painstaking trial-and-error optimization efforts to produce a lead molecule with satisfying potency and all the other requirements for preclinical development. Scaling of HTS to a few million compounds can be afforded only in big pharma, and it still does not make that much difference in terms of the quality of resulting hits. Likewise, virtual libraries that use in silico screening were traditionally limited to a collection of compounds available in stock from vendors, usually comprising fewer than 10 million unique compounds, therefore the scale advantage over HTS was marginal.

Although chasing the full coverage of the enormous drug-like chemical space (estimated at more than 10^63 compounds) 30 is a futile endeavour, expanding the screening of on-demand libraries by several orders of magnitude to billions and more of previously unexplored drug-like compounds, either physical or virtual, is expected to change the drug discovery model in several ways. First, it can proportionally increase the number of potential hits in the initial screening 31 (Fig. 2 ). This abundance of ligands in the library also increases the chances of identifying more potent or selective ligands, as well as ligands with better physicochemical properties. This has been demonstrated in ultra-large virtual screening campaigns for several targets, revealing highly potent ligands with affinities often in the mid-nanomolar to sub-nanomolar range 20 , 21 , 22 , 23 , 26 . Second, the accessibility of hit analogues in the same on-demand spaces streamlines the generation of meaningful structure–activity relationship (SAR)-by-catalogue and further optimization steps, reducing the amount of elaborate custom synthesis. Last, although library scale itself is important, properly constructed gigascale libraries can also expand chemical diversity (even with a few chemical reactions 32 ), chemical novelty and patentability of the hits, as almost all on-demand compounds have never been synthesized before.

Fig. 2

The red curves in log scale illustrate the distribution of screening hits with binding scores better than X for libraries of 10 billion, 100 million and 1 million compounds, as estimated from previous VLS and V-SYNTHES screening campaigns. The blue curves illustrate the approximate dependence of the experimental hit rate on the predicted docking score for 10-µM, 1-µM and 100-nM thresholds 20 . This analysis (semi-quantitative, as it varies from target to target) suggests that screening of more than 100 million compounds lifts the limitations of smaller libraries, extending the tail of the hit distribution towards better binding scores with high hit rates, and allowing for identification of proportionally more experimental hits with higher affinity. Note also two important factors justifying further growth of screening libraries to 10 billion and more: (1) the candidate hits for synthesis and experimental testing are usually picked as a result of target-dependent post-processing of several thousands of top-scoring compounds, which selects for novelty, diversity, drug likeness and often interactions with specific receptor residues. Thus, the more good-scoring compounds that are identified, the better overall selection can be made. (2) Saturation of the hit rate curves at best scores is not a universal rule but a result of the limited accuracy of fast scoring functions used in screening. Using more accurate docking or scoring approaches (flexible docking, quantum mechanical and free energy perturbation) in the post-processing step can extend a meaningful correlation of binding score with affinity further left (grey dashed curves), potentially bringing even more high-affinity hits for gigascale chemical spaces.

Physical libraries

Several approaches have been developed recently to push the library size limits in HTS, including combinatorial chemistry and large-scale pooling of the compounds for parallel assays. For example, affinity-selection mass spectrometry techniques can be applied to identify binders directly in pools of thousands of compounds 33 without the need for labelling. DNA-encoded libraries (DELs) and cost-effective approaches to generate and screen them have also been developed 34 , making it possible to work with as many as approximately 10^10 compounds in a single test tube 35 . These methods have their own limitations; as DELs are created by tagging ligands with unique DNA sequences through a linker, DNA conjugation limits the chemistries possible for the combinatorial assembly of the library. Screening of DELs may also yield a large number of false negatives by blocking important moieties for binding and, more importantly, false positives by nonspecific binding of DNA labels, so expensive off-DNA resynthesis of hit compounds is needed for their validation. To avoid this resynthesis, it has been suggested to use ML models trained on DEL results for each target to predict drug-like ligands from on-demand chemical spaces, as described in ref. 36 .

Virtual on-demand libraries

In silico screening of virtual libraries by fast computational approaches has long been touted as a cost-effective way to overcome the limitations of physical libraries. Only recently, however, have synthetic chemistry and cheminformatics approaches been developed to break out of these limits and construct virtual on-demand libraries that explore much larger chemical space, as reviewed in refs. 37 , 38 . In 2017, the readily accessible (REAL) database by Enamine 19 , 39 became the first commercially available on-demand library based on the robust reaction principle 40 , whereas the US National Institutes of Health developed synthetically accessible virtual inventory (SAVI) 41 , which also uses Enamine building blocks. The REAL database uses carefully selected and optimized parallel synthesis protocols and a curated collection of in-stock building blocks, making it possible to guarantee the fast (less than 4 weeks), reliable (80% success rate) and affordable synthesis of a set of compounds 21 . Driven by new reactions and diverse building blocks, the fully enumerated REAL database has grown from approximately 170 million compounds in 2017 to more than 5.5 billion compounds in 2022 and comprises the bulk of the popular ZINC20 virtual screening database 42 . The practical utility of the REAL database has been recently demonstrated in several major prospective screening campaigns 20 , 21 , 23 , 24 , some of them taking further hit optimization steps in the same chemical space, yielding selective nanomolar and even sub-nanomolar ligands without any custom synthesis 20 , 21 . Similar ultra-large virtual libraries (that is, GalaXi ( http://www.wuxiapptec.com ) and CHEMriya ( http://chemriya.com )) are available commercially, although their synthetic success rates are yet to be published.

Virtual chemical spaces

The modular nature of on-demand virtual libraries supports further growth by the addition of reactions and building blocks. However, building, maintaining and searching fully enumerated chemical libraries comprising more than a few billion compounds become slow and impractical. Such gigascale virtual libraries are therefore usually maintained as non-enumerated chemical spaces, defined by a specific set of building blocks and reactions (or transforms), as comprehensively reviewed in ref. 38 . Within pharma, one of the first published examples includes PGVL by Pfizer 37 , 43 , the most recent version of which uses a set of 1,244 reactions and in-house reagents to account for 10^14 compounds. Other biopharma companies have their own virtual chemical spaces 38 , 44 , although their details are often not in the public domain. Among commercially available chemical spaces, GalaXi Space by WuXi (approximately 8 billion compounds), CHEMriya by Otava (11.8 billion compounds) and Enamine REAL Space (36 billion compounds) 45 are among the largest and most established. In addition to their enormous sizes, these virtual spaces are highly novel and diverse, and have minimal overlap (less than 10%) between each other 46 . Currently, the largest commercial space, Enamine REAL Space, is an extension to the REAL database that maintains the same synthetic speed, rate and cost guarantees, covering more than 170 reactions and more than 137,000 building blocks (Box 1 ). Most of these reactions are two-component or three-component, but more four-component or even five-component reactions are being explored, enabling higher-order combinatorics. This space can be easily expanded to 10^15 compounds based on available reactions and extended building block sets, for example, 680 million make-on-demand (MADE) building blocks 47 , although synthesis of such compounds involves more steps and is more expensive.
To represent and navigate combinatorial chemical spaces without their full enumeration, specialized cheminformatics tools have been developed, from fragment-based chemical similarity searches 48 to more elaborate 3D molecular similarity search methods based on atomic property fields such as rapid isostere discovery engine (RIDE) 38 .
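As a rough illustration of why such spaces are kept non-enumerated, the number of products can be counted from the building block lists alone. The following is a minimal Python sketch: the reaction names and building block counts are hypothetical toy values (not taken from any vendor catalogue), and it assumes that every building block listed for one position is compatible with every partner at the other positions.

```python
from math import prod

def space_size(reactions):
    """Count the products of a combinatorial space without enumerating them.

    `reactions` maps a reaction name to a list holding the number of
    compatible building blocks at each attachment point.
    """
    return sum(prod(counts) for counts in reactions.values())

def stored_items(reactions):
    """Number of building blocks that actually need to be stored."""
    return sum(sum(counts) for counts in reactions.values())

# Hypothetical toy numbers, chosen only for illustration.
toy_space = {
    "amide_coupling": [20_000, 15_000],                 # two-component
    "sulfonamide_formation": [8_000, 12_000],           # two-component
    "three_component_example": [2_000, 3_000, 1_500],   # three-component
}

total = space_size(toy_space)
stored = stored_items(toy_space)
print(f"{total:,} products defined by {stored:,} stored building blocks")
```

In this toy case the single three-component reaction contributes most of the products, which is the higher-order combinatorics that the text refers to.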

An alternative approach to building chemical spaces generates hypothetically synthesizable compounds following simple rules of synthetic feasibility and chemical stability. Thus, the generated databases (GDB) predict compounds that can be made of a specific number of atoms; for example, GDB-17 contained 166.4 billion molecules of up to 17 atoms of C, N, O, S and halogens 49 , whereas GDB-18, made up of 18 atoms, would reach an estimated 10^13 compounds 38 . Other generative approaches based on narrower definitions of chemical spaces are now used in de novo ligand design with DL-based generative chemistry (for example, ref. 50 ), as discussed below.

Although the synthetic success rate for some of the commercial on-demand chemical spaces (for example, Enamine REAL Space) has been thoroughly validated 20 , 21 , 22 , 23 , 24 , 26 , 42 , synthetic accessibilities and success rates of other chemical spaces remain unpublished 38 . These are important metrics for the practical sustainability of on-demand synthesis because reduced success rates or unreasonable time and cost would diminish its advantage over custom synthesis.

Computational approaches to drug design

Challenges of gigascale screening.

Chemical spaces of gigascale and terascale, provided that they maintain high drug likeness and diversity, are expected to harbour millions of potential hits and thousands of potential lead series for any target. Moreover, their highly tractable robust synthesis simplifies any downstream medicinal chemistry efforts towards final drug candidates.

Dealing with such virtual libraries, however, calls for new computational approaches that meet special requirements for both speed and accuracy. They have to be fast enough to handle gigascale libraries. If docking of a compound takes 10 s per CPU core, it would take more than 3,000 years to screen 10^10 compounds on a single CPU core, or cost approximately US$1 million on a computing cloud at the cheapest CPU rates. At the same time, gigascale screening must be extremely accurate, safeguarding against false-positive hits that effectively cheat the scoring function by exploiting its holes and approximations 31 . Even a one-in-a-million rate of false positives in a 10^10 compound library would comprise 10,000 false hits, which may flood out any hit candidate selection. The artefact rate and nature may depend on the target and screening algorithms and should be carefully addressed in screening and post-processing. Although there is no one simple solution for such artefacts, some practical and reasonably cost-effective remedies include: (1) selection based on the consensus of two different scoring functions, (2) selection of highly diverse hits (many artefacts cluster to similar compounds), (3) hedging the bets from several ranges of scores 31 and (4) manually curating the final list of compounds for any unusual interactions. Ultimately, it is highly desirable to fix as many remaining ‘holes in the scoring functions’ as possible, and reoptimize them for high selectivity in the range of scores where the top true hits of gigaspace are found. Missing some hits in screening (false negatives) would be well tolerated because of the huge number of potential hits in the 10^10 space (for example, losing 50% of a million potential hits is perfectly fine), so some trade-off in score sensitivity is acceptable.
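The scale estimates in this paragraph follow from simple arithmetic, reproduced in the short Python sketch below. The 365.25-day year and the function names are assumptions made for illustration; the per-core-hour price is not stated in the text and is only back-calculated from the quoted total of approximately US$1 million.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length

def single_core_years(n_compounds, seconds_per_compound):
    """Wall-clock years needed to dock a library on one CPU core."""
    return n_compounds * seconds_per_compound / SECONDS_PER_YEAR

def expected_false_positives(n_compounds, one_in):
    """Expected artefacts for a false-positive rate of 1 in `one_in`."""
    return n_compounds / one_in

library_size = 10**10
years = single_core_years(library_size, seconds_per_compound=10)
false_hits = expected_false_positives(library_size, one_in=10**6)

# Price per core-hour implied by the quoted ~US$1 million total.
core_hours = library_size * 10 / 3600
implied_rate = 1_000_000 / core_hours

print(f"{years:,.0f} single-core years")
print(f"{false_hits:,.0f} expected false hits")
print(f"implied cost of about US${implied_rate:.3f} per core-hour")
```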

The major types of computational approaches to screening a protein target for potential ligands are summarized in Table 2 . Below, we discuss some emerging technologies and how they can best fit into the overall DDD pipeline to take full advantage of growing on-demand chemical spaces.

Receptor structure-based screening

In silico screening by docking molecules of the virtual library into a receptor structure and predicting its ‘binding score’ is a well-established approach to hit and lead discovery and had a key role in recent drug discovery success stories 11 , 17 , 51 . The docking procedure itself can use molecular mechanics, often in internal coordinate representation, for rapid conformational sampling of fully flexible ligands 52 , 53 , using empirical 3D shape-matching approaches 54 , 55 , or combining them in a hybrid docking funnel 56 , 57 . Special attention is devoted to ligand scoring functions, which are designed to reliably remove non-binders to minimize false-positive predictions, which is especially relevant with the growth of library size. Blind assessments of the performance of structure-based algorithms have been routinely performed as a D3R Grand Challenge community effort 58 , 59 , showing continuous improvements in ligand pose and binding energy predictions for the best algorithms.

Results of the many successful structure-based prospective screening campaigns have been published over the years covering all major classes of targets, most recently GPCRs, as reviewed in refs. 17 , 51 , 60 , whereas countless more have been used in industry. The focused candidate ligand sets, predicted by such screening, often show useful (10–40%) hit rates in experimental testing 60 , yielding novel hits for many targets with potencies in the 0.1–10-μM range (for those that are published, at least). Further steps in optimization of the initial hits obtained from standard screening libraries of less than 10 million compounds, however, usually require expensive custom synthesis of analogues, which has been afforded only in a few published cases 20 , 61 .

Identification of hits directly in much larger chemical spaces such as REAL Space not only can bring more and better hits 31 but also supports their optimization, as any resulting hit has thousands of analogues and derivatives in the same on-demand space. This advantage was especially helpful for such challenging targets as SARS-CoV-2 main protease (Mpro), for which hundreds of standard virtual ligand screening (VLS) attempts came up empty-handed 62 (see discussion on Mpro challenges in ‘Hybrid in vitro–in silico approaches’ below). Although the initial hit rates were low even in the ultra-large screens, VirtualFlow 24 screening of the REAL database with 1.4 billion compounds still identified hits in the 10–100-µM range, which were optimized via on-demand synthesis 63 to yield quality leads with the best compound Z222979552 (half maximal inhibitory concentration (IC50) = 1.0 μM). Another ultra-large screen of 235 million compounds, based on a newer Mpro structure with a non-covalent inhibitor (Protein Data Bank (PDB) ID: 6W63), also produced viable hits, fast optimization of which resulted in the discovery of nanomolar Mpro inhibitors in just 4 months by a combination of on-demand and simple custom chemistry 64 . The best compound in this work had good in vitro ADMET properties, with an affinity of 38 nM and a cell-based antiviral potency of 77 nM, which are comparable to clinically used PF-07321332 (nirmatrelvir) 65 .

With increasing library sizes, the computational time and cost of docking itself become the main bottleneck in screening, even with massively parallel cloud computing 60 . Iterative approaches have been recently suggested to tackle libraries of this size; for example, VirtualFlow used stepwise filtering of the whole library with docking algorithms of increasing accuracy to screen approximately 1.4 billion Enamine REAL compounds 23 , 24 . Although improving speed several-fold, the method still requires a fully enumerated library and its computational cost grows linearly with the number of compounds, limiting its applicability in rapidly expanding chemical spaces.

Modular synthon-based approaches

The idea of designing molecules from a limited set of fragments to optimally fill the receptor binding pocket has been entertained from the early years of drug discovery, implemented, for example, in the LUDI algorithm 66 . However, custom synthesis of the designed compounds remained the major bottleneck of such approaches. The recently developed virtual synthon hierarchical enumeration screening (V-SYNTHES) 26 technology applies fragment-based design to on-demand chemical spaces, thus avoiding the challenges of custom synthesis (Fig. 3 ). Starting with the catalogue of REAL Space reactions and building blocks (synthons), V-SYNTHES first prepares a minimal library of representative chemical fragments by fully enumerating synthons at one of the attachment points, capping the other position (or positions) with a methyl or phenyl group. Docking-based screening then allows selection of the top-scoring fragments (for example, the top 0.1%) that are predicted to bind well into the target pocket. This is repeated for a second position (and then third and fourth positions, if available), and the resulting focused libraries are screened at each iteration against the target pocket. At the final step, the top approximately 50,000 full compounds from REAL Space are docked with more elaborate and accurate docking parameters or methods, and the top-ranking candidates are filtered for novelty, diversity and variety of desired drug-like properties. In post-processing, the best 50–500 compounds are selected for synthesis and testing. Our assessment suggests that combining synthons with the scaffolds and capping them with dummy minimal groups in the V-SYNTHES algorithm is a critical requirement for optimal fragment predictions because reactive groups of building blocks and scaffolds often create strong, yet false, interactions that are not present in the full molecule. 
Another important part of the algorithm is the evaluation of the fragment-binding pose in the target, which prioritizes those hits with minimal caps pointed into a region of the pocket where the fragment has space to grow.
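The hierarchical logic described above can be illustrated with a toy two-component example in Python. This is a sketch under strong assumptions, not the published V-SYNTHES code: `toy_dock` is a hypothetical additive score standing in for real docking (real scores are not additive), synthons are plain integers, `CAP` stands in for the minimal methyl or phenyl cap, and the pose-based check that the cap points into free pocket space is not modelled.

```python
from itertools import product

CAP = None  # placeholder for the minimal capping group

def toy_dock(r1, r2):
    """Hypothetical additive score; lower is better. Not a real docking call."""
    s1 = 0.0 if r1 is CAP else -((r1 * 37) % 101) / 10
    s2 = 0.0 if r2 is CAP else -((r2 * 53) % 97) / 10
    return s1 + s2

def hierarchical_screen(r1_synthons, r2_synthons, dock, top_fragments, top_compounds):
    # Step 1: minimal library, one real synthon per fragment, cap elsewhere.
    fragments = [(r1, CAP) for r1 in r1_synthons]
    fragments += [(CAP, r2) for r2 in r2_synthons]
    n_docked = len(fragments)
    best = sorted(fragments, key=lambda f: dock(*f))[:top_fragments]

    # Step 2: grow each selected fragment by enumerating the capped position.
    candidates = set()
    for r1, r2 in best:
        if r2 is CAP:
            candidates.update((r1, x) for x in r2_synthons)
        else:
            candidates.update((x, r2) for x in r1_synthons)
    n_docked += len(candidates)

    # Step 3: dock the focused set of full compounds and rank them.
    ranked = sorted(candidates, key=lambda c: dock(*c))
    return ranked[:top_compounds], n_docked

R1, R2 = range(200), range(300)  # toy synthon sets: 60,000 full compounds
hits, n_docked = hierarchical_screen(R1, R2, toy_dock, top_fragments=10, top_compounds=5)
exhaustive_best = min(toy_dock(a, b) for a, b in product(R1, R2))
print(n_docked, "docked instead of", len(R1) * len(R2))
```

With an additive toy score the hierarchical search is guaranteed to recover the best full compound. Real scoring functions give no such guarantee, which is why the final step in the text re-docks the top full compounds with more accurate settings.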

Fig. 3

An overview of the V-SYNTHES algorithm allowing effective screening of more than 31 billion compounds in REAL Space or even larger chemical spaces, while performing enumeration and docking of only small fractions of molecules. The algorithm, illustrated here using a two-component reaction based on a sulfonamide scaffold with R 1 and R 2 synthons, can be applied to hundreds of optimized two-component, three-component or more-component reactions by iteratively repeating steps 3 and 4 until fully enumerated molecules optimally fitting the target pocket are obtained. PAINS, pan assay interference compounds.

Initially applied to discover new chemotypes for cannabinoid receptor CB2 antagonists, V-SYNTHES has shown a hit rate of 23% for submicromolar ligands, which exceeded the hit rate of standard VLS by fivefold, while taking about 100 times less computational resources 26 . A similar hit rate was found for the ROCK1 kinase screening in the same study, with one hit in the low nanomolar range 26 . V-SYNTHES is being applied to other therapeutically relevant targets with well-defined pocket structures.

A similar approach, chemical space docking, has been implemented by BioSolveIT, so far for two-component reactions 67 . This method is even faster, as it docks individual building block fragments and then enumerates them with scaffolds and other synthons. However, there are trade-offs for the extra speed: docking of smaller fragments without scaffolds is less reliable, and their reactive groups often have dissimilar properties from the reaction product. This may introduce strong receptor interactions that are irrelevant to the final compound and can misguide the fragment selection. This is especially true for cycloaddition reactions and three-component scaffolds, which need further validation in chemical space docking.

Apart from supporting the abundance, chemical diversity and potential quality of hits, structure-based modular approaches are especially effective in identifying hits with robust chemical novelty, as they (1) do not rely on information for existing ligands and (2) identify ligands that have never been synthesized before. This is an important factor in assuring the patentability of the chemical matter for hit compounds and the lead series arising from gigascale screening. Moreover, thousands of easily synthesizable analogues assure extensive SAR-by-catalogue for the best hits, which, for example, enabled approximately 100-fold potency and selectivity improvement for the CB2 V-SYNTHES hits 26 . Availability of the multilayer on-demand chemical space extensions (for example, supported by MADE building blocks 47 ) can also greatly streamline the next steps in lead optimization through ‘virtual MedChem’, thus reducing extensive custom synthesis.

Data-driven approaches and DL

In the era of AI-based face recognition, ChatGPT and AlphaFold 68 , there is enormous interest in applications of data-driven DL approaches across drug discovery, from target identification to lead optimization to translational medicine (as reviewed in refs. 69 , 70 , 71 ).

Data-driven approaches have a long history in drug discovery, in which ML algorithms such as support vector machine, random forest and neural networks have been used extensively to predict ligand properties and on-target activities, albeit with mixed results. Accurate quantitative structure–property relationship (QSPR) models can predict physicochemical (for example, solubility and lipophilicity) and pharmacokinetic (for example, bioavailability and blood–brain barrier penetration) properties, in which large and broad experimental datasets for model training are available and continue to grow 72 , 73 , 74 . ML is also implemented in many quantitative SAR (QSAR) algorithms 75 , in which the training set and the resulting models are focused on a given target and a chemical scaffold, helping to guide lead affinity and potency optimization. Methods based on extensive ligand–target binding datasets, chemical similarity clustering and network-based approaches have also been suggested for drug repurposing 76 , 77 .

The advent of DL takes data-driven models to the next level, allowing analysis of much larger and diverse datasets while deriving more complicated non-linear relationships, with vast literature describing specific DL methodologies and applications to drug discovery 27 , 70 . By its ‘learning from examples’ nature, AI requires comprehensive ligand datasets for training the predictive models. For QSPR, large public and private databases have been accumulated, with various properties such as solubility, lipophilicity or in vitro proxies for oral bioavailability and brain permeability experimentally measured for many thousands of diverse compounds, allowing prediction of these properties in a broad range of new compounds.

The quality of QSAR models, however, differs for different target classes depending on data availability, with the most advances achieved for the kinase superfamily and aminergic GPCRs. An unbiased benchmark of the best ML QSAR models was given by a recent IDG-DREAM Drug-Kinase Binding Prediction Challenge with the participation of more than 200 experts 78 . The top predictive models in this blind assessment included kernel learning, gradient boosting and DL-based algorithms. The top-performing model (from team Q.E.D) used a kernel regression, protein sequence similarity and affinity values of more than 60,000 compound–kinase pairs between 13,608 compounds and 527 kinases from ChEMBL 79 and Drug Target Commons 80 databases as the training data. The best DL model used as many as 900,000 experimental ligand-binding data points for training, but still trailed the much simpler kernel model in performance. The best models achieved a Spearman rank coefficient of 0.53 with a root-mean-square error of 0.95 for the predicted versus experimental pKd values in the challenge set. Such accuracy was found to be on par with the accuracy and recall of single-point experimental assays for kinase inhibition, and may be useful in screenings for the initial hits for less explored kinases and guiding lead optimization. Note, however, that the kinase family is unique as it is the largest class of more than 500 targets, all possessing similar orthosteric binding pockets and sharing high cross-selectivity. The distant second family with systematic cross-reactivity comprises about 50 aminergic GPCRs, whereas other GPCR families and other cross-reactive protein families are much smaller. The performance and generalizability of ML and DL methods for these and other targets remain to be tested.
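For readers less familiar with the two metrics quoted for the challenge, the following minimal sketch shows how a Spearman rank coefficient and a root-mean-square error can be computed for predicted versus experimental values. It uses only the Python standard library; the five value pairs are invented for illustration and are not data from the challenge, and the function names are our own.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def rmse(predicted, observed):
    """Root-mean-square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted))

# Invented pKd values, for illustration only.
experimental = [5.0, 6.2, 7.1, 8.0, 9.3]
predicted = [5.5, 5.9, 7.8, 7.4, 8.6]
print(round(spearman(experimental, predicted), 2))  # 0.9
print(round(rmse(predicted, experimental), 2))      # 0.58
```

The rank coefficient reflects how well a model orders compounds, which matters for prioritizing hits, whereas the root-mean-square error reflects the size of the absolute error in the predicted affinity.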

The development of broadly generalizable or even universal models is the key aspiration of AI-driven drug discovery. One of the directions here is to extract general models of binding affinities (binding score functions) from data on both known ligand activities and corresponding protein–ligand 3D structures, for example, collected in the PDBbind database 81 or obtained from docking. Such models explore various approaches to represent the data and network architectures, including spatial graph-convolutional models 82 , 83 , 3D deep convolutional neural networks 84 , 85 or their combinations 86 . A recent study, however, found that regardless of neural network architecture, an explicit description of non-covalent intermolecular interactions in the PDBbind complexes does not provide any statistical advantage compared with simpler approximations of only ligand or only receptor that omit the interactions 87 . Therefore, the good performances of DL models based on PDBbind rely on memorizing similar ligands and receptors, rather than on capturing general information about their binding. One possible explanation for this phenomenon is that the PDBbind database does not have an adequate representation of ‘negative space’, that is, ligands with suboptimal interaction patterns to enforce the training.

This mishap exemplifies the need for a better understanding of behaviour of DL models and their dependence on the training data, which is widely recognized in the AI community. It has been shown that DL models, especially based on limited datasets lacking negative data, are prone to overtraining and spurious performance, sometimes leading to whole classes of models deemed ‘useless’ 88 or severely biased by subjective factors defining the training dataset 89 . Statistical tools are being developed to define the applicability range and carefully validate the performance of the models. One of the proposed concepts is the predictability, computability and stability framework for ‘veridical data science’ 90 . Adequate selection of quality data has been specifically identified by leaders of the AI community as the major requirement for closing the ‘production gap’, or the inability of ML models to succeed when they are deployed in the real world, thus calling for a data-centric approach to AI 91 , 92 . There have also been attempts to develop tools to make AI ‘explainable’, that is, able to formulate some general trends in the data, specifically in the drug discovery applications 93 .

Despite these challenges and limitations, AI is already starting to have a substantial effect on drug discovery, with the first AI-based drug candidates making it into preclinical and clinical studies. For kinases, the AI-driven compounds were reported as potent and effective in vivo inhibitors of the receptor tyrosine kinase DDR1, which is involved in fibrosis 9 . Phase I clinical trials have been announced for ISM001-055 (also known as INS018_055) for the treatment of idiopathic pulmonary fibrosis 10 , although the identity of the compound and its target has not been disclosed. For GPCRs, AI-driven compounds targeting 5-HT1A, dual 5-HT1A–5-HT2A and A2A receptors have recently entered clinical trials, providing further support for the AI-driven drug discovery concept. These first success stories are coming from kinase and GPCR families with already well-studied pharmacology, and the compounds show close chemical similarity to known high-affinity scaffolds 94 . It is important for the next generation of DL drug candidates to improve in novelty and applicability range.

Hybrid computational approaches

As discussed above, physics-based and data-driven approaches have distinct advantages and limitations in predicting ligand potency. Structure-based docking predictions are naturally generalizable to any target with 3D structures and can be more accurate, especially in eliminating false positives as the main challenge of screening. Conversely, data-driven methods may work in lieu of structures and can be faster, especially with GPU acceleration, although they struggle to generalize beyond data-rich classes of targets. Therefore, there are numerous ongoing efforts to combine physics-based and data-driven approaches in some synergistic ways in general 95 , and in drug discovery specifically 96 .

In virtual screening approaches, a synergistic use of physics-based docking with data-based scoring functions may be highly beneficial. Moreover, if the physics-based and data-based scoring functions are relatively independent and both generate enrichment in the selected focused libraries, their combination can reduce the false-positive rates and improve the quality of the hits. This synergy is reflected in the latest D3R Grand Challenge 4 results for ligand IC50 predictions 59 , in which the top methods that used a combination of both physics-based and ML scoring outperformed those that did not use ML. Going forward, thorough benchmarking of physics-based, ML and hybrid approaches will be a key focus of a new Critical Assessment of Computational Hit-finding Experiments (CACHE), which will assess five specific scenarios relevant to practical hit and lead discovery and optimization 97 .
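One simple way to combine two relatively independent scoring functions is a consensus filter that keeps only the compounds ranked highly by both. The sketch below illustrates that idea and is not the protocol of any method cited here. The compound identifiers and scores are invented, and both scores are assumed to follow a 'lower is better' convention. If the two filters were fully independent, a non-binder passing them with probabilities p1 and p2 would pass both with probability p1 × p2, which is the intuition behind the reduced false-positive rate.

```python
def top_fraction(scores, fraction):
    """Return the IDs of the best-scoring `fraction` of compounds (lower score = better)."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)
    return set(ranked[:n_keep])

def consensus_hits(docking_scores, ml_scores, fraction=0.5):
    """Keep only compounds that fall in the top fraction of BOTH scoring functions."""
    return top_fraction(docking_scores, fraction) & top_fraction(ml_scores, fraction)

# Invented scores for four hypothetical compounds.
docking_scores = {"c1": -10.2, "c2": -9.8, "c3": -7.1, "c4": -6.5}
ml_scores = {"c1": -8.9, "c2": -5.0, "c3": -9.1, "c4": -4.2}
print(consensus_hits(docking_scores, ml_scores))  # {'c1'}
```

In this toy case, c2 is favoured only by docking and c3 only by the data-driven score, so only c1 survives the consensus filter. Practical hybrid pipelines may instead combine scores through weighted sums or rank averaging.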

At a deeper level, the results of accurate physics-based docking (in addition to experimental data, for example, from PDBbind 81 ) can be used to train generalized graph or 3D DL models predicting ligand–receptor affinity. This would help to markedly expand the training dataset and balance positive and negative (suboptimal binding) examples, which is important to avoid the overtraining issues described in ref. 87 . Such DL-based 3D scoring functions for predicting molecular binding affinity from a docked protein−ligand complex are being developed and benchmarked, most recently RTCNN 98 , although their practical utility remains to be demonstrated.

To expand the range of structure-based docking applicability to those targets lacking high-resolution structures, it is also tempting to use AI-derived AlphaFold2 (refs. 99 , 100 ) or RosettaFold 101 3D models, which already show utility in many applications, including protein–protein and protein–peptide docking 102 . Traditional homology models based on close protein similarity, especially when refined with known ligands 103 , have been used in small-molecule docking and virtual screening 104 , therefore AlphaFold2 is expected to further expand the scope of structural modelling and its accuracy. In a recent report, AlphaFold2 models, augmented by other AI approaches, helped to identify a cyclin-dependent kinase 20 (CDK20) small-molecule inhibitor, although at a modest affinity of 8.9 μM (ref. 105 ). More general benchmarking of the performance of AlphaFold2 models in virtual screening, however, gives mixed results. In a benchmark focused on targets with existing crystal structures, most AlphaFold2 models had to be cleaned of loops blocking the binding pocket and/or augmented with known ions or other cofactors to achieve reasonable enrichment of hits 106 . For the more practical cases of targets lacking experimental structures, especially for target classes with less obvious structural homologies in the ligand-binding pocket, the performance of AlphaFold2 models in small-molecule docking showed disappointing results in recent assessments for GPCR and antibacterial targets 107 , 108 . The recently developed AlphaFill approach 109 for ‘transplanting’ small-molecule cofactors and ligands from PDB structures to homologous AlphaFold2 models can potentially help to validate and optimize these models, although further assessment of their utility for docking and virtual screening is ongoing.

To speed up virtual screening of ultra-large chemical libraries, several groups have suggested hybrid iterative approaches, in which results of structure-based docking of a sparse library subset are used to train ML models, which are then used to filter the whole library to further reduce its size. These methods, including MolPal 25 , Active Learning 110 and DeepDocking 111 , report as much as a 14–100-fold reduction in the computational cost for libraries of 1.4 billion compounds, although it is not clear how they would scale to rapidly growing chemical spaces.
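The iterative loop described above can be sketched in a few lines. This toy example is not the algorithm of MolPal, Active Learning or DeepDocking. It assumes that each compound is described by a single integer, it replaces docking with a hypothetical function whose optimum is hidden at compound 699, and it uses a nearest-neighbour lookup instead of a trained ML model. It only shows the control flow: dock a sparse subset, use a cheap surrogate to rank the rest, dock the most promising batch, and repeat.

```python
def dock(compound):
    """Toy stand-in for an expensive docking run (lower score = better).
    Hypothetical: the hidden optimum sits at descriptor value 699."""
    return (compound - 699) ** 2

def predict(compound, docked):
    """1-nearest-neighbour surrogate: borrow the score of the closest docked
    compound and report the distance to it as a tie-breaker."""
    nearest = min(docked, key=lambda d: abs(d - compound))
    return docked[nearest], abs(nearest - compound)

def iterative_screen(library, seed_subset, batch_size, rounds):
    """Dock a sparse seed subset, then repeatedly dock the batch that the
    surrogate ranks best. Returns the best compound and the docking count."""
    docked = {c: dock(c) for c in seed_subset}
    for _ in range(rounds):
        undocked = [c for c in library if c not in docked]
        # Rank by predicted score, then by distance to the supporting data point.
        ranked = sorted(undocked, key=lambda c: predict(c, docked))
        for c in ranked[:batch_size]:
            docked[c] = dock(c)
    best = min(docked, key=docked.get)
    return best, len(docked)

library = range(1000)             # 1,000 hypothetical compounds
seed_subset = range(0, 1000, 100) # sparse initial subset of 10 compounds
best, n_docked = iterative_screen(library, seed_subset, batch_size=20, rounds=2)
print(best, n_docked)  # 699 50
```

In this contrived setting the optimum is recovered after docking 50 of 1,000 compounds. Real score landscapes are far less smooth, so the savings reported for the published methods depend on the quality of the surrogate model and the sampling strategy.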

We should emphasize here that scoring functions in fast-docking algorithms and ML models are primarily designed and trained to effectively separate potential target binders from non-binders, although they are not very accurate in predictions of binding affinities or potencies. For more accurate potency predictions, the smaller focused library of candidate binders selected by the initial AI or docking-based screening can be further analysed and ranked using more elaborate physics-based tools, including free energy perturbation methods for relative 112 and absolute 113 , 114 , 115 free energy of ligand binding. Although these methods are much slower, utilization of GPU accelerated calculations 28 holds the potential for their broader application in post-processing in virtual screening campaigns to further enrich the hit rates for high-affinity candidates (Fig. 2 ), as well as in lead optimization stages.

Future challenges

Further growth of readily accessible chemical spaces

The advent of fast and practical methods for screening gigascale chemical spaces for drug discovery stimulates further growth of these on-demand spaces, supporting better diversity and the overall quality of identified hits and leads. Specifically developed for V-SYNTHES screening, the xREAL extension of Enamine REAL Space now comprises 173 billion compounds 116 , and can be further expanded to 10¹⁵ compounds and beyond by tapping into an even larger building block set (for example, to 680 million MADE building blocks 47 ), by including four-component or five-component scaffolds, and by using new click-like chemistries as they are discovered. Real-world testing of MADE-enhanced REAL Space, and other commercial and proprietary chemical spaces will allow a broader assessment of their synthesizability and overall utility 38 , 117 , 118 . In parallel, specialized ultra-large libraries can be built for important scaffolds underrepresented in general purpose on-demand spaces, for example, screening of a virtual library of 75 million easily synthesizable tetrahydropyridines recently yielded potent agonists for the 5-HT2A receptor 119 .

Further growth of the on-demand chemical space size and diversity is also supported by recent development of new robust reactions for the click-like assembly of building blocks. As well as ‘classical’ azide-alkyne cycloaddition click chemistry 120 , recognized by the 2022 Nobel Prize in chemistry 121 , and optimized click-like reactions including SuFEx 122 , more recent developments such as Ni-electrocatalysed doubly decarboxylative cross-coupling 123 show promise. Other carbon–carbon forming reactions use methyliminodiacetic acid boronates for C(sp²)–C(sp²) couplings 124 , and most recently tetramethyl N-methyliminodiacetic acid boronates 125 for stereospecific C(sp³)–C bond formation. Each of these reactions applied iteratively can generate new on-demand chemical spaces of billions of diverse compounds operating with a limited number of building blocks. Similar to the routinely used automatic assembly of amino acids in peptide synthesis, fully automated processes could be carried out with robots capable of producing a library of drug-like compounds on demand using combinations of a few thousand diverse building blocks 126 , 127 , 128 . Such machines are already working, although scaling-up production of thousands of specialized building blocks remains the bottleneck.

The development of more robust generative chemical spaces can also be supported by new computational approaches in synthetic chemistry, for example, predictions of new iterative reaction sequences 129 or synthetic routes and feasibility from DL-based retrosynthetic analysis 130 . In generative models, synthesizability predictions can be coupled with predictions of potency and other properties towards higher levels of automated chemical design 131 . Thus, generative adversarial networks combined with reinforcement learning (GAN-RL) were recently used to predict synthetic feasibility, novelty and biological activity of compounds, enabling the iterative cycle of in silico optimization, synthesis and testing of the ligands in vitro 50 , 132 . When applied within a set of well-established reactions and pharmacologically explored classes of targets, these approaches already yield useful hits and leads, leading to clinical candidates 50 , 132 . However, the wider potential of automated chemical design concepts and robotic synthesis in drug discovery remains to be seen.

Hybrid in vitro–in silico approaches

Although blind benchmarking and recent prospective screening success stories for the growing number of targets support utility of modern computational tools, there are whole classes of challenging targets, in which existing in silico screening approaches are not expected to fare very well by themselves. Some of the hardest cases are targets with cryptic or shallow pockets that have to open or undergo a substantial induced fit to engage ligand, as often found when targeting allosteric sites, for example, in kinases or GPCRs, or protein–protein interactions in signalling pathways.

Although bioinformatics and molecular dynamics approaches can help to detect and analyse allosteric and cryptic pockets 133 , computational tools alone are often insufficient to support ligand discovery for such challenging sites. The cryptic and shallow pockets, however, have been rather successfully handled by fragment-based drug discovery approaches, which start with experimental screening for the binding of small fragments. The initial hits are found by very sensitive methods, such as BIACORE, NMR, X-ray 134 , 135 and potentially cryo-electron microscopy 136 , to reliably detect weak binding, usually in the 10–100-μM range. The initial screening of the target can be also performed with fragments decorated by a chemical warhead enabling proximity-driven covalent attachment of a low-affinity ligand 137 . In either case, elaboration of initial fragment hits to full high-affinity ligands is the key bottleneck of fragment-based drug discovery, which requires a major effort involving ‘growing’ the fragment or linking two or more fragments together. This is usually an iterative process involving custom ligand design and synthesis that can take many years 134 , 138 . At the same time, structure-based virtual screening can help to computationally elaborate the fragments to match the experimentally identified conformation of the target binding pocket. Most cost-effectively, this approach can be applied when fragment hits are identified from the on-demand space building blocks or their close analogues for easy elaboration in the same on-demand space 139 .

The recent examples of hybrid fragment-based computational design approaches targeting SARS-CoV-2 inhibitors highlight the challenges presented by such targets and allow head-to-head comparisons to ultra-large VLS. One of the studies was aimed at the SARS-CoV-2 NSP3 conserved macrodomain enzyme (Mac1), which is a target critical for the pathogenesis and lethality of the virus. Building on crystallographic detection of the low-affinity (180 μM) fragments weakly binding Mac1 (ref. 139 ), merging of the fragments identified a 1-μM hit, quickly optimized by catalogue synthesis to a 0.4-μM lead 140 . In the same study, an ultra-scale screening of 400 million compounds from the REAL database identified more than 100 new diverse chemotypes of drug-like ligands, with follow-up SAR-by-catalogue optimization yielding a 1.7-μM lead 140 . For the SARS-CoV-2 main protease Mpro, the COVID Moonshot initiative published results of crystallographic screening of 1,500 small fragments with 71 hits bound in different subpockets of the shallow active site, albeit none of them showing in vitro inhibition of protease even at 100 μM (ref. 141 ). Numerous groups crowdsourcing the follow-up computational design and screening of merged and growing fragments helped to discover several SAR series, including a non-covalent Mpro inhibitor with an enzymatic IC50 of 21 μM. Further optimization by both structure-based and AI-driven computational approaches, which used more than 10 million MADE Enamine building blocks, led to the discovery of preclinical candidates with cell-based IC50 in the approximately 100-nM range, approaching the potency of nirmatrelvir 65 . The enormous scale, urgency and complexity of this Moonshot effort, with more than 2,400 compounds synthesized on demand and measured in more than 10,000 assays, are unprecedented, and this highlights the challenges of de novo design of non-covalent inhibitors of Mpro.

Beyond the Moonshot initiative, a flood of virtual screening efforts yielded mostly disappointing results 62 , for example, the antimalaria drug ebselen, which was proposed in an early virtual screen 142 , failed in clinical trials. Most of these studies, however, screened small-ligand sets focused on repurposing existing drugs, lacked experimental support and used the first structure of Mpro solved in a covalent ligand complex (PDB ID: 6LU7) that was suboptimal for docking non-covalent molecules 142 .

In comparison, several studies screening ultra-large libraries were able to identify de novo non-covalent Mpro inhibitors in the 10–100-μM range 24 , 62 , 63 , 143 , while experimentally testing only a few hundred synthesized on-demand compounds. One of these studies further elaborated on these weak VLS hits by testing their Enamine on-demand analogues, revealing a lead with IC50 = 1 μM in cell-based assays, and validating its non-covalent binding crystallographically 63 . Another study based on a later, more suitable non-covalent co-crystal structure of Mpro (PDB ID: 6W63) used an ultra-large docking and optimization strategy to discover even more potent 38-nM lead compounds 64 . Note that, although the results of the initial ultra-large screenings for Mpro were modest, they were on par with the much more elaborate and expensive efforts of the Moonshot hybrid approach, with simple on-demand optimization leading to similar-quality preclinical candidates. These examples suggest that even for challenging shallow pockets, structure-based virtual screening can often provide a viable alternative when performed at gigascale and supported by accurate structures, sufficient testing and optimization effort.

Outlook towards computer-driven drug discovery

With all the challenges and caveats, the emerging capability of in silico tools to effectively tap into the enormous abundance and diversity of drug-like on-demand chemical spaces at the key target-to-hit-to-lead-to-clinic stages makes it tempting to call for the transformation of the DDD ecosystem from computer-aided to computer-driven 144 (Fig. 4). At the early hit identification stage, the ultra-scale virtual screening approaches, both structure-based and AI-based, are becoming mainstream in providing fast and cost-effective entry points into drug discovery campaigns. At the hit-to-lead stage, the more elaborate potency prediction tools such as free energy perturbation and AI-based QSAR often guide rational optimization of ligand potency. Beyond the on-target potency and selectivity, various data-driven computational tools are routinely used in multiparameter optimization of the lead series that includes ADMET and PK properties. Of note, chemical spaces of more than 10¹⁰ diverse compounds are likely to contain millions of initial hits for each target 20 (Box 1), thousands of potent and selective leads and, with some limited medicinal chemistry in the same highly tractable chemical space, drug candidates ready for preclinical studies. To harness this potential, the computational tools need to become more robust and better integrated into the overall discovery pipeline to ensure their impact in translating initial hits into preclinical and clinical development.

Fig. 4: Schematic comparison of the standard HTS plus custom synthesis-driven discovery pipeline versus the computationally driven pipeline. The latter is based on easily accessible on-demand or generative virtual chemical spaces, as well as structure-based and AI-based computational tools that streamline each step of the drug discovery process.

One should not forget here that any computational models, however useful or accurate, may never ensure that all of the predictions are correct. In practice, the best virtual screening campaigns result in 10–40% of candidate hits confirmed in experimental validation, whereas the best affinity predictions used in optimization rarely have accuracy better than 1 kcal mol⁻¹ root-mean-square error. Similar limitations apply to current computational models predicting ADMET and PK properties. Therefore, computational predictions always need experimental validation in robust in vitro and in vivo assays at each step of the pipeline. At the same time, experimental testing of predictions also provides data that can feed back into improving the quality of the models by expanding their training datasets, especially for the ligand property predictions. Thus, the DL-based QSPR models will greatly benefit from further accumulating data in cell-permeability assays such as CACO-2 and MDCK, as well as new advanced technologies such as organs-on-a-chip or functional organoids to provide better estimates of ADMET and PK properties without cumbersome in vivo experiments. The ability to train ADMET and PK models with in vitro assay data representing the most relevant species for drug development (typically mouse, rat and human) would also help to address species variability as a major challenge for successful translational studies. All of this creates a virtuous cycle for improving computational models to the point at which they can drive compound selection for most DDD end points. When combined with more accurate in vitro testing, this may reduce and eventually eliminate animal test requirements (as recently indicated by FDA) 145 .
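To put the 1 kcal mol⁻¹ figure into perspective, the standard thermodynamic relation ΔG = RT ln Kd can be used to convert a free energy error into a fold error in the dissociation constant. The short sketch below assumes a temperature of 298.15 K; the function names are our own and the calculation is a generic unit conversion rather than part of any cited method.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1

def fold_error_in_kd(dg_error_kcal, temperature_k=298.15):
    """Fold change in Kd that corresponds to a given error in binding free energy."""
    return math.exp(dg_error_kcal / (R_KCAL * temperature_k))

def pkd_error(dg_error_kcal, temperature_k=298.15):
    """The same error expressed in log10 units (pKd)."""
    return dg_error_kcal / (R_KCAL * temperature_k * math.log(10))

print(round(fold_error_in_kd(1.0), 1))  # 5.4
print(round(pkd_error(1.0), 2))         # 0.73
```

Under this assumption, an error of 1 kcal mol⁻¹ corresponds to a roughly fivefold uncertainty in the predicted affinity, or about 0.7 log units, which illustrates why experimental validation remains necessary even for the best affinity predictions.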

Building hybrid in silico–in vitro pipelines with easy access to the enormous on-demand chemical space at all stages of the gene-to-lead process can help to generate abundant pools of diverse lead compounds with optimal potency, selectivity and ADMET and PK properties, resulting in less compromise in multiparameter optimization for clinical candidates. Running such data-rich computationally driven pipelines requires overarching data management tools for drug discovery, many of them being implemented in pharma and academic DDD centres 146 , 147 . Building computationally driven pipelines will also help to reveal weak or missing links, in which new approaches and additional data may be needed to generate improved models, thus helping to fill the remaining computational gaps in the DDD pipeline. Provided this systematic integration continues, computer-driven ligand discovery has a great potential to reduce the entry barriers for generating molecules for numerous lines of inquiry, whether it is in vivo probes for new and understudied targets 148 , polypharmacology and pluridimensional signalling, or drug candidates for rare diseases and personalized medicine.

Austin, D. & Hayford, T. Research and development in the pharmaceutical industry. CBO https://www.cbo.gov/publication/57126 (2021).

Sun, D., Gao, W., Hu, H. & Zhou, S. Why 90% of clinical drug development fails and how to improve it? Acta Pharm. Sin. B 12 , 3049–3062 (2022).

Bajorath, J. Computer-aided drug discovery. F1000Res. 4 , F1000 Faculty Rev-1630 (2015).

Van Drie, J. H. Computer-aided drug design: the next 20 years. J. Comput. Aided Mol. Des. 21 , 591–601 (2007).

Talele, T. T., Khedkar, S. A. & Rigby, A. C. Successful applications of computer aided drug discovery: moving drugs from concept to the clinic. Curr. Top. Med. Chem. 10 , 127–141 (2010).

Macalino, S. J. Y., Gosu, V., Hong, S. & Choi, S. Role of computer-aided drug design in modern drug discovery. Arch. Pharmacal. Res. 38 , 1686–1701 (2015).

Sabe, V. T. et al. Current trends in computer aided drug design and a highlight of drugs discovered via computational techniques: a review. Eur. J. Med. Chem. 224 , 113705 (2021).

Jayatunga, M. K., Xie, W., Ruder, L., Schulze, U. & Meier, C. AI in small-molecule drug discovery: a coming wave. Nat. Rev. Drug Discov. 21 , 175–176 (2022).

Zhavoronkov, A. et al. Deep learning enables rapid identification of potent DDR1 kinase inhibitors. Nat. Biotechnol. 37 , 1038–1040 (2019). This study claims the discovery of a lead candidate in just 21 days, using generative AI, synthesis, and in vitro and in vivo testing of the compounds.

US National Library of Medicine. ClinicalTrials.gov https://clinicaltrials.gov/ct2/show/NCT05154240#contactlocation (2022).

Schrodinger. Schrödinger announces FDA clearance of investigational new drug application for SGR-1505, a MALT1 inhibitor. Schrodinger https://ir.schrodinger.com/node/8621/pdf (2022). This press release states that combined physics-based and ML methods enabled a computational screen of 8.2 billion compounds and the selection of a clinical candidate after 10 months and only 78 molecules synthesized.

Jones, N. Crystallography: atomic secrets. Nature 505 , 602–603 (2014).

Liu, W. et al. Serial femtosecond crystallography of G protein–coupled receptors. Science 342 , 1521–1524 (2013).

Nannenga, B. L. & Gonen, T. The cryo-EM method microcrystal electron diffraction (MicroED). Nat. Methods 16 , 369–379 (2019).

Fernandez-Leiro, R. & Scheres, S. H. Unravelling biological macromolecules with cryo-electron microscopy. Nature 537 , 339–346 (2016).

Renaud, J.-P. et al. Cryo-EM in drug discovery: achievements, limitations and prospects. Nat. Rev. Drug Discov. 17 , 471–492 (2018).

Congreve, M., de Graaf, C., Swain, N. A. & Tate, C. G. Impact of GPCR structures on drug discovery. Cell 181 , 81–91 (2020).

Santos, R. et al. A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov. 16 , 19–34 (2017).

Grygorenko, O. O. et al. Generating multibillion chemical space of readily accessible screening compounds. iScience 23 , 101681 (2020).

Lyu, J. et al. Ultra-large library docking for discovering new chemotypes. Nature 566 , 224–229 (2019). This ultra-large docking study also carefully assessed the advantages and potential pitfalls of expanding chemical space.

Stein, R. M. et al. Virtual discovery of melatonin receptor ligands to modulate circadian rhythms. Nature 579 , 609–614 (2020). This study shows ultra-large docking that resulted in subnanomolar hits for a GPCR.

Alon, A. et al. Structures of the sigma2 receptor enable docking for bioactive ligand discovery. Nature 600 , 759–764 (2021).

Gorgulla, C. et al. An open-source drug discovery platform enables ultra-large virtual screens. Nature 580 , 663–668 (2020). This study shows iterative library filtering as a first approach to accelerate ultra-large virtual screening.

Gorgulla, C. et al. A multi-pronged approach targeting SARS-CoV-2 proteins using ultra-large virtual screening. iScience 24 , 102021 (2021).

Graff, D. E., Shakhnovich, E. I. & Coley, C. W. Accelerating high-throughput virtual screening through molecular pool-based active learning. Chem. Sci. 12 , 7866–7881 (2021). This study introduces acceleration of ultra-large screening by iteratively combining DL and docking.

Sadybekov, A. A. et al. Synthon-based ligand discovery in virtual libraries of over 11 billion compounds. Nature 601 , 452–459 (2022). This study introduces the modular concept for screening gigascale spaces, V-SYNTHES, and validates its performance on GPCR and kinase targets.

Yang, X., Wang, Y., Byrne, R., Schneider, G. & Yang, S. Concepts of artificial intelligence for computer-assisted drug discovery. Chem. Rev. 119 , 10520–10594 (2019).

Pandey, M. et al. The transformational role of GPU computing and deep learning in drug discovery. Nat. Mach. Intell. 4 , 211–221 (2022).

Blay, V., Tolani, B., Ho, S. P. & Arkin, M. R. High-throughput screening: today’s biochemical and cell-based approaches. Drug Discov. Today 25 , 1807–1821 (2020).

Bohacek, R. S., McMartin, C. & Guida, W. C. The art and practice of structure-based drug design: a molecular modeling perspective. Med. Res. Rev. 16 , 3–50 (1996).

Lyu, J., Irwin, J. J. & Shoichet, B. K. Modeling the expansion of virtual screening libraries. Nat. Chem. Biol. https://doi.org/10.1038/s41589-022-01234-w (2023).

Article   PubMed   Google Scholar  

Tomberg, A. & Boström, J. Can easy chemistry produce complex, diverse, and novel molecules? Drug Discov. Today 25 , 2174–2181 (2020).

Muchiri, R. N. & van Breemen, R. B. Affinity selection–mass spectrometry for the discovery of pharmacologically active compounds from combinatorial libraries and natural products. J. Mass Spectrom. 56 , e4647 (2021).

Fitzgerald, P. R. & Paegel, B. M. DNA-encoded chemistry: drug discovery from a few good reactions. Chem. Rev. 121 , 7155–7177 (2021).

Neri, D. & Lerner, R. A. DNA-encoded chemical libraries: a selection system based on endowing organic compounds with amplifiable information. Annu. Rev. Biochem. 87 , 479–502 (2018).

McCloskey, K. et al. Machine learning on DNA-encoded libraries: a new paradigm for hit finding. J. Med. Chem. 63 , 8857–8866 (2020).

Walters, W. P. Virtual chemical libraries. J. Med. Chem. 62 , 1116–1124 (2019).

Warr, W. A., Nicklaus, M. C., Nicolaou, C. A. & Rarey, M. Exploration of ultralarge compound collections for drug discovery. J. Chem. Inf. Model. 62 , 2021–2034 (2022). This is a comprehensive review of the history and recent developments of the on-demand and generative chemical spaces .

Enamine. REAL Database. Enamine https://enamine.net/compound-collections/real-compounds/real-database (2020).

Hartenfeller, M. et al. A collection of robust organic synthesis reactions for in silico molecule design. J. Chem. Inf. Model. 51 , 3093–3098 (2011).

Patel, H. et al. SAVI, in silico generation of billions of easily synthesizable compounds through expert-system type rules. Sci. Data 7 , 384 (2020).

Article   PubMed   PubMed Central   Google Scholar  

Irwin, J. J. et al. ZINC20-A free ultralarge-scale chemical database for ligand discovery. J. Chem. Inf. Model. 60 , 6065–6073 (2020).

Hu, Q. et al. Pfizer Global Virtual Library (PGVL): a chemistry design tool powered by experimentally validated parallel synthesis information. ACS Comb. Sci. 14 , 579–589 (2012).

Nicolaou, C. A., Watson, I. A., Hu, H. & Wang, J. The Proximal Lilly Collection: mapping, exploring and exploiting feasible chemical space. J. Chem. Inf. Model. 56 , 1253–1266 (2016).

Enamine. REAL Space. Enamine https://enamine.net/library-synthesis/real-compounds/real-space-navigator (2022).

Bellmann, L., Penner, P., Gastreich, M. & Rarey, M. Comparison of combinatorial fragment spaces and its application to ultralarge make-on-demand compound catalogs. J. Chem. Inf. Model. 62 , 553–566 (2022).

Enamine. Make on-demand building blocks (MADE). Enamine https://enamine.net/building-blocks/made-building-blocks (2022).

Hoffmann, T. & Gastreich, M. The next level in chemical space navigation: going far beyond enumerable compound libraries. Drug Discov. Today 24 , 1148–1156 (2019).

Ruddigkeit, L., van Deursen, R., Blum, L. C. & Reymond, J.-L. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. J. Chem. Inf. Model. 52 , 2864–2875 (2012).

Vanhaelen, Q., Lin, Y.-C. & Zhavoronkov, A. The advent of generative chemistry. ACS Med. Chem. Lett. 11 , 1496–1505 (2020).

Ballante, F., Kooistra, A. J., Kampen, S., de Graaf, C. & Carlsson, J. Structure-based virtual screening for ligands of G protein-coupled receptors: what can molecular docking do for you? Pharmacol. Rev. 73 , 527–565 (2021).

Neves, M. A., Totrov, M. & Abagyan, R. Docking and scoring with ICM: the benchmarking results and strategies for improvement. J. Comput. Aided Mol. Des. 26 , 675–686 (2012).

Meiler, J. & Baker, D. ROSETTALIGAND: protein-small molecule docking with full side-chain flexibility. Proteins 65 , 538–548 (2006).

Lorber, D. M. & Shoichet, B. K. Flexible ligand docking using conformational ensembles. Protein Sci. 7 , 938–950 (1998).

Trott, O. & Olson, A. J. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31 , 455–461 (2010).

CAS   PubMed   PubMed Central   Google Scholar  

Halgren, T. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 2. Enrichment factors in database screening. J. Med. Chem. 47 , 1750–1759 (2004).

Friesner, R. A. et al. Glide: a new approach for rapid, accurate docking and scoring. 1. Method and assessment of docking accuracy. J. Med. Chem. 47 , 1739–1749 (2004).

Gaieb, Z. et al. D3R grand challenge 3: blind prediction of protein-ligand poses and affinity rankings. J. Comput. Aided Mol. Des. 33 , 1–18 (2019).

Parks, C. D. et al. D3R grand challenge 4: blind prediction of protein-ligand poses, affinity rankings, and relative binding free energies. J. Comput. Aided Mol. Des. 34 , 99–119 (2020).

Bender, B. J. et al. A practical guide to large-scale docking. Nat. Protoc. 16 , 4799–4832 (2021).

Manglik, A. et al. Structure-based discovery of opioid analgesics with reduced side effects. Nature 537 , 185–190 (2016).

Cerón-Carrasco, J. P. When virtual screening yields inactive drugs: dealing with false theoretical friends. ChemMedChem 17 , e202200278 (2022).

PubMed   PubMed Central   Google Scholar  

Rossetti, G. G. et al. Non-covalent SARS-CoV-2 M pro inhibitors developed from in silico screen hits. Sci. Rep. 12 , 2505 (2022).

Luttens, A. et al. Ultralarge virtual screening identifies SARS-CoV-2 main protease inhibitors with broad-spectrum activity against coronaviruses. J. Am. Chem. Soc. 144 , 2905–2920 (2022). This study compares fragment-based and ultra-large screening-based discovery of lead candidates for the challenging target .

Owen, D. R. et al. An oral SARS-CoV-2 M pro inhibitor clinical candidate for the treatment of COVID-19. Science 374 , 1586–1593 (2021).

Böhm, H.-J. The computer program LUDI: a new method for the de novo design of enzyme inhibitors. J. Comput. Aided Mol. Des. 6 , 61–78 (1992).

Article   ADS   PubMed   Google Scholar  

Beroza, P. et al. Chemical space docking enables large-scale structure-based virtual screening to discover ROCK1 kinase inhibitors. Nat. Commun. 13 , 6447 (2022).

Jumper, J. et al. Applying and improving AlphaFold at CASP14. Proteins 89 , 1711–1721 (2021).

Vamathevan, J. et al. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discov. 18 , 463–477 (2019).

Schneider, P. et al. Rethinking drug design in the artificial intelligence era. Nat. Rev. Drug Discov. 19 , 353–364 (2020). This article provides a comprehensive introduction to DL approaches in drug discovery .

Elbadawi, M., Gaisford, S. & Basit, A. W. Advanced machine-learning techniques in drug discovery. Drug Discov. Today 26 , 769–777 (2021).

Bender, A. & Cortés-Ciriano, I. Artificial intelligence in drug discovery: what is realistic, what are illusions? Part 1: ways to make an impact, and why we are not there yet. Drug Discov. Today 26 , 511–524 (2021).

Davies, M. et al. Improving the accuracy of predicted human pharmacokinetics: lessons learned from the AstraZeneca drug pipeline over two decades. Trends Pharmacol. Sci. 41 , 390–408 (2020).

Schneckener, S. et al. Prediction of oral bioavailability in rats: transferring insights from in vitro correlations to (deep) machine learning models using in silico model outputs and chemical structure parameters. J. Chem. Inf. Model. 59 , 4893–4905 (2019).

Cherkasov, A. et al. QSAR modeling: where have you been? Where are you going to? J. Med. Chem. 57 , 4977–5010 (2014).

Keiser, M. J. et al. Predicting new molecular targets for known drugs. Nature 462 , 175–181 (2009).

Guney, E., Menche, J., Vidal, M. & Barábasi, A.-L. Network-based in silico drug efficacy screening. Nat. Commun. 7 , 10331 (2016).

Cichońska, A. et al. Crowdsourced mapping of unexplored target space of kinase inhibitors. Nat. Commun. 12 , 3307 (2021).

Gaulton, A. et al. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40 , D1100–D1107 (2012).

Tang, J. et al. Drug Target Commons: a community effort to build a consensus knowledge base for drug–target interactions. Cell Chem. Biol. 25 , 224–229.e222 (2018).

Liu, Z. et al. PDB-wide collection of binding data: current status of the PDBbind database. Bioinformatics 31 , 405–412 (2015).

Gaudelet, T. et al. Utilizing graph machine learning within drug discovery and development. Brief. Bioinform. 22 , bbab159 (2021).

Son, J. & Kim, D. Development of a graph convolutional neural network model for efficient prediction of protein–ligand binding affinities. PLoS ONE 16 , e0249404 (2021).

Stepniewska-Dziubinska, M. M., Zielenkiewicz, P. & Siedlecki, P. Improving detection of protein–ligand binding sites with 3D segmentation. Sci. Rep. 10 , 5035 (2020).

Jiménez, J., Škalič, M., Martínez-Rosell, G. & De Fabritiis, G. KDEEP: protein–ligand absolute binding affinity prediction via 3D-convolutional neural networks. J. Chem. Inf. Model. 58 , 287–296 (2018).

Jones, D. et al. Improved protein–ligand binding affinity prediction with structure-based deep fusion inference. J. Chem. Inf. Model. 61 , 1583–1592 (2021).

Volkov, M. et al. On the frustration to predict binding affinities from protein–ligand structures with deep neural networks. J. Med. Chem. 65 , 7946–7958 (2022).

Roberts, M. et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans. Nat. Mach. Intell. 3 , 199–217 (2021).

Beker, W. et al. Machine learning may sometimes simply capture literature popularity trends: a case study of heterocyclic Suzuki–Miyaura coupling. J. Am. Chem. Soc. 144 , 4819–4827 (2022).

Yu, B. & Kumbier, K. Veridical data science. Proc. Natl Acad. Sci. USA 117 , 3920–3929 (2020). This perspective article lays a foundation for veridical AI .

Article   ADS   MathSciNet   CAS   PubMed   PubMed Central   MATH   Google Scholar  

Ng, A., Laird, D. & He, L. Data-centric AI competition. DeepLearning AI https://https-deeplearning-ai.github.io/data-centric-comp/ (2021).

Miranda, L. J. Towards data-centric machine learning: a short review. LJ Miranda https://ljvmiranda921.github.io/notebook/2021/07/30/data-centric-ml/ (2021).

Jiménez-Luna, J., Grisoni, F. & Schneider, G. Drug discovery with explainable artificial intelligence. Nat. Mach. Intell. 2 , 573–584 (2020).

Wills, T. AI drug discovery: assessing the first AI-designed drug candidates to go into human clinical trials. CAS https://www.cas.org/resources/cas-insights/drug-discovery/ai-designed-drug-candidates (2022).

Meng, C., Seo, S., Cao, D., Griesemer, S. & Liu, Y. When physics meets machine learning: a survey of physics-informed machine learning. Preprint at https://doi.org/10.48550/arXiv.2203.16797 (2022).

Thomas, M., Bender, A. & de Graaf, C. Integrating structure-based approaches in generative molecular design. Curr. Opin. Struct. Biol. 79 , 102559 (2023).

Ackloo, S. et al. CACHE (Critical Assessment of Computational Hit-finding Experiments): a public–private partnership benchmarking initiative to enable the development of computational methods for hit-finding. Nat. Rev. Chem. 6 , 287–295 (2022). This is an important community initiative for comprehensive performance assessment of computational drug discovery methods .

MolSoft. Rapid isostere discovery engine (RIDE). MolSoft http://molsoft.com/RIDE.html (2022).

Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596 , 583–589 (2021).

Tunyasuvunakool, K. et al. Highly accurate protein structure prediction for the human proteome. Nature 596 , 590–596 (2021).

Baek, M. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science 373 , 871–876 (2021).

Akdel, M. A structural biology community assessment of AlphaFold2 applications. Nat. Struct. Mol. Biol. 29 , 1056–1067 (2022).

Katritch, V., Rueda, M. & Abagyan, R. Ligand-guided receptor optimization. Methods Mol. Biol. 857 , 189–205 (2012).

Carlsson, J. et al. Ligand discovery from a dopamine D 3 receptor homology model and crystal structure. Nat. Chem. Biol. 7 , 769–778 (2011).

Ren, F. et al. AlphaFold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel cyclin-dependent kinase 20 (CDK20) small molecule inhibitor. Chem. Sci. 14 , 1443–1452 (2023).

Zhang, Y. et al. Benchmarking refined and unrefined AlphaFold2 structures for hit discovery. J. Chem. Inf. Model. 63 , 1656–1667 (2023).

He, X.-h. et al. AlphaFold2 versus experimental structures: evaluation on G protein-coupled receptors. Acta Pharmacol. Sin. 44 , 1–7 (2022).

Wong, F. et al. Benchmarking AlphaFold-enabled molecular docking predictions for antibiotic discovery. Mol. Syst. Biol. 18 , e11081 (2022).

Hekkelman, M. L., de Vries, I., Joosten, R. P. & Perrakis, A. AlphaFill: enriching AlphaFold models with ligands and cofactors. Nat. Methods 20 , 205–213 (2023).

Yang, Y. et al. Efficient exploration of chemical space with docking and deep learning. J. Chem. Theory Comput. 17 , 7106–7119 (2021).

Gentile, F. et al. Artificial intelligence-enabled virtual screening of ultra-large chemical libraries with deep docking. Nat. Protoc. 17 , 672–697 (2022).

Schindler, C. E. M. et al. Large-scale assessment of binding free energy calculations in active drug discovery projects. J. Chem. Inf. Model. 60 , 5457–5474 (2020).

Chen, W., Cui, D., Abel, R., Friesner, R. A. & Wang, L. Accurate calculation of absolute protein–ligand binding free energies. Preprint at https://doi.org/10.26434/chemrxiv-2022-2t0dq-v2 (2022).

Khalak, Y. et al. Alchemical absolute protein–ligand binding free energies for drug design. Chem. Sci. 12 , 13958–13971 (2021).

Cournia, Z. et al. Rigorous free energy simulations in virtual screening. J. Chem. Inf. Model. 60 , 4153–4169 (2020).

xREAL Chemical Space, Chemspace , https://chem-space.com/services#v-synthes (2023).

Rarey, M., Nicklaus, M. C. & Warr, W. Special issue on reaction informatics and chemical space. J. Chem. Inf. Model. 62 , 2009–2010 (2022).

Zabolotna, Y. et al. A close-up look at the chemical space of commercially available building blocks for medicinal chemistry. J. Chem. Inf. Model. 62 , 2171–2185 (2022).

Kaplan, A. L. et al. Bespoke library docking for 5-HT 2A receptor agonists with antidepressant activity. Nature 610 , 582–591 (2022).

Krasiński, A., Fokin, V. V. & Sharpless, K. B. Direct synthesis of 1,5-disubstituted-4-magnesio-1,2,3-triazoles, revisited. Org. Lett. 6 , 1237–1240 (2004).

The Nobel Prize in Chemistry. nobelprize.org , https://www.nobelprize.org/prizes/chemistry/2022/summary/ (2022)

Dong, J., Sharpless, K. B., Kwisnek, L., Oakdale, J. S. & Fokin, V. V. SuFEx-based synthesis of polysulfates. Angew. Chem. Int. Ed. Engl. 53 , 9466–9470 (2014).

Zhang, B. et al. Ni-electrocatalytic C sp 3 -C sp 3 doubly decarboxylative coupling. Nature 606 , 313–318 (2022).

Gillis, E. P. & Burke, M. D. Iterative cross-couplng with MIDA boronates: towards a general platform for small molecule synthesis. Aldrichimica Acta 42 , 17–27 (2009).

Blair, D. J. et al. Automated iterative C sp 3 –C bond formation. Nature 604 , 92–97 (2022). This study provides a chemical approach for automation of the C–C bond formation in small-molecule synthesis .

Li, J. et al. Synthesis of many different types of organic small molecules using one automated process. Science 347 , 1221–1226 (2015).

Trobe, M. & Burke, M. D. The molecular industrial revolution: automated synthesis of small molecules. Angew. Chem. Int. Ed. 57 , 4192–4214 (2018).

Bubliauskas, A. et al. Digitizing chemical synthesis in 3D printed reactionware. Angew. Chem. Int. Ed. 61 , e202116108 (2022).

Molga, K. et al. A computer algorithm to discover iterative sequences of organic reactions. Nat. Synth. 1 , 49–58 (2022).

Article   ADS   Google Scholar  

Segler, M. H. S., Preuss, M. & Waller, M. P. Planning chemical syntheses with deep neural networks and symbolic AI. Nature 555 , 604–610 (2018).

Goldman, B., Kearnes, S., Kramer, T., Riley, P. & Walters, W. P. Defining levels of automated chemical design. J. Med. Chem. 65 , 7073–7087 (2022).

Grisoni, F. et al. Combining generative artificial intelligence and on-chip synthesis for de novo drug design. Sci. Adv. 7 , eabg3338 (2021).

Wagner, J. R. et al. Emerging computational methods for the rational discovery of allosteric drugs. Chem. Rev. 116 , 6370–6390 (2016).

Davis, B. J. & Hubbard, R. E. in Structural Biology in Drug Discovery (ed. Renaud, J.-P.) 79–98 (2020).

de Souza Neto, L. R. et al. In silico strategies to support fragment-to-lead optimization in drug discovery. Front. Chem. 8 , 93 (2020).

Article   ADS   PubMed   PubMed Central   Google Scholar  

Saur, M. et al. Fragment-based drug discovery using cryo-EM. Drug Discov. Today 25 , 485–490 (2020).

Kuljanin, M. et al. Reimagining high-throughput profiling of reactive cysteines for cell-based screening of large electrophile libraries. Nat. Biotechnol. 39 , 630–641 (2021).

Muegge, I., Martin, Y. C., Hajduk, P. J. & Fesik, S. W. Evaluation of PMF scoring in docking weak ligands to the FK506 binding protein. J. Med. Chem. 42 , 2498–2503 (1999).

Schuller, M. et al. Fragment binding to the Nsp3 macrodomain of SARS-CoV-2 identified through crystallographic screening and computational docking. Sci. Adv. 7 , eabf8711 (2021).

Gahbauer, S. et al. Iterative computational design and crystallographic screening identifies potent inhibitors targeting the Nsp3 macrodomain of SARS-CoV-2. Proc. Natl Acad. Sci. USA 120 , e2212931120 (2023). This article demonstrates the application of both hybrid fragment screening-and-merging design and ultra-large library screening to a challenging viral target .

Achdout, H. et al. Open science discovery of oral non-covalent SARS-CoV-2 main protease inhibitor therapeutics. Preprint at https://doi.org/10.1101/2020.10.29.339317 (2022).

Jin, Z. et al. Structure of M pro from SARS-CoV-2 and discovery of its inhibitors. Nature 582 , 289–293 (2020).

Ton, A. T., Gentile, F., Hsing, M., Ban, F. & Cherkasov, A. Rapid identification of potential inhibitors of SARS-CoV-2 main protease by deep docking of 1.3 billion compounds. Mol. Inform. 39 , e2000028 (2020).

Frye, L., Bhat, S., Akinsanya, K. & Abel, R. From computer-aided drug discovery to computer-driven drug discovery. Drug Discov. Today Technol. 39 , 111–117 (2021).

Wadman, M. FDA no longer needs to require animal tests before human drug trials. Science , https://doi.org/10.1126/science.adg6264 (2023).

Stiefl, N. et al. FOCUS—development of a global communication and modeling platform for applied and computational medicinal chemists. J. Chem. Inf. Model. 55 , 896–908 (2015).

Schrodinger. LiveDesign. Schrodinger https://www.schrodinger.com/sites/default/files/general_ld_rgb_080119_forweb.pdf . (accessed 5 April 2023)

Müller, S. et al. Target 2035—update on the quest for a probe for every protein. RSC Med. Chem. 13 , 13–21 (2022).

Verdonk, M. L., Cole, J. C., Hartshorn, M. J., Murray, C. W. & Taylor, R. D. Improved protein–ligand docking using GOLD. Proteins 52 , 609–623 (2003).

Miller, E. B. et al. Reliable and accurate solution to the induced fit docking problem for protein–ligand binding. J. Chem. Theory Comput. 17 , 2630–2639 (2021).

Chemical space docking. BioSolveIT https://www.biosolveit.de/application-academy/chemical-space-docking/ (2022).

Cavasotto, C. N. in Quantum Mechanics in Drug Discovery (ed. Heifetz, A.) 257–268 (Springer, 2020).

Dixon, S. L. et al. AutoQSAR: an automated machine learning tool for best-practice quantitative structure–activity relationship modeling. Future Med. Chem. 8 , 1825–1839 (2016).

Totrov, M. Atomic property fields: generalized 3D pharmacophoric potential for automated ligand superposition, pharmacophore elucidation and 3D QSAR. Chem. Biol. Drug Des. 71 , 15–27 (2008).

Schaller, D. et al. Next generation 3D pharmacophore modeling. WIREs Comput. Mol. Sci. 10 , e1468 (2020).

Chakravarti, S. K. & Alla, S. R. M. Descriptor free QSAR modeling using deep learning with long short-term memory neural networks. Front. Artif. Intell. 2 , 17 (2019).

Deng, Z., Chuaqui, C. & Singh, J. Structural interaction fingerprint (SIFt): a novel method for analyzing three-dimensional protein-ligand binding interactions. J. Med. Chem. 47 , 337–344 (2004).

Download references

Acknowledgements

We thank A. Brooun, A. A. Sadybekov, S. Majumdar, M. M. Babu, Y. Moroz and V. Cherezov for helpful discussions.

Author information

Authors and affiliations.

Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA

Anastasiia V. Sadybekov & Vsevolod Katritch

Center for New Technologies in Drug Discovery and Development, Bridge Institute, Michelson Center for Convergent Biosciences, University of Southern California, Los Angeles, CA, USA

Department of Chemistry, University of Southern California, Los Angeles, CA, USA

Vsevolod Katritch


Contributions

All authors contributed to the writing of the manuscript.

Corresponding author

Correspondence to Vsevolod Katritch .

Ethics declarations

Competing interests.

The University of Southern California is in the process of applying for a patent (application no. 63159888) covering the V-SYNTHES method that lists V.K. as a co-inventor.

Peer review

Peer review information.

Nature thanks Alexander Tropsha and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article.

Sadybekov, A.V., Katritch, V. Computational approaches streamlining drug discovery. Nature 616 , 673–685 (2023). https://doi.org/10.1038/s41586-023-05905-z


Received : 21 April 2022

Accepted : 01 March 2023

Published : 26 April 2023

Issue Date : 27 April 2023

DOI : https://doi.org/10.1038/s41586-023-05905-z



COMPUTER AIDED DRUG DESIGN: AN OVERVIEW

  • September 2018
  • Journal of Drug Delivery and Therapeutics 8(5):504-509
  • CC BY-NC 4.0

Binoy Kumar Singh at All India Institute of Medical Sciences, Raipur


Figure: Outline of the process involved in LBDD (7).


Computer-Aided Drug Design Methods – An update

1 Computer-Aided Drug Design Center, Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, Baltimore, Maryland

2 Institute for Bioscience and Biotechnology Research (IBBR), Rockville, Maryland

3 Center for Biomolecular Therapeutics (CBT), School of Medicine, University of Maryland, Baltimore, Maryland

David J. Weber

Alexander D. MacKerell, Jr.

4 SilcsBio LLC, Baltimore, Maryland

Computer-aided drug design (CADD) approaches are playing an increasingly important role in understanding the fundamentals of ligand-receptor interactions and in helping medicinal chemists design therapeutics. About five years ago, we presented a chapter that gave an overview of CADD methods and covered typical CADD protocols, including the structure-based drug design (SBDD) and ligand-based drug design (LBDD) approaches that were frequently used in the antibiotic drug design process. Advances in computational hardware and algorithms, together with emerging CADD methods, are enhancing the accuracy and capability of CADD in drug design and development. This chapter updates our previous chapter, with a focus on new CADD approaches from our laboratory and from our peers that can be employed to facilitate the development of antibiotic therapeutics.

1. Introduction

Following the significant milestone that penicillin represents in human medical history, the battle between humans and bacteria has never settled down and has become even more vigorous because of the steady rise of drug resistance. This problem persists despite the availability of a large number of antibiotic drugs, indicating the need for novel antibiotic drug classes to overcome resistance (1, 2). To meet this need, computer-aided drug design (CADD) methods are very helpful tools. They have been used regularly to study the structure-function relationships of antibiotic targets that contribute to drug resistance and to search for new antibiotics, at a lower cost than using experimental wet lab methods alone, owing to powerful modern computational resources (3, 4).

Previously, we published a chapter in the first edition of this book that gave an overview of CADD and described routinely used protocols, especially the tools used in our laboratory, for the design of antibiotic therapeutics (4). Applications of these CADD methods in real-life studies were also presented. In the five years since then, we and the wider computational chemistry community have employed CADD methods extensively to facilitate the development of novel antibiotics. This work has included studies of the mechanisms behind antibiotic resistance, which may help to guide the design of new antibiotic drugs that overcome such resistance. For example, Stote et al. used molecular dynamics (MD) simulations to study how an S222T mutation in deoxy-D-xylulose 5-phosphate reductoisomerase (DXR) induces resistance to fosmidomycin (5). The MD simulations revealed the structural and energetic basis of the resistance induced by this single mutation, shedding light on the development of new antibiotic compounds targeting DXR. Verma et al. recently explored the molecular mechanism of polymyxin E (colistin) resistance in mobile colistin-resistant (mcr-1) bacteria (6). Colistin is the only FDA-approved membrane-active drug to tackle Gram-negative bacteria despite its high toxicity. However, the appearance of mcr-1 bacteria, identified in 2015, has worsened the situation (7). MD simulations revealed how colistin disrupts the outer membrane of normal Gram-negative bacteria and dissected the mechanism by which mcr-1 bacteria resist colistin and other cationic peptides, namely the covalent attachment of a phosphoethanolamine group to lipids. The simulation results provide clues for the design of new membrane disruptors to treat mcr-1 infections.

Identifying and developing drug candidates against novel antibiotic targets in specific bacteria remains an important alternative route to overcoming antibiotic resistance. Heme oxygenase (HemO) is a novel antibiotic target involved in the bacterial metabolism of heme that is required to access iron. Previous bioassay data indicated that inhibiting Pseudomonas aeruginosa HemO (pa-HemO), and thereby blocking a key mechanism of the iron acquisition system, is a promising therapeutic strategy for pa infections (8, 9). In collaboration with the Wilks laboratory, our group has continued to apply CADD methods to optimize small-molecule inhibitors of pa-HemO. In a recent study, a series of inhibitors based on a previously established scaffold were designed and tested to develop a structure-activity relationship (SAR) (10). Binding orientations and affinities were predicted and used to interpret the SAR. The predicted affinities correlated well with the bioassay potency data, which validated the utility of the computational model for further refinement of the current scaffold targeting pa-HemO. In another recent study, the structure of the Clostridioides difficile (C. diff) binary toxin (CDTa and CDTb), which is associated with the most serious outbreaks of drug-resistant C. diff infection in the 21st century, was solved (11). Using normal mode analysis, we explored the possible mechanism behind the translocation of CDTa, the enzymatic component, assisted by CDTb, which serves as the pore-forming or delivery subunit. Such analysis helps to elucidate the infection mechanism of the C. diff binary toxin and to shape potential therapeutic strategies in the future (11).
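
The comparison between predicted affinities and bioassay potency described above for the pa-HemO inhibitors can be illustrated with a small correlation check. The sketch below is only an illustration under stated assumptions: the scores and pIC50 values are invented placeholders, not data from the study, and the function names are hypothetical. It computes the Pearson coefficient and the rank-based Spearman coefficient, the latter being a common choice when only the ordering of compounds is expected to be reliable.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical values for illustration only (not taken from the study).
predicted_score = [-9.1, -8.4, -8.0, -7.2, -6.5]  # more negative = tighter binding
measured_pic50 = [7.4, 7.0, 6.1, 6.3, 5.2]        # higher = more potent

print("Pearson: ", round(pearson(predicted_score, measured_pic50), 3))
print("Spearman:", round(spearman(predicted_score, measured_pic50), 3))
```

Because more favorable predicted scores are more negative while higher pIC50 means higher potency, a useful model would be expected to give a strongly negative coefficient under this sign convention.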

The search for new antibiotics against established targets also continues, with CADD methods playing important roles. Our laboratory, together with de Leeuw and coworkers, is continuing the design of novel agents against bacterial cell wall biosynthesis ( 12 , 13 ). In a recent study, SAR for a series of compounds with a benzothiazole indolene scaffold was pursued, targeting the essential bacterial cell wall precursor molecule Lipid II ( 14 ). Using MD simulations, we predicted binding free energies and binding modes of Lipid II binders and gained atomic details on the interactions between the designed molecules and Lipid II, information that will be useful for further development of antibacterial therapeutics.

β-lactamase was used as a target in combination therapy against multi-drug resistant Enterobacteriaceae ( 15 ). β-lactamase inhibitors may help to inactivate the β-lactamase enzyme of the pathogen and restore the function of β-lactam antibiotics to overcome the enzyme-mediated resistance. Using the full Site Identification by Ligand Competitive Saturation (SILCS) ( 16 – 18 ) technology developed in our laboratory ( Figure 1 ), we identified β-lactamase CMY-10 inhibitors with our experimental collaborators ( 19 ). The SILCS-based CADD method was fully described in the first edition of our chapter ( 4 ) as well as in another chapter previously published in this same book series ( 20 ). The de novo drug design started with SILCS simulations, which are all-atom, explicit-solvent, combined Grand Canonical Monte Carlo/MD (GCMC/MD) simulations that include small organic solutes such as propane, benzene, methanol and others, to identify 3D functional-group binding patterns (FragMaps) on the CMY-10 protein target. SILCS-Pharm ( 21 , 22 ) was then used to extract important binding patterns from the FragMaps and turn them into pharmacophore features at the R1 and R2 subsites of CMY-10. Pharmacophore models were constructed and used to initiate virtual screening (VS) against over 750,000 commercially available compounds. The top 10,000 hit compounds from the initial pharmacophore screen were selected for SILCS-Monte Carlo (SILCS-MC) sampling for further binding pose refinement and estimation of the binding affinity based on the ligand grid free energy (LGFE). Fingerprint-based similarity clustering was then conducted to maximize the chemical diversity of the ranked compounds selected for bioassay testing. Several compounds that decreased β-lactamase activity were confirmed by bioassay tests. The best hit compound was then subjected to a similarity search for chemically similar analogs, and more inhibitory compounds were identified. Such non-β-lactam-based β-lactamase inhibitors have the potential to be used in combination therapy with lactam-based antibiotics against multi-drug resistant clinical isolates.
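
The diversity step in this workflow can be illustrated with a small sketch. It is not the SILCS implementation: it assumes fingerprints are represented as Python sets of on-bit indices, and the compound names, bit sets, scores and the 0.5 similarity cutoff are all hypothetical. The code walks down a list already ranked by score and keeps a compound only if its Tanimoto similarity to every compound kept so far is below the cutoff, which is one simple way to favor chemical diversity among top-ranked hits.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def pick_diverse(ranked, cutoff=0.5):
    """Greedy selection: keep a compound only if dissimilar to all kept so far.

    ranked: list of (name, score, fingerprint), already sorted best-first.
    """
    kept = []
    for name, score, fp in ranked:
        if all(tanimoto(fp, kept_fp) < cutoff for _, _, kept_fp in kept):
            kept.append((name, score, fp))
    return [name for name, _, _ in kept]


# Hypothetical ranked hits (scores in kcal/mol, more negative is better).
hits = [
    ("cmpd_A", -9.1, {1, 2, 3, 4}),
    ("cmpd_B", -8.8, {1, 2, 3, 5}),   # close analog of cmpd_A
    ("cmpd_C", -8.5, {7, 8, 9}),
    ("cmpd_D", -8.0, {7, 8, 10, 11}),
]
selected = pick_diverse(hits)
print(selected)
```

In practice the fingerprints would come from a cheminformatics toolkit and the cutoff would be tuned to the library; the greedy pass shown here is only one of several possible clustering or selection strategies.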

Figure 1.

SILCS oriented CADD workflow developed in our laboratory and used in the CMY-10 project ( 19 ). Wet-lab and CADD techniques are colored in red and blue, respectively. Boxes with solid lines indicate methods used in the CMY-10 study while boxes with dashed lines mark methods not used in the CMY-10 study but in other studies. Double arrows indicate the two techniques can be used interactively in several iterative drug design rounds.

With the rapid development of more powerful computing hardware, expensive algorithms such as free energy perturbation (FEP) methods ( 23 ), which are typically reserved for fine-tuning drug candidates at the lead optimization stage, have become much more affordable and are now routinely used in a range of applications ( 24 – 26 ). Alternative CADD methods that offer novel ways to exploit the interactions between drugs and targets are also seeing wider use. Our laboratory put forward the SILCS methodology as described previously, and information from SILCS can be utilized in many different ways in various aspects of drug discovery ( 16 – 18 ). Another significant advance is the development of machine learning (ML), especially deep learning (DL), based CADD algorithms ( 27 ), owing in part to the development of artificial intelligence (AI) methods in other areas ( 28 ).

ML algorithms are not new to the CADD area, but the increasing need for AI in areas such as image recognition and text processing has promoted powerful novel ML algorithms that can handle vast amounts of data ( 29 , 30 ). The refined graphics processing unit (GPU) architecture ( 31 ) and its growing computing power have further accelerated the application of ML, and its adoption in CADD has surged in recent years. This includes a considerable number of antibiotic drug development studies employing ML ( 32 , 33 ). For example, Palsson et al. developed an ML workflow for identifying genetic features driving antibiotic resistance ( 34 ). ML models were trained against the resistance profiles of 14 antibiotics across three urgent pathogens using genome sequences as inputs. The workflow was shown to generate models capable not only of predicting resistance profiles but also of identifying the responsible genes. In another study ( 35 ), Collins et al. conducted an antibiotic activity assay screen of nearly 2,300 chemically diverse FDA-approved and natural product compounds targeting E. coli . Deep neural network-based DL models were then trained to predict inhibition probabilities from the chemical structures and properties of the tested compounds alone. The resulting DL model was used to screen the Drug Repurposing Hub database ( 36 ), and a known c-Jun N-terminal kinase inhibitor, SU3327, was predicted to be an antibiotic targeting E. coli . This molecule is structurally divergent from conventional antibiotics and was confirmed to display bactericidal activity against a wide phylogenetic spectrum of pathogens, demonstrating how ML can guide antibiotic discovery.

In the rest of this chapter, which serves as an update to our first edition, recent progress in our laboratory toward the development of novel SILCS-based CADD methods is reviewed. Typical ML methods are also covered. Readers are strongly encouraged to refer to the first edition of this chapter ( 4 ) for basic CADD concepts and classical protocols to gain a fundamental understanding of CADD methods for antibiotic development.

2. Materials

As in other computational sciences, the two basic materials in CADD are the hardware and software suited to the study of interest. The hardware, i.e., the computational resources, can be established locally, for example by purchasing and installing computer clusters in the workplace, or obtained on demand, e.g., by applying for computing time on public supercomputer resources such as XSEDE ( 37 ) or by purchasing it from private companies, such as cloud computing on the Amazon Web Services (AWS) or Microsoft Azure platforms ( 38 ). Software requirements, on the other hand, vary with the specific study goals. In the first edition of this book, we introduced fundamental tools for CADD. Here, an update is provided with an emphasis on the common CADD tools used in our laboratory.

  • MD simulation packages such as CHARMM ( 39 ), GROMACS ( 40 ), NAMD ( 41 ) and OpenMM ( 42 ), among others, are continually being optimized. Better computational performance is achieved through algorithm refinement and software engineering as well as optimized computing on GPUs ( 41 – 44 ). New MD programs developed in the GPU era are also emerging and attracting more attention. For example, ACEMD ( 45 ), which was optimized for use on Nvidia GPUs, maximizes its performance by running the full computation on GPUs rather than dividing the job between CPUs and GPUs.
  • Target structures are required for SBDD methods and can be downloaded from the Protein Data Bank (PDB) ( 46 ) if they have been solved by X-ray crystallography, nuclear magnetic resonance (NMR) or the recently matured cryogenic electron microscopy (cryo-EM) techniques ( 47 ). For unsolved protein targets, 3D structures can be predicted using the recently released RoseTTAFold from Baker’s group ( 48 ) or the AlphaFold ML model from the Google DeepMind team, which was shown to have the best accuracy among protein structure prediction methods ( 49 ). For most proteins, structures predicted with the AlphaFold ML model can be downloaded from the server hosted by the European Bioinformatics Institute ( https://alphafold.ebi.ac.uk/ ) ( 50 , 51 ).
  • Force fields, which are used to estimate the energies and forces within and between molecules, continue to be refined. These include the CHARMM ( 52 – 55 ) and AMBER ( 56 , 57 ) families, among others, which describe both macromolecules, such as the CHARMM36 protein force field ( 52 , 53 ), and small molecules, such as the CHARMM General Force Field (CGenFF) ( 54 , 55 ). To automate the creation of topologies and parameters for new molecules, programs such as the CGenFF program (see https://cgenff.paramchem.org ) ( 58 , 59 ) can be used. For experts who want to further optimize force field parameters, a standalone package named FFParam is available for CHARMM force field parametrization ( 60 ). In addition to the additive force fields that have a long history, emerging polarizable force fields such as the CHARMM Drude force field ( 61 , 62 ) and AMOEBA ( 63 ) are now available; these treat electronic polarization effects explicitly and thereby describe the interactions between molecules more realistically. The increased computational cost introduced by polarization terms (~4-fold over the additive model with the Drude FF) is gradually being overcome by better computing algorithms and growing GPU capability ( 64 , 65 ). Accordingly, it may be anticipated that polarizable FFs will see routine use in the near future to describe interactions between antibiotics and bacterial targets in CADD. As a further note, new types of force field based on different perspectives are also emerging, such as the Open Force Field ( 66 , 67 ), which is encoded by direct chemical perception instead of predefined atom types, and ML-driven models such as PhysNet ( 68 , 69 ), which is based on deep neural networks, although their capabilities need to be thoroughly tested before broader use in CADD applications.
  • Virtual database screening (VS) is used to screen large chemical libraries for potential small-molecule binders of a given macromolecular target. CADD methods such as docking ( 70 ) or pharmacophore modeling ( 71 ) can be adapted for this purpose. For docking, both free software, such as AutoDock Vina ( 72 ), and commercial software, such as GOLD ( 73 ), are available, among others ( 74 ). Open-source toolkits that interface with existing docking software are also available to facilitate a more integrated docking-based CADD cycle. For example, the Open Drug Discovery Toolkit (ODDT) ( 75 ), which is currently interfaced with AutoDock Vina, lets researchers conduct the full docking workflow, including in silico compound library preparation, library filtering, docking pose rescoring, docking performance evaluation and SAR model training. Another example is VirtualFlow ( 76 ), which has functions similar to ODDT but offers an interface to additional docking programs and is built on an optimized architecture that enables efficient parallelization and balanced workloads for better docking performance against huge chemical libraries. Web-based docking platforms are also available and are convenient even for non-experts. For example, Webina ( 77 ) lets users run AutoDock Vina on the web without installation. Another web-based docking interface, SeamDock ( 78 ), allows users to select from four docking codes and also provides the ability to share visualizations of docking results with other researchers. For structure-based pharmacophore modeling, the open-source program Pharmer ( 79 ) can perform pharmacophore searching efficiently on large databases. It also provides a web interface, ZINCPharmer ( 80 ), an interactive environment for virtual screening of the ZINC or MolPort databases using pharmacophores; the same research group later launched another web service, Pharmit ( 81 ), for online pharmacophore VS using user-tailored databases or a variety of pre-loaded databases. It should be noted that pharmacophore searching requires multiple conformations of each molecule in the database as well as assignment of the correct protonation and tautomeric states, with the latter requirement holding for all in silico databases used for either docking or pharmacophore screening.
  • VS uses chemical libraries to identify small molecules to be tested in biological assays for ligand discovery. While researchers can draw on in-house compounds from their own chemical synthesis work, purchasing compounds from commercial chemical vendors is a convenient way to support discovery at the early stage. ZINC ( 82 ) and MolPort ( 83 ) provide platforms for sourcing compounds from various vendors. For de novo drug design, exploring a larger chemical space generally holds the promise of higher success rates. The ultra-large REadily accessible (REAL) ( 84 ) compound library from Enamine represents the largest purchasable chemical collection currently available. Its REAL Database currently contains 4.1 billion enumerated compounds and can be extended to over 20 billion compounds in the Enamine synthetically accessible database called REAL Space, for which compound synthesis time is ~8 weeks from order to delivery, with on average over 80% of the requested compounds actually being synthesized and delivered. As an alternative to de novo drug development, drug repurposing is a lower-cost approach that explores existing therapeutics for a new disease indication ( 85 , 86 ). For this purpose, libraries of FDA-approved drugs can be downloaded from various sources, e.g., as a subset from ZINC ( 82 ). Another comprehensive library of clinical compounds, the Drug Repurposing Hub ( 36 ), provides a hand-curated collection of thousands of approved and in-clinic compounds with annotated identities.
  • Integrated commercial software such as Discovery Studio ( 87 ) and MOE ( 88 ), among others, incorporates a broad range of CADD capabilities. On the open-source side, although many choices are available for specific CADD needs, integrated code is rare. However, some commercial packages, such as OpenEye ( 89 ), offer no-cost licenses for purely non-commercial research, while others provide discounted licensing for academic use. Similarly, the software suite from SilcsBio LLC ( 90 ), which offers end-to-end drug design capabilities in the context of the SILCS technology, is available at no cost to non-commercial research groups. Online platforms that offer integrated CADD capabilities without local installation and without requiring advanced computer knowledge are also emerging. One example is PlayMolecule ( 91 ) from Acellera, which offers cloud-based CADD workflows covering target preparation, binding site identification, force field parametrization, MD simulation, docking and ML model generation. Most of these services are free to the public with some limitations, and full service is available for purchase. Traditional CADD software companies are also transitioning to cloud services; for example, OpenEye ( 89 ) launched the Orion platform to offer a web-based environment for its software.

In addition to the common methods introduced in the first edition of this chapter, additional CADD methods developed recently in our laboratory as well as from other laboratories will be described below.

3.1. Protein structure prediction using AlphaFold

For SBDD, a protein 3D structure is required to explore atomic-level details of ligand-protein interactions. When no protein structure is available from the PDB, structure prediction methods such as homology modelling ( 92 ) were traditionally used to generate 3D models. With the surge of AI and related DL techniques, DL-driven structure prediction methods such as RoseTTAFold ( 48 ) and AlphaFold ( 49 ) can now predict most protein 3D structures at a level approaching experimental accuracy. In the recent, challenging 14th Critical Assessment of protein Structure Prediction (CASP14), AlphaFold was shown to greatly outperform other methods, and its predictions are competitive with experimental structures in a majority of cases. This promising progress makes the initiation of SBDD much more feasible. Below are general steps for preparing a protein structure using AlphaFold.

  • Go to the AlphaFold database hosted by the European Molecular Biology Laboratory European Bioinformatics Institute (EMBL-EBI) at https://alphafold.ebi.ac.uk and input the protein name, gene name or UniProt accession name in the search bar. The DeepMind team has already predicted the structures of most known human proteins as well as those of 20 model organisms ( 50 , 51 ) including bacteria such as Escherichia coli and Staphylococcus aureus and deposited them to the EMBL-EBI server.
  • Click on the most relevant entry from the search hit list from the results page if the search is conducted using text other than the UniProt accession ID. On the next page, the predicted 3D structure is displayed with residues colored according to the predicted local distance difference test (pLDDT) metric and the predicted aligned error (PAE) matrix is also shown.
  • Check the prediction quality by examining both the pLDDT metric, a per-residue measure of local confidence in the structure, and the PAE metric, which can be used to assess confidence in the relative orientation of different parts of the model, e.g., domains.
  • Download the predicted structure in PDB or mmCIF format to the user’s machine for further analysis. For example, the predicted structure may cover the full length of the sequence while the user wants to focus only on a specific domain of the protein for drug design purposes. In such cases the downloaded structure can be trimmed for subsequent use. For regions with lower pLDDT values, MD simulation can additionally be conducted to equilibrate and refine the structure.
  • For proteins not yet deposited in the AlphaFold database, the 3D structure can be predicted using the AlphaFold code, which may be downloaded from the AlphaFold GitHub repository at https://github.com/deepmind/alphafold/ . Follow the README file there to install the required environment and load the AlphaFold program. Prepare a FASTA file containing the sequence of the protein to be predicted and supply it as input to the python script that runs the prediction. After the prediction completes, the output structures are saved in a subdirectory provided by the user via the ‘--output_dir’ flag of the python script. It should be noted that structures from AI prediction methods have limitations in their true resolution and that these methods do not account for alternate conformations of the protein (e.g., allosteric states); both points need to be considered when using 3D structures generated by these methods ( 93 ).
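
The quality check and trimming steps above can be sketched in a few lines of Python. AlphaFold model files store the per-residue pLDDT in the B-factor field of each ATOM record, so the values can be read with plain fixed-column parsing. Everything else in this sketch is an assumption made for illustration: the five-residue toy model is generated in the code rather than downloaded, the helper names are hypothetical, and the cutoff of 70 is a commonly used boundary for confident predictions, not a value prescribed by this protocol.

```python
def make_atom_line(serial, name, resname, chain, resseq, xyz, bfactor):
    """Format one fixed-column PDB ATOM record (toy data for this example)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, resname, chain, resseq, xyz[0], xyz[1], xyz[2], 1.0, bfactor)


def per_residue_plddt(pdb_lines):
    """Map residue number -> pLDDT, read from the B-factor field of CA atoms."""
    plddt = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            plddt[int(line[22:26])] = float(line[60:66])
    return plddt


def confident_residues(plddt, threshold=70.0):
    """Residue numbers whose pLDDT is at or above the chosen threshold."""
    return sorted(res for res, value in plddt.items() if value >= threshold)


# Hypothetical 5-residue model: a low-confidence tail followed by a confident core.
toy_scores = [35.2, 48.9, 82.4, 91.0, 88.7]
toy_pdb = [
    make_atom_line(i + 1, "CA", "ALA", "A", i + 1, (3.8 * i, 0.0, 0.0), score)
    for i, score in enumerate(toy_scores)
]

plddt = per_residue_plddt(toy_pdb)
keep = confident_residues(plddt)
# Trim the model to the confident residues only.
trimmed = [line for line in toy_pdb if int(line[22:26]) in keep]
print(keep)
```

For a real model, `toy_pdb` would be replaced by the lines of the downloaded PDB file. The PAE matrix is distributed separately from the coordinate file and is not handled here.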

3.2. MD simulations with polarizable force field

MD simulations model how atoms in a protein or other molecular systems will move over time based on force field description ( 94 ) through integration of Newton’s equations of motion. These simulations can capture a wide range of important biomolecular processes such as conformational change and drug binding, where the dynamics of the systems allows for the inclusion of the entropic contributions to ligand binding to be taken into account as required for calculating free energies. Accordingly, the rich information from MD simulation acts as the foundation for other CADD techniques such as FEP and SILCS developed in our laboratory. Big improvements in simulation speed, accuracy, and accessibility of MD simulation software and environment have increased the utility of MD in CADD. Beyond the classic additive force fields ( 52 – 57 ) currently used in the majority of MD simulations, polarizable force fields that explicitly account for induced electronic polarization represent the next generation of physical models for MD simulations ( 95 – 101 ). Our laboratory studied the impact of electronic polarizability on protein-fragment interactions using the in-house developed classical Drude oscillator model, showing that the polarizable force field helps to improve the prediction of protein-ligand interactions indicating the utility of a polarizable force field in CADD ( 102 ). The Drude oscillator FF models electronic polarization by attaching a charged particle to the nucleus of each non-hydrogen atom via a harmonic spring and allowing those particles to relax in the surrounding electric field with the nuclear position fixed, as previously described. While the additional terms introduced in the force field associated with the treatment of polarization increase the computational cost, improved algorithms and computational power make this class of simulations accessible ( 64 , 65 ). 
For example, the computational overhead of the Drude FF over the additive CHARMM FF is approximately 4-fold ( 64 , 65 ). Thus, the Drude as well as other polarizable FFs represent new tools to study molecular systems that will make significant contributions to CADD.
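
The spring picture of the Drude oscillator can be made concrete with a one-dimensional sketch. It assumes a single Drude particle of charge q_d attached by a harmonic spring of force constant k_d and placed in a uniform field; minimizing U(d) = 0.5*k_d*d^2 - q_d*E*d with respect to the displacement d gives d = q_d*E/k_d, an induced dipole of q_d*d, and an effective polarizability of q_d^2/k_d. The numerical values and the function name are hypothetical and use arbitrary consistent units; they are not Drude FF parameters.

```python
def drude_response(q_d, k_d, field):
    """Relaxed displacement, induced dipole and polarizability of one Drude particle.

    Minimizing U(d) = 0.5*k_d*d**2 - q_d*field*d gives d = q_d*field/k_d.
    """
    displacement = q_d * field / k_d
    induced_dipole = q_d * displacement
    polarizability = q_d ** 2 / k_d
    return displacement, induced_dipole, polarizability


# Hypothetical parameters in arbitrary consistent units.
d, mu, alpha = drude_response(q_d=-1.5, k_d=1000.0, field=2.0)
print(d, mu, alpha)
```

The induced dipole equals the polarizability times the field, so a stiffer spring or a smaller Drude charge yields a less polarizable atom.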

In the first edition of this chapter, we introduced a standard MD simulation protocol. Here we present an MD simulation protocol using the Drude polarizable force field.

  • Obtain the protein structure from the PDB or predict it as described above in section 3.1 . Prepare the protein structure for MD simulations by adding missing hydrogens, assigning appropriate protonation states to residues, and so on. These steps can be performed with a number of the publicly available and commercial modelling packages discussed above. Generate a CHARMM protein structure file (PSF) for the simulation system based on the CHARMM additive force fields, using the web tool CHARMM-GUI ( http://www.charmm-gui.org ) ( 103 ) or locally by running the CHARMM code. CHARMM-GUI may be used for initial protein preparation as well as for preparation of the PSF and is available to non-commercial users.
  • Go to the CHARMM-GUI at http://www.charmm-gui.org and select Drude Prepper ( 104 ) and upload the additive PSF and coordinate files to construct the Drude force field based PSF and PDB files with added Drude particles and lone pairs. Also provided are the CHARMM input files for subsequent calculations listed below and the needed topology and parameter files. In addition, the user may request input files compatible with MD programs such as OpenMM and NAMD that support the Drude FF.
  • Similar to the protocol previously described for the additive FF ( 4 ), the system will go through minimization, equilibration and production steps. During minimization, Drude particles will be minimized first and then the entire system using the adopted-basis Newton-Raphson (ABNR) minimizer.
  • For equilibration and production runs, a hard wall restraint of 0.2 Å between the parent atom and the Drude particle is applied to prevent instability and large displacements of Drude particles. The hard wall is designed to avoid the polarization catastrophe, i.e., over-polarization that may occur during the MD simulation due to low-frequency close interactions between atoms ( 105 ). During Drude simulations, the extended Lagrangian dynamics scheme ( 106 ) is used for integration of Newton’s equations of motion, with the real atoms and the Drude particles coupled to a dual thermostat. The physical and Drude thermostats are maintained at different temperatures of 298 K and 1 K, with friction coefficients of 5 ps⁻¹ and 20 ps⁻¹, respectively. Drude simulations are typically propagated with a 1 fs time step.
  • Analysis of Drude simulations, beyond that used for all MD simulations, can include variations in the dipole moments of various groups in the systems being studied. This shows how variations in the electronic structure of the system, associated with the explicit inclusion of polarizability, affect the properties of the system. When calculating dipole moments, care must be taken because the dipole is not spatially invariant when the sum of the charges is not zero. To account for this and facilitate dipole analysis, the Drude FF is constructed so that the charges on all particles within a group (e.g., protein sidechains and nucleic acid bases) sum to an integer, though the spatial orientations of charged groups must still be considered.
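
The caveat about dipole moments can be verified with a short calculation. The sketch below computes the dipole as the sum of q_i*(r_i - origin) for a neutral and a charged two-particle group and shows that only the charged group's dipole changes when the reference origin is moved; the shift equals minus the net charge times the origin displacement. The charges, coordinates and function name are hypothetical. Choosing a fixed reference point for charged groups, such as the center of geometry, is a common convention assumed here, not a prescription from this protocol.

```python
def dipole(charges, coords, origin=(0.0, 0.0, 0.0)):
    """Dipole vector sum_i q_i * (r_i - origin), in units of e*Angstrom."""
    return tuple(
        sum(q * (r[k] - origin[k]) for q, r in zip(charges, coords))
        for k in range(3)
    )


coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
neutral = [0.4, -0.4]   # net charge 0
charged = [0.6, 0.4]    # net charge +1
shifted_origin = (5.0, 0.0, 0.0)

mu_neutral_a = dipole(neutral, coords)
mu_neutral_b = dipole(neutral, coords, shifted_origin)
mu_charged_a = dipole(charged, coords)
mu_charged_b = dipole(charged, coords, shifted_origin)

# The neutral dipole is unchanged; the charged one shifts by -Q * (origin shift).
print(mu_neutral_a, mu_neutral_b)
print(mu_charged_a, mu_charged_b)
```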

3.3. Docking using SILCS-MC and ML based reweighting for SAR

Docking is a useful CADD tool for predicting the binding orientation of a ligand within a target binding site as well as evaluating its binding strength ( 70 ). Traditional docking methods consider only rigid or limited protein flexibility and either ignore the contribution of desolvation to binding or treat it empirically. While FEP and molecular mechanics (MM) with Poisson–Boltzmann (PB) and surface area solvation (MM/PBSA) as well as MM/generalized-Born SA (GBSA) methods ( 107 ) do account for desolvation, they are computationally demanding, which limits their utility in CADD. A novel method designed to overcome this drawback is the SILCS-MC ( 19 ) docking method put forward by our laboratory. SILCS-MC conducts ligand sampling within the grid free energy (GFE) FragMaps from SILCS. This takes advantage of GCMC/MD simulations of the protein in aqueous solution with selected organic solutes to precompute the GFE FragMaps, which are functional-group free energy affinity patterns that encompass the entire protein and account for protein flexibility and desolvation contributions ( 108 – 110 ). Scoring in SILCS-MC then simply involves assigning to each atom in the molecule the GFE value from the FragMap of the appropriate type and summing those values to obtain the LGFE score. MC conformational sampling is performed to allow the orientation and conformation of the ligand to relax in the field of the GFE FragMaps. This allows SILCS-MC docking to be performed in a highly computationally efficient fashion while achieving a level of accuracy similar to that of highly expensive FEP methods ( 109 ). The SILCS method was fully described in the first edition of this chapter ( 4 ). Below we present the SILCS-MC docking protocol, assuming the user has already run the SILCS simulations and obtained the GFE FragMaps. A Bayesian ML-based reweighting protocol is also described; it improves the predictive ability of the SILCS method and can be applied when experimental data on a small set of ligands (10 or more) are available ( 108 ).
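
The scoring idea described above, summing per-atom GFE values and relaxing the pose by Metropolis MC, can be sketched as follows. This is a simplified illustration, not the SILCS-MC code: FragMaps are represented as small dictionaries keyed by hypothetical integer voxel indices, a pose is a list of (FragMap type, voxel) pairs, the GFE values are invented, and the intramolecular energy and exclusion map used in the actual method are omitted. The FragMap type names BENC and PRPC are taken from the text; all function and variable names are assumptions.

```python
import math
import random

KT = 0.0019872 * 298.0  # Boltzmann constant (kcal/mol/K) times temperature (K)


def lgfe(pose, fragmaps):
    """Sum per-atom GFE values; voxels outside a FragMap contribute 0."""
    return sum(fragmaps[ftype].get(voxel, 0.0) for ftype, voxel in pose)


def metropolis_accept(delta, rng, kt=KT):
    """Accept downhill moves always, uphill moves with probability exp(-delta/kT)."""
    return delta <= 0.0 or rng.random() < math.exp(-delta / kt)


# Hypothetical FragMaps: GFE (kcal/mol) at a few voxels for two FragMap types.
fragmaps = {
    "BENC": {(0, 0, 0): -1.2, (1, 0, 0): -0.4},
    "PRPC": {(2, 1, 0): -0.8},
}

# Two poses of a two-atom "ligand": each atom has a FragMap type and a voxel.
pose_old = [("BENC", (1, 0, 0)), ("PRPC", (3, 1, 0))]
pose_new = [("BENC", (0, 0, 0)), ("PRPC", (2, 1, 0))]

score_old = lgfe(pose_old, fragmaps)
score_new = lgfe(pose_new, fragmaps)
rng = random.Random(0)
accepted = metropolis_accept(score_new - score_old, rng)
print(score_old, score_new, accepted)
```

Because the score is a plain sum over atoms, each atom's contribution can be inspected individually, which is the property that makes per-atom decomposition of the docking score straightforward.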

  • Prepare molecule coordinate files for ligands to be docked in either mol2 or sdf format. For mol2 format, each mol2 file contains a single molecule, while for sdf format, multiple molecule entries are allowed in a single sdf file by the current SILCS-MC code.
  • Select an atom classification scheme (ACS) for the SILCS-MC run. In SILCS-MC, the GFE is evaluated for each atom in the molecule from the SILCS FragMap whose type matches the type assigned to that atom. The ACS controls the assignment of FragMap types to each atom in a molecule, based on its CGenFF atom type and chemical connectivity, during the initiation of a SILCS-MC run. Typical ACS include generic and specific types. The generic ACS uses more general FragMap types, e.g., both aromatic and aliphatic carbon atoms in a molecule are assigned the generic nonpolar GENN FragMap type. The specific ACS assigns specific FragMap types to specific atoms, e.g., an aromatic carbon atom receives the BENC (benzene carbons) FragMap type while an aliphatic carbon receives the PRPC (propane carbons) type. The generic ACS is the default.
  • Choose an MC sampling protocol for the SILCS-MC run. Ligand binding poses are sampled using Metropolis MC sampling followed by simulated annealing (SA) during a SILCS-MC run. Thus, MC/SA parameters such as the number of simulation cycles (n_CY) and steps (n_MC, n_SA), as well as the ranges of the global rotational (dθ), translational (dX) and intramolecular dihedral (dφ) degrees of freedom, can be adjusted depending on the specific system. Typical protocols include local and exhaustive types, although users can customize their own protocol by changing the corresponding parameters in the SILCS-MC input file. The local protocol is designed for pose refinement; sampling starts from the user-supplied pose with limited conformational sampling, using n_CY=10, n_MC=100, dX=0.5 Å, dθ=15°, dφ=45° and n_SA=1000. The exhaustive protocol is designed for full docking of the ligand orientation and conformation in a given pocket to determine its most favorable orientation when no initial binding information is available from experiment. It starts with a randomized orientation of the ligand within a sphere with a user-defined center and radius. MC sampling is performed to allow for larger conformational changes, with n_CY=250, n_MC=10,000, dX=1 Å, dθ=180°, dφ=180° and n_SA=40,000. For both local and exhaustive sampling, the SA steps following MC in each cycle use dX=0.2 Å, dθ=9° and dφ=9°.
  • Run SILCS-MC simulations using the SILCS-MC code with the ACS file, the CGenFF rules and parameter files, the GFE FragMap files, the exclusion map file and the user-defined parameters. CGenFF parameters are initially assigned to the ligand by the CGenFF engine to allow for energy minimization of the ligand during initiation of the SILCS-MC simulation, and they are used to calculate the intramolecular energy during the MC calculation. The exclusion map represents the forbidden region of the protein not sampled by solute or water non-hydrogen atoms during the SILCS simulation and is used as a penalty score to guide the sampling. Usually, five independent SILCS-MC runs are conducted in parallel to expedite convergence of the docking results. In each run, after a minimum of 50 MC/SA cycles, if the lowest three LGFE scores are within 0.5 kcal/mol, the run is considered converged and terminated. Otherwise, cycles continue until either the convergence criterion is met or the user-defined maximum number of MC/SA cycles, 250 by default, has been reached.
  • After the SILCS-MC simulations are finished, the docking pose with the lowest LGFE score can be extracted and used as the predicted binding orientation for the ligand. The docking pose can be visualized together with the protein structure and FragMaps and analyzed. One advantage of SILCS-MC over traditional docking methods is the ease of decomposition of total docking score into atomic contributions which are especially useful during the ligand optimization step ( 111 , 112 ). Atomic GFE values can be visually checked for a ligand to determine beneficial and unsatisfactory functional groups. For example, when modifying a ligand, favorable gains associated with the modification may be offset by a loss of favorable contributions in another part of the molecule, information that is not readily accessible to other CADD docking methods ( 109 ). In addition, visualization of FragMaps around the docking pose of a ligand can also offer ideas about additional functional groups that may be introduced to the current scaffold to further improve affinity.
  • SILCS Bayesian ML reweighting: When experimental binding data is available, LGFE scores can be trained using ML for a refined prediction yielding a more accurate SAR model. The LGFE is a simple summation over all atomic GFE contributions from different FragMap types, assuming the contribution from each FragMap type is well balanced when fragments form a full molecule. In practice, this represents an approximation since the sum of binding affinities of individual fragments in a molecule does not formally equal the binding affinity of the full molecule due to the energy adjustment through linking fragments into a molecule. Accordingly, the GFE FragMap contributions in LGFE can be reweighted based on experimental binding data to improve the predictability of SILCS-MC. This is done by using a Bayesian Markov-Chain Monte Carlo based ML (BML) method ( 108 ).
  • To start the BML reweighting process, experimental data and SILCS-MC docking poses in PDB format are required. The ML training can be conducted by optimizing the root-mean-square deviation (RMSD), the Pearson correlation or the percent correct (true positives and true negatives) between the LGFE scores and the experimental binding free energies. The user can also select one of three restraint types (flat-bottom, hard wall or harmonic) to prevent overfitting. Running the BML code yields trained weighting factors for each FragMap type and an estimate of the prediction improvement based on the current docking poses. The optimized FragMap weighting factors are then used to rerun SILCS-MC to verify the actual improvement. The new LGFE scoring formula with the trained weighting factors can then be used for new ligand designs for the current protein target. When the weighting factors are overfitted, the docking poses from the second SILCS-MC run are highly perturbed and the LGFE scores agree more poorly with experiment, which provides a check on the applied BML fitting parameters.
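
The convergence rule and the reweighted LGFE sum described in this protocol can be sketched in a few lines. This is a minimal illustration and not the SILCS-MC or BML code: the function names, the dictionary layout and the reading of "within 0.5 kcal/mol" as the spread of the three lowest scores are assumptions made for the example, as is the form of the reweighting (one multiplicative weight per FragMap type).

```python
def lgfe(gfe_by_type, weights=None):
    """Sum atomic GFE values (kcal/mol) over FragMap types.

    gfe_by_type: {fragmap_type: [atomic GFE, ...]} (hypothetical layout).
    weights: optional {fragmap_type: weight}; missing types default to 1.0.
    """
    weights = weights or {}
    return sum(weights.get(t, 1.0) * sum(vals) for t, vals in gfe_by_type.items())


def is_converged(scores, n_cycles, min_cycles=50, tol=0.5):
    """True once min_cycles have run and the three lowest LGFE scores
    span no more than tol kcal/mol (assumed reading of the criterion)."""
    if n_cycles < min_cycles or len(scores) < 3:
        return False
    lowest = sorted(scores)[:3]
    return lowest[2] - lowest[0] <= tol


def run_until_converged(cycle_scores, max_cycles=250):
    """Consume one LGFE score per MC/SA cycle; return (cycles used, best score)."""
    seen = []
    for n, score in enumerate(cycle_scores, start=1):
        seen.append(score)
        if is_converged(seen, n) or n >= max_cycles:
            return n, min(seen)
    return len(seen), (min(seen) if seen else None)
```

In this sketch a run that never satisfies the criterion simply stops at the 250-cycle default, mirroring the termination rule in the text.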

3.4. Site identification using SILCS-Hotspots

Computational binding site identification methods can be used to find novel, druggable sites on new protein targets for potential therapeutic development ( 113 ). For antibiotics development, such methods can be employed to search for putative allosteric sites as alternatives to the active or orthosteric sites on bacterial proteins to overcome drug resistance issues ( 114 ). A binding-site identification method under the SILCS framework, named SILCS-Hotspots, was developed recently by our laboratory ( 115 ). SILCS-Hotspots is designed to identify fragment binding hotspots that are spatially distributed across the whole protein structure, including both surface and interior binding sites. The general protocol for using SILCS-Hotspots to identify putative binding sites on a protein is as follows. The protocol requires that the SILCS FragMaps are already available.

  • Select a collection of representative molecular fragments to be used for the hotspots search. The Astex MiniFrag set ( 116 ) and the collection of ~90 mono and bicyclic rings present in drug molecules ( 117 ) are both good fragment libraries to be used.
  • Partition the protein system into a set of overlapping 14.14 Å³ sub-spaces that encompass the entire protein. For each individual sampling box, exhaustive SILCS-MC as described in section 3.3 is conducted for every fragment in the library. All SILCS-MC docking poses that are sampled over the full space are collected for each fragment. Fragment docking poses with LGFE scores of −2 kcal/mol or more favorable and within 6 Å of any protein Cα atom are selected as relevant binding poses and subjected to the following clustering steps.
  • For each fragment, a center-of-mass (COM) based clustering with a 3 Å cluster radius is performed. Clustering determines the number of neighbors within a 3 Å radius of the COM of each docking pose and then identifies the pose with the largest number of neighbors. The remaining cluster members are then removed from the pool of docking poses, and the process continues until no additional poses remain. This step selects representative docking poses for each fragment.
  • A second round of clustering is conducted over the docking poses of all fragments obtained from the first round of clustering. The same clustering algorithm is used but with a radius of 4 Å, from which clusters that contain representative docking poses for one or more fragments are identified. These cluster centers are defined as Hotspots. Information on the hotspots includes the number and types of fragments in the site, the LGFE scores of all fragments, and their spatial relationships. Hotspot ranking may then be performed based on the average LGFE scores or the number of fragments in a site.
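
The two clustering rounds above can be sketched as a greedy, neighbor-count-based procedure. This is an illustrative sketch only, not the SILCS-Hotspots implementation: the function names and data layout are hypothetical, neighbor counts are recomputed after every removal, and ties are broken by the lowest pose index, all of which are assumptions.

```python
import math


def greedy_com_clusters(coms, radius):
    """Return indices of cluster centers, chosen greedily by neighbor count.

    coms: list of (x, y, z) centers of mass in Å.
    """
    remaining = set(range(len(coms)))
    centers = []
    while remaining:
        # Neighbors of each pose among the poses still in the pool
        # (a pose counts as its own neighbor).
        neighbors = {
            i: [j for j in remaining if math.dist(coms[i], coms[j]) <= radius]
            for i in remaining
        }
        # Pose with the most neighbors; sorted() makes ties deterministic.
        best = max(sorted(remaining), key=lambda i: len(neighbors[i]))
        centers.append(best)
        # Remove the center and all of its cluster members from the pool.
        remaining -= set(neighbors[best])
    return centers


def hotspots(poses_by_fragment, frag_radius=3.0, site_radius=4.0):
    """Round 1: per-fragment clustering; round 2: clustering over all
    representative poses. Returns the hotspot coordinates."""
    representatives = []
    for coms in poses_by_fragment.values():
        representatives += [coms[i] for i in greedy_com_clusters(coms, frag_radius)]
    return [representatives[i]
            for i in greedy_com_clusters(representatives, site_radius)]
```

A real workflow would also carry the fragment identity and LGFE score along with each representative pose so that hotspots can be ranked afterwards.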

Figure 2. SILCS-Hotspots analysis for the bacterial enzyme TEM-1 beta-lactamase using the Astex MiniFrag set. SILCS apolar (green), H-bond donor (blue) and H-bond acceptor (red) FragMaps are rendered at −1.0 kcal/mol, while positively charged (cyan) and negatively charged (orange) FragMaps are rendered at −1.2 kcal/mol. Hotspots are shown as spheres with the average LGFE colored on a red-white-blue (more to less favorable) scale. Putative binding sites selected based on adjacent hotspots, FragMaps and exclusion maps are shown in red dashed circles. Crystal binding modes of an active site (PDB:1ERM) ( 118 ) and an allosteric site (PDB:1PZP) ( 119 ) binder are shown. rSASA% vs LGFE plots are shown in the lower panel for the top 25 LGFE-ranked FDA compounds for all three sites, with average values indicated as vertical and horizontal lines.

  • Quantitative evaluation may be performed through exhaustive SILCS-MC docking on each selected site using a library of drug molecules, for example, FDA-approved drugs. In house, a set of ~380 chemically diverse FDA-approved compounds was constructed for this purpose. Exhaustive SILCS-MC docking is performed with the center of each site defined by the central hotspot along with a 5 Å radius; the process may be repeated with each hotspot within a site of interest as the center of the docking region. For each site, the average LGFE score of the 25 top-ranked FDA compounds is obtained along with their percent relative solvent accessible surface area (rSASA%) ( 120 ). The rSASA% is calculated from the solvent accessibility of each ligand in the presence and absence of the protein. Free software such as FreeSASA can be used for this calculation ( 121 ). The combination of these metrics is then used to assess the binding sites: ideal sites give highly favorable LGFE scores (<−10 kcal/mol) and small rSASA% (<40%), which indicate that they are suitable for binding drug-like molecules. For example, Figure 2 shows that sites 1 and 2 have more favorable LGFEs than site 3, and both have reasonable rSASA% values of around 30% compared to site 3 (~50%). Thus sites 1 and 2 are predicted to be putative binding sites in preference to site 3. Experimental crystal structures of complexes confirm this, with site 1 being the active site and site 2 serving as an allosteric site for TEM-1 beta-lactamase.
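
The site evaluation above reduces to averaging two metrics over the top-ranked compounds and comparing them with the stated thresholds. The sketch below is a hedged illustration: the function names and tuple layout are hypothetical, and rSASA% is taken here as the ligand SASA in the complex divided by the SASA of the free ligand, times 100, which is an assumed reading of "presence and absence of the protein". The SASA values themselves would come from an external tool such as FreeSASA.

```python
def rsasa_percent(sasa_bound, sasa_free):
    """Relative SASA of a ligand in percent (assumed definition:
    SASA in the presence of the protein over SASA of the free ligand)."""
    return 100.0 * sasa_bound / sasa_free


def evaluate_site(docked, top_n=25, lgfe_cut=-10.0, rsasa_cut=40.0):
    """docked: list of (lgfe, sasa_bound, sasa_free), one tuple per compound.

    Returns (mean LGFE, mean rSASA%, passes thresholds) for the
    top_n compounds ranked by LGFE (most favorable first).
    """
    top = sorted(docked, key=lambda d: d[0])[:top_n]
    mean_lgfe = sum(d[0] for d in top) / len(top)
    mean_rsasa = sum(rsasa_percent(d[1], d[2]) for d in top) / len(top)
    suitable = mean_lgfe < lgfe_cut and mean_rsasa < rsasa_cut
    return mean_lgfe, mean_rsasa, suitable
```

The thresholds default to the values quoted in the text (−10 kcal/mol and 40%) but are exposed as parameters because they are guidelines rather than hard limits.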

3.5. Membrane permeation prediction using SILCS

Most antibiotics were designed to target proteins involved in intracellular processes, so they must penetrate the bacterial outer membrane to function. Drug resistance involving modifications of macromolecules in the outer membrane is a common issue that needs to be considered when searching for new antibiotics ( 122 , 123 ). While bacterial membranes are complex environments with multiple transport and pore proteins, it is useful to estimate the pure membrane permeability of drug candidates during drug discovery, as this may contribute to drug bioavailability. Traditionally, potential of mean force (PMF) free-energy profiles for a compound across membrane lipid bilayers are derived using MD simulations ( 124 ). The PMF may then be used together with the position-specific diffusion coefficient in the inhomogeneous solubility-diffusion equation ( 125 ) to derive the effective resistivity, which may be inverted to give the permeability. Under the SILCS framework, we recently put forward a protocol ( 126 ) that uses the LGFE profile to calculate a permeation-related resistant factor for a molecule crossing a membrane; it is described below.

  • Set up the membrane lipid bilayer system. This can be a bilayer system with a lipopolysaccharide composition that is specific for the bacterial outer membrane of interest ( 127 ), or just a bilayer model. Examples include pure 1,2-dipalmitoyl- sn -glycero-3-phosphocholine (DPPC), a (0.9:0.1) 1-palmitoyl-2-oleoyl- sn -glycero-3-phosphocholine (POPC)/cholesterol mixture or a (0.52:0.18:0.3) 1,2-dioleoyl- sn -glycero-3-phosphocholine (DOPC)/1,2-dioleoyl- sn -glycero-3-phospho-l-serine (DOPS)/cholesterol composition that mimics the lipid mixture used in a parallel artificial membrane permeability assay (PAMPA) experimental study ( 128 ). The membrane builder functionality ( 129 ) in CHARMM-GUI is a convenient tool for setting up such lipid bilayer systems. Minimization and a short MD simulation can be conducted to further stabilize the lipid bilayer model using the protocol and inputs supplied by CHARMM-GUI.
  • Perform the standard SILCS simulation on the lipid bilayer system and generate the GFE FragMaps.
  • Calculate the LGFE profile for drug-like ligands across the lipid bilayer. Run SILCS-MC for the ligand along the bilayer normal Z in 1 Å increments covering the full bilayer system, including both the lipid and water phases. At each Z position, SILCS-MC is performed in exhaustive mode as described in section 3.3, except that the ligand COM is only allowed to vary by a maximum of 1 Å from the assigned Z value during MC sampling. SILCS-MC simulations can be conducted at multiple (X,Y) positions in the plane of the bilayer to ensure proper sampling. An LGFE profile along Z is constructed at each (X,Y) position, and the profiles are averaged over the different (X,Y) positions to obtain the final LGFE profile and its standard deviation.
  • Use the LGFE profile $G(z)$ along the Z axis to calculate the permeation-related resistant factor $R$. The effective membrane permeability $P_{\mathrm{eff}}$ can be calculated from the effective resistivity $R_{\mathrm{eff}}$ through the equation $1/P_{\mathrm{eff}} = R_{\mathrm{eff}} = \int_{h} R(z)\,dz$, where $h$ is the bilayer thickness and the resistivity $R(z)$ at position $z$ is defined as $R(z) = e^{\beta \Delta G(z)}/D(z)$, where $D(z)$ is the position-specific diffusion coefficient at position $z$ and $\Delta G(z) = G(z) - G_{\mathrm{ref}}$. Here, $G(z)$ is the LGFE profile as a function of $z$ and $G_{\mathrm{ref}}$ is the reference free energy in the water phase, which can be calculated as the average LGFE over the water phase. $\beta = 1/k_{B}T$, where $k_{B}$ is the Boltzmann constant and $T$ is the absolute temperature. In the current protocol, $D(z)$ is not calculated and is assumed to be a constant $D$, so that $1/P_{\mathrm{eff}} = R/D$, where $R = \int_{h} e^{\beta \Delta G(z)}\,dz$ is calculated as the resistant coefficient.
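
The integral for the resistant factor R can be evaluated numerically from a tabulated LGFE profile. The following is a minimal sketch, assuming a trapezoidal rule, a default temperature of 298.15 K and energies in kcal/mol; the function names are hypothetical and none of this is taken from the SILCS software.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def reference_energy(g_water):
    """G_ref: average LGFE over sampled positions in the water phase."""
    return sum(g_water) / len(g_water)


def resistant_factor(z, g, g_ref, temperature=298.15):
    """Trapezoidal estimate of R = integral of exp(beta * (G(z) - G_ref)) dz.

    z: positions across the bilayer in Å (increasing).
    g: LGFE values in kcal/mol at those positions.
    """
    beta = 1.0 / (K_B * temperature)
    integrand = [math.exp(beta * (gi - g_ref)) for gi in g]
    return sum(
        0.5 * (integrand[i] + integrand[i + 1]) * (z[i + 1] - z[i])
        for i in range(len(z) - 1)
    )
```

For a flat profile with G(z) equal to G_ref the integrand is 1 everywhere, so R equals the integration length; barriers inside the bilayer raise R and wells lower it. Because D is treated as a constant, R values are best used to compare compounds rather than as absolute permeabilities.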

3.6. Protein-protein interaction prediction using SILCS-PPI

Protein-protein interactions (PPIs) are involved in a large number of vital cellular processes in bacteria and can serve as novel antibiotic targets ( 130 , 131 ). Efforts towards the inhibition of PPIs related to division and replication, transcription, outer membrane protein complexes, as well as toxin-antitoxin systems in bacteria are ongoing ( 132 ). PPI prediction is therefore useful for identifying novel druggable PPI interfaces in bacterial proteins and can pave the way toward novel antibiotic therapeutics. Traditional PPI prediction methods are mostly based on rigid protein structures with limited consideration of flexibility ( 133 ). Using the GFE FragMaps and the protein residue occupancy distributions, or protein probability grids (PPG), calculated from SILCS, a PPI prediction method named SILCS-PPI was put forward in our laboratory ( 134 ). It uses the SILCS FragMaps and PPGs of both proteins involved in a PPI, which intrinsically account for flexibility, together with fast Fourier transform (FFT) enhanced sampling to sample a comprehensive set of PPI orientations. These orientations are then ranked based on the overlap of the FragMaps and PPGs of the protein partners. The general SILCS-PPI protocol is as follows and requires that the SILCS FragMaps and PPGs for both proteins are already available.

  • Run the SILCS-PPI prediction using the FragMaps, PPGs and exclusion maps from both proteins. During the run, FragMaps from the ligand protein are spatially transformed to match PPGs from the receptor protein and vice versa. To expedite the process, unique rigid body rotations ( 135 ) are considered for the ligand protein, and for each rotation FFT is used to calculate PPI scores for all global translations in one go. The final SILCS-PPI score is obtained by summing over all ligand FragMap-receptor PPG scores and receptor FragMap-ligand PPG scores of all types, together with an exclusion score calculated from the correlation of the exclusion maps of the two proteins, which serves as an alternative shape complementarity score.
  • Save the top-ranked solutions (global translation and rotation parameters) and construct PPI complex coordinates. Two-pass clustering is then used to cluster all complex models. In the first pass, COM-based clustering places all models whose COM distances are within 6 Å into the same cluster. In the second pass, orientation-based clustering is performed using an angular distance metric that preserves periodicity ( 136 ). The distance cutoff is set to 0.5, which corresponds to an angle of about 30°. After the two-pass clustering, the best scoring pose from each cluster is saved for further evaluation.
  • COM of top ranked solutions can be visualized on the surfaces of both ligand and receptor proteins and colored by SILCS-PPI scores to help interpret the predicted PPI interfaces. The populations of COMs on the protein surface can be used to predict alternative PPI sites. In addition, the PDB coordinates of the PPI complexes may be accessed, though they are based on the rigid crystal structures used to initiate the SILCS-PPI calculation such that there will typically be steric overlap between the two proteins that requires careful relaxation of the structures prior to MD simulations.

3.7. Biologics formulation using SILCS

Besides efforts to develop small molecule antibiotics to counteract the evolving drug resistance of bacteria, researchers are also applying biologics-based drugs such as monoclonal antibodies (mAbs) in the battle ( 137 – 140 ). Biomacromolecular therapeutics, or so-called biologics, need to be carefully formulated to maximize protein stability and minimize viscosity, so as to ensure both efficacy and safety for highly concentrated formulations ( 141 ). Toward maximizing stability, biologics can be formulated with excipients to help minimize aggregation and denaturation of the biologic in a solution formulation ( 142 ). To assist the rational selection of excipients for biologics, we developed the SILCS-Biologics protocol ( 143 , 144 ), which combines SILCS-PPI and SILCS-Hotspots as described above to predict both the PPIs that can contribute to protein aggregation and increased viscosity and the binding sites of excipients. This information is then combined to build a model for predicting protein stability, aggregation and viscosity. The basic protocol is as follows.

  • Run SILCS simulation on the biologic protein and generate both FragMaps and PPGs as described in section 3.6 .
  • Predict PPI sites using SILCS-PPI as described in section 3.6 . Instead of two proteins, here the SILCS-PPI calculation is conducted against the same set of FragMaps and PPGs from the single protein. After the two-pass clustering in SILCS-PPI, all selected poses are used to calculate a per-residue PPI preference value (PPIP) by counting the number of contacts between the receptor and ligand (the same protein) atoms within a 5 Å cutoff over all poses. The counts are normalized by the maximum PPIP value to give the final PPIP score for all residues that contribute to PPIs. The PPIP score indicates the likelihood of a residue being involved in a PPI that may lead to aggregation or increased viscosity.
  • Run SILCS-Hotspots to map excipient binding sites on the biologic protein. The user can choose a collection of excipient molecules desired for the formulation. In our in-house tests ( 143 ), amino acid and sugar excipient molecules used include alanine, arginine, aspartate, citrate, glucose, glutamate, glycine, histidine, lactate, lysine, malate, mannitol, phosphate, proline, sorbitol, succinate, sucrose, threonine, trehalose, and valine. This list may be easily altered or extended as needed.
  • Combine the calculated PPIP from step 2 and the excipient binding-site profiles from step 3 to investigate the potential effect of excipient molecules on aggregation of the biologic protein. For example, the number of excipient binding sites that satisfy a range of PPIP and energy criteria may be selected. These may then be partitioned into the number of sites involving individual excipients. In addition to LGFE, ligand efficiency (LE), which is defined as the LGFE divided by the number of non-hydrogen atoms in a molecule, is also employed to rank excipients since it is independent of the size of the excipient molecules. In a study on the NIST mAb, it was found that a criterion defined as the number of excipient binding sites that have an average LE < −0.25 kcal/mol and PPIP > 0.1 correlates well with the experimental viscosity profile in general ( 143 ). Such a criterion can indicate the strength of an excipient in preventing aggregation since it combines information about favorable excipient binding (more negative LE) with the regions more likely to be involved in aggregation (higher PPIP). In practice, users can try different criteria using the PPIP, LGFE and LE metrics for the biologic protein of interest and can even build a regression model using these metrics if experimental aggregation-related data are available. We note that, given the challenges associated with biologics formulation, it is likely that different criteria will be required for different proteins, even for different mAb molecules of the same class.
  • For protein structure prediction using AlphaFold, regions with low pLDDT values are often intrinsically disordered regions ( 50 ), which are generally not suitable for drug targeting purposes. The intrinsically disordered regions are often presented as extended polypeptide regions in the predicted 3D structures. If a well-structured region in your model has low pLDDT values, this then might indicate that the quality of the model is questionable and needs to be examined.
  • The docking pose of a ligand from a SILCS-MC run can have clashes with the protein structure that was used to initialize the SILCS simulation. This is because SILCS-MC docking uses SILCS FragMaps that incorporate the protein flexibility sampled during the MD simulation. An alternative is to visualize the docking pose with the SILCS exclusion map, which can serve as a flexibility-accounting alternative to the protein surface representation based on a single rigid protein structure. To present the SILCS-MC docking pose in a classic protein-ligand interaction representation, it is also practical to extract protein structures from the SILCS simulation that have no clashes with the pose. Finally, when combining the SILCS-MC predicted ligand orientation with the protein structure used to initiate the simulation or one extracted from the SILCS simulations, it is important to perform careful relaxation of the protein around the ligand prior to production MD simulations.
  • In the current version of SILCS-PPI, only the protein structures that are used to initialize the SILCS simulations are used to construct PPI complex coordinates. In practice, representative protein structures from the SILCS simulations can be extracted for model construction, and minimization followed by a short MD simulation is also desirable to further refine the complex model for a better PPI representation.
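
The excipient ranking criterion from section 3.7 (normalized PPIP, ligand efficiency, and a count of sites passing both cutoffs) can be sketched as follows. This is an illustration under stated assumptions, not the SILCS-Biologics code: the function names and the dictionary keys are hypothetical, and how a PPIP value is assigned to a binding site (for example from the residues lining it) is left to the caller.

```python
def normalize_ppip(contact_counts):
    """Scale per-residue contact counts by the maximum count.

    contact_counts: {residue_id: contacts within the cutoff over all poses}.
    Residues with no contacts are dropped, since they do not contribute.
    """
    peak = max(contact_counts.values(), default=0)
    if peak == 0:
        return {}
    return {res: n / peak for res, n in contact_counts.items() if n > 0}


def ligand_efficiency(lgfe, n_heavy_atoms):
    """LE = LGFE divided by the number of non-hydrogen atoms."""
    return lgfe / n_heavy_atoms


def count_sites(sites, le_cut=-0.25, ppip_cut=0.1):
    """Count, per excipient, the binding sites with mean LE below le_cut
    and PPIP above ppip_cut.

    sites: list of dicts with hypothetical keys
           'excipient', 'mean_le' (kcal/mol) and 'ppip'.
    """
    counts = {}
    for site in sites:
        if site["mean_le"] < le_cut and site["ppip"] > ppip_cut:
            counts[site["excipient"]] = counts.get(site["excipient"], 0) + 1
    return counts
```

The default cutoffs are the values reported for the NIST mAb study; as the text notes, other proteins are likely to need different criteria, so both are left as parameters.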

Acknowledgements.

This work was supported by NIH grants R35GM131710 (AM), GM129327 (DW), AI152397 (DW), the University of Maryland Center for Biomolecular Therapeutics (CBT), the Samuel Waxman Cancer Research Foundation, and the Computer-Aided Drug Design (CADD) Center at the University of Maryland, Baltimore.

Conflict of interest: A.D.M. is Co-founder and CSO of SilcsBio LLC.

References:

Insights into the computer-aided drug design and discovery based on anthraquinone scaffold for cancer treatment: A systematic review

Hui Ming Chua, Said Moshawih,  [ ... ], Long Chiau Ming

Insights into the computer-aided drug design and discovery based on anthraquinone scaffold for cancer treatment: A protocol for systematic review

Hui Ming Chua, Said Moshawih,  [ ... ], Nurolaini Kifli

Role of ADAMTS13, VWF and F8 genes in deep vein thrombosis

Maria Teresa Pagliari, Andrea Cairo,  [ ... ], Flora Peyvandi

Prioritization of candidate genes for a South African family with Parkinson’s disease using in-silico tools

Boiketlo Sebate, Katelyn Cuttler,  [ ... ], Soraya Bardien

Novel genetic variants of inborn errors of immunity

Farida Almarzooqi, Abdul-Kader Souid, Ranjit Vijayan, Suleiman Al-Hammadi

SVAD: A genetic database curates non-ischemic sudden cardiac death-associated variants

Wei-Chih Huang, Hsin-Tzu Huang,  [ ... ], Hsien-Da Huang

MISTIC: A prediction tool to reveal disease-relevant deleterious missense variants

Kirsley Chennen, Thomas Weber,  [ ... ], Olivier Poch

Clinical interpretation of variants identified in RNU4ATAC, a non-coding spliceosomal gene

Clara Benoit-Pilven, Alicia Besson,  [ ... ], Sylvie Mazoyer

Human exome and mouse embryonic expression data implicate ZFHX3, TRPS1, and CHD7 in human esophageal atresia

Rong Zhang, Jan Gehlen,  [ ... ], Heiko Reutter

Phenogenon: Gene to phenotype associations for rare genetic diseases

Nikolas Pontikos, Cian Murphy,  [ ... ], UK Inherited Retinal Dystrophy Consortium, Phenopolis Consortium

Whole genome sequencing and rare variant analysis in essential tremor families

Zagaa Odgerel, Shilpa Sonti,  [ ... ], Lorraine N. Clark

Identification of putative pathogenic single nucleotide variants (SNVs) in genes associated with heart disease in 290 cases of stillbirth

Ellika Sahlin, Anna Gréen,  [ ... ], Erik Iwarsson

Rare, potentially pathogenic variants in 21 keratoconus candidate genes are not enriched in cases in a large Australian cohort of European descent

Sionne E. M. Lucas, Tiger Zhou,  [ ... ], Kathryn P. Burdon

Past, present, and future perspectives on computer-aided drug design methodologies.

  • 1. Introduction: The Benefits of Computational Methods for Drug Discovery
  • 1.1. The Drug Discovery Pipeline and the Problem of Candidate Selection
  • 1.2. The Application of Computational Methods in Drug Discovery
  • 1.3. The Main Methodology Branches in CADD
  • 2. Discussion
  • 2.1. Ligand-Based Drug Design (LBDD)
  • Quantitative Structure–Activity Relationship (QSAR) Modeling and Cheminformatics
  • 2.2. Structure-Based Drug Design (SBDD)
  • 2.2.1. Molecular Docking
  • 2.2.2. Molecular Dynamics
  • Enhanced Sampling Methods in Molecular Dynamics
  • Molecular Dynamics as a Post-Docking Approach
  • Free-Energy Perturbation (FEP) and Thermodynamic Integration (TI)
  • Thermal Titration Molecular Dynamics (TTMD)
  • 2.2.3. Supervised Molecular Dynamics
  • 3. Conclusions and Future Perspectives
  • Author Contributions
  • Institutional Review Board Statement
  • Informed Consent Statement
  • Data Availability Statement
  • Acknowledgments
  • Conflicts of Interest

  • Verma, J.; Khedkar, V.M.; Coutinho, E.C. 3D-QSAR in Drug Design-A Review. Curr. Top. Med. Chem. 2010 , 10 , 95–115. [ Google Scholar ] [ CrossRef ]
  • Gasteiger, J.; Engel, T. Chemoinformatics: A Textbook ; Wiley-VCH Verlag GmbH & Co. KGaA: Weinheim, Germany, 2003. [ Google Scholar ]
  • Muchmore, S.W.; Edmunds, J.J.; Stewart, K.D.; Hajduk, P.J. Cheminformatic Tools for Medicinal Chemists. J. Med. Chem. 2010 , 53 , 4830–4841. [ Google Scholar ] [ CrossRef ]
  • Landrum, G. RDKit: Open-Source Cheminformatics. 2010. Available online: https://www.rdkit.org/ (accessed on 2 March 2023).
  • Dalke, A.; Hert, J.; Kramer, C. mmpdb: An Open-Source Matched Molecular Pair Platform for Large Multiproperty Data Sets. J. Chem. Inf. Model. 2018 , 58 , 902–910. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Bolcato, G.; Heid, E.; Boström, J. On the Value of Using 3D Shape and Electrostatic Similarities in Deep Generative Methods. J. Chem. Inf. Model. 2022 , 62 , 1388–1398. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Spiegel, J.O.; Durrant, J.D. AutoGrow4: An open-source genetic algorithm for de novo drug design and lead optimization. J. Cheminform. 2020 , 12 , 25. [ Google Scholar ] [ CrossRef ]
  • Riniker, S.; Landrum, G.A. Similarity maps-a visualization strategy for molecular fingerprints and machine-learning methods. J. Cheminform. 2013 , 5 , 43. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Daré, J.K.; Freitas, M.P. Is conformation relevant for QSAR purposes? 2D Chemical representation in a 3D-QSAR perspective. J. Comput. Chem. 2022 , 43 , 917–922. [ Google Scholar ] [ CrossRef ]
  • Nikonenko, A.; Zankov, D.; Baskin, I.; Madzhidov, T.; Polishchuk, P. Multiple Conformer Descriptors for QSAR Modeling. Mol. Inform. 2021 , 40 , 2060030. [ Google Scholar ] [ CrossRef ]
  • Günther, S.; Senger, C.; Michalsky, E.; Goede, A.; Preissner, R. Representation of target-bound drugs by computed conformers: Implications for conformational libraries. BMC Bioinform. 2006 , 7 , 293. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Bosc, N.; Atkinson, F.; Felix, E.; Gaulton, A.; Hersey, A.; Leach, A.R. Large scale comparison of QSAR and conformal prediction methods and their applications in drug discovery. J. Cheminform. 2019 , 11 , 4. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Neves, B.J.; Braga, R.C.; Melo-Filho, C.C.; Moreira-Filho, J.T.; Muratov, E.N.; Andrade, C.H. QSAR-Based Virtual Screening: Advances and Applications in Drug Discovery. Front. Pharmacol. 2018 , 9 , 1275. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Kwon, S.; Bae, H.; Jo, J.; Yoon, S. Comprehensive ensemble in QSAR prediction for drug discovery. BMC Bioinform. 2019 , 20 , 521. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Anderson, A.C. The Process of Structure-Based Drug Design. Chem. Biol. 2003 , 10 , 787–797. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Pavan, M.; Bassani, D.; Sturlese, M.; Moro, S. From the Wuhan-Hu-1 strain to the XD and XE variants: Is targeting the SARS-CoV-2 spike protein still a pharmaceutically relevant option against COVID-19? J. Enzyme Inhib. Med. Chem. 2022 , 37 , 1704–1714. [ Google Scholar ] [ CrossRef ]
  • Kuntz, I.D.; Blaney, J.M.; Oatley, S.J.; Langridge, R.; Ferrin, T.E. A geometric approach to macromolecule-ligand interactions. J. Mol. Biol. 1982 , 161 , 269–288. [ Google Scholar ] [ CrossRef ]
  • Meng, X.-Y.; Zhang, H.-X.; Mezei, M.; Cui, M. Molecular Docking: A Powerful Approach for Structure-Based Drug Discovery. Curr. Comput. Aided-Drug Des. 2011 , 7 , 146–157. [ Google Scholar ] [ CrossRef ]
  • Pavan, M.; Menin, S.; Bassani, D.; Sturlese, M.; Moro, S. Implementing a Scoring Function Based on Interaction Fingerprint for Autogrow4: Protein Kinase CK1δ as a Case Study. Front. Mol. Biosci. 2022 , 9 , 909499. [ Google Scholar ] [ CrossRef ]
  • Liu, J.; Wang, R. Classification of Current Scoring Functions. J. Chem. Inf. Model. 2015 , 55 , 475–482. [ Google Scholar ] [ CrossRef ]
  • Guedes, I.A.; Pereira, F.S.S.; Dardenne, L.E. Empirical Scoring Functions for Structure-Based Virtual Screening: Applications, Critical Aspects, and Challenges. Front. Pharmacol. 2018 , 9 , 1089. [ Google Scholar ] [ CrossRef ]
  • Shen, Q.; Xiong, B.; Zheng, M.; Luo, X.; Luo, C.; Liu, X.; Du, Y.; Li, J.; Zhu, W.; Shen, J.; et al. Knowledge-Based Scoring Functions in Drug Design: 2. Can the Knowledge Base Be Enriched? J. Chem. Inf. Model. 2010 , 51 , 386–397. [ Google Scholar ] [ CrossRef ]
  • Li, J.; Fu, A.; Zhang, L. An Overview of Scoring Functions Used for Protein–Ligand Interactions in Molecular Docking. Interdiscip. Sci. Comput. Life Sci. 2019 , 11 , 320–328. [ Google Scholar ] [ CrossRef ]
  • Jones, G.; Willett, P.; Glen, R.C.; Leach, A.R.; Taylor, R. Development and validation of a genetic algorithm for flexible docking 1 1Edited by F. E. Cohen. J. Mol. Biol. 1997 , 267 , 727–748. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Friesner, R.A.; Banks, J.L.; Murphy, R.B.; Halgren, T.A.; Klicic, J.J.; Mainz, D.T.; Repasky, M.P.; Knoll, E.H.; Shelley, M.; Perry, J.K.; et al. Glide:  A New Approach for Rapid, Accurate Docking and Scoring. 1. Method and Assessment of Docking Accuracy. J. Med. Chem. 2004 , 47 , 1739–1749. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Korb, O.; Stützle, T.; Exner, T.E. PLANTS: Application of Ant Colony Optimization to Structure-Based Drug Design. In Ant Colony Optimization and Swarm Intelligence ; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4150, pp. 247–258. [ Google Scholar ] [ CrossRef ]
  • Pecoraro, C.; De Franco, M.; Carbone, D.; Bassani, D.; Pavan, M.; Cascioferro, S.; Parrino, B.; Cirrincione, G.; Dall’acqua, S.; Moro, S.; et al. 1,2,4-Amino-triazine derivatives as pyruvate dehydrogenase kinase inhibitors: Synthesis and pharmacological evaluation. Eur. J. Med. Chem. 2023 , 249 , 115134. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Torres, P.H.M.; Sodero, A.C.R.; Jofily, P.; Silva, F.P., Jr. Key Topics in Molecular Docking for Drug Design. Int. J. Mol. Sci. 2019 , 20 , 4574. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Morris, G.M.; Huey, R.; Lindstrom, W.; Sanner, M.F.; Belew, R.K.; Goodsell, D.S.; Olson, A.J. AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility. J. Comput. Chem. 2009 , 30 , 2785–2791. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Eberhardt, J.; Santos-Martins, D.; Tillack, A.F.; Forli, S. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. J. Chem. Inf. Model. 2021 , 61 , 3891–3898. [ Google Scholar ] [ CrossRef ]
  • Cosconati, S.; Forli, S.; Perryman, A.L.; Harris, R.; Goodsell, D.S.; Olson, A.J. Virtual screening with AutoDock: Theory and practice. Expert Opin. Drug Discov. 2010 , 5 , 597–607. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Huang, S.-Y. Comprehensive assessment of flexible-ligand docking algorithms: Current effectiveness and challenges. Brief. Bioinform. 2018 , 19 , 982–994. [ Google Scholar ] [ CrossRef ]
  • Sotriffer, C.A. Accounting for Induced-Fit Effects in Docking: What is Possible and What is Not? Curr. Top. Med. Chem. 2011 , 11 , 179–191. [ Google Scholar ] [ CrossRef ]
  • Amaro, R.E.; Baudry, J.; Chodera, J.; Demir, Ö.; McCammon, J.A.; Miao, Y.; Smith, J.C. Ensemble Docking in Drug Discovery. Biophys. J. 2018 , 114 , 2271–2278. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Spinaci, A.; Buccioni, M.; Catarzi, D.; Cui, C.; Colotta, V.; Ben, D.D.; Cescon, E.; Francucci, B.; Grieco, I.; Lambertucci, C.; et al. Dual Anta-Inhibitors’ of the A2A Adenosine Receptor and Casein Kinase CK1delta: Synthesis, Biological Evaluation, and Molecular Modeling Studies. Pharmaceuticals 2023 , 16 , 167. [ Google Scholar ] [ CrossRef ]
  • Sartore, G.; Bassani, D.; Ragazzi, E.; Traldi, P.; Lapolla, A.; Moro, S. In silico evaluation of the interaction between ACE2 and SARS-CoV-2 Spike protein in a hyperglycemic environment. Sci. Rep. 2021 , 11 , 22860. [ Google Scholar ] [ CrossRef ]
  • Roberts, B.C.; Mancera, R.L. Ligand−Protein Docking with Water Molecules. J. Chem. Inf. Model. 2008 , 48 , 397–408. [ Google Scholar ] [ CrossRef ]
  • Deng, N.; Forli, S.; He, P.; Perryman, A.; Wickstrom, L.; Vijayan, R.S.K.; Tiefenbrunn, T.; Stout, D.; Gallicchio, E.; Olson, A.J.; et al. Distinguishing Binders from False Positives by Free Energy Calculations: Fragment Screening Against the Flap Site of HIV Protease. J. Phys. Chem. B 2015 , 119 , 976–988. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Poli, G.; Tuccinardi, T. Consensus Docking in Drug Discovery. Curr. Bioact. Compd. 2020 , 16 , 182–190. [ Google Scholar ] [ CrossRef ]
  • Houston, D.R.; Walkinshaw, M.D. Consensus Docking: Improving the Reliability of Docking in a Virtual Screening Context. J. Chem. Inf. Model. 2013 , 53 , 384–390. [ Google Scholar ] [ CrossRef ]
  • Bolcato, G.; Cescon, E.; Pavan, M.; Bissaro, M.; Bassani, D.; Federico, S.; Spalluto, G.; Sturlese, M.; Moro, S. A Computational Workflow for the Identification of Novel Fragments Acting as Inhibitors of the Activity of Protein Kinase CK1δ. Int. J. Mol. Sci. 2021 , 22 , 9741. [ Google Scholar ] [ CrossRef ]
  • Rastelli, G.; Pinzi, L. Refinement and Rescoring of Virtual Screening Results. Front. Chem. 2019 , 7 , 498. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Peach, M.L.; Nicklaus, M.C. Combining docking with pharmacophore filtering for improved virtual screening. J. Cheminform. 2009 , 1 , 6. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Shivanika, C.; Kumar, D.; Ragunathan, V.; Tiwari, P.; Sumitha, A. Molecular docking, validation, dynamics simulations, and pharmacokinetic prediction of natural compounds against the SARS-CoV-2 main-protease. J. Biomol. Struct. Dyn. 2022 , 40 , 585–611. [ Google Scholar ] [ CrossRef ]
  • Pavan, M.; Menin, S.; Bassani, D.; Sturlese, M.; Moro, S. Qualitative Estimation of Protein–Ligand Complex Stability through Thermal Titration Molecular Dynamics Simulations. J. Chem. Inf. Model. 2022 , 62 , 5715–5728. [ Google Scholar ] [ CrossRef ]
  • Menin, S.; Pavan, M.; Salmaso, V.; Sturlese, M.; Moro, S. Thermal Titration Molecular Dynamics (TTMD): Not Your Usual Post-Docking Refinement. Int. J. Mol. Sci. 2023 , 24 , 3596. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Pavan, M.; Bassani, D.; Bolcato, G.; Bissaro, M.; Sturlese, M.; Moro, S. Computational Strategies to Identify New Drug Candidates against Neuroinflammation. Curr. Med. Chem. 2022 , 29 , 4756–4775. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Hollingsworth, S.A.; Dror, R.O. Molecular Dynamics Simulation for All. Neuron 2018 , 99 , 1129–1143. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Pavan, M.; Bassani, D.; Sturlese, M.; Moro, S. Investigating RNA–protein recognition mechanisms through supervised molecular dynamics (SuMD) simulations. NAR Genom. Bioinform. 2022 , 4 , lqac088. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • De Vivo, M.; Masetti, M.; Bottegoni, G.; Cavalli, A. Role of Molecular Dynamics and Related Methods in Drug Discovery. J. Med. Chem. 2016 , 59 , 4035–4061. [ Google Scholar ] [ CrossRef ]
  • Jorgensen, W.L.; Chandrasekhar, J.; Madura, J.D.; Impey, R.W.; Klein, M.L. Comparison of simple potential functions for simulating liquid water. J. Chem. Phys. 1983 , 79 , 926–935. [ Google Scholar ] [ CrossRef ]
  • Durrant, J.D.; McCammon, J.A. Molecular dynamics simulations and drug discovery. BMC Biol. 2011 , 9 , 71. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Tzeliou, C.E.; Mermigki, M.A.; Tzeli, D. Review on the QM/MM Methodologies and Their Application to Metalloproteins. Molecules 2022 , 27 , 2660. [ Google Scholar ] [ CrossRef ]
  • Gorgulla, C.; Jayaraj, A.; Fackeldey, K.; Arthanari, H. Emerging frontiers in virtual drug discovery: From quantum mechanical methods to deep learning approaches. Curr. Opin. Chem. Biol. 2022 , 69 , 102156. [ Google Scholar ] [ CrossRef ]
  • Bassani, D.; Pavan, M.; Bolcato, G.; Sturlese, M.; Moro, S. Re-Exploring the Ability of Common Docking Programs to Correctly Reproduce the Binding Modes of Non-Covalent Inhibitors of SARS-CoV-2 Protease Mpro. Pharmaceuticals 2022 , 15 , 180. [ Google Scholar ] [ CrossRef ]
  • Lotz, S.D.; Dickson, A. Unbiased Molecular Dynamics of 11 min Timescale Drug Unbinding Reveals Transition State Stabilizing Interactions. J. Am. Chem. Soc. 2018 , 140 , 618–628. [ Google Scholar ] [ CrossRef ]
  • Shaw, D.E.; Adams, P.J.; Azaria, A.; Bank, J.A.; Batson, B.; Bell, A.; Bergdorf, M.; Bhatt, J.; Butts, J.A.; Correia, T.; et al. Anton 3. In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, USA, 14–19 November 2021; pp. 1–11. [ Google Scholar ] [ CrossRef ]
  • Hartmann, C.; Banisch, R.; Sarich, M.; Badowski, T.; Schütte, C. Characterization of Rare Events in Molecular Dynamics. Entropy 2013 , 16 , 350–376. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Lazim, R.; Suh, D.; Choi, S. Advances in Molecular Dynamics Simulations and Enhanced Sampling Methods for the Study of Protein Systems. Int. J. Mol. Sci. 2020 , 21 , 6339. [ Google Scholar ] [ CrossRef ]
  • Patel, J.S.; Berteotti, A.; Ronsisvalle, S.; Rocchia, W.; Cavalli, A. Steered Molecular Dynamics Simulations for Studying Protein–Ligand Interaction in Cyclin-Dependent Kinase 5. J. Chem. Inf. Model. 2014 , 54 , 470–480. [ Google Scholar ] [ CrossRef ]
  • Sinko, W.; Miao, Y.; de Oliveira, C.A.F.; McCammon, J.A. Population Based Reweighting of Scaled Molecular Dynamics. J. Phys. Chem. B 2013 , 117 , 12759–12768. [ Google Scholar ] [ CrossRef ]
  • Sugita, Y.; Okamoto, Y. Replica-exchange molecular dynamics method for protein folding. Chem. Phys. Lett. 1999 , 314 , 141–151. [ Google Scholar ] [ CrossRef ]
  • Bussi, G.; Laio, A. Using metadynamics to explore complex free-energy landscapes. Nat. Rev. Phys. 2020 , 2 , 200–212. [ Google Scholar ] [ CrossRef ]
  • Miao, Y.; Feher, V.A.; McCammon, J.A. Gaussian Accelerated Molecular Dynamics: Unconstrained Enhanced Sampling and Free Energy Calculation. J. Chem. Theory Comput. 2015 , 11 , 3584–3595. [ Google Scholar ] [ CrossRef ]
  • Yu, Z.; Su, H.; Chen, J.; Hu, G. Deciphering Conformational Changes of the GDP-Bound NRAS Induced by Mutations G13D, Q61R, and C118S through Gaussian Accelerated Molecular Dynamic Simulations. Molecules 2022 , 27 , 5596. [ Google Scholar ] [ CrossRef ]
  • Chen, J.; Zeng, Q.; Wang, W.; Sun, H.; Hu, G. Decoding the Identification Mechanism of an SAM-III Riboswitch on Ligands through Multiple Independent Gaussian-Accelerated Molecular Dynamics Simulations. J. Chem. Inf. Model. 2022 , 62 , 6118–6132. [ Google Scholar ] [ CrossRef ]
  • Sabbadin, D.; Moro, S. Supervised Molecular Dynamics (SuMD) as a Helpful Tool To Depict GPCR–Ligand Recognition Pathway in a Nanosecond Time Scale. J. Chem. Inf. Model. 2014 , 54 , 372–376. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Alonso, H.; Bliznyuk, A.A.; Gready, J.E. Combining docking and molecular dynamic simulations in drug design. Med. Res. Rev. 2006 , 26 , 531–568. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Fratev, F.; Sirimulla, S. An Improved Free Energy Perturbation FEP+ Sampling Protocol for Flexible Ligand-Binding Domains. Sci. Rep. 2019 , 9 , 16829. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Cournia, Z.; Allen, B.; Sherman, W. Relative Binding Free Energy Calculations in Drug Discovery: Recent Advances and Practical Considerations. J. Chem. Inf. Model. 2017 , 57 , 2911–2937. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Lovering, F.; Aevazelis, C.; Chang, J.; Dehnhardt, C.; Fitz, L.; Han, S.; Janz, K.; Lee, J.; Kaila, N.; McDonald, J.; et al. Imidazotriazines: Spleen Tyrosine Kinase (Syk) Inhibitors Identified by Free-Energy Perturbation (FEP). ChemMedChem 2016 , 11 , 217–233. [ Google Scholar ] [ CrossRef ]
  • Ngo, S.T.; Nguyen, H.M.; Huong, L.T.T.; Quan, P.M.; Truong, V.K.; Tung, N.T.; Vu, V.V. Assessing potential inhibitors of SARS-CoV-2 main protease from available drugs using free energy perturbation simulations. RSC Adv. 2020 , 10 , 40284–40290. [ Google Scholar ] [ CrossRef ]
  • Deflorian, F.; Perez-Benito, L.; Lenselink, E.B.; Congreve, M.; Van Vlijmen, H.W.T.; Mason, J.S.; De Graaf, C.; Tresadern, G. Accurate Prediction of GPCR Ligand Binding Affinity with Free Energy Perturbation. J. Chem. Inf. Model. 2020 , 60 , 5563–5579. [ Google Scholar ] [ CrossRef ]
  • Gapsys, V.; Yildirim, A.; Aldeghi, M.; Khalak, Y.; van der Spoel, D.; de Groot, B.L. Accurate absolute free energies for ligand–protein binding based on non-equilibrium approaches. Commun. Chem. 2021 , 4 , 61. [ Google Scholar ] [ CrossRef ]
  • Azimi, S.; Khuttan, S.; Wu, J.Z.; Pal, R.K.; Gallicchio, E. Relative Binding Free Energy Calculations for Ligands with Diverse Scaffolds with the Alchemical Transfer Method. J. Chem. Inf. Model. 2022 , 62 , 309–323. [ Google Scholar ] [ CrossRef ]
  • Wang, L.; Wu, Y.; Deng, Y.; Kim, B.; Pierce, L.; Krilov, G.; Lupyan, D.; Robinson, S.; Dahlgren, M.K.; Greenwood, J.; et al. Accurate and Reliable Prediction of Relative Ligand Binding Potency in Prospective Drug Discovery by Way of a Modern Free-Energy Calculation Protocol and Force Field. J. Am. Chem. Soc. 2015 , 137 , 2695–2703. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Abel, R.; Wang, L.; Harder, E.D.; Berne, B.J.; Friesner, R.A. Advancing Drug Discovery through Enhanced Free Energy Calculations. Acc. Chem. Res. 2017 , 50 , 1625–1632. [ Google Scholar ] [ CrossRef ]
  • LMartins, L.C.; Cino, E.A.; Ferreira, R.S. PyAutoFEP: An Automated Free Energy Perturbation Workflow for GROMACS Integrating Enhanced Sampling Methods. J. Chem. Theory Comput. 2021 , 17 , 4262–4273. [ Google Scholar ] [ CrossRef ]
  • Mey, A.S.J.S.; Allen, B.K.; Macdonald, H.E.B.; Chodera, J.D.; Hahn, D.F.; Kuhn, M.; Michel, J.; Mobley, D.L.; Naden, L.N.; Prasad, S.; et al. Best Practices for Alchemical Free Energy Calculations [Article v1.0]. Living J. Comput. Mol. Sci. 2020 , 2 , 18378. [ Google Scholar ] [ CrossRef ]
  • Wu, D.; Zheng, X.; Liu, R.; Li, Z.; Jiang, Z.; Zhou, Q.; Huang, Y.; Wu, X.-N.; Zhang, C.; Huang, Y.-Y.; et al. Free energy perturbation (FEP)-guided scaffold hopping. Acta Pharm. Sin. B 2021 , 12 , 1351–1362. [ Google Scholar ] [ CrossRef ]
  • Steinbrecher, T.B.; Dahlgren, M.; Cappel, D.; Lin, T.; Wang, L.; Krilov, G.; Abel, R.; Friesner, R.; Sherman, W. Accurate Binding Free Energy Predictions in Fragment Optimization. J. Chem. Inf. Model. 2015 , 55 , 2411–2420. [ Google Scholar ] [ CrossRef ]
  • Resat, H.; Mezei, M. Studies on free energy calculations. I. Thermodynamic integration using a polynomial path. J. Chem. Phys. 1993 , 99 , 6052–6061. [ Google Scholar ] [ CrossRef ]
  • Bruckner, S.; Boresch, S. Efficiency of alchemical free energy simulations. II. Improvements for thermodynamic integration. J. Comput. Chem. 2010 , 32 , 1320–1333. [ Google Scholar ] [ CrossRef ]
  • Zhang, Q.; Yang, Y.; Gong, X.; Zhao, N.; Zhang, Y.; Liu, H. Thermodynamic integration combined with molecular dynamic simulations to explore the cross-resistance mechanism of isoniazid and ethionamide. Proteins Struct. Funct. Bioinform. 2022 , 90 , 1142–1151. [ Google Scholar ] [ CrossRef ]
  • Mishra, S.K.; Calabró, G.; Loeffler, H.H.; Michel, J.; Koča, J. Evaluation of Selected Classical Force Fields for Alchemical Binding Free Energy Calculations of Protein-Carbohydrate Complexes. J. Chem. Theory Comput. 2015 , 11 , 3333–3345. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Huai, Z.; Shen, Z.; Sun, Z. Binding Thermodynamics and Interaction Patterns of Inhibitor-Major Urinary Protein-I Binding from Extensive Free-Energy Calculations: Benchmarking AMBER Force Fields. J. Chem. Inf. Model. 2020 , 61 , 284–297. [ Google Scholar ] [ CrossRef ]
  • Christ, C.D.; Fox, T. Accuracy Assessment and Automation of Free Energy Calculations for Drug Design. J. Chem. Inf. Model. 2014 , 54 , 108–120. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Garbett, N.C.; Chaires, J.B. Thermodynamic studies for drug design and screening. Expert Opin. Drug Discov. 2012 , 7 , 299–314. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Decherchi, S.; Cavalli, A. Thermodynamics and Kinetics of Drug-Target Binding by Molecular Simulation. Chem. Rev. 2020 , 120 , 12788–12833. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Endter, L.J.; Smirnova, Y.; Risselada, H.J. Density Field Thermodynamic Integration (DFTI): A ‘Soft’ Approach to Calculate the Free Energy of Surfactant Self-Assemblies. J. Phys. Chem. B 2020 , 124 , 6775–6785. [ Google Scholar ] [ CrossRef ]
  • Kaästner, J.; Thiel, W. Bridging the gap between thermodynamic integration and umbrella sampling provides a novel analysis method: “Umbrella integration”. J. Chem. Phys. 2005 , 123 , 144104. [ Google Scholar ] [ CrossRef ]
  • Pan, A.C.; Xu, H.; Palpant, T.; Shaw, D.E. Quantitative Characterization of the Binding and Unbinding of Millimolar Drug Fragments with Molecular Dynamics Simulations. J. Chem. Theory Comput. 2017 , 13 , 3372–3377. [ Google Scholar ] [ CrossRef ]
  • ALong, A.; Zhao, H.; Huang, X. Structural Basis for the Interaction between Casein Kinase 1 Delta and a Potent and Selective Inhibitor. J. Med. Chem. 2012 , 55 , 956–960. [ Google Scholar ] [ CrossRef ]
  • Ursu, A.; Illich, D.J.; Takemoto, Y.; Porfetye, A.T.; Zhang, M.; Brockmeyer, A.; Janning, P.; Watanabe, N.; Osada, H.; Vetter, I.R.; et al. Epiblastin A Induces Reprogramming of Epiblast Stem Cells Into Embryonic Stem Cells by Inhibition of Casein Kinase 1. Cell Chem. Biol. 2016 , 23 , 494–507. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Husic, B.E.; Pande, V.S. Markov State Models: From an Art to a Science. J. Am. Chem. Soc. 2018 , 140 , 2386–2396. [ Google Scholar ] [ CrossRef ]
  • Bernardi, R.C.; Melo, M.C.; Schulten, K. Enhanced sampling techniques in molecular dynamics simulations of biological systems. Biochim. Biophys. Acta (BBA)—Gen. Subj. 2015 , 1850 , 872–877. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Cuzzolin, A.; Sturlese, M.; Deganutti, G.; Salmaso, V.; Sabbadin, D.; Ciancetta, A.; Moro, S. Deciphering the Complexity of Ligand-Protein Recognition Pathways Using Supervised Molecular Dynamics (SuMD) Simulations. J. Chem. Inf. Model. 2016 , 56 , 687–705. [ Google Scholar ] [ CrossRef ]
  • Sabbadin, D.; Ciancetta, A.; Deganutti, G.; Cuzzolin, A.; Moro, S. Exploring the recognition pathway at the human A 2A adenosine receptor of the endogenous agonist adenosine using supervised molecular dynamics simulations. Medchemcomm 2015 , 6 , 1081–1085. [ Google Scholar ] [ CrossRef ]
  • Bolcato, G.; Pavan, M.; Bassani, D.; Sturlese, M.; Moro, S. Ribose and Non-Ribose A2A Adenosine Receptor Agonists: Do They Share the Same Receptor Recognition Mechanism? Biomedicines 2022 , 10 , 515. [ Google Scholar ] [ CrossRef ]
  • Pavan, M.; Bolcato, G.; Bassani, D.; Sturlese, M.; Moro, S. Supervised Molecular Dynamics (SuMD) Insights into the mechanism of action of SARS-CoV-2 main protease inhibitor PF-07321332. J. Enzyme Inhib. Med. Chem. 2021 , 36 , 1645–1649. [ Google Scholar ] [ CrossRef ]
  • Panday, S.K.; Sturlese, M.; Salmaso, V.; Ghosh, I.; Moro, S. Coupling Supervised Molecular Dynamics (SuMD) with Entropy Estimations To Shine Light on the Stability of Multiple Binding Sites. ACS Med. Chem. Lett. 2019 , 10 , 444–449. [ Google Scholar ] [ CrossRef ]
  • Salmaso, V.; Sturlese, M.; Cuzzolin, A.; Moro, S. Exploring Protein-Peptide Recognition Pathways Using a Supervised Molecular Dynamics Approach. Structure 2017 , 25 , 655–662.e2. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Jayatunga, M.K.; Xie, W.; Ruder, L.; Schulze, U.; Meier, C. AI in small-molecule drug discovery: A coming wave? Nat. Rev. Drug Discov. 2022 , 21 , 175–176. [ Google Scholar ] [ CrossRef ]
  • Boniolo, F.; Dorigatti, E.; Ohnmacht, A.J.; Saur, D.; Schubert, B.; Menden, M.P. Artificial intelligence in early drug discovery enabling precision medicine. Expert Opin. Drug Discov. 2021 , 16 , 991–1007. [ Google Scholar ] [ CrossRef ]
  • Cavasotto, C.N.; Di Filippo, J.I. Artificial intelligence in the early stages of drug discovery. Arch. Biochem. Biophys. 2020 , 698 , 108730. [ Google Scholar ] [ CrossRef ]
  • Füzi, B.; Mathai, N.; Kirchmair, J.; Ecker, G.F. Toxicity prediction using target, interactome, and pathway profiles as descriptors. Toxicol. Lett. 2023 , 381 , 20–26. [ Google Scholar ] [ CrossRef ]
  • Lysenko, A.; Sharma, A.; Boroevich, K.A.; Tsunoda, T. An integrative machine learning approach for prediction of toxicity-related drug safety. Life Sci. Alliance 2018 , 1 , e201800098. [ Google Scholar ] [ CrossRef ] [ PubMed ] [ Green Version ]
  • Hao, Y.; Moore, J.H. TargetTox: A Feature Selection Pipeline for Identifying Predictive Targets Associated with Drug Toxicity. J. Chem. Inf. Model. 2021 , 61 , 5386–5394. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Atz, K.; Isert, C.; Böcker, M.N.A.; Jiménez-Luna, J.; Schneider, G. Δ-Quantum machine-learning for medicinal chemistry. Phys. Chem. Chem. Phys. 2022 , 24 , 10775–10783. [ Google Scholar ] [ CrossRef ] [ PubMed ]
  • Pu, L.; Govindaraj, R.G.; Lemoine, J.M.; Wu, H.-C.; Brylinski, M. DeepDrug3D: Classification of ligand-binding pockets in proteins with a convolutional neural network. PLoS Comput. Biol. 2019 , 15 , e1006718. [ Google Scholar ] [ CrossRef ] [ Green Version ]
  • Ain, Q.U.; Aleksandrova, A.; Roessler, F.D.; Ballester, P.J. Machine-learning scoring functions to improve structure-based binding affinity prediction and virtual screening. Wiley Interdiscip. Rev. Comput. Mol. Sci. 2015 , 5 , 405–424. [ Google Scholar ] [ CrossRef ]
  • Graff, D.E.; Shakhnovich, E.I.; Coley, C.W. Accelerating high-throughput virtual screening through molecular pool-based active learning. Chem. Sci. 2021 , 12 , 7866–7881. [ Google Scholar ] [ CrossRef ]
  • Pozzan, A. QM Calculations in ADMET Prediction. Quantum Mech. Drug Discov. 2020 , 2114 , 285–305. [ Google Scholar ] [ CrossRef ]
  • Isert, C.; Atz, K.; Jiménez-Luna, J.; Schneider, G. QMugs, quantum mechanical properties of drug-like molecules. Sci. Data 2022 , 9 , 273. [ Google Scholar ] [ CrossRef ]
  • Böselt, L.; Thürlemann, M.; Riniker, S. Machine Learning in QM/MM Molecular Dynamics Simulations of Condensed-Phase Systems. J. Chem. Theory Comput. 2021 , 17 , 2641–2658. [ Google Scholar ] [ CrossRef ]


Bassani, D.; Moro, S. Past, Present, and Future Perspectives on Computer-Aided Drug Design Methodologies. Molecules 2023, 28, 3906. https://doi.org/10.3390/molecules28093906



