Deep learning in drug discovery: an integrative review and future challenges

  • Open access
  • Published: 17 November 2022
  • Volume 56 , pages 5975–6037, ( 2023 )


  • Heba Askr 1 ,
  • Enas Elgeldawi 2 ,
  • Heba Aboul Ella 4 ,
  • Yaseen A. M. M. Elshaier 5 ,
  • Mamdouh M. Gomaa 2 &
  • Aboul Ella Hassanien 3  


Recently, the use of artificial intelligence (AI) in drug discovery has received much attention because it significantly reduces the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances and drug-related data grows. Therefore, this paper presents a systematic literature review (SLR) that integrates the recent DL technologies and applications in drug discovery, including drug–target interactions (DTIs), drug–drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug side-effect predictions. We present a review of more than 300 articles published between 2000 and 2022. The benchmark data sets, the databases, and the evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. Drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.


1 Introduction

Drug discovery is the examination of how various drugs interact with the body and how a medication needs to act on the body to have a therapeutic impact. Drug discovery strategies follow different approaches, such as physiology-based and target-based ones, and rely on information about the ligand and the target. In this regard, our attention is directed to certain topics, especially drug (ligand)–target interactions, drug sensitivity and response, drug–drug interaction, and drug–drug similarity. For certain diseases such as cancer, or in pandemic situations such as COVID-19, a combination of more than one drug is required to improve the prognosis and address pathogenesis interactions. Despite all the recent advances in pharmaceuticals, medication development is still a labor-intensive and costly process. As a result, several computational algorithms have been proposed to speed up the drug discovery process (Betsabeh and Mansoor 2021).

As DL models progress and drug data grow, a slew of new DL-based approaches is appearing at every stage of the drug development process (Kim et al. 2021). In addition, large pharmaceutical corporations have migrated toward AI in the wake of the development of DL approaches, abandoning outmoded, ineffective procedures to increase patient benefit while also increasing their own profit (Nag et al. 2022). Despite the impressive performance of DL, drug discovery remains a critical and challenging task, and there is room for researchers to develop algorithms that improve drug discovery performance. Therefore, this paper presents an SLR that integrates the recent DL technologies and applications in drug discovery. This review is the first to incorporate the recent DL models and applications for the different categories of drug discovery problems, such as DTIs, DDIs, drug sensitivity and response, and drug side-effect predictions, and to present new challenging topics such as XAI and DT and how they help advance drug discovery. In addition, the paper provides researchers with the most frequently used datasets in the field.

The paper is built on six building blocks, as shown in Fig. 1. More than 300 articles are presented in this paper, divided across these building blocks. The papers were selected using the following criteria:

Papers published from 2000 to 2022.

Papers published by IEEE, ACM, Elsevier, and Springer were given higher priority.

figure 1

The main building blocks of the paper

The following analytical questions are discussed and answered in the paper:

AQ1: What DL algorithms have been used to predict the different categories of drug discovery problems?

AQ2: Which deep learning methods are mostly used in drug dosing optimization?

AQ3: Are there any success stories about drug discovery and DL?

AQ4: What about the newest technologies such as XAI and DT in drug discovery?

AQ5: What are the future and open works related to drug discovery and DL?

The remainder of this review paper is organized as follows: Sect. 2 presents a review of related studies; Sect. 3 gives an overview of the various DL techniques. Section 4 presents the organization of DL applications in drug discovery problems by explaining each drug discovery problem category and giving a literature review of the DL techniques used. Section 5 discusses the numerous benchmark data sets and databases that have been employed in the drug development process. Section 6 presents the evaluation metrics used for each drug discovery problem category. Drug dose optimization, success stories, and XAI are introduced in Sects. 7, 8, and 9, respectively. DT and open problems are suggested as future research challenges in Sects. 10 and 11. Section 12 presents a discussion of the analytical questions. Finally, Sect. 13 concludes the paper.

2 Review of related studies

Although drug discovery is a large field with different research categories, there are only a few review studies about it, and each related study has focused on only one research category, such as reviewing the DL applications for DTIs. This section reviews these related studies, and a summary is presented in Table 1.

Kim et al. (2021) presented a survey of DL models for the prediction of drug–target interaction (DTI) and new medication development. They start by providing a thorough summary of the many representations of drugs and proteins, DL applications, and widely used benchmark data sets for training and testing models. A strength of this study is that it identifies several obstacles to the promising future of de novo drug design and DL-based DTI prediction. However, its major drawback is that it did not consider the latest technologies in DL applications for DTIs, such as XAI and DTs.

Rifaioglu et al. (2019) presented the recent ML applications in virtual screening (VS) along with the techniques, tools, databases, and resources used to create the models. They outline what VS is and how crucial it is to the process of finding new drugs. As strengths, they highlighted the DL technologies that are available as open-access programming libraries, tool kits, and frameworks and that can be employed for future computational drug discovery (including DTI prediction), and they provided instances of VS investigations that resulted in the discovery of novel bioactive chemicals and medications. However, they did not consider drug dose optimization in their literature review.

Sachdev and Gupta (2019) presented the various feature-based chemogenomic methods for DTI prediction. They offer a thorough, current review of the different feature-based methodologies and also describe relevant datasets, tools, methods for determining drug or target properties, and evaluation measures. Although the study is considered the first integrated review that concentrates only on feature-based DTI techniques, it did not consider the latest technologies in DL applications for DTIs, such as XAI and DTs.

3 Deep learning (DL) techniques

Detecting spam, recommending videos, classifying images, and retrieving multimedia concepts are just a few of the applications where machine learning (ML) has lately gained favor in research. Deep learning (DL) is one of the most extensively utilized ML methods in these applications. The ongoing appearance of new DL studies is due to the unpredictability of data acquisition and the incredible progress made in hardware technologies. DL is based on conventional neural networks but outperforms them significantly. Furthermore, DL uses transformations and graph technology to build multi-layer learning models (Kim et al. 2021). ML and DL have revolutionized the way we tackle problems. DL models come in various shapes and sizes and can effectively resolve problems that are too complex for standard approaches. This section reviews the various DL models (Sarker 2021).

3.1 Classic neural networks

As shown in Fig. 2, the multi-layer perceptron (MLP) is the model most often identified with fully connected neural networks. It involves converting the inputs into simple binary (two-digit) data (Mukhamediev et al. 2021). This paradigm allows both linear and nonlinear functions to be included. The linear function is a single line that multiplies its inputs by a constant multiplier. The sigmoid curve, the hyperbolic tangent, and the rectified linear unit (ReLU) are three common nonlinear functions. This model is best suited to classification and regression problems with real-valued data and to cases that call for a flexible model of any kind.

figure 2

Multilayer Perceptron or ANN
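The layer computation described above (a linear combination of the inputs followed by a nonlinear function) can be sketched as follows. This is a minimal illustration rather than a reference implementation: the function names and the hand-picked weights are hypothetical, and a real model would learn its weights from data.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid curve: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    """Rectified linear unit: passes positive values, zeroes the rest."""
    return max(0.0, x)


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each output unit."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(x):
    """Two-layer MLP with hypothetical, hand-picked parameters."""
    hidden = dense(x, [[1.0, -1.0], [0.5, 0.5]], [0.0, -0.5], relu)
    output = dense(hidden, [[1.0, 1.0]], [0.0], sigmoid)
    return output[0]


# The hyperbolic tangent is available directly as math.tanh and can be
# passed to dense() in the same way as relu or sigmoid.
score = mlp_forward([1.0, 0.0])
```

Swapping the activation argument is enough to compare the three nonlinear functions on the same layer.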

3.2 Convolutional neural networks (CNN)

As shown in Fig. 3, the classic convolutional neural network (CNN) is an advanced, high-potential variant of the ANN that was developed to manage escalating complexity levels as well as data preprocessing and compilation. It is based on how the neurons in an animal's visual cortex are arranged (Amashita et al. 2018). CNNs are among the most flexible algorithms for processing both image and non-image data. A CNN is organized in four parts:

An input layer, often a 2D array of neurons, for analyzing basic visual data such as picture pixels.

An output layer: some CNNs analyze the images on their inputs using a single-dimensional output layer of neurons coupled to distributed convolutional layers.

A third layer, called the sampling layer, which is included in CNNs to restrict the number of neurons that take part in the corresponding network layers.

One or more connected layers that join the sampling and output layers.

figure 3

Convolutional Neural Networks (CNN)

This network concept can aid in extracting relevant visual data in pieces or smaller units. In a CNN, each neuron is responsible for a group of neurons from the preceding layer.

After the input data has been fed into the convolutional model, the CNN is constructed in four steps:

Convolution: Produces feature maps from the supplied data, to which a function is then applied.

Max-Pooling: Helps the CNN recognize an image even when it has been modified.

Flattening: The data is flattened into a one-dimensional vector in this stage so that the CNN can analyze it.

Full Connection: Sometimes referred to as a "hidden layer", this stage creates the loss function for the model.
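The first three steps above can be sketched in plain Python on a toy image. This is an illustrative sketch only: the helper names, the 5x5 image, and the 2x2 diagonal kernel are hypothetical, the convolution uses no padding and a stride of one, and ReLU stands in for the function applied to the feature map.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) to get a feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu_map(fmap):
    """Apply the ReLU function to every cell of a feature map."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def flatten(fmap):
    """Turn a 2D feature map into a 1D vector for the fully connected stage."""
    return [v for row in fmap for v in row]


# Hypothetical 5x5 binary image and a kernel that responds to diagonal strokes.
image = [
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
]
kernel = [[1, 0], [0, 1]]

feature_map = relu_map(convolve2d(image, kernel))
vector = flatten(max_pool(feature_map))
```

The resulting vector is what a fully connected layer would receive as input.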

CNNs are capable of tasks such as image recognition, image analysis, image segmentation, video analysis, and natural language processing (NLP) (Chauhan et al. 2018; Tajbakhsh et al. 2016; Mohamed et al. 2020; Zhang et al. 2018).

3.3 Recurrent neural networks (RNNs)

RNNs were first created to help in sequence prediction. These networks rely on data streams of varying lengths as inputs. To make its latest prediction, an RNN uses knowledge of its previous state as an input value. As a result, it gives the network a form of short-term memory (Tehseen et al. 2019). As shown in Fig. 4, the Long Short-Term Memory (LSTM) method, for example, is renowned for its adaptability.

figure 4

LSTM Network

LSTMs and gated RNNs, both of which are useful for predicting data in time sequences using memory, are two forms of RNN design that aid in the study of such problems. An LSTM has three gates: input, output, and forget. Gated RNNs are particularly helpful for temporal sequence prediction using memory-based data. Both types of algorithms can be used to address a range of problems, including image classification (Chandra and Sharma 2017), sentiment analysis (Failed 2018), video classification (Abramovich et al. 2018), language translation (Hermanto et al. 2015), and more.
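To make the role of the three gates concrete, the sketch below implements one step of a scalar LSTM cell using the standard LSTM update equations. It is a simplified illustration: real cells operate on vectors and learn their parameters, and the parameter dictionary layout here is a hypothetical convention chosen for readability.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step.

    params maps "forget", "input", "output", and "candidate" to a tuple
    (input weight, recurrent weight, bias). This layout is hypothetical.
    """
    def unit(name, squash):
        w_x, w_h, b = params[name]
        return squash(w_x * x + w_h * h_prev + b)

    f = unit("forget", sigmoid)       # how much of the old memory to keep
    i = unit("input", sigmoid)        # how much new information to write
    o = unit("output", sigmoid)       # how much of the memory to expose
    g = unit("candidate", math.tanh)  # proposed new memory content

    c = f * c_prev + i * g            # updated cell state (long-term memory)
    h = o * math.tanh(c)              # updated hidden state (short-term output)
    return h, c


def run_lstm(sequence, params):
    """Feed a sequence through the cell, carrying the state between steps."""
    h, c = 0.0, 0.0
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h


zero_params = {name: (0.0, 0.0, 0.0)
               for name in ("forget", "input", "output", "candidate")}
h_demo, c_demo = lstm_step(1.0, 0.0, 2.0, zero_params)
```

With all parameters at zero, every gate is half open, so the cell keeps half of its previous memory; a strongly positive forget bias would instead preserve the memory almost unchanged.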

3.4 Generative adversarial networks: GAN

As shown in Fig. 5, a GAN combines two DL neural networks, a Generator and a Discriminator. The Generator network creates fake data, while the Discriminator helps to discriminate between real and fake data (Alankrita et al. 2021).

figure 5

GAN: Generative Adversarial Networks

The two networks compete with one another: the Discriminator keeps distinguishing between real and fake data, and the Generator keeps making fake data look like real data. If a picture library is required, the Generator network generates simulated data resembling the authentic photos; for this, a deconvolutional neural network would be created. An image detector network would then be used to discriminate between fake and real images. This competition eventually improves the network's performance. GANs can be employed to create images and text, enhance images, and discover new drugs.
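The competition between the two networks is usually expressed through two loss functions. The sketch below assumes the common binary cross-entropy formulation with the non-saturating generator loss; this is one widely used choice and is not taken from the text. The inputs are the Discriminator's probabilities that a sample is real, and the function names are hypothetical.

```python
import math


def bce(prob: float, label: float) -> float:
    """Binary cross-entropy for one prediction, clamped for numerical safety."""
    eps = 1e-12
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def mean(values):
    return sum(values) / len(values)


def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples near 0."""
    return mean([bce(p, 1.0) for p in d_real]) + mean([bce(p, 0.0) for p in d_fake])


def generator_loss(d_fake):
    """Low when the Discriminator is fooled into scoring fakes near 1."""
    return mean([bce(p, 1.0) for p in d_fake])


# A Discriminator that cannot tell real from fake outputs 0.5 everywhere.
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

Training alternates between lowering the first loss with respect to the Discriminator and lowering the second with respect to the Generator.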

3.5 Self-organizing maps (SOM)

As shown in Fig. 6, Self-Organizing Maps operate by leveraging unsupervised data to decrease a model's number of random variables (Kohonen 1990). Given that every synapse is linked to both its input and output nodes, the output dimension in this DL approach is set as a two-dimensional model. Each data point competes for representation in the map, and the weights of the closest nodes, the Best Matching Units (BMUs), are adjusted. The size of the weight adjustment varies based on how close a node is to the BMU. Because weights are a node attribute in and of themselves, their value represents the node's position in the network. SOMs are well suited to evaluating datasets, or exploring projects, that lack a Y-axis value (that is, an output label).

figure 6

Self-Organizing Maps (SOM)
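The competition and weight adjustment described above can be sketched as a single training step. This is a minimal illustration under stated assumptions: the map is a dictionary from grid coordinates to weight vectors, the neighbourhood is Gaussian, and the learning rate and radius values are hypothetical.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, sample):
    """Grid position of the node whose weight vector is closest to the sample."""
    return min(weights, key=lambda node: squared_distance(weights[node], sample))


def som_update(weights, sample, learning_rate=0.5, radius=1.0):
    """Pull the BMU and its grid neighbours toward the sample; return the BMU."""
    bmu = best_matching_unit(weights, sample)
    for node, w in list(weights.items()):
        grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
        # Nodes close to the BMU on the grid are adjusted more strongly.
        influence = math.exp(-grid_d2 / (2 * radius ** 2))
        weights[node] = [
            wi + learning_rate * influence * (si - wi)
            for wi, si in zip(w, sample)
        ]
    return bmu


# Hypothetical 1x2 map with two-dimensional weight vectors.
som = {(0, 0): [0.0, 0.0], (0, 1): [1.0, 1.0]}
winner = som_update(som, [0.9, 0.9])
```

Repeating this step over many samples, while shrinking the learning rate and radius, is what lets the map organize itself without labels.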

3.6 Boltzmann machines

As shown in Fig.  7 , the nodes are connected in a circular pattern because there is no set orientation in this network model. This deep learning technique is utilized to generate model parameters because of its uniqueness. The Boltzmann Machines model is stochastic, unlike all preceding deterministic network models. It can monitor systems, create a binary recommendation platform, and analyze specific datasets (Hinton 2011 ).

figure 7

Boltzmann Machines

The architecture of the Boltzmann Machine is a two-layer neural network. The visible or input layer is the first, while the hidden layer is the second. They are made up of several neuron-like nodes that carry out computations. These nodes are connected across layers but are not linked to other nodes in the same layer. As a result, there is no connectivity within a layer, which is one of the Boltzmann machine's disadvantages. When data is supplied to these nodes, it is transformed into a graph; the nodes process it and learn all the parameters, motifs, and relations among the data before deciding whether to transmit it. As a result, the Boltzmann Machine is often regarded as an unsupervised DL model.

3.7 Autoencoders

As shown in Fig. 8, the autoencoder, one of the most popular DL algorithms, operates automatically on its inputs: it encodes them, applies an activation function, and decodes the result at the end. Because of the bottleneck, fewer categories of data are produced, and the inherent structure of the data is used to its fullest extent (Zhai et al. 2018).

figure 8

Autoencoders

There are various types of autoencoders:

Sparse: A regularization technique used when the hidden layer is larger than the input layer, to reduce overfitting. It constrains the loss function and restricts the autoencoder from using all its nodes simultaneously.

Denoising: In this case, some of the inputs are randomly set to 0.

Contractive: When the hidden layer is larger than the input layer, a penalty factor is added to the loss function to avoid overfitting and data duplication.

Stacked: When another hidden layer is added to an autoencoder, it results in two stages of encoding and initial stages of decoding.

Feature identification, establishing a strong recommendation model, and adding features to enormous datasets are some of the problems autoencoders can solve.
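The denoising variant listed above can be sketched by separating the corruption step from the reconstruction loss. The sketch is illustrative only: the encoder and decoder are passed in as plain functions (identity functions in the demo, which is a hypothetical stand-in for trained networks), and mean squared error is assumed as the reconstruction loss.

```python
import random


def corrupt(x, drop_prob, rng):
    """Randomly set inputs to 0, as in a denoising autoencoder."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def mse(a, b):
    """Mean squared error between two equally long vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def denoising_loss(x, encode, decode, drop_prob, rng):
    """Reconstruct from the corrupted input but score against the clean input."""
    noisy = corrupt(x, drop_prob, rng)
    return mse(decode(encode(noisy)), x)


def identity(v):
    # Hypothetical stand-in for a trained encoder or decoder network.
    return v


rng = random.Random(0)
clean = [1.0, 2.0]
loss_all_dropped = denoising_loss(clean, identity, identity, 1.0, rng)
```

The key design point is that the loss compares the output with the clean input, so the model is rewarded for undoing the corruption rather than copying it.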

4 Organization of DL applications in drug discovery problems

The development of safe and effective treatments for humans is the primary goal of drug discovery (Kim et al. 2021). Drug discovery is the problem of finding suitable drugs to treat a disease (i.e., a target protein), which relies on several interactions. This paper divides drug discovery problems into four main categories, as presented in Fig. 9: drug–target interactions, drug–drug similarity, drug combination side effects, and drug sensitivity and response predictions. The following subsections provide a literature review of DL applied to these problems, and some of the investigated articles related to each category are summarized in Table 2.

figure 9

Drug discovery problem categories

4.1 Drug–target interactions prediction using DL

Drug repurposing attempts to uncover new uses for drugs that are already approved and on the market. It has attracted much attention because it takes less time, costs less money, and has a greater success rate than traditional de novo drug development (Thafar et al. 2022). The discovery of drug–target interactions is the initial step in creating new medications, as well as one of the most crucial aspects of drug screening and drug-guided synthesis (Wang et al. 2020a). Exploring the link between candidate drugs and targets, known as drug–target interactions (DTIs), helps researchers better understand the pathophysiology of targets at the drug level, which can help with early detection of the disease, treatment prognosis, and drug design (Lian et al. 2021). The success of drug repositioning relies largely on DTI prediction because it reduces the number of potential drug candidates for specific targets. Traditional computational methods follow two basic strategies: approaches based on molecular docking and approaches based on drugs. When the 3D structures of target proteins are not available, the effectiveness of molecular docking is limited. When there are only a few known binding molecules for a target, drug-based techniques typically produce poor prediction results. DL technologies overcome the restrictions of the high-dimensional structure of drugs and target proteins by using structure-free approaches, which do not need 3D structural data or docking for DTI prediction. Therefore, this section provides a recent comprehensive review of DL-based DTI prediction models (Chen et al. 2012).

As shown in Fig. 10, there are known interactions (solid lines) and unknown interactions (dashed lines) between diseases (proteins) and drugs. DTI prediction forecasts the unknown interactions, that is, which diseases (or target proteins) a new drug might treat. According to their input features, we divide the latest DL models used to predict DTIs into three categories: drug-based models, structure (graph)-based models, and drug–protein(disease)-based models.

figure 10

DL models used for predicting the DTIs are grouped into three categories: a drug-based models, b structure (graph)-based models, and c drug-protein(disease)-based models

4.1.1 Drug-based models

Figure 10a shows drug-based models, which assume that a potential drug will resemble known drugs for the target proteins. They predict the DTI using information on the drugs known for the target. These models use similarity search strategies, which postulate that structurally similar substances have similar biological functions (Thafar et al. 2019; Matsuzaka and Uesawa 2019). Such methods have been used for decades to select compounds from vast compound libraries, either through massive computing jobs or through manual calculation. Deep neural network models are gradually narrowing the gap between in silico prediction and empirical study, and DL technology can shorten these time-consuming procedures and manual operations.
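A similarity search of the kind described above can be sketched in a few lines. The Tanimoto coefficient used here is a commonly used fingerprint similarity measure and is an assumption of this sketch, since the text does not name a specific measure. Fingerprints are represented as sets of "on" bit positions, and the compound names and bit values are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Shared bits divided by all bits set in either fingerprint."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(query: set, library: dict):
    """Return (name, similarity) pairs, most similar compound first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each set lists the indices of bits that are on.
known_ligand = {1, 2, 3}
library = {
    "compound_a": {1, 2, 3, 4},
    "compound_b": {7, 8},
}
ranking = rank_by_similarity(known_ligand, library)
```

Compounds at the top of the ranking would be the first candidates to test against the same target as the known ligand.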

Researchers can now use deep neural networks to analyze drugs and predict drug-related features, including bioactivities and physicochemical properties, thanks to benchmark packages such as MoleculeNet (Wu et al. 2018) and DeepChem. As a result, basic neural networks such as MLPs and CNNs have been used in numerous drug-based DL approaches (Zeng et al. 2020; Yang et al. 2019; Liu et al. 2017). ADMET investigations often focused on the representation power of molecular descriptors rather than on the model itself (Zhai et al. 2018; Liu et al. 2017; Kim et al. 2016; Tang et al. 2014). Hirohara et al. trained a CNN model on SMILES strings and then used the learned features to discover motifs, that is, significant structures such as protein-binding sites or unidentified functional groups (Hirohara et al. 2018). Wenzel et al. (2019) employed atom pairs and pharmacophoric donor–acceptor pairs as descriptors in multi-task deep neural networks to predict microsomal metabolic liability. Gao et al. (2019) compared six kinds of 2D fingerprints for predicting protein–drug affinity using ML methods such as random forest (RF), single-task DNN, and multi-task DNN models. Matsuzaka and Uesawa (2019) used 2D images of 3D chemical compounds to train a CNN model to predict constitutive androstane receptor agonists. They optimized performance using snapshots of a 3D ball-and-stick model taken at various angles or coordinates. As a result, the method outperformed seven common prediction approaches based on 3D chemical structure.

Since the development of the GCN, drug-related GCN models have created graph representations of molecules that incorporate information on the chemical structure by summing the properties of adjacent atoms (Gilmer et al. 2017).
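The idea of adding up the properties of adjacent atoms can be sketched as one neighbour-aggregation step on a molecular graph. This is a bare illustration, not a full GCN: there are no learned weights or nonlinearities, and the toy molecule (the heavy atoms C-C-O with one-hot features [is_carbon, is_oxygen]) is hypothetical.

```python
def aggregate_neighbors(features, bonds):
    """New feature of each atom = its own feature + the sum over bonded atoms."""
    neighbors = {atom: [] for atom in features}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    return {
        atom: [
            own + sum(features[n][k] for n in neighbors[atom])
            for k, own in enumerate(vector)
        ]
        for atom, vector in features.items()
    }


# Toy molecular graph: atoms 0 and 1 are carbons, atom 2 is an oxygen.
atom_features = {0: [1, 0], 1: [1, 0], 2: [0, 1]}
bonds = [(0, 1), (1, 2)]

updated = aggregate_neighbors(atom_features, bonds)
```

After one step each atom "knows" about its direct neighbours; stacking several such steps spreads information across larger substructures.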

In much research, GCNs have been employed as 3D descriptors instead of SMILES strings, and these learned descriptors have been found to outperform standard descriptors in prediction tasks and to be easier to interpret (Shin et al. 2019; Ozturk et al. 2018; Yu et al. 2019). Chemi-net employed GCN models to represent molecules and compared the performance of single-task and multi-task DNNs on its own QSAR datasets (Liu et al. 2019a). Yang et al. (2019) introduced a more advanced model, the directed message passing neural network (D-MPNN), which uses a directed message-passing paradigm. They tested their approach on 19 publicly available and 16 privately held datasets and found that the D-MPNN models outperformed the previous models in most cases. On two datasets, they underperformed and were not as robust as typical 3D descriptors when the sample was small or imbalanced. The D-MPNN model was then employed by another research group to correctly predict an antibiotic named halicin, which demonstrated bactericidal effects in mouse models (Stokes et al. 2020). This was the first case in which DL methods led to the discovery of an antibiotic by exploring a large-scale chemical space that current experimental methodologies cannot afford to cover. The application of attention-based graph neural networks is another interesting contemporary method (Sun et al. 2020a). Because a molecule's graph representation can be altered by edge properties, edge weights and node features can be learned together. Accordingly, Shang et al. proposed a multi-relational GCN with edge attention (Shang et al. 2018). For each edge, they created a dictionary of attention weights. Because it is shared throughout the molecule, the approach can handle a wide range of input sizes.

On the Tox21 and HIV benchmark datasets, they found that this model performed better than the random forest model. As a result, the model can effectively learn pre-aligned features from the inherent properties of the molecular graph. Withnall et al. (2020) extended the MPNN model to the attention MPNN (AMPNN), in which the message passing step uses an attention mechanism to compute a weighted summation. Moreover, they extended the D-MPNN model with the same attention mechanism as the AMPNN and termed the result the edge memory neural network (EMNN). Although it is computationally more intensive than other models, this model fared better than the others on the maximum unbiased validation (MUV) benchmark, which has uniformly missing information.
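The weighted summation used in attention-based message passing can be sketched as follows. This is a generic softmax-attention aggregation written under stated assumptions and is not the AMPNN or EMNN implementation: in a real model the scores come from a learned function of the atom and bond states, whereas here they are supplied directly as hypothetical numbers.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_message(neighbor_states, scores):
    """Weighted sum of neighbour states, with weights given by softmax(scores)."""
    weights = softmax(scores)
    dim = len(neighbor_states[0])
    return [
        sum(w * state[k] for w, state in zip(weights, neighbor_states))
        for k in range(dim)
    ]


# Two neighbouring atoms with hypothetical hidden states.
states = [[1.0, 0.0], [3.0, 2.0]]

uniform = attention_message(states, [0.0, 0.0])    # equal attention
focused = attention_message(states, [100.0, 0.0])  # nearly all on neighbour 0
```

Equal scores reduce the message to a plain average, while a dominant score lets one neighbour determine the message almost entirely, which is also what makes the learned weights inspectable.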

4.1.2 Structure (graph)-based models

Unlike the drug-based models, the structure-based models in Fig. 10b take both protein target and drug information as input. Typical molecular docking simulation methods aim to predict geometrically feasible binding between drugs and proteins with known tertiary structures. Atom sequences and amino acid residue sequences can be used to express the drug and the target, respectively. Sequence-based descriptors were selected because DL approaches can be applied to them right away with minimal preprocessing of the input data.

DeepDTA, proposed by Ozturk et al. (2018), outperformed conventional ML approaches such as KronRLS (Nascimento et al. 2016) and SimBoost (Tong et al. 2017) by applying a CNN model solely to sequence information, namely SMILES strings and amino acid sequences. The Davis kinase binding affinity dataset (Davis et al. 2011) and the KIBA dataset (Sun et al. 2020a) were used in that study. Wen et al. used common and basic features, such as ECFPs and protein sequence composition descriptors, and trained on them using semi-supervised learning via a deep belief network (Wen et al. 2017). Another study, DeepConv-DTI, built a deep CNN model using only an RDKit Morgan fingerprint and protein sequences (Lee et al. 2019). They also used the pooled convolution results to capture local residue patterns of target protein sequences, resulting in high values for critical protein regions such as actual binding sites.

The scoring function, which ranks protein–drug interactions using 3D structures and is parameterized from training data to predict the binding affinity values or binding pocket sites of the target proteins, is a key component of structure-based regression models. AtomNet incorporated the 3D structural characteristics of protein–drug complexes into CNNs (Wallach et al. 2015). They placed fixed-size 3D grids (i.e., voxels) over protein–drug complexes, with every cell in the grid representing structural properties at that position. Since then, several researchers have examined deep CNN models that use voxels to predict binding pocket location or binding affinity (Wang et al. 2020b; Ashburner et al. 2000; Zhao et al. 2019). Compared with common docking approaches such as AutoDock Vina (Trott and Olson 2010) or Smina (Koes et al. 2013), these models have shown enhanced performance. This is because CNN models remain trainable even with large input sizes and are resilient to noise in the input data.
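The voxel representation described above can be sketched by binning atoms into a fixed-size 3D grid. This sketch only counts atoms per element type in each cell and is not AtomNet's actual featurization; the coordinates, grid size, and resolution are hypothetical.

```python
def voxelize(atoms, origin, grid_size, resolution):
    """Count atoms per (x, y, z, element) cell of a cubic grid.

    atoms: iterable of (x, y, z, element) tuples in the same length unit
    as resolution. Atoms that fall outside the grid are ignored.
    Returns a sparse grid as a dict {(i, j, k, element): count}.
    """
    grid = {}
    for x, y, z, element in atoms:
        index = tuple(
            int((coord - start) // resolution)
            for coord, start in zip((x, y, z), origin)
        )
        if all(0 <= i < grid_size for i in index):
            key = index + (element,)
            grid[key] = grid.get(key, 0) + 1
    return grid


# Hypothetical complex: two carbons share a cell, one atom lies outside the box.
atoms = [
    (0.5, 0.5, 0.5, "C"),
    (0.7, 0.2, 0.9, "C"),
    (1.5, 0.5, 0.5, "N"),
    (25.0, 0.0, 0.0, "O"),
]
grid = voxelize(atoms, origin=(0.0, 0.0, 0.0), grid_size=20, resolution=1.0)
```

A 3D CNN would consume the dense version of this grid, with one channel per element type, in the same way a 2D CNN consumes image pixels.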

Many DTI investigations using GCNs in structure-based approaches have been reported (Feng et al. 2018; Liu et al. 2016). Feng et al. (2018) used both ECFPs and GCNs as drug features. On the Davis (Davis et al. 2011), Metz (Metz et al. 2011), and KIBA (Tang et al. 2014) benchmark datasets, their methods outperformed prior models such as KronRLS (Nascimento et al. 2016) and SimBoost (Tong et al. 2017). However, they acknowledged that their GCN model could not beat their ECFP model because of time and resource constraints in implementing the GCN. In a different DTI study, Torng et al. employed an unsupervised graph model to learn fixed-size representations of protein binding sites (Torng and Altman 2019). The protein pocket GCN was then trained starting from the pre-trained model, whereas the drug GCN model was trained on automatically generated features. They concluded that, without relying on target–drug complexes, their model effectively captured protein–drug binding interactions.

Because models that implement the attention mechanism have key properties that make them interpretable, attention-based DTI prediction approaches have emerged (Hirohara et al. 2018; Liu et al. 2016; Perozzi et al. 2014).

Gao et al. (2017) employed compressed vectors with LSTM RNNs for protein sequences and a GCN for drug structures. They concentrated on demonstrating their method's capacity to deliver biological insights into DTI predictions. To do so, two-way attention mechanisms were employed to calculate the binding of drug–target pairs (DTPs), allowing flexible interpretation of high-level data from target proteins, such as GO terms. Shin et al. (2019) introduced the Molecule Transformer DTI (MT-DTI) approach, which uses the self-attention mechanism for drug representations. The MT-DTI model was pre-trained on 97 million chemical compounds from PubChem and then fine-tuned and evaluated on two publicly available benchmark datasets, Davis (Davis et al. 2011) and KIBA (Tang et al. 2014). However, the attention mechanism was not used to represent the protein targets because processing the target sequence would take too long. In addition, pre-training was not possible because of a lack of target information.

On the other hand, AttentionDTA, presented by Zhao et al., incorporates an attention mechanism into a CNN model to establish weighted connections between drug and protein sequences (Zhao et al. 2019). They showed that these attention-based drug and protein representations perform well on the affinity prediction task with an MLP model. DeepDTIs used external, experimental DTPs to infer the probability of interaction for any given DTP. Four of the top ten predicted DTIs had previously been identified, and one was discovered to have a poor glucocorticoid receptor binding affinity (Huang et al. 2018). DeepCPI was used to predict drug–target interactions, and its predicted small-molecule interactions with the glucagon-like peptide 1 receptor, the glucagon receptor, and the vasoactive intestinal peptide receptor have been tested in experiments (Wan et al. 2019).

4.1.3 Drug–protein(disease)-based models

According to polypharmacology, most drugs act on multiple targets, both primary and secondary. These effects depend on the biological networks involved and on the drug's dose. As a result, the drug–protein(disease)-based models shown in Fig. 10c are particularly useful for evaluating protein promiscuity or drug selectivity (Cortes-Ciriano et al. 2015). Furthermore, multi-task neural networks are well suited to learning the properties of many types of data simultaneously (Camacho et al. 2018). Several DL applications, such as those built on drug-induced gene-expression patterns and DTI-related heterogeneous networks, exploit relational information from distinct views. A network-based strategy employs heterogeneous networks that include a variety of node and edge types (Luo et al. 2017; David et al. 2019). A significant property of these models is that neighboring nodes in the network are locally similar. For example, in a similarity network with drugs as nodes and drug–drug similarity values as edge weights, DTIs can be predicted from the nodes' connections and topological features. Machine learning techniques that use heterogeneous networks as prediction frameworks include support vector machines (Bleakley and Yamanishi 2009; Keum and Nam 2017), the regularized least squares (RLS) model (Liu et al. 2016; Xia et al. 2010; Hao et al. 2016), and the random walk with restart model (Lian et al. 2021; Nascimento et al. 2016). Owing to the growing interest in DL technologies, network-based DTI prediction studies have adopted DL to improve association prediction by comparing the topological structures of bipartite and tripartite (drug, target, and disease) linked networks (Hassan-Harrirou et al. 2020; Lamb et al. 2006; Korkmaz 2020; Townshend et al. 2012; Vazquez et al. 2020). Zong et al. (2017) used the DeepWalk approach to capture local latent information and compute topology-based similarity in tripartite networks, and demonstrated the method's promise for drug repurposing.
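As an illustration of the random walk with restart model mentioned above, the following minimal sketch scores every node of a small network by how often a walker that keeps returning to a seed drug visits it. The four-node network, its edge weights, and the restart probability of 0.3 are hypothetical values chosen for the example; they are not taken from any of the cited studies.

```python
def random_walk_with_restart(adjacency, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probabilities of a walker that
    jumps back to `seed` with probability `restart` at every step."""
    n = len(adjacency)
    # Column-normalise the weights so each column holds transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    current = start[:]
    for _ in range(max_iter):
        updated = [
            (1.0 - restart) * sum(transition[i][j] * current[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        # Stop once the probabilities no longer change (L1 distance below tol).
        if sum(abs(a - b) for a, b in zip(updated, current)) < tol:
            return updated
        current = updated
    return current


# Hypothetical toy network: nodes 0 and 1 are drugs, nodes 2 and 3 are targets.
# Weights mix drug-drug similarity, target-target similarity, and known DTIs.
NETWORK = [
    [0.0, 0.8, 1.0, 0.0],
    [0.8, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.5],
    [0.0, 1.0, 0.5, 0.0],
]
scores = random_walk_with_restart(NETWORK, seed=0)
```

With drug 0 as the seed, target 2 (a known interaction) receives a higher score than target 3, which is reachable only indirectly. In a prediction setting, the scores of targets not yet linked to the seed drug would be used to rank candidate interactions.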

Some network-based DTI prediction studies used relationship-based features obtained by training an AE. Zhao et al. (2020) developed a DTI-CNN prediction model that takes low-dimensional but information-rich features learned from a heterogeneous network with the stacked AE technique. To construct the topological similarity matrices of drugs and targets, Wang et al. (2020b) used a deep AE together with positive pointwise mutual information. In another study, Peng et al. (2020) employed a denoising autoencoder to select network-based features and reduce the dimensionality of the representation.

A denoising autoencoder is trained to reconstruct clean data from high-dimensional input that is noisy and incomplete, which allows the encoder to learn more robust representations. These approaches, however, have a drawback: it is difficult to make predictions for new drugs or targets. In recommendation systems this is known as the "cold start" problem (Bedi et al. 2015). The size and shape of the network have a large impact on these models, so if the network is not large enough, they cannot cover drugs or targets that are absent from it (Lamb et al. 2006).

Various studies have also used gene expression patterns as chemogenomic features to predict DTIs. These studies assume that drugs with similar expression patterns have similar effects on the same targets (Hizukuri et al. 2015; Sawada et al. 2018).

The LINCS-L1000 database, the revised version of CMAP, has been integrated into DL-based DTI models in recent works (Subramanian et al. 2017; Thafar et al. 2020; Karpov et al. 2020; Arus-Pous et al. 2020). Xie et al. (2018) developed a binary classification model using a deep neural network trained on the LINCS drug perturbation and gene knockout data.

Lee and Kim, on the other hand, used expression signature genes as a source of drug and target features. They used node2vec to learn from this rich data by examining three aspects of protein function, including pathway-level memberships and PPI (Lee and Kim 2019). In DTIGCCN, Shao and Zhang employed a GCN model to extract drug and target features from LINCS data and a CNN model to extract latent features and predict DTIs (Shao et al. 2020). The Gaussian kernel function was found to help produce high-quality graphs, and as a result this hybrid model performed better on classification tasks.

DeepDTnet uses a heterogeneous drug–gene–disease network that embeds fifteen types of chemical, genomic, phenotypic, and cellular network profiles to identify targets of known drugs. DeepDTnet predicted, and experiments confirmed, that topotecan is a new direct inhibitor of the human retinoic acid receptor-related orphan receptor (Zeng et al. 2020).

4.2 Drug sensitivity and response prediction using DL

Drug response is the clinical outcome of treatment with the drug of interest (https://www.sciencedirect.com/topics/drug-response). The main idea of drug response prediction is shown in Fig. 11: the DL method takes the heterogeneous network of drug and protein interactions as input and predicts the response scores. Despite the widespread use of deep neural network (DNN) approaches in various domains and sectors, including related fields such as computational chemistry (Gómez-Bombarelli et al. 2018), DNNs have only lately made their way into drug response prediction. This is due to the typically low ratio of samples to measurements per sample, which makes traditional feedforward neural networks unsuitable: overparameterization, overfitting, and poor generalization are common outcomes on such datasets. However, more public data has become available recently, and newly built DNN models have shown promise. This section therefore summarizes current DL computational problems and breakthroughs in drug response prediction.

figure 11

Drug binding with proteins and drug sensitivity (response) scores prediction

Neural networks have been used to predict drug response since the 1990s. El-Deredy et al. (1997) showed that data from tumor nuclear magnetic resonance (NMR) spectra can be used to train a neural network that predicts drug response in gliomas and offers information on the metabolic pathways involved in drug response.

In 2018, Chang et al. (2018) created the CDRscan model, which uses a CNN architecture trained on 1000 drug response experiments per molecule. Their model performed much better than traditional ML algorithms such as RF and SVM. One reason CDRscan outperformed these baseline models is its ability to combine genomic data and molecular fingerprints. Furthermore, its convolutional design has proven useful in various machine learning areas. An autoencoder is a neural network that compresses its input and then attempts to reconstruct the original data from the compressed form. As shown by Way and Greene (2018), this is very useful for feature extraction: they condensed a gene expression profile with 5000 dimensions to at most 100 dimensions, some of which corresponded to significant characteristics such as the patient's sex or melanoma status. Using variational autoencoders, Dincer et al. (2018) created DeepProfile, a technique that learns an eight-dimensional representation of gene expression in AML patients. This representation is then fed to a Lasso linear model for treatment response prediction, with better results than the same model without feature extraction.

Ding et al. ( 2018 ) proposed a deep autoencoder model for representation learning of cancer cells from input data consisting of gene expression, CNV, and somatic mutations.

In 2019, MOLI (Multi-omics Late Integration) (Sharifi-Noghabi et al. 2019) was introduced as a deep learning model that incorporates multi-omics data and somatic mutations to characterize a cell line. Three separate subnetworks of MOLI learn representations for each type of omics data. A final network classifies a cell as a responder or non-responder based on the concatenated features. These methods share two characteristics: they integrate multiple types of input data (multi-omics), and they treat drug response as a binary classification problem. Although combining several forms of omics data can improve the learning of cell line status, it may limit the method's applicability to other cell lines or patients because the model requires extra data beyond gene expression.

Furthermore, a threshold on the IC50 values must be set before the drug response can be binarized for classification, and the appropriate threshold may vary with the experimental conditions, such as the drug or tumor type. Twin CNN for drugs in SMILES format (TCNNS) (Liu et al. 2019b) takes a one-hot encoded representation of drugs and feature vectors of cell lines as the inputs to two encoding subnetworks, each a one-dimensional (1D) CNN. The drugs in TCNNS are one-hot encoded from their Simplified Molecular Input Line Entry System (SMILES) strings, which describe a drug compound's chemical structure. Binary feature vectors of cell lines represent 735 mutation states or CNVs of a cell. KekuleScope (Cortés-Ciriano and Bender 2019) adopts transfer learning, starting from a CNN pre-trained on ImageNet data. The pre-trained CNN is then trained further on images of drug compounds represented as Kekulé structures to predict the drug response.
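To make the one-hot SMILES input described above concrete, the sketch below encodes a SMILES string as a fixed-size binary matrix. It is a simplified illustration, not the TCNNS preprocessing itself: it tokenizes character by character (real SMILES contains multi-character atoms such as Cl and Br), and the alphabet and the maximum length of 30 are assumptions made for the example.

```python
def one_hot_smiles(smiles, alphabet, max_len):
    """Encode a SMILES string as a max_len x len(alphabet) one-hot matrix.

    Positions beyond the end of the string are left as all-zero padding rows.
    """
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    matrix = [[0] * len(alphabet) for _ in range(max_len)]
    for position, symbol in enumerate(smiles[:max_len]):
        matrix[position][index[symbol]] = 1
    return matrix


# SMILES string of aspirin; the alphabet is built from this single string
# purely for illustration (a real pipeline would use a fixed, shared alphabet).
ASPIRIN = "CC(=O)OC1=CC=CC=C1C(=O)O"
ALPHABET = sorted(set(ASPIRIN))
encoded = one_hot_smiles(ASPIRIN, ALPHABET, max_len=30)
```

Each row of `encoded` marks which symbol occurs at that position, so a 1D CNN can slide its filters along the sequence dimension.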

Yuan et al. (2019) proposed GNNDR, a GNN-based technique with a high learning capacity that predicts drug response by combining protein–protein interaction (PPI) information with genomic characteristics. The value of including protein information has been demonstrated empirically, and the method offers a viable avenue for the discovery of anti-cancer drugs. Rampášek et al. (2019) examined semi-supervised variational autoencoders for the prediction of monotherapy response. In contrast to many conventional ML methodologies, they developed a drug response model that takes advantage of gene expression before and after treatment in cell lines, and they demonstrated improved performance on a variety of FDA-approved drugs. Chiu et al. (2019) trained a deep drug response predictor after pre-training autoencoders on mutation data and expression features from the TCGA dataset. The use of pre-training distinguishes their strategy from others. Compared with using only the labeled data, that is, gene expression profiles obtained from drug response assays, pre-training makes it possible to use unlabeled data from outside sources such as TCGA, which significantly increases the number of available samples and improves performance.

Chiu et al. (2019) and Li et al. (2019) combined autoencoders with deep neural networks to predict drug response in genomically characterized cell lines and tumors. To predict the responses of cell lines to drug combinations, the work cited as https://string-db.org/cgi/download.pl?sessionId=uKr0odAK9hPs used deep neural encoders to link genetic characteristics with drug profiles.

In 2020, Wei et al. (2020) predicted drug risk levels based on adverse drug reactions (ADRs), using SMOTE and machine learning techniques. The proposed framework was used to investigate the mechanism of ADRs, to estimate degrees of drug risk, and to support decision-making when a drug is switched from prescription to over-the-counter status. They showed that the best combination built with this framework was PRR-SMOTE-RF and that its macro-ROC curve indicated strong classification performance. They suggested that drug regulatory organizations, including the FDA and CFDA, could use this framework as a simple but dependable method for ADR signal detection and drug classification, and as an auxiliary basis for experts deciding whether to change the status of Rx drugs to OTC drugs. They propose that more ML or DL classification algorithms be tested in the future and that computational complexity be factored into the comparison. Kuenzi et al. (2020) built DrugCell, an interpretable DL model of human cancer cells trained on the responses of 1235 tumor cell lines to 684 drugs. In DrugCell, tumor genotypes induce states in cellular subsystems, and these states are combined with drug structure to predict the response to therapy while also learning the biological mechanisms underlying the response. DrugCell predictions are accurate in cell lines and also stratify clinical outcomes. Analysis of DrugCell mechanisms led to the design of synergistic drug combinations, which the authors tested using combinatorial CRISPR, in vitro drug–drug screening, and patient-derived xenografts. DrugCell provides a blueprint for building interpretable models for predictive medicine.

Graph Neural Networks (GNNs) are Artificial Neural Networks (ANNs) that operate on graphs as inputs. Deep GNNs were recently employed to learn low-dimensional representations of biomolecular networks (Hamilton 2020; Wu et al. 2020). Ahmed et al. (2020) used two separate GNN methods to build models from gene expression (GE) data and a gene co-expression network, that is, a network that describes the relationships between the expression levels of gene pairs.
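The way a GNN operates on a graph can be sketched with a single neighborhood aggregation step, shown below. This is a deliberately stripped-down illustration: each node's feature vector is replaced by the mean of its own vector and its neighbors' vectors, whereas an actual GNN layer would also apply learned weights and a nonlinearity. The three-gene co-expression graph and its feature values are hypothetical.

```python
def mean_aggregate(features, neighbours):
    """One message-passing step: average each node's vector with its neighbours'."""
    updated = {}
    for node, vector in features.items():
        group = [vector] + [features[other] for other in neighbours.get(node, [])]
        # zip(*group) walks over the feature dimensions across all vectors.
        updated[node] = [sum(values) / len(group) for values in zip(*group)]
    return updated


# Hypothetical co-expression graph with edges GENE_A - GENE_B and GENE_B - GENE_C.
FEATURES = {"GENE_A": [1.0, 0.0], "GENE_B": [0.0, 1.0], "GENE_C": [1.0, 1.0]}
NEIGHBOURS = {"GENE_A": ["GENE_B"], "GENE_B": ["GENE_A", "GENE_C"], "GENE_C": ["GENE_B"]}
smoothed = mean_aggregate(FEATURES, NEIGHBOURS)
```

Stacking several such steps lets information travel across longer paths in the graph, which is how GNN-based models use network topology in addition to per-node measurements.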

The CNN is one of the neural network models adopted for drug response prediction. The CNN has been widely used for image, video, text, and sound data because of its strong ability to preserve the local structure of data and to learn hierarchies of features. By 2021, several methods had been developed for drug response prediction, each of which uses different input data for prediction (Baptista et al. 2021).

Nguyen et al. (2021) proposed a drug response prediction method called GraphDRP, which integrates two subnetworks for drug and cell line features, like the CNN in Liu et al. (2019b). Using gene expression data from cancer cell lines together with drug response data, Qiu et al. (2021) identified predictor genes for drugs of interest and built a reliable and accurate drug response prediction model. After pre-selecting genes using the Pearson correlation coefficient, they employed the ElasticNet regression model to predict drug response and fine-tune gene selection. To obtain a more trustworthy set of predictor genes, they ran the regression on each drug twice, once using the IC50 and once using the area under the curve (AUC, or activity area). The Pearson correlation coefficient for each of the 12 drugs they examined was greater than 0.6. For 17-AAG, which had the highest values, the Pearson correlation coefficient was 0.811 with IC50 and 0.81 with AUC.

Even though the model developed in this study has excellent predictive performance on GDSC, it still has certain limitations. First, the properties of cancer cell lines may differ significantly from those of in vivo tumors, and it remains to be determined whether the model will be useful in a clinical setting. Second, the model primarily uses gene expression data to predict drug response. Although drug response is influenced by gene expression levels, it is also influenced by structural changes such as gene mutations. More research is needed to integrate such data into the model and improve its predictive capacity.
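The gene pre-selection step described above, in which genes are filtered by their Pearson correlation with drug response before regression, can be sketched as follows. The expression values, the response values, and the cut-off of 0.6 are hypothetical and serve only to illustrate the filtering; the selected genes would then be passed to a regression model such as ElasticNet, which is not shown here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0.0 or spread_y == 0.0:
        return 0.0  # a constant vector carries no correlation information
    return covariance / (spread_x * spread_y)


def preselect_genes(expression, response, threshold):
    """Keep genes whose absolute correlation with the response meets the threshold."""
    return [
        gene
        for gene, values in expression.items()
        if abs(pearson(values, response)) >= threshold
    ]


# Hypothetical data: one response value (e.g. log IC50) per cell line,
# and one expression value per gene and cell line.
RESPONSE = [1.0, 2.0, 3.0, 4.0]
EXPRESSION = {
    "GENE_A": [2.0, 4.0, 6.0, 8.0],    # rises with the response
    "GENE_B": [4.0, 3.0, 2.0, 1.0],    # falls with the response
    "GENE_C": [1.0, -1.0, -1.0, 1.0],  # unrelated to the response
}
selected = preselect_genes(EXPRESSION, RESPONSE, threshold=0.6)
```

Filtering on the absolute value keeps both positively and negatively correlated genes, since either direction can be informative for prediction.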

In 2022, Ren et al. (2022) proposed a deep learning-based graph regularized matrix factorization model (DeepGRMF), which integrates neural networks, graph models, and matrix-factorization approaches to predict cell response to drugs. It uses a variety of information, including the chemical structures of drugs, their effects on cellular signaling mechanisms, and the states of cancer cells. DeepGRMF learns drug embeddings so that drugs with similar structures and mechanisms of action (MOAs) lie close together in the embedding space. It likewise learns embeddings for cells, so that cells with similar biological states and drug responses are close together. On the Cancer Cell Line Encyclopedia (CCLE) and Genomics of Drug Sensitivity in Cancer (GDSC) datasets, DeepGRMF outperforms competing models in prediction performance. On the Cancer Genome Atlas (TCGA) dataset, the model could predict the effectiveness of a treatment plan on lung cancer patients' outcomes. The limited expressiveness of the authors' VAE-based chemical structure representation may explain why predictions for new cell lines are more accurate than drug sensitivity predictions for new drugs. Graph neural networks have recently been shown to represent chemical structures better and could be investigated in the future. Pouryahya et al. (2022) proposed a new network-based clustering approach for predicting drug response based on optimal mass transport (OMT) theory. Gene-expression profiles and cheminformatic drug characteristics were represented as data networks and used to cluster cell lines and drugs. A RF model was then trained for each pair of cell-line and drug clusters. Such cluster-based prediction models, trained on more homogeneous data, are expected to improve both the accuracy of drug sensitivity prediction and biological interpretability.

4.3 Drug–drug interactions (DDIs) side effect prediction using DL

Drugs are chemical compounds that people consume and that interact with protein targets to produce a change. Drugs may alter the human body positively or negatively. Drug side effects are the undesirable changes that drugs cause in the human body. These adverse effects range from moderate headaches to life-threatening reactions such as cardiac arrest, malignancy, and death. They differ depending on the person's age, gender, stage of illness, and other factors (Kuijper et al. 2019). In the laboratory, several tests are conducted on drugs to determine whether they have any adverse side effects. However, these tests are both expensive and time-consuming. Recently, many computational algorithms for detecting adverse drug effects have been created, and computational methodologies are replacing laboratory experiments.

On the other hand, these methods do not provide adequate data to predict drug–drug interactions (DDIs). The phenomenon of DDIs is illustrated in Fig. 12. A drug's overall effect on the human body comprises the desired effects that result from its interaction with the intended target and the undesirable effects that arise from its interactions with off-targets. Even though a drug has a strong binding affinity for one target, it also binds to several other proteins with varied affinities, which may cause adverse consequences (Liu et al. 2021). Predicting DDIs can help reduce the likelihood of adverse reactions and optimize the drug development and post-market monitoring processes (Arshed et al. 2022). Side effects of DDIs are often regarded as the leading cause of drug failure in pharmacological development. When drugs have major side effects, they are quickly withdrawn from the market. As a result, predicting side effects is a fundamental requirement in the drug discovery process, both to keep drug development costs and timelines in check and to launch drugs that benefit patients' recovery.

figure 12

Drug binding with proteins and DDI side effects

Furthermore, the average drug research and development cost is $2.6 billion (Liu et al. 2019). Determining the likelihood of adverse effects is therefore important for lowering the expense and risk of drug development, and researchers use various computational tools to speed up the process. In pharmacology and clinical practice, DDI prediction is a difficult problem, and correctly detecting possible DDIs in clinical studies is crucial for patients and the public. Researchers have recently achieved a series of successes using deep learning to predict DDIs from drug structural properties and graph theory (Han et al. 2022). AI has successfully detected potential drug interactions, allowing doctors to make informed decisions before prescribing drug combinations to patients with complex or multiple conditions (Fokoue et al. 2016).

Therefore, this section comprehensively reviews the DL algorithms that researchers most commonly use to predict DDIs.

In 2016, Fokoue et al. (2017) proposed Tiresias, a framework for discovering DDIs. Tiresias takes a large amount of drug-related data as input and generates DDI predictions. The approach begins by semantically integrating the input data into a knowledge graph that represents drug properties and interactions together with additional entities such as enzymes, chemical structures, and pathways. Numerous similarity measures between all drugs are then computed from the knowledge graph in a scalable and distributed setting. Finally, a large-scale logistic regression model uses the computed similarity measures to predict DDIs. According to the findings, Tiresias helps identify new interactions both among existing drugs and between newly developed and existing drugs. A drawback of Tiresias is its need for large-scale drug data, which makes the model costly to build.

In 2017, Reza et al. (2017) developed a computational technique for predicting DDIs based on functional similarities among drugs. The model was built on several major biological elements: carriers, enzymes, transporters, and targets (CETT). The approach was applied to 2189 approved drugs, for which the associated CETTs were obtained and encoded as binary vectors used to find DDIs. A total of 2,394,767 potential drug–drug interactions were assessed, and over 250,000 previously unknown possible DDIs were identified. Among the several similarity measures used, inner product-based similarity measures (IPSMs) gave good predictive values for detecting DDIs. A key flaw of this strategy was the lack of pharmacological data, which led to errors when detecting all potential DDI pairs.
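The idea of comparing drugs through binary CETT vectors can be illustrated with a plain inner product, as in the sketch below. This is an assumption-laden simplification: the exact inner product-based measures used in the cited study are not specified here, and the drug names, the CETT feature list, and the vectors are hypothetical.

```python
from itertools import combinations


def inner_product_similarity(u, v):
    """Number of shared CETT elements when u and v are binary vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical CETT features (columns) and binary drug profiles (rows).
CETT_FEATURES = ["carrier_1", "enzyme_1", "enzyme_2", "transporter_1", "target_1"]
DRUG_PROFILES = {
    "drug_x": [1, 1, 0, 1, 0],
    "drug_y": [1, 0, 0, 1, 0],
    "drug_z": [0, 0, 1, 0, 1],
}

# Score every unordered drug pair and rank the pairs by similarity.
pair_scores = {
    (a, b): inner_product_similarity(DRUG_PROFILES[a], DRUG_PROFILES[b])
    for a, b in combinations(sorted(DRUG_PROFILES), 2)
}
ranked_pairs = sorted(pair_scores, key=pair_scores.get, reverse=True)
```

Pairs that share many carriers, enzymes, transporters, or targets receive high scores and are treated as candidate interactions. In practice, the raw count would usually be normalized so that drugs with many annotations do not dominate the ranking.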

In 2018, Ryu et al. (2018) proposed a model that predicts more DDI types using the drugs' chemical structures as inputs and applied multi-task learning to DDI type prediction. In the same vein, Decagon (Zitnik et al. 2018) models polypharmacy side effects using a relational GNN. To learn representations of intricate nonlinear drug interactions, Chu et al. (2018) used an auto-encoder for factorization. To predict DDIs, Liu et al. (2019c) presented DDI-MDAE, a multimodal deep auto-encoder based on shared latent representations. Recently, interest in employing graph neural networks (GNNs) to predict DDIs has increased. Different algorithms for aggregating the feature vectors of a node's neighbors lead to different versions of GNNs. Asada et al. (2018) used a graph convolutional network (GCN) to encode molecular structures when extracting DDIs from text. Furthermore, Ma et al. (2018) incorporated attentive multi-view graph auto-encoders into a unified model.

Chen (2018) devised a model for predicting adverse drug reactions (ADRs). The predictive model used SVM, LR, RF, and GBT. It was built on the DEMO dataset, which contains properties such as the patient's age, weight, and sex, and the DRUG dataset, which includes features such as the drug's name, role, and dosage. Males make up 46% of the sample and females 54%. The model had fair prediction accuracy for a representative sample set. However, the results also showed that the model is accurate only when a large amount of data is available.

To predict possible DDIs, Kastrin et al. (2018) employed statistical learning approaches. The DDIs were represented as a complex network, with nodes representing drugs and links representing their potential interactions. Link prediction on DDI networks was framed as a binary classification task. A large DDI database was sampled at random for prediction. Several supervised and unsupervised ML approaches, such as SVM, classification trees, boosting, and RF, were applied for edge prediction in various DDI networks. Compared with the unsupervised techniques, the supervised link prediction strategy gave encouraging results. To detect the links between drugs, the proposed method requires Unified Medical Language System (UMLS) filtering, which posed a difficulty for the researchers. Furthermore, the system considers only fixed network snapshots, which is problematic because the DDI network is dynamic.
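Link prediction on a DDI network, as described above, can be illustrated with a simple unsupervised score. The sketch below ranks drug pairs that are not yet connected by the Jaccard index of their neighbor sets; this is one common unsupervised baseline and is not claimed to be the specific method of the cited study. The five-drug network is hypothetical.

```python
from itertools import combinations


def jaccard_link_scores(edges):
    """Score every non-adjacent node pair by the Jaccard index of its neighbour sets."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    scores = {}
    for a, b in combinations(sorted(neighbours), 2):
        if b in neighbours[a]:
            continue  # the interaction is already known
        shared = neighbours[a] & neighbours[b]
        combined = neighbours[a] | neighbours[b]
        scores[(a, b)] = len(shared) / len(combined)
    return scores


# Hypothetical known interactions between five drugs.
KNOWN_DDIS = [
    ("drug_a", "drug_b"), ("drug_a", "drug_c"), ("drug_b", "drug_c"),
    ("drug_b", "drug_d"), ("drug_c", "drug_d"), ("drug_d", "drug_e"),
]
link_scores = jaccard_link_scores(KNOWN_DDIS)
best_candidate = max(link_scores, key=link_scores.get)
```

In a supervised setting, scores like this one would become input features for a classifier (for example SVM or RF) that is trained to separate known interactions from non-interactions.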

In 2019, Lee et al. (2019) proposed a deep learning system for accurately predicting the effects of DDIs. To learn the pharmacological effects of a variety of DDIs, the method combines several auto-encoders with a deep feed-forward neural network, which are fine-tuned jointly using well-known techniques. The results showed that adding GSP and TSP improves prediction accuracy compared with using SSP alone, and that the autoencoder is more effective than PCA at reducing the dimensionality of the profile features. In addition, the model outperformed existing approaches and identified numerous novel DDIs supported by current research. Yue et al. (2020) combined several graph embedding methods for the DDI task, while Karim et al. (2019) modeled DDI prediction as link prediction with the help of a knowledge graph. A co-attention architecture was also proposed (Andreea and Huang 2019), with a deep learning model based solely on side-effect data and molecular drug structure. CASTER (Huang et al. 2020), also based on drug chemical structures, is a dictionary learning framework for predicting DDIs. Chu et al. (2019) proposed using semi-supervised learning to extract meaningful information for DDI prediction from both labeled and unlabeled drug data. Shtar et al. (2019) used a mix of computational techniques to predict drug interactions, including artificial neural networks and graph node factor propagation methods such as adjacency matrix factorization (AMF) and adjacency matrix factorization with propagation (AMFP). The model was trained on a DrugBank release containing 1142 drugs and 45,297 drug–drug interactions, and tested on the most recent DrugBank version, with 1442 drugs and 248,146 drug–drug interactions. AMF and AMFP were also used to build an ensemble-based classifier, and the outcomes were assessed using the receiver operating characteristic (ROC) curve. The findings showed that the proposed ensemble classifier provides information that is useful for drug development and for drug prescription, even when the data are noisy. In addition, the drug embeddings learned while training the models on interaction networks have been made available. To predict adverse drug events caused by DDIs, Hou et al. (2019) proposed a deep neural network model. The model is based on a database of 5000 drug codes obtained from DrugBank and, using the computed features, identifies 80 different types of DDIs. The model was built with TensorFlow-GPU and takes 4432 drug characteristics as input.
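Several of the studies above are evaluated with the ROC curve. As a brief illustration, the area under the ROC curve (AUC) can be computed directly from its probabilistic interpretation: the chance that a randomly chosen true interaction is scored higher than a randomly chosen non-interaction. The labels and scores below are hypothetical and do not come from any cited experiment.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions: 1 marks a known DDI, 0 a non-interacting pair.
LABELS = [1, 1, 0, 1, 0, 0]
SCORES = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
auc = roc_auc(LABELS, SCORES)
```

An AUC of 1.0 means every true interaction is ranked above every non-interaction, and 0.5 corresponds to random ranking. The pairwise form shown here is quadratic in the number of examples, so larger evaluations normally use a sort-based computation instead.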

The trained model can predict the interactions of drugs for inflammatory bowel disease (IBD) with an accuracy of 88 percent. The findings also showed that the model performs best when many datasets are used. Wang et al. (2019) proposed a DNN model for detecting adverse drug effects. The model predicts ADRs using chemical, biological, and biomedical knowledge of drugs, and also incorporates drug data from the SIDER databases. The system's performance was improved by using distributed representations of the target drugs in a vector space, obtained with a word-embedding approach, to capture associations between drugs. The fundamental flaw of the system was that it worked well only with the standard SIDER databases.

In 2020, numerous AI-based methods were developed for DDI event prediction, including methods that evaluate chemical structural similarity using graph neural networks (Huang et al. 2020). Attempts have also been made to predict DDIs from different data sources, for example by using similarity features to build drug features for DDI event prediction (Deng et al. 2020).

Bai et al. (2020) proposed a deep learning technique that performs the DDI extraction task with the help of word embeddings, part-of-speech tags, and distance embeddings, and that supports the drug development cycle and drug repurposing. According to experimental data, the technique can better avoid instance misclassifications with minimal pre-processing. Moreover, the model employs an attention mechanism to emphasize the significance of each hidden state in the Bi-LSTM layers.
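The role of attention over hidden states, as mentioned above, can be sketched with a small pooling function: each hidden state receives a weight, the weights sum to one, and the sentence representation is the weighted sum of the states. Dot-product scoring against a fixed query vector is an assumption made for this example; in a trained model the scoring parameters are learned, and the exact formulation used by the cited work is not reproduced here.

```python
import math


def attention_pool(hidden_states, query):
    """Return softmax attention weights and the weighted sum of the hidden states."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    peak = max(scores)  # subtract the maximum for numerical stability
    exponentials = [math.exp(score - peak) for score in scores]
    total = sum(exponentials)
    weights = [value / total for value in exponentials]
    size = len(hidden_states[0])
    context = [
        sum(weight * state[d] for weight, state in zip(weights, hidden_states))
        for d in range(size)
    ]
    return weights, context


# Hypothetical hidden states for a three-token sentence (two dimensions each)
# and a hypothetical query vector standing in for learned attention parameters.
HIDDEN_STATES = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.2]]
QUERY = [1.0, 1.0]
weights, context = attention_pool(HIDDEN_STATES, QUERY)
```

Here the second token dominates the pooled representation because its hidden state aligns most strongly with the query. Inspecting the weights is also what makes attention-based extraction models partly interpretable.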

Feng et al. (2020) proposed DPDDI, which consists of a feature extractor based on a graph convolutional network (GCN) and a predictor based on a DNN. DPDDI is an effective and robust approach for predicting potential DDIs that uses data from the DDI network without considering drug properties (i.e., the chemical and biological properties of drugs). It is a useful tool for predicting DDIs and should also be helpful in other DDI-related scenarios, such as recognizing unanticipated side effects and guiding drug combinations. The disadvantage of this approach is that it ignores drug properties.

Zaikis and Vlahavas (2020) proposed a multi-level GNN framework for predicting links between biological entities. They built a bi-level network in which the higher level represents the network of interactions among biological entities and the lower level represents the individual biological entities, such as drugs and proteins. However, the accuracy of the proposed model needs to be improved.

In 2021, to address DDI prediction, Lin et al. (2021) proposed an end-to-end system called Knowledge Graph Neural Network (KGNN). KGNN extends spatial GNN algorithms to the knowledge graph by selectively aggregating neighborhood information, which allows it to learn the knowledge graph's topological structure, its semantic relations, and the neighborhoods of drugs and drug-related entities. When multiple drugs are used correctly, medical risks are reduced and the benefits of drug synergy are maximized. For multi-typed DDI pharmacological effect prediction, Yue et al. (2021) used knowledge graph summarization. Lyu et al. (2021) introduced a Multimodal Deep Neural Network (MDNN) framework for DDI event prediction. By applying a graph neural network to the drug knowledge graph, MDNN makes effective use of topological information and semantic relations. MDNN also learns a joint representation of structural information and heterogeneous features, which exploits the complementarity between the different modalities of the multimodal data. Karim et al. (2019) built a knowledge graph and used CNN and LSTM models to extract local and global drug features across the network. DANN-DDI is a deep attention neural network framework proposed by Liu et al. (2021). To predict unknown DDIs, it selectively integrates different drug features. Chun and Yi-Ping Phoebe (2021) developed a hybrid deep learning (DL) model that provides descriptive predictions of adverse drug reactions. It was one of the first interpretable hybrid DL models with inception modules. The model combines a graph CNN (GCNN) with inception modules, which improves the learning efficiency of chemical drug features, and bidirectional long short-term memory (BiLSTM) recurrent neural networks, which link drug structure to adverse effects. After the outputs of the two networks (GCNN and BiLSTM) are concatenated, a fully connected network predicts the adverse drug reactions. Regardless of the classification threshold, the model obtains an AUC of 0.846, and it has a precision score of 0.925. Even though a small drug data set was used for adverse drug reaction (ADR) prediction, the Bilingual Evaluation Understudy (BLEU) scores were 0.973, 0.938, 0.927, and 0.318, indicating considerable achievements. Furthermore, the model can correctly form words that describe adverse drug reactions and link them to the drug's name and molecular structure. The predicted relationship between drug structure and ADRs can guide safety pharmacology research at the preclinical stage and make ADR detection easier early in the drug development process. It can also aid in the detection of unknown ADRs in existing drugs. Mohsen and Hossein () proposed a deep neural network model for extracting DDIs from medical literature. This model employs a novel attention mechanism that separates essential words from other terms based on word similarity and on position relative to the candidate drugs. Before recognizing the type of DDI, the method computes attention weights over the outputs of a bi-directional long short-term memory (Bi-LSTM) model in the deep network architecture. The approach was tested on the standard DDI Extraction 2013 dataset. According to the experimental findings, it achieved an F1-Score of 78.30, which is comparable to the best results reported for existing approaches.

In 2022, Pietro et al. (2022) introduced DruGNN, a GNN-based technique for predicting DDI side effects. The prediction is a multi-class, multi-label node classification problem in which each DDI corresponds to a class. To predict the side effects of novel drugs, they use a mixed inductive-transductive learning scheme that takes advantage of drug and gene features (the inductive path) and knowledge of known drug side effects (the transductive path). The procedure is adaptable because the machine learning framework can be reused if the graph dataset is enlarged to include more node features and associations. Zhang et al. (2022) proposed CNN-DDI, a new semi-supervised algorithm for predicting DDIs that uses a CNN architecture. They first extracted interaction features from drug categories, targets, pathways, and enzymes as feature vectors. They then proposed a novel convolutional neural network as a predictor of DDI-related events based on this feature representation. The predictor consists of five convolutional layers, two fully connected layers, and a softmax layer. The results show that CNN-DDI outperforms other state-of-the-art techniques, although it takes longer to run. Jing et al. (2022) presented DTSyn, a dual-transformer-based approach for identifying promising cancer drug combinations. It uses a multi-head attention mechanism to extract chemical substructure-gene, chemical-chemical, and chemical-cell-line associations. DTSyn is the first model that incorporates two transformer blocks to capture associations among genes, drugs, and cell lines, allowing a better understanding of drug mechanisms of action. Despite DTSyn's strong performance, its balanced accuracy on independent data sets is still limited. Collecting more training data is expected to mitigate the problem. 
Another issue is that the fine-granularity transformer was only trained on 978 signature genes, which could result in some chemical-target interactions being lost.

Furthermore, DTSyn used expression data as the only cell line features. To represent the cell line more fully, additional omics data, such as methylation and genetic data, could be added in the future. He et al. (2022) proposed MFFGNN, a new end-to-end learning framework for DDI prediction that can effectively combine information from drug molecular graphs, SMILES sequences, and DDI graphs. The MFFGNN model uses a molecular graph feature extraction module to extract global and local features from molecular graphs.

They ran thorough experiments on a variety of real-world datasets. According to the findings, the MFFGNN model consistently outperforms other state-of-the-art models. Furthermore, the multi-type feature fusion module uses a gating mechanism to limit the amount of neighborhood information passed to each node.

4.4 Drug–drug similarity prediction using DL

Drug similarity studies presume that medications with comparable pharmacological properties have similar mechanisms of action and side effects and are used to treat similar conditions (Brown 2017; Zeng et al. 2019).

Pharmacological drug similarity is critical for various purposes, including identifying drug targets, predicting side effects, predicting drug–drug interactions, and repositioning drugs. Features of the chemical structure (Lu et al. 2017; O’Boyle 2016), protein targets (Vilar 2016; Wang et al. 2014), side-effect profiles (Campillos et al. 2008; Tatonetti et al. 2012), and gene expression profiles (Iorio et al. 2010) provide a multi-perspective view for predicting similar drugs; they can compensate for gaps in individual data sources and offer fresh perspectives on drug repositioning and other uses. The main idea of drug–drug similarity is presented in Fig. 13. Each vector represents the features of a drug, and the links reflect the similarity between two drugs.

Fig. 13 Drug–drug similarity main idea

4.4.1 Drug similarity measures

Similarity estimates are calculated from chemical structure, target protein sequence, target protein function, and drug-induced pathways.

4.4.1.1 The similarity in chemical structure

DrugBank (2019) provides the chemical structures of small-molecule drugs in SDF molecular format. Invalid SDFs, such as those with an NA value or fewer than three columns in the atom or bond blocks, can be detected and removed. For valid compounds, atom pair descriptors can be computed. The pairwise similarity of compounds, \(\delta_c(d_i, d_j)\), is then evaluated on the atom pairs using the Tanimoto coefficient, defined as the number of atom pairs shared by the two compounds divided by the number of atom pairs in their union (Eq. 1):

\[ \delta_c(d_i, d_j) = \frac{\left| AP_i \cap AP_j \right|}{\left| AP_i \cup AP_j \right|} \tag{1} \]

where \(AP_i\) and \(AP_j\) are the atom pairs of drugs \(d_i\) and \(d_j\), respectively. The numerator is the number of atom pairs common to both compounds, while the denominator is the total number of distinct atom pairs in the two compounds.
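The Tanimoto coefficient on atom pairs can be illustrated with a minimal sketch. It assumes that each drug's atom-pair descriptors have already been computed and are available as a set of hashable items; the descriptor strings below are invented for illustration and are not real DrugBank output.

```python
def tanimoto(ap_i, ap_j):
    """Tanimoto coefficient between two collections of atom-pair descriptors."""
    a, b = set(ap_i), set(ap_j)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty descriptor sets score 0.
        return 0.0
    return len(a & b) / len(union)


# Toy atom-pair descriptors (hypothetical labels, for illustration only).
drug_i = {"C-C:1", "C-N:2", "C-O:3", "N-O:4"}
drug_j = {"C-C:1", "C-N:2", "C-O:2"}

# 2 shared descriptors out of 5 distinct descriptors in the union.
print(tanimoto(drug_i, drug_j))
```

A drug compared with itself scores 1.0, and two drugs with no atom pair in common score 0.0.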

4.4.1.2 Target protein sequence-based similarity

DrugBank provides the target sequences of all small-molecule drugs in FASTA format. The basic Needleman and Wunsch (1970) dynamic programming approach for global alignment can be used to compare protein sequences pairwise. The percentage of pairwise sequence identity (Raghava 2006) can be taken as the corresponding sequence similarity. Equation 2 was used to calculate drug–drug similarity based on target sequence similarities:

\[ \delta_t(d_i, d_j) = \frac{\sum_{x \in T_i} \max_{y \in T_j} S(x, y) + \sum_{y \in T_j} \max_{x \in T_i} S(x, y)}{\left| T_i \right| + \left| T_j \right|} \tag{2} \]

where \(\delta_t(d_i, d_j)\) denotes the target-based similarity between drugs \(d_i\) and \(d_j\), \(T_i\) is the set of proteins targeted by drug \(d_i\), \(T_j\) is the set of proteins targeted by drug \(d_j\), and \(S(x, y)\) is a symmetric sequence-based similarity measure between two target proteins \(x \in T_i\) and \(y \in T_j\). Overall, Eq. 2 calculates the average of the best matches, wherein each target of the first drug is matched only to the most similar target of the second drug, and vice versa.
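The best-match average described for Eq. 2 can be sketched as follows. This is an illustrative reading of the text, not the original implementation: the function `best_match_average`, the protein labels, and the lookup table of sequence identities are all hypothetical, and in practice `sim` would be the percentage identity from a global alignment.

```python
def best_match_average(targets_i, targets_j, sim):
    """Average of best matches between two target sets under similarity `sim`."""
    if not targets_i or not targets_j:
        return 0.0  # convention for this sketch: no targets, no similarity
    # Each target of drug i is matched to its most similar target of drug j ...
    best_i = sum(max(sim(x, y) for y in targets_j) for x in targets_i)
    # ... and each target of drug j to its most similar target of drug i.
    best_j = sum(max(sim(x, y) for x in targets_i) for y in targets_j)
    return (best_i + best_j) / (len(targets_i) + len(targets_j))


# Hypothetical pairwise sequence identities between target proteins.
IDENTITY = {
    ("P1", "P3"): 0.9, ("P1", "P4"): 0.2,
    ("P2", "P3"): 0.4, ("P2", "P4"): 0.6,
}


def toy_sim(x, y):
    """Symmetric lookup into the toy identity table."""
    if x == y:
        return 1.0
    return IDENTITY.get((x, y), IDENTITY.get((y, x), 0.0))


targets_i = ["P1", "P2"]
targets_j = ["P3", "P4"]
# Best matches: P1-P3 (0.9), P2-P4 (0.6), P3-P1 (0.9), P4-P2 (0.6).
print(best_match_average(targets_i, targets_j, toy_sim))
```

Because every target contributes only its best match, a drug with many weakly related targets is not penalized for the poor pairings.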

4.4.1.3 Target protein functional similarity

Protein targets that are enriched for similar biological functions and have similar sequences imply shared pharmacological mechanisms and downstream effects (Passi et al. 2018). To capture this, each protein is associated with a set of Gene Ontology (GO) terms from all three categories, namely cellular components (CC), molecular functions (MF), and biological processes (BP). GO terms that were either very specialized (with 15 linked genes) or very general (with 100 genes) were filtered out. DrugBank (2019) provided the Human Protein–Protein Interaction (PPI) network. Wang et al. (2007) proposed leveraging the topology of the GO graph structure to determine the semantic similarity of GO terms; this measure was used to determine how functionally similar two drugs are, i.e., δ f (d i , d j ). Using a best-match average technique, the pairwise semantic similarities between the GO terms associated with d i and d j were aggregated into a single semantic similarity measure and entered into a final similarity matrix.

4.4.1.4 Drug-induced pathway similarity

A drug pair that triggers similar or overlapping pathways indicates that the drugs' mechanisms of action are similar, which is useful information for drug similarity and repositioning research (Zeng et al. 2015). KEGG (Kanehisa and Goto 2000) was used to find the pathways activated by each small-molecule drug. The pairwise similarity of any two pathways was calculated with the Dice similarity coefficient, based on the overlap of their constituent genes. After that, a pathway-based similarity score for each drug pair \(d_i\) and \(d_j\), i.e., \(\delta_p(d_i, d_j)\), was calculated using Eq. 3:

\[ \delta_p(d_i, d_j) = \max_{x \in P_i,\; y \in P_j} DSC(x, y) \tag{3} \]

where \(P_i\) and \(P_j\) are the sets of pathways induced by drugs \(d_i\) and \(d_j\), respectively; \(x\) and \(y\) are two pathways, each represented by the set of its constituent genes; and \(DSC(x, y) = 2\left| x \cap y \right| / \left( \left| x \right| + \left| y \right| \right)\) is the Dice similarity coefficient, which measures how much the two pathways overlap. When no gene is shared by any two pathways induced by the drug pair being compared, the similarity is set to 0.0. Overall, Eq. 3 implies that if two drugs induce one or more identical pathways, the maximum pathway-based similarity is achieved.
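The pathway-based score can be sketched in a few lines. The sketch assumes that Eq. 3 takes the maximum Dice coefficient over all pathway pairs, which is how the description above reads (one identical pathway is enough to reach the maximum score); the gene sets are toy examples, not curated pathways.

```python
def dice(x, y):
    """Dice similarity coefficient between two gene sets."""
    x, y = set(x), set(y)
    if not x and not y:
        return 0.0  # convention for this sketch
    return 2 * len(x & y) / (len(x) + len(y))


def pathway_similarity(pathways_i, pathways_j):
    """Best Dice overlap over all pathway pairs (assumed form of Eq. 3)."""
    return max(
        (dice(x, y) for x in pathways_i for y in pathways_j),
        default=0.0,  # no pathways at all: similarity is 0.0
    )


# Toy pathways, each given as the set of its constituent genes.
pathways_i = [{"TP53", "MDM2", "ATM"}, {"EGFR", "KRAS"}]
pathways_j = [{"TP53", "MDM2", "ATM", "CHEK2"}, {"BRAF"}]

# Best pair shares 3 genes: 2 * 3 / (3 + 4) = 6/7.
print(pathway_similarity(pathways_i, pathways_j))
```

When no genes are shared between any pair of pathways, every Dice score is zero and the function returns 0.0, matching the rule stated in the text.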

4.4.2 DL for drug similarity prediction

This section reviews the DL algorithms most widely used by researchers to predict drug similarity. Wang et al. (2019) introduced a gated recurrent unit (GRU) model that employs similarity to predict drug–disease interactions. In this approach, the Chemistry Development Kit (CDK) converted the SMILES strings into 2D chemical fingerprints, and the Jaccard score of the 2D chemical fingerprints was used to compare two drugs.

Hirohara et al. (2018) employed a CNN to learn molecular representations. In this approach, the molecule's SMILES notation is fed to the convolutional layers as input. The Tox21 dataset was used.

To conduct similarity analysis, Cheng et al. (2019) used the Anatomical Therapeutic Chemical (ATC) classification system and the ATC-code-based similarities of drug pairs. The authors created interaction networks, performed drug pair similarity analyses, and developed a network-based methodology for identifying clinically effective drug combinations for a specific condition.

Xin et al. (2016) presented a Ranking-based k-Nearest Neighbour (Re-KNN) technique for drug repositioning. The method's key feature is that it combines the Ranking SVM (Support Vector Machine) algorithm with the traditional KNN algorithm. They computed four types of similarity: chemical structural similarity, target-based similarity, side-effect similarity, and topological similarity. The Tanimoto score was then used to determine the similarity between two profiles.

Seo et al. (2020) proposed an approach that combined drug–drug interactions from DrugBank, network-based drug–drug interactions, single nucleotide polymorphisms, and the anatomical hierarchy of side effects, as well as indications, targets, and chemical structures.

Zeng et al. (2019) developed a measure of clinical drug–drug similarity derived from clinical data, using EHRs to analyse and establish drug–diagnosis associations. They created associations between drugs and diagnoses in an EMR dataset using the Bonferroni-adjusted hypergeometric P value. The distances between drugs were assessed using the Jaccard similarity coefficient, and a k-means algorithm was used to form drug clusters.
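The two statistical steps named above, a Bonferroni-adjusted hypergeometric test and a Jaccard-based distance, can be sketched with the standard library alone. The patient counts, diagnosis sets, and function names are hypothetical, the one-sided (upper-tail) test is an assumption about how the association is scored, and the clustering step is omitted.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when n patients are drawn from N, of whom K have the diagnosis."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def bonferroni(p_value, n_tests):
    """Bonferroni adjustment: multiply by the number of tests, cap at 1."""
    return min(1.0, p_value * n_tests)


def jaccard_distance(a, b):
    """1 minus the Jaccard similarity of two sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


# Toy cohort: 10 patients, 4 with the diagnosis, 3 on the drug, all 3 diagnosed.
p = hypergeom_upper_tail(k=3, N=10, K=4, n=3)
print(p, bonferroni(p, n_tests=2))

# Drugs are then compared through the diagnoses they are associated with.
drug_a = {"hypertension", "heart failure", "diabetes"}
drug_b = {"hypertension", "heart failure"}
print(jaccard_distance(drug_a, drug_b))
```

The resulting pairwise distances could then be passed to a clustering routine to group drugs.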

Dai et al. (2020) reviewed and summarized representative methods and discussed applications of patient similarity. The authors discussed the value and applications of patient similarity networks. They also discussed ways to measure the similarity or distance between each pair of patients and classified these methods as unsupervised, supervised, or semi-supervised.

Yan et al. (2019) created BiRWDDA, a new computational method for drug repositioning that combines bi-random walk with multiple similarity measures to uncover potential associations between diseases and drugs. First, multiple drug–drug and disease–disease similarities are computed. The information entropy of each similarity is then evaluated to select the appropriate similarities for drugs and diseases. Four drug–drug similarity measures and three disease–disease similarity measures were calculated from drug- and disease-related characteristics to create a heterogeneous network. The four drug–drug similarities are based, respectively, on the drug's protein sequence information, drug interactions extracted from DrugBank (compared using the Jaccard score), the chemical structure (derived from canonical SMILES in DrugBank), and the side effects.

Yi et al. (2021) constructed a deep gated recurrent unit model to predict potential drug–disease interactions using comprehensive similarity measures and a Gaussian interaction profile kernel. A similarity measure based on chemical fingerprints is used to derive distinguishing features of drugs. Meanwhile, the Gaussian interaction profile kernel is used to derive effective disease features from established disease–disease relationships. A deep gated recurrent unit model is then built to predict potential drug–disease interactions. The experimental results showed that the proposed algorithm could be used to predict novel drug indications or disease treatments and to speed up drug repositioning and related drug research and discovery.

To forecast DDIs, Yan et al. ( 2022 ) suggested a semi-supervised learning technique (DDI-IS-SL). DDI-IS-SL uses the cosine similarity method to calculate drug feature similarity by combining chemical, biological, and phenotypic data. Drug chemical structures, drug–target interactions, drug enzymes, drug transporters, drug routes, drug indications, drug side effects, harmful effects of drug discontinuation, and DDIs that have been identified are all included in the integrated drug information.

Heba et al. (2021) used DrugBank to develop a similarity-based machine learning framework called "SMDIP" (Similarity-based ML for Drug Interaction Prediction). They calculated drug–drug similarity with the Russell–Rao measure on the biological and structural data currently available in DrugBank in order to represent the sparse feature space. DDI classification is carried out using logistic regression, with an emphasis on finding the main similarity predictors. Six machine learning models are then applied to the key DDI features (NB: naive Bayes; LR: logistic regression; KNN: k-nearest neighbours; ANN: neural network; RFC: random forest classifier; SVM: support vector machine).
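For readers unfamiliar with the Russell–Rao measure, a minimal sketch on binary feature vectors follows. It uses the standard definition (positions where both vectors are 1, divided by the vector length); the feature vectors are invented and do not reproduce the SMDIP feature encoding.

```python
def russell_rao(u, v):
    """Russell-Rao similarity: co-present features divided by vector length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    if not u:
        return 0.0
    both_present = sum(1 for a, b in zip(u, v) if a and b)
    return both_present / len(u)


# Toy binary feature vectors (e.g., presence/absence of targets or substructures).
drug_u = [1, 1, 0, 0, 1]
drug_v = [1, 0, 0, 1, 1]

# Features 0 and 4 are present in both drugs: 2 / 5.
print(russell_rao(drug_u, drug_v))
```

Unlike the Jaccard or Tanimoto coefficient, the denominator counts every feature, so shared absences lower the score and a sparse vector does not score 1.0 even against itself.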

For large-scale DDI prediction, Vilar et al. (2014) provided a protocol combining five types of drug similarity fingerprints: two-dimensional structural fingerprints, interaction profile fingerprints, target profile fingerprints, ADE profile fingerprints, and three-dimensional pharmacophoric approaches.

Song et al. (2022) combined similarity theory with a convolutional neural network to create global structural similarity features for drugs and proteins, using the Tanimoto coefficient, the Levenshtein distance, and a CNN. They also employed a transformer to extract local chemical substructure semantic features for drugs and proteins.
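The Levenshtein distance mentioned above is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The sketch below is a standard dynamic programming implementation applied to SMILES strings; the normalization into a similarity score is an assumption made for illustration and is not taken from the cited study.

```python
def levenshtein(s, t):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(s) < len(t):
        s, t = t, s  # keep the shorter string in the inner loop
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def levenshtein_similarity(s, t):
    """Distance rescaled to [0, 1]; this normalization is an assumption."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


ethanol, ethylamine, acetic_acid = "CCO", "CCN", "CC(=O)O"
print(levenshtein(ethanol, ethylamine))   # one substitution
print(levenshtein(ethanol, acetic_acid))  # four insertions
print(levenshtein_similarity(ethanol, ethylamine))
```

String edit distance treats SMILES purely as text, so two chemically similar molecules written with different atom orderings can still appear distant.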

5 Benchmark datasets and databases

Drug development and discovery draw on a range of direct and indirect data sources, which have regularly demonstrated strong predictive capability in finding confirmed repositioning candidates and in other applications of computer-aided drug design. This section reviews the most important available benchmark datasets and databases used in drug discovery, organized by the problem category for which researchers may need them. Thirty-five datasets are summarized in Table 3.

6 Evaluation metrics

Performance measures are required for evaluating machine learning models (Benedek et al. 2021). They serve as a tool for comparing different techniques and identifying the best one to deploy. This section describes the metrics defined for the four categories of drug discovery problems.

Table 4 shows the metrics employed in drug discovery problems. Understanding these metrics helps in assessing the effectiveness of different prediction systems. Taking side-effect prediction as an example, true positives (TP) are drug side effects that the model correctly identified, false positives (FP) are side effects that are not present but were predicted by the model, true negatives (TN) are side effects that are not present and that the model correctly did not predict, and false negatives (FN) are side effects that are present but that the model failed to predict.
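As a worked example, several widely used metrics can be derived from the four counts. The sketch below uses the standard textbook definitions of precision, recall, F1-score, and accuracy; it is not a transcription of Table 4, and the counts are invented.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics computed from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Toy counts for a side-effect predictor evaluated on 100 drug/side-effect pairs.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here precision is 40/50, recall is 40/45, and accuracy is 85/100, which shows how the same predictor can look different depending on the metric chosen.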

7 Drug dosing optimization

Drugs are vital to human health, and choosing the proper treatment and dose for the right patient is a constant challenge for clinicians. Even when taken as studied and prescribed, drugs have adverse effect profiles and varying response rates. As a result, all medications must be well managed, especially those used to treat critical illnesses or those with a narrow exposure window between efficacy and toxicity. Clinicians follow standard guidelines for the first dose, which is not always optimal or safe for every patient, especially if the drug has not been evaluated at different doses in different patient types. Precision dosing can transform health care by increasing benefits while reducing the risks of drug therapy. While precision dosing will probably have a significant impact for some drugs, it may not be necessary or practical to apply it to all drugs or therapeutic classes. Recognizing the characteristics that make drugs suitable precision-dosing targets will therefore help direct resources to where they will have the most impact. High-priority precision-dosing drugs and therapeutic classes could be crucial in improving health care performance, safety, and cost-effectiveness (Tyson et al. 2020).

Because of standard, fixed dosing procedures or gaps in knowledge, imprecise drug dosing in specific subpopulations increases the risk of adverse effects due to supratherapeutic or subtherapeutic concentrations (Watanabe et al. 2018). Currently, the Food and Drug Administration (FDA) only requires a drug to be statistically better than, or non-inferior to, placebo or the existing standard of treatment. This does not guarantee that the drug will benefit most patients in clinical trials, especially for malignancies that are difficult to treat, such as diffuse intrinsic pontine glioma (DIPG) and unresectable meningioma, where therapy response rates can be exceedingly low (Fleischhack et al. 2019).

Several aspects are essential for finding the most effective dose ( https://friendsofcancerresearch.org/wpcontent/uploads/Optimizing_Dosing_in_Oncology_Drug_Development.pdf ); they vary with the product, the target population, and the available data:

Therapeutic properties: Drug features such as small molecule vs. large molecule and agonist vs. antagonist impact how drugs interact with the body regarding safety and efficacy. The therapeutic characteristics impact the first doses used in dose-finding studies and the procedures used to determine which doses should be used in registrational trials.

Patient populations: Patient demographics vary depending on tumour type, stage of disease, and comorbidities. Understanding how these factors influence the drug's efficacy may justify adjusting the dose accordingly, especially in the context of expanded clinical trial populations.

Supplemental versus original approval: Differences in disease features and patient demographics between tumour types and treatment settings, such as monotherapy versus combination therapy, must be considered when assessing whether additional dose exploration is required for a supplemental application. In cases when more dose exploration is required, the research design can include previous exposure-response knowledge from the initial approval.

8 Drug discovery and XAI

The field of XAI addresses one of the most serious weaknesses of ML and DL algorithms: model interpretability and explainability. Understanding how and why a prediction is made becomes increasingly crucial as algorithms grow more sophisticated and more accurate. Without interpretability and explainability, it would be impossible to trust the predictions of real-world AI applications. Human-comprehensible explanations will increase system safety while encouraging trust and sustained acceptance of machine learning technologies (). XAI has been studied as a way to overcome the limitations that the black-box nature of AI technologies imposes. In contrast to black-box AI approaches such as DL, XAI can provide justifications for a model and its decisions (Zhang et al. 2022). XAI approaches have attracted attention (Lipton 2018; Murdoch et al. 2019) as a way to compensate for the lack of interpretability of some ML models and to aid human decision-making and reasoning (Goebel et al. 2018). The purpose of presenting relevant explanations alongside mathematical models is to help users understand them better by (1) making the decision-making process more transparent (Doshi-Velez and Kim 2017), (2) ensuring that correct predictions are not made for the wrong reasons (Lapuschkin et al. 2019), (3) avoiding unjust or unethical biases and discrimination (Miller 2019), and (4) closing the gap between ML and other scientific disciplines. Effective XAI can also help scientists navigate the scientific process (Goebel et al. 2018), enabling them to refine their understanding and beliefs about the process under investigation (Chander et al. 2018). This section provides an overview of recent XAI research in drug discovery.

XAI has a place in drug development. While the precise definition of XAI is still debated (Guidotti et al. 2018), the following characteristics of XAI are clearly beneficial in drug design applications (Lipton 2018):

Transparency: understanding how the system reached a specific result.

Justification: explaining why the model's response is appropriate.

Informativeness: providing new information to human decision-makers.

Uncertainty estimation: determining how reliable a prediction is.

The molecular explanation of pharmacological activity is already possible with XAI (Xu et al. 2017; Ciallella and Zhu 2019), as are applications in drug safety and organic synthesis planning (Dey et al. 2018). Over time, XAI will be important in processing and interpreting increasingly complex chemical data, as well as in generating new pharmaceutical hypotheses, all while preventing human bias (Boobier et al. 2017). Pressing drug discovery challenges such as the coronavirus pandemic may boost the development of application-specific XAI techniques that respond quickly to specific scientific questions about human biology and pathophysiology.

AI tools can increase their predictive performance by increasing model complexity. As a result, these models become opaque, with no clear understanding of how they operate. Because of this opacity, AI models are not widely used in critical sectors such as medical care. XAI therefore focuses on understanding what goes into an AI model's prediction in order to meet the demand for transparency in AI tools. AI model interpretability approaches can be categorized by the algorithms used, the scale of interpretation, and the type of data (Adadi and Mohammed 2018). With respect to the objectives of interpretability, approaches can be grouped into white-box model development, black-box model explanation, model fairness enhancement, and predictive sensitivity testing (Guidotti et al. 2018).

The gradient-based attribution technique (Simonyan et al. 2014) attributes a prediction to the network's input features. Because this strategy is commonly used to explain the predictions of DNN systems, it may be a suitable solution for various black-box DNN models in DDI prediction (Quan et al. 2016; Sun et al. 2018). In addition, DeepLIFT is a popular technique applied on top of DNN models that has been shown to outperform gradient-based techniques (Shrikumar et al. 2017). By contrast, Guided Backpropagation can be applied to the design of network architectures (Springenberg 2015): a convolutional layer with increased stride can be used instead of max pooling in a CNN to deal with loss of precision. This method could be employed in CNN-based DDI prediction, as shown in Zeng et al. (2015).
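The idea behind gradient-based attribution can be shown on a deliberately tiny model. The sketch below uses a single logistic unit as a stand-in for a DNN, so that the gradient of the output with respect to each input can be written by hand; the weights, bias, and input are invented, and a real model would obtain the same gradient through automatic differentiation.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def predict(x, w, b):
    """Toy model: probability of an interaction given input features x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def saliency(x, w, b):
    """Absolute gradient of the output with respect to each input feature."""
    p = predict(x, w, b)
    # For a logistic unit, d p / d x_i = p * (1 - p) * w_i.
    return [abs(p * (1.0 - p) * wi) for wi in w]


# Hypothetical weights and a toy input (e.g., three fingerprint bits).
weights, bias = [2.0, -0.5, 0.0], -1.0
features = [1.0, 1.0, 1.0]

scores = saliency(features, weights, bias)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(scores, ranking)
```

Features with larger scores are those to which the prediction is locally most sensitive; a score of zero means that small changes to that input do not move the output.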

Furthermore, Tao et al. (2016) applied rationale extraction to neural networks that process natural language. This method aims to extract small pieces of the input text as rationales. Its design comprises two parts, a generator and an encoder, which search for text subsets that are closely connected to the predicted outcome. Because NLP-based models are used to extract DDIs (Quan et al. 2016), the above methods should be examined for use in improving model interpretability.

Aside from that, XAI has produced methods for developing white-box models, including linear, decision tree, rule-based, and advanced but transparent models. However, these approaches receive less attention because of their weak predictive ability, particularly in NLP-based tasks such as DDI extraction. Several approaches to AI fairness have also been proposed. Nonetheless, only a small number of these studies looked at fairness for non-tabular data, such as the text-based data used when extracting DDIs. Many DDI studies used word embedding methods (Quan et al. 2016; Zhang 2020; Bolukbasi 2016). Efforts to ensure fairness in DDI research therefore deserve more consideration. To ensure the reliability of AI models, numerous methods also examine the sensitivity of the models. Zügner et al. (2018) used adversarial-example-based sensitivity analysis to explore graph-structured data. The technique modifies links between nodes or node attributes in order to attack node classification models. Because graph-based methods are frequently used in DDI research (Lin et al. 2021; Sun et al. 2020b), similar methods might be applied to DDI prediction models. For RNNs, word embedding perturbations (Miyato et al. 1605) are also worth considering. Notably, the input reduction strategy used by Feng et al. (2018) to expose hypersensitivity in NLP models could be applied to DDI extraction studies. In their DDI study, Schwarz et al. (2021) attempted to provide model interpretability using attention scores computed at all levels of the model. These scores are used to determine the importance of the similarity matrices to the drug representation vectors and to identify the drug features that contribute to a better encoding. 
This method makes use of data that travels through all tiers of the network.

Graph neural networks (GNNs) and their explainability are evolving rapidly in the field of graph data. GNNExplainer (Ying et al. 2019) uses mask optimization to learn soft masks for edges and node features in order to explain the predictions. The soft masks are initialized at random and treated as trainable variables. GNNExplainer then combines the masks with the original graph through element-wise multiplication. Next, the masks are optimized by maximizing the mutual information between the predictions for the original graph and the predictions for the newly obtained graph. Even though various regularization terms, such as element-wise entropy, encourage the optimized masks to be discrete, the resulting masks remain soft.

In addition, because the masks are optimized for each input graph separately, the explanations may lack a global view. PGExplainer (Luo et al. 2020) learns approximated discrete edge masks to explain the predictions. To predict edge masks, it trains a parameterized mask predictor. It starts by concatenating node embeddings to obtain the embedding of each edge in an input graph. The predictor then uses the edge embeddings to predict the probability of each edge being selected, which is regarded as an importance score. The reparameterization trick is then used to sample the approximated discrete masks. Finally, the mask predictor is trained by maximizing the mutual information between the original and new predictions. GraphMask (Schlichtkrull et al. 2010) is a post hoc method that explains the importance of edges in each GNN layer. Like PGExplainer, it trains a classifier to predict whether an edge can be dropped without affecting the original predictions. A binary concrete distribution (Louizos et al. 1712) and a reparameterization trick are used to approximate discrete masks. The classifier is trained by minimizing a divergence term, which measures the difference between network predictions, over the entire dataset. ZORRO (Thorben et al. 2021) employs discrete masks to identify important input nodes and node features. A greedy algorithm selects nodes or node features from an input graph step by step. At each step, ZORRO selects the node or node feature with the highest fidelity score. The objective function, the fidelity score, measures how closely the new predictions match the model's original predictions when the selected nodes/features are kept fixed and the remaining nodes/features are replaced with random noise values. Because no training process is involved, the non-differentiability of discrete masks is not a limitation.

Furthermore, ZORRO avoids the problem of "introduced evidence" by using hard masks. The greedy mask selection process, on the other hand, may lead to locally optimal explanations. In addition, because masks are generated for each graph separately, the explanations may lack a global understanding. Causal Screening (Xiang et al. 2021) investigates the causal attribution of different edges in the input graph. It identifies an edge mask for the explanatory subgraph. The essential idea of causal attribution is to examine how the prediction changes when an edge is added to the current explanatory subgraph, which is called the causal effect. At each step, it examines the causal effects of different edges and selects one to add to the subgraph. It selects edges using the individual causal effect (ICE), which measures the difference in mutual information after different edges are added to the subgraph.

Causal Screening, like ZORRO, is a greedy algorithm that generates discrete masks without any training. As a result, it does not suffer from the introduced evidence problem. However, it may lack a global understanding and become stuck in locally optimal explanations. SubgraphX (Yuan et al. 2102) investigates subgraph-level explanations for deep graph models. It uses the Monte Carlo Tree Search (MCTS) algorithm (Silver et al. 2017) to efficiently explore different subgraphs by pruning nodes, and it chooses the most important subgraph from the leaves of the search tree as the explanation for the prediction.

Furthermore, Shapley values can be used to define the objective function of the mask generation algorithm. Compared with previous perturbation-based approaches, the subgraphs it produces are more human-understandable and better suited to graph data. However, the computational cost is higher because the MCTS algorithm must explore different subgraphs.
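To make the role of Shapley values concrete, the sketch below computes them exactly for a three-node toy graph. It is only an illustration of the attribution idea: the value function `toy_model_score` is a hypothetical stand-in for a GNN's output on a node subset, and exact enumeration is feasible only for very small graphs, which is why practical explainers rely on approximations.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value of each player under the coalition value function."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            # Weight of a coalition of size r that does not contain p.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_model_score(nodes):
    """Hypothetical model output: 1.0 only when nodes 'a' and 'b' are both kept."""
    return 1.0 if {"a", "b"} <= set(nodes) else 0.0


phi = shapley_values(["a", "b", "c"], toy_model_score)
print(phi)
```

Nodes `a` and `b` share the credit equally and node `c` receives none; the values sum to the score of the full graph, which is the efficiency property that makes Shapley values attractive as an importance measure.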

9 Success stories about using DL in drug discovery

Big pharmaceutical companies have moved toward AI as DL methodologies have advanced, abandoning conventional approaches to maximize the benefit to patients and the company. AstraZeneca is a multinational, science-driven pharmaceutical company that has successfully used artificial intelligence in each stage of drug development, from virtual screening to clinical trials. By incorporating AI into medical science, it has been able to understand current diseases better, identify new targets, design higher-quality clinical trials, and speed up the entire process. AstraZeneca's success illustrates how combining AI with medical science can yield strong results. Its collaborations with other AI-based companies demonstrate its continual efforts to increase AI utilization. One such collaboration is with Ali Health, an Alibaba subsidiary that aims to provide AI-assisted screening and diagnosis systems in China (Nag et al. 2022).

The SARS-CoV-2 outbreak put many companies under pressure to develop effective medicines in the shortest possible time. These companies turned to AI, combined with the available data, to reach their goals. Below are some examples of firms that have identified viable strategies to combat COVID-19.

Deargen, a South Korean startup, developed MT-DTI (Molecule Transformer Drug Target Interaction), a DL-based drug–protein interaction prediction model. In this approach, the strength of the interaction between a drug and its target protein is predicted from simplified chemical sequences rather than from 2D or 3D molecular structures. The model predicted that atazanavir, an FDA-approved antiviral used to treat HIV, is highly likely to bind to and inhibit a critical protein of SARS-CoV-2, the virus that causes COVID-19. It also identified three more antivirals, as well as Remdesivir, which at the time of that study was not yet approved and was being studied in patients. Deargen's ability to identify antivirals using DL approaches is a significant step forward in pharmaceutical research, making it less time-consuming and more efficient. If such treatments are thoroughly evaluated, there is a good chance that they could help stop the epidemic in its tracks (Beck et al. 2020; Scudellari 2020).

Another example is Benevolent AI, a biotechnology company in London that leverages medical information, AI, and machine learning to speed up health-related research. The company has identified six medicines so far, one of which, Ruxolitinib, is reported to be in clinical trials for COVID-19 (Gatti et al. 2021). To find prospective medications that might impede the viral replication of SARS-CoV-2, the company has drawn on a large repository of medical information together with data extracted from the scientific literature by its AI and ML systems. The FDA permitted the use of Baricitinib, the drug the company proposed, in conjunction with Remdesivir, a combination that resulted in a higher recovery rate for hospitalized COVID-19 patients (Richardson et al. 2020).

Skin cancer is a very common form of cancer worldwide. As its incidence continues to rise, it is becoming increasingly crucial to diagnose it at an early stage; research demonstrates that early identification and therapy improve the survival rate of skin cancer patients. With the advancement of medical research and AI, several skin cancer smartphone applications have been introduced to the market, allowing people with worrisome lesions to determine whether they should seek medical care. According to studies, over 235 dermatology smartphone apps were developed between 2014 and 2017 (Flaten et al. 2020). Previously, such apps worked by sending a photograph of the lesion over the internet to a health care provider. Now, thanks to AI algorithms running on the smartphone itself, these applications can classify images of lesions as high or low risk, immediately assess the patient's risk, and offer advice. SkinVision (Carvalho et al. 2019) is an example of a successful application.

10 Future challenges

10.1 Digital twinning in drug discovery

The development and implementation of emerging Industry 4.0 technologies allow for the creation of digital twins (DTs), which promote the transformation of the industrial sector into a more agile and intelligent one. A DT is a digital representation of a real entity that interacts with the original through dynamic, two-way links. Today, DTs are used in a variety of industries. Although the pharmaceutical sector has come to accept digitization in order to embrace Industry 4.0, DT has yet to be comprehensively implemented in pharmaceutical manufacturing. It is therefore important to assess the pharmaceutical industry's progress in applying DT solutions (Chen et al. 2020).

New digital technologies are essential in today's competitive marketplaces to promote innovation, increase efficiency, and increase profitability (Legner et al. 2017 ). AI (Venkatasubramanian 2019 ), Internet of Things (IoT) devices (Venkatasubramanian 2019 ; Oztemel and Gursev 2018 ), and DTs have all piqued the interest of governments, agencies, academic institutions, and corporations (Bao et al. 2018 ). Industry 4.0 is a concept offered by a professional community to increase the level of automation to boost productivity and efficiency in the workplace.

This section provides a quick look at the evolution of DT and its application in pharmaceutical and biopharmaceutical production. We begin with an overview of the technology's principles and a brief history, then present various examples of DTs in pharmacology and drug discovery. We then discuss the significant technical and other issues that arise in these kinds of applications.

10.1.1 History and main concepts of digital twin

The idea of making a "twin" of a process or a product dates back to NASA's Apollo project in the late 1960s (Rosen et al. 2015; Mayani et al. 2018; Schleich et al. 2017), when NASA built two identical spacecraft. In this scenario, the "twin" was used to mirror its counterpart's behavior in real time.

According to Guo et al. (2018), the DT is a type of digital data structure that is generated as a separate entity and linked to the actual system. Michael Grieves first presented the DT concept in 2002 at the University of Michigan as part of an industry presentation on product lifecycle management (PLM) (Grieves 2014; Grieves and Vickers 2017; Stark et al. 2019). However, the first actual use of this notion, which gave rise to the current name, occurred in 2010, when NASA (the United States National Aeronautics and Space Administration) attempted to create virtual spacecraft simulators for testing (Glaessgen and Stargel 2012).

In theory, a DT is a digital reproduction or representation of a physical thing, process, or service. It is a computer simulation with unique features that dynamically connect the physical and digital worlds. The purpose of a DT is to model, evaluate, and improve a physical object in virtual space until it matches the predicted performance, at which point it can be created, or enhanced if already built, in the real world (Kamel et al. 2021; Marr 2017).

Since then, DT technology has gained popularity in both business and academia. The main components of a DT are shown in Fig. 14. The theoretical model comprises three parts: the real entity in the physical world, the digital entity in the virtual space, and the interconnection between them (Glaessgen and Stargel 2012).

Fig. 14 Main components of DT

In an ideal world, the digital component would have all the system's information that could be acquired from its physical counterpart (Kritzinger et al. 2018 ). When integrated with AI, IoT, and other recent intelligent systems, a DT can forecast how an object or process will perform.
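
The three-part model (real entity, digital entity, and the two-way connection between them) can be illustrated with a deliberately small sketch. Everything in it is hypothetical: `PhysicalUnit` stands in for a piece of equipment with made-up thermal dynamics, and `DigitalTwin` mirrors its state and sends back a heater setting using a simple proportional rule. It is meant only to show the direction of the data flows, not how an industrial DT is built.

```python
from dataclasses import dataclass


@dataclass
class PhysicalUnit:
    """Stand-in for real equipment (hypothetical toy dynamics)."""
    temperature: float
    heater_power: float = 0.0

    def step(self) -> None:
        # Heating raises the temperature; losses pull it toward 20 degrees.
        self.temperature += 0.5 * self.heater_power - 0.1 * (self.temperature - 20.0)

    def read_sensor(self) -> float:
        return self.temperature


@dataclass
class DigitalTwin:
    """Digital entity that mirrors the unit and recommends an action."""
    target: float
    mirrored_temperature: float = 0.0

    def sync(self, measurement: float) -> None:
        # Physical -> digital link: update the mirrored state.
        self.mirrored_temperature = measurement

    def recommend_power(self) -> float:
        # Digital -> physical link: proportional rule, clipped to [0, 10].
        raw = 2.0 * (self.target - self.mirrored_temperature)
        return max(0.0, min(10.0, raw))


unit = PhysicalUnit(temperature=20.0)
twin = DigitalTwin(target=37.0)
for _ in range(50):
    twin.sync(unit.read_sensor())               # data flows to the twin
    unit.heater_power = twin.recommend_power()  # decision flows back
    unit.step()

print(round(unit.temperature, 3), round(twin.mirrored_temperature, 3))
```

With this simple proportional rule the unit settles slightly below the target, which is a property of the toy controller rather than of digital twins in general.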

10.1.2 Digital twin in pharmaceutical manufacturing

Developing a drug is lengthy and costly, requires efforts in biology, chemistry, and manufacturing, and has a low success rate. An estimated 50,000 hits (trial versions of compounds that are subsequently modified to develop a medication) are evaluated to develop one successful drug. Only one in every 12 therapeutic compounds that enter clinical trials in humans makes it to market. Toxicity and lack of efficacy (a medication's capacity to provide a patient with relief and slow the progression of a disease) account for more than 60% of all drug failures (Subramanian 2020).

Making the appropriate decisions about which targets, hits, leads, and compounds to pursue is important to a drug's successful market introduction. However, these decisions are based on in vitro systems (experiments in a test tube or petri dish) and in vivo systems (experiments in animals), both of which correlate only weakly with clinical outcomes (Mak et al. 2014). An ideal decision support system for drug discovery would answer the following questions:

What is the magnitude of any target's influence on the desired clinical result?

Is the potential compound changing the target enough to change clinical outcomes?

Is the compound sufficiently selective and free of side effects or harmful consequences?

Is the ineffectiveness attributable to the drug's failure to reach its target?

Has the trial chosen the appropriate dose and dosing regimen?

Are there any surrogate markers or biomarkers (such as cholesterol, which serves as a proxy for the root cause of an illness) that can forecast a drug's success or failure?

Have the correct patients been chosen for the study?

Is it possible to identify hyper- and hypo-responders before the study begins?

Given the complexity of the drug development process reflected in the questions above, therapeutic failures are common and difficult to address. Addressing them requires combining data and observations from many stages of the drug development process and building a system that can forecast the outcome of an experiment or the effect of a chemical modification on a therapeutic molecule. This highlights the significance of DT in the field of drug discovery.

In the United States, funding organizations such as DARPA, NSF, and DOE have actively supported bioprocess modeling at the genomic and cellular levels, resulting in high-profile programs such as BioSPICE (Kumar and Feidler 2003). These efforts have shown that smaller models built to answer specific questions can greatly improve drug development efficiency. Such models make it possible to apply predictive methods at various stages of the drug discovery and research process, including target validation, lead optimization, candidate selection, biomarker identification, assay and screen development, and the improvement of clinical trials.

The pharmaceutical industry is embracing the overall digitization trend in tandem with the US FDA's ambition to establish an agile, adaptable pharmaceutical manufacturing sector that delivers high-quality pharmaceuticals without extensive regulatory oversight (O’Connor et al. 2016). Companies are beginning to adopt Industry 4.0 and DT principles and to use them for research and development (Barenji et al. 2019; Steinwandter et al. 2019; Lopes et al. 2019; Kumar et al. 2020; Reinhardt et al. 2020). Pharma 4.0 (Ierapetritou et al. 2016) is a digitalization initiative that combines Industry 4.0 with International Council for Harmonisation (ICH) guidelines to create an integrated operating model and production control strategy.

As shown in Fig. 15, the key requirements for achieving smart manufacturing with DT include real-time monitoring of the system by Process Analytical Technology (PAT), data collection from the equipment and from intermediate and final products, and global modeling and data analysis software (Barenji et al. 2019). To achieve this, the pharmaceutical industry has used Quality-by-Design (QbD) and Continuous Manufacturing (CM) (Boukouvala et al. 2012), flowsheet modeling (Kamble et al. 2013), and PAT implementations (James et al. 2006). Although some of these tools have been studied thoroughly, the full integration and development of DTs is still a work in progress.

Fig. 15 Main categories of smart manufacturing with DT

The pharmaceutical industry has used PAT in different programs across the steps of drug production (Nagy et al. 2013). Although this has increased the use of PAT instruments, their implementation has been limited largely to research and development rather than large-scale manufacturing (Papadakis et al. 2018). In the small number of cases where they have been used in manufacturing, they have succeeded in decreasing production costs and improving product quality monitoring (Simon et al. 2019). The development of various PAT approaches and their sound implementation is a vital component of a monitoring and control strategy (Boukouvala et al. 2012) and has provided a foundation for obtaining essential data from the physical component.

Papadakis et al. (2018) recently proposed a framework for identifying efficient reaction pathways for pharmaceutical manufacturing (Rantanen and Khinast 2015). It comprises modeling workflows for reaction pathway identification, reaction and separation analysis, process simulation, evaluation, optimization, and use (Sajjia et al. 2017).

Data-driven modeling methods require collecting and using data from a large number of experiments, and the resulting models depend entirely on the datasets provided. Artificial neural networks (ANN) (Pandey et al. 2006; Cao et al. 2018), multivariate statistical analysis, and Monte Carlo methods (Badr and Sugiyama 2020) are all commonly used in pharmaceutical manufacturing. These methods are less computationally costly, but their predictions outside the dataset space are frequently unsatisfactory because the trained models lack an understanding of the underlying physics. Using IoT devices in pharmaceutical manufacturing lines results in the collection of massive volumes of data. The virtual component must receive these process data and critical quality attributes (CQAs) quickly and effectively. In addition, several pharmaceutical process models need material properties for accurate prediction. A central database is therefore necessary to give the virtual component access to all datasets (Lin-Gibson and Srinivasan 2019).

10.1.3 Digital twin in biopharmaceutical manufacturing

Biopharmaceutical manufacturing focuses on the synthesis of large-molecule-based entities in various combinations that have applications in the treatment of inflammatory, microbial, and cancer conditions (Glaessgen and Stargel 2012; Narayanan et al. 2020). The demand for biologic-based medications has risen in recent years, requiring greater production efficiency and efficacy (Kamel et al. 2021). As a result, many companies are switching from batch to continuous production and implementing intelligent manufacturing systems (Lin-Gibson and Srinivasan 2019). A DT, which integrates the physical plant, data collection, data analysis, and system control, can aid in decision-making, risk analysis, product development, and process prediction (Tao et al. 2018).

The components and structures of biological products are closely tied to treatment effectiveness (Read et al. 2010) and are very sensitive to the cell line and operating conditions. A thorough virtual description of the actual plant in a simulation environment is required to apply DT in biopharmaceutical manufacturing (Tao et al. 2018). This means that each unit operation in the integrated model's simulation should accurately reflect the crucial process dynamics. Previous reviews (Narayanan et al. 2020; Tang et al. 2020; Farzan et al. 2017; Baumann and Hubbuch 2017; Smiatek et al. 2020; Olughu et al. 2019) focused on process modeling methodologies for both upstream and downstream operations.

Data from a biopharmaceutical monitoring system are typically heterogeneous in data types and time scales. A considerable amount of data is collected during biopharmaceutical manufacturing thanks to the deployment of real-time PAT sensors. As a result, data pre-processing is required to handle missing data, visualize data, and reduce dimensionality (Gangadharan et al. 2019). For batch biopharmaceutical production, Casola et al. (2019) presented data mining-based techniques for stemming, classifying, filtering, and clustering historical real-time data. Lee et al. (2012) combined different spectroscopic techniques and used data fusion to forecast the composition of raw materials.
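
As a small illustration of the pre-processing step, the sketch below aligns two sensor streams that are sampled at different rates and contain a missing value onto one common time grid by carrying the last valid reading forward. The function `forward_fill_on_grid`, the sensor names, and the readings are invented for illustration; forward filling is only one of several possible ways to treat missing data and is not taken from the cited studies.

```python
from typing import List, Optional, Tuple

# (time in minutes, value, or None when the reading is missing)
Reading = Tuple[float, Optional[float]]


def forward_fill_on_grid(
    readings: List[Reading], grid: List[float]
) -> List[Optional[float]]:
    """Resample one sensor stream onto an ascending time grid, carrying
    the most recent valid value forward over gaps and missing readings."""
    ordered = sorted(readings, key=lambda r: r[0])
    resampled: List[Optional[float]] = []
    last_valid: Optional[float] = None
    i = 0
    for t in grid:
        # Consume every reading taken at or before grid time t.
        while i < len(ordered) and ordered[i][0] <= t:
            if ordered[i][1] is not None:
                last_valid = ordered[i][1]
            i += 1
        resampled.append(last_valid)
    return resampled


# Hypothetical streams: pH every minute (one dropout), glucose sampled rarely.
ph = [(0.0, 7.0), (1.0, None), (2.0, 6.9), (3.0, 6.8)]
glucose = [(0.0, 5.0), (2.5, 4.2)]
grid = [0.0, 1.0, 2.0, 3.0]

aligned = list(zip(forward_fill_on_grid(ph, grid), forward_fill_on_grid(glucose, grid)))
print(aligned)
```

Once the streams share a grid, each row can be treated as one observation for later steps such as visualization or dimensionality reduction.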

10.2 AI-driven digital twins in today's pharmaceutical drug discovery

In the pharmaceutical industry, clinical studies pose challenges that make drug development incomplete, slow, uncertain, and potentially dangerous. First, clinical trials are not a true reflection of reality: only a small portion of a large and diverse population of billions of people is represented, so it is not possible to see how each individual will respond to a medicine. Second, the rigorous physical and mental health requirements of clinical trials sometimes lead to failure because of a lack of qualified participants; pharmaceutical firms struggle to recruit the precise number and kind of participants needed to comply with stringent trial designs. Third, in most trials some participants receive a placebo instead of the actual drug, because this makes it possible to compare how patients fare when they are not given the experimental medication; this implies that at least some trial participants do not receive the treatment.

These issues can be addressed by using digital twins, which can simulate a range of patient characteristics and so give a fair representation of how a medicine affects a larger population. AI-enabled digital twinning may shorten trial setup by revealing how a patient fares against the various inclusion and exclusion criteria, so that suitable patients can be identified rapidly. Because digital twins can predict a patient's response, placebos may no longer be required, and the new treatment can be given to every patient in the trial. Digital twins can also reduce the harmful impact of drugs in the early stages by decreasing the number of patients who need to be tested in the real world. Figure 16 illustrates a framework in which all possible combinations are run: every treatment protocol is tested on a digital twin of the patient in order to find an appropriate treatment protocol for that patient. Doing this quickly and accurately can provide the best-quality treatment without experimenting on the patient, which saves effort and cost and improves the accuracy with which an appropriate treatment protocol is determined.
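
The idea of testing every treatment protocol on a patient's digital twin can be sketched as an exhaustive search. The twin below is a made-up response model: `simulate_twin`, the patient parameters, the candidate doses, and the toxicity limit are all assumptions for illustration and have no clinical meaning. The point is only the structure of the loop: simulate each protocol, discard those predicted to be too toxic, and keep the most effective of the rest.

```python
from itertools import product
from typing import Dict, Optional, Sequence, Tuple


def simulate_twin(
    patient: Dict[str, float], dose_mg: float, doses_per_day: int
) -> Tuple[float, float]:
    """Hypothetical twin: predicted (efficacy, toxicity), both in [0, 1]."""
    exposure = dose_mg * doses_per_day / patient["clearance"]
    efficacy = exposure / (exposure + patient["ec50"])
    toxicity = min(1.0, (exposure / patient["tox_threshold"]) ** 2)
    return efficacy, toxicity


def best_protocol(
    patient: Dict[str, float],
    doses: Sequence[float],
    frequencies: Sequence[int],
    max_toxicity: float = 0.2,
) -> Optional[Tuple[float, float, int]]:
    """Try every (dose, frequency) pair on the twin and return the most
    effective one whose predicted toxicity stays within the limit."""
    best: Optional[Tuple[float, float, int]] = None
    for dose, freq in product(doses, frequencies):
        efficacy, toxicity = simulate_twin(patient, dose, freq)
        if toxicity > max_toxicity:
            continue  # predicted to be unsafe for this patient
        if best is None or efficacy > best[0]:
            best = (efficacy, dose, freq)
    return best


patient = {"clearance": 2.0, "ec50": 50.0, "tox_threshold": 200.0}
result = best_protocol(patient, doses=[50, 100, 200], frequencies=[1, 2, 3])
print(result)
```

Exhaustive enumeration is practical only for small protocol spaces; with many drugs, doses, and schedules the number of combinations grows quickly and a smarter search strategy would be needed.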

Fig. 16 AI-driven digital twins in today's pharmaceutical drug discovery

11 Open problems

This section discusses important issues in the progression from preclinical to clinical development and implementation in practice. These issues call for new ML solutions that support transparent, usable, and data-driven decision-making in order to accelerate drug discovery and reduce the number of failures in clinical development phases.

Complex disorders, such as viral infections and advanced malignancies, frequently require drug combinations (Julkunen et al. 2020; White et al. 2021). For example, combinations of kinase inhibitors, or single compounds that block several kinases, may improve therapeutic efficacy and duration while combating treatment resistance in cancer (Attwood et al. 2021). While several ML models have been created to predict the response to pairwise drug–dose combinations, systematically predicting higher-order combination effects involving more than two drugs or targets is still an open problem. In cancer cell lines, tensor learning methods have enabled reliable prediction of pairwise drug combination dose-response matrices (Smiatek et al. 2020). This computationally efficient learning approach could use extensive pharmacogenomic data to determine which drug combinations, including higher-order combinations among novel therapeutic compounds and doses, are most promising for additional in vitro or in vivo testing in many kinds of preclinical models.

While both potential toxicity and targeted efficacy are important criteria for clinical development success, most existing ML models for predicting therapy response emphasize efficacy as the primary outcome. As a result, careful examination and prediction of toxic effects in simulated and preclinical settings is required to strike an acceptable balance between the efficacy and the toxicity of a therapy and so accelerate the next stages of drug development (Narayanan et al. 2020). Applying single-cell data and ML algorithms to design combinations of anticancer drugs has shown the potential to boost the likelihood of clinical success (Tao et al. 2018). Transfer learning and in silico cell-type deconvolution techniques (Avila et al. 2020) may offer effective ways to reduce the need to generate large amounts of single-cell data in order to predict combination therapy responders and toxic effects, as well as the recommended dosage that optimizes both efficacy and safety.

In addition, patient data and clinical profiles must be used to validate the in silico therapy response predictions. Validating ML predictions against such real-world data is crucial for progress in medicine, for establishing practical value, and for providing clinical guidance in decision-making. For example, a no-go decision can be made early if the compound has harmful effects. Many of the issues currently encountered when using machine learning for drug discovery, particularly in clinical development, arise because current AI algorithms do not meet the requirements of clinical research. ML model validation therefore requires systematic, comprehensive, high-quality clinical data sets. The discovery methods must be thoroughly evaluated for accuracy and reproducibility using community-agreed performance measures in a variety of settings, not just on a small collection of exemplary data sets. Sharing and exploiting private patient information is possible with systems that isolate the code from the data or that use the model-to-data approach (Guinney and Saez-Rodriguez 2018), which makes it possible for federated learning to use patient-level data for model construction and thorough assessment.
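
A minimal sketch of the federated idea is given below, assuming a simple federated-averaging scheme: each site improves the shared model on its own private data, and only the model parameter, never the patient-level data, is sent back and averaged. The one-parameter linear model, the data, and the function names are illustrative assumptions and do not describe the specific systems discussed in the cited work.

```python
from typing import List, Tuple

# Private (x, y) pairs held by one site; they never leave that site.
Site = List[Tuple[float, float]]


def local_update(w: float, data: Site, lr: float = 0.01, epochs: int = 5) -> float:
    """Run a few gradient-descent steps for the model y ~ w * x on one
    site's own data and return the updated parameter."""
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_average(sites: List[Site], rounds: int = 50) -> float:
    """Each round: send w to every site, train locally, then average the
    returned parameters weighted by the number of samples per site."""
    w = 0.0
    total = sum(len(site) for site in sites)
    for _ in range(rounds):
        local_weights = [local_update(w, site) for site in sites]
        w = sum(lw * len(site) for lw, site in zip(local_weights, sites)) / total
    return w


# Two hypothetical hospitals whose data follow the same relation y = 2x.
sites = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0), (4.0, 8.0), (5.0, 10.0)],
]
w = federated_average(sites)
print(round(w, 6))
```

In this toy setting the averaged parameter approaches the value that fits the pooled data, even though the pooled data set is never assembled in one place.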

Even though there are many applications in drug discovery, the majority of ML models, and DL models in particular, remain "black boxes", and interpretation by a human specialist is often difficult (Jiménez-Luna et al. 2020). Mathematical models deployed as online decision support tools must be understandable to users in order to gain their trust. Comprehensible, accessible, and explainable models should clearly state their optimization goals, such as synergy, efficacy, and/or toxicity.

DTI prediction is a notable example of a drug discovery research field. It has been studied for more than 10 years with the aim of improving the effectiveness of computational models using various technologies. The most recent computational approaches for predicting DTIs are DL technologies. These are structure-free approaches that need neither 3D structural data nor docking, which allows them to overcome the limitations imposed by the high-dimensional structures of drugs and target proteins. Despite the outstanding performance of DL, regression within DTI prediction remains a critical and difficult issue, and researchers could develop further strategies to improve prediction accuracy. Furthermore, data scarcity and the lack of a standardized benchmark database are still considered current research gaps.

While DL approaches show promise in detecting drug responses, especially when dealing with large amounts of data, drug response prediction research is still in its early stages, and more efficient and relevant models are needed.

While DL techniques have shown to be effective in detecting DDIs, especially when dealing with large amounts of data, more promising algorithms that focus on complex molecular reactions need to be developed.

Only a few studies in the drug discovery field have investigated the explainability of their models, leaving much room for improvement. The explanations generated by XAI for human decision-making must be non-trivial, non-artificial, and useful to the scientific community. For now, ensuring that XAI techniques achieve their goals and produce trustworthy answers requires a combined effort among DL specialists, cheminformaticians, chemists, biologists, data scientists, and other subject matter experts. We therefore believe that more developed methodologies for explaining black-box models in drug discovery fields such as DDIs, drug–target interactions, drug sensitivity, and drug side effects must be considered in the future to ensure model fairness or rigorous sensitivity evaluations of models. Further exploration of the capabilities and constraints of the existing chemical language for describing these models will be critical. The development of novel interpretable molecular representations for DL, and the deployment of self-explanatory algorithms alongside sufficiently accurate predictions, will be a critical area of research in the coming years. Because there are currently no methods that combine all the stated desirable XAI characteristics (transparency, justification, informativeness, and uncertainty estimation), consensus techniques that draw on the strengths of many XAI approaches and boost model reliability will play a major role in the short and medium term. Currently, there is no open-community platform for exchanging and refining XAI software and model interpretations in drug discovery. As a result, we believe that future research into XAI in drug discovery has much potential.

12 Discussion

This section briefly summarizes how the analytical questions posed in Sect. 2 are answered throughout the paper.

Several DL algorithms have been used to address the different categories of drug discovery problems, as detailed in Sect. 4 with respect to the main categories shown in Fig. 8. In addition, a sample of these algorithms, together with their methods, advantages, and weaknesses, is summarized in Table 2.

Recognizing the characteristics that make medications suitable targets for precision dosing will help direct resources to where they will have the most impact. Employing DL in drug dosing optimization is a major challenge, but one that can improve health care performance, safety, and cost-effectiveness, as presented in Sect. 7.

With the advancement of DL methods, large pharmaceutical companies have moved toward AI; one example is AstraZeneca, a multinational pharmaceutical company that has successfully used AI at every stage of drug development. Several success stories are presented in Sect. 9.

AQ4: What about using the newest technologies such as XAI and DT in drug discovery?

The topic of XAI addresses one of the most serious flaws in ML and DL algorithms: the lack of model interpretability and explainability. Without interpretability and explainability, it would be impossible to trust the predictions of real-world AI applications. Section 8 presents the literature that addresses this issue. A digital twin (DT) is a virtual representation of a real entity that is connected to that entity in dynamic, reciprocal ways. Today, DTs are used in a variety of industries. Although the pharmaceutical sector has come to accept digitization in order to embrace Industry 4.0, DT has yet to be comprehensively implemented in pharmaceutical manufacturing. The use of DT in drug discovery is discussed in Sect. 10.

AQ5: What are the future directions and open problems related to drug discovery and DL?

Throughout the paper, we show how DL has succeeded in many aspects of drug discovery. However, important challenges remain for future research. Section 11 covers these challenges.

Figure 17 presents the percentage of the different DL applications for each building block of our study. The largest segment is devoted to drug discovery and DL because this is the core of our research.

Fig. 17 Percentages of DL applications for each category

13 Conclusion

Despite all the breakthroughs in pharmacology, developing new drugs still requires a great deal of time and money. As DL technology advances and the amount of drug-related data grows, a slew of new DL-based approaches is emerging at every stage of the drug development process. In addition, large pharmaceutical corporations have moved toward AI in the wake of the development of DL approaches.

Although drug discovery is a large field with several research categories, there are few review studies of it, and each related study has focused on only one research category, such as reviewing the DL applications for DTIs. The main goal of our research is therefore to present a systematic literature review (SLR) that integrates the recent DL technologies and applications for the different categories of drug discovery problems, including drug–target interactions (DTIs), drug–drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug side-effect prediction, together with the associated benchmark data sets and databases. Related topics such as XAI and DT, and how they support drug discovery problems, are also discussed. In addition, drug dosing optimization and success stories are presented. Finally, we suggest open problems as future research challenges.

Although DL has proved its strength in drug discovery problems, it remains a promising open research area. In this paper, interested researchers can find an overview of the use of DL in various drug discovery problems, together with success stories and open areas for future research.

Given the recent success of DL approaches and their use by pharmaceutical companies in identifying new medications, it seems clear that current DL techniques will be highly regarded in the next generation of large-scale data investigation and evaluation for drug discovery and development.

References

Abramovich I, Ben-Yehuda T, Cohen R (2018) Low-complexity video classification using recurrent neural networks. IEEE Int Conf Sci Electr Eng Israel (ICSEE) 2018:1–4. https://doi.org/10.1109/ICSEE.2018.8646076


Adadi A, Mohammed B (2018) Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6:2169–3536


Ahmed KT, Park S, Jiang Q et al (2020) Network-based drug sensitivity prediction. BMC Med Genomics 13:193

Alankrita A, Mamta M, Gopi B (2021) Generative adversarial network: an overview of theory and applications. Int J Inf Manag Data Insights 1(1):100004

Yamashita R, Nishio M, Do RKG et al (2018) Convolutional neural networks: an overview and application in radiology. Insights Imaging 9:611–629. https://doi.org/10.1007/s13244-018-0639-9

Andreea D, Yu-Hsiang H, Petar V, Pietro L, Jian T (2019) Drug–drug adverse effect prediction with graph co-attention. https://arxiv.org/abs/1905.00534

Arshed MA, Mumtaz S, Riaz O, Sharif W, Abdullah S (2022) A deep learning framework for multi drug side effects prediction with drug chemical substructure. Int J Innovat Sci Technol 4(1):19–31

Arus-Pous J, Patronov A, Bjerrum EJ, Tyrchan C, Reymond JL, Chen H, Engkvist O (2020) SMILES-based deep generative scaffold decorator for de-novo drug design. J Cheminform 12:1–18

Asada M, Miwa M, Sasaki Y (2018) Enhancing drug–drug interaction extraction from texts by molecular structure information. In: proceedings of the 56th annual meeting of the association for computational linguistics. 2, pp 680–685, https://doi.org/10.18653/v1/P18-2108

Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT et al (2000) Gene ontology: tool for the unification of biology. Nat Genet 25:25–29

Attwood MM, Fabbro D, Sokolov AV et al (2021) Trends in kinase drug discovery: targets, indications and inhibitor design. Nat Rev Drug Discov 20(11):839–861

Avila C, Alquicira-Hernandez J, Powell JE et al (2020) Benchmarking of cell type deconvolution pipelines for transcriptomics data. Nat Commun 11(1):5650

Azad AKM, Dinarvand M, Nematollahi A, Swift J, Lutze-Mann L, Vafaee F (2021) A comprehensive integrated drug similarity resource for in-silico drug repositioning and beyond. Brief Bioinform 22(3):bbaa126. https://doi.org/10.1093/bib/bbaa126

Badr S, Sugiyama H (2020) A PSE perspective for the efficient production of monoclonal antibodies: integration of process, cell, and product design aspects. Curr Opin Chem Eng 27:121–128

Bao J, Guo D, Li J, Zhang J (2018) The modelling and operations for the digital twin in the context of manufacturing. Enterp Inf Syst 13:534–556

Baptista D, Ferreira PG, Rocha M (2021) Deep learning for drug response prediction in cancer. Brief Bioinform 22:360–379

Barenji RV, Akdag Y, Yet B, Oner L (2019) Cyber-physical-based PAT (CPbPAT) framework for Pharma 4.0. Int J Pharm 567:118445

Baumann P, Hubbuch J (2017) Downstream process development strategies for effective bioprocesses: Trends, progress, and combinatorial approaches. Eng Life Sci 17:1142–1158

Beck BR, Shin B, Choi Y, Park S, Kang K (2020) Predicting commercially available antiviral drugs that may act on the novel coronavirus (SARS-CoV-2) through a drug–target interaction deep learning model. Comput Struct Biotechnol J 18:784–790

Bedi P, Sharma C, Vashisth P, Goel D, Dhanda M (2015) Handling cold start problem in Recommender Systems by using Interaction Based Social Proximity factor. In: Proceeding of the 2015 international conference on advances in computing, communications and informatics, Kerala, India, 10–13 August 2015; pp 1987–1993

Benedek R, Stephen B, Andriy N, Michael U, Sebastian N, Eliseo P (2021) A unified view of relational deep learning for drug pair scoring. CoRR. https://arxiv.org/abs/2111.02916

Betsabeh T, Mansoor ZJ (2021) Using drug–drug and protein-protein similarities as feature vector for drug–target binding prediction. Chemom Intell Lab Syst 217:104405. https://doi.org/10.1016/j.chemolab.2021.104405

Bleakley K, Yamanishi Y (2009) Supervised prediction of drug–target interactions using bipartite local models. Bioinformatics 25:2397–2403

Bolukbasi T et al (2016) Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Adv Neural Inf Process Syst 29. In: Identifying gender and sexuality of data subjects. https://cis.pubpub.org/pub/debiasing-word-embeddings-2016

Bongini P, Pancino N, Dimitri GM, Bianchini M, Scarselli F, Lio P (2022) Modular multi-source prediction of drug side-effects with DruGNN. http://arxiv.org/abs/2202.08147

Boobier S, Osbourn A, Mitchell JB (2017) Can human experts predict solubility better than computers? J Cheminform 9:63

Boukouvala F, Niotis V, Ramachandran R, Muzzio FJ, Ierapetritou MG (2012) An integrated approach for dynamic flowsheet modeling and sensitivity analysis of a continuous tablet manufacturing process. Comput Chem Eng 42:30–47

Brown AS, Patel CJ (2017) MeSHDD: literature-based drug-drug similarity for drug repositioning. J Am Med Inf Assoc 24(3):614–618

Camacho DM, Collins KM, Powers RK, Costello JC, Collins JJ (2018) Next-generation machine learning for biological networks. Cell 173:1581–1592

Campillos M et al (2008) Drug target identification using side-effect similarity. Science 321(5886):263–266. https://doi.org/10.1126/science.1158140

Cao H, Mushnoori S, Higgins B, Kollipara C, Fermier A, Hausner D, Jha S, Singh R, Ierapetritou M, Ramachandran R (2018) A systematic framework for data management and integration in a continuous pharmaceutical manufacturing processing line. Processes 6:53

Casola G, Siegmund C, Mattern M, Sugiyama H (2019) Data mining algorithm for pre-processing biopharmaceutical drug product manufacturing records. Comput Chem Eng 124:253–269

Chabner BA (2016) NCI-60 cell line screening: a radical departure in its time. J Natl Cancer Inst. https://doi.org/10.1093/jnci/djv388

Chander A, Srinivasan R, Chelian S, Wang J, Uchino K (2018) Working with beliefs: AI transparency in the enterprise. In: Said A, Komatsu T (eds) Joint proceedings of the ACM IUI 2018 workshops co-located with the 23rd ACM conference on intelligent user interfaces, vol 2068. CEUR-WS.org

Chandra B, Sharma RK (2017) On improving recurrent neural network for image classification. Int Jt Conf Neural Netw (IJCNN) 2017:1904–1907. https://doi.org/10.1109/IJCNN.2017.7966083

Chang Y, Park H, Yang HJ, Lee S, Lee KY, Kim TS, Jung J, Shin JM (2018) Cancer drug response profile scan (CDRscan): a deep learning model that predicts drug effectiveness from cancer genomic signature. Sci Rep 8:1–11

Chauhan R, Ghanshala KK, Joshi RC (2018) Convolutional neural network (CNN) for image detection and recognition. First Int Conf Secure Cyber Comput Commun (ICSCCC) 2018:278–282. https://doi.org/10.1109/ICSCCC.2018.8703316

Chen AW (2018) Predicting adverse drug reaction outcomes with machine learning. Int J Commun Med Public Health 5(3):901–904

Chen JY, Mamidipalli S, Huan T (2009) Happi: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics 10(1):S16

Chen X, Liu M-X, Yan G-Y (2012) Drug–target interaction prediction by random walk on the heterogeneous network. Mol BioSyst 8:1970–1978. https://doi.org/10.1039/C2MB00002D

Chen Y, Yang O, Sampat C, Bhalode P, Ramachandran R, Ierapetritou M (2020) Digital twins in pharmaceutical and biopharmaceutical manufacturing: a literature review. Processes 8(9):1088. https://doi.org/10.3390/pr8091088

Cheng F, Kovács IA, Barabási AL (2019) Network-based prediction of drug combinations. Nat Commun 10(1):1–11

Chiu Y-C, Chen H-IH, Zhang T, Zhang S, Gorthi A, Wang L-J, Huang Y, Chen Y (2019) Predicting drug response of tumors from integrated genomic profiles by deep neural networks. BMC Med Genomics 12:119

Chu X, Lin Y, Gao J, Wang J, Wang Y, Wang L (2018) Multi-label robust factorization autoencoder and its application in predicting drug–drug interactions. arXiv:1811.00208

Chu X, Lin Y, Wang Y, Wang L, Wang J, Gao J (2019) MLRDA: a multitask semi-supervised learning framework for drug–drug interaction prediction. In: Proceedings of the international joint conference on artificial intelligence, pp 4518–4524

Ciallella HL, Zhu H (2019) Advancing computational toxicology in the big data era by artificial intelligence: data-driven and mechanism-driven modeling for chemical toxicity. Chem Res Toxicol 32:536–547

Cortes-Ciriano I, Ain QU, Subramanian V, Lenselink EB, Méndez-Lucio O, IJzerman AP, Wohlfahrt G, Prusis P, Malliavin TE, van Westen GJP et al (2015) Polypharmacology modelling using proteochemometrics (PCM): recent methodological developments, applications to target families, and future prospects. Medchemcomm 6:24–50

Cortés-Ciriano I, Bender A (2019) KekuleScope: prediction of cancer cell line sensitivity and compound potency using convolutional neural networks trained on compound images. J Cheminform 11:1–16

Dai L, Zhu H, Liu D (2020) Patient similarity: methods and applications. http://arxiv.org/abs/2012.01976

David L, Arús-Pous J, Karlsson J, Engkvist O, Bjerrum EJ, Kogej T, Kriegl JM, Beck B, Chen H (2019) Applications of deep-learning in exploiting large-scale and heterogeneous compound data in industrial pharmaceutical research. Front Pharmacol 10:1303

Davis MI, Hunt JP, Herrgard S, Ciceri P, Wodicka LM, Pallares G, Hocker M, Treiber DK, Zarrinkar PP (2011) Comprehensive analysis of kinase inhibitor selectivity. Nat Biotechnol 29:1046–1051

De Carvalho TM, Noels E, Wakkee M, Udrea A, Nijsten T (2019) Development of smartphone apps for skin cancer risk assessment: progress and promise. JMIR Dermatol 2(1):e13376

De Kuijper GM, Risselada A, van Dijken R (2019) Monitoring drug side-effects. Handbook of intellectual disabilities. Springer, Cham, pp 275–301

“deepchem/deepchem: Democratizing Deep-Learning for Drug Discovery”; Quantum Chemistry, Materials Science and Biology; Available online: https://github.com/deepchem/deepchem (accessed on 15 April 2022).

Dey S, Luo H, Fokoue A, Hu J, Zhang P (2018) Predicting adverse drug reactions through interpretable deep learning framework. BMC Bioinform 19:476

Dincer AB, Celik S, Hiranuma N, Lee S-I (2018) DeepProfile: deep learning of cancer molecular profiles for precision medicine. bioRxiv. https://doi.org/10.1101/278739

Ding MQ, Chen L, Cooper GF, Young JD, Lu X (2018) Precision oncology beyond targeted therapy: combining omics data with machine learning matches the majority of cancer cells to effective therapeutics. Mol Cancer Res 16:269–278

Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. https://arxiv.org/abs/1702.08608

DrugBank (2019) DrugBank Release Version 5.1.3, chemical structures. https://www.drugbank.com

Dua D, Graff C (2017) UCI machine learning repository. https://archive.ics.uci.edu/ml/index.php

El-Deredy W et al (1997) Pretreatment prediction of the chemotherapeutic response of human glioma cell cultures using nuclear magnetic resonance spectroscopy and artificial neural networks. Cancer Res 57:4196–4199

Farzan P, Mistry B, Ierapetritou MG (2017) Review of the important challenges and opportunities related to modeling of mammalian cell bioreactors. AIChE J 63:398–408

Fatehifar M, Karshenas H (2021) Drug–drug interaction extraction using a position and similarity fusion-based attention mechanism. J Biomed Inf 115:103707. https://doi.org/10.1016/j.jbi.2021.103707

Feng S, et al (2018) Pathologies of neural models make interpretations difficult. http://arxiv.org/abs/1804.07781

Feng Q, Dueva E, Cherkasov A, Ester M (2018) PADME: a deep learning-based framework for drug–target interaction prediction. arXiv:1807.09741

Feng YH, Zhang SW, Shi JY (2020) DPDDI: a deep predictor for drug–drug interactions. BMC Bioinform 21:419. https://doi.org/10.1186/s12859-020-03724-x

Ferdousi R, Safdari R, Omidi Y (2017) Computational prediction of drug–drug interactions based on drugs functional similarities. J Biomed Inform. https://doi.org/10.1016/j.jbi.2017.04.021

Finn RD et al (2013) Pfam: the protein families database. Nucleic Acids Res 42(D1):D222–D230

Flaten HK, St Claire C, Schlager E, Dunnick CA, Dellavalle RP (2020) Growth of mobile applications in dermatology. Dermatol Online J 24(2):13–16

Fleischhack G, Massimino M, Warmuth-Metz M, Khuhlaeva E, Janssen G, Graf N et al (2019) Nimotuzumab and radiotherapy for treatment of newly diagnosed diffuse intrinsic pontine glioma (DIPG): a phase III clinical study. J Neurooncol 143:107–113. https://doi.org/10.1007/s11060-019-03140-z

Fokoue A, Sadoghi M, Hassanzadeh O, Zhang P (2016) Predicting drug–drug interactions through large-scale similarity-based link prediction. In: European semantic web conference 2016 May 29; pp 774–789

Fushman D, Shooshan SE, Rodriguez L, Aronson AR, Lang F, Rogers W, Tonning J (2018) A dataset of 200 structured product labels annotated for adverse drug reactions. Sci Data 5:180001

Gangadharan N, Turner R, Field R, Oliver SG, Slater N, Dikicioglu D (2019) Metaheuristic approaches in biopharmaceutical process development data analysis. Bioprocess Biosyst Eng 42:1399–1408

Gao Z et al (2008) PDTD: a web-accessible protein database for drug target identification. BMC Bioinf 9(1):104

Gao KY, Fokoue A, Luo H, Iyengar A, Dey S, Zhang P (2017) Interpretable drug target prediction using deep neural representation. In: Proceedings of the international joint conference on artificial intelligence, Melbourne, Australia, 19–25 August 2017

Gao K, Duy Nguyen D, Sresht V, Mathiowetz AM, Tu M, Wei G-W (2019) Are 2D fingerprints still valuable for drug discovery? Phys Chem Chem Phys 22:8373–8390

Gatti M, Turrini E, Raschi E, Sestili P, Fimognari C (2021) Janus kinase inhibitors and coronavirus disease (COVID)-19: rationale, clinical evidence and safety issues. Pharmaceuticals 14(8):738

Gaulton A et al (2011) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res 40(D1):D1100–D1107

Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE (2017) Neural message passing for quantum chemistry. 34th Int Conf Mach Learn ICML 3:2053–2070

Glaessgen EH, Stargel DS (2012) The digital twin paradigm for future NASA and US Air Force vehicles. In: Proceedings of the 53rd AIAA/ASME/ASCE/AHS/ASC structures, structural dynamics and materials conference, Honolulu, HI, USA. https://ntrs.nasa.gov/citations/20120008178

Goebel R et al (2018) Explainable AI: the new 42? In: Holzinger A, Kieseberg P, Tjoa A, Weippl E (eds) Machine learning and knowledge extraction. CD-MAKE Lecture Notes in Computer Science. Springer, New York

Gómez-Bombarelli R et al (2018) Automatic chemical design using a data-driven continuous representation of molecules. ACS Cent Sci 4:268–276

Grieves M (2014) Digital twin: manufacturing excellence through virtual factory replication. Glob J Eng Sci Res. https://doi.org/10.5281/zenodo.1493930

Grieves M, Vickers J (2017) Digital twin: mitigating unpredictable undesirable emergent behavior in complex systems. Springer, Cham, pp 85–113

Guidotti R et al (2018) A survey of methods for explaining black box models. ACM Comput Surv 51:93

Guinney J, Saez-Rodriguez J (2018) Alternative models for sharing confidential biomedical data. Nat Biotechnol 36(5):391–392

Gunther S et al (2007) SuperTarget and Matador: resources for exploring drug–target relationships. Nucleic Acids Res 36:D919–D922

Hamilton WL (2020) Graph representation learning. Synth Lect Artif Intell Mach Learn 14:1–159

Han X, Xie R, Li X, Li J (2022) SmileGNN: drug–drug interaction prediction based on the SMILES and graph neural network. Life (Basel) 12(2):319. https://doi.org/10.3390/life12020319

Hao M, Wang Y, Bryant SH (2016) Improved prediction of drug–target interactions using regularized least squares integrating with kernel fusion technique. Anal Chim Acta 909:41

Hassan-Harrirou H, Zhang C, Lemmin T (2020) RosENet: improving binding affinity prediction by leveraging molecular mechanics energies with an ensemble of 3D convolutional neural networks. J Chem Inf Model 60:2791–2802

He C, Liu Y, Li H, Zhang H, Mao Y, Qin X, Liu L, Zhang X (2022) Multi-type feature fusion based on graph neural network for drug-drug interaction prediction. BMC Bioinf 23(1):1–8

Hecker N et al (2011) SuperTarget goes quantitative: update on drug–target interactions. Nucleic Acids Res 40(D1):D1113–D1117

Hermanto A, Adji TB, Setiawan NA (2015) Recurrent neural network language model for English-Indonesian machine translation: experimental study. Int Conf Sci Inf Technol (ICSITech) 2015:132–136. https://doi.org/10.1109/ICSITech.2015.7407791

Hinton G (2011) Boltzmann machines. In: Sammut C, Webb GI (eds) Encyclopedia of machine learning. Springer, Boston

Hirohara M, Saito Y, Koda Y, Sato K, Sakakibara Y (2018) Convolutional neural network based on SMILES representation of compounds for detecting chemical motif. BMC Bioinform 19:83–94

Hizukuri Y, Sawada R, Yamanishi Y (2015) Predicting target proteins for drug candidate compounds based on drug-induced gene expression data in a chemical structure-independent manner. BMC Med Genomics 8:82

Hou X, You J, Hu P (2019) Predicting drug–drug interactions using deep neural network. In: Proceedings of the 11th international conference on machine learning and computing, pp 168–172

http://zinc.docking.org

https://bioinf-applied.charite.de/supernatural_new/index.php

https://friendsofcancerresearch.org/wpcontent/uploads/Optimizing_Dosing_in_Oncology_Drug_Development.pdf

https://ncats.nih.gov/tox21

https://pharmacodb.pmgenomics.ca/datasets/4

https://sites.broadinstitute.org/ccle/

https://string-db.org/cgi/download.pl?sessionId=uKr0odAK9hPs

https://www.cancer.gov/about-nci/organization/ccct/ctrp

https://www.ebi.ac.uk/chebi/

https://www.sciencedirect.com/topics/drug-response

Hu J, Gao J, Fang X, Liu Z, Wang F, Huang W, Wu H, Zhao G (2022) DTSyn: a dual-transformer-based neural network to predict synergistic drug combinations. bioRxiv. https://doi.org/10.1101/2022.03.29.486200

Huang C-T et al (2018) A large-scale gene expression intensity-based similarity metric for drug repositioning. iScience 7:40–52

Huang K, Xiao C, Hoang TN, Glass LM, Sun J (2020) Caster: predicting drug interactions with chemical substructure representation. In: AAAI 2020 34th AAAI Conference on Artificial Intelligence, American Association for Artificial Intelligence (AAAI) Press, pp 702–709

Ibrahim H, El Kerdawy AM, Abdo A, Eldin AS (2021) Similarity-based machine learning framework for predicting safety signals of adverse drug–drug interactions. Inf Med Unlocked 26:100699

Ierapetritou M, Muzzio F, Reklaitis G (2016) Perspectives on the continuous manufacturing of powder-based pharmaceutical processes. AIChE J 62:1846–1862

Iorio F et al (2010) Discovery of drug mode of action and drug repositioning from transcriptional responses. PNAS 107(33):14621–14626. https://doi.org/10.1073/pnas.1000138107

Iorio F, Knijnenburg TA, Vis DJ, Bignell GR, Menden MP, Schubert M, Aben N, Gonçalves E, Barthorpe S, Lightfoot H et al (2016) A landscape of pharmacogenomic interactions in cancer. Cell 166:740–754

James M, Stanfield CF, Bir G (2006) A review of process analytical technology (PAT) in the US pharmaceutical industry. Curr Pharm Anal 2:405–414

Ji ZL, Han LY, Yap CW, Sun LZ, Chen X, Chen YZ (2003) Drug adverse reaction target database (DART). Drug Saf 26(10):685–690

Jiménez-Luna J, Grisoni F, Schneider G (2020) Drug discovery with explainable artificial intelligence. Nat Mach Intell 2(10):573–584

Julkunen H, Cichonska A, Gautam P et al (2020) Leveraging multi-way interactions for systematic prediction of pre-clinical drug combination effects. Nat Commun 11(1):6136

Kamath U, Liu J (2021) Explainable artificial intelligence: an introduction to interpretable machine learning. Springer, Cham

Kamble R, Sharma S, Varghese V, Mahadik K (2013) Process analytical technology (PAT) in pharmaceutical development and its application. Int J Pharm Sci Rev Res 23:212–223

Kamel Boulos MN, Zhang P (2021) Digital twins: from personalised medicine to precision public health. J Person Med 11(8):745

Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28(1):27–30

Karim MR, Cochez M, Jares JB, Uddin M, Beyan O, Decker S (2019) Drug–drug interaction prediction based on knowledge graph embeddings and convolutional-LSTM network. In: Proceedings of the 10th ACM international conference on bioinformatics, computational biology and health informatics, pp 113–123

Karpov P, Godin G, Tetko IV (2020) Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J Cheminform 12:17

Kastrin A, Ferk P, Leskošek B (2018) Predicting potential drug–drug interactions on topological and semantic similarity features using statistical learning. PLoS ONE 13(5):e0196865

Keum J, Nam H (2017) SELF-BLM: prediction of drug–target interactions via self-training SVM. PLoS ONE 12:e0171839

Kim S, Thiessen PA, Bolton EE, Chen J, Fu G, Gindulyte A, Han L, He J, He S, Shoemaker BA et al (2016) PubChem substance and compound databases. Nucleic Acids Res 44:D1202–D1213

Kim J, Park S, Min D, Kim W (2021) Comprehensive survey of recent drug discovery using deep learning. Int J Mol Sci 22:9983. https://doi.org/10.3390/ijms22189983

Koes DR, Baumgartner MP, Camacho CJ (2013) Lessons learned in empirical scoring with smina from the CSAR 2011 benchmarking exercise. J Chem Inf Model 53:1893–1904

Kohonen T (1990) The self-organizing map. Proc IEEE 78(9):1464–1480

Korkmaz S (2020) Deep learning-based imbalanced data classification for drug discovery. J Chem Inf Model 60:4180–4190

Kritzinger W, Karner M, Traar G, Henjes J, Sihn W (2018) Digital Twin in manufacturing: a categorical literature review and classification. IFAC-PapersOnLine 51:1016–1022

Kuenzi BM et al (2020) Predicting drug response and synergy using a deep learning model of human cancer cells. Cancer Cell 38(5):672–684. https://doi.org/10.1016/j.ccell.2020.09.014

Kuhn M et al (2010) A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol 6(1):343

Kuhn M et al (2013) STITCH 4: integration of protein–chemical interactions with user data. Nucleic Acids Res 42(D1):D401–D407

Kumar SP, Feidler JC (2003) BioSPICE: a computational infrastructure for integrative biology. OMICS J Integr Biol 7(3):225. https://doi.org/10.1089/153623103322452350

Kumar S, Talasila D, Gowrav M, Gangadharappa H (2020) Adaptations of pharma 4.0 from industry 4.0. Drug Invent Today 14:405–415

Lamb J, Crawford ED, Peck D, Modell JW, Blat IC, Wrobel MJ, Lerner J, Brunet JP, Subramanian A, Ross KN et al (2006) The connectivity map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313:1929–1935

Lapuschkin S et al (2019) Unmasking clever Hans predictors and assessing what machines really learn. Nat Commun 10:1096

Lee CY, Chen YP (2021) Descriptive prediction of drug side-effects using a hybrid deep learning model. Int J Intell Syst 36(6):2491–2510

Lee H, Kim W (2019) Comparison of target features for predicting drug–target interactions by deep neural network based on large-scale drug-induced transcriptome data. Pharmaceutics 11:377

Lee HW, Christie A, Xu J, Yoon S (2012) Data fusion-based assessment of raw materials in mammalian cell culture. Biotechnol Bioeng 109:2819–2828

Lee G, Park C, Ahn J (2019) Novel deep learning model for more accurate prediction of drug–drug interaction effects. BMC Bioinform 20(1):415

Lee I, Keum J, Nam H (2019) DeepConv-DTI: prediction of drug–target interactions via deep learning with convolution on protein sequences. PLoS Comput Biol 15:1–21

Legner C, Eymann T, Hess T, Matt C, Böhmann T, Drews P, Mädche A, Urbach N, Ahlemann F (2017) Digitalization: opportunity and challenge for the business and information systems engineering community. Bus Inf Syst Eng 59:301–308

Lei T, Barzilay R, Jaakkola T (2016) Rationalizing neural predictions. In: Proceedings of the 2016 conference on empirical methods in natural language processing, Austin, Texas. Association for Computational Linguistics, pp 107–117. https://aclanthology.org/D16-1011

Li M, Wang Y, Zheng R, Shi X, Wu F, Wang J et al (2019) DeepDSC: a deep learning method to predict drug sensitivity of cancer cell lines. IEEE/ACM Trans Comput Biol Bioinform

Lian M, Du W, Wang X, Yao Q (2021) Drug–target interaction prediction based on multi-similarity fusion and sparse dual-graph regularized matrix factorization. IEEE Access 9:99718–99730. https://doi.org/10.1109/ACCESS.2021.3096830

Lin X, Quan Z, Wang Z-J, Ma T, Zeng X (2021) KGNN: knowledge graph neural network for drug–drug interaction prediction. In: Proceedings of the twenty-ninth international joint conference on artificial intelligence, Japan; IJCAI'20

Lin-Gibson S, Srinivasan V (2019) Recent industrial roadmaps to enable smart manufacturing of biopharmaceuticals. IEEE Trans Autom Sci Eng 2019:1–8

Lipton ZC (2018) The mythos of model interpretability. Queue 16:31–57

Liu Y, Wu M, Miao C, Zhao P, Li X-L (2016) Neighborhood regularized logistic matrix factorization for drug–target interaction prediction. PLoS Comput Biol 12:e1004760

Liu B, Ramsundar B, Kawthekar P, Shi J, Gomes J, Luu Nguyen Q, Ho S, Sloane J, Wender P, Pande V (2017) Retrosynthetic reaction prediction using neural sequence-to-sequence models. ACS Cent Sci 3:1103–1113

Liu N, Chen CB, Kumara S (2019) Semi-supervised learning algorithm for identifying high-priority drug–drug interactions. IEEE J Biomed Health Inform. https://doi.org/10.1109/JBHI.2019.2932740

Liu K, Sun X, Jia L, Ma J, Xing H, Wu J, Gao H, Sun Y, Boulnois F, Fan J (2019a) Chemi-net: a molecular graph convolutional network for accurate drug property prediction. Int J Mol Sci 20:3389

Liu P, Li H, Li S, Leung KS (2019b) Improving prediction of phenotypic drug response on cancer cell lines using deep convolutional network. BMC Bioinform 20:408

Liu S, Huang Z, Qiu Y, Chen Y-PP, Zhang W (2019c) Structural network embedding using multi-modal deep auto-encoders for predicting drug–drug interactions. IEEE Int Conf Bioinform Biomed 2019:445–450. https://doi.org/10.1109/BIBM47256.2019.8983337

Liu S, Zhang Y, Cui Y, Qiu Y, Deng Y, Zhang W, Zhang Z (2021) Enhancing drug–drug interaction prediction using deep attention neural networks. BioRxiv. https://doi.org/10.1101/2021.03.16.435553

Lopes MR, Costigliola A, Pinto R, Vieira S, Sousa JMC (2019) Pharmaceutical quality control laboratory digital twin—a novel governance model for resource planning and scheduling. Int J Prod Res 58:1–15

Louizos C, Welling M, Kingma DP (2017) Learning sparse neural networks through L0 regularization. http://arxiv.org/abs/1712.01312

Lu Y, Guo Y, Korhonen A (2017) Link prediction in drug–target interactions network using similarity indices. BMC Bioinf 18(1):39. https://doi.org/10.1186/s12859-017-1460-z

Luo Y, Zhao X, Zhou J, Yang J, Zhang Y, Kuang W, Peng J, Chen L, Zeng J (2017) A network integration approach for drug–target interaction prediction and computational drug repositioning from heterogeneous information. Nat Commun 8:573

Luo D, Cheng W, Xu D, Yu W, Zong B, Chen H, Zhang X (2020) Parameterized explainer for graph neural network. Adv Neural Inf Process Syst 33:19620–19631

Lyu T, Gao J, Tian L, Li Z, Zhang P, Zhang J (2021) MDNN: a multimodal deep neural network for predicting drug–drug interaction events. In: Proceedings of the thirtieth international joint conference on artificial intelligence (IJCAI-21), pp 3536–3542. https://doi.org/10.24963/ijcai.2021/487

Ma T, Xiao C, Zhou J, Wang F (2018) Drug similarity integration through attentive Multiview graph auto-encoders. In: IJCAI 2018, proceedings of the 27th international joint conference on artificial intelligence, pp 3477–3483

Mahajan D, Kumar D (2018) Sentiment analysis using RNN and Google translator. In: 2018 8th international conference on cloud computing, data science & engineering (Confluence), pp 798–802. https://doi.org/10.1109/CONFLUENCE.2018.8442924

Mak IWY, Evaniew N, Ghert M (2014) Lost in translation: animal models and clinical trials in cancer treatment. Am J Transl Res 6:114–118

Marr B (2017) What is digital twin technology and why is it so important? Forbes. https://www.forbes.com/sites/bernardmarr/2017/03/06/what-is-digital-twin-technology-and-why-is-it-so-important

Matsuzaka Y, Uesawa Y (2019) Prediction model with high-performance constitutive androstane receptor (CAR) using DeepSnap-deep learning approach from the tox21 10K compound library. Int J Mol Sci 20:4855

Maul J-T, Djamei V, Kolios AG, Meier B, Czernielewski J, Jungo P (2016) Efficacy and survival of systemic psoriasis treatments: an analysis of the Swiss registry SDNTT. Dermatology 232(6):640–647

Mayani MG, Svendsen M, Oedegaard SI (2018) Drilling digital twin success stories the last 10 years. In: Proceedings of the SPE Norway one day seminar, Bergen, Norway. https://doi.org/10.2118/191336-MS

Metz JT, Johnson EF, Soni NB, Merta PJ, Kifle L, Hajduk PJ (2011) Navigating the kinome. Nat Chem Biol 7:200–202

Miller T (2019) Explanation in artificial intelligence: insights from the social sciences. Artif Intell 267:1–38

Miyato T, Dai AM, Goodfellow I (2016) Adversarial training methods for semisupervised text classification. http://arxiv.org/abs/1605.07725

Mohamed C, Nsiri B, Abdelmajid S, Abdelghani EM, Brahim B (2020) Deep convolutional networks for image segmentation: application to optic disc detection. Int Conf Electr Inf Technol (ICEIT) 2020:1–3. https://doi.org/10.1109/ICEIT48248.2020.9113204

Mukhamediev RI, Symagulov A, Kuchin Y, Yakunin K, Yelis M (2021) From classical machine learning to deep neural networks: a simplified scientometric review. Appl Sci 11:5541. https://doi.org/10.3390/app11125541

Murdoch WJ, Singh C, Kumbier K, Abbasi-Asl R, Yu B (2019) Definitions, methods, and applications in interpretable machine learning. Proc Natl Acad Sci USA 116:22071–22080

Nag S, Baidya ATK, Mandal A et al (2022) Deep learning tools for advancing drug discovery and development. 3 Biotech 12:110. https://doi.org/10.1007/s13205-022-03165-8

Nagy ZK, Fevotte G, Kramer H, Simon LL (2013) Recent advances in the monitoring, modelling, and control of crystallization systems. Chem Eng Res Des 91:1903–1922

Narayanan H, Luna MF, von Stosch M, Cruz Bournazou MN, Polotti G, Morbidelli M, Butte A, Sokolov M (2020) Bioprocessing in the digital age: the role of process models. Biotechnol J 15:e1900172

Nascimento ACA, Prudêncio RBC, Costa IG (2016) A multiple kernel learning algorithm for drug–target interaction prediction. BMC Bioinforma 17:46

Needleman SB, Wunsch CD (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48(3):443–453

Nguyen T, Nguyen TT, Nguyen T, Le DH (2021) Graph convolutional networks for drug response prediction. IEEE/ACM Trans Comput Biol Bioinform 19:146–154

O’Connor TF, Yu LX, Lee SL (2016) Emerging technology: a key enabler for modernizing pharmaceutical manufacturing and advancing product quality. Int J Pharm 509:492–498

O’Boyle NM, Sayle RA (2016) Comparing structural fingerprints using a literature-based similarity benchmark. J Cheminform 8(1):1–14. https://doi.org/10.1186/s13321-016-0148-0

Olughu W, Deepika G, Hewitt C, Rielly C (2019) Insight into the large-scale upstream fermentation environment using scaled-down models. J Chem Technol Biotechnol 94:647–657

Oughtred R, Rust J, Chang C, Breitkreutz BJ, Stark C, Willems A, Boucher L, Leung G, Kolas N, Zhang F, Dolma S (2021) The BioGRID database: a comprehensive biomedical resource of curated protein, genetic, and chemical interactions. Protein Sci 30(1):187–200

Oztemel E, Gursev S (2018) Literature review of Industry 4.0 and related technologies. J Intell Manuf 31:127–182

Ozturk H, Ozturk A, Ozkirimli E (2018) DeepDTA: Deep drug–target binding affinity prediction. Bioinformatics 34:i821–i829

Pandey P, Katakdaunde M, Turton R (2006) Modeling weight variability in a pan coating process using Monte Carlo simulations. AAPS Pharm Sci Tech 7:E2–E11

Papadakis E, Woodley JM, Gani R (2018) Perspective on PSE in pharmaceutical process development and innovation. In: Process systems engineering for pharmaceutical manufacturing. Elsevier, Amsterdam, pp 597–656

Passi A et al (2018) RepTB: a gene ontology-based drug repurposing approach for tuberculosis. J Cheminform 10(1):24. https://doi.org/10.1186/s13321-018-0276-9

Peng J, Li J, Shang X (2020) A learning-based method for drug–target interaction prediction based on feature representation learning and deep neural network. BMC Bioinform 21:1–13

Perozzi B, Al-Rfou R, Skiena S (2014) DeepWalk: online learning of social representations. In: Proceeding of the ACM SIGKDD international conference on knowledge discovery and data mining, New York, NY, USA, 24–27 August 2014, pp 701–710

Poluzzi E, Raschi E, Piccinni C, De Ponti F (2012) Data mining techniques in pharmacovigilance: analysis of the publicly accessible FDA adverse event reporting system (AERS). In: Data mining applications in engineering and medicine. IntechOpen, London, United Kingdom. https://doi.org/10.5772/50095

Pouryahya M, Oh JH, Mathews JC, Belkhatir Z, Moosmüller C, Deasy JO, Tannenbaum AR (2022) Pan-cancer prediction of cell-line drug sensitivity using network-based methods. Int J Mol Sci 23:1074. https://doi.org/10.3390/ijms23031074

Qiu K, Lee J, Kim H, Yoon S, Kang K (2021) Machine learning based anti-cancer drug response prediction and search for predictor genes using cancer cell line gene expression. Genomics Inform. https://doi.org/10.5808/gi.20076

Quan C et al (2016) Multichannel convolutional neural network for biological relation extraction. BioMed Res Int. https://doi.org/10.1155/2016/1850404

Raghava GP, Barton GJ (2006) Quantification of the variation in percentage identity for protein sequence alignments. BMC Bioinf 7(1):415. https://doi.org/10.1186/1471-2105-7-415

Rampášek L et al (2019) Improving drug response prediction via modeling of drug perturbation effects. Bioinformatics. https://doi.org/10.1093/bioinformatics/btz158

Rantanen J, Khinast J (2015) The future of pharmaceutical manufacturing sciences. J Pharm Sci 104:3612–3638

Read EK, Park JT, Shah RB, Riley BS, Brorson KA, Rathore AS (2010) Process analytical technology (PAT) for biopharmaceutical products: Part I. Concepts and applications. Biotechnol Bioeng 105:276–284

Reinhardt IC, Oliveira DJC, Ring DDT (2020) Current perspectives on the development of industry 4.0 in the pharmaceutical sector. J Ind Inf Integr 18:100131

Ren S, Tao Y, Yu K et al (2022) De novo prediction of Cell-Drug sensitivities using deep learning-based graph regularized matrix factorization. Pacif Symp Biocomput. https://doi.org/10.7490/f1000research.1118807.1

Reza F, Reza S, Yadollah O (2017) Computational prediction of drug–drug interactions based on drugs functional similarities. J Biomed Inform 70:54–64

Richardson P, Grifn I, Tucker C, Smith D, Oechsle O, Phelan A, Rawling M, Savory E, Stebbing J (2020) Baricitinib as potential treatment for 2019-nCoV acute respiratory disease. Lancet (london, England) 395(10223):e30

Rifaioglu AS, Atas H, Martin MJ, Cetin-Atalay R, Atalay V, Dogan T (2019) Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases. Brief Bioinform 20:1878–1912

Rosen R, von Wichert G, Lo G, Bettenhausen KD (2015) About the importance of autonomy and digital twins for the future of manufacturing. IFAC-PapersOnLine 48:567–572

Ryu JY, Kim HU, Lee SY (2018) Deep learning improves prediction of drug–drug and drug–food interactions. PNAS 115(18):E4304–E4311

Sachdev K, Gupta MK (2019) A comprehensive review of feature-based methods for drug–target interaction prediction. J Biomed Inform 93:103159

Sajjia M, Shirazian S, Kelly CB, Albadarin AB, Walker G (2017) ANN analysis of a roller compaction process; in the pharmaceutical industry. Chem Eng Technol 40:487–492

Sarker IH (2021) Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput Sci 2:420. https://doi.org/10.1007/s42979-021-00815-1

Sawada R, Iwata M, Tabei Y, Yamato H, Yamanishi Y (2018) Predicting inhibitory and activatory drug targets by chemically and genetically perturbed transcriptome signatures. Sci Rep 8:156

Schleich B, Anwer N, Mathieu L, Wartzack S (2017) Shaping the digital twin for design and production engineering. CIRP Ann 66:141–144

Schlichtkrull MS, De Cao N, Titov I (2020) Interpreting graph neural networks for NLP with differentiable edge masking. http://arxiv.org/abs/2010.00577

Schwarz K (2021) AttentionDDI: Siamese attention-based deep learning method for drug–drug interaction predictions. BMC Bioinf 22(1):412

Scudellari M (2020) Five companies using AI to fight coronavirus. https://spectrum.ieee.org/the-human-os/artificial-intelligence/medical-ai/companies-ai-coronavirus

Seo S, Lee T, Kim MH, Yoon Y (2020) Prediction of side effects using comprehensive similarity measures. BioMed Res Int. https://doi.org/10.1155/2020/1357630

Shang C, Liu Q, Chen KS, Sun J, Lu J, Yi J, Bi J (2018) Edge attention-based multi-relational graph convolutional networks. arXiv 2018; arXiv:1802.04944 .

Shao K, Zhang Z, He S, Bo X (2020) DTIGCCN: prediction of drug–target interactions based on GCN and CNN. In: Proceedings of the 2020 IEEE 2 nd international conference on tools with artificial intelligence (ICTAI), Baltimore, MD, USA, 9–11 November 2020, pp 337–342

Sharifi-Noghabi H, Zolotareva O, Collins CC, Ester M (2019) MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics 35:i501–i509

Shin B, Park S, Kang K, Ho JC (2019) Self-attention based molecule representation for predicting drug–target interaction. Proc Mach Learn Res 106:1–18

Shoemaker RH (2006) The NCI60 human tumour cell line anticancer drug screen. Nat Rev Cancer 6:813–823

Shrikumar A, Greenside P, Kundaje A (2017) Learning important features through propagating activation differences. In: Proceedings of the 34th international conference on machine learning 2017; 70, JMLR.org: Sydney, NSW, Australia. pp 3145–3153

Shtar G, Rokach L, Shapira B (2019) Detecting drug–drug interactions using artificial neural networks and classic graph similarity measures. PLoS ONE 14(8):e0219796

Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A et al (2017) Mastering the game of go without human knowledge. Nature 550(7676):354–359

Simon LL, Kiss AA, Cornevin J, Gani R (2019) Process engineering advances in pharmaceutical and chemical industries: Digital process design, advanced rectification, and continuous filtration. Curr Opin Chem Eng 25:114–121

Simonyan K, Vedaldi A, Zisserman A (2014) Deep inside convolutional networks: visualising image classification models and saliency maps. In: 2nd international conference on learning representations, ICLR 2014, Banff, AB, Canada, April 14–16, 2014, Workshop Track Proceedings; http://arxiv.org/abs/1312.6034

Smiatek J, Jung A, Bluhmki E (2020) Towards a digital bioprocess. Replica: computational approaches in biopharmaceutical development and manufacturing. Trends Biotechnol 38(10):1141–1153. https://doi.org/10.1016/j.tibtech.2020.05.008

Song T, Zhang X, Ding M, Rodriguez-Paton A, Wang S, Wang G (2022) DeepFusion: a deep learning based multi-scale feature fusion method for predicting drug–target interactions. Methods 204:269–277

Springenberg JT (2015) Striving for simplicity: the all-convolutional Net. CoRR, http://arxiv.org/abs/1412.6806

Stark R, Fresemann C, Lindow K (2019) Development and operation of digital twins for technical systems and services. CIRP Ann 68:129–132

Steinwandter V, Borchert D, Herwig C (2019) Data science tools and applications on the way to Pharma 4.0. Drug Discov Today 24:1795–1805

Stokes JM, Yang K, Swanson K, Jin W, Cubillos-Ruiz A, Donghia NM, MacNair CR, French S, Carfrae LA, Bloom-Ackerman Z et al (2020) A deep learning approach to antibiotic discovery. Cell 180:688-702.e13

Subramanian K (2020) Digital twin for drug discovery and development—the virtual liver. J Indian Inst Sci 100:653–662. https://doi.org/10.1007/s41745-020-00185-2

Subramanian A, Narayan R, Corsello SM, Peck DD, Natoli TE, Lu X, Gould J, Davis JF, Tubelli AA, Asiedu JK et al (2017) A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171:1437-1452.e17

Sun X, Ma L, Du X, Feng J, Dong K (2018) Deep convolution neural networks for drug–drug interaction extraction. In: 2018 IEEE international conference on bioinformatics and biomedicine (BIBM), pp 1662–1668. https://doi.org/10.1109/BIBM.2018.8621405

Sun M, Zhao S, Gilvary C, Elemento O, Zhou J, Wang F (2020a) Graph convolutional networks for computational drug development and discovery. Brief Bioinform 21:919–935

Sun M, Wang F, Elemento O, Zhou J (2020b) Structure-based drug–drug interaction detection via expressive graph convolutional networks and deep sets. Proc AAAI Conf Artif Intell 34(10):13927–13928. https://doi.org/10.1609/aaai.v34i10.7236

System HSL (2006) Psychoactive Drug Screening Program. https://www.hsls.pitt.edu/obrc/index.php?page=URL1133202727

Tajbakhsh N et al (2016) Convolutional neural networks for medical image analysis: full training or fine tuning? IEEE Trans Med Imaging 35(5):1299–1312. https://doi.org/10.1109/TMI.2016.2535302

Tang J, Szwajda A, Shakyawar S, Xu T, Hintsanen P, Wennerberg K, Aittokallio T (2014) Making sense of large-scale kinase inhibitor bioactivity data sets: a comparative and integrative analysis. J Chem Inf Model 54:735–743

Tang P, Xu J, Louey A, Tan Z, Yongky A, Liang S, Li ZJ, Weng Y, Liu S (2020) Kinetic modeling of Chinese hamster ovary cell culture: factors and principles. Crit Rev Biotechnol 40:265–281

Tao F, Cheng J, Qi Q, Zhang M, Zhang H, Sui F (2018) Digital twin-driven product design, manufacturing and service with big data. Int J Adv Manuf Technol 94:3563–3576

Tatonetti NP et al (2012) Data-driven prediction of drug effects and interactions. Sci Transl Med 4(125):12531. https://doi.org/10.1126/scitranslmed.3003377

Tatonetti NP, Patrick PY, Daneshjou R, Altman RB (2012) Data driven prediction of drug effects and interactions. Sci Transl Med 4(125):125ra31-125ra31

Tehseen Z, Usman Z (2019) Long short-term memory recurrent neural network architectures for Urdu acoustic modelling. Int J Speech Technol 22(1):21–30. https://doi.org/10.1007/s10772-018-09573-7

Thafar M, Raies AB, Albaradei S, Essack M, Bajic VB (2019) Comparison study of computational prediction tools for drug–target binding affinities. Front Chem 7:782. https://doi.org/10.3389/fchem.2019.00782

Thafar MA, Olayan RS, Olayan RS, Ashoor H, Ashoor H, Albaradei S, Albaradei S, Bajic VB, Gao X et al (2020) DTiGEMS: drug–target interaction prediction using graph embedding, graph mining, and similarity-based techniques. J Cheminform 12:1–17

Thafar MA, Alshahrani M, Albaradei S et al (2022) Affinity2Vec: drug–target binding affinity prediction through representation learning, graph mining, and machine learning. Sci Rep 12:4751. https://doi.org/10.1038/s41598-022-08787-9

Thorben F, Megha Kh, Avishek A (2021) Hard masking for explaining graph neural networks. In Submitted to international conference on learning representations https://openreview.net/forum?id=uDN8pRAdsoC

Tian X, Xin M, Luo J, Jiang Z (2016) Using the ranking-based KNN approach for drug repositioning based on multiple information. Springer, Cham, pp 317–327

Tong H, Heidemeyer M, Ban F, Cherkasov A, Ester M (2017) SimBoost: A read-across approach for predicting drug–target binding affinities using gradient boosting machines. J Cheminform 9:1–14

Torng W, Altman RB (2019) Graph convolutional neural networks for predicting drug–target interactions. J Chem Inf Model 59:4131–4149

Townshend RJL, Powers A, Eismann S, Derry A (2021) ATOM3D: tasks on molecules in three dimensions. arXiv 2021: arXiv:2012.04035

Trißl S, Rother K, Müller H et al (2005) Columba: an integrated database of proteins, structures, and annotations. BMC Bioinformatics 6:81. https://doi.org/10.1186/1471-2105-6-81

Trott O, Olson AJ (2010) AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading. J Comput Chem 31:455

Tyson RJ, Park CC, Powell JR, Patterson JH, Weiner D, Watkins PB, Gonzalez D (2020) Precision dosing priority criteria: drug, disease, and patient population variables. J Front Pharmacol. https://doi.org/10.3389/fphar.2020.00420

U. Consortium (2014) UniProt: a hub for protein information. Nucleic Acids Res 43(D1):D204–D212

Vazquez J, Lopez M, Gibert E, Herrero E, Luque FJ (2020) Merging ligand-based and structure-based methods in drug discovery: an overview of combined virtual screening approaches. Molecules 25:4723

Venkatasubramanian V (2019) The promise of artificial intelligence in chemical engineering: is it here, finally? AIChE J 65:466–478

Vermeer NS, Straus SM, Mantel-Teeuwisse AK, Domergue F, Egberts TC, Leufkens HG, De Bruin ML (2013) Traceability of biopharmaceuticals in spontaneous reporting systems: a cross sectional study in the FDA adverse event reporting system (FAERS) and surveillance databases. Drug Saf 36(8):617–625

Vilar S, Hripcsak GJ (2016) Leveraging 3D chemical similarity, target and phenotypic data in the identification of drug-protein and drug-adverse effect associations. J Cheminform 8(1):35. https://doi.org/10.1186/s13321-016-0147-1

Vilar S, Uriarte E, Santana L, Lorberbaum T, Hripcsak G, Friedman C, Tatonetti NP (2014) Similarity-based modeling in large-scale prediction of drug–drug interactions. Nat Protoc 9(9):2147–2163. https://doi.org/10.1038/nprot.2014.151

Wallach I, Dzamba M, Heifets A (2015) AtomNet: a deep convolutional neural network for bioactivity prediction in structurebased drug discovery. arXiv 2015: arXiv:1510.02855 .

Wan F et al (2019) DeepCPI: a deep learning-based framework for large-scale in silico drug screening. Genom Proteomics Bioinform 17:478–495

Wang JZ et al (2007) A new method to measure the semantic similarity of GO terms. Bioinformatics 23(10):1274–1281. https://doi.org/10.1093/bioinformatics/btm087

Wang W et al (2014) Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 30(20):2923–2930. https://doi.org/10.1093/bioinformatics/btu403

Wang CS, Lin PJ, Cheng CL, Tai SH, Kao Yang YH, Chiang JH (2019) Detecting potential adverse drug reactions using a deep neural network model. J Med Internet Res 21(2):e11016

Wang T, Yi HC, You ZH, Li LP, Wang YB, Hu L, Wong L (2019) A gated recurrent unit model for drug repositioning by combining comprehensive similarity measures and Gaussian interaction profile kernel. In: International conference on intelligent computing. Springer, Cham. pp 344–353

Wang YB, You ZH, Yang S et al (2020a) A deep learning-based method for drug–target interaction prediction based on long short-term memory neural network. BMC Med Inform Decis Mak 20:49. https://doi.org/10.1186/s12911-020-1052-0

Wang H, Wang J, Dong C, Lian Y, Liu D, Yan Z (2020b) A novel approach for drug–target interactions prediction based on multimodal deep autoencoder. Front Pharmacol 10:1–19

Watanabe JH, McInnis T, Hirsch JD (2018) Cost of prescription drug-related morbidity and mortality. Ann Pharmacother 52:829–837. https://doi.org/10.1177/1060028018765159

Way GP, Greene CS (2018) Extracting a biologically relevant latent space from cancer transcriptomes with variational autoencoders. Pac Symp Biocomput 23:80–91

Wei J, Lu Z, Qiu K, Li P, Sun H (2020) Predicting drug risk level from adverse drug reactions using SMOTE and machine learning approaches. IEEE Access 8:185761–185775. https://doi.org/10.1109/ACCESS.2020.3029446

Weinstein JN (2004) Integromic analysis of the NCI-60 cancer cell lines. Breast Dis 19:11–22

Wen M, Zhang Z, Niu S, Sha H, Yang R, Yun Y, Lu H (2017) Deep-learning-based drug–target interaction prediction. J Proteome Res 16:1401–1409

Wenzel J, Matter H, Schmidt F (2019) Predictive multitask deep neural network models for adme-tox properties: learning from large data sets. J Chem Inf Model 59:1253–1268

White J, Schiffer JT, Bender R et al (2021) Drug combinations as a first line of defense against coronaviruses and other emerging viruses. Mbio 12(6):e0334721

Withnall M, Lindelöf E, Engkvist O, Chen H (2020) Building attention and edge message passing neural networks for bioactivity and physical-chemical property prediction. J Cheminform 12:1

Wu Z, Ramsundar B, Feinberg EN, Gomes J, Geniesse C, Pappu AS, Leswing K, Pande V (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9:513–530

Wu Z, Pan S, Chen F et al (2020) A comprehensive survey on graph neural networks. IEEE Trans Neural Netw Learn Syst 32:4–24

Xia Z, Wu LY, Zhou X, Wong ST (2010) Semi-supervised drug-protein interaction prediction from heterogeneous biological spaces. BMC Syst Biol 4:S6

Xiang W, Yingxin W, An Z, Xiangnan H, Tat-seng C (2021) Causal screening to interpret graph neural networks. In Submitted to international conference on learning representations. https://www.openreview.net/forum?id=nzKv5vxZfge

Xie L, He S, Song X, Bo X, Zhang Z (2018) Deep learning-based transcriptome data classification for drug–target interaction prediction. BMC Genomics 19:13–16

Xie Y, Peng J, Zhou Y, et al (2019) Integrating protein-protein interaction information into drug response prediction by graph neural encoding. 16 December 2019, Available at Research Square https://doi.org/10.21203/rs.2.18936/v1 .

Xu Y, Pei J, Lai L (2017) Deep learning-based regression and multiclass models for acute oral toxicity prediction with automatic chemical feature extraction. J Chem Inf Model 57:2672–2685

Yan CK, Wang WX, Zhang G et al (2019) BiRWDDA: a novel drug repositioning method based on multisimilarity fusion. J Comput Biol 26(11):1230–1242

Yan C, Duan G, Zhang Y, Wu F-X, Pan Y, Wang J (2022) Predicting drug–drug interactions based on integrated similarity and semi-supervised learning. IEEE/ACM Trans Comput Biol Bioinf 19(1):168–179. https://doi.org/10.1109/TCBB.2020.2988018

Yang K, Swanson K, Jin W, Coley C, Eiden P, Gao H, Guzman-Perez A, Hopper T, Kelley B, Mathea M et al (2019) Analyzing learned molecular representations for property prediction. J Chem Inf Model 59:3370–3388

Yi HC, You ZH, Wang L et al (2021) In silico drug repositioning using deep learning and comprehensive similarity measures. BMC Bioinf 22:293. https://doi.org/10.1186/s12859-020-03882-y

Yifan D, Xinran X, Yang Q, Jingbo X, Wen Z, Shichao L (2020) A multimodal deep learning framework for predicting drug–drug interaction events. Bioinformatics 36:4316–4322

Ying Z, Bourgeois D, You J, Zitnik M, Leskovec J (2019) Gnnexplainer: generating explanations for graph neural networks. Adv Neural Inf Process Syst 32:9244–9255

Yu Y, Si X, Hu C, Zhang J (2019) A review of recurrent neural networks: Lstm cells and network architectures. Neural Comput 31:1235–1270

Yu Y, Huang K, Zhang C, Glass LM, Sun J, Xiao C (2021) SumGNN: multi-typed drug interaction prediction via efficient knowledge graph summarization. Bioinformatics 37(18):2988–2995

Yuan H, Yu H, Wang J, Li K, Ji S (2021) On explain-ability of graph neural networks via subgraph explorations. http://arxiv.org/abs/2102.05152

Yue X, Wang Z, Huang J, Parthasarathy S, Moosavinasab S, Huang Y, Lin SM, Zhang W, Zhang P, Sun H (2020) Graph embedding on biomedical networks: methods, applications, and evaluations. Bioinformatics 36(4):1241–1251. https://doi.org/10.1093/bioinformatics/btz718

Yunsheng B, Ken G, Yizhou S, Wei W (2020) Bi-level graph neural networks for drug–drug interaction prediction. J Comput Eng arXiv:2006.14002

Zaikis D, Vlahavas I (2020) Drug–drug interaction classification using attention based neural networks. In: 11th Hellenic conference on artificial intelligence, pp 34–40. https://doi.org/10.1145/3411408.3411461

Zeng H, Qiu C, Cui QJD (2015) Drug-path: a database for drug-induced pathways. J Biol Databases Curation. https://doi.org/10.1093/database/bav061

Zeng T, Rongjian L, Ravi M, Jieping Y, Shuiwang J (2015) Deep convolutional neural networks for annotating gene expression patterns in the mouse brain. BMC Bioinformatics 16(1):147

Zeng X et al (2019) Measure clinical drug–drug similarity using electronic medical records. Int J Med Inf 124:97–103. https://doi.org/10.1016/j.ijmedinf.2019.02.003

Zeng X, Zhu S, Lu W, Liu Z, Huang J, Zhou Y, Fang J, Huang Y, Guo H, Li L et al (2020) Target identification among known drugs by deep learning from heterogeneous networks. Chem Sci 11:1775–1797

Zhai J, Zhang S, Chen J, He Q (2018) Autoencoder and its various variants. In: 2018 IEEE international conference on systems, man, and cybernetics (SMC), pp 415–419. https://doi.org/10.1109/SMC.2018.00080

Zhang Y (2020) Predicting drug–drug interactions using multi-modal deep autoencoders based network embedding and positive-unlabeled learning. Methods 179:37–46

Zhang M-L, Zhou Z-H (2007) Ml-knn: a lazy learning approach to multi-label learning. Pattern Recogn 40(7):2038–2048

Zhang H, Liu D, Xiong Z (2018) Convolutional neural network-based video super-resolution for action recognition. In: 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018), pp 746–750. https://doi.org/10.1109/FG.2018.00117

Zhang Y, Weng Y, Lund J (2022) Applications of explainable artificial intelligence in diagnosis and surgery. Diagnostics 12:237. https://doi.org/10.3390/diagnostics12020237

Zhang C, Lu Y, Zang T (2022) CNN-DDI: a learning-based method for predicting drug–drug interactions using convolution neural networks. BMC Bioinf 23:88. https://doi.org/10.1186/s12859-022-04612-2

Zhao Y, Zheng K, Guan B, Guo M, Song L, Gao J, Qu H, Wang Y, Shi D, Zhang Y (2020) DLDTI: a learning-based framework for drug–target interaction identification using neural networks and network representation. J Transl Med 18:434

Zhao Q, Xiao F, Yang M, Li Y, Wang J (2019) AttentionDTA: prediction of drug–target binding affinity using attention model. In: Proceedings of the 2019 IEEE international conference on bioinformatics and biomedicine, San Diego, CA, USA, 18–21 November 2019, pp 64–69

Zhou Y, Zhang Y, Lian X, Li F, Wang C, Zhu F, Qiu Y, Chen Y (2022) Therapeutic target database update 2022: facilitating drug discovery with enriched comparative data of targeted agents. Nucleic Acids Res 50:1398–1407

Zitnik M, Agrawal M, Leskovec J (2018) Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 34(13):i457–i466

Zitnik SM, Sosic R, Leskovec J (2018) Biosnap datasets: Stanford biomedical network dataset collection. http://snap.stanford.edu/biodata

Zong N, Kim H, Ngo V, Harismendy O (2017) Deep mining heterogeneous networks of biomedical linked data to predict novel drug–target associations. Bioinformatics 33:2337–2344

Zügner D, Akbarnejad A, Günnemann S (2018) Adversarial attacks on neural networks for graph data. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and Data Mining. 2018, Association for Computing Machinery: London, United Kingdom. pp 2847–2856

Download references

Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).

Author information

Authors and affiliations.

Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt

Computer Science Department, Faculty of Science, Minia University, Minia, Egypt

Enas Elgeldawi & Mamdouh M. Gomaa

Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt

Aboul Ella Hassanien

Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt

Heba Aboul Ella

Faculty of Pharmacy, University of Sadat City, Sadat City, Menoufia, Egypt

Yaseen A. M. M. Elshaier


Contributions

Ask wrote the main text, HA wrote the digital twining part, EE wrote the deep learning part, YAMME wrote the data sets part, and MMG wrote the similarity part. AEH suggested the idea of the review and supervised the work.

Corresponding author

Correspondence to Aboul Ella Hassanien.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Askr, H., Elgeldawi, E., Aboul Ella, H. et al. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 56, 5975–6037 (2023). https://doi.org/10.1007/s10462-022-10306-1


Accepted: 24 October 2022

Published: 17 November 2022

Issue Date: July 2023

DOI: https://doi.org/10.1007/s10462-022-10306-1


  • Drug discovery
  • Artificial intelligence
  • Deep learning
  • Drug–target interactions
  • Drug–drug similarity
  • Drug side-effects
  • Drug sensitivity and response
  • Drug dosing optimization
  • Explainable artificial intelligence
  • Digital twining

Design of experiments (DoE) in pharmaceutical development

Affiliations.

  • 1 Department of Pharmaceutical Technology, Faculty of Pharmacy, National and Kapodistrian University of Athens, Athens, Greece.
  • 2 Department of Pharmacy, University of Parma, Parma, Italy.
  • 3 PlumeStars s.r.l., Parma, Italy.
  • 4 Department of Life Sciences and Biotechnology, University of Ferrara, Ferrara, Italy.
  • PMID: 28166428
  • DOI: 10.1080/03639045.2017.1291672

At the beginning of the twentieth century, Sir Ronald Fisher introduced the concept of applying statistical analysis during the planning stages of research rather than at the end of experimentation. When statistical thinking is applied from the design phase, it enables quality to be built into the product by adopting Deming's profound knowledge approach, comprising system thinking, variation understanding, theory of knowledge, and psychology. The pharmaceutical industry was late in adopting these paradigms compared to other sectors. It focused heavily on blockbuster drugs, while formulation development was mainly performed by One Factor At a Time (OFAT) studies rather than by implementing Quality by Design (QbD) and modern engineering-based manufacturing methodologies. Among various mathematical modeling approaches, Design of Experiments (DoE) is extensively used for the implementation of QbD in both research and industrial settings. In QbD, product and process understanding is the key enabler of assuring quality in the final product. Knowledge is achieved by establishing models correlating the inputs with the outputs of the process. The mathematical relationships of the Critical Process Parameters (CPPs) and Critical Material Attributes (CMAs) with the Critical Quality Attributes (CQAs) define the design space. Consequently, process understanding is well assured and rationally leads to a final product meeting the Quality Target Product Profile (QTPP). This review illustrates the principles of quality theory through the work of major contributors, the evolution of the QbD approach, and the statistical toolset for its implementation. As such, DoE is presented in detail since it represents the first choice for rational pharmaceutical development.

Keywords: Experimental design; design space; factorial designs; mixture designs; pharmaceutical development; process knowledge; statistical thinking.
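The contrast between OFAT studies and factorial designs mentioned in the abstract can be illustrated with a minimal sketch. It builds a two-level full factorial design in coded units and estimates main effects as the difference between the mean response at the high and low level of each factor. The factor names, the toy response model, and its coefficients are assumptions made purely for illustration; they do not come from the review and are not real data.

```python
from itertools import product


def full_factorial(factors):
    """Return all 2^k runs of a two-level design in coded units (-1, +1)."""
    return [
        dict(zip(factors, levels))
        for levels in product((-1, 1), repeat=len(factors))
    ]


def main_effects(design, responses):
    """Main effect = mean response at +1 minus mean response at -1."""
    effects = {}
    for factor in design[0]:
        high = [y for run, y in zip(design, responses) if run[factor] == 1]
        low = [y for run, y in zip(design, responses) if run[factor] == -1]
        effects[factor] = sum(high) / len(high) - sum(low) / len(low)
    return effects


# Hypothetical process parameters (stand-ins for CPPs/CMAs).
factors = ["compression_force", "binder_fraction", "mixing_time"]
design = full_factorial(factors)


def simulated_cqa(run):
    # Assumed toy model of a quality attribute; mixing_time has no effect.
    return 80 + 5 * run["compression_force"] - 3 * run["binder_fraction"]


responses = [simulated_cqa(run) for run in design]
effects = main_effects(design, responses)

for name, value in effects.items():
    print(f"{name}: {value:+.1f}")
```

Because every factor is varied in every run, all eight runs contribute to each effect estimate, which is the efficiency argument usually made for factorial designs over OFAT. In coded units a main effect is twice the corresponding model coefficient, so the assumed coefficients 5 and -3 appear as effects of 10 and -6.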


Similar articles

  • Understanding pharmaceutical quality by design. Yu LX, Amidon G, Khan MA, Hoag SW, Polli J, Raju GK, Woodcock J. AAPS J. 2014 Jul;16(4):771-83. doi: 10.1208/s12248-014-9598-3. Epub 2014 May 23. PMID: 24854893. Free PMC article. Review.
  • Quality by Design: Concept to Applications. Swain S, Parhi R, Jena BR, Babu SM. Curr Drug Discov Technol. 2019;16(3):240-250. doi: 10.2174/1570163815666180308142016. PMID: 29521238. Review.
  • A quality by design study applied to an industrial pharmaceutical fluid bed granulation. Lourenço V, Lochmann D, Reich G, Menezes JC, Herdling T, Schewitz J. Eur J Pharm Biopharm. 2012 Jun;81(2):438-47. doi: 10.1016/j.ejpb.2012.03.003. Epub 2012 Mar 13. PMID: 22446063.
  • Development and in vitro evaluation of pH-independent release matrix tablet of weakly acidic drug valsartan using quality by design tools. Saydam M, Takka S. Drug Dev Ind Pharm. 2018 Dec;44(12):1905-1917. doi: 10.1080/03639045.2018.1496450. Epub 2018 Sep 10. PMID: 29969042.
  • QbD-guided pharmaceutical development of Pembrolizumab biosimilar candidate PSG-024 propelled to industry meeting primary requirements of comparability to Keytruda®. Jaffar-Aghaei M, Khanipour F, Maghsoudi A, Sarvestani R, Mohammadian M, Maleki M, Havasi F, Rahmani H, Karagah AH, Kazemali MR. Eur J Pharm Sci. 2022 Jun 1;173:106171. doi: 10.1016/j.ejps.2022.106171. Epub 2022 Apr 1. PMID: 35378209.
  • Development of Continuous Additive-Controlled MSMPR Crystallization by DoE-Based Batch Experiments. Stoffán GN, Lőrincz Z, Pusztai É, Madarász L, Tacsi K, Marosi G, Pataki H. Ind Eng Chem Res. 2024 Jul 23;63(31):13709-13722. doi: 10.1021/acs.iecr.4c01933. eCollection 2024 Aug 7. PMID: 39130744. Free PMC article.
  • From lab to industrial development of lipid nanocarriers using quality by design approach. Buya AB, Mahlangu P, Witika BA. Int J Pharm X. 2024 Jul 1;8:100266. doi: 10.1016/j.ijpx.2024.100266. eCollection 2024 Dec. PMID: 39050378. Free PMC article. Review.
  • Quality by Design Perspective for Designing Foam-based Formulation: Current State of Art. Kumar M, Kumar D, Singh S, Chopra S, Mahmood S, Bhatia A. Curr Pharm Des. 2024;30(6):410-419. doi: 10.2174/0113816128289965240123074111. PMID: 38747045. Review.
  • Response surface methodology for process optimization in livestock wastewater treatment: A review. Reza A, Chen L, Mao X. Heliyon. 2024 Apr 24;10(9):e30326. doi: 10.1016/j.heliyon.2024.e30326. eCollection 2024 May 15. PMID: 38726140. Free PMC article.
  • Process Optimization of Tinospora cordifolia Extract-Loaded Water in Oil Nanoemulsion Developed by Ultrasound-Assisted Homogenization. Anjum V, Bagale U, Kadi A, Malinin A, Potoroko I, Alharbi AH, Khafaga DS, AlMetwally M, Qenawy AT, Anjum A, Ali F. Molecules. 2024 Apr 16;29(8):1797. doi: 10.3390/molecules29081797. PMID: 38675617. Free PMC article.


Chemistry and Pharmacology of Drug Discovery

ISBN: 978-1-394-22512-5

August 2024

Jie Jack Li

Case studies of 20 successful FDA-approved drugs, from biological rationale to clinical efficacy studies and state-of-the-art applications

Chemistry and Pharmacology of Drug Discovery illustrates how chemistry, biology, pharmacokinetics, and a host of disciplines come together to produce successful medicines, discussing a total of 20 drugs that are all FDA-approved post 2021—some of which are first-in-class and revolutionary.

The four sections in this book cover Infectious Disease, Cancer Drugs, CNS Drugs, and Miscellaneous Drugs. Each chapter covers background material on the drug class and/or disease indication and key aspects relevant to the discovery of the drug, including structure-activity relationships, pharmacokinetics, drug metabolism, efficacy, and safety.

The book's chapters are written by veterans and well-known experts in medicinal chemistry, many of whom discovered the drugs they review, which gives the book considerable quality and depth of insight.

Some of the drugs covered in Chemistry and Pharmacology of Drug Discovery include:

  • Nirmatrelvir (Paxlovid with Ritonavir), a 3-chymotrypsin-like protease inhibitor for treating SARS-CoV-2 infection
  • Doravirine (Pifeltro), a third-generation non-nucleoside reverse transcriptase inhibitor for the treatment of HIV-1 infection
  • Oteseconazole (Vivjoa), a CYP51 inhibitor for treating recurrent vulvovaginal candidiasis, and Rimegepant (Nurtec ODT), a CGRP antagonist for treating migraine
  • Ciprofol (Cipepofol), a γ-aminobutyric acid receptor agonist for induction of anesthesia, and Ozanimod (Zeposia), an S1P receptor modulator for treating multiple sclerosis
  • Deucravacitinib (Sotyktu), a first-in-class deuterated TYK2 inhibitor for the treatment of plaque psoriasis

Chemistry and Pharmacology of Drug Discovery serves as an excellent and highly authoritative learning resource for medicinal, organic, synthetic, and process chemists as well as research scientists in lead optimization and process development.

Jie Jack Li, PhD is the CSO of GenHouse Bio. Previously, he was VP of Discovery Chemistry at ChemPartner, an Associate Professor of Chemistry at the University of San Francisco, and a Medicinal Chemist at Pfizer and Bristol-Myers Squibb. He has authored or edited over 30 books, many published by Wiley.

The Pharmaceutical Industry and the Future of Drug Development

  • January 2016
  • 2016(41):1-33

David Taylor at WCA Environment Ltd.

  • WCA Environment Ltd.

Abstract and Figures

Example of ''pay for delay'' mechanism.

Tufts CSDD

White Papers

Comprehensive Summary of Site Engagement Literature

(Spring 2024)

Much has been written in the literature (both peer-reviewed and trade journals) about the unprecedented operating challenges that the global community of investigative sites faces. This reference resource provides a comprehensive summary, with citations and links to these articles, organized by challenge category. Overall, it offers valuable insights to inform site engagement strategies and practices.

Access a copy of this White Paper Report 

Assessing Sponsor, Site, and Patient Receptivity to Retail Pharmacy Involvement in Clinical Trials

Benchmarking and Optimizing the Process for Adopting Innovations Supporting Clinical Trial Execution

(Fall 2022)

Although there are conceptual frameworks and an extensive body of work in the literature examining the barriers and challenges associated with adopting innovations that support clinical trial execution, there are no empirical data benchmarking the process. As part of a working group study, Tufts CSDD conducted 26 in-depth interviews followed by an online survey that yielded 631 responses from the global community of drug development professionals. The study found that the four stages of the innovation adoption process (Initiation, Evaluation, Adoption Decision, and Full Implementation) take 5.8 years on average, with mid-sized companies taking one year longer than large companies and nearly two years longer than small companies. CROs are able to complete the innovation adoption process in half the time. High variation around the mean duration was observed overall and by company size, showing highly inconsistent experience and the difficulty that companies face in navigating the process. The latter two stages of the process, Adoption Decision and Full Implementation, take the longest, are the most variable, and are regarded as the most difficult. Factors and approaches to accelerate the process and optimize innovation adoption are discussed.

Summary of themes from the 2021 Executive Roundtable Examining Experiences Implementing and Accommodating the ICH E6 (R2) Guidance

(June 2021)

In May 2021, the Tufts Center for the Study of Drug Development, in collaboration with and with funding from CluePoints and PwC, hosted a virtual roundtable of senior R&D executives from pharmaceutical and biotechnology companies and contract research organizations (CROs), along with a representative from the Food and Drug Administration. In total, 60 people participated in a virtual, facilitated discussion.

Drug Development Workforce in the Age of Digital Transformation

(June 2020)

Assessing the Financial Impact of Translational Pharmaceutics - A Platform for Accelerating Product Development

(October 2019)

ALPHA Project Unites Global Lupus Community on Barriers to Research, Drug Development, Care, and Access

(August 2019)

Summary Report From the Spring 2019 Executive Roundtable on Company Response to ICH E6 (R2)

(July 2019)

Manufacturing Strategy for Diverse Biologic Pipelines of the Future

(October 2017)

Assessing the Economics of Single-Source vs. Multi-Vendor Manufacturing

Profiles of New Approaches to Improving the Efficiency and Performance of Pharmaceutical Drug Development

Public and Private Sector Contributions to the Research & Development of the Most Transformational Drugs of the Last 25 Years

(January 2015)

Industry Usage of Social and Digital Media Communities in Clinical Research

(June 2014)

A New Tool for Predicting Marketing Approval of Oncology Drugs

(December 2013)

The Adoption and Impact of Adaptive Trial Designs

Academic-Industry Partnerships for Biopharmaceutical Research & Development: Advancing Medical Science in the U.S.

(April 2012)

U.S. Food and Drug Administration

Artificial Intelligence and Machine Learning (AI/ML) for Drug Development

What is artificial intelligence and machine learning?

Artificial Intelligence (AI) and Machine Learning (ML) can be described as a branch of computer science, statistics, and engineering that uses algorithms or models to perform tasks and exhibit behaviors such as learning, making decisions, and making predictions. ML is considered a subset of AI that allows models to be developed by training algorithms through analysis of data, without models being explicitly programmed.
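
As a loose illustration of what "training algorithms through analysis of data, without models being explicitly programmed" can mean, the sketch below fits a straight line to a handful of made-up points by ordinary least squares. The data, the function names `fit_line` and `predict`, and the choice of a linear model are all assumptions made for this example; none of them comes from FDA.

```python
# Hypothetical toy data: (x, y) pairs that happen to follow y = 2x + 1.
# The rule "2x + 1" is never written into the model; it is recovered from data.
training_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


model = fit_line(training_data)
print(model, predict(model, 10.0))
```

The point of the sketch is only that the parameters are estimated from examples; real AI/ML components in regulatory submissions are far more complex than this.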

What role is AI/ML playing in drug development?

FDA recognizes the increased use of AI/ML throughout the drug development life cycle and across a range of therapeutic areas. In fact, FDA has seen a significant increase in the number of drug and biologic application submissions using AI/ML components over the past few years, with more than 100 submissions reported in 2021. These submissions traverse the landscape of drug development — from drug discovery and clinical research to postmarket safety surveillance and advanced pharmaceutical manufacturing.  

Additionally, AI/ML is increasingly integrated in areas where FDA is actively engaged, including Digital Health Technologies (DHTs) and Real-World Data (RWD) analytics.

What is FDA’s perspective on the use of AI/ML in drug development?

FDA is committed to ensuring that drugs are safe and effective while facilitating innovations in their development. As with any innovation, AI/ML creates opportunities and new and unique challenges. To meet these challenges, FDA has accelerated its efforts to create an agile regulatory ecosystem that can facilitate innovation while safeguarding public health.

As part of this effort, FDA’s Center for Drug Evaluation and Research (CDER), in collaboration with the Center for Biologics Evaluation and Research (CBER) and the Center for Devices and Radiological Health (CDRH), issued an initial discussion paper to communicate with a range of stakeholders and to explore relevant considerations for the use of AI/ML in the development of drugs and biological products. The agency will continue to solicit feedback as it advances regulatory science in this area.

AI/ML will undoubtedly play a critical role in drug development, and FDA plans to develop and adopt a flexible risk-based regulatory framework that promotes innovation and protects patient safety.

Related Publications

  • AI/ML for Drug Development Discussion Paper (PDF - 1 MB)
  • Artificial Intelligence in Drug Manufacturing  (PDF - 2 MB)
  • Distributed Manufacturing and Point-of-Care Manufacturing of Drugs  (PDF - 1 MB)
  • For specific AI/ML publications related to devices, please visit CDRH’s Digital Health Center of Excellence webpage

Contact Information

For more information about AI/ML for Drug Development, please email [email protected]

EDITORIAL article

Editorial: Supporting pediatric drug development: from basic research to clinical studies and technological advancements

Nikolas Dietis

  • 1 School of Medicine, University of Nicosia, Nicosia, Nicosia, Cyprus
  • 2 Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Department of Development and Regeneration, Faculty of Medicine, KU Leuven, Leuven, Belgium
  • 3 Department of Pharmacy, Erasmus Medical Center, Rotterdam, Netherlands
  • 4 Clinical Research Center, Antwerp University Hospitals, Antwerp, Belgium
  • 5 Fondazione per la Ricerca Farmacologica 'Gianni Benzi', Bari, Italy

The final, formatted version of the article will be published soon.

Keywords: Pediatric drug development, Children, Pediatric treatment, EPTRI, preclinical pediatric research

Received: 07 Aug 2024; Accepted: 12 Aug 2024.

Copyright: © 2024 Dietis, Allegaert, Smits and Ceci. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Nikolas Dietis, School of Medicine, University of Nicosia, Nicosia, 2408, Nicosia, Cyprus; Karel Allegaert, Department of Pharmaceutical and Pharmacological Sciences, KU Leuven, Department of Development and Regeneration, Faculty of Medicine, KU Leuven, Leuven, Belgium

Disclaimer: All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

What is Pharmaceutical Research and Development?

Marzia Khan

Introduction to pharmaceutical R&D

In 2022, the combined research and development (R&D) expenditure of the U.S. pharmaceutical industry exceeded $100 billion. Despite this scale of investment, fewer than 30 percent of new drugs progress successfully from Phase II to Phase III clinical trials. 1

Developing new drugs from the discovery stage to the launching of the product into the market is a lengthy and complex process, taking approximately 12-15 years and costing over $1 billion. 2

Pharmaceutical R&D plays an important role in discovering new or improved methods to optimize therapeutic approaches and address unmet medical needs. This improves overall public health and helps healthcare professionals provide better care for their patients. 3 This article explores the role of pharmaceutical R&D further.

Image Credit: elenabsl/Shutterstock.com

What is pharmaceutical R&D?

Pharmaceutical research, which involves drug development, aims to investigate drug interaction with living organisms. Some significant pharmaceutical products developed in the past include insulin and penicillin in the 1920s and 1930s, which greatly improved population health. 4

Historically, drugs were found either by isolating the active ingredient from conventional remedies or by discovering a therapeutic effect by chance. The second method yields only a few potential drug candidates that reach the open market. 4

With the growth of this sector and advanced processes to expedite finding potential drug candidates, regulatory oversight was required to reduce the occurrence of life-threatening incidents from the development and use of pharmaceutical drugs. 4

The pharmaceutical R&D industry began to grow substantially in the 1990s, with various partners being involved in the process, including the National Institutes of Health (NIH) in the United States and academia, which furthered the research stages of drug development into a more rigorous process over the subsequent years. 4

Key stages of pharmaceutical R&D

Discovery phase

Drug development starts with the discovery phase, with new knowledge of a biological target such as a receptor, enzyme, protein, or gene that is involved with a biological process that may not be working effectively in patients. 5

This phase encompasses the discovery of a new medicine with a mode of action that differs from that of already approved medicines and that targets a different clinical indication. New medications may also offer benefits over existing medications, such as improved safety, potency, tolerability, or convenience. 5

At this early stage of drug discovery, researchers compile thousands of compounds that may be potential drug candidates for medical treatment. However, after early testing of these compounds, only a small number are deemed promising enough for further examination. 6

Target identification and validation uses genetic, cellular, and experimental models to confirm the biological targets that researchers have identified. This is a significant step in drug development because it reduces the risk of producing an ineffective drug. Drugs fail in the clinic for two main reasons: (i) they do not work, or (ii) they are unsafe. 2

A good target is required to be effective, safe, meet clinical and commercial needs, and be ‘druggable,’ meaning it causes a biological response that can be measured both in vitro (a laboratory test performed outside of the body 7 ) and in vivo (a laboratory test performed on whole living organisms, i.e., in the body 8 ). 2

Once a potential drug candidate has been found, it will progress onto the next stage of pre-clinical research for further testing. 2

Preclinical research

Prior to drug testing in humans, researchers are required to investigate a drug’s potential to cause serious harm or toxicity through preclinical research, including in vitro and in vivo testing. 9

In vitro cell studies are typically the first tests performed on a potential drug candidate to investigate its efficacy. Researchers investigate the effects of the new drug on human or animal cells that are grown in a petri dish or test tube. 10

Image Credit: redgreystock/Shutterstock.com

After evaluating which treatments work well in cell studies, researchers perform in vivo testing in live animals, which provides information about the safety of the new treatment in a living organism. 10

Clinical trials

Clinical trials are a significant part of pharmaceutical R&D. Each trial is conducted with specific questions and objectives that must be answered before a drug can be approved by the Food and Drug Administration (FDA). 10

There are four phases of clinical trials: Phase I, II, III, and IV. 10

Phase I clinical trials aim to investigate the safety of a new drug treatment. Within this phase, researchers identify the highest safe dose that can be administered to patients without severe side effects. Although Phase I trials carry the highest potential risk for patients, they can help some patients who are facing life-threatening diseases. 10,11

Phase II clinical trials aim to investigate the efficacy and side effects of the drug in a group of 25 to 100 patients. When developing a cancer treatment, this phase assesses if the treatment works in certain types of cancers. In Phase II trials, everyone typically receives the same dosage. However, in some trials, patients may be randomly assigned to different treatment groups with different dosages to assess the best balance of safety and response. 10,11

Phase III clinical trials investigate whether the new drug treatment is better than drugs that are already available. This phase compares the safety and efficacy of a new drug treatment against a placebo or a current standard of care treatment. Ethically, while placebos can be used in some phase III trials, they are never used alone if there is an effective standard of care treatment. 10,11

Randomization can be used to assign patients to either the new drug group or the standard of care treatment that is already approved. A double-blind design can also be used so that neither the doctors nor the patients know which treatment is being given. 10
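
The combination of randomization and blinding described above can be sketched in a few lines of code. This is a minimal illustration, not a description of how any particular trial is run: the permuted-block scheme, the arm names, the `KIT-` code format, and the function name `randomize` are all assumptions introduced here.

```python
import random

# Hypothetical arm labels for a two-arm Phase III comparison.
ARMS = ("new_drug", "standard_of_care")


def randomize(participant_ids, seed, block_size=4):
    """Assign participants to arms using permuted blocks (an assumed scheme).

    Returns two dicts:
      blinded: participant -> opaque kit code (all that doctors and patients see)
      key:     kit code -> arm (withheld until the trial is unblinded)
    """
    if block_size % len(ARMS) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded so the allocation list is reproducible
    blinded, key = {}, {}
    block = []
    for index, pid in enumerate(participant_ids):
        if not block:
            # Each block holds every arm equally often, in shuffled order.
            block = list(ARMS) * (block_size // len(ARMS))
            rng.shuffle(block)
        kit = f"KIT-{index + 1:04d}"
        blinded[pid] = kit
        key[kit] = block.pop()
    return blinded, key


participants = [f"P{i:03d}" for i in range(8)]
blinded, key = randomize(participants, seed=2024)
print(blinded["P000"])  # an opaque code; the arm is visible only through `key`
```

Keeping the participant-to-code map separate from the code-to-arm key is what makes the design double-blind in this sketch: site staff work only with `blinded`, while `key` stays with whoever manages unblinding. Permuted blocks are one common way to keep the arms balanced as enrollment proceeds.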

After Phase III clinical trials, regulatory agencies such as the US FDA and the European Medicines Agency (EMA) assess new drug treatments with high efficacy or safety results. These agencies play an integral role in introducing new drugs into clinical practice worldwide. 12

Both regulatory agencies have assessment criteria that a drug is required to meet before approval is granted, such as investigating a drug’s ability to address an unmet medical need, having meaningful benefits over available drugs on the market, and benefiting public health. 12

Phase IV clinical trials can be described as post-marketing surveillance trials that aim to assess drugs that have already received regulatory approval after Phase III trials. 13

These trials evaluate how a drug works over a longer period while being available on the market, with doctors prescribing the drug to a wider population of patients. As a result, researchers can understand the full effect of the drug and gain a holistic view of its wider impact. 10

Phase IV studies may also investigate other factors that affect treatment, including quality of life or cost effectiveness. 10

Conclusion

Pharmaceutical R&D encompasses all stages of drug discovery and development and plays a significant role in how novel drug treatments are made available on the market for the benefit of public health. 3

From target identification and validation to preclinical trials, clinical trials, and post-market surveillance, pharmaceutical R&D provides due diligence and evidence for the efficacy and safety of innovative treatments before and after regulatory approval. 3,10

References

  1. Pharmaceutical research and development (R&D). Statista. https://www.statista.com/topics/6755/pharmaceutical-research-and-development-randd/#topicOverview
  2. Hughes J, Rees S, Kalindjian S, Philpott K. Principles of Early Drug Discovery. British Journal of Pharmacology. 2011;162(6):1239-1249. doi: https://doi.org/10.1111/j.1476-5381.2010.01127.x
  3. Chandra A, Drum J, Daly M, et al. Comprehensive measurement of biopharmaceutical R&D investment. Nature Reviews Drug Discovery. Published online August 6, 2024. doi: https://doi.org/10.1038/d41573-024-00131-2
  4. Wang ML. The Modern Pharmaceutical Industry: History, Current Position and Challenges. Global Health Partnerships. Published online 2009:33-80. doi: https://doi.org/10.1057/9780230582873_2
  5. Mohs RC, Greig NH. Drug discovery and development: Role of basic biological research. Alzheimer’s & Dementia: Translational Research & Clinical Interventions. 2017;3(4):651-657. doi: https://doi.org/10.1016/j.trci.2017.10.005
  6. FDA. Discovery and Development. U.S. Food and Drug Administration. Published January 4, 2018. https://www.fda.gov/patients/drug-development-process/step-1-discovery-and-development
  7. NCI Dictionaries. In Vitro. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/in-vitro. Published February 2, 2011.
  8. NCI Dictionaries. In Vivo. https://www.cancer.gov/publications/dictionaries/cancer-terms/def/in-vivo. Published February 2, 2011.
  9. FDA. Step 2: Preclinical Research. U.S. Food and Drug Administration. Published 2018. https://www.fda.gov/patients/drug-development-process/step-2-preclinical-research
  10. American Cancer Society. Types and Phases of Clinical Trials | What Are Clinical Trial Phases? www.cancer.org. Published August 18, 2020. https://www.cancer.org/cancer/managing-cancer/making-treatment-decisions/clinical-trials/what-you-need-to-know/phases-of-clinical-trials.html
  11. FDA. Step 3: Clinical Research. U.S. Food and Drug Administration. Published January 4, 2018. https://www.fda.gov/patients/drug-development-process/step-3-clinical-research
  12. Hwang TJ, Ross JS, Vokinger KN, Kesselheim AS. Association between FDA and EMA expedited approval programs and therapeutic value of new medicines: retrospective cohort study. BMJ. Published online October 7, 2020:m3434. doi: https://doi.org/10.1136/bmj.m3434
  13. Fountzilas E, Tsimberidou AM, Vo HH, Kurzrock R. Clinical trial design in the era of precision medicine. Genome Medicine. 2022;14(1). doi: https://doi.org/10.1186/s13073-022-01102-1

Last Updated: Aug 12, 2024

Marzia Khan

Marzia Khan is a lover of scientific research and innovation. She immerses herself in literature and novel therapeutics through her position on the Royal Free Ethical Review Board. Marzia has an MSc in Nanotechnology and Regenerative Medicine as well as a BSc in Biomedical Sciences. She is currently working in the NHS and is engaging in a scientific innovation program.

Please use one of the following formats to cite this article in your essay, paper or report:

Khan, Marzia. (2024, August 12). What is Pharmaceutical Research and Development?. News-Medical. Retrieved on August 14, 2024 from https://www.news-medical.net/life-sciences/What-is-Pharmaceutical-Research-and-Development.aspx.

Khan, Marzia. "What is Pharmaceutical Research and Development?". News-Medical . 14 August 2024. <https://www.news-medical.net/life-sciences/What-is-Pharmaceutical-Research-and-Development.aspx>.

Khan, Marzia. "What is Pharmaceutical Research and Development?". News-Medical. https://www.news-medical.net/life-sciences/What-is-Pharmaceutical-Research-and-Development.aspx. (accessed August 14, 2024).

Khan, Marzia. 2024. What is Pharmaceutical Research and Development? . News-Medical, viewed 14 August 2024, https://www.news-medical.net/life-sciences/What-is-Pharmaceutical-Research-and-Development.aspx.

Clinical Drug Development

Learning about the process of clinical drug development has important implications for anyone working in health care and related sectors.

Associated Schools

Harvard Medical School

What you'll learn.

Learn some of the key epidemiological considerations that need to be taken into account when designing and conducting a clinical trial

Understand how a clinical trial is designed, carried out, interpreted, and presented

Learn about the process of drug review and the regulatory steps that are taken to move a drug from clinical drug development to the drug market for use in patients

Course description

Clinical drug development has evolved dramatically over time and the field continues to advance today. The methodology of clinical trials has evolved with new techniques being introduced. The design and statistical aspects have become more sophisticated in parallel with the increased use of technology and the introduction of new drug modalities. The rigor and discipline of this process mean that people can trust that the medicines they take are likely to be safe and effective and that beneficial treatments can make their way through the process in a safe and timely manner.

Learning about the process of clinical drug development has important implications for anyone working in health care and related sectors. This advanced course offers a unique way for professionals to learn from leading Harvard Medical School faculty about how drugs are developed clinically and about the advances happening in this field that are ultimately helping to improve the treatment and prevention of disease.

How AI Can Innovate Breast Cancer Drug Development, Clinical Trials, and Data Analytics

The final installment of BCRF’s series on AI in breast cancer focuses on how it can bring new treatments to patients—faster

Artificial Intelligence (AI) is already having a transformative impact in healthcare, including on breast cancer detection and diagnosis , risk assessment, and prognosis . And there is much excitement building around AI’s potential role in drug discovery and clinical trials.

In part 4 of BCRF’s series on AI in breast cancer, we delve into how it’s reshaping drug discovery, patient matching to clinical trials, and data gathering and analysis (big data mining).

The protein-folding problem in drug development

Traditional drug development is a slow process. Case in point: It took 14 years from the time the HER2 protein was discovered to the moment the FDA approved the lifesaving targeted therapy Herceptin for patients. That’s because there are several steps involved: identification of a target protein that’s specific for the disease, screening for drugs that bind the protein to identify the best candidates, testing and optimization of lead candidate(s) in laboratory cells and models, and clinical trial testing in humans. All of this takes tremendous amounts of time, manpower, and funding.

Central to drug discovery and development is the fact that researchers need to determine the exact 3D structure of the protein target so that drugs can be designed to fit into a specific protein pocket. But predicting a 3D structure of a protein is not an easy task, and in fact, researchers have dubbed this the “protein-folding problem.” How does a string of amino acids that make up the building blocks of a protein also form the complex 3D structure of a functional protein? It is analogous to the letters of the alphabet creating words, sentences, and a coherent novel—all by themselves. The complexity of the task makes traditional experimental methods complicated and time-consuming, taking as much as several years to solve a single protein’s structure.  

AlphaFold redefines drug discovery

Enter AlphaFold, an AI deep learning system developed by DeepMind, a subsidiary of Google. AlphaFold was trained on mountains of data from prior experiments that successfully defined protein structures and uses this information to help determine the 3D structures of yet-unsolved proteins.

In 2022, AlphaFold deciphered the 3D structures of 200 million known proteins in just one year . Imagine how this technology would have accelerated the development of HER2-targeted drugs and, moving forward, the avenues it will open for future breast cancer drugs.

DeepMind made this data open source, so it’s freely available to the research community and pharmaceutical companies at large. In 2023, DeepMind’s Demis Hassabis and John Jumper received the Lasker Award, often called the “American Nobel,” for their solution to the protein-folding problem.

“Thanks to AI, new medical breakthroughs may one day move from years to mere months,” Demis Hassabis, DeepMind’s co-founder and CEO, said in a recent TED Talk.

The first application of AlphaFold in oncology sought to unravel the structure of CDK20, a newly identified protein target that is important in liver cancer. Remarkably, in only 30 days, scientists at Insilico Medicine, an AI drug discovery company, and their collaborators used AlphaFold to obtain CDK20’s protein structure. Using a fully automated AI machine learning program, they generated approximately 9,000 molecules that could target CDK20. From these, seven were selected for synthesis and biological testing, with one lead CDK20 small molecule inhibitor found to be active in liver cancer laboratory models.

This is one example of how scientists are using AI to fuel drug discovery. The ultimate test will be whether the combination of AlphaFold and the pharmaceutical industry’s AI-intensive strategies leads to successful FDA approvals in oncology. If this type of work continues, we can shave years off the early steps in the process and speed up drug development in the future.

Only time will tell if AI advances lead to improvements in patient outcomes, but the future of AI-based drug development is looking bright. Knowing the potential importance of AI to accelerate drug development, BCRF is committed to funding novel AI research in breast cancer and, moving forward, plans to fund more innovative AI research.

How AI can improve clinical trials

Patient screening is a key component of clinical research that determines if people are eligible to participate in specific trials. That process is labor intensive, time consuming, and subject to human errors. A clinical research team must scour patient medical records and laboratory results to determine who fits a given trial’s long inclusion and exclusion criteria. AI solutions that can sift through large amounts of information like this and determine who is eligible for specific clinical trials would be invaluable.

In fact, these practices have already begun to take shape in oncology. A team of researchers compared AI to the standard method of clinical trial screening by reviewing data from three oncology clinical trials. They found that AI-assisted screening led to a 24 to 50 percent increase in the number of patients that were correctly identified as potential participants compared to standard practice. Further, no patients correctly identified by the standard practice were missed by AI. Most strikingly, compared to standard practices that can take many days to screen patients, AI took only minutes. The ability to shorten research timelines will ultimately benefit more patients.

How AI provides insights for clinical decision-making

AI is also leveraged in big data mining, the process of sifting through mounds of information to find meaningful nuggets to achieve a given goal. In oncology, big data mining allows researchers to efficiently assess changes in the genome that could be associated with disease. Building on human genome sequencing, genomic profiling blossomed as an important area in cancer biology. But the amount of information generated can be daunting to analyze, so bioinformatic scientists have developed ways to use AI-mediated big data mining to analyze mountains of data for key patterns or trends.

For example, diagnostic companies like Caris and Tempus are using state-of-the-art AI algorithms to examine patient molecular data for biomarkers that could predict a tumor’s response or chance of developing resistance. The results could inform clinical trials to validate biomarkers or inform how cancer care teams make clinical decisions and tailor personalized treatments.

Moving forward with AI

As we continue to harness AI in drug discovery and clinical trials, it’s paramount to consider what this means for patients. Testing, transparent protocols, and ongoing oversight to prevent any unintended consequences are needed, particularly for underserved communities.

Nevertheless, the integration of AI into oncology will accelerate progress and improve outcomes for patients with breast cancer. Indeed, AI is poised to revolutionize the field as it may be leveraged to speed up the drug development process, increase the availability of innovative and more effective treatments, and enable greater participation in clinical trials to test them—thereby personalizing care for patients worldwide. BCRF will remain at the forefront of AI advances by funding innovative projects that can unlock its potential and help move the needle in the fight against breast cancer.

Maadi, H., Soheilifar, M. H., Choi, W., Moshtaghian, A., & Wang, Z. (2021). Trastuzumab mechanism of action; 20 years of research to unravel a dilemma. Cancers , 13 (14), 3540. https://doi.org/10.3390/cancers13143540

Ren, F., Ding, X., Zheng, M., Korzinkin, M., Cai, X., Zhu, W., Mantsyzov, A., Aliper, A., Aladinskiy, V., Cao, Z., Kong, S., Long, X., Liu, B. H. M., Liu, Y., Naumov, V., Shneyderman, A., Ozerov, I. V., Wang, J., Pun, F. W., . . . Zhavoronkov, A. (2023). AlphaFold accelerates artificial intelligence powered drug discovery: efficient discovery of a novel CDK20 small molecule inhibitor. Chemical Science , 14 (6), 1443–1452. https://doi.org/10.1039/d2sc05709c

TED. (2024, April 29). How AI is unlocking the secrets of nature and the universe | Demis Hassabis | TED [Video]. YouTube. https://www.youtube.com/watch?v=0_M_syPuFos

AI and Cancer . (2024, May 30). Cancer.gov. https://www.cancer.gov/research/infrastructure/artificial-intelligence

Calaprice-Whitty, D., Galil, K., Salloum, W., Zariv, A., & Jimenez, B. (2020). Improving Clinical Trial Participant Prescreening With Artificial Intelligence (AI): A Comparison of the Results of AI-Assisted vs Standard Methods in 3 Oncology Trials. Therapeutic Innovation & Regulatory Science , 54 (1), 69–74. https://doi.org/10.1007/s43441-019-00030-4

Chua, I. S., Gaziel-Yablowitz, M., Korach, Z. T., Kehl, K. L., Levitan, N. A., Arriaga, Y. E., Jackson, G. P., Bates, D. W., & Hassett, M. (2021). Artificial intelligence in oncology: Path to implementation. Cancer Medicine , 10 (12), 4138–4149. https://doi.org/10.1002/cam4.3935

Eli Lilly’s billions: Can the world’s most valuable pharma company keep inventing drugs at this pace?

The best-selling medicines in history have often left a shadow of research failure behind them.

Photo: The new Lilly Seaport Innovation Center, located at 15 Necco Street on the Fort Point Channel.

By Matthew Herper

Aug. 14, 2024

Senior Writer, Medicine, Editorial Director of Events

On Tuesday, Eli Lilly unveiled a towering 346,000-square-foot laboratory in Boston’s Seaport District, a building that will house 500 of the company’s scientists — 300 of whom are still to be hired — who will focus on medicines that work by exploiting the basic mechanisms of human genetics. Another 200 people will be part of companies Lilly will incubate.

It’s a sizeable and conspicuous bet. It’s also the latest attempt to deal with one of the biggest challenges in drug development: What should a company do when it wins big?

Victory is hardly defeat. But inventing a mega-blockbuster — in this case novel drugs that treat obesity and diabetes — brings its own set of issues. The best-selling medicines in history have often left a shadow of research failure behind them, even when their makers shoveled cash into R&D as they never had before. There are many reasons for this. Distraction, because selling the biggest medicines in the world is distracting. Overconfidence, because big successes make executives take bets that are too risky. And bad luck, because developing drugs is both one of the most scientifically based activities human beings engage in and one of those most subject to random chance.

Matthew Herper covers medical innovation — both its promise and its perils.

  • 14 August 2024

Has your paper been used to train an AI model? Almost certainly

  • Elizabeth Gibney

Academic publisher Wiley has sold access to its research papers to firms developing large language models. Credit: Timon Schneider/Alamy

Academic publishers are selling access to research papers to technology firms to train artificial-intelligence (AI) models. Some researchers have reacted with dismay at such deals happening without the consultation of authors. The trend is raising questions about the use of published and sometimes copyrighted work to train the exploding number of AI chatbots in development.

Experts say that, if a research paper hasn’t yet been used to train a large language model (LLM), it probably will be soon. Researchers are exploring technical ways for authors to spot whether their content is being used.

Last month, it emerged that the UK academic publisher Taylor & Francis had signed a US$10-million deal with Microsoft, allowing the US technology company to access the publisher’s data to improve its AI systems. And in June, an investor update showed that US publisher Wiley had earned $23 million from allowing an unnamed company to train generative-AI models on its content.

Anything that is available to read online — whether in an open-access repository or not — is “pretty likely” to have been fed into an LLM already, says Lucy Lu Wang, an AI researcher at the University of Washington in Seattle. “And if a paper has already been used as training data in a model, there’s no way to remove that paper after the model has been trained,” she adds.

Massive data sets

LLMs train on huge volumes of data, frequently scraped from the Internet. They derive patterns between the often billions of snippets of language in the training data, known as tokens, that allow them to generate text with uncanny fluency.

Generative-AI models rely on absorbing patterns from these swathes of data to output text, images or computer code. Academic papers are valuable for LLM builders owing to their length and “high information density”, says Stefan Baack, who analyses AI training data sets at the Mozilla Foundation, a global non-profit organization in San Francisco, California that aims to keep the Internet open for all to access.

Training models on a large body of scientific information also gives them a much better ability to reason about scientific topics, says Wang, who co-created S2ORC, a data set based on 81.1 million academic papers. The data set was originally developed for text mining (applying analytical techniques to find patterns in data) but has since been used to train LLMs.

The trend of buying high-quality data sets is growing. This year, the Financial Times has offered its content to ChatGPT developer OpenAI in a lucrative deal, as has the online forum Reddit, to Google. And given that scientific publishers probably view the alternative as their work being scraped without an agreement, “I think there will be more of these deals to come,” says Wang.

Information secrets

Some AI developers, such as the Large-scale Artificial Intelligence Open Network, intentionally keep their data sets open, but many firms developing generative-AI models have kept much of their training data secret, says Baack. “We have no idea what is in there,” he says. Open-source repositories such as arXiv and PubMed, the scholarly database of abstracts, are thought to be “very popular” sources, he says, although paywalled journal articles probably have their free-to-read abstracts scraped by big technology firms. “They are always on the hunt for that kind of stuff,” he adds.

Proving that an LLM has used any individual paper is difficult, says Yves-Alexandre de Montjoye, a computer scientist at Imperial College London. One way is to prompt the model with an unusual sentence from a text and see whether the output matches the next words in the original. If it does, that is good evidence that the paper is in the training set. But if it doesn’t, that doesn’t mean that the paper wasn’t used — not least because developers can code the LLM to filter responses to ensure they don’t match training data too closely. “It takes a lot for this to work,” he says.
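The prompting check described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the whitespace tokenization, and the eight-word threshold are hypothetical choices, and the model's output is passed in as a plain string rather than fetched from any real LLM API.

```python
def matching_prefix_len(generated: str, original: str) -> int:
    """Count how many leading whitespace-separated words agree exactly."""
    count = 0
    for gen_word, orig_word in zip(generated.split(), original.split()):
        if gen_word != orig_word:
            break
        count += 1
    return count


def looks_memorized(generated: str, original: str, min_words: int = 8) -> bool:
    """Flag a continuation that reproduces at least `min_words` words verbatim.

    `min_words` is an illustrative threshold, not a published value.
    A False result does not show the text was absent from training data.
    """
    return matching_prefix_len(generated, original) >= min_words
```

As the article notes, a match is suggestive evidence, while a mismatch proves nothing, because developers can filter outputs that follow training data too closely.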

Another method to check whether data are in a training set is known as membership inference attack. This relies on the idea that a model will be more confident about its output when it is seeing something that it has seen before. De Montjoye’s team has developed a version of this, called a copyright trap, for LLMs.

To set the trap, the team generates sentences that look plausible but are nonsense, and hides them in a body of work, for example as white text on a white background or in a field that’s displayed as zero width on a webpage. If an LLM is more ‘surprised’ — a measure known as its perplexity — by an unused control sentence than it is by the one hidden in the text, “that is statistical evidence that the traps were seen before”, he says.
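A rough sketch of the perplexity comparison behind such a trap follows. It assumes the per-token natural-log probabilities for each sentence have already been obtained from the model under test; the function names and the ratio threshold are hypothetical, and a real study would compare many trap and control sentences with a proper statistical test rather than a single ratio.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp of the mean negative log-probability per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def trap_looks_seen(trap_logprobs: list[float],
                    control_logprobs: list[float],
                    ratio: float = 1.5) -> bool:
    """Return True if the control sentence 'surprises' the model much more
    than the trap sentence does. `ratio` is an illustrative cutoff."""
    return perplexity(control_logprobs) / perplexity(trap_logprobs) >= ratio
```

Lower perplexity on the hidden trap sentence than on an unused control is the signal of interest; the size of the gap needed to count as evidence is a design choice this sketch does not settle.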

Copyright questions

Even if it were possible to prove that an LLM has been trained on a certain text, it is not clear what happens next. Publishers maintain that, if developers use copyrighted text in training and have not sought a licence, that counts as infringement. But a counter legal argument says that LLMs do not copy anything — they harvest information content from training data, which gets broken up, and use their learning to generate new text.

Litigation might help to resolve this. In an ongoing US copyright case that could be precedent-setting, The New York Times is suing Microsoft and ChatGPT’s developer OpenAI in San Francisco, California. The newspaper accuses the firms of using its journalistic content to train their models without permission.

Many academics are happy to have their work included in LLM training data, especially if that makes the models more accurate. “I personally don’t mind if I have a chatbot who writes in the style of me,” says Baack. But he acknowledges that his job is not threatened by LLM outputs in the way that those of other professions, such as artists and writers, are.

Individual scientific authors currently have little power if the publisher of their paper decides to sell access to their copyrighted works. For publicly available articles, there is no established means to apportion credit or know whether a text has been used.

Some researchers, including de Montjoye, are frustrated. “We want LLMs, but we still want something that is fair, and I think we’ve not invented what this looks like yet,” he says.

doi: https://doi.org/10.1038/d41586-024-02599-9

Artificial intelligence in drug discovery and development

  • Artificial Intelligence (AI) has revolutionized many aspects of the pharmaceutical sector.
  • AI assistance to pharma industries helps to improve the overall life cycle of a product.
  • AI can be implemented in pharma ranging from drug discovery to product management.
  • Future challenges related to AI and their respective solutions have been expounded.

Artificial Intelligence (AI) has recently begun to scale up its application in various sectors of society, with the pharmaceutical industry as a front-runner beneficiary. This review highlights the impactful use of AI in diverse areas of the pharmaceutical sector, including drug discovery and development, drug repurposing, improving pharmaceutical productivity, and clinical trials, thus reducing the human workload and helping to achieve targets in a short period. The tools and techniques utilized in enforcing AI, ongoing challenges, and ways to overcome them, along with the future of AI in the pharmaceutical industry, are also discussed.

Artificial intelligence-integrated drug discovery and development has accelerated the growth of the pharmaceutical sector, leading to a revolutionary change in the pharma industry. Here, we discuss areas of integration, tools, and techniques utilized in enforcing AI, ongoing challenges, and ways to overcome them.

The use of artificial intelligence (AI) has been increasing in various sectors of society, particularly the pharmaceutical industry. In this review, we highlight the use of AI in diverse sectors of the pharmaceutical industry, including drug discovery and development, drug repurposing, improving pharmaceutical productivity, and clinical trials, among others; such use reduces the human workload and helps to achieve targets in a short period of time. We also discuss crosstalk between the tools and techniques utilized in AI, ongoing challenges, and ways to overcome them, along with the future of AI in the pharmaceutical industry.

Artificial intelligence: things to know

Over the past few years, there has been a drastic increase in data digitalization in the pharmaceutical sector. However, this digitalization comes with the challenge of acquiring, scrutinizing, and applying that knowledge to solve complex clinical problems [1]. This motivates the use of AI, because it can handle large volumes of data with enhanced automation [2]. AI is a technology-based system involving various advanced tools and networks that can mimic human intelligence. At the same time, it does not threaten to replace human physical presence completely [3,4]. AI utilizes systems and software that can interpret and learn from the input data to make independent decisions for accomplishing specific objectives. Its applications are continuously being extended in the pharmaceutical field, as described in this review. According to the McKinsey Global Institute, the rapid advances in AI-guided automation are likely to completely change the work culture of society [5,6].

AI: networks and tools

AI involves several method domains, such as reasoning, knowledge representation, and solution search, and, among them, the fundamental paradigm of machine learning (ML). ML uses algorithms that can recognize patterns within a set of data that has been further classified. A subfield of ML is deep learning (DL), which engages artificial neural networks (ANNs). These comprise a set of interconnected computing elements, ‘perceptrons’, which are analogous to human biological neurons and mimic the transmission of electrical impulses in the human brain [7]. ANNs constitute a set of nodes, each receiving a separate input and ultimately converting it to output, either singly or multi-linked, using algorithms to solve problems [8]. ANNs include various types, such as multilayer perceptron (MLP) networks, recurrent neural networks (RNNs), and convolutional neural networks (CNNs), which utilize either supervised or unsupervised training procedures [9,10].
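To make the idea of nodes converting inputs to outputs concrete, the following is a minimal sketch of a forward pass through a small MLP. It is illustrative only: the function names, the sigmoid activation, and the layer representation (a list of weight-matrix and bias-vector pairs) are assumptions chosen for brevity, not something taken from the review, and no training step is shown.

```python
import math


def sigmoid(x: float) -> float:
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases):
    """One layer: each node sums its weighted inputs, adds a bias,
    and applies the activation function."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def mlp_forward(inputs, layers):
    """Pass the inputs through each (weights, biases) layer in order.
    Information flows in one direction only, from input to output."""
    out = inputs
    for weights, biases in layers:
        out = dense(out, weights, biases)
    return out
```

Each row of a weight matrix corresponds to one node in that layer, so a two-input network with two hidden nodes and one output node would use a 2x2 matrix followed by a 1x2 matrix.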

MLP networks have applications including pattern recognition, optimization aids, process identification, and controls. They are usually trained by supervised training procedures operating in a single direction only, and can be used as universal pattern classifiers [11]. RNNs are networks with a closed loop, having the capability to memorize and store information, such as Boltzmann machines and Hopfield networks [11, 12]. CNNs are a series of dynamic systems with local connections, characterized by their topology, and are used in image and video processing, biological system modeling, processing complex brain functions, pattern recognition, and sophisticated signal processing [13]. The more complex forms include Kohonen networks, RBF networks, LVQ networks, counter-propagation networks, and ADALINE networks [9, 11]. Examples of method domains of AI are summarized in Figure 1.

Figure 1

Method domains of artificial intelligence (AI). This figure shows different AI method domains along with their subfields that can be implemented in different fields of drug discovery and development.

Several tools have been developed based on the networks that form the core architecture of AI systems. One such tool developed using AI technology is the International Business Machines (IBM) Watson supercomputer (IBM, New York, USA). It was designed to assist in the analysis of a patient’s medical information and its correlation with a vast database, resulting in suggested treatment strategies for cancer. This system can also be used for the rapid detection of diseases, as demonstrated by its ability to detect breast cancer in only 60 s [14, 15].

AI in the lifecycle of pharmaceutical products

Involvement of AI in the development of a pharmaceutical product from the bench to the bedside can be imagined given that it can aid rational drug design [16]; assist in decision making; determine the right therapy for a patient, including personalized medicines; and manage the clinical data generated and use it for future drug development [17]. E-VAI is an analytical and decision-making AI platform developed by Eularis, which uses ML algorithms along with an easy-to-use user interface to create analytical roadmaps based on competitors, key stakeholders, and currently held market share to predict key drivers in sales of pharmaceuticals [18]. It thus helps marketing executives to allocate resources for maximum market share gain, reverse poor sales, and anticipate where to make investments. Different applications of AI in drug discovery and development are summarized in Figure 2.

Figure 2

Applications of artificial intelligence (AI) in different subfields of the pharmaceutical industry, from drug discovery to pharmaceutical product management.

AI in drug discovery

The vast chemical space, comprising >10^60 molecules, fosters the development of a large number of drug molecules [19]. However, the lack of advanced technologies limits the drug development process, making it a time-consuming and expensive task, which can be addressed by using AI [15]. AI can recognize hit and lead compounds, and provide a quicker validation of the drug target and optimization of the drug structure design [19, 20]. Different applications of AI in drug discovery are depicted in Figure 3.

Figure 3

Role of artificial intelligence (AI) in drug discovery. AI can be used effectively in different parts of drug discovery, including drug design, chemical synthesis, drug screening, polypharmacology, and drug repurposing.

Despite its advantages, AI faces some significant data challenges, such as the scale, growth, diversity, and uncertainty of the data. The data sets available for drug development in pharmaceutical companies can involve millions of compounds, and traditional ML tools might not be able to deal with these types of data. Quantitative structure-activity relationship (QSAR)-based computational models can quickly predict large numbers of compounds or simple physicochemical parameters, such as log P or log D. However, these models are still some way from predicting complex biological properties, such as the efficacy and adverse effects of compounds. In addition, QSAR-based models also face problems such as small training sets, experimental data error in training sets, and lack of experimental validations. To overcome these challenges, recently developed AI approaches, such as DL and relevant modeling studies, can be implemented for safety and efficacy evaluations of drug molecules based on big data modeling and analysis. In 2012, Merck supported a QSAR ML challenge to observe the advantages of DL in the drug discovery process in the pharmaceutical industry. DL models showed significant predictivity compared with traditional ML approaches for 15 absorption, distribution, metabolism, excretion, and toxicity (ADMET) data sets of drug candidates [21, 22].

The virtual chemical space is enormous and suggests a geographical map of molecules by illustrating the distributions of molecules and their properties. The idea behind the illustration of chemical space is to collect positional information about molecules within the space to search for bioactive compounds and, thus, virtual screening (VS) helps to select appropriate molecules for further testing. Several chemical spaces are open access, including PubChem, ChemBank, DrugBank, and ChemDB.

Numerous in silico methods for virtually screening compounds from virtual chemical spaces, along with structure- and ligand-based approaches, provide a better profile analysis, faster elimination of nonlead compounds, and selection of drug molecules with reduced expenditure [19]. Drug design algorithms, such as coulomb matrices and molecular fingerprint recognition, consider the physical, chemical, and toxicological profiles to select a lead compound [23].

Various parameters, such as predictive models, the similarity of molecules, the molecule generation process, and the application of in silico approaches can be used to predict the desired chemical structure of a compound 20 , 24 . Pereira et al. presented a new system, DeepVS, for the docking of 40 receptors and 2950 ligands, which showed exceptional performance when 95 000 decoys were tested against these receptors [25] . Another approach applied a multiobjective automated replacement algorithm to optimize the potency profile of a cyclin-dependent kinase-2 inhibitor by assessing its shape similarity, biochemical activity, and physicochemical properties [26] .

QSAR modeling tools have been utilized for the identification of potential drug candidates and have evolved into AI-based QSAR approaches, such as linear discriminant analysis (LDA), support vector machines (SVMs), random forest (RF) and decision trees, which can be applied to speed up QSAR analysis 27 , 28 , 29 . King et al. found a negligible statistical difference when the ability of six AI algorithms to rank anonymous compounds in terms of biological activity was compared with that of traditional approaches [30] .

AI in drug screening

The process of discovering and developing a drug can take over a decade and costs US$2.8 billion on average. Even then, nine out of ten therapeutic molecules fail Phase II clinical trials and regulatory approval 31 , 32 . Algorithms, such as Nearest-Neighbour classifiers, RF, extreme learning machines, SVMs, and deep neural networks (DNNs), are used for VS based on synthesis feasibility and can also predict in vivo activity and toxicity 31 , 33 . Several biopharmaceutical companies, such as Bayer, Roche, and Pfizer, have teamed up with IT companies to develop a platform for the discovery of therapies in areas such as immuno-oncology and cardiovascular diseases [19] . The aspects of VS to which AI has been applied are discussed below.

Prediction of the physicochemical properties

Physicochemical properties, such as solubility, partition coefficient (logP), degree of ionization, and intrinsic permeability of the drug, indirectly affect its pharmacokinetic properties and its target receptor family and, hence, must be considered when designing a new drug [34]. Different AI-based tools can be used to predict physicochemical properties. For example, ML uses large data sets produced during previous compound optimization to train the program [35]. Algorithms for drug design include molecular descriptors, such as SMILES strings, potential energy measurements, electron density around the molecule, and coordinates of atoms in 3D, to generate feasible molecules via DNN and thereby predict their properties [36].

Zang et al. created a quantitative structure–property relationship (QSPR) workflow to determine six physicochemical properties of environmental chemicals obtained from the Environmental Protection Agency (EPA) Estimation Program Interface (EPI) Suite [35]. Neural networks based on the ADMET predictor and ALOGPS program have been used to predict the lipophilicity and solubility of various compounds [37]. DL methods, such as undirected graph recursive neural networks and graph-based convolutional neural networks (CVNN), have been used to predict the solubility of molecules [38].

In several instances, ANN-based models, graph kernels, and kernel ridge-based models were developed to predict the acid dissociation constant of compounds 35 , 39 . Similarly, cell lines, such as Madin-Darby canine kidney cells and human colon adenocarcinoma (Caco-2) cells have been utilized to generate cellular permeability data of a diverse class of molecules, which are subsequently fed to AI-assisted predictors [34] .

Kumar et al. developed six predictive models [SVMs, ANNs, k-nearest neighbor algorithms, LDAs, probabilistic neural network algorithms, and partial least square (PLS)] utilizing 745 compounds for training; these were later applied to 497 compounds to predict their intestinal absorptivity based on parameters including molecular surface area, molecular mass, total hydrogen count, molecular refractivity, molecular volume, logP, total polar surface area, the sum of E-state indices, solubility index (log S), and rotatable bonds [40]. Along similar lines, RF and DNN-based in silico models were developed to determine human intestinal absorption of a variety of chemical compounds [41]. Thus, AI has a significant role in the development of a drug, to predict not only its desired physicochemical properties, but also the desired bioactivity.

Prediction of bioactivity

The efficacy of drug molecules depends on their affinity for the target protein or receptor. Drug molecules that do not show any interaction or affinity towards the targeted protein will not be able to deliver the therapeutic response. In some instances, it might also be possible that developed drug molecules interact with unintended proteins or receptors, leading to toxicity. Hence, drug target binding affinity (DTBA) is vital to predict drug–target interactions. AI-based methods can measure the binding affinity of a drug by considering either the features or similarities of the drug and its target. Feature-based interactions recognize the chemical moieties of the drug and that of the target to determine the feature vectors. By contrast, in similarity-based interaction, the similarity between drug and target is considered, and it is assumed that similar drugs will interact with the same targets [42] .
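The similarity-based idea above (similar drugs are assumed to interact with the same targets) can be sketched as follows. This is a minimal illustration, assuming drugs are represented as binary fingerprints stored as sets of on-bit indices and compared with the Tanimoto coefficient; the fingerprints, drug names, target names, and threshold are made up for the example and are not taken from the cited methods.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)

def predict_targets(query_fp, known_drugs, threshold=0.5):
    """Rank targets of known drugs whose similarity to the query meets the threshold."""
    hits = {}
    for fp, targets in known_drugs.values():
        score = tanimoto(query_fp, fp)
        if score >= threshold:
            for target in targets:
                # Keep the best similarity seen for each candidate target.
                hits[target] = max(hits.get(target, 0.0), score)
    return sorted(hits.items(), key=lambda item: item[1], reverse=True)

# Toy data: fingerprints and target labels are hypothetical.
known = {
    "drug_A": ({1, 2, 3, 4}, ["target_X"]),
    "drug_B": ({7, 8, 9}, ["target_Y"]),
}
ranked = predict_targets({1, 2, 3, 5}, known)
```

Here the query shares three of five distinct bits with drug_A, so target_X is proposed, while drug_B shares none and is ignored. A feature-based method would instead build explicit feature vectors for both drug and target and learn a model over them.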

Web applications, such as ChemMapper and the similarity ensemble approach (SEA), are available for predicting drug–target interactions [43] . Many strategies involving ML and DL have been used to determine DTBA, such as KronRLS, SimBoost, DeepDTA, and PADME. ML-based approaches, such as Kronecker-regularized least squares (KronRLS), evaluate the similarity between drugs and protein molecules to determine DTBA. Similarly, SimBoost utilized regression trees to predict DTBA, and considers both feature-based and similarity-based interactions. Drug features from SMILES, ligand maximum common substructure (LMCS), extended connectivity fingerprint, or a combination thereof can also be considered [42] .

DL approaches have shown improved performance compared with ML because they apply network-based methods that do not depend on the availability of the 3D protein structure [43]. DeepDTA, PADME, WideDTA, and DeepAffinity are some DL methods used to measure DTBA. DeepDTA accepts drug data in the form of SMILES, a 1D representation of the drug structure, whereas the amino acid sequence is entered as the protein input data [44]. WideDTA is a CVNN-based DL method that incorporates ligand SMILES (LS), amino acid sequences, LMCS, and protein domains and motifs as input data for assessing the binding affinity [45].
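To illustrate how 1D string inputs such as SMILES and amino acid sequences can be prepared for a sequence-based network, the following sketch integer-encodes each character and pads to a fixed length. It assumes character-level tokens and a vocabulary built from the example strings; the function names, sequences, and lengths are illustrative assumptions, not the preprocessing of any specific published model.

```python
def build_vocab(strings):
    """Map every character seen in the strings to a positive integer (0 is padding)."""
    chars = sorted({ch for s in strings for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}

def encode(string, vocab, max_len):
    """Integer-encode a string, truncating or zero-padding it to max_len.

    Characters missing from the vocabulary also map to 0 in this sketch.
    """
    ids = [vocab.get(ch, 0) for ch in string[:max_len]]
    return ids + [0] * (max_len - len(ids))

smiles = ["CCO", "c1ccccc1"]        # ethanol and benzene
proteins = ["MKTAYIAK", "MSLLT"]    # made-up sequence fragments

smiles_vocab = build_vocab(smiles)
protein_vocab = build_vocab(proteins)

drug_input = encode(smiles[0], smiles_vocab, max_len=10)
protein_input = encode(proteins[1], protein_vocab, max_len=10)
```

The two fixed-length integer vectors would then typically be passed through embedding and convolutional layers before being combined to predict an affinity value.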

DeepAffinity and Protein And Drug Molecule interaction prEdiction (PADME) are similar to the approaches described earlier [46]. DeepAffinity is an interpretable DL model that uses both RNN and CNN and both unlabeled and labeled data. It takes into account the compound in the SMILES format and protein sequences in the structural and physicochemical properties [47]. PADME is a DL-based platform that utilizes feed-forward neural networks for predicting drug target interactions (DTIs). It considers the combination of the features of the drug and target protein as input data and forecasts the interaction strength between the two. The drug is represented by its SMILES string and the target by its protein sequence composition (PSC) [46].

Unsupervised ML techniques, such as MANTRA and PREDICT, can be used to forecast the therapeutic efficacy of drugs and target proteins of known and unknown pharmaceuticals, which can also be extrapolated to the application of drug repurposing and interpreting the molecular mechanism of the therapeutics. MANTRA groups compounds based on similar gene expression profiles using a CMap data set and clusters those compounds predicted to have a common mechanism of action and common biological pathway [43].

The bioactivity of a drug also includes ADME data. AI-based tools, such as XenoSite, FAME, and SMARTCyp, are involved in determining the sites of metabolism of the drug. In addition, software such as CypRules, MetaSite, MetaPred, SMARTCyp, and WhichCyp has been used to identify specific isoforms of CYP450 that mediate a particular drug metabolism. The clearance pathway of 141 approved drugs was predicted by SVM-based predictors with high accuracy [48].
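As a rough illustration of a protein sequence composition descriptor of the kind mentioned for PADME, the sketch below computes the relative frequencies of single amino acids and of dipeptides. This is an assumption about what a composition vector can look like; the exact descriptor used by PADME may include further terms and is not reproduced here.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmer_composition(sequence, k):
    """Relative frequency of every possible k-mer over the 20 standard amino acids."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows containing non-standard residues
            counts[window] += 1
            total += 1
    return [counts[km] / total if total else 0.0 for km in kmers]

def sequence_composition(sequence):
    """Concatenate amino acid (k=1) and dipeptide (k=2) composition vectors."""
    return kmer_composition(sequence, 1) + kmer_composition(sequence, 2)

features = sequence_composition("MKTAYIAKQR")  # made-up sequence fragment
```

The result is a fixed-length vector (20 + 400 entries) regardless of sequence length, which is what allows it to be concatenated with a drug feature vector and fed to a feed-forward network.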

Prediction of toxicity

The prediction of the toxicity of any drug molecule is vital to avoid toxic effects. Cell-based in vitro assays are often used as preliminary studies, followed by animal studies to identify the toxicity of a compound, increasing the expense of drug discovery. Several web-based tools, such as LimTox, pkCSM, admetSAR, and Toxtree, are available to help reduce the cost [35]. Advanced AI-based approaches look for similarities among compounds or project the toxicity of the compound based on input features. The Tox21 Data Challenge organized by the National Institutes of Health, Environmental Protection Agency (EPA), and US Food and Drug Administration (FDA) was an initiative to evaluate several computational techniques to forecast the toxicity of 12 707 environmental compounds and drugs [35]; an ML algorithm named DeepTox outperformed all methods by identifying static and dynamic features within the chemical descriptors of the molecules, such as molecular weight (MW) and Van der Waals volume, and could efficiently predict the toxicity of a molecule based on 2500 predefined toxicophore features [49]. The different AI tools used in drug discovery are listed in Table 1.

Table 1

Examples of AI tools used in drug discovery

| Tools | Details | Website URL | Refs |
|---|---|---|---|
| DeepChem | MLP model that uses a python-based AI system to find a suitable candidate in drug discovery | | |
| DeepTox | Software that predicts the toxicity of a total of 12 000 drugs | | |
| DeepNeuralNetQSAR | Python-based system driven by computational tools that aid detection of the molecular activity of compounds | | |
| ORGANIC | A molecular generation tool that helps to create molecules with desired properties | | |
| PotentialNet | Uses NNs to predict binding affinity of ligands | | |
| Hit Dexter | ML technique to predict molecules that might respond to biochemical assays | | |
| DeltaVina | A scoring function for rescoring drug–ligand binding affinity | | |
| Neural graph fingerprint | Helps to predict properties of novel molecules | | |
| AlphaFold | Predicts 3D structures of proteins | | |
| Chemputer | Helps to report procedure for chemical synthesis in standardized format | | |

SEA was used to evaluate the safety target prediction of 656 marketed drugs against 73 unintended targets that might produce adverse effects [43]. Developed using an ML-based approach, eToxPred was applied to estimate the toxicity and synthesis feasibility of small organic molecules and showed accuracy as high as 72% [48]. Similarly, open-source tools, such as TargeTox and PrOCTOR, are also used in toxicity prediction [50]. TargeTox is a biological network target-based drug toxicity risk prediction method that uses the guilt-by-association principle, whereby entities that have similar functional properties share similarities in biological networks [51]. It can produce protein network data and unite pharmacological and functional properties in an ML classifier to predict drug toxicity [52]. PrOCTOR was trained using an RF model and took into account drug-likeness properties, molecular features, target-based features, and properties of the protein targets to generate a ‘PrOCTOR score’, which forecasted whether a drug would fail in clinical trials owing to its toxicity. It also recognized FDA-approved drugs that later reported adverse drug events [53]. In another approach, Tox_(R)CNN, involving a deep CVNN method, evaluated the cytotoxicity of drugs that had been exposed to DAPI-stained cells [54].
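The guilt-by-association principle can be illustrated with a toy network score: a protein is considered more suspicious when many of its direct neighbors in an interaction network are already linked to toxicity. This is a simplified sketch under that assumption; the graph, the protein labels, and the scoring rule are hypothetical and are not the features or classifier used by TargeTox.

```python
def neighbor_score(node, graph, labeled):
    """Fraction of a node's direct neighbors that carry the label of interest."""
    neighbors = graph.get(node, set())
    if not neighbors:
        return 0.0
    return len(neighbors & labeled) / len(neighbors)

def drug_risk(targets, graph, labeled):
    """Score a drug as the maximum neighbor score over its protein targets."""
    return max((neighbor_score(t, graph, labeled) for t in targets), default=0.0)

# Toy undirected protein network; all identifiers are hypothetical.
graph = {
    "P1": {"P2", "P3"},
    "P2": {"P1", "P3", "P4"},
    "P3": {"P1", "P2"},
    "P4": {"P2"},
}
toxicity_linked = {"P3", "P4"}
risk = drug_risk(["P1", "P2"], graph, toxicity_linked)
```

In a real pipeline such network-derived scores would be only one group of inputs, combined with pharmacological and functional properties in a trained classifier.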

AI in designing drug molecules

Prediction of the target protein structure

While developing a drug molecule, it is essential to assign the correct target for successful treatment. Numerous proteins are involved in the development of the disease and, in some cases, they are overexpressed. Hence, for selective targeting of disease, it is vital to predict the structure of the target protein to design the drug molecule. AI can assist in structure-based drug discovery by predicting the 3D protein structure because the design is in accordance with the chemical environment of the target protein site, thus helping to predict the effect of a compound on the target along with safety considerations before their synthesis or production [55] . The AI tool, AlphaFold, which is based on DNNs, was used to analyze the distance between the adjacent amino acids and the corresponding angles of the peptide bonds to predict the 3D target protein structure and demonstrated excellent results by correctly predicting 25 out of 43 structures.

In a study by AlQuraishi, RNN was used to predict the protein structure. The author considered three stages (i.e., computation, geometry, and assessment) termed a recurrent geometric network (RGN). Here, the primary protein sequence was encoded, and the torsional angles for a given residue and a partially completed backbone obtained from the geometric unit upstream of this were then considered as input and provided a new backbone as output. The final unit produced the 3D structure as the output. Assessment of the deviation of predicted and experimental structures was done using the distance-based root mean square deviation (dRMSD) metric. The parameters in RGN were optimized to keep the dRMSD low between the experimental and predicted structures [56]. AlQuraishi predicted that his AI method would be quicker than AlphaFold in terms of the time taken to predict the protein structure. However, AlphaFold is likely to have better accuracy in predicting protein structures with sequences similar to the reference structures [57].
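The dRMSD metric mentioned above compares the internal pairwise distances of two structures rather than their absolute coordinates, so it needs no prior superposition. A minimal sketch, assuming each structure is a list of 3D coordinates with one point per residue (the coordinates below are made up):

```python
import math
from itertools import combinations

def pairwise_distances(coords):
    """Euclidean distances for all unordered pairs of 3D points."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]

def drmsd(predicted, experimental):
    """Root mean square difference between the two sets of pairwise distances."""
    if len(predicted) != len(experimental):
        raise ValueError("structures must have the same number of points")
    d_pred = pairwise_distances(predicted)
    d_exp = pairwise_distances(experimental)
    if not d_pred:
        return 0.0
    squared = sum((p - e) ** 2 for p, e in zip(d_pred, d_exp))
    return math.sqrt(squared / len(d_pred))

experimental = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (7.0, 5.0, 5.0)]    # same shape, translated
stretched = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]  # distances doubled

same_shape = drmsd(shifted, experimental)
different_shape = drmsd(stretched, experimental)
```

The translated copy scores zero because its internal distances are unchanged, whereas the stretched copy is penalized. One known consequence of using distances only is that the metric cannot distinguish a structure from its mirror image.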

A study was conducted to predict the 2D structure of a protein using MATLAB assisted by a nonlinear three-layered NN toolbox based on a feed-forward supervised learning and backpropagation error algorithm. MATLAB was used to train input and output data sets, and the NNs were learning algorithms and performance evaluators. The accuracy in predicting the 2D structure was 62.72% [58] .

Predicting drug–protein interactions

Drug–protein interactions have a vital role in the success of a therapy. The prediction of the interaction of a drug with a receptor or protein is essential to understand its efficacy and effectiveness, allows the repurposing of drugs, and prevents polypharmacology [55] . Various AI methods have been useful in the accurate prediction of ligand–protein interactions, ensuring better therapeutic efficacy 55 , 59 . Wang et al. reported a model using the SVM approach, trained on 15 000 protein–ligand interactions, which were developed based on primary protein sequences and structural characteristics of small molecules to discover nine new compounds and their interaction with four crucial targets [60] .

Yu et al. exploited two RF models to predict possible drug–protein interactions by the integration of pharmacological and chemical data and validating them against known platforms, such as SVM, with high sensitivity and specificity. Also, these models were capable of predicting drug–target associations that could be further extended to target–disease and target–target associations, thereby speeding up the drug discovery process [61]. Xiao et al. adopted the Synthetic Minority Over-Sampling Technique and the Neighborhood Cleaning Rule to obtain optimized data for the subsequent development of iDrugTarget. This is a combination of four subpredictors (iDrug-GPCR, iDrug-Chl, iDrug-Enz, and iDrug-NR) for identifying interactions between a drug and G-protein-coupled receptors (GPCRs), ion channels, enzymes, and nuclear receptors (NR), respectively. When this predictor was compared with existing predictors through target-jackknife tests, the former surpassed the latter in terms of both prediction accuracy and consistency [62].
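The Synthetic Minority Over-Sampling Technique addresses class imbalance (known interacting drug–target pairs are usually far fewer than non-interacting ones) by creating synthetic minority samples along the line segments between a minority sample and its nearest minority neighbors. The following is a minimal sketch of that interpolation step only, with toy 2D feature vectors; it does not implement the Neighborhood Cleaning Rule, and the parameter names and data are illustrative assumptions.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating toward nearest minority neighbors.

    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbors of the chosen sample, excluding itself
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic

# Toy feature vectors for the under-represented (interacting) class.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_samples = smote(minority, n_new=6)
```

Because each synthetic point lies between two real minority samples, the new data stay inside the region already occupied by the minority class instead of simply duplicating existing points.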

The ability of AI to predict drug–target interactions was also used to assist the repurposing of existing drugs and avoiding polypharmacology. Repurposing an existing drug qualifies it directly for Phase II clinical trials [19] . This also reduces expenditure because relaunching an existing drug costs ∼US$8.4 million compared with the launch of a new drug entity (∼US$41.3 million) [63] . The ‘Guilt by association’ approach can be utilized to forecast the innovative association of a drug and disease, which is either a knowledge-based or computationally driven network [64] . In a computationally driven network, the ML approach is widely used, which utilizes techniques such as SVM, NN, logistic regression, and DL. Logistic regression platforms, such as PREDICT, SPACE, and other ML approaches, consider drug–drug, disease–disease similarity, the similarity between target molecules, chemical structure, and gene expression profiles while repurposing a drug [65] .

Cellular network-based deep learning technology (deepDTnet) has been explored to predict the therapeutic use of topotecan, currently used as a topoisomerase inhibitor. It can also be used for the therapy of multiple sclerosis by inhibiting human retinoic acid receptor-related orphan receptor-gamma t (ROR-γt) [66] . This platform is currently under a provisional US patent. Self-organizing maps (SOMs) are in the unsupervised category of ML and are used in drug repurposing. They use a ligand-based approach to search novel off-targets for a set of drug molecules by training the system on a defined number of compounds with recognized biological activities, which is later used for the analysis of different compounds [67] . In a recent study, DNN was used to repurpose existing drugs with proven activity against SARS-CoV, HIV, influenza virus, and drugs that are 3C-like protease inhibitors. In this, extended connectivity fingerprint (ECFP), functional-class fingerprints (FCFPs), and an octanol-water partition coefficient (ALogP_count) were considered to train the AI platform. From the results, it was concluded that 13 of the screened drugs could be carried toward further development based on their cytotoxicity and viral inhibition [68] .

Drug–protein interactions can also predict the chances of polypharmacology, which is the tendency of a drug molecule to interact with multiple receptors producing off-target adverse effects [69] . AI can design a new molecule based on the rationale of polypharmacology and aid in the generation of safer drug molecules [70] . AI platforms such as SOM, along with the vast databases available, can be used to link several compounds to numerous targets and off-targets. Bayesian classifiers and SEA algorithms can be used to establish links between the pharmacological profiles of drugs and their possible targets [67] .

Li et al. demonstrated the use of KinomeX, an AI-based online medium using DNNs for the detection of polypharmacology of kinases based on their chemical structures. This platform uses DNN trained with ∼14 000 bioactivity data points developed based on >300 kinases. Thus, it has practical application in studying the overall selectivity of a drug towards the kinase family and particular subfamilies of kinases, thus helping to design novel chemical modifiers. This study used NVP-BHG712 as a model compound to predict its primary targets and also its off-targets with reasonable accuracy [71]. One prominent instance is Cyclica’s cloud-based proteome-screening AI platform, Ligand Express, which is used to find receptors that can interact with a particular small molecule (described as a SMILES string) and produce on- and off-target interactions. This helps in understanding the possible adverse effects of the drug [72].

AI in de novo drug design

Over the past few years, the de novo drug design approach has been widely used to design drug molecules. The traditional method of de novo drug design is being replaced by evolving DL methods, the former having shortcomings of complicated synthesis routes and difficult prediction of the bioactivity of the novel molecule [36]. Computer-aided synthesis planning can also suggest millions of structures that can be synthesized and also predicts several different synthesis routes for them [73].

Grzybowski et al. developed the Chematica program [74] , now renamed Synthia, which has the ability to encode a set of rules into the machine and propose possible synthesizing routes for eight medicinally essential targets. This program has proven to be efficient both in terms of improving the yield and reducing expenses. It is also capable of providing alternate synthesizing strategies for patented products and is said to be helpful in the synthesis of compounds that have not yet been synthesized. Similarly, DNN focuses on rules of organic chemistry and retrosynthesis, which, with the aid of Monte-Carlo tree searches and symbolic AI, help in reaction prediction and the process of drug discovery and design, which is much faster than traditional methods 75 , 76 .

Coley et al. developed a framework in which a rigid forward reaction template was applied to a group of reactants to synthesize chemically feasible products with a significant rate of reaction. ML was used to determine the dominant product based on a score given by the NNs [23]. Putin et al. explored a DNN architecture called the reinforced adversarial neural computer (RANC), based on reinforcement learning (RL), for de novo design of small organic molecules. This platform was trained with molecules represented as SMILES strings. It then generated molecules with predefined chemical descriptors in terms of MW, logP, and topological polar surface area (TPSA). RANC was compared with another platform, ORGANIC, and outperformed it in generating unique structures without significant loss of their structure length [77].

An RNN based on long short-term memory (LSTM) was also trained on molecules obtained from the ChEMBL database and fed as SMILES strings. This was used to generate a diverse library of molecules for VS. The approach was extended to procure novel molecules directed toward particular targets, such as the 5-HT2A receptor, Staphylococcus aureus, and Plasmodium falciparum [78].

Popova et al. developed the Reinforcement Learning for Structural Evolution strategy for de novo drug synthesis, which involves generative and predictive DNNs to develop new compounds. In this, the generative model produces more unique molecules in terms of SMILES strings based on a stack memory, whereas the predictive models are used to forecast the properties of the developed compound [79]. Merk et al. also exploited the generative AI model to design retinoid X and PPAR agonist molecules, with desired therapeutic effects without requiring complex rules. The authors successfully designed five molecules, four of which showed good modulatory activity in cell assays, thereby emphasizing the use of generative AI in new molecule synthesis [80]. The involvement of AI in the de novo design of molecules can be beneficial to the pharmaceutical sector because of its various advantages, such as providing online learning and simultaneous optimization of the already-learned data as well as suggesting possible synthesis routes for compounds leading to swift lead design and development [78, 81].

AI in advancing pharmaceutical product development

The discovery of a novel drug molecule requires its subsequent incorporation in a suitable dosage form with desired delivery characteristics. In this area, AI can replace the older trial and error approach [82] . Various computational tools can resolve problems encountered in the formulation design area, such as stability issues, dissolution, porosity, and so on, with the help of QSPR [83] . Decision-support tools use rule-based systems to select the type, nature, and quantity of the excipients depending on the physicochemical attributes of the drug and operate through a feedback mechanism to monitor the entire process and intermittently modify it [84] .

Guo et al. integrated Expert Systems (ES) and ANN to create a hybrid system for the development of direct-filling hard gelatin capsules of piroxicam in accordance with the specifications of its dissolution profile. The MODEL EXPERT SYSTEM (MES) makes decisions and recommendations for formulation development based on the input parameters. By contrast, ANN uses backpropagation learning to link formulation parameters to the desired response, jointly controlled by the control module, to ensure hassle-free formulation development [82] .

Various mathematical tools, such as computational fluid dynamics (CFD), discrete element modeling (DEM), and the Finite Element Method have been used to examine the influence of the flow property of the powder on the die-filling and process of tablet compression 85 , 86 . CFD can also be utilized to study the impact of tablet geometry on its dissolution profile [87] . The combination of these mathematical models with AI could prove to be of immense help in the rapid production of pharmaceutical products.

AI in pharmaceutical manufacturing

With the increasing complexities of manufacturing processes along with increasing demand for efficiency and better product quality, modern manufacturing systems are trying to confer human knowledge to machines, continuously changing the manufacturing practice [88]. The incorporation of AI in manufacturing can prove to be a boost for the pharmaceutical industry. Tools such as CFD use Reynolds-averaged Navier-Stokes solver technology to study the impact of agitation and stress levels in different equipment (e.g., stirred tanks), exploiting the automation of many pharmaceutical operations. Similar systems, such as direct numerical simulations and large eddy simulations, involve advanced approaches to solve complicated flow problems in manufacturing [85].

The novel Chemputer platform supports the digital automation of the synthesis and manufacturing of molecules, incorporating various chemical codes and operating by using a scripting language known as Chemical Assembly [23]. It has been successfully used for the synthesis and manufacture of sildenafil, diphenhydramine hydrochloride, and rufinamide, with yield and purity comparable to those of manual synthesis [89]. The completion of granulation in granulators of capacities ranging from 25 to 600 L can be estimated efficiently by AI technologies [90]. The technology and neuro-fuzzy logic correlated critical variables to their responses. They derived a polynomial equation for the prediction of the proportion of the granulation fluid to be added, required speed, and the diameter of the impeller in both geometrically similar and dissimilar granulators [91].

DEM has been widely utilized in the pharmaceutical industry, such as in studying the segregation of powders in a binary mixture, the effects of varying blade speed and shape, predicting the possible path of the tablets in the coating process, along with analysis of time spent by tablets under the spray zone [85] . ANNs, along with fuzzy models, studied the correlation between machine settings and the problem of capping to reduce tablet capping on the manufacturing line [92] .

Meta-classifier and tablet-classifier are AI tools that help to govern the quality standard of the final product, indicating a possible error in the manufacturing of the tablet [93]. A patent has been filed demonstrating a system capable of determining the most suitable combination of drug and dosage regimen for each patient, using a processor that receives patient information and designs the desired transdermal patch accordingly [94].

AI in quality control and quality assurance

Manufacturing of the desired product from the raw materials includes a balance of various parameters [93] . Quality control tests on the products, as well as maintenance of batch-to-batch consistency, require manual interference. This might not be the best approach in each case, showcasing the need for AI implementation at this stage [85] . The FDA amended the Current Good Manufacturing Practices (cGMP) by introducing a ‘Quality by Design’ approach to understand the critical operation and specific criteria that govern the final quality of the pharmaceutical product [95] .

Gams et al. used a combination of human efforts and AI, wherein preliminary data from production batches were analyzed and decision trees developed. These were further translated into rules and analyzed by the operators to guide the production cycle in the future [93] . Goh et al. studied the dissolution profile, an indicator of batch-to-batch consistency of theophylline pellets with the aid of ANN, which correctly predicted the dissolution of the tested formulation with an error of <8% [96] .
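The kind of check behind a figure such as "an error of <8%" can be sketched as follows. This is only an illustration: the source does not state how the error was computed, so a per-time-point relative error is assumed here, and the dissolution values and the names `percent_errors`, `observed`, and `predicted` are hypothetical.

```python
def percent_errors(observed, predicted):
    """Relative error (in %) of each predicted value against the observed one."""
    return [abs(p - o) / o * 100 for o, p in zip(observed, predicted)]


# Hypothetical % drug dissolved at four sampling times (not data from the source).
observed = [20.0, 45.0, 70.0, 90.0]
predicted = [21.0, 43.0, 72.0, 88.0]   # e.g. output of a trained ANN

errors = percent_errors(observed, predicted)

# Assumed acceptance rule: every time point must stay under the 8% bound.
within_tolerance = max(errors) < 8.0
```

With these toy numbers the largest deviation is at the first time point, so the batch would pass the assumed 8% criterion.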

AI can also be implemented for the regulation of in-line manufacturing processes to achieve the desired standard of the product [95] . ANN-based monitoring of the freeze-drying process is used, which applies a combination of self-adaptive evolution along with local search and backpropagation algorithms. This can be used to predict the temperature and desiccated-cake thickness at a future time point ( t + Δ t ) for a particular set of operating conditions, eventually helping to keep a check on the final product quality [97] .

An automated data entry platform, such as an Electronic Lab Notebook, along with sophisticated, intelligent techniques, can ensure the quality assurance of the product [98] . Also, data mining and various knowledge discovery techniques in the Total Quality Management expert system can be used as valuable approaches in making complex decisions, creating new technologies for intelligent quality control [99] .

AI in clinical trial design

Clinical trials are directed toward establishing the safety and efficacy of a drug product in humans for a particular disease condition and require 6–7 years along with a substantial financial investment. However, only one out of ten molecules entering these trials gains successful clearance, which is a massive loss for the industry [100]. These failures can result from inappropriate patient selection, shortage of technical requirements, and poor infrastructure. However, with the vast digital medical data available, these failures can be reduced with the implementation of AI [101].

The enrolment of patients takes one-third of the clinical trial timeline. The success of a clinical trial can be ensured by the recruitment of suitable patients; unsuitable recruitment otherwise leads to ∼86% of failure cases [102]. AI can assist in selecting only a specific diseased population for recruitment in Phase II and III clinical trials by using patient-specific genome–exposome profile analysis, which can help in early prediction of the available drug targets in the patients selected [19,101]. The preclinical discovery of molecules, as well as the prediction of lead compounds before the start of clinical trials by using other aspects of AI, such as predictive ML and other reasoning techniques, helps in the early prediction of lead molecules that would pass clinical trials with consideration of the selected patient population [101].

Patient dropout accounts for the failure of 30% of clinical trials, creating additional recruiting requirements for the completion of the trial and wasting time and money. This can be avoided by close monitoring of the patients and helping them follow the desired protocol of the clinical trial [102]. Mobile software developed by AiCure monitored regular medication intake by patients with schizophrenia in a Phase II trial, which increased the adherence rate of patients by 25%, ensuring successful completion of the clinical trial [19].

AI in pharmaceutical product management

AI in market positioning

Market positioning is the process of creating an identity for a product in the market to attract consumers to buy it, making it an essential element in almost all business strategies for companies to establish their own unique identity [103,104]. This approach was used in the marketing of the pioneer brand Viagra, where the company targeted it not only for the treatment of men’s erectile dysfunction, but also for other problems affecting quality of life [105].

With the help of technology and e-commerce as a platform, it has become easier for companies to get a natural recognition of their brand in the public domain. Companies exploit search engines as one of the technological platforms to occupy a prominent position in online marketing and help in the positioning of the product in the market, as also confirmed by the Internet Advertising Bureau. Companies continuously try to rank their websites higher than those of other companies, giving recognition to their brand in a short period [106] .

Other techniques, such as statistical analysis methods and particle swarm optimization algorithms (proposed by Eberhart and Kennedy in 1995) in combination with NNs, provide a better picture of markets. They can help decide the marketing strategy for the product based on accurate consumer-demand prediction [107].
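To make the particle swarm idea concrete, the following minimal sketch fits a demand model with a basic global-best swarm. It is an illustration under stated assumptions, not the method of the cited work: a two-parameter linear demand model stands in for the NN, the price and demand figures are invented, and all names (`pso`, `mse`, `predict`, the coefficient values) are hypothetical choices.

```python
import random

# Hypothetical toy data: unit price and observed demand (not from the source).
PRICES = [10.0, 15.0, 20.0, 25.0, 30.0]
DEMAND = [80.0, 70.0, 60.0, 50.0, 40.0]
MEAN_PRICE = sum(PRICES) / len(PRICES)


def predict(params, price):
    """Linear demand model: base demand plus slope times centred price."""
    base, slope = params
    return base + slope * (price - MEAN_PRICE)


def mse(params):
    """Mean squared error of the demand model on the toy data."""
    return sum((predict(params, p) - d) ** 2
               for p, d in zip(PRICES, DEMAND)) / len(PRICES)


def pso(objective, bounds, n_particles=30, n_iter=200,
        w=0.72, c1=1.49, c2=1.49, seed=0):
    """Minimise `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    dims = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dims for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # best position seen by each particle
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest, gbest_val = best_pos[g][:], best_val[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dims):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward own best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


best_params, best_error = pso(mse, bounds=[(0.0, 200.0), (-10.0, 10.0)])
```

In a hybrid of the kind described above, the same loop would search over network weights or hyperparameters instead of the two coefficients used here; the swarm only needs an objective to evaluate, not its gradient.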

AI in market prediction and analysis

The success of a company lies in the continuous development and growth of its business. Even with access to substantial funds, R&D output in the pharmaceutical industry is falling because of the failure of companies to adopt new marketing technologies [108]. The advances in digital technologies, referred to as the ‘Fourth industrial revolution’, are helping innovative digitalized marketing via a multicriteria decision-making approach, which collects and analyzes statistical and mathematical data and implements human inferences so that AI-based decision-making models can explore new marketing methodologies [109].

AI also helps in a comprehensive analysis of the fundamental requirements of a product from the customer’s point of view, as well as in understanding the needs of the market, which aids decision-making using prediction tools. It can also forecast sales and analyze the market. AI-based software engages consumers and creates awareness among physicians by displaying advertisements that direct them to the product site with a single click [110]. In addition, these methods use natural language-processing tools to analyze keywords entered by customers and relate them to the probability of purchasing the product [111,112].

Several business-to-business (B2B) companies have announced self-service technologies that allow customers to browse health products freely, find them easily by specification, place orders, and track shipping. Pharmaceutical companies are also introducing their own online applications, such as 1 mg, Medline, Netmeds, and Ask Apollo, to fulfill the unmet needs of patients [109]. Market prediction is also essential for pharmaceutical distribution companies, which can implement AI tools such as ‘Business intelligent Smart Sales Prediction Analysis’, which uses a combination of time series forecasting and real-time application. This helps pharmaceutical companies to predict product sales in advance, preventing the costs of excess stock and the loss of customers because of shortages [113].
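The stock-planning use of time series forecasting can be sketched with simple exponential smoothing. This is a deliberately basic stand-in, since the source does not describe the forecasting method used by the cited tool; the sales figures, the smoothing factor, and the function names are all hypothetical.

```python
def exponential_smoothing_forecast(sales, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    `alpha` (0 < alpha <= 1) weights recent observations more heavily.
    """
    if not sales:
        raise ValueError("sales history must not be empty")
    level = sales[0]
    for observed in sales[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


def stock_gap(forecast, stock_on_hand):
    """Positive: units to order to avoid a shortage; negative: excess stock."""
    return forecast - stock_on_hand


monthly_sales = [100, 110, 120]          # hypothetical units sold per month
forecast = exponential_smoothing_forecast(monthly_sales)
gap = stock_gap(forecast, stock_on_hand=100)
```

Here the smoothed level moves from 100 to 105 to 112.5, so the forecast for the next month is 112.5 units and the distributor would be 12.5 units short with 100 in stock.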

AI in product cost

Based on the market analysis and the cost incurred in the development of the pharmaceutical product, the company determines the final price of the product. The critical concept in applying AI to determine this price is harnessing its ability to mimic the thinking of a human expert to assess the factors that control the pricing of a product after its manufacture [114]. Factors such as expenditure during research and development of the drug, strict price regulatory schemes in the concerned country, length of the exclusivity period, market share of the innovated drug in the year before its patent expiry, price of the reference product, and price-fixing policies determine the price of branded and generic drugs [115].

In ML, large sets of statistical data, such as product development cost, product demand in the market, inventory cost, manufacturing cost, and competitors’ product prices, are analyzed by software, which subsequently develops algorithms for predicting the product price. In competitor, an AI platform launched by Intelligence Node (founded in 2012), is a complete retail competitive intelligence platform that analyzes competitor pricing data and helps retailers and brands to monitor the competition. Wise Athena and Navetti PricePoint enable users to determine the pricing of their products, suggesting that pharmaceutical companies could adopt the same tools to assist product costing [116].

AI-based advanced applications

AI-based nanorobots for drug delivery

Nanorobots comprise mainly integrated circuits, sensors, a power supply, and secure data backup, which are maintained via computational technologies, such as AI [117,118]. They are programmed for collision avoidance, target identification, detection and attachment, and finally excretion from the body. Advances in nano/microrobots give them the ability to navigate to the targeted site based on physiological conditions, such as pH, thus improving efficacy and reducing systemic adverse effects [118]. The development of implantable nanorobots for the controlled delivery of drugs and genes requires consideration of parameters such as dose adjustment, sustained release, and controlled release, and the release of the drugs requires automation controlled by AI tools, such as NNs, fuzzy logic, and integrators [119]. Microchip implants are used for programmed release as well as to detect the location of the implant in the body.

AI in combination drug delivery and synergism/antagonism prediction

Several combinations of drugs are approved and marketed to treat complex diseases, such as TB and cancer, because they can provide a synergistic effect for quick recovery [120,121]. The selection of precise and potential drugs for combination requires high-throughput screening of a considerable number of drugs, making the process tedious; for example, cancer therapy requires six or seven drugs as a combination therapy. ANNs, logistic regression, and network-based modeling can screen drug combinations and improve overall dose regimen [120,122]. Rashid et al. developed a quadratic phenotype optimization platform for the detection of optimal combination therapy for the treatment of bortezomib-resistant multiple myeloma using a collection of 114 FDA-approved drugs. This model recommended the combination of decitabine (Dec) and mitomycin C (MitoC) as the best two-drug combination and Dec, MitoC, and mechlorethamine as the superior three-drug combination [121].

Combination drug delivery can be more efficient if backed up by data on the synergism or antagonism of drugs administered together. The Master Regulator Inference Algorithm used ‘Master regulator genes’ to efficiently predict 56% synergism. Other methods, such as Network-based Laplacian regularized least square synergistic drug combination, and RF, can also be used for the same purpose [122].
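What "synergism" and "antagonism" mean quantitatively can be illustrated with the Bliss independence model, a common reference model that is not named in the source and is used here purely as an assumption. Under Bliss independence, two drugs with fractional effects $E_a$ and $E_b$ are expected to give a combined effect of $E_a + E_b - E_a E_b$; an observed effect clearly above that expectation suggests synergy, and one clearly below suggests antagonism. The tolerance and the effect values in this sketch are hypothetical.

```python
def bliss_expected(effect_a, effect_b):
    """Expected combined fractional effect (0..1) if the drugs act independently."""
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a, effect_b, observed_combo):
    """Observed combined effect minus the Bliss expectation."""
    return observed_combo - bliss_expected(effect_a, effect_b)


def classify(excess, tolerance=0.05):
    """Label a combination; `tolerance` is an arbitrary illustrative margin."""
    if excess > tolerance:
        return "synergistic"
    if excess < -tolerance:
        return "antagonistic"
    return "additive"


# Hypothetical fractional inhibition values: drug A alone, drug B alone, A+B.
label = classify(bliss_excess(0.5, 0.4, 0.8))
```

ML approaches such as those cited above aim to predict labels of this kind from features such as gene expression profiles, rather than measuring every combination experimentally.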

Li et al. developed a synergistic drug combination model using RF for the prediction of synergistic anticancer drug combinations. This model was formed based on gene expression profiles and various networks, and the authors successfully predicted 28 synergistic anticancer combinations. They have reported three such combinations, although the remainder might also prove to be important [69] . Similarly, Mason et al. applied an ML approach, called the Combination Synergy Estimation, to predict potential synergistic antimalarial combinations based on a data set of 1540 antimalarial drug compounds [123] .

AI emergence in nanomedicine

Nanomedicines use nanotechnology and medicines for the diagnosis, treatment, and monitoring of complex diseases, such as HIV, cancer, malaria, asthma, and various inflammatory diseases. In recent years, nanoparticle-modified drug delivery has become important in the field of therapeutics and diagnostics because it offers enhanced efficacy and treatment [121,124]. A combination of nanotechnology and AI could provide solutions to many problems in formulation development [125].

A methotrexate nanosuspension was computationally formulated by studying the energy generated on the interaction between the drug molecules, monitoring the conditions that could lead to the aggregation of the formulation [83] . Coarse-grained simulation, along with chemical calculation, can aid the determination of drug–dendrimer interactions and evaluation of drug encapsulation within the dendrimer. In addition, software such as LAMMPS and GROMACS 4 can be used to examine the impact of surface chemistry on the internalization of nanoparticles into cells [83] .

AI assisted the preparation of silicasomes, which are a combination of iRGD, a tumor-penetrating peptide, and irinotecan-loaded multifunctional mesoporous silica nanoparticles. This increased the uptake of silicasomes three- to fourfold because iRGD improves the transcytosis of silicasomes, with improved treatment outcome and enhanced overall survival [124].

Pharmaceutical market of AI

To decrease the financial cost and chance of failure that accompany VS, pharmaceutical companies are shifting towards AI. The AI market increased from US$200 million in 2015 to US$700 million in 2018, and is expected to increase to US$5 billion by 2024 [126]. A 40% projected growth from 2017 to 2024 indicates that AI will likely revolutionize the pharmaceutical and medical sectors. Various pharmaceutical companies have invested, and are continuing to invest, in AI and have collaborated with AI companies to develop essential healthcare tools. The collaboration of DeepMind Technologies, a subsidiary of Google, with the Royal Free London NHS Foundation Trust for assistance with acute kidney injury is an example of this. Major pharmaceutical companies and AI players are detailed in Figure 4 [19].
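As a rough consistency check on the market figures quoted above, the compound annual growth rate implied by US$700 million in 2018 and US$5 billion in 2024 can be computed directly. This assumes steady compound growth over six years; the source's 40% figure refers to 2017 to 2024 and may be defined differently, so the two numbers are only loosely comparable. The function name `cagr` is a hypothetical choice.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Market size in US$ million, taken from the figures quoted in the text.
growth_2018_2024 = cagr(700, 5000, 2024 - 2018)
```

The result is roughly 39% per year, which is in line with the projected growth of about 40% cited in the text.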

Figure 4

Leading pharmaceutical companies and their association with Artificial Intelligence (AI) organizations that are working in fields including oncology, cardiovascular diseases, and central nervous system disorders.

Ongoing challenges in adopting AI: leads on ways to overcome

The entire success of AI depends on the availability of a substantial amount of data because these data are used for the subsequent training provided to the system. Access to data from various database providers can incur extra costs to a company, and the data should also be reliable and high quality to ensure accurate result prediction. Other challenges that prevent full-fledged adoption of AI in the pharmaceutical industry include the lack of skilled personnel to operate AI-based platforms, limited budget for small organizations, apprehension of replacing humans leading to job loss, skepticism about the data generated by AI, and the black box phenomenon (i.e., how the conclusions are reached by the AI platform) [6] .

Automation of certain tasks in drug development, manufacturing, supply chains, clinical trials, and sales will take place with time, but these all fall under the category of ‘narrow AI’, where AI has to be trained using a large volume of data and is thus suited only to a particular task. Therefore, human intervention is mandatory for the successful implementation, development, and operation of the AI platform. However, the fear of unemployment could be a myth given that AI is currently taking over repetitive jobs, while leaving scope for human intelligence to be used for developing more complicated insights and creativity.

Nevertheless, AI has been adopted by several pharmaceutical companies, and it is expected that a revenue of US$2.199 billion will be created by 2022 through AI-based solutions in the pharmaceutical sector, with an investment exceeding US$7.20 billion across 300+ deals between 2013 and 2018 by the pharmaceutical industry [127]. Pharmaceutical organizations need clarity about the potential of AI technology in finding solutions to problems once it has been implemented, along with an understanding of the reasonable goals that can be achieved. Developing skilled data scientists and software engineers with a sound knowledge of AI technology and a clear understanding of the company’s business targets and R&D goals would help to utilize the full potential of the AI platform.

Concluding remarks and prospects

The advancement of AI, along with its remarkable tools, continuously aims to reduce challenges faced by pharmaceutical companies, impacting the drug development process along with the overall lifecycle of the product, which could explain the increase in the number of start-ups in this sector [23] . The current healthcare sector is facing several complex challenges, such as the increased cost of drugs and therapies, and society needs specific significant changes in this area. With the inclusion of AI in the manufacturing of pharmaceutical products, personalized medications with the desired dose, release parameters, and other required aspects can be manufactured according to individual patient need [85] . Using the latest AI-based technologies will not only speed up the time needed for the products to come to the market, but will also improve the quality of products and the overall safety of the production process, and provide better utilization of available resources along with being cost-effective, thereby increasing the importance of automation [128] .

The most significant worry regarding the incorporation of these technologies is the job losses that would follow and the strict regulations needed for the implementation of AI. However, these systems are intended only to make work easier and not to completely replace humans [129]. AI can not only aid quick and hassle-free hit compound identification, but also contribute to suggestions of synthesis routes for these molecules, along with the prediction of the desired chemical structure and an understanding of drug–target interactions and their SAR.

AI can also make major contributions to the further incorporation of the developed drug in its correct dosage form as well as its optimization, in addition to aiding quick decision-making, leading to faster manufacturing of better-quality products along with assurance of batch-to-batch consistency. AI can also contribute to establishing the safety and efficacy of the product in clinical trials, as well as ensuring proper positioning and costing in the market through comprehensive market analysis and prediction. Although there are no drugs currently on the market developed with AI-based approaches and specific challenges remain with regard to the implementation of this technology, it is likely that AI will become an invaluable tool in the pharmaceutical industry in the near future.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in the paper.

Acknowledgments

The authors acknowledge the Department of Pharmaceuticals, Ministry of Chemicals and Fertilizers, Government of India for financial support. R.K.T. would like to acknowledge the Science and Engineering Research Board (Statutory Body Established through an Act of Parliament: SERB Act 2008), Department of Science and Technology, Government of India for a grant (Grant #ECR/2016/001964) and N-PDF for funding (PDF/2016/003329) research in his laboratory.

Biographies

Rakesh K. Tekade, currently an associate professor in NIPER, Ahmedabad, is an academic-researcher with >10 years of teaching and research experience. Dr Tekade’s research group investigates the design, development, and characterization of targeted nanotechnology-based products for the site-specific delivery of therapeutic drugs, siRNA, miRNA, and so on, for the treatment of cancer, diabetes, arthritis, and neurological disorders. He has coauthored >100 peer-reviewed publications in international journals, contributed >50 international reference book chapters, five invited editorial articles, and four patent applications. Dr Tekade is an editor in chief of a Book Series entitled Advances in Pharmaceutical Product Development and Research Series.

Debleena Paul received a BSc in pharmacy from Maulana Abul Kalam Azad University of Technology and is currently pursuing her MS Pharm at NIPER-Ahmedabad under the guidance of Rakesh K. Tekade. Her research focuses on the development of in situ gelling dusting powder for wound dressing applications.

Kiran Kalia is a professor of pharmacology and Director of NIPER, Ahmedabad; she is also a professor in lien in the Department of Biosciences, Sardar Patel University. She has research experience spanning over 35 years and has mentored several MSc and PhD students. She was an awardee of the Indian National Science Academy (INSA) Research fellowship and has received several CSIR Fellowships. Professor Kiran is an editorial board and review committee member of several international journals. Her research interests encompass proteomic markers for diabetic nephropathy from urine, genomic markers of the susceptibility of diabetic retinopathy, genomic alterations in oral cancer, and environmental biotechnology & toxicology studies.

COMMENTS

  1. Drug Development Research

    Drug Development Research is an interdisciplinary pharmacology journal publishing papers and reviews covering all areas of drug development, including medicinal and process chemistry, biotechnology and biopharmaceuticals, toxicology, drug delivery, formulation, pharmacokinetics, and clinical trial reviews. Since 1981, we serve a diverse research community including pharmacologists, pharmacists ...

  2. Nature Reviews Drug Discovery

    Nature Reviews Drug Discovery is a journal for people interested in drug discovery and development. It features reviews, news, analysis and research highlights.

  3. A comprehensive review of discovery and development of drugs discovered

    1. Introduction. Drug discovery and development is a process that involves the identification, optimization, pre-clinical and clinical studies to extensively test and characterize the new drug molecule for its pharmacological properties and toxicity profile (Sleire et al., 2017).After the successful completion of the Human Genome project in 2003, a rough draft of the human genome has been ...

  4. Recent articles and discoveries in Drug Development

    Find the latest research papers and news in Drug Development. Read stories and opinions from top researchers in our research community. ... Uncover the latest and most impactful research in Drug Development. Explore pioneering discoveries, insightful ideas and new methods from leading researchers in the field.

  5. (PDF) Recent Advances in Drug Discovery: Innovative ...

    Abstract. Drug discovery is a dynamic field constantly evolving with the aim of identifying novel. therapeutic agents to combat various diseases. In this review, we present an overview of recent ...

  6. Research in the Field of Drug Design and Development

    In general, the process of drug development consists of three main stages: ... In our paper, we focused especially on small-molecule drugs, due to the significant amount of information collected on this extremely important topic, and the experiences of the authors, and the partners of the international project in this field of research ...

  7. Drug discovery and development: Role of basic biological research

    This article provides a brief overview of the processes of drug discovery and development. Our aim is to help scientists whose research may be relevant to drug discovery and/or development to frame their research report in a way that appropriately places their findings within the drug discovery and development process and thereby support effective translation of preclinical research to humans.

  8. Drug Design and Discovery: Principles and Applications

    Drug development and discovery includes preclinical research on cell-based and animal models and clinical trials on humans, and finally move forward to the step of obtaining regulatory approval in order to market the drug. ... and rotarod neurotoxicity tests. The study outcomes are presented in their paper .

  9. An overview of drug discovery and development

    Abstract. A new medicine will take an average of 10-15 years and more than US$2 billion before it can reach the pharmacy shelf. Traditionally, drug discovery relied on natural products as the main source of new drug entities, but was later shifted toward high-throughput synthesis and combinatorial chemistry-based development.

  10. The Stages of Drug Discovery and Development Process

    involves identification of a drug chemical. therapeutically useful in treating and management of. a disease condition. Typically, researchers find out new. drugs throug h new visions into a ...

  11. Drug development

    Drug development describes the process of developing a new drug that effectively targets a specific weakness in a cell. This process involves specific pre-clinical development and testing ...

  12. Deep learning in drug discovery: an integrative review and future

    Recently, using artificial intelligence (AI) in drug discovery has received much attention since it significantly shortens the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly being used in all stages of drug development as DL technology advances, and drug-related data grows. Therefore, this paper presents a systematic Literature review (SLR) that ...

  13. CADD, AI and ML in drug discovery: A comprehensive review

    Computer-aided drug design (CADD) is an emerging field that has drawn a lot of interest because of its potential to expedite and lower the cost of the drug development process. Drug discovery research is expensive and time-consuming, and it frequently took 10-15 years for a drug to be commercially available.

  14. Machine learning applications in drug development

    Nonetheless, the field of research seems ripe enough to be applied in at least semi-automated drug development pipelines; there is a growing number of papers published on both relevant data processing, and on algorithms applied for biomedical purposes.

  15. Design of experiments (DoE) in pharmaceutical development

    Among various mathematical modeling approaches, Design of Experiments (DoE) is extensively used for the implementation of QbD in both research and industrial settings. In QbD, product and process understanding is the key enabler of assuring quality in the final product. Knowledge is achieved by establishing models correlating the inputs with ...

  16. Chemistry and Pharmacology of Drug Discovery

    Case studies of 20 successful FDA-approved drugs, from biological rationale to clinical efficacy studies and state-of-the-art applications Chemistry and Pharmacology of Drug Discovery illustrates how chemistry, biology, pharmacokinetics, and a host of disciplines come together to produce successful medicines, discussing a total of 20 drugs that are all FDA-approved post 2021&#8212;some of ...

  17. The Pharmaceutical Industry and the Future of Drug Development

    This greater role for the state was paralleled on both sides of the Atlantic, with increasing government regulation of medicine production. The Pharmaceutical Industry and the Future of Drug ...

  18. White Papers

    Tufts CSDD and Janssen Research & Development (JRD) collaborated on a pilot study to develop and test a model capable of predicting the likelihood of marketing approval for oncology NMEs and NBEs. The study assessed the predictive power of select new compound characteristics. Access a copy of this White Paper Report.

  19. Drug Design—Past, Present, Future

    Drug design is a complex pharmaceutical science with a long history. Many achievements have been made in the field of drug design since the end of 19th century, when Emil Fisher suggested that the drug-receptor interaction resembles the key and lock interplay. Gradually, drug design has been transformed into a coherent and well-organized ...

  20. Artificial Intelligence and Machine Learning for Drug Development

    FDA recognizes the increased use of AI/ML throughout the drug development life cycle and across a range of therapeutic areas. In fact, FDA has seen a significant increase in the number of drug and ...

  21. Frontiers

    Editorial: Supporting Pediatric Drug Development: From Basic Research to Clinical Studies and Technological Advancements ... Keywords: Pediatric drug development, Children, Pediatric treatment, EPTRI, preclincical pediatric research Received: 07 Aug 2024; Accepted: 12 Aug 2024.

  22. Drug development Research Papers

    The open-source model permits the sharing of data, resources, compounds, clinical molecules, small libraries, and screening platforms to cost-effectively advance old drugs and/or candidates into clinical re-development. Clearly, at the core of drug-repurposing activities is collaboration, in many cases progressing beyond the open sharing of ...

  23. What is Pharmaceutical Research and Development?

    In 2022, the combined research and development (R&D) expenditure of the U.S. pharmaceutical industry amounted to more than $100 billion. While this is a significantly expansive market ...

  24. Artificial intelligence and machine learning in drug discovery and

    One study that demonstrates the application of machine learning in the area of drug discovery was done by Margulis and colleagues [13], which looked at how intensely bitter molecules can be identified with the help of machine learning in the early stages of drug development. The aim was to determine whether a certain machine learning algorithm could be used as a substitute for animal testing to predict ...

  25. Clinical Drug Development

    Clinical drug development has evolved dramatically over time and the field continues to advance today. The methodology of clinical trials has evolved with new techniques being introduced. The design and statistical aspects have become more sophisticated in parallel with the increased use of technology and the introduction of new drug modalities.

  26. AI in Breast Cancer Drug Development and Trials

    The final installment of BCRF's series on AI in breast cancer focuses on how it can bring new treatments to patients faster. Artificial Intelligence (AI) is already having a transformative impact in healthcare, including on breast cancer detection and diagnosis, risk assessment, and prognosis. And there is much excitement building around AI's potential role in drug discovery and clinical ...

  27. Kynurenine pathway in type 2 diabetes: Role of metformin

    These findings indicated a strong controversy regarding the role of KP in T2D. Therefore, the objectives of this mini review were to clarify how KP induces the development of IR and T2D. In addition, this review aimed to find the mechanistic role of antidiabetic drug metformin on the KP, and how KP modulators affect the pathogenesis of T2D.

  28. Eli Lilly's billions: Can the company keep inventing drugs at ...

    On Tuesday, Eli Lilly unveiled a towering 346,000-square-foot laboratory in Boston's Seaport District, a building that will house 500 of the company's scientists — 300 of whom are still to ...

  29. Has your paper been used to train an AI model? Almost certainly

    Experts say that, if a research paper hasn't yet been used to train a large language model (LLM), it probably will be soon. Researchers are exploring technical ways for authors to spot if their ...

  30. Artificial intelligence in drug discovery and development

    The use of artificial intelligence (AI) has been increasing in various sectors of society, particularly the pharmaceutical industry. In this review, we highlight the use of AI in diverse sectors of the pharmaceutical industry, including drug discovery and development, drug repurposing, improving pharmaceutical productivity, and clinical trials, among others; such use reduces the human workload ...