Drug discovery is a complex, time-consuming, and costly process, traditionally requiring over a decade and billions of dollars to bring a new drug to market. Recent advances in artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), have transformed multiple stages of the drug discovery pipeline. AI-driven approaches enable faster target identification, improved hit-to-lead optimization, accurate prediction of molecular properties, and more efficient clinical trial design. This review summarizes the core AI methodologies used in drug discovery, highlights key applications across the pipeline, discusses current challenges, and outlines future directions for AI-enabled pharmaceutical research.
The traditional drug discovery process involves target identification, hit discovery, lead optimization, preclinical testing, and clinical trials. High attrition rates, limited biological understanding, and experimental costs have motivated the adoption of computational approaches. AI has emerged as a powerful tool due to its ability to learn complex patterns from large-scale biological and chemical datasets. With the availability of high-throughput screening data, omics data, and chemical libraries, AI is increasingly integrated into modern drug discovery workflows.
The vast chemical space, comprising more than 10^60 molecules, fosters the development of a large number of drug molecules [19]. However, the lack of advanced technologies limits the drug development process, making it a time-consuming and expensive task, which can be addressed by using AI [15]. AI can recognize hit and lead compounds, and provide quicker validation of the drug target and optimization of the drug structure design [19, 20]. Different applications of AI in drug discovery are depicted in Figure
Despite its advantages, AI faces some significant data challenges, such as the scale, growth, diversity, and uncertainty of the data. The data sets available for drug development in pharmaceutical companies can involve millions of compounds, and traditional ML tools might not be able to deal with these types of data. Quantitative structure-activity relationship (QSAR)-based computational models can quickly predict large numbers of compounds or simple physicochemical parameters, such as log P or log D. However, these models remain some way from predicting complex biological properties, such as the efficacy and adverse effects of compounds. In addition, QSAR-based models face problems such as small training sets, experimental data error in training sets, and lack of experimental validation. To overcome these challenges, recently developed AI approaches, such as DL and relevant modeling studies, can be implemented for safety and efficacy evaluations of drug molecules based on big data modeling and analysis. In 2012, Merck supported a QSAR ML challenge to observe the advantages of DL in the drug discovery process in the pharmaceutical industry. DL models showed significant predictivity compared with traditional ML approaches for 15 absorption, distribution, metabolism, excretion, and toxicity (ADMET) data sets of drug candidates.
The virtual chemical space is enormous and suggests a geographical map of molecules by illustrating the distributions of molecules and their properties. The idea behind the illustration of chemical space is to collect positional information about molecules within the space to search for bioactive compounds; virtual screening (VS) then helps to select appropriate molecules for further testing.
Several chemical spaces are open access, including PubChem, ChemBank, DrugBank, and ChemDB. Numerous in silico methods for virtually screening compounds from virtual chemical spaces, along with structure- and ligand-based approaches, provide better profile analysis, faster elimination of nonlead compounds, and selection of drug molecules at reduced expenditure. Drug design algorithms, such as coulomb matrices and molecular fingerprint recognition, consider the physical, chemical, and toxicological profiles to select a lead compound. Various parameters, such as predictive models, the similarity of molecules, the molecule generation process, and the application of in silico approaches, can be used to predict the desired chemical structure of a compound. Pereira et al. presented a new system, DeepVS, for the docking of 40 receptors and 2950 ligands, which showed exceptional performance when 95,000 decoys were tested against these receptors. Another approach applied a multiobjective automated replacement algorithm to optimize the potency profile of a cyclin-dependent kinase-2 inhibitor by assessing its shape similarity, biochemical activity, and physicochemical properties.
QSAR modeling tools have been utilized for the identification of potential drug candidates and have evolved into AI-based QSAR approaches, such as linear discriminant analysis (LDA), support vector machines (SVMs), random forest (RF), and decision trees, which can be applied to speed up QSAR analysis. King et al. found a negligible statistical difference when the ability of six AI algorithms to rank anonymous compounds in terms of biological activity was compared with that of traditional approaches.
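Molecular similarity, mentioned above as one of the parameters used in compound selection, is often computed on binary fingerprints with the Tanimoto coefficient. The following is a minimal sketch, assuming each fingerprint is already available as a set of "on" bit indices; the compound names and bit sets are hypothetical and are not taken from any of the cited studies.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) coefficient between two fingerprints,
    each given as a set of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints for a query molecule and a tiny library.
query = {1, 4, 7, 9}
library = {
    "cmpd_A": {1, 4, 7, 8},
    "cmpd_B": {2, 3, 5},
    "cmpd_C": {1, 4, 7, 9},
}

# Rank library compounds from most to least similar to the query.
ranked = sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)
print(ranked)
```

In practice the fingerprints would come from a cheminformatics toolkit rather than being written by hand; the ranking step stays the same.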
Applications Across the Drug Discovery Pipeline
Target Identification and Validation
AI models analyze genomic, transcriptomic, proteomic, and biomedical literature data to identify disease-associated targets. Network-based AI approaches help uncover complex disease mechanisms and drug–target interactions.
Role of AI in Target Identification
AI techniques integrate and analyze large-scale datasets generated from genomics, transcriptomics, proteomics, and metabolomics studies. Machine learning algorithms identify disease-associated genes by recognizing patterns and correlations that are often undetectable through conventional statistical approaches. Network-based models further assist in mapping protein–protein interactions and signaling pathways, helping to prioritize targets that are central to disease networks.
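As a simple illustration of network-based prioritization, the sketch below ranks proteins by degree centrality in a protein–protein interaction network. It assumes the network is supplied as a list of undirected edges; the protein labels are hypothetical, and degree centrality is only one of several possible measures of how central a target is.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


# Hypothetical interaction network (undirected edges).
edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P4", "P5")]

centrality = degree_centrality(edges)
top_target = max(centrality, key=centrality.get)
print(top_target, centrality[top_target])
```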
Natural Language Processing (NLP) tools are widely used to extract target–disease relationships from vast biomedical literature, patents, and clinical trial databases. This automated literature mining accelerates hypothesis generation and supports evidence-based target selection.
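A very reduced form of such literature mining is sentence-level co-occurrence counting. The sketch below assumes that the text has already been split into sentences and that lists of target and disease terms are given; the sentences and the names GeneA, GeneB, disease X, and disease Y are placeholders. Real NLP pipelines use entity recognition and relation extraction rather than plain substring matching.

```python
from collections import Counter
from itertools import product


def cooccurrence(sentences, targets, diseases):
    """Count how often each (target, disease) pair appears in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        text = sentence.lower()
        found_targets = [t for t in targets if t.lower() in text]
        found_diseases = [d for d in diseases if d.lower() in text]
        counts.update(product(found_targets, found_diseases))
    return counts


# Placeholder sentences standing in for mined abstracts.
sentences = [
    "GeneA is overexpressed in disease X tissue.",
    "Inhibition of GeneA slowed disease X progression.",
    "GeneB variants were reported in disease Y.",
    "GeneA and GeneB were both measured.",
]

pair_counts = cooccurrence(sentences, ["GeneA", "GeneB"], ["disease X", "disease Y"])
print(pair_counts.most_common())
```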
AI-Assisted Target Validation
Once potential targets are identified, AI supports target validation by predicting functional relevance and therapeutic feasibility. Deep learning models assess target druggability by evaluating structural features, binding site accessibility, and similarity to known drug targets. AI also predicts off-target effects and safety concerns early in development.
Integrative AI platforms combine biological data with clinical and real-world evidence to validate whether modulation of a target is likely to produce a meaningful therapeutic effect. This data-driven validation reduces the risk of late-stage failure and improves confidence in target selection.
Advantages of AI-Based Target Identification and Validation
AI-driven approaches significantly reduce time and cost by automating data analysis and prioritization. They enable the discovery of novel and non-obvious targets, enhance prediction accuracy, and support personalized medicine by identifying patient-specific targets based on molecular profiles.
Virtual Screening and Hit Identification
AI-driven virtual screening significantly reduces the search space by predicting ligand–target binding affinity, outperforming traditional docking methods in speed and scalability.
AI-Driven Virtual Screening
Traditional virtual screening methods rely on molecular docking and scoring functions, which are often time-consuming and limited in predictive accuracy. AI-based virtual screening employs machine learning (ML) and deep learning (DL) algorithms to predict ligand–target interactions more efficiently. Models such as convolutional neural networks (CNNs), graph neural networks (GNNs), and support vector machines (SVMs) analyze molecular structures and protein features to estimate binding affinity and activity.
Ligand-based virtual screening uses AI models trained on known active and inactive compounds to identify new molecules with similar physicochemical and biological properties. Structure-based virtual screening integrates protein structural information to improve interaction predictions, even for large and diverse compound libraries.
Hit Identification
AI accelerates hit identification by ranking compounds based on predicted activity, selectivity, and drug-likeness. Deep learning models evaluate molecular descriptors and fingerprints to prioritize high-quality hits while minimizing false positives. AI-assisted screening can analyze millions of compounds in a fraction of the time required for experimental high-throughput screening.
Additionally, AI helps filter compounds based on toxicity and ADMET properties at early stages, ensuring that identified hits have favorable safety and pharmacokinetic profiles.
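The combination of drug-likeness filtering and activity-based ranking can be sketched as follows. The example uses Lipinski's rule of five as the drug-likeness criterion, which is an assumption made for illustration and is not a method prescribed here. The compounds, their descriptor values, and the `predicted_activity` scores are hypothetical; in a real workflow the scores would come from a trained model.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    name: str
    mol_weight: float          # molecular weight in Da
    logp: float                # octanol-water partition coefficient
    h_donors: int              # hydrogen-bond donors
    h_acceptors: int           # hydrogen-bond acceptors
    predicted_activity: float  # hypothetical model score, higher is better


def lipinski_violations(c):
    """Number of rule-of-five criteria the compound breaks."""
    return sum([c.mol_weight > 500, c.logp > 5, c.h_donors > 5, c.h_acceptors > 10])


def prioritize(compounds, max_violations=1):
    """Drop compounds with too many violations, then rank by predicted activity."""
    kept = [c for c in compounds if lipinski_violations(c) <= max_violations]
    return sorted(kept, key=lambda c: c.predicted_activity, reverse=True)


# Hypothetical screening results.
compounds = [
    Compound("cmpd_A", 350.0, 2.1, 2, 5, 0.72),
    Compound("cmpd_B", 620.0, 6.3, 1, 8, 0.95),
    Compound("cmpd_C", 480.0, 4.8, 3, 9, 0.88),
    Compound("cmpd_D", 510.0, 3.0, 2, 6, 0.60),
]

hits = [c.name for c in prioritize(compounds)]
print(hits)
```

Here cmpd_B has the highest predicted activity but is removed because it breaks two criteria, which illustrates why filtering before ranking changes the hit list.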
Advantages of AI in Virtual Screening and Hit Identification
AI-driven approaches significantly reduce computational cost, increase screening speed, and improve hit rates. They allow efficient exploration of vast chemical spaces and enable identification of novel chemical scaffolds that may not be detected using conventional methods.
Lead Optimization
Predictive AI models assist in optimizing pharmacokinetic and pharmacodynamic properties, including absorption, distribution, metabolism, excretion, and toxicity (ADMET).
Role of AI in Lead Optimization
AI models analyze structure–activity relationship (SAR) data to predict how chemical modifications influence potency and selectivity. Machine learning algorithms such as random forest, support vector machines, and deep neural networks evaluate molecular descriptors and fingerprints to guide rational optimization strategies.
Deep learning architectures, including graph neural networks (GNNs), capture complex molecular structures and interactions, allowing precise prediction of binding affinity and biological activity. These models help medicinal chemists prioritize structural modifications with the highest likelihood of success.
Generative AI for Molecular Design
Generative AI techniques such as variational autoencoders (VAEs), generative adversarial networks (GANs), and reinforcement learning are increasingly used to design novel chemical structures. These models generate optimized molecules that satisfy multiple objectives simultaneously, including potency, solubility, metabolic stability, and synthetic feasibility.
Reinforcement learning approaches iteratively refine molecular structures by rewarding desirable properties, enabling efficient exploration of chemical space beyond traditional trial-and-error methods.
Multi-Parameter Optimization
AI facilitates multi-parameter optimization by balancing efficacy, safety, and pharmacokinetic properties. Predictive models simultaneously assess parameters such as lipophilicity, solubility, permeability, metabolic stability, and toxicity, reducing the risk of late-stage failures.
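One common way to frame the balancing of several properties is Pareto optimality: a candidate is kept if no other candidate is at least as good on every property and strictly better on one. The sketch below assumes that every property has been scaled so that higher is better; the lead names and scores are hypothetical, and Pareto filtering is offered as one possible formulation rather than the method used by any particular platform.

```python
def dominates(a, b):
    """True if score tuple a is at least as good as b everywhere
    and strictly better in at least one objective (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Names of candidates that are not dominated by any other candidate."""
    return [
        name
        for name, scores in candidates.items()
        if not any(
            dominates(other_scores, scores)
            for other, other_scores in candidates.items()
            if other != name
        )
    ]


# Hypothetical predicted scores: (potency, solubility, metabolic stability).
candidates = {
    "lead_1": (0.9, 0.3, 0.5),
    "lead_2": (0.7, 0.8, 0.6),
    "lead_3": (0.6, 0.7, 0.5),
    "lead_4": (0.9, 0.3, 0.4),
}

front = pareto_front(candidates)
print(front)
```

The surviving candidates represent different trade-offs (one favors potency, the other solubility), and the choice among them is left to the medicinal chemist.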
Advantages of AI-Driven Lead Optimization
AI significantly reduces development time and cost by automating SAR analysis and molecular design. It enhances decision-making accuracy, enables discovery of novel scaffolds, and improves success rates in advancing optimized leads toward preclinical development.
Drug Repurposing
AI enables rapid identification of new therapeutic indications for existing drugs by integrating chemical, biological, and clinical data, as demonstrated during the COVID-19 pandemic.
Role of AI in Drug Repurposing
AI integrates chemical, biological, genomic, and clinical data to uncover novel drug–disease relationships. Machine learning models analyze gene expression profiles, protein–protein interaction networks, and pathway data to predict how existing drugs can modulate disease-related biological processes. These data-driven approaches enable the identification of repurposing opportunities that may not be evident through conventional methods.
Natural Language Processing (NLP) techniques further support drug repurposing by mining scientific literature, clinical trial reports, electronic health records, and adverse event databases. NLP-based models extract valuable insights on drug effects, side effects, and disease associations, facilitating hypothesis generation for new indications.
Network-Based and Similarity Approaches
AI-driven network pharmacology models construct drug–target–disease interaction networks to identify potential repositioning candidates. Similarity-based algorithms compare molecular structures, gene expression signatures, or pharmacological profiles to match existing drugs with diseases exhibiting similar biological characteristics.
Deep learning models, including graph neural networks, enhance prediction accuracy by capturing complex relationships within these networks and prioritizing high-confidence repurposing candidates.
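A signature comparison of this kind can be sketched with cosine similarity. The example assumes a "signature reversal" framing, in which a drug whose expression signature is most anti-correlated with the disease signature is ranked first; this framing is an assumption added for illustration, and the drug names and signature values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical log-fold-change signatures over the same four genes.
disease_signature = [1.0, -1.0, 0.5, 0.0]
drug_signatures = {
    "drug_X": [-1.0, 1.0, -0.5, 0.0],  # opposes the disease signature
    "drug_Y": [1.0, -1.0, 0.5, 0.0],   # mimics the disease signature
    "drug_Z": [0.0, 0.0, 0.0, 1.0],    # unrelated
}

# Most negative similarity first: strongest predicted reversal.
ranked_drugs = sorted(
    drug_signatures,
    key=lambda name: cosine(disease_signature, drug_signatures[name]),
)
print(ranked_drugs)
```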
Advantages of AI-Based Drug Repurposing
AI-enabled drug repurposing significantly reduces risk by leveraging known safety and pharmacokinetic data. It enables rapid response to emerging diseases, supports personalized treatment strategies, and improves the efficiency of translational research.
Clinical Trial Design
AI supports patient stratification, biomarker discovery, and trial outcome prediction, improving trial efficiency and success rates.
Role of AI in Clinical Trial Design
AI algorithms analyze large volumes of clinical, genomic, and real-world data to support data-driven trial planning. Machine learning models evaluate historical trial data and electronic health records (EHRs) to identify suitable patient populations, define inclusion and exclusion criteria, and predict patient responses to treatment.
AI also enables adaptive trial designs by continuously analyzing incoming trial data and suggesting protocol modifications, such as dose adjustments or cohort expansion, to improve trial outcomes.
Patient Recruitment and Stratification
Patient recruitment is a major challenge in clinical trials. AI-based tools use NLP and predictive analytics to screen EHRs and medical databases, identifying eligible participants more efficiently. AI also supports patient stratification by grouping participants based on genetic, molecular, or clinical characteristics, enhancing precision medicine and reducing variability in trial results.
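The screening and stratification steps can be illustrated with a rule-based sketch. It assumes that patient records have already been extracted into structured dictionaries; the field names, the criteria (`condition_x`, `drug_q`, the age range), and the records themselves are hypothetical. Predictive or NLP-based tools would replace the hand-written rules in practice.

```python
def is_eligible(patient, min_age=18, max_age=75,
                required_dx="condition_x", excluded_meds=frozenset({"drug_q"})):
    """Apply simple inclusion and exclusion criteria to one patient record."""
    return (
        min_age <= patient["age"] <= max_age
        and required_dx in patient["diagnoses"]
        and not excluded_meds & set(patient["medications"])
    )


def stratify(patients, key):
    """Group patient ids by the value of one characteristic."""
    strata = {}
    for patient in patients:
        strata.setdefault(patient[key], []).append(patient["id"])
    return strata


# Hypothetical structured records.
records = [
    {"id": "p1", "age": 54, "diagnoses": ["condition_x"], "medications": [], "biomarker": "positive"},
    {"id": "p2", "age": 80, "diagnoses": ["condition_x"], "medications": [], "biomarker": "positive"},
    {"id": "p3", "age": 40, "diagnoses": ["condition_x"], "medications": ["drug_q"], "biomarker": "negative"},
    {"id": "p4", "age": 33, "diagnoses": ["condition_x"], "medications": ["drug_r"], "biomarker": "negative"},
    {"id": "p5", "age": 61, "diagnoses": ["condition_y"], "medications": [], "biomarker": "positive"},
    {"id": "p6", "age": 47, "diagnoses": ["condition_x"], "medications": [], "biomarker": "positive"},
]

eligible = [p for p in records if is_eligible(p)]
strata = stratify(eligible, "biomarker")
print(strata)
```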
Outcome Prediction and Risk Assessment
AI models predict clinical trial outcomes by assessing efficacy, safety, and potential adverse events. Predictive analytics help identify high-risk patients, anticipate trial failures, and optimize endpoints, thereby reducing late-stage attrition and development costs.
Advantages of AI in Clinical Trial Design
AI-driven clinical trial design improves trial efficiency, reduces timelines, enhances patient safety, and increases the likelihood of regulatory success. It supports personalized treatment strategies and facilitates evidence-based decision-making throughout the trial lifecycle.
Advantages of AI in Drug Discovery
Reduced cost and development time
Improved prediction accuracy
Ability to handle high-dimensional, multimodal data
Discovery of novel chemical space beyond human intuition
1. Reduced Time and Cost
AI significantly shortens the drug discovery timeline by automating data analysis, virtual screening, and lead optimization. Tasks that traditionally take years can now be completed in months, leading to substantial reductions in research and development costs.
2. Improved Target Identification
AI analyzes large-scale biological datasets to identify and validate novel drug targets with higher accuracy. This reduces the likelihood of selecting ineffective targets and lowers the risk of late-stage clinical failure.
3. Enhanced Hit and Lead Discovery
AI-driven virtual screening enables rapid evaluation of millions of compounds, increasing hit rates and identifying novel chemical scaffolds. AI also improves structure–activity relationship (SAR) analysis, leading to better-quality lead compounds.
4. Better Prediction of ADMET Properties
Machine learning models accurately predict absorption, distribution, metabolism, excretion, and toxicity (ADMET) profiles at early stages. Early toxicity prediction reduces late-stage failures and improves overall drug safety.
5. Support for Drug Repurposing
AI accelerates drug repurposing by identifying new therapeutic indications for existing drugs using biological, chemical, and clinical data. This approach reduces development risk and allows faster entry into clinical trials.
6. Improved Clinical Trial Efficiency
AI optimizes clinical trial design through better patient selection, stratification, and outcome prediction. This enhances trial success rates, reduces patient dropout, and improves regulatory approval prospects.
7. Data Integration and Decision Support
AI integrates diverse data sources such as genomics, proteomics, chemical libraries, and real-world evidence. This holistic analysis supports informed decision-making and reduces human bias.
8. Personalized and Precision Medicine
AI enables personalized drug discovery by identifying patient-specific targets and predicting individual treatment responses. This supports the development of precision medicines tailored to specific populations.
Challenges and Limitations
Despite its promise, AI in drug discovery faces several challenges:
Limited availability of high-quality, unbiased data
Lack of model interpretability and explainability
Poor generalization across biological domains
Regulatory and ethical concerns
Integration with experimental validation
1. Data Quality and Availability
AI models rely heavily on large, high-quality, and well-annotated datasets. In drug discovery, available data are often incomplete, noisy, biased, or proprietary. Poor data quality can lead to inaccurate predictions and unreliable outcomes, limiting the effectiveness of AI-driven approaches.
2. Lack of Data Standardization
Biological and chemical data originate from diverse sources and are generated using different experimental protocols. The absence of standardized data formats and reporting practices complicates data integration and model training, reducing reproducibility and model robustness.
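A small example of such standardization is converting potency values reported in different concentration units to a common scale. The sketch below converts IC50 values to pIC50, defined as the negative base-10 logarithm of the IC50 in mol/L. The unit table and the sample measurements are hypothetical, and real datasets need further harmonization (assay type, qualifiers such as ">", duplicate handling).

```python
import math

# Conversion factors from the reported unit to mol/L.
UNIT_TO_MOLAR = {"M": 1.0, "mM": 1e-3, "uM": 1e-6, "nM": 1e-9, "pM": 1e-12}


def to_pic50(value, unit):
    """Convert an IC50 given in the stated unit to pIC50 = -log10(IC50 in mol/L)."""
    if value <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(value * UNIT_TO_MOLAR[unit])


# Hypothetical measurements reported in mixed units by different sources.
measurements = [(100.0, "nM"), (0.1, "uM"), (2.5, "uM")]
pic50_values = [round(to_pic50(value, unit), 2) for value, unit in measurements]
print(pic50_values)
```

The first two measurements describe the same potency in different units and map to the same pIC50, which is the point of normalizing before model training.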
3. Model Interpretability and Transparency
Many AI models, particularly deep learning architectures, function as “black boxes,” making it difficult to interpret their predictions. Lack of explainability hinders trust among researchers and regulators, especially in high-stakes decisions related to safety and efficacy.
4. Limited Generalizability
AI models trained on specific datasets may perform poorly when applied to new targets, diseases, or chemical spaces. Overfitting and limited chemical diversity in training data reduce the ability of models to generalize across different drug discovery scenarios.
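One way to probe generalization is to hold out whole chemical groups rather than random compounds, so that the test set contains chemistry the model has not seen. The sketch below assumes each record already carries a scaffold label (in practice this would be computed by a cheminformatics toolkit); the records and the deterministic smallest-groups-first assignment rule are illustrative choices.

```python
from collections import defaultdict


def group_split(records, group_key, test_fraction=0.25):
    """Split records so that all members of a group land in the same partition.
    Smallest groups are assigned to the test set until it reaches the target size."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record)
    target = test_fraction * len(records)
    train, test = [], []
    for _, members in sorted(groups.items(), key=lambda kv: (len(kv[1]), kv[0])):
        if len(test) < target:
            test.extend(members)
        else:
            train.extend(members)
    return train, test


# Hypothetical compounds with precomputed scaffold labels.
records = [
    {"id": "c1", "scaffold": "s1"}, {"id": "c2", "scaffold": "s1"},
    {"id": "c3", "scaffold": "s1"}, {"id": "c4", "scaffold": "s1"},
    {"id": "c5", "scaffold": "s2"}, {"id": "c6", "scaffold": "s2"},
    {"id": "c7", "scaffold": "s3"}, {"id": "c8", "scaffold": "s4"},
]

train, test = group_split(records, "scaffold")
train_scaffolds = {r["scaffold"] for r in train}
test_scaffolds = {r["scaffold"] for r in test}
print(sorted(train_scaffolds), sorted(test_scaffolds))
```

A large gap between performance on a random split and on a group-wise split is a common sign that a model is memorizing chemical series rather than generalizing.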
5. Computational and Technical Constraints
AI-driven drug discovery requires substantial computational resources, specialized infrastructure, and technical expertise. High costs and limited access to advanced computing facilities may restrict adoption, particularly in academic and small-scale research settings.
6. Integration with Experimental Validation
AI predictions must be experimentally validated, which remains time-consuming and expensive. Discrepancies between in silico predictions and in vitro or in vivo results can limit confidence in AI-generated outcomes.
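Agreement between predictions and experimental results is often summarized with sensitivity and specificity. The sketch below assumes binary active/inactive labels for both the in silico prediction and the experimental outcome; the label lists are hypothetical.

```python
def confusion_counts(predicted, observed):
    """Return (TP, FP, FN, TN) for paired boolean predictions and observations."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p and o)
    fp = sum(1 for p, o in pairs if p and not o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Fraction of experimentally active compounds that were predicted active."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of experimentally inactive compounds that were predicted inactive."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Hypothetical labels: True = active.
predicted = [True, True, False, True, False, False, True, False]
observed = [True, False, False, True, True, False, True, False]

tp, fp, fn, tn = confusion_counts(predicted, observed)
print(sensitivity(tp, fn), specificity(tn, fp))
```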
7. Regulatory and Ethical Challenges
The use of AI in drug discovery raises regulatory concerns regarding data privacy, model validation, and accountability. Regulatory agencies require transparent and reproducible evidence, which can be challenging to provide with complex AI models.
8. Bias and Ethical Concerns
Bias in training data can lead to biased predictions, potentially affecting drug safety and efficacy across different populations. Ethical concerns related to data usage, patient privacy, and fairness must be carefully addressed.
Future Directions
Future research is expected to focus on explainable AI, integration of physics-based and data-driven models, multimodal learning, and tighter coupling of AI predictions with automated laboratory experiments. Collaborative efforts between academia, industry, and regulatory agencies will be crucial for translating AI innovations into approved therapeutics.
1. Integration of Multi-Omics Data
Future AI systems will increasingly integrate multi-omics datasets, including genomics, transcriptomics, proteomics, metabolomics, and epigenomics. This holistic data integration will enable more accurate target identification, disease stratification, and personalized drug discovery approaches.
2. Explainable and Interpretable AI
Developing explainable AI (XAI) models will be a major focus to address the “black-box” nature of current deep learning methods. Transparent models will enhance trust among researchers and regulatory agencies by providing clear rationale for predictions related to target selection, toxicity, and efficacy.
3. Generative AI and De Novo Drug Design
Advances in generative AI, including large language models, graph-based models, and reinforcement learning, will enable efficient de novo design of drug molecules. These systems will generate optimized compounds with improved efficacy, safety, and synthetic feasibility, accelerating lead discovery and optimization.
4. AI-Driven Automation and Closed-Loop Systems
Future drug discovery platforms will integrate AI with laboratory automation to create closed-loop systems. These systems will continuously design, synthesize, test, and optimize compounds with minimal human intervention, significantly reducing development timelines.
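The design, test, and optimize cycle can be sketched as a toy loop. In this sketch a compound is reduced to a single numeric design parameter, and `simulated_assay` is a hypothetical stand-in for synthesis and measurement with its best response at 3.0. A real closed-loop platform would use a learned model to propose candidates and robotic experiments to test them.

```python
import random


def simulated_assay(x):
    """Hypothetical stand-in for a lab measurement; best response at x = 3.0."""
    return -(x - 3.0) ** 2


def closed_loop(start, rounds=20, batch=5, step=0.5, seed=0):
    """Repeat: propose variants near the current best, test them, keep any improvement."""
    rng = random.Random(seed)
    best_x, best_y = start, simulated_assay(start)
    history = [best_y]
    for _ in range(rounds):
        # Design: propose a batch of variants around the current best candidate.
        proposals = [best_x + rng.uniform(-step, step) for _ in range(batch)]
        # Make and test: "measure" every proposal.
        results = [(simulated_assay(x), x) for x in proposals]
        top_y, top_x = max(results)
        # Learn: adopt the best proposal only if it improves on the current best.
        if top_y > best_y:
            best_x, best_y = top_x, top_y
        history.append(best_y)
    return best_x, history


best_x, history = closed_loop(start=0.0)
print(best_x, history[-1])
```

Because a proposal is adopted only when it improves on the current best, the recorded best response never decreases from one round to the next.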
5. Improved Prediction of Clinical Success
AI models will increasingly incorporate real-world evidence, electronic health records, and clinical trial data to better predict clinical outcomes. This will reduce late-stage failures and improve the probability of regulatory approval.
6. Personalized and Precision Medicine
AI will enable personalized drug discovery by identifying patient-specific targets and predicting individual treatment responses. This approach will support the development of precision therapies tailored to genetic and molecular profiles.
7. Enhanced Drug Repurposing and Pandemic Preparedness
AI-driven drug repurposing will continue to expand, enabling rapid identification of therapeutic options for emerging diseases. This capability will strengthen global preparedness for pandemics and public health emergencies.
8. Regulatory Acceptance and Standardization
Future efforts will focus on establishing regulatory frameworks, validation standards, and best practices for AI models in drug discovery. Increased collaboration between researchers, industry, and regulatory bodies will support broader adoption and compliance.
CONCLUSION
AI has become an indispensable component of modern drug discovery, offering transformative improvements in efficiency and innovation. While challenges remain, continued advances in algorithms, data quality, and interdisciplinary collaboration are likely to make AI-driven drug discovery a cornerstone of pharmaceutical research. The use of AI technology in drug design has grown rapidly due to its predictive ability and accuracy. This review highlights the numerous applications of AI in all phases of drug development, from disease diagnosis to post-marketing analysis. AI helps in the early prediction of diseases, the development of personalized medicine, optimization of drug doses, and the prediction of treatment outcomes. Additionally, AI assists in target and lead identification through the prediction of protein structures and biological activities of small molecules. AI technology can also predict drug-like properties and off-target effects of new compounds, reducing the need for experimental validation. Furthermore, AI-driven approaches improve patient stratification, recruitment, monitoring, and follow-ups in clinical trials, and can even assist in FDA approvals and pharmacovigilance. The integration of AI in drug design has resulted in faster drug discovery, cost savings, reduced resource and manpower usage, and decreased attrition rates in clinical trials. Additionally, AI helps to minimize the use of in vivo bioassays, reducing animal sacrifice. AI has far-reaching applications beyond medicine, including healthcare management, surgeries, mRNA vaccination, preventive treatments, and nutrigenomics. However, it is important to note that AI models are meant to complement human intelligence, not replace it. AI models may have comparable or better predictive ability than human researchers, but they still lack human intuition. 
Predictions made by AI machines must be verified by humans, as AI models can provide false positive and false negative results, compromising the sensitivity and specificity of the model. Additionally, resource sustainability needs holistic solutions like cost-aware cross-layer co-design, integrating hardware, algorithms, and models for efficient exploration of resource-sustainable configurations. Consensus-based distributed learning is suggested to fully utilize existing and future computing infrastructures, incorporating Internet-of-Things devices and edge servers for data sharing while ensuring privacy. Stable infrastructures with AI-enhanced resource allocation are recommended, involving dedicated healthcare AI infrastructures compliant with evolving government regulations. Lastly, interpretable self-supervised learning is proposed to address the sustainability issue in domain expertise, enhancing trust by extracting clinically useful features and providing human-interpretable evidence in healthcare applications. There are numerous challenges associated with AI, including the explainability of models, the quality and suitability of data used to train models, avoiding bias and overfitting, resource sustainability and more. It is crucial to remain aware of the limitations and risks associated with AI technology. Opportunities for improvement in AI technology include minimizing dependence on supercomputing power, addressing ethical concerns surrounding data collection, and implementing AI in a controlled manner in the healthcare sector to limit negative consequences. It is possible that the future of AI-assisted drug discovery lies in developing a virtual human with complete complexity, allowing for accurate predictions of all possible interactions between molecules and exploring all therapeutic potentials and adverse side effects.
REFERENCES
Yogesh Kakrambe, Dr. Kardile Prabhakar, Rajkumar Shete, Sambodhi Patil, Sandip Pawar, Artificial Intelligence (AI) In Drug Discovery, Int. J. of Pharm. Sci., 2026, Vol 4, Issue 2, 2034-2050. https://doi.org/10.5281/zenodo.18628484