K. V. N. Naik S. P. Sanstha’s, Institute of Pharmaceutical Education & Research, Canada Corner, Nashik, 422002, Maharashtra, India.
Molecular docking is a computational technique essential to structure-based drug discovery, predicting ligand–receptor binding modes and affinity. Evolved from rigid-body models to advanced flexible and AI-driven approaches, it integrates conformational exploration with binding energy estimation. Core methodologies employ shape complementarity matching and simulation-driven strategies using genetic algorithms and gradient-based optimization, supported by force field-based, empirical, knowledge-based, and consensus scoring functions. Major software platforms including AutoDock, Glide, GOLD, and DOCK Suite streamline protein preparation, ligand setup, binding site detection, and result analysis. Docking applications encompass virtual screening, lead optimization, de novo design, and drug repurposing in cancer, neurological, and metabolic disorders. Key challenges include scoring function accuracy limitations, protein flexibility constraints, and conformational sampling difficulties. Experimental validation through crystallography and spectroscopic methods remains critical for confirming predictions. Future advances in machine learning, ensemble approaches, and cloud computing will enhance docking reliability and scalability in drug discovery.
Molecular docking is an important computer-based method used in modern drug discovery. It helps scientists find and improve new medicines by studying how drug molecules fit and interact with their target proteins. This technique plays a key role in speeding up and simplifying the process of developing new drugs.
Overview of Molecular Docking:
Molecular docking is a computer-based method used to predict how small molecules (ligands) bind to larger molecules such as proteins. It helps explain how they fit together and form stable, low-energy complexes. The process involves two key stages: a conformational search that explores possible ligand positions and orientations within the protein’s binding site, and a scoring phase that evaluates these poses based on binding affinity using factors such as geometric complementarity, electrostatic and van der Waals interactions, hydrogen bonding, hydrophobic effects, and desolvation energy1.
Docking is based on shape and chemical complementarity, illustrated by the “lock-and-key” and “induced-fit” models. In the induced-fit model, both the ligand and protein adjust their shapes to form a stable, low-energy complex. This provides detailed insight into molecular interactions, helping researchers understand drug–target binding, identify key residues, and explain structure–activity relationships2.
Historical Development and Evolution:
Molecular docking has evolved greatly since the early 1980s. The first program, DOCK (1982), introduced by Kuntz and colleagues, used simple rigid-body models based on geometric matching. In the 1990s and early 2000s, semi-flexible docking methods were developed, allowing ligand flexibility while keeping proteins rigid, with tools like FlexX, GOLD, and AutoDock improving accuracy. Later, fully flexible docking incorporated movements of both ligands and proteins using ensemble docking and molecular dynamics for more realistic modeling. Recently, AI and machine learning have further advanced docking by improving conformational sampling and scoring, leading to faster and more accurate prediction of binding affinities and large-scale virtual screening3.
Importance in Modern Drug Discovery:
Molecular docking plays a crucial role in today’s drug discovery by supporting structure-based design, virtual screening, and lead optimization. In structure-based design, it helps identify and design compounds that fit precisely into target binding sites, improving hit rates and reducing unwanted interactions. Through virtual screening, docking enables the rapid evaluation of millions of compounds to find promising candidates more efficiently than traditional methods. In lead optimization, it assists in refining hits to enhance binding strength, selectivity, and pharmacological properties. Molecular docking provides a faster, cost-effective, and ethical alternative to experimental approaches, greatly accelerating the discovery of potent and specific drug molecules4.
Scope and Organization of the Review:
This review covers both the theoretical and practical aspects of molecular docking, including its basic principles, binding energetics, and computational algorithms. It discusses key docking methods, search strategies, scoring functions, and essential steps such as ligand preparation, protein flexibility, and validation. Advanced applications like fragment-based design, covalent and peptide docking, and protein–protein interaction modeling are also highlighted. Emerging trends such as AI-driven docking, ultra-large library screening, and allosteric site targeting are explored, emphasizing the expanding role of molecular docking in next-generation drug discovery5.
Fundamental Principles of Molecular Docking:
Molecular docking rests upon well-established physicochemical and thermodynamic principles that govern molecular recognition and binding between biological macromolecules and small molecule ligands. Understanding these fundamental concepts is essential for appreciating both the theoretical foundation and practical application of docking methodologies in drug discovery.
Lock-and-Key vs. Induced-Fit Models:
The understanding of molecular recognition has evolved from the classical lock-and-key model to the more dynamic induced-fit model. Modern research further suggests a balance between conformational selection and induced fit, showing that proteins exist in multiple conformations even before binding. In molecular docking, this theoretical shift has influenced algorithmic development, from early rigid-body (lock-and-key) methods to flexible and dynamic docking approaches that account for protein movement and ligand-induced conformational changes. This evolution has enhanced docking accuracy and better reflects real biological interactions6.
Fig. 1: Lock-and-Key vs Induced-Fit Mechanisms of Enzyme Activity
Molecular Recognition and Complementarity:
Molecular recognition is the fundamental process by which a ligand specifically identifies and binds its target protein through complementary interactions. This specificity arises from the integration of multiple forms of complementarity (geometric, electrostatic, hydrophobic, and hydrogen-bonding) that collectively determine binding selectivity and affinity7.
Shape complementarity is a key factor in molecular recognition, describing how well a ligand fits into a protein’s binding pocket. Optimal fit minimizes steric clashes and maximizes surface contact, enhancing binding affinity. Studies show that high shape complementarity strongly predicts binding strength and occurs consistently across various protein-ligand systems, establishing geometric fit as a universal principle of molecular interaction8.
Physicochemical complementarity encompasses electrostatic, van der Waals, hydrophobic, and desolvation interactions that collectively drive and stabilize ligand-receptor binding. Van der Waals interactions, though weak individually, collectively stabilize ligand-protein binding by ensuring optimal atomic contact distances. These forces prevent steric clashes and excessive gaps, complementing shape fit and reinforcing overall molecular recognition9.
Hydrophobic interactions drive ligand-protein binding by clustering nonpolar regions and displacing water from binding sites. This effect provides significant binding energy and stability, with hydrophobic complementarity strongly correlating with binding affinity and occurrence across diverse protein systems10.
Hydrogen bonding between donor and acceptor groups on ligands and proteins provides additional specificity and binding energy, enhancing selectivity beyond geometric and hydrophobic effects11. These directional interactions anchor the complex and contribute to its stability. Desolvation also plays a key role in molecular recognition, as water displacement during binding affects the overall energy. Effective ligands balance favorable hydrophobic burial with minimal desolvation penalties, promoting stable and efficient binding12.
Binding Free Energy:
The thermodynamic stability and spontaneity of ligand-receptor complex formation are quantitatively described by binding free energy, a concept central to both understanding and predicting protein-ligand interactions in molecular docking. Gibbs free energy change (ΔG) represents the fundamental thermodynamic parameter governing binding thermodynamics, expressing the energy available to drive ligand-protein complex formation13.
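Ligand binding is governed by enthalpic and entropic contributions. Enthalpy reflects the noncovalent interactions formed between ligand and receptor, while entropy involves conformational and solvation changes: binding restricts molecular motion, which is unfavorable, but the release of ordered water molecules can offset this penalty and help stabilize the complex. Docking scoring functions estimate binding free energy using simplified models that combine van der Waals, electrostatic, hydrogen bonding, and desolvation terms14.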
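As a worked illustration of the thermodynamics described above, the standard relations ΔG = ΔH − TΔS and ΔG = RT ln Kd link the free energy to a measurable dissociation constant. The sketch below is a minimal example and not part of any docking program; the function name and the 1 nM example are assumptions chosen for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in mol/L."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)

# A hypothetical 1 nM binder at 25 degrees C: roughly -12.3 kcal/mol
print(round(binding_free_energy(1e-9), 2))
```

Each tenfold change in Kd shifts ΔG by about 1.4 kcal/mol at room temperature, which is comparable to the typical error of docking scoring functions and helps explain why affinity ranking is difficult.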
Molecular Docking Methodologies:
Molecular docking approaches vary widely to balance computational efficiency and biological accuracy. They are generally classified based on treatment of molecular flexibility, sampling algorithms for conformational exploration, and the strategies used for pose prediction and scoring.
Classification Based on Flexibility:
Molecular docking methods differ in how they handle flexibility, ranging from fast rigid models to more accurate fully flexible approaches.
Rigid Docking:
Rigid docking is a computational technique in which both the ligand and receptor are considered fixed three-dimensional structures, allowing only translational and rotational movements within six degrees of freedom15.
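The six degrees of freedom can be made concrete with a small sketch: a pose is three translations plus three rotation angles, and applying it moves the ligand without changing its internal geometry. The function names, the Euler-angle convention, and the three-atom ligand below are assumptions for illustration only, not the representation used by any particular docking program.

```python
import math

def rotation_matrix(alpha, beta, gamma):
    """Rotation Rz(gamma) * Ry(beta) * Rx(alpha), angles in radians."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [cg * cb, cg * sb * sa - sg * ca, cg * sb * ca + sg * sa],
        [sg * cb, sg * sb * sa + cg * ca, sg * sb * ca - cg * sa],
        [-sb, cb * sa, cb * ca],
    ]

def apply_pose(coords, pose):
    """Apply a six-parameter pose (tx, ty, tz, alpha, beta, gamma) to every atom."""
    tx, ty, tz, alpha, beta, gamma = pose
    rot = rotation_matrix(alpha, beta, gamma)
    moved = []
    for x, y, z in coords:
        moved.append((
            rot[0][0] * x + rot[0][1] * y + rot[0][2] * z + tx,
            rot[1][0] * x + rot[1][1] * y + rot[1][2] * z + ty,
            rot[2][0] * x + rot[2][1] * y + rot[2][2] * z + tz,
        ))
    return moved

# Hypothetical three-atom ligand (coordinates in angstroms)
ligand = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
posed = apply_pose(ligand, (2.0, -1.0, 0.5, 0.3, 1.1, -0.7))
```

Because every interatomic distance is preserved, a rigid search only has to explore these six numbers, which is why the approach is so fast.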
The main advantage of rigid docking is its high computational efficiency. Due to the limited conformational search space, it enables rapid screening of large compound libraries and is therefore effective for primary filtering in high-throughput virtual screening workflows16.
A key limitation of rigid docking is its inability to represent conformational flexibility in either ligand or protein. It fails to capture induced-fit effects or side-chain rearrangements that occur during binding, resulting in reduced predictive accuracy. Consequently, important interactions such as hydrogen bonding or accommodation of bulky side chains are often missed17.
Semi-Flexible Docking:
Semi-flexible docking, also known as flexible ligand/rigid receptor docking, treats the ligand as flexible while keeping the receptor rigid. This approach provides a balance between computational efficiency and predictive accuracy, making it the most widely applied strategy in modern structure-based drug discovery18.
Semi-flexible docking offers higher accuracy than rigid docking by accounting for ligand flexibility. It is implemented in major docking tools such as AutoDock, FlexX, Glide, and Surflex, employing algorithms like Monte Carlo simulated annealing and genetic optimization for conformational sampling19. Despite these improvements, semi-flexible docking assumes a rigid protein structure, reducing accuracy when significant receptor conformational changes occur. Cross-docking studies reveal accuracy loss when ligands are docked into alternative protein conformations, particularly in highly flexible targets such as kinases and proteases20.
Fig. 2: Types of Molecular Docking Based on Structural Flexibility
Flexible Docking:
Flexible docking, or fully flexible docking, allows both ligand and receptor to undergo conformational changes during docking, providing a biologically realistic model of molecular recognition 21.
The principal application domains for flexible docking include enzyme inhibition studies where active-site architecture changes substantially upon inhibitor binding, and protein-protein interaction modeling where substantial conformational rearrangement often occurs upon complex formation. Flexible docking proves particularly valuable for kinase inhibitor design, where hinge-bending between the two kinase lobes and loop rearrangements frequently accompany inhibitor binding22.
The limitations of flexible docking include high computational cost and challenges in accurate conformational sampling, as allowing flexibility in protein side chains adds many degrees of freedom beyond those of the ligand alone23.
Induced-Fit Docking:
Induced-fit docking provides a balanced approach between rigid and fully flexible docking by modelling protein conformational changes during ligand binding through a structured, two-step mechanism that separates ligand sampling from receptor adjustment24.
In this process, the ligand is first docked into a rigid receptor, followed by receptor side-chain and backbone relaxation to accommodate the ligand, and subsequent redocking and energy minimization. The advanced IFD-MD variant further improves accuracy, reaching an 85% success rate in 258 cross-docking cases. Other implementations, such as Rosetta Ligand, Adaptive BP-Dock, and CHARMM-GUI IFD, employ various strategies like rotamer sampling and solvent-based refinement, achieving up to 80% success in cross-docking predictions25.
Docking Approaches and Algorithms:
Molecular docking aims to identify optimal ligand–receptor binding poses within high-dimensional conformational space using two broad strategies: shape complementarity and simulation-based approaches. Shape complementarity methods rely on geometric matching of ligand and receptor surfaces, using molecular surface definitions and Fourier shape descriptors to identify spatial complementarity without explicit energy calculations26.
In contrast, simulation-based approaches evaluate ligand–protein interaction energies through force fields such as AMBER, CHARMM, and OPLS, explicitly computing van der Waals, electrostatic, and hydrogen-bonding interactions27.
Search Algorithms:
Efficient search algorithms address the challenge of exploring ligand conformational space. Genetic algorithms, as in AutoDock and GOLD, evolve ligand poses through crossover, mutation, and selection, while hybrid Lamarckian implementations integrate local optimization for improved convergence 28.
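The genetic operators named above can be sketched on a toy problem. The example below is a minimal illustration, not the AutoDock or GOLD implementation: the scoring function is a hypothetical stand-in with a known minimum, the pose is reduced to three translational coordinates, and all names and parameter values are assumptions. The local search writes its improved pose back into the individual, which is the Lamarckian element.

```python
import random

TARGET = (1.0, -2.0, 0.5)  # hypothetical optimum of the toy score

def toy_score(pose):
    """Stand-in for a docking score: lower is better, minimum at TARGET."""
    return sum((p - t) ** 2 for p, t in zip(pose, TARGET))

def local_search(pose, rng, steps=10, step_size=0.1):
    """Lamarckian step: hill-climb and return the improved pose as the new genotype."""
    best, best_score = pose, toy_score(pose)
    for _ in range(steps):
        trial = tuple(p + rng.gauss(0, step_size) for p in best)
        trial_score = toy_score(trial)
        if trial_score < best_score:
            best, best_score = trial, trial_score
    return best

def genetic_search(pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    pop = [tuple(rng.uniform(-5, 5) for _ in TARGET) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=toy_score)
        history.append(toy_score(pop[0]))
        next_pop = [pop[0]]  # elitism: the best pose always survives
        while len(next_pop) < pop_size:
            # Selection: best of three randomly drawn individuals
            mum = min(rng.sample(pop, 3), key=toy_score)
            dad = min(rng.sample(pop, 3), key=toy_score)
            # Crossover: each gene is taken from one parent at random
            child = tuple(rng.choice(pair) for pair in zip(mum, dad))
            # Mutation: occasional Gaussian perturbation of a gene
            child = tuple(g + rng.gauss(0, 0.3) if rng.random() < 0.2 else g
                          for g in child)
            next_pop.append(local_search(child, rng))
        pop = next_pop
    best = min(pop, key=toy_score)
    history.append(toy_score(best))
    return best, history

best_pose, history = genetic_search()
```

In a real docking run the genes would also encode orientation and torsion angles, and the score would come from one of the scoring functions discussed in this review.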
Specialized Approaches:
Beyond conventional rigid-body and semi-flexible docking paradigms, several specialized approaches address particular challenges in drug discovery. Fragment-based incremental construction in FlexX reduces computational burden by docking rigid fragments sequentially. Gradient-based optimizers, exemplified by AutoDock Vina, refine poses along energy gradients for rapid and accurate docking. Inverse docking screens a ligand against multiple proteins to identify potential off-target interactions and repurposing opportunities using normalized Z-score rankings 29.
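To illustrate why inverse docking normalizes scores before ranking, the sketch below converts one ligand's raw score against each target into a Z-score relative to a background set of reference ligands docked to the same target. This is one plausible normalization scheme assumed for illustration; the function names, target labels, and all numbers are hypothetical.

```python
import statistics

def z_scores(query_scores, background_scores):
    """Z-score of the query ligand per target, relative to reference ligands."""
    result = {}
    for target, score in query_scores.items():
        reference = background_scores[target]
        mean = statistics.mean(reference)
        spread = statistics.pstdev(reference)
        if spread == 0:
            raise ValueError("background scores for %s have zero spread" % target)
        result[target] = (score - mean) / spread
    return result

def rank_targets(query_scores, background_scores):
    """Targets ordered from most to least favorable normalized score."""
    z = z_scores(query_scores, background_scores)
    return sorted(z, key=z.get)

# Hypothetical docking scores (kcal/mol, lower is better)
query = {"T1": -9.0, "T2": -9.0}
background = {"T1": [-6.0, -7.0, -8.0], "T2": [-9.0, -10.0, -11.0]}
ranking = rank_targets(query, background)
```

Although the raw scores are identical, the ligand stands out only against T1, where typical ligands score much worse; raw scores alone would hide this difference between targets.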
Scoring Functions:
Scoring functions are mathematical models that evaluate protein-ligand interactions by converting 3D docking poses into numerical values reflecting binding affinity. They are crucial for distinguishing favorable binding modes and directly determine the accuracy and reliability of molecular docking predictions.
Types of Scoring Functions:
Molecular docking scoring functions can be classified into three major categories reflecting their underlying theoretical frameworks and derivation methodologies, complemented by consensus schemes that combine them.
1.Force Field-Based Scoring Functions:
Force field-based scoring functions estimate binding free energy using classical molecular mechanics formulations that sum van der Waals, electrostatic, solvation, and conformational entropy terms30.
$$\Delta G_{\text{bind}} = \Delta E_{\text{vdW}} + \Delta E_{\text{elec}} + \Delta G_{\text{solv}} + \Delta S_{\text{conf}}$$
The van der Waals interactions, modeled via the Lennard-Jones potential, describe nonbonded contacts that favor optimal atomic packing while penalizing steric clashes or excessive separations. Electrostatic contributions are computed using Coulomb’s law with a distance-dependent dielectric function to capture solvent screening effects efficiently, while more sophisticated implementations use Poisson–Boltzmann or Generalized Born models for improved electrostatic accuracy. Solvation energy is generally estimated from solvent-accessible surface area (SASA), capturing hydrophobic interactions and desolvation penalties associated with binding. Prominent examples include DOCK, AutoDock, GOLD/GoldScore, and DockThor, which integrate these physical principles in varying degrees. AutoDock 4.2, for instance, employs a semi-empirical free energy model combining six pairwise potential terms with conformational entropy correction, offering enhanced modeling for halogen and metal ion interactions31.
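The two pairwise terms described above can be sketched in a few lines. This is a simplified illustration, not the parameterization of any named program: it assumes a single Lennard-Jones atom type for all pairs, a linear distance-dependent dielectric of the form eps(r) = 4r, and hypothetical parameter values.

```python
import math

COULOMB_CONSTANT = 332.06  # kcal*angstrom/(mol*e^2)

def lennard_jones(r, epsilon, r_min):
    """12-6 potential with well depth epsilon at separation r_min."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2 * ratio ** 6)

def coulomb_ddd(r, q1, q2, slope=4.0):
    """Coulomb energy with a distance-dependent dielectric eps(r) = slope * r."""
    return COULOMB_CONSTANT * q1 * q2 / (slope * r * r)

def nonbonded_energy(ligand, receptor, epsilon=0.15, r_min=3.8):
    """Sum both terms over all ligand-receptor pairs; atoms are (x, y, z, charge)."""
    total = 0.0
    for lx, ly, lz, lq in ligand:
        for rx, ry, rz, rq in receptor:
            r = math.dist((lx, ly, lz), (rx, ry, rz))
            total += lennard_jones(r, epsilon, r_min) + coulomb_ddd(r, lq, rq)
    return total
```

The Lennard-Jones term is most favorable at r_min and rises steeply at shorter distances, which is how steric clashes are penalized; the dielectric that grows with distance damps long-range electrostatics as a crude proxy for solvent screening.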
2.Empirical Scoring Functions:
Empirical scoring functions estimate binding free energy by summing weighted terms that represent various interaction contributions, such as van der Waals forces, hydrogen bonding, hydrophobic effects, desolvation, and entropy losses32.
These functions employ the general form:
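$$\Delta G_{\text{bind}} = a \cdot \Delta E_{\text{vdW}} + b \cdot \Delta E_{\text{hbond}} + c \cdot \Delta E_{\text{hydrophobic}} + d \cdot \Delta E_{\text{entropy}} + \dots$$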
The coefficients of these terms are determined by regression fitting to experimental binding affinities, allowing the model to directly capture empirical trends rather than relying solely on physical laws. This approach originated with Böhm’s LUDI scoring function, which established the principle of parameterizing scoring terms using experimental data to improve correlation between predicted and observed affinities. Representative examples include ChemScore, GlideScore, PLP, PLANTS/CHEMPLP, X-Score, and Surflex, many of which introduce additional terms for metal coordination and torsional entropy, optimized through techniques such as ant colony optimization33. Limitations include the risk of overfitting, dependence on high-quality experimental data, poor transferability across different target classes, and limited treatment of solvation and entropy effects34.
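The regression step can be illustrated with ordinary least squares on a small synthetic data set. Everything below is an assumption made for illustration: the feature values, the "true" weights used to generate the affinities, and the function name are invented, and real empirical functions are fitted to hundreds or thousands of experimental complexes.

```python
def fit_weights(features, affinities):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    n_terms = len(features[0])
    # Augmented matrix [X^T X | X^T y]
    a = [[sum(row[i] * row[j] for row in features) for j in range(n_terms)]
         + [sum(row[i] * y for row, y in zip(features, affinities))]
         for i in range(n_terms)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n_terms):
        pivot = max(range(col, n_terms), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n_terms):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n_terms] / a[i][i] for i in range(n_terms)]

# Synthetic complexes: (vdW contacts, H-bonds, hydrophobic contacts, rotatable bonds)
FEATURES = [
    [10, 2, 30, 4], [15, 1, 20, 6], [8, 3, 25, 2],
    [12, 0, 40, 5], [20, 4, 10, 3], [5, 2, 35, 7],
]
TRUE_WEIGHTS = [-0.2, -1.0, -0.1, 0.3]  # invented, used only to build the data
AFFINITIES = [sum(w * x for w, x in zip(TRUE_WEIGHTS, row)) for row in FEATURES]

weights = fit_weights(FEATURES, AFFINITIES)
```

With noise-free synthetic data the fit recovers the generating weights; with real measurements the residual error and the risk of overfitting noted above become the central concerns.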
3.Knowledge-Based Scoring Functions:
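Knowledge-based scoring functions estimate binding affinity by deriving interaction potentials from statistical analysis of experimentally determined protein–ligand complexes deposited in structural databases such as the Protein Data Bank. These functions calculate potentials of mean force from the inverse Boltzmann relation:
$$W(r) = -RT \ln\left(\frac{N_{\text{obs}}(r)}{N_{\text{exp}}(r)}\right)$$
where N_obs(r) is the observed number of atom pairs at distance r and N_exp(r) is the number expected in a reference state without specific interactions.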
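A short sketch shows how the inverse Boltzmann relation turns pair counts into a distance-dependent potential. The counts, bin layout, and function name are hypothetical, and the value of RT is an approximation for room temperature.

```python
import math

RT_KCAL = 0.593  # approximate RT in kcal/mol near 298 K

def mean_force_potential(n_obs, n_exp):
    """W(r) for one distance bin from observed and expected pair counts."""
    if n_obs <= 0 or n_exp <= 0:
        raise ValueError("counts must be positive; sparse bins need smoothing")
    return -RT_KCAL * math.log(n_obs / n_exp)

# Hypothetical counts for one atom-pair type in five consecutive distance bins
observed = [5, 120, 300, 210, 180]
expected = [40, 100, 150, 200, 220]
profile = [mean_force_potential(o, e) for o, e in zip(observed, expected)]
```

Bins where contacts are seen more often than expected receive a negative (favorable) potential, and under-represented distances receive a positive one.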
Knowledge-based methods offer key advantages such as independence from parameter fitting, inherent incorporation of favorable and unfavorable interactions through statistical trends, and broad applicability across diverse targets without retraining. Limitations of knowledge-based scoring functions include dependence on database size and quality, simplified treatment of solvation and entropy effects, and potential biases from uneven structural data representation36.
4.Consensus Scoring:
Consensus scoring combines results from multiple scoring functions to improve ligand ranking accuracy and reduce both false positives and false negatives in virtual screening. It operates on the principle that integrating diverse scoring models provides more reliable predictions than any single function. Two main strategies are used: score-based combinations, which average or weight raw docking scores, and rank-based combinations, which merge compound ranks to overcome the incommensurability of scoring scales37.
Among advanced implementations, the exponential consensus ranking (ECR) method has shown superior performance by weighting individual scoring functions using exponential distributions. Advantages of consensus scoring include reduced false positive rates through requiring agreement among independent methods, improved discriminatory ability compared to single functions in many benchmarking studies, and parameter-independence in rank-based approaches. Consensus approaches prove particularly effective in virtual screening campaigns where elimination of false positives is critical. Limitations of consensus scoring include dependence on the quality and diversity of component scoring functions, increased computational cost, and the need for empirical optimization of weighting schemes for specific targets38.
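A rank-based consensus in the spirit of ECR can be sketched as follows. The form used here, in which each scoring function contributes exp(−rank/σ)/σ for every compound, is an assumption about how the exponential weighting is applied; the compound names, scores, and the value of σ are hypothetical.

```python
import math

def ranks(scores):
    """Rank compounds (1 = best) from a dict of scores where lower is better."""
    ordered = sorted(scores, key=scores.get)
    return {name: position + 1 for position, name in enumerate(ordered)}

def exponential_consensus(score_tables, sigma=1.0):
    """Sum exp(-rank / sigma) / sigma over scoring functions; higher is better."""
    totals = {}
    for table in score_tables:
        for name, rank in ranks(table).items():
            totals[name] = totals.get(name, 0.0) + math.exp(-rank / sigma) / sigma
    return sorted(totals, key=totals.get, reverse=True)

# Hypothetical scores from three functions on incompatible scales
function_1 = {"A": -9.0, "B": -7.5, "C": -6.0}
function_2 = {"A": -50.0, "B": -55.0, "C": -30.0}
function_3 = {"A": -8.1, "B": -7.9, "C": -8.0}
consensus = exponential_consensus([function_1, function_2, function_3])
```

Working on ranks rather than raw values sidesteps the different scales of the three functions, and the exponential weighting rewards compounds that appear near the top of several lists.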
Goals of Scoring Functions:
Scoring functions in structure-based drug discovery have three main goals: predicting the correct binding pose, identifying active compounds in virtual screening, and estimating binding affinity. Pose prediction focuses on reproducing the experimentally observed binding mode. Virtual screening aims to rank true actives within the top fraction of large compound libraries. Binding affinity prediction is the most challenging goal, seeking accurate free energy estimates for potency ranking. Because these goals sometimes conflict, specialized scoring functions are often developed for each task rather than relying on a single universal model39.
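Pose prediction is usually judged by the root-mean-square deviation (RMSD) between predicted and experimental ligand coordinates. The sketch below assumes the two poses list the same atoms in the same order and ignores molecular symmetry; the 2 Å success threshold is a widely used convention adopted here as an assumption, and the function names are hypothetical.

```python
import math

def rmsd(pose, reference):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and contain the same atoms")
    squared = sum((a - b) ** 2
                  for p, q in zip(pose, reference)
                  for a, b in zip(p, q))
    return math.sqrt(squared / len(pose))

def is_successful(pose, reference, threshold=2.0):
    """True when the predicted pose lies within the RMSD threshold (angstroms)."""
    return rmsd(pose, reference) <= threshold

# Hypothetical crystal pose and a prediction shifted by 1 angstrom along x
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0)]
predicted = [(x + 1.0, y, z) for x, y, z in crystal]
```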
Molecular Docking Software and Tools:
Molecular docking software implementing diverse algorithms and scoring functions falls into three categories:
| Category | Examples | Accessibility | Advantages | Limitations |
| --- | --- | --- | --- | --- |
| Commercial Software | Schrödinger’s Glide, GOLD, MOE, FlexX | Licensed (paid) | High accuracy, integrated analysis tools, strong technical support | High cost, limited accessibility for academics |
| Academic Software | AutoDock, UCSF DOCK | Free for academic/nonprofit use | Cost-free, widely used, good reliability | Limited support, less user-friendly interface |
| Open-Source Software | AutoDock Vina, rDock | Completely free (open-source license) | Full transparency, customizable algorithms, promotes innovation | Requires technical expertise, minimal support |
Widely Used Tools:
| Software | Developer/Source | Key Features | Notable Strengths |
| --- | --- | --- | --- |
| AutoDock Suite | Scripps Research Institute | Lamarckian genetic algorithm (AutoDock 4.2), gradient-based optimization and multithreading (Vina), peptide docking (CrankPep), flexible side chains (FR) | Widely used open-source platform offering a balance of flexibility and speed |
| GOLD | Cambridge Crystallographic Data Centre | Genetic algorithms with partial protein flexibility, multiple scoring functions, explicit water displacement | High docking accuracy with flexible receptor modeling |
| Glide | Schrödinger | Hierarchical docking funnel with HTVS, SP, and XP modes; Induced Fit Docking; advanced hydrogen bonding and pi interactions | High accuracy, adaptable receptor modeling, and effective water repositioning |
| MOE-Dock | Chemical Computing Group | Flexible docking, ensemble and template-based approaches, multiple scoring functions integrated with cheminformatics and visualization | Comprehensive modeling and analysis platform |
| FlexX | BioSolveIT | Anchor-and-grow fragment-based docking; supports pharmacophore constraints, covalent docking, multi-core parallelization | Extremely fast screening with good docking accuracy |
| DOCK Suite | University of California, San Francisco (UCSF) | Sphere-based geometric matching; anchor-and-grow flexible ligand docking; multiple scoring function options | Pioneer tool efficient for large-scale library screening |
| SwissDock | Swiss Institute of Bioinformatics | Cloud-based docking using CHARMM energetics; supports both blind and local docking | Highly accessible web-based platform requiring no installation |
| Discovery Studio | BIOVIA (Dassault Systèmes) | Integrates CDOCKER (CHARMM-based Monte Carlo), GOLD, LibDock, and LigandFit docking algorithms | Multi-algorithm enterprise platform with advanced molecular modeling tools |
Molecular Docking Workflow and Protocol:
The successful application of molecular docking requires systematic execution of a multi-stage workflow converting raw biomolecular structures into reliable binding predictions. This workflow encompasses protein preparation, ligand preparation, binding site identification, docking execution, and result analysis, each stage critically influencing downstream accuracy and reliability40.
General Docking Procedure:
1.Target Protein Preparation:
Target protein preparation involves obtaining structures from the Protein Data Bank or predictive models like AlphaFold and, where available, selecting ligand-bound conformations. Nonessential molecules such as water, ions, and bound ligands are removed, keeping only those important for binding. Hydrogen atoms are added, missing residues or bonds are fixed using tools like Schrödinger’s Protein Preparation Workflow or MolProbity, and energy minimization is performed to refine the structure41.
2.Ligand Preparation:
Ligands for molecular docking are sourced from databases like ZINC and PubChem or designed for virtual screening libraries containing thousands to millions of compounds. SMILES or 2D structures are converted into 3D forms and assigned accurate protonation states at physiological pH. Energy minimization and conformer generation optimize ligand geometries for accurate docking. Identifying rotatable bonds helps determine ligand flexibility, which influences docking search complexity and accuracy42.
3.Binding Site Identification:
Binding site determination employs static, dynamic, or hybrid approaches depending on structural data availability. When reference ligands are present, their binding locations define target sites; alternatively, computational cavity detection algorithms such as PASS, SurfNet, or POCKET identify potential sites through geometric or energy-based mapping 43.
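When a reference ligand is available, the search region is commonly defined as a box around it. The sketch below derives a box center and edge lengths from ligand coordinates; the 4 Å padding, the function name, and the coordinates are assumptions for illustration rather than a recommended setting for any program.

```python
def box_from_ligand(coords, padding=4.0):
    """Axis-aligned box enclosing the reference ligand plus a margin on each side."""
    lows = [min(atom[i] for atom in coords) for i in range(3)]
    highs = [max(atom[i] for atom in coords) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(lows, highs))
    size = tuple((hi - lo) + 2 * padding for lo, hi in zip(lows, highs))
    return center, size

# Hypothetical reference ligand atoms (angstroms)
reference_ligand = [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0), (1.0, 1.0, 2.0)]
center, size = box_from_ligand(reference_ligand)
```

The resulting center and size map directly onto the grid-box parameters that most docking programs expect; a larger padding widens the search at the cost of longer runs.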
4.Docking Execution:
Docking execution involves setting up grid parameters to precompute interaction maps, which accelerates docking by replacing real-time energy calculations with grid interpolation. Docking simulations employ search algorithms such as the Lamarckian genetic algorithm in AutoDock 4 and stochastic global search in AutoDock Vina. Parameters like population size, number of runs, and exhaustiveness influence the search depth and docking accuracy44.
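Grid interpolation can be illustrated with a trilinear lookup: energies are precomputed at grid points, and the value at an arbitrary atom position is blended from the eight surrounding points. This is a generic sketch under assumed names and a nested-list grid layout, not the map format or code of AutoDock.

```python
def trilinear(grid, origin, spacing, point):
    """Interpolate grid[i][j][k], stored at origin + spacing * (i, j, k), at point."""
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    index, frac = [], []
    for p, o, n in zip(point, origin, dims):
        f = (p - o) / spacing
        if f < 0 or f > n - 1:
            raise ValueError("point lies outside the grid")
        i = min(int(f), n - 2)  # keep i + 1 inside the grid
        index.append(i)
        frac.append(f - i)
    (i, j, k), (tx, ty, tz) = index, frac
    value = 0.0
    # Weighted sum over the eight corners of the enclosing cell
    for di, wx in ((0, 1 - tx), (1, tx)):
        for dj, wy in ((0, 1 - ty), (1, ty)):
            for dk, wz in ((0, 1 - tz), (1, tz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value

# Hypothetical 3 x 3 x 3 map sampled from f(x, y, z) = x + 2y + 3z at 0.5 angstrom spacing
SPACING = 0.5
toy_grid = [[[SPACING * (i + 2 * j + 3 * k) for k in range(3)]
             for j in range(3)] for i in range(3)]
energy = trilinear(toy_grid, (0.0, 0.0, 0.0), SPACING, (0.3, 0.7, 0.9))
```

Each lookup costs a fixed handful of multiplications regardless of receptor size, which is the source of the speed-up over recomputing pairwise energies for every pose.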
5.Result Analysis and Validation:
Result analysis uses visualization tools like PyMOL or ChimeraX to evaluate hydrogen bonding, hydrophobic, and electrostatic interactions. Predicted binding energies are used to estimate ligand affinity and rank docking poses for interpretation 45.
Fig. 3: Schematic Overview of the Molecular Docking Workflow
Applications of Molecular Docking:
A. Drug Discovery and Development:
1.Hit Identification and Virtual Screening:
Virtual screening represents a primary application of molecular docking, enabling rapid identification of bioactive molecules from vast chemical libraries. Screening performance is evaluated using enrichment factors and BEDROC metrics emphasizing early recognition of actives 46.
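The enrichment factor compares the proportion of actives found in the top-ranked fraction of a screened library with the proportion in the library as a whole. The sketch below is a minimal implementation under assumed names, applied to a hypothetical ranked list; BEDROC, which weights early ranks exponentially, is not shown.

```python
def enrichment_factor(ranked_labels, fraction):
    """EF at a given fraction; ranked_labels is best-first, True marks an active."""
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, round(n_total * fraction))
    hits_in_top = sum(ranked_labels[:n_top])
    return (hits_in_top / n_top) / (n_actives / n_total)

# Hypothetical screen: 100 compounds, 10 actives, 5 of them ranked in the top 10
ranking = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 85
ef_10 = enrichment_factor(ranking, 0.10)
```

A value of 1 corresponds to random selection, so in this example the top 10% of the list is five times richer in actives than the library overall.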
2.Lead Optimization:
Lead optimization uses molecular docking to guide modifications of hit compounds, aiming to enhance binding affinity, selectivity, and drug-like properties by analyzing atomic interactions. While scoring functions are essential in this process, their predictive accuracy is often limited, with studies showing moderate correlation between docking scores and experimental activities, such as in acetylcholinesterase inhibitors. Therefore, experimental validation remains critical47.
3.De Novo Drug Design:
De novo design creates new molecules by building them directly within the target’s binding site, using fragment growth and linking guided by interaction scores. Tools like LUDI and HIPPO map hydrogen bonding and hydrophobic sites, while modern methods use AI and structural modeling for precise protein–ligand design48.
B. Disease-Specific Applications:
1.Cancer Therapy:
Docking is crucial for designing inhibitors against cancer-related proteins like BCL-2, CDK4/6, HER2, and EGFR. It supports structure-guided drug design by improving inhibitor selectivity and helping overcome drug resistance49.
2.Neurological Disorders:
Molecular docking supports Alzheimer's research by helping identify inhibitors targeting key proteins like acetylcholinesterase, butyrylcholinesterase, tau, and amyloid-β plaques. These docking studies also aid in discovering dual inhibitors and natural compounds that contribute to developing multi-target treatments for neurodegenerative diseases50.
Challenges and Limitations:
1.Technical Challenges:
Molecular docking faces several technical and methodological challenges that limit its predictive accuracy. The most critical issue is the limited reliability of scoring functions in accurately predicting binding free energy, as they often fail to distinguish between structurally similar ligands. Protein flexibility presents another major challenge, since most docking programs treat receptors as rigid, neglecting conformational changes essential for accurate binding representation. Inadequate conformational sampling further affects precision, as current molecular dynamics and heuristic methods capture only a fraction of possible ligand–protein conformations 52.
2.Methodological Limitations:
High computational costs restrict fully flexible docking and extensive sampling during large-scale virtual screening, while complex, non-additive interactions such as allosteric and water-mediated effects further complicate energy prediction. Moreover, machine learning-based approaches suffer from generalization issues due to biased training datasets, making experimental validation through crystallography, SPR, ITC, or NMR indispensable for confirming docking accuracy53.
CONCLUSION:
Molecular docking has matured into a central tool for structure-based drug design, driving innovation from conceptual origins to impactful real-world applications. Advances, from basic geometric matching to machine learning-augmented flexibility modelling, have enabled docking to support drug discovery across cancer, infectious, neurological, and metabolic diseases. Despite ongoing challenges with scoring accuracy, protein flexibility, and solvation, pragmatic consensus methods and experimental validation enhance reliability. Future progress will rely on AI-driven scoring, more accurate modeling of protein dynamics and water effects, and expanded applicability to novel therapeutic modalities, sustaining docking’s leading role in pharmaceutical research and development.
REFERENCES
Kartiki Deshmukh*, Diksha Gangurde, Dr. Kanchan Jagtap, Advancements in Molecular Docking: A Comprehensive Review of Methods and Their Applications in Drug Discovery, Int. J. of Pharm. Sci., 2025, Vol 3, Issue 12, 998-1012 https://doi.org/10.5281/zenodo.17831503