Application of Support Vector Machines in Virtual Screening

Soumi Sengupta, Sanghamitra Bandyopadhyay


Traditionally, drug discovery has been a labor-intensive effort, since it is difficult to identify a possible drug candidate for a given target from an extremely large library of small molecules. Most small molecules fail to show any activity against the target because of electrochemical, structural and other incompatibilities. Virtual screening is an in-silico approach that identifies the molecules unlikely to show any activity against a given target so that they can be set aside, thereby avoiding a large amount of experimentation that would most likely end in failure. Important approaches to virtual screening include docking studies and classification techniques. Support vector machine (SVM) classifiers, which are grounded in statistical learning theory, have found several applications in virtual screening. This paper first briefly outlines the theory and main principles of SVMs and then discusses a few successful applications of SVMs in virtual screening. Finally, it points out the pitfalls of existing approaches and highlights the areas that need further contribution to improve the state of the art in applying SVMs to virtual screening.
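As a minimal sketch of how an SVM classifier can act as a virtual-screening filter, the code below trains a linear soft-margin SVM on compounds that are assumed to be already encoded as numeric descriptor vectors. It then keeps only the library compounds with a positive decision score, ranked by that score. Everything concrete here is a hypothetical illustration and not the authors' method: the two-descriptor training data, the compound names `cmpd_A` to `cmpd_C`, the helper names, and the hyperparameters `lam` and `epochs`. The trainer uses Pegasos-style stochastic subgradient descent on the objective λ/2·‖w‖² plus the mean hinge loss. It was chosen here only for brevity; practical screening studies typically use kernel SVMs from an established library.

```python
import random

# Hypothetical, already-scaled two-descriptor vectors (e.g. standardized
# logP and molecular weight); +1 = active, -1 = inactive. Not real data.
X_train = [(2.0, 2.2), (1.6, 2.4), (2.3, 1.7), (1.8, 1.9),
           (-2.1, -1.8), (-1.6, -2.3), (-2.4, -1.9), (-1.7, -1.6)]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]


def augment(x):
    """Append a constant 1.0 so the bias is learned as an ordinary weight
    (a simplification: the bias is then also regularized)."""
    return list(x) + [1.0]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear soft-margin SVM trained with Pegasos-style stochastic
    subgradient descent on lam/2 * ||w||^2 + mean hinge loss."""
    rng = random.Random(seed)
    data = [augment(x) for x in X]
    w = [0.0] * len(data[0])
    t = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # step size decays as 1/t
            # Margin is measured with the current weights, before updating
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            # Regularizer gradient: shrink every weight towards zero
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Hinge loss is active: push w towards classifying x_i correctly
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def decision(w, x):
    """Signed SVM score; a value > 0 means the compound is predicted active."""
    return sum(wj * xj for wj, xj in zip(w, augment(x)))


def screen(w, library):
    """Keep only compounds predicted active, ranked by decreasing score."""
    scored = [(name, decision(w, x)) for name, x in library.items()]
    hits = [(name, s) for name, s in scored if s > 0]
    return sorted(hits, key=lambda h: h[1], reverse=True)


# Hypothetical screening library of unlabeled compounds
library = {"cmpd_A": (2.0, 1.8), "cmpd_B": (-1.9, -2.2), "cmpd_C": (1.7, 2.1)}

if __name__ == "__main__":
    w = train_linear_svm(X_train, y_train)
    for name, score in screen(w, library):
        print(name, round(score, 3))
```

On this toy data the filter discards `cmpd_B` and returns `cmpd_A` and `cmpd_C`. Ranking by the signed score rather than returning bare labels mirrors how screening output is usually consumed: the highest-scoring compounds are the ones prioritized for experimental testing.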


Drug Design, Virtual Screening, Quantitative Structure-Activity Relationship, Support Vector Machines
