Applications of Support Vector Machines as a Robust Tool in High-Throughput Virtual Screening
Chemical space is enormous, but not all of it is relevant to drug design. Virtual screening methods act as knowledge-based filters for discovering novel lead molecules with desired pharmacological properties. The Support Vector Machine (SVM) is a reliable virtual screening tool for prioritizing molecules with the required biological activity and minimal toxicity. Its inherent advantages include tolerance of noisy data, which come mainly from varied high-throughput biological assays, together with high sensitivity, specificity and prediction accuracy and a reduction in false positives. SVM-based classification methods can efficiently discriminate inhibitors from non-inhibitors, actives from inactives, toxic from non-toxic and promiscuous from non-promiscuous molecules. Because the principles of drug design also apply to agrochemicals, SVM methods are being applied to virtual screening for pesticides as well. This review discusses the basic kernels and models used for binary discrimination, as well as the features used to develop SVM-based scoring functions, which will enhance our understanding of molecular interactions. Many researchers have compared SVM modeling with other statistical methods such as Artificial Neural Networks, k-nearest neighbour (kNN), decision trees and partial least squares; these studies are also discussed. Finally, a case study applying the SVM method to screen molecules for cancer therapy was carried out, and the preliminary results presented here indicate that the SVM is an excellent classifier for this task.
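As a rough illustration of the kernel-based binary discrimination and the performance measures named in the abstract, the following Python sketch scores a few molecules with an SVM-style decision function and then computes sensitivity, specificity and accuracy. It is a toy under stated assumptions, not the method of any study reviewed here: the fingerprints, support vectors, dual coefficients, bias and activity labels are all invented for the example, and the Tanimoto kernel is simply one commonly used choice for binary fingerprints. In practice the coefficients would come from training an SVM on assay data.

```python
def tanimoto_kernel(a, b):
    """Tanimoto similarity between two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def svm_decision(x, support_vectors, dual_coefs, bias, kernel):
    """Evaluate f(x) = sum_i (alpha_i * y_i) * K(x_i, x) + b for one molecule."""
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


def screening_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from true and predicted 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(y_true) if y_true else 0.0,
        "false_positives": fp,
    }


# Hypothetical "trained" model: two active-like and two inactive-like support vectors.
# Each dual coefficient stands for alpha_i * y_i; the values are invented and sum to zero.
support_vectors = [{1, 2, 3, 4}, {2, 3, 4, 5}, {10, 11, 12}, {11, 12, 13, 14}]
dual_coefs = [0.8, 0.6, -0.7, -0.7]
bias = 0.0

# Hypothetical screening library with made-up activity labels (1 = active, 0 = inactive).
library = [{1, 2, 3, 5}, {10, 12, 13}, {3, 4, 11}, {1, 12, 13, 14}]
true_labels = [1, 0, 0, 0]

scores = [
    svm_decision(mol, support_vectors, dual_coefs, bias, tanimoto_kernel)
    for mol in library
]
# A positive decision value is read as "active", anything else as "inactive".
predictions = [1 if s > 0 else 0 for s in scores]
metrics = screening_metrics(true_labels, predictions)

for mol, score, pred in zip(library, scores, predictions):
    print(sorted(mol), round(score, 3), "active" if pred else "inactive")
print(metrics)
```

Representing fingerprints as sets of bit indices keeps the kernel to a single line of set arithmetic. A real screening run would more likely use fixed-length bit vectors from a cheminformatics toolkit and an established SVM implementation.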