Structural Bioinformatics and Big Data Analytics: A mini-review

Ragothaman M Yennamalli


Structural Biology and Structural Bioinformatics are two complementary areas that deal with the three-dimensional structures of biomolecules. With the advent of high-throughput techniques and the automation of structure identification, a barrage of data is now being generated, and it falls within the realm of Big Data. In this review, we present examples of, and current approaches to, handling massive volumes of structural data, along with some potential applications of Big Data from a Structural Bioinformatics perspective.


Structural Bioinformatics; Structural Biology; Hadoop; Big Data; MMTF; Data Compression
