DNA Microarray Data Analysis: A New Survey on Biclustering

Haifa Ben Saber, Mourad ELLOUMI


Some subsets of genes behave similarly under certain subsets of conditions, in which case we say that they are coexpressed, yet they behave independently under other subsets of conditions. Discovering such coexpression can help uncover genomic knowledge such as gene networks or gene interactions. It is therefore of utmost importance to cluster genes and conditions simultaneously, in order to identify clusters of genes that are coexpressed under clusters of conditions. This type of clustering is called biclustering.

Biclustering is an NP-hard problem. Consequently, heuristic algorithms are typically used to find approximate, suboptimal solutions. In this paper, we present a new survey of biclustering of gene expression data, also called microarray data.
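As a rough illustration of these two ideas, the sketch below scores a candidate bicluster and then searches for one greedily. It rests on several assumptions that are not taken from this survey: the coherence score is the mean squared residue, one commonly used measure among many; the search is a simplified greedy deletion loop rather than any particular published algorithm; and the expression values, the threshold and the function names are made up for the example.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix rows x cols.

    A score of 0 means every gene in `rows` follows the same additive
    pattern across the conditions in `cols`.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_r, n_c = len(rows), len(cols)
    row_means = [sum(r) / n_c for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_r)) / n_r for j in range(n_c)]
    overall = sum(row_means) / n_r
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall) ** 2
        for i in range(n_r)
        for j in range(n_c)
    )
    return total / (n_r * n_c)


def greedy_bicluster(matrix, threshold=0.01, min_size=2):
    """Hypothetical heuristic: drop one gene or condition at a time.

    At each step the single removal giving the lowest score is applied,
    until the score falls to `threshold` or the minimum size is reached.
    The result is a local, possibly suboptimal, solution.
    """
    rows = list(range(len(matrix)))
    cols = list(range(len(matrix[0])))
    while mean_squared_residue(matrix, rows, cols) > threshold:
        candidates = []
        if len(rows) > min_size:
            candidates += [([r for r in rows if r != i], cols) for i in rows]
        if len(cols) > min_size:
            candidates += [(rows, [c for c in cols if c != j]) for j in cols]
        if not candidates:
            break
        rows, cols = min(
            candidates, key=lambda rc: mean_squared_residue(matrix, rc[0], rc[1])
        )
    return rows, cols


# Toy expression matrix (made-up values): rows are genes, columns are conditions.
# Genes 0-2 follow the same additive pattern under conditions 0-2 only.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [3.0, 4.0, 5.0, 7.0],
    [8.0, 1.0, 6.0, 2.0],
]

full_score = mean_squared_residue(expression, [0, 1, 2, 3], [0, 1, 2, 3])
planted_score = mean_squared_residue(expression, [0, 1, 2], [0, 1, 2])
found_rows, found_cols = greedy_bicluster(expression)
print("score of the full matrix:", full_score)
print("score of genes 0-2 under conditions 0-2:", planted_score)
print("greedy result:", found_rows, found_cols)
```

The greedy loop examines only a linear number of candidates per step, which is what makes it tractable; the price is that it may stop at a bicluster that is not the best one available.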


Keywords: biclustering; heuristic algorithms; microarray data; genomic knowledge

