Weighted Alignment Free Dissimilarity Metric for Promoter Sequence Comparison

Kouser DT, Lalitha Rangarajan


Comparative sequence analysis is a powerful tool in bioinformatics: it infers the function of a sequence from its structural information. Among the non-coding regions of DNA, promoter sequences have received a great deal of attention in medical science because promoter regions play a crucial role in gene regulation. In this work we propose an alignment-free metric for comparing promoter sequences. We use binary and decimal position specific motif matrices (PSMMs) of the promoters, created for our experiments with the TFSEARCH tool. A simple weighted algorithm computes the dissimilarity between the PSMMs of promoter sequences, which allows their underlying homology and functionality to be analyzed. Promoter sequences of 500 nucleotides upstream of the transcription start site (TSS) were obtained from the NCBI database: for one experiment, those of the enzyme pyruvate kinase (PKLR) from the glycolysis pathway of different organisms; for the other, those of all enzymes in the human glycolysis pathway. The proposed dissimilarity metric brings out differences on both datasets. The resulting similarities and differences between promoter sequences could be important for a clear understanding of the transcription regulation process in different organisms. The results reveal some useful findings that can be extended to a broader investigation.
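
The abstract does not spell out the weighted algorithm, so the following is only a minimal sketch of how a weighted dissimilarity between two PSMMs might be computed. Everything in it is an assumption made for illustration: the PSMM layout (one row per motif, one value per promoter position, 0/1 for the binary form or a match score for the decimal form), the per-motif weights, and the name `weighted_dissimilarity` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical PSMM layout (an assumption, not the paper's definition):
# motif name -> one value per promoter position
# (0/1 for a binary PSMM, a match score for a decimal PSMM).
PSMM = Dict[str, List[float]]


def weighted_dissimilarity(a: PSMM, b: PSMM, weights: Dict[str, float]) -> float:
    """Weighted mean absolute difference between two PSMMs.

    Motifs missing from one matrix are treated as all-zero rows.
    Motifs without an explicit weight get weight 1.0.
    """
    total = 0.0
    total_weight = 0.0
    for motif in sorted(set(a) | set(b)):
        row_a = a.get(motif)
        row_b = b.get(motif)
        # Use whichever row exists to determine the promoter length.
        length = len(row_a) if row_a is not None else len(row_b)
        if length == 0:
            raise ValueError(f"empty row for motif {motif!r}")
        if row_a is None:
            row_a = [0.0] * length
        if row_b is None:
            row_b = [0.0] * length
        if len(row_a) != len(row_b):
            raise ValueError(f"row length mismatch for motif {motif!r}")
        w = weights.get(motif, 1.0)
        # Mean absolute difference over positions for this motif.
        diff = sum(abs(x - y) for x, y in zip(row_a, row_b)) / length
        total += w * diff
        total_weight += w
    # Normalise by the total weight so the score is comparable across motif sets.
    return total / total_weight if total_weight else 0.0


# Toy binary PSMMs over a 4-position promoter (illustrative values only).
promoter_1 = {"SP1": [1, 0, 0, 1], "TATA": [0, 1, 0, 0]}
promoter_2 = {"SP1": [1, 0, 0, 0], "TATA": [0, 1, 0, 0]}
motif_weights = {"SP1": 2.0, "TATA": 1.0}

score = weighted_dissimilarity(promoter_1, promoter_2, motif_weights)
print(round(score, 4))
```

In this sketch the score is 0 for identical matrices and is symmetric in its two arguments; whether the paper's metric shares these properties, or weights motifs in this way, is not stated in the abstract.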


Weighted; Alignment free; Non coding regions; Promoter comparison; Dissimilarity measure



