FrameOUT and FrameOUTDB : A web based application and repository for the identification and analysis of frameshift mutations

Received Jul 10th, 2017. Revised Aug 20th, 2017. Accepted Apr 4th, 2018.

Frameshifting, one of the three classes of recoding, wastes energy, resources and the activity of the biosynthetic machinery. In addition, some peptides synthesized after frameshifts are probably cytotoxic and result in diseases and disorders such as muscular dystrophies, lysosomal storage disorders, and cancer. Hidden stop codons, which occur naturally in the coding sequences of all organisms, terminate translation early when an incorrect reading frame is selected and thereby reduce the metabolic cost of frame-shift events; they have also been associated with numerous diseases. Hidden stops appear frequently in mitochondrial genomes, and we studied this putative event in the mitochondrial genomes of vertebrates. This work presents an algorithmic web-based tool for studying hidden stops in frame-shifted translation of vertebrate mitochondrial genomes under the respective genetic code. FrameOUT (FO) predicts mutations in a user-supplied sequence, whether diseased or normal, by implementing a Hidden Markov Model. FrameOUTDB (FODB) is a collection of all available frameshift events and their association with various diseases.


INTRODUCTION
Reading frames play an important role in the translation of nucleotide sequences into proteins. Selection of a wrong reading frame can alter the protein product. Events that alter the reading frame are rare during translation, and frame-shift is one such event. Frame-shift is nevertheless quite common in viruses, bacteria, yeast and other organisms [1,2]. It is a type of genetic mutation generally caused by indels, i.e. insertions and deletions of nucleotides. Frame-shifts are defined as protein translations that start not at the first, but at either the second (+1 frame-shift) or the third (−1 frame-shift) nucleotide of the codon [3]. Presumably, most frame-shifts yield nonfunctional proteins and therefore waste energy, resources and the activity of the biosynthetic machinery. Some peptides synthesized after frame-shifts are probably cytotoxic and are a possible cause of numerous diseases and disorders such as muscular dystrophies, lysosomal storage disorders, and cancer. Frame-shift mutations can occasionally be beneficial; for example, a frameshift mutation was responsible for the creation of nylonase [4, 5, 6]. Coding sequences lack internal in-frame stop codons, but many stop codons appear off-frame. Off-frame stops, i.e. stop codons in the +1 and −1 shifted reading frames, are termed Hidden Stop Codons (HSCs) or hidden stops [7][8][9][10].
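The notion of a hidden stop can be made concrete with a short sketch. The Python below is an illustration only and not the FrameOUT implementation: the function names `codons` and `hidden_stops` are hypothetical, the example sequence is invented, and the stop set assumes the vertebrate mitochondrial genetic code (NCBI translation table 2: TAA, TAG, AGA, AGG). Following the definition above, the +1 frame starts at the second nucleotide and the −1 frame at the third.

```python
# Stop codons of the vertebrate mitochondrial code (NCBI translation table 2).
VERT_MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def codons(seq, offset):
    """Split seq into complete codons, starting at a 0-based offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def hidden_stops(seq, stops=VERT_MITO_STOPS):
    """Return 0-based start positions of off-frame stop codons.

    The "+1" frame starts at the second nucleotide (offset 1) and the
    "-1" frame at the third nucleotide (offset 2), as defined in the text.
    """
    seq = seq.upper()
    found = {}
    for frame, offset in (("+1", 1), ("-1", 2)):
        found[frame] = [
            offset + 3 * k
            for k, codon in enumerate(codons(seq, offset))
            if codon in stops
        ]
    return found


# Invented toy sequence: no stop in frame 0 (ATG TTA GAA CTA AGC).
example = "ATGTTAGAACTAAGC"
print(hidden_stops(example))  # {'+1': [4, 10], '-1': [5]}
```

In this toy sequence the +1 frame reads TGT TAG AAC TAA and the −1 frame reads GTT AGA ACT AAG, so a ribosome that slipped into either frame would terminate within a few codons.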

What causes frame-shift errors?
One clear implication of the suppressor analysis is that frame-shifting is strongly stimulated by near-cognate decoding, that is, decoding by an isoacceptor that makes a less than optimal wobble interaction with the mRNA. The example of suppression by a structurally normal near-cognate tRNA in the sufB2 strain of S. typhimurium clearly shows that near-cognate decoding can stimulate frame errors. Moreover, overproduction of the same near-cognate tRNA induces frame-shifting at the same sites suppressed by sufB2. Some programmed frameshifts are also stimulated by near-cognate decoding. The first example comes from the dnaX gene of E. coli, which encodes alternative forms of a subunit of DNA polymerase III [11]. Frame-shifting results in the expression of a C-terminally truncated form of the protein and occurs on a slippery heptameric sequence, A-AAA-AAG, with two tRNAs simultaneously slipping −1 from AAA-AAG to AAA-AAA. The unusually high efficiency of this site results partly from the near-cognate recognition of the AAG codon by a tRNA with a modified U in the wobble position, which restricts the ability of the tRNA to decode AAG. Expressing a tRNA that recognizes AAG in a completely cognate fashion reduced frame-shifting at the site. The weakness of the interaction apparently predisposes the ribosome to frame-shift [2].
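The −1 slip on the heptamer can be illustrated with a small sketch. The Python below is a hypothetical helper and is not part of FrameOUT: the names `slippery_sites` and `codons_at_site` and the toy sequence are assumptions made for illustration. It locates A-AAA-AAG where the two codons AAA-AAG lie in the zero frame, and shows the codons paired before and after both tRNAs slip back by one nucleotide.

```python
SLIPPERY = "AAAAAAG"  # the heptamer A-AAA-AAG described in the text


def slippery_sites(seq, motif=SLIPPERY):
    """Yield 0-based starts of motif placed as N-NNN-NNN relative to frame 0."""
    seq = seq.upper()
    start = seq.find(motif)
    while start != -1:
        # The two zero-frame codons begin one nucleotide after the motif start.
        if (start + 1) % 3 == 0:
            yield start
        start = seq.find(motif, start + 1)


def codons_at_site(seq, start):
    """Return the codon pair read in frame 0 and after a -1 slip."""
    seq = seq.upper()
    zero_frame = (seq[start + 1:start + 4], seq[start + 4:start + 7])
    minus_one = (seq[start:start + 3], seq[start + 3:start + 6])
    return zero_frame, minus_one


# Invented toy sequence; frame 0 reads ATG GCA AAA AAG TTT.
toy = "ATGGCAAAAAAGTTT"
for site in slippery_sites(toy):
    print(site, codons_at_site(toy, site))
# 5 (('AAA', 'AAG'), ('AAA', 'AAA'))
```

After the slip the second tRNA pairs with AAA instead of AAG, which matches the AAA-AAG to AAA-AAA shift described above.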
The aim of this study was to identify and analyze frame-shift mutations and their disease-specific consequences. For this purpose, several vertebrate mitochondrial genomic sequences were collected, each having 13 protein-coding sequences, namely: ND1, ND2, COX1, COX2, ATP6, ATP8, COX3, ND3, ND4L, ND4, ND5, ND6 and CYTB. Based on these protein-coding sequences, the respective transition matrices were developed. After the transition matrices are generated, the HMM forward algorithm is applied to the sequence entered by the user to compute the joint probability, neglecting the stop codons, as coding regions lack stop codons [12,3]. The three probable states, 0, 1 and 2, are based on the position of characters in the nucleotide sequence: normal translation (0), +1 frameshift (1) and −1 frameshift (2). The symbols are the nucleotides A, T, G and C, with equal probabilities. FrameOUT (FO) is thus a web-based tool that predicts the mutational events occurring in genomic sequences through frame-shift events. The data are modelled with a Hidden Markov Model (which is equivalent to a stochastic regular grammar); in particular, the HMM forward algorithm is used to calculate the probability of mutation in the user input sequence on the basis of a training set. FrameOUT DB (FODB) is a collection of all available frameshift events and their association with various diseases, specifically human diseases such as Crohn's disease, Rett syndrome, and Sandhoff disease [13, 14, 15]. FrameOUT and FrameOUT DB were built with HTML, CSS, JavaScript and PHP, with MySQL and a WAMP server.
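The forward computation outlined above can be sketched as follows. This is a minimal illustration under stated assumptions, not the FrameOUT code: it uses the standard forward recursion, in which each new value is the emission probability times the sum over previous states of transition probability times the previous value. The count matrix is invented, the uniform initial distribution is an assumption, the names `normalize_rows` and `forward` are hypothetical, and stop codons are not treated specially here.

```python
STATES = (0, 1, 2)   # 0: normal frame, 1: +1 frameshift, 2: -1 frameshift
SYMBOLS = "ATGC"
EMISSION = 0.25      # equal emission probability for every symbol


def normalize_rows(counts):
    """Turn transition counts into row-wise probabilities count(x, y) / count(x)."""
    return [[c / sum(row) for c in row] for row in counts]


def forward(seq, transition, initial=None):
    """Return the forward values alpha_T(x) for each state x.

    alpha_1(x) = initial[x] * emission
    alpha_t(x) = emission * sum over x' of transition[x'][x] * alpha_{t-1}(x')
    """
    seq = seq.upper()
    if not seq or any(symbol not in SYMBOLS for symbol in seq):
        raise ValueError("sequence must be non-empty and contain only A, T, G, C")
    if initial is None:
        # Assumption: every state is equally likely at the first position.
        initial = [1 / len(STATES)] * len(STATES)
    alpha = [initial[x] * EMISSION for x in STATES]
    for _ in seq[1:]:
        alpha = [
            EMISSION * sum(transition[prev][x] * alpha[prev] for prev in STATES)
            for x in STATES
        ]
    return alpha


# Invented counts, for illustration only; rows are "from" states.
toy_counts = [[90, 6, 4], [5, 90, 5], [4, 6, 90]]
toy_transition = normalize_rows(toy_counts)
alpha = forward("ATGC", toy_transition)
print(alpha)       # joint probability of the sequence and each final state
print(sum(alpha))  # probability of the observed sequence
```

Because every symbol is emitted with probability 0.25 in every state, the sum of the forward values equals 0.25 raised to the sequence length, which is a convenient sanity check; the relative sizes of the three per-state values are what depend on the transition matrix.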

IJCB
FODB is a collection of all available frame-shift events and their association with various diseases, specifically human diseases such as Crohn's disease, Rett syndrome, and Sandhoff disease. The database offers various options that may help to give better results. In addition, certain attributes are linked to other databases and resources: the gene list is linked to GenBank and PMIDs to PubMed.

Steps to create the FrameOUT tool:

1. Collection of coding genomic sequences
2. Generation of transition probability matrix
3. Algorithmic implementation of HMM
4. Identification of probable states
5. Identification and analysis of hidden stop codons

Steps to create FrameOUT DB:
• Collecting data
• Enlisting related attributes
• Linking with other databases
• Various options and their implementations
• FODB development

Both processes are explained through the following diagrams.

IJCB Vol. 7, No. 1, Apr 2018, 35-48 http://www.ijcb.in

Methodology for FrameOUT DB

Data Collection
While collecting coding genomic sequences, specifically vertebrate mitochondrial genomes, we collected genomic sequence data for 790 genomes. Each genome has 13 protein-coding regions, namely: ND1, ND2, COX1, COX2, ATP6, ATP8, COX3, ND3, ND4L, ND4, ND5, ND6 and CYTB. After collection, the 790 genomes were separated into per-gene coding sequence files, giving 13 files of 790 sequences each, i.e. 10,270 coding genomic sequences in total. In this section, the basic flowchart of how FrameOUT works is described through the diagram below. A transition probability matrix was calculated for each file, so 13 different transition probability matrices were generated, based on the following formula:

Transition probability: $p(x_t \mid x_{t-1}) = p(y \mid x) = \dfrac{p(x, y)}{p(x)} \approx \dfrac{\mathrm{count}(x, y)}{\mathrm{count}(x)}$

where $\mathrm{count}(x, y)$ is the number of times $x$ is immediately followed by $y$ and $\mathrm{count}(x)$ is the number of occurrences of $x$.

Implementing HMM: Forward Algorithm
The HMM forward algorithm computes the joint probability $p(x_t, y_{1:t})$, neglecting the stop codons. It takes advantage of the conditional independence rules of the HMM to perform the calculation recursively, based on the following formula [16, 17]:

$\alpha_t(x_t) = p(y_t \mid x_t) \sum_{x_{t-1}} p(x_t \mid x_{t-1})\, \alpha_{t-1}(x_{t-1})$

where
• SYMBOLS = A, T, G, C
• STATES = 0, 1, 2
• $p(y_t \mid x_t)$ = emission probability = 0.25, i.e. equal probability for each symbol
• $p(x_t \mid x_{t-1})$ = transition probability, taken from the transition probability matrix calculated previously
• $\alpha_{t-1}(x_{t-1})$ = previously calculated probability

The probable states are 0, 1 and 2:
• State 0: no mutation or change, i.e. the sequence is taken as such
• State 1: start the frame by neglecting the first nucleotide
• State 2: start the frame by neglecting the first two nucleotides

FrameOUT DB