Computational Structural Analysis of C-Terminal Residues of Proteins Containing Transmembrane Regions

Received Oct 29th, 2014; Accepted Dec 24th, 2014

Over the past few years, the number of transmembrane protein structures in the Protein Data Bank has increased substantially. It is therefore of interest to analyze the terminal residues of transmembrane proteins using computational approaches. To our knowledge, no analysis of terminal residues in transmembrane proteins has been reported in the literature. Because the N-terminal region of alpha and beta transmembrane proteins is composed of signal peptides, the present work carries out a careful, in-depth computational analysis of the C-terminal residues, covering residue preference, stability upon mutation, solvent accessibility, hydrogen bonding and carboxy terminal pentapeptide pattern search. Alanine in alpha transmembrane proteins and phenylalanine in beta transmembrane proteins are highly preferred. Glutamic acid and glycine residues can be substituted at the terminal sites of alpha and beta transmembrane proteins without affecting the protein's overall stability. Hydrogen bonding of the terminal residues is studied in detail. The pattern search of carboxy pentapeptides shows that identical pentapeptides can adopt different secondary structures depending on their position. The results discussed in this paper may help to understand the role of carboxy terminal residues in alpha and beta transmembrane proteins. From our analysis, we suggest that the preferences and structural analysis of carboxy terminal residues in alpha and beta transmembrane proteins can help to model and design novel transmembrane proteins.


INTRODUCTION
Proteins are linear amino acid polymers flanked by amino (N-) and carboxy (C-) termini. The amino and carboxy termini of proteins play a vital role in the folding and functioning of proteins [1]. They are also important for the conformational stability of proteins [2]. The terminal residues serve as binding sites for interactions with other proteins, forming protein-protein interactions that carry out specific functions [3][4][5][6][7]. Another well-known function of terminal residues is their ability to serve as cellular targeting signals [8]. There are also several hypotheses that the terminal residue and the last few amino acid residues may interact with various release factors involved in translational termination [9]. In view of these facts, protein science groups have carried out several detailed theoretical and experimental analyses of the amino and carboxy termini of folded globular proteins.
One such experiment suggests that the terminal residues have comparatively more accessible surface area than other residues [10]. This is because exposed terminal residues can easily interact with other proteins. Another analysis of protein termini reveals that proteins with terminal residues in the core (not on the surface) are difficult to fold, and hence the terminal residues tend to lie on the surface [11]. Pal and Chakrabarti carried out a computational analysis of known protein structures to find out whether there is any pattern in the type of residues used and their conformation at the two terminal positions of the polypeptide chains [12]. Likewise, several analyses of protein termini have been reported for globular proteins [13][14][15][16][17]. A careful look at the literature shows that studies and reports on the terminal residues of membrane proteins are limited. It is therefore of interest to analyze the terminal residues of membrane proteins using various computational procedures, and hence we have made a detailed analysis of the carboxy terminal residues of proteins containing transmembrane regions.
Transmembrane proteins are a major class of membrane proteins involved in diverse cellular functions such as transport, ion transmission and catalysis. They are further classified into two groups based on their structure: alpha transmembrane proteins (made up of alpha helices) and beta transmembrane proteins (made up of beta barrels). Although transmembrane proteins are the targets of many pharmaceutical developments, they have received less attention than globular proteins. This is due to the difficulty of elucidating the structures of transmembrane proteins. The crystal structures of several membrane proteins are of low resolution [18]. The number of membrane protein structures in the Protein Data Bank has increased, and it now holds more than 300 membrane-protein structures with 30-40 unique folds [19]. In view of these facts, the present work carries out a careful computational analysis of the carboxy terminal residues of proteins containing transmembrane regions. The carboxy terminal residues are subjected to mutation analysis to find the favorable and unfavorable residues at that position. The five residues at the terminus are subjected to a pattern search in a dataset of proteins containing regular secondary structures. Further, the hydrogen bonding patterns of the terminal residues are studied.

Dataset
We obtained non-redundant alpha and beta transmembrane protein sequences and structures from the PDBTM database [18]. As of December 2013, there were 176 alpha transmembrane proteins and 65 beta transmembrane proteins. A significant fraction of the proteins in the dataset are composed of several chains or domains. Further, to search for the terminal carboxy pentapeptides in globular proteins, we used a pre-compiled dataset of globular proteins derived from the latest release of the PDB [19]. This dataset contains 5877 non-redundant protein chains; each residue is assigned a regular secondary structure (Helix, Sheet or Coil), and each structure was solved experimentally at high resolution (less than 3 Å). The proteins in the dataset share less than 30% sequence identity [20].

Extraction of terminal residues and terminal carboxy pentapeptides
Since the N-terminal region of membrane proteins consists of signal peptides, the C-terminal residues were considered in our analysis. Terminal residues were extracted from the PDB files using in-house Perl scripts. For example, for the protein with PDB ID 1YC9, the C-terminal of chain A is considered. The last residue in the chain, as shown in the PDB file, is not a phenylalanine. A visual inspection of the structure reveals that the C-terminal of chain A is actually far from the membrane, but there is a phenylalanine in the C-terminal segment of the transmembrane domain. Hence, we have not omitted such proteins from the dataset, and our analysis is with respect to the domain organization of the proteins. Also, some residues are missing in the C-terminal region of the structure. The five carboxy terminal residues (carboxy pentapeptides) were also extracted and kept in an appropriate format for further computations. We computed the percentage of occurrences of the twenty amino acid residues at the carboxy terminal position of alpha and beta transmembrane proteins.
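The extraction and counting steps described above can be illustrated with a minimal Python sketch. This is not the authors' in-house Perl code: the function names and the toy sequences are hypothetical, and the sketch assumes that each chain is already available as a one-letter amino acid string rather than being parsed from a PDB file.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard residues


def c_terminal_residue(sequence):
    """Return the last residue of a one-letter chain sequence."""
    return sequence[-1]


def c_terminal_pentapeptide(sequence, length=5):
    """Return the last `length` residues, or None if the chain is shorter."""
    return sequence[-length:] if len(sequence) >= length else None


def terminal_residue_percentages(sequences):
    """Percentage of each standard amino acid at the C-terminal position."""
    counts = Counter(c_terminal_residue(s) for s in sequences if s)
    total = sum(counts[a] for a in AMINO_ACIDS)
    return {a: (100.0 * counts[a] / total if total else 0.0)
            for a in AMINO_ACIDS}


# Toy sequences invented for illustration; they are not from the dataset.
chains = ["MKTLLVAGA", "MSSQEELRF", "MALWTRKLA", "MGGHHVVEF"]
print(c_terminal_pentapeptide(chains[0]))
print(terminal_residue_percentages(chains)["A"])
```

Non-standard residue codes are ignored in the percentage, which is one possible design choice; missing C-terminal residues in a structure, as noted above, would need separate handling.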

Phylogenetic analysis and gene ontology terms of transmembrane proteins
We performed multiple sequence alignment of the alpha and beta transmembrane protein sequences (65 alpha and 63 beta transmembrane proteins) using ClustalW to understand their evolutionary relationship in the context of the C-terminal residues. Further, we extracted gene ontology terms from the annotations page of the Protein Data Bank to understand the functional role of each protein in the dataset [19].

Computation of protein stability upon point mutations
The carboxy terminal residues of alpha and beta transmembrane proteins were subjected to point mutation to each of the other 19 amino acids, and ΔΔG was calculated using the CUPSAT (Cologne University Protein Stability Analysis Tool) server [21]. This program uses structural environment-specific atom potentials and torsion angle potentials to predict ΔΔG, the difference in free energy of unfolding between wild-type and mutant proteins. The ΔΔG values obtained from the CUPSAT server were used for further analysis. The server also calculates the solvent accessibility of the mutated residues.

Computation of hydrogen bonding patterns
Hydrogen bonding patterns of the carboxy terminal residues were examined using the Struct tools server hosted by the high-performance computing facility at the National Institutes of Health (NIH). The server is freely accessible at http://helixweb.nih.gov/structbio/. It computes the hydrogen bond distance and the angle between the atoms of the donor (CNH_O) and acceptor (NH_OC). The results were tabulated in an appropriate format for further analysis.

Searching identical carboxy pentapeptide patterns in folded proteins
The five-residue carboxy terminal segment (carboxy pentapeptide) of each transmembrane protein was searched against the ccPDB sequences (5877 sequences) [20]. The secondary structures of the identically matched segments were obtained. The results were classified into two major groups: identical pentapeptides with identical secondary structures and identical pentapeptides with different secondary structures. The whole search and secondary structure assignment process was performed using an in-house Perl program.
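A minimal Python sketch of such a search is given below. It is an illustration only, not the in-house Perl program. It assumes that each database entry is a pair of equal-length strings (sequence and per-residue secondary structure written as H, E or C), and the per-position `percent_similarity` measure is an assumption about how the similarity between two secondary structure strings might be scored. All names and the toy data are hypothetical.

```python
def find_pentapeptide_hits(pentapeptide, sequence, secondary_structure):
    """Return (start, secondary structure segment) for every exact match."""
    hits = []
    start = sequence.find(pentapeptide)
    while start != -1:
        end = start + len(pentapeptide)
        hits.append((start, secondary_structure[start:end]))
        # Continue one position later so overlapping matches are also found.
        start = sequence.find(pentapeptide, start + 1)
    return hits


def classify_hits(query_ss, hits):
    """Split hits into identical and different secondary structure groups."""
    identical = [h for h in hits if h[1] == query_ss]
    different = [h for h in hits if h[1] != query_ss]
    return identical, different


def percent_similarity(ss_a, ss_b):
    """Percentage of positions with the same state (assumed scoring)."""
    matches = sum(1 for a, b in zip(ss_a, ss_b) if a == b)
    return 100.0 * matches / len(ss_a)


# Toy database entry invented for illustration.
seq = "GSLVAGAKKDLVAGA"
ss = "CCHHHHHCCCEEEEE"
hits = find_pentapeptide_hits("LVAGA", seq, ss)
identical, different = classify_hits("HHHHH", hits)
print(identical, different)
```

In this toy entry the same pentapeptide occurs once in a helix and once in an extended strand, which is the kind of case the second group above collects.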

Percentage of occurrences of amino acid residues at C-terminal position
In protein sequence analysis, the number of occurrences of a particular amino acid type at a specific position is important. Hence, the percentage of occurrences of the twenty amino acids at the carboxy terminal sites of transmembrane (alpha and beta) and globular proteins was computed and is shown in Figure 1. The figure shows that alanine occurs at a high percentage (13.84%) at the terminal sites of alpha transmembrane proteins, whereas phenylalanine (57.14%) occurs at a high percentage at the terminal sites of beta transmembrane proteins. In alpha transmembrane proteins, charged amino acid residues such as arginine and glutamic acid occur at relatively high percentages at the carboxy terminal sites. Leucine occurs at a relatively high percentage, along with alanine, at the terminal sites of both alpha and beta transmembrane proteins. The high percentage of alanine at the terminal sites of alpha transmembrane proteins agrees with the high percentage of alanine observed at the terminal sites of globular proteins [12]. This implies that alanine is frequently distributed at the terminal site of alpha transmembrane proteins. The occurrence of phenylalanine at the carboxy terminal of beta transmembrane proteins is important for efficient and correct assembly of the protein. Residues with hydrophobic properties also occur at relatively high percentages, which suggests that the presence of hydrophobic residues makes the chain termini less mobile [22]. In alpha transmembrane proteins, aspartic acid, glycine and methionine do not occur at the carboxy terminal, whereas in beta transmembrane proteins, arginine, cysteine, lysine, methionine, proline and serine do not occur at the carboxy terminal. In both classes of transmembrane proteins, methionine and aspartic acid have no occurrence.

Phylogeny and gene ontology terms of transmembrane proteins
The phylogenetic trees of the alpha and beta transmembrane proteins are shown in Figure 2. A careful look at the figure shows that the multiple sequence alignment of the transmembrane proteins does not form clusters or groups based on the terminal residues. This is due to the low sequence identity between the proteins in the dataset: the dataset is non-redundant and its sequences share less than 30% sequence identity. The gene ontology terms of each protein in the dataset are shown in Supplementary Material S1. Most of the alpha transmembrane proteins in the dataset carry the gene ontology terms electron carrier and metal ion binding. Other terms include ion channel activity, transporter, ammonium binding and proteolysis. In the case of beta transmembrane proteins, most carry the gene ontology terms transporter and porin activity. Eleven beta transmembrane proteins have no annotations.

Mutational analysis and solvent accessibility of carboxy terminal sites of transmembrane proteins
To find the mutable and immutable residues at the carboxy terminal site of the alpha and beta transmembrane proteins, a mutational analysis was carried out. The terminal site was mutated to each of the nineteen amino acids other than the wild type. The average ΔΔG value for each amino acid residue at the carboxy terminal was computed and is presented in Figure 3. In alpha transmembrane proteins, the terminal site can be mutated to glutamic acid, whereas in all beta transmembrane proteins a small residue such as glycine can be substituted for any residue at the terminal site. Residues such as tryptophan, isoleucine, tyrosine and leucine cannot be substituted at the terminal sites of alpha transmembrane proteins. In beta transmembrane proteins, residues such as isoleucine, tyrosine, methionine, proline, phenylalanine and valine cannot be substituted at the carboxy terminal. Interestingly, glycine can be substituted for any residue at the carboxy terminal site in both alpha and beta transmembrane proteins. The role of glycine in folded proteins has been studied extensively by various groups [23,24,25,26]. Similarly, the sulphur-containing residue cysteine can be substituted at the terminal site of alpha and beta transmembrane proteins. The results imply that the carboxy terminal site cannot be replaced by hydrophobic amino acids such as isoleucine and valine. Charged amino acid residues can be substituted at the carboxy terminal position of alpha and beta transmembrane proteins without affecting the stability of the protein. Substitution of threonine and serine at the carboxy terminal site also appears to be stabilizing in alpha and beta transmembrane proteins. Accessible surface area is closely related to the molecular surface, which plays a vital role in binding [27]. The average solvent accessibility of the carboxy terminal residues in alpha and beta transmembrane proteins is 87.12% and 67.64%, respectively. The average solvent accessibility of the terminal residues is thus higher in alpha than in beta transmembrane proteins.
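The averaging of ΔΔG values per substituted residue can be sketched in Python as follows. This is a hedged illustration, not the authors' procedure: it assumes the predicted values have already been collected from the server into (protein, mutant residue, ΔΔG) records, and the record layout, function name and numbers are hypothetical. The sign convention of ΔΔG should be taken from the server's own documentation; the sketch only averages whatever values it is given.

```python
from collections import defaultdict


def average_ddg_by_substitution(records):
    """Average predicted ddG for each substituted residue.

    `records` is an iterable of (protein_id, mutant_residue, ddg) tuples,
    one per point mutation at the C-terminal site (assumed layout).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _protein_id, mutant, ddg in records:
        sums[mutant] += ddg
        counts[mutant] += 1
    return {residue: sums[residue] / counts[residue] for residue in sums}


# Invented values for illustration only; not results from the paper.
toy_records = [
    ("P1", "G", 1.0),
    ("P2", "G", 3.0),
    ("P1", "I", -2.0),
]
averages = average_ddg_by_substitution(toy_records)
print(averages)
```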

Carboxy terminal pentapeptide patterns
Several previous reports have shown that identical pentapeptides at different positions in unrelated proteins can adopt different secondary structures [23,28,29,30]. The five terminal residues (carboxy pentapeptides) were extracted from the dataset and subjected to a pattern search against the dataset of proteins with regular secondary structures. The carboxy pentapeptides, the hits obtained by pattern matching with their corresponding secondary structures, and the percentage of similarity between the secondary structures of the hits are shown in Table 1 (A and B). In other words, the table lists identical pentapeptides whose secondary structure is either different from or similar to that of the carboxy pentapeptide. From our observations, identical pentapeptides adopt different secondary structures. Transitions from helix to coil and from extended to coil are frequent, whereas helix to extended transitions are rare. Because of the positional context along the sequence, the carboxy pentapeptide and its hits do not share a similar secondary structure. The terminal residues are mostly unstructured in both alpha and beta transmembrane proteins, but the matching segments are found to have different secondary structures in globular proteins. In the amino acid composition of the C-terminal pentapeptides of alpha transmembrane proteins, leucine has the highest percentage of occurrence (11.40%). Leucine is the most widely distributed amino acid among proteins, which explains its high percentage. When only the terminal residue is considered, however, alanine occurs more often. Arginine (10.20%) also has a relatively high percentage of occurrence, close to that of leucine. Cysteine and methionine have the lowest percentages of occurrence. In beta transmembrane proteins, phenylalanine has the highest percentage of occurrence in both cases [terminal residue only (57%) and terminal pentapeptide (18.40%)]. Cysteine is not found in any carboxy pentapeptide.

Hydrogen bonding patterns of carboxy terminal residues
Proteins are made up of hundreds of amino acids folded into a well-defined structure that is stabilized by various types of interactions. The 3D structure of a protein can be discretized in many ways. The formation of hydrogen bonds between amino acid residues is one such way to understand protein folding and structure. In the present work, the hydrogen bonding of the terminal residues with other residues in the protein is analyzed in detail. The hydrogen bond distance between the terminal residue (donor) and the other residue (acceptor), together with the details of the donor and acceptor, is presented in Supplementary Material S2. In alpha transmembrane proteins, the terminal residues of 20 proteins do not form hydrogen bonds, whereas in beta transmembrane proteins, the terminal residues of ten proteins do not form hydrogen bonds. In both alpha and beta transmembrane proteins, the hydrogen bonding distance between the terminal residue and the other residue is approximately 3 Å. Of the terminal residues of transmembrane proteins, 42% form a long-range hydrogen bond with other residues. Long-range hydrogen bonds are formed between residues that are far apart in sequence but close in three-dimensional structure (separation between the two residues along the sequence greater than 4) [31,32]. Our analysis finds no clear difference in the hydrogen bonding patterns of the carboxy terminal residues between alpha and beta transmembrane proteins.
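The long-range criterion stated above (sequence separation greater than 4) can be written as a short Python sketch. The function names and the example residue numbers are hypothetical, and the sketch assumes that each hydrogen bond is given as a pair of residue sequence numbers within the same chain.

```python
def is_long_range(donor_index, acceptor_index, min_separation=4):
    """True if the two residues are more than `min_separation` apart."""
    return abs(donor_index - acceptor_index) > min_separation


def percent_long_range(bonds):
    """Percentage of (donor, acceptor) residue pairs that are long-range."""
    if not bonds:
        return 0.0
    long_range = sum(1 for donor, acceptor in bonds
                     if is_long_range(donor, acceptor))
    return 100.0 * long_range / len(bonds)


# Invented residue numbers for illustration; not data from the paper.
toy_bonds = [(300, 296), (300, 250), (120, 119), (88, 10)]
print(percent_long_range(toy_bonds))
```

A separation of exactly 4 is treated as short-range here, following the "greater than 4" wording.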

CONCLUSION
Terminal residues of proteins help in binding to other proteins and in correct folding and assembly. Although terminal residues of globular proteins have been studied, little analysis has been done on the terminal residues of transmembrane proteins. These proteins are involved in diverse biological processes. Since the N-terminal region of these proteins contains signal peptides, the C-terminal residues were subjected to computational analysis. These residues were subjected to mutation analysis to identify the favorable and unfavorable residues at the terminal position. Residues such as tryptophan, isoleucine, tyrosine and leucine cannot be substituted at the terminal sites of alpha transmembrane proteins, and residues such as isoleucine, tyrosine, methionine, proline, phenylalanine and valine cannot be substituted at the carboxy terminal of beta transmembrane proteins. The terminal carboxy pentapeptide patterns of transmembrane proteins were matched against globular protein sequences, and identical pentapeptides adopting different secondary structures were identified. Transitions were most frequent from helical to coil conformation and from extended to coil conformation. The solvent accessibility and hydrogen bonding patterns were studied: the average hydrogen bonding distance was found to be 3 Å, and about 42% of the terminal residues of transmembrane proteins form long-range hydrogen bonds with other residues. The results of the present study provide clues for understanding protein folding and binding patterns in transmembrane proteins.

Figure 1. Percentage of occurrences of amino acid residues at C-terminal positions

Figure 2. Phylogenetic tree of alpha (left) and beta (right) transmembrane proteins

Figure 3. Average ΔΔG values for each amino acid residue at C-terminal position

Table 1 (A). Carboxy-terminal pentapeptides of alpha transmembrane proteins: pattern search in 5877 sequences with regular secondary structure

ISSN: 2278-8115 IJCB Vol. 4, No. 1, April 2015, 44 -54 http://www.ijcb.in

Table 1 (B). Carboxy-terminal pentapeptides of beta transmembrane proteins: pattern search in 5877 sequences with regular secondary structure