Accurate Demarcation of Protein Domain Linkers Based on Structural Analysis of Linker Probable Region
In multi-domain proteins, the domains are connected by a flexible, unstructured region called a protein domain linker. Accurate demarcation of these linkers is key to understanding their biochemical and evolutionary attributes, and this knowledge helps in designing suitable linkers for engineering stable multi-domain chimeric proteins. Here we propose a novel method for demarcating a linker, given a three-dimensional protein structure and a domain definition. The method builds on biological knowledge about the structural flexibility of linkers. We performed structural analysis on a linker probable region (LPR) around the domain boundary points of known SCOP domains. Each LPR was described by a set of overlapping peptide fragments of fixed size. Each peptide fragment was in turn described by geometric invariants (GIs) and subjected to a clustering process in which the fragments corresponding to the actual linker come up as outliers. We then identify the actual linker as the longest continuous stretch of outlier fragments in the LPR. The method was evaluated on a benchmark dataset of 51 continuous multi-domain proteins, where it achieved an F1 score of 0.745 (0.83 precision and 0.66 recall). When applied to 725 continuous multi-domain proteins, it identified novel linkers that had not been reported previously. The method can be used in combination with supervised or sequence-based linker prediction methods for accurate linker demarcation.
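The final step described above, picking the longest continuous stretch of outlier fragments within an LPR, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the one-residue window shift, and the rule that maps the selected fragments back to a residue range (first residue of the first outlier fragment to last residue of the last one) are all assumptions made for illustration. The per-fragment outlier flags are taken as given, since the GI computation and the clustering are not specified here.

```python
from typing import List, Optional, Tuple


def overlapping_fragments(lpr_start: int, lpr_end: int, size: int) -> List[Tuple[int, int]]:
    """Inclusive (first, last) residue ranges of every window of `size`
    residues inside the LPR. A shift of one residue is an assumption."""
    return [(s, s + size - 1) for s in range(lpr_start, lpr_end - size + 2)]


def longest_outlier_run(is_outlier: List[bool]) -> Optional[Tuple[int, int]]:
    """Inclusive fragment indices of the longest run of True flags.
    Ties go to the earliest run; returns None if no fragment is an outlier."""
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    # A trailing False sentinel closes a run that reaches the last fragment.
    for i, flag in enumerate(list(is_outlier) + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if best is None or (i - run_start) > (best[1] - best[0] + 1):
                best = (run_start, i - 1)
            run_start = None
    return best


def demarcate_linker(lpr_start: int, lpr_end: int, size: int,
                     is_outlier: List[bool]) -> Optional[Tuple[int, int]]:
    """Map the longest outlier run back to residue numbers (hypothetical rule:
    start of the first outlier fragment to end of the last outlier fragment)."""
    frags = overlapping_fragments(lpr_start, lpr_end, size)
    if len(frags) != len(is_outlier):
        raise ValueError("one outlier flag is required per fragment")
    run = longest_outlier_run(is_outlier)
    if run is None:
        return None
    return (frags[run[0]][0], frags[run[1]][1])


# Toy example: an LPR spanning residues 10..20, fragments of 4 residues,
# and made-up outlier flags for the resulting 8 fragments.
flags = [False, True, False, False, True, True, True, False]
print(demarcate_linker(10, 20, 4, flags))  # (14, 19)
```

In this toy case the isolated outlier at the second fragment is ignored in favour of the three consecutive outlier fragments, which cover residues 14 to 19.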