Molecular Docking Studies towards Development of Novel Gly-Phe Analogs for Potential Inhibition of Cathepsin C (dipeptidyl peptidase I)

Received Nov 13th, 2013; Revised April 8th, 2014; Accepted April 8th, 2014
Cathepsin C is a cysteine protease required for activation of various proinflammatory serine proteases and is therefore of interest as a therapeutic target. The Cathepsin C coordinate system was employed as a model to study the interactions of some already available Cathepsin C inhibitors. Compounds containing a Gly-Phe fragment with functional groups at its ends were designed by a knowledge-based approach. Using the AutoDock and Discovery Studio Client 3.1 software packages, the binding energies of different conformations and ten scoring functions (LigScore1, LigScore2, PLP1, PLP2, JAIN, PMF, PMF04, LUDI_1, LUDI_2 and LUDI_3) were calculated for the newly designed compounds. These docking studies revealed favorable energy scores and also help to understand the interaction of the ligands with the enzyme.


INTRODUCTION
Cathepsin C (DPPI or dipeptidyl peptidase I; EC 3.4.14.1) is a lysosomal cysteine protease that sequentially removes dipeptides from the N-termini of protein and peptide substrates. DPPI homologues have been identified by gene cloning, biochemical characterization or sequence comparison in various species [1] [2] [3] [4]. Steadily growing evidence from animal models for the key role of hDPPI (human DPPI) in various diseases, for instance sepsis [5], arthritis [6] and other inflammatory disorders, has drawn attention to the potential of DPPI as a drug target.
Cathepsin C was first described by Gutmann and Fruton in 1948 [7] [8], but the cDNA of the human enzyme was not published until 1995 [9]. The cDNAs encoding rat, human, murine, bovine, dog and two schistosome Cathepsin C enzymes have been cloned and sequenced, which shows that the enzyme is highly conserved [10]. The human and rat Cathepsin C cDNAs encode precursors (prepro-cathepsin C) consisting of signal peptides of 24 residues and pro-regions of 205 (rat Cathepsin C) or 206 (human Cathepsin C) residues, and are 30-40% identical to the mature amino acid sequences of papain and a number of other cathepsins, including cathepsins B, H, K, L and S [11].
Cathepsin C is the only member of the C1 family that needs a tetrameric form and requires halide ions for its activity [9]. It can also activate the zymogens of several proteases [12] [13] [14] [15] [16]. DPPI activates several pro-inflammatory serine proteases by removing an amino-terminal inhibitory dipeptide [17]. In the acidic lysosomal milieu DPPI is primarily an amino dipeptidase that cleaves two-residue units from the N-terminus of a polypeptide chain, although it can also act as a transferase and catalyze the reverse reaction [9] [14].
Although it is a processing enzyme for many proteases, DPPI is not capable of autoactivation and requires other proteases such as Cathepsin L or S [9]. DPPI is distributed in a variety of tissues and is expressed in lung, kidney, liver and spleen [7]. It is found in mammals (humans and other apes, cows, dogs, rabbits, rats and mice), human parasites (Plasmodium and Schistosoma), fish (rainbow trout and killifish), amphibians (frog) and birds (chicken) [18] [19] [20] [21] [22], which indicates an important and widespread role of Cathepsin C in nature. It is present at high levels in cytotoxic lymphocytes and mature myeloid cells [23]. Cathepsin C has two distinct roles. First, it contributes to general protein catabolism within the lysosome. Second, the enzyme is found in the secretory granules of immune effector cells (cytotoxic lymphocytes, mast cells and neutrophils), where it activates granular serine proteases by removing an inhibitory N-terminal dipeptide sequence [24] [25] [26]. DPPI thus functions as a key enzyme in the activation of granule serine proteases in cytotoxic T lymphocytes and natural killer cells (granzymes A and B) [27] [28], mast cells (chymase and tryptase) [29] [30] [31] [32] and neutrophils (Cathepsin G, proteinase 3 and elastase). DPPI and granzymes are found in the granules of cytotoxic lymphocytes [23], and DPPI is responsible for processing and activating progranzymes [15].
DPPI is a unique protease within the papain superfamily because it has an oligomeric structure. Its mechanism is also unique compared with other oligomeric proteolytic complexes such as the proteasome, bleomycin hydrolase and tryptase, which all have their active sites located inside the structure. The active sites of DPPI are located on the outside of the structure, which is capable of binding four molecules of cystatin per oligomer [9].
Human DPPI is a 200 kDa protein (Figure 1) consisting of four identical subunits, each composed of three different polypeptide chains: a heavy chain, a light chain and an exclusion domain. The N-terminal fragment, also known as the residual propart domain, was suggested to be involved in formation of the tetramer [9]. The interface between the heavy chain and the light chain consists mainly of hydrophobic interactions. DPPI has four active site clefts. The catalytic residues Cys 234 and His 381 are situated above the oxyanion hole [15]. The four active site clefts are positioned approximately at the tetrahedral corners of the molecule, ~50-60 Å apart, and are exposed to solvent. Each active site cleft (Figure 2) is formed by features of all three domains of a functional monomer of DPPI; the papain-like domains form the sides of the cleft, which is closed at one end by the exclusion domain [9].
Some available inhibitors of Cathepsin C that were considered for designing new compounds are: dipeptide diazomethyl ketones [33] [34] [35], a dipeptide nitrile [36] [37], dipeptide vinyl sulfones [38], dipeptide acyloxymethyl and fluoromethyl ketones [38], dipeptide O-acyl hydroxamic acid [39], arginine-based peptides [40], cyanamide-based inhibitors [13] and phosphinic tripeptides [41]. Natural DPPI inhibitors include cystatins and E-64 (trans-epoxysuccinyl-L-leucylamido-(4-guanidino)butane) [33] [42]. Gly-Phe-CHN2 is one of the most efficient inhibitors for suppressing the activity of endogenous Cathepsin C [15]. It binds to the S1 and S2 binding sites and is covalently linked to the Cys 234 residue through a thioether bond. All hydrogen bond donors and acceptors of the inhibitor form hydrogen bonds with the enzyme, and the strongest interaction is formed with Asp [33].
Based on a study of the active site of Cathepsin C and the activity of some already available inhibitors, different compounds containing a Gly-Phe fragment with functional groups at its ends were designed by a knowledge-based approach. Using the AutoDock and Discovery Studio Client 3.1 software packages, the binding energies of different conformations and ten scoring functions (LigScore1, LigScore2, PLP1, PLP2, JAIN, PMF, PMF04, LUDI_1, LUDI_2 and LUDI_3) were calculated for the newly designed compounds, which helps to understand the interaction of the ligands with the enzyme. Most of the potential inhibitors of Cathepsin C contain a 1-phenylpropane-2-amine (Gly-Phe) fragment (Scheme 1) in their structure. Studies of the activity of the Gly-Phe-CHN2 inhibitor have shown that this fragment penetrates into the deep pocket formed by the active site and that both ends of the inhibitor interact with the enzyme. To design more effective inhibitors of Cathepsin C, peptides were added at the ends of the 1-phenylpropane-2-amine fragment. One end was kept short, because there is not much space between ligand and receptor when Gly-Phe penetrates into the cavity, and the other end was kept long. Various compounds were designed in this way, and different functional groups were added to them with the active site residues of the enzyme in mind. The designed compounds were docked individually, and their interactions with the active site residues of the enzyme were studied and taken into account when designing further inhibitors. Almost all newly designed compounds contain the 1-phenylpropane-2-amine (Gly-Phe) fragment combined with: non-cyclic derivatives (Scheme 3), benzene (Scheme 4), benzene with modifications (Scheme 5), a saturated 6-member ring (Scheme 6), a saturated 5-member ring (Scheme 7) or an unsaturated 5-member ring (Scheme 8).
When the interactions of all compounds with the enzyme were computed with AutoDock and Discovery Studio Client 3.1, compounds with non-cyclic derivatives showed less interaction with the enzyme than compounds of the other groups. Compounds containing benzene with modifications or an unsaturated 5-member ring showed quite strong interactions with the enzyme and steadier conformations than the other compounds.

Knowledge Based Approach
Knowledge-based design is a well-known method for predicting new small molecules or ligands that can be more effective than already available molecules [43] [44]. It has also been found that structures designed by a knowledge-based approach can have the same biological activities, as they can exert effects similar to those of the available ligands [45].
New molecules designed by the knowledge-based approach differ from known ones only by small structural modifications, but these can change the functionality of the molecule and also affect its molecular properties [46]. Modifications can be made by adding and replacing peripheral functional groups; in this way the knowledge-based method helps to design molecules that are easier to synthesize and more efficient. The nature of the binding pocket of the target has a considerable impact on the design of new small molecules. It is useful to understand the type of interaction, for instance lipophilic or hydrophilic [47].
Several methods have been proposed to measure ligand-binding efficiency [48] [49]; they help to evaluate how well the functional groups or heavy atoms are used in binding. Less complex ligands have been found to bind their targets more efficiently [50] [51]. Another element of ligand quality is the structural complexity of the ligand, especially the features associated with its scaffold. It has been shown that 50% of known drugs share only 32 molecular frameworks [52], implying that an optimal level of skeletal complexity is an intrinsic characteristic of drug-likeness. Recent analysis revealed that a certain level of molecular complexity is necessary to achieve the desired biological activity [53]. In this work, the knowledge-based approach was used for designing and modeling new inhibitors that could be potential candidates for inhibiting the activity of Cathepsin C.
bonded water tetrahedral cluster filling the active site cavity. It is the cluster (the carboxylate group of the carbamylated lysine and the hydroxide molecule) that urea replaces when binding to the active site for the reaction [11]. As a consequence of the above ligations, Ni(1) is pentacoordinated and Ni(2) hexacoordinated, and their coordination geometries are pseudo square pyramidal and pseudo octahedral, respectively. In another respect, urease can severely decrease the efficiency of urea fertilizers by causing the release of large amounts of ammonia, which further induces plant damage through ammonia toxicity and an increase in soil pH [11]. Controlling the rate of enzymatic urea hydrolysis with urease inhibitors is therefore an important goal. Large quantities of urea are produced as a result of biological processes. Each human being produces approximately 10 kg of urea per year. Spontaneous degradation of urea occurs with a half life of approximately 3.6 years [1], but in the presence of urease, the hydrolysis of urea is 104 times faster [7].

RESEARCH METHOD
Two different programs were used to calculate the interaction energies and scoring functions of the newly designed compounds: AutoDock and Discovery Studio Client 3.1. These software packages follow different algorithms for calculating receptor-ligand interactions.

AutoDock package
AutoDock uses a simulated annealing docking algorithm, modeled on a physical process, to calculate the interaction energy between ligand and receptor. In AutoDock the ligand is treated as flexible and the protein as rigid, which means that the ligand interacts with the receptor on its outer surface [54]. The AutoDock docking procedure is divided into three steps associated with three programs: AutoTors, AutoGrid and AutoDock. The simulation begins at high temperature, where nearly every move is accepted and the ligand explores large areas of conformation space. As the temperature is reduced, unfavorable moves are increasingly disallowed. The ligand ultimately finds an optimal position and conformation within the deepest energetic well it has sampled. By performing multiple separate simulations, each starting from a random initial state, consistently favorable binding conformations may be located. Intermolecular interaction energies are calculated at each time step and include dispersion/repulsion, hydrogen bonding and electrostatic terms, replacing the hard-sphere potential used in the original release [55] [56].
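The annealing schedule described above can be illustrated with a minimal sketch. It is not AutoDock's implementation: the one-dimensional "pose", the toy energy function, the geometric cooling factor and all function names are assumptions chosen only to show how a falling temperature makes uphill moves progressively less likely to be accepted.

```python
import math
import random


def anneal(energy, perturb, start, t_start=10.0, t_end=0.01,
           cooling=0.9, steps_per_cycle=100, rng=None):
    """Generic simulated annealing with the Metropolis acceptance rule.

    energy  : function mapping a state to a number (lower is better)
    perturb : function (state, rng) -> randomly modified state
    start   : initial state
    Returns the lowest-energy state visited and its energy.
    """
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    t = t_start
    while t > t_end:
        for _ in range(steps_per_cycle):
            candidate = perturb(current, rng)
            e_candidate = energy(candidate)
            delta = e_candidate - e_current
            # Downhill moves are always accepted; uphill moves are accepted
            # with probability exp(-delta / t), which shrinks as t falls.
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current, e_current = candidate, e_candidate
                if e_current < e_best:
                    best, e_best = current, e_current
        t *= cooling  # geometric cooling schedule (an assumption)
    return best, e_best


def toy_energy(x):
    """Hypothetical one-dimensional energy well with its minimum at x = 2."""
    return (x - 2.0) ** 2


def toy_perturb(x, rng):
    """Hypothetical move: shift the coordinate by a small random amount."""
    return x + rng.uniform(-0.5, 0.5)


if __name__ == "__main__":
    pose, e = anneal(toy_energy, toy_perturb, start=-5.0)
    print(f"best pose {pose:.3f}, energy {e:.4f}")
```

In a docking setting the state would hold the ligand position, orientation and torsion angles, and several such runs from different random starting states would be compared, as the text describes.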

Accelrys Discovery Studio Client 3.1 package
Initially, the structures of the compounds were drawn using Accelrys Discovery Studio Client 3.1, and the CHARMm force field and Momany-Rone partial charges were applied [57] [58]. The same package was also used to optimize the receptor-ligand complex and to calculate the docking score of each compound. Missing bond orders, charges and angles were assigned to each ligand.
LigScore1, LigScore2, PLP1, PLP2, Jain, PMF, PMF04, Ludi_1, Ludi_2 and Ludi_3 are different scoring functions, which were calculated for the newly designed compounds with Discovery Studio Client 3.1. Among these, PMF and PMF04 are knowledge-based functions and all the others are empirical scoring functions [59]. Scoring functions are grouped according to the methodology used to derive them; the three main approaches are generally termed force field based, knowledge based and empirical [59].
1. Force field scoring functions mainly account for the binding energy between ligand and protein by evaluation of non-bonded interactions, electrostatic and van der Waals potentials between protein and ligand atoms [59] [60] [61] [62].
2. Knowledge-based potentials are derived by collecting statistics on interatomic distances from a database of protein-ligand structures in order to determine potentials of mean force based on the probability of observing specific types of protein and ligand atoms at a given separation from each other [59] [63] [64] [65].
3. Empirical scoring functions are a sum of terms describing various aspects of receptor-ligand interaction energies, such as hydrogen bonding, steric interactions, lipophilic interactions, solvation and entropic effects. Empirical scoring functions are therefore related to force field functions [59].
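The idea behind knowledge-based potentials (item 2 above) can be sketched with the generic inverse Boltzmann relation, A(r) = -kT ln(p_observed(r) / p_reference(r)). The sketch below is not the PMF or PMF04 implementation of Discovery Studio; the function name, the binning, the reference-state counts and the value of kT (about 0.592 kcal/mol near 298 K) are assumptions for illustration only.

```python
import math


def pmf_from_counts(observed, reference, kT=0.592):
    """Convert distance histograms into a potential of mean force.

    observed  : counts of one protein-atom/ligand-atom pair type per
                distance bin, collected from known complexes
    reference : counts expected for the same bins in a reference state
    kT        : thermal energy in kcal/mol (about 0.592 near 298 K)

    Returns one value per bin: A(r) = -kT * ln(p_obs / p_ref).
    Bins without data in either histogram are returned as None.
    """
    total_obs = sum(observed)
    total_ref = sum(reference)
    potentials = []
    for n_obs, n_ref in zip(observed, reference):
        if n_obs == 0 or n_ref == 0:
            potentials.append(None)  # the logarithm is undefined here
            continue
        ratio = (n_obs / total_obs) / (n_ref / total_ref)
        potentials.append(-kT * math.log(ratio))
    return potentials


if __name__ == "__main__":
    # Hypothetical counts for three distance bins.
    print(pmf_from_counts([30, 10, 0], [20, 20, 5]))
```

Distances seen more often than in the reference state receive a negative (favorable) value, and distances seen less often receive a positive one.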

RESULTS AND ANALYSIS
The newly designed compounds contain the 1-phenylpropane-2-amine (Gly-Phe) fragment combined with: non-cyclic derivatives, benzene, benzene with modifications, a saturated 6-member ring, a saturated 5-member ring or an unsaturated 5-member ring. The methodologies to be applied to all newly designed compounds were first tested on an already known inhibitor. As Gly-Phe-CHN2 is reported to be one of the potent inhibitors [15] [33], it was selected for testing. The results obtained with both methodologies were compared with the data available for the Gly-Phe-CHN2 inhibitor. In the AutoDock result for Gly-Phe-CHN2, the lowest binding energy is -7.06 kcal/mol and the mean binding energy is -6.47 kcal/mol. (Table 1) A comparison of the interactions of Gly-Phe-CHN2 with the enzyme obtained by AutoDock and by Discovery Studio Client 3.1 shows that in both cases the inhibitor interacts with the Asp 1 residue of the enzyme, which indicates that the two methods support each other. Taken together, the two methods show that the inhibitor interacts with the Asp 1, Asn 380 and Gly 277 residues of the enzyme. The results obtained from the two methods also agree with the already available data [33]. This indicates that the methodologies to be applied to the newly designed compounds are appropriate and give reliable results for receptor-ligand interactions.
AutoDock results are described in terms of the lowest binding energy, the mean binding energy and the total number of conformations in a cluster. All three terms were taken into consideration when evaluating the results. AutoDock automatically ranks clusters by their lowest binding energy, but the values given in the tables for each compound may correspond to the best cluster by size, by mean binding energy or by lowest binding energy. When selecting the lowest energy, differences of fewer than 10 conformations between clusters were neglected. In some cases it was difficult to select a single best cluster, mean binding energy and lowest energy, so two or three different conformations were selected for a single compound.
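The three quantities used for evaluation can be sketched as a small summary routine. It does not parse AutoDock output; the input format (a dictionary mapping a cluster label to the binding energies of its conformations), the function name and the example values are assumptions for illustration.

```python
from statistics import mean


def summarize_clusters(clusters, total_poses):
    """Summarize docking clusters by the three terms used in the text.

    clusters    : dict mapping a cluster label to a list of binding
                  energies (kcal/mol) of the conformations in that cluster
    total_poses : total number of docked conformations generated

    Returns a list of dicts sorted by lowest binding energy, which is
    the ordering the text attributes to AutoDock.
    """
    rows = []
    for label, energies in clusters.items():
        rows.append({
            "cluster": label,
            "lowest": min(energies),
            "mean": mean(energies),
            "size": len(energies),
            "fraction": len(energies) / total_poses,
        })
    rows.sort(key=lambda row: row["lowest"])
    return rows


if __name__ == "__main__":
    # Hypothetical energies, not values from the paper.
    example = {"a": [-7.5, -7.3, -7.1], "b": [-6.0]}
    for row in summarize_clusters(example, total_poses=4):
        print(row)
```

Keeping the cluster size and population fraction next to the energies makes it easier to see when a slightly higher-energy cluster is far more populated, which is the kind of case where the text reports selecting more than one conformation.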
Discovery Studio Client 3.1 was used to calculate the different scoring functions: knowledge-based scoring functions such as PMF and empirical scoring functions such as LigScore, PLP, Jain and Ludi. The results of the scoring functions were analyzed separately: for LigScore, PLP and PMF the lowest values were considered best, whereas for Jain and Ludi the highest values were considered best. The results show that LigScore1 and LigScore2 do not differ much in their values and that the Ludi 1, 2 and 3 scores are very similar to each other.
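One simple way to combine several scoring functions with different preferred directions is a rank sum. The sketch below is an illustration rather than the procedure used in this work: the function name, the data layout and the example values are hypothetical, and the set of functions for which lower values are better is passed in as a parameter, following the directions stated in the text above.

```python
def rank_sum(scores, lower_is_better):
    """Rank compounds by the sum of their per-function ranks.

    scores          : dict {compound: {scoring_function: value}}; every
                      compound must have a value for every function
    lower_is_better : set of scoring-function names for which the lowest
                      value is best; all others are treated as
                      highest-is-best

    Returns (compounds ordered best first, dict of summed ranks).
    """
    compounds = list(scores)
    functions = sorted({name for per in scores.values() for name in per})
    totals = {compound: 0 for compound in compounds}
    for name in functions:
        descending = name not in lower_is_better
        ordered = sorted(compounds, key=lambda c: scores[c][name],
                         reverse=descending)
        for rank, compound in enumerate(ordered, start=1):
            totals[compound] += rank
    return sorted(compounds, key=lambda c: totals[c]), totals


if __name__ == "__main__":
    # Hypothetical compounds and values, not data from the paper.
    demo = {
        "X": {"PLP1": -100.0, "Ludi_3": 600.0},
        "Y": {"PLP1": -50.0, "Ludi_3": 400.0},
    }
    print(rank_sum(demo, lower_is_better={"PLP1"}))
```

Because only ranks are summed, functions reported on very different numerical scales contribute equally to the combined ordering.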
Of all the newly designed compounds, two were considered the most suitable potential inhibitors of Cathepsin C: C-40 and C-48. Their results from the two methodologies are described below. (Table 2) Two hydrogen bonds are formed between receptor and ligand (Figure 3A): the NH group of compound C-40 forms a hydrogen bond of length 1.9 Å with the oxygen of the Asn 380 residue, and an oxygen of the ligand forms a hydrogen bond of length 1.9 Å with the SH group of Cys 234 of the receptor. For this receptor-ligand interaction the lowest binding energy obtained is -7.5 kcal/mol, the mean binding energy of the cluster is -7.31 kcal/mol and the total number of conformations in the cluster is 104, which is 52% of the 200 poses calculated. This indicates that the ligand is quite steady when it binds to the receptor.
In Figure 4A, an oxygen of the long end of compound C-40 forms hydrogen bonds with the NH and SH groups of the Cys 234 residue, with bond lengths of 1.9 Å and 2.2 Å respectively. A fluorine of the short end of the compound forms a hydrogen bond of length 2.0 Å with the NH group of the Gln 228 residue of the heavy chain.
Figure 4B is a 2D diagram of the computed structure of the compound with the enzyme residues, which shows the different types of interactions. All residues are drawn as pink circles, which indicates that they take part in hydrogen-bond, charge or polar interactions between ligand and receptor. Some atoms of the compound are surrounded by a blue halo, which indicates the solvent accessible surface of the atom; the diameter of the circle is proportional to the solvent accessible surface.
Compound C-48 forms hydrogen bonds with the exclusion domain, the heavy chain and the light chain. The hydroxyl group of the short end of the compound forms a hydrogen bond of length 1.7 Å with an oxygen of the Asp 1 residue. The NH group of the long end forms a hydrogen bond of length 2.0 Å with the oxygen of the Asn 380 residue of the light chain, and the NH group at the tail of the long end forms a hydrogen bond of length 1.8 Å with the oxygen of the Gln 228 residue. The lowest binding energy obtained for the compound is -7.37 kcal/mol, and the mean binding energy of 18 conformations is -6.55 kcal/mol.
In Figure 6A, two oxygens of the short end of compound C-48 form hydrogen bonds with the SH group of the Cys 234 residue of the heavy chain, with bond lengths of 1.5 Å and 2.8 Å respectively. An oxygen of the long end of the compound forms a hydrogen bond of length 2.1 Å with the NH group of the Gly 277 residue, and a fluorine of the long end forms a hydrogen bond of length 2.3 Å with the NH group of the Asp 1 residue.
Figure 6B is a 2D diagram of the computed structure of the compound along with residues of the enzyme, and it shows the different types of interactions formed between ligand and receptor. All residues are drawn with a pink circle around them, which indicates that they take part in hydrogen-bond, charge or polar interactions between the ligand and receptor. Some atoms of the compound are surrounded by a blue halo, which indicates the solvent accessible surface of the atom; the diameter of the circle is proportional to the solvent accessible surface.
A Pi interaction is formed between compound C-48 and the Asp 1 residue of the receptor, represented by an orange line. Hydrogen bonds are represented by green and blue dotted lines, which indicate that the ligand binds to the main chain and the side chain of the residues, respectively; arrows are directed towards the electron donor.
BPU is a heteropolymeric molecule (αβγ)3 with exact threefold symmetry and has a flexible subunit composition that depends on the organism. The structure of the active site of urease is highly conserved; it contains two nickel ions and has a comparably small volume. Structurally distinct urease inhibitors have been effectively identified. The most efficient urease inhibitors are diamidophosphate (DAP) (Scheme 2) and its derivatives, which hydrolyze to the active molecule (DAP) in the active site [43].
DAP is a transition state analogue. It loses stability in aqueous environments because its structure contains P-N bonds that are hydrolyzable, especially at low pH [44]. An effort was therefore made to modify the structure of this transition state analogue by using highly stable P-C or C-P-C bonds to improve its activity and stability against Bacillus pasteurii urease.
Using the knowledge-based design approach, 44 different compounds were designed to evaluate their potency against BPU. After the novel compounds had been designed using the knowledge of already synthesized urease inhibitors, they were energy minimized to the closest local minimum using the molecular mechanics CHARMm force field implemented in Discovery Studio. To study the interaction between the ligand and the enzyme active site, all 44 compounds were docked into the enzyme active site and evaluated with 10 different scoring functions (Ligscore 1, Ligscore 2, PLP1, PLP2, Jain, PMF, PMF04, Ludi-1, Ludi-2, Ludi-3) of the Discovery Studio package.
Automated docking was used to locate the appropriate binding orientations and conformations of the different inhibitors in BPU. To perform the task, the genetic algorithm routine implemented in the program AutoDock was employed. Kollman charges, atomic solvation parameters and fragmental volumes were assigned to the protein using the MGL Tools package. The program AutoGrid was used to generate the grid maps. The Lamarckian genetic algorithm was applied for minimization using default parameters. The standard docking protocol was then applied using the AutoDock software package, and binding free energies (G b, kcal/mol) were obtained.
Inhibitor C-  1) (Scheme 3) are top ranked compounds according to the evaluation of 10 scoring functions and the binding free energy from AutoDock, and these 8 compounds also gave the highest number of conformations, which were quite well overlaid. Among the 44 different structures, the inhibitors constructed from phenyl/aromatic/aliphatic rings with fluorine/chlorine substitutions were the top ranked compounds in most of the scoring functions.

CONCLUSION
In conclusion, this work presents an approach to designing new compounds that can be potential inhibitors of DPPI. Several known inhibitors and their activities with the enzyme were considered in designing the new potential inhibitors. Compounds containing non-cyclic derivatives, benzene, benzene with modifications, a saturated 6-member ring, a saturated 5-member ring and an unsaturated 5-member ring were designed.
The binding free energy and binding affinity of all compounds with the enzyme were calculated with AutoDock and Discovery Studio Client 3.1, and their interactions with the receptor were studied. In AutoDock the ligand is treated as flexible and the protein as rigid, which means that the ligand interacts with the receptor on its outer surface, and water molecules actively participate in the receptor-ligand interaction.
In the Discovery Studio Client 3.1 package, the PDB coordinates of the enzyme are kept rigid, whereas the structure of the enzyme fluctuates, and the same holds for the inhibitor. This indicates that the ligand can interact with the receptor at both its inner and outer surfaces. The possible active role of water molecules in the receptor-ligand interaction is neglected. As the active site of DPPI is located on its outer surface, the two methodologies show little difference in the interactions between the enzyme and the newly designed compounds.
The binding energy of all generated conformations of each inhibitor was calculated with AutoDock, and the best conformations, showing the lowest energy in the cluster, were used for ranking the compounds. Of the newly designed compounds, C-40 and C-48 (Scheme 2) can be considered the top ranked compounds, as they show strong interaction with the enzyme. The lowest binding energy, mean binding energy, total conformations in clusters and scores of compounds C-40 and C-48 are much better than those of the standard inhibitor, and their results agree between the two methodologies. Compound C-40 shows a lowest binding energy of -7.53 kcal/mol, 104 conformations in the cluster and a Ludi_3 score of 509; all these values indicate that compound C-40 forms strong interactions with the enzyme in a steady state. Compound C-48 shows a lowest binding energy of -7.37 kcal/mol, a -PLP1 score of -260.25, a -PMF04 score of -124.36 and a Ludi_3 score of 662, which suggests that this compound could be a potential inhibitor of DPPI.
Compounds containing non-cyclic derivatives showed weaker interaction with the enzyme than the other compounds and the standard inhibitor. A Gly-Phe fragment with benzene, benzene with modifications, a saturated 6-member ring, a saturated 5-member ring or an unsaturated 5-member ring at one end and a functional group at the other end can be a potential inhibitor structure for the investigated protease. Further modification of these newly designed compounds could increase their inhibitory activity.

Fig. 1 (A) Crystal structure of Human Dipeptidyl Peptidase I (Cathepsin C): Exclusion domain added to an endopeptidase framework creates the machine for activation of granular serine proteases (1K3B) [9]. (B) Crystal structure of Human Dipeptidyl Peptidase I (Cathepsin C) in complex with inhibitor Gly-Phe-CHN2 (2DJF) [33]. Color scheme: the exclusion domain (A chain) is represented in orange, the heavy B chain in light blue and the light C chain in blue. A chlorine atom is represented in green in the center.

Fig. 2 Active site with hydrogen bonds. The amino acid residues present in the active site of the Cathepsin C enzyme are Asp 1, Gln 228, Cys 234, Gly 277, Asn 380 and His 381; they take part in the catalytic mechanism or in substrate binding [9] [33]. Color scheme: carbon atoms are colored grey, oxygen atoms red, nitrogen atoms blue, sulfur atoms orange and chlorine atoms green; hydrogen bonds are shown as green dashed lines.

Vol. 3, No. 1, April 2014, 03-26 http://www.ijcb.in
proportional to the solvent accessible surface. Compound C-40 forms three hydrogen bonds with the heavy chain of the enzyme, represented by blue dashed lines; arrows are directed towards the electron donor. A Pi interaction is formed between compound C-40 and the Asp 1 residue of the enzyme, represented by an orange line.

Table 1. Discovery Studio Client 3.1 results of scoring functions of Gly-Phe-CHN2