DEN : an R-Bioconductor based package to extract active sub-networks from human interaction map by integrating gene-expression data

Received May 6 th , 2015 Revised May 26 th , 2015 Accepted May 26 th , 2015 Living cells are complex, dynamic, self-regulatory, interactive systems, showing differential states across time and space. Complexity of cellular systems is highlighted with the multi-layered regulatory mechanisms involving the interactions between bio-molecules (such as DNA, RNA, mi-RNA and proteins). These interactions are analyzed in the form of static networks. Likewise, number of experimental techniques like microarray, RNASeq allow quantification of cellular dynamics and aid in discerning differential gene expression across diverse conditions. Computational biology is in need of methods for integration of static networks and gene expression data, since it provides interesting insights into the dynamics of biological systems. DEN is an R/Bioconductor based package designed to assemble different types of human bio-molecular interactions as a complete interactome and contains functions to extract dynamic active networks by integration of gene expression data. Keyword:


INTRODUCTION
The main Biological systems are composed of multiple layers of self-regulatory, dynamic interaction networks.These networks can be decomposed into three major types of interactions; protein-protein interactions (PPI), transcription factor -target gene (TF-TG) interactions (also known as gene regulatory networks) and miRNA-gene (miRNA-mR) interactions [1,2].PPIs account for the signaling and metabolic complexes formed between various proteins.The transcription factors regulate transcription rates of other proteins by binding to promoter sequences alone or in combination with other proteins [3].Third kind of regulatory mechanism functioning in the cell is the interference of mi-RNAs in post-transcriptional gene-expression.miRNAs are known to influence the process of mRNA degradation [4] and are involved in various cellular functions, such as differentiation, proliferation, death and tumourigenesis [5].The two important regulators (TFs and miRNAs) are known to share similar regulatory logics and bring about a cooperative action in gene regulatory network depending on the binding sites present on target gene [6].Functioning of a cell is basically determined by Static binary interactions between bio-molecules have been broadly investigated using basic network analysis frameworks (since biological networks seem to behave in a similar manner to other scale free networks like social networks) [7].Such studies focus on network topology, robustness, module/ motif identification and other graph properties (like shortest paths, hubs, betweenness centrality and so on) [8].Although these examinations have revealed interesting facts, they have a major limitation with respect to living systems.As we know, biological systems are quite dynamic in nature (so much so that they are assumed to be random) and they differ in activity across various tissues/ cell-types and at different time points.These temporo-spatially changing impressions are out of scope for classical network biology.On the other hand high-throughput experimental techniques, including microarrays, RNA-Seq, reveal the system level alterations taking place in the cells at different time points.Analyzing data from these techniques results in identification of genes involved in distinct bio-processes (across diverse cells) [9].List of differentially expressed genes (DEG) derived from these analyses can be further subject to gene annotation, pathway analysis, functional enrichment and so on.
Integration of gene expression data with networks makes identification of functional modules possible [10,11].Several studies have employed such data integration to identify active networks specific to tissues, biological processes [12] and diseases [13,14], particularly cancer [15,16].A range of computational methods have been developed to analyze the transcriptome/ metabolome network dynamics by integrating static interaction measurements with dynamic changes in gene expression or metabolic fluxes [16,17,18].Drug target identification, gene prioritization in disease conditions are few of the prominent applications of these methods.Although these approaches are not intended to identify new interactions, complexes or pathways, they seek changes in network wiring that are active under specific experimental conditions or anomalies.Many R [19] / Bioconductor (BioC) [20] packages have also been developed which aid in differential gene expression [21,22] and sub-network analysis [23,24] independently.The available sub-network extraction packages use either coregulated genes for network construction [25] or depend upon static PPI data [26,27].
Differential active sub-network (DEN) is an R/BioC based package designed to give access to the three major types of interaction networks (PPIs, TF-TG and mi-mR) in the form of static graph objects (representing human interactome) and functions to extract dynamic active networks by integrating either gene expression data or list of active genes.Finally, application of DEN package has been exemplified using a case study.

RESEARCH METHOD
DEN package was implemented in R programming environment and released under Artistic License 2.0.The datasets to create human interaction map consisting of PPI, TF-TG and mi-mR were obtained from respective databases: HPRDrelease 9 [28], HTRIdb [29] and miRTarBaserelease 4.5 [30].Data from these three sources were checked for inconsistencies and non-human/ redundant interaction entries were removed from the dataset to get unique set of interactions.The datasets were then converted to UCINET DL graph file format [31].For basic graph manipulation and analysis R-igraph library [32] was used.The DL graph files for various network types were then imported into R to conceive igraph network objects (discussed in Results and Analysis section) which form the core of DEN.These objects are used for sub-network extraction using the package functions (refer Supplementary file; CoreFunctions).DEN package depends on affy [33] for processing Affymetrix Gene Chip data and limma [21] to find DEGs from given affyBatch object.The package also depends on Bioconductor annotation data hgu133a2.db[34] for probe annotation.The overview of DEN package is as shown in Figure 1.

RESULTS AND ANALYSIS
Integration of data from the three data sources (mentioned in 2) resulted in following igraph-objects; HPPINet, HTFNet and HMiNet, representing PPI, TF-TG interaction and mi-mR networks respectively.HPPINet is an undirected graph containing proteins as nodes and interactions as edges.HTFNet is a directed graph with edges having transcription factor node at one and target gene at other end.HMiNet is again a directed network with edges between miRNA influencing target mRNAs (genes).The mRNA/ protein/ transcription factor node names in all the above networks are official gene symbols.In the networks containing miRNAs, the node names are as per miRTarBase database [30] naming convention; for example database name of miRNA, hsa-miR-20a-5p has been assigned a node name miR-20a-5p in the graph.Since all the interactions under consideration are human, the prefix "hsa" has been removed from all the miRNA names.HNet is a complete human interaction map composed of all the three network objects (HPPINet, HTFNet, HMiNet).The three network objects can be separately loaded for analysis or as a combined network.HNet is an undirected igraph object.

Global topology of human interaction network
Global topology parameters of the assembled networks are as given in Table 1.The Node-Edge chart (Figure 2) gives a comparative account of the nodes (vertices) and edges in the respective networks.Among the three networks under DEN (HMiNet, HTFNet, HPPINet), HTFNet has the highest number of nodes, whereas HPPINet has the highest "edge per node" ratio (E/N).Hence HPPINet is relatively dense network as compared to other two networks.E/N is almost doubling in the HNet interactome, indicating a highly interconnected network.From the average clustering coefficient (except in case of HMiNet; since it is a sparse network) and mean degree it is evident that all the networks contain highly organized node clusters, which is a common observation in most of the biological systems.Among all HTFNet has the highest clustering, indicative of the extremely organized nature when compared to others.These values are also consistent with fact that biological systems are characterized as "small world" networks.respectively.E/N is the ratio of number of edges by number of nodes in a given network; which represents the density of associations in the network.

Case study: Endoderm differentiation from human stem cells
Application of DEN using real expression dataset has been demonstrated in this case study.The active network extracted from the human interactome is further analyzed using igraph library functions.The purpose of the original study by Loh KM et al. [35] is induction of endoderm differentiation from human induced pluripotent stem cells (hiPSC) and understanding the influence of signaling factors in lineage differentiation.In that study, HES3 hiPSC were subject to differentiation protocol to derive definitive endoderm (DE) cells and microarray analyses were conducted of six different populations including hiPSC (at day zero) and DE (at day three).This expression data series; GSE52158 (Affymetrix Human Genome U133 Plus 2.0 Array platform), is downloaded from NCBI-Gene Expression Omnibus [36] and reanalyzed using DEN.After performing quality control procedures (as described in [37]) following data replicates were selected for analysis; GSM1260408, GSM1260409, GSM1260410 (hiPSC) and GSM1260414, GSM1260415, GSM1260416 (DE).
A total 5308 transcripts were found to be significantly up-regulated and 6220 transcripts were downregulated in DE as compared to hiPSC.These 11528 differential transcripts, after annotation were found to represent 7778 unique DEGs.A list of genes expressed in human endoderm tissue (required to maintain the endoderm function) was downloaded from GeneCards database [38].List of genes from GeneCard and up regulated DEGs in DE (output from DEN) were compared and 331 genes were found to be common among both lists.These common genes were used to extract a sub-network using DEN.This network represents an active state of interactions in DE.The derived network consists of 468 nodes (225 miRNA's, 243 genes) and 1210 edges (Figure 3).The network contains genes with associations to miRNAs as neighbors, which results in a list of probable miRNAs involved in development and maintenance of endoderm (Supplementary file; MiRNA_interactions_DE).From the top twenty nodes having highest degree ("hubs", based on degree measure) in the network, following hub genes can be listed (with respective degrees); ACTB (20), HSP90AA1 (21), TUBA1B (21), EEF2 (22), GAPDH (25), RPS2 (25), ACTG1 (27), VEGFA (37), EEF1A1 (40).This study can be further extended to understand the shortest or alternative paths for induction of tissue differentiation, which is out of scope for the current article.

CONCLUSION
The basic aim of this work was to assemble and avail different types of human interaction networks in the form static graphs and to use this static data as a background network for various types of analysis.The current version of DEN package (Version 1.0.available at http://sourceforge.net/projects/desn/) acts a repository of human interactome network objects and provides functions to extract relevant interactions based on the active genes in the cells.The properties of local/ active sub-networks and global networks seem to vary, since functions (thus expression) and interactions between bio-molecules are quite context specific.By providing list of genes/ miRNAs involved in a particular biological process, pathway, disease/ disorder or molecular function to DEN, pertinent sub-networks can be extracted and further analyzed.
Development of DEN is an ongoing process and later versions will contain more biological knowledge incorporated in the form of Gene Ontology (functional assignments to nodes).DEN will support robust network motif identification procedures, based on the ontologies.These features are expected to improve the biological sensitivity of this method in context specific functional module identification.

Figure 1 .
Figure 1.Overview of DEN package

Figure 2 :
Figure 2: Node-Edge plot for basic network data objects in DEN.Number of nodes and edges in the networks are scaled by taking logarithm to the base 10 (for convenience in representation); labeled as log (n) and log (e) respectively.E/N is the ratio of number of edges by number of nodes in a given network; which represents the density of associations in the network.

Figure 3 :
Figure 3: An active network involved in development of endoderm, derived using DEN.Cytoscape version 2.8 [39] is used for visualization.