Meta-Analysis of Brain and Central Nervous System Microarray Datasets

Received Jul 31st, 2017; Revised Dec 12th, 2017; Accepted Apr 4th, 2018

Brain and CNS cancers are rare in comparison to other types of cancer, and there are currently no effective therapies for their treatment. In this study, a meta-analysis of brain and CNS cancer microarray datasets was performed to obtain significantly up-regulated genes with increased statistical power and generalizability. A total of 130 significantly up-regulated genes were obtained. Some of the genes found during the analysis have not yet been associated with these cancers. Different biological networks were created and analyzed using the significantly up-regulated genes as input. For each network, the most significant pathways were also identified computationally.


INTRODUCTION
Brain and CNS cancers are classified as heterogeneous tumors on a genetic and biological basis [1]. These cancers account for approximately 3% of cancer cases worldwide. Although they are rare, there are no effective therapies for their treatment because of their relatively inaccessible location. They are more commonly observed in men than in women [2]. The prognosis of brain and CNS cancers differs significantly with age and histological type, and is significantly poorer for elderly patients and for patients with glioblastoma [3]. The increased survival rate in higher-income countries can be directly linked to improvements in medical care and the availability of new therapies [4] [5]. Various studies have reported an increase in the occurrence of brain and CNS cancers in the elderly population of Western countries [6] [7] [8]. The risk of these cancers increases with certain genetic factors and exposure to ionizing radiation, while allergic conditions seem to decrease the risk [9] [10]. European countries have the highest rates of brain and CNS cancer, while Asian countries have the lowest. This difference can be partially linked to differences in the genetic background of the populations of these continents [11]. The 2016 World Health Organization report includes a classification of brain tumors and mentions molecular markers for determining subclasses of gliomas and medulloblastomas. Still, very few markers are sufficiently characterized to affect clinical practice for patients with CNS cancers. In this paper we aim to predict the genes that are significantly up-regulated in CNS cancers and the corresponding pathways that play a significant role in them.

Brain and CNS cancer microarray datasets
In this study, four brain and CNS cancer datasets (Table 1) were selected from the Oncomine database [12]. Each selected dataset contained a differential analysis of tumor and normal samples, had mRNA as the experiment type, and included more than one sample in both the tumor and the normal category. Oncomine currently contains 715 datasets (Oncomine Research Edition) and is one of the most comprehensive cancer-specific databases. The major advantage of using this database is that, prior to inclusion, microarray datasets obtained from public resources such as the Stanford Microarray Database and the NCBI Gene Expression Omnibus, or from literature sources, are reviewed by a panel of experts to ensure that they meet certain quality standards [13].

Initial screening of microarray datasets
Each dataset obtained from the Oncomine database contained more than one type of brain and CNS cancer. Hence, for each dataset, sub-datasets were created based on the types of cancer it contained, the tumor location, or the treatments given.

Creation of sub-datasets
The Bredel Brain 2 dataset consists of specimens belonging to astrocytic, glioblastoma and oligodendroglial types of brain tumor. Hence this dataset was divided on the basis of these tumor types and their corresponding subtypes. Six sub-datasets were created; the list is provided in the Bredel Brain 2 sheet of Supplementary File 1.

The Lee Brain dataset was divided based on the region from which the tumor was extracted. For instance, a separate sub-dataset was created for each tumor cell line. Thirteen sub-datasets were created; the list is provided in the Lee Brain sheet of Supplementary File 1.

The Liang Brain dataset contained data from five different platforms. The normal samples belonged to GEO platform GPL2935, and on comparison we found that platforms GPL182, GPL2778 and GPL2935 contain identical genes, whereas GPL2648 and GPL3010 contain different genes. The samples belonging to the latter two platforms were therefore removed from further analysis. The sub-datasets were created on the basis of the different types of tumor. Detailed information is provided in the Liang Brain sheet of Supplementary File 1.

The Murat Brain dataset was divided into three sub-datasets based on the treatments provided. The sub-datasets are listed in the Murat Brain sheet of Supplementary File 1.
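The platform comparison described for the Liang Brain dataset can be sketched as a set comparison. This is a minimal illustration, not the procedure actually used: the helper name `compatible_platforms` and the toy gene lists are hypothetical, and only the GEO platform identifiers come from the text.

```python
def compatible_platforms(platform_genes, reference):
    """Return the platforms whose gene set equals that of the reference platform."""
    reference_set = frozenset(platform_genes[reference])
    return sorted(
        platform
        for platform, genes in platform_genes.items()
        if frozenset(genes) == reference_set
    )


# Toy gene lists (hypothetical); real platforms carry thousands of genes.
platform_genes = {
    "GPL182": ["GFAP", "ANXA2", "C3"],
    "GPL2778": ["C3", "GFAP", "ANXA2"],
    "GPL2935": ["ANXA2", "C3", "GFAP"],  # platform of the normal samples
    "GPL2648": ["GFAP", "HBB"],
    "GPL3010": ["SRGN", "C3"],
}

kept = compatible_platforms(platform_genes, reference="GPL2935")
dropped = sorted(set(platform_genes) - set(kept))
print("kept:", kept)
print("dropped:", dropped)
```

Comparing against the platform of the normal samples keeps only tumor samples that can be contrasted with those normals gene by gene.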

Identification of up-regulated genes in each sub dataset
For each sub-dataset, Significance Analysis of Microarrays (SAM) [14] was performed using the software MultiExperiment Viewer (MeV) [15]. The details of the output generated for the sub-datasets of Bredel Brain 2, Lee Brain, Liang Brain and Murat Brain are provided in the Bredel Brain 2, Lee Brain, Liang Brain and Murat Brain sheets of Supplementary File 2, respectively.
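For readers unfamiliar with SAM, its core per-gene statistic is a moderated t-like score, the relative difference d = (mean tumor − mean normal) / (s + s0), where s is the pooled standard error and s0 is a small constant that damps genes with very low variance. The sketch below computes only this score. It is not MeV's implementation: SAM chooses s0 from the data and calls significance by permutation, whereas here s0 is simply passed in, and the gene names and expression values are hypothetical.

```python
import math


def sam_d(tumor, normal, s0=0.0):
    """SAM relative difference for one gene (two-class, unpaired).

    s0 is taken as given here; SAM itself estimates it from the data.
    Requires at least two samples per group and s + s0 > 0.
    """
    n1, n2 = len(tumor), len(normal)
    mean1 = sum(tumor) / n1
    mean2 = sum(normal) / n2
    squared_dev = sum((x - mean1) ** 2 for x in tumor) + sum(
        (x - mean2) ** 2 for x in normal
    )
    pooled_se = math.sqrt((1 / n1 + 1 / n2) * squared_dev / (n1 + n2 - 2))
    return (mean1 - mean2) / (pooled_se + s0)


# Hypothetical log-expression values: (tumor samples, normal samples).
expression = {
    "GENE_A": ([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]),
    "GENE_B": ([2.0, 2.5, 1.5], [2.0, 1.5, 2.5]),
}

scores = {gene: sam_d(t, n, s0=0.1) for gene, (t, n) in expression.items()}
for gene, d in scores.items():
    print(gene, round(d, 3))
```

A large positive d indicates higher expression in tumor than in normal samples, which is the direction of interest for up-regulated genes.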

Z score calculation
The Z score is calculated for all possible pairs of sub-datasets from three quantities: R_obs, the number of genes significant in both datasets A and B; n_B, the number of genes in dataset B; and P_A, the probability of a gene being significantly up-regulated in A [16].
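The formula itself is not reproduced in this text. One form consistent with the three quantities defined above is the normal approximation to a binomial count, Z = (R_obs − n_B·P_A) / sqrt(n_B·P_A·(1 − P_A)), in which n_B·P_A is the overlap expected by chance. This is an assumption about the formula in [16], not a confirmed transcription, and the function name below is hypothetical.

```python
import math


def overlap_z_score(r_obs, n_b, p_a):
    """Z score of an observed overlap under a binomial null (assumed form).

    r_obs: genes significant in both A and B
    n_b:   number of genes in dataset B
    p_a:   probability of a gene being significantly up-regulated in A
    """
    if n_b <= 0 or not 0.0 < p_a < 1.0:
        raise ValueError("need n_b > 0 and 0 < p_a < 1")
    expected = n_b * p_a
    std_dev = math.sqrt(n_b * p_a * (1.0 - p_a))
    return (r_obs - expected) / std_dev


# Toy numbers: 30 shared genes where 20 would be expected by chance.
z = overlap_z_score(r_obs=30, n_b=100, p_a=0.2)
print(round(z, 2))
```

Under this reading, a large positive Z means that two sub-datasets share more significant genes than chance alone would explain.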

Obtaining ranked list of up-regulated genes
For the identification of differentially expressed genes across multiple datasets, the rank product method was used. It is a non-parametric method implemented in the RankProd package [17] [18]. RankProd is a biologically intuitive and statistically rigorous algorithm that has been shown to be robust against noise in microarray data [19] [20]. It has also been shown to have higher specificity and sensitivity than other meta-analytic tools for microarrays [17]. A list of up-regulated genes is created based on a conservative estimate of the percentage of false positive predictions (pfp). As recommended, a pfp value of 0.15 was used as the threshold for genes that are significantly up-regulated [16].
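The idea behind the rank product can be sketched in a few lines. This is a simplified illustration, not the RankProd package: it ranks genes by one fold-change value per dataset, multiplies the ranks, and estimates pfp by permuting values within each dataset. The function names, the toy fold changes and the number of permutations are all assumptions made for the example.

```python
import bisect
import math
import random


def ranks_descending(values):
    """Return ranks where 1 marks the largest value (most up-regulated)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_products(datasets):
    """Multiply each gene's ranks across datasets.

    The integer product orders genes exactly as the geometric mean of
    the ranks would, and avoids floating-point ties.
    """
    per_dataset = [ranks_descending(values) for values in datasets]
    return [math.prod(gene_ranks) for gene_ranks in zip(*per_dataset)]


def estimate_pfp(datasets, n_perm=200, seed=0):
    """Estimate the percentage of false positives (pfp) for each gene."""
    rng = random.Random(seed)
    observed = rank_products(datasets)
    n_genes = len(observed)
    null_hits = [0] * n_genes
    for _ in range(n_perm):
        # Break the gene-to-value link independently in every dataset.
        shuffled = [rng.sample(values, len(values)) for values in datasets]
        null = sorted(rank_products(shuffled))
        for gene, product in enumerate(observed):
            null_hits[gene] += bisect.bisect_right(null, product)
    order = sorted(range(n_genes), key=lambda gene: observed[gene])
    pfp = [0.0] * n_genes
    for position, gene in enumerate(order, start=1):
        # Expected null genes at least this extreme, per gene called.
        pfp[gene] = (null_hits[gene] / n_perm) / position
    return observed, pfp


# Toy log fold changes (hypothetical): three datasets, same six genes.
genes = ["GENE_0", "GENE_1", "GENE_2", "GENE_3", "GENE_4", "GENE_5"]
datasets = [
    [3.1, 0.2, -0.5, 1.0, 0.1, -1.2],
    [2.8, -0.3, 0.4, 0.9, -0.1, 0.0],
    [2.5, 0.6, -0.2, 0.3, -0.9, 0.1],
]

observed, pfp = estimate_pfp(datasets)
significant = [gene for gene, value in zip(genes, pfp) if value < 0.15]
print(significant)
```

A gene that ranks near the top in every dataset gets a small rank product, and the permutations estimate how often such a small product would arise by chance.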

Identification of upregulated genes
Combining the results of SAM for all the selected sub-datasets, 3861 significant genes were obtained. To refine the SAM results, the significant genes were further analyzed using RankProd with a pfp threshold of < 0.15. RankProd identified 271 significantly up-regulated genes; after removing duplicates, 130 significant genes remained. A complete ranked list of significantly up-regulated genes is provided in Supplementary File 3. The top 26 significantly up-regulated genes are listed in Table 2.

To strengthen the validity of our findings, we searched for experimental work demonstrating the involvement of these genes in brain and CNS cancer. It has been reported that prognosis in primary glioblastoma multiforme is highly correlated with ALDH1A3 promoter methylation [21] [22]. Studies have shown that ANXA2 is significantly over-expressed in glioma samples as compared to normal brain samples [23]. This gene also regulates angiogenesis and invasion in malignant gliomas [24]. C3 gene expression is enhanced in a time-dependent and dose-dependent manner by interleukin-1 beta (IL-1 beta) and tumor necrosis factor-alpha (TNF-alpha) in the astroglioma cell line D54-MG [25]. The CHI3L2 gene encodes a protein, YKL-40, which serves as a prognosticator for cancer, and this gene is found to be highly up-regulated in glioma [26]. COL1A1 is over-expressed in pilomyxoid astrocytomas as well as pilocytic astrocytomas [27]. COL1A1 and COL6A2 are over-expressed in both primary and metastatic brain tumors [28]. CTGF plays a significant role in gliomas, and studies have suggested that its expression level can have prognostic significance [29]. GBP1 over-expression enhances glioma cell invasion [30]. GFAP is highly expressed in tumor cells and hence can be used in the detection of tumor recurrence [31]. HLA-DPA1 and HLA-DRA are important genes involved in glioblastoma multiforme [32]. IGFBP7 is highly over-expressed in glioblastoma multiforme [33] and pilocytic astrocytomas [34]. IL13RA2 is over-expressed in highly invasive glioblastoma multiforme [35]. SERPINA3 has been found to be involved in brain metastasis [36]. TGFBI is methylated in neuroblastoma [37].

The remaining top significant genes, which have not so far been associated with brain and CNS cancer, are: C1QC, CXCL14, DDX3Y, HBA1 or HBA2, HBB, MAGEA6, MEOX2, MGST1, RBMS1, RGS1, SRGN and TGFBI.

Functional analysis of up-regulated genes
The significant genes obtained through RankProd were given as input to the functional annotation tool DAVID (Database for Annotation, Visualization and Integrated Discovery) [38] [39]. Of the 130 input genes, 123 were matched, and functional annotation for these 123 genes was obtained using the tool. Here we present the results for some of the functional annotation categories: Reactome Pathway, OMIM Diseases, KEGG Pathway and Genetic Association DB Disease Class (Figures 2-5). The complete results of the functional annotation can be found in Supplementary File-DAVID: the Category-based sheet contains the information for the categories that we selected during execution, the Cluster-based sheet contains the information for the annotation clusters generated by DAVID, and the Gene-based sheet contains the functional annotation information for each gene.

Identification of significant pathways
Using the online tool GeneMANIA [40], a network was created with 127 genes as input. Seven types of network were created: Co-expression, Co-localization, Genetic Interaction, Pathway, Physical Interaction, Predicted and Shared Protein Domains. The Predicted network contains only one interaction and hence was removed from further analysis. The network properties of each network were determined using the NetworkAnalyzer tool of Cytoscape [41] and are given in Tables 3-8.

For each network, significant pathways were determined using the following methodology. First, for each gene present in the network, we determined the KEGG pathways associated with that gene. Each pathway obtained is assigned the GeneMANIA [40] score of the corresponding gene. Then, for each pathway present in the network, the combined score and the total count of the pathway in the network are calculated. The enriched pathways are then classified on the basis of their KEGG pathway class. Supplementary File-SIGNIFICANT PATHWAYS contains a graphical representation of the enriched pathways present in each network. The complete list of pathways can be found in Supplementary File-Pathways.

Here we present the most significant pathways relevant to our discussion. In the co-expression network for brain and CNS cancer, the top five significant pathways are: Retinol metabolism; cAMP signaling pathway; Notch signaling pathway; Prion diseases; and Signaling pathways regulating pluripotency of stem cells. In the co-localization network, the top five significant pathways are: Signaling pathways regulating pluripotency of stem cells; Notch signaling pathway; Intestinal immune network for IgA production; RIG-I-like receptor signaling pathway; and Sphingolipid signaling pathway. In the genetic interaction network, the top five significant pathways are: Prion diseases; Retinol metabolism; Notch signaling pathway; cAMP signaling pathway; and Signaling pathways regulating pluripotency of stem cells. In the pathway network, the top five significant pathways are: Regulation of actin cytoskeleton; Gap junction; MAPK signaling pathway; Ras signaling pathway; and Phospholipase D signaling pathway. In the physical interaction network, the top five significant pathways are: Gap junction; Phospholipase D signaling pathway; Glioma; Melanoma; and Choline metabolism in cancer. In the shared protein domains network, the top five significant pathways are: cAMP signaling pathway; Notch signaling pathway; Prion diseases; Regulation of actin cytoskeleton; and Gap junction.
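The pathway-scoring steps described above can be sketched as a simple aggregation. The text does not state how gene scores are combined per pathway, so this sketch assumes the combined score is the sum of the scores of the genes mapped to that pathway. The function name, gene identifiers, scores and gene-to-pathway mapping are hypothetical toy values, not results from the study.

```python
from collections import defaultdict


def score_pathways(gene_scores, gene_pathways):
    """Aggregate per-gene network scores into per-pathway totals.

    Returns (pathway, combined_score, count) tuples, highest score first.
    Combined score = sum of gene scores (an assumption of this sketch).
    """
    combined = defaultdict(float)
    counts = defaultdict(int)
    for gene, score in gene_scores.items():
        for pathway in gene_pathways.get(gene, ()):
            combined[pathway] += score
            counts[pathway] += 1
    ranked = sorted(combined, key=lambda p: (-combined[p], -counts[p], p))
    return [(pathway, combined[pathway], counts[pathway]) for pathway in ranked]


# Hypothetical network scores and gene-to-pathway annotations.
gene_scores = {"GENE_A": 0.5, "GENE_B": 0.25, "GENE_C": 0.125}
gene_pathways = {
    "GENE_A": ["Notch signaling pathway", "Gap junction"],
    "GENE_B": ["Notch signaling pathway"],
    "GENE_C": ["Gap junction"],
}

ranking = score_pathways(gene_scores, gene_pathways)
for pathway, combined_score, count in ranking:
    print(pathway, combined_score, count)
```

Reporting the count alongside the combined score distinguishes a pathway supported by many moderately scored genes from one driven by a single high-scoring gene.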

CONCLUSION
It can be concluded that meta-analysis of microarray datasets yields more comprehensive and reliable results than analysis of a single dataset, because it offers greater generalizability and increased statistical power. By creating different types of networks from the significantly up-regulated genes, we obtained various pathways that are possibly enriched in brain and CNS cancer.

Table 1. Brain and CNS cancer microarray datasets included in the study

Table 2. List of top 26 significantly up-regulated genes

Table 5. Network properties of the Genetic Interaction network

The edge details of the Co-expression, Co-localization, Genetic Interaction, Pathway, Physical Interaction, Predicted and Shared Protein Domains networks are provided in the correspondingly named sheets of Supplementary File 4. The node details of the same networks are provided in the correspondingly named sheets of Supplementary File 5.