Nonparametric Statistical Test Approaches in Genetics Data

Received Nov 5th, 2015; Revised Nov 11th, 2015; Accepted Dec 26th, 2015

The biggest challenge of genetic research lies in the meaningful analysis of the large and complex data sets generated by cutting-edge techniques such as massively parallel DNA sequencing and genome-wide analysis. Statistical analysis is among the most important steps in handling such experimental data. When the data are not normally distributed, or when they are non-numerical (rank or categorical), nonparametric tests are the appropriate way to test the research hypothesis. Order statistics are among the most fundamental tools in nonparametric statistics and inference. A nonparametric test does not depend on the parameters of the population from which the samples are drawn and makes no strict assumption about the distribution of that population. Nonparametric tests are also known as distribution-free tests because their assumptions are fewer and weaker than those of parametric tests; they do not require the data to follow a specified probability distribution. Several nonparametric techniques are used to analyze microarray and genomics data, such as Wilcoxon's signed-rank test (pre-post group), the Mann-Whitney U test (two groups), and the Kruskal-Wallis test (two or more groups). The aim of this paper is to show how nonparametric tests are used in genetic research and to provide an understanding of these tests.


INTRODUCTION
In some situations, the assumption that data are realizations of Gaussian random variables is not suitable. In the nonparametric context, no assumption is made on the distribution of the differential score, and theoretical quantiles and p-values cannot be calculated in closed form.
Nonparametric methods may still require mild assumptions, such as symmetry and continuity of the distribution.
These tests can be applied when the measurements are nominal, ordinal, or continuous scores. NP tests are used for testing hypotheses rather than for estimating population parameters. NP tests are based on order statistics. Order statistics are not independent even if the original variate values are independent. For example, when the quantity of mRNA bound to each site of the array is measured, the values x can be arranged in descending order, and the arranged values are known as order statistics. The observations can also be arranged in ascending order.
Together with rank statistics, order statistics are among the most fundamental tools in nonparametric statistics and inference.
When probability theory is used to analyze order statistics of random samples from a continuous distribution, the cumulative distribution function is used to reduce the analysis to the case of order statistics of the uniform distribution. Important special cases of the order statistics are the minimum and maximum values of a sample, the sample median, and other sample quantiles. Nonparametric tests make no or very minimal assumptions about the probability density from which the data are derived. They are used when the sample size is small, when the data are not normally distributed and cannot be approximated as normal, and when the data are non-numerical (rank or categorical). Nonparametric tests are often a good option for small sample sizes (n < 30).
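The ordering and ranking described above can be illustrated with a minimal Python sketch. The function names and the intensity values are hypothetical and are not taken from the paper; tied values are given the average of their ranks, which is a common convention assumed here.

```python
def order_statistics(sample):
    """Return the sample in ascending order: x_(1) <= x_(2) <= ... <= x_(n)."""
    return sorted(sample)


def midranks(sample):
    """Rank the values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    ranks = [0.0] * len(sample)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and sample[order[j + 1]] == sample[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


# Hypothetical intensity values, for illustration only.
intensities = [2.4, 0.7, 3.1, 0.7, 1.9]
ordered = order_statistics(intensities)
print("ordered sample:", ordered)
print("minimum:", ordered[0], "maximum:", ordered[-1])
print("ranks:", midranks(intensities))
```

The ranks always sum to n(n+1)/2, which is the check used in the worked examples of the rank tests.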

Objective:
The objective of this paper is to apply nonparametric tests to genetics data through worked examples and to give a better understanding of these tests, including their statistical hypotheses. The paper shows how useful nonparametric tests are when the data do not follow a normal distribution, as happens in various kinds of genetic research.

Model Specification of tests:
If the study data do not follow the distributional assumptions of parametric methods, even after transformation, or if the data involve non-interval-scale measurements, then a nonparametric test is the reliable choice. A rule of thumb is to apply a nonparametric test when SD > 1/2 mean. Various one-sample and two-or-more-sample nonparametric tests are used in biological, public health, and genetic research. Only the tests most commonly used in genetic research are explained here.
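The rule of thumb above can be written as a one-line check. This is only a sketch: the function name and the example values are hypothetical, the sample standard deviation is assumed, and the heuristic is a rough screen for positive-valued data rather than a formal test of normality.

```python
from statistics import mean, stdev


def suggests_nonparametric(values):
    """Rule of thumb from the text: prefer a nonparametric test when SD > mean / 2."""
    return stdev(values) > 0.5 * mean(values)


# Hypothetical measurements, for illustration only.
print(suggests_nonparametric([1, 2, 3, 50]))     # strongly skewed sample
print(suggests_nonparametric([10, 11, 12, 13]))  # tightly clustered sample
```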
Mann-Whitney U / Wilcoxon's rank-sum test: When the normality assumption is not satisfied for one or both of the groups, this test is the nonparametric equivalent of, and alternative to, the parametric independent-samples t-test. It is applied to test whether two independent groups have been drawn from the same population.

Assumption:
• Variable of interest is continuous.
• Measurement scale is at least ordinal.
• Both samples should be independent.
Let $x_{(1)} \le x_{(2)} \le \dots \le x_{(n_1)}$ and $y_{(1)} \le y_{(2)} \le \dots \le y_{(n_2)}$ be independent ordered samples of sizes $n_1$ and $n_2$ from the two populations. The null hypothesis is $H_0: f_1(\cdot) = f_2(\cdot)$ and the alternative hypothesis is $H_1: f_1(\cdot) \neq f_2(\cdot)$, where $f_1(\cdot)$ and $f_2(\cdot)$ are the p.d.f.s of the two populations. The test is based on the combined ordered sample of the independent x's and y's. The test statistics, in their standard form, are

$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1,$$

where $n_1$ = number of observations in sample x and $R_1$ = sum of the ranks of the values in sample x, and

$$U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - R_2,$$

where $n_2$ = number of observations in sample y and $R_2$ = sum of the ranks of the values in sample y.
For the test statistic we take U = the smaller of $U_1$ and $U_2$ and, on that basis, decide between the null and the research hypothesis at the 0.05 significance level. Determine a critical value of U such that, if the observed value of U is less than or equal to the critical value, we reject $H_0: f_1(\cdot) = f_2(\cdot)$ in favor of $H_1$; if the observed value of U exceeds the critical value, we do not reject $H_0$.
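The calculation of U can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the common form $U_1 = n_1 n_2 + n_1(n_1+1)/2 - R_1$ (and likewise for $U_2$), gives tied values their average rank, and uses hypothetical data. The critical value still has to be looked up in a Mann-Whitney table for the given $n_1$ and $n_2$.

```python
def mann_whitney_u(x, y):
    """Return (U1, U2, U) where U is the smaller of U1 and U2."""
    combined = sorted(list(x) + list(y))
    # Map each distinct value to its (average) rank in the combined sample.
    rank_of = {}
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1] == combined[i]:
            j += 1
        rank_of[combined[i]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(rank_of[v] for v in x)  # R1: rank sum of sample x
    r2 = sum(rank_of[v] for v in y)  # R2: rank sum of sample y
    u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    return u1, u2, min(u1, u2)


# Hypothetical measurements for two independent groups.
group_x = [1.1, 2.3, 3.8, 4.0]
group_y = [2.9, 5.2, 6.1, 7.4, 8.0]
u1, u2, u = mann_whitney_u(group_x, group_y)
print("U1 =", u1, "U2 =", u2, "U =", u)
```

A useful check is that $U_1 + U_2 = n_1 n_2$ for any data.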

Example:
In a genetic inheritance study, we want to compare group X and group Y with respect to the variable MSCE (mean sister chromatid exchange).
The data are as follows: Group (X): 7. 5
Wilcoxon's signed-rank test: This test is useful for testing the significance of differences in paired observations. It is an alternative to the paired Student's t-test for matched pairs and is used when the population does not follow a normal distribution. In this test, a variable is measured in each subject before and after an intervention.

Assumption:
• Samples must be paired and drawn from the same population.
Kruskal-Wallis test: The Kruskal-Wallis test is a nonparametric method for comparing medians among j comparison groups (j > 2). It is like one-way analysis of variance (ANOVA) with the data replaced by their ranks. Unlike ANOVA, the Kruskal-Wallis test does not assume that the data are normally distributed.

Assumption:
• Samples should be independent.
• Variable of the study is continuous.
• Populations are equal except possibly in the value of the median.
We set the hypotheses
$H_0$: The j population medians are equal.
$H_1$: The j population medians are not all equal.
Let there be j independent samples from j populations with sizes $n_1, n_2, \dots, n_j$. The Kruskal-Wallis test statistic H is then defined as

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{j} \frac{R_i^2}{n_i} - 3(N+1),$$

where N is the total sample size and $R_i$ is the sum of the ranks of the observations in sample i. The statistic H is approximately distributed as $\chi^2$ with (j − 1) degrees of freedom.
Example: Protein secondary structure is evaluated by three different methods: CF AVG, GOR, and PHD. We want to test whether the three methods differ from one another. In this example the total sample size is N = 12, with rank sums R 1 = 10, R 2 = 29, and R 3 = 39. Remember that the sum of the ranks always equals N(N+1)/2, which serves as a check on the assignment of ranks. On the basis of the H statistic we reject H 0, meaning that the three methods are not all equal.
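The H statistic for this example can be reproduced from the rank sums alone. The sketch below assumes the standard Kruskal-Wallis formula $H = \frac{12}{N(N+1)} \sum R_i^2 / n_i - 3(N+1)$ and assumes equal group sizes of 4, which is consistent with N = 12 and three methods; the function name is hypothetical.

```python
def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H computed from per-group rank sums and group sizes."""
    n_total = sum(sizes)
    # Check on the assignment of ranks: the rank sums must total N(N+1)/2.
    if sum(rank_sums) != n_total * (n_total + 1) / 2:
        raise ValueError("rank sums do not total N(N+1)/2")
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


# Rank sums from the example; group sizes of 4 each are an assumption.
h = kruskal_wallis_h([10, 29, 39], [4, 4, 4])
print(round(h, 2))
# 5.99 is the tabulated chi-square value quoted in the text for 2 d.f.
print("reject H0:", h > 5.99)
```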
Fisher's Exact Test: Fisher's exact test is more accurate than the chi-square test when the expected numbers are small. The test calculates the probability of the observed "R×C" contingency table. For the data above, with cell counts a = 2, b = 10, c = 12, d = 3 and N = 27, the probability of the observed table is

$$p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{N!\,a!\,b!\,c!\,d!} = \frac{12!\,15!\,14!\,13!}{27!\,2!\,10!\,12!\,3!} \approx 0.0015,$$

and adding the probabilities of all tables that are no more likely than the observed one gives the two-sided p = 0.0018. Based on the p value we can reject the hypothesis that the factors gene allele and disease are independent and conclude that there is a significant relation between the disease and which allele of the gene a person has.

SUMMARY
Nonparametric methods make fewer assumptions than parametric tests and are therefore useful when those assumptions are not met. NP tests are often a good option for small sample sizes (n < 30). Nonparametric methods comprise a varied collection of tests. Order statistics play a very important role in nonparametric tests. All of the tests described here are very useful in genetic studies and microarray data analysis. This paper has set out the idea behind each nonparametric test and how to calculate it, with examples from genetics data. The overall conclusion is that nonparametric methods play a very important role when genetics data do not follow a normal distribution, and these tests can then give an appropriate test of the hypothesis.

• Measurement scale is at least ordinal.
• Pairs are chosen randomly and independently.
Let $x_i$ and $y_i$ be the pre and post observations of the i-th pair. State the hypotheses $H_0: M_d = M_d^0$ and $H_1: M_d \neq M_d^0$. Calculate each paired difference, $d_i = x_i - y_i$, where $x_i, y_i$ are the pairs of observations. Rank the $|d_i|$ values, ignoring their signs, and then attach to each rank the sign of its difference. Then calculate the test statistic

$$W = \sum_{i=1}^{N} \operatorname{sgn}(d_i)\, R_i,$$

where N is the number of pairs of observations in the sample and $R_i$ is the rank of $|d_i|$. Compute W+, the sum of the ranks of the positive $d_i$, and W−, the sum of the ranks of the negative $d_i$. Then compare the calculated value with the tabulated value of W at the 0.05 level of significance and decide on the hypothesis accordingly. The two-sided test consists in rejecting $H_0$ if |W| exceeds the critical value; when the smaller of W+ and W− is used as the statistic, $H_0$ is rejected if that value is less than or equal to the tabulated value. The total, W+ + W−, equals n(n+1)/2.
Example: Patients with the genetic disorder autism were taken for the study, which measures the behaviors of children affected with autism before and after a 4-week course of meditation.
$H_0$: There is no significant effect of meditation on autism after 4 weeks.
$H_1$: There is a significant effect of meditation on autism after 4 weeks.
The calculation gives: smaller value of (W+ and W−) = 13. The tabulated value is 3 when n is 8 at the 0.05 significance level, which is less than the calculated value of W (13). Because $H_0$ is rejected only when the calculated value is less than or equal to the tabulated value, we do not reject $H_0$; the data do not show a significant effect of meditation on autism after 4 weeks. As a check, n(n+1)/2 = 8(8+1)/2 = 8×9/2 = 36, which is equal to W− + W+ = 13 + 23 = 36.
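The signed-rank calculation can be sketched as follows. This is an illustration under stated assumptions rather than the paper's own procedure: pairs with zero difference are dropped and tied absolute differences share their average rank (both common conventions), and the before/after scores are hypothetical, not the study data.

```python
def wilcoxon_signed_rank(pre, post):
    """Return (W_plus, W_minus): rank sums of the positive and negative differences."""
    # d_i = x_i - y_i; pairs with no change are dropped (assumed convention).
    diffs = [a - b for a, b in zip(pre, post) if a != b]
    ordered = sorted(diffs, key=abs)
    w_plus = 0.0
    w_minus = 0.0
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over the run of differences tied in absolute value.
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        rank = (i + j) / 2 + 1  # average rank for the tied run
        for d in ordered[i:j + 1]:
            if d > 0:
                w_plus += rank
            else:
                w_minus += rank
        i = j + 1
    return w_plus, w_minus


# Hypothetical before/after scores, for illustration only.
before = [12, 15, 9, 20, 14, 18]
after = [10, 16, 9, 15, 11, 12]
w_plus, w_minus = wilcoxon_signed_rank(before, after)
print("W+ =", w_plus, "W- =", w_minus, "statistic =", min(w_plus, w_minus))
```

With n nonzero differences, W+ + W− equals n(n+1)/2, which gives the same check as in the worked example.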
Rank the observations of all groups together from 1 to $N = \sum_{i=1}^{j} n_i$. For the i-th sample of size $n_i$, the expected sum of ranks under $H_0$ is $n_i(N+1)/2$, and $R_i$ is the observed sum of ranks of the observations in sample i.

ISSN: 2278-8115 IJCB Vol. 5, No. 1, August 2016, 77-87 http://www.ijcb.in

N(N+1)/2 = 12(13)/2 = 78, which is equal to 10 + 29 + 39 = 78. The H statistic for this example is computed as follows:

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{j} \frac{R_i^2}{n_i} - 3(N+1) = \frac{1}{13}\,[25 + 210.25 + 380.25] - 39 = \frac{615.5}{13} - 39 = 47.35 - 39 = 8.35.$$

The calculated value, 8.35, is greater than the tabulated value, 5.99, at the 0.05 significance level with 2 d.f. (j − 1 = 3 − 1 = 2).

Microarray experiments can determine how genes are expressed under various situations, in different tissues, and in different organisms. They have become a significant technique because the expression of several thousand genes can be measured at one time in one experiment, which makes the study of genes much easier. According to The International HapMap Consortium (2003), the statistical analysis and modeling of the links between DNA sequence variants and phenotypes will play a pivotal role in the characterization of specific genes for various diseases and, ultimately, the design of personalized medications that are optimal for the individual patient. When analyzing the many thousands of genes on a microarray, we would need to check the normality of every gene in order to ensure that the appropriate statistical test is used. There are many sources of variability in a microarray experiment, and outliers are frequent. The distribution of intensities of many genes may not be normal, in which case nonparametric tests apply.

If the observations are arranged in order of magnitude, the arranged values are known as order statistics. The order statistics are dependent, and the probability function of the order statistics is not the same as that of the original variables. In statistics, the k-th order statistic of a statistical sample is equal to its k-th smallest value. An ordered sample $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$ is desirable for NP tests, and the r-th order statistic is written $x_{(r)}$. The distribution of the area under the density function between any two ordered observations is independent of the form of the density function.

A number of nonparametric tests are in use. One-sample nonparametric tests include the sign test, the Kolmogorov-Smirnov test, and Wilcoxon's signed-rank test. Nonparametric tests for two or more samples include the Wald-Wolfowitz runs test, the Mann-Whitney U test, the Kruskal-Wallis test, Wilcoxon's paired signed-rank test, the sign test for paired samples, Spearman's rank correlation test, and McNemar's test.

For Fisher's exact test on an R×C table, R is the number of rows and C is the number of columns. A 2×2 table is used most often. The hypotheses are
$H_0$: There is no relation and the factors are independent.
$H_1$: There is a relation and the factors are not independent.
The test relates the hypothesis of independence to a hypergeometric distribution of the numbers in the cells of the table.
Assumption:
• The binary data should be independent.
• One or more of the expected numbers are less than 5.
Fisher's exact test calculates the probability of getting any set of cell values a, b, c, d with total N from the hypergeometric distribution formula:

$$p = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{N!\,a!\,b!\,c!\,d!}.$$

Example: In a genetic study, we examine which of two alleles of a gene a person has and the presence of a disease. We perform a genetic test to determine which allele the test subjects have and a disease test to determine whether the person has the disease. The data are entered as a 2×2 contingency table, to which the test is applied. With this contingency table we want to test whether or not the two factors are independent.
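The hypergeometric calculation can be checked with a short Python sketch that uses only the standard library. The cell counts 2, 10, 12, and 3 are those that appear in the worked calculation; the function names are hypothetical, and the two-sided rule used here (summing all tables no more probable than the observed one, with margins held fixed) is one common convention, assumed rather than stated in the paper.

```python
from math import comb


def table_probability(a, b, c, d):
    """Hypergeometric probability of one 2x2 table with fixed margins.

    Equivalent to (a+b)!(c+d)!(a+c)!(b+d)! / (N! a! b! c! d!).
    """
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)


def fisher_exact_two_sided(a, b, c, d):
    """Sum the probabilities of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = table_probability(a, b, c, d)
    total = 0.0
    # k runs over every feasible value of the top-left cell given the margins.
    for k in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(k, row1 - k, col1 - k, row2 - (col1 - k))
        if p <= p_observed * (1 + 1e-9):  # small tolerance for float comparison
            total += p
    return total


print(round(table_probability(2, 10, 12, 3), 4))       # probability of the observed table
print(round(fisher_exact_two_sided(2, 10, 12, 3), 4))  # two-sided p value
```

The first value is the probability of the observed table alone, and the second is the two-sided p value that is compared with the significance level.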