A HIGH-PERFORMANCE GENOME-WIDE ASSOCIATION STUDY ALGORITHM BASED ON ANALYSIS OF PAIRS OF INDIVIDUALS

Utkin L.V., Utkina I.L. A High-Performance Genome-Wide Association Study Algorithm based on Analysis of Pairs of Individuals.
Abstract. An extremely simple and high-performance genome-wide association study (GWAS) algorithm for estimating the main and epistatic effects of markers or single nucleotide polymorphisms (SNPs) is proposed. The main idea underlying the algorithm is a comparison of the genotypes of pairs of individuals together with a comparison of the corresponding phenotype values. The algorithm uses the intuitive assumption that changes of alleles corresponding to important SNPs in a pair of individuals lead to a large difference between the phenotype values of these individuals. In other words, the algorithm considers pairs of individuals instead of SNPs or pairs of SNPs. The main advantage of the algorithm is that it depends only weakly on the number of SNPs in a genotype matrix. It mainly depends on the number of individuals, which is typically very small in comparison with the number of SNPs. Another important advantage of the algorithm is that it can detect the epistatic effect, viewed as gene-gene interaction, without additional computations. The algorithm can also be used when the phenotype takes only two values (the case-control study). Moreover, it can be simply extended from the analysis of binary genotype matrices to microarray gene expression data analysis. Numerical experiments with real data sets consisting of populations of double haploid lines of barley illustrate that the proposed algorithm outperforms standard GWAS algorithms from the computational point of view, especially in detecting gene-gene interactions. Ways of improving the proposed algorithm are discussed in the paper.


Introduction.
A genome-wide association study (GWAS) aims to discover genetic factors underlying phenotypic traits, i.e., GWAS examines the association between phenotypes and genetic variants or genotypes across the entire genome. It can be regarded as one of the methods for the well-known feature selection problem where the features are the so-called single nucleotide polymorphisms (SNPs). SNPs are typically used as markers of a genomic region and can be defined as a DNA sequence variation where a single nucleotide (A, T, C, G) in the genomic sequence differs among the individuals of a biological species. It should be noted that most SNPs have no effect on the phenotype values or their effect is negligible. However, there are SNPs which might be very important in associations between SNPs and the phenotypes. Therefore, another formulation of the main aim of GWAS is to identify or select the most relevant SNPs which differentiate one group of individuals from another or which contribute to the phenotypic differences among the individuals.
From the machine learning point of view, a GWAS is a supervised classification or regression problem, where each individual can be regarded as an example. Each individual is defined by many SNPs, which can be viewed as features. Therefore, many machine learning methods, including Lasso and ridge regressions, support vector machines, random forests, and neural networks, have been used for GWAS. It should be noted that the GWAS problem can be viewed as a feature selection problem, which is an important part of machine learning approaches. In contrast to many standard statistical approaches underlying GWAS, machine learning models allow us to get a solution by taking into account the information of the whole genotype, and thus implicitly consider all possible correlations. Moreover, several variable importance measures can be derived from the machine learning models [1].
We point out some difficulties of solving the GWAS problem mentioned by many authors. First of all, the number of SNPs $p$ is usually very large. It is typically 10-100 times the number of individuals $n$ in the training sample. This is the so-called $p \gg n$ (or large $p$ small $n$) problem. Second, genetic mechanisms might involve complex interactions among genes and between genes and environmental conditions which are not fully captured by additive models [2,3]. SNPs may interact in their effects on the phenotype, i.e., there is the so-called epistatic effect. Third, many genetic variants are not genotyped, i.e., there are missing data in the genotype information. Fourth, GWAS is applied to find the association between SNPs and different kinds of traits. Korte and Farlow [4] mention in their interesting review of GWAS methods that methods which successfully identify SNPs contributing to a disease (the two-valued or case-control phenotype) may have problems in finding SNPs associated with complex traits (quantitative or continuous phenotype).
A huge number of statistical procedures and methods solving the GWAS problem have been developed over the last decades. Some of the methods can be referred to as filter methods [5], which use statistical properties of SNPs to filter out poorly informative ones. The Fisher criterion, the Pearson $\chi^2$-test and the Cochran-Armitage test are well-known statistical methods for detecting differential SNPs between two samples. These methods can be grouped as the so-called single-locus association tests because the tests are performed separately for each SNP when case-control phenotypes are analyzed. For quantitative phenotypes, a standard tool is the one-way ANOVA [6]. Another group of methods uses various kinds of regression models and can be referred to as embedded methods [7][8][9][10]. One of the pioneering papers devoted to the use of regression models in SNP selection was written by Lander and Botstein [11]. The regression models mainly include the ridge regression and Lasso techniques and their combination called the elastic net [12]. Comprehensive reviews of the methods and algorithms using the regression models and their various modifications for solving the GWAS problems are provided by Wray et al. [13], Hayes [14], Visscher et al. [15], Bühlmann [16].
It has been mentioned that the standard GWAS analyzes each SNP separately in order to identify a set of significant SNPs showing genetic variations associated with the trait. However, an important challenge in the analysis of genome-wide data sets is taking into account the so-called epistatic effect, when different epistatic loci interact in their association with the phenotype. The epistatic effect can be viewed as gene-gene interaction when the action of one locus depends on the genotype of another locus. At the same time, there are different interpretations of the epistatic effect. A fundamental critical review of different definitions and interpretations of epistasis is provided in [17]. From the statistical point of view, the epistatic effect is the statistical deviation from the joint effects of two loci on the phenotype [18]. There is a series of interesting methods which use statistical tests at their first step in order to reduce the set of SNPs. These are FastANOVA [19], FastChi [20], COE [21], TEAM [22]. We can also point out methods which differ from the filter methods, for example, the Bayesian epistasis association mapping method (BEAM) proposed by Zhang and Liu [23], tree-based methods like random forests [24], the multifactor dimensionality reduction [25], modifications of the Lasso techniques [26], and the ant colony optimization [27]. Comparative analyses of methods devoted to the epistatic interaction effect were provided by several authors [28,29]. Analyzing these methods, we have to conclude that most of them have two steps (except for the methods with exhaustive consideration of all SNP pairs): the first step reduces the set of all SNPs to the most important ones, and the second step solves the SNP-SNP interaction problem.
Among the many approaches to GWAS that take the epistatic effect into account, we would like to single out a very interesting and efficient algorithm [30] that is subquadratic in the number of SNPs. The authors of [30] propose an algorithm for efficiently retrieving some predefined number of top scoring pairs among all pairs of SNPs, assuming binary phenotypes and the difference-in-correlation as the association criterion. Some implicit ideas of the algorithm will be used below.
In the present study, we propose a computationally extremely simple GWAS algorithm. It is based on the intuitive assumption that changes of alleles corresponding to important SNPs in a pair of individuals lead to a large difference between the phenotype values of these individuals. The main advantage of the algorithm is that it depends only weakly on the number of SNPs in a genotype matrix. It mainly depends on the number of individuals, which is typically very small in comparison with the number of SNPs. We call the algorithm FAPI-GWAS (Fast Analysis of Pairs of Individuals for GWAS).
A preprint of the paper is given in https://arxiv.org/abs/1708.01746.

The proposed algorithm.
We start with the following general definition of the association mapping problem. Let $X = [X_1, \ldots, X_p]$ be a genotype matrix for $n$ individuals and $p$ SNPs. From a statistical point of view, the genotype matrix can be treated as a predictor matrix and the marker genotypes as qualitative explanatory variables, i.e., $X_j = (x_{1j}, \ldots, x_{nj})^T$ is a predictor representing the $j$-th SNP, $j = 1, \ldots, p$. For bi-allelic SNPs, every $x_{ij}$ is an allele of the $i$-th individual at the $j$-th SNP locus. It can take values from the set $\{0,1\}$, where 0 and 1 stand for major and minor alleles, respectively. A genotype may also be represented with numbers $\{0,1,2\}$ to represent the homozygous major allele $AA = 0$, heterozygous allele $Aa/aA = 1$, and homozygous minor allele $aa = 2$, respectively. A vector of alleles corresponding to the $i$-th individual will be denoted as $x_i = (x_{i1}, \ldots, x_{ip})^T$. The phenotype values of the $n$ individuals form the vector $Y = (y_1, \ldots, y_n)^T$. A goal of GWAS is to find SNPs in $X$ that are highly associated with $Y$; these will be called important or significant SNPs.
In order to explain the introduced notation by means of an example, we provide Figure 1, where the genotype matrix $X$ and the phenotype vector $Y$ are illustrated.
The main idea underlying the FAPI-GWAS is a comparison of the genotypes of pairs of individuals together with a comparison of the corresponding phenotype values. Here we use the following intuitive assumption. If the genotypes of two individuals are close to each other and the corresponding phenotype values of these two individuals are far from each other, then the SNP markers which correspond to the differing elements of the two genotypes might be important or contribute to the phenotype values. Indeed, if two individuals differ by some small number of genotype elements, then it is natural to expect that their phenotypes are similar. However, if the corresponding phenotypes are substantially different, then it is natural to suppose that this small number of distinguishing genotype elements defines this large difference of phenotype values. Of course, the large difference of the phenotype values may be caused by noise or other random factors. Therefore, we cannot draw any conclusions on the basis of only one pair of individuals. This is why the word "might" is used above: the assumption may be wrong due to the random character of the phenotype values. But we can draw a conclusion by analyzing all pairs of individuals or a part of all pairs.
Informally, the FAPI-GWAS can be written as follows. First of all, we find all pairs $(x_i, x_j)$ of vectors of alleles. Then, we select some predefined number of the pairs which have the largest differences of phenotype values and the smallest distances between the vectors of alleles, in accordance with some combined measure jointly characterizing the differences and the distances. The next step is to decide which SNPs contribute to the difference between the vectors of alleles for the best pairs. The use of the predefined number of pairs allows us to smooth possible outliers of the phenotype values due to random factors. The above is illustrated in Figure 2, where three pairs of individuals are analyzed. The first pair does not show a large difference between the phenotype values. It is 5. Therefore, this pair is not interesting for us. The second and the third pairs have the difference 50 between the phenotype values. However, this difference for the third pair is caused by many (5) transitions between genotype values, which are underlined. Therefore, the third pair is also not interesting for us. At the same time, the second pair has only one transition. This implies that the large difference between phenotypes is caused by the 7-th SNP. Hence, we can conclude that this SNP is important.
Step 2. All pairs $(x_i, x_j)$ such that $i < j$ are studied.
Step 3. For every pair $(x_i, x_j)$, the distance $\rho(x_i, x_j)$ between vectors $x_i$ and $x_j$, $i, j = 1, \ldots, n$, $i < j$, is computed. The type of the distance depends on the data. It can be the standard Hamming distance for binary variables $x_{ij}$. The standard Euclidean distance metric can also be used here.
Step 4. For every pair $(i, j)$, the difference $d(y_i, y_j) = y_i - y_j$ is computed. The condition $d(y_i, y_j) \geq 0$ is valid because the phenotypes are sorted in descending order (see Step 1).
Step 5. For every pair $(i, j)$, the ratio $r(i, j) = d(y_i, y_j) / \rho(x_i, x_j)$ is computed. The larger the difference $d$ and the smaller the distance $\rho$ are, the larger the ratio $r$ is. The ratio $r$ is a measure of target pairs.
Step 6. The $N$ largest values of $r(i, j)$ are selected. Denote these values as $r^*(i, j)$ and the set of their indices $(i, j)$ as $J^*$. The value $N$ can be regarded as a tuning parameter. Another way is to compute the value $N$ by constructing a cumulative probability distribution of the random variable $r$ whose sample values are $r(i, j)$. It was observed in many numerical experiments that the values $r(i, j)$ have a unimodal distribution. Moreover, if we assume that the random variables taking the values $d(y_i, y_j)$ and $\rho(x_i, x_j)$ have, for example, normal distributions, then $r$ has one of the so-called ratio distributions, for example, the Cauchy distribution, the t-distribution, the F-distribution. Therefore, we take a predefined value of the $q\%$ quantile of the random variable $r$ and find all values of the ratio such that their empirical distribution function is larger than $q/100$. In this case, we derive some value of $N$ from the above procedure, and $q$ can be viewed as a tuning parameter of the algorithm.
Step 7. For every pair $(i, j)$ from $J^*$, we find a subset of elements of vectors $x_i$ and $x_j$ which differentiate these vectors. In particular, if $x_{ik} \in \{0,1\}$, we use the vector $z_{ij}$ of transitions. The vector $z_{ij}$ has element $-1$ at the $k$-th position if there is the transition from 0 in $x_i$ to 1 in $x_j$ at the $k$-th position, element 1 if there is the transition from 1 in $x_i$ to 0 in $x_j$ at the same position, and element 0 for transitions from 0 to 0 or from 1 to 1 at the same position, i.e., there holds $z_{ij}(k) = x_{ik} - x_{jk}$. Only elements of $z_{ij}$ with values $\pm 1$, i.e., the non-zero elements, are of interest.
Let us illustrate the above algorithm by means of a toy example. Suppose we have $n = 3$ individuals whose genotype matrix for 5 bi-allelic SNPs is represented by symbols 0 and 1, which stand for major and minor alleles, respectively. The sorted phenotype values are 45, 15, 10. The initial data are shown in Table 1. We have three pairs of vectors of alleles; the phenotype differences $d(y_i, y_j)$, the genotype transitions, the corresponding Hamming distances between the vectors of alleles in every pair and the ratios $r(i, j)$ are given in Table 2.
The ratios $r(i, j)$ for the pairs (1,2), (1,3) and (2,3) are 30, 8.75 and 1.667, respectively. Suppose that the threshold $N$ for selecting the largest values of $r(i, j)$ is 2. Table 3 shows the individuals satisfying this condition and the values $z_{ij}(k)$ of transitions taking the values $-1, 0, 1$ (see Step 7). It can be seen from Table 3 that only the third SNP has two non-zero elements $z_{ij}(k)$. This implies that only the third SNP is important. Indeed, it is obvious from Table 3 that the largest difference is observed between the phenotypes of the first and the second individuals. Moreover, only the third SNP separates the first and the second vectors of alleles. Intuitively, we can conclude that this SNP is the reason for the large difference between the phenotypes of the first and the second individuals.
The Hamming distance $\rho(x_i, x_j)$ is taken in Algorithm 1. However, other distance metrics depending on the analyzed dataset can be used. These distance metrics can be regarded as tuning elements of the model.
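The steps of the algorithm can be traced on the toy example with a short Python sketch. Since Table 1 is not reproduced here, the genotype matrix `X` below is a hypothetical one, chosen only so that its Hamming distances (1, 4, 3) and ratios (30, 8.75, 1.667) agree with the values quoted in the text. The function name `fapi_gwas`, the skipping of pairs with identical genotypes, and the final decision rule (an SNP is reported when it has at least `h` non-zero transitions among the selected pairs) are likewise assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two allele vectors differ."""
    return sum(u != v for u, v in zip(a, b))


def fapi_gwas(X, y, n_pairs, h=1):
    """Sketch of Steps 1-7 for binary genotypes.

    X: list of allele vectors (one per individual), y: phenotype values.
    Returns the selected pairs, per-SNP transition counts, and the
    1-based numbers of SNPs whose count is at least h (assumed rule).
    """
    # Step 1: sort individuals in descending order of phenotype.
    order = sorted(range(len(y)), key=lambda i: -y[i])
    X = [X[i] for i in order]
    y = [y[i] for i in order]

    # Steps 2-5: ratio r = d / rho for every pair i < j.
    ratios = []
    for i, j in combinations(range(len(y)), 2):
        rho = hamming(X[i], X[j])
        if rho == 0:
            continue  # identical genotypes: ratio undefined, skipped here
        ratios.append(((y[i] - y[j]) / rho, i, j))

    # Step 6: keep the N pairs with the largest ratios.
    best = sorted(ratios, reverse=True)[:n_pairs]

    # Step 7: count non-zero transitions z_ij(k) = x_ik - x_jk per SNP.
    p = len(X[0])
    counts = [0] * p
    for _, i, j in best:
        for k in range(p):
            if X[i][k] - X[j][k] != 0:
                counts[k] += 1

    important = [k + 1 for k in range(p) if counts[k] >= h]
    return best, counts, important


# Hypothetical genotype matrix consistent with the quoted distances.
X = [[1, 0, 1, 1, 0],
     [1, 0, 0, 1, 0],
     [0, 1, 0, 0, 0]]
y = [45, 15, 10]

best, counts, important = fapi_gwas(X, y, n_pairs=2, h=2)
print(important)
```

With `n_pairs=2` and `h=2`, only the third SNP is reported, in agreement with the conclusion of the toy example. All pairwise distances are computed explicitly, so the running time of this sketch grows linearly with the number of SNPs and quadratically with the number of individuals.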

Properties of the algorithm.
Let us point out some properties and advantages of the FAPI-GWAS.
1) The epistatic effect, which is viewed as gene-gene interaction, does not need to be analyzed separately. It is implicitly included in the proposed algorithm. Indeed, we do not consider single SNPs. For every pair of vectors of alleles, the difference of the vectors is computed for all SNPs simultaneously. So, if there is a combination of alleles which significantly impacts the phenotype, it produces a large difference between the corresponding phenotype values. This is a very important property which allows us to significantly reduce the computational burden needed for the consideration of many SNP pairs.
2) The FAPI-GWAS is very simple. Its computational complexity is linear in the number of SNPs $p$. This is a very important property of the algorithm because the number of SNPs $p$ is typically 10-100 times the number of individuals $n$ in the training sample for many problems. Moreover, the algorithm does not require special procedures like Lasso, etc. For comparison purposes, we refer to the complexity of a very interesting algorithm for 2-locus genome-wide association studies [30], which is subquadratic in the number of SNPs, and to that of the algorithms FastANOVA [19] and TEAM [22].
3) The FAPI-GWAS does not depend on the set of allele values. For example, a few trivial changes are needed to consider the case $x_{ij} \in \{0,1,2\}$.
Moreover, an important feature of the algorithm is that the values $\{0,1,2\}$ or $\{0,1\}$ are viewed as categorical values without an order such as $0 < 1 < 2$. The FAPI-GWAS can be modified for the case $x_{ij} \in \mathbb{R}$, which takes place in microarray gene expression data analysis.
4) Another advantage of the FAPI-GWAS is the handling of missing data in the genotype matrix. We do not need to apply special procedures for preprocessing missing data and their imputation. The missing data just extend the set of values of every $x_{ij}$. We use the conservative strategy. For example, suppose [...] $x$ as a candidate for getting an important SNP. At the same time, when we have a single missing value at the $k$-th position in vectors $x_i$ and $x_j$, then $z_{ij}(k) \neq 0$, in accordance with the strategy that a larger number of important SNPs is preferable because the second selection from a small subset of important SNPs should be carried out by means of well-known standard procedures.
5) The FAPI-GWAS can be used when the phenotype takes only two values (the case-control study). It is obvious in this case that only the set composed of pairs of individuals taken from the case and control groups, respectively, is analyzed. Indeed, $d(y_i, y_j) = 0$ for two individuals from the same group (we assume that the vectors of alleles are sorted in descending order of the corresponding phenotypes).
6) For many available GWAS algorithms using filter methods for the selection of the most important SNPs, like the Fisher exact test, the one-way ANOVA, etc., we have to predefine a limit on the number of important SNPs. The FAPI-GWAS determines this number itself.
7) The FAPI-GWAS can be tuned by means of the parameter $N$ (the number of largest values of the ratio $r$) or the parameter $q$. On the one hand, too small values of the parameter $N$ may lead to a large number of target SNPs. As a result, we have to use some additional procedures for restricting the number of SNPs. On the other hand, large values of $N$ may lead to missing SNPs which actually may be very important. There is a compromise choice of $N$ which can be found by considering all possible values of $N$ in a predefined grid. Another parameter for tuning is the decision threshold $h$.
8) The FAPI-GWAS is flexible. This means that many of its elements can be changed. For example, there are many metrics for computing distances between vectors of alleles, and the choice of an appropriate metric might improve the algorithm.
1) The first dataset consists of 175 DH lines of barley [31,32]. The data are available at Oregon Wolfe Barley Data (OWBD) and GrainGenes Tools (http://wheat.pw.usda.gov/ggpages/maps/OWB/). The lines are analyzed with respect to the heading date trait. The linkage map consists of 1328 SNPs.
2) The second dataset consists of 92 DH lines of barley from the Dicktoo x Morex cross and is described in [33,34,35]. The data are available at http://wheat.pw.usda.gov/ggpages/DxM/. We analyze the lines with respect to two phenotypic traits: heading date with and without vernalization under an 8-h light/16-h dark photoperiod regime. The linkage map consists of 117 SNPs.
3) The third dataset includes 150 DH lines of barley from the Steptoe x Morex cross [36]. The corresponding data are available at http://wheat.pw.usda.gov/ggpages/SxM. The linkage map consists of 223 SNPs. The lines are analyzed with respect to the heading date trait measured in 16 environments and the grain yield trait measured in 6 environments.
The missing data are handled by extending the set of values of every $x_{ij}$, i.e., the set of values $\{0,1\}$ is extended to the set $\{0,1,2\}$.
First, we investigate the DH lines of barley from OWBD. The parameter $q$ is 97%. As a baseline for comparison with the proposed algorithm, we apply the standard tool ANOVA to test the association between a single marker and a continuous outcome. The F-test is used to assess whether the expected values of a quantitative variable within several predefined groups differ from each other. From this, we can retrieve a p-value for the significance of the association between each SNP and the phenotype. Then we correct for multiple testing using the Holm-Bonferroni method. The Manhattan plot generated from the obtained p-values is shown in Figure 3 (the left plot). One can see from Figure 3 that the significant SNPs have numbers close to 139, 725, 1100. SNPs with these numbers have the smallest p-values.
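The multiple-testing step can be illustrated with a minimal Python sketch of the Holm-Bonferroni adjustment. The function name `holm_bonferroni` and the raw p-values are hypothetical and are not taken from the barley experiments; the sketch only shows how adjusted p-values are obtained from raw ones before they are compared with the level 0.05.

```python
def holm_bonferroni(p_values):
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Process raw p-values from the smallest to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by m - rank.
        value = min(1.0, (m - rank) * p_values[i])
        # Adjusted values must be non-decreasing along the ordering.
        running_max = max(running_max, value)
        adjusted[i] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values
adjusted = holm_bonferroni(raw)
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
print(adjusted, significant)
```

An SNP (or SNP pair) is then declared significant when its adjusted p-value is below the chosen level.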
Let us look at Figure 3 (the right plot) now. It shows a similar Manhattan plot, but the significant SNPs are obtained by using the FAPI-GWAS, and the p-values are computed for this set again using the Holm-Bonferroni correction. However, the first step of the FAPI-GWAS provides not only the significant SNPs which coincide with the SNPs derived by the standard tool ANOVA. It also provides SNPs with numbers 1169 and 1302, which do not belong to the set of significant SNPs obtained by means of the ANOVA. It turns out that the p-values of these single SNPs are larger than 0.05, i.e., they cannot be viewed as significant on their own. In contrast to the single-locus approach applied before, we now perform the two-locus ANOVA test in order to identify interacting SNP pairs that have a strong association with the phenotype. It is important to note that the two-locus ANOVA test is performed on a small number of candidate SNP pairs which have been obtained by means of the FAPI-GWAS. It turns out that SNPs with numbers 1169 and 1302 interact with SNPs 729 and 725, respectively, such that the corresponding p-values (0.021 and 0.047) after the Holm-Bonferroni correction are smaller than 0.05. In other words, the FAPI-GWAS allows us to implement efficient epistasis detection.
For the Dicktoo x Morex data set, the corresponding Manhattan plot is shown in Figure 4 (the left plot). Numerical experiments using the FAPI-GWAS provide quite the same results. They are shown in Figure 4 (the right plot). However, the FAPI-GWAS indicates that there is the 49-th SNP (saflp35) which has a large p-value, but its interaction with SNPs 112 and 22 gives the p-values 0.0135 and 0.0144, respectively. All p-values are computed by using the Holm-Bonferroni correction.
We get similar results for the unvernalized treatment (the second phenotypic trait). In addition, we obtain SNPs with numbers 36, 59, 76, which are called saflp219, SOLPRO, HorB, respectively, and which are located on different chromosomes. These SNPs interact with the SNP 22 with the corresponding p-values 0.0034, 0.038, 0.045, respectively.
The third dataset is obtained from the Steptoe x Morex cross. First, we analyze the lines with respect to the heading date trait. According to the standard ANOVA test, the 47-th SNP has the smallest p-value. The standard analysis with respect to the grain yield trait also gives a set of significant SNPs. The corresponding Manhattan plots generated from the p-values for the grain yield trait are shown in Figure 6.

Discussion how to improve the algorithm.
Let us point out shortcomings of the FAPI-GWAS and discuss possible ways to overcome them and to improve the algorithm.
First, numerous experiments with real data illustrate that the FAPI-GWAS selects groups of adjacent strongly correlated SNPs in the same chromosomal region which are not inherited randomly. This effect is similar to the one taking place in the ridge regression algorithm, which tends to select all of the correlated SNPs and to make their importance coefficients equal. In contrast to the ridge regression, the Lasso method tends to select only one SNP from a group of correlated ones. Therefore, the problem of correlated SNPs can be solved by using a two-step procedure. The first step is based on the FAPI-GWAS. The result of this step is a small set of important SNPs. The second step uses the Lasso method or its modification, for example, the adaptive Lasso, in order to remove the correlated SNPs from the available small set. Moreover, we can use a modification of the Lasso which takes into account the epistatic effect because the number of possible pairs of SNPs after the first step is rather small.
Another way to deal with the correlated SNPs is to use the standard tools for testing the association between single SNPs and a continuous phenotype, for example, the one-way ANOVA. In order to identify the two-locus epistatic effect or interacting SNP pairs that have a strong association with the phenotype, an algorithm for the two-locus ANOVA test can be used. There are many approximate methods for reducing the computational burden. They are reviewed in detail in [4], both for a case-control study, when the phenotype can be represented as a binary variable with 0 representing controls and 1 representing cases, and for the quantitative trait locus analysis, when the phenotype is quantitative. Most methods are reduced to two steps. The first step is the reduction of the set of SNPs in order to apply standard statistical procedures to this reduced set. The standard statistical procedures make up the second step. The reduction of the set of correlated SNPs can be successfully implemented by means of the FAPI-GWAS as the first step. As a result, we get a small subset of important SNPs which can be processed by statistical tests, for instance, the ANOVA test, in order to remove the correlated SNPs located on the same chromosome.
We point out another shortcoming which has been observed in numerical experiments. Since the number of SNPs is much larger than the number of individuals, we observe only a very small number of vectors $x_i$ among all possible vectors. This implies that contributions of some important SNPs in a pair of vectors of alleles $(x_i, x_j)$ may be hidden when there are many transitions in this pair, for example, from 0 to 1 and from 1 to 0. In this case, the distance between the vectors is large, and this pair does not get into the set of $N$ best pairs with the largest ratios $r(i, j)$. One of the ways to overcome the difficulty is to combine the bagging method [37] for individuals with the random subspace method [38] for SNPs. The random sampling of individuals in the proposed method allows us to smooth some outliers of the phenotype caused by random factors. By means of the random sampling of SNPs, we try to reduce the effect of SNPs which mask the effect of subsets of important SNPs.
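One possible way to organize such a combination is sketched below in Python. This is only an illustration under stated assumptions: the function `bagged_subspace_votes`, its parameters and the voting scheme are hypothetical, and `select` stands for any base selector (for instance, an implementation of the FAPI-GWAS) that receives the sampled individual indices and SNP indices and returns the SNP indices it considers important.

```python
import random
from collections import Counter


def bagged_subspace_votes(n, p, select, n_rounds=50, snp_fraction=0.5, seed=0):
    """Count how often each SNP is selected over random sub-problems.

    n, p: numbers of individuals and SNPs.
    select(rows, cols): hypothetical callback returning important SNP
    indices (taken from cols) for the sampled individuals rows.
    """
    rng = random.Random(seed)
    votes = Counter()
    m = max(1, int(snp_fraction * p))
    for _ in range(n_rounds):
        # Bagging: sample individuals with replacement.
        rows = [rng.randrange(n) for _ in range(n)]
        # Random subspace: sample a subset of SNPs without replacement.
        cols = sorted(rng.sample(range(p), m))
        for k in select(rows, cols):
            votes[k] += 1
    return votes


# Stand-in selector for demonstration: reports SNP 2 whenever it is sampled.
def demo_select(rows, cols):
    return [k for k in cols if k == 2]


votes = bagged_subspace_votes(n=5, p=4, select=demo_select,
                              n_rounds=10, snp_fraction=1.0)
print(votes.most_common())
```

SNPs that collect many votes across the rounds can then be treated as candidates. Note that sampling individuals with replacement may produce pairs with identical genotypes, so the base selector has to handle zero distances.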

Conclusion.
In this paper, a very fast and simple algorithm for GWAS, including SNP interaction detection, has been presented. In spite of its simplicity, the FAPI-GWAS can be applied to various GWAS problems and cases, from the analysis of binary genotype matrices to microarray gene expression data analysis. Moreover, the algorithm can be simply extended, for example, by means of the bagging method.
At the same time, it is important to note that the algorithm should be used jointly with another algorithm, for example, with the ANOVA tests, to identify the association between a single marker or interacting SNP pairs and a continuous outcome. In that case, the second stage uses the set of significant SNPs which is obtained at the first stage by means of the FAPI-GWAS.

It can be seen from Figure 1 that $n = 12$ individuals with different plant height values $Y = (50, 15, \ldots, 10, 60)^T$ (phenotype values) are defined by $p = 10$ SNPs with alleles $x_{ij}$ taking the values 0 and 1. We have to develop an algorithm which selects the most important SNPs or their combinations from the point of view of their impact on the plant height.

Fig. 1. An example of the genotype matrix $X$ and the phenotype vector $Y$

Fig. 2. An example of three different pairs of individuals

Formally, the proposed algorithm FAPI-GWAS can be represented as follows.
Step 1. All vectors of alleles $x_1, \ldots, x_n$ are sorted in descending order of the corresponding phenotypes, i.e., $y_1 \geq \ldots \geq y_n$. This step simplifies the comparison of phenotypes because the condition $y_i - y_j \geq 0$ holds for all $i < j$.
The values $z_{ij}(k)$ equal to $-1$ or 1 mean that the allele corresponding to the $k$-th SNP and having values 1 or 0, respectively, contributes to a decrease of the phenotype.

Another element which could be changed is the choice of the ratio $r$. The proposed ratio is one of the possible measures for the target pair localization. It is just the simplest way of defining the measure. Perhaps other measures might also improve the algorithm.

Numerical experiments.
Numerical experiments are carried out on three populations of double haploid (DH) lines of barley:

Fig. 3. The Manhattan plot for the OWBD using standard method (left) and the FAPI-GWAS (right)

Let us study the dataset obtained from the Dicktoo x Morex cross. According to Pan et al. [35] (page 905), the top ranked SNPs for heading date with and without vernalization are ABC170-CD064 and Dhn1-BCD265b, which correspond to the SNP numbers 22-24 and 111-113, respectively. The ANOVA is applied here again. We get two SNPs with numbers 22 and 112 having the smallest p-values (the first is $1.32 \cdot 10^{-5}$).

Fig. 4. The Manhattan plots for the Dicktoo x Morex data set using standard method (left) and the FAPI-GWAS (right)

Other significant SNPs have numbers 68, 82, 205. However, they have larger p-values. The Manhattan plot generated from the obtained p-values is shown in Figure 5 (the left plot). By using the FAPI-GWAS, we get quite the same results. The Manhattan plot generated from the p-values obtained by means of the FAPI-GWAS is shown in Figure 5 (the right plot).

Fig. 5. The Manhattan plots for the Steptoe x Morex data set (the heading date trait) using standard method (left) and the FAPI-GWAS (right)

FAPI-GWAS provides the same significant SNPs. Additionally, we get the following interacting SNPs: 82

Fig. 6. The Manhattan plots for the Steptoe x Morex data set (the grain yield trait) using standard method (left) and the FAPI-GWAS (right)

Table 2. The genotype transitions and the values $\rho$ and $r$