Study design and quality control
To identify genes involved in food allergy, we performed a GWAS in a discovery set of 523 food-allergic cases and 2682 population-based controls from the German Heinz Nixdorff Recall Study (HNR)16. All cases were from the Genetics of Food Allergy Study (GOFA), in which the diagnosis of food allergy was based on OFCs according to current guidelines3. After applying stringent quality control criteria (see “Methods”), the discovery set consisted of 497 cases and 2387 controls with high-quality genotype data. For imputation, we used the Haplotype Reference Consortium data at the University of Michigan as reference17. To ensure high quality of the imputed genotypes, we filtered for a quality score of r2 > 0.5 and excluded low-frequency variants (minor allele frequency (MAF) < 5%), yielding over five million single nucleotide polymorphisms (SNPs) available for analysis. The study design is summarized in Fig. 1. Association with disease was calculated with FastLMM using an additive allele-dosage model. The main analysis was performed on any food allergy (Fig. 2). In addition, we investigated the three most common allergies against specific foods separately, including HE (n = 288), PN (n = 220), and CM (n = 169; Fig. 2). There was no evidence for inflation of the test statistics either in the GWAS on any food allergy (λ = 1.03) or in the allergen-stratified analyses (Supplementary Fig. 1). For replication, we considered all loci with moderate association in the discovery set (P < 1 × 10−3; Supplementary Data 1a–d). If multiple SNPs from the same locus reached the selection threshold, we selected the best SNP. For the remaining SNPs, we calculated linkage disequilibrium (LD) with the lead variant. If SNPs in low LD (r2 < 0.2) were present, again the best SNP was selected for replication.
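The selection of replication candidates within a locus can be read as a greedy procedure. The following is a minimal sketch of one such reading, not the pipeline used in the study: the function name, the input layout, and the toy SNP identifiers and LD values are hypothetical, and taking "best" to mean "smallest P value" is an assumption. Only the two thresholds (P < 1 × 10−3 and r2 < 0.2) come from the text.

```python
from typing import Callable, Dict, List

P_SELECT = 1e-3  # selection threshold in the discovery set (from the text)
R2_LOW = 0.2     # below this r2, a SNP is treated as a separate signal (from the text)


def select_for_replication(
    pvalues: Dict[str, float],
    r2: Callable[[str, str], float],
) -> List[str]:
    """Greedy selection within one locus (hypothetical helper).

    pvalues maps SNP id -> discovery P value; r2(a, b) returns pairwise LD.
    """
    # Keep only SNPs that pass the selection threshold.
    remaining = {snp: p for snp, p in pvalues.items() if p < P_SELECT}
    selected: List[str] = []
    while remaining:
        # Assumption: the "best" SNP is the one with the smallest P value.
        lead = min(remaining, key=lambda snp: remaining[snp])
        selected.append(lead)
        del remaining[lead]
        # SNPs with r2 >= 0.2 to the lead are represented by it; only SNPs
        # in low LD stay as candidates for the next round.
        remaining = {s: p for s, p in remaining.items() if r2(lead, s) < R2_LOW}
    return selected


# Toy data (made up): snpA and snpB are correlated, snpC is independent,
# snpD does not pass the selection threshold.
toy_p = {"snpA": 2e-6, "snpB": 5e-5, "snpC": 4e-4, "snpD": 0.02}
toy_r2 = {frozenset(("snpA", "snpB")): 0.9}


def toy_ld(a: str, b: str) -> float:
    """Look up pairwise r2 in the toy table; unlisted pairs count as unlinked."""
    return toy_r2.get(frozenset((a, b)), 0.0)


chosen = select_for_replication(toy_p, toy_ld)
print(chosen)
```

With the toy input, snpB is dropped because it is in high LD with the lead snpA, while snpC is kept as a second candidate from the same locus.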
To replicate the findings, we investigated 380 additional food allergic cases of the GOFA study and 986 population-based controls of the Study of Health in Pomerania (SHIP)18, of which 379 and 984 samples passed the quality check. All individuals of the discovery and replication set were of European ancestry as confirmed by principal component analysis. A detailed characterization of both data sets is provided in Table 1.
Furthermore, variants that replicated with the same risk allele as in the GOFA discovery set at nominal significance (P < 0.05) but did not reach the Bonferroni-corrected P value in the GOFA replication set were followed up in the Chicago Food Allergy Study (Supplementary Table 1), which comprised 671 food-allergic children, 144 non-allergic, non-sensitized normal controls, and 1382 controls of unknown phenotype (234 children and 1148 parents), all of European ancestry (Fig. 1)14.
Loci associated with food allergy
In the main analysis on any food allergy, two loci already showed association at genome-wide significance (P < 5 × 10−8) in the discovery set (Table 2, Fig. 2). The respective lead SNPs, rs12123821 on chromosome 1q21.3 (odds ratio (OR), 2.55; P = 8.4 × 10−10) and rs11949166 on 5q31.1 (OR, 0.60; P = 1.2 × 10−13), were also significantly associated with food allergy in the GOFA replication set after correction for the number of tests performed (for FA; n = 847, P < 5.9 × 10−5, Bonferroni correction). They replicated with the same risk alleles and with similar effect sizes (rs12123821; OR, 2.86; P = 6.1 × 10−7 and rs11949166; OR, 0.69; P = 3.0 × 10−5). Meta-analysis of the two sets yielded highly significant associations at 1q21.3 and 5q31.1 (rs12123821; OR, 2.65; P = 2.6 × 10−15 and rs11949166; OR, 0.63; P = 4.3 × 10−17).
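The arithmetic behind the replication threshold and the pooled estimates can be illustrated with a short sketch. It makes two assumptions that this section does not state: that the meta-analysis is a fixed-effect, inverse-variance combination, and that standard errors can be recovered from the reported OR and P value as if both came from a Wald test. All function names are hypothetical.

```python
from math import erfc, exp, log, sqrt
from statistics import NormalDist


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def log_or_and_se(odds_ratio: float, p_two_sided: float):
    """Recover log(OR) and its standard error from a reported OR and P value.

    Assumption: the P value behaves like a two-sided Wald test on log(OR).
    """
    beta = log(odds_ratio)
    z = abs(NormalDist().inv_cdf(p_two_sided / 2.0))
    return beta, abs(beta) / z


def fixed_effect_meta(estimates):
    """Inverse-variance pooling of (beta, se) pairs; returns (OR, two-sided P)."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    p = erfc(abs(beta / se) / sqrt(2.0))  # two-sided normal tail probability
    return exp(beta), p


# 847 tests for any food allergy (from the text).
threshold = bonferroni_threshold(0.05, 847)

# rs12123821: discovery and replication estimates as reported in the text.
discovery = log_or_and_se(2.55, 8.4e-10)
replication = log_or_and_se(2.86, 6.1e-7)
pooled_or, pooled_p = fixed_effect_meta([discovery, replication])

print(f"threshold={threshold:.1e} pooled OR={pooled_or:.2f} pooled P={pooled_p:.1e}")
```

Under these assumptions the threshold reproduces the reported 5.9 × 10−5, and the pooled estimate for rs12123821 lands close to the reported meta-analysis values. Small differences are expected because the inputs are rounded and the actual test statistics may not be Wald statistics.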
Variant rs12123821 at 1q21.3 is located within the epidermal differentiation complex (EDC) near the epidermal barrier gene filaggrin (FLG; Supplementary Fig. 2a), which was previously associated with PN allergy10. Since we identified LD between rs12123821 and a LOF mutation in FLG, c.2282del4 (r2 = 0.19, Dʹ = 0.78), we evaluated whether the association signal at 1q21.3 was due to known FLG mutations. We included the two most common FLG LOF mutations in European populations, FLG c.2282del4 (tagged by rs12123821) and p.R501X (rs61816761), as covariates in the analysis, which eliminated the highest association peaks within the EDC (Supplementary Table 2). While our results confirmed the role of FLG null mutations in food allergy, a residual association was still detectable between FLG and the repetin gene (RPTN; Supplementary Fig. 2b, Supplementary Table 2), which could point to additional genetic risk factors in this region.
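A low r2 combined with a high Dʹ, as reported for rs12123821 and c.2282del4, is typical when two variants differ in allele frequency but the rarer allele sits almost entirely on haplotypes carrying the other. The sketch below computes both measures from haplotype frequencies using the standard definitions. The frequencies in the example are invented for illustration and are not taken from the study, and the function name is hypothetical.

```python
def ld_stats(p_a: float, p_b: float, p_ab: float):
    """Return (D, |D'|, r2) for two biallelic loci.

    p_a, p_b: frequencies of allele A at locus 1 and allele B at locus 2
              (both must be strictly between 0 and 1).
    p_ab:     frequency of the haplotype carrying both A and B.
    """
    d = p_ab - p_a * p_b
    # D' scales D by the largest value it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r2 is the squared correlation between the two allele indicators.
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Invented example: a 5% allele and a 2% allele that mostly co-occur.
d, d_prime, r2 = ld_stats(p_a=0.05, p_b=0.02, p_ab=0.016)
print(f"D={d:.4f} D'={d_prime:.2f} r2={r2:.2f}")
```

In this invented example Dʹ is high because most copies of the rarer allele occur together with the other allele, while r2 stays modest because the more common allele also occurs on many haplotypes without the rarer one.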
FLG mutations are known to be strong risk factors for eczema19, which often co-occurs with food allergy4. To rule out that the observed association merely reflected an underlying association with eczema, we performed the association analysis in the subset of children without eczema (n = 152, Table 3). The effect of rs12123821 remained significant (OR 1.77; 95% CI 1.15–2.74; P = 0.0094), demonstrating an eczema-independent effect of FLG null mutations on food allergy. Finally, we investigated the effect of FLG null mutations on allergies to specific foods. While the association of FLG null mutations with PN allergy is well documented10, we show that FLG mutations also confer risk for HE and CM allergy with similar and large effect sizes (Table 2).
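The relationship between a reported odds ratio, its 95% confidence interval, and the P value can be checked with a few lines of arithmetic. This is a sketch that assumes a symmetric Wald interval on the log-odds scale; the study's actual test may differ, and the function name is hypothetical.

```python
from math import erfc, log, sqrt


def p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float,
                 z_crit: float = 1.959964):
    """Approximate SE, z score and two-sided P from an OR and its 95% CI.

    Assumption: the interval is exp(log(OR) +/- z_crit * SE).
    """
    se = (log(ci_high) - log(ci_low)) / (2.0 * z_crit)
    z = log(odds_ratio) / se
    p = erfc(abs(z) / sqrt(2.0))  # two-sided normal tail probability
    return se, z, p


# rs12123821 in children without eczema, values as reported in the text.
se, z, p = p_from_or_ci(1.77, 1.15, 2.74)
print(f"SE={se:.3f} z={z:.2f} P={p:.4f}")
```

The reconstructed P value comes out close to the reported 0.0094, with the small difference attributable to rounding of the OR and interval bounds.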
On chromosome 5q31.1, the strongest association was observed for rs11949166, located between the interleukin 4 gene (IL4) and the kinesin family member 3a gene (KIF3A) within the cytokine gene cluster (Table 2). Variants spanning the whole 0.2 Mb region from IL5 to KIF3A were associated with food allergy at genome-wide significance (P < 5 × 10−8, Supplementary Fig. 3a). We tested whether LD within the cytokine gene cluster accounted for the multitude of associated SNPs or whether several independent signals were present. We identified two groups of SNPs covering IL5/RAD50 and IL4/KIF3A, which were significantly associated with food allergy (Supplementary Figs. 3a and 4a). There was high LD between the SNPs of each group, but low LD between SNPs of different groups. Mutual adjustment for the lead SNP of each group pointed to two independent signals (Supplementary Table 3, Supplementary Figs. 3b and 4b). Since this chromosomal region is a known eczema locus, we again stratified the association analysis by eczema status and confirmed an eczema-independent effect of rs11949166 on food allergy with nearly identical effect sizes in the subgroups with eczema (OR, 1.69; 95% CI, 1.50–1.91) and without eczema (OR, 1.61; 95% CI, 1.27–2.04; Table 3).
Two novel susceptibility loci were identified at genome-wide significance after replication in the Chicago Food Allergy Study (Supplementary Table 1). The lead variant at 11q13.5, rs2212434 (Supplementary Fig. 5), was consistently associated with the same risk allele in all three study populations (Table 4). The same SNP was identified as the best associated variant at this locus in the largest eczema GWAS20 to date. The eczema-stratified analysis revealed a strong and significant effect (OR, 1.40; 95% CI, 1.25–1.58; P = 1.9 × 10−8) in the food allergy plus eczema group and a weaker effect in the same direction (OR, 1.14; 95% CI, 0.90–1.44; P = 0.29) that was not significant in the small set of 152 food-allergic children without eczema (Table 3).
Another new susceptibility locus, which had not yet been linked to any allergic disease, was identified in chromosomal region 18q21.3 (Supplementary Fig. 6a). SNP rs12964116 located in intron 1 of SERPINB7 (serpin peptidase inhibitor, clade B, member 7) was associated with food allergy in the GOFA discovery set (OR, 1.9; P = 5.7 × 10−6, Table 2) and replicated with the same risk allele and a similar effect size in the GOFA replication set (OR, 1.69; P = 9.4 × 10−3). Since rs12964116 did not reach the Bonferroni corrected P value in the replication set (P < 5.9 × 10−5), we investigated the Chicago Food Allergy Study14 in order to confirm this locus. Again, the SNP was significantly associated with food allergy and with PN allergy (Table 4), reaching genome-wide significance for both phenotypes in the meta-analysis including all three studies (P = 1.8 × 10−8 for any food allergy and P = 1.9 × 10−10 for PN allergy). Within the SERPINB gene cluster, a second SNP (rs1243064) in moderate LD with rs12964116 (r2 = 0.06, Dʹ = 0.71) was associated with food allergy (Supplementary Fig. 7a). In order to explore whether the two SNPs represented independent association signals we mutually conditioned on the two lead variants (Supplementary Table 5). In both cases, association of the other variant with food allergy decreased but was still present suggesting more than one risk haplotype at this locus (Supplementary Figs. 6b and 7b, Supplementary Table 5).
Association of rs1243064 was confirmed in the GOFA replication set for HE allergy at nominal significance, reaching genome-wide significance in the meta-analysis of the GOFA discovery and replication sets (P = 4.2 × 10−8, Table 2). In the Chicago Food Allergy Study, the same risk allele was identified (Table 4). However, the association did not reach significance (P = 0.15), which may be due to reduced power in a small sample with a less stringent phenotype definition that was not based on OFCs. Overall, association at the serpin locus was consistent and strong for any food allergy, PN, and HE allergy.
To better understand the potential functional basis of the novel food allergy locus, we used LDLink21 to identify all variants within the SERPINB gene cluster which are in high LD (r2 > 0.8) with the two lead SNPs, rs12964116 and rs1243064. None of the identified candidate SNPs altered any protein sequences as predicted by the ENSEMBL variant effect predictor (Supplementary Table 6)22. We therefore evaluated their association with gene expression in expression databases including the Genotype-Expression database (GTEx, version V6p), and reviewed their functional annotations in the ENCODE Consortium (http://genome.ucsc.edu/ENCODE/)23.
rs12964116 is located in an intron of SERPINB7 in a binding site for several members of the transcription factor activator protein (AP)-1 complex, which is involved in diverse cellular processes including cell growth and differentiation (Supplementary Table 6). In chromatin immunoprecipitation (ChIP)-seq experiments this site has also been shown to bind the transcription factor CCAAT/enhancer-binding protein beta (CEBPB)24 which regulates the expression of genes involved in immune and inflammatory responses, including cytokines interleukin-6, interleukin-4, interleukin-5, and TNF-alpha, as well as signal transducer and activator of transcription 3 (STAT3) which mediates the transcriptional activation in response to multiple cytokines and growth factors. The other lead SNP, rs1243064, is a tissue-specific expression quantitative trait locus (eQTL), with the risk allele rs1243064A being negatively correlated with SERPINB10 expression in whole blood (Supplementary Table 6).
We then used LD score regression analysis to quantify the liability-scale heritability of food allergy that was explained by the lead variants identified in our study. Altogether, the food allergy susceptibility loci identified in this study explained ∼10.2% of the variance in liability (Supplementary Table 7).
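For readers unfamiliar with the liability scale, the sketch below shows the commonly used conversion of an observed-scale variance estimate from a case-control sample to the liability scale, which depends on the population prevalence and the proportion of cases in the sample. This section does not report the prevalence or the exact transformation applied, so the prevalence of 5% used here is purely hypothetical, as is the function name. Only the case and control counts of the discovery set come from the text.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed: float, prevalence: float,
                          case_fraction: float) -> float:
    """Convert observed-scale variance explained to the liability scale.

    prevalence:    assumed population prevalence K of the disease.
    case_fraction: proportion P of cases in the case-control sample.
    """
    nd = NormalDist()
    threshold = nd.inv_cdf(1.0 - prevalence)  # liability threshold for disease
    z = nd.pdf(threshold)                     # normal density at the threshold
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))


# Case fraction of the discovery set after quality control (from the text).
case_fraction = 497 / (497 + 2387)

# Hypothetical prevalence of 5%; passing h2_observed = 1.0 returns the bare
# conversion factor, so no variance estimate is being assumed here.
factor = observed_to_liability(1.0, prevalence=0.05, case_fraction=case_fraction)
print(f"conversion factor = {factor:.2f}")
```

The factor changes noticeably with the assumed prevalence, so any liability-scale figure should be read together with the prevalence used to derive it.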
Loci associated with allergy to specific foods
Association results within the HLA region at 6p21 (Supplementary Fig. 8a) confirmed a previously reported locus for PN allergy14. We found LD (r2 = 0.48, Dʹ = 0.85) between rs9273440, which was strongly associated with PN allergy in the discovery and in the replication set (Table 2), and rs9275596, which was the lead SNP in the previous GWAS14. Although several SNPs reached the selection threshold (P < 1 × 10−3) in the discovery set, conditioning on rs9273440 eliminated all association signals within the region (Supplementary Fig. 8b), pointing to a single signal at this locus. Notably, children with HE or CM allergy did not contribute to the association with rs9273440 (Table 2), indicating a PN-specific locus.
In the analysis of HE allergy, 896 candidate variants were identified in the discovery set (Supplementary Data 1b). Apart from the susceptibility loci for any food allergy at 1q21.3 and 5q31.1 (Table 2), two additional SNPs were significantly associated in the GOFA replication set and selected for replication in the Chicago Food Allergy Study, in which neither SNP reached significance (Supplementary Table 1). In the analysis of CM allergy, 845 SNPs were selected for replication (Supplementary Data 1d). One candidate SNP specific for CM allergy (rs73908987) replicated in the GOFA replication set (Supplementary Table 1), reaching P = 6.0 × 10−7 in the meta-analysis of the two GOFA sets. Unfortunately, there were no data or proxy SNPs (r2 > 0.8) available for this variant in the Chicago Food Allergy Study.