Clinical characteristics of the peanut allergic cohort
Twenty-one children with suspected peanut allergy completed randomized, double-blind, placebo-controlled oral food challenges to peanut, performed according to a modified AAAAI/EAACI PRACTALL protocol8, 9. Under close medical supervision with blinding of subjects and staff, subjects ingested incremental amounts of peanut at 20 min intervals until an allergic reaction occurred or until the cumulative dose of 1.044 grams protein was ingested. In a similar fashion and on a different day, the same subjects ingested incremental doses of placebo oat powder. The order of peanut and placebo challenges was randomized, and all subjects completed both peanut and placebo challenges. Peripheral blood samples were drawn before, during, and at the end of each challenge.
Nineteen of the 21 children reacted to peanut, and none of the subjects reported symptoms during the placebo challenge; clinical characteristics are shown in Table 1. The two children who met inclusion criteria and completed the challenges, but had no reaction to peanut, were considered not peanut allergic and were excluded from further study. Among the 19 peanut allergic children, peanut sIgE ranged from 1.21 to >100 kUA/L, and peanut skin prick test wheal ranged from 4.5 to 39.5 mm.
All children reacted within the first 2 h of the peanut challenge. Five children with peanut allergy reacted to the first dose of peanut, with symptoms of an allergic reaction following just 1 mg of peanut protein. The median cumulative dose at first objective symptom among the peanut allergic children was the third dose (10 mg dose for a cumulative intake of 14 mg peanut protein). The most common symptom elicited by peanut was distress, followed by throat tightness, urticaria, angioedema, vomiting, and rhinorrhea. All children received antihistamines and 58% were administered epinephrine.
Identifying genes associated with peanut allergic reactions
The primary aims of this study were to characterize gene expression signatures, functional processes, and causal key drivers of acute peanut allergic reactions. To do this, during each peanut and placebo challenge, we collected peripheral blood from each subject at zero hours (baseline, pre-challenge), 2 h (during challenge) and 4 h (end of challenge) from challenge start, totaling 6 samples per subject for RNA-seq (Fig. 1). This design enabled us to execute analyses such that each subject served as their own control over time and exposure (peanut vs. placebo). Six samples were collected from each of the 19 subjects, except for two participants for whom samples were obtained during peanut challenge only, resulting in a total of 108 samples collected.
Following data processing, normalization, and quality control, one outlier sample was removed (Supplementary Figure 1), resulting in RNA-seq data (n = 17,337 genes) for 107 blood samples (placebo n = 51; peanut n = 56) across the 19 peanut allergic subjects. To identify genes associated with acute peanut allergic reactions, we used lme modeling10, which allowed us to collectively and statistically assess, across all individuals in this peanut allergic cohort, within-individual changes in gene expression across all time points in response to peanut and not placebo (see Methods). The top 30 genes identified by this analysis (Bonferroni-corrected P < 0.01) are shown in Table 2; functional information for each gene is summarized in Supplementary Table 1. Because effect size estimates from lme models can be less intuitive, changes in gene expression between the means at baseline and four-hour time points for the peanut challenge are displayed in Table 2 as a more intuitive measure of effect size. As Bonferroni correction is often considered overly conservative, an extended set of genes significant at P < 0.005 was used for many of the downstream analyses that follow (Supplementary Data 1). For ease, we refer to this extended set of peanut allergen response genes henceforth as “peanut genes.”
Gene expression levels for six selected peanut genes are plotted in Fig. 2 as examples of patterns observed for genes identified by lme modeling; these genes are also the six key drivers identified by causal network analyses conducted below. For context, plots for the top 30 peanut genes are provided in Supplementary Fig. 2. Consistent with the activation of pro-inflammatory processes, the majority of peanut genes were upregulated with peanut exposure, with 65% (1411/2168) exhibiting increased gene expression at four hours after initial exposure to peanut. Across all peanut genes, we observed a greater magnitude of gene expression change between baseline and four hours (mean differences of 0.24 log2-counts per million (c.p.m.)) compared to baseline and two hours (mean difference 0.04 log2-c.p.m.).
Leukocyte compositional changes in peanut allergic reactions
Given that the allergic response involves the activation, differentiation, and recruitment of various immune cell types, we tested for changes in the distribution of leukocyte cell fractions during acute peanut allergic reactions. Employing a leukocyte cell-type deconvolution algorithm11, we inferred the proportions of 19 leukocyte populations from RNA-seq expression profiles of each sample and time point (Fig. 3a). Neutrophils, naive CD4+ T cells, and monocytes were the predominant cell types observed, collectively representing 70% of cells on average across samples. In contrast, the remaining cell types each had individual mean fractions <10%.
Analogous to our gene expression analysis, we used lme models to identify leukocyte cell fractions with significant changes over time in response to peanut but not placebo. We observed significant differences in three specific cell types after correction for multiple testing (Fig. 3b): M0 macrophages (FDR = 0.016), neutrophils (FDR = 0.016), and naive CD4+ T cells (FDR = 0.018). Whereas the fraction of both macrophages and neutrophils continuously increased over time during the peanut challenge, but not placebo, there was a reduction in naive CD4+ T cells at four hours compared to baseline (Fig. 3b). Raw and corrected P-values from this analysis are provided in Supplementary Table 2.
To treat severe reactions, some subjects in our study (n = 11) were given epinephrine during peanut challenge. Because epinephrine can impact cell numbers in peripheral blood12, 13, we stratified our data to assess whether the same trends were observed in subjects who did and did not receive epinephrine (Supplementary Fig. 3). Consistent with the combined results, we observed increased fractions of macrophages and neutrophils, and a decreased fraction of naive CD4+ T cells, after peanut challenge in both the epinephrine treated and untreated strata. However, particularly for neutrophils and CD4+ T cells, the effects were more pronounced in subjects who received epinephrine, which could be attributable to the fact that these individuals had more severe allergic reactions. To directly test whether this difference was due to reaction severity vs. epinephrine treatment, an experimental design barring epinephrine treatment in subjects with severe reactions would be needed; however, this is not ethically possible in human subjects given the risk of anaphylaxis and death with untreated severe allergic reactions.
Replication of gene expression and leukocyte signatures
To assess the robustness of the changes in gene expression and leukocyte fractions observed, we sought to replicate our findings in an independent replication cohort of 21 peanut allergic children, with clinical characteristics similar to those of the discovery cohort (Table 1). Each of the 21 individuals underwent analogous double-blind, placebo-controlled peanut challenges using the same sample collection protocol as the discovery cohort. We used RNA-seq data generated from whole blood collected at baseline, two hours, and four hours of the peanut challenge to test for gene expression changes that occurred in response to peanut. Of the top 30 genes identified in the discovery cohort, 28 exhibited statistically significant (P < 0.05) changes in expression during peanut challenge in the replication cohort (Table 2), representing a significant enrichment (OR = 27.9, Fisher’s exact test P = 9.5 × 10−12) over that expected by chance. Boxplots depicting expression data in this replication cohort for each of the top 30 genes presented in Table 2 are shown in Supplementary Fig. 4, showing concordant directions of gene expression change for all genes during peanut challenge relative to the discovery cohort.
We next used transcriptome profiles from the replication cohort to perform peripheral blood leukocyte deconvolution (Supplementary Fig. 5a) and lme modeling to identify changes in the proportions of cell subsets during peanut challenge. The same three cell types identified in the discovery cohort were again significant in the replication cohort (Supplementary Fig. 5b; M0 macrophages (FDR = 0.0053); neutrophils (FDR = 0.0013); naive CD4+ T cells (FDR = 0.0038)).
Peanut genes enriched in an acute-phase response module
To investigate the biological context and relationships of peanut genes identified by our analyses, we constructed transcriptome-wide gene networks in the discovery cohort using WGCNA14. WGCNA identifies modules of co-expressed genes (i.e., groups of genes with similar expression profiles and interconnectivity across experimental samples), that can serve as broader constructs of gene expression beyond the single gene level15. Using WGCNA, we identified 13 coexpression modules (Supplementary Table 3). To identify modules associated with acute peanut allergic reaction, we tested for enrichment of peanut genes in each module. Only the blue coexpression module was significantly enriched for peanut genes. 1223 (51%) of the 2381 genes in that module were peanut genes, equivalent to a 4.1-fold enrichment over that expected by chance (Fisher’s exact test P = 1.2 × 10−304) (Fig. 4a; Supplementary Table 3). We will henceforth refer to the blue module as the “peanut response module,” given it was the only module enriched for peanut genes.
To gain insight into the collective putative function of genes within the peanut response module, we next performed gene ontology (GO) analysis16. This revealed significant enrichments of the peanut response module for inflammatory processes, including acute-phase response (fold enrichment = 3.5; FDR = 6.5 × 10−3), acute inflammatory response (fold enrichment = 2.8, FDR = 2.9 × 10−3), positive regulation of I-kappa-B kinase/NF-kappa-B signaling (fold enrichment = 1.9; FDR = 1.8 × 10−3), and lymphocyte activation (fold enrichment = 1.7; FDR = 3.0 × 10−3). The GO biological process terms associated with the peanut response module at FDR < 0.01, sorted by fold enrichment, are shown in Fig. 4b. A complete list of these GO terms and associated genes are provided in Supplementary Data 2. Although no other coexpression module identified by WGCNA was enriched for peanut genes after correction, we show the top biological processes associated with these other modules for comparison (Fig. 4a). To further enhance our understanding of the peanut response module, we examined GO biologic process terms for the upregulated and downregulated peanut genes in this module separately (Fig. 4c, d). Whereas the upregulated genes are involved in inflammation, the downregulated genes regulate macromolecule metabolism; P-values and enrichment statistics from this analysis are provided in Supplementary Data 3.
Identifying causal key drivers of the peanut response module
Although integration of our peanut gene analysis with WGCNA provided strong evidence for a link between acute peanut allergic reactions and the peanut response module, this type of analysis is associative and thus cannot on its own reveal causal relationships among genes in the implicated module. Elucidating the connectivity structure within these modules can lead to the identification of key driver genes that are predicted to modulate the regulatory state of the module, and be of high interest to prioritize as causal to acute peanut allergic reactions. We have previously integrated genetic, transcriptome and other high-throughput data into probabilistic causal gene networks as one approach to uncover key driver genes in disease associated networks/modules17, 18. This type of data-driven, directed approach provides systems-wide context for understanding known and novel regulators and processes for a disease. We sought to apply such approaches here to identify key driver genes of the peanut response module.
Given the lack of large-scale gene expression data sets generated from individuals with peanut or other food allergies, which would be needed to construct accurate probabilistic causal gene networks19, we explored the availability of alternative data sets that could be used to model gene networks associated with inflammation (consistent with processes identified by WGCNA analyses above). We constructed a directed probabilistic causal gene network using peripheral blood gene expression data and expression quantitative trait loci (eQTL) from 526 patients with inflammatory bowel disease (IBD)20. Although IBD is distinct from food allergy, both conditions are due to maladaptive immune reactions to antigens presented to the gastrointestinal tract in genetically susceptible individuals, and there is an association between IBD and atopic diseases, including food allergy21, 22. We reasoned that there is some overlap between the two conditions with regard to chronic immune dysregulation and pro-inflammatory processes such that the topology and major edges of the underlying disease network architecture in IBD could inform gene coregulation relevant to the inflammatory response to peanut.
To narrow in on the component of the constructed probabilistic causal gene network most relevant to acute peanut allergic reactions, we projected peanut genes belonging to the WGCNA peanut response module onto this network. The projection consisted of overlapping genes from the intersection of these genes and all genes in the network, identifying all genes in the network that are within a path length of one node in this overlap, and then identifying the largest connected graph (subnetwork) from this set of nodes and all associated edges. The subnetwork identified from this projection was composed of 1029 genes, including 500 genes belonging to both the peanut response module and peanut gene set, representing a 4.7-fold enrichment over that expected by chance (Fisher’s exact test P = 3.81 × 10−272). This indicated that the peanut response module constructed from our study was highly conserved in the probabilistic causal network constructed from the IBD cohort.
To predict genes that modulate the regulatory state of the peanut response module, we employed key driver analysis (KDA), an algorithm that mathematically identifies causal modulators of the regulatory state of functionally relevant gene groups17. Key drivers have been found to be reproducible and successfully validated in different contexts23. KDA identified 26 key drivers (FDR < 0.05) of the peanut response module in the IBD network, and these key drivers were significantly enriched for genes in the identified subnetwork (OR = 82.6, Fisher’s exact test P = 9.65 × 10−20) (Supplementary Table 4). The relationship between key drivers with the peanut genes, and genes belonging to the peanut response module are depicted in Fig. 5. This hierarchical structure of the network, reflects an ordering of the key driver genes in which key drivers at the highest level have the most upstream effects on modulating the state of genes at the lowest level, which specifically comprise peanut genes and genes belonging to the peanut response module. As the distance from the bottom increases, so too does the enrichment at each level for being a key driver. The higher up in the response cascade, the higher the odds ratio (OR 274.1, Fisher’s exact test P = 6.0 × 10−16 for the top level). Resolving the structure of these networks allows for the identification of those driver genes that are causally associated with the expression of peanut genes at the most upstream points in the response cascade.
To ensure that the IBD network was capturing biology relevant to peanut allergy, we constructed a validation causal gene network using the peanut allergic discovery and replication cohorts (Table 1). As context, we used the IBD network as our discovery causal network because it could be built on a much larger data set (n = 526) with genetic prior information (expression quantitative trait loci (eQTL)), thus enabling construction of a robust and informative causal network19, 24, 25. The peanut causal network we built is not as robust as the IBD network given the limited sample size for its construction and lack of eQTLs available as input, and was therefore used to validate findings from the IBD network. Indeed, when we performed the same subnetwork analysis as above but in the peanut network, the overlap between the IBD and peanut subnetworks was 3-fold enriched over that expected by chance (Fisher’s exact test P = 3.3 × 10−224). Further, KDA of the peanut allergy-specific causal network identified key drivers significantly overlapping those from the IBD network (OR 16.8, Fisher’s exact test P = 2.4 × 10−7), lending confidence to our causal network findings.
Six key driver genes (LTB4R, PADI4, IL1R2, PPP1R3D, KLHL2, and ECHDC3) in particular were highlighted by the combined interpretation of the lme models, WGCNA, and KDA, as these were the only genes that met all three criteria of being: (1) peanut genes with a significant change in expression with peanut but not placebo exposure; (2) members of the WGCNA peanut response module; and (3) top level key driver genes in the hierarchical visualization of the network, predicted to causally modulate the state of peanut genes and peanut response module members at the most proximal level (Fig. 5). To illustrate the relationship between these six key drivers of highest interest, Fig. 6 shows their relationship within the probabilistic causal gene network, as well as their cellular context based on prior knowledge. Two of these top layer key drivers, LTB4R and PADI4, were peanut genes by the strictest significance criteria (Bonferroni-corrected P < 0.01) (Table 2; Fig. 2). In addition, LTB4R and PADI4 have previously demonstrated roles in inflammatory and immune-related diseases26, 27, as does IL1R2, another top layer key driver28, 29.