In this study, we have introduced the Modular Single-set Enrichment Test (MSET), a newly developed tool designed for assessing enrichment of disease/disorder-associated gene sets within microarray results. In addition to demonstrating the capabilities and limitations of this novel software, we used it to discover a strong link between the maternal brain and autism, as well as several other mental health disorders. We subsequently identified a network of candidate genes that may influence sociability in mothers and revealed the functional character of this network to be primarily related to developmental and neuronal signaling processes.
Enrichment in postpartum LS for autism and other mental health disorders
The compelling enrichment of autism-associated genes found in expression changes of the postpartum LS (Figure 1) is, to our knowledge, the first demonstration of a genetic link between the maternal brain and pathways involved in autism. The discovery of enrichment in postpartum LS for several mood/social disorders (Figure 2) suggests that the phenotypic consequence of LS gene changes in the transition to motherhood possesses a significant behavioral and emotional component. Because the mother-infant relationship is the first and foremost social bond formed in mammals, it has been suggested that the genetic and neural networks underlying sociability in this ancestral event might serve as an evolutionary template from which sociability in other contexts is derived. Our data indirectly support this concept on a large-scale genetic level. While autism rates are higher in males, it could be the conserved use of the same core genes for sociability that provides the connection between autism and the maternal brain. Furthermore, the severity of autism symptoms is often described as spectral, rather than binary. It is therefore plausible that subtle dysregulation of genes which are naturally modulated in the control of sociability, such as in the transition to motherhood, would more likely contribute to this observed phenomenon than would rarer gain-of-function or loss-of-function mutations.
Table 1 presents 36 of the 160 autism-associated genes that MSET identified in the postpartum LS, as well as the number of autism databases in which they are featured. This number is presumably a reflection of the strength of their association with autism based on past studies, with consensus genes having the most widely recognized evidence. However, it is not a perfect indicator because it only counts positive association discoveries, and does not consider the existence of any potential contradictory evidence or disagreement. For example, Foxp2, a forkhead/winged helix (FOX) transcription factor, is found in seven of the nine autism databases used (Table 1). It is located in a region of chromosome 7q that has been linked to autism in the past, and mutations in Foxp2 cause speech and language acquisition pathologies in humans. However, more recent evidence suggests that the language deficits are more directly related to a developmental impairment of motor brain regions, rather than to social behavior, and several recent reports conclude that Foxp2 does not contribute to autism susceptibility [39–41]. Even if Foxp2 were omitted from significant postpartum LS expression results, the observed enrichment would remain highly significant. This illustrates the important point that, although assessing the degree of enrichment using MSET is robust and largely resistant to single gene false positives in upstream databases, caution must be exercised when interpreting the biological importance of individual genes identified by MSET in the testing procedure. Another advantage of MSET is that the user can manually annotate any file, remove genes that are considered to be inappropriate, or even create novel gene lists for testing.
Table 1 includes several autism-linked genes that were identified in our original microarray analysis as particularly interesting based on their biological function and relevance to emotional state and behavior. These include the GABAA receptor subunits α4 and δ, four potassium channel subunits (Kcnd2, Kcnd3, Kcnh7, and Kcnj4), dopamine receptors Drd1a and Drd2, the kappa opioid receptor Oprk1, fatty acid binding protein 7 (Fabp7), and suppressor of cytokine signaling 2 (Socs2). The biology of these genes is discussed in greater detail in our original report.
NIH DAVID’s functional annotation clustering tool was used to generate a functional profile of the 160 autism-associated genes found to be differentially expressed in the postpartum LS (Table 2). The most highly enriched pathways were primarily developmental, involving processes such as synaptic plasticity, neuronal morphogenesis/differentiation, and cell motility. Several clusters related to synaptic transmission also showed high levels of enrichment. Because these biological processes have now been implicated in both autism and the maternal LS, it is likely that aspects of sociability modulated in both phenomena are influenced by structural changes in the brain, including axonal/dendritic growth, and even neurogenesis. This possibility is supported by a body of literature which has revealed that diverse regions of the adult brain contain multipotent stem cells capable of generating new neurons [42–49], and it has been shown that maternal behavior is associated with the stimulation of neurogenesis in the subventricular zone.
In addition to autism, MSET analysis revealed that significant postpartum LS expression results exhibit enrichment for bipolar disorder (BPD), schizophrenia, ADHD, and depression-associated genes (Figure 2). These gene lists were extracted from the four general disease association databases that were also used in the autism enrichment analysis (Additional file 1: Table S1). Links were particularly strong for both BPD and schizophrenia. BPD and depression links are of interest because rates of depression increase in the postpartum state, with postpartum depression affecting 1-10% of mothers. Thus, some of the normal changes that occur in the maternal brain likely lead to a vulnerability of key depression-type pathways. Positive associations have been consistently found for an elevated risk of BPD in women after childbirth, which is considered to be part of a suite of diagnosable “postpartum psychoses”. Also among this class of diseases is schizophrenia, which, in addition to its well-known cognitive dysfunction, is also characterized by emotional deficits. Recent studies highlight that a subset of genes contribute to multiple mental health disorders [54–57], so it is not completely surprising that a behavioral transformation as fundamental as the transition to motherhood might have links to multiple disorders. To ensure that this multitude of positive enrichment results was not due to an artifact in MSET analysis, we tested the postpartum LS expression results for enrichment of arthritis-associated genes (Figure 4), which proved to be absent. The MSET tool has been used successfully in our laboratory to detect enrichment of mental health-related gene sets in other areas within the maternal brain, such as the medial preoptic area (unpublished observations). While there were similarities in enrichment across regions, there were also differences in enrichment patterns and in the individual genes which accounted for enrichment.
This indicates that there may be common, global expression changes in the maternal brain, but also that each region has its own genetic “signature”. Future work will characterize the genetic profile of the maternal brain more comprehensively.
Enrichment analysis in expression data from a murine model of induced arthritis
To validate and demonstrate the applicability of MSET, we performed a series of analyses on expression data taken from several independently conducted microarray experiments. These data are publicly available through NCBI’s Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) or through institutional hosting.
To test for expected specificity of MSET analysis, we assessed enrichment for the full range of disease-associated gene sets within microarray expression data from a murine arthritis model study in which an arthritic state was induced via the transfer of serum from a knockout mouse into a wild-type animal. Complementary findings to the postpartum LS results were observed, in which the arthritis model data showed enrichment specifically for arthritis-associated gene sets, but not for autism (Figure 4) or any other mental health disorders (Figure 2). Collectively, these results demonstrate that the enrichment analysis performed in the present study is reliable and specific. Specificity may not be expected in every application, as different models and experimental treatments used in microarray studies can affect broad or unanticipated gene pathways.
Enrichment of autism-associated genes in Tbr1 null transgenic mice
To showcase the broader applicability of MSET, we performed enrichment analysis for the full range of disease-associated gene lists in a set of expression data collected from murine T-box brain gene 1 (Tbr1) null developing neocortex. The Tbr1 null animal was chosen because Tbr1 is a developmentally related transcription factor that binds, among other targets, the promoter of a gene called autism susceptibility candidate 2 (Auts2), named for its implication in autism susceptibility in the frontal cortex [59, 60]. The Tbr1 null neocortex was observed to be enriched specifically for autism-associated gene sets (Figure 3), and not for any other mental health disorders included in our analysis (Figure 2). These findings suggest that, although inviable shortly after birth, the Tbr1 knockout animal may provide a valuable model for the study of autism-related biology. The Tbr1 null expression data also showed enrichment in two out of the four arthritis-associated gene sets. While this is not particularly strong enrichment, the observed variability could be due to broader physiological changes across numerous systems (possibly including the immune response) that must undoubtedly be affected by the fatal null mutation.
MSET enrichment analysis in expression data from methylphenidate-treated mice
In addition to using MSET to analyze enrichment in expression results from animals that have undergone a natural change (mothers) and transgenic animals (Tbr1 null), we also tested its capabilities in a set of expression data from mice that were subjected to a pharmacological treatment. In the study, mice received chronic (90 days) exposure to methylphenidate, a drug commonly used to treat ADHD, and microarray analysis was performed on microdissected substantia nigra pars compacta (SNpc). In our enrichment analysis of these data, we observed a subtle degree of enrichment for autism-related gene sets (in three out of nine lists), but found that a consensus of enrichment was only detected for ADHD-related gene lists, and not for any other mental health disorder or arthritis (Figure 2). This shows that MSET can be effectively utilized with sensitivity in microarray data collected from a variety of different experimental protocols and treatments, providing a promising new strategy for exploring the genetics underlying mental health disorders from numerous, complementary angles.
Considerations and limitations of MSET analysis
MSET allows for powerful research possibilities, but there are numerous considerations that must be made regarding its appropriate application and the input parameters used. MSET utilizes a fairly simple gene randomization testing procedure to determine if members of a disease-associated gene set are overrepresented within significant microarray results compared to what would be expected by chance. This is in contrast to programs like GSEA, in which the coincident distribution of gene set members is characterized within a ranked list of microarray results using a running-sum statistic and correlated to phenotype with individual sample expression values. Accordingly, MSET calls for only one simple input file of summarized microarray gene results (in addition to disease-associated gene sets of interest), and does not require expression values, chip annotations, or phenotype/trait files. Some web applications exist for performing overrepresentation analysis (such as GOHyperGAll in Bioconductor, InnateDB, and GenMAPP-CS in the GO-Elite program), but they include problematic gene ID conversions, species limitations, a strict dependence on GO terms and existing ontologies, and inflexibility in generating custom gene sets. MSET represents an advancement in versatility and ease of use over the existing landscape of tools for testing enrichment of independently curated disease-associated gene sets.
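The randomization procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MSET's actual source code: the function name, the restriction of the gene set to genes present in the array background, the one-sided p-value (the fraction of simulated draws with at least as many matches as observed), and the default number of simulations are all choices made for this example.

```python
import random


def randomization_enrichment(background, significant, gene_set,
                             n_sim=10000, seed=0):
    """Return (observed_matches, p_value) for one gene set.

    background  : all genes measured on the array
    significant : genes passing the significance threshold
    gene_set    : disease-associated genes from one database
    """
    rng = random.Random(seed)
    # Assumption: duplicates are collapsed and only genes that were
    # actually measured on the array can count as matches.
    background = list(dict.fromkeys(background))
    measured = set(background)
    gene_set = set(gene_set) & measured
    significant = set(significant) & measured

    observed = len(significant & gene_set)
    k = len(significant)

    # Draw k genes at random from the background, treating genes as
    # independent, and count how often the simulated number of matches
    # is at least as large as the observed number.
    at_least_as_extreme = 0
    for _ in range(n_sim):
        sample = rng.sample(background, k)
        matches = sum(1 for gene in sample if gene in gene_set)
        if matches >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_sim
```

Because the only inputs are three lists of gene identifiers, the same function can be called once per database to build up the kind of multi-list comparison used in this study.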
While MSET is theoretically capable of testing for enrichment of genes linked to functional pathways, using other, more full-featured programs for this purpose is recommended. It has been proposed that gene independence is a safe assumption for enrichment analysis. Others have countered that gene-gene interaction can inflate statistical significance and generate false positives in functional enrichment. Due to the relatively simple nature of the randomization algorithm, independence is assumed in MSET analysis for disease enrichment. This is a safe assumption because disease gene sets are heterogeneous groups curated by phenotypic associations, rather than functional relatedness. However, it would be more conservative to use other programs that account for potential gene-gene interaction in functional enrichment analyses. GSEA, for instance, preserves gene-gene interaction by permuting labels of whole samples, rather than at the individual gene level.
Because enrichment analysis is highly sensitive to the input gene lists used, this will be a focus for much of the discussion regarding the performance of MSET. It can be seen in Figure 1 that the nine autism databases used in our analysis vary in both their identity and size. They also differ in the methods used to produce candidate autism gene lists; therefore, one should be aware that some databases may be more robustly assembled than others, and confidence in MSET results relies critically on confidence in upstream database quality. MSET’s reliability is bolstered by its capacity to test enrichment for multiple gene lists associated with the same disease; this feature minimizes the effects of weak associations on enrichment testing significance.
There is generally a balance between specificity of enrichment and the accuracy of its detection, which is related to gene list length. This is particularly relevant when extracting gene lists from general disease association databases (such as the DISEASES database, GAD, HuGE Phenopedia, and Malacards), which compile positive associations broadly across many diseases. Smaller gene lists may be more specific to their associated disease, but MSET suffers from a decrease in the accuracy of hypothesis testing as the average number of matches found in simulated results becomes small. This can be seen in the probability density curves for database matches generated with the AutismKB and Malacards autism gene lists in Figure 1. Their “spiky” appearance reflects the highly discrete nature of distributions with a very small range. Consequently, chance variation in the number of matches in the microarray results being analyzed, even by a single gene, represents a disproportionately large jump in p-value from peak to peak. The smoother distributions generated from larger databases provide a much greater resolution for hypothesis testing; however, larger gene lists may be less biologically specific to the associated disease, and extremely large gene lists can result in false positive enrichment results. While there is assuredly some “true” degree of genetic overlap underlying various diseases, there is probably an additional level of similarity across seemingly unrelated conditions introduced artificially through the methodology of association studies and their aggregation. For example, one might expect genes featured in centrally important signaling pathways to show positive associations with many diseases and experimental conditions in microarray studies, leading to false positive results for enrichment of extremely large gene lists. Specificity can be further complicated by the detailed nature of disease association labels in comprehensive databases.
For instance, the DISEASES database has separate gene lists for arthritis, psoriatic arthritis, osteoarthritis, rheumatoid arthritis, septic arthritis, and more. The primary capacity of MSET to overcome these factors is rooted in its ability to be repeated modularly to generate a meta-analysis. This allows for isolated enrichment findings to be interpreted within the context of larger patterns. Also, because a deeper and more refined body of resources exists for autism genetics than for the other disorders featured in this study, we have relatively greater confidence in the downstream autism enrichment results. As ongoing research adds to our genetic understanding of various diseases, the MSET tool is in an ideal position to allow researchers to swiftly adapt and make use of updated knowledge bases in the future.
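The loss of p-value resolution for short gene lists, discussed above, can be illustrated with the hypergeometric distribution, which is the exact null distribution of the match count when a fixed number of genes is drawn without replacement from the background. MSET itself estimates this distribution by simulation; the closed form is used here only for convenience, and the background size and list lengths are hypothetical values chosen for the example.

```python
from math import comb


def upper_tail(n_background, n_in_set, n_significant, matches):
    """P(number of matches >= matches) under random gene selection."""
    total = comb(n_background, n_significant)
    top = min(n_in_set, n_significant)
    favourable = sum(
        comb(n_in_set, i) * comb(n_background - n_in_set, n_significant - i)
        for i in range(matches, top + 1)
    )
    return favourable / total


# Hypothetical sizes: 20,000 background genes, 800 significant genes.
N, n = 20000, 800

# Short list (20 genes): about one match is expected by chance.
small_jump = upper_tail(N, 20, n, 2) / upper_tail(N, 20, n, 3)

# Long list (1,000 genes): about 40 matches are expected by chance.
large_jump = upper_tail(N, 1000, n, 40) / upper_tail(N, 1000, n, 41)
```

Here `small_jump` and `large_jump` are the factors by which the p-value changes when the observed match count grows by a single gene. The factor is several times larger for the short list, which is the "peak to peak" jump described for the AutismKB and Malacards curves.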
The tradeoff between specificity and accuracy also applies to the significant microarray results in which enrichment testing is performed. For postpartum LS expression data, we used an FDR-adjusted p-value of 0.25 as a significance threshold, which produced 809 genes from the microarray background. Researchers using other model organisms or biological systems may want to use different criteria for statistical significance. Other microarray studies may not yield a high number of significant gene changes by FDR-adjusted p-values. In these situations, a less stringent significance threshold may be applied to make use of a larger number of results, but the greater inclusivity and incidence of false discovery may render them somewhat less biologically meaningful. The subsequent enrichment analysis must therefore be taken with an accordingly critical interpretation.
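The thresholding step that defines the "significant" input list can be sketched as follows. The text does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, as are the function names, the inclusive comparison (`<=`), and the gene labels in the usage example.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values are monotone in the rank of the raw p-value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_significant(gene_pvalues, threshold=0.25):
    """Return genes whose adjusted p-value is at or below the threshold."""
    genes = list(gene_pvalues)
    adjusted = bh_adjust([gene_pvalues[g] for g in genes])
    return [g for g, q in zip(genes, adjusted) if q <= threshold]


# Hypothetical raw p-values for four genes.
example = {"geneA": 0.01, "geneB": 0.04, "geneC": 0.03, "geneD": 0.5}
significant = select_significant(example, threshold=0.25)
```

Raising `threshold` enlarges the list passed on to enrichment testing, at the cost of admitting more false discoveries, which is the tradeoff described in the paragraph above.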
MSET is designed to allow the user to conduct the most appropriate examination possible for enrichment of one or more disorders in a particular set of expression data. Due to the necessarily customizable nature of the input parameters that make for a quality assessment in one set of microarray results, it is difficult to objectively compare enrichment across numerous expression results. In the current study, we have done so by standardizing both the number of significant expression results selected from the background and the databases used. It cannot be assumed that the 809 most significant genes from one study are as meaningful or specific as those from another study, but the comprehensive and identically repeated analysis performed here is a valuable preliminary comparison. Collectively, the analyses undertaken in the current study provide a promising indication that the MSET method can be a valuable and informative approach to large scale genetic questions.