Genetic diseases often manifest in specific tissues even though the genetic risk variants are present in all cells. The most commonly assumed mechanism is selective expression of the causal gene in the pathogenic tissues, but other mechanisms are less explored. Using CRISPR screens from 789 cell lines and 27 lineages, we identified 1274 lineage-specific essential genes (LSEGs). We show that only a minority of LSEGs are explained by preferential expression (n = 115), and a large proportion of them (n = 509) are explained by lineage-specific gene amplification. Three other mechanisms were identified by genome-wide expression analysis. First, lineage-specific expression of paralogs leads to reduced functional redundancy and can account for 153 LSEGs. Second, for 45 LSEGs, the paralog expression increases vulnerability, implying functional codependency. Third, we suggest that the transfer of small molecules to mutant cells could explain blood-specific essentiality. Overall, LSEGs were more likely to be associated with human diseases than common essential genes, were highly intolerant to mutations, and functioned in developmental pathways. Analysis of diverse human cell types found that the expression specificity of LSEGs and their paralogs is consistent with preferential expression and functional redundancy being a general phenomenon. Our findings offer important insights into genetic mechanisms for tissue specificity of human diseases.
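As an illustration of what calling a gene "lineage-specific essential" might involve, the following minimal sketch flags a gene when its mean CRISPR gene-effect score is strongly negative in one lineage but not in the remaining cell lines. The function name, the input layout, and both thresholds (`essential_cutoff`, `min_gap`) are assumptions made for this example; the abstract does not state the criteria actually used.

```python
from statistics import mean


def lineage_specific_genes(scores, lineage_of, essential_cutoff=-0.5, min_gap=0.3):
    """Flag (gene, lineage) pairs that look essential in one lineage only.

    scores:      {gene: {cell_line: gene_effect}}, more negative = more essential
    lineage_of:  {cell_line: lineage}
    Both thresholds are illustrative assumptions, not values from the study.
    """
    hits = []
    for gene, per_line in scores.items():
        # Group this gene's scores by the lineage of each cell line.
        by_lineage = {}
        for cell_line, effect in per_line.items():
            by_lineage.setdefault(lineage_of[cell_line], []).append(effect)

        for lineage, values in by_lineage.items():
            rest = [v for other, vals in by_lineage.items()
                    if other != lineage for v in vals]
            if not rest:
                continue  # nothing to compare against
            inside, outside = mean(values), mean(rest)
            # Essential inside the lineage, not essential outside it,
            # and separated by a clear margin.
            if (inside <= essential_cutoff
                    and outside > essential_cutoff
                    and outside - inside >= min_gap):
                hits.append((gene, lineage))
    return hits


# Toy data with hypothetical gene and cell-line names.
toy_scores = {
    "GENE_A": {"c1": -1.0, "c2": -0.9, "c3": -0.1, "c4": 0.0},   # blood-specific
    "GENE_B": {"c1": -1.0, "c2": -1.0, "c3": -1.0, "c4": -1.0},  # commonly essential
}
toy_lineages = {"c1": "blood", "c2": "blood", "c3": "lung", "c4": "lung"}
print(lineage_specific_genes(toy_scores, toy_lineages))
```

A gene that is essential everywhere (here `GENE_B`) is deliberately excluded, which mirrors the distinction the abstract draws between LSEGs and common essential genes.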
Insulator proteins located at the boundaries of topologically associating domains (TADs) are involved in regulating chromatin loops. Yet, how chromatin loops contribute to transcription regulation is still not clear. Here we show that Relative-of-WOC (ROW) is essential for the long-range transcription regulation mediated by the Boundary Element-Associated Factor of 32kD (BEAF-32). We found that ROW physically interacts with heterochromatin proteins (HP1b and HP1c) and the insulator protein BEAF-32. The co-localization happens at TAD boundaries where ROW, through its AT-hook motifs, binds AT-rich sequences flanked by BEAF-32 binding sites and motifs. Knockdown of row resulted in downregulation of genes that are long-range targets of BEAF-32 and bound indirectly by ROW (without a binding motif). Analysis of high-throughput chromosome conformation capture (Hi-C) data revealed long-range interactions between promoters of housekeeping genes bound directly by ROW and promoters of developmental genes bound indirectly by ROW. Thus, our results show cooperation between BEAF-32 and the ROW complex, which includes HP1 proteins, to regulate the transcription of developmental and inducible genes by chromatin loops.
Human sex differences are thought to arise from gonadal hormones and genes on the sex chromosomes. Here we studied how sex and the sex chromosomes can modulate the outcome of mutations across the genome. We used the results of genome-wide CRISPR-based screens on 306 female and 396 male cancer cell lines to detect differences in gene essentiality between the sexes. By exploiting the tendency of cancer cells to lose or gain sex chromosomes, we were able to dissect the contribution of the Y and X chromosomes to variable gene essentiality. Using this approach, we identified 178 differentially essential genes that depend on the biological sex or the sex chromosomes. Integration with sex bias in gene expression and the rate of somatic mutations in human tumors highlighted genes that escape from X-inactivation, cancer-testis antigens, and Y-linked paralogs as central to the functional genetic differences between males and females.
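One simple way to test whether a gene's essentiality differs between female and male cell lines is a label-permutation test on the difference in mean gene-effect score. The sketch below is only an illustration under that assumption; the abstract does not specify which statistical test was used, and the function name and parameters are hypothetical.

```python
import random
from statistics import mean


def permutation_pvalue(female, male, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    female, male: gene-effect scores for one gene in each group of cell lines.
    The test itself is an assumed stand-in, not the study's stated method.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    observed = abs(mean(female) - mean(male))
    pooled = list(female) + list(male)
    n_f = len(female)

    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # reassign sex labels at random
        diff = abs(mean(pooled[:n_f]) - mean(pooled[n_f:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_perm + 1)


# Toy scores: the gene looks essential in the female lines only.
female_scores = [-1.0, -0.9, -1.1, -0.95, -1.05]
male_scores = [0.0, 0.1, -0.1, 0.05, -0.05]
print(permutation_pvalue(female_scores, male_scores))
```

In a genome-wide setting the resulting p-values would still need multiple-testing correction, and separating the effect of sex from that of sex-chromosome loss or gain would require grouping the cell lines by karyotype as well.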
Childhood-onset schizophrenia (COS) is a rare form of schizophrenia with an onset before 13 years of age. There is rising evidence that genetic factors play a major role in COS etiology, yet only a few single-gene mutations have been discovered. Here we present diagnostic whole-exome sequencing (WES) in an Israeli Jewish female with COS and additional neuropsychiatric conditions such as obsessive-compulsive disorder (OCD), anxiety, and aggressive behavior. Variant analysis revealed a novel de novo stop-gained variant in the GRIA2 gene (NM_000826.4: c.1522G>T (p.Glu508Ter)). GRIA2 encodes a subunit of the AMPA-sensitive glutamate receptor (GluA2) that functions as a ligand-gated ion channel in the central nervous system and plays an important role in excitatory synaptic transmission. GluA2 subunit mutations are known to cause variable neurodevelopmental phenotypes including intellectual disability, autism spectrum disorder, epilepsy, and OCD. Our findings support the potential diagnostic role of WES in COS, identify GRIA2 as a possible cause of a broad psychiatric phenotype that includes COS as a major manifestation, and expand the previously reported GRIA2 loss-of-function phenotypes.
In the past decade, the identification of susceptibility genes for psychiatric disorders has become routine, but understanding the biology underlying these discoveries has proven extremely difficult. The large number of potential risk genes and the genetic overlap between disorders are major obstacles for studying the etiology of these conditions. Systems biology approaches relying on gene ontologies, gene coexpression, and protein-protein interactions are used to identify convergence of the genes in relation to biological processes, cell types, brain areas, and developmental stages. Across psychiatric disorders, there is a clear enrichment for genes expressed in the brain and especially in the cortex, but higher resolution depends heavily on sample size and statistical power. There are indications that susceptibility genes tend to be expressed in the brain during periods preceding the typical onset of the disorders. Thus, the role of genes in prenatal brain development is more pronounced for childhood-onset disorders, such as autism spectrum disorder and attention-deficit/hyperactivity disorder, but is much less so for bipolar disorder and depression. One of the most consistent findings across multiple disorders and classes of genetic variants is the role of genes intolerant to mutations in psychiatric disorders, yet this association is more pronounced for disorders with a clear neurodevelopmental component. Nevertheless, a detailed understanding of the neurobiology of psychiatric disorders is still lacking. It is possible that it will only be revealed by studying the risk genes at the level of the development and function of neuronal networks and circuits.
Stroke is a leading cause of death and disability. Recovery depends on a delicate balance between inflammatory responses and immune suppression, tipping the scale between brain protection and susceptibility to infection. Peripheral cholinergic blockade of immune reactions fine-tunes this immune response, but its molecular regulators are unknown. Here, we report a regulatory shift in small RNA types in patient blood sequenced 2 days after ischemic stroke, comprising massive decreases of microRNA levels and concomitant increases of transfer RNA fragments (tRFs) targeting cholinergic transcripts. Electrophoresis-based size-selection followed by qRT-PCR validated the top six up-regulated tRFs in a separate cohort of stroke patients, and independent datasets of small and long RNA sequencing pinpointed immune cell subsets pivotal to these responses, implicating CD14+ monocytes in the cholinergic inflammatory reflex. In-depth small RNA targeting analyses revealed the most-perturbed pathways following stroke and implied a structural dichotomy between microRNA and tRF target sets. Furthermore, lipopolysaccharide stimulation of murine RAW 264.7 cells and human CD14+ monocytes up-regulated the top six stroke-perturbed tRFs, and overexpression of stroke-inducible tRF-22-WE8SPOX52 using a single-stranded RNA mimic induced down-regulation of immune regulator Z-DNA binding protein 1. In summary, we identified a "changing of the guards" between small RNA types that may systemically affect homeostasis in poststroke immune responses, and pinpointed multiple affected pathways, which opens new avenues for establishing therapeutics and biomarkers at the protein and RNA level.
We introduce a novel methodology for describing animal behavior as a tradeoff between value and complexity, using the Morris Water Maze navigation task as a concrete example. We develop a dynamical system model of the Water Maze navigation task, solve its optimal control under varying complexity constraints, and analyze the learning process in terms of the value and complexity of swimming trajectories. The value of a trajectory is related to its energetic cost and is correlated with swimming time. Complexity is a novel learning metric that measures how unlikely a trajectory is to be generated by a naive animal. Our model is analytically tractable, provides a good fit to observed behavior, and reveals that the learning process is characterized by early value optimization followed by complexity reduction. Furthermore, complexity sensitively characterizes behavioral differences between mouse strains.
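To make the notion of complexity as "how unlikely a trajectory is under a naive animal" concrete, the sketch below scores a discretized trajectory by its negative log-probability (in bits) under a fixed naive movement model. This is a deliberately simplified stand-in: the abstract describes a continuous dynamical system with optimal control, and the discrete moves, the uniform naive model, and the function name used here are all assumptions for illustration.

```python
import math


def trajectory_complexity(moves, naive_probs):
    """Negative log2-probability of a move sequence under a naive model.

    moves:        sequence of discrete moves, e.g. ["N", "E", "E"]
    naive_probs:  {move: probability that a naive animal makes that move}
    Higher values mean the trajectory is less likely to arise by chance.
    """
    return -sum(math.log2(naive_probs[m]) for m in moves)


# Hypothetical naive animal: equally likely to head in any of four directions.
uniform_naive = {"N": 0.25, "E": 0.25, "S": 0.25, "W": 0.25}
print(trajectory_complexity(["N", "E", "E"], uniform_naive))  # 3 moves * 2 bits
```

Under a uniform naive model every trajectory of the same length scores identically, so a useful naive model would have to be non-uniform, for example one estimated from the behavior of untrained animals.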
Several genes implicated in autism spectrum disorder (ASD) are chromatin regulators, including POGZ. The cellular and molecular mechanisms leading to the impaired social and cognitive behavior in ASD are unclear. Animal models are crucial for studying the effects of mutations on brain function and behavior as well as unveiling the underlying mechanisms. Here, we generate a brain-specific conditional knockout mouse model deficient for Pogz, an ASD risk gene. We demonstrate that Pogz-deficient mice show microcephaly, growth impairment, increased sociability, and learning and motor deficits, mimicking several of the human symptoms. At the molecular level, a luciferase reporter assay indicates that POGZ is a negative regulator of transcription. In accordance, in Pogz-deficient mice we find a significant upregulation of gene expression, most notably in the cerebellum. Gene set enrichment analysis revealed that the transcriptional changes encompass genes and pathways disrupted in ASD, including neurogenesis and synaptic processes, underlying the observed behavioral phenotype in mice. Physiologically, Pogz deficiency is associated with a reduction in the firing frequency of simple and complex spikes and an increase in amplitude of the inhibitory synaptic input in cerebellar Purkinje cells. Our findings support a mechanism linking heterochromatin dysregulation to cerebellar circuit dysfunction and behavioral abnormalities in ASD.
Profiling immunoglobulin (Ig) receptor repertoires with specialized assays can be cost-ineffective and time-consuming. Here we report ImReP, a computational method for rapid and accurate profiling of the Ig repertoire, including the complementarity-determining region 3 (CDR3), using regular RNA sequencing data such as those from 8,555 samples across 53 tissue types from 544 individuals in the Genotype-Tissue Expression (GTEx v6) project. Using ImReP and GTEx v6 data, we generate a collection of 3.6 million Ig sequences, termed the atlas of immunoglobulin repertoires (TAIR), across a broad range of tissue types that often do not have reported Ig repertoire information. Moreover, the flow of Ig clonotypes and inter-tissue repertoire similarities across immune-related tissues are also evaluated. In summary, TAIR is one of the largest collections of CDR3 sequences and tissue types, and should serve as an important resource for studying immunological diseases.
Mutations in AUTS2 are associated with autism, intellectual disability, and microcephaly. AUTS2 is expressed in the brain and interacts with polycomb proteins, yet it is still unclear how mutations in AUTS2 lead to neurodevelopmental phenotypes. Here we report that when neuronal differentiation is initiated, there is a shift in expression from a long isoform to a short AUTS2 isoform. A yeast two-hybrid screen identified the splicing factor SF3B1 as an interactor of both isoforms, whereas the polycomb group proteins, PCGF3 and PCGF5, were found to interact exclusively with the long AUTS2 isoform. Reporter assays showed that the first exons of the long AUTS2 isoform function as a transcription repressor, but the part that constitutes the short isoform acts as a transcriptional activator, both influenced by the cellular context. The expression levels of PCGF3 influenced the ability of the long AUTS2 isoform to activate or repress transcription. Mouse embryonic stem cells (mESCs) with heterozygous mutations in Auts2 had an increase in cell death during in vitro corticogenesis, which was significantly rescued by overexpressing the human AUTS2 transcripts. mESCs with a truncated AUTS2 protein (missing exons 12-20) showed premature neuronal differentiation, whereas cells overexpressing AUTS2, especially the long transcript, showed an increase in expression of pluripotency markers and delayed differentiation. Taken together, our data suggest that the precise expression of AUTS2 isoforms is essential for regulating transcription and the timing of neuronal differentiation.
It is an open question whether aging-related changes throughout the brain are driven by a common factor or result from several distinct molecular mechanisms. Quantitative magnetic resonance imaging (qMRI) provides biophysical parametric measurements allowing for non-invasive mapping of the aging human brain. However, qMRI measurements change in response to both molecular composition and water content. Here, we present a tissue relaxivity approach that disentangles these two tissue components and decodes molecular information from the MRI signal. Our approach enables us to reveal the molecular composition of lipid samples and predict lipidomics measurements of the brain. It produces unique molecular signatures across the brain, which are correlated with specific gene-expression profiles. We uncover region-specific molecular changes associated with brain aging. These changes are independent of other MRI aging markers. Our approach opens the door to a quantitative characterization of the biological sources of aging, which until now was possible only post-mortem.
BACKGROUND: Neurodevelopmental disorders (NDDs) such as autism spectrum disorder, intellectual disability, developmental disability, and epilepsy are characterized by abnormal brain development that may affect cognition, learning, behavior, and motor skills. High co-occurrence (comorbidity) of NDDs indicates a shared, underlying biological mechanism. The genetic heterogeneity and overlap observed in NDDs make it difficult to identify the genetic causes of specific clinical symptoms, such as seizures.
METHODS: We present a computational method, MAGI-S, to discover modules or groups of highly connected genes that together potentially perform a similar biological function. MAGI-S integrates protein-protein interaction and co-expression networks to form modules centered around the selection of a single "seed" gene, yielding modules consisting of genes that are highly co-expressed with the seed gene. We aim to dissect the epilepsy phenotype from a general NDD phenotype by providing MAGI-S with high confidence NDD seed genes with varying degrees of association with epilepsy, and we assess the enrichment of de novo mutation, NDD-associated genes, and relevant biological function of constructed modules.
RESULTS: The newly identified modules account for the increased rate of de novo non-synonymous mutations in autism, intellectual disability, developmental disability, and epilepsy, and enrichment of copy number variations (CNVs) in developmental disability. We also observed that modules seeded with genes strongly associated with epilepsy tend to have a higher association with epilepsy phenotypes than modules seeded with other neurodevelopmental disorder genes. Modules seeded with genes strongly associated with epilepsy (e.g., SCN1A, GABRA1, and KCNB1) are significantly associated with synaptic transmission, long-term potentiation, and calcium signaling pathways. On the other hand, modules found with seed genes that are not associated or only weakly associated with epilepsy are mostly involved in RNA regulation and chromatin remodeling.
CONCLUSIONS: In summary, our method identifies modules enriched with de novo non-synonymous mutations and can capture specific networks that underlie the epilepsy phenotype and display distinct enrichment in relevant biological processes. MAGI-S is available at https://github.com/jchow32/magi-s.
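The seed-centered idea described in METHODS can be illustrated with a small greedy sketch: starting from a seed gene, repeatedly add the protein-protein interaction neighbor of the current module that is most strongly co-expressed with the seed, as long as it clears a co-expression threshold. This is not the MAGI-S algorithm itself, whose scoring and search procedure are not given in the abstract; the function name, the threshold, the size cap, and the toy network are all assumptions.

```python
def seed_module(seed, ppi, coexpr_with_seed, min_coexpr=0.6, max_size=10):
    """Greedily grow a module around a seed gene.

    ppi:               {gene: iterable of interacting genes}
    coexpr_with_seed:  {gene: co-expression with the seed, e.g. a correlation}
    min_coexpr and max_size are illustrative assumptions.
    """
    module = [seed]
    members = {seed}
    while len(module) < max_size:
        # Genes that interact with any current member but are not yet included.
        frontier = {n for g in members for n in ppi.get(g, ()) if n not in members}
        candidates = [g for g in frontier
                      if coexpr_with_seed.get(g, 0.0) >= min_coexpr]
        if not candidates:
            break
        # Pick the most co-expressed candidate; the name breaks ties deterministically.
        best = max(candidates, key=lambda g: (coexpr_with_seed[g], g))
        module.append(best)
        members.add(best)
    return module


# Toy network with hypothetical gene names.
toy_ppi = {
    "SEED": ["GENE_A", "GENE_B"],
    "GENE_A": ["SEED", "GENE_C"],
    "GENE_B": ["SEED"],
    "GENE_C": ["GENE_A"],
}
toy_coexpr = {"GENE_A": 0.9, "GENE_B": 0.2, "GENE_C": 0.7}
print(seed_module("SEED", toy_ppi, toy_coexpr))
```

Requiring each added gene to both touch the module in the interaction network and co-express with the seed is what keeps the module centered on the seed rather than drifting toward network hubs.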
Autism spectrum disorder (ASD) presents a wide, and often varied, behavioral phenotype. Improper assessment of risks has been reported among individuals diagnosed with ASD and may lead to the increased accidents and self-injury that are also reported in this population. However, there is little knowledge of the molecular underpinnings of the impaired risk-assessment phenotype. In this study, we have identified impaired risk-assessment activity in multiple male ASD mouse models. By performing network-based analysis of striatal whole-transcriptome data from each of these ASD models, we have identified a cluster of glutamate receptor-associated genes that correlate with the risk-assessment phenotype. Furthermore, pharmacological inhibition of striatal glutamatergic receptors was able to mimic the dysregulation in risk-assessment. Therefore, this study has identified a molecular mechanism that may underlie risk-assessment dysregulation in ASD.
Mouse embryonic stem cells (mESCs) are key components in generating mouse models for human diseases and performing basic research on pluripotency, yet the number of genes essential for mESCs is still unknown. We performed a genome-wide screen for essential genes in mESCs and compared it to screens in human cells. We found that essential genes are enriched for basic cellular functions, are highly expressed in mESCs, and tend to lack paralog genes. We discovered that genes that are essential specifically in mESCs play a role in pathways associated with their pluripotent state. We show that 29.5% of human genes intolerant to loss-of-function mutations are essential in mouse or human ESCs, and that the human phenotypes most significantly associated with genes essential for ESCs are neurodevelopmental. Our results provide insights into essential genes in the mouse, the pathways which govern pluripotency, and suggest that many genes associated with neurodevelopmental disorders are essential at very early embryonic stages.
High-throughput RNA-sequencing (RNA-seq) technologies provide an unprecedented opportunity to explore the individual transcriptome. Unmapped reads are a large and often overlooked output of standard RNA-seq analyses. Here, we present Read Origin Protocol (ROP), a tool for discovering the source of all reads originating from complex RNA molecules. We apply ROP to samples across 2630 individuals from 54 diverse human tissues. Our approach can account for 99.9% of 1 trillion reads of various read lengths. Additionally, we use ROP to investigate the functional mechanisms underlying connections between the immune system, microbiome, and disease. ROP is freely available at https://github.com/smangul1/rop/wiki.
MicroRNAs orchestrate brain functioning via interaction with microRNA recognition elements (MREs) on target transcripts. However, the global impact of potential competition on the microRNA pool between coding and non-coding brain transcripts that share MREs with them remains unexplored. Here we report that non-coding pseudogene transcripts carrying MREs (PSG+MRE) often show duplicated origin, evolutionary conservation and higher expression in human temporal lobe neurons than comparable duplicated MRE-deficient pseudogenes (PSG-MRE). PSG+MRE participate in neuronal RNA-induced silencing complexes (RISC), indicating functional involvement. Furthermore, downregulation experiments in cell culture validated bidirectional co-regulation of PSG+MRE with MRE-sharing coding transcripts, frequently not their mother genes, and with targeted microRNAs; also, PSG+MRE single-nucleotide polymorphisms are associated with schizophrenia, bipolar disorder and autism, suggesting interaction with mental diseases. Our findings indicate functional roles of duplicated PSG+MRE in brain development and cognition, supporting physiological impact of the reciprocal co-regulation of PSG+MRE with MRE-sharing coding transcripts in human brain neurons.
Genetic susceptibility to intellectual disability (ID), autism spectrum disorder (ASD), and schizophrenia (SCZ) often arises from mutations in the same genes, suggesting that they share common mechanisms. We studied genes with de novo mutations in the three disorders and genes implicated in SCZ by genome-wide association study (GWAS). Using biological annotations and brain gene expression, we show that mutation class explains enrichment patterns more than specific disorder. Genes with loss-of-function mutations and genes with missense mutations were associated with different pathways across disorders. Conversely, gene expression patterns were specific for each disorder. ID genes were preferentially expressed in the cortex; ASD genes were expressed in the fetal cortex, cerebellum, and striatum; and genes associated with SCZ were expressed in the adolescent cortex. Our study suggests that convergence across neuropsychiatric disorders stems from common pathways that are consistently vulnerable to genetic variations but that spatiotemporal activity of genes contributes to specific phenotypes.
Recent successes in genome-wide association studies (GWASs) make it possible to address important questions about the genetic architecture of complex traits, such as allele frequency and effect size. One lesser-known aspect of complex traits is the extent of allelic heterogeneity (AH) arising from multiple causal variants at a locus. We developed a computational method to infer the probability of AH and applied it to three GWASs and four expression quantitative trait loci (eQTL) datasets. We identified a total of 4,152 loci with strong evidence of AH. The proportion of all loci with identified AH is 4%-23% in eQTLs, 35% in GWASs of high-density lipoprotein (HDL), and 23% in GWASs of schizophrenia. For eQTLs, we observed a strong correlation between sample size and the proportion of loci with AH (R = 0.85, p = 2.2 × 10), indicating that limited statistical power prevents identification of AH in other loci. Understanding the extent of AH may guide the development of new methods for fine mapping and association mapping of complex traits.
The study of the genetics of gene expression is of considerable importance to understanding the nature of common, complex diseases. The most widely applied approach to identifying relationships between genetic variation and gene expression is the expression quantitative trait loci (eQTL) approach. Here, we increased the computational power of the eQTL approach with an alternative and complementary approach based on analyzing allele-specific expression (ASE). We designed a novel analytical method to identify cis-acting regulatory variants based on genome sequencing and measurements of ASE from RNA-sequencing (RNA-seq) data. We evaluated the power and resolution of our method using simulated data. We then applied the method to map regulatory variants affecting gene expression in lymphoblastoid cell lines (LCLs) from 77 unrelated northern and western European individuals (CEU), which were part of the HapMap project. A total of 2309 SNPs were identified as being associated with ASE patterns. The SNPs associated with ASE were enriched within promoter regions and were significantly more likely to show strong evidence for a regulatory role. Finally, among the candidate regulatory SNPs, we identified 108 SNPs that were previously associated with human immune diseases. With further improvements in quantifying ASE from RNA-seq, the application of our method to other datasets is expected to accelerate our understanding of the biological basis of common diseases.
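A common building block for measuring ASE is a per-site test of allelic imbalance: at a heterozygous site, reads carrying the reference and alternative alleles are compared against the 50:50 split expected without cis-regulatory effects. The sketch below implements an exact two-sided binomial test for that comparison. It is offered only as an assumed illustration of how ASE might be quantified; the mapping method described in the abstract, which relates ASE to candidate regulatory SNPs across individuals, is not specified in enough detail here to reproduce.

```python
from math import comb


def allelic_imbalance_pvalue(ref_reads, alt_reads, p_ref=0.5):
    """Exact two-sided binomial p-value for allelic imbalance at one site.

    ref_reads, alt_reads: RNA-seq read counts for each allele at a
    heterozygous site. p_ref = 0.5 assumes no reference-mapping bias,
    which is itself a simplifying assumption.
    """
    n = ref_reads + alt_reads
    # Probability of every possible reference-read count under the null.
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref)**(n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    # Sum outcomes at least as unlikely as the observed one; the small
    # tolerance guards against floating-point ties.
    p_value = sum(q for q in pmf if q <= observed * (1 + 1e-9))
    return min(1.0, p_value)


print(allelic_imbalance_pvalue(10, 10))  # balanced expression
print(allelic_imbalance_pvalue(20, 0))   # fully monoallelic expression
```

In practice, reads aligning preferentially to the reference allele tend to shift the null proportion away from 0.5, so `p_ref` would usually be estimated from the data rather than fixed.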