Esteban G. Burchard
Profile Url: esteban-g--burchard
Researcher at Department of Bioengineering and Therapeutic Sciences, University of California
Mechanistic processes underlying human germline mutations remain largely unknown. Variation in mutation rate and spectrum along the genome is informative about the biological mechanisms involved. We statistically decompose this variation into separate processes using a blind source separation technique. Analysis of a large-scale whole-genome sequencing dataset (TOPMed) reveals nine processes that explain the variation in mutation properties between loci. Seven of these processes lend themselves to a biological interpretation. One process is driven by bulky DNA lesions that resolve asymmetrically with respect to transcription and replication. Two processes independently track the direction of the replication fork and replication timing. We identify a mutagenic effect of active demethylation acting primarily in regulatory regions. We also demonstrate that a recently discovered mutagenic process specific to oocytes can be localized solely from population sequencing data. This process is spread across all chromosomes and is highly asymmetric with respect to the direction of transcription, suggesting a major role for DNA damage.
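The abstract does not name the blind source separation technique used. To illustrate the general idea of splitting a loci × mutation-type matrix into a few latent processes, here is a minimal sketch that swaps in non-negative matrix factorization (NMF) with Lee-Seung multiplicative updates. NMF is one common blind source separation method, not necessarily the one the authors used. The matrix layout and the function name `nmf` are assumptions for illustration, and the data are synthetic.

```python
import numpy as np


def nmf(V, k, n_iter=1000, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (loci x mutation types) as V ~ W @ H.

    Illustrative NMF sketch (an assumed stand-in, not the authors' method):
    W (loci x k): per-locus intensity of each latent process.
    H (k x types): mutation spectrum of each latent process.
    Lee-Seung multiplicative updates minimize the Frobenius reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_loci, n_types = V.shape
    W = rng.random((n_loci, k)) + eps
    H = rng.random((k, n_types)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry of W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic data: 200 loci, 6 mutation classes, 2 planted processes.
    W_true = rng.random((200, 2))
    H_true = rng.random((2, 6))
    V = W_true @ H_true
    W, H = nmf(V, k=2)
    rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
    print(f"relative reconstruction error: {rel_err:.4f}")
```

A design note: NMF suits count-like mutation data because its non-negativity constraint yields additive "processes". Other blind source separation methods, such as independent component analysis, make different assumptions about the sources.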
Age is the dominant risk factor for most chronic human diseases, yet the mechanisms by which aging confers this risk are largely unknown. Recently, the age-related acquisition of somatic mutations in regenerating hematopoietic stem cell populations was associated with both hematologic cancer incidence and coronary heart disease prevalence. Somatic mutations with leukemogenic potential may confer selective cellular advantages leading to clonal expansion, a phenomenon termed 'Clonal Hematopoiesis of Indeterminate Potential' (CHIP). Simultaneous germline and somatic whole-genome sequence analysis now provides the opportunity to identify root causes of CHIP. Here, we analyze high-coverage whole-genome sequences from 97,691 participants of diverse ancestries in the NHLBI TOPMed program and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid, and inflammatory traits that are specific to different CHIP genes. A genome-wide association analysis of germline variants identified three loci associated with CHIP status, including one locus at TET2 that was specific to African ancestry. In silico-informed in vitro evaluation of the TET2 germline locus identified a causal variant that disrupts a TET2 distal enhancer. Aggregates of rare germline loss-of-function variants in CHEK2, a DNA damage repair gene, predisposed to CHIP acquisition. Overall, we observe that germline genetic variation altering hematopoietic stem cell function and the fidelity of DNA damage repair increases the likelihood of somatic mutations leading to CHIP.
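The abstract reports that aggregated rare loss-of-function (LoF) variants in CHEK2 predispose to CHIP but does not describe the aggregation test. Below is a minimal sketch of a generic gene-level burden analysis, assuming the simplest form. It collapses each person's qualifying variants into a carrier/non-carrier flag, builds a 2×2 case-control table, and applies a one-sided Fisher's exact test. The function names and variant IDs are hypothetical, and a real analysis would typically adjust for ancestry and other covariates, which this sketch does not.

```python
from math import comb


def is_carrier(person_variants, qualifying):
    """True if the person carries at least one qualifying rare LoF variant."""
    return not person_variants.isdisjoint(qualifying)


def burden_table(cases, controls, qualifying):
    """Collapse variants into carrier status and return a 2x2 table (a, b, c, d).

    a/b = carrier/non-carrier cases, c/d = carrier/non-carrier controls.
    Each person is represented as a set of (hypothetical) variant IDs.
    """
    a = sum(is_carrier(p, qualifying) for p in cases)
    c = sum(is_carrier(p, qualifying) for p in controls)
    return a, len(cases) - a, c, len(controls) - c


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for carrier enrichment among cases."""
    n = a + b + c + d
    n_cases, n_carriers = a + b, a + c
    total = comb(n, n_cases)
    # Hypergeometric upper tail: P(X >= a) carriers among cases under no association.
    tail = sum(
        comb(n_carriers, x) * comb(n - n_carriers, n_cases - x)
        for x in range(a, min(n_cases, n_carriers) + 1)
    )
    return tail / total


# Hypothetical toy data: variant IDs "v1" and "v2" are the qualifying LoF variants.
cases = [{"v1"}, {"v2", "x"}, {"y"}]
controls = [{"z"}, {"v1"}, set()]
table = burden_table(cases, controls, {"v1", "v2"})  # (2, 1, 1, 2)
print(table, fisher_greater(*table))  # (2, 1, 1, 2) 0.5
```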
The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, and sharing of genomic and phenotypic data via dbGaP. In 53,581 TOPMed samples, >400 million single-nucleotide and insertion/deletion variants were detected by alignment with the reference genome. Additional novel variants are detectable through assembly of unmapped reads and customized analysis in highly variable loci. Among the >400 million variants detected, 97% have frequency <1% and 46% are singletons. These rare variants provide insights into mutational processes and recent human evolutionary history. The nearly complete catalog of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and non-coding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and extends the reach of nearly all genome-wide association studies to include variants down to ~0.01% in frequency.
Background: Exposure to environmental pollutants has been shown to be associated with asthma, but few studies have evaluated the effect of wood smoke on asthma and disease severity in a developing country, where the use of stoves powered by solid fuels is common practice.
Objective: In a population in Olancho, Honduras, we evaluated the association between cooking fuel, stove type, and asthma. We also evaluated the effects of these factors on asthma symptoms, lung function, and atopy.
Methods: Participants with physician-diagnosed asthma (n = 597) and controls without asthma (n = 429) were recruited from the Olancho province in Honduras. Participants were interviewed using a questionnaire, and their baseline pulmonary function was measured using spirometry.
Results: The prevalence of wood use as a cooking fuel was 66.9% in the study population, and 42.1% of these wood users relied on wood as their only fuel. Use of wood as a cooking fuel was more prevalent among households with lower income, lower maternal education, and less urbanization. The prevalence of an open wood stove as the primary cooking stove was 6.2% higher among participants with asthma than among healthy controls (95% CI 0.8-11.7%, p = 0.02). In a multiple logistic regression model, we identified a significant association between use of an open wood stove and asthma (OR = 1.80, 95% CI 1.17-2.78, p = 0.007), compared with the reference category (electric stove). Among participants with asthma, use of wood as cooking fuel was significantly associated with increased daytime respiratory symptoms (OR = 1.46, 95% CI 1.01-2.58, p = 0.046) and nocturnal symptoms (OR = 2.51, 95% CI 1.04-2.62, p = 0.04), but not with pulmonary function. Among control participants without asthma, use of wood as cooking fuel was associated with atopy (OR = 1.94, 95% CI 1.14-3.33, p = 0.015) and cough (OR = 2.22, 95% CI 1.09-4.88, p = 0.04).
Conclusions: Use of an open wood stove for cooking in a developing country appears to be a significant risk factor for asthma and respiratory symptoms. Exposure to wood smoke may play a role in atopic sensitization and respiratory symptoms, leading to the development of obstructive lung disease in susceptible individuals.
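The results above are reported as odds ratios (ORs) with 95% confidence intervals. For readers less familiar with these quantities, the minimal sketch below computes a crude OR and its Wald (Woolf) 95% CI from a single 2×2 exposure-by-asthma table. The abstract's ORs come from a multiple logistic regression with covariate adjustment, which this unadjusted calculation does not reproduce. The counts in the example are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald (Woolf) CI for a 2x2 table.

                    exposed   unexposed
        cases          a          b
        controls       c          d

    Unadjusted: no covariates, unlike a multiple logistic regression.
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi


# Hypothetical counts, e.g. open wood stove vs electric stove among cases and controls.
print(odds_ratio_ci(20, 10, 10, 20))
```

Because the Wald interval is symmetric on the log scale, the point estimate equals the geometric mean of the two confidence limits.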
Epigenetics & Chromatin, 2017-01-03
Genetic data are known to harbor information about human demographics, and genotyping data are commonly used to capture ancestry information by leveraging genome-wide differences between populations. In contrast, it is not clear to what extent population structure is captured by genome-wide DNA methylation data. Using three large cohort 450K methylation array data sets, we demonstrate that the ancestry signal is mirrored in genome-wide DNA methylation data and that it can be isolated more effectively by leveraging the correlation structure of CpGs with cis-located SNPs. Based on these insights, we propose a method, EPISTRUCTURE, for inferring the ancestry of individuals from their methylation data without the need for corresponding genotype data. Although genetic data are often collected in epigenetic studies of large cohorts, they are typically not made publicly available, which makes EPISTRUCTURE especially useful for researchers working with public data. An implementation of EPISTRUCTURE is available in GLINT, our recently released toolset for DNA methylation analysis, at: http://glint-epigenetics.readthedocs.io.
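The core idea described above is to restrict attention to CpGs whose methylation tracks nearby (cis) SNPs, then extract the main axes of variation. The minimal sketch below implements that idea with plain PCA via SVD in NumPy. It is not GLINT's EPISTRUCTURE implementation. It assumes the list of SNP-correlated CpG column indices is already available (a hypothetical input here) and omits any additional preprocessing the published method may perform. The data in the usage example are synthetic.

```python
import numpy as np


def methylation_ancestry_pcs(M, informative_cpgs, n_pcs=2):
    """Return principal component scores (samples x n_pcs).

    M: samples x CpGs matrix of methylation levels (e.g. beta values).
    informative_cpgs: column indices of CpGs assumed to correlate with cis SNPs
        (hypothetical input; not GLINT's reference list).
    """
    X = M[:, informative_cpgs]
    X = X - X.mean(axis=0)            # center each CpG
    X = X / (X.std(axis=0) + 1e-12)   # scale each CpG to unit variance
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]   # scores = projections onto top PCs


# Synthetic example: two hypothetical ancestry groups differing at 50 of 500 CpGs.
rng = np.random.default_rng(1)
groups = np.repeat([0, 1], 20)
M = rng.normal(0.5, 0.05, size=(40, 500))
M[:, :50] += 0.2 * groups[:, None]
pcs = methylation_ancestry_pcs(M, np.arange(50))
print(pcs[:, 0].round(2))
```

Restricting the PCA to SNP-associated CpGs is what keeps the leading components focused on genetic (ancestry) signal rather than on other sources of methylation variation.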
American Journal of Respiratory and Critical Care Medicine, 2018-06-15
Asthma is the most common chronic disease of children, with significant racial/ethnic differences in prevalence, morbidity, mortality, and therapeutic response. Albuterol, a bronchodilator medication, is the first-line therapy for asthma treatment worldwide. We performed the largest whole genome sequencing (WGS) pharmacogenetics study to date using data from 1,441 minority children with asthma who had extremely high or low bronchodilator drug response (BDR). We identified population-specific and shared pharmacogenetic variants associated with BDR, including genome-wide significant (p < 3.53 × 10⁻⁷) and suggestive (p < 7.06 × 10⁻⁶) loci near genes previously associated with lung capacity (DNAH5), immunity (NFKB1 and PLCB1), and β-adrenergic signaling pathways (ADAMTS3 and COX18). Functional analyses centered on NFKB1 revealed a potential regulatory function of our BDR-associated SNPs in bronchial smooth muscle cells. Specifically, these variants are in linkage disequilibrium with SNPs in a functionally active enhancer and are also expression quantitative trait loci (eQTLs) for a neighboring gene, SLC39A8. Given the lack of other asthma study populations with WGS data on minority children, replication of our rare variant associations was not possible. We attempted to replicate our common variant findings in five independent studies with GWAS data. Because age-specific associations have previously been reported for asthma and asthma-related traits, the over-representation of adults in our replication populations may have contributed to the lack of statistical replication, despite the functional relevance of the NFKB1 variants demonstrated by our functional assays. Our study expands the understanding of pharmacogenetic analyses in racially/ethnically diverse populations and lays a foundation for precision medicine in at-risk and understudied minority populations.
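The abstract gives the two p-value thresholds without saying how they were derived. Their ratio is exactly 20, which is consistent with a common GWAS convention of a Bonferroni threshold 0.05/N for significance and 1/N for suggestive association, with N ≈ 141,600 effective tests. That interpretation is an assumption, not something the abstract states. The minimal sketch below checks the arithmetic, and the function name is hypothetical.

```python
def implied_number_of_tests(threshold, family_alpha=0.05):
    """Number of tests N for which a Bonferroni threshold family_alpha / N equals `threshold`."""
    return family_alpha / threshold


# Assumed convention: significant = 0.05 / N, suggestive = 1 / N.
n_tests = implied_number_of_tests(3.53e-7)  # about 141,643 effective tests
suggestive = 1 / n_tests                    # about 7.06e-6, matching the abstract
print(round(n_tests), f"{suggestive:.3g}")
```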