Lawrence F Bielak
Researcher at Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan
Metabolic dysregulation in multiple tissues alters glucose homeostasis and influences risk for type 2 diabetes (T2D). To identify pathways and tissues influencing T2D-relevant glycemic traits (fasting glucose [FG], fasting insulin [FI], two-hour glucose [2hGlu] and glycated hemoglobin [HbA1c]), we investigated associations of exome-array variants in up to 144,060 individuals of multiple ancestries without diabetes. Single-variant analyses identified novel associations at 21 coding variants in 18 novel loci, whilst gene-based tests revealed signals at two genes, TF (HbA1c) and G6PC (FG, FI). Pathway and tissue enrichment analyses of trait-associated transcripts confirmed the importance of liver and kidney for FI and pancreatic islets for FG regulation, implicated adipose tissue in FI and the gut in 2hGlu, and suggested a role for the non-endocrine pancreas in glucose homeostasis. Functional studies demonstrated that a novel FG/FI association at the liver-enriched G6PC transcript was driven by multiple rare loss-of-function variants. The FG/HbA1c-associated, islet-specific G6PC2 transcript also contained multiple rare functional variants, including two alleles within the same codon with divergent effects on glucose levels. Our findings highlight the value of integrating genomic and functional data to maximize biological inference.
We assembled an ancestrally diverse collection of genome-wide association studies of type 2 diabetes (T2D) in 180,834 cases and 1,159,055 controls (48.9% non-European descent). We identified 277 loci at genome-wide significance (p < 5×10⁻⁸), including 237 attaining a more stringent trans-ancestry threshold (p < 5×10⁻⁹), which were delineated to 338 distinct association signals. Trans-ancestry meta-regression offered substantial enhancements to fine-mapping, with 58.6% of associations more precisely localised due to population diversity, and 54.4% of signals resolved to a single variant with >50% posterior probability. This improved fine-mapping enabled systematic assessment of candidate causal genes and molecular mechanisms through which T2D associations are mediated, laying foundations for functional investigations. Trans-ancestry genetic risk scores enhanced transferability across diverse populations, providing a step towards more effective clinical translation to improve global health.
Mechanistic processes underlying human germline mutations remain largely unknown. Variation in mutation rate and spectra along the genome is informative about the biological mechanisms. We statistically decompose this variation into separate processes using a blind source separation technique. The analysis of a large-scale whole genome sequencing dataset (TOPMed) reveals nine processes that explain the variation in mutation properties between loci. Seven of these processes lend themselves to a biological interpretation. One process is driven by bulky DNA lesions that resolve asymmetrically with respect to transcription and replication. Two processes independently track the direction of the replication fork and replication timing. We identify a mutagenic effect of active demethylation primarily acting in regulatory regions. We also demonstrate that a recently discovered mutagenic process specific to oocytes can be localized solely from population sequencing data. This process is spread across all chromosomes and is highly asymmetric with respect to the direction of transcription, suggesting a major role of DNA damage.
Age is the dominant risk factor for most chronic human diseases; yet the mechanisms by which aging confers this risk are largely unknown. Recently, the age-related acquisition of somatic mutations in regenerating hematopoietic stem cell populations was associated with both hematologic cancer incidence and coronary heart disease prevalence. Somatic mutations with leukemogenic potential may confer selective cellular advantages leading to clonal expansion, a phenomenon termed 'Clonal Hematopoiesis of Indeterminate Potential' (CHIP). Simultaneous germline and somatic whole genome sequence analysis now provides the opportunity to identify root causes of CHIP. Here, we analyze high-coverage whole genome sequences from 97,691 participants of diverse ancestries in the NHLBI TOPMed program and identify 4,229 individuals with CHIP. We identify associations with blood cell, lipid, and inflammatory traits specific to different CHIP genes. Association of a genome-wide set of germline genetic variants identified three genetic loci associated with CHIP status, including one locus at TET2 that was African ancestry-specific. In silico-informed in vitro evaluation of the TET2 germline locus identified a causal variant that disrupts a TET2 distal enhancer. Aggregates of rare germline loss-of-function variants in CHEK2, a DNA damage repair gene, predisposed to CHIP acquisition. Overall, we observe that germline genetic variation altering hematopoietic stem cell function and the fidelity of DNA-damage repair increases the likelihood of somatic mutations leading to CHIP.
Genotype-phenotype association studies often combine phenotype data from multiple studies to increase power. Harmonization of the data usually requires substantial effort due to heterogeneity in phenotype definitions, study design, data collection procedures, and data set organization. Here we describe a centralized system for phenotype harmonization that includes input from phenotype domain and study experts, quality control, documentation, reproducible results, and data sharing mechanisms. This system was developed for the National Heart, Lung, and Blood Institute's Trans-Omics for Precision Medicine (TOPMed) program, which is generating genomic and other omics data for >80 studies with extensive phenotype data. To date, 63 phenotypes have been harmonized across thousands of participants from up to 17 TOPMed studies per phenotype. We discuss the challenges faced in this undertaking and how they were addressed. The harmonized phenotype data and associated documentation have been submitted to National Institutes of Health data repositories for controlled access by the scientific community. We also provide materials to facilitate future harmonization efforts by the community, which include (1) the code used to generate the 63 harmonized phenotypes, enabling others to reproduce, modify or extend these harmonizations to additional studies; and (2) results of labeling thousands of phenotype variables with controlled vocabulary terms.
The Trans-Omics for Precision Medicine (TOPMed) program seeks to elucidate the genetic architecture and disease biology of heart, lung, blood, and sleep disorders, with the ultimate goal of improving diagnosis, treatment, and prevention. The initial phases of the program focus on whole genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here, we describe TOPMed goals and design as well as resources and early insights from the sequence data. The resources include a variant browser, a genotype imputation panel, and sharing of genomic and phenotypic data via dbGaP. In 53,581 TOPMed samples, >400 million single-nucleotide and insertion/deletion variants were detected by alignment with the reference genome. Additional novel variants are detectable through assembly of unmapped reads and customized analysis in highly variable loci. Among the >400 million variants detected, 97% have frequency <1% and 46% are singletons. These rare variants provide insights into mutational processes and recent human evolutionary history. The nearly complete catalog of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and non-coding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and extends the reach of nearly all genome-wide association studies to include variants down to ~0.01% in frequency.