Abstract
Rare coding variants that significantly impact function provide insights into the biology of a gene (refs. 1-3). However, ascertaining their frequency requires large sample sizes (refs. 4-8). Here, we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. In total, 23% of the Regeneron Genetics Center Million Exome (RGC-ME) data comes from non-European individuals of African, East Asian, Indigenous American, Middle Eastern, and South Asian ancestry. This catalogue includes over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss-of-function, we identify 3,988 loss-of-function intolerant genes, including 86 that were previously assessed as tolerant and 1,153 lacking established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions depleted of missense variants despite being tolerant to pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this important resource of coding variation from the RGC-ME accessible via a public variant allele frequency browser.
Author information
Author notes
These authors contributed equally: Kathie Y. Sun, Xiaodong Bai
Authors and Affiliations
Regeneron Genetics Center, Tarrytown, NY, USA
Kathie Y. Sun, Xiaodong Bai, Siying Chen, Suying Bao, Chuanyi Zhang, Manav Kapoor, Joshua Backman, Tyler Joseph, Evan Maxwell, George Mitra, Alexander Gorovits, Adam Mansfield, Boris Boutkov, Sujit Gokhale, Lukas Habegger, Anthony Marcketta, Adam E. Locke, Liron Ganel, Alicia Hawes, Michael D. Kessler, Deepika Sharma, Jeffrey Staples, Jonas Bovijn, Sahar Gelfman, Alessandro Di Gioia, Veera M. Rajagopal, Alexander Lopez, Jennifer Rico Varela, Michael Cantor, Timothy Thornton, Hyun Min Kang, John D. Overton, Alan R. Shuldiner, M. Laura Cremona, Mona Nafde, Aris Baras, Goncalo Abecasis, Jonathan Marchini, Jeffrey G. Reid, William Salerno & Suganthi Balasubramanian
Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico
Jesus Alegre, Jaime Berumen, Roberto Tapia-Conyer & Pablo Kuri-Morales
Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Mexico
Pablo Kuri-Morales
Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
Jason Torres, Jonathan Emberson & Rory Collins
Authors
- Kathie Y. Sun
- Xiaodong Bai
- Siying Chen
- Suying Bao
- Chuanyi Zhang
- Manav Kapoor
- Joshua Backman
- Tyler Joseph
- Evan Maxwell
- George Mitra
- Alexander Gorovits
- Adam Mansfield
- Boris Boutkov
- Sujit Gokhale
- Lukas Habegger
- Anthony Marcketta
- Adam E. Locke
- Liron Ganel
- Alicia Hawes
- Michael D. Kessler
- Deepika Sharma
- Jeffrey Staples
- Jonas Bovijn
- Sahar Gelfman
- Alessandro Di Gioia
- Veera M. Rajagopal
- Alexander Lopez
- Jennifer Rico Varela
- Jesus Alegre
- Jaime Berumen
- Roberto Tapia-Conyer
- Pablo Kuri-Morales
- Jason Torres
- Jonathan Emberson
- Rory Collins
- Michael Cantor
- Timothy Thornton
- Hyun Min Kang
- John D. Overton
- Alan R. Shuldiner
- M. Laura Cremona
- Mona Nafde
- Aris Baras
- Goncalo Abecasis
- Jonathan Marchini
- Jeffrey G. Reid
- William Salerno
- Suganthi Balasubramanian
Consortia
Regeneron Genetics Center
RGC-ME Cohort Partners
Corresponding authors
Correspondence to William Salerno or Suganthi Balasubramanian.
Supplementary information
Supplementary Information
This Supplementary Information file contains the following: descriptions of Supplementary Tables 1-14 (Supplementary Tables 1-6 and 11 are provided as separate Excel data tables; Supplementary Tables 7-10 and 12-14 are embedded within the Supplementary Information document); Supplementary Methods and descriptions of Supplementary Analyses; Supplementary Figures 1-7; and Supplementary References.
Supplementary Table 1
This table includes: Sample subsets of RGC-ME used in different analyses; Full breakdown of sample counts in fine-scale ancestry groups used in Fig. 1 and for the browser; and Sample sizes and collaborator details for each dataset in RGC-ME. See main Supplementary Information PDF for full legend.
Supplementary Table 2
s_het values (estimates of selection against heterozygous loss-of-function) for 16,710 genes, together with additional annotations including LOEUF scores from gnomAD, minor allele frequency, and coding sequence length. See main Supplementary Information PDF for full legend.
Supplementary Table 3
List of continuous segments of missense-constrained regions found in 12,349 genes (canonical transcripts), based on the top 15th percentile threshold of missense tolerance ratio (MTR) values. See main Supplementary Information PDF for full legend.
Supplementary Table 4
Jaccard index analysis between the MTR-constrained regions and features from UniProt (release 2022_05). See main Supplementary Information PDF for full legend.
Supplementary Table 5
List of genes with a significant proportion of their coding sequence (CDS) in the top 1, 5, 10, 15, and 20 percentiles of exome-wide MTR missense constraint scores, based on one-sided binomial tests. See main Supplementary Information PDF for full legend.
Supplementary Table 6
List of 4,848 genes with rare (alternate allele frequency <1%) biallelic pLOF variants (homozygous alternate and compound heterozygous) reported for the entire RGC-ME dataset, including related individuals. See main Supplementary Information PDF for full legend.
Supplementary Table 11
List of highly differentiated variants (FST > 0.15).
About this article
Cite this article
Sun, K.Y., Bai, X., Chen, S. et al. A deep catalogue of protein-coding variation in 983,578 individuals. Nature (2024). https://doi.org/10.1038/s41586-024-07556-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41586-024-07556-0