A deep catalogue of protein-coding variation in 983,578 individuals (2024)

Abstract

Rare coding variants that significantly affect function provide insights into the biology of a gene [1-3]. However, ascertaining their frequency requires large sample sizes [4-8]. Here, we present a catalogue of human protein-coding variation derived from exome sequencing of 983,578 individuals across diverse populations. Of the Regeneron Genetics Center Million Exome (RGC-ME) data, 23% comes from non-European individuals of African, East Asian, Indigenous American, Middle Eastern, and South Asian ancestry. This catalogue includes over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss-of-function, we identify 3,988 loss-of-function intolerant genes, including 86 that were previously assessed as tolerant and 1,153 lacking established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions depleted of missense variants despite being tolerant to pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this resource of coding variation from the RGC-ME accessible via a public variant allele frequency browser.

Author information

Author notes

  1. These authors contributed equally: Kathie Y. Sun, Xiaodong Bai

Authors and Affiliations

  1. Regeneron Genetics Center, Tarrytown, NY, USA

    Kathie Y. Sun, Xiaodong Bai, Siying Chen, Suying Bao, Chuanyi Zhang, Manav Kapoor, Joshua Backman, Tyler Joseph, Evan Maxwell, George Mitra, Alexander Gorovits, Adam Mansfield, Boris Boutkov, Sujit Gokhale, Lukas Habegger, Anthony Marcketta, Adam E. Locke, Liron Ganel, Alicia Hawes, Michael D. Kessler, Deepika Sharma, Jeffrey Staples, Jonas Bovijn, Sahar Gelfman, Alessandro Di Gioia, Veera M. Rajagopal, Alexander Lopez, Jennifer Rico Varela, Michael Cantor, Timothy Thornton, Hyun Min Kang, John D. Overton, Alan R. Shuldiner, M. Laura Cremona, Mona Nafde, Aris Baras, Goncalo Abecasis, Jonathan Marchini, Jeffrey G. Reid, William Salerno & Suganthi Balasubramanian

  2. Faculty of Medicine, National Autonomous University of Mexico (UNAM), Mexico City, Mexico

    Jesus Alegre, Jaime Berumen, Roberto Tapia-Conyer & Pablo Kuri-Morales

  3. Instituto Tecnológico y de Estudios Superiores de Monterrey, Monterrey, Mexico

    Pablo Kuri-Morales

  4. Clinical Trial Service Unit & Epidemiological Studies Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK

    Jason Torres, Jonathan Emberson & Rory Collins

Authors

  1. Kathie Y. Sun
  2. Xiaodong Bai
  3. Siying Chen
  4. Suying Bao
  5. Chuanyi Zhang
  6. Manav Kapoor
  7. Joshua Backman
  8. Tyler Joseph
  9. Evan Maxwell
  10. George Mitra
  11. Alexander Gorovits
  12. Adam Mansfield
  13. Boris Boutkov
  14. Sujit Gokhale
  15. Lukas Habegger
  16. Anthony Marcketta
  17. Adam E. Locke
  18. Liron Ganel
  19. Alicia Hawes
  20. Michael D. Kessler
  21. Deepika Sharma
  22. Jeffrey Staples
  23. Jonas Bovijn
  24. Sahar Gelfman
  25. Alessandro Di Gioia
  26. Veera M. Rajagopal
  27. Alexander Lopez
  28. Jennifer Rico Varela
  29. Jesus Alegre
  30. Jaime Berumen
  31. Roberto Tapia-Conyer
  32. Pablo Kuri-Morales
  33. Jason Torres
  34. Jonathan Emberson
  35. Rory Collins
  36. Michael Cantor
  37. Timothy Thornton
  38. Hyun Min Kang
  39. John D. Overton
  40. Alan R. Shuldiner
  41. M. Laura Cremona
  42. Mona Nafde
  43. Aris Baras
  44. Goncalo Abecasis
  45. Jonathan Marchini
  46. Jeffrey G. Reid
  47. William Salerno
  48. Suganthi Balasubramanian

Consortia

Regeneron Genetics Center

RGC-ME Cohort Partners

Corresponding authors

Correspondence to William Salerno or Suganthi Balasubramanian.

Supplementary information

Supplementary Information

This Supplementary Information file contains the following: descriptions of Supplementary Tables 1-14 (Supplementary Tables 1-6 and 11 are provided as separate Excel data tables; Supplementary Tables 7-10 and 12-14 are embedded within the Supplementary Information document); Supplementary Methods and descriptions of Supplementary Analyses; Supplementary Figures 1-7; and Supplementary References.

Supplementary Table 1

This table includes sample subsets of RGC-ME used in different analyses; a full breakdown of sample counts in the fine-scale ancestry groups used in Fig. 1 and in the browser; and sample sizes and collaborator details for each dataset in RGC-ME. See main Supplementary Information PDF for full legend.

Supplementary Table 2

s_het values (estimates of selection against heterozygous loss-of-function) for 16,710 genes, together with other annotations, including LOEUF scores from gnomAD, minor allele frequency, and coding sequence length. See main Supplementary Information PDF for full legend.
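To make the quantity in this table concrete, the sketch below uses the classical deterministic mutation-selection balance approximation, in which the cumulative pLOF allele frequency q of a gene settles near U / s_het, where U is the per-generation pLOF mutation rate summed over the gene. This is a swapped-in textbook approximation, not the estimation model used in the paper, and the function and parameter names are hypothetical. It ignores genetic drift, which matters most for genes with very few observed pLOF alleles.

```python
def shet_point_estimate(plof_mutation_rate: float, plof_allele_count: int, n_individuals: int) -> float:
    """Crude s_het estimate from mutation-selection balance, q ~= U / s_het.

    plof_mutation_rate: per-generation pLOF mutation rate U summed over the gene (assumed input).
    plof_allele_count: number of pLOF alleles observed in the cohort.
    n_individuals: number of diploid individuals sequenced.
    """
    if n_individuals <= 0:
        raise ValueError("n_individuals must be positive")
    q = plof_allele_count / (2 * n_individuals)  # cumulative pLOF allele frequency
    if q == 0:
        return 1.0  # no observed pLOF alleles: U / q diverges, so cap at the maximum coefficient
    return min(1.0, plof_mutation_rate / q)  # selection coefficients are capped at 1
```

For example, a gene with U = 1e-6 and 20 pLOF alleles among 1,000,000 individuals gives q = 1e-5 and an estimate of about 0.1. A point estimate like this is noisy for small counts, which is one reason Bayesian models are typically preferred for precise s_het estimates.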

Supplementary Table 3

List of continuous segments of missense-constrained regions found in 12,349 genes (canonical transcripts), defined using the top 15th-percentile threshold of missense tolerance ratio (MTR) values. See main Supplementary Information PDF for full legend.
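A minimal sketch of how continuous constrained segments could be extracted from per-position MTR scores. It assumes that lower MTR values indicate stronger missense constraint, that the threshold is a nearest-rank percentile of the exome-wide score distribution, and that positions are consecutive indices along the canonical transcript. The function names are hypothetical, and the paper's exact segmentation rules (for example, any minimum segment length) are not reproduced here.

```python
import math
from typing import List, Optional, Sequence, Tuple

def percentile_threshold(scores: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of exome-wide MTR scores (pct in (0, 100])."""
    if not scores or not 0 < pct <= 100:
        raise ValueError("need non-empty scores and 0 < pct <= 100")
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def constrained_segments(mtr: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index runs where MTR <= threshold.

    Assumes lower MTR means stronger missense constraint.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, score in enumerate(mtr):
        if score <= threshold:
            if start is None:
                start = i  # open a new constrained run
        elif start is not None:
            segments.append((start, i - 1))  # close the run before this position
            start = None
    if start is not None:
        segments.append((start, len(mtr) - 1))  # run extends to the last position
    return segments
```

For instance, `constrained_segments([1.0, 0.3, 0.2, 0.9, 0.1, 0.1, 1.2], 0.3)` returns `[(1, 2), (4, 5)]`. In practice the threshold would be computed once from all exome-wide scores and then applied gene by gene.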

Supplementary Table 4

Jaccard index analysis between the MTR-constrained regions and features from UniProt (release 2022_05). See main Supplementary Information PDF for full legend.
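The Jaccard index here measures overlap between two sets of protein regions as |A ∩ B| / |A ∪ B|. The sketch below assumes per-residue overlap within a single protein, with regions given as inclusive (start, end) positions; whether the paper pooled residues across genes or computed the index per feature type is not stated in this description, and the helper names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) residue positions

def positions(intervals: Iterable[Interval]) -> Set[int]:
    """Expand inclusive intervals into the set of covered residue positions."""
    covered: Set[int] = set()
    for start, end in intervals:
        covered.update(range(start, end + 1))
    return covered

def jaccard_index(a: Iterable[Interval], b: Iterable[Interval]) -> float:
    """Per-residue Jaccard index between two region sets (0.0 if both are empty)."""
    pa, pb = positions(a), positions(b)
    union = pa | pb
    if not union:
        return 0.0
    return len(pa & pb) / len(union)
```

For example, constrained regions `[(1, 2), (4, 5)]` and a UniProt-style feature `[(2, 4)]` share residues 2 and 4 out of 5 covered residues, giving 0.4.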

Supplementary Table 5

List of genes with a significant proportion of their CDS in the top 1, 5, 10, 15, and 20 percentiles of exome-wide MTR missense constraint scores, based on one-sided binomial tests. See main Supplementary Information PDF for full legend.
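A one-sided binomial test asks whether a gene has more CDS positions in the top k percentile than the k/100 fraction expected by chance. The sketch below assumes each CDS position is counted once and treated as an independent draw (adjacent MTR windows are correlated, so this is an approximation); the function names are hypothetical. The tail is summed in log space so that long coding sequences do not overflow.

```python
import math

def binomial_sf_one_sided(successes: int, trials: int, p: float) -> float:
    """P(X >= successes) for X ~ Binomial(trials, p), summed in log space."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if successes <= 0:
        return 1.0
    if successes > trials:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(trials + 1)
    total = 0.0
    for k in range(successes, trials + 1):
        # log of C(trials, k) * p^k * (1 - p)^(trials - k)
        log_pmf = (log_n_fact - math.lgamma(k + 1) - math.lgamma(trials - k + 1)
                   + k * log_p + (trials - k) * log_q)
        total += math.exp(log_pmf)
    return min(1.0, total)

def gene_enrichment_pvalue(positions_in_top: int, cds_length: int, percentile: float) -> float:
    """Hypothetical helper: p-value that a gene is enriched for top-percentile positions."""
    return binomial_sf_one_sided(positions_in_top, cds_length, percentile / 100)
```

If SciPy is available, `scipy.stats.binomtest(k, n, p, alternative="greater").pvalue` computes the same one-sided tail and is a reasonable cross-check.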

Supplementary Table 6

List of 4,848 genes with rare (alternate allele frequency <1%) biallelic pLOF variants (homozygous alternate and compound heterozygous) reported for the entire RGC-ME dataset, including related individuals. See main Supplementary Information PDF for full legend.
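The biallelic criterion can be sketched as follows: an individual is biallelic for a gene if both haplotypes carry a rare pLOF allele, either from one homozygous variant or from two different variants in trans. This assumes phased genotypes in a simple hypothetical record format (`PlofCall`) and uses the <1% alternate allele frequency cut-off from the table description; the paper's phasing and variant filtering steps are not reproduced. Unphased heterozygous pairs are skipped because cis and trans cannot be told apart.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

@dataclass(frozen=True)
class PlofCall:
    """Hypothetical record for one pLOF genotype call."""
    sample: str
    gene: str
    variant: str
    alt_freq: float  # alternate allele frequency in the cohort
    genotype: str    # e.g. "0|1", "1|0", "1|1" (phased) or "0/1", "1/1" (unphased)

def biallelic_plof_genes(calls: Iterable[PlofCall], max_aaf: float = 0.01) -> Dict[str, Set[str]]:
    """Map gene -> samples carrying rare pLOF alleles on both haplotypes."""
    hits: Dict[Tuple[str, str], Set[int]] = defaultdict(set)
    for call in calls:
        if call.alt_freq >= max_aaf:
            continue  # keep only rare variants (AAF < max_aaf)
        if call.genotype.replace("/", "|") == "1|1":
            carried = {0, 1}  # homozygous alternate: phase not needed
        elif "|" in call.genotype:
            carried = {hap for hap, allele in enumerate(call.genotype.split("|")) if allele == "1"}
        else:
            carried = set()  # unphased heterozygote: cis/trans unknown, skipped
        hits[(call.sample, call.gene)].update(carried)
    result: Dict[str, Set[str]] = defaultdict(set)
    for (sample, gene), haps in hits.items():
        if haps == {0, 1}:  # both haplotypes disrupted
            result[gene].add(sample)
    return dict(result)
```

Two heterozygous variants on the same haplotype (for example, both `0|1`) leave one functional copy, so they are correctly excluded.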

Supplementary Table 11

List of highly differentiated variants (F_ST > 0.15).
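The table description gives only the F_ST cut-off, not the estimator. As an illustration, the sketch below computes a simple unweighted Nei-style F_ST = (H_T - H_S) / H_T for one biallelic variant from per-population alternate allele frequencies. It does not correct for sample size (unlike, for example, Hudson's or Weir and Cockerham's estimators), so it is an assumption rather than the paper's method, and the function names are hypothetical.

```python
from typing import Sequence

def fst_from_frequencies(freqs: Sequence[float]) -> float:
    """Nei-style F_ST for one biallelic variant, equal weight per population."""
    if not freqs:
        raise ValueError("need at least one population")
    p_bar = sum(freqs) / len(freqs)
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled population
    if h_t == 0:
        return 0.0  # monomorphic everywhere: no differentiation
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-population heterozygosity
    return (h_t - h_s) / h_t

def is_highly_differentiated(freqs: Sequence[float], cutoff: float = 0.15) -> bool:
    """Flag a variant whose F_ST exceeds the cut-off used in this table."""
    return fst_from_frequencies(freqs) > cutoff
```

For example, alternate allele frequencies of 0.1 and 0.5 in two groups give F_ST of about 0.19, which would exceed the 0.15 cut-off.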

About this article

Cite this article

Sun, K.Y., Bai, X., Chen, S. et al. A deep catalogue of protein-coding variation in 983,578 individuals. Nature (2024). https://doi.org/10.1038/s41586-024-07556-0

  • DOI: https://doi.org/10.1038/s41586-024-07556-0
