New research reveals that human gene annotations, the foundational maps scientists use to understand disease, are missing thousands of transcripts found in non-European populations, with profound implications for precision medicine worldwide.
The human genome has been called the “book of life,” but a new study published in Nature Communications suggests that significant chapters have been written exclusively by and for people of European descent. Researchers analyzing blood cells from 43 individuals across eight global populations discovered more than 41,000 RNA transcripts missing from standard gene reference maps, the molecular blueprints that scientists rely upon daily to understand disease, develop drugs, and advance personalized medicine.
The findings illuminate a troubling reality that has plagued biomedical research for decades: the overwhelming majority of genomic data comes from people of European ancestry, even though this group represents only about 16 percent of the world’s population. As of recent analyses, approximately 86 percent of participants in genome-wide association studies have been of European descent, with African populations contributing just over 1 percent and Indigenous American populations represented at near-zero levels.
“Gene maps are used by scientists every day, but we’ve been leaving out huge sections of the world’s population. This study shows, for the first time, how much we’ve been missing,” said Pau Clavell-Revelles of the Barcelona Supercomputing Center and Centre for Genomic Regulation, the study’s first author.

Uncovering the Hidden Transcriptome
The research team employed long-read RNA sequencing technology, which captures entire RNA molecules from end to end rather than fragmenting them. By analyzing blood cells from Yoruban, Luhya, Mbuti, Han Chinese, Indian Telugu, Peruvian, Ashkenazi Jewish, and Utah European individuals, the scientists generated over 800 million full-length reads, one of the largest datasets of its kind.
The findings were striking. Non-European samples consistently contained far more previously unseen transcripts than European samples. Among the 2,267 population-specific transcripts identified, the pattern was unmistakable: for European groups, most were already catalogued. For non-European groups, most were entirely new to science. Additionally, 773 transcripts appear to originate from genomic regions scientists had not previously recognized as containing genes, and 41 percent of novel transcripts from protein-coding genes are predicted to encode distinct protein variants never before catalogued.

The Real-World Consequences
The implications extend far beyond academic interest. When a transcript is missing from reference gene maps, any genetic variant affecting that transcript becomes essentially invisible to researchers. This means scientists may miss critical information about why certain diseases manifest differently across populations, or occur more frequently in some groups than others.
“Most gene sequencing so far has come from European individuals, so the reference catalogues we rely on may be missing genes or transcripts that exist only in non-European populations,” explained Dr. Roderic Guigó, senior co-author and researcher at the Centre for Genomic Regulation. “If a genetic variant falls in one of these missing genes, we assume it has no biological effect. In some cases, that assumption may simply be wrong.”
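To make the "invisible variant" problem concrete, here is a minimal sketch of how a variant lookup depends on the annotation it is checked against. It is an illustration only, not the study's pipeline: the `Transcript` class, the `annotate` function, the transcript IDs, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """A toy transcript record (hypothetical, not a real annotation format)."""
    transcript_id: str
    chrom: str
    exons: tuple  # (start, end) pairs, half-open: start <= pos < end


def annotate(chrom, pos, annotation):
    """Return the IDs of transcripts with an exon covering the position."""
    hits = []
    for transcript in annotation:
        if transcript.chrom != chrom:
            continue
        if any(start <= pos < end for start, end in transcript.exons):
            hits.append(transcript.transcript_id)
    return hits


# Made-up reference annotation containing one known transcript.
reference = [Transcript("known_tx_1", "chr1", ((100, 200), (300, 400)))]

# Made-up transcript that is absent from the reference annotation.
novel = Transcript("novel_tx_1", "chr1", ((500, 600),))
expanded = reference + [novel]

# A made-up variant that falls inside the novel transcript's exon.
variant = ("chr1", 550)

print(annotate(*variant, reference))  # no overlap: the variant looks inert
print(annotate(*variant, expanded))   # the same variant now hits a transcript
```

Against the reference list alone the variant matches nothing, so a downstream tool would treat it as having no transcript-level effect. Adding the missing transcript is all it takes for the same variant to register.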
The study found that several newly identified transcripts occur in genes already linked to conditions showing ancestry-related differences, including systemic lupus erythematosus, rheumatoid arthritis, Takayasu’s arteritis, asthma, and cholesterol-related traits. Lupus, for example, is known to disproportionately affect African American and Hispanic women compared to European Americans, yet the genetic underpinnings of these disparities remain incompletely understood.
“We found that many novel ancestry-biased transcripts occur in genes already associated with autoimmune diseases, asthma and metabolic traits,” said Dr. Marta Melé, senior co-author and Group Leader at the Barcelona Supercomputing Center. She emphasized that these findings don’t necessarily mean the transcripts cause disease differences, but rather that they reveal genetic signals previously hidden from view.

A Systemic Problem Across Biomedical Research
The ancestry bias in gene annotations mirrors broader inequity across genomic science. Clinical trials have historically enrolled predominantly white participants, pharmacogenomic databases remain skewed toward European populations, and polygenic risk scores (tools increasingly used to predict individual disease risk) perform far less accurately in non-European populations.
This matters because precision medicine promises to tailor treatments based on individual genetic profiles. When reference data excludes most of humanity’s genetic variation, that promise rings hollow for billions of people. The roots of this problem trace to the Human Genome Project itself, which relied heavily on samples from European-ancestry individuals. Reference genomes and subsequent annotation projects inherited this limitation, while research institutions concentrated in Europe and North America perpetuated the cycle.

The Path Forward
The researchers propose building a human “pantranscriptome”: a comprehensive catalogue of all RNA molecules across all tissues, life stages, and populations. While the Human Pangenome Project has begun expanding DNA diversity, this study demonstrates that DNA alone provides only part of the picture.
“The pangenome tells us about DNA diversity, essentially, it’s a book of instructions. The pantranscriptome tells us which words are important in each cell of our body. Both are essential for fully understanding human diversity,” Dr. Melé explained.
The scale is formidable. This single study generated more than 10 terabytes of data requiring the MareNostrum 5 supercomputer. “We firmly believe that any findings that we made here are really just the tip of the iceberg,” said Dr. Fairlie Reese, a postdoctoral researcher at the Barcelona Supercomputing Center.
The authors acknowledge their work has limitations: many populations remain unrepresented, and only one tissue was examined. Yet the message is clear: continuing to build genomic science on foundations that exclude most of humanity ensures the benefits of genetic medicine will remain inequitably distributed.
“We hope our study serves as a foundation and an invitation for the global scientific community to contribute data, methods, and diverse populations,” Dr. Melé concluded. “Only through a collective effort will we achieve a truly complete and inclusive map of human biology, which is essential for fair and accurate genomic medicine.”

Sources
- Clavell-Revelles, P., Reese, F., Carbonell-Sala, S., et al. “Long-read transcriptomics of a diverse human cohort reveals ancestry bias in gene annotation.” Nature Communications 16, 10194 (2025). https://doi.org/10.1038/s41467-025-66096-x
- EurekAlert! Press Release: “Human gene maps are biased towards European ancestries.” Barcelona Supercomputing Center (BSC-CNS), December 3, 2025. https://www.eurekalert.org/news-releases/1107625
- Fatumo, S., et al. “A roadmap to increase diversity in genomic studies.” Nature Medicine 28, 243–250 (2022).
- Sirugo, G., Williams, S.M., & Tishkoff, S.A. “The Missing Diversity in Human Genetic Studies.” Cell 177, 26–31 (2019).
- Corpas, M., et al. “Bridging genomics’ greatest challenge: The diversity gap.” Cell Genomics 4, 100485 (2024).
- National Human Genome Research Institute. “Genomic databases weakened by lack of non-European populations.” March 6, 2019. https://www.genome.gov/news/news-release/Genomic-databases-weakened-by-lack-of-non-European-populations




