Why genetic research is too white

By Ella Shalom

This article was originally published in The Oxford Scientist Michaelmas Term 2021 edition, Change.

Ever since the completion of the Human Genome Project in 2003, genetic research has advanced by leaps and bounds. International teams of researchers worked for over a decade to map the DNA sequence of the entire human genome, and that effort helped make DNA sequencing fast, cheap and effective. As a result, our understanding of genetic diseases and our approach to treating them are also evolving rapidly.

However, despite genomics being at the forefront of science, it has one major flaw: genetic research is too white. Genome-wide association studies (GWAS) aim to identify gene variants associated with disease by sequencing genomes. Of the individuals involved in these studies, 78% were of European descent, 10% Asian, 2% African and 1% Hispanic. This disparity in ethnic representation impedes our ability to understand the genetic makeup of the global population. It also perpetuates healthcare inequality, because the findings are translated into clinical practices and public health policies that could be incomplete, and therefore dangerous, for populations underrepresented in genome studies.

A prime example of this is precision medicine, a practice which uses an individual’s DNA sequence to hand-pick the correct treatments for any genetic conditions that they may be suffering from or at risk of developing. An individual’s whole genome sequence can be compared to reference genomes to find potential causes of undiagnosed diseases. However, when the reference data focus on a subset of the global population (in this case, those of European descent), they capture only some genetic variants and do not reflect the full range of human diversity. This means that researchers will miss important genetic determinants of disease in the DNA sequences of other ethnicities.

This can be seen in cystic fibrosis, a condition which is common in Europe, with one case in every 2,000–3,000 births, but much rarer in African Americans (one case per 17,000 births). On top of this, genome sequencing has shown that the deletion of a phenylalanine (an amino acid) at position 508 in the CFTR gene accounts for approximately 70% of cystic fibrosis cases in those of European descent. However, in African Americans the same allele accounts for only 29% of cases. This means that any genetic screening for cystic fibrosis that uses the ΔF508 allele as its marker will severely underdiagnose the condition in African Americans, preventing these patients from getting the treatment they require.
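The size of that gap is easy to work out from the figures above. The short Python sketch below is only an illustration: it assumes, as a simplification, that a screen looking for ΔF508 alone flags exactly the share of cases attributed to that allele, and the names `DF508_SHARE` and `missed_share` are made up for this example.

```python
from fractions import Fraction

# Share of cystic fibrosis cases attributed to the ΔF508 allele, as quoted
# in the article. Assumption: a ΔF508-only screen flags exactly this share.
DF508_SHARE = {
    "European descent": Fraction(70, 100),
    "African American": Fraction(29, 100),
}


def missed_share(flagged_share):
    """Share of cases that a ΔF508-only screen would fail to flag."""
    return 1 - flagged_share


for group, share in DF508_SHARE.items():
    missed = missed_share(share)
    print(f"{group}: {float(share):.0%} flagged, {float(missed):.0%} missed")
```

Under that assumption, roughly 30% of cases would be missed in those of European descent, against 71% in African Americans.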

A lack of diverse genomic data can also lead to incorrect conclusions about the causes of genetic diseases. When a patient’s DNA sequence is compared to reference genomes, rare variants can be identified as potential causes of the condition. For example, DNA sequences of patients suffering from acute myeloid leukaemia were compared to genomes from normal skin cells. This resulted in the identification of somatic mutations in the cancer genomes that could then be used as targets for cancer treatments. While this is an effective technique, it loses accuracy when the reference genomes are not ethnically diverse. A study in 2016 suggested that a genetic variant that was rare in white European DNA databases was indicative of a high risk of developing heart disease. However, when this information was later cross-referenced with genomes from individuals of African descent, it was discovered that this variant was common among them and was in fact benign. Had a more diverse range of genomes been available when the first comparison was made, this misclassification could have been avoided and diagnoses would have been more accurate.
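The logic behind that error can be sketched in a few lines of Python. Everything here is hypothetical and chosen only to illustrate the idea: the 1% rarity cutoff, the allele frequencies and the function name `looks_rare` are assumptions, not values from the study.

```python
# Hypothetical cutoff: a variant counts as "rare" below 1% frequency.
RARE_THRESHOLD = 0.01


def looks_rare(freq_by_population, threshold=RARE_THRESHOLD):
    """Return True only if the variant is rare in every population
    that the reference database actually contains."""
    return all(freq < threshold for freq in freq_by_population.values())


# Made-up frequencies for a single variant, for illustration only.
european_only_reference = {"European": 0.001}
diverse_reference = {"European": 0.001, "African": 0.10}

print(looks_rare(european_only_reference))  # rare, so it looks suspicious
print(looks_rare(diverse_reference))        # common in one group, so not rare
```

With only European genomes in the reference, the variant passes the rarity check and looks like a plausible cause of disease. Once genomes of African descent are added, the same check fails, which is the signal that was missing in the original comparison.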

These inconsistencies have likely arisen because, in the early days of genetic research, funding for genome sequencing was available mainly in predominantly white, European countries. It is clear from each of these cases that we need more diversity in our genomic databases. Change is already under way, with one example being the National Institutes of Health’s ‘All of Us’ research program, which aims to build a ‘diverse database that can inform thousands of studies on a variety of health conditions’. Unfortunately, achieving total diversity is no easy feat and will require a global effort from scientists at the forefront of genomic research. Nevertheless, any move towards better representation of underrepresented ethnicities in genomic data is a step in the right direction, and one that will push the boundaries of genetic research and medical breakthroughs.