Genetic medicine urgently needs better phenotype data

The core of genetic medicine involves finding causal links between changes in genes (genetic variation) and changes in human characteristics (phenotypic variation). Limited genetic knowledge has been the major bottleneck in genetic medicine for most of human history. But we are now hurtling towards an era in which our limited knowledge about phenotypes may become the major constraint to delivering genetic medicine.


Baseline genetic variation data has been transformative

As we described in a previous post, one of the most valuable outcomes of cheap, fast DNA sequencing has been the collection and curation of genetic data from many thousands of people, including healthy individuals. This has allowed us to observe and describe the nature and range of normal human genetic variation.

We have learnt many important and unexpected lessons from these endeavours. In particular, we now know that rare variation is much more common than we anticipated and is mostly harmless. This new knowledge has allowed us to reclassify as benign many genetic variants that were thought to cause disease.


Baseline phenotype data would also be transformative

Genetic syndromes often include distinctive combinations of rare morphological, developmental, biochemical or other pathological phenotypes. When genetic testing was laborious and expensive, it was available only to individuals with all the features of a syndrome. Now genetic testing is used much more liberally, often in individuals with only one phenotype abnormality.

But how common are these rare phenotypes individually in the general population? Are they collectively more common than we appreciate, just as was shown for rare genetic variation? Does every one of us have at least one mild ‘abnormality’? If so, we may well be making inappropriate causal links between genetic variants and phenotypes.

To reduce misdiagnoses and to improve the accuracy of genetic medicine we need to stop making assumptions about phenotypes. Instead we should observe and catalogue the spectrum of human phenotypic variation, as we are doing for human genetic variation.

How can this be achieved?


Defining phenotypes

The first requirement is to define phenotypes using consistent, universal terms. There has already been extensive and impressive attention to this complex task. For example, the Human Phenotype Ontology is developing a logical, standardised vocabulary for phenotypes that is being widely adopted and already includes over 11,000 terms.


Standardising phenotype measurements

Next, we need to use the phenotype terms consistently. For some phenotypes this is straightforward; it would be easy to achieve consensus on how to decide who has an absent tibia. But for many phenotype abnormalities the decision as to who has them is rather subjective. For example, macroglossia, the medical term for an unusually big tongue, does not have an objective, measurable definition. It is down to the examining doctor to decide if the tongue is big enough to warrant being described as macroglossia.

This subjectivity leads to inconsistency in how terms are used by different doctors, and even by the same doctor in different patients, potentially leading to bias and misdiagnoses. It also makes it very difficult to accurately determine the baseline macroglossia frequency in the general population.


How common are phenotype abnormalities?

It is vital to know how common a phenotype variant is in the general population if we are to determine how likely it is to be causally linked to genetic variants. Everyone with a phenotype abnormality will also have genetic variants, because everyone has genetic variants. To prove a causal link between a genetic variant and a phenotype abnormality, we must show that the two occur together more often than would be expected by chance. We can only do this if we know the baseline population frequencies of both the variant and the phenotype. Then we can work out how often they would be expected to co-occur by chance and see whether they co-occur more often than that, which would suggest the genetic variant has a role in causing the phenotype abnormality.
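The reasoning above can be sketched as a small calculation. This is only an illustration under simplifying assumptions: the variant and the phenotype are treated as independent under the "chance" model, everyone in the cohort is assessed for both, and the frequencies and counts used are made-up numbers, not real data. The function names are hypothetical.

```python
from math import exp, lgamma, log


def expected_cooccurrence(n_people, variant_freq, phenotype_freq):
    """Expected number of people with both variant and phenotype
    if the two were unrelated (independent)."""
    return n_people * variant_freq * phenotype_freq


def binomial_pmf(n, i, p):
    """P(X = i) for X ~ Binomial(n, p), computed in log space so that
    large n does not overflow. Requires 0 < p < 1."""
    log_pmf = (
        lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        + i * log(p) + (n - i) * log(1 - p)
    )
    return exp(log_pmf)


def prob_at_least(n, k, p):
    """P(X >= k): chance of seeing k or more co-occurrences by luck alone."""
    below = sum(binomial_pmf(n, i, p) for i in range(k))
    return max(0.0, 1.0 - below)


# Illustrative (invented) numbers: 10,000 people, variant in 1%, phenotype in 2%.
n_people = 10_000
variant_freq = 0.01
phenotype_freq = 0.02
observed_both = 12  # hypothetical observed count

expected = expected_cooccurrence(n_people, variant_freq, phenotype_freq)
chance = prob_at_least(n_people, observed_both, variant_freq * phenotype_freq)

print(f"expected by chance: {expected:.1f}, observed: {observed_both}")
print(f"probability of at least this many by chance: {chance:.2g}")
```

If the baseline phenotype frequency fed into this calculation is wrong, the expected count is wrong too, and so is any conclusion drawn from comparing it with the observed count. A real analysis would also need to account for testing many variants at once and for differences between populations.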

Unfortunately, we currently do not have robust information about the frequency of most phenotype abnormalities. We either have no data, or we have crude estimates from small studies.


A Phenome Aggregation Database

One potential approach could be to follow the example of gnomAD, the Genome Aggregation Database. gnomAD pulls together genetic data from more than 120,000 individuals from diverse research studies. It is a wonderful example of research repurposing, giving great added value to the original studies. It has also proved an invaluable reference of human genetic variation, used by hundreds of clinical testing laboratories and research groups. A similar opportunistic aggregation of phenotype data from research studies could be very helpful. Of course, one would have to be mindful that most of the data will not be from healthy individuals, but it can still be useful. For example, a heart failure study might have chest images from thousands of patients that could be used to help determine how many people are born with an extra rib.
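As a rough illustration of what such aggregation might look like, the sketch below pools counts of one phenotype across several studies and attaches an approximate 95% confidence interval (a Wilson score interval) to the pooled frequency. The study names and counts are invented for illustration, the function names are hypothetical, and simple pooling assumes the studies measured the phenotype in the same way and ignores differences between their participants.

```python
from math import sqrt


def wilson_interval(cases, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = cases / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def pool_studies(studies):
    """Sum cases and people examined across studies, then estimate frequency."""
    cases = sum(s["cases"] for s in studies)
    total = sum(s["examined"] for s in studies)
    return cases, total, wilson_interval(cases, total)


# Invented counts of "extra rib" seen on chest images in three hypothetical studies.
studies = [
    {"name": "heart_failure_cohort", "examined": 4000, "cases": 28},
    {"name": "trauma_imaging_study", "examined": 1500, "cases": 9},
    {"name": "small_clinic_series", "examined": 120, "cases": 1},
]

cases, total, (low, high) = pool_studies(studies)
print(f"pooled frequency: {cases / total:.4f} ({low:.4f} to {high:.4f})")

# The small study alone gives a much wider, cruder interval.
small_low, small_high = wilson_interval(1, 120)
print(f"small study alone: {1 / 120:.4f} ({small_low:.4f} to {small_high:.4f})")
```

The comparison at the end shows why aggregation matters: the pooled estimate has a far narrower interval than the estimate from the small study on its own.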


A Human Phenome Project

Another approach could be to follow the example of population genome projects such as the Human Genome Project and the 1000 Genomes Project. Fifteen years ago Freimer and Sabatti wrote an excellent commentary on the need for a Human Phenome Project. Most of it is still relevant today. Most of it still needs to be done today, in part because the phenome is more complex, subjective and changeable than the genome, which adds many challenges. But some things should be achievable. Every week 1.75 million babies are born across the world. It is not inconceivable for us to plan a global phenotyping week in which we try to get data on the spectrum of physical variation at birth from as many babies as possible.


Phenotypes in the spotlight

Probably the most important thing we need to do now is to dedicate focus, vision, brainpower and money to phenotyping. Without attention to phenotypes we can never fully realise the promise of genetic medicine. It will be like riding a bicycle with one flat tyre. We will go much slower than we need to and may never reach our destination.