It is currently recommended that symbols denoting genes should be italicised. Last week we explained how this convention arose. The primary drivers were to make text easier to read and to help readers know when the symbol referred to a gene (in italics) and when it referred to a protein (in normal font).
But does the recommendation achieve this? Is it making things easier or harder? Is it reducing confusion or adding confusion? Should it stay or should it go?
I think we should make the recommendation discretionary or simply stop using it. My reasons are below.
Most people don’t know about it
People who publish scientific research about genes and proteins know about the convention and the underlying rationale. In 2002, when the HUGO Gene Nomenclature Committee (HGNC) recommendations were last updated, this group made up the majority of gene symbol users. Today it is a small minority. We now use gene symbols very widely in many different types of communication, written and read by many different types of consumer. Very few people know why the symbols are sometimes italicised, even amongst scientists.
The need is redundant
In a scientific publication about genes and the proteins they encode there can be a repeated need to distinguish between when the symbol is being used to denote the gene and when it is being used to denote the protein. Italicising the symbol when it refers to the gene was a quick, convenient way to achieve this.
Today, the need is only relevant to a small fraction of information on genes. And because readership has become so much broader, and most readers are not aware of the convention, people are usually explicit anyway. They write ‘the BRCA1 gene’ not simply BRCA1. Or often ‘the BRCA1 gene’, which could be considered correct or incorrect, depending on what you are trying to convey and your interpretation of the rules!
Other molecules are not handled well
These symbols are not solely used to denote genes and proteins. They are also used to tag other molecules such as mRNA and cDNA. I have lost count of the number of times I have been asked whether or not these should be italicised. It is an area of confusion.
While researching this blog I found out that I have not been following the rules! It turns out that the relevant prefix should be added, e.g. (mRNA)BRCA1 or (cDNA)BRCA1, though this has not been widely adopted. In fact, if the rules are applied strictly, gDNA should be added ahead of the italicised gene symbol in almost all instances where just the italicised symbol is currently being used. So it would be (gDNA)BRCA1 instead of BRCA1, to show that it is genomic DNA and not one of the other molecules.
I have never done this. I never even knew I should! Although I have not been following the HGNC recommendations I have been following accepted convention. So I am not planning to change.
It is used inconsistently
HGNC allows some exceptions to the rule, for example in long lists of genes. But inconsistency also arises in many other ways. There are areas of ambiguity. For example, should umbrella terms for groups of genes be italicised? BRCA is often used to encompass BRCA1 and BRCA2, but there is no BRCA gene. Should BRCA be italicised? This isn’t addressed in any recommendations that I am aware of, so it’s down to the writer to decide. My interpretation is that BRCA should not be italicised, and many agree. But just as many do italicise BRCA.
Unintentional inconsistency is very common, often simply through forgetting to italicise the symbol somewhere in the text. Or because the italics are lost during the journal’s typesetting process. When we receive the proofs of a scientific paper we specifically, laboriously, double-check every single symbol to ensure that the convention has been appropriately followed.
Unfortunately, because the default text has a specific meaning (protein not gene) this inconsistency can lead to errors in meaning. If I forgot to italicise the symbol in a sentence, I would, inadvertently, be referring to the protein not the gene in that sentence. This could make the sentence scientifically incorrect, or ambiguous. In practice, though, it is usually obvious whether one is talking about the gene or protein simply from the context. We don’t rely on the presence or absence of italics to decide this. So inconsistencies don’t have much impact. On the one hand this is good, but on the other hand it does undermine the value of the convention.
It is unsuited to digital media
Italics are an easy, effective, stable way to highlight and discriminate text in print media. This is not true in digital media. Adding italics is not effortless and they are often lost or misrepresented during the dynamic transfer of data that underpins how we access digital media. Many websites, including the HGNC website, don’t use italics for gene symbols. Databases and ‘big data’ applications almost never italicise gene symbols.
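To make the point concrete, here is a minimal Python sketch of how the gene/protein distinction can disappear when marked-up text is exported as plain text. The example sentence, the use of HTML <i> tags, and the function names strip_tags and make_explicit are all assumptions for illustration; this is not how any particular journal, website or database handles gene symbols.

```python
import re

# Hypothetical sentence: HTML italics mark the gene, plain text marks the protein.
marked_up = "Variants in <i>BRCA1</i> can disrupt BRCA1 function."


def strip_tags(html: str) -> str:
    """Remove every HTML tag, as a simple plain-text export might."""
    return re.sub(r"<[^>]+>", "", html)


def make_explicit(html: str) -> str:
    """Rewrite each italicised symbol as 'the <symbol> gene', then strip tags."""
    explicit_html = re.sub(r"<i>([^<]+)</i>", r"the \1 gene", html)
    return strip_tags(explicit_html)


# After stripping, both mentions look identical and the italic cue is gone.
plain = strip_tags(marked_up)

# Spelling out 'gene' keeps the distinction without relying on formatting.
explicit = make_explicit(marked_up)

print(plain)
print(explicit)
```

In this sketch the stripped sentence no longer shows which BRCA1 is the gene, whereas the explicit version carries the meaning in the words themselves, so it survives any loss of formatting.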
Time to change?
I believe we should change the convention such that the default option is not to use italics to denote whether one is referring to the gene or the protein. Instead we should simply be explicit about it, i.e. write ‘the BRCA1 gene’ or ‘the BRCA1 protein’. This is already widely adopted in many circumstances, so in many ways this change would be formalising normal, accepted practice.
But what about scientific publications? This is where the convention started and where it is still likely to have some utility. But do the upsides still outweigh the downsides? Clearly it needs more deliberation, discussion and input than I have provided here.
One suggestion is that we could make use of the convention discretionary. In many genetic medicine papers the protein is not mentioned, so there is no need to use italics to inform the reader that you are talking about the gene. In those papers the convention brings all the downsides with none of the upsides. In the papers where it is useful we could simply state that we are using italics as a shorthand for ‘the BRCA1 gene’ the first time it is mentioned, as we do for acronyms.
What do other people think?