Communicating sequence quality with the QSM


Today we introduce the Quality Sequencing Minimum (QSM). The QSM is a simple, convenient, comprehensive shorthand to describe the sequence quality of a genetic test.

It is essential that our genetic tests are as accurate as possible, and we must also know if, and how, tests are vulnerable to error. The accuracy of the DNA code (i.e. the sequence) generated in a genetic test is critical to the accuracy of the test as a whole. So sequence quality determines the overall quality of genetic tests.

But what do you know about sequence quality? How often have you seen information about sequence quality? Probably your answers to these questions are ‘nothing’ and ‘never’.

 

Introducing the Quality Sequencing Minimum (QSM)

Everyone agrees that we must pay meticulous attention to sequence quality in genetic testing. And we must be clear and transparent when we communicate about sequence quality. Unfortunately, this has been difficult to deliver. The TGMI has now developed the Quality Sequencing Minimum (QSM) to help achieve this. We published an open-access paper describing the QSM in Wellcome Open Research this week.

 

What happens in NGS?

To understand what determines sequence quality, we first need to understand how NGS (next-generation sequencing) works. NGS starts by cutting up the target you want to sequence into millions of overlapping DNA fragments called ‘reads’. The power of NGS is that the DNA code of these fragments can be worked out all at the same time, in parallel. This is why sequencing is so much faster than before, as we described in a short video.

After the DNA sequence (i.e. the A, C, G and T bases) of the fragments has been determined, the fragments have to be stitched back together again. We do this by finding where each fragment sequence is located in the human genome reference sequence. This process is called ‘mapping’. Mapping shows you what you have sequenced and which bases are different from the reference genome sequence. Most of these differences are harmless. But a critical difference in a critical gene can cause medical problems. Genetic testing is about finding these medically relevant differences.

 

What determines sequence quality?

We use three measures to evaluate sequence quality in NGS data: depth of coverage, base quality and mapping quality.

Depth of coverage describes how many fragments (reads) overlap a given position. A disease-causing mutation can be missed if its position is not covered well enough. You need to know how good the coverage is to know how complete the test is.

Base quality describes how likely it is that the A, C, G or T called at a given position is the correct base at that position. Different sequence callers do this in different ways, but all provide a prediction of how likely it is that they have called a particular base correctly.

Mapping quality describes how likely it is that the fragment has been placed in the correct part of the genome. Mapping is very accurate for most fragments, because the sequence of most fragments is unique to one place in the genome. But there are stretches of DNA that appear in multiple parts of the genome. It can be very difficult to know where to map fragments if they are made up of such sequence.
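To make these quality numbers more concrete, here is a minimal sketch of how a quality score can be turned into a probability of error. It assumes the scores are on the Phred scale, where a score Q corresponds to an error probability of 10^(-Q/10). That scale is widely used for both base and mapping quality, but it is an assumption here rather than something stated in this post, and the function name is hypothetical.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a quality score to an error probability.

    Assumption: the score is Phred-scaled, i.e. Q = -10 * log10(P_error).
    """
    return 10 ** (-q / 10)


# Print the error probability and accuracy implied by a few scores.
for q in (10, 20, 30):
    p_error = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p_error:.3f}, accuracy {1 - p_error:.1%}")
```

Under this assumption, a score of 10 means a 1 in 10 chance that the call is wrong, and a score of 20 means a 1 in 100 chance.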

 

The Quality Sequencing Minimum has three parts

The QSM describes the minimum values for depth of coverage (C), base quality (B) and mapping quality (M) that a lab has chosen for a particular test. For the cancer panel test we do in TGLclinical we set the QSM as C50_B10(85)_M20(95).

This means that, for the test to meet the QSM, every base must have a depth of coverage of ≥50 reads, with a base quality of ≥10 in at least 85% of reads and a mapping quality of ≥20 in at least 95% of reads. The rationale for these choices is explained in the QSM paper.
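The shorthand is regular enough to be read by a program. The sketch below shows one way to parse a QSM such as C50_B10(85)_M20(95) and check a single position against it. It is an illustration only, not the code our pipeline or CoverView uses: the class, function and field names are hypothetical, each read is simplified to a (base quality, mapping quality) pair, and the percentages are assumed to be taken over all reads covering the position.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class QSM:
    min_coverage: int          # C: minimum number of reads
    min_base_quality: int      # B: minimum base quality
    base_quality_pct: int      # % of reads that must reach B
    min_mapping_quality: int   # M: minimum mapping quality
    mapping_quality_pct: int   # % of reads that must reach M


QSM_PATTERN = re.compile(r"C(\d+)_B(\d+)\((\d+)\)_M(\d+)\((\d+)\)")


def parse_qsm(text: str) -> QSM:
    """Parse a shorthand string such as 'C50_B10(85)_M20(95)'."""
    match = QSM_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a QSM string: {text!r}")
    return QSM(*(int(group) for group in match.groups()))


def position_meets_qsm(qsm: QSM, reads) -> bool:
    """Check one position; reads is a list of (base_quality, mapping_quality)."""
    depth = len(reads)
    if depth == 0 or depth < qsm.min_coverage:
        return False
    good_base = sum(1 for bq, _ in reads if bq >= qsm.min_base_quality)
    good_map = sum(1 for _, mq in reads if mq >= qsm.min_mapping_quality)
    # Compare as integers to avoid floating-point rounding at the threshold.
    return (100 * good_base >= qsm.base_quality_pct * depth
            and 100 * good_map >= qsm.mapping_quality_pct * depth)


qsm = parse_qsm("C50_B10(85)_M20(95)")
print(position_meets_qsm(qsm, [(30, 60)] * 60))  # well covered, good reads
print(position_meets_qsm(qsm, [(30, 60)] * 40))  # too few reads
```

A test as a whole would then meet the QSM only if every position in it passes this check.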

Our pipeline automatically calculates whether each base meets the QSM. We have used this information to evaluate and improve the quality of testing in several ways. These improvements mean that 97% of the panel passes routinely in every test we do. Using the QSM also lets us automatically flag regions that don’t pass. We can then home in on these and decide what to do.
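As a rough illustration of how per-base results can be turned into a pass rate and a list of flagged regions, here is a small sketch. It is not our pipeline's code. It assumes the per-base results arrive as (position, passed) pairs sorted by position on a single chromosome, and the function names are hypothetical.

```python
def percent_passing(flags) -> float:
    """Percentage of positions that met the QSM."""
    flags = list(flags)
    if not flags:
        raise ValueError("no positions supplied")
    return 100 * sum(1 for _, passed in flags if passed) / len(flags)


def failing_regions(flags):
    """Group consecutive failing positions into (start, end) regions, inclusive."""
    regions = []
    start = end = None
    for position, passed in flags:
        if passed:
            continue
        if start is not None and position == end + 1:
            end = position  # extend the current failing run
        else:
            if start is not None:
                regions.append((start, end))  # close the previous run
            start = end = position
    if start is not None:
        regions.append((start, end))
    return regions


# Hypothetical per-base results for five positions.
flags = [(101, True), (102, False), (103, False), (104, True), (105, False)]
print(percent_passing(flags))
print(failing_regions(flags))
```

The flagged regions are the ones a lab would then review, for example by testing them with another method.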

The information needed to do this is routinely output during NGS analyses; it just needs to be harnessed for the QSM. We developed a simple, freely available tool called CoverView to do this. We published an open-access paper about CoverView in Wellcome Open Research alongside the QSM paper.

 

Different tests will have different QSMs

The QSM will be different for different tests in different labs. This is because the way sequence is generated and analysed affects each of the QSM’s component parts.

For example, our depth of coverage minimum is 50 reads. This makes sense because every base is typically covered by ~1000 reads in our cancer panel test. So a position covered by fewer than 50 reads is, relatively, under-performing. In many exome tests the typical coverage is 100 reads per position, so 50 reads would not be such a cause for concern. This relative performance is important, as it is in many other types of tests. If a child scores 50 on a test where the average score is 1000, you would judge their performance differently than if the average score were 100.
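The arithmetic behind this comparison is simple, and the short sketch below spells it out using the figures from the paragraph above. The function name is hypothetical.

```python
def relative_coverage(depth: float, typical_depth: float) -> float:
    """Coverage at one position as a fraction of the test's typical coverage."""
    if typical_depth <= 0:
        raise ValueError("typical_depth must be positive")
    return depth / typical_depth


# 50 reads in a panel test that typically gets ~1000 reads per position.
print(f"{relative_coverage(50, 1000):.0%}")
# 50 reads in an exome test that typically gets 100 reads per position.
print(f"{relative_coverage(50, 100):.0%}")
```

The same 50 reads is 5% of typical coverage in the panel test but 50% of typical coverage in the exome test, which is why a single coverage minimum cannot suit every test.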

 

The QSM allows transparent communication about sequence quality

One of the most valuable uses of the QSM is that we can use it to communicate about the quality of individual tests, on the actual report. For most genetic testing it is currently almost impossible to find out how complete an individual test is. Using the QSM has allowed us to change this, in a simple, transparent, yet still comprehensive shorthand.

With the QSM we can now put ‘This test met QSM as C50_B10(85)_M20(95)’ for a test that fully passed. Or we can put ‘This test met QSM C50_B10(85)_M20(95) {except PMS2 exons 12-15}’ if part of the test did not pass. PMS2 is a gene with some sequence that maps to more than one place in the genome, which makes it hard to test by NGS. If the person receiving the report thinks a PMS2 mutation might have caused their patient’s problems, we can test the tricky bit of the gene with another method. But for most of our tests, PMS2 is not relevant to the person’s condition, so they don’t need this additional testing.
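A report statement like this can be generated automatically once the failing regions are known. The sketch below simply reproduces the two wordings quoted above. The function name is hypothetical, and how region names such as ‘PMS2 exons 12-15’ are produced is left out.

```python
def qsm_report_line(qsm: str, exceptions=()) -> str:
    """Build the report statement for a test.

    qsm is the shorthand string, e.g. 'C50_B10(85)_M20(95)'.
    exceptions lists the regions that did not meet the QSM, if any.
    """
    exceptions = list(exceptions)
    if not exceptions:
        return f"This test met QSM as {qsm}"
    listed = ", ".join(exceptions)
    # Doubled braces produce literal { and } in an f-string.
    return f"This test met QSM {qsm} {{except {listed}}}"


print(qsm_report_line("C50_B10(85)_M20(95)"))
print(qsm_report_line("C50_B10(85)_M20(95)", ["PMS2 exons 12-15"]))
```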

Most importantly, the QSM provides transparency. It lets everyone know what has been tested, to what quality. People can then make appropriately informed decisions about the genetic test results.

 

Adoption of QSM will foster standardisation of sequence quality

We hope, and believe, that widespread adoption of the QSM will foster transparent communication about sequence quality. Labs will also be able to use the QSM to demonstrate their performance and adherence to quality standards. Over time, this will develop into guidance and guidelines about acceptable QSMs for genetic testing. This should deliver much-needed practical solutions for ensuring and demonstrating that all genetic testing meets required standards.