A benchmarking resource for exon CNV detection

The TGMI are pleased to announce a new resource, the ICR96 exon CNV validation series, which we hope will be useful for the evaluation and benchmarking of exon CNV detection tools. We have published a paper detailing the resource in Wellcome Open Research.


Detection of exon CNVs is tricky

In most gene testing labs, small DNA changes are detected quickly, accurately and efficiently by automated DNA sequence analyses.

Unfortunately, larger changes, such as deletions and duplications of exons (the building blocks of genes), are much trickier to detect by automated DNA sequence analyses. Most techniques that attempt this have suffered from too many false negatives (missing real mutations) and too many false positives (detecting mutations that aren’t real).


Robust exon CNV detection tools are emerging 

Fortunately, analytical tools are emerging that detect exon CNVs in sequencing data with the robustness needed for medical tests. Several of the TGMI team are working in this area, evaluating, optimising and validating tools and processes.

In a previous post we described DECoN, the tool we use in our accredited clinical testing lab, TGLclinical, to detect exon CNVs in cancer predisposition genes. DECoN has transformed our testing pipeline, making it much more time- and cost-efficient. We have had a lot of interest in using DECoN since we released it six months ago.


Evaluating and comparing different tools is essential

A pervasive issue that surfaced during our work and discussions on exon CNVs was the difficulty of validating and comparing the effectiveness of different tools and pipelines. Typically, individual laboratories use internal experimental data and/or simulated data to evaluate the performance of their exon CNV detection pipeline. Whilst it is excellent that these approaches are taken, they have limitations, particularly when trying to make comparisons between labs and/or methods.

It became clear that an external ‘benchmarking’ dataset would enhance exon CNV testing evaluations, both within individual laboratories and between laboratories.


The ICR96 exon CNV validation series

We put together the ICR96 exon CNV validation series to provide an external dataset that could fulfil these needs. The dataset includes data from 96 samples: 66 samples contain at least one validated CNV and 30 samples have validated negative results for exon CNVs in 26 genes. Both high-quality sequencing data from a targeted assay (the TruSight Cancer Panel) and data from a completely different method, Multiplex Ligation-dependent Probe Amplification (MLPA), are provided.

The ability to evaluate the performance of a method using an independent dataset with pre-determined positive and negative results is hugely beneficial. Such a dataset can provide invaluable information on sensitivity, specificity and false detection rates.
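
As a rough illustration of how such figures could be derived, here is a minimal Python sketch that compares a pipeline's calls with a validated truth set. It is not part of the ICR96 resource or of DECoN. The (sample, gene) data layout, the function names and the toy sample and gene labels are all assumptions made for this example, and "false detection rate" is read here as the fraction of reported calls that are false positives.

```python
import math
from typing import NamedTuple, Set, Tuple

# Hypothetical layout: one entry per (sample ID, gene label) pair.
Pair = Tuple[str, str]


class Metrics(NamedTuple):
    sensitivity: float
    specificity: float
    false_discovery_rate: float


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def evaluate_calls(truth: Set[Pair], calls: Set[Pair], tested: Set[Pair]) -> Metrics:
    """Compare pipeline calls with validated results.

    truth  -- pairs with a validated exon CNV
    calls  -- pairs the pipeline reported as carrying an exon CNV
    tested -- every pair that has a validated result (positive or negative)
    """
    # Ignore anything outside the validated sample/gene pairs.
    truth = truth & tested
    calls = calls & tested

    tp = len(truth & calls)           # real CNVs that were detected
    fn = len(truth - calls)           # real CNVs that were missed
    fp = len(calls - truth)           # calls on validated negatives
    tn = len(tested - truth - calls)  # validated negatives left uncalled

    return Metrics(
        sensitivity=ratio(tp, tp + fn),
        specificity=ratio(tn, tn + fp),
        false_discovery_rate=ratio(fp, tp + fp),
    )


# Toy data only: three made-up samples and two made-up gene labels.
tested = {(s, g) for s in ("S1", "S2", "S3") for g in ("GENE_A", "GENE_B")}
truth = {("S1", "GENE_A"), ("S2", "GENE_B")}
calls = {("S1", "GENE_A"), ("S3", "GENE_A")}

metrics = evaluate_calls(truth, calls, tested)
print(metrics)
```

This sketch scores at the level of sample and gene. A real evaluation might instead score per exon or per sample, and that choice changes the resulting numbers, so it should be reported alongside them.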

Moreover, if all labs doing exon CNV detection show how well their pipeline performs on the ICR96 exon CNV dataset, we could start to build up comparative data. This would stimulate knowledge exchange between labs, leading to method improvements. And it would serve as the foundation for defining the minimum and optimal standards for exon CNV detection in medical testing. These standards are sorely needed.


The ICR96 exon CNV validation series is readily available

We have published information on the ICR96 exon CNV validation series on the Wellcome Open Research platform, which makes it immediately available to anyone. We have deposited the sequencing files in the European Genome-phenome Archive (EGA) under the accession number EGAS00001002428.

A simple and quick access process allows any legitimate clinical, research or commercial enterprise to gain access to the data. Instructions are available either at EGA or at www.icr.ac.uk/icr96.


Image by frankieleon on Flickr. CC-BY