In a previous post I discussed the importance of clinical software verification. Verification ensures that our software does what we think it does. However, this is not sufficient to prove that software is fit for purpose. We must also check that the software does the right thing. This is called software validation.
What is validation?
In software development, validation means testing that software actually fulfils the needs of its users. In the case of medical genetics software this may mean, for example, that it correctly determines the presence or absence of specific genetic mutations in a patient’s DNA. Validation is by nature an interdisciplinary activity, and involves collaboration between domain experts, software developers, users, and test regulators.
How is validation performed?
Validation of medical software will usually involve computational, experimental and practical validation steps. The computational validation often consists of a suite of automated tests that perform quantitative comparisons between the output of the software and experimental data. An important requirement for this type of validation is access to a ‘truth set’: a trusted source of proven true data to which the results of the software can be compared. For example, if the software aims to detect specific types of variants, a truth set that includes samples in which those variant types are definitely present and samples in which they are definitely absent is essential for validating software performance.
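To make the idea of a quantitative comparison against a truth set concrete, here is a minimal sketch in Python. It assumes a deliberately simplified setup that is not taken from any particular tool: each sample is reduced to a single present/absent answer for one variant type, and the names `compare_to_truth`, `truth_set` and `software_calls` are hypothetical. A real validation suite would compare far richer data, but the principle is the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationResult:
    """Counts from comparing software calls against a truth set."""
    true_positives: int
    false_positives: int
    true_negatives: int
    false_negatives: int

    @property
    def sensitivity(self) -> float:
        # Fraction of truly present variants that the software detected.
        positives = self.true_positives + self.false_negatives
        return self.true_positives / positives if positives else float("nan")

    @property
    def specificity(self) -> float:
        # Fraction of truly absent variants that the software left uncalled.
        negatives = self.true_negatives + self.false_positives
        return self.true_negatives / negatives if negatives else float("nan")


def compare_to_truth(truth: dict[str, bool], calls: dict[str, bool]) -> ValidationResult:
    """Compare per-sample calls (True = variant present) with the truth set."""
    missing = truth.keys() - calls.keys()
    if missing:
        raise ValueError(f"No software call for samples: {sorted(missing)}")

    tp = fp = tn = fn = 0
    for sample_id, variant_present in truth.items():
        called = calls[sample_id]
        if variant_present and called:
            tp += 1
        elif variant_present and not called:
            fn += 1
        elif not variant_present and called:
            fp += 1
        else:
            tn += 1
    return ValidationResult(tp, fp, tn, fn)


# Hypothetical data: two samples known to carry the variant, two known not to.
truth_set = {"S1": True, "S2": True, "S3": False, "S4": False}
software_calls = {"S1": True, "S2": False, "S3": False, "S4": False}

result = compare_to_truth(truth_set, software_calls)
print(f"sensitivity={result.sensitivity:.2f}, specificity={result.specificity:.2f}")
```

Note that the truth set needs both kinds of sample: without known positives the sensitivity cannot be measured, and without known negatives the specificity cannot.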
Validation is an ongoing activity
Software is often under active development to deal with bugs, new feature requests etc. If the code is changed, the software must be re-validated to prove that it still does the right thing and to prove that there were no unintended effects of the change. To make this possible, it is a good idea to automate as much of the validation process as possible. Mature software often has fairly sophisticated code dedicated to validation. This requires time to develop, and it is important to allow for this when planning development of medical software.
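One simple way to automate re-validation is to store the metrics that were accepted for the last validated version and check every new version against them. The sketch below is only an illustration of that idea: the function `find_regressions`, the metric names and the numbers are all hypothetical, and a real process would define its own acceptance criteria.

```python
def find_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    tolerance: float = 0.0,
) -> list[str]:
    """Return a message for every metric that is missing or worse than the baseline."""
    regressions = []
    for metric, accepted in baseline.items():
        if metric not in current:
            regressions.append(f"{metric}: missing from current run")
        elif current[metric] < accepted - tolerance:
            regressions.append(
                f"{metric}: {current[metric]:.3f} is below accepted {accepted:.3f}"
            )
    return regressions


# Hypothetical numbers: metrics accepted at the last validation,
# and metrics measured after a code change.
baseline_metrics = {"sensitivity": 0.99, "specificity": 0.98}
current_metrics = {"sensitivity": 0.97, "specificity": 0.98}

problems = find_regressions(baseline_metrics, current_metrics)
for problem in problems:
    print("Re-validation failed:", problem)
```

Run as part of an automated test suite, a check like this flags an unintended effect of a change before the new version is released.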
Finding validation data is hard
Having access to a ‘trusted source of truth’ for validation is essential. But it is not always easy to get hold of datasets that can be used for validation of medical software. Large laboratories may be able to generate their own truth data but this can rarely be fully comprehensive and has significant production and storage costs. Generating suitable data for validation in-house is beyond the capacity and resources of most laboratories.
Data sharing is key
Every year vast quantities of DNA sequencing data and associated metadata are produced in labs all over the world. Only a fraction of this data is made available for others to use. To make matters worse, data generated by research scientists is rarely in a form that facilitates its use for automated validation experiments. This is not surprising; it is difficult for researchers to anticipate the needs of people working on software validation, and there is little incentive to produce data useful for validation.
TGMI validation datasets
The TGMI has a strong focus on generating validation datasets that are freely available for use by the genetic medicine community. These include the ICR96 exon CNV validation series, which we recently made available. As well as assisting labs in their internal validation processes, the use of external validation data provides an opportunity to standardise and benchmark software performance across labs. As genetic testing becomes a global endeavour, the ability to compare software validation processes and outputs is becoming increasingly important.