Twenty years ago, Massive Attack’s masterpiece, Mezzanine, was top of the UK album charts. It was groundbreaking from the beginning, being the first album available for digital download. Now, in a beautiful science-art collaboration, Mezzanine is to be stored on DNA, in one of the largest DNA storage projects ever undertaken.
How does DNA storage of data work?
We typically store digital data in binary, a two-symbol code usually written as 1 and 0. DNA is a quaternary, four-letter code written as A, C, G and T. So storing data in DNA involves converting the two-symbol binary code into the four-letter DNA code, with each DNA letter able to stand for two binary digits (bits).
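To make the conversion concrete, here is a minimal sketch in Python of the simplest possible mapping, where each pair of bits becomes one DNA letter. The particular assignment (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration. Real schemes, including the one used for Mezzanine, add error correction and other safeguards that are not shown here.

```python
# Illustrative only: a naive two-bits-per-base mapping with no error correction.
# The bit-pair-to-letter assignment below is an arbitrary choice for this sketch.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of DNA letters, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of DNA letters back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes four DNA letters, so the two-byte message "Hi" becomes an eight-letter strand.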
For Mezzanine, ETH Zurich will first compress the album to a more manageable 15 megabytes using the Opus audio file format (similar to MP3). They will write this information on 920,000 short DNA strands. Next the DNA strands are poured into 5000 tiny, too-small-to-see, glass spheres which will be kept in a tiny vial of water.
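As a rough back-of-envelope check on those figures, the sketch below works out how much of the album each strand carries on average. It assumes a megabyte is 10^6 bytes and uses the theoretical maximum of two bits per DNA letter. Real strands are likely to be longer than this minimum, because practical schemes also need room for things like indexing and error correction.

```python
# Back-of-envelope arithmetic using the figures quoted in the article.
album_bytes = 15 * 10**6   # 15 megabytes, assuming 1 MB = 10^6 bytes
strands = 920_000          # number of short DNA strands

bytes_per_strand = album_bytes / strands

# At the theoretical maximum of 2 bits per DNA letter (an assumption),
# this is the minimum number of letters needed per strand for the music alone.
min_bases_per_strand = bytes_per_strand * 8 / 2

print(f"{bytes_per_strand:.1f} bytes of music per strand on average")
print(f"at least {min_bases_per_strand:.0f} DNA letters per strand")
```

So each strand holds, on average, only about 16 bytes of the album, which is why so many strands are needed.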
Today, it is still difficult to write information to DNA and to read information from DNA. This makes DNA storage of data very expensive and very time-consuming. For example, it will take about two months to write Mezzanine to DNA, and the cost will be thousands of pounds. But the advantages of using DNA to store data are so compelling it seems certain that the hurdles will be overcome.
So what does DNA storage do better than all the rest?
Data in DNA storage is ultracompact
The amount of information stored per unit of DNA blows other storage media out of the water. We know this from our own bodies. Every cell in the human body contains 6 billion letters of DNA providing instructions for making our bodies work. And we have roughly 30 trillion cells, each containing this massive dataset, all reproduced from the first cell. All the DNA in our body lined up end-to-end would stretch to the sun and back over 100 times.
The technique used by ETH Zurich can store about 25 petabytes of data per gram of DNA. So it could store the whole of YouTube (estimated to be about an exabyte) in a mere 40g of DNA. That’s about the size of a golf ball instead of two hundred million DVDs! A different DNA storage method, called DNA Fountain, can store 214 petabytes per gram. The total information in the world could, theoretically, be stored in a single room using this method.
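The YouTube comparison is simple arithmetic, sketched below in Python. The storage densities and the one-exabyte estimate come from the figures above. The DVD capacity of 4.7 gigabytes (a standard single-layer disc) is an assumption added for this sketch, as are the decimal definitions of petabyte and exabyte.

```python
# Units, assuming decimal (SI) definitions.
GB = 10**9
PB = 10**15
EB = 10**18

youtube_bytes = 1 * EB            # estimate quoted in the article
eth_density = 25 * PB             # bytes per gram, ETH Zurich technique
fountain_density = 214 * PB       # bytes per gram, DNA Fountain
dvd_bytes = 4.7 * GB              # assumption: single-layer DVD capacity

grams_eth = youtube_bytes / eth_density
grams_fountain = youtube_bytes / fountain_density
dvds_needed = youtube_bytes / dvd_bytes

print(f"ETH Zurich technique: {grams_eth:.0f} g of DNA")
print(f"DNA Fountain: {grams_fountain:.1f} g of DNA")
print(f"DVDs: about {dvds_needed / 10**6:.0f} million")
```

On these assumptions, an exabyte needs 40g of DNA with the ETH Zurich technique, under 5g with DNA Fountain, and a little over two hundred million DVDs.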
Data in DNA storage is incredibly durable
One of the most appealing features of DNA storage is its durability. Most data storage is not very durable, as anyone who buys it knows. Apparently, music CDs should last about 30 years. Mine don’t, but I probably don’t look after them well enough. By contrast, data stored in DNA can last for hundreds of thousands of years. This makes DNA storage attractive for archiving and as a data backup solution. For example, using DNA to store a copy of the 170 million items held by the British Library could become a viable proposition once the cost comes down.
Data in DNA storage can be retrieved over and over again
Retrieving data quickly and accurately is a key requirement for any data storage solution. Here, DNA again performs incredibly well. The developers of DNA Fountain showed that data could be retrieved, perfectly, over a quadrillion times – that’s 1,000,000,000,000,000 times. And every copy was a faithful recreation of the original information stored.
DNA extraction is a bottleneck in genetic medicine
These DNA storage innovations are exciting and have huge potential. But in genetic medicine we urgently need innovation in the DNA processes at the heart of genetic testing. Getting DNA out of cells, so that we can decode it, is the first step in genetic testing. This DNA extraction process has remained more or less the same for the last 20 years. In our lab we have large, expensive, capricious machines with which we extract DNA from 24 samples per day at a cost of about £15 per sample. Scaling up DNA extraction is often a bottleneck for labs when they try to increase the speed and capacity of their testing.
DNA measurements are surprisingly variable
Knowing how much DNA is present in a sample is also an essential early step in genetic testing. This DNA quantification process is not precise, and DNA measurements can vary from lab to lab. We always re-measure the amount of DNA in any sample we receive from elsewhere, using our internal quantification process. Often our measurement of the DNA concentration is not the same as that of the lab that sent us the sample. All genetic testing laboratories experience this.
This re-measurement wastes time, money and DNA. We should be able to make DNA extraction and measurement faster, more scalable, more consistent and more accurate. Why hasn’t this happened? Because little attention has been paid to these problems, perhaps because innovations in DNA processes will not lead to scientific prestige, headlines or collaborations with cultural icons.
Image from ETH Zurich, created using images from Massive Attack / Colourbox / Caroline Laville.