As the quantity of digital information on platforms like Facebook continues to grow, the capacity to store it may become inadequate. Even though these files may not be accessed regularly, people still want them to be available whenever they need them.
The innate purpose of DNA is to store genetic information, but UW researchers are broadening its range to include digital data.
“DNA is a fantastic information storage molecule that nature has evolved over a long time, and there’s nothing that says that we can’t store other kinds of data,” said Luis Ceze, a UW associate professor of computer science and engineering (CSE). “Now, we know how to manipulate data very well. The biotech industry has been developing tools to write and read DNA, so we are using it to store digital data.”
Ceze is working with doctoral students and professors from CSE and bioengineering, as well as Microsoft researchers, to store digital images and text in DNA, which is denser than flash memory, the storage medium used in USB drives.
DNA is a viable storage medium, particularly because it is not expected to become obsolete and because it can store information at high density.
Ceze envisions this technique being used to archive large amounts of rarely accessed data.
“Think about if you wanted to back up the Library of Congress,” Ceze said. “You want to back it up and keep it safe, but you probably need other kinds of technology that make it fast to access. Then you tap into the archive when you need it. Most of the data in the world isn’t accessed all that frequently, but we need it there.”
To test the ability to encode, store, and retrieve digital data in DNA, researchers created algorithms to map the digital data to DNA sequences, ordered synthesized DNA molecules containing those sequences, and then sequenced and decoded them. The decoded data was then compared to the original file.
“There’s a maximum length of a molecule you can make, and it’s about 200 [to] 300 nucleotides, which is big by DNA standards, but small by computer science standards,” said James Bornholt, a UW Ph.D. student in CSE and co-author of the paper. “One of the things we had to do was break data down into smaller chunks and put each one in a separate molecule and have a pool of molecules, that combined, contain a file.”
Afterward, the molecules are sequenced and decoded back into the original file. This kind of archiving is known as cold storage, which is the retention of infrequently accessed data.
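The encode, split, and decode round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual coding scheme: the two-bits-per-base mapping, the 200-nucleotide payload, the 8-nucleotide index field, and all function names are hypothetical choices made for the example.

```python
import random

# Assumption: a naive mapping of two bits to one base. This is NOT the
# researchers' encoding, only a simple stand-in for illustration.
BASES = "ACGT"
PAYLOAD_NT = 200  # assumed payload per strand, within the 200-300 nt limit
INDEX_NT = 8      # assumed width of the chunk-index field, in nucleotides


def bytes_to_dna(data: bytes) -> str:
    """Map each byte to four bases, two bits per base, high bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_dna; len(seq) must be a multiple of four."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        value = 0
        for base in seq[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def int_to_dna(n: int, width: int) -> str:
    """Write n as a fixed-width base-4 number with A, C, G, T as digits."""
    digits = []
    for _ in range(width):
        digits.append(BASES[n % 4])
        n //= 4
    return "".join(reversed(digits))


def dna_to_int(seq: str) -> int:
    """Invert int_to_dna."""
    value = 0
    for base in seq:
        value = value * 4 + BASES.index(base)
    return value


def encode_file(data: bytes) -> list[str]:
    """Split a file into short strands, each prefixed with its chunk index."""
    seq = bytes_to_dna(data)
    starts = range(0, len(seq), PAYLOAD_NT)
    if len(starts) > 4 ** INDEX_NT:
        raise ValueError("file too large for the assumed index width")
    return [
        int_to_dna(i, INDEX_NT) + seq[start:start + PAYLOAD_NT]
        for i, start in enumerate(starts)
    ]


def decode_pool(pool: list[str]) -> bytes:
    """Reorder strands by their index field and rebuild the original bytes."""
    ordered = sorted(pool, key=lambda strand: dna_to_int(strand[:INDEX_NT]))
    return dna_to_bytes("".join(strand[INDEX_NT:] for strand in ordered))


message = b"Hello, DNA storage! " * 10
pool = encode_file(message)
random.shuffle(pool)  # a pool of molecules has no inherent order
recovered = decode_pool(pool)
```

Because the strands float in a pool with no fixed order, each one carries its own index so the file can be reassembled. A real system would also need error correction, since synthesis and sequencing introduce errors that this sketch ignores.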
One challenge with information storage and retrieval is random access, which is the ability to retrieve one piece of data from a larger set without reading all of it. The researchers used the polymerase chain reaction (PCR), a molecular biology technique in which DNA fragments are placed in a liquid medium with unique primers that identify regions of interest. PCR amplifies only the sequences that match the primers so they can be retrieved, and it has also worked for digital data.
For each digital file encoded in DNA, researchers placed identifying sequences on each end of its molecules to distinguish them from the rest of the data. Although this method is effective for picking out the regions to be retrieved, synthesizing large amounts of DNA can be a barrier.
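The idea of tagging each file with identifying sequences can be illustrated with a small Python sketch. It is a software analogy only: real PCR chemically amplifies the matching strands rather than filtering a list, and the primer sequences, file names, and function names below are all hypothetical.

```python
# Assumption: one made-up (forward, reverse) primer pair per stored file.
PRIMERS = {
    "cat.jpg": ("ACGTACGTACGTACGTACGT", "TGCATGCATGCATGCATGCA"),
    "poem.txt": ("GATTACAGATTACAGATTAC", "CCGGAATTCCGGAATTCCGG"),
}


def tag_strand(payload: str, file_id: str) -> str:
    """Attach the file's identifying sequences to both ends of a payload."""
    forward, reverse = PRIMERS[file_id]
    return forward + payload + reverse


def select_file(pool: list[str], file_id: str) -> list[str]:
    """Return the payloads of strands whose ends match the file's primers.

    This stands in for PCR, which would amplify only the matching strands.
    """
    forward, reverse = PRIMERS[file_id]
    return [
        strand[len(forward):len(strand) - len(reverse)]
        for strand in pool
        if len(strand) >= len(forward) + len(reverse)
        and strand.startswith(forward)
        and strand.endswith(reverse)
    ]


# A mixed pool holding strands from two different files.
pool = [
    tag_strand("AAAACCCC", "cat.jpg"),
    tag_strand("GGGGTTTT", "poem.txt"),
    tag_strand("CCCCAAAA", "cat.jpg"),
]
cat_payloads = select_file(pool, "cat.jpg")
```

Only the strands tagged for the requested file come back, so the rest of the pool never has to be read, which is the point of random access.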
Bornholt presented the research at the ACM International Conference on Architectural Support for Programming Languages and Operating Systems, a venue for conversations about interdisciplinary CSE research. Attendees expressed excitement about the long-term applications of this research and the use of biology in the context of computer science.
“Computer scientists are starting to realize that we need radical solutions to some of our problems, and this is one radical solution,” Bornholt said. “It may not work out in the long-term, but that’s fine as long as we’re opening doors and trying to make existing technology work better.”
Future research will focus on developing coding schemes and integrating the process of information storage and retrieval into a viable system. Ultimately, the researchers are excited about the current and future collaborations of the project.
“I think this is a really exciting collaboration that could only happen at UW,” Bornholt said. “I think we’re one of the world leaders in this area.”
Reach Wellness Editor Aleenah Ansari at firstname.lastname@example.org. Twitter: @aleenah_ansari