Ars spoke Tuesday with Dr. Ant Rowstron, a principal researcher at Microsoft Research in Cambridge, UK, about an innovative cold storage project called Silica. Silica aims to replace both tape and optical archival discs as the media of choice for large-scale, (very) long-duration cold storage. Microsoft Research is partnering with film giant Warner Bros., which is directly interested in reducing costs and increasing reliability in its own cold storage programs.
The medium in question is a block of high-purity glass, which has voxels etched into it with femtosecond lasers. Each voxel stores multiple bits in two properties, retardance and angle, which may in turn be read using microscope imaging and polarized light. Voxels may be written 100 or more layers deep in a 2mm-deep piece of glass, by focusing the laser to the desired depth within the block itself.
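To make the "multiple bits per voxel" idea concrete, here is a minimal Python sketch of how bytes could be packed into (retardance, angle) pairs. The level counts are assumptions chosen for illustration (2 retardance levels and 4 angles, giving 3 bits per voxel); the article does not state the values Silica actually uses, and the function names are hypothetical.

```python
# Hypothetical level counts; Silica's real values are not given in the article.
RETARDANCE_LEVELS = 2
ANGLE_LEVELS = 4
# 2 * 4 = 8 distinguishable states per voxel, which carries 3 bits.
BITS_PER_VOXEL = (RETARDANCE_LEVELS * ANGLE_LEVELS).bit_length() - 1


def bytes_to_voxels(data: bytes) -> list[tuple[int, int]]:
    """Pack bytes into (retardance_index, angle_index) pairs."""
    bits = "".join(f"{b:08b}" for b in data)
    # Pad with zero bits so the stream divides evenly into voxels.
    bits += "0" * (-len(bits) % BITS_PER_VOXEL)
    voxels = []
    for i in range(0, len(bits), BITS_PER_VOXEL):
        symbol = int(bits[i:i + BITS_PER_VOXEL], 2)
        # Split the symbol into its two physical properties.
        voxels.append(divmod(symbol, ANGLE_LEVELS))
    return voxels


def voxels_to_bytes(voxels: list[tuple[int, int]], n_bytes: int) -> bytes:
    """Invert bytes_to_voxels, dropping any padding bits."""
    bits = "".join(
        f"{r * ANGLE_LEVELS + a:0{BITS_PER_VOXEL}b}" for r, a in voxels
    )
    return bytes(int(bits[i:i + 8], 2) for i in range(0, n_bytes * 8, 8))


voxels = bytes_to_voxels(b"Silica")
print(len(voxels), voxels_to_bytes(voxels, 6))
```

With these assumed levels, the six bytes of "Silica" occupy 16 voxels. More distinguishable levels per property would mean more bits per voxel, at the cost of a harder read.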
The speed of both reads and writes to Silica currently leaves something to be desired: it took approximately a week to etch Superman’s roughly 76GB of data last year, and Rowstron estimates that, with the advances made since, re-reading the data would take about three days. The technology is still in its infancy, of course, and large decreases in the time required for both writing and reading are expected. Rowstron says he still doesn’t expect anyone to try to play Superman directly from its Silica record, but that’s not what it’s intended for.
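For a sense of scale, those figures can be turned into rough sustained throughput. This is back-of-the-envelope arithmetic only: it assumes "76GB" means decimal gigabytes and that the week and the three days were continuous, neither of which the article confirms.

```python
SECONDS_PER_DAY = 86_400
DATA_BYTES = 76e9  # "roughly 76GB", taken as decimal gigabytes (assumption)


def throughput_mb_per_s(num_bytes: float, days: float) -> float:
    """Average rate in MB/s, assuming the job ran nonstop for `days`."""
    return num_bytes / (days * SECONDS_PER_DAY) / 1e6


write_rate = throughput_mb_per_s(DATA_BYTES, 7)  # about a week to etch
read_rate = throughput_mb_per_s(DATA_BYTES, 3)   # about three days to re-read

print(f"write ~{write_rate:.3f} MB/s, read ~{read_rate:.3f} MB/s")
# prints: write ~0.126 MB/s, read ~0.293 MB/s
```

Under those assumptions, both rates are a small fraction of a megabyte per second, which is why nobody expects to stream a film straight from the glass.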
True long-term archival of data is a very expensive proposition. When I was in college, I took part in a research project for my university’s archival department—the department needed a database application to index and track its digital media collections, in large part so that it would be able to budget for and carry out archive renewal operations within expected refresh cycles. Its older analog audio and video tape recordings needed to be digitized, and its optical CD and DVD recordings needed to be read, checked for integrity, and burned onto new media before the original discs delaminated.
This archival refresh cycle rapidly becomes daunting at any significant scale. If you assume a collection of 10,000 CDs and a team of three or four undergrads with CD-RW drives and a huge stack of discs, you’re looking at more than a year of full-time work to refresh them. (Warner Bros., which has a rather higher budget than the rare collections department at my alma mater’s library, migrates its own digital archival data on a strict three-year cycle.)
Making matters worse, the lifespan of burned CDs is frequently very short: they can easily begin failing after only five years, so they should at the very least be tested that often, if not refreshed “whether they need it or not.” It is possible to extend optical discs’ lifetime significantly by storing them at 5°C/41°F and 30% relative humidity, but this adds a significant extra expense to storage and maintenance.
This is the problem Project Silica is poised to solve. Although it’s currently fairly slow to read or write, Silica’s medium, which is nothing more or less than high-purity glass, shares none of the failure modes of tape, optical disc, or even paper. A Project Silica glass block is not a compound medium; there’s no plastic outer covering to wear off as there is with CD, DVD, or Blu-ray, and there’s no magnetic medium to physically lose from the surface of a tape or hard disk.
Silica is expected to survive for thousands of years in nearly any temperature, humidity, and chemical environment—it’s literally just glass, and the physical and chemical properties of glass are extremely well understood. We can only guess at the properties of more complex manufactured materials (tape, disk, and so forth) using accelerated aging techniques, but glass artifacts thousands of years old are readily available for study.
Beyond the medium’s already impressive resistance to degradation (it can basically be expected to shrug off anything short of being hit with a hammer), the project uses a real filesystem with forward error correction to further insure stored data against corruption or loss. Metadata such as title, index, and date can also be etched into the surface of each Project Silica block in human-readable text.
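The article does not say which error-correcting code Silica uses. As a general illustration of the idea behind forward error correction, the Python sketch below uses a single XOR parity block, about the simplest erasure code there is: redundancy is written alongside the data so that one lost block can be rebuilt from the survivors without going back to the source. The function names and sample blocks are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(
        reduce(lambda x, y: x ^ y, column) for column in zip(*blocks)
    )


def add_parity(data_blocks: list[bytes]) -> list[bytes]:
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stored: list[bytes], missing_index: int) -> bytes:
    """Rebuild the one block known to be lost from all the others."""
    survivors = [b for i, b in enumerate(stored) if i != missing_index]
    return xor_blocks(survivors)


data = [b"glas", b"s is", b" old"]
stored = add_parity(data)

# Pretend block 1 was unreadable and rebuild it from the rest.
print(recover(stored, 1))
```

A single parity block tolerates only one known-bad block per group. Production archival systems generally use stronger codes that survive multiple errors, but the principle of spending extra capacity to avoid ever needing a second copy is the same.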
As fans of The Mote in God’s Eye already know, one remaining question must be answered for any data storage method expected to last for millennia: what happens when the technological and cultural context surrounding a storage medium collapses? Silica also addresses this problem, by using initial “ground truth” tracks. The team is using machine learning algorithms to re-read Silica’s data, and in the event of the loss of those trained algorithms, fresh algorithms can train very rapidly on the “ground truth” tracks, which teach them how to interpret the rest of the data.
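The "ground truth" idea can be sketched in a few lines of Python. This is a toy stand-in, not Silica's actual machine learning pipeline: it assumes four symbols, simulates noisy (retardance, angle) readings with made-up response values, and uses a nearest-centroid classifier in place of whatever model the team really trains. It also ignores the fact that a real angle is periodic.

```python
import random

# Hypothetical "true" optical response for each of four symbols.
TRUE_RESPONSE = {0: (0.2, 0.1), 1: (0.2, 0.9), 2: (0.8, 0.1), 3: (0.8, 0.9)}
rng = random.Random(0)  # seeded so the simulation is repeatable


def measure(symbol: int) -> tuple[float, float]:
    """Simulate one noisy microscope reading of a voxel."""
    r, a = TRUE_RESPONSE[symbol]
    return (r + rng.gauss(0, 0.03), a + rng.gauss(0, 0.03))


def train_centroids(measurements, known_symbols):
    """Average the readings observed for each known symbol."""
    sums = {}
    for (r, a), s in zip(measurements, known_symbols):
        total_r, total_a, n = sums.get(s, (0.0, 0.0, 0))
        sums[s] = (total_r + r, total_a + a, n + 1)
    return {s: (tr / n, ta / n) for s, (tr, ta, n) in sums.items()}


def decode(measurement, centroids) -> int:
    """Pick the symbol whose learned centroid is closest to the reading."""
    r, a = measurement
    return min(
        centroids,
        key=lambda s: (r - centroids[s][0]) ** 2 + (a - centroids[s][1]) ** 2,
    )


# The ground-truth track: its contents are known in advance.
ground_truth = [i % 4 for i in range(200)]
centroids = train_centroids([measure(s) for s in ground_truth], ground_truth)

# The payload: unknown data decoded using only what the track taught us.
payload = [rng.randrange(4) for _ in range(1000)]
decoded = [decode(measure(s), centroids) for s in payload]
print(decoded == payload)
```

The decoder starts with no knowledge of how symbols look under the microscope. Everything it knows comes from the track whose contents are agreed on ahead of time, which is the property that lets a reader be rebuilt from the glass itself.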
We couldn’t resist quizzing Dr. Rowstron on how Niven and Pournelle’s Moties might have coped with Project Silica after one of their cyclical collapses of civilization. Even without machine-learning techniques, researchers who discovered data stored using Project Silica’s techniques should be able to figure out how to read its data armed with nothing more than good microscopes and sources of polarized light, following the same path of discovery along the “ground truth” tracks that artificial neural networks would.