What do “omics,” light sources, and sensors on the ocean floor have in common? Big data. Truly gargantuan data. Advanced genomic sequencers like the Illumina HiSeq X Ten will generate terabytes of data per day. Scientists at Berkeley Lab’s Advanced Light Source (ALS) talk about light source experiments that could generate over 100 terabytes a day, and some say over 300 terabytes a day. NOAA’s former CIO, Joe Klimavicz, predicted several years ago that by 2020, NOAA could be collecting as much as 800 terabytes of data a day and storing over 100 exabytes. This is the scale the U.S. government is talking about. Bigger than big data.

What is an Exabyte anyway?

An exabyte is such a big number that it loses its meaning. Even the usual comparisons to the Library of Congress don’t do it justice, so let’s set it to music. At good quality, a four-minute song is 8.4 megabytes. An exabyte would hold 119 billion songs, which would take about 906,000 years to play back to back. Still hard to wrap your brain around? Consider the amount of IP traffic moving around the world every month, the amount of sensor data that will be captured every month, or long-running simulations of climate, geology, or nuclear weapons safety. These are all examples of multi-petabyte or exabyte scale. We took a crack at highlighting a few more here:

Breakthroughs will be possible, but storage breakthroughs will be required

Scientists will be able to sequence entire populations to detect rare diseases, identify genetic risk factors, and better pinpoint responses to drug dosages. They will be able to better probe the electronic structure of matter to understand complex biological processes or develop new battery technologies. They’ll be doubling (and presumably eventually tripling) the resolution of climate models to better predict storms and weather further into the future.
Breakthroughs will be possible with the data that is created, but more storage and analytic breakthroughs will be needed to handle the data explosion. Massive amounts of flexible storage will be needed at the initial stages, as well as throughout the lifecycle of data processing and analysis, to work across multiple applications and cost/performance targets. Processing will have to keep getting denser, less expensive, and less power hungry. And analytic techniques and platforms must continue to advance.

Here at Scality, we’re seeing bigger and bigger glimpses of the future today, such as our recent 200+ petabyte win at Los Alamos National Laboratory. We’re excited to share what we learn! This week, we’re at the International Supercomputing Conference talking about petabyte-scale storage and GovData. Hope to see you there, here in the comments section below, or on Twitter. Onward!
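
P.S. For anyone who wants to check the song math from earlier, here is a minimal sketch of the arithmetic. It assumes decimal (SI) units, so a megabyte is 10^6 bytes and an exabyte is 10^18 bytes, and it ignores leap years; those are our assumptions for illustration, and the variable names are made up for this example.

```python
# Assumption: decimal (SI) units, which is what reproduces the figures in the post.
MEGABYTE = 10**6
EXABYTE = 10**18

SONG_BYTES = 8.4 * MEGABYTE        # a four-minute song at good quality, per the post
SONG_MINUTES = 4
MINUTES_PER_YEAR = 60 * 24 * 365   # assumption: 365-day years, no leap days

# How many songs fit in one exabyte, and how long they would take to play back to back.
songs = EXABYTE / SONG_BYTES
years = songs * SONG_MINUTES / MINUTES_PER_YEAR

print(f"{songs / 1e9:.0f} billion songs")
print(f"{years:,.0f} years of music")
```

Run as written, this lands on roughly 119 billion songs and a little under 906,000 years of continuous playback. Using binary units instead (2^20 and 2^60 bytes) would shift both numbers upward somewhat, which is why the unit assumption is worth stating.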