Big data, big pollution? by Ignasi Lirio


The economy keeps going with no regard for exhausting the Earth’s resources or preserving human lives. Those who do care are trying to counter this trend: they work to reduce contamination by recycling part of our enormous garbage output, or they design devices that capture the excess carbon dioxide responsible for climate change.

But cleaning up the air and the oceans, or reusing everything in our waste, is a tremendous task that is far from seeing any victory. Mankind produces trash faster than it recycles it. Carbon dioxide levels in the atmosphere keep rising relentlessly while ocean life is quietly dying.

That is ‘classic’ pollution, the kind generated by the matter-energy pair. But what about information? Are we polluting the world with information waste?

These days it seems everyone is excited about the Big Data and Internet of Things hype, the new magic that will boost the economy and create lots of wealth. It could turn out that way, but are we really aware of how much data this will generate? Do we have the resources to manage it? Are we really prepared to sustain such an information revolution?

Just some facts (or data!):

  1. According to Seagate’s vice president Mark Whitby, the total amount of digital data generated in 2013 was about 3.5 zettabytes
    (1 zettabyte = 1 000 000 000 000 GB)
    and estimates are that the world will generate about 44 zettabytes a year by 2020.
  2. Another estimate says that next year 25% of the world’s population will be using smartphones. If these devices have, on average, about 32 GB of storage capacity, that gives 7 billion x 0.25 x 32 GB = 56 exabytes in smartphones alone!
    (1 Exabyte = 1 000 000 000 GB)
  3. As of 2013, the World Wide Web is estimated to have reached 4 zettabytes
  4. For instance, it is estimated that Facebook servers process around 2.4 billion pieces of content and 750TB of data every day.
  5. Scientists calculate that mankind stored about 300 exabytes from 1986 to 2011. So about 1/5 of all the information stored in those 25 years would fit in today’s smartphones alone.
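
The arithmetic behind these figures is easy to check. Here is a minimal sketch in Python, assuming decimal (SI) units and taking the population, smartphone share and storage figures from the list above; the function and variable names are purely illustrative.

```python
# Decimal (SI) storage units, expressed in bytes.
GB = 10**9
EB = 10**18
ZB = 10**21


def smartphone_capacity_bytes(population, smartphone_share, gb_per_phone):
    """Total storage across all smartphones, in bytes."""
    phones = population * smartphone_share
    return phones * gb_per_phone * GB


# Figures quoted in the list above (estimates, not measurements).
world_population = 7_000_000_000
total = smartphone_capacity_bytes(world_population, 0.25, 32)

# Estimate of everything mankind stored from 1986 to 2011.
stored_1986_2011 = 300 * EB

print(f"1 ZB = {ZB // GB:,} GB")
print(f"1 EB = {EB // GB:,} GB")
print(f"Smartphone storage: {total / EB:.0f} EB")
print(f"Share of the 1986-2011 total: {total / stored_1986_2011:.0%}")
```

Changing the share or the per-device capacity shows how quickly the total moves: every doubling of average phone storage doubles the exabyte count.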


If the unavoidable trend is that everyone will keep a replica of the information they generate in the so-called Cloud, what should we expect? Will we have the capacity to build the data centers to support all this? And if we do, would it be profitable to invest in them?

And I am only talking about basic file storage: pictures, e-mail messages, etc. The vast majority of internet traffic in the near future won’t be generated by people but by machines talking to each other.

Believe it or not but this picture is from the biggest Data Center (source: Data Center Knowledge)


So, if we just happily embrace all this Big Data and Internet of Things talk, we may have to face the fact that there won’t be enough storage capacity to hold all that data pollution. There won’t be a cloud big enough to sustain such an information surfeit.

What should we do about that? Should we start caring about Data Pollution now? What fraction of all this data is worth archiving? Will we really want access to all the pictures, videos and instant messages that we generated 10 years ago? Should our geolocation traces be available for querying 5 years after we generated them? Does it make sense to keep the history of our refrigerator’s performance for years?

Data is invisible to our eyes, so we may think it is totally free and therefore infinite, but it could become pollution just like contaminated air.

One of the ‘fathers’ of the Internet, Vint Cerf, warned some time ago that we may face something he called the Digital Dark Ages, and we may have to accept that most of the information we generate and share (voluntarily or not) will be lost forever. We may need to start thinking about concepts like Data Lifetime, Information Impermanence, Archive Entitlement, and Flash (or Flush) Data, by analogy with Flash memory cards.

We may get used to a workflow like this:

Generate Data > Create Content out of it > Organize, tag it with a Due Date > Share it in the Cloud > Programmed deletion of most of it.
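
To make the idea concrete, here is a rough sketch in Python of what tagging content with a due date and purging it on schedule could look like. Everything in it is an assumption for illustration only: the `Item` class, the `lifetime_days` tag, the `archive` flag and the `purge` function are hypothetical names, not part of any existing cloud service.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Item:
    name: str
    created: date
    lifetime_days: int     # hypothetical "Data Lifetime" tag
    archive: bool = False  # hypothetical "Archive Entitlement" flag

    @property
    def due_date(self) -> date:
        """Date after which the item may be deleted."""
        return self.created + timedelta(days=self.lifetime_days)


def purge(items, today):
    """Keep only items that are archived or have not reached their due date."""
    return [it for it in items if it.archive or it.due_date > today]


# Made-up sample content with different lifetimes.
items = [
    Item("fridge-log.csv", date(2015, 1, 1), lifetime_days=30),
    Item("holiday-photo.jpg", date(2015, 1, 1), lifetime_days=30, archive=True),
    Item("chat-message.txt", date(2015, 3, 1), lifetime_days=365),
]

survivors = purge(items, today=date(2015, 3, 26))
print([it.name for it in survivors])
```

In this sketch the refrigerator log is dropped once its 30 days are over, the photo survives because it was explicitly marked for archiving, and the chat message stays until its own due date arrives.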

An article by Ignasi Lirio, originally published on his Medium.

1 Comment

  1. Mandibul, 26 March 2015

    Interesting data.
    No need to worry as long as politicians leave the free market to regulate itself.
    If storage becomes less accessible, its price will increase and therefore the demand will grow.
    After that, as prices go higher, the interest in making a business out of it will become stronger, and then supply will widen, lowering prices…
    It has been that simple since “el mundo es mundo” (since the world began).
