Water Torture – An Analytics Analogy Goes a Bridge Too Far

Data, like water, comes in many forms. The human mind has evolved to filter out most of the data that comes our way because there’s simply so much of it.

When you open your eyes and ears, data is everywhere. The color of the wall, the sound of the air conditioning and the smell of your neighbor’s coffee are treated like humidity. The water is in the air all the time but it’s not useful to pay much attention to it.

When water condenses into fog, it forces you to see it and makes understanding the world around you all the more difficult. Incomplete datasets, corrupted data, bad science, false conclusions and cognitive bias all make you lose your way in the mist.

Data falls like rain. When there’s just a little, it is wildly unsatisfying: just enough to make your car dirty and confuse the conversation. You find yourself wiping away the spot on your glasses as somebody spouts some random data point, gleaned from some obscure source.

When data arrives in expected amounts at anticipated times, it can be captured, channeled and put to use. Irrigation systems, dams and reservoirs provide a feeling of control and allow for construction of an ever-broadening infrastructure of canals and locks. Data warehouses have been built on less trustworthy flows.

Cleanliness is Next to Godliness

Clean water is vital to life, irrigation, running power plants and more. The definition of ‘clean’ might change with the purpose: it’s OK if there is algae in water that cools a power plant, but it is not acceptable if there are more than 10 parts per billion of arsenic in drinking water.

Data is the same. In a direct mail application, whether you have a person’s title (Mr., Mrs., Ms.) is inconsequential… unless you’re mailing to doctors. But dirty data will trip you up every time.

As U.S. Chief Data Scientist DJ Patil put it at a First Round CTO Summit, “If you’re not thinking about how to keep your data clean from the very beginning, you’re f^¢&ed. I guarantee it. Trying to clean it up after the fact will take months at least.”

If you heat water to the boiling point, it can power an entire Industrial Revolution. Data seems to be doing the same thing. From the moment computers could store as well as calculate, data has been collected as fast as the storage equipment could be created to do so.

The Data Lake

As the data from these tributaries trickles through the mills’ engines, it all ends up in the lake, behind the dam. As data is let out in a controlled fashion, it powers the turbines of the data industry: those giant engines of data processing with names like Google and Facebook. There will be no drought here.

And, finally, there is a deep pool of water, waiting for the analyst to dive in. Scuba gear and spear gun in hand, the analyst investigates the deep, maps new ground and discovers new species. It’s a very exciting time to be a data explorer.

That’s why so many of them have been showing up for the eMetrics Summit since 2002. The next opportunity is in Boston, September 27 to October 1, 2015.

eMetrics Summit Registration

A Bridge Too Far

And what of the power of data to carve the next Grand Canyon? What about the glacial melting of structured data? How do we treat waste water in a world becoming more and more privacy conscious?

Those are questions for another time and water under the bridge.
