Sysadmin blog
Once all the marketing is cleared away, just what is big data, and how does it help real businesses of all sizes? Marketing would have us believe that big data is new, huge, terrifying, complicated, impossible without their help and yet will deliver unmatched benefits. Like many things in tech, however, big data is really just an iterative evolution of things most businesses already do.
The first thing to understand about big data is that it isn't new. As a concept it is, in fact, quite old. Thousands and thousands of years old; it's just a heck of a lot easier with a computer.
Big data is about gathering lots and lots of unstructured data and then making sense of it. That's it. That's all there is to it, and computers don't need to be involved.
“If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.” – Sun Tzu, The Art of War.
Thousands of years ago, successful generals gathered information about everything from terrain to weather, economics to the sociocultural mores of their opponents. Huge teams of experts sifted through oceans of data, reviewing notes left by past warriors, studying oral history and more. Victory came from preparation and intelligence, not merely strength of arms.
History repeating itself
Around 2,000 years ago we started using computers. The Antikythera mechanism is thought to have been used to predict astronomical phenomena for calendric purposes. We've used computers – first analogue, and then digital – ever since. So inculcated has reliance upon these mechanisms become that we collectively take them for granted.
Can you imagine farming if there were no calendars, and all documentation on methodology was scattered unevenly throughout society? Essentially farmers would be trying to farm using nothing but oral history and whatever documentation, calculation and records each farmer created on their own.
How about the daily grind of waking on time, commuting, spending the requisite number of hours at work and returning home if there were no clocks? The simple combination of "it is this time" and "get up now" resulting in an alarm changed our entire society.
If these questions seem absurd and primitive, understand that 25 years from now businesses and business people will feel the exact same way about trying to operate a business without the analytics tools we hype as "big data" today.
When companies talk about big data they normally talk about how big data can be used to perform behavioural analytics for governments or Fortune 500 companies. They talk about datasets in the hundreds of terabytes and finding tiny needles in universes of data. None of that matters to average businesses. But there are big data concepts that do.
The real gold mine of big data analytics for any company is their middleware. Any company larger than about five people has middleware, even if all the middleware does is join up the accounting system to the Customer Relationship Management (CRM) system.
Most companies produce something. Bread. Bicycles. Articles about big data. Somewhere along the line keeping track of what's been ordered, what's in the process of being manufactured and what's out the door matters. Joining that up to the accounting and CRM systems helps. If you ship or receive physical goods there's probably some logistics software too...even the smallest businesses drown in data.
The middleware might just be some parsers knocked together in PowerShell or bash. Maybe it's just a batch script that runs every night to export from one application and import to another. Maybe one of the applications has middleware functionality built in and manages for all the rest...but somewhere, there's a widget taking data from one or more applications and feeding it into others.
It is right there that the opportunity for big data exists. If you copy off the data from these various applications as it transits the middleware and store it somewhere (a database is a likely suspect) then you can start asking that data interesting questions.
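To make "copy off the data as it transits the middleware" concrete, here is a minimal sketch in Python. It assumes the middleware hands records around as plain dictionaries and that a local SQLite file is good enough as the "somewhere". The names `open_store`, `tap` and `transit_log` are invented for illustration and do not come from any particular product.

```python
import json
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open (or create) the side store that keeps copies of transiting records."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transit_log (
               id      INTEGER PRIMARY KEY,
               seen_at TEXT NOT NULL,  -- UTC timestamp of the copy
               source  TEXT NOT NULL,  -- application the record came from
               target  TEXT NOT NULL,  -- application it is headed to
               payload TEXT NOT NULL   -- the record itself, stored as JSON
           )"""
    )
    return conn


def tap(conn, source, target, record):
    """Store a copy of one record, then hand the record back unchanged."""
    conn.execute(
        "INSERT INTO transit_log (seen_at, source, target, payload) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), source, target, json.dumps(record)),
    )
    conn.commit()
    return record
```

The existing export/import script would call `tap(conn, "crm", "accounting", record)` on each record before passing it along. Because the record comes back untouched, the normal flow between applications does not change; the only difference is that a copy now sits in a table you can query.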
The logistics software might want to interact with the accounting software so that it can include a copy of the invoice, and with the CRM software in order to get the customer's address. That's normal functionality and doesn't really involve analytics.
But maybe sales want to cross reference order arrival time with time to ship out, customer geographic region and even products ordered. The goal: see if there is a correlation between the length of time it takes certain products to make it through manufacturing and customer retention in specific geographic regions.
The data exists in those systems. It probably transits the middleware every day. But building a widget to gather the data from those systems (or the middleware), store it somewhere and then ask the data that question...that's big data.
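As a rough illustration of asking the data that question, the Python sketch below groups orders by region and product, then reports the average days from order to shipment alongside the share of customers who ordered again. The field names and the sample orders are made up for the example, and "retention" is simplified to a yes/no reorder flag; a real system would pull these values from whatever store the collected data lives in.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample data: one dict per order.
SAMPLE_ORDERS = [
    {"region": "North", "product": "bicycle", "ordered": date(2024, 1, 1),
     "shipped": date(2024, 1, 4), "reordered": True},
    {"region": "North", "product": "bicycle", "ordered": date(2024, 1, 2),
     "shipped": date(2024, 1, 7), "reordered": True},
    {"region": "South", "product": "bicycle", "ordered": date(2024, 1, 1),
     "shipped": date(2024, 1, 15), "reordered": False},
    {"region": "South", "product": "bicycle", "ordered": date(2024, 1, 3),
     "shipped": date(2024, 1, 13), "reordered": True},
]


def summarise(orders):
    """Per (region, product): order count, mean days to ship, retention rate."""
    groups = defaultdict(list)
    for order in orders:
        days = (order["shipped"] - order["ordered"]).days
        groups[(order["region"], order["product"])].append((days, order["reordered"]))

    summary = {}
    for key, rows in groups.items():
        count = len(rows)
        summary[key] = {
            "orders": count,
            "mean_days_to_ship": sum(d for d, _ in rows) / count,
            "retention_rate": sum(1 for _, kept in rows if kept) / count,
        }
    return summary
```

Calling `summarise(SAMPLE_ORDERS)` puts the two numbers side by side for each region and product. A table like that only hints at a relationship; with a handful of orders per group it proves nothing, which is part of why testing matters so much in this kind of work.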
We know that the movement of the constellations across the night sky is correlated with the seasons, and that planting specific crops at specific times of year is a good plan, but building a widget that takes both sets of knowledge and presents us with "do this on this date"...that's big data.
The cloud thing
The modern concept of big data wasn't born in the cloud, but the two did grow up together. Of all the workloads one could name, big data seems almost uniquely suited to the public cloud's cloudbursting capabilities.
The existence of the public cloud has birthed an unending number of data-collecting applications and services and these in turn drive demand for big data. Of course, the cliché is that collecting data doesn't help you if you don't know what to ask your data. This is why cloudbursting works so well for analytics applications.
Most real world big data applications don't churn endlessly on the data. They're reports that run at specific times, or which are carefully constructed for very occasional use. Crunching large datasets can require insane levels of computing power, but when the analysis is done all those VMs doing all that work can be discarded.
Creating good middleware has always been something of a dark art. Doing analytics on all the information that transits that middleware is a layer more complicated again. It takes time, experience, expertise and – above all – testing to get it right.
Done right, however, big data analytics really are transformative for most businesses.
The hot dog cart company sells 8x more bratwurst on 4th and Main than it does on 8th and 2nd, but sells 6x as much Mundare sausage on 8th and 2nd as anywhere else? Weather affects which hot dogs are bought, and different districts are affected differently? We can analyse that. Changing up the cart loads is cheap and simple enough and voila: more profits.
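The cart example can be sketched in a few lines of Python. The sales figures below are invented to match the ratios in the story, and `sales_share` and `plan_load` are hypothetical helper names; the idea is simply to split a cart's capacity in proportion to what sold on that corner in that weather.

```python
from collections import Counter

# Hypothetical past sales: units sold per corner, weather and product.
SALES = [
    {"corner": "4th and Main", "weather": "sunny", "product": "bratwurst", "units": 80},
    {"corner": "4th and Main", "weather": "sunny", "product": "Mundare sausage", "units": 20},
    {"corner": "8th and 2nd", "weather": "sunny", "product": "bratwurst", "units": 10},
    {"corner": "8th and 2nd", "weather": "sunny", "product": "Mundare sausage", "units": 120},
]


def sales_share(sales, corner, weather):
    """Each product's share of units sold on one corner in one kind of weather."""
    counts = Counter()
    for sale in sales:
        if sale["corner"] == corner and sale["weather"] == weather:
            counts[sale["product"]] += sale["units"]
    total = sum(counts.values())
    if total == 0:
        return {}  # no history for this corner and weather
    return {product: units / total for product, units in counts.items()}


def plan_load(sales, corner, weather, capacity):
    """Split a cart's capacity across products in proportion to past sales."""
    shares = sales_share(sales, corner, weather)
    return {product: round(share * capacity) for product, share in shares.items()}
```

With this data, `plan_load(SALES, "4th and Main", "sunny", 200)` loads the cart heavily towards bratwurst, while the same call for 8th and 2nd tilts it towards Mundare sausage. Because each product is rounded separately, the totals can land a unit or two off the stated capacity, which is fine for a sketch but worth handling in anything real.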
Now, what if we feed all the data about who buys what from each cart to a cloud repository in real time? We add in a customer loyalty something or other to help get demographics, or tie in a camera and start doing facial recognition from one of the emerging cloud forensics-as-a-service (FRaaS) vendors.
Now we can do everything from targeted advertising to more accurately predicting what loadouts we should have in our hot dog carts for special events, or if we are planning to set up on a new corner with a different demographic.
We could even start looking at where people from our city take their vacations and set up carts there, where we'll be a familiar brand that has exactly what they want, when they want it. That's big data. And it's what turns one hot-dog truck into two, and two into an international chain of them.
So what do you do with your data? ®