“Big data” has been by far the most ubiquitous and exciting business term of the past few years. More recently, doubts have been raised about whether big data is actually being put to work, or whether there is a gap between collecting and storing the data and applying it to achieve the desired outcomes.
What is “big data”? Gartner sums it up in one precise sentence: “Big data is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”
Thanks to the Internet, big data is a living, growing entity. Volume refers to the sheer amount of it that exists today. It is everywhere, and everything adds to it: the pictures and information we share on social media, what Google or Amazon collect about their users, what satellites collect about the weather and traffic, and so on.
Velocity refers to the rate at which it is growing, and that rate is mind-boggling. Every day, we create 2.5 quintillion bytes of data, and 90% of the data that exists today was created in the last two years alone. It’s crazy when you think about it! Variety, as described above, refers to the many different sources the data comes from.
One thing that may already be obvious is that, in its organic form, this data is raw and unstructured. Think back to the time before the Internet. Traditional data was organized in files and folders, indexed, annotated, and kept ready for future use. Now we need to do the same with all the big data that sits in the virtual world. Unless the data is organized and structured, it cannot become actionable. Going back to the Gartner definition, this is the critical part about “enhanced insight and decision making,” which is the ultimate goal.
That is where we are right now. So far, the focus has been on building the infrastructure to store all this data: storage, databases, cloud platforms, and so on. It’s all there and it’s all good. But now the time has come to act on this data. If businesses don’t act now, they will find themselves overwhelmed and under-equipped to make things happen.
The data is here. The desired outcomes need to be defined and a plan of action put in place. This involves indexing and annotating the data in order to draw valuable insights that will fuel business growth.
Some companies (besides big players like IBM, Google, Amazon, and Facebook) are actively tackling this problem, doing it well, and thriving. Let’s look at a few examples.
One of the best and most familiar examples for all of us is Netflix. Netflix has a massive database of movies and television shows that grows every day. By tagging every title with multiple tags and attributes, they have created a structured, organized database. This enables two things. First, by combining it with the data they have about their subscribers, they can recommend the right movies and shows, a feature that I’m sure many enjoy as much as I do! Second, if a subscriber is in the mood for a “post-war screwball comedy based in Europe,” they can find it, provided such a movie has been made.
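To make the tagging idea concrete, here is a minimal Python sketch, assuming a toy catalog in which each title carries a set of descriptive tags. The titles, tags, and the `search` and `recommend` functions are hypothetical, and scoring by shared-tag counts is a deliberately simple stand-in, not how Netflix’s actual recommender works. It shows the two uses described above: exact tag search and recommendation based on a viewer’s history.

```python
from collections import Counter

# Hypothetical catalog: title -> set of descriptive tags (invented for illustration).
CATALOG = {
    "Title A": {"comedy", "screwball", "post-war", "europe"},
    "Title B": {"drama", "post-war", "europe"},
    "Title C": {"comedy", "screwball", "usa"},
    "Title D": {"documentary", "nature"},
}


def search(catalog, required_tags):
    """Return every title carrying all of the requested tags, sorted by name."""
    required = set(required_tags)
    return sorted(title for title, tags in catalog.items() if required <= tags)


def recommend(catalog, watched, k=2):
    """Rank unwatched titles by how often their tags appear in the viewer's history."""
    # Build a simple taste profile: how many watched titles carry each tag.
    profile = Counter(tag for title in watched for tag in catalog[title])
    scores = {
        title: sum(profile[tag] for tag in tags)
        for title, tags in catalog.items()
        if title not in watched
    }
    # Highest score first; ties broken alphabetically so the order is stable.
    ranked = sorted(scores, key=lambda title: (-scores[title], title))
    # Drop titles that share no tags with the viewer's history.
    return [title for title in ranked if scores[title] > 0][:k]


if __name__ == "__main__":
    print(search(CATALOG, {"comedy", "screwball", "post-war", "europe"}))  # ['Title A']
    print(recommend(CATALOG, {"Title A"}))  # ['Title B', 'Title C']
```

The point of the sketch is that neither function works on raw, untagged data: both depend entirely on the structure added by tagging.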
Over the years, Foursquare has built a treasure trove of location data. Companies like Uber and Evernote use its API to fuel their businesses. Can you imagine what Uber would be without proper location data? Because Foursquare has a structured database in place, companies like Uber can focus on building their service rather than the location database itself.
Similarly, Gnip organizes and structures huge volumes of social data and provides it to brands. Brands, in turn, get the information they care about and can act on it by drawing valuable insights.
Knewton is a company that is using big data to revolutionize education. By harnessing data science and machine learning, it generates individualized progress paths for students. It constantly monitors each student’s progress, identifies gaps in learning, and then recommends which courses the student should take next. Teachers can use this information to tailor their classroom instruction to particular students.
Inspired applications of big data like these are the need of the hour. Big data is going to redefine every aspect of our lives: how we buy things, how we drive our cars, how we get healthcare… you name it. With the arrival of the Internet of Things, the need for this structure is even more pressing.
There’s the data itself, the infrastructure to store it, and the domain expertise required to structure and organize it. Without a framework and the application of machine learning and data science, all the data we are collecting will be like a Ferrari sitting in your garage that you don’t know how to drive.