Gartner long ago proposed the three Vs of big data – Volume, Velocity and Variety. That’s fine, even if a bit theoretical. After four years of aspiring to build the world’s largest product database, I believe there are three practical requirements for turning data into insights and action – Scale, Structure and Quality.
First, you need enough data to enable automated learning. In our case, we set the bar at around one billion products. Next, you need structure. Much of the data will come from multiple sources and is likely to be unstructured. Structure provides a foundation for generating insights, and it will evolve as the system learns from more data and more use.
Finally, the quality of the data is important for confident decision making. You can curate the data, of course, to be completely confident, but curation works against scale. I believe quality is the hardest part of building big data systems. In our business of building the world’s largest product database, we think of quality along seven dimensions – freshness, coverage, precision, depth, completeness, relevance, and ranking.