This is the fifth in a series of blog posts looking at the parallels between locations and products.
In our post last week, we explored location and product categorization and many of the challenges around it, like standardization and accuracy. Product categorization, however, is the least complicated product-tracking challenge we need to overcome.
If you think about a single product, it seems pretty simple, right? It’s just a product. Maybe it’s a blue sweater or a glass coffee table or a red water bottle. It seems straightforward, but is it? Products, like locations, have a ton of data that systems must track, but product data is far more challenging to manage than location data. Whether we’re trying to track a single product across many stores, follow historic pricing changes at a single store, or even just know how many products a store carries, product data comes with myriad challenges.
We discussed the challenge of standardization in depth in the previous post. As we said then, locations have more established standards for naming, specifying, and categorizing than products do. For products, we’re all over the place, and no two retailers follow the same standards.
Beyond standardization, we have the challenge of data freshness. Location data changes relatively slowly—it takes a while to open a business, build a new house, pour a new road, or sell a building. Product information, in contrast, changes rapidly, and is influenced by far more factors. Manufacturers and brands release new products or new variations of old products every day. Each of those products or variants has attributes, a description, a category, and a suggested price. Retailers elect to carry these new products and to discontinue older products. They also change prices all the time—sometimes every hour. This stacks up to a giant mound of rapidly changing product data.
Coverage is one of the most confounding challenges of product data. To determine product data coverage, you measure the availability and comprehensiveness of data compared to the total data universe or population of interest. In other words, of all possible data to collect, how much did we actually collect? Location data has a relatively finite denominator. You can count the number of buildings on a map or apartments in a building, or you can define your exact latitude and longitude out to a finite number of decimal places.
With products, however, the denominator is nearly impossible to determine. We can estimate how many products exist in the world, but then we only have a simple definition of coverage: the percentage of the world’s products for which we have data. Sadly, that’s useless for day-to-day operations. Knowing how many products exist at a particular retailer or for a particular brand would be much more useful.
But how do we determine this? We have already established that counting GTINs won’t work because UPCs, which are one type of GTIN, can be reused. Our friends at 360pi estimated that Amazon carries 353,710,754 products across all of its sellers, although that excludes product variants. Typing “site:amazon.com” into Google yields 154,000 results. And what about other stores? What about within categories? Estimating the denominator to determine coverage causes a headache and a half.
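To make the coverage definition above concrete, here is a minimal sketch in Python. The retailer names and counts are hypothetical, made up purely for illustration, and the `coverage` helper is not taken from any real system. The arithmetic is trivial; the hard part, as described above, is getting a trustworthy value for the denominator.

```python
def coverage(collected: int, estimated_total: int) -> float:
    """Fraction of the estimated product universe for which we hold data."""
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    return collected / estimated_total


# Hypothetical figures, for illustration only:
# retailer -> (products we collected, estimated catalog size)
catalogs = {
    "retailer_a": (45_000, 60_000),
    "retailer_b": (1_200, 1_500),
}

coverage_by_retailer = {
    name: coverage(collected, total)
    for name, (collected, total) in catalogs.items()
}

for name, value in coverage_by_retailer.items():
    print(f"{name}: {value:.0%}")
```

Every number this produces is only as good as the estimated catalog size fed into it, which is exactly why the denominator matters so much.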
On top of all this comes the challenge of comparing apples to apples. In other words, how can we be sure we’re talking about the exact same location or product? This is hard even for locations. Say I’m trying to get to your apartment and I know your building. How do I match the delivery of this package to you? I knew someone who lived in a building with no apartments ending in -4 or -13, which caused massive confusion for the cable wiring company: she lived in 305, but her cable connection was labeled 304, so they disconnected her cable. Luckily, location doesn’t change quickly, so you only need to figure out an apartment’s numbering or a new city’s streets once. But what happens when someone moves? There is no master database, and we certainly can’t rely on the one-year forwarding from the post office.
Similarly, we lack standardization in product identification and categorization. That means we have to figure out whether one women’s Equipment cashmere blue sweater at Amazon is the same as another women’s Equipment cashmere blue sweater at Nordstrom. A single, universal product identifier that cannot be reused would be a huge first step in de-duping and matching products.
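To illustrate why matching is hard without such an identifier, here is a deliberately naive sketch in Python. It assumes each listing carries only a retailer, a brand, and a free-text title; the `Listing` type, the `match_key` helper, and the sample listings are all hypothetical. It treats two listings as the same product when the brand and the set of words in the title agree.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    retailer: str
    brand: str
    title: str


def match_key(listing: Listing) -> tuple[str, str]:
    """Naive matching key: lowercased brand plus the sorted set of title words."""
    title = listing.title.lower().replace("’", "'")
    words = sorted(set(re.findall(r"[a-z0-9']+", title)))
    return (listing.brand.strip().lower(), " ".join(words))


# Hypothetical listings, for illustration only.
a = Listing("Amazon", "Equipment", "Women's Cashmere Blue Sweater")
b = Listing("Nordstrom", "Equipment", "Blue Cashmere Sweater Women's")
c = Listing("Nordstrom", "Equipment", "Women's Cashmere Red Sweater")

print(match_key(a) == match_key(b))  # same brand, same title words
print(match_key(a) == match_key(c))  # different color word
```

A key like this cannot tell apart two genuinely different blue cashmere sweaters from the same brand, and it misses matches whenever retailers word their titles differently. Those gaps are what a non-reusable universal identifier would close.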
We obviously have a long way to go down a challenging road to use product data exactly as we use location data. We lack standardization, it’s hard to ensure data freshness due to rapid changes, we cannot find any sort of denominator for coverage, and we have a hard time making sure we’re talking about exactly the same products. One almost wonders why we decided to tackle this at Indix! (What can we say? We like a challenge.)
In the end, though, can we really use product information like we do location information? Find out in the next and last post in this series.
Also published on Medium.