Working With Product Data on the Internet: Part 2

This is the second in a series of posts about the challenges of working with product data on the internet.

In the previous post, we looked at some challenges that businesses face when working with product data on the internet. Today, we will dig deeper into particular types of issues and look at examples of each.

This Wasn’t Meant to Be Here!

While businesses employ armies of people and spend fortunes to keep their web data tidy, problems still abound. It doesn’t take much effort to spot incorrectly categorized items (see examples below) on even the largest ecommerce sites. In fact, the question is not whether a listing has errors but whether you will scroll through a couple of pages to find them. If you do, you will surely spot many inaccuracies. The examples below were spotted on a large online retailer’s website (to remain unnamed!) on the first listing pages of the cosmetics and shoes categories.


Miscategorized lip makeup

Miscategorized golf shoes

Miscategorized golf shoes

As many web properties endeavor to expand their product assortment, they face the ever-increasing challenge of keeping their growing corpus precise.

If you’re a pricing analyst, a category manager, a deal aggregator, or anyone whose function requires collecting, processing, and using product data from the internet, such issues are a serious concern. If they slip unchecked into your product data, you could end up either making incorrect business decisions or presenting content that leads to a poor user experience. To counter this, the only option is to build an infrastructure that lets you weed out such inaccuracies in the base data you work with. The examples above show incorrect categorization, but product data issues also occur in brand labels (e.g. an aftermarket iPhone accessory being branded incorrectly as Apple), pricing (e.g. an accessory being labeled as the main product, leading to price discrepancies), and so on.
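To make the idea of weeding out miscategorized items concrete, here is a minimal sketch of a keyword-based sanity check. The keyword rules, the record layout, and the sample listings are all made up for illustration; a real pipeline would likely need something far more robust than keyword overlap.

```python
# Hypothetical keyword rules: terms we would expect to see in a title
# for each category. These are assumptions for illustration only.
CATEGORY_KEYWORDS = {
    "lip makeup": {"lip", "lipstick", "gloss", "balm"},
    "golf shoes": {"golf"},
}


def flag_suspect_listings(listings):
    """Return listings whose title shares no keyword with its assigned category."""
    suspects = []
    for item in listings:
        expected = CATEGORY_KEYWORDS.get(item["category"])
        if expected is None:
            continue  # no rule for this category, so nothing to check
        words = set(item["title"].lower().split())
        if not (words & expected):
            suspects.append(item)
    return suspects


# Made-up records, not taken from any real retailer.
listings = [
    {"title": "Matte Lipstick Ruby Red", "category": "lip makeup"},
    {"title": "Volumizing Mascara Black", "category": "lip makeup"},
    {"title": "Leather Golf Shoe Spikeless", "category": "golf shoes"},
    {"title": "Canvas Slip-On Sneaker", "category": "golf shoes"},
]

suspects = flag_suspect_listings(listings)
for item in suspects:
    print(f"Check: {item['title']!r} listed under {item['category']!r}")
```

A check like this only flags candidates for review. It will miss items whose titles happen to contain a category keyword and will wrongly flag valid items with unusual titles, which is part of why cleaning product data at scale is hard.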

Looks Can Be Deceiving

Today, there is still no universal product catalog for all branded products. In the absence of one comprehensive repository, businesses that list/sell products on the web often end up writing their own product content such as title, specs, description, etc. Outside of textual metadata, many businesses also hire photographers to create high quality product images in lieu of the images they may have acquired from the brand or other public sources. While the intent behind these efforts is to surface the most accurate and appealing product information to their end users, this does mean there is no standardization of content across sites.

It is very common to look at the same product on two different online retailers and find hardly any similarity between the two pages. In fact, at first glance, you may conclude that you are looking at two different products. The example below illustrates this case. The product titles and product images are so different that, without deeper analysis, it is easy to assume these are not the same product! With such fragmentation and non-conformance to any standard, the task of matching or comparing products across stores becomes extremely complex.
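A rough way to see why matching is hard is to measure how much two titles overlap. The sketch below uses Jaccard similarity over title tokens, which is one simple measure chosen here for illustration and not a method described in this post. The two titles are hypothetical examples of how two stores might describe the same shoe.

```python
import re


def tokenize(title):
    """Lowercase a title and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity of the token sets of two titles (0.0 to 1.0)."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical titles for the same product as two stores might write them.
store_a = "Acme Trail Runner 2 Men's Running Shoe, Blue, Size 10"
store_b = "ACME Mens TrailRunner II Sneakers (Navy) 10 M US"

score = jaccard(store_a, store_b)
print(f"Title similarity: {score:.2f}")
```

Only the brand and the size survive as shared tokens here, so the score is low even though a person would recognize the two titles as the same shoe. Naive text overlap alone is therefore not enough for cross-store matching.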



Scaling Doesn’t Get Any Easier With Time

Unless we are talking about “problems”, more is almost always better. The same principle applies to product data, where broader awareness is almost always better than limited awareness. Let’s say a category manager is looking across his competition to identify assortment gaps. He would love to get data from as many competitors as he can. Even if, for cost, time, or complexity reasons, he starts off with only a few competitors, he will want to expand his awareness over time.

Unfortunately, getting product data from the internet is not easy to start with, nor does it get easier with time or scale. Each retailer site, brand site, or marketplace has its own technology stack, and there is no conformance to a norm when representing product data. has recently got some traction, but its adoption is far below ideal. (See the example below on how non-conformance to standards makes extracting meaningful information difficult for even the stalwarts of the web.)





This means that getting access to product data on the web as a business user is no simple task, and it does not get easier with time.

Coming Up

So far, we have looked into the top challenges we run into when we work with product data on the web. In the next post, we will switch gears and move on from defining the problem to possible solutions. As part of that, we will look at the steps involved in cleansing, refining, standardizing, and organizing product data at scale. If you are more of an audio/video person, feel free to watch the recording of our recent webinar on this topic.

Webinar - The Internet Is a Dirty Place
