Last week, we announced the immediate availability of Indix Standardized Attributes. We’re thrilled to be able to solve the difficult problem of consistent, standardized product attributes at scale for our customers, at a fraction of the cost of other available solutions (read: humans doing it). In this post, we’ll discuss what makes standardizing attributes hard and expensive, and how Indix attacked the problem to bring scale to the party.
At first glance, product attributes seem simple. My coffee table is 42” long, 24” wide, and 18” high. It has ½” glass. Simple, right?
Not so much. One site might list my coffee table at the dimensions above, but another might list the length at 3.5 feet, and yet another might list it at 106.7 centimeters (and even yet another at 1.07 meters… you’re getting the idea). To make matters worse, these sites might label the length as “Length”, “Len.”, “Long”, “L”, or something else.
Now we know why a single attribute might be hard to standardize, but there’s another problem: how do you know which attributes to standardize for a product? Length makes sense for a coffee table, but it’s meaningless for a coffee mug. This means that every single category needs to have its own attribute schema. We have over 7,000 sub-categories for our 800 million+ products, so just think about that scale for a moment. Tired yet?
Rather than close our eyes, stick our fingers in our ears, and yell, “LALALALA WE CAN’T HEAR YOU!” at the top of our lungs, we hired some brilliant AI/ML folks, and tackled the problem.
First, we had to figure out how to set schema at scale. We looked at how some others had done it, and (like many intelligent people) we cherry-picked some of their best practices and implemented some of our own. Rather than see every attribute as having the same priority, we separated them into tiers: identifiers, Variant attributes, Level 1 attributes, and Level 2 attributes.
We already standardized identifiers for each of our products, but we needed to set our schema for Variant, Level 1, and Level 2 attributes for each category. We systematically prioritized categories and designed attribute schemas for each.
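To make the idea of a per-category schema concrete, here is a minimal Python sketch. The tier names come from the post, but everything else is an assumption for illustration: the `CategorySchema` class, the example categories, and which attribute lands in which tier are all hypothetical, not our production schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Tier(Enum):
    """Attribute tiers named in the post; their exact semantics are assumed here."""
    IDENTIFIER = "identifier"
    VARIANT = "variant"
    LEVEL_1 = "level_1"
    LEVEL_2 = "level_2"


@dataclass
class CategorySchema:
    """Hypothetical schema: maps each allowed attribute key to its tier."""
    category: str
    attributes: Dict[str, Tier] = field(default_factory=dict)

    def allows(self, key: str) -> bool:
        # An attribute is only meaningful if the category's schema lists it.
        return key in self.attributes

    def keys_in(self, tier: Tier) -> List[str]:
        # Sorted so the result is stable regardless of insertion order.
        return sorted(k for k, t in self.attributes.items() if t is tier)


# Illustrative assignments only; the real tiering is not described in the post.
coffee_tables = CategorySchema("Coffee Tables", {
    "UPC": Tier.IDENTIFIER,
    "Color": Tier.VARIANT,
    "Length": Tier.LEVEL_1,
    "Width": Tier.LEVEL_1,
    "Height": Tier.LEVEL_1,
    "Glass Thickness": Tier.LEVEL_2,
})
coffee_mugs = CategorySchema("Coffee Mugs", {
    "UPC": Tier.IDENTIFIER,
    "Color": Tier.VARIANT,
    "Capacity": Tier.LEVEL_1,
})
```

With schemas like these, `coffee_tables.allows("Length")` is true while `coffee_mugs.allows("Length")` is false, which is exactly the coffee table versus coffee mug distinction from earlier.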
Second, we had to map our keys to the schemas and standardize them. Each attribute comes in a key-value pair, where the label is the “key” and the magnitude, quantity, or descriptor itself is the “value”. We attacked the keys first, which meant applying machine learning to decide, for each category, which version of the word “Length” should be the attribute key for that dimension. Extrapolate to our scale, and you end up with over 2,500 attribute keys across just our top 500 categories. (And counting, as we expand to more categories every day.)
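For a feel of what key standardization does, here is a deliberately simple sketch. It swaps in a hand-written alias table for the machine learning we actually used, so treat it as an illustration of the input and output, not of the method. The alias lists and the `standardize_key` function are hypothetical.

```python
import re
from typing import Optional

# Hypothetical alias table: standard key -> raw labels seen on different sites.
KEY_ALIASES = {
    "Length": {"length", "len", "long", "l"},
    "Width": {"width", "wid", "wide", "w"},
    "Height": {"height", "ht", "high", "h"},
}

# Invert the table once so each lookup is a single dictionary access.
ALIAS_TO_KEY = {
    alias: key for key, aliases in KEY_ALIASES.items() for alias in aliases
}


def standardize_key(raw_label: str) -> Optional[str]:
    """Return the standard key for a raw label, or None if it is unknown."""
    # Lowercase and drop punctuation so "Len." and "len" compare equal.
    cleaned = re.sub(r"[^a-z0-9 ]", "", raw_label.lower()).strip()
    return ALIAS_TO_KEY.get(cleaned)
```

Here `standardize_key("Len.")`, `standardize_key("Long")`, and `standardize_key("L")` all return `"Length"`, while an unknown label returns `None`. A static table like this breaks down quickly across thousands of categories, which is why a learned model is the better fit at scale.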
Finally, we had to standardize the values. We applied machine learning again to determine whether an attribute like length should be in inches and then what numeric value that should be. We did this across our top categories and standardized all required attributes for 69 million+ products; 20 million+ of them have at least 10 attributes.
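The value side can be sketched the same way. The snippet below is a rule-based stand-in, not our machine learning pipeline: it assumes inches as the target unit for length and uses a hypothetical `to_inches` helper with a small, hand-picked unit table.

```python
import re
from typing import Optional

# Conversion factors to inches; the set of unit spellings is an assumption.
INCHES_PER_UNIT = {
    "in": 1.0, "inch": 1.0, "inches": 1.0, '"': 1.0, "”": 1.0,
    "ft": 12.0, "foot": 12.0, "feet": 12.0, "'": 12.0,
    "cm": 1 / 2.54, "centimeter": 1 / 2.54, "centimeters": 1 / 2.54,
    "m": 100 / 2.54, "meter": 100 / 2.54, "meters": 100 / 2.54,
}

# A decimal number followed by a unit, e.g. "3.5 feet" or "42”".
VALUE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([^\d\s].*?)\s*$")


def to_inches(raw_value: str) -> Optional[float]:
    """Convert a raw length string to inches, or None if it cannot be parsed."""
    match = VALUE_PATTERN.match(raw_value)
    if match is None:
        return None
    number, unit = match.groups()
    # Tolerate abbreviations written with a trailing period, such as "in.".
    factor = INCHES_PER_UNIT.get(unit.lower().rstrip("."))
    if factor is None:
        return None
    return round(float(number) * factor, 1)
```

Run against the coffee table listings from earlier, `to_inches("42”")`, `to_inches("3.5 feet")`, and `to_inches("106.7 centimeters")` all come out to `42.0`. The listing at `"1.07 meters"` comes out to `42.1`, because the source value was already rounded, which is a small taste of why values are harder than they look.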
Standardizing attributes for our products has made a huge difference in the amount of usable data available for each product. Let’s look at Lazy Hill Farm Designs Sunrise 7 in. x 72 in. Cedar Window Box, for example. Here are all the attributes we have for this product in the Indix Cloud Catalog, non-standardized:
After running this product through attributes standardization, this is what we have:
(Yes; we made it pretty.)
As you can clearly see, standardizing attributes took the “soup” of all attributes and fished out the meat and veggies. Because our customers prefer eating with the precision of forks, this makes consuming our product data much less messy! (Did I break the metaphor? I think I pushed it too far…)
Customers can take Indix Standardized Attributes and build robust product pages on top of them. Whether they use them for better product detail, search faceting and filtering, or competitive matrices, standardized attributes will make it easier for our customers to give their customers the attributes they need to find and buy products. We can also take these standardized attributes and pass them to Narrative Science for some natural language generation (NLG) magic, but that’s a blog post for another day.
Also published on Medium.