Last week, I wrote an introductory post on the crawling mechanism that we use at Indix to gather and structure product data from the web. Before going deeper into the crawling and extraction process and the complexities associated with it, I wanted to step back and talk a little bit about the catalog record, a critical piece of the product content puzzle.
What are the different fields that make up a product catalog record? At Indix, we use our own taxonomy to extract and store information. On a typical product page in an ecommerce store, attributes are specific to a particular product. Consumers on retail sites make buying decisions based on these attributes and identifiers. Several studies indicate that conversion rises with the richness of content. (Watch out in the coming months for more content on ecommerce SEO.)
Let’s use as an example this Whirlpool 24.5 cu.ft. French Door Refrigerator in Monochromatic Stainless Steel.
Based on the internal Indix taxonomy, we have, at a high level, a few levels of attributes whose values depend on the product and category. Base attributes and identifiers consist of information that is used to identify a particular product. This includes the product title, MPN, SKU, UPC, etc. These identifiers are unique, so they let you pinpoint a specific product.
Variant attributes are closely tied to the base attributes and are critical for matching, as matching is always done at the variant level. For instance, an iPhone comes in multiple models. As a retailer, you may decide to list them as different products in some cases and as different variants in others. The same shoe in sizes eight and nine would be different variants, while an iPhone 7 with 32 GB capacity and an iPhone 7 with 128 GB capacity would be different products.
Variant attributes let you define all the different variants of the base product. In the example shown here, the color stainless steel is probably one variant. The color white would define another variant of the same refrigerator. Another way to look at it is that all different combinations like color, size, and so on correspond to different variants of the same product (except for shoes where different sizes correspond to the same product).
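To make the idea of "one variant per combination" concrete, here is a minimal Python sketch. It is not the Indix data model; the `Product` and `Variant` classes, the `expand_variants` helper, and the MPN value are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from itertools import product as cartesian_product


@dataclass(frozen=True)
class Variant:
    # Variant attributes as (name, value) pairs, e.g. (("color", "White"),).
    # A tuple keeps the variant hashable so it can be used as a matching key.
    attributes: tuple


@dataclass
class Product:
    title: str
    mpn: str  # placeholder identifier, not a real Whirlpool MPN
    variants: list = field(default_factory=list)


def expand_variants(options):
    """Return one Variant for every combination of variant-attribute values."""
    names = sorted(options)
    return [
        Variant(tuple(zip(names, combo)))
        for combo in cartesian_product(*(options[name] for name in names))
    ]


fridge = Product(
    title="Whirlpool 24.5 cu.ft. French Door Refrigerator",
    mpn="EXAMPLE-MPN",
)
# Two colors and no other variant attribute give two variants of one product.
fridge.variants = expand_variants(
    {"color": ["Monochromatic Stainless Steel", "White"]}
)
```

Adding a second variant attribute multiplies the count: two colors and three sizes would yield six variants under this sketch.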
This brings us to Level 1 attributes, which are typically specific to a particular category. Within a particular category, you would expect a set of values that are commonly attributed to products in that category. The taxonomy used for this purpose differs from store to store. In the case of this refrigerator, these include capacity, depth, width, type, etc. For a shoe rack, these could be shipping weight, dimensions, color, item shape, etc. In order to draw insights from this data, you would want both the keys and values to be standardized.
Consumer search intent is, most of the time, based on attributes specific to a particular category. So, a person who is trying to buy a particular refrigerator can facet on any of these attributes. They might want the capacity to be greater than 25 cubic feet in the refrigerator they are looking to buy. Level 1 attributes help achieve that goal. Depth of attributes and the ability to facet on them are essential for a higher click-to-conversion ratio.
Then, there are additional attributes on top of the category-level attributes that are standardized for this particular product. A refrigerator can be water resistant, have a particular place of origin, manufacture date, etc. These attributes enable consumers to get a more complete picture of the product, which is important in the online shopping experience. They may not make their way into faceted search navigation, but they are important nonetheless.
All of the above are the key ingredients that go into building a whole product attribute set, and this set then defines the product in its completeness. Next week, we’ll go back to looking at crawling mechanisms and what it takes to aggregate sophisticated information as detailed here.
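As a rough illustration of what standardizing keys and values can look like, the Python sketch below maps store-specific attribute names onto one canonical key and pulls a number out of the raw value. The alias table, the canonical key names, and the `standardize` function are assumptions made for this example, not the Indix taxonomy. It also assumes every store already reports in the same unit, so no unit conversion is attempted, and it only handles numeric attributes.

```python
import re

# Hypothetical alias table: different stores label the same attribute differently.
KEY_ALIASES = {
    "capacity": "capacity_cu_ft",
    "total capacity": "capacity_cu_ft",
    "capacity (cu. ft.)": "capacity_cu_ft",
    "width": "width_in",
    "product width": "width_in",
    "depth": "depth_in",
}

# Matches the first integer or decimal number in a raw value string.
NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def standardize(raw):
    """Map store-specific keys to canonical keys and parse numeric values.

    Keys that are not in the alias table, and values without a number,
    are dropped rather than guessed at.
    """
    standardized = {}
    for key, value in raw.items():
        canonical = KEY_ALIASES.get(key.strip().lower())
        if canonical is None:
            continue
        match = NUMBER.search(value)
        if match:
            standardized[canonical] = float(match.group())
    return standardized


store_a = standardize({"Total Capacity": "24.5 cu. ft.", "Product Width": "35.63 in"})
store_b = standardize({"capacity": "24.5", "Warranty": "1 year"})
```

Once both stores resolve to the same `capacity_cu_ft` key with a numeric value, the records can be compared or aggregated directly.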
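The capacity example can be sketched as a simple range facet in Python. The catalog entries, the `capacity_cu_ft` key, and the `facet_greater_than` helper are hypothetical; a real search stack would do this inside the search index rather than in a list comprehension.

```python
def facet_greater_than(products, attribute, minimum):
    """Keep only products whose numeric attribute exceeds the minimum.

    Products that lack the attribute cannot be faceted on it, so they drop out.
    """
    return [
        item
        for item in products
        if item.get(attribute) is not None and item[attribute] > minimum
    ]


# Made-up catalog entries with a standardized capacity attribute.
catalog = [
    {"title": "Refrigerator A", "capacity_cu_ft": 24.5},
    {"title": "Refrigerator B", "capacity_cu_ft": 26.8},
    {"title": "Refrigerator C"},  # capacity was never extracted
]

large_fridges = facet_greater_than(catalog, "capacity_cu_ft", 25)
```

Note how "Refrigerator C" never shows up in the results: a product with a missing attribute is invisible to any shopper who facets on it, which is why depth of attributes matters.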
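Putting the layers together, one way to picture a catalog record is as four groups of attributes, of which only the category-level group feeds faceted navigation. The Python sketch below is an assumption-laden illustration, not the Indix schema: the `CatalogRecord` class, its field names, and the identifier and origin values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogRecord:
    base: dict                                       # identifiers: title, MPN, SKU, UPC
    variant: dict = field(default_factory=dict)      # e.g. color
    level1: dict = field(default_factory=dict)       # category-level attributes
    additional: dict = field(default_factory=dict)   # everything beyond the category set

    def facets(self):
        """Attributes exposed to faceted navigation (Level 1 only in this sketch)."""
        return dict(self.level1)

    def full_view(self):
        """Everything shown on the product page, all layers merged."""
        return {**self.base, **self.variant, **self.level1, **self.additional}


record = CatalogRecord(
    base={"title": "Whirlpool 24.5 cu.ft. French Door Refrigerator", "mpn": "EXAMPLE-MPN"},
    variant={"color": "Monochromatic Stainless Steel"},
    level1={"capacity_cu_ft": 24.5},
    additional={"country_of_origin": "EXAMPLE-COUNTRY"},  # placeholder value
)
```

Here `record.facets()` returns only the capacity, while `record.full_view()` also carries the place of origin, so the shopper still sees it on the page even though it is not a filter.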
If you want to watch a recording of our webinar on gathering and ingesting product data, click here now.
Also published on Medium.