Today I came across a website that lets you search for products by color. I had actually seen this on another site a few months back, but I didn’t think much about the underlying technology. Today my first reaction was “Wow! Are they hiring people to look at each product image and record the colors?” Then I realized this can be done fairly easily by processing the product image. Every image is made of pixels, and the color of each pixel is available through most image APIs. So one approach is to count the frequency of each color, order the colors by frequency, and pick the first N or those above some threshold.
However, as with any image processing, there are alternatives. For example, if the image is a JPEG rather than a GIF, there can be so many distinct colors that the frequency of any individual color is very low. So treating all very similar colors as one single color would probably help. Similarly, a color with high frequency could be just small specks scattered all over the image, which is not really useful. Or a ring with a small diamond in the middle could contain a color that covers very few pixels but is the most important one. So picking colors based on clustering rather than purely on frequency is also a good choice. The one catch is that there needs to be a way to exclude the background color, which in most product images is white.
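Here is a minimal sketch of the frequency idea, assuming the pixels have already been read from the image as a list of (r, g, b) tuples by whatever image API is in use. The function name `dominant_colors` and its parameters (`step`, `background_cutoff`) are made up for illustration. It merges similar shades by snapping each channel to a coarse bucket and skips near-white pixels as background. It does not attempt the clustering idea, so scattered specks still count.

```python
from collections import Counter

def dominant_colors(pixels, top_n=3, step=32, background_cutoff=240):
    """Return up to top_n (color, share) pairs, most frequent first.

    pixels: iterable of (r, g, b) tuples with channel values 0-255.
    step and background_cutoff are illustrative assumptions, not tuned values.
    """
    counts = Counter()
    for r, g, b in pixels:
        # Skip near-white pixels, assumed here to be the photo's background.
        if min(r, g, b) >= background_cutoff:
            continue
        # Snap each channel to the center of its bucket so similar shades merge.
        bucket = tuple((c // step) * step + step // 2 for c in (r, g, b))
        counts[bucket] += 1
    total = sum(counts.values())
    if total == 0:
        return []
    return [(color, count / total) for color, count in counts.most_common(top_n)]

# Toy "image": white background, two close shades of red, a little blue.
pixels = (
    [(255, 255, 255)] * 50
    + [(200, 10, 10)] * 30
    + [(205, 12, 8)] * 10
    + [(10, 10, 200)] * 10
)
print(dominant_colors(pixels, top_n=2))
# [((208, 16, 16), 0.8), ((16, 16, 208), 0.2)]
```

The two reds land in the same bucket, so together they account for 80% of the non-background pixels, whereas counting exact colors would have split them.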
Keeping all of the above in mind, assume each product is associated with a few colors. The next step is to take the color the user has picked and match it against the product colors within some delta, since an exact match is not always possible and a tolerance gives the user more choices.
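The delta matching could look something like this. It is only a sketch: I am assuming plain Euclidean distance in RGB as the measure of "difference", which is a rough stand-in for how different two colors actually look, and the names `matching_products` and `delta` as well as the sample catalog are hypothetical.

```python
import math

def matching_products(query, product_colors, delta=60.0):
    """Return product names whose closest stored color is within delta of query.

    query: (r, g, b) tuple picked by the user.
    product_colors: dict mapping product name -> list of (r, g, b) tuples.
    Results are ordered from closest match to farthest.
    """
    matches = []
    for product, colors in product_colors.items():
        if not colors:
            continue
        # Distance from the query to the nearest of this product's colors.
        best = min(math.dist(query, color) for color in colors)
        if best <= delta:
            matches.append((best, product))
    return [product for _, product in sorted(matches)]

# Hypothetical catalog: a few dominant colors per product.
catalog = {
    "red mug": [(208, 16, 16)],
    "blue shirt": [(16, 16, 208), (240, 240, 240)],
    "maroon scarf": [(144, 16, 48)],
}

print(matching_products((220, 20, 20), catalog, delta=60))   # ['red mug']
print(matching_products((220, 20, 20), catalog, delta=100))  # ['red mug', 'maroon scarf']
```

A small delta returns only close matches, and widening it pulls in neighbors like the maroon scarf, which is the "more choices" trade-off mentioned above.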
For a retailer, doing the above is simply a matter of processing the images already in its system and creating the color index. However, if this were done by a search engine, the search engine would first have to retrieve each product image for processing.
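As for what the color index itself might look like, one possibility (purely my assumption, not how any particular site does it) is an inverted index from coarse color buckets to product ids, where a lookup also checks the neighboring buckets so that near misses are still found. All names here (`bucket_of`, `build_color_index`, `lookup`, the `sku-` ids) are invented for the example.

```python
from collections import defaultdict
from itertools import product as cartesian

def bucket_of(color, step=32):
    """Map an (r, g, b) color to a coarse bucket coordinate."""
    return tuple(c // step for c in color)

def build_color_index(product_colors, step=32):
    """Build bucket -> set of product ids from product id -> list of colors."""
    index = defaultdict(set)
    for product_id, colors in product_colors.items():
        for color in colors:
            index[bucket_of(color, step)].add(product_id)
    return index

def lookup(index, query, step=32):
    """Return product ids in the query's bucket or any adjacent bucket."""
    qr, qg, qb = bucket_of(query, step)
    found = set()
    # Visit the 3 x 3 x 3 block of buckets around the query color.
    for dr, dg, db in cartesian((-1, 0, 1), repeat=3):
        found |= index.get((qr + dr, qg + dg, qb + db), set())
    return found

# Hypothetical catalog of already-extracted product colors.
catalog = {
    "sku-1": [(208, 16, 16)],
    "sku-2": [(16, 16, 208)],
    "sku-3": [(200, 40, 30)],
}
index = build_color_index(catalog)
print(sorted(lookup(index, (220, 20, 20))))  # ['sku-1', 'sku-3']
```

The index is built once when the images are processed, so a search only touches a handful of buckets instead of comparing against every product.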