We’re using eCommerce data to predict automobile sales. In principle, it’s very straightforward. We use a Python web scraping [1] script to collect star ratings of products that people are likely to buy if they have just purchased, or are planning to purchase, a new car. We’re assuming that 12.5% of the people buying these products and writing reviews are planning to purchase a new car, while the rest are existing owners. Products such as:

We’ve selected about 7,000 products and track, in real time, the number of reviews being written on Flipkart, Amazon and Snapdeal. The hypothesis is that more sales lead to more reviews, and vice versa (with reviews as the independent variable).
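The real-time check can be sketched as a diff between two consecutive scrapes. This is a minimal sketch, not the actual script: it assumes a hypothetical snapshot format that maps (platform, product ID) to the cumulative review count shown on the product page, and it leaves the scraping itself out.

```python
from collections import Counter

# Hypothetical snapshot format: {(platform, product_id): cumulative_review_count}
Snapshot = dict[tuple[str, str], int]


def new_reviews(previous: Snapshot, current: Snapshot) -> Counter:
    """Count reviews written between two snapshots, summed per platform."""
    totals: Counter = Counter()
    for key, count in current.items():
        platform, _product_id = key
        # A product seen for the first time only establishes a baseline (delta 0)
        delta = count - previous.get(key, count)
        # Ignore decreases, e.g. reviews removed by the platform
        totals[platform] += max(delta, 0)
    return totals
```

Comparing snapshots rather than re-reading full review lists keeps each polling pass cheap across thousands of products.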

However, the relationship is not linear. If you buy a product that turns out to be terrible, you’re more likely to write a review; if the product works well, you’re unlikely to write one. We quantified this using an anthropological survey. The results are below:

Fig 1: Numbers are in percentages. There’s a 4.1% chance of you writing a bad review and giving a one-star rating if the product turns out to be terrible. This number falls as quality improves and rises only slightly when there’s customer delight. [2] Once you’ve normalised for this, you can get a real-time estimate of the number of people who might be interested in buying cars.
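One way to read the normalisation is as dividing each rating level’s review count by the probability that a buyer at that level writes a review. A minimal sketch, assuming the Fig 1 percentages are supplied as fractions (4.1% becomes 0.041); the function name `estimate_buyers` is hypothetical, and only the one-star rate is stated in the text, so the other rates must come from Fig 1.

```python
def estimate_buyers(reviews_by_star: dict[int, int],
                    review_rate_by_star: dict[int, float]) -> float:
    """Scale observed review counts up to an estimated number of buyers.

    review_rate_by_star[s] is the assumed probability (a fraction, not a
    percentage) that a buyer who would rate the product s stars writes a review.
    """
    total = 0.0
    for star, count in reviews_by_star.items():
        rate = review_rate_by_star[star]
        if rate <= 0:
            raise ValueError(f"review rate for {star} stars must be positive")
        # Each review stands in for 1 / rate buyers at that rating level
        total += count / rate
    return total
```

For example, `estimate_buyers({1: 41}, {1: 0.041})` gives roughly 1,000 buyers. This simple inverse-probability correction is an assumption about how the normalisation works, not a reproduction of Fig 2.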

Fig 2: The normalisation values, with a sample calculation of how to estimate sales.
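The last step of a sample calculation might look like the sketch below, which applies the 12.5% assumption to a normalised buyer estimate. The constant and function names are hypothetical, and the sketch ignores the margin of error.

```python
# From the text: 12.5% of reviewers are assumed to be planning a new-car purchase
PROSPECTIVE_SHARE = 0.125


def prospective_car_buyers(estimated_product_buyers: float,
                           share: float = PROSPECTIVE_SHARE) -> float:
    """Convert normalised product-buyer estimates into prospective car buyers."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be a fraction between 0 and 1")
    return estimated_product_buyers * share
```

With 1,000 estimated product buyers, this gives 125 prospective new-car buyers.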

The margin of error is high, but this hedge fund strategy serves as a reliable indicator of quarterly sales trends for Indian car manufacturers. For longer-term investments, that’s all an investor needs.