What is the right way to build a recommender system for a startup?

A recommender system is the kind of service that every B2C startup needs. It can improve sales and user experience, while at the same time helping you understand your customers better.

So, what is the best strategy for building a recommender system?

If you are setting up a new business, it can be hard to know how best to set up a recommender system. First of all, you need the right data strategy. Secondly, you need to understand what the key performance metrics for your recommender are. Are you trying to improve purchases, impressions or some other metric (e.g. conversion rate)? Finally, you need to establish the right machine learning strategy for the recommender system, since the types of models you’ll need differ depending on your user base, the volume of data you have and how mature your company is.

So, let’s tackle each point in turn.

Building the right data strategy

I have talked about this briefly in another post, but there are a few points worth reiterating. First, it is very important to understand that without the right data, the whole recommender thing goes out of the window. There is a saying in data science: garbage in, garbage out. Think carefully about what kinds of variables will influence a user’s decision to buy or not, and try to collect as many of them as you can. Is gender relevant? Age? Location? Maybe you even want to provide your users with a questionnaire, so that you can collect more information about their preferences.

Performance metrics

What is it that you are trying to achieve? If you have an e-commerce site, you care about people buying products. If you have a job board, you care about matching people with jobs. If you have a dating website, you care about people interacting with each other. In every case, the key performance indicators are different, and you need to think about them very carefully.
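
To make this concrete, here is a minimal sketch of two common metrics for an e-commerce recommender: click-through rate and conversion rate. The RecommendationLog class, its field names and the numbers are all hypothetical, made up purely for illustration. Your own definitions may differ, for example you might count purchases per impression rather than per click.

```python
from dataclasses import dataclass


@dataclass
class RecommendationLog:
    """Hypothetical aggregate counts for one recommender over some period."""
    impressions: int  # times a recommendation was shown
    clicks: int       # times a shown recommendation was clicked
    purchases: int    # purchases that followed a click


def click_through_rate(log: RecommendationLog) -> float:
    """Share of impressions that led to a click."""
    return log.clicks / log.impressions if log.impressions else 0.0


def conversion_rate(log: RecommendationLog) -> float:
    """Share of clicks that led to a purchase (one possible definition)."""
    return log.purchases / log.clicks if log.clicks else 0.0


# Made-up numbers, only to show the calculation.
log = RecommendationLog(impressions=2000, clicks=100, purchases=5)
print(f"CTR: {click_through_rate(log):.1%}")
print(f"Conversion rate: {conversion_rate(log):.1%}")
```

Whichever definitions you choose, write them down early, because they determine which events you need to log.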

Setting up the right recommender system strategy

So, in an ideal world you have a large user base, and lots of clean and detailed data about your users. In the real world, things do not always work out this way.

When you create a new business, you will face the cold start problem. This is when you have no data to start with, so you need to work based on intuition and build up from there. The first recommender to build is usually a content-based filtering system. A content-based filtering system simply assumes that the products and the users can each be described as a vector of attributes/features. The user’s feature vector is determined by purchasing behaviour, viewing behaviour (e.g. what the user clicks on) and maybe a survey. The system’s job is simply to calculate the similarity between a user and a product.
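
As a rough sketch of that idea, the snippet below scores products by the cosine similarity between a user vector and each product vector. Cosine similarity is just one common choice of similarity measure, not the only one. The three features, the products and all the values are hypothetical, chosen only to illustrate the mechanics.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(user_vector, product_vectors, top_n=2):
    """Return the names of the top_n products most similar to the user."""
    scores = {
        name: cosine_similarity(user_vector, vector)
        for name, vector in product_vectors.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical feature order: [sports, electronics, books]
products = {
    "running shoes": [1.0, 0.0, 0.0],
    "fitness tracker": [0.7, 0.7, 0.0],
    "novel": [0.0, 0.0, 1.0],
}

# Hypothetical user profile, e.g. derived from clicks and a short survey.
user = [0.9, 0.3, 0.1]

print(recommend(user, products))  # ['running shoes', 'fitness tracker']
```

Notice that nothing here needs historical data from other users, which is why this approach works on day one.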

This is a very crude recommender system, but it can perform well if you design it carefully. Plus, with no data, usually you don’t have much choice.

Once you accumulate more data you can move on to collaborative filtering or a supervised learning system. You can also combine all approaches to create a hybrid recommender system. Collaborative filtering usually requires lots of data points in order to work well, whereas with supervised learning you might be able to create something that works well in the meantime.
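
One simple way to combine approaches is a weighted blend, where the weight on the collaborative score grows as you observe more interactions for a user. The sketch below assumes you already have a content-based score and a collaborative score on the same scale. The function name, the weighting rule and the constant k are illustrative assumptions, not a standard recipe.

```python
def hybrid_score(content_score, collaborative_score, n_interactions, k=20):
    """Blend two scores; trust collaborative filtering more as data accumulates.

    k is a hypothetical tuning constant: with k interactions observed,
    both scores get equal weight.
    """
    weight = n_interactions / (n_interactions + k)
    return (1 - weight) * content_score + weight * collaborative_score


# Brand new user: the blend falls back entirely on the content-based score.
print(hybrid_score(0.8, 0.2, n_interactions=0))

# Active user: the collaborative score dominates.
print(hybrid_score(0.8, 0.2, n_interactions=180))
```

In practice you would tune the weighting against your key performance metric rather than fix it by hand.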

An algorithm that has had great success recently in recommender systems is the factorisation machine. Factorisation machines can take into account interactions between input variables, much like a regression with interaction terms, while also being able to work with sparse data, run in linear time and achieve state-of-the-art results.
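
For the curious, here is a minimal sketch of how a second-order factorisation machine makes a prediction. Each feature gets a small latent vector, and the interaction between two features is the dot product of their vectors. The first function uses the well-known rewriting of the pairwise sum that makes it linear in the number of features; the second is the naive pairwise version, included only to check that both agree. All weights below are made-up values, and training is not shown.

```python
from itertools import combinations


def fm_predict(x, w0, w, V):
    """Factorisation machine prediction in time linear in len(x).

    x: feature values, w0: bias, w: linear weights,
    V: one latent vector (list of k floats) per feature.
    """
    n, k = len(x), len(V[0])
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model, summing over every pair of features explicitly."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )
    return w0 + linear + pairwise


# Hypothetical sparse input: one-hot user (first two slots) + one-hot item.
x = [1.0, 0.0, 1.0, 0.0]
w0 = 0.1
w = [0.2, 0.0, -0.1, 0.3]
V = [[0.5, 0.1], [0.2, 0.4], [0.3, -0.2], [0.1, 0.1]]

print(fm_predict(x, w0, w, V))
print(fm_predict_naive(x, w0, w, V))
```

Because most entries of x are zero in a typical recommender dataset, a real implementation would also skip the zero features, which is where the ability to handle sparse data pays off.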

The right plan for a recommender system

So, if I had to provide a quick summary of the right way to set up a recommender system strategy for a startup, I’d say the following:

Think how the user makes a decision: Get some intuition into the features that are the most important. If you are a job board, then the salary is important. If you are a retailer, think about all the different variables that can influence the purchase of a product.
Come up with the key performance metrics: What do you care about? Increasing sales? People clicking on news stories? Or people also spending a considerable amount of time on a news story after having clicked on it?
Set up the right data strategy: Make sure you collect enough data to measure accurately the key performance metrics. Also, make sure you are collecting enough data from your users, which can be used to make accurate recommendations.
Set up a content-based filtering system: This can help you beat the cold start problem. You can start running recommendations without data.
Then move on to a supervised learning model: A supervised learning model (e.g. a factorisation machine) will provide better performance, once you have collected enough data.
Move towards collaborative filtering or a hybrid recommender: As your user base approaches the thousands, you have enough data to use a combination of approaches. Hybrid recommender systems are the best choice in terms of performance, as long as you have enough data.