
The Data Scientist


Which sports can be predicted? Part 2

Wanna become a data scientist within 3 months, and get a job? Then you need to check this out !

In this post we continue our discussion of which sports can be predicted. In part 1 we discussed football and basketball, and we saw that it is possible to predict these sports to some extent. In this post we will deal with cricket, NFL, baseball and hockey. If you are interested in the subject, make sure to also check out my course in sports prediction!


Cricket is a sport that is popular in the United Kingdom and in other Commonwealth countries. So far, the only paper I am aware of on predicting cricket is one I was involved in, which you can find here: Using Machine Learning to Predict the Outcome of English County twenty over Cricket Matches.


The National Football League (NFL) is the most popular sports league in the United States. It is extremely popular in fantasy sports, with sites such as FanDuel and DraftKings having more than 100k players each.

There has been some work on predicting NFL outcomes. However, because the NFL is played only in the United States, where sports betting has historically been heavily restricted, there has been a considerably smaller body of work than one would expect.

For a formal statistical approach, check out this paper: A State-Space Model for National Football League Scores. This paper claims to perform better than the Las Vegas bookmakers for the 1993 season. However, the paper is relatively old, and the model has not been tested extensively.

The blog fivethirtyeight also provides predictions for NFL.

Microsoft’s Cortana also provides predictions for NFL (among other things).

There has also been some work on using Twitter to predict NFL outcomes: Predicting the NFL Using Twitter, which is similar to my paper.

So, predicting NFL games is definitely feasible. Cortana’s predictions, at least, are quite accurate. Now, the question of whether the models are good enough to make money is open, since, at least in my research, I couldn’t find any benchmarks.

What is way more popular, however, is fantasy NFL analysis. This MIT thesis is a good piece of work: Interactive tools for fantasy football analytics and predictions using machine learning. There are also some companies that are active in this area such as Swish Analytics.


Baseball was the sport where sports analytics started. Moneyball is the most famous example, but sabermetrics has been around since well before that. It's possible to find academic papers on baseball dating back to 1962. PECOTA is an algorithm that makes baseball predictions, which can be found here.

A recent student project from the Stanford Computer Science department attempted to predict baseball games. The results were average, not performing better than the Las Vegas benchmark. A Google search turns up a few more, such as this.

So, how accurate can baseball predictions be? An interesting post by fivethirtyeight reports that the theoretical minimum root mean squared error that can be achieved in baseball is 6.4 and that no method currently comes close to that.
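A number of this size is roughly what pure chance alone would produce. The sketch below is a back-of-the-envelope illustration, not necessarily the calculation used in the fivethirtyeight post. It assumes that the error is measured in season wins, that a regular season has 162 games, and that every game is an independent 50/50 coin flip. The function name coin_flip_rmse is hypothetical.

```python
import math


def coin_flip_rmse(games: int, win_prob: float = 0.5) -> float:
    """Standard deviation of a team's win total under a binomial model,
    i.e. when each game is an independent draw with the same win probability.

    This is the RMSE of a forecast that predicts the expected win total.
    """
    return math.sqrt(games * win_prob * (1.0 - win_prob))


# Assumption: 162-game season, evenly matched teams (win_prob = 0.5).
print(round(coin_flip_rmse(162), 2))  # 6.36
```

Under these assumptions, even a forecaster who knows that every team is exactly average would still miss the final win total by about 6.4 wins on average, simply because of luck.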


There has been limited work in predicting hockey games. A paper from 2013 achieved an accuracy of close to 59% on National Hockey League games. The theoretical limit seems to lie at 62%, as explained in this blog post.
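To see why a ceiling like this exists at all, consider a predictor that knows the true win probability of the favourite in every game. The best it can do is always pick the favourite, and it will still be wrong whenever the underdog wins. The sketch below illustrates this idea only. The probabilities are made-up numbers chosen so that the result lands at 62%, and the function best_possible_accuracy is hypothetical. It is not the method used in the blog post.

```python
def best_possible_accuracy(favourite_probs):
    """Expected accuracy of a predictor that knows each game's true
    win probability and always picks the more likely winner."""
    if not favourite_probs:
        raise ValueError("need at least one game")
    # For each game, the chance of being right is the larger of p and 1 - p.
    return sum(max(p, 1.0 - p) for p in favourite_probs) / len(favourite_probs)


# Hypothetical true win probabilities for the favourite in four games.
games = [0.55, 0.60, 0.65, 0.68]
print(round(best_possible_accuracy(games), 2))  # 0.62
```

The closer the games are to coin flips, the lower this ceiling is, which is consistent with a league where luck plays a large role.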
