Deep learning has become one of the hottest buzzwords in the world of tech. Like all buzzwords, it has raised questions and created expectations, but it has also given birth to some amazing solutions to problems that couldn’t be solved before.
So, what is a deep neural network?
A deep neural network is simply a neural network with many layers. That’s all there is to it, really.
In the figure below, you can see a simple neural network on the left and a deep neural network on the right. The difference is clearly visible. The extra layers provide a huge increase in computational power, which has allowed deep neural networks to reach amazing performance in multiple tasks.
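To make the "many layers" point concrete, here is a minimal sketch in plain Python. The layer sizes, the tanh activation and the helper names (make_network, forward) are my own illustrative assumptions, not something taken from the figure. The only thing that changes between the two networks is the number of layers.

```python
import math
import random


def make_network(layer_sizes, seed=0):
    """Build random weights and zero biases for a fully connected network.

    layer_sizes lists the width of every layer, input first, output last.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, inputs):
    """Pass the inputs through each layer in turn, applying tanh after each one."""
    activations = inputs
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations


# Hypothetical sizes: 3 inputs and 2 outputs in both cases.
shallow = make_network([3, 4, 2])        # one hidden layer
deep = make_network([3, 4, 4, 4, 4, 2])  # four hidden layers

x = [0.5, -0.2, 0.1]
shallow_out = forward(shallow, x)
deep_out = forward(deep, x)
print(len(shallow), "weight layers ->", shallow_out)
print(len(deep), "weight layers ->", deep_out)
```

The forward function is identical for both networks. Depth is purely a matter of how many (weights, biases) pairs the input passes through.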
Until fairly recently, it was believed that additional layers do not really help neural networks with anything. This was partly due to two reasons. First, Cybenko had proven that neural networks with a single hidden layer are universal approximators (this proof was later generalised to feedforward neural networks by Kurt Hornik). Secondly, there were issues with using gradient descent to train networks with multiple layers. Somehow, many researchers and practitioners misread these two points as meaning that one or two layers were enough.
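One commonly cited difficulty with gradient descent in networks with many layers is the vanishing gradient: the chain rule multiplies one derivative per layer, and with sigmoid units each of those factors is at most 0.25. The text above does not name the issue, so treat this as my assumption about what is meant. The sketch below uses a deliberately simplified chain of one-unit sigmoid layers with every weight set to 1 (also an assumption, chosen to keep the arithmetic visible).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth, x=0.5, weight=1.0):
    """Gradient of the output with respect to the input for a chain of
    `depth` one-unit sigmoid layers, computed with the chain rule."""
    activation = x
    gradient = 1.0
    for _ in range(depth):
        activation = sigmoid(weight * activation)
        # d sigmoid(w * a) / d a = w * s * (1 - s), which is at most 0.25 * w
        gradient *= weight * activation * (1.0 - activation)
    return gradient


for depth in (1, 2, 5, 10):
    print(depth, input_gradient(depth))
```

The printed gradient shrinks rapidly as the depth grows, which means the earliest layers receive almost no learning signal in this toy setting.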
Neural networks fell out of fashion in the 90s. In that period there were three important advancements in machine learning. The first was establishing parts of machine learning theory on Bayesian statistics and integrating it with probabilistic reasoning. The second was the development of support vector machines, a linear model that was very successful, more transparent and better grounded in theory than neural networks (which are largely black boxes). Finally, the appearance of boosting and random forests provided very good and fast algorithms that could work very well on many problems, straight out of the box. As a result, academic publications on neural networks became rarer.
Deep neural networks are born
In 2006, Geoff Hinton, Osindero and Teh published a series of articles showing how they could efficiently train a neural network with multiple layers. The type of network they used was called a deep belief net. Hinton mentions in interviews that one reason he came up with the terminology “deep learning” is that he found it very difficult to publish work on neural networks, because of the bad reputation they had acquired in academia by that point.
A few years later, neural networks have become one of the most important fields of machine learning, giving us unparalleled performance in anything related to computer vision, audio, text and traditional ML tasks.
So, what is it that research in the previous decades had missed and deep learning is doing? From a practical viewpoint, deep learning is very good at extracting features. This is what the picture below is showing. In a traditional ML task, a large part of the time is spent extracting the right features. By features I mean not only finding the right variables, but also combining these variables in a meaningful way. For some fields, such as econometrics, this is easier and more fundamental, since transparency and interpretability are key. However, for fields such as computer vision this is very difficult.
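Here is a small sketch of what "combining variables in a meaningful way" can look like. It uses the classic XOR problem and a perceptron (a simple linear classifier); both are my own choice of illustration, and the function names are hypothetical. On the two raw variables no linear model can get every point right, but once the handcrafted product x1 * x2 is added as a third feature the problem becomes linearly separable.

```python
def train_perceptron(samples, labels, epochs=1000):
    """Train a perceptron (a linear classifier) and return (weights, bias)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            score = sum(w * v for w, v in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified (labels are -1 or +1)
                weights = [w + y * v for w, v in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return weights, bias


def accuracy(model, samples, labels):
    """Fraction of samples whose predicted sign matches the label."""
    weights, bias = model
    correct = 0
    for x, y in zip(samples, labels):
        score = sum(w * v for w, v in zip(weights, x)) + bias
        predicted = 1 if score > 0 else -1
        if predicted == y:
            correct += 1
    return correct / len(samples)


# XOR: the label is +1 when exactly one of the two inputs is 1.
raw = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, -1]

# Handcrafted feature: append the product x1 * x2 to the raw variables.
engineered = [(x1, x2, x1 * x2) for x1, x2 in raw]

raw_acc = accuracy(train_perceptron(raw, labels), raw, labels)
engineered_acc = accuracy(train_perceptron(engineered, labels), engineered, labels)
print("raw variables:", raw_acc)
print("with x1 * x2: ", engineered_acc)
```

In this toy case a human supplied the useful combination. The appeal of a network with hidden layers is that it can learn combinations like this from the data instead.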
The human visual system, from the eye down to the visual cortex, contains multiple feature extractors. We have dedicated systems for detecting edges, simple shapes and higher level features such as human faces. Handcrafting a system to do the same is very difficult, but this is where deep neural networks really shine.
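To give a feel for the lowest level of that hierarchy, here is a minimal sketch of a handcrafted edge detector: a small 3x3 filter that is slid across a toy image. The image, the filter values and the function name slide_filter are all illustrative assumptions. Networks used for vision typically learn filters of this kind from data rather than having them written by hand, and stack many of them to build up higher level features.

```python
def slide_filter(image, kernel):
    """Slide the kernel over the image (no padding) and return the response map.

    This is a cross-correlation, the operation that deep learning libraries
    usually refer to as convolution.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                kernel[i][j] * image[r + i][c + j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Toy 4x6 greyscale image: dark on the left (0), bright on the right (1).
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Handcrafted vertical-edge detector: right column minus left column.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = slide_filter(image, vertical_edge)
for row in response:
    print(row)
```

The response is zero over the flat dark and bright regions and non-zero only where the window straddles the boundary between them. Writing one such filter by hand is easy. Writing enough of them to recognise a face is not.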
The way that neural networks extract features, and what types of features they extract, is still an active field of investigation, since this becomes very difficult to analyse in very deep architectures that consist of tens of layers. However, Google has produced some great work in this area. On their research blog you can find some cool examples of feature extraction from the layers of a deep neural network. The graphic below shows how a single neuron responds to different areas within a picture.
You can see that as the visual field (represented by a red square) moves around, the algorithm is trying to decide whether this ear looks more like that of a Labrador or of some other kind of dog or animal.
Pros and cons of deep neural networks
So, this was a short overview of deep learning. But what are the pros and cons of deep learning? I am not talking from an academic perspective, but from the perspective of a decision maker, be it a startup founder or the CEO of a large corporation. Below is what I believe is a good list: the first three points are pros and the remaining five are cons.
- Deep learning has solved problems in computer vision and audio much better than anything before. The same also holds for advanced natural language processing (e.g. translation or speech-to-text). If you are facing any of those problems, then this is the way to go.
- The fact that it is still an active field means that there is potential for innovation. Some startups have based their whole business model on creating deep learning models optimised for a single goal (e.g. detecting animals in images or translating from voice to text).
- Deep learning has been commoditised to some extent through services such as Cloud Vision, Cloud Translate or IBM’s Watson. So, for some applications you might be able to roll something out quickly by using these services.
- Requires large datasets. These are not always available, especially for a new business, since the data either might not exist yet (e.g. user data) or might be too expensive to acquire.
- Requires lots of computing power, hence higher cost.
- Deep learning experts are a subset of the data science community. Hence, if there is a shortage of data scientists, there is an even larger shortage of deep learning experts.
- Even though there is some work being done on explaining how deep neural nets learn, neural networks are still black boxes. If you are interested in interpretability, you are better off using linear regression, for example, or some other statistical model.
- Deep learning requires LOTS of tuning. Quite often you can get very good performance in an ML task just by using a random forest or XGBoost. If you are interested in the subject, you can find some excellent tips about tuning deep nets here.
So, it’s time to answer the question in the title of the article. Deep learning is an amazing tool, as long as:
- You are using it in the right context.
- You have the resources (data, computing, human expertise) to utilise it.
Deep learning is not a magic bullet and deep neural networks are not the type of model that magically works well out of the box. The latter description is more suited to random forests. Deep learning can be expensive and difficult to use, and unless you are in a field where deep learning is the dominant force (e.g. object recognition in images), you are better off using some other ML tool.