The Bank of England recently released a very interesting report about the use of machine learning in the UK’s financial services. One of the things that stands out in this report is that the UK financial sector is adopting machine learning more and more. In fact, the penetration of artificial intelligence and machine learning in banking extends well beyond the UK.
For example, JPMorgan told investors it had gone “all in on AI.” Also, HSBC opened data science innovation labs in Toronto and London, Citigroup is using AI to fight fraud, Bank of America has an AI-powered customer service bot, and Capital One says it uses AI in all its operations.
The steps in a machine learning pipeline
The report provides a very nice summary of the different steps in a machine learning pipeline. There are several frameworks for data science and machine learning pipelines. A popular one, for example, is CRISP-DM, whereas the scikit-learn community has come up with its own flowchart. In any case, the pipeline presented by the report is worth reposting. According to it, a machine learning pipeline consists of these four steps:
- Feature selection and engineering (section 5.2): Choosing the most relevant variables and creating derived ones (including, for example, through dimensionality reduction);
- Model engineering and performance metrics (section 5.3): Model selection, optimisation of model parameters and model analysis (evaluation of model performance);
- Model validation (section 5.4): Testing if the model works as expected, which includes among other things the interpretation of how the model works;
- Deployment and safeguards (section 5.6): Implementing the model in the business and setting up safeguards to manage potential risks.
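To make these four steps more tangible, here is a toy sketch in plain Python. It is not taken from the report: the data, the column meanings, the decision-stump model and the range-check safeguard are all hypothetical choices made purely for illustration, and a real project would normally use a library such as scikit-learn.

```python
from statistics import pvariance

# Hypothetical toy data: (income, debt_ratio, constant_flag); label 1 = default.
ROWS = [
    (20.0, 0.90, 1.0), (25.0, 0.80, 1.0), (30.0, 0.85, 1.0), (35.0, 0.70, 1.0),
    (60.0, 0.30, 1.0), (65.0, 0.20, 1.0), (70.0, 0.25, 1.0), (80.0, 0.10, 1.0),
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]


def select_features(rows, min_variance=1e-9):
    """Step 1 (feature selection): keep columns that actually vary."""
    n_cols = len(rows[0])
    return [j for j in range(n_cols)
            if pvariance([r[j] for r in rows]) > min_variance]


def accuracy(predictions, labels):
    """Performance metric: share of correct predictions."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def predict(model, row):
    below = row[model["column"]] < model["threshold"]
    return model["label_below"] if below else 1 - model["label_below"]


def fit_stump(rows, labels, columns):
    """Step 2 (model engineering): pick the single-threshold rule
    with the best training accuracy."""
    best = None
    for j in columns:
        values = sorted({r[j] for r in rows})
        for lo, hi in zip(values, values[1:]):
            for label_below in (0, 1):
                candidate = {"column": j, "threshold": (lo + hi) / 2,
                             "label_below": label_below}
                score = accuracy([predict(candidate, r) for r in rows], labels)
                if best is None or score > best["train_accuracy"]:
                    best = dict(candidate, train_accuracy=score)
    return best


def guarded_predict(model, row, bounds):
    """Step 4 (deployment safeguard): refuse inputs outside the training range."""
    lo, hi = bounds
    if not lo <= row[model["column"]] <= hi:
        raise ValueError("input outside training range; route to manual review")
    return predict(model, row)


# Hold out two rows for validation; train on the rest.
train_rows, train_labels = ROWS[0:3] + ROWS[4:7], LABELS[0:3] + LABELS[4:7]
test_rows, test_labels = [ROWS[3], ROWS[7]], [LABELS[3], LABELS[7]]

columns = select_features(train_rows)                  # step 1
model = fit_stump(train_rows, train_labels, columns)   # step 2
holdout_accuracy = accuracy(                           # step 3 (validation)
    [predict(model, r) for r in test_rows], test_labels)
train_values = [r[model["column"]] for r in train_rows]
bounds = (min(train_values), max(train_values))        # used by step 4

print(columns, model, holdout_accuracy)
print(guarded_predict(model, (50.0, 0.5, 1.0), bounds))
```

The safeguard here is deliberately crude: it only checks that the input falls inside the range seen during training, and anything else is handed back for manual review instead of being scored.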
The machine learning pipeline looks like this:
[Figure: the machine learning pipeline, as presented in the report]
Data acquisition is the most important step, since without the right data it is impossible to create models of good enough quality. The report provides a very nice overview of the different types of data: structured, semi-structured and unstructured.
There is a common misconception that only structured data is really useful. However, we now have so many great algorithms for unstructured data that this is no longer the case. For example, domains where the data is predominantly semi-structured or unstructured, like social media, hold a wealth of information.
Financial institutions have picked up on this trend, with two-thirds of the institutions reporting that they are using semi-structured or unstructured data in their machine learning projects.
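As a rough illustration of how semi-structured and unstructured data can end up as model inputs, here is a minimal Python sketch. The record, its field names and the keyword list are all hypothetical, and keyword flags are only the simplest possible way of turning free text into features.

```python
import json
import re

# Hypothetical semi-structured record: nested fields plus a free-text note.
raw = """
{
  "customer_id": 42,
  "profile": {"country": "UK", "segments": ["retail", "mortgage"]},
  "note": "Customer asked about overdraft fees, sounded unhappy about charges."
}
"""


def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted column names; lists become counts."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        elif isinstance(value, list):
            flat[name + ".count"] = len(value)
        else:
            flat[name] = value
    return flat


def text_features(text, keywords=("fees", "charges", "unhappy")):
    """Turn unstructured text into simple 0/1 keyword flags."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return {f"mentions_{k}": int(k in tokens) for k in keywords}


record = flatten(json.loads(raw))   # semi-structured -> flat columns
note = record.pop("note")           # unstructured part
record.update(text_features(note))  # unstructured -> numeric features

print(record)
```

After these two passes the record is an ordinary flat row that can sit next to structured data in the same table.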
Another interesting finding of this report is the range of ways in which financial institutions validate their pipelines. Quoting the report:
The most common method is outcome-focussed monitoring and testing against benchmarks, both before and after deployment. This enables firms to scrutinise how ML models would have performed historically in terms of profitability, customer satisfaction or pricing, for example. Data quality validation — including detecting errors, biases and risks in the data — is the next most frequently used method.
The report summarises the different kinds of methods that can be used to validate a model. A benchmark is by far the easiest performance test to run, and the results are easy to communicate to upper management. Understanding the features and the quality of the data is very important as well, but these techniques are probably only well understood by the technical teams.
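A benchmark test of this kind can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: the historical data, the rule-based "model", the majority-class benchmark and the 0.05 minimum uplift are all hypothetical, and the report does not prescribe any particular benchmark.

```python
from collections import Counter


def majority_class_benchmark(train_labels):
    """Naive benchmark: always predict the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _features: most_common_label


def accuracy(predict, rows, labels):
    """Share of rows for which predict(row) matches the label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def passes_benchmark(model, benchmark, rows, labels, min_uplift=0.05):
    """Return (ok, model_accuracy, benchmark_accuracy) on historical data."""
    model_acc = accuracy(model, rows, labels)
    bench_acc = accuracy(benchmark, rows, labels)
    return model_acc - bench_acc >= min_uplift, model_acc, bench_acc


# Hypothetical historical transactions: amount and fraud label (1 = fraud).
train_labels = [0, 0, 0, 1, 0, 0, 1, 0]
amounts = [120, 5000, 80, 2500, 300, 40, 1500, 900, 60, 700]
labels = [0, 1, 0, 1, 0, 0, 0, 0, 0, 1]


def model(amount):
    """Hypothetical candidate model: flag large transactions as fraud."""
    return int(amount > 1000)


benchmark = majority_class_benchmark(train_labels)
ok, model_acc, bench_acc = passes_benchmark(model, benchmark, amounts, labels)

print(f"model={model_acc:.2f} benchmark={bench_acc:.2f} passes={ok}")
```

The same check can be rerun on fresh data after deployment, and the output boils down to two numbers and a pass/fail flag, which is part of why benchmark results are so easy to report upwards.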
The future lies in machine learning
The financial industry is one of the most data-rich and sophisticated industries. It makes sense that, as time goes by, AI and machine learning will penetrate this sector more and more. The greatest barrier to adoption of machine learning methods seems to be the existence of legacy systems. This means that many firms want to change, but find that it takes a long time.
The second most common barrier to adoption is not having the right culture. This is something that we have talked about many times in the past. The right data culture can often be the main difference between the organisations that successfully adopt data science and those that don’t. If you want to learn more about this topic, make sure to check out my courses and my book.