Something I have been asked in the past is the extent to which data science can be automated. We discussed in a previous post how machine learning and AI are automating other parts of the economy. However, is it possible to actually automate data science itself?
The answer is not a definite ‘yes’, but I believe it is quite likely we can get close to it. There have been attempts in the last few years to automate data analysis. Notable mentions include:
- Cambridge’s Google-funded Automatic Statistician
- Wolfram’s Data Science Platform
- MIT’s Data Science Machine
I have also stumbled upon a few companies such as DataRobot and Nutonian. Also, NIPS 2016 had a workshop on the same topic. DARPA seems to be catching up too, announcing a program called D3M.
There are things that can’t be automated yet, such as domain knowledge or human intuition, but it seems that much of the data analysis process can be. I believe we are going to see huge progress in this field within the next decade.
Note that these projects are far from anything related to general artificial intelligence. They are simply using domain knowledge and heuristics. However, this is enough to automate a large part of a data scientist’s work and increase productivity.
On my side, I have been working on a project called ADAN, which stands for the Automated Data ANalyst. ADAN can work with classification and regression problems. Once you load the data, it automatically cleans and formats it. From there, it supports two modes of operation:
- Automated black box predictive modelling: ADAN tries out different solutions and delivers the best one.
- Automated Equation Modelling: ADAN discovers one or more simple regression equations for solving the problem.
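To give a feel for the first mode, here is a minimal sketch of what "try out different solutions and keep the best one" can look like. This is not ADAN's code: the candidate models, the function names (`fit_mean`, `fit_single_feature`, `select_best`) and the synthetic data are all assumptions made up for illustration. It simply scores each candidate with k-fold cross-validation and returns the one with the lowest error.

```python
import random
import statistics


def fit_mean(X, y):
    """Baseline candidate: always predict the training mean."""
    mu = statistics.fmean(y)
    return lambda row: mu


def fit_single_feature(j):
    """Build a fitter for least-squares regression on feature j only."""
    def fit(X, y):
        xs = [row[j] for row in X]
        mx, my = statistics.fmean(xs), statistics.fmean(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        slope = sxy / sxx if sxx else 0.0
        intercept = my - slope * mx
        return lambda row: intercept + slope * row[j]
    return fit


def mse(model, X, y):
    """Mean squared error of a fitted model on (X, y)."""
    return statistics.fmean([(model(row) - t) ** 2 for row, t in zip(X, y)])


def select_best(X, y, candidates, k=5, seed=0):
    """Return (best_name, scores), where scores maps name -> k-fold CV error."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = {}
    for name, fit in candidates.items():
        errs = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            model = fit([X[i] for i in train], [y[i] for i in train])
            errs.append(mse(model, [X[i] for i in fold], [y[i] for i in fold]))
        scores[name] = statistics.fmean(errs)
    return min(scores, key=scores.get), scores


# Synthetic data (made up): the target depends on feature 0 only.
rng = random.Random(1)
X = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(100)]
y = [3.0 * a + rng.gauss(0, 0.5) for a, _ in X]

candidates = {
    "mean": fit_mean,
    "linear_x0": fit_single_feature(0),
    "linear_x1": fit_single_feature(1),
}
best, scores = select_best(X, y, candidates)
print(best, scores)
```

A real system would search over far richer model families and hyperparameters, but the loop has the same shape: fit, score on held-out data, keep the winner.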
Automated Equation Modelling is particularly useful when you need something that helps you understand your data better and is easy to implement. For example, here are the results on the MPG dataset. ADAN produces three equations of different degrees of complexity, as seen below, while also filtering out most variables and keeping only three of them.
These equations can be very easily implemented in any language, and indeed ADAN supports equation extraction in different languages (including R, Python, Java, C++, C# and PHP). ADAN can also be very easily deployed as a web API. ADAN is in version 1.0 and ready to be used or deployed in your enterprise, so if you are interested in using it just drop me a line.
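As a rough illustration of why a simple regression equation is so easy to port, the sketch below renders a linear equation as a one-line assignment in a few target languages. It is not ADAN's extraction code: `render_equation`, the variable names `x1` and `x2`, and the coefficients are hypothetical placeholders, not results from any dataset.

```python
def render_equation(intercept, terms, language="python"):
    """Render intercept + sum(coef * variable) as source code for `language`.

    `terms` maps variable names to coefficients (hypothetical values here).
    """
    parts = [repr(float(intercept))]
    parts += [f"{float(coef)!r} * {var}" for var, coef in terms.items()]
    rhs = " + ".join(parts)
    templates = {
        "python": "y = {rhs}",
        "r": "y <- {rhs}",
        "java": "double y = {rhs};",
    }
    if language not in templates:
        raise ValueError(f"unsupported language: {language}")
    return templates[language].format(rhs=rhs)


# Placeholder equation: y = 1.5 + 2.0 * x1 - 0.5 * x2
for lang in ("python", "r", "java"):
    print(render_equation(1.5, {"x1": 2.0, "x2": -0.5}, lang))
```

Because the model is just arithmetic on a handful of variables, supporting another language is mostly a matter of adding a template for its assignment syntax.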