By Yasmin Chamchoun
Data science is a new area of study that draws on disciplines such as statistics, computing, mathematics, software development, machine learning and many more. With the increasing demand for data scientists and their skill sets, should data science in fact be considered a discipline in its own right?
A statistician usually comes from a background in statistics. A software developer would have studied software development. A mathematician most definitely studied mathematics at university. So, what about a data scientist?
Well, it’s not so simple to pin down one definitive field of study. Data scientists tend to come from a range of academic backgrounds, such as statistics, computer science, mathematics, software development or a mix of these, rather than from a degree in data science as such.
Data science is more than just knowing how to use machine learning, statistics, data mining and business analytics, or having a strong coding skill set in R and Python. Dr. Ganapathi Pulipaka, Chief Data Scientist at Accenture, describes data science as “an interdisciplinary field with a number of specializations in programming, software engineering, predictive analytics, machine learning, deep learning, HPC, supercomputing, mathematics, data mining, databases (SQL, NoSQL), Hadoop, streaming analytics platform for live analysis (Apache Kafka, Apache Flink, Apache Spark, Apache Impala), IoT platforms, edge computing, fog computing, networks, statistics, web development, cloud computing, data engineering, and data visualization.”
Some may consider data science just an industry term, and prefer to follow the teaching of their own area of academic study, whether that is statistics, software development, machine learning or something else.
On the other hand, Dr. Ganapathi Pulipaka argues that “data science is no longer an industry-term. The academia has been training these professionals with rigorous projects to produce top-notch data scientists in the country and around the world. It takes more than 70 units of credit and four to five years or even longer to complete the academic work to implement algorithms in the real-world. The core curriculum offered in the doctoral programs is very intensive with research projects and foundational mathematics, statistics, and programming.
The definition of being a data scientist is holding concrete foundation as all the PhD candidates in machine learning, Computer Science, robotics, big data analytics, statistics have a special level of expertise.”
Views on whether data science should become its own distinct discipline tend to differ between data scientists, statisticians, software engineers and university academics.
Dr. David Colquhoun, Professor of Pharmacology (NPP) at UCL and statistician, states: “Data science is clearly a branch of statistics. I am worried that ‘data scientists’ will unearth a mass of non-causal correlations, which will waste a lot of time and money and, very possibly, harm people. To avoid that, anyone dealing with data needs a very strong understanding of statistics.
The mechanics of data collection can easily be mastered by a statistician, but a data harvester will find it much harder to get a good understanding of statistical inference.”
As we can see, the importance of statistics in data science is stressed by many, especially statisticians. Others, including some data scientists, say that statistics is not the core of data science but just one element of it. Dr. Ganapathi Pulipaka, Chief Data Scientist at Accenture, said:
“If you take any statistician, the whole term data science appears to be patronizing. I’m not trying to discredit the field of statistics or statisticians. However, it’s vitally important to remember that statistics is only a branch of greater data science and it’s not the data science by itself. 95% of the people with the titles data scientists are not data scientists, they could be statisticians. It’s not about writing theoretical statistical equations on operational research. It’s about applying that to any special branch of machine learning, for example, reinforcement learning in a programming language such as PyTorch, TensorFlow, Python, or R. Data science is not pure statistics.
I can’t bring a data scientist to my organization in my team who knows what is Naïve Bayes algorithm, but has no clue on how to apply that in Python or R programming language to solve a particular problem.”
Others, who identify as both statistician and data scientist, believe that data science could be diffused into individual disciplines. Ida Peltonen, Statistician/Data Scientist at the OECD in Paris, states:
“I have worked in roles emphasizing the statistical aspects as well as the data science aspects. I see data science mainly as a set of tools. With topics such as economic growth, labour market polarization and automation, I would not be a very good data scientist without having a degree in economics and in-depth knowledge of statistics though. What you should and should not do with the data, also varies between fields, so in that sense, you do need to follow the principles of your field.
On the other hand, building your data scientist capabilities is always useful. It enables changing the context you are working on, because the methods are universal. Nevertheless, you still always need to learn the context or have someone to give you the specifications. I think there is a need for data scientists that are mainly focusing on the methods, but data science should also be diffused into individual disciplines more. The data is not what it was even in the recent past and we also want and need to be more and more efficient.”
Antonio Cangiano, Software Developer & AI Evangelist at IBM and author of ‘Technical Blogging’, also holds an interesting perspective on this matter. He says, “I like to consider data scientists on the basis of the work they perform, rather than their academic credentials.
Right now, I would argue that web developers can get away with significantly less theoretical knowledge. This is due to the nature of the work (very little computer science is required for most web development) and the fact that web development has been around for much longer (at least when compared to modern incarnations of data science).
A lot of the computer science required by web developers has been abstracted away into libraries and frameworks. Web programming today is a game of gluing together heterogeneous pieces of technology.
Data Science is asymptotically on a similar path, but data analysis, machine learning, and deep learning are significantly more mathematical by nature. So there is a limit to how much theory can be skipped before “a little knowledge” starts to be dangerous.
Also, although web developers benefit from a deeper theoretical understanding as well, data scientists who are not familiar with, say, the basics of statistics will face some serious challenges on the job.”
More recently, a number of UK universities have begun to offer academic degrees in data science. In London, for example, the London School of Economics, City University, UCL and King’s College London all offer a Master’s degree in Data Science. The University of Manchester also offers a Master’s in Data Science.
There are also several short courses in data science available both online and on site, which promise to equip you with all the relevant skills needed to pursue a career as a data scientist.
Antonio Cangiano believes it’s great that online data science courses are now so widely accessible, as they can be very useful in teaching the tools required to be a successful data scientist.
He states, “That statistical knowledge doesn’t have to come from a university, however. Online courses are often free and available to everyone. In fact, courses that combine the theoretical background and the practical application of that knowledge tend to form data scientists who can really get things done. (For examples of this, check out IBM courses on CognitiveClass.ai, Coursera, or edX.)
Having a Ph.D. in applied mathematics will never hurt, of course, but it’s by no means a requirement for most data science positions. I foresee this flexibility will also enable a much more inclusive data science world. Something we can all hope for.”
All that said, it seems too soon to tell whether data science will one day be considered a discipline in its own right, or whether it will remain subsumed by its elder disciplines.