Data science rules the world
It’s impossible these days to be in the world of tech and not hear about data, AI and machine learning. Everyone is talking about things like data science, using data to drive their decisions, improving their efficiencies and increasing their sales using insights generated from data, and so on.
You know there has to be something to it because you’ve seen the results data-driven companies have generated. Google, Apple, Microsoft: all of the big players are investing in AI, and early adopters of AI are getting ahead of their counterparts, according to this MIT report. It looks like data science is the future, and you’re considering getting in on the action. After all, you’ve heard that all it takes is collecting some data and hiring a good data scientist and you’re golden.
It can be as simple as that, but only if you want to waste time and money. You need to have at least a cursory understanding of what data science is, what makes a good data scientist, and how to hire a good one if you really want to make the most of it. Let’s start with a quick look at the history of data science.
The History of Data Science at a Glance
Data science is an interdisciplinary field that encompasses statistics, artificial intelligence, machine learning, mathematics, computer science, and more.
Data science developed in response to the huge amounts of data humans generate. In 2017 alone, we generated more data than in the previous five millennia of our history. The sheer volume of data we produce led to the realisation that we needed an effective approach to taking that data and doing something useful with it.
Humans have been analysing information in some shape or form for hundreds of years. For example, statistics, which is a core field of data science, traces its origins back to the 18th century. However, technological progress has allowed us to become much more efficient at data analysis, and has given birth to other fields such as machine learning, AI and deep learning. Data science is, in essence, a new approach to data analysis that integrates all of these (and many more) approaches together.
To really discuss the history of data science, I’d have to go into the history of its core fields, namely artificial intelligence, machine learning, and statistics. Unfortunately, that is beyond the scope of this article, but if you are interested, I’ve written a book in which you can learn more about this history.
Artificial Intelligence Versus Machine Learning Versus Statistics
I’ve already mentioned that the core fields of data science are artificial intelligence, machine learning, and statistics. Some people mistakenly believe that artificial intelligence and machine learning are pretty much the same thing. Likewise, they aren’t certain how statistics fits into it all.
Machine learning is a subfield of artificial intelligence, but it takes a completely different approach from classic AI. In classic artificial intelligence, computers were given a rulebook and had to follow those rules to reach a result. The problem was that a lot of human input was necessary. These algorithms also couldn’t handle uncertainty very well, so problems that are naturally characterised by lots of uncertainty, like predictions in finance or the weather, cannot really be handled by classic AI.
Machine learning, by contrast, focuses on teaching computers to learn from the data they are given. Instead of being provided with the rules, the machine creates its own rules by analysing the data and learning from it. In the era of big data, this makes perfect sense.
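To make the difference concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy transaction data, the hand-written cutoff of 500, and the function names are invented, and a real machine learning model would be far more sophisticated than a single threshold. The point is only that in the first function a human writes the rule, while in the second the rule is derived from the data.

```python
from typing import Callable, List, Tuple

# Hypothetical toy data, invented for illustration: (transaction amount, is_fraud).
DATA: List[Tuple[float, bool]] = [
    (12.0, False), (30.0, False), (45.0, False), (80.0, False),
    (95.0, False), (150.0, True), (220.0, True), (300.0, True),
]


def rule_based(amount: float) -> bool:
    """Classic approach: a human writes the rule. The 500 cutoff is a guess."""
    return amount > 500.0


def learn_threshold(data: List[Tuple[float, bool]]) -> float:
    """Learning approach: choose the cutoff that misclassifies the fewest examples."""
    amounts = sorted(amount for amount, _ in data)
    # Candidate cutoffs are the midpoints between neighbouring amounts.
    candidates = [(low + high) / 2 for low, high in zip(amounts, amounts[1:])]

    def errors(cutoff: float) -> int:
        return sum((amount > cutoff) != label for amount, label in data)

    return min(candidates, key=errors)


def accuracy(predict: Callable[[float], bool], data: List[Tuple[float, bool]]) -> float:
    """Fraction of examples the given rule classifies correctly."""
    return sum(predict(amount) == label for amount, label in data) / len(data)


threshold = learn_threshold(DATA)


def learned(amount: float) -> bool:
    """The rule the machine derived from the data."""
    return amount > threshold


print("hand-written rule accuracy:", accuracy(rule_based, DATA))
print("learned cutoff:", threshold)
print("learned rule accuracy:", accuracy(learned, DATA))
```

On this toy data the guessed rule misses every fraudulent transaction, while the learned cutoff separates the two groups. With messier, real-world data neither would be perfect, but the learned rule can be re-derived whenever new data arrives, with no human rewriting the rulebook.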
Statistics is a branch of mathematics that develops and studies methods for the collection, analysis, interpretation, and presentation of data. It is also the oldest kid on the block of data analysis, with the first studies in statistics having been conducted in the 18th century.
Statistics is similar, in a sense, to classic artificial intelligence in that it is far more rigid and rule-based than machine learning. In statistics, everything has to be transparent, validated, and verified. Machine learning, by contrast, is more of the “do it now, ask questions later” variety. This is because computing power has advanced so far that machine learning can try thousands of different models before finding one that works. Attempting this would have been impossible a mere half century ago, which is why statistics developed such a rigid approach.
However, artificial intelligence, machine learning, and statistics all have their place in data science. They are vital tools that the data scientist will apply as the situation requires. For example, if you’re looking for predictive power, then machine learning is usually the best option. In terms of interpretability, however, statistics is much more effective.
None of these, however, deals with the problem of creating a general artificial intelligence, a machine truly capable of thinking for itself, which was the original goal behind the creation of AI. This is a separate problem and an active field of research.
There’s far more we could discuss about each of these and I urge you to check out my book “The Decision Maker’s Handbook to Data Science”. Don’t worry, it’s not technical at all, because I specifically wrote it so that entrepreneurs, business owners, and other non-technical stakeholders can better understand how to most effectively take advantage of data science. For any questions or comments feel free to reach out to me.
So, what are the core fields of data science?
The core fields of data science are:
1) Machine learning
2) Statistics
3) Artificial intelligence