Wanna become a data scientist? Checkout Beyond Machine!

Rise of the data science architect

I believe there is a new role in data that businesses need to start taking into account, that of the data science architect.

What is a data science architect? It is a mix between a data scientist and a data engineer. Data science is defined (according to Wikipedia) as follows:

Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured,[1][2] which is a continuation of some of the data analysis fields such as statistics, machine learning, data mining, and predictive analytics,[3] similar to Knowledge Discovery in Databases (KDD).

The role of a data engineer is described as follows:

A data engineer is a worker whose primary job responsibilities involve preparing data for analytical or operational uses. The specific tasks handled by data engineers can vary from organization to organization but typically include building data pipelines to pull together information from different source systems; integrating, consolidating and cleansing data; and structuring it for use in individual analytics applications.


The data science architect (DSA) sits in between the two. The DSA deals with the design of the data collection, storage and analysis processes, while taking into account time and cost trade-offs and business requirements.

Some example problems are:

1) What variables should be stored?

This is mostly an early-stage company problem, which I have already discussed in my article about data science strategy.

2) What issues might arise regarding data quality?

Should additional measures be taken to ensure that the appropriate data is in place? What could these measures be, and at what stage of the architecture should they be applied (e.g. a data firewall at collection, or filling in missing values during the analysis)?

3) What are the different options for a database, and which best suits the company at its current and future stages?

Is it more important to go for a solution that makes storage easy but is more difficult to query, or would a relational database be a better choice?

4) Are there any concerns regarding the choice of a database, programming language, the data being collected and different technologies?

For example, a particular type of analysis might be easier to do with a library that exists only in R. However, there might not be anyone in the company who can use R, so a second-best alternative has to be found in Python. The DSA needs to decide on the best way to adapt and move forward.
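To make the data quality question (2) more concrete, here is a minimal Python sketch of the two stages mentioned there: a "data firewall" that rejects bad records before they are stored, and a step that fills in missing values at analysis time. The field names (user_id, age), the accepted age range and the choice of mean imputation are all assumptions made purely for illustration, not a recommendation for any particular system.

```python
from statistics import mean

# Hypothetical rules for illustration; field names and ranges are assumptions.
REQUIRED_FIELDS = ("user_id", "age")


def firewall(record):
    """Ingestion-time check: return True if the record may be stored."""
    if any(field not in record for field in REQUIRED_FIELDS):
        return False
    if record["user_id"] is None:
        return False
    age = record["age"]
    # A missing age is tolerated here and handled later; an impossible one is not.
    return age is None or 0 <= age <= 120


def impute_age(records):
    """Analysis-time fix: replace missing ages with the mean of the known ones."""
    known = [r["age"] for r in records if r["age"] is not None]
    fill = mean(known) if known else None
    return [{**r, "age": fill if r["age"] is None else r["age"]} for r in records]


incoming = [
    {"user_id": 1, "age": 30},
    {"user_id": 2, "age": None},   # missing value: stored, imputed later
    {"user_id": 3, "age": 430},    # impossible value: rejected at the firewall
    {"user_id": None, "age": 25},  # no identifier: rejected at the firewall
    {"user_id": 4, "age": 40},
]

stored = [r for r in incoming if firewall(r)]
analysis_ready = impute_age(stored)
print([r["age"] for r in analysis_ready])
```

The point of the sketch is the trade-off rather than the specific rules: checks at the firewall cost effort up front but keep the stored data clean, while fixes at analysis time are cheaper to set up but have to be repeated, and justified, in every analysis.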


So, what does a data science architect do?

A DSA starts by analysing a company’s needs with the end goal in mind: using data to generate value. From that goal, the DSA designs the architecture and the analytics pipelines while taking into account appropriate time frames and costs.

The DSA is a particularly relevant role for startups, since every startup that deals with data will have to make these decisions.

Now, someone might argue that the DSA is not so much a separate role as a separate function within a data scientist’s repertoire. I think this could be right, but it is still important to stress the existence of this function. A data scientist is valuable when the data is already in place. A data engineer does not have the appropriate skills and knowledge to design the architecture in a way that maximises value in the long run. A data science architect enters the scene at the early stage and then paves the way for the other two.

How to set up the right data strategy

Understanding how best to structure your data strategy and the roles within an organisation is not an easy task, but a data science architect can be of great assistance. I have written in other articles about the importance of a data strategy and a data-driven culture.


Dr. Stylianos Kampakis is the owner and author of The Data Scientist.


narsing · March 31, 2017 at 5:50 am

Excellent knowledge sharing on Data Science.

Bobby Saint · January 4, 2018 at 6:59 am

Honestly, I don’t really know much about Data Science. Nevertheless, it’s good that I get to learn new things everyday. You mentioned that a data science architect is a mix between a data scientist and a data engineer. Whew! The job description must be really quite challenging. It’s good to know, though, that more and more people are getting interested in applying for this position. I will be reading more about the role of a data scientist and its primary job description for additional learning. Thanks.

Deep Suraj · November 20, 2018 at 10:59 am

What a fantastic read on Data Science. This has helped me understand a lot in Data Science course. Please keep sharing similar write ups on Data Science. Guys if you are keen to know more on Data Science, must check this wonderful Data Science tutorial and i’m sure you will enjoy learning on Data Science training.:-https://www.youtube.com/watch?v=M80IGWfgm4Q

Happy Hacking · June 6, 2019 at 4:34 am

What is the minimum years experience you are looking at for a person who would fit this role? There should be some demonstrable experience in Data Architecture, Stream Processing concepts and deciding on programming languages to be used, infrastructure to be used? This is like a dream role that I would like to get into where I will do everything possible as a software engineer I have ever dreamt of from the time I chose this career.

Dr. Stylianos Kampakis · July 30, 2019 at 11:52 am

I think this person should have at least 3 years of experience in building pipelines. Feel free to drop me a message to discuss more.
