This post is coming from Yasmin Chamchoun. A career in data science is becoming highly sought after. With more and more people entering the field, what is it that makes ‘data scientist’ such an appealing profession nowadays? Undoubtedly, the field can be very rewarding, both financially and through the practical application of key skills. The hands-on involvement in coding, programming and mathematics also sounds very fulfilling. Yet keeping up with the latest tools and frameworks, as well as the rapid pace of technological change, can be quite demanding, which can result in an overload of work and pressure.
In this Q&A piece, I had the opportunity to interview two data scientists and one former chief data scientist, who kindly agreed to share what inspired them to become data scientists, as well as the pros and cons they have experienced in the field.
COLIN FAY, Data Scientist and R Hacker at ThinkR. Founder of Data Bzh.
Website: https://colinfay.me
Github: https://github.com/ColinFay
Twitter: https://twitter.com/_ColinFay
- What inspired you to become a data scientist?
“The moment in my career when I decided to switch to data science was driven by two different things:
– I was working in a company that was doing a lot of “manual analysis”: downloading Excel files, copying and pasting, manual reporting, Excel charts and PowerPoint… I quickly realised it was not the most efficient way to do it, so I started writing programs to automate these analyses.
– The city I live in in France (Rennes) is one of the pioneers when it comes to Open Data; it was actually the first French city after Paris to have an open data portal. Open source and Open data are very important to me, and I wanted to be part of that movement. But what I realised was that there were many datasets available, but not many (if any) were being reused. So I decided at that time to start a data-blogging website, where I posted data analyses of open source datasets. The blog worked pretty well, I was one of the most prolific open data users at that time, and I really enjoyed telling stories with data. I don’t have that much time now to continue that project, but it has been an amazing experience.”
- What do you most enjoy about the job?
“Today I work as a Data Scientist and R Hacker at ThinkR. What we do is consulting, training, software engineering and R infrastructure. The first thing I love is that every new project is a new challenge. People have been collecting data for years, yet not until recently did we start to get value from this data, so a lot of what is needed are new tools and methods.
Also, data and infrastructure are always different; they are specific to each context and to each company we are working with. So there is no ready-made recipe when a new project comes up: we have to find new ways to work with data and technical infrastructure. That is something that is very rewarding on an intellectual level.
I also love the fact that my company is deeply rooted in the open source community, and so are a lot of the companies in the data science world. At ThinkR, we’re helping other companies to make the transition to R and to realise the power of open source. In return, we as a company try to give back as much as we can to the community, which is something that I value a lot. In particular, we’re working on making R tools more and more production-ready, and we continually endeavour to make this language a legitimate analytics and data science tool in the enterprise world.”
- What would you consider to be the toughest thing about being a data scientist?
“I’d say that one of the toughest things is that the field is evolving on a daily basis, so you have to stay alert if you want to stay in the game. What you learned a month ago might have changed today, and the way you’re used to doing things will be different a year from now. But on the other hand, it’s also what is very exciting about working in data science: new tools, new challenges, new languages, new frameworks… there is always something new to learn, and every day at work is a different day.”
- What advice would you give to those wanting to become data scientists?
“First, you can do it. Then, I would suggest getting yourself a data science portfolio. That can be a simple GitHub page, a blog, an open book, packages… The strength of working in a field like data science is that a lot of what we do is built upon open source tools. That means, first, that you can learn a lot by yourself; you don’t need to pay extra money for a software licence. It also means that you can easily share what you know. Even more so if you choose a language like R: the online resources are countless, and the community is incredibly welcoming and will give you valuable feedback and advice on your work. So, build something online and share it.
Give back to the community that has given to you before. Find a topic that you love, and you can use it as an excuse to learn and practice Data Science.”
BOJAN TUNGUZ, Sr. Data Scientist – H2O.ai
LinkedIn: https://www.linkedin.com/in/tunguz/
Twitter: https://twitter.com/tunguz
- What inspired you to become a data scientist?
“My background is in science (Physics), and I have always been interested in trying to understand the physical world through data and modeling. When I discovered Data Science, my eyes were opened to the possibilities of applying my scientific background and modes of thinking to a whole variety of problems that seemed almost intractable.
I also always enjoyed technology and working with computers, so this combination of scientific thinking and computational approach had a lot of appeal to me. Within Data Science I particularly like to focus on Machine Learning and Predictive Modelling. There is something that feels almost miraculous about building computational models that can recognize an image, understand a piece of text, or predict the future sales of some product.”
- What do you most enjoy about the job?
“Machine Learning is an incredibly fast-moving field, and almost every day there is a new breakthrough, discovery, or a cool new tool that just got released. It is also a very applied field, and there is such a short lag between something being conceptualized or discovered, and when it’s possible to build a tool or a product with that knowledge.
The field is also filled with incredibly smart people who love to interact and share their knowledge. I am fortunate enough that I work at a place that recognizes such people, and being able to work with them is such a privilege and pleasure.
Data Science is a very general field, and its tools and techniques can be applied to a wide variety of problems. Traditional science is very siloed, and it’s virtually impossible for someone with a specialization in one domain to be able to work in another. Data Science manages to transcend those silos. I’ve always had a wide variety of interests, and being able to use my new skills on very diverse sets of problems is amazing. One day I may work on detecting financial fraud or building a better underwriting model, another day I am building the most sophisticated protein classification algorithm, and then after that I can switch to detecting toxic comments on online discussion forums.”
- What would you consider to be the toughest thing about being a data scientist?
“Data Science is still a relatively new discipline, and it is still hard for people to understand what it is, what it can do, and what its limitations are. Oftentimes in an industry setting you are expected to do work that is better suited to other closely related disciplines – data engineering, software engineering, business analysis, statistics, dev ops, etc.
Also, because Data Science is so rapidly evolving it can be pretty hard to keep up with all the latest developments. One day you may feel like you have mastered, say, all there is to know for NLP modelling, and then just within a month or two several new tools and techniques would appear that would make all of your own knowledge either obsolete or commodified.”
- What advice would you give to those wanting to become data scientists?
“If, as is the case with most Data Scientists, you are trying to switch to Data Science from some other field, you should understand that it will take some time for you to become proficient enough to be able to perform well professionally. So give yourself time. Be patient, but also persistent.
Learn as much as you can, either by enrolling in a traditional academic program, or by learning as much as you can from a plethora of excellent online resources. Try creating a meaningful portfolio of projects. One option for this is to find datasets that you are interested in, and then develop projects around them. Another option, and this is what really worked for me, is to become very active on Kaggle and perform well in a few competitions. You don’t need to become a Kaggle Master or Grandmaster, but a consistent Kaggle track record, plus a few high-level finishes in competitions, can be a useful portfolio.”
ERIC LEBIGOT, Senior Scientific Advisor. Capital Fund Management. Former Chief Data Scientist.
Twitter: https://twitter.com/lebigot
- What inspired you to become a data scientist?
“A love for math since I was 6, for programming since I was 10, and for visualization since I was 12.”
- What did you most enjoy about the job?
“Understanding the meaning of the data well enough to find good ideas about how to make predictions with it.”
- What would you consider to be the toughest thing about being a data scientist?
“Fighting with data that is corrupted in some way (undocumented or badly documented features, incorrect values, etc.).”
- What advice would you give to those wanting to become data scientists?
“The best and most satisfying data science is when the features and the models you build are motivated by meaningful reasons (instead of being tried at random).”