Wanna know more about data science? Make sure to check out my events and my webinars, What it's like to be a data scientist and What’s the best way to become a data scientist!

Data science is ever-evolving, and with so much going on it can be difficult to keep track of all the new libraries and algorithms. This is when a reference guide can be really useful. Continuing our series on cheatsheets, in this post I provide some more very useful cheatsheets for data science.

If you want to acquire data science skills, also make sure to check out some of my courses.

Python data science cheatsheets

Python for data science: Python basics

This great cheatsheet from DataCamp is going to be extremely useful for anyone learning Python for data science. All the basic commands, from list manipulation to NumPy arrays, are there.
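To give a flavour of the list manipulation such a cheatsheet covers, here is a minimal sketch using only built-in Python. The `scores` data and the variable names are made up for illustration and are not taken from the cheatsheet itself.

```python
# Hypothetical sample data, used only for illustration.
scores = [72, 88, 95, 61]

first = scores[0]                        # indexing starts at 0
last_two = scores[-2:]                   # slice from the second-to-last item to the end
scores.append(79)                        # add one item to the end of the list
ranked = sorted(scores, reverse=True)    # new sorted list; `scores` keeps its order
passed = [s for s in scores if s >= 70]  # list comprehension used as a filter

print(first, last_two, ranked, passed)
```

The same indexing and slicing syntax carries over to NumPy arrays, which is one reason it is worth getting comfortable with lists first.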

Keras cheatsheet

Keras is a great and easy-to-use deep learning library for Python. It is easier to get started with deep neural networks in Keras than in TensorFlow directly. This cheatsheet contains some quick recipes to create the most basic neural network types.

Data visualisation in Python

A great data scientist should also be a great communicator, and quite often there is no better tool for that than a visualisation. This cheatsheet covers some of the basics of visualisation in Python using matplotlib and seaborn.

The scikit-learn flowchart

I don’t think that any data science cheatsheet article is complete without a reference to the famous scikit-learn flowchart for choosing the right machine learning model. This amazing cheatsheet shows you how to choose the right machine learning model depending on your task and the number of rows and features.
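The idea behind the flowchart can be sketched as a small decision function. This is a heavily simplified illustration, not a replacement for the chart: the function name `suggest_estimator` and its parameters are hypothetical, only the first suggestion on each branch is encoded, clustering assumes the number of clusters is known, and the sample-size thresholds are reproduced here as rules of thumb that you should check against the original flowchart.

```python
def suggest_estimator(n_samples, task, few_features_matter=False):
    """Suggest a first scikit-learn estimator to try (simplified sketch).

    `task` is one of "classification", "regression" or "clustering".
    Returns the estimator's class name as a string.
    """
    # With very little data, the chart's advice is to collect more first.
    if n_samples <= 50:
        return "get more data"

    if task == "classification":
        # Smaller datasets start with a linear SVM, larger ones with SGD.
        return "LinearSVC" if n_samples < 100_000 else "SGDClassifier"

    if task == "regression":
        if n_samples >= 100_000:
            return "SGDRegressor"
        # Lasso favours sparse solutions when only a few features matter.
        return "Lasso" if few_features_matter else "Ridge"

    if task == "clustering":
        # Assumes the number of clusters is known in advance.
        return "KMeans" if n_samples < 10_000 else "MiniBatchKMeans"

    raise ValueError(f"unknown task: {task!r}")


print(suggest_estimator(5_000, "classification"))
print(suggest_estimator(20_000, "regression", few_features_matter=True))
```

Treat the returned name as a starting point only. The real flowchart also lists fallbacks to try when the first choice does not work well.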

Text cleaning in Python

Every good data scientist should know how to do natural language processing. This cheatsheet presents some very good tips and tricks for cleaning up text.
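As a rough idea of what text cleaning looks like in practice, here is a minimal sketch using only the Python standard library. The function name `clean_text` and the particular steps (accent stripping, lowercasing, removing URLs, HTML tags and punctuation) are my own assumptions about a typical pipeline, not a summary of the cheatsheet.

```python
import re
import string
import unicodedata


def clean_text(text):
    """Apply a few common cleaning steps to a raw string (illustrative only)."""
    # Decompose accented characters, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.lower()
    # Remove URLs and HTML tags before punctuation, while they are still intact.
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"<[^>]+>", " ", text)
    # Delete all ASCII punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("  Check <b>THIS</b> out: https://example.com  Café!! "))
```

Which steps you actually want depends on the task. Dropping punctuation or case, for example, can throw away useful signal for sentiment analysis.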

R data science cheatsheets

The R reference card

This is the go-to cheatsheet for all basic R commands. It provides good coverage of the native R commands, from plotting to installing packages to manipulating vectors. It is good for beginners, but even some experienced R users might find it useful.

Data transformation with dplyr

Dplyr is one of the most popular packages in the tidyverse. It can get a bit confusing for beginners, which is why it is useful to have a cheatsheet as a reference guide.

Visualisation with ggplot2

Ggplot2 is the best way to produce visually pleasing plots in R. While the traditional plotting capabilities of R are good, the plots they produce do not look that great, and they are not very flexible. Ggplot2 improves upon all that, but it can be a bit daunting for the uninitiated. This cheatsheet provides a great overview of ggplot2 commands and syntax.

The caret package

The caret package provides an easy way to do machine learning in R. It provides a wrapper over many other machine learning R libraries and has utility functions for running cross-validation and cleaning up data. This cheatsheet is a good way to get started using caret.

R reference card for data mining

This very useful cheatsheet contains a high-level overview of functions and associated packages in R for data mining. From data manipulation to big data and parallel computing, this cheatsheet covers a variety of use cases.

 


Wanna know more about data science? Besides my events, you should check out my webinars:
  1. If you want to learn data science: What it's like to be a data scientist and What’s the best way to become a data scientist
  2. If you are a CEO: The importance of data strategy


Dr. Stylianos Kampakis is the owner and author of The Data Scientist.