The Data Scientist

Doing linear regression the right way

Linear regression is arguably the best-known statistical algorithm. It is often the first algorithm taught in machine learning courses, and it is surprisingly effective across a wide range of problems. It also has several nice properties: for example, each coefficient has a clear interpretation as the expected change in the response for a one-unit change in that predictor, holding the other predictors fixed.

However, linear regression is often misused. The two most common problems I’ve seen in practice are:

  1. Not checking that the model's assumptions hold.
  2. Not running any diagnostic checks on the fitted model.

This dashboard follows the ideas set out in my article on data science protocols. It fits a linear regression and runs a set of well-established diagnostic tests to uncover any issues with the model, which provides a safer way to run and use linear regression. The dashboard tests for:

  1. Multicollinearity
  2. Global assumptions (skewness, kurtosis, link function, and heteroskedasticity)
  3. Outliers
  4. Model diagnostics
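
To make a few of these checks concrete, here is a minimal sketch in plain Python. It is not the dashboard's implementation. The function names and the data are made up for illustration, the VIF shortcut only holds for a model with exactly two predictors, and the outlier rule is a crude one that scales residuals by the residual standard error and ignores leverage. The link function and heteroskedasticity checks are not covered.

```python
import math
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def residuals(x, y):
    """Observed minus fitted values for the simple OLS fit."""
    intercept, slope = fit_simple_ols(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor model: 1 / (1 - r^2).

    Assumes the predictors are not perfectly collinear (r^2 < 1).
    """
    m1, m2 = mean(x1), mean(x2)
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    var1 = sum((a - m1) ** 2 for a in x1)
    var2 = sum((b - m2) ** 2 for b in x2)
    r_squared = cov ** 2 / (var1 * var2)
    return 1.0 / (1.0 - r_squared)


def skewness(values):
    """Moment-based skewness (0 for a symmetric sample)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Moment-based kurtosis minus 3 (0 for a normal distribution)."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def flag_outliers(resid, threshold=2.0):
    """Indices whose residual exceeds `threshold` residual standard errors."""
    # Residual standard error with n - 2 degrees of freedom (intercept + slope).
    rse = math.sqrt(sum(r ** 2 for r in resid) / (len(resid) - 2))
    return [i for i, r in enumerate(resid) if abs(r) > threshold * rse]


# Made-up data: y is roughly 2*x + 1, except for one suspicious point at index 5.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 25.0, 15.1, 16.8, 19.2, 20.9]
# A second, hypothetical predictor that is almost exactly 2*x.
x_collinear = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1, 18.0, 19.9]

resid = residuals(x, y)
print("VIF:", vif_two_predictors(x, x_collinear))
print("residual skewness:", skewness(resid))
print("residual excess kurtosis:", excess_kurtosis(resid))
print("flagged observations:", flag_outliers(resid))
```

With this data, the near-duplicate predictor produces a very large VIF, and the single off-trend observation both skews the residuals and is the only point flagged. A common rule of thumb treats a VIF above roughly 5 to 10 as a sign of problematic multicollinearity, though the exact cutoff is a matter of convention.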

Let me know if you have any feedback.

Link to dashboard: https://stylianos-kampakis.shinyapps.io/linear_regression/