Linear Modeling in R

Kelsey Martinez, PhD, Researcher
April 1, 2020


I am sharing my cheat sheet and general workflow for getting started with linear modeling in R. This sample workflow uses the Iris dataset, which ships with base R in the datasets package, to demonstrate simple linear modeling.
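As a minimal sketch of what simple linear modeling on iris can look like, the snippet below fits one continuous response against one continuous predictor using only base R. Choosing Petal.Width as the response and Petal.Length as the predictor is an illustrative assumption, not a step taken from the original cheat sheet.

```r
# iris ships with base R (datasets package), so nothing needs to be installed
data(iris)

# Simple linear regression: petal width modeled as a function of petal length
# (variable choice is illustrative only)
fit <- lm(Petal.Width ~ Petal.Length, data = iris)

# Coefficient estimates, standard errors, R-squared, and residual standard error
summary(fit)
```

The formula interface (`response ~ predictor`) is the same one used by `glm()` and by most mixed-model packages, so this pattern carries over to the other model types.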

R excels at statistical modeling, but it can be difficult to know where to start if you're new to the field. The cheat sheet includes pointers for linear models, generalized linear models, and mixed models. A simplified overview of my workflow is provided below:
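To make the difference between linear and generalized linear models concrete, here is a hedged sketch in base R. It shows that `lm()` and `glm()` with a Gaussian family and identity link fit the same model, and that a binary response instead calls for a binomial family. The `is_virginica` column and the choice of predictors are hypothetical, introduced only for illustration. Mixed models usually need an add-on package (for example `lme4`) and are not shown here.

```r
data(iris)

# An ordinary linear model fit with lm() ...
fit_lm <- lm(Sepal.Length ~ Petal.Length, data = iris)

# ... is the same model as a GLM with a Gaussian family and identity link
fit_glm <- glm(Sepal.Length ~ Petal.Length,
               family = gaussian(link = "identity"), data = iris)

# The coefficient estimates agree up to floating-point tolerance
all.equal(coef(fit_lm), coef(fit_glm))

# A binary response needs a different family.
# Hypothetical example: is a flower Iris virginica (1) or not (0)?
iris$is_virginica <- as.integer(iris$Species == "virginica")
fit_logit <- glm(is_virginica ~ Petal.Width, family = binomial, data = iris)

# Coefficients are on the log-odds scale
summary(fit_logit)
```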

  1. Mise en place. Clean your data and get to know it well. A variety of R packages are available for data cleaning. Understand what types of data you are working with: categorical, continuous, count, and ratio data are all common. Be sure to check for missing values, because lm() drops any row that contains a missing value by default, which can silently shrink your sample. Check for collinearity (strong correlation between predictor variables) using correlation matrices, and if it is present, choose which of the correlated variables you will keep. I find basic data visualizations such as boxplots, scatterplots, and bar charts to be very helpful at this stage of analysis.

  2. Pick your dependent (response) variable and independent (predictor) variables based on the question(s) you need your model to answer.

  3. Finalize the model type (linear, generalized linear, mixed, etc.), guided by the type of response variable and the structure of your data, and run the model!

  4. Interpret and/or transform model parameters, consider any interactions between predictors that may be present in your data, and perform any necessary post-hoc tests.
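The four steps above can be sketched end to end on iris in base R. This is a minimal illustration under assumed choices, not the cheat sheet's own code: the research question, the decision to drop Petal.Width, the `iris_na` copy with injected missing values, and the use of `TukeyHSD()` as the post-hoc test are all hypothetical examples.

```r
data(iris)

## 1. Mise en place: types, missing values, correlations
str(iris)                    # four numeric columns and one factor (Species)
colSums(is.na(iris))         # missing values per column (none in iris)

# Show how lm() handles missing values: rows with NA are dropped by default
iris_na <- iris
iris_na$Sepal.Length[1:5] <- NA          # hypothetical injected NAs
nobs(lm(Sepal.Length ~ Petal.Length, data = iris_na))  # 145 rows used, not 150

# Correlation matrix of the numeric variables to spot collinear predictors
num_vars <- iris[, sapply(iris, is.numeric)]
round(cor(num_vars), 2)

## 2. Response and predictors, chosen from an assumed question:
##    "How does sepal length relate to petal length, and does it differ by species?"
##    Petal.Width is left out because it is highly correlated with Petal.Length.

## 3. Model type: continuous response, independent rows -> ordinary linear model
fit_main <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
fit_int  <- lm(Sepal.Length ~ Petal.Length * Species, data = iris)  # adds interaction

## 4. Interpretation, interactions, and post-hoc tests
summary(fit_int)             # slopes for Petal.Length may differ by species
anova(fit_main, fit_int)     # F-test: does the interaction improve the fit?

# Post-hoc pairwise comparisons of mean sepal length between species
TukeyHSD(aov(Sepal.Length ~ Species, data = iris))
```

Plots such as `boxplot(Sepal.Length ~ Species, data = iris)` or `pairs(num_vars)` fit naturally into step 1; they are left out here only to keep the output text-based.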

The full workflow and cheat sheet document can be found here.