
Introduction to R for Data Science




Overview

 

This four-week course introduces the R software and lays the foundation for anyone who would like to begin studying data science and its applications, or who would like to take more advanced courses related to data science, such as machine learning and computational statistics.

Why R?

There are two tools we’ll need to install right away that are essential for this course: R and RStudio.

  • R is an environment for statistical computing. It supports robust data analysis and is widely used in the field of data science. R can be downloaded from https://www.r-project.org

  • In addition to R, you will be asked to download RStudio. You might be wondering, “What is the difference between R and RStudio?” RStudio is an application for developing programs in R, and it requires R to be installed. The reason that both installations are necessary will become more apparent later in the course. RStudio can be downloaded from https://www.rstudio.com

Once you have the software needed for this course, the next step is to download the data. In this course we will use airline data. Instructions on how to obtain this data are provided in the video. You will need R and RStudio to make sense of the data!

You can also download the four related R files, which contain all the R scripts used in the course videos. Open the link http://llc.stat.purdue.edu/futurelearn, then find the R files and download them. Their file names are “IndyFlights.R”, “closerlook.R”, “moreconcepts.R” and “tapplyfunction.R”.

 

Module Author

Keywords
Data science

Syllabus/Suggested Schedule

To view a lecture, click its title and watch it in the player above.

 


Week 1: How to Use R to Extract and Analyze Data

  • Install R and RStudio for Windows
  • Install R and RStudio for Mac
  • Run RStudio for the First Time
  • Downloading Airline Data for Windows
  • Download Airline Data on Mac
  • Import Data into R for Windows
  • Import Data into R on Mac
  • Extracting the Head and Tail of a Data Set
  • Identifying Properties
  • Extracting Flight Data with a Common City of Origin
  • Analyzing the Departure Times of Flights
  • Annotating R Code with Comments
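
As a rough illustration of the Week 1 topics, the sketch below looks at the head and tail of a data set, checks its dimensions, and extracts flights with a common city of origin. The data frame `flights` and its column names (`Origin`, `Dest`, `DepTime`) are made-up stand-ins, not the actual airline data or the course's own scripts.

```r
# Toy stand-in for the airline data; the values and column names are assumptions.
flights <- data.frame(
  Origin  = c("IND", "ORD", "IND", "ATL", "IND"),
  Dest    = c("ORD", "IND", "ATL", "IND", "DEN"),
  DepTime = c(630, 1215, 1745, 905, 2210)   # departure times in hhmm form
)

head(flights, 2)   # first two rows
tail(flights, 2)   # last two rows
dim(flights)       # number of rows and number of columns

# Keep only the flights that share a common city of origin.
from_ind <- subset(flights, Origin == "IND")

# Count how many of those flights departed before noon.
early_count <- sum(from_ind$DepTime < 1200)
```

Lines that begin with `#` are comments; R ignores them, which is how code is annotated for later readers.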

Week 2: Introduction to Plotting in R and Finding Common Patterns

  • Intro to Week 2 Introduction to R for Data Science
  • Introduction to Plotting in R
  • Identifying the Most Popular Flight Paths
  • Introduction to the Tapply Function
  • Arrival Delays by Day of the Week
  • When Should You Fly?
  • Arrival Delays by Day of the Year
  • Arrival Delays by Flight Path
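
A minimal sketch of how `tapply` can summarize arrival delays by day of the week is given here. The data frame `delays` and its columns `DayOfWeek` and `ArrDelay` are hypothetical toy values chosen for illustration; the real airline data may use different names.

```r
# Toy data: day of the week (1 to 3 here) and arrival delay in minutes.
# Both the values and the column names are assumptions.
delays <- data.frame(
  DayOfWeek = c(1, 1, 2, 2, 3, 3),
  ArrDelay  = c(10, 20, -5, 15, 30, NA)
)

# tapply(values, groups, function): apply mean() to ArrDelay within each day.
# na.rm = TRUE tells mean() to skip missing delays.
mean_delay <- tapply(delays$ArrDelay, delays$DayOfWeek, mean, na.rm = TRUE)
mean_delay
```

The result is a named vector with one entry per day, which can be passed directly to a plotting function such as `barplot()`.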

Week 3: Verifying Data, Indices to Tables, Sorting and Summarizing Variables

  • Relate the Data to a Place You've Been
  • Identifying the Most Popular Airports
  • Tools for Verifying
  • Using Airport Codes
  • Analysis of On-Time and Early Departures
  • Revisiting a Plot in R
  • Analyzing Flights by Origin Airport and Month of Departure
  • Leaving a Specification Blank
  • Calculating Percentages of Flights with Long Delays
  • Analyzing Flights by Time of Day for Departure
  • Analyzing Flights by Time of Day for Departure and Origin Airport
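
Two of the Week 3 ideas can be sketched briefly. Under one reading, "leaving a specification blank" refers to omitting the column index in `data[rows, ]` so that every column is kept. The percentage of long delays is computed here as the mean of a logical vector. The data frame `dep`, the column `DepDelay`, and the 30-minute threshold are all assumptions made for this example.

```r
# Toy data; the values, column names, and threshold below are assumptions.
dep <- data.frame(
  Origin   = c("IND", "IND", "ORD", "ORD", "ORD"),
  DepDelay = c(45, 5, 0, 90, 31)   # departure delay in minutes
)

# Row condition given, column position left blank: keep all columns.
ind_rows <- dep[dep$Origin == "IND", ]

# The mean of a logical vector is the fraction of TRUE values.
pct_long <- 100 * mean(dep$DepDelay > 30)

# The same percentage, computed separately for each origin airport.
pct_by_origin <- 100 * tapply(dep$DepDelay > 30, dep$Origin, mean)
```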

Week 4: Assembling and Storing Data & Creating and Applying Functions

  • Assembling Multiple Years of Airline Data
  • Efficiently Storing Origin-to-Destination Flight Paths
  • Visualizing Flight Paths
  • Incorporating Auxiliary Data about Airports
  • Revising Visualizations of Flight Paths
  • Identifying Airports with Commercial Flights
  • Creating and Applying Functions Built by the Learner
  • Incorporating Tail Numbers from Airplanes
  • Changing Month Numbers to Month Names
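
As a small example of a learner-built function, the sketch below converts month numbers to month names using R's built-in `month.name` constant. The function name `month_label` is hypothetical and is not taken from the course scripts.

```r
# month.name is a built-in character vector: "January", ..., "December".
# month_label is a hypothetical helper written for this illustration.
month_label <- function(m) {
  month.name[m]   # index the vector of names by month number
}

month_numbers <- c(1, 12, 7)
labels <- month_label(month_numbers)
labels
```

Because indexing is vectorized in R, the same function works on a single month number or on a whole column of them.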

Copyright © Purdue University, all rights reserved. Purdue University is an equal access/equal opportunity university.

Contact the College of Science at sciencehelp@purdue.edu for trouble accessing this page. Made possible by grant NSF CCF-0939370