This four-week course is an introduction to the R software. It lays the foundation for anyone who would like to begin studying data science and its applications, or who would like to take more advanced courses related to data science, such as machine learning and computational statistics.
Why R?
There are two tools we’ll need to install right away that are essential for this course: R and RStudio.
R is an environment for statistical computing. It supports robust data analysis and is widely used in the field of data science. R can be downloaded from https://www.r-project.org
In addition to R, you will be asked to download RStudio. You might be wondering, “What is the difference between R and RStudio?” RStudio is an application for writing and running R programs. It does not replace R: RStudio only works if R is also installed. The reason that both installations are necessary will become more apparent later in the course. RStudio can be downloaded from https://www.rstudio.com
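Once both programs are installed, a quick way to confirm that RStudio has found your R installation is to type a few commands into the RStudio console. The following is a minimal sketch, not part of the course materials; the version threshold "3.0.0" is an arbitrary value chosen only for illustration.

```r
# Report the full version string of the running R installation.
r_version_string <- R.version.string
print(r_version_string)

# getRversion() returns a version object that can be compared directly.
# "3.0.0" is an arbitrary example threshold, not a course requirement.
r_version <- getRversion()
print(r_version >= "3.0.0")
```

If the first command prints a version string, R is installed and RStudio is using it.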
Once you have the software you’ll need to complete this course, the next step is to download the data. For this course, we will be using airline data. Instructions on how to obtain this data are provided in the video. You will need R and RStudio to make sense of the data!
You can also download the related R files, which contain all the R scripts shown in the course videos. There are four related R files. Open the link http://llc.stat.purdue.edu/futurelearn, then find the R files and download them. Their file names are “IndyFlights.R”, “closerlook.R”, “moreconcepts.R”, and “tapplyfunction.R”.
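After downloading, you may want to check that R can see the four script files. The sketch below assumes you saved them in R's current working directory, which may not match your setup; if any entry shows FALSE, move the file or change the working directory.

```r
# The four script names listed on the course page.
script_files <- c("IndyFlights.R", "closerlook.R",
                  "moreconcepts.R", "tapplyfunction.R")

# Show the folder R is currently looking in.
print(getwd())

# Check each file; TRUE means the file was found in that folder.
found <- file.exists(script_files)
names(found) <- script_files
print(found)
```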