

    This four-week course introduces the R software and lays the foundation for anyone who would like to begin studying data science and its applications, or to take more advanced courses related to data science, such as machine learning and computational statistics.

    Why R?

    There are two tools we’ll need to install right away that are essential for this course: R and RStudio.

    • R is an environment for statistical computing. It supports robust data analysis and is one of the most widely used tools in data science. R can be downloaded from https://www.r-project.org

    • In addition to R, you will be asked to download RStudio. You might be wondering, “What is the difference between R and RStudio?” RStudio is an integrated development environment (IDE) for writing and running R code. It does not include R itself, so R must be installed for RStudio to work. The reason that both installations are necessary will become more apparent later in the course. RStudio can be downloaded from https://www.rstudio.com

    Once you have the software you’ll need to complete this course, the next step is to download the data. For this course, we will be using airline data. Instructions on how to obtain the data are provided in the video. You will need R and RStudio to make sense of the data!

    You can also download the related R files, which contain all the R scripts shown in the course videos. There are four related R files. Open the link http://llc.stat.purdue.edu/futurelearn, then find the R files and download them. Their file names are “IndyFlights.R”, “closerlook.R”, “moreconcepts.R” and “tapplyfunction.R”.
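
    After downloading, a quick check from the R console can confirm that the four files are where R will look for them. This is a minimal sketch, assuming the files were saved to R's current working directory (an assumption; the course does not say where to save them). The variable names r_files and found are illustrative only.

```r
# File names as listed in the course description
r_files <- c("IndyFlights.R", "closerlook.R", "moreconcepts.R", "tapplyfunction.R")

# Show the folder R is currently working in
print(getwd())

# TRUE for each file found in that folder, FALSE otherwise
found <- file.exists(r_files)
names(found) <- r_files
print(found)
```

    If any entry is FALSE, either move the file into the folder printed by getwd() or change the working directory with setwd().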

    Data science

    Syllabus/Suggested Schedule

    To view a lecture, click on it and watch it in the player above.