Getting Started

This week we will demonstrate various techniques for visualising data in R using ggplot2. This will also include the correct interpretation and understanding of the different plotting techniques.

Note on resources: A lot of the content within this course is based on the open-source book Statistical Inference via Data Science: A ModernDive into R and the tidyverse and thus is a useful source for additional examples and questions.

In addition, the following websites are particularly useful for data visualization:

  • From Data to Viz leads you to the most appropriate graph for your data
  • R graph gallery lists hundreds of charts in several sections, always with their reproducible code available.

It is advisable to record the code you create throughout this lab as a record of your work in an .R script file by opening RStudio from the "Maths-Stats" folder on the lab desktop and going to File -> New File -> R Script. Save this file as, say, DAWeek1.R in your personal drive (e.g. H: or M:) or to a USB stick that you've brought with you (IMPORTANT: DO NOT navigate to ANY folders like "Documents" or "Desktop").

You will be required to write your own script in the "Further Tasks" section at the end of the session, but the given R code will help you engage with the material directly as you work through the following sections. Start by loading into R all of the libraries we will need for this session. This can be done by typing/copying the following code into your R script and "running" the code:

install.packages("ggplot2") #Only include if the package hasn't already been installed
library(ggplot2)
install.packages("nycflights13") #Only include if the package hasn't already been installed
library(nycflights13)

The first library ggplot2 allows us to use functions within that package in order to create effective data visualisations. This library is part of the tidyverse collection of R packages, a powerful collection of data tools for transforming and visualizing data, which we will use throughout this course. Many of the libraries within the tidyverse package have concise summaries of the key functions and arguments known at "cheat sheets". These can be accessed via the Data Analysis Moodle page or from RStudio directly. You are encouraged to familiarise yourself with the "cheat sheets" and have them on hand as you analyse data. You will have access to these "cheat sheets" in the class tests.

The second library nycflights13 contains data on flights from New York City in 2013 that we shall begin examining in the next section. You can get basic information on R packages using help(package = "packagename"), which can be applied to this library here: