Viewing The Data
Before visualising any data set, we first need to know its contents. For example, the contents of the flights data set within the nycflights13 library can be viewed using the following command:
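Here is a minimal sketch of that command. It assumes the nycflights13 package is already installed, for example via install.packages("nycflights13"). The n = 3 argument matches the description that follows.

```r
# Load the package that provides the flights data set
library(nycflights13)

# Print the first 3 rows of flights to the console
head(flights, n = 3)
```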
This prints the first n = 3 rows of the flights data set to the R console and lists all of its variables. Columns that do not fit across the console are named in a footer below the rows. We now know that the data set contains 19 variables, and what they are called. A quick check on the size of a data set can be obtained using:
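The size check can be sketched with base R's dim() function. It returns a vector giving the number of rows followed by the number of columns.

```r
library(nycflights13)

# Dimensions of the data set as c(rows, columns)
dim(flights)
```

If only one of the two numbers is needed, nrow(flights) and ncol(flights) return each one separately.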
## [1] 336776 19
which displays the dimensions of the data set, rows first and then columns. Thus, here we have 336,776 rows (flights) and 19 columns (variables) of data.
To reduce the amount of data we are working with and make things a little easier, let's look only at Alaska Airlines flights leaving New York City in 2013. This can be done by subsetting the data so that we keep only flights operated by Alaska Airlines (carrier code AS), as follows:
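Here is a minimal sketch using base R bracket indexing. The logical condition goes before the comma to select rows. Leaving the slot after the comma empty keeps every column. If dplyr is installed, dplyr::filter(flights, carrier == "AS") is an equivalent alternative.

```r
library(nycflights13)

# Keep rows whose carrier code is "AS"; the empty slot after the comma keeps all columns
Alaska <- flights[flights$carrier == "AS", ]
```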
This picks out all of the rows within the flights data set for which the carrier code is AS and discards the rest, creating a new data set called Alaska.
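To see how this works, the comparison can be run on its own. It produces one TRUE or FALSE value per row of flights, and indexing the rows with that logical vector keeps only the TRUE rows. The variable name keep is purely illustrative.

```r
library(nycflights13)

# One logical value per row: TRUE where the carrier code is "AS"
keep <- flights$carrier == "AS"
length(keep)  # one entry for every row of flights
sum(keep)     # TRUE counts as 1, so this is the number of Alaska Airlines flights

# Indexing the rows with the logical vector keeps only the TRUE rows
Alaska <- flights[keep, ]
nrow(Alaska) == sum(keep)
```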
Task: Write code to view the first 5 rows of the Alaska data.
You may want to use the head function.
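Here is one possible solution. It is written to run on its own, so it first recreates Alaska by keeping only the rows with carrier code AS. The printed result should resemble the tibble shown below.

```r
library(nycflights13)

# Recreate the Alaska Airlines subset
Alaska <- flights[flights$carrier == "AS", ]

# View the first 5 rows
head(Alaska, n = 5)
```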
## # A tibble: 5 × 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 724 725 -1 1020 1030
## 2 2013 1 1 1808 1815 -7 2111 2130
## 3 2013 1 2 722 725 -3 949 1030
## 4 2013 1 2 1818 1815 3 2131 2130
## 5 2013 1 3 724 725 -1 1012 1030
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
Task: What are the dimensions of the Alaska data set?
Check the dimensions using the dim function.
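Here is a possible solution that recreates Alaska so the block runs on its own. As with flights, dim() returns the number of rows followed by the number of columns.

```r
library(nycflights13)

# Recreate the Alaska Airlines subset
Alaska <- flights[flights$carrier == "AS", ]

# Dimensions as c(rows, columns)
dim(Alaska)
```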
Next week we will look at more sophisticated ways of manipulating data sets. Now, let us go on to look at different visualisations of our Alaska data set using ggplot2, starting with scatterplots.