Chapter 2 Datasets

2.1 dslabs

There’s 26 datasets already set up in R for you. To use the datasets in this package you need to do the following kind of steps:

# Install the package
install.packages("dslabs")
library(dslabs)

For the dataset admissions for example, you first need to prep it with data() function and then you can call it as an object.

data(admissions)
admissions

2.2 FiveThiryEight Data

There’s a variety of datasets available as CSVs.

Go to the GitHub When you find the dataset you are interested in, click on that folder and then click on the data file you are interested in. Then click Raw. Copy and paste the URL that that brings you to. For example, for the airline-safety dataset, the URL would look like this:

https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv

Then you can use that URL in a read_csv function like this:

airline_safety <- readr::read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv")

2.3 Open Case Studies

Open Case studies is a set of datasets and example tutorials of datasets that have to do with public health.

You can find them here.

Click on “Static Version” in the table next to the dataset you are interested and you’ll see a full Rmd example of how to use these data!

2.4 Kaggle

Kaggle has a variety of datasets that are generally available as CSV. You’ll have to make a login before you download, but you can use a Google login.

To download a dataset, you can browse them and then when you find one you like, you can download the CSV. It will download it in a zip file. Upload this to your RStudio server using the Upload button.

Then you will see a CSV included in that data file and you can use readr::read_csv()to read in the file.

For example, for the Data Science Job Salaries dataset, you can click Download and then upload the resulting zip file. Then you can read in the file like:

readr::read_csv("ds_salaries.csv")

2.5 TidyTuesday

TidyTuesday is a community weekly dataset that is digested and talked about by the R community. This has been going on for years so you can pull from previous years datasets and see lots of examples from others about them.

You can find a dataset and read it in directly like this (click on the folders and the files then click on the raw form of it in GitHub and copy that link.

readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-04-23/week4_australian_salary.csv")

Or download it and put it in your Posit.Cloud project.