Chapter 2 Datasets
2.1 dslabs
The dslabs package comes with 26 datasets already set up for use in R. To use them, install and load the package:
# Install the package (only needed once)
install.packages("dslabs")
# Load the package (needed in each new R session)
library(dslabs)
For example, to use the admissions dataset, first load it with the data() function. After that you can use it like any other object:
data(admissions)
admissions
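If you want to see which datasets dslabs provides before picking one, base R's data() function can also list the contents of an installed package. This is a minimal sketch that assumes dslabs is already installed; dslabs_datasets is just an illustrative variable name, and the exact list depends on the dslabs version you have.

```r
library(dslabs)

# data(package = ...) returns an object whose $results matrix has one row per dataset;
# the "Item" column holds the dataset names
dslabs_datasets <- data(package = "dslabs")$results[, "Item"]
print(dslabs_datasets)

# Load one dataset, then look at its structure and first few rows
data(admissions)
str(admissions)
head(admissions)
```

str() is handy as a first look because it shows every column's type along with a preview of its values.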
2.2 FiveThirtyEight Data
FiveThirtyEight shares a variety of datasets as CSV files.
Go to the fivethirtyeight/data repository on GitHub.
When you find a dataset you are interested in, click on its folder and then on the data file you want.
Then click Raw and copy the URL of the page that opens.
For example, for the airline-safety dataset, the URL looks like this:
https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv
You can then pass that URL to readr::read_csv():
airline_safety <- readr::read_csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/airline-safety/airline-safety.csv")
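Clicking Raw swaps the regular github.com address for a raw.githubusercontent.com one. If you already have the regular GitHub link to a file, you can make the same swap in code. The sketch below uses a hypothetical helper, github_raw_url(), which is not part of readr or any other package, and it assumes links in the usual https://github.com/<owner>/<repo>/blob/<branch>/<path> form.

```r
# Hypothetical helper (not from any package): turn a regular GitHub file link into
# the raw.githubusercontent.com link that read_csv() can download directly.
# Assumes links of the form https://github.com/<owner>/<repo>/blob/<branch>/<path>
github_raw_url <- function(url) {
  if (!grepl("^https://github\\.com/[^/]+/[^/]+/blob/", url)) {
    stop("Expected a https://github.com/<owner>/<repo>/blob/... link: ", url)
  }
  # Swap the host name
  url <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", url)
  # Drop the "/blob" segment that sits between the repository name and the branch
  sub("^(https://raw\\.githubusercontent\\.com/[^/]+/[^/]+)/blob/", "\\1/", url)
}

airline_url <- github_raw_url(
  "https://github.com/fivethirtyeight/data/blob/master/airline-safety/airline-safety.csv"
)
print(airline_url)
```

The result matches the airline-safety URL above, so airline_url can be passed straight to readr::read_csv().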
2.3 Open Case Studies
Open Case Studies is a collection of public health datasets, each paired with an example tutorial.
Click on “Static Version” in the table next to the dataset you are interested in, and you’ll see a full Rmd example of how to use that data.
2.4 Kaggle
Kaggle has a variety of datasets, generally available as CSV files. You need an account to download them, but you can sign in with a Google account.
To download a dataset, browse until you find one you like and download it; it arrives as a zip file. Upload this zip file to your RStudio Server using the Upload button.
The CSV file from the archive will then appear in your files, and you can read it in with readr::read_csv().
For example, for the Data Science Job Salaries dataset, click Download and then upload the resulting zip file. Then read in the file like this:
readr::read_csv("ds_salaries.csv")
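If the zip file is not unpacked for you (for example, when you work in a local R session), you can pull the CSV out of the archive from R. This is a sketch built around a hypothetical helper, read_csv_from_zip(), which is not part of any package; it relies on base R's unzip() to list and extract files and on readr::read_csv() to read the result.

```r
# Hypothetical helper (not from any package): read one CSV straight out of a zip
# archive, such as the file Kaggle gives you, without unzipping it by hand.
read_csv_from_zip <- function(zipfile, csv_name = NULL) {
  # List the files inside the archive without extracting anything
  contents <- utils::unzip(zipfile, list = TRUE)$Name
  csv_files <- contents[grepl("\\.csv$", contents, ignore.case = TRUE)]
  # If no file name is given, only proceed when the choice is unambiguous
  if (is.null(csv_name)) {
    if (length(csv_files) != 1) {
      stop("Found ", length(csv_files), " CSV files in the archive; pass csv_name")
    }
    csv_name <- csv_files
  }
  # Extract only the chosen file into a temporary folder, then read it
  exdir <- tempfile("unzipped_")
  utils::unzip(zipfile, files = csv_name, exdir = exdir)
  readr::read_csv(file.path(exdir, csv_name))
}
```

For the salaries example you would call something like read_csv_from_zip("archive.zip", "ds_salaries.csv"), where "archive.zip" stands in for whatever name your download actually has. The readr documentation also notes that read_csv() automatically uncompresses files ending in .zip, which may be enough when the archive holds a single CSV.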
2.5 TidyTuesday
TidyTuesday is a weekly community project in which the R community explores and discusses a new dataset each week. It has been running for years, so you can pull datasets from previous years and see many examples of how others have analyzed them.
To read a dataset in directly, click through the folders to the file you want on GitHub, click Raw, and copy that link:
readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018/2018-04-23/week4_australian_salary.csv")
Alternatively, download the file and put it in your Posit.Cloud project.
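If you would rather keep a local copy, you can also do the download from R and reuse the saved file in later sessions instead of fetching it every time. This is a sketch with a hypothetical helper, download_once(), and an assumed data/ folder inside your project; the real work is done by base R's download.file().

```r
# Hypothetical helper (not from any package): download a file into a project folder
# the first time it is needed, and reuse the saved copy on later runs.
download_once <- function(url, dir = "data") {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  # Name the local copy after the last part of the URL
  dest <- file.path(dir, basename(url))
  if (!file.exists(dest)) {
    # mode = "wb" keeps the file byte-for-byte identical on Windows
    utils::download.file(url, destfile = dest, mode = "wb")
  }
  dest
}
```

For example, passing the TidyTuesday raw URL above to download_once() saves week4_australian_salary.csv in data/ and returns its path, which you can then hand to readr::read_csv(). Checking file.exists() first means rerunning your script does not download the file again.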