Chapter 38 Cleaning Data Project

In this project, we will use data to answer this question:

“Across the countries of the world, how is literacy related to life expectancy?”

To answer our question, we’ll need one dataset with literacy information by country and another with life expectancy statistics by country.

We have already downloaded both datasets from Kaggle for you.

38.0.1 Starting up this project

  1. Go to the DataTrail workspace.
  2. Return to your own DataTrail_Projects project in RStudio.
  3. For this project, go to the 03_Cleaning_Data folder.
  4. Click on the file countries_project.Rmd to open it.

38.0.2 Your objectives!

To complete this project you’ll need to do a few things within the countries_project.Rmd file.

  1. Go through this notebook, reading along.

  2. Fill in empty or incomplete code chunks when prompted to do so.

  3. Run each code chunk as you come to it by clicking the small green triangle on the right side of the chunk. You should see the chunk print its output when you do this.

  4. At the very top of this file, replace the value of the author: field with your own name. It currently says "DataTrail Team". Be sure to keep your name in quotes.

  5. In the Conclusions section, write up a response to each of the questions posed there.

  6. When you are satisfied with what you’ve written and added to this document, save it. In the menu, go to File > Save. The resulting nb.html file will now contain your new output.

  7. Open the resulting countries_project.nb.html file and click View in Web Browser. Does it look good to you? Did all your changes appear as you expected?

  8. Upload your Rmd and your nb.html to your assignment folder. Where this folder is depends on what your instructors have told you. If you are taking this course on your own, just collect these projects in one spot, preferably a Google Drive folder.

  9. Pat yourself on the back for finishing this project!
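
For step 4, the author: field sits in the YAML header at the very top of countries_project.Rmd. As a rough sketch, the edited header might look something like the following. Only the author: field and its current value, "DataTrail Team", come from the instructions above. The title shown here and the name "Your Name" are placeholders, and the output: line is an assumption based on the file producing an nb.html. Your actual header may contain different or additional fields, so change only the author: line.

```yaml
---
title: "Countries Project"   # placeholder; leave the title your file already has
author: "Your Name"          # previously "DataTrail Team"; keep the quotes
output: html_notebook        # assumed; this output type is what produces an nb.html file
---
```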