This content was published with bookdown by: Data Trail

Style adapted from: rstudio4edu-book (CC-BY 2.0)


DataTrail

Chapter 88 Project Gallery

When looking for a job in data science, it’s best that those interested in hiring you can easily get a sense of your work on the Internet. When you display your projects on your website, hiring managers can quickly see what skills you have and what you’re interested in.

Arguably, this could be seen on GitHub (we’ll get to this in the next lesson!), and hiring managers will likely look there as well. However, by writing up a short report that tells the whole story for a few of your data science projects (including visualizations!) and including those reports on your website, you’re making it even easier for hiring managers to see what you can do!

In this lesson, we’ll discuss two different ways of displaying your projects on your website, discuss what you’ll want to include on your website, and walk step by step through turning one of the projects in this course into a project included on your website.

88.0.1 Project List

The simplest way to display projects on your website is by including bullet points that link to your projects on GitHub; however, looking through multiple folders on your GitHub to figure out what you did on a project is asking a lot of someone. Thus, while it’s better than nothing to include links to your GitHub, you want to make it easier on your readers. To do this, you’ll want to turn your projects into a report or blog post that you’ll then publish on your website. By telling a story about your project, including the results in figures and tables, and making what you did in the analysis very clear, you’re demonstrating both your technical and communication skills all at once.

In the last lesson on updating your website, we looked at David Robinson’s “about me” section. In this lesson, we can see that as he carries out projects, he writes blog posts and posts them to his website.

David Robinson’s Projects

On his website, his most recent posts are displayed, and the contents of each can be found by clicking on any of the titles. Then, visitors to his site can see the story of his analysis!

Blog posts tell analytical stories

By including links to summaries (blog posts!) of your work on your website, you’re helping individuals who are looking to learn more about your work or are simply interested in your analyses.

88.0.2 Project Gallery

Beyond providing links to blog posts, it can sometimes be helpful to provide readers with a visual cue as to what will be included in that link. Rather than a plain list of your projects, including an image along with the title of each post can be very helpful in attracting readers to your work.

For example, on Nathan Yau’s site, FlowingData.com, projects can be searched by topic. Then, images for each post are displayed with the title underneath the project and the first few words from the post displayed underneath the title.

Projects on Nathan Yau’s website

One caveat here is that on this site, Nathan Yau often links to others’ work to promote and support their work, so these aren’t all his own projects. On your website, when looking for a job, you would want to be sure to include links to your work primarily. You can then use Twitter (we’ll talk about this later!) to support and promote others’ work, at least while you’re looking for a job. Once you’re more established (like Nathan Yau), it’s OK to include links to others’ work on your website, as long as you give them credit, of course!

Another great example of a project gallery can be seen on Mona Chalabi’s website. We saw her “about me” section in the last lesson and learned that she visualizes data using creative and artistic drawings/visualizations. This type of work lends itself perfectly to a project gallery, as the visualizations are the story.

Mona Chalabi’s Project Gallery

However, Mona is also a writer. Her journalism pieces are also listed on her website, in list format! Both project lists and project galleries can be effective ways to share your work on your website!

Links to Mona Chalabi’s Writing

88.0.3 What To Include

As you’re preparing your project gallery, you’ll want links to a few blog posts about projects you’ve worked on. These posts should display your ability to:

  • find a dataset
  • wrangle a dataset
  • explore a dataset
  • analyze a dataset
  • visualize a dataset
  • tell a story
  • communicate effectively about data science

You’ll want at least a few examples of your work, meaning at least 3 different projects you’ve worked on. You’ll also want to make sure at least one of these is a project you’ve come up with and completed on your own. The projects you display shouldn’t all come from the projects done in this Course Set, although some of them can!

88.0.4 What Not To Include

As you’re looking for a job, it’s best to avoid criticizing the work of others in your project gallery. You can certainly use a dataset that someone else has written about previously to do your own analysis; however, your post should not tear apart what that other individual did in theirs.

Additionally, remember this is a blog post. This post should tell a story and include details about your analysis, but it shouldn’t include every detail about your analysis. You should do the analysis and then pare down what you have to only what’s necessary to communicate your story to your audience.

Finally, remember that this should demonstrate your skills and abilities. Do not take code or work from others without properly citing those individuals. Giving attribution to others’ work when deserved is incredibly important. Be sure that all writing and code portrayed as your own in your project gallery actually are your own work.

88.0.5 Your Final Project

Now that we’ve seen a few examples of others’ projects, let’s get down to preparing one project you may want to include in your project gallery! In the last course, one of the quiz questions required you to create a presentation on Google Slides taking what you did in your Final project and turning it into a presentation that you’d present to a technical audience.

Thus, you should have already thought a bit about how you would tell a story about this analysis. We’ll use that same project here, but instead of generating a Google Slides presentation, we’ll turn this project into a project that you will include on your website!

The link to this page on your website will be required in the quiz for this lesson, so it’s best to follow along, create this post, and update your website as you work through this lesson!

88.0.5.1 New Post Setup

In the last lesson, we set up the skeleton for a project on your website, but we didn’t include any content in that project at that time. The goal of this lesson is to fill out that section of your website a little more! We’ll start by adding your Final Project analyzing the American Time Use Survey Data for this Course Set as a project on your website here.

Final Project on Posit Cloud

This write up should take what you did in your final project and turn it into a detailed but clear blog post format for inclusion on your website.

To get started, return to the project on Posit Cloud for your blogdown website. Navigate in the Files tab to /cloud/project/content/project. In here you’ll see the atus-survey.md file we created in the last lesson.

atus-survey.md

But, at this point, if you want to take what you did in your final project and turn it into a blog post, you probably want an R Markdown (.Rmd) file, not a Markdown (.md) file. That’s ok! We’ll walk through how to create a new post now!

First, make sure that you are in /cloud/project and not your version controlled website directory before attempting to make a new post.

Then, click on the “Addins” button on the menu along the top. Select “New post” from the drop-down menu.

New Post

In the pop-up “New Post” box that appears, enter the “Title” of your final project. For this lesson, we’ve titled this post “ATUS Survey Data”, but as we’ve discussed in previous lessons, this is not a very good title. A better title would concisely convey the findings of the analysis. Be sure that your project title is better than “ATUS Survey Data.”

After deciding on a good title, include your name in the “Author” box and today’s date in the “Date” box. Then, importantly, change “Subdirectory” to project. This ensures that the new post goes in your project directory and not your post directory.

After that, you can choose a few Categories and Tags to include on your website. Leave “Archetype” as is. The name of the file and the slug will fill in automatically. Leave those defaults alone.

Finally, change the format to “R Markdown (.Rmd)”.

New Post Filled In Form

Once this information is all complete, click “Done.” This will generate an R Markdown document where you’ll be able to get started!

New RMarkdown Post

88.0.5.2 Project Content

With the file ready, we’ll need to determine the general framework for this blog post. This means determining what sections to include in the project and what figures/results you’ll present to tell your story. It’s a good idea to place possible section headers in the document before you start adding content.

A general framework for this project and most analysis blog posts could be the following:

General Framework

You could choose to use this framework or modify it to best tell the story of your analysis. But, regardless of what sections you include, you want to be sure to explain why you’re doing the analysis and what question you’ll be answering in this post in the introduction.

Following this, it’s often a good idea to explain where the data came from and display the code you used to get the data into Posit Cloud. The Exploratory Data Analysis section should explain how you wrangled the data and maybe show an exploratory plot or two.

How you analyzed the data should be explained briefly after the data are explained. The Results section is the most important - this should summarize your findings, including figures and tables to guide the reader and help them understand your results.

The sub-headers within the results section should tell the reader what your results were.

Finally, a conclusion should pull everything together.

Within your R Markdown document, this framework would be entered as follows:

Framework in R Markdown
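As a rough sketch, the section headers discussed above could also be generated from the Terminal tab and appended to your post. The header names below are assumptions based on the sections described in this lesson, and `print_post_skeleton` is a hypothetical helper, not part of blogdown. Replace the placeholder sub-headers with short statements of your actual results.

```shell
# Hypothetical helper: print a Markdown skeleton for an analysis post.
# The section names are assumptions drawn from the framework described above.
print_post_skeleton() {
  cat <<'EOF'
## Introduction

## Data

## Exploratory Data Analysis

## Analysis

## Results

### First finding, stated as a sentence

### Second finding, stated as a sentence

## Conclusion
EOF
}
```

You could append the output below the YAML of your new post with `print_post_skeleton >> your-post.Rmd`, where `your-post.Rmd` stands in for the file that the “New post” addin created. Typing the headers directly into the document works just as well.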

Before we get to adding content, a reminder that you can run blogdown::serve_site() at any point in time to preview the changes to your site that you’ve made. After saving those changes to the R Markdown file, the homepage would look as follows.

Post on Homepage

If you were to click on this post, you would be able to preview the skeleton of your project that you just made!

Project Preview

We’ll get to filling in the content for the actual post; however, let’s take a step back to make the project on the homepage look more like the skeleton post we created before. We want to add a picture to the preview on the homepage, include a caption on the homepage post, and include a picture on the post itself. These changes will all be made in the YAML at the top of your R Markdown document.

YAML edits

summary: 'Analysis of 2016 American Time Use Survey (ATUS) Data'
image_preview: 'bubbles.jpg'
header:
  image: 'bubbles.jpg'

We’ll explain the YAML edits here:

  • summary: the caption text on the post on your homepage
  • image_preview: the image included with your post on your homepage
  • header: image: the image included in the header of your project post itself

There are a number of other appearance edits you can make in the YAML of your R Markdown document, but these are all we’ll cover for now. We’re including the default bubbles.jpg image for now, but for each project you should change this to an image that makes sense for that project. The image you want to include should be stored in /cloud/project/static/img.

where to save images

Now when you preview your projects on your homepage, the project looks just like the skeleton post we created previously!

Projects updates preview

At this point, we can delete the atus-survey.md project file because we’re going to include just the .Rmd file on our website.

Delete atus-survey.md

88.0.5.3 Content

With the skeleton for your post in your R Markdown document ready and the appearance on your website ready to go, it’s time to include text and code throughout the post. For example, in the data section you’d explain where the data came from, describe how many observations are in the data and what variables you’re most interested in.

Data section

You’d also include a code chunk that would read the data into R. Note that you’ll likely want to include the data in the project website directory (/cloud/project/content/project). You can create subdirectories within this directory (for example: /cloud/project/content/project/atus_survey) to help keep things organized.

If we were to save and preview this file at this point on your website, it would be pretty bare bones, but you would be able to see all the changes you’ve made. Note that you do NOT have to Knit this file, as you would normally. blogdown does that automatically for you when you preview the site!

Skeleton with data section started

And, as you include more content for what you did in this analysis, it will become a full-fledged post. Be sure that there is text accompanying all code, figures, and tables to explain to your readers what is being done throughout the project.

Adapt the text and code from your projects into this document by using the code you’ve already written but adding text to explain to a reader what you’ve done. You can copy sentences and code chunks, but do not simply copy everything currently in your project .Rmd document, because much of its text is instructions telling you what to do in the project. The goal of this post is to tell a story about your analysis. Make sure all text included in this document and every code chunk helps tell that story.

Before finishing, change the picture included in the header and on your homepage preview to something that matches the project. It could be a figure included in your analysis or a photo of yours (or one that is freely available) that has something to do with your analysis. Make sure that this photo is uploaded into /cloud/project/static/img.

88.0.5.4 Proofread

After you’re happy with your post, it’s always a good idea to proofread your writing for typos and errors.

88.0.6 Push to GitHub

Once you’re happy with these changes, remember that in order for them to be seen on your website, all the contents of your /cloud/project/public/ directory have to be added to your website directory. To do this, first delete all of the current files (except for the hidden files!) in /cloud/project/username.github.com.

Delete old website files

Then, select all the files in /cloud/project/public. Click “More” and select “Move” from the drop-down menu.

Move files

Select the Folder for your website in the pop-up “Choose Folder” window. Click “Choose.”

Select website
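The same delete-then-move step can also be done from the Terminal tab. The following is a minimal sketch rather than the course’s required method: `sync_public_to_site` is a hypothetical helper name, and it copies the files instead of moving them, so /cloud/project/public is left intact. It relies on the shell’s default behavior that `*` does not match hidden entries such as .git.

```shell
# Hypothetical helper: replace the visible contents of the website directory
# with the freshly built site from blogdown's public/ directory.
sync_public_to_site() {
  public_dir="$1"
  site_dir="$2"

  # Stop early if either directory is missing.
  [ -d "$public_dir" ] || { echo "missing: $public_dir" >&2; return 1; }
  [ -d "$site_dir" ] || { echo "missing: $site_dir" >&2; return 1; }

  # Delete the old website files. The * glob skips hidden entries,
  # so the .git directory (and other hidden files) are kept.
  # ${site_dir:?} aborts if the variable is empty, guarding against "rm -rf /*".
  rm -rf -- "${site_dir:?}"/*

  # Copy (not move) everything inside public/ into the website directory.
  cp -R "$public_dir"/. "$site_dir"/
}
```

With the directories used in this lesson, the call would be `sync_public_to_site /cloud/project/public /cloud/project/username.github.com`, with `username` replaced by your own GitHub username.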

You’re now ready to push all your changes to GitHub. Make sure you’re in the correct version-controlled directory (/cloud/project/username.github.com/) and then add, commit, and push your changes:

cd /cloud/project/username.github.com/
git add .
git commit -m "add projects tab"
git push
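Before running the commands above, it can help to confirm that the directory really is your version-controlled website and that the new files arrived. The sketch below is one possible check, assuming your built site has an index.html at its top level. `check_site_repo` is a hypothetical helper name, while `git rev-parse --is-inside-work-tree` and `git status --short` are standard Git commands.

```shell
# Hypothetical helper: sanity-check the website directory before committing.
check_site_repo() {
  site_dir="$1"

  [ -d "$site_dir" ] || { echo "no such directory: $site_dir" >&2; return 1; }

  # rev-parse prints "true" only when run inside a Git working tree.
  inside="$(git -C "$site_dir" rev-parse --is-inside-work-tree 2>/dev/null || true)"
  [ "$inside" = "true" ] || { echo "not a Git repository: $site_dir" >&2; return 1; }

  # Assumption: a built site has index.html at the top level.
  [ -f "$site_dir/index.html" ] || { echo "no index.html in $site_dir" >&2; return 1; }

  # List the files that "git add ." would pick up.
  git -C "$site_dir" status --short
}
```

Running `check_site_repo /cloud/project/username.github.com` lists the new and changed files. If it reports an error instead, revisit the move step before committing.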

Your changes will now be viewable on your website!

Changes to website

You’ll also want to create projects or posts for your Data Tidying and Data Visualization Projects on your website. That won’t be required for this lesson, but it is a good idea to make sure that your work is represented clearly on your website.

88.0.7 Your Own Project

So far in this course, you’ve been told what data to work with, how to explore it, and what questions to answer. However, when you are looking for a job, hiring managers will want to see that you have completed projects on your own. At this point, it’s your job to use all the skills you’ve learned throughout this course to add a project that is all your own to your website.

To be considered complete, this project must demonstrate your ability to:

  • Form a data science question
  • Get data
  • Clean the data
  • Plot data
  • Get stats
  • Report results

Carry out a data science project on your own about a topic you care about or find interesting! Then add this project to your website. Push the changes to GitHub and include the link to your project in the quiz below.

88.0.8 Slides and Video

Project Gallery

  • Slides

All illustrations CC-BY.
All other materials CC-BY unless noted otherwise.