Chapter 1 Intro: Welcome to DataTrail

Hello and welcome! The goal of DataTrail is to help anyone with an internet connection and a computer learn to do data science. The program starts with the very basics of using a computer on the internet and works all the way up to doing data science and data analysis. We hope that by building this program we can help people get into the exciting tech world through one of the fastest-growing and most satisfying jobs in the United States. Jobs asking for data science skills are only going to become more common in the future. We believe that by making this career accessible to anyone we can have a positive impact on the world.

1.1 Learning objectives for this section

In this section our goal is for you to be able to:

  • Understand the components of this DataTrail course
  • Describe what data science is
  • Acknowledge the expectations and requirements for completing this course
  • Navigate the basic use of your Chromebook and its operating system
  • Set up the basic accounts you will need to complete this course

1.1.1 Course Details

Before we jump into the content, we just wanted to orient you to how this course will be laid out:

  • Sections - There are 8 major sections in this course, each taking you through another step of the data science process. They include:
    • Introduction to DataTrail (what you are looking at now)
    • Forming Questions
    • Getting Data
    • Cleaning the Data
    • Plotting the Data
    • Sharing Results
    • Building a resume
  • Chapters - You’re looking at the first chapter here. It’s called “Welcome to DataTrail”. You can see a list of all the chapters (and the overall sections in bold) in this course in the left panel. The chapters contain text and images to walk you through the material.
  • Quizzes - Most chapters (when looking at this in Leanpub) will have a quiz to evaluate your understanding of the material in that chapter. Successful completion of these quizzes is required to receive the certificate at the end of the DataTrail course. Quizzes often also have an interactive component called swirl, which we will introduce you to.
  • Projects - At the end of each section, there will be a project that will require you to turn in some code. The main objective of these projects is for you to try setting up an analysis using what we have learned in the section. If you are struggling at all with a project, reach out. If you get stuck, that is okay; just make sure to turn in the work that you have attempted.

1.1.2 What is data science?

You might not think about data very much. Most people don’t. So why is data science such a popular and growing career? And what does data science even mean?

One definition of data is any information that you can store on a computer. Examples of data that you produce all the time are text messages, Facebook posts, websites you visit, things you buy with a credit card, pictures of your car on speed cameras, and information you fill out in profiles for your work, school, or community organizations. If you can take a picture of it, write about it, make a video of it, or record it on audio - then it is probably data. All of these different kinds of data are collected and saved on a computer.

It used to be that measuring and storing data was expensive and hard. Now it is easy and cheap. Governments, companies, organizations, and even individual people can now collect and store more data in a few days than the entire world collected over the last few centuries. Most of the time we don’t even think about the data we are collecting. We take pictures and post them to Facebook to show our grandparents, not because we want to analyze the data in the pictures. This is true for most of the data that we create and collect, both for ourselves and for companies. We don’t do it for the data - we do it because we want to record and share information about ourselves and the world. For example, companies do it so they can keep track of their customers, and governments do it because they want to keep records of who got parking tickets.

But people started to figure out that you could use that data for other purposes. When you search for “symptoms of the flu” on Google, you are just looking for information because you are sick. But the fact that you searched for flu symptoms is itself data, and it is valuable to companies, doctors, and even scientists. We could use that data from you to do things like show you an ad for blankets or for flu medicine. We could also use data from you and everyone else who searched for flu symptoms to find out where there are lots of flu cases.

Another example is social media. You might write a post on Facebook with pictures describing your child’s birthday party. You might do this so your child’s grandparents can see pictures of her birthday. But the information in that post can again be valuable for other people. We could figure out birthdays, hobbies, interests, and who knows who from Facebook posts and likes. That information can be valuable for showing ads, for suggesting other people you might know, or for studying how humans interact with each other on birthdays and holidays.

Nonprofits use data science to further their causes. For example, the Red Cross uses data science to find counties to target for smoke alarm installation campaigns.

But to make data valuable we need to be able to study it and separate the interesting facts (called the “signal”) from the uninteresting information (the “noise”). One definition of data science is:

“Data science is asking a question that can be answered with data, collecting and cleaning the data, studying the data, creating models to help understand and answer the question, and sharing the answer to the question with other people.”

The reason this field is growing so fast is that nearly every government, company, and organization is now collecting data. As data have become cheaper and cheaper to collect and store, the ability to analyze that data and find useful information has become a more and more valuable skill. But most people don’t have training or experience sifting through big piles of data to make interesting and valuable discoveries. The people who can do this well are called data scientists. They have a job that is exciting, interesting, and promises to be in high demand for years to come.

1.1.3 What is DataTrail?

Data science is a fast-growing and exciting profession. But it can be a challenge to get into this career. Some of the biggest challenges are:

  • Finding out that data science is even a real career
  • Getting an education in data science can be costly and inconvenient
  • You usually need a background in math, statistics, or computer science
  • The equipment for data science can be costly and difficult to set up
  • Many of the jobs are only available in major cities

Most of the people who are currently data scientists have degrees in math, statistics, or physics. They can afford computers that cost thousands of dollars and specialized computing software to help them do their jobs. They also mostly live in a few major cities like New York, San Francisco, Seattle, and Washington, D.C. Many of these data scientists are former software engineers or other white-collar workers who moved into data science when they saw the demand for this kind of job.

Our goal with DataTrail is to help people who would otherwise not have access to this exciting career get into it. To do that we need to remove some of the challenges above. So we designed this program to tackle some of the challenges that are keeping more people from entering this career.

  • DataTrail is being released as a set of online courses with a pay-what-you-can model. That means you can take the whole series of courses for free or for whatever cost you can afford.
  • DataTrail is designed to be done entirely online using only tools you can access from a web browser. This means that you can do the entire program on a Chromebook - which you can get for as little as $150.
  • DataTrail starts at the very basics: how to set up all of your accounts, which websites and apps to use, and straightforward projects that anyone can do. The only prerequisites are high school math/reading and the ability to use a computer.
  • DataTrail includes resources for finding, getting, and working at data science jobs. It also includes resources for finding and working at remote data science jobs that can be done from anywhere in the world.

1.1.4 Who is this program for?

DataTrail is designed for people who have a high school education and know how to use a computer. Some people who we hope the program will be useful for are:

  • High school students
  • People who are working on or have completed a high school education
  • Students at community colleges
  • Older adults who want to learn something new

But the program can be completed by anyone! We hope that it will be useful for anyone who wants to learn something new about data science. The program is focused on people who want to learn to do data science themselves.

In some cases this program may not be the most efficient way to learn about data science. If you already have a background in statistics, math, or computer science and want to jump directly to more advanced topics, we have already created a Data Science Specialization on Coursera just for you. There are also many jobs that require people to understand or manage a data science project. If you are a leader or executive who just wants a high-level overview of what data science is all about, we have also created an Executive Data Science Specialization.

Our goal here is also to create a supportive and inclusive learning experience. Data science can be frustrating and slow to learn. Often the best way to learn is from other people who have solved similar problems or made similar mistakes. Fortunately, there are communities in data science that are cheerful, friendly, and willing to help new people get involved. Throughout the program we will introduce you to these communities, and we hope that you will also make an effort to help your fellow students as they discover this exciting field.

1.1.5 How the program is organized

This course is designed to be used in many different ways so it can be useful to as many people as possible. The courses and projects can be completed entirely online using nothing more than a web browser.

To keep up with the latest information about the program, courses, and more, go to https://www.DataTrail.org/.

1.1.6 How DataTrail is graded

Most chapters will have a short quiz at the end. To pass the course you need to get at least 70% of the quiz questions in the course correct. Additionally, if you are taking this course as a part of a DataTrail cohort, you will be required to turn in your projects for credit. If you earn more than 90% of the points across all quizzes and turn in completed projects, you will pass with honors.
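
If it helps to see the grading rule as arithmetic, here is a small sketch in Python. It is only an illustration, not the official grading tool. The function name course_result and its inputs are made up for this example. It assumes every quiz question is worth one point, and it does not model the cohort requirement to turn in projects for a regular pass.

```python
def course_result(quiz_points, quiz_points_possible, projects_completed):
    """Return 'honors', 'pass', or 'not yet passing' for the rule described above.

    Hypothetical helper for illustration only: assumes each quiz question
    is worth one point and ignores the cohort-specific project requirement.
    """
    if quiz_points_possible <= 0:
        raise ValueError("quiz_points_possible must be positive")

    # Compare whole numbers instead of decimals to avoid rounding surprises:
    # points / possible >= 0.70 is the same as 100 * points >= 70 * possible.
    passed = 100 * quiz_points >= 70 * quiz_points_possible

    # Honors needs MORE than 90% of the points and completed projects.
    honors = (100 * quiz_points > 90 * quiz_points_possible) and projects_completed

    if honors:
        return "honors"
    if passed:
        return "pass"
    return "not yet passing"


print(course_result(35, 50, False))  # 35 of 50 is exactly 70%
print(course_result(46, 50, True))   # 46 of 50 is 92%, with projects turned in
print(course_result(45, 50, True))   # 45 of 50 is exactly 90%, not more than 90%
```

With 50 quiz points available, 35 points is exactly 70% and is enough to pass. A score of 45 points is exactly 90%, which still counts as a pass rather than honors, because honors requires more than 90%.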