Chapter 1 Welcome to DataTrail

Hello and welcome! This is the first course in the DataTrail program. The goal of DataTrail is to help anyone with an internet connection and a computer learn to do data science. The program will start with the very basics of using a computer on the internet and work all the way up to doing data science and data analysis. We hope that by building this program we can help people get into the exciting tech world in one of the fastest growing and most satisfying jobs in the United States. There are only going to be more and more jobs asking for data science skills in the future. We believe that by making this career accessible to anyone we can have a positive impact on the world.

1.0.1 Course Details

Before we jump into the content, we just wanted to orient you to how this course and all the courses in this program will be laid out:

  • Courses - There are multiple courses in the DataTrail program. The first one is “Introduction to DataTrail”, which is the course you’re in right now.
  • Lessons - Each course will consist of lessons. You’re looking at the first lesson here. It’s called “Welcome to DataTrail”. You can see a list of all the lessons in this course in the left panel. The lessons will contain text and images to walk you through every lesson of each course.
  • Videos - At the end of each lesson there will be a link to a YouTube video. This video contains the same information as what is included in the text of the lesson; however, we know that some people learn better by listening. Sometimes you may find the videos more helpful. Sometimes you may find the text more helpful. These are included in case they are more helpful than the text to you personally.
  • Slides - Each lesson also has a link at the bottom to an accompanying slide show. Feel free to look through these slides if you find them helpful. They are the same images that were used to generate the video.
  • Quizzes - Most lessons will have a quiz to evaluate your understanding of the material in that lesson. Successful completion of these quizzes is required for receipt of the certificate at the end of each course.
  • Exercises - A few of the courses will have associated exercises. Think of these as larger projects. They won’t be required to receive the certificate at the end of the course; however, the skills the exercises require will be essential if you’re interested in getting a job in data science, so we highly suggest you complete them.

1.0.2 What is data science?

You might not think about data very much. Most people don’t. So why is data science such a popular and growing career? And what does data science even mean?

One definition of data is any information that you can store on a computer. Examples of data that you produce all the time are text messages, Facebook posts, websites you visit, things you buy with a credit card, pictures of your car on speed cameras, and information you fill out in profiles for your work, school, or community organizations. If you can take a picture of it, write about it, make a video of it, or record it on audio - then it is probably data. All of these different kinds of data are collected and saved on a computer.

It used to be that measuring and storing data was expensive and hard. Now it is easy and cheap. Governments, companies, organizations, and even individual people can now collect and store more data in a few days than the entire world collected over the last few centuries. Most of the time we don’t even think about the data we are collecting. We take pictures and post them to Facebook to show our grandparents, not because we want to analyze the data in the pictures. This is true for most of the data that we create and collect, both for ourselves and for companies. We don’t do it for the data - we do it because we want to record and share information about ourselves and the world. For example, companies do it so they can keep track of their customers, and governments do it because they want to keep records of who got parking tickets.

But people started to figure out that you could use that data for other purposes. When you search for “symptoms of the flu” on Google, you are just looking for information because you are sick. But the fact that you are searching for flu symptoms is valuable information for companies, doctors, and even scientists. We could use that data from you to do things like show you an ad for blankets or for flu medicine. We could also use data from you and everyone else who searched for flu symptoms to find out where there are lots of flu cases.

Another example is social media. You might write a post on Facebook with pictures describing your child’s birthday party. You might do this so your child’s grandparents can see pictures of her birthday. But the information in that post can again be valuable for other people. We could figure out birthdays, hobbies, interests, and who knows who from Facebook posts and likes. That information can be valuable for showing ads, for suggesting other people you might know, or for studying how humans interact with each other on birthdays and holidays.

But to make data valuable we need to be able to study it and separate the interesting facts (called the “signal”) from the uninteresting information (the “noise”). One definition of data science is:

“Data science is asking a question that can be answered with data, collecting and cleaning the data, studying the data, creating models to help understand and answer the question, and sharing the answer to the question with other people.”

The reason this field is growing so fast is that nearly every government, company, and organization is now collecting data. As the data have become cheaper and cheaper, the ability to analyze that data and find useful information has become a more and more valuable skill. But most people don’t have training or experience sifting through big piles of data to make interesting and valuable discoveries. The people who can do this well are called data scientists. They have a job that is exciting, interesting, and promises to be in high demand for years to come.

1.0.3 What is DataTrail?

Data science is a fast-growing and exciting profession. But it can be a challenge to get into this career. Some of the biggest challenges are:

  • Finding out that data science is even a real career
  • Getting an education in data science can be costly and inconvenient
  • You usually need a background in math, statistics, or computer science
  • The equipment for data science can be costly and difficult to set up
  • Many of the jobs are only available in major cities

Most of the people who are currently data scientists have degrees in math, statistics, or physics. They can afford computers that cost thousands of dollars and specialized computing software to help them do their jobs. They also mostly live in a few major cities like New York, San Francisco, Seattle, and Washington D.C. Many of these data scientists are former software engineers or other white-collar workers who moved into data science when they saw the demand for this kind of job.

Our goal with DataTrail is to help people who would otherwise not have access to this exciting career get into it. To do that we need to remove some of the challenges above. So we designed this program to tackle some of the challenges that keep more people from entering this career.

  • DataTrail is being released as a set of online courses with a pay-what-you-can model. That means you can take the whole series of courses for free or for whatever cost you can afford.
  • DataTrail is designed to be done entirely online using only tools you can access from a web browser. This means that you can do the entire program on a Chromebook - which you can get for as little as $150.
  • DataTrail starts at the very basics of how to set up all of your accounts, which websites and apps to use, and simple little projects that anyone can do. The only pre-requisites are high school math/reading and the ability to use a computer.
  • DataTrail includes resources for finding, getting, and working at data science jobs. It also includes resources for finding and working at remote data science jobs that can be done from anywhere in the world.

1.0.4 Who is this program for?

DataTrail is designed for people who have a high school education and know how to use a computer. Some people who we hope the program will be useful for are:

  • High school students
  • People who are working on or have completed a high school education
  • Students at community colleges
  • Older adults who want to learn something new

But the program can be completed by anyone! We hope that it will be useful for anyone who wants to learn something new about data science. This program is also focused on people who want to learn to do data science.

In some cases this program may not be the most efficient way to learn about data science. If you already have a background in statistics, math, or computer science and want to jump directly to more advanced topics, we have already created a Data Science Specialization on Coursera just for you. There are also many jobs that require people to understand or manage a data science project. If you are a leader or executive who just wants a high-level overview of what data science is all about, we have also created an Executive Data Science Specialization.

Our goal here is also to create a supportive and inclusive learning experience. Data science is frustrating and slow to learn. Often the best way is to learn from other people who have discovered similar solutions or made similar mistakes. Fortunately, there are communities in data science that are cheerful, friendly and willing to help new people get involved. Throughout the program we will introduce you to these communities and hope that you will also make an effort to help your fellow students as they discover this exciting field.

1.0.5 How the program is organized

This program is a series of online classes. They are designed to be used in many different ways so they can be useful for the most people possible. The courses and projects can be completed entirely online using nothing more than a web browser. The program is organized into:

  • Courses: Each course is designed to be completed in about a month working in your spare time, or in a day or two working full time. You can receive a certificate for each course and all courses are based on a pay-what-you-can model. Each course consists of:
    • Text based tutorials and lessons
    • Slides with the images from the tutorials
    • Video tutorials that cover the same information as the lessons
    • Ungraded exercises to practice what you have learned
    • Graded quizzes to measure what you have learned
    • Projects to help you build a portfolio for showing what you’ve learned
  • Course Set: A Course Set is a group of courses that form credentials.

To keep up on the latest information about the program, courses and more go to https://www.DataTrail.org/.

1.0.6 How this course is graded

This first class is designed to get you set up with the accounts you will use as you learn to become a data scientist. You will also complete your first data science project. Each lesson will have a short quiz at the end. To pass the course you need to get 70% of the questions in the course correct. If you receive more than 90% of the points across all quizzes you will pass with honors.
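
If it helps to see the grading rule written out, here is a minimal sketch in Python. It is only an illustration, not how the course platform actually computes grades. It assumes that "70%" means at least 70%, that honors requires strictly more than 90%, and that all quiz questions are pooled and weighted equally. The function name `course_result` and the sample quiz scores are made up for this example.

```python
def course_result(correct, total):
    """Return 'honors', 'pass', or 'fail' for a course.

    Assumptions (not confirmed by the course): passing means at least
    70% of all questions correct, and honors means more than 90%.
    """
    if total <= 0:
        raise ValueError("total must be a positive number of questions")
    # Compare whole numbers so that exactly 70% or 90% is handled cleanly.
    if 100 * correct > 90 * total:
        return "honors"
    if 100 * correct >= 70 * total:
        return "pass"
    return "fail"


# Hypothetical results: (questions correct, questions asked) for each quiz
quiz_scores = [(4, 5), (5, 5), (3, 5)]
correct = sum(c for c, _ in quiz_scores)
total = sum(t for _, t in quiz_scores)

print(course_result(correct, total))  # 12 of 15 is 80%, so this prints "pass"
```

In this sketch a single weak quiz does not fail you on its own, because the scores are added up across the whole course before the percentage is checked.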

1.0.7 Slides and Video

Slides