Chapter 8 Using Paths in Code
Now that you know what a file path is and how to navigate to different directories using both the Terminal and the R Console, we’ll discuss how to include file paths when coding.
8.0.1 Relative and absolute paths
Relative and absolute paths were discussed in an earlier lesson in this course. This is a very brief review:
8.0.1.1 Relative paths
Relative paths give a path to the destination folder based on where you are right now (your current working directory).
8.0.1.2 Absolute paths
Absolute paths give a path to the destination folder based on the root directory of a file system.
8.0.2 Directions analogy
So, to return to analogy of giving someone directions. Imagine that your friend plans to go from the town square, then to the library, and finally to the bakery. In this analogy, the town square represents the root directory, a universal starting location. If your friend is currently at the library and asks you for directions, you would likely tell them how to go from the library (where they are) to the bakery (where they want to go). This represents a relative path – taking you from where you are currently to where you want to be. Alternatively, you could give them directions from the Town square, to the library, and then to the bakery. This represents an absolute path, directions that will always work in this town, no matter where you are currently, but that contain extra information given where your friend is currently.
8.0.3 Paths in projects
In this course, you’ve learned how to organize your project folders and name your files for every data science project you complete. Setting up projects in this way not only helps keep you organized and your file naming consistent, but it also sets you up to easily refer to many different files while you’re writing code.
Often you’ll find yourself writing some code in your raw_code
folder, but you’ll want to read in some data that are in your raw_data
folder. The good thing is that since everything for this project is together in a subfolders within a single project folder, you can accomplish this easily.
8.0.3.1 Use relative paths in projects
To easily reference back to a dataset in your raw_data
folder where you have a file called dataset.csv
, you’ll want to use a relative path. You’ll set your path relative to the project folder. In this example, you will have set your project directory as /cloud/directory
and you would reference your relative path data/dataset.csv
.
Since you’ve already defined where your starting point is for this project (/cloud/directory
), you’ll be able to and should refer to every folder and every file easily with relative paths.
8.0.3.2 Relative paths make sharing projects easier
The reason relative paths for each project are the correct approach is because if you share this project with someone else with a different computer, you would likely share the project file and all the subdirectories. If you were to share this with someone else and relative paths were used throughout all the code in your project, that new person would be able to follow along and run all your code without any problems.
However, you are not likely to (and shouldn’t!) share all the contents of your entire computer with another person. If you had used absolute paths that work for your computer, rather than relative paths, none of the paths will make any sense on the person’s computer whom you’ve shared your content with. To enable sharing with someone else, you should always use relative paths in your code that are relative to the project folder. Thankfully, there’s an easy way in R to make that happen!
8.0.4 The here
package
We haven’t yet discussed what an R package is or what the basic commands in R are. However, we have covered file organization at this point and how to navigate within R. With that knowledge, we’re now going to discuss how to use the here package.
While we’ll define in more depth what packages are, at this point, think of packages as something that allows you to accomplish something that you wouldn’t have been able to otherwise or that wouldn’t have been as easy to accomplish without the package.
We’re going to discuss using a single R package now, called here
.
To get started using the here
package (or any R package!), it first has to be installed (using the install.packages()
function) and then loaded in (using the library()
function). Note that the package name in the install.packages()
function has to be in quotes but for library()
it doesn’t have to. The code to copy and paste into your R console is below:
install.packages("here")
library(here)
8.0.5 Why to use here
Okay, so if we’re discussing packages later, why discuss this one package now? Well, here
is a package specifically designed to help you deal with file organization when you’re coding.
This package allows you to define in which folder all your relative paths should begin within a project.
8.0.5.1 Setting your project directory
After installing and loading the here
package, to set your project directory using here()
, you’ll simply type the command here()
. You’ll notice that the output of here()
and getwd()
in this case is the same; however, what they’re doing behind the scenes is different.
getwd()
- shows the directory you are in currentlyhere()
- sets the directory to use for all future relative paths
The here()
function is what you want to use to set your project directory so that you can use it for future relative paths in your code. While in this case it also happened to be in the same directory you were in, it doesn’t have to be this way. The here() function looks to see if you have a .Rproj file in your project. If then sets your base directory to whichever directory that file is located.
So, if we were to change our current directory and re-type here()
in the Console, you’ll note that the output from here() does not change because it’s still looking for the directory where .Rproj is.
Note: In cases where there is no .Rproj file, here()
will look for files other than a .Rproj file. You can read about those cases in the fine print here. But for most of your purposes, here()
will behave as we just discussed.
8.0.5.2 Get files paths using here()
After setting your project folder using here()
, R will then know what folder to use to define any and all other paths within this project.
For example, if you wanted to include a path to a file named “intro_code.R” in your raw_code
directory, you would simply specify that in your code like this:
here("code", "raw_code", "intro_code.R")
This code says start from where I’ve already defined the project starts (here()
), then look in the folders code
and then raw_code
, for the file “intro_code.R.”
The syntax is simplified when using here()
. Each subdirectory or file in the path is in quotes and simply separated by commas within the here()
function, and voila you have the relative path you’re looking for relative to here()
.
The output from this code includes the correct file path to this file, just as you wanted!
8.0.5.3 Where you should use this
You should use here()
to set the base project directory for each data science project you do. And, you should use relative paths using here()
throughout your code any time you want to refer to a different directory or sub-directory within your project using the syntax we just discussed.
8.0.6 Additional Resources
here, here, by Jenny Bryan here documentation, by Kirill Müller