Staying organized
What is a file? What is a folder?
When you use an application on a desktop computer you are usually invited to give your work a name and save it to a file in a particular place (a folder) on your computer storage so that it can be accessed later. A big exception occurs when you use a web browser; in this case you rarely permanently store your work on your computer, usually your work is stored “in the cloud”. By contrast, on a phone or tablet data you create is usually associated with the app you are using and there is no way to work with the data except through the app.
What do all these words (save, file, folder ) mean? I need to explain some of these ideas because we will be managing a lot of files — for both our writing and finding the data we want to use.
To save data means that you want to keep it safe against loss. Work done in many applications is temporary – it will be lost if you quit the application – unless you save the data. A file is a place on the computer where you store your data. Usually files are identified by their file name. Since a computer has a vast number of files, its conveient to put them in a folder or directory. Again there are many folders, so folders are placed inside other folders. Files and folders are references to old physical storage technology for paper. When you save data to a file in a folder on your computer, it’s important to remember where you put it so that you can find it again! This all sounds mundane, but picking good names for files and folders to store them in is surprisingly tricky1. It’s well worth a bit of time to develop some easy to use habits.
1 Most guides you find for organizing files emphasize the use of version numbers in file names. There is a better way – see the lesson on version control!
Here are some suggestions to help you get and stay organized.
Put everything in a folder
When you first start learning a computing tool like R, most people just put their files in “Downloads” or “Documents” or even on the “Desktop” with no particular thought to names or organization. This is fine when you are first experimenting, but it quickly gets out of control, especially when you want to keep one project separate from another, add data files, or want to share files with a collaborator.
Whenever you start a new project, such as a course, create a folder with an easy to understand name like “STAT2430” or “Data-Visualization”. Spaces in file names in general are fine, but if you plan to share files with others its usually best to use a dash (-
) or underscore (_
) instead of spaces, since spaces in file names cause problems with some computing tools.
As soon as you start work on a new project, you should give some thought to backups too. I put project folders in ‘Dropbox’ or ‘One Drive’ to reduce the chance of losing my work.
Sub-projects
For each component of a project, create a new folder (directory) called something meaningful like “Assignments”. When I open R, instead of creating a new R markdown file right away, I first start a new project (File > New Project…). This new project can be in this new folder or you can create a new folder.
You can choose to start using version control with the project. I suggest you do! Version control on your local machine takes very little extra effort and can be really handy if (when!) you delete something you didn’t mean to, or decide to undo a whole bunch of changes that didn’t work out the way you hoped. This works on your own computer – no need to use github.
A project encourages you keep all your files together in the same place. Rstudio will remember which files were open when you quit and re-open them for you. It makes the process of switching from one project to another much easier. You can have two different Rstudio sessions open in different windows at the same time. Each R session has its own window and workspace and operates completely independently of any other R session you have open. I suggest you don’t mix files from multiple projects in an R session; that works against your effort to stay organized.
Files and Folders
You can put all your files in this project in this one folder, but if you have different kinds of files – R markdown documents, data files, figures, and presentations, it can help you stay organized to create a sub-folder for each type of file. I find it particularly useful to do this for two kinds of files. Any file I get from somewhere else, typically a data file from a collaborator or the internet, goes in its own folder (“sources”) and I never modify these files. That way I can clean up copies of a file and always know what the original file looked like. This is not really necessary if you use version control, but the original data file is the most likely file I’ll want to use again and I think the duplication is worth the effort. Second, any files that are created by my R code go in their own folder (“output”): figures, data tables, or the results of time-consuming calculations, for example, because I know they can always be deleted and recreated later on. Putting them in their own folder reminds me I don’t need to use version control on them; my code created them and there is no need to keep the results.
Naming conventions
It’s helpful to create informative names for files. If two files are related, use similar names with a different suffix (world-map.R and world-map.png, for example). If some steps need to be performed in a particular order, a simple way is to use a number (01, 02, etc) as a prefix. I used this convention when writing these notes to help me keep the sections in order. For advanced projects, there are specialty tools for performing calculations in a particular order, but a simple numbering scheme works for many projects that outgrow simple sequencing of calculations in a single file.
Resources
- Healy. Appendix. Managing projects and files
- Avoid using directory paths that won’t exist on someone else’s computer when you share files by using the Here pacakge (github).
- R for Data Science notes on scripts and projects
- Drake package and manual
- If you are worried about versions of packages changing, you can keep track of the packages you used and their versions using checkpoint or packrat