Evaluation
All the work you will complete for evaluation and credit in the course is described below.
Tasks
I have created a tasks to give you practice with the main skills and ideas from each lesson.
After the first week, most tasks have work for you to do in R. After you have done the task, go to Brightspace to answer questions associated with each task. Only these questions will be graded. Save your completed work in R as it will be a reference for you later in the course when you want to review skills you learned earlier in the course.
Week 1
There are three tasks this week. See Brightspace to submit your work for these tasks.
Task: Invitation (Lesson 1)
Find two data visualizations that you find informative, compelling, or in need of improvement.
Create a document that shows each visualization (the figure, or a snapshot of a dynamic visualization), provides the source (e.g., url and publication details if applicable). In a short paragraph describe
- the data behind the visualization,
- main message conveyed by the visualization,
- one or two features of the visualization that make it effective or provide suggestions for improvement.
The goal of this task isn’t to be right or wrong in your assessment, but rather to develop a practice of looking at data visualizations through the perspective of creator, designer, and critic. It’s okay if you find visualizations from secondary sources and not the creator or original publisher. Include the reference to the source you used to find the visualizations.
Submit your work by answering questions about one of these visualizations in the quiz on Brightspace.
Task (Lesson 2)
Read Healy, Sections 2.1-2.4, which contains excellent advice on the reasons for using R markdown, R/Rstudio, projects, the basics of R, and being patient as you learn computing tasks.
Make a list of about 10 key takeaway messages from the reading. For example, I would start with
- Do all my learning and work in R in R markdown documents created with Rstudio. This is a new skill to learn, but will be worthwhile. Don’t use a word processor. Don’t cut and paste code and results from the R console into a document. It’s too easy to make mistakes this way and will take longer.
- Take notes about what I learn (for ggplot and other skills). I’ll write these in a private Rmd document called data-viz-notes.Rmd (or something like that)
Write about two takeaway messages in the Brightspace quiz.
Task: Setting up (Lesson 3)
See the instructions in the syllabus or in Lesson 3 notes.
- Install R, Rstudio, and the packages listed in Lesson 3.
- Install git.
- Create a github account.
- Login to rstudio.cloud. This is a complete R, Rstudio, and git setup “in the cloud” that can be used if you have trouble using R on your own computer. I suggest you login using your git account to link Rstudio to git.
- Ask for help with any of these tasks if you need it.
The Brightspace quiz asks if you were successful with each task or if you need help. Provide your GitHub ID in one of the quiz questions.
Week 2
Task: Look at data (Lesson 4)
Your task for this lesson is to answer two specific questions about choices made when visualizing data, drawing on your own experience and the ideas presented below. You should also become familiar with Healy, Chapter 1 which contains much more on these topics.
Some people feel very strongly about the placement of 0 on the vertical scale of plots. Look again at the carbon dioxide plots in Lesson 1. The vertical scale does not start at 0. Use the ideas in Healy, Chapter 1.6 to describe how you would interpret vertical position on the carbon dioxide plots and how you could interpret this position if 0 was included on the vertical scale.
Hans Rosling’s visualizations (as shown in Lesson 1) use many channels for conveying data: x and y position, color, size, an annotation for year in the plot background. The interactive versions use an animation for change over time, and mouse-over pop-ups to identify the country for each dot. These are very complex visualizations!
- For Rosling’s plot show in in Lesson 1, what variables are shown for each of x and y position, color, and symbol size?
- According to Healy, Chapter 1 which of these 4 features is most difficult to make quantitative comparisons with? Why? Do you agree?
- In your judgment, is this visualization effective or too complex? Watch the TED talk or experiment with the interactive version before answering the question. Does the live explanation provided in Rosling’s oral presentation help you interpret the plot?
Write answers for these questions in a word processor. Answer the questions in the Task Quiz for week 2 on Brightspace.
Task (Lesson 5)
Repeat the examples from Lesson 5 until you are comfortable with the basics of making a plot. We will learn much more about plotting starting the lesson after next. Refer to Healy, Section 2.6 for a different basic plot. The Brightspace quiz asks you a couple of questions about making plots.
Task (Lesson 6)
Get the software git
working on your computer by following the steps in Lesson 6.
Week 3
Task (Lesson 7)
Your task for this lesson is to review some material on ggplot to connect the ideas from to the concept of the grammar of graphics to prepare you for starting to learn the details in the next lesson. There are a few multiple choice questions to answer in this week’s task quiz
In addition, I suggest the following reading with nothing to submit:
- Browse the R graph gallery to explore the huge variety of visualiaations you can make with R. Practice thinking about how the language of aesthetic mappings, geometries, and scales can be used to describe these visualizations.
- Read through the ggplot cheatsheet to see how the concepts of the grammar of graphics will be connected to computer code. Don’t worry about the details – you will practice making visualizations using these tools over many future lessons.
Task (Lesson 8)
Practice using ggplot to make graphs by reproducing examples from notes and modifying them by changing variables used in aesthetic mappings. If you would like, use other data sources described in “Data Sources” chapter in course notes.
Week 4
Task: Summarizing data (Lesson 9)
Work through the exercises to practice filter, group_by, summarize, and mutate provided in the file task-L09-summarize. Right click the link and select “save as” to get the file on your computer. Then open it with Rstudio.
Follow the instructions in the file. When you are done, knit the file to be sure there are no errors. Save your answers for future reference. Answer questions in the Brightspace Task quiz.
Week 5
Task: Facetted plots (Lesson 10)
Exercises to practice making faceted plots. File task-L10-facets.
Task (Lesson 11)
I will get you to practice reading files later on in the course. For now,
- pay attention to the code I use in future lessons for reading files, and
- download the Excel file from the lesson, practice reading it into R, editing it in Excel (or another spreadsheet), and confirming that you can read the changes with R.
You can get the file “test-data.csv” from this link.
Task: Reshaping data (Lesson 12)
Reshaping data practice. File: task-L12-reshape.
You can get the file “question-2.csv” from this link.
Task: Formatting tables (Lesson 13)
Displaying tables practice. File: task-L13-tables
Task (Lesson 14)
The material in this lesson should be helpful if you run into challenges while working on Assignment 2, which asks you to develop new skills with unfamiliar functions.
Week 6
Task (Lesson 15)
Practice adding linear models and smooths to visualizations. Reproduce some of the examples from the course notes, mini-lecture, or course textbook. Create new visualizations of your own design by changing the model, data, or underlying visualization. Experiment with colors and facets.
Task: Linear models (Lesson 16)
Exercises on linear models. File: task-L16-linear-models.
Task: LOESS and GAM (Lesson 17)
Exercises on LOESS and GAM smooths. File: task-L17-LOESS-GAM.
Team Planning
If you would like to work in a team with a specific classmate, fill in the form on Brightspace. You’ll need the name, email, and GitHub ID of the student you would like to work with. If you want to be assigned a random teammate, you don’t need to do anything. Teams of two or three students are both okay.
Week 7
Reading week. No classes or course work due.
Week 8
Task (Lesson 18)
Practice skills associated with collaborating on GitHub using the project repository for your team.
Task (Lesson 19)
Look for some data on the internet. Download the data to your computer. Read the data into R. Make a summary table describing some part of the data. Make a visualization using some of the data. Can you find any formatting errors in the data? Did you have any trouble reading the data into R?
You can use any data you like for this task. If you want a specific suggestion, get some data from gapminder.org or another source in the lesson.
Task: Reports (Lesson 20)
We’ve been using R markdown to make reproducible reports throughout the course. In this task, practice using here
and chunk options. File task-L20-collaboration.
Week 9
PCA (Lesson 21)
The task for this lesson is grouped with the MDS task.
Task: MDS (Lesson 22)
Create a PCA and MDS plot as described in the file task-L22-mds.
Task: K-means (Lesson 23)
Create a K-means analysis and accompanying visualization as described in the file task-L23-k-means.
Week 10
Task: Presentations (Lesson 24)
Create and modify a Quarto presentation as described in the lesson. Ensure you know how to
- show graphics output, but not the code that generated it,
- make the graphic the right size to fit on the slide,
- open the HTML version in your web browser.
File: task-L24-presentation code and the formatted presentation.
Week 11
Task (Lesson 27)
Practice making a map from the lesson
Task: Maps (Lesson 28)
Make maps described in the file task-L28-maps.
Task: Map alternatives (Lesson 29)
Show geographic data without maps as described in task-L29-map-alternatives.
Week 12
Task: Factors and Dates (Lesson 30)
Exercises with strings, factors, dates, and times. See task-L30-factors-dates.
Task (Lesson 31)
Make a custom color scale using a web interactive tool and then use those colours on a plot. Select a palette from a list given in the lesson and use it in a visualization.
Task: Themes (Lesson 32)
See task-themes.
That is the last task for the course!
Assignments
Assignments are opportunities to apply and combine the skills from several lessons. They are both structured, in that you are asked to use specific skills to accomplish a task, and creative in that you have some flexibility in the product you produce. You will be assessed on your use of technical skills and your judgement in making well-designed and effective visualizations, following the principles explored in the course.
Assignment 1
Your assignment is to get the repositories for Assignments on to your computer, make a few edits to the file, and push your changes back to GitHub. I will let the class know when the repositories have been created.
Before you begin this assignment, be sure you have completed the process of getting git working on your computer described in Lesson 6.
- Go to your github account and look for a repository called
data-viz-assignments-
followed by your github user name. Get the link by clicking the green button labeled “Code”. It should look like this:https://github.com/Dalhousie-AndrewIrwin-Teaching/data-viz-assignments-<your github name>.git
- Using Rstudio on your computer, select menu File > New Project > From repository… > Git. Insert the link to the repository and tell R where to put the repository on your computer.
- If you use rstudio.cloud, start at the projects page and choose New Project from git repository (pop-up menu) or open your existing project for our course. Clone the github repository in your project using the terminal window and type the command
git clone <link from github>
. Finally, open the new folder in the Files pane and click on the “assignments.Rproj” link to switch to the project. You may be asked by rstudio.cloud to link to your github account to access private repositories.
Open the Assignments project and edit assignment-1.Rmd
. Follow the instructions in that file. If you have trouble with this assignment, ask for help. It’s very important.
Use the assignment repository for this work and all assignments in the course.
If, when you try to push your work back to github, you get an error “Author identity unknown”, you need to type the following two commands into the R Terminal window:
git config --global user.email "you@example.com"
git config --global user.name "Your Name"
Replace you@example.com
and Your Name
with your email address and your name. Keep the quotation marks.
Assignment 2
There are so many packages and functions to make visualizations, that its really important to be able to read documentation and learn new functions. Fortunately, the design of ggplot means that many functions work very similarly and so once you have learned the basics, it’s quite easy to learn more on your own.
The purpose of this assignment is for you to practice this sort of learning. I’ve picked out a few functions that work much like the examples we’ve looked at already. Your assignment is to pick out two of these and make a R markdown document describing how they work. This is the sort of practice I do all the time when I learn a new R skill.
Look in the file assignment-2.Rmd
for a template for this assignment.
Assignment 3
Practice using methods developed in the course so far (summarizing, ggplot visualizations, linear regression, smooths) to explore a data set and answer questions about the data.
Assignment 4
Tidy Tuesday is a weekly activity to support people learning to use R for data analysis and visualization. Each week a new dataset is posted and interested participants post their visualizations. Some are complex pieces of work by people with lots of experience, but many are the work of beginners just learning to make good visualizations. I encourage you to explore the datasets and example visualizations others have made as a source of ideas and inspiration.
For the next two assignments, I’ll select a few datasets and ask you to work with one of those for your assignment.
For this assignment you will make scatter plots with smooths (linear, loess, or gam) and dimensionality reduction (PCA or MDS). The goal is to gain some insight into the data and present some aspect of the data in a visually appealing way. You may be able to use the data as its presented, or you may need to transform it in some way first (for example using the dplyr tools). You should feel free to show a subset of the data if you think that makes a better visualization to highlight a particular feature of the data.
Present your work as a short R markdown report. You should describe the dataset, explain any analysis or transformations you did, present at least 2 visualizations, and describe the main messages conveyed by your visualization. Full instructions are in the repostory.
Assignment 5
This is the second Tidy Tuesday assignment. Create maps as described in the repository.
Organize your work as a slide presentation. A template with instructions is in the repository.
Term project
Your term project is an analysis on a dataset of your own choosing. You can choose the data based on your interests, based on work in other courses, or independent research projects. You should demonstrate many of the techniques from the course, applying them as appropriate to develop and communicate insight into the data.
You should create compelling visualizations of your data. Pay attention to your presentation: neatness, coherency, and clarity will count. All analyses must be done in RStudio using R.
Deliverables
Work in groups of 2 or 3. Most teams will produce one of each component (proposal, report, presentation) per team member. If you prefer, each person on the team can make their own documents (proposal, report, presentation). Each team member has two roles in the project. First you will contribute your original creative work for the project. Second, you will act as a collaborator, providing your teammate with feedback, suggestions, debugging help, proofreading and other assistance as requested.
Use a single GitHub repository for the proposal, presentation, and final report. Use the repository I created for your team. See the notes on collaboration with GitHub for guidance. Contact me if you have trouble.
Team creation
Teams will be created shortly after the mid-term break. If you would like to form a team with a specific person in the class, answer the Brightspace quiz in the “project” section. If you would like to be assigned a randomly selected team mate, you don’t need to complete the form. I will finalize the team assignments during the break week.
If you withdraw from the course once you have been assigned to a team for the term project, please let your teammate and me know as soon as possible.
Proposal
Your main task for the proposal is to find a data set to analyze for your project and describe at least one question you can address with data visualizations.
It is important that you choose a readily accessible data set that is large enough so multiple relationships can be explored, but no so complex that you get lost. I suggest your data set should have at least 50 observations and about 10 variables. If you find a very large data set, you can make a subset to work with for your project. The data set must include categorical and quantitative variables. If you plan to use a data set that comes in a format that we haven’t encountered in class, make sure that you are able to load it into R as this can be a challenging task. If you are having trouble ask for help before it is too late.
Do not reuse data sets from any part of the course.
Here is list of data repositories containing many interesting data sets. Feel free to use data from other sources if you prefer.
- TidyTuesday
- OpenIntro
- Awesome public datasets
- Bikeshare data portal
- Harvard Dataverse
- Statistics Canada
- Open government data: Canada, NS, and many other sources
- Other sources listed in the Data sources section of the course notes
- Data you find on your own may be suitable too.
Describe a data set and question you can address with the data for your proposal. Outline a plan to use about five visualizations (e.g., data overview plot, dplyr/table summary, small multiples, smoothing/regression, k-means/PCA, map).
Many data sets on Kaggle are simulated data – generated by a computer with randomization and not the result of a real-life data-gathering process. You can recognize these data because there is no clear explanation of how the data were collected. Do not use simulated data. Do not use data if you can’t explain how it was collected.
The repository contains a template for your proposal called proposal.Rmd
. Write your proposal by revising this file and using this template.
- Questions: The introduction should introduce your research questions
- Data: Describe the data (where it came from, how it was collected, what are the cases, what are the variables, etc.). Place your data in the /data folder. Show that you can read the data and include the output of
dplyr::glimpse()
orskimr::skim()
on your data in the proposal. Do not show the entire dataset. - Visualization plan:
- The outcome (response, Y) and predictor (explanatory, X) variables you will use to answer your question.
- Ideas for at least two possible visualizations for exploratory data analysis, including some summary statistics and visualizations, along with some explanation on how they help you learn more about your data.
- An idea of how at least one statistical method described in the course (smoothing, PCA, k-means) could be useful in analyzing your data
- Team planning: briefly describe how members of your team will divide the tasks to be performed.
Assessment. See the file grade-proposal.Rmd
for the assessment guidelines and rubric.
Written report
Follow the template provided for your written report (report.Rmd
) to present your visualizations and insights about your data. Review the marking guidelines in grade-report.Rmd
and ask questions if any of the expectations are unclear.
Style and format does count for this assignment, so please take the time to make sure everything looks good and your tables and visualizations are properly formatted.
You and your teammate will be using the same repository, so merge conflicts will happen, issues will arise, and that’s fine! Pull work from github before you start, commit your changes, and push often. Ask questions when stuck. Look at the lesson on collaboration for help.
In your knitted report your R code should be hidden (echo = FALSE
) so that your document is neat and easy to read. However your document should include all your code such that if I re-knit your R Markdown file I will obtain the results you presented. If you want to highlight something specific about a piece of code, you’re welcome to show that portion.
General criteria for evaluation:
- Content - What is the quality of research question and relevancy of data to those questions?
- Correctness - Are visualization procedures carried out and explained correctly?
- Writing and Presentation - What is the quality of the visualizations, writing, and explanations?
- Creativity and Critical Thought - Is the project carefully thought out? Does it appear that time and effort went into the planning and implementation of the project?
Final exam
The final exam has three components:
- The oral presentation, which must be completed before the time of the exam.
- You will be assigned 5 presentations to watch as part of the final exam. You should plan to watch these during the assigned time for the final exam.
- You will complete a peer evaluation of the presentations you watch. This will be an assignment on Brightspace to be done during the final exam. You will also be asked to provide feedback on your partner’s contribution to the project. You should find a quiet place to write the exam. You do not need to come to the exam room.
Oral presentation
The oral presentation should be 5-10 minutes long (at least 5, no more than 10). Your goal is to present the highlights of your project.
You should have a small number of slides to accompany your presentation. I have provided a template for you to use as presentation.qmd
. I suggest a format such as the following:
- A title for your project and your team members’ names
- A description of the data you are analyzing
- At least one question you can investigate with your data visualization
- At least two data visualizations that help answer this question
- A conclusion
For suggestions on making slide presentations see the lesson on slides.
Don’t show your R code; the focus should be on your results and visualizations not your computing. Set echo = FALSE
to hide R code (this is already done in the template).
Your presentation should convey what you learned from your analysis and the choices you made when constructing your visualizations. Be careful not to simply describe a list of visualizations and analyses (“then we did this, then we did this, etc.”). You must tie the visualizations you show to the questions you are investigating.
Presentation schedule: You should record your presentation as part of the exam preparation period. You must upload your recorded presentation at least 1 day before the scheduled final exam. During the exam period you will watch presentations from other teams I assign to you and provide feedback in the form of a peer evaluation.
Assessment. See the file grade-presentation.Rmd
for the assessment guidelines.