Evaluation

All the work you will complete for evaluation and credit in the course is described below.

Assignments

Assignments are opportunities to apply and combine the skills from several lessons. They are both structured, in that you are asked to use specific skills to accomplish a task, and creative, in that you have some flexibility in the product you produce. You will be assessed on your use of technical skills and your judgement in making well-designed and effective visualizations, following the principles explored in the course.

Assignment 0

Find two data visualizations, not from the course notes or textbooks, that you find informative, compelling, or in need of improvement.

Create a document that shows each visualization (the figure, or a snapshot of a dynamic visualization) and provides its source (e.g., URL and publication details, if applicable). For each visualization, describe in a short paragraph

  1. the data behind the visualization,
  2. a question that can be answered with the visualization,
  3. the main message conveyed by the visualization,
  4. one or two features of the visualization that make it effective, or suggestions for improving it. Make specific references to the guidelines in Healy Chapter 1 for this part of the assignment.

The goal of this task isn’t to be right or wrong in your assessment, but rather to develop a practice of looking at data visualizations from the perspectives of creator, designer, and critic. You should also be able to explain or justify your assessment of a visualization with evidence or reasoning. It’s okay if you find visualizations in secondary sources rather than from the creator or original publisher. Include a reference to the source where you found each visualization.

Submit a PDF of your work on Brightspace.

Assignment 1

Your assignment is to get your Assignments repository onto your computer, make a few edits to a file, and push your changes back to GitHub. I will let the class know when the repositories have been created and you can begin work.

Before you begin this assignment, be sure you have completed the process described in Lesson 6 for getting Git working on your computer.

  • Go to your GitHub account and look for a repository called data-viz-assignments- followed by your GitHub user name. Get the link by clicking the green button labeled “Code”. It should look like this: https://github.com/Dalhousie-AndrewIrwin-Teaching/data-viz-assignments-<your github name>.git
  • Using RStudio on your computer, select the menu File > New Project > From repository… > Git. Insert the link to the repository and tell RStudio where to put the repository on your computer.
  • If you use posit.cloud, start at the projects page and either choose New Project from Git Repository (pop-up menu) or open your existing project for our course. To clone the GitHub repository into your project, open the Terminal window and type the command git clone <link from github>. Finally, open the new folder in the Files pane and click on the “Assignments.Rproj” link to switch to the project. posit.cloud may ask you to link your GitHub account so it can access private repositories.
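The git clone step can be tried out in a throwaway folder before you touch your real project. The script below is a minimal sketch under stated assumptions: a local "bare" repository (given the hypothetical name data-viz-assignments-example.git) stands in for your repository on GitHub so the example runs offline, and seed is a throwaway folder used only to give that stand-in some content. With your real repository, you would replace $REMOTE with the link copied from the green “Code” button.

```shell
#!/usr/bin/env bash
# Sketch: clone an assignment repository from the terminal.
# ASSUMPTION: a local bare repository stands in for GitHub so the
# example runs offline; the repository name below is hypothetical.
set -euo pipefail

WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# --- Stand-in for the repository the instructor creates on GitHub ---
git init -q seed
printf 'Instructions go here\n' > seed/assignment-1.Rmd
git -C seed add assignment-1.Rmd
git -C seed -c user.name="Instructor" -c user.email="instructor@example.com" \
  commit -q -m "Create assignment repository"
git clone -q --bare seed data-viz-assignments-example.git
REMOTE="$WORKDIR/data-viz-assignments-example.git"

# --- What you type in the Terminal: git clone <link from github> ---
git clone -q "$REMOTE"

# The clone is a new folder named after the repository (".git" dropped)
ls data-viz-assignments-example
```

The new folder is what then appears in the Files pane. Cloning a real private repository will also ask for your GitHub credentials, a step this offline sketch skips.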

Open the Assignments project and edit assignment-1.Rmd. Follow the instructions in that file. If you have trouble with this assignment, ask for help. Getting this working is very important.

You will use this assignment repository for Assignments 1-5 in the course.

If you get the error “Author identity unknown” when you try to commit or push your work to GitHub, type the following two commands into the Terminal window in RStudio:

git config --global user.email "you@example.com"
git config --global user.name "Your Name"

Replace you@example.com and Your Name with your email address and your name. Keep the quotation marks.
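The commit-and-push cycle that follows an edit can also be typed in the Terminal. This is a minimal sketch, assuming a local bare repository (remote.git) stands in for GitHub so it runs offline; seed and my-assignments are hypothetical folder names. It sets your identity with git config but without --global, so the setting applies only to this throwaway repository and leaves your computer's configuration untouched; for the course, use the --global commands above.

```shell
#!/usr/bin/env bash
# Sketch: set an identity, commit an edit, and push it back.
# ASSUMPTION: a local bare repository stands in for GitHub; folder
# names are hypothetical.
set -euo pipefail

WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Stand-in for your assignment repository on GitHub
git init -q seed
printf 'Instructions go here\n' > seed/assignment-1.Rmd
git -C seed add assignment-1.Rmd
git -C seed -c user.name="Instructor" -c user.email="instructor@example.com" \
  commit -q -m "Create assignment repository"
git clone -q --bare seed remote.git

# Your copy of the repository
git clone -q remote.git my-assignments
cd my-assignments

# Identity used for commits; without --global it applies to this
# repository only (the course instructions use --global instead)
git config user.email "you@example.com"
git config user.name "Your Name"

# Edit a file, stage it, commit, and push back to the remote
printf 'My answer\n' >> assignment-1.Rmd
git add assignment-1.Rmd
git commit -q -m "Complete assignment 1"
git push -q origin HEAD

# Show who made the newest commit and its message
git log -1 --format='%an <%ae>: %s'
# prints: Your Name <you@example.com>: Complete assignment 1
```

Printing the newest commit is a quick way to confirm your name and email were picked up. Running git config --global user.name with no value prints the name currently stored in your global settings.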

Assignment 2

There are so many packages and functions for making visualizations that it’s really important to be able to read documentation and learn new functions. Fortunately, because of the way ggplot is designed, many functions work very similarly, so once you have learned the basics, it’s quite easy to learn more on your own.

The purpose of this assignment is for you to practice this sort of learning. I’ve picked out a few functions that work much like the examples we’ve looked at already. Your assignment is to pick two of these and make an R Markdown document describing how they work. This is the sort of practice I do all the time when I learn a new R skill.

Look in the file assignment-2.Rmd for a template for this assignment.

Assignment 3

Practice using methods developed in the course so far (summarizing, ggplot visualizations, linear regression, smooths) to explore a data set and answer questions about the data.

Assignment 4

Tidy Tuesday is a weekly activity to support people learning to use R for data analysis and visualization. Each week a new dataset is posted and interested participants post their visualizations. Some are complex pieces of work by people with lots of experience, but many are the work of beginners just learning to make good visualizations. I encourage you to explore the datasets and example visualizations others have made as a source of ideas and inspiration.

For the next two assignments, I’ll select a few datasets and ask you to work with one of them.

For this assignment you will make scatter plots with smooths (linear, loess, or gam) and dimensionality reduction (PCA or MDS). The goal is to gain some insight into the data and present some aspect of the data in a visually appealing way. You may be able to use the data as it is presented, or you may need to transform it in some way first (for example, using the dplyr tools). Feel free to show a subset of the data if you think that makes a better visualization to highlight a particular feature of the data.

Present your work as a short Quarto report. You should describe the dataset, explain any analysis or transformations you did, present at least 2 visualizations, and describe the main messages conveyed by your visualizations. Full instructions are in the repository.

Assignment 5

This is the second Tidy Tuesday assignment. Create maps as described in the repository.

Organize your work as a slide presentation. A template with instructions is in the repository.

Mid-term test

The mid-term test will be written on paper during class time on February 26. It will cover material from lessons 1-15. You will not have access to your notes, a computer, or the internet for the test.

Questions on the test will be similar to the exercises for the lessons completed in the first half of the course. More details will be provided closer to the test date.

Term project

Your term project is an analysis of a dataset of your own choosing. You can choose the data based on your interests, work in other courses, or independent research projects. You should demonstrate many of the techniques from the course, applying them as appropriate to develop and communicate insight into the data.

You should create compelling visualizations of your data. Pay attention to your presentation: neatness, coherency, and clarity will count. All analyses must be done in RStudio using R.

Deliverables

Work in groups of 2 or 3. Most teams will produce one of each component (proposal and presentation). If you prefer, each person on the team can make their own individual documents (proposal and presentation).

Each team member has two roles in the project. First, you will contribute your original creative work to the project. Second, you will act as a collaborator, providing your teammates with feedback, suggestions, debugging help, proofreading, and other assistance as requested. Part of your responsibility for the project is to work with your team to divide up the tasks, take responsibility for leading your parts, and provide feedback to your team on the parts they lead.

Use a single GitHub repository for the proposal, presentation, and final report. Use the repository I created for your team. See the notes on collaboration with GitHub for guidance. Contact me if you have trouble.
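One common way to share a single repository without overwriting each other's work is to pull before you start editing and push soon after you commit. The sketch below illustrates that cycle under loud assumptions: a local bare repository (team.git) stands in for your team's repository on GitHub, and alex and sam are hypothetical teammates working in separate clones. They edit the proposal one after the other, so every pull is a simple fast-forward.

```shell
#!/usr/bin/env bash
# Sketch: two teammates sharing one repository.
# ASSUMPTION: a local bare repository stands in for the team's GitHub
# repository; "alex" and "sam" are hypothetical teammates.
set -euo pipefail

WORKDIR=$(mktemp -d)
cd "$WORKDIR"

# Stand-in for the team repository created by the instructor
git init -q seed
printf 'Title: Proposal\n' > seed/proposal.qmd
git -C seed add proposal.qmd
git -C seed -c user.name="Instructor" -c user.email="instructor@example.com" \
  commit -q -m "Create project repository"
git clone -q --bare seed team.git

# Each teammate clones the same repository and sets an identity
for person in alex sam; do
  git clone -q team.git "$person"
  git -C "$person" config user.name "$person"
  git -C "$person" config user.email "$person@example.com"
done

# Alex: pull first, add the data description, commit, push
cd alex
git pull -q
printf 'Data: where it came from\n' >> proposal.qmd
git commit -q -am "Describe the data"
git push -q origin HEAD
cd ..

# Sam: pull to get Alex's work before editing, then commit and push
cd sam
git pull -q
printf 'Plan: five visualizations\n' >> proposal.qmd
git commit -q -am "Add visualization plan"
git push -q origin HEAD
cd ..

# Alex pulls again and now has both contributions
git -C alex pull -q
cat alex/proposal.qmd
```

The Pull, Commit, and Push buttons in the RStudio Git pane carry out the same steps. If two people edit the same lines at the same time, a pull can instead report a merge conflict that must be resolved by hand, which this sketch deliberately avoids.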

Team creation

Teams will be created just before the mid-term break. If you would like to form a team with a specific person in the class, answer the Brightspace quiz in the “important surveys” section. If you would like to be assigned a randomly selected teammate, you don’t need to complete the quiz. I will finalize the team assignments just before the break week.

If you withdraw from the course once you have been assigned to a team for the term project, please let your teammates and me know as soon as possible.

Proposal

Your main task for the proposal is to find a data set to analyze for your project and describe at least one question you can address with data visualizations.

It is important that you choose a readily accessible data set that is large enough that multiple relationships can be explored, but not so complex that you get lost. I suggest your data set should have a minimum of 100 observations and about 10 variables. If you find a very large data set (thousands of observations or dozens of variables), you can make a subset to work with for your project. The data set must include categorical and quantitative variables. If you plan to use a data set that comes in a format that we haven’t encountered in class, make sure that you are able to load it into R, as this can be a challenging task. If you are having trouble, ask for help before it is too late.

Do not reuse data sets from any part of the course.

Here is a list of data repositories containing many interesting data sets. Feel free to use data from other sources if you prefer.

  • TidyTuesday
  • OpenIntro
  • Awesome public datasets
  • Bikeshare data portal
  • Harvard Dataverse
  • Statistics Canada
  • Open government data: Canada, NS, and many other sources
  • Other sources listed in the Data sources section of the course notes
  • Data you find on your own may be suitable too.
  • You can also use LLMs to help you find datasets on topics of interest. Make sure you can describe the original source of the data and that the data are real observations and not simulated.

Many data sets on Kaggle are simulated data – generated by a computer with randomization and not the result of a real-life data-gathering process. You can recognize these data because there is no clear explanation of how the data were collected. Do not use simulated data. Do not use data if you can’t explain how it was collected.

For your proposal, describe a data set and a question you can address with it. Outline a plan to use about five visualizations (e.g., data overview plot, dplyr/table summary, a faceted plot, smoothing/regression, k-means/PCA, map).

The repository contains a template for your proposal called proposal.qmd. Write your proposal by revising this file.

  • Questions: The introduction should present your research questions.
  • Data: Describe the data (where it came from, how it was collected, what are the cases, what are the variables, etc.). Place your data in the /data folder. Show that you can read the data and include the output of dplyr::glimpse() or skimr::skim() on your data in the proposal. Do not show the entire dataset.
  • Visualization plan:
    • The outcome (response, Y) and predictor (explanatory, X) variables you will use to answer your question.
    • Ideas for at least two possible visualizations for exploratory data analysis, including some summary statistics and visualizations, along with some explanation on how they help you learn more about your data.
    • An idea of how at least one statistical method described in the course (smoothing, PCA, k-means) could be useful in analyzing your data
  • Team planning: briefly describe how members of your team will divide the tasks to be performed.

Assessment. Commit your project proposal to your repository and push it to GitHub. You and your team will describe your project in a 5 minute in-class conversation with me on March 3 or 5. A schedule will be provided on Brightspace. The project template has an evaluation rubric.

Oral presentation

The oral presentation should be 5-10 minutes long (at least 5, no more than 10). Your goal is to present the main findings of your project.

You should have a small number of slides to accompany your presentation. I have provided a template for you to use as presentation.qmd. I suggest a format such as the following:

  • A title for your project and your team members’ names
  • A description of the data you are analyzing
  • At least one question you can investigate with your data visualization
  • At least two data visualizations that help answer this question
  • A conclusion

Feel free to add or delete slides as needed from the template. Be sure to update the titles of slides to be informative.

For suggestions on making slide presentations see the lesson on slides.

Don’t show your R code; the focus should be on your results and visualizations, not your computing. Set echo = FALSE to hide R code (this is already done in the template).

Your presentation should convey what you learned from your analysis and the choices you made when constructing your visualizations. Be careful not to simply describe a list of visualizations and analyses (“then we did this, then we did this, etc.”). You must tie the visualizations you show to the questions you are investigating.

Presentation schedule: You should record your presentation in the last week of term and upload your recorded presentation to Brightspace before the last day of classes. Every member of the team should have a visible, speaking role in the presentation. Only one member should upload the recording to Brightspace.

Assessment. See the file grade-presentation.Rmd for the assessment guidelines.

Final exam

The final exam will be written on paper during the exam period at a time selected by the registrar.

Questions on the exam will be similar to the exercises, assignments, and mid-term test.