Semester: Fall 2024
Time: Self-Paced
Location: Online
Instructor: Rebecca Barter (rebecca.barter@hsc.utah.edu)
Faculty Coordinator: Jeff Phillips (jeff.m.phillips@utah.edu)
Short courses
This is a composite course consisting of four self-paced online short courses, listed below.
You should complete all four of these courses sequentially, starting with Introduction to R for Data Analysis, and ending with Statistics with R.
If you finish a course early, you are welcome to start the next course (you do not need to wait until after the previous course’s due date).
Registration: Register for this course on Headlamp.
Length: 5.5 hours of pre-recorded lectures, 3 projects
Due date: September 18
Description: This course introduces the R programming language and is designed for beginners who are new to R and coding. Specific topics covered include using RStudio; writing documents with Quarto; basic coding principles; defining variables, vectors, and data frames; pipes; data manipulation with dplyr; and data visualization with ggplot2.
Course Learning Objectives:
- Perform operations with character, numeric, and logical (Boolean) type objects in R
- Use the dplyr library functions select, mutate, filter, summarize, and group_by to manipulate and summarize tabular data
- Use the ggplot2 library to create customizable data visualizations, including histograms, bar charts, and scatterplots
Length: 4 hours of pre-recorded lectures, 4 projects
Due date: October 9
Description: You’ve learned the basics of R and the tidyverse, and now you’re ready to conduct more sophisticated data manipulation and analysis. This course is designed to elevate your R programming expertise, building on the foundations laid in the Introduction to R for Data Analysis course. You’ll learn advanced techniques and powerful tools that will transform your data workflows and analytical capabilities, such as creating custom R functions to automate your data science pipeline, reducing code redundancy with map functions, refactoring column variables, joining multiple data frames, and reshaping data. This course utilizes the R programming language, and it is assumed that learners have taken our “Introduction to R for Data Analysis” course or have equivalent experience.
Course Learning Objectives:
- Write your own custom R functions
- Reduce redundancy in your code using iteration techniques with the purrr package
- Refactor column variables
- Reshape your data frames
- Join multiple data frames together
Length: 6 hours of pre-recorded lectures, 6 projects
Due date: November 7
Description: Real-world data can never perfectly capture the real world, and it rarely arrives in a ready-for-analysis format. Before your data is ready for analysis, it is critical that it is formatted appropriately and that you have ensured it represents, to the greatest extent possible, the reality it was originally designed to capture. The process of molding a dataset into a format that satisfies these criteria is known as “data cleaning”. While it may seem like a boring process, data cleaning is arguably the most important stage of the entire data science life cycle, since it ensures that you have a clear understanding of how real-world information is represented in your data, as well as of the data’s limitations. Since every dataset is “messy” in its own way, the process of data cleaning will be unique to every dataset. This course introduces a series of steps that can help you understand your data and create a custom data cleaning procedure for any dataset, with a focus on biomedical data such as electronic health records and health survey data. Specific topics covered include identifying and addressing missing values; handling data quality issues such as invalid and inconsistent values; reshaping data into a “tidy” format; and creating a custom function in R that provides a reproducible and modifiable data cleaning pipeline for your project. This course utilizes the R programming language, and it is assumed that learners have taken our “Introduction to R for Data Analysis” course or have equivalent experience. It is recommended that students have also taken our “Advanced R for Data Analysis” course, but this is not a requirement.
Course Learning Objectives:
- Understanding data collection procedures and data dictionaries
- Loading and pre-formatting data in R
- Identifying and handling missing values
- Identifying and handling invalid and inconsistent values
- Converting data to a tidy data format
- Creating and implementing a reproducible data cleaning pipeline
Length: TBD
Due date: December 6
Description: TBD
Course Learning Objectives: TBD
Grading
Grade: Each short course consists of a number of projects that are each graded out of 4 points. For any given project, scores are assigned as follows:
- 1 point: you did not correctly complete the majority of the required tasks.
- 2 points: you correctly completed the majority, but not all, of the required tasks.
- 3 points: you correctly completed all of the required tasks.
- 4 points: you correctly completed all of the required tasks AND your code and figures were tidy and polished, where relevant.
Your grade for each short course will be based on the average of your individual project scores.
Your grade for the entire composite course will correspond to the average score you receive across the four short courses.
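As a rough illustration of the averaging described above, the following Python sketch computes a score for each short course and then the composite score. The project scores are made up, the function names are hypothetical, and the number of projects in the fourth course is an assumption (the syllabus lists it as TBD). How the resulting 4-point average maps to a final letter grade is not stated here, so the sketch stops at the average.

```python
def short_course_score(project_scores):
    """Average of the project scores for one short course (each out of 4)."""
    if not project_scores:
        raise ValueError("a short course needs at least one project score")
    for score in project_scores:
        # Assumption: scores fall between 0 and 4 inclusive.
        if not 0 <= score <= 4:
            raise ValueError(f"score out of range: {score}")
    return sum(project_scores) / len(project_scores)


def composite_score(course_scores):
    """Unweighted average across the short courses."""
    return sum(course_scores) / len(course_scores)


# Hypothetical project scores for one student.
intro = short_course_score([4, 3, 4])              # 3 projects
advanced = short_course_score([3, 3, 4, 4])        # 4 projects
cleaning = short_course_score([4, 4, 3, 3, 4, 2])  # 6 projects
stats_with_r = short_course_score([4, 4])          # project count assumed

composite = composite_score([intro, advanced, cleaning, stats_with_r])
print(round(composite, 3))
```

Because the composite is an average of the four course averages rather than of all projects pooled together, a single project carries more weight in a short course that has fewer projects.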
Late policy: If you do not complete a short course by its due date (see above), you will lose 2% of your grade for that short course per day late, unless instructor permission is granted.
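The late policy can be sketched in Python under one possible reading: each day late removes 2% of the earned short-course score, the deduction does not compound, and the score cannot drop below zero. These details are assumptions rather than stated policy, and the function name is hypothetical; check with the instructor for the exact calculation.

```python
def apply_late_penalty(score, days_late, rate_per_day=0.02):
    """Reduce a short-course score by a fixed fraction per day late.

    Assumption: the penalty is a simple (non-compounding) fraction of the
    earned score, capped so that the result never goes below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    penalty_fraction = min(1.0, rate_per_day * days_late)
    return score * (1.0 - penalty_fraction)


# Hypothetical example: a 3.5 average submitted 3 days late loses 6%.
late_score = apply_late_penalty(3.5, 3)
print(round(late_score, 2))
```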
Final exam: There is no final exam for this course.