Semester: Fall 2025
Time: Self-Paced
Location: Online
Instructor: Rebecca Barter (rebecca.barter@hsc.utah.edu)
Faculty Coordinator: Jeff Phillips (jeff.m.phillips@utah.edu)
Short courses
This is a composite course consisting of the four self-paced online short courses, listed below.
You should complete all four of these courses sequentially, starting with Introduction to R for Data Analysis, and ending with Machine Learning with R.
If you finish a course early, you are welcome to start the next course (you do not need to wait until after the previous course’s due date).
Introduction to R for Data Analysis
Length: 5.5 hours of pre-recorded lectures, 3 projects
Due date: September 18
Description: This course introduces the R programming language and is designed for beginners who are new to R and coding. Specific topics covered include using RStudio and writing documents with Quarto; basic coding principles, such as defining variables, vectors, and data frames; pipes; data manipulation with dplyr; and data visualization with ggplot2.
Course Learning Objectives:
- Perform operations with character, numeric, and logical (Boolean) type objects in R
- Use the dplyr functions select, mutate, filter, summarize, and group_by to manipulate and summarize tabular data
- Use the ggplot2 library to create customizable data visualizations, including histograms, bar charts, and scatterplots
Advanced R for Data Analysis
Length: 4 hours of pre-recorded lectures, 4 projects
Due date: October 9
Description: You’ve learned the basics of R and the tidyverse, and now you’re ready to conduct more sophisticated data manipulation and analysis. This course is designed to elevate your R programming expertise, building on the foundations laid in the Introduction to R for Data Analysis course. You’ll learn advanced techniques and powerful tools that will transform your data workflows and analytical capabilities, such as creating custom R functions to automate your data science pipeline, reducing code redundancy with map functions, refactoring column variables, joining multiple data frames, and reshaping data. This course uses the R programming language, and it is assumed that learners have taken our “Introduction to R for Data Analysis” course or have equivalent experience.
Course Learning Objectives:
- Write your own custom R functions
- Reduce redundancy in your code using iteration techniques with the purrr package
- Refactor column variables
- Reshape your data frames
- Join multiple data frames together
Data Cleaning in R
Length: 6 hours of pre-recorded lectures, 6 projects
Due date: November 7
Description: Real-world data can never perfectly capture the real world, and it rarely arrives in a ready-for-analysis format. Before you analyze your data, it is critical to format it appropriately and to ensure that it represents, as faithfully as possible, the reality it was originally designed to capture. The process of molding a dataset into a format that satisfies these criteria is known as “data cleaning”. While it may seem like a boring process, data cleaning is arguably the most important stage of the entire data science life cycle, since it ensures that you clearly understand both how real-world information is represented in your data and what its limitations are. Since every dataset is “messy” in its own way, the data cleaning process will be unique to each dataset. This course introduces a series of steps that can help you understand your data and create a custom data cleaning procedure for any dataset, with a focus on biomedical data such as electronic health records and health survey data. Specific topics covered include identifying and addressing missing values; handling data quality issues, such as invalid and inconsistent values; reshaping data into a “tidy” format; and creating a custom R function that provides a reproducible and modifiable data cleaning pipeline for your project. This course uses the R programming language, and it is assumed that learners have taken our “Introduction to R for Data Analysis” course or have equivalent experience. It is recommended, though not required, that learners have also taken our “Advanced R for Data Analysis” course.
Course Learning Objectives:
- Understand data collection procedures and data dictionaries
- Load and pre-format data in R
- Identify and handle missing values
- Identify and handle invalid and inconsistent values
- Convert data to a tidy data format
- Create and implement a reproducible data cleaning pipeline
Machine Learning with R
Length: 6 hours of pre-recorded lectures, 5 projects
Due date: December 6
Description: Machine learning offers powerful tools for making predictions from data by using data to learn patterns that generalize to new situations. This course introduces the process of building, evaluating, and interpreting machine learning models using the tidymodels framework in R. Learners will be guided through each stage of the machine learning workflow, from framing prediction problems and selecting appropriate performance metrics to training models, tuning hyperparameters, and understanding model predictions. This course uses the R programming language and builds on the “Introduction to R for Data Analysis” course. It is recommended, though not required, that learners have also completed the “Advanced R for Data Analysis” and “Data Cleaning in R” courses.
Course Learning Objectives:
- Frame real-world research questions as machine learning prediction problems
- Use the tidymodels framework in R to implement and evaluate models including linear regression, logistic regression, decision trees, random forests, XGBoost, and basic neural networks
- Interpret and visualize machine learning results in R
- Integrate machine learning techniques into your data analysis workflow for predictive insights
Grading
Grade: Each short course contains a number of projects, which appear at the end of each “module”.
Your grade for each short course will be based on the average of your individual project scores.
Your grade for the entire composite course will correspond to the average score you receive across the four short courses.
Late policy: If you do not complete a short course by its due date (see above), you will lose 2% of your grade for that short course per day, unless instructor permission is granted.
Final exam: There is no final exam for this course.