These workshops introduce methods, tools, and software for reproducibly managing, manipulating, analyzing, and visualizing large-scale biomedical data using the R statistical computing environment. Visit the registration page to register.

IMPORTANT: See the setup page for instructions on preparing your computer for any of these workshops. Each workshop involves a lot of hands-on coding practice, and you'll need to download and install some free software before our first class. Setup may take up to an hour. If you have any difficulty, please do not hesitate to email one of the instructors before the workshop.

THRIV Data Science

The THRIV Scholars Data Science Training Program is a seven-part series of courses on the essentials of biological data science, aimed at junior faculty pursuing careers in clinical and translational research. Please see the THRIV syllabus for more information.

BIMS 8382

BIMS 8382 is a graduate class that will run in Spring 2018. Please see the BIMS 8382 syllabus for more information.


What’s this series all about?

Both the workshop series and the BIMS 8382 course introduce methods, tools, and software for reproducibly managing, manipulating, analyzing, and visualizing large-scale biomedical data. Specifically, the course introduces the R statistical computing environment and packages for manipulating and visualizing high-dimensional data, covers strategies for reproducible research, and culminates in analyses of real experimental next-generation sequencing (NGS) data using R and Bioconductor packages.

This is not a "Tool X" or "Software Y" class. I want you to take away from this series the ability to use an extremely powerful scientific computing environment (R) for the tasks you'll face across study designs and disciplines: managing, manipulating, visualizing, and analyzing large, sometimes high-dimensional data. Whether that data is gene expression data from yeast, microbial genomics data from B. pertussis, public health data from Gapminder, RNA-seq data from humans, movie preference trends from Netflix, or truck routing data from FedEx, you'll need the same computational know-how and data literacy to do the same basic tasks in each. I might show you how to use specific tools here and there (DESeq2 for RNA-seq analysis, ggtree for drawing phylogenetic trees, etc.), but those specific tools are not the point. You probably won't be using the same software or methods 10 years from now, but you'll still rely on the same underlying data and computational foundation. That is the point of this series: to give you a solid foundation and, more importantly, the ability to figure out how to use whatever tool you need on your own, when you need it.

This is not a statistics class. There is a short lesson on essential statistics using R, but this 3-hour lesson offers neither a comprehensive background on the underlying theory nor in-depth coverage of implementation strategies in R. Some general knowledge of statistics and study design is helpful but isn't required for this course.

What are the prerequisites?

For the entire workshop series or the BIMS 8382 course, there are none!

But if you're taking these workshops piecemeal, it depends on the class. The introductory courses don't assume any knowledge of programming or of using a command-line interface, but if you have some prior experience with either, the content won't come as much of a shock. Either way, don't panic. Command-line interfaces and programming languages like R are incredibly powerful and can be utterly transformative for your research. There's a learning curve, and it's near-vertical in the beginning, but it's surmountable and the payoff is worth it!

The later classes in the R series (manipulation, tidy data, visualization, RNA-seq, phylogenetic trees, statistics, TCGA, Bioconductor, etc.) all require a working knowledge of R, dplyr, and ggplot2.

(Be sure to see the required software setup and recommended reading on the setup page.)

Do I need a laptop?

YES. You must have access to a computer on which you can install software. The class will include some lecture and discussion, but will consist primarily of live coding. You must bring your laptop to each session, along with your charging cable. Please follow the setup instructions before the workshop.

Where do I get additional help?

Glad you asked! See here.

Attribution: Course material is inspired by and/or modified in part from Jenny Bryan's Stat 545 course, Software Carpentry, Data Carpentry, David Robinson's blog, Marian Schmidt's MSU NGS Workshop, Vanderbilt Department of Biostatistics Datasets, the ggtree vignettes, Shirin Glander's blog, and likely many others.