We generally follow, and encourage others to follow, Ten Simple Rules for a Computational Biologist’s Laboratory Notebook.
Lab notebooks provide a complete record of procedures, reagents, data, and thoughts, as well as an explanation of why and how you are doing an experiment, along with its results.
We recommend taking note of the following in your daily and/or dry lab experiment notebook:
R Projects can be linked to version control systems such as Git (covered later in this guide) to keep analyses organized and reproducible.
See this R Master Guide for a full introduction to R and RStudio.
R Markdown, a tool in RStudio, creates reproducible documents that can combine narrative text and code from languages such as R, Python, and SQL. See below for some helpful resources:
It can be helpful to call sessionInfo() at the bottom of R Markdown files so that readers know which version of R and which package versions you used. You can also use the “packrat” package for R package version control, which lets you share the exact package versions with others.
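Since R Markdown documents can also contain Python chunks, a rough Python analogue of sessionInfo() might look like the sketch below. It uses only the standard library. The function name session_info and the package list passed to it are hypothetical, chosen for illustration; this is not part of R, RStudio, or packrat.

```python
import platform
import sys
from importlib import metadata


def session_info(packages):
    """Return the Python version, platform, and versions of the named packages."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            info["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly rather than failing.
            info["packages"][name] = "not installed"
    return info


if __name__ == "__main__":
    # "pip" is only a placeholder; list the packages your analysis imports.
    for key, value in session_info(["pip"]).items():
        print(f"{key}: {value}")
```

Placing a call like this in the last chunk of a document records the environment alongside the results, in the same spirit as sessionInfo() for R.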
These lessons, titled “R for Reproducible Scientific Analysis,” were covered in the aforementioned R Master Guide but are particularly helpful for learning how to code for data analysis and for setting up a good workflow. Take a look at the lesson plan to see which lessons fit your needs.
Git is a free, open-source, distributed version control system. It tracks changes, maintains history, and lets you revert changes. Because it is distributed, every clone mirrors the full history of the original repository. For more information on this, as well as an overview of different version control systems, check out this chapter of the Git documentation book.
GitHub is a hosting service built on Git that manages changes to scripts through a push/pull workflow. It is a very useful tool for both individual and team version control, with settings that let others view your code for collaboration and/or reproducibility. Below are some GitHub tutorials you can follow:
GitLab is another Git-based service, and it offers access to private pipelines for free. It supports migration to and from GitHub and has an RStudio integration. Below are some resources explaining GitLab, as well as some clarifying the differences between it and GitHub:
This section provides steps you can follow for general data management and storage.
RStudio projects, scripts, and some small files should be stored in a GitHub project/repository as well as a backup location.
Cheaha is UAB’s cluster computing environment. You can learn more about it in this YouTube video.
Use the following tools to manage software versions and workflows on Cheaha; they help ensure reproducibility and scalability.
Anaconda: an integrated tool used to build the environment the pipeline runs in. It installs programs into the working directory and lets you view and change which program versions are in use.
Snakemake: a workflow manager modeled on the make build system. It checks that each step produces its expected output files. To rerun a step, delete that step's output, and the pipeline will rerun from that point.
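The rerun behavior described above can be illustrated with a small Python sketch. This is a simplified model of a linear pipeline, not Snakemake's API: the function name steps_to_run and the step and file names are hypothetical. Snakemake itself resolves a full dependency graph and also considers other triggers, such as file modification times.

```python
import tempfile
from pathlib import Path


def steps_to_run(steps):
    """Return the names of the steps that must run, in order.

    `steps` is an ordered list of (name, output_path) pairs. Once one
    step's output is missing, that step and every later step are rerun.
    """
    pending = []
    rerun = False
    for name, output in steps:
        if rerun or not Path(output).exists():
            rerun = True
            pending.append(name)
    return pending


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical three-step pipeline with one output file per step.
        steps = [
            ("trim", Path(tmp) / "trimmed.txt"),
            ("align", Path(tmp) / "aligned.txt"),
            ("count", Path(tmp) / "counts.txt"),
        ]
        for _, output in steps:
            output.write_text("done")
        # Deleting the output of "align" schedules "align" and "count".
        steps[1][1].unlink()
        print(steps_to_run(steps))
```

In this model, deleting an intermediate output is the signal to redo that step and everything downstream, while steps whose outputs already exist upstream are skipped.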
Our lab uses a Google Drive folder, where data is organized into subfolders by project.
Box is a cloud storage service that is free to UAB staff. Although UAB previously announced that it would discontinue its use of Box, negotiations have since allowed UAB to continue using it as before.
Our lab uses G-Drives, but you need to be aware of how a given drive is formatted, as this affects how well it works with different computing platforms. We label these drives with our initials, any additional labels as needed, and a description of what is on the drive. Other options include Seagate and other reputable brands with extensive reviews. Because external drives are physical storage devices, it is important to choose a trustworthy brand and model that is compatible with your needs.