Chapter 2 Programming tools
2.1 Basic tools and languages
The most commonly used programming languages in our lab are R and Julia. They are both great languages for biology-related programming and statistical analysis.
If you already know how to program and are looking for a challenge, our lab members enjoy participating in Advent of Code. It forces you to think outside the box and will teach you some new tricks!
2.1.1 R
R is very well-established and probably has the most support from all the languages we use. Most of the time, you can copy-paste your error message from R into Google, and you will find several solutions for your problem. A very helpful website is Stack Overflow.
2.1.1.1 Interface
In our lab, most people use RStudio when they are working in R. RStudio allows you to very easily write and execute your scripts, create plots and keep an eye on your environment. It also includes a helpful debugging mode. Other, not as commonly used GUIs are RKWard, BlueSky and R Commander.
2.1.1.2 Resources
To get started, there are several books on how to use R.
R for Data Science: “You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it.”
Advanced R: “The book is designed primarily for R users who want to improve their programming skills and understanding of the language. It should also be useful for programmers coming to R from other languages, as help you to understand why R works the way it does.”
Further reading:
Coding Etiquette (in R) - slides from Anne prepared for BioInfo F2 2020 (ask Anne)
Troubleshooting in R - Slides from Anne prepared for BioInfo F2 2020 (ask Anne)
R for Dummies, Andrie de Vries & Joris Meys, 2015 (PDF available, ask Jana)
2.1.1.3 Packages
What are R packages?
If you have a problem that you need a certain function to solve, many times you do not need to write that function from scratch yourself. Usually, another (more skilled) person has already written some code that solves this problem. These “packages of code” can be uploaded to CRAN (the Comprehensive R Archive Network) and then downloaded to be used by other users.
An example: You have a nice plot and want to ensure it is easy to read and interpret for colorblind people. Now you could set all of your 200 colors by hand so that it is not the default red-green nightmare, but it is much easier to just download a package called “viridis” that contains multiple color palettes that are optimized for this very use. Then, all you need to do is tell R to use the viridis palette instead of setting all colors by hand, which would get very tedious. Installing and loading a package via CRAN:
install.packages("PackageName")
library(PackageName)
Note that packages need to be loaded into your environment using the
library() command in order for R to use them. You can also download
packages that are not yet on CRAN using
devtools.
Commonly used packages in our working group include (but are not limited to):
2.1.1.4 Rmd
R Markdown is a great way to create documents in HTML, Word or PDF format. Rmds can include chunks of code which create the tables and plots of the document. This way, you don’t have to update your document parallel to your code, as you keep both your code to create your plots and said plots in the file. This file was also created using Rmd! You can also use .Rmd files to create computational notebooks.
Best overall guide is linked.
A future alternative might be quarto.
2.1.2 Julia, the language
Julia is a relatively new programming language with a significantly higher performance than R. For complex models that take a lot of time to run in R, it might make sense to transfer them to Julia. While it has many advantages, it is not as well-known as R, so you may need to search for an explanation for your error message a bit longer. Julia has its own Discourse page with many helpful entries and tutorials as well.
The Julia Documentation is also VERY good.
To quickly look up Julia syntax, you can use this Cheat Sheet.
2.1.2.1 Best tutorials
To get started, here is a very nice book for beginners by Ben Lauwens and Allen Downey that explains not only the Julia language, but also introduces the concept of programming in general.
JuliaAcademy offers an introductory course that is tried and liked by Robin.
If you are not that into reading, the official Julia Programming Language Youtube channel has some very nice tutorials structured by playlists. One especially helpful playlist is Julia Programming for Nervous Beginners. They also hold the annual JuliaCon online and upload the speaker’s talks to Youtube if you cannot be there live.
2.1.2.2 Plotting in Julia
Plots in Julia can be a bit more tricky than in R (but that might be because we are used to plotting in R - don’t let this discourage you from plotting your heart out in Julia if that is what you want to do!). Here are some packages that may help facilitate your plotting adventures:
2.1.2.3 Interface
As for R, you can choose from a variety of interfaces for Julia. In our working group, these are the most common ones:
Atom and Juno for julia. Here is a nice tutorial on how to set it up. However, even though popular in our lab, Juno will receive no more feature updates as of March 2022, so if you are new to Julia, it might be best to choose a different IDE.
VSCode is the successor of Juno. It works in a similar way and is also used by some members of our lab and is at this point the recommended option to use.
if you are interested in using a Computational Notebook (see section below), Jupyter might be a good fit for you. Learn how to use Jupyter and Julia together here. This is still a bit buggy, though.
2.1.3 Version control
Git is a great software that allows you to keep track of the changes you make in your code (version control). This way, you will not need to keep several files named “Code_02_revised” and “Code_02_revised_changed” etc, which can quickly get confusing. There are different servers that you can store your repositories on, the most common ones being GitHub and GitLab. You can install git on your PC and use it via the commandline, or you can use it with the help of a UI, like GitHub Desktop. This is a very helpful Training and Tutorial that explains what git is and how to use it very nicely.
2.1.4 Bash commands
When we let our models run, they tend to need a lot of computational power. Running experiments on your laptop can take days, weeks or even months. Therefore, we can send our scripts to more powerful machines from the University infrastructure. To tell the computer “execute script XXX” you need to write a short shell using bash.
A nice tool to use for bash scripts is shellcheck - Robin
For an example bash script see section IT infrastructure - Using HPC Julia
2.2 Advanced tools
2.2.1 Singularity
Singularity is a way to use containers, small virtual machines that use the resources of the computer they run on but have their own environment. This will only be a short explanation on how to use them otherwise refer to the official documentation.
Before a container can be used it first has to be built. This can be
done by either using a .def file (Only if you have sudo rights on the
local machine and under Linux so use Linux or WSL) or you can get it on
the base of an existing container in the Singularity library or from
docker. Prebuilt containers on the gaias are stored in
/storage/singularity_images. ecomodsdm.sif is a container based on
docker://rocker/geospatial
The main ways to use Singularity containers are with the commands:
- - singularity shell [container.sif] opens a shell in the provided
container - singularity exec [container.sif] This executes a
predefined command as defined in the definition file -
singularity exec [container.sif] [command] This only executes the
requested command e.g. opens R or executes a program
By default /home/$USER is bound into the container at execution. If
more folders need to be bound into it do this with the
--bind [local]:[container] option
Here is a users guide.
2.2.3 Regex
Robin
A handy website that goes through the basics of Regular Expression grammar.
WHAT IS THAT? https://regex101.com/
2.3 Best practices
2.3.1 Best Modeling and Programming practices
There are some practices in modelling and programming that make the job a lot easier. Our former colleague, Daniel, has put together some thoughts on what to keep in mind:
Use relative paths
Document your code and how to use it Using Github/Gitlab. Gitlab is self-hosted (Biology). Zenodo is citable, trusted third-party.
User Manual
DeveloperManual
Script (automatize) experimental steps (i.e. pipeline)
- Analyses in a single R script
Use version control
Use commit identifier to refer to a particular code version for a experiment
Use seeds for replicate (must be robust for changes in OS)
Map and config file as output (for seed for example) Data longevity & accessibility: Use storage folder in gaias. Physical backup, repository of very large data. Container for codes that require dependencies and particular versions.
2.3.2 Blog post series on software development
Daniel also has a nice blog post series if you want to read more.
See also the paper from Daniel (Vedder, Ankenbrand, and Sarmento Cabral 2021).
References
Vedder, Daniel, Markus Ankenbrand, and Juliano Sarmento Cabral. 2021. “Dealing with Software Complexity in Individual-Based Models.” Methods in Ecology and Evolution 12 (12): 2324–33. https://doi.org/10.1111/2041-210x.13716.