Chapter 3 IT infrastructure
See the CCTB WueCampus course for documentation of our computing tools and tutorials for common software.
In general, there are two powerful computing resources that you can run your code on: the Julia HPC cluster and the gaias. The gaias are owned by the CCTB, while the Julia HPC belongs to the faculties of Biology, Physics and Chemistry and is maintained by the Rechenzentrum. See below on how to connect to the respective servers.
3.1 SSH
3.1.1 Connect with SSH
There are several ways to connect directly to the servers. macOS and Linux have built-in ssh clients in the terminal. From Windows 10 onward, PowerShell also comes with an ssh client, though be aware that it does have some hiccups. For simple tasks PuTTY is sufficient. If you’re trying something more complicated (such as an ssh tunnel), you could try Git Bash or install the Windows Subsystem for Linux.
3.1.2 Using SSH to authenticate your Github account TODO: move to Github section OR set links
SSH stands for secure shell protocol. It allows you to securely connect to a remote computer using a commandline interface. Please note that you must be connected to the university network (e.g. via VPN) to access our servers.
3.1.2.1 Step 1: Generate an SSH keypair on the server and copy public key to Github
TODO what github docs should be checked? For details, check Github Docs. The following instructions are adapted from here. Log in to the server (e.g. Curta). Check whether you already have ssh keys with
ls -al ~/.ssh
Generate a new ssh key:
ssh-keygen -t ed25519 -C "your_email@example.com"
Make sure that you use the same email address that you used for your Github account. Accept the default file name by pressing the Enter key and type a password when prompted. You can also leave the password blank if you don’t want to protect your SSH key with an additional password.
Start your ssh agent:
eval "$(ssh-agent -s)"
Add your new ssh key to the ssh agent. If your key has a different name, replace id_ed25519 with the respective name of your key:
ssh-add ~/.ssh/id_ed25519
Add your new ssh key to your Github account (the public key): First, copy the public ssh key with (or any other method of copying of your choice):
cat ~/.ssh/id_ed25519.pub
Now the key should be printed to the terminal. Copy the key (it starts with ssh-ed255…. and ends with your email address) to the clipboard. Go to your Github account, click the icon on the upper-right corner > Settings > SSH and GPG Keys > Click New SSH key. Give your new ssh key a name and paste the copied key in the key text area. Then click Add SSH Key and if prompted, type your account’s password.
3.1.2.2 Step 2: Test the connection
You can test the connection with
ssh -T git@github.com
The first time you connect, ssh may warn you that the authenticity of the host cannot be established. If you want to add Github to the known hosts, just type yes, and you won’t see this warning the next time. Now you can clone, push and pull from a repository without having to enter your username and password all the time. Note: This only works if you clone a repository via ssh and not via https!
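If a repository was already cloned via https, its remote URL can be switched to the ssh form instead of cloning it again. The following is a minimal sketch, assuming a Github URL of the form https://github.com/OWNER/REPO.git; the helper name https_to_ssh and the example repository are hypothetical and only serve as illustration.

```shell
# Convert an https Github URL into the equivalent ssh URL.
# https_to_ssh is a hypothetical helper name, not part of git.
https_to_ssh() {
    case "$1" in
        https://github.com/*)
            # Drop the https prefix and prepend the ssh user and host.
            printf 'git@github.com:%s\n' "${1#https://github.com/}"
            ;;
        *)
            # Anything else (e.g. an ssh URL) is returned unchanged.
            printf '%s\n' "$1"
            ;;
    esac
}

# Example with a placeholder repository:
https_to_ssh "https://github.com/example-user/example-repo.git"

# Inside a cloned repository you could then run (not executed here):
#   git remote -v    # show the current remote URL
#   git remote set-url origin "$(https_to_ssh "$(git remote get-url origin)")"
```

After the remote has been switched, push and pull use the ssh key instead of asking for your username and password.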
3.1.3 ssh alias
When using ssh, scp or rsync on Linux to connect to a server or move files,
always having to type out a user and IP address can get tiring. There are a
couple of ways around that, but the easiest is provided by the ssh tool
itself. It is possible to configure an alias that replaces the whole
user and IP address and can be completed with autocomplete.
To accomplish this:
1. Find the path where your ssh configuration is located. This should be a folder in your user folder called .ssh. (ls only shows you some of the folders; use ls -a to find folders with a leading dot.)
2. Move into the folder (cd .ssh) and create a file named config (e.g. touch config).
3. In the file, the configuration is formatted like the following:
Host hostname1
    SSH_OPTION value
    SSH_OPTION value
Host hostname2
    SSH_OPTION value
Host *
    SSH_OPTION value
More specific options go at the top, more general ones further down. Everything under Host * applies to all ssh sessions.
If you were to configure a shortcut for say gaia5 it would look something like this:
Host gaia5
User yourusername
HostName 132.187.198.19
4. Write into the file and save it. You should now be able to use your shortcut with ssh.
For further details you can check online. I found that this site
explains it nicely and in more detail:
https://linuxize.com/post/using-the-ssh-config-file/
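The steps above can be sketched in the shell as follows. This is only an illustration under stated assumptions: add_ssh_alias is a hypothetical helper, the entry is written to a scratch file rather than your real configuration, and yourusername, results.csv and project/ are placeholders.

```shell
# Append a Host entry to an ssh config file.
# add_ssh_alias is a hypothetical helper; arguments: config file, alias, user, address.
add_ssh_alias() {
    config_file="$1"
    mkdir -p "$(dirname "$config_file")"
    printf 'Host %s\n    User %s\n    HostName %s\n' "$2" "$3" "$4" >> "$config_file"
    # ssh rejects config files that are writable by other users.
    chmod 600 "$config_file"
}

# Demonstrated on a scratch file so that your real ~/.ssh/config stays untouched.
demo_config="$(mktemp)"
add_ssh_alias "$demo_config" gaia5 yourusername 132.187.198.19
cat "$demo_config"

# With the same entry in ~/.ssh/config, the alias works for all three tools
# (not executed here):
#   ssh gaia5
#   scp results.csv gaia5:~/
#   rsync -av project/ gaia5:~/project/
```

The alias replaces both the user name and the address, so the commands stay the same even if the address of the server changes later; only the config file needs updating.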
TODO: how to do a key pair
3.2 Using HPC Julia
Robin - check over it TODO
General information by the Julia support
The Julia HPC Cluster only coincidentally has the same name as the Julia Programming language. You can use it to run your models and also store your data on it if you want. Here is how to use it:
The first step to using the Julia HPC Cluster is to get an account. To do this, go here. Simply go to “Benutzer freischalten” at the very bottom of the page and choose “Ecosystem Modeling” from the drop down menu. You should soon get an email about your account being created. Now it’s time to get started!
Depending on your PC’s operating system, you will access the HPC differently.
on Windows: install PuTTY and WinSCP. WinSCP allows you to transfer data between the cluster and your own PC. With PuTTY, you can access the HPC and tell it what to do. Connect with both using your JMU account name, like so: name[at]julia.uni-wuerzburg.de, and your JMU password. In PuTTY, you will need to use Linux commands for the cluster. A Linux tutorial can be found here.
on linux: add how to here
You will probably need to install all R and Julia packages that the code you want to run uses. You can do this in R very easily. Type R and press Enter for R to open. Then, give the command install.packages("package") as you usually would. The packages should install without problems.
For Julia, you might need to install Julia itself first. Instructions can be found here. To install the packages, start Julia by typing julia and pressing Enter. Then, type using Pkg and install your packages as usual with Pkg.add("package").
You can use the same scripts that you use to run your code on your PC. Just double-check that there are no absolute path names. Remember to save results!
To execute your script, you need a bash file. A good way to create these files is using Notepad++. Open Notepad++ and create a new file (top left). Important: go to Edit, Format end of line and choose UNIX. Otherwise, the cluster will not be able to read your file. In your bash file, your code will look something like this for running an R script:
#!/bin/bash
#SBATCH -J NameOfJob #(the name of your job)
#SBATCH -c 2 #this is your number of cores to use
#SBATCH --mail-type=ALL #if you want to receive emails when your job starts/fails/finishes, very handy
#SBATCH --mail-user=name@uni-wuerzburg.de #(your email address)
#SBATCH -p test #(where do you want your job to run?)
#SBATCH -t 3-00:00:00 #(how much time is allocated at maximum?)
#SBATCH --mem=100G #(how much memory do you need?)
Rscript "Test.R"
Run your bash script using “sbatch name.sh”. You can use “squeue” to check if your code is pending (PD) or running (R). Depending on your settings in the bash script, you might get email notifications about your job. When it has finished, you can transfer the results from the cluster to your PC using WinSCP.
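If a bash file was saved with Windows line endings after all, it can also be repaired on the cluster itself. The following is a minimal sketch using the standard tr tool; to_unix_line_endings is a hypothetical helper name, and the demo works on a scratch file rather than on a real job script.

```shell
# Strip Windows carriage returns so the cluster can read the script.
# to_unix_line_endings is a hypothetical helper name.
to_unix_line_endings() {
    tmp_file="$(mktemp)"
    # Write the cleaned text back with cat so file permissions are kept.
    tr -d '\r' < "$1" > "$tmp_file" && cat "$tmp_file" > "$1"
    rm -f "$tmp_file"
}

# Demo on a scratch file with Windows (CRLF) line endings.
demo_script="$(mktemp)"
printf '#!/bin/bash\r\nRscript "Test.R"\r\n' > "$demo_script"
to_unix_line_endings "$demo_script"

# Count the remaining carriage returns; 0 means the file is clean.
tr -cd '\r' < "$demo_script" | wc -c
```

On a real file you would call to_unix_line_endings name.sh before submitting it with sbatch.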
Happy coding!
3.2.1 Calculate your CO2 emissions from HPC
Step 1: log into HPC and run the following command
Step 2: Download usage_2021.txt from HPC
Step 3: Open usage_2021.txt locally and remove the second row wi
Step 4: Use R to calculate CPU hours
usage <- readr::read_csv("usage_2021.txt")
dplyr::summarize(usage, sum_CPU_hours = sum(CPUTimeRAW)/60/60)
Step 5: Calculate climate impact
Go to https://green-algorithms.org/ and enter the following for HPC:
- Runtime: number calculated above
- Type of cores: CPU
- Number of cores: 1 (because the hours are CPU hours)
- Model: Any
- Memory available: average memory that you used (I put 10 GB)
- Platform: Local server
Select the correct location and answer no to all additional questions asked about the CPU.
3.3 Using slurm
Robin TODO
SLURM is the resource manager
running on the gaias and the Julia log-in node. Its job is to
distribute tasks among the planets at run time so that you and
others can all use them at the same time. Unlike on the Julia cluster,
jobs on the planets are queued directly, regardless of how much you
have already used them, so please be considerate.
Some of the most common commands are:
- srun [command] gives the following command directly to SLURM and has it output to your current shell. If you get disconnected in the meantime you won’t be able to reconnect, so be careful.
- sbatch [batchscript] passes a job script to SLURM and puts it in the queue. One of the planets will execute all the steps in the job script and put all the output in a .out file as specified in the options. You can also use an sbatch script to submit several jobs at the same time.
- scancel [jobid] use this to cancel a running job.
- squeue show currently running jobs.
- sinfo show current status of the whole cluster.
- sstat [jobid] give stats of a currently running or recently finished job.
Here are some of the most useful options for srun and sbatch (reminder: you can always check for more options yourself by putting in srun --help or search for man srun):
- --job-name gives your job a name for squeue and the like.
- --output where to put the output of your batch file. Default is slurm-%A_%a.out where %A is the job ID and %a is the array index.
- -n --ntasks how many tasks you want to have run simultaneously.
- -t --time maximum time your job will run. If it runs longer it will be killed.
- --mem request this much memory. Default is in MB, add G to request GB. If you don’t specify, the whole node will be requested.
- -w --nodelist request a specific node or list of nodes, they will all be used.
- -a --array requests a job array.
- --mail-type SLURM can send you mails to update you about the status of your jobs. Most common is ALL, check documentation for more.
- --mail-user set your email here.
Here's another summary by the people who make SLURM: https://slurm.schedmd.com/pdfs/summary.pdf
Helpful commands are:
- srun -w [node] --mem=[needed memory] --pty bash requests to open a bash run time on the specified node. You can either start R or Julia from here or check how much memory or CPU is used with htop.
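To show how several of these options fit together, here is a minimal sketch of a job array script. All values (job name, time, memory, array range) are example choices rather than recommendations for our servers, and run_task together with the input_N.csv naming is a hypothetical placeholder for your own work.

```shell
#!/bin/bash
#SBATCH --job-name=array_demo
#SBATCH --output=array_demo_%A_%a.out
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=2G
#SBATCH --array=1-3

# SLURM sets SLURM_ARRAY_TASK_ID for each array task; fall back to 1
# so the script can also be tried in a plain shell without SLURM.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# run_task and the input_N.csv naming are placeholders for your own work,
# e.g. a call to Rscript with the task number as an argument.
run_task() {
    echo "Task $1 would process input_$1.csv"
}

run_task "$task_id"
```

Saved as, for example, array_demo.sh, the script would be submitted with sbatch array_demo.sh. SLURM then starts one task per array index, each writing to its own .out file, and squeue shows them while they are queued or running.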
3.4 Using CCTB Server = gaias and planets
Here at the CCTB our own servers are the gaias and the planets. While the gaias are more powerful than your own PC, their main job is to handle smaller tasks and to serve as log-in nodes that give access to the more powerful planets. Running smaller bits of code on the gaias is no problem, but for anything larger consider running it on the planets.
3.4.1 Opening an Rstudio server on the planets
An Rstudio server works basically the same way as Rstudio on your local machine. The only difference is that your code runs somewhere else, so you are able to access the power of our computational infrastructure.
1. Find the script rstudio-server.sh in internal/scripts and copy it to a directory on one of the gaias you are using.
2. From your chosen directory start the SLURM job with the command
sbatch rstudio-server.sh
SLURM will tell you your job is running and give you your job ID.
3. Open the log file that was created with your job number (you can use something like cat or less):
cat rstudio-server.job.[jobID]
4. Follow the instructions in the job file. These should be to connect via an ssh tunnel (on Windows 10 and up you can use PowerShell; if that doesn’t work refer to the ssh section), open your favorite browser and point it to localhost:8787, where you’ll find a log-in screen for your Rstudio server. Log in with the given credentials.
5. When you are done, close the server by pressing the off button on the top right and cancel the SLURM job by entering
scancel -f [jobID]
Some notes:
- The way the base script works, after 8 hours SLURM will send a
termination signal to the R server, which will then save everything.
The next time you open a new session you should have your
environment back.
- If you need more memory or a longer run time you can change it in
the script header using typical SLURM syntax.
- See the section on Singularity and containers (in progress) if you want to change anything about the container. You will probably have to change quite a bit of the script to use a different one than the one that’s used.
- The way this job script is set up you can’t install new R packages.
This is so that it works out of the box with no conflicts. You can
enable access of the R session to other packages by commenting out
line 31
export R_LIBS_USER=/usr/local/lib/R/library,/usr/local/lib/R/site-library
at your own risk.
- The script is based on Rocker.