Chapter 3 IT infrastructure

See the CCTB WueCampus course for documentation of our computing tools and tutorials for common software.

In general, there are two powerful computing resources that you can run your code on: the Julia HPC cluster and the gaias. The gaias are owned by the CCTB, while the Julia HPC belongs to the faculties of Biology, Physics and Chemistry and is maintained by the Rechenzentrum. See below for how to connect to the respective servers.

3.1 SSH

3.1.1 Connect with SSH

There are several ways to connect directly to the servers. macOS and Linux have built-in ssh clients in the terminal. From Windows 10 onward PowerShell also comes with an ssh client, though be aware that it has some hiccups. For simple tasks PuTTY is sufficient. If you’re trying something more complicated (e.g. an ssh tunnel) you could try Git Bash or install the Windows Subsystem for Linux.

3.1.3 ssh alias

When using ssh, scp or rsync on Linux to connect to servers or move files, always having to type out a user name and IP address can get tiring. There are a couple of ways around that, but the easiest is provided by the ssh tool itself: you can configure an alias that replaces the whole user and IP address and can be completed with autocomplete.

To accomplish this:

  1. Find the path where your ssh configuration is located. This should be a folder called .ssh in your user folder. (By default ls does not show folders whose names start with a dot; use ls -a to see them.)

  2. Move into the folder (cd .ssh) and create a file named config (e.g. touch config).

  3. In the file, the configuration is formatted as follows:

HOST hostname1
  SSH_OPTION value
  SSH_OPTION value
  
HOST hostname2
  SSH_OPTION value

HOST *
  SSH_OPTION value

More specific options go at the top, more general ones further down. Everything under Host * applies to all ssh sessions.

If you were to configure a shortcut for, say, gaia5, it would look something like this:

Host gaia5
  User yourusername
  HostName 132.187.198.19

  4. Write this into the file and save it. You should now be able to use your shortcut with ssh.

For further details you can check online. I found that this site explains it nicely and in more detail:
https://linuxize.com/post/using-the-ssh-config-file/
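
Steps 1 to 4 above can also be done in one go from the terminal. The following is only a sketch under a few assumptions: it uses the gaia5 example from above with the placeholder user name yourusername, and it appends the entry to the end of the file, so if your config already contains a Host * block you should move the new entry above it by hand. The file and folder names in the usage comments are hypothetical.

```shell
#!/bin/sh
# Sketch: add a "gaia5" alias to the per-user ssh config.
ssh_dir="$HOME/.ssh"
config="$ssh_dir/config"

# Create the folder and file if they are missing and keep them private.
mkdir -p "$ssh_dir"
chmod 700 "$ssh_dir"
touch "$config"
chmod 600 "$config"

# Append the entry only once, so re-running the script is harmless.
if ! grep -q '^Host gaia5$' "$config"; then
    cat >> "$config" <<'EOF'

Host gaia5
  User yourusername
  HostName 132.187.198.19
EOF
fi

# Afterwards the alias replaces user and address, for example:
#   ssh gaia5
#   scp results.csv gaia5:~/          (results.csv is a hypothetical file)
#   rsync -av mydata/ gaia5:mydata/   (mydata/ is a hypothetical folder)
```

If you want to check what ssh will actually use for the alias without connecting, recent OpenSSH versions can print the resolved settings with ssh -G gaia5.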

TODO: how to do a key pair

3.2 Using HPC Julia

Robin - check over it TODO

General information by the Julia support

The Julia HPC cluster only coincidentally has the same name as the Julia programming language. You can use it to run your models and also to store your data if you want. Here is how to use it:

  1. The first step to using the Julia HPC Cluster is to get an account. To do this, go here. Simply go to “Benutzer freischalten” at the very bottom of the page and choose “Ecosystem Modeling” from the drop down menu. You should soon get an email about your account being created. Now it’s time to get started!

  2. Depending on your PC’s operating system, you will access the HPC differently.

  • on Windows: install PuTTY and WinSCP. WinSCP allows you to transfer data between the cluster and your own PC. With PuTTY, you can access the HPC and tell it what to do. Connect with both using your JMU account name, like so: name[at]julia.uni-wuerzburg.de, and your JMU password. In PuTTY, you will need to use Linux commands for the cluster. A Linux tutorial can be found here.

  • on linux: add how to here

  3. You will probably need to install all R and Julia packages that the code you want to run uses. You can do this in R very easily. Type R and press enter to open R. Then give the command install.packages("package") as you usually would. The packages should install without problems. For Julia, you might need to install it first. Instructions can be found here. To install the packages, start Julia by typing julia and pressing enter. Then type using Pkg and install your packages as usual with Pkg.add("package").

  4. You can use the same scripts that you use to run your code on your PC. Just double check that there are no absolute path names. Remember to save results! To execute your script, you need a bash file. A good way to create these files on Windows is Notepad++. Open Notepad++ and create a new file (top left). Important: go to Edit, Format end of line and choose UNIX. Otherwise, the cluster will not be able to read your file. In your bash file, your code will look something like this for running an R script:

    #!/bin/bash
    # name of the job
    #SBATCH -J NameOfJob
    # number of cores to use
    #SBATCH -c 2
    # receive emails when your job starts/fails/finishes, very handy
    #SBATCH --mail-type=ALL
    # your email address
    #SBATCH --mail-user=name@uni-wuerzburg.de
    # where do you want your job to run?
    #SBATCH -p test
    # how much time is allocated at maximum? (days-hours:minutes:seconds)
    #SBATCH -t 3-00:00:00
    # how much memory do you need?
    #SBATCH --mem=100G
    Rscript "Test.R"

  5. Run your bash script using “sbatch name.sh”. You can use “squeue” to check if your code is pending (PD) or running (R). Depending on your settings in the bash script, you might get email notifications about your job. When it has finished, you can transfer the results from the cluster to your PC using WinSCP.
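
If you are not sure whether a job script was saved with Unix line endings, you can check and repair it on the command line. The following sketch is an assumption about how one might do that, not part of the official instructions: it creates a small demonstration file (demo_job.sh, a hypothetical name) with Windows line endings, detects the carriage returns and removes them. For a real script you would skip the demonstration line and set job to your own file.

```shell
#!/bin/sh
# Sketch: detect and remove Windows (CRLF) line endings in a job script.
job=demo_job.sh                  # hypothetical file name

# For demonstration only: create a small script with CRLF line endings.
printf '#!/bin/bash\r\n#SBATCH -c 2\r\nRscript "Test.R"\r\n' > "$job"

cr=$(printf '\r')                # a literal carriage return character
if grep -q "$cr" "$job"; then
    # tr deletes every carriage return; write to a temp file, then replace.
    tr -d '\r' < "$job" > "$job.tmp" && mv "$job.tmp" "$job"
    echo "converted $job to Unix line endings"
fi

# Then submit and monitor as described above (on the cluster):
#   sbatch "$job"
#   squeue -u "$USER"
```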

Happy coding!

3.2.1 Calculate your CO2 emissions from the HPC

Step 1: log into HPC and run the following command

sacct --user=<your-user> --starttime=2021-01-01 --format='CPUTimeRAW' > usage_2021.txt

Step 2: Download usage_2021.txt from HPC

Step 3: Open usage_2021.txt locally and remove the second row (the line of dashes that sacct prints under the column header)

Step 4: Use R to calculate CPU hours

usage <- readr::read_csv("usage_2021.txt")
dplyr::summarize(usage, sum_CPU_hours = sum(CPUTimeRAW)/60/60)

Step 5: Calculate climate impact

Go to https://green-algorithms.org/ and enter the following for HPC:

  • Runtime: number calculated above
  • Type of cores: CPU
  • Number of cores: 1 (because the hours are CPU hours)
  • Model: Any
  • Memory available: average memory that you used (I put 10 GB)
  • Platform: Local server

Select the correct location and answer no to all additional questions asked about the CPU.
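
As an alternative to Steps 1 to 4, the CPU hours could be summed directly on the cluster without downloading and editing a file. This is only a sketch: cpu_hours is a hypothetical helper name, and the sacct call in the comment assumes you want one row per job (--allocations), which differs from the command in Step 1, where job steps are listed as well. Drop that flag if you want to mirror Step 1 exactly. The --noheader flag suppresses the header and the row of dashes, so nothing needs to be removed by hand.

```shell
#!/bin/sh
# Sum a column of CPU seconds (one value per line) and print CPU hours.
cpu_hours() {
    awk '{ total += $1 } END { printf "%.2f\n", total / 3600 }'
}

# Offline demonstration with made-up values: 3600 s + 7200 s = 3 CPU hours.
printf '3600\n7200\n' | cpu_hours

# On the cluster (assumed usage, not run here):
#   sacct --user=<your-user> --starttime=2021-01-01 --allocations \
#         --noheader --format=CPUTimeRAW | cpu_hours
```

The resulting number is what you would enter as Runtime in Step 5.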

3.3 Using slurm

Robin TODO
SLURM is the resource manager running on the gaias and the Julia log-in node. Its job is to distribute tasks among the planets over time so that you and others can all use them at the same time. While the Julia cluster has its own resource management, the planets do not: they queue everything directly, regardless of how often you have used them, so please be considerate.
Some of the most common commands are:
- srun [command] gives the following command directly to SLURM and has it output to your current shell. If you get disconnected in the meantime you won’t be able to reconnect, so be careful.
- sbatch [batchscript] passes a job script to SLURM and puts it in the queue. One of the planets will execute all the steps in the job script and put all the output in a .out file as specified in the options. You can also use an sbatch script to submit several jobs at the same time.
- scancel [jobid] cancels a running job.
- squeue shows the jobs currently in the queue (running and pending).
- sinfo shows the current status of the whole cluster.
- sstat [jobid] gives stats of a currently running job (for finished jobs use sacct).
Here are some of the most useful options for srun and sbatch (reminder: you can always check for more options yourself by entering srun --help or man srun):
- --job-name gives your job a name for squeue and the like.
- --output where to put the output of your batch file. The default is slurm-%j.out, where %j is the job ID; for job arrays it is slurm-%A_%a.out, where %A is the job ID of the array and %a is the array index.
- -n --ntasks how many tasks you want to have run simultaneously.
- -t --time maximum time your job will run. If it runs longer it will be killed.
- --mem request this much memory. The default unit is MB, add G to request GB. If you don’t specify it, the whole node will be requested.
- -w --nodelist request a specific node or list of nodes; they will all be used.
- -a --array requests a job array.
- --mail-type SLURM can send you mails to update you about the status of your jobs. Most common is ALL, check the documentation for more.
- --mail-user set your email address here.
Here's another summary by the people who make SLURM: https://slurm.schedmd.com/pdfs/summary.pdf
Helpful commands are:
- srun -w [node] --mem=[needed memory] --pty bash requests to open a bash session on the specified node. You can either start R or Julia from here or check how much memory or CPU is used with htop.
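
To show how several of these options fit together, here is a sketch of a small job array script. All values (job name, time, memory, array range) and the input file naming are made up for illustration. SLURM sets the variable SLURM_ARRAY_TASK_ID for each element of the array; the script falls back to 1 so that it can also be tried outside of SLURM.

```shell
#!/bin/bash
#SBATCH --job-name=array-demo
#SBATCH --output=array-demo-%A_%a.out
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=1G
#SBATCH --array=1-3

# Each array element gets its own index; default to 1 when run without SLURM.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

# Hypothetical naming scheme: one input file per array element.
input="input_${task_id}.csv"

echo "task ${task_id} would process ${input}"
```

Saved as, for example, array_job.sh (a hypothetical name), it would be submitted with sbatch array_job.sh and would then show up in squeue as three separate array tasks.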

3.4 Using CCTB Server = gaias and planets

Here at the CCTB our own servers are the gaias and the planets. While the gaias are more powerful than your own PC, their main job is to be used for smaller tasks and to serve as log-in nodes that give access to the more powerful planets. Running smaller bits of code on the gaias is no problem, but if you have more to compute, consider running it on the planets.

3.4.1 Opening an Rstudio server on the planets

An RStudio server works basically the same way as RStudio on your local machine, except that your code runs somewhere else, so you can access the power of our computational infrastructure.

  1. Find the script rstudio-server.sh in internal/scripts and copy it to a directory on one of the gaias you are using.

  2. From your chosen directory start the SLURM job with the command sbatch rstudio-server.sh

  3. SLURM will tell you that your job has been submitted and give you your job ID. Open the log file that was created with your job number (you can use something like cat or less): cat rstudio-server.job.[jobID]

  4. Follow the instructions in the job file. These should be to connect via an ssh tunnel (on Windows 10 and up you can use PowerShell; if that doesn’t work, refer to the ssh section), open your favorite browser and point it to localhost:8787, where you’ll find a log-in screen for your RStudio server. Log in with the given credentials.

  5. When you are done, close the server by pressing the off button on the top right and cancel the SLURM job by entering scancel -f [jobID]

Some notes:

  • The way the base script works, SLURM will send a termination signal to the R server after 8 hours. The server will then save everything, and the next time you open a new session you should have your environment back.
  • If you need more memory or a longer run time, you can change it in the script header using the usual SLURM syntax.
  • See the section on Singularity and containers (in progress) if you want to change anything about the container. You will probably have to change quite a bit of the script to use a different container than the one that is used.
  • The way this job script is set up, you can’t install new R packages. This is so that it works out of the box with no conflicts. You can give the R session access to other packages by commenting out line 31 (export R_LIBS_USER=/usr/local/lib/R/library,/usr/local/lib/R/site-library) at your own risk.
  • The script is based on Rocker
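
For orientation, the ssh tunnel mentioned in step 4 usually has the following shape. This is only a sketch: the node name planet-name is hypothetical, gaia5 is the alias from the ssh section, and the remote port is assumed to be 8787 like the local one. The log file of your job contains the exact command for your session, and that command takes precedence over this example.

```shell
#!/bin/sh
# Sketch: build the ssh tunnel command for an RStudio server session.
node="planet-name"     # hypothetical: the planet your job runs on (see log file)
login="gaia5"          # log-in host, here the alias from the ssh section
local_port=8787        # port you open in your browser (localhost:8787)
remote_port=8787       # assumed: port the RStudio server listens on

# -N: do not run a remote command, -L: forward the local port to the node.
tunnel_cmd="ssh -N -L ${local_port}:${node}:${remote_port} ${login}"

# Print the command so it can be checked before running it by hand.
echo "$tunnel_cmd"
```

While the tunnel is open, the terminal stays busy; close it with Ctrl+C once you have shut down the server.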

3.5 X2GO

TODO