Complete Cluster Course

12/11/2025 - 17/11/2025

CRG Training Centre (Bioinformatics room)

This course introduces and consolidates the material presented at all levels of the cluster course, and expands on the concepts you need to be aware of when optimising your use of the cluster.
If you have never used the CRG cluster, this course is mandatory to obtain the user license. If you already use the CRG cluster and need an update, you will also need to take this course.

The main message of the course is to embrace the parallelism available within the cluster: pipelines should be built from many small, independent pieces spread across the cluster, rather than large, monolithic, long-running jobs that run on a single node. The course will show why this should be done and how to achieve it.
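
As a rough illustration of the "many small independent pieces" idea, the Python sketch below splits one large workload into slices that could each run as a separate job. It is only a sketch under stated assumptions: the function names (`chunk_for_task`, `process_chunk`), the toy sum-of-squares workload and the task numbering are hypothetical and not taken from the course material, and no scheduler is involved. The loop over task IDs simply stands in for independent jobs running on different nodes.

```python
def chunk_for_task(items, task_id, n_tasks):
    """Return the slice of `items` that task number `task_id` should process.

    Tasks are numbered 0 .. n_tasks - 1 (a hypothetical convention). The
    slices are contiguous, non-overlapping and differ in size by at most one.
    """
    if not 0 <= task_id < n_tasks:
        raise ValueError("task_id must be in the range 0 .. n_tasks - 1")
    size, extra = divmod(len(items), n_tasks)
    # The first `extra` tasks each take one additional item.
    start = task_id * size + min(task_id, extra)
    end = start + size + (1 if task_id < extra else 0)
    return items[start:end]


def process_chunk(chunk):
    """Hypothetical per-piece work: here, just a sum of squares."""
    return sum(x * x for x in chunk)


if __name__ == "__main__":
    data = list(range(1000))
    n_tasks = 8

    # Monolithic version: one long job working through everything.
    monolithic = process_chunk(data)

    # Split version: each iteration is independent of the others, so on a
    # cluster each one could be submitted as its own small job.
    partials = [process_chunk(chunk_for_task(data, t, n_tasks))
                for t in range(n_tasks)]

    # A final, cheap step combines the partial results.
    assert sum(partials) == monolithic
    print("combined result matches the monolithic run")
```

Because no piece depends on another, the pieces can run in any order and on any node; only the final combining step needs all of them to have finished.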

Topics that are going to be addressed:

Video tour of the data centre
What is a cluster
Logging in
Queuing / the scheduler
What resources are available at the CRG cluster
Simple batch scripts - directives
Troubleshooting - what happened to my jobs?
Interactive sessions
Supercomputers, Beowulf clusters, horizontal vs vertical scaling
Hardware considerations
Multithreaded jobs, parallelism, Amdahl's Law
Job arrays
Job dependencies
Building a pipeline
Storage issues, treemap
Job stats, resource estimation
Scaling analysis
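
To give a flavour of one topic in the list, Amdahl's Law, here is a minimal Python sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of a job that can run in parallel and n is the number of workers. The function name and the example values are illustrative assumptions, not course material.

```python
def amdahl_speedup(parallel_fraction, n_workers):
    """Theoretical speedup under Amdahl's Law.

    parallel_fraction: share of the runtime that can be parallelised (0 to 1).
    n_workers: number of workers (cores or nodes) sharing the parallel part.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    # Illustrative values only: a job that is 90% parallelisable.
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:>3} workers: speedup {amdahl_speedup(0.9, n):.2f}x")
```

With a 90% parallel fraction the speedup can never exceed 10x, however many workers are added, because the serial 10% always has to run.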

What NOT to expect:
Specific bioinformatics methods, pipeline builders (Nextflow, Snakemake, etc.)

Pre-requisite: Linux Terminal for beginners course (or Linux experience)

Target audience: CRG staff
Instructors and teachers: Emyr James (Head of SIT) and other SIT members
Dates: 12th, 13th, 14th and 17th of November 2025
Time: 10:00-13:00h
Level: Intermediate-advanced
Location: Bioinformatics room, CRG Training Centre
Maximum number of participants: 18
Registration deadline: 7th November 2pm

Registration HERE

For any information, please email the CRG Training and Academic Office (TAO): training@crg.eu

Feedback from previous editions:
Around the world, many institutions now have clusters to help perform complex calculations (or just resource-intensive computations) and advance science. However, we are running into a shortage of computing power as the number of users worldwide increases exponentially, and institutions are responding by adding more compute units. From an environmental point of view, this is a crisis: more compute units mean greater consumption of other essential natural resources and a greater contribution to global warming. In reality, most of these compute-related bottlenecks can be solved very simply by optimising use, giving everyone a better chance at using the available units rather than increasing their number. Hence, this type of course is of utmost importance, as it makes users aware of the consequences of running the compute units recklessly.
This gives us a structured understanding of cluster use and, more importantly, of how storage and jobs are managed, and how important it is to respect the fact that many people share this cluster. We must be efficient with our code so that cluster use is optimised and everyone benefits from it equally.
It is a mandatory course if you want to understand how to work in the cluster and make proper use of the available resources.
I would recommend this course because it provides all the essential foundational knowledge needed to work in a cluster environment. It clearly explains what a cluster system is and guides you step by step on how to work effectively within it. Overall, it offers a solid groundwork for anyone who needs to operate in real cluster environments.

Training funded by grant CEX2020-001049-S, funded by MCIN/AEI/10.13039/501100011033

completecluster25.pdf
