Complete Cluster Course Feb-26

This course introduces and consolidates the material presented in all levels of the cluster course, and expands on the concepts to be aware of when trying to optimize use of the cluster.
If you have never used the CRG cluster, this course is mandatory to obtain the user license. If you already use the CRG cluster and need an update, you will also need to take this course.
The main message of the course is to embrace the parallelism available within the cluster: pipelines should be built from many small, independent pieces spread throughout the cluster, rather than from large, monolithic, long-running jobs that run on a single node. The course will show why this should be done and how to achieve it.
Topics that are going to be addressed:
- Video tour of the data centre
- What is a cluster
- Logging in
- Queuing / the scheduler
- What resources are available at the CRG cluster
- Simple batch scripts - directives
- Troubleshooting - what happened to my jobs?
- Interactive sessions
- Supercomputers, Beowulf clusters, horizontal vs vertical scaling
- Hardware considerations
- Multithreaded jobs, parallelism, Amdahl's Law
- Job arrays
- Job dependencies
- Building a pipeline
- Storage issues, treemap
- Job stats, resource estimation
- Scaling analysis
What NOT to expect:
Specific bioinformatics methods, pipeline builders (Nextflow, Snakemake, etc.)
Pre-requisite: Linux Terminal for beginners course (or Linux experience)
Target audience: CRG staff ONLY
Instructors and teachers: Emyr James (Head of SIT) and other SIT members
Dates: 16th, 19th and 20th of February 2026
Time: 10:30-13:30h
Level: Intermediate-advanced
Location: Bioinformatics room, CRG Training Centre
Maximum number of participants: 18
Registration deadline: 30th January 2pm
Registration HERE
For any information, please send an email to CRG Training and Academic office (TAO): training@crg.eu
Feedback from previous editions:
... this type of course is of utmost importance as it makes the users aware of the consequences of their actions while running the compute units recklessly. ... it provides all the essential foundational knowledge needed to work in a cluster environment.
Training funded by Grant CEX2020-001049-S, funded by MCIN/AEI/10.13039/501100011033

