Complete Cluster Course

This course introduces and consolidates the material presented across all levels of the cluster course, and expands on the concepts to keep in mind when optimizing use of the cluster.
If you have never used the CRG cluster, this is the mandatory course for obtaining a user license. If you already use the CRG cluster and need an update, you will also need to take this course.
The main message of the course is to embrace the parallelism available within the cluster: pipelines should be built from many small, independent pieces spread across the cluster, rather than from large, monolithic, long-running jobs on a single node. The course will show why this matters and how to achieve it.
Topics that are going to be addressed:
- Video tour of the data centre
- What is a cluster
- Logging in
- Queuing / the scheduler
- What resources are available at the CRG cluster
- Simple batch scripts - directives
- Troubleshooting - what happened to my jobs?
- Interactive sessions
- Supercomputers, Beowulf clusters, horizontal vs. vertical scaling
- Hardware considerations
- Multithreaded jobs, parallelism, Amdahl's Law
- Job arrays
- Job dependencies
- Building a pipeline
- Storage issues, treemap
- Job stats, resource estimation
- Scaling analysis
What NOT to expect:
Specific bioinformatics methods, pipeline builders (Nextflow, Snakemake, etc.)
Pre-requisite: Linux Terminal for beginners course (or Linux experience)
Instructors and teachers: Emyr James (Head of SIT) and other SIT members
Dates: 12th, 13th, 14th and 17th of November 2025
Time: 10:00-13:00h
Level: Intermediate-advanced
Location: Bioinformatics room, CRG Training Centre
Maximum number of participants: 18
Registration deadline: 7th November 2025
Registration HERE
For further information, please email the CRG Training and Academic Office (TAO): training@crg.eu
Training funded by grant CEX2020-001049-S, funded by MCIN/AEI/10.13039/501100011033


