PRBB Computational Genomics Seminars Jose Antonio Espinosa

04/12/2025, 12:00
R_473.10_AULA
PRBB Computational Genomics Seminars
Jose Antonio Espinosa
Comparative Bioinformatics group, Notredame Lab.

"nf-core proteinfold: a community-driven open source pipeline for deep learning based protein structure prediction methods"

Abstract: The release of AlphaFold2 paved the way for a new generation of prediction tools for studying unknown proteomes. These tools enable highly accurate protein structure predictions by leveraging advances in deep learning. However, their implementation can pose technical challenges for users, who must navigate a complex landscape of dependencies and large reference databases. Providing the community with a standardized workflow framework to run these tools could ease adoption.
Thanks to its adherence to nf-core guidelines, the nf-core/proteinfold pipeline simplifies the application of state-of-the-art protein structure modeling techniques by making use of Nextflow’s optimized execution capabilities on both cloud providers and HPC infrastructures. The pipeline integrates several popular methods, namely AlphaFold 2 and 3, Boltz 1 and 2, ColabFold, ESMFold, HelixFold, RoseTTAFold All-Atom, and RoseTTAFold2NA. Following structure prediction, nf-core/proteinfold generates an interactive report that allows users to explore and compare predicted models together with standardized confidence metrics, harmonized across methods for consistent interpretation. The workflow also integrates Foldseek-based structural search, enabling the identification of known protein structures similar to the predicted models.
The pipeline is developed through an international collaboration that includes the Centre for Genomic Regulation, Pompeu Fabra University, European Bioinformatics Institute, and the Australian BioCommons, and it already serves as a central resource for structure prediction at these organisations and others. This broad adoption shows how the nf-core community, through its open-source and community-driven model plus its shared computational resources, makes key bioinformatics pipelines widely accessible for everyday research. I will also briefly introduce the community during the talk.
Interestingly, nf-core/proteinfold represents a new generation of Nextflow workflows designed to place multiple alternative methods for the same task within one coherent framework. This design makes it possible to benchmark the different procedures, providing a basis for developing combined approaches or selecting the best method.
