PRBB-CRG Sessions Juan Antonio Vizcaino

14/12/2018

MARIE CURIE

14/12/2018, 12:00
MARIE CURIE
PRBB-CRG Sessions
Juan Antonio Vizcaino
European Bioinformatics Institute - EMBL-EBI, Cambridge, UK
"Big data" approaches in proteomics: Big challenges give the best rewards
Host: Eduard Sabidó (CRG)

Abstract:
First of all, I will summarize the work we have done in recent years to create an infrastructure that enables sharing of mass spectrometry (MS) proteomics data in the public domain, including the development of the world-leading PRIDE database (https://www.ebi.ac.uk/pride/), the related tools and software, open data standards, and the establishment of the worldwide ProteomeXchange Consortium of proteomics resources (http://www.proteomexchange.org/). Thanks, among other efforts, to the great success of PRIDE and ProteomeXchange, the proteomics community is now widely embracing open data policies, the opposite of the situation just a few years ago.
As evidence of this, approximately 300 datasets per month have been submitted during 2018 to PRIDE, which is now approaching the petabyte (PB) scale. This wealth of public proteomics data is being increasingly reused by the research community, since it offers highly attractive applications for data scientists. Some of them are proteomics-centric (e.g. meta-studies to expand the knowledge of the human proteome, generation of spectral libraries, etc.), but others involve the integration of proteomics with other omics data types, especially genomics. In this context, I will explain a few projects that we are carrying out in-house, for instance the generation of the functionally relevant human phospho-proteome, and the integrative analysis of protein expression in human cancer.
We also aim to facilitate public data reuse by third parties by building open, reproducible, scalable proteomics data analysis pipelines. As a proof of concept, these pipelines are deployed first in the EMBL-EBI “Embassy Cloud”, with the idea that in the future they can be made available in other cloud infrastructures, and that they can be freely reused by any interested researcher in the community. These pipelines are connected to PRIDE, bringing the analysis tools closer to the data.