CINECA Guest series - Gary Saunders - ELIXIR CINECA and the ELIXIR Federated Human Data Community.
Gary Saunders is the Human Data Coordinator at ELIXIR. Gary leads the implementation of the ELIXIR-wide strategy to enable responsible sharing of human data consented for reuse in scientific research.
A major focus of this role entails working with the existing Human Data Communities to ensure that, where possible, all data generated are compliant with the FAIR data principles (Findable, Accessible, Interoperable, Reusable), and coordinating these efforts with the Global Alliance for Genomics and Health (GA4GH). Genomics/Biomics data has become increasingly important in medical and translational research, and the vast amount of data generated from these techniques has led to a huge demand for secure means to store, transfer and analyse the human biomedical data that has been consented for research. The ELIXIR Federated Human Data Community extends and generalises the system of access authorisation and secure data transfer developed in the European Genome-phenome Archive (EGA). It aims to provide a framework for the secure submission, archiving, dissemination and analysis of human biomedical data across Europe.
Tell us a little bit about the ELIXIR Federated Human Data Community (FHD), and your vision for what you hope to achieve.
Genomics data generation is shifting from being predominantly funded by research to having healthcare systems as the primary funding source. Across Europe healthcare data is a national competence, which often means that these data cannot leave national jurisdictions and boundaries. In this new environment the classic, centralised model of the European Genome-phenome Archive (EGA) does not scale, and a federated model must be adopted. The ELIXIR FHD Community brings together 17 of the 23 ELIXIR Nodes to design the framework for the Federated EGA (FEGA). The focus is on identifying interoperable interfaces that allow sensitive human data archives to be connected in the federated framework. It is possible, indeed probable, that the underlying technologies for sensitive data archiving and access will differ across the FEGA network; however, if we can identify the gold-standard, fit-for-purpose interfaces that the network needs in order to be interoperable, then this will enable cross-border sensitive data access as we move into the new era of genomics/Biomics research. The FHD Community is funded centrally by ELIXIR, using Member State contributions, to build this framework, which is viewed as a high priority by the majority of the ELIXIR Nodes.
How do you see CINECA fitting into/contributing to your vision for FHD?
The ELIXIR FHD Community developed from the “Human Data Use Case” of the EXCELERATE project, a Horizon 2020-funded project to kick-start the implementation of ELIXIR. EXCELERATE was a large, multi-Use Case project that allowed ELIXIR to centrally fund the coordination, development, and implementation of large bioinformatic infrastructure elements. EXCELERATE ended in 2019, and as ELIXIR moves into its second scientific programme (2019-23) it is necessary to find a way to work in a more established format. In this framework CINECA, an EC H2020 project, is a pivotal piece of the jigsaw, with specific work packages on key infrastructure elements such as ELIXIR AAI (WP2); Beacons (WP1); building standardised metadata frameworks for cohorts (WP3); and developing the necessary regulatory and policy roadmaps to enable the network to function globally (WP7). The elements being developed by CINECA align with GA4GH-approved standards, which are critical to ELIXIR’s partnership with GA4GH and to the drive to reach the European Union’s goal of providing transnational access to at least one million genomes by 2022.
Do you have any predictions for the future of your research area?
In 2013 I was first employed at EMBL-EBI to help design and implement the European Variation Archive (EVA; www.ebi.ac.uk/eva). This is a centralised database that underpins genetic variation data archival and access for many of the EBI services, and also gives many researchers across the world direct access to these raw data. From this position I was able to watch the formation and growth of ELIXIR, and to participate in the discussion of how it was necessary to build a federated network of data management, not only across Europe but globally. I now enjoy building that framework, and I think that through projects such as CINECA the early vision for how this framework could be implemented is truly being realised. The CINECA project is developing the capacity for cross-national data analysis frameworks, where researchers can build custom datasets in the cloud and run standardised bioinformatic pipelines on these data to generate reproducible results. The future for me is when cloud computing environments, workflows, and containers are in full production; when Europe has a supercomputing framework with the opportunity for access for all Member States; and when the infrastructure is more widely recognised as important and funded accordingly. I think we are well on course for this future and I look forward to continuing to help build it.
Are there any popular misconceptions about data sharing that you come across, e.g. among the general public? For example, a Myth vs. Fact statement.
The concept of a federated infrastructure is not an easy one to explain, nor to understand. Language is therefore incredibly important: we need agreed definitions of terms, and these definitions need to be published and readily available. I think this is particularly important in the conversations that we bioinformaticians have with those outside of our field, for example with clinicians, policy makers, and Government officials. And it is with the latter two that I have seen some really rather easy changes to language have dramatic effects. For example, the term “data sharing”, or indeed “Open Data”, often carries negative connotations for those who are not native to the field of bioinformatics. In recent times I have seen much greater success in conversation, documentation, and negotiation when using the term “data access”. The bioinformatician in me can absolutely see that these terms are almost equivalent, but they are open to very different interpretations by others. These subtle yet relatively simple changes in language can have dramatic effects on collaborations and can help alleviate concerns and misconceptions. Efforts to improve communication are especially important as we move towards global data access and data sharing agreements, encouraging all stakeholders to engage in the discussion.