CanDIG: Federated network across Canada for multi-omic and health data discovery and analysis

Dursi et al. describe setting up a genomics and biomedical data federation across Canada’s provincial regulatory boundaries, and the drivers behind their governance and technical decisions. They offer guidance on implementing Global Alliance for Genomics and Health (GA4GH) standards when building a federated data platform.

https://doi.org/10.1016/j.xgen.2021.100033

International federation of genomic medicine databases using GA4GH standards

Adrian Thorogood, Heidi L. Rehm, Peter Goodhand, Angela J.H. Page, Yann Joly, Michael Baudis, Jordi Rambla, Arcadi Navarro, Tommi H. Nyronen, Mikael Linden, Edward S. Dove, Marc Fiume, Michael Brudno, Melissa S. Cline, Ewan Birney

Thorogood et al. provide a guide to federated approaches to data sharing, which aim to connect independent, secure genomic medicine databases through common standards, enabling users to derive insights across multiple databases. The authors argue that a federated approach is feasible and necessary to connect national genomics initiatives into a global network to advance precision medicine.

Publications
GA4GH: International policies and standards for data sharing across genomic research and healthcare

Heidi L. Rehm, Angela J.H. Page, Mélanie Courtot, Jonathan Dursi, Lauren A. Fromont, Thomas M. Keane, Mikael Linden, Isuru Udara Liyanage, Nicola Mulder, Jordi Rambla, Gary I. Saunders, et al.

Rehm et al. describe the Global Alliance for Genomics and Health (GA4GH), which develops technical standards and policy frameworks to enable responsible international human genomic and biomedical data sharing. Broad international participation in building, adopting, and deploying these standards is necessary to bridge research and healthcare and is critical to making the best use of genomic data to inform advances in medicine and human health.

CINECA Poster - GA4GH 9th Plenary

In collaboration with other GA4GH-associated projects, CINECA is developing infrastructure that will permit effective use of widely dispersed data, increasing the size and quality of the datasets available for disease research. In alignment with community standards and using standardised interfaces, data analysis will be federated and moved to the data, respecting data access restrictions.
From the Discovery Work Stream, CINECA is adopting Data Connect and the Beacon v2 API; from the DURI and Data Security Work Streams, it is using GA4GH Passports, AAI and DUO.

Joint Variant genotyping use case

WP4 recently delivered a simple demonstrator pipeline that performs a federated joint variant genotyping analysis. The goal of this use case is to demonstrate how a simple metric (in this case, allele frequency) can be computed in a federated manner, without ever collecting individual-level data in a central location.
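
The idea of computing allele frequency without centralising individual-level data can be illustrated with a minimal sketch. This is not the WP4 pipeline: it assumes diploid genotypes encoded as pairs of 0 (reference) and 1 (alternate), and the function names, variant key and site data are all hypothetical. Each site reduces its genotypes to aggregate counts locally, and only those counts are combined.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Genotype = Tuple[int, int]   # assumed encoding: 0 = reference, 1 = alternate
Counts = Tuple[int, int]     # (alternate allele count, total allele count)


def local_allele_counts(genotypes: Dict[str, List[Genotype]]) -> Dict[str, Counts]:
    """Runs inside each site: reduce individual genotypes to aggregate counts."""
    counts = {}
    for variant, calls in genotypes.items():
        alt = sum(a + b for a, b in calls)   # alternate alleles observed
        total = 2 * len(calls)               # two alleles per diploid sample
        counts[variant] = (alt, total)
    return counts


def federated_allele_frequency(site_counts: Iterable[Dict[str, Counts]]) -> Dict[str, float]:
    """Runs at the coordinator: sees only per-site counts, never genotypes."""
    totals = defaultdict(lambda: [0, 0])
    for counts in site_counts:
        for variant, (alt, total) in counts.items():
            totals[variant][0] += alt
            totals[variant][1] += total
    return {v: alt / total for v, (alt, total) in totals.items() if total > 0}


# Hypothetical data held separately at two sites.
site_a = {"chr1:1000:A>G": [(0, 1), (1, 1), (0, 0)]}
site_b = {"chr1:1000:A>G": [(0, 0), (0, 1)]}

frequencies = federated_allele_frequency(
    [local_allele_counts(site_a), local_allele_counts(site_b)]
)
```

Summing counts rather than averaging per-site frequencies keeps the result correct when sites differ in sample size. A real deployment would also need to consider whether small counts are themselves disclosive.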

Semantic and harmonisation best practice - D3.2

Authors - Melanie Courtot (EMBL-EBI), Isuru Liyanage (EMBL-EBI)

To support human cohort genomic and other omic data discovery and analysis across jurisdictions, basic data such as cohort participants’ demographic data, diseases, medication etc. (termed “minimal metadata”) needs to be harmonised. Individual cohorts are constrained by size, ancestral origins, and geographic boundaries that limit the subgroups, exposures, outcomes, and interactions which can be examined. Combining data across large cohorts to address questions none of them can answer alone enhances the value of each and leverages the enormous investments already made in them to address pressing questions in global health. By capturing genomic, epidemiological, clinical and environmental data from genetically and environmentally diverse populations, including populations that are traditionally under-represented, we will be able to capture novel factors associated with health and disease that are applicable to both individuals and communities globally.

We provide best practices for cohort metadata harmonisation, using the semantic platform we deployed in the cloud to enable cohort owners to map their data and harmonise against the GECKO (GEnomics Cohorts Knowledge Ontology) we developed. GECKO is derived from the CINECA minimal metadata model of the basic set of attributes that should be recorded with all cohorts and is critical to aid initial querying across jurisdictions for suitable dataset discovery. We describe how this minimal metadata model was formalised using modern semantic standards, making it interoperable with external efforts and machine readable. Furthermore, we present how those practices were successfully used at scale, both within CINECA for data discovery in WP1 and in the synthetic datasets constructed by WP3, and outside of CINECA such as in the International HundredK+ Cohorts Consortium (IHCC) and the Davos Alzheimer’s Collaborative (DAC). Finally, we highlight ongoing work for alignment with other efforts in the community and future opportunities.
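
Mapping cohort data against a shared model can be pictured with a small sketch. It is an illustration only, not the CINECA semantic platform: the local field names and the target labels below are hypothetical placeholders and are not actual GECKO terms.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical mapping from one cohort's local field names to shared labels.
# The labels are placeholders for illustration, not real GECKO identifiers.
FIELD_MAP: Dict[str, str] = {
    "sex": "biological sex",
    "gender": "biological sex",
    "dob_year": "year of birth",
    "dx": "disease",
}


def harmonise(record: Dict[str, Any], field_map: Dict[str, str]) -> Tuple[Dict[str, Any], List[str]]:
    """Rename known fields to shared labels and report fields with no mapping."""
    harmonised: Dict[str, Any] = {}
    unmapped: List[str] = []
    for field, value in record.items():
        label = field_map.get(field.lower())
        if label is None:
            unmapped.append(field)   # left for a curator to review
        else:
            harmonised[label] = value
    return harmonised, unmapped


# Hypothetical participant record from a single cohort.
record = {"Sex": "female", "dob_year": 1970, "smoker": "no"}
harmonised, unmapped = harmonise(record, FIELD_MAP)
```

Reporting unmapped fields instead of dropping them silently gives cohort owners a list of attributes that still need a mapping decision.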

https://doi.org/10.5281/zenodo.5055308
