Date: 18th – 19th May 2022
Time: 14:00-18:00 CEST
Location: Online
Workshop organisers: CINECA, H3ABioNet
Contact: Nicola Mulder, Mamana Mbiyavanga
Overview
CINECA's vision is a federated, cloud-enabled infrastructure that makes population-scale genomic and biomolecular data accessible across international borders. Its ultimate goal is to enable large-scale federated data analysis that is both responsible and secure. This requires integrating and harmonizing diverse, large human cohort datasets using community standards. Harmonizing data within and across cohorts adds value for downstream analysis and interpretation, and facilitates cross-cohort meta-analysis.
This workshop aims to discuss common challenges in cohort data harmonization, work towards practical steps to address them, and share best practices. We welcome participants from any cohort planning prospective or retrospective data harmonization who are keen to share their experience and learn from others' perspectives on cohort data discovery and analysis.
Topics
Data cleaning and curation
ELSI considerations in merging data
Data collection standards, ontology terminology and interoperability standards, metadata models
Data storage standards
Data harmonization
Sharing cohort summary data
Applicants are encouraged to watch the CINECA webinar of 31st March 2022 (https://www.cineca-project.eu/webinar/bringing-it-all-together), which highlights some of the relevant standards and their applications.
Learning outcomes
After this workshop, participants should be able to:
Do basic data cleaning
Understand what data standards & ontologies exist for clinical data
Map their cohort metadata to a data model
Understand existing approaches to and algorithms for data harmonization
Prepare summary data from their cohorts
Target audience
Members of cohort projects who work on data curation and management, including data managers, curators, bioinformaticians and data scientists.
Prerequisites
None, but participants should be involved in cohort data management or analysis.
Limitations
This workshop will only provide a foundation for continued learning in data harmonization, with some example applications using synthetic datasets. Future bring-your-own-data (BYOD) workshops can be arranged for more hands-on work with your own cohort data.
Workshop programme
Time (CEST/CAT) | Topic | Speaker
18th May 2022 | |
14:00 | Welcome and introduction, workshop aims (Video, Slides) | Nicky Mulder (CINECA, H3Africa/H3ABioNet, University of Cape Town, South Africa)
14:15 | Data cleaning (Video) | Katherine Johnston, Ayton Meintjes (H3Africa/H3ABioNet, University of Cape Town, South Africa)
14:45 | Machine learning/text mining tools for cleaning data (Video) | Isuru Liyanage (CINECA, EMBL-EBI, UK)
15:15 | ELSI considerations in merging data (Video) | Melanie Goisauf (CINECA, BBMRI-ERIC, Austria)
15:40-16:00 | Break |
16:00 | Overview of data collection standards, ontology terminology and interoperability (Video) | Peter Robinson (Jackson Laboratory, US)
16:45 | Metadata: GECKO, IHCC (Video) | Carles Garcia (CINECA, EMBL-EBI, UK)
End 18:00 | Hands-on work to prepare for loading into Atlas (Video) | Carles Garcia (CINECA, EMBL-EBI, UK)
19th May 2022 | |
14:00 | Browse newly uploaded data in Atlas (Video) | Carles Garcia (CINECA, EMBL-EBI, UK)
14:20 | Other considerations: data storage standards (Video) | Alexa Heekes (Western Cape Department of Health, University of Cape Town, South Africa)
14:45 | Summary from the literature review on data harmonization (Video) | Lyndon Zass (H3Africa/H3ABioNet, University of Cape Town, South Africa)
15:00 | Example 1: DPUK (Video) | Sarah Bauermeister (DPUK, University of Oxford, UK)
15:20 | Example 2: H3Africa CVD (Video) | Katherine Johnston (H3Africa/H3ABioNet, University of Cape Town, South Africa)
15:45-16:00 | Break |
16:00 | Example 3: PRIMED (Video) | Leslie Lange (PRIMED consortium, University of Colorado, USA)
16:20 | Data harmonization algorithms (Video) | Mamana Mbiyavanga (CINECA, H3Africa/H3ABioNet, University of Cape Town, South Africa)
16:45 | Cohort representation (MIACC): how to generate and share summary data (Video) | Melanie Courtot (OICR, Canada)
17:15-18:00 | BYOD and discussion (Video) | All
About CINECA
The CINECA (Common Infrastructure for National Cohorts in Europe, Canada, and Africa) project aims to develop a federated, cloud-enabled infrastructure that makes population-scale genomic and biomolecular data accessible across international borders, accelerating research and improving the health of individuals across continents. CINECA will leverage international investment in human cohort studies from Europe, Canada, and Africa to deliver a paradigm shift in federated research and clinical applications. The CINECA consortium will create one of the largest cross-continental implementations of human genetic and phenotypic data federation and interoperability, with a focus on common (complex) disease, one of the world’s most significant health burdens. CINECA has assembled a virtual cohort of 1.4M individuals from population, longitudinal and disease studies. Federated analyses will deliver new scientific knowledge, harmonisation strategies and the ELSI framework needed to support data exchange across legal jurisdictions, enabling federated analyses in the cloud. CINECA will provide a template for building virtual longitudinal and disease-specific cohorts of millions of samples, to the benefit of patients. It will leverage its partners' membership of standards bodies and infrastructures such as the Global Alliance for Genomics and Health (GA4GH), BBMRI, ELIXIR and EOSC, driving the state of the art in standards development, technical implementation and FAIR data.