CINECA Session at IHCC - cross cohort interoperability, metadata harmonisation, and discovery

Date and Location: To be confirmed, due to the coronavirus outbreak

Registration: https://is.gd/cineca_ihcc2020 (even if you cannot attend, the form includes some questions that will help guide our discussions)

Contact: Mamana Mbiyavanga

Overview

Over the past forty years, vast cohorts of human participants have been assembled and phenotyped through research studies, public and private healthcare initiatives, and clinical trials. The International HundredK+ Cohorts Consortium (IHCC) has identified over 60 large-scale cohorts, each with over 100,000 participants, that hold longitudinal, environmental, and clinical data collected over decades. The IHCC is bringing large cohorts together to encourage data sharing, improve efficiency, and maximize the benefits in addressing scientific questions that none could answer alone. There are several axes of cohort data discovery, depending on the type and source of metadata collected by each cohort, e.g. disease status, data use, sample collection parameters, genotype, and complex phenotypes.

The Global Alliance for Genomics and Health (GA4GH) is a policy-framing and technical standards-setting organization that seeks to enable responsible genomic data sharing. GA4GH has approved a series of standards relevant to representing cohort data and making it discoverable, e.g.:

  1. Phenotypic metadata representation via Phenopackets, which have recently been converted into JSON schema through the Schemablocks effort;

  2. Formalization of data access via researcher passports and the Data Use Ontology (DUO), which are new GA4GH standards promoting consistency of data access representation and user authorization;

  3. Frameworks for data discovery such as Beacon and the Search API. Beacon v2 will make it possible to find datasets based on Phenopackets or DUO Schemablocks.

The aims of this hands-on workshop are to collect a variety of use cases for cohort interoperability and to highlight work towards practical steps to address them. This will include approaches to metadata harmonization, querying and discovery, authentication, data retrieval, and analysis. We propose to identify a subset of cohorts that will provide the relevant information for use-case-driven development.

We will use two specific cohort interoperability projects to show how the GA4GH standards are being deployed to enable discoverability of, and access to, over 1.6M cohort participants across three continents. The Common Infrastructure for National Cohorts in Europe, Canada, and Africa (CINECA) project has assembled a virtual cohort of 1.4M individuals from population, longitudinal, and disease studies. CINECA is developing the tools and templates needed for federated cross-cohort and cross-border queries, with the aim of providing common APIs for further clinical and health-related analyses. The Cross-cohort Harmonization Project for Tomorrow (CHPT) from the Canadian Partnership Against Cancer aims to leverage collaborations across large population-based cohorts and to facilitate interdisciplinary, cross-national research on the determinants and etiology of cancer and other major chronic diseases. The overlap between the CINECA and CHPT projects, in terms of both cohorts and data, provides a natural bridging point for extending their individual scopes. We welcome contributions from other activities that have done cross-cohort harmonization work.

Audience

Any cohort that has plans for prospective or retrospective metadata harmonization and is keen to share its experience and to learn from CINECA’s perspective on federated dataset discovery and analysis.

Register

Please complete the form at https://is.gd/cineca_ihcc2020 to indicate whether you plan to attend the session. Even if you cannot attend, the form includes some questions that will help guide our discussions.