Framework and APIs for executing federated genomics analyses - D4.2

Authors - Álvaro González (CSC), Shubham Kapoor (CSC), Kirill Tsukanov (EMBL-EBI)

The federated analysis platform defined in this task aims to provide technological solutions for three exemplar use cases: federated joint cohort genotyping; a Polygenic Risk Score (PRS) workflow across two sample sets with similar ethnic backgrounds; and federated QTL analysis of molecular phenotypes. In this deliverable, we gathered the technical requirements from these use case descriptions and wrote a short design document that explains the requirements and lists the options for a solution.

Three distinct frameworks were considered to address the requirements of the use cases. The chosen framework supports different computing environments, which is a requirement for true federated analysis, and it can be extended to support GA4GH standards such as WES, htsget and AAI/Passports. Extending the proposed solution beyond the initial sites is planned for after the initial validation phase.
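To make WES support more concrete, the sketch below shows the kind of request a GA4GH Workflow Execution Service (WES) accepts when a workflow is submitted to a site. The deliverable does not specify the chosen framework's interface. This is therefore a generic illustration of a WES RunWorkflow call (POST to `/runs` with the `workflow_url`, `workflow_type`, `workflow_type_version` and `workflow_params` form fields), not the CINECA implementation. The base URL, workflow URL and inputs are hypothetical.

```python
import json

# Hypothetical base URL of one site's WES service; not taken from the deliverable.
WES_BASE_URL = "https://wes.example-site.org/ga4gh/wes/v1"


def build_wes_run_request(workflow_url: str, workflow_type: str,
                          workflow_type_version: str,
                          params: dict) -> tuple[str, dict[str, str]]:
    """Build the URL and form fields for a WES RunWorkflow call (POST /runs).

    WES accepts the run request as multipart/form-data, with the workflow
    inputs serialised as a JSON string in the workflow_params field.
    """
    url = f"{WES_BASE_URL}/runs"
    form = {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        "workflow_params": json.dumps(params),
    }
    return url, form


# Hypothetical PRS workflow and input; a real deployment would use its own.
url, form = build_wes_run_request(
    workflow_url="https://example.org/workflows/prs.cwl",
    workflow_type="CWL",
    workflow_type_version="v1.0",
    params={"genotypes": {"class": "File", "location": "cohort.vcf.gz"}},
)
print(url)  # https://wes.example-site.org/ga4gh/wes/v1/runs
print(form["workflow_params"])
```

With the `requests` library, these fields could be sent as multipart form parts, for example `requests.post(url, files={k: (None, v) for k, v in form.items()})`. Authentication is site-specific and omitted here. Because each site exposes the same WES interface, the same request can in principle be sent to every participating site without changing the workflow.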

https://doi.org/10.5281/zenodo.4609356

Query expansion service - D1.2

Authors - Romain Tanzer (HES-SO), Nona Naderi (HES-SO), Douglas Teodoro (HES-SO), Anais Mottaz (HES-SO), Patrick Ruch (HES-SO), Jonathan Dursi (SickKids), Jordi Rambla de Argila (CRG)

CINECA aims to support federated queries and analyses of distributed cohorts across continents. But human health datasets are extremely diverse: many different types of data are collected for many different kinds of health studies by many different health research communities. As a result, different cohort datasets often use different ontologies to describe similar kinds of entities, or represent concepts such as genomic variation differently.

CINECA must span this diversity of data representations to achieve its goal of connecting health research cohort data. The work of WP3 partially addresses dataset discoverability by defining a standard minimal cohort-level data representation common to all cohorts. However, this does not cover cohort-level data that falls outside the minimal common data model, nor does it address the representation of patient-level data. WP1’s role is to design and deploy API access to both cohort- and patient-level data. A fundamental function of this infrastructure is to let users find the appropriate dataset regardless of the ontology each cohort uses locally, and regardless of the format and syntax used to describe variants.

This report describes the work done on query expansion: implementing and demonstrating a query expansion service API that improves the findability and searchability of distributed cohort data. Multiple kinds of query expansion are available to support further data integration and interoperability, including horizontal expansion, i.e., across ontological systems, and vertical expansion, i.e., within sublevels of the same ontological resource.
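The difference between the two expansion directions can be illustrated with a toy example. The sketch below is not the CINECA service or its API. It uses two small hypothetical lookup tables: one of parent-to-child links within an ontology "A", used for vertical expansion, and one of equivalence mappings from "A" to a second system "B", used for horizontal expansion. All identifiers are illustrative, not real ontology codes.

```python
from collections import deque

# Hypothetical ontology fragments; identifiers are illustrative, not real codes.
# Ontology A: parent term -> direct child terms (used for vertical expansion).
CHILDREN = {
    "A:diabetes_mellitus": ["A:type_1_diabetes", "A:type_2_diabetes"],
    "A:type_1_diabetes": ["A:type_1_diabetes_with_ketoacidosis"],
}
# Mappings from ontology A to a second system B (used for horizontal expansion).
EQUIVALENTS = {
    "A:diabetes_mellitus": ["B:DM"],
    "A:type_1_diabetes": ["B:T1DM"],
    "A:type_2_diabetes": ["B:T2DM"],
}


def vertical_expansion(term: str) -> list[str]:
    """Return the term and all its descendants in the same ontology (breadth-first)."""
    expanded: list[str] = []
    seen: set[str] = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        expanded.append(current)
        queue.extend(CHILDREN.get(current, []))
    return expanded


def horizontal_expansion(terms: list[str]) -> list[str]:
    """Append terms from other ontologies mapped as equivalent to any input term."""
    mapped = [eq for t in terms for eq in EQUIVALENTS.get(t, [])]
    return list(dict.fromkeys(terms + mapped))  # de-duplicate, keep order


def expand_query(term: str) -> list[str]:
    """Expand vertically first, then map every resulting term horizontally."""
    return horizontal_expansion(vertical_expansion(term))


print(expand_query("A:diabetes_mellitus"))
```

The sketch expands vertically before horizontally so that the subterms' own cross-ontology equivalents (here `B:T1DM` and `B:T2DM`) are also picked up. A production service would typically query ontology lookup resources instead of in-memory dictionaries.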

https://doi.org/10.5281/zenodo.4609335

Cohort minimal metadata model - D3.1

Authors - Vivian Jin, Fiona Brinkman (SFU)

To support discovery and analysis of human cohort genomic and other “omic” data across jurisdictions, basic data such as participant age and sex need to be harmonised. Developing a “minimal metadata model” of these basic attributes, which should be recorded for all cohorts, is critical to support initial querying across jurisdictions and the discovery of suitable datasets. Here we describe the minimal metadata model, the methods used to create it, and its utility and impact.

A first version of the metadata model was built from a review of Maelstrom Research data standards and a manual survey of cohort data dictionaries, which identified and incorporated core variables shared across CINECA cohorts. The model was then converted to the Genomics Cohorts Knowledge Ontology (GECKO) format and further expanded with additional terms. The minimal metadata model is being made broadly available to any project, including those outside CINECA, interested in facilitating cross-jurisdictional data discovery and analysis.
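The survey described above was manual. As a rough illustration of its overlap step, the sketch below counts which common concepts appear across several cohort data dictionaries once local variable names have been mapped to shared concepts. The cohorts, variable names and mappings are all hypothetical, and this is not the method used to build the model or GECKO.

```python
from collections import Counter
from typing import Optional

# Hypothetical cohort data dictionaries: local variable name -> common concept.
# In practice these mappings would come from manual curation.
COHORT_DICTIONARIES = {
    "cohort_1": {"AGE_YRS": "age", "SEX": "sex", "BMI": "body_mass_index"},
    "cohort_2": {"age_at_recruitment": "age", "gender": "sex", "smoker": "smoking_status"},
    "cohort_3": {"AgeBaseline": "age", "Sex_cd": "sex", "bmi_kg_m2": "body_mass_index"},
}


def core_variables(dictionaries: dict[str, dict[str, str]],
                   min_cohorts: Optional[int] = None) -> list[str]:
    """Return concepts recorded by at least min_cohorts cohorts (default: all of them)."""
    if min_cohorts is None:
        min_cohorts = len(dictionaries)
    # Count each concept once per cohort, even if several local variables map to it.
    counts = Counter(concept for d in dictionaries.values() for concept in set(d.values()))
    return sorted(concept for concept, n in counts.items() if n >= min_cohorts)


print(core_variables(COHORT_DICTIONARIES))                 # ['age', 'sex']
print(core_variables(COHORT_DICTIONARIES, min_cohorts=2))  # ['age', 'body_mass_index', 'sex']
```

Lowering `min_cohorts` trades universality for coverage. This mirrors the choice between a strictly minimal model and one expanded with additional terms.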


https://doi.org/10.5281/zenodo.4575460

Training Programme, Detailed - D6.4

In this deliverable, we report on the activities in Task 6.4 - Training Programme, describe the CINECA training activities in the first 24 months of the project, and provide the Training Plan for the next 12-24 months. For training interventions targeted at a broader audience, we set up a webinar series providing quarterly online learning interventions. We ran a total of 6 webinars (3 in 2019 and 3 in 2020), with an average of 23 attendees, corresponding on average to 68% of those who registered. In addition, a series of short training videos (https://www.cineca-project.eu/short-videos) was created to facilitate the uptake of CINECA outputs. The work packages produced eight short videos on different topics. The videos were submitted to ELIXIR’s training portal to increase engagement and were disseminated via CINECA’s communication channels.

https://doi.org/10.5281/zenodo.6223125

Biomedical named entity recognition - Pros and cons of rule-based and deep learning methods

The final blog in our text-mining series is a guest post by Shyama Saha, who specialises in machine learning and text mining at EMBL-EBI. The CINECA project aims to create a text-mining tool suite to support the extraction of metadata concepts from unstructured textual cohort data and description files. To create a standardised metadata representation, CINECA uses natural language processing (NLP) techniques such as entity recognition, with rule-based tools such as MetaMap, LexMapr and Zooma. In this blog, Shyama discusses the challenges of dictionary- and rule-based text-mining tools, especially for entity recognition tasks, and how deep learning methods address these issues.
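As a rough illustration of the dictionary-based approach discussed in the blog, the sketch below performs exact, case-insensitive matching against a tiny hypothetical dictionary. It is not how MetaMap, LexMapr or Zooma work internally. It finds "type 2 diabetes" and "BMI" but misses the abbreviation "T2D" because that surface form is not listed. Gaps like this are one commonly cited limitation of dictionary lookups.

```python
import re

# Hypothetical mini-dictionary: surface form -> concept label.
DICTIONARY = {
    "type 2 diabetes": "Type 2 diabetes mellitus",
    "body mass index": "Body mass index",
    "bmi": "Body mass index",
}


def dictionary_ner(text: str) -> list[tuple[str, str, int, int]]:
    """Return (matched text, concept, start, end) for whole-word, case-insensitive matches."""
    matches = []
    for surface, concept in DICTIONARY.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append((m.group(0), concept, m.start(), m.end()))
    return sorted(matches, key=lambda match: match[2])  # order by position in text


text = "Participants with type 2 diabetes had higher BMI than those with T2D."
for match in dictionary_ner(text):
    print(match)  # "T2D" is never reported: it is not in the dictionary
```

Covering every abbreviation, misspelling and paraphrase would require ever-larger dictionaries and rules. Learned models aim to generalise from context instead.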

Catalogue of Canadian, European and African ethical and legal gaps - D7.2

Authors - Éloïse Gennet, Melanie Goisauf, Delphine Pichereau, Emmanuelle Rial-Sebbag

The latitude that the GDPR still leaves to EU Member States, together with remaining ambiguities in its interpretation, continues to fuel debate in the ethical and legal literature. Projects like CINECA, which seeks to facilitate health data exchange between cohorts in Europe, Canada and Africa, offer valuable experience and input on essential ethical and legal gaps between countries and cohorts. These gaps concern questions such as the ethical and lawful basis for international health data sharing and for secondary processing for research purposes.

This deliverable focuses on answering two priority questions from both a legal and an ethical point of view: How should CINECA choose a legal basis for its data processing? And how should CINECA approach broad consent to further data processing? The goal is to study how the CINECA project, and especially its data sharing, can be conducted efficiently while complying with relevant laws and regulations across all Member States and, above all, with established ethical guidelines and practices across three continents.

https://doi.org/10.5281/zenodo.4298450
