This report details the development of a demonstration tool for CINECA Task T5.1, which integrates WP1-3 CINECA services to advance federated biobank search. UML diagrams and mockups were used to model the search process and create a user interface. The architecture of the system and strategies for accessing and storing data were also described. The report highlights efforts to integrate pre-analytical metadata, which are important for determining the quality of biospecimens, and a synthetic dataset was prepared to develop search services for accessing and visualizing such data. The evaluation of the robustness and search effectiveness of the implemented services is also discussed.
https://doi.org/10.5281/zenodo.6783295
In collaboration with other GA4GH-associated projects, CINECA is developing infrastructure that will permit effective use of widely dispersed data, increasing the size and quality of datasets available for disease research. In alignment with community standards and using standardised interfaces, data analysis will be federated and migrated to the data, respecting data access restrictions.
Solutions CINECA is adopting from the Discovery Work Stream include the Data Connect and Beacon v2 APIs, while GA4GH Passports, AAI and DUO are being adopted from the DURI and Data Security Work Streams.
WP4 recently delivered a simple demonstrator pipeline that performs a federated joint variant genotyping analysis. The goal of this use case is to demonstrate how a simple metric (in this case, allele frequency) can be computed in a federated manner, without ever having to collect the individual-level data in a central location.
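The idea of computing allele frequency without centralising individual-level data can be illustrated with a minimal Python sketch. It is not the WP4 pipeline itself, which performs joint genotyping; it only shows the aggregation principle, assuming each site holds diploid genotypes coded as alternate-allele counts (0, 1 or 2, with None for a missing call). The names `SiteCounts`, `local_counts` and `federated_allele_frequency` are hypothetical and do not come from CINECA.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SiteCounts:
    """Aggregate counts a site shares for one variant (hypothetical structure)."""
    alt_alleles: int    # alternate alleles observed at this site
    total_alleles: int  # alleles with a genotype call (2 per diploid sample)


def local_counts(genotypes: Iterable[Optional[int]]) -> SiteCounts:
    """Runs inside each site: reduce individual genotypes to two integers.

    Assumption: genotypes are alternate-allele counts per diploid sample
    (0, 1 or 2), with None marking a missing call.
    """
    called = [g for g in genotypes if g is not None]
    return SiteCounts(alt_alleles=sum(called), total_alleles=2 * len(called))


def federated_allele_frequency(site_counts: Iterable[SiteCounts]) -> float:
    """Runs at the coordinator: combine per-site aggregates only."""
    counts = list(site_counts)
    alt = sum(c.alt_alleles for c in counts)
    total = sum(c.total_alleles for c in counts)
    if total == 0:
        raise ValueError("no called genotypes at any site")
    return alt / total


# Each site computes its own aggregate; only these two integers leave the site.
site_a = local_counts([0, 1, 2, None, 1])  # 4 alternate alleles out of 8
site_b = local_counts([0, 0, 1])           # 1 alternate allele out of 6
frequency = federated_allele_frequency([site_a, site_b])  # 5 / 14
```

Summing counts rather than averaging per-site frequencies weights each site by its number of called genotypes, so the result equals what a pooled analysis would give. In practice, small aggregate counts can still be disclosive, so a real deployment would need additional safeguards beyond this sketch.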
This video describes a common framework for designing portable federated pipelines. The joint cohort genotyping pipeline is provided as a specific implementation example. The ability of the pipeline to run in different environments, accessing the data via different protocols, and applying the appropriate normalisations is demonstrated.
This month’s blog was written by Dylan Spalding (EMBL-EBI), Coordinator of the European Genome-phenome Archive and co-WPL of CINECA WP4 - Federated Joint Cohort Analysis. This blog is the first in our new series, presenting an overview of GA4GH standards being developed and implemented by CINECA.
The CINECA project aims to develop a common infrastructure to support federated data analysis across national cohorts in Europe, Canada, and Africa. This report discusses the progress made over the past four years, which includes the development of six modular workflows to quantify and normalize molecular traits, pre-process genotype data, and test for associations between molecular traits and genotypes. The approach improves on the previous state of the art by packaging software dependencies into Docker/Singularity containers, using the Nextflow language to orchestrate complex multi-step workflows, and using the HASE(1) framework to reduce the amount of data that needs to be transferred between cohorts. The project provides training materials and open access datasets to encourage adoption and demonstrates how the workflows can be used to perform federated analysis across multiple real cohorts located in Switzerland, Germany, the Netherlands, and Estonia.
https://doi.org/10.5281/zenodo.7464116
In this deliverable document, we report on the activities in task 6.4 - Training Programme, describe the CINECA training activities in the first 24 months of the project and provide the Training Plan for the next 12-24 months. For training interventions targeted at a broader audience, we have set up a webinar series, providing quarterly online learning interventions. We ran a total of 6 webinars (3 in 2019 and 3 in 2020), with an average of 23 attendees per webinar, corresponding on average to 68% of those who registered. In addition, a series of short training videos (https://www.cineca-project.eu/short-videos) was created to facilitate the uptake of CINECA outputs. Eight short videos were produced by work packages on different topics. The short videos were submitted to ELIXIR’s training portal to increase engagement and disseminated via CINECA’s various communication channels.
https://doi.org/10.5281/zenodo.6223125
Direct industrial participation in CINECA comes from the SMEs (Small and Medium-sized Enterprises) The Hyve and Clinicageno, both companies with an interest in bioinformatics applied to research and clinical genomic problems. The key background driver for their interest in a project like CINECA is the possibility of long-term, sustainable profits in the areas of data-driven science and medicine in which The Hyve and Clinicageno specialise.