This month’s blog was written by Lauren Fromont (CRG), a member of the EGA team at CRG and a member of CINECA WP1 - Federated Data Discovery and Querying. This blog is the second in our GA4GH standards series, presenting an overview of how GA4GH standards are being developed and implemented by CINECA.
This month’s blog was written by Dylan Spalding (EMBL-EBI), Coordinator of the European Genome-phenome Archive and co-WPL of CINECA WP4 - Federated Joint Cohort Analysis. This blog is the first in our new series, presenting an overview of GA4GH standards being developed and implemented by CINECA.
Authors - Romain Tanzer (HES-SO), Nona Naderi (HES-SO), Douglas Teodoro (HES-SO), Anais Mottaz (HES-SO), Patrick Ruch (HES-SO), Jonathan Dursi (SickKids), Jordi Rambla de Argila (CRG)
CINECA aims to support federated queries and analyses of distributed cohorts across continents. But human health datasets are extremely diverse: many different types of data are collected for many different kinds of health studies by many different health research communities. As a result, different cohort datasets often use different ontologies to describe similar kinds of entities, or represent concepts such as genomic variation differently.
CINECA must span this diversity of data representations in order to achieve its goal of connecting health research cohort data. The work of WP3 partially addresses discoverability of datasets by defining a standard minimal cohort-level data representation that will be common across all cohorts, but that does not address cohort-level data that falls outside the minimal common data model, nor does it address the representation of patient-level data. WP1’s role is to design and deploy API access to both cohort- and patient-level data. A fundamental function of this infrastructure is to let users find the appropriate dataset regardless of the ontology used locally to annotate each cohort, and regardless of the format and syntax used to describe variants.
This report describes the work done on query expansion: the implementation and demonstration of a query expansion service API that improves the findability and searchability of distributed cohort data. Multiple kinds of query expansion are available to enable further data integration and interoperability, including horizontal expansion, i.e., across ontological systems, and vertical expansion, i.e., within sublevels of the same ontological resource.
https://doi.org/10.5281/zenodo.4609335
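The two kinds of expansion described in the report summary above can be illustrated with a small sketch. This is not the CINECA query expansion service or its API; the term identifiers, the `CHILDREN` and `EQUIVALENTS` tables, and the function names are all hypothetical, chosen only to show how a single query term might grow into a larger set of terms.

```python
from collections import deque

# Toy, hypothetical data: parent term -> direct child terms within one ontology.
CHILDREN = {
    "ONTA:diabetes": ["ONTA:type1_diabetes", "ONTA:type2_diabetes"],
    "ONTA:type2_diabetes": ["ONTA:type2_diabetes_with_complications"],
}

# Toy, hypothetical cross-ontology mappings: term -> equivalent terms elsewhere.
EQUIVALENTS = {
    "ONTA:diabetes": ["ONTB:D001"],
    "ONTA:type2_diabetes": ["ONTB:D003"],
}


def expand_vertical(term, children=CHILDREN):
    """Vertical expansion: the term plus all of its descendants (breadth-first)."""
    seen = [term]
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


def expand_horizontal(term, equivalents=EQUIVALENTS):
    """Horizontal expansion: the term plus its mapped equivalents in other ontologies."""
    return [term] + [e for e in equivalents.get(term, []) if e != term]


def expand_query(term):
    """Apply vertical expansion first, then horizontally expand each resulting term."""
    expanded = []
    for descendant in expand_vertical(term):
        for candidate in expand_horizontal(descendant):
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    for expanded_term in expand_query("ONTA:diabetes"):
        print(expanded_term)
```

A cohort annotated only with the hypothetical `ONTB:D003` code would be missed by a literal search for `ONTA:diabetes`, but is matched once the query is expanded in both directions.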
This video demonstrates how a service registry can operate with additional functionality when integrated more closely with the services providing queries (here, Beacon queries).
This video demonstrates how a standalone service registry operates, and our extensions to the service info and service registry standards to include cohort-level metadata.
This video introduces the Service Catalog, a searchable listing of query services available atop CINECA cohort data.
CINECA aims to support federated queries and analyses of distributed cohorts across continents. A vital component of this work is building a machine-readable catalogue of cohorts and sites that supports the Work Package 1 discovery and analysis APIs. The catalogue can be queried programmatically, so that API calls can be made to the relevant sites and the results gathered and presented to the researcher.
Deliverable D1.1, Discovery Service Catalogue, supports the work of dependent work packages by implementing and demonstrating an open-source extended implementation of the Service Registry standard of the Global Alliance for Genomics and Health (GA4GH) for WP1’s discovery queries, the GA4GH Beacon queries. The Service Registry standard is now supported by the ELIXIR Beacon Network, which CINECA WP1 uses to federate discovery queries across cohorts, and this deliverable demonstrates the use of the service registry and its open-source implementation.
https://doi.org/10.5281/zenodo.3908397
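As a rough illustration of what "programmatically queried" can mean here, the sketch below filters a list of registry entries to find the Beacon services that a federated query should be sent to. It is not the D1.1 implementation: the entries and URLs are made up, the function names are hypothetical, and the records are only assumed to follow the general shape of GA4GH Service Registry service objects (an `id`, a `name`, a `type` with `group`, `artifact` and `version`, and a `url`). The cohort-level metadata extensions mentioned above are not modelled.

```python
# Hypothetical registry entries; in practice these would come from a
# service registry's listing endpoint rather than being hard-coded.
SERVICES = [
    {
        "id": "org.example.cohort-a.beacon",
        "name": "Cohort A Beacon",
        "type": {"group": "org.ga4gh", "artifact": "beacon", "version": "1.0.0"},
        "url": "https://cohort-a.example.org/beacon",
    },
    {
        "id": "org.example.cohort-b.beacon",
        "name": "Cohort B Beacon",
        "type": {"group": "org.ga4gh", "artifact": "beacon", "version": "1.0.0"},
        "url": "https://cohort-b.example.org/beacon",
    },
    {
        "id": "org.example.registry",
        "name": "Example Registry",
        "type": {"group": "org.ga4gh", "artifact": "service-registry", "version": "1.0.0"},
        "url": "https://registry.example.org",
    },
]


def find_services(services, artifact, group="org.ga4gh"):
    """Return the registry entries whose type matches the given group and artifact."""
    matches = []
    for service in services:
        service_type = service.get("type", {})
        if service_type.get("group") == group and service_type.get("artifact") == artifact:
            matches.append(service)
    return matches


def query_urls(services, artifact):
    """Return the base URLs a federated query of this type would be sent to."""
    return [service["url"] for service in find_services(services, artifact)]


if __name__ == "__main__":
    for url in query_urls(SERVICES, "beacon"):
        print(url)
```

Keeping the filter separate from the list of URLs makes it straightforward to add further selection criteria later without changing how the query is dispatched.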