In collaboration with other GA4GH-associated projects, CINECA is developing infrastructure that will permit effective use of widely dispersed data, increasing the size and quality of datasets available for disease research. In alignment with community standards and using standardised interfaces, data analysis will be federated and brought to the data, respecting data access restrictions.
Solutions CINECA is adopting from the Discovery Work Stream include the Data Connect and Beacon v2 APIs, while from the DURI and Data Security Work Streams it is using GA4GH Passports, AAI and DUO.
Recently, WP4 delivered a simple demonstrator pipeline that performs a federated joint variant genotyping analysis. The goal of this use case is to demonstrate how a simple metric (in this case, allele frequency) can be computed in a federated manner, without ever needing to collect the individual-level data in a central location.
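The idea of computing allele frequency without centralising individual-level data can be illustrated with a minimal Python sketch. This is not the WP4 demonstrator pipeline itself, which performs joint variant genotyping; it only shows the aggregation principle. The names `SiteCounts`, `local_counts` and `federated_allele_frequency` are hypothetical, and the sketch assumes diploid samples whose genotypes are encoded as alternate-allele counts (0, 1 or 2, with `None` for a missing call).

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SiteCounts:
    """Aggregate counts one site shares for a single variant.

    Only these two numbers leave the site; per-sample genotypes do not.
    """
    alt_alleles: int    # alternate alleles observed at this site
    total_alleles: int  # called alleles (2 per genotyped diploid sample)


def local_counts(genotypes: Iterable[Optional[int]]) -> SiteCounts:
    """Runs inside each site: reduce per-sample genotypes to aggregates."""
    called = [g for g in genotypes if g is not None]  # skip missing calls
    return SiteCounts(alt_alleles=sum(called), total_alleles=2 * len(called))


def federated_allele_frequency(site_counts: Iterable[SiteCounts]) -> float:
    """Runs at the coordinator: combine the aggregates from all sites."""
    counts = list(site_counts)
    alt = sum(c.alt_alleles for c in counts)
    total = sum(c.total_alleles for c in counts)
    if total == 0:
        raise ValueError("no called alleles across sites")
    return alt / total


# Hypothetical genotypes held separately at two sites.
site_a = local_counts([0, 1, 2, None])
site_b = local_counts([0, 0, 1, 1])
af = federated_allele_frequency([site_a, site_b])
```

Summing counts before dividing weights each site by its number of called alleles, which averaging per-site frequencies would not do.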
This post is part of a series on a text-mining pipeline being developed by CINECA in Work Package 3. The first instalment, "Uncovering metadata from semi-structured cohort data", explained the Zooma and Curami pipelines. The second, "LexMapr - A rule-based text-mining tool for ontology term mapping and classification", introduced LexMapr. In this third instalment we explain the normalisation pipeline developed at SIB/HES-SO.
The initial focus of LexMapr development has been on providing a text-mining tool to clean up short free-text biosample metadata containing inconsistent punctuation, abbreviations and typos, and to map the identified entities to standard terms from ontologies. This blog is the second in a series on text-mining in CINECA. The previous instalment is "Uncovering metadata from semi-structured cohort data".
Harmonisation of attributes across different cohorts is very challenging and labour-intensive, but critical to leveraging the collective potential of the data. The CINECA text mining group aims to provide common tools and methods to extract additional metadata from structured and semi-structured fields in cohorts’ data.
This work has contributed to describing the trust model and four levels of access to specific cohorts’ data, identifying use cases for the development of federated analysis workflows, and describing existing data access models to inspire subsequent WP4 deliverables on implementing the federated analysis workflow.