This month’s blog was written by Nicola Mulder, Professor and head of the Computational Biology division at the University of Cape Town (UCT) and Principal Investigator of H3ABioNet, a Pan African bioinformatics network for H3Africa, and Mamana Mbiyavanga, a Bioinformatics Scientist and PhD student at UCT, who contribute to a diverse range of CINECA work packages. Compared with the previous four posts in our Global Alliance for Genomics and Health (GA4GH) standards series, this blog is less of a technical report and more of an account of how WP6, ‘Outreach, training and dissemination’, is contributing to better implementation of GA4GH standards.
Authors - Vivian Jin, Fiona Brinkman (SFU)
To support discovery and analysis of human cohort genomic and other “omic” data across jurisdictions, basic data such as cohort participant age and sex need to be harmonised. A “minimal metadata model” of these basic attributes, which should be recorded for all cohorts, is critical to support initial querying across jurisdictions for suitable dataset discovery. We describe here the creation of a minimal metadata model, the specific methods used to create it, and its utility and impact.
A first version of the metadata model was built based on a review of Maelstrom research data standards and a manual survey of cohort data dictionaries, which identified and incorporated overlapping core variables across CINECA cohorts. The model was then converted to Genomics Cohorts Knowledge Ontology (GECKO) format and further expanded with additional terms. The minimal metadata model is being made broadly available to aid any project, including those outside CINECA, that is interested in facilitating cross-jurisdictional data discovery and analysis.
https://doi.org/10.5281/zenodo.4575460
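As a rough illustration of the overlap step described above, the following minimal sketch finds variables shared across several cohort data dictionaries. It is not the method used by the authors, whose survey was manual; the cohort names, variable names, and the `core_variables` helper are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical data dictionaries: cohort name -> set of variable names.
# Real cohort dictionaries would first need their labels mapped to common terms.
cohort_dictionaries = {
    "cohort_a": {"age", "sex", "bmi", "smoking_status"},
    "cohort_b": {"age", "sex", "ethnicity", "bmi"},
    "cohort_c": {"age", "sex", "smoking_status"},
}


def core_variables(dictionaries, min_cohorts=None):
    """Return, sorted, the variables present in at least `min_cohorts` cohorts.

    By default a variable must appear in every cohort.
    """
    if min_cohorts is None:
        min_cohorts = len(dictionaries)
    # Count how many cohorts record each variable.
    counts = Counter(
        var for variables in dictionaries.values() for var in variables
    )
    return sorted(var for var, n in counts.items() if n >= min_cohorts)


# Variables recorded by all cohorts.
strict_core = core_variables(cohort_dictionaries)
# Variables recorded by at least two cohorts.
relaxed_core = core_variables(cohort_dictionaries, min_cohorts=2)
print(strict_core)
print(relaxed_core)
```

Lowering `min_cohorts` trades strict overlap for broader coverage, which may be useful when a few cohorts lack an otherwise common attribute.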