Authors: Melanie Courtot (EMBL-EBI), Jonathan Dursi (SickKids), Nicky Mulder (UCT), Morris Swertz (UMCG)
Access to, reuse of and integration of biomedical datasets are critical to advancing genomics research and realising benefits to human health. However, obtaining human controlled-access data in a timely fashion can be challenging, because neither the access requests nor the data use conditions are standardised. Each request must be manually reviewed and evaluated by a Data Access Committee (DAC) to determine whether access should be granted, which can significantly delay the process, typically by at least 4 to 6 weeks once the dataset of interest has been identified.
To address this, we have contributed to the development of the Data Use Ontology (DUO), which was approved as a Global Alliance for Genomics and Health (GA4GH) standard and has been used in over 200,000 annotations worldwide. DUO is a machine-readable structured vocabulary that contains "Permission terms" (which describe data use permissions) and "Modifier terms" (which describe data use requirements, limitations or prohibitions). It has already been implemented in some CINECA cohorts and cohort data sharing resources (e.g. EGA, H3Africa, synthetic datasets); additional cohorts are reviewing their data access policies with a view to applying DUO terms to their datasets.
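As an illustration of how a dataset might carry one "Permission term" plus any number of "Modifier terms", the following is a minimal Python sketch. The `DatasetAnnotation` class, the `unmet_modifiers` helper and the dataset identifier are hypothetical names introduced here, not part of DUO or any GA4GH tooling. The term identifiers and labels are quoted from memory of the public ontology and should be checked against the current DUO release before use. The comparison is a deliberately simplified set difference and does not stand in for a DAC decision.

```python
from dataclasses import dataclass, field

# Small excerpt of DUO terms; IDs and labels are assumptions to be
# verified against the published ontology.
PERMISSION_TERMS = {
    "DUO:0000042": "general research use",
    "DUO:0000006": "health or medical or biomedical research",
}
MODIFIER_TERMS = {
    "DUO:0000019": "publication required",
    "DUO:0000021": "ethics approval required",
}


@dataclass(frozen=True)
class DatasetAnnotation:
    """Hypothetical record: one permission term and optional modifier terms."""
    dataset_id: str
    permission: str
    modifiers: frozenset = field(default_factory=frozenset)

    def __post_init__(self):
        # Reject identifiers that are not in the excerpt above.
        if self.permission not in PERMISSION_TERMS:
            raise ValueError(f"unknown permission term: {self.permission}")
        unknown = set(self.modifiers) - set(MODIFIER_TERMS)
        if unknown:
            raise ValueError(f"unknown modifier terms: {sorted(unknown)}")


def unmet_modifiers(annotation, accepted):
    """Return the modifier terms on the dataset that a requester has not accepted."""
    return sorted(set(annotation.modifiers) - set(accepted))


# Hypothetical dataset: general research use, with two requirements attached.
annotation = DatasetAnnotation(
    dataset_id="hypothetical-dataset-001",
    permission="DUO:0000042",
    modifiers=frozenset({"DUO:0000019", "DUO:0000021"}),
)

# A requester who has only confirmed ethics approval still has one open requirement.
remaining = unmet_modifiers(annotation, accepted={"DUO:0000021"})
print(remaining)
```

Because both the dataset conditions and the requester's commitments are expressed with the same identifiers, a check like this can be run automatically before a request reaches manual review.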
https://doi.org/10.5281/zenodo.5795449