"How FAIR are You" webinar series and hackathon
This month’s blog was written by Nicola Mulder, Professor and head of the Computational Biology division at the University of Cape Town (UCT) and Principal Investigator of H3ABioNet, a Pan African bioinformatics network for H3Africa, and Mamana Mbiyavanga, a Bioinformatics Scientist and PhD student at UCT, who contribute to a diverse range of CINECA work packages. Compared with the previous four posts in our Global Alliance for Genomics and Health (GA4GH) standards series, this blog is less of a technical report and more of an account of how WP6, ‘Outreach, training and dissemination’, is contributing to better implementation of GA4GH standards.
CINECA aims to facilitate the federated analysis of cohort data, which requires the data to be findable, accessible, interoperable and reusable (FAIR). However, it is not always easy to determine how FAIR a dataset, piece of software, or other resource is, as the tools that assess compliance with FAIR principles are still being developed. Additionally, the tools don't necessarily apply to all these types of materials, and not everyone is aware of how to identify and quantify FAIR indicators. To address these issues, CINECA ran a "How FAIR are you" webinar series and hackathon, an event open to all scientific community members.
The webinar series
The webinar series, which ran from January to April 2021, introduced the FAIR principles and showed how they can be applied to these different types of resources. In the first webinar, "Open science through FAIR health data networks: dream or reality?", Kees van Bochove (The Hyve) introduced the FAIR concept and discussed its applications for health data. This was followed by a webinar from William Hsiao (SFU) on how to make cohort data FAIR. Carlos Martinez (Netherlands eScience Center) then discussed the importance of the FAIR principles for software tools, presenting existing initiatives that are developing tools and raising awareness of FAIR for software. In the fourth webinar, Andrew Stubbs (EMC) spoke about practical applications of the FAIR data principles, particularly in the context of clinical bioinformatics. This was followed by a webinar on making training materials FAIR by Sarah Morgan and Anna Swan (both EMBL-EBI), and one presented by Melanie Goisauf (BBMRI) on the important topic of ethical, legal and social considerations in FAIR data sharing. The final webinar, by Thomas Keane (EMBL-EBI), took place during the hackathon. It described how the CINECA project applies the FAIR principles by using open standards, such as ontologies and the Global Alliance for Genomics and Health (GA4GH) standards, across the federated network to enable federated data discovery, data access, and analysis. All of these webinars were recorded and are available to watch on the CINECA website.
Hackathon participants
The hackathon itself took place on 28-29 April 2021 and was attended by ~40 people. It set out to investigate gaps in the application of FAIR principles to project outputs and to facilitate the implementation of FAIR approaches. It focused on cohort data, software tools (including workflows, algorithms and web services), and training materials. More than half of the participants were female, showing a good gender balance. Although most participants were from European institutions, we also had participants from Africa and the US. Over 60% were either part of the CINECA project or from other EUCAN project institutions. More than 60% of the participants indicated at registration that they were familiar with the FAIR principles, which is reassuring, but the remainder shows that there is a real need for events such as this. As the hackathon was designed to be a practical event, participants were encouraged to bring their own resources to assess, and to implement FAIR principles during the hands-on sessions. About 15% of the participants said that they would bring their own resources to the hackathon to evaluate.
Participant interest was evenly distributed across the different streams, showing the need to make all three types of resources FAIR. Participants with interests in multiple streams could move from one stream to another at any time during the hackathon.
Hackathon plenary and breakout sessions
The hackathon aimed to increase and facilitate the uptake of FAIR approaches in software, training materials, and cohort data, in order to support responsible and ethical sharing of data and resources and the implementation of federated applications for data analysis. There was a mix of talks and discussions in plenary sessions and practical implementation in breakout sessions. The talks included experiences in FAIRplus from Jolanda Strubel, FAIRsharing by Allyson Lister, the FAIRplus fairification wizard by Fuqi Xu (EMBL-EBI), and the FAIR Cookbook presented by Philippe Rocca-Serra. The talks gave useful perspectives on the tools that exist for FAIR indicators and assessment. The recordings of these talks are available on the hackathon webpage on the CINECA project website and on the CINECA YouTube channel.
There were three streams for the hackathon breakouts: cohort data, training materials and software. Each stream began with some context or demonstrations and then gave participants an opportunity to work through some of the tools or questionnaires to assess their own FAIR compliance.
The cohort data stream looked at FAIR assessment using the FAIRplus RDA indicators and the H3ABioNet FAIR assessment questionnaire. Examples from EMBL-EBI BioSamples and an H3Africa cohort in EGA were used to demonstrate how to assess cohort data before the participants evaluated their own cohort data.
The training materials stream worked on FAIR assessment for different categories of participants: those wanting to make a single lecture FAIR, those wishing to make a whole set of course materials FAIR, and those responsible for organising courses taught by others who want to assess the FAIRness of the materials. The assessment was based on the 10 simple rules for making training materials FAIR and the H3ABioNet FAIR assessment questionnaire for training materials.
The software tools stream introduced existing tools for assessing software FAIRness and discussed metrics for FAIRness. The group also discussed current gaps, why it is crucial to make software FAIR, and practical steps for applying FAIR principles to software tools. Assessment tools used in this stream included the Five recommendations for FAIR software and a Python package that analyses a GitHub or GitLab repository's compliance with the fair-software.eu recommendations.
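As a rough illustration of what such an assessment tallies, the sketch below records self-reported yes/no answers to the five fair-software.eu recommendations and lists the ones still to address. It does not call the Python package mentioned above; the class name, field names and example answers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class FairSoftwareChecklist:
    """Self-reported answers to the five fair-software.eu recommendations.

    The field names are illustrative labels, not an official schema.
    """
    repository: bool  # publicly accessible repository with version control
    license: bool     # a license has been added
    registry: bool    # registered in a community registry
    citation: bool    # citation of the software is enabled
    checklist: bool   # a software quality checklist is used


def summarise(answers: FairSoftwareChecklist) -> tuple[int, list[str]]:
    """Return how many recommendations are met and the names of those missed."""
    missing = [f.name for f in fields(answers) if not getattr(answers, f.name)]
    met = len(fields(answers)) - len(missing)
    return met, missing


# Hypothetical answers for an imaginary tool.
example = FairSoftwareChecklist(
    repository=True, license=True, registry=False, citation=False, checklist=False
)
met, missing = summarise(example)
print(f"{met}/5 recommendations met; still to address: {', '.join(missing)}")
```

A list of missed recommendations like this is the kind of output a group can discuss and then act on, one item at a time.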
These exercises raised a number of issues during the discussion sessions, such as how to interpret the questions, how to measure the level of compliance, and what counts as the minimum requirements, as opposed to a complete set of indicators, for determining whether something is FAIR. Individual streams also raised their own issues; for example, at what level of granularity in training should a DOI be assigned (a set of materials for a whole course versus individual lectures or modules)? The participants also reflected on which of the FAIR components they considered to be the most important. The training stream decided that the "F" was most important, as you need to be able to find the materials, while the Slido question capturing the overall opinions identified "Reusable" as the most important, followed by "Interoperable" (see Fig 3). During the hackathon, the software stream used an assessment tool accessible at https://github.com/fair-software/howfairis, with its associated GitHub action https://github.com/marketplace/actions/fair-software, to evaluate some software tools brought by the participants. The stream discussed the inadequacies identified by the assessment and then improved the FAIR compliance score by applying the appropriate recommendations.
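The question of minimum requirements versus a complete set of indicators can be made concrete with a small sketch. It scores the same yes/no answers in two ways: as a fraction of all indicators, and as a pass/fail check on a smaller "essential" subset. The indicator names, the choice of which ones are essential, and the scoring rule are all assumptions for illustration; they are not taken from the questionnaires or tools used in the hackathon.

```python
def compliance(answers, essential):
    """Score yes/no indicator answers two ways.

    answers   -- dict mapping indicator id -> True/False
    essential -- set of indicator ids treated as the minimum requirement
    Returns (fraction of all indicators met, whether every essential one is met).
    """
    full_score = sum(answers.values()) / len(answers)
    essential_met = all(answers[name] for name in essential)
    return full_score, essential_met


# Hypothetical indicator ids and answers, for illustration only.
answers = {
    "F-persistent-id": True,
    "F-rich-metadata": True,
    "A-open-protocol": True,
    "I-controlled-vocabulary": False,
    "R-clear-licence": True,
}
essential = {"F-persistent-id", "R-clear-licence"}

full_score, essential_met = compliance(answers, essential)
print(f"all indicators: {full_score:.0%}; minimum requirements met: {essential_met}")
```

In this example the resource meets the assumed minimum requirements while scoring 80% on the full set, which shows why the two measures can lead to different conclusions about whether something is "FAIR".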
What's next?
Follow-up discussions were held on how to enhance FAIR compliance based on the assessment outcomes. Further discussions covered how the use of open standards, such as the GA4GH standards and ontologies, could improve the FAIRness of different resources. As next steps, in addition to this blog, we agreed to compile recommendations for improving assessment indicators based on the discussions, and to investigate badging to reflect how FAIR a tool or resource is. We would like to thank the co-organizers, Vera Matser, Saskia Hiltemann and Marta Lloret Llinares, the speakers (mentioned earlier), and the stream leads, Melanie Courtot, Sarah Morgan, Saskia Hiltemann and Carlos Martinez Ortiz, for their contributions to the webinar series and hackathon.