
Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects

Date: 12 November 2020

Time: 4:00 PM CET

Materials: Slides, "Data Gravity in the Life Sciences: Lessons learned from the Human Cell Atlas and other federated data projects"

Speaker: Tony Burdett, EMBL-EBI

Contact: Marta Lloret Llinares


Overview

We live in an era of cloud computing. Many services in the life sciences are actively planning cloud transformations, seeking to create globally distributed ecosystems of harmonised data based on standards from organisations such as GA4GH. CINECA faces similar challenges: it gathers cohort datasets from all over the globe, many of which are pinned in place by their size, legal restrictions, or other considerations. But is "bringing compute to the data" always the right choice? In this webinar, drawing on experience from the Human Cell Atlas Data Coordination Platform and other EMBL-EBI projects, we will explore the concept of "data gravity": the idea that whilst some forces hold data in one place, others require it to be mobile. We'll consider why an effective cloud strategy must account for the gravity of datasets, and the impact this has on the team skills required, the incentives for good practice, and storage and compute costs.


About the speaker:

Tony Burdett leads the Archival Infrastructure and Technology team, which develops services and provides technology to support the activities of EMBL-EBI’s molecular archives, including data submission, storage, validation, coordination and presentation.

Tony joined EMBL-EBI in 2005 and has personally built and led development teams for many resources such as the GWAS Catalog, ArrayExpress, the Expression Atlas and BioSamples. His team now develops the ingestion service for the Human Cell Atlas Data Coordination Platform, EMBL-EBI’s Unified Submission Interface, and the BioSamples database.

About CINECA:

The CINECA (Common Infrastructure for National Cohorts in Europe, Canada, and Africa) project aims to develop a federated, cloud-enabled infrastructure that makes population-scale genomic and biomolecular data accessible across international borders, in order to accelerate research and improve the health of individuals across continents. CINECA will leverage international investment in human cohort studies from Europe, Canada, and Africa to deliver a paradigm shift in federated research and clinical applications. The CINECA consortium will create one of the largest cross-continental implementations of human genetic and phenotypic data federation and interoperability, with a focus on common (complex) disease, one of the world's most significant health burdens. CINECA has assembled a virtual cohort of 1.4M individuals from population, longitudinal and disease studies. Federated analyses will deliver new scientific knowledge, harmonisation strategies, and the ELSI framework needed to support data exchange across legal jurisdictions and enable federated analyses in the cloud. CINECA will provide a template for building virtual longitudinal and disease-specific cohorts of millions of samples, to advance benefits to patients. It will leverage partner membership of standards bodies and infrastructures such as the Global Alliance for Genomics and Health (GA4GH), BBMRI, ELIXIR, and EOSC, driving the state of the art in standards development, technical implementation and FAIR data.