Passport is the glue between the researcher, data and computing

This month’s blog was written by Mikael Linden, Senior application specialist at CSC - IT Center for Science, and co-Work Package Lead of CINECA WP2 - Interoperable Authentication and Authorisation Infrastructure. This blog is the third in our GA4GH standards series, presenting an overview of how GA4GH standards are being developed and implemented by CINECA. For the first blog in the series, giving a broad overview of how CINECA is facilitating federated data discovery, access and analysis, please see Dylan Spalding’s blog Implementation of GA4GH standards in CINECA.

Beacon cohorts: A model for cohort discovery in CINECA and beyond

This month’s blog was written by Lauren Fromont (CRG), a member of the EGA team at CRG and a member of CINECA WP1 - Federated Data Discovery and Querying. This blog is the second in our GA4GH standards series, presenting an overview of how GA4GH standards are being developed and implemented by CINECA.

Implementation of GA4GH standards in CINECA

This month’s blog was written by Dylan Spalding (EMBL-EBI), Coordinator of the European Genome-phenome Archive and co-WPL of CINECA WP4 - Federated Joint Cohort Analysis. This blog is the first in our new series, presenting an overview of GA4GH standards being developed and implemented by CINECA.

GA4GH standards series, CINECA Short Reports, WP1, WP2, WP3, WP4, WP5, BlogDylan Spalding (EMBL-EBI)February 5, 2021WP1, WP2, WP3, WP4, WP5

Biomedical Named entity recognition - Pros and cons of rule-based and deep learning methods

The final blog in our series on text-mining is a guest blog written by Shyama Saha, who specialises in Machine Learning/Text Mining at EMBL-EBI. The CINECA project aims to create a text mining tool suite to support extraction of metadata concepts from unstructured textual cohort data and description files. To create a standardised metadata representation CINECA is using Natural language processing (NLP) techniques such as entity recognition, using rule-based tools such as MetaMap, LexMapr, and Zooma. In this blog Shyama discusses the challenges of dictionary and rule-based text-mining tools, especially for entity recognition tasks, and how deep learning methods address these issues.

Text-mining series, CINECA Short Reports, WP3, BlogShyama Saha (EMBL-EBI)December 11, 2020WP3

Assigning standard descriptors to free text

This post is part of a series on a text-mining pipeline being developed by CINECA in Work Package 3. In previous instalments, first, Zooma and Curami pipelines were explained in "Uncovering metadata from semi-structured cohort data". Then, LexMapr was introduced in "LexMapr - A rule-based text-mining tool for ontology term mapping and classification". In this third instalment we are going to explain the normalisation pipeline developed at SIB/HES-SO.

Text-mining series, CINECA Short Reports, WP3, BlogJenny Copara, SIB/HES-SONovember 3, 2020text mining, zooma, metadata, cohort, ontology, WP3

LexMapr - A rule-based text-mining tool for ontology term mapping and classification

The initial focus of LexMapr development has been on providing a text-mining tool to clean up the short free-text biosample metadata that contained inconsistent punctuation, abbreviations and typos, and to map the identified entities to standard terms from ontologies. This blog is the second in a series on text-mining in CINECA. For the previous instalment "Uncovering metadata from semi-structured cohort data" please click here.

Uncovering metadata from semi-structured cohort data

Harmonisation of attributes across different cohorts is very challenging and labour intensive, but critical to leverage the collective potential of the data. The CINECA text mining group aims to provide common tools and methods to extract additional metadata from structured and semi-structured fields in cohorts’ data.

Text-mining series, CINECA Short Reports, WP3, BlogIsuru LiyanageSeptember 15, 2020text mining, metadata, ontology, cohort, WP3

Sustainability and SMEs

Direct industrial participation in CINECA is made by the SMEs (Small and Medium-sized Enterprises) The Hyve and Clinicageno, both companies with an interest in bioinformatics applied to research and clinical genomic problems. The key background driver for their interest in a project like CINECA are the possibilities for long-term, sustainable profits in the areas of data-driven science and medicine in which the Hyve and Clinicageno specialise.

Our Blog

Get In Touch

QUICk Links

How do we work