Researchers at FIMM have developed a computational approach that can automatically identify various cell types based on single-cell RNA-sequencing data.
The researchers say the method has great potential for unbiased profiling of cell mixtures, both in large-scale projects and in samples derived from a single patient. It is not yet known how many different cell types there are in the human body. Previous estimates have put the number between 200 and 300, but recent large-scale sequencing projects, such as the Human Cell Atlas, keep revealing new cell types.
Cells can be defined by the activity of their roughly 20,000 genes. Rapid advances in next-generation sequencing have made it feasible to profile individual cells. Of the many single-cell analysis methods, single-cell RNA sequencing (scRNA-seq), which profiles gene expression, is the most common.
With the new scRNA-seq methods, researchers can now profile hundreds of thousands of cells in human specimens. The field has great scientific potential because it allows researchers to study cell-to-cell variation within complex tissues. However, downstream analysis of such data is complicated and computationally demanding.
A group led by Professor Tero Aittokallio, consisting of researchers from the Institute for Molecular Medicine Finland FIMM (University of Helsinki) and the Helsinki Institute of Information Technology HIIT (Aalto University), has developed a computational platform that makes this tedious work much easier.
In a recent Nature Communications publication, the team describes its newly developed and freely available tool, ScType, which enables accurate cell type identification by ensuring that positive and negative marker genes are specific both across cell clusters and across cell types.
“The existing cell type identification methods mainly rely on unsupervised clustering of cells according to the similarity of their scRNA-seq profiles, followed by manual annotation of the clusters using established marker genes. This is a time-consuming process that may lead to suboptimal results,” explains Doctoral Researcher Aleksandr Ianevski, the first author of the study and the main developer of the method.
“By contrast, the ScType platform enables data-driven, fully automated and ultra-fast cell-type identification based solely on the given scRNA-seq data, combined with a comprehensive cell marker database as background information,” says Anil K Giri, another lead author of the work.
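To give a rough sense of how annotation from positive and negative marker genes can work in general, here is a simplified sketch in Python. It is not ScType's actual scoring scheme. The marker lists, the per-gene z-scoring across clusters, and the function names `zscore_by_gene`, `marker_score` and `annotate` are all hypothetical, chosen only for illustration. Each cluster gets, for every candidate cell type, a score equal to the mean z-score of that type's positive markers minus the mean z-score of its negative markers, and the cluster is labelled with the highest-scoring type.

```python
from statistics import mean, pstdev

# Hypothetical marker lists for illustration only; NOT taken from the ScType database.
# Each cell type maps to (positive markers, negative markers).
MARKERS = {
    "T cell": ({"CD3E", "CD3D"}, {"CD19"}),
    "B cell": ({"CD19", "MS4A1"}, {"CD3E"}),
}

def zscore_by_gene(expr):
    """Z-score each gene across clusters.

    expr: {cluster: {gene: mean expression}}; missing genes count as 0.
    Returns the same shape, with each gene centred and scaled across clusters.
    """
    genes = {g for profile in expr.values() for g in profile}
    out = {c: {} for c in expr}
    for g in genes:
        values = [expr[c].get(g, 0.0) for c in expr]
        mu, sd = mean(values), pstdev(values)
        for c in expr:
            # A gene with no variation across clusters carries no signal.
            out[c][g] = (expr[c].get(g, 0.0) - mu) / sd if sd > 0 else 0.0
    return out

def marker_score(z, positive, negative):
    """Mean z-score of positive markers minus mean z-score of negative markers."""
    pos = mean(z.get(g, 0.0) for g in positive) if positive else 0.0
    neg = mean(z.get(g, 0.0) for g in negative) if negative else 0.0
    return pos - neg

def annotate(expr, markers=MARKERS):
    """Label each cluster with the cell type whose marker score is highest."""
    z = zscore_by_gene(expr)
    labels = {}
    for cluster, profile in z.items():
        scores = {ct: marker_score(profile, pos, neg) for ct, (pos, neg) in markers.items()}
        labels[cluster] = max(scores, key=scores.get)
    return labels

# Toy cluster-level expression values, invented for this example.
clusters = {
    "c0": {"CD3E": 5.0, "CD3D": 4.0, "CD19": 0.1, "MS4A1": 0.2},
    "c1": {"CD3E": 0.2, "CD3D": 0.1, "CD19": 6.0, "MS4A1": 5.0},
}
print(annotate(clusters))  # → {'c0': 'T cell', 'c1': 'B cell'}
```

Subtracting the negative-marker term is what lets a cluster that expresses a B-cell marker such as CD19 score low as a T cell, even if it also shows some T-cell marker signal. A real tool would additionally weight markers by how specific they are and flag clusters with no convincing match.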
The team demonstrated the feasibility of the method by re-analyzing six scRNA-seq datasets representing both human and mouse tissues. ScType correctly annotated 72 of 73 cell types (almost 99% accuracy), including eight cell types that had been incorrectly or non-specifically annotated in the original studies.
Furthermore, ScType can distinguish between healthy and malignant cell populations, making it a versatile tool for exploring and using single-cell transcriptomic data in anticancer applications.
Accelerate unbiased phenotypic profiling of cells
“We anticipate the ScType platform will accelerate unbiased phenotypic profiling of cells when applied either to large-scale single-cell sequencing projects or smaller-scale profiling of patient-derived samples,” said Professor Tero Aittokallio.
To promote wide use, either as a stand-alone tool or together with other popular single-cell analysis software, the group has deployed ScType both as an interactive web platform and as an open-source R package, each connected to a comprehensive ScType database of specific markers.