scADCA: An Anomaly Detection-Based scRNA-seq Dataset Cell Type Annotation Method for Identifying Novel Cells

Abstract

Background

With the rapid evolution of single-cell RNA sequencing technology, the study of cellular heterogeneity in complex tissues has reached an unprecedented resolution. One critical task of the technology is cell-type annotation. However, challenges persist, particularly in annotating novel cell types.

Objective

Current methods rely heavily on well-annotated reference data, using correlation comparisons to determine cell types. However, identifying novel cells remains unstable due to the inherent complexity and heterogeneity of scRNA-seq data and cell types. To address this problem, we propose scADCA, a method based on anomaly detection, for identifying novel cell types and annotating the entire dataset.

Methods

Convolutional modules and fully connected networks are integrated into an autoencoder, which is trained on the reference dataset to obtain reconstruction errors. A threshold derived from these errors distinguishes novel cells from known cells in the query dataset. After novel cells are identified, a multinomial logistic regression model completes the annotation of the dataset.
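
The thresholding step described above can be illustrated with a minimal sketch. It assumes that the reconstruction error is the mean squared error per cell and that the threshold is a percentile of the reference errors; the abstract specifies neither, so both choices, along with the function names and the default `q`, are hypothetical and are not the authors' implementation. The autoencoder itself is omitted: the sketch starts from vectors that have already been reconstructed.

```python
import math
from typing import List, Sequence


def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an expression vector and its reconstruction.

    Assumption: the abstract does not say which error measure scADCA uses.
    """
    if len(x) != len(x_hat) or len(x) == 0:
        raise ValueError("x and x_hat must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def percentile_threshold(reference_errors: Sequence[float], q: float = 95.0) -> float:
    """Nearest-rank q-th percentile of the reference reconstruction errors.

    Assumption: a percentile rule is only one plausible way to set the threshold.
    """
    if not reference_errors:
        raise ValueError("reference_errors must not be empty")
    ordered = sorted(reference_errors)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def flag_novel(query_errors: Sequence[float], threshold: float) -> List[bool]:
    """Mark a query cell as novel when its error exceeds the threshold."""
    return [err > threshold for err in query_errors]
```

Cells flagged `False` would then be passed to the classifier for known cell types, while cells flagged `True` keep a "novel" label.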

Results

Using a simulation dataset, three real scRNA-seq pancreatic datasets, and a real scRNA-seq lung cancer cell line dataset, we compare scADCA with six other cell-type annotation methods. scADCA demonstrates competitive performance in terms of distinguished accuracy, full accuracy, F1-score, and confusion matrix.
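
As a rough illustration of how such metrics might be computed, the sketch below assumes that "distinguished accuracy" is the accuracy of the known-versus-novel decision, that "full accuracy" is the accuracy of the final labels, and that the F1-score is macro-averaged over cell types. The abstract does not define these quantities, so these definitions, the `"novel"` label, and all function names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of cells whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def to_known_novel(labels: Sequence[str], novel_label: str = "novel") -> List[str]:
    """Collapse cell-type labels to 'known' or 'novel' (assumed convention)."""
    return ["novel" if lab == novel_label else "known" for lab in labels]


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Counts keyed by (true label, predicted label)."""
    return dict(Counter(zip(y_true, y_pred)))


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only; not data from the paper.
y_true = ["alpha", "alpha", "beta", "novel"]
y_pred = ["alpha", "beta", "beta", "novel"]
full_acc = accuracy(y_true, y_pred)
dist_acc = accuracy(to_known_novel(y_true), to_known_novel(y_pred))
```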

Conclusion

In conclusion, scADCA can be further improved and extended to achieve better performance and broader applicability in cell-type annotation. Such improvements would help increase the accuracy and reliability of cytology research and promote the development of single-cell omics.

/content/journals/cbio/10.2174/0115748936334071240903064630
2024-10-14
2024-11-02
Supplements

All supplementary materials mentioned in the article are provided in the Supplementary material documents.