Graph-Root: Prediction of Root-Associated Proteins in Maize, Sorghum, and Soybean Based on Graph Convolutional Network and Network Embedding Method

Abstract

Background

The root system plays an irreplaceable role in plant growth, and improving it can increase crop productivity. However, the root system is still poorly understood, and the mechanisms underlying it have not been fully uncovered. Investigating root-related proteins is an important means of addressing this gap. Until recently, too few root-related proteins were known to train machine learning models for discovering novel ones. The recent release of a public database of root-related proteins now makes machine learning methods applicable in this field.

Objective

The purpose of this study was to design an efficient computational method to predict root-associated proteins in three plants: maize, sorghum, and soybean.

Method

In this study, we proposed a machine learning-based model, named Graph-Root, to identify root-related proteins in maize, sorghum, and soybean. Three types of features were extracted: sequence features, which were processed by a graph convolutional network and multi-head attention; functional domain features, which reflect the essential functions of proteins; and network embedding features, derived from a protein network, which capture the links between proteins. These features were fed into a fully connected layer to make predictions.
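
The feature pipeline described above can be sketched in NumPy as follows. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: residues are treated as graph nodes joined by a hypothetical adjacency matrix (a simple chain here; a residue contact map would be a natural choice), the GCN layer uses the standard Kipf and Welling normalization, the multi-head attention is written as structured self-attentive pooling in the style of Lin et al., and the domain vector, the network embedding vector, all dimensions, and all weights are random placeholders.

```python
import numpy as np

def normalize_adjacency(adj):
    """Kipf-Welling normalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = adj + np.eye(adj.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, x, w):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    return np.maximum(a_norm @ x @ w, 0.0)

def attention_pool(h, w1, w2):
    """Multi-head self-attentive pooling of residue embeddings.

    h: (L, d) residue embeddings; w1: (d_a, d); w2: (r, d_a) for r heads.
    Returns a flat (r * d,) vector: one weighted sum of residues per head.
    """
    scores = w2 @ np.tanh(w1 @ h.T)                      # (r, L)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    att = np.exp(scores)
    att = att / att.sum(axis=1, keepdims=True)           # softmax over residues
    return (att @ h).reshape(-1)

def predict(adj, x, domain_feat, network_feat, p):
    """Fuse the three feature types and score them with one fully connected layer."""
    h = gcn_layer(normalize_adjacency(adj), x, p["w_gcn"])
    seq_vec = attention_pool(h, p["w1"], p["w2"])
    fused = np.concatenate([seq_vec, domain_feat, network_feat])
    logit = fused @ p["w_fc"] + p["b_fc"]
    return 1.0 / (1.0 + np.exp(-logit))                  # probability of "root-related"

# Hypothetical sizes and random, untrained weights, for illustration only.
rng = np.random.default_rng(0)
L, d_in, d_hid, d_a, r, n_dom, n_net = 8, 20, 16, 10, 4, 30, 64
adj = np.zeros((L, L))
for i in range(L - 1):                                   # placeholder chain graph
    adj[i, i + 1] = adj[i + 1, i] = 1.0
params = {
    "w_gcn": rng.normal(scale=0.1, size=(d_in, d_hid)),
    "w1": rng.normal(scale=0.1, size=(d_a, d_hid)),
    "w2": rng.normal(scale=0.1, size=(r, d_a)),
    "w_fc": rng.normal(scale=0.1, size=r * d_hid + n_dom + n_net),
    "b_fc": 0.0,
}
x = rng.normal(size=(L, d_in))                           # e.g. per-residue features
domain_feat = rng.integers(0, 2, size=n_dom).astype(float)  # binary domain vector
network_feat = rng.normal(size=n_net)                    # e.g. a node2vec embedding
prob = predict(adj, x, domain_feat, network_feat, params)
print(f"P(root-related) = {prob:.3f}")
```

Because the weights are random, the printed probability carries no biological meaning; the sketch only shows how the three feature types could be shaped and concatenated before the fully connected layer.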

Results

Five-fold cross-validation and independent tests indicated that Graph-Root performs acceptably, and it outperformed SVM-Root, the only previously published model for this task. The importance of each feature type and of each model component was also investigated.
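
The evaluation protocol can be illustrated with a small, dependency-free sketch of stratified five-fold cross-validation. The abstract does not list the reported metrics, so the choice of sensitivity, specificity, accuracy, and the Matthews correlation coefficient (MCC), the round-robin fold assignment, and the toy `nearest_mean` classifier standing in for Graph-Root are all assumptions made for illustration.

```python
import math
import random

def stratified_kfold(labels, k=5, seed=0):
    """Assign sample indices to k folds, dealing each class out round-robin
    so every fold keeps roughly the overall positive/negative ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cross_validate(features, labels, fit_predict, k=5, seed=0):
    """Collect out-of-fold predictions from k-fold CV, then score them once."""
    y_pred = [None] * len(labels)
    for test_idx in stratified_kfold(labels, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        preds = fit_predict([features[i] for i in train_idx],
                            [labels[i] for i in train_idx],
                            [features[i] for i in test_idx])
        for i, p in zip(test_idx, preds):
            y_pred[i] = p
    pairs = list(zip(labels, y_pred))
    tp, tn = pairs.count((1, 1)), pairs.count((0, 0))
    fp, fn = pairs.count((0, 1)), pairs.count((1, 0))
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "accuracy": (tp + tn) / len(labels),
        "mcc": mcc(tp, tn, fp, fn),
    }

def nearest_mean(train_x, train_y, test_x):
    """Toy stand-in for a trained model: predict the class with the closer mean."""
    means = {}
    for c in (0, 1):
        members = [x for x, y in zip(train_x, train_y) if y == c]
        means[c] = sum(members) / len(members)
    return [1 if abs(x - means[1]) < abs(x - means[0]) else 0 for x in test_x]

# Hypothetical 1-D toy data: 10 negatives near 0, 10 positives near 5.
xs = [0.1 * i for i in range(10)] + [5 + 0.1 * i for i in range(10)]
ys = [0] * 10 + [1] * 10
print(cross_validate(xs, ys, nearest_mean))  # every metric is 1.0 on this separable data
```

Pooling the out-of-fold predictions before scoring is one convention; averaging the five per-fold scores is another. An independent test, by contrast, scores a model trained on the full training set against proteins held out from the start.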

Conclusion

Graph-Root performed well and can serve as a useful tool for identifying novel root-related proteins. BLOSUM62 features were found to be important for determining root-related proteins.
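
Because the conclusion singles out BLOSUM62 features, the sketch below shows one common way such features are derived: each residue is replaced by its row of the BLOSUM62 substitution matrix, giving an L × 20 matrix that could serve as per-residue (for example, GCN node) features. This is not necessarily the authors' exact encoding; the zero row for non-standard residues and the optional mean pooling into a fixed-length vector are assumptions.

```python
# Standard BLOSUM62 substitution scores (Henikoff & Henikoff, 1992).
# Rows and columns both follow ALPHABET order.
ALPHABET = "ARNDCQEGHILKMFPSTWYV"
_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""
BLOSUM62 = {
    aa: [int(v) for v in line.split()]
    for aa, line in zip(ALPHABET, _ROWS.strip().splitlines())
}

def blosum62_encode(seq):
    """Return an L x 20 matrix: one BLOSUM62 row per residue.

    Residues outside the 20 standard amino acids (e.g. X, U) are mapped
    to an all-zero row; this handling is an assumption.
    """
    return [list(BLOSUM62.get(aa, [0] * 20)) for aa in seq.upper()]

def mean_pool(matrix):
    """Average per-residue rows into one fixed-length (20-value) vector."""
    n = len(matrix)
    return [sum(column) / n for column in zip(*matrix)]

seq = "MKWVTF"                        # hypothetical peptide fragment
per_residue = blosum62_encode(seq)    # 6 x 20 matrix, e.g. node features
protein_vec = mean_pool(per_residue)  # 20-value summary vector
print(len(per_residue), len(protein_vec))  # 6 20
```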


DOI: 10.2174/0115748936343410241008103219
2024-10-29
2025-01-19

References

  1. Schiefelbein J.W. Somerville C. Genetic control of root hair development in Arabidopsis thaliana. Plant Cell 1990 2 3 235 243 10.2307/3869138 12354956
  2. Grierson C. Nielsen E. Ketelaar T. Schiefelbein J. Root hairs. Arabidopsis Book 2014 12 e0172 10.1199/tab.0172 24982600
  3. Ogura T. Goeschl C. Filiault D. Root system depth in Arabidopsis is shaped by EXOCYST70A3 via the dynamic modulation of auxin transport. Cell 2019 178 2 400 412.e16 10.1016/j.cell.2019.06.021 31299202
  4. Zhu J. Ingram P.A. Benfey P.N. Elich T. From lab to field, new approaches to phenotyping root system architecture. Curr. Opin. Plant Biol. 2011 14 3 310 317 10.1016/j.pbi.2011.03.020 21530367
  5. Lynch J. Root architecture and plant productivity. Plant Physiol. 1995 109 1 7 13 10.1104/pp.109.1.7 12228579
  6. Ober E.S. Alahmad S. Cockram J. Forestan C. Hickey L.T. Kant J. Wheat root systems as a breeding target for climate resilience. Theor. Appl. Genet. 2021 134 6 1645 1662 10.1007/s00122-021-03819-w
  7. Li Y. Liu X. Chen R. Tian J. Fan Y. Zhou X. Genome-scale mining of root-preferential genes from maize and characterization of their promoter activity. BMC Plant Biol. 2019 19 1 584 10.1186/s12870-019-2198-8 31878892
  8. Jung J.K.H. McCouch S. Getting to the roots of it: Genetic and hormonal control of root architecture. Front Plant Sci 2013 4 186 10.3389/fpls.2013.00186 23785372
  9. Ramireddy E. Nelissen H. Leuendorf J.E. Van Lijsebettens M. Inzé D. Schmülling T. Root engineering in maize by increasing cytokinin degradation causes enhanced root growth and leaf mineral enrichment. Plant Mol. Biol. 2021 106 6 555 567 10.1007/s11103-021-01173-5 34275101
  10. Bush W.S. Moore J.H. Chapter 11: Genome-wide association studies. PLOS Comput. Biol. 2012 8 12 e1002822 10.1371/journal.pcbi.1002822 23300413
  11. Xu F. Chen S. Yang X. Genome-wide association study on root traits under different growing environments in wheat (Triticum aestivum L.). Front. Genet. 2021 12 646712 10.3389/fgene.2021.646712 34178022
  12. Kirschner G.K. Rosignoli S. Guo L. ENHANCED GRAVITROPISM 2 encodes a STERILE ALPHA MOTIF–containing protein that controls root growth angle in barley and wheat. Proc. Natl. Acad. Sci. USA 2021 118 35 e2101526118 10.1073/pnas.2101526118 34446550
  13. Karnatam K.S. Chhabra G. Saini D.K. Genome-wide meta-analysis of QTLs associated with root traits and implications for maize breeding. Int. J. Mol. Sci. 2023 24 7 6135 10.3390/ijms24076135 37047112
  14. Ma J. Zhao D. Tang X. Genome-wide association study on root system architecture and identification of candidate genes in wheat (Triticum aestivum L.). Int. J. Mol. Sci. 2022 23 3 1843 10.3390/ijms23031843 35163763
  15. Fizames C. Muños S. Cazettes C. The Arabidopsis root transcriptome by serial analysis of gene expression. Gene identification using the genome sequence. Plant Physiol. 2004 134 1 67 80 10.1104/pp.103.030536 14730065
  16. Moisseyev G. Park K. Cui A. Freitas D. Rajagopal D. Konda A.R. RGPDB: Database of root-associated genes and promoters in maize, soybean, and sorghum. Database 2020 2020 baaa038 10.1093/database/baaa038
  17. Kumar Meher P. Hati S. Sahu T.K. Pradhan U. Gupta A. Rath S.N. SVM-root: Identification of root-associated proteins in plants by employing the support vector machine with sequence-derived features. Curr. Bioinform. 2024 19 1 91 102 10.2174/1574893618666230417104543
  18. Kipf T.N. Welling M. Semi-supervised classification with graph convolutional networks. arXiv:1609.02907 2016
  19. Lin Z. Feng M. A structured self-attentive sentence embedding. arXiv:1703.03130 2017
  20. Grover A. Leskovec J. node2vec: Scalable feature learning for networks. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining San Francisco, California, USA 2016 855 864 10.1145/2939672.2939754
  21. Szklarczyk D. Kirsch R. Koutrouli M. The STRING database in 2023: Protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic Acids Res. 2023 51 D1 D638 D646 10.1093/nar/gkac1000 36370105
  22. Yates A.D. Allen J. Amode R.M. Ensembl Genomes 2022: An expanding genome resource for non-vertebrates. Nucleic Acids Res. 2022 50 D1 D996 D1003 10.1093/nar/gkab1007 34791415
  23. Bateman A. Martin M-J. Orchard S. UniProt: The universal protein knowledgebase in 2023. Nucleic Acids Res. 2023 51 D1 D523 D531 10.1093/nar/gkac1052 36408920
  24. Fu L. Niu B. Zhu Z. Wu S. Li W. CD-HIT: Accelerated for clustering the next-generation sequencing data. Bioinformatics 2012 28 23 3150 3152 10.1093/bioinformatics/bts565 23060610
  25. Henikoff S. Henikoff J.G. Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. USA 1992 89 22 10915 10919 10.1073/pnas.89.22.10915 1438297
  26. Altschul S. Madden T.L. Schäffer A.A. Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 1997 25 17 3389 3402 10.1093/nar/25.17.3389 9254694
  27. Boeckmann B. Bairoch A. Apweiler R. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 2003 31 1 365 370 10.1093/nar/gkg095 12520024
  28. Singh J. Litfin T. Singh J. Paliwal K. Zhou Y. SPOT-Contact-LM: Improving single-sequence-based prediction of protein contact map using a transformer language model. Bioinformatics 2022 38 7 1888 1894 10.1093/bioinformatics/btac053 35104320
  29. Pan X. Chen L. Liu I. Niu Z. Huang T. Cai Y.D. Identifying protein subcellular locations with embeddings-based node2loc. IEEE/ACM Trans. Comput. Biol. Bioinform. 2022 19 2 666 675 10.1109/TCBB.2021.3080386
  30. Pan X. Li H. Zeng T. Identification of protein subcellular localization with network and functional embeddings. Front. Genet. 2021 11 626500 10.3389/fgene.2020.626500 33584818
  31. Chen L. Gu J. Zhou B. PMiSLocMF: Predicting miRNA subcellular localizations by incorporating multi-source features of miRNAs. Brief. Bioinform. 2024 25 5 bbae386 10.1093/bib/bbae386 39154195
  32. Zhao R. Hu B. Chen L. Zhou B. Identification of latent oncogenes with a network embedding method and random forest. BioMed Res. Int. 2020 2020 1 11 10.1155/2020/5160396 33029511
  33. Perozzi B. Al-Rfou R. Skiena S. DeepWalk: Online learning of social representations. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining New York, USA 2014 701 710 10.1145/2623330.2623732
  34. Cho H. Berger B. Peng J. Compact integration of multi-network topology for functional analysis of genes. Cell Syst. 2016 3 6 540 548.e5 10.1016/j.cels.2016.10.017 27889536
  35. Tang J. Qu M. Wang M. Zhang M. Yan J. Mei Q. LINE: Large-scale information network embedding. Proceedings of the 24th International Conference on World Wide Web Florence, Italy 2015 1067 1077 10.1145/2736277.2741093
  36. Mikolov T. Chen K. Corrado G. Dean J. Efficient estimation of word representations in vector space. arXiv:1301.3781 2013
  37. Chen L. Zhang C. Xu J. PredictEFC: A fast and efficient multi-label classifier for predicting enzyme family classes. BMC Bioinformatics 2024 25 1 50 10.1186/s12859-024-05665-1 38291384
  38. Cai Y.D. Chou K.C. Using functional domain composition to predict enzyme family classes. J. Proteome Res. 2005 4 1 109 111 10.1021/pr049835p 15707365
  39. Lu L. Qian Z. Cai Y.D. Li Y. ECS: An automatic enzyme classifier based on functional domain composition. Comput. Biol. Chem. 2007 31 3 226 232 10.1016/j.compbiolchem.2007.03.008 17500036
  40. Zou Z. Tian S. Gao X. Li Y. mlDEEPre: Multi-functional enzyme function prediction with hierarchical multi-label deep learning. Front. Genet. 2019 9 714 10.3389/fgene.2018.00714 30723495
  41. Blum M. Chang H.Y. Chuguransky S. The InterPro protein families and domains database: 20 years on. Nucleic Acids Res. 2021 49 D1 D344 D354 10.1093/nar/gkaa977 33156333
  42. Apweiler R. Attwood T.K. Bairoch A. The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res. 2001 29 1 37 40 10.1093/nar/29.1.37 11125043
  43. Kingma D.P. Ba J. Adam: A method for stochastic optimization. arXiv:1412.6980 2019
  44. Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the 14th International Joint Conference on Artificial Intelligence Montreal, Quebec, Canada 1995 1137 1143
  45. Chen L. Chen Y. RMTLysPTM: Recognizing multiple types of lysine PTM sites by deep analysis on sequences. Brief. Bioinform. 2023 25 1 bbad450 10.1093/bib/bbad450 38066710
  46. Powers D. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation. J. Mach. Learn. Technol. 2011 2 1 37 63
  47. Matthews B.W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim. Biophys. Acta Protein Struct. 1975 405 2 442 451 10.1016/0005-2795(75)90109-9 1180967
  48. Chen L. Li L. Prediction of drug pathway-based disease classes using multiple properties of drugs. Curr. Bioinform. 2024 19 9 859 872 10.2174/0115748936284973240105115444
  49. Srivastava A. Kumar M. Prediction of zinc binding sites in proteins using sequence derived information. J. Biomol. Struct. Dyn. 2018 36 16 4413 4423 10.1080/07391102.2017.1417910 29241411
  50. Chen L. Hu H. MBPathNCP: A metabolic pathway prediction model for chemicals and enzymes based on network consistency projection. Curr. Bioinform. 2024
  51. Chen L. Zhao X. PCDA-HNMP: Predicting circRNA-disease association using heterogeneous network and meta-path. Math. Biosci. Eng. 2023 20 12 20553 20575 10.3934/mbe.2023909 38124565
  52. Chen L. Xu J. Zhou Y. PDATC-NCPMKL: Predicting drug’s Anatomical Therapeutic Chemical (ATC) codes based on network consistency projection and multiple kernel learning. Comput. Biol. Med. 2024 169 107862 10.1016/j.compbiomed.2023.107862 38150886
  53. Chowdhury S.Y. Shatabda S. Dehzangi A. iDNAProt-ES: Identification of DNA-binding proteins using evolutionary and structural features. Sci. Rep. 2017 7 1 14938 10.1038/s41598-017-14945-1 29097781
  54. Huang F. Ma Q. Ren J. Identification of smoking-associated transcriptome aberration in blood with machine learning methods. BioMed Res. Int. 2023 2023 1 5333361 10.1155/2023/5333361 36644165
  55. Ren J. Zhang Y. Guo W. Identification of genes associated with the impairment of olfactory and gustatory functions in covid-19 via machine-learning methods. Life 2023 13 3 798 10.3390/life13030798 36983953
  56. Wang Y. Xu Y. Yang Z. Liu X. Dai Q. Using recursive feature selection with random forest to improve protein structural class prediction for low-similarity sequences. Comput. Math. Methods Med. 2021 2021 1 9 10.1155/2021/5529389 34055035
  57. Onesime M. Yang Z. Dai Q. Genomic island prediction via chi-square test and random forest algorithm. Comput. Math. Methods Med. 2021 2021 1 9 10.1155/2021/9969751 34122622
  58. Pedregosa F. Varoquaux G. Gramfort A. Michel V. Thirion B. Grisel O. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011 12 2825 2830
  59. Rives A. Meier J. Sercu T. Goyal S. Lin Z. Liu J. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl. Acad. Sci. USA 2021 118 15 e2016239118 10.1073/pnas.2016239118
  60. Rao R.M. Liu J. Verkuil R. Meier J. Canny J. Abbeel P. MSA Transformer. bioRxiv 2021 10.1101/2021.02.12.430858