Volume 16, Issue 10
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background: The number of human genetic variants deposited in publicly available databases has been increasing exponentially. Among these variants, non-synonymous single nucleotide polymorphisms (nsSNPs), also known as single amino acid polymorphisms (SAPs), have been shown to correlate strongly with phenotypic variation in traits and diseases. Objective: The detailed mechanisms governing the disease association of SAPs remain unclear. Investigating new attributes and improving prediction is therefore increasingly urgent, because many SAPs of unknown disease relevance remain to be investigated. Methods: Based on the Random Forest (RF) algorithm, we first constructed a new prediction model for SAPs associated with a particular disease from protein sequences. Four common sequence-feature extraction methods were applied separately to select the optimal features. SAP peptide lengths from 12 to 202 were then also optimized. Results: The optimal models achieve accuracy higher than 90% and an Area Under the Curve (AUC) above 0.9 on all 11 external testing datasets. Finally, an accuracy higher than 95% on an independent test set demonstrates the superiority of our method. Conclusion: In this paper, we constructed 11 RF-based disease-association prediction models for SAPs at the protein sequence level. All models yield prediction accuracy higher than 90% and an AUC above 0.9. Because our method uses only protein sequence information, it is more broadly applicable than methods that depend on additional information or predictions about the proteins.
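
To make the Methods step concrete, the following is a minimal sketch of how a fixed-length peptide could be cut around a SAP site and turned into a sequence-only feature vector. The abstract does not name the four feature extractions, so amino acid composition is used here purely as an assumed stand-in; the function names (`sap_window`, `composition`, `sap_features`), the padding character, and the concatenation of wild-type and mutant features are hypothetical choices for illustration, not the authors' method.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sap_window(sequence, position, length, pad="X"):
    """Return a peptide of `length` residues around the 0-based SAP `position`.

    The sequence is padded on both ends (assumed padding symbol "X") so that
    sites near a terminus still give a full-length window.
    """
    half = length // 2
    padded = pad * half + sequence + pad * half
    # In `padded`, the site sits at index position + half, so the window
    # starting half residues before it begins at index `position`.
    return padded[position:position + length]


def composition(peptide):
    """Fraction of each standard amino acid; padding symbols are ignored."""
    counts = Counter(residue for residue in peptide if residue in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def sap_features(sequence, position, mutant, length):
    """Concatenate composition features of the wild-type and mutant windows."""
    mutated = sequence[:position] + mutant + sequence[position + 1:]
    wild_window = sap_window(sequence, position, length)
    mutant_window = sap_window(mutated, position, length)
    return composition(wild_window) + composition(mutant_window)
```

Feature vectors built this way for each candidate peptide length (the abstract reports lengths from 12 to 202) could then be passed to any Random Forest implementation, with the length chosen by validation performance.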

/content/journals/cbio/10.2174/1574893616666210825094751
2021-12-01
2025-06-12