Skip to content
2000
Volume 15, Issue 5
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background: Thermophilic proteins can maintain good activity under high temperature, therefore, it is important to study thermophilic proteins for the thermal stability of proteins. Objective: In order to solve the problem of low precision and low efficiency in predicting thermophilic proteins, a prediction method based on feature fusion and machine learning was proposed in this paper. Methods: For the selected thermophilic data sets, firstly, the thermophilic protein sequence was characterized based on feature fusion by the combination of g-gap dipeptide, entropy density and autocorrelation coefficient. Then, Kernel Principal Component Analysis (KPCA) was used to reduce the dimension of the expressed protein sequence features in order to reduce the training time and improve efficiency. Finally, the classification model was designed by using the classification algorithm. Results: A variety of classification algorithms was used to train and test on the selected thermophilic dataset. By comparison, the accuracy of the Support Vector Machine (SVM) under the jackknife method was over 92%. The combination of other evaluation indicators also proved that the SVM performance was the best. Conclusion: Because of choosing an effectively feature representation method and a robust classifier, the proposed method is suitable for predicting thermophilic proteins and is superior to most reported methods.

Loading

Article metrics loading...

/content/journals/cbio/10.2174/1574893615666200207094357
2020-06-01
2025-06-17
Loading full text...

Full text loading...

/content/journals/cbio/10.2174/1574893615666200207094357
Loading
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error
Please enter a valid_number test