Skip to content
2000
Volume 17, Issue 1
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background: The identification of DNA binding proteins (DBP) is an important research field. Experiment-based methods are time-consuming and labor-intensive for detecting DBP. Objective: To solve the problem of large-scale DBP identification, some machine learning methods are proposed. However, these methods have insufficient predictive accuracy. Our aim is to develop a sequence- based machine learning model to predict DBP. Methods: In our study, we extracted six types of features (including NMBAC, GE, MCD, PSSM-AB, PSSM-DWT, and PsePSSM) from protein sequences. We used Multiple Kernel Learning based on Hilbert- Schmidt Independence Criterion (MKL-HSIC) to estimate the optimal kernel. Then, we constructed a hypergraph model to describe the relationship between labeled and unlabeled samples. Finally, Laplacian Support Vector Machines (LapSVM) is employed to train the predictive model. Our method is tested on PDB186, PDB1075, PDB2272 and PDB14189 data sets. Results: Compared with other methods, our model achieved best results on benchmark data sets. Conclusion: The accuracy of 87.1% and 74.2% are achieved on PDB186 (Independent test of PDB1075) and PDB2272 (Independent test of PDB14189), respectively.

Loading

Article metrics loading...

/content/journals/cbio/10.2174/1574893616666210806091922
2022-01-01
2024-12-27
Loading full text...

Full text loading...

/content/journals/cbio/10.2174/1574893616666210806091922
Loading
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error
Please enter a valid_number test