Volume 13, Issue 5
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Objective: Small GTPases are important molecular switches that play key roles in numerous signal transduction pathways. This study aims to explore their binary classification features with machine learning algorithms. Methods: Sequences of small GTPases and non-small GTPases were clustered separately to remove similar entries. They were then divided into 10 datasets, each containing equal numbers of small GTPases and non-small GTPases. Three feature vectors were extracted from these datasets: 188-dimensional (188D), 400D, and motif-based (608D) features. Classification was then performed with the easy-classify.py software, built on scikit-learn, which integrates 12 classifiers. Finally, conserved motifs were discovered with the MEME suite. Results: The three best-performing classifiers were logistic regression (LR), gradient boosting decision tree (GBDT), and bagging for the 400D features; LibSVM, GBDT, and bagging for the 188D features; and GBDT, bagging, and AdaBoost for the 608D features. Taking the commonly used evaluation indices as a whole, the top four classifiers were GBDT, bagging, LR, and AdaBoost. GBDT obtained the highest area under the curve (AUC) value, 88.61%. The 400D features performed better than the 188D and 608D features. Five conserved G-box motifs were discovered in the sequences of human small GTPases. Conclusion: This study is the first to report that the GBDT algorithm performs best for small GTPase classification.
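
The classifier comparison described in the Methods can be illustrated with a minimal scikit-learn sketch. This is not the easy-classify.py software, whose interface is not described here. It is an assumed stand-in that ranks four of the named classifiers (LR, GBDT, bagging, AdaBoost) by cross-validated AUC. The feature matrix is synthetic (`make_classification`), sized at 400 columns only to mirror the 400D setting, and the helper name `compare_classifiers`, the 5-fold split, and all hyperparameters are hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def compare_classifiers(X, y, cv=5, seed=0):
    """Return the mean cross-validated ROC AUC for each classifier.

    Hypothetical helper: the classifier settings are library defaults,
    not the configuration used in the study.
    """
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "GBDT": GradientBoostingClassifier(random_state=seed),
        "Bagging": BaggingClassifier(random_state=seed),
        "AdaBoost": AdaBoostClassifier(random_state=seed),
    }
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
        for name, model in models.items()
    }


# Synthetic, balanced two-class data standing in for one of the 10 datasets
# (assumption: real inputs would be 188D, 400D, or 608D sequence features).
X, y = make_classification(
    n_samples=200,
    n_features=400,
    n_informative=20,
    weights=[0.5, 0.5],
    random_state=0,
)

auc_by_model = compare_classifiers(X, y)

# Print classifiers from highest to lowest mean AUC.
for name, auc in sorted(auc_by_model.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: AUC = {auc:.4f}")
```

With real data, `X` would hold one row of extracted features per sequence and `y` the small GTPase / non-small GTPase label. The ranking on synthetic data says nothing about the reported results.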

/content/journals/cbio/10.2174/1574893612666171121162552
2018-10-01
2025-05-25