Discrimination of Thermophilic and Mesophilic Proteins Using Reduced Amino Acid Alphabets with n-Grams

Aydin Albayrak; Ugur O. Sezerman

doi:10.2174/157489312800604435

ISSN: 1574-8936
E-ISSN: 2212-392X

Authors: Aydin Albayrak¹ and Ugur O. Sezerman²

¹ Biological Sciences and Bioengineering, Sabanci University, Orhanli, Tuzla, Istanbul, Turkey. ² Biological Sciences and Bioengineering, Sabanci University, Orhanli, Tuzla, Istanbul, Turkey.
Source: Current Bioinformatics, Volume 7, Issue 2, Jun 2012, p. 152 - 158
DOI: https://doi.org/10.2174/157489312800604435
- Available online: 01 Jun 2012

Abstract

Protein thermostabilization has been a focus of recent research because of growing interest in producing enzymes that operate at industrially beneficial temperatures. Understanding the determinants of thermostabilization at the level of sequence and structure is important for designing such enzymes. A bioinformatics approach was used to determine the extent to which reduced amino acid alphabets (RAAA) with n-grams (subsequences of length n), subjected to a t-test-based feature selection procedure, can discriminate proteins from thermophiles and mesophiles. The classification performance of 65 different protein alphabets with 3 different n-gram sizes was systematically evaluated using support vector machines on a test set of 707 proteins from mesophilic Xylella fastidiosa and thermophilic Aquifex aeolicus. A classification accuracy of 91.796% was achieved with the Hsdm16 RAAA using 13 features: EK-ILV-ST-A-G-F-H-Q-N-R-M-W-Y. The t-test-based feature selection procedure reduced classification time without significantly affecting classification accuracy. The overall combination of methods in this paper is useful and computationally fast for classifying protein sequences from thermophiles and mesophiles using sequence information alone.
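
The feature pipeline described in the abstract (alphabet reduction, n-gram counting, t-test-based ranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several points are assumptions made for illustration only: the grouping string from the abstract is treated as an alphabet definition, residues outside those groups are left unchanged, n-gram features are relative frequencies, and Welch's t statistic stands in for the unspecified t-test. All function names are hypothetical. The support vector machine step is omitted.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance

# Grouping string copied from the abstract. Treating it as a complete
# alphabet definition is an assumption made for this illustration.
GROUPS = "EK-ILV-ST-A-G-F-H-Q-N-R-M-W-Y"


def build_mapping(groups):
    """Map every residue to the first letter of its group."""
    mapping = {}
    for group in groups.split("-"):
        for residue in group:
            mapping[residue] = group[0]
    return mapping


def reduce_sequence(seq, mapping):
    """Rewrite a protein sequence in the reduced alphabet.

    Residues not covered by the mapping are kept as-is (assumption).
    """
    return "".join(mapping.get(r, r) for r in seq.upper())


def ngram_frequencies(seq, n):
    """Relative frequency of each overlapping n-gram in seq."""
    total = len(seq) - n + 1
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + n] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}


def welch_t(a, b):
    """Welch's t statistic for two samples with at least 2 values each."""
    denom = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if denom == 0:
        # Statistic is undefined; this sketch treats it as uninformative.
        return 0.0
    return (mean(a) - mean(b)) / denom


def rank_features(group_a, group_b, n, mapping):
    """Rank n-gram features by |t| between two groups of sequences."""
    freqs_a = [ngram_frequencies(reduce_sequence(s, mapping), n) for s in group_a]
    freqs_b = [ngram_frequencies(reduce_sequence(s, mapping), n) for s in group_b]
    features = set().union(*freqs_a, *freqs_b)
    scores = {
        f: welch_t([d.get(f, 0.0) for d in freqs_a],
                   [d.get(f, 0.0) for d in freqs_b])
        for f in features
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Calling `rank_features(thermophile_seqs, mesophile_seqs, 2, build_mapping(GROUPS))` on two lists of sequences returns the n-grams ordered by absolute t statistic. Keeping only the top-ranked features before training a classifier is one way to shorten classification time, which is the effect the abstract reports for its own selection procedure.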

Article Type: Research Article

Keyword(s): Amino acid composition; dipeptide; homologous proteins; N-grams; reduced amino acid alphabets; statistically significant features; thermostability; tripeptide; Xylella fastidiosa
