Volume 13, Issue 4
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background: Post-translational modifications (PTMs) are a key regulatory mechanism in cellular processes, so identifying them quickly and accurately is important. Both next-generation sequencing and bioinformatics techniques have greatly facilitated the discovery of PTMs. Most bioinformatics techniques follow the machine learning framework, in which feature extraction occupies a key position. Conclusion: This article reviews the feature extraction methods used to predict PTMs in machine learning-based approaches, drawing on protein sequence, structure, function, physicochemical and biochemical properties, and evolutionary conservation. Binary encoding, amino acid composition, pseudo amino acid composition, composition of k-spaced amino acid pairs, autocorrelation functions, position weight amino acid composition, and position-specific amino acid propensity extract features directly from protein sequences. Encoding based on grouped weight is a hybrid approach that integrates physicochemical and biochemical properties with sequence information. Information on protein structure, especially secondary structure, accessible surface area, and disorder, has also been used to encode proteins. Features derived from evolutionary conservation include the position-specific scoring matrix and the k-nearest neighbor score. In addition, we discuss some existing problems in feature extraction.
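
To make three of the sequence-based encodings named above concrete, the following is a minimal Python sketch of binary encoding, amino acid composition, and composition of k-spaced amino acid pairs. It is an illustration under stated assumptions, not the implementation of any reviewed method: the function names, the alphabetical ordering of the 20 standard residues, the normalization by the number of pairs, and the example peptide are all hypothetical choices made here.

```python
from itertools import product

# Assumption: the 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def binary_encoding(peptide):
    """Binary (one-hot) encoding: 20 bits per residue.

    Residues outside AMINO_ACIDS are encoded as 20 zeros.
    """
    vector = []
    for residue in peptide:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def amino_acid_composition(peptide):
    """Fraction of each of the 20 amino acids in the peptide."""
    if not peptide:
        raise ValueError("peptide must not be empty")
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]


def k_spaced_pair_composition(peptide, k):
    """Composition of residue pairs separated by exactly k positions.

    Returns 400 values (one per ordered pair), normalized by the
    number of k-spaced pairs in the peptide (an assumed normalization).
    """
    total = len(peptide) - k - 1
    if total <= 0:
        raise ValueError("peptide is too short for this k")
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    for i in range(total):
        pair = peptide[i] + peptide[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[p] / total for p in pairs]


# Hypothetical 7-residue window centered on a candidate lysine site.
window = "ACDKSTA"
one_hot = binary_encoding(window)
composition = amino_acid_composition(window)
spaced_pairs = k_spaced_pair_composition(window, k=1)
```

In a machine learning pipeline of the kind the abstract describes, vectors like these would typically be concatenated and passed to a classifier; the choice of window length and of k is left open here.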

/content/journals/cbio/10.2174/1574893612666170707094916
2018-08-01
2025-05-21