Skip to content
2000
Volume 18, Issue 5
  • ISSN: 1566-5232
  • E-ISSN: 1875-5631

Abstract

Motivation: Knowledge of the correct protein subcellular localization is necessary for understanding the function of a protein and revealing the mechanism of many human diseases due to protein subcellular mislocalization, which is required before approaching gene therapy to treat a disease. In addition, it is well-known that the gene therapy is an effective way to overcome disease by targeting a gene therapy product to a specific subcellular compartment. Deep neural networks to predict protein function have become increasingly popular due to large increases in the available genomics data due to its strong superiority in the non-linear classification ability. However, they still have some drawbacks such as too many hyper-parameters and sufficient amount of labeled data. Results: We present a deep forest-based protein location algorithm relying on sequence information. The prediction model uses a random forest network with a multi-layered structure to identify the subcellular regions of protein. The model was trained and tested on a latest UniProt releases protein dataset, and we demonstrate that our deep forest predict the subcellular location of proteins given only the protein sequence with high accuracy, outperforming the current state-of-art algorithms. Meanwhile, unlike the deep neural networks, it has a significantly smaller number of parameters and is much easier to train.

Loading

Article metrics loading...

/content/journals/cgt/10.2174/1566523218666180913110949
2018-10-01
2025-05-08
Loading full text...

Full text loading...

/content/journals/cgt/10.2174/1566523218666180913110949
Loading
This is a required field
Please enter a valid email address
Approval was a Success
Invalid data
An Error Occurred
Approval was partially successful, following selected items could not be processed due to error
Please enter a valid_number test