Current Bioinformatics - Volume 14, Issue 3, 2019
Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications
Authors: Xiaoyang Jing, Qimin Dong, Ruqian Lu and Qiwen Dong
Background: Protein inter-residue contacts prediction plays an important role in protein structure and function research. As a low-dimensional representation of protein tertiary structure, inter-residue contacts can greatly help de novo protein structure prediction methods reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contacts prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contacts prediction methods. Results: Protein inter-residue contacts prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show their typical applications. We then present a comprehensive review of the three main categories of prediction methods (correlated mutations methods, machine-learning methods and fusion methods) and analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss their performance in detail. Conclusion: Correlated mutations methods perform better for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods can take advantage of both machine-learning and correlated mutations methods. Employing more effective fusion strategies could further improve the performance of fusion methods.
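To make the notion of an inter-residue contact concrete, here is a minimal sketch. It assumes a widely used convention that the abstract itself does not spell out: two residues are in contact when their representative atoms (typically Cβ) are closer than 8 Å, and contacts are binned by sequence separation into short (6 to 11), medium (12 to 23) and long (24 or more) range. The function names and the toy coordinates are hypothetical.

```python
import math

def contact_map(coords, threshold=8.0):
    """Return the set of residue index pairs (i, j), i < j, whose
    representative atoms are closer than `threshold` angstroms.
    The 8 A cutoff is an assumed convention, not taken from the abstract."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < threshold:
                contacts.add((i, j))
    return contacts

def range_class(i, j):
    """Bin a residue pair by sequence separation (assumed convention)."""
    sep = abs(i - j)
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return "local"

# Toy coordinates for four residues placed along the x axis.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
toy_contacts = contact_map(toy_coords)
```

The resulting set of pairs is the low-dimensional representation of the 3D structure that a structure prediction method could use as distance restraints.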
A Review on the Recent Developments of Sequence-based Protein Feature Extraction Methods
Background: Proteins play a crucial role in life activities, such as catalyzing metabolic reactions, DNA replication and responding to stimuli. Identifying protein structures and functions is critical for both basic research and applications. Because traditional experiments for studying the structures and functions of proteins are expensive and time consuming, computational approaches are highly desirable. The key for computational methods is how to efficiently extract features from protein sequences. During the last decade, many powerful feature extraction algorithms have been proposed, significantly advancing the study of protein structures and functions. Objective: To help researchers catch up with recent developments in this important field, this study gives an updated review focusing on sequence-based feature extraction for proteins. Method: Sequence-based protein features were grouped into three categories: composition-based features, autocorrelation-based features and profile-based features. The features in each group are described in detail, and their advantages and disadvantages are discussed. Some useful tools for generating these features are also introduced. Results: Generally, autocorrelation-based features outperform composition-based features, and profile-based features outperform autocorrelation-based features. The reason is that profile-based features take evolutionary information into account, which is useful for identifying protein structures and functions. However, profile-based features are more time consuming to compute, because they require multiple sequence alignment. Conclusion: In this study, some recently proposed sequence-based features were introduced and discussed, such as basic k-mers, PseAAC, auto-cross covariance and top-n-gram. These features have made great contributions to the development of protein sequence analysis. Future studies can focus on exploring combinations of these features. In addition, techniques from other fields, such as signal processing, natural language processing (NLP) and image processing, could also contribute to this field, because natural languages (such as English) and protein sequences share some similarities. Proteins can therefore be treated as documents, and features such as k-mers, top-n-grams and motifs can be treated as the words of the language. Techniques from these fields will provide new ideas and strategies for extracting features from proteins.
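As a small illustration of the simplest composition-based feature mentioned above, the sketch below computes a normalized k-mer frequency vector over the 20 standard amino acids. It is a minimal sketch rather than any specific tool from the review; the function name `kmer_composition` and the example sequence are hypothetical, and k-mers containing non-standard residues are simply ignored.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def kmer_composition(seq, k=2):
    """Return a list of 20**k normalized k-mer frequencies for `seq`.
    The vector is ordered lexicographically over AMINO_ACIDS."""
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # ignores non-standard k-mers
    return [counts[m] / total if total else 0.0 for m in kmers]

# Hypothetical toy sequence: its overlapping 2-mers are AC, CA, AC.
features = kmer_composition("ACAC", k=2)
```

In the document analogy drawn in the abstract, each k-mer plays the role of a word, and this vector corresponds to a bag-of-words representation of the protein.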
The Recent Applications and Developments of Bioinformatics and Omics Technologies in Traditional Chinese Medicine
Background: Traditional Chinese Medicine (TCM) is widely used as complementary health care in China. Although it has been practiced for nearly 2000 years, its acceptance is still hindered by conventional scientific research methodology. Because of the complexity and uniqueness of the TCM system, identifying the molecular mechanisms, targets and bioactive components in TCM is a critical step in its modernization. With recent advances in computational approaches and high-throughput technologies, it has become possible to understand potential TCM mechanisms at the molecular and systems level and to evaluate the effectiveness and toxicity of TCM treatments. Bioinformatics, which has emerged as an interdisciplinary approach owing to the explosion of omics data and the development of computer science, is gaining considerable attention as a way to uncover the in-depth molecular mechanisms of TCM. Systems biology, based on omics techniques, opens up a new perspective that enables us to investigate the holistic modulation effect on the body. Objective: This review aims to summarize recent efforts to apply bioinformatics and omics techniques to TCM research, including systems biology, metabolomics, proteomics, genomics and transcriptomics. Conclusion: Overall, bioinformatics tools combined with omics techniques have been extensively used, through the acquisition, storage and analysis of biomedical data, to give the ancient practice of TCM scientific support and international reach.
A Survey on Computational Methods for Essential Proteins and Genes Prediction
Authors: Ming Fang, Xiujuan Lei and Ling Guo
Background: Essential proteins play important roles in the survival or reproduction of an organism and support the stability of the system. They are the minimum set of proteins absolutely required to maintain a living cell. The identification of essential proteins is an important topic, not only for a better understanding of the minimal requirements for cellular life, but also for more efficient discovery of human disease genes and drug targets. The experimental identification of essential proteins is complex and usually requires considerable time and expense. With the accumulation of high-throughput experimental data, many computational methods that usefully complement experimental methods have been proposed to identify essential proteins. In addition, the ability to rapidly and precisely identify essential proteins is of great significance for disease gene discovery and drug design, and has great potential for applications in basic and synthetic biology research. Objective: The aim of this paper is to review the identification of essential proteins and genes, focusing on current developments in different types of computational methods, to point out the progress and limitations of existing methods, and to discuss the challenges and directions for further research.
The Computational Prediction Methods for Linear B-cell Epitopes
Authors: Cangzhi Jia, Hongyan Gong, Yan Zhu and Yixia Shi
Background: B-cell epitope prediction is an essential tool for a variety of immunological studies. Several computational predictors for identifying such epitopes have been proposed in the past 10 years. Objective: In this review, we summarize the representative computational approaches developed for the identification of linear B-cell epitopes. Methods: We mainly discuss the datasets, feature extraction methods and classification methods used in previous work. Results: The performance of existing methods is not fully satisfactory, so more effective approaches that consider the structural information of proteins should be proposed. Conclusion: We consider existing challenges and future perspectives for developing reliable methods for predicting linear B-cell epitopes.
A Brief Survey of Machine Learning Methods in Protein Sub-Golgi Localization
Authors: Wuritu Yang, Xiao-Juan Zhu, Jian Huang, Hui Ding and Hao Lin
Background: The location of proteins in a cell can provide important clues to their functions in various biological processes. Thus, the application of machine learning methods to the prediction of protein subcellular localization has become a hotspot in bioinformatics. As one of the key organelles, the Golgi apparatus is in charge of protein storage, packaging and distribution. Objective: Identifying the location of proteins in the Golgi apparatus will provide in-depth insights into their functions. Thus, machine learning-based methods for predicting protein location in the Golgi apparatus have been extensively explored. The development of protein sub-Golgi localization prediction should be reviewed to provide a complete background for the field. Method: The benchmark datasets, feature extraction methods, machine learning methods and published results were summarized. Results: We briefly introduced recent progress in protein sub-Golgi localization prediction using machine learning methods and discussed the advantages and disadvantages of these methods. Conclusion: We pointed out the perspectives of machine learning methods in protein sub-Golgi localization prediction.
Research Progress of Exogenous Plant MiRNAs in Cross-Kingdom Regulation
Authors: Hao Zhang, Mengping Zhan, Haowu Chang, Shizeng Song, Chunhe Zhang and Yuanning Liu
Background: Studies have shown that exogenous miRNAs have cross-kingdom regulatory effects on bacteria and viruses, but whether exogenous plant miRNAs are stable in the human body or participate in cross-kingdom regulation is still controversial. Objective: This study aims to propose a new method, combining computational analysis and biological experiments, for investigating the presence and cross-kingdom regulation pathway of exogenous plant miRNAs. Method: Based on high-throughput sequencing data from healthy human tissue, a tissue specificity model of exogenous plant miRNAs can be constructed, and their absorption characteristics can be mined and analyzed. Exogenous plant miRNAs can then be screened based on a cross-kingdom regulation model of plant-human miRNAs, and isotope labeling can be used to verify their presence and regulation pathway. Results: Only through a comprehensive analysis of human high-throughput miRNA data, the establishment of a cross-kingdom regulation model and the design of effective biological experiments can we reveal the existence, access pathways and regulation of exogenous plant miRNAs. Conclusion: Here, we reviewed the most recent advances concerning the presence of exogenous plant miRNAs in humans, their pathways into the human body and their cross-kingdom regulation.
A Review of DNA-binding Proteins Prediction Methods
Authors: Kaiyang Qu, Leyi Wei and Quan Zou
Background: DNA-binding proteins, which bind to DNA, exist widely in living cells and participate in many DNA-related cell activities, for instance DNA replication, transcription, recombination and DNA repair. Objective: Given the importance of DNA-binding proteins, their prediction has been a popular research topic over the past decades. In this article, we review current machine-learning methods for predicting DNA-binding proteins in terms of feature representation methods, classifiers, performance measures, datasets and existing web servers. Method: Prediction methods for DNA-binding proteins can be divided into two types: those based on amino acid composition and those based on protein structure. In this article, we follow these two types to introduce the application of machine learning to DNA-binding protein prediction. Results: Machine learning plays an important role in the classification of DNA-binding proteins and performs well; the best accuracy (ACC) is above 80%. Conclusion: Machine learning can be widely used in many areas of bioinformatics, especially in protein classification. Some issues should be considered in future work. First, the relationship between the number of features and performance must be explored. Second, because many features are used to predict DNA-binding proteins, solutions for high-dimensional feature spaces should be proposed.
Data Integration of Hybrid Microarray and Single Cell Expression Data to Enhance Gene Network Inference
Authors: Wei Zhang, Wenchao Li, Jianming Zhang and Ning Wang
Background: Gene Regulatory Network (GRN) inference algorithms aim to explore causal interactions between genes and transcription factors. High-throughput transcriptomics data, including DNA microarray and single cell expression data, contain complementary information for network inference. Objective: To enhance GRN inference, data integration across various types of expression data is an economical and efficient solution. Method: In this paper, a novel E-alpha integration rule-based ensemble inference algorithm is proposed to merge complementary information from microarray and single cell expression data. This paper implements a Gradient Boosting Tree (GBT) inference algorithm to compute importance scores for candidate gene-gene pairs. The proposed E-alpha rule quantitatively evaluates the credibility level of each information source and determines the final ranked list. Results: Two groups of in silico gene networks are used to illustrate the effectiveness of the proposed E-alpha integration. Experimental outcomes with size50 and size100 in silico gene networks suggest that the proposed E-alpha rule significantly improves performance metrics compared with a single information source. Conclusion: In GRN inference, integrating hybrid expression data using the E-alpha rule provides a more feasible and efficient way to improve performance metrics than solely increasing sample sizes.
A Novel Model for Predicting LncRNA-disease Associations based on the LncRNA-MiRNA-Disease Interactive Network
Authors: Lei Wang, Zhanwei Xuan, Shunxian Zhou, Linai Kuang and Tingrui Pei
Background: Accumulating experimental studies have shown that long non-coding RNAs (lncRNAs) play an important part in various biological processes. It has been shown that their alterations and dysregulation are closely related to many critical complex diseases. Objective: It is of great importance to develop effective computational models for predicting potential lncRNA-disease associations. Method: The method is based on two hypotheses: that there are potential associations between a lncRNA and a disease if both of them are associated with the same group of microRNAs, and that similar diseases tend to be closely associated with functionally similar lncRNAs. A novel method for calculating the similarities of both lncRNAs and diseases is proposed, followed by a novel prediction model, LDLMD, for inferring potential lncRNA-disease associations. Results: LDLMD achieves an AUC of 0.8925 in Leave-One-Out Cross Validation (LOOCV), which demonstrates that the newly proposed model significantly outperforms previous state-of-the-art methods and could be a valuable addition to the biomedical research field. Conclusion: Here, we present a new method for predicting lncRNA-disease associations; moreover, the method reduces the time and cost of biological experiments.
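The first hypothesis in the abstract above (a lncRNA and a disease are likely to be associated if they share associated microRNAs) can be illustrated with a toy scoring scheme. This is not the LDLMD model, whose similarity measures are not given in the abstract; it is a minimal sketch that swaps in a plain Jaccard overlap, and every identifier (`lncA`, `disX`, `miR-1`, and so on) is hypothetical.

```python
def shared_mirna_score(lnc_mirnas, disease_mirnas):
    """Jaccard overlap between the miRNA sets linked to a lncRNA and a disease.
    A stand-in for the similarity measures of LDLMD, which are not shown here."""
    a, b = set(lnc_mirnas), set(disease_mirnas)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidates(lnc2mir, dis2mir):
    """Score every lncRNA-disease pair and sort by descending score,
    breaking ties alphabetically so the order is deterministic."""
    scored = [
        (shared_mirna_score(lnc_set, dis_set), lnc, dis)
        for lnc, lnc_set in lnc2mir.items()
        for dis, dis_set in dis2mir.items()
    ]
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))

# Hypothetical toy association data.
lnc2mir = {"lncA": {"miR-1", "miR-2"}, "lncB": {"miR-3"}}
dis2mir = {"disX": {"miR-1", "miR-2", "miR-4"}, "disY": {"miR-3"}}
ranked = rank_candidates(lnc2mir, dis2mir)
```

A ranked list of this kind is what a leave-one-out evaluation would score: each known association is hidden in turn, and its position in the ranking is checked against the remaining candidates.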