Current Bioinformatics - Volume 13, Issue 4, 2018
Natural Vector Method for Virus Phylogenetic Classification: A Mini-Review
By Chenglong Yu
Background: Existing alignment-based phylogenetic methods remain computationally arduous, and even infeasible, for large numbers of viral genetic sequences. Objective: Alignment-free methodologies, which overcome serious limitations of alignment-based approaches, especially in computation time and storage space, have been proposed in quick succession. Methods: The natural vector method is a new alignment-free approach for studying phylogenetics. Because of its simplicity compared with alignment-based approaches, the method has recently been applied successfully to virus phylogenetic classification. Results: The natural vector method allows us to identify phylogenetic relationships for single-segmented and multiple-segmented viruses at different classification levels such as family, subfamily, genus, species and Baltimore class. The method has high speed and accuracy. Conclusion: Here we review the development of the natural vector method and discuss its applications to virus phylogenetic classification.
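The abstract does not define the natural vector, so the following is only a minimal sketch based on the commonly cited 12-dimensional form for nucleotide sequences: for each base, its count, its mean position, and a normalized second central moment of its positions. The function names `natural_vector` and `euclidean`, the 1-based position convention, and the choice of Euclidean distance are assumptions made for illustration, not details taken from the review.

```python
import math

def natural_vector(seq, alphabet="ACGT"):
    """Return [counts..., mean positions..., second moments...] for seq.

    Assumed definition (not stated in the abstract): for base k with
    positions p_1..p_nk in a sequence of length n,
      mu_k = mean(p_i),  D2_k = sum((p_i - mu_k)^2) / (n_k * n).
    """
    seq = seq.upper()
    n = len(seq)
    positions = {base: [] for base in alphabet}
    for i, base in enumerate(seq, start=1):  # 1-based positions
        if base in positions:
            positions[base].append(i)

    counts, means, moments = [], [], []
    for base in alphabet:
        pos = positions[base]
        n_k = len(pos)
        mu_k = sum(pos) / n_k if n_k else 0.0
        d2_k = sum((p - mu_k) ** 2 for p in pos) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments

def euclidean(u, v):
    """Euclidean distance between two natural vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

if __name__ == "__main__":
    v1 = natural_vector("ACGTACGT")
    v2 = natural_vector("AACCGGTT")
    print(euclidean(v1, v2))
```

Because every sequence maps to a fixed-length vector, pairwise distances can be computed without aligning the sequences, which is where the speed advantage described in the abstract comes from.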
A Review of Biological Image Analysis
Authors: Weiyang Chen, Weiwei Li, Xiangjun Dong and Jialun Pei
Background: In recent years, an increasing number of researchers have applied bioimaging techniques to generate a myriad of biological images. The growing volume of image data poses great methodological challenges for image processing and quantitative analysis. The analyses of biological images range from the quantification of phenotypes to the visualization of biological structures. Objective: Extracting accurate, high-throughput and quantitative biological phenotypes from images is becoming an important technique in many labs. As more and more phenotype and genotype data are generated, making better use of these data and mining the connections between them become important. This article provides an overview of the major studies based on biological images.
Recent Progress in Long Noncoding RNAs Prediction
Authors: Yuhua Yao, Xianhong Li, Lili Geng, Xuying Nan, Zhaohui Qi and Bo Liao
Background: As potent gene regulators, long noncoding RNAs (lncRNAs) are critical in various biological activities, such as cellular processes. With the development of new sequencing technologies, vast amounts of transcriptome data are available, which require efficient computational tools to distinguish noncoding RNAs, especially lncRNAs, from their coding counterparts. Methods: In this paper, we review the advancement of computational methods for predicting lncRNAs, summarize the difficulties in developing machine learning algorithms, and point out a few promising future directions. We also briefly summarize and describe popular software tools and web servers in the area. Results and Conclusion: Given the exponentially expanding transcriptome data and the increasing importance of lncRNAs in disease development and treatment, novel and effective computational tools for identifying lncRNAs are in high demand.
The Advances and Challenges of Deep Learning Application in Biological Big Data Processing
Authors: Li Peng, Manman Peng, Bo Liao, Guohua Huang, Weibiao Li and Dingfeng Xie
Background: Bioinformatics research has entered an era of big data. Mining the potential value of biological big data is of vital significance for scientific research and health care. Deep learning, a new family of machine learning algorithms built on big data and high-performance distributed parallel computing, shows excellent performance in biological big data processing. Objective: To provide a valuable reference for researchers who use deep learning to process large biological data. Methods: This paper introduces new models of data storage and computational facilities for big data analysis. It then summarizes the application of deep learning in three areas: biological omics data processing, biological image processing and biomedical diagnosis. To address the problem of processing large biological data, methods for accelerating deep learning models are described. Conclusion: The paper summarizes the new storage modes, the existing methods and platforms for biological big data processing, and the progress and challenges of applying deep learning to biological big data processing.
A Review of Epidemic Models Related to Meteorological Factors
Authors: Yuewu Liu, Yan Peng, Qingfan Li and Xiangquan Xiong
Background: An epidemic can spread rapidly among a large number of people in a community within a short period of time. Some infectious diseases, including influenza, hand, foot and mouth disease, dengue and meningitis, are temporally limited by variations in meteorological factors such as sunshine, temperature, humidity, rainfall, atmospheric pressure and wind speed. Therefore, it is necessary to predict outbreaks of these infectious diseases based on meteorological factors. Objective: To review various epidemic models related to meteorological factors. Results: We discuss two kinds of epidemic models: deterministic models and stochastic models. The deterministic models include the switched SIR model, the seasonal SIR model, the periodic SEIR system and the seasonal SEIQR model. The stochastic models include multiple regression models, the autoregressive moving average model, the autoregressive distributed lag model, time series Poisson regression models and generalized additive models. Furthermore, we introduce the latest applications of each of these models. Conclusion: These deterministic and stochastic models can successfully predict disease outbreaks using meteorological factors, and all of them are now widely used in the field. However, few meteorological factors are used in these models. With the development of meteorological science, large amounts of meteorological data will be obtained, and more key meteorological factors causing epidemics will be identified. In the future, therefore, more key meteorological factors will be considered in the models, which will further improve forecast accuracy.
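To make the "seasonal SIR model" mentioned above concrete, here is a minimal sketch under stated assumptions: the transmission rate is taken to vary sinusoidally over a one-year period, and the equations are integrated with a simple forward Euler step. The function name `seasonal_sir`, the cosine forcing, and all parameter values are hypothetical choices for illustration; the review may use a different forcing term driven by actual meteorological series.

```python
import math

def seasonal_sir(s0, i0, r0, beta0, amplitude, gamma,
                 days, dt=0.1, period=365.0):
    """Integrate a seasonally forced SIR model with forward Euler.

    Assumed form (illustrative only):
      beta(t) = beta0 * (1 + amplitude * cos(2*pi*t / period))
      dS/dt = -beta(t) * S * I / N
      dI/dt =  beta(t) * S * I / N - gamma * I
      dR/dt =  gamma * I
    Returns a list of (t, S, I, R) tuples.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(steps):
        t = step * dt
        beta = beta0 * (1.0 + amplitude * math.cos(2.0 * math.pi * t / period))
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append(((step + 1) * dt, s, i, r))
    return trajectory

if __name__ == "__main__":
    traj = seasonal_sir(990, 10, 0, beta0=0.3, amplitude=0.2,
                        gamma=0.1, days=200)
    peak_time, _, peak_infected, _ = max(traj, key=lambda row: row[2])
    print(f"peak of {peak_infected:.1f} infected at day {peak_time:.1f}")
```

Forward Euler is used only to keep the sketch dependency-free; a production model would normally use an adaptive ODE solver.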
Methods for Mining Single Nucleotide Polymorphism Data of Complex Diseases
By Xiong Li
Background: A key goal of mining single nucleotide polymorphism data of complex diseases (CD) is to build models that provide fundamental insight into the genetic variations of CD. With such models, we can predict disease risk and clinical outcomes and ultimately understand the mechanisms of CD development and progression. As the technologies of omics data generation and computer science advance, the reductionist paradigm of the genome-wide association study becomes less prevalent. Conclusion: In this review, we summarize the different strategies for boosting the power of association studies, which include data quality improvement, high-performance computing platforms and advanced computational methods. Using these complementary approaches, the fundamental mechanisms by which genomic variations affect the occurrence and development of CD may be uncovered.
A Review of Computational Approaches to Predict Gene Functions
Background: Recently, novel high-throughput biotechnologies have provided rich data about different genomes. However, manual annotation of gene function is time-consuming, very expensive, and infeasible for the growing amounts of data. At present, numerous gene functions in certain species remain unknown or only partially known. Hence, the use of computational approaches to predict gene function is becoming widespread. Computational approaches save time and cost less, and the predictions they provide can serve as hypotheses to drive the biological validation of gene function. Objective: This paper reviews computational approaches such as the support vector machine, clustering, hierarchical ensemble and network-based approaches. Methods: Comparisons between these approaches are made in the discussion section. Results: In addition, the advantages and disadvantages of these computational approaches are discussed. Conclusion: With the emergence of omics data, the field of gene function prediction should continue to focus on integrating newly added data.
Feature Extractions for Computationally Predicting Protein Post-Translational Modifications
Authors: Guohua Huang and Jincheng Li
Background: Post-translational modifications (PTMs) are a key regulatory mechanism in cellular processes. It is important to identify PTMs quickly and accurately. Both next-generation sequencing and bioinformatics techniques have greatly facilitated the discovery of PTMs. Most bioinformatics techniques follow the machine learning framework, in which feature extraction occupies a key position. Conclusion: The article focuses mainly on reviewing the various features extracted from protein sequence, structure, function, physicochemical and biochemical properties and evolutionary conservation that have been used for predicting PTMs in machine learning-based methods. Binary encoding, amino acid composition, pseudo amino acid composition, composition of k-spaced amino acid pairs, autocorrelation functions, position weight amino acid composition and position-specific amino acid propensity extract features directly from protein sequences. Encoding based on grouped weight is a hybrid form of feature extraction that integrates information on physicochemical and biochemical properties with information on sequences. Information on protein structure, especially secondary structure, accessible surface area and disorder, has been used for encoding proteins. Feature extraction from evolutionary conservation includes the position-specific scoring matrix and the k-nearest neighbor score. In addition, we discuss some existing problems in feature extraction.
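Three of the sequence-derived features named above are simple enough to sketch directly. The code below is an illustrative sketch, not the implementation from any tool the review covers; the function names, the 20-letter alphabet order, and the normalization by the number of pairs are assumptions. It shows binary (one-hot) encoding, amino acid composition, and the composition of k-spaced amino acid pairs for a peptide window.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues

def binary_encoding(peptide):
    """One-hot encode each residue as a 20-dimensional 0/1 vector."""
    vec = []
    for res in peptide:
        vec.extend(1 if res == aa else 0 for aa in AMINO_ACIDS)
    return vec

def amino_acid_composition(peptide):
    """Fraction of each of the 20 amino acids in the peptide."""
    return [peptide.count(aa) / len(peptide) for aa in AMINO_ACIDS]

def k_spaced_pair_composition(peptide, k):
    """Frequencies of residue pairs separated by exactly k residues.

    Returns a 400-dimensional vector ordered as AA, AC, ..., YY.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(peptide) - k - 1  # number of k-spaced pairs in the window
    for i in range(total):
        pair = peptide[i] + peptide[i + k + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    return [counts[p] / total if total > 0 else 0.0 for p in pairs]

if __name__ == "__main__":
    window = "MKSAYKLT"  # hypothetical peptide window around a site
    features = (binary_encoding(window)
                + amino_acid_composition(window)
                + k_spaced_pair_composition(window, k=1))
    print(len(features))
```

In practice such vectors are concatenated, as in the usage example, and passed to a classifier; the review's point is that the choice of features matters as much as the choice of learner.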
Inter-Species/Host-Parasite Protein Interaction Predictions Reviewed
Authors: Jumoke Soyemi, Itunnuoluwa Isewon, Jelili Oyelade and Ezekiel Adebiyi
Background: Host-parasite protein interactions (HPPI) are interactions occurring between a parasite and its host. Studying them enhances the understanding of how a parasite can infect its host. These interactions play an important role in initiating infections, although not all host-parasite interactions result in infection. Identifying the protein-protein interactions (PPIs) that allow a parasite to infect its host has a lot to do with discovering possible drug targets. Such PPIs, when altered, would prevent the host from being infected by the parasite and, in some cases, leave the parasite unable to complete specific stages of its life cycle, invariably leading to its death. It is therefore important to understand the workings of host-parasite interactions, which are the major causes of most infectious diseases. Objective: Many studies in the literature have sought to predict HPPI, mostly using computational methods, with few using experimental methods. Computational methods have proved to be faster and more efficient in manipulating and analyzing real-life data. This study looks at the various computational methods used in the literature for host-parasite/inter-species protein-protein interaction prediction, with the aim of gaining better insight into the methods used and identifying whether machine learning approaches have been extensively used for this purpose. Methods: The various methods involved in host-parasite protein interaction prediction were reviewed along with their individual strengths. Studies that carried out host-parasite/inter-species protein interaction predictions were tabulated, with analysis of their predictive methods, the filters used, the potential protein-protein interactions discovered and, where applicable, the validation measurements used. The commonly used measurement indexes for such studies were highlighted, together with their formulas. Finally, future prospects for studies specific to human-Plasmodium falciparum PPI prediction were proposed. Result: We discovered that quite a few of the studies reviewed implemented a machine learning approach for HPPI prediction when compared with methods such as sequence homology search and protein structure and domain-motif methods. The key challenge noted in HPPI prediction is obtaining relevant information. Conclusion: This review presents useful knowledge and future directions on the subject matter.
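The abstract does not say which measurement indexes the review tabulates. As an illustration only, the sketch below computes the standard confusion-matrix metrics that are typically reported for binary interaction prediction; the function name `classification_metrics` and the convention of returning 0.0 for an undefined ratio are assumptions.

```python
import math

def _ratio(numerator, denominator):
    """Safe division: return 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive/negative counts."""
    sensitivity = _ratio(tp, tp + fn)            # recall, true positive rate
    specificity = _ratio(tn, tn + fp)            # true negative rate
    precision = _ratio(tp, tp + fp)              # positive predictive value
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    f1 = _ratio(2 * tp, 2 * tp + fp + fn)
    mcc = _ratio(tp * tn - fp * fn,
                 math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }

if __name__ == "__main__":
    # Hypothetical counts, not results from any study in the review.
    for name, value in classification_metrics(tp=8, fp=2, tn=9, fn=1).items():
        print(f"{name}: {value:.3f}")
```

The Matthews correlation coefficient is included because interaction datasets are usually heavily imbalanced, and accuracy alone can be misleading in that setting.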
GENIRF: An Algorithm for Gene Regulatory Network Inference Using Rotation Forest
Authors: Jamshid Pirgazi, Ali R. Khanteymoori and Maryam Jalilkhani
Background: A central problem of systems biology is the reconstruction of the topology of gene regulatory networks (GRNs) using high-throughput genomic data such as microarray gene expression data. The main challenge with gene expression data is that the number of genes is high, the number of samples is low, and the data are often contaminated with noise. Objective: In this paper, we present a method for Gene Regulatory Network Inference using Rotation Forest (GENIRF). Methods: The rotation forest exploits the embedded variable ranking mechanism of tree-based ensemble methods together with dimension reduction. This combination addresses the main challenge with gene expression data. GENIRF decomposes the prediction of a gene regulatory network between p genes into p different regression problems. Each regression problem is constructed with a transformed expression pattern and a rotation forest. The expression pattern of the target gene is predicted from the expression patterns of all the remaining genes, using the rotation forest. Results: GENIRF makes no hypotheses regarding the nature of gene regulation, so it can identify combinatorial and non-linear interactions in a GRN. Experimental results on the DREAM4 in silico multifactorial challenge simulated data indicate that GENIRF has better accuracy and compares favorably with existing well-known algorithms. Furthermore, it is a fast and scalable method. Conclusion: GENIRF shows high performance on this benchmark across different performance metrics, and its overall score is slightly better than that of the other methods. We have also shown that dimension reduction of the gene expression data can further improve the performance of GENIRF and other methods. In addition, GENIRF is competitive in computational efficiency, especially compared with ensemble methods, and for big data our method can be easily parallelized.
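The decomposition into p regression problems can be sketched structurally. The code below is not GENIRF: a rotation forest is not available in the Python standard library, so the per-target scorer here is a deliberately simple stand-in (absolute Pearson correlation) that cannot capture the combinatorial or non-linear effects the abstract describes. Only the outer loop, one sub-problem per target gene producing a ranked list of candidate regulator-to-target links, reflects the abstract. All names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def stand_in_importance(regulator_values, target_values):
    """STAND-IN scorer: GENIRF would use rotation forest importances here."""
    return abs(pearson(regulator_values, target_values))

def rank_regulatory_links(expression, importance=stand_in_importance):
    """Rank candidate links (regulator, target, weight), strongest first.

    expression: dict mapping gene name -> list of values across samples.
    """
    genes = list(expression)
    links = []
    for target in genes:  # one sub-problem per target gene
        for regulator in genes:
            if regulator == target:
                continue
            weight = importance(expression[regulator], expression[target])
            links.append((regulator, target, weight))
    links.sort(key=lambda link: link[2], reverse=True)
    return links

if __name__ == "__main__":
    # Hypothetical toy expression matrix (3 genes, 4 samples).
    toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.1], "g3": [5, 1, 4, 2]}
    for regulator, target, weight in rank_regulatory_links(toy)[:3]:
        print(regulator, "->", target, round(weight, 3))
```

Because each target gene is handled independently, the outer loop is trivially parallel, which matches the abstract's remark that the method can be easily parallelized.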
The Topologically Associated Domains (TADs) of a Chromatin Correlated with Isochores Organization of a Genome
Authors: Abraham A. Labena, Hai-Xia Guo, Chuan Dong, Li Li and Feng-Biao Guo
Background: Recent studies suggest that a one-dimensional genomic feature, the isochore, underlies the three-dimensional chromatin architecture. It has also been reported that open chromatin fibers originate from regions of high gene density, while closed chromatin corresponds to chromosomal regions of low gene density. Objective: To verify how the 3D architecture of chromatin is linked to one-dimensional genomic features, we analyzed the degree of overlap between isochore segments and TADs in Drosophila melanogaster. Method: Using an R program, we measured the percentage overlap between isochore segments and TADs on chromosomes 2L, 2R, 3L, 3R and X of Drosophila melanogaster and performed a Monte Carlo randomization test to check whether the observed overlap could arise by chance alone. Results: The overlap ratio between the two genomic features ranged from 71.82% on chromosome X to 79.06% on chromosome 2R when TAD ranges were mapped onto isochores, and from 66.97% on chromosome X to 74.14% on chromosome 2R when isochore ranges were mapped onto TAD ranges in D. melanogaster. Conclusion: The probability test on 1000 random samples for each chromosome demonstrated a statistically significant correlation between isochores and TADs (at an FDR of 0.05) that could not be obtained by chance alone. In general, this finding has interesting implications for deducing the 3D structure of chromatin from isochore organization, especially in genomes that lack TADs.
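The overlap measurement and randomization test can be illustrated with a small sketch. The authors used an R program whose details the abstract does not give, so everything below is an assumption: intervals are half-open `(start, end)` pairs, the overlap is the percentage of bases in one interval set covered by the other, and the null distribution comes from circularly shifting one set along the chromosome. The function names and the toy coordinates are hypothetical.

```python
import random

def overlap_percent(query, reference):
    """Percentage of bases in `query` intervals covered by `reference`.

    Assumes the intervals within each set do not overlap one another.
    """
    covered = sum(max(0, min(q_end, r_end) - max(q_start, r_start))
                  for q_start, q_end in query
                  for r_start, r_end in reference)
    total = sum(q_end - q_start for q_start, q_end in query)
    return 100.0 * covered / total

def shift_circular(intervals, offset, length):
    """Shift intervals by `offset`, wrapping around the chromosome end."""
    shifted = []
    for start, end in intervals:
        new_start = (start + offset) % length
        new_end = new_start + (end - start)
        if new_end <= length:
            shifted.append((new_start, new_end))
        else:  # interval wraps: split it into two pieces
            shifted.append((new_start, length))
            shifted.append((0, new_end - length))
    return shifted

def randomization_test(query, reference, length, n_iter=1000, seed=0):
    """Return (observed overlap %, one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = overlap_percent(query, reference)
    hits = 0
    for _ in range(n_iter):
        randomized = shift_circular(reference, rng.randrange(length), length)
        if overlap_percent(query, randomized) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_iter + 1)

if __name__ == "__main__":
    # Hypothetical toy coordinates on a 400-unit chromosome.
    tads = [(0, 100), (200, 300)]
    isochores = [(10, 100), (210, 300)]
    print(randomization_test(tads, isochores, length=400))
```

Swapping the `query` and `reference` arguments gives the reverse mapping, which is why the abstract reports two sets of percentages for each chromosome.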
Inference of Transcriptional Regulation from Expression Data Using Model Integration
Authors: Long Wang, Jing Guo, Ji-Wei Chang, Muhammad Tahir ul Qamar and Ling-Ling Chen
Background: The rapid accumulation of genomic and transcriptomic data has prompted the development of computational methods to identify regulatory relationships between transcription factors (TFs) and genes. However, available methods display high false-positive rates and unstable performance across different networks because of their preferences for interactions with certain features. Model integration can reduce the biases of these methods and improve specificity, especially for pairwise methods whose correlations are very low. Objective: To compare different integration methods and identify the best one. Method: We integrated 14 different models, categorized into five major groups (regression, mutual information, correlation, Bayesian and others), to predict simulated regulatory networks extracted from Escherichia coli at two different scales. Results: We found that the support vector regression (SVR) method achieved the highest precision. Another method, Cubist, was less precise than SVR but much more efficient, especially in time cost. This conclusion was also confirmed by simulated expression data from an in silico Saccharomyces cerevisiae network at three different scales and by real expression data from the sub-network of the SOS DNA repair system. We applied SVR to construct the network orchestrating cell envelope stress in B. licheniformis and found that the predicted network was consistent with the results of previous studies. Conclusion: This study implemented and compared different integration methods and found that SVR best meets the demand for higher precision, followed by Cubist. Integration can provide clearer insights into the transcriptional architecture.