Current Bioinformatics - Volume 16, Issue 6, 2021
Bioinformatics Tools and Databases for Genomics-assisted Breeding and Population Genetics of Plants: A Review
Authors: Supriya B. Aglawe, Amit Kumar Verma and Atul Kumar Upadhyay
Genomics is the study of the complete genetic material of an organism. It is no exaggeration to say that we are at the peak of the genomics era: with the advent of high-throughput sequencing technologies, an enormous amount of genomic data is generated every day. Genomics-assisted breeding (GAB) is becoming increasingly popular in the field of crop improvement. GAB uses the available genomic information on crops and their relatives in plant breeding to produce improved crop varieties. Proper knowledge of the relevant bioinformatics tools and databases helps speed up the process of plant breeding. The available tools can be categorized into several groups, such as genetic diversity, Quantitative Trait Locus (QTL)/gene mapping, Next-Generation Sequencing (NGS)-based Single Nucleotide Polymorphism (SNP) genotyping, molecular breeding, Genome-Wide Association Studies (GWAS), Genomic Selection (GS), Marker-Assisted Recurrent Selection (MARS), and Multiparent Advanced Generation Inter-Cross (MAGIC). Most of the available tools are user-friendly; those that are not need to be updated soon. There is an urgent need to develop the scientific resources and technical expertise for the proper and effective use of these tools. In this review, we extensively cover the available tools and databases for genomics-assisted breeding and population genetics studies of plants. Details of these tools and databases, along with their web links, are also provided. We believe this review will be useful for scientists and researchers in plant breeding, population genetics, and genomics.
Intelligent Techniques Analysis for Glycosylation Site Prediction
Authors: Alhasan Alkuhlani, Walaa Gad, Mohamed Roushdy and Abdel-Badeeh M. Salem
Background: Glycosylation is one of the most common post-translational modifications (PTMs) in the cells of organisms. It plays important roles in several biological processes, including cell-cell interaction, protein folding, antigen recognition, and immune response. In addition, glycosylation is associated with many human diseases, such as cancer, diabetes, and coronavirus infections. The experimental techniques for identifying glycosylation sites are time-consuming, labor-intensive, and expensive. Therefore, computational intelligence techniques are becoming very important for glycosylation site prediction. Objective: This paper is a theoretical discussion of the technical aspects of applying biotechnological approaches (e.g., artificial intelligence and machine learning) to digital bioinformatics research and intelligent biocomputing. Computational intelligence techniques have shown efficient results for predicting N-linked, O-linked, and C-linked glycosylation sites. In the last two decades, many studies have used these techniques for glycosylation site prediction. In this paper, we analyze and compare a wide range of the intelligent techniques used in these studies from multiple aspects. The current challenges and difficulties facing software developers and knowledge engineers in predicting glycosylation sites are also discussed. Methods: The studies are compared on many criteria, such as databases, feature extraction and selection, machine learning classification methods, evaluation measures, and performance results. Results and Conclusions: Many challenges and problems are presented. Consequently, more effort is needed to obtain more accurate prediction models for the three basic types of glycosylation sites.
Chemical Genetic Validation of GWAS-derived Disease Loci
Authors: Yuan Quan and Hong-Yu Zhang
Background: Genome-wide association studies (GWAS) have opened the door to unprecedented large-scale identification of susceptibility loci for human diseases and traits. However, it is still a great challenge to validate these loci and to elucidate how these sequence variants give rise to genetic and phenotypic changes. Because many drug targets are genetic disease genes, and the general drug mode of action (MoA, agonist or antagonist) is in line with the consequence of target gene mutations (loss-of-function (LOF) or gain-of-function (GOF)), we propose a chemical genetic method to address these issues of GWAS. Objective: This study intends to use chemical genetics information to validate GWAS-derived disease loci and interpret their underlying pathogenesis. Methods: We conducted a comprehensive comparative analysis of GWAS data and drug/target information (chemical genetics information). Results: We identified hundreds of GWAS-derived disease loci that are linked to drug target genes and have matched disease traits and drug indications. Notably, more than 40% of the genes have been recognized as disorder factors, indicating the potential power of chemical genetic validation. The pathogenesis of these loci was inferred from the corresponding drug MoA. Some inferences were supported by prior experimental observations; some were interpreted in terms of microRNA regulation, codon usage bias, and transcriptional regulation, in particular the variation in transcription factor-binding affinity induced by disease-causing mutations. Conclusion: In summary, chemical genetics information is useful for validating GWAS-derived disease loci and for interpreting their underlying pathogenesis, which has important implications not only for medical genetics but also for the methodological evaluation of GWAS.
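The matching step described in this abstract can be illustrated with a minimal sketch. It assumes, as one reading of the abstract, that a drug counteracts the genetic defect, so an agonist points to a loss-of-function variant and an antagonist to a gain-of-function variant. The function name, record layout, and gene/trait labels are hypothetical placeholders, not the authors' pipeline or data.

```python
# Hypothetical illustration: infer the likely effect of a disease-associated
# variant from the mode of action (MoA) of a drug that treats the same disease.

# Assumption: a drug counteracts the defect, so an agonist suggests
# loss-of-function (LOF) and an antagonist suggests gain-of-function (GOF).
MOA_TO_MUTATION_EFFECT = {"agonist": "LOF", "antagonist": "GOF"}


def infer_pathogenesis(gwas_hits, drug_records):
    """Return (gene, trait, inferred effect) for loci whose gene is a drug
    target and whose trait matches the drug indication."""
    inferred = []
    for gene, trait in gwas_hits:
        for target, indication, moa in drug_records:
            if gene == target and trait.lower() == indication.lower():
                effect = MOA_TO_MUTATION_EFFECT.get(moa.lower())
                if effect is not None:
                    inferred.append((gene, trait, effect))
    return inferred


# Toy, made-up records (placeholders, not data from the study).
gwas_hits = [("GENE_A", "trait 1"), ("GENE_B", "trait 2"), ("GENE_C", "trait 3")]
drug_records = [
    ("GENE_A", "Trait 1", "antagonist"),
    ("GENE_B", "trait 9", "agonist"),  # indication does not match the trait
    ("GENE_C", "trait 3", "agonist"),
]

results = infer_pathogenesis(gwas_hits, drug_records)
```

Only loci with both a target match and a trait/indication match are kept, which mirrors the "matched disease traits and drug indications" criterion in the abstract.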
Predicting Interactions Between Pathogen and Human Proteins Based on the Relation Between Sequence Length and Amino Acid Composition
Authors: Saud Alguwaizani, Shulei Ren, De-Shuang Huang and Kyungsook Han
Aim: Both bacterial and viral infections involve a large number of protein-protein interactions (PPIs) between a pathogen and its target host. Background: So far, many computational methods have focused on predicting PPIs within the same species rather than PPIs across different species. Methods: From an extensive analysis of PPIs between Yersinia pestis bacteria and humans, we recently discovered an interesting relation: a linear relation between amino acid composition and sequence length was observed in many proteins involved in PPIs. We built a support vector machine (SVM) model that predicts PPIs between humans and bacteria using two feature types derived from this relation. The two feature types used in the SVM are the amino acid composition group (AACG) and the difference in amino acid composition between host and pathogen proteins. Results: The SVM model achieved high performance in predicting bacteria-human PPIs. The model showed an accuracy of 96%, a sensitivity of 94%, and a specificity of 98% in predicting PPIs between humans and Yersinia pestis, for which there is a strong relation between amino acid composition and sequence length. The SVM model was also tested on PPIs between humans and viruses, including Ebola, HCV, and SARS-CoV-2, and showed good performance. Conclusion: The feature types identified in our study are simple yet powerful in predicting pathogen-human PPIs. Although preliminary, our method will be useful for finding unknown target host proteins or pathogen proteins and for designing in vitro or in vivo experiments.
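One of the two feature types named in this abstract, the difference in amino acid composition between host and pathogen proteins, can be sketched as follows. This is a minimal illustration under assumptions: the function names and toy sequences are invented, the difference is taken as host minus pathogen, and the AACG feature is not shown because the abstract does not define how it is built.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in a protein sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def pair_features(host_seq, pathogen_seq):
    """Composition difference (host minus pathogen) for one protein pair."""
    host = amino_acid_composition(host_seq)
    pathogen = amino_acid_composition(pathogen_seq)
    return [h - p for h, p in zip(host, pathogen)]


# Toy sequences, invented for illustration only.
features = pair_features("MKKLAAG", "MAAAGGK")
```

A vector like `features` (20 values per protein pair) is the kind of input that could be passed to an SVM classifier; because each composition sums to one, the differences always sum to zero.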
Prediction of Microbe-drug Associations Based on Chemical Structures and the KATZ Measure
Authors: Lingzhi Zhu, Guihua Duan, Cheng Yan and Jianxin Wang
Background: Microbial communities have important influences on human health and disease. Identifying potential human microbe-drug associations will greatly help in exploring the complex mechanisms of microbes in drug discovery, combination, and repositioning. The complex mechanism of microbe-drug associations remains unknown to date. Objective: Computational models play an important role in discovering hidden microbe-drug associations because biological experiments are time-consuming and expensive. Based on the chemical structures of drugs and the KATZ measure, a new computational model (HMDAKATZ) is proposed for identifying potential Human Microbe-Drug Associations. Methods: In HMDAKATZ, the similarity between microbes is computed using the Gaussian Interaction Profile (GIP) kernel based on known human microbe-drug associations. The similarity between drugs is computed based on known human microbe-drug associations and chemical structures. Then, a microbe-drug heterogeneous network is constructed by integrating the microbe-microbe network, the drug-drug network, and the known microbe-drug association network. Finally, we apply KATZ to identify potential associations between microbes and drugs. Results: The experimental results showed that HMDAKATZ achieved area under the curve (AUC) values of 0.9010±0.0020, 0.9066±0.0015, and 0.9116 in 5-fold cross-validation (5-fold CV), 10-fold cross-validation (10-fold CV), and leave-one-out cross-validation (LOOCV), respectively, outperforming four other computational models (SNMF, RLS, HGBI, and NBI). Conclusion: HMDAKATZ obtained better prediction performance than the four other methods in 5-fold CV, 10-fold CV, and LOOCV. Furthermore, three case studies illustrated that HMDAKATZ is an effective way to discover hidden microbe-drug associations.
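The pipeline in the Methods (GIP similarity, heterogeneous network, KATZ scoring) can be sketched on a toy matrix. This is an illustrative simplification, not HMDAKATZ itself: drug similarity here uses the GIP kernel only (the paper also uses chemical structures), the KATZ sum is truncated at walks of length 3, and `beta`, `gamma_prime`, `max_len`, and the association matrix are assumed values.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gip_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between rows of a 0/1 matrix."""
    norms = [sum(v * v for v in row) for row in profiles]
    mean_norm = sum(norms) / len(norms)
    gamma = gamma_prime / mean_norm if mean_norm > 0 else 1.0
    return [[math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(p, q)))
             for q in profiles] for p in profiles]


def katz_scores(assoc, microbe_sim, drug_sim, beta=0.01, max_len=3):
    """Truncated KATZ: sum of beta**l * A**l for walk lengths l = 1..max_len."""
    n_m, n_d = len(assoc), len(assoc[0])
    assoc_t = [list(col) for col in zip(*assoc)]
    # Heterogeneous adjacency: [[microbe_sim, assoc], [assoc^T, drug_sim]]
    adj = ([microbe_sim[i] + assoc[i] for i in range(n_m)]
           + [assoc_t[j] + drug_sim[j] for j in range(n_d)])
    size = n_m + n_d
    total = [[0.0] * size for _ in range(size)]
    power = adj
    for length in range(1, max_len + 1):
        if length > 1:
            power = matmul(power, adj)
        for i in range(size):
            for j in range(size):
                total[i][j] += (beta ** length) * power[i][j]
    # The microbe-drug block holds the predicted association scores.
    return [row[n_m:] for row in total[:n_m]]


# Toy known associations: 3 microbes (rows) x 2 drugs (columns).
assoc = [[1, 0], [1, 1], [0, 1]]
microbe_sim = gip_similarity(assoc)
drug_sim = gip_similarity([list(col) for col in zip(*assoc)])
scores = katz_scores(assoc, microbe_sim, drug_sim)
```

Unknown pairs such as microbe 0 and drug 1 receive a small positive score from longer walks through similar microbes and drugs, and ranking those scores is what yields candidate associations.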
Effective Classification of Melting Curve in Real-time PCR Based on Dynamic Filter-based Convolutional Neural Network
Authors: Di Gai, Xuanjing Shen and Haipeng Chen
Background: Effective classification of the melting curve helps to measure the specificity of the amplified products and to exclude the influence of invalid data on subsequent experiments. Objective: In this paper, a convolutional neural network (CNN) classification model based on a dynamic filter is proposed, which can categorize the number of peaks in a melting curve image and identify contaminated data represented by noise peaks. Methods: The main advantage of the proposed model is that it adopts a filter that changes with the input and uses this dynamic filter to capture more information in the image, making network learning more accurate. In addition, a residual module is used to extract the characteristics of the melting curve, and the pooling operation is replaced with an atrous convolution to prevent the loss of context information. Results: To train the proposed model, a novel melting curve dataset was created, which includes a balanced dataset and an unbalanced dataset. The proposed method was compared with seven representative deep learning-based methods using six classification assessment criteria. Experimental results show that the proposed method not only markedly outperforms the other state-of-the-art methods in accuracy but also requires much less running time. Conclusion: The results indicate that the proposed method is suitable for judging the specificity of amplification products according to the melting curve. It also avoids the low efficiency and human bias of manual selection.
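The idea of replacing pooling with atrous (dilated) convolution can be shown in one dimension. The sketch below is a generic illustration, not the network from the paper: the function names, the toy curve, and the all-ones kernel are assumptions. It shows that a dilated convolution widens the receptive field while keeping the output the same length as the input, whereas pooling halves the resolution.

```python
def atrous_conv1d(signal, kernel, dilation=1):
    """1-D dilated ("atrous") convolution with zero padding ("same" length).

    Cross-correlation form, as commonly used in CNN layers.
    """
    k = len(kernel)
    span = dilation * (k - 1)  # receptive field minus one
    pad_left = span // 2
    padded = [0.0] * pad_left + list(signal) + [0.0] * (span - pad_left)
    return [
        sum(kernel[j] * padded[i + j * dilation] for j in range(k))
        for i in range(len(signal))
    ]


def max_pool1d(signal, size=2):
    """Non-overlapping max pooling, shown only for comparison."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]


curve = [0.0, 0.1, 0.4, 1.0, 0.4, 0.1, 0.0, 0.0]  # toy single-peak curve
kernel = [1.0, 1.0, 1.0]

dense = atrous_conv1d(curve, kernel, dilation=1)    # sees 3 adjacent samples
dilated = atrous_conv1d(curve, kernel, dilation=2)  # sees 5-sample window
pooled = max_pool1d(curve)                          # length is halved
```

With `dilation=2`, the three kernel taps cover a five-sample window at no extra parameter cost, and every input position still has an output value.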
A Network Pharmacology Approach to Explore the Underlying Mechanism of Tufuling Qiwei Tangsan in Treating Psoriasis
Authors: Xiaolei Ma, Yinan Lu, Yang Lu and Zhili Pei
Background: Tufuling Qiwei Tangsan (TQTS) is a Mongolian medicine preparation commonly used against psoriasis in China. However, its mechanism of action and molecular targets in the treatment of psoriasis are still unclear. Network pharmacology can reveal the synergistic mechanism of drugs at the molecular, target, and pathway levels and is suitable for the complex study of traditional Chinese medicine formulations. However, it has rarely been applied to Mongolian medicine, which shares the holistic concept of traditional Chinese medicine. Methods: In this paper, the active compounds of TQTS were collected, and their targets were identified. Psoriasis-related targets were obtained by analyzing the differentially expressed genes between psoriasis patients and healthy individuals. Then, a network of the interactions between potential targets of TQTS and well-known psoriasis-related targets was built. The core targets were selected according to topological parameters, and enrichment analysis was carried out to explore the mechanism of action of TQTS. Moreover, molecular docking was performed to study the interaction between the selected ligands and receptors related to psoriasis. Result and Conclusion: Eighty-five active compounds of TQTS were screened, with 270 corresponding targets, and 313 differentially expressed genes were identified. Enrichment analysis showed that the targets of TQTS for treating psoriasis were mainly involved in multiple biological processes, including apoptosis and growth factor response, and in related pathways, including the PI3K-Akt and MAPK signaling pathways. Genes such as NFKB1, TP53, and MAPK1 are the key genes in the gene pathway network of TQTS against psoriasis. The 4 main active components of TQTS have a certain binding activity with 13 potential targets, and their interaction with AKT1 was found to be the most stable, which indicates the potential mechanism of TQTS in psoriasis.
Development of a Gene Expression Panel, for the Prediction of Protein Abundances in Cancer Cell Lines
Authors: Gunhee Lee, Yeun-Jun Chung and Minho Lee
Background: Because mRNA expression is easier to quantify than protein abundance, many studies have used it to infer the quantity of protein products. However, the mRNA expression values for a gene and the abundance of its protein products are not known to have a strong relationship, because of the complex mechanisms that regulate protein levels, from translation to post-translational modifications. Methods: In this study, we developed models to predict protein levels from mRNA expression levels using the transcriptome and reverse phase protein array (RPPA)-based protein levels in pan-cancer cell lines. When predicting the abundance of a protein, in addition to using the RNA expression of the corresponding gene, we also used the RNA expression levels of a particular set of other genes. By applying support vector regression, we identified a 47-gene expression panel that contributes to improved prediction performance, along with its optimal subsets specific to each protein species. Result and Conclusion: Our final prediction models doubled the number of predictable proteins (r > 0.7). Although our model has some limitations owing to the weaknesses of RPPA, we expect that these prediction models and the panel can be widely used in the future to infer protein abundances.
Development of Machine Learning Based Blood-brain Barrier Permeability Prediction Models Using Physicochemical Properties, MACCS and Substructure Fingerprints
Authors: Deeksha Saxena, Anju Sharma, Mohammed H. Siddiqui and Rajnish Kumar
Background: The Blood-Brain Barrier (BBB) protects the central nervous system from the systemic circulation and maintains the homeostasis of the brain. BBB permeability is one of the essential characteristics of drugs acting on the central nervous system, indicating whether or not a drug can reach the brain. The available laboratory methods for determining BBB permeability are accurate but expensive and time-consuming. Therefore, many attempts have been made over the years to predict the BBB permeability of compounds using computational approaches. However, the accuracy of these prediction models on external datasets has always been an issue. Objective: To develop a machine learning-based BBB permeability prediction model using physicochemical properties and molecular fingerprints. Methods: Support vector machine (SVM), k-nearest neighbor (kNN), random forest (RF), and Naïve Bayes (NB) algorithms were applied to a large dataset of 1978 compounds using 1917 feature vectors containing physicochemical properties, MACCS fingerprints, and substructure fingerprints to predict BBB permeability. Results and Discussion: A comparative analysis of the performance metrics of the developed models suggested that SVM with the radial basis function kernel performed better than the kNN, RF, and NB algorithms. The accuracy of the SVM-based BBB permeability prediction model was 96.77%. The prediction performance of the model developed in this study was found to be better than that of existing machine learning-based BBB permeability prediction models. Conclusion: The prediction model developed in this study could be useful for screening compounds based on their BBB permeability at the preliminary stages of drug design and development.
Prediction of Protein-protein Interactions in Arabidopsis thaliana Using Partial Training Samples in a Machine Learning Framework
Authors: Fee F. Ahmed, Mst. Shamima Khatun, Md. Parvez Mosharaf and Md. N. H. Mollah
Background: Protein-protein interactions (PPIs) play a vital role in a wide range of biological processes, from cell-cell interactions to developmental control, in all organisms. However, experimental identification of PPIs is often laborious, time-consuming, and costly compared to computational prediction. There are several computational prediction models in the literature based on complete training samples, but none of them deals with partial training samples. Objective: The objective of this work was to develop an effective PPI prediction model for Arabidopsis thaliana using partial training samples in a machine learning framework. Methods: We proposed an effective computational PPI prediction model by combining a random forest (RF) classifier and autocorrelation (AC) sequence encoding features with a 1:2 ratio of positive-PPI and unknown-PPI samples. Results: We observed that the proposed prediction model produces the highest average performance scores of sensitivity (94.62%), AUC (0.92), and pAUC (0.189) with the training datasets, and sensitivity (88.14%), AUC (0.89), and pAUC (0.176) with the test datasets of 5-fold cross-validation, compared to other candidate predictors based on LDA, LOGI, ADA, NB, KNN, and SVM classifiers. It also achieved the highest performance scores of TPR (91.82%) and pAUC (0.174) at FPR = 20%, with AUC (0.948), compared to other candidate predictors. Conclusion: The overall performance of the developed model shows that our proposed predictor might be useful for elucidating the biological function of unseen PPIs from a large number of candidate proteins in Arabidopsis thaliana.
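The AUC and partial AUC (pAUC) figures reported in this abstract can be related with a short sketch. It assumes pAUC is the unnormalized area under the ROC curve for false positive rates from 0 to a cutoff, which is consistent with the reported values staying below 0.2 at FPR = 20%; the function names, labels, and scores are invented for illustration.

```python
def roc_points(labels, scores):
    """ROC curve points (FPR, TPR), sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points, tp, fp = [(0.0, 0.0)], 0, 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def partial_auc(points, max_fpr=1.0):
    """Trapezoidal area under the ROC curve for FPR in [0, max_fpr]."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # clip the last segment by linear interpolation
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy labels (1 = interacting pair) and classifier scores, made up here.
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
scores = [0.95, 0.9, 0.85, 0.8, 0.6, 0.55, 0.4, 0.3, 0.2, 0.1]

points = roc_points(labels, scores)
auc = partial_auc(points)                 # full area, FPR from 0 to 1
pauc = partial_auc(points, max_fpr=0.2)   # area restricted to FPR <= 20%
```

Under this definition a perfect classifier would reach a pAUC of 0.2 at FPR = 20%, which gives a scale for reading values such as 0.174.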
Genome-wide Characterization Deciphers Distinct Properties of Aquaporins in Six Phytophthora Species
Background: Aquaporins, also known as major intrinsic proteins (MIPs), facilitate the membrane diffusion of water and some other small solutes. The roles of MIPs in plant physiological processes are established, and their roles in plant-pathogen interactions are now receiving more attention. Objective: To investigate the evolution, diversity, and structural features of Phytophthora MIPs (PhyMIPs) and to compare them to those in other domains of life. Methods: Bioinformatics approaches were used to identify and characterize the PhyMIPs. The phylogenetic analysis was done with MEGA7.0 using the maximum likelihood method. Transmembrane α-helices were predicted using the SOSUI and TMpred servers, and subcellular localization was predicted with WoLF PSORT and the Cello prediction system. The structure of PhyMIP genes was predicted with the GeneMark.hmm ES-3.0 program. The 3D homology models were generated using the Molecular Operating Environment software, and the stereochemical quality of the templates and models was assessed by PROCHECK. The PoreWalker server was used to detect and characterize PhyMIP channels from their 3D structural models. Results: We identified 17, 24, 27, 19, 19, and 22 full-length MIPs, respectively, in the genomes of six Phytophthora species: P. infestans, P. parasitica, P. sojae, P. ramorum, P. capsici, and P. cinnamomi. Phylogenetic analysis showed that the PhyMIPs formed a completely distinct clade from their counterparts in other taxa and were clustered into nine subgroups. Sequence and structural properties indicated that the primary selectivity-related constrictions, including the aromatic arginine (ar/R) selectivity filter and Froger's positions, in PhyMIPs were distinct from those in other taxa. The substitutions in the conserved Asn-Pro-Ala motifs in loops B and E of many PhyMIPs were also divergent from those in other taxonomic domains. The group-specific consensus sequences/motifs deciphered in different loops and transmembrane α-helices of PhyMIPs were distinct from those in plants, animals, and other microbes. Conclusion: This study presents PhyMIPs with distinct evolutionary and structural properties, and the data collectively indicate that PhyMIPs might have novel functions.