Current Bioinformatics - Volume 17, Issue 1, 2022
A Review: Computational Approaches to Design sgRNA of CRISPR-Cas9
Authors: Mohsin A. Nasir, Samia Nawaz and Jian Huang
Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated (Cas) proteins preserve a memory of previous encounters with invading DNA, in particular as spacers embedded between the repeats of CRISPR arrays. Understanding of this immune system and its applications has progressed rapidly; however, many open questions remain, which keeps the field an active research area. The efficiency of CRISPR-Cas depends on a well-designed single guide RNA (sgRNA), and many bioinformatics methods and tools have therefore been created to support the design of highly active and specific sgRNAs. In silico sgRNA design is a crucial step for effective gene editing with CRISPR, and continuous efforts have been made to improve it so that designed sgRNAs have high on-target efficiency and reduced off-target effects. This review summarizes CRISPR computational tools and their pros and cons to help researchers choose a suitable tool for their work, and offers new ideas for developing computational tools that overcome existing limitations.
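The candidate-generation step that sgRNA design tools share can be sketched as follows, assuming SpCas9 with a 20-nt spacer followed by an NGG PAM and scanning only the forward strand. The GC-content window (40-80%) and the 3-mismatch similarity count are illustrative heuristics chosen for this example, not rules from any specific tool in the review; real tools search whole genomes on both strands and use far richer on-target and off-target scoring models.

```python
import re

# Assumption: SpCas9-style target = 20-nt spacer immediately followed by an NGG PAM.
SITE = re.compile(r"(?=([ACGT]{20})([ACGT]GG))")  # lookahead reports overlapping sites


def find_spacers(dna):
    """Return (position, spacer, PAM) for every forward-strand NGG site."""
    dna = dna.upper()
    return [(m.start(), m.group(1), m.group(2)) for m in SITE.finditer(dna)]


def gc_content(seq):
    """Fraction of G and C bases in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def design_candidates(dna, gc_min=0.4, gc_max=0.8, max_mm=3):
    """Keep spacers inside the GC window and count similar sites (crude off-target proxy)."""
    sites = find_spacers(dna)
    keep = []
    for pos, spacer, pam in sites:
        if not gc_min <= gc_content(spacer) <= gc_max:
            continue  # hypothetical GC filter
        similar = sum(
            1 for p, s, _ in sites if p != pos and mismatches(spacer, s) <= max_mm
        )
        keep.append((pos, spacer, pam, similar))
    return keep


if __name__ == "__main__":
    dna = "AAAAA" + "GACGTTACGATCGATCGGCA" + "TGG" + "AAAAA"
    for site in design_candidates(dna):
        print(site)
```

In this toy sequence two NGG sites are found, but the one starting at position 0 has 35% GC and is filtered out, leaving the site at position 5 with no similar sites in the sequence.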
Bioinformatics Approach on Bioisosterism Softwares to be Used in Drug Discovery and Development
Background: In rational drug design, bioisosterism is a tool for improving the performance of lead compounds; it refers to substituting a molecular fragment with another that has similar physicochemical properties. This makes it possible to modulate drug properties such as absorption and toxicity, or to increase half-life. Such modulation is of pivotal importance in the discovery, development, identification, and interpretation of the mode of action of biologically active compounds. Objective: Our purpose is to review the development and application of bioisosterism in drug discovery. This study describes the history and applications of bioisosterism and the use of bioisosteric molecules to create new drugs with high binding affinity in protein-ligand complexes. Methods: Bioisosterism is an approach to the molecular modification of a prototype compound based on replacing molecular fragments with others of similar physicochemical properties; it relates to both the pharmacokinetic and pharmacodynamic phases and aims to optimize the molecules. Results: Discovery, development, identification, and interpretation of the mode of action of biologically active compounds are the most important factors in drug design, and bioisosterism is the strategy adopted for improving lead compounds. Conclusion: Bioisosterism is a major advance for obtaining new analogs of existing drugs, enabling the development of new drugs with reduced toxicity compared with existing ones. It has a wide spectrum of applications across several research areas.
A Review of DNA Data Storage Technologies Based on Biomolecules
Authors: Lichao Zhang, Yuanyuan Lv, Lei Xu and Murong Zhou
In the information age, data storage technology has become key to improving computer systems. Because traditional storage technologies cannot meet the demand for massive storage, DNA storage based on biomolecules has attracted much attention. DNA storage uses artificially synthesized deoxynucleotide chains to store and read information of all kinds, such as documents, pictures, and audio. First, the data are encoded as binary strings. Then, the four bases, A (adenine), T (thymine), C (cytosine), and G (guanine), are used to encode the corresponding binary digits, so that the data define target DNA molecules in the form of deoxynucleotide chains. Subsequently, these DNA molecules are artificially synthesized, storing the data within them. Compared with traditional storage systems, DNA storage has major advantages, such as high storage density, long duration, low hardware cost, high access parallelism, and strong scalability, which meet the demands of big data storage. This manuscript first reviews the origin and development of DNA storage technology, then introduces its storage principles, contents, and methods, and finally analyzes its development. From the initial research to the cutting edge of this field and beyond, the advantages, disadvantages, and practical applications of DNA storage technology require continuous exploration.
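The binary-to-base step described above can be sketched with the simplest possible mapping, two bits per nucleotide (00→A, 01→C, 10→G, 11→T). This mapping is an assumption for illustration; practical schemes add constraints such as GC balance, homopolymer limits, and error-correcting codes, so the sketch also reports the longest homopolymer run, one of the properties such schemes try to bound.

```python
from itertools import groupby

# Assumed 2-bits-per-base mapping; real DNA storage codes are more constrained.
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {base: bits for bits, base in BASE_FOR_BITS.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a binary string, then map each bit pair to a base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Invert encode(): bases back to bit pairs, then 8-bit groups back to bytes."""
    bits = "".join(BITS_FOR_BASE[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def max_homopolymer(seq: str) -> int:
    """Length of the longest run of one repeated base (hard to synthesize and sequence)."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


if __name__ == "__main__":
    strand = encode(b"hello")
    print(strand, max_homopolymer(strand), decode(strand))
```

For instance, the byte 0x00 encodes to AAAA, a four-base homopolymer that a constrained code would avoid.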
Integration of Multi-Omics Data Using Probabilistic Graph Models and External Knowledge
Authors: Bridget A. Tripp and Hasan H. Otu
Background: High-throughput sequencing technologies have revolutionized the ability to perform systems-level biology and to elucidate molecular mechanisms of disease through the comprehensive characterization of different layers of biological information. Integrating these heterogeneous layers can provide insight into the underlying biology, but it is challenging because the interactions among them are complex to model.
Objective: We introduce OBaNK (omics integration using Bayesian networks and external knowledge), an algorithm that models interactions between heterogeneous, high-dimensional biological data to elucidate complex functional clusters and emergent relationships associated with an observed phenotype.
Methods: Using Bayesian network learning, we modeled the statistical dependencies and interactions between lipidomics, proteomics, and metabolomics data. The strength of each learned interaction between molecules was adjusted based on external knowledge.
Results: Networks learned from synthetic datasets based on real pathways achieved an average area under the curve (AUC) of ~0.85, an improvement of ~0.23 over baseline methods. When applied to real multi-omics data collected during pregnancy, five distinct functional networks of heterogeneous biological data were identified, and the results were compared with those of other multi-omics integration approaches.
Conclusion: By integrating external knowledge, OBaNK improved the accuracy of interaction networks learned from data, identified heterogeneous functional networks in real data, and suggested potential novel interactions associated with the phenotype. These findings can guide future hypothesis generation. OBaNK source code is available at: https://github.com/bridgettripp/OBaNK.git, and a graphical user interface is available at: http://otulab.unl.edu/OBaNK.
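To make the idea of adjusting interaction strength with external knowledge concrete, here is a minimal sketch; it is not OBaNK's actual scoring function. It assumes linear Gaussian relationships, scores a candidate edge `parent -> child` by its BIC gain, and adds a hypothetical `prior` term, scaled by `weight`, for edges supported by external knowledge such as a pathway database. The names `gaussian_loglik` and `edge_score` are illustrative.

```python
import numpy as np


def gaussian_loglik(y, X=None):
    """Maximum-likelihood log-likelihood of y under a linear Gaussian model on X."""
    n = len(y)
    if X is None:
        resid = y - y.mean()  # no parents: intercept-only model
    else:
        A = np.column_stack([np.ones(n), X])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
    var = np.mean(resid ** 2)  # MLE of the residual variance
    return -0.5 * n * (np.log(2 * np.pi * var) + 1)


def edge_score(parent, child, prior=0.0, weight=1.0):
    """BIC gain of adding parent -> child, plus a weighted external-knowledge bonus."""
    n = len(child)
    n_coef = 1 if np.ndim(parent) == 1 else np.shape(parent)[1]
    gain = gaussian_loglik(child, parent) - gaussian_loglik(child)
    penalty = 0.5 * n_coef * np.log(n)  # BIC cost of the extra coefficients
    return gain - penalty + weight * prior


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, z = rng.normal(size=500), rng.normal(size=500)
    y = 2 * x + rng.normal(scale=0.5, size=500)
    print(edge_score(x, y), edge_score(z, y), edge_score(x, y, prior=1.0))
```

In a full structure-learning search, scores like these would be compared across candidate parent sets; here a prior of 1.0 simply raises the score of a knowledge-supported edge, while a truly dependent pair (x, y) already scores far above an independent one (z, y).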
Identification of Drug-Disease Associations by Using Multiple Drug and Disease Networks
Background: Drug repositioning is a new research area in drug development that aims to discover novel therapeutic uses for existing drugs. It could accelerate the development of drugs for some diseases and considerably decrease the cost. The traditional way to determine novel therapeutic uses of an existing drug is quite laborious, so computational methods offer an alternative that overcomes this drawback. Objective: This study aims to propose a novel model for identifying drug–disease associations. Methods: Twelve drug networks and three disease networks were built and fed into a powerful network-embedding algorithm, Mashup, to produce informative drug and disease features. These features were combined to represent each drug–disease association, and the classic random forest classification algorithm was used to build the model. Results: Tenfold cross-validation gave an MCC, AUROC, and AUPR of 0.7156, 0.9280, and 0.9191, respectively. Conclusion: The proposed model showed good performance. Some tests indicated that a small dimension for drug features and a large dimension for disease features were beneficial for constructing the model. Moreover, the model remained quite robust even when some drug or disease properties were unavailable.
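The abstract does not specify how drug and disease features were combined. A minimal sketch, assuming simple concatenation of a drug embedding and a disease embedding into one pair vector, is shown below together with the Matthews correlation coefficient (MCC) used in the evaluation. The feature values are made up; in the study they come from Mashup, and the pair vectors would then be passed to a random forest.

```python
import math

import numpy as np


def pair_features(drug_vec, disease_vec):
    """Represent a drug-disease pair by concatenating the two feature vectors (assumed scheme)."""
    return np.concatenate([np.asarray(drug_vec, float), np.asarray(disease_vec, float)])


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical 2-dim drug features and 3-dim disease features.
    print(pair_features([0.1, 0.2], [0.3, 0.4, 0.5]))
    print(mcc([1, 1, 0, 0], [1, 0, 0, 0]))
```

Returning 0.0 when any marginal count is zero is a common convention, since MCC is otherwise undefined in that case.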
Improved Hybrid Particle Swarm Optimizer with Sine-Cosine Acceleration Coefficients for Transient Electromagnetic Inversion
Authors: Ruiheng Li, Qiong Zhuang, Nian Yu, Ruiyou Li and Huaiqing Zhang
Background: Recently, Particle Swarm Optimization (PSO) has been increasingly used in geophysics because of its simple operation and fast convergence. Objective: However, PSO lacks population diversity and may become trapped in local optima. Hence, an Improved Hybrid Particle Swarm Optimizer with Sine-Cosine Acceleration Coefficients (IH-PSO-SCAC) is proposed and successfully applied to test functions and to nonlinear Transient Electromagnetic (TEM) inversion. Methods: A reverse learning strategy is applied to optimize population initialization. Sine-cosine acceleration coefficients are used to promote global convergence. A sine map is adopted to enhance population diversity during the search, and a mutation operator is used to reduce the probability of premature convergence. Results: IH-PSO-SCAC was applied to the test functions and to several simple layered models, with satisfactory data fit. Two inversions were carried out to test the algorithm: the first model contains an underground low-resistivity anomalous body, and the second uses measured data from a profile of the Xishan landslide in Sichuan Province. In both cases, resistivity profiles were obtained and the inverse problem was solved for verification. Conclusion: The results show that IH-PSO-SCAC is practical, can be effectively applied to TEM inversion, and outperforms other representative algorithms in stability and accuracy.
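A minimal sketch of the two ingredients that are easiest to isolate: opposition-style initialization (assumed here to be what "reverse learning" means, i.e. also evaluating the opposite point lb + ub - x and keeping the better half) and sine-cosine acceleration coefficients. The schedule c1 = delta*sin((1 - t/T)*pi/2) + rho, c2 = delta*cos((1 - t/T)*pi/2) + rho with delta = 2 and rho = 0.5, the linearly decreasing inertia weight, and the velocity clamp are assumptions for illustration, not values from the paper; the sine map and mutation steps are omitted.

```python
import numpy as np


def scac(t, t_max, delta=2.0, rho=0.5):
    """Assumed sine-cosine schedule: c1 (cognitive) shrinks, c2 (social) grows over time."""
    phase = (1 - t / t_max) * np.pi / 2
    return delta * np.sin(phase) + rho, delta * np.cos(phase) + rho


def pso_scac(f, lb, ub, n_particles=30, t_max=200, seed=0):
    """Minimize f over the box [lb, ub] with a basic PSO using SCAC coefficients."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    dim = lb.size
    # Opposition-based initialization: evaluate x and lb + ub - x, keep the best half.
    x = rng.uniform(lb, ub, size=(n_particles, dim))
    pool = np.vstack([x, lb + ub - x])
    fit = np.apply_along_axis(f, 1, pool)
    x = pool[np.argsort(fit)[:n_particles]]
    v = np.zeros_like(x)
    vmax = 0.2 * (ub - lb)  # velocity clamp (assumed)
    pbest, pbest_f = x.copy(), np.apply_along_axis(f, 1, x)
    g = pbest[np.argmin(pbest_f)].copy()
    for t in range(t_max):
        w = 0.9 - 0.5 * t / t_max  # linearly decreasing inertia weight (assumed)
        c1, c2 = scac(t, t_max)
        r1, r2 = rng.random((2, n_particles, dim))
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        v = np.clip(v, -vmax, vmax)
        x = np.clip(x + v, lb, ub)
        fx = np.apply_along_axis(f, 1, x)
        improved = fx < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], fx[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()


if __name__ == "__main__":
    sphere = lambda p: float(np.sum(p ** 2))
    print(pso_scac(sphere, [-5, -5], [5, 5]))
```

The schedule favors each particle's own memory early (large c1) and the swarm's best position late (large c2), which is the exploration-to-exploitation shift the sine-cosine coefficients are meant to provide. In a TEM inversion, `f` would instead be a data misfit between observed and forward-modeled responses.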
A NOD-Like Receptor Signaling-Based Gene Signature Identified as a Novel Prognostic Biomarker for Predicting Overall Survival of Colorectal Cancer Patients
Authors: Xin Qi, Jiachen Zuo, Donghui Yan, Guang Hu, Rui Wang, Jiajia Chen and Jiaolong Fu
Background: Colorectal Cancer (CRC) is the most frequently diagnosed malignant tumor of the gastrointestinal tract worldwide and is closely associated with distant metastasis and poor prognosis. Because of its high degree of heterogeneity, reliable prognostic biomarkers are urgently needed to guide therapeutic intervention in CRC patients. Objective: The present study aimed to develop a NOD-Like Receptor (NLR) signaling-based gene signature that can successfully predict the overall survival of CRC patients. Methods: First, differentially expressed NLR signaling-related genes were identified between primary and metastatic human CRC samples. Genes with prognostic value were then screened through univariate Cox regression analysis. Next, the NLR signaling-based prognostic signature was constructed by LASSO-penalized Cox regression analysis, and its predictive ability was further confirmed in an independent cohort. Furthermore, functional studies including GO, GSEA, ssGSEA and chemotherapeutic response analyses were performed to explore the role of the signature in CRC pathogenesis and therapy. Results: The established prognostic signature, consisting of 7 NLR signaling-related genes, effectively stratified high-risk and low-risk CRC patients in both the training and validation cohorts. Moreover, the signature proved to be an independent indicator of overall survival in CRC patients. Functional annotation and chemotherapeutic response analyses showed that the signature was closely associated with the immune status and chemotherapeutic sensitivity of CRC patients. Conclusion: The novel NLR signaling-based gene signature could serve as a potential tool for survival prediction and therapeutic evaluation, thereby contributing to personalized prognostic management of CRC patients.
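A LASSO-penalized Cox model yields one coefficient per retained gene, and a patient's risk score is then typically the model's linear predictor: the sum of coefficient times expression over the signature genes. The sketch below assumes such a linear score and a median cutoff to split patients into high- and low-risk groups. The three genes, their coefficients, and the expression values are hypothetical and are not the study's seven-gene signature.

```python
import numpy as np


def risk_scores(expr, coefs):
    """Cox linear predictor per patient: sum over genes of coefficient * expression."""
    return np.asarray(expr, float) @ np.asarray(coefs, float)


def stratify(scores, cutoff=None):
    """Label each patient 'high' or 'low' risk relative to a cutoff (median by default)."""
    scores = np.asarray(scores, float)
    if cutoff is None:
        cutoff = np.median(scores)
    return np.where(scores > cutoff, "high", "low")


if __name__ == "__main__":
    # Rows: patients; columns: three hypothetical signature genes.
    expr = [[1, 0, 2], [0, 1, 1], [2, 2, 0], [0, 0, 0]]
    coefs = [0.5, -0.3, 0.2]  # hypothetical LASSO-Cox coefficients
    scores = risk_scores(expr, coefs)
    print(scores, stratify(scores))
```

Positive coefficients mark genes whose higher expression raises the hazard, negative ones the opposite; survival curves of the two resulting groups would then be compared, for example with a log-rank test.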
Multivariate Information Fusion for Identifying Antifungal Peptides with Hilbert-Schmidt Independence Criterion
Authors: Haohao Zhou, Hao Wang, Yijie Ding and Jijun Tang
Background: Antifungal Peptides (AFPs) have been found to be effective against many fungal infections. Objective: However, AFPs are difficult to identify, so identifying them from sequence information with machine learning methods is of great practical significance. Methods: In this study, a Multi-Kernel Support Vector Machine (MKSVM) with the Hilbert-Schmidt Independence Criterion (HSIC) is proposed. Sequences are encoded with five types of features (188-bit, AAC, ASDC, CKSAAP, DPC), and a kernel is constructed for each using the Gaussian kernel function. HSIC is then used to combine the kernels, and a multi-kernel SVM model is built. Results: Our model performed well on three AFP datasets, and its performance is better than or comparable to that of other state-of-the-art predictive models. Conclusion: Our method will be a useful tool for identifying antifungal peptides.
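A minimal sketch of HSIC-guided kernel combination, under simplifying assumptions: only two of the five feature types (AAC and DPC) are implemented, each gets a Gaussian kernel with a fixed `gamma`, and each kernel is weighted in proportion to its empirical HSIC with the ideal label kernel y yᵀ. The study's actual weighting procedure may differ, so treat `combine_kernels` as an illustrative heuristic, not the published method.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIR_INDEX = {a + b: i for i, (a, b) in
              enumerate((a, b) for a in AMINO_ACIDS for b in AMINO_ACIDS)}


def aac(seq):
    """Amino acid composition: frequency of each of the 20 standard residues."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], float)
    return counts / max(len(seq), 1)


def dpc(seq):
    """Dipeptide composition: frequency of each of the 400 ordered residue pairs."""
    v = np.zeros(len(PAIR_INDEX))
    for i in range(len(seq) - 1):
        j = PAIR_INDEX.get(seq[i:i + 2])
        if j is not None:
            v[j] += 1
    return v / max(len(seq) - 1, 1)


def gaussian_kernel(X, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0))


def hsic(K, L):
    """Biased empirical HSIC estimate: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def combine_kernels(kernels, y):
    """Weight each kernel by its HSIC with the label kernel y y^T (illustrative heuristic)."""
    y = np.asarray(y, float)
    L = np.outer(y, y)
    scores = np.array([max(hsic(K, L), 0.0) for K in kernels])
    weights = scores / scores.sum()
    return sum(w * K for w, K in zip(weights, kernels)), weights
```

The combined matrix could then be passed to an SVM that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.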
Recognition of CRISPR Off-Target Cleavage Sites with SeqGAN
Authors: Wen Li, Xiao-Bo Wang and Yan Xu
Background: The CRISPR system can quickly edit different gene loci by changing a short sequence in the single guide RNA. However, off-target events limit its further development, and improving the efficiency and specificity of this technology while minimizing off-target risk has long been a challenge. For genome-wide prediction of CRISPR Off-Target Cleavage Sites (OTS), an important issue is data imbalance: the number of true OTS identified is much smaller than the number of all possible nucleotide-mismatch loci. Methods: In this work, positive off-target sequences were generated with a sequence generative adversarial network (SeqGAN) to augment the OTS dataset of Cpf1. A deep Convolutional Neural Network (CNN) was then trained on the data to obtain a predictor with stronger generalization ability and better performance. Results: In 10-fold cross-validation, the AUC of the CNN classifier after SeqGAN balancing was 0.941, higher than 0.863 on the original data and 0.929 with over-sampling. In independent testing, the AUC after SeqGAN balancing was 0.841, higher than 0.833 on the original data and 0.836 with over-sampling. The PR value after SeqGAN was 0.722, about 0.16 higher than on the original data and about 0.03 higher than with over-sampling. Conclusion: This is the first use of SeqGAN to address data imbalance in CRISPR data. All the results showed that SeqGAN can effectively generate positive data for CRISPR off-target sites.
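Before any resampling, guide-target pairs must be turned into numeric input for a CNN. One common choice, assumed here rather than taken from the paper, one-hot encodes both aligned sequences and combines them with an element-wise OR, so that a mismatched position carries two 1s. The sketch also implements random over-sampling, the simple baseline the SeqGAN augmentation is compared against; the SeqGAN generator itself is beyond a short example.

```python
import numpy as np

BASES = "ACGT"


def one_hot(seq):
    """L x 4 one-hot matrix for a DNA sequence."""
    m = np.zeros((len(seq), 4), dtype=np.int8)
    for i, base in enumerate(seq.upper()):
        m[i, BASES.index(base)] = 1
    return m


def encode_pair(guide, target):
    """Element-wise OR of one-hot codes; matches have one 1 per row, mismatches two."""
    if len(guide) != len(target):
        raise ValueError("guide and target must be aligned and equal in length")
    return one_hot(guide) | one_hot(target)


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class samples until both classes are equal in size."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    minority, majority = (pos, neg) if len(pos) < len(neg) else (neg, pos)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]
```

Over-sampling only repeats existing positives, whereas a generative model such as SeqGAN can produce new, varied positive sequences, which is the motivation given in the abstract.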
Identification of DNA-Binding Proteins via Hypergraph Based Laplacian Support Vector Machine
Authors: Yuqing Qian, Hao Meng, Weizhong Lu, Zhijun Liao, Yijie Ding and Hongjie Wu
Background: The identification of DNA-binding proteins (DBP) is an important research field. Experiment-based methods for detecting DBP are time-consuming and labor-intensive. Objective: Several machine learning methods have been proposed for large-scale DBP identification, but their predictive accuracy is insufficient. Our aim is to develop a sequence-based machine learning model to predict DBP. Methods: We extracted six types of features (NMBAC, GE, MCD, PSSM-AB, PSSM-DWT, and PsePSSM) from protein sequences and used Multiple Kernel Learning based on the Hilbert-Schmidt Independence Criterion (MKL-HSIC) to estimate the optimal kernel. We then constructed a hypergraph model to describe the relationships between labeled and unlabeled samples. Finally, a Laplacian Support Vector Machine (LapSVM) was employed to train the predictive model. The method was tested on the PDB186, PDB1075, PDB2272 and PDB14189 data sets. Results: Compared with other methods, our model achieved the best results on the benchmark data sets. Conclusion: Accuracies of 87.1% and 74.2% were achieved on PDB186 (independent test for PDB1075) and PDB2272 (independent test for PDB14189), respectively.
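The hypergraph step can be sketched with the normalized hypergraph Laplacian of Zhou et al., Δ = I - Dv^(-1/2) H W De^(-1) Hᵀ Dv^(-1/2), where H is the vertex-by-edge incidence matrix, W holds edge weights, and Dv and De are the vertex and edge degree matrices. Building one hyperedge per sample from that sample and its k nearest neighbours, with unit weights, is an assumption for illustration; the paper's construction may differ. In LapSVM, a Laplacian like this enters as a manifold regularizer, which is how unlabeled samples influence the decision function.

```python
import numpy as np


def knn_hyperedges(X, k=2):
    """Incidence matrix H (vertices x edges): edge j = sample j plus its k nearest neighbours."""
    X = np.asarray(X, float)
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    H = np.zeros((n, n))
    for j in range(n):
        H[np.argsort(d2[j])[: k + 1], j] = 1.0
        H[j, j] = 1.0  # guarantee the centre sample belongs to its own edge
    return H


def hypergraph_laplacian(H, w=None):
    """Normalized hypergraph Laplacian I - Dv^-1/2 H W De^-1 H^T Dv^-1/2."""
    n, m = H.shape
    w = np.ones(m) if w is None else np.asarray(w, float)
    dv = H @ w          # vertex degrees: summed weights of incident edges
    de = H.sum(axis=0)  # edge degrees: number of vertices per edge
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    theta = Dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ Dv_inv_sqrt
    return np.eye(n) - theta
```

Two sanity checks follow from the formula: the Laplacian is symmetric positive semidefinite, and the vector of square-rooted vertex degrees lies in its null space.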