Current Bioinformatics - Volume 16, Issue 5, 2021
Deep Learning in Disease Diagnosis: Models and Datasets
Authors: Deeksha Saxena, Mohammed H. Siddiqui and Rajnish Kumar
Background: Deep learning (DL) is an artificial neural network-driven framework with multiple levels of representation, in which non-linear modules are combined so that the representation is raised from lower levels to progressively more abstract ones. Though DL is used in almost every field, it has brought a particular breakthrough in the biological sciences, where it is used in disease diagnosis and clinical trials. DL can be combined with machine learning, but the two are also used separately. DL appears to be a better platform than machine learning because it does not require an intermediate feature extraction step and works well with larger datasets. DL is one of the most discussed fields among scientists and researchers today for diagnosing and solving various biological problems. However, deep learning models need further improvement and experimental validation to be more productive. Objective: To review the available DL models and datasets that are used in disease diagnosis. Methods: Available DL models and their applications in disease diagnosis were reviewed, discussed, and tabulated. Types of datasets and some of the popular disease-related data sources for DL were highlighted. Results: We analyzed the frequently used DL methods and data types, and discussed some of the recent deep learning models used for solving different biological problems. Conclusion: The review presents useful insights into DL methods, data types, and the selection of DL models for disease diagnosis.
A Roadmap to Sequence Assembly Evaluation Tools
Authors: Sara El-Metwally, Eslam Hamouda and Mayada Tarek
The assembly evaluation process is the starting step towards meaningful downstream data analysis. We need to know how much accurate information is included in an assembled sequence before proceeding to any data analysis stage. Four basic metrics are targeted by different assembly evaluation tools: contiguity, accuracy, completeness, and contamination. Some tools evaluate these metrics by comparing the assembly results to a closely related reference. Others use different types of heuristics to compensate for the missing guiding reference, such as the consistency between assembly results and sequencing reads. In this paper, we discuss the assembly evaluation process as a core stage in any sequence assembly pipeline and present a roadmap that is followed by most assembly evaluation tools to assess different metrics. We highlight the challenges that currently exist in the assembly evaluation tools and summarize their technical and practical details to help end-users choose the best tool for their working scenarios. To address the similarities and differences among assembly assessment tools, including their evaluation approaches, metrics, comprehensiveness, limitations, usability, and how the evaluated results are presented to the end-user, we provide a practical example that evaluates Velvet assembly results for the S. aureus dataset from the GAGE competition. A GitHub repository (https://github.com/SaraEl-Metwally/Assembly-Evaluation-Tools) provides the evaluation result details along with the corresponding command-line parameters.
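As a rough illustration of the contiguity metric mentioned above, the sketch below computes N50, a widely used contiguity statistic. The abstract does not say which statistics the reviewed tools report, so the choice of N50, the function name `n50`, and the example contig lengths are assumptions made for illustration only.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list).

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths, not taken from the GAGE S. aureus data.
example_lengths = [100, 80, 60, 40, 20]
print("N50:", n50(example_lengths))
```

A higher N50 suggests a more contiguous assembly, but it says nothing about accuracy, completeness, or contamination, which is why the abstract lists those as separate metrics.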
A Comparative Analysis of Biological Data Integration Systems Famous for Data Exploitation and Knowledge Discovery
Authors: Omer Irshad and Muhammad U. G. Khan
Integrating heterogeneous biological databases to unveil new intra-molecular and inter-molecular attributes, behaviors, and relationships in the human cellular system has long been a focus of computational biology research. In this context, many biological data integration systems have been deployed over the last couple of decades. A prime objective common to all these systems is to help end-users explore, exploit, and analyze the integrated biological data for knowledge extraction. With the advent of high-throughput data generation technologies in particular, biological data is growing exponentially and is becoming ever more heterogeneous and geographically dispersed. As a result, biological data integration systems face current and future challenges related to data integration and data organization. The objective of this review is to quantitatively evaluate and compare some of the recent warehouse-based multi-omics data integration systems to check their compliance with current and future data integration needs. For this, we identified some of the major design characteristics that a multi-omics data integration model should have in order to comprehensively address the current and future data integration challenges. Based on these design characteristics and the evaluation criteria, we evaluated some of the recent data warehouse systems and present categorical and comparative analysis results. The results show that most of the systems exhibit no or only partial compliance with the required data integration design characteristics. These systems therefore need design improvements to adequately address the current and future data integration challenges while keeping their service level commitments in place.
Mechanism of Actions of Dexamethasone Against COVID-19 Predicted by Alpha Shape Analysis of Binding Sites
Authors: Mengxu Zhu, Avirup Ghosh and Hong Yan
Background: COVID-19 emerged in late 2019 and became a pandemic disease with severe mortality and morbidity. No specific remedy exists at present, but some drugs, such as Dexamethasone, have shown clinical benefits against the causative agent, the SARS-CoV-2 virus. Objective: To analyze the binding affinity between drugs and a SARS-CoV-2 protein through geometrical methods and to study the theoretical effectiveness of Dexamethasone as a potential treatment for COVID-19. Methods: The binding affinity of Dexamethasone to the target SARS-CoV-2 protein was compared with those of different inhibitors. Drug molecules were docked to the SARS-CoV-2 main protease, and the system was simulated by molecular dynamics, allowing alpha shape analysis to extract geometrical features, such as the matching rates of atoms, solid angles, and the distances between atoms at interfaces. Binding affinities between drugs and the main protease were assessed using these geometrical data and the free energy of binding. Results: The behaviour of Dexamethasone was similar to that of other inhibitors. The efficacy of Dexamethasone as a treatment may be due to it being a glucocorticoid and to its properties as a potent inhibitor. Conclusion: This study revealed the mechanism of action of Dexamethasone and provided a geometrical method to distinguish among potential drugs for the treatment of COVID-19.
CSBPI_Site: Multi-Information Sources of Features to RNA Binding Sites Prediction
Authors: Lichao Zhang, Zihong Huang and Liang Kong
Background: RNA-binding proteins establish posttranscriptional gene regulation by coordinating the maturation, editing, transport, stability, and translation of cellular RNAs. Immunoprecipitation experiments can identify the interaction between RNA and proteins, but they are limited by the experimental environment and material. Therefore, it is essential to construct computational models to identify the functional sites. Objective: Although some computational methods have been proposed to predict RNA binding sites, their accuracy could be further improved. Moreover, a dataset with more samples is needed to design a reliable model. Here we present a computational model based on multi-information sources to identify RNA binding sites. Methods: We construct an accurate computational model named CSBPI_Site, based on extreme gradient boosting. The specifically designed 15-dimensional feature vector captures four types of information (chemical shift, chemical bond, chemical properties, and position information). Results: A satisfactory accuracy of 0.86 and an AUC of 0.89 were obtained by leave-one-out cross-validation. Meanwhile, the accuracies differed only slightly (ranging from 0.83 to 0.85) among the three classifier algorithms, which showed that the novel features are stable and suit multiple classifiers. These results showed that the proposed method is effective and robust for the identification of noncoding RNA binding sites. Conclusion: Our method based on multi-information sources effectively represents the binding site information among ncRNAs. The satisfactory prediction results for the Diels-Alder ribozyme based on CSBPI_Site indicate that our model is valuable for identifying functional sites.
iTSP-PseAAC: Identifying Tumor Suppressor Proteins by Using Fully Connected Neural Network and PseAAC
Authors: Muhammad Awais, Waqar Hussain, Nouman Rasool and Yaser D. Khan
Background: Cancer is the uncontrolled growth caused by the accumulation of genetic and epigenetic changes that result from the loss or reduction of the normal function of Tumor Suppressor Genes (TSGs) and Prooncogenes. TSGs control cell division and growth by repairing DNA mistakes during replication, and they restrict the unwanted proliferation of a cell and the activities that are part of tumor production. Objectives: This study aims to propose a novel, accurate, user-friendly model to predict tumor suppressor proteins, which would be freely available to experimental molecular biologists to assist them in in vitro and in vivo studies. Methods: The prediction model feeds an input feature vector (IFV), calculated from the physicochemical properties of proteins, to a fully connected neural network (FCNN), and is assessed by its accuracy, sensitivity, specificity, and MCC. The proposed model was validated with different exhaustive validation techniques, i.e., self-consistency and cross-validation. Results: Self-consistency testing gave an accuracy of 99%, while cross-validation and independent testing gave accuracies of 99.80% and 100%, respectively. The overall accuracy of the proposed model is 99%, with a sensitivity of 98%, a specificity of 99%, and an F1-score of 0.99. Conclusion: The proposed model can predict tumor suppressor proteins efficiently, but it still has room for computational improvement, as the number of protein sequences may increase rapidly day by day.
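The evaluation measures named in this abstract (accuracy, sensitivity, specificity, MCC, and F1-score) can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions; the function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    sensitivity = safe_div(tp, tp + fn)   # true positive rate (recall)
    specificity = safe_div(tn, tn + fp)   # true negative rate
    precision = safe_div(tp, tp + fp)
    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient (MCC).
    mcc = safe_div(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical confusion counts chosen only to exercise the function.
for name, value in binary_metrics(tp=90, fp=10, tn=90, fn=10).items():
    print(f"{name}: {value:.3f}")
```

MCC is often reported alongside accuracy because it stays informative when the positive and negative classes differ greatly in size.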
MDAPlatform: A Component-based Platform for Constructing and Assessing miRNA-disease Association Prediction Methods
Authors: Yayan Zhang, Guihua Duan, Cheng Yan, Haolun Yi, Fang-Xiang Wu and Jianxin Wang
Background: Increasing evidence has indicated that miRNA-disease association prediction plays a critical role in the study of clinical drugs. Researchers have proposed many computational models for miRNA-disease prediction. However, there is no unified platform to compare and analyze the pros and cons of these models or to share their code and data. Objective: In this study, we developed an easy-to-use platform (MDAPlatform) to construct and assess miRNA-disease association prediction methods. Methods: MDAPlatform integrates the relevant data on miRNAs, diseases, and miRNA-disease associations that were used in previous miRNA-disease association prediction studies. Based on the componentized model, it implements the different components of previous computational methods. Results: Users can conduct cross-validation experiments and compare their methods with other methods, and visualized comparison results are also provided. Conclusion: Based on the componentized model, MDAPlatform provides easy-to-operate interfaces for constructing miRNA-disease association methods, which will help in developing new miRNA-disease association prediction methods in the future.
Identification of Biomarkers and Functional Modules from Genomic Data in Stage-wise Breast Cancer
Background: Breast cancer is the most common cancer in women across the world, with high incidence and mortality rates. Because breast cancer is a heterogeneous disease, analysis based on gene expression profiling plays a significant role in understanding it. Since the expression patterns of patients at the same stage of breast cancer vary considerably, an integrated stage-wise analysis involving multiple samples is expected to give more comprehensive results and a better understanding of breast cancer. Objective: The objective of this study is to detect functionally significant modules from the gene coexpression network of cancerous tissues and to extract prognostic genes related to multiple stages of breast cancer. Methods: To achieve this, a multiplex framework is modelled to map the multiple stages of breast cancer, followed by a modularity optimization method to identify functional modules from it. These functional modules are found to be significantly enriched in many Gene Ontology terms that are associated with cancer. Results and Discussion: Predictive biomarkers are identified based on differential expression analysis of multiple stages of breast cancer. Conclusion: Our analysis identified 13 stage-I specific genes, 12 stage-II specific genes, and 42 stage-III specific genes that are significantly regulated and could be promising targets of breast cancer therapy. In addition, we identified 29, 18, and 26 lncRNAs specific to stage I, stage II, and stage III, respectively.
Gene Selection in Multi-class Imbalanced Microarray Datasets Using Dynamic Length Particle Swarm Optimization
Authors: R. D. Priya and R. Sivaraj
Background: Microarray gene expression datasets usually contain a large number of genes, which complicates further operations such as classification, clustering, and other kinds of analysis. During the classification process, identifying salient genes is a challenging task that needs careful selection. Methods: The classification of multi-class datasets is more critical than binary classification. When there are multiple class labels, the datasets are more likely to be imbalanced. Large variations can be seen in the number of samples belonging to each class, and hence the classification process may become biased, with incorrect samples chosen for training. Little research is available that addresses all three of these scenarios together in microarray datasets. Results and Discussion: The paper fills this gap with the following contributions: i) it selects salient genes for classification using the multiSURF algorithm, ii) it identifies the right instances from imbalanced datasets using the Retained Tomek Link algorithm, and iii) it performs gene selection for multi-class classification using Dynamic Length Particle Swarm Optimization (DPSO). Conclusion: The proposed method is implemented on multi-class imbalanced microarray datasets, and the final classification performance is encouraging and better than that of the other compared methods.
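To make the instance-selection idea more concrete, the sketch below finds classic Tomek links: pairs of samples from different classes that are each other's nearest neighbour. This is the textbook definition only; the abstract does not describe how the paper's Retained Tomek Link variant differs, so the code should not be read as the authors' algorithm. The function name and the toy data are hypothetical.

```python
import math


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form classic Tomek links.

    Two samples form a Tomek link when they carry different class labels
    and each one is the other's nearest neighbour (Euclidean distance).
    """
    n = len(points)
    if n < 2:
        return []

    def nearest(i):
        # Index of the sample closest to sample i, excluding i itself.
        return min(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )

    links = []
    for i in range(n):
        j = nearest(i)
        if i < j and labels[i] != labels[j] and nearest(j) == i:
            links.append((i, j))
    return links


# Toy two-dimensional samples; real microarray data would have many genes.
toy_points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 6.0)]
toy_labels = ["minority", "majority", "majority", "majority"]
print(tomek_links(toy_points, toy_labels))
```

In common undersampling schemes the majority-class member of each link is removed; which members a "retained" variant keeps is not stated in the abstract.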
Modeling Hereditary Disease Behavior Using an Innovative Similarity Criterion and Ensemble Clustering
Authors: Musa Mojarad, Fariba Sarhangnia, Amin Rezaeipanah, Hamin Parvin and Samad Nejatian
Background: Today, there are various theories about the causes of hereditary diseases, but doctors believe that both genetic and environmental factors play an essential role in the incidence and spread of these diseases. Objective: In order to identify the genes that cause a disease, inter-cell or inter-tissue communications must be determined. Inter-cell or inter-tissue interactions can be illustrated using gene expression. Disorders that have led to widespread changes can be identified by investigating gene expression information. Methods: In this paper, inter-cell and inter-tissue communications for various diseases are identified using an innovative similarity criterion based on the topological structure characteristics of graphs and an extended clustering ensemble. The proposed method is performed in two stages: first, several clustering models are combined to detect initial inter-cell or inter-tissue communications and to produce better results than single algorithms. Second, the cell-to-cell or tissue-to-tissue similarity in each cluster is identified through a similarity criterion based on the graph topological structure. Results: The proposed method was evaluated using the UCI and FANTOM5 datasets. Experiments on the FANTOM5 dataset report a Silhouette coefficient of 0.901 with 18 clusters for cells and of 0.762 with 13 clusters for tissues. Conclusion: The maximum inter-cell or inter-tissue similarity in each cluster can be exploited to detect the relationships between diseases.
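The Silhouette coefficient reported in this abstract measures how well each sample fits its own cluster compared with the nearest other cluster. The sketch below implements the standard definition with Euclidean distance; the distance measure used in the paper is not stated in the abstract, so that choice, the function name, and the toy data are assumptions.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette value over all samples (Euclidean distance).

    For each sample, a is the mean distance to the other members of its
    cluster and b is the smallest mean distance to any other cluster;
    its silhouette is (b - a) / max(a, b). Samples in singleton clusters
    score 0 by convention.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)
            continue
        # The sample's zero distance to itself is included in the sum,
        # so dividing by len(own) - 1 averages over the other members.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Two well-separated toy clusters; not derived from FANTOM5.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
print(round(silhouette_coefficient(toy_points, toy_labels), 3))
```

Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values suggest samples assigned to the wrong cluster.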