Current Genomics - Volume 23, Issue 2, 2022

- GAAP: A GUI-based Genome Assembly and Annotation Package
  
  Authors: Deepak Singla and Inderjit S. Yadav
  
  https://doi.org/10.2174/1389202923666220128155537
  
  Background: Next-generation sequencing (NGS) technologies are continuously used for high-throughput data generation, which calls for easy-to-use GUI-based data analysis software. Such software could be run in parallel with sequencing for automatic data analysis. At present, very few such packages are available, and most of them are commercial, creating a gap between data generation and data analysis. Methods: GAAP is developed on the NodeJS platform and uses HTML and JavaScript as the front end for communication with users. We have implemented FastQC and Trimmomatic for quality checking and control. Velvet and Prodigal are integrated for genome assembly and gene prediction. Annotation is performed with the help of remote NCBI BLAST and IPR-Scan. In the backend, we have used Perl and JavaScript for data processing. To evaluate the performance of GAAP, we assembled a viral (SRR11621811), a bacterial (SRR17153353) and a human (SRR16845439) genome. Results: We used GAAP to assemble and annotate a COVID-19 genome on a desktop computer, which resulted in a single contig of 27,994 bp with 99.57% reference genome coverage. This assembly predicted 11 genes, of which 10 were annotated using the annotation module of GAAP. We also assembled the bacterial and human genomes into 138 and 194,281 contigs with N50 values of 100,399 and 610, respectively. Conclusion: In this study, we have developed a freely available, platform-independent genome assembly and annotation package (GAAP; www.deepaklab.com/gaap). The software acts as a complete data analysis package with modules for quality check, quality control, de novo genome assembly, gene prediction and annotation (BLAST, PFAM, GO term, pathway and enzyme mapping).
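
For readers unfamiliar with the N50 statistic quoted in the Results, the following is a minimal sketch of the usual textbook definition: the contig length at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length. It is not taken from GAAP or Velvet, whose reported values may follow their own conventions, and the function name `n50` and the example lengths are illustrative assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("at least one contig length is required")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (not taken from the GAAP assemblies).
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)
print(example_n50)
```

Under this definition, an assembly that consists of a single contig has an N50 equal to that contig's length.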
  

- Integrating LASSO Feature Selection and Soft Voting Classifier to Identify Origins of Replication Sites
  
  Authors: Yingying Yao, Shengli Zhang and Tian Xue
  
  https://doi.org/10.2174/1389202923666220214122506
  
  Background: DNA replication plays an indispensable role in the transmission of genetic information. It is considered to be the basis of biological inheritance and the most fundamental process in all biological life. Because DNA replication initiates at a specific location, namely the origin of replication, more accurate prediction of origin of replication sites (ORIs) is essential to gain insight into their relationship with gene expression. Objective: In this study, we have developed an efficient predictor called iORI-LAVT for ORI identification. Methods: This work focuses on extracting feature information from three aspects: mononucleotide encoding, k-mer and ring-function-hydrogen-chemical properties. Subsequently, the least absolute shrinkage and selection operator (LASSO) is applied as a feature selection method to select the optimal features. After comparing the results of different soft voting classifier combinations, the soft voting classifier based on GaussianNB and Logistic Regression is employed as the final classifier. Results: Based on a 10-fold cross-validation test, the prediction accuracies on two benchmark datasets are 90.39% and 95.96%, respectively. On the independent dataset, our method achieves an accuracy of 91.3%. Conclusion: iORI-LAVT outperforms previous predictors. It is believed that the iORI-LAVT predictor is a promising alternative for further research on identifying ORIs.
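
The soft voting step named in the Methods can be illustrated with a small sketch. It assumes the common definition of soft voting, in which each classifier outputs class probabilities and the ensemble predicts the class with the highest (optionally weighted) average probability. The function `soft_vote`, the equal default weights and the example probabilities are hypothetical and are not taken from iORI-LAVT.

```python
def soft_vote(probabilities, weights=None):
    """Combine per-classifier class probabilities by (weighted) averaging.

    probabilities: one entry per classifier; each entry lists the class
    probabilities for a single sample, in the same class order.
    Returns (predicted_class_index, averaged_probabilities).
    """
    if not probabilities:
        raise ValueError("need at least one classifier output")
    if weights is None:
        weights = [1.0] * len(probabilities)
    total_weight = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total_weight
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=lambda c: averaged[c])
    return predicted, averaged


# Hypothetical outputs for one sequence, ordered as [P(non-ORI), P(ORI)].
naive_bayes_proba = [0.40, 0.60]
logistic_proba = [0.70, 0.30]
label, averaged = soft_vote([naive_bayes_proba, logistic_proba])
print(label, averaged)
```

In this example the two classifiers disagree, so a majority (hard) vote would tie, whereas averaging the probabilities lets the more confident classifier decide.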
  

- Comparisons of Forecasting for Survival Outcome for Head and Neck Squamous Cell Carcinoma by using Machine Learning Models based on Multi-omics
  
  Authors: Liying Mo, Yuangang Su, Jianhui Yuan, Zhiwei Xiao, Ziyan Zhang, Xiuwan Lan and Daizheng Huang
  
  https://doi.org/10.2174/1389202923666220204153744
  
  Background: Machine learning methods have shown excellent predictive ability in a wide range of fields. Multi-omics factors have a crucial influence on survival in head and neck squamous cell carcinoma (HNSC). This study attempts to establish a variety of machine learning multi-omics models to predict the survival of HNSC and find the most suitable machine learning prediction method. Methods: The HNSC clinical data and multi-omics data were downloaded from the TCGA database. The important variables were screened by the LASSO algorithm. We used a total of 12 supervised machine learning models to predict the outcome of HNSC survival and compared the results. In vitro qPCR was performed to verify core genes predicted by the random forest algorithm. Results: Across the twelve models, multi-omics data performed better than each single-omics dataset alone. The Bayesian network (BN) model (area under the curve [AUC] 0.8250, F1 score 0.7917) and the random forest (RF) model (AUC 0.8002, F1 score 0.7839) showed good predictive performance on HNSC multi-omics data. The results of in vitro qPCR were consistent with the RF algorithm. Conclusion: Machine learning methods could better forecast the survival outcome of HNSC. This study found that the BN and RF models performed best. Moreover, multi-omics forecasts were better than single-omics forecasts in HNSC.
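
As a reminder of what the two reported metrics measure, here is a minimal sketch using their standard definitions for binary labels. F1 is the harmonic mean of precision and recall, and AUC is computed here as the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half. The function names, labels and scores are illustrative assumptions and are unrelated to the study's data.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels (1 = positive): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and model scores (not from the HNSC study).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(roc_auc(y_true, scores), f1_score(y_true, y_pred))
```

AUC depends only on how the scores rank the samples, while F1 depends on the chosen decision threshold, which is why the two metrics can order models differently.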
  

- Prospective Analysis of Proteins Carried in Extracellular Vesicles with Clinical Outcome in Hepatocellular Carcinoma
  
  Authors: Wenbiao Chen, Feng Zhang, Huixuan Xu, Xianliang Hou and Donge Tang
  
  https://doi.org/10.2174/1389202923666220304125458
  
  Background: Extracellular vesicles (EVs) contain different proteins that relay information between tumor cells, thus promoting tumorigenesis. Therefore, EVs can serve as an ideal marker for tumor pathogenesis and clinical application. Objective: Here, we characterised EV-specific proteins in hepatocellular carcinoma (HCC) samples and established their potential protein-protein interaction (PPI) networks. Materials and Methods: We used multi-dimensional bioinformatics methods to mine a network module for use as a prognostic signature and validated the model’s prediction using additional datasets. The relationship between the prognostic model and tumor immune cells or the tumor microenvironment status was also examined. Results: A total of 1134 proteins from 316 HCC samples were mapped to the exoRBase database. HCC-specific EVs specifically expressed a total of 437 proteins. The PPI network revealed 321 proteins and 938 interaction pathways, which were mined to identify a three-network module (3NM) with significant prognostic prediction ability. Validation of the 3NM in two more datasets demonstrated that the model outperformed the other signatures in prognostic prediction ability. Functional analysis revealed that the network proteins were involved in various tumor-related pathways. Additionally, these findings demonstrated a favorable association between the 3NM signature and macrophages, dendritic cells, and mast cells. In addition, the 3NM reflected the tumor microenvironment status, including hypoxia and inflammation. Conclusion: These findings demonstrate that the 3NM signature reliably predicts HCC pathogenesis. Therefore, the model may be used as an effective prognostic biomarker in managing patients with HCC.
  

- Draft Genome Sequence of the Earthworm Eudrilus eugeniae
  
  Authors: Arun Arumugaperumal, Dinesh K. Sudalaimani, Vaithilingaraja Arumugaswami and Sudhakar Sivasubramaniam
  
  https://doi.org/10.2174/1389202923666220401095626
  
  Background: Earthworms are annelids. They play a major role in agriculture and soil fertility. Vermicompost is the best organic manure for plant crops. Eudrilus eugeniae is an earthworm well suited for efficient vermicompost production. The worm is also used to study the cell and molecular biology of regeneration, molecular toxicology, developmental biology, etc., because of traits such as a high growth rate, rapid reproduction, tolerance of a wide temperature range, and low maintenance cost. Objective: Whole-genome sequences have so far been reported only for the earthworms Eisenia andrei and Eisenia fetida. Methods: In the present work, we sequenced the genome of E. eugeniae using the Illumina platform and generated 160,684,383 paired-end reads. Results: The reads were assembled into a 488 Mb draft genome comprising 743,870 contigs, in which 24,599 genes were successfully annotated. Further, 208 stem cell-specific genes and 3,432 non-coding genes were identified. Conclusion: The sequence and annotation details are hosted in a web application available at https://sudhakar-sivasubramaniam-labs.shinyapps.io/eudrilus_genome/.
  

- Infestation of Rice by Gall Midge Influences Density and Diversity of Pseudomonas and Wolbachia in the Host Plant Microbiome
  
  Authors: Deepak K. Sinha, Ayushi Gupta, Ayyagari P. Padmakumari, Jagadish S. Bentur and Suresh Nair
  
  https://doi.org/10.2174/1389202923666220401101604
  
  Background: The virulence of phytophagous insects is predominantly determined by their ability to evade or suppress host defense for their survival. The rice gall midge (GM, Orseolia oryzae), a monophagous pest of rice, elicits a host defense similar to the one elicited upon pathogen attack. This could be due to the GM feeding behaviour, wherein the GM endosymbionts are transferred to the host plant via oral secretions, and as a result, the host mounts an appropriate defense response(s) (i.e., up-regulation of the salicylic acid pathway) against these endosymbionts. Methods: The current study aimed to analyze the microbiome present at the feeding site of GM maggots to determine the exchange of bacterial species between GM and its host and to elucidate their role in rice-GM interaction using a next-generation sequencing approach. Results: Our results revealed differential representation of the phylum Proteobacteria in the GM-infested and -uninfested rice tissues. Furthermore, analysis of the species diversity of Pseudomonas and Wolbachia supergroups at the feeding sites indicated the exchange of bacterial species between GM and its host upon infestation. Conclusion: As rice-GM microbial associations remain relatively unstudied, these findings not only add to our current understanding of microbe-assisted insect-plant interactions but also provide valuable insights into how these bacteria drive insect-plant coevolution. Moreover, to the best of our knowledge, this is the first report analyzing the microbiome of a host plant (rice) at the feeding site of its insect pest (GM).
  

- MetaConClust - Unsupervised Binning of Metagenomics Data using Consensus Clustering
  
  Authors: Dipro Sinha, Anu Sharma, Dwijesh C. Mishra, Anil Rai, Shashi Bhushan Lal, Sanjeev Kumar, Moh. Samir Farooqi and Krishna Kumar Chaturvedi
  
  https://doi.org/10.2174/1389202923666220413114659
  
  Background: Binning of metagenomic reads is an active area of research, and many unsupervised machine learning-based techniques have been used for taxonomy-independent binning of metagenomic reads. Objective: It is important to find the optimum number of clusters as well as to develop an efficient pipeline for deciphering the complexity of the microbial genome. Methods: Applying unsupervised clustering techniques for binning requires finding the optimal number of clusters beforehand, which is observed to be a difficult task. This paper describes a novel method, MetaConClust, that uses coverage information to group contigs and automatically finds the optimal number of clusters for binning of metagenomics data using a consensus-based clustering approach. The coverage of contigs in a metagenomics sample has been observed to be directly proportional to the abundance of species in the sample and is used by MetaConClust for grouping of data in the first phase. The Partitioning Around Medoids (PAM) method is used for clustering in the second phase to generate bins, with the initial number of clusters determined automatically through a consensus-based method. Results: Finally, the quality of the obtained bins is tested using the silhouette index, Rand index, recall, precision, and accuracy. The performance of MetaConClust is compared with recent methods and tools using benchmarked low-complexity simulated and real metagenomic datasets and is found to be better than unsupervised methods and comparable to hybrid methods. Conclusion: This suggests that the consensus-based clustering approach is a promising method for automatically finding the number of bins for metagenomics data.
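
The idea behind medoid-based clustering of coverage values can be sketched as follows. This is a simplified PAM-style swap search, not the MetaConClust implementation: it clusters one-dimensional values only, takes the number of clusters `k` as an input rather than deriving it by consensus, and replaces the greedy BUILD initialisation of classical PAM with evenly spaced starting medoids. All names and the example coverage values are illustrative assumptions.

```python
def total_cost(points, medoids):
    """Sum of distances from each point to its nearest medoid."""
    return sum(min(abs(p - points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Cluster 1-D values with a basic PAM-style swap search.

    Returns (medoid_indices, labels), where labels[i] is the position in
    medoid_indices of the medoid closest to points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumed initialisation: k indices spread evenly over the sorted values.
    order = sorted(range(len(points)), key=lambda i: points[i])
    medoids = [order[(2 * j + 1) * len(points) // (2 * k)] for j in range(k)]
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                # Try replacing one medoid with a non-medoid point.
                trial = medoids[:slot] + [candidate] + medoids[slot + 1:]
                cost = total_cost(points, trial)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    labels = [
        min(range(k), key=lambda j: abs(p - points[medoids[j]]))
        for p in points
    ]
    return medoids, labels


# Hypothetical contig coverages from two species of different abundance.
coverage = [10.0, 11.0, 12.0, 50.0, 52.0, 51.0]
medoids, labels = pam(coverage, k=2)
print([coverage[m] for m in medoids], labels)
```

Unlike k-means centroids, medoids are always actual data points, so each bin is represented by a real contig.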
  