Current Bioinformatics - Volume 13, Issue 1, 2018
Analysis and Prediction of Nitrated Tyrosine Sites with the mRMR Method and Support Vector Machine Algorithm
Authors: Shao P. Wang, Qing Zhang, Jing Lu and Yu-Dong Cai
Background: Tyrosine nitration is an important covalent post-translational modification in many biochemical processes that are closely related to several human diseases. Objective: The correct recognition of nitration sites is therefore useful for disease diagnosis and the design of effective treatments. However, traditional experimental techniques for identifying nitration sites are time-consuming, labor-intensive and expensive. Alternatively, effective computational methods can be designed to tackle this problem. Method: In this study, we proposed a computational workflow to identify and analyze nitrated tyrosine residues in proteins. Specifically, each nitrated tyrosine was represented by features derived from a segment of the amino acid sequence of the protein containing the nitrated tyrosine site. A reliable feature selection method, minimum redundancy maximum relevance (mRMR), was adopted to analyze these features, and incremental feature selection and a type of support vector machine, SMO (sequential minimal optimization), were employed to extract core features and build an optimal prediction classifier. Results: A total of 223 features were extracted and used to build the optimal prediction classifier, with which the Matthews correlation coefficient (MCC) on the training set was 0.717. The nitration sites in the testing set were extracted from the UniProt database based on sequence similarity and were all denoted as positive samples. The sensitivity of the optimal classifier was 0.950 on the testing set. The results demonstrate the effectiveness and importance of the optimal features and the classifier for the recognition of nitration sites. In addition, three other methods, the nearest neighbor algorithm (NNA), Dagging and random forest (RF), were also applied to the training and testing sets, and the results were compared with those of SMO.
Conclusion: 61 core features of the 223 total features were analyzed, and this analysis revealed the essential residue types and conserved sites proximal to the central tyrosine residue.
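The two performance measures quoted in this abstract can be computed from confusion-matrix counts. The following is a minimal sketch of the standard formulas, not code from the paper; the function names and the example counts are illustrative assumptions. It also shows why only sensitivity is reported for a testing set made up entirely of positive samples: with no negatives, the MCC denominator is zero.

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Undefined when a whole row or column of the matrix is empty,
        # e.g. a test set with positives only; 0.0 is returned by convention.
        return 0.0
    return (tp * tn - fp * fn) / denom

def sensitivity(tp, fn):
    """Fraction of true positives that the classifier recovers."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical counts, chosen only to illustrate the calls.
print(mcc(tp=50, tn=50, fp=0, fn=0))   # perfect separation
print(sensitivity(tp=19, fn=1))        # positives-only test set
print(mcc(tp=19, tn=0, fp=0, fn=1))    # MCC is not informative here
```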
Prediction of Protein-Peptide Interactions with a Nearest Neighbor Algorithm
Authors: Bi-Qing Li, Yu-Hang Zhang, Mei-Ling Jin, Tao Huang and Yu-Dong Cai
Background: As a crucial component of the entire protein-protein interaction (PPI) network, protein-peptide interactions are ubiquitous in living cells. These interactions play important roles in signal transduction and regulation. Compared with laborious and time-consuming experimental approaches, predicting protein-peptide interactions with effective computational methods could be convenient and rapid. Method: This study proposed a novel method for predicting interactions between proteins and peptides using various features extracted from both. The traditional amino acid composition, the pseudo-amino acid composition and features derived from 205 domains were used to represent a protein-peptide interaction. The predictor was constructed with four different machine learning algorithms: SMO (sequential minimal optimization), IB1 (nearest neighbor algorithm), dagging, and random forest (RF). All features were analyzed with feature selection techniques, namely the maximum relevance minimum redundancy method and the incremental feature selection method, to extract optimal features. An optimal predictor based on IB1 was then constructed from the extracted optimal features. Results: MCC values of 0.4436 for the cross-validation test on the training set and 0.4444 for the independent test set were obtained with the IB1 algorithm. Different encoding methods were compared. The domain-based method outperformed the pseudo-amino acid composition method. An optimal feature set of 230 features was selected, which contributed most to the prediction of the protein-peptide pairs. Conclusion: Several important domains related to features in the optimal feature set were deemed to play key roles in determining protein-peptide interactions.
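To make the encoding and classifier named in this abstract concrete, here is a minimal sketch of amino acid composition features combined with a 1-nearest-neighbor rule (the idea behind IB1). It is an illustration under stated assumptions, not the authors' pipeline: concatenating the protein and peptide compositions into one 40-dimensional vector, the Euclidean distance, and the toy sequences are all assumptions, and the pseudo-amino acid and domain features are omitted.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aa_composition(seq):
    """Return the 20-dimensional amino acid frequency vector of a sequence."""
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]

def pair_features(protein, peptide):
    """Assumed encoding: protein composition followed by peptide composition."""
    return aa_composition(protein) + aa_composition(peptide)

def predict_1nn(train, query):
    """Return the label of the training vector closest to the query."""
    nearest = min(train, key=lambda item: math.dist(item[0], query))
    return nearest[1]

# Hypothetical toy data: 1 = interacting pair, 0 = non-interacting pair.
train = [
    (pair_features("KKKKRRRR", "DDEE"), 1),
    (pair_features("LLLLVVVV", "GGAA"), 0),
]
print(predict_1nn(train, pair_features("KKRRKR", "DEDE")))
```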
Euler String-Based Compression of Tree-Structured Data and its Application to Analysis of RNAs
Authors: Liwei Liu, Tomoya Mori, Yang Zhao, Morihiro Hayashida and Tatsuya Akutsu
Background: Data compression is essential for efficient large-scale data processing, and a number of studies have addressed it. Grammar-based compression finds a small grammar that generates the input data; it has been used not only for data compression but also for the analysis of biological data, since it is useful for pattern extraction. Objective: Recently, for rooted ordered trees, a special kind of network structure, the elementary ordered tree grammar (EOTG) has been defined by extending context-free grammar (CFG). An integer-programming (IP) method that finds the smallest EOTG for the input data has also been proposed and applied to extract common patterns of RNA secondary structures. However, the method is not efficient for large input trees, so the development of an efficient method is important. Methods: We propose an Euler string-based compression approach that finds the smallest CFG for the Euler string corresponding to an input rooted ordered tree. Results: From a theoretical viewpoint, we show that there exists a gap in compression ratios between the tree grammar-based approach and the Euler string-based approach. From a practical viewpoint, we show the efficiency and effectiveness of our proposed approach by applying it to the comparison of RNA secondary structures. Conclusion: The experimental results indicate that the Euler string-based approach can efficiently compress tree-structured data while retaining some of its structural information.
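The first step of the approach, turning a rooted ordered tree into an Euler string, can be sketched as a depth-first traversal. The convention below is an assumption and may differ from the paper's exact definition: the tree is node-labeled, each node is given as a `(label, children)` tuple, a label is emitted when a node is entered, and the same label with a trailing apostrophe is emitted when it is left. Finding the smallest CFG for the resulting string is a separate, much harder step that is not attempted here.

```python
def euler_string(tree):
    """Return the Euler string of a rooted ordered tree as a list of symbols.

    tree is a (label, children) tuple; children is a list of such tuples,
    given in left-to-right order.
    """
    label, children = tree
    symbols = [label]                 # symbol emitted on the way down
    for child in children:
        symbols.extend(euler_string(child))
    symbols.append(label + "'")       # matching symbol on the way back up
    return symbols

# Hypothetical four-node tree: a has children b and c; c has child d.
tree = ("a", [("b", []), ("c", [("d", [])])])
print(" ".join(euler_string(tree)))
```

Under this convention a tree with n nodes always yields a string of 2n symbols, and the original tree can be rebuilt from the string by matching each symbol with its primed partner.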
Improving Clustering of MicroRNA Microarray Data by Incorporating Functional Similarity
Authors: Yang Yang, Zhichen Wu and Wei Kong
Background: MicroRNAs (miRNAs) are short non-coding RNAs that serve as key regulators at the post-transcriptional level in many important biological processes. In recent years, miRNA expression profiles have been widely investigated and shown to be promising biomarkers for discriminating subtypes of complex human diseases and measuring treatment effects. Methods: Most analysis approaches for DNA microarray data can be applied to miRNA microarray data, such as statistical tests for differential expression analysis and clustering for coregulation analysis. Benefitting from the comprehensive annotation available for protein-coding genes, gene expression analysis is usually guided by prior biological knowledge in order to obtain more biologically meaningful results. However, functional annotation of miRNAs is relatively scarce, so prior knowledge-based methods are hard to apply to miRNAs. In this paper, we incorporate gene ontology information on the target genes of miRNAs into the clustering of miRNAs and propose a combined similarity measure. Results: The experiments were conducted on two public miRNA microarray data sets. Experimental results show that the new similarity measure can improve the quality of clustering with regard to the classification accuracy and functional enrichment significance of clusters. Conclusion: The clustering of microRNA expression profiles can be improved by incorporating domain knowledge, resulting in more functionally compact clusters, which are the basis for the identification of potential miRNA biomarkers and the construction of miRNA co-regulation networks.
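The abstract does not give the formula of the combined similarity measure, so the following is only one plausible shape for such a measure, offered as a sketch. Every modelling choice here is an assumption: expression similarity is a Pearson correlation rescaled to [0, 1], functional similarity is a Jaccard index over the GO terms of each miRNA's target genes, and the two are mixed by a hypothetical weight `alpha`.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sd_x * sd_y)

def jaccard(terms_a, terms_b):
    """Overlap of two GO term sets (stand-in for functional similarity)."""
    a, b = set(terms_a), set(terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def combined_similarity(expr_x, expr_y, go_x, go_y, alpha=0.5):
    """Assumed convex combination of expression and functional similarity."""
    expr_sim = (pearson(expr_x, expr_y) + 1) / 2   # rescale [-1, 1] to [0, 1]
    return alpha * expr_sim + (1 - alpha) * jaccard(go_x, go_y)

# Hypothetical profiles and GO identifiers, used only to show the call.
print(combined_similarity([1, 2, 3], [2, 4, 6], {"GO:A", "GO:B"}, {"GO:B"}))
```

A distance for clustering can then be taken as one minus this value.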
Cloning, Bioinformatic Analysis and Expression Pattern of Phospholipase D Gene Family in Vitis vinifera
Authors: Dan Yu, Qimin Chen, Weidong Huang, Sibao Wan and Jicheng Zhan
Background: The phospholipase D (PLD) genes have been shown to play critical roles in plant growth and stress responses. Twelve and 17 PLD members have been confirmed in Arabidopsis and Oryza sativa, respectively. Different PLD isoforms have distinct regulatory and catalytic properties. Method: In this study, the VvPLD genes were cloned using a genome-wide search and RT-PCR amplification from suspension-cultured grape cells (Vitis vinifera L. cv. Cabernet Sauvignon). A comprehensive bioinformatics analysis of the VvPLD family was then performed. To further investigate the function of the VvPLD genes during pathogen infection, Botrytis cinerea was used to attack suspension-cultured Cabernet Sauvignon cells. The mRNA expression patterns of VvPLDs were examined by quantitative real-time PCR. Results: Ten PLD coding sequences (CDS) and two PLD gene segments were isolated from grape berry suspension cells. The VvPLDs were characterized and classified into 6 types (2 VvPLDαs, 2 VvPLDβs, 3 VvPLDδs, 1 VvPLDε, 1 VvPLDρ and 1 VvPLDζ) and 3 groups (C2-PLD, PXPH-PLD and SP-PLD). Quantitative real-time RT-PCR analysis showed that VvPLDβ1, VvPLDβ2, VvPLDδ2, VvPLDρ and VvPLDζ were up-regulated, whereas VvPLDα and VvPLDδ were down-regulated during Botrytis cinerea infection. Immunoblotting with AtPLDα1 antibodies detected a higher abundance of VvPLDα in infected grape cells, which was in accordance with its enzyme activity. Conclusion: The results of this study will be useful in selecting candidate genes related to disease resistance in grapevine and pave the way for further functional verification of the VvPLD genes.
Discriminating Ramos and Jurkat Cells with Image Textures from Diffraction Imaging Flow Cytometry Based on a Support Vector Machine
Authors: Ning Zhang, Yu Sa, Yu Guo, Wang Lin, Ping Wang and Yuanming Feng
Background: Flow cytometry (FCM) has been widely used in both basic and clinical research applications. However, conventional noncoherent fluorescence images and bright- or dark-field images are acquired in spatially integrated form and can yield only limited information. Few 3D morphological features of cells can be unveiled. Objective: Diffraction imaging techniques can be used to improve the flow cytometry system and to reflect some 3D morphological features of cells. Method: The diffraction imaging flow cytometry (DIFC) system developed in our previous studies can complement conventional flow cytometers by reflecting a cell's 3D morphological features. In this study, we developed a method based on a Support Vector Machine to classify the diffraction images acquired with the DIFC system from human acute leukaemia T (Jurkat) cells and Burkitt lymphoma B (Ramos) cells. Results: An accuracy of 99.38% with an MCC value of 0.9875 was achieved on an independent testing dataset, indicating that the DIFC system could differentiate the cells. Conclusion: The results indicate that a strong correlation exists between the characteristic parameters of the images and the 3D morphological features of cells. Since diffraction images correlate strongly with the 3D morphology of cells, this system could be used for studies concerning cellular morphology.
Cancer Diagnosis Through IsomiR Expression with Machine Learning Method
Authors: Zhijun Liao, Dapeng Li, Xinrui Wang, Lisheng Li and Quan Zou
Background: An isomiR is an isoform of a microRNA (miRNA) whose sequence varies from that of the reference miRNA. With the advancement of deep sequencing, high miRNA variability has been detected from the same miRNA precursor. IsomiRs exist in four main types formed through the following processes: 5' or 3' trimming, nucleotide addition, nucleotide removal, and posttranscriptional RNA editing. Objective: For cancer diagnosis, it is necessary to explore differential expression profiles that can be used to distinguish cancer and normal cell lines, especially in the isomiR-mRNA regulatory networks, because aberrant isomiR expression profiles may contribute to tumorigenesis. Method: We extracted five features of the isomiR read counts from RNA-Seq data in TCGA. With a random forest classification algorithm, these features were applied to diagnose six cancers: breast invasive carcinoma, lung adenocarcinoma, squamous-cell carcinoma of the lung, stomach adenocarcinoma, thyroid carcinoma, and uterine corpus endometrial carcinoma. Results: Compared with the classifier libD3C, our method can be used to distinguish cancers from their normal counterparts, as judged by the sensitivity (sn), specificity (sp), accuracy (ACC) and MCC measures. Conclusion: IsomiRs can be used successfully and effectively to diagnose cancer by applying machine learning methods to high-throughput data.
A Review of Software Tools for Pathway Crosstalk Inference
Background: We are living in an era that is in general characterized by a lot of data but little information. An enormous amount of biological data collected over several years is now presented as annotations and databases. In this context, all this data, properly combined and grouped, has great potential for enabling novel discoveries that could lead to advances in biology and medicine. The inference of different kinds of relations between pathways constitutes a challenging step towards the analysis of all these sources of biological data. Objective: This review article aims at outlining several methods that analyze associations between pathways starting from different sources of information, namely the internet, databases, and/or gene expression data. Methods: The article summarizes the most important methods for pathway network inference and arranges them according to the data they use as well as the findings they provide. Results: The advantages and drawbacks of each considered methodology are presented, together with a taxonomy tree and a summary table as an overview of the discussion. Conclusion: The methods explained in this paper are chiefly those that explore the concept of associations between pathways using microarray experimental data and/or topological or curated information. Each strategy was introduced, classified and analyzed. The identification of different kinds of associations between pathways plays a central role in systems biology, revealing information that is undetectable at the gene level. Therefore, a comprehensive understanding of the benefits and limitations of these approaches could be the key to the development of new computational strategies for genome-wide analysis.
A Weighted Association Rule Mining Method for Predicting HCV-Human Protein Interactions
Authors: Murugan Indhumathy, Ahmed R. Nabhan and Subramanian Arumugam
Background: Hepatitis C virus (HCV) causes the most severe form of chronic liver disease, and nearly 200 million people worldwide are estimated to be infected with this virus. Much about the HCV pathogenesis process is still unknown. The study of interactions between HCV and human proteins will lead to a deeper understanding of the HCV mechanism. Objective: The objective of this paper is to predict potentially new HCV-human protein interactions using a weighted association rule mining technique. Methods: A new computational method was developed for mining associations within a bipartite graph constructed from the HCV-human protein interaction dataset. The HCVpro database was used to generate a human-viral bipartite graph, which was then analyzed computationally to extract biclusters within the graph. Association rules were extracted from the bipartite graph and weighted using a new mathematical model that incorporates information about the viral and human proteins available from the Gene Ontology knowledge base. Results: Forty-two new interactions between HCV and human proteins were predicted. Some of these predicted interactions were validated through a literature survey and enrichment studies such as Gene Ontology-based analysis, pathway-based analysis and disease association-based analysis. Conclusion: The methodology developed in this paper can also be used for various other kinds of data analysis and hence has a wide scope. It will be useful for conducting similar experiments on other disease databases.
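As background for the rule-mining step, the sketch below computes plain, unweighted support and confidence for an association rule over a bipartite interaction graph. It does not reproduce the paper's Gene Ontology weighting model, which the abstract does not specify. Treating each viral protein's set of human partners as one transaction is an assumption, and the protein labels are hypothetical placeholders.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / base

# Hypothetical bipartite graph: viral protein -> set of human partners.
graph = {
    "viral1": {"H1", "H2"},
    "viral2": {"H1", "H2", "H3"},
    "viral3": {"H1"},
    "viral4": {"H3"},
}
transactions = list(graph.values())
print(support(transactions, {"H1", "H2"}))
print(confidence(transactions, {"H1"}, {"H2"}))
```

A rule such as H1 -> H2 with high confidence suggests that a viral protein known to bind H1 but not yet H2 is a candidate for a new interaction with H2; a weighting scheme would then rank such candidates.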
A Composite Entropy Model in a Multiobjective Framework for Gene Regulatory Networks
Authors: Aurpan Majumder, Mrityunjay Sarkar, Harihar Dash and Indupalli Akhilesh
Background: Transcription Factors (TFs) play a pivotal role in a Gene Regulatory Network (GRN) by differentially regulating genes across conditions. In some cases, coordinated regulation by multiple TFs is required to control a Differentially Expressed (DE) gene. Along this line, we have also developed simple architectures to unveil parallel regulatory control by TFs. Objective: To date, few works have actively investigated serial TF regulatory paths. To contribute to this specific area, we propose an algorithm that builds an architecture of multiple serial TF regulatory paths for a target gene. Methods: To explore the full potential of our algorithm, we tested it on one synthetic and three eukaryotic gene expression datasets. We were able to construct multiple transcription factor regulatory paths of varying lengths to each target differentially expressed gene, with the transcription factors distributed across various multiobjective optimal fronts based on their regulatory properties. This is followed by multiple-stage minimal entropy analysis. Conclusion: Through this multiple-stage composite entropy approach, we have not only assessed the strength of transcription factor-to-target interaction pathways supported by the literature but also added some new interactions and deleted a few existing ones with weak regulatory control probabilities.
Development and Cross-genera Transferability of Ginger EST-SSR Markers for Cardamom
Authors: Mathavaraj Sakthipriya and Kalluvettankuzhy K. Sabu
Background: Cardamom (Elettaria cardamomum Maton) is an important commodity spice that belongs to the ginger family (Zingiberaceae). Several genetic markers are widely used to analyze plant genomes. However, genetic mapping of cardamom has never been attempted owing to the lack of a sufficient number of high-quality genetic markers and other pertinent genome information. Objective: The goals of the present study were to design SSR markers from the EST sequences of ginger (Zingiber officinale) and to validate them in cardamom to demonstrate cross-genera transferability. Methods: A total of 38,116 expressed sequence tags of ginger downloaded from the NCBI dbEST database were used to develop and validate co-dominant, multi-allelic SSR markers. Results: A total of 1214 SSRs including mono-, di-, tri-, tetra- and hexanucleotide repeats were identified in the study. The developed markers were validated through SSR-PCR followed by agarose gel electrophoresis. Genetic analysis of the SSR markers showed polymorphism and clearly differentiated wild genotypes from cultivars and wild escapes from the plantations. Large cardamom, Amomum subulatum Roxb., was used as an outgroup, and the newly developed EST-SSR markers amplified well in this species. Conclusion: The newly developed EST-SSRs could be useful as reproducible markers for cardamom genetic studies.
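The SSR identification step can be illustrated with a small regular-expression scan for perfect tandem repeats. This is a sketch, not the tool used in the study, which the abstract does not name. The minimum repeat counts per motif length in `MIN_REPEATS` are illustrative assumptions, and the example sequence is synthetic.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for size, min_rep in sorted(min_repeats.items()):
        # One motif of the given size followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):   # e.g. report AAAA... as A, not AA
                hits.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(hits)

# Synthetic sequence with an (AG)7 repeat and an (A)12 repeat.
sequence = "CC" + "AG" * 7 + "TT" + "A" * 12 + "C"
print(find_ssrs(sequence))
```

Primers would then be designed in the flanking regions of each hit; that step is outside the scope of this sketch.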
SCAN-Toolbox: Structural COBRA Add-oN (SCAN) for Analysing Large Metabolic Networks
Authors: Yazdan Asgari, Zahra Zabihinpour and Ali Masoudi-Nejad
Background: Modelling and analysis of metabolic networks can be carried out at different levels of granularity, ranging from structural (topological) analysis to constraint-based methods to kinetic modelling. The increasing number of genome-scale metabolic reconstructions and the growing importance of metabolic modelling intensify the need for integrated approaches that allow different analysis methods to be applied. Objective: There is no well-organized and freely accessible code package or specific toolbox/plugin available for the reconstruction of metabolite- and enzyme-centric networks. We have therefore developed a toolbox that can perform such reconstructions. Method: We have developed the SCAN-toolbox, an add-on to the widely used COBRA framework for constraint-based modelling. SCAN is an open-source MATLAB/Octave toolbox that extends the COBRA framework and provides functions to prepare different metabolite- and enzyme-centric networks that can also be imported into other structural analysis software, such as Cytoscape. Results: We have applied the SCAN-toolbox to four metabolic networks to demonstrate its applicability. Conclusion: SCAN extends the COBRA framework towards structural analysis of large metabolic networks, including the reconstruction of undirected and directed metabolite- as well as enzyme-centric networks.
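The two kinds of reconstruction mentioned here can be sketched for a toy reaction list. The code is in Python for illustration and does not use or mirror the SCAN or COBRA functions. The edge definitions are common conventions assumed for this sketch: in the directed metabolite-centric network a substrate points to each product of the same reaction, and in the directed reaction-centric network (a stand-in for the enzyme-centric view) one reaction points to another when it produces a metabolite the other consumes.

```python
def metabolite_centric(reactions):
    """Directed edges (substrate, product) for every reaction."""
    edges = set()
    for substrates, products in reactions.values():
        for s in substrates:
            for p in products:
                if s != p:
                    edges.add((s, p))
    return edges

def reaction_centric(reactions):
    """Directed edges (r1, r2) where r1 produces a substrate of r2."""
    edges = set()
    for r1, (_, products1) in reactions.items():
        for r2, (substrates2, _) in reactions.items():
            if r1 != r2 and set(products1) & set(substrates2):
                edges.add((r1, r2))
    return edges

# Hypothetical network: R1 converts A to B; R2 converts B to C and D.
reactions = {
    "R1": (["A"], ["B"]),
    "R2": (["B"], ["C", "D"]),
}
print(sorted(metabolite_centric(reactions)))
print(sorted(reaction_centric(reactions)))
```

Either edge set can be written out as a two-column edge list for import into a network viewer such as Cytoscape; dropping edge direction gives the undirected variants.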