Current Bioinformatics - Volume 7, Issue 1, 2012
Editorial
What can compare to travelling thousands of miles from home, arriving for the first time in a new country, and suddenly finding yourself at home? This is the experience that the great writer and philosopher Gilbert Keith Chesterton described one hundred years ago, likening his spiritual and intellectual journey to that of an English explorer who, after a long and dangerous voyage, lands on an island and suddenly discovers that it is Great Britain [1].
Bioinformatics is now making a similar ‘back-to-the-roots’ journey. The discipline began in an ancillary role with respect to biology, an attitude reflected in the emphasis on ‘informatics’, i.e. on making research work more efficient rather than on opening new avenues. Over two decades the character of the discipline gradually changed, and Current Bioinformatics has been a privileged observatory of this change. As has often happened in science, the development of new data analysis tools (I prefer the term ‘data analysis’ to ‘informatics’ because it is more closely tied to the semantic part of the job) revealed a state of the world completely different from what mainstream biologists expected. Computational research urged biologists to consider the physical reality of biological regulation, and so to move a step beyond the ‘gene-by-gene’ approach, which did not fulfil its promises on either the application or the knowledge front.
The classical way of dissecting biological regulation into separate elementary steps that simply add up to the complete picture, which began with Monod's discovery of the operon [2], was rapidly exposed as a sort of ‘biology-in-the-vacuum’ that did not take into account the basic physical laws governing our world. To give one example: have you ever considered how probable the dozens of ‘ordered sequential encounters’ needed to complete the Krebs cycle would be in a purely diffusive regime? Similar examples are the onset of a coordinated and repeatable gene expression profile at the tissue level, sustained by largely stochastic behaviour at the single-cell level [3], and the existence of only a few ‘ideal’ protein folds out of the transfinite number of possible configurations.
The entry into biological themes of scientists with different backgrounds, fostered by Bioinformatics, now makes it possible to push Biology back to its chemico-physical roots. For the Krebs cycle, this means thinking of reactions that take place in a solid state, and therefore considering the relevance of ordered phases such as the cytoskeleton; the preferred shapes of proteins, in turn, draw our attention to a configuration space constrained by still largely unknown global energy landscapes. This is a great hope: every significant revolution starts from the rediscovery of the past.
It is no accident that this revolution started by taking the network paradigm seriously. Formalizing biological systems as networks means shifting from a paradigm in which one layer of organization (typically genes) is taken as the basis for explaining the whole system, which follows from that layer by quasi-deterministic rules, to a ‘mesoscopic’ paradigm [4] in which the focus is on the relations linking the different elements of the system. This paradigm shift reminds me of the introduction of the concept of field in physics, which more than a century ago completely changed our mechanistic view of the world. Current Bioinformatics will remain a vibrant forum for computational scientists if it continues to keep the close link between computational methods and biological reality alive and clear.
REFERENCES
[1] Chesterton GK. Orthodoxy. 1908. (http://www.gutenberg.org/ebooks/16769)
[2] Jacob F, Monod J. On the regulation of gene activity. Cold Spring Harb Symp Quant Biol 1961; 26: 193-211.
[3] Huang S. Reprogramming cell fates: reconciling rarity with robustness. BioEssays 2009; 31: 546-560.
[4] Laughlin RB, Pines D, Schmalian J, Stojkovic BP, Wolynes P. The middle way. Proc Natl Acad Sci USA 2000; 97: 32-37.
Recent Directions in Compressing Next Generation Sequencing Data
Authors: Malay Bhattacharyya, Manas Bhattacharyya and Sanghamitra Bandyopadhyay
Over the last three decades, bioinformatics research has contributed a large number of methods for compressing genomic sequence data. However, recent progress in next generation sequencing (NGS) techniques calls for more effective compression methods. This review provides a comprehensive overview of state-of-the-art DNA sequence compression techniques for handling the exponential growth of DNA sequence data produced by NGS.
Generalized String Pseudo-Folding Lattices in Bioinformatics: State-of-Art Review, New Model for Enzyme Sub-Classes, and Study of ESTs on Trichinella spiralis
Several graph representations have been introduced for different kinds of data in theoretical biology. For instance, complex networks based on graph theory are used to represent the structure and/or dynamics of large biological systems such as protein-protein interaction networks. In addition, Randic, Liao, Nandy, Basak, and many others have developed special types of graph-based representations. These graphs impose geometrical constraints on node positioning in 2D space (sequence pseudo-folding rules) and adopt final shapes that resemble lattice-like patterns. Lattice networks have been used to depict DNA and protein sequences visually, but they are very flexible: the technique can be used to create string pseudo-folding lattice representations for any kind of string data. However, despite the proven efficacy of lattice-like graphs/networks for representing diverse systems, most works focus on only one specific type of biological data. In this work, we review both classic and generalized types of lattice graphs, together with examples that illustrate how to use them to represent and compare biological data from different sources. The examples reviewed include: protein sequences; mass spectra (MS) of protein peptide mass fingerprints (PMF); molecular dynamics trajectories (MDTs) from structural studies; mRNA microarray data; single nucleotide polymorphisms (SNPs); 1D or 2D electrophoresis studies of protein polymorphisms; and protein-research patent and/or copyright information. We used data available from public sources for some examples; for others, we used experimental results reported herein for the first time. This work may break new ground for the application of graph theory in theoretical biology and other areas of biomedical sciences. In addition, we carried out a statistical analysis of 50,000+ cases to seek and validate a new QSAR-like predictor for enzyme sub-classes. The model uses spectral moments of pseudo-folding lattice graphs as inputs. Finally, we illustrate the use of this model to study 4,000+ ESTs of protein sequences present in the parasite Trichinella spiralis.
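To make the idea of a pseudo-folding lattice and its spectral moments more concrete, the following Python sketch walks a short DNA string on a 2D grid and computes the traces of powers of the resulting adjacency matrix. It is an illustration under stated assumptions, not the authors' method: the step rule in `STEPS`, the function names, and the example string are all hypothetical, and the reviewed representations may use different folding rules and weighted moments.

```python
# Assumed folding rule (hypothetical): one unit step on the grid per nucleotide.
STEPS = {"A": (-1, 0), "G": (1, 0), "C": (0, 1), "T": (0, -1)}


def lattice_graph(sequence):
    """Walk the sequence on a 2D grid and return (nodes, edges).

    nodes maps each visited lattice point to a node index; edges is a set of
    undirected edges between consecutively visited points.
    """
    position = (0, 0)
    nodes = {position: 0}
    edges = set()
    for symbol in sequence:
        dx, dy = STEPS[symbol]
        nxt = (position[0] + dx, position[1] + dy)
        if nxt not in nodes:
            nodes[nxt] = len(nodes)
        edges.add(frozenset((nodes[position], nodes[nxt])))
        position = nxt
    return nodes, edges


def spectral_moments(n_nodes, edges, max_order):
    """Return [trace(A^1), ..., trace(A^max_order)] for the adjacency matrix A."""
    adj = [[0] * n_nodes for _ in range(n_nodes)]
    for edge in edges:
        i, j = tuple(edge)
        adj[i][j] = adj[j][i] = 1
    power = [row[:] for row in adj]
    moments = []
    for _ in range(max_order):
        moments.append(sum(power[i][i] for i in range(n_nodes)))
        # Multiply the current power by A to get the next power.
        power = [[sum(power[i][k] * adj[k][j] for k in range(n_nodes))
                  for j in range(n_nodes)] for i in range(n_nodes)]
    return moments


# Hypothetical example string; revisited lattice points reuse the same node.
nodes, edges = lattice_graph("GGCT")
moments = spectral_moments(len(nodes), edges, max_order=4)
```

Because revisited points collapse onto one node, strings of different length can map to graphs of the same size, which is one reason a fixed-length vector of moments may be convenient as model input.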
Substitution Transformation of Score Matrix for Improving Alignment Quality of Local Sequence of Distantly Related Proteins
Authors: Juan Li and Huisheng Fang
This paper presents a method for transforming an original score matrix by correspondingly normalizing its elements by row and column. Sequence alignment of 4830 superfamily-related and 9570 fold-related protein pairs showed that alignments based on the transformed matrices are of significantly better quality than those based on the original matrices, especially for the fold-related pairs. A possible reason is that the elements of the transformed matrix carry more information about the similarity between two residues than those of the original matrix. These results indicate that, for the alignment of distantly related proteins, the original score matrix should probably be replaced by its transformed counterpart to enhance alignment quality.
Biclustering Analysis for Pattern Discovery: Current Techniques, Comparative Studies and Applications
Authors: Hongya Zhao, Alan Wee-Chung Liew, Doris Z. Wang and Hong Yan
Biclustering analysis is a useful methodology for discovering local coherent patterns hidden in a data matrix. Unlike traditional clustering, which searches for groups of coherent patterns using the entire feature set, biclustering performs simultaneous pattern classification in both the row and column directions of a data matrix. The technique has found useful applications in many fields, notably in bioinformatics. In this paper, we give an overview of the biclustering problem and review some existing biclustering algorithms in terms of their underlying methodology, search strategy, detected bicluster patterns, and validation strategies. Moreover, we show that the geometry of biclustering patterns can be used to solve biclustering problems effectively. Well-known methods in signal and image analysis, such as the Hough transform and relaxation labeling, can be employed to detect geometrical biclustering patterns. We present performance evaluation results for several well-known biclustering algorithms on both artificial and real gene expression datasets. Finally, several interesting applications of biclustering are discussed.
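As a minimal sketch of what a "local coherent pattern" can look like, the Python below scores a submatrix (a chosen subset of rows and columns) with the mean squared residue, a coherence measure introduced by Cheng and Church for additive biclusters. The abstract does not name this score; it is used here only as a familiar illustration, and the function name and toy matrix are hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    A value of 0 means the submatrix follows a perfect additive pattern,
    i.e. every entry equals a row effect plus a column effect.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy expression matrix (hypothetical data): rows are genes, columns are conditions.
data = [
    [1.0, 2.0, 9.0, 4.0],
    [2.0, 3.0, 1.0, 5.0],
    [7.0, 0.0, 3.0, 2.0],
    [4.0, 5.0, 8.0, 7.0],
]

# Genes 0, 1, 3 under conditions 0, 1, 3 differ only by a constant shift per row.
coherent = mean_squared_residue(data, rows=[0, 1, 3], cols=[0, 1, 3])
# The full matrix mixes in an unrelated gene and condition.
whole = mean_squared_residue(data, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
```

The selected submatrix scores close to zero while the full matrix does not, which is the sense in which a bicluster is coherent only locally, on a subset of both rows and columns.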
Optimal Control for Generalized Asynchronous Probabilistic Boolean Networks†
Authors: Qiuli Liu, Xianping Guo and Tianshou Zhou
As a paradigm for modeling gene regulatory networks, synchronous and asynchronous probabilistic Boolean networks (PBNs) provide an effective tool for designing therapeutic intervention strategies. However, most previous works focused on the former and only a few studied the latter. This paper deals with an optimal control problem in a generalized asynchronous PBN by applying the theory of semi-Markov decision processes. Specifically, we first formulate a control model for a generalized asynchronous PBN as a first passage model for semi-Markov decision processes. We then solve the corresponding optimal control problem by choosing optimal constituent Boolean networks in the asynchronous PBN so as to minimize the risk probability that the first passage time to certain undesirable, disease-associated states does not exceed a given time. Numerical simulations are also provided to demonstrate the effectiveness of the proposed optimality approach.
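The control objective can be illustrated with a deliberately simplified sketch. The Python below uses a discrete-time, two-gene toy model rather than the generalized asynchronous, semi-Markov setting of the paper: at each step a controller picks one constituent Boolean network, each gene then flips independently with a small probability, and backward induction gives the minimal probability of entering an undesirable state within a fixed number of steps. The networks, the undesirable state, the perturbation probability, and all names are assumptions made for illustration.

```python
from itertools import product

GENES = 2
STATES = list(product((0, 1), repeat=GENES))
BAD = {(1, 1)}   # hypothetical undesirable (disease-associated) state
FLIP = 0.1       # assumed per-gene perturbation probability

# Two hypothetical constituent Boolean networks (state -> next state).
NETWORKS = {
    "A": lambda s: (s[1], s[0]),             # swap the two genes
    "B": lambda s: (s[0] & s[1], 1 - s[0]),  # AND rule and NOT rule
}


def transition(state, name):
    """Next-state distribution: apply the network, then flip each gene w.p. FLIP."""
    target = NETWORKS[name](state)
    dist = {}
    for nxt in STATES:
        flips = sum(a != b for a, b in zip(target, nxt))
        dist[nxt] = (FLIP ** flips) * ((1 - FLIP) ** (GENES - flips))
    return dist


def min_risk(horizon):
    """Minimal probability of entering BAD within `horizon` steps, per start state.

    Also returns the network to choose first when `horizon` steps remain.
    """
    risk = {s: 1.0 if s in BAD else 0.0 for s in STATES}
    policy = {}
    for _ in range(horizon):
        new_risk = {}
        for s in STATES:
            if s in BAD:
                new_risk[s] = 1.0
                continue
            scores = {a: sum(p * risk[n] for n, p in transition(s, a).items())
                      for a in NETWORKS}
            best = min(scores, key=scores.get)
            new_risk[s], policy[s] = scores[best], best
        risk = new_risk
    return risk, policy


risk, policy = min_risk(horizon=3)
```

In the semi-Markov formulation described in the abstract, the time spent in each state is random as well, so the fixed step count used here would be replaced by a time threshold.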
Comparative Analysis of Clustering and Biclustering Algorithms for Grouping of Genes: Co-Function and Co-Regulation
Authors: Anindya Bhattacharya, Nirmalya Chowdhury and Rajat K. De
In this article, we discuss the basic challenges of clustering gene expression data. In particular, we divide clustering methods into eight categories and present the specific characteristics of each. We compare the results of 27 clustering/biclustering algorithms on various gene expression datasets using different cluster validation indices. The comparison is made in terms of the P-value of the best and the three best clusters obtained by each algorithm, along with overall results using the z-score. Biclustering algorithms are also compared in terms of their capacity to handle overlapping biclusters. Finally, we provide some guidelines for the development of new clustering algorithms for gene expression data analysis. Availability of the software: The software for most of the existing clustering algorithms was developed in C and Visual Basic and runs on Microsoft Windows platforms. It may be downloaded as a zip file from http://www.isical.ac.in/rajat and must then be installed. Two Word files (included in the zip file) should be consulted before installing and running the software.
13C Metabolic Flux Analysis: From the Principle to Recent Applications
Authors: Luciana Calheiros Gomes and Manuel Simoes
Metabolic flux analysis (MFA) has become a fundamental tool for quantifying metabolic pathways, which is essential for an in-depth understanding of biological systems. In experimentally based MFA, isotopically labeled substrates are supplied to a biological system and the resulting labeling patterns are analyzed to obtain internal flux information. Three main techniques are necessary for 13C MFA: (i) a steady state cell culture in a defined medium with 13C substrates; (ii) precise measurement of the labeling pattern of targeted metabolites by nuclear magnetic resonance (NMR) or mass spectrometry (MS); (iii) mathematical modeling for experimental design, data processing and flux calculation. Recently, important technical advances have been made. The high cost of labeled substrates creates a demand for small cell cultivation volumes. The development of analytical instruments allows 13C enrichments to be measured with high accuracy and sensitivity. Moreover, powerful flux calculation algorithms have reduced the computational effort. Dynamic labeling experiments are also opening new possibilities for the investigation of specific pathways. While MFA is widely established in the study of microbial physiology, applying it to mammalian cells and plants remains a challenge. However, 13C MFA techniques are continuously being enhanced to better discern compartmentalized behaviors, which can help to characterize diseased metabolic states and improve metabolic engineering efforts in plants and other complex systems. The main objective of this work is to present the basic experimental and analytical methods of 13C MFA, as well as representative examples of the latest approaches and findings of MFA in microorganisms, mammalian cells and plants.
Protein Aggregation in Neurodegenerative Diseases: Insights from Computational Analyses
Authors: Anita Sarkar, Sonu Kumar, Abhinav Grover and Durai Sundar
Biological aggregation is a process in which bio-macromolecules such as proteins, lipids and nucleic acids self-associate in an ordered fashion into functional complexes (which may be normal or pathological) and finally precipitate owing to the formation of higher-order conglomerates of low solubility. Neurodegenerative diseases are, in general, associated with the deposition in tissues of pathogenic aggregates composed of amyloid fibrils/plaques. Sequence analysis of aggregation-prone proteins has aided the development of accurate prediction algorithms that are now used to design aggregation-reducing mutations. Computational and experimental studies have shown that preventing aggregation does not necessarily prevent amyloidosis, and vice versa. Investigating amyloid fibril formation with these approaches should lead to an understanding of the mechanism of various neurodegenerative diseases and to their prevention. In our past computational studies we have observed that certain “sequence breaker” amino acid residues, when placed in aggregation-prone stretches, drastically affect the aggregation propensity and fibrillization activity. The sequence breaker concept is quite similar to that of the “gatekeeper residues” explored earlier [1, 2]. On these grounds, we have studied α-synuclein, the pathogenic protein implicated in Parkinson's Disease (PD) as well as other neurodegenerative diseases, and identified a double mutant of the A53T (familial PD-causing) mutant that increases its solubility, enhances its thermodynamic stability and nearly abolishes the aggregation propensity in the diseased state (which is a precursor to amyloidosis).
Moreover, as protein aggregation is key to controlling the symptoms of most neurodegenerative diseases, numerous small peptides (therapeutic drugs) as well as small molecules have been designed in individual experimental studies to target the aggregation-prone regions. This is also an important facet of biotherapeutics, where constant efforts are being made to reduce protein aggregation. The focus of this article is to shed light on recent technologies and developments in bioinformatics for investigating protein aggregation, with α-synuclein as a recurring example.
Bioinformatics Tools for Mass Spectroscopy-Based Metabolomic Data Processing and Analysis
Authors: Masahiro Sugimoto, Masato Kawakami, Martin Robert, Tomoyoshi Soga and Masaru Tomita
Biological systems are increasingly being studied in a holistic manner, using omics approaches, to provide quantitative and qualitative descriptions of the diverse collection of cellular components. Among the omics approaches, metabolomics, which deals with the quantitative global profiling of small molecules or metabolites, is being used extensively to explore the dynamic response of living systems, such as organelles, cells, tissues, organs and whole organisms, under diverse physiological and pathological conditions. This technology is now used routinely in a number of applications, including basic and clinical research, agriculture, microbiology, food science, nutrition, pharmaceutical research, environmental science and the development of biofuels. Of the multiple analytical platforms available to perform such analyses, nuclear magnetic resonance and mass spectrometry have come to dominate, owing to the high resolution and large datasets that these techniques can generate. The large multidimensional datasets that result from such studies must be processed and analyzed to render the data meaningful. Thus, bioinformatics tools are essential for the efficient processing of huge datasets, the characterization of the detected signals, and the alignment of multiple datasets and their features. This paper provides a state-of-the-art overview of the available data processing tools and reviews a collection of recent reports on the topic. Data conversion, pre-processing, alignment, normalization and statistical analysis are introduced, with their advantages and disadvantages, and comparisons are made to guide the reader.