
This research aims to demonstrate that alignment-free bioinformatics approaches are effective tools for analyzing the similarity and dissimilarity of protein sequences. All numerical parameters representing the sequences are given by analytical expressions, which keeps the computations precise, transparent, and efficient even for large datasets and long sequences. A new approach for identifying previously unknown virus strains is also introduced.
The proposed approach combines our newly developed method, the 20D-Dynamic Representation of Protein Sequences, with the K-means clustering algorithm. Each sequence is represented as a cloud of material points in a 20-dimensional space (a 20D-dynamic graph) whose spatial distribution is unique to that sequence. The numerical parameters, called descriptors in molecular similarity theory, are quantities characteristic of dynamic systems and serve as input data for the K-means clustering algorithm.
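The abstract does not spell out the construction, so the following is only a minimal sketch under stated assumptions. It assumes that each of the 20 amino acids is assigned its own axis, that the sequence is read as a walk taking one unit step along the axis of each residue, and that a unit mass is placed at every visited point. As illustrative descriptors it uses the centre-of-mass coordinates and the moments of inertia about the coordinate axes through the centre of mass; the descriptor set used in the study may differ. The axis order, the function names, and the toy sequences are hypothetical, and the clustering step is a plain Lloyd-style K-means with deterministic farthest-point seeding, written out here instead of a library call.

```python
# Assumed axis order; the source does not specify one.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AXIS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def dynamic_graph(seq):
    """Walk in 20D: step +1 along the axis of each residue, record every point."""
    pos = [0] * 20
    points = []
    for aa in seq:
        pos[AXIS[aa]] += 1
        points.append(tuple(pos))
    return points


def descriptors(seq):
    """40 numbers: centre-of-mass coordinates, then axis moments of inertia."""
    pts = dynamic_graph(seq)
    n = len(pts)  # unit mass per point, so total mass equals n
    com = [sum(p[k] for p in pts) / n for k in range(20)]
    # Squared spread of the cloud along each axis, measured from the centre of mass.
    sq = [sum((p[k] - com[k]) ** 2 for p in pts) for k in range(20)]
    total = sum(sq)
    # Moment about axis k: squared distances from that axis, summed over points.
    moments = [total - sq[k] for k in range(20)]
    return com + moments


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iters=100):
    """Lloyd iteration; seeds are rows[0] and then the rows farthest from the seeds."""
    centers = [list(rows[0])]
    while len(centers) < k:
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centre if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy sequences invented for illustration; they are not data from the study.
toy = ["AAAAAAAA", "AAAAAAAC", "WWWWWWWW", "WWWWWWWY"]
labels = kmeans([descriptors(s) for s in toy], k=2)
```

In this toy run the two alanine-rich sequences receive one label and the two tryptophan-rich sequences the other. Centre-of-mass coordinates and moments of inertia live on different scales, so real data would probably need the descriptors to be rescaled before clustering; that choice is not stated in the abstract.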
Examples of the application of the approach are presented, including projections of the 20D-dynamic graphs onto 3D spaces, which serve as a visual tool for comparing sequences. Cluster plots obtained with the proposed method are also provided for the analyzed sequences.
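One way to read "projection onto a 3D space" is to keep only three of the 20 coordinates of each point of the graph. The sketch below assumes that reading, together with a walk that takes one unit step along the axis of each residue. The function name `project_3d` and the default choice of axes are hypothetical and not taken from the source.

```python
def project_3d(seq, axes=("A", "G", "L")):
    """Project the walk onto three chosen residue axes.

    Only steps along the chosen axes move the projected point; every other
    residue leaves it in place, so the output has one point per residue.
    """
    index = {aa: i for i, aa in enumerate(axes)}
    pos = [0, 0, 0]
    points = []
    for aa in seq:
        if aa in index:
            pos[index[aa]] += 1
        points.append(tuple(pos))
    return points


projected = project_3d("AGWLA")
# [(1, 0, 0), (1, 1, 0), (1, 1, 0), (1, 1, 1), (2, 1, 1)]
```

The resulting triples can be passed to any 3D scatter-plot routine. Plotting two sequences with the same choice of axes then gives a visual comparison of the kind the abstract describes.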
The 20D-Dynamic Representation of Protein Sequences, combined with the K-means clustering algorithm, successfully classifies subtypes of influenza A virus strains.