NaturePred: A Tool for Revolutionizing Natural Product Classification with Artificial Intelligence

Sharanbasappa D. Madival; Dwijesh Chandra Mishra; Krishna Kumar Chaturvedi; Anu Sharma; Neeraj Budhlakoti; Ulavappa B. Angadi; Pavana Basavaraja; Mohammad Samir Farooqi; Sudhir Srivastava; Girish Kumar Jha

doi:10.2174/0115701646322417241101055512

ISSN: 1570-1646
E-ISSN: 1875-6247

Authors: Sharanbasappa D. Madival¹, Dwijesh Chandra Mishra, Krishna Kumar Chaturvedi, Anu Sharma, Neeraj Budhlakoti, Ulavappa B. Angadi, Pavana Basavaraja¹, Mohammad Samir Farooqi, Sudhir Srivastava and Girish Kumar Jha

¹ The Graduate School, ICAR-IARI, New Delhi-110012, India; ² Division of Agricultural Bioinformatics, ICAR-IASRI, New Delhi-110012, India; ³ Division of Computer Applications, ICAR-IASRI, New Delhi-110012, India
Source: Current Proteomics, Volume 21, Issue 5, Dec 2024, p. 429 - 436
DOI: https://doi.org/10.2174/0115701646322417241101055512
- Received: 09 Apr 2024
- Accepted: 10 Oct 2024
- Available online: 01 Jan 2025

Abstract

Background

The identification and classification of natural products are vital to drug discovery and the exploration of bioactive compounds. Traditional methods are laborious and time-consuming, which creates a need for tools that use AI techniques to make accurate predictions.

Objectives

This paper presents NaturePred, a user-friendly tool designed to predict the class of natural products and to calculate eight physicochemical properties of protein sequences. It aims to accurately predict five distinct classes of natural product biosynthetic gene clusters (BGCs): Polyketide Synthases (PKS), Non-ribosomal Peptide Synthetases (NRPS), Ribosomally Synthesized and Post-Translationally Modified Peptides (RiPPs), Terpenes, and PKS-NRPS Hybrids. To improve reliability in multi-class classification, it applies a 90% confidence score threshold.

Methods

NaturePred offers three input options: a single protein sequence, a CSV file, or a GenBank (.gbk) file. It uses a pipeline that combines a Natural Language Processing model based on TF-IDF (Term Frequency-Inverse Document Frequency) with a Logistic Regression classifier. A class is assigned only if the confidence score exceeds 90%; otherwise, the tool returns “None of the above class”. Evaluation on unseen data from the MIBiG database shows high accuracy (~96%) in assigning BGC classes.
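
The confidence rule described in the Methods can be sketched in a few lines of Python. This is a minimal illustration and not NaturePred's own code: the function name `assign_class`, the label strings, and the example scores are assumptions, and in practice the scores would be the class probabilities produced by the fitted Logistic Regression classifier.

```python
from typing import Mapping

# Fallback label as quoted in the abstract.
FALLBACK_LABEL = "None of the above class"


def assign_class(probabilities: Mapping[str, float], threshold: float = 0.90) -> str:
    """Return the top-scoring class only if its score exceeds the threshold.

    `probabilities` maps each class label to a confidence score
    (hypothetical input; e.g. one row of predicted class probabilities).
    """
    if not probabilities:
        raise ValueError("at least one class probability is required")
    best_label = max(probabilities, key=probabilities.get)
    # Strict comparison: the abstract says the score must exceed 90%.
    if probabilities[best_label] > threshold:
        return best_label
    return FALLBACK_LABEL


# Illustrative scores only; these are not results from the paper.
confident = {"PKS": 0.95, "NRPS": 0.02, "RiPPs": 0.01,
             "Terpenes": 0.01, "PKS-NRPS Hybrids": 0.01}
ambiguous = {"PKS": 0.55, "NRPS": 0.40, "RiPPs": 0.02,
             "Terpenes": 0.02, "PKS-NRPS Hybrids": 0.01}

print(assign_class(confident))
print(assign_class(ambiguous))
```

With a rule of this kind, a sequence whose scores are split between two classes is reported as unclassified rather than forced into the nearest class, which is the reliability trade-off the threshold is meant to provide.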

Results

NaturePred provides accurate predictions with high confidence scores, demonstrating reliability across different datasets. It calculates eight physicochemical properties of protein sequences, offering valuable insights for further analysis.
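
The abstract does not name the eight physicochemical properties. As a hedged illustration of what such a calculation involves, the sketch below computes two properties that are commonly reported for protein sequences: the grand average of hydropathy (GRAVY, using the Kyte-Doolittle scale) and aromaticity (the fraction of F, W, and Y residues). Whether NaturePred reports these two is an assumption, and the function names and example peptide are hypothetical.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AROMATIC_RESIDUES = frozenset("FWY")


def _clean(sequence: str) -> str:
    """Upper-case the sequence and reject empty input."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return seq


def gravy(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy; raises KeyError on non-standard residues."""
    seq = _clean(sequence)
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aromaticity(sequence: str) -> float:
    """Fraction of residues that are phenylalanine, tryptophan, or tyrosine."""
    seq = _clean(sequence)
    return sum(aa in AROMATIC_RESIDUES for aa in seq) / len(seq)


# Hypothetical peptide used only to exercise the functions.
peptide = "MKTFFWAY"
print(f"GRAVY: {gravy(peptide):.3f}")
print(f"Aromaticity: {aromaticity(peptide):.3f}")
```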

Conclusion

NaturePred's integrated features, including versatile input options, accurate predictions, and physicochemical property calculations, make it a useful tool for natural product research. By addressing classification challenges, NaturePred facilitates drug discovery and the exploration of bioactive compounds. The tool is available at http://login1.cabgrid.res.in:5101/.




Article Type: Research Article

Keyword(s): Biosynthetic gene clusters; hybrid PKS-NRPS; machine learning; natural language processing; natural products; physicochemical properties
