Volume 20, Issue 3
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Although enzymes offer efficient catalysis, natural enzymes often lack stability in industrial environments and may not even catalyze the required reactions. This creates an urgent need to design new enzymes. Computational methods are a powerful strategy: they can explore sequence space rapidly and efficiently and support the design of new enzymes suited to specific conditions and requirements, which makes them valuable for designing new industrial enzymes. Currently, only one tool exists specifically for enzyme generation, and its performance is suboptimal. We therefore selected several general protein sequence design tools and systematically evaluated their effectiveness when applied to specific industrial enzymes. We grouped the computational methods used for protein sequence generation into three categories: structure-conditional sequence generation, sequence generation without structural constraints, and co-generation of sequence and structure. To evaluate the ability of six computational tools to generate enzyme sequences, we first constructed a luciferase dataset named Luc_64. We then assessed the quality of the enzyme sequences generated by these methods on this dataset, including amino acid distribution and EC number validation. We also assessed sequences generated by structure-based methods on existing public datasets, using sequence recovery rate and root-mean-square deviation (RMSD) to cover both the sequence and the structure perspective. On the functional dataset Luc_64, ABACUS-R and ProteinMPNN stood out for producing sequences whose amino acid distributions and functions closely match those of naturally occurring luciferases, suggesting that they preserve essential enzymatic characteristics.
Across both benchmark datasets, ABACUS-R and ProteinMPNN also exhibited the highest sequence recovery rates, indicating a superior ability to generate sequences that closely resemble those of the original enzymes. Our study provides a reference for researchers selecting enzyme sequence design tools, highlighting the strengths and limitations of each tool in generating accurate and functional enzyme sequences. ProteinMPNN and ABACUS-R emerged as the most effective tools in our evaluation, offering high accuracy in sequence recovery and RMSD while maintaining the functional integrity of enzymes through accurate amino acid distributions. In addition, our industrial enzyme benchmark provides a fair evaluation of how well general protein design tools transfer to specific industrial enzymes.
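
As an illustration of two of the metrics named above, the following minimal Python sketch computes a sequence recovery rate and compares amino acid distributions. It is not the evaluation code used in the study. Sequence recovery is taken here in its common sense (the fraction of positions where a designed sequence matches the native sequence for the same backbone), and the total variation distance used to compare distributions is an assumption chosen for simplicity; the abstract does not state which distance, if any, the authors used. The function names and the toy sequences are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences correspond to the same backbone, so they must
    have the same length and need no alignment.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def aa_frequencies(sequences):
    """Pooled relative frequency of each standard amino acid in a sequence set."""
    counts = Counter()
    for seq in sequences:
        # Ignore non-standard symbols such as 'X' or gap characters.
        counts.update(ch for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def total_variation_distance(p, q):
    """Half the L1 distance between two distributions (0 = identical, 1 = disjoint).

    Assumed comparison measure for this sketch, not taken from the paper.
    """
    return 0.5 * sum(abs(p[aa] - q[aa]) for aa in AMINO_ACIDS)


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    native = "MKTAYIAK"
    designed = "MKTAYLAK"
    print(sequence_recovery(native, designed))  # 7 of 8 positions match: 0.875

    natural_set = [native]
    generated_set = [designed]
    print(total_variation_distance(aa_frequencies(natural_set),
                                   aa_frequencies(generated_set)))
```

In practice the recovery rate would be averaged over every backbone in a benchmark set, and the distribution comparison would pool all generated sequences against all natural ones.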

/content/journals/cbio/10.2174/0115748936303223240404043202
2024-04-15
2025-07-04
