Volume 20, Issue 1
  • ISSN: 1574-8936
  • E-ISSN: 2212-392X

Abstract

Background

The application of deep generative models to molecular discovery has surged in recent years. Molecular generation and molecular optimization are currently dominated by autoregressive models, regardless of how molecular data are represented. However, diffusion models, which treat data non-autoregressively, are an emerging paradigm in generative modeling and have achieved significant breakthroughs in areas such as image generation.

Methods

The potential of diffusion models in molecular generation and optimization tasks remains largely unexplored. To investigate their applicability to molecular exploration, we proposed DiffSeqMol, a molecular sequence generation model underpinned by a diffusion process.

Results & Discussion

DiffSeqMol distinguishes itself from traditional autoregressive methods by its capacity to draw samples from random noise and directly generate the entire molecule. Through experimental evaluations, we demonstrated that DiffSeqMol can match, and even surpass, the performance of established state-of-the-art models on unconditional generation and molecular optimization tasks.

Conclusion

Taken together, our results show that DiffSeqMol can be considered a promising molecular generation method. It opens new pathways to traverse the expansive chemical space and to discover novel molecules.

/content/journals/cbio/10.2174/0115748936285493240307071916
2024-04-01
2025-01-31
Supplements

Supplementary material is available on the publisher’s website along with the published article.
