
Background: Promoters, DNA fragments located near the transcription initiation site, are categorized
into two types, strong and weak, based on their transcriptional activation and
expression levels.
Identifying promoters and determining their strength is crucial for understanding gene expression
regulation. There is a need to improve the predictive quality of promoter prediction models for real-world
applications.
Methods: The most recent training dataset was constructed from the RegulonDB website, where all
promoters had been experimentally validated, and their sequence similarity was below 85%. DNA
sequence samples were represented using one-hot encoding, along with nucleotide chemical properties
and density (NCPD). An integrated deep learning framework was developed, incorporating a
multi-head attention module, a long short-term memory (LSTM) module, and a convolutional neural
network (CNN) module.
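
The sequence representation described above can be illustrated with a minimal sketch. It assumes the commonly used NCPD convention, which may differ in detail from the one used in this work: three binary chemical properties per nucleotide (ring structure, functional group, hydrogen bonding) plus a density value equal to the running frequency of the current nucleotide in the prefix ending at that position. The function name `encode` and the 8-column layout are hypothetical choices made for illustration.

```python
# Sketch only: the property codes below follow a widely used NCP convention
# (an assumption, not confirmed by the abstract).
ONE_HOT = {"A": (1, 0, 0, 0), "C": (0, 1, 0, 0), "G": (0, 0, 1, 0), "T": (0, 0, 0, 1)}

# (ring structure: purine=1, functional group: amino=1, hydrogen bond: weak=1)
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def encode(seq):
    """Return one 8-value row per base: 4 one-hot + 3 NCP + 1 density."""
    counts = {}
    rows = []
    for i, base in enumerate(seq.upper(), start=1):
        if base not in ONE_HOT:
            raise ValueError(f"unsupported nucleotide {base!r} at position {i}")
        counts[base] = counts.get(base, 0) + 1
        # Density: how often this base has occurred in the first i positions.
        density = counts[base] / i
        rows.append([*ONE_HOT[base], *NCP[base], density])
    return rows


if __name__ == "__main__":
    for row in encode("ACA"):
        print(row)
```

Each row could then be stacked into a length-by-8 matrix and passed to a downstream network; how the actual framework combines the two encodings is not specified here.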
Results: The AUC and MCC for iPSI(2L)-EDL in identifying promoters improved by 2.23% and
2.96%, respectively, compared to the PseDNC-DL method on independent testing data. The AUC
and MCC for iPSI(2L)-EDL increased by 3.74% and 5.86%, respectively, in predicting promoter
strength type.
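
For readers less familiar with the two metrics reported above, the following sketch shows their standard definitions. It is not the evaluation code used in this study; the function names `mcc` and `auc` and the toy inputs are illustrative assumptions. MCC is computed from the confusion-matrix counts, and AUC is computed as the fraction of positive-negative pairs in which the positive sample receives the higher score (ties count as half).

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (O(P*N), fine for small sets)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(mcc(tp=40, tn=45, fp=5, fn=10))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

MCC ranges from -1 to 1 and remains informative when the classes are imbalanced, while AUC summarizes ranking quality independently of any decision threshold.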
Conclusion: Capturing the importance of different input positions and the long-range dependencies
among features contributed to better promoter recognition. The CNN played a crucial role in recognizing
promoters. Furthermore, to make the method accessible to experimental scientists, a user-friendly
web server has been established at http://47.94.248.117/IPSW(2L)-EDL.