The Computational Tools to Identify DNA Repeats and Motifs: A Systematic Review

Kavya Singh; Shreya Srivastava; Ashish Prabhu; Navjeet Kaur

doi:10.2174/0115680266331305241113172257

ISSN: 1568-0266
E-ISSN: 1873-4294

Authors: Kavya Singh¹, Shreya Srivastava¹, Ashish Prabhu² and Navjeet Kaur³,⁴

¹ Department of Biosciences and Bioengineering, Indian Institute of Technology, Roorkee (IITR), Roorkee, 247667, Uttarakhand, India; ² Department of Biotechnology, NIT Warangal, Warangal, 506004, Telangana, India; ³ Department of Surgery, The University of Oklahoma Health Sciences Center, Oklahoma City, Oklahoma 73104, USA; ⁴ Department of Chemistry and Division of Research and Development, Lovely Professional University, Phagwara, 144411, Punjab, India
Source: Current Topics in Medicinal Chemistry, Volume 25, Issue 6, Mar 2025, p. 705 - 723
DOI: https://doi.org/10.2174/0115680266331305241113172257
- Received: 14 Jun 2024
- Accepted: 04 Oct 2024
- Available online: 21 Nov 2024

Abstract

Introduction

DNA repeats and motifs are specific nucleotide patterns that occur frequently in the genomes of prokaryotes and eukaryotes. Identifying these patterns computationally is of considerable importance because they are associated with gene regulation, genomic instability, and genetic diversity, and can result in a variety of diseases and disorders.

Objectives

This article catalogs the computational tools, algorithms, and databases (~200 distinct resources) used to detect DNA repeats and motifs. It aims to guide users on the accuracy, reliability, and popularity (as reflected by the citation index) of currently available tools and to help them select the best tool(s) for a given task.

Methods

A structured literature review following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, chosen for its dependable and reproducible research process, yielded 200 peer-reviewed publications from the indexing databases Scopus, ScienceDirect, Web of Science (WoS), PubMed, and EMBASE. The query syntax was built from numerous keyword combinations relating to DNA repeats and motifs.
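As a rough illustration of how keyword combinations can be turned into query syntax, the following Python sketch joins two keyword groups with Boolean operators. The keyword lists and the function names `build_query` and `pairwise_queries` are hypothetical; the review does not state its actual search terms, and each indexing database has its own field tags and syntax that this sketch does not model.

```python
from itertools import product

# Hypothetical keyword groups; the review does not list its actual search terms.
TOPIC_TERMS = ["DNA repeat", "tandem repeat", "DNA motif"]
METHOD_TERMS = ["computational tool", "algorithm", "database"]


def build_query(topic_terms, method_terms):
    """Combine two keyword groups into one Boolean search string.

    Terms within a group are joined with OR; the two groups are joined with AND.
    """
    topic = " OR ".join(f'"{t}"' for t in topic_terms)
    method = " OR ".join(f'"{m}"' for m in method_terms)
    return f"({topic}) AND ({method})"


def pairwise_queries(topic_terms, method_terms):
    """Enumerate every topic/method pair as its own two-term query."""
    return [f'"{t}" AND "{m}"' for t, m in product(topic_terms, method_terms)]


if __name__ == "__main__":
    print(build_query(TOPIC_TERMS, METHOD_TERMS))
    for query in pairwise_queries(TOPIC_TERMS, METHOD_TERMS):
        print(query)
```

A single combined query is convenient for one broad search, whereas the pairwise form makes it easier to record how many records each keyword combination returns during screening.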

Results

The initial search retrieved 3,233 research publications, of which 200 satisfied the eligibility criteria for the detection and identification of DNA repeats and motifs by computational tools. Of these 200 publications, 99 dealt with repeat prediction tools, 12 with repetitive sequence databases, 19 with specialized regulatory element databases, and 69 with motif prediction tools.

Conclusion

This article lists numerous databases and computational tools/algorithms (~200 different resources) used to identify DNA repeats and motifs. It offers guidance on the reliability and popularity (as indicated by the citation index) of currently available tools and helps users choose the appropriate tool(s) for a particular task.

References

LanderE.S. LintonL.M. BirrenB. NusbaumC. ZodyM.C. BaldwinJ. DevonK. DewarK. DoyleM. FitzHughW. FunkeR. GageD. HarrisK. HeafordA. HowlandJ. KannL. LehoczkyJ. LeVineR. McEwanP. McKernanK. MeldrimJ. MesirovJ.P. MirandaC. MorrisW. NaylorJ. RaymondC. RosettiM. SantosR. SheridanA. SougnezC. Stange-ThomannN. StojanovicN. SubramanianA. WymanD. RogersJ. SulstonJ. AinscoughR. BeckS. BentleyD. BurtonJ. CleeC. CarterN. CoulsonA. DeadmanR. DeloukasP. DunhamA. DunhamI. DurbinR. FrenchL. GrafhamD. GregoryS. HubbardT. HumphrayS. HuntA. JonesM. LloydC. McMurrayA. MatthewsL. MercerS. MilneS. MullikinJ.C. MungallA. PlumbR. RossM. ShownkeenR. SimsS. WaterstonR.H. WilsonR.K. HillierL.D.W. McPhersonJ.D. MarraM.A. MardisE.R. FultonL.A. ChinwallaA.T. PepinK.H. GishW.R. ChissoeS.L. WendlM.C. DelehauntyK.D. MinerT.L. DelehauntyA. KramerJ.B. CookL.L. FultonR.S. JohnsonD.L. MinxP.J. CliftonS.W. HawkinsT. BranscombE. PredkiP. RichardsonP. WenningS. SlezakT. DoggettN. ChengJ-F. OlsenA. LucasS. ElkinC. UberbacherE. FrazierM. GibbsR.A. MuznyD.M. SchererS.E. BouckJ.B. SodergrenE.J. WorleyK.C. RivesC.M. GorrellJ.H. MetzkerM.L. NaylorS.L. KucherlapatiR.S. NelsonD.L. WeinstockG.M. SakakiY. FujiyamaA. HattoriM. YadaT. ToyodaA. ItohT. KawagoeC. WatanabeH. TotokiY. TaylorT. WeissenbachJ. HeiligR. SaurinW. ArtiguenaveF. BrottierP. BrulsT. PelletierE. RobertC. WinckerP. RosenthalA. PlatzerM. NyakaturaG. TaudienS. RumpA. SmithD.R. Doucette-StammL. RubenfieldM. WeinstockK. LeeH.M. DuboisJ.A. YangH. YuJ. WangJ. HuangG. GuJ. HoodL. RowenL. MadanA. QinS. DavisR.W. FederspielN.A. AbolaA.P. ProctorM.J. RoeB.A. ChenF. PanH. RamserJ. LehrachH. ReinhardtR. McCombieW.R. de la BastideM. DedhiaN. BlöckerH. HornischerK. NordsiekG. AgarwalaR. AravindL. BaileyJ.A. BatemanA. BatzoglouS. BirneyE. BorkP. BrownD.G. BurgeC.B. CeruttiL. ChenH-C. ChurchD. ClampM. CopleyR.R. DoerksT. EddyS.R. EichlerE.E. FureyT.S. GalaganJ. GilbertJ.G.R. HarmonC. HayashizakiY. HausslerD. HermjakobH. HokampK. JangW. JohnsonL.S. JonesT.A. KasifS. 
KaspryzkA. KennedyS. KentW.J. KittsP. KooninE.V. KorfI. KulpD. LancetD. LoweT.M. McLysaghtA. MikkelsenT. MoranJ.V. MulderN. PollaraV.J. PontingC.P. SchulerG. SchultzJ. SlaterG. SmitA.F.A. StupkaE. SzustakowkiJ. Thierry-MiegD. Thierry-MiegJ. WagnerL. WallisJ. WheelerR. WilliamsA. WolfY.I. WolfeK.H. YangS-P. YehR-F. CollinsF. GuyerM.S. PetersonJ. FelsenfeldA. WetterstrandK.A. MyersR.M. SchmutzJ. DicksonM. GrimwoodJ. CoxD.R. OlsonM.V. KaulR. RaymondC. ShimizuN. KawasakiK. MinoshimaS. EvansG.A. AthanasiouM. SchultzR. PatrinosA. MorganM.J. Initial sequencing and analysis of the human genome.Nature2001409682286092110.1038/35057062
[Google Scholar]
SharmaD. IssacB. RaghavaG.P.S. RamaswamyR. Spectral Repeat Finder (SRF): identification of repetitive sequences using Fourier transformation.Bioinformatics20042091405141210.1093/bioinformatics/bth103
[Google Scholar]
BensonG. Tandem repeats finder: a program to analyze DNA sequences.Nucleic Acids Res.199927257358010.1093/nar/27.2.573
[Google Scholar]
KolpakovR. BanaG. KucherovG. mreps: efficient and flexible detection of tandem repeats in DNA.Nucleic Acids Res.200331133672367810.1093/nar/gkg617
[Google Scholar]
EdgarR.C. PILER-CR: Fast and accurate identification of CRISPR repeats.BMC Bioinformatics2007811810.1186/1471‑2105‑8‑18
[Google Scholar]
GymrekM. GolanD. RossetS. ErlichY. lobSTR: A short tandem repeat profiler for personal genomes.Genome Res.20122261154116210.1101/gr.135780.111
[Google Scholar]
PriceA.L. JonesN.C. PevznerP.A. De novo identification of repeat families in large genomes.Bioinformatics200521Suppl. 1i351i35810.1093/bioinformatics/bti1018
[Google Scholar]
KurtzS. REPuter: the manifold applications of repeat analysis on a genomic scale.Nucleic Acids Res.200129224633464210.1093/nar/29.22.4633
[Google Scholar]
NovákP. NeumannP. PechJ. SteinhaislJ. MacasJ. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads.Bioinformatics201329679279310.1093/bioinformatics/btt054
[Google Scholar]
KoflerR. SchlöttererC. LelleyT. SciRoKo: a new tool for whole genome microsatellite search and investigation.Bioinformatics200723131683168510.1093/bioinformatics/btm157
[Google Scholar]
BeierS. ThielT. MünchT. ScholzU. MascherM. MISA-web: a web server for microsatellite prediction.Bioinformatics201733162583258510.1093/bioinformatics/btx198
[Google Scholar]
GirgisH.Z. SheetlinS.L. MsDetector: toward a standard computational tool for DNA microsatellites detection.Nucleic Acids Res.2013411e22e2210.1093/nar/gks881
[Google Scholar]
XuZ. WangH. LTR_FINDER: An efficient tool for the prediction of full-length LTR retrotransposons.Nucleic Acids Res.200735W265W268
[Google Scholar]
EllinghausD. KurtzS. WillhoeftU. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons.BMC Bioinformatics2008911810.1186/1471‑2105‑9‑18
[Google Scholar]
KohanyO. GentlesA.J. HankusL. JurkaJ. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor.BMC Bioinformatics20067147410.1186/1471‑2105‑7‑474
[Google Scholar]
SmitA.F.A. The origin of interspersed repeats in the human genome.Curr. Opin. Genet. Dev.19966674374810.1016/S0959‑437X(96)80030‑X
[Google Scholar]
YinC. Identification of repeats in DNA sequences using nucleotide distribution uniformity.J. Theor. Biol.201741213814510.1016/j.jtbi.2016.10.013
[Google Scholar]
ShelenkovA. KorotkovE. LEPSCAN--a web server for searching latent periodicity in DNA sequences.Brief. Bioinform.201213214314910.1093/bib/bbr044
[Google Scholar]
BaoZ. EddyS.R. Automated de novo identification of repeat sequence families in sequenced genomes.Genome Res.20021281269127610.1101/gr.88502
[Google Scholar]
HoenD.R. HickeyG. BourqueG. CasacubertaJ. CordauxR. FeschotteC. Fiston-LavierA-S. Hua-VanA. HubleyR. KapustaA. LeratE. MaumusF. PollockD.D. QuesnevilleH. SmitA. WheelerT.J. BureauT.E. BlanchetteM. A call for benchmarking transposable element annotation methods.Mob. DNA2015611310.1186/s13100‑015‑0044‑6
[Google Scholar]
LiaoX. LiM. HuK. WuF-X. GaoX. WangJ. A sensitive repeat identification framework based on short and long reads.Nucleic Acids Res.20214917e100e10010.1093/nar/gkab563
[Google Scholar]
BaoW. KojimaK.K. KohanyO. Repbase Update, a database of repetitive elements in eukaryotic genomes.Mob. DNA2015611110.1186/s13100‑015‑0041‑9
[Google Scholar]
GelfandY. RodriguezA. BensonG. TRDB--The Tandem Repeats Database.Nucleic Acids Res.200735DatabaseD80D8710.1093/nar/gkl1013
[Google Scholar]
GrissaI. BouchonP. PourcelC. VergnaudG. On-line resources for bacterial micro-evolution studies using MLVA or CRISPR typing.Biochimie200890466066810.1016/j.biochi.2007.07.014
[Google Scholar]
HubleyR. FinnR.D. ClementsJ. EddyS.R. JonesT.A. BaoW. SmitA.F.A. WheelerT.J. The Dfam database of repetitive DNA families.Nucleic Acids Res.201644D1D81D8910.1093/nar/gkv1272
[Google Scholar]
YuJ. DossaK. WangL. ZhangY. WeiX. LiaoB. ZhangX. PMDBase: a database for studying microsatellite DNA and marker development in plants.Nucleic Acids Res.201745D1D1046D105310.1093/nar/gkw906
[Google Scholar]
RobertsonG. cisRED: a database system for genome-scale computational discovery of regulatory elements.Nucleic Acids Res.20063490001D68D7310.1093/nar/gkj075
[Google Scholar]
Kel-MargoulisO.V. COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation.Nucleic Acids Res.200028131131510.1093/nar/28.1.311
[Google Scholar]
SuzukiY. DBTSS: DataBase of human Transcriptional Start Sites and full-length cDNAs.Nucleic Acids Res.200230132833110.1093/nar/30.1.328
[Google Scholar]
KarolchikD. The UCSC Genome Browser Database: 2008 update.Nucleic Acids Res.200836Database issueD773D779
[Google Scholar]
Cavin PerierR. JunierT. BucherP. The Eukaryotic Promoter Database EPD.Nucleic Acids Res.199826135335710.1093/nar/26.1.353
[Google Scholar]
SandelinA. JASPAR: an open-access database for eukaryotic transcription factor binding profiles.Nucleic Acids Res.20043290001Suppl. 191D9410.1093/nar/gkh012
[Google Scholar]
GhoshD. OOTFD (Object-Oriented Transcription Factors Database): an object- oriented successor to TFD.Nucleic Acids Res.199826136036210.1093/nar/26.1.360
[Google Scholar]
CiprianoM.J. NovichkovP.N. KazakovA.E. RodionovD.A. ArkinA.P. GelfandM.S. DubchakI. RegTransBase – a database of regulatory sequences and interactions based on literature: a resource for investigating transcriptional regulation in prokaryotes.BMC Genomics201314121310.1186/1471‑2164‑14‑213
[Google Scholar]
PahlH.L. Activators and target genes of Rel/NF-κB transcription factors.Oncogene199918496853686610.1038/sj.onc.1203239
[Google Scholar]
MatysV. TRANSFAC(R): transcriptional regulation, from patterns to profiles.Nucleic Acids Res.200331137437810.1093/nar/gkg108
[Google Scholar]
ZhaoF. TRED: a Transcriptional Regulatory Element Database and a platform for in silico gene regulation studies.Nucleic Acids Res.200433Database issueD103D10710.1093/nar/gki004
[Google Scholar]
KolchanovN.A. Transcription Regulatory Regions Database (TRRD): its status in 2002.Nucleic Acids Res.200230131231710.1093/nar/30.1.312
[Google Scholar]
DavuluriR.V. SunH. PalaniswamyS.K. MatthewsN. MolinaC. KurtzM. GrotewoldE. AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors.BMC Bioinformatics2003412510.1186/1471‑2105‑4‑25
[Google Scholar]
GuoA. HeK. LiuD. BaiS. GuX. WeiL. LuoJ. DATF: a database of Arabidopsis transcription factors.Bioinformatics200521102568256910.1093/bioinformatics/bti334
[Google Scholar]
GaoG. ZhongY. GuoA. ZhuQ. TangW. ZhengW. GuX. WeiL. LuoJ. DRTF: a database of rice transcription factors.Bioinformatics200622101286128710.1093/bioinformatics/btl107
[Google Scholar]
SharmaD. MohantyD. SuroliaA. RegAnalyst: a web interface for the analysis of regulatory motifs, networks and pathways.Nucleic Acids Res.200937W193W20110.1093/nar/gkp388
[Google Scholar]
HigoK. UgawaY. IwamotoM. KorenagaT. Plant cis-acting regulatory DNA elements (PLACE) database: 1999.Nucleic Acids Res.199927129730010.1093/nar/27.1.297
[Google Scholar]
JinJ. ZhangH. KongL. GaoG. LuoJ. PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors.Nucleic Acids Res.201442D1D1182D118710.1093/nar/gkt1016
[Google Scholar]
ZhuJ. ZhangM.Q. SCPD: a promoter database of the yeast Saccharomyces cerevisiae.Bioinformatics199915760761110.1093/bioinformatics/15.7.607
[Google Scholar]
BaileyT.L. MEME SUITE: tools for motif discovery and searching.Nucleic Acids Res.200937W202W20810.1093/nar/gkp335
[Google Scholar]
BaileyT.L. DREME: motif discovery in transcription factor ChIP-seq data.Bioinformatics201127121653165910.1093/bioinformatics/btr261
[Google Scholar]
LiuX.S. BrutlagD.L. LiuJ.S. An algorithm for finding protein–DNA binding sites with applications to chromatin- immunoprecipitation microarray experiments.Nat. Biotechnol.200220883583910.1038/nbt717
[Google Scholar]
HeinzS. BennerC. SpannN. BertolinoE. LinY.C. LasloP. ChengJ.X. MurreC. SinghH. GlassC.K. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities.Mol. Cell201038457658910.1016/j.molcel.2010.05.004
[Google Scholar]
LiuX. BrutlagD.L. LiuJ.S. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes.Pac. Symp. Biocomput.2001127138
[Google Scholar]
RothF.P. HughesJ.D. EstepP.W. ChurchG.M. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation.Nat. Biotechnol.1998161093994510.1038/nbt1098‑939
[Google Scholar]
DavuluriR.V. GrosseI. ZhangM.Q. Computational identification of promoters and first exons in the human genome.Nat. Genet.200129441241710.1038/ng780
[Google Scholar]
PavesiG. Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes.Nucleic Acids Res.200432W199W20310.1093/nar/gkh465
[Google Scholar]
MahonyS. BenosP.V. STAMP: A web tool for exploring DNA-binding motif similarities.Nucleic Acids Res.200735W253W258
[Google Scholar]
LootsG.G. OvcharenkoI. rVISTA 2.0: evolutionary analysis of transcription factor binding sites.Nucleic Acids Res.32W217W221
[Google Scholar]
SandelinA. WassermanW.W. LenhardB. ConSite: web-based prediction of regulatory elements using cross-species comparison.Nucleic Acids Res.200432W249W25210.1093/nar/gkh372
[Google Scholar]
EskinE. PevznerP.A. Finding composite regulatory patterns in DNA sequences.Bioinformatics200218Suppl. 1S354S36310.1093/bioinformatics/18.suppl_1.S354
[Google Scholar]
Thomas-ChollierM. HerrmannC. DefranceM. SandO. ThieffryD. van HeldenJ. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets.Nucleic Acids Res.2012404e3110.1093/nar/gkr1104
[Google Scholar]
SiddharthanR. SiggiaE.D. van NimwegenE. PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny.PLOS Comput. Biol.200517e6710.1371/journal.pcbi.0010067
[Google Scholar]
LinhartC. HalperinY. ShamirR. Transcription factor and microRNA motif discovery: The Amadeus platform and a compendium of metazoan target sets.Genome Res.20081871180118910.1101/gr.076117.108
[Google Scholar]
WangT. StormoG.D. Combining phylogenetic data with co-regulated genes to identify regulatory motifs.Bioinformatics200319182369238010.1093/bioinformatics/btg329
[Google Scholar]
SinhaS. TompaM. YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation.Nucleic Acids Res.200331133586358810.1093/nar/gkg618
[Google Scholar]
SinhaS. BlanchetteM. TompaM. PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences.BMC Bioinformatics20045117010.1186/1471‑2105‑5‑170
[Google Scholar]
BlanchetteM. TompaM. FootPrinter: a program designed for phylogenetic footprinting.Nucleic Acids Res.200331133840384210.1093/nar/gkg606
[Google Scholar]
FrithM.C. Finding functional sequence elements by multiple local alignment.Nucleic Acids Res.200432118920010.1093/nar/gkh169
[Google Scholar]
FrithM.C. HansenU. WengZ. Detection of cis -element clusters in higher eukaryotic DNA.Bioinformatics2001171087888910.1093/bioinformatics/17.10.878
[Google Scholar]
FrithM.C. LiM.C. WengZ. Cluster-Buster: finding dense clusters of motifs in DNA sequences.Nucleic Acids Res.200331133666366810.1093/nar/gkg540
[Google Scholar]
KulakovskiyI.V. BoevaV.A. FavorovA.V. MakeevV.J. Deep and wide digging for binding motifs in ChIP-Seq data.Bioinformatics201026202622262310.1093/bioinformatics/btq488
[Google Scholar]
PongerL. MouchiroudD. CpGProD: identifying CpG islands associated with transcription start sites in large genomic mammalian sequences.Bioinformatics200218463163310.1093/bioinformatics/18.4.631
[Google Scholar]
GoteaV. OvcharenkoI. DiRE: identifying distant regulatory elements of co-expressed genes.Nucleic Acids Res.200836W133W13910.1093/nar/gkn300
[Google Scholar]
GeorgievS. BoyleA.P. JayasuryaK. DingX. MukherjeeS. OhlerU. Evidence-ranked motif identification.Genome Biol.2010112R1910.1186/gb‑2010‑11‑2‑r19
[Google Scholar]
TanedaA. Adplot: detection and visualization of repetitive patterns in complete genomes.Bioinformatics200420570170810.1093/bioinformatics/btg470
[Google Scholar]
LiY. JiangN. SunY. AnnoSINE : a short interspersed nuclear elements annotation tool for plant genomes.Plant Physiol.2022188295597010.1093/plphys/kiab524
[Google Scholar]
WexlerY. YakhiniZ. KashiY. GeigerD. Finding approximate tandem repeats in genomic sequences.J. Comput. Biol.200512792894210.1089/cmb.2005.12.928
[Google Scholar]
HoffK.J. StankeM. Predicting genes in single genomes with AUGUSTUS.Curr. Protoc. Bioinformatics2019651e5710.1002/cpbi.57
[Google Scholar]
JurkaJ. KlonowskiP. DagmanV. PeltonP. Censor—a program for identification and elimination of repetitive elements from DNA sequences.Comput. Chem.199620111912110.1016/S0097‑8485(96)80013‑1
[Google Scholar]
ShiL. ChenH. JiangM. WangL. WuX. HuangL. LiuC. CPGAVAS2, an integrated plastome sequence annotator and analyzer.Nucleic Acids Res.201947W1W65W7310.1093/nar/gkz345
[Google Scholar]
CouvinD. BernheimA. Toffano-NiocheC. TouchonM. MichalikJ. NéronB. RochaE.P.C. VergnaudG. GautheretD. PourcelC. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins.Nucleic Acids Res.201846W1W246W25110.1093/nar/gky425
[Google Scholar]
YeC. JiG. LiL. LiangC. detectIR: a novel program for detecting perfect and imperfect inverted repeats using complex numbers and vector calculation.PLoS One2014911e11334910.1371/journal.pone.0113349
[Google Scholar]
YeC. JiG. LiangC. detectMITE: A novel approach to detect miniature inverted repeat transposable elements in genomes.Sci. Rep.2016611968810.1038/srep19688
[Google Scholar]
ShiJ. LiangC. Generic Repeat Finder: A High-Sensitivity Tool for Genome-Wide De Novo Repeat Detection.Plant Physiol.201918041803181510.1104/pp.19.00386
[Google Scholar]
MudunuriS.B. NagarajaramH.A. IMEx: Imperfect Microsatellite Extractor.Bioinformatics200723101181118710.1093/bioinformatics/btm097
[Google Scholar]
WirawanA. INVERTER: INtegrated Variable numbER Tandem rEpeat findeR.Computational Systems-Biology and Bioinformatics.Berlin, HeidelbergSpringer Berlin Heidelberg201010.1007/978‑3‑642‑16750‑8_14
[Google Scholar]
NicolasJ. PeterlongoP. TempelS. Finding and Characterizing Repeats in Plant Genomes.Methods Mol. Biol.20161374329333710.1007/978‑1‑4939‑3167‑5_17
[Google Scholar]
OuS. JiangN. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons.Plant Physiol.201817621410142210.1104/pp.17.01310
[Google Scholar]
ChenG.L. ChangY.J. HsuehC.H. PRAP: an ab initio software package for automated genome-wide analysis of DNA repeats for prokaryotes.Bioinformatics201329212683268910.1093/bioinformatics/btt482
[Google Scholar]
BergmanC.M. QuesnevilleH. Discovering and detecting transposable elements in genome sequences.Brief. Bioinform.20078638239210.1093/bib/bbm048
[Google Scholar]
GirgisH.Z. Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale.BMC Bioinformatics201516122710.1186/s12859‑015‑0654‑5
[Google Scholar]
FeschotteC. KeswaniU. RanganathanN. GuibotsyM.L. LevineD. Exploring repetitive DNA landscapes using REPCLASS, a tool that automates the classification of transposable elements in eukaryotic genomes.Genome Biol. Evol.2009120522010.1093/gbe/evp023
[Google Scholar]
GuoR. LiY-R. HeS. Ou-YangL. SunY. ZhuZ. RepLong: de novo repeat identification using long read sequencing data.Bioinformatics20183471099110710.1093/bioinformatics/btx717
[Google Scholar]
KurtzS. PhillippyA. DelcherA.L. SmootM. ShumwayM. AntonescuC. SalzbergS.L. Versatile and open software for comparing large genomes.Genome Biol.200452R1210.1186/gb‑2004‑5‑2‑r12
[Google Scholar]
AgarwalP. StatesD.J. The Repeat Pattern Toolkit (RPT): analyzing the structure and evolution of the C. elegans genome.Proc. Int. Conf. Intell. Syst. Mol. Biol.19942119
[Google Scholar]
AchazG. BoyerF. RochaE.P.C. ViariA. CoissacE. Repseek, a tool to retrieve approximate repeats from large DNA sequences.Bioinformatics200723111912110.1093/bioinformatics/btl519
[Google Scholar]
MaoH. WangH. SINE_scan: an efficient tool to discover short interspersed nuclear elements (SINEs) in large-scale genomic datasets.Bioinformatics201733574374510.1093/bioinformatics/btw718
[Google Scholar]
PokrzywaR. PolanskiA. BWtrs: A tool for searching for tandem repeats in DNA sequences based on the Burrows–Wheeler transform.Genomics201096531632110.1016/j.ygeno.2010.08.001
[Google Scholar]
DelgrangeO. RivalsE. STAR: an algorithm to Search for Tandem Approximate Repeats.Bioinformatics200420162812282010.1093/bioinformatics/bth335
[Google Scholar]
ChiuR. Rajan-BabuI-S. FriedmanJ.M. BirolI. Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences.Genome Biol.202122122410.1186/s13059‑021‑02447‑3
[Google Scholar]
KurtzS. NarechaniaA. SteinJ.C. WareD. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes.BMC Genomics20089151710.1186/1471‑2164‑9‑517
[Google Scholar]
BoevaV. RegnierM. PapatsenkoD. MakeevV. Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression.Bioinformatics200622667668410.1093/bioinformatics/btk032
[Google Scholar]
WlodzimierzP. HongM. HendersonI.R. TRASH: Tandem Repeat Annotation and Structural Hierarchy.Bioinformatics2023395btad30810.1093/bioinformatics/btad308
[Google Scholar]
CasteloA.T. MartinsW. GaoG.R. TROLL--tandem repeat occurrence locator.Bioinformatics200218463463610.1093/bioinformatics/18.4.634
[Google Scholar]
HusiH. SkipworthR.J. FearonK.C.H. RossJ.A. LSCluster, a large-scale sequence clustering and aligning software for use in partial identity mapping and splice-variant analysis.J. Proteomics20138418518910.1016/j.jprot.2013.04.006
[Google Scholar]
KalyanaramanA. AluruS. Efficient algorithms and software for detection of full-length LTR retrotransposons.J. Bioinform. Comput. Biol.20064219721610.1142/S021972000600203X
[Google Scholar]
TothG. PLOTREP: a web tool for defragmentation and visual analysis of dispersed genomic repeats.Nucleic Acids Res.200634W708W71310.1093/nar/gkl263
[Google Scholar]
SperberG. LövgrenA. ErikssonN-E. BenachenhouF. BlombergJ. RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences.BMC Bioinformatics200910S6Suppl. 6S410.1186/1471‑2105‑10‑S6‑S4
[Google Scholar]
DuC. CaronnaJ. HeL. DoonerH.K. Computational prediction and molecular confirmation of Helitron transposons in the maize genome.BMC Genomics2008915110.1186/1471‑2164‑9‑51
[Google Scholar]
LucierJ.F. RTAnalyzer: a web application for finding new retrotransposons and detecting L1 retrotransposition signatures.Nucleic Acids Res.200735W269W27410.1093/nar/gkm313
[Google Scholar]
SzakS.T. PickeralO.K. MakalowskiW. BoguskiM.S. LandsmanD. BoekeJ.D. Molecular archeology of L1 insertions in the human genome.Genome Biol.2002310research0052.110.1186/gb‑2002‑3‑10‑research0052
[Google Scholar]
TuZ. Eight novel families of miniature inverted repeat transposable elements in the African malaria mosquito, Anopheles gambiae.Proc. Natl. Acad. Sci. USA20019841699170410.1073/pnas.98.4.1699
[Google Scholar]
RhoM. ChoiJ-H. KimS. LynchM. TangH. De novo identification of LTR retrotransposons in eukaryotic genomes.BMC Genomics2007819010.1186/1471‑2164‑8‑90
[Google Scholar]
ChenY. ZhouF. LiG. XuY. MUST: A system for identification of miniature inverted-repeat transposable elements and applications to Anabaena variabilis and Haloquadratum walsbyi.Gene20094361-21710.1016/j.gene.2009.01.019
[Google Scholar]
PereiraV. Automated paleontology of repetitive DNA with REANNOTATE.BMC Genomics20089161410.1186/1471‑2164‑9‑614
[Google Scholar]
LiaoX. GaoX. ZhangX. WuF-X. WangJ. RepAHR: an improved approach for de novo repeat identification by assembly of the high-frequency reads.BMC Bioinformatics202021146310.1186/s12859‑020‑03779‑w
[Google Scholar]
NovákP. NeumannP. MacasJ. Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2.Nat. Protoc.202015113745377610.1038/s41596‑020‑0400‑y
[Google Scholar]
SmithC.D. EdgarR.C. YandellM.D. SmithD.R. CelnikerS.E. MyersE.W. KarpenG.H. Improved repeat identification and masking in Dipterans.Gene200738911910.1016/j.gene.2006.09.011
[Google Scholar]
OttoT.D. GomesL.H.F. Alves-FerreiraM. de MirandaA.B. DegraveW.M. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS).BMC Bioinformatics20089136610.1186/1471‑2105‑9‑366
[Google Scholar]
NaikP.K. MittalV.K. GuptaS. RetroPred: A tool for prediction, classification and extraction of non-LTR retrotransposons (LINEs & SINEs) from the genome by integrating PALS, PILER, MEME and ANN.Bioinformation20082626327010.6026/97320630002263
[Google Scholar]
DashnowH. PedersenB.S. HiattL. BrownJ. BeecroftS.J. RavenscroftG. LaCroixA.J. LamontP. RoxburghR.H. RodriguesM.J. DavisM. MeffordH.C. LaingN.G. QuinlanA.R. STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci.Genome Biol.202223125710.1186/s13059‑022‑02826‑4
[Google Scholar]
KronmillerB.A. WiseR.P. TEnest: automated chronological annotation and visualization of nested plant transposable elements.Plant Physiol.20081461455910.1104/pp.107.110353
[Google Scholar]
GiordanoJ. GeY. GelfandY. AbrusánG. BensonG. WarburtonP.E. Evolutionary history of mammalian transposons determined by genome-wide defragmentation.PLOS Comput. Biol.200737e13710.1371/journal.pcbi.0030137
[Google Scholar]
LexaM. JedlickaP. VanatI. CervenanskyM. KejnovskyE. TE-greedy-nester: structure-based detection of LTR retrotransposons and their nesting.Bioinformatics202036204991499910.1093/bioinformatics/btaa632
[Google Scholar]
KennedyR.C. UngerM.F. ChristleyS. CollinsF.H. MadeyG.R. An automated homology-based approach for identifying transposable elements.BMC Bioinformatics201112113010.1186/1471‑2105‑12‑130
[Google Scholar]
Fiston-LavierA.S. CarriganM. PetrovD.A. GonzálezJ. T-lex: a program for fast and accurate assessment of transposable element presence using next-generation sequencing data.Nucleic Acids Res.2011396e3610.1093/nar/gkq1291
[Google Scholar]
JordaJ. KajavaA.V. T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm.Bioinformatics200925202632263810.1093/bioinformatics/btp482
[Google Scholar]
PaladinL. BevilacquaM. ErrigoS. PiovesanD. MičetićI. NecciM. MonzonA.M. FabreM.L. LopezJ.L. NilssonJ.F. RiosJ. MennaP.L. CabreraM. BuitronM.G. KulikM.G. Fernandez-AlbertiS. FornasariM.S. ParisiG. LagaresA. HirshL. Andrade-NavarroM.A. KajavaA.V. TosattoS.C.E. RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures.Nucleic Acids Res.202149D1D452D45710.1093/nar/gkaa1097
[Google Scholar]
NeumannP. NovákP. HoštákováN. MacasJ. Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification.Mob. DNA201910112210.1186/s13100‑018‑0144‑1
[Google Scholar]
MistryJ. ChuguranskyS. WilliamsL. QureshiM. SalazarG.A. SonnhammerE.L.L. TosattoS.C.E. PaladinL. RajS. RichardsonL.J. FinnR.D. BatemanA. Pfam: The protein families database in 2021.Nucleic Acids Res.202149D1D412D41910.1093/nar/gkaa913
[Google Scholar]
Jorng-Tzong Horng LinF.M. LinJ.H. HuangH.D. LiuB.J. Database of repetitive elements in complete genomes and data mining using transcription factor binding sites.IEEE Trans. Inf. Technol. Biomed.2003729310010.1109/TITB.2003.811878
[Google Scholar]
BobyT. PatchA.M. AvesS.J. TRbase: a database relating tandem repeats to disease genes for the human genome.Bioinformatics200521681181610.1093/bioinformatics/bti059
[Google Scholar]
LiaoX. HuK. SalhiA. ZouY. WangJ. GaoX. msRepDB: a comprehensive repetitive sequence database of over 80 000 species.Nucleic Acids Res.202250D1D236D24510.1093/nar/gkab1089
[Google Scholar]
GhoshD. Object-oriented transcription factors database (ooTFD).Nucleic Acids Res.200028130831010.1093/nar/28.1.308
[Google Scholar]
KazakovA.E. CiprianoM.J. NovichkovP.S. MinovitskyS. VinogradovD.V. ArkinA. MironovA.A. GelfandM.S. DubchakI. RegTransBase--a database of regulatory sequences and interactions in a wide range of prokaryotic genomes.Nucleic Acids Res.200735DatabaseD407D41210.1093/nar/gkl865
[Google Scholar]
ShiJ. YangW. ChenM. DuY. ZhangJ. WangK. AMD, an automated motif discovery tool using stepwise refinement of gapped consensuses.PLoS One201169e2457610.1371/journal.pone.0024576
[Google Scholar]
WorkmanC.T. StormoG.D. ANN-Spec: a method for discovering transcription factor binding sites with improved specificity.Pac Symp Biocomput.2000200046747810.1142/9789814447331_0044
[Google Scholar]
SteffensN.O. AthaMap: an online resource for in silico transcription factor binding sites in the Arabidopsis thaliana genome.Nucleic Acids Res.20043290001368D37210.1093/nar/gkh017
[Google Scholar]
CheD. JensenS. CaiL. LiuJ.S. BEST: binding-site estimation suite of tools.Bioinformatics200521122909291110.1093/bioinformatics/bti425
[Google Scholar]
Li G., Liu B., Ma Q., Xu Y. A new framework for identifying cis-regulatory motifs in prokaryotes. Nucleic Acids Res. 2011; 39(7): e42. doi:10.1093/nar/gkq948
Triska M., Grocutt D., Southern J., Murphy D.J., Tatarinova T. cisExpress: motif detection in DNA sequences. Bioinformatics 2013; 29(17): 2203-2205. doi:10.1093/bioinformatics/btt366
Kuttippurathu L., Hsing M., Liu Y., Schmidt B., Maskell D.L., Lee K., He A., Pu W.T., Kong S.W. CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments. Bioinformatics 2011; 27(5): 715-717. doi:10.1093/bioinformatics/btq707
Karanam S., Moreno C.S. CONFAC: automated application of comparative genomic promoter analysis to DNA microarray datasets. Nucleic Acids Res. 2004; 32: W475-W484. doi:10.1093/nar/gkh353
Berezikov E., Guryev V., Cuppen E. CONREAL web server: identification and visualization of conserved transcription factor binding sites. Nucleic Acids Res. 2005; 33: W447-W450. doi:10.1093/nar/gki378
Ma Q. DMINDA: an integrated web server for DNA motif identification and analyses. Nucleic Acids Res. 2014; 42: W12-W19. doi:10.1093/nar/gku315
Liu F.F.M. FMGA: finding motifs by genetic algorithm. Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering, 21 May, 2004, Taichung, Taiwan, 2004, pp. 459-466. doi:10.1109/BIBE.2004.1317378
Wang D., Xi L. GAPK: Genetic algorithms with prior knowledge for motif discovery in DNA sequences. 2009 IEEE Congress on Evolutionary Computation, 18-21 May, 2009, Trondheim, 2009, pp. 277-284. doi:10.1109/CEC.2009.4982959
Newberg L.A., Thompson W.A., Conlan S., Smith T.M., McCue L.A., Lawrence C.E. A phylogenetic Gibbs sampler that yields centroid solutions for cis-regulatory site prediction. Bioinformatics 2007; 23(14): 1718-1727. doi:10.1093/bioinformatics/btm241
van Heeringen S.J., Veenstra G.J.C. GimmeMotifs: a de novo motif prediction pipeline for ChIP-sequencing experiments. Bioinformatics 2011; 27(2): 270-271. doi:10.1093/bioinformatics/btq636
Ao W., Gaudet J., Kent W.J., Muttumu S., Mango S.E. Environmentally induced foregut remodeling by PHA-4/FoxA and DAF-12/NHR. Science 2004; 305(5691): 1743-1746. doi:10.1126/science.1102216
Bailey T.L., Gribskov M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics 1998; 14(1): 48-54. doi:10.1093/bioinformatics/14.1.48
Wang D., Tapan S. MISCORE: a new scoring function for characterizing DNA regulatory motifs in promoter sequences. BMC Syst. Biol. 2012; 6(Suppl. 2): S4. doi:10.1186/1752-0509-6-S2-S4
Belmadani M. MotifGP: DNA motif discovery using multiobjective evolution. Master's thesis, University of Ottawa, 2016.
Claeys M., Storms V., Sun H., Michoel T., Marchal K. MotifSuite: workflow for probabilistic motif detection and assessment. Bioinformatics 2012; 28(14): 1931-1932. doi:10.1093/bioinformatics/bts293
Fu Y. MotifViz: an analysis and visualization tool for motif discovery. Nucleic Acids Res. 2004; 32: W420-W423. doi:10.1093/nar/gkh426
Mendes N.D., Casimiro A.C., Santos P.M., Sá-Correia I., Oliveira A.L., Freitas A.T. MUSA: a parameter free algorithm for the identification of biologically significant motifs. Bioinformatics 2006; 22(24): 2996-3002. doi:10.1093/bioinformatics/btl537
Nielsen M.M., Tataru P., Madsen T., Hobolth A., Pedersen J.S. Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments. Algorithms Mol. Biol. 2018; 13(1): 17. doi:10.1186/s13015-018-0135-2
Mercier E., Droit A., Li L., Robertson G., Zhang X., Gottardo R. An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq. PLoS One 2011; 6(2): e16432. doi:10.1371/journal.pone.0016432
Udayakumar M., Vaidhyanathan M., Sadhana R., Sai M. RSMD-repeat searcher and motif detector. J. Biomed. Res. 2014; 28(5): 416-422. doi:10.7555/JBR.28.20130065
Chakravarty A., Carlson J.M., Khetani R.S., Gross R.H. A novel ensemble learning method for de novo computational identification of DNA binding sites. BMC Bioinformatics 2007; 8(1): 249. doi:10.1186/1471-2105-8-249
Mahony S., Hendrix D., Golden A., Smith T.J., Rokhsar D.S. Transcription factor binding site identification using the self-organizing map. Bioinformatics 2005; 21(9): 1807-1814. doi:10.1093/bioinformatics/bti256
Schones D.E., Smith A.D., Zhang M.Q. Statistical significance of cis-regulatory modules. BMC Bioinformatics 2007; 8(1): 19. doi:10.1186/1471-2105-8-19
Romer K.A., Kayombya G.R., Fraenkel E. WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches. Nucleic Acids Res. 2007; 35: W217-W220. doi:10.1093/nar/gkm376
Sun H., Yuan Y., Wu Y., Liu H., Liu J.S., Xie H. Tmod: toolbox of motif discovery. Bioinformatics 2010; 26(3): 405-407. doi:10.1093/bioinformatics/btp681

Curr. Top. Med. Chem. 25, 705 (2025); https://doi.org/10.2174/0115680266331305241113172257

Supplements

PRISMA checklist is available as supplementary material on the publisher’s website along with the published article.

Article Type: Review Article

Keyword(s): Computational tools/algorithms; Databases; DNA repeats; Motifs; Nucleotide; Web of Science (WoS)
