The Computational Tools to Identify DNA Repeats and Motifs: A Systematic Review

Abstract

Introduction

DNA repeats and motifs are specific nucleotide patterns that occur frequently in the genomes of prokaryotes and eukaryotes. Computational identification of these discrete patterns is of considerable importance because they are associated with gene regulation, genomic instability, and genetic diversity, and can result in a variety of diseases and disorders.
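
As a purely illustrative sketch of what "computational identification" of such a pattern can mean, the following Python snippet finds perfect tandem repeats (for example, microsatellites) with a regular expression. It is not the method of any tool covered by this review; the function name `find_tandem_repeats` and its default thresholds are assumptions chosen for the example, and real tools also handle imperfect repeats, scoring, and genome-scale input.

```python
import re

def find_tandem_repeats(seq, min_unit=1, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for perfect tandem repeats in seq.

    Hypothetical helper for illustration only: it reports exact,
    back-to-back copies of a short unit and ignores mismatches or indels.
    """
    seq = seq.upper()
    # Lazy quantifier: prefer the shortest unit that repeats often enough.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

if __name__ == "__main__":
    for start, unit, copies in find_tandem_repeats("TTACACACACGG"):
        print(start, unit, copies)
```

The lazy unit match is a deliberate simplification: it reports the shortest repeating unit at the first position where a repeat is found, so the reported phase of the unit (for example, `GCA` versus `CAG`) depends on where the scan starts.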

Objective

In this article, the many computational tools/algorithms and databases (~200 distinct resources) used for the detection of DNA repeats and motifs are listed. The article not only provides guidance to users regarding the accuracy, reliability, and popularity (reflected by the citation index) of currently available tools but also enables them to select the best tool(s) for a desired task.

Methods

The structured literature review, with its dependable and reproducible research process, allowed us to acquire 200 peer-reviewed publications from indexing databases, such as Scopus, ScienceDirect, Web of Science (WoS), PubMed, and EMBASE, following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. Numerous keyword combinations regarding DNA repeats and motifs were used to create the query syntax.
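
The idea of combining keywords into a query syntax can be sketched as follows. This is a minimal illustration, not the query actually used for the review: the keyword groups and the helper name `build_query` are hypothetical, and each indexing database has its own field tags and syntax that would need to be added.

```python
def build_query(keyword_groups):
    """Join synonyms with OR inside a group and join groups with AND.

    Hypothetical helper: the keywords passed in are placeholders, not the
    terms used by the authors.
    """
    clauses = []
    for group in keyword_groups:
        quoted = ['"%s"' % term for term in group]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)

if __name__ == "__main__":
    # Assumed example keywords for illustration only.
    groups = [
        ["DNA repeat", "tandem repeat"],
        ["tool", "algorithm"],
    ]
    print(build_query(groups))
```

Grouping synonyms with OR and intersecting the groups with AND is a common way to keep such searches reproducible, since the same keyword lists can be reused across databases.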

Results

Initially, 3,233 research publications were retrieved, of which 200 satisfied the eligibility criteria for the detection and identification of DNA repeats and motifs by computational tools and were selected. Of these 200 publications, 99 dealt with repeat prediction tools, 12 with repetitive sequence databases, 19 with specialized regulatory element databases, and 69 with motif prediction tools.

Conclusion

This article lists numerous databases and computational tools/algorithms (~200 different resources) used in the identification of DNA repeats and motifs. In addition to offering guidance on the reliability and popularity (as indicated by the citation index) of currently available tools, it will help users choose the appropriate tool(s) for a particular task.

/content/journals/ctmc/10.2174/0115680266331305241113172257
2024-11-21
2024-12-26

References

  1. Lander E.S. Linton L.M. Birren B. Nusbaum C. Zody M.C. Baldwin J. Devon K. Dewar K. Doyle M. FitzHugh W. Funke R. Gage D. Harris K. Heaford A. Howland J. Kann L. Lehoczky J. LeVine R. McEwan P. McKernan K. Meldrim J. Mesirov J.P. Miranda C. Morris W. Naylor J. Raymond C. Rosetti M. Santos R. Sheridan A. Sougnez C. Stange-Thomann N. Stojanovic N. Subramanian A. Wyman D. Rogers J. Sulston J. Ainscough R. Beck S. Bentley D. Burton J. Clee C. Carter N. Coulson A. Deadman R. Deloukas P. Dunham A. Dunham I. Durbin R. French L. Grafham D. Gregory S. Hubbard T. Humphray S. Hunt A. Jones M. Lloyd C. McMurray A. Matthews L. Mercer S. Milne S. Mullikin J.C. Mungall A. Plumb R. Ross M. Shownkeen R. Sims S. Waterston R.H. Wilson R.K. Hillier L.D.W. McPherson J.D. Marra M.A. Mardis E.R. Fulton L.A. Chinwalla A.T. Pepin K.H. Gish W.R. Chissoe S.L. Wendl M.C. Delehaunty K.D. Miner T.L. Delehaunty A. Kramer J.B. Cook L.L. Fulton R.S. Johnson D.L. Minx P.J. Clifton S.W. Hawkins T. Branscomb E. Predki P. Richardson P. Wenning S. Slezak T. Doggett N. Cheng J-F. Olsen A. Lucas S. Elkin C. Uberbacher E. Frazier M. Gibbs R.A. Muzny D.M. Scherer S.E. Bouck J.B. Sodergren E.J. Worley K.C. Rives C.M. Gorrell J.H. Metzker M.L. Naylor S.L. Kucherlapati R.S. Nelson D.L. Weinstock G.M. Sakaki Y. Fujiyama A. Hattori M. Yada T. Toyoda A. Itoh T. Kawagoe C. Watanabe H. Totoki Y. Taylor T. Weissenbach J. Heilig R. Saurin W. Artiguenave F. Brottier P. Bruls T. Pelletier E. Robert C. Wincker P. Rosenthal A. Platzer M. Nyakatura G. Taudien S. Rump A. Smith D.R. Doucette-Stamm L. Rubenfield M. Weinstock K. Lee H.M. Dubois J.A. Yang H. Yu J. Wang J. Huang G. Gu J. Hood L. Rowen L. Madan A. Qin S. Davis R.W. Federspiel N.A. Abola A.P. Proctor M.J. Roe B.A. Chen F. Pan H. Ramser J. Lehrach H. Reinhardt R. McCombie W.R. de la Bastide M. Dedhia N. Blöcker H. Hornischer K. Nordsiek G. Agarwala R. Aravind L. Bailey J.A. Bateman A. Batzoglou S. Birney E. Bork P. Brown D.G. Burge C.B. Cerutti L. Chen H-C. 
Church D. Clamp M. Copley R.R. Doerks T. Eddy S.R. Eichler E.E. Furey T.S. Galagan J. Gilbert J.G.R. Harmon C. Hayashizaki Y. Haussler D. Hermjakob H. Hokamp K. Jang W. Johnson L.S. Jones T.A. Kasif S. Kaspryzk A. Kennedy S. Kent W.J. Kitts P. Koonin E.V. Korf I. Kulp D. Lancet D. Lowe T.M. McLysaght A. Mikkelsen T. Moran J.V. Mulder N. Pollara V.J. Ponting C.P. Schuler G. Schultz J. Slater G. Smit A.F.A. Stupka E. Szustakowki J. Thierry-Mieg D. Thierry-Mieg J. Wagner L. Wallis J. Wheeler R. Williams A. Wolf Y.I. Wolfe K.H. Yang S-P. Yeh R-F. Collins F. Guyer M.S. Peterson J. Felsenfeld A. Wetterstrand K.A. Myers R.M. Schmutz J. Dickson M. Grimwood J. Cox D.R. Olson M.V. Kaul R. Raymond C. Shimizu N. Kawasaki K. Minoshima S. Evans G.A. Athanasiou M. Schultz R. Patrinos A. Morgan M.J. Initial sequencing and analysis of the human genome. Nature 2001 409 6822 860 921 10.1038/35057062
    [Google Scholar]
  2. Sharma D. Issac B. Raghava G.P.S. Ramaswamy R. Spectral Repeat Finder (SRF): identification of repetitive sequences using Fourier transformation. Bioinformatics 2004 20 9 1405 1412 10.1093/bioinformatics/bth103
    [Google Scholar]
  3. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999 27 2 573 580 10.1093/nar/27.2.573
    [Google Scholar]
  4. Kolpakov R. Bana G. Kucherov G. mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res. 2003 31 13 3672 3678 10.1093/nar/gkg617
    [Google Scholar]
  5. Edgar R.C. PILER-CR: Fast and accurate identification of CRISPR repeats. BMC Bioinformatics 2007 8 1 18 10.1186/1471‑2105‑8‑18
    [Google Scholar]
  6. Gymrek M. Golan D. Rosset S. Erlich Y. lobSTR: A short tandem repeat profiler for personal genomes. Genome Res. 2012 22 6 1154 1162 10.1101/gr.135780.111
    [Google Scholar]
  7. Price A.L. Jones N.C. Pevzner P.A. De novo identification of repeat families in large genomes. Bioinformatics 2005 21 Suppl. 1 i351 i358 10.1093/bioinformatics/bti1018
    [Google Scholar]
  8. Kurtz S. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 2001 29 22 4633 4642 10.1093/nar/29.22.4633
    [Google Scholar]
  9. Novák P. Neumann P. Pech J. Steinhaisl J. Macas J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 2013 29 6 792 793 10.1093/bioinformatics/btt054
    [Google Scholar]
  10. Kofler R. Schlötterer C. Lelley T. SciRoKo: a new tool for whole genome microsatellite search and investigation. Bioinformatics 2007 23 13 1683 1685 10.1093/bioinformatics/btm157
    [Google Scholar]
  11. Beier S. Thiel T. Münch T. Scholz U. Mascher M. MISA-web: a web server for microsatellite prediction. Bioinformatics 2017 33 16 2583 2585 10.1093/bioinformatics/btx198
    [Google Scholar]
  12. Girgis H.Z. Sheetlin S.L. MsDetector: toward a standard computational tool for DNA microsatellites detection. Nucleic Acids Res. 2013 41 1 e22 e22 10.1093/nar/gks881
    [Google Scholar]
  13. Xu Z. Wang H. LTR_FINDER: An efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 2007 35 W265 W268
    [Google Scholar]
  14. Ellinghaus D. Kurtz S. Willhoeft U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 2008 9 1 18 10.1186/1471‑2105‑9‑18
    [Google Scholar]
  15. Kohany O. Gentles A.J. Hankus L. Jurka J. Annotation, submission and screening of repetitive elements in Repbase: RepbaseSubmitter and Censor. BMC Bioinformatics 2006 7 1 474 10.1186/1471‑2105‑7‑474
    [Google Scholar]
  16. Smit A.F.A. The origin of interspersed repeats in the human genome. Curr. Opin. Genet. Dev. 1996 6 6 743 748 10.1016/S0959‑437X(96)80030‑X
    [Google Scholar]
  17. Yin C. Identification of repeats in DNA sequences using nucleotide distribution uniformity. J. Theor. Biol. 2017 412 138 145 10.1016/j.jtbi.2016.10.013
    [Google Scholar]
  18. Shelenkov A. Korotkov E. LEPSCAN--a web server for searching latent periodicity in DNA sequences. Brief. Bioinform. 2012 13 2 143 149 10.1093/bib/bbr044
    [Google Scholar]
  19. Bao Z. Eddy S.R. Automated de novo identification of repeat sequence families in sequenced genomes. Genome Res. 2002 12 8 1269 1276 10.1101/gr.88502
    [Google Scholar]
  20. Hoen D.R. Hickey G. Bourque G. Casacuberta J. Cordaux R. Feschotte C. Fiston-Lavier A-S. Hua-Van A. Hubley R. Kapusta A. Lerat E. Maumus F. Pollock D.D. Quesneville H. Smit A. Wheeler T.J. Bureau T.E. Blanchette M. A call for benchmarking transposable element annotation methods. Mob. DNA 2015 6 1 13 10.1186/s13100‑015‑0044‑6
    [Google Scholar]
  21. Liao X. Li M. Hu K. Wu F-X. Gao X. Wang J. A sensitive repeat identification framework based on short and long reads. Nucleic Acids Res. 2021 49 17 e100 e100 10.1093/nar/gkab563
    [Google Scholar]
  22. Bao W. Kojima K.K. Kohany O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob. DNA 2015 6 1 11 10.1186/s13100‑015‑0041‑9
    [Google Scholar]
  23. Gelfand Y. Rodriguez A. Benson G. TRDB--The Tandem Repeats Database. Nucleic Acids Res. 2007 35 Database D80 D87 10.1093/nar/gkl1013
    [Google Scholar]
  24. Grissa I. Bouchon P. Pourcel C. Vergnaud G. On-line resources for bacterial micro-evolution studies using MLVA or CRISPR typing. Biochimie 2008 90 4 660 668 10.1016/j.biochi.2007.07.014
    [Google Scholar]
  25. Hubley R. Finn R.D. Clements J. Eddy S.R. Jones T.A. Bao W. Smit A.F.A. Wheeler T.J. The Dfam database of repetitive DNA families. Nucleic Acids Res. 2016 44 D1 D81 D89 10.1093/nar/gkv1272
    [Google Scholar]
  26. Yu J. Dossa K. Wang L. Zhang Y. Wei X. Liao B. Zhang X. PMDBase: a database for studying microsatellite DNA and marker development in plants. Nucleic Acids Res. 2017 45 D1 D1046 D1053 10.1093/nar/gkw906
    [Google Scholar]
  27. Robertson G. cisRED: a database system for genome-scale computational discovery of regulatory elements. Nucleic Acids Res. 2006 34 90001 D68 D73 10.1093/nar/gkj075
    [Google Scholar]
  28. Kel-Margoulis O.V. COMPEL: a database on composite regulatory elements providing combinatorial transcriptional regulation. Nucleic Acids Res. 2000 28 1 311 315 10.1093/nar/28.1.311
    [Google Scholar]
  29. Suzuki Y. DBTSS: DataBase of human Transcriptional Start Sites and full-length cDNAs. Nucleic Acids Res. 2002 30 1 328 331 10.1093/nar/30.1.328
    [Google Scholar]
  30. Karolchik D. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 2008 36 Database issue D773 D779
    [Google Scholar]
  31. Cavin Perier R. Junier T. Bucher P. The Eukaryotic Promoter Database EPD. Nucleic Acids Res. 1998 26 1 353 357 10.1093/nar/26.1.353
    [Google Scholar]
  32. Sandelin A. JASPAR: an open-access database for eukaryotic transcription factor binding profiles. Nucleic Acids Res. 2004 32 90001 Suppl. 1 91D 94 10.1093/nar/gkh012
    [Google Scholar]
  33. Ghosh D. OOTFD (Object-Oriented Transcription Factors Database): an object- oriented successor to TFD. Nucleic Acids Res. 1998 26 1 360 362 10.1093/nar/26.1.360
    [Google Scholar]
  34. Cipriano M.J. Novichkov P.N. Kazakov A.E. Rodionov D.A. Arkin A.P. Gelfand M.S. Dubchak I. RegTransBase – a database of regulatory sequences and interactions based on literature: a resource for investigating transcriptional regulation in prokaryotes. BMC Genomics 2013 14 1 213 10.1186/1471‑2164‑14‑213
    [Google Scholar]
  35. Pahl H.L. Activators and target genes of Rel/NF-κB transcription factors. Oncogene 1999 18 49 6853 6866 10.1038/sj.onc.1203239
    [Google Scholar]
  36. Matys V. TRANSFAC(R): transcriptional regulation, from patterns to profiles. Nucleic Acids Res. 2003 31 1 374 378 10.1093/nar/gkg108
    [Google Scholar]
  37. Zhao F. TRED: a Transcriptional Regulatory Element Database and a platform for in silico gene regulation studies. Nucleic Acids Res. 2004 33 Database issue D103 D107 10.1093/nar/gki004
    [Google Scholar]
  38. Kolchanov N.A. Transcription Regulatory Regions Database (TRRD): its status in 2002. Nucleic Acids Res. 2002 30 1 312 317 10.1093/nar/30.1.312
    [Google Scholar]
  39. Davuluri R.V. Sun H. Palaniswamy S.K. Matthews N. Molina C. Kurtz M. Grotewold E. AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics 2003 4 1 25 10.1186/1471‑2105‑4‑25
    [Google Scholar]
  40. Guo A. He K. Liu D. Bai S. Gu X. Wei L. Luo J. DATF: a database of Arabidopsis transcription factors. Bioinformatics 2005 21 10 2568 2569 10.1093/bioinformatics/bti334
    [Google Scholar]
  41. Gao G. Zhong Y. Guo A. Zhu Q. Tang W. Zheng W. Gu X. Wei L. Luo J. DRTF: a database of rice transcription factors. Bioinformatics 2006 22 10 1286 1287 10.1093/bioinformatics/btl107
    [Google Scholar]
  42. Sharma D. Mohanty D. Surolia A. RegAnalyst: a web interface for the analysis of regulatory motifs, networks and pathways. Nucleic Acids Res. 2009 37 W193 W201 10.1093/nar/gkp388
    [Google Scholar]
  43. Higo K. Ugawa Y. Iwamoto M. Korenaga T. Plant cis-acting regulatory DNA elements (PLACE) database: 1999. Nucleic Acids Res. 1999 27 1 297 300 10.1093/nar/27.1.297
    [Google Scholar]
  44. Jin J. Zhang H. Kong L. Gao G. Luo J. PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Res. 2014 42 D1 D1182 D1187 10.1093/nar/gkt1016
    [Google Scholar]
  45. Zhu J. Zhang M.Q. SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 1999 15 7 607 611 10.1093/bioinformatics/15.7.607
    [Google Scholar]
  46. Bailey T.L. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009 37 W202 W208 10.1093/nar/gkp335
    [Google Scholar]
  47. Bailey T.L. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 2011 27 12 1653 1659 10.1093/bioinformatics/btr261
    [Google Scholar]
  48. Liu X.S. Brutlag D.L. Liu J.S. An algorithm for finding protein–DNA binding sites with applications to chromatin- immunoprecipitation microarray experiments. Nat. Biotechnol. 2002 20 8 835 839 10.1038/nbt717
    [Google Scholar]
  49. Heinz S. Benner C. Spann N. Bertolino E. Lin Y.C. Laslo P. Cheng J.X. Murre C. Singh H. Glass C.K. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol. Cell 2010 38 4 576 589 10.1016/j.molcel.2010.05.004
    [Google Scholar]
  50. Liu X. Brutlag D.L. Liu J.S. BioProspector: discovering conserved DNA motifs in upstream regulatory regions of co-expressed genes. Pac. Symp. Biocomput. 2001 127 138
    [Google Scholar]
  51. Roth F.P. Hughes J.D. Estep P.W. Church G.M. Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nat. Biotechnol. 1998 16 10 939 945 10.1038/nbt1098‑939
    [Google Scholar]
  52. Davuluri R.V. Grosse I. Zhang M.Q. Computational identification of promoters and first exons in the human genome. Nat. Genet. 2001 29 4 412 417 10.1038/ng780
    [Google Scholar]
  53. Pavesi G. Weeder Web: discovery of transcription factor binding sites in a set of sequences from co-regulated genes. Nucleic Acids Res. 2004 32 W199 W203 10.1093/nar/gkh465
    [Google Scholar]
  54. Mahony S. Benos P.V. STAMP: A web tool for exploring DNA-binding motif similarities. Nucleic Acids Res. 2007 35 W253 W258
    [Google Scholar]
  55. Loots G.G. Ovcharenko I. rVISTA 2.0: evolutionary analysis of transcription factor binding sites. Nucleic Acids Res. ••• 32 W217 W221
    [Google Scholar]
  56. Sandelin A. Wasserman W.W. Lenhard B. ConSite: web-based prediction of regulatory elements using cross-species comparison. Nucleic Acids Res. 2004 32 W249 W252 10.1093/nar/gkh372
    [Google Scholar]
  57. Eskin E. Pevzner P.A. Finding composite regulatory patterns in DNA sequences. Bioinformatics 2002 18 Suppl. 1 S354 S363 10.1093/bioinformatics/18.suppl_1.S354
    [Google Scholar]
  58. Thomas-Chollier M. Herrmann C. Defrance M. Sand O. Thieffry D. van Helden J. RSAT peak-motifs: motif analysis in full-size ChIP-seq datasets. Nucleic Acids Res. 2012 40 4 e31 10.1093/nar/gkr1104
    [Google Scholar]
  59. Siddharthan R. Siggia E.D. van Nimwegen E. PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLOS Comput. Biol. 2005 1 7 e67 10.1371/journal.pcbi.0010067
    [Google Scholar]
  60. Linhart C. Halperin Y. Shamir R. Transcription factor and microRNA motif discovery: The Amadeus platform and a compendium of metazoan target sets. Genome Res. 2008 18 7 1180 1189 10.1101/gr.076117.108
    [Google Scholar]
  61. Wang T. Stormo G.D. Combining phylogenetic data with co-regulated genes to identify regulatory motifs. Bioinformatics 2003 19 18 2369 2380 10.1093/bioinformatics/btg329
    [Google Scholar]
  62. Sinha S. Tompa M. YMF: a program for discovery of novel transcription factor binding sites by statistical overrepresentation. Nucleic Acids Res. 2003 31 13 3586 3588 10.1093/nar/gkg618
    [Google Scholar]
  63. Sinha S. Blanchette M. Tompa M. Phy M.E. A probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinformatics 2004 5 1 170 10.1186/1471‑2105‑5‑170
    [Google Scholar]
  64. Blanchette M. Tompa M. FootPrinter: a program designed for phylogenetic footprinting. Nucleic Acids Res. 2003 31 13 3840 3842 10.1093/nar/gkg606
    [Google Scholar]
  65. Frith M.C. Finding functional sequence elements by multiple local alignment. Nucleic Acids Res. 2004 32 1 189 200 10.1093/nar/gkh169
    [Google Scholar]
  66. Frith M.C. Hansen U. Weng Z. Detection of cis -element clusters in higher eukaryotic DNA. Bioinformatics 2001 17 10 878 889 10.1093/bioinformatics/17.10.878
    [Google Scholar]
  67. Frith M.C. Li M.C. Weng Z. Cluster-Buster: finding dense clusters of motifs in DNA sequences. Nucleic Acids Res. 2003 31 13 3666 3668 10.1093/nar/gkg540
    [Google Scholar]
  68. Kulakovskiy I.V. Boeva V.A. Favorov A.V. Makeev V.J. Deep and wide digging for binding motifs in ChIP-Seq data. Bioinformatics 2010 26 20 2622 2623 10.1093/bioinformatics/btq488
    [Google Scholar]
  69. Ponger L. Mouchiroud D. CpGProD: identifying CpG islands associated with transcription start sites in large genomic mammalian sequences. Bioinformatics 2002 18 4 631 633 10.1093/bioinformatics/18.4.631
    [Google Scholar]
  70. Gotea V. Ovcharenko I. DiRE: identifying distant regulatory elements of co-expressed genes. Nucleic Acids Res. 2008 36 W133 W139 10.1093/nar/gkn300
    [Google Scholar]
  71. Georgiev S. Boyle A.P. Jayasurya K. Ding X. Mukherjee S. Ohler U. Evidence-ranked motif identification. Genome Biol. 2010 11 2 R19 10.1186/gb‑2010‑11‑2‑r19
    [Google Scholar]
  72. Taneda A. Adplot: detection and visualization of repetitive patterns in complete genomes. Bioinformatics 2004 20 5 701 708 10.1093/bioinformatics/btg470
    [Google Scholar]
  73. Li Y. Jiang N. Sun Y. AnnoSINE: a short interspersed nuclear elements annotation tool for plant genomes. Plant Physiol. 2022 188 2 955 970 10.1093/plphys/kiab524
    [Google Scholar]
  74. Wexler Y. Yakhini Z. Kashi Y. Geiger D. Finding approximate tandem repeats in genomic sequences. J. Comput. Biol. 2005 12 7 928 942 10.1089/cmb.2005.12.928
    [Google Scholar]
  75. Hoff K.J. Stanke M. Predicting genes in single genomes with AUGUSTUS. Curr. Protoc. Bioinformatics 2019 65 1 e57 10.1002/cpbi.57
    [Google Scholar]
  76. Jurka J. Klonowski P. Dagman V. Pelton P. Censor—a program for identification and elimination of repetitive elements from DNA sequences. Comput. Chem. 1996 20 1 119 121 10.1016/S0097‑8485(96)80013‑1
    [Google Scholar]
  77. Shi L. Chen H. Jiang M. Wang L. Wu X. Huang L. Liu C. CPGAVAS2, an integrated plastome sequence annotator and analyzer. Nucleic Acids Res. 2019 47 W1 W65 W73 10.1093/nar/gkz345
    [Google Scholar]
  78. Couvin D. Bernheim A. Toffano-Nioche C. Touchon M. Michalik J. Néron B. Rocha E.P.C. Vergnaud G. Gautheret D. Pourcel C. CRISPRCasFinder, an update of CRISRFinder, includes a portable version, enhanced performance and integrates search for Cas proteins. Nucleic Acids Res. 2018 46 W1 W246 W251 10.1093/nar/gky425
    [Google Scholar]
  79. Ye C. Ji G. Li L. Liang C. detectIR: a novel program for detecting perfect and imperfect inverted repeats using complex numbers and vector calculation. PLoS One 2014 9 11 e113349 10.1371/journal.pone.0113349
    [Google Scholar]
  80. Ye C. Ji G. Liang C. detectMITE: A novel approach to detect miniature inverted repeat transposable elements in genomes. Sci. Rep. 2016 6 1 19688 10.1038/srep19688
    [Google Scholar]
  81. Shi J. Liang C. Generic Repeat Finder: A High-Sensitivity Tool for Genome-Wide De Novo Repeat Detection. Plant Physiol. 2019 180 4 1803 1815 10.1104/pp.19.00386
    [Google Scholar]
  82. Mudunuri S.B. Nagarajaram H.A. IMEx: Imperfect Microsatellite Extractor. Bioinformatics 2007 23 10 1181 1187 10.1093/bioinformatics/btm097
    [Google Scholar]
  83. Wirawan A. INVERTER: INtegrated Variable numbER Tandem rEpeat findeR. Computational Systems-Biology and Bioinformatics Springer Berlin Heidelberg: Berlin, Heidelberg 2010 10.1007/978‑3‑642‑16750‑8_14
    [Google Scholar]
  84. Nicolas J. Peterlongo P. Tempel S. Finding and Characterizing Repeats in Plant Genomes. Methods Mol. Biol. 2016 1374 3 293 337 10.1007/978‑1‑4939‑3167‑5_17
    [Google Scholar]
  85. Ou S. Jiang N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. Plant Physiol. 2018 176 2 1410 1422 10.1104/pp.17.01310
    [Google Scholar]
  86. Chen G.L. Chang Y.J. Hsueh C.H. PRAP: an ab initio software package for automated genome-wide analysis of DNA repeats for prokaryotes. Bioinformatics 2013 29 21 2683 2689 10.1093/bioinformatics/btt482
    [Google Scholar]
  87. Bergman C.M. Quesneville H. Discovering and detecting transposable elements in genome sequences. Brief. Bioinform. 2007 8 6 382 392 10.1093/bib/bbm048
    [Google Scholar]
  88. Girgis H.Z. Red: an intelligent, rapid, accurate tool for detecting repeats de-novo on the genomic scale. BMC Bioinformatics 2015 16 1 227 10.1186/s12859‑015‑0654‑5
    [Google Scholar]
  89. Feschotte C. Keswani U. Ranganathan N. Guibotsy M.L. Levine D. Exploring repetitive DNA landscapes using REPCLASS, a tool that automates the classification of transposable elements in eukaryotic genomes. Genome Biol. Evol. 2009 1 205 220 10.1093/gbe/evp023
    [Google Scholar]
  90. Guo R. Li Y-R. He S. Ou-Yang L. Sun Y. Zhu Z. RepLong: de novo repeat identification using long read sequencing data. Bioinformatics 2018 34 7 1099 1107 10.1093/bioinformatics/btx717
    [Google Scholar]
  91. Kurtz S. Phillippy A. Delcher A.L. Smoot M. Shumway M. Antonescu C. Salzberg S.L. Versatile and open software for comparing large genomes. Genome Biol. 2004 5 2 R12 10.1186/gb‑2004‑5‑2‑r12
    [Google Scholar]
  92. Agarwal P. States D.J. The Repeat Pattern Toolkit (RPT): analyzing the structure and evolution of the C. elegans genome. Proc. Int. Conf. Intell. Syst. Mol. Biol. 1994 2 1 1 9
    [Google Scholar]
  93. Achaz G. Boyer F. Rocha E.P.C. Viari A. Coissac E. Repseek, a tool to retrieve approximate repeats from large DNA sequences. Bioinformatics 2007 23 1 119 121 10.1093/bioinformatics/btl519
    [Google Scholar]
  94. Mao H. Wang H. SINE_scan: an efficient tool to discover short interspersed nuclear elements (SINEs) in large-scale genomic datasets. Bioinformatics 2017 33 5 743 745 10.1093/bioinformatics/btw718
    [Google Scholar]
  95. Pokrzywa R. Polanski A. BWtrs: A tool for searching for tandem repeats in DNA sequences based on the Burrows–Wheeler transform. Genomics 2010 96 5 316 321 10.1016/j.ygeno.2010.08.001
    [Google Scholar]
  96. Delgrange O. Rivals E. STAR: an algorithm to Search for Tandem Approximate Repeats. Bioinformatics 2004 20 16 2812 2820 10.1093/bioinformatics/bth335
    [Google Scholar]
  97. Chiu R. Rajan-Babu I-S. Friedman J.M. Birol I. Straglr: discovering and genotyping tandem repeat expansions using whole genome long-read sequences. Genome Biol. 2021 22 1 224 10.1186/s13059‑021‑02447‑3
    [Google Scholar]
  98. Kurtz S. Narechania A. Stein J.C. Ware D. A new method to compute K-mer frequencies and its application to annotate large repetitive plant genomes. BMC Genomics 2008 9 1 517 10.1186/1471‑2164‑9‑517
    [Google Scholar]
  99. Boeva V. Regnier M. Papatsenko D. Makeev V. Short fuzzy tandem repeats in genomic sequences, identification, and possible role in regulation of gene expression. Bioinformatics 2006 22 6 676 684 10.1093/bioinformatics/btk032
    [Google Scholar]
  100. Wlodzimierz P. Hong M. Henderson I.R. TRASH: Tandem Repeat Annotation and Structural Hierarchy. Bioinformatics 2023 39 5 btad308 10.1093/bioinformatics/btad308
    [Google Scholar]
  101. Castelo A.T. Martins W. Gao G.R. TROLL--tandem repeat occurrence locator. Bioinformatics 2002 18 4 634 636 10.1093/bioinformatics/18.4.634
    [Google Scholar]
  102. Husi H. Skipworth R.J. Fearon K.C.H. Ross J.A. LSCluster, a large-scale sequence clustering and aligning software for use in partial identity mapping and splice-variant analysis. J. Proteomics 2013 84 185 189 10.1016/j.jprot.2013.04.006
    [Google Scholar]
  103. Kalyanaraman A. Aluru S. Efficient algorithms and software for detection of full-length LTR retrotransposons. J. Bioinform. Comput. Biol. 2006 4 2 197 216 10.1142/S021972000600203X
    [Google Scholar]
  104. Toth G. PLOTREP: a web tool for defragmentation and visual analysis of dispersed genomic repeats. Nucleic Acids Res. 2006 34 W708 W713 10.1093/nar/gkl263
    [Google Scholar]
  105. Sperber G. Lövgren A. Eriksson N-E. Benachenhou F. Blomberg J. RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences. BMC Bioinformatics 2009 10 S6 Suppl. 6 S4 10.1186/1471‑2105‑10‑S6‑S4
    [Google Scholar]
  106. Du C. Caronna J. He L. Dooner H.K. Computational prediction and molecular confirmation of Helitron transposons in the maize genome. BMC Genomics 2008 9 1 51 10.1186/1471‑2164‑9‑51
    [Google Scholar]
  107. Lucier J.F. RTAnalyzer: a web application for finding new retrotransposons and detecting L1 retrotransposition signatures. Nucleic Acids Res. 2007 35 W269 W274 10.1093/nar/gkm313
    [Google Scholar]
  108. Szak S.T. Pickeral O.K. Makalowski W. Boguski M.S. Landsman D. Boeke J.D. Molecular archeology of L1 insertions in the human genome. Genome Biol. 2002 3 10 research0052.1. 10.1186/gb‑2002‑3‑10‑research0052
    [Google Scholar]
  109. Tu Z. Eight novel families of miniature inverted repeat transposable elements in the African malaria mosquito, Anopheles gambiae. Proc. Natl. Acad. Sci. USA 2001 98 4 1699 1704 10.1073/pnas.98.4.1699
    [Google Scholar]
  110. Rho M. Choi J-H. Kim S. Lynch M. Tang H. De novo identification of LTR retrotransposons in eukaryotic genomes. BMC Genomics 2007 8 1 90 10.1186/1471‑2164‑8‑90
    [Google Scholar]
  111. Chen Y. Zhou F. Li G. Xu Y. MUST: A system for identification of miniature inverted-repeat transposable elements and applications to Anabaena variabilis and Haloquadratum walsbyi. Gene 2009 436 1-2 1 7 10.1016/j.gene.2009.01.019
    [Google Scholar]
  112. Pereira V. Automated paleontology of repetitive DNA with REANNOTATE. BMC Genomics 2008 9 1 614 10.1186/1471‑2164‑9‑614
    [Google Scholar]
  113. Liao X. Gao X. Zhang X. Wu F-X. Wang J. RepAHR: an improved approach for de novo repeat identification by assembly of the high-frequency reads. BMC Bioinformatics 2020 21 1 463 10.1186/s12859‑020‑03779‑w
    [Google Scholar]
  114. Novák P. Neumann P. Macas J. Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2. Nat. Protoc. 2020 15 11 3745 3776 10.1038/s41596‑020‑0400‑y
    [Google Scholar]
  115. Smith C.D. Edgar R.C. Yandell M.D. Smith D.R. Celniker S.E. Myers E.W. Karpen G.H. Improved repeat identification and masking in Dipterans. Gene 2007 389 1 1 9 10.1016/j.gene.2006.09.011
    [Google Scholar]
  116. Otto T.D. Gomes L.H.F. Alves-Ferreira M. de Miranda A.B. Degrave W.M. ReRep: Computational detection of repetitive sequences in genome survey sequences (GSS). BMC Bioinformatics 2008 9 1 366 10.1186/1471‑2105‑9‑366
    [Google Scholar]
  117. Naik P.K. Mittal V.K. Gupta S. RetroPred: A tool for prediction, classification and extraction of non-LTR retrotransposons (LINEs & SINEs) from the genome by integrating PALS, PILER, MEME and ANN. Bioinformation 2008 2 6 263 270 10.6026/97320630002263
    [Google Scholar]
  118. Dashnow H. Pedersen B.S. Hiatt L. Brown J. Beecroft S.J. Ravenscroft G. LaCroix A.J. Lamont P. Roxburgh R.H. Rodrigues M.J. Davis M. Mefford H.C. Laing N.G. Quinlan A.R. STRling: a k-mer counting approach that detects short tandem repeat expansions at known and novel loci. Genome Biol. 2022 23 1 257 10.1186/s13059‑022‑02826‑4
    [Google Scholar]
  119. Kronmiller B.A. Wise R.P. TEnest: automated chronological annotation and visualization of nested plant transposable elements. Plant Physiol. 2008 146 1 45 59 10.1104/pp.107.110353
    [Google Scholar]
  120. Giordano J. Ge Y. Gelfand Y. Abrusán G. Benson G. Warburton P.E. Evolutionary history of mammalian transposons determined by genome-wide defragmentation. PLOS Comput. Biol. 2007 3 7 e137 10.1371/journal.pcbi.0030137
    [Google Scholar]
  121. Lexa M. Jedlicka P. Vanat I. Cervenansky M. Kejnovsky E. TE-greedy-nester: structure-based detection of LTR retrotransposons and their nesting. Bioinformatics 2020 36 20 4991 4999 10.1093/bioinformatics/btaa632
    [Google Scholar]
  122. Kennedy R.C. Unger M.F. Christley S. Collins F.H. Madey G.R. An automated homology-based approach for identifying transposable elements. BMC Bioinformatics 2011 12 1 130 10.1186/1471‑2105‑12‑130
    [Google Scholar]
  123. Fiston-Lavier A.S. Carrigan M. Petrov D.A. González J. T-lex: a program for fast and accurate assessment of transposable element presence using next-generation sequencing data. Nucleic Acids Res. 2011 39 6 e36 10.1093/nar/gkq1291
    [Google Scholar]
  124. Jorda J. Kajava A.V. T-REKS: identification of Tandem REpeats in sequences with a K-meanS based algorithm. Bioinformatics 2009 25 20 2632 2638 10.1093/bioinformatics/btp482
    [Google Scholar]
  125. Paladin L. Bevilacqua M. Errigo S. Piovesan D. Mičetić I. Necci M. Monzon A.M. Fabre M.L. Lopez J.L. Nilsson J.F. Rios J. Menna P.L. Cabrera M. Buitron M.G. Kulik M.G. Fernandez-Alberti S. Fornasari M.S. Parisi G. Lagares A. Hirsh L. Andrade-Navarro M.A. Kajava A.V. Tosatto S.C.E. RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures. Nucleic Acids Res. 2021 49 D1 D452 D457 10.1093/nar/gkaa1097
    [Google Scholar]
  126. Neumann P. Novák P. Hoštáková N. Macas J. Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mob. DNA 2019 10 1 1 22 10.1186/s13100‑018‑0144‑1
    [Google Scholar]
  127. Mistry J. Chuguransky S. Williams L. Qureshi M. Salazar G.A. Sonnhammer E.L.L. Tosatto S.C.E. Paladin L. Raj S. Richardson L.J. Finn R.D. Bateman A. Pfam: The protein families database in 2021. Nucleic Acids Res. 2021 49 D1 D412 D419 10.1093/nar/gkaa913
    [Google Scholar]
  128. Jorng-Tzong Horng Lin, F.M.; Lin, J.H.; Huang, H.D.; Liu, B.J. Database of repetitive elements in complete genomes and data mining using transcription factor binding sites. IEEE Trans. Inf. Technol. Biomed. 2003 7 2 93 100 10.1109/TITB.2003.811878
  129. Boby T. Patch A.M. Aves S.J. TRbase: a database relating tandem repeats to disease genes for the human genome. Bioinformatics 2005 21 6 811 816 10.1093/bioinformatics/bti059
  130. Liao X. Hu K. Salhi A. Zou Y. Wang J. Gao X. msRepDB: a comprehensive repetitive sequence database of over 80 000 species. Nucleic Acids Res. 2022 50 D1 D236 D245 10.1093/nar/gkab1089
  131. Ghosh D. Object-oriented transcription factors database (ooTFD). Nucleic Acids Res. 2000 28 1 308 310 10.1093/nar/28.1.308
  132. Kazakov A.E. Cipriano M.J. Novichkov P.S. Minovitsky S. Vinogradov D.V. Arkin A. Mironov A.A. Gelfand M.S. Dubchak I. RegTransBase--a database of regulatory sequences and interactions in a wide range of prokaryotic genomes. Nucleic Acids Res. 2007 35 Database D407 D412 10.1093/nar/gkl865
  133. Shi J. Yang W. Chen M. Du Y. Zhang J. Wang K. AMD, an automated motif discovery tool using stepwise refinement of gapped consensuses. PLoS One 2011 6 9 e24576 10.1371/journal.pone.0024576
  134. Workman C.T. Stormo G.D. ANN-Spec: a method for discovering transcription factor binding sites with improved specificity. Pac. Symp. Biocomput. 2000 2000 467 478 10.1142/9789814447331_0044
  135. Steffens N.O. AthaMap: an online resource for in silico transcription factor binding sites in the Arabidopsis thaliana genome. Nucleic Acids Res. 2004 32 Database D368 D372 10.1093/nar/gkh017
  136. Che D. Jensen S. Cai L. Liu J.S. BEST: binding-site estimation suite of tools. Bioinformatics 2005 21 12 2909 2911 10.1093/bioinformatics/bti425
  137. Li G. Liu B. Ma Q. Xu Y. A new framework for identifying cis-regulatory motifs in prokaryotes. Nucleic Acids Res. 2011 39 7 e42 10.1093/nar/gkq948
  138. Triska M. Grocutt D. Southern J. Murphy D.J. Tatarinova T. cisExpress: motif detection in DNA sequences. Bioinformatics 2013 29 17 2203 2205 10.1093/bioinformatics/btt366
  139. Kuttippurathu L. Hsing M. Liu Y. Schmidt B. Maskell D.L. Lee K. He A. Pu W.T. Kong S.W. CompleteMOTIFs: DNA motif discovery platform for transcription factor binding experiments. Bioinformatics 2011 27 5 715 717 10.1093/bioinformatics/btq707
  140. Karanam S. Moreno C.S. CONFAC: automated application of comparative genomic promoter analysis to DNA microarray datasets. Nucleic Acids Res. 2004 32 W475 W484 10.1093/nar/gkh353
  141. Berezikov E. Guryev V. Cuppen E. CONREAL web server: identification and visualization of conserved transcription factor binding sites. Nucleic Acids Res. 2005 33 W447 W450 10.1093/nar/gki378
  142. Ma Q. DMINDA: an integrated web server for DNA motif identification and analyses. Nucleic Acids Res. 2014 42 W12 W19 10.1093/nar/gku315
  143. Liu F.F.M. FMGA: finding motifs by genetic algorithm. Proceedings. Fourth IEEE Symposium on Bioinformatics and Bioengineering Taichung, Taiwan 21 May 2004 459 466 10.1109/BIBE.2004.1317378
  144. Wang D. Xi L. GAPK: Genetic algorithms with prior knowledge for motif discovery in DNA sequences. 2009 IEEE Congress on Evolutionary Computation Trondheim 18-21 May 2009 277 284 10.1109/CEC.2009.4982959
  145. Newberg L.A. Thompson W.A. Conlan S. Smith T.M. McCue L.A. Lawrence C.E. A phylogenetic Gibbs sampler that yields centroid solutions for cis-regulatory site prediction. Bioinformatics 2007 23 14 1718 1727 10.1093/bioinformatics/btm241
  146. van Heeringen S.J. Veenstra G.J.C. GimmeMotifs: a de novo motif prediction pipeline for ChIP-sequencing experiments. Bioinformatics 2011 27 2 270 271 10.1093/bioinformatics/btq636
  147. Ao W. Gaudet J. Kent W.J. Muttumu S. Mango S.E. Environmentally induced foregut remodeling by PHA-4/FoxA and DAF-12/NHR. Science 2004 305 5691 1743 1746 10.1126/science.1102216
  148. Bailey T.L. Gribskov M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics 1998 14 1 48 54 10.1093/bioinformatics/14.1.48
  149. Wang D. Tapan S. MISCORE: a new scoring function for characterizing DNA regulatory motifs in promoter sequences. BMC Syst. Biol. 2012 6 Suppl. 2 S4 10.1186/1752-0509-6-S2-S4
  150. Belmadani M. MotifGP: DNA motif discovery using multiobjective evolution. Master thesis, University of Ottawa 2016
  151. Claeys M. Storms V. Sun H. Michoel T. Marchal K. MotifSuite: workflow for probabilistic motif detection and assessment. Bioinformatics 2012 28 14 1931 1932 10.1093/bioinformatics/bts293
  152. Fu Y. MotifViz: an analysis and visualization tool for motif discovery. Nucleic Acids Res. 2004 32 W420 W423 10.1093/nar/gkh426
  153. Mendes N.D. Casimiro A.C. Santos P.M. Sá-Correia I. Oliveira A.L. Freitas A.T. MUSA: a parameter free algorithm for the identification of biologically significant motifs. Bioinformatics 2006 22 24 2996 3002 10.1093/bioinformatics/btl537
  154. Nielsen M.M. Tataru P. Madsen T. Hobolth A. Pedersen J.S. Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments. Algorithms Mol. Biol. 2018 13 1 17 10.1186/s13015-018-0135-2
  155. Mercier E. Droit A. Li L. Robertson G. Zhang X. Gottardo R. An integrated pipeline for the genome-wide analysis of transcription factor binding sites from ChIP-Seq. PLoS One 2011 6 2 e16432 10.1371/journal.pone.0016432
  156. Udayakumar M. Vaidhyanathan M. Sadhana R. Sai M. RSMD-repeat searcher and motif detector. J. Biomed. Res. 2014 28 5 416 422 10.7555/JBR.28.20130065
  157. Chakravarty A. Carlson J.M. Khetani R.S. Gross R.H. A novel ensemble learning method for de novo computational identification of DNA binding sites. BMC Bioinformatics 2007 8 1 249 10.1186/1471-2105-8-249
  158. Mahony S. Hendrix D. Golden A. Smith T.J. Rokhsar D.S. Transcription factor binding site identification using the self-organizing map. Bioinformatics 2005 21 9 1807 1814 10.1093/bioinformatics/bti256
  159. Schones D.E. Smith A.D. Zhang M.Q. Statistical significance of cis-regulatory modules. BMC Bioinformatics 2007 8 1 19 10.1186/1471-2105-8-19
  160. Romer K.A. Kayombya G.R. Fraenkel E. WebMOTIFS: automated discovery, filtering and scoring of DNA sequence motifs using multiple programs and Bayesian approaches. Nucleic Acids Res. 2007 35 W217 W220 10.1093/nar/gkm376
  161. Sun H. Yuan Y. Wu Y. Liu H. Liu J.S. Xie H. Tmod: toolbox of motif discovery. Bioinformatics 2010 26 3 405 407 10.1093/bioinformatics/btp681
Article Type: Review Article
Keywords: Databases; Motifs; DNA repeats; Computational tools/algorithms