A Review of Planted Motif Problem Types

Authors

  • Wajih Abdul Ghani Abdul Hussain Mustansiriyah University, College of Science, Department of Computer Science, Baghdad, Iraq,
  • Hussein Keitan Al-Khafaji Al-Rafidain University College, Department of Computer Communication Engineering, Baghdad, Iraq
  • Thekra Abbas Mustansiriyah University, College of Science, Department of Computer Science, Baghdad, Iraq

DOI:

https://doi.org/10.29304/jqcsm.2025.17.11961

Keywords:

Bioinformatics, motif mining, planted motif mining, deep learning

Abstract

The field of bioinformatics faces significant challenges in the extraction of meaningful knowledge from the vast and complex biological data, particularly in gene regulation and motif identification. This research delves into the intricate problem of planted motif mining (PMP), an NP-complete challenge critical for numerous applications, including disease diagnostics, forensic medicine, and environmental monitoring. Various algorithms have been developed to address this problem, ranging from deterministic polynomial-time algorithms to advanced deep learning methods. This study provides a comprehensive review of these methodologies, highlighting their efficiencies, limitations, and practical applications. The paper discusses profile-based and pattern-based algorithms, emphasizing the importance of approximation algorithms for handling extensive biological data. Additionally, the role of deep learning in motif mining is explored, showcasing the advancements brought by CNN-based, RNN-based, and hybrid models. The integration of these innovative techniques holds promise for improving the accuracy and speed of motif identification, ultimately enhancing our understanding of genetic regulation and its implications in health and disease.

Downloads

Download data is not yet available.

References

H. Leung and F. Chin, “Algorithms for Challenging motif problems”, JBCB, 43—58, (2005).

Pankaj Agarwal, Sanjeev K. Yadav & A.P. Shukla, “Solving Planted-Motif Problem – A New Approach”, Department of Computer Science & Engineering, Krishna Institute of Engineering & Technology, Ghaziabad, U.P,15th International Conference on Advanced Computing and Communications, (2007).

Ali Basim Yousif. “Simple and Structured Motif Mining in DNA, RNA, and Protein Data Using Knowledge Discovery Techniques“. Doctoral dissertation at University of Al-Mustansiriya, (2023).

Atheel Sabih Shaker,Saadaldeen Rashid Ahmed, “Information Retrieval for Cancer Cell Detection Based on Advanced Machine Learning Techniques”, Al-Mustansiriyah Journal of Science, Volume 33, Issue 3, (2022).

Stormo, G, “DNA binding sites: representation and discovery”. Bioinformatics, 16:16{23, (2000).

Du, Y.H.,Wang, Z.Z.: “Review on computational prediction of transcription factor blinding sites”. Life Sci. Res. 10(2), 24–31 (2006)

Li, T.T., Jiang, B.,Wang, X.W.: “Tutorial for computational analysis of transcription factor binding sites”. ActaBiophys. Sin. 24(5), 334– 347 (2008)

Brazma, A., Jonassen, I., Eidhammer, I., Gilbert, D.: “Approaches to the automatic discovery of patterns in biosequences”. J. Comput.Biol. 5, 279–305 (1998)

Bussemaker, H.J., Li, H., Siggia, E.D.: “Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis”. Proc. Natl. Acad. Sci. USA 97(18), 10096–10100 (2000).

Sinha, S., Tompa, M.: “Discovery of novel transcription factor binding sites by statistical overrepresentation”. Nucleic Acids Res.30(24), 5549–5560 (2002)

Sinha, S., Tompa, M.: “YMF: a program for discovery of novel transcription factor binding sites by statistical over representation”. Nucleic Acids Res. 31(13), 3586–3588 (2003)

Jaime Davila, SudhaBalla, and Sanguthevar Rajasekaran,“Space and Time Efficient Algorithms for Planted Motif Search“,V.N. Alexandrov et al. (Eds.): ICCS 2006, Part II, LNCS 3992, pp. 822–829, 2006._c Springer-Verlag Berlin Heidelberg (2006).

Sanguthevar Rajasekaran, Sudipta Pathak, “An experimental comparison of PMSprune and other algorithms for motif search“,Int. J. Bioinformatics Research and Applications, Vol. 10, No. 6, (2014).

P.P. Kuksa, and V. Pavlovic, “Efficient Motif Finding Algorithms for Large-Alphabet Inputs”, BMC Bioinformatics,http://www.biomedcentral.com/1471-2105/11/S8/S1, vol. 11, no. 8, article S1, (2010).

S. Rajasekaran, and H. Dinh, “A Speedup Technique for (l, d)Motif Finding Algorithms,” BMC Research Notes, https://doi.org/10.1186/1756-0500-4-54, vol. 4, no. 54, (2011).

H. Dinh, S. Rajasekaran, V.K. Kundeti, "PMS5: an efficient exact algorithm for the (l, d)-motif finding problem", BMC Bioinformatics, doi: 10.1186/1471-2105-12-410. vol. 12, no. 410, (2011).

S. Bandyopadhyay, S. Sahni, S. Rajasekaran, "PMS6: A faster algorithm for motif discovery", Proceedings of the second IEEE International Conference on Computational Advances in Bio and Medical Sciences, vol. 10, no. 4-5, pp 369-383, (2014).

Q. Yu, H. Huo, Y. Zhang and H. Guo, “Pair Motif: A New Pattern- Driven Algorithm for Planted (l, d) DNA Motif Search”, PLoS One, https://doi.org/10.1371/ journal.pone.0048442, vol. 7, no. 10:e48442, (2012).

Z. Wei, and S.T. Jensen, "GAME: detecting cis-regulatory elements using a genetic algorithm", Bioinformatics, vol. 22, no. 13, pp. 1577- 1584, (2006).

M. Kaya, "MOGAMOD: Multi-objective genetic algorithm for motif discovery", Expert Systems with Applications, vol. 36, no. 2, pp. 1039-1047, (2009).

H. Huo, Z. Zhao,V. Stojkovic and L. Liu, "Optimizing genetic algorithm for motif discovery". Mathematical and Computer Modeling, vol. 52, no. 11, pp. 2011-2020, (2010).

Peng Xiao, Soumitra Pal, Sanguthevar Rajasekaran, “qPMS10: A Randomized Algorithm for Efficiently Solving Quorum Planted Motif Search Problem”, IEEE International Conference on Bioinformatics and Biomedicine (BIBM), USA, (2016).

CaiyanJia, Ruqian Lu, Lusheng Chen, “A Frequent Pattern Mining Method for Finding Planted Motifs of Unknown Length in DNA Sequences”, International Journal of Computational Intelligence Systems, Vol. 4, No. 5, 1032-1041, China, September, (2011).

Yongqiang Zhang, Mohammed J. Zaki, “exMotif: Efficient Structured Motif Extraction”, Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY 12180, USA.

Majed Sahli, Essam Mansour, PanosKalnis, “ACME: A scalable parallel system for extracting frequent patterns from a very long sequence “, The VLDB Journal 23:871–893, © Springer-Verlag Berlin Heidelberg (2014).

Zhang Y, Wang P, Yan M. “An entropy-based position projection algorithm for motif discovery”. BioMed Res Int 2016; (2016).

11. Sharov AA, Ko SH. “Exhaustive search for over-re-presented DNA sequence motifs with CisFinder”. DNA Res (2009);16(5):261-273.

Kilpatrick AM, Ward B, Aitken S. “Stochastic EM-based TFBS motif discovery with MITSU”. Bioinformatics (2014); 30(12):i310–i318.

Siebert M, Söding J. “Bayesian Markov models consis-tently outperform PWMs at predicting motifs in nucleo-tide sequences”. Nucleic Acids Res (2016); 44(13):6055-6069.

Latchman, D.S.: “Transcription Factors: A Practical Approach”. Oxford University Press, Oxford (1993).

Wu, B., et al.: “Identify target genes involved in transcription factor GCF2 that promotes cell migration in tumor cell BEL”-7404.Genomics Appl. Biol. 34(1), 35–40 .(2015).

Haruka, O., Wataru, I.: MOCCS: “Clarifying DNA-binding motif ambiguity using ChIP-Seq data”. Comput. Biol. Chem. 63, 62–72. (2016).

Tamura,K., Peterson, D., Peterson, N., Stecher,G., Nei, M.,Kumar, S.: “MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods”. Mol. Biol. Evol. 28, 2731–2739 (2011)

Xiaochun Sheng, Kefeng Wang, “Motif identification method based on Gibbs sampling and genetic algorithm“,ClusterComput DOI 10.1007/s10586-016-0699-x,© Springer Science+Business Media New York (2016).

Lounnas Bilal, Bouderah Brahim, Moussaoui Abdelouahab, “Biological Motif Discovery Algorithm based on Mining Tree Structure “, International Journal of Computer Applications (0975 – 8887) Volume 69– No.4, May (2013).

QIANG YU AND XIAO ZHANG, “A New Efficient Algorithm for Quorum Planted Motif Search on Large DNA Datasets”, IEEE Access, Digital Object Identifier 10.1109/ACCESS.(2019). 2940115

Qiang Yu, HongweiHuo*, Xiaoyang Chen, HaitaoGuo, Jeffrey Scott Vitter and Jun Huan, “An Efficient Motif Finding Algorithm for Large DNA Data Sets“, IEEE International Conference on Bioinformatics and Biomedicine (2014).

Makolo AU and Lamidi UA, “Motif Discovery in DNA Sequences Using an Improved Gibbs (i Gibbs) Sampling Algorithm“, Journal of Computer Science & Systems Biology, Nigeria (2018).

Zainab Muhammed Jameel, ”DNA Motif Mining Using Finite State Automata”, Iraqi Commission for Computers and Informatics Institute for Postgraduate Studies, pp.4,30-31, (2017).

Hasnaa Imad Al-Shaikhli,“Approximate Algorithms for Regulatory Motif Discovery in DNA“, Doctoral Dissertation Western Michigan University, USA, (2019).

Fan Y, Wu W, Liu R, Yang W. “An iterative algorithm for motif discovery”. Procedia Computer Science (2013); 24:25-29.

Wang X, Song T, Wang Z, Su Y, Liu X. MRPGA: “motif detecting by modified random projection strategy and genetic algorithm”. J Computational Theoretical Nano-science (2013); 10(5):1209-1214.

Zhou J, Troyanskaya OG, “Predicting effects of noncoding variants with deep learning–based sequence model”. Nat Methods (2015); 12:931–4.

Salekin S, Zhang JM, Huang Y. “A deep learning model for predicting transcription factor binding location at single nucleotide resolution”. In: 2017 IEEE EMBS International Conference on Biomedical & Health Informatics. (2017), pp. 57-60.

Gupta A, Rush AM. Dilated convolutions for modeling long distance genomic dependencies. arXiv:1710.01278. (2017).

Quang D, XieX.DanQ: “a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences”. Nucleic Acids Res (2016); 44:e107–7.

Shen Z, Bao W, Huang D-S. “Recurrent neural network for predicting transcription factor binding sites”. Sci Rep (2018); 8:1–10.

Pan X, Rijnbeek P, Yan J, et al. “Prediction of RNA protein sequence and structure binding preferences using deep convolutional and recurrent neural networks”. BMC Genomics (2018); 19:511.

Downloads

Published

2025-03-30

How to Cite

Abdul Hussain, W. A. G., Al-Khafaji, H. K., & Abbas, T. (2025). A Review of Planted Motif Problem Types. Journal of Al-Qadisiyah for Computer Science and Mathematics, 17(1), Comp. 41–51. https://doi.org/10.29304/jqcsm.2025.17.11961

Issue

Section

Computer Articles