An Incremental Ensemble Diversification in Data Stream Classification using Improved Hoeffding Trees with Thompson Sampling

Authors

  • Ahmed Al-Shammari, Department of Computer Science, College of Computer Science and Information Technology, University of Al-Qadisiyah, Al Diwaniyah, 58002, Iraq

DOI:

https://doi.org/10.29304/jqcsm.2024.16.21569

Keywords:

Classification, Data Stream, Algorithms, Concept drift, Ensemble Diversification

Abstract

Data stream classification is a challenging task because of disruptive changes in the data distribution, known as concept drift. Ensemble diversification is a crucial method in data stream classification, offering improved adaptability, flexibility, and efficiency. Under concept drift, a more diverse ensemble of components is known to improve prediction accuracy. Existing works, however, show serious drawbacks in terms of accuracy and response time, which calls for an adaptive approach to selecting high-performing components. In this paper, we therefore propose an incremental ensemble diversification approach for data stream classification based on the combination of Improved Hoeffding Trees and Thompson Sampling (IHTTS). The approach first generates an initial set of classes for the data stream at timestamp t_n, then updates these classes when new data arrive at timestamp t_{n+1}, and finally combines module diversity with prediction accuracy. Results on real datasets verify the efficiency and effectiveness of the proposed IHTTS approach.
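
The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the IHTTS method itself, whose details are not given on this page. It shows only the standard Hoeffding bound that Hoeffding Trees use to decide when enough examples have been seen, and a generic Beta-Bernoulli Thompson Sampling rule for choosing which ensemble components to use. The names hoeffding_bound, ThompsonSelector, and simulate, and all parameter values, are assumptions made for illustration.

```python
import math
import random


def hoeffding_bound(value_range, delta, n):
    """Standard Hoeffding bound: with probability 1 - delta, the true mean
    of a variable with the given range lies within the returned epsilon of
    the mean observed over n samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


class ThompsonSelector:
    """Generic Beta-Bernoulli Thompson Sampling over ensemble components.

    Hypothetical helper for illustration; not the paper's selection rule.
    """

    def __init__(self, n_components, seed=0):
        # Beta(1, 1) is a uniform prior over each component's accuracy.
        self.successes = [1.0] * n_components
        self.failures = [1.0] * n_components
        self.rng = random.Random(seed)

    def select(self, k):
        """Sample one accuracy per component from its posterior and
        return the indices of the k largest samples."""
        samples = [
            self.rng.betavariate(a, b)
            for a, b in zip(self.successes, self.failures)
        ]
        ranked = sorted(range(len(samples)), key=lambda i: samples[i], reverse=True)
        return ranked[:k]

    def update(self, component, correct):
        """Record whether the chosen component predicted correctly."""
        if correct:
            self.successes[component] += 1.0
        else:
            self.failures[component] += 1.0


def simulate(accuracies, steps, seed=0):
    """Simulate components with fixed (assumed) accuracies and count how
    often Thompson Sampling selects each one."""
    rng = random.Random(seed + 1)
    selector = ThompsonSelector(len(accuracies), seed=seed)
    counts = [0] * len(accuracies)
    for _ in range(steps):
        chosen = selector.select(1)[0]
        counts[chosen] += 1
        selector.update(chosen, rng.random() < accuracies[chosen])
    return counts
```

Calling simulate([0.9, 0.6, 0.5], 2000) returns per-component selection counts. In this toy setting the most accurate component is selected most often, while the weaker ones are still tried occasionally, which is the exploration behaviour that makes Thompson Sampling a natural fit for adaptive component selection.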

Published

2024-06-30

How to Cite

Al-Shammari, A. (2024). An Incremental Ensemble Diversification in Data Stream Classification using Improved Hoeffding Trees with Thompson Sampling. Journal of Al-Qadisiyah for Computer Science and Mathematics, 16(2), Comp Page 187–194. https://doi.org/10.29304/jqcsm.2024.16.21569

Section

Computer Articles