Web Scraping Scientific Repositories: Springer and Nature for University of Basrah

Authors

  • Zahraa Taufeeq Al-Madhhachi Computer Science, Iraqi Commission for Computer & Informatics, University of Information Technology and Communication, Iraq
  • Salma A. Mahmood College of Computer Science and Information Technology, University of Basrah, Iraq

DOI:

https://doi.org/10.29304/jqcsm.2024.16.11430

Keywords:

Springer and Nature, University of Basrah, Web Scraping, Data Extraction

Abstract

This study examines scientific data extraction using web scraping techniques, focusing on the Springer and Nature repositories in the context of the University of Basrah. It explains the theoretical underpinnings of web scraping and emphasizes its importance for acquiring structured data from online sources. It also examines the challenges posed by dynamic content, CAPTCHAs, and IP blocking, and proposes novel solutions for each of these obstacles. A rich dataset supporting the university's research objectives was constructed through a careful process of data collection and preprocessing. The results highlight the effectiveness of web scraping and the significant influence of preprocessing on the quality of the extracted data. The study contributes to academic research methodology and supports the University of Basrah's pursuit of data-driven, influential scholarship.
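
The collection and preprocessing stages mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the HTML markup, the class names (`result`, `title`, `year`), the sample records, and the helper names `ResultParser`, `preprocess`, and `scrape` are all hypothetical, and the real Springer and Nature pages are structured differently. The sketch parses a static HTML fragment with the standard library, normalizes whitespace in titles, and removes duplicate records by DOI.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup and placeholder records, for illustration only.
SAMPLE_HTML = """
<ul>
  <li class="result"><a class="title" href="https://doi.org/10.1000/example-1">  Example   Article
      One </a><span class="year">2021</span></li>
  <li class="result"><a class="title" href="https://doi.org/10.1000/example-1">Example Article One</a><span class="year">2021</span></li>
  <li class="result"><a class="title" href="https://doi.org/10.1000/example-2">Example Article Two</a><span class="year">2019</span></li>
</ul>
"""


class ResultParser(HTMLParser):
    """Collects one raw record per <li class="result"> element."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current = None  # record being built, or None outside a result
        self._field = None    # field that incoming text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class") or ""
        if tag == "li" and cls == "result":
            self._current = {"title": "", "doi_url": "", "year": ""}
        elif self._current is not None and tag == "a" and cls == "title":
            self._current["doi_url"] = attrs.get("href") or ""
            self._field = "title"
        elif self._current is not None and tag == "span" and cls == "year":
            self._field = "year"

    def handle_data(self, data):
        if self._current is not None and self._field:
            self._current[self._field] += data

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "li" and self._current is not None:
            self.records.append(self._current)
            self._current = None


def preprocess(raw_records):
    """Normalize whitespace, strip the DOI resolver prefix, drop duplicates."""
    seen = set()
    cleaned = []
    for rec in raw_records:
        title = re.sub(r"\s+", " ", rec["title"]).strip()
        doi = re.sub(r"^https?://doi\.org/", "", rec["doi_url"].strip().lower())
        year = rec["year"].strip()
        if not doi or doi in seen:
            continue  # skip records without a DOI and repeated DOIs
        seen.add(doi)
        cleaned.append(
            {"title": title, "doi": doi, "year": int(year) if year.isdigit() else None}
        )
    return cleaned


def scrape(html):
    """Parse an HTML fragment and return the cleaned records."""
    parser = ResultParser()
    parser.feed(html)
    return preprocess(parser.records)


records = scrape(SAMPLE_HTML)
```

Working on a static fragment keeps the sketch independent of network access. Handling dynamic content, CAPTCHAs, or IP blocking would require additional tooling that is outside the scope of this illustration.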

Published

2023-03-30

How to Cite

Taufeeq Al-Madhhachi, Z., & A. Mahmood, S. (2023). Web Scraping Scientific Repositories: Springer and Nature for University of Basrah. Journal of Al-Qadisiyah for Computer Science and Mathematics, 16(1), Comp. 1–8. https://doi.org/10.29304/jqcsm.2024.16.11430

Section

Computer Articles