Web Scraping Scientific Repositories: Springer and Nature for University of Basrah
DOI: https://doi.org/10.29304/jqcsm.2024.16.11430
Keywords: Springer and Nature, University of Basrah, Web Scraping, data extraction
Abstract
This study explores scientific data extraction using web scraping techniques, with a specific focus on the Springer and Nature archives in the context of the University of Basrah. It aims to explicate the theoretical underpinnings of web scraping, emphasizing its importance in acquiring structured data from online sources. It also examines the issues presented by dynamic content, CAPTCHAs, and IP blocking, and proposes novel solutions for each of these obstacles. The university's research objectives were supported by a rich dataset, carefully constructed through an approach encompassing data collection and preparation techniques. The results highlight the effectiveness of web scraping and the significant influence of preprocessing. This study not only enhances the existing body of academic research methodology but also advances the University of Basrah's pursuit of data-driven and influential scholarship.
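The abstract names two stages, data collection and preprocessing, without showing either. The sketch below is one way such a pipeline might look and is not the authors' implementation. It assumes that article pages expose Highwire-style `citation_*` meta tags, which should be verified against the actual pages being scraped. The names `CitationMetaParser`, `extract_record`, `deduplicate`, and `SAMPLE_PAGE` are hypothetical, and the sample DOI is a placeholder. No network request is made; fetching, rate limiting, and handling of dynamic content, CAPTCHAs, and IP blocking are left out.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for a downloaded article page (placeholder values).
SAMPLE_PAGE = """
<html><head>
<meta name="citation_title" content="  An  Example   Article ">
<meta name="citation_doi" content="10.1000/EXAMPLE.1">
<meta name="citation_author" content="A. Author">
<meta name="citation_author" content="B. Author">
</head><body></body></html>
"""


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags from one page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Assumption: metadata lives in citation_* meta tags.
        if name.startswith("citation_") and content:
            self.fields.setdefault(name, []).append(content)


def extract_record(html):
    """Collection step: turn one page of HTML into a structured record."""
    parser = CitationMetaParser()
    parser.feed(html)
    f = parser.fields
    return {
        # Preprocessing: collapse runs of whitespace in the title.
        "title": " ".join(f.get("citation_title", [""])[0].split()),
        # Preprocessing: lowercase the DOI so duplicates compare equal.
        "doi": f.get("citation_doi", [""])[0].strip().lower(),
        "authors": [a.strip() for a in f.get("citation_author", [])],
    }


def deduplicate(records):
    """Keep the first record per DOI; records without a DOI are dropped."""
    seen = set()
    unique = []
    for r in records:
        if r["doi"] and r["doi"] not in seen:
            seen.add(r["doi"])
            unique.append(r)
    return unique


record = extract_record(SAMPLE_PAGE)
dataset = deduplicate([record, dict(record)])
```

Using only the standard library keeps the sketch self-contained; a real crawler would also need to respect each publisher's terms of use and robots.txt, or use an official API where one is offered.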
License
Copyright (c) 2024 Zahraa Taufeeq Al-Madhhachi, Salma A. Mahmood
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.