Oil and Gas Production Forecasting Using Decision Trees, Random Forest, and XGBoost


  • Mays A. Al shabaan College of Computer Science & Information Technology, University of Basrah, Iraq
  • Zainab N. Nemer College of Computer Science & Information Technology, University of Basrah, Iraq




Machine Learning (ML), Oil and Gas Production, Random Forest (RFR), Decision Tree (DTR)


Oil and gas production forecasting has long been an active research topic in the petroleum industry. Production forecasting aims to estimate future production rates, which supports operational planning, production optimization, and resource allocation. Traditionally, production has been forecast with methods such as Numerical Reservoir Simulation (NRS) and Decline Curve Analysis (DCA). However, these methods have drawbacks: runs can take hours or even days, their accuracy is uncertain, they rely on accurate static models, and the dynamic model parameters are themselves uncertain. In this research, we aim to address these limitations by using machine learning models for production forecasting. These models support faster and more precise decision-making by accurately predicting future outcomes from historical data. Our study employs three models: Decision Trees (DTR), Random Forest (RFR), and XGBoost. We implement the models in the Python programming language and use a dataset from wells in New York State, USA. Experimental results show that the RFR model achieves the highest accuracy (99%) for oil and gas production, compared with the XGBoost and DTR models.
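The three-model comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic (not the New York State dataset), the feature names `months` and `active_wells` are hypothetical, and scikit-learn's `GradientBoostingRegressor` is swapped in as a stand-in for XGBoost so the example depends only on NumPy and scikit-learn. R² is used as one common regression score; the abstract does not state which metric underlies the reported accuracy.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for well records (assumption: NOT the paper's dataset).
rng = np.random.default_rng(0)
n = 500
months = rng.uniform(0, 120, n)        # hypothetical feature: months on production
active_wells = rng.integers(1, 20, n)  # hypothetical feature: number of active wells
# Hypothetical target: exponential decline per well plus measurement noise.
production = active_wells * 100.0 * np.exp(-0.02 * months) + rng.normal(0, 5, n)

X = np.column_stack([months, active_wells])
X_train, X_test, y_train, y_test = train_test_split(
    X, production, test_size=0.2, random_state=0
)

# Three tree-based regressors; the boosted model is a stand-in for XGBoost.
models = {
    "DTR": DecisionTreeRegressor(random_state=0),
    "RFR": RandomForestRegressor(n_estimators=100, random_state=0),
    "GBR (stand-in for XGBoost)": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

If the `xgboost` package is installed, its `XGBRegressor` could be placed in the `models` dictionary instead of the stand-in. Which model ranks first on synthetic data like this says nothing about the ranking reported in the paper.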








How to Cite

A. Al shabaan, M., & N. Nemer, Z. (2024). Oil and Gas Production Forecasting Using Decision Trees, Random Forst, and XGBoost. Journal of Al-Qadisiyah for Computer Science and Mathematics, 16(1), Comp. 9–20. https://doi.org/10.29304/jqcsm.2024.16.11431



Computer Articles