Research Article | Peer-Reviewed

Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels

Received: 13 January 2026     Accepted: 5 February 2026     Published: 24 February 2026
Abstract

Well integrity is critical to safe and efficient oil and gas operations, yet casing failure is difficult to predict because few failure records exist and downhole conditions are complex. This study introduces an explainable machine learning (XML) framework that estimates casing failure risk in wells with limited failure labels. The framework draws on measurements of pressure, temperature, annular pressure, cement bond quality and downhole vibration, and combines semi-supervised learning with gradient-boosted trees to predict the likelihood of failure. SHAP (Shapley Additive Explanations) values provide feature-level explanations that show how each operational parameter contributes to a risk prediction. Experiments on synthetic and field-derived datasets show that the XML framework achieves an accuracy of 91%, a precision of 0.87, a recall of 0.89 and an F1-score of 0.88. Annular pressure differentials, poor cement bond integrity and high vibration were the critical predictors and were quantitatively associated with high failure risk, while risk scores tracked over time gave early warning of known failures 2 to 6 months before they were recorded. Compared with baseline models such as logistic regression and a standard random forest, the framework improves both predictive performance and interpretability. In environments with few failure labels, the technique bridges data-driven risk assessment and operational decision making by pairing strong predictions with actionable explanations, helping operators prioritize monitoring and preventive action and reduce non-productive time.
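The abstract does not say which semi-supervised scheme is combined with the gradient-boosted trees. One common option when labels are sparse is self-training (pseudo-labelling): fit on the labelled wells, assign labels to the unlabelled records the model is confident about, and refit. The sketch below assumes that scheme. To stay dependency-free it uses a hypothetical nearest-centroid classifier as a stand-in for gradient-boosted trees; the three normalized features, the `sharpness` parameter and the 0.9 confidence threshold are illustrative assumptions, not values from the study. In scikit-learn, `sklearn.semi_supervised.SelfTrainingClassifier` wrapping a gradient-boosting classifier would play the same role.

```python
import math

# Hypothetical normalized features per well record (0-1 scale):
# [annular pressure differential, cement bond index, vibration level]


def fit_centroids(X, y):
    """Stand-in base learner: mean feature vector per class (0 = intact, 1 = failed)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_proba(centroids, x, sharpness=10.0):
    """Failure probability from a logistic function of the squared-distance gap."""
    d0 = sum((a - b) ** 2 for a, b in zip(x, centroids[0]))
    d1 = sum((a - b) ** 2 for a, b in zip(x, centroids[1]))
    z = max(-60.0, min(60.0, sharpness * (d1 - d0)))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=5):
    """Self-training: repeatedly pseudo-label confident unlabelled records and refit."""
    X, y, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        model = fit_centroids(X, y)
        confident, remaining = [], []
        for x in pool:
            p = predict_proba(model, x)
            if p >= threshold or p <= 1 - threshold:
                confident.append((x, int(p >= 0.5)))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new is confident enough; stop early
        for x, label in confident:
            X.append(x)
            y.append(label)
        pool = remaining
    return fit_centroids(X, y), X, y


X_lab = [[0.1, 0.9, 0.1], [0.2, 0.8, 0.2], [0.9, 0.2, 0.8]]
y_lab = [0, 0, 1]
X_unlab = [[0.15, 0.85, 0.15], [0.85, 0.25, 0.9], [0.5, 0.5, 0.5]]

model, X_all, y_all = self_train(X_lab, y_lab, X_unlab)
print(y_all)  # [0, 0, 1, 0, 1]
```

Here the two clear-cut unlabelled records are absorbed as pseudo-labels, while the ambiguous `[0.5, 0.5, 0.5]` record never crosses the confidence threshold and stays out of training. The threshold is the main design lever: lowering it adds more pseudo-labels but also more label noise.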
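SHAP attributes a prediction to individual features using Shapley values from cooperative game theory: each feature receives its average marginal contribution over all coalitions of the other features. For tree ensembles the SHAP library computes these efficiently (TreeSHAP); the brute-force sketch below enumerates every coalition, which is only practical for a handful of features. It assumes a hypothetical toy risk function `toy_risk` (not the study's model) and, as a simplification of SHAP's averaging over background data, replaces features outside a coalition with a single baseline record.

```python
from itertools import combinations
from math import factorial

FEATURES = ["annular_dp", "cement_bond", "vibration"]  # hypothetical names


def toy_risk(x):
    """Hypothetical risk score: rises with annular dp and vibration, falls with bond quality."""
    dp, bond, vib = x
    return 0.5 * dp + 0.3 * (1 - bond) + 0.2 * vib + 0.4 * dp * vib


def shapley_values(f, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


well = [0.9, 0.2, 0.8]      # high dp, poor bond, high vibration
baseline = [0.2, 0.8, 0.2]  # a typical healthy well
phi = shapley_values(toy_risk, well, baseline)
for name, value in zip(FEATURES, phi):
    print(f"{name}: {value:+.3f}")
# Efficiency property: attributions sum to f(well) - f(baseline)
print(round(sum(phi), 3), round(toy_risk(well) - toy_risk(baseline), 3))
```

For this sample well the attributions are roughly 0.49 for the annular pressure differential, 0.18 for cement bond and 0.25 for vibration; the interaction term between pressure differential and vibration is split between those two features, and the three values add up to the full 0.922 rise in risk over the baseline.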
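The reported F1-score can be checked against the reported precision and recall, because F1 is their harmonic mean: 2 × 0.87 × 0.89 / (0.87 + 0.89) ≈ 0.880. The helper below computes the four reported metrics from confusion-matrix counts, treating failure as the positive class. The counts in the example are hypothetical; they only illustrate that when failures are rare, accuracy is driven largely by the intact-well majority class, which is why precision and recall are the more informative measures here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts (positive = failure)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


print(round(f1_from(0.87, 0.89), 2))  # 0.88, consistent with the reported F1-score

# Hypothetical counts: 100 failures among 1,000 records
m = classification_metrics(tp=89, fp=13, fn=11, tn=887)
print({k: round(v, 3) for k, v in m.items()})
```

With these made-up counts accuracy is about 0.98 while F1 is about 0.88, so a high accuracy figure on its own says little about how well rare failures are caught.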
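The early-warning result (alerts 2 to 6 months before recorded failures) implies a lead-time calculation over a time series of risk scores. The abstract does not state the alarm rule, so the sketch below assumes the simplest one: the lead time is the gap between the first month whose score reaches a fixed threshold and the recorded failure month. The function name, the 0.5 threshold and the monthly scores are hypothetical.

```python
def warning_lead_time(monthly_scores, failure_month, threshold=0.5):
    """Months of warning before a recorded failure, or None if no alarm preceded it.

    monthly_scores: (month_index, risk_score) pairs sorted by month.
    The alarm is the first month before the failure whose score reaches `threshold`.
    """
    for month, score in monthly_scores:
        if month >= failure_month:
            break  # scores at or after the failure do not count as warning
        if score >= threshold:
            return failure_month - month
    return None


scores = [(0, 0.12), (1, 0.18), (2, 0.31), (3, 0.57), (4, 0.66), (5, 0.81), (6, 0.90)]
print(warning_lead_time(scores, failure_month=7))                  # 4
print(warning_lead_time(scores, failure_month=7, threshold=0.95))  # None
```

A deployed rule would often require the score to stay above the threshold for several consecutive periods to suppress one-off spikes; that refinement is left out of this sketch.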

Published in American Journal of Energy Engineering (Volume 14, Issue 1)
DOI 10.11648/j.ajee.20261401.14
Page(s) 27-33
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2026. Published by Science Publishing Group

Keywords

Explainable Machine Learning, Well Integrity, Casing Failure Risk, Gradient-boosted Trees, SHAP Analysis, Sparse Failure Labels

Cite This Article
  • APA Style

    Ichenwo, J. L., & Philip, O. (2026). Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels. American Journal of Energy Engineering, 14(1), 27-33. https://doi.org/10.11648/j.ajee.20261401.14

    ACS Style

    Ichenwo, J. L.; Philip, O. Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels. Am. J. Energy Eng. 2026, 14(1), 27-33. doi: 10.11648/j.ajee.20261401.14

    AMA Style

    Ichenwo JL, Philip O. Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels. Am J Energy Eng. 2026;14(1):27-33. doi: 10.11648/j.ajee.20261401.14

  • BibTeX

    @article{10.11648/j.ajee.20261401.14,
      author = {John Lander Ichenwo and Ogwu Philip},
      title = {Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels},
      journal = {American Journal of Energy Engineering},
      volume = {14},
      number = {1},
      pages = {27-33},
      doi = {10.11648/j.ajee.20261401.14},
      url = {https://doi.org/10.11648/j.ajee.20261401.14},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajee.20261401.14},
      year = {2026}
    }
    

  • RIS

    TY  - JOUR
    T1  - Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels
    AU  - John Lander Ichenwo
    AU  - Ogwu Philip
    Y1  - 2026/02/24
    PY  - 2026
    N1  - https://doi.org/10.11648/j.ajee.20261401.14
    DO  - 10.11648/j.ajee.20261401.14
    T2  - American Journal of Energy Engineering
    JF  - American Journal of Energy Engineering
    JO  - American Journal of Energy Engineering
    SP  - 27
    EP  - 33
    PB  - Science Publishing Group
    SN  - 2329-163X
    UR  - https://doi.org/10.11648/j.ajee.20261401.14
    VL  - 14
    IS  - 1
    ER  - 

Author Information