Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels

John Lander Ichenwo; Ogwu Philip

doi:10.11648/j.ajee.20261401.14

Research Article | Peer-Reviewed

John Lander Ichenwo^*, Ogwu Philip

Published in American Journal of Energy Engineering (Volume 14, Issue 1)

Received: 13 January 2026; Accepted: 5 February 2026; Published: 24 February 2026

Abstract

Well integrity is critical to safe and efficient oil and gas operations, yet casing failure risk is hard to assess because few failure records exist and downhole conditions are complex. This study introduces an explainable machine learning (XML) framework for estimating casing failure risk when only a limited number of wells are labelled. The framework uses measurements of pressure, temperature, annular pressure, cement bond quality and downhole vibration, and combines semi-supervised learning with gradient-boosted trees to predict the likelihood of failure. SHAP (Shapley Additive Explanations) values provide feature-level explanations that show how operational parameters contribute to each risk prediction. Experiments on synthetic and field-derived datasets show that the XML framework achieves an accuracy of 91%, a precision of 0.87, a recall of 0.89, and an F1-score of 0.88. Key predictors, namely annular pressure differences, poor cement bond integrity, and high vibration, were quantitatively associated with high failure risk, and risk scoring over time gave early warning of known failures 2-6 months before they were recorded. The framework improves on baseline models such as logistic regression and a standard random forest in both predictive performance and interpretability. In settings with few failure labels, the technique bridges data-driven risk assessment and operational decision-making by pairing strong prediction with actionable information that helps operators prioritise monitoring and preventive actions and reduce non-productive time.
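
To make the idea of feature-level attribution concrete, the following is a minimal sketch of exact Shapley values for a toy risk score. It is not the authors' model or code: the function `risk_score`, its coefficients, the feature names `annular_dp`, `cement_bond` and `vibration`, and the baseline and well values are all hypothetical, chosen only to mirror the predictors named in the abstract. Each attribution is the average marginal change in the score when a feature is switched from its baseline value to the well's value, taken over every feature ordering.

```python
from itertools import permutations

# Hypothetical feature names, used for illustration only.
FEATURES = ("annular_dp", "cement_bond", "vibration")


def risk_score(x):
    """Toy risk model (an assumption, not the model from the paper)."""
    score = (
        0.4 * x["annular_dp"]
        + 0.3 * (1.0 - x["cement_bond"])
        + 0.2 * x["vibration"]
    )
    # Interaction: a high annular pressure differential combined with a
    # poor cement bond adds extra risk.
    score += 0.1 * x["annular_dp"] * (1.0 - x["cement_bond"])
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values, averaged over all feature orderings."""
    totals = {name: 0.0 for name in FEATURES}
    orderings = list(permutations(FEATURES))
    for order in orderings:
        current = dict(baseline)      # start from the reference well
        previous = model(current)
        for name in order:
            current[name] = instance[name]   # switch one feature on
            value = model(current)
            totals[name] += value - previous  # marginal contribution
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical normalised readings (0 to 1) for a reference well and a well of interest.
baseline = {"annular_dp": 0.2, "cement_bond": 0.9, "vibration": 0.1}
well = {"annular_dp": 0.8, "cement_bond": 0.4, "vibration": 0.6}

phi = shapley_values(risk_score, well, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:12s} {value:+.4f}")
```

The attributions sum to the difference between the well's score and the baseline score, which is the additivity property that makes such explanations easy to read as a breakdown of one prediction. Enumerating orderings grows factorially with the number of features, so for tree ensembles a dedicated algorithm, such as the tree explainer in the `shap` package, is the usual choice.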

Published in	American Journal of Energy Engineering (Volume 14, Issue 1)
DOI	10.11648/j.ajee.20261401.14
Page(s)	27-33
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2026. Published by Science Publishing Group

Keywords

Explainable Machine Learning, Well Integrity, Casing Failure Risk, Gradient-boosted Trees, SHAP Analysis, Sparse Failure Labels

Cite This Article

APA Style

Ichenwo, J. L., & Philip, O. (2026). Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels. American Journal of Energy Engineering, 14(1), 27-33. https://doi.org/10.11648/j.ajee.20261401.14

ACS Style

Ichenwo, J. L.; Philip, O. Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels. Am. J. Energy Eng. 2026, 14(1), 27-33. doi: 10.11648/j.ajee.20261401.14

AMA Style

Ichenwo JL, Philip O. Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels. Am J Energy Eng. 2026;14(1):27-33. doi: 10.11648/j.ajee.20261401.14

@article{10.11648/j.ajee.20261401.14,
  author = {John Lander Ichenwo and Ogwu Philip},
  title = {Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels},
  journal = {American Journal of Energy Engineering},
  volume = {14},
  number = {1},
  pages = {27-33},
  doi = {10.11648/j.ajee.20261401.14},
  url = {https://doi.org/10.11648/j.ajee.20261401.14},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajee.20261401.14},
  year = {2026}
}

TY  - JOUR
T1  - Explainable Machine Learning for Well Integrity and Casing Failure Risk with Sparse Failure Labels
AU  - John Lander Ichenwo
AU  - Ogwu Philip
Y1  - 2026/02/24
PY  - 2026
N1  - https://doi.org/10.11648/j.ajee.20261401.14
DO  - 10.11648/j.ajee.20261401.14
T2  - American Journal of Energy Engineering
JF  - American Journal of Energy Engineering
JO  - American Journal of Energy Engineering
SP  - 27
EP  - 33
PB  - Science Publishing Group
SN  - 2329-163X
UR  - https://doi.org/10.11648/j.ajee.20261401.14
VL  - 14
IS  - 1
ER  -

Author Information

John Lander Ichenwo

Department of Petroleum Engineering, University of Port Harcourt, Port Harcourt, Nigeria

http://orcid.org/0009-0001-0617-7662

Ogwu Philip

Department of Petroleum Engineering, Rivers State University, Port Harcourt, Nigeria

http://orcid.org/0009-0009-6871-2518
