The study aimed to determine the survival rate of first-class passengers using the Titanic dataset from Kaggle. Descriptive statistics revealed that first-class passengers had a considerably higher chance of survival than passengers in other classes, which underscores the role of socioeconomic status in determining survival. Evaluation metrics, which assess model performance separately for the male and female cohorts, shed light on gender-specific predictive accuracy. Propensity score matching was carried out separately for male and female passengers, so that the treatment and control groups were comparably distributed within each gender category. Women had higher survival rates than men, a finding that highlights the disparity in survival between genders. Post-matching statistics showed improved covariate balance for both the male and female cohorts, indicating that the matching process was successful for both genders. Treatment effect estimates were computed separately for male and female passengers, and the findings showed that a number of characteristics significantly improved survival rates within each gender group. Overall, the results emphasize the importance of including gender when analyzing survival outcomes in the Titanic dataset. In addition, age emerged as an important factor, with younger passengers having higher chances of being saved.
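The workflow summarized in the abstract (estimate propensity scores, match within each gender, check covariate balance, then estimate the treatment effect) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes first-class travel is the treatment, age is the only covariate, matching is 1:1 nearest-neighbour with replacement, and the data are synthetic toy values rather than the Kaggle file. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Clip the argument to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_propensity(X, t, n_iter=25):
    """Logistic regression by Newton's method; returns P(treated | X)."""
    Xb = np.column_stack([np.ones(len(X)), X])          # add an intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ beta)
        w = p * (1.0 - p)
        # Tiny ridge term keeps the Hessian invertible.
        hessian = Xb.T @ (Xb * w[:, None]) + 1e-6 * np.eye(Xb.shape[1])
        beta += np.linalg.solve(hessian, Xb.T @ (t - p))
    return sigmoid(Xb @ beta)

def match_nearest(ps, t):
    """1:1 nearest-neighbour matching with replacement on the propensity score."""
    treated = np.flatnonzero(t == 1)
    control = np.flatnonzero(t == 0)
    dist = np.abs(ps[treated][:, None] - ps[control][None, :])
    return treated, control[dist.argmin(axis=1)]

def smd(x, t_idx, c_idx):
    """Standardized mean difference of covariate x between two index sets."""
    a, b = x[t_idx], x[c_idx]
    pooled = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return (a.mean() - b.mean()) / pooled

# Synthetic toy data (an assumption for illustration, NOT the Titanic data).
rng = np.random.default_rng(0)
n = 800
female = rng.integers(0, 2, n)
age = rng.normal(30.0, 12.0, n).clip(1.0, 80.0)
first = rng.binomial(1, sigmoid(-1.5 + 0.04 * (age - 30.0)))         # treatment
survived = rng.binomial(
    1, sigmoid(-1.0 + 1.0 * first + 2.0 * female - 0.02 * (age - 30.0))
)

results = {}
for label, mask in (("male", female == 0), ("female", female == 1)):
    x_age, t, y = age[mask], first[mask], survived[mask]
    X = ((x_age - x_age.mean()) / x_age.std())[:, None]  # standardized covariate
    ps = fit_propensity(X, t)
    ti, ci = match_nearest(ps, t)
    results[label] = {
        "n_treated": int((t == 1).sum()),
        "n_pairs": len(ti),
        "ps_min": float(ps.min()),
        "ps_max": float(ps.max()),
        "smd_before": float(smd(x_age, np.flatnonzero(t == 1), np.flatnonzero(t == 0))),
        "smd_after": float(smd(x_age, ti, ci)),
        # Effect on the treated: survival difference across matched pairs.
        "att": float(y[ti].mean() - y[ci].mean()),
    }
    print(label, results[label])
```

Running the matching separately inside each gender stratum mirrors the design described above. A standardized mean difference closer to zero after matching than before would indicate improved covariate balance.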
Published in: American Journal of Mathematical and Computer Modelling (Volume 9, Issue 3)
DOI: 10.11648/j.ajmcm.20240903.12
Page(s): 68-77
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2024. Published by Science Publishing Group
Keywords: Propensity Score Matching, Survival Rates, Observational Data, Treatment and Control Groups
APA Style
Collins, W., Anjela, W., Jacinta, M. (2024). Propensity Score Matching: An Application on Observational Data. American Journal of Mathematical and Computer Modelling, 9(3), 68-77. https://doi.org/10.11648/j.ajmcm.20240903.12
@article{10.11648/j.ajmcm.20240903.12,
  author = {Wangila Collins and Wanjala Anjela and Muindi Jacinta},
  title = {Propensity Score Matching: An Application on Observational Data},
  journal = {American Journal of Mathematical and Computer Modelling},
  volume = {9},
  number = {3},
  pages = {68-77},
  doi = {10.11648/j.ajmcm.20240903.12},
  url = {https://doi.org/10.11648/j.ajmcm.20240903.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajmcm.20240903.12},
  year = {2024}
}
TY - JOUR
T1 - Propensity Score Matching: An Application on Observational Data
AU - Wangila Collins
AU - Wanjala Anjela
AU - Muindi Jacinta
Y1 - 2024/12/03
PY - 2024
N1 - https://doi.org/10.11648/j.ajmcm.20240903.12
DO - 10.11648/j.ajmcm.20240903.12
T2 - American Journal of Mathematical and Computer Modelling
JF - American Journal of Mathematical and Computer Modelling
JO - American Journal of Mathematical and Computer Modelling
SP - 68
EP - 77
PB - Science Publishing Group
SN - 2578-8280
UR - https://doi.org/10.11648/j.ajmcm.20240903.12
VL - 9
IS - 3
ER -