Exploring a Novel Approach of K-mean Gradient Boosting Algorithm with PCA for Drought Prediction

Babatunde Isaiah Ayinla; Rasheedat Aderonke Abdulsalam

doi:10.11648/j.ajdmkd.20240901.11

Research Article | Peer-Reviewed

Published in American Journal of Data Mining and Knowledge Discovery (Volume 9, Issue 1)

Received: 11 June 2024 Accepted: 8 July 2024 Published: 23 July 2024

Abstract

Drought poses a significant threat to essential resources such as food, land, and public health. Machine Learning (ML) has emerged as a powerful tool in weather forecasting, using algorithms to predict weather phenomena with high accuracy. ML models can capture complex atmospheric systems, including those affected by climate change, and can offer precision beyond traditional forecasting methods. However, predicting drought remains challenging because of its uneven distribution and varying severity. To tackle this challenge, this study explored a novel approach that combines K-means++ clustering with the Gradient Boosting Algorithm (KGBA), using Principal Component Analysis (PCA) for dimensionality reduction. Using a dataset spanning 2000 to July 2016 and comprising 2,756,796 US Drought Monitor records, the study developed and evaluated the KGBA model for drought prediction. The model achieved comparatively high precision and recall, particularly in forecasting extreme and exceptional drought periods. Specifically, KGBA attained precision of 33% and 74% and recall of 72% and 77% for extreme and exceptional drought periods, respectively. The model had an overall accuracy of 46% across all drought classes, slightly better than the closest-performing ensemble methods. These findings underscore the potential of KGBA for improving drought prediction in support of mitigation efforts, as it outperformed other models such as Gradient Boosting, Random Forest, Naive Bayes, and K-Nearest Neighbor.

Published in	American Journal of Data Mining and Knowledge Discovery (Volume 9, Issue 1)
DOI	10.11648/j.ajdmkd.20240901.11
Page(s)	1-19
Creative Commons	This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright	Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

K-means++, Gradient Boosting, Drought, Principal Component Analysis, Machine Learning, Climate Change

1. Background of the Study

Climate refers to the long-term patterns and variations in key atmospheric factors like temperature, precipitation, and wind. Essentially, climate represents a comprehensive summary of weather conditions over time. Drought, on the other hand, is a notable environmental occurrence that carries substantial consequences for agriculture, water resources, and natural habitats. Precise forecasting of drought events can play a crucial role in proactive planning and implementing strategies to alleviate the adverse effects of droughts.

According to the 2021 report of the National Oceanic and Atmospheric Administration (US Department of Commerce), the world experienced coronavirus pandemic shutdowns in 2020, yet the level of anthropogenic greenhouse gases still surged: CO₂ rose to about 40% from 25% since 1953, and methane also increased. Climate change goes beyond rising temperatures; nevertheless, global temperatures rose by about 1.98 °F (1.1 °C) between 1901 and 2020. The effect of climate change is also reflected in sea-level rise, which accelerated from 1.7 mm/year for most of the twentieth century to 3.2 mm/year. The shifting weather patterns brought about by climate change are causing both droughts and floods, adversely impacting essentials for human and animal survival [1].

The adverse effects of climate change (CC) are evident in many aspects of society. Drought can be devastating: food production can be severely affected, and human health can be seriously degraded. Similarly, flooding can spread diseases that can wipe out an entire community and can damage social amenities and ecosystems. It is also worth noting that the impact of climate change differs from one neighbourhood or individual to another.

The world has faced the challenge of adapting to and surviving climate change for decades. One of the key reasons countries fail to tackle climate change vulnerability is the lack of appropriate risk management steps. Mortuza et al. observed that governments of countries like Bangladesh tend to prioritize response and recovery efforts over monitoring, preparedness, and mitigation strategies [2]. Therefore, accurate drought projection tools are needed for the sustainable management of all aspects affected by CC, such as agriculture, health resources, and more. Given the unpredictable and diverse characteristics of drought, which can vary in intensity and occurrence across locations, there is a critical need to develop swift, reliable, and accurate prediction models. These models are essential for quantifying the risks associated with drought events and for better understanding their potential impacts [2].

Traditional drought prediction systems frequently rely on meteorological indicators such as temperature and precipitation. These approaches, however, may fail to capture the intricate linkages and patterns associated with drought [26]. In recent years, machine learning algorithms have shown promise in improving drought forecast accuracy by incorporating new data sources and recognizing non-linear relationships. Khan et al. reported the use of various machine learning (ML) models, including support vector machine (SVM), artificial neural networks (ANN), and Extreme Learning Machine (ELM) [3]. However, the most widely used model in drought prediction has been SVM, along with other potent algorithms [4-7]. The algorithm has been combined with the Standardized Precipitation Evapotranspiration Index (SPEI) to predict drought over Pakistan and with the Palmer Drought Severity Index (PDSI) to predict drought over Turkey. Some researchers have examined the performance of ANN in predicting the Standardized Precipitation Index (SPI) over Iran.

1.1. Principal Component Analysis (PCA)

PCA is a statistical method for reducing the dimensionality of large datasets and is widely used in data science. Its strength lies in handling complex, high-dimensional datasets without significant loss of information. The method is well established in fields ranging from finance to genomics, which produce massive amounts of data that require interpretation. It highlights patterns and similarities in the dataset by converting the original features into new, uncorrelated features called principal components. The components are orthogonal axes of maximum variance that represent the dataset in a reduced-dimensional space [8].

For instance, consider a data matrix X with column-wise zero empirical mean, that is, the sample mean of each column has been shifted to zero. Each of the n rows represents a different repetition of the experiment, and each of the p columns represents a particular feature.

In mathematical terms, the transformation is defined by a set of p-dimensional weight vectors (coefficients) that map each row vector of X to a new vector of principal component scores, denoted t. These coefficients are chosen so that the individual variables of t successively inherit the maximum possible variance from X. Each coefficient vector w is constrained to be a unit vector, and the number of vectors retained is usually smaller than p, which reduces the dimensionality of the observations.

For the first component to maximize variance, the first weight vector w₍₁₎ must satisfy a maximization condition. In other words, expressing this in matrix form yields:

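x_{1} = λ_{11} f_{1} + λ_{12} f_{2} + \dots + λ_{1m} f_{m} + e_{1}

(1)

x_{2} = λ_{21} f_{1} + λ_{22} f_{2} + \dots + λ_{2m} f_{m} + e_{2}

\vdots

x_{p} = λ_{p1} f_{1} + λ_{p2} f_{2} + \dots + λ_{pm} f_{m} + e_{p}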

where λ_jk, j = 1, 2,...,p; k = 1, 2,...,m are constants called the factor loadings, and e_j, j = 1, 2,...,p are error terms, sometimes called specific factors (because e_j is ‘specific’ to x_j , whereas the f_k are ‘common’ to several x_j).

Since w₍₁₎ has been defined as a unit vector, it also satisfies the equation.

The Rayleigh quotient is the quantity to be maximized. The maximum possible value of the quotient, indicated by the largest eigenvalue of the matrix, is a standard result for positive semidefinite matrices like X^TX. This maximum value is achieved when w corresponds to the eigenvector.
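For reference, the maximization described above is usually written as follows in standard treatments of PCA. This is the textbook form under the zero-mean assumption on X, given here as a sketch and not necessarily in the authors' exact notation:

```latex
\begin{aligned}
w_{(1)} &= \arg\max_{\|w\| = 1} \sum_{i} \left( x_{(i)} \cdot w \right)^{2}
         = \arg\max_{\|w\| = 1} \| X w \|^{2}
         = \arg\max_{\|w\| = 1} w^{T} X^{T} X w , \\
\max_{w \neq 0} \frac{w^{T} X^{T} X w}{w^{T} w} &= \lambda_{\max}\!\left( X^{T} X \right),
\quad \text{attained when } w \text{ is the corresponding eigenvector.}
\end{aligned}
```

The second line is the Rayleigh quotient: because w is a unit vector, dividing by w^T w does not change the maximizer.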

With w₍₁₎ found, the first principal component of a data vector x₍ᵢ₎ can be given as a score

t_{1(i)} = x_{(i)} \cdot w_{(1)}

(2)

in the transformed coordinates, or as the corresponding vector in the original variables, {x₍ᵢ₎ ⋅ w₍₁₎} w₍₁₎.

The other components of the PCA can be deduced as:

The k-th component is found by subtracting the first k − 1 principal components from the matrix X in (1):

The weight vector that extracts the maximum variance from this new data matrix is then found.

The above result in (2) extends to the remaining eigenvectors of X^TX, with the maximum values for the quantity in brackets given by their corresponding eigenvalues. Hence, the weight vectors are eigenvectors of X^TX.

The k-th principal component of a data vector x₍ᵢ₎ is defined as the score

t_{k(i)} = x_{(i)} \cdot w_{(k)}

(3)

in the transformed coordinates, or as the corresponding vector in the space of the original variables, {x₍ᵢ₎ ⋅ w₍ₖ₎} w₍ₖ₎, where w₍ₖ₎ is the k-th eigenvector of X^TX.

The entire principal components decomposition of X can be defined as follows:

T = XW

where W is a p-by-p matrix of weights whose columns are the eigenvectors of X^TX. The transpose of W is often referred to as the whitening or sphering transformation. Scaling the columns of W by the square roots of the corresponding eigenvalues, that is, scaling each eigenvector by the spread of the data along its direction, gives the loadings in PCA or factor analysis, as described by Sidak [9].

This can further be mathematically represented as follows:

PCA_1=a_11*x_1+a_12*x_2+...+a_1p*x_p(4)

PCA_2=a_21*x_1+a_22*x_2+...+a_2p*x_p

...

PCA_k=a_k1*x_1+a_k2*x_2+...+a_kp*x_p

where a_ij is the loading or weight of variable x_j on principal component PCA_i, and x_j is the jth variable in the data matrix X. The principal components are ordered such that the first component PCA_1 captures the most significant variation in the data, the second component PCA_2 captures the second most significant variation, and so on as earlier described. The number of principal components used in the analysis, k, determines the reduced dimensionality of the dataset

[8, 9]

PCA is useful in three major areas of model building, especially during the preprocessing stage. The first is data reduction, which simplifies model building in machine learning and statistical analysis by reducing the number of variables under consideration. The second is exploratory data analysis, where it uncovers hidden patterns in the preliminary stages of analysis. The third is multivariate analysis, which deals with observations of multiple interrelated features.
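The eigenvector view of PCA described in this section can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the study's preprocessing code; the function name `pca_eig` and the choice of k = 2 are assumptions made for the example.

```python
import numpy as np

def pca_eig(X, k):
    """Project X onto its first k principal components via eigenvectors of X^T X."""
    Xc = X - X.mean(axis=0)                       # enforce column-wise zero mean
    eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)  # X^T X is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
    W = eigvecs[:, order[:k]]                     # p-by-k matrix of weight vectors w_(k)
    T = Xc @ W                                    # scores t_k(i) = x_(i) . w_(k)
    explained = eigvals[order[:k]] / eigvals.sum()
    return T, W, explained

# Synthetic correlated data (hypothetical; stands in for the weather features).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
T, W, explained = pca_eig(X, k=2)
```

Here `explained` holds the fraction of total variance captured by each retained component, which is one common way to choose k.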

1.2. Machine Learning Models

According to Mokhtar et al., machine learning models are mathematical representations of datasets from which predictions can be made for decision-making [10]. The models are built by training machine learning algorithms on historical datasets, which are either labelled or unlabelled. Once training is done, the model can generalize to make predictions on unseen data. These models have revolutionised several domains, including security, health, and finance. They can uncover insights, including patterns and irregularities in data, in a more sophisticated way than traditional statistical models [11]. Furthermore, Machine Learning Models (MLMs) exhibit portability, robustness, and flexibility, enabling them to perform effectively across diverse tasks, including assessing patient risk levels, making diagnoses, and predicting outcomes [12]. However, most MLMs are black boxes, so explainability and interpretability remain concerns [13, 14].

1.2.1. Gradient Boosting Algorithm (GBA)

Gradient boosting is a versatile ensemble machine learning method suitable for regression and classification tasks. It combines the predictions of numerous weak learners, typically decision trees, sequentially. The primary aim is to enhance predictive accuracy by optimizing the model's weights. These weights are determined by the errors of previous iterations, gradually reducing errors to refine the final model accuracy. Employing an arbitrary differentiable loss function, the model is systematically constructed stage by stage, akin to other boosting algorithms.
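The sequential error-correcting idea can be sketched as follows. This is a toy regression example under stated assumptions (a single feature, squared-error loss, depth-1 trees, and the hypothetical names `fit_stump` and `gradient_boost`); the study itself addresses multi-class classification, so this illustrates only the mechanism.

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump on a 1-D feature x for residuals r."""
    best = None
    for s in np.unique(x)[:-1]:             # candidate thresholds (max excluded)
        left, right = r[x <= s], r[x > s]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, s, left.mean(), right.mean())
    _, s, lv, rv = best
    return lambda z: np.where(z <= s, lv, rv)

def gradient_boost(x, y, n_rounds=50, lr=0.1):
    """Squared-error boosting: each stump is fitted to the current residuals."""
    base = y.mean()
    pred = np.full_like(y, base, dtype=float)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred                 # negative gradient of 0.5 * (y - pred)^2
        stump = fit_stump(x, residual)
        pred = pred + lr * stump(x)         # shrunken additive update
        stumps.append(stump)
    predict = lambda z: base + lr * sum(s(z) for s in stumps)
    return predict, pred

# Synthetic data (hypothetical).
rng = np.random.default_rng(1)
x = rng.uniform(0, 6, size=300)
y = np.sin(x) + rng.normal(scale=0.1, size=300)
predict, fitted = gradient_boost(x, y)
mse_start = np.mean((y - y.mean()) ** 2)
mse_end = np.mean((y - fitted) ** 2)
```

Each round reduces the training error a little; the learning rate `lr` trades the number of rounds against the size of each correction.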

The gradient boosting algorithm originated from Leo Breiman's observation that boosting can be interpreted as an optimization algorithm on an appropriate cost function [15]. In 2001 and 2002, Friedman developed a regression version of the gradient boosting algorithm. The algorithm uses the approach proposed by Mason et al. [16-18].

There are basically three family members of GBA: XGBoost, LightGBM, and CatBoost, with the aim of achieving better accuracy and speed optimization as the focus. Extreme Gradient Boosting Algorithm (XGBA) is known for its scalability, efficiency and reliability among the machine learning algorithms. However, LightGBM is extremely fast in model training with the use of selective samplings of high-gradient records. In a similar way, CatBoost places a premium on the accuracy prediction of the model by modifying the computation of gradients

[19]

The gradient tree boosting algorithms are based on the derivation outlined by Friedman et al., with minor enhancements made to the regularized objective function that have proven beneficial in practical applications [20].

The Gradient Tree Boosting ensemble model can be illustrated with Eq. (5). The objective takes functions as parameters, which conventional optimization techniques in Euclidean space cannot handle. Instead, the model is trained in an additive manner. Let \hat{y}_{i}^{(t)} be the prediction for the i-th instance at the t-th iteration; to minimize the objective, f_t needs to be added.

L (\phi) = \sum_{i} l ({\hat{y}}_{i}, y_{i}) + \sum_{k} Ω (f_{k})

(5)

where

Ω (f) = γT + \frac{1}{2} λ \| ω \|^{2}

In this expression, l is a differentiable convex loss function that measures the difference between the predicted value {\hat{y}}_{i} and the actual target value y_{i}. The second term Ω penalizes the complexity of the model, that is, of the regression tree functions; T is the number of leaves in a tree and ω is the vector of leaf weights. The regularization term smooths the final learned weights, which helps to avoid overfitting.

L^{(t)} = \sum_{i = 1}^{n} l (y_{i}, ŷ_{i}^{(t - 1)} + f_{t} (x_{i})) + Ω (f_{t})

(6)

The equation in (6) implies that the f_t that most improves it is greedily added.

According to Friedman et al. (2000), a second-order approximation can be used to quickly optimize the objective presented in (5).

L^{(t)} ≃ \sum_{i = 1}^{n} [l (y_{i}, ŷ_{i}^{(t - 1)}) + g_{i} f_{t} (x_{i}) + \frac{1}{2} h_{i} f_{t}^{2} (x_{i})] + Ω (f_{t})

(7)

where g_{i} = \partial_{\hat{y}^{(t-1)}} l (y_{i}, \hat{y}_{i}^{(t-1)}) and h_{i} = \partial^{2}_{\hat{y}^{(t-1)}} l (y_{i}, \hat{y}_{i}^{(t-1)}) are the first- and second-order gradient statistics of the loss function. After removing the constant terms, the simplified objective at step t is obtained.

{\tilde{L}}^{(t)} = \sum_{i = 1}^{n} [g_{i} f_{t} (x_{i}) + \frac{1}{2} h_{i} f_{t}^{2} (x_{i})] + Ω (f_{t})

(8)

The final prediction for a given sample is the sum of the predictions from each tree. Define I_{j} = \{ i \mid q (x_{i}) = j \} as the instance set of leaf j. Expanding Ω, (8) can be rewritten as

{\tilde{L}}^{(t)} = \sum_{i = 1}^{n} [g_{i} f_{t} (x_{i}) + \frac{1}{2} h_{i} f_{t}^{2} (x_{i})] + γT + \frac{1}{2} λ \sum_{j = 1}^{T} ω_{j}^{2}

(9)

= \sum_{j = 1}^{T} [(\sum_{i \in I_{j}} g_{i}) ω_{j} + \frac{1}{2} (\sum_{i \in I_{j}} h_{i} + λ) ω_{j}^{2}] + γT

For a fixed structure q(x), the optimal weight ω_{j}^{*} of leaf j can be calculated from (9):

ω_{j}^{*} = - \frac{\sum_{i \in I_{j}} g_{i}}{\sum_{i \in I_{j}} h_{i} + λ}

(10)

The corresponding optimal value can be determined by the following calculation:

{\tilde{L}}^{(t)} (q) = - \frac{1}{2} \sum_{j = 1}^{T} \frac{{(\sum_{i \in I_{j}} g_{i})}^{2}}{\sum_{i \in I_{j}} h_{i} + λ} + γT

(11)

The equation in (11) can serve as a scoring function to evaluate the quality of a tree structure q. This score functions similarly to the impurity score used in decision trees but is applicable to a broader range of objective functions.
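Equations (10) and (11) can be evaluated directly once the gradient statistics are grouped by leaf. The sketch below assumes a squared-error loss l = 0.5 (y − ŷ)², for which g_i = ŷ_i − y_i and h_i = 1; the function name, the toy targets, and the leaf assignment are hypothetical.

```python
import numpy as np

def leaf_weights_and_score(g, h, leaf_index, n_leaves, lam=1.0, gamma=0.0):
    """Optimal leaf weights (10) and structure score (11) for a fixed tree structure."""
    G = np.bincount(leaf_index, weights=g, minlength=n_leaves)  # sum of g_i per leaf
    H = np.bincount(leaf_index, weights=h, minlength=n_leaves)  # sum of h_i per leaf
    weights = -G / (H + lam)
    score = -0.5 * np.sum(G ** 2 / (H + lam)) + gamma * n_leaves
    return weights, score

# Squared-error loss: g_i = y_hat_i - y_i and h_i = 1 (assumption for this example).
y = np.array([1.0, 1.2, 0.8, 3.0, 3.4])
y_hat = np.zeros_like(y)
g, h = y_hat - y, np.ones_like(y)
leaf_index = np.array([0, 0, 0, 1, 1])  # hypothetical structure q(x_i)
weights, score = leaf_weights_and_score(g, h, leaf_index, n_leaves=2, lam=0.0)
```

With λ = 0 the optimal weight of each leaf reduces to the mean residual of the instances in that leaf; a positive λ shrinks the weights towards zero, and γ adds a fixed penalty per leaf.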

1.2.2. Random Forest (RF)

RF is an ensemble model built from multiple decision trees, collectively called a forest, with controlled variance [13, 21]. The model can work with both continuous and discrete targets, in which case it is known as a regression or classification model, respectively. A random forest regression is a bootstrap ensemble: it combines random binary trees, each built on a bootstrap sample drawn from a random subset of the training dataset [10].
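The bootstrap-and-average idea can be sketched as below. This is a deliberately simplified illustration: the trees are depth-1 stumps split at the median of one randomly chosen feature, which is far cruder than a real random forest, and all names and data are hypothetical.

```python
import numpy as np

def fit_stump(X, y, feature):
    """Regression stump that splits one feature at its median."""
    s = np.median(X[:, feature])
    left = X[:, feature] <= s
    lv = y[left].mean()
    rv = y[~left].mean() if (~left).any() else lv
    return feature, s, lv, rv

def fit_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)   # bootstrap: sample rows with replacement
        feature = rng.integers(0, p)        # random feature for this tree
        forest.append(fit_stump(X[rows], y[rows], feature))
    return forest

def predict_forest(forest, X):
    preds = [np.where(X[:, f] <= s, lv, rv) for f, s, lv, rv in forest]
    return np.mean(preds, axis=0)           # average the trees' predictions

# Synthetic data in which only feature 0 is informative (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = 2.0 * (X[:, 0] > 0) + rng.normal(scale=0.1, size=500)
forest = fit_forest(X, y)
pred = predict_forest(forest, X)
mse = np.mean((y - pred) ** 2)
```

Averaging over many trees, each trained on a different bootstrap sample, is what keeps the variance of the ensemble under control.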

2. Literature Review

In the study "Applying Machine Learning for Threshold Selection in Drought Early Warning System" by Luo et al., the correlation between Normalized Difference Vegetation Index (NDVI) readings and drought categories was investigated over a 34-year period in two distinct climate zones in Australia. The research aimed to establish NDVI threshold values for different drought categories through a threshold selection approach. While the model provides valuable insights into drought severity and lays the groundwork for future drought classification models, additional effort is needed to improve its accuracy.

In "Applying machine learning for drought prediction using data from a large ensemble of climate simulations" [22], artificial neural networks are used to predict the onset of drought with a one-month lead time in two European domains. The paper applies explainable AI methods to gain insight into the outcomes, using data from a large model ensemble. Consequently, the models offer an opportunity to examine the impact of input variables on drought formation and serve as a foundation for future drought prediction models, although weak prediction accuracies are noted [23].

In their article "An Evaluation of Machine Learning and Deep Learning Models for Drought Prediction Using Weather Data" [11], Jiang and Luo conducted an experiment comparing machine learning and deep learning models for predicting drought using a dataset from the United States. They reported that no single model performed best on all assessment criteria. Because of the imbalanced nature of drought events, special attention was given to developing models capable of accurately predicting drought occurrences.

The study "The global k-means clustering algorithm" by Likas, Vlassis, and Verbeek in 2011 suggested modifications to the technique that lessen the computational effort without significantly reducing solution quality [24]. The evaluation took into account both solution quality and computational complexity, and the global k-means algorithm was found to be quite effective. However, a significant open issue that needs more research is the development of theoretical foundations for the assumptions underlying the method [24].

In the study titled "Application of Meteorological and Hydrological Drought Indices to Establish Drought Classification Maps of the Ba River Basin in Vietnam", Tri et al. aimed to generate maps illustrating the lack of discharge in the Ba River basin of Vietnam. They employed various indices, such as the Standardized Precipitation Index (SPI), the Drought Index (I), and the Ped Index (Ped), in conjunction with the Soil and Water Assessment Tool (SWAT) model and the hydrological drought index (KDrought), to create these maps [25].

The hydrological drought index for the study area was derived by utilizing the simulation outcomes from the SWAT model. The impacts of the drought on both the spatial and temporal dimensions of the study area were assessed through drought classification maps generated from the calculated drought index (KDrought). While there were limited calibrations and validations conducted on the SWAT model, the study identified a correlation between the moisture regime and drought occurrences in the Central region.

3. Research Methodology

This study used the US drought dataset, which contains drought levels by state in the United States (US) from 2000 to 2016. The dataset was obtained from data.world [27] and was 18.28 MB in size. It contained 19,300,680 records in total, of which 16,543,884 had null values, leaving 2,756,796 records viable for the experiment. The structure of the dataset is described in Tables 1 and 2, and the stages of the experiment can be viewed diagrammatically in Figure 1.

Figure 1 shows the extreme skewness of the dataset towards drought-free periods. The no-drought class (class 0) made up approximately 60% of the dataset, or 1,652,230 of the 2,756,796 rows. The other classes, namely abnormally dry (class 1), moderate drought (class 2), severe drought (class 3), and extreme drought (class 4), had 17% (466,944), 11% (295,331), 7% (196,802), and 4% (106,265) of the tuples, respectively. The last class, exceptional drought (class 5), had the lowest representation, with 1% (39,224) of the observations.
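The class shares quoted above follow directly from the reported row counts, as the short check below shows. The counts are taken from the text; the variable names and the imbalance ratio are added here only for illustration.

```python
# Row counts per drought class, as reported in the text.
counts = {0: 1_652_230, 1: 466_944, 2: 295_331, 3: 196_802, 4: 106_265, 5: 39_224}

total = sum(counts.values())
shares = {c: round(100 * n / total, 1) for c, n in counts.items()}  # percent of rows
imbalance_ratio = counts[0] / counts[5]  # majority class relative to rarest class
```

The counts sum to the 2,756,796 viable records, and the no-drought class is roughly 42 times larger than the exceptional-drought class, which is the imbalance the text describes.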

S/N	Attributes	Description
1.	WS10M_MIN	Minimum Wind Speed at 10 Meters (m/s)
2.	QV2M	Specific Humidity at 2 Meters (g/kg)
3.	T2M_RANGE	Temperature Range at 2 Meters (C)
4.	WS10M	Wind Speed at 10 Meters (m/s)
5.	T2M	Temperature at 2 Meters (C)
6.	WS50M_MIN	Minimum Wind Speed at 50 Meters (m/s)
7.	T2M_MAX	Maximum Temperature at 2 Meters (C)
8.	WS50M	Wind Speed at 50 Meters (m/s)
9.	TS	Earth Skin Temperature (C)
10.	WS50M_RANGE	Wind Speed Range at 50 Meters (m/s)
11.	WS50M_MAX	Maximum Wind Speed at 50 Meters (m/s)
12.	WS10M_MAX	Maximum Wind Speed at 10 Meters (m/s)
13.	WS10M_RANGE	Wind Speed Range at 10 Meters (m/s)
14.	PS	Surface Pressure (kPa)
15.	T2MDEW	Dew/Frost Point at 2 Meters (C)
16.	T2M_MIN	Minimum Temperature at 2 Meters (C)
17.	T2MWET	Wet Bulb Temperature at 2 Meters (C)
18.	PRECTOT	Precipitation (mm day-1)

S/N	Label	Description
a.	None	Drought Absent
b.	D0	Abnormally dry
c.	D1	Moderate drought
d.	D2	Severe drought
e.	D3	Extreme drought
f.	D4	Exceptional drought

S/N	Feature	Kurtosis values	S/N	Feature	Kurtosis values
1.	fips	-1.10	13.	WS10M_MAX	0.70
2.	PRECTOT	33.30	14.	WS10M_MIN	3.15
3.	PS	4.81	15.	WS10M_RANGE	2.08
4.	QV2M	-0.78	16.	WS50M	0.81
5.	T2M	0.55	17.	WS50M_MAX	0.98
6.	T2MDEW	-0.73	18.	WS50M_MIN	0.59
7.	T2MWET	-0.75	19.	WS50M_RANGE	2.20
8.	T2M_MAX	-0.50	20.	Score	1.38
9.	T2M_MIN	-0.44	21.	Year	1.20
10.	T2M_RANGE	-0.31	22.	month	1.20
11.	TS	-0.53	23.	day	1.19
12.	WS10M	1.41

	precision	recall	f1-score	Support
Class 0	0.57	0.50	0.53	10904
Class 1	0.56	0.23	0.33	10737
Class 2	0.69	0.16	0.26	10880
Class 3	0.28	0.34	0.31	10932
Class 4	0.33	0.72	0.45	10890
Class 5	0.75	0.78	0.76	10803
Accuracy			0.46	65146
macro avg	0.53	0.46	0.44	65146
weighted avg	0.53	0.46	0.44	65146
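The f1-score column in the table above can be recomputed from its precision and recall columns, since F1 is their harmonic mean. The sketch below uses the rounded values printed in the table, so agreement is only expected to within rounding (about 0.01); the variable names are illustrative.

```python
# class: (precision, recall, reported f1-score), copied from the table above
rows = {
    0: (0.57, 0.50, 0.53),
    1: (0.56, 0.23, 0.33),
    2: (0.69, 0.16, 0.26),
    3: (0.28, 0.34, 0.31),
    4: (0.33, 0.72, 0.45),
    5: (0.75, 0.78, 0.76),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

recomputed = {c: f1(p, r) for c, (p, r, _) in rows.items()}
macro_f1 = sum(recomputed.values()) / len(recomputed)  # unweighted mean over classes
```

Because the support is nearly equal across classes in this table, the macro and weighted averages coincide to two decimal places.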

Class Name	precision	recall	f1-score	support
Class 0	0.56	0.49	0.52	10904
Class 1	0.53	0.23	0.32	10737
Class 2	0.68	0.15	0.25	10880
Class 3	0.28	0.35	0.31	10932
Class 4	0.33	0.72	0.45	10890
Class 5	0.74	0.77	0.76	10803
accuracy			0.45	65146
macro avg	0.52	0.45	0.43	65146
weighted avg	0.52	0.45	0.43	65146

Class Name	precision	Recall	f1-score	Support
Class 0	0.6095	0.6039	0.6067	10904
Class 1	0.4020	0.3868	0.3943	10737
Class 2	0.2748	0.2707	0.2727	10880
Class 3	0.1716	0.1701	0.1709	10932
Class 4	0.1565	0.1680	0.1620	10890
Class 5	0.8606	0.8578	0.8592	10803
accuracy			0.4089	65146
macro avg	0.4125	0.4095	0.4110	65146
weighted avg	0.4118	0.4089	0.4103	65146

Class Name	precision	recall	f1-score	support
Class 0	0.3554	0.4534	0.3985	10904
Class 1	0.2651	0.2493	0.2570	10737
Class 2	0.1807	0.1833	0.1820	10880
Class 3	0.1472	0.1452	0.1462	10932
Class 4	0.2031	0.1811	0.1915	10890
Class 5	0.6855	0.6098	0.6455	10803
accuracy			0.3033	65146
macro avg	0.3062	0.3037	0.3034	65146
weighted avg	0.3057	0.3033	0.3030	65146

Class Name	precision	recall	f1-score	support
Class 0	0.2647	0.1045	0.1499	10904
Class 1	0.1584	0.0392	0.0629	10737
Class 2	0.1171	0.0342	0.0529	10880
Class 3	0.0639	0.0369	0.0467	10932
Class 4	0.2375	0.9366	0.3789	10890
Class 5	0.4070	0.2164	0.2826	10803
accuracy			0.2283	65146
macro avg	0.2081	0.2280	0.1623	65146
weighted avg	0.2079	0.2283	0.1623	65146

CC	Climate Change
MLM	Machine Learning Model
RF	Random Forest
GBA	Gradient Boosting Algorithm
KGBA	K-means++ Clustering and Gradient Boosting Algorithm
XGBA	Extreme Gradient Boosting Algorithm
LightGBA	Light Gradient Boosting Algorithm
PCA	Principal Component Analysis
ML	Machine Learning
SVM	Support Vector Machine
ANN	Artificial Neural Networks
ELM	Extreme Learning Machine
SPI	Standardized Precipitation Index
I	Drought Index
Ped	Ped Index
SWAT	Soil and Water Assessment Tool
NDMC	National Drought Mitigation Center

[1]	NOS Science Report 2021 https://oceanservice.noaa.gov/about/nos-science-report/2021/ accessed 02 May 2023.
[2]	Mortuza, M. R., Moges, E., Demissie, Y., & Li, H. Y. (2019). Historical and future drought in Bangladesh using copula-based bivariate regional frequency analysis. Theoretical and Applied Climatology, 135(3–4), 855–871. https://doi.org/10.1007/s00704-018-2407-7
[3]	Khan, N., Sachindra, D. A., Shahid, S., Ahmed, K., Shiru, M. S., & Nawaz, N. (2020). Prediction of droughts over Pakistan using machine learning algorithms. Advances in Water Resources, 139. https://doi.org/10.1016/j.advwatres.2020.103562
[4]	Barua, S., Ng, A. W. M., & Perera, B. J. C. (2012). Artificial Neural Network–Based Drought Forecasting Using a Nonlinear Aggregated Drought Index. Journal of Hydrologic Engineering, 17(12), 1408–1413. https://doi.org/10.1061/(ASCE)HE.1943-5584.0000574
[5]	Ghimire, S., Deo, R. C., Downs, N. J., & Raj, N. (2019). Global solar radiation prediction by ANN integrated with European Centre for medium range weather forecast fields in solar rich cities of Queensland Australia. Journal of Cleaner Production, 216, 288–310. https://doi.org/10.1016/J.JCLEPRO.2019.01.158
[6]	Xiang, B., Lin, S. J., Zhao, M., Johnson, N. C., Yang, X., & Jiang, X. (2019). Subseasonal Week 3–5 Surface Air Temperature Prediction During Boreal Wintertime in a GFDL Model. Geophysical Research Letters, 46(1), 416–425. https://doi.org/10.1029/2018GL081314
[7]	Yang, T., Zhou, X., Yu, Z., Krysanova, V., & Wang, B. (2015). Drought projection based on a hybrid drought index using Artificial Neural Networks. Hydrological Processes, 29(11), 2635–2648. https://doi.org/10.1002/HYP.10394
[8]	Jolliffe, I. T. (2002). Principal component analysis for special types of data (pp. 338-372). Springer, New York. https://doi.org/10.1007/b98835
[9]	Sidak, K. (2023, December). Overview of Principal Component Analysis (PCA). https://codefinity.com/blog/Overview-of-Principal-Component-Analysis-(PCA) accessed 02 May 2023.
[10]	Mokhtar, A., Jalali, M., He, H., Al-Ansari, N., Elbeltagi, A., Alsafadi, K., Abdo, H. G., Sammen, S. S., Gyasi-Agyei, Y., & Rodrigo-Comino, J. (2021). Estimation of SPEI Meteorological Drought Using Machine Learning Algorithms. IEEE Access, 9, 65503–65523. https://doi.org/10.1109/ACCESS.2021.3074305
[11]	Jiang, W., & Luo, J. (2021). An Evaluation of Machine Learning and Deep Learning Models for Drought Prediction using Weather Data. https://doi.org/10.3233/JIFS-212748
[12]	Gan, T. Y., Ito, M., Hülsmann, S., Qin, X., Lu, X. X., Liong, S. Y., Rutschman, P., Disse, M., & Koivusalo, H. (2016). Possible climate change/variability and human impacts, vulnerability of drought-prone regions, water resources and capacity building for Africa. Hydrological Sciences Journal, 61(7), 1209–1226. https://doi.org/10.1080/02626667.2015.1057143
[13]	Ayinla, B., & Akinola, S. O. (2021). An Improved Collaborative Pruning Using Ant Colony Optimization and Pessimistic Technique of C5.0 Decision Tree Algorithm. Article in International Journal of Computer Science and Information Security. https://doi.org/10.5281/zenodo.4427699
[14]	Zhong, R., Chen, X., Lai, C., Wang, Z., Lian, Y., Yu, H., & Wu, X. (2019). Drought monitoring utility of satellite-based precipitation products across mainland China. Journal of Hydrology, 568, 343–359. https://doi.org/10.1016/J.JHYDROL.2018.10.072
[15]	Breiman, L. (1997). ARCING THE EDGE.
[16]	Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189–1232. https://doi.org/10.1214/aos/1013203451
[17]	Friedman, J. H. (2002). Stochastic gradient boosting. Computational Statistics and Data Analysis, 38(4), 367–378. https://doi.org/10.1016/S0167-9473(01)00065-2
[18]	Mason, L., Bartlett, P., Baxter, J., & Frean, M. (2000). Boosting Algorithm as Gradient Descent. Advances in Neural Information Processing Systems, 512–518.
[19]	Bentéjac, C., Csörgő, A., & Martínez-Muñoz, G. (2021). A Comparative Analysis of XGBoost. Artificial Intelligence Review, 54, 1937–1967. https://doi.org/10.1007/s10462-020-09896-5
[20]	Friedman, J., Hastie, T., & Tibshirani, R. (2000). ADDITIVE LOGISTIC REGRESSION: A STATISTICAL VIEW OF BOOSTING. In The Annals of Statistics (Vol. 28, Issue 2).
[21]	Breiman, L. (2001). Random forests. Kluwer Academic Publishers, Netherlands 45(1), 5–32.
[22]	Luo, H., Bhardwaj, J., Choy, S., & Kuleshov, Y. (2022). Applying Machine Learning for Threshold Selection in Drought Early Warning System. Climate, 10(7). https://doi.org/10.3390/cli10070097
[23]	Felsche, E., & Ludwig, R. (n.d.). Applying machine learning for drought prediction using data from a large ensemble of climate simulations. https://doi.org/10.5194/nhess-2021-110
[24]	Likas, A., Vlassis, N., & Verbeek, J. (n.d.). The global k-means clustering algorithm. [Technical. https://hal.inria.fr/inria-00321515
[25]	Tri, D. Q., Dat, T. T., & Truong, D. D. (2019). Application of meteorological and hydrological drought indices to establish drought classification maps of the Ba River basin in Vietnam. Hydrology, 6(2). https://doi.org/10.3390/hydrology6020049
[26]	Christoph, M. (2021, July 23). Predict Droughts using Weather & Soil Data. https://www.kaggle.com/datasets/cdminix/us-drought-meteorological-data accessed 18 May 2023
[27]	Nitin. (2020, April 22). LightGBM Binary Classification, Multi-Class Classification, Regression using Python. Https://Nitin9809.Medium.Com/Lightgbm-Binary-Classification-Multi-Class-Classification-Regression-Using-Python-4f22032b36a2 accessed 18 May 2023
[28]	Amber, T., & US, D. M. (2021). amberthomas/us-drought-monitor \| Workspace \| data. world. https://data.world/amberthomas/us-drought-monitor/workspace/project-summary?agentid=amberthomas&datasetid=us-drought-monitor accessed 20 May 2023