Forecasting Rainfall in Selected Cities of Southwest Nigeria: A Comparative Study of Random Forest and Long Short-Term Memory Models
Abstract:
Rainfall is crucial for agricultural practices, and climate change has significantly altered rainfall patterns. Understanding the dynamic nature of rainfall in the context of climate change through Machine Learning (ML) and Deep Learning (DL) algorithms is essential for ensuring food security. ML techniques provide tools for processing large-scale data to extract meaningful insights. This study aims to compare the performance of an ML algorithm, Random Forest (RF), with a DL algorithm, Long Short-Term Memory (LSTM), in predicting rainfall in six state capitals in Southwest Nigeria: Osogbo, Ikeja, Ibadan, Akure, Ado-Ekiti, and Abeokuta. The dataset for this study was sourced from the HelioClim website archive, which offers high-quality solar radiation and meteorological data derived from satellite measurements. This archive is known for its accuracy and reliability, providing extensive and consistent historical datasets for various applications. Monthly rainfall data spanning from January 1, 1980, to December 31, 2022, were used, with 80% allocated for training and 20% for validation. As the data are time series, each model was constructed using a look-back period of five months, meaning the past five months' rainfall data served as input features. The performance of these algorithms was evaluated using Mean Square Error (MSE), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE). The results indicated that the RF algorithm yielded the lowest MSE, RMSE, and MAE across all selected cities in Southwest Nigeria. This study demonstrated the superiority of RF regression over LSTM in predicting rainfall in these cities, providing a valuable tool for agricultural planning and climate adaptation strategies.
1. Introduction
Meteorologically, rainfall has been shown to contribute significantly to the lives of humans and animals on the earth's surface. As a result, this remains one of the key parameters in atmospheric science, which deals with the pattern and movement of the planet and agricultural development around the world. According to the study by Le et al. [1], heavy rainfall contributes significantly to flooding in any area, causing infrastructure damage, road network collapse, and disruption of some socioeconomic activities. Furthermore, it is well known that floods and other extreme events are major consequences of climate change, and they are expected to occur more frequently in high-rainfall areas, potentially causing catastrophic consequences in any developed area [2].
According to the study by Czarnecka and Nidzgorska-Lencewicz [3], weather conditions have the potential to increase air pollution, which has been identified as a major concern. However, research has shown that rainfall mitigation improves the approach of invitation and possible forecasting by increasing human mobility [4], [5], [6]. It also helps to produce agricultural and industrial products for any community's use and development [7], [8], [9], [10], [11], [12]. Research on temperature and rainfall variability is critical for Nigeria due to its unique climatological and geographical features, such as its diverse climate zones ranging from arid in the north to humid in the south [4]. This variability has a significant impact on agriculture, a key economic sector, as well as food security [5]. Nigeria's rapidly growing population and reliance on rain-fed agriculture make it especially vulnerable to climate change. Understanding these variations aids in the development of adaptive strategies to reduce negative impacts on crop yields, water resources, and overall livelihood [6]. One of the most important forecasting approaches is the timely prediction of rainfall, which can be expressed using statistical techniques that result in a correlation between geographical coordinates and rainfall, allowing the use of the area's longitude and latitude. Other parameters, such as humidity, temperature, pressure, wind speed, and direction, can be used to forecast rainfall based on past events [13]. As a result, empirical models, singular spectrum models, and non-linear models, among others, must be used in prediction and forecasting [14], [15].
The comparison between Nigerian sub-regions and regions such as India reveals similarities and differences in climatic and agricultural conditions, thereby improving understanding of the Nigerian context [2]. Nigeria's diverse climate zones, such as the arid north and humid south, mirror India's diverse climate, making such comparisons appropriate [3]. This relevance stems from identifying shared challenges and successful strategies used in India that can be adapted for Nigeria. For example, irrigation techniques that work in India's dry regions may apply to Nigeria's arid zones [4]. These comparisons offer a broader perspective, assisting in contextualizing Nigerian agricultural practices and informing targeted interventions based on proven methods in similar environments [6].
According to the study by Singh and Borah [16], statistical and mathematical models require composite calculating power. Therefore, they appear to be good models to be used with the least effect. As a result, rainfall can be forecast properly using Artificial Neural Network (ANN) models, which have been widely adopted by researchers worldwide, as reported by Singh and Borah [16] and Liu et al. [17]. The Internet of Things (IoT), known for its research growth and development, can be widely used by the ANN, which has contributed to the arrival of wireless technologies globally, thus increasing the capture of various satellites for the development of technology [18]. Furthermore, as reported by Liu et al. [17], some factors contribute to the growth of technology using the ANN approach for rainfall forecasting with the non-linearity of rainfall data, thereby bringing the knowledge between some relationships required for other variables. It has been observed that temporal variation takes different forms in the regional pattern of rainfall, indicating that there are numerous challenges in forecasting rainfall using ANN and ML models [19]. According to the studies by Balluff et al. [20] and Ramos et al. [21], Recurrent Neural Networks (RNNs) are the most appropriate class of ANN for rainfall forecasting and prediction. Elman [22] proposed that the recurrent connections in this architecture allow the network to retain information from previous time steps. However, the standard RNN architecture has a limited ability to predict long-term data parameters [23].
Research indicates a barrier to ML algorithms. To overcome this barrier, LSTM networks have been developed to regulate information passage and facilitate cell information processing [24]. As a result, several researchers proposed that the LSTM take the lead in various stages of rainfall prediction [25], [26], [27]. Accordingly, as reported by Barrera-Animas et al. [18], the forecasting of rainfall volume must be addressed after the use of mathematical models based on various alternative ANN methods for rainfall forecast development.
2. Research Methodology
The HelioClim-1 (www.soda-pro.com) archives were used in this study to collect monthly rainfall data for selected Nigerian stations using the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) technique, as reported by several researchers [28], [29], [30]. The data, spanning from January 1, 1980, to December 31, 2022, was retrieved on January 1, 2024. The data was collected in comma-separated value (CSV) format as a monthly average from January to December of each year [30], [31], [32]. The collection process was consistent with what Aweda et al. [29] reported, with the data being downloaded in 10-minute intervals for the first download and then converted to an Excel data format for analysis.
The stations used were selected from Nigeria's southwest geographical region. Because these stations serve only the southwestern states of Nigeria, the study's geographical scope is limited. This restriction means that diverse climatic zones within Nigeria are not represented, which may affect the generalizability of the results to other regions with different climatic conditions. The stations are located in the country's rainforest zone; their coordinates and map are presented in Table 1 and Figure 1, respectively.
Table 1. Coordinates, altitude, and vegetation of the selected stations
Station | Latitude (°N) | Longitude (°E) | Altitude (m) | Vegetation |
Abeokuta | 7.1475 | 3.3619 | 66 | Rainforest vegetation |
Ado-Ekiti | 7.6124 | 5.2371 | 455 | Tropical forest |
Akure | 7.2571 | 5.2058 | 350 | Forest vegetation |
Ibadan | 7.3775 | 3.9470 | 230 | Savannah woodland |
Ikeja | 6.6018 | 3.3515 | 39 | Freshwater vegetation |
Osogbo | 7.7827 | 4.5418 | 320 | Tropical rainforest |
The data was preprocessed before being subjected to RF regression and LSTM algorithms. It is necessary to prepare data in a way that is suitable for RF regression and LSTM algorithms. As part of the data preprocessing, the data was cleaned, and an effort was made to identify if there were any missing values in the data as well as outliers. After it was ascertained that there was no missing value or outlier in the data set, the data was scaled using the MinMaxScaler in Python. The selected features for constructing the models were based on the rainfall data of the previous five months, establishing a look-back period of five months. The data was divided into training data (80%) and test data (20%). Below is a brief description of the RF regression and LSTM algorithms.
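The preprocessing steps described above (scaling, a five-month look-back, and an 80/20 split) can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function names are hypothetical, the rainfall values are made up, and the split is assumed to be chronological because the paper does not say how the 80% was chosen. The hand-written scaler mirrors what a min-max scaler does (mapping values to the range 0 to 1).

```python
from typing import List, Tuple


def min_max_scale(values: List[float]) -> Tuple[List[float], float, float]:
    """Rescale values to [0, 1]; also return the minimum and maximum used."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / span for v in values], lo, hi


def make_windows(series: List[float], look_back: int = 5):
    """Pair each run of `look_back` consecutive values with the value that follows."""
    features, targets = [], []
    for start in range(len(series) - look_back):
        features.append(series[start:start + look_back])
        targets.append(series[start + look_back])
    return features, targets


def chronological_split(features, targets, train_fraction: float = 0.8):
    """Assumed split: the earliest samples train, the latest samples test."""
    cut = int(len(features) * train_fraction)
    return (features[:cut], targets[:cut]), (features[cut:], targets[cut:])


# Hypothetical monthly totals, for illustration only (not the study's data).
rainfall = [12.0, 30.5, 88.0, 140.2, 210.7, 260.1,
            190.4, 150.0, 275.3, 180.6, 45.9, 8.3]

scaled, lo, hi = min_max_scale(rainfall)
X, y = make_windows(scaled, look_back=5)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```

Under this construction, a series of 516 monthly values (January 1980 to December 2022) with a five-month look-back would give 511 input/target pairs.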
Within the category of supervised learning techniques, RF uses a combination of several trees, which enables RF ensemble learning to solve complicated problems and enhance model performance. To increase its forecast accuracy, this method uses an average of several decision trees based on various samples of the data set. In RFs, the trees grow in parallel. Therefore, there is no interaction between them as they are being built. Figure 2 shows the diagram of the RF regression. This tree representation of the model is used for the analysis.
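The averaging idea behind RF can be illustrated with a deliberately simplified sketch. It is an assumption-laden stand-in rather than the model used in the study: each "tree" here is only a depth-one regression tree (a stump) fitted on a bootstrap sample and one randomly chosen feature, whereas a real RF grows deeper trees and samples features at every split. The paper does not report its implementation or hyperparameters, and all names and data below are hypothetical.

```python
import random
from statistics import mean
from typing import List, Tuple

# A stump is (feature index, threshold, mean target left, mean target right).
Stump = Tuple[int, float, float, float]


def fit_stump(X: List[List[float]], y: List[float], feature: int) -> Stump:
    """Fit a depth-one regression tree on a single feature by least squares."""
    values = sorted({row[feature] for row in X})
    best_error, best_stump = None, None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= threshold]
        right = [t for row, t in zip(X, y) if row[feature] > threshold]
        left_mean, right_mean = mean(left), mean(right)
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best_error is None or error < best_error:
            best_error = error
            best_stump = (feature, threshold, left_mean, right_mean)
    if best_stump is None:  # the feature is constant in this bootstrap sample
        overall = mean(y)
        return (feature, float("inf"), overall, overall)
    return best_stump


def fit_forest(X, y, n_trees: int = 25, seed: int = 0) -> List[Stump]:
    """Build each stump independently on its own bootstrap sample."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]  # sample with replacement
        feature = rng.randrange(n_features)                    # random feature choice
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feature))
    return forest


def predict(forest: List[Stump], row: List[float]) -> float:
    """Average the predictions of all stumps."""
    return mean(left if row[f] <= thr else right for f, thr, left, right in forest)


# Toy data (hypothetical): the target steps from 0 to 10 as both features grow.
X = [[float(i), 2.0 * i] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
forest = fit_forest(X, y, n_trees=25, seed=1)
low, high = predict(forest, X[0]), predict(forest, X[9])
```

Because every stump is fitted without reference to the others, the loop in `fit_forest` could run in parallel, which is the property noted in the paragraph above.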
The LSTM is a type of RNN capable of learning and remembering information over time. This makes it a powerful algorithm for time series forecasting tasks such as the rainfall forecasting considered in this study. The LSTM is an extension of the RNN, specially designed to deal with situations where the RNN fails. It was designed to overcome the vanishing gradient problem that occurs in traditional RNNs. By incorporating memory cells as well as input, output, and forget gates into its structure, the LSTM selectively retains and forgets information, thereby learning long-term dependencies in sequential data. The memory cells can store information for a long time, while the gates control the flow of information into and out of the cells.
The LSTM was designed to handle the challenges of capturing and processing long-term dependencies within sequential input. The layer contains memory cells that can retain information over extended periods, enabling the network to learn patterns and relationships in sequences such as time series or natural language data. The vanishing gradient problem occurs when gradients diminish during backpropagation, limiting the network’s ability to learn long-term dependencies. The LSTM addresses this by utilizing a gating mechanism that regulates the flow of information into and out of memory cells. This allows it to selectively retain or discard information, facilitating the modelling of complex sequential patterns. Figure 3 shows the structure of the LSTM neural network for data analysis.
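The gating mechanism can be made concrete with a single forward step of an LSTM cell. The sketch below uses the standard LSTM update equations but reduces everything to scalars for readability; a real layer uses weight matrices and vectors whose values are learned during training. The parameter names and the numbers in the example are hypothetical and are not taken from the study.

```python
import math
from typing import Dict, List, Tuple

PARAM_NAMES = ("wf", "uf", "bf", "wi", "ui", "bi",
               "wo", "uo", "bo", "wg", "ug", "bg")


def make_params(**overrides: float) -> Dict[str, float]:
    """All weights default to zero; pass keyword overrides to change them."""
    params = {name: 0.0 for name in PARAM_NAMES}
    params.update(overrides)
    return params


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: float, h_prev: float, c_prev: float,
              p: Dict[str, float]) -> Tuple[float, float]:
    """One time step of a scalar LSTM cell; returns (hidden state, cell state)."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose a gated view of the memory
    return h, c


def run_sequence(xs: List[float], p: Dict[str, float]) -> Tuple[float, float]:
    """Feed a look-back window through the cell, starting from zero states."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, p)
    return h, c


# Hypothetical scaled five-month window and hand-picked weights.
window = [0.1, 0.4, 0.8, 0.6, 0.2]
params = make_params(wf=0.5, bf=1.0, wi=0.8, wo=0.6, wg=1.2, ug=0.3)
final_h, final_c = run_sequence(window, params)
```

With a large positive forget-gate bias and a large negative input-gate bias, the cell state passes through a step almost unchanged, which is how the cell can carry information across many time steps.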
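The performance of these algorithms was compared using MSE, RMSE, and MAE as follows:
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(R_i - \hat{R}_i\right)^2$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(R_i - \hat{R}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|R_i - \hat{R}_i\right|$$
where $R_i$ is the actual rainfall, $\hat{R}_i$ is the forecast value of rainfall, and $n$ is the number of observations.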
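For readers who want to reproduce the three error measures, a minimal standard-library sketch follows. It uses the usual definitions of MSE, RMSE, and MAE; the function names and the sample numbers are illustrative assumptions, not values from the study.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], forecast: Sequence[float]) -> None:
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean of the squared forecast errors."""
    _check(actual, forecast)
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Square root of the MSE, in the same units as the rainfall itself."""
    return math.sqrt(mse(actual, forecast))


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean of the absolute forecast errors."""
    _check(actual, forecast)
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical actual and forecast values, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
forecast = [2.0, 5.0, 9.0, 9.0]
scores = (mse(actual, forecast), rmse(actual, forecast), mae(actual, forecast))
```

Since RMSE is the square root of MSE, the two columns of any results table can be cross-checked against each other; for example, the square root of the reported LSTM test MSE for Osogbo (40441.2904) is approximately 201.1002, the reported RMSE.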
3. Results and Discussion
Table 2 shows the descriptive statistics for rainfall in the selected cities in Southwest Nigeria. The result indicates that Osogbo recorded the lowest minimum rainfall (0.000002), while the highest rainfall within the period of study was reported in Abeokuta (865.7734). The average rainfall in Akure (197.1602) was higher than that obtained in the other selected cities. The variability in the rainfall was assessed using the Coefficient of Variation (COV). The result shows that Ado-Ekiti reported the highest COV (88.65%), followed by Osogbo (84.36%), while Ikeja reported the lowest COV (62.40%), implying that the rainfall pattern is more variable in Ado-Ekiti than in the other selected locations and more consistent in Ikeja.
Table 2. Descriptive statistics of rainfall in the selected cities
Location | Minimum | Maximum | Mean | Standard Deviation | Skewness | COV (%) |
Osogbo | 0.000002 | 692.2735 | 175.8342 | 148.3250 | 0.7184 | 84.36 |
Ikeja | 0.2584 | 626.0036 | 164.1716 | 102.4408 | 0.5950 | 62.40 |
Ibadan | 0.0107 | 638.3207 | 178.0813 | 140.8551 | 0.6894 | 79.10 |
Akure | 0.0133 | 602.6252 | 197.1602 | 142.5600 | 0.4091 | 72.31 |
Ado-Ekiti | 0.0016 | 619.1906 | 162.6627 | 144.1942 | 0.7428 | 88.65 |
Abeokuta | 0.1262 | 865.7734 | 161.9588 | 123.9102 | 0.9190 | 76.51 |
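The COV column can be reproduced from the mean and standard deviation columns, since the coefficient of variation is the standard deviation expressed as a percentage of the mean. The sketch below is illustrative only: the function names are hypothetical, and the use of the sample standard deviation when starting from raw data is an assumption, because the paper does not say which form was used.

```python
from statistics import mean, stdev
from typing import Sequence


def cov_from_summary(mean_value: float, std_value: float) -> float:
    """Coefficient of variation in percent from a mean and a standard deviation."""
    return 100.0 * std_value / mean_value


def cov_from_data(values: Sequence[float]) -> float:
    """Same quantity from raw data (sample standard deviation assumed)."""
    return cov_from_summary(mean(values), stdev(values))


# Mean and standard deviation as reported for Ikeja and Osogbo.
ikeja_cov = cov_from_summary(164.1716, 102.4408)
osogbo_cov = cov_from_summary(175.8342, 148.3250)

# Made-up raw values, for illustration only.
toy_cov = cov_from_data([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

Applied to the reported means and standard deviations, this calculation agrees with the tabulated COV values to two decimal places for the two cities shown.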
Figure 4 illustrates the seasonal variation of rainfall in selected Nigerian stations. Subgraph (a) of Figure 4 depicts the rainfall variation in Osogbo. It can be observed that the minimum rainfall at the station occurred around 2016, at approximately 330 mm, while the maximum occurred in 2008, at approximately 690 mm. The results show that the variation in rainfall in Osogbo is caused by climatic variation, which could be the station's weather conditions. Osogbo, located in western Nigeria, is expected to receive more rainfall due to its proximity to the ocean and rivers. Climate change influenced the rainfall pattern in Osogbo. Subgraph (b) of Figure 4 depicts the rainfall pattern at Ibadan, a sub-Saharan station in Nigeria. The station experienced its minimum rainfall in 2015, at about 280 mm, and its maximum rainfall in 1993, at about 680 mm. This demonstrates that in the 1990s, Ibadan station received more rainfall than Osogbo. The results also show that the rainfall pattern in Ibadan had almost the same trend in the 1990s, whereas the rainfall pattern oscillated differently in the 2020s, resulting in a variety of pattern trends. Ibadan is located in western Nigeria, and its higher rainfall is attributed to its geographical location in the rainforest. This demonstrates that farmers in the region can benefit more from planting in the sub-region. Subgraph (c) of Figure 4 shows the results observed in Ikeja, with a minimum rainfall of about 230 mm in 1992 and a maximum rainfall of about 650 mm in 2008. The rainfall pattern in Ikeja shows that it varies constantly, except for some years when the station received a lot of rain due to its proximity to the ocean. Rainfall in Nigeria is critical for the development of agricultural products [6], [35].
However, agricultural products in Ikeja are becoming increasingly available as a result of the station's development, which includes more factories, companies, and other activities. Furthermore, compared to Osogbo and Ibadan, Ikeja's seasonal rainfall pattern is preferable because it is more consistent. Subgraph (d) of Figure 4 shows the variation in rainfall experienced in Akure. The results show that Akure experienced its lowest rainfall in 2003, with a value of 350 mm, and its maximum rainfall in both 2001 and 2022, with a value of approximately 590 mm. Akure experienced almost the same variation in rainfall pattern. However, it appears that the station's proximity to the ocean and rivers contributes to the same variation in rainfall experience at the station. Akure, another major city in southern Nigeria known for agricultural activities, requires more rainfall for its plantations and other activities that must be kept to a minimum. Subgraph (e) of Figure 4 depicts the variation in rainfall observed in Ado-Ekiti. The results show that the station experienced the lowest rainfall in 1987, with a value of about 150 mm, and the highest rainfall in 2000, with a value of around 620 mm. The results show that the rainfall variation experienced in Ado-Ekiti is low when compared to other stations, probably due to the station's higher hills and mountains, which may reduce the intensity of rainfall. Subgraph (f) of Figure 4 depicts the rainfall pattern experienced in Abeokuta, a station in western Nigeria. The station's minimum rainfall was approximately 270 mm in 1997, while its maximum rainfall was observed in 2009, with an approximate value of 850 mm. The difference between the minimum and maximum values of rainfall experienced at the station is large, approximately 580 mm, with the highest rainfall among all stations considered. As a result, according to the study by Aweda et al. [36], the country's rainfall pattern contributes to its growth and development if the government has some good storage facilities for agricultural activities.
Figure 5 depicts the monthly variation of rainfall in the selected stations. It can be observed that the lowest or no rainfall occurred in January, while the highest rainfall occurred in September across all stations. Ado-Ekiti recorded the lowest rainfall (20 mm), followed by Osogbo (30 mm), Ibadan (35 mm), Akure (40 mm), and Ikeja (49 mm). This indicates that Ado-Ekiti had the least amount of rainfall. The maximum rainfall values observed at the stations rank as follows: Osogbo > Ibadan > Ado-Ekiti > Abeokuta > Ikeja. This demonstrates that Osogbo received significantly more rainfall than the other stations, which is consistent with its classification as a rainforest region with more rivers and streams near the station. The findings also revealed a sharp increase in rainfall for all of the stations studied, with the first peak occurring around July, followed by a drop around August, which is considered the country's August break. Rainfall then rose to a second peak in September before dropping to little or no rainfall in December.
The performance of the LSTM was evaluated on the training and test data to check for evidence of overfitting. Table 3 presents the result, indicating that there is no evidence of overfitting, as the performance of the LSTM on training and test data measured by the metrics was almost the same in all locations. Overfitting was checked in the same way for the RF regression. Table 4 presents the result, which reveals that there is no overfitting, as the performance of the RF regression on training and test data did not differ greatly. The result of the comparative analysis of the performance of the LSTM algorithm and RF regression is presented in Table 5. As shown in Table 5, the MSE, RMSE, and MAE of Osogbo using the LSTM are 40441.2904, 201.1002, and 163.1744, respectively, while those using the RF regression are 4403.4632, 66.3586, and 46.0613, respectively. The MSE, RMSE, and MAE obtained using RF regression in Osogbo are less than those obtained for LSTM in the same location. Similar results were obtained in Ikeja, Ibadan, Akure, Ado-Ekiti, and Abeokuta, as RF regression produced lower MSE, RMSE, and MAE compared with those obtained using the LSTM.
Table 3. Performance of the LSTM on training and test data
Location | Training MSE | Training RMSE | Training MAE | Test MSE | Test RMSE | Test MAE |
Osogbo | 42406.5059 | 205.9284 | 162.4215 | 40441.2904 | 201.1002 | 163.1744 |
Ikeja | 35056.4959 | 187.2338 | 176.0601 | 34400.1966 | 185.4729 | 175.6704 |
Ibadan | 40521.8108 | 201.3003 | 167.8530 | 38637.5240 | 196.5643 | 168.3751 |
Akure | 55989.1190 | 236.6202 | 206.7963 | 54380.6076 | 233.1965 | 207.3904 |
Ado-Ekiti | 44380.2057 | 210.6661 | 170.5023 | 42557.8746 | 206.2956 | 171.3379 |
Abeokuta | 35556.6121 | 188.5646 | 164.3951 | 34365.1883 | 185.3785 | 165.1266 |
Table 4. Performance of the RF regression on training and test data
Location | Training MSE | Training RMSE | Training MAE | Test MSE | Test RMSE | Test MAE |
Osogbo | 4434.9558 | 66.5955 | 42.0992 | 4403.4632 | 66.3586 | 46.0613 |
Ikeja | 4842.5028 | 69.5881 | 41.0079 | 4753.4365 | 68.9452 | 49.0195 |
Ibadan | 5117.3659 | 71.5358 | 42.4818 | 5227.7389 | 72.3031 | 49.7411 |
Akure | 5772.1796 | 75.9749 | 50.9361 | 5763.9477 | 75.9207 | 54.4619 |
Ado-Ekiti | 6993.9922 | 83.6301 | 42.8607 | 6967.0814 | 83.4690 | 49.3846 |
Abeokuta | 6980.3658 | 83.5486 | 50.6573 | 6763.3621 | 82.2397 | 53.6803 |
Table 5. Performance of the LSTM and RF regression on test data
Location | LSTM MSE | LSTM RMSE | LSTM MAE | RF MSE | RF RMSE | RF MAE |
Osogbo | 40441.2904 | 201.1002 | 163.1744 | 4403.4632 | 66.3586 | 46.0613 |
Ikeja | 34400.1966 | 185.4729 | 175.6704 | 4753.4365 | 68.9452 | 49.0195 |
Ibadan | 38637.5240 | 196.5643 | 168.3751 | 5227.7389 | 72.3031 | 49.7411 |
Akure | 54380.6076 | 233.1965 | 207.3904 | 5763.9477 | 75.9207 | 54.4619 |
Ado-Ekiti | 42557.8746 | 206.2956 | 171.3379 | 6967.0814 | 83.4690 | 49.3846 |
Abeokuta | 34365.1883 | 185.3785 | 165.1266 | 6763.3621 | 82.2397 | 53.6803 |
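The size of the gap between the two models can be expressed as the percentage by which the RF test RMSE falls below the LSTM test RMSE. The short sketch below performs that arithmetic on the RMSE values copied from the comparison table above. The percentage reduction is a derived quantity introduced here for illustration, not a figure reported by the authors, and the function name is hypothetical.

```python
from typing import Dict, Tuple

# Test RMSE per city as (LSTM, RF), copied from the comparison table above.
test_rmse: Dict[str, Tuple[float, float]] = {
    "Osogbo": (201.1002, 66.3586),
    "Ikeja": (185.4729, 68.9452),
    "Ibadan": (196.5643, 72.3031),
    "Akure": (233.1965, 75.9207),
    "Ado-Ekiti": (206.2956, 83.4690),
    "Abeokuta": (185.3785, 82.2397),
}


def error_reduction(lstm_rmse: float, rf_rmse: float) -> float:
    """Percentage by which the RF RMSE is lower than the LSTM RMSE."""
    return 100.0 * (lstm_rmse - rf_rmse) / lstm_rmse


reductions = {city: error_reduction(lstm, rf) for city, (lstm, rf) in test_rmse.items()}
better_model = {city: ("RF" if rf < lstm else "LSTM") for city, (lstm, rf) in test_rmse.items()}

for city, value in reductions.items():
    print(f"{city}: RF RMSE is {value:.1f}% lower than LSTM RMSE")
```

On these figures the RF RMSE is lower in every city, by somewhere between roughly 55% and 68%.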
This study has established that average rainfall was higher in Akure than in the other cities considered. The higher mean rainfall in Akure compared with that of Osogbo, Ikeja, Ibadan, Ado-Ekiti, and Abeokuta could be a result of the fact that Akure is a coastal region as well as a rainforest, while other stations like Ado-Ekiti, Osogbo, and Ibadan are in the Savannah region. It was also found in this study that rainfall in Ado-Ekiti exhibits higher variation than other cities because Ado-Ekiti has a lot of valleys, which could reduce rainfall intensity in the area. In addition, it was found in this study that RF regression performed better than the LSTM in forecasting rainfall in the selected cities in Southwest Nigeria. This finding is corroborated by that of the study by Kanani et al. [37] in 49 different locations in Australia. It is not consistent with the studies by Chen et al. [38] and Khairudin et al. [39], which found that the LSTM outperformed RF regression. This disparity in findings can potentially be attributed to differences in the geographic characteristics and temporal frames of the studies involved.
4. Conclusion
The performance of LSTM and RF regression in predicting monthly rainfall in the six state capitals in Southwest Nigeria was compared in this study. The superiority of the RF over the LSTM was established, as the former model yielded the lowest MSE, RMSE, and MAE. Therefore, the RF algorithm is recommended for forecasting rainfall in these areas, with the hope that this will enhance understanding of the rainfall dynamics for better agriculture, farming, and food sufficiency in the study area. This model also helps to consider the changing pattern of global rainfall due to global warming.
5. Recommendation
It is critical to discuss environmental and socioeconomic factors because they have a significant impact on observed climatic patterns and study outcomes. For example, deforestation and urbanization can alter local climates, increasing temperature and rainfall variability. Poverty and agricultural dependence are socioeconomic factors that influence community resilience to climate impacts. Poor infrastructure and limited access to technology amplify the negative effects of climate variability on agriculture and livelihoods. Understanding these interactions aids in the development of more effective adaptation strategies and policies that address both environmental changes and socioeconomic vulnerabilities, resulting in comprehensive solutions to mitigate climate-related risks in Nigeria. Therefore, it is recommended in this study that the government of the Federal Republic of Nigeria implement all necessary mechanisms to actualize mitigation of environmental and socioeconomic factors that hinder the country's development.
The data used to support the findings of this study are available from the corresponding author upon request.
The authors declare that they have no conflicts of interest.