Skip to main content

Stepwise integration of analytical hierarchy process with machine learning algorithms for landslide, gully erosion and flash flood susceptibility mapping over the North-Moungo perimeter, Cameroon

Abstract

Background

The Cameroon Volcanic Line (CVL) is an oceanic-continental megastructure prone to geo-hazards, including landslide/mudslide, gully erosion and flash floods targeted in this paper. Recent geospatial practices advocated a multi-hazard analysis approach supported by artificial intelligence. This study proposes the Multi-Geoenvironmental Hazards Susceptibility (MGHS) tool, by combining Analytical Hierarchy Process (AHP) with Machine Learning (ML) over the North-Moungo perimeter (Littoral Region, Cameroon).

Methods

Twenty-four factors were constructed from satellite imagery, global geodatabase and fieldwork data. Multicollinearity among these factors was quantified using the tolerance coefficient (TOL) and variance inflation factor (VIF). The AHP coefficients were used to weigh the factors and produce a preliminary map per Geoenvironmental hazard through weighted linear combination (WLC). The sampling was conducted based on events records and analyst knowledge to proceed with classification using Google Earth Engine (GEE) cloud computing interface. Classification and Regression Trees (CART), Random Forest (RF) and Gradient Boosting Regression Trees (GBRT), were used as basic learners of the stacked hazard factors, whereas, Support Vector Regression (SVR), was used for a meta-learning.

Results

The rainfall was ranked as the highest triggering factor for all Geoenvironmental hazards according to AHP, with a coefficient of 1, while the after-learning importance assessment was more varied. The area under receiver operating characteristic (AUROC/AUC) was always more than 0.96, and F1-score is between [0.86–0.88] for basic classifiers. Landslides, gully erosion and flash floods showed different spatial distributions, confirming then their probability of co-occurrence. MGHS outputs clearly displayed two and three simultaneous occurrences. Finally, the human vulnerability assessed with population layer and SVR outputs showed that high human concentrations are also the most exposed, using the example of Nkongsamba’s extract.

Conclusions

Combining AHP with single learners, then a meta-learner, was efficient in modelling MGHS and related human vulnerability. Interactions among geo-environmental hazards are the next step and city councils are recommended to integrate results in the planning process.

Introduction

The Sendai Framework for Disaster Risk Reduction (SFDRR), stresses the need to understand disaster risks through identifying and assessing exposure to all hazards (UNISDRFootnote 12015b). Geo-environmental hazards specifically, result from combined anthropogenic and natural factors (IRDRFootnote 22014). They can occur suddenly (earthquakes and volcanic eruptions) or over time (flash floods, drought, desertification, landslides, erosion), leading to fatalities, damages to property/environment and disruption of services (Bang 2022). A rapid response to these hazards is necessary to assess damages, save lives and mitigate reoccurrence consequences (Bang et al. 2019). This study focuses on landslides, gully erosion and flash floods.

Landslides are a sudden upward-downward mass movement of debris, soils or rocks due to gravity and other interaction of geo-environmental and human-induced factors (Hungr et al. 2014). The majority of landslides originate from hydroclimatic events, specifically prolonged and/or huge rainfall (Park and Kim 2019) with several societal and environmental losses. An estimated 4.8 million peoples have been affected by landslides, and more than 18,000 deaths were recorded between 1998 and 2017.Footnote 3

Gully erosion on its own, is an important phase/form of soil loss and land degradation especially in semi-arid and arid regions. Several processes, i.e., anthropogenic – agricultural practices (Zhao and Hou 2019), deforestation (Gholami 2013) – natural, i.e., wind (Skidmore 1982), as well as overland water flows (Xiong et al. 2018), easily trigger or promote soil erosion. According to literature, flowing water, i.e., torrential, represents the main natural driver, initiating or emphasizing gullies dynamic, which are particularly devastating for both environment and the society (Arabameri et al. 2020; Pourghasemi et al. 2020a).

Besides, flood refers to an overflow of water onto normally dry surrounding lands (NOAA/NWS).Footnote 4 Both coast lands and inlands are affected. Depending on the triggering factor, the spatial location and the speed of the event/process, there three main types such as, coastal flood, fluvial flood and pluvial flood. Between 1995 and 2015, about 150,061 flood events happened worldwide, approximately 109 million people were impacted by the flood damages, causing 157,000 deaths (UNISDR 2015a),Footnote 5 with direct costs of USD 75 billion per year, which were liable for 11.1% of the global disaster casualties (Alfieri et al. 2017). Several investigations report that floods affected around 200 million people each year on a global scale (e.g., Tien Bui et al. 2019). In this study, pluvial flood was studied, due to its suddenness compared to the other two types, affecting more the population preparedness/resilience; while events of its (re)occurrence in the study area, were recorded during the rainy season, preceding or simultaneously with landslide and gully erosion (see Data and Methods section). Usually, the streams overflow from an existing waterway (river, stream, watershed, drainage ditch, etc.), happens between few minutes to less than 6 h, often as a result of heavy rainfall and inundate surrounding areas (Kron 2005). It is then commonly referred as flash flood, as used here.

Recently, the UNISDR has emphasized the significance of multi-hazard assessment and referred to it as an essential tool for a safer world in the twenty-first century (UNISDR 2015b, 2015c). In this regard, static and dynamic geospatial models have been developed and diversified for single hazard, with technical approaches applicable to simultaneous multiple hazards mapping.

The field inventory-based and plotting of data formerly used to produce hazard susceptibility map (HSM) (Varnes 1984; Colombo et al. 2005), was first completed by aerial photointerpretation, then satellite data, to build heuristic, statistical and data-driven models (Al-Juaidi et al. 2018; Khosravi et al., 2018; Meena et al. 2019; Rahmati et al. 2016; Mihi et al. 2020;). Then, artificial intelligence (AI) algorithms have been introduced in recent decades, to overcome performance limitations, especially artificial neural network (ANN) (Wang et al. 2019), and machine learning (ML) (Sadler et al. 2018; Pham et al. 2019; Chen et al. 2020; Towfiqul Islam et al. 2021), improving at the same time warning systems based on high computational approaches, for the monitoring and near real-time forecasting (Kirschbaum and Stanley 2018; Fayne et al. 2019; Ngandam Mfondoum et al., 2021; USGS 2022a, b, c).

Concerning multi-hazard mapping specifically, the Maximum Entropy (MaxEnt) algorithm has been combined with the technique for order of preference by similarity to the ideal solution and Mahalanobis distance (TOPSIS-MD), to harmonize the probabilistic context of multiple natural hazards (Sheikh et al. 2019). This ML has also been proven efficient in selecting predictors, assessing the degree of model fitting and predicting performance, for combined landslide, erosion/gully erosion and flood susceptibility assessment (Javidan et al. 2021). Another approach modelled different single hazards susceptibility through the simple additive weight (SAW) with the Analytical Hierarchy Process (AHP) coefficients, to prepare the path to the multi-hazard mapping of landslide, rockfall, goaf collapse and soil erosion (Sun et al. 2019). Further, the Boruta algorithm has been highly proficient in selecting and prioritizing factors per type of hazard in the same analysis; and then applying the Random Forest (RF) algorithm to train and validate flood, landslides and forest fire, to produce multi-hazard susceptibility map (Pourghasemi et al. 2020b). Moreover, a multiple and comparative learning, (i.e., boosted regression tree, random forest, support vector machine), was envisaged to produce a valid multi-hazard map of flooding, gully erosion, forest fires, and earthquakes (Pouyan et al., 2021).

However, most studied locations benefit from accurate and up-to-date databases on the occurrence and location of geo-hazards. This is not always the case in some areas. In addition, the socio-economic component is usually introduced as settlements (built-up, road), while the human dimension itself as the number of residents is not exclusively considered.

From the above statements, this study mixes field survey data and an analyst knowledge-based approach, with robust artificial intelligence processing, to predict landslide, gully erosion and flash flood distribution, from single to simultaneous occurrences. The main rationale of this work is to demonstrate how combining AHP to ML algorithms in the same modelling process, produces satisfactory performances to map the multi-geo-env hazards susceptibility (MGHS) and related human vulnerability. Specific goals are the following: i) use the AHP method to weigh the factors/predictors per model; ii) compare performances of different ML algorithms in accurately predicting occurrences on stacked-weighed factors; iii) assess the proficiency of a meta-learner in arbitrating the best distribution of single and combined geo-env hazards from stacked basic learners’ output; iv) predict the human direct threats to each and all occurrences from the population distribution perspective.

The major novelty of this work, resides in performing ML over AHP’s weighed factors, for multi-geo-env hazards modelling. The two have always been used separately according to the available literature that has been synthesized above, but not in the same modelling process, to the best of our knowledge. Furthermore, the population distribution layer is newly introduced in hazard multi-analysis.

Dataset and methods

Study area

The study was conducted in the Northern subset of the Moungo division (Littoral Region), embedded between three volcanic mountains, i.e., Manengouba (2396 m), Koupé (2070 m), Nlonako (1822 m), and belongs to the Cameroon Volcanic Line (CVL) (Elsheikh et al. 2014). The soils are essentially less evolved volcanic ones known as andosols (Gèze, 1942). It is slightly extended over the surrounding natural perimeter encompassing administrative divisions of, Haut-Nkam, Menoua (West Region), and Koupé-Manengouba (South-West Region), on the edges of the Bamileke Plateaus, which is constantly subject to different geo-env hazards (Nguimbous and Manguelle, 2010; Wotchoko et al. 2016; Ndonbou et al. 2022). The whole area is inside coordinates, North 4°54’-5°21’ and East 9°46′30″-10°6’. The main cities covered are, Nkongsamba (Moungo’s head of division), Melong, Kekem, then partially Santchou and Bangem (Fig. 1).

Fig. 1
figure 1

Study location inside administrative (a) and altimetry (b) contexts

The Equatoguinean climate dominates the area, with eight months of rainfall (mid-March/mid-November) averaging 1400 mm/year, while peaks exceed 2000 mm in September (Kadjio Feudjio et al. 2021). Rainfall distribution impacts annual temperatures, which vary between 16 and 23°C, as well as relative humidity which varies between 55 and 99%. These climate parameters are deeply influenced by altimetry (Kadjio Feudjio et al. 2021) and in return, they influence the natural land cover distribution. The vegetation comprises degraded forest mixed with savannas (tree and grassy) on mountains top and hillslopes, forest galleries in valleys and spots of agroforestry (Tsewoue et al. 2020). From the compiled district council and national census statistics, about 288,604 inhabitants live in the studied subset, with an average population density of 142.9 per km2.

Work environment and tools

The processing was conducted alternatively between the cloud environment and desktop tools. Google Earth Engine (GEE) platform was used for cloud computing (Gorelick et al. 2017). Python and JavaScript codes have been previously implemented for semi-automatic or fully automatic socio-environmental monitoring and hazard susceptibility mapping (e.g., UN-SPIDER 2020; Elnashar et al. 2021; Handwerger et al. 2022). A JavaScript coding process was been used to select, filter, load, and preview the study area and subsequent data, as well as further machine learning.

The offline tools are licensed software such as Erdas Imagine 2020 version 16.6.0.1366, ArcGIS version 10.8.2, SAGA-GIS version 8.0.1, Super Decision Software (SDS) version 3.2 and, XLStats version 2020.1.64570. They were used for the preparation of factors, statistics and layouts.

Data preprocessing and factors preparation

Susceptibility modelling refers to the spatial prediction of a phenomenon by perception and formulating the link between the occurrence evidence and the influential thematic factors/predictors (Sheikh et al. 2019). To define these predictors and before any operation, each data was clipped at the extent of the subset.

Shuttle radar topography mission-digital elevation model imagery and direct layers

Direct layers refer to those prepared in a straight computation by the software. The Shuttle Radar Topography Mission-Digital Elevation Model, SRTM-DEM, was the only raw data provider of this group of layers. This data is freely available worldwide on the website of the United States Geological Survey Earth Resources Observation and Science (USGS/EROS), and offers a void-filled data at a resolution of 1 arc-second, i.e., approximately 30 m.Footnote 6 It was mostly preprocessed with SAGA-GIS, to produce thirteen (13) raster layers, which are, elevation, slope degrees, slope aspect, plan curvature, profile curvature, relative slope position, terrain ruggedness index (TRI), slope length and steepness (LS), valley depth, drainage density, channel network distance, stream power index (SPI) and topographic wetness index (TWI).

Other satellite data and indirect layers

Indirect layers are those prepared through several computations. The Sentinel-2 multispectral instrument (S2-MSI) surface reflectance image, was first used to perform the land use land cover (LULC) classification. In the collection, five cloudiness granules were available for the 22nd of December 2021 (Appendixes 1 & 2). They were loaded on the GEE interface, and then compiled, i.e., merged and stacked, by using the median reducer function (GEE 2022). Then, the image was converted from 32-bit to 8-bit unsigned integer, before proceeding to the learning process. Four classes of land features were defined, i.e., water, vegetation, soil and built-up. The Random Forest (RF) algorithm was applied (Number of Trees:100, Variables per split:5, Minimum Leaf Population:1, Seed:0.5) (Friedman 2001). Then, the Dynamic World dataset, which is a 10 m near-real-time (NRT) Land Use/Land Cover (LULC) product based on predictions of the Sentinel-2 L1C (Brown et al. 2022),was used for cross validation and accuracy assessment. The overall accuracy was 87% and the kappa coefficient was 0.82. (Appendix 3).

Further, the Shadow-Eliminated Vegetation Index (SEVI) was computed, to assess the natural coverage/bareness, complementarily to the previous vegetation class from the RF. This index eliminates the effect of terrain shadow, including the self and cast shadows (Jiang et al. 2019). The algorithm is built around any conventional vegetation index (CVI), such as the reduced simple ratio index (RSR) used here, then the shadow vegetation index (SVI) and finally, the micro-topography correction factor (f(Δ)). The latter adjusts the CVI and the corresponding SVI ratio, to avoid under-elimination or over-elimination. This study used, f(Δ) = 0.581, as in the original formulation (Table 1).

Table 1 Conditioning factors in detail

Regarding the natural land cover, the soil typology layer was produced using the basic map (1:1 000 000) available (ORSTOM 1965). There are six types of soils depending on the altitude and the bedrock, i.e., black humus soils on basalts, red to yellowish ferrallisol on basic rock, red to yellowish ferrallisol on sediments, oligotrophic organic hydromorphic soil, organic gleysol and brown soil on basic rock (ORSTOM 1970).Footnote 7 Next, the soil depth layer was prepared from the USDA soil texture categorization, known as, the USDA system (Hengl 2018). Version v02 was used, as it globally describes several textures and measures of six different depths (0, 10, 30, 60, 100 and 200 cm), at 250 m of spatial resolution. However, because the study area mixes different materials according to the classes present in features of the source code, a combination was necessary to deduct new intervals of depths, with a probability, of class > 200 cm (Table 1).

Another factor is rainfall, which was composed of two sources. The first one is the TerraClimate data, which is a dataset of monthly climate and climatic water balance for global terrestrial surfaces (Abatzoglou et al. 2018). Its spatial resolution is about 4 km, all available scenes for the year 2021 were exported to the ArcGIS environment, and average values were computed. Complementarily, records from the rain gauges available at local offices of agriculture and livestock were collected, averaged and displayed on the TerraClimate data, to create shapefile in point features. From there, the mean annual rainfall (MAR) layer was created by using the Inverse Distance Weighted (IDW) interpolation for landslide and flash flooding models. Whilst the modified Fourier Index (MFI) was used to create the corresponding layer for the gully erosion model since this computing process better approximates the kinetic energy of rain over erosion-prone soils, i.e., R-factor (Arnoldus 1980).

Additionally, some complementary pre-processing of the SRTM-DEM was performed in ArcGIS, using the spatial analysis set of tools. The fault density layer was prepared, following five steps: (i) producing the hillshade using surface tool; (ii) displaying the shapefile of existing faultsFootnote 8; (iii) conditioning the hillshade in fourth different azimuth/altitude angles of shaded combinations (i.e., 315°/45°; 200°/50°; 100°/60°; 50°/90°); iv) digitizing each shading layer to complete the existing shapefile; v) interpolating shapefile using the line density tool. The same shapefile was used as an entry to produce the distance-to-fault layer, with the Euclidean distance tool, fixing 1500 m as the influence zone, which is assumed to be the closest easily affected perimeter according to field knowledge.

Another shapefile created from the SRTM-DEM was the stream network, using the hydrology tool. After producing the flow direction and flow accumulation sub-components, the output raster is converted into a polyline feature. Because several rivers and small streams abound, the values selected were stream ≥ 200. This feature was used to produce the distance-to-stream layer, following a Euclidean distance of 1500 m.

Furthermore, the road density layer was prepared based on the existing shapefile,Footnote 9 also using the line density tool. Because the road degradation is very pronounced and consequently contributes to trigger hazards with multiple potholes on an extended perimeter of 1000 m on both sides of the pavement, the distance-to-road was equally prepared using the Euclidean distance tool.

In addition, a lithology layer was prepared by using the geological base map (1:500 000), scanned, reprojected, orthorectified and digitized according to the rocks existing in the area. Those are, embrechite, syenite, rhyolite-trachyte and basalt (ORSTOMFootnote 101965 and 1970).

Therefore, a total of twenty-four (24) factors were prepared from the collected data (Appendix 4). Table 1 summarizes their expressions based on ArcGIS, SAGA-GIS, and additional literature.

Inventory of hazards

The most reliable records are found in annual reports of sub-division offices and councils, but without a proper Global Positioning System (GPS) location. Some other previous local events are mapped and documented at the Institute of Mining and Geological Research (IMGR-Yaoundé), but not regularly (e.g., Season, Year, Pace). The two sources were combined to schedule fieldwork and geolocate areas/spots of occurrence. The most reliable records are between 1997 and 2022. Then, landslide/mudslide affected twenty-six (26) points causing, 4 deaths, destroying 7 houses and making 14 homeless in Nkongsamba in 1997,Footnote 111death, 12 houses, 103 homeless and 1 road in the district of Kekem in 2007 (at Moumé-Bafang), then traffic disruption toward Santchou (Ayonghe et al. 2002; Tchindjang 2013; Ngandam et al., 2021). Gully erosion is permanent on hillsides and beside paved roads along steep slopes, with aggravation and sudden collapses during every rainy season, on thirteen (13) known points. However, none of the gully’s events has been declared deadly since then, although about 200 houses have been voluntarily abandoned or evicted by authorities; while more than eleven kilometers (11 km) of road became increasingly risky,Footnote 12 especially in Bangem and Melong. With regards to flash floods, it happens only during the rainy season in marshy pools and lowlands, with fifteen (15) records including about 198 abandoned or evicted houses recorded mainly in Santchou, Kekem and Nkongsamba.

The data once gathered were displayed on raw elevation and slope maps, and reclassified into five classes each, to allow comparison with the study subset. Then based on contour lines, altitudes and slope angle, the same number of points as records were created to build the most accurate and reliable geodatabase for each hazard (Appendix 5). This file has been useful to perform the learning process.

Multicollinearity and co-(in)dependency test

The multicollinearity among different conditioning factors is an indispensable step to assess their co-(in)dependency. As a basic assumption, a strong linear correlation among the factors indicates redundancy, which reduces the model’s accuracy and performance, leading to errored results (Dormann et al. 2013). Among other tools, Relief-F, Chi-square statistics, Information, Linear Support Vector Machine (LSVM), Variance Inflation Factor (VIF) and Tolerance coefficient (TOL), are dominant in the literature (Mind’je et al., 2020). The two latter, i.e., VIF and TOL were used here, as they have been previously successful in multi-hazard factors’ selection (e.g., Javidan et al. 2021). They are expressed as a pair of reciprocals, and calculated as follows (O’Brien 2007) (Eq. 1 & 2):

$$VIF=\frac{1}{{1-R}^{2}}$$
(1)
$$Tol=\frac{1}{VIF}$$
(2)

where, \({R}^{2}\) is the coefficient of determination of the regression between the factor of interest and the other landslide conditioning factors. As a rule of thumb to interpret these coefficients, when VIF > 5 or 10 and TOL < 0.1, there is a multicollinearity obstacle (O’Brien 2007; Dormann et al. 2013). As such, their values depend on the analyst and the goal of the research. In this study, since there is a high number of layers, VIF ≤ 3 and TOL ≥ 0.3, were proposed as thresholds to maximize the probability of excluding any similarity in-between factors, per geo-env hazard. All factors were standardized, [0–1], before proceeding with the test, while the priority in including/excluding a factor was given to the VIF values.

Core processing

Analytic Hierarchy Process

The Analytical Hierarchy Process (AHP) is a widespread multicriteria decision-making (MCDM) tool for various types of assessments, especially for geospatial applications. This approach was introduced in the 1980s by Thomas L. Saaty (1980 and 1988). It has been broadly used ever since, with successful applications in varied geospatial modelling around the world, for instance, site suitability selection (e.g., Ngandam Mfondoum et al., 2014 and 2019) or hazards prediction (e.g., El Jazouli et al. 2019; Nsangou et al. 2022).

AHP proceeds by a semiquantitative and flexible process, which involves a matrix-based pairwise comparison of different criteria or sub-criteria depending on the ramification of the tree, to solve a specific complex problem. The problem is decomposed into a hierarchical structure of several levels where, the target, main criteria, sub-criteria and alternatives are considered. A comparison is made between the criteria and the applicable alternatives, using the assessment scale from 1 to 9 based on the relative importance (Saaty 1988). Then, comparison matrices are formed and validated by using a coefficient, the consistency ratio (CR), that articulates the compatibility of all the parameters involved. The CR value should be, < 0.1, before the model is considered valid. This coefficient is defined as the consistency index (CI) divided by the random consistency index (RI), and they are expressed as follows (Saaty 1980) (Eq. 3 and Table 2):

$$CI=\frac{{\lambda }_{max}-n}{n-1}$$
(3)

where \({\lambda }_{max}\) is the largest eigenvalue of the considered matrix, and \(n\) is number of criteria.

Table 2 Random consistency index values

Therefore, using the SDS v3.2, each hazard was considered as a problem to solve. Consequently, three different AHP were performed, to define a weight of evidence corresponding to each factor in triggering the targeted geo-env hazard. From the importance explained by a pairwise matrix and the structure of the hierarchy, absolute values of weights were obtained for further analysis.

Machine learning algorithmic

Three ML algorithms were used to classify the weighed and stacked factors, i.e., Classification and Regression Trees (CART; Breiman et al. 1984), Random Forest (RF; Breiman 2001) and Gradient Boosting Regression Trees (GBRT; Friedman 2002). Then, the Support Vector Regression (SVR) ML, was used for a meta-learning of the three previous.

The SVR especially is a variant of the support vector machine (SVM), and its principle is founded on the pattern’s recognition. As a reminder, SVM constructs hyperplanes in a multidimensional space separating linear and nonlinear samples of different class labels (Cortes and Vapnik 1995). Special properties are built for the decision surface, to ensure high generalization ability of the learning. The output should present a separating hyperplane or decision boundary, that maximizes the distance between these hyperplanes and the classes sample, as summarized below (Eq. 4):

$$u=w\bullet x-b$$
(4)

where \(w\) is the normal vector to the hyperplane and \(x\) is the input vector. The separating hyperplane is the plane \(u=0\). The nearest points lie on the plane \(u=\pm 1\). The margin is thus (Eq. 5):

$$m=\frac{1}{{\Vert w\Vert }_{2}}$$
(5)

The optimization parameters inside the margin \(m\) are cost, penalty and gamma (C, \(\xi\), \(\gamma\)).

From the SVR perspective,Footnote 13 there are three new considerations: an additional tunable parameter epsilon (\(\varepsilon\)) which determines the width of the margin \(m\); support vectors which are the points that fall outside rather than just the ones at the margin \(m\); penalty parameter \(\xi\) which measures now the distance to points outside the margin \(m\), with the analyst controlling the regularization of parameter C. This may allow for nonlinear relationships by fitting smooth transformation functions and adjusting deviations between classes of hazard prediction by stacked factors. From these principles and functionalities, SVR appeared as a good alternative of metric-based meta-learning to combine and adjust CART, RF and GBR outputs.

First combination, stacking and sampling

The weighted linear combination (WLC), also known as Simple Additive Weight (SAW) (Hwang and Yoon 1981), was performed among all standardized factors weighted by their AHP coefficient, to obtain the first preview of each hazard. The output was reclassified into five classes using the Jenks natural breaks method. Then the sampling was conducted for automatic learning using the events’ inventory and distributed following a 50/50 ratio as shown in Table 3 and Appendix 4.

Table 3 Sampling distribution for basic learners

Therefore, the scaled-weighted factors were stacked and reexported to the GEE platform to perform CART, RF and GBRT learning.

Further, to run the SVR meta-learning process, other stackings were done for each output of CART, RF and GBRT, i.e., landslide, gully erosion and flash flood. As such, a total of nine (09) SVR were conducted. At the difference of the basic learners, only the recorded events sample was used, for a smoothing process picturing real occurrences.

Multi-geoenvironmental hazard susceptibility (MGHS) model

The final step was the MGHS model. Using a simple linear combination (SLC), all the hazards were added per primary ML algorithm, i.e., CART, RF and GBRT. Further, the meta-learning consisted in summing the three previous SVR outputs to produce the improved MGHS. Then, all the results were exported to the ArcGIS environment, to proceed with area calculation, hazards association preview, further analysis and layouts.

Human exposure mapping process

Beyond the artificial settlements and vegetation cover that involve human impact (road, LULC, SEVI), the number of residents has been integrated as the ultimate layer to cross with each hazard, and better express the life-threatening status. Human exposure is herein defined as the intersection of a hazard susceptibility map and a spatially distributed population layer (Bernhofen et al. 2021). As such, population density has been previously used from the settlements’ perspective (e.g., Modugno et al. 2022).

Practically, the population shapefile per locality, was constructed in point features from the regional offices of the central bureau of the Census and population studies (BUCREP 2010), and then interpolated using the IDW tool. A simple linear combination (SLC) was performed with each output of the SVR. The results were reclassified to obtain the final classes of hazards’ direct threat on human life, i.e., vulnerability.

Performance assessment and comparison of models

  • Confusion matrix: The confusion matrix includes true positive (TP), false positive (FP), true negative (TN), and false negative (FN) categories (Frattini et al. 2010). The value calculated from the confusion matrix provides useful information on model performance and classification accuracy, based on probabilities. In this study, the balanced F-score (F1-score), true positive rate (TPR) and false positive rate (FPR) were exploited to assess the performance of each learning from the final output (Eq. 69).

  • Receiver Operating Characteristic: The receiver operating characteristic (ROC) or success rate curve, is a visualization tool which helps to validate the rationality and robustness of a probabilistic model (Pradhan 2013; Wu et al., 2016). The ROC curve is plotted by statistical index value pairs, with the sensitivity or TPR on the x-axis and, “1-specificity” or FPR on the y-axis. These two values are used as cut-off-dependent metrics, associated with the area under ROC (AUC/AUROC) which is the cut-off-independent metric for accuracy and efficacy evaluation of predictive model outcomes (Eq. 9):

    $${F}_{1}=\frac{2\cdot TP}{2\cdot TP+FP+fn}$$
    (6)
    $$TPR=\frac{TP}{TP+FN}$$
    (7)
    $$FPR=\frac{FP}{FP+TN}$$
    (8)
    $$AUC={\int }_{0}^{1}f\left(FPR\right)d\left(FPR\right)=1-{\int }_{0}^{1}f\left(TPR\right)d\left(TPR\right)$$
    (9)

Figure 2 summarizes the main steps used to complete the methodology.

Fig. 2
figure 2

Flowchart summarizing the used methodology

Results

Multicollinearity analysis

Globally, the VIF of different pixel sizes was around 1e5th, indicating no collinearity problem among factors, and consequently, they could contribute to subsequent modelling processes (Garosi et al. 2019). For landslide, lithology recorded the highest VIF, i.e., 2.223, and the lowest TOL value, i.e., 0.449. For gully erosion, the soil depth appears to be the most adequate factor, with a VIF value of 2.649 and a TOL value of 0.377. Then for flash flood, the plan curvature factor has the best fitting values to run the model, with 2.557 and 0.39 for VIF and TOL, respectively. While these values remained inside the targeted thresholds (i.e., VIF ≤ 3 and TOL ≥ 0.3), the relative slope position factor was excluded from the flash flood modelling, for a high VIF, i.e., 3.078, although a good TOL of 0.325.

To sum up, all 24 factors have been used in the modelling of landslide and gully erosion (Table 4). Contrarily, only 18 factors have been used for the flash flood modelling, with the exclusion of the relative slope position factor for high collinearity above the thresholds fixed (Table 4); while LS, fault density, distance to fault, distance to road and road density factors are usually not integrated with the flood modelling, according to the literature and confirmed by the analyst knowledge of this hazard’s occurrence in the area .

Table 4 Collinearity of factors

Pre-/post-machine learning importance of conditioning factors

The pre-learning importance of factors is considered here as the AHP outputs. Whereas the post-learning importance refers to the ML explanations of each factor’s contribution to the model.

The AHP values are shown by Fig. 3. From the analyst knowledge-based assessment of the study area, the rainfall is mainly responsible of all three hazards, with the highest value, 1, for all. The lowest coefficients are LULC for landslide (0.391) and flash flood (0.296), and SEVI for gully erosion (0.346). These values were confirmed acceptable by the objectivity test introduced by the consistency ratio interval, 0.0209 ≤ CR ≤ 0.0256, way below the allowed threshold, 0.1.

Fig. 3
figure 3

AHP coefficients for each factor per hazard

Automated learning gives different configurations of importance. For landslide susceptibility assessment, elevation has been the dominant factor following CART and RF learning, with 44.54% and 6.4%, versus rainfall factor (9.71%) according to GBRT learning. Concerning gully erosion, elevation (48.08%), TRI (6.51%) and slope (10.74%) are the three dominant factors according to CART, RF and GBRT respectively. With regard to flash floods, elevation (26.64%), profile curvature (11.73%) and slope (20.78%) have been the most impacting factors, according to the same order of ML algorithms. CART depicted large gaps between the highest and lowest percentages of conditioning factors, with some values almost nil, i.e., 0.006%, contrary to RF and GBRT which showed a progressive difference between factors (Fig. 4).

Fig. 4
figure 4

Automatically computed importance of factors per hazard and per learning

Validation of models using ROC, AUC and F1-score

The ROC was plotted and the default cut-off value, 0.5, was fixed for all the learners. It was completed by analyzing the area under the curve (AUC), which qualitatively assesses the prediction accuracy, i.e., the functionality of the models. A ROC value closer to 1 at the top left corner of the graph implies a larger proportion of the AUC and suggests that the model is more accurate. In contrast, a smaller AUC value indicates a poor accuracy of the model in predicting future occurrences of the phenomena, i.e., single or all hazards (Sun et al. 2019; Rehman et al. 2022). Statistics of the outputs in five classes have been used in this section.

At a general first glance, all the ROC are close to 1 (Fig. 5). For landslide learning, AUC values are between [0.967–0.972], with CART recording the lowest value and RF the highest one. The AUC values for gully erosion learning showed values between [0.976–0.978], with the lowest extremity for CART and the highest for RF. While for flash flood learning, AUC values reflected the highest accuracy, between [0.989–0.991], with GBRT and RF at low and high extremities.

Fig. 5
figure 5

ROC curves and corresponding AUC values

Concerning the meta-learner, SVR showed the lowest values of AUC for landslide, i.e., 0.964, the highest for gully erosion (0.979) and the lowest value same as for GBRT, i.e., 0.989. But generally, there are very small differences among the AUC values of ML models for each geo-env hazard occurrence, while the SVR accuracy and performance are the consensual accuracies among other learners. When it comes to the MGHS learning model, the maximum AUC, 1, was reached by all the classifiers including SVR, for all classes.

Moreover, the interpretation of the F1-score allowed to show that single geo-env hazards were assessed high enough to be relevant, with basic ML algorithms of 0.86 for gully erosion and flash flood, and 0.87 for landslide (Table 5). The same values have been recorded by SVR for gully erosion and landslide but increased to 0.88 for flash floods. Concerning the MGHS, values were a little lower, between [0.82–0.84] (Table 5).

Table 5 F1-score values

Spatial distribution and statistical trends of single geo-env hazards susceptibility

A total of twelve (12) maps were derived, i.e., CART-Landslide, RF-Landslide, GBRT-landslide, SVR-Landslide, CART-Gully Erosion, RF-Gully Erosion, GBRT-Gully Erosion, SVR-Gully Erosion, CART-Flash Flood, RF-Flash Flood, GBRT-Flash Flood, SVR-Flash Flood (Fig. 6). Five classes of susceptibility were defined, such as “Very Low”, “Low”, “Moderate”, “High” and “Very High”. The assessment uses the preview with the main focus on the two latter classes (Fig. 7).

Fig. 6
figure 6

Single geoenvironmental hazard spatial distribution

Fig. 7
figure 7

Areas of single geoenvironmental hazard per learner

Landslide mostly affects the western half according to the three classifiers. However, the spatial distribution trends show more similarity between classes for RF and GBRT, than with CART. Further, at least 30% representing the largest percentage of the area is exposed to “Very High” susceptibility according to RF (30.9%) and GBRT (31.2%), versus only 21.6% for CART, which learning estimated the “High” susceptibility more distributed, i.e., over 28.7% of the area. Moreover, summing the two highest susceptibility classes gives 50.3% of the area according to CART, but only 47.3% and 49.1% for RF and GBRT respectively.

Gully erosion on its side appears sparsely distributed. All classifiers share five major spots, which are located in the North-west, West, South, East and Centre. However, CART and GBRT properly exhibit the spots at the Centre, more than RF does. The “Very High” susceptibility class represents 15.8%, 18.5% and 20.2% of the area according to CART, RF and GBRT respectively, versus 19.2%, 22.1% and 18% for the “High” class, in the same order. In contrast to landslide, these two classes together represent less than 50% of the area, i.e., 35%, 40.6% and 38.2% according to CART, RF and GBRT respectively. Then, between 60 and 65% of the area are lowly to moderately prone to gully erosion.

Concerning flash floods, it is demarcated to the East half of the study area according to all classifiers. The “Very High” susceptibility class occupies 20.9%, 22.3% and 22.5% of the area for CART, RF and GBRT respectively. Whereas 10.6%, 13.8% and 14% represent the ratios of the “High” susceptibility class in the same order of learners. The total for these two classes is 31.6%, 36.1% and 36.6%, suggesting that 64% to 69% of the area are is lowly to moderately concerned by flash floods.

The overall output of the meta-learner SVR arbitrated the spatial distribution of susceptibility per geo-env hazard, by taking the most dominant features according to the learning process and with consequences on the percentage of areas per class. For landslide, “Very High” and “High” susceptibility classes, represent 32.30% and 15.91% respectively, i.e., 48.21% of the area. Concerning gully erosion, 13.73% and 20.99% are the ratios for the same order of classes, i.e., 34.72% of the area. Whereas for flash floods, 17.31% and 16.10% are the ratios, i.e., 33.41% of the area.

In addition, discrepancies among classes are widely effective for flash floods, but less noticeable for landslide and gully erosion. As such, CART-Flash Flood, RF-Flash Flood and GBRT-Flash Flood previews, show the largest spatial distribution for the “Low” susceptibility class, confirmed by statistics, i.e., 39.37%, 39.92% and 43.47% respectively. Whereas, SVR-Flash Flood specifically highlights the “Very Low” susceptibility class to be the most extended, which is confirmed by its largest percentage, i.e., 40.75% of the total area.

Multi-geoenvironmental hazard susceptibility (MGHS)

Combinations of geo-env hazards were conducted according to the following SLC:

$${{\varvec{M}}{\varvec{G}}{\varvec{H}}{\varvec{S}}}_{{\varvec{M}}{\varvec{L}}}=\;{{\varvec{L}}{\varvec{a}}{\varvec{n}}{\varvec{d}}{\varvec{s}}{\varvec{l}}{\varvec{i}}{\varvec{d}}{\varvec{e}}}_{{\varvec{M}}{\varvec{L}}}+{{\varvec{G}}{\varvec{u}}{\varvec{l}}{\varvec{l}}{\varvec{y}}{\varvec{E}}{\varvec{r}}{\varvec{o}}{\varvec{s}}{\varvec{i}}{\varvec{o}}{\varvec{n}}}_{{\varvec{M}}{\varvec{L}}}+{{\varvec{F}}{\varvec{l}}{\varvec{a}}{\varvec{s}}{\varvec{h}}{\varvec{F}}{\varvec{l}}{\varvec{o}}{\varvec{o}}{\varvec{d}}}_{{\varvec{M}}{\varvec{L}}}$$

where \({\varvec{M}}{\varvec{L}}\) refers to the targeted machine learning algorithm. To properly discriminate the different contributions, computations were conducted for different pairs and then for the three geo-env hazards, ML. The outputs combination obtained are therefore, “Landslide + GullyErosion”, “GullyErosion + FlashFlood”, “Landslide + FlashFlood” and “Landslide + GullyErosion + FlashFlood”, also named here as “All”.

The analysis is based on the highest classes of the outputs (Figs. 8 & 9). At a glance, the susceptibility of combined landslide and gully erosion has the largest spatial cover, towards the northern, western and south-western areas. From the basic ML, they are predicted to simultaneously occur over [17.5–24.4%] of the subset, with the lowest percentage for CART and the highest for RF. Whereas, for SVR, they are estimated to occur over 21.4% of the subset (Fig. 10). The second obvious combination is the landslide and flash flood susceptibility of occurrence, which is most likely to happen in the center, western and south-western areas. They are extended between [5.3–8.9%] of the subset (Fig. 10), from RF to CART for basic ML; while SVR gives the same value as RF, i.e., 5.3% of the subset. The “GullyErosion + FlashFlood” susceptibility combination is less spatially distributed, mostly at the center area, with a south-to-west expansion. The percentages vary between [1.2–3.4%], from RF to CART for basic ML; while SVR estimates that to 1.1%. At last, the combination of all geo-env hazards shows at first sight the same distribution patterns as the latter, with percentages [0.6–1.4%], from RF to CART/GBRT; whereas SVR estimates this ratio at 0.7%.

Fig. 8
figure 8

MGHS distribution per combination of occurrence – Only the highest value is highlighted here

Fig. 9
figure 9

MGHS summarized distribution – highest values are highlighted; lowest ones are combined into one class

Fig. 10
figure 10

MGHS areas per learner. Ratios are calculated from the highest value area that highlight probability of occurrence on one hand, and the sum of “Very Low” to “Moderate” areas

Vulnerability status

Outputs for the vulnerability analysis are presented here as heatmaps, which have been suggested to better support rapid response or mitigation for hazards (Handwerger et al. 2022). Generally, the population distribution matched the hot spots of each geo-env hazard occurrence (Fig. 11). This may lead to assume that high human concentrations are globally the most exposed. This assumption has been verified in the main city of Nkongsamba, where an extract of the urban perimeter shows a concerning exposure of population to studied geo-hazards.

Fig. 11
figure 11

Vulnerability heatmap using SVR outputs for the whole subset and the extract of Nkongsamba

At the center of this urban sub-area, vulnerabilities to landslide and gully erosion are expanded in the South-West/North-East direction (Fig. 11). The sum of “Very High” and “High” hot-spot classes, represents 36.5% of the extract for landslide, and 28.95% for gully erosion (Fig. 12). However, the spatial distribution of the two selected classes is inverted for the vulnerability to these two geo-env hazards. While the vulnerability to flash flood is highlighted in the southern center of the extract, at the complete opposite of the two previous, although still South-West/North-East orientated. The total area of flash floods vulnerability for the “Very High” and “High” hot-spots classes, represents 31.02% of the extract. Eventually, the most vulnerable spaces are widely expanded in the South-West/North-East direction, with a more important visual matching for both landslide and gully erosion, versus smaller spots of the three simultaneous vulnerabilities in the south. The “Very High” and “High” for all of them cumulate a total of 32.54% of the urban sub-area. The three mainly concerned localities are Nkongsamba, Baréock and Baré (Fig. 11).

Fig. 12
figure 12

Ratios of vulnerability for the extract of Nkongsamba. The class entitled “Other” includes the “Very Low”, “Low” and “Moderate” classes

Discussions

Inference from the findings and comparisons to previous studies

The weighing of factors with the AHP forecasted their contribution per geo-env hazard’s model, differently from usual empirical simple (factor * 1) or percentage (factor * 1/100) weighing in a raster calculation. For simultaneous occurrence, the same factor is assumed to not affect two or more, recorded or predicted geo-env hazards at the same level. Rather, the level of impact in triggering or aggravating may vary from nil (excluded) to fully, with different values. Moreover, with the usage of SLC in a single geo-env model and multi-hazards analysis, that impact always varies as previously proven (Javidan et al. 2021; Nsangou et al. 2022). As side mention, the seasonality of occurrences based on fieldwork and existing records state that landslides and flash floods only happen during the rainy season. While gully erosion takes place in any season, as they are more related to topography, lithology, and changes in human land use impacting land (vegetation) cover, and then easily triggered or emphasized by rainfall, rejoining previous remarks (Aber et al. 2010). Figure 13 illustrates some recent events in the area.

Fig. 13
figure 13

Sampled illustrations. From left to right: Road National-5 collapsed after a rainfall-triggered landslide in Kekem (October 2007) (Balla et al. 2013); landslide in Nkongsamba, near administrative area sector-1 (September 2022); another section of Road National-5, flooded and collapsing in Santchou, following the overflows of the river Elienti after a huge rainfall (September 2020) (http://www.cameroon-info.net/article/cameroun-inondation-la-circulation-est-perturbee-sur-la-route-santchou-dschang-a-la-suite-de-381012.html); houses and road flooded in Nkongsamba, sector Massa Ngah Junction-Market “C” after a huge rainfall (September 2022) (http://scores2000.info/2020/09/grandes-inondations-et-crues-exceptionnelles-sur-certaines-zones-de-la-ville-de-nkongsamba.html)

Further, performing basic ML algorithms over AHP-weighed and stacked factors is another novelty of the proposed method. The advantage of this approach is the agreement of most ML for each model, then a potential accuracy improvement for a single modelling process, since each geo-env hazard susceptibility is highly dependent on the spatial distribution of real occurrence while including the importance of every weighed factor, according to an intelligent automatic process of detection. As such, there was a global spatial agreement of basic learners, i.e., CART, RF and GBRT per geo-env hazards, supported by values of AUC and F1-score, higher than those obtained by Sheikh et al. (2019) by using the TOPSIS-MD, TOPSIS, Simple Additive Weight (SAW), and by Javidan et al (2021) by using MaxEnt. Likewise, probability of occurrence of pairs geo-env hazards is always higher than all at the same time, in accordance with the results obtained by Pourghasemi et al (2019), who assessed floods, earthquakes, and landslides.

Furthermore, introducing a regression meta-learning process over stacked basic learners’ outputs to assess the spatial trends of geo-env hazards of occurrence, better displays spatial agreement than single learning. The results obtained and values of validation tools (AUC, F1-score) for SVR, are the same or higher than those recently obtained by Bordbar et al (2022), who combined ensembles learning and meta-heuristic techniques to flood, landslide and Earthquake occurrence.

At this point, a complementary validation of the basic-learner/meta-learner approach might be important, in addition to AUC and F1-score. Since SVR is assumed to arbitrate, improve and adjust the best spatial and statistical distributions of the CART, RF and GBRT in each case, likewise, the highest correlation of a basic learner with the meta-learner will indicates, the contribution and the reliability of the first to the second. Then, a correlation analysis was conducted on the classes value using the Spearman coefficient. Overall, the correlation coefficients are very high (\(0.921\le{\varvec{\rho}}\le 0.985\)) (Fig. 14). Specifically, RF has the highest reliability to predict single geo-env hazard [\(0.933\boldsymbol{ }\le{\varvec{\rho}}\le 0.985\)]. Whereas GBRT is equally performant as RF in mapping flood with (\({\varvec{\rho}}=0.933\)) and the highest proficiency when it comes to the MGHS modelling (\({\varvec{\rho}}=0.933\)). Then, there are different methodology options in the North-Moungo perimeter hazards study, and the analysis may select one of these learning processes, depending on the goal or accuracy, to reduce the computation length and steps.

Fig. 14
figure 14

Correlation between meta-learner and single learners

In addition, geo-env hazard occurrence was predicted over all land covers, including vegetation (See Fig. 4). One logical explanation is that, topography (elevation, slope steepness), combined to volcanic lithology and pedology (less evolved soils; Gèze, 1942), globally impacts the landscape instability, more than the vegetation does. This has been highlighted and confirmed by the importance of factors analysis (See Fig. 6). Whereas on another hand, although afforestation is most of the time an important and suggested action for landslide, erosion and flood control, the vegetation composition of the area, which transits from degraded forest to shrubs and grasses savannas, deeply impacts the susceptibility of occurrence for each cover. Matter of fact, it has been previously demonstrated that in a space with mixed vegetation patterns, the geo-env hazards gradient of occurrence accordingly varies (e.g., Miles and Swanson 1986; Carone et al. 2015). As such, areas covered by grasses and herbs will be more affected, than those covered by shrubs and trees.

Another consideration is the spatial similarities pointed out on single susceptibility maps, between landslide and gully erosion expansions than with flash flood are to be examined (See Fig. 6). These trends were empirically proven in previous studies (e.g., Sun et al. 2019), and have direct implications on MGHS maps, with the largest space affectation to the couple “Landslide + Gully Erosion” and “Landslide + Flash Flood”, than “Gully Erosion + Flash Flood”. As a consequence, landslide is to be ranked as the first danger, followed by gully erosion and flash flood. Likewise, the differences among single susceptibility maps, confirm not only the robustness of the model in accurately separating/connecting their areas of occurrence, but also the spaces needing more attention for the implementation of control/protection tools specific to each geo-env hazard.

From a spatial-scale viewpoint, it has been stated that multi-hazard susceptibility assessments are more efficient on larger areas (small scale), while multi-hazard risk assessments are practical at the site scale and on an event-based scheme (Motamedi and Liang 2014). This has been verified with the vulnerability assessment. From the extract of Nkongsamba, the population vulnerability to each geo-env hazard shows three different extents. But when it comes to simultaneous exposure, landslide and gully erosion visually represent a higher direct risk of occurrence, while there is a spatially and statistically small exposure of the population to both, plus flash flood at the same time. The outputs of population exposure support the idea of a one-per-one vulnerability analysis, so as not to neglect a geo-env hazard impact on the population (e.g., flash flood vulnerability in the South-Center of Nkongsamba), when focusing on the other that shows a different spatial distribution (e.g., landslide and flood vulnerability in the northern center of Nkongsamba).

Limitations of the study

Accuracy of the inventory was the first obstacle to building the geodatabase. In written reports, there are a lot of gaps in the inventories that lack a systematic follow-up. This certainly leads to some misanalysis if the disaster has not been exactly located, as well as different deviations in the spatial density distribution. Then, because the existing events are, for some, inaccurately distributed, the sampling had to be completed by creating new points based on environmental similarities than real event occurrences, according to the analyst's knowledge of the study area, before proceeding to the learning. Consequently, we were not also able to assess the accuracy in terms of the deviation between the real and the estimated location of the event.

Moreover, most satellite ready-to-use geodatabases of natural hazards are not extended everywhere or have not considered the study subset as exposed. For instance, a detailed landslide geodatabase exists on the USGS website, but only for the United States, while we were also not able to access the global landslide susceptibility prepared by Felsberg et al (2022), as well as the global sensitivity index proposed by Van Natijne et al (2022). Another example is the global soil erosion modelling (GloSEM) proposed by Borrelli et al (2017), using the Revised Universal Soil Loss Equation (RUSLE), but our request form was not answered. As a last illustration, the global flood database is more general and targets very large areas which do not include the Moungo’s studied subset of the CVL as (enough) exposed. This may find an explanation in its small scale (continental), based on MODIS imagery which is coarse.Footnote 14 The availability of such data would have been helpful to conduct a proper accuracy assessment of each classifier, based on the error matrix, as a plus to the process validation.

Outlooks and local mitigation orientations

The main technical activity envisages the direct interactions between pairs of geo-env hazards, as well as all together. However, this study has not considered the aspect of their direct chronological occurrence, as well as their (spatial, mechanical) relationship, i.e., one could trigger the other(s). For instance, on a flow continuum scale and under heavy rainfall, the usual sediment transport of a river’s/stream’s flow, can quickly escalate a flash flooding, that in its turn, will trigger huge debris flow, i.e., mudslides/landslides.Footnote 15 It is more common that gully erosion increases the risk of flooding (Ionita et al. 2015), whereas flood precedes and triggers landslides (Motagh et al. 2020). Therefore, their mechanical connection needs then to be established in a fully different modelling process.

Other future activities involve supporting the cities’ council to build a flexible and updateable database of local disasters. This includes the exact geolocation of different events, with instant mapping, to complete the reports and avoid potential misanalysis or other biases. Practically, mitigation actions should be adapted to the highest scale, i.e., local context, more than the regional one. Based on the case of Nkongsamba, the purpose of mitigation is to properly and efficiently support the efforts of the Department of Civil Protection (DCP)Footnote 16 in the actions to prevent disasters. Among others, the most used is the population eviction from sites judged prone to disasters (hillslopes, marshy lowlands, etc.). Whereas, structures, documents and conventions such as the National Observatory of Hazards (ONR) and National Plan of Contingency (PNC) (MINATD 2007). This should be preceded or supported by a proper mapping integrating all the variables used in this study or even more at the city scale, then, stacked/joined to the urban planning. Unfortunately, the process is not harmonized and has not been set as mandatory for the councils, in such a way very few of them are moderately ready to face the occurrence of disasters.

Above all, the scope of this study has been to produce visual support for policymakers, in reducing losses and damages, building resilience and targeted mitigation actions with a more explicit focus on populations, their health and livelihoods (e.g., forecasting, warning system, relocation) (UNISDR 2015b). This is challenging at the national scale, i.e., Cameroon, considering the very varied panorama of geo-env hazard risks and implications requiring details for efficient management (Bang 2022).

Conclusion

Susceptibility mapping is a fundamental tool for disaster management and planning development activities in hilly regions of tropical and subtropical environments. This study has proposed to combine AHP with ML algorithms, to produce better performances in building a tool named the multi-geoenvironmental hazards susceptibility (MGHS), as well as assessing human vulnerability. Twenty-four factors were selected to maximize the accuracy, all were used for landslide and gully erosion modelling, while only eighteen were integrated into the flood modelling according to the multicollinearity analysis (VIF and TOL) and the literature review. Based on the validation of ROC trends, AUC and F1-score, the model demonstrates a certain robustness. The approach of using three single learners, CART, RF and GBRT, as well as a meta-learner, i.e., SVR, has also been beneficial to assess the most dominating trends. Whereas, the MGHS tool stands for different combinations of two or three simultaneous occurrences. Nevertheless, with the limitations related to this study, such as, the location of some events or the non-coverage by internationally reliable satellite geodatabases for a complementary validation, accuracies record of the outcomes/outputs suffers drawbacks and caveats that are still in work to be addressed. Overall, based on the findings, the process of weighing factors before learning has been beneficial to avoid the arbitrary preeminence of one on another. Likewise, and compared to studies based on ensemble learning, SVR demonstrated the ability to better predict multi-geo-hazards, according to its regressive process and using samples of real events, all of which make it testable for many other applications. In addition, by integrating the population density layer at last, the result of the urban sub-area of Nkongsamba clearly stated that the space planning did not previously take the risk of geo-env hazards, explaining a predicted large human exposure. Since the inhabitants are constructing everywhere, they are susceptible to disasters and vulnerable to death or eviction. Future planning is encouraged to imperatively integrate the geo-env hazard aspect, to prevent all inconvenience and threats to locals.

Availability of data and materials

All the data and material used for this research were incorporated as needed to the text or added as appendixes. All other detailed data and script links are available upon request.

Notes

  1. United Nations International Strategy for Disaster Reduction.

  2. Integrated Research on Disaster Risk.

  3. https://www.who.int/health-topics/landslides#tab=tab_1.

  4. https://www.weather.gov/mrx/flood_and_flash.

  5. United Nations Office for Disaster Risk Reduction.

  6. https://www.usgs.gov/centers/eros/science/usgs-eros-archive-digital-elevation-shuttle-radar-topography-mission-srtm-1.

  7. Translated from the original version in French.

  8. Layer provided by the Institute of Mining and Geological Research.

  9. Provided by the Ministry of Transportation.

  10. Office de la Recherche Scientifique et Technique d’Outre-Mer.

  11. Field information and analyst knowledge.

  12. Mungo Division Transport authority.

  13. https://towardsdatascience.com/support-vector-regression-svr-one-of-the-most-flexible-yet-robust-prediction-algorithms-4d25fbdaca60. Visited on September 18, 2022.

  14. https://global-flood-database.cloudtostreet.ai/#interactive-map.

  15. https://www.usgs.gov/programs/landslide-hazards/landslides-101.

  16. Ministry of Territorial Administration and Decentralization (MINATD).

References

Download references

Acknowledgements

This paper was initiated by the first author, AHNM, remembering that, on Sunday 7th of September 1997, in Nkongsamba-sector “11-Bis”, a huge rainfall-triggered landslide/mudslide, killed the four children of his neighbor and elementary school friend, named Dorethe Yonga. In support of the Nkongsamba city councils’ local efforts, for a proper integration of the geoenvironmental hazards component in urban planning. In memory of all the deaths due to disasters in Cameroon. As another scientific contribution to the knowledge of challenges along the Cameroon Volcanic Line. Thank you to the heads of our respective laboratories and departments for the pieces of advice. Special gratitude to the reviewers for their insightful comments and suggestions for the improvement of the manuscript.

Funding

The Non-Profit Scientific Association Stats’NMaps, through its ‘Research Support Funds’ (S’NM-RSF), financially supported the purchase of articles and book chapters on sale, as well as all the fieldwork investigations for complementary data acquisition. The licensed software, Erdas Imagine version 16.6.0.1366, ArcGIS version 10.8.2, SAGA-GIS version 8.0.1, Super Decision Software (SDS) version 3.2 and, XLStats version 2020.1.64570, used in this research are properties of Stats’NMaps and available upon request.

Author information

Authors and Affiliations

Authors

Contributions

AHNM initiated the study, wrote the first manuscript and designed main mapping; AHNM, PWN, DTS and JVMM contributed to data collection and curation, method conception and validation; AHNM, HBN and ID discussed contents, literature review and references sourcing; MT, BD, AM, SH, RB, FCLT, ICNP and LMB joined the study and provided complementary data and improvement comments. All the authors contributed to manuscript editing, structuring and critical reviewing. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Alfred Homère Ngandam Mfondoum.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: Sentinel2 granules information sorted by recording time

Granule Full Identification

Time

S2B_MSIL1C_20211222T093319_N0301_R136_T32NNM_20211222T113237

09:33:19

S2B_MSIL1C_20211222T093319_N0301_R136_T32NPL_20211222T113237

09:33:19

S2A_MSIL1C_20211220T094411_N0301_R036_T32NNM_20211220T113438

09:44:11

S2A_MSIL1C_20211220T094411_N0301_R036_T32NNL_20211220T113438

09:44:11

S2B_MSIL1C_20211222T093319_N0301_R136_T32NNL_20211222T113237

09:49:22

The sensing orbit direction is descending.

Appendix 2: Sentinel2 bands information

Name

Range (µm)

Bands (Resolution/m)

Coastal aerosol

0.421—0.457

B1(60)

Blue

0.439—0.535

B2(10)

Green

0.537—0.582

B3(10)

Red

0.646—0.685

B4(10)

Red_edge1

0.694—0.714

B5(10)

Red_edge2

0.731—0.749

B6(10)

Red_edge3

0.768—0.796

B7(10)

NIR wide

0.767—0.908

B8(10)

NIR narrow

0.848—0.881

B8A(20)

Water vapor

0.931—0.958

B9(60)

Cirrus

1.338—1.414

B10(60)

SWIR1

1.539—1.681

B11(20)

SWIR2

2.072—2.312

B12(20)

Appendix 3a: Validation error matrix

 

Water

Open Vegetation

Built-up

Soil

Total

Water

28

0

0

2

30

Open Vegetation

0

75

0

0

75

Built-up

0

1

33

16

50

Soil

0

3

2

33

35

Appendix 3b: Consumer’s accuracy

Class

Value

Water

1

Open Vegetation

0.95

Built-up

0.94

Soil

0.625

Appendix 4: The conditioning factors map from original/unscaled values

figure a

Appendix 5: Samples distribution for the learning process per geoenvironmental hazard

figure b

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and Permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Mfondoum, A.H.N., Nguet, P.W., Seuwui, D.T. et al. Stepwise integration of analytical hierarchy process with machine learning algorithms for landslide, gully erosion and flash flood susceptibility mapping over the North-Moungo perimeter, Cameroon. Geoenviron Disasters 10, 22 (2023). https://doi.org/10.1186/s40677-023-00254-5

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1186/s40677-023-00254-5

Keywords