How sample size can affect landslide size distribution
© The Author(s). 2016
Received: 2 July 2016
Accepted: 4 October 2016
Published: 23 October 2016
Landslide size distribution is widely found to obey a negative power law with a rollover at smaller sizes, and has been exploited by many researchers to inspect landslide physics or to assess landslide erosion and landslide hazard. Yet sample size affects the statistics of landslide size even when the complications associated with landslide datasets and statistical treatments are avoided.
In this paper, a series of stochastic simulations was implemented to explicitly and systematically quantify the effect of sample size. The results show that the errors of parameters estimated from small samples can be considerable. For a sample size of 100, the relative error of the estimated landslide erosion rate that has a probability of 50 % can approach 100 %. In addition, a small sample size obscures the statistical significance of the variances in parameters between different subsets of the same dataset. Although inconsistency was found regarding how the power exponent varies with rainfall intensity, numerical results suggest that the variance observed in a dataset with a small sample size may not be statistically significant.
This paper not only reveals the potential effect of sample size on exploiting landslide size distribution but also presents procedures for quantifying this issue in future studies.
Keywords: Landslide size distribution; Sample size; Statistical significance; Power law; Rollover
The frequency of landslides is widely observed to decrease as a power law with increasing size beyond a maximum value (Stark and Hovius 2001; Malamud et al. 2004; Brunetti et al. 2009). This partial power law behavior is unique because it is characterized not only by a heavy tail, as observed in many phenomena (Caers et al. 1999; Cheng 2008; Pinto et al. 2012; Kolyukhin and Tveranger 2014), but also by a rollover at smaller sizes. The exponent of the power law tail (γ) and the rollover (R) are therefore the two most characteristic parameters. On one hand, landslide size distribution is crucial for quantitative analysis of landslide hazard (Hungr et al. 1999; Guzzetti et al. 2005) and earth surface processes (Hovius et al. 1997; Larsen and Montgomery 2012). On the other hand, the origins of the power law tail and the rollover still lack widely accepted physical explanations (Pelletier et al. 1997; Katz and Aharonov 2006; Stark and Guzzetti 2009; Lehmann and Or 2012; Frattini and Crosta 2013; Alvioli et al. 2014; Li et al. 2014 and references therein). Therefore, landslide size distribution has been exploited by many researchers either to inspect the physics of landslides or to assess landslide erosion and landslide hazard.
The statistics of landslide size can be obscured in the first place by complications associated with landslide datasets and statistical treatments. The following strategies have been adopted to mitigate these complications: 1) using event-based rather than historical landslide datasets (Malamud et al. 2004; Ghosh et al. 2012); 2) using the same dataset prepared by the same author instead of datasets prepared by different authors (Iwahashi et al. 2003; Guzzetti et al. 2008; Chen 2009); and 3) using maximum likelihood estimation (MLE) rather than linear regression to estimate both the power exponent and the rollover (Fiorucci et al. 2011; Ghosh et al. 2012). Nevertheless, even without these complications, a limited sample size can still cast a shadow on the statistics of landslide size. The sample size effect is in fact a problem faced by many disciplines (Lazzeroni and Ray 2012). The error that sample size introduces into the estimated scaling parameter of power law distributions has been investigated (Clauset et al. 2009). Yet the potential effect of sample size on the statistics of landslide size has not been explicitly addressed. The fact that small differences in the parameters of the size frequency relationship may produce huge mismatches in the derived landslide erosion rates (Korup et al. 2012) underlines the significance of this issue. This paper aims to quantitatively inspect the possible effect of sample size on exploiting landslide size distribution. We focus on the landslide area distribution because so far no satisfactory distribution function for landslide volume has been proposed and most empirical datasets do not have landslide volume data.
Landslide area distribution
The widely adopted double Pareto function (Stark and Hovius 2001) and Inverse Gamma function (Malamud et al. 2004) were used to characterize the landslide area distributions, and the parameters of the two distributions were estimated by maximum likelihood estimation (MLE). It is inconclusive whether the two functions are mathematically and physically eligible to represent landslide area distributions in the real world, but they are a practical choice since they are capable of characterizing both the power law tail and the rollover of the landslide area distribution. Examining the assumption that landslide size distribution is characterized by a power law tail goes beyond the topic of this paper, so we did not test that hypothesis. For the same reason, we did not examine whether the log-normal function (ten Brink et al. 2009; Mackey and Roering 2011), the logarithmic function (Issler et al. 2005; Che et al. 2011) or the exponential function (Montgomery et al. 1998) is an alternative for characterizing empirical data of landslide size. The analytical distribution function for landslide area obtained by maximizing Tsallis entropy (Chen et al. 2011) was not used because it is accompanied by complications (Li et al. 2012).
The expression of the double Pareto distribution is:
We therefore do not use t as the “crossover” of p_dp(A) as suggested by Stark and Hovius (2001), but use the area with maximum probability density as the “rollover”. We chose 1 and 10^10 as the two cutoffs to ensure that all landslide area values in the empirical datasets fall within this range, and found that similar choices (e.g., 1 and 10^8) yield almost the same parameter estimates.
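For reference, the three-parameter Inverse Gamma density of Malamud et al. (2004) is commonly written as shown below. The symbols ρ, a and s follow that source and are assumptions here, since this text does not spell out its own notation. The mode is included because the area with maximum probability density is what is used as the rollover.

```latex
p_{ig}(A;\rho,a,s) = \frac{1}{a\,\Gamma(\rho)}
  \left[\frac{a}{A-s}\right]^{\rho+1}
  \exp\left[-\frac{a}{A-s}\right], \qquad A > s

% Power law tail for large A: p_{ig}(A) \propto A^{-(\rho+1)}
% Mode (maximum probability density), from d\ln p_{ig}/dA = 0:
A_{\max} = \frac{a}{\rho+1} + s
```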
Average landslide volume
A straightforward procedure is designed to inspect the effect of sample size on the reliability of the parameter estimation of landslide size distribution. It is assumed that: 1) the theoretical distribution of landslide size with respect to a certain event within a certain area is constrained by physical factors and is therefore predetermined; 2) the number of landslides occurring in a certain event within a certain area is finite; and 3) the size of each individual landslide is stochastic. We therefore first introduce a predefined theoretical distribution of landslide area, and then draw N values of area from the theoretical distribution using Monte Carlo simulation, where N is the sample size. Sample sizes span from 100 to 10,000 and are logarithmically spaced. For each sample size, 1,000 Monte Carlo samples are produced to reveal the stochasticity of the estimated parameters. If the sample size is large enough, the parameters estimated from the 1,000 Monte Carlo samples are expected to have a mean close to the parameters of the theoretical distribution and a low standard deviation. Conversely, if the sample size is small, a mean far from the theoretical value and a high standard deviation are expected.
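The procedure above can be sketched in a few lines of Python. This is a minimal illustration under a strong simplification: instead of the double Pareto or Inverse Gamma distribution, it uses a pure power law tail without rollover, for which the MLE of the exponent has a closed form (Clauset et al. 2009). The exponent 2.4, the lower cutoff `a_min`, the number of repetitions and all function names are hypothetical choices made for the example.

```python
import math
import random
import statistics


def draw_areas(n, gamma_true, a_min, rng):
    """Draw n areas from a pure power law p(A) ~ A^(-gamma_true), A >= a_min."""
    # Inverse-transform sampling; 1 - u lies in (0, 1], so the power is finite.
    return [a_min * (1.0 - rng.random()) ** (-1.0 / (gamma_true - 1.0))
            for _ in range(n)]


def mle_exponent(areas, a_min):
    """Closed-form MLE of the power law exponent (Clauset et al. 2009)."""
    return 1.0 + len(areas) / sum(math.log(a / a_min) for a in areas)


def log_spaced_sizes(n_low=100, n_high=10_000, count=5):
    """Sample sizes spaced evenly in log10 between n_low and n_high."""
    step = (math.log10(n_high) - math.log10(n_low)) / (count - 1)
    return [round(10 ** (math.log10(n_low) + i * step)) for i in range(count)]


def sample_size_experiment(sizes, n_rep, gamma_true=2.4, a_min=1.0, seed=0):
    """Return {N: (mean, standard deviation)} of the exponent estimates."""
    rng = random.Random(seed)
    summary = {}
    for n in sizes:
        estimates = [mle_exponent(draw_areas(n, gamma_true, a_min, rng), a_min)
                     for _ in range(n_rep)]
        summary[n] = (statistics.mean(estimates), statistics.stdev(estimates))
    return summary


# n_rep is kept small so the sketch runs quickly; the text uses 1,000.
for n, (mean, std) in sample_size_experiment(log_spaced_sizes(), n_rep=50).items():
    print(f"N={n:>6}  mean={mean:.3f}  std={std:.3f}")
```

The standard deviation of the estimates should shrink as N grows, which is the behavior the procedure is designed to expose. Replacing `draw_areas` and `mle_exponent` with a sampler and a numerical MLE for the distribution of interest would leave the rest of the loop unchanged.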
Similarly, we use a straightforward way to inspect how the sample size influences the statistical significance of comparing the parameters of landslide size distribution between different subsets. First, for each sample size, the sample with parameters most similar to the theoretical values is picked from the previously produced 1,000 Monte Carlo samples as the test sample for this sample size. Then, for each sample size, the corresponding test sample is randomly subdivided into two subsets according to a subdividing ratio. Six subdividing ratios, namely 1:1, 2:1, 3:1, 4:1, 5:1, and 6:1, are used to inspect the effect of the subdividing ratio as well. For each test sample, the random subdivision is repeated 1,000 times for each subdividing ratio. If we take “the observed differences in parameters between the two subsets are attributed to random processes” as the null hypothesis, the region of rejection and the region of acceptance for a given significance level (e.g., 0.05) can be estimated from the statistics of the variances in parameters observed in the 1,000 random trials.
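The random subdivision test can be sketched as follows. As a simplifying assumption, the sketch again uses a pure power law tail with a closed-form MLE rather than the double Pareto or Inverse Gamma fit, and it takes a single synthetic sample as the test sample instead of selecting the best of 1,000. The function names, the exponent 2.4 and `a_min` are hypothetical.

```python
import math
import random


def exponent_from_logs(logs):
    """Closed-form power law MLE given the values ln(A / a_min)."""
    return 1.0 + len(logs) / sum(logs)


def acceptance_region(sample, ratio, a_min, n_trials=1000, alpha=0.05, seed=0):
    """Region of acceptance for the difference in exponent between two
    random subsets of `sample` split at ratio:1 (two-sided, level alpha)."""
    rng = random.Random(seed)
    logs = [math.log(a / a_min) for a in sample]
    cut = round(len(logs) * ratio / (ratio + 1))  # size of the larger subset
    diffs = []
    for _ in range(n_trials):
        rng.shuffle(logs)  # a new random subdivision on every trial
        diffs.append(exponent_from_logs(logs[:cut]) - exponent_from_logs(logs[cut:]))
    diffs.sort()
    k = int(n_trials * alpha / 2)  # number of trials cut from each tail
    return diffs[k], diffs[n_trials - 1 - k]


rng = random.Random(42)
# Synthetic test sample: 200 areas, hypothetical exponent 2.4, a_min = 1.
sample = [(1.0 - rng.random()) ** (-1.0 / 1.4) for _ in range(200)]
for ratio in (1, 5):
    low, high = acceptance_region(sample, ratio, a_min=1.0)
    print(f"ratio {ratio}:1 -> region of acceptance [{low:.2f}, {high:.2f}]")
```

An observed difference between two real subsets that falls inside the returned interval would not be distinguishable from random subdivision at the chosen significance level.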
[Table: Statistical characteristics of landslide dataset and subsets; surviving column header: R_e (m²)]
The reliability of estimating parameters
If we take 95 % as a rule of thumb (Fig. 3e, f), the minimum sample size required for reliable parameter estimation can be inferred. However, finding a universally applicable threshold is unrealistic, because the threshold depends not only on the distribution function but also on the theoretical parameters. Numerical experiments show that theoretical parameters with larger absolute values require a larger sample size to guarantee a high probability of low relative error. Nevertheless, Fig. 3 shows that 6,000 is a roughly safe choice for most landslide datasets. By this standard, parameters estimated from small samples, for example fewer than 1,000 landslides (Fiorucci et al. 2011; Ghosh et al. 2012; Regmi et al. 2014), should be used with caution, especially for quantitative purposes (Larsen and Montgomery 2012; Tsai et al. 2013). We will specifically inspect how sample size affects the reliability of landslide erosion estimates in the discussion.
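One way to turn the 95 % rule of thumb into a number is to estimate, for each sample size, the probability that the relative error of the estimated parameter stays below a tolerance, and then take the smallest sample size that reaches 95 %. The sketch below assumes a pure power law tail (not the double Pareto or Inverse Gamma distribution) and a hypothetical tolerance of 5 %, so its output is not expected to reproduce the value of 6,000 quoted above. All names and defaults are illustrative.

```python
import random


def prob_low_error(n, n_rep, tol, gamma_true=2.4, seed=0):
    """Fraction of Monte Carlo samples of size n whose estimated exponent
    has a relative error of at most tol."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        # For a pure power law, ln(A / a_min) is exponentially distributed
        # with rate gamma_true - 1, so the log-areas can be drawn directly.
        log_sum = sum(rng.expovariate(gamma_true - 1.0) for _ in range(n))
        estimate = 1.0 + n / log_sum  # closed-form MLE of the exponent
        if abs(estimate - gamma_true) / gamma_true <= tol:
            hits += 1
    return hits / n_rep


def min_reliable_size(sizes, n_rep, tol, level=0.95, **kwargs):
    """Smallest size in `sizes` whose probability of low error reaches level,
    or None if no size does."""
    for n in sorted(sizes):
        if prob_low_error(n, n_rep, tol, **kwargs) >= level:
            return n
    return None


sizes = [100, 316, 1000, 3162, 10000]
print(min_reliable_size(sizes, n_rep=200, tol=0.05))
```

Raising `gamma_true` or tightening `tol` pushes the returned size upward, which is consistent with the remark that no single threshold applies to every dataset.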
The statistical significance of comparing parameters
It is shown that, regardless of distribution function, for both γ and R, the region of acceptance gets wider (worse) as the sample size gets smaller or the subdividing ratio gets larger. This means that a smaller sample size or a larger subdividing ratio requires larger observed differences in parameters to reach statistical significance. It also shows that the double Pareto distribution performs better in estimating the exponent (Fig. 5), while the Inverse Gamma distribution performs slightly better in estimating the rollover (Fig. 6). The regions of acceptance go beyond the range of the figures for sample sizes less than 250 if the subdividing ratio is large, because a small sample size together with a large subdividing ratio yields unrealistically wide regions of acceptance. For example, for a sample size of 100 and a subdividing ratio of 5:1, the regions of acceptance for γ estimated using the double Pareto distribution and the Inverse Gamma distribution are [−14.69, 13.75] and [−18.22, 15.24], respectively. Therefore, comparing the parameters of different subsets of a landslide dataset with an extremely small sample size, for instance less than 100 (Iwahashi et al. 2003), is practically meaningless in statistical terms.
Numerical experiments show that, for the same sample size and subdividing ratio, larger parameters (absolute values) of the “mother dataset” yield wider regions of acceptance. Therefore, it is hard to find a universal standard for statistical significance. It is also hard to tell whether some published variances in parameters between different subsets are statistically significant, because either the MLE was not used (Chen 2009) or the information is not sufficient (Guzzetti et al. 2008). Nevertheless, a test of statistical significance is highly recommended prior to physical interpretation of the variation of landslide size distribution between different subsets, especially for those with small sample sizes (Santangelo et al. 2013; Guns and Vanacker 2014). In the discussion, we will show that a small sample size can cast a shadow on interpreting the physical constraints on landslide size distribution.
The statistical procedures proposed in this paper are of potential use for exploiting landslide size distribution, for example in estimating landslide erosion rate, assessing landslide hazard and inspecting the physics of landslides. In this section, the effect of sample size on estimating landslide erosion rate is specifically discussed, and an example is presented to show that sample size can affect the confidence in attributing the variation of landslide size distribution to the spatial heterogeneity of rainfall intensity.
The estimation of landslide erosion rate
The variation of landslide size distribution with rainfall intensity
The significance of the observed variances in γ and R between the two subsets (R1 and R2) of the XW dataset (Table 1) was tested. We randomly subdivided the 12,524 landslides into two subsets of 6,297 and 6,227 landslides, repeating the subdivision 1,000 times. The results show that all the differences in parameters observed between the R1 and R2 subsets are statistically significant at a significance level of 0.05, except for the difference in rollover estimated using the double Pareto distribution (−2.51), which has a significance level of about 0.21. Therefore, from a conservative point of view, we attribute the observed variances in rollover to random processes but suggest a physical explanation for the variances in power exponent.
We find that larger cumulative rainfall produces a steeper power law tail of the landslide area distribution (Fig. 2b). This is opposite to the previously reported result that the power law tail becomes flatter with an increase of the cumulative rainfall (Chen 2009). The variation of power exponent with rainfall intensity is essential because it determines whether increased rainfall intensity will increase the relative proportion of small or of large landslides. The explanation of this disagreement goes beyond the scope of this paper. Instead, we suggest a test of significance prior to physical interpretation. The statistical significance of the result published by Chen (2009) cannot be determined exactly since the MLE was not used. Nevertheless, the variance in power exponent of 0.27 was obtained by subdividing a landslide dataset with a sample size of less than 600. This result falls into the region of acceptance according to our numerical experiments (Fig. 4). Therefore, from a statistical point of view, there may not be adequate confidence to exclude the possibility that the variance of power exponent with rainfall intensity observed in Chen (2009) is due to random processes.
A series of numerical experiments was implemented in this paper to systematically quantify the effect of sample size on exploiting landslide area distribution. The results show that, as sample size gets smaller, both the reliability of the parameter estimation and the statistical significance of the variances in parameters observed between different subsets get worse. Therefore, quantitative analysis of landslide hazard and land surface erosion based on the statistics of a landslide dataset with a small sample size may be accompanied by considerable errors. Specifically, with a sample size of 100, the relative error of the estimated landslide erosion rate that has a probability of 50 % can approach 100 %. Furthermore, inconsistency was found regarding how the power exponent of landslide area distribution varies with rainfall intensity. Our numerical results suggest that the variance observed in a dataset with a small sample size may not be statistically significant. Although this study focused on landslide area distribution and adopted the double Pareto distribution and the Inverse Gamma distribution, the presented procedures can also be used to quantify the potential effects of sample size regarding landslide volume distribution and other distribution functions.
Nevertheless, because the results of numerical simulations are affected by the statistical characteristics of the landslide dataset concerned, it is hard to find universally applicable criteria for an adequate (large enough) sample size. This paper therefore proposes a design for testing the potential effects of sample size on landslide size statistics rather than a rule of thumb. It must be emphasized that a larger sample size alone cannot guarantee reliable statistical results; a landslide dataset with a physically representative sample distribution is usually a prerequisite.
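The significance level of an observed difference can be read off the random subdivisions as the share of trials whose difference is at least as extreme as the observed one. The helper below is a minimal sketch of that calculation. It assumes a two-sided comparison and adds one to numerator and denominator, a common convention for randomization tests that the text does not state. The null differences in the usage lines are synthetic placeholders, not values from the XW dataset.

```python
import random


def empirical_significance(observed_diff, null_diffs):
    """Two-sided empirical significance level of observed_diff, given the
    differences obtained from random subdivisions (the null distribution)."""
    extreme = sum(1 for d in null_diffs if abs(d) >= abs(observed_diff))
    # The +1 terms count the observed split itself as one possible outcome,
    # so the result is never exactly zero.
    return (extreme + 1) / (len(null_diffs) + 1)


rng = random.Random(7)
# Placeholder null distribution: 1,000 synthetic differences.
null_diffs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
print(empirical_significance(1.25, null_diffs))
```

A returned value above the chosen level (e.g., 0.05) means the observed difference cannot be separated from random subdivision at that level.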
This research was supported by the National Natural Science Foundation of China (No. 41525010, 41272354 and 41472282) and the Research Foundation for Youth Scholars of IGSNRR, CAS. The authors also wish to thank the Fujian Centre for Geological Environment Monitoring for SPOT images and relevant data, and particularly thank Mr. Zhiwei Wang for helping to prepare the landslide dataset in the Xiayang-Wangtai area.
LP designed the study, carried out the statistical analysis and drafted the manuscript. HX conceived of the study, and participated in its design and coordination and helped to draft the manuscript. YM participated in the statistical analysis. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
- Alvioli, M., F. Guzzetti, and M. Rossi. 2014. Scaling properties of rainfall-induced landslides predicted by a physically based model. Geomorphology 213: 38–47.
- Brunetti, M.T., F. Guzzetti, and M. Rossi. 2009. Probability distributions of landslide volumes. Nonlinear Proc Geophys 16(2): 179–188.
- Caers, J., J. Beirlant, and M.A. Maes. 1999. Statistics for modelling heavy tailed distributions in geology: Part I. Methodology. Mathematical Geoscience 31(4): 391–410.
- Che, V.B., M. Kervyn, G.G.J. Ernst, P. Trefois, S. Ayonghe, P. Jacobs, E. van Ranst, and C.E. Suh. 2011. Systematic documentation of landslide events in Limbe area (Mt Cameroon Volcano, SW Cameroon): geometry, controlling, and triggering factors. Natural Hazards 59(1): 47–74.
- Chen, C.Y. 2009. Sedimentary impacts from landslides in the Tachia River Basin, Taiwan. Geomorphology 105(3–4): 355–365.
- Chen, C.C., L. Telesca, C.T. Lee, and Y.S. Su. 2011. Statistical physics of landslides: New paradigm. EPL 95(4): 49001.
- Cheng, Q.M. 2008. Non-linear theory and power-Law models for information integration and mineral resources quantitative assessments. Mathematical Geoscience 40: 503–532.
- Clauset, A., C.R. Shalizi, and M.E.J. Newman. 2009. Power-law distributions in empirical data. SIAM Review 51(4): 661–703.
- Fiorucci, F., M. Cardinali, R. Carlà, M. Rossi, A.C. Mondini, L. Santurri, F. Ardizzone, and F. Guzzetti. 2011. Seasonal landslide mapping and estimation of landslide mobilization rates using aerial and satellite images. Geomorphology 129(1–2): 59–70.
- Frattini, P., and G.B. Crosta. 2013. The role of material properties and landscape morphology on landslide size distributions. Earth and Planetary Science Letters 361: 310–319.
- Ghosh, S., C.J. van Westen, E.J.M. Carranza, V.G. Jetten, M. Cardinali, M. Rossi, and F. Guzzetti. 2012. Generating event-based landslide maps in a data-scarce Himalayan environment for estimating temporal and magnitude probabilities. Engineering Geology 128: 49–62.
- Guns, M., and V. Vanacker. 2014. Shifts in landslide frequency–area distribution after forest conversion in the tropical Andes. Anthropocene 6: 75–85.
- Guzzetti, F., P. Reichenbach, M. Cardinali, M. Galli, and F. Ardizzone. 2005. Probabilistic landslide hazard assessment at the basin scale. Geomorphology 72(1–4): 272–299.
- Guzzetti, F., F. Ardizzone, M. Cardinali, M. Galli, P. Reichenbach, and M. Rossi. 2008. Distribution of landslides in the Upper Tiber River basin, central Italy. Geomorphology 96(1–2): 105–122.
- Guzzetti, F., F. Ardizzone, M. Cardinali, M. Rossi, and D. Valigi. 2009. Landslide volumes and landslide mobilization rates in Umbria, central Italy. Earth and Planetary Science Letters 279(3–4): 222–229.
- Hovius, N., C.P. Stark, and P.A. Allen. 1997. Sediment flux from a mountain belt derived by landslide mapping. Geology 25(3): 231–234.
- Hungr, O., S.G. Evans, and J. Hazzard. 1999. Magnitude and frequency of rock falls and rock slides along the main transportation corridors of southwestern British Columbia. Canadian Geotechnical Journal 36(2): 224–238.
- Issler, D., F.V. De Blasio, A. Elverhøi, P. Bryn, and R. Lien. 2005. Scaling behaviour of clay-rich submarine debris flows. Mar Petrol Geol 22(1–2): 187–194.
- Iwahashi, J., S. Watanabe, and T. Furuya. 2003. Mean slope-angle frequency distribution and size frequency distribution of landslide masses in Higashikubiki area, Japan. Geomorphology 50(4): 349–364.
- Katz, O., and E. Aharonov. 2006. Landslides in vibrating sand box: What controls types of slope failure and frequency magnitude relations? Earth and Planetary Science Letters 247(3–4): 280–294.
- Klar, A., E. Aharonov, B. Kalderon-Asael, and O. Katz. 2011. Analytical and observational relations between landslide volume and surface area. Journal of Geophysical Research 116(F02), F02001.
- Kolyukhin, D., and J. Tveranger. 2014. Statistical analysis of fracture-length distribution sampled under the truncation and censoring effects. Mathematical Geoscience 46: 733–746.
- Korup, O., T. Görüm, and Y. Hayakawa. 2012. Without power? Landslide inventories in the face of climate change. Earth Surf Proc Land 37(1): 92–99.
- Larsen, I.J., and D.R. Montgomery. 2012. Landslide erosion coupled to tectonics and river incision. Nature Geoscience 5(7): 468–473.
- Larsen, I.J., D.R. Montgomery, and O. Korup. 2010. Landslide erosion controlled by hillslope material. Nature Geoscience 3(4): 247–251.
- Lazzeroni, L.C., and A. Ray. 2012. The cost of large numbers of hypothesis tests on power, effect size and sample size. Molecular Psychiatry 17(1): 108–114.
- Lehmann, P., and D. Or. 2012. Hydromechanical triggering of landslides: From progressive local failures to mass release. Water Resources Research 48(3), W03535.
- Li, L.P., H.X. Lan, and Y.M. Wu. 2012. Comment on “Statistical physics of landslides: New paradigm” by Chen C.-c. et al. EPL 100(2): 29001.
- Li, L.P., H.X. Lan, and Y.M. Wu. 2014. The volume-to-surface-area ratio constrains the rollover of the power law distribution for landslide size. Eur Phys J Plus 129(5): 89.
- Mackey, B.H., and J.J. Roering. 2011. Sediment yield, spatial characteristics, and the long-term evolution of active earthflows determined from airborne LiDAR and historical aerial photographs, Eel River, California. Geological Society of America Bulletin 123(7–8): 1560–1576.
- Malamud, B.D., D.L. Turcotte, F. Guzzetti, and P. Reichenbach. 2004. Landslide inventories and their statistical properties. Earth Surf Proc Land 29(6): 687–711.
- Montgomery, D.R., K. Sullivan, and H.M. Greenberg. 1998. Regional test of a model for shallow landsliding. Hydrological Processes 12(6): 943–955.
- Pelletier, J.D., B.D. Malamud, T. Blodgett, and D.L. Turcotte. 1997. Scale-invariance of soil moisture variability and its implications for the frequency–size distribution of landslides. Engineering Geology 48(3–4): 255–268.
- Pinto, C.M.A., A. Mendes Lopes, and J.A. Tenreiro Machado. 2012. A review of power laws in real life phenomena. Commun Nonlinear Sci Numer Simuln 17(9): 3558–3578.
- Regmi, N.R., J.R. Giardino, and J.D. Vitek. 2014. Characteristics of landslides in western Colorado, USA. Landslides 11(4): 589–603.
- Santangelo, M., D. Gioia, M. Cardinali, F. Guzzetti, and M. Schiattarella. 2013. Interplay between mass movement and fluvial network organization: An example from southern Apennines, Italy. Geomorphology 188: 54–67.
- Stark, C.P., and F. Guzzetti. 2009. Landslide rupture and the probability distribution of mobilized debris volumes. Journal of Geophysical Research 114: F00A02.
- Stark, C.P., and N. Hovius. 2001. The characterization of landslide size distributions. Geophysical Research Letters 28(6): 1091–1094.
- ten Brink, U.S., R. Barkan, B.D. Andrews, and J.D. Chaytor. 2009. Size distributions and failure initiation of submarine and subaerial landslides. Earth and Planetary Science Letters 287(1–2): 31–42.
- Tsai, Z.X., G.J.Y. You, H.Y. Lee, and Y.J. Chiu. 2013. Modeling the sediment yield from landslides in the Shihmen Reservoir watershed, Taiwan. Earth Surf Proc Land 38(7): 661–674.