GIS is employed to categorize and compartmentalize unique attributes from the datasets into equal-interval 10 × 10 km grids covering the entire state. Grid-level analysis resolves weather patterns with greater specificity than the broad, low-precision county-level analyses previously conducted by federal and state agencies (e.g. FEMA 2002, 2008; NCDC 2013). The complete workflow, including the regression analysis steps, is shown in the flowchart in Fig. 5 and explained below.

### Gridding and standardizing input data

Fishnetting allows storm tracks to be standardized into grid cells, supporting field summing as well as later analysis of the original attributes. Grid size is standardized to 10 × 10 km in this study. A 1-arc-second digital elevation model (DEM) of the state is classified into ten classes using an interval of 82.66 m, closely mirroring a stretch classification method. These elevation attributes are then joined to the 10 × 10 km grid. The core operation in this analysis is the spatial join tool in ArcGIS (release 10.5.1), which offers two relevant options: (1) one-to-one, which maintains a 1:1 ratio and can sum attribute totals for each respective cell, and (2) one-to-many, which adds user-selected attributes from a line, representing a storm track, to every grid cell it intersects. The one-to-many spatial join is used in this study to model event frequencies for each respective weather hazard.
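As a rough illustration of what the one-to-many join produces, the per-cell event frequency can be sketched in Python. This is a stand-in for the ArcGIS tool, not the study's actual workflow; the coordinates, track data, and densification step are invented for this example.

```python
import numpy as np

# Hypothetical sketch: tally how many storm-track events touch each
# 10 x 10 km cell (a stand-in for the ArcGIS one-to-many spatial join).
# Track vertices are in metres; all names and data here are invented.
CELL = 10_000  # 10 km cell size

tracks = {
    1: [(1_000, 5_000), (19_000, 5_000)],   # crosses two cells
    2: [(2_000, 12_000), (8_000, 18_000)],  # stays in one cell
}

def cells_touched(vertices, step=500):
    """Densify a track and return the set of (col, row) cells it enters."""
    touched = set()
    for (x0, y0), (x1, y1) in zip(vertices, vertices[1:]):
        n = max(2, int(np.hypot(x1 - x0, y1 - y0) / step))
        for t in np.linspace(0.0, 1.0, n):
            x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            touched.add((int(x // CELL), int(y // CELL)))
    return touched

# One event contributes one count to every cell its track intersects.
frequency = {}
for event_id, verts in tracks.items():
    for cell in cells_touched(verts):
        frequency[cell] = frequency.get(cell, 0) + 1
```

The resulting per-cell counts play the role of the event-frequency attribute joined back onto the fishnet.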

### Creating severity indices

Initially, the event frequency and magnitude vectors are converted to raster, where the cell values of each respective variable become a grid-code output; frequency and magnitude are then combined for each weather event. A severity index is then established for each respective weather event, and the three components of the triad are combined into a final statewide severity index using this simple formula:

$$ SSI= TS\times DS\times HS $$

(1)

where *SSI* is the statewide severity index, *TS* is the tornado severity, *DS* is the derecho severity, and *HS* is the hail severity.
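Since each severity layer shares the same fishnet, Eq. (1) amounts to a cell-wise product of three grids. A minimal numpy sketch with invented toy values:

```python
import numpy as np

# Minimal sketch of Eq. (1): combine the three per-cell severity grids
# into the statewide severity index by cell-wise multiplication.
# The toy 2 x 2 arrays below are invented for illustration.
TS = np.array([[1.0, 2.0], [3.0, 0.5]])  # tornado severity
DS = np.array([[2.0, 1.0], [1.0, 2.0]])  # derecho severity
HS = np.array([[1.0, 1.0], [2.0, 4.0]])  # hail severity

SSI = TS * DS * HS  # element-wise product per grid cell
```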

### Exploratory regression

Regression analyses provide a means of exploring data trends, offering statistical scrutiny of influential spatial patterns. The exploratory regression (ER) tool in ArcGIS (10.5.1) provides a simple means of trial-and-error experimentation, allowing the analyst to narrow down the factors that may influence the dependent variable. ER is employed in this study as a first-step investigation before conducting an OLS regression on the most influential variables. The explanatory variables (EVs) considered in this analysis are trailer parks, elevation, topographic protection, and physiographic ecological subregions. These variables are chosen based on results from previous studies (e.g. LaPenta et al. 2005; Bosart et al. 2006; Frame and Markowski 2006; Markowski and Dotzek 2011; Gaffin 2012; Karstens et al. 2013; Lyza and Knupp 2013) showing that topography, elevation, land-cover features, and windward aspects of topographic features directly influence the strength and subsequent severity of weather events. Several statistical properties are used to determine the strength of the EVs.
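The trial-and-error spirit of ER can be sketched as an exhaustive search over EV combinations ranked by adjusted R². The variable names echo those above, but the data are synthetic and ranking by adjusted R² alone is a simplification of the full set of ER diagnostics:

```python
import itertools
import numpy as np

# Hedged sketch of exploratory regression: fit OLS on every combination
# of candidate explanatory variables and rank the fits by adjusted R^2.
# The data are synthetic; the dependent variable is driven by elevation.
rng = np.random.default_rng(0)
n = 200
data = {
    "elevation": rng.normal(size=n),
    "trailer_parks": rng.normal(size=n),
    "topo_protection": rng.normal(size=n),
}
y = 2.0 * data["elevation"] + 0.1 * rng.normal(size=n)

def adj_r2(y, X):
    """Adjusted R^2 per Eq. (2): 1 - (SSErr/(n-k)) / (SSTot/(n-1))."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_err, ss_tot = resid @ resid, ((y - y.mean()) ** 2).sum()
    k = X1.shape[1]  # number of coefficients, intercept included
    return 1 - (ss_err / (len(y) - k)) / (ss_tot / (len(y) - 1))

results = {}
for r in (1, 2, 3):
    for combo in itertools.combinations(data, r):
        X = np.column_stack([data[v] for v in combo])
        results[combo] = adj_r2(y, X)

best = max(results, key=results.get)  # combo with highest adjusted R^2
```

The best-ranked combinations would then be carried forward into the confirmatory OLS step.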

The coefficient of determination, referred to as adjusted *R*^{2}, is evaluated by Steel and Torrie (1960) as:

$$ {R}_{adj}^2=1-\left[\frac{\frac{SSError}{\left(n-k\right)}}{\frac{SSTotal}{n-1}}\right] $$

(2)

where *R* is the multiple regression coefficient, *k* denotes the number of coefficients used in the regression, *n* the number of observations, *SSError* the sum of squared errors, and *SSTotal* the total sum of squares.
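A quick numeric check of Eq. (2), with invented sums of squares:

```python
# Direct numeric instance of Eq. (2) on an invented toy fit:
# n = 10 observations, k = 3 coefficients, SSError = 12, SSTotal = 100.
n, k = 10, 3
ss_error, ss_total = 12.0, 100.0
r2_adj = 1 - (ss_error / (n - k)) / (ss_total / (n - 1))
# (12/7) / (100/9) = 0.15428..., so r2_adj is roughly 0.8457
```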

The statistical *t-test* developed by Gosset (1908) can be simplified as:

$$ t=\frac{\overline{X}-\mu }{s/\sqrt{n}} $$

(3)

where \( \overline{X} \) is the mean of a sample *X*_{1}, *X*_{2}, …, *X*_{n} of size *n*, drawn from a normally distributed population with mean *μ* and standard deviation *σ* (variance *σ*^{2}), and *s* is the sample standard deviation.
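A worked instance of the one-sample t-statistic on a small invented sample:

```python
import numpy as np

# Worked instance of the one-sample t-statistic, t = (mean - mu)/(s/sqrt(n)),
# on a small invented sample with hypothesised population mean mu = 10.
sample = np.array([9.8, 10.2, 10.1, 9.9, 10.0, 10.3])
mu = 10.0
x_bar = sample.mean()
s = sample.std(ddof=1)          # sample standard deviation
t = (x_bar - mu) / (s / np.sqrt(len(sample)))
```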

The Koenker (BP) statistic, a chi-squared test for heteroscedasticity originally developed by Breusch and Pagan (1979) and later adapted by Koenker (1981), is expressed as:

$$ LM=\frac{1}{2}\left[\frac{N}{n\left(N-n\right)}\right]{\left[{\sum}_t^n\left(\frac{{\widehat{u}}_t^2}{{\widehat{\sigma}}^2}\right)-n\right]}^2 $$

(4)

in which *LM* is a Lagrange multiplier, *N* denotes the number of observations, *n* the sample size, \( {\widehat{u}}_t^2 \) are the squared regression residuals, and \( {\widehat{\sigma}}^2 \) is the estimated residual variance.
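The studentized (Koenker) variant of the test is commonly computed as *n* times the R² of an auxiliary regression of the squared residuals on the explanatory variables. A hedged numpy sketch on synthetic homoscedastic data, where the statistic should stay small:

```python
import numpy as np

# Sketch of the Koenker (studentized Breusch-Pagan) test: regress the
# squared OLS residuals on the explanatory variables and take
# LM = n * R^2 of that auxiliary regression (chi-squared distributed).
# Data are synthetic with constant error variance (homoscedastic).
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u2 = (y - X @ beta) ** 2                 # squared residuals

# Auxiliary regression of u^2 on the same explanatory variables.
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ g
r2_aux = 1 - ((u2 - fitted) ** 2).sum() / ((u2 - u2.mean()) ** 2).sum()
LM = n * r2_aux
```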

Akaike’s Information Criterion correction (AICc) estimates the relative quality of a given statistical model; it is grounded in information theory and serves as a means of ranking the quality of multiple models with respect to one another. AICc is based on the Akaike Information Criterion (AIC) (Akaike 1973, 1974, 2010) and corrects for finite sample size:

$$ AICc= AIC+\frac{2k\ \left(k+1\right)}{n-k-1} $$

(5)

with *k* denoting the number of parameters and *n*, the sample size (e.g. Burnham and Anderson 2002; Konishi and Kitagawa 2008).
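Eq. (5) in numbers, with invented values for AIC, *k*, and *n*:

```python
# Numeric instance of Eq. (5): the small-sample correction added to AIC.
# Invented values: AIC = 100 with k = 5 parameters and n = 30 samples.
aic, k, n = 100.0, 5, 30
aicc = aic + (2 * k * (k + 1)) / (n - k - 1)
# correction term = 60 / 24 = 2.5
```

The correction vanishes as *n* grows large relative to *k*, so AICc and AIC converge for big samples.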

The Jarque-Bera statistical test checks whether the sample skewness and kurtosis match those of a normal distribution through:

$$ JB=\frac{n-k+1}{6}\ \left({S}^2+\frac{1}{4}{\left(C-3\right)}^2\right) $$

(6)

in which *S* is the sample skewness, *C* the sample kurtosis, *n* the number of observations, and *k* the number of regressors (e.g. Jarque and Bera 1980, 1981, 1987).
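Eq. (6) as written, computed on a synthetic near-normal sample. Taking *k* = 1 is an assumption made here for a plain sample with no regressors beyond the mean:

```python
import numpy as np

# Sketch of the Jarque-Bera statistic per Eq. (6). S and C are the sample
# skewness and (non-excess) kurtosis; data are synthetic near-normal draws,
# so JB should be small.
rng = np.random.default_rng(2)
x = rng.normal(size=2_000)

m2 = ((x - x.mean()) ** 2).mean()
m3 = ((x - x.mean()) ** 3).mean()
m4 = ((x - x.mean()) ** 4).mean()
S = m3 / m2 ** 1.5          # skewness (0 for a normal distribution)
C = m4 / m2 ** 2            # kurtosis (3 for a normal distribution)

n, k = len(x), 1            # k = 1 regressor is an assumption here
JB = (n - k + 1) / 6 * (S ** 2 + 0.25 * (C - 3) ** 2)
```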

The reciprocal of tolerance, also known as the Variance Inflation Factor (VIF, of which the maximum is reported) (Belsley et al. 1980; Belsley 1984; O’brien 2007), can be expressed as:

$$ VIF=\frac{1}{1-{R}_i^2} $$

(7)

where the tolerance of the *i*th variable is \( 1-{R}_i^2 \), with \( {R}_i^2 \) the proportion of variance in the *i*th variable explained by the other explanatory variables (O’brien 2007).
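Eq. (7) in numbers, with an invented \( {R}_i^2 \):

```python
# Direct reading of Eq. (7): VIF_i = 1 / (1 - R_i^2), where R_i^2 comes
# from regressing the i-th explanatory variable on all the others.
# Invented value: R_i^2 = 0.8 gives VIF = 5, below the 7.5 redundancy cue.
r2_i = 0.8
tolerance = 1 - r2_i
vif = 1 / tolerance
```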

Spatial Autocorrelation (SA) essentially draws on a Global Moran’s *I* value, grounded in Tobler’s (1970) first law of geography, to calculate p-values and z-scores. P-values designate significance probabilities, ranging from 0.10 (weak) through null to < 0.01 (strong). Z-scores represent standard deviations; combined with a strong corresponding p-value, a z-score below − 2.58 or above + 2.58 indicates robust confidence. Moran’s *I* is defined by ESRI (2016) as:

$$ I=\frac{n\ {\sum}_{i=1}^n{\sum}_{j=1}^n{W}_{i,j}\,{z}_i{z}_j}{S_0\ {\sum}_{i=1}^n{z}_i^2}, $$

(8)

where *z*_{i} is the deviation of the attribute of feature *i* from its mean (*x*_{i} − \( \overline{X} \)), *n* denotes the total feature count, *W*_{i,j} is the spatial weight between features *i* and *j*, and *S*_{0} is the aggregate of all spatial weights:

$$ {S}_0={\sum}_{i=1}^n{\sum}_{j=1}^n{W}_{i,j}, $$

(9)

*z*_{I}-scores are calculated with:

$$ {z}_I=\frac{I-E\left[I\right]}{\sqrt{V\left[I\right]}}, $$

(10)

where:

$$ E\left[I\right]=-\frac{1}{n-1}, $$

(11)

$$ V\left[I\right]=E\left[{I}^2\right]-E{\left[I\right]}^2, $$

(12)
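Eqs. (8), (9), and (11) can be verified on a tiny invented strip of cells; the variance term *E*[*I*^{2}] needed for Eq. (12) depends on higher moments of the weight matrix and is omitted in this sketch:

```python
import numpy as np

# Hedged numeric sketch of Eqs. (8), (9), and (11): Global Moran's I with
# a binary contiguity weight matrix on an invented 1-D strip of 5 cells
# whose values trend upward, so positive autocorrelation is expected.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = len(x)
z = x - x.mean()                       # deviations from the mean

# Binary weights: neighbours are adjacent cells (symmetric, zero diagonal).
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
S0 = W.sum()                           # Eq. (9): sum of all weights

I = n * (z @ W @ z) / (S0 * (z @ z))   # Eq. (8)
E_I = -1.0 / (n - 1)                   # Eq. (11): expected I under randomness
```

Here *I* = 0.5 exceeds *E*[*I*] = −0.25, consistent with the positive spatial trend built into the toy values.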

### Ordinary least squares

OLS is perhaps the most commonly used form of regression analysis in GIS. Amemiya (1985) defines it as:

$$ y={\beta}_0+{\beta}_1{X}_1+{\beta}_2{X}_2+{\beta}_3{X}_3+\cdots +{\beta}_n{X}_n+\epsilon $$

(13)

where *y* is the dependent variable, i.e. the variable the model predicts or explains, expressed as a function of the explanatory variables *X*; the *β* are regression coefficients calculated by algorithms running in the GIS background, *β*_{0} is the regression intercept representing the expected value of *y* when all EVs are zero, and *ε* is the random error (residual) term.

As part of the OLS process, we run an SA analysis utilizing Global Moran’s *I*, which determines whether the spatial distribution of the EVs and their impacts could be due to chance. Other statistical outputs included in the final OLS are: (1) *StdError* and (2) *Robust*_{SE}, which are standard errors of the coefficient estimates; (3) the *t*-statistic and (4) *Robust*_{t}, which are ratios between an estimated parameter value and a hypothesized value relative to the standard error; (5) probability and (6) robust probability (*Pr*), which flag statistically significant coefficients (*p* < 0.01), with robust probability used to determine coefficient significance when the *Koenker* statistic is significant; (7) *VIF* values (> 7.5), which are indicative of redundancy; (8) the *Joint Wald* statistic, which helps determine the model’s overall significance when the *Koenker* value is significant; and finally (9) *AICc* and (10) *R*^{2}, which are measures of the model’s overall fit and performance.

### Quantile classification

Quantile classification is used for the symbology of all choropleth maps. Quantile classification is chosen because it creates classes containing an equal number of units (e.g. Cromley 1996; Brewer and Pickle 2002; Burnham and Anderson 2002; Xiao et al. 2007; Sun et al. 2015). It most closely represents the input data trends, which are poorly represented by other classification methods such as Jenks natural breaks, equal interval, standard deviation, and geometric classification.
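Quantile class breaks amount to cutting the attribute at equal-count percentiles. A short numpy sketch with invented values:

```python
import numpy as np

# Sketch of quantile classification: break an invented attribute into
# five classes with roughly equal counts per class, as a choropleth
# symbology step would.
values = np.array([1, 2, 2, 3, 5, 8, 13, 21, 34, 55], dtype=float)
n_classes = 5
breaks = np.quantile(values, np.linspace(0, 1, n_classes + 1))
classes = np.clip(np.searchsorted(breaks, values, side="right") - 1,
                  0, n_classes - 1)
counts = np.bincount(classes, minlength=n_classes)  # units per class
```

Counts per class are only roughly equal here because the toy data contain tied values, which any quantile scheme must assign to a single class.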