Much of statistics and machine learning is concerned with using data to estimate what is most likely to happen. In these settings, large but rare events are often treated as outliers and simply ignored. However, in many problems it is precisely the extreme values that have the largest impact.

Events such as floods, earthquakes, tsunamis and financial crashes all carry risks that are driven not by average values but by unusually large ones. Coastal sea defences, for example, are expected to withstand common day-to-day tide levels, but it is not practical to build defences high enough to withstand every plausible sea level. Instead, coastal flood risk may be managed by designing defences to withstand the level that is expected to be exceeded, on average, only once every 100 years. Robust estimates of the likelihood of extreme values are therefore critical for many forms of risk management.

Statistical models typically provide a fitted probability distribution describing the estimated likelihood of both frequent and rare events. In principle, then, standard statistical models fitted to everyday data can be used to estimate the probabilities of rare events in the tail. However, a model that fits the data well overall will not necessarily fit the extreme values well. Such models are typically driven by the bulk of the data and are therefore insensitive to the few largest values.

The rarity of an extreme value is often expressed in terms of how frequently it is expected to occur, known as its return period: a 100-year event is one expected to be exceeded once every 100 years on average. Viewed on the return period scale, extreme value analysis can be seen as a form of extrapolation, for example using 30 years of data to estimate the 100-year return level. While statisticians typically advise against extrapolating beyond the range of the data, it is clearly impractical to collect many more decades of data before robust extreme value estimates can be made. It is therefore critical that any extrapolation towards extreme values is founded on strong theoretical principles so that the resulting estimates can be trusted.
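A return period is a long-run average, not a schedule. If a T-year return level is exceeded in any given year with probability 1/T, and years are assumed to be independent, then the chance of at least one exceedance in an n-year window is 1 - (1 - 1/T)^n. The minimal sketch below computes this. The function name is hypothetical, and the independence between years is an assumption.

```python
def prob_at_least_one_exceedance(return_period_years: float, window_years: int) -> float:
    """Probability that a T-year return level is exceeded at least once in a window.

    Assumes (hypothetically) that each year independently has an exceedance
    probability of 1/T, so the chance of no exceedance is (1 - 1/T)**n.
    """
    p_annual = 1.0 / return_period_years
    return 1.0 - (1.0 - p_annual) ** window_years


# Chance that the 100-year level is exceeded within 30 and within 100 years.
for window in (30, 100):
    print(window, round(prob_at_least_one_exceedance(100, window), 3))
```

Under these assumptions, a 100-year level has about a 26% chance of being exceeded at least once in 30 years. It has about a 63% chance over 100 years, so "once in 100 years" should not be read as "safe for 100 years".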

Extreme value theory is the branch of mathematics and statistics concerned with the properties of rare events. It provides asymptotic probability theorems that justify using particular statistical models for fitting different forms of extreme values. Tail estimates produced by these models can therefore be trusted, as they rest on a strong theoretical foundation. These models can also quantify the uncertainty in the tail estimates, which naturally grows as we extrapolate towards rarer events.

The simplest form of extreme value is the maximum of a large number of independent and identically distributed values. For this case, extreme value theory (specifically the Fisher–Tippett–Gnedenko theorem) shows that if suitably normalised maxima converge to a non-degenerate limit, that limit must be a Generalised Extreme Value (GEV) distribution. The GEV is a flexible distribution whose shape parameter governs the heaviness of the upper tail and hence the likelihood of extremely large values. This theorem motivates the common approach of fitting the GEV distribution to time series of annual maxima.
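The sketch below shows the annual-maxima approach with SciPy's `scipy.stats.genextreme`. It uses 30 years of simulated data, not real observations. Each hypothetical "year" contains 365 i.i.d. daily values from an exponential parent distribution. Maxima of exponential values fall in the Gumbel case of the GEV (shape ξ = 0), so a GEV fit is appropriate. The code fits the GEV by maximum likelihood and reads off return levels, including the 100-year level extrapolated from 30 years of data. Note that SciPy's shape parameter `c` has the opposite sign to the usual ξ convention.

```python
import numpy as np
from scipy.stats import genextreme

# Hypothetical data: 30 "years" of 365 i.i.d. daily values from an exponential
# parent. Maxima of exponentials lie in the Gumbel (xi = 0) case of the GEV.
rng = np.random.default_rng(42)
daily = rng.exponential(scale=1.0, size=(30, 365))
annual_maxima = daily.max(axis=1)

# Maximum likelihood fit. Caution: SciPy parameterises the shape as c = -xi,
# the opposite sign to the usual GEV convention (xi > 0 means a heavy upper tail).
c, loc, scale = genextreme.fit(annual_maxima)
xi = -c


def return_level(return_period_years: float) -> float:
    """Level exceeded with probability 1/T in any one year under the fitted GEV."""
    return float(genextreme.isf(1.0 / return_period_years, c, loc=loc, scale=scale))


print(f"fitted shape xi = {xi:.3f}, location = {loc:.3f}, scale = {scale:.3f}")
for T in (10, 30, 100):
    print(f"{T:>4}-year return level: {return_level(T):.2f}")
```

`genextreme.fit` returns point estimates only. Quantifying their uncertainty, for example via profile likelihood or a bootstrap, is a separate step. That uncertainty is largest for the 100-year level, which lies beyond the 30 years of data.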

Although the likelihood of coastal flooding can be quantified by fitting a model to coastal sea level data, these extreme events are often driven by large offshore wind, wave or surge events that may act in combination. In such cases it is often preferable to model the extreme values of these source variables directly, provided the resulting distribution can be transformed into one for the response variable of interest. More generally, many problems are driven by the extreme values of multiple variables and the relationships between them. For these, multivariate extreme value theory can be applied to extrapolate each variable towards its extremes while also preserving appropriate dependence between the variables.
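The following is a toy illustration of why dependence between variables matters. It is not a multivariate extreme value model. The sketch simulates two hypothetical source variables that share a common driver, such as a single storm. It then counts how often both exceed their individual 99th percentiles and compares this with the 1-in-10,000 rate that assuming independence would imply. The Gaussian model and variable names are assumptions for illustration only. A real analysis would fit a dependence model designed for the extremes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical source variables driven partly by a shared component, which makes
# them positively dependent (correlation 0.5). Toy Gaussian model for illustration.
common = rng.standard_normal(n)
x = common + rng.standard_normal(n)
y = common + rng.standard_normal(n)

# Each variable on its own exceeds its 99th percentile 1% of the time.
x99 = np.quantile(x, 0.99)
y99 = np.quantile(y, 0.99)

joint = np.mean((x > x99) & (y > y99))  # empirical joint exceedance rate
independent = 0.01 * 0.01               # rate implied by assuming independence

print(f"joint exceedance: {joint:.5f}, if independent: {independent:.5f}")
```

Here both variables are jointly extreme far more often than independence would suggest. Ignoring the dependence between source variables can therefore seriously understate the risk of combined events.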

Reliable estimates of extreme values are essential for assessing the risk of many hazards such as flooding and financial crashes. Extreme value theory provides a strong theoretical basis for applying particular models to extrapolate data towards the extremes. With appropriate application of extreme value analysis, the likelihood of large but rare events can be accurately estimated, together with a measure of its uncertainty, to help manage and mitigate these risks.

Dr David Wyncoll is a recent recipient of the Institution of Civil Engineers’ Bill Curtin Medal for Innovation in relation to work on multivariate extreme value modelling of offshore sea conditions with application to national-scale coastal flood risk assessment.