9. Extreme value analysis

This module provides a wrapper for the Extremes.jl package, enabling its integration with Xarray. The Extremes.jl package allows the analysis of extreme values, providing the following features:

  • Methods for block maxima and threshold exceedances (e.g., genextreme, gumbel_r, genpareto).

  • Parameter estimation using probability-weighted moments, maximum likelihood, and Bayesian procedures.

  • Stationary and non-stationary models.

  • Return level estimation.

For more information about the Extremes.jl package, see: https://doi.org/10.18637/jss.v109.i06 and https://github.com/jojal5/Extremes.jl

[1]:
import holoviews as hv
import hvplot.xarray  # noqa
import numpy as np
import pandas as pd
import panel as pn
import pooch

from xhydro.extreme_value_analysis.parameterestimation import fit, return_level
from xhydro.testing.helpers import deveraux
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython

9.1. Data

This example uses GFDL ESM4 annual total precipitation data from 1955 to 2100 for 97 virtual stations in Quebec.

[2]:
eva_data = deveraux().fetch("extreme_value_analysis/NOAA_GFDL_ESM4.zip", pooch.Unzip())

df = pd.read_csv(eva_data[0], parse_dates=[0])[
    ["time", "station_num", "station_name", "total_precip"]
]
ds = df.to_xarray()
ds = ds.set_coords(["time", "station_num", "station_name"])
ds = ds.set_index(index=["station_num", "time"])
ds = ds.unstack("index")
Downloading file 'extreme_value_analysis/NOAA_GFDL_ESM4.zip' from 'https://raw.githubusercontent.com/hydrologie/xhydro-testdata/v2025.1.14/data/extreme_value_analysis/NOAA_GFDL_ESM4.zip' to 'C:\Users\ospin\AppData\Local\xhydro-testdata\xhydro-testdata\Cache\v2025.1.14'.
Unzipping contents of 'C:\Users\ospin\AppData\Local\xhydro-testdata\xhydro-testdata\Cache\v2025.1.14\extreme_value_analysis\NOAA_GFDL_ESM4.zip' to 'C:\Users\ospin\AppData\Local\xhydro-testdata\xhydro-testdata\Cache\v2025.1.14\extreme_value_analysis\NOAA_GFDL_ESM4.zip.unzip'
[3]:
station_sel = pn.widgets.Select(
    name="Select Station", options=ds.coords["station_num"].values.tolist()
)


def plot_station_data(station_num):
    station_data = ds.total_precip.sel(station_num=station_num)
    plot = station_data.hvplot.line(x="time", title=f"Virtual station {station_num}")
    return plot


interactive_plot = pn.bind(plot_station_data, station_num=station_sel)
pn.Column(station_sel, interactive_plot).servable()
[3]:

9.2. Parameter estimation for stationary model

GenExtreme parameter estimation is performed using maximum likelihood. Confidence intervals are calculated with 95% confidence.

[4]:
fit_stationary = fit(
    ds,
    dist="genextreme",
    method="ml",
    variables=["total_precip"],
    confidence_level=0.95,
)
fit_stationary
[4]:
<xarray.Dataset> Size: 8kB
Dimensions:             (station_num: 97, dparams: 3)
Coordinates:
  * station_num         (station_num) int64 776B 1001 1004 1008 ... 15403 15414
  * dparams             (dparams) <U5 60B 'shape' 'loc' 'scale'
Data variables:
    total_precip        (station_num, dparams) float64 2kB 0.1816 ... 148.2
    total_precip_lower  (station_num, dparams) float64 2kB 0.08659 ... 130.3
    total_precip_upper  (station_num, dparams) float64 2kB 0.2766 ... 168.5
Attributes:
    long_name:         genextreme parameters
    dist:              genextreme
    method:            Maximum likelihood
    confidence_level:  0.95

9.3. Return level estimation for stationary model

100-year return level for the Gumbel distribution using maximum likelihood. Confidence intervals are calculated with 95% confidence.

[5]:
rtnlv_stationary = return_level(
    ds,
    dist="gumbel_r",
    method="ml",
    variables=["total_precip"],
    confidence_level=0.95,
    return_period=100,
)
rtnlv_stationary
[5]:
<xarray.Dataset> Size: 3kB
Dimensions:             (station_num: 97, return_level: 1)
Coordinates:
  * station_num         (station_num) int64 776B 1001 1004 1008 ... 15403 15414
  * return_level        (return_level) <U12 48B 'return_level'
Data variables:
    total_precip        (station_num, return_level) float64 776B 1.704e+03 .....
    total_precip_lower  (station_num, return_level) float64 776B 1.609e+03 .....
    total_precip_upper  (station_num, return_level) float64 776B 1.8e+03 ... ...
Attributes:
    long_name:         Return level estimation
    dist:              gumbel_r
    method:            Maximum likelihood
    return_period:     100
    confidence_level:  0.95

9.4. Parameter estimation for non-stationary model

For this example the location parameter varies as a linear function of the year. To do this, a new dimension containing the year is created.

[6]:
ds2 = ds.copy(deep=True)
ds2["year"] = (
    ["station_num", "time"],
    np.tile(pd.to_datetime(ds.time).year, (ds2.dims["station_num"], 1)),
)
ds2
[6]:
<xarray.Dataset> Size: 285kB
Dimensions:       (station_num: 97, time: 146)
Coordinates:
  * station_num   (station_num) int64 776B 1001 1004 1008 ... 14586 15403 15414
  * time          (time) datetime64[ns] 1kB 1955-01-01 1956-01-01 ... 2100-01-01
    station_name  (station_num, time) object 113kB 'Dozois' ... 'Romaine 4 Sud'
Data variables:
    total_precip  (station_num, time) float64 113kB 983.6 ... 1.065e+03
    year          (station_num, time) int32 57kB 1955 1956 1957 ... 2099 2100
[7]:
fit_non_stationary = fit(
    ds2,
    dist="genextreme",
    method="ml",
    variables=["total_precip"],
    locationcov=["year"],
    confidence_level=0.95,
)
fit_non_stationary
[7]:
<xarray.Dataset> Size: 10kB
Dimensions:             (station_num: 97, dparams: 4)
Coordinates:
  * station_num         (station_num) int64 776B 1001 1004 1008 ... 15403 15414
  * dparams             (dparams) <U18 288B 'shape' 'loc' ... 'scale'
Data variables:
    total_precip        (station_num, dparams) float64 3kB 0.2051 ... 137.5
    total_precip_lower  (station_num, dparams) float64 3kB 0.1022 ... 120.9
    total_precip_upper  (station_num, dparams) float64 3kB 0.308 ... 156.4
Attributes:
    long_name:         genextreme parameters
    dist:              genextreme
    method:            Maximum likelihood
    confidence_level:  0.95

9.5. Return level estimation for non-stationary model

100-year return level with the location parameter varies as a linear function of the year.

[8]:
rtnlv_non_stationary = return_level(
    ds2,
    dist="genextreme",
    method="ml",
    variables=["total_precip"],
    locationcov=["year"],
    confidence_level=0.95,
    return_period=100,
)

rtnlv_non_stationary
[8]:
<xarray.Dataset> Size: 342kB
Dimensions:             (station_num: 97, return_level: 146)
Coordinates:
  * station_num         (station_num) int64 776B 1001 1004 1008 ... 15403 15414
  * return_level        (return_level) datetime64[ns] 1kB 1955-01-01 ... 2100...
Data variables:
    total_precip        (station_num, return_level) float64 113kB 1.361e+03 ....
    total_precip_lower  (station_num, return_level) float64 113kB 1.274e+03 ....
    total_precip_upper  (station_num, return_level) float64 113kB 1.448e+03 ....
Attributes:
    long_name:         Return level estimation
    dist:              genextreme
    method:            Maximum likelihood
    return_period:     100
    confidence_level:  0.95

9.6. Comparison of the return level using the stationary and non-stationary model

[9]:
station_selector = pn.widgets.Select(
    name="Select Station",
    options=rtnlv_non_stationary.coords["station_num"].values.tolist(),
)


def plot_station_data(station_num):

    non_stationary = rtnlv_non_stationary.sel(station_num=station_num)
    stationary = rtnlv_stationary.sel(station_num=station_num)

    line_plot_sta = hv.Curve(
        (non_stationary.return_level.values, stationary.total_precip.values[0]),
        label="Stationary",
    ).opts(color="blue")
    area_plot = hv.Area(
        (
            list(non_stationary.return_level.values),
            stationary.total_precip_lower.values[0],
            stationary.total_precip_upper.values[0],
        ),
        vdims=["y", "y2"],
    ).opts(alpha=0.3, color="blue", line_width=0)

    area_plot_non = non_stationary.hvplot.area(
        x="return_level",
        y="total_precip_lower",
        y2="total_precip_upper",
        alpha=0.3,
        color="red",
        line_width=0,
    )
    line_plot_non = non_stationary["total_precip"].hvplot.line(
        x="return_level", label="Non-stationary", color="red"
    )

    plot = area_plot * line_plot_sta * area_plot_non * line_plot_non

    plot = plot.opts(
        title=f"Virtual Station {station_num}",
        xlabel="Return Level (Time)",
        ylabel="Total Precipitation",
    )

    return plot


interactive_plot = pn.bind(plot_station_data, station_num=station_selector)


pn.Column(station_selector, interactive_plot).servable()
[9]:

9.7. Working with dask.array chunks

Currently, the extreme_value_analysis function is not thread-safe. Therefore, it is recommended to use the dask.scheduler= »processes » option to ensures that tasks are executed in separate Python processes.

[10]:
ds_c = ds.chunk({"time": -1, "station_num": 1})

fit_stationary_c = fit(
    ds_c,
    dist="genextreme",
    method="ml",
    variables=["total_precip"],
    confidence_level=0.95,
)
fit_stationary_c
C:\Users\ospin\OneDrive\Documents\Github\xhydro\src\xhydro\extreme_value_analysis\parameterestimation.py:215: UserWarning: Dataset contains chunks. It is recommended to use scheduler='processes' to compute the results.
[10]:
<xarray.Dataset> Size: 8kB
Dimensions:             (station_num: 97, dparams: 3)
Coordinates:
  * station_num         (station_num) int64 776B 1001 1004 1008 ... 15403 15414
  * dparams             (dparams) <U5 60B 'shape' 'loc' 'scale'
Data variables:
    total_precip        (station_num, dparams) float64 2kB dask.array<chunksize=(1, 3), meta=np.ndarray>
    total_precip_lower  (station_num, dparams) float64 2kB dask.array<chunksize=(1, 3), meta=np.ndarray>
    total_precip_upper  (station_num, dparams) float64 2kB dask.array<chunksize=(1, 3), meta=np.ndarray>
Attributes:
    long_name:         genextreme parameters
    dist:              genextreme
    method:            Maximum likelihood
    confidence_level:  0.95
[11]:
fit_stationary_c.compute(scheduler="processes")
[11]:
<xarray.Dataset> Size: 8kB
Dimensions:             (station_num: 97, dparams: 3)
Coordinates:
  * station_num         (station_num) int64 776B 1001 1004 1008 ... 15403 15414
  * dparams             (dparams) <U5 60B 'shape' 'loc' 'scale'
Data variables:
    total_precip        (station_num, dparams) float64 2kB 0.1816 ... 148.2
    total_precip_lower  (station_num, dparams) float64 2kB 0.08659 ... 130.3
    total_precip_upper  (station_num, dparams) float64 2kB 0.2766 ... 168.5
Attributes:
    long_name:         genextreme parameters
    dist:              genextreme
    method:            Maximum likelihood
    confidence_level:  0.95
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython
Detected IPython. Loading juliacall extension. See https://juliapy.github.io/PythonCall.jl/stable/compat/#IPython