Use Case Example

This example illustrates a use case that covers the essential steps involved in building a hydrological model and conducting a climate change analysis:

  • Identification of the watershed and its key characteristics

    • Beaurivage watershed in Southern Quebec, at the location of the 023401 streamflow gauge.

  • Collection of observed data

    • ERA5-Land and streamflow gauge data.

  • Preparation and calibration of the hydrological model

    • GR4JCN emulated by the Raven hydrological framework.

  • Calculation of hydrological indicators

    • Mean summer flow

    • Mean monthly flow

    • 20- and 100-year maximum flow

    • 2-year minimum 7-day average summer flow

  • Assessment of the impact of climate change

    • Bias-adjusted CMIP6 simulations from the ESPO-G6-R2 dataset

Identification of a watershed and its characteristics

INFO

For more information on this section and available options, consult the GIS notebook.

This first step is highly dependent on the hydrological model. Since we will use GR4JCN in our example, we need to obtain the drainage area, centroid coordinates, and elevation. We’ll also need the watershed delineation to extract the meteorological data. All of these information can be acquired through the xhydro.gis.watershed_to_raven_hru function, which calls upon various functions of that module.

[1]:
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
[2]:
from IPython.display import clear_output

import xhydro.gis as xhgis

clear_output(wait=False)
[3]:
# Watershed delineation
coords = (-71.28878, 46.65692)
gdf = xhgis.watershed_to_raven_hru(coords)
gdf
[3]:
HRU_ID area latitude longitude elevation SubId DowSubId geometry
0 7120365812 585.585577 46.452161 -71.260464 222.55365 1 -1 POLYGON ((-71.09758 46.40035, -71.09409 46.403...

Since xhgis.watershed_delineation extracts the nearest HydroBASINS polygon, the watershed might not exactly correspond to the requested coordinates. The 023401 streamflow gauge as an associated drainage area of 708 km², which differs from our results. Streamflow will have to be adjusted using an area scaling factor.

[4]:
gauge_area = 708
scaling_factor = gdf.iloc[0]["area"] / gauge_area
scaling_factor
[4]:
np.float64(0.8270982719703007)

Collection of observed data

[5]:
import geopandas as gpd
import matplotlib.pyplot as plt
import xarray as xr

# For easy access to the specific streamflow data used here
import xdatasets
import xscen

Meteorological data

INFO

Multiple libraries could be used to perform these steps. For simplicity, this example will use the subset and aggregate modules of the xscen library.

This example will use daily ERA5-Land data hosted on the PAVICS platform.

[6]:
# Extraction of ERA5-Land data
meteo_ref = xr.open_dataset(
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/datasets/reanalyses/day_ERA5-Land_NAM.ncml",
    engine="netcdf4",
    chunks={"time": 365, "lon": 50, "lat": 50},
)[["pr", "tasmin", "tasmax"]]
meteo_ref
[6]:
<xarray.Dataset> Size: 454GB
Dimensions:  (time: 27790, lat: 801, lon: 1700)
Coordinates:
  * time     (time) datetime64[ns] 222kB 1950-01-01 1950-01-02 ... 2026-01-31
  * lat      (lat) float32 3kB 10.0 10.1 10.2 10.3 10.4 ... 89.7 89.8 89.9 90.0
  * lon      (lon) float32 7kB -179.9 -179.8 -179.7 -179.6 ... -10.2 -10.1 -10.0
Data variables:
    pr       (time, lat, lon) float32 151GB dask.array<chunksize=(365, 50, 50), meta=np.ndarray>
    tasmin   (time, lat, lon) float32 151GB dask.array<chunksize=(365, 50, 50), meta=np.ndarray>
    tasmax   (time, lat, lon) float32 151GB dask.array<chunksize=(365, 50, 50), meta=np.ndarray>
Attributes: (12/30)
    Conventions:               CF-1.9
    cell_methods:              time: mean (interval: 1 day)
    doi:                       https://doi.org/10.24381/cds.e2161bac
    domain:                    NAM
    frequency:                 day
    history:                   [2022-12-25 09:07:39.901698] Converted variabl...
    ...                        ...
    institute_id:              ECMWF
    dataset_id:                ERA5-Land
    abstract:                  ERA5-Land provides hourly high resolution info...
    dataset_description:       https://www.ecmwf.int/en/era5-land
    attribution:               Contains modified Copernicus Climate Change Se...
    citation:                  Muñoz Sabater, J., (2021): ERA5-Land hourly da...

That dataset covers the entire globe and has more than 70 years of data. The first step will thus be to subset the dataset both spatially and temporally. For the spatial subset, the GeoDataFrame obtained earlier can be used.

[7]:
meteo_ref = meteo_ref.sel(time=slice("1991", "2020"))  # Temporal subsetting
meteo_ref = xscen.spatial.subset(
    meteo_ref, method="shape", tile_buffer=2, shape=gdf
)  # Spatial subsetting, with a buffer of 2 grid cells
meteo_ref
[7]:
<xarray.Dataset> Size: 9MB
Dimensions:  (time: 10958, lat: 8, lon: 8)
Coordinates:
  * time     (time) datetime64[ns] 88kB 1991-01-01 1991-01-02 ... 2020-12-31
  * lat      (lat) float32 32B 46.1 46.2 46.3 46.4 46.5 46.6 46.7 46.8
  * lon      (lon) float32 32B -71.6 -71.5 -71.4 -71.3 -71.2 -71.1 -71.0 -70.9
Data variables:
    pr       (time, lat, lon) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
    tasmin   (time, lat, lon) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
    tasmax   (time, lat, lon) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
Attributes: (12/30)
    Conventions:               CF-1.9
    cell_methods:              time: mean (interval: 1 day)
    doi:                       https://doi.org/10.24381/cds.e2161bac
    domain:                    NAM
    frequency:                 day
    history:                   [2026-03-31 14:35:41] shape spatial subsetting...
    ...                        ...
    institute_id:              ECMWF
    dataset_id:                ERA5-Land
    abstract:                  ERA5-Land provides hourly high resolution info...
    dataset_description:       https://www.ecmwf.int/en/era5-land
    attribution:               Contains modified Copernicus Climate Change Se...
    citation:                  Muñoz Sabater, J., (2021): ERA5-Land hourly da...
[8]:
ax = plt.subplot(1, 1, 1)
meteo_ref.tasmin.isel(time=0).plot(ax=ax)
gdf.plot(ax=ax)
[8]:
<Axes: title={'center': 'time = 1991-01-01'}, xlabel='longitude [degrees_east]', ylabel='latitude [degrees_north]'>
../_images/notebooks_use_case_12_1.png

Raven expects temperatures in Celsius and precipitation in millimetres, but they currently are in CF-compliant Kelvin and kg m-2 s-1, respectively. The xhydro.modelling.format_input function can be used to prepare data for Raven modelling. It handles unit conversion, variable renaming, and coordinate formatting to ensure compatibility with RavenPy. In the case of gridded meteorological data—as in this example—xHydro calls functions available in RavenPy to assign weights to each grid cell based on the portion that overlaps with the watershed. Alternatively, the data could be aggregated manually before being passed to the model.

For simplification matters, the grid’s elevation will be set at a flat 450 m. Computing grid cell elevation in ERA5-Land is not always trivial and is not within the scope of this example.

[9]:
from pathlib import Path
import tempfile
notebook_folder = Path(tempfile.TemporaryDirectory().name)

import xhydro as xh

# Add altitude data
meteo_ref = meteo_ref.assign_coords(
    {"elevation": xr.ones_like(meteo_ref.pr.isel(time=0).drop_vars("time")) * 450}
)
meteo_ref["elevation"].attrs = {"units": "m"}

meteo_ref, config_meteo_ref = xh.modelling.format_input(
    meteo_ref, model="GR4JCN", save_as=notebook_folder / "_data" / "meteo.nc"
)
meteo_ref
[9]:
<xarray.Dataset> Size: 9MB
Dimensions:    (time: 10958, latitude: 8, longitude: 8)
Coordinates:
  * time       (time) datetime64[ns] 88kB 1991-01-01 1991-01-02 ... 2020-12-31
  * latitude   (latitude) float32 32B 46.1 46.2 46.3 46.4 46.5 46.6 46.7 46.8
  * longitude  (longitude) float32 32B -71.6 -71.5 -71.4 ... -71.1 -71.0 -70.9
    elevation  (latitude, longitude) float32 256B dask.array<chunksize=(8, 8), meta=np.ndarray>
Data variables:
    pr         (time, latitude, longitude) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
    tasmin     (time, latitude, longitude) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
    tasmax     (time, latitude, longitude) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
Attributes: (12/30)
    Conventions:               CF-1.9
    cell_methods:              time: mean (interval: 1 day)
    doi:                       https://doi.org/10.24381/cds.e2161bac
    domain:                    NAM
    frequency:                 day
    history:                   [2026-03-31 14:35:41] shape spatial subsetting...
    ...                        ...
    institute_id:              ECMWF
    dataset_id:                ERA5-Land
    abstract:                  ERA5-Land provides hourly high resolution info...
    dataset_description:       https://www.ecmwf.int/en/era5-land
    attribution:               Contains modified Copernicus Climate Change Se...
    citation:                  Muñoz Sabater, J., (2021): ERA5-Land hourly da...

That function also returns information that will be used later to instanciate the hydrological model:

[10]:
config_meteo_ref
[10]:
{'data_type': ['TEMP_MAX', 'TEMP_MIN', 'PRECIP'],
 'alt_names_meteo': {'TEMP_MAX': 'tasmax',
  'TEMP_MIN': 'tasmin',
  'PRECIP': 'pr'},
 'meteo_file': '/tmp/tmpbnxtefbc/_data/meteo.nc'}

Hydrometric data

Gauge streamflow data from the Quebec government can be accessed through the xdatasets library.

[11]:
qobs = (
    xdatasets.Query(
        **{
            "datasets": {
                "deh": {
                    "id": ["023401"],
                    "variables": ["streamflow"],
                }
            },
            "time": {"start": "1991-01-01", "end": "2020-12-31"},
        }
    )
    .data.squeeze()
    .load()
).sel(spatial_agg="watershed")
[12]:
# Once again, some housekeeping is required on the metadata to make sure that xHydro understands the dataset.
qobs = qobs.rename({"streamflow": "q"})
qobs["id"].attrs["cf_role"] = "timeseries_id"
qobs = qobs.rename({"id": "station_id"})
qobs["q"].attrs = {
    "long_name": "Streamflow",
    "units": "m3 s-1",
    "standard_name": "water_volume_transport_in_river_channel",
    "cell_methods": "time: mean",
}
for c in qobs.coords:
    if len(qobs[c].dims) == 0 and "time" not in c:
        qobs[c] = qobs[c].expand_dims("station_id")

qobs
[12]:
<xarray.Dataset> Size: 132kB
Dimensions:        (time: 10958, station_id: 1)
Coordinates: (12/15)
  * time           (time) datetime64[ns] 88kB 1991-01-01 ... 2020-12-31
  * station_id     (station_id) <U6 24B '023401'
    drainage_area  (station_id) float32 4B 708.0
    end_date       (station_id) datetime64[ns] 8B 2020-12-31
    latitude       (station_id) float32 4B 46.66
    longitude      (station_id) float32 4B -71.29
    ...             ...
    source         (station_id) <U102 408B 'Ministère de l’Environnement, de ...
    spatial_agg    (station_id) <U9 36B 'watershed'
    start_date     (station_id) datetime64[ns] 8B 1991-01-01
    variable       (station_id) <U10 40B 'streamflow'
    time_agg       <U4 16B 'mean'
    timestep       <U1 4B 'D'
Data variables:
    q              (time) float32 44kB 46.0 33.9 26.2 21.0 ... 19.98 15.76 12.93
[13]:
plt.figure(figsize=[10, 5])
qobs.q.plot()
[13]:
[<matplotlib.lines.Line2D at 0x7f5f93952ba0>]
../_images/notebooks_use_case_20_1.png

As specified earlier, streamflow observations need to be modified to account for the differences in watershed sizes between the gauge and the polygon.

[14]:
with xr.set_options(keep_attrs=True):
    qobs["q"] = qobs["q"] * scaling_factor
[15]:
# Save (necessary for model calibration)
qobs.to_netcdf(notebook_folder / "_data" / "qobs.nc")

Preparation and calibration of the hydrological model (xhydro.modelling)

INFO

For more information on this section and available options, consult the Hydrological modelling notebook.

The perform_calibration function requires a model_config argument that allows it to build the corresponding hydrological model. All the required information has been acquired in previous sections, so it is only a matter of filling in the entries of the RavenPy/GR4JCN model.

For simplification matters, as snow water equivalent is not currently available on PAVICS’ database, “AVG_ANNUAL_SNOW” was roughly estimated using Brown & Brasnett (2010).

[17]:
# Model configuration
model_config = {
    "model_name": "GR4JCN",
    "workdir": notebook_folder / "model",
    "overwrite": True,
    "parameters": [0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    "hru": gdf,
    "start_date": "1991-01-01",
    "end_date": "2020-12-31",
    "rain_snow_fraction": "RAINSNOW_DINGMAN",
    "evaporation": "PET_HARGREAVES_1985",
    "global_parameter": {"AVG_ANNUAL_SNOW": 100.00},
    **config_meteo_ref,  # Reuse information gathered earlier
}

# Parameter bounds for GR4JCN
bounds_low = [0.01, -15.0, 10.0, 0.0, 1.0, 0.0]
bounds_high = [2.5, 10.0, 700.0, 7.0, 30.0, 1.0]
[18]:
# Calibration / validation period
mask_calib = xr.where(qobs.time.dt.year <= 2010, 1, 0).values
mask_valid = xr.where(qobs.time.dt.year > 2010, 1, 0).values

# Model calibration
best_parameters, best_simulation, best_objfun = perform_calibration(
    model_config,
    "kge",
    qobs=notebook_folder / "_data" / "qobs.nc",
    bounds_low=bounds_low,
    bounds_high=bounds_high,
    evaluations=8,
    algorithm="DDS",
    mask=mask_calib,
    sampler_kwargs=dict(trials=1),
)
Initializing the  Dynamically Dimensioned Search (DDS) algorithm  with  8  repetitions
The objective function will be maximized
Starting the DDS algorithm with 8 repetitions...
Finding best starting point for trial 1 using 5 random samples.
1 of 8, maximal objective function=0.129327, time remaining: 00:00:10
Initialize database...
['csv', 'hdf5', 'ram', 'sql', 'custom', 'noData']
2 of 8, maximal objective function=0.129327, time remaining: 00:00:11
3 of 8, maximal objective function=0.293959, time remaining: 00:00:09
4 of 8, maximal objective function=0.293959, time remaining: 00:00:08
5 of 8, maximal objective function=0.293959, time remaining: 00:00:05
6 of 8, maximal objective function=0.293959, time remaining: 00:00:03
7 of 8, maximal objective function=0.390197, time remaining: 00:00:00
8 of 8, maximal objective function=0.393656, time remaining: 23:59:57
Best solution found has obj function value of 0.3936562927318237 at 5



*** Final SPOTPY summary ***
Total Duration: 24.88 seconds
Total Repetitions: 8
Maximal objective value: 0.393656
Corresponding parameter setting:
param0: 1.13242
param1: -2.46413
param2: 43.5229
param3: 1.33209
param4: 24.153
param5: 0.0766439
******************************

Best parameter set:
param0=1.132418270728551, param1=-2.464133107201744, param2=43.522924628974806, param3=1.3320909887768697, param4=24.153016559360484, param5=0.07664389863678882
Run number 7 has the highest objectivefunction with: 0.3937

To reduce computation times for this example, only 10 steps were used for the calibration function, which is well below the recommended number. The parameters below were obtained by running the code above with 150 evaluations.

[19]:
# Replace the results with parameters obtained using 150 evaluations
best_parameters = [
    0.3580270511815579,
    -2.187141388684563,
    24.012067980309702,
    0.000781,
    1.9330212374187332,
    0.5491789347783598,
]
model_config["parameters"] = best_parameters

best_simulation = xh.modelling.hydrological_model(model_config).run()

The real KGE should be computed from a validation period, using get_objective_function.

[20]:
get_objective_function(
    qobs=qobs.q,
    qsim=best_simulation,
    obj_func="kge",
    mask=mask_valid,
).values
[20]:
array(0.68497379)
[21]:
ax = plt.figure(figsize=(10, 5))
qobs.q.plot(color="k", linewidth=3)
best_simulation.q.plot(color="r")
[21]:
[<matplotlib.lines.Line2D at 0x7f5f930e23c0>]
../_images/notebooks_use_case_33_1.png

Calculation of hydroclimatological indicators

[22]:
import numpy as np
import xclim

import xhydro.frequency_analysis as xhfa

Non-frequential indicators

INFO

For more information on this section and available options, consult the Climate change analysis notebook.

INFO

Custom indicators in xHydro are built by following the YAML formatting required by xclim.

A custom indicator built using xclim.core.indicator.Indicator.from_dict will need these elements:

  • “data”: A dictionary with the following information:

    • “base”: The “YAML ID” obtained from here.

    • “input”: A dictionary linking the default xclim input to the name of your variable. Needed only if it is different. In the link above, they are the string following “Uses:”.

    • “parameters”: A dictionary containing all other parameters for a given indicator. In the link above, the easiest way to access them is by clicking the link in the top-right corner of the box describing a given indicator.

    • More entries can be used here, as described in the xclim documentation under “identifier”.

  • “identifier”: A custom name for your indicator. This will be the name returned in the results.

  • “module”: Needed, but can be anything. To prevent an accidental overwriting of xclim indicators, it is best to use something different from: [“atmos”, “land”, “generic”].

For a climate change impact analysis, the typical process to compute non-frequential indicators would be to:

  1. Define the indicators either through the xclim functionalities shown below or through a YAML file.

  2. Call xhydro.indicators.compute_indicators, which would produce annual results through a dictionary, where each key represents the requested frequencies.

  3. Call xhydro.cc.climatological_op on each entry of the dictionary to compute the 30-year average.

  4. Recombine the datasets.

However, if the annual results are not required, xhydro.cc.produce_horizon can bypass steps 2 to 4 and alleviate a lot of hassle. It accomplishes that by removing the time axis and replacing it for a horizon dimension that represents a slice of time. In the case of seasonal or monthly indicators, a corresponding season or month dimension is also added.

We will compute the mean summer flow (an annual indicator) and mean monthly flows.

[23]:
indicators = [
    # Mean summer flow
    xclim.core.indicator.Indicator.from_dict(
        data={
            "base": "stats",
            "input": {"da": "q"},
            "parameters": {"op": "mean", "indexer": {"month": [6, 7, 8]}},
        },
        identifier="qmoy_summer",
        module="hydro",
    ),
    # Mean monthly flow
    xclim.core.indicator.Indicator.from_dict(
        data={
            "base": "stats",
            "input": {"da": "q"},
            "parameters": {"op": "mean", "freq": "MS"},
        },
        identifier="qmoy_monthly",
        module="hydro",
    ),
]

ds_indicators = xh.cc.produce_horizon(
    best_simulation, indicators=indicators, periods=["1991", "2020"]
)

ds_indicators
[23]:
<xarray.Dataset> Size: 316B
Dimensions:             (horizon: 1, month: 12)
Coordinates:
  * horizon             (horizon) <U9 36B '1991-2020'
  * month               (month) <U3 144B 'JAN' 'FEB' 'MAR' ... 'OCT' 'NOV' 'DEC'
    subbasin_id         <U1 4B '1'
    elevation           float32 4B 222.6
    drainage_area       float64 8B 585.6
    centroid_longitude  float64 8B -71.26
    centroid_latitude   float64 8B 46.45
Data variables:
    qmoy_summer         (horizon) float64 8B 9.589
    qmoy_monthly        (horizon, month) float64 96B 5.919 4.565 ... 12.27 9.125
Attributes:
    Conventions:           CF-1.6
    featureType:           timeSeries
    history:               Created on 2026-03-31T14:36:41 by Raven 4.1
    description:           Standard Output
    references:            Craig J.R. and the Raven Development Team Raven us...
    model_id:              GR4JCN
    Raven_version:         4.1
    RavenPy_version:       0.20.0
    cat:xrfreq:            fx
    cat:frequency:         fx
    cat:processing_level:  horizons
    cat:variable:          ('qmoy_monthly',)

Frequency analysis

INFO

For more information on this section and available options, consult the Local frequency analysis notebook.

A frequency analysis typically follows these steps:

  1. Get the raw data needed for the analysis, such as annual maximums, through xhydro.indicators.get_yearly_op.

  2. Call xhfa.local.fit to obtain the parameters for a specified number of distributions, such as Gumbel, GEV, and Pearson-III.

  3. (Optional) Call xhfa.local.criteria to obtain goodness-of-fit parameters.

  4. Call xhfa.local.parametric_quantiles to obtain the return levels.

We will compute the 20 and 100-year annual maximums, as well as the 2-year minimum 7-day summer flow.

[24]:
qref_max = xh.indicators.get_yearly_op(
    best_simulation,
    op="max",
    timeargs={"annual": {}},
    missing="pct",
    missing_options={"tolerance": 0.15},
)
qref_min = xh.indicators.get_yearly_op(
    best_simulation,
    op="min",
    window=7,
    timeargs={"summer": {"date_bounds": ["05-01", "11-30"]}},
    missing="pct",
    missing_options={"tolerance": 0.15},
)
[25]:
ax = plt.figure(figsize=(10, 5))
qref_max.q_max_annual.dropna("time", how="all").plot()
qref_min.q7_min_summer.dropna("time", how="all").plot()
[25]:
[<matplotlib.lines.Line2D at 0x7f5f920e8440>]
../_images/notebooks_use_case_40_1.png
[26]:
params_max = xhfa.local.fit(
    qref_max,
    distributions=["genextreme", "gumbel_r", "norm", "pearson3"],
    method="MLE",
    min_years=20,
).compute()
params_min = xhfa.local.fit(
    qref_min,
    distributions=["genextreme", "gumbel_r", "norm", "pearson3"],
    method="MLE",
    min_years=20,
).compute()

params_max
[26]:
<xarray.Dataset> Size: 436B
Dimensions:             (scipy_dist: 4, dparams: 4)
Coordinates:
  * scipy_dist          (scipy_dist) <U10 160B 'genextreme' ... 'pearson3'
  * dparams             (dparams) <U5 80B 'c' 'skew' 'loc' 'scale'
    horizon             <U9 36B '1991-2020'
    subbasin_id         <U1 4B '1'
    elevation           float32 4B 222.6
    drainage_area       float64 8B 585.6
    centroid_longitude  float64 8B -71.26
    centroid_latitude   float64 8B 46.45
Data variables:
    q_max_annual        (scipy_dist, dparams) float64 128B 0.08161 nan ... 87.46
Attributes: (12/13)
    Conventions:           CF-1.6
    featureType:           timeSeries
    history:               Created on 2026-03-31T14:36:41 by Raven 4.1
    description:           Standard Output
    references:            Craig J.R. and the Raven Development Team Raven us...
    model_id:              GR4JCN
    ...                    ...
    RavenPy_version:       0.20.0
    cat:xrfreq:            YS-JAN
    cat:frequency:         yr
    cat:processing_level:  indicators
    cat:variable:          ('q_max_annual',)
    cat:id:                

While not foolproof, the best fit can be identified using the Bayesian Information Criteria.

[27]:
criteria_max = xhfa.local.criteria(qref_max, params_max)
criteria_max = str(
    criteria_max.isel(
        scipy_dist=criteria_max.q_max_annual.sel(criterion="bic").argmin(dim="scipy_dist").values
    ).scipy_dist.values
)  # Get the best fit as a string
print(f"Best distribution for the annual maxima: {criteria_max}")

criteria_min = xhfa.local.criteria(qref_min, params_min)
criteria_min = str(
    criteria_min.isel(
        scipy_dist=criteria_min.q7_min_summer.sel(criterion="bic").argmin(dim="scipy_dist").values
    ).scipy_dist.values
)  # Get the best fit as a string
print(f"Best distribution for the summer 7-day minima: {criteria_min}")
Best distribution for the annual maxima: gumbel_r
Best distribution for the summer 7-day minima: gumbel_r

Plotting functions will eventually come to xhydro.frequency_analysis, but they currently are a work-in-progress and are hidden by default. Until a public function is added to the library, they can still be called to illustrate the results.

[28]:
# Plotting
data_max = xhfa.local._prepare_plots(
    params_max, xmin=1, xmax=1000, npoints=50, log=True
)
data_max["q_max_annual"].attrs["long_name"] = "q"
pp = xhfa.local._get_plotting_positions(qref_max[["q_max_annual"]])

ax = plt.figure(figsize=(10, 5))
data_max.q_max_annual.sel(scipy_dist=criteria_max).plot(color="k")
plt.xscale("log")
pp.plot.scatter(
    x="q_max_annual_pp",
    y="q_max_annual",
    color="k",
)
[28]:
<matplotlib.collections.PathCollection at 0x7f5f920b6a50>
../_images/notebooks_use_case_45_1.png
[29]:
# Plotting
data_min = xhfa.local._prepare_plots(
    params_min, xmin=1, xmax=1000, npoints=50, log=True
)
data_min["q7_min_summer"].attrs["long_name"] = "q"
pp = xhfa.local._get_plotting_positions(qref_min[["q7_min_summer"]])

ax = plt.figure(figsize=(10, 5))
data_min.q7_min_summer.sel(scipy_dist=criteria_min).plot(color="k")
plt.xscale("log")
pp.plot.scatter(
    x="q7_min_summer_pp",
    y="q7_min_summer",
    color="k",
)
[29]:
<matplotlib.collections.PathCollection at 0x7f5f90358690>
../_images/notebooks_use_case_46_1.png
[30]:
# Computation of return levels
rl_max = xhfa.local.parametric_quantiles(
    params_max.sel(scipy_dist=criteria_max).expand_dims("scipy_dist"),
    return_period=[20, 100],
).squeeze()
rl_min = xhfa.local.parametric_quantiles(
    params_min.sel(scipy_dist=criteria_min).expand_dims("scipy_dist"),
    return_period=[2],
    mode="min",
).squeeze()
rl_max
[30]:
<xarray.Dataset> Size: 148B
Dimensions:             (return_period: 2)
Coordinates:
  * return_period       (return_period) int64 16B 20 100
    p_quantile          (return_period) float64 16B 0.95 0.99
    horizon             <U9 36B '1991-2020'
    scipy_dist          <U8 32B 'gumbel_r'
    subbasin_id         <U1 4B '1'
    elevation           float32 4B 222.6
    drainage_area       float64 8B 585.6
    centroid_longitude  float64 8B -71.26
    centroid_latitude   float64 8B 46.45
Data variables:
    q_max_annual        (return_period) float64 16B 363.4 473.8
Attributes: (12/13)
    Conventions:           CF-1.6
    featureType:           timeSeries
    history:               Created on 2026-03-31T14:36:41 by Raven 4.1
    description:           Standard Output
    references:            Craig J.R. and the Raven Development Team Raven us...
    model_id:              GR4JCN
    ...                    ...
    RavenPy_version:       0.20.0
    cat:xrfreq:            YS-JAN
    cat:frequency:         yr
    cat:processing_level:  indicators
    cat:variable:          ('q_max_annual',)
    cat:id:                
[31]:
print(
    f"20-year annual maximum: {np.round(rl_max.q_max_annual.sel(return_period=20).values, 1)} m³/s"
)
print(
    f"100-year annual maximum: {np.round(rl_max.q_max_annual.sel(return_period=100).values, 1)} m³/s"
)
print(
    f"2-year minimum 7-day summer flow: {np.round(rl_min.q7_min_summer.values, 1)} m³/s"
)
20-year annual maximum: 363.4 m³/s
100-year annual maximum: 473.8 m³/s
2-year minimum 7-day summer flow: 2.4 m³/s

Future streamflow simulations and indicators

Future meteorological data

Now that we have access to a calibrated hydrological model and historical indicators, we can perform the climate change analysis. This example will use a set of CMIP6 models that have been bias adjusted using ERA5-Land, for consistency with the reference product. Specifically, we will use the ESPO-G6-E5L dataset, also hosted on PAVICS. While it is recommended to use multiple emission scenarios, this example will only use the SSP2-4.5 simulations from 14 climate models.

We can mostly reuse the same code as above. One difference is that climate models often use custom calendars. Raven can manage them, but it can still be required to convert them back to standard ones. The convert_calendar_missing argument of format_input can be used for that matter. If “True”, it will linearly interpolate temperature data and put 0 precipitation for calendar days that are being added to the dataset.

[32]:
models = [
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_CAS_FGOALS-g3_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_CMCC_CMCC-ESM2_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_CSIRO-ARCCSS_ACCESS-CM2_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_CSIRO_ACCESS-ESM1-5_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_DKRZ_MPI-ESM1-2-HR_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_INM_INM-CM5-0_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_MIROC_MIROC6_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_MPI-M_MPI-ESM1-2-LR_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_MRI_MRI-ESM2-0_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_NCC_NorESM2-LM_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_CNRM-CERFACS_CNRM-ESM2-1_ssp245_r1i1p1f2_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_NIMS-KMA_KACE-1-0-G_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_NOAA-GFDL_GFDL-ESM4_ssp245_r1i1p1f1_1950-2100.ncml",
    "day_ESPO-G6-E5L_v1.0.0_CMIP6_ScenarioMIP_NAM_BCC_BCC-CSM2-MR_ssp245_r1i1p1f1_1950-2100.ncml",
]

The code below will showcase how to proceed with the first simulation, but it could be looped to process all 14.

[33]:
model = models[0]

meteo_sim = xr.open_dataset(
    f"https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/datasets/simulations/bias_adjusted/cmip6/ouranos/ESPO-G/ESPO-G6-E5Lv1.0.0/{model}",
    engine="netcdf4",
    chunks={"time": 365, "lon": 50, "lat": 50},
)
meteo_sim = xscen.spatial.subset(meteo_sim, method="shape", tile_buffer=2, shape=gdf)
# Add altitude data
meteo_sim = meteo_sim.assign_coords(
    {"elevation": xr.ones_like(meteo_sim.pr.isel(time=0).drop_vars("time")) * 350}
)
meteo_sim["elevation"].attrs = {"units": "m"}

# Format the dataset
meteo_sim, config_meteo_sim = xh.modelling.format_input(
    meteo_sim,
    model="GR4JCN",
    convert_calendar_missing=True,
    save_as=notebook_folder / "_data" / f"meteo_sim0.nc",
)

Future streamflow data and indicators

Once again, the same code as before can be roughly reused here, but with xhydro.modelling.hydrological_model. The main difference is that the best parameters can be used when setting up the hydrological model, and that the dates are the full range going to 2100.

Indicators are also computed similarly, with the addition of using a list of periods in the periods argument to create a horizon dimension, instead of a single period. The xhydro.frequency_analysis.fit function can also accept a periods argument.

The code below will once again only showcase the first simulation, but could be used to process all 14.

[34]:
from copy import deepcopy

import xhydro.modelling as xhm
[35]:
model_cfg = (
    deepcopy(model_config) | config_meteo_sim
)  # Update the config with the new information from the simulation file
model_cfg["parameters"] = np.array(best_parameters)
model_cfg["start_date"] = "1991-01-01"
model_cfg["end_date"] = "2099-12-31"

qsim = xhm.hydrological_model(model_cfg).run()
[36]:
plt.figure(figsize=(10, 5))
qsim.q.plot()
[36]:
[<matplotlib.lines.Line2D at 0x7f5f906d7e00>]
../_images/notebooks_use_case_56_1.png
[37]:
# Non-frequential indicators
ds_indicators = xh.cc.produce_horizon(
    qsim, indicators=indicators, periods=[["1991", "2020"], ["2070", "2099"]]
)

# Frequency analysis
qsim_max = xh.indicators.get_yearly_op(
    qsim,
    op="max",
    timeargs={"annual": {}},
    missing="pct",
    missing_options={"tolerance": 0.15},
)
qsim_min = xh.indicators.get_yearly_op(
    qsim,
    op="min",
    window=7,
    timeargs={"summer": {"date_bounds": ["05-01", "11-30"]}},
    missing="pct",
    missing_options={"tolerance": 0.15},
)

params_max = xhfa.local.fit(
    qsim_max,
    distributions=["genextreme", "gumbel_r", "norm", "pearson3"],
    method="MLE",
    min_years=20,
    periods=[["1991", "2020"], ["2070", "2099"]],
).compute()
params_min = xhfa.local.fit(
    qsim_min,
    distributions=["genextreme", "gumbel_r", "norm", "pearson3"],
    method="MLE",
    min_years=20,
    periods=[["1991", "2020"], ["2070", "2099"]],
).compute()

criteria_max = xhfa.local.criteria(
    qsim_max.sel(time=slice("1991", "2020")), params_max.sel(horizon="1991-2020")
)
criteria_max = str(
    criteria_max.isel(
        scipy_dist=criteria_max.q_max_annual.sel(criterion="bic").argmin(dim="scipy_dist").values
    ).scipy_dist.values
)  # Get the best fit as a string

criteria_min = xhfa.local.criteria(
    qsim_min.sel(time=slice("1991", "2020")), params_min.sel(horizon="1991-2020")
)
criteria_min = str(
    criteria_min.isel(
        scipy_dist=criteria_min.q7_min_summer.sel(criterion="bic").argmin(dim="scipy_dist").values
    ).scipy_dist.values
)  # Get the best fit as a string

# Computation of return levels
rl_max = xhfa.local.parametric_quantiles(
    params_max.sel(scipy_dist=criteria_max).expand_dims("scipy_dist"),
    return_period=[20, 100],
).squeeze()
rl_min = xhfa.local.parametric_quantiles(
    params_min.sel(scipy_dist=criteria_min).expand_dims("scipy_dist"),
    return_period=[2],
    mode="min",
).squeeze()

# Combine all indicators
ds_indicators = xr.merge([ds_indicators, rl_max, rl_min], compat="override")

ds_indicators
[37]:
<xarray.Dataset> Size: 568B
Dimensions:             (horizon: 2, month: 12, return_period: 2)
Coordinates:
  * horizon             (horizon) <U9 72B '1991-2020' '2070-2099'
  * month               (month) <U3 144B 'JAN' 'FEB' 'MAR' ... 'OCT' 'NOV' 'DEC'
  * return_period       (return_period) int64 16B 20 100
    p_quantile          (return_period) float64 16B 0.95 0.99
    subbasin_id         <U1 4B '1'
    elevation           float32 4B 222.6
    drainage_area       float64 8B 585.6
    centroid_longitude  float64 8B -71.26
    centroid_latitude   float64 8B 46.45
    scipy_dist          <U8 32B 'gumbel_r'
Data variables:
    qmoy_summer         (horizon) float64 16B 9.707 8.611
    qmoy_monthly        (horizon, month) float64 192B 7.359 6.124 ... 11.74
    q_max_annual        (return_period, horizon) float64 32B 325.4 ... 437.9
    q7_min_summer       (horizon) float64 16B 2.109 1.983
Attributes:
    Conventions:           CF-1.6
    featureType:           timeSeries
    history:               Created on 2026-03-31T14:40:01 by Raven 4.1
    description:           Standard Output
    references:            Craig J.R. and the Raven Development Team Raven us...
    model_id:              GR4JCN
    Raven_version:         4.1
    RavenPy_version:       0.20.0
    cat:xrfreq:            fx
    cat:frequency:         fx
    cat:processing_level:  horizons
    cat:variable:          ('qmoy_monthly',)

Climate change impacts

INFO

For more information on this section and available options, consult the Climate change analysis notebook.

This example will keep the climate change analysis fairly simple.

  • Compute the difference between the future and reference periods using xhydro.cc.compute_deltas.

  • Use those differences to compute ensemble statistics using xhydro.cc.ensemble_stats: ensemble percentiles and agreement between the climate models.

[38]:
# Differences
deltas = xh.cc.compute_deltas(
    ds_indicators, reference_horizon="1991-2020", kind="%", rename_variables=False
).isel(horizon=-1)

# Save the results
deltas.to_netcdf(notebook_folder / "_data" / f"deltas_sim0.nc")

deltas.squeeze()
[38]:
<xarray.Dataset> Size: 404B
Dimensions:             (month: 12, return_period: 2)
Coordinates:
  * month               (month) <U3 144B 'JAN' 'FEB' 'MAR' ... 'OCT' 'NOV' 'DEC'
  * return_period       (return_period) int64 16B 20 100
    p_quantile          (return_period) float64 16B 0.95 0.99
    subbasin_id         <U1 4B '1'
    elevation           float32 4B 222.6
    drainage_area       float64 8B 585.6
    centroid_longitude  float64 8B -71.26
    centroid_latitude   float64 8B 46.45
    horizon             <U9 36B '2070-2099'
    scipy_dist          <U8 32B 'gumbel_r'
Data variables:
    qmoy_summer         float64 8B -11.29
    qmoy_monthly        (month) float64 96B 14.88 10.29 61.14 ... 22.37 24.68
    q_max_annual        (return_period) float64 16B 4.637 7.371
    q7_min_summer       float64 8B -5.966
Attributes:
    Conventions:           CF-1.6
    featureType:           timeSeries
    history:               Created on 2026-03-31T14:40:01 by Raven 4.1
    description:           Standard Output
    references:            Craig J.R. and the Raven Development Team Raven us...
    model_id:              GR4JCN
    Raven_version:         4.1
    RavenPy_version:       0.20.0
    cat:xrfreq:            fx
    cat:frequency:         fx
    cat:processing_level:  deltas
    cat:variable:          ('qmoy_monthly',)

There are many ways to create the ensemble itself. If using a dictionary of datasets, the key will be used to name each element of the new realization dimension. This can be very useful when performing more detailed analyses or when wanting to weight the different models based, for example, on the number of available simulations. In our case, since we only wish to compute ensemble statistics, we can keep it simpler and provide a list.

[39]:
import pooch

# Acquire deltas for the other 13 simulations
from xhydro.testing.helpers import (  # In-house function to access xhydro-testdata
    deveraux,
)

deltas_files = deveraux().fetch("use_case/deltas.zip", processor=pooch.Unzip())
deltas_files = xclim.ensembles.create_ensemble(deltas_files)

# Fix variable names and combine with the file we just created
deltas_files = deltas_files.rename(
    {"streamflow_max_annual": "q_max_annual", "streamflow7_min_summer": "q7_min_summer"}
)
deltas_sim0 = xr.open_dataset(
    notebook_folder / "_data" / f"deltas_sim0.nc"
).assign_coords({"realization": 13})
deltas_files = xr.concat([deltas_files, deltas_sim0], dim="realization")
clear_output(wait=False)
[40]:
# Statistics to compute
statistics = {
    "ensemble_percentiles": {"values": [10, 25, 50, 75, 90], "split": False},
    "robustness_fractions": {"test": None},
}

ens_stats = xh.cc.ensemble_stats(deltas_files, statistics)

ens_stats
[40]:
<xarray.Dataset> Size: 2kB
Dimensions:                         (percentiles: 5, month: 12, return_period: 2)
Coordinates:
  * percentiles                     (percentiles) int64 40B 10 25 50 75 90
  * month                           (month) <U3 144B 'JAN' 'FEB' ... 'NOV' 'DEC'
  * return_period                   (return_period) int64 16B 20 100
    p_quantile                      (return_period) float64 16B 0.95 0.99
    basin_name                      <U7 28B 'sub_001'
    horizon                         <U9 36B '2070-2099'
    subbasin_id                     <U1 4B '1'
    elevation                       float32 4B 222.6
    drainage_area                   float64 8B 585.6
    centroid_longitude              float64 8B -71.26
    centroid_latitude               float64 8B 46.45
Data variables: (12/32)
    qmoy_summer                     (percentiles) float64 40B -14.0 ... 19.14
    qmoy_monthly                    (month, percentiles) float64 480B dask.array<chunksize=(12, 5), meta=np.ndarray>
    q_max_annual                    (return_period, percentiles) float64 80B dask.array<chunksize=(2, 5), meta=np.ndarray>
    q7_min_summer                   (percentiles) float64 40B -28.54 ... 11.68
    qmoy_summer_changed             float64 8B 1.0
    qmoy_summer_positive            float64 8B 0.5714
    ...                              ...
    q7_min_summer_positive          float64 8B 0.2143
    q7_min_summer_changed_positive  float64 8B 0.2143
    q7_min_summer_negative          float64 8B 0.7857
    q7_min_summer_changed_negative  float64 8B 0.7857
    q7_min_summer_valid             float64 8B 1.0
    q7_min_summer_agree             float64 8B 0.7857
Attributes:
    cat:xrfreq:            fx
    cat:frequency:         fx
    cat:processing_level:  ensemble
    cat:variable:          qmoy_monthly
    ensemble_size:         4
[41]:
# Recreate the boxplots based on the computed percentiles
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(13, 5), sharey=True)

ax = plt.subplot(1, 3, 1)
for i, rp in enumerate(ens_stats.return_period.values):
    stats = [
        {
            "label": rp,
            "med": ens_stats.q_max_annual.sel(percentiles=50, return_period=rp).values,
            "q1": ens_stats.q_max_annual.sel(percentiles=25, return_period=rp).values,
            "q3": ens_stats.q_max_annual.sel(percentiles=75, return_period=rp).values,
            "whislo": ens_stats.q_max_annual.sel(
                percentiles=10, return_period=rp
            ).values,
            "whishi": ens_stats.q_max_annual.sel(
                percentiles=90, return_period=rp
            ).values,
        }
    ]

    ax.bxp(stats, showfliers=False, positions=[i], widths=0.5)
ax.set_title("Maximum annual streamflow")
plt.xlabel("Return period")
plt.ylabel("Difference Fut-Hist (%)")

ax = plt.subplot(1, 3, 2)
for i, rp in enumerate(ens_stats.return_period.values):
    stats = [
        {
            "label": rp,
            "med": ens_stats.q7_min_summer.sel(percentiles=50).values,
            "q1": ens_stats.q7_min_summer.sel(percentiles=25).values,
            "q3": ens_stats.q7_min_summer.sel(percentiles=75).values,
            "whislo": ens_stats.q7_min_summer.sel(percentiles=10).values,
            "whishi": ens_stats.q7_min_summer.sel(percentiles=90).values,
        }
    ]

    ax.bxp(stats, showfliers=False, positions=[i], widths=0.5)
ax.set_title("Minimum summer streamflow (7-day avg)")
plt.xlabel("")

ax = plt.subplot(1, 3, 3)
stats = [
    {
        "label": "",
        "med": ens_stats.qmoy_summer.sel(percentiles=50).values,
        "q1": ens_stats.qmoy_summer.sel(percentiles=25).values,
        "q3": ens_stats.qmoy_summer.sel(percentiles=75).values,
        "whislo": ens_stats.qmoy_summer.sel(percentiles=10).values,
        "whishi": ens_stats.qmoy_summer.sel(percentiles=90).values,
    }
]

ax.bxp(stats, showfliers=False, positions=[i], widths=0.25)
ax.set_title("Mean summer flow")

plt.show()
../_images/notebooks_use_case_63_0.png
[42]:
print(
    f"Fraction of simulations with a positive change (maximum streamflow): {ens_stats.q_max_annual_positive.values}"
)
print(
    f"Fraction of simulations with a positive change (minimum summer streamflow): {ens_stats.q7_min_summer_positive.values}"
)
print(
    f"Fraction of simulations with a positive change (mean summer streamflow): {ens_stats.qmoy_summer_positive.values}"
)
Fraction of simulations with a positive change (maximum streamflow): [0.5        0.64285714]
Fraction of simulations with a positive change (minimum summer streamflow): 0.21428571428571427
Fraction of simulations with a positive change (mean summer streamflow): 0.5714285714285714