Use Case Example

This example illustrates a use case that covers the essential steps involved in building a hydrological model and conducting a climate change analysis:

- Identification of the watershed and its key characteristics
  - Beaurivage watershed in Southern Quebec, at the location of the 023401 streamflow gauge.
- Collection of observed data
  - ERA5-Land and streamflow gauge data.
- Preparation and calibration of the hydrological model
  - GR4JCN emulated by the Raven hydrological framework.
- Calculation of hydrological indicators
  - Mean summer flow
  - Mean monthly flow
  - 20- and 100-year maximum flow
  - 2-year minimum 7-day average summer flow
- Assessment of the impact of climate change
  - Bias-adjusted CMIP6 simulations from the ESPO-G6-R2 dataset

Identification of a watershed and its characteristics

INFO

For more information on this section and available options, consult the GIS notebook.

This first step depends heavily on the hydrological model. Since this example uses GR4JCN, we need the drainage area, centroid coordinates, and elevation. We also need the watershed delineation to extract the meteorological data. All of this information can be acquired through the xhydro.gis.watershed_to_raven_hru function, which calls several other functions of that module.

[1]:

import warnings
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)

[2]:

from IPython.display import clear_output

import xhydro.gis as xhgis

clear_output(wait=False)

[3]:

# Watershed delineation
coords = (-71.28878, 46.65692)
gdf = xhgis.watershed_to_raven_hru(coords)
gdf

[3]:

	HRU_ID	area	latitude	longitude	elevation	SubId	DowSubId	geometry
0	7120365812	585.585577	46.452161	-71.260464	222.55365	1	-1	POLYGON ((-71.09758 46.40035, -71.09409 46.403...

Since xhgis.watershed_delineation extracts the nearest HydroBASINS polygon, the watershed might not correspond exactly to the requested coordinates. The 023401 streamflow gauge has an associated drainage area of 708 km², which differs from the 585.6 km² obtained here. Streamflow will therefore have to be adjusted using an area scaling factor.

[4]:

gauge_area = 708
scaling_factor = gdf.iloc[0]["area"] / gauge_area
scaling_factor

[4]:

np.float64(0.8270982719703007)
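As a minimal sketch of how such a factor might be applied, the snippet below scales a few discharge values from the gauge to the delineated area. Only the two areas come from this example; the discharge values and the names `q_gauge` and `q_scaled` are hypothetical, and the approach assumes that discharge is simply proportional to drainage area.

```python
# Areas from this example (km²)
hydrobasins_area = 585.585577  # area of the delineated HydroBASINS polygon
gauge_area = 708.0  # drainage area reported for gauge 023401

# Simple area-ratio scaling: assumes discharge is proportional to drainage area
scaling_factor = hydrobasins_area / gauge_area

# Hypothetical daily discharge at the gauge (m³/s), for illustration only
q_gauge = [12.0, 30.5, 8.2]
q_scaled = [q * scaling_factor for q in q_gauge]

print(round(scaling_factor, 4))
print([round(q, 2) for q in q_scaled])
```

Because the delineated polygon is smaller than the gauged basin, every scaled value is lower than the corresponding gauge value.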

Collection of observed data

[5]:

import geopandas as gpd
import matplotlib.pyplot as plt
import xarray as xr

# For easy access to the specific streamflow data used here
import xdatasets
import xscen

Meteorological data

INFO

Multiple libraries could be used to perform these steps. For simplicity, this example will use the subset and aggregate modules of the xscen library.

This example will use daily ERA5-Land data hosted on the PAVICS platform.

[6]:

# Extraction of ERA5-Land data
meteo_ref = xr.open_dataset(
    "https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/datasets/reanalyses/day_ERA5-Land_NAM.ncml",
    engine="netcdf4",
    chunks={"time": 365, "lon": 50, "lat": 50},
)[["pr", "tasmin", "tasmax"]]
meteo_ref

[6]:

<xarray.Dataset> Size: 454GB
Dimensions:  (time: 27790, lat: 801, lon: 1700)
Coordinates:
  * time     (time) datetime64[ns] 222kB 1950-01-01 1950-01-02 ... 2026-01-31
  * lat      (lat) float32 3kB 10.0 10.1 10.2 10.3 10.4 ... 89.7 89.8 89.9 90.0
  * lon      (lon) float32 7kB -179.9 -179.8 -179.7 -179.6 ... -10.2 -10.1 -10.0
Data variables:
    pr       (time, lat, lon) float32 151GB dask.array<chunksize=(365, 50, 50), meta=np.ndarray>
    tasmin   (time, lat, lon) float32 151GB dask.array<chunksize=(365, 50, 50), meta=np.ndarray>
    tasmax   (time, lat, lon) float32 151GB dask.array<chunksize=(365, 50, 50), meta=np.ndarray>
Attributes: (12/30)
    Conventions:               CF-1.9
    cell_methods:              time: mean (interval: 1 day)
    doi:                       https://doi.org/10.24381/cds.e2161bac
    domain:                    NAM
    frequency:                 day
    history:                   [2022-12-25 09:07:39.901698] Converted variabl...
    ...                        ...
    institute_id:              ECMWF
    dataset_id:                ERA5-Land
    abstract:                  ERA5-Land provides hourly high resolution info...
    dataset_description:       https://www.ecmwf.int/en/era5-land
    attribution:               Contains modified Copernicus Climate Change Se...
    citation:                  Muñoz Sabater, J., (2021): ERA5-Land hourly da...

That dataset covers the whole North American (NAM) domain and has more than 70 years of data. The first step will thus be to subset the dataset both spatially and temporally. For the spatial subset, the GeoDataFrame obtained earlier can be used.

[7]:

meteo_ref = meteo_ref.sel(time=slice("1991", "2020"))  # Temporal subsetting
meteo_ref = xscen.spatial.subset(
    meteo_ref, method="shape", tile_buffer=2, shape=gdf
)  # Spatial subsetting, with a buffer of 2 grid cells
meteo_ref

[7]:

<xarray.Dataset> Size: 9MB
Dimensions:  (time: 10958, lat: 8, lon: 8)
Coordinates:
  * time     (time) datetime64[ns] 88kB 1991-01-01 1991-01-02 ... 2020-12-31
  * lat      (lat) float32 32B 46.1 46.2 46.3 46.4 46.5 46.6 46.7 46.8
  * lon      (lon) float32 32B -71.6 -71.5 -71.4 -71.3 -71.2 -71.1 -71.0 -70.9
Data variables:
    pr       (time, lat, lon) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
    tasmin   (time, lat, lon) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
    tasmax   (time, lat, lon) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
Attributes: (12/30)
    Conventions:               CF-1.9
    cell_methods:              time: mean (interval: 1 day)
    doi:                       https://doi.org/10.24381/cds.e2161bac
    domain:                    NAM
    frequency:                 day
    history:                   [2026-03-31 14:35:41] shape spatial subsetting...
    ...                        ...
    institute_id:              ECMWF
    dataset_id:                ERA5-Land
    abstract:                  ERA5-Land provides hourly high resolution info...
    dataset_description:       https://www.ecmwf.int/en/era5-land
    attribution:               Contains modified Copernicus Climate Change Se...
    citation:                  Muñoz Sabater, J., (2021): ERA5-Land hourly da...

[8]:

ax = plt.subplot(1, 1, 1)
meteo_ref.tasmin.isel(time=0).plot(ax=ax)
gdf.plot(ax=ax)

[8]:

<Axes: title={'center': 'time = 1991-01-01'}, xlabel='longitude [degrees_east]', ylabel='latitude [degrees_north]'>

Raven expects temperatures in Celsius and precipitation in millimetres, but they are currently in CF-compliant Kelvin and kg m-2 s-1, respectively. The xhydro.modelling.format_input function can be used to prepare data for Raven modelling. It handles unit conversion, variable renaming, and coordinate formatting to ensure compatibility with RavenPy. In the case of gridded meteorological data, as in this example, xHydro calls functions available in RavenPy to assign weights to each grid cell based on the portion that overlaps with the watershed. Alternatively, the data could be aggregated manually before being passed to the model.
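The arithmetic behind these two steps can be sketched in plain Python. This is not how format_input or RavenPy implement them; it only illustrates the unit conversions and an overlap-weighted average. All values, the `overlap` weights, and the helper names are hypothetical.

```python
SECONDS_PER_DAY = 86400.0


def kelvin_to_celsius(temp_k):
    """Convert a temperature from K to °C."""
    return temp_k - 273.15


def flux_to_mm_per_day(pr_flux):
    """Convert precipitation from kg m-2 s-1 to mm/day.

    1 kg of water spread over 1 m² forms a 1 mm layer, so only the
    time unit needs converting.
    """
    return pr_flux * SECONDS_PER_DAY


def weighted_mean(values, weights):
    """Average grid-cell values using their overlap with the watershed."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical values for three grid cells (not taken from ERA5-Land)
tasmin_k = [263.15, 264.15, 262.15]
pr_flux = [5.0e-5, 2.0e-5, 0.0]
overlap = [0.6, 0.3, 0.1]  # hypothetical share of the watershed in each cell

tasmin_c = [kelvin_to_celsius(t) for t in tasmin_k]
pr_mm = [flux_to_mm_per_day(p) for p in pr_flux]

basin_tasmin = weighted_mean(tasmin_c, overlap)
basin_pr = weighted_mean(pr_mm, overlap)
print(basin_tasmin, basin_pr)
```

Cells that cover more of the watershed contribute more to the basin average, which is the idea behind the grid weights mentioned above.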

For simplicity, the grid’s elevation will be set to a flat 450 m. Computing grid cell elevation in ERA5-Land is not always trivial and is outside the scope of this example.

[9]:

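from pathlib import Path
import tempfile

# mkdtemp keeps the folder on disk; a bare TemporaryDirectory() object
# is cleaned up as soon as it is garbage-collected
notebook_folder = Path(tempfile.mkdtemp())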

import xhydro as xh

# Add altitude data
meteo_ref = meteo_ref.assign_coords(
    {"elevation": xr.ones_like(meteo_ref.pr.isel(time=0).drop_vars("time")) * 450}
)
meteo_ref["elevation"].attrs = {"units": "m"}

meteo_ref, config_meteo_ref = xh.modelling.format_input(
    meteo_ref, model="GR4JCN", save_as=notebook_folder / "_data" / "meteo.nc"
)
meteo_ref

[9]:

<xarray.Dataset> Size: 9MB
Dimensions:    (time: 10958, latitude: 8, longitude: 8)
Coordinates:
  * time       (time) datetime64[ns] 88kB 1991-01-01 1991-01-02 ... 2020-12-31
  * latitude   (latitude) float32 32B 46.1 46.2 46.3 46.4 46.5 46.6 46.7 46.8
  * longitude  (longitude) float32 32B -71.6 -71.5 -71.4 ... -71.1 -71.0 -70.9
    elevation  (latitude, longitude) float32 256B dask.array<chunksize=(8, 8), meta=np.ndarray>
Data variables:
    pr         (time, latitude, longitude) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
    tasmin     (time, latitude, longitude) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
    tasmax     (time, latitude, longitude) float32 3MB dask.array<chunksize=(355, 8, 8), meta=np.ndarray>
Attributes: (12/30)
    Conventions:               CF-1.9
    cell_methods:              time: mean (interval: 1 day)
    doi:                       https://doi.org/10.24381/cds.e2161bac
    domain:                    NAM
    frequency:                 day
    history:                   [2026-03-31 14:35:41] shape spatial subsetting...
    ...                        ...
    institute_id:              ECMWF
    dataset_id:                ERA5-Land
    abstract:                  ERA5-Land provides hourly high resolution info...
    dataset_description:       https://www.ecmwf.int/en/era5-land
    attribution:               Contains modified Copernicus Climate Change Se...
    citation:                  Muñoz Sabater, J., (2021): ERA5-Land hourly da...

That function also returns information that will be used later to instantiate the hydrological model:

[10]:

config_meteo_ref

[10]:

{'data_type': ['TEMP_MAX', 'TEMP_MIN', 'PRECIP'],
 'alt_names_meteo': {'TEMP_MAX': 'tasmax',
  'TEMP_MIN': 'tasmin',
  'PRECIP': 'pr'},
 'meteo_file': '/tmp/tmpbnxtefbc/_data/meteo.nc'}

Preparation and calibration of the hydrological model (xhydro.modelling)

INFO

For more information on this section and available options, consult the Hydrological modelling notebook.

[16]:

import xhydro as xh
from xhydro.modelling.calibration import perform_calibration
from xhydro.modelling.obj_funcs import get_objective_function

The perform_calibration function requires a model_config argument that allows it to build the corresponding hydrological model. All the required information has been acquired in previous sections, so it is only a matter of filling in the entries of the RavenPy/GR4JCN model.

For simplicity, since snow water equivalent is not currently available in the PAVICS database, “AVG_ANNUAL_SNOW” was roughly estimated using Brown & Brasnett (2010).

[17]:

# Model configuration
model_config = {
    "model_name": "GR4JCN",
    "workdir": notebook_folder / "model",
    "overwrite": True,
    "parameters": [0.529, -3.396, 407.29, 1.072, 16.9, 0.947],
    "hru": gdf,
    "start_date": "1991-01-01",
    "end_date": "2020-12-31",
    "rain_snow_fraction": "RAINSNOW_DINGMAN",
    "evaporation": "PET_HARGREAVES_1985",
    "global_parameter": {"AVG_ANNUAL_SNOW": 100.00},
    **config_meteo_ref,  # Reuse information gathered earlier
}

# Parameter bounds for GR4JCN
bounds_low = [0.01, -15.0, 10.0, 0.0, 1.0, 0.0]
bounds_high = [2.5, 10.0, 700.0, 7.0, 30.0, 1.0]
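Before launching a calibration, it can be worth confirming that the starting parameters lie inside the search bounds. The check below is a small sketch that reuses the values from the configuration above; `check_within_bounds` is a hypothetical helper, not part of xHydro.

```python
# Values copied from the model configuration of this example
initial_parameters = [0.529, -3.396, 407.29, 1.072, 16.9, 0.947]
bounds_low = [0.01, -15.0, 10.0, 0.0, 1.0, 0.0]
bounds_high = [2.5, 10.0, 700.0, 7.0, 30.0, 1.0]


def check_within_bounds(parameters, low, high):
    """Return the indices of parameters that fall outside [low, high]."""
    if not (len(parameters) == len(low) == len(high)):
        raise ValueError("Parameters and bounds must have the same length.")
    return [
        i
        for i, (p, lo, hi) in enumerate(zip(parameters, low, high))
        if not lo <= p <= hi
    ]


outside = check_within_bounds(initial_parameters, bounds_low, bounds_high)
print("Parameters outside bounds:", outside)
```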

[18]:

# Calibration / validation period
mask_calib = xr.where(qobs.time.dt.year <= 2010, 1, 0).values
mask_valid = xr.where(qobs.time.dt.year > 2010, 1, 0).values

# Model calibration
best_parameters, best_simulation, best_objfun = perform_calibration(
    model_config,
    "kge",
    qobs=notebook_folder / "_data" / "qobs.nc",
    bounds_low=bounds_low,
    bounds_high=bounds_high,
    evaluations=8,
    algorithm="DDS",
    mask=mask_calib,
    sampler_kwargs=dict(trials=1),
)

Initializing the  Dynamically Dimensioned Search (DDS) algorithm  with  8  repetitions
The objective function will be maximized
Starting the DDS algorithm with 8 repetitions...
Finding best starting point for trial 1 using 5 random samples.
1 of 8, maximal objective function=0.129327, time remaining: 00:00:10
Initialize database...
['csv', 'hdf5', 'ram', 'sql', 'custom', 'noData']
2 of 8, maximal objective function=0.129327, time remaining: 00:00:11
3 of 8, maximal objective function=0.293959, time remaining: 00:00:09
4 of 8, maximal objective function=0.293959, time remaining: 00:00:08
5 of 8, maximal objective function=0.293959, time remaining: 00:00:05
6 of 8, maximal objective function=0.293959, time remaining: 00:00:03
7 of 8, maximal objective function=0.390197, time remaining: 00:00:00
8 of 8, maximal objective function=0.393656, time remaining: 23:59:57
Best solution found has obj function value of 0.3936562927318237 at 5



*** Final SPOTPY summary ***
Total Duration: 24.88 seconds
Total Repetitions: 8
Maximal objective value: 0.393656
Corresponding parameter setting:
param0: 1.13242
param1: -2.46413
param2: 43.5229
param3: 1.33209
param4: 24.153
param5: 0.0766439
******************************

Best parameter set:
param0=1.132418270728551, param1=-2.464133107201744, param2=43.522924628974806, param3=1.3320909887768697, param4=24.153016559360484, param5=0.07664389863678882
Run number 7 has the highest objectivefunction with: 0.3937
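For readers unfamiliar with DDS, the sketch below illustrates the general idea of the algorithm as described by Tolson & Shoemaker (2007): perturb a shrinking random subset of parameters around the best solution found so far and keep the candidate only if it is at least as good. It is a simplified, standalone illustration and not the SPOTPY implementation used by perform_calibration (for instance, it starts from a single random point). The function `dds_maximize`, its arguments, and the toy objective are all hypothetical.

```python
import math
import random


def dds_maximize(objective, low, high, evaluations, r=0.2, seed=0):
    """Greedy DDS search that maximizes `objective` within [low, high]."""
    rng = random.Random(seed)
    n = len(low)

    # Start from a random point inside the bounds
    best = [rng.uniform(lo, hi) for lo, hi in zip(low, high)]
    best_value = objective(best)

    for i in range(1, evaluations):
        # The probability of perturbing each parameter decreases over time
        prob = 1.0 - math.log(i) / math.log(evaluations)
        selected = [j for j in range(n) if rng.random() < prob]
        if not selected:
            selected = [rng.randrange(n)]  # always perturb at least one

        candidate = list(best)
        for j in selected:
            span = high[j] - low[j]
            value = best[j] + rng.gauss(0.0, 1.0) * r * span
            # Reflect at the bounds; fall back to the bound if still outside
            if value < low[j]:
                value = low[j] + (low[j] - value)
                if value > high[j]:
                    value = low[j]
            elif value > high[j]:
                value = high[j] - (value - high[j])
                if value < low[j]:
                    value = high[j]
            candidate[j] = value

        candidate_value = objective(candidate)
        if candidate_value >= best_value:  # greedy acceptance
            best, best_value = candidate, candidate_value

    return best, best_value


def toy_objective(params):
    """Hypothetical objective with its maximum (0) at (1, -2)."""
    return -((params[0] - 1.0) ** 2 + (params[1] + 2.0) ** 2)


best, best_value = dds_maximize(
    toy_objective, [-5.0, -5.0], [5.0, 5.0], evaluations=200
)
print(best, best_value)
```

Because the search is greedy and the number of perturbed parameters shrinks with the iteration count, the number of evaluations largely determines how far the search can refine the solution.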

To reduce computation time for this example, only 8 evaluations were used for the calibration, which is well below the recommended number. The parameters below were obtained by running the code above with 150 evaluations.

[19]:

# Replace the results with parameters obtained using 150 evaluations
best_parameters = [
    0.3580270511815579,
    -2.187141388684563,
    24.012067980309702,
    0.000781,
    1.9330212374187332,
    0.5491789347783598,
]
model_config["parameters"] = best_parameters

best_simulation = xh.modelling.hydrological_model(model_config).run()

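The KGE reported during calibration only reflects the calibration period. A more representative KGE should be computed over the validation period, using get_objective_function with the validation mask.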

[20]:

get_objective_function(
    qobs=qobs.q,
    qsim=best_simulation,
    obj_func="kge",
    mask=mask_valid,
).values

[20]:

array(0.68497379)
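As a reminder of what this score measures, the sketch below computes the Kling-Gupta efficiency in its original formulation (Gupta et al., 2009), which combines correlation, variability ratio, and bias ratio. It is a standalone illustration with hypothetical data; the exact variant and options used by get_objective_function may differ.

```python
import math


def kge(qobs, qsim, mask=None):
    """Kling-Gupta efficiency (Gupta et al., 2009 formulation)."""
    if mask is None:
        mask = [1] * len(qobs)
    # Keep only the time steps selected by the mask
    pairs = [(o, s) for o, s, m in zip(qobs, qsim, mask) if m]
    obs = [o for o, _ in pairs]
    sim = [s for _, s in pairs]
    n = len(pairs)

    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / n)
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / n)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in pairs) / n

    r = cov / (std_obs * std_sim)  # linear correlation
    alpha = std_sim / std_obs  # variability ratio
    beta = mean_sim / mean_obs  # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical observed and simulated discharge (m³/s)
q_observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
q_simulated = [1.2, 1.9, 3.4, 3.8, 5.5, 5.9]
validation_mask = [0, 0, 0, 1, 1, 1]  # score only the last three steps

print(kge(q_observed, q_simulated))
print(kge(q_observed, q_simulated, mask=validation_mask))
```

A value of 1 indicates a perfect match, and the mask plays the same role as mask_valid above: only the selected time steps contribute to the score.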

[21]:

fig = plt.figure(figsize=(10, 5))  # plt.figure returns a Figure, not an Axes
qobs.q.plot(color="k", linewidth=3)
best_simulation.q.plot(color="r")

[21]:

[<matplotlib.lines.Line2D at 0x7f5f930e23c0>]

Climate change impacts

INFO

For more information on this section and available options, consult the Climate change analysis notebook.

This example will keep the climate change analysis fairly simple:

- Compute the difference between the future and reference periods using xhydro.cc.compute_deltas.
- Use those differences to compute ensemble statistics using xhydro.cc.ensemble_stats: ensemble percentiles and agreement between the climate models.

[38]:

# Differences
deltas = xh.cc.compute_deltas(
    ds_indicators, reference_horizon="1991-2020", kind="%", rename_variables=False
).isel(horizon=-1)

# Save the results
deltas.to_netcdf(notebook_folder / "_data" / "deltas_sim0.nc")

deltas.squeeze()
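With kind="%", the deltas are relative changes with respect to the reference horizon. The underlying arithmetic can be sketched as follows; the indicator values and the helper `percent_delta` are hypothetical and only illustrate the calculation for a single simulation.

```python
def percent_delta(future, reference):
    """Relative change of `future` with respect to `reference`, in percent."""
    if reference == 0:
        raise ZeroDivisionError("Relative change is undefined for a zero reference.")
    return 100.0 * (future - reference) / reference


# Hypothetical indicator values (m³/s) for one simulation
reference_mean_summer_flow = 8.0  # e.g. over 1991-2020
future_mean_summer_flow = 7.0  # e.g. over 2070-2099

delta = percent_delta(future_mean_summer_flow, reference_mean_summer_flow)
print(f"{delta:.1f} %")
```

A negative value means the indicator decreases in the future horizon relative to the reference period.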

There are many ways to create the ensemble itself. If using a dictionary of datasets, the key will be used to name each element of the new realization dimension. This can be very useful when performing more detailed analyses or when wanting to weight the different models based, for example, on the number of available simulations. In our case, since we only wish to compute ensemble statistics, we can keep it simpler and provide a list.

[39]:

import pooch
import xclim.ensembles  # Needed for create_ensemble below

# Acquire deltas for the other 13 simulations
from xhydro.testing.helpers import (  # In-house function to access xhydro-testdata
    deveraux,
)

deltas_files = deveraux().fetch("use_case/deltas.zip", processor=pooch.Unzip())
deltas_files = xclim.ensembles.create_ensemble(deltas_files)

# Fix variable names and combine with the file we just created
deltas_files = deltas_files.rename(
    {"streamflow_max_annual": "q_max_annual", "streamflow7_min_summer": "q7_min_summer"}
)
deltas_sim0 = xr.open_dataset(
    notebook_folder / "_data" / "deltas_sim0.nc"
).assign_coords({"realization": 13})
deltas_files = xr.concat([deltas_files, deltas_sim0], dim="realization")
clear_output(wait=False)

[40]:

# Statistics to compute
statistics = {
    "ensemble_percentiles": {"values": [10, 25, 50, 75, 90], "split": False},
    "robustness_fractions": {"test": None},
}

ens_stats = xh.cc.ensemble_stats(deltas_files, statistics)

ens_stats

[40]:

<xarray.Dataset> Size: 2kB
Dimensions:                         (percentiles: 5, month: 12, return_period: 2)
Coordinates:
  * percentiles                     (percentiles) int64 40B 10 25 50 75 90
  * month                           (month) <U3 144B 'JAN' 'FEB' ... 'NOV' 'DEC'
  * return_period                   (return_period) int64 16B 20 100
    p_quantile                      (return_period) float64 16B 0.95 0.99
    basin_name                      <U7 28B 'sub_001'
    horizon                         <U9 36B '2070-2099'
    subbasin_id                     <U1 4B '1'
    elevation                       float32 4B 222.6
    drainage_area                   float64 8B 585.6
    centroid_longitude              float64 8B -71.26
    centroid_latitude               float64 8B 46.45
Data variables: (12/32)
    qmoy_summer                     (percentiles) float64 40B -14.0 ... 19.14
    qmoy_monthly                    (month, percentiles) float64 480B dask.array<chunksize=(12, 5), meta=np.ndarray>
    q_max_annual                    (return_period, percentiles) float64 80B dask.array<chunksize=(2, 5), meta=np.ndarray>
    q7_min_summer                   (percentiles) float64 40B -28.54 ... 11.68
    qmoy_summer_changed             float64 8B 1.0
    qmoy_summer_positive            float64 8B 0.5714
    ...                              ...
    q7_min_summer_positive          float64 8B 0.2143
    q7_min_summer_changed_positive  float64 8B 0.2143
    q7_min_summer_negative          float64 8B 0.7857
    q7_min_summer_changed_negative  float64 8B 0.7857
    q7_min_summer_valid             float64 8B 1.0
    q7_min_summer_agree             float64 8B 0.7857
Attributes:
    cat:xrfreq:            fx
    cat:frequency:         fx
    cat:processing_level:  ensemble
    cat:variable:          qmoy_monthly
    ensemble_size:         4
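To make these statistics more concrete, the sketch below computes ensemble percentiles and sign fractions for one indicator in plain Python. The 14 delta values are hypothetical, the `percentile` helper uses simple linear interpolation, and `agreement` is only a simplified reading of the `_agree` variables; the actual computation is done by the functions called through xhydro.cc.ensemble_stats.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = math.floor(position)
    upper = math.ceil(position)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical percent changes for 14 simulations (not the real deltas)
deltas = [-30.0, -25.0, -22.0, -18.0, -15.0, -12.0, -10.0,
          -8.0, -5.0, -3.0, -1.0, 2.0, 6.0, 12.0]

percentiles = {q: percentile(deltas, q) for q in (10, 25, 50, 75, 90)}
fraction_positive = sum(d > 0 for d in deltas) / len(deltas)
fraction_negative = sum(d < 0 for d in deltas) / len(deltas)
# Share of members that agree with the sign shown by most members
agreement = max(fraction_positive, fraction_negative)

print(percentiles)
print(fraction_positive, fraction_negative, agreement)
```

With 3 of the 14 hypothetical members showing an increase, the positive fraction is 3/14, which is how a value such as 0.2143 arises in an ensemble of this size.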

[41]:

# Recreate the boxplots based on the computed percentiles
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(13, 5), sharey=True)

ax = plt.subplot(1, 3, 1)
for i, rp in enumerate(ens_stats.return_period.values):
    stats = [
        {
            "label": rp,
            "med": ens_stats.q_max_annual.sel(percentiles=50, return_period=rp).values,
            "q1": ens_stats.q_max_annual.sel(percentiles=25, return_period=rp).values,
            "q3": ens_stats.q_max_annual.sel(percentiles=75, return_period=rp).values,
            "whislo": ens_stats.q_max_annual.sel(
                percentiles=10, return_period=rp
            ).values,
            "whishi": ens_stats.q_max_annual.sel(
                percentiles=90, return_period=rp
            ).values,
        }
    ]

    ax.bxp(stats, showfliers=False, positions=[i], widths=0.5)
ax.set_title("Maximum annual streamflow")
plt.xlabel("Return period")
plt.ylabel("Difference Fut-Hist (%)")

ax = plt.subplot(1, 3, 2)
# q7_min_summer has no return_period dimension, so a single box is drawn
stats = [
    {
        "label": "",
        "med": ens_stats.q7_min_summer.sel(percentiles=50).values,
        "q1": ens_stats.q7_min_summer.sel(percentiles=25).values,
        "q3": ens_stats.q7_min_summer.sel(percentiles=75).values,
        "whislo": ens_stats.q7_min_summer.sel(percentiles=10).values,
        "whishi": ens_stats.q7_min_summer.sel(percentiles=90).values,
    }
]

ax.bxp(stats, showfliers=False, positions=[0], widths=0.25)
ax.set_title("Minimum summer streamflow (7-day avg)")
plt.xlabel("")

ax = plt.subplot(1, 3, 3)
stats = [
    {
        "label": "",
        "med": ens_stats.qmoy_summer.sel(percentiles=50).values,
        "q1": ens_stats.qmoy_summer.sel(percentiles=25).values,
        "q3": ens_stats.qmoy_summer.sel(percentiles=75).values,
        "whislo": ens_stats.qmoy_summer.sel(percentiles=10).values,
        "whishi": ens_stats.qmoy_summer.sel(percentiles=90).values,
    }
]

ax.bxp(stats, showfliers=False, positions=[0], widths=0.25)  # Single box; do not reuse the loop index
ax.set_title("Mean summer flow")

plt.show()

[42]:

print(
    f"Fraction of simulations with a positive change (maximum streamflow): {ens_stats.q_max_annual_positive.values}"
)
print(
    f"Fraction of simulations with a positive change (minimum summer streamflow): {ens_stats.q7_min_summer_positive.values}"
)
print(
    f"Fraction of simulations with a positive change (mean summer streamflow): {ens_stats.qmoy_summer_positive.values}"
)

Fraction of simulations with a positive change (maximum streamflow): [0.5        0.64285714]
Fraction of simulations with a positive change (minimum summer streamflow): 0.21428571428571427
Fraction of simulations with a positive change (mean summer streamflow): 0.5714285714285714
