5. Climate change analysis of hydrological data

[1]:
# Imports
# Hide INFO-level logs
import logging
from pathlib import Path
from zipfile import ZipFile

import hvplot.xarray
import numpy as np
import pooch
import xarray as xr
import xclim

import xhydro as xh

logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

# This notebook will use data from the 2022 edition of the Hydrological Atlas of Southern Quebec, which can be accessed from the xhydro-testdata repository.
# They cover 2 stations: ABIT00057 and ABIT00058
GITHUB_URL = "https://github.com/hydrologie/xhydro-testdata"
BRANCH_OR_COMMIT_HASH = "main"

# Streamflow file (1 file - Hydrotel driven by BCC-CSM-1.1(m))
streamflow_file = pooch.retrieve(
    url=f"{GITHUB_URL}/raw/{BRANCH_OR_COMMIT_HASH}/data/cc_indicators/streamflow_BCC-CSM1.1-m_rcp45.nc",
    known_hash="md5:0ac83a4ee9dceecda68ac1ee542f50de",
)

# Reference QMOYAN (6 platforms)
ref_zip = pooch.retrieve(
    url=f"{GITHUB_URL}/raw/{BRANCH_OR_COMMIT_HASH}/data/cc_indicators/reference.zip",
    known_hash="md5:192544f3a081375a81d423e08038d32a",
)
directory_to_extract_to = Path(ref_zip).parent
with ZipFile(ref_zip, "r") as ziploc:
    ziploc.extractall(directory_to_extract_to)
    files = ziploc.namelist()
reference_files = [directory_to_extract_to / f for f in files]

# QMOYAN deltas (63 simulations x 6 platforms)
deltas_zip = pooch.retrieve(
    url=f"{GITHUB_URL}/raw/{BRANCH_OR_COMMIT_HASH}/data/cc_indicators/deltas.zip",
    known_hash="md5:ce6371e073e5324f9ade385c1c03e7eb",
)
directory_to_extract_to = Path(deltas_zip).parent
with ZipFile(deltas_zip, "r") as ziploc:
    ziploc.extractall(directory_to_extract_to)
    files = ziploc.namelist()
deltas_files = [directory_to_extract_to / f for f in files]
/home/docs/checkouts/readthedocs.org/user_builds/xhydro/conda/latest/lib/python3.12/site-packages/clisops/core/regrid.py:42: UserWarning: xarray version >= 2023.03.0 is not supported for regridding operations with cf-time indexed arrays. Please use xarray version < 2023.03.0. For more information, see: https://github.com/pydata/xarray/issues/7794.
  warnings.warn(
ERROR 1: PROJ: proj_create_from_database: Open of /home/docs/checkouts/readthedocs.org/user_builds/xhydro/conda/latest/share/proj failed

While there is a huge variety of analyses that could be done to assess the impacts of climate change on hydrology, this notebook will go through some of the most common steps:

  • Computing a list of relevant indicators over climatological periods

  • Computing future deltas

  • Computing ensemble statistics to assess future changes

INFO

Multiple functions in xh.indicators and xh.cc have been leveraged from the xscen library and made accessible to xhydro users. For more information on these function, it is recommended to look at:

5.1. Computing hydrological indicators over a given time period

[2]:
# The file used as an example is a daily timeseries of streamflow data generated from the Hydrotel hydrological model
# driven by bias-adjusted data from the BCC-CSM-1.1(m) climatological model (RCP4.5), from 1950 to 2100.
# For this example, the dataset covers only 2 stations.
ds = xr.open_dataset(streamflow_file)
ds.streamflow.hvplot(x="time", grid=True, widget_location="bottom", groupby="station")
[2]:

Hydrological indicators can be separated in two broad categories:

  • Frequential indicators, such as the maximum 20-year flow (Qmax20) or the minimum 2-year 7-day averaged flow in summer (Q7min2_summer). Computing these is already covered in the Local Frequency Analysis notebook notebook.

  • Non frequencial indicators, such as the average yearly flow.

Since frequential indicators have already been covered in another example, this notebook will instead look at the methodology that would be used to compute non frequential indicators using xhydro.indicators.compute_indicators. The inputs of that function are:

  • ds: the Dataset.

  • indicators: a list of indicators to compute, or the path to a YAML file containing those.

  • periods (optional): either [start, end] or list of [start, end] of continuous periods over which to compute the indicators.

INFO

Custom indicators are built by following the YAML formatting required by xclim. More information is available in the xclim documentation.

The list of Yaml IDs is available here.

[3]:
# We'll define 2 indicators to compute by using dictionaries.
#
# We minimally need to define three things under `data`:
#    1. 'base': A base indicator for the computation, identified through its Yaml ID (here, 'stats').
#    2. 'parameters': Specific parameters to use instead of the defaults.
#      - This potentially includes a 'indexer' parameter to focus on particular periods of the year.
#    3. 'input': The name of the input variable. The key here must be the variable name used by xclim (here, 'da').
#
# The 'identifier' is the label that will be given by 'xclim' to the new indicator. The 'module' can be anything.

indicators = [
    # 1st indicator: Mean annual flow
    xclim.core.indicator.Indicator.from_dict(
        data={
            "base": "stats",
            "input": {"da": "streamflow"},
            "parameters": {"op": "mean"},
        },
        identifier="QMOYAN",
        module="hydro",
    ),
    # 2nd indicator: Mean summer-fall flow
    xclim.core.indicator.Indicator.from_dict(
        data={
            "base": "stats",
            "input": {"da": "streamflow"},
            "parameters": {"op": "mean", "indexer": {"month": [6, 7, 8, 9, 10, 11]}},
        },  # The indexer is used to restrict available data to the relevant months only
        identifier="QMOYEA",
        module="hydro",
    ),
]

# Call compute_indicators
dict_indicators = xh.indicators.compute_indicators(ds, indicators=indicators)

dict_indicators
[3]:
{'YS-JAN': <xarray.Dataset> Size: 4kB
 Dimensions:        (station: 2, time: 151)
 Coordinates:
     lat            (station) float32 8B 49.53 49.53
     lon            (station) float32 8B -77.05 -77.02
     drainage_area  (station) float32 8B 60.76 54.61
   * station        (station) <U9 72B 'ABIT00057' 'ABIT00058'
   * time           (time) datetime64[ns] 1kB 1950-01-01 ... 2100-01-01
 Data variables:
     QMOYAN         (station, time) float32 1kB nan nan nan ... 0.8027 0.743 nan
     QMOYEA         (station, time) float32 1kB nan nan nan ... 0.385 1.035
 Attributes: (12/48)
     institution:                    DEH (Direction de l'Expertise Hydrique)
     institute_id:                   DEH
     contact:                        atlas.hydroclimatique@environnement.gouv....
     date_created:                   2020-05-23
     featureType:                    timeSeries
     cdm_data_type:                  station
     ...                             ...
     cat:bias_adjust_project:        INFO-Crue-CM5
     cat:mip_era:                    CMIP5
     cat:activity:                   CMIP5
     cat:experiment:                 rcp45
     cat:member:                     r1i1p1
     cat:bias_adjust_institution:    OURANOS}
[4]:
dict_indicators["YS-JAN"].QMOYAN.hvplot(
    x="time", grid=True, widget_location="bottom", groupby="station"
)
[4]:

Since indicators could be output at varying frequencies, compute_indicators will return a dictionary where the keys are the output frequencies. In this example, we only have one key: AS-JAN (annual data starting in January). The keys follow the pandas nomenclature.

The next step is to obtain averages over climatological periods. The xh.cc.climatological_op function can be called for this purpose. The inputs of that function are:

  • ds: Dataset to use for the computation.

  • op: Operation to perform over time. While other operations are technically possible, the following are recommended and tested: [‘max’, ‘mean’, ‘median’, ‘min’, ‘std’, ‘sum’, ‘var’, ‘linregress’].

  • window (optional): Number of years to use for the rolling operation. If None, all the available data will be used.

  • min_periods (optional): For the rolling operation, minimum number of years required for a value to be computed.

  • stride: Stride (in years) at which to provide an output from the rolling window operation.

  • periods (optional): Either [start, end] or list of [start, end] of continuous periods to be considered.

  • rename_variables: If True, ‘clim{op}’ will be added to variable names.

  • horizons_as_dim: If True, the output will have ‘horizon’ and the frequency as ‘month’, ‘season’ or ‘year’ as dimensions and coordinates.

[5]:
# Define the periods using a list of lists
periods = [[1981, 2010], [2011, 2040], [2041, 2070], [2071, 2100]]
min_periods = 29  # This is an example of a model where the data stops in 2099, so we can use 'min_periods' to still obtain a value for the last period

# Call climatological_op. Here we don't need 'time' anymore, so we can use horizons_as_dim=True
ds_avg = xh.cc.climatological_op(
    dict_indicators["YS-JAN"],
    op="mean",
    periods=periods,
    min_periods=min_periods,
    horizons_as_dim=True,
    rename_variables=False,
).drop_vars(["time"])
ds_avg
[5]:
<xarray.Dataset> Size: 304B
Dimensions:        (horizon: 4, station: 2)
Coordinates:
    lat            (station) float32 8B 49.53 49.53
    lon            (station) float32 8B -77.05 -77.02
    drainage_area  (station) float32 8B 60.76 54.61
  * station        (station) <U9 72B 'ABIT00057' 'ABIT00058'
  * horizon        (horizon) <U9 144B '1981-2010' '2011-2040' ... '2071-2100'
Data variables:
    QMOYAN         (horizon, station) float32 32B 0.8968 0.8041 ... 0.7745
    QMOYEA         (horizon, station) float32 32B 0.9335 0.8379 ... 0.6966
Attributes: (12/48)
    institution:                    DEH (Direction de l'Expertise Hydrique)
    institute_id:                   DEH
    contact:                        atlas.hydroclimatique@environnement.gouv....
    date_created:                   2020-05-23
    featureType:                    timeSeries
    cdm_data_type:                  station
    ...                             ...
    cat:bias_adjust_project:        INFO-Crue-CM5
    cat:mip_era:                    CMIP5
    cat:activity:                   CMIP5
    cat:experiment:                 rcp45
    cat:member:                     r1i1p1
    cat:bias_adjust_institution:    OURANOS

Computing deltas is then as easy as calling xh.cc.compute_deltas. The inputs of that function are:

  • ds: Dataset to use for the computation.

  • reference_horizon: Either a YYYY-YYYY string corresponding to the ‘horizon’ coordinate of the reference period, or a xr.Dataset containing the climatological mean.

  • kind: [‘+’, ‘/’, ‘%’] Whether to provide absolute, relative, or percentage deltas. Can also be a dictionary separated per variable name.

[6]:
# Here, we'll use a string from the 'horizon' dimension.
reference_horizon = "1981-2010"
kind = "%"

ds_deltas = xh.cc.compute_deltas(
    ds_avg, reference_horizon=reference_horizon, kind=kind, rename_variables=False
)
ds_deltas
[6]:
<xarray.Dataset> Size: 304B
Dimensions:        (station: 2, horizon: 4)
Coordinates:
    lat            (station) float32 8B 49.53 49.53
    lon            (station) float32 8B -77.05 -77.02
    drainage_area  (station) float32 8B 60.76 54.61
  * station        (station) <U9 72B 'ABIT00057' 'ABIT00058'
  * horizon        (horizon) <U9 144B '1981-2010' '2011-2040' ... '2071-2100'
Data variables:
    QMOYAN         (horizon, station) float32 32B 0.0 0.0 4.273 ... -3.64 -3.678
    QMOYEA         (horizon, station) float32 32B 0.0 0.0 ... -16.67 -16.86
Attributes: (12/48)
    institution:                    DEH (Direction de l'Expertise Hydrique)
    institute_id:                   DEH
    contact:                        atlas.hydroclimatique@environnement.gouv....
    date_created:                   2020-05-23
    featureType:                    timeSeries
    cdm_data_type:                  station
    ...                             ...
    cat:bias_adjust_project:        INFO-Crue-CM5
    cat:mip_era:                    CMIP5
    cat:activity:                   CMIP5
    cat:experiment:                 rcp45
    cat:member:                     r1i1p1
    cat:bias_adjust_institution:    OURANOS
[7]:
# Show the results as Dataframes
print("30-year averages")
display(ds_avg.QMOYAN.isel(station=0).to_dataframe())
print("Deltas")
display(ds_deltas.QMOYAN.isel(station=0).to_dataframe())
30-year averages
lat lon drainage_area station QMOYAN
horizon
1981-2010 49.529999 -77.050003 60.759998 ABIT00057 0.896769
2011-2040 49.529999 -77.050003 60.759998 ABIT00057 0.935091
2041-2070 49.529999 -77.050003 60.759998 ABIT00057 0.944270
2071-2100 49.529999 -77.050003 60.759998 ABIT00057 0.864129
Deltas
lat lon drainage_area station QMOYAN
horizon
1981-2010 49.529999 -77.050003 60.759998 ABIT00057 0.000000
2011-2040 49.529999 -77.050003 60.759998 ABIT00057 4.273284
2041-2070 49.529999 -77.050003 60.759998 ABIT00057 5.296827
2071-2100 49.529999 -77.050003 60.759998 ABIT00057 -3.639797

5.2. Ensemble statistics

[8]:
# To save time, let's open pre-computed deltas for the RCP4.5 simulations used in the 2022 Hydroclimatic Atlas
ds_dict_deltas = {}
for f in deltas_files:
    id = Path(f).stem
    ds_dict_deltas[id] = xr.open_dataset(f)

It is a good practice to use multiple climate models to perform climate change analyses, especially since the impacts on the hydrological cycle can be non linear. Once multiple hydrological simulations have been run and are ready to be analysed, xh.cc.ensemble_stats can be used to call a variety of functions available in xclim.ensemble, such as for getting ensemble quantiles or the agreement on the sign of the change.

5.2.1. Weighting simulations

If the ensemble of climate models is heterogeneous, for example if a given climate model has provided more simulations, it is recommended to weight the results accordingly. While this is not currently available through xhydro, xscen.generate_weights can create a first approximation of the weights to use, based on available metadata.

The following attributes are required for the function to work:

  • ‘cat:source’ in all datasets

  • ‘cat:driving_model’ in regional climate models

  • ‘cat:institution’ in all datasets if independence_level=’institution’

  • ‘cat:experiment’ in all datasets if split_experiments=True

That function has three possible independence levels:

  • model: 1 Model - 1 Vote

  • GCM: 1 GCM - 1 Vote

  • institution: 1 institution - 1 Vote

[9]:
import xscen

independence_level = "model"  # 1 Model - 1 Vote

weights = xscen.generate_weights(ds_dict_deltas, independence_level=independence_level)

# Show the results. We multiply by 6 for the showcase here simply because there are 6 hydrological platforms in the results.
weights.where(weights.realization.str.contains("LN24HA"), drop=True) * 6
[9]:
<xarray.DataArray (realization: 63)> Size: 504B
array([1.        , 1.        , 1.        , 0.2       , 0.2       ,
       0.2       , 0.2       , 0.2       , 1.        , 1.        ,
       1.        , 1.        , 1.        , 0.1       , 0.1       ,
       0.1       , 0.1       , 0.1       , 0.1       , 0.1       ,
       0.1       , 0.1       , 0.1       , 1.        , 0.25      ,
       0.25      , 0.25      , 0.25      , 1.        , 1.        ,
       1.        , 0.33333333, 0.33333333, 0.33333333, 1.        ,
       1.        , 0.33333333, 0.33333333, 0.33333333, 0.33333333,
       0.33333333, 0.33333333, 1.        , 1.        , 0.5       ,
       0.5       , 1.        , 0.33333333, 0.33333333, 0.33333333,
       1.        , 1.        , 0.5       , 0.5       , 0.5       ,
       1.        , 0.5       , 1.        , 1.        , 1.        ,
       1.        , 1.        , 1.        ])
Coordinates:
  * realization  (realization) <U112 28kB 'AtlasHydro2022_HYDROTEL_LN24HA_INF...

5.2.2. Use Case #1: Deterministic reference data

In most cases, you’ll likely have deterministic data for the reference period, meaning that for a given location, the 30-year average for the indicator is a single value.

[10]:
# The Hydrological Portrait produces probabilistic estimates, but we'll take the 50th percentile to fake deterministic data
ref = xr.open_dataset(reference_files[0]).sel(percentile=50).drop_vars("percentile")

Multiple methodologies exist on how to combine the information of the observed and simulated data. Due to biases that may remain in the climate simulations even after bias adjustment and affect the hydrological modelling, we’ll use a perturbation technique. This is especially relevant in hydrology with regards to non linear interactions between the climate and hydrological indicators.

The perturbation technique consists in computing ensemble percentiles on the deltas, then apply them on the reference dataset.For this example, we’ll compute the 10th, 25th, 50th, 75th, and 90th percentiles of the ensemble, as well as the agreement on the sign of change, using xh.cc.ensemble_stats. The inputs of that function are:

  • datasets: List of file paths or xarray Dataset/DataArray objects to include in the ensemble. A dictionary can be passed instead of a list, in which case the keys are used as coordinates along the new realization axis.

  • statistics: dictionary of xclim.ensembles statistics to be called, with their arguments.

  • weights (optional): Weights to apply along the ‘realization’ dimension.

[11]:
# Statistics to compute
statistics = {
    "ensemble_percentiles": {"values": [10, 25, 50, 75, 90], "split": False},
    "robustness_fractions": {"test": None},
}  # Robustness fractions is the function that provides the agreement between models.

# Here, we call ensemble_stats on the dictionary deltas, since this is the information that we want to extrapolate.
# If relevant, weights are added at this step
ens_stats = xh.cc.ensemble_stats(ds_dict_deltas, statistics, weights=weights)
[12]:
# Additional statistics not explicitly supported by ensemble_stats
from xclim.ensembles import robustness_categories

# Interquartile range
ens_stats["QMOYAN_iqr"] = ens_stats["QMOYAN"].sel(percentiles=75) - ens_stats[
    "QMOYAN"
].sel(percentiles=25)

# Categories of agreement for the sign of change. This follows the Advanced IPCC Atlas categories.
# See the Cross-Chapter Box 1 for reference: https://www.cambridge.org/core/books/climate-change-2021-the-physical-science-basis/atlas/24E1C016DBBE4725BDFBC343695DE7DB
# For thresholds and ops, the first entry is related to the significance test, while the 2nd is related to the percentage of simulations that see a positive delta.
# For example, "Agreement towards increase" is met if more than 66% of simulations see a significant change AND 80% of simulations see a positive change.
categories = [
    "Agreement towards increase",
    "Agreement towards decrease",
    "Conflicting signals",
    "No change or robust signal",
]
thresholds = [[0.66, 0.8], [0.66, 0.2], [0.66, 0.8], [0.66, np.nan]]
ops = [[">=", ">="], [">=", "<="], [">=", "<"], ["<", None]]

ens_stats["QMOYAN_robustness_categories"] = robustness_categories(
    changed_or_fractions=ens_stats["QMOYAN_changed"],
    agree=ens_stats["QMOYAN_positive"],
    categories=categories,
    thresholds=thresholds,
    ops=ops,
)

# The future values for QMOYAN can be obtained by multiplying the reference indicator with the percentiles of the ensemble deltas
ens_stats["QMOYAN_projected"] = ref.QMOYAN * (1 + ens_stats.QMOYAN / 100)
[13]:
ens_stats
[13]:
<xarray.Dataset> Size: 2kB
Dimensions:                       (station: 2, horizon: 3, percentiles: 5)
Coordinates:
    lat                           (station) float32 8B 49.53 49.53
    lon                           (station) float32 8B -77.05 -77.02
    drainage_area                 (station) float32 8B 60.76 54.61
  * station                       (station) <U9 72B 'ABIT00057' 'ABIT00058'
  * horizon                       (horizon) <U9 108B '2021-2051' ... '2071-2100'
  * percentiles                   (percentiles) int64 40B 10 25 50 75 90
    realization                   <U86 344B ...
Data variables:
    QMOYAN                        (percentiles, station, horizon) float64 240B ...
    QMOYAN_changed                (station, horizon) float64 48B 1.0 1.0 ... 1.0
    QMOYAN_positive               (station, horizon) float64 48B 0.689 ... 0....
    QMOYAN_changed_positive       (station, horizon) float64 48B 0.689 ... 0....
    QMOYAN_negative               (station, horizon) float64 48B 0.311 ... 0....
    QMOYAN_changed_negative       (station, horizon) float64 48B 0.311 ... 0....
    QMOYAN_valid                  (station, horizon) float64 48B 0.09524 ... ...
    QMOYAN_agree                  (station, horizon) float64 48B 0.689 ... 0....
    QMOYAN_iqr                    (station, horizon) float64 48B 6.981 ... 9.842
    QMOYAN_robustness_categories  (station, horizon) int64 48B 3 3 3 3 3 3
    QMOYAN_projected              (station, percentiles, horizon) float64 240B ...
Attributes: (12/37)
    title:                          Simulation Atlas Hydroclimatique 2020
    summary:                        Derive relative des indicateurs hydrologi...
    institution:                    DEH (Direction de l'Expertise Hydrique)
    institute_id:                   DEH
    contact:                        atlas.hydroclimatique@environnement.gouv....
    experiment:                     Hydrological simulation
    ...                             ...
    cat:bias_adjust_project:        INFO-Crue-CM5
    cat:mip_era:                    CMIP5
    cat:experiment:                 rcp45
    cat:bias_adjust_institution:    OURANOS
    cat:id:                         INFO-Crue-CM5_CMIP5_rcp45
    ensemble_size:                  378

5.2.3. Use Case #2: Probabilistic reference data

This method follows a similar approach to Use Case #1, but for a case like the Hydrological Atlas of Southern Quebec, where the hydrological indicators computed for the historical period are represented by a probability density function (PDF), rather than a discrete value. This means that the ensemble percentiles can’t simply be multiplied by the reference value.

INFO

Note that the percentiles in ref are not the interannual variability, but rather the uncertainty related, for example, to hydrological modelling or the quality of the input data. At this stage, the temporal average should already have been done.

[14]:
ref = xr.open_mfdataset(reference_files, combine="nested", concat_dim="platform")

# Rather than a single value, QMOYAN is represented by 21 percentiles that try to represent the uncertainty surrounding this statistics.
# Like for the future simulations, we also have 6 hydrological platforms to take into account.
ref
[14]:
<xarray.Dataset> Size: 3kB
Dimensions:        (platform: 6, station: 2, percentile: 21)
Coordinates:
    lat            (station) float32 8B dask.array<chunksize=(2,), meta=np.ndarray>
    lon            (station) float32 8B dask.array<chunksize=(2,), meta=np.ndarray>
  * percentile     (percentile) float32 84B 1.0 5.0 10.0 15.0 ... 90.0 95.0 99.0
    drainage_area  (station) float32 8B dask.array<chunksize=(2,), meta=np.ndarray>
  * station        (station) <U9 72B 'ABIT00057' 'ABIT00058'
    realization    (platform) <U86 2kB 'A20_HYDREP_QCMERI_XXX_DEBITJ_HIS_XXX_...
Dimensions without coordinates: platform
Data variables:
    QMOYAN         (platform, station, percentile) float32 1kB dask.array<chunksize=(1, 2, 21), meta=np.ndarray>
Attributes: (12/29)
    title:                          Analyse des indicateurs hydrologiques aux...
    summary:                        Hydrological indicator analysis
    institution:                    DEH (Direction de l'Expertise Hydrique)
    institute_id:                   DEH
    contact:                        edouard.mailhot@environnement.gouv.qc.ca
    date_created:                   2020-11-01
    ...                             ...
    cat:frequency:                  fx
    cat:variable:                   QMOYAN
    cat:type:                       reconstruction-hydro
    cat:processing_level:           indicators
    cat:xrfreq:                     fx
    cat:id:                         PortraitHydro_HYDROTEL_LN24HA_GovQc_GCQ_QC
[15]:
# This can also be represented as a cumulative distribution function (CDF)
import matplotlib.pyplot as plt

for platform in ref.platform:
    plt.plot(
        ref.QMOYAN.isel(station=0).sel(platform=platform),
        ref.QMOYAN.percentile / 100,
        "grey",
    )
    plt.xlabel("Mean annual flow (m³/s)")
    plt.ylabel("Probability")
    plt.title("CDF for QMOYAN @ ABIT00057 \nEach line is an hydrological platform")
../_images/notebooks_climate_change_25_0.png

Because of their probabilistic nature, the historical reference values can’t easily be combined to the future deltas. The sampled_indicators function has been created to circumvent this issue. That function will:

  1. Sample ‘n’ values from the historical distribution, weighting the percentiles by their associated coverage.

  2. Sample ‘n’ values from the delta distribution, using the provided weights.

  3. Create the future distribution by applying the sampled deltas to the sampled historical distribution, element-wise.

  4. Compute the percentiles of the future distribution.

The inputs of that function are:

  • ds: Dataset containing the historical indicators. The indicators are expected to be represented by a distribution of pre-computed percentiles.

  • deltas: Dataset containing the future deltas to apply to the historical indicators.

  • delta_type: Type of delta provided. Must be one of [‘absolute’, ‘percentage’].

  • ds_weights (optional): Weights to use when sampling the historical indicators, for dimensions other than ‘percentile’/’quantile’. Dimensions not present in this Dataset, or if None, will be sampled uniformly unless they are shared with ‘deltas’.

  • delta_weights (optional): Weights to use when sampling the deltas, such as along the ‘realization’ dimension. Dimensions not present in this Dataset, or if None, will be sampled uniformly unless they are shared with ‘ds’.

  • n: Number of samples to generate.

  • seed (optional): Seed to use for the random number generator.

  • return_dist: Whether to return the full distributions (ds, deltas, fut) or only the percentiles.

[16]:
n = 300000
deltas = xclim.ensembles.create_ensemble(
    ds_dict_deltas
)  # The function expects an xarray object. This xclim function can be used to easily create the required input.

# Compute the perturbed indicators
fut_pct, hist_dist, delta_dist, fut_dist = xh.cc.sampled_indicators(
    ref,
    deltas=deltas,
    delta_type="percentage",
    delta_weights=weights,
    n=n,
    seed=0,
    return_dist=True,
)

fut_pct
[16]:
<xarray.Dataset> Size: 1kB
Dimensions:        (station: 2, horizon: 3, percentile: 21)
Coordinates:
    lat            (station) float32 8B 49.53 49.53
    lon            (station) float32 8B -77.05 -77.02
    drainage_area  (station) float32 8B 60.76 54.61
  * station        (station) <U9 72B 'ABIT00057' 'ABIT00058'
  * horizon        (horizon) <U9 108B '2021-2051' '2041-2071' '2071-2100'
  * percentile     (percentile) float32 84B 1.0 5.0 10.0 15.0 ... 90.0 95.0 99.0
Data variables:
    QMOYAN         (percentile, station, horizon) float64 1kB dask.array<chunksize=(21, 2, 3), meta=np.ndarray>
[17]:
# First, let's show how the historical distribution was sampled and reconstructed


def _make_cdf(ds, bins):
    count, bins_count = np.histogram(ds.QMOYAN.isel(station=0), bins=bins)
    pdf = count / sum(count)
    return bins_count, np.cumsum(pdf)


# Barplot
plt.subplot(2, 1, 1)
uniquen = np.unique(hist_dist.QMOYAN.isel(station=0), return_counts=True)
plt.bar(uniquen[0], uniquen[1], width=0.01, color="k")
plt.ylabel("Number of instances")
plt.title("Sampling within the historical distribution")

# CDF
plt.subplot(2, 1, 2)
for i, platform in enumerate(ref.platform):
    plt.plot(
        ref.QMOYAN.isel(station=0).sel(platform=platform),
        ref.percentile / 100,
        "grey",
        label="CDFs from the percentiles" if i == 0 else None,
    )
bc, c = _make_cdf(hist_dist, bins=50)
plt.plot(bc[1:], c, "r", label=f"Sampled historical CDF (n={n})", linewidth=3)
plt.ylabel("Probability")
plt.xlabel("QMOYAN (m³/s)")
plt.legend()

plt.tight_layout()
../_images/notebooks_climate_change_28_0.png
[18]:
# Then, let's show how the deltas were sampled, for the last horizon

# Plot #3
plt.subplot(2, 1, 1)
uniquen = np.unique(delta_dist.QMOYAN.isel(station=0, horizon=-1), return_counts=True)
plt.bar(uniquen[0], uniquen[1], width=0.25, color="k")
plt.ylabel("Number of instances")
plt.title("Sampling within the historical distribution")

# Plot #2
plt.subplot(2, 1, 2)
bc, c = _make_cdf(delta_dist, bins=100)
plt.plot(bc[1:], c, "k", label=f"Sampled deltas CDF (n={n})", linewidth=3)
plt.ylabel("Probability")
plt.xlabel("Deltas (%)")
plt.legend()

plt.tight_layout()
../_images/notebooks_climate_change_29_0.png
[19]:
# The distributions can be used to quickly create boxplots
plt.boxplot(
    [
        hist_dist.QMOYAN.isel(station=0),
        fut_dist.QMOYAN.isel(station=0, horizon=0),
        fut_dist.QMOYAN.isel(station=0, horizon=1),
        fut_dist.QMOYAN.isel(station=0, horizon=2),
    ],
    labels=["Historical", "2011-2040", "2041-2070", "2071-2100"],
)

plt.ylabel("Mean summer flow (m³/s)")
plt.tight_layout()
../_images/notebooks_climate_change_30_0.png
[20]:
# The same statistics as before can also be computed by using delta_dist
delta_dist = delta_dist.rename({"sample": "realization"})  # xclim compatibility
ens_stats_2 = xh.cc.ensemble_stats(delta_dist, statistics)

# Interquartile range
ens_stats_2["QMOYAN_iqr"] = ens_stats_2["QMOYAN"].sel(percentiles=75) - ens_stats_2[
    "QMOYAN"
].sel(percentiles=25)

# Categories of agreement on the sign of change
ens_stats_2["QMOYAN_robustness_categories"] = robustness_categories(
    changed_or_fractions=ens_stats_2["QMOYAN_changed"],
    agree=ens_stats_2["QMOYAN_positive"],
    categories=categories,
    thresholds=thresholds,
    ops=ops,
)

ens_stats_2
[20]:
<xarray.Dataset> Size: 772B
Dimensions:                       (station: 2, horizon: 3, percentiles: 5)
Coordinates:
    lat                           (station) float32 8B dask.array<chunksize=(2,), meta=np.ndarray>
    lon                           (station) float32 8B dask.array<chunksize=(2,), meta=np.ndarray>
    drainage_area                 (station) float32 8B dask.array<chunksize=(2,), meta=np.ndarray>
  * station                       (station) <U9 72B 'ABIT00057' 'ABIT00058'
  * horizon                       (horizon) <U9 108B '2021-2051' ... '2071-2100'
  * percentiles                   (percentiles) int64 40B 10 25 50 75 90
Data variables:
    QMOYAN                        (station, horizon, percentiles) float32 120B dask.array<chunksize=(2, 3, 5), meta=np.ndarray>
    QMOYAN_changed                (station, horizon) float64 48B dask.array<chunksize=(2, 3), meta=np.ndarray>
    QMOYAN_positive               (station, horizon) float64 48B dask.array<chunksize=(2, 3), meta=np.ndarray>
    QMOYAN_changed_positive       (station, horizon) float64 48B dask.array<chunksize=(2, 3), meta=np.ndarray>
    QMOYAN_negative               (station, horizon) float64 48B dask.array<chunksize=(2, 3), meta=np.ndarray>
    QMOYAN_changed_negative       (station, horizon) float64 48B dask.array<chunksize=(2, 3), meta=np.ndarray>
    QMOYAN_valid                  (station, horizon) float64 48B dask.array<chunksize=(2, 3), meta=np.ndarray>
    QMOYAN_agree                  (station, horizon) float64 48B dask.array<chunksize=(2, 3), meta=np.ndarray>
    QMOYAN_iqr                    (station, horizon) float32 24B dask.array<chunksize=(2, 3), meta=np.ndarray>
    QMOYAN_robustness_categories  (station, horizon) int64 48B dask.array<chunksize=(2, 3), meta=np.ndarray>
Attributes: (12/45)
    title:                          Simulation Atlas Hydroclimatique 2020
    summary:                        Derive relative des indicateurs hydrologi...
    institution:                    DEH (Direction de l'Expertise Hydrique)
    institute_id:                   DEH
    contact:                        atlas.hydroclimatique@environnement.gouv....
    date_created:                   2020-11-23
    ...                             ...
    cat:mip_era:                    CMIP5
    cat:activity:                   CMIP5
    cat:experiment:                 rcp45
    cat:member:                     r1i1p1
    cat:bias_adjust_institution:    OURANOS
    ensemble_size:                  1